When performance problems arise, your system might be totally innocent, while the real culprit is buildings away. An easy way to tell if the network is affecting overall performance is to compare those operations that involve the network with those that do not. If you are running a program that does a considerable amount of remote reads and writes and it is running slowly, but everything else seems to be running as usual, then it is probably a network problem. Some of the potential network bottlenecks can be caused by the following:
Several tools can measure network statistics and give a variety of information, but only part of this information is related to performance tuning.
To enhance performance, you can use the no (network options) command and the nfso command for tuning NFS options. You can also use the chdev and ifconfig commands to change system and network parameters.
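For example, before changing anything, you can display the current settings to establish a baseline (the adapter name ent0 is only an illustration; substitute an adapter on your system):
# no -a
# nfso -a
# lsattr -El ent0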
The ping command is useful for the following:
Some ping command options relevant to performance tuning are as follows:
If you need to load your network or systems, the -f option is convenient. For example, if you suspect that your problem is caused by a heavy load, load your environment intentionally to confirm your suspicion. Open several aixterm windows and run the ping -f command in each window. Your Ethernet utilization quickly gets to around 100 percent. The following is an example:
# date ; ping -c 1000 -f wave ; date
Fri Jul 23 11:52:39 CDT 1999
PING wave.austin.ibm.com: (9.53.153.120): 56 data bytes
.
----wave.austin.ibm.com PING Statistics----
1000 packets transmitted, 1000 packets received, 0% packet loss
round-trip min/avg/max = 1/1/23 ms
Fri Jul 23 11:52:42 CDT 1999
Note: This command can be very hard on a network and should be used with caution. Flood-pinging can only be performed by the root user.
In this example, 1000 packets were sent in 3 seconds. Be aware that this command uses only IP and the Internet Control Message Protocol (ICMP); no transport protocol (UDP/TCP) or application activity is involved. The measured data, such as round-trip time, therefore does not reflect the total performance characteristics.
When you try to send a flood of packets to your destination, consider several points:
You can use the ftp command to send a very large file by using /dev/zero as input and /dev/null as output. This allows you to transfer a large file without involving disks (which might be a bottleneck) and without having to cache the entire file in memory.
Use the following ftp subcommands (change count to increase or decrease the number of blocks read by the dd command):
> bin
> put "|dd if=/dev/zero bs=32k count=10000" /dev/null
Remember, if you change the TCP send or receive space parameters, then for the ftp command, you must refresh the inetd daemon with the refresh -s inetd command.
Make sure that tcp_sendspace and tcp_recvspace are at least 65535 for Gigabit Ethernet "jumbo frames" and for ATM with an MTU of 9180 or larger, in order to get good performance from the larger MTU size.
An example to set the parameters is as follows:
# no -o tcp_sendspace=65535
# no -o tcp_recvspace=65535
# refresh -s inetd
0513-095 The request for subsystem refresh was completed successfully.
The ftp subcommands are as follows:
ftp> bin
200 Type set to I.
ftp> put "|dd if=/dev/zero bs=32k count=10000" /dev/null
200 PORT command successful.
150 Opening data connection for /dev/null.
10000+0 records in
10000+0 records out
226 Transfer complete.
327680000 bytes sent in 8.932 seconds (3.583e+04 Kbytes/s)
local: |dd if=/dev/zero bs=32k count=10000 remote: /dev/null
ftp> quit
221 Goodbye.
The netstat command is used to show network status. Traditionally, it is used more for problem determination than for performance measurement. However, the netstat command can be used to determine the amount of traffic on the network to ascertain whether performance problems are due to network congestion.
The netstat command displays information regarding traffic on the configured network interfaces, such as the following:
The netstat command displays the contents of various network-related data structures for active connections. In this chapter, only the options and output fields that are relevant for network performance determinations are discussed. For all other options and columns, see the AIX 5L Version 5.1 Commands Reference.
The netstat -i command shows the state of all configured interfaces.
The following example shows the statistics for a workstation with an integrated Ethernet and a Token-Ring adapter:
# netstat -i
Name  Mtu    Network  Address           Ipkts   Ierrs  Opkts   Oerrs  Coll
lo0   16896  <Link>                     144834  0      144946  0      0
lo0   16896  127      localhost         144834  0      144946  0      0
tr0   1492   <Link>   10.0.5a.4f.3f.61  658339  0      247355  0      0
tr0   1492   9.3.1    ah6000d           658339  0      247355  0      0
en0   1500   <Link>   8.0.5a.d.a2.d5    0       0      112     0      0
en0   1500   1.2.3    1.2.3.4           0       0      112     0      0
The count values are cumulative since system startup.
Note: The netstat -i command does not support the collision count for Ethernet interfaces (see The entstat Command for Ethernet statistics).
Following are some tuning guidelines:
If Ierrs > 0.01 x Ipkts
Then run the netstat -m command to check for a lack of memory.
If Oerrs > 0.01 x Opkts
Then increase the send queue size (xmt_que_size) for that interface. The size of xmt_que_size can be checked with the following command (see also the example following these guidelines):
# lsattr -El adapter
If Coll / Opkts > 0.1
Then there is a high network utilization, and a reorganization or partitioning may be necessary. Use the netstat -v or entstat command to determine the collision rate.
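For example, if Oerrs is increasing on the tr0 interface, you could check the current transmit queue size on the underlying adapter and then raise it. The adapter name tok0 and the value 150 are only illustrations, and the -P flag defers the change until the adapter is reconfigured or the system is rebooted:
# lsattr -El tok0 | grep xmt_que_size
# chdev -l tok0 -a xmt_que_size=150 -P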
This function of the netstat command clears all the statistic counters for the netstat -i command to zero.
The netstat -I interface interval command displays the statistics for the specified interface. It offers information similar to the netstat -i command for the specified interface and reports it for a given time interval. For example:
# netstat -I en0 1
    input   (en0)    output           input   (Total)    output
packets  errs  packets  errs  colls  packets  errs  packets  errs  colls
      0     0       27     0      0   799655     0   390669     0      0
      0     0        0     0      0        2     0        0     0      0
      0     0        0     0      0        1     0        0     0      0
      0     0        0     0      0       78     0      254     0      0
      0     0        0     0      0      200     0       62     0      0
      0     0        1     0      0        0     0        2     0      0
The previous example shows the netstat -I command output for the en0 interface. Two reports are generated side by side, one for the specified interface and one for all available interfaces (Total). The fields are similar to the ones in the netstat -i example; input packets = Ipkts, input errs = Ierrs, and so on.
Displays the statistics recorded by the mbuf memory-management routines. The most useful statistics in the output of the netstat -m command are the counters that show the requests for mbufs denied and non-zero values in the failed column. If the requests for mbufs denied is not displayed, then this must be an SMP system running operating system version 4.3.2 or later; for performance reasons, global statistics are turned off by default. To enable the global statistics, set the no parameter extended_netstats to 1. This can be done by changing the /etc/rc.net file and rebooting the system.
The following example shows the first part of the netstat -m output with extended_netstats set to 1:
# netstat -m
29 mbufs in use:
        16 mbuf cluster pages in use
        71 Kbytes allocated to mbufs
        0 requests for mbufs denied
        0 calls to protocol drain routines

Kernel malloc statistics:

******* CPU 0 *******
By size       inuse      calls  failed  delayed   free  hiwat  freed
32              419     544702       0        0    221    800      0
64              173      22424       0        0     19    400      0
128             121      37130       0        0    135    200      4
256            1201  118326233       0        0    239    480    138
512             330     671524       0        0     14     50     54
1024             74     929806       0        0     82    125      2
2048            384    1820884       0        0      8    125   5605
4096            516    1158445       0        0     46    150     21
8192              9       5634       0        0      1     12     27
16384             1       2953       0        0     24     30     41
32768             1          1       0        0      0   1023      0

By type       inuse      calls  failed  delayed  memuse  memmax  mapb

Streams mblk statistic failures:
0 high priority mblk failures
0 medium priority mblk failures
0 low priority mblk failures
If global statistics are not on and you want to determine the total number of requests for mbufs denied, add up the values under the failed columns for each CPU. If the netstat -m command indicates that requests for mbufs or clusters have failed or been denied, then you may want to increase the value of thewall by using the no -o thewall=NewValue command. See Overview of the mbuf Management Facility for additional details about the use of thewall and maxmbuf.
Beginning with AIX 4.3.3, a delayed column was added. If the requester of an mbuf specified the M_WAIT flag, then if an mbuf was not available, the thread is put to sleep until an mbuf is freed and can be used by this thread. The failed counter is not incremented in this case; instead, the delayed column will be incremented. Prior to AIX 4.3.3, the failed counter was also not incremented, but there was no delayed column.
Also, if the currently allocated amount of network memory is within 85 percent of thewall, you may want to increase thewall. If the value of thewall is increased, use the vmstat command to monitor total memory use to determine if the increase has had a negative impact on overall memory performance.
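For example, you could display the current value of thewall, check the current allocation with the netstat -m command, and then raise thewall. The value 131072 (in KB) is only an illustration, and the change lasts only until the next reboot unless the command is added to /etc/rc.net:
# no -o thewall
# netstat -m | head
# no -o thewall=131072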
If buffers are not available when a request is received, the request is most likely lost (to see whether the adapter actually dropped a packet, see Adapter Statistics). Keep in mind that if the requester of the mbuf specified that it could wait for the mbuf if not available immediately, this puts the requester to sleep but does not count as a request being denied.
If the number of failed requests continues to increase, the system might have an mbuf leak. To help track down the problem, the no command parameter net_malloc_police can be set to 1, and the trace hook with ID 254 can be used with the trace command.
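A possible sequence for investigating a suspected mbuf leak might look like the following sketch; the file names are only placeholders, and tracing should be limited to a short, controlled interval because of its overhead:
# no -o net_malloc_police=1
# trace -a -j 254 -o /tmp/net254.tr
(recreate the suspect workload, then stop tracing)
# trcstop
# trcrpt -d 254 /tmp/net254.tr > /tmp/net254.rpt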
After an mbuf/cluster is allocated and pinned, it can be freed by the application. Instead of unpinning this buffer and giving it back to the system, it is left on a free-list based on the size of this buffer. The next time that a buffer is requested, it can be taken off this free-list to avoid the overhead of pinning. After the number of buffers on the free list reaches the highwater mark, buffers smaller than 4096 will be coalesced together into page-sized units so that they can be unpinned and given back to the system. When the buffers are given back to the system, the freed column is incremented. If the freed value consistently increases, the highwater mark is too low. In AIX 4.3.2 and later, the highwater mark is scaled according to the amount of RAM on the system.
The netstat -v command displays the statistics for each Common Data Link Interface (CDLI)-based device driver that is in operation. Interface-specific reports can be requested using the tokstat, entstat, fddistat, or atmstat commands.
Every interface has its own specific information and some general information. The following example shows the Token-Ring and Ethernet part of the netstat -v command; other interface parts are similar. With a different adapter, the statistics will differ somewhat. The most important output fields are highlighted.
# netstat -v
-------------------------------------------------------------
ETHERNET STATISTICS (ent0) :
Device Type: IBM 10/100 Mbps Ethernet PCI Adapter (23100020)
Hardware Address: 00:60:94:e9:29:18
Elapsed Time: 9 days 19 hours 5 minutes 51 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 0                                    Packets: 0
Bytes: 0                                      Bytes: 0
Interrupts: 0                                 Interrupts: 0
Transmit Errors: 0                            Receive Errors: 0
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 0
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 0                          Broadcast Packets: 0
Multicast Packets: 0                          Multicast Packets: 0
No Carrier Sense: 0                           CRC Errors: 0
DMA Underrun: 0                               DMA Overrun: 0
Lost CTS Errors: 0                            Alignment Errors: 0
Max Collision Errors: 0                       No Resource Errors: 0
Late Collision Errors: 0                      Receive Collision Errors: 0
Deferred: 0                                   Packet Too Short Errors: 0
SQE Test: 0                                   Packet Too Long Errors: 0
Timeout Errors: 0                             Packets Discarded by Adapter: 0
Single Collision Count: 0                     Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport PrivateSegment

IBM 10/100 Mbps Ethernet PCI Adapter Specific Statistics:
------------------------------------------------
Chip Version: 25
RJ45 Port Link Status : down
Media Speed Selected: 10 Mbps Half Duplex
Media Speed Running: Unknown
Receive Pool Buffer Size: 384
Free Receive Pool Buffers: 128
No Receive Pool Buffer Errors: 0
Inter Packet Gap: 96
Adapter Restarts due to IOCTL commands: 0
Packets with Transmit collisions:
 1 collisions: 0           6 collisions: 0          11 collisions: 0
 2 collisions: 0           7 collisions: 0          12 collisions: 0
 3 collisions: 0           8 collisions: 0          13 collisions: 0
 4 collisions: 0           9 collisions: 0          14 collisions: 0
 5 collisions: 0          10 collisions: 0          15 collisions: 0
Excessive deferral errors: 0x0
-------------------------------------------------------------
TOKEN-RING STATISTICS (tok0) :
Device Type: IBM PCI Tokenring Adapter (14103e00)
Hardware Address: 00:20:35:7a:12:8a
Elapsed Time: 29 days 18 hours 3 minutes 47 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 1355364                              Packets: 55782254
Bytes: 791555422                              Bytes: 6679991641
Interrupts: 902315                            Interrupts: 55782192
Transmit Errors: 0                            Receive Errors: 1
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 182
S/W Transmit Queue Overflow: 42
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 18878                      Broadcast Packets: 54615793
Multicast Packets: 0                          Multicast Packets: 569
Timeout Errors: 0                             Receive Congestion Errors: 0
Current SW Transmit Queue Length: 0
Current HW Transmit Queue Length: 0

General Statistics:
-------------------
No mbuf Errors: 0                             Lobe Wire Faults: 0
Abort Errors: 12                              AC Errors: 0
Burst Errors: 1                               Frame Copy Errors: 0
Frequency Errors: 0                           Hard Errors: 0
Internal Errors: 0                            Line Errors: 0
Lost Frame Errors: 0                          Only Station: 1
Token Errors: 0                               Remove Received: 0
Ring Recovered: 17                            Signal Loss Errors: 0
Soft Errors: 35                               Transmit Beacon Errors: 0
Driver Flags: Up Broadcast Running
        AlternateAddress 64BitSupport ReceiveFunctionalAddr
        16 Mbps

IBM PCI Tokenring Adapter (14103e00) Specific Statistics:
---------------------------------------------------------
Media Speed Running: 16 Mbps Half Duplex
Media Speed Selected: 16 Mbps Full Duplex
Receive Overruns : 0
Transmit Underruns : 0
ARI/FCI errors : 0
Microcode level on the adapter :001PX11B2
Num pkts in priority sw tx queue : 0
Num pkts in priority hw tx queue : 0
Open Firmware Level : 001PXRS02
The highlighted fields are described as follows:
Transmit and Receive Errors: Number of output/input errors encountered on this device. This field counts unsuccessful transmissions due to hardware/network errors.
These unsuccessful transmissions could also slow down the performance of the system.
Max Packets on S/W Transmit Queue: Maximum number of outgoing packets ever queued to the software transmit queue.
An indication of an inadequate queue size is when the maximum number of packets queued equals the current queue size (xmt_que_size). This indicates that the queue was full at some point.
To check the current size of the queue, use the lsattr -El adapter command (where adapter is, for example, tok0 or ent0). Because the queue is associated with the device driver and adapter for the interface, use the adapter name, not the interface name. Use the SMIT or the chdev command to change the queue size.
S/W Transmit Queue Overflow: Number of outgoing packets that have overflowed the software transmit queue. A value other than zero requires the same actions as would be needed if Max Packets on S/W Transmit Queue reaches the xmt_que_size. The transmit queue size must be increased.
Broadcast Packets: Number of broadcast packets received without any error.
If the value for broadcast packets is high, compare it with the total received packets. The received broadcast packets should be less than 20 percent of the total received packets. If the ratio is high, it could be an indication of a high network load; consider using multicasting. The use of IP multicasting enables a message to be transmitted to a group of hosts, instead of having to address and send the message to each group member individually.
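Applying this guideline to the tok0 receive statistics in the netstat -v example above: 54615793 broadcast packets out of 55782254 received packets is roughly 98 percent, which would indicate a very heavy broadcast load on that ring.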
The DMA Overrun statistic is incremented when the adapter is using DMA to put a packet into system memory and the transfer is not completed. There are system buffers available for the packet to be placed into, but the DMA operation failed to complete. This occurs when the MCA bus is too busy for the adapter to be able to use DMA for the packets. The location of the adapter on the bus is crucial in a heavily loaded system. Typically an adapter in a lower slot number on the bus, by having the higher bus priority, is using so much of the bus that adapters in higher slot numbers are not being served. This is particularly true if the adapters in a lower slot number are ATM or SSA adapters.
Max Collision Errors: Number of unsuccessful transmissions due to too many collisions. The number of collisions encountered exceeded the number of retries on the adapter.
Late Collision Errors: Number of unsuccessful transmissions due to the late collision error.
Timeout Errors: Number of unsuccessful transmissions due to adapter reported timeout errors.
Single Collision Count: Number of outgoing packets with single (only one) collision encountered during transmission.
Multiple Collision Count: Number of outgoing packets with multiple (2 - 15) collisions encountered during transmission.
Receive Collision Errors: Number of incoming packets with collision errors during reception.
No mbuf Errors: Number of times that mbufs were not available to the device driver. This usually occurs during receive operations when the driver must obtain memory buffers to process inbound packets. If the mbuf pool for the requested size is empty, the packet will be discarded. Use the netstat -m command to confirm this, and increase the parameter thewall.
The No mbuf Errors value is interface-specific and not identical to the requests for mbufs denied from the netstat -m output. Compare the values of the example for the commands netstat -m and netstat -v (Ethernet and Token-Ring part).
To determine network performance problems, check for any Error counts in the netstat -v output.
Additional guidelines:
(Max Collision Errors + Timeout Errors) / Transmit Packets
If the result is greater than 5 percent, reorganize the network to balance the load.
If the total number of collisions from the netstat -v output (for Ethernet) is greater than 10 percent of the total transmitted packets, as follows:
Number of collisions / Number of Transmit Packets > 0.1
The netstat -p protocol command shows statistics about the value specified for the protocol variable (udp, tcp, ip, icmp), which is either a well-known name for a protocol or an alias for it. Some protocol names and aliases are listed in the /etc/protocols file. A null response indicates that there are no numbers to report. If there is no statistics routine for the specified protocol, the program reports that the value specified for the protocol variable is unknown.
The following example shows the output for the ip protocol:
# netstat -p ip
ip:
        491351 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with header length < data size
        0 with data length < header length
        0 with bad options
        0 with incorrect version number
        25930 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        12965 packets reassembled ok
        475054 packets for this host
        0 packets for unknown/unsupported protocol
        0 packets forwarded
        3332 packets not forwardable
        0 redirects sent
        405650 packets sent from this host
        0 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        0 output packets discarded due to no route
        5498 output datagrams fragmented
        10996 fragments created
        0 datagrams that can't be fragmented
        0 IP Multicast packets dropped due to no receiver
        0 ipintrq overflows
The highlighted fields are described as follows:
total packets received: Number of total IP datagrams received.
If the output shows bad header checksum or fragments dropped due to dup or out of space, this indicates either a network that is corrupting packets or device driver receive queues that are not large enough.
fragments received: Number of total fragments received.
If the value of fragments dropped after timeout is other than zero, then the time-to-live counter of the IP fragments expired, due to a busy network, before all fragments of the datagram arrived. To avoid this, use the no command to increase the value of the ipfragttl network parameter. Another reason could be a lack of mbufs; increase thewall.
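For example, to display the current value of ipfragttl and then increase it (the new value shown is only an illustration; choose a value appropriate for your network):
# no -o ipfragttl
# no -o ipfragttl=120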
packets sent from this host: Number of IP datagrams that were created and sent out from this system. This counter does not include the forwarded datagrams (passthrough traffic).
fragments created: Number of fragments created in this system when IP datagrams were sent out.
When viewing IP statistics, look at the ratio of packets received to fragments received. As a guideline for small MTU networks, if 10 percent or more of the packets are getting fragmented, you should investigate further to determine the cause. A large number of fragments indicates that protocols above the IP layer on remote hosts are passing data to IP with data sizes larger than the MTU for the interface. Gateways/routers in the network path might also have a much smaller MTU size than the other nodes in the network. The same logic can be applied to packets sent and fragments created.
Fragmentation results in additional CPU overhead so it is important to determine its cause. Be aware that some applications, by their very nature, can cause fragmentation to occur. For example, an application that sends small amounts of data can cause fragments to occur. However, if you know the application is sending large amounts of data and fragmentation is still occurring, determine the cause. It is likely that the MTU size used is not the MTU size configured on the systems.
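As a quick check against the sample netstat -p ip output above: 25930 fragments received out of 491351 total packets received is about 5 percent on the receive side, and 10996 fragments created out of 405650 packets sent from this host is under 3 percent, so neither direction exceeds the 10 percent guideline.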
The following example shows the output for the udp protocol:
# netstat -p udp
udp:
        11521194 datagrams received
        0 incomplete headers
        0 bad data length fields
        0 bad checksums
        16532 dropped due to no socket
        232850 broadcast/multicast datagrams dropped due to no socket
        77 socket buffer overflows
        11271735 delivered
        796547 datagrams output
Statistics of interest are:
Bad checksums could happen due to hardware card or cable failure.
dropped due to no socket: Number of received UDP datagrams for which the destination socket port was not open. As a result, the ICMP Destination Unreachable - Port Unreachable message must have been sent out. However, if the received UDP datagrams were broadcast datagrams, ICMP errors are not generated. If this value is high, investigate how the application is handling sockets.
Socket buffer overflows could be due to insufficient transmit and receive UDP sockets, too few nfsd daemons, or too small nfs_socketsize, udp_recvspace and sb_max values.
If the netstat -p udp command indicates socket overflows, then you might need to increase the number of the nfsd daemons on the server. First, check the affected system for CPU or I/O saturation, and verify the recommended setting for the other communication layers by using the no -a command. If the system is saturated, you must either reduce its load or increase its resources.
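If the server really is overflowing UDP socket buffers for NFS, a possible starting point might look like the following sketch; the values are only illustrations, must be adjusted to the workload, and nfs_socketsize must stay below sb_max:
# no -o sb_max=262144
# nfso -o nfs_socketsize=131072
# chnfs -n 16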
The following example shows the output for the tcp protocol:
# netstat -p tcp
tcp:
        63726 packets sent
                34309 data packets (6482122 bytes)
                198 data packets (161034 bytes) retransmitted
                17437 ack-only packets (7882 delayed)
                0 URG only packets
                0 window probe packets
                3562 window update packets
                8220 control packets
        71033 packets received
                35989 acks (for 6444054 bytes)
                2769 duplicate acks
                0 acks for unsent data
                47319 packets (19650209 bytes) received in-sequence
                182 completely duplicate packets (29772 bytes)
                4 packets with some dup. data (1404 bytes duped)
                2475 out-of-order packets (49826 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                800 window update packets
                77 packets received after close
                0 packets with bad hardware assisted checksum
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 connection request
        3125 connection requests
        1626 connection accepts
        4731 connections established (including accepts)
        5543 connections closed (including 31 drops)
        62 embryonic connections dropped
        38552 segments updated rtt (of 38862 attempts)
        0 resends due to path MTU discovery
        3 path MTU discovery terminations due to retransmits
        553 retransmit timeouts
                28 connections dropped by rexmit timeout
        0 persist timeouts
        464 keepalive timeouts
                26 keepalive probes sent
                1 connection dropped by keepalive
        0 connections in timewait reused
        0 delayed ACKs for SYN
        0 delayed ACKs for FIN
        0 send_and_disconnects
Statistics of interest are:
For the TCP statistics, compare the number of packets sent to the number of data packets retransmitted. If the number of packets retransmitted is over 10-15 percent of the total packets sent, TCP is experiencing timeouts indicating that network traffic may be too high for acknowledgments (ACKs) to return before a timeout. A bottleneck on the receiving node or general network problems can also cause TCP retransmissions, which will increase network traffic, further adding to any network performance problems.
Also, compare the number of packets received with the number of completely duplicate packets. If TCP on a sending node times out before an ACK is received from the receiving node, it will retransmit the packet. Duplicate packets occur when the receiving node eventually receives all the retransmitted packets. If the number of duplicate packets exceeds 10-15 percent, the problem may again be too much network traffic or a bottleneck at the receiving node. Duplicate packets increase network traffic.
The value for retransmit timeouts occurs when TCP sends a packet but does not receive an ACK in time. It then resends the packet. This value is incremented for any subsequent retransmittals. These continuous retransmittals drive CPU utilization higher, and if the receiving node does not receive the packet, it eventually will be dropped.
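Applying these guidelines to the sample netstat -p tcp output above: 198 retransmitted data packets out of 34309 data packets sent is about 0.6 percent, and 182 completely duplicate packets out of 71033 packets received is about 0.3 percent, both far below the 10-15 percent level that would indicate a problem.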
The netstat -s command shows statistics for each protocol (while the netstat -p command shows the statistics for the specified protocol).
The undocumented -s -s option shows only those lines of the netstat -s output that are not zero, making it easier to look for error counts.
This is an undocumented function of the netstat command. It clears all the statistic counters for the netstat -s command to zero.
Another option relevant to performance is the display of the discovered Path Maximum Transmission Unit (PMTU).
For two hosts communicating across a path of multiple networks, a transmitted packet will become fragmented if its size is greater than the smallest MTU of any network in the path. Because packet fragmentation can result in reduced network performance, it is desirable to avoid fragmentation by transmitting packets with a size no larger than the smallest MTU in the network path. This size is called the path MTU.
Use the netstat -r command to display this value. In the following example, the netstat -r -f inet command is used to display only the routing tables:
# netstat -r -f inet
Routing tables
Destination       Gateway            Flags  Refs    Use   PMTU  If   Exp  Groups

Route Tree for Protocol Family 2:
default           itsorusi           UGc       1    348      -  tr0  -
9.3.1             sv2019e            Uc       25  12504      -  tr0  -
itsonv            sv2019e            UHW       0    235      -  tr0  -
itsorusi          sv2019e            UHW       1    883   1492  tr0  -
ah6000d           sv2019e            UHW       1    184   1492  tr0  -
ah6000e           sv2019e            UHW       0    209      -  tr0  -
sv2019e           sv2019e            UHW       4  11718   1492  tr0  -
coyote.ncs.mainz  itsorusi           UGHW      1     45   1492  tr0  -
kresna.id.ibm.co  itsorusi           UGHW      0     14   1492  tr0  -
9.184.104.111     kresna.id.ibm.com  UGc       0      5      -  tr0  -
127               localhost          U         3     96      -  lo0  -
The -D option allows you to see packets coming into and going out of each layer in the communications subsystem, along with packets dropped at each layer.
# netstat -D

Source                         Ipkts     Opkts     Idrops    Odrops
-------------------------------------------------------------------------------
tok_dev0                       19333058  402225    3         0
ent_dev0                       0         0         0         0
                ---------------------------------------------------------------
Devices Total                  19333058  402225    3         0
-------------------------------------------------------------------------------
tok_dd0                        19333055  402225    0         0
ent_dd0                        0         0         0         0
                ---------------------------------------------------------------
Drivers Total                  19333055  402225    0         0
-------------------------------------------------------------------------------
tok_dmx0                       796966    N/A       18536091  N/A
ent_dmx0                       0         N/A       0         N/A
                ---------------------------------------------------------------
Demuxer Total                  796966    N/A       18536091  N/A
-------------------------------------------------------------------------------
IP                             694138    677411    7651      6523
TCP                            143713    144247    0         0
UDP                            469962    266726    0         812
                ---------------------------------------------------------------
Protocols Total                1307813   1088384   7651      7335
-------------------------------------------------------------------------------
lo_if0                         22088     22887     799       0
tr_if0                         796966    402227    0         289
                ---------------------------------------------------------------
Net IF Total                   819054    425114    799       289
-------------------------------------------------------------------------------
                ---------------------------------------------------------------
NFS/RPC Total                  N/A       1461      0         0
-------------------------------------------------------------------------------
(Note: N/A -> Not Applicable)
The Devices layer shows the number of packets coming into the adapter, the number going out of the adapter, and the number of packets dropped on input and output. There are various causes of adapter errors, and the netstat -v command can be examined for more details.
The Drivers layer shows packet counts handled by the device driver for each adapter. Output of the netstat -v command is useful here to determine which errors are counted.
The Demuxer values show packet counts at the demux layer, and Idrops here usually indicate that filtering has caused packets to be rejected (for example, Netware or DecNet packets being rejected because these are not handled by the system under examination).
Details for the Protocols layer can be seen in the output of the netstat -s command.
Note: In the statistics output, an N/A displayed in a field value indicates the count is not applicable. For the NFS/RPC statistics, the number of incoming packets that pass through RPC are the same packets which pass through NFS, so these numbers are not summed in the NFS/RPC Total field, hence the N/A. NFS has no outgoing packet or outgoing packet drop counters specific to NFS and RPC. Therefore, individual counts have a field value of N/A, and the cumulative count is stored in the NFS/RPC Total field.
The netpmon command uses the trace facility to obtain a detailed picture of network activity during a time interval. Because it uses the trace facility, the netpmon command can be run only by a root user or by a member of the system group.
Also, the netpmon command cannot run together with any of the other trace-based performance commands such as tprof and filemon. In its usual mode, the netpmon command runs in the background while one or more application programs or system commands are being executed and monitored.
The netpmon command focuses on the following system activities:
The following will be computed:
To determine whether the netpmon command is installed and available, run the following command:
# lslpp -lI perfagent.tools
Tracing is started by the netpmon command, optionally suspended with the trcoff subcommand and resumed with the trcon subcommand, and terminated with the trcstop subcommand. As soon as tracing is terminated, the netpmon command writes its report to standard output.
The netpmon command will start tracing immediately unless the -d option is used. Use the trcstop command to stop tracing. At that time, all the specified reports are generated, and the netpmon command exits. In the client-server environment, use the netpmon command to view how networking affects the overall performance. It can be run on both client and server.
The netpmon command can read the I/O trace data from a specified file, instead of from the real-time trace process. In this case, the netpmon report summarizes the network activity for the system and period represented by the trace file. This offline processing method is useful when it is necessary to postprocess a trace file from a remote machine or perform the trace data collection at one time and postprocess it at another time.
The trcrpt -r command must be executed on the trace logfile and redirected to another file, as follows:
# gennames > gennames.out
# trcrpt -r trace.out > trace.rpt
At this point, an adjusted trace logfile is fed into the netpmon command to report on I/O activity captured by a previously recorded trace session as follows:
# netpmon -i trace.rpt -n gennames.out | pg
In this example, the netpmon command reads file system trace events from the trace.rpt input file. Because the trace data is already captured on a file, the netpmon command does not put itself in the background to allow application programs to be run. After the entire file is read, a network activity report will be displayed on standard output (which, in this example, is piped to the pg command).
If the trace command was run with the -C all flag, then run the trcrpt command also with the -C all flag (see Formatting a Report from trace -C Output).
The following netpmon command running on an NFS server executes the sleep command and creates a report after 400 seconds. During the measured interval, a copy to an NFS-mounted file system /nfs_mnt is taking place.
# netpmon -o netpmon.out -O all; sleep 400; trcstop
With the -O option, you can specify the report type to be generated. Valid report type values are:
# cat netpmon.out
Thu Jan 21 15:02:45 2000
System: AIX itsosmp Node: 4 Machine: 00045067A000
401.053 secs in measured interval
========================================================================
Process CPU Usage Statistics:
-----------------------------
                                                   Network
Process (top 20)             PID  CPU Time   CPU %   CPU %
----------------------------------------------------------
nfsd                       12370   42.2210   2.632   2.632
nfsd                       12628   42.0056   2.618   2.618
nfsd                       13144   41.9540   2.615   2.615
nfsd                       12886   41.8680   2.610   2.610
nfsd                       12112   41.4114   2.581   2.581
nfsd                       11078   40.9443   2.552   2.552
nfsd                       11854   40.6198   2.532   2.532
nfsd                       13402   40.3445   2.515   2.515
lrud                        1548   16.6294   1.037   0.000
netpmon                    15218    5.2780   0.329   0.000
gil                         2064    2.0766   0.129   0.129
trace                      18284    1.8820   0.117   0.000
syncd                       3602    0.3757   0.023   0.000
swapper                        0    0.2718   0.017   0.000
init                           1    0.2201   0.014   0.000
afsd                        8758    0.0244   0.002   0.000
bootpd                      7128    0.0220   0.001   0.000
ksh                         4322    0.0213   0.001   0.000
pcimapsvr.ip               16844    0.0204   0.001   0.000
netm                        1806    0.0186   0.001   0.001
----------------------------------------------------------
Total (all processes)             358.3152  22.336  20.787
Idle time                        1221.0235  76.114
========================================================================
First Level Interrupt Handler CPU Usage Statistics:
---------------------------------------------------
                                                   Network
FLIH                              CPU Time   CPU %   CPU %
----------------------------------------------------------
PPC decrementer                     9.9419   0.620   0.000
external device                     4.5849   0.286   0.099
UNKNOWN                             0.1716   0.011   0.000
data page fault                     0.1080   0.007   0.000
floating point                      0.0012   0.000   0.000
instruction page fault              0.0007   0.000   0.000
----------------------------------------------------------
Total (all FLIHs)                  14.8083   0.923   0.099
========================================================================
Second Level Interrupt Handler CPU Usage Statistics:
----------------------------------------------------
                                                   Network
SLIH                              CPU Time   CPU %   CPU %
----------------------------------------------------------
tokdd                              12.4312   0.775   0.775
ascsiddpin                          0.5178   0.032   0.000
----------------------------------------------------------
Total (all SLIHs)                  12.9490   0.807   0.775
========================================================================
Network Device-Driver Statistics (by Device):
---------------------------------------------
                        ----------- Xmit -----------    -------- Recv ---------
Device                   Pkts/s  Bytes/s  Util   QLen   Pkts/s  Bytes/s   Demux
------------------------------------------------------------------------------
token ring 0              31.61     4800  1.7%  0.046   200.93   273994  0.0080
========================================================================
Network Device-Driver Transmit Statistics (by Destination Host):
----------------------------------------------------------------
Host                      Pkts/s  Bytes/s
----------------------------------------
ah6000c                    31.57     4796
9.3.1.255                   0.03        4
itsorusi                    0.00        0
========================================================================
TCP Socket Call Statistics (by Process):
----------------------------------------
                                   ------ Read -----   ----- Write -----
Process (top 20)             PID   Calls/s   Bytes/s   Calls/s   Bytes/s
------------------------------------------------------------------------
telnetd                    18144      0.03       123      0.06         0
------------------------------------------------------------------------
Total (all processes)                 0.03       123      0.06         0
========================================================================
NFS Server Statistics (by Client):
----------------------------------
                                   ------ Read -----   ----- Write -----     Other
Client                             Calls/s   Bytes/s   Calls/s   Bytes/s   Calls/s
------------------------------------------------------------------------
ah6000c                               0.00         0     31.54    258208      0.01
------------------------------------------------------------------------
Total (all clients)                   0.00         0     31.54    258208      0.01
========================================================================
Detailed Second Level Interrupt Handler CPU Usage Statistics:
-------------------------------------------------------------
SLIH: tokdd
count:                  93039
  cpu time (msec):      avg 0.134   min 0.026   max 0.541   sdev 0.051

SLIH: ascsiddpin
count:                  8136
  cpu time (msec):      avg 0.064   min 0.012   max 0.147   sdev 0.018

COMBINED (All SLIHs)
count:                  101175
  cpu time (msec):      avg 0.128   min 0.012   max 0.541   sdev 0.053
========================================================================
Detailed Network Device-Driver Statistics:
------------------------------------------
DEVICE: token ring 0
recv packets:           80584
  recv sizes (bytes):   avg 1363.6  min 50      max 1520    sdev 356.3
  recv times (msec):    avg 0.081   min 0.010   max 0.166   sdev 0.020
  demux times (msec):   avg 0.040   min 0.008   max 0.375   sdev 0.040
xmit packets:           12678
  xmit sizes (bytes):   avg 151.8   min 52      max 184     sdev 3.3
  xmit times (msec):    avg 1.447   min 0.509   max 4.514   sdev 0.374
========================================================================
Detailed Network Device-Driver Transmit Statistics (by Host):
-------------------------------------------------------------
HOST: ah6000c
xmit packets:           12662
  xmit sizes (bytes):   avg 151.9   min 52      max 184     sdev 2.9
  xmit times (msec):    avg 1.448   min 0.509   max 4.514   sdev 0.373

HOST: 9.3.1.255
xmit packets:           14
  xmit sizes (bytes):   avg 117.0   min 117     max 117     sdev 0.0
  xmit times (msec):    avg 1.133   min 0.884   max 1.730   sdev 0.253

HOST: itsorusi
xmit packets:           1
  xmit sizes (bytes):   avg 84.0    min 84      max 84      sdev 0.0
  xmit times (msec):    avg 0.522   min 0.522   max 0.522   sdev 0.000
========================================================================
Detailed TCP Socket Call Statistics (by Process):
-------------------------------------------------
PROCESS: telnetd   PID: 18144
reads:                  12
  read sizes (bytes):   avg 4096.0  min 4096    max 4096    sdev 0.0
  read times (msec):    avg 0.085   min 0.053   max 0.164   sdev 0.027
writes:                 23
  write sizes (bytes):  avg 3.5     min 1       max 26      sdev 7.0
  write times (msec):   avg 0.143   min 0.067   max 0.269   sdev 0.064

PROTOCOL: TCP (All Processes)
reads:                  12
  read sizes (bytes):   avg 4096.0  min 4096    max 4096    sdev 0.0
  read times (msec):    avg 0.085   min 0.053   max 0.164   sdev 0.027
writes:                 23
  write sizes (bytes):  avg 3.5     min 1       max 26      sdev 7.0
  write times (msec):   avg 0.143   min 0.067   max 0.269   sdev 0.064
========================================================================
Detailed NFS Server Statistics (by Client):
-------------------------------------------
CLIENT: ah6000c
writes:                 12648
  write sizes (bytes):  avg 8187.5   min 4096   max 8192     sdev 136.2
  write times (msec):   avg 138.646  min 0.147  max 1802.067 sdev 58.853
other calls:            5
  other times (msec):   avg 1.928    min 0.371  max 8.065    sdev 3.068

COMBINED (All Clients)
writes:                 12648
  write sizes (bytes):  avg 8187.5   min 4096   max 8192     sdev 136.2
  write times (msec):   avg 138.646  min 0.147  max 1802.067 sdev 58.853
other calls:            5
  other times (msec):   avg 1.928    min 0.371  max 8.065    sdev 3.068
The output of the netpmon command is composed of two different types of reports: global and detailed. The global reports list statistics as follows:
The global reports are shown at the beginning of the netpmon output and are the occurrences during the measured interval. The detailed reports provide additional information for the global reports. By default, the reports are limited to the 20 most active statistics measured. All information in the reports is listed from top to bottom as most active to least active.
The reports generated by the netpmon command begin with a header, which identifies the date, the machine ID, and the length of the monitoring period in seconds. The header is followed by a set of global and detailed reports for all specified report types.
Each row describes the CPU usage associated with a process. Unless the verbose (-v) option is specified, only the 20 most active processes are included in the list. At the bottom of the report, CPU usage for all processes is totaled, and CPU idle time is reported. The idle time percentage number is calculated from the idle time divided by the measured interval. The difference between the CPU time totals and measured interval is due to Interrupt handlers.
The Network CPU % is the percentage of total time that this process spent executing network-related code.
If the -t flag is used, a thread CPU usage statistic is also present. Each process row described above is immediately followed by rows describing the CPU usage of each thread owned by that process. The fields in these rows are identical to those for the process, except for the name field. Threads are not named.
In the example report, the Idle time percentage number (76.114 percent) shown in the global CPU usage report is calculated from the Idle time (1221.0235) divided by the measured interval times 4 (401.053 times 4), because there are four CPUs in this server. If you want to look at each CPU's activity, you can use sar, ps, or any other SMP-specific command. Similar calculation applies to the total CPU % that is occupied by all processes. The Idle time is due to network I/O. The difference between the CPU Time totals (1221.0235 + 358.315) and the measured interval is due to interrupt handlers and the multiple CPUs. It appears that in the example report, the majority of the CPU usage was network-related: (20.787 / 22.336) = 93.07 percent. About 77.664 percent of CPU usage is either CPU idle or CPU wait time.
Note: If the result of total network CPU % divided by total CPU % is greater than 0.5 from Process CPU Usage Statistics for NFS server, then the majority of CPU usage is network-related.
This method is also a good way to view CPU usage by process without tying the output to a specific program.
Each row describes the CPU usage associated with a first-level interrupt handler (FLIH). At the bottom of the report, CPU usage for all FLIHs is totaled.
Each row describes the CPU usage associated with a second-level interrupt handler (SLIH). At the bottom of the report, CPU usage for all SLIHs is totaled.
Each row describes the statistics associated with a network device.
In this example, the Xmit QLen is only 0.046. This number is very small compared to its default size (30). Its Recv Bytes/s is 273994, much smaller than the Token-Ring transmit speed (16 Mb/s). Therefore, in this case, the network is not saturated, at least from this system's view.
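As a rough check, 273994 receive bytes per second is about 2.2 Mb per second (273994 x 8 bits), or roughly 14 percent of the 16 Mb per second Token-Ring media speed, which supports the conclusion that the network is not saturated.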
Each row describes the amount of transmit traffic associated with a particular destination host, at the device-driver level.
These statistics are shown for each used Internet protocol. Each row describes the amount of read() and write() subroutine activity on sockets of this protocol type associated with a particular process. At the bottom of the report, all socket calls for this protocol are totaled.
Each row describes the amount of NFS activity handled by this server on behalf of a particular client. At the bottom of the report, calls for all clients are totaled.
On a client machine, the NFS server statistics are replaced by the NFS client statistics (NFS Client Statistics for each Server (by File), NFS Client RPC Statistics (by Server), NFS Client Statistics (by Process)).
Detailed reports are generated for all requested (-O) report types. For these report types, a detailed report is produced in addition to the global reports. The detailed reports contain an entry for each entry in the global reports with statistics for each type of transaction associated with the entry.
Transaction statistics consist of a count of the number of transactions for that type, followed by response time and size distribution data (where applicable). The distribution data consists of average, minimum, and maximum values, as well as standard deviations. Roughly two-thirds of the values are between average minus standard deviation and average plus standard deviation. Sizes are reported in bytes. Response times are reported in milliseconds.
The output fields are described as follows:
The output fields are described as follows:
There are other detailed reports, such as Detailed Network Device-Driver Transmit Statistics (by Host) and Detailed TCP Socket Call Statistics for Each Internet Protocol (by Process). For an NFS client, there are the Detailed NFS Client Statistics for Each Server (by File), Detailed NFS Client RPC Statistics (by Server), and Detailed NFS Client Statistics (by Process) reports. For an NFS server, there is the Detailed NFS Server Statistics (by Client) report. They have similar output fields as explained above.
In the example, the results from the Detailed Network Device-Driver Statistics lead to the following:
As in the global device-driver report, you can conclude that this case is not network-saturated. The average receive size is 1363.6 bytes, close to the default MTU (maximum transmission unit) value, which is 1492 when the device is a Token-Ring card. If this value is larger than the MTU (from lsattr -E -l interface, replacing interface with the interface name, such as en0 or tr0), you could change the MTU or the adapter transmit-queue length value to get better performance with one of the following commands:
# ifconfig tr0 mtu 8500
OR
# chdev -l 'tok0' -a xmt_que_size='150'
If the network is congested already, changing the MTU or queue value will not help.
Note:
- If transmit and receive packet sizes are small on the device driver statistics report, then increasing the current MTU size will probably result in better network performance.
- If system wait time due to network calls is high from the network wait time statistics for the NFS client report, the poor performance is due to the network.
The netpmon command uses the trace facility to collect the statistics. Therefore, it has an impact on the system workload, as follows.
To alleviate these situations, use offline processing, and on systems with many CPUs, use the -C all flag with the trace command.
While the ping command confirms IP network reachability, you cannot pinpoint and improve some isolated problems. Consider the following situation:
The traceroute command can inform you where the packet is located and why the route is lost. If your packets must pass through routers and links, which belong to and are managed by other organizations or companies, it is difficult to check the related routers through the telnet command. The traceroute command provides a supplemental role to the ping command.
Note: The traceroute command is intended for use in network testing, measurement, and management. It should be used primarily for manual fault isolation. Because of the load it imposes on the network, do not use the traceroute command during typical operations or from automated scripts.
The traceroute command uses UDP packets and uses the ICMP error-reporting function. It sends a UDP packet three times to each gateway or router on the way. It starts with the nearest gateway and expands the search by one hop. Finally, the search gets to the destination system. In the output, you see the gateway name, the gateway's IP address, and three round-trip times for the gateway. See the following example:
# traceroute wave
trying to get source for wave
source should be 9.53.155.187
traceroute to wave.austin.ibm.com (9.53.153.120) from 9.53.155.187 (9.53.155.187), 30 hops max
outgoing MTU = 1500
 1  9.111.154.1 (9.111.154.1)  5 ms  3 ms  2 ms
 2  wave (9.53.153.120)  5 ms  5 ms  5 ms
Following is another example:
# traceroute wave
trying to get source for wave
source should be 9.53.155.187
traceroute to wave.austin.ibm.com (9.53.153.120) from 9.53.155.187 (9.53.155.187), 30 hops max
outgoing MTU = 1500
 1  9.111.154.1 (9.111.154.1)  10 ms  2 ms  3 ms
 2  wave (9.53.153.120)  8 ms  7 ms  5 ms
After the address resolution protocol (ARP) entry expired, the same command was repeated. Note that the first packet to each gateway or destination took a longer round-trip time. This is due to the overhead caused by the ARP. If a public-switched network (WAN) is involved in the route, the first packet consumes a lot of time because of the connection establishment and may cause a timeout. The default timeout for each packet is 3 seconds. You can change it with the -w option.
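For example, to allow 6 seconds per probe instead of the default 3 seconds (wave is the same test system used above):
# traceroute -w 6 wave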
The first 10 ms is due to the ARP between the source system (9.53.155.187) and the gateway 9.111.154.1. The second 8 ms is due to the ARP between the gateway and the final destination (wave). In this case, you are using DNS, and every time before the traceroute command sends a packet, the DNS server is searched.
For a long path to your destination or complex network routes, you may see a lot of problems with the traceroute command. Because many things are implementation-dependent, searching for the problem may only waste your time. If all routers or systems involved are under your control, you may be able to investigate the problem completely.
In the following example, packets were sent from the system 9.53.155.187. There are two router systems on the way to the bridge. The routing capability was intentionally removed from the second router system by setting the option ipforwarding of the no command to 0. See the following example:
# traceroute lamar
trying to get source for lamar
source should be 9.53.155.187
traceroute to lamar.austin.ibm.com (9.3.200.141) from 9.53.155.187 (9.53.155.187), 30 hops max
outgoing MTU = 1500
 1  9.111.154.1 (9.111.154.1)  12 ms  3 ms  2 ms
 2  9.111.154.1 (9.111.154.1)  3 ms !H  *  6 ms !H
If an ICMP error message, excluding Time Exceeded and Port Unreachable, is received, it is displayed as follows:
When the destination system does not respond within a 3-second time-out interval, all queries are timed out, and the results are displayed with an asterisk (*).
# traceroute chuys
trying to get source for chuys
source should be 9.53.155.187
traceroute to chuys.austin.ibm.com (9.53.155.188) from 9.53.155.187 (9.53.155.187), 30 hops max
outgoing MTU = 1500
 1  * * *
 2  * * *
 3  * * *
^C#
If you think that the problem is due to a communication link, use a longer timeout period with the -w flag. Although rare, all the ports queried might have been used. You can change the ports and try again.
Another output example might be as follows:
# traceroute mysystem.university.edu (129.2.130.22)
traceroute to mysystem.university.edu (129.2.130.22), 30 hops max
 1  helios.ee.lbl.gov (129.3.112.1)  0 ms  0 ms  0 ms
 2  lilac-dmc.university.edu (129.2.216.1)  39 ms  19 ms  39 ms
 3  lilac-dmc.university.edu (129.2.215.1)  19 ms  39 ms  19 ms
 4  ccngw-ner-cc.university.edu (129.2.135.23)  39 ms  40 ms  19 ms
 5  ccn-nerif35.university.edu (129.2.167.35)  39 ms  39 ms  39 ms
 6  csgw/university.edu (129.2.132.254)  39 ms  59 ms  39 ms
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  rip.university.EDU (129.2.130.22)  59 ms!  39 ms!  39 ms!
In this example, exactly half of the 12 gateway hops (13 is the final destination) are "missing." However, these hops were actually not gateways. The destination host used the time to live (ttl) from the arriving datagram as the ttl in its ICMP reply; thus, the reply timed out on the return path. Because ICMPs are not sent for ICMPs, no notice was received. The ! (exclamation mark) after each round-trip time indicates some type of software incompatibility problem. (The cause was diagnosed after the traceroute command issued a probe of twice the path length. The destination host was really only seven hops away.)
You can use many tools for observing network activity. Some run under the operating system, others run on dedicated hardware. One tool that can be used to obtain a detailed, packet-by-packet description of the LAN activity generated by a workload is the combination of the iptrace daemon and the ipreport command. To use the iptrace daemon with operating system version 4, you need the bos.net.tcp.server fileset. The iptrace daemon is included in this fileset, as well as some other useful commands such as the trpt and tcpdump commands. The iptrace daemon can only be started by a root user.
By default, the iptrace daemon traces all packets. The option -a allows exclusion of address resolution protocol (ARP) packets. Other options can narrow the scope of tracing to a particular source host (-s), destination host (-d), or protocol (-p). Because the iptrace daemon can consume significant amounts of processor time, be as specific as possible when you describe the packets you want traced.
Because iptrace is a daemon, start the iptrace daemon with the startsrc command rather than directly from the command line. This method makes it easier to control and shut down cleanly. A typical example would be as follows:
# startsrc -s iptrace -a "-i tr0 /home/user/iptrace/log1"
This command starts the iptrace daemon with instructions to trace all activity on the Token-Ring interface, tr0, and place the trace data in /home/user/iptrace/log1. To stop the daemon, use the following:
# stopsrc -s iptrace
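If you already suspect a particular conversation, a narrower trace keeps both the overhead and the log size down. For example, to capture only TCP packets destined for a host named ah6000d on the tr0 interface (the host name and log file name are only illustrations), a possible invocation is:
# startsrc -s iptrace -a "-i tr0 -p tcp -d ah6000d /home/user/iptrace/log2"
# stopsrc -s iptrace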
If you did not start the iptrace daemon with the startsrc command, you must use the ps command to find its process ID and terminate it with the kill command.
The ipreport command is a formatter for the log file. Its output is written to standard output. Options allow recognition and formatting of RPC packets (-r), identifying each packet with a number (-n), and prefixing each line with a 3-character string that identifies the protocol (-s). A typical ipreport command to format the log1 file just created (which is owned by the root user) would be as follows:
# ipreport -ns log1 >log1_formatted
This would result in a sequence of packet reports similar to the following examples. The first packet is the first half of a ping packet. The fields of most interest are as follows:
Packet Number 131
TOK: =====( packet transmitted on interface tr0 )=====Fri Jan 14 08:42:07 2000
TOK: 802.5 packet
TOK: 802.5 MAC header:
TOK: access control field = 0, frame control field = 40
TOK: [ src = 90:00:5a:a8:88:81, dst = 10:00:5a:4f:35:82]
TOK: routing control field = 0830, 3 routing segments
TOK: routing segments [ ef31 ce61 ba30 ]
TOK: 802.2 LLC header:
TOK: dsap aa, ssap aa, ctrl 3, proto 0:0:0, type 800 (IP)
IP:  < SRC = 129.35.145.140 > (alborz.austin.ibm.com)
IP:  < DST = 129.35.145.135 > (xactive.austin.ibm.com)
IP:  ip_v=4, ip_hl=20, ip_tos=0, ip_len=84, ip_id=38892, ip_off=0
IP:  ip_ttl=255, ip_sum=fe61, ip_p = 1 (ICMP)
ICMP: icmp_type=8 (ECHO_REQUEST) icmp_id=5923 icmp_seq=0
ICMP: 00000000 2d088abf 00054599 08090a0b 0c0d0e0f |-.....E.........|
ICMP: 00000010 10111213 14151617 18191a1b 1c1d1e1f |................|
ICMP: 00000020 20212223 24252627 28292a2b 2c2d2e2f | !"#$%&'()*+,-./|
ICMP: 00000030 30313233 34353637                   |01234567        |
The next example is a frame from an ftp operation. Note that the IP packet is the size of the MTU for this LAN (1492 bytes).
Packet Number 501
TOK: =====( packet received on interface tr0 )=====Fri Dec 10 08:42:51 1999
TOK: 802.5 packet
TOK: 802.5 MAC header:
TOK: access control field = 18, frame control field = 40
TOK: [ src = 90:00:5a:4f:35:82, dst = 10:00:5a:a8:88:81]
TOK: routing control field = 08b0, 3 routing segments
TOK: routing segments [ ef31 ce61 ba30 ]
TOK: 802.2 LLC header:
TOK: dsap aa, ssap aa, ctrl 3, proto 0:0:0, type 800 (IP)
IP:  < SRC = 129.35.145.135 > (xactive.austin.ibm.com)
IP:  < DST = 129.35.145.140 > (alborz.austin.ibm.com)
IP:  ip_v=4, ip_hl=20, ip_tos=0, ip_len=1492, ip_id=34233, ip_off=0
IP:  ip_ttl=60, ip_sum=5ac, ip_p = 6 (TCP)
TCP: <source port=20(ftp-data), destination port=1032 >
TCP: th_seq=445e4e02, th_ack=ed8aae02
TCP: th_off=5, flags<ACK |>
TCP: th_win=15972, th_sum=0, th_urp=0
TCP: 00000000 01df0007 2cd6c07c 00004635 000002c2 |....,..|..F5....|
TCP: 00000010 00481002 010b0001 000021b4 00000d60 |.H........!....`|
 --------- Lots of uninteresting data omitted -----------
TCP: 00000590 63e40000 3860000f 4800177d 80410014 |c...8`..H..}.A..|
TCP: 000005a0 82220008 30610038 30910020          |."..0a.80..     |
The ipfilter command extracts different operation headers from an ipreport output file and displays them in a table. Some customized NFS information regarding requests and replies is also provided.
To determine whether the ipfilter command is installed and available, run the following command:
# lslpp -lI perfagent.tools
An example command is as follows:
# ipfilter log1_formatted
The operation headers currently recognized are: udp, nfs, tcp, ipx, icmp. The ipfilter command has three different types of reports, as follows:
The commands in this section provide output comparable to the netstat -v command. They allow you to reset adapter statistics (-r) and to get more detailed output (-d) than the netstat -v command output provides.
The entstat command displays the statistics gathered by the specified Ethernet device driver. The user can optionally specify that the device-specific statistics be displayed in addition to the device-generic statistics. Using the -d option will list any extended statistics for this adapter and should be used to ensure all statistics are displayed. If no flags are specified, only the device-generic statistics are displayed.
The entstat command is also invoked when the netstat command is run with the -v flag. The netstat command does not issue any entstat command flags.
# entstat ent0
-------------------------------------------------------------
ETHERNET STATISTICS (ent0) :
Device Type: IBM 10/100 Mbps Ethernet PCI Adapter (23100020)
Hardware Address: 00:60:94:e9:29:18
Elapsed Time: 0 days 0 hours 0 minutes 0 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 0                                    Packets: 0
Bytes: 0                                      Bytes: 0
Interrupts: 0                                 Interrupts: 0
Transmit Errors: 0                            Receive Errors: 0
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 0
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 0                          Broadcast Packets: 0
Multicast Packets: 0                          Multicast Packets: 0
No Carrier Sense: 0                           CRC Errors: 0
DMA Underrun: 0                               DMA Overrun: 0
Lost CTS Errors: 0                            Alignment Errors: 0
Max Collision Errors: 0                       No Resource Errors: 0
Late Collision Errors: 0                      Receive Collision Errors: 0
Deferred: 0                                   Packet Too Short Errors: 0
SQE Test: 0                                   Packet Too Long Errors: 0
Timeout Errors: 0                             Packets Discarded by Adapter: 0
Single Collision Count: 0                     Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport
In the above report, you may want to concentrate on:
Notice in this example, the Ethernet adapter is behaving well because there are no Receive Errors. These errors are sometimes caused when a saturated network only transmits partial packets. The partial packets are eventually retransmitted successfully but are recorded as receive errors.
If you receive S/W Transmit Queue Overflow errors, the value of Max Packets on S/W Transmit Queue will correspond to the transmit queue limit for this adapter (xmt_que_size).
Note: These values can represent the hardware queue if the adapter does not support a software transmit queue. If there are transmit-queue overflows, then increase the hardware or software queue limits for the driver.
If there are not enough receive resources, this is indicated by Packets Dropped and, depending on the adapter type, by Out of Rcv Buffers, No Resource Errors, or some similar counter.
The elapsed time displays the real-time period that has elapsed since the last time the statistics were reset. To reset the statistics, use the entstat -r adapter_name command.
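For example, to zero the counters before a test run and then display the full set of statistics, including the device-specific ones, afterwards (ent0 is the adapter used in the example above):
# entstat -r ent0
# entstat -d ent0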
Similar output can be displayed for Token-Ring, FDDI, and ATM interfaces using the tokstat, fddistat, and atmstat commands.
The tokstat command displays the statistics gathered by the specified Token-Ring device driver. The user can optionally specify that the device-specific statistics be displayed in addition to the device driver statistics. If no flags are specified, only the device driver statistics are displayed.
This command is also invoked when the netstat command is run with the -v flag. The netstat command does not issue any tokstat command flags.
The output produced by the tokstat tok0 command and the problem determination are similar to that described in The entstat Command.
The fddistat command displays the statistics gathered by the specified FDDI device driver. The user can optionally specify that the device-specific statistics be displayed in addition to the device driver statistics. If no flags are specified, only the device driver statistics are displayed.
This command is also invoked when the netstat command is run with the -v flag. The netstat command does not issue any fddistat command flags.
The output produced by the fddistat fddi0 command and the problem determination are similar to that described in The entstat Command.
The atmstat command displays the statistics gathered by the specified ATM device driver. The user can optionally specify that the device-specific statistics be displayed in addition to the device driver statistics. If no flags are specified, only the device driver statistics are displayed.
The output produced by the atmstat atm0 command and the problem determination are similar to that described in The entstat Command.
Use the no command to display current network values and to change options.
For a listing of all attributes for the no command, see Network Option Tunable Parameters.
Some network attributes are run-time attributes that can be changed at any time. Others are load-time attributes that must be set before the netinet kernel extension is loaded.
Note: When the no command is used to change parameters, the change is in effect only until the next system boot. At that point, all parameters are initially reset to their defaults. To make the change permanent, put the appropriate no command in the /etc/rc.net file.
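The following is a minimal sketch of what could be appended to /etc/rc.net to make such changes persistent; the option values are only illustrations carried over from earlier examples:
if [ -f /usr/sbin/no ] ; then
        /usr/sbin/no -o thewall=131072
        /usr/sbin/no -o tcp_sendspace=65535
        /usr/sbin/no -o tcp_recvspace=65535
fi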
If your system uses Berkeley-style network configuration, set the attributes near the top of the /etc/rc.bsdnet file. If you use an SP system, edit the tuning.cust file as documented in the RS/6000 SP: Installation and Relocation manual.
Note: The no command performs no range checking. If it is used incorrectly, the no command can cause your system to become inoperable.
The following tuning sections discuss some of the no command attributes and how to adjust them.