Performance Management Guide

Analyzing Network Performance

When performance problems arise, your system might be totally innocent, while the real culprit is buildings away. An easy way to tell if the network is affecting overall performance is to compare those operations that involve the network with those that do not. If you are running a program that does a considerable amount of remote reads and writes and it is running slowly, but everything else seems to be running as usual, then it is probably a network problem. Some of the potential network bottlenecks can be caused by the following:

Client-network interface
Network bandwidth
Network topology
Server network interface
Server CPU load
Server memory usage
Server bandwidth
Inefficient configuration

Several tools can measure network statistics and give a variety of information, but only part of this information is related to performance tuning.

To enhance performance, you can use the no (network options) command and the nfso command for tuning NFS options. You can also use the chdev and ifconfig commands to change system and network parameters.

The ping Command

The ping command is useful for the following:

Determining the status of the network and various foreign hosts.
Tracking and isolating hardware and software problems.
Testing, measuring, and managing networks.

Some ping command options relevant to performance tuning are as follows:

-c: Specifies the number of packets. This option is useful when you get an IP trace log. You can capture a minimum of ping packets.
-s: Specifies the length of packets. You can use this option to check fragmentation and reassembly.
-f: Sends the packets at 10 ms intervals or immediately after each response. Only the root user can use this option.

If you need to load your network or systems, the -f option is convenient. For example, if you suspect that your problem is caused by a heavy load, load your environment intentionally to confirm your suspicion. Open several aixterm windows and run the ping -f command in each window. Your Ethernet utilization quickly gets to around 100 percent. The following is an example:

# date ; ping -c 1000 -f wave ; date
Fri Jul 23 11:52:39 CDT 1999
PING wave.austin.ibm.com: (9.53.153.120): 56 data bytes
.
----wave.austin.ibm.com PING Statistics----
1000 packets transmitted, 1000 packets received, 0% packet loss
round-trip min/avg/max = 1/1/23 ms
Fri Jul 23 11:52:42 CDT 1999

Note

This command can be very hard on a network and should be used with caution. Flood-pinging can only be performed by the root user.

In this example, 1000 packets were sent for 3 seconds. Be aware that this command uses IP and Internet Control Message Protocol (ICMP) protocol and therefore, no transport protocol (UDP/TCP) and application activities are involved. The measured data, such as round-trip time, does not reflect the total performance characteristics.

When you try to send a flood of packets to your destination, consider several points:

Sending packets puts a load on your system.
Use the netstat -i command to monitor the status of your network interface during the experiment. You may find that the system is dropping packets during a send by looking at the Oerrs output.
You should also monitor other resources, such as mbufs and send/receive queue. It can be difficult to place a heavy load onto the destination system. Your system might be overloaded before the other system is.
Consider the relativity of the results. If you want to monitor or test just one destination system, do the same experiment on some other systems for comparison, because your network or router might have a problem.

The ftp Command

You can use the ftp command to send a very large file by using /dev/zero as input and /dev/null as output. This allows you to transfer a large file without involving disks (which might be a bottleneck) and without having to cache the entire file in memory.

Use the following ftp subcommands (change count to increase or decrease the number of blocks read by the dd command):

> bin
> put "|dd if=/dev/zero bs=32k count=10000" /dev/null

Remember, if you change the TCP send or receive space parameters, then for the ftp command, you must refresh the inetd daemon with the refresh -s inetd command.

Make sure that tcp_senspace and tcp_recvspace are at least 65535 for the Gigabit Ethernet "jumbo frames" and for the ATM with MTU 9180 or larger to get good performance due to larger MTU size.

An example to set the parameters is as follows:

# no -o tcp_sendspace=65535
# no -o tcp_recvspace=65535
# refresh -s inetd
0513-095 The request for subsystem refresh was completed successfully.

The ftp subcommands are as follows:

ftp> bin
200 Type set to I.
ftp> put "|dd if=/dev/zero bs=32k count=10000" /dev/null
200 PORT command successful.
150 Opening data connection for /dev/null.
10000+0 records in
10000+0 records out
226 Transfer complete.
327680000 bytes sent in 8.932 seconds (3.583e+04 Kbytes/s)
local: |dd if=/dev/zero bs=32k count=10000 remote: /dev/null
ftp> quit
221 Goodbye.

The netstat Command

The netstat command is used to show network status. Traditionally, it is used more for problem determination than for performance measurement. However, the netstat command can be used to determine the amount of traffic on the network to ascertain whether performance problems are due to network congestion.

The netstat command displays information regarding traffic on the configured network interfaces, such as the following:

The address of any protocol control blocks associated with the sockets and the state of all sockets
The number of packets received, transmitted, and dropped in the communications subsystem
Cumulative statistics per interface
Routes and their status

Using the netstat Command

The netstat command displays the contents of various network-related data structures for active connections. In this chapter, only the options and output fields that are relevant for network performance determinations are discussed. For all other options and columns, see the AIX 5L Version 5.2 Commands Reference.

netstat -i

Shows the state of all configured interfaces.

The following example shows the statistics for a workstation with an integrated Ethernet and a Token-Ring adapter:

# netstat -i
Name  Mtu   Network     Address            Ipkts Ierrs    Opkts Oerrs  Coll
lo0   16896 <Link>                         144834     0   144946     0     0
lo0   16896 127         localhost          144834     0   144946     0     0
tr0   1492  <Link>10.0.5a.4f.3f.61         658339     0   247355     0     0
tr0   1492  9.3.1       ah6000d            658339     0   247355     0     0
en0   1500  <Link>8.0.5a.d.a2.d5                0     0      112     0     0
en0   1500  1.2.3       1.2.3.4                 0     0      112     0     0

The count values are summarized since system startup.

Name: Interface name.
Mtu: Maximum transmission unit. The maximum size of packets in bytes that are transmitted using the interface.
Ipkts: Total number of packets received.
Ierrs: Total number of input errors. For example, malformed packets, checksum errors, or insufficient buffer space in the device driver.
Opkts: Total number of packets transmitted.
Oerrs: Total number of output errors. For example, a fault in the local host connection or adapter output queue overrun.
Coll: Number of packet collisions detected.

Note

The netstat -i command does not support the collision count for Ethernet interfaces (see The entstat Command for Ethernet statistics).

Following are some tuning guidelines:

If the number of errors during input packets is greater than 1 percent of the total number of input packets (from the command netstat -i); that is,
```
Ierrs > 0.01 x Ipkts
```
Then run the netstat -m command to check for a lack of memory.
If the number of errors during output packets is greater than 1 percent of the total number of output packets (from the command netstat -i); that is,
```
Oerrs > 0.01 x Opkts
```
Then increase the send queue size (xmt_que_size) for that interface. The size of the xmt_que_size could be checked with the following command:
```
# lsattr -El adapter
```
If the collision rate is greater than 10 percent, that is,
```
Coll / Opkts > 0.1
```
Then there is a high network utilization, and a reorganization or partitioning may be necessary. Use the netstat -v or entstat command to determine the collision rate.

netstat -i -Z

This function of the netstat command clears all the statistic counters for the netstat -i command to zero.

netstat -I interface interval

Displays the statistics for the specified interface. It offers information similar to the netstat -i command for the specified interface and reports it for a given time interval. For example:

# netstat -I en0 1
    input   (en0)      output           input   (Total)    output
   packets  errs  packets  errs colls  packets  errs  packets   errs colls
       0     0       27     0     0   799655     0   390669     0     0
       0     0        0     0     0        2     0        0     0     0
       0     0        0     0     0        1     0        0     0     0
       0     0        0     0     0       78     0      254     0     0
       0     0        0     0     0      200     0       62     0     0
       0     0        1     0     0        0     0        2     0     0

The previous example shows the netstat -I command output for the ent0 interface. Two reports are generated side by side, one for the specified interface and one for all available interfaces (Total). The fields are similar to the ones in the netstat -i example, input packets = Ipkts, input errs = Ierrs and so on.

netstat -m

Displays the statistics recorded by the mbuf memory-management routines. The most useful statistics in the output of the netstat -m command are the counters that show the requests for mbufs denied and non-zero values in the failed column. If the requests for mbufs denied is not displayed, then this must be an SMP system running operating system version 4.3.2 or later; for performance reasons, global statistics are turned off by default. To enable the global statistics, set the no parameter extended_netstats to 1. This can be done by changing the /etc/rc.net file and rebooting the system.

The following example shows the first part of the netstat -m output with extended_netstats set to 1:

# netstat -m
29 mbufs in use:
16 mbuf cluster pages in use
71 Kbytes allocated to mbufs
0 requests for mbufs denied
0 calls to protocol drain routines

Kernel malloc statistics:

******* CPU 0 *******
By size       inuse     calls failed   delayed    free   hiwat   freed
32              419    544702      0         0     221     800       0
64              173     22424      0         0      19     400       0
128             121     37130      0         0     135     200       4
256            1201 118326233      0         0     239     480     138
512             330    671524      0         0      14      50      54
1024             74    929806      0         0      82     125       2
2048            384   1820884      0         0       8     125    5605
4096            516   1158445      0         0      46     150      21
8192              9      5634      0         0       1      12      27
16384             1      2953      0         0      24      30      41
32768             1         1      0         0       0    1023       0

By type       inuse   calls failed delayed  memuse  memmax  mapb

Streams mblk statistic failures:
0 high priority mblk failures
0 medium priority mblk failures
0 low priority mblk failures

If global statistics are not on and you want to determine the total number of requests for mbufs denied, add up the values under the failed columns for each CPU. If the netstat -m command indicates that requests for mbufs or clusters have failed or been denied, then you may want to increase the value of thewall by using the no -o thewall=NewValue command. See Overview of the mbuf Management Facility for additional details about the use of thewall and maxmbuf.

Beginning with AIX 4.3.3, a delayed column was added. If the requester of an mbuf specified the M_WAIT flag, then if an mbuf was not available, the thread is put to sleep until an mbuf is freed and can be used by this thread. The failed counter is not incremented in this case; instead, the delayed column will be incremented. Prior to AIX 4.3.3, the failed counter was also not incremented, but there was no delayed column.

Also, if the currently allocated amount of network memory is within 85 percent of thewall, you may want to increase thewall. If the value of thewall is increased, use the vmstat command to monitor total memory use to determine if the increase has had a negative impact on overall memory performance.

If buffers are not available when a request is received, the request is most likely lost (to see if the adapter actually dropped a package, see Adapter Statistics). Keep in mind that if the requester of the mbuf specified that it could wait for the mbuf if not available immediately, this puts the requestor to sleep but does not count as a request being denied.

If the number of failed requests continues to increase, the system might have an mbuf leak. To help track down the problem, the no command parameter net_malloc_police can be set to 1, and the trace hook with ID 254 can be used with the trace command.

After an mbuf/cluster is allocated and pinned, it can be freed by the application. Instead of unpinning this buffer and giving it back to the system, it is left on a free-list based on the size of this buffer. The next time that a buffer is requested, it can be taken off this free-list to avoid the overhead of pinning. After the number of buffers on the free list reaches the highwater mark, buffers smaller than 4096 will be coalesced together into page-sized units so that they can be unpinned and given back to the system. When the buffers are given back to the system, the freed column is incremented. If the freed value consistently increases, the highwater mark is too low. In AIX 4.3.2 and later, the highwater mark is scaled according to the amount of RAM on the system.

netstat -v

The netstat -v command displays the statistics for each Common Data Link Interface (CDLI)-based device driver that is in operation. Interface-specific reports can be requested using the tokstat, entstat, fddistat, or atmstat commands.

Every interface has its own specific information and some general information. The following example shows the Token-Ring and Ethernet part of the netstat -v command; other interface parts are similar. With a different adapter, the statistics will differ somewhat. The most important output fields are highlighted.

# netstat -v
-------------------------------------------------------------
ETHERNET STATISTICS (ent0) :
Device Type: IBM 10/100 Mbps Ethernet PCI Adapter (23100020)
Hardware Address: 00:60:94:e9:29:18
Elapsed Time: 9 days 19 hours 5 minutes 51 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 0                                    Packets: 0
Bytes: 0                                      Bytes: 0
Interrupts: 0                                 Interrupts: 0
Transmit Errors: 0                            Receive Errors: 0
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 0
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 0                          Broadcast Packets: 0
Multicast Packets: 0                          Multicast Packets: 0
No Carrier Sense: 0                           CRC Errors: 0
DMA Underrun: 0                               DMA Overrun: 0
Lost CTS Errors: 0                            Alignment Errors: 0
Max Collision Errors: 0                       No Resource Errors: 0
Late Collision Errors: 0                      Receive Collision Errors: 0
Deferred: 0                                   Packet Too Short Errors: 0
SQE Test: 0                                   Packet Too Long Errors: 0
Timeout Errors: 0                             Packets Discarded by Adapter: 0
Single Collision Count: 0                     Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport PrivateSegment

IBM 10/100 Mbps Ethernet PCI Adapter Specific Statistics:
------------------------------------------------
Chip Version: 25
RJ45 Port Link Status : down
Media Speed Selected: 10 Mbps Half Duplex
Media Speed Running: Unknown
Receive Pool Buffer Size: 384
Free Receive Pool Buffers: 128
No Receive Pool Buffer Errors: 0
Inter Packet Gap: 96
Adapter Restarts due to IOCTL commands: 0
Packets with Transmit collisions:
 1 collisions: 0           6 collisions: 0          11 collisions: 0
 2 collisions: 0           7 collisions: 0          12 collisions: 0
 3 collisions: 0           8 collisions: 0          13 collisions: 0
 4 collisions: 0           9 collisions: 0          14 collisions: 0
 5 collisions: 0          10 collisions: 0          15 collisions: 0
Excessive deferral errors: 0x0
-------------------------------------------------------------
TOKEN-RING STATISTICS (tok0) :
Device Type: IBM PCI Tokenring Adapter (14103e00)
Hardware Address: 00:20:35:7a:12:8a
Elapsed Time: 29 days 18 hours 3 minutes 47 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 1355364                              Packets: 55782254
Bytes: 791555422                              Bytes: 6679991641
Interrupts: 902315                            Interrupts: 55782192
Transmit Errors: 0                            Receive Errors: 1
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 182
S/W Transmit Queue Overflow: 42
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 18878                      Broadcast Packets: 54615793
Multicast Packets: 0                          Multicast Packets: 569
Timeout Errors: 0                             Receive Congestion Errors: 0
Current SW Transmit Queue Length: 0
Current HW Transmit Queue Length: 0

General Statistics:
-------------------
No mbuf Errors: 0                             Lobe Wire Faults: 0
Abort Errors: 12                              AC Errors: 0
Burst Errors: 1                               Frame Copy Errors: 0
Frequency Errors: 0                           Hard Errors: 0
Internal Errors: 0                            Line Errors: 0
Lost Frame Errors: 0                          Only Station: 1
Token Errors: 0                               Remove Received: 0
Ring Recovered: 17                            Signal Loss Errors: 0
Soft Errors: 35                               Transmit Beacon Errors: 0
Driver Flags: Up Broadcast Running
        AlternateAddress 64BitSupport ReceiveFunctionalAddr
        16 Mbps

IBM PCI Tokenring Adapter (14103e00) Specific Statistics:
---------------------------------------------------------
Media Speed Running: 16 Mbps Half Duplex
Media Speed Selected: 16 Mbps Full Duplex
Receive Overruns : 0
Transmit Underruns : 0
ARI/FCI errors : 0
Microcode level on the adapter :001PX11B2
Num pkts in priority sw tx queue  : 0
Num pkts in priority hw tx queue  : 0
Open Firmware Level : 001PXRS02

The highlighted fields are described as follows:

Transmit and Receive Errors
Number of output/input errors encountered on this device. This field counts unsuccessful transmissions due to hardware/network errors.

These unsuccessful transmissions could also slow down the performance of the system.
Max Packets on S/W Transmit Queue
Maximum number of outgoing packets ever queued to the software transmit queue.

An indication of an inadequate queue size is if the maximal transmits queued equals the current queue size (xmt_que_size). This indicates that the queue was full at some point.

To check the current size of the queue, use the lsattr -El adapter command (where adapter is, for example, tok0 or ent0). Because the queue is associated with the device driver and adapter for the interface, use the adapter name, not the interface name. Use the SMIT or the chdev command to change the queue size.
S/W Transmit Queue Overflow
Number of outgoing packets that have overflowed the software transmit queue. A value other than zero requires the same actions as would be needed if the Max Packets on S/W Transmit Queue reaches the xmt_que_size. The transmit queue size must be increased.
Broadcast Packets
Number of broadcast packets received without any error.

If the value for broadcast packets is high, compare it with the total received packets. The received broadcast packets should be less than 20 percent of the total received packets. If it is high, this could be an indication of a high network load; use multicasting. The use of IP multicasting enables a message to be transmitted to a group of hosts, instead of having to address and send the message to each group member individually.
DMA Overrun
The DMA Overrun statistic is incremented when the adapter is using DMA to put a packet into system memory and the transfer is not completed. There are system buffers available for the packet to be placed into, but the DMA operation failed to complete. This occurs when the MCA bus is too busy for the adapter to be able to use DMA for the packets. The location of the adapter on the bus is crucial in a heavily loaded system. Typically an adapter in a lower slot number on the bus, by having the higher bus priority, is using so much of the bus that adapters in higher slot numbers are not being served. This is particularly true if the adapters in a lower slot number are ATM or SSA adapters.
Max Collision Errors
Number of unsuccessful transmissions due to too many collisions. The number of collisions encountered exceeded the number of retries on the adapter.
Late Collision Errors
Number of unsuccessful transmissions due to the late collision error.
Timeout Errors
Number of unsuccessful transmissions due to adapter reported timeout errors.
Single Collision Count
Number of outgoing packets with single (only one) collision encountered during transmission.
Multiple Collision Count
Number of outgoing packets with multiple (2 - 15) collisions encountered during transmission.
Receive Collision Errors
Number of incoming packets with collision errors during reception.
No mbuf Errors
Number of times that mbufs were not available to the device driver. This usually occurs during receive operations when the driver must obtain memory buffers to process inbound packets. If the mbuf pool for the requested size is empty, the packet will be discarded. Use the netstat -m command to confirm this, and increase the parameter thewall.

The No mbuf Errors value is interface-specific and not identical to the requests for mbufs denied from the netstat -m output. Compare the values of the example for the commands netstat -m and netstat -v (Ethernet and Token-Ring part).

To determine network performance problems, check for any Error counts in the netstat -v output.

Additional guidelines:

To check for an overloaded Ethernet network, calculate (from the netstat -v command):
```
(Max Collision Errors + Timeouts Errors) / Transmit Packets
```
If the result is greater than 5 percent, reorganize the network to balance the load.
Another indication for a high network load is (from the command netstat -v):
If the total number of collisions from the netstat -v output (for Ethernet) is greater than 10 percent of the total transmitted packets, as follows:
```
Number of collisions / Number of Transmit Packets > 0.1
```

netstat -p protocol

Shows statistics about the value specified for the protocol variable (udp, tcp, ip, icmp), which is either a well-known name for a protocol or an alias for it. Some protocol names and aliases are listed in the /etc/protocols file. A null response indicates that there are no numbers to report. If there is no statistics routine for it, the program report of the value specified for the protocol variable is unknown.

The following example shows the output for the ip protocol:

# netstat -p ip
ip:
:
        491351 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with header length < data size
        0 with data length < header length
        0 with bad options
        0 with incorrect version number
        25930 fragments received
        0 fragments dropped (dup or out of space)
       0 fragments dropped after timeout
        12965 packets reassembled ok
        475054 packets for this host
        0 packets for unknown/unsupported protocol
        0 packets forwarded
        3332 packets not forwardable
        0 redirects sent
        405650 packets sent from this host
        0 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        0 output packets discarded due to no route
        5498 output datagrams fragmented
        10996 fragments created
        0 datagrams that can't be fragmented
        0 IP Multicast packets dropped due to no receiver
        0 ipintrq overflows

The highlighted fields are described as follows:

Total Packets Received
Number of total IP datagrams received.
Bad Header Checksum or Fragments Dropped
If the output shows bad header checksum or fragments dropped due to dup or out of space, this indicates either a network that is corrupting packets or device driver receive queues that are not large enough.
Fragments Received
Number of total fragments received.
Dropped after Timeout
If the fragments dropped after timeout is other than zero, then the time to life counter of the ip fragments expired due to a busy network before all fragments of the datagram arrived. To avoid this, use the no command to increase the value of the ipfragttl network parameter. Another reason could be a lack of mbufs; increase thewall.
Packets Sent from this Host
Number of IP datagrams that were created and sent out from this system. This counter does not include the forwarded datagrams (passthrough traffic).
Fragments Created
Number of fragments created in this system when IP datagrams were sent out.

When viewing IP statistics, look at the ratio of packets received to fragments received. As a guideline for small MTU networks, if 10 percent or more of the packets are getting fragmented, you should investigate further to determine the cause. A large number of fragments indicates that protocols above the IP layer on remote hosts are passing data to IP with data sizes larger than the MTU for the interface. Gateways/routers in the network path might also have a much smaller MTU size than the other nodes in the network. The same logic can be applied to packets sent and fragments created.

Fragmentation results in additional CPU overhead so it is important to determine its cause. Be aware that some applications, by their very nature, can cause fragmentation to occur. For example, an application that sends small amounts of data can cause fragments to occur. However, if you know the application is sending large amounts of data and fragmentation is still occurring, determine the cause. It is likely that the MTU size used is not the MTU size configured on the systems.

The following example shows the output for the udp protocol:

# netstat -p udp
udp:
        11521194 datagrams received
        0 incomplete headers
        0 bad data length fields
       0 bad checksums
        16532 dropped due to no socket
        232850 broadcast/multicast datagrams dropped due to no socket
       77 socket buffer overflows
        11271735 delivered
        796547 datagrams output

Statistics of interest are:

Bad Checksums
Bad checksums could happen due to hardware card or cable failure.
Dropped Due to No Socket
Number of received UDP datagrams of that destination socket ports were not opened. As a result, the ICMP Destination Unreachable - Port Unreachable message must have been sent out. But if the received UDP datagrams were broadcast datagrams, ICMP errors are not generated. If this value is high, investigate how the application is handling sockets.
Socket Buffer Overflows
Socket buffer overflows could be due to insufficient transmit and receive UDP sockets, too few nfsd daemons, or too small nfs_socketsize, udp_recvspace and sb_max values.

If the netstat -p udp command indicates socket overflows, then you might need to increase the number of the nfsd daemons on the server. First, check the affected system for CPU or I/O saturation, and verify the recommended setting for the other communication layers by using the no -a command. If the system is saturated, you must either to reduce its load or increase its resources.

The following example shows the output for the tcp protocol:

# netstat -p tcp
tcp:
        63726 packets sent
                34309 data packets (6482122 bytes)
                198 data packets (161034 bytes) retransmitted
                17437 ack-only packets (7882 delayed)
                0 URG only packets
                0 window probe packets
                3562 window update packets
                8220 control packets
        71033 packets received
                35989 acks (for 6444054 bytes)
                2769 duplicate acks
                0 acks for unsent data
                47319 packets (19650209 bytes) received in-sequence
                182 completely duplicate packets (29772 bytes)
                4 packets with some dup. data (1404 bytes duped)
                2475 out-of-order packets (49826 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                800 window update packets
                77 packets received after close
                0 packets with bad hardware assisted checksum
                0 discarded for bad checksums
                0 discarded for bad header offset fields
        0 connection request
        3125 connection requests
        1626 connection accepts
        4731 connections established (including accepts)
        5543 connections closed (including 31 drops)
        62 embryonic connections dropped
        38552 segments updated rtt (of 38862 attempts)
        0 resends due to path MTU discovery
        3 path MTU discovery terminations due to retransmits
        553 retransmit timeouts
                28 connections dropped by rexmit timeout
        0 persist timeouts
        464 keepalive timeouts
                26 keepalive probes sent
                1 connection dropped by keepalive
        0 connections in timewait reused
        0 delayed ACKs for SYN
        0 delayed ACKs for FIN
        0 send_and_disconnects

Statistics of interest are:

Packets Sent
Data Packets
Data Packets Retransmitted
Packets Received
Completely Duplicate Packets
Retransmit Timeouts

For the TCP statistics, compare the number of packets sent to the number of data packets retransmitted. If the number of packets retransmitted is over 10-15 percent of the total packets sent, TCP is experiencing timeouts indicating that network traffic may be too high for acknowledgments (ACKs) to return before a timeout. A bottleneck on the receiving node or general network problems can also cause TCP retransmissions, which will increase network traffic, further adding to any network performance problems.

Also, compare the number of packets received with the number of completely duplicate packets. If TCP on a sending node times out before an ACK is received from the receiving node, it will retransmit the packet. Duplicate packets occur when the receiving node eventually receives all the retransmitted packets. If the number of duplicate packets exceeds 10-15 percent, the problem may again be too much network traffic or a bottleneck at the receiving node. Duplicate packets increase network traffic.

The value for retransmit timeouts occurs when TCP sends a packet but does not receive an ACK in time. It then resends the packet. This value is incremented for any subsequent retransmittals. These continuous retransmittals drive CPU utilization higher, and if the receiving node does not receive the packet, it eventually will be dropped.

netstat -s

The netstat -s command shows statistics for each protocol (while the netstat -p command shows the statistics for the specified protocol).

netstat -s -s

The undocumented -s -s option shows only those lines of the netstat -s output that are not zero, making it easier to look for error counts.

netstat -s -Z

This is an undocumented function of the netstat command. It clears all the statistic counters for the netstat -s command to zero.

netstat -r

Another option relevant to performance is the display of the discovered Path Maximum Transmission Unit (PMTU).

For two hosts communicating across a path of multiple networks, a transmitted packet will become fragmented if its size is greater than the smallest MTU of any network in the path. Because packet fragmentation can result in reduced network performance, it is desirable to avoid fragmentation by transmitting packets with a size no larger than the smallest MTU in the network path. This size is called the path MTU.

Use the netstat -r command to display this value. In the following is example the netstat -r -f inet command is used to display only the routing tables:

# netstat -r -f inet
Routing tables
Destination      Gateway           Flags   Refs     Use   PMTU If  Exp  Groups

Route Tree for Protocol Family 2:
default          itsorusi          UGc       1      348   -   tr0   -
9.3.1            sv2019e           Uc       25    12504   -   tr0   -
itsonv           sv2019e           UHW       0      235   -   tr0   -
itsorusi         sv2019e           UHW       1      883  1492 tr0   -
ah6000d          sv2019e           UHW       1      184  1492 tr0   -
ah6000e          sv2019e           UHW       0      209   -   tr0   -
sv2019e          sv2019e           UHW       4    11718  1492 tr0   -
coyote.ncs.mainz itsorusi          UGHW      1       45  1492 tr0   -
kresna.id.ibm.co itsorusi          UGHW      0       14  1492 tr0   -
9.184.104.111    kresna.id.ibm.com UGc       0        5   -    tr0   -
127              localhost         U         3       96  -    lo0   -

netstat -D

The -D option allows you to see packets coming into and going out of each layer in the communications subsystem along, with packets dropped at each layer.

# netstat -D

Source                         Ipkts                Opkts     Idrops     Odrops
-------------------------------------------------------------------------------
tok_dev0                    19333058               402225          3          0
ent_dev0                           0                    0          0          0
                ---------------------------------------------------------------
Devices Total               19333058               402225          3          0
-------------------------------------------------------------------------------
tok_dd0                     19333055               402225          0          0
ent_dd0                            0                    0          0          0
                ---------------------------------------------------------------
Drivers Total               19333055               402225          0          0
-------------------------------------------------------------------------------
tok_dmx0                      796966                  N/A   18536091        N/A
ent_dmx0                           0                  N/A          0        N/A
                ---------------------------------------------------------------
Demuxer Total                 796966                  N/A   18536091        N/A
-------------------------------------------------------------------------------
IP                            694138               677411       7651       6523
TCP                           143713               144247          0          0
UDP                           469962               266726          0        812
                ---------------------------------------------------------------
Protocols Total              1307813              1088384       7651       7335
-------------------------------------------------------------------------------
lo_if0                         22088                22887        799          0
tr_if0                        796966               402227          0        289
                ---------------------------------------------------------------
Net IF Total                  819054               425114        799        289
-------------------------------------------------------------------------------
                ---------------------------------------------------------------
NFS/RPC Total                    N/A                 1461          0          0
-------------------------------------------------------------------------------
(Note:  N/A -> Not Applicable)

The Devices layer shows number of packets coming into the adapter, going out of the adapter, and number of packets dropped on input and output. There are various causes of adapter errors, and the netstat -v command can be examined for more details.

The Drivers layer shows packet counts handled by the device driver for each adapter. Output of the netstat -v command is useful here to determine which errors are counted.

The Demuxer values show packet counts at the demux layer, and Idrops here usually indicate that filtering has caused packets to be rejected (for example, Netware or DecNet packets being rejected because these are not handled by the system under examination).

Details for the Protocols layer can be seen in the output of the netstat -s command.

Note

In the statistics output, a N/A displayed in a field value indicates the count is not applicable. For the NFS/RPC statistics, the number of incoming packets that pass through RPC are the same packets which pass through NFS, so these numbers are not summed in the NFS/RPC Total field, hence the N/A. NFS has no outgoing packet or outgoing packet drop counters specific to NFS and RPC. Therefore, individual counts have a field value of N/A, and the cumulative count is stored in the NFS/RPC Total field.

The netpmon Command

The netpmon command uses the trace facility to obtain a detailed picture of network activity during a time interval. Because it uses the trace facility, the netpmon command can be run only by a root user or by a member of the system group.

Also, the netpmon command cannot run together with any of the other trace-based performance commands such as tprof and filemon. In its usual mode, the netpmon command runs in the background while one or more application programs or system commands are being executed and monitored.

The netpmon command focuses on the following system activities:

CPU usage
- By processes and interrupt handlers
- How much is network-related
- What causes idle time
Network device driver I/O
- Monitors I/O operations through all Ethernet, Token-Ring, and Fiber-Distributed Data Interface (FDDI) network device drivers.
- In the case of transmission I/O, the command monitors utilizations, queue lengths, and destination hosts. For receive ID, the command also monitors time in the demux layer.
Internet socket calls
- Monitors send(), recv(), sendto(), recvfrom(), sendmsg(), read(), and write() subroutines on Internet sockets.
- Reports statistics on a per-process basis for the Internet Control Message Protocol (ICMP), Transmission Control Protocol (TCP), and the User Datagram Protocol (UDP).
NFS I/O
- On client: RPC requests, NFS read/write requests.
- On server: Per-client, per-file, read/write requests.

The following will be computed:

Response times and sizes associated with transmit and receive operations at the device driver level.
Response times and sizes associated with all types of Internet socket read and write system calls.
Response times and sizes associated with NFS read and write system calls.
Response times associated with NFS remote procedure call requests.

To determine whether the netpmon command is installed and available, run the following command:

# lslpp -lI perfagent.tools

Tracing is started by the netpmon command, optionally suspended with the trcoff subcommand and resumed with the trcon subcommand, and terminated with the trcstop subcommand. As soon as tracing is terminated, the netpmon command writes its report to standard output.

Using netpmon

The netpmon command will start tracing immediately unless the -d option is used. Use the trcstop command to stop tracing. At that time, all the specified reports are generated, and the netpmon command exits. In the client-server environment, use the netpmon command to view how networking affects the overall performance. It can be run on both client and server.

The netpmon command can read the I/O trace data from a specified file, instead of from the real-time trace process. In this case, the netpmon report summarizes the network activity for the system and period represented by the trace file. This offline processing method is useful when it is necessary to postprocess a trace file from a remote machine or perform the trace data collection at one time and postprocess it at another time.

The trcrpt -r command must be executed on the trace logfile and redirected to another file, as follows:

# gennames > gennames.out
# trcrpt -r trace.out > trace.rpt

At this point, an adjusted trace logfile is fed into the netpmon command to report on I/O activity captured by a previously recorded trace session as follows:

# netpmon -i trace.rpt -n gennames.out | pg

In this example, the netpmon command reads file system trace events from the trace.rpt input file. Because the trace data is already captured on a file, the netpmon command does not put itself in the background to allow application programs to be run. After the entire file is read, a network activity report will be displayed on standard output (which, in this example, is piped to the pg command).

If the trace command was run with the -C all flag, then run the trcrpt command also with the -C all flag (see Formatting a Report from trace -C Output).

The following netpmon command running on an NFS server executes the sleep command and creates a report after 400 seconds. During the measured interval, a copy to an NFS-mounted file system /nfs_mnt is taking place.

# netpmon -o netpmon.out -O all; sleep 400; trcstop

With the -O option, you can specify the report type to be generated. Valid report type values are:

cpu

CPU usage

dd

Network device-driver I/O

so

Internet socket call I/O

nfs

NFS I/O

all

All reports are produced. The following is the default value.

# cat netpmon.out

Thu Jan 21 15:02:45 2000
System: AIX itsosmp Node: 4 Machine: 00045067A000
401.053 secs in measured interval
========================================================================
Process CPU Usage Statistics:
-----------------------------
                                                   Network
Process (top 20)             PID  CPU Time   CPU %   CPU %
----------------------------------------------------------
nfsd                       12370   42.2210   2.632   2.632
nfsd                       12628   42.0056   2.618   2.618
nfsd                       13144   41.9540   2.615   2.615
nfsd                       12886   41.8680   2.610   2.610
nfsd                       12112   41.4114   2.581   2.581
nfsd                       11078   40.9443   2.552   2.552
nfsd                       11854   40.6198   2.532   2.532
nfsd                       13402   40.3445   2.515   2.515
lrud                        1548   16.6294   1.037   0.000
netpmon                    15218    5.2780   0.329   0.000
gil                         2064    2.0766   0.129   0.129
trace                      18284    1.8820   0.117   0.000
syncd                       3602    0.3757   0.023   0.000
swapper                        0    0.2718   0.017   0.000
init                           1    0.2201   0.014   0.000
afsd                        8758    0.0244   0.002   0.000
bootpd                      7128    0.0220   0.001   0.000
ksh                         4322    0.0213   0.001   0.000
pcimapsvr.ip               16844    0.0204   0.001   0.000
netm                        1806    0.0186   0.001   0.001
----------------------------------------------------------
Total (all processes)             358.3152  22.336  20.787
Idle time                        1221.0235  76.114

========================================================================
First Level Interrupt Handler CPU Usage Statistics:
---------------------------------------------------
                                                   Network
FLIH                              CPU Time   CPU %   CPU %
----------------------------------------------------------
PPC decrementer                     9.9419   0.620   0.000
external device                     4.5849   0.286   0.099
UNKNOWN                             0.1716   0.011   0.000
data page fault                     0.1080   0.007   0.000
floating point                      0.0012   0.000   0.000
instruction page fault              0.0007   0.000   0.000
----------------------------------------------------------
Total (all FLIHs)                  14.8083   0.923   0.099

========================================================================
Second Level Interrupt Handler CPU Usage Statistics:
----------------------------------------------------
                                                   Network
SLIH                              CPU Time   CPU %   CPU %
----------------------------------------------------------
tokdd                              12.4312   0.775   0.775
ascsiddpin                          0.5178   0.032   0.000
----------------------------------------------------------
Total (all SLIHs)                  12.9490   0.807   0.775

========================================================================
Network Device-Driver Statistics (by Device):
---------------------------------------------
                        ----------- Xmit -----------   -------- Recv ---------
Device                   Pkts/s  Bytes/s  Util  QLen   Pkts/s  Bytes/s   Demux
------------------------------------------------------------------------------
token ring 0              31.61     4800  1.7% 0.046   200.93   273994  0.0080

========================================================================

Network Device-Driver Transmit Statistics (by Destination Host):
----------------------------------------------------------------
Host                     Pkts/s  Bytes/s
----------------------------------------
ah6000c                   31.57     4796
9.3.1.255                  0.03        4
itsorusi                   0.00        0

========================================================================

TCP Socket Call Statistics (by Process):
----------------------------------------
                                   ------ Read -----   ----- Write -----
Process (top 20)             PID   Calls/s   Bytes/s   Calls/s   Bytes/s
------------------------------------------------------------------------
telnetd                    18144      0.03       123      0.06         0
------------------------------------------------------------------------
Total (all processes)                 0.03       123      0.06         0

========================================================================

NFS Server Statistics (by Client):
----------------------------------
                         ------ Read -----   ----- Write -----     Other
Client                   Calls/s   Bytes/s   Calls/s   Bytes/s   Calls/s
------------------------------------------------------------------------
ah6000c                     0.00         0     31.54    258208      0.01
------------------------------------------------------------------------
Total (all clients)         0.00         0     31.54    258208      0.01

========================================================================
Detailed Second Level Interrupt Handler CPU Usage Statistics:
-------------------------------------------------------------
SLIH: tokdd
count:                  93039
  cpu time (msec):      avg 0.134   min 0.026   max 0.541   sdev 0.051
SLIH: ascsiddpin
count:                  8136
  cpu time (msec):      avg 0.064   min 0.012   max 0.147   sdev 0.018
COMBINED (All SLIHs)
count:                  101175
  cpu time (msec):      avg 0.128   min 0.012   max 0.541   sdev 0.053

========================================================================
Detailed Network Device-Driver Statistics:
------------------------------------------
DEVICE: token ring 0
recv packets:           80584
  recv sizes (bytes):   avg 1363.6  min 50      max 1520    sdev 356.3
  recv times (msec):    avg 0.081   min 0.010   max 0.166   sdev 0.020
  demux times (msec):   avg 0.040   min 0.008   max 0.375   sdev 0.040
xmit packets:           12678
  xmit sizes (bytes):   avg 151.8   min 52      max 184     sdev 3.3
  xmit times (msec):    avg 1.447   min 0.509   max 4.514   sdev 0.374

========================================================================
Detailed Network Device-Driver Transmit Statistics (by Host):
-------------------------------------------------------------
HOST: ah6000c
xmit packets:           12662
  xmit sizes (bytes):   avg 151.9   min 52      max 184     sdev 2.9
  xmit times (msec):    avg 1.448   min 0.509   max 4.514   sdev 0.373

HOST: 9.3.1.255
xmit packets:           14
  xmit sizes (bytes):   avg 117.0   min 117     max 117     sdev 0.0
  xmit times (msec):    avg 1.133   min 0.884   max 1.730   sdev 0.253

HOST: itsorusi
xmit packets:           1
  xmit sizes (bytes):   avg 84.0    min 84      max 84      sdev 0.0
  xmit times (msec):    avg 0.522   min 0.522   max 0.522   sdev 0.000

========================================================================
Detailed TCP Socket Call Statistics (by Process):
-------------------------------------------------
PROCESS: telnetd   PID: 18144
reads:                  12
  read sizes (bytes):   avg 4096.0  min 4096    max 4096    sdev 0.0
  read times (msec):    avg 0.085   min 0.053   max 0.164   sdev 0.027
writes:                 23
  write sizes (bytes):  avg 3.5     min 1       max 26      sdev 7.0
  write times (msec):   avg 0.143   min 0.067   max 0.269   sdev 0.064
PROTOCOL: TCP (All Processes)
reads:                  12
  read sizes (bytes):   avg 4096.0  min 4096    max 4096    sdev 0.0
  read times (msec):    avg 0.085   min 0.053   max 0.164   sdev 0.027
writes:                 23
  write sizes (bytes):  avg 3.5     min 1       max 26      sdev 7.0
  write times (msec):   avg 0.143   min 0.067   max 0.269   sdev 0.064

========================================================================
Detailed NFS Server Statistics (by Client):
-------------------------------------------
CLIENT: ah6000c
writes:                 12648
  write sizes (bytes):  avg 8187.5  min 4096    max 8192    sdev 136.2
  write times (msec):   avg 138.646 min 0.147   max 1802.067 sdev 58.853
other calls:            5
  other times (msec):   avg 1.928   min 0.371   max 8.065   sdev 3.068
COMBINED (All Clients)
writes:                 12648
  write sizes (bytes):  avg 8187.5  min 4096    max 8192    sdev 136.2
  write times (msec):   avg 138.646 min 0.147   max 1802.067 sdev 58.853
other calls:            5
  other times (msec):   avg 1.928   min 0.371   max 8.065   sdev 3.068

The output of the netpmon command is composed of two different types of reports: global and detailed. The global reports list statistics as follows:

Most active processes
First-level interrupt handlers
Second-level interrupt handlers
Network device drivers
Network device-driver transmits
TCP socket calls
NFS server or client statistics

The global reports are shown at the beginning of the netpmon output and are the occurrences during the measured interval. The detailed reports provide additional information for the global reports. By default, the reports are limited to the 20 most active statistics measured. All information in the reports is listed from top to bottom as most active to least active.

Global Reports of netpmon

The reports generated by the netpmon command begin with a header, which identifies the date, the machine ID, and the length of the monitoring period in seconds. The header is followed by a set of global and detailed reports for all specified report types.

Process CPU Usage Statistics

Each row describes the CPU usage associated with a process. Unless the verbose (-v) option is specified, only the 20 most active processes are included in the list. At the bottom of the report, CPU usage for all processes is totaled, and CPU idle time is reported. The idle time percentage number is calculated from the idle time divided by the measured interval. The difference between the CPU time totals and measured interval is due to Interrupt handlers.

The Network CPU % is the percentage of total time that this process spent executing network-related code.

If the -t flag is used, a thread CPU usage statistic is also present. Each process row described above is immediately followed by rows describing the CPU usage of each thread owned by that process. The fields in these rows are identical to those for the process, except for the name field. Threads are not named.

In the example report, the Idle time percentage number (76.114 percent) shown in the global CPU usage report is calculated from the Idle time (1221.0235) divided by the measured interval times 4 (401.053 times 4), because there are four CPUs in this server. If you want to look at each CPU's activity, you can use sar, ps, or any other SMP-specific command. Similar calculation applies to the total CPU % that is occupied by all processes. The Idle time is due to network I/O. The difference between the CPU Time totals (1221.0235 + 358.315) and the measured interval is due to interrupt handlers and the multiple CPUs. It appears that in the example report, the majority of the CPU usage was network-related: (20.787 / 22.336) = 93.07 percent. About 77.664 percent of CPU usage is either CPU idle or CPU wait time.

Note

If the result of total network CPU % divided by total CPU % is greater than 0.5 from Process CPU Usage Statistics for NFS server, then the majority of CPU usage is network-related.

This method is also a good way to view CPU usage by process without tying the output to a specific program.

First Level Interrupt Handler CPU Usage Statistics

Each row describes the CPU usage associated with a first-level interrupt handler (FLIH). At the bottom of the report, CPU usage for all FLIHs is totaled.

CPU Time: Total amount of CPU time used by this FLIH
CPU %: CPU usage for this interrupt handler as a percentage of total time
Network CPU %: Percentage of total time that this interrupt handler executed on behalf of network-related events

Second Level Interrupt Handler CPU Usage Statistics

Each row describes the CPU usage associated with a second-level interrupt handler (SLIH). At the bottom of the report, CPU usage for all SLIHs is totaled.

Network Device-Driver Statistics (by Device)

Each row describes the statistics associated with a network device.

Device: Name of special file associated with device
Xmit Pkts/s: Packets per second transmitted through this device
Xmit Bytes/s: Bytes per second transmitted through this device
Xmit Util: Busy time for this device, as a percent of total time
Xmit Qlen: Number of requests waiting to be transmitted through this device, averaged over time, including any transaction currently being transmitted
Recv Pkts/s: Packets per second received through this device
Recv Bytes/s: Bytes per second received through this device
Recv Demux: Time spent in demux layer as a fraction of total time

In this example, the Xmit QLen is only 0.046. This number is very small compared to its default size (30). Its Recv Bytes/s is 273994, much smaller than the Token-Ring transmit speed (16 Mb/s). Therefore, in this case, the network is not saturated, at least from this system's view.

Network Device-Driver Transmit Statistics (by Destination Host)

Each row describes the amount of transmit traffic associated with a particular destination host, at the device-driver level.

Host: Destination host name. An asterisk (*) is used for transmissions for which no host name can be determined.
Pkts/s: Packets per second transmitted to this host.
Bytes/s: Bytes per second transmitted to this host.

TCP Socket Call Statistics for Each Internet Protocol (by Process)

These statistics are shown for each used Internet protocol. Each row describes the amount of read() and write() subroutine activity on sockets of this protocol type associated with a particular process. At the bottom of the report, all socket calls for this protocol are totaled.

NFS Server Statistics (by Client)

Each row describes the amount of NFS activity handled by this server on behalf of a particular client. At the bottom of the report, calls for all clients are totaled.

On a client machine, the NFS server statistics are replaced by the NFS client statistics (NFS Client Statistics for each Server (by File), NFS Client RPC Statistics (by Server), NFS Client Statistics (by Process)).

Detailed Reports of netpmon

Detailed reports are generated for all requested (-O) report types. For these report types, a detailed report is produced in addition to the global reports. The detailed reports contain an entry for each entry in the global reports with statistics for each type of transaction associated with the entry.

Transaction statistics consist of a count of the number of transactions for that type, followed by response time and size distribution data (where applicable). The distribution data consists of average, minimum, and maximum values, as well as standard deviations. Roughly two-thirds of the values are between average minus standard deviation and average plus standard deviation. Sizes are reported in bytes. Response times are reported in milliseconds.

Detailed Second-Level Interrupt Handler CPU-Usage Statistics

The output fields are described as follows:

SLIH: Name of second-level interrupt handler
count: Number of interrupts of this type
cpu time (msec): CPU usage statistics for handling interrupts of this type

Detailed Network Device-Driver Statistics (by Device)

The output fields are described as follows:

DEVICE: Path name of special file associated with device
recv packets: Number of packets received through this device
recv sizes (bytes): Size statistics for received packets
recv times (msec): Response time statistics for processing received packets
demux times (msec): Time statistics for processing received packets in the demux layer
xmit packets: Number of packets transmitted through this device
xmit sizes (bytes): Size statistics for transmitted packets
xmit times (msec): Response time statistics for processing transmitted packets

There are other detailed reports, such as Detailed Network Device-Driver Transmit Statistics (by Host) and Detailed TCP Socket Call Statistics for Each Internet Protocol (by Process). For an NFS client, there are the Detailed NFS Client Statistics for Each Server (by File), Detailed NFS Client RPC Statistics (by Server), and Detailed NFS Client Statistics (by Process) reports. For an NFS server, there is the Detailed NFS Server Statistics (by Client) report. They have similar output fields as explained above.

In the example, the results from the Detailed Network Device-Driver Statistics lead to the following:

recv bytes = 80584 packets * 1364 bytes/packet = 109,916,576 bytes
xmit bytes = 12678 packets * 152 bytes/packet = 1,927,056 bytes
total bytes exchanged = 109,916,576 + 1,927,056 = 111,843,632 bytes
total bits exchanged = 111,843,632 * 8 bits/byte = 894,749,056 bits
transmit speed = 894,749,056 / 401.053 = 2.23 Mb/s (assuming that the copy took the entire monitoring period)

As in the global device driver report, you can conclude that this case is not network-saturated. The average receive size is 1363.6 bytes, near to the default MTU (maximum transmission unit) value, which is 1492 when the device is a Token-Ring card. If this value is larger than the MTU (from lsattr -E -l interface, replacing interface with the interface name, such as en0 or tr0, you could change the MTU or adapter transmit-queue length value to get better performance with the following command:

# ifconfig tr0 mtu 8500

# chdev -l 'tok0' -a xmt_que_size='150'

If the network is congested already, changing the MTU or queue value will not help.

Notes:

If transmit and receive packet sizes are small on the device driver statistics report, then increasing the current MTU size will probably result in better network performance.
If system wait time due to network calls is high from the network wait time statistics for the NFS client report, the poor performance is due to the network.

Limitations of netpmon

The netpmon command uses the trace facility to collect the statistics. Therefore, it has an impact on the system workload, as follows.

In a moderate, network-oriented workload, the netpmon command increases overall CPU utilization by 3-5 percent.
In a CPU-saturated environment with little I/O of any kind, the netpmon command slowed a large compile by about 3.5 percent.

To alleviate these situations, use offline processing and on systems with many CPUs use the -C all flag with the trace command.

The traceroute Command

While the ping command confirms IP network reachability, you cannot pinpoint and improve some isolated problems. Consider the following situation:

When there are many hops (for example, gateways or routes) between your system and the destination, and there seems to be a problem somewhere along the path. The destination system may have a problem, but you need to know where a packet is actually lost.
The ping command hangs up and does not tell you the reasons for a lost packet.

The traceroute command can inform you where the packet is located and why the route is lost. If your packets must pass through routers and links, which belong to and are managed by other organizations or companies, it is difficult to check the related routers through the telnet command. The traceroute command provides a supplemental role to the ping command.

Note

The traceroute command is intended for use in network testing, measurement, and management. It should be used primarily for manual fault isolation. Because of the load it imposes on the network, do not use the traceroute command during typical operations or from automated scripts.

Successful traceroute Examples

The traceroute command uses UDP packets and uses the ICMP error-reporting function. It sends a UDP packet three times to each gateway or router on the way. It starts with the nearest gateway and expands the search by one hop. Finally, the search gets to the destination system. In the output, you see the gateway name, the gateway's IP address, and three round-trip times for the gateway. See the following example:

# traceroute wave
trying to get source for wave
source should be 9.53.155.187
traceroute to wave.austin.ibm.com (9.53.153.120) from 9.53.155.187 (9.53.155.187), 30 hops max
outgoing MTU = 1500
 1  9.111.154.1 (9.111.154.1)  5 ms  3 ms  2 ms
 2  wave (9.53.153.120)  5 ms  5 ms  5 ms

Following is another example:

# traceroute wave
trying to get source for wave
source should be 9.53.155.187
traceroute to wave.austin.ibm.com (9.53.153.120) from 9.53.155.187 (9.53.155.187), 30 hops max
outgoing MTU = 1500
 1  9.111.154.1 (9.111.154.1)  10 ms  2 ms  3 ms
 2  wave (9.53.153.120)  8 ms  7 ms  5 ms

After the address resolution protocol (ARP) entry expired, the same command was repeated. Note that the first packet to each gateway or destination took a longer round-trip time. This is due to the overhead caused by the ARP. If a public-switched network (WAN) is involved in the route, the first packet consumes a lot of memory due to a connection establishment and may cause a timeout. The default timeout for each packet is 3 seconds. You can change it with the -w option.

The first 10 ms is due to the ARP between the source system (9.53.155.187) and the gateway 9.111.154.1. The second 8 ms is due to the ARP between the gateway and the final destination (wave). In this case, you are using DNS, and every time before the traceroute command sends a packet, the DNS server is searched.

Failed traceroute Examples

For a long path to your destination or complex network routes, you may see a lot of problems with the traceroute command. Because many things are implementation-dependent, searching for the problem may only waste your time. If all routers or systems involved are under your control, you may be able to investigate the problem completely.

Gateway (Router) Problem

In the following example, packets were sent from the system 9.53.155.187. There are two router systems on the way to the bridge. The routing capability was intentionally removed from the second router system by setting the option ipforwarding of the no command to 0. See the following example:

# traceroute lamar
trying to get source for lamar
source should be 9.53.155.187
traceroute to lamar.austin.ibm.com (9.3.200.141) from 9.53.155.187 (9.53.155.187), 30 hops max
outgoing MTU = 1500
 1  9.111.154.1 (9.111.154.1)  12 ms  3 ms  2 ms
 2  9.111.154.1 (9.111.154.1)  3 ms !H *  6 ms !H

If an ICMP error message, excluding Time Exceeded and Port Unreachable, is received, it is displayed as follows:

!H: Host Unreachable
!N: Network Unreachable
!P: Protocol Unreachable
!S: Source route failed
!F: Fragmentation needed

Destination System Problem

When the destination system does not respond within a 3-second time-out interval, all queries are timed out, and the results are displayed with an asterisk (*).

# traceroute chuys
trying to get source for chuys
source should be 9.53.155.187
traceroute to chuys.austin.ibm.com (9.53.155.188) from 9.53.155.187 (9.53.155.187), 30 hops max
outgoing MTU = 1500
 1  * * *
 2  * * *
 3  * * *
^C#

If you think that the problem is due to a communication link, use a longer timeout period with the -w flag. Although rare, all the ports queried might have been used. You can change the ports and try again.

Number of "hops" to Destination

Another output example might be as follows:

# traceroute mysystem.university.edu (129.2.130.22)
traceroute to mysystem.university.edu (129.2.130.22), 30 hops max
1 helios.ee.lbl.gov (129.3.112.1) 0 ms 0 ms 0 ms
2 lilac-dmc.university.edu (129.2.216.1) 39 ms 19 ms 39 ms
3 lilac-dmc.university.edu (129.2.215.1) 19 ms 39 ms 19 ms
4 ccngw-ner-cc.university.edu (129.2.135.23) 39 ms 40 ms 19 ms
5 ccn-nerif35.university.edu (129.2.167.35) 39 ms 39 ms 39 ms
6 csgw/university.edu (129.2.132.254) 39 ms 59 ms 39 ms
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 rip.university.EDU (129.2.130.22) 59 ms! 39 ms! 39 ms!

In this example, exactly half of the 12 gateway hops (13 is the final destination) are "missing." However, these hops were actually not gateways. The destination host used the time to live (ttl) from the arriving datagram as the ttl in its ICMP reply; thus, the reply timed out on the return path. Because ICMPs are not sent for ICMPs, no notice was received. The ! (exclamation mark) after each round-trip time indicates some type of software incompatibility problem. (The cause was diagnosed after the traceroute command issued a probe of twice the path length. The destination host was really only seven hops away.)

The iptrace daemon, and the ipreport and ipfilter Commands

You can use many tools for observing network activity. Some run under the operating system, others run on dedicated hardware. One tool that can be used to obtain a detailed, packet-by-packet description of the LAN activity generated by a workload is the combination of the iptrace daemon and the ipreport command. To use the iptrace daemon with operating system version 4, you need the bos.net.tcp.server fileset. The iptrace daemon is included in this fileset, as well as some other useful commands such as the trpt and tcdump commands. The iptrace daemon can only be started by a root user.

By default, the iptrace daemon traces all packets. The option -a allows exclusion of address resolution protocol (ARP) packets. Other options can narrow the scope of tracing to a particular source host (-s), destination host (-d), or protocol (-p). Because the iptrace daemon can consume significant amounts of processor time, be as specific as possible when you describe the packets you want traced.

Because iptrace is a daemon, start the iptrace daemon with the startsrc command rather than directly from the command line. This method makes it easier to control and shut down cleanly. A typical example would be as follows:

# startsrc -s iptrace -a "-i tr0 /home/user/iptrace/log1"

This command starts the iptrace daemon with instructions to trace all activity on the Token-Ring interface, tr0, and place the trace data in /home/user/iptrace/log1. To stop the daemon, use the following:

# stopsrc -s iptrace

If you did not start the iptrace daemon with the startsrc command, you must use the ps command to find its process ID with and terminate it with the kill command.

The ipreport command is a formatter for the log file. Its output is written to standard output. Options allow recognition and formatting of RPC packets (-r), identifying each packet with a number (-n), and prefixing each line with a 3-character string that identifies the protocol (-s). A typical ipreport command to format the log1 file just created (which is owned by the root user) would be as follows:

# ipreport -ns log1 >log1_formatted

This would result in a sequence of packet reports similar to the following examples. The first packet is the first half of a ping packet. The fields of most interest are as follows:

The source (SRC) and destination (DST) host address, both in dotted decimal and in ASCII
The IP packet length (ip_len)
The indication of the higher-level protocol in use (ip_p)

Packet Number 131
TOK: =====( packet transmitted on interface tr0 )=====Fri Jan 14 08:42:07 2000
TOK: 802.5 packet
TOK: 802.5 MAC header:
TOK: access control field = 0, frame control field = 40
TOK: [ src = 90:00:5a:a8:88:81, dst = 10:00:5a:4f:35:82]
TOK: routing control field = 0830,  3 routing segments
TOK: routing segments [ ef31 ce61 ba30  ]
TOK: 802.2 LLC header:
TOK: dsap aa, ssap aa, ctrl 3, proto 0:0:0, type 800 (IP)
IP:     < SRC =  129.35.145.140 >  (alborz.austin.ibm.com)
IP:     < DST =  129.35.145.135 >  (xactive.austin.ibm.com)
IP:     ip_v=4, ip_hl=20, ip_tos=0, ip_len=84, ip_id=38892, ip_off=0
IP:     ip_ttl=255, ip_sum=fe61, ip_p = 1 (ICMP)
ICMP:   icmp_type=8 (ECHO_REQUEST)  icmp_id=5923  icmp_seq=0
ICMP: 00000000     2d088abf 00054599 08090a0b 0c0d0e0f     |-.....E.........|
ICMP: 00000010     10111213 14151617 18191a1b 1c1d1e1f     |................|
ICMP: 00000020     20212223 24252627 28292a2b 2c2d2e2f     | !"#$%&'()*+,-./|
ICMP: 00000030     30313233 34353637                       |01234567        |

The next example is a frame from an ftp operation. Note that the IP packet is the size of the MTU for this LAN (1492 bytes).

Packet Number 501
TOK: =====( packet received on interface tr0 )=====Fri Dec 10 08:42:51 1999
TOK: 802.5 packet
TOK: 802.5 MAC header:
TOK: access control field = 18, frame control field = 40
TOK: [ src = 90:00:5a:4f:35:82, dst = 10:00:5a:a8:88:81]
TOK: routing control field = 08b0,  3 routing segments
TOK: routing segments [ ef31 ce61 ba30  ]
TOK: 802.2 LLC header:
TOK: dsap aa, ssap aa, ctrl 3, proto 0:0:0, type 800 (IP)
IP:     < SRC =  129.35.145.135 >  (xactive.austin.ibm.com)
IP:     < DST =  129.35.145.140 >  (alborz.austin.ibm.com)
IP:     ip_v=4, ip_hl=20, ip_tos=0, ip_len=1492, ip_id=34233, ip_off=0
IP:     ip_ttl=60, ip_sum=5ac, ip_p = 6 (TCP)
TCP:    <source port=20(ftp-data), destination port=1032 >
TCP:    th_seq=445e4e02, th_ack=ed8aae02
TCP:    th_off=5, flags<ACK |>
TCP:    th_win=15972, th_sum=0, th_urp=0
TCP: 00000000     01df0007 2cd6c07c 00004635 000002c2     |....,..|..F5....|
TCP: 00000010     00481002 010b0001 000021b4 00000d60     |.H........!....`|
             --------- Lots of uninteresting data omitted -----------
TCP: 00000590     63e40000 3860000f 4800177d 80410014     |c...8`..H..}.A..|
TCP: 000005a0     82220008 30610038 30910020              |."..0a.80..     |

The ipfilter command extracts different operation headers from an ipreport output file and displays them in a table. Some customized NFS information regarding requests and replies is also provided.

To determine whether the ipfilter command is installed and available, run the following command:

# lslpp -lI perfagent.tools

An example command is as follows:

# ipfilter log1_formatted

The operation headers currently recognized are: udp, nfs, tcp, ipx, icmp. The ipfilter command has three different types of reports, as follows:

A single file (ipfilter.all) that displays a list of all the selected operations. The table displays packet number, Time, Source & Destination, Length, Sequence #, Ack #, Source Port, Destination Port, Network Interface, and Operation Type.
Individual files for each selected header (ipfilter.udp, ipfilter.nfs, ipfilter.tcp, ipfilter.ipx, ipfilter.icmp). The information contained is the same as ipfilter.all.
A file nfs.rpt that reports on NFS requests and replies. The table contains: Transaction ID #, Type of Request, Status of Request, Call Packet Number, Time of Call, Size of Call, Reply Packet Number, Time of Reply, Size of Reply, and Elapsed millisecond between call and reply.

Adapter Statistics

The commands in this section provide output comparable to the netstat -v command. They allow you to reset adapter statistics (-r) and to get more detailed output (-d) than the netstat -v command output provides.

The entstat Command

The entstat command displays the statistics gathered by the specified Ethernet device driver. The user can optionally specify that the device-specific statistics be displayed in addition to the device-generic statistics. Using the -d option will list any extended statistics for this adapter and should be used to ensure all statistics are displayed. If no flags are specified, only the device-generic statistics are displayed.

The entstat command is also invoked when the netstat command is run with the -v flag. The netstat command does not issue any entstat command flags.

# entstat ent0
-------------------------------------------------------------
ETHERNET STATISTICS (ent0) :
Device Type: IBM 10/100 Mbps Ethernet PCI Adapter (23100020)
Hardware Address: 00:60:94:e9:29:18
Elapsed Time: 0 days 0 hours 0 minutes 0 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 0                                    Packets: 0
Bytes: 0                                      Bytes: 0
Interrupts: 0                                 Interrupts: 0
Transmit Errors: 0                            Receive Errors: 0
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 0
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 0                          Broadcast Packets: 0
Multicast Packets: 0                          Multicast Packets: 0
No Carrier Sense: 0                           CRC Errors: 0
DMA Underrun: 0                               DMA Overrun: 0
Lost CTS Errors: 0                            Alignment Errors: 0
Max Collision Errors: 0                       No Resource Errors: 0
Late Collision Errors: 0                      Receive Collision Errors: 0
Deferred: 0                                   Packet Too Short Errors: 0
SQE Test: 0                                   Packet Too Long Errors: 0
Timeout Errors: 0                             Packets Discarded by Adapter: 0
Single Collision Count: 0                     Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport

In the above report, you may want to concentrate on:

Transmit Errors: Number of output errors encountered on this device. This is a counter for unsuccessful transmissions due to hardware/network errors.
Receive Errors: Number of input errors encountered on this device. This is a counter for unsuccessful reception due to hardware/network errors.
Packets Dropped: Number of packets accepted by the device driver for transmission which were not (for any reason) given to the device.
Max Packets on S/W Transmit Queue: Maximum number of outgoing packets ever queued to the software transmit queue.
S/W Transmit Queue Overflow: Number of outgoing packets that have overflowed the transmit queue.
No Resource Errors: Number of incoming packets dropped by the hardware due to lack of resources. This error usually occurs because the receive buffers on the adapter were exhausted. Some adapters may have the size of the receive buffers as a configurable parameter. Check the device configuration attributes (or SMIT helps) for possible tuning information.
Single Collision Count/Multiple Collision Count: Number of collisions on an Ethernet network. These collisions are accounted for here rather than in the collision column of the output of the netstat -i command.

Notice in this example, the Ethernet adapter is behaving well because there are no Receive Errors. These errors are sometimes caused when a saturated network only transmits partial packets. The partial packets are eventually retransmitted successfully but are recorded as receive errors.

If you receive S/W Transmit Queue Overflow errors, the value of Max Packets on S/W Transmit Queue will correspond to the transmit queue limit for this adapter (xmt_que_size).

Note

These values can represent the hardware queue if the adapter does not support a software transmit queue. If there are transmit-queue overflows, then increased the hardware or software queue limits for the driver.

If there are not enough receive resources, this would be indicated by Packets Dropped: and depending on the adapter type, would be indicated by Out of Rcv Buffers or No Resource Errors: or some similar counter.

The elapsed time displays the real-time period that has elapsed since the last time the statistics were reset. To reset the statistics, use the entstat -r adapter_name command.

Similar output can be displayed for Token-Ring, FDDI, and ATM interfaces using the tokstat, fddistat, and atmstat commands.

The tokstat Command

The tokstat command displays the statistics gathered by the specified Token-Ring device driver. The user can optionally specify that the device-specific statistics be displayed in addition to the device driver statistics. If no flags are specified, only the device driver statistics are displayed.

This command is also invoked when the netstat command is run with the -v flag. The netstat command does not issue any tokstat command flags.

The output produced by the tokstat tok0 command and the problem determination are similar to that described in The entstat Command.

The fddistat Command

The fddistat command displays the statistics gathered by the specified FDDI device driver. The user can optionally specify that the device-specific statistics be displayed in addition to the device driver statistics. If no flags are specified, only the device driver statistics are displayed.

This command is also invoked when the netstat command is run with the -v flag. The netstat command does not issue any fddistat command flags.

The output produced by the fddistat fddi0 command and the problem determination are similar to that described in The entstat Command.

The atmstat Command

The atmstat command displays the statistics gathered by the specified ATM device driver. The user can optionally specify that the device-specific statistics be displayed in addition to the device driver statistics. If no flags are specified, only the device driver statistics are displayed.

The output produced by the atmstat atm0 command and the problem determination are similar to that described in The entstat Command.

The no Command

Use the no command to display current network values and to change options.

-a: Prints all options and current values (example: no -a)
-d: Sets options back to default (example: no -d thewall)
-o: option=NewValue (example: no -o thewall=16384)

For a listing of all attributes for the no command, see Network Option Tunable Parameters.

Some network attributes are run-time attributes that can be changed at any time. Others are load-time attributes that must be set before the netinet kernel extension is loaded.

Note

When the no command is used to change parameters, the change is in effect only until the next system boot. At that point, all parameters are initially reset to their defaults. To make the change permanent, put the appropriate no command in the /etc/rc.net file.

If your system uses Berkeley-style network configuration, set the attributes near the top of the /etc/rc.bsdnet file. If you use an SP system, edit the tuning.cust file as documented in the RS/6000 SP: Installation and Relocation manual.

Note

The no command performs no-range checking. If it is used incorrectly, the no command can cause your system to become inoperable.

The following tuning sections discuss some of the no command attributes and how to adjust them.