ITEM: F1423L

nfsstat, mbuf, netstat Tuning Considerations


This item describes how to interpret some of the output of the nfsstat
command, and how it can relate to network memory (lowmbuf, lowclust,
etc.) tuning.  The information that follows is compiled from personal
experience, various HOWTO and HONE ASKQ items, and some excellent
literature references listed at the end of this document.

On the client, run the command:     nfsstat -c

and review the Client rpc data, which could look similar to this:

Client rpc:  calls   badcalls   retrans    badxid    timeout wait  newcred
             38397    0         2517       0         2517      0      0

TOO MANY TIMEOUTS?

If timeout is greater than 5% of calls, rpc requests from the client
are timing out before getting a response from a server.  The cause of
these timeouts can be better understood by looking at badxid.
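In the sample output above, timeout is 2517 against 38397 calls, or
about 6.6 percent, so this client is over the 5 percent guideline.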

If badxid is approximately equal to timeout, the client's initial
request is timing out, the client is retransmitting the request, and
the server is responding to both requests (badxid is incremented at
the client when a response is received to a request that has already
been answered).  Increase the timeo parameter on the client mount, or
tune the server to reduce the average request service time.  Timeouts
of this nature are more likely when client and server are on different
IP subnets, with router(s) in between, particularly if links between
routers are relatively low speed (e.g., 56Kb) when compared to LAN
speeds.

The default timeout on an NFS client mount in AIX is 7 (tenths of a
second).  A good rule of thumb is to increase the timeo parameter by 2
tenths for each router between client and server.  You might want to
increase timeo even more if you are crossing network links that are
significantly slower than local LAN speeds (e.g., 56Kb links between
routers).
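For example, with two routers between client and server, a reasonable
starting point would be 7 + (2 x 2) = 11 tenths; the timeo of 12 shown
in the smit example later in this item is in the same range.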

If badxid is approximately equal to zero with a large timeout, it is
likely that the network is dropping parts of NFS requests or replies
between the NFS client and server.  Reduce the NFS buffer sizes using
the rsize and wsize options on the client mount.
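Note that the sample client output above, with badxid at 0 and timeout
at 2517, fits this pattern.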

Reducing read and write buffer size (rsize and wsize) can give
dramatic performance improvements by "leveling" the bursts of
traffic.  Performance improvements are particularly noticeable when
clients are accessing "large" files on the NFS server (500Kbytes and
larger).  CAD/CAM users often access models that are 1-2Mbyte or
larger in products like CADAM, CATIA, CAEDS, AUTOCAD, etc.

NFS mount options at the client can be adjusted in one of two ways:

1) smitty chnfsmnt
     Buffer SIZE for read                          [4096]          #
     Buffer SIZE for writes                        [4096]          #
     NFS TIMEOUT. In tenths of a second            [12]            #

  The numeric fields above will be blank if the defaults are in effect.

2) edit /etc/filesystems, and find the stanza for the filesystem of
   interest.  On the options line of this stanza, add

   rsize=4096,wsize=4096,timeo=12

   then unmount and remount the filesystem.  As with the smit method,
   these options will not be listed in /etc/filesystems if the defaults
   are in effect.
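
For a quick test before committing either change, the same options can
also be supplied on an explicit mount command.  A hedged sketch (the
server name and directories here are hypothetical):

   mount -o rsize=4096,wsize=4096,timeo=12 server1:/export/home /mnt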

HOW MANY nfsd DAEMONS DO I NEED?

If     netstat -s -p udp      shows any socket buffer overflows, you
may benefit from more nfsd daemons running.  On the other hand, if
nfsstat -s     shows any nullrecv counts in the Server rpc: data, you
have more nfsd daemons running than you need.

From a performance standpoint, it is bad to have excessive numbers of
nfsd daemons running.  These are kernel level processes, running at
higher priority than user work, and always dispatched ahead of user
work.  A nullrecv count indicates an nfsd daemon was called, but there
was no request waiting on the queue for it to process.

Change the number of nfsd daemons by doing a   smitty chnfs
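
To check both indicators and make the change from the command line, a
hedged sketch (the daemon count of 16 is only an example, and the -n
flag of chnfs is assumed to be available at this level of AIX):

   netstat -s -p udp | grep "socket buffer overflows"
   nfsstat -s                   (look at nullrecv in the Server rpc data)
   chnfs -n 16                  (start 16 nfsd daemons)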

mbuf MEMORY TUNING

Beyond just NFS tuning, underlying memory allocation for TCP/IP
networking can also be a performance factor.  Memory for networking is
allocated from the AIX Virtual Memory Manager (VMM) into two pools,
the mbuf pool (each mbuf is 256 bytes) and the cluster pool (an mbuf
cluster is 4096 bytes).

The netstat -m command will have output like the following:

255 mbufs in use:
        17 mbufs allocated to data
        6 mbufs allocated to packet headers
        83 mbufs allocated to socket structures
        115 mbufs allocated to protocol control blocks
        22 mbufs allocated to routing table entries
        8 mbufs allocated to socket names and addresses
        4 mbufs allocated to interface addresses
        17/105 mapped pages in use
        483 Kbytes allocated to network  (27% in use)
        0 requests for memory denied
        0 requests for memory delayed
        0 calls to protocol drain routines

The line that begins "17/105 mapped pages..." indicates that there are
105 pinned clusters, of which 17 are currently in use.  If the
"requests for memory denied" value is nonzero, the mbuf and/or cluster
pools may need to be expanded.  A request for memory denied represents
a request for an mbuf that could not be satisfied, and essentially, a
dropped packet.
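
As a rough cross-check of the sample report, the 255 mbufs in use
(255 x 256 bytes) plus the 17 in-use clusters (17 x 4096 bytes) come to
about 132 Kbytes, which is consistent with the 27% shown of the 483
Kbytes allocated.  To watch for denied requests over time, a simple
sketch (the 60-second interval is arbitrary):

   while true
   do
           netstat -m | grep "requests for memory denied"
           sleep 60
   done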

The netstat -m report can be compared to the existing system
parameters by issuing the no -a command, which reports all current
settings, including:

                 lowclust = 49
                  lowmbuf = 188
                  thewall = 2048
              mb_cl_hiwat = 98

lowclust and lowmbuf are the numbers of 4K clusters and 256-byte mbufs
the system will always try to keep free and available for networking.
When a burst of network activity causes the system to fall below these
thresholds, the netm process is called to allocate more memory for
networking from the VMM.  Memory can be allocated for networking up to
thewall, in this case 2048K (2M).  Notice from netstat -m that the 483
Kbytes allocated is well below thewall.

As network activity subsides, clusters allocated for networking become
free and available again.  When free clusters reach mb_cl_hiwat, the
netm process is called again, this time to give memory back to VMM.
If mb_cl_hiwat is set too close to lowclust, netm may be called
periodically to get more memory for networking, then quickly called
again to give it back, only to be called to allocate memory for
networking again with the next burst of network traffic.  To avoid
this netm "thrashing," mb_cl_hiwat should be at least twice the value
of lowclust.
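Notice that the no -a output above already follows this rule:
mb_cl_hiwat (98) is exactly twice lowclust (49).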

The networking options lowclust, lowmbuf, etc., can be adjusted
dynamically with the no command, like so:

        no -o lowmbuf=200
        no -o lowclust=50
        no -o mb_cl_hiwat=100
        no -o thewall=4096

To make these changes persist through a reboot, add them to the bottom
of /etc/rc.net (or the bottom of /etc/bsd.net, if the customer is using
Berkeley style configuration).
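
A hedged sketch of what the appended lines might look like, reusing the
example values above (rc.net is a shell script, so comment lines are
allowed):

        # network memory tuning -- example values only
        no -o lowmbuf=200
        no -o lowclust=50
        no -o mb_cl_hiwat=100
        no -o thewall=4096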

When changing networking options, the following rules must be adhered
to:

1) when increasing lowclust, increase lowmbuf by at least the same
   amount, because each cluster is pointed to by an mbuf.

2) mb_cl_hiwat should be set to at least twice the value of lowclust.

3) depending on network traffic for the system, thewall may have to be
   increased to hold the lowclust and lowmbuf pools, as well as the memory
   in use for networking.
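
As a check against the example no commands earlier: lowclust goes up by
1 (49 to 50) while lowmbuf goes up by 12 (188 to 200), mb_cl_hiwat (100)
is twice the new lowclust, and thewall is raised to 4096K, consistent
with rule 3.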

OVERFLOWING the IP INPUT QUEUE?

This question is answered by using the crash command:

# crash
> knlist ipintrq
     ipintrq: 0x0149ba68  (add hex 10 to the value returned for ipintrq,
                           and use it for input with the od subcommand)
> od 0149ba78 1
0149ba78: 00000000        (if the value returned is greater than zero,
                           overflows have occurred)
> quit

If overflows have occurred, raise the ipqmaxlen parameter of the no
command in small increments.  The default is 50.
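A first small increment might look like

        no -o ipqmaxlen=60

As with the other no options above, the line can be added to the bottom
of /etc/rc.net if the change should persist across a reboot.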

TRANSMIT and RECEIVE QUEUE SIZE

At the device driver layer, both the transmit and receive queues are
configurable, and it is possible to overrun these queues. The command
netstat -v will show Max Transmits queued and Max Receives queued on
each interface in the system. If these values are close to their
respective queue lengths (use lsattr -E -l device to see the queue
lengths), then the queues should be increased. Another indication that
the transmit queue is being overrun is if the netstat -i command reports
values greater than 0 in the Oerr column (Oerr stands for Outgoing error
and is an indication that the IF layer received an error when it 
called the device driver). In general, increasing the values to 90
seems to handle high throughput loads, but, as with the IP input queue,
these values depend on how the application uses the network.

Transmit and Receive Queue sizes default to 30, which can be too
small, particularly for a server to diskless or dataless clients.
The maximum Transmit and Receive Queue size is 150.

To increase Transmit or Receive Queue size, do

   smitty       (fastpath: smitty commodev)
      devices
         communication
             (choose the adapter of choice)
             Change/Show Characteristics

  TRANSMIT queue size               [100]         +#
  RECEIVE queue size                [100]         +#
  ...
  Apply change to DATABASE only     yes           +

Be sure to toggle the DATABASE only field to yes; you will then have to
reboot for the larger queue sizes to take effect.
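
The same change can be made from the command line.  A hedged sketch,
assuming a token-ring adapter named tok0 and an attribute name of
xmt_que_size (attribute names vary by adapter type, so check the lsattr
output first):

   lsattr -E -l tok0                      (shows current attribute values,
                                           including the queue sizes)
   chdev -l tok0 -a xmt_que_size=100 -P   (-P applies the change to the
                                           database only; reboot afterwards)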

Another (perhaps more common) cause of input errs on a network interface
is the presence of other protocols on your network that the INET
interface does not recognize and therefore counts as errors.  As an
example, consider a tr0 interface that shows input errs in netstat -i
output.  On one such machine, an iptrace and netstat were run over
approximately a "measured minute," with the following commands:

netstat -I tr0 60
iptrace -d <this host> -i tr0 /tmp/nettrace  (all input frames on tr0)

The netstat command showed 127 packets, 8 errs.  The /tmp/nettrace 
file was formatted with the following command

ipreport -r -n -s /tmp/nettrace >/tmp/netreport

and the /tmp/netreport file showed a corresponding 8 packets with
the following LLC header information:

TOK: dsap aa, ssap aa, ctrl 3, proto 0:0:0, type 600 (unknown)

For reference, a Network General Sniffer identified type 600 LLC
packets as XNS protocol.  There may be other protocols on your network
that are equally unknown to INET.

LITERATURE REFERENCES

 Hal Stern, "Managing NFS and NIS," O'Reilly & Associates, Inc., April 1992.

 Performance Monitoring and Tuning Guide, SC23-2365.



Support Line: nfsstat, mbuf, netstat Tuning Considerations ITEM: F1423L
Dated: November 1993 Category: N/A