
Performance Management Guide


Tuning the SP Network

This section provides information about network tunable parameters that need special attention in an SP environment. For a more detailed discussion on SP tuning, see RS/6000 SP System Performance Tuning.

SP Switch Statistics

Note: The commands in this section are SP-specific.

The estat Command

The unsupported and undocumented estat command can be helpful in determining SP Switch problems. The estat command is located in the /usr/lpp/ssp/css directory and produces output similar to that of the entstat command. The output contains sections for transmit, receive, and general statistics.

The fields most helpful in determining SP Switch problems are Transmit Errors, Receive Errors, Packets Dropped, and No mbuf Errors. The second line of the output indicates how long the adapter has been online.
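A typical invocation, assuming the switch interface is css0 (because the command is undocumented, the argument convention shown here follows that of entstat and may differ on your system), looks like the following:

# /usr/lpp/ssp/css/estat css0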

The vdidlxxxx Commands

These unsupported and undocumented commands display the SP Switch pool usage since the SP Switch was last started. There are several commands called vdidlxxxx (where vdidl3 is for an MCA-based node, vdidl3mx for a 332 MHz node, and vdidl3pci for the S70 and S7A). These commands are found in the /usr/lpp/ssp/css directory on each node. For the SP Switch, only the send pool is used because microcode in the adapter manages the receive pool.

Following is an example for the vdidl3 command:

# /usr/lpp/ssp/css/vdidl3 -i
send pool: size=524288 anchor@=0x50002c00 start@=0x50dc0000 tags@=0x50001d00
bkt   allocd     free  success     fail    split     comb    freed
 12        0        0        0        0        1        0        0
 13        0        0        0        0        0        0        0
 14        0        0        0        0        0        0        0
 15        0        0        0        0        0        0        0
 16        0        8        0        0        0        0        0
rsvd pool: size=262144 anchor@=0x50002000 start@=0x50e40000 tags@=0x50b84680
bkt   allocd     free  success     fail    split     comb    freed
 12        0        0        0        0        0        0        0
 13        0        0        0        0        0        0        0
 14        0        0        0        0        0        0        0
 15        0        0        0        0        0        0        0
 16        0        4        0        0        0        0        0
recv pool: size=524288 anchor@=0x50002e00 start@=0x50e80000 tags@=0x50001e00
bkt   allocd     free  success     fail    split     comb    freed
 12        0        0        0        0        0        0        0
 13        0        0        0        0        0        0        0
 14        0        0        0        0        0        0        0
 15        0        0        0        0        0        0        0
 16        0        0        0        0        0        0        0

Interpret the output carefully, because some of the statistics have several meanings. Each column is described as follows:

bkt
Lists the pool allocation in powers of 2 for the line it is on. The line starting with 12 means 2 to the 12th or 4 KB allocations, and the line starting with 16 means 2 to the 16th or 64 KB allocations.

allocd
Lists the current allocations at the time the command was run, for each of the size allocations in the first column. This "snapshot" value fluctuates between successive executions of the command.

free
Lists the number of buffers of each allocation size that are allocated and unused at the time the command was run. In the above example, eight 64 KB allocations are free for use. This instantaneous value can fluctuate between successive executions of the command.

success
This counter increments every time an allocation of the given size succeeds. This counter is cumulative, so it shows the number of successes since the adapter was last initialized.

fail
This counter is incremented every time an allocation of the requested size is not immediately available. The request can still be satisfied by splitting a larger allocation or combining smaller ones into the size needed, so a fail does not necessarily mean a packet was dropped. Conversely, an allocation can be split from a larger allocation without incrementing the fail count. This counter is cumulative, so it shows the number of fails since the adapter was last initialized.

split
This counter indicates how many times the allocation size was extracted from the pool by carving the size needed from a larger allocation in the pool. This counter is cumulative, so it shows the number of splits since the adapter was last initialized.

comb
This field is currently not used.

freed
This field is currently not used.

SP System-Specific Tuning Recommendations

The following are specific details for setting the network tunables for the SP system.

arptab_bsiz
Number of ARP cache entries in each bucket (default = 7). See the following table for recommendations.

arptab_nb
Number of ARP cache buckets (default = 25). See the following table for recommendations.

CSS MTU size
The recommended MTU of a switch is 65520. However, under some circumstances (for example, when you want to avoid the Nagle algorithm causing very slow traffic), it may be necessary to reduce this value. You can reduce the MTU of a switch to 32768 with only a 2 to 10 percent loss in throughput. However, CPU utilization will be slightly higher due to the per-packet overhead.

To reduce the MTU of the switch, run the following:
ifconfig css0 mtu new_size
This command takes effect immediately and must be run as root user. Always use the same MTU across all nodes in an SP.
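Because the MTU must match on every node, a command such as the following could be run from the Control Workstation to set it everywhere at once, assuming the PSSP dsh command is available and the switch interface is css0 on all nodes:

dsh -a /usr/sbin/ifconfig css0 mtu 32768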

rfc1323
See rfc1323.

sb_max
See TCP Socket Buffer Tuning.

tcp_mssdflt
See When to Use the tcp_mssdflt Option of the no Command.

tcp_sendspace and tcp_recvspace
See TCP Socket Buffer Tuning. These parameters should never be higher than the major network adapter transmit queue limit. To calculate this limit, use:
(major adapter queue size) * (major network adapter MTU)
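For example, for a hypothetical adapter with a transmit queue limit of 512 packets and an MTU of 1500 bytes, the ceiling would be 512 * 1500 = 768000 bytes, so tcp_sendspace and tcp_recvspace should not exceed that value on such an interface.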

thewall
See Tuning Network Memory.

udp_sendspace and udp_recvspace
See UDP Socket Buffer Tuning.

By default, the maximum number of ARP entries allowed is 175 (25 * 7). This default value of 175 might not be large enough in SP environments with many nodes. An inadequate number of slots in the ARP cache will slow the performance of nodes in the system. Use the following table to estimate optimal values for the arptab_nb and arptab_bsiz variables.

Number of Nodes    arptab_nb        Number of Interfaces    arptab_bsiz
1 - 64             25               1 - 3                   7
65 - 128           64               4                       8
129 - 256          128              5                       10
257 - 512          256              more than 5             2 times the number of interfaces

In general, arptab_nb increases monotonically with the number of nodes and arptab_bsiz with the number of IP interfaces in an SP system.

These parameters must be placed in the first section of the /etc/rc.net file in front of the configuration methods.
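The following is a sketch of such an addition to /etc/rc.net, using the table's values for a 128-node system with 4 interfaces per node (adjust the values to your configuration):

# ARP cache tuning; must run before the interface configuration methods
if [ -f /usr/sbin/no ] ; then
    /usr/sbin/no -o arptab_nb=64       # 64 ARP cache buckets
    /usr/sbin/no -o arptab_bsiz=8      # 8 entries per bucket
fi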

Managing Tunable SP Parameters

The SP usually requires that tunable settings be changed from the default values in order to achieve optimal performance of the entire system. Placement of these tunable values is crucial. If they are not set in the correct places, subsequent rebooting of the nodes or other changes can cause them to change or be lost.

For all dynamically tunable values (those that take effect immediately), the settings for each node should be placed in the tuning.cust file. This file is found in the /tftpboot directory on each node. There is also a copy of the file in this same directory on the Control Workstation (CWS). Tunable parameters changed using the no, nfso, or vmtune commands can be included in this file. Even though the sample files do not include nfso and vmtune commands, such commands can be added, as in the sketch that follows.
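A minimal sketch of such entries in /tftpboot/tuning.cust follows; the values are illustrative, not recommendations:

/usr/sbin/no -o tcp_sendspace=262144        # dynamic no options
/usr/sbin/no -o tcp_recvspace=262144
/usr/sbin/no -o rfc1323=1
/usr/sbin/nfso -o nfs_socketsize=200000     # nfso commands can be added as well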

A small number of tuning recommendations that are not dynamically tunable need to be set in the rc.net file. These tunable parameters cover ARP cache tuning and setting the number of adapter types per interface, and they are the only tunable parameters that should be added to rc.net.

The sample tuning.cust settings selected as part of the installation are a sufficient starting point for the SP nodes in the chosen environment type.

If the system has nodes that require different tuning settings, it is recommended that a copy of each variant of the file be saved on the CWS. When nodes with specific tuning settings are installed, the appropriate version of tuning.cust must be moved into /tftpboot on the CWS.

Another option is to create one tuning.cust file that determines the node number and, based on that number, sets the appropriate tuning values, as in the sketch that follows.
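A sketch of this approach, assuming the PSSP node_number command is available to identify the node (substitute whatever mechanism your installation provides); the node groupings and values are hypothetical:

#!/bin/ksh
# Branch on the node number to apply per-node tuning values.
NODE=`/usr/lpp/ssp/install/bin/node_number`
case $NODE in
1|2)    # hypothetical server nodes: large TCP buffers
        /usr/sbin/no -o tcp_sendspace=262144 -o tcp_recvspace=262144 ;;
*)      # all other nodes
        /usr/sbin/no -o tcp_sendspace=65536 -o tcp_recvspace=65536 ;;
esac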

Initial Settings of SP Tunable Parameters

When a node is installed, migrated, or customized, and that node's boot/install server does not have a /tftpboot/tuning.cust file, a default file with performance tuning variable settings, /usr/lpp/ssp/install/tuning.default, is copied to /tftpboot/tuning.cust on that node. You can choose from one of the four sample tuning files, or you can create and customize your own. The sample files are located in the /usr/lpp/ssp/install/config directory and are as follows:

tuning.commercial
Contains initial performance tuning parameters for a typical commercial environment.

tuning.development
Contains initial performance tuning parameters for a typical interactive or development environment. These are the default tuning parameters.

tuning.scientific
Contains initial performance tuning parameters for a typical engineering/scientific environment.

tuning.server
Contains initial performance tuning parameters for a typical server environment.

The other option is to create and select your own alternate tuning file. While this may not be the initial choice, it is likely to become necessary as you refine the settings for your workload. On the CWS, create a tuning.cust file, or begin with one of the sample files. Edit the tuning.cust file, then proceed with the installation of the nodes. This tuning.cust file is then propagated to each node's /tftpboot/tuning.cust file from the boot/install server when the node is installed, migrated, or customized. The tuning file is maintained across reboots.

Tuning the SP Network for Specific Workloads

The following table provides a combined overview of tunable parameters for different environments. The settings given are only initial settings and are not guaranteed to be optimized for your environment. Examine your specific implementation and adjust your tuning settings accordingly.

Parameter                                 Commercial   Server    Scientific   Development
thewall                                   16384        65536     16384        16384
sb_max                                    1310720      1310720   1310720      1310720
subnetsarelocal                           1            1         1            1
ipforwarding                              1            1         1            1
tcp_sendspace                             262144       65536     655360       65536
tcp_recvspace                             262144       65536     655360       65536
udp_sendspace                             65536        65536     65536        32768
udp_recvspace                             655360       655360    655360       65536
rfc1323                                   1            1         1            1
tcp_mssdflt                               1448         1448      Varies (*)   1448
tcp_mtu_discover (AIX 4.2.1 and later)    1            1         1            1
udp_mtu_discover (AIX 4.2.1 and later)    1            1         1            1

(*) Varies depending on the other network types involved.
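
To compare a node's current settings against the table, the relevant values can be listed with the no command, for example:

no -a | egrep "thewall|sb_max|subnetsarelocal|ipforwarding|tcp_sendspace|tcp_recvspace|udp_sendspace|udp_recvspace|rfc1323|tcp_mssdflt|mtu_discover"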

