HOW TO TUNE AIX KERNEL MBUF SETTING?

ITEM: RTA000024192



Q: How do I change the AIX kernel mbuf and cluster pools?  I am getting         
many "requests for memory denied" when I issue a 'netstat -m' command.          
                                                                                
---------- ---------- ---------- --------- ---------- ----------                
A: This information was found in a forum.  Because I feel the entire            
document is of use in solving this type of problem, I have appended             
it here.                                                                        
                        AIX 3.2 Network Tuning Guide                            
                                                      April 27, 1992            
        1.  Tuning the memory buffer (mbuf) pools                               
        1.1  Why tune the mbuf pools                                            
        The network subsystem uses a memory management facility that            
        revolves around a data structure called an "mbuf".  Mbufs               
        are mostly used to store data for incoming and outbound                 
        network traffic.  Having mbuf pools of the right size can              
        have a very positive effect on network performance. If the              
        mbuf pools are configured improperly, both network and                  
        system performance can suffer.  AIX offers the capability               
        for run-time mbuf pool configuration. With this convenience             
        comes the responsibility for knowing when the pools need                
        adjusting and how much they should be adjusted.                         
        1.2  Overview of the mbuf management facility                           
        The mbuf management facility controls two pools of buffers:             
        a pool of small buffers (256 bytes each), which are simply              
        called "mbufs", and a pool of large buffers (4096 bytes                 
        each), which are usually called "mbuf-clusters" or just                 
        "clusters". The pools are created from system memory by                 
        making an allocation request to the Virtual Memory Manager              
        (VMM). The pools consist of pinned pieces of virtual memory;            
        this means that they must always reside in physical memory             
        and are never paged out. The result is that the real memory             
        available for paging-in application programs and data has               
        been decreased by the amount that the mbuf pools have been              
        increased. This is a non-trivial cost that must always be               
        taken into account when considering an increase in the size             
        of the mbuf pools.                                                      
        The initial size of the mbuf pools is system-dependent.
        There is a minimum number of (small) mbufs and clusters                 
        allocated for each system, but these minimums are increased             
        by an amount that depends on the specific system                        
        configuration.  One factor affecting how much they are                  
        increased is the number of communications adapters in the               
        system. The default pool sizes are initially configured to              
        handle small to medium size network loads (network traffic              
        100-500 packets/second). The pool sizes dynamically increase           
        as network loads increase. The cluster pool size is reduced             
        as network loads decrease.  The mbuf pool is never reduced.             
        To optimize network performance, the administrator should               
        balance mbuf pool sizes with network loads (packets/second).            
        If the network load is particularly oriented towards UDP                
        traffic (e.g. NFS server) the size of the mbuf pool should              
        be 2 times the packet/second rate. This is due to UDP                   
        traffic consuming an extra small mbuf.                                  
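        As a quick worked example of this rule of thumb (the 1500
        packets/sec rate is an assumption, borrowed from the NFS-server
        script later in this document):

```shell
# Rule of thumb above: UDP traffic consumes an extra small mbuf per
# packet, so size the small mbuf pool at twice the packet rate.
RATE=1500                  # assumed packets/sec for a busy NFS server
echo "$((RATE * 2))"       # -> 3000 small mbufs
```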
        To provide an efficient mbuf allocation service, an attempt             
        is made to maintain a minimum number of free buffers in the             
        pools at all times. The following network options (which can            
        be manipulated using the no command)  are used to define                
        these lower limits:                                                    
           o lowmbuf                                                            
           o lowclust                                                           
        The lowmbuf option controls the minimum number of free                  
        buffers for the mbuf pool. The lowclust option controls the             
        minimum number of free buffers for the cluster pool.  When              
        the number of buffers in the pools drops below the lowmbuf or
        lowclust thresholds, the pools are expanded by some amount.
        The expansion of the mbuf free pools is not done                        
        immediately, but is scheduled to be done by a kernel process            
        with the process name of "netm".  When netm is dispatched,              
        the pools will be expanded to meet the minimum requirements             
        of lowclust and lowmbuf. Having a kernel process do this                
        work is required by the structure of the VMM.                           
        An additional function that netm provides is to limit the               
        growth of the cluster pool. The network option that defines            
        this maximum value is:                                                  
           o mb_cl_hiwat                                                        
        The mb_cl_hiwat option controls the maximum number of free              
        buffers the cluster pool can contain. When the number of                
        free clusters in the pool exceeds mb_cl_hiwat, netm will be             
        scheduled to release some of the clusters back to the VMM.              
        The last network option that is used by the mbuf management             
        facility is                                                             
           o thewall                                                            
        The thewall option controls the maximum RAM (in K bytes)                
        that the mbuf management facility can allocate from the VMM.            
        This option is used to prevent unbalanced VMM resources                 
        which result in poor system performance.                                
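        All four options above can be read and changed at run time with
        the no command. A minimal fragment (AIX only, must be run as
        root; 4096 is an arbitrary example value, not a
        recommendation):

```shell
# Query the current mbuf-related thresholds one at a time.
no -o lowmbuf
no -o lowclust
no -o mb_cl_hiwat
no -o thewall            # prints, e.g., "thewall = 2048"
# Raise thewall; the change takes effect immediately, no reboot.
no -o thewall=4096
```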
        1.3  When to tune the mbuf pools                                       
        When and how much to tune the mbuf pools is directly related            
        to the network load a given machine is being subjected to. A            
        server machine that is supporting many clients is a good                
        candidate for having the mbuf pools tuned to optimize                   
        network performance.  It is important for the system                    
        administrator to understand the networking load for a given             
        system. By using the netstat command you can get a rough                
        idea of the network load in packets/second. For example:                
        netstat -I tr0 1 reports the input and output traffic for               
        both the tr0 network interface and for all network                      
        interfaces on the system. The output below shows the                    
        activity caused by a large ftp operation:                               
           input   (tr0)     output          input  (Total)    output           
      packets errs  packets errs  colls packets errs  packets  errs colls       
        183     0     349     0     0     183     0     349     0     0        
        183     0     353     0     0     183     0     353     0     0         
        203     0     380     0     0     203     0     380     0     0         
        189     0     363     0     0     189     0     363     0     0         
        158     0     293     0     0     158     0     293     0     0         
        191     0     365     0     0     191     0     365     0     0         
        179     0     339     0     0     179     0     339     0     0         
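        One way to turn such samples into a rough packets/second figure
        is to average a column, for example with awk (a portable
        sketch using the input-packet samples from the table above):

```shell
# Average the tr0 input-packet column shown above (integer result).
printf '%s\n' 183 183 203 189 158 191 179 |
    awk '{ sum += $1; n++ } END { printf "%d\n", sum / n }'   # -> 183
```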
        The netstat command also has an option, -m, that gives                  
        detailed information about the use and availability of the              
        mbufs and clusters:
        182 mbufs in use:                                                       
                17 mbufs allocated to data                                      
                2 mbufs allocated to packet headers                             
                60 mbufs allocated to socket structures                         
                83 mbufs allocated to protocol control blocks                   
                11 mbufs allocated to routing table entries                    
                6 mbufs allocated to socket names and addresses                 
                3 mbufs allocated to interface addresses                        
        16/54 mapped pages in use                                               
        261 Kbytes allocated to network (41% in use)                            
        0 requests for memory denied                                            
        0 requests for memory delayed                                           
        0 calls to protocol drain routines                                      
        The line that begins "16/54 mapped pages..." indicates that             
        there are 54 pinned clusters, of which 16 are currently in              
        use. If the "requests for memory denied" value is nonzero,              
        the mbuf and/or cluster pools may need to be expanded.                  
        This report can be compared against the existing system                 
        parameters by issuing the command no -a which reports all of           
        the current settings (the following report has been                     
        abbreviated):                                                           
                         lowclust = 29                                          
                          lowmbuf = 88                                          
                          thewall = 2048                                        
                      mb_cl_hiwat = 58                                          
        It is clear that on the test system the "261 Kbytes
        allocated to the network" is considerably short of the
        thewall value of 2048K, and the (54-16 = 38) free clusters
        are short of the mb_cl_hiwat limit of 58.
        The "requests for memory denied" counter is maintained by               
        the mbuf management facility and is incremented each time a             
        request for an mbuf allocation cannot be satisfied.                     
        Normally the  "requests for memory denied" value will be                
        zero. If a system experiences a high burst of network                  
        traffic, the default configured mbuf pools will not be                  
        sufficient to meet the demand of the incoming burst, causing            
        the error counter to be incremented once for each mbuf                  
        allocation request that fails. Usually this is in the                   
        thousands due to the large number of packets arriving all at            
        once. The "requests for memory denied" statistic will
        correspond with dropped packets on the network. Dropped
        network packets mean re-transmissions, resulting in degraded            
        network performance.  If the "requests for memory denied"               
        value is greater than zero it may be appropriate to tune the            
        mbuf parameters -- see "How to tune the mbuf Pools", below.             
        The "Kbytes allocated to the network" statistic is                      
        maintained by the mbuf management facility and represents               
        the current amount of system memory that has been allocated             
        to both mbuf pools.  The upper bound of this statistic,
        set by thewall, is used to prevent the mbuf management facility
        from consuming too much of a system's physical memory.  The             
        default value for thewall limits the mbuf management                    
        facility to 2048K bytes (as shown in the above no -a                    
        report). If the "Kbytes allocated to the network" value
        approaches thewall, it may be appropriate to tune the mbuf
        parameters -- see "How to tune the mbuf pools", below.
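        With the figures from the reports above, the distance from
        thewall can be computed directly (261 and 2048 are the example
        values shown earlier):

```shell
# Percentage of thewall currently allocated to the network.
ALLOC=261        # "Kbytes allocated to network" from netstat -m
THEWALL=2048     # thewall value from no -a, in Kbytes
echo "$((ALLOC * 100 / THEWALL))%"    # -> 12%
```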
        The netm kernel process runs at a very favored priority                 
        (fixed 37). Because of this,  excessive netm dispatching can            
        cause not only poor network performance but also poor system            
        performance because of contention with other system and user            
        processes. Improperly configured pools can result in netm               
        "thrashing" due to conflicting network traffic needs and                
        improperly tuned thresholds. netm dispatching can be                   
        minimized by properly configuring the mbuf pools to match               
        system and networking needs.                                            
        There are cases where the above indicators suggest that the             
        mbuf pools may need to be expanded, when in fact there is a             
        system problem that should be corrected first. For example:             
           o mbuf memory leak                                                   
           o queued data not being read from socket or other                    
             internal queueing structure                                        
        An mbuf memory leak is a situation in which some kernel or              
        kernel-extension code has neglected to release an mbuf                  
        resource and has destroyed the pointer to its memory                    
        location, thereby losing the address of the mbuf forever. If            
        this occurs repeatedly, eventually all the mbuf resources               
        will be used up.  If the netstat mbuf statistics show a                 
        gradual increase in usage that never decreases or high mbuf            
        usage on a relatively idle system, there may be an mbuf                 
        memory leak.  Developers of kernel extensions that use mbufs            
        should always include checks for memory leaks in their                  
        testing.                                                                
        It is also possible to have a large number of mbufs queued              
        at the socket layer because of an application defect.                   
        Normally an application program would read data from the                
        socket, causing the mbufs to be returned back to the mbuf               
        management facility. An administrator can monitor the                   
        netstat -m mbuf statistics  and look for high mbuf usage                
        while there is no expected network traffic. The                         
        administrator can also view the current list of running                 
        processes (ps -ef) and scan for those that use the network              
        subsystem with large amounts of CPU time being used. If this            
        behavior is observed, the suspected application defect                 
        should be isolated and fixed.                                           
        1.4  How to tune the mbuf pools                                         
        With an understanding of how the mbuf pools are organized               
        and managed, tuning the mbuf pools is simple in AIX and can             
        be done at run-time (unlike other UNIX systems in which the             
        kernel must be recompiled and the system rebooted).                     
        The network options (no) command can be used by root to                 
        modify the mbuf pool parameters. Some guidelines are:
           o After expanding the pools, use the vmstat command to               
             ensure that paging rates have not increased. If you                
             cannot expand the pools to the necessary levels without            
             adversely affecting the paging rates, additional memory            
             may be required.                                                  
           o When adjusting lowclust, lowmbuf should be increased
             by at least as much as lowclust, since for every
             cluster there will exist an mbuf that points to it.
           o mb_cl_hiwat should remain at least two times greater               
             than lowclust at all times. This will prevent the netm             
             thrashing discussed earlier.                                       
           o When adjusting lowclust and lowmbuf, thewall may need              
             to be increased to prevent pool expansions from hitting            
             thewall.                                                           
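        The relationships in these guidelines can be sanity-checked
        before any no commands are run (a sketch; the three values are
        hypothetical, not AIX defaults):

```shell
# Hypothetical candidate values -- substitute your own before use.
LOWCLUST=200
LOWMBUF=400
MB_CL_HIWAT=500
# lowmbuf should grow by at least as much as lowclust.
[ "$LOWMBUF" -ge "$LOWCLUST" ] && echo "lowmbuf >= lowclust: ok"
# mb_cl_hiwat should stay at least twice lowclust (avoids netm thrashing).
[ "$MB_CL_HIWAT" -ge "$((2 * LOWCLUST))" ] && echo "mb_cl_hiwat >= 2*lowclust: ok"
```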
        The following is an example shell script that might be                  
        placed at the end of /etc/rc.net to tune the mbuf pools for             
        an NFS server that experiences a network traffic load of                
        approximately 1500 packets/sec.                                         
         #!/bin/ksh
        # echo "Tuning mbuf pools..."                                           
        # set maximum amount of memory to allow for allocation                  
        no -o thewall=10000                                                     
        # set minimum number of small mbufs                                     
        no -o lowmbuf=3000                                                      
        # generate network traffic to force small mbuf pool expansion           
        ping 127.0.0.1  1000 1 >/dev/null                                       
         # set minimum number of small mbufs back to default to prevent
         # netm from running unnecessarily.
        no -d lowmbuf                                                           
        # set maximum number of large mbufs before pool expansion               
        no -o mb_cl_hiwat=1500                                                  
        # gradually expand large mbuf pool                                      
        N=10                                                                    
         while [ $N -lt 1500 ]
        do                                                                      
          no -o lowclust=$N                                                     
          ping 127.0.0.1 1000 1 >/dev/null                                      
          let N=N+10                                                            
        done                                                                    
         # set minimum number of large mbufs back to default to prevent
         # netm from running unnecessarily.
        no -d lowclust                                                          
        You can use netstat -m following the above script to verify             
        the size of the pool of clusters (which netstat calls                   
        "mapped pages"). To verify the size of the pool of mbufs you            
        can use the crash command to examine a kernel data                      
        structure, mbstat (see /usr/include/sys/mbuf.h). The kernel             
        address of mbstat can be displayed while in crash using od              
        mbstat. You will then need to od that address to dump
        the first word in the mbstat structure, which contains the              
        size of the mbuf pool. The dialog would be approximately as             
        follows:                                                                
        $ crash                                                                 
        > od mbstat                                                             
        000e2be0: 001f7008                                                      
        > od 1f7008                                                             
        001f7008: 00000130                                                      
        > quit                                                                  
        The size of the mbuf pool is therefore 130 (hex), or
        304 (decimal).
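        The final hex-to-decimal step can be checked with printf
        (0x130 is the word read from mbstat above):

```shell
# Convert the hex word read from the mbstat structure to decimal.
printf '%d\n' 0x130    # -> 304
```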
                                                                                
---------- ---------- ---------- --------- ---------- ----------               
                                                                                
                                                                                
This item was created from library item Q589150      2VRDS                      
                                                                                
Additional search words:                                                        
AIX ALTERNATE COMMUNICATIO INDEX IX JUL92 KERNEL MBUF PERFORMANCE               
RISCSYSTEM RISCTCP SETTING SOFTWARE TCPIP TUNE TUNING 2VRDS                     
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                               


WWQA: ITEM: RTA000024192 ITEM: RTA000024192
Dated: 04/1996 Category: RISCPERF