This fax addresses memory leaks and what to do when paging space runs out. When a process first touches a page it has allocated in memory (RAM), a corresponding page in the designated paging space is allocated. As the process continues to acquire memory, paging space continues to be "reserved," or allocated. When the process finishes, it is responsible for deallocating the paging space it used. A memory leak occurs when a process fails to deallocate its paging space. Paging space may then become entirely consumed, which will eventually hang the system.
This document applies to AIX Version 4.x.
Processes requesting additional memory are killed once the system runs low on paging space. The system appears hung as new processes and telnet connections are terminated. Error messages like "Not enough memory" or "Fork function failed" are generated. There are three ways to resolve this situation.
An example output of the command 'lsps -s' looks like the following:
Total Paging Space   Percent Used
      200MB               51%
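To turn the percentage into an absolute figure, the summary can be parsed in a few lines. The following is a minimal sketch in Python, assuming the output contains a size in MB followed by a percentage, as in the sample above. The function name parse_lsps_summary is hypothetical and not part of AIX.

```python
import re

def parse_lsps_summary(text):
    """Return (total_mb, percent_used, used_mb) from 'lsps -s' style output.

    Assumes the data contains a size such as '200MB' followed by a
    percentage such as '51%', as in the sample output in this document.
    """
    match = re.search(r"(\d+)MB\s+(\d+)%", text)
    if match is None:
        raise ValueError("no '<size>MB <percent>%' pair found")
    total_mb = int(match.group(1))
    percent_used = int(match.group(2))
    used_mb = total_mb * percent_used / 100
    return total_mb, percent_used, used_mb

# Sample text copied from the 'lsps -s' output shown in this document.
sample = """Total Paging Space   Percent Used
      200MB               51%
"""
total, pct, used = parse_lsps_summary(sample)
print(f"{used:.0f}MB of {total}MB paging space in use ({pct}%)")
```

Recording this figure at regular intervals gives a simple picture of how quickly paging space is being consumed.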
Discussed below are ways to find out what process is causing the memory leak and the tools used to accomplish this task:
A sample output from 'ps vg | pg' looks like the following:
   PID    TTY STAT      TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
     0      - A        87:42    6    20     8    xx     0     0  0.1  0.0 swapper
     1      - A       191:58   94   240   240    xx    25    28  0.3  0.0 /etc/init
   516      - A     70228:47    0    16    20    xx     0     0 97.0  0.0 kproc
   774      - A         5:53    1    24    28    xx     0     0  0.0  0.0 kproc
  1032      - A        28:40    0    56    56    xx     0     0  0.0  0.0 kproc
  1866      - A         0:00    0    24    20    xx     0     0  0.0  0.0 kproc
  2174  pts/1 A         2:55   31   420   544 32768   260   164  0.0  1.0 aixterm
  2454      - A         1:32   62   272   224    xx    96    60  0.0  0.0 /usr/dt/b
Collect 'ps vg' output at several points during the period in which "%Used" from 'lsps -s' grows toward 99%. Then compare the readings and look for large numerical increases in the "SIZE" column. A leaking process will show an extraordinarily large increase in the amount of paging space it uses from one 'ps vg' reading to the next.
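The comparison of two readings can be sketched as follows. This is an illustrative Python example, assuming the thirteen-column 'ps vg' layout shown in this document. The function names are hypothetical, and the second snapshot is invented purely to demonstrate the comparison.

```python
def parse_ps_vg(text):
    """Map PID -> (SIZE in KB, command) from 'ps vg' style output.

    Assumes the column order shown in this document:
    PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS %CPU %MEM COMMAND
    """
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header, and anything too short to be a row.
        if len(fields) < 13 or not fields[0].isdigit():
            continue
        sizes[int(fields[0])] = (int(fields[5]), " ".join(fields[12:]))
    return sizes

def size_growth(before_text, after_text):
    """List (delta_kb, pid, command) for PIDs in both snapshots, largest first."""
    before = parse_ps_vg(before_text)
    after = parse_ps_vg(after_text)
    growth = []
    for pid, (size_after, command) in after.items():
        if pid in before:
            growth.append((size_after - before[pid][0], pid, command))
    return sorted(growth, reverse=True)

HEADER = "PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS %CPU %MEM COMMAND\n"
# First snapshot: two rows taken from the sample output in this document.
before_snapshot = HEADER + (
    "1 - A 191:58 94 240 240 xx 25 28 0.3 0.0 /etc/init\n"
    "2174 pts/1 A 2:55 31 420 544 32768 260 164 0.0 1.0 aixterm\n"
)
# Second snapshot: HYPOTHETICAL values, invented for illustration only.
after_snapshot = HEADER + (
    "1 - A 191:59 94 240 240 xx 25 28 0.3 0.0 /etc/init\n"
    "2174 pts/1 A 3:10 31 1420 1544 32768 260 164 0.0 3.0 aixterm\n"
)
for delta_kb, pid, command in size_growth(before_snapshot, after_snapshot):
    print(f"{pid:>6} {delta_kb:>+8}K {command}")
```

Processes that start or exit between the two readings are ignored here, since no difference can be computed for them.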
There is a tool that creates delta reports of 'ps vg' over any designated period of time. The script is called 'ps_' and is located in /usr/sbin/perf/pmr. It is only available at AIX 4.1.x and above. This tool is not installed by default. The fileset name is bos.perf.pmr and can be installed from install media. This 'ps_' script is run with the following syntax:
ps_ <#seconds to run>
It takes a 'ps vg' snapshot at the beginning and again at the end of the designated time period, then creates a delta report (final values minus initial values). The output file for 'ps_' is called "ps.sum" and is created in /var/perf/tmp.
For example, a system user notices that the "%Used" value from 'lsps -s' rises from 40% to 80% in a few hours, eventually reaching 99% and freezing all activity on the system. The user realizes this is not normal and suspects a memory leak. Running 'ps_ 600' every half hour while paging space is being consumed would most likely reveal the process responsible. The following is a sample 'ps_' report (as seen in ps.sum):
         DELTA  DELTA  DELTA  DELTA  DELTA  DELTA    BEFORE     AFTER
   PID    PGIN   SIZE    RSS    TRS    DRS      C      TIME      TIME  CMD
     0       0      0      0      0      0      0     10:58     10:58  swapper
     1       0      0      0      0      0     -1     71:31     71:31  init
   516       0      0      0      0      0      0  17136:33  17137:29  kproc
 50328       1     78   -124      0   -124      1      0:00      0:00  ksh
 50450       0      0      0      0      0      0      0:00      0:00  telnetd
 50724       0    -20      0      0      0      0      0:29      0:29  ttsession
 53746       0      0      0      0      0      0      0:00      0:00  ksh
From the "DELTA SIZE" column, we can see that PID 50328 allocated 78K of paging space during the time 'ps_' was run. PID 50724, however, deallocated 20K of paging space during this time. Any process showing zero neither allocated nor deallocated paging space over the interval.
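Reading the "DELTA SIZE" column by eye becomes tedious on a busy system. The following Python sketch picks out the processes with positive growth, assuming each data row of ps.sum has the ten columns of the sample above. The function name top_size_deltas is hypothetical, and the real file may contain additional lines that this sketch simply skips.

```python
def top_size_deltas(report_text, limit=5):
    """Return up to `limit` (delta_size_kb, pid, cmd) tuples with positive growth.

    Assumes each data row has the ten columns of the sample report:
    PID, DELTA PGIN, DELTA SIZE, DELTA RSS, DELTA TRS, DELTA DRS,
    DELTA C, BEFORE TIME, AFTER TIME, CMD.
    """
    rows = []
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank, or unrecognized line
        delta_size = int(fields[2])
        if delta_size > 0:
            rows.append((delta_size, int(fields[0]), " ".join(fields[9:])))
    return sorted(rows, reverse=True)[:limit]

# Rows copied from the sample ps.sum report in this document.
sample_report = """\
DELTA DELTA DELTA DELTA DELTA DELTA BEFORE AFTER
PID PGIN SIZE RSS TRS DRS C TIME TIME CMD
0 0 0 0 0 0 0 10:58 10:58 swapper
1 0 0 0 0 0 -1 71:31 71:31 init
516 0 0 0 0 0 0 17136:33 17137:29 kproc
50328 1 78 -124 0 -124 1 0:00 0:00 ksh
50450 0 0 0 0 0 0 0:00 0:00 telnetd
50724 0 -20 0 0 0 0 0:29 0:29 ttsession
53746 0 0 0 0 0 0 0:00 0:00 ksh
"""
for delta_kb, pid, cmd in top_size_deltas(sample_report):
    print(f"PID {pid} ({cmd}) grew by {delta_kb}K")
```

A single positive delta is not proof of a leak. A process that appears at the top of this list in report after report is the one worth investigating.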
Note: PAIDE/6000 must be installed in order to use svmon (and others, such as tprof, netpmon, and filemon). To check if this is installed, enter: 'lslpp -l perfagent.tools'
If you are at AIX Version 4.3.0 or higher, this fileset can be found on the AIX Base Operating System media. Otherwise, to order PAIDE/6000, contact your local IBM representative.
As root enter the following command:
svmon -Pau 10 | more
This will list the top 10 memory consumers in decreasing order, the first process being the largest consumer. The rest of the report shows memory and paging space usage for each segment of each process.
Sample output looks like the following:
  Pid  Command   Inuse   Pin   Pgspace
13794  dtwm       1603     1       449

Pid: 13794
Command: dtwm

Segid  Type  Description            Inuse  Pin  Pgspace  Address Range
b23    pers  /dev/hd2:24849             2    0        0  0..1
14a5   pers  /dev/hd2:24842             0    0        0  0..2
6179   work  lib data                 131    0       98  0..891
280a   work  shared library text     1101    0       10  0..65535
181    work  private                  287    1      341  0..310:65277..65535
57d5   pers  code,/dev/hd2:61722       82    0        0  0..135
In each process report, find "Type" = work and "Description" = private and check how many 4K (4096 byte) pages are used under the "Pgspace" column. This is the minimum number of working pages this segment is using in all of virtual memory. A "Pgspace" number that grows but never decreases may indicate a memory leak.
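The check described above can be sketched in Python. This example assumes segment rows laid out as in the sample svmon report in this document (Segid, Type, Description, Inuse, Pin, Pgspace, Address Range, with no spaces inside the address range). Both function names are hypothetical, and the series of readings at the end is invented for illustration.

```python
PAGE_KB = 4  # svmon counts 4K (4096 byte) pages

def private_pgspace_kb(svmon_text):
    """Sum Pgspace for segments of Type 'work', Description 'private', in KB."""
    total_pages = 0
    for line in svmon_text.splitlines():
        fields = line.split()
        # A segment row has at least: segid, type, description, inuse, pin,
        # pgspace, address range.
        if len(fields) < 7 or fields[1] != "work":
            continue
        description = " ".join(fields[2:-4])
        if description == "private":
            total_pages += int(fields[-2])
    return total_pages * PAGE_KB

def grows_without_shrinking(readings_kb):
    """True if successive readings never decrease and end higher than they began."""
    if len(readings_kb) < 2:
        return False
    pairs = zip(readings_kb, readings_kb[1:])
    never_drops = all(later >= earlier for earlier, later in pairs)
    return never_drops and readings_kb[-1] > readings_kb[0]

# Segment rows copied from the sample svmon report in this document.
sample_segments = """\
Segid Type Description Inuse Pin Pgspace Address Range
6179 work lib data 131 0 98 0..891
280a work shared library text 1101 0 10 0..65535
181 work private 287 1 341 0..310:65277..65535
57d5 pers code,/dev/hd2:61722 82 0 0 0..135
"""
print(private_pgspace_kb(sample_segments), "KB in the private working segment")

# HYPOTHETICAL series of later readings, for illustration only.
print(grows_without_shrinking([1364, 1500, 1500, 2100]))
```

For the sample report, the private working segment holds 341 pages, or 341 x 4K = 1364K of paging space.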
The "maxuproc" parameter can be increased via SMIT. Go into "SMIT/System Environments/Change / Show Characteristics of Operating System". The first line on this screen is "maxuproc". Increasing this number by a conservative increment (50-100 at a time) will allow users to fork more processes, thus avoiding "Out of memory" or "Cannot fork" messages.