Performance Management Guide

Analyzing the PDT Report

A PDT report consists of several sections and differs depending on the severity level chosen (see PDT Severity Levels).

The header section indicates the release number of PDT, the date the report was printed, the host from which the data was collected, and the range of dates of the data that fed the analysis. The content of this section is consistent for all severity levels.

An example of a report header follows:

Performance Diagnostic Facility 1.0
 
 Report printed: Thu Feb 10 17:54:50 2000
 
Host name: itsosmp.itsc.austin.ibm.com
 Range of analysis includes measurements
 from: Hour 12 on Tuesday, February 8th, 2000
 to: Hour 17 on Wednesday, February 9th, 2000
 
 
 Notice: To disable/modify/enable collection or reporting
         execute the pdt_config script as root

Alerts

The Alerts section reports identified violations of the concepts and thresholds that PDT applies. The following subsystems can appear in the Alerts section when PDT detects a problem with them: file systems, I/O configuration, paging configuration, I/O balance, paging space, virtual memory, real memory, processes, and network.

At severity level 1, alerts focus on file systems, physical volumes, paging, and memory. If severity level 2 or 3 is selected, information on configuration and processes is added.
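The relationship between severity level and alert coverage described above can be expressed as a small lookup. This is a minimal sketch only: the topic names are taken from the paragraph above, and the function and constant names are hypothetical, not part of PDT.

```python
# Hypothetical sketch of which alert topics are covered at each PDT severity
# level, as described in the text. These names are illustrative, not PDT internals.
BASE_TOPICS = ("file systems", "physical volumes", "paging", "memory")
EXTRA_TOPICS = ("configuration", "processes")


def alert_topics(severity: int) -> tuple[str, ...]:
    """Return the alert topics covered at severity level 1, 2, or 3."""
    if severity not in (1, 2, 3):
        raise ValueError(f"severity must be 1, 2, or 3, got {severity}")
    if severity == 1:
        return BASE_TOPICS
    # Levels 2 and 3 add configuration and process information.
    return BASE_TOPICS + EXTRA_TOPICS


for level in (1, 2, 3):
    print(level, ", ".join(alert_topics(level)))
```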

An example of an Alerts section follows:

------------------------ Alerts ---------------------
 
  I/O CONFIGURATION
   -  Note: volume hdisk1 has 872 MB available for allocation
      while volume hdisk0 has 148 MB available
   -  Physical volume hdisk2 is unavailable; (in no volume group)
 
  PAGING CONFIGURATION
   -  Physical Volume hdisk2 (type: SCSI) has no paging space defined
   -  Paging space paging00 on volume group rootvg is fragmented
   -  Paging space paging01 on volume group uservg is fragmented
 
  I/O BALANCE
   -  Phys. volume hdisk2 is not busy
      volume hdisk2, mean util. = 0.00 %
 
  PROCESSES
   -  First appearance of 20642 (cpubound) on top-3 cpu list
      (cpu % = 24.10)
   -  First appearance of 20106 (eatmem) on top-3 memory list
      (memory % = 8.00)
 
  FILE SYSTEMS
   -  File system hd2 (/usr) is nearly full at 100 %
 
  NETWORK
   -  Host ah6000e appears to be unreachable.
      (ping loss % = 100) and has been for the past 4 days

The I/O CONFIGURATION alerts indicate that data is not well distributed across the disks and that hdisk2 is not used at all. Add this disk to a volume group and define a paging space on it. The existing paging spaces are fragmented and should be reorganized, for example with the reorgvg command. The I/O BALANCE section shows that hdisk2 is not busy, which is expected because hdisk2 has not been assigned to a volume group.
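The free-space note in the I/O CONFIGURATION alerts compares how much space remains on different volumes. The following is a minimal sketch of such a comparison, assuming a simple ratio test between the volume with the most free space and the one with the least; the function name and the default ratio of 4.0 are hypothetical and do not describe PDT's actual rule.

```python
# Hypothetical sketch: flag uneven free space across physical volumes.
# The ratio threshold is an assumption, not PDT's documented rule.
def imbalance_notes(free_mb: dict[str, int], ratio: float = 4.0) -> list[str]:
    """Compare the volume with the most free space against the one with the least."""
    if len(free_mb) < 2:
        return []
    most = max(free_mb, key=free_mb.get)
    least = min(free_mb, key=free_mb.get)
    hi_mb, lo_mb = free_mb[most], free_mb[least]
    if hi_mb == 0:
        return []  # nothing free anywhere, so there is no imbalance to report
    if lo_mb == 0 or hi_mb / lo_mb >= ratio:
        return [f"volume {most} has {hi_mb} MB available for allocation "
                f"while volume {least} has {lo_mb} MB available"]
    return []


# Free space taken from the example alert above.
print(imbalance_notes({"hdisk0": 148, "hdisk1": 872}))
```

A ratio test keeps the check independent of disk size; an absolute difference in megabytes would be an equally plausible choice.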

On most systems, /usr file system use is nearly 100 percent. This is usually not a problem, but system administrators should check whether any application is writing data to this file system.

Upward and Downward Trends

PDT uses a statistical technique to determine whether a series of measurements shows a trend. If a trend is detected, its slope is evaluated for practical significance. For a file system or the paging space, PDT also estimates when it will be full, assuming continued linear growth. For upward trends, the following items are evaluated: files, file systems, hardware and software errors, paging space, processes, and network. For downward trends, the following can be reported: files, file systems, and processes.
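The "full in about N days" projection described above can be illustrated with a linear fit. This is a sketch under stated assumptions: the text does not name PDT's statistical technique, so ordinary least squares is used here as a stand-in, measurements are assumed to be one per day, and the function names and sample data are hypothetical. Its output is not expected to reproduce the figures in the example report.

```python
from typing import List, Optional, Tuple


# Sketch: fit a least-squares line to daily measurements and, assuming growth
# stays linear, estimate the days until the resource reaches capacity.
# Ordinary least squares is a stand-in; PDT's actual technique is not specified.
def linear_trend(samples: List[float]) -> Tuple[float, float]:
    """Return (slope per day, fitted value at the last day), one sample per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept + slope * (n - 1)


def days_until_full(samples: List[float], capacity: float) -> Optional[float]:
    """Days from the last measurement until the fitted line reaches capacity."""
    slope, current = linear_trend(samples)
    if slope <= 0:
        return None  # flat or shrinking: no fill date to project
    return max(0.0, (capacity - current) / slope)


# Hypothetical daily "percent full" readings for a file system.
usage = [37.0, 42.0, 46.5, 51.0]
print(days_until_full(usage, capacity=100.0))
```

Projecting from the fitted value rather than the last raw reading makes the estimate less sensitive to a single noisy measurement.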

An example of an Upward Trends and Downward Trends section follows:

---------------------- Upward Trends ----------------
 
  FILES
   -  File (or directory) /usr/adm/wtmp SIZE is increasing
      now, 20 KB and increasing an avg. of 2163 bytes/day
   -  File (or directory) /var/adm/ras/ SIZE is increasing
      now, 677 KB and increasing an avg. of 11909 bytes/day
  FILE SYSTEMS
   -  File system hd9var (/var) is growing
      now, 17.00 % full, and growing an avg. of 0.38 %/day
   -  File system lv00 (/usr/vice/cache) is growing
      now, 51.00 % full, and growing an avg. of 4.64 %/day
      At this rate, lv00 will be full in about 9 days
  PAGE SPACE
   -  Page space hd6 USE is growing
      now, 81.60 MB and growing an avg. of 2.69 MB/day
      At this rate, hd6 will be full in about 29 days
  ERRORS
   -  Software ERRORS; time to next error is 0.958 days
 
---------------------- Downward Trends --------------
  PROCESSES
   -  Process 13906 (maker4X.e) CPU use is declining
      now 1.20 % and declining an avg. of 0.68 % per day
   -  Process 13906 (maker4X.e) MEMORY use is declining
      now 13.00 and declining an avg. of 0.98 % per day
  FILES
   -  File (or directory) /tmp/ SIZE is declining
  FILE SYSTEMS
   -  File system hd3 (/tmp) is shrinking

The /usr/adm/wtmp file can grow without bound. If it grows too large, login times can increase. In some cases, the solution is to delete the file; in most cases, however, it is important to identify the user causing the growth and work with that user to correct the problem.

The error log file is located in the /var/adm/ras directory. The ERRORS section shows that the number of software errors is increasing, which is the most likely reason the directory is growing. Check the error log, identify the application that is reporting errors, and correct the problem.

The increase in paging space use might be due to a process with a memory leak; if so, identify that process and fix the application. Alternatively, the paging space might be undersized and need to be enlarged.

System Health

The System Health section provides an assessment of the average number of processes in each process state on the system. Workload indicators that show an upward trend are also noted.

An example of a System Health section follows:

----------------------- System Health ---------------
 
  SYSTEM HEALTH
   -  Current process state breakdown:
      75.00 [ 100.0 %] : active
      0.40 [ 0.5 %] : swapped
      75.00 = TOTAL
      [based on 1 measurement consisting of 10 2-second samples]
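The process state breakdown above is an average over several short samples. The following is a minimal sketch of how such a breakdown could be computed, assuming each sample is a count of processes per state; the function name, state names, and sample data are hypothetical, and the numbers are not meant to reproduce the example report.

```python
from collections import defaultdict


# Sketch: average per-state process counts over several samples and express
# each average as a percentage of the average total. Data is hypothetical.
def state_breakdown(samples: list[dict[str, int]]) -> dict[str, tuple[float, float]]:
    """Map each state to (mean count, percent of mean total)."""
    if not samples:
        raise ValueError("need at least one sample")
    totals: dict[str, int] = defaultdict(int)
    for sample in samples:
        for state, count in sample.items():
            totals[state] += count  # a state missing from a sample counts as 0
    means = {state: total / len(samples) for state, total in totals.items()}
    grand = sum(means.values())
    return {state: (mean, 100.0 * mean / grand if grand else 0.0)
            for state, mean in means.items()}


# Ten hypothetical 2-second samples.
samples = [{"active": 75, "swapped": 0}] * 8 + [{"active": 75, "swapped": 2}] * 2
for state, (mean, pct) in state_breakdown(samples).items():
    print(f"{mean:6.2f} [{pct:5.1f} %] : {state}")
```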

Summary

The Summary section lists the severity level of the current report and indicates whether more details are available at higher severity levels.

An example of a Summary section follows:

-------------------- Summary -------------------------
  This is a severity level 3 report
  No further details available at severity levels > 3