-------------------------------------------------------------------------------
IBM xSeries Performance Logger for Linux (xPL)

Description: Tool to collect Linux performance counters into a CSV log file
             readable by Windows System Monitor (PerfMon).
Author:      Payam Abrishami
Released:    02/27/2006
Note:        Works with Linux kernels 2.4 and 2.6 on all x86 and x86_64
             platforms.
-------------------------------------------------------------------------------
LICENSED MATERIAL - PROPERTY OF IBM
(c) Copyright IBM Corp. 2005 ALL RIGHTS RESERVED
US Government Users Restricted Rights - Use, duplication or disclosure
restricted by GSA ADP schedule contract with IBM Corp.

ATTENTION: Please read the counter descriptions before using the tool if you
are not familiar with the Linux proc file system. While the counters collected
by xPL may have names similar to those seen under Windows, they *might* have
very different meanings under Linux.

Table of Contents:
  1. Instructions
  2. Parameter file description
  3. Counters descriptions
  4. Notes

1. Instructions:
----------------
Do one of the following:

  i   - Set parameters in parameter file (default: input.prm)
  ii  - xpl parameter_file
        Run using parameter_file
  iii - Use the created output file(s) with Windows System Monitor (1,2)

        (1) Please refer to Windows System Monitor (perfmon) help for
            instructions on how to import a CSV log file.
            Hint: (Start -> Settings -> Control Panel ->
                   Administrative Tools -> Performance)
                  Click on the cylinder on the toolbar.
        (2) "Help Me Find My IBM eServer xSeries Performance Problem!"
            Redpaper is an excellent article to help analyze performance
            bottlenecks:
            http://www.redbooks.ibm.com/redpapers/abstracts/redp3938.html

  or  - xpl -g parameter_file
        Generate parameter_file and quit

  or  - xpl -s system_info_file
        Create system_info_file and quit

Note: You can stop xPL with Ctrl+C when necessary and it will perform a
      graceful shutdown.

2. Parameter file description:
------------------------------
[config]        Tracing configuration. Options are:
                  0: Time limited trace
                  1: Log file size limited
                Note: If xPL is unable to write any further to disk (i.e.,
                the disk is full), it will stop.

Trace interval
                The trace interval consists of an interval in seconds and an
                interval in milliseconds. The sum of the two is used as the
                trace interval; they are separate for convenience. These
                values are required by all trace configurations.
[int_s]         Trace interval in seconds.
[int_m]         Trace interval in milliseconds; must be 0 or between 20 and
                999.

Trace duration
                The trace duration is only required by config 0 (Time limited
                trace).
[dur_s]         Trace duration in seconds.
[dur_m]         Trace duration in milliseconds; must be 0 or between 20 and
                999.

[log_size]      Log file size limit in MB; applies to all configs. Set to 0
                for no limit (xPL stops when the disk is full or the time
                limit has been reached).
                Note: On average xPL writes 3KB per sample.

[new_log]       0: no
                1: yes
                Start a new log file when the log file size limit is reached,
                for all configs. If set to no (0) along with config 0, xPL
                will overwrite the file once the log_size limit has been
                reached, as long as the time limit has not been reached. If
                set to yes (1), xPL will create a new log file, incrementing
                the log file number (see log_start). If used along with
                config 1, xPL will not stop until it is manually stopped or
                the disk is full.

[log_start]     Starting number of the log file; applies only if new_log is
                set to yes (1).

[output_file]   Output file name; xPL will append the .csv extension to the
                end. No spaces here: xPL will only use the portion before the
                first space.

[output_buffer] Output buffer size in KB. If set to a number other than zero,
                xPL will wait until the buffer is full before writing to disk.

[counter]       List of counters to trace: 1 to trace and 0 to not trace.

3. Counters descriptions:
------------------------
a. CpuStat
   General CPU statistics. Includes:

   % User Time: - Per processor and total of all
      Percent of total CPU time spent executing user processes.

   % System Time: - Per processor and total of all
      Percent of total CPU time spent executing system (kernel) processes.
      Note: In 2.6 kernel %System Time reported by xPL includes IRQ and
      SoftIRQ times.

   % Idle Time: - Per processor and total of all
      Percent of total CPU time being idle.

   [2.6 kernel only]
   -----------------------------------------------------------------------
   % Iowait Time: - Per processor and total of all
      Percent of total CPU time waiting for I/O to complete.

   % IRQ Time: - Per processor and total of all
      Percent of total CPU time servicing interrupts.

   % SoftIRQ Time: - Per processor and total of all
      Percent of total CPU time servicing softirqs.
      Note: Under Linux, a softirq is an interrupt handler that runs outside
      of the normal interrupt context, runs with interrupts enabled, and
      runs concurrently when necessary. The kernel will run up to one copy
      of a softirq on each processor in a system. Softirqs are designed to
      provide more scalable interrupt handling in SMP settings.
   -----------------------------------------------------------------------
   [end of 2.6 kernel only]

   % Processor Time: - Per processor and total of all
      Sum of user and system time.

   Context Switches/sec - Total of all CPUs
      Total number of context switches across all CPUs.
      Note: A context switch (also sometimes referred to as a process switch
      or a task switch) is the switching of the CPU from one process or
      thread to another. A context is the contents of a CPU's registers and
      program counter at any point in time. Context switching can be
      described in slightly more detail as the kernel performing the
      following activities with regard to processes (including threads) on
      the CPU:
        a. Suspending the progression of one process and storing the CPU's
           state (i.e., the context) for that process somewhere in memory,
        b. Retrieving the context of the next process from memory and
           restoring it in the CPU's registers, and
        c. Returning to the location indicated by the program counter (i.e.,
           returning to the line of code at which the process was
           interrupted) in order to resume the process.

b. IntStat
   Provides interrupt statistics per second, per active interrupt per
   processor, plus totals per interrupt and per processor. The devices
   mapped to each interrupt are displayed after the interrupt number in the
   trace.

c. DskStat
   General physical disk statistics. Each stat is provided per physical disk
   and total of all.

   Reads/sec
      Total # of read operations completed successfully per second.
   Writes/sec
      Total # of write operations completed successfully per second.
   Transactions/sec
      Sum of read and write operations per second.
   Read Bytes/sec
      Total # of bytes read successfully per second.
   Write Bytes/sec
      Total # of bytes written successfully per second.
   Bytes/sec
      Sum of read and write bytes per second.
   Ave. Bytes/Read
      Total # of bytes read successfully per total # of reads completed
      successfully.
   Ave. Bytes/Write
      Total # of bytes written successfully per total # of writes completed
      successfully.
   Ave. Bytes/Transaction
      Sum of reads and writes.
   Ave. sec/Read
      Total number of seconds spent by all reads per total # of reads
      completed successfully.
   Ave. sec/Write
      Total number of seconds spent by all writes per total # of writes
      completed successfully.
   Ave. sec/Transaction
      Sum of reads and writes.
   IO's in Progress
      Snapshot of the current disk queue. Incremented as requests are given
      to the appropriate request queue and decremented as they finish. It is
      the number of requests outstanding on the disk at the time the
      performance data is collected, including requests in service at the
      time of the collection. This is an instantaneous snapshot, not an
      average over the time interval.

d. MemStat
   Provides memory statistics.

   Total MB
      Total physical memory.
   Free MB
      Total un-allocated physical memory. This is not a good indicator of
      available physical memory. To check whether you are out of available
      physical memory, watch for paging or swapping activity.
   Page In KB/sec
      Total number of kilobytes the system paged in from disk per second.
   Page Out KB/sec
      Total number of kilobytes the system paged out to disk per second.
   Page KB/sec
      Sum of Page In and Page Out KB/sec.
   Swap In/sec
      Total number of swap pages the system brought in per second.
   Swap Out/sec
      Total number of swap pages the system brought out per second.
   Swaps/sec
      Sum of Swap In and Swap Out /sec.

   Note: The fundamental unit of memory under Linux is the page, a
   non-overlapping region of contiguous memory. All available physical
   memory is organized into pages near the end of the kernel’s boot process,
   and pages are doled out to and revoked from processes by the kernel’s
   memory management algorithms at runtime. Linux uses 4 KB pages for most
   processors.

   Paging and swapping both refer to virtual memory activity in Linux. With
   paging, when the kernel requires more main memory for an active process,
   only the least recently used pages of processes are moved to the swap
   space. The page counters are similar to Memory Page Reads and Writes in
   Windows. Swapping means moving an entire process out of main memory to
   the swap area on hard disk, whereby all pages of that process are moved
   at the same time.

e. NetStat
   Provides network statistics. Each stat is provided per network device and
   total of all.

   Packets Sent/sec
   Packets Received/sec
   Packets/sec (sum of sent and received)
   Bytes Sent/sec
   Bytes Received/sec
   Bytes/sec (sum of sent and received)

4. Notes:
---------
a. PerfMon can handle partially written samples at the end of the log.
b. You can use the 'relog' command on Windows to manipulate log files (e.g.,
   convert csv logs to binary). Refer to relog's help for more info.
c. Data is generated as it is read from /proc, so there is no limit on how
   often you can read it, although reading it too often may affect your
   system performance.
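
Example (illustrative only, not part of xPL): the CpuStat percentages in
section 3a are the kind of value that can be derived from two successive reads
of /proc/stat. The Python sketch below shows one way to do that. It is not
xPL's implementation. The names parse_cpu_lines, cpu_percentages and sample
are hypothetical, the field order (user, nice, system, idle, iowait, irq,
softirq) is assumed from 2.6 kernels, any later fields are ignored, and nice
time is reported separately because this document does not say how xPL
accounts for it.

```python
import time

# Assumed field order of a "cpu" line in /proc/stat on 2.6 kernels.
# On 2.4 kernels only the first four fields are expected to be present.
STATES = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]


def parse_cpu_lines(text):
    """Map each cpu label ("cpu", "cpu0", ...) to its list of tick counters."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            # Keep at most len(STATES) counters; later fields are ignored.
            stats[fields[0]] = [int(v) for v in fields[1:len(STATES) + 1]]
    return stats


def cpu_percentages(before, after):
    """Percent of elapsed ticks spent in each state between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        # No ticks elapsed between the samples; avoid dividing by zero.
        return {name: 0.0 for name, _ in zip(STATES, deltas)}
    return {name: 100.0 * d / total for name, d in zip(STATES, deltas)}


def sample(interval_s, path="/proc/stat"):
    """Read the counters twice, interval_s seconds apart (hypothetical helper)."""
    with open(path) as f:
        first = parse_cpu_lines(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_cpu_lines(f.read())
    return {cpu: cpu_percentages(first[cpu], second[cpu])
            for cpu in second if cpu in first}
```

On a Linux host, sample(1.0)["cpu"] would give the totals across all CPUs and
sample(1.0)["cpu0"] the figures for the first processor. Adding the "system",
"irq" and "softirq" entries gives a combined figure comparable to the
% System Time described for 2.6 kernels in section 3a.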