-------------------------------------------------------------------------------
                IBM xSeries Performance Logger for Linux (xPL)
Description:    Tool to collect Linux performance counters into a CSV log file 
                readable by Windows System Monitor (PerfMon).
Author:         Payam Abrishami
Released:       02/27/2006
Note:           Works with Linux kernels 2.4 and 2.6 on all x86 and x86_64
                platforms.

-------------------------------------------------------------------------------

LICENSED MATERIAL - PROPERTY OF IBM
(c) Copyright IBM Corp. 2005 ALL RIGHTS RESERVED
US Government Users Restricted Rights - Use, duplication or disclosure 
restricted by GSA ADP schedule contract with IBM Corp.

ATTENTION:
    Please read the counter descriptions before proceeding to use the tool if
    you are not familiar with the Linux proc file system.
    While the counters collected by xPL may have similar names to what is
    seen under Windows, they *might* have very different meanings under Linux.


Table of Contents:
    1. Instructions
    2. Parameter file description
    3. Counters descriptions
    4. Notes


1. Instructions:
----------------

    Do one of the following:
    
        i - Set parameters in parameter file (default: input.prm)
        ii - xpl parameter_file       Run using parameter_file
        iii - Use the created output file(s) with Windows System Monitor (1,2)
     
    	(1) Please refer to Windows System Monitor (perfmon) help for
    	    instructions on how to import a CSV log file.
    	    Hint: (Start -> Settings -> Control Panel -> Administrative Tools 
    	    	-> Performance)
    	    Click on the cylinder on the toolbar.
    	    
    	(2) "Help Me Find My IBM eServer xSeries Performance Problem!" Redpaper
    	    is an excellent article to help analyze performance bottlenecks:
    	    http://www.redbooks.ibm.com/redpapers/abstracts/redp3938.html
        
    or    
        
        - xpl -g parameter_file    Generate parameter_file and quit
        
    or
        
        - xpl -s system_info_file  Create system_info_file and quit


    Note: You can stop xPL with Ctrl+C when necessary; it will perform a
    graceful shutdown.


2. Parameter file description:
------------------------------

    [config]
        Tracing configuration. Options are:
        0: Time limited trace
        1: Log file size limited

        Note: If xPL can no longer write to disk (i.e., the disk is full),
        it will stop.


    Trace interval
        The trace interval consists of a seconds part and a milliseconds
        part. The two are specified separately for convenience, and their
        sum is used as the trace interval. These values are required by all
        trace configurations.

        [int_s]
            Trace interval in seconds

        [int_m]
            Trace interval in milliseconds, must be 0 or between 20 and 999.


    Trace duration
        Trace duration is only required by config 0 (Time limited trace).

        [dur_s]
            Trace duration in seconds

        [dur_m]
            Trace duration in milliseconds, must be 0 or between 20 and 999.
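
        As a minimal sketch (not xPL's own code), the seconds and
        milliseconds fields of the interval or the duration could be
        validated and combined as shown here. The function name
        total_seconds is hypothetical; only the "0 or between 20 and 999"
        rule comes from the descriptions above.

```python
def total_seconds(seconds, millis):
    """Combine a seconds field and a milliseconds field into one value."""
    if seconds < 0:
        raise ValueError("seconds must not be negative")
    # Rule from the parameter description: 0, or a value within 20..999.
    if millis != 0 and not 20 <= millis <= 999:
        raise ValueError("milliseconds must be 0 or between 20 and 999")
    return seconds + millis / 1000.0


# Example values only: int_s=1, int_m=500 and dur_s=60, dur_m=0.
interval = total_seconds(1, 500)
duration = total_seconds(60, 0)
print(interval, duration)
```

        With these example values the interval is 1.5 seconds and the
        duration is 60 seconds.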


    [log_size]
        Log file size limit in MB, applies to all configs. 
        Set to 0 for no limit (stops when the disk is full or the time limit
        is reached).
        Note: on average xPL writes 3KB per sample.
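
        The 3KB-per-sample average can be used for a rough estimate of how
        large a log will grow. The sketch below is only an approximation:
        the helper names are hypothetical, it assumes 1 MB = 1024 KB, and
        the real size depends on which counters are enabled.

```python
AVG_KB_PER_SAMPLE = 3.0  # average quoted in the note above


def estimated_log_mb(duration_s, interval_s):
    """Rough log size in MB for a trace sampled every interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    samples = duration_s / interval_s
    return samples * AVG_KB_PER_SAMPLE / 1024.0  # assumes 1 MB = 1024 KB


def samples_before_limit(log_size_mb):
    """Approximate number of samples that fit under a log_size limit."""
    return int(log_size_mb * 1024.0 / AVG_KB_PER_SAMPLE)


# One hour at a 1-second interval, and a 10 MB log_size limit.
print(estimated_log_mb(3600, 1.0))
print(samples_before_limit(10))
```

        Under these assumptions, a one-hour trace at a 1-second interval
        produces roughly 10.5 MB, and a 10 MB limit holds about 3400
        samples.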

        
    [new_log]
        0: no
        1: yes
        
        Start a new log file when the log file size limit is reached;
        applies to all configs. If set to no(0) along with config 0, xPL
        will overwrite the file when the log_size limit is reached, as long
        as the time limit has not been reached. If set to yes(1), xPL will
        create a new log file, incrementing the log file number (see
        log_start). If used along with config 1, xPL will not stop until it
        is manually stopped or the disk is full.

        
    [log_start]
        Starting number of log file, applies only if new_log is set to yes(1).
        
        
    [output_file]
        Output file name; xPL appends the .csv extension to it.
        Spaces are not allowed: xPL uses only the portion before the first
        space.


    [output_buffer]
        Output buffer size in KB. 
        If set to a number other than zero, xPL will wait until the buffer
        is full before writing to disk.

    [counter]
        List of counters to trace, 1 to trace and 0 to not trace.


3. Counters descriptions:
------------------------

    a. CpuStat
    
        General CPU statistics. Includes:

        % User Time
            - Per processor and total of all
            Percent of total CPU time spent executing user processes.

        % System Time:
            - Per processor and total of all
            Percent of total CPU time spent executing system (kernel)
            processes.
            
            Note: On 2.6 kernels, the % System Time reported by xPL
            includes IRQ and SoftIRQ times.

        % Idle Time:
            - Per processor and total of all
            Percent of total CPU time being idle.

        [2.6 kernel only]
        -----------------------------------------------------------------------
        % Iowait Time:
            - Per processor and total of all
            Percent of total CPU time waiting for I/O to complete

        % IRQ Time:
            - Per processor and total of all
            Percent of total CPU time servicing interrupts
            
        % SoftIRQ Time:
            - Per processor and total of all
            Percent of total CPU time servicing softirqs
            
            Note: Under Linux, a softirq is an interrupt handler that runs 
            outside of the normal interrupt context, runs with interrupts 
            enabled, and runs concurrently when necessary. The kernel will run 
            up to one copy of a softirq on each processor in a system. 
            Softirqs are designed to provide more scalable interrupt handling 
            in SMP settings.
        -----------------------------------------------------------------------
        [end of 2.6 kernel only]
        
        % Processor Time:
            - Per processor and total of all
            Sum of user and system time.

        Context Switches/sec
            - Total of all CPUs
            Total number of context switches per second across all CPUs.

            Note: A context switch (also sometimes referred to as a process  
            switch or a task switch) is the switching of the CPU from one 
            process or thread to another. A context is the contents of a CPU's 
            registers and program counter at any point in time. Context  
            switching can be described in slightly more detail as the kernel 
            performing the following activities with regard to processes
            (including threads) on the CPU:
            
                a. Suspending the progression of one process and storing the  
                    CPU's state (i.e., the context) for that process somewhere
                    in memory
                    
                b. Retrieving the context of the next process from memory and 
                    restoring it in the CPU's registers and
                    
                c. Returning to the location indicated by the program counter 
                    (i.e., returning to the line of code at which the process  
                    was interrupted) in order to resume the process.
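
        The CPU percentages described in this section can be derived from
        two successive readings of the "cpu" lines in /proc/stat. The
        sketch below is an illustration under stated assumptions, not
        xPL's actual code: the function names are hypothetical, and
        folding "nice" time into user time is an assumption, since this
        document does not say how xPL treats it. System time includes IRQ
        and SoftIRQ time, as noted above for 2.6 kernels.

```python
KEYS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line):
    """Parse one 'cpu' or 'cpuN' line from /proc/stat into tick counters."""
    fields = line.split()
    values = [int(v) for v in fields[1:1 + len(KEYS)]]
    values += [0] * (len(KEYS) - len(values))  # 2.4 kernels report 4 values
    return fields[0], dict(zip(KEYS, values))


def cpu_percentages(prev, curr):
    """Percent of CPU time per category between two readings."""
    delta = {k: curr[k] - prev[k] for k in KEYS}
    total = sum(delta.values())
    if total <= 0:
        return None  # no ticks elapsed between the two readings

    def pct(ticks):
        return 100.0 * ticks / total

    user = pct(delta["user"] + delta["nice"])  # assumption: nice is user
    system = pct(delta["system"] + delta["irq"] + delta["softirq"])
    return {
        "% User Time": user,
        "% System Time": system,
        "% Idle Time": pct(delta["idle"]),
        "% Iowait Time": pct(delta["iowait"]),
        "% IRQ Time": pct(delta["irq"]),
        "% SoftIRQ Time": pct(delta["softirq"]),
        "% Processor Time": user + system,
    }


# Made-up sample lines standing in for two reads of /proc/stat.
_, before = parse_cpu_line("cpu 100 0 50 800 30 10 10")
_, after = parse_cpu_line("cpu 140 0 70 920 40 15 15")
print(cpu_percentages(before, after))
```

        Because the percentages are ratios of tick deltas, the length of
        the trace interval does not appear in the calculation.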


    b. IntStat
    
        Provides interrupt statistics per second, for each active interrupt
        on each processor, plus totals per interrupt and per processor.
        The devices mapped to each interrupt are displayed after the
        interrupt number in the trace.
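
        As an illustration of how such per-interrupt rates can be obtained,
        the sketch below parses text in the layout of /proc/interrupts and
        turns two readings into interrupts per second. It is not xPL's
        code; the function names are hypothetical and the sample text is
        made up.

```python
def parse_interrupts(text):
    """Map interrupt label -> (per-CPU counts, device description)."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        tokens = rest.split()
        counts = []
        while tokens and len(counts) < ncpu and tokens[0].isdigit():
            counts.append(int(tokens.pop(0)))
        if counts:
            table[label.strip()] = (counts, " ".join(tokens))
    return table


def interrupt_rates(prev, curr, interval_s):
    """Interrupts per second, per CPU and in total, for each interrupt."""
    rates = {}
    for irq, (counts, devices) in curr.items():
        if irq not in prev:
            continue  # interrupt became active between the two readings
        old_counts = prev[irq][0]
        per_cpu = [(c - o) / interval_s for c, o in zip(counts, old_counts)]
        rates[irq] = {"devices": devices, "per_cpu": per_cpu,
                      "total": sum(per_cpu)}
    return rates


# Made-up samples taken 2 seconds apart.
first = """
           CPU0       CPU1
  0:       1000       2000    IO-APIC-edge  timer
 14:         50         10    IO-APIC-edge  ide0
"""
second = """
           CPU0       CPU1
  0:       1200       2100    IO-APIC-edge  timer
 14:         60         10    IO-APIC-edge  ide0
"""
rates = interrupt_rates(parse_interrupts(first), parse_interrupts(second), 2.0)
for irq, info in rates.items():
    print(irq, info["devices"], info["per_cpu"], info["total"])
```

        Per-processor totals follow the same pattern by summing the
        per_cpu lists column by column.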


    c. DskStat
    
        General Physical disk statistics.
        Each stat is provided per Physical disk and total of all.

        Reads/sec
            Total # of read operations completed successfully per second.

        Writes/sec
            Total # of write operations completed successfully per second.

        Transactions/sec
            Sum of read and write operations per second.

        Read Bytes/sec
            Total # of bytes read successfully per second.

        Write Bytes/sec
            Total # of bytes written successfully per second.

        Bytes/sec
            Sum of read and write bytes per second.

        Ave. Bytes/Read
            Total # of bytes read successfully per total # of reads completed 
            successfully.

        Ave. Bytes/Write
            Total # of bytes written successfully per total # of writes  
            completed successfully.

        Ave. Bytes/Transaction
            Sum of reads and writes.

        Ave. sec/Read
            This is the total number of seconds spent by all reads per total 
            # of reads completed successfully.

        Ave. sec/Write
            This is the total number of seconds spent by all writes per total 
            # of writes completed successfully.

        Ave. sec/Transaction
            Sum of reads and writes.

        IO's in Progress
            Snapshot of current disk queue. Incremented as requests are given 
            to appropriate request queue and decremented as they finish.
            It is the number of requests outstanding on the disk at the time 
            the performance data is collected. It also includes requests in 
            service at the time of the collection. This is an instantaneous
            snapshot, not an average over the time interval.
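
        The disk counters above are all ratios of a few raw deltas measured
        over one trace interval. The sketch below shows one way to compute
        them; it is not xPL's code, and the dictionary keys are
        hypothetical names for those deltas. Treating the two "per
        Transaction" averages as combined totals divided by combined
        operation counts is an assumption.

```python
def disk_stats(delta, interval_s):
    """Derive the DskStat counters from raw deltas over one interval."""
    def per(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    reads, writes = delta["reads"], delta["writes"]
    rbytes, wbytes = delta["read_bytes"], delta["write_bytes"]
    rsecs, wsecs = delta["read_secs"], delta["write_secs"]
    return {
        "Reads/sec": per(reads, interval_s),
        "Writes/sec": per(writes, interval_s),
        "Transactions/sec": per(reads + writes, interval_s),
        "Read Bytes/sec": per(rbytes, interval_s),
        "Write Bytes/sec": per(wbytes, interval_s),
        "Bytes/sec": per(rbytes + wbytes, interval_s),
        "Ave. Bytes/Read": per(rbytes, reads),
        "Ave. Bytes/Write": per(wbytes, writes),
        # Assumption: combined bytes divided by combined operations.
        "Ave. Bytes/Transaction": per(rbytes + wbytes, reads + writes),
        "Ave. sec/Read": per(rsecs, reads),
        "Ave. sec/Write": per(wsecs, writes),
        # Assumption: combined time divided by combined operations.
        "Ave. sec/Transaction": per(rsecs + wsecs, reads + writes),
    }


# Made-up deltas for one disk over a 2-second interval.
sample = {"reads": 100, "writes": 50,
          "read_bytes": 409600, "write_bytes": 102400,
          "read_secs": 0.5, "write_secs": 0.25}
for name, value in disk_stats(sample, 2.0).items():
    print(name, value)
```

        Averages are reported as 0 when no operation completed during the
        interval, which avoids a division by zero on idle disks.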


    d. MemStat
   
        Provides memory statistics.

        Total MB
            Total physical memory.

        Free MB
            Total unallocated physical memory. This is not a good indicator
            of available physical memory. To check whether you are out of
            available physical memory, watch for paging or swapping activity.

        Page In KB/sec
            Total number of kilobytes the system paged in from disk per second.

        Page Out KB/sec
            Total number of kilobytes the system paged out to disk per second.

        Page KB/sec
            Sum of Page In and Page Out KB/sec.

        Swap In/sec
            Total number of swap pages the system brought in per second.

        Swap Out/sec
            Total number of swap pages the system swapped out per second.

        Swaps/sec
            Sum of Swap In and Swap Out /sec.

            Note: The fundamental unit of memory under Linux is the page, a 
            non-overlapping region of contiguous memory. All available 
            physical memory is organized into pages near the end of the 
            kernel’s boot process, and pages are doled out to and revoked from 
            processes by the kernel’s memory management algorithms at runtime.
            Linux uses 4 KB pages on most processors.
            
            Paging and swapping both refer to virtual memory activity in Linux.
            With paging, when the kernel requires more main memory for an 
            active process, only the least recently used pages of processes are
            moved to the swap space. The page counters are similar to Memory 
            Page Reads and Writes in Windows.
            
            Swapping means to move an entire process out of main memory and to 
            the swap area on hard disk, whereby all pages of that process are 
            moved at the same time.
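
            As a hedged illustration, the paging and swapping rates can be
            derived from two readings of kernel counters. The sketch
            assumes the 2.6-style /proc/vmstat fields pgpgin and pgpgout
            (taken here to be in KB) and pswpin and pswpout (in pages);
            whether xPL reads exactly these fields is an assumption, and
            the function names are hypothetical.

```python
def parse_vmstat(text):
    """Parse 'name value' lines, as found in /proc/vmstat, into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def memory_rates(prev, curr, interval_s):
    """Per-second paging and swapping rates between two readings."""
    def rate(key):
        return (curr[key] - prev[key]) / interval_s

    page_in, page_out = rate("pgpgin"), rate("pgpgout")
    swap_in, swap_out = rate("pswpin"), rate("pswpout")
    return {
        "Page In KB/sec": page_in,
        "Page Out KB/sec": page_out,
        "Page KB/sec": page_in + page_out,
        "Swap In/sec": swap_in,
        "Swap Out/sec": swap_out,
        "Swaps/sec": swap_in + swap_out,
    }


# Made-up samples taken 5 seconds apart.
before = parse_vmstat("pgpgin 1000\npgpgout 2000\npswpin 10\npswpout 20")
after = parse_vmstat("pgpgin 1500\npgpgout 2250\npswpin 10\npswpout 45")
print(memory_rates(before, after, 5.0))
```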
            
            
    e. NetStat
    
        Provides network statistics.
        Each stat is provided per network device and total of all.

        Packets Sent/sec
        Packets Received/sec
        Packets/sec (sum of sent and received)

        Bytes Sent/sec
        Bytes Received/sec
        Bytes/sec (sum of sent and received)
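
        The network rates above can be illustrated with two readings of
        text in the /proc/net/dev layout, where the receive bytes and
        packets are the first two values after the device name and the
        transmit bytes and packets are the ninth and tenth. This is a
        sketch rather than xPL's code; the function names and the "Total"
        key are hypothetical.

```python
FIELDS = {"rx_bytes": 0, "rx_packets": 1, "tx_bytes": 8, "tx_packets": 9}


def parse_net_dev(text):
    """Map device name -> selected counters from /proc/net/dev style text."""
    devices = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # header lines carry no device name
        name, _, rest = line.partition(":")
        values = rest.split()
        if len(values) < 10:
            continue
        devices[name.strip()] = {k: int(values[i]) for k, i in FIELDS.items()}
    return devices


def net_rates(prev, curr, interval_s):
    """Per-device and total packet and byte rates between two readings."""
    names = ("Packets Sent/sec", "Packets Received/sec", "Packets/sec",
             "Bytes Sent/sec", "Bytes Received/sec", "Bytes/sec")
    rates = {"Total": dict.fromkeys(names, 0.0)}
    for dev, now in curr.items():
        if dev not in prev:
            continue
        d = {k: (now[k] - prev[dev][k]) / interval_s for k in FIELDS}
        rates[dev] = {
            "Packets Sent/sec": d["tx_packets"],
            "Packets Received/sec": d["rx_packets"],
            "Packets/sec": d["tx_packets"] + d["rx_packets"],
            "Bytes Sent/sec": d["tx_bytes"],
            "Bytes Received/sec": d["rx_bytes"],
            "Bytes/sec": d["tx_bytes"] + d["rx_bytes"],
        }
        for key in names:
            rates["Total"][key] += rates[dev][key]
    return rates


# Made-up samples taken 2 seconds apart.
first = "  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0"
second = "  eth0: 5000 30 0 0 0 0 0 0 3000 40 0 0 0 0 0 0"
print(net_rates(parse_net_dev(first), parse_net_dev(second), 2.0))
```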


4. Notes:
---------

    a. PerfMon can handle partially written samples at the end of the log.
    b. You can use the 'relog' command on Windows to manipulate log files
        (e.g., convert CSV logs to binary). Refer to relog's help for more
        info.
    c. Data is generated as it is read from /proc, so there is no limit on
        how often you can read it, although reading it too often may affect
        your system performance.
    