Performance Management Guide
What is a Performance Problem
Support personnel need to determine whether a reported problem is a functional
problem or a performance problem. When an application, a hardware system,
or a network is not behaving correctly, this is referred to as a functional
problem. For example, an application or a system with a memory
leak has a functional problem.
Sometimes functional problems lead to performance problems; for example,
when the functions are being achieved, but the functions are running
slowly. In these cases, rather than tune the system, it is more important to
determine the root cause of the problem and fix it. Another example would
be when communication is slowed because networks or name servers are
down.
Performance Problem Description
Support personnel often receive problem reports stating only that someone has
a performance problem on the system, along with some data analysis. This
information alone is insufficient to accurately determine the nature of a
performance problem. The data might indicate 100 percent CPU utilization and
a high run queue, but that might have nothing to do with the cause of the
performance problem.
For example, a system might have users logged in from remote terminals
over a network that goes over several routers. The users report that the system
is slow. Data might indicate that the CPU is very heavily utilized. But the
real problem could be that the characters get displayed after long delays
on their terminals due to packets getting lost on the network (which could
be caused by failing routers or overloaded networks). This situation might
have nothing to do with the CPU utilization on the machine. If, on the other
hand, the complaint is that a batch job on the system is taking a long time
to run, then CPU utilization or I/O bandwidth might be related.
Always obtain as much detail as possible before you attempt to collect
or analyze data, by asking the following questions regarding the performance
problem:
- Can the problem be demonstrated by running a specific command or reconstructing
a sequence of events (for example, ls /slow/fs or ping xxxxx)? If not, describe the
least complex example of the problem.
- Is the slow performance intermittent? Does the slowness appear and then disappear
for a while? Does it occur at certain times of the day or in relation to some
specific activity?
- Is everything slow or only some things?
- What aspect is slow? For example, time to echo a character, or elapsed
time to complete a transaction, or time to paint the screen?
- When did the problem start occurring? Has the situation been the same ever
since the system was first installed or went into production? Did anything
change on the system before the problem occurred (such as adding more users
or migrating additional data to the system)?
- If client/server, can the problem be demonstrated when run just locally
on the server (network versus server issue)?
- If network related, how are the network segments configured (including
bandwidth such as 10 Mb/sec or 9600 baud)? Are there any routers between the
client and server?
- What vendor applications are running on the system, and are those applications
involved in the performance issue?
- What is the impact of the performance problem on the users?
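Several of these questions (whether the problem can be demonstrated with a specific command, and whether the slowness is intermittent) can be answered with a small timing loop. The following Python sketch is only an illustration and not part of the perfpmr tools; the function names time_command and summarize and the placeholder command are assumptions made for this example. It runs one command several times and records when each run started and how long it took.

```python
import statistics
import subprocess
import sys
import time
from datetime import datetime


def time_command(argv, runs=5, pause=0.0):
    """Run argv `runs` times; return a list of (start_timestamp, elapsed_seconds)."""
    samples = []
    for _ in range(runs):
        started = datetime.now().isoformat(timespec="seconds")
        t0 = time.perf_counter()
        # Output is discarded; only the elapsed wall-clock time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        samples.append((started, time.perf_counter() - t0))
        time.sleep(pause)
    return samples


def summarize(samples):
    """Return min, median, and max elapsed time for the recorded runs."""
    elapsed = [seconds for _, seconds in samples]
    return {
        "min": min(elapsed),
        "median": statistics.median(elapsed),
        "max": max(elapsed),
    }


if __name__ == "__main__":
    # Placeholder command (an assumption): replace it with the command that
    # demonstrates the problem, for example ["ls", "/slow/fs"].
    command = [sys.executable, "-c", "pass"]
    results = time_command(command, runs=5, pause=0.2)
    for started, seconds in results:
        print(f"{started}  {seconds:.3f}s")
    print(summarize(results))
```

A large gap between the minimum and maximum times, together with the recorded start times, gives a first indication of whether the slowness is intermittent and when it occurs.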
Reporting a Performance Problem
You should report operating system performance problems to IBM support.
Use your normal software problem-reporting channel. If you are not familiar
with the correct problem-reporting channel for your organization, check with
your IBM representative.
The AIX Performance PMR (perfpmr) data collection tools are the best way
to collect performance data when an AIX performance problem is suspected.
Access these tools at ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr
and follow the instructions in the README file in the directory that matches
the AIX version you will be measuring to obtain and install the tools and to
collect data on your system. Instructions are also provided on how to send
the data to IBM support for analysis once a PMR has been opened.
When someone reports a performance problem, it is not enough just to gather
data and then analyze it. Without knowing the nature of the performance problem,
you might waste a lot of time analyzing data that has nothing to do
with the problem being reported.
Before you contact support personnel to report a problem, prepare the
information that you will be asked to supply so that the problem can be
investigated. Your local support personnel will attempt to quickly solve
your performance problem directly with you.
Three further ways you can help to get the problem resolved faster are:
- Provide a clear written statement of a simple, specific instance of the problem,
but be sure to separate the symptoms and facts from the theories, ideas, and
your own conclusions. PMRs that report "the system is slow" require extensive
investigation to determine what you mean by slow, how it is measured, and
what is acceptable performance.
- Provide information about everything that has changed on the system in
the weeks before the problem. Missing something that changed can block a possible
investigation path and will only delay finding a resolution. If all the facts
are available, the performance team can quickly eliminate the unrelated ones.
- Use the correct machine to supply information. In very large sites it
is easy to accidentally collect the data on the wrong machine. This makes
it very hard to investigate the problem.
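One way to guard against collecting data on the wrong machine is to record identifying details of the host alongside every data file. The following Python sketch is an illustration only; the function name collection_stamp and the choice of fields are assumptions, and the sketch does not replace the perfpmr tools.

```python
import json
import platform
from datetime import datetime, timezone


def collection_stamp():
    """Return identifying details of the machine this script is running on."""
    info = platform.uname()
    return {
        "hostname": info.node,
        "system": info.system,
        "release": info.release,
        "machine": info.machine,
        # UTC timestamp, so data from sites in different time zones lines up.
        "collected_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


if __name__ == "__main__":
    print(json.dumps(collection_stamp(), indent=2))
```

Comparing the recorded hostname with the machine named in the problem report before sending the data is a quick check that the data came from the intended system.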
When you report the problem, supply the following basic information:
- A problem description that can be used to search the problem-history database
to see if a similar problem has already been reported.
- What aspect of your analysis led you to conclude that the problem is due
to a defect in the operating system?
- What is the hardware and software configuration in which the problem is
occurring?
  - Is the problem confined to a single system, or does it affect multiple
  systems?
  - What are the models, memory sizes, and number and size of disks
  on the affected systems?
  - What kinds of LAN and other communications media are connected to the
  systems?
  - Does the overall configuration include systems running other operating
  systems?
- What are the characteristics of the program or workload that is experiencing
the problem?
  - Does an analysis with the time, iostat, and vmstat commands indicate that it is
  CPU-limited or I/O-limited?
  - What kinds of workloads are being run on the affected systems: workstation,
  server, multiuser, or a combination?
- What are the performance objectives that are not being met?
  - Is the primary objective in terms of console or terminal response time,
  throughput, or real-time responsiveness?
  - Were the objectives derived from measurements on another system? If so,
  what was its configuration?
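The CPU-limited versus I/O-limited question can be approached the same way the time command presents it: compare elapsed (real) time with the CPU time (user plus system) that the program consumed. The following Python sketch assumes a UNIX-like system; the function names measure and classify and the 0.8 threshold are assumptions chosen for illustration, not values from this guide. On a multiprocessor system, a multithreaded program can use more CPU time than elapsed time, so treat the result as a hint to be confirmed with iostat and vmstat.

```python
import os
import subprocess
import sys


def measure(argv):
    """Run argv once; return (real, user, system) seconds for the child process."""
    before = os.times()
    subprocess.run(argv, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    after = os.times()
    real = after.elapsed - before.elapsed
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


def classify(real, user, system, threshold=0.8):
    """Give a rough hint based on the share of elapsed time spent on the CPU."""
    if real <= 0:
        return "too short to tell"
    cpu_share = (user + system) / real
    if cpu_share >= threshold:
        return "likely CPU-limited"
    return "likely waiting (I/O, network, or other)"


if __name__ == "__main__":
    # Placeholder workload (an assumption): replace with the program in question.
    real, user, system = measure([sys.executable, "-c", "sum(range(10**6))"])
    print(f"real={real:.2f}s user={user:.2f}s sys={system:.2f}s")
    print(classify(real, user, system))
```

A program whose CPU time is a small fraction of its elapsed time spent most of the run waiting, which points the investigation toward disk, network, or paging activity rather than the CPU.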
If this is the first report of the problem, you will receive a PMR number
for use in identifying any additional data you supply and for future reference.
Include all of the following items when the supporting information and
the perfpmr data for the PMR are first gathered:
- A means of reproducing the problem:
  - If possible, include a program or shell script that demonstrates the
  problem.
  - At a minimum, provide a detailed description of the conditions under which
  the problem occurs.
- The application experiencing the problem:
  - If the application is, or depends on, any software product, identify the
  exact version and release of that product.
  - If the source code of a user-written application cannot be released,
  document the exact set of compiler parameters used to create the executable
  program.