ITEM: G6626L

Problem with 0% idle time on a production system running ORACLE.


Question:

I have a RISC System/6000 model 370 (?) running AIX version 3.2.5 and
I am having problem because there is 0% idle time on the system.  The
problem seems to have started at around 10:00 this morning, according
to the sar command.  I had 90% idle time and then I had 0% idle time,
and I have no ideas as to the reason why.  I do not see any runaway
processes according to the ps command and my paging seems to be fine.
The system has 64MB of memory.

I have 4 local users on the system and about 40 client database users.

Response:

Database product Oracle V7.0.15.4.0.  SGA: Large model as sent by
Oracle.  Prior to problem, 32 users, frontended from OS/2 with EASEL.
All worked ok.  IDLE was always over 90%.

Added 10 SQLFORMS 3.0 users on OS/2, SQLNET V1 to the system.  Still
over 80% idle.

At 10:10, the system hit the wall with 93% user cpu, 7% system or wait,
0% idle.

Mirroring is on, 3 disks, one is a mirror.  Oracle on hdisk1, mirror
is hdisk2.

System has an HACMP mode 2(?) backup.   Added 1 user running SQLPLUS -
then we hit the wall.

64 Mb RAM, 160 Mb Paging, all on hdisk0 with the OS, etc.

The person running SQLPLUS was ctrl-C'ing , leaving ORACLE processes
still running.  All the same ORACLE server process.

Here is how we found the culprit and fixed the problem.

ps -ef|grep ora - revealed some number of processes containing
                  a string similar to "ORACLEsid=..." where sid is the
                  ORACLE_SID environment variable.

Looking at the PID column, we found large numbers (expected).  Looking
at the PPID column, we found "1" (unexpected).  This indicates that
the user abnormally terminated their SQLPLUS session, causing INIT to
inherit ownership of the ORACLEsid processes.

Looking at the STIME column, we found that each process had been
started about 5 minutes apart.  The user who started them confirmed
that she had started them about 5 minutes apart, and had terminated
each one with a CTRL-C because she thought she had made a typing
mistake.

Looking at the TIME column, we found huge numbers, like 23:45.  It
was determined that she had retyped her commands almost exactly each
time, and was trying to query the same tables each time.

CONCLUSION:  the processes (ORACLEsid...) were inherited by INIT, and
each one was competing for the same data resource.  The CPU time
skyrocketed because of lock promotion, demotion, and swapping as time
slices expired.  Referred the customer to ORACLE V7 Server
Adminstrator's Guide, Part Number 6694-70-1292, page 4-14 thru 4-15.

FIX:
1. run SQLDBA as the SYSTEM or SYSDBA management user;

2. type the next line EXACTLY as shown:

SELECT sid, serial\#, status, server FROM v$session WHERE username = 'XXX';

   where "XXX" is the AIX username of the suspect user.
   Example return from SQLDBA:

       SID           SERIAL\#       STATUS   SERVER
    ---------------- ------------- -------- ---------
                   7            15 INACTIVE DEDICATED

3.  type the next line EXACTLY as shown:

    ALTER SYSTEM KILL SESSION 'sid,serial\#';

    where sid and serial\# are values returned from command 2.


Support Line: Problem with 0% idle time on a production system running ORACLE. ITEM: G6626L
Dated: February 1994 Category: N/A
This HTML file was generated 99/06/24~13:30:49
Comments or suggestions? Contact us