Problems With The /etc/utmp File


Contents

About This Document
The w Command Reports Incorrect Idle Time
Indicators of utmp Corruption
Problem: Uptime greater than 8000 days
Solution
who or w show users logged in when they are not
How to find out what program caused the corruption
Known Problems
Fixes/Problems
How to fix the utmp file

About This Document

The /etc/utmp file is used by the who, w and uptime commands to display when the system was last booted and who is currently logged in. This document describes possible solutions to a corrupted utmp file and is applicable to AIX Version 3.2.


The w Command Reports Incorrect Idle Time

If the w command shows an idle time greater than the uptime of the system, install the following fix:

 
For Version 3.2 - APAR IX51806 

Indicators of utmp Corruption

Corruption of the utmp file shows up in two different ways:

  1. The uptime and w commands show a time greater than 8000 days since the system was last booted.
  2. Users are shown as still logged in when in fact they are not.

Both types of corruption have many causes, because both AIX commands and third party applications write to the utmp file.


Problem: Uptime greater than 8000 days

If record number 0 of the utmp file is overwritten (normally by a third-party program), the uptime and w commands report an uptime greater than 8000 days.
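
The size of this number is consistent with an uptime counted from a boot time of zero, that is, from January 1, 1970. That reading is an assumption and is not stated in this document. As a rough check, the sketch below converts the time stamp of the sample boot record used in this document (818538505, December 9, 1995) into whole days. The function name days_since_epoch is hypothetical.

```shell
# days_since_epoch SECONDS   (hypothetical helper)
# Prints the whole number of days represented by a time stamp
# given in seconds since January 1, 1970.
days_since_epoch() {
    expr "$1" / 86400
}

# Example:
#   days_since_epoch 818538505     prints 9473
```

Under that assumption, any current date after late 1991 gives a figure above 8000 days.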


Solution

To correct the invalid boot time, you must reboot the system. The utmp file is recreated with each boot.

To try to discover who or what overwrote the first entry in the file, use the following command to create a readable version of the utmp file, and then look at record 0:

Note: The fwtmp command (bosext2.acct.obj) must first be installed.

 
/usr/sbin/acct/fwtmp < /etc/utmp  >/tmp/out 

A valid entry looks something like this:

 
system boot 0 2 0000 0000 818538505 Sat Dec 9 13:48:25 CST 1995

Instead of the system boot entry, you will probably find an entry like:

 
jones pts/2 19193 7 0000 0000 818683926 Mon Dec 11 06:12:06 CST 1995

This output means that the boot record was overwritten by whatever program jones on pts/2 used to log in. A program should never overwrite the first two entries in the utmp file. You would have to talk with jones to find out what he was running. This problem is almost always caused by a third-party program that writes to the utmp file incorrectly, or by a corrupted file system in which the data is invalid.
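
The check on record 0 can be scripted. The following is a minimal sketch, assuming the dump has the whitespace-separated layout shown in the examples above, where a boot record reads "system boot" and carries type 2 in the fourth field. The function name check_boot_record is hypothetical; /tmp/out is the dump file created by the fwtmp command above.

```shell
# check_boot_record FILE   (hypothetical helper)
# Returns status 0 if the first line of an fwtmp text dump looks
# like a boot record, and status 1 otherwise.
# Assumed layout: "system boot <pid> <type> ...", with type 2
# in the fourth whitespace-separated field.
check_boot_record() {
    awk 'NR == 1 {
             if ($1 == "system" && $2 == "boot" && $4 == 2)
                 ok = 1
             exit
         }
         END { exit (ok ? 0 : 1) }' "$1"
}

# Example:
#   if check_boot_record /tmp/out
#   then echo "record 0 is a boot record"
#   else echo "record 0 has been overwritten"
#   fi
```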


who or w show users logged in when they are not

When a user logs in to the system, the /usr/sbin/getty program writes an entry in /etc/utmp like:

 
sandy pts/17 39667 7 0000 0000 818690973 Mon Dec 11 08:09:33 CST 1995
*                  *
Field #1 = user's name
Field #2 = tty used to log in
Field #3 = PID (process ID)
Field #4 = type of entry

The types of entries can be seen by examining the /usr/include/utmp.h file under ut_type. Type 7 is a USER_PROCESS.

When a user logs out, it is the responsibility of the last process running to update the entry in the utmp file. After a logout, the entry should look like:

 
      pts/17 39667 8 0000 0000 818690973 Mon Dec 11 08:09:33 CST 1995
*                  *

The user name is erased and the state is changed from 7 to 8 (DEAD_PROCESS).

The who command will only show entries that are in state 7.
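
Because who reports every type 7 record, a stale login can be spotted by listing the type 7 records whose PID no longer appears in the process table. The following is a rough sketch under the same layout assumption as the examples above (PID in field 3, type in field 4). The function name list_stale_logins and its two file arguments are hypothetical.

```shell
# list_stale_logins DUMP PIDS   (hypothetical helper)
# DUMP: text dump of the utmp file made with fwtmp.
# PIDS: list of live process IDs, one per line.
# Prints the USER_PROCESS (type 7) records whose PID is not in PIDS.
# The length test skips records with a blank user name, where the
# fourth field is a four-digit status value rather than the type.
list_stale_logins() {
    awk -v pids="$2" '
        BEGIN { while ((getline p < pids) > 0) alive[p] = 1 }
        length($4) == 1 && $4 == 7 && !($3 in alive)
    ' "$1"
}

# Example:
#   /usr/sbin/acct/fwtmp < /etc/utmp > /tmp/out
#   ps -e | awk 'NR > 1 { print $1 }' > /tmp/pids
#   list_stale_logins /tmp/out /tmp/pids
```

The sketch only reports suspect records; it does not modify /etc/utmp.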


How to find out what program caused the corruption

  1. Set up auditing on writes to the utmp file.
  2. Have cron run the who command every minute and append the results to a file.
  3. When you notice corruption with the who or w command, check the cron output files to determine when the corruption occurred.
  4. Look in the audit log to determine what process was writing to the utmp file at the time the corruption occurred.

    This is an example of an audit log output:

     
    event      login  status   time                     command 
    ---------- ------ -------  ----------------------   ------- 
    UTMP_WRITE root   OK       Tue Dec 19 17:00:29 1995 telnetd 
    

    The example above shows that telnetd wrote to the file at 17:00:29.
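
Steps 1 and 2 above might be set up as sketched below. This is an illustration, not a tested AIX configuration: the function name, the script and log paths, and the WHO_CMD override are hypothetical, and the audit stanzas in the comments assume the usual AIX object-auditing files (/etc/security/audit/objects and /etc/security/audit/events). Check them against the documentation for your release.

```shell
# log_who LOGFILE   (hypothetical helper)
# Appends a time-stamped snapshot of the who output to LOGFILE.
# WHO_CMD is a hypothetical override of the command to run;
# it defaults to who.
log_who() {
    {
        echo "=== $(date) ==="
        ${WHO_CMD:-who}
    } >> "$1"
}

# Hypothetical root crontab entry that runs a script calling
# log_who once a minute:
#   * * * * * /usr/local/bin/log_who.sh
#
# Assumed stanza in /etc/security/audit/objects to audit writes:
#   /etc/utmp:
#           w = "UTMP_WRITE"
#
# Assumed matching line in /etc/security/audit/events:
#   UTMP_WRITE = printf "%s"
```

Each snapshot begins with a dated marker line, so the minute in which the who output first went wrong can be matched against the time column of the audit log.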


Known Problems

  1. Fixed in AIX 3.2.5
    IX38013 - Xterm (X11 Release 4) can corrupt utmp
    IX37873 - w command hangs under AIX 3.2.4
  2. Fixed in AIX 3.2.5.1
    IX45401 - Telnet corrupts utmp under certain circumstances
    IX44950 - Rlogind does not write exit utmp entry in some cases
    IX43576 - Init intermittently fails to write utmp entries
    IX43059 - HCF does not clean up utmp file after logoff
    IX46168 - Rlogin causes utmp corruption
    IX45319 - /bin/logout believes corrupted utmp entries (caused by corrupted utmp)
    IX44956 - Xterm does not mark utmp entry on exit
    IX46179 - w command hangs when su - is used under 3.2.5 (caused by corrupted utmp); defect in ptydd drivers
    IX40003 - who -u piped to commands causes garbage with many users logged in (problem with the who command; utmp is OK)
  3. Fixed post 3.2.5.1
    IX56333 - w command hangs if the port was improperly closed. This is a port problem, not a utmp problem; the fix is to time out and check the rest of the ports.

Fixes/Problems

Fixes for AIX Version 3.2 can be downloaded via the Internet with the FixDist service.


How to fix the utmp file

Rebooting clears the utmp file and is the recommended method of correcting the results of corruption.

The following ksh script uses awk to try to clean out bad entries in the /etc/utmp file. It may not clean up certain types of corruption; in those cases a reboot is required to clean up the file.

Warning: Since the utmp file is constantly being changed, there is always the possibility that an attempt at correction (other than by rebooting) may corrupt the /etc/utmp file.

 
#!/usr/bin/ksh 
# utmp_clean.awk 
# 12/12/95 
# awk script to clean out entries in the /etc/utmp file 
# that have no current matching correct process in the 
# process table. 
# This MUST be run by the root user, either from the 
# command line or 
# from the root crontab entry. 
# 
if [ ! -x /usr/sbin/acct/fwtmp ] 
then 
# accounting not installed 
   print "Accounting must be installed first; fwtmp does not exist" 
   exit 1 
fi 
# 
SUM=1 
NEWSUM=0 
while [ "$SUM" != "$NEWSUM" ] 
     do 
          SUM=$(/usr/bin/sum /etc/utmp) 
          /usr/sbin/acct/fwtmp </etc/utmp >/tmp/utmp.out 
          ps au |awk '{print $2,$1,$7}' |grep -v USER >/tmp/ps.out 
          NEWSUM=$(/usr/bin/sum /etc/utmp) 
          # loop until the file is unchanged 
          # on a busy system, this may take a long time. 
        done 
# 
awk ' 
# load the array: lookup[pid] = "user tty" 
BEGIN { 
while ((getline holder < "/tmp/ps.out") > 0) 
     { 
     n=split(holder,temp) 
     lookup[temp[1]]=sprintf("%s %s",temp[2],temp[3]) 
     } 
} # end of BEGIN section 
{ 
if ((length($4) == 1) && ($4 == 7)) 
  { 
  # USER_PROCESS entry: look for a ps table entry with the same pid 
  ps_name=lookup[$3] 
  if (length(ps_name) > 0) 
     { 
     # found a ps table entry with the same pid; 
     # write the entry only if the name and tty match 
     utmp_name=sprintf("%s %s",$1,$2) 
     if (ps_name == utmp_name) 
          print $0 
     } 
  } 
else # Not an entry to look at, just pass it along 
  { 
  print $0 
  } 
}' /tmp/utmp.out > /tmp/utmp.tmp 
/usr/sbin/acct/fwtmp -ic </tmp/utmp.tmp  >/tmp/utmp.new 
# Only if the /etc/utmp file is still unchanged from when 
# we last looked will the file be overwritten with the 
# updated copy. 
#           WARNING WARNING WARNING 
# There is a chance that this step may corrupt the 
# /etc/utmp file if a process changes it after we look 
# and before we can write it. 
CURRENTSUM=$(/usr/bin/sum /etc/utmp) 
if [ "$CURRENTSUM" = "$SUM" ] 
     then 
     /usr/bin/cp /tmp/utmp.new /etc/utmp 
     print "utmp successfully updated on "$(date) 
     else 
     print "utmp was too busy on "$(date)" to update now" 
     print "try again later" 
fi 
rm -f /tmp/ps.out /tmp/utmp.out /tmp/utmp.tmp /tmp/utmp.new 

Problems With The /etc/utmp File: utmp.problems.32.cmd ITEM: FAX
Dated: 98/08/31~00:00 Category: cmd
This HTML file was generated 99/06/24~12:42:03