
System Management Guide:
Operating System and Devices

Starting and Stopping the System

This chapter deals with system startup activities such as booting, creating boot images or files for starting the system, and setting the system run level. Using the reboot and shutdown commands is also covered.

The following topics are included in this chapter:

Booting an Uninstalled System

The procedure for booting a new or uninstalled system is part of the installation process. For information on how to boot an uninstalled system, see Start the System in the AIX 5L Version 5.2 Installation Guide and Reference.

Rebooting a Running System

There are two methods for shutting down and rebooting your system: shutdown and reboot. Always use the shutdown method when multiple users are logged in to the system. Because processes might be running that should be terminated more gracefully than a reboot permits, shutdown is the preferred method for all systems.

Rebooting a Running System Tasks

Web-based System Manager: wsm, then select System
-OR-
Task                            SMIT Fast Path  Command or File
Rebooting a Multiuser System    smit shutdown   shutdown -r
Rebooting a Single-User System  smit shutdown   shutdown -r or reboot

Remotely Rebooting an Unresponsive System

The remote reboot facility allows the system to be rebooted through a native (integrated) serial port. The system is rebooted when the reboot_string is received at the port. This facility is useful when the system does not otherwise respond but is still capable of servicing serial port interrupts. Remote reboot can be enabled on only one native serial port at a time, and users are expected to provide their own external security for the port.

This facility runs at the highest device interrupt class. If the UART (Universal Asynchronous Receiver/Transmitter) fails to clear the transmit buffer quickly, other devices might lose data if their buffers overflow during this time. Use this facility only to reboot a machine that is otherwise hung and cannot be logged in to remotely. File systems are not synchronized, so data that has not been flushed might be lost. When remote reboot is enabled, do not use the port for any other purpose, especially file transfer, to prevent an inadvertent reboot.

Two native serial port attributes control the operation of remote reboot.

reboot_enable

Indicates whether this port is enabled to reboot the machine on receipt of the remote reboot_string, and if so, whether to take a system dump prior to rebooting.

no        - Indicates remote reboot is disabled
reboot    - Indicates remote reboot is enabled
dump      - Indicates remote reboot is enabled, and prior to rebooting a system dump
            will be taken on the primary dump device

reboot_string

Specifies the remote reboot_string that the serial port will scan for when the remote reboot feature is enabled. When the remote reboot feature is enabled and the reboot_string is received on the port, a '>' character is transmitted and the system is ready to reboot. If a '1' character is received, the system is rebooted; any character other than '1' aborts the reboot process. The reboot_string has a maximum length of 16 characters and must not contain a space, colon, equal sign, null, new line, or Ctrl-\ character.
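
The length and character restrictions on the reboot_string can be checked before the value is given to chdev. The following is a minimal sketch in POSIX shell; the function name valid_reboot_string is hypothetical and is not part of the operating system.

```shell
# Return 0 if $1 satisfies the documented reboot_string rules, 1 otherwise.
# (A null character cannot occur in a shell variable, so it is not tested.)
valid_reboot_string() {
    s=$1
    # Maximum length is 16 characters.
    [ "${#s}" -le 16 ] || return 1
    fs=$(printf '\034')   # Ctrl-\ is ASCII 0x1C (octal 034)
    nl='
'
    # Reject space, colon, equal sign, new line, and Ctrl-\.
    case $s in
        *" "*|*:*|*=*|*"$nl"*|*"$fs"*) return 1 ;;
    esac
    return 0
}
```

For example, valid_reboot_string ReBoOtMe succeeds, while a string containing a space or a colon is rejected.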

Remote reboot can be enabled through SMIT or the command line. For SMIT the path System Environments -> Manage Remote Reboot Facility may be used for a configured TTY. Alternatively, when configuring a new TTY, remote reboot may be enabled from the Add a TTY or Change/Show Characteristics of a TTY menus. These menus are accessed through the path Devices -> TTY.

From the command line, the mkdev or chdev commands are used to enable remote reboot. For example, the following command enables remote reboot (with the dump option) and sets the reboot string to ReBoOtMe on tty1.

chdev -l tty1 -a remreboot=dump -a reboot_string=ReBoOtMe

This example enables remote reboot on tty0 and keeps the current reboot_string. Because the -P flag is used, the change is made in the database only and takes effect on the next reboot.

chdev -P -l tty0 -a remreboot=reboot

If the tty is being used as a normal port, then you will have to use the pdisable command before enabling remote reboot. You may use penable to reenable the port afterwards.
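
The order of operations for a port that is currently a normal login port can be sketched as a dry run that only prints the commands. The function name remote_reboot_plan is hypothetical, and the sketch assumes that pdisable accepts the tty name as its argument.

```shell
# Print (do not run) the commands for enabling remote reboot on a port
# that is currently used as a normal port.
#   $1 = tty name   $2 = reboot or dump   $3 = reboot string
remote_reboot_plan() {
    case $2 in
        reboot|dump) ;;
        *) echo "mode must be reboot or dump" >&2; return 1 ;;
    esac
    # Disable logins on the port first, then change the attributes.
    echo "pdisable $1"
    echo "chdev -l $1 -a remreboot=$2 -a reboot_string=$3"
}
```

Review the printed commands before running them as root; for example, remote_reboot_plan tty1 dump ReBoOtMe prints the pdisable step followed by the chdev command shown earlier.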

Booting from Hard Disk for Maintenance

Prerequisites

Bootable removable media (tape or CD-ROM) must not be in the drive. Also, refer to the hardware documentation for the specific instructions to enable maintenance mode boot on your particular model.

Procedure

To boot a machine in maintenance mode from a hard disk:

  1. To reboot, either turn the machine off and then power it back on, or press the reset button.
  2. Press the key sequence for rebooting in maintenance mode that is specified in your hardware documentation.
  3. The machine will boot to a point where it has a console device configured.

    If there is a system dump that needs to be retrieved, the system dump menu will be displayed on the console.

    Note
    If the console fails to configure when there is a dump to be retrieved, the system will hang. The system must be booted from a removable medium to retrieve the dump.
  4. If there is no system dump, or if it has been copied, the diagnostic operating instructions will be displayed. Press Enter to continue to the Function Selection menu.
  5. From the Function Selection menu, you can select diagnostic or single user mode:

    Single-User Mode: To perform maintenance in a single-user environment, choose this option (option 5). The system continues to boot and enters single-user mode. Maintenance that requires the system to be in a standalone mode can be performed in this mode, and the bosboot command can be run, if required.

Booting a System That Crashed

In some instances, you might have to boot a system that has stopped (crashed) without being properly shut down. This procedure covers the basics of how to boot if your system was unable to recover from the crash.

Prerequisites

  1. Your system crashed and was not properly shut down due to unusual conditions.
  2. Your system is turned off.

Procedure

  1. Ensure that all hardware and peripheral devices are correctly connected.
  2. Turn on all of the peripheral devices.
  3. Watch the screen for information about automatic hardware diagnostics.

Accessing a System That Will Not Boot

If you have a system that will not boot from the hard disk, see the procedure on how to access a system that will not boot in Troubleshooting in the AIX 5L Version 5.2 Installation Guide and Reference.

This procedure enables you to get a system prompt so that you can attempt to recover data from the system or perform corrective action enabling the system to boot from the hard disk.

Notes:
  1. This procedure is intended only for experienced system managers who have knowledge of how to boot or recover data from a system that is unable to boot from the hard disk. Most users should not attempt this procedure, but should contact their service representative.
  2. This procedure is not intended for system managers who have just completed a new installation, because in this case the system does not contain data that needs to be recovered. If you are unable to boot from the hard disk after completing a new installation, contact your service representative.

Rebooting a System With Planar Graphics

If the machine has been installed with the planar graphics subsystem only, and later an additional graphics adapter is added to the system, the following occurs:

  1. A new graphics adapter is added to the system, and its associated device driver software is installed.
  2. The system is rebooted, and one of the following occurs:
    1. If the system console is defined to be /dev/lft0 (lscons displays this information), the user is asked to select which display is the system console at reboot time. If the user selects a graphics adapter (non-TTY device), it also becomes the new default display. If the user selects a TTY device instead of an LFT device, no system login appears. Reboot again, and the TTY login screen is displayed. It is assumed that if the user adds an additional graphics adapter into the system and the system console is an LFT device, the user will not select the TTY device as the system console.
    2. If the system console is defined to be a TTY, then at reboot time the newly added display adapter becomes the default display.

      Note: Since the TTY is the system console, it remains the system console.
  3. If the system console is /dev/lft0, then after reboot, DPMS is disabled in order to show the system console selection text on the screen for an indefinite period of time. To re-enable DPMS, reboot the system again.

Diagnosing Boot Problems

A variety of factors can cause a system to be unable to boot:

For information on accessing a system that will not boot from the disk drive, see Accessing a System That Will Not Boot.

Creating Boot Images

To install the base operating system or to access a system that will not boot from the system hard drive, you need a boot image. This procedure describes how to create boot images. The boot image varies for each type of device. The associated RAM disk file system contains device configuration routines for the following devices:

Prerequisites

Creating a Boot Image on a Boot Logical Volume

If the base operating system is being installed (either a new installation or an update), the bosboot command is called to place the boot image on the boot logical volume. The boot logical volume is a physically contiguous area on the disk created through the Logical Volume Manager (LVM) during installation.

The bosboot command does the following:

  1. Checks the file system to see if there is enough room to create the boot image.
  2. Creates a RAM file system using the mkfs command and a prototype file.
  3. Calls the mkboot command, which merges the kernel and the RAM file system into a boot image.
  4. Writes the boot image to the boot logical volume.

To create a boot image on the default boot logical volume on the fixed disk, type the following at a command prompt:

bosboot -a

OR:

bosboot -ad /dev/ipldevice

Note: Do not reboot the machine if the bosboot command fails while creating a boot image. Resolve the problem and run the bosboot command to successful completion.

You must reboot the system for the new boot image to be available for use.
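
The rule in the note above (do not reboot after a failed bosboot) can be enforced in a script by testing the exit status. This is a minimal sketch that assumes bosboot returns a nonzero exit status on failure; the BOSBOOT variable and the function name rebuild_boot_image are hypothetical and exist only so the logic can be tried without running the real command.

```shell
# BOSBOOT can be overridden for a dry run; by default the real command is used.
BOSBOOT=${BOSBOOT:-bosboot}

rebuild_boot_image() {
    # Create the boot image on the default boot logical volume.
    if "$BOSBOOT" -ad /dev/ipldevice; then
        echo "Boot image created. Reboot to start using it."
    else
        echo "bosboot failed. Do not reboot until it completes successfully." >&2
        return 1
    fi
}
```

The function deliberately does not reboot the system itself; it only reports whether a reboot is safe.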

Creating a Boot Image for a Network Device

To create a boot image for an Ethernet boot, type the following at a command prompt:

bosboot -ad /dev/ent

For a Token-Ring boot:

bosboot -ad /dev/tok

Identifying System Run Levels

Before performing maintenance on the operating system or changing the system run level, you might need to examine the various run levels. This procedure describes how to identify the run level at which the system is operating and how to display a history of previous run levels. The init command determines the system run level.

Identifying the Current Run Level

At the command line, type cat /etc/.init.state. The system displays one digit; that is the current run level. See the init command or the /etc/inittab file for more information about run levels.
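
In a script, the same check can be wrapped in a small function. This is a sketch only; the function name current_run_level and the INIT_STATE_FILE override are hypothetical and are provided so the function can be tried against a test file.

```shell
# Print the current run level read from the init state file.
current_run_level() {
    f=${INIT_STATE_FILE:-/etc/.init.state}
    [ -r "$f" ] || { echo "cannot read $f" >&2; return 1; }
    # Strip spaces and the trailing new line so only the run level remains.
    tr -d ' \n' < "$f"
    echo
}
```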

Displaying a History of Previous Run Levels

You can display a history of previous run levels using the fwtmp command.

Note: The bosext2.acct.obj code must be installed on your system to use this command.
  1. Log in as root user.
  2. Type the following at a command prompt:

    /usr/lib/acct/fwtmp </var/adm/wtmp |grep run-level

    The system displays information similar to the following:

    run-level 2  0 1 0062 0123 697081013 Sun Feb  2 19:36:53 CST 1992
    run-level 2  0 1 0062 0123 697092441 Sun Feb  2 22:47:21 CST 1992
    run-level 4  0 1 0062 0123 698180044 Sat Feb 15 12:54:04 CST 1992
    run-level 2  0 1 0062 0123 698959131 Sun Feb 16 10:52:11 CST 1992
    run-level 5  0 1 0062 0123 698967773 Mon Feb 24 15:42:53 CST 1992
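
The listing above can be reduced to the run level and the time of each change with awk. This sketch assumes the field layout shown in the sample output (run level in field 2, month, day, time, and year in fields 9, 10, 11, and 13); the function name run_level_history is hypothetical.

```shell
# Read fwtmp output on standard input and print "level month day time year"
# for every run-level record.
run_level_history() {
    awk '$1 == "run-level" { print $2, $9, $10, $11, $13 }'
}
```

A typical invocation, run as root, would be /usr/lib/acct/fwtmp </var/adm/wtmp | run_level_history.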

Changing System Run Levels

This procedure describes two methods for changing system run levels for multi-user or single-user systems.

When the system starts the first time, it enters the default run level defined by the initdefault entry in the /etc/inittab file. The system operates at that run level until it receives a signal to change it.

The following are the currently defined run levels:

0-9      When the init command changes to run levels 0-9, it kills all processes at the current run levels, then restarts any processes associated with the new run levels.
0-1      Reserved for the future use of the operating system.
2        Default run level.
3-9      Can be defined according to the user's preferences.
a, b, c  When the init command requests a change to run levels a, b, or c, it does not kill processes at the current run levels; it simply starts any processes assigned to the new run levels.
Q, q     Tells the init command to reexamine the /etc/inittab file.
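
The difference between the three groups of run levels can be summarized in a short lookup function. This is an illustrative sketch only; the function name describe_run_level is hypothetical, and the text it prints paraphrases the list above.

```shell
# Print what the init command does when asked to change to run level $1.
describe_run_level() {
    case $1 in
        [0-9]) echo "kill processes at the current level, then start processes for level $1" ;;
        a|b|c) echo "start processes for level $1 without killing current processes" ;;
        Q|q)   echo "reexamine /etc/inittab" ;;
        *)     echo "unknown run level: $1" >&2; return 1 ;;
    esac
}
```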

Changing Run Levels on Multiuser Systems

  1. Check the /etc/inittab file to confirm that the run level to which you are changing supports the processes that you are running. The getty process is particularly important, since it controls the terminal line access for the system console and other logins. Ensure that the getty process is enabled at all run levels.
  2. Use the wall command to inform all users that you intend to change the run level and request that users log off.
  3. Use the smit telinit fast path to access the Set System Run Level menu.
  4. Type the new run level in the System RUN LEVEL field.
  5. Press Enter to implement all of the settings in this procedure.

    The system responds by telling you which processes are terminating or starting as a result of the change in run level and by displaying the message:

    INIT: New run level: n

    where n is the new run-level number.

Changing Run Levels on Single-User Systems

  1. Check the /etc/inittab file to confirm that the run level to which you are changing supports the processes that you are running. The getty process is particularly important, since it controls the terminal line access for the system console and other logins. Ensure that the getty process is enabled at all run levels.
  2. Use the smit telinit fast path to access the Set System Run Level menu.
  3. Type the new system run level in the System RUN LEVEL field.
  4. Press Enter to implement all of the settings in this procedure.

    The system responds by telling you which processes are terminating or starting as a result of the change in run level and by displaying the message:

    INIT: New run level: n

    where n is the new run-level number.

Changing the /etc/inittab File

This section contains procedures for using the four commands (chitab, lsitab, mkitab, and rmitab) that modify the records in the /etc/inittab file.

Adding Records - mkitab Command

To add a record to the /etc/inittab file, type the following at a command prompt:

mkitab "Identifier:RunLevel:Action:Command"

For example, to add a record for tty2, type the following at a command prompt:

mkitab "tty002:2:respawn:/usr/sbin/getty /dev/tty2"

The record is enclosed in quotation marks so that the shell passes it to the command as a single argument, including the space in the Command field.

In the above example:

tty002                     Identifies the object whose run level you are defining.
2                          Specifies the run level at which this process runs.
respawn                    Specifies the action that the init command should take for this process.
/usr/sbin/getty /dev/tty2  Specifies the shell command to be executed.
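
The four colon-separated fields of a record can be split apart in the shell. The following is a minimal sketch; the function name parse_inittab_record is hypothetical. Because the last variable given to read receives the remainder of the line, any spaces or colons in the Command field are preserved.

```shell
# Split one inittab record ($1) into its four fields and print them.
parse_inittab_record() {
    printf '%s\n' "$1" | {
        # IFS is set to a colon for the read command only.
        IFS=: read -r id runlevels action command
        printf 'Identifier=%s\nRunLevel=%s\nAction=%s\nCommand=%s\n' \
            "$id" "$runlevels" "$action" "$command"
    }
}
```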

Changing Records - chitab Command

To change a record in the /etc/inittab file, type the following at a command prompt:

chitab "Identifier:RunLevel:Action:Command"

For example, to change a record for tty2 so that this process runs at run levels 2 and 3, type:

chitab "tty002:23:respawn:/usr/sbin/getty /dev/tty2"

In the above example:

tty002                     Identifies the object whose run level you are defining.
23                         Specifies the run levels at which this process runs.
respawn                    Specifies the action that the init command should take for this process.
/usr/sbin/getty /dev/tty2  Specifies the shell command to be executed.
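
Because the run level field is a string of level characters such as 23, a script can test whether a record applies to a given level with a pattern match. This sketch assumes the field lists its levels explicitly; the function name record_runs_at is hypothetical.

```shell
# Return 0 if the inittab record in $1 lists run level $2, 1 otherwise.
record_runs_at() {
    # The run level list is the second colon-separated field.
    levels=$(printf '%s\n' "$1" | cut -d: -f2)
    case $levels in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}
```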

Listing Records - lsitab Command

To list all records in the /etc/inittab file, type the following at a command prompt:

lsitab -a

To list a specific record in the /etc/inittab file, type:

lsitab Identifier

For example, to list the record for tty2 that was added with the identifier tty002, type: lsitab tty002.

Removing Records - rmitab Command

To remove a record from the /etc/inittab file, type the following at a command prompt:

rmitab Identifier

For example, to remove the record for tty2 that was added with the identifier tty002, type: rmitab tty002.
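
The four commands can be combined so that a script adds a record if it is missing and changes it if it already exists. This is a sketch under one stated assumption: that lsitab returns a nonzero exit status when no record has the given identifier. The function name ensure_inittab_record is hypothetical.

```shell
# Add the record in $1 to /etc/inittab, or change it if a record with the
# same identifier already exists.
ensure_inittab_record() {
    # The identifier is everything before the first colon.
    id=${1%%:*}
    if lsitab "$id" >/dev/null 2>&1; then
        chitab "$1"
    else
        mkitab "$1"
    fi
}
```

For example, ensure_inittab_record "tty002:23:respawn:/usr/sbin/getty /dev/tty2" can be run repeatedly without creating duplicate records.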

Stopping the System

The shutdown command is the safest and most thorough way to halt the operating system. When you designate the appropriate flags, this command notifies users that the system is about to go down, kills all existing processes, unmounts file systems, and halts the system. The following methods for shutting down the system are covered in this section:

Shutting Down the System without Rebooting

You can use two methods to shut down the system without rebooting: the SMIT fastpath, or the shutdown command.

Prerequisites

You must have root user authority to shut down the system.

Procedure

To shut down the system using SMIT:

  1. Log in as root.
  2. At the command prompt, type:

    smit shutdown

To shut down the system using the shutdown command:

  1. Log in as root.
  2. At the command prompt, type:

    shutdown

Shutting Down the System to Single-User Mode

In some cases, you might need to shut down the system and enter single-user mode to perform software maintenance and diagnostics.

  1. Type cd / to change to the root directory. You must be in the root directory to shut down the system to single-user mode to ensure that file systems are unmounted cleanly.
  2. Type shutdown -m. The system shuts down to single-user mode. A system prompt displays and you can perform maintenance activities.

Shutting Down the System in an Emergency

You can also use the shutdown command to shut down the system under emergency conditions. Use this procedure to stop the system quickly without notifying other users.

Type shutdown -F. The -F flag instructs the shutdown command to bypass sending messages to other users and shut down the system as quickly as possible.
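
The shutdown variants described in this chapter can be summarized in a small helper that prints the matching command instead of running it. The function name shutdown_command and its keyword arguments are hypothetical; only the flags come from this chapter.

```shell
# Print the shutdown command for a given purpose; nothing is executed.
shutdown_command() {
    case $1 in
        halt)      echo "shutdown" ;;      # shut down without rebooting
        reboot)    echo "shutdown -r" ;;   # shut down and reboot
        single)    echo "shutdown -m" ;;   # shut down to single-user mode
        emergency) echo "shutdown -F" ;;   # fast shutdown, no user messages
        *) echo "usage: shutdown_command halt|reboot|single|emergency" >&2
           return 1 ;;
    esac
}
```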

Reactivating an Inactive System

Your system can become inactive because of a hardware problem, a software problem, or a combination of both. This procedure guides you through steps to correct the problem and restart your system. If your system is still inactive after completing the procedure, refer to the problem-determination information in your hardware documentation.

Use the following procedures to reactivate an inactive system:

Checking the Hardware

Check your hardware by:

Checking the Power

If the Power-On light on your system is active, go to Checking the Operator Panel Display.

If the Power-On light on your system is not active, check that the power is on and the system is plugged in.

Checking the Operator Panel Display

If your system has an operator panel display, check it for any messages.

If the operator panel display on your system is blank, go to Activating Your Display or Terminal.

If the operator panel display on your system is not blank, go to the service guide for your unit to find information concerning digits in the Operator Panel Display.

Activating Your Display or Terminal

Check several parts of your display or terminal, as follows:

If your system is now active, your hardware checks have corrected the problem.

If your system became inactive while you were trying to restart the system, go to Restarting the System.

If your system did not become inactive while you were trying to restart the system, go to Checking the Processes.

Checking the Processes

A stopped or stalled process might make your system inactive. Check your system processes by:

Restarting Line Scrolling

Restart line scrolling halted by the Ctrl-S key sequence by doing the following:

  1. Activate the window or shell with the problem process.
  2. Press the Ctrl-Q key sequence to restart scrolling.

The Ctrl-S key sequence stops line scrolling, and the Ctrl-Q key sequence restarts line scrolling.

If your scroll check did not correct the problem with your inactive system, go to the next step, Using the Ctrl-D Key Sequence.

Using the Ctrl-D Key Sequence

End a stopped process by doing the following:

  1. Activate the window or shell with the problem process.
  2. Press the Ctrl-D key sequence. The Ctrl-D key sequence sends an end of file (EOF) signal to the process. The Ctrl-D key sequence may close the window or shell and log you out.

If the Ctrl-D key sequence did not correct the problem with your inactive system, go to the next step, Using the Ctrl-C Key Sequence.

Using the Ctrl-C Key Sequence

End a stopped process by doing the following:

  1. Activate the window or shell with the problem process.
  2. Press the Ctrl-C key sequence. The Ctrl-C key sequence stops the current search or filter.

If the Ctrl-C key sequence did not correct the problem with your inactive system, go to the next step, Logging In from a Remote Terminal or Host.

Logging In from a Remote Terminal or Host

Log in remotely in either of two ways:

If you were able to log in to the system from a remote terminal or host, go to the next step, Ending Stalled Processes Remotely.

If you were not able to log in to the system from a remote terminal or host, go to Restarting the System.

You can also start a system dump to determine why your system became inactive. For more information, see System Dump Facility.

Ending Stalled Processes Remotely

End a stalled process from a remote terminal by doing the following:

  1. List active processes by typing the following ps command:

    ps -ef

    The -e flag lists all processes, and the -f flag generates a full listing for each one, including the process ID and the command.

  2. Identify the process ID of the stalled process.

    For help in identifying processes, use the grep command with a search string. For example, to end the xlock process, type the following to find the process ID:

    ps -ef | grep xlock

    The grep command allows you to search on the output from the ps command to identify the process ID of a specific process.

  3. End the process by typing the following kill command:

    Note: You must have root user authority to use the kill command on processes you did not initiate.

    kill -9 ProcessID

    If you cannot identify the problem process, the most recently activated process might be the cause of your inactive system. End the most recent process if you think that is the problem.
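
Steps 1 and 2 can be combined into a filter that prints only the process IDs. This is a minimal sketch; the function name pids_matching is hypothetical. It assumes the process ID is the second column of the ps -ef output, and it skips any line that contains the word grep so that the search command itself is never selected.

```shell
# Read "ps -ef" output on standard input and print the process ID of every
# line that contains the string in $1.
pids_matching() {
    awk -v pat="$1" 'NR > 1 && index($0, pat) && !index($0, "grep") { print $2 }'
}
```

For example, ps -ef | pids_matching xlock prints the process IDs to pass to the kill command. Check the list before ending any process.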

If your process checks have not corrected the problem with your inactive system, go to Restarting the System.

You can also start a system dump to determine why your system became inactive. For more information, see System Dump Facility.

Restarting the System

If the first two procedures fail to correct the problem that makes your system inactive, you need to restart your system.

Note
Before restarting your system, complete a system dump. For more information, see System Dump Facility.

This procedure involves the following:

Checking the State of the Boot Device

Your system boots with either a removable medium, an external device, a small computer system interface (SCSI) device, an integrated device electronics (IDE) device, or a local area network (LAN). Decide which method applies to your system, and use the following instructions to check the boot device:

If the boot device is working correctly, go to Loading the Operating System.

Loading the Operating System

Load your operating system by doing the following:

  1. Turn off your system's power.
  2. Wait one minute.
  3. Turn on your system's power.
  4. Wait for the system to boot.

If the operating system failed to load, boot from the hard disk in maintenance mode or run hardware diagnostics.

If you are still unable to restart the system, use an SRN (service request number) to report the problem with your inactive system to your service representative.

System Hang Management

System hang management allows users to run mission-critical applications continuously while improving application availability. System hang detection alerts the system administrator of possible problems and then allows the administrator to log in as root or to reboot the system to resolve the problem.

shconf Command

The shconf command is invoked when System Hang Detection is enabled. The shconf command configures which events are surveyed and what actions are to be taken if such events occur. You can specify any of the following actions, the priority level to check, the time out while no process or thread executes at a lower or equal priority, the terminal device for the warning action, and the getty command action:

For the Launch a command and Give a special getty options, system hang detection launches the special getty command or the specified command at the highest priority. The special getty command prints a warning message that it is a recovering getty running at priority 0. The following table captures the various actions and the associated default parameters for priority hang detection. Only one action is enabled for each type of detection.

Option                       Enablement  Priority  Timeout (seconds)
Log an error in errlog file  disabled    60        120
Display a warning message    disabled    60        120
Give a recovering getty      enabled     60        120
Launch a command             disabled    60        120
Reboot the system            disabled    39        300

Note
When Launch a recovering getty on a console is enabled, the shconf command adds the -u flag to the getty command in the inittab that is associated with the console login.

For lost I/O detection, you can set the time out value and enable the following actions:

Option                     Enablement
Display a warning message  disabled
Reboot the system          disabled

Lost I/O events are recorded in the AIX error log file.

shdaemon Daemon

The shdaemon daemon is a process that is launched by init and runs at priority 0 (zero). It is in charge of handling system hang detection by retrieving configuration information, initiating working structures, and starting detection times set by the user.

Changing the System Hang Detection Configuration

You can manage the system hang detection configuration from the SMIT management tool. SMIT menu options allow you to enable or disable the detection mechanism, display the current state of the feature, and change or show the current configuration. The fast paths for system hang detection menus are:

smit shd         Manage System Hang Detection
smit shstatus    System Hang Detection Status
smit shpriocfg   Change/Show Characteristics of Priority Problem Detection
smit shreset     Restore Default Priority Problem Configuration
smit shliocfg    Change/Show Characteristics of Lost I/O Detection
smit shlioreset  Restore Default Lost I/O Detection Configuration

You can also manage system hang detection using the shconf command, which is documented in AIX 5L Version 5.2 Commands Reference.
