This chapter deals with system startup activities such as booting, creating boot images or files for starting the system, and setting the system run level. Using the reboot and shutdown commands is also covered.
The following topics are included in this chapter:
The procedure for booting a new or uninstalled system is part of the installation process. For information on how to boot an uninstalled system, see Start the System in the AIX 5L Version 5.2 Installation Guide and Reference.
There are two methods for shutting down and rebooting your system: shutdown and reboot. Always use the shutdown method when multiple users are logged on to the system. Because processes might be running that should be terminated more gracefully than a reboot permits, shutdown is the preferred method for all systems.
Rebooting a Running System Tasks
Web-based System Manager: wsm, then select System
-OR-
Task | SMIT Fast Path | Command or File |
---|---|---|
Rebooting a Multiuser System | smit shutdown | shutdown -r |
Rebooting a Single-User System | smit shutdown | shutdown -r or reboot |
The remote reboot facility allows the system to be rebooted through a native (integrated) serial port. The system is rebooted when the reboot_string is received at the port. This facility is useful when the system does not otherwise respond but is capable of servicing serial port interrupts. Remote reboot can be enabled on only one native serial port at a time. Users are expected to provide their own external security for the port.
This facility runs at the highest device interrupt class. If the UART (Universal Asynchronous Receiver/Transmitter) fails to clear the transmit buffer quickly, other devices might lose data if their buffers overflow during this time. It is suggested that this facility be used only to reboot a machine that is otherwise hung and cannot be logged in to remotely. File systems are not synchronized, so data that has not been flushed might be lost. When remote reboot is enabled, it is strongly suggested that the port not be used for any other purpose, especially file transfer, to prevent an inadvertent reboot.
Two native serial port attributes control the operation of remote reboot.
remreboot
Indicates whether this port is enabled to reboot the machine on receipt of the remote reboot_string, and if so, whether to take a system dump prior to rebooting. The possible values are:
- no: Indicates remote reboot is disabled.
- reboot: Indicates remote reboot is enabled.
- dump: Indicates remote reboot is enabled, and prior to rebooting a system dump will be taken on the primary dump device.
reboot_string
Specifies the remote reboot_string that the serial port scans for when the remote reboot feature is enabled. When the remote reboot feature is enabled and the reboot_string is received on the port, a '>' character is transmitted and the system is ready to reboot. If a '1' character is then received, the system is rebooted; any character other than '1' aborts the reboot process. The reboot_string has a maximum length of 16 characters and must not contain a space, colon, equal sign, null, new line, or Ctrl-\ character.
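The restrictions on the reboot_string can be checked before the value is passed to chdev. The following is a minimal sketch only: the function name valid_reboot_string is hypothetical, and rejecting an empty string is an assumption not stated above. A null character cannot occur in a shell variable, so it is not tested.

```shell
# Hypothetical helper: return 0 if the candidate reboot_string obeys the
# stated limits (at most 16 characters; no space, colon, equal sign,
# new line, or Ctrl-\ character), and 1 otherwise.
valid_reboot_string() {
    s=$1
    # Assumption: an empty string is not a usable reboot_string.
    [ -n "$s" ] || return 1
    # Maximum length is 16 characters.
    [ "${#s}" -le 16 ] || return 1
    fs=$(printf '\034')   # Ctrl-\ is the ASCII FS character (octal 034)
    nl='
'
    case $s in
        *" "*|*:*|*=*|*"$nl"*|*"$fs"*) return 1 ;;
    esac
    return 0
}
```

For example, valid_reboot_string ReBoOtMe succeeds, while valid_reboot_string "re boot" fails because of the embedded space.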
Remote reboot can be enabled through SMIT or the command line. For SMIT the path System Environments -> Manage Remote Reboot Facility may be used for a configured TTY. Alternatively, when configuring a new TTY, remote reboot may be enabled from the Add a TTY or Change/Show Characteristics of a TTY menus. These menus are accessed through the path Devices -> TTY.
From the command line, the mkdev or chdev commands are used to enable remote reboot. For example, the following command enables remote reboot (with the dump option) and sets the reboot string to ReBoOtMe on tty1.
chdev -l tty1 -a remreboot=dump -a reboot_string=ReBoOtMe
The following example enables remote reboot on tty0 and keeps the current reboot_string. Because of the -P flag, the change is made in the database only and takes effect on the next reboot.
chdev -P -l tty0 -a remreboot=reboot
If the tty is being used as a normal port, then you will have to use the pdisable command before enabling remote reboot. You may use penable to reenable the port afterwards.
Bootable removable media (tape or CD-ROM) must not be in the drive. Also, refer to the hardware documentation for specific instructions on enabling maintenance mode boot on your particular model.
To boot a machine in maintenance mode from a hard disk:
If there is a system dump that needs to be retrieved, the system dump menu will be displayed on the console.
Single-User Mode: To perform maintenance in a single-user environment, choose this option (option 5). The system continues to boot and enters single-user mode. Maintenance that requires the system to be in a standalone mode can be performed in this mode, and the bosboot command can be run, if required.
In some instances, you might have to boot a system that has stopped (crashed) without being properly shut down. This procedure covers the basics of how to boot if your system was unable to recover from the crash.
If you have a system that will not boot from the hard disk, see the procedure on how to access a system that will not boot in Troubleshooting in the AIX 5L Version 5.2 Installation Guide and Reference.
This procedure enables you to get a system prompt so that you can attempt to recover data from the system or perform corrective action enabling the system to boot from the hard disk.
Notes:
- This procedure is intended only for experienced system managers who have knowledge of how to boot or recover data from a system that is unable to boot from the hard disk. Most users should not attempt this procedure, but should contact their service representative.
- This procedure is not intended for system managers who have just completed a new installation, because in this case the system does not contain data that needs to be recovered. If you are unable to boot from the hard disk after completing a new installation, contact your service representative.
If the machine has been installed with the planar graphics subsystem only, and an additional graphics adapter is later added to the system, the following occurs:
Note: Since the TTY is the system console, it remains the system console.
A variety of factors can cause a system to be unable to boot:
For information on accessing a system that will not boot from the disk drive, see Accessing a System That Will Not Boot.
To install the base operating system or to access a system that will not boot from the system hard drive, you need a boot image. This procedure describes how to create boot images. The boot image varies for each type of device. The associated RAM disk file system contains device configuration routines for the following devices:
lsvg -l rootvg
The lsvg -l command lists the logical volumes on the root volume group (rootvg). From this list you can find the name of the boot logical volume. Then type the following at a command prompt:
lsvg -M rootvg
The lsvg -M command lists the physical disks that contain the various logical volumes.
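Finding the boot logical volume in the lsvg -l listing can be scripted. This sketch assumes that the listing has the logical volume name in the first column and its type in the second, and that the boot logical volume has the type boot; the function name boot_lv_name is hypothetical.

```shell
# Hypothetical helper: read "lsvg -l" output on standard input and print
# the name of each logical volume whose TYPE column is "boot".
# Assumption: column 1 is the LV name and column 2 is the LV type.
boot_lv_name() {
    awk '$2 == "boot" { print $1 }'
}

# Usage on an AIX system:
#   lsvg -l rootvg | boot_lv_name
```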
If the base operating system is being installed (either a new installation or an update), the bosboot command is called to place the boot image on the boot logical volume. The boot logical volume is a physically contiguous area on the disk created through the Logical Volume Manager (LVM) during installation.
The bosboot command does the following:
To create a boot image on the default boot logical volume on the fixed disk, type the following at a command prompt:
bosboot -a
OR:
bosboot -ad /dev/ipldevice
Note: Do not reboot the machine if the bosboot command fails while creating a boot image. Resolve the problem and run the bosboot command to successful completion.
You must reboot the system for the new boot image to be available for use.
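The rule in the note above (never reboot after a failed bosboot) can be built into a small wrapper. This is a sketch only; the function name rebuild_boot_image is hypothetical, and it simply runs whatever command it is given and reports on the exit status.

```shell
# Hypothetical wrapper: run the given boot-image command and state
# clearly whether it is safe to reboot afterwards.
rebuild_boot_image() {
    if "$@"; then
        echo "Boot image created; reboot to use it."
    else
        rc=$?
        echo "Command failed (exit $rc); do NOT reboot until it succeeds." >&2
        return "$rc"
    fi
}

# Usage on an AIX system, as root:
#   rebuild_boot_image bosboot -ad /dev/ipldevice
```

Because the wrapper returns the failing exit status, a calling script can stop before it reaches any reboot step.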
To create a boot image for an Ethernet boot, type the following at a command prompt:
bosboot -ad /dev/ent
For a Token-Ring boot:
bosboot -ad /dev/tok
Before performing maintenance on the operating system or changing the system run level, you might need to examine the various run levels. This procedure describes how to identify the run level at which the system is operating and how to display a history of previous run levels. The init command determines the system run level.
At the command line, type cat /etc/.init.state. The system displays one digit; that is the current run level. See the init command or the /etc/inittab file for more information about run levels.
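In a script, the same check might be wrapped as follows. This is a sketch; the function name current_run_level is hypothetical, and the optional file argument exists only so that the function can be tried against a test file.

```shell
# Hypothetical helper: print the current run level as recorded in the
# init state file. An alternative file can be given as the first argument.
current_run_level() {
    state_file=${1:-/etc/.init.state}
    if [ ! -r "$state_file" ]; then
        echo "cannot read $state_file" >&2
        return 1
    fi
    # Strip blanks and line ends, then print the value on its own line.
    tr -d ' \n' < "$state_file"
    echo
}

# Usage:
#   current_run_level
```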
You can display a history of previous run levels using the fwtmp command.
Note: The bosext2.acct.obj code must be installed on your system to use this command.
/usr/lib/acct/fwtmp < /var/adm/wtmp | grep run-level
The system displays information similar to the following:
run-level 2 0 1 0062 0123 697081013 Sun Feb 2 19:36:53 CST 1992
run-level 2 0 1 0062 0123 697092441 Sun Feb 2 22:47:21 CST 1992
run-level 4 0 1 0062 0123 698180044 Sat Feb 15 12:54:04 CST 1992
run-level 2 0 1 0062 0123 698959131 Sun Feb 16 10:52:11 CST 1992
run-level 5 0 1 0062 0123 698967773 Mon Feb 24 15:42:53 CST 1992
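To reduce this output to the run level and the time of each change, the records can be filtered with awk. The sketch below assumes the whitespace-separated layout shown in the sample output, with the run level in the second field and the readable date starting at the eighth field; the function name run_level_history is hypothetical.

```shell
# Hypothetical helper: read fwtmp output on standard input and print
# "<run level> <date>" for each run-level record.
# Assumption: field 2 is the run level and fields 8 onward are the date.
run_level_history() {
    awk '$1 == "run-level" {
        printf "%s", $2
        for (i = 8; i <= NF; i++) printf " %s", $i
        printf "\n"
    }'
}

# Usage on an AIX system:
#   /usr/lib/acct/fwtmp < /var/adm/wtmp | run_level_history
```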
This procedure describes two methods for changing system run levels for multi-user or single-user systems.
When the system starts the first time, it enters the default run level defined by the initdefault entry in the /etc/inittab file. The system operates at that run level until it receives a signal to change it.
The following are the currently defined run levels:
0-9 | When the init command changes to run levels 0-9, it kills all processes at the current run levels then restarts any processes associated with the new run levels. |
0-1 | Reserved for the future use of the operating system. |
2 | Default run level. |
3-9 | Can be defined according to the user's preferences. |
a, b, c | When the init command requests a change to run levels a, b, or c, it does not kill processes at the current run levels; it simply starts any processes assigned with the new run levels. |
Q, q | Tells the init command to reexamine the /etc/inittab file. |
The system responds by telling you which processes are terminating or starting as a result of the change in run level and by displaying the message:
INIT: New run level: n
where n is the new run-level number.
This section contains procedures for using the four commands (chitab, lsitab, mkitab, and rmitab) that modify the records in the /etc/inittab file.
To add a record to the /etc/inittab file, type the following at a command prompt:
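mkitab "Identifier:RunLevel:Action:Command"
For example, to add a record for tty2, type the following at a command prompt. The record is enclosed in quotation marks so that the shell passes it to mkitab as a single argument:
mkitab "tty002:2:respawn:/usr/sbin/getty /dev/tty2"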
In the above example:
tty002 | Identifies the object whose run level you are defining. |
2 | Specifies the run level at which this process runs. |
respawn | Specifies the action that the init command should take for this process. |
/usr/sbin/getty /dev/tty2 | Specifies the shell command to be executed. |
To change a record in the /etc/inittab file, type the following at a command prompt:
chitab "Identifier:RunLevel:Action:Command"
For example, to change a record for tty2 so that this process runs at run levels 2 and 3, type:
chitab "tty002:23:respawn:/usr/sbin/getty /dev/tty2"
In the above example:
tty002 | Identifies the object whose run level you are defining. |
23 | Specifies the run levels at which this process runs. |
respawn | Specifies the action that the init command should take for this process. |
/usr/sbin/getty /dev/tty2 | Specifies the shell command to be executed. |
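The four colon-separated fields can be assembled and checked before they are passed to mkitab or chitab. This is a minimal sketch: the function name inittab_record is hypothetical, and the run-level check accepts only the digits 0-9 and the letters a, b, and c from the run-level table in this chapter.

```shell
# Hypothetical helper: build an inittab record from its four fields.
# Arguments: identifier, run levels, action, then the command and its
# arguments. Prints "Identifier:RunLevel:Action:Command".
inittab_record() {
    id=$1
    levels=$2
    action=$3
    shift 3
    # The identifier must not be empty or contain the field separator.
    case $id in
        ""|*:*) echo "bad identifier: $id" >&2; return 1 ;;
    esac
    # Run levels are limited to 0-9 and a, b, c in this sketch.
    case $levels in
        *[!0-9abc]*) echo "bad run levels: $levels" >&2; return 1 ;;
    esac
    printf '%s:%s:%s:%s\n' "$id" "$levels" "$action" "$*"
}

# Usage on an AIX system:
#   mkitab "$(inittab_record tty002 2 respawn /usr/sbin/getty /dev/tty2)"
```

Quoting the command substitution keeps the record, including the space inside the command field, as a single argument.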
To list all records in the /etc/inittab file, type the following at a command prompt:
lsitab -a
To list a specific record in the /etc/inittab file, type:
lsitab Identifier
For example, to list the record for tty2 that was added earlier with the identifier tty002, type: lsitab tty002.
To remove a record from the /etc/inittab file, type the following at a command prompt:
rmitab Identifier
For example, to remove the record for tty2 that was added earlier with the identifier tty002, type: rmitab tty002.
The shutdown command is the safest and most thorough way to halt the operating system. When you designate the appropriate flags, this command notifies users that the system is about to go down, kills all existing processes, unmounts file systems, and halts the system. The following methods for shutting down the system are covered in this section:
You can use two methods to shut down the system without rebooting: the SMIT fastpath, or the shutdown command.
You must have root user authority to shut down the system.
To shut down the system using SMIT:
smit shutdown
To shut down the system using the shutdown command:
shutdown
In some cases, you might need to shut down the system and enter single-user mode to perform software maintenance and diagnostics.
You can also use the shutdown command to shut down the system under emergency conditions. Use this procedure to stop the system quickly without notifying other users.
Type shutdown -F. The -F flag instructs the shutdown command to bypass sending messages to other users and shut down the system as quickly as possible.
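The shutdown variants named in this chapter can be summarized in a small lookup. This sketch only prints the command line for review and never runs it; the function name shutdown_command and the mode names halt, reboot, and emergency are hypothetical.

```shell
# Hypothetical helper: print the shutdown command line that matches a
# named mode. Nothing is executed; the caller decides whether to run it.
shutdown_command() {
    case $1 in
        halt)      echo "shutdown" ;;      # shut down without rebooting
        reboot)    echo "shutdown -r" ;;   # shut down, then reboot
        emergency) echo "shutdown -F" ;;   # fast shutdown, no user messages
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   shutdown_command emergency
```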
Your system can become inactive because of a hardware problem, a software problem, or a combination of both. This procedure guides you through steps to correct the problem and restart your system. If your system is still inactive after completing the procedure, refer to the problem-determination information in your hardware documentation.
Use the following procedures to reactivate an inactive system:
Check your hardware by:
If the Power-On light on your system is active, go to Checking the Operator Panel Display.
If the Power-On light on your system is not active, check that the power is on and the system is plugged in.
If your system has an operator panel display, check it for any messages.
If the operator panel display on your system is blank, go to Activating Your Display or Terminal.
If the operator panel display on your system is not blank, go to the service guide for your unit to find information concerning digits in the Operator Panel Display.
Check several parts of your display or terminal, as follows:
If your system is now active, your hardware checks have corrected the problem.
If your system became inactive while you were trying to restart the system, go to Restarting the System.
If your system did not become inactive while you were trying to restart the system, go to Checking the Processes.
A stopped or stalled process might make your system inactive. Check your system processes by:
Restart line scrolling halted by the Ctrl-S key sequence by doing the following:
The Ctrl-S key sequence stops line scrolling, and the Ctrl-Q key sequence restarts line scrolling.
If your scroll check did not correct the problem with your inactive system, go to the next step, Using the Ctrl-D Key Sequence.
End a stopped process by doing the following:
If the Ctrl-D key sequence did not correct the problem with your inactive system, go to the next step, Using the Ctrl-C Key Sequence.
End a stopped process by doing the following:
If the Ctrl-C key sequence did not correct the problem with your inactive system, go to the next step, Logging In from a Remote Terminal or Host.
Log in remotely in either of two ways:
tn YourSystemName
The system asks for your regular login name and password when you use the tn command.
If you were able to log in to the system from a remote terminal or host, go to the next step, Ending Stalled Processes Remotely.
If you were not able to log in to the system from a remote terminal or host, go to Restarting the System.
You can also start a system dump to determine why your system became inactive. For more information, see System Dump Facility.
End a stalled process from a remote terminal by doing the following:
ps -ef
The -e and -f flags identify all active and inactive processes.
For help in identifying processes, use the grep command with a search string. For example, to end the xlock process, type the following to find the process ID:
ps -ef | grep xlock
The grep command allows you to search on the output from the ps command to identify the process ID of a specific process.
Note: You must have root user authority to use the kill command on processes you did not initiate.
kill -9 ProcessID
If you cannot identify the problem process, the most recently activated process might be the cause of your inactive system. End the most recent process if you think that is the problem.
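The lookup step can be scripted so that only process IDs are printed. This sketch assumes that the ps -ef listing starts with a header line and has the process ID in the second column; the function name pids_matching is hypothetical. The search string is passed to awk through the environment rather than on its command line, so the filter does not match its own entry in the listing.

```shell
# Hypothetical helper: read "ps -ef" output on standard input and print
# the process ID of every line that contains the given string.
# Assumption: line 1 is a header and column 2 is the PID.
pids_matching() {
    PATTERN=$1 awk 'NR > 1 && index($0, ENVIRON["PATTERN"]) { print $2 }'
}

# Usage on the remote login session:
#   ps -ef | pids_matching xlock
```

Review the printed process IDs before passing any of them to the kill command.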
If your process checks have not corrected the problem with your inactive system, go to Restarting the System.
You can also start a system dump to determine why your system became inactive. For more information, see System Dump Facility.
If the first two procedures fail to correct the problem that makes your system inactive, you need to restart your system.
This procedure involves the following:
Your system boots from a removable medium, an external device, a small computer system interface (SCSI) device, an integrated device electronics (IDE) device, or a local area network (LAN). Decide which method applies to your system, and use the following instructions to check the boot device:
If the boot device is working correctly, go to Loading the Operating System.
Load your operating system by doing the following:
If the operating system failed to load, boot the hard disk from maintenance mode or hardware diagnostics.
If you are still unable to restart the system, use a service request number (SRN) to report the problem with your inactive system to your service representative.
System hang management allows users to run mission-critical applications continuously while improving application availability. System hang detection alerts the system administrator of possible problems and then allows the administrator to log in as root or to reboot the system to resolve the problem.
The shconf command is invoked when System Hang Detection is enabled. The shconf command configures which events are surveyed and what actions are to be taken if such events occur. You can specify the priority level to check, the timeout during which no process or thread executes at a lower or equal priority, the terminal device for the warning action, the getty command action, and any of the following actions:
For the Launch a command and Give a special getty options, system hang detection launches the special getty command or the specified command at the highest priority. The special getty command prints a warning message that it is a recovering getty running at priority 0. The following table captures the various actions and the associated default parameters for priority hang detection. Only one action is enabled for each type of detection.
Option | Enablement | Priority | Timeout (seconds) |
---|---|---|---|
Log an error in errlog file | disabled | 60 | 120 |
Display a warning message | disabled | 60 | 120 |
Give a recovering getty | enabled | 60 | 120 |
Launch a command | disabled | 60 | 120 |
Reboot the system | disabled | 39 | 300 |
For lost IO detection, you can set the timeout value and enable the following actions:
Option | Enablement |
---|---|
Display a warning message | disabled |
Reboot the system | disabled |
Lost IO events are recorded in the AIX error log file.
The shdaemon daemon is a process that is launched by init and runs at priority 0 (zero). It is in charge of handling system hang detection by retrieving configuration information, initiating working structures, and starting detection times set by the user.
You can manage the system hang detection configuration from the SMIT management tool. SMIT menu options allow you to enable or disable the detection mechanism, display the current state of the feature, and change or show the current configuration. The fast paths for system hang detection menus are:
You can also manage system hang detection using the shconf command, which is documented in AIX 5L Version 5.2 Commands Reference.