IBM Books

Diagnosis Guide


Error symptoms, responses, and recoveries

Most error symptoms have obvious solutions, such as "add the user to the database" or "have the user issue the correct login command". Responses and recovery actions for these types of errors are not listed here.

Some errors are not normally seen, but can be generated and then surface through the remote commands. These types of errors are included here for reference.

Remote command (rsh/rcp) symptoms

Use this table to locate the problem symptom and the corresponding recovery action.

Table 51. Remote command (rsh/rcp) symptoms and recovery actions

Symptom | Recovery action
SIGINT error from rsh. | See SIGINT error from rsh command.
Protocol failure errors - Kerberos V4. | See Protocol failure errors - Kerberos V4.
Exec Format Error. | See Protocol failure errors - Kerberos V4.
SP services fail due to Kerberos V5 remote command authentication problems. | See SP Services fail when using Kerberos V5.
Non-SP system applications fail when using rsh or rcp. | See Non-SP System applications fail when using rsh or rcp commands.
kshell or tcp: Unknown service. | See Unknown service: kshell or TCP.
Unable to rsh, telnet, rlogin. ping shows host is up. | See Unable to rsh, rcp, telnet, rlogin, but ping shows host is up.
Error messages pointing to or from authentication subsystem. | See:
  1. Protocol failure errors - Kerberos V4
  2. SP Services fail when using Kerberos V5
  3. Non-SP System applications fail when using rsh or rcp commands
  4. Error messages pointing to or from an authentication subsystem
Error messages indicating SP Security configuration problems. | See:
  1. Protocol failure errors - Kerberos V4
  2. SP Services fail when using Kerberos V5
  3. Non-SP System applications fail when using rsh or rcp commands
  4. Error messages pointing to or from an authentication subsystem
  5. Error messages indicating SP Security Services configuration problems
Kerberos rsh message 0041-010 Cannot import nflag or options variables. | See Kerberos rsh message 0041-010.
Error messages pointing to connection problems. | See Remote command connection problems.
Error messages pointing to server configuration errors (Kerberos V5). | See Remote command server configuration errors.
Remote to remote rcp errors. | See Remote to remote rcp errors.
Cannot contact KDC for realm (Kerberos V5). | See Cannot contact KDC (Kerberos V5).
Decrypt integrity check failed (Kerberos V5). | See Decrypt integrity check failed (Kerberos V5).
Unexpected remote command authorization failures. | See Remote command authorization failures.

Remote commands (rsh/rcp) recovery actions

SIGINT error from rsh command

This error is received if rsh is issued at the command line and placed in the background without using the -n flag. The recovery action is to use the -n flag when issuing rsh in the background.

Protocol failure errors - Kerberos V4

This error message is usually received when the rsh client is passing a message through from either the krshd server, or one of the lower level Kerberos V4 libraries, where an error occurred. This message normally has the "passed through" message appended to it or on the next line.

If the message does not concern items like network errors or connection errors, the first recovery action is to verify that krshd is not having problems, by setting up syslog to capture any daemon error messages. See Error information. Remember to set up syslog on the target machine, not on the machine where you issued the rsh command.

Any error messages logged from krshd should help pinpoint the problem and a possible solution. The types of messages range from "cannot locate servers" or "cannot reach server" to errors caused by an incorrect principal format for authentication or authorization.

While a complete list of "passed through" error messages cannot be specified, some general types of errors and recovery actions are helpful:

SP Services fail when using Kerberos V5

SP services can fail when using the remote commands with the Kerberos V5 (through DCE) authentication method. The cause is expired credentials, combined with the inability of the services to obtain new credentials because the key used to log in has expired.

An SP Service must log in to DCE in order to obtain credentials for use with the remote commands. If the principal's key ("password") has expired, the service cannot log in to obtain credentials. Remote command failures may be the first indication of this situation, with other seemingly unrelated failures occurring. This type of error may affect services on only one node if that node was down when an automatic refresh of keys took place.

For information on how to handle this situation, see Diagnosing SP Security Services problems.

Non-SP System applications fail when using rsh or rcp commands

The AIX /usr/bin/rsh and /usr/bin/rcp commands now support Kerberos V5 (through DCE), Kerberos V4 (through an SP-supplied library) and Standard AIX. These commands expect users and applications to have the proper tickets or credentials for the authentication mechanism installed, and the method enabled on the SP system.

Non-SP system applications, such as database or workflow applications, may need to be updated to obtain tickets or credentials for the authentication method enabled. These applications may also need to be updated to place the proper principals, and create the proper accounts in the authentication database or DCE registry for that application's use.

The only bypass for applications that do not support the authentication methods installed, configured, and enabled on the SP system is to disable all authentication methods except for AIX Standard and to put in place the proper authorization files (.rhosts).

Unknown service: kshell or TCP

This message usually means that the proper krshd configuration has not been done. The primary cause is either a missing or commented out kshell line in the /etc/services file on the source host, target host or both.

Verify that the following line is valid in the /etc/services file on both the source and target host:

kshell          544/tcp         krcmd
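This check can be scripted. The following is a minimal Python sketch, not part of PSSP; the function name has_kshell_entry is hypothetical. It takes the text of a services file and reports whether an uncommented kshell entry for port 544/tcp is present.

```python
def has_kshell_entry(services_text):
    """Return True if an active 'kshell 544/tcp' entry is present."""
    for raw in services_text.splitlines():
        # Drop trailing comments; a commented-out line becomes empty.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # fields[0] is the service name, fields[1] is port/protocol.
        if len(fields) >= 2 and fields[0] == "kshell" and fields[1] == "544/tcp":
            return True
    return False
```

Run it against the contents of /etc/services on both the source host and the target host; a False result on either host matches the symptom described above.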

Unable to rsh, rcp, telnet, rlogin, but ping shows host is up

This type of error can indicate one of two problems.

Error messages pointing to or from an authentication subsystem

When you receive messages indicating a problem with the underlying authentication mechanism, it is best to consult the proper manual for diagnosis and recovery information. Although the remote commands would not be the only function to see errors from the authentication mechanism, the remote commands may be the first indication of these errors.

Some authentication mechanism problems that can directly affect the remote commands are:

Error messages indicating SP Security Services configuration problems

As indicated in the dependency section, the remote commands rely on the authentication method being installed and configured on the SP system, configured for SP system use and enabled. Depending on the authentication mechanism, the SP install and configure scripts perform a number of functions for the proper setup of the system.

Configuration and enablement of an authentication method depends on the choices you have made on the SP Security SMIT panels, and the resulting installation and configuration of the nodes after those choices have been made.

Some of the errors that directly affect the remote commands include:

For more information, consult Diagnosing node installation problems in this book, PSSP: Installation and Migration Guide, and RS/6000 SP: Planning Volume 2.

Kerberos rsh message 0041-010

This message is received if you have not installed the AIX APAR IX85420 but have updated your SP system with ssp.clients fixes. Install the AIX APAR to obtain the companion fix to allow the Kerberos rsh routine to obtain the variables it needs from the AIX rsh client. The minimum AIX file set required is bos.net.tcp.client.4.3.2.4. If you have any questions, contact the IBM Support Center.

Remote command connection problems

The target host may be in the process of rebooting, or the inetd system may be down. Check whether the krshd daemon is listed in the file /etc/inetd.conf on the target system. If you had to add the krshd daemon, be sure to stop and restart the inetd subsystem.
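The /etc/inetd.conf check can be sketched in Python as follows. This is an illustration only; the function name krshd_enabled is hypothetical, and the code assumes the usual inetd.conf field layout (service, socket type, protocol, wait flag, user, server program, arguments).

```python
def krshd_enabled(inetd_conf_text):
    """Return True if an uncommented inetd.conf entry starts krshd."""
    for raw in inetd_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Assumed layout: service, socket type, protocol, wait flag,
        # user, server program, then the server arguments.
        if len(fields) >= 6 and fields[5].rsplit("/", 1)[-1] == "krshd":
            return True
    return False
```

A False result for the target system's file means that the entry is missing or commented out. The sketch does not check whether inetd itself is running.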

A "Kerberos V5 Connection abort" message means that DCE was deinstalled, but a DCE unconfig_admin was never done for the target.

A "Kerberos V5 Connection ended by software" message means that DCE configuration is incomplete. Client services are only partially available.

Remote command server configuration errors

The authentication services (servers and clients) may be inactive even though the services are still enabled. The authentication services may have been unconfigured. Use the lsauthent and lsauthpar commands to verify which authentication services are enabled, then check that they are running.

Remote to remote rcp errors

In order for remote to remote rcp to work, credentials must be present on the intermediate source host. Kerberos V5 (through DCE) supports forwarding of credentials. Kerberos V4 does not support forwarding of credentials. With a choice of authentication now available, this command may or may not work, depending on the authentication methods enabled on the systems involved.

For example, suppose that you have three hosts named HostA, HostB, and HostC, and you issue the following command from HostA:

rcp HostB:/file HostC:/file

The following table shows what can happen and some of the necessary requirements.

Table 52. Remote to remote rcp

Authentication method | Result | Comments
Kerberos V5 (through DCE) on all hosts | Success | User must have forwardable credentials to allow use of those credentials on HostB.
Kerberos V4 on all hosts | Error | No credentials are forwarded, so no credentials exist on HostB for the rcp command to use from HostB to HostC. If you are root, you may use the rsh command to issue the rcmdtgt and rcp commands on HostB.
AIX standard on all hosts | Success | Based on IP addresses and not user authentication.
Kerberos V5 on HostA and HostB; Kerberos V4 on HostA, HostB and HostC | Error | The rcp command from HostA to HostB works, but the rcp command started on HostB uses DCE first, and fails because DCE is not on HostC. The rcp command on HostB then tries Kerberos V4 and does not have the tickets necessary to continue.
Kerberos V5 on HostA and HostB; Standard AIX on HostA, HostB and HostC | Success | The rcp command from HostA to HostB works, but the rcp command started on HostB uses DCE first, and fails because DCE is not on HostC. The rcp command on HostB then tries the next method (standard AIX), which is successful.
Kerberos V5 on HostB and HostC; Kerberos V4 on HostA, HostB and HostC | Possible success | If the user has a DCE login session on HostB, the command works because the rcp command on HostB has the user's credentials available. Otherwise, the command fails.
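The reasoning behind Table 52 can be expressed as a small model. The Python sketch below is a simplification written for illustration, not the logic of the rcp command itself. All names in it are hypothetical, and it assumes that methods are tried in the order Kerberos V5, Kerberos V4, standard AIX, that the user's credentials are forwardable, and that standard AIX authorization files are in place. It does not model the root workaround that uses rcmdtgt.

```python
K5, K4, STD = "k5", "k4", "std"
ORDER = (K5, K4, STD)  # assumed order in which methods are tried


def remote_to_remote_rcp(host_a, host_b, host_c, dce_login_on_b=False):
    """Predict whether 'rcp HostB:/file HostC:/file' issued on HostA works.

    Each host_* argument is the set of methods enabled on that host.
    """
    # Hop 1 (HostA to HostB): the first method both hosts share is used.
    first_hop = next((m for m in ORDER if m in host_a and m in host_b), None)
    if first_hop is None:
        return False
    # DCE credentials exist on HostB if they were forwarded over a
    # Kerberos V5 first hop, or if the user already has a DCE login there.
    k5_creds_on_b = first_hop == K5 or dce_login_on_b
    # Hop 2 (HostB to HostC): try each shared method in order.
    for method in ORDER:
        if method not in host_b or method not in host_c:
            continue
        if method == K5 and k5_creds_on_b:
            return True
        if method == STD:
            return True
        # Kerberos V4 tickets are never forwarded, so K4 cannot succeed here.
    return False
```

For example, remote_to_remote_rcp({K5, K4}, {K5, K4}, {K4}) models the fourth row of the table: the second hop can only try Kerberos V4, for which no tickets exist on HostB.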

Cannot contact KDC (Kerberos V5)

The /etc/krb5.conf file contains an error in the realms stanza. The kdc= entry does not contain the correct address for the security server's location. Usually, the problem is that the kdc entry address is on a different subnet than the current host.
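To review the kdc entries quickly, the realms stanza can be extracted with a short script. The following Python sketch is illustrative only; the function name kdc_entries is hypothetical, and the code assumes the common krb5.conf layout in which each realm is written as a name followed by a brace-delimited block inside the realms section.

```python
import re


def kdc_entries(krb5_conf_text):
    """Return {realm: [kdc values]} from the [realms] stanza."""
    realms = {}
    section = None
    realm = None
    for raw in krb5_conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            section, realm = header.group(1), None
            continue
        if section != "realms":
            continue
        if realm is None:
            # A realm block opens with 'REALM.NAME = {'.
            opened = re.fullmatch(r"(\S+)\s*=\s*\{", line)
            if opened:
                realm = opened.group(1)
                realms[realm] = []
        elif line.startswith("}"):
            realm = None
        else:
            key, _, value = line.partition("=")
            if key.strip() == "kdc":
                realms[realm].append(value.strip())
    return realms
```

Compare each returned address with the actual location of the security server, paying attention to the subnet.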

Decrypt integrity check failed (Kerberos V5)

This usually indicates that a node's DCE services are out of synch with the rest of the DCE cell. Perform these steps:

  1. Stop the DCE servers and clients, by issuing the stopsrc command.
  2. Issue the clean_up.dce command.
  3. Issue the clean_up.dce-core command.
  4. Issue the rmxcreds command.
  5. Restart the DCE servers and clients, by issuing the startsrc command.

Remote command authorization failures

If you experience unexpected failure when issuing remote commands, check the following:

  1. Check the SDR Site Environment settings. Run splstdata -e on the control workstation to make sure that the Restricted Root Access setting is correct. If not, issue the spsitenv command for the restrict_root_rcmd attribute. A value of true indicates that Restricted Root Access is to be used. A value of false indicates that Restricted Root Access is not to be used.
  2. Check the remote command authorization files on the control workstation. If they are not correct, run spsitenv for the restrict_root_rcmd attribute to force all authorization files to be regenerated. If the restrict_root_rcmd attribute is false and your authentication is Kerberos Version 4, also run the setup_authent command.
  3. Check the remote command authorization files on the nodes. If they are not correct, run spsitenv for the restrict_root_rcmd attribute to force all authorization files to be regenerated. If the restrict_root_rcmd attribute is false and your authentication is Kerberos Version 4, also run the setup_authent command.

Using secure remote commands - symptoms


Table 53. Secure remote command symptoms and recovery actions

Symptom | Recovery action
Parallel command (dsh, pcp) hangs with secure shell enabled. | See Action 1.
Parallel command (dsh, pcp) is using the wrong remote command method (rsh versus secure shell). | See Action 2.
Secure connection to hostname refused. | See Action 3.
After node install with secure shell enabled, failures in remote copy of files from the control workstation to the nodes from firstboot.cust. | See Action 4.

Using secure remote commands - recovery actions

Action 1

Perform these steps:

  1. Check to see that the user public key is installed on the nodes to which the dsh command is being sent.
  2. Check to see that either StrictHostKeyChecking is disabled, or that the node's name (long and short hostname) is in the known_hosts file.
  3. Perform Action 2.
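The host name check in step 2 can be sketched in Python. This is an illustration under stated assumptions: the function name missing_known_hosts is hypothetical, the file is assumed to use the OpenSSH known_hosts layout (a comma-separated host list in the first field), and hashed or wildcard entries are not handled.

```python
def missing_known_hosts(known_hosts_text, names):
    """Return the names from 'names' that have no known_hosts entry."""
    listed = set()
    for raw in known_hosts_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # An optional marker such as @cert-authority precedes the host list.
        if fields[0].startswith("@"):
            fields = fields[1:]
        if not fields:
            continue
        # Host names and addresses are comma-separated in the first field.
        listed.update(fields[0].split(","))
    return [name for name in names if name not in listed]
```

Pass both the long and the short hostname of each node; any name that is returned is a candidate cause of the hang when strict host key checking is in effect.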

Action 2

Check your setting of the RCMD_PGM, DSH_REMOTE_CMD and REMOTE_COPY_CMD environment variables. These variables determine which remote command method is used by parallel commands.
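A quick way to see all three settings at once is sketched below in Python. The helper name remote_command_settings is hypothetical; only the three variable names come from this section.

```python
import os

REMOTE_CMD_VARS = ("RCMD_PGM", "DSH_REMOTE_CMD", "REMOTE_COPY_CMD")


def remote_command_settings(environ=os.environ):
    """Map each variable named in Action 2 to its value, or None if unset."""
    return {name: environ.get(name) for name in REMOTE_CMD_VARS}


if __name__ == "__main__":
    # Print the settings seen by the current process.
    for name, value in remote_command_settings().items():
        print(f"{name}={value if value is not None else '(not set)'}")
```

Run it in the same shell session that issues the parallel commands, because the variables are read from that session's environment.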

Action 3

Check to see that the sshd daemon is running on the host that is listed in the error message received when the connection failed.

Action 4

Check to see that the secure remote command program was installed, and that the daemon was started by script.cust on the node. Ensure that the sshd daemon was put in file /etc/inittab right after the rctcpip command.
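The /etc/inittab ordering can be verified with a short script. The Python sketch below is illustrative; the function name sshd_follows_rctcpip is hypothetical, and the code assumes the identifier:runlevel:action:command record layout, with lines that begin with a colon treated as comments. Because the identifier of the sshd entry is site-specific, the sketch looks for "sshd" in the command field.

```python
def sshd_follows_rctcpip(inittab_text):
    """Return True if the entry right after 'rctcpip' starts sshd."""
    entries = []
    for raw in inittab_text.splitlines():
        line = raw.strip()
        # Assumption: inittab lines that begin with a colon are comments.
        if not line or line.startswith(":"):
            continue
        # Fields: identifier:run level:action:command
        parts = line.split(":", 3)
        if len(parts) == 4:
            entries.append((parts[0], parts[3]))
    for index, (ident, _command) in enumerate(entries[:-1]):
        if ident == "rctcpip":
            return "sshd" in entries[index + 1][1]
    return False
```

A False result means either that no rctcpip entry was found, or that the entry following it does not start sshd.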

