Troubleshooting Service Director Communications

ABSTRACT: Troubleshooting Service Director Communications
.SYMPTOM:  Service Director is not calling out on problems reported by a
          client, or a "Test Communications Path" returns "unknown
          status code 0" error.
.STEP 1: Verify the Communications Path Is Working from the Forwarding Server.
.     When a problem is detected on the forwarding server (the one with
     the modem), the servdir.analyze program handles modem
     communications.  By testing the communications path from the
     server, we can determine if the failure to call out is a problem
     with the modem.
.     1) On the server, enter the command:
.             # servdir.analyze ctest:

     2) If you receive a return code of zero, the problem is with the
        modem configuration.  Verify your modem configuration:
.        For the modem (consult your modem's user's guide):
             Echo should be turned off;
             Responses should be turned off;
             Error control should be turned on;
             RTS/CTS (hardware handshaking) should be turned on.
.        For the TTY definition in AIX:
           * Enable LOGIN              disable
             BAUD rate                 9600
             PARITY                    none
             BITS per character        8
             Number of STOP BITS       1
.           * In AIX 3.2.5, this setting is called "Enable program?"
.     3) If you receive a return code of 14, then the modem
        communications are working correctly, and the problem most
        likely lies with the client/server communications.
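.     The TTY values listed in item 2 can be compared against a port's
     current attributes.  The sketch below is only an illustration and
     is not part of Service Director.  It assumes the attributes arrive
     one per line as "name value ..." (the layout printed by the AIX
     lsattr -El command) and that the attribute names are login, speed,
     parity, bpc, and stops.  The helper name check_tty_attrs and the
     device name tty0 are hypothetical.

```shell
# Hypothetical helper: reads "name value ..." lines on stdin and reports
# every attribute that differs from the TTY values recommended above.
# Assumed attribute names: login, speed, parity, bpc, stops.
check_tty_attrs() {
    awk '
        BEGIN {
            want["login"]  = "disable"
            want["speed"]  = "9600"
            want["parity"] = "none"
            want["bpc"]    = "8"
            want["stops"]  = "1"
        }
        ($1 in want) {
            seen[$1] = 1
            if ($2 != want[$1]) {
                printf "MISMATCH %s: found %s, expected %s\n", $1, $2, want[$1]
                bad = 1
            }
        }
        END {
            for (name in want)
                if (!(name in seen)) {
                    printf "MISSING %s\n", name
                    bad = 1
                }
            exit bad + 0
        }
    '
}

# Example usage on the server (tty0 is a placeholder device name):
#   lsattr -El tty0 | check_tty_attrs && echo "TTY settings match"
```

     The helper only reports differences; any change to the TTY
     definition would still be made through the usual AIX tools.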
.STEP 2: Verify Network Communications Between the Client and Server.
.     When Service Director initiates the callhome process, it runs on a
     port assigned to the IP address of the PRIMARY hostname (the name
     you get when you run the 'hostname' command).  However, the IP
     address that the clients "call home" to is determined by the
     hostname the server is registered with.  Clients will not be able
     to connect with the callhome process on the server if they attempt
     to use any other hostname/IP address for the server.
.     1) Verify that the server was registered using its primary
        hostname.  If it was not, re-register the server with the
        correct hostname, rebuild reporting topology files, and
        re-distribute the keys (steps 1-4 of the Service Director
        registration screen).
.     2) From the client, "ping" the server using the hostname specified
        for the server when you registered your configuration.
.     3) If the ping fails, verify that the server's hostname and IP
        address are correct.
.STEP 3: Test Communications Between the Client(s) and Server.
.     If you have multiple clients, ideally you should perform this test
     for each client.
.     1) On the client, enter the command:
             # servdir.analyze ctest:
.     2) If you receive a return code of 14, and you believe that this
        client is not calling out problems when it should be, please
        review the "Errors Reported by Service Director" and "Service
        Director Limitations Based on Type of Error" sections in
        Chapter 1 of the User's Guide.  A copy of the User's Guide can
        be found in /usr/lpp/servdir as UG.ASCII.  
.     3) If you receive a return code of 0 from the communications
        test, the problem lies either with the callhome program
        running on the server or the RPC port configuration.
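.     The return codes discussed in Steps 1 and 3 can be summarized as
     a small lookup.  This sketch takes the code as an argument rather
     than assuming how servdir.analyze reports it; the helper name
     explain_ctest_rc is hypothetical.

```shell
# Hypothetical helper: explain a communications-test return code.
#   $1 = where the test was run: "server" or "client"
#   $2 = the return code reported by the test
explain_ctest_rc() {
    case "$1:$2" in
        server:0)
            echo "Check the modem and TTY configuration (Step 1)." ;;
        server:14)
            echo "Modem path is working; check client/server communications (Step 2)." ;;
        client:0)
            echo "Check the callhome program on the server or the RPC port configuration (Step 4)." ;;
        client:14)
            echo "Client/server path is working; review the error sections of the User's Guide." ;;
        *)
            echo "Return code $2 is not covered by this procedure."
            return 1 ;;
    esac
}

# Example usage:
#   explain_ctest_rc client 0
```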
.STEP 4: Verify the 'callhome' Program Is Running on the Server.
.     The callhome program runs on the server, "listening" for calls
     from designated client machines.  When a call is received from
     an authorized client machine, the callhome program uses the
     modem on the server to place a problem call to IBM.
.     1) On the server, run the command:
             # ps -ef | grep callhome
.     2) If no callhome process is found, you can restart it in one of
        two ways:
.        a) # smit servdir_reg
           Select option 3, 'Build Reporting Topology', and answer yes
           to use the forwarding daemon.  This will rebuild the
           reporting topology using the hostnames you used when you
           registered your configuration and will restart the callhome
           process.
        b) # vi /etc/inittab
           Add the following line to the end of the inittab file:
             callhome:2:respawn:/usr/lib/ras/callhome -p /usr/lib/ras
           Then run the following command to restart callhome from
           inittab:
             # telinit q
.        NOTE:  These steps will restart the default callhome process.
               There are additional options available; please refer to
               section 5.5 of the User's Guide for more information.
.     3) After restarting the callhome process, test the communications
        path from the client machine again.  If you're still getting a
        return code of 0, go on to the next step.
.     4) If the callhome process is already running on the server, there
        may be a problem with the RPC configuration.
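.     Because "ps -ef | grep callhome" usually lists the grep command
     itself, a small filter makes the check in item 1 easier to script.
     The helper name callhome_running is hypothetical; the sketch only
     inspects ps output and does not restart anything.

```shell
# Hypothetical helper: reads 'ps -ef' output on stdin and succeeds only
# if a callhome process is listed, ignoring the grep command itself.
callhome_running() {
    awk '/callhome/ && !/grep/ { found = 1 } END { exit !found }'
}

# Example usage on the server:
#   ps -ef | callhome_running || echo "callhome is not running"
#   lsitab callhome    # AIX command that prints the inittab entry, if any
```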
.STEP 5: Verify the Contents of the Reporting Topology Files.
.     The reporting topology of Service Director is based on the
     contents of two ASCII files: the /usr/lib/ras/callhomehost file
     on the clients, and the /usr/lib/ras/probrephosts file on the
     server.
.     The probrephosts file on the server contains the hostnames of the
     client machines it will accept problem reports from.
.     The callhomehost file on the client machine contains the IP
     addresses of hosts running the 'callhome' program; that is, the
     server (or forwarder as it is referred to in the User's Guide).
     If the primary server's callhome process becomes unavailable, the
     client will automatically try the second IP address in the
     callhomehost file.  Please note that if a client can establish a
     link with a callhome process, it will use that process even if it
     times out, and will not fall back to the secondary IP address.
     You should only use a secondary IP address in the callhomehost
     files if you have a second server acting as a backup to the
     primary server.
.     1) Verify that the callhomehost file on each client contains the
        primary IP address of your server, and that the probrephosts
        file on the server contains the hostnames of all client
        machines.  If you're missing an IP address or hostname, edit
        the appropriate file to include the missing information and
        perform the following steps on the server:
             # smit servdir_reg
             Select option 3, "Build Reporting Topology", and answer
             'yes' to use the forwarding daemon.
.     2) If you have more than one network connection between the
        client(s) and the server and want Service Director to use the
        secondary connection(s), add the secondary hostnames of the
        clients to the probrephosts file and rebuild the reporting
        topology as described above.
.     3) After rebuilding the reporting topology, test the
        communications path from the client again.  If you're still
        receiving the return code 0, go on to the next step.
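.     The file checks in item 1 can be sketched as follows.  The sketch
     assumes each topology file holds one IP address or hostname per
     line, which the text above does not state; the helper names, the
     address 192.0.2.10, and the hostnames client1 and client2 are
     hypothetical.

```shell
# Hypothetical helper: succeeds if FILE ($1) contains ENTRY ($2) as a
# whole line.  Assumes one IP address or hostname per line.
has_entry() {
    grep -Fxq -e "$2" "$1"
}

# Hypothetical helper: print each of the given client hostnames that is
# missing from a probrephosts-style file.
#   $1 = file to check, remaining arguments = expected client hostnames
missing_clients() {
    file=$1
    shift
    for client in "$@"; do
        has_entry "$file" "$client" || echo "$client"
    done
}

# Example usage (address and hostnames are placeholders):
#   on a client:    has_entry /usr/lib/ras/callhomehost 192.0.2.10
#   on the server:  missing_clients /usr/lib/ras/probrephosts client1 client2
```

     Any names printed by missing_clients would still need to be added
     to the file by hand before rebuilding the reporting topology.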
.STEP 6: Check the RPC Port Mapping.
.     If you are still receiving a return code of 0 from testing
     the communications path from the client at this point, it is
     likely that there is a problem with the RPC port mapping.
.     1) To check the status of the callhome process, run the following
        command on the server:
             # rpcinfo -t \.\  *.*   LISTEN
.        If you do not get any output, or if you don't see the
        server's IP address in front of the port number, you will
        need to perform problem determination on the network.
.     4) If all of the RPC info looks OK, but you're still receiving a
        return code of 0 from testing the communications path from the
        client, you can get additional information from the callhome
        program by running it in the foreground.  To run the callhome
        program in the foreground, perform the following steps:
.        a) # rmitab "callhome"
        b) # ps -ef | grep callhome
           If any callhome processes are still running, kill them.
        c) # /usr/lib/ras/callhome -p /usr/lib/ras
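.     For the port check in item 1 of this step, one way to look for
     the server's IP address in front of a listening port is to filter
     netstat output.  This is a sketch under stated assumptions, not
     the command from item 1: it assumes "netstat -an" prints the local
     address in the fourth field as <address>.<port>, and that the port
     in question has been read from "rpcinfo -p".  The helper name
     listeners_on_port and the port 32771 are hypothetical.

```shell
# Hypothetical helper: reads 'netstat -an' output on stdin and prints
# the local address of every socket in LISTEN state on the given port.
# Assumes the local address is field 4, written as <address>.<port>.
listeners_on_port() {
    awk -v port="$1" '
        $NF == "LISTEN" {
            n = split($4, parts, ".")
            if (parts[n] == port) print $4
        }
    '
}

# Example usage on the server (32771 is a placeholder port):
#   rpcinfo -p                               # list registered RPC programs and ports
#   netstat -an | listeners_on_port 32771
```

     An empty result means nothing is listening on that port.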

Support Line: Troubleshooting Service Director Communications ITEM: BY3135L
Dated: February 1998 Category: N/A
This HTML file was generated 99/06/24~13:30:19