IBM Books

Diagnosis Guide


Diagnostics

File Collections errors are divided into three types:

  1. File Collections errors on the control workstation (the master server), which usually affect distribution of files across the entire SP system.
  2. File Collections errors on a subset of nodes, which usually point to the boot/install server for those nodes as the source of the problem.

    A boot/install server is a File Collections client of the File Collections server running on the control workstation. A boot/install server is in turn the File Collections server for the File Collections client running on each node.

  3. File Collections errors on a client (usually a node), which only affect updates of the collections, one collection, or a single file on that node.

One of the first diagnostic actions is to log in to the node where the problem occurred. Verify that a collection can be updated by running the supper command manually, either interactively or from the command line.
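
The manual check described above can be wrapped in a small shell function. This is a minimal sketch, not part of the PSSP software: the function name run_supper_update and the SUPPER variable are hypothetical, and the sketch assumes that supper accepts the collection name after the update subcommand and returns a nonzero exit status when the update fails.

```shell
# Sketch only. SUPPER is a hypothetical override for the location of the
# supper command; by default the command is looked up on the PATH.
SUPPER=${SUPPER:-supper}

# Update one collection by hand and report whether the command succeeded.
run_supper_update() {
    collection=$1
    if [ -z "$collection" ]; then
        echo "usage: run_supper_update collection" >&2
        return 2
    fi
    # Assumption: supper exits with a nonzero status when the update fails.
    if "$SUPPER" update "$collection"; then
        echo "update of $collection succeeded"
    else
        rc=$?
        echo "update of $collection failed (exit status $rc)" >&2
        return "$rc"
    fi
}
```

For example, run_supper_update user.admin on the affected node attempts to update the user.admin collection and reports the outcome.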

File collections errors on any SP host

Answer these questions and follow these steps to debug file collections errors on an SP host system:

  1. Has File Collections been chosen for configuration on the SP system?

    To configure File Collection Management on the SP system, the Site Environment variable, filecoll_config, must be set to true through the SMIT SP Site Environment panel, or from the command line by issuing the command:

    spsitenv filecoll_config=true
    

    If you make this change after installation of your system, you are required to customize the nodes for this change to take effect.

  2. Does the /etc/services file list the port number for the File Collections supfilesrv daemon?

    There should be an entry similar to: supfilesrv 8431/tcp.

  3. Is the network up and running?

    Verify connectivity between the node and the node's boot/install server. Normally, File Collections uses the internal SP connection.

  4. Has a new adapter been installed? Has the hostname changed? Has a different adapter been configured as the primary?

    Consult PSSP: Administration Guide for instructions on updating the PSSP software to recognize the new AIX hostname, IP address, or adapter.

  5. Have you received configuration errors, either directly from File Collections or the Installation/Configuration SP software?

    Most File Collection configuration errors can be corrected by running /usr/lpp/ssp/install/bin/services_config. There is no easy way to run /usr/lpp/ssp/install/bin/filec_config directly, without specifying a number of environment variables that are already taken care of within /usr/lpp/ssp/install/bin/services_config. You should run this on each SP host with File Collections errors.
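
The /etc/services check in step 2 can be scripted. The following is a minimal sketch; the function name supfilesrv_port is hypothetical, and the optional file argument exists only so that the function can be tried against a copy of the file.

```shell
# Print the TCP port registered for supfilesrv, or return a nonzero
# status if the services file has no such entry.
# The file defaults to /etc/services.
supfilesrv_port() {
    services_file=${1:-/etc/services}
    awk '$1 == "supfilesrv" && $2 ~ /\/tcp$/ {
             split($2, f, "/")   # "8431/tcp" -> f[1] = "8431"
             print f[1]
             found = 1
             exit
         }
         END { exit !found }' "$services_file"
}
```

Running supfilesrv_port on an SP host prints the port number if the entry is present. No output and a nonzero exit status mean that the entry is missing, in which case running /usr/lpp/ssp/install/bin/services_config, as described in step 5, is the likely correction.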

File collections configuration errors

Answer these questions and follow these steps to debug File Collections configuration errors:

  1. Have you received installation errors or errors from applying a PTF?

    If errors occurred during installation or the application of a PTF, this could indicate that the files installed were not copied to the proper place. Depending on where the installation or PTF errors occurred, you may be able to correct the problem by running /usr/lpp/ssp/install/bin/services_config on the affected hosts.

  2. Are the collections being updated from the wrong server?

    This can happen if a new boot/install server has been defined or an existing one undefined, and the affected nodes have not yet been rebooted. On a node, File Collections obtains the server hostname by use of the /etc/mastername command. This command relies on a file on the node. That file is updated when a node has been customized after a configuration change.
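
To see which server a node is using, you can compare the output of /etc/mastername with the server you expect. This is a minimal sketch under two assumptions that the text above does not confirm: that /etc/mastername writes only the server hostname to standard output, and that the name is in the same form (short or fully qualified) as the name you pass in. The function name check_fc_server and the MASTERNAME variable are hypothetical.

```shell
# Sketch only. MASTERNAME is a hypothetical override for the command
# that reports the File Collections server for this node.
MASTERNAME=${MASTERNAME:-/etc/mastername}

# Compare the reported server hostname with the expected one.
check_fc_server() {
    expected=$1
    actual=$("$MASTERNAME") || {
        echo "cannot run $MASTERNAME" >&2
        return 2
    }
    if [ "$actual" = "$expected" ]; then
        echo "server is $actual, as expected"
    else
        echo "server is $actual, expected $expected" >&2
        return 1
    fi
}
```

A mismatch suggests that the node has not been customized since the boot/install server configuration changed.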

File collections server (control workstation or boot/install server) errors

Answer these questions and follow these steps to debug File Collections server errors:

  1. Is the supfilesrv daemon running on the control workstation and any configured boot/install servers?

    This daemon runs on the control workstation and the boot/install servers. It responds to requests issued on the nodes to update the collections of files configured for File Collections. The supfilesrv daemon is under control of the AIX System Resource Controller (SRC). It is normally started as part of the SP initialization scripts. Under normal circumstances, the system administrator does not have to start or stop the daemon.

  2. Is the File Collections ID, supman, listed in the /etc/passwd file?

    File Collections requires a unique, unused ID for supman, through which the File Collection daemon, supfilesrv, can communicate. The default installation configures the user ID, supman_uid, to 102 and the port, supfilesrv_port, to 8431. These values can be changed using SMIT or the spsitenv command. The default user name and uid attributes listed in the /etc/passwd file are: supman for the user name, and 102 for the uid.

  3. Is the password * (asterisk) for the File Collections id, supman, in the /etc/passwd file?

    The /etc/passwd file is an ASCII file that contains basic user attributes. The entry for each user is unique. The second field of each entry is the Password field, which contains a valid encrypted password, an * (asterisk) indicating an invalid password, or an ! (exclamation point) indicating that the password is in the /etc/security/passwd file. If the second field has an * (asterisk) and a password is required for authentication, the user cannot log in.

  4. Is the File Collections id, supman, part of the AIX security group?

    The /etc/group file is an ASCII file that contains the basic group attributes. Each record is identified by a group name, and contains attributes separated by colons. The fields of interest to File Collections are:

    The file collection daemon, supfilesrv, requires read access permission to any files that you want managed by file collections. You must add the supman ID to the security group in the /etc/group file. This provides supman with read access to files that have security group permission and allows these files to be managed across the SP system by file collections. This is necessary to distribute the files in the user.admin collection. If you are seeing errors distributing only the user.admin collection, check whether the supman ID is part of the AIX security group.

    At the prompt on the control workstation, issue id supman. You should see output similar to:

    uid=102(supman) gid=7(security)
    

    If this is not the case, verify the configuration of SP User Management and run /usr/lpp/ssp/install/bin/services_config.

  5. Are all the AIX hostnames of the nodes being served by this server (control workstation or boot/install server) in the host file in each collection?

    During the configuration of file collections, a per-collection host file is created to limit access for the predefined collections to the nodes of the SP system. This host file resides in the directory /var/sysman/file_collection/host, where file_collection is the unique name of a file collection.

    The absence of the per-collection host file indicates that any host is allowed access. SP systems in earlier releases did not have this host file, which allowed any client on any machine to access file collections.

    If you are using public code, you are responsible for updating and maintaining this host file. While some public code documentation exists on the per-collection host file, this information is not readily available in IBM documentation. IBM discourages the use of public code commands in favor of the IBM-supplied supper command and the IBM-supplied collections. IBM creates and maintains the public code host file. This automatically grants access to the IBM-supplied collections, but only to SP nodes by default. See Notices.

    IBM ships a new file, collection.host.list, which lists the SP-supplied collections. By default, only the SP-supplied collections are updated with a host file. If you want your own customer-defined collections in the /var/sysman/ path updated with a host file, you must add the collection names to the collection.host.list file. This file must always be distributed so that it is present and current on both the control workstation and any defined boot/install servers.

  6. Has the collection's list file been altered on the server?

    When adding or removing a file or directory of files for distribution, it is very easy to produce errors in the list file for a collection. For example, if the "." (period) is not the first character in the line of a list file, the file or directory may not be found since paths usually start at the root level.

    You may also see unintended actions. Depending on the error in the list file, for example, you could distribute your entire root directory to a node.
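
The supman checks in steps 2 through 4 can be combined into one script. The following is a minimal sketch; the function name check_supman is hypothetical, and the two file arguments exist only so that the function can be tried against copies of the files. It treats supman as belonging to the security group if security is the primary group of supman in /etc/passwd or if supman appears in the member list in /etc/group.

```shell
# Report problems with the supman entries in the passwd and group files.
# Both file arguments are optional and default to the system files.
check_supman() {
    passwd_file=${1:-/etc/passwd}
    group_file=${2:-/etc/group}

    # /etc/passwd fields: name:password:uid:gid:...
    entry=$(awk -F: '$1 == "supman" { print $2 ":" $3 ":" $4; exit }' "$passwd_file")
    if [ -z "$entry" ]; then
        echo "supman is not listed in $passwd_file" >&2
        return 1
    fi
    pw=${entry%%:*}
    rest=${entry#*:}
    uid=${rest%%:*}
    gid=${rest#*:}
    status=0

    if [ "$pw" != "*" ]; then
        echo "supman password field is '$pw', expected '*'" >&2
        status=1
    fi

    # /etc/group fields: name:password:gid:member,member,...
    if ! awk -F: -v gid="$gid" '
            $1 == "security" {
                if ($3 == gid) ok = 1
                n = split($4, m, ",")
                for (i = 1; i <= n; i++) if (m[i] == "supman") ok = 1
            }
            END { exit !ok }' "$group_file"; then
        echo "supman is not in the security group" >&2
        status=1
    fi

    echo "supman uid=$uid gid=$gid"
    return $status
}
```

Run check_supman with no arguments on the control workstation or a boot/install server. Any message on standard error identifies the step that needs attention.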

File collections client errors

  1. Is the cron daemon running, and is there a File Collections entry in the root crontab file?

    An entry is placed in the root crontab file during File Collections configuration. The default is to update the collections once each hour, usually ten minutes after the hour. During peak times when nodes are updating collections, you may see Server Busy messages if you are running the supper command interactively on a node. This may also occur if you reboot an SP system with a large number of nodes all at once.

  2. Is a scan file present in the collection on a node or the server?

    File Collections uses the scan file to prevent building new lists of files to check and update every time it runs. If you have changed a collection to add a file or remove a file from distribution, you need to remove the current scan file for each collection, or run the command:

    supper scan
    

    on the control workstation to build or update the scan file. It is important that you run the supper scan command when you make changes to a collection. This is because the existing scan file does not have the new changes, and the changes will not be included in the collection when a supper update command is issued from the node.

  3. Has the master copy of a file been updated?

    If you do not see a file being updated, or if you see an old copy of a file replacing a new one, verify that the master copy of the file, located on the control workstation, has been updated.

    If a file was updated on a boot/install server and this same file is also distributed to the boot/install server from the control workstation, the updated file will be overwritten by the distributed file.
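
The crontab check in step 1 can be scripted as a filter. This is a minimal sketch; the function name supper_cron_entries is hypothetical, and it assumes that the File Collections entry in the root crontab file invokes supper, so that the string supper appears on the line.

```shell
# Read crontab text on standard input and print the active (uncommented)
# entries that mention supper. The exit status is nonzero if none is found.
supper_cron_entries() {
    grep -v '^[[:space:]]*#' | grep 'supper'
}
```

As root on the node, crontab -l | supper_cron_entries lists the entries. No output means that the File Collections entry is missing or commented out.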

