File Collections errors are divided into three types:
A boot/install server is a File Collections client of the File Collections server running on the control workstation. A boot/install server is in turn the File Collections server for the File Collections client running on each node.
One of the first diagnostic actions is to log in to the node where a problem has occurred. Verify that a collection can be updated by running the supper command manually, either interactively or through the command line.
You will see the message Supper Collection Maintenance --- type 'help' for a list of commands. This is followed by the supper prompt line, supper>. Entering help displays a list of supper options, along with a brief description of each option. For example, the when option shows the last update time of all resident collections. You remain in Supper Collection Maintenance mode until you enter the quit option.
Perform these steps to see if the collection update is taking place:
Updating collection sup.admin from server c180cw.ppd.pok.ibm.com Files changes: 1 updated, 0 removed, 0 errors.
This example shows that one master file on the control workstation was touched or changed, and the update was done on the node.
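When the update is run from a script rather than watched interactively, the summary line can be checked mechanically. The following is a minimal sketch, assuming the summary line has the same shape as the sample output above; the function name supper_update_counts is hypothetical and is not part of PSSP.

```shell
# Hypothetical helper (not part of PSSP): read supper update output on
# stdin and print the "updated removed errors" counts from the summary
# line. Prints nothing if no summary line is found.
supper_update_counts() {
    sed -n 's/.*[Cc]hanges: *\([0-9][0-9]*\) updated, *\([0-9][0-9]*\) removed, *\([0-9][0-9]*\) errors.*/\1 \2 \3/p'
}
```

Piping the output of a manual update through this function yields three numbers; a nonzero third number means the update reported errors.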
Answer these questions and follow these steps to debug file collections errors on an SP host system:
To configure File Collection Management on the SP system, the Site Environment variable, filecoll_config, must be set to true through the SMIT SP Site Environment panel, or from the command line by issuing the command:
spsitenv filecoll_config=true
If you make this change after installation of your system, you are required to customize the nodes for this change to take effect.
There should be an entry similar to: supfilesrv 8431/tcp.
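One way to check for this entry without reading the whole file is sketched below. It assumes the entry lives in the standard /etc/services file; the function name supfilesrv_port is hypothetical.

```shell
# Hypothetical helper: print the port/protocol registered for supfilesrv.
# $1: path to a services file (assumed default: /etc/services).
# Commented-out entries are ignored because their first field is not
# exactly "supfilesrv".
supfilesrv_port() {
    awk '$1 == "supfilesrv" { print $2; exit }' "${1:-/etc/services}"
}
```

An empty result means that no supfilesrv entry was found in the file.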
Verify connectivity between the node and the node's boot/install server. Normally, File Collections uses the internal SP connection.
Consult PSSP: Administration Guide for instructions on updating the PSSP software to recognize the new AIX hostname, IP address, or adapter.
Most File Collection configuration errors can be corrected by running /usr/lpp/ssp/install/bin/services_config. Running /usr/lpp/ssp/install/bin/filec_config directly is impractical, because it requires a number of environment variables that are already taken care of within /usr/lpp/ssp/install/bin/services_config. Run services_config on each SP host with File Collections errors.
Answer these questions and follow these steps to debug File Collections configuration errors:
If errors occurred during installation or the application of a PTF, this could indicate that the files installed were not copied to the proper place. Depending on where the installation or PTF errors occurred, you may be able to correct the problem by running /usr/lpp/ssp/install/bin/services_config on the affected hosts.
This can happen if a new boot/install server has been defined or an existing one undefined, and the affected nodes have not yet been rebooted. On a node, File Collections obtains the server hostname by use of the /etc/mastername command. This command relies on a file on the node. That file is updated when a node has been customized after a configuration change.
Answer these questions and follow these steps to debug File Collections server errors:
This daemon runs on the control workstation and the boot/install servers. It responds to requests issued on the nodes to update the collections of files configured for File Collections. The supfilesrv daemon is under control of the AIX System Resource Controller (SRC). It is normally started as part of the SP initialization scripts. Under normal circumstances, the system administrator does not have to start or stop the daemon.
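Because the daemon is under SRC control, its state can be queried with the AIX lssrc command (lssrc -s supfilesrv). The sketch below reads that output on standard input and tests the status column. It assumes that the status is the last field on the supfilesrv line; the function name supfilesrv_is_active is hypothetical.

```shell
# Hypothetical helper: read `lssrc -s supfilesrv` output on stdin and
# return success only if the subsystem line reports "active".
# Assumption: the status is the last whitespace-separated field.
supfilesrv_is_active() {
    awk '$1 == "supfilesrv" { status = $NF }
         END { if (status == "active") exit 0; exit 1 }'
}
```

On the control workstation or a boot/install server, this could be used as: lssrc -s supfilesrv | supfilesrv_is_active || echo "supfilesrv is not active"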
File Collections requires a unique, unused ID for supman, through which the File Collection daemon, supfilesrv, can communicate. The default installation configures the user ID, supman_uid, to 102 and the port, supfilesrv_port, to 8431. These values can be changed using SMIT or the spsitenv command. The default user name and uid attributes listed in the /etc/passwd file are: supman for the user name, and 102 for the uid.
The /etc/passwd file is an ASCII file that contains basic user attributes. The entry for each user is unique. The second field of each entry is the Password field that contains a valid encrypted password, an * (asterisk) indicating an invalid password, or an ! (exclamation point) indicating that the password is in the /etc/security/passwd file. If the second field has an * (asterisk) and a password is required for authentication, the user cannot log in.
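The supman entry can be inspected directly. The following sketch prints the uid and the Password field for supman so that they can be compared with the expected values; the function name supman_passwd_fields is hypothetical.

```shell
# Hypothetical helper: print "<uid> <password-field>" for the supman
# entry of a passwd-format file.
# $1: path to the file (assumed default: /etc/passwd).
supman_passwd_fields() {
    awk -F: '$1 == "supman" { print $3, $2; exit }' "${1:-/etc/passwd}"
}
```

With the default configuration, the first value printed should be 102. An empty result means that there is no supman entry in the file.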
The /etc/group file is an ASCII file that contains the basic group attributes. Each record is identified by a group name, and contains attributes separated by colons. The fields of interest to File Collections are:
The file collection daemon, supfilesrv, requires read access permission to any files that you want managed by file collections. You must add the supman ID to the security group in the /etc/group file. This provides supman with read access to files that have security group permission and allows these files to be managed across the SP system by file collections. This is necessary to distribute the files in the user.admin collection. If you are seeing errors distributing only the user.admin collection, check whether the supman ID is part of the AIX security group.
At the prompt on the control workstation, issue:
id supman
You should see output similar to:
uid=102(supman) gid=7(security)
If this is not the case, verify the configuration of SP User Management and run /usr/lpp/ssp/install/bin/services_config.
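The group membership check can also be scripted. The sketch below reads the output of the id command on standard input and succeeds if the security group appears as either the primary group or a supplementary group; the function name in_security_group is hypothetical.

```shell
# Hypothetical helper: read `id` output on stdin and return success if
# a numeric group ID labelled "(security)" appears anywhere in it.
# In a basic regular expression the parentheses match literally.
in_security_group() {
    grep -q '[0-9](security)'
}
```

For example: id supman | in_security_group || echo "supman is not in the security group"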
During the configuration of file collections, a per-collection host file is created to limit access for the predefined collections to the nodes of the SP system. This host file resides in the directory /var/sysman/file_collection/host, where file_collection is the unique name of a file collection.
The absence of the per-collection host file indicates that any hosts are allowed access. SP systems in earlier releases did not have this host file, allowing any client from any machine to access file collections.
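To see quickly which collections are restricted, the presence of the host file can be tested for each collection. This is a minimal sketch that follows the path layout described above; the function name collection_access and the words "restricted" and "open" are hypothetical labels, not PSSP terminology.

```shell
# Hypothetical helper: report whether a per-collection host file exists.
# $1: collection name.
# $2: base directory (assumed default: /var/sysman).
# Prints "restricted" if <base>/<collection>/host exists, and "open"
# if it is absent (meaning any host is allowed access).
collection_access() {
    if [ -f "${2:-/var/sysman}/$1/host" ]; then
        echo "restricted"
    else
        echo "open"
    fi
}
```

For example, collection_access user.admin reports on the user.admin collection under the default base directory.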
If you are using public code, you are responsible for updating and maintaining this host file. While some public code documentation exists on the per-collection host file, this information is not readily available in IBM documentation. IBM discourages the use of public code commands in favor of the IBM-supplied supper command and the IBM-supplied collections. IBM creates and maintains the public code host file. This automatically grants access to the IBM-supplied collections, but only to SP nodes by default. See Notices.
IBM ships a new file, collection.host.list, which lists the SP-supplied collections. Only the SP-supplied collections will be updated with a host file by default. If you wish to have your own customer-defined collections in the /var/sysman/ path updated with a host file, the collection names must be placed in this collection.host.list file. This file must always be distributed so that it is present and current on both the control workstation and any defined boot/install servers.
When adding or removing a file or directory of files for distribution, it is very easy to produce errors in the list file for a collection. For example, if the "." (period) is not the first character in the line of a list file, the file or directory may not be found since paths usually start at the root level.
You may also see unintended actions. Depending on the error in the list file, for example, you could distribute your entire root directory to a node.
An entry is placed in the root crontab file during File Collections configuration. The default is to update the collections once each hour, usually ten minutes after the hour. During peak times when nodes are updating collections, you may see Server Busy messages if you are using the supper line command on a node interactively. This may also occur if you reboot an SP system with a large number of nodes all at once.
File Collections uses the scan file to prevent building new lists of files to check and update every time it runs. If you have changed a collection to add a file or remove a file from distribution, you need to remove the current scan file for each collection, or run the command:
supper scan
on the control workstation to build or update the scan file. It is important that you run the supper scan command when you make changes to a collection. This is because the existing scan file does not have the new changes, and the changes will not be included in the collection when a supper update command is issued from the node.
If you do not see a file being updated, or if you see an old copy of a file replacing a new one, verify that the master copy of the file, located on the control workstation, has been updated.
If a file was updated on a boot/install server and this same file is also distributed to the boot/install server from the control workstation, the updated file will be overwritten by the distributed file.