The network lock manager is a facility that works in cooperation with the Network File System (NFS) to provide a System V style of advisory file and record locking over the network. The network lock manager (rpc.lockd) and the network status monitor (rpc.statd) are network-service daemons. The rpc.statd daemon is a user-level process, while the rpc.lockd daemon is implemented as a set of kernel threads (similar to the NFS server). Both daemons are essential to the kernel's ability to provide fundamental network services.
Note: Mandatory or enforced locks are not supported over NFS.
The network lock manager contains both server and client functions. The client functions are responsible for processing requests from the applications and sending requests to the network lock manager at the server. The server functions are responsible for accepting lock requests from clients and generating the appropriate locking calls at the server. The server will then respond to the client's locking request.
In contrast to NFS, which is stateless, the network lock manager has an implicit state. In other words, the network lock manager must remember certain information about a client, that is, whether the client currently has a lock. The network status monitor, rpc.statd, implements a simple protocol that allows the network lock manager to monitor the status of other machines on the network. By having accurate status information, the network lock manager can maintain a consistent state within the stateless NFS environment.
When an application wants to obtain a lock on a local file, it sends its request to the kernel using the lockf, fcntl, or flock subroutine. The kernel then processes the lock request. However, if an application on an NFS client makes a lock request for a remote file, the Network Lock Manager client will generate a Remote Procedure Call (RPC) to the server to handle the request.
When the client receives an initial remote lock request, it registers interest in the server with the client's rpc.statd daemon. The same is true for the network lock manager at the server. On the initial request from a client, it will register interest in the client with the local network status monitor.
Each machine's rpc.statd daemon notifies every other machine's rpc.statd daemon of its activities. When a machine's rpc.statd daemon receives notice that another machine crashed or recovered, it notifies its rpc.lockd daemon.
If a server crashes, clients with locked files must be able to recover their locks. If a client crashes, its servers must hold the client locks while it recovers. Additionally, to preserve the overall transparency of NFS, the crash recovery must occur without requiring the intervention of the applications themselves.
The crash recovery procedure is simple. If the failure of a client is detected, the server releases the failed client's locks, on the assumption that the client application will request locks again as needed. If the crash and recovery of a server is detected, the client lock manager retransmits all lock requests previously granted by the server. The server uses this retransmitted information to reconstruct its locking state during a grace period. (The grace period, 45 seconds by default, is a time period within which a server allows clients to reclaim their locks.)
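The effect of the grace period on timing can be sketched with a small shell function. This is an illustration only: the function name grace_remaining and its arguments (daemon start time and current time, both in seconds) are hypothetical, and the only value taken from the text is the 45-second default.

```shell
# Hypothetical helper: seconds left in the grace period.
#   $1 = time the lock daemon was (re)started, in seconds
#   $2 = current time, in seconds
#   $3 = grace period length (assumed default: 45, as stated above)
grace_remaining() {
    start=$1
    now=$2
    period=${3:-45}
    left=$(( start + period - now ))
    # Once the grace period has passed, nothing is left to wait for.
    [ "$left" -gt 0 ] || left=0
    echo "$left"
}
```

For example, grace_remaining 100 120 reports the time left 20 seconds after a restart at time 100. A result of 0 means new lock requests should no longer be held back by reclaim processing.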
The rpc.statd daemon uses the host names kept in /etc/sm and /etc/sm.bak to keep track of which hosts must be informed when the machine needs to recover operations.
By default, the /etc/rc.nfs script starts the rpc.lockd and rpc.statd daemons along with the other NFS daemons. If NFS is already running, you can verify that the rpc.lockd and rpc.statd daemons are running by following the instructions in "Get the Current Status of the NFS Daemons". The status of these two daemons should be active. If the rpc.lockd and rpc.statd daemons are not active, and therefore not running, do the following:
    if [ -x /usr/sbin/rpc.statd ]; then
            startsrc -s rpc.statd
    fi
    if [ -x /usr/sbin/rpc.lockd ]; then
            startsrc -s rpc.lockd
    fi
Note: Sequence is important. Always start the statd daemon first.
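The status check and the ordered start can be combined in one place. The following is a minimal sketch, not the system's own procedure: the function names is_active and start_lock_daemons and the SBIN_DIR override are hypothetical, and the sketch assumes that lssrc -s prints one line per subsystem with the subsystem name in the first column and its status in the last.

```shell
# Succeed if lssrc reports the named subsystem as active.
# Assumes the status is the last field on the subsystem's line.
is_active() {
    lssrc -s "$1" 2>/dev/null |
        awk -v name="$1" '$1 == name && $NF == "active" { ok = 1 } END { exit !ok }'
}

# Start rpc.statd first, then rpc.lockd, skipping any daemon that
# is already active. SBIN_DIR is a hypothetical override for the
# directory that holds the daemon binaries.
start_lock_daemons() {
    for daemon in rpc.statd rpc.lockd; do
        if is_active "$daemon"; then
            echo "$daemon already active"
        elif [ -x "${SBIN_DIR:-/usr/sbin}/$daemon" ]; then
            startsrc -s "$daemon" || return 1
        else
            echo "$daemon not found" >&2
            return 1
        fi
    done
}
```

Because the loop lists rpc.statd before rpc.lockd and stops at the first failure, rpc.lockd is never started unless rpc.statd is already active or has just been started.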
If the rpc.statd and rpc.lockd daemons are still not running, see "Troubleshooting the Network Lock Manager."
If you receive a message on a client similar to:
    clnttcp_create: RPC: Remote System error - Connection refused
    rpc.statd:cannot talk to statd at {server}
then the machine believes that another machine needs to be informed that it may have to take recovery measures. When a machine reboots, or when rpc.lockd and rpc.statd are stopped and restarted, machine names are moved from /etc/sm to /etc/sm.bak, and the rpc.statd daemon tries to inform the machine corresponding to each entry in /etc/sm.bak that recovery procedures are needed.
If the rpc.statd daemon can reach the machine, its entry in /etc/sm.bak is removed. If the rpc.statd daemon cannot reach the machine, it keeps trying at regular intervals. Each time the machine fails to respond, the timeout generates the above message. In the interest of locking integrity, the daemon continues to try; however, this can have an adverse effect on locking performance. The handling is different, depending on whether the target machine is just unresponsive or semi-permanently taken out of production. To eliminate the message:
Note: Sequence is important. Always start the statd daemon first.
After you have restarted the daemons, remember that there is a grace period. During this time, the lockd daemons allow reclaim requests to come from other clients that previously held locks with the server, so you will not get a new lock immediately after starting the daemons.
Alternatively, you can eliminate the message by:
rm /etc/sm.bak/TargetMachineName
This action will keep the target machine from being aware that it may need to participate in locking recovery, so it should only be used when it can be determined that the machine does not have any applications running that are participating in network locking with the affected machine.
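Before removing an entry, it can help to see which machines the rpc.statd daemon still has queued for notification. The following sketch assumes, as the rm command above suggests, that each entry in /etc/sm.bak is a file named after the target machine. The function name pending_hosts is hypothetical.

```shell
# Hypothetical helper: print the name of each entry in the
# notification backlog directory (default: /etc/sm.bak).
pending_hosts() {
    dir=${1:-/etc/sm.bak}
    # A missing directory simply means there is nothing pending.
    [ -d "$dir" ] || return 0
    for entry in "$dir"/*; do
        # Skip the unexpanded pattern left behind by an empty directory.
        [ -e "$entry" ] && basename "$entry"
    done
    return 0
}
```

Running pending_hosts with no argument lists the entries in /etc/sm.bak, one per line. An empty result means no machine is waiting to be notified.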
If you are unable to obtain a lock from a client, do the following:
Note: Sequence is important. Always start the statd daemon first.
If the procedure does not alleviate the locking problem, run the lockd daemon in debug mode, by doing the following:
    /usr/sbin/rpc.lockd -d1
When invoked with the -d1 flag, the lockd daemon provides diagnostic messages to standard output. At first, there will be a number of messages dealing with the grace period; wait for them to time out. After the grace period has timed out on both the server and any clients, run the application that is having lock problems and verify that a lock request is transmitted from client to server and server to client.