The network lock manager is a facility that works in cooperation with the Network File System (NFS) to provide System V-style advisory file and record locking over the network. The network lock manager (rpc.lockd) and the network status monitor (rpc.statd) are network-service daemons. The rpc.statd daemon is a user-level process, while the rpc.lockd daemon is implemented as a set of kernel threads (similar to the NFS server). Both daemons are essential to the kernel's ability to provide fundamental network services.
Note: Mandatory or enforced locks are not supported over NFS.
The network lock manager contains both server and client functions. The client functions are responsible for processing requests from applications and sending requests to the network lock manager at the server. The server functions are responsible for accepting lock requests from clients and generating the appropriate locking calls at the server. The server then responds to the client's lock request.
In contrast to NFS, which is stateless, the network lock manager has an implicit state. In other words, the network lock manager must remember whether the client currently has a lock. The network status monitor, rpc.statd, implements a simple protocol that allows the network lock manager to monitor the status of other machines on the network. By having accurate status information, the network lock manager can maintain a consistent state within the stateless NFS environment.
When an application wants to obtain a lock on a local file, it sends its request to the kernel using the lockf, fcntl, or flock subroutine. The kernel then processes the lock request. However, if an application on an NFS client makes a lock request for a remote file, the network lock manager client generates a Remote Procedure Call (RPC) to the server to handle the request.
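From the application's side, a lock on a remote file is requested exactly like a lock on a local file. The following C sketch uses the POSIX fcntl interface to set and test an advisory byte-range lock; the helper names lock_range and range_is_locked are illustrative, not part of any system library. When the descriptor refers to a file on an NFS mount, the same calls are handled by the network lock manager client instead of being resolved locally.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set or release an advisory lock on a byte range of an open file.
 * type is F_WRLCK, F_RDLCK, or F_UNLCK. If wait is nonzero, F_SETLKW
 * blocks until the lock is granted; otherwise F_SETLK fails at once
 * (errno EACCES or EAGAIN) when another process holds a conflicting lock.
 * Returns 0 on success, -1 with errno set on failure. */
int lock_range(int fd, short type, off_t start, off_t len, int wait)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;   /* start is an absolute file offset */
    fl.l_start = start;
    fl.l_len = len;           /* 0 means "through end of file" */
    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}

/* Ask whether another process holds a lock that would conflict with a
 * write lock on the range. Returns 1 if so, 0 if the range is free,
 * -1 on error. Locks held by the calling process are not reported. */
int range_is_locked(int fd, off_t start, off_t len)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    return fl.l_type != F_UNLCK;
}
```

Because these locks are advisory, a process that never calls fcntl or lockf can still read and write a locked range; only cooperating processes see the conflict.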
When the client receives an initial remote lock request, it registers interest in the server with the client's rpc.statd daemon. The network lock manager at the server does the same: on the initial request from a client, it registers interest in that client with its local network status monitor.
The rpc.statd daemon on each machine notifies the rpc.statd daemon on every other machine of its activities. When the rpc.statd daemon receives notice that another machine crashed or recovered, it notifies its rpc.lockd daemon.
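The registration and notification flow in the two preceding paragraphs can be modeled as a small table kept by the status monitor. This is a hypothetical in-process sketch, not the rpc.statd implementation: the real daemons exchange RPCs, and the names sm_monitor, sm_peer_changed, and lockd_notify_fn are illustrative. The state value is passed through unchanged and is not interpreted here.

```c
#include <string.h>

#define MAX_HOSTS 16
#define HOST_LEN  64

/* Hypothetical stand-in for the notification rpc.statd delivers to the
 * local rpc.lockd when a monitored peer crashes or recovers. */
typedef void (*lockd_notify_fn)(const char *host, int new_state);

struct status_monitor {
    char hosts[MAX_HOSTS][HOST_LEN]; /* peers lockd asked to be told about */
    int count;
    lockd_notify_fn notify;
};

/* lockd registers interest in a peer (done on the first remote lock
 * request). Registering the same host twice is a no-op.
 * Returns 0 on success, -1 if the table is full or the name too long. */
int sm_monitor(struct status_monitor *sm, const char *host)
{
    for (int i = 0; i < sm->count; i++)
        if (strcmp(sm->hosts[i], host) == 0)
            return 0;
    if (sm->count == MAX_HOSTS || strlen(host) >= HOST_LEN)
        return -1;
    strcpy(sm->hosts[sm->count++], host);
    return 0;
}

/* Called when a peer's status monitor reports a crash or restart.
 * Only peers that lockd registered are passed on to lockd.
 * Returns 1 if lockd was notified, 0 if the host is not monitored. */
int sm_peer_changed(struct status_monitor *sm, const char *host, int new_state)
{
    for (int i = 0; i < sm->count; i++) {
        if (strcmp(sm->hosts[i], host) == 0) {
            sm->notify(host, new_state);
            return 1;
        }
    }
    return 0;
}
```

Filtering on the registered list mirrors the division of labor described above: rpc.statd tracks machines, while rpc.lockd decides what a crash or recovery means for the locks it holds or has granted.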
If a server crashes, clients with locked files must be able to recover their locks. If a client crashes, its servers must hold the client locks while it recovers. Additionally, to preserve the overall transparency of NFS, the crash recovery must occur without requiring the intervention of the applications themselves.
The crash recovery procedure is simple. If the failure of a client is detected, the server releases the failed client's locks on the assumption that the client application will request locks again as needed. If the crash and recovery of a server is detected, the client lock manager retransmits all lock requests previously granted by the server. The server uses this retransmitted information to reconstruct its locking state during a grace period. (The grace period, 45 seconds by default, is the time period within which a server allows clients to reclaim their locks.)
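The recovery rules in the preceding paragraph can be modeled on the server side as a lock table plus a grace-period deadline. The sketch below is a simplified, hypothetical model: locks are exclusive whole-file locks identified by an integer, clients are identified by host name, and all names are illustrative. DENIED_GRACE loosely corresponds to the "denied, grace period" status of the NLM protocol.

```c
#include <string.h>
#include <time.h>

#define GRACE_SECONDS 45   /* default grace period described above */
#define MAX_LOCKS     32

enum lock_result { GRANTED, DENIED, DENIED_GRACE };

struct lock_entry {
    char client[64];
    int file_id;
    int in_use;
};

struct lock_server {
    struct lock_entry locks[MAX_LOCKS];
    time_t grace_end;      /* non-reclaim requests refused before this */
};

/* Server restart: in-memory lock state is lost; open the grace period. */
void server_restart(struct lock_server *s, time_t now)
{
    memset(s->locks, 0, sizeof s->locks);
    s->grace_end = now + GRACE_SECONDS;
}

/* The status monitor reported that a client failed: drop its locks. */
void server_client_crashed(struct lock_server *s, const char *client)
{
    for (int i = 0; i < MAX_LOCKS; i++)
        if (s->locks[i].in_use && strcmp(s->locks[i].client, client) == 0)
            s->locks[i].in_use = 0;
}

/* Exclusive lock request. During the grace period only reclaims
 * (reclaim != 0) are processed; new requests must wait. */
enum lock_result server_lock(struct lock_server *s, const char *client,
                             int file_id, int reclaim, time_t now)
{
    int free_slot = -1;

    if (now < s->grace_end && !reclaim)
        return DENIED_GRACE;
    for (int i = 0; i < MAX_LOCKS; i++) {
        if (!s->locks[i].in_use) {
            if (free_slot < 0)
                free_slot = i;
            continue;
        }
        if (s->locks[i].file_id == file_id)   /* already locked */
            return strcmp(s->locks[i].client, client) == 0 ? GRANTED : DENIED;
    }
    if (free_slot < 0 || strlen(client) >= sizeof s->locks[0].client)
        return DENIED;
    strcpy(s->locks[free_slot].client, client);
    s->locks[free_slot].file_id = file_id;
    s->locks[free_slot].in_use = 1;
    return GRANTED;
}
```

In this model, a server restarted at time T refuses a new request at T+10 seconds but grants a reclaim of a lock the client previously held; from T+45 seconds onward, ordinary requests are processed again.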
The rpc.statd daemon uses the host names kept in /etc/sm and /etc/sm.bak to keep track of which hosts must be informed when the machine needs to recover operations.
By default, the /etc/rc.nfs script starts the rpc.lockd and rpc.statd daemons along with the other NFS daemons. If NFS is already running, you can verify that the rpc.lockd and rpc.statd daemons are running by following the instructions in Get the Current Status of the NFS Daemons. The status of these two daemons should be active. If the rpc.lockd and rpc.statd daemons are not active, and therefore not running, do the following:
if [ -x /usr/sbin/rpc.statd ]; then
    startsrc -s rpc.statd
fi
if [ -x /usr/sbin/rpc.lockd ]; then
    startsrc -s rpc.lockd
fi
Note: Sequence is important. Always start the statd daemon first.
If the rpc.statd and rpc.lockd daemons are still not running, see Troubleshooting the Network Lock Manager.
If you receive a message on a client similar to:
clnttcp_create: RPC: Remote System error - Connection refused
rpc.statd:cannot talk to statd at {server}
then the machine believes that another machine needs to be informed that it might have to take recovery measures. When a machine restarts, or when the rpc.lockd and rpc.statd daemons are stopped and restarted, machine names are moved from /etc/sm to /etc/sm.bak, and the rpc.statd daemon tries to inform each machine that has an entry in /etc/sm.bak that recovery procedures are needed.
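The restart-time handling of /etc/sm and /etc/sm.bak described above, together with the per-entry removal described in the next paragraph, can be sketched in C as two passes over a directory of host-name files. This is a hypothetical model, not the rpc.statd source: it assumes one file per monitored host, named after that host (consistent with the rm /etc/sm.bak/TargetMachineName command later in this section), and the notify_fn callback stands in for the RPC to the remote rpc.statd. The directories are parameters, so the sketch can be tried without touching /etc.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical callback: tell `host` that this machine restarted.
 * Returns 0 if the host's status monitor was reached, nonzero otherwise. */
typedef int (*notify_fn)(const char *host);

/* Restart step: move every host entry from sm_dir to bak_dir.
 * Returns the number of entries moved, or -1 if sm_dir cannot be opened. */
int move_entries(const char *sm_dir, const char *bak_dir)
{
    DIR *d = opendir(sm_dir);
    struct dirent *e;
    char from[1024], to[1024];
    int moved = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        snprintf(from, sizeof from, "%s/%s", sm_dir, e->d_name);
        snprintf(to, sizeof to, "%s/%s", bak_dir, e->d_name);
        if (rename(from, to) == 0)
            moved++;
    }
    closedir(d);
    return moved;
}

/* One notification pass: entries for hosts that are reached are removed;
 * the rest stay for the next retry. Returns entries left, or -1. */
int notify_pass(const char *bak_dir, notify_fn notify)
{
    DIR *d = opendir(bak_dir);
    struct dirent *e;
    char path[1024];
    int left = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof path, "%s/%s", bak_dir, e->d_name);
        if (notify(e->d_name) == 0 && unlink(path) == 0)
            continue;   /* host reached: entry removed */
        left++;         /* host unreachable: keep entry for a later retry */
    }
    closedir(d);
    return left;
}
```

Calling notify_pass repeatedly on a timer corresponds to retrying at regular intervals. An entry for a host that never answers stays in the directory, so the retries (and the message) continue until the entry is removed.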
If the rpc.statd daemon can reach the machine, that machine's entry in /etc/sm.bak is removed. If the rpc.statd daemon cannot reach the machine, it keeps trying at regular intervals, and each time the machine fails to respond, the resulting timeout generates the message above. In the interest of locking integrity, the daemon continues to try; however, this can have an adverse effect on locking performance. How to handle this depends on whether the target machine is just unresponsive or semi-permanently taken out of production. To eliminate the message:
Note: Sequence is important. Always start the statd daemon first.
After you have restarted the daemons, remember that there is a grace period. During this period, the lockd daemons accept reclaim requests from clients that previously held locks on the server, so you might not get a new lock immediately after starting the daemons.
Alternatively, eliminate the message by removing the target machine's entry from /etc/sm.bak:
rm /etc/sm.bak/TargetMachineName
This action keeps the target machine from being notified that it might need to participate in locking recovery. Use it only when you have determined that the target machine has no applications running that participate in network locking with the affected machine.
If you are unable to obtain a lock from a client, do the following:
Note: Sequence is important. Always start the statd daemon first.
If the procedure does not alleviate the locking problem, run the lockd daemon in debug mode as follows:
/usr/sbin/rpc.lockd -d1
When invoked with the -d1 flag, the lockd daemon writes diagnostic messages to syslog. At first, there will be a number of messages dealing with the grace period; wait for the grace period to expire. After the grace period has timed out on both the server and any clients, run the application that is having lock problems and verify that a lock request is transmitted from client to server and server to client.