September 01, 2009
Delegations and leases
This is part of a recent report we prepared for Google, who sponsored some of CITI's Linux NFS work.
Managing delegations and leases in NFSv4 involves some tricky VFS surgery. There are two problems to solve:
- A mutating operation first breaks leases, then performs its update. Leases have traditionally been broken by a single call into the locking code. This leaves a window for a race: a new lease can be granted after the old leases are broken but before the mutating operation completes.
- For NFSv4 (and also Samba), leases must be revoked on all mutating operations, but they are currently revoked only on conflicting opens.
We have a patch set that addresses both issues by
- replacing the single break_lease call by a break_lease_start ... break_lease_end pair, and
- adding calls into all the other mutating VFS operations: unlink, rename, chmod, chown, creat, mknod, mkdir, symlink, link, and rmdir.
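To make the bracketing idea concrete, here is a minimal user-space sketch of it. This is a toy, single-threaded model and not the code in our patch set: the names break_lease_start and break_lease_end come from the description above, but their signatures, struct toy_inode, toy_request_lease, and toy_chmod are all hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Toy user-space model of one file; hypothetical, not a kernel structure. */
struct toy_inode {
    int leases;         /* leases currently held on the file */
    int breaks_active;  /* mutating operations currently in flight */
    int mode;           /* something for a mutating operation to change */
};

/* Open the bracket: revoke existing leases and refuse new ones until
 * the matching break_lease_end. (Assumed signature.) */
void break_lease_start(struct toy_inode *ino)
{
    ino->breaks_active++;
    ino->leases = 0;
}

/* Close the bracket: leases may be granted again once no mutating
 * operation is in flight. (Assumed signature.) */
void break_lease_end(struct toy_inode *ino)
{
    if (ino->breaks_active > 0)
        ino->breaks_active--;
}

/* A lease request fails while any bracket is open; this is what closes
 * the window between "leases broken" and "update finished". */
bool toy_request_lease(struct toy_inode *ino)
{
    if (ino->breaks_active > 0)
        return false;
    ino->leases++;
    return true;
}

/* One of the mutating operations, wrapped in the start/end pair. */
void toy_chmod(struct toy_inode *ino, int mode)
{
    break_lease_start(ino);
    ino->mode = mode;
    break_lease_end(ino);
}
```

With a single break call there is nothing like breaks_active to consult, so a lease requested between the break and the update would be granted. In this model the request is refused for as long as the bracket is open.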
In some cases, the modifications for completeness require delicate surgery on core parts of the VFS. For example, rename takes kernel mutex locks on the source and target directories before calling lookup, i.e., before we discover whether there are leases to break. But breaking a lease might take tens of seconds if the client is unreachable, so we cannot afford to break a lease while holding kernel mutex locks. Therefore, if the lookup reveals that there are leases to break, we back out of the kernel mutex locks, break the leases, then start over. (This is not guaranteed to terminate ... hope that's OK!)
To implement this, we introduced a try_break_lease operation, a non-blocking operation that tries to break a lease and either succeeds immediately or returns an error. In the latter case, the caller can release mutex locks, issue a blocking break_lease operation, then retry the operation. This implementation also meets the needs of NFSD, which cannot afford to let server threads block while waiting for an established lease to be broken.
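The back-out-and-retry pattern can be sketched roughly as follows. Again this is a toy user-space model rather than the patch itself: the names try_break_lease and break_lease are taken from the text, but the signatures, the error codes, struct toy_file, struct toy_dir, and toy_rename are assumptions made for illustration. The max_retries bound is also an addition of this sketch; as noted above, the real loop is not guaranteed to terminate.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a file with leases and a directory mutex. */
struct toy_file { int leases; };
struct toy_dir  { bool locked; };

/* Non-blocking: succeeds only if there is nothing left to break.
 * (Assumed signature and error code.) */
int try_break_lease(struct toy_file *f)
{
    return f->leases == 0 ? 0 : -EWOULDBLOCK;
}

/* Blocking: the real operation may wait a long time for an unreachable
 * client; this model simply drops the leases. */
int break_lease(struct toy_file *f)
{
    f->leases = 0;
    return 0;
}

/* Returns the number of retries that were needed, or -EAGAIN if the
 * (sketch-only) retry bound was exhausted. */
int toy_rename(struct toy_dir *src, struct toy_dir *dst,
               struct toy_file *target, int max_retries)
{
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        /* Take both directory locks, then look at the target. */
        src->locked = true;
        dst->locked = true;

        if (try_break_lease(target) == 0) {
            /* No leases in the way: the rename itself would happen here. */
            src->locked = false;
            dst->locked = false;
            return attempt;
        }

        /* Leases found: back out of the locks before doing anything slow. */
        src->locked = false;
        dst->locked = false;
        break_lease(target);
        /* Start over; a new lease may have appeared in the meantime. */
    }
    return -EAGAIN;
}
```

The point of the split is that the only call made while the locks are held is the non-blocking one. A caller such as NFSD, which cannot let a server thread block, could in the same spirit call try_break_lease and return an error to its client instead of issuing the blocking call itself.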
We have been tinkering with these patches on our own for too long (regression testing, finding and fixing some small bugs, adding comments, and reworking the interface to make the goals clearer) when we should have been sending them out for comments. That will be remedied soon. For now the patch set is available from the “leases” branch of:
which is browsable here.
We have also written some prototype code to support directory leases, which are needed to support NFSv4 directory delegations.