September 13, 2009

Haskell Kernel Modules

Insanity! Someone has made it possible to write kernel modules in Haskell. (FYI, Haskell is a functional language with very strong typing.) Currently, they support only x86, but I wouldn't be surprised if some other architectures got a port soonish.

Posted by jsipek at 01:58 PM

September 01, 2009

Delegations and leases

This is part of a recent report we prepared for Google, who sponsored some of CITI's Linux NFS work.

Management of delegation and leases in NFSv4 involves some tricky VFS surgery. There are basically two problems to solve:

We have a patch set that addresses both issues by

  1. replacing the single break_lease call by a break_lease_start ... break_lease_end pair, and

  2. adding calls into all the other mutating VFS operations: unlink, rename, chmod, chown, creat, mknod, mkdir, symlink, link, and rmdir.

In some cases, the modifications for completeness require delicate surgery on core parts of the VFS. For example, rename takes kernel mutex locks on the source and target directory before calling lookup, i.e., before we discover whether there are leases to break. But breaking a lease might take dozens of seconds if the client is unreachable, so we cannot afford to break a lease while holding kernel mutex locks. Therefore, if the lookup reveals that there are leases to break, we back out of the kernel mutex locks, break the leases, then start over. (This is not guaranteed to terminate ... hope that's OK!)

To implement this, we introduced a try_break_lease operation, a non-blocking operation that tries to break a lease and either succeeds immediately or returns an error. In the latter case, the caller can release mutex locks, issue a blocking break_lease operation, then retry the operation. This implementation also meets the needs of NFSD, which cannot afford to let server threads block while waiting for an established lease to be broken.
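The back-out-and-retry pattern described above can be sketched in userspace. This is a minimal Python simulation, not kernel code and not the patch set itself. The `Inode` class, the two `threading.Lock` objects standing in for the directory mutexes, the `LeaseWouldBlock` exception, and the function `rename_with_lease_break` are all hypothetical names chosen for illustration; only `try_break_lease` and `break_lease` borrow their names from the text.

```python
import threading


class LeaseWouldBlock(Exception):
    """Raised when a lease cannot be broken without waiting."""


class Inode:
    """Hypothetical stand-in for a file that may have an outstanding lease."""

    def __init__(self, leased=False):
        self.leased = leased


def try_break_lease(inode):
    """Non-blocking: return immediately if no lease is held, else raise."""
    if inode.leased:
        raise LeaseWouldBlock()


def break_lease(inode):
    """Blocking break. A real one could wait many seconds on an
    unreachable client; this simulation simply clears the flag."""
    inode.leased = False


def rename_with_lease_break(src_dir_lock, dst_dir_lock, inode):
    """Retry loop that never waits for a lease break while holding the locks.

    Returns the number of attempts it took. Like the scheme in the text,
    the loop has no upper bound on retries.
    """
    attempts = 0
    while True:
        attempts += 1
        with src_dir_lock, dst_dir_lock:
            try:
                try_break_lease(inode)
            except LeaseWouldBlock:
                must_wait = True
            else:
                must_wait = False
                # ... the actual rename would happen here, under the locks ...
        if not must_wait:
            return attempts
        # Both locks have been released at this point, so blocking is safe.
        break_lease(inode)


if __name__ == "__main__":
    src, dst = threading.Lock(), threading.Lock()
    leased_file = Inode(leased=True)
    print(rename_with_lease_break(src, dst, leased_file))
```

With a lease outstanding, the call takes two passes: the first backs out of the locks, and the second completes after the blocking break. If another client acquired a new lease between the break and the retry, the loop would go around again, which is the non-termination risk noted above.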

We have been tinkering with these patches on our own for too long—regression testing, finding and fixing some small bugs, adding comments, and reworking the interface to make the goals clearer—when we should have been sending them out for comments. That will be remedied soon. For now the patch set is available from the “leases” branch of:


which is browsable here.

We have also written some prototype code to support directory leases, which are needed to support NFSv4 directory delegations.

Posted by honey at 11:00 PM

August 13, 2009

Roadmap for pNFS in the Linux kernel

In early 2008, we sketched out a road map for pNFS that tried to predict progress on NFSv4.1 implementation, standardization, and inclusion in the Linux kernel. Briefly, we predicted:

• Complete interoperable and functional implementations
• Convergence of IETF Internet drafts

• NFSv4.1 RFC issued
• NFSv4.1 merged into mainline Linux kernel

• Developers tune pNFS performance at scale

This note looks at progress in adding NFSv4.1 to the Linux kernel. We're not far off track, maybe a couple of months.

Linux kernels are not released on a specific schedule, but there is a discernible pattern.

When a kernel is released, a development kernel "opens up." Kernel maintainers then have a window of about two weeks to merge in major changes ready to see the light of day. The development kernel is then worked over for a couple months by maintainers. When the development kernel has stabilized, it is released, and the process starts anew.

The last several kernels were released on the following dates:

2.6.26 on July 13, 2008
2.6.27 on October 9, 2008
2.6.28 on December 24, 2008
2.6.29 on March 23, 2009
2.6.30 on June 9, 2009

This is consistent with a two-and-a-half-month cycle, with an extra two weeks over the winter holidays. So our best guess for the schedule of future releases is:

2.6.31 in late August 2009
2.6.32 in early November 2009
2.6.33 in early February 2010
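The arithmetic behind this guess can be checked with a short script. It is only a sketch under stated assumptions: "two and a half months" is taken as 76 days, and the two extra holiday weeks are added whenever a cycle would cross New Year. Neither rule is spelled out in the text, and `gaps_in_days` and `project_releases` are hypothetical helpers.

```python
from datetime import date, timedelta

# Release dates as listed in the post.
RELEASES = [
    ("2.6.26", date(2008, 7, 13)),
    ("2.6.27", date(2008, 10, 9)),
    ("2.6.28", date(2008, 12, 24)),
    ("2.6.29", date(2009, 3, 23)),
    ("2.6.30", date(2009, 6, 9)),
]

CYCLE = timedelta(days=76)          # assumption: "two and a half months"
HOLIDAY_EXTRA = timedelta(days=14)  # assumption: added when a cycle crosses New Year


def gaps_in_days(releases):
    """Days between each pair of consecutive releases."""
    dates = [d for _, d in releases]
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


def project_releases(last, count):
    """Project `count` future release dates starting from the date `last`."""
    projected = []
    for _ in range(count):
        nxt = last + CYCLE
        if nxt.year != last.year:
            # The cycle spans the winter holidays, so stretch it.
            nxt += HOLIDAY_EXTRA
        projected.append(nxt)
        last = nxt
    return projected


if __name__ == "__main__":
    print(gaps_in_days(RELEASES))
    for d in project_releases(RELEASES[-1][1], 3):
        print(d.isoformat())
```

The observed gaps come out as 88, 76, 89, and 78 days. Under the assumptions above, the projection lands on August 24, 2009, November 8, 2009, and February 6, 2010, in line with the late August, early November, and early February estimates.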

Pieces of NFSv4.1 are already present in the Linux kernel. In particular, the sessions communication layer, which is mandatory in NFSv4.1, has a toehold: 2.6.30 has some preliminary server-side sessions code, although it lacks a few things:

• trunking
• back channel
• SSV (and some other security-related features)
• reboot recovery (no RECLAIM_COMPLETE)
• some miscellaneous — but mandatory — state operations, like DESTROY_CLIENTID and TEST_STATEID

None of the optional NFSv4.1 features, e.g., directory delegations, pNFS, and file delegation enhancements, are in 2.6.30.

Although 2.6.31 is not yet released (it is "in stabilization"), the important stuff has already been accepted, so we know that 2.6.31 will have preliminary client-side sessions code, with more or less the same caveats as the 2.6.30 server-side sessions code.

When 2.6.32 opens up, we're expecting the sessions back channel — necessary for delegation and layout recalls — to be merged in. Developers have been testing this code at interoperability events, and it passes artificial tests, but it will need some TLC as 2.6.32 stabilizes before it can be used in production.

I'll write about the prospects for pNFS (i.e., layout ops) in my next post.

Posted by honey at 01:14 PM