September 02, 2009

Roadmap for pNFS in the Linux kernel, continued

In this note, we look ahead at adding pNFS to the Linux kernel.

We expect the 2.6.32 kernel to "open up" in a few days. That kernel will have a preliminary implementation of the client and server sides of the sessions communication layer, and the back channel is being merged in.

So when will pNFS RPC operations be merged in?

Although prototype implementations of client and server support for file, block, and object layouts have been around for some time, it's not looking good for layout ops in 2.6.32. Let's take a look at each of them.

File layouts

The file layout client has been tested at numerous interoperability events. However, before the code can be merged into the kernel, the developers have to submit patches for review, and this step has not yet been taken. Moreover, we hear that the developers intend to rewrite the client layout code before refactoring and submitting patches. So, it's fair to say that the file layout client is iffy even for 2.6.33.

There are two candidate implementations of the file layout metadata server, one based on GFS2, the other on spNFS.

CITI developed a file layout metadata server by extending GFS2, one of two cluster file systems in the Linux kernel. Lock contention in GFS2 may limit scalability in large clusters, but CITI is building a performance test bed (an eight-node cluster that uses Linux iSCSI targets as shared storage) where we can study scaling properties.

Andy Adamson is enhancing and completing the effort begun at CITI, so a GFS2-based metadata server in 2.6.33 is possible.

The other file layout metadata server is NetApp's spNFS, a user-space implementation that uses local disks on the data servers instead of a common shared disk. spNFS uses NFS as the server-to-server protocol. It is our understanding that the I/O path between clients and the metadata server has proved difficult to implement, and the project seems stalled for the moment. We expect NetApp to revive the effort, but not in time for 2.6.32.

One mandatory feature that both candidate file layout metadata servers lack is I/O stateid enforcement, which requires a server-to-server protocol that has not yet been specified. We hear rumors that NetApp is working on a solution.
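To make the missing feature concrete, here is a toy model in Python; it is not taken from GFS2, spNFS, or any other implementation mentioned here. The metadata server issues and revokes stateids, and a data server must refuse I/O that presents a stateid the metadata server no longer considers valid. The `is_valid` call is a hypothetical stand-in for the server-to-server protocol that has not been specified, and all class and method names are made up for illustration. As a simplification, every rejection returns NFS4ERR_BAD_STATEID.

```python
import secrets
from typing import List, Set


class MetadataServer:
    """Toy metadata server that hands out and revokes I/O stateids."""

    def __init__(self) -> None:
        self._valid: Set[bytes] = set()

    def open_file(self) -> bytes:
        # A real stateid4 is a 4-byte seqid plus a 12-byte "other" field;
        # 16 random bytes stand in for it here.
        stateid = secrets.token_bytes(16)
        self._valid.add(stateid)
        return stateid

    def revoke(self, stateid: bytes) -> None:
        self._valid.discard(stateid)

    def is_valid(self, stateid: bytes) -> bool:
        # Hypothetical stand-in for the unspecified MDS-to-DS protocol.
        return stateid in self._valid


class DataServer:
    """Toy data server that enforces stateids by asking the metadata server."""

    def __init__(self, mds: MetadataServer) -> None:
        self._mds = mds
        self.blocks: List[bytes] = []

    def write(self, stateid: bytes, data: bytes) -> str:
        # Without this check, a client holding a stale or forged stateid
        # could still write directly to the data server.
        if not self._mds.is_valid(stateid):
            return "NFS4ERR_BAD_STATEID"
        self.blocks.append(data)
        return "NFS4_OK"


if __name__ == "__main__":
    mds = MetadataServer()
    ds = DataServer(mds)
    sid = mds.open_file()
    print(ds.write(sid, b"hello"))
    mds.revoke(sid)
    print(ds.write(sid, b"again"))
```

The hard part in practice is exactly the line this sketch waves away: every data-server I/O would otherwise need a round trip to the metadata server, so a real protocol has to push or cache validity information efficiently.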

Object layouts

Panasas wrote an OSD-based local file system called exofs, which has been merged into the kernel. Their exofs-based pNFS implementation currently supports only a single OSD, which limits scalability and makes it less interesting for pNFS, but work on multiple-OSD support is underway. No one outside Panasas has reviewed the pNFS code yet. It may be ready for 2.6.33.

Block layouts

LSI developed a block layout server based in part on infrastructure from spNFS, but stopped working on it and posted the code last month. Probably no one other than the main developer has looked at or tested that code yet, and it may need a lot of work.

Summary

For servers, there's a good chance that a simple version of the GFS2-based file layout server will be merged in 2.6.33. The exofs-based object layout server might be ready at about the same time. The LSI block layout server is a big question mark.

The client side has an advantage: it does not have to accommodate a variety of backend storage architectures, so a single project suffices for each layout type. There are still a number of architectural issues to work out to make the three layout type implementations fit together well, so we estimate that client layout code will be merged into 2.6.33 or 2.6.34.
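One way to picture "a single project for each layout type" on the client is a registry that maps each layout type to one driver, with generic client code dispatching I/O through it. The sketch below is a hypothetical Python model of that idea, not the Linux client's actual interface; only the layout type numbers come from the NFSv4.1 specification (RFC 5661). The fallback branch reflects the fact that a pNFS client can always send I/O through the metadata server when it has no usable layout.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, Dict


class LayoutType(IntEnum):
    """layouttype4 values defined by the NFSv4.1 specification (RFC 5661)."""
    NFSV4_1_FILES = 1
    OSD2_OBJECTS = 2
    BLOCK_VOLUME = 3


@dataclass
class LayoutDriver:
    """Hypothetical per-layout-type module; only a read hook is modeled."""
    name: str
    read: Callable[[int, int], str]  # (offset, length) -> description of the I/O path


@dataclass
class LayoutRegistry:
    """Generic client code that dispatches I/O to one driver per layout type."""
    drivers: Dict[LayoutType, LayoutDriver] = field(default_factory=dict)

    def register(self, layout_type: LayoutType, driver: LayoutDriver) -> None:
        # One driver per layout type, so a second registration is an error.
        if layout_type in self.drivers:
            raise ValueError(f"driver already registered for {layout_type.name}")
        self.drivers[layout_type] = driver

    def read(self, layout_type: LayoutType, offset: int, length: int) -> str:
        driver = self.drivers.get(layout_type)
        if driver is None:
            # No driver for this layout type: fall back to ordinary
            # NFSv4.1 I/O through the metadata server.
            return f"MDS read {offset}+{length}"
        return driver.read(offset, length)


if __name__ == "__main__":
    registry = LayoutRegistry()
    registry.register(
        LayoutType.NFSV4_1_FILES,
        LayoutDriver("files", lambda off, n: f"data-server read {off}+{n}"),
    )
    print(registry.read(LayoutType.NFSV4_1_FILES, 0, 4096))
    print(registry.read(LayoutType.BLOCK_VOLUME, 0, 4096))
```

The architectural questions the post mentions live in what the shared interface must expose so that file, object, and block drivers can all plug into it; this toy models only a single read hook.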

We expect that the initial submissions will pass artificial tests, but will have limitations that will prevent them from being useful in production, and that some additional months will be required to make them fast and reliable. Exactly when the various distributions will start picking them up will depend on the intended audience of the distributions, their tolerance for rough edges, and on what developers and maintainers communicate about the readiness of the code.

Posted by honey at 05:00 PM

August 17, 2009

git engineering for pNFS

I'm reading a month-old thread on the pNFS Linux developers' mailing list that sheds light on a very hard problem: factoring the code for pNFS operations into a set of patches that is both generic and useful.

Check it out.

Posted by honey at 12:57 PM

August 13, 2009

Roadmap for pNFS in the Linux kernel

In early 2008, we sketched out a road map for pNFS that tried to predict progress on NFSv4.1 implementation, standardization, and inclusion in the Linux kernel. Briefly, we predicted:

2008:
• Complete interoperable and functional implementations
• Convergence of IETF Internet drafts

2009:
• NFSv4.1 RFC issued
• NFSv4.1 merged into mainline Linux kernel

2010:
• Developers tune pNFS performance at scale

This note looks at progress in adding NFSv4.1 to the Linux kernel. We're not far off track, maybe a couple months.

Linux kernels are not released on a specific schedule, but there is a discernible pattern.

When a kernel is released, a development kernel "opens up." Kernel maintainers then have a window of about two weeks to merge in major changes ready to see the light of day. The development kernel is then worked over for a couple months by maintainers. When the development kernel has stabilized, it is released, and the process starts anew.

The last several kernels were released on the following dates:

2.6.26 on July 13, 2008
2.6.27 on October 9, 2008
2.6.28 on December 24, 2008
2.6.29 on March 23, 2009
2.6.30 on June 9, 2009

This is consistent with a two-and-a-half-month cycle, with an extra two weeks over the winter holidays. So our best guess for the schedule of future releases is:

2.6.31 in late August 2009
2.6.32 in early November 2009
2.6.33 in early February 2010
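The arithmetic behind that guess can be checked with a few lines of Python. This is a sketch of the rule of thumb stated above, under two assumptions of ours: "two and a half months" is taken as 76 days, and a cycle gets 14 extra days whenever it spans December 25. The script also prints the observed gaps between the releases listed above for comparison.

```python
from datetime import date, timedelta

# Release dates listed in the post (dicts keep insertion order).
RELEASES = {
    "2.6.26": date(2008, 7, 13),
    "2.6.27": date(2008, 10, 9),
    "2.6.28": date(2008, 12, 24),
    "2.6.29": date(2009, 3, 23),
    "2.6.30": date(2009, 6, 9),
}

# Assumed rule of thumb: about two and a half months per cycle,
# plus two extra weeks when a cycle spans the winter holidays.
BASE_CYCLE = timedelta(days=76)
HOLIDAY_SLIP = timedelta(days=14)


def observed_gaps(releases):
    """Days between consecutive releases, in order."""
    dates = list(releases.values())
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


def next_release(previous):
    """Project the next release date from the previous one."""
    candidate = previous + BASE_CYCLE
    christmas = date(previous.year, 12, 25)
    if previous < christmas <= candidate:
        candidate += HOLIDAY_SLIP
    return candidate


def project(last_minor, last_date, count):
    """Return [(version, projected_date), ...] for the next `count` kernels."""
    out = []
    current = last_date
    for minor in range(last_minor + 1, last_minor + 1 + count):
        current = next_release(current)
        out.append((f"2.6.{minor}", current))
    return out


if __name__ == "__main__":
    print("observed gaps (days):", observed_gaps(RELEASES))
    for version, when in project(30, RELEASES["2.6.30"], 3):
        print(version, when.isoformat())
```

Run as a script, it prints observed gaps of 88, 76, 89, and 78 days and projects 2.6.31 on 2009-08-24, 2.6.32 on 2009-11-08, and 2.6.33 on 2010-02-06, matching the estimates above. The 88-day gap between 2.6.26 and 2.6.27 is a reminder that the rule is only approximate.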

Pieces of NFSv4.1 are already present in the Linux kernel. In particular, the sessions communication layer, which is mandatory in NFSv4.1, has a toehold: 2.6.30 has some preliminary server-side sessions code, although it lacks a few things:

• trunking
• back channel
• SSV (and some other security-related features)
• reboot recovery (no RECLAIM_COMPLETE)
• some miscellaneous — but mandatory — state operations, like DESTROY_CLIENTID and TEST_STATEID

None of the optional NFSv4.1 features, e.g., directory delegations, pNFS, and file delegation enhancements, are in 2.6.30.

Although 2.6.31 has not yet been released (it is "in stabilization"), the important changes have already been accepted, so we know that 2.6.31 will have preliminary client-side sessions code, with more or less the same caveats as the 2.6.30 server-side sessions code.

When 2.6.32 opens up, we're expecting the sessions back channel — necessary for delegation and layout recalls — to be merged in. Developers have been testing this code at interoperability events, and it passes artificial tests, but it will need some TLC as 2.6.32 stabilizes before it can be used in production.

I'll write about the prospects for pNFS (i.e., layout ops) in my next post.

Posted by honey at 01:14 PM
