September 24, 2009
PAPI - Getting at Hardware Performance Counters
Recently, I wanted to figure out whether an application I was analyzing was memory bound. While on this quest, I was introduced to the Performance Application Programming Interface (PAPI).
There is a rather good HOWTO that shows step-by-step instructions on getting it all running on Debian. The text below is more or less just a short version of that HOWTO, with my thoughts interspersed.
PAPI is a library that hooks into the hardware performance counters, and presents them in a uniform way. Installation is rather simple if you pay attention to the installation instructions.
- Get the kernel source
- Get the perfctr tarball
- Extract the sources, and run the update-kernel script. I really mean this: if you try to be clever and apply the patch by hand, you'll end up with a broken source tree. (The script runs patch to fix up some existing kernel files, and then copies a whole bunch of other files into the kernel tree.)
- Configure, build, install, and reboot into the new kernel
- You can modprobe perfctr and see spew in dmesg
That's it for perfctr. Now PAPI itself...
- Get & extract the source
- ./configure, make, make fulltest, make install-all
That's it for PAPI. The make fulltest will run the tests. Chances are that they will all either pass or all fail. If they fail, then something is wrong (probably with perfctr). If they pass, then you are all set.
There are some examples in the src/examples directory. Those should get you started with using PAPI. It takes about 100 lines of C to get an arbitrary counter going.
Some other time, I'll talk more about PAPI, and how I used it in my experiments.
Posted by jsipek at 07:47 PM | Comments (0) | TrackBack (0)
September 16, 2009
Fluid Dynamics Computing using GPUs
This summer I had the opportunity to work on an emerging new platform for scientific computing - Graphics Processing Units (GPUs).
GPUs have long been a very powerful option for graphics processing, but until recently were built on a platform that didn't allow them to be used for any other application. With the launch of the NVIDIA G80 and G200 series on the Compute Unified Device Architecture (CUDA), GPUs now have an intuitive interface that allows researchers to harness their full potential. These GPUs now have over 100 processor cores (currently at 1.3 GHz), making them a powerful coprocessor able to accelerate scientific codes by one or more orders of magnitude by taking advantage of parallelism in the most computationally expensive mathematical operations.
The goal for my summer research was to investigate the use of these processors for fluid dynamics applications (historically one of the most computationally expensive fields). In order to test the GPU under non-ideal conditions, a first-order unstructured finite volume algorithm was chosen, as it is somewhat simple, yet relatively difficult to parallelize. It was found that even for this algorithm, a speedup of over 25 times can be achieved. The tech report can be found at http://johanndahm.com/papers.php.
Continue reading "Fluid Dynamics Computing using GPUs"
Posted by jdahm at 05:04 PM | Comments (0) | TrackBack (0)
September 13, 2009
Haskell Kernel Modules
Insanity! Someone has made it possible to write kernel modules in Haskell. (FYI, Haskell is a functional language with very strong typing.) Currently, they support only x86, but I wouldn't be surprised if some other architectures got a port soonish.
Posted by jsipek at 01:58 PM | Comments (0) | TrackBack (0)
September 02, 2009
Roadmap for pNFS in the Linux kernel, continued
In this note, we look ahead at adding pNFS to the Linux kernel.
We expect the 2.6.32 kernel to "open up" in a few days. That kernel will have a preliminary implementation of the client and server sides of the sessions communication layer, and the back channel is being merged in.
So when will pNFS RPC operations be merged in?
Continue reading "Roadmap for pNFS in the Linux kernel, continued"
Posted by honey at 05:00 PM | Comments (0) | TrackBack (0)
45 disks in a 4u box
Via Slashdot, design and parts list for a $7,867 machine with 45 1.5TB drives.
That's 12 cents a gigabyte as opposed to 12 dimes a gigabyte for ten. Looks like it's probably also slower. (No idea what disk bandwidth they'd get, but it probably doesn't matter since they appear to have only one gigabit network interface.)
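As a quick check of the 12 cents figure, here is the arithmetic from the numbers quoted above. It assumes decimal units (1 TB = 1000 GB) and counts raw capacity only, ignoring any RAID or filesystem overhead; the variable names are just for illustration.

```python
# Figures quoted in the post
price_usd = 7867
drives = 45
tb_per_drive = 1.5

# Assumption: decimal units (1 TB = 1000 GB), raw capacity with no redundancy overhead
total_gb = drives * tb_per_drive * 1000
cents_per_gb = price_usd * 100 / total_gb

print(f"{total_gb:.0f} GB raw, {cents_per_gb:.1f} cents/GB")
```

Any capacity given up to redundancy would push the cost per usable gigabyte somewhat higher.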
Posted by bfields at 04:23 PM | Comments (3) | TrackBack (0)
September 01, 2009
Delegations and leases
This is part of a recent report we prepared for Google, who sponsored some of CITI's Linux NFS work.
Management of delegations and leases in NFSv4 involves some tricky VFS surgery. There are basically two problems to solve:
- Correctness
A mutating operation breaks leases, then updates. Leases have traditionally been broken by a single call into the locking code. This introduces a potential race condition if new leases are requested after the old leases are broken but before the mutating operation completes.
- Completeness
For NFSv4 (and also Samba), leases must be revoked on all mutating operations, but they are currently revoked only on conflicting opens.
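To make the correctness problem concrete, here is a toy model of the break-then-update window. It is only a sketch and not kernel code: the `Inode` class, the `breaking` flag, and the `honor_guard` switch are all hypothetical names invented for illustration, and refusing new leases while a break is in progress is just one possible way to close the window, not necessarily the approach taken in the report.

```python
class Inode:
    """Toy stand-in for a file: a set of lease holders and a data version."""

    def __init__(self):
        self.leases = set()
        self.version = 0
        self.breaking = False  # hypothetical guard, not a real kernel field

    def request_lease(self, holder, honor_guard):
        # With the guard honored, no new lease is granted mid-mutation.
        if honor_guard and self.breaking:
            return False
        self.leases.add(holder)
        return True


def mutate(inode, honor_guard, interloper=None):
    """Break leases, then update; 'interloper' asks for a lease in between."""
    inode.breaking = True
    inode.leases.clear()                 # step 1: break existing leases
    if interloper is not None:           # the window between break and update
        inode.request_lease(interloper, honor_guard)
    inode.version += 1                   # step 2: the mutating operation
    stale = set(inode.leases)            # leases granted before the update landed
    inode.breaking = False
    return stale


def run(honor_guard):
    inode = Inode()
    inode.request_lease("client-a", honor_guard)
    return mutate(inode, honor_guard, interloper="client-b")


print(run(honor_guard=False))  # a lease slipped in during the window
print(run(honor_guard=True))   # the guard refused the lease
```

Without the guard, "client-b" ends up holding a lease on data that changed underneath it, which is the race described above. The model runs in a single thread, so it shows the ordering problem but says nothing about the locking a real implementation would need.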
Continue reading "Delegations and leases"
Posted by honey at 11:00 PM | Comments (0) | TrackBack (0)
August 26, 2009
Benchmarking Is Hard, Let's Go Shopping
It's been a while since I started telling people that benchmarking systems is hard. I'm here today because of an article about an article about an article from the ACM Transactions on Storage. (If anyone refers to this post, they should cite it as "blog post about an article about ..." ;) )
While the statement "benchmarking systems is hard" is true for most of systems benchmarking (yes, that's an assertion without supporting data, but this is a blog and so I can state these opinions left and right!), the underlying article (henceforth the article) is about filesystem and storage benchmarks specifically.
For those of you who are getting the TL;DR feeling already, here's a quick summary:
- FS benchmarking is hard to get right.
- Many commonly accepted fs benchmarks are wrong.
- Many people misconfigure benchmarks yielding useless data.
- Many people don't specify their experimental setup properly.
Hrm, I think I just summarized a 56-page journal article in four bullet points. I wonder what the authors will have to say about this :)
Continue reading "Benchmarking Is Hard, Let's Go Shopping"
Posted by jsipek at 10:05 PM | Comments (0) | TrackBack (0)