May 19, 2006
Volunteer grid computing projects
Most people have heard of SETI@home, the volunteer distributed grid computing project in which computer owners let software run on their machines when they are idle (especially at night) to help search through electromagnetic data from space for communications from extraterrestrials. But this is only one of many such projects; over a dozen are described in "Volunteer Computer Grids: Beyond SETI@home" by Michael Muchmore, many of them devoted to health applications.
Why do people donate their computer cycles? At first glance, why not? These programs, most of which run BOINC (Berkeley Open Infrastructure for Network Computing), are careful to use only CPU cycles not in demand by the computer owner's software, so the cycles donated are free, right? Well, sort of. It takes time to download and install the software, there is some risk of infecting one's machine with a virus, and many users may perceive some risk that the CPU demands will infringe on their own use. Most users will believe there is some amount of cost.
With certain projects, volunteers may get some pleasure or entertainment value out of participating: for example, the search for large Mersenne primes is exciting to those who enjoy number theory; searching for alien intelligence probably provides a thrill to many.
I suspect a related motivation is sufficient for most volunteers: the projects generally have a socially valuable goal, so people can feel like they are helping make the world a better place, at a rather small cost to themselves. For example there are projects to screen cancer drugs, search for medications for tuberous sclerosis, and help calibrate the Large Hadron Collider (for physics research). As Muchmore writes, "a couple of the projects—Ubero and Gómez—will pay you a pittance for your processing time. But wouldn't you feel better curing cancer or AIDS?"
These projects appear to attract a lot of volunteerism. Muchmore reports estimates of participation that range from one to over five million computers at any given moment. According to the BOINC project, volunteers are generating about 400 teraflops of processing, far more than the 280 teraflops that the largest operational supercomputer can provide.
Posted by jmm at 03:29 PM
May 11, 2006
But that's just the tip of the iceberg...
CNet captures some anecdotes about the rise in splog (spamming blogs) in "Blogosphere suffers spam explosion". They're right of course, but the following was not the most impressive summary:
While technology and legislation may have made spam in e-mail manageable, there is still some way to go when it comes to keeping it out of blogs.
Two common types of splog are comments or trackbacks that point to a commercial site (often for medications or porn), and comments (or fake blogs) filled with links to raise the PageRank (Google index strength) of other sites.
Just a quick personal note: this is the least publicized blog on the planet (and no one seems to care enough about it to leave comments!), but I've been splogged nonetheless. It happened a few weeks ago, in the midst of the last week of class, so I tucked this away for a better day (the site seems to be gone, so I'm putting in the full posting including URL):
Sent: Thursday, April 20, 2006 11:29 PM
Subject: [ICD stuff] New TrackBack Ping to Entry 2992 (Principal-agent problem in action)
A new TrackBack ping has been sent to your weblog, on the entry 2992 (Principal-agent problem in action).
IP Address: 220.127.116.11
Title: pregnant movies
May 10, 2006
CAPTCHAs (2): Technical screens vulnerable to motivated humans
A particularly interesting approach to breaking purely technical screens, like CAPTCHAs, is to provide humans with incentives to end-run the screen. The CAPTCHA is a test that is easy for humans to pass, but costly or impossible for machines to pass. The goal is to keep out polluters who rely on cheap CPU cycles to proliferate their pollution. But polluters can be smart, and in this case the smart move may be "if you can't beat 'em, join 'em".
Say a polluter wants to get many free email accounts from Yahoo! (from which to launch pollution distribution, such as spamming). The usual approach is to have a computer go through the process of setting up an account at Yahoo! and to replicate this many times to get many accounts. For many similar settings, it is easy to write code to automatically navigate the signup (or other) service.
CAPTCHAs make it very costly for computers to aid polluters, because most computers either fail to decode a CAPTCHA or take a very long time to do so.
As I discussed in my CAPTCHAs (1) entry, one approach for polluters to get around the screen is to improve the ability of computers to crack the CAPTCHA. But another is to give in: if humans can easily pass the screen, then enlist large numbers of human hours to get past the test repeatedly. There are at least two ways to motivate humans to prove repeatedly to a CAPTCHA that they are human: pay low-wage workers (usually in developing countries) to sit at screens all day and solve CAPTCHAs, or give (higher-wage) users some other currency they value to solve the CAPTCHAs: the most common in-kind payment has been access to a collection of pornography in exchange for solving a CAPTCHA.
This puts us back in the usual problem space for screening: how to come up with a screen that is low cost for desirable human participants, but high cost for undesirable humans?
The lesson is that CAPTCHAs may be able to distinguish humans from computers, but only if the computers act like computers. If they enlist humans to help them, the CAPTCHAs fail.
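To make the polluter's choice concrete, here is a minimal sketch in Python. It assumes the polluter simply picks whichever route past the screen is cheaper per solved CAPTCHA. The function names and all of the numbers are hypothetical illustrations of mine, not measurements from any real operation.

```python
def cost_per_pass_human(hourly_wage, solves_per_hour):
    """Cost of one CAPTCHA solved by a paid human worker."""
    return hourly_wage / solves_per_hour


def cheapest_bypass(machine_cost, hourly_wage, solves_per_hour):
    """Return (route, cost) for the cheaper way past the screen."""
    human_cost = cost_per_pass_human(hourly_wage, solves_per_hour)
    if human_cost < machine_cost:
        return ("human", human_cost)
    return ("machine", machine_cost)


# Made-up numbers: $5.00 of computing per machine-decoded CAPTCHA,
# versus a worker paid $1.00/hour who solves 500 CAPTCHAs per hour.
route, cost = cheapest_bypass(machine_cost=5.00, hourly_wage=1.00,
                              solves_per_hour=500)
print(route, cost)
```

Under these assumed numbers the human route wins easily. The point is only that the screen's effective cost to the polluter is the minimum over all available routes, not the cost of the machine route alone.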
Ironically, enlisting large numbers of humans to solve problems that are hard for computers is an example of what Luis von Ahn (one of the inventors of CAPTCHAs) calls "social computing".
CAPTCHAs (1): Technical screens are vulnerable to technical progress
One of the most wildly successful technical screening mechanisms for blocking pollution in recent years is the CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). The idea is ingenious, and respects basic incentive-centered design principles necessary for a screen to be successful. However, it suffers from a common flaw: purely technical screens often are not very durable because technology advances. I think it may be important to include human-behavior incentive features in screening mechanisms.
The basic idea behind a CAPTCHA is beautifully simple: present a graphically distorted image of a word to a subject. A computer will not be able to recognize the word, but a human will, so a correct answer identifies a human.
Of course, as we know from screening theory, for a CAPTCHA to work, the cost for the computer to successfully recognize the word has to be substantially higher than for humans. And, since the test is generally dissipative (wasteful of time, at least for the human user), the system will be more efficient (user satisfaction will be higher) the lower is the screening cost for the humans. So, the CAPTCHA should be very easy for humans, but hard to impossible for computers.
With rapidly advancing technology (not just hardware, but especially machine vision algorithms), the cost of decoding any particular family of CAPTCHAs will decline rapidly. Once the decoding cost is low enough, the CAPTCHA no longer screens effectively: we get a pooling equilibrium rather than a separating equilibrium (the test can't tell computers and humans apart). The creators of CAPTCHAs (von Ahn, Blum, Hopper and Langford) note, reasonably enough, that this isn't all bad: developing an algorithm that has a high success rate against a particular family of CAPTCHAs is solving an outstanding artificial intelligence problem. But, while good for science, that probably isn't much comfort to people who are relying on CAPTCHAs to secure various open access systems from automated polluting agents.
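The separating-versus-pooling distinction can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: each type (human, bot) attempts the screen only if the value of getting through exceeds the cost of passing. The function name `screen_outcome` and the numbers are hypothetical.

```python
def screen_outcome(human_cost, human_value, bot_cost, bot_value):
    """Classify a screen by which types find it worthwhile to pass."""
    humans_pass = human_value > human_cost
    bots_pass = bot_value > bot_cost
    if humans_pass and not bots_pass:
        return "separating"      # the screen works as intended
    if humans_pass and bots_pass:
        return "pooling"         # everyone gets through; no screening
    if not humans_pass and not bots_pass:
        return "nobody passes"   # the screen is too costly for all
    return "only bots pass"      # worst case: humans deterred, bots not


# Made-up numbers. Before better machine vision: decoding costs the
# bot far more than an account is worth to the polluter.
before = screen_outcome(human_cost=0.01, human_value=1.00,
                        bot_cost=5.00, bot_value=0.10)

# After decoding costs fall, the same CAPTCHA no longer screens.
after = screen_outcome(human_cost=0.01, human_value=1.00,
                       bot_cost=0.001, bot_value=0.10)
print(before, after)
```

Nothing about the human side changes between the two calls; the screen fails purely because the bot's cost dropped below the bot's value of passing.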
The vulnerability of CAPTCHAs to rapid technological advance is now clear. A recent paper shows that computers can now beat humans at single character CAPTCHA recognition. The CAPTCHA project documents successful efforts to break two CAPTCHA families (ez-gimpy and gimpy-r).