October 31, 2013

Everything (of value) is for sale

There's a truism that bothers many (except economists): if there is a good or service that has value to some and can be produced at a cost below that value by someone else, there will be a market. This is disturbing to many because it is as true for areas of dubious morality (such as sexual transactions) and clear immorality (human trafficking and slavery) as it is for lawn mowing and automobiles.

Likewise for online activities, as I've documented many times here. You can buy Twitter followers, Yelp reviews, likes on Facebook, votes on Reddit. And, of course, Wikipedia, where you can buy pages or edits, or even (shades of The Sopranos), "protection".

Here is an article that reports at some length on large scale, commercialized Wikipedia editing and page management services. Surprised? Just another PR service, like social media management services provided by every advertising / marketing / image management service today.

Posted by jmm at 09:47 AM | Comments (0) | Permalink »

June 06, 2013

Everything can -- and will -- be manipulated

Well, not "everything". But every measure on which decisions of value depend (e.g., book purchases, dating opportunities, or tenure) can and will be manipulated.

And if the measure depends on user-contributed content distributed on an open platform, the manipulation often will be easy and low cost, and thus we should expect to see it happen a lot. This is a big problem for "big data" applications.

This point has been the theme of many posts I've made here. Today, a new example: citations of scholarly work. One of the standard, often highly-valued (as in, makes a real difference to tenure decisions, salary increases and outside job offers) measures of the impact of a scholar's work is how often it is cited in the published work of other scholars. Thomson ISI has been providing citation indices for many years. ISI is not so easy to manipulate because -- though it depends on user-contributed content (articles by one scholar that cite the work of another) -- that content is distributed on closed platforms (ISI only indexes citations from a set of published journals that have editorial boards which protect their reputation and brand by screening what they publish).

But over the past several years, scholars have increasingly relied on Google Scholar (and sometimes Microsoft Academic) to count citations. Google Scholar indexes citations from pretty much anything that appears to be a scholarly article that is reachable by the Google spiders crawling the open web. So, for example, it includes citations in self-published articles, or e-prints of articles published elsewhere. Thus, Google Scholar citation counts depend on user-contributed content distributed on an open platform (the open web).

And, lo and behold, it's relatively easy to manipulate such citation counts, as demonstrated by a recent scholarly paper that did so: Delgado Lopez-Cozar, Emilio; Robinson-Garcia, Nicolas; Torres Salinas, Daniel (2012). Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting. EC3 Working Papers 6: 29 May, 2012, available as http://arxiv.org/abs/1212.0638v2.

Their method was simple: they created some fake papers that cited other papers, and published the fake papers on the Web. Google's spider dutifully found them and increased the citation counts for the real papers that these fake papers "cited".

The lesson is simple: for every measure that depends on user-contributed content on an open platform, if valuable decisions depend on it, we should assume that it is vulnerable to manipulation. This is a sad and ugly fact about a lot of new opportunities for measurement ("big data"), and one that we must start to address. The economics are unavoidable: the cost of manipulation is low, so if there is much value to doing so, it will be manipulated. We have to think about ways to increase the cost of manipulating, if we don't want to lose the value of the data.

Posted by jmm at 11:09 AM | Comments (1) | Permalink »

April 19, 2010

Even academics pollute Amazon reviews (updated)

[Oops. Turns out that Orlando Figes himself was the poison pen reviewer, and that he simply compounded his dishonesty by blaming his wife. That's got to put a bit of strain on the home life.]

That people use pseudonyms to write not-arm's-length book reviews on Amazon is no longer news.

But I couldn't resist pointing out this new case, if nothing else as an especially fun example to use in teaching. Dr. Stephanie Palmer, a senior law (!) lecturer at Cambridge University (UK), was outed by her husband, Prof. Orlando Figes, for writing reviews under a pseudonym that savaged the works of his rivals, while also writing a review of a book by her husband that called it a "beautiful and necessary" account, written by an author with "superb story-telling skills." Story-telling, indeed.

A closing comment by the editor of the Times Literary Supplement, which broke the story: "What is new and is regrettable is when historians use the law to stifle debate and to put something in the paper which is untrue....[Figes's] whole business is replacing a mountain of lies with a few truths".

Via The Guardian.

Posted by jmm at 11:13 PM | Comments (0) | Permalink »

November 26, 2008

What is pollution, what is manipulation?

For some time, I've referred to a variety of user-contributed content activities as "pollution". Spam, for example. The typical user doesn't want to receive it. It pollutes the inbox.

But some behaviors that reduce the value of user-contributed content are commonly called "manipulation". For example, stuffing the ballot box in an online rating system, such as Netflix, which might be done by, say, the producer of a movie. My colleagues, Resnick and Sami, have been publishing work on "manipulation-resistant" systems [1][2].

Is there a difference? In both cases, a user with a product to sell is submitting content that most users would agree (if they knew) has negative value. Why not call them both pollution? Both manipulation?

I think there is a difference, but it's more a matter of degree than absolute. The defining feature of pollution is that the polluter does not benefit from the pollution itself: the pollution cost imposed on users is inadvertent. They are victims of a side effect. This is also known as an externality problem: the producer creates and benefits from creating X; X imposes a cost on others, but the producer's benefit is not directly related to the cost imposed on others (the producer is not generating pollution because she gets satisfaction from making others suffer).

Manipulation costs are not externalities: the benefit to the producer is directly related to the cost experienced by the others. For example, in the Netflix example, the cost to users is that they pay for and watch movies that are not as suited to their tastes as those they otherwise would have watched. But that is precisely the outcome that the manipulative content producer wanted to achieve. The manipulator intends to get the others to do or experience something they would rather not.

I said this was a matter of degree: In the spam example, some of the benefit (sometimes, perhaps most) to the producer is that she convinces consumers to purchase, even though ex ante those consumers might have said they would rather not receive the spam advertisements (they consider them polluting). Thus, part of the costs of spam might be manipulation costs. The producer doesn't care about the many users who ignore the spam but suffer from it, but does care about the effect she is having on those she manipulates into purchasing.

Why does it matter what we call them? It may not matter much, but labels are useful for guiding our understanding, and our design efforts. If I recognize something as having the features of a pollution problem, I immediately know that I can refer to the literature on pollution problems to help me characterize it, and the literature on solving pollution problems to help generate good designs. Labels are shorthand for abstract models, compactly representing the essential features of the problem and its context.

[1] Eric Friedman, Paul Resnick, and Rahul Sami (2007). "Manipulation-Resistant Reputation Systems", Ch. 27 in Algorithmic Game Theory (N. Nisan, T. Roughgarden, E. Tardos, V. Vazirani, editors), Cambridge University Press.

[2] Paul Resnick and Rahul Sami (2007). "The Influence-Limiter: Provably Manipulation-Resistant Recommender Systems", Proceedings of the ACM Recommender Systems Conference.

Posted by jmm at 10:38 AM | Comments (0) | Permalink »

September 01, 2008

The fine line between spam and foie gras

The New York Times (following others) reported today on a large number of detailed, informed, and essentially all flattering edits to Sarah Palin's Wikipedia page made --- hmmm --- in the 24 hours before her selection as the Republican vice presidential nominee was made public. The edits were made anonymously, and the editor has not yet been identified, though he acknowledges that he is a McCain campaign volunteer.

Good or bad content? The potential conflict of interest is clear. But that doesn't mean the content is bad. Most of the facts were supported with citations. But were they written in overly flattering language? And was the selection of facts unbiased? Much of the material has been revised and toned down or removed in the few days since, which is not surprising regardless of the quality of this anonymous editor's contributions, given the attention that Ms. Palin has been receiving.

Posted by jmm at 04:18 PM | Comments (0) | Permalink »

April 12, 2008

Pollution as revenge

One of my students alerted me to a recent dramatic episode. Author and psychologist Cooper Lawrence appeared on a Fox News segment and made some apparently false statements about the Xbox game "Mass Effect", which she admitted she had never seen or played. Irate gamers shortly thereafter started posting (to Amazon) one-star (lowest possible score) reviews of her recent book that she was plugging on Fox News. Within a day or so, there were about 400 one-star reviews, and only a handful any better.

Some of the reviewers acknowledged they had not read or even looked at the book (arguing they shouldn't have to since she reviewed a game without looking at it). Many explicitly criticized her for what she said about the game, without actually saying anything about her book.

When alerted, Amazon apparently deleted most of the reviews. Its strategy apparently was to delete reviews that mentioned the name of the game, or video games at all (the book has nothing to do with video games). With this somewhat conservative strategy, the reviews remaining (68 at the moment) are still lopsidedly negative (57 one-star, 8 two-star, 3 five-star), more than I've ever noticed for any somewhat serious book, though there's no obvious way to rule these out as legitimate reviews. (I read several and they do seem to address the content of the book, at least superficially.)
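The keyword-based screen described above is easy to sketch, and the sketch shows why it is conservative. The following Python is only an illustration under stated assumptions: the term list, the function names, and the sample review texts are all hypothetical, and nothing here is known about how Amazon actually implemented its deletions.

```python
# Hypothetical blocked terms: the game's name and any mention of video games.
BLOCKED_TERMS = ("mass effect", "video game")

def mentions_blocked_term(review_text: str) -> bool:
    """Return True if the review mentions any blocked term (case-insensitive)."""
    lowered = review_text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def surviving_reviews(reviews):
    """Keep only the reviews that the keyword screen does not delete."""
    return [text for text in reviews if not mentions_blocked_term(text)]

# Invented example reviews, for illustration only.
reviews = [
    "She trashed Mass Effect without playing it. One star.",
    "Shallow advice and repetitive chapters.",
    "I don't trust an author who reviews video games she never saw.",
]

print(surviving_reviews(reviews))
```

Only the second review survives. A revenge review that avoids the blocked terms would survive too, which is consistent with the lopsided ratings that remained.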

Aside from being a striking and different example of book review pollution (past examples I've noted have been about favorable reviews written by friends and authors themselves), I think this story highlights troubling issues. The gamers have, quite possibly, intentionally damaged Lawrence's business prospects: her sales likely will be lower (I know that I pay attention to review scores when I'm choosing books to buy). Of course, she arguably damaged the sales of "Mass Effect", too. Arguably, her harm was unintentional and careless (negligent rather than malicious). But she presumably is earning money by promoting herself and her writing by appearing on TV shows: is it a reasonable social response to discipline her for negligence? (And the reviewers who have more or less written "she speaks about things she doesn't know; don't trust her as an author" may have a reasonable point: so-called "public intellectuals" probably should be guarding their credibility in every public venue if they want people to pay them for their ideas.)

I also find it disturbing, as a consumer of book reviews, but not video games, that reviews might be revenge-polluted. Though this may discipline authors in a way that benefits gamers, is it right for them to disadvantage book readers?

I wonder how long it will be (if it hasn't already happened) before an author or publisher sues Amazon for providing a nearly-open access platform for detractors to attack a book (or CD, etc.). I don't know the law in this area well enough to judge whether Amazon is liable (after all, arguably she could sue the individual reviewers for some sort of tortious interference with her business prospects), but given the frequency of contributory negligence or similar malfeasances in other domains (such as Napster and Grokster facilitating the downloading of copyrighted materials), it seems like some lawyer will try to make the case one of these days. After all, Amazon provides the opportunity for readers to post reviews in order to advance its own business interests.

Some significant risk of contributory liability could be hugely important for the problem of screening pollution in user-contributed content. If you read some of the reviews still on Amazon's site in this example, you'll see that it would not be easy to decide which of them were "illegitimate" and delete all of those. And what kind of credibility would the review service have if publishers made a habit of deciding (behind closed doors) which too-negative reviews to delete, particularly en masse? I think Amazon has done a great job of making it clear that they permit both positive and negative reviews and don't over-select the positive ones to display, which was certainly a concern I had when they first started posting reviews. But if authors and publishers can hold it liable when it lets "revenge" reviews appear, I suspect it (and similar sites) will have to shut down reviewing altogether.

(Thanks to Sarvagya Kochak.)

Posted by jmm at 01:42 PM | Comments (0) | Permalink »

March 29, 2008

Keeping the good stuff out at Yahoo! Answers

This is, I think, an amusing and instructive tale. I'm a bit sorry to be telling it, because I have a lot of friends at Yahoo! (especially in the Research division), and I respect the organization. The point is not to criticize Yahoo! Answers, however: keeping pollution out is a hard problem for user-contributed content information services, and that their system is imperfect is a matter for sympathy, not scorn.

While preparing for my recent presentation at Yahoo! Research, I wondered whether Yahoo! Mail was still using the Goodmail spam-reduction system (which is based on monetary incentives). I couldn't find the answer with a quick Google search, nor by searching the Goodmail and Yahoo! corporate web sites (Goodmail claims that Yahoo! is a current client, but there was no information about whether Yahoo! is actually using the service, or what impact it is having).

So, I thought, this is a great chance to give Yahoo! Answers a try. I realize the question answerers are not generally Yahoo! employees, but I figured some knowledgeable people might notice the question. Here is my question, in full:

Is Yahoo! Mail actually using Goodmail's Certified Email? In 2005 Yahoo!, AOL and Goodmail announced that the former 2 had adopted Goodmail's "Certified Email" system to allow large senders to buy "stamps" to certify their mail (see e.g., http://tinyurl.com/2atncr). The Goodmail home page currently states that this system is available at Yahoo!. Yet I can find nothing about it searching Yahoo!Mail Help, etc. My question: I the system actually being used at Yahoo!Mail? Bonus: Any articles, reports, etc. about its success or impacts on user email experience?

A day later I received the following "Violation Notice" from Yahoo! Answers:

You have posted content to Yahoo! Answers in violation of our Community Guidelines or Terms of Service. As a result, your content has been deleted. Community Guidelines help to keep Yahoo! Answers a safe and useful community, so we appreciate your consideration of its rules.

So, what is objectionable about my question? It is not profane or a rant. It is precisely stated (though compound), and I provided background context to aid answerers (and so they knew what I already knew).

I dutifully went and read the Community Guidelines (CG) and the Terms of Service (TOS), and I could not figure out what I had violated. I had heard elsewhere that some people did not like TinyURLs because it is not clear where you are being redirected, and thus they might be used to maliciously direct traffic. But I saw nothing in the CG or TOS that prohibited URLs in general, or TinyURLs specifically.

So I used the link they provided to appeal the deletion. A few days later I received a reply that cut-and-pasted the information from the Yahoo! Answers help page explaining why content is deleted. This merely repeated what I had been told in the first message (since none of the other categories applied): my content was in violation of the CG or TOS. But no information was provided (for the second time) on how the content violated these rules.

Another address was provided to appeal the decision, so I wrote a detailed message to that address, explaining my question, and my efforts to figure out what I was violating. A few days later, I got my third email from Yahoo! Answers:

We have reviewed your appeal request. Upon review we found that your content was indeed in violation of the Yahoo! Answers Community Guidelines, Yahoo! Community Guidelines or the Yahoo! Terms of Service. As a result, your content will remain removed from Yahoo! Answers.

Well... Apparently it's clear to others that my message violates the CG or the TOS, but no one wants to tell me what the violation actually is. Three answers, all three with no specific explanation. Starting to feel like I'm a character in a Kafka novel.

At this point, I laughed and gave up (it was time for me to travel to Yahoo! to give my -- apparently dangerous and community-guideline-violating -- presentation anyway).

I have to believe that there is something about the use of a URL, a TinyURL, or the content to which I pointed that is a violation. I've looked, and found many answers that post URLs (not surprisingly) to provide people with further information. Perhaps the problem is that I was linking to a Goodmail press release on their web site, and they have a copyright notice on that page? But does Yahoo! really think providing a URL is "otherwise make available any Content that infringes any patent, trademark, trade secret, copyright" (from the TOS)? Isn't that what Yahoo's search engine does all the time?

End of story.

Moral? Yahoo! Answers is a user-contributed content platform. Like most, that means it is fundamentally an open-access publishing platform. There will be people who want to publish content that is outside the host's desired content scope. How to keep out the pollution? Yahoo! uses a well-understood, expensive method to screen: labor. People read the posted questions and make determinations about acceptability. But, as with any screen, there are Type I (false positive) and Type II (false negative) errors. Screening polluting content is hard.
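To make the two kinds of screening error concrete, here is a minimal Python sketch that tallies them. The record type, the field names, and the four sample decisions are invented for illustration; in practice the "is it really pollution" label is exactly what the screener cannot observe directly.

```python
from typing import Iterable, NamedTuple

class Decision(NamedTuple):
    is_pollution: bool  # ground truth (assumed known here, for illustration only)
    removed: bool       # what the human screener decided

def screening_errors(decisions: Iterable[Decision]) -> dict:
    """Count good content wrongly removed, pollution wrongly kept, and correct calls."""
    counts = {"good_removed": 0, "pollution_kept": 0, "correct": 0}
    for d in decisions:
        if d.removed and not d.is_pollution:
            counts["good_removed"] += 1    # a legitimate contribution kept out
        elif d.is_pollution and not d.removed:
            counts["pollution_kept"] += 1  # unwanted content let through
        else:
            counts["correct"] += 1
    return counts

# Hypothetical decisions: one of each possible outcome.
sample = [
    Decision(is_pollution=False, removed=True),
    Decision(is_pollution=True, removed=True),
    Decision(is_pollution=True, removed=False),
    Decision(is_pollution=False, removed=False),
]

print(screening_errors(sample))
```

A stricter screen lowers "pollution_kept" at the price of raising "good_removed", and my deleted question would fall in the latter bucket.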

(My question probably does violate something, but surely the spirit of my question does not. I had a standard, factual, reference question, ironically, to learn a fact that I wanted to use in a presentation to Yahoo! Research. A bit more clarity about what I was violating and I would have contributed desirable content to Yahoo! Answers. Instead, a "good" contributor was kept out.)

Posted by jmm at 10:19 AM | Comments (5) | Permalink »

ICD for home computer security

Ph.D. student Rick Wash and I are applying ICD design tools to the problem of home computer security. Metromode (online magazine) recently published an article featuring our project.

One of the major threats to home computers is viruses that install bots, creating botnets. These bots are code that uses the computer's resources to perform some task on behalf of the bot owner. Most commonly, the bots become spam-sending engines, so that spammers can send mail from thousands of home computers, making it harder to block the spam by originating IP (and also saving them the cost of buying and maintaining a server farm). Bots, of course, may also log keystrokes and try to capture bank passwords and credit card numbers.

The problem is crawling with incentives issues. Unlike first generation viruses, bots tend to be smarter about detection. In particular, they watch the process table, and limit themselves to using CPU cycles when other programs are not using many. That way, a normal home user may not see any evidence that he or she has a virus: the computer does not seem to noticeably slow down (but while they are away from the machine the bot may be running full tilt sending out spam). So, the bot doesn't harm its host much, but it harms others (spreading spam, the bot virus itself, possibly other harmful activity like denial-of-service attacks on other hosts). This is a classic negative externality: the computer owner has little incentive (and often little appropriate knowledge) to stop the bot, but others suffer. How to get the home computer user to protect his or her machine better?

We are developing a social firewall that integrates with standard personal firewall services to provide the user additional benefits (motivating them to use the service), while simultaneously providing improved security information to the firewalls employed by other users.

We don't have any papers released on this new system yet, but for some of the foundational ideas, see "Incentive-Centered Design for Information Security", ICEC-07.

Posted by jmm at 09:44 AM | Comments (0) | Permalink »

January 13, 2008

All user-contributed, all the time (almost)

I've been fascinated for the past couple of years with businesses that rely on user-contributed content (UCC) for substantial inputs to production. It is sometimes jokingly referred to as the "Tom Sawyer business model": get your friends to whitewash the fence for you, without paying them (in fact, they paid Tom quite handsomely, including "a key that wouldn't unlock anything, a fragment of chalk...and a dead rat on a string").

Randall Stross writes in today's New York Times about two fairly well-known businesses that have nearly perfected the art: Plenty of Fish, and Craigslist. Craigslist is a wide-open classified advertising service where employers post jobs, homeowners sell their old "Monopoly -- Star Wars Version" games and unwanted gifts, and, most piquantly, people of every shape, age, color and preference seek partners for a nearly infinite variety of polymorphously perverse, chaste and romantic interactions. Craigslist is one of the top 10 visited English language sites, has versions for 450 localities in over 50 countries, and runs with only 25 employees. All of the content is written, edited (such as it is) and maintained voluntarily by users; user volunteers also provide most of the customer service through help forums.

Plenty of Fish is more specialized and not quite as successful, but perhaps more remarkable. It is a dating service localized to 50 Canadian, US and Australian cities. Markus Frind created it and devotes only about 10 hours a week to running it...and only in the past year did he hire his first employee. Yet the site has 600,000 registered users (a number that grows rapidly despite purging 30,000 inactives a month), and receives 50,000 new photos per day. Spam-filtering of text is done by software. Filtering of photos (to make sure they are human and clothed) is done by user volunteers: in the past year the top 120 volunteers scanned over 100,000 photos each! The users provide the customer service too, through help forums.

Great business model: have the users whitewash the fence, and you work 10 hours a week for $10 million in annual profits (Stross estimates that Frind's claim about his advertising-only profits is plausible). What are the generalizable principles? How can *I* start such a business and succeed (the road is littered with UCC-driven businesses that never turn a profit)?

It is obvious that one of the most important questions is why? Why would users volunteer the time and effort to provide the content, the customer service, the photo filtering, etc.? You may think it's obvious why users want to visit Plenty of Fish: there are a lot of lonely hearts out there. And it is 100% free to users: Frind only charges advertisers. Of course, without user effort, it won't succeed: there will be no information about potential life partners, no help information, and lots of undesirable photos polluting the service. But no individual user needs to contribute anything: there is no requirement for volunteer hours (as there is at our local food coop), there is no public tracking of effort and peer pressure to pull your weight. It's a free-rider's dream.

Contributing content is easy: if you don't submit a profile you aren't going to get any dates. But what about photo scanning? Yes, you want to scan photos anyway: that's why you're there. But why not let someone else filter out the junk so you only have to filter the worthwhile photos? Is there that much of a first-mover advantage that you are willing to filter 100,000 photos per year to have a shot at being the first to contact the newest hunk? My guess is that the expected return on that investment is pretty low.

And why spend your time providing free help service to other users? Maybe Plenty of Fish is lucky to have a demographic for whom the value of time is unusually low (lonely single people with nothing else to do on Saturday night), but that just means the cost is lower to make the contribution: what is the benefit? Is it that the volunteer helpers are trying to be noticed as helpful, well-informed web geeks as a way of attracting dates?

I think the answers to these questions are transparently not obvious. If the answers were easy, we'd have a lot more people working 10 hours a week to make $10 million per year. And the answers are not likely to be something that involves only traditional economic views about incentives and motivations. Developing generalizable principles about the motivations for user-contributed content will surely need to draw on psychological explanations as well, from the psychology of personality and self, and social psychology (at least).

Posted by jmm at 11:03 AM | Permalink »

September 08, 2007

Op-ed in Wall Street Journal advocates hybrid solution to spam

Three researchers published an op-ed in today's Wall Street Journal (subscription only) suggesting that two practical methods to greatly reduce spam are now technically workable, but will not be implemented without cooperation on standards by the major email providers. They urge the providers to agree on a hybrid system:

To break this logjam, we advocate a hybrid system that would allow email users to choose their preferred email system. Those who want anonymity and no incremental cost for email can continue to send emails under the current system, without authentication and without sender bonds. Those who want the lowest costs and don't care about anonymity (most legitimate businesses would likely fall into this category) can send email that is user authenticated, but not bonded. People who want anonymity but are willing to pay to demonstrate the value they place on the recipient's attention can post a bond. Payment could be made anonymously via a clearinghouse, using the electronic equivalent of a tiny traveler's check bundled with each message. Those with especially high-value messages can make them both authenticated and bonded.

The authors are Jonathan Koomey (Lawrence Berkeley National Labs), Marshall van Alstyne (Boston U) and Erik Brynjolfsson (MIT Sloan).

The ideas are not new; they are trying to create public pressure. The authentication system in play is DKIM, a standard approved by the IETF earlier this year. The sender bond method was detailed in a paper by Thede Loder, Rick Wash and van Alstyne. Loder has started a company offering the service (Boxbe); Wash is currently one of my Ph.D. students (though he did this research while working with van Alstyne while Marshall was my colleague at Michigan).
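The four sender options in the op-ed's hybrid proposal amount to two independent yes/no choices: authenticate or not, post a bond or not. A minimal Python sketch of that two-by-two classification follows; the enum names and the function are hypothetical labels of my own, not part of DKIM or of any sender-bond specification.

```python
from enum import Enum

class EmailClass(Enum):
    # Hypothetical labels for the four combinations described in the op-ed.
    ANONYMOUS_FREE = "not authenticated, not bonded (the current system)"
    AUTHENTICATED = "authenticated, not bonded"
    BONDED = "not authenticated, bonded (anonymous but paid)"
    AUTHENTICATED_AND_BONDED = "authenticated and bonded"

def classify(authenticated: bool, bonded: bool) -> EmailClass:
    """Map the sender's two choices onto one of the four hybrid categories."""
    if authenticated and bonded:
        return EmailClass.AUTHENTICATED_AND_BONDED
    if authenticated:
        return EmailClass.AUTHENTICATED
    if bonded:
        return EmailClass.BONDED
    return EmailClass.ANONYMOUS_FREE

# A legitimate business that does not care about anonymity:
print(classify(authenticated=True, bonded=False).value)
```

The point of the hybrid is that recipients (or their mail providers) could then treat each category differently, for example by filtering the unauthenticated, unbonded class most aggressively.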

Posted by jmm at 10:37 AM | Comments (0) | Permalink »

September 02, 2007

Incentive wars: Buy blog comments

Raising the cost of polluting is one way to reduce pollution. Whether the pollution is a by-product (e.g., effluent from a factory), or whether it is the polluter's product (e.g., spam advertising), if the polluter is forced to bear more of the social cost of producing the pollution, he or she will have a good reason to produce less.

Most techniques for discouraging or blocking spam can be interpreted as raising the cost. My Ph.D. student Lian Jian told me about a lovely example that makes this point clearly: http://buyblogcomments.com. Most bloggers moderate comments so they can screen out spam by hand; they also often put up technical barriers to block spambots. These efforts are a form of raising costs for spammers, and the market has responded by putting a price on getting around these efforts. BuyBlogComments is a service that pays humans to enter "quality comments" that are related to blog postings, so they won't be deleted, yet that also include the URL the spammer wants to disseminate. For example, you can buy 100 blog comments for $24.99. They probably don't have 100% success, but suppose they do: we now have a reasonable estimate that comment moderation is at most going to eliminate spam comments that are worth less than $0.25 to the spammer.
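The arithmetic behind that estimate can be made explicit. This is a minimal Python sketch; the function name and the success-rate parameter are illustrative assumptions, not figures published by the service.

```python
def cost_per_surviving_comment(package_price, comments, success_rate=1.0):
    """Effective price the spammer pays for each comment that survives moderation."""
    if comments <= 0 or not 0 < success_rate <= 1:
        raise ValueError("need a positive comment count and a success rate in (0, 1]")
    return package_price / (comments * success_rate)

# Figures from the post: 100 comments for $24.99, assuming every comment survives.
per_comment = cost_per_surviving_comment(24.99, 100)
print(f"${per_comment:.2f} per comment")  # prints "$0.25 per comment"
```

If only half of the purchased comments survived moderation, the effective price would double to about $0.50, so the 100% success assumption gives the lowest possible estimate of what moderation costs the spammer.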

Posted by jmm at 05:09 PM | Comments (1) | Permalink »

June 11, 2007

Arms race with CAPTCHAs continues, and continues...

A Dog or a Cat? New Tests to Fool Automated Spammers - New York Times

Posted by jmm at 09:32 PM | Permalink »

January 31, 2007

Wikipedia is in trouble

I'm going out on a limb here: unless Wikipedia comes up with a coherent contribution policy that is consistent with the economic value of its content, it will start to deteriorate.

In a widely published Associated Press story, Brian Bergstein reports that Jimmy Wales, Wikipedia founder, Board Chair Emeritus, and currently President of for-profit Wikia, Inc., blocked the account of a small entrepreneur, Gregory Kohs, who was selling his services to (openly, with attribution) write Wikipedia articles about businesses. Wales reportedly told Kohs that his MyWikiBiz was "antithetical to Wikipedia's mission", and that even posting his stories on his personal page inside Wikipedia so independent editors could grab them and insert them in the encyclopedia was "absolutely unacceptable".

Before I get into my dire forecast, what is antithetical about someone who is paid as a professional writer to prepare content, especially if he is open about that fact? There are three "fundamental" Wikipedia editorial policies with which all contributions must comply:

  1. Neutral point of view (NPOV)
  2. Verifiability
  3. No original research

The first two are relevant here. NPOV means all content "must be written from a neutral point of view (NPOV), representing fairly and without bias all significant views." Verifiability means "any reader should be able to check that material added to Wikipedia has already been published by a reliable source." Kohs stated in his corporate materials that he is committed to compliance with these two policies: he would prepare the content for interested parties, but it would be neutral and verifiable. Of course, on any particular contribution other editors might disagree and choose to revise the content, but that is the core process of Wikipedia.

The problem is deep: arguably all contributors have a subjective (non-neutral) point of view, no matter how much they may wish and believe otherwise. What is rather remarkable about Wikipedia is how well the group editing process has worked to enforce neutrality (and verifiability) through collective action. In any case, there is no clear reason to believe a paid professional writer is going to be systematically more or less neutral than a volunteer writer.

In part, this is just a simple statement about incentives. A reasonable starting point is to accept that everyone who makes the effort to research and write material for Wikipedia is doing it for some motivating reason. Research and writing take time away from other desirable activities, so unless the writer is consistently irrational, she by revealed preference believes she is getting some benefit out of writing greater than the opportunity cost of the foregone time. It follows directly that point of view might be biased by whatever is motivating a given writer. To believe otherwise is naive. Dangerously naive, for the future of Wikipedia.

Even if the "everyone is motivated by something" argument is too subtle for some true believers in massive social altruism, there is an obvious problem with Wikipedia's position on Gregory Kohs: surely there are many, many writers who are being paid for time and effort they devote to Wikipedia, but who are not being open about it. For example, employees of corporations, non-profits, educational institutions, etc., who are asked to maintain a Wikipedia entry on their employer, and who do so from an IP address not traceable to the employer (e.g., from home). We already know from past experience that political operatives have made sub rosa contributions.

So, the problem of distinguishing between a priori neutral and a priori non-neutral contributors is deep and possibly not amenable to any reasonably effective solution. This is a fundamental problem of hidden information: the contributor knows things about her motivations and point of view that are not observable by others. Rather, others can only infer her motivations, by seeing what she writes, and at that point, the motivations are moot: if her content is not neutral or verifiable, other editors can fix it, and if she systematically violates these principles, she can be banned based on what she did, not who she purports to be.

Indeed, given the intractability of knowing the motivations and subjective viewpoints of contributors, it might seem that the sensible policy would be to encourage contributors to disclose any potential conflicts of interest, to alert editors to be vigilant for particular types of bias. This disclosure, of course, is exactly what Kohs did.

And now, for my prediction that Wikipedia is in trouble. Wikipedia has become mainstream: people in all walks of life rely on it as a valuable source of information for an enormous variety of activities. That is, the content has economic value: economic in the sense that it is a scarce resource, valuable precisely because for many purposes it is better than the next alternative (it is cheaper, or more readily available, or more reliable, or more complete, etc.). Having valuable content, of course, is the prime directive for Wikipedia, and it is, truly, a remarkable success.

However, precisely because the content has economic value to the millions of users, there are millions of agents who have an economic interest in what the content contains. Some are interested merely that content exist (for example, there are not many detailed articles about major businesses, which was the hole that Kohs was trying to plug). Others might want that content to reflect a particular point of view.

Because there is economic value to many who wish to influence the content available, they will be willing to spend resources to do the influencing. And where there are resources -- value to be obtained -- there is initiative and creativity. A policy that tries to ex ante filter out certain types of contributors based on who they are, or on very limited information about what their subjective motivations might be, is as sure to be increasingly imperfect and unsuccessful as is any spam filtering technology that tries to set up ex ante filtering rules. Sure, some of this pollution will be filtered, but there will also be false positives, and worse, those with an interest in influencing content will simply find new clever ways to get around the imperfect ex ante policies about who can contribute. And they will succeed, just as spammers in other contexts succeed, because of the intrinsic information asymmetry: the contributors know who they are and what their motivations are better than any policy rule formulated by another can ever know.

So, trying to pre-filter subjective content based on extremely limited, arbitrary information about the possible motivations of a contributor will just result in a spam-like arms race: content influencers will come up with new ways to get in and edit Wikipedia, and Wikipedia's project managers will spend ever increasing amounts of time trying to fix up the rules and filters to keep them out (but they won't succeed).

This vicious cycle has always been a possibility, and indeed, we've seen examples of pollution in Wikipedia before. The reason I think the problem is becoming quite dangerous to the future of Wikipedia is its very success. Because Wikipedia has become such a valuable source of content, content influencers will be willing to spend ever increasing amounts to win the arms race.

Wikipedia is, unavoidably (and hooray! this is a sign of the success of its mission) an economic resource. Ignoring the unavoidable implications of that fact will doom the resource to deteriorating quality and marginalization (remember Usenet?).

Ironically, at first blush there seems to be a simple, obvious alternative right at hand: let Wikipedia be Wikipedia. The marvel of the project is that the collective editorial process maintains very high quality standards. Further, by allowing people to contribute, and then evaluating their contributions, persistent abusers can be identified and publicly humiliated (as Jimmy Wales himself was when he was caught making non-neutral edits to the Wikipedia entry about himself). Hasn't Wikipedia learned its own key lessons? Let the light shine, and better the devil you know.

(Wikipedia itself offers an enlightening summary of the battle over Kohs's efforts to contribute content. This summary serves to emphasize the impossibility of Wikipedia's fantasy of pre-screening contributors.)

Posted by jmm at 12:29 AM | Comments (4) | Permalink »

January 25, 2007

Good in or bad out?

In his New York Times Circuits Newsletter, David Pogue writes about Microsoft's recent gift of $2200 laptops to about 90 bloggers who write about technology -- laptops loaded with about-to-be-released Vista and Office 2007.

Reviewers need access to the technology they are reviewing, but as Pogue notes, MS could lend the computers.

But I'm more interested in the general point Pogue makes: we live in a culture in which most journalists are trained, and managed by editors who direct them to adhere to ethical guidelines that among other things prohibit accepting gifts from subjects of stories and reviews presented as objective. But technology is moving faster than culture, and a whole new class of influential communicators has emerged -- bloggers -- who for the most part are not trained or managed to follow a specific code of ethics.

If bloggers want durable credibility and success, the culture (theirs and the greater context in which they are embedded) will need to evolve practices and standards that establish and maintain trust. Without trust -- especially at blogs that specialize in providing information for costly decisions, like purchasing consumer electronics and software -- bloggers will lose their audiences. The speed of the development of reliable practices and reputation mechanisms may determine which parts of the blogosphere succeed, and whether much of it degenerates into a morass of spam-like paid (but disguised) product placement announcements.

Posted by jmm at 04:10 PM | Permalink »

January 06, 2007

Spam as security problem

Here is the blurb Rick Wash and I wrote for the USENIX paper (slightly edited for later re-use) about spam as a security problem ripe for ICD treatment. I've written a lot about spam elsewhere in this blog!

Spam (and its siblings spim, splog, spit, etc.) exhibits a classic hidden information problem. Before a message is read, the sender knows much more about its likely value to the recipient than does the recipient herself. The incentives of spammers encourage them to hide the relevant information from the recipient to get through the technological and human filters.

While commercial spam is not a traditional security problem, it is closely related due to the adversarial relationship between spammers and email users. Further, much spam carries security-threatening payloads: phishing and viruses are two examples. In the latter case, the email channel is just one more back door access to system resources, so spam can have more than a passing resemblance to hacking problems.

Posted by jmm at 11:48 PM | Comments (0) | Permalink »


Here's a paragraph Rick Wash and I wrote for the USENIX paper, somewhat revised for later use, concerning spyware:

An installer program acts on behalf of the computer owner to install desired software. However, the installer program is also acting on behalf of its author, who may have different incentives than the computer owner. The author may surreptitiously include installation of undesired software such as spyware, zombies, or keystroke loggers. Rogue installation is a hidden action problem: the actions of one party (the installer) are not easy to observe. One typical design response is to require a bond that can be seized if unwanted behavior is discovered (an escrowed warranty, in essence), or a mechanism that screens unwanted behavior by providing incentives that induce legitimate installers to take actions distinguishable from those who are illegitimate.

Posted by jmm at 11:43 PM | Comments (0) | Permalink »

December 31, 2006

Flaming as pollution

David Pogue (New York Times) worries about another kind of pollution overwhelming popular user-contributed content sites: flaming.

In 2007, the challenge may be keeping that conversation from descending into the muck.

As a Web 2.0 site or a blog becomes more popular, a growing percentage of its reader contributions devolve into vitriol, backstabbing and name-calling....One thing is clear, however: the uncivil participants are driving away the civil ones. The result is an acceleration of the cycle, and an increasing proportion of hostile remarks.

Interesting point. Flaming isn't new, but that doesn't mean it should be ignored: it does tend to degrade the quality of otherwise valuable open access resources.

Are there incentive-centered designs that could reduce flaming? Pogue mentions requiring real names, though he acknowledges that might drive away many desirable users. He concludes that people may just have to accept flaming the way they accept spam: can't we do better?

Posted by jmm at 01:16 AM | Permalink »

December 24, 2006

Spamming Web 2.0

The New York Times today ran a short note highlighting CNET's story about commercial spamming of Digg.com and similar sites. There are companies being paid upwards of $15,000 to get a product placed on the front page of Digg, and most recently a top 30 Digger admitted that he entered an agreement to help elevate a new business to the front page of Digg (and solicited the other top 30 Diggers to participate).

The world was pretty darned excited when it discovered email (for most people, in the early 1990s). Spam followed in a big way within a year or two. It's clear to me that we're on the same trajectory with user-contributed content sites on the Web. There is an ever-increasing need for incentive-centered designs to help keep the bad stuff out.

Posted by jmm at 08:17 AM | Permalink »

November 30, 2006

Yelp: Local reviews via social networking site: why contribute?

So, reviews of local businesses written by local patrons are popular. Why not? Newspapers have always done well running "Best of ___" or "Reader's Choice" contests. Now we have Yelp.com, Judy's Book, Intuit's Zipingo, Insider Pages, and offerings from Yahoo!, Microsoft Live and other players. Even our small city (Ann Arbor, MI) has about 250 businesses reviewed by the newest entrant, Yelp.

And the venture capitalists are giving the new players some dough.

But, why? These sites will make revenues if they sell ads, which should work if there are eyeballs since the eyeballs will be looking specifically for businesses in the local area so advertising on the page should have a good return. But to get eyeballs, these sites have to get volunteer labor to enter ratings and write reviews. And those volunteers come from a diffuse group of local business patrons, many of whom don't know from Web 2.0, and even fewer know about Yelp.com. And even if they know, what's in it for the volunteers?

It's possible that these Web 2.0 companies are simply using Incentives 1.0: They could hire paid reviewers who at least seed the site with reviews on a number of popular businesses in each city. Yelp and the others claim that they don't do this: "real reviews from real people" (I guess we're supposed to assume that paid employees are not real people). But how would users know if they did? What forfeitable bond is Yelp posting to convince us they are trustworthy? Or what if they bribed "real people" to do reviews by sending a salesperson to the establishments and handing out bling in exchange for promises to enter a review?

There's another old-school way to get review content generated, too: tell the business owners about your site, and they'll take the initiative to write their own reviews (the "Amazon" problem). And so that they look popular -- not just loved by one critic -- they ask their mothers and cousins to submit reviews too. Again, how could we tell?

Posted by jmm at 01:46 AM | Permalink »

November 29, 2006

Research presentation: Web 2.0 and ICD

On 20 Nov 06 I gave an invited plenary Association Lecture at the Southern Economic Association Annual Conference in Charleston, SC. The title was "Getting the good stuff in, keeping the bad stuff out: Incentives and the Web". Here are the slides (not PowerPoint!).

In this talk geared to professional economists I explained the user-contributed content explosion that is one characteristic of so-called Web 2.0, and showed that this is happening through all phases of information production, organization, retrieval and use. I then discussed three fundamental economic issues that arise with user-contributed content: getting the good stuff in (private provision of public goods); keeping the bad stuff out (pollution); and evaluating the stuff (signaling, reputation). Familiar topics to the hordes who read this blog!

I finished with a simple elaboration to illustrate how ICD methods could be used to design mechanisms for dealing with these problems. The model is based on an event that occurred last spring on Digg.com.

Posted by jmm at 11:33 PM | Comments (0) | Permalink »

October 26, 2006

Political Google bombing: bad stuff in

A New Campaign Tactic: Manipulating Google Data - New York Times

Posted by jmm at 11:14 PM | Permalink »

September 07, 2006

Digg changes algorithm to help keep bad stuff out

Is Digg rigged? JP thinks so. He offers an informal analysis indicating that several "top 30" Digg users cross-dig each other's posts frequently, which could mutually contribute to them staying in the top group of users.

For this post, I don't really care if there is a cartel of Digg users coordinating self-promotion. The story for ICDers is that Digg changed its system to try to reduce the possibility of this type of pollution.[1]

The announced goal is to "weigh a diversified group of Diggers more heavily than groups acting together." According to Digg co-founder Kevin Rose,

This algorithm update will look at the unique digging diversity of the individuals digging the story. Users that follow a gaming pattern will have less promotion weight. This doesn't mean that the story won't be promoted, it just means that a more diverse pool of individuals will be need [sic] to deem the story homepage-worthy.

As Rose notes, keeping the bad stuff out (the pollution problem I regularly discuss) is a well-known and ongoing challenge to a user-contributed content community: "we have learned a lot about the user base and how to defend digg from spam, artificial diggs, and digg fraud. It's a battle we will continue to fight and one that we don't take lightly" (id).
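
Digg has not published the details of its new algorithm, so what follows is only a hypothetical sketch of what weighting by "digging diversity" could look like, not Digg's actual method. It assumes we can see each user's past diggs (the `history` mapping is an invented input, as are the function names) and discounts each digg by how much the digger's history overlaps with the histories of the other users digging the same story.

```python
def jaccard(a, b):
    """Share of stories two users have in common (0.0 = none, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def promotion_score(diggers, history):
    """Sum of per-digg weights for one story.

    diggers: list of user names who dugg the story.
    history: dict mapping user name -> set of story ids dugg earlier
             (hypothetical data; Digg's real inputs are not public).
    A digg counts for less the more the digger's past behavior
    mirrors that of the other diggers of this story.
    """
    score = 0.0
    for user in diggers:
        others = [o for o in diggers if o != user]
        if not others:
            score += 1.0
            continue
        overlap = sum(
            jaccard(history.get(user, set()), history.get(o, set()))
            for o in others
        ) / len(others)
        score += 1.0 - overlap
    return score
```

In this toy version, three users who always digg the same stories add nothing to a story's score, while three users with unrelated histories add a full point each. A real system would face the usual arms race: a cartel can respond by deliberately diversifying its members' histories.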


[1] There are interesting questions about the conditions under which it is in one's self-interest to cooperate with a cartel, and what the enforcement mechanisms are that enable this. In fact, my first published article concerned conditions causing international mineral cartels of the past century to succeed or fail: MacKie-Mason, Jeffrey K. and Robert S. Pindyck, "Cartel Theory and Cartel Experience in International Minerals Markets," in Energy: Markets and Regulation, Richard L. Gordon, Henry D. Jacoby and Martin B. Zimmerman, eds. Cambridge: MIT Press, 1987: 187-214.

Posted by jmm at 02:17 AM | Comments (0) | Permalink »

August 02, 2006

Beating code with code

CAPTCHAs are a great example of a clever incentive-centered design for an information world problem. But, as many people point out, they aren't perfect. Matt May at W3C has a nice slide presentation explaining CAPTCHAs and a number of their accessibility problems (based on a nice paper for those with more taste for details). He also discusses a variety of ideas about how to do better. Clever as some are, they all suffer a common problem: the incremental improvement from each is largely a technological fix, not an improvement in the incentive structure of CAPTCHAs. And technological fixes in this area are doomed to fail approximately equally rapidly.

What do I mean by this? The costs of computing cycles are falling exponentially, and the cost of implementing usable clever algorithms is probably falling at a slower but still exponential rate (if for no other reason than that a big part of the cost is the enormous computational power needed for some tough problems like password cracking and automated visual recognition of CAPTCHAs, etc.).

Technological fixes are just a loop in an arms race. CAPTCHAs, for example, grew out of the observation that automated visual recognition of distorted alphanumerics was pretty poor a few years ago. But now, largely in response to CAPTCHAs, automated breaking has rapidly advanced, and CAPTCHA security is getting rather weak (which is why it's used only to protect relatively low value resources).

Unless we identify a human cost (or more precisely, a difference in cost between good guys and bad guys, a difference we can use to distinguish between them), and design incentives around that cost (or benefit, if you want to flip the sign bit), tech fixes will be very short term and their efficacy will decrease rapidly. Incentive-based solutions can be more durable if they are based on features of humans or their utility functions that are not subject to technological end-runs. It's true, it's not always easy to find incentives that aren't susceptible to end-runs, but it's not hopeless. Money works pretty well in many cases; sure, technology (i.e., counterfeiting) can sometimes do an end-run, but the rate at which technology has been making money obsolete as an effective incentive is a whole lot slower than the rate at which pattern-recognition software is advancing on CAPTCHAs and the post-CAPTCHA fixes that W3C discusses.

Posted by jmm at 12:45 AM | Comments (0) | Permalink »

June 17, 2006

Google bombing keeps booming

An enterprising spammer apparently has created over 5 billion pages and gotten Google to index them, in only 18 days. The pages carry Adsense (Google) ads, and apparently by spamming blogs they are getting enough cross-links to get ranked and hit in searches. The business model is to steer hapless searchers to totally useless pages and get Adsense clickthrough revenues.

The idea isn't new, but the scale is pretty astonishing. The Monetize blog article linked above includes step-by-step instructions on how to do this. A pernicious type of pollution, and it looks like it may be growing exponentially: a problem begging for an incentives-based solution!

Posted by jmm at 06:20 PM | Permalink »

May 11, 2006

But that's just the tip of the iceberg...

CNet captures some anecdotes about the rise in splog (spamming blogs) in "Blogosphere suffers spam explosion". They're right of course, but the following was not the most impressive summary:

While technology and legislation may have made spam in e-mail manageable, there is still some way to go when it comes to keeping it out of blogs.

Two common types of splog are comments or trackbacks that point to a commercial site (often for medications or porn), or comments (or fake blogs) filled with links to raise the PageRank (Google index strength) for sites.

Posted by jmm at 11:31 PM | Comments (0) | Permalink »

Been splogged

Just a quick personal note: this is the least publicized blog on the planet (and no one seems to care enough about it to leave comments!), but I've been splogged nonetheless. It happened a few weeks ago, in the midst of the last week of class, so I tucked this away for a better day (the site seems to be gone, so I'm putting in the full posting including URL):

Sent: Thursday, April 20, 2006 11:29 PM
To: jmm@umich.edu
Subject: [ICD stuff] New TrackBack Ping to Entry 2992 (Principal-agent problem in action)

A new TrackBack ping has been sent to your weblog, on the entry 2992 (Principal-agent problem in action).

IP Address:
Title: pregnant movies

pregnant porn pregnant fuck pregnant milk gallery

Posted by jmm at 11:27 PM | Comments (0) | Permalink »

May 10, 2006

CAPTCHAs (2): Technical screens vulnerable to motivated humans

A particularly interesting approach to breaking purely technical screens, like CAPTCHAs, is to provide humans with incentives to end-run the screen. The CAPTCHA is a test that is easy for humans to pass, but costly or impossible for machines to pass. The goal is to keep out polluters who rely on cheap CPU cycles to proliferate their pollution. But polluters can be smart, and in this case the smart move may be "if you can't beat 'em, join 'em".

Say a polluter wants to get many free email accounts from Yahoo! (from which to launch pollution distribution, such as spamming). The usual approach is to have a computer go through the process of setting up an account at Yahoo! and to replicate this many times to get many accounts. For many similar settings, it is easy to write code to automatically navigate the signup (or other) service.

CAPTCHAs make it very costly for computers to aid polluters, because most computers fail, or take a very long time decoding a CAPTCHA.

As I discussed in my CAPTCHAs (1) entry, one approach for polluters to get around the screen is to improve the ability of computers to crack the CAPTCHA. But another is to give in: if humans can easily pass the screen, then enlist large numbers of human hours to get past the test repeatedly. There are at least two ways to motivate humans to prove repeatedly to a CAPTCHA that they are human: pay low-wage workers (usually in developing countries) to sit at screens all day and solve CAPTCHAs, or give (higher-wage) users some other currency they value to solve the CAPTCHAs: the most common in-kind payment has been access to a collection of pornography in exchange for solving a CAPTCHA.

This puts us back in the usual problem space for screening: how to come up with a screen that is low cost for desirable human participants, but high cost for undesirable humans?

The lesson is that CAPTCHAs may be able to distinguish humans from computers, but only if the computers act like computers. If they enlist humans to help them, the CAPTCHAs fail.

Ironically, enlisting large numbers of humans to solve problems that are hard for computers is an example of what Luis von Ahn (one of the inventors of CAPTCHAs) calls "social computing".

Posted by jmm at 12:37 AM | Comments (0) | Permalink »

CAPTCHAs (1): Technical screens are vulnerable to technical progress

One of the most wildly successful technical screening mechanisms for blocking pollution in recent years is the CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart). The idea is ingenious, and respects basic incentive-centered design principles necessary for a screen to be successful. However, it suffers from a common flaw: purely technical screens often are not very durable because technology advances. I think it may be important to include human-behavior incentive features in screening mechanisms.

The basic idea behind a CAPTCHA is beautifully simple: present a graphically distorted image of a word to a subject. A computer will not be able to recognize the word, but a human will, so a correct answer identifies a human.

Of course, as we know from screening theory, for a CAPTCHA to work, the cost for the computer to successfully recognize the word has to be substantially higher than for humans. And, since the test is generally dissipative (wasteful of time, at least for the human user), the system will be more efficient (user satisfaction will be higher) the lower is the screening cost for the humans. So, the CAPTCHA should be very easy for humans, but hard to impossible for computers.

With rapidly advancing technology (not just hardware, but especially machine vision algorithms), the cost of decoding any particular family of CAPTCHAs will decline rapidly. Once the decoding cost is low enough, the CAPTCHA no longer screens effectively: we get a pooling equilibrium rather than a separating equilibrium (the test can't tell computers and humans apart). The creators of CAPTCHAs (von Ahn, Blum, Hopper and Langford) note, reasonably enough, that this isn't all bad: developing an algorithm that has a high success rate against a particular family of CAPTCHAs is solving an outstanding artificial intelligence problem. But, while good for science, that probably isn't much comfort to people who are relying on CAPTCHAs to secure various open access systems from automated polluting agents.
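
The screening logic can be made concrete with a small sketch. This is a stylized illustration under simple assumptions of my own, not a model taken from the CAPTCHA papers: each type of participant passes the test only if the value of getting through exceeds its cost of solving, and the computer's decoding cost shrinks by a fixed fraction each year. All names and numbers are hypothetical.

```python
def screen_outcome(value_human, cost_human, value_bot, cost_bot):
    """Classify the screen by who finds it worthwhile to pass the test."""
    human_passes = value_human > cost_human
    bot_passes = value_bot > cost_bot
    if human_passes and not bot_passes:
        return "separating"   # the test tells humans and computers apart
    if human_passes and bot_passes:
        return "pooling"      # both get through; the screen no longer screens
    return "humans excluded"  # the test is too costly even for humans


def first_pooling_year(value_bot, cost_bot_now, yearly_decline, horizon=20):
    """First year in which a bot's decoding cost falls below its value of access.

    Assumes the decoding cost shrinks by the fraction `yearly_decline`
    each year. Returns None if the cost never falls far enough within
    `horizon` years.
    """
    cost = cost_bot_now
    for year in range(horizon + 1):
        if value_bot > cost:
            return year
        cost *= 1.0 - yearly_decline
    return None
```

For example, if an account is worth 1.5 cents to a spammer, decoding costs 16 cents today, and the decoding cost halves each year, then `first_pooling_year(0.015, 0.16, 0.5)` says the screen stops separating in year 4. The human side of the screen never changed; only the technology did.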

The vulnerability of CAPTCHAs to rapid technological advance is now clear. A recent paper shows that computers can now beat humans at single character CAPTCHA recognition. The CAPTCHA project documents successful efforts to break two CAPTCHA families (ez-gimpy and gimpy-r).

Posted by jmm at 12:12 AM | Comments (0) | Permalink »

April 08, 2006

Some basics from the economics of pollution

I have taken to characterizing "keeping the bad stuff out" of user-contributed content resources as a pollution problem. What can we learn from the long-standing economics literature on pollution (traditionally about environmental pollution)?

For this entry, I'm going to quote some passages from an article I published on "Economic Incentives" in the Encyclopedia of the Environment (Marshall Cavendish, Tarrytown, NY, 2000):

When costly side effects can be ignored by a polluter, there will be too much pollution relative to a Pareto optimum. Several policies can give polluters an economic incentive to consider side effects when deciding how much pollution to generate. The three most important economic incentives are taxes, subsidies, and tradable permits. These all work by internalizing the externality, that is by making the polluter directly face the cost created by the pollution.

Taxes, subsidies, tradable permits: this is a pretty limited palette, though it does cover most of what the environmental economics literature has offered. Over time, I'll be suggesting other incentive-based approaches to pollution in user-contributed content resources, but for now will stick to these three:

Taxes. Since 1920 economists have recommended imposing a tax on polluting activity equal to the incremental social cost imposed by that activity (A. C. Pigou, 1920). Then as long as the social cost of the pollution is greater than cost of prevention or clean-up, the polluter will want to reduce the pollution, to the point at which the social benefits of further reductions are not sufficient to warrant the costs of obtaining the reductions. A Pareto optimum can be achieved.
Subsidies. Rather than impose a tax to discourage pollution the government can offer an equal subsidy per unit of reduction. A subsidy per pound of gunk reduced creates the identical incentive as a tax per pound produced: each pound of gunk eliminated raises profits by the amount of the subsidy or tax. The main difference between taxes and subsidies is distributional: the cost of control can be paid by taxpayers (through a subsidy), or by some combination of the factory's owner, workers and customers (through a tax).
Tradable permits. A very different approach to using economic incentives for environmental problems is to create a market in which the polluter must pay a price for the use of the formerly unpriced input (e.g., clean air). The usual method is for the government to issue permits for a fixed amount of gunk and to allow individuals and firms to buy and sell the permits. The fewer the permits the higher will be their market price. By controlling the quantity of permits the government can control the permit price so that the polluter has to pay the same amount per pound of gunk as it would under a tax or subsidy. Thus, all three methods can solve equivalently the problem of equating the costs and benefits of externalities, while using the polluter's self-interest to obtain the socially desirable level of control.
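
The equivalence of the three instruments can be checked with a small worked example. This sketch assumes a single polluter with a quadratic abatement cost, so that the marginal cost of the a-th unit of reduction is `slope * a`; the functional form, the function names, and the numbers are illustrative assumptions, not taken from the article.

```python
def abatement_under_tax(tax, slope, baseline):
    """Polluter minimizes slope*a**2/2 + tax*(baseline - a).

    The first-order condition is slope*a = tax: reduce pollution until
    the marginal cost of reduction equals the tax saved.
    """
    return min(tax / slope, baseline)


def abatement_under_subsidy(subsidy, slope, baseline):
    """Polluter maximizes subsidy*a - slope*a**2/2.

    The first-order condition is slope*a = subsidy, the same margin
    as under a tax of equal size.
    """
    return min(subsidy / slope, baseline)


def permit_price(cap, slope, baseline):
    """Price at which the polluter is just willing to hold `cap` permits.

    With emissions capped, abatement is baseline - cap, and the permit
    price equals the marginal abatement cost at that level.
    """
    return slope * max(baseline - cap, 0.0)
```

With a baseline of 100 pounds of gunk, a marginal-cost slope of 2, and a social cost of 20 per pound, a tax of 20 and a subsidy of 20 each induce 10 pounds of reduction. Issuing permits for the remaining 90 pounds yields a permit price of 20, so the polluter faces the same cost per pound under all three.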

Another well-known point from the pollution economics literature, this one quite relevant for the prevalent approach of trying to design technical fixes to prevent information pollution:

One advantage of using these economic incentives is that a given level of pollution control can be attained at the least cost. For example, with tradable permits, the permits will be most valuable to polluters with the highest control costs; they will purchase the permits, while those with lower control costs sell their permits and reduce their pollution. This result contrasts with the use of emission standards: all polluters must control to a given level, even though it will likely be cheaper to have some polluters control a bit more while others control an equal amount less.
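
A short numerical sketch shows the size of the gap. It assumes each polluter has a quadratic control cost `slope * a**2 / 2`, so a higher `slope` means costlier control; this functional form and the numbers are illustrative assumptions. Under permit trading, polluters end up with equal marginal costs, which is what minimizes the total cost of a given reduction.

```python
def uniform_standard(slopes, total_reduction):
    """Every polluter must make the same reduction."""
    share = total_reduction / len(slopes)
    return [share] * len(slopes)


def permit_trading(slopes, total_reduction):
    """Reductions after trading: marginal costs slope_i * a_i are equalized.

    Solving slope_i * a_i = price for all i with sum(a_i) = total_reduction
    gives each polluter a share proportional to 1 / slope_i.
    """
    inverse = [1.0 / s for s in slopes]
    return [total_reduction * w / sum(inverse) for w in inverse]


def total_cost(slopes, reductions):
    """Total control cost across polluters."""
    return sum(s * a * a / 2 for s, a in zip(slopes, reductions))
```

Take two polluters with slopes 1 and 3 and a required total reduction of 8. The uniform standard has each reduce by 4, for a total cost of 32. With trading, the low-cost polluter reduces by 6 and the high-cost polluter by 2, both at a marginal cost of 6, for a total cost of 24.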

In the article I also discussed several complications that change the way in which the above mechanisms work. These include uncertainties about the benefits or costs of control; market imperfections in the markets in which polluters operate; non-convexities caused by severe or irreversible harms. These all have analogues for information pollution as well. Of course, the discussion above is only in terms of the effectiveness of these methods for maximizing social efficiency; if the objective function puts weight on other factors as well, these approaches may work less well.

Posted by jmm at 11:30 PM | Comments (0) | Permalink »

Polluting user-contributed reviews

A recent First Monday article by David and Pinch (2006) documents an interesting case of book review pollution on Amazon. A user review of one book critically compared it to another. Immediately following, a "user" entered another review blatantly plagiarizing a favorable review of the first book, and further user reviews did additional plagiarizing.

When the author of the first book discovered the plagiarism, he notified Amazon, which at the time had a completely hands-off policy on user reviews, so it refused to intervene even for blatant plagiarism. (The policy since has changed.) Another example of the problem of keeping bad quality contributions out.

David and Pinch remind us that when an Amazon Canada programming glitch revealed reviewer identities,

a large number of authors had "gotten glowing testimonials from friends, husbands, wives, colleagues or paid professionals." A few had even 'reviewed' their own books, and, unsurprisingly, some had unfairly slurred the competition.

David and Pinch address the issue of review pollution at some length. First, they catalogue six discrete layers of reputation in the Amazon system, including user ratings of reviews by others, and a mechanism to report abuse. Then they conducted an analysis of 50,000 reviews of 10,000 books and CDs, in which they identified categories of review pollution automatically (using software algorithms).

They also make an interesting point about the arms-race limitations of technical pollution screens:

The sorts of practices we have documented in this paper could have been documented by Amazon.com themselves (and for all we know may have indeed been documented). Furthermore if we can write an algorithm to detect copying then it is possible for Amazon.com to go further and use such algorithms to alert users to copying and if necessary remove material. If Amazon.com were to write such an algorithm and, say, remove copied material, this will not be the end of the story. Users will adapt to the new feature and will no doubt try and find new ways to game the system.
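As an illustration of the kind of copy-detection algorithm the quote alludes to, here is a minimal sketch that flags pairs of reviews sharing many overlapping word sequences (Jaccard similarity on word "shingles"). This is not David and Pinch's method, which is not described here; the shingle size, the threshold, the sample reviews, and all function names are assumptions made for illustration.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Set of k-word sequences in the text, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles common to both sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_copies(reviews, threshold=0.5, k=3):
    """Index pairs of reviews whose shingle overlap is at least `threshold`."""
    sets = [shingles(r, k) for r in reviews]
    return [
        (i, j)
        for i, j in combinations(range(len(reviews)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]


# Made-up reviews: the second lightly edits the first.
reviews = [
    "A gripping story with vivid characters and a surprising ending.",
    "A gripping story with vivid characters and a truly surprising ending!",
    "Dull and far too long; I could not finish it.",
]

print(likely_copies(reviews))  # [(0, 1)]
```

A screen this simple also shows the arms-race problem: a copier who rewrites every third word would slip under the threshold, and the threshold would then have to change in response.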

Posted by jmm at 02:35 PM | Comments (0) | Permalink »

Spam economics: Private stamps vs. repudiable bond payments to recipients

After years during which everyone talked about economic incentives to better align sender and receiver interests in unsolicited email, we may finally be seeing the dawn of the incentive-centered design era for email.

AOL and Yahoo! this winter announced they were adopting the Goodmail system to create a special class of incoming mail: senders that paid the Goodmail fee per message would have their mail placed directly in user inboxes, with no server-side filtering or blocking by the ESP (email service provider, AOL and Yahoo! in this case). Mail without the Goodmail stamp will receive traditional treatment, being filtered and possibly placed in the user's spam folder.

A rather loud debate immediately followed, focused primarily on one concern: AOL and Yahoo! would tighten the filtering screws on unstamped email, eventually shoving so much of it into the spam folder that everyone would be "forced" to pay for the Goodmail stamp or likely have their mail discarded, unopened by users (or users would be forced to treat their spam folder as a regular inbox, and lose the benefits of the filtering). Nonprofits in particular howled because, they claimed, their mail is valuable, but they are too poor to pay for the stamps. (If members of non-profits aren't willing to pony up $0.25 per email in member fees, just how valuable are the millions of pieces of mail that non-profits want to send?)

But rather than get into that debate right now (see "Backlash to sender-pays email incentives"), I want to discuss the economics of two different but related approaches to using financial incentives to economically filter spam: the private stamp (Goodmail) approach, and the use of recipient-repudiable bonds ("stamps" vs. "bonds" for short).

The bond approach is similar to stamps, with critical differences. The sender pays for a (digitally-signed) stamp; mail with that stamp goes directly into the reader's inbox, unfiltered. However, after opening the message, the reader can either keep the stamp (push a button in the mail client to "deposit stamp"), or relinquish it back to the sender, which can be interpreted as a message that "I valued this mail, you can send more like it in the future."

How do the differences matter? First, an implementation issue: it is relatively easy for a third-party provider like Goodmail to implement a payment deposit system; it is not nearly so easy, at least right now, for individual email users to receive a micropayment attached to every email and deposit it. Email clients aren't programmed for this, and in any case, the necessary micropayments infrastructure just doesn't exist (yet) at that level of granularity.

Assuming that technical detail can be solved in the near future, how else are the two different? One of the most important differences is the very limited role for recipient preferences in the private stamp approach. A stamp of, say, $0.01, will discourage senders from sending email that is worth less than $0.01 to the sender. But the threshold is being set by the third party (Goodmail, perhaps together with an ESP like AOL), not by individual users, and thus does not directly reflect the value to the recipient of receiving unsolicited email (or not). Arguably, competition between ESPs would push the stamp price to about the right average level over time, but it would not reflect heterogeneity in user preferences.

A bond system could with little or no cost allow each user to set their own threshold for the required size of bond, thus allowing recipients to customize their own mail preferences.

Another problem with the stamp approach is that all mail that goes through this channel pays for a stamp. For mail that both sender and recipient agree is desirable, that incurs unnecessary expense. But perhaps more important, it will prevent some desirable mail from being sent. Suppose the stamp is $0.01, and a sender has mail that the sender values at only $0.005 if delivered, but that the recipient values at $0.02 if received. The sender won't be willing to buy the stamp, and the mail won't get sent. With a repudiable bond, however, the sender might send a trial message, and if the recipient repudiates the bond, the sender will know the recipient values the mail and will allow similar messages to arrive without a bond payment in the future.
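The numbers in this example can be worked through in a short sketch. It uses a deliberately simplified model that is my assumption for illustration: the recipient returns the bond whenever the mail is worth more to them than the bond, and the sender correctly anticipates that choice (which glosses over the trial-message step). The function names are hypothetical.

```python
def stamp_outcome(sender_value, stamp_price):
    """Stamp: the fee is paid for certain, so mail is sent only if it is
    worth at least the fee to the sender. Returns True if the mail is sent."""
    return sender_value >= stamp_price


def bond_outcome(sender_value, recipient_value, bond_size):
    """Repudiable bond under the simplifying assumptions stated above.

    Returns (sent, recipient_returns_bond).
    """
    # Recipient gives the bond back when the mail is worth more than the bond.
    recipient_returns = recipient_value > bond_size
    # The sender only loses the bond if the recipient keeps it.
    expected_cost = 0.0 if recipient_returns else bond_size
    sent = sender_value >= expected_cost
    return sent, recipient_returns


# The example from the text: stamp/bond of $0.01, sender values the mail
# at $0.005, recipient values it at $0.02.
print(stamp_outcome(0.005, 0.01))       # False: the mail is never sent
print(bond_outcome(0.005, 0.02, 0.01))  # (True, True): sent, bond returned

# Unwanted mail (worth nothing to the recipient) is still deterred.
print(bond_outcome(0.005, 0.0, 0.01))   # (False, False)
```

In this toy model the bond blocks the same unwanted mail a stamp would, while letting through mail that is worth little to the sender but a lot to the recipient.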

Why won't recipients always keep the bond payment? Well, first, this would just make the system work the same as stamps (except that users get the money, not a third party), so that's not a reason why bonds are worse. However, it also doesn't make sense in the example above. If I want to receive, say, an electronic catalog, but I keep the bond, then the sender may stop sending to me, and I lose out.

This is a very quick review of the two approaches, and yes, of course the issues can be more subtle. See Loder et al. for a scholarly discussion of the two.* Vanquish Labs, a vendor of a bond system, has an online article that critiques the Goodmail stamp approach (February 2006 : CertifiedMail = Certified Disaster).

*Thede Loder, Marshall Van Alstyne, and Rick Wash. "An economic solution to unsolicited communication". Advances in Economic Analysis and Policy, 6 (1), 2006.

Posted by jmm at 11:51 AM | Comments (0) | Permalink »

Keeping bad stuff out: Making a play on social news sites?

About a month ago, rumors were circulating that Google was about to acquire Sun Microsystems. The news got hot when blog stories claiming an acquisition was imminent were promoted to the front page on community/social news site Digg.com. It pretty quickly became clear that the rumors were largely unfounded. What hasn't been quickly resolved is whether or not someone tried to manipulate Digg, possibly to cash in on speculative trading in Google or Sun stock.

The basic idea is simple: get enough shill users to vote for a financially-significant rumor to promote it to the front page, thus automatically getting more widespread attention, and hope that the burst of attention causes a temporary stock price adjustment that can be exploited. (For example, in an acquisition the price of Sun would almost surely increase, and thus gullible readers might start buying it and bidding it up; the scam artist could purchase shares in advance to sell at the inflated price, or sell it short at the bubble price and collect when the price returns to normal.)

Digg claims that it almost surely was not manipulated, but it seems clear that such manipulation is possible in user-contributed content news sites. Recall how Rich Wiggins found that people could get flim-flam press releases fed into Google News (here and here), and how authors using pseudonyms have promoted their own books with favorable "reviews" on Amazon.com.

It appears that in the past Digg has been manipulated (though apparently as an experiment, not to manipulate stock prices).

Posted by jmm at 02:05 AM | Comments (0) | Permalink »

April 05, 2006

ET call ICD

Wired News: Cheaters Bow to Peer Pressure

This is an old (2001) story I recently heard about. At the time, the SETI@Home project (which distributes the search for extraterrestrial intelligence computations to volunteer machines around the world) was plagued by cheaters, including folks who hacked the client software to report that they were contributing far more CPU cycles than they actually were, apparently in order to get the reputation of being the top contributor in the search for ET. (Can you put that on a resume and get a job?)

Another user-contributed content (or effort, in this case) service that had trouble keeping the bad stuff out.

Posted by jmm at 10:38 PM | Permalink »

March 18, 2006

Local "keep the bad stuff out" problem

We locally had an annoying pollution experience yesterday. Our research group at UM runs an ICD wiki for sharing our research, announcements, &c. Access is pretty open, and sure enough, after about a year in operation, a splogger found us. He or she created an account and added spam links to about 40 pages in the wiki (invisible to us but visible to search engines, to increase the link rankings for the underlying spam sites). One of our grad students, Rick Wash, spent hours cleaning things up for us. What's the solution?...

We haven't thought about any incentive schemes to protect our wiki yet (time to start thinking!). The obvious technological solution is to limit editing access to accounts authorized by a moderator, but that is not a great solution: we have over 120 new master's students entering the program every year, and we want them to be able to participate, but we don't have an automated system in place to give them accounts, so either they get to create their own, or we have to install some more overhead.

We could use the human solution, as Wikipedia does: let anyone in, but keep a close eye on changes, clean them up and disable abusing accounts -- what Rick did this time. But we don't have a lot of hard-core users, and that could become quite a large burden on the few who have wiki-admin skills.

Just a mildly painful reminder that there's a reason for us to be researching these problems!

Posted by jmm at 12:37 PM | Comments (0) | Permalink »

March 15, 2006

i-Newswire is out, that's who

A couple of days after Rich Wiggins posted his blog story about the ability to place false news stories in Google News, CNN has picked up the story, and Google has now dropped i-Newswire as a source for Google News.

i-Newswire was a user-contributed content (UCC) service, and thus subject to the pollution problem I've been discussing (link and link). More precisely, i-Newswire is an un-moderated or un-edited UCC service (all press release newswires rely on user-contributed content, but most employ editors to decide whether press releases are legitimate).

Google News, on the other hand, is not a UCC service, and is edited: there is central control over which content feeds are included. So, in a crude way, Google can handle the pollution problem: if pollution is coming in through channel A, turn channel A off. Google News may be a case where a technological pollution prevention approach will work pretty well, obviating the need for an incentive system.

Posted by jmm at 10:25 PM | Comments (0) | Permalink »

March 14, 2006

Digg, Google News...User-contributed "news"

I'm developing an interest in the phenomenon of user-contributed content, and the two fundamental incentives problems that it faces: pollution (keeping the bad stuff out) and the private provision of public goods (inducing contributions of the good stuff). User-contributed "news" is one example to explore.

Digg.com is one currently hot user-contributed news site:

Digg is a technology news website that combines social bookmarking, blogging, RSS, and non-hierarchical editorial control. With digg, users submit stories for review, but rather than allow an editor to decide which stories go on the homepage, the users do.

Slashdot of course is the grande dame. Digg and Slashdot both rely on multiple techniques of community moderation to try to maintain the quality of content (keep out the pollution). For example, proposed stories for Digg are not promoted to the homepage until they have sufficient support from multiple users; and users can report bad entries (apparently to a team of human editors).

How effective (and socially costly) are these community moderation techniques? By now we've all heard about Wikipedia founder Jimmy Wales manipulating his own Wikipedia entry, which led to publicity about multiple members of Congress, etc., who have been doing the same thing.

And even if a site has an efficient moderation system to filter out pollution, there is still the problem of inducing people to volunteer time and effort to contribute to the public good by creating valuable content. Obviously, this can happen (see Slashdot, Wikipedia). But suppose you are designing a new user-contributed content service: how are you going to create a community of users, and how are you going to induce them to donate (high quality) content?

Apparently we can now start to count Google News as a site for user-contributed news.

Posted by jmm at 08:18 AM | Comments (0) | Permalink »

Spamming Google News: Who's in, who's out?

An old acquaintance of mine, Rich Wiggins, recently blogged about his discovery of how easy it is to insert content in Google News. He discovered this when he noticed regular press releases published in Google News that were a front for the musings of self-proclaimed "2008 Presidential contender" Daniel Imperato. Who?

Wiggins figured out how Imperato did it, and tested the method by publishing a press release (screen shot) about his thoughts while celebrating his 50th birthday in Florida. Sure enough, you can find this item by searching on "Rich Wiggins" in Google News.

This is (for now) a fun example of one of the two fundamental incentives problems for the important and fast-growing phenomenon of user-contributed content:

  1. How to keep the undesirable stuff out?
  2. How to induce people to contribute desirable stuff?

The first we can call the pollution problem, the second the private provision of public goods problem. Though Wiggins' example is funny, will we soon find Google News polluted beyond usefulness? (The decline of Usenet was largely due to spam pollution.)

Blogs, of course, are a major example of user-contributed content. At first glance, they don't suffer as much from the first problem: readers know that blogs are personal, unvetted opinion pages, and so they don't blindly rely on what is posted as truth. (Or do they?) But then there's the problem of splogging, which isn't really a problem for blogs as much as for search engines that are being tricked into directing searchers to fake blog pages that are in fact spam advertisements (a commercial variant on the older practice of Google bombing).

There is a lengthy and informative Wikipedia article that discusses the wide variety of pollution techniques (spamming) that have been developed for many different settings (besides email and blogs, also instant messaging, cell phones, online games, wikis, etc.), with an index to a family of detailed articles on each subtype.

Posted by jmm at 07:44 AM | Comments (0) | Permalink »