November 26, 2008
What is pollution, what is manipulation?
For some time, I've referred to a variety of user-contributed content activities as "pollution". Spam, for example. The typical user doesn't want to receive it. It pollutes the inbox.
But some behaviors that reduce the value of user-contributed content are commonly called "manipulation". For example, stuffing the ballot box in an online rating system, such as Netflix, which might be done by, say, the producer of a movie. My colleagues, Resnick and Sami, have been publishing work on "manipulation-resistant" systems.
Is there a difference? In both cases, a user with a product to sell is submitting content that most users would agree (if they knew) has negative value. Why not call them both pollution? Both manipulation?
I think there is a difference, but it's more a matter of degree than absolute. The defining feature of pollution is that the polluter does not benefit from the pollution itself: the pollution cost imposed on users is inadvertent. They are victims of a side effect. This is also known as an externality problem: the producer creates and benefits from creating X; X imposes a cost on others, but the producer's benefit is not directly related to the cost imposed on others (the producer is not generating pollution because she gets satisfaction from making others suffer).
Manipulation costs are not externalities: the benefit to the producer is directly related to the cost experienced by the others. In the Netflix example, the cost to users is that they pay for and watch movies that are not as well suited to their tastes as the ones they would otherwise have chosen. But that is precisely the outcome that the manipulative content producer wanted to achieve. The manipulator intends to get the others to do or experience something they would rather not.
I said this was a matter of degree. In the spam example, some of the benefit to the producer (sometimes, perhaps most) is that she convinces consumers to purchase, even though ex ante those consumers might have said they would rather not receive the spam advertisements (they consider them polluting). Thus, part of the cost of spam might be a manipulation cost. The producer doesn't care about the many users who ignore the spam but suffer from it, but does care about the effect she is having on those she manipulates into purchasing.
Why does it matter what we call them? It may not matter much, but labels are useful for guiding our understanding and our design efforts. If I recognize something as having the features of a pollution problem, I immediately know that I can refer to the literature on pollution problems to help me characterize it, and the literature on solving pollution problems to help generate good designs. Labels are shorthand for abstract models, compactly representing the essential features of the problem and its context.
Eric Friedman, Paul Resnick, and Rahul Sami (2007). "Manipulation-Resistant Reputation Systems", Ch. 27 in Algorithmic Game Theory (N. Nisan, T. Roughgarden, E. Tardos, V. Vazirani, editors), Cambridge University Press.
Paul Resnick and Rahul Sami (2007). "The Influence-Limiter: Provably Manipulation-Resistant Recommender Systems", Proceedings of the ACM Recommender Systems Conference.
New UCC opportunity, new opportunity for manipulation and spam
Google has made available a striking set of new features for search, which it calls SearchWiki. If you are logged in to a Google account when you search, you can add or delete results (changes you will see again if you repeat that search), re-order the results, and post comments (which can be viewed by others).
But the comments are user-contributed content: this is a relatively open publishing platform. If others search on the same keyword(s) and select "view comments" they will see what you entered, which might be advertising, political speech, whatever. As Lauren Weinstein points out, this is an obvious opportunity for pollution, and (to a lesser extent in my humble opinion, because there is no straightforward way to affect the behavior of other users) manipulation. In fact, he finds that comment wars and nastiness started within hours of SearchWiki's availability:
It seem inevitable that popular search results in particular will
quickly become laden with all manner of "dueling comments" which can
quickly descend into nastiness and even potentially libel. In fact,
a quick survey of some obvious search queries shows that in the few
hours that SearchWiki has been generally available, this pattern is
*already* beginning to become established. It doesn't take a
lot of imagination to visualize the scale of what could happen with
the search results for anybody or anything who is the least bit
Lauren even suggests that lawsuits are likely by site owners whose links in Google become polluted, presumably claiming they have some sort of property right in clean display of their beachfront URL.
November 10, 2008
Don't worry about contributed content: Wikipedia has figured it all out!
When I explain to people the fundamental incentive-centered design (ICD) problem of motivating users to contribute to an information resource built on user-contributed content, I often use Wikipedia as a familiar example: "Why do so many people voluntarily donate so much time and effort to research, write content, and copy edit and correct the content of others? That's a lot of unpaid work!"
Some people ask what the problem is, and why this needs academic research: "Wikipedia is doing great! They don't need to come up with clever incentives to motivate contribution." My reply: "Yes (maybe), but the point is, how do we create the next Wikipedia?" That is, how do we create another fabulously successful and valuable information resource dependent on all that volunteer labor? What is the special sauce? Is it replicable?
Simson Garfinkel has an article in the current Technology Review that, indirectly, makes the point nicely. Yes, Wikipedia is fabulously successful...in some ways. But certainly not everyone thinks Wikipedia is the final word in online reference, such that we don't need to create any other reference resources. Simson focuses on "Wikipedia and the Meaning of Truth". Wikipedia's primary rule for admissible content is not that it be verifiably true (which would be difficult to enforce, to say the least!), but that it be verifiably published somewhere "reliable".
That not everything in Wikipedia is correct is well-known, and not surprising. There are enthusiastic debates about whether it is as accurate as traditional encyclopedias, like Britannica. And so forth. The point is: many people want other types of reference resources as an alternative, or at least as a complement, to Wikipedia. And thus the challenge: to build such a resource with user-contributed content, we need to motivate the users.
Some are trying to create more accurate, reliable alternatives, and they are not nearly as successful in getting contribution as Wikipedia has been. One of the interesting examples is Google's Knol, which is trying to establish greater reliability by having each topic "owned" by its original author (who may then permit and seek contributions from other users).
Do you think Wikipedia is the final word, forever, in online reference? If not, perhaps you should be wondering how to motivate users to contribute to other resources, and thinking about whether motivation is trivial now that Wikipedia has "figured it out".