November 26, 2008
What is pollution, what is manipulation?
For some time, I've referred to a variety of user-contributed content activities as "pollution". Spam, for example. The typical user doesn't want to receive it. It pollutes the inbox.
But some behaviors that reduce the value of user-contributed content are commonly called "manipulation". One example is stuffing the ballot box in an online rating system such as Netflix, which might be done by, say, the producer of a movie. My colleagues, Resnick and Sami, have been publishing work on "manipulation-resistant" systems.
Is there a difference? In both cases, a user with a product to sell is submitting content that most users would agree (if they knew) has negative value. Why not call them both pollution? Both manipulation?
I think there is a difference, but it's more a matter of degree than an absolute one. The defining feature of pollution is that the polluter does not benefit from the pollution itself: the cost imposed on users is inadvertent. They are victims of a side effect. This is also known as an externality problem: the producer creates and benefits from creating X; X imposes a cost on others, but the producer's benefit is not directly related to the cost imposed on others (the producer is not generating pollution because she gets satisfaction from making others suffer).
Manipulation costs are not externalities: the benefit to the producer is directly related to the cost experienced by the others. In the Netflix example, the cost to users is that they pay for and watch movies that are not as well suited to their tastes as the ones they would otherwise have chosen. But that is precisely the outcome that the manipulative content producer wanted to achieve. The manipulator intends to get the others to do or experience something they would rather not.
I said this was a matter of degree. In the spam example, some of the benefit to the producer (sometimes, perhaps most of it) is that she convinces consumers to purchase, even though ex ante those consumers might have said they would rather not receive the spam advertisements (that is, they consider them polluting). Thus, part of the cost of spam might be a manipulation cost. The producer doesn't care about the many users who ignore the spam but suffer from it, but she does care about the effect she is having on those she manipulates into purchasing.
Why does it matter what we call them? It may not matter much, but labels are useful for guiding our understanding and our design efforts. If I recognize something as having the features of a pollution problem, I immediately know that I can refer to the literature on pollution problems to help me characterize it, and to the literature on solving pollution problems to help generate good designs. Labels are shorthand for abstract models, compactly representing the essential features of the problem and its context.
 Eric Friedman, Paul Resnick, and Rahul Sami (2007). "Manipulation-Resistant Reputation Systems", Ch. 27 in Algorithmic Game Theory (N. Nisan, T. Roughgarden, E. Tardos, V. Vazirani, editors), Cambridge University Press.
 Paul Resnick and Rahul Sami (2007). "The Influence-Limiter: Provably Manipulation-Resistant Recommender Systems", Proceedings of the ACM Recommender Systems Conference.
Posted by jmm at November 26, 2008 10:38 AM