June 06, 2013

Everything can -- and will -- be manipulated

Well, not "everything". But every measure on which decisions of value depend (e.g., book purchases, dating opportunities, or tenure) can and will be manipulated.

And if the measure depends on user-contributed content distributed on an open platform, the manipulation often will be easy and low cost, and thus we should expect to see it happen a lot. This is a big problem for "big data" applications.

This point has been the theme of many posts I've made here. Today, a new example: citations of scholarly work. One of the standard, often highly valued (as in, makes a real difference to tenure decisions, salary increases, and outside job offers) measures of the impact of a scholar's work is how often it is cited in the published work of other scholars. Thomson ISI has been providing citation indices for many years. ISI is not so easy to manipulate because -- though it depends on user-contributed content (articles by one scholar that cite the work of another) -- that content is distributed on closed platforms: ISI only indexes citations from a set of published journals whose editorial boards protect their reputation and brand by screening what they publish.

But over the past several years, scholars have increasingly relied on Google Scholar (and sometimes Microsoft Academic) to count citations. Google Scholar indexes citations from pretty much anything that appears to be a scholarly article and is reachable by the Google spiders crawling the open web. So, for example, it includes citations in self-published articles, or e-prints of articles published elsewhere. Thus, Google Scholar citation counts depend on user-contributed content distributed on an open platform (the open web).

And, lo and behold, it's relatively easy to manipulate such citation counts, as demonstrated by a recent scholarly paper that did so: Delgado Lopez-Cozar, Emilio; Robinson-Garcia, Nicolas; Torres Salinas, Daniel (2012). Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting. EC3 Working Papers 6: 29 May, 2012, available at http://arxiv.org/abs/1212.0638v2.

Their method was simple: they created some fake papers that cited other papers, and published the fake papers on the Web. Google's spider dutifully found them and increased the citation counts for the real papers that these fake papers "cited".
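The mechanics are worth making concrete. Here is a toy simulation (my own illustration, not Google's actual pipeline; the document ids are invented): a naive citation counter that tallies references from whatever documents a crawler finds will credit fake documents exactly as it credits real ones.

```python
# Toy model of an open-web citation counter. A closed index (like ISI's)
# would only count citations from the vetted corpus; an open crawler
# counts citations from anything it finds, including attacker-created pages.

from collections import Counter

def count_citations(documents):
    """Tally citations by scanning each crawled document's reference list."""
    counts = Counter()
    for doc in documents:
        for cited in doc["cites"]:
            counts[cited] += 1
    return counts

# Legitimate corpus: one real paper cites another.
corpus = [
    {"id": "real-paper-A", "cites": ["real-paper-B"]},
]

before = count_citations(corpus)

# An attacker posts six fake papers on the open web, each citing the
# paper whose count is to be inflated. The crawler cannot tell them apart.
fakes = [{"id": f"fake-{i}", "cites": ["real-paper-B"]} for i in range(6)]

after = count_citations(corpus + fakes)

print(before["real-paper-B"])  # 1
print(after["real-paper-B"])   # 7
```

The point of the sketch is how cheap the attack is: the marginal cost of one more fake citing document is near zero, while the counter's output moves one-for-one.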

The lesson is simple: for every measure that depends on user-contributed content on an open platform, if valuable decisions depend on it, we should assume that it is vulnerable to manipulation. This is a sad and ugly fact about a lot of new opportunities for measurement ("big data"), and one that we must start to address. The economics are unavoidable: the cost of manipulation is low, so if there is much value to doing so, it will be manipulated. We have to think about ways to increase the cost of manipulating, if we don't want to lose the value of the data.

Posted by jmm at 11:09 AM | Comments (1) | Permalink »

May 27, 2013

Mining social data -- what is revealed?

Here is a recent article about high school students manipulating their Facebook presence to fool college admissions officers. Not terribly surprising: the content is (largely) created and controlled by the target of the background searches (by admissions, prospective employers, prospective dating partners etc) so it's easy to manipulate. We've been seeing this sort of manipulation since the early days of user-contributed content.

People mining user-contributed content should be giving careful thought to this. Social scientists like it when they can observe behavior, because it often reveals something more authentic than simply asking someone a question (about what they like, or what they would have done in a hypothetical situation, etc.). Economists, for example, are thrilled when they get to observe "revealed preference": the choices people make when faced with a true resource allocation problem. It could be that I purchased A instead of B to fool an observer, but there is a cost to my doing so (I bought and paid for a product that I didn't want), and as long as the costs are sufficiently salient, it is more likely that we are observing preferences untainted by manipulation.

There are costs to manipulating user-contributed content, like Facebook profiles, of course: some amount of time, at the least, and probably some reduced value from the service (for example, students say that during college application season they hide their "regular" Facebook profile, and create a dummy in which they talk about all of the community service they are doing, and how they love bunnies and want to solve world hunger: all fine, but they are giving up the other uses of Facebook that they normally prefer). But costs of manipulating user-contributed content often may be low, and thus we shouldn't be surprised if there is substantial manipulation in the data, especially if the users have reason to think they are being observed in a way that will affect an outcome they care about (like college admissions).

Put another way, the way people portray themselves online is behavior and so reveals something, but it may not reveal what the data miner thinks it does.

Posted by jmm at 02:39 PM | Comments (0) | Permalink »

April 25, 2011

User maybe-not-contributed content

Curiouser and curiouser.

Last week Jonathan Tasini, a freelance writer, filed a lawsuit on behalf of himself and other bloggers who contributed -- well, maybe not contributed -- their work to the Huffington Post site. His complaint is that Huffington Post sold itself to AOL for $315 million and did not share any of the gain with the volunteer -- well, maybe not volunteer -- writers.

The lawsuit complaint makes fun reading, as these things go.

The main gripe (other than class warfare: it's unfair!) seems to be that HuffPo "lured" (paragraph 2) writers to contribute their work not for payment but for "exposure (visibility, promotion and distribution)", yet did not provide "a real and accurate measure of exposure" (paragraph 103). However, as far as I can see, there is no claim that HuffPo ever told its writers that HuffPo would not be earning revenue, nor a promise that it would provide any page view or other web analytic data.

How deceived was Tasini? He's no innocent. In fact, he volunteers (oops! there's that word again) in the complaint that he runs his own web site, that he posts articles to it written by volunteers, and that he earned revenue from the web site (paragraph 15). And he was the lead plaintiff in the famous (successful) lawsuit against the New York Times when it tried to resell freelance writer content to digital outlets (not authorized in its original contracts with the writers). And, gosh, though he was "lured" into writing for the HuffPo, and was "deceived" into thinking it was a "free forum for ideas", he didn't notice that they sold ads and were making money during the several years in which he contributed 216 articles to the site. That's a pretty powerful fog of deception! Maybe Arianna Huffington should work for the CIA.

Posted by jmm at 09:44 AM | Comments (1) | Permalink »

February 12, 2009

Work for the Patent Office, for free!

The Peer-to-Patent system created by Beth Noveck's group at NYU Law School and being piloted by the U.S. Patent Office has gotten a fair bit of attention. The basic idea is to gather user-contributed content from experts who can help patent examiners figure out whether a proposed invention is novel (no prior art). Anyone can submit comments on the posted patent proposals, and in particular can cite to evidence of prior art (which generally leads, if valid, to denial of the patent application). The purpose is to speed up patent reviews, and in particular to help prevent the granting of invalid patents, because it is often costly, time-consuming and chilling to later innovation to fight and prove a granted patent is invalid.

Andy Oram wrote an editorial in the Feb 2008 Communications of the ACM urging computer scientists to participate (viewing article may require subscription). He explained the system, and why it would be good for innovation for experts to donate their time to read and comment on patent applications.

Why would experts -- whose time is somewhat valuable -- want to do this? Andy argues that the primary reason is public service: donate to create a public good (a better software patent system) for all. There are lots of ideas for things that would be "good for all" that require volunteer donations of time, effort, money. It's actually not a given that such public goods are a good idea: the value of a public good does not always or automatically exceed the cost of the time or other resources donated by the people who created it. The experts whom Andy seeks to contribute to Peer-to-Patent are highly trained people whose time is generally valued quite highly. In any case, if P-to-P depends on volunteer contributions by experts, how likely is it to succeed? These are people who already feel deluged by requests to volunteer their time to referee conference and journal articles, advise students on projects, advise government, serve on department and university committees, serve on professional organization committees, edit journals, etc., etc. I know few serious, successful academics who work less than 50 or 60 hours a week already.

Andy also suggests another reason to volunteer time for Peer-to-Patent: the bad patent you block may save your startup company! Now we're talking... a monetary incentive to "volunteer" time. But this is a bit problematic too: it points out a strategic concern with P-to-P. Potential competitors, or entrepreneurs who at least want to use the disclosed invention, have an interest in trying to block patent applications, and may try to do so even if the invention is legitimate. They can flood the Patent Office with all sorts of "prior art", which may not be valid, but now the patent examiners will have more work to do. And just as patent examiners may conclude incorrectly that a patent application is valid, so may they conclude incorrectly that one is invalid. It's not prima facie obvious, especially given that those most motivated to "donate" time and effort are those who themselves have a financial stake in the outcome, that user-contributed content in this setting will be a good thing, on balance.

Posted by jmm at 05:29 PM | Comments (2) | Permalink »

November 26, 2008

New UCC opportunity, new opportunity for manipulation and spam

Google has made available a striking set of new features for search, which it calls SearchWiki. If you are logged in to a Google account, when you search you will have the ability to add or delete results you get if you search that page again, re-order the results, and post comments (which can be viewed by others).

But the comments are user-contributed content: this is a relatively open publishing platform. If others search on the same keyword(s) and select "view comments" they will see what you entered. Which might be advertising, political speech, whatever. As Lauren Weinstein points out, this is an obvious opportunity for pollution, and (to a lesser extent in my humble opinion, because there is no straightforward way to affect the behavior of other users) manipulation. In fact, he finds that comment wars and nastiness started within hours of SearchWiki's availability:

It seem inevitable that popular search results in particular will
quickly become laden with all manner of "dueling comments" which can
quickly descend into nastiness and even potentially libel. In fact,
a quick survey of some obvious search queries shows that in the few
hours that SearchWiki has been generally available, this pattern is
*already* beginning to become established. It doesn't take a
lot of imagination to visualize the scale of what could happen with
the search results for anybody or anything who is the least bit
controversial.

Lauren even suggests that lawsuits are likely by site owners whose links in Google become polluted, presumably claiming they have some sort of property right in clean display of their beachfront URL.

Posted by jmm at 10:27 AM | Comments (0) | Permalink »

November 10, 2008

Don't worry about contributed content: Wikipedia has figured it all out!

When I explain to people the fundamental ICD problem of motivating users to contribute content to a user-contributed content information resource, I often use Wikipedia as a familiar example: "Why do so many people voluntarily donate so much time and effort to research, write content, and copy edit and correct the content of others? That's a lot of unpaid work!"

Some people ask what the problem is, and why this needs academic research: "Wikipedia is doing great! They don't need to come up with clever incentives to motivate contribution." My reply: "Yes (maybe), but the point is, how do we create the next Wikipedia" (that is, another fabulously successful and valuable information resource dependent on all that volunteer labor)? What is the special sauce? Is it replicable?

Simson Garfinkel has an article in the current Technology Review that, indirectly, makes the point nicely. Yes, Wikipedia is fabulously successful... in some ways. But certainly not everyone thinks Wikipedia is the final word in online reference, such that we don't need to create any other reference resources. Simson focuses on "Wikipedia and the Meaning of Truth". Wikipedia's primary rule for admissible content is not that it be verifiably true (which would be difficult to enforce, to say the least!), but that it be verifiably published somewhere "reliable".

That not everything in Wikipedia is correct is well-known, and not surprising. There are enthusiastic debates about whether it is as accurate as traditional encyclopedias, like Britannica. And so forth. The point is: many people want other types of reference resources as an alternative, or at least as a complement to Wikipedia. And thus the question: to build such a resource with user-contributed content, we need to motivate the users.

Some are trying to create more accurate, reliable alternatives, and they are not nearly as successful in getting contribution as Wikipedia has been. One of the interesting examples is Google's Knol, which is trying to establish greater reliability by having each topic "owned" by its original author (who may then permit and seek contributions from other users).

Do you think Wikipedia is the final word, forever, in online reference? If not, perhaps you should be wondering how to motivate users to contribute to other resources, and thinking about whether motivation is trivial now that Wikipedia has "figured it out".

Posted by jmm at 12:23 AM | Comments (0) | Permalink »

October 12, 2008

Why do people write Amazon book reviews?

As I've given talks and written the past couple of years about the motivation mysteries surrounding user-contributed content sites, I generally mention Amazon book reviews as a prominent example. It is not uncommon for over 100 people to review a popular book. And the top 10 reviewers (as of today) have each written more than 1600 reviews (leader Harriet Klausner is about to pass 17,500!).

Why? Not only is that a lot of time (allegedly) reading, but it's a lot of time writing...for the economic benefit of Amazon. What do reviewers get out of it?

One explanation for open source software contributions is that new programmers get professional experience on a team software engineering project, and their contributions are publicly documented so they can show them to potential employers. That might explain some reviewers on Amazon: they can show their reviews to an employer, and users rate them so they can show their scores too. But how many jobs are out there for book reviewers (and what about those with massive output who remain "amateur")?

Slate published Garth Hallberg's article in January 2008 (yes, I'm a bit behind in posting things to this blog!) that suggests the amateur reviewers may be motivated the old-fashioned way: through extrinsic, direct benefits. For example, apparently publishers send free copies of their books to prolific reviewers, so people who do want to read a lot get a lot of in-kind compensation. Grady Harp (#6) said he is "inundated". Amazon has extended this form of compensation by creating its Vine program, in which it selects successful reviewers and gives them free products from across its line of goods (electronics, appliances, etc.), as long as they write reviews (Amazon claims it does not influence opinions or modify or edit reviews).

(Thanks to Rick Wash for pointing me to the Hallberg article.)

Posted by jmm at 03:35 PM | Comments (0) | Permalink »

August 23, 2008

Good stuff in, bad stuff out

A fun ad from IBM that makes the point... (Thanks to Mark McCabe)

Posted by jmm at 12:07 AM | Comments (0) | Permalink »

July 25, 2008

You think you bought the music?

(This is not really an incentive design entry, just information economics more broadly. But too interesting to pass up.)

Yahoo! Music store announced yesterday it would be closing this fall. All that music you bought (well, not many people actually bought from Yahoo! Music, but still)? They are taking down the DRM servers in September, and your computer will not be able to "phone home" to get the key. The only solution: burn to CD (which of course, made DRM pretty ineffective in the first place). Apparently the same problem occurred when Microsoft and Sony announced the shuttering of their online music stores.

Conventional notions of "owning" property generally involve control over the use of that property in perpetuity (including transfer of ownership). When there are significant use restrictions and rights retained by the provider, it's licensing, not buying. This has been drummed into us over the years with software licenses (you can't take a copy of Windows off your old machine and install it on your new machine, for example). With music, I think the general sense is that we are buying it, not licensing it, however. Be that as it may, DRM imposes licensing-like restrictions, and apparently one of them is "you may not be able to listen to this music if we decide to shut down our service in the future."

Note to self: Finish burning backup CD copies of all of my iTunes music!

Posted by jmm at 07:51 AM | Comments (0) | Permalink »

April 12, 2008

Pollution as revenge

One of my students alerted me to a recent dramatic episode. Author and psychologist Cooper Lawrence appeared on a Fox News segment and made some apparently false statements about the Xbox game "Mass Effect", which she admitted she had never seen or played. Irate gamers shortly thereafter started posting (to Amazon) one-star (lowest possible score) reviews of her recent book that she was plugging on Fox News. Within a day or so, there were about 400 one-star reviews, and only a handful any better.

Some of the reviewers acknowledged they had not read or even looked at the book (arguing they shouldn't have to since she reviewed a game without looking at it). Many explicitly criticized her for what she said about the game, without actually saying anything about her book.

When alerted, Amazon apparently deleted most of the reviews. Its strategy apparently was to delete reviews that mentioned the name of the game, or video games at all (the book has nothing to do with video games). With this somewhat conservative strategy, the reviews remaining (68 at the moment) are still lopsidedly negative (57 one-star, 8 two-star, 3 five-star), more than I've ever noticed for any somewhat serious book, though there's no obvious way to rule these out as legitimate reviews. (I read several and they do seem to address the content of the book, at least superficially.)

Aside from being a striking and different example of book review pollution (past examples I've noted have been about favorable reviews written by friends and authors themselves), I think this story highlights troubling issues. The gamers have, quite possibly, intentionally damaged Lawrence's business prospects: her sales likely will be lower (I know that I pay attention to review scores when I'm choosing books to buy). Of course, she arguably damaged the sales of "Mass Effect", too. Arguably, her harm was unintentional and careless (negligent rather than malicious). But she presumably is earning money by promoting herself and her writing by appearing on TV shows: is it a reasonable social response to discipline her for negligence? (And the reviewers who have more or less written "she speaks about things she doesn't know; don't trust her as an author" may have a reasonable point: so-called "public intellectuals" probably should be guarding their credibility in every public venue if they want people to pay them for their ideas.)

I also find it disturbing, as a consumer of book reviews, but not video games, that reviews might be revenge-polluted. Though this may discipline authors in a way that benefits gamers, is it right for them to disadvantage book readers?

I wonder how long it will be (if it hasn't already happened) before an author or publisher sues Amazon for providing a nearly open-access platform for detractors to attack a book (or CD, etc.). I don't know the law in this area well enough to judge whether Amazon is liable (after all, arguably she could sue the individual reviewers for some sort of tortious interference with her business prospects), but given the frequency of contributory liability claims in other domains (such as Napster and Grokster facilitating the downloading of copyrighted materials), it seems like some lawyer will try to make the case one of these days. After all, Amazon provides the opportunity for readers to post reviews in order to advance its own business interests.

Some significant risk of contributory liability could be hugely important for the problem of screening pollution in user-contributed content. If you read some of the reviews still on Amazon's site in this example, you'll see that it would not be easy to decide which of them were "illegitimate" and delete all of those. And what kind of credibility would the review service have if publishers made a habit of deciding (behind closed doors) which too-negative reviews to delete, particularly en masse? I think Amazon has done a great job of making it clear that it permits both positive and negative reviews and doesn't over-select the positive ones to display, which was certainly a concern I had when it first started posting reviews. But if authors and publishers can hold it liable when "revenge" reviews appear, I suspect it (and similar sites) will have to shut down reviewing altogether.

(Thanks to Sarvagya Kochak.)

Posted by jmm at 01:42 PM | Comments (0) | Permalink »

March 29, 2008

Keeping the good stuff out at Yahoo! Answers

This is, I think, an amusing and instructive tale. I'm a bit sorry to be telling it, because I have a lot of friends at Yahoo! (especially in the Research division), and I respect the organization. The point is not to criticize Yahoo! Answers, however: keeping pollution out is a hard problem for user-contributed content information services, and that their system is imperfect is a matter for sympathy, not scorn.

While preparing for my recent presentation at Yahoo! Research, I wondered whether Yahoo! Mail was still using the Goodmail spam-reduction system (which is based on monetary incentives). I couldn't find the answer with a quick Google search, nor by searching the Goodmail and Yahoo! corporate web sites (Goodmail claims that Yahoo! is a current client, but there was no information about whether Yahoo! is actually using the service, or what impact it is having).

So, I thought, this is a great chance to give Yahoo! Answers a try. I realize the question answerers are not generally Yahoo! employees, but I figured some knowledgeable people might notice the question. Here is my question, in full:

Is Yahoo! Mail actually using Goodmail's Certified Email? In 2005 Yahoo!, AOL and Goodmail announced that the former two had adopted Goodmail's "Certified Email" system to allow large senders to buy "stamps" to certify their mail (see e.g., http://tinyurl.com/2atncr). The Goodmail home page currently states that this system is available at Yahoo!. Yet I can find nothing about it searching Yahoo!Mail Help, etc. My question: Is the system actually being used at Yahoo!Mail? Bonus: Any articles, reports, etc. about its success or impacts on user email experience?

A day later I received the following "Violation Notice" from Yahoo! Answers:

You have posted content to Yahoo! Answers in violation of our Community Guidelines or Terms of Service. As a result, your content has been deleted. Community Guidelines help to keep Yahoo! Answers a safe and useful community, so we appreciate your consideration of its rules.

So, what is objectionable about my question? It is not profane or a rant. It is precisely stated (though compound), and I provided background context to aid answerers (and so they knew what I already knew).

I dutifully went and read the Community Guidelines (CG) and the Terms of Service (TOS), and I could not figure out what I had violated. I had heard elsewhere that some people do not like TinyURLs because it is not clear where you are being redirected, and thus they might be used to maliciously direct traffic. But I saw nothing in the CG or TOS that prohibited URLs in general, or TinyURLs specifically.

So I contacted the link they provided to appeal the deletion. A few days later I received a reply that cut-and-pasted the information from the Yahoo! Answers help page explaining why content is deleted. This merely repeated what I had been told in the first message (since none of the other categories applied): my content was in violation of the CG or TOS. But no information was provided (second time) on how the content violated these rules.

Another address was provided to appeal the decision, so I wrote a detailed message to that address, explaining my question, and my efforts to figure out what I was violating. A few days later, I got my third email from Yahoo! Answers:

We have reviewed your appeal request. Upon review we found that your content was indeed in violation of the Yahoo! Answers Community Guidelines, Yahoo! Community Guidelines or the Yahoo! Terms of Service. As a result, your content will remain removed from Yahoo! Answers.

Well... Apparently it's clear to others that my message violates the CG or the TOS, but no one wants to tell me what the violation actually is. Three answers, all three with no specific explanation. Starting to feel like I'm a character in a Kafka novel.

At this point, I laughed and gave up (it was time for me to travel to Yahoo! to give my -- apparently dangerous and community-guideline-violating -- presentation anyway).

I have to believe that there is something about the use of a URL, a TinyURL, or the content to which I pointed that is a violation. I've looked, and found many answers that post URLs (not surprisingly) to provide people with further information. Perhaps the problem is that I was linking to a Goodmail press release on their web site, and they have a copyright notice on that page? But does Yahoo! really think providing a URL is "otherwise make available any Content that infringes any patent, trademark, trade secret, copyright" (from the TOS)? Isn't that what Yahoo's search engine does all the time?

End of story.

Moral? Yahoo! Answers is a user-contributed content platform. Like most, that means it is fundamentally an open-access publishing platform. There will be people who want to publish content that is outside the host's desired content scope. How to keep out the pollution? Yahoo! uses a well-understood, expensive method to screen: labor. People read the posted questions and make determinations about acceptability. But, as with any screen, there are Type I (false positive) and Type II (false negative) errors. Screening polluting content is hard.
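The screening trade-off can be made concrete with a toy model (my own sketch; the suspicion scores and threshold are invented, and Yahoo!'s actual screen is human judgment, not a numeric score). Any screen that is strict enough to catch subtle pollution will also delete some good contributions, and vice versa:

```python
# Toy screening model: each post gets a "suspicion score" in [0, 1];
# posts at or above a threshold are deleted. Tightening the threshold
# trades false positives (good posts deleted) against false negatives
# (pollution let through).

posts = [
    # (suspicion_score, is_actually_pollution)
    (0.90, True),   # obvious spam
    (0.60, True),   # subtle spam
    (0.55, False),  # legitimate question that merely contains a URL
    (0.10, False),  # clearly fine
]

def screen(posts, threshold):
    """Return (false_positives, false_negatives) for a given threshold."""
    false_positives = sum(1 for s, bad in posts if s >= threshold and not bad)
    false_negatives = sum(1 for s, bad in posts if s < threshold and bad)
    return false_positives, false_negatives

print(screen(posts, 0.5))  # strict: deletes the good URL question -> (1, 0)
print(screen(posts, 0.7))  # lenient: lets the subtle spam through -> (0, 1)
```

My reference question was, in this framing, the post at 0.55: flagged by a strict screen even though it was a "good" contribution.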

(My question probably does violate something, but surely the spirit of my question does not. I had a standard, factual, reference question, ironically, to learn a fact that I wanted to use in a presentation to Yahoo! Research. A bit more clarity about what I was violating and I would have contributed desirable content to Yahoo! Answers. Instead, a "good" contributor was kept out.)

Posted by jmm at 10:19 AM | Comments (5) | Permalink »

Presentation at Yahoo! Research on user-contributed content

Yahoo! Research invited me to speak in their "Big Thinkers" series at the Santa Clara campus on 12 March 2008. My talk was "Incentive-centered design for user-contributed content: Getting the good stuff in, Keeping the bad stuff out."

My hosts wrote a summary of the talk (that is a bit incorrect in places and skips some of the main points, but is reasonably good), and posted a video they took of the talk. The video, unfortunately, focuses mostly on me without my visual presentation, panning only occasionally to show a handful of the 140 or so illustrations I used. The talk is, I think, much more effective with the visual component. (In particular, it reduces the impact of the amount of time I spend glancing down to check my speaker notes!)

In the talk I present a three-part story: UCC problems are unavoidably ICD problems; ICD offers a principled approach to design; and ICD works in practical settings. I described three main incentives challenges for UCC design: getting people to contribute; motivating quality and variety of contributions; and discouraging "polluters" from using the UCC platform as an opportunity to publish off-topic content (such as commercial ads, or spam). I illustrated with a number of examples in the wild, and a number of emerging research projects on which my students and I are working.

Posted by jmm at 10:02 AM | Comments (0) | Permalink »

March 03, 2008

UCC incentives the old-fashioned way

Ben Kaufman announced Kluster at TED 2008. This is a business through which businesses can solicit user-contributed content: innovative technology or product ideas, business solutions, etc. Why would anyone give a for-profit company good innovation ideas? For a cash incentive... Businesses post challenges with a cash bonus, and Kluster has a scheme for paying out that bonus to people whose ideas are successful. (It also runs a prediction market on the side for wagers on which of the proposed ideas will succeed.) No volunteers here: this UCC is compensated in the traditional form of tournament prizes.

Two similar businesses, at least, are already operating: InnoCentive and Cambrian House.

Think you're smart, but don't have time or capital to turn your ideas into businesses? Go sell your ideas online!

(Based on reporting in Putting Innovation in the Hands of a Crowd - New York Times)

Posted by jmm at 01:27 AM | Permalink »

March 02, 2008

Looking for (well-paid, highly-trained, very busy) volunteers

The Peer to Patent project is one of my favorite recent examples of a user-contributed content (UCC) project, not because it has been very successful (yet), but because it demonstrates the surprising and important ways that UCC may be used to benefit society. It's not all Wikipedia and social networking!

Peer to Patent is a project started by Prof. Beth Noveck and her Do Tank group at NYU Law School. The US Patent Office adopted it for a one-year pilot starting 15 June 2007. It is a system to post patent applications for public comment, in particular seeking suggestions about possible prior art, to assist USPTO examiners in determining whether a patent should be granted. It was motivated by a widely held sense, particularly in the area of software and business process patents, that the USPTO has been overwhelmed by the number of applications and the advances in technology in recent years, and that many bad patents have been granted which can have the effect of stifling new innovation. During the first six months of the pilot, over 1800 people registered to participate, and over 150 prior art references were submitted on the 24 patent applications that can be reviewed through this system.

In the February 2008 issue of the Communications of the ACM, Andy Oram published a column about the project in which he discussed the incentives challenges that may stand in the way of success. First, of course, not just any user is likely to be able to make quality contributions: to be useful, a contributor must have serious expertise in the area of the patent in order to be able to understand the application well enough to recognize possible prior art, and must know the literature well enough to identify the prior art. That's not a lot of people, and they aren't the type who have a lot of underpaid hours to volunteer. Indeed, he quotes Jon Bentley of Avaya Labs who points out that the whole essence of patenting is making money, and that the people in the best position to contribute may be those least interested in doing so.

One of the hopes of the project is that it is the monetary incentive itself -- not provided by Peer to Patent, but indirectly -- that will induce people to contribute: competitors. That is, if some company is using technology on which a patent is proposed, or is developing something similar, it will have a financial interest in seeing that the patent is not granted. Thus, they might be the ones to put the time in to review the application and propose the prior art. Although they are interested parties, as Oram says "prior art is prior art no matter who finds it".

Interesting problem, and I'm looking forward to seeing whether or not Peer to Patent can succeed (and I hope it does, because I tend to think that too many software and business patent applications are approved).

Posted by jmm at 09:28 PM | Comments (1) | Permalink »

February 26, 2008

Encyclopedia of Life

The Encyclopedia of Life is a rather new project to create an online and ever-growing encyclopedia of the species of life on earth. This is just a narrow slice of what Wikipedia nominally covers (everything!), but its ambition highlights the fact that we should expect to see more and more specialized encyclopedias growing through user-contributed content (that is, Wikipedia cannot be a successful single source of knowledge).

The task: there are currently about 1.8 million known species; the relevant scientific community thinks there are about 18 million more to be discovered. And an encyclopedia does not merely name the species: it compiles a wide range of useful information (some of the currently developed pages have 20 or more subheadings, multiple photographs, extensive bibliographies, even references to the species in literature; see, e.g., Eastern White Pine).

And what about the usual incentives problems: why contribute? what makes contributors put in the effort necessary for quality contributions? how to limit pollution? By focusing on a specialized and visible community of scientists, some of these problems may be smaller than in other settings. In particular, I expect that quality will be handled by a mix of contributors not wanting to look foolish to their colleagues, and other contributors delighting in showing how much better their knowledge is as they make quality-improving edits. The rewards are similar to the standard rewards of recognition and satisfaction with documenting and adding to knowledge that have kept academia moving for the past several hundred years.

Pollution and quality also will be managed, it appears, by having volunteer experts assigned as "curators", or moderators for each page. This is reminiscent of the method that seems to work well on Slashdot, for example.

But what about inducing contributions in the first place? The project's Executive Director, Prof. James Edwards, said

“We have not given enough thought to the people who provide the information on which the Encyclopedia of Life is built. We are looking into ways to keep that community going.”

(This quote and other material above from today's New York Times article on the EOL.)

Indeed, in a twist not often heard when talking about getting people to contribute to, say, Wikipedia, the founders are worried about the community dying off:

"The ranks of taxonomists — the scientists who describe species and revise old descriptions — have been shrinking steadily for decades. Dr. Wilson hopes the Encyclopedia of Life will foster the growth of that group."
(Carl Zimmer, NYT 26 Feb 2008.)

My student, Lian Jian, is currently working on a project to discover reasons why contributors "exit" (stop contributing to) Wikipedia. Here is a poster describing her preliminary work. As the many new and exciting user-contributed content projects mature, inducing a flow of new contributors to replace those who exit, and passing along the organizational memory and routines, will become important determinants of long-term success.

Posted by jmm at 10:08 AM | Comments (0) | Permalink »

January 13, 2008

All user-contributed, all the time (almost)

I've been fascinated for the past couple of years with businesses that rely on user-contributed content (UCC) for substantial inputs to production. It is sometimes jokingly referred to as the "Tom Sawyer business model": get your friends to whitewash the fence for you, without paying them (in fact, they paid Tom quite handsomely, including "a key that wouldn't unlock anything, a fragment of chalk...and a dead rat on a string").

Randall Stross writes in today's New York Times about two fairly well-known businesses that have nearly perfected the art: Plenty of Fish, and Craigslist. Craigslist is a wide-open classified advertising service where employers post jobs, homeowners sell their old "Monopoly -- Star Wars Version" games and unwanted gifts, and, most piquantly, people of every shape, age, color and preference seek partners for a nearly infinite variety of polymorphously perverse, chaste and romantic interactions. Craigslist is one of the top 10 visited English language sites, has versions for 450 localities in over 50 countries, and runs with only 25 employees. All of the content is written, edited (such as it is) and maintained voluntarily by users; user volunteers also provide most of the customer service through help forums.

Plenty of Fish is more specialized and not quite as successful, but perhaps more remarkable. It is a dating service localized to 50 Canadian, US and Australian cities. Markus Frind created it and devotes only about 10 hours a week to running it...and he only in the past year hired his first employee. Yet the site has 600,000 registered users (a number that grows rapidly despite the purging of 30,000 inactives a month), and receives 50,000 new photos per day. Spam-filtering of text is done by software. Filtering of photos (to make sure they are human and clothed) is done by user volunteers: in the past year the top 120 volunteers scanned over 100,000 photos each! The users provide the customer service too, through help forums.

Great business model: have the users whitewash the fence, and you work 10 hours a week for $10 million in annual profits (Stross estimates that Frind's claim about his advertising-only profits is plausible). What are the generalizable principles? How can *I* start such a business and succeed? (The road is littered with UCC-driven businesses that never turn a profit.)

It is obvious that one of the most important questions is why? Why would users volunteer the time and effort to provide the content, the customer service, the photo filtering, etc.? You may think it's obvious why users want to visit Plenty of Fish: there are a lot of lonely hearts out there. And it is 100% free to users: Frind only charges advertisers. Of course, without user effort, it won't succeed: there will be no information about potential life partners, no help information, and lots of undesirable photos polluting the service. But no individual user needs to contribute anything: there is no requirement for volunteer hours (as there is at our local food coop), there is no public tracking of effort and peer pressure to pull your weight. It's a free-rider's dream.

Contributing content is easy: if you don't submit a profile you aren't going to get any dates. But what about photo scanning? Yes, you want to scan photos anyway: that's why you're there. But why not let someone else filter out the junk so you only have to filter the worthwhile photos? Is there that much of a first-mover advantage that you are willing to filter 100,000 photos per year to have a shot at being the first to contact the newest hunk? My guess is that the expected return on that investment is pretty low.

And why spend your time providing free help service to other users? Maybe Plenty of Fish is lucky to have a demographic for whom the value of time is unusually low (lonely single people with nothing else to do on Saturday night), but that just means the cost is lower to make the contribution: what is the benefit? Is it that the volunteer helpers are trying to be noticed as helpful, well-informed web geeks as a way of attracting dates?

I think the answers to these questions are transparently not obvious. If the answers were easy, we'd have a lot more people working 10 hours a week to make $10 million per year. And the answers are not likely to be something that involves only traditional economic views about incentives and motivations. Developing generalizable principles about the motivations for user-contributed content will surely need to draw on psychological explanations as well, from the psychology of personality and self, and social psychology (at least).

Posted by jmm at 11:03 AM | Permalink »

January 07, 2008

UCC search arrives...manipulation and pollution to follow soon

Jimmy Wales announced the release of the public "alpha" of his new, for-profit search service, Wikia Search. The service is built on a standard search engine, but its primary feature is that users can evaluate and comment on search results, building a user-contributed content database that Wikia hopes will improve search quality, making Wikia a viable but open (and hopefully profitable) alternative to Google.

Miguel Helft, writing for the New York Times, was quick to note that such a search service might be quite vulnerable to manipulation:

Like other search engines and sites that rely on the so-called “wisdom of crowds,” the Wikia search engine is likely to be susceptible to people who try to game the system, by, for example, seeking to advance the ranking of their own site. Mr. Wales said Wikia would attempt to “block them, ban them, delete their stuff,” just as other wiki projects do.

The tension is interesting: Wikia promotes itself as a valuable alternative to Google largely because its search and ranking algorithms are open, so that users know more about why some sites are being selected or ranked more highly than others.

“I think it is unhealthy for the citizens of the world that so much of our information is controlled by such a small number of players, behind closed doors,” [Wales] said. “We really have no ability to understand and influence that process.”

But, although the search and ranking algorithms may be public, whether or not searches are being manipulated by user contributed content will not be so obvious. It is far from obvious which approach is more dependable and "open". Wikia's success apparently will depend on its ad hoc and technical methods for "blocking, banning and deleting" manipulation.

Posted by jmm at 09:23 AM | Permalink »

March 24, 2007

Getting good stuff in: Participation Inequality

An interesting phenomenon, noted by many, is that most content in user-contributed content venues (including online communities that focus more on "community" than on creating a durable information resource) is provided by a small fraction of users. Many have documented that participation in a wide variety of voluntary settings follows a power law (that the amount of contribution decreases proportional to 1 over the rank of the contributor).

Jakob Nielsen offers a nice summary including some historical references:

In most online communities, 90% of users are lurkers who never contribute, 9% of users contribute a little, and 1% of users account for almost all the action.

This sort of participation inequality has been seen in online communities, Amazon book reviews, Wikipedia edits, blogs, and peer-to-peer file sharing networks, to name a few venues (for the latter, see E. Adar and B. Huberman, "Free Riding on Gnutella", First Monday, 5 (2000), and S. Saroiu, P. K. Gummadi, and S. D. Gribble, "A Measurement Study of Peer-to-Peer File Sharing Systems", Multimedia Computing and Networking (January 2002)).
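To make the power-law claim concrete, here is a toy sketch (my own illustration, not from the papers cited above) that computes how concentrated contributions become if the user at rank r contributes in proportion to 1/r:

```python
# Toy illustration of participation inequality under a pure 1/rank
# ("Zipf") law: the user at rank r contributes proportionally to 1/r.
N = 10_000  # total users in the community

contrib = [1 / r for r in range(1, N + 1)]
total = sum(contrib)

top_1_pct = sum(contrib[: N // 100]) / total   # share from the top 100 users
top_10_pct = sum(contrib[: N // 10]) / total   # share from the top 1,000 users

print(f"Top 1% of users supply {top_1_pct:.0%} of contributions")
print(f"Top 10% of users supply {top_10_pct:.0%} of contributions")
```

Under a pure 1/rank law the top 1% of users supply roughly half of all contributions and the top 10% about three quarters; the even starker 90-9-1 pattern Nielsen describes corresponds to a distribution steeper than pure Zipf.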

Inequality isn't directly an issue of "getting good stuff in", as much as about getting stuff in at all. But of course, the quality of contributions is going to depend on who is motivated to contribute, not just how many contributors there are. Thus, the problem of getting critical mass for a user-contributed content service is not just getting enough contributors, but getting the right contributors.

Posted by jmm at 01:49 PM | Permalink »

March 17, 2007

Making a business of user-contributed content

Just how important is it to manage the incentive-centered design problems for a user-contributed content web site? If you want to make a business out of it, crucial. There is not a lot of advertising revenue to be made (yet) by most online services: to succeed, you need to keep your costs very low and your page views high. ICD is critical to both.

Dan Mitchell today wrote in the New York Times about the economics of advertising-supported web sites. He reports on a study by venture capitalist Jeremy Liew, "Three ways to build an online media business to $50m in revenue", plus a followup. Liew argues that to make $50 million in revenues, you need to get to Top 10, 25 or 125 levels of US website traffic (depending on the demographic of your viewers, which affects what advertising you can deliver). The basic implication: few sites will make major revenues from advertising, and to do so is very hard.

And, so, the payoff to terrific ICD; actually the two payoffs: higher quality (more traffic) at lower cost. To get significant advertising revenues you'll need consistent high quality to attract visitors and keep them returning. That means getting good stuff in, and keeping bad stuff out. And, given the modest advertising revenues that are available (much less than $50 million is going to be possible for most sites), you'll want to keep costs down, which means getting most of your labor for free (user-contributed content) without a lot of expensive editorial staff or other interventions.

As an example of the possibility of keeping down the cost side, the Wikimedia Foundation (which owns Wikipedia) currently has fewer than ten full-time employees.


Posted by jmm at 08:47 AM | Permalink »

February 12, 2007

Pay as much as you want?

I stumbled on a couple of online businesses that are trying a variant on shareware pricing for content and services: "pay as much as you want".

The first is Magnatune, which sells music from selected independent artists. You can stream for free (similar to various Internet "radio" stations). If you want to download a "CD" of music from an artist, Magnatune suggests a price (typically about $8), but you can pay as little as $5 or as much as $18 (they don't seem to allow you to pay more). In a USA Today story, the CEO claims that the average price paid is $8.93, not the minimum $5 allowed.

The second is LibraryThing. If you don't know it, LibraryThing is a favorite in some SI circles: it's an online book social network / book cataloguing service. You enter books you own (with the ability to keyword search a dozen or more online catalogues, like Amazon's and the University of Michigan Library's, to get all of the bibliographic information), tag them, share or not with others, browse through others' collections via tags, etc. LibraryThing announced on Saturday that it was intrigued by Magnatune's idea: it had charged $10 for one year, or $25 for a lifetime account, but now "suggests" those amounts and lets you choose from a range ($19 to $55 for lifetime accounts).

Shareware is somewhat different: you get to use the shareware with or without restrictions (e.g., some functions are crippled) without paying anything, then if you want to pay (usually for an unrestricted version), the price is usually fixed. Freeware is a bit closer: lots of freeware "suggests" a donation. However, with Magnatune and LibraryThing you have to pay something to get the goods or services, but the amount is (within bounds), up to the buyer.

An interesting experiment in altruism / guilty consciences (or perhaps peer pressure or peer regard? Magnatune and LibraryThing know who you are!).

Posted by jmm at 11:15 PM | Comments (0) | Permalink »

January 31, 2007

Wikipedia is in trouble

I'm going out on a limb here: unless Wikipedia comes up with a coherent contribution policy that is consistent with the economic value of its content, it will start to deteriorate.

In a widely published Associated Press story, Brian Bergstein reports that Jimmy Wales, Wikipedia founder, Board Chair Emeritus, and currently President of for-profit Wikia, Inc., blocked the account of a small entrepreneur, Gregory Kohs, who was selling his services to (openly, with attribution) write Wikipedia articles about businesses. Wales reportedly told Kohs that his MyWikiBiz was "antithetical to Wikipedia's mission", and that even posting his stories on his personal page inside Wikipedia so independent editors could grab them and insert them in the encyclopedia was "absolutely unacceptable".

Before I get into my dire forecast, what is antithetical about someone who is paid as a professional writer to prepare content, especially if he is open about that fact? There are three "fundamental" Wikipedia editorial policies with which all contributions must comply:

  1. Neutral point of view (NPOV)
  2. Verifiability
  3. No original research

The first two are relevant here. NPOV means all content "must be written from a neutral point of view (NPOV), representing fairly and without bias all significant views." Verifiability means "any reader should be able to check that material added to Wikipedia has already been published by a reliable source." Kohs stated in his corporate materials that he is committed to compliance with these two policies: he would prepare the content for interested parties, but it would be neutral and verifiable. Of course, on any particular contribution other editors might disagree and choose to revise the content, but that is the core process of Wikipedia.

The problem is deep: arguably all contributors have a subjective (non-neutral) point of view, no matter how much they may wish, and believe otherwise. What is rather remarkable about Wikipedia is how well the group editing process has worked to enforce neutrality (and verifiability) through collective action. In any case, there is no clear reason to believe a paid professional writer is going to be systematically non-neutral any more or less than a volunteer writer.

In part, this is just a simple statement about incentives. A reasonable starting point is to accept that everyone who makes the effort to research and write material for Wikipedia is doing it for some motivating reason. Research and writing take time away from other desirable activities, so unless the writer is consistently irrational, she by revealed preference believes she is getting some benefit out of writing greater than the opportunity cost of the foregone time. It follows directly that point of view might be biased by whatever is motivating a given writer. To believe otherwise is naive. Dangerously naive, for the future of Wikipedia.

Even if the "everyone is motivated by someone" argument is too subtle for some true believers in massive social altruism, there is an obvious problem with Wikipedia's position on Gregory Kohs: surely there are many, many writers who are being paid for time and effort they devote to Wikipedia, but who are not being open about it. For example, employees of corporations, non-profits, educational institutions, etc., asked to maintain a Wikipedia entry on the corporation, who do so from an IP address not traceable to the corporation (e.g., from home). We already know from past experience that political operatives have made sub rosa contributions.

So, the problem of distinguishing between a priori neutral and a priori non-neutral contributors is deep and possibly not amenable to any reasonably effective solution. This is a fundamental problem of hidden information: the contributor knows things about her motivations and point of view that are not observable by others. Rather, others can only infer her motivations, by seeing what she writes, and at that point, the motivations are moot: if her content is not neutral or verifiable, other editors can fix it, and if she systematically violates these principles, she can be banned based on what she did, not who she purports to be.

Indeed, given the intractability of knowing the motivations and subjective viewpoints of contributors, it might seem that the sensible policy would be to encourage contributors to disclose any potential conflicts of interest, to alert editors to be vigilant for particular types of bias. This disclosure, of course, is exactly what Kohs did.

And now, for my prediction that Wikipedia is in trouble. Wikipedia has become mainstream: people in all walks of life rely on it as a valuable source of information for an enormous variety of activities. That is, the content has economic value: economic in the sense that it is a scarce resource, valuable precisely because for many purposes it is better than the next alternative (it is cheaper, or more readily available, or more reliable, or more complete, etc.). Having valuable content, of course, is the prime directive for Wikipedia, and it is, truly, a remarkable success.

However, precisely because the content has economic value to the millions of users, there are millions of agents who have an economic interest in what the content contains. Some are interested merely that content exist (for example, there are not many detailed articles about major businesses, which was the hole that Kohs was trying to plug). Others might want that content to reflect a particular point of view.

Because there is economic value to many who wish to influence the content available, they will be willing to spend resources to do the influencing. And where there are resources -- value to be obtained -- there is initiative and creativity. A policy that tries to ex ante filter out certain types of contributors based on who they are, or on very limited information about what their subjective motivations might be, is as sure to be increasingly imperfect and unsuccessful as is any spam filtering technology that tries to set up ex ante filtering rules. Sure, some of this pollution will be filtered, but there will also be false positives, and worse, those with an interest in influencing content will simply find new clever ways to get around the imperfect ex ante policies about who can contribute. And they will succeed, just as spammers in other contexts succeed, because of the intrinsic information asymmetry: the contributors know who they are and what their motivations are better than any policy rule formulated by another can ever know.

So, trying to pre-filter subjective content based on extremely limited, arbitrary information about the possible motivations of a contributor will just result in a spam-like arms race: content influencers will come up with new ways to get in and edit Wikipedia, and Wikipedia's project managers will spend ever increasing amounts of time trying to fix up the rules and filters to keep them out (but they won't succeed).

This vicious cycle has always been a possibility, and indeed, we've seen examples of pollution in Wikipedia before. The reason I think the problem is becoming quite dangerous to the future of Wikipedia is its very success. Because Wikipedia has become such a valuable source of content, content influencers will be willing to spend ever increasing amounts to win the arms race.

Wikipedia is, unavoidably (and hooray! this is a sign of the success of its mission) an economic resource. Ignoring the unavoidable implications of that fact will doom the resource to deteriorating quality and marginalization (remember Usenet?).

Ironically, at first blush there seems to be a simple, obvious alternative right at hand: let Wikipedia be Wikipedia. The marvel of the project is that the collective editorial process maintains very high quality standards. Further, by allowing people to contribute, and then evaluating their contributions, persistent abusers can be identified and publicly humiliated (as Jimmy Wales himself was when he was caught making non-neutral edits to the Wikipedia entry about himself). Hasn't Wikipedia learned its own key lessons? Let the light shine, and better the devil you know.

(Wikipedia itself offers an enlightening summary of the battle of Kohs's efforts to contribute content. This summary serves to emphasize the impossibility of Wikipedia's fantasy of pre-screening contributors.)

Posted by jmm at 12:29 AM | Comments (4) | Permalink »

January 25, 2007

Good in or bad out?

In his New York Times Circuits Newsletter, David Pogue writes about Microsoft's recent gift of $2200 laptops to about 90 bloggers who write about technology -- laptops loaded with about-to-be-released Vista and Office 2007.

Reviewers need access to the technology they are reviewing, but as Pogue notes, MS could lend the computers.

But I'm more interested in the general point Pogue makes: we live in a culture in which most journalists are trained, and managed by editors who direct them to adhere to ethical guidelines that among other things prohibit accepting gifts from subjects of stories and reviews presented as objective. But technology is moving faster than culture, and a whole new class of influential communicators has emerged -- bloggers -- who for the most part are not trained or managed to follow a specific code of ethics.

If bloggers want durable credibility and success, the culture (theirs and the greater context in which they are embedded) will need to evolve practices and standards that establish and maintain trust. Without trust -- especially at blogs that specialize in providing information for costly decisions, like purchasing consumer electronics and software -- bloggers will lose their audiences. The speed of the development of reliable practices and reputation mechanisms may determine which parts of the blogosphere succeed, and whether much of it degenerates into a morass of spam-like paid (but disguised) product placement announcements.

Posted by jmm at 04:10 PM | Permalink »

December 24, 2006

Spamming Web 2.0

The New York Times today ran a short note highlighting CNET's story about commercial spamming of Digg.com and similar sites. There are companies being paid upwards of $15,000 to get a product placed on the front page of Digg, and most recently a top 30 Digger admitted that he entered an agreement to help elevate a new business to the front page of Digg (and solicited the other top 30 Diggers to participate).

The world was pretty darned excited when it discovered email (for most people, in the early 1990s). Spam followed in a big way within a year or two. It's clear to me that we're on the same trajectory with user-contributed content sites on the Web. There is an ever-increasing need for incentive-centered designs to help keep the bad stuff out.

Posted by jmm at 08:17 AM | Permalink »

December 17, 2006

Did someone mention user-contributed content?

Well, if Time magazine recognizes user-contributed content (Person of the Year: You) as the next big thing, maybe it is (or maybe it's on the way out).

Time recognizes there are incentives issues:

Who are these people? ... Who has that time and that energy and that passion? ... Sure, it's a mistake to romanticize all this any more than is strictly necessary. Web 2.0 harnesses the stupidity of crowds as well as its wisdom.

Posted by jmm at 02:49 PM | Permalink »

November 29, 2006

Research presentation: Web 2.0 and ICD

On 20 Nov 06 I gave an invited plenary Association Lecture at the Southern Economic Association Annual Conference in Charleston, SC. The title was "Getting the good stuff in, keeping the bad stuff out: Incentives and the Web". Here are the slides (not PowerPoint!).

In this talk geared to professional economists I explained the user-contributed content explosion that is one characteristic of so-called Web 2.0, and showed that this is happening through all phases of information production, organization, retrieval and use. I then discussed three fundamental economic issues that arise with user-contributed content: getting the good stuff in (private provision of public goods); keeping the bad stuff out (pollution); and evaluating the stuff (signaling, reputation). Familiar topics to the hordes who read this blog!

I finished with a simple elaboration to illustrate how ICD methods could be used to design mechanisms for dealing with these problems. The model is based on an event that occurred last spring on Digg.com.

Posted by jmm at 11:33 PM | Comments (0) | Permalink »

September 07, 2006

Digg changes algorithm to help keep bad stuff out

Is Digg rigged? JP thinks so. He offers an informal analysis indicating that several "top 30" Digg users cross-dig each other's posts frequently, which could mutually contribute to them staying in the top group of users.

For this post, I don't really care if there is a cartel of Digg users coordinating self-promotion. The story for ICDers is that Digg changed its system to try to reduce the possibility of this type of pollution.[1]

The announced goal is to "weigh a diversified group of Diggers more heavily than groups acting together." According to Digg co-founder Kevin Rose,

This algorithm update will look at the unique digging diversity of the individuals digging the story. Users that follow a gaming pattern will have less promotion weight. This doesn't mean that the story won't be promoted, it just means that a more diverse pool of individuals will be need [sic] to deem the story homepage-worthy.
As Rose notes, keeping the bad stuff out (the pollution problem I regularly discuss) is a well-known and ongoing challenge to a user-contributed content community: "we have learned a lot about the user base and how to defend digg from spam, artificial diggs, and digg fraud. It's a battle we will continue to fight and one that we don't take lightly" (id).
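Digg's actual algorithm is proprietary, but the idea Rose describes can be sketched. In this hypothetical Python illustration (the function names, data, and thresholds are all my own invention, not Digg's), a story's promotion weight falls as the digging histories of its diggers overlap:

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap between two sets of past diggs (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def promotion_weight(diggers, history):
    """Hypothetical 'digging diversity' weight for a story: 1 minus the
    mean pairwise overlap of the diggers' past digg histories, so a
    tightly coupled group counts for much less than a diverse crowd."""
    pairs = list(combinations(diggers, 2))
    if not pairs:
        return 1.0
    mean_overlap = sum(jaccard(history[a], history[b])
                       for a, b in pairs) / len(pairs)
    return 1.0 - mean_overlap

# A cartel whose members digg the same stories vs. three unrelated users
history = {
    "c1": {1, 2, 3, 4}, "c2": {1, 2, 3, 4}, "c3": {1, 2, 3, 5},
    "u1": {10, 11},     "u2": {20, 21},     "u3": {30, 31},
}
cartel = promotion_weight(["c1", "c2", "c3"], history)
crowd = promotion_weight(["u1", "u2", "u3"], history)
print(f"cartel weight: {cartel:.2f}, diverse crowd weight: {crowd:.2f}")
```

Under this scheme the coordinated group's diggs count for a fraction of their nominal number, while the same number of unrelated diggers counts fully, so the cartel would need a more diverse pool to push a story to the homepage.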


Notes

[1] There are interesting questions about the conditions under which it is in one's self-interest to cooperate with a cartel, and what the enforcement mechanisms are that enable this. In fact, my first published article concerned conditions causing international mineral cartels of the past century to succeed or fail: MacKie-Mason, Jeffrey K. and Robert S. Pindyck, "Cartel Theory and Cartel Experience in International Minerals Markets," in Energy: Markets and Regulation, Richard L. Gordon, Henry D. Jacoby and Martin B. Zimmerman, eds. Cambridge: MIT Press, 1987: 187-214.

Posted by jmm at 02:17 AM | Comments (0) | Permalink »

August 05, 2006

Getting good -- not just more -- stuff in at Wikipedia

At the three day Wikimania conference this week, Wikipedia founder Jimmy Wales urged contributors to start focusing more on quality than on quantity.

Interesting incentives problems. The article count is a very visible sign of group accomplishment, and individuals can also make verifiable claims about the number of articles for which they were the initial creator. But what reward is there for improving the quality of entries? This seems like a case where the difference between intrinsic and extrinsic incentives may be important, particularly if designers want to induce contributors to shift along the quality-quantity axis in user-contributed content resources. Surely some, perhaps most of the rewards for quantity of contribution are also intrinsic, but for the designer, it might be easier to tweak the extrinsic rewards, disadvantaging quality.

Posted by jmm at 11:01 AM | Comments (0) | Permalink »

July 20, 2006

Incentives the old fashioned way: hard cash for user-contributed content

TechCrunch posted a story that AOL is coming on strong with incentives for user-contributed content:

A little known Digg-fact is that a relatively small group of users submit a large percentage of the stories that end up on the Digg home page. Netscape, which recently relaunched as a Digg-clone, wants to pay those top users to switch over to them. Jason Calacanis, who runs the Netscape property, wrote a post earlier today offering to pay top Digg users $1,000 a month or more to switch to Netscape and submit news there instead.

I don't find this surprising, and I expect we'll see more of it. We already know that many "information portals" pay people to provide ripped off or unreliable content to draw viewers for their ads. And of course, paying writers for content which earns ad bucks is the standard business model for magazines and newspapers. Little surprise that "new media" newspapers (like the new Netscape) are trying the same incentive design.

Also interesting, but again probably not too surprising, that the majority of successful contributions to digg.com come from a small group of users: 20% of stories from the top 20 contributors; a bit over half from the top 100 contributors.

Posted by jmm at 05:33 PM | Permalink »

June 17, 2006

Open access to Wikipedia less open

Growing Wikipedia Revises Its 'Anyone Can Edit' Policy - New York Times (registration required).

Wikipedia "organically" grows its content through the contributions of volunteers. Most of the work appears to be done by about 1000 core participants, but for the most part, anyone on the planet with Internet access can edit, delete or add content. Until recently...

This past winter, the media drew attention to a few glaring examples of subjective Wikipedia manipulation. One of the first publicized was a joke reference to John Seigenthaler Sr. as a participant in the assassinations of both John and Robert Kennedy. But then folks started noticing that politicians' pages were being burnished, and other dubious content surfaced.

Now, in response, the informal Wikipedia management has implemented rules that put some limits on who may edit an article, and when. A small number of articles are "protected" so that no one can edit them (except the management team?) until further notice. Another group is "semi-protected" so that only users who have been registered for at least four days may edit. (Both protected and semi-protected articles are labeled in color at the top of the entry so that readers are warned about the controversy and the likelihood that some content is unreliable.)

For ICD, it is the latter category that is interesting. The "protected" articles simply are not open-access user-contributed content during the time that they are protected; they have become a traditional proprietary content vehicle. But the "semi-protected" articles are still open, with an incentive mechanism to discourage inappropriate editing. There are a couple of elements to the mechanism, and various ways in which these elements may interact with incentives:

Registration: Only registered users may edit a semi-protected article. Since there is no identity authentication when an account is created, this mostly serves as a small cost imposed on users who want to edit. The pseudonymous identity can be used to build a positive reputation through repeated good contributions, but is not likely to do much to discourage bad contributions since the pseudonym can be abandoned. (I think, however, that the Wikipedia system also tracks the IP address that editors are using.)
Waiting period: New pseudonyms have to wait four days before editing semi-protected articles. This cooling-off period seems intended primarily to slow down "revert wars," in which two opposing camps rapidly re-edit a controversial page. This kind of pollution is annoying, but probably not too dangerous since it is pretty evident to anyone who is making serious use of an article. The cooling off also may be enough of a time cost to discourage casual polluters who think it would be fun to manipulate an article on a controversial topic.
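The rules described above amount to a simple eligibility check. A minimal sketch (the function and field names are hypothetical, not MediaWiki's actual implementation):

```python
from datetime import datetime, timedelta

WAITING_PERIOD = timedelta(days=4)

def may_edit(article_protection, user, now):
    """Eligibility check for the protection levels described above.

    article_protection: None (open), "semi", or "full"
    user: dict with a 'registered_at' datetime, or None for anonymous
    """
    if article_protection == "full":
        return False          # no one edits (management handled elsewhere)
    if article_protection == "semi":
        if user is None:      # anonymous editors are excluded
            return False
        # registered accounts must be at least four days old
        return now - user["registered_at"] >= WAITING_PERIOD
    return True               # unprotected: anyone on the planet may edit
```

The incentive content is all in that one `WAITING_PERIOD` comparison: it converts the right to edit a contested page into something that costs four days of foresight.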


The NYT published a list of articles currently under editing restrictions.

Posted by jmm at 10:23 AM | Permalink »

April 08, 2006

Polluting user-contributed reviews

A recent First Monday article by David and Pinch (2006) documents an interesting case of book review pollution on Amazon. A user review of one book critically compared it to another. Immediately following, a "user" entered another review that blatantly plagiarized a favorable review of the first book, and further user reviews did additional plagiarizing.

When the author of the first book discovered the plagiarism, he notified Amazon which at the time had a completely hands-off policy on user reviews, so it refused to intervene even for blatant plagiarism. (The policy since has changed.) Another example of the problem of keeping bad quality contributions out.

David and Pinch remind us that when an Amazon Canada programming glitch revealed reviewer identities,

a large number of authors had "gotten glowing testimonials from friends, husbands, wives, colleagues or paid professionals." A few had even 'reviewed' their own books, and, unsurprisingly, some had unfairly slurred the competition.

David and Pinch address the issue of review pollution at some length. First, they catalogue six discrete layers of reputation in the Amazon system, including user ratings of reviews by others and a mechanism to report abuse. Then they conduct an analysis of 50,000 reviews of 10,000 books and CDs, identifying several categories of review pollution automatically (using software algorithms).

They also make an interesting point about the arms-race limitations of technical pollution screens:

The sorts of practices we have documented in this paper could have been documented by Amazon.com themselves (and for all we know may have indeed been documented). Furthermore if we can write an algorithm to detect copying then it is possible for Amazon.com to go further and use such algorithms to alert users to copying and if necessary remove material. If Amazon.com were to write such an algorithm and, say, remove copied material, this will not be the end of the story. Users will adapt to the new feature and will no doubt try and find new ways to game the system.
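David and Pinch don't give their detection algorithm, but a standard way to flag copied reviews is word-shingle overlap (Jaccard similarity). A minimal sketch, with an illustrative threshold:

```python
def shingles(text, k=5):
    """Set of k-word shingles (overlapping word windows) from a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=5):
    """Jaccard similarity of two texts' shingle sets: 1.0 = identical."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def looks_copied(review, earlier_reviews, threshold=0.5):
    """Flag a review whose shingles overlap heavily with any earlier one."""
    return any(similarity(review, earlier) >= threshold
               for earlier in earlier_reviews)
```

As the quoted passage predicts, a filter like this only starts the arms race: a plagiarist who lightly rewrites every fifth word drives the shingle overlap down, and the detector has to adapt in turn.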

Posted by jmm at 02:35 PM | Comments (0) | Permalink »

Keeping bad stuff out: Making a play on social news sites?

About a month ago, rumors circulated that Google was about to acquire Sun Microsystems. The news got hot when blog stories claiming an acquisition was imminent were promoted to the front page of the community/social news site Digg.com. It pretty quickly became clear that the rumors were largely unfounded. What hasn't been quickly resolved is whether someone tried to manipulate Digg, possibly to cash in on speculative trading in Google or Sun stock.

The basic idea is simple: get enough shill users to vote for a financially significant rumor to promote it to the front page, thus automatically getting more widespread attention, and hope that the burst of attention causes a temporary stock price adjustment that can be exploited. (For example, in an acquisition the price of Sun would almost surely increase, so gullible readers might start buying it and bidding it up; the scam artist could purchase shares in advance to sell at the inflated price, or sell short at the bubble price and collect when the price returns to normal.)

Digg claims that it almost surely was not manipulated, but it seems clear that such manipulation is possible in user-contributed content news sites. Recall how Rich Wiggins found that people could get flim-flam press releases fed into Google News (here and here), and how authors using pseudonyms have promoted their own books with favorable "reviews" on Amazon.com.

It appears that in the past Digg has been manipulated (though apparently as an experiment, not to manipulate stock prices).

Posted by jmm at 02:05 AM | Comments (0) | Permalink »

April 05, 2006

Didn't keep the bad stuff out

Here's a user-contributed content service that sure didn't keep the bad stuff out (from their point of view!). Chevy set up an interactive marketing site at which you could create your own customized version of a video commercial for one of their SUVs. A few users had some fun with the freedom. (Good chance the links won't last at the Chevy site for long -- I'll try to find another source when they disappear.)
[link][link][link]

Posted by jmm at 10:22 PM | Comments (0) | Permalink »

March 15, 2006

i-Newswire is out, that's who

A couple of days after Rich Wiggins posted his blog story about the ability to place false news stories in Google News, CNN has picked up the story, and Google has now dropped i-Newswire as a source for Google News.

i-Newswire was a user-contributed content (UCC) service, and thus subject to the pollution problem I've been discussing (link and link). More precisely, i-Newswire is an un-moderated or un-edited UCC service (all press release newswires rely on user-contributed content, but most employ editors to decide whether press releases are legitimate).

Google News, on the other hand, is not a UCC, and is edited: there is central control over which content feeds are included. So, in a crude way, Google can handle the pollution problem: if pollution is coming in through channel A, turn channel A off. Google News may be a case where a technological pollution prevention approach will work pretty well, obviating the need for an incentive system.

Posted by jmm at 10:25 PM | Comments (0) | Permalink »

March 14, 2006

Digg, Google News...User-contributed "news"

I'm developing an interest in the phenomenon of user-contributed content, and the two fundamental incentives problems that it faces: pollution (keeping the bad stuff out) and the private provision of public goods (inducing contributions of the good stuff). User-contributed "news" is one example to explore.

Digg.com is one currently hot user-contributed news site:

Digg is a technology news website that combines social bookmarking, blogging, RSS, and non-hierarchical editorial control. With digg, users submit stories for review, but rather than allow an editor to decide which stories go on the homepage, the users do.

Slashdot of course is the grande dame. Digg and Slashdot both rely on multiple techniques of community moderation to try to maintain the quality of content (keep out the pollution). For example, proposed stories for Digg are not promoted to the homepage until they have sufficient support from multiple users; and users can report bad entries (apparently to a team of human editors).

How effective (and socially costly) are these community moderation techniques? By now we've all heard about Wikipedia founder Jimmy Wales manipulating his own Wikipedia entry, which led to publicity about multiple members of Congress, etc., who have been doing the same thing.

And even if a site has an efficient moderation system to filter out pollution, there is still the problem of inducing people to volunteer time and effort to contribute to the public good by creating valuable content. Obviously, this can happen (see Slashdot, Wikipedia). But suppose you are designing a new user-contributed content service: how are you going to create a community of users, and how are you going to induce them to donate (high quality) content?

Apparently we can now start to count Google News as a site for user-contributed news.

Posted by jmm at 08:18 AM | Comments (0) | Permalink »

Spamming Google News: Who's in, who's out?

An old acquaintance of mine, Rich Wiggins, recently blogged about his discovery of how easy it is to insert content in Google News. He discovered this when he noticed regular press releases published in Google News that were a front for the musings of self-proclaimed "2008 Presidential contender" Daniel Imperato. Who?

Wiggins figured out how Imperato did it, and tested the method by publishing a press release (screen shot) about his thoughts while celebrating his 50th birthday in Florida. Sure enough, you can find this item by searching on "Rich Wiggins" in Google News.

This is (for now) a fun example of one of the two fundamental incentives problems for the important and fast-growing phenomenon of user-contributed content:


  1. How to keep the undesirable stuff out?
  2. How to induce people to contribute desirable stuff?

The first we can call the pollution problem, the second the private provision of public goods problem. Though Wiggins's example is funny, will we soon find Google News polluted beyond usefulness? (The decline of Usenet was largely due to spam pollution.)

Blogs, of course, are a major example of user-contributed content. At first glance, they don't suffer as much from the first problem: readers know that blogs are personal, unvetted opinion pages, and so they don't blindly rely on what is posted as truth. (Or do they?) But then there's the problem of splogging, which isn't really a problem for blogs as much as for search engines that are being tricked into directing searchers to fake blog pages that are in fact spam advertisements (a commercial variant on the older practice of Google bombing).

There is a lengthy and informative Wikipedia article that discusses the wide variety of pollution techniques (spamming) that have been developed for many different settings (besides email and blogs, also instant messaging, cell phones, online games, wikis, etc.), with an index to a family of detailed articles on each subtype.

Posted by jmm at 07:44 AM | Comments (0) | Permalink »

March 03, 2006

Web 2.0 vulnerabilities

Wired ran an article last fall about vulnerabilities becoming apparent in various "Web 2.0" applications (whatever those are). Some are similar to spam in email: for example, splogs (fake blogs created to attract search engine interest and drive viewers to see their Viagra ads).

Many interesting social computing applications have enough openness that they are vulnerable to misuse and manipulation. A traditional approach is to develop technical means to close or limit the vulnerabilities (like filters for spam). We know that the inevitable trade-off between the benefits (even necessity) of some degree of openness for social applications and the resulting vulnerability means technical solutions are unlikely to be 100% satisfactory. That leaves room for incentive-based mechanisms to discourage misuse of social computing applications, like the various payment schemes proposed to fight spam. What incentive scheme might reduce splogging, for example?
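One such proposal is hashcash-style proof of work: a sender must "pay" a small amount of computation per message, which is negligible for one legitimate post but costly at spam volumes. A rough sketch of the idea (the difficulty parameter here is illustrative, not what any real deployment uses):

```python
import hashlib
from itertools import count

def mint_stamp(message, difficulty=12):
    """Find a nonce whose SHA-256 hash with the message has `difficulty`
    leading zero bits -- an unavoidable computational cost for the sender."""
    for nonce in count():
        digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - difficulty) == 0:
            return nonce

def verify_stamp(message, nonce, difficulty=12):
    """Verification is cheap: a single hash computation."""
    digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty) == 0
```

The asymmetry is the point: minting a stamp takes on the order of 2^difficulty hashes, while checking one takes a single hash, so the cost falls almost entirely on the high-volume polluter.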

For many social computing applications, financial incentive schemes may be undesirable, suggesting a growing need to develop effective non-pecuniary incentive mechanisms.

Posted by jmm at 10:56 PM | Comments (0) | Permalink »

March 01, 2006

Ad-supported web affects content creation incentives

For years I've been interested in the way that the financial model for information content affects the incentives to create, the quality and the diversity of the content.

The basic point is simple: if readers are paying for content (buying a book, subscribing to a service), then presumably the creators are trying to create content valuable to the readers. In the other leading model, advertisers pay for content: then presumably creators are trying to create content that attracts the attention of readers, but isn't necessarily of high value to them.

Does this explain the quality difference between typical broadcast TV shows and the subscriber content like "The Sopranos" and "Six Feet Under"?

I, together with a couple of students, gathered a lot of facts and notes on this topic several years back, but I haven't written much on it. (The idea shows up in passing in a couple of my scholarly articles.)

The Wall Street Journal published a nice column illustrating just this point for current web site content creation.

"If there is a topic in the news, people will be searching on it. If you can get those searchers to land on a seemingly authoritative page you've set up, you can make money from their arrival. Via ads, for instance. Then, to get your site ranked high in search engines, it's best to have 'original content' about whatever the subject of your site happens to be. The content needs to include all the keywords that people might search for. But it can't be just an outright copy of what's on some other site; you get penalized for that by search engines."
The WSJ author contracted as a freelance writer to create content for a site, and found that the assignment was primarily to cut-and-paste content from elsewhere with enough changes to fool the search engines.

I think there are some important ICD opportunities here for people thinking about creating content portals and other information services.

(Thanks to Rick Wash for pointing me to this column.)

Posted by jmm at 10:50 PM | Comments (0) | Permalink »