Can You Trust The Wikipedia?

Trust – reliance on the integrity, strength, ability, surety, etc., of a person or thing; confidence.

How confident are you that a particular Wikipedia page has reliable information? How sure are you of the ability of all the people who may have edited that page? Thanks to Luca de Alfaro and colleagues at the University of California, Santa Cruz, you may soon be able to know which parts of a given Wikipedia page you can and can’t trust.

de Alfaro and his team are developing software that shades text in varying hues of orange when it may be less than trustworthy. The deeper the orange, the less you may want to trust that particular text. You can see a demo of the software by visiting the Wikipedia trust coloring demo page. Some pages are pretty clean, so you may have to view a few random pages before you see much in the way of orange.
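The post doesn’t say how a trust value becomes a particular shade, so here is a minimal sketch of one way it could work, assuming a trust score already normalized to the range 0 to 1. The function name `trust_to_orange` and the choice of CSS `darkorange` (#ff8c00) as the “deep orange” are illustrative assumptions, not details of the UCSC software.

```python
def trust_to_orange(trust):
    """Map a trust score in [0, 1] to a hex background color (hypothetical).

    1.0 (fully trusted) gives white; 0.0 (untrusted) gives a deep orange.
    Values in between are blended linearly, so less trust means more orange.
    """
    trust = min(max(trust, 0.0), 1.0)  # clamp out-of-range scores into [0, 1]
    deep = (255, 140, 0)     # assumed "deep orange": CSS darkorange
    white = (255, 255, 255)
    rgb = tuple(round(d + (w - d) * trust) for d, w in zip(deep, white))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


if __name__ == "__main__":
    print(trust_to_orange(0.0))  # #ff8c00
    print(trust_to_orange(1.0))  # #ffffff
```

A linear blend toward white keeps trusted text looking like an ordinary page, which matches the observation that many pages show little or no orange.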

It works by first evaluating the reputation of the author.

We compute the reputation of Wikipedia authors according to how long their contributions last in the Wikipedia. Specifically, authors whose contributions are preserved, or built-upon, gain reputation; authors whose contributions are undone lose reputation.

and then using that reputation to compute the trust of each word of each revision.

We compute the trust value of each word of a revision according to the reputation of the original author of the word, as well as to the reputation of any authors that have edited the page, especially if the edit is in the proximity of the word.
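The two steps quoted above can be illustrated with a deliberately simplified, hypothetical model. This is not de Alfaro’s actual algorithm: the constants `GAIN`, `PENALTY`, `NEARBY` and `BOOST` are invented, reputation is just a running score of surviving versus undone words, and only the most recent editor’s reputation is used to raise the trust of nearby words.

```python
# Hypothetical tuning constants; the post does not give the real parameters.
GAIN = 1.0      # reputation gained per word of the author's that survives
PENALTY = 2.0   # reputation lost per word of the author's that is undone
NEARBY = 3      # word positions that count as "in the proximity" of an edit
BOOST = 0.5     # fraction of the gap to the editor's reputation a nearby edit closes


def update_reputation(reputation, author, survived, undone):
    """Step 1: adjust an author's reputation after a later revision is checked.

    survived: number of the author's words kept or built upon
    undone: number of the author's words removed or reverted
    """
    score = reputation.get(author, 0.0) + GAIN * survived - PENALTY * undone
    reputation[author] = max(score, 0.0)  # keep reputation non-negative
    return reputation[author]


def word_trust(words, reputation, editor, edit_positions):
    """Step 2: compute one trust value per word of a revision.

    words: list of (word, original_author) pairs
    editor: author of the latest revision
    edit_positions: indices of the words that revision touched
    """
    editor_rep = reputation.get(editor, 0.0)
    trusts = []
    for i, (_, author) in enumerate(words):
        # Start from the reputation of the word's original author.
        trust = reputation.get(author, 0.0)
        # A more reputable editor working close to a word implicitly vouches for it.
        near_edit = any(abs(i - p) <= NEARBY for p in edit_positions)
        if near_edit and editor_rep > trust:
            trust += BOOST * (editor_rep - trust)
        trusts.append(trust)
    return trusts


if __name__ == "__main__":
    rep = {}
    update_reputation(rep, "alice", survived=10, undone=0)  # alice -> 10.0
    update_reputation(rep, "bob", survived=1, undone=3)     # bob -> 0.0 (clamped)
    words = [("The", "alice"), ("lake", "alice"), ("is", "bob"),
             ("very", "bob"), ("deep", "bob")]
    print(word_trust(words, rep, editor="alice", edit_positions=[0]))
    # [10.0, 10.0, 5.0, 5.0, 0.0]
```

In this toy run, Bob’s words next to Alice’s edit inherit some of her reputation, while his word further away keeps his low score, which is the kind of word that would show up in deep orange.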

de Alfaro’s goal isn’t to show that the Wikipedia shouldn’t be trusted, but rather to build trust through transparency so that “nobody can single-handedly modify information without some traces of that being available for some time afterwards.”

After looking at a few random pages, I found that some show more shades of orange than others, and I’m not always sure what to make of the orange text. Some edits were clearly made for grammatical reasons, and while a word or two may show deep orange, removing those words doesn’t alter the facts at all. In other places the text is more closely tied to the ‘facts’ of the page, and seeing it on an orange background does make you question the snippet a little.

The trust coloring probably isn’t perfect, but the team is still fine-tuning the algorithms, and the results will likely improve with time.

Combined with the recent release of the Wikiscanner, which shows how many edits a particular user or IP address has made, the coloring software should help inspire more trust in Wikipedia entries, or at the very least show which pages should be viewed with a grain of skepticism. It should also hold authors more accountable.

No matter what you think of the Wikipedia, the trust coloring is interesting and might make you think a little differently about what you can and can’t trust.

  1. Damn you, post impressionist! I was about to write a blog post about this subject. I didn’t know about the color coded trust formula, though.

    The thing is, I don’t trust Wikipedia, and for reasons this application won’t show. The “reputation” of an editor is a many-splendid thing. The main thing that drove me away from Wikipedia-land is “page ownership,” where editors stake their claim and put more time into reverting changes made by others than any normal person is willing to. There’s a three-revert rule that takes away a person’s editing rights for the rest of the day, but this is easily bypassed by scrubbing instead of reverting. After encountering this in several corners of Wikipedia, where I unfortunately knew more than the page editors, I gave up and left.

    The editors got their way and have been able to make long-standing changes to the effect that proper diet is a sometimes-lethal eating disorder(!), the Soviet Union is “Western,” persecution of Christians has only ever been perpetrated by Muslims, Saddam’s cause of death may have been hanging, and Lake Tahoe is in central Utah. In that last case, enough unbiased editors were brought in from other areas to restore order.

    Wikipedia’s problem is they let the inmates run the asylum. And sadly, I think this new software will only tell us which words have the backing of the in crowd, the popular clique. That’s a very different thing from what’s trustworthy.

  2. Sorry, John, for getting in there first, though I only touched on one aspect and would be interested in seeing your post.

    You make some good points about what the color coding won’t show. Even where it did flag text as untrustworthy, I wasn’t really convinced that text was any less trustworthy.

    I agree that the color coding won’t tell the whole story, though I think any transparency is good for the Wikipedia. What I think the color coding and the Wikiscanner can do is raise questions, even if they can’t answer all the questions they raise.

  3. IT is the one area where I trust Wikipedia. If I need to look up some obscure API, it’s usually there, and sometimes the documentation is better, or at least more thorough, than other docs.

    They seem to like automated solutions … looking into bots crawling my site led me to an article on the surging bot population at WP. Most of them scan a “recent changes” page for all kinds of spam and general vandalism. It seems fitting they would turn to bots for a solution, or part of one, to the trust problem.

  4. That’s interesting, Forrest. I’m not a big user of Wikipedia. At most I treat it as a shallow source of information. I might grab a quick definition of something or use it as a jumping-off point. That’s probably all it was meant to be anyway.

    I am a little tired of seeing it show up at the top of every search result, though.

    Do you go directly to Wikipedia looking for things or do you go there through search results pages?

  5. I’m curious to see the answer to that last question, too.

    Oh, yeah, and I heard Google is toying with the idea of relabeling the “I’m Feeling Lucky” button “Take me to Wikipedia.”

  6. John, I’d laugh, but it’s almost gotten to the point where they might as well change the button to take people to Wikipedia. The “I’m Feeling Lucky” button takes you to the first result, which more and more often is a page on the Wikipedia.

  7. I could count on my fingers the number of times I’ve clicked into Wikipedia through the SERPs. Generally, if I want to know about something and the internet is the best way to find what I need to know, the Wikipedia isn’t my first choice, or even in my top however many. I usually turn to local newspapers, which almost always publish their stories online, or to any number of sources I think are more reputable. I also try to find a couple of different sources that agree on something in any case, but much more so with the people’s encyclopedia.

    I tend to trust The Economist, The Christian Science Monitor (even though I’m an atheist), the BBC, England’s “Guardian,” Canada’s “Post & Mail” and a few others. I keep an eye out for them but take what I can get, and occasionally that takes me to Wikipedia, either because I can’t find the information elsewhere or because, in a lot of cases, WP has much longer and sometimes more detailed articles.

    So, pretty much any time I wind up reading a Wikipedia article, I’m in WP mode: I’ve either exhausted other sources or I’m looking for software development info, which is easily testable. And that’s one place they tend to excel.

  8. It’s good to know you don’t click on the Wikipedia for everything, though it does seem to be in every SERP at the moment. In a way that’s silly, because the people who keep clicking on Wikipedia links in search results will probably just start going there directly, skipping Google altogether.

    I think the Wikipedia can be a decent source of information for some things. As I mentioned, I use it more for quick, shallow looks at a subject, or maybe as a starting point if I can’t find anything else. I don’t click on Wikipedia links in search results either. If anything, I skip the link because it leads to the Wikipedia and I already know how to get there.

