Today I read the discussion on Slashdot about the article "Wikipedia 2.0 - now with added trust". Apparently the new version of Wikipedia will include some sort of trust model beyond what is available right now. This did not surprise me, since I have thought for a while that Wikipedia needs more tools to help ensure the quality of its content.
Recently I have found the site indispensable for simply satisfying my curiosity from time to time. I like watching the History Channel, and if I hear something that I want to know more about I will just type it into Google; if it's a well-known topic, the first result is usually a Wikipedia article. At work I sometimes go to Wikipedia articles when I want to get a first idea about some technology I am looking into. I got a bad case of poison ivy last week, and although I was not able to diagnose it by looking at the list of skin diseases (too bad no visual search exists yet), I was able to find out everything I needed to know to confirm the diagnosis from my doctor and even get a pretty good idea of which over-the-counter meds could help me. So, obviously, the site is important. Yes, the reliability is not guaranteed, and yes, I sometimes find it necessary to read a second source to get a clearer idea (especially in the poison ivy situation), but overall the site is becoming so ubiquitous that more trust in its contents will only benefit the general public. It should be in everyone's interest that tools are being worked on to help find vandalism more efficiently, score editors and, most importantly, facilitate better research!
As far as the technology of verifying trust goes, things are not clear cut. It's going to be harder to develop helpful tools than the article makes it sound. To me this stuff is basically still in the research lab stage. I watched a Google tech talk by one of the Wikipedia chief programmers. The sense I got from that talk is that Wikipedia has bigger technical issues to spend its resources on. For one, they need better editing functionality. They need scalability beyond what their current architecture can provide. And they need UI improvements in many places. If anything, they need to start structuring their articles in a more structured form than the hodgepodge of wiki tags (although I am not sure whether their current approach is actually more effective overall). Anyway, if anything comes out of the metawiki related to the trust model, at best it will be experimental at first.
One of the reasons why I think integrating a trust model is hard is as follows. The computers will have too much say in deciding what is trustworthy and what is not. Although this kind of works for Google in deciding which web pages are relevant, it won't work as well for Wikipedia. If Google misinterprets a web page as relevant there is not much of an implication, but if Wikipedia makes mistakes many people will lose trust in the system. As the article suggested, another side effect will be that certain rogue users will build up trust ratings for the sole purpose of increasing trust in some edit that they would, say, be paid for. In effect, getting a high rating will become a goal in itself. Instead of trying to do really hard statistical analysis on the data and protecting against all the potential spammers who would try to profit from it somehow, a few simple policy tweaks can go a long way.
Here is a quick list of tweaks that I think can go a long way toward increasing trust in Wikipedia without trying to build an overly complex system:
- Stop allowing anonymous edits. The IP trails have become kind of an irrelevant distraction from the issue at hand. It does not matter that the IP of the edit belonged to whatever company. The IP does not represent that company, and too many words have been written on blogs about this pointless metric. Take out the IP tracking for anonymous users and only allow individual users to take responsibility for their edits. Of course the IP can still stick around in the log files, and those can still be provided to everyone as they are now, but that's about it.
- Better automation for people who want to do fact checking and reviews of changes to articles. Right now there is no clearing house for this. There is a raw feed of edits, but that is less useful because the edits come at a rate that's impossible to track. Perhaps the edits should be distributed in a collaborative way to different reviewers and put into an inbox for them to handle. An edit can go to two people, and if the two disagree the software can escalate the review of that edit. This feature would be in addition to all the watch lists that people use.
- Put people into expert groups and provide them with the materials they would need to do proper research. This is kind of hard too. The internet is not always the best research source. My poison ivy case is a perfect example. I found very little scientific information on the web on the topic of "contact dermatitis". Apparently medical researchers have little interest in a condition that can generally be considered an annoyance; never mind that it generates about a million ER visits annually in the US. For the few facts that had sources cited in the Wikipedia article, I found the sources to be no more than first-page Google results and particularly weak in their authority. For one of the sections I suspected that the manufacturer of the product may have played a role in its edit, although it was impossible for me to verify. (Good luck letting AI do it when even a human cannot.)
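
To make the reviewer inbox idea from the list above a bit more concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the `ReviewQueue` class, the round-robin way of picking two reviewers, and the status names are my own assumptions and not anything Wikipedia or MediaWiki actually provides. I am also assuming that "disagree" means the two reviewers reach different verdicts on the same edit.

```python
from collections import defaultdict
from itertools import cycle


class ReviewQueue:
    """Hypothetical sketch: send each edit to two reviewers, escalate on disagreement."""

    def __init__(self, reviewers):
        unique = list(dict.fromkeys(reviewers))  # drop duplicates, keep order
        if len(unique) < 2:
            raise ValueError("need at least two distinct reviewers")
        self._rotation = cycle(unique)
        self.inboxes = defaultdict(list)   # reviewer -> edit ids waiting for review
        self.verdicts = defaultdict(dict)  # edit id -> {reviewer: approved?}

    def submit(self, edit_id):
        # Assumption: plain round-robin. The next two reviewers in the
        # rotation each get the edit dropped into their inbox.
        pair = (next(self._rotation), next(self._rotation))
        for reviewer in pair:
            self.inboxes[reviewer].append(edit_id)
        return pair

    def record(self, edit_id, reviewer, approved):
        # Only a reviewer who actually has the edit in their inbox may vote.
        if edit_id not in self.inboxes[reviewer]:
            raise ValueError(f"{reviewer} was not assigned {edit_id}")
        self.inboxes[reviewer].remove(edit_id)
        self.verdicts[edit_id][reviewer] = approved
        return self.status(edit_id)

    def status(self, edit_id):
        votes = list(self.verdicts[edit_id].values())
        if len(votes) < 2:
            return "pending"
        if votes[0] != votes[1]:
            return "escalated"  # the two reviewers disagree
        return "accepted" if votes[0] else "rejected"


queue = ReviewQueue(["alice", "bob", "carol"])
first_pair = queue.submit("edit-1")
queue.record("edit-1", first_pair[0], approved=True)
print(queue.record("edit-1", first_pair[1], approved=False))
```

A real system would obviously need more than this (picking reviewers by topic, timeouts for inboxes nobody reads, and somewhere for the escalated edits to go), but the point is that the routing itself is simple bookkeeping and not hard statistics.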