New research suggests high positive ratings may not be a true reflection of an item's value or quality.
How much do you trust the online ratings of articles and posts on websites? New research suggests high positive ratings may not be a true reflection of an item’s value or quality, but are partly due to “herd” effects, where people end up liking something that is already well liked by others.
Negative ratings, on the other hand, do not seem to affect the overall rating of an item, since other users seemed more likely to “correct” these with positive ratings further down the line, according to a study of tens of thousands of comments on a news aggregation website.
The findings have implications for how consumers interpret positive ratings for comments, articles or products on websites and also how ratings websites design their platforms to avoid fraud and manipulation by organisations that would seek to artificially boost a product’s perceived appeal.
Ratings are used on many websites, from Amazon to Reddit, where users vote on whether they like or dislike an article, product or online comment. The ratings are often used to rank the perceived online appeal of those items.
“Real and important decisions are made based on these ratings and, in fact, these rating systems are a big part of consumers’ confidence in e-commerce transactions,” said Sinan Aral of the Massachusetts Institute of Technology’s Sloan Business School, who led the latest research. “[Consumers] rely on them to judge the quality of products and services online.”
His team looked at the users’ ratings of posts on a news aggregation website that worked in the same way as Reddit. On the site, users can post articles and then others vote to like them (up-voting) or dislike them (down-voting). The overall score of a post, and its prominence on the website, is determined by subtracting its down-votes from its up-votes.
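The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of "up-votes minus down-votes": the `Post` class, the `rank_posts` helper and the sample vote counts are hypothetical and are not taken from the website the researchers studied.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record for one submitted article.
    title: str
    up_votes: int = 0
    down_votes: int = 0

    @property
    def score(self) -> int:
        # Overall score as described in the article: up-votes minus down-votes.
        return self.up_votes - self.down_votes


def rank_posts(posts):
    # Higher-scoring posts come first. Ties keep their input order,
    # which is an assumption made for this sketch.
    return sorted(posts, key=lambda p: p.score, reverse=True)


# Made-up vote counts, for illustration only.
posts = [
    Post("A", up_votes=3, down_votes=1),
    Post("B", up_votes=10, down_votes=2),
    Post("C", up_votes=1, down_votes=4),
]
ranked = rank_posts(posts)
print([(p.title, p.score) for p in ranked])
```

Because prominence follows the score, a single early vote moves a new post up or down the page before anyone else has judged it.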
Over the course of the study, the team randomly assigned an “up” or “down” or “no” vote to around 100,000 new posts, and they watched to see what happened next. The results, published in the journal Science, showed that the subsequent overall score of a post was increased by 25% when it got a single, random up-vote as soon as it was initially published. These posts were also 30% more likely to achieve very high scores, defined as 10 or over, where the average score on the website was 1.9.
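The random assignment at the heart of such an experiment might look something like the sketch below. The article does not say how often each type of vote was assigned, so the equal probabilities here are an assumption, as are the names `ARMS`, `assign_arm` and `initial_score`.

```python
import random

# The three experimental conditions; "none" is the untouched control group.
ARMS = ("up", "down", "none")


def assign_arm(rng: random.Random) -> str:
    # Assumption: each condition is equally likely. The real study's
    # assignment probabilities are not given in the article.
    return rng.choice(ARMS)


def initial_score(arm: str) -> int:
    # Starting score a new post receives from the experimenters' vote.
    return {"up": 1, "down": -1, "none": 0}[arm]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
assignments = [assign_arm(rng) for _ in range(1000)]
counts = {arm: assignments.count(arm) for arm in ARMS}
print(counts)
```

Since the condition is chosen by chance rather than by the quality of the post, any later difference in average scores between the groups can be attributed to the initial vote itself.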
No such herding effect was seen, however, for down-votes. Aral said that, on seeing an initial down-vote for a post, large numbers of people seemed to distrust the rating and would often “correct” it. “We didn’t test the psychology of the people but one intuitive explanation is that we’re more sceptical of negative social influence and we’re more willing to go along with positive social influence.”
The net result of the corrections was that there was no significant change in the score for posts that had initially been assigned down-votes randomly by the researchers.
Suw Charman-Anderson, a social technologist, said the results were not surprising. “From my experience with social systems, I would have guessed that people would be more likely to correct a down-vote that they disagreed with than an up-vote, just on the basis that a ‘wrong’ down-vote feels much more unfair than a ‘wrong’ up-vote.”
Aral said that the results of the research might sound positive “in a kumbaya way” but the implications were stark. “It’s not positive when you think about the runaway positivity of stockmarket bubbles and housing bubbles and things like that, where things get systematically overvalued relative to their actual value and eventually the bottom falls out of them,” he said. “The housing bubble is a situation where people are saying, ‘oh the last guy paid a lot for the house, it must be that houses are worth more than I thought and I’m going to pay more for the house’ and so on and so forth.”
Charman-Anderson said there was no doubt that existing liking, rating and commenting systems on the web were, for the most part, inadequate. “I’ve seen estimates that something like one third of all ratings on e-commerce sites are fake, for example, and there seems to have been little development in rating system design over the last few years,” she said.
Bernie Hogan, social media expert at the Oxford Internet Institute, said: “While we would think up-votes and down-votes are equivalent, they’re not, they mean different things. Liking something is partially a way to regulate what sort of information we want other people to see. Whereas down-voting is a very personal matter, about a personal opinion, which is why we see much fewer down-votes on Reddit or YouTube.”
Aral’s team also found that the herding effects differed depending on topic. Topics such as culture and society, business and politics saw lots of herding behaviour, whereas general news and economics were less affected. “One might say politics, culture and society are more subjective and socially constructed, potentially polarised,” said Aral. “Whereas general news and economics is a little bit more factual, a little bit less subjective. I don’t want to hang my hat on that and claim that, because who’s really to know which of these is more objective and subjective?”
People might have strong opinions on news stories but they have little to lose or gain from specific ratings they might have. “However, on e-commerce sites, suppliers and manufacturers (and to some extent, fans) have a huge interest in seeing their particular product get high ratings and glowing reviews,” said Charman-Anderson. “This leads to such things as, for example, authors buying positive Amazon reviews in bulk, or using ‘sockpuppet’ accounts to give themselves high ratings and reviews, and competing authors poor ones.”
Aral said his research highlighted the potential for fraud and manipulation by those who wanted to artificially boost an item’s ranking and perceived status. “If the business can up-vote their comment they’re going to get this snowballing of positive comments or scores and it creates this unfortunate incentive for fraud and manipulations.”
For platforms that provide ratings, he said, it was important to understand the herding behaviours in order to design systems that are better-protected against this type of bias. The next step of research in this area, he said, should be to test some different systems to see if there are designs that are more immune to this type of herding bias – perhaps by withholding overall ratings on an item until after a user has voted with their own opinion.
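One way to picture the "withhold the rating until the user has voted" idea is the short Python sketch below. It is not a design the researchers tested or proposed in detail; the `BlindRatedItem` class and its methods are hypothetical names invented for this illustration.

```python
from typing import Dict, Optional


class BlindRatedItem:
    """Hypothetical item whose score is hidden from users who have not voted."""

    def __init__(self) -> None:
        self._votes: Dict[str, int] = {}  # user id -> +1 (up) or -1 (down)

    def vote(self, user_id: str, value: int) -> None:
        if value not in (1, -1):
            raise ValueError("vote must be +1 or -1")
        # A repeat vote from the same user replaces the earlier one.
        self._votes[user_id] = value

    def visible_score(self, user_id: str) -> Optional[int]:
        # Users who have not yet voted see nothing, so the existing
        # score cannot sway their own judgment.
        if user_id not in self._votes:
            return None
        return sum(self._votes.values())


item = BlindRatedItem()
item.vote("alice", 1)
item.vote("bob", 1)
before = item.visible_score("carol")  # carol has not voted yet
item.vote("carol", -1)
after = item.visible_score("carol")   # now the score is revealed
print(before, after)
```

Whether hiding the score in this way would actually reduce herding is an open question that, as Aral suggests, would need to be tested.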
Aral said that his intent with the research was not to do away with online ratings. “The aggregation of independent opinion or activity, these are all big opportunities for our society to use the internet to provide useful tools for making decisions and taking collective action and so on. I’d hate for this to be taken as a condemnation of collective intelligence and the wisdom of crowds. This is a paper that’s trying to improve and help those processes.”
Hogan also warned against making too many generalisations from behaviour on a single website. “There are behaviours that are different in different websites. Sites that are set up in different ways are going to have completely different mechanics. This is not the same thing as liking on Facebook and we can’t equate all of these subtle behaviours as similar just because they might look similar.”
guardian.co.uk © Guardian News & Media Limited 2010