Despite the progress we've made with social analytics, we're still debating the relevance and utilization of different types of mined data.
As with all successes in the digital age, social media has made the rapid transition from being new, cool and the domain of the younger generation to being simply part of the cultural landscape.
In doing so, it has fallen under the gaze of the business world, with analysts and CEOs alike falling over themselves to exploit this new medium of customer tracking.
Company after company has tapped into the Twitter firehose – the entire stream of Tweets sent across the world and beyond – but early sentiment analysis was notoriously unreliable. There have been improvements since, but with industry-leading accuracy rates still falling some way short of the desired standard for areas of analytics, does investing in social sentiment tracking present good value?
Whether you are a business, a government or a third-sector organisation, the ability to measure how positive or negative an individual feels about the product or service you provide is clearly immensely valuable. The ability to do this without having to directly engage with that individual is a CMO’s fantasy.
It doesn’t take much thinking about – you can have perceptions of your brand streamed directly to you in easily understandable form, and there is no risk that a customer will feel pestered or intruded upon.
Social sentiment analysis took off around 2010 and was based around word lists used to establish the emotions conveyed by a Tweet or other online post. Lists of positive and negative words and short phrases were given predefined sentiment values, and text was then given an overall score according to which ones it contained.
At first this approach was riddled with pitfalls – how do you code for double negatives or homonyms? How do you know whether a word like ‘sick’ is being used in the traditional or slang sense?
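The word-list approach and its pitfalls can be illustrated with a minimal sketch. The lexicon, its values and the function names below are invented for illustration and are not taken from any vendor's system; real word lists ran to thousands of entries.

```python
import re

# Hypothetical word list with made-up sentiment values.
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "awful": -2.0, "hate": -2.0,
    "sick": -1.0,  # scored in the traditional sense only
}

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def score(text):
    """Sum the predefined values of every listed word found in the text."""
    return sum(LEXICON.get(token, 0.0) for token in tokenize(text))

print(score("I love this phone, it is great"))  # 4.0, as intended
print(score("This is not good"))                # 1.0, the negation is ignored
print(score("That new album is sick"))          # -1.0, the slang sense is missed
```

The second and third results show the problem: a plain sum has no way to represent negation or to tell which sense of a word is meant.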
“Sentiment analysis is a very complex task for a machine because of the multiple and often unpredictable soft and hard variables that come into play when interpreting it. The main problem being that the sentiment of a sentence only rarely lies in the sentence itself and is instead rooted in the cultural context around that sentence”, said Francesco D’Orazio, CIO of social analytics firm FACE Group.
“This requires the algorithm to compute a vast amount of densely interconnected information to answer a fairly simple question in human terms. A bit like asking a Martian to tell us whether the statement “Margaret Thatcher is alive and well in modern Britain” is positive or negative”, said D’Orazio.
Initial accuracy rates in sentiment analysis were so low as to be virtually useless, and manual checking remained an attractive option, with the losses in speed offset by gains in reliability. But after a couple of experimental years, there is now little doubt that the foremost analysts have reached a stage where the insights they are delivering from social sentiment are adding real value for a huge variety of organisations.
Breaking through the 70% ceiling
DataSift, supplier of FACE Group’s data and the largest company in the world with permissions to analyse, repackage and resell feeds from the Twitter firehose, now achieves 70% accuracy with its sentiment tracking, a rate founder and CTO Nick Halstead claims is the highest credible figure in the industry.
“Anyone who says they’re getting better than 70% [today] is lying, generally speaking”, said Halstead.
“There has been a clear shift in the last three years – the difficulty with sentiment analysis really is about understanding the context of it, and the tech definitely has got better. We’re starting to bridge the gap, and we’re way beyond word lists now”, said Halstead.
DataSift and FACE Group have both turned to good old fashioned human intelligence as a means of testing and improving their algorithms.
“We use [task crowdsourcing marketplace] Amazon Mechanical Turk to test our results every single week, and we have to use three people per tweet or comment. Even with humans you only get an 85% accuracy because views differ on whether a word is positive or negative, and one person’s view of a subject may be different to that of a specialist. We look at about 6,000 tweets per week in this manner”, said Halstead.
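One plausible way to use three human raters per item is a majority vote, with the algorithm then scored against the human consensus. The article does not say how DataSift combines its raters' answers, so the sketch below is an assumption, and the labels and function names are invented for illustration.

```python
from collections import Counter

def majority_label(labels):
    """Return the label picked by more than half the raters, or None if there is no majority."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

def accuracy(predictions, annotations):
    """Share of items where the prediction matches the human consensus.

    Items on which the raters reach no majority are skipped.
    """
    scored = 0
    agreed = 0
    for predicted, labels in zip(predictions, annotations):
        consensus = majority_label(labels)
        if consensus is None:
            continue
        scored += 1
        agreed += predicted == consensus
    return agreed / scored if scored else 0.0

# Made-up ratings for four tweets, three raters each.
annotations = [
    ("pos", "pos", "neg"),
    ("neg", "neg", "neg"),
    ("pos", "neu", "neg"),  # no majority, so this tweet is skipped
    ("neu", "neu", "pos"),
]
predictions = ["pos", "pos", "neg", "neu"]
print(accuracy(predictions, annotations))
```

Using an odd number of raters makes ties less likely, although a three-way split is still possible when there are three labels, as the third tweet shows.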
DataSift carries out sentiment analysis on every single tweet in five languages – English, German, French, Spanish and Portuguese – and is in the process of adding Chinese, following expressions of interest from customers.
Its algorithms are crafted using natural language processing (NLP) techniques and are tuned separately for each language to take account of structural differences. The English language is often ridiculed for its complexity and labyrinthine grammar rules, but in sentiment analysis this can actually be a benefit.
“Complexity is actually better for computers because you have more and more detailed structure that can be ‘understood’ [by an algorithm]. Punctuation, for example, can be used by a computer to see which part of a sentence a negative qualifier is referring to”, said Halstead.
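Halstead's point about punctuation can be sketched with a common simplification: treat a negative qualifier as applying to every word up to the next punctuation mark. This is a generic technique, not a description of DataSift's algorithms, and the word list, values and names below are invented for illustration.

```python
import re

# Hypothetical word list with made-up sentiment values.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "slow": -1.0}
NEGATORS = {"not", "no", "never"}
CLAUSE_BREAKS = set(".,;:!?")

def score_with_negation(text):
    """Sum word values, flipping the sign of words that follow a negator
    until the next punctuation mark ends its scope."""
    total = 0.0
    negated = False
    for token in re.findall(r"[a-z']+|[.,;:!?]", text.lower()):
        if token in CLAUSE_BREAKS:
            negated = False  # punctuation closes the negation scope
        elif token in NEGATORS or token.endswith("n't"):
            negated = True
        else:
            value = LEXICON.get(token, 0.0)
            total += -value if negated else value
    return total

print(score_with_negation("The screen is not good, but the battery is great"))  # 1.0
print(score_with_negation("The screen is not good but the battery is great"))   # -3.0
```

With the comma, the negation stops at "good" and "great" keeps its positive value. Without it, the rule wrongly negates the whole sentence, which is why more structure in the text gives the algorithm more to work with.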
While human intelligence can help in the short term, a longer term solution to the context problem is carrying out separate analysis to mine demographic data from social media content.
“We are the only company in the world that has full demographic data for Twitter. We have location, age, gender, salary – all kinds of things”, said Halstead.
In light of ongoing events it is worth highlighting that this data is gathered using messages and other details posted publicly on Twitter – nobody is mining your DMs.
Following the publication earlier this year of Pew Internet research into the demographics of different social media networks, critics have questioned whether Twitter analysis really adds value for a company given the social network presents an already skewed social group, but Halstead disagreed.
“No sampled social group is representative of the whole world, but if you have demographics in the metadata, and you can marry that with the demographic data your business has about your customers in the real world, you can get incredibly accurate matches between the two”, he said.
This is where the benefits of social analytics to a CMO are at their most obvious. Data around a specific product – be it sales figures, web traffic or another source – can be easily matched to social data to answer questions like “what do females in the 30-40 age group think of product x, and how did that change in the hours following the launch of our new TV advertising campaign?”
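A question of that kind boils down to filtering posts by demographic segment and comparing average sentiment either side of an event. The sketch below assumes each post already carries a sentiment score and demographic metadata; the `Post` fields, the function name and all of the data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean

@dataclass
class Post:
    timestamp: datetime
    gender: str
    age: int
    product: str
    sentiment: float  # assumed scale: -1.0 (negative) to 1.0 (positive)

def segment_shift(posts, product, gender, age_range, launch, window_hours=6):
    """Mean sentiment for one demographic segment before and after a launch time."""
    low, high = age_range
    window = timedelta(hours=window_hours)
    segment = [
        p for p in posts
        if p.product == product and p.gender == gender and low <= p.age <= high
    ]
    before = [p.sentiment for p in segment if launch - window <= p.timestamp < launch]
    after = [p.sentiment for p in segment if launch <= p.timestamp < launch + window]
    return (mean(before) if before else None, mean(after) if after else None)

# Made-up posts around a made-up campaign launch.
launch = datetime(2013, 6, 1, 20, 0)
posts = [
    Post(datetime(2013, 6, 1, 18, 0), "female", 34, "x", -0.5),
    Post(datetime(2013, 6, 1, 19, 0), "female", 38, "x", 0.0),
    Post(datetime(2013, 6, 1, 21, 0), "female", 31, "x", 0.5),
    Post(datetime(2013, 6, 1, 22, 0), "female", 40, "x", 1.0),
    Post(datetime(2013, 6, 1, 21, 30), "male", 35, "x", -1.0),   # outside the segment
    Post(datetime(2013, 6, 1, 21, 0), "female", 25, "x", -1.0),  # outside the age range
]
print(segment_shift(posts, "x", "female", (30, 40), launch))  # (-0.25, 0.75)
```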
And we are talking about tangible examples here, not just best case scenarios, as the following example shows.
“Dell recently had a situation where sentiment suddenly went negative for a specific product. This was sudden and completely outside of expected sentiment. Dell was able to dig into the details and detect that a strong negative response was specific to a published price”, said Patrick Morrissey, vice president of marketing at DataSift.
“They conducted an internal review, made a global price change on the web and reversed sentiment back to positive. All this happened in less than twenty-four hours. This is something that would not have been noted for weeks or months, so the financial impact of being able to intercept this and action a change was quite substantial.”
In many ways this is big data in its purest form – the combining of multiple diverse data streams in pursuit of real or near-real time insights. Velocity, variety and volume – DataSift uses a custom version of Hadoop that allows its analysts to look individually at billions of social messages in any given query – are all there.
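Spotting a swing like the one in the Dell example means flagging a reading that falls well outside the recent norm. The article does not describe how the drop was detected, so the sketch below is only one simple possibility: compare each hour's average sentiment with the mean and standard deviation of the preceding hours. The function name, thresholds and figures are all invented.

```python
from statistics import mean, stdev

def find_sentiment_drops(hourly_scores, baseline_hours=24, threshold=3.0):
    """Return the indices of hours whose score falls more than `threshold`
    standard deviations below the mean of the preceding baseline window."""
    alerts = []
    for i in range(baseline_hours, len(hourly_scores)):
        baseline = hourly_scores[i - baseline_hours:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma > 0 and (hourly_scores[i] - mu) / sigma < -threshold:
            alerts.append(i)
    return alerts

# Made-up hourly averages: a steady day, a normal hour, then a sharp drop.
hourly = [0.4, 0.5, 0.6, 0.5] * 6 + [0.5, -0.6]
print(find_sentiment_drops(hourly))  # [25]
```

A production system would also need to handle quiet hours with few posts and seasonal patterns, which this sketch ignores.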
Should social analytics be the priority?
There are some who question the eagerness with which many businesses are throwing resources at social sentiment analysis, though, arguing that those resources would be better used to analyse more traditional CRM data.
“The money is not usually in social media, it’s in other non-structured data that enterprises have had for a long time, such as call records”, said Stephen Brobst, CTO of US data processing giant Teradata.
“The problem with social media analytics is that it’s very sexy, but myself? I’ll take the money over the sex.
“The most well-governed companies are looking at emails, customer service interactions and voice interaction, because it’s actually more actionable. It’s not that social media data is without value, but there are lower hanging fruit”, said Brobst.
But even if it is not the one-size-fits-all revenue generator its fiercest proponents would have us believe, there is a developing consensus that social analytics has a role to play in any successful big data strategy.
“Often the most valuable insight is based on transactional and behavioural data. But social media or sentiment analysis gives you more colour to inform your business decisions and actions. If I only had one choice I would take behavioural data every time, but neither is social media something to be ignored. It can add a richer, more human understanding to flesh out information by numbers”, said Peter Worster, partner of data consultancy Conduit.
It may sound bizarre at first, but the very process of interacting directly with others on social media limits the effectiveness of social analytics.
Threaded conversations are difficult enough to track without factoring in sentiment analysis – look at a conversation on Twitter where a statement was sent as multiple tweets and try to follow the dialogue – and establishing where sentiment was directed is up there with the most difficult tasks facing DataSift’s data scientists.
“We give the tools to our customers to enable them to track threaded conversations, but the power to understand whether negative or positive sentiment is related to a product or back to a previous comment is very much at the advanced end of this field”, said Halstead.
The other emerging trend – and one that uses Facebook just as much as Twitter – was showcased last summer during the Olympic Games. Marketing departments used social analytics to bring a data-driven approach to the task of establishing which athletes to sponsor following the Games.
“It wasn’t just tweets with #Olympics – that’s rubbish. We had a database of every named athlete, every athlete with a Twitter handle, with a Facebook fan page and so on. What people paid us for was analysing this data to see whether social activity [such as retweets and new followers] related to their participation or not. Brands then used this to decide which athletes to sponsor after the Games”, said Halstead.
Other social networks
You could be forgiven for thinking social analytics is 99% Twitter – it is, after all, the largest single source of publicly available social data – but Facebook Pages is another huge resource.
Comments, ‘likes’, link sharing and other metrics provide, in total, around 200m items per day for analysis. Most of this comes from celebrity fan pages and pages for brands and products, while a minority is from the personal profiles of those whose profiles are left publicly accessible.
Forums are another significant area – hundreds of millions of posts are left every day – and can prove especially valuable when you consider they are already focused around specific subjects.
Even its strongest advocates would not say social media analytics has fulfilled its potential, but the question appears to have moved from how much value it can add, to how many new revenue streams it can open.
guardian.co.uk © Guardian News & Media Limited 2010