Twitter Predicts the Future
Chatter on Twitter can accurately predict the box-office revenues of upcoming movies weeks before they are released. In fact, tweets can predict the performance of films better than market-based predictions, such as those of the Hollywood Stock Exchange, which had been the best predictors to date.
The Social Computing Lab at HP Labs in Palo Alto, CA found that the rate at which movies are mentioned was enough, on its own, to predict future revenues. When the sentiment of each tweet was factored in (how favorable it was toward the new movie), the prediction became even more exact. To quantify the sentiments in 3 million tweets, the team used anonymous human workers recruited through Amazon Mechanical Turk to rate a sample of tweets, and then trained an algorithmic classifier to derive a rating for the rest.

The graph above compares predicted versus actual box-office scores for predictions based on tweet rates (blue line) and on the Hollywood Stock Exchange (green line).
Bernardo Huberman, the chief investigator on this work, says that they predicted the outcomes of new movies released in November and December 2009 and January 2010, including Avatar, Invictus, The Blind Side and Twilight.
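To make the tweet-rate idea concrete, here is a minimal sketch of fitting a straight line from tweet rate to revenue and using it to predict a new film. It is not the authors' model: the one-variable least-squares fit, the function names `fit_line` and `predict`, and every number in it are assumptions made up purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted revenue for a film with tweet rate x."""
    return slope * x + intercept


# Hypothetical, made-up values (NOT data from the paper):
# tweets per hour before release, and opening revenue in $ millions.
tweet_rates = [100.0, 200.0, 300.0, 400.0]
revenues = [12.0, 22.0, 32.0, 42.0]

slope, intercept = fit_line(tweet_rates, revenues)
forecast = predict(slope, intercept, 500.0)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, forecast={forecast:.1f}")
```

A sentiment score could be added as a second input in the same spirit, but that would need a multi-variable fit, which this sketch leaves out.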
Of course, predicting movie revenues is only a tidy test case. If you can use Twitter to predict the future of movie tickets, then why not elections, or sales of other products? As the authors write:
This method can be extended to a large panoply of topics, ranging from the future rating of products to agenda setting and election outcomes. At a deeper level, this work shows how social media expresses a collective wisdom which, when properly tapped, can yield an extremely powerful and accurate indicator of future outcomes.
The PDF version of the paper Predicting the Future With Social Media by Sitaram Asur and Bernardo Huberman is short and clear.
Works fine until people realize it works, then they start gaming it, and it stops working.
Posted by Stephen Downes on April 1, 2010 at 7:02 PM
@Stephen Downes I completely agree! My sentiments exactly.
Posted by Colley1962 on April 2, 2010 at 3:08 AM
@Stephen, “gaming” massive amounts of data is hard to do without incentive. How can tens (perhaps hundreds) of thousands of people be compelled to fake tweets about a movie? Tweet bots?
Question: does this predictive quality of Twitter push an outcome, like a self-fulfilling prophecy, or is it simply a passive indicator? If the latter, I see little incentive for gaming. And, as the summary implies, applications for this metric seem to go far beyond entertainment, especially as higher population percentages align their lives into virtual space.
Posted by John on April 2, 2010 at 5:20 AM
This makes sense, provided Surowiecki’s rules of a wise crowd apply: Diversity of opinion (yes), Independence (some opinions may be determined by others, but not everyone follows everyone else), Decentralization (yes) and Aggregation (available).
Will be interesting to see how it goes. I would suggest that it is far less likely to be gamed than good old-fashioned surveys - there are lots of other reasons to tweet!
Posted by Keith De La Rue on April 2, 2010 at 6:01 AM
Bernardo has done it again. This shows that the complex dynamics of word of mouth (tweets) and a growing positive assessment of the film drive its success, as I show in my book Hollywood Economics.
Posted by Arthur De Vany on April 2, 2010 at 8:46 AM
Imagine how effective Twitter could become as a prediction engine by changing their present question to a new one. Not “What’s happening?”, but “What are you up to?” or “What’s next?” Not “What are you doing now?” but: “What now?”
Posted by Monique van Dusseldorp on April 2, 2010 at 9:14 AM
Kevin,
Fascinating link. As others have noted, there is the concern about gaming the system, but it’s not so easy to do for mass market items. I’d guess there are lots of applications for this kind of analysis, such as in publishing.
James
Posted by James Rafferty on April 2, 2010 at 10:43 AM
They used 3 million tweets from hundreds of thousands of users. Good luck gaming that system, which has over 100 million users.
Posted by Bob on April 2, 2010 at 10:57 AM
The beauty of statistical analysis is that it is resilient to manipulation, as it requires a large corpus of data, which by its nature is difficult to manipulate in a statistically significant way.
Posted by christian on April 2, 2010 at 11:50 AM
Call me a nitpicker, but the mechanism here seems to be - collect from Twitter a large number of the opinions of moviegoers on whether they are excited about an upcoming movie. People usually tweet about things that have their attention…
Or, to say it another way, the fact that people like to talk on Twitter, in an honest manner about “coming attractions” that have caught their notice. Effectively this allows you to conduct an opinion poll about the movie with a massive sample size. This then turns out to have really good predictive power.
This doesn’t seem particularly shocking, and it likewise seems to point to serious limitations in the predictive power: it only works for items that people talk about spontaneously and unguardedly, and whose consumers the Twitter population is a good sample of.
It’s great news for someone in the movie business, distributors I guess, because they can choose not to screen films that are not going to be popular. I’ve no idea if the Twitter population reflects the voting population well enough to predict an election…
Posted by Indy on April 2, 2010 at 12:03 PM
Movie attendance and election outcomes are socially determined … I can see how Twitter can be good for predicting those (assuming that demographic biases among twitterites do not lead to a highly skewed sample). For complex phenomena, this method might be limited.
Posted by Jonathan Byron on April 2, 2010 at 12:57 PM
If this became an accepted predictive method, couldn’t it be easily manipulated, therefore countering its predictability?
Posted by bulldogmi on April 3, 2010 at 2:09 AM
Gaming the system is so easy today with the huge amount of spam and bots flooding Twitter and already gaming the Twitter trends.
Posted by Fred on April 3, 2010 at 6:23 AM
Anyone got a link to the data? I want to make a better graph - a square graph with the same units on both axes.
Please email me at ben@benatkin.com in case I forget to check back.
Posted by Ben Atkin on April 4, 2010 at 8:06 PM
Have the researchers put their money where their mouths are and used this to place Intrade bets?
Posted by Aaron Davies on April 6, 2010 at 5:10 PM
I’m not sure that the system would be as hard to game as some people say; it would take some resources, but you can find people who are capable of creating buzz on Twitter and get them to talk about the movie, which would lead to additional buzz, etc.
But the question, then, is have you actually gamed the system, or are you doing what you should probably be doing anyway, which is getting thought leaders to talk about your product?
Posted by ptp on April 10, 2010 at 11:07 AM
Initially I thought this was an indication against the superiority of prediction markets touted so prominently by Robin Hanson of George Mason University (http://www.overcomingbias.com). But of course Hanson has an answer - he also mentions the gaming issue - but basically the reply is that HSX would become an even better predictor the second the market had access to this Twitter information: http://www.overcomingbias.com/2010/03/masking-movie-manipulation.html
Posted by Sebastian Franck on April 14, 2010 at 1:13 AM
Great read and agree with the notion, as we are starting to implement this methodology in analysis ourselves at Media Logic (www.mlinc.com).
Goes to show what we can truly gauge from analyzing the conversations around our brand in reference to qualitative data, but more importantly it shows that developing a strong social marketing strategy and placing social at the center (Conversation Centric marketing) of your marketing and / or business can help you positively affect your brand’s perception, whether that be in advocacy or in product development.
For a current example take a look at Ford’s campaign “The Ford Story” (http://ow.ly/1AGfO) and for a movie-related example look back to 1999 (Yes, I did say 1999 and social marketing in the same paragraph) at The Blair Witch Project, which was a story outlined by its creator on the web prior to getting investment and writing the screenplay. $23,000 to produce $92 million in revenue definitely speaks for itself…
Posted by Michael Smith on April 20, 2010 at 6:07 AM
Interesting post. At first when I began to comment, I was going to argue that this predictive analysis is most accurate when the movie’s demographic matches Twitter’s demographic. However, I think Twitter’s demographic are moviegoers. I was trying to think of a movie that caters towards the older generations - the ones less likely to tweet, the ones most likely retired and out of the corporate world, and therefore not on their computers 8+ hours a day! Despite the fact that I couldn’t think of an example movie, I think the same thought applies. Fewer people will tweet about it and less revenue will be brought in. Even though Twitter’s demographics are skewed towards the younger generation, I think they align with moviegoer demographics anyway, so it doesn’t matter what the movie is about; the predictive analysis will still be accurate.
Coming from the technical side, my company would be most interested in the predictive modeling and algorithms the analysts used. That is our business, so we’re excited to see others are catching on!
Thanks for the post! Ellen O’Neal www.livelogic.net
Posted by Ellen O'Neal on August 19, 2010 at 7:26 AM


Three thoughts come to mind: 1. This effect is probably a lot more pronounced when the target audience is younger and more active online. 2. No matter the buzz, some movies are so flawed they tank anyway. 3. By extension, couldn’t the studios or directors float their ideas and gauge the potential before spending a dime? That might save us from Rocky IX.
Posted by Alan on April 1, 2010 at 3:30 PM