Researchers Predict Future Epidemics Using Old Newspapers

Scientists have created software that analyzes past events in order to predict future health threats and events.

There’s an old axiom asserting that history repeats itself, and those who fail to learn from it are bound to make the same mistakes. With computers and data processing programs, it’s becoming easier to see the patterns in the cycles of history – now it’s time to learn from them.

Which is exactly what Eric Horvitz and Kira Radinsky are attempting to do.

Horvitz, a researcher at Microsoft, and Radinsky, a researcher at the Technion-Israel Institute of Technology, have developed software that analyzes past events and health-related headlines to forecast future real-world events. The system works by:

identifying significant increases in the likelihood of disease outbreaks, deaths, and riots in advance of the occurrence of these events in the world.

The published paper, ‘Mining the Web to Predict Future Events’, reports their findings and offers a platform on which to build a more practical program. The software developed by Horvitz and Radinsky analyzes, qualifies, and even contextualizes online archival data that mentions natural events (like droughts), deaths, and health-related epidemics. The analytical and ‘learning’ ability of the software

builds predictive models that generalize from specific sets of sequences of events to provide likelihoods of future outcomes, based on patterns of evidence observed in near-term newsfeeds.

The software draws on articles, headlines, and data from over 90 sources spanning the 1980s to the early 2000s, including The New York Times, Wikipedia, Freebase, and WordNet.

[Figure: prediction flowchart]

Predictive software of this nature could allow for more proactive alerting systems, which could lead to greater preparedness and even a reduction in disease outbreaks, deaths, and related riots worldwide. One of the examples highlighted in the paper is a drought in Angola in 2006 that precipitated an outbreak of cholera in 2007, which the software successfully predicted based on the analyzed patterns. Amazingly, the precision of the forecasts for natural events/disease outbreaks, deaths, and riots ranged from 70% to 90%.

[Figure: Angola cholera example]

While researchers do much of the current analysis on health trends manually, a clear advantage of the software is highlighted in the conclusion of the research paper:

Beyond knowledge that is easily discovered in studies or available from experts, new relationships and context-sensitive probabilities of outcomes can be discovered with such automated analyses. Systems employing the methods would have fast and comprehensive access to news stories, including stories that might seem insignificant but that can provide valuable evidence about the evolution of larger, more important stories.

Businesses have been practicing forecasting for years. Futures markets, stocks, and futurists are rooted in the idea of predicting coming trends based on historical data, so it’s no wonder that the same approach can be applied to natural and human-related events, which tend to be similarly cyclical. One startup, Recorded Future, has already received funding from Google and the CIA and works to

constantly collect news, blogs, and public social media…[to] identify the events: past, present, and future…[and] help you find predictive signals in the noise of the web.

Twenty-two years of data to analyze is a relatively small sample set, which is why Horvitz aims to gather and analyze more data that digs further into the past. The foundation of the software laid out in the ‘Mining’ research paper could prove especially useful in underdeveloped countries that are held under the constant sway of recurring natural events.

Mining the Web to Predict Future Events

Recorded Future

Images via ‘Mining the Web’ and Flickr