Harnessing Existing Brain Power with reCAPTCHA
If you have trouble with those distorted words while signing up for something online, you’re not alone. The tests are officially known as CAPTCHAs, which stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.” Even after decades of research, computers can’t seem to decipher mangled characters the way a human brain can. Luis von Ahn, a computer scientist at Carnegie Mellon University, has developed a security system called reCAPTCHA that keeps that tried-and-true robot filter in place but harnesses that basic human ability for research. Von Ahn’s system pairs a real security word with a word from a historical document that computer-based optical character recognition can’t figure out. If three separate users agree on what the word says, it’s accepted and entered into the digital record.
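For the curious, here is a minimal sketch of the pairing-and-agreement idea described above. It is not reCAPTCHA's actual code: the class names, the `submit` method, and the rule that only users who get the security word right have their guess counted are all assumptions made for illustration. The only detail taken from the description is the threshold of three agreeing users.

```python
from collections import Counter

AGREEMENT_THRESHOLD = 3  # from the article: three separate users must agree


def normalize(text):
    """Compare answers without regard to case or surrounding spaces."""
    return text.strip().lower()


class WordPairChallenge:
    """Hypothetical challenge: a known security word plus an unknown scanned word."""

    def __init__(self, control_word, unknown_word_id):
        self.control_word = control_word
        self.unknown_word_id = unknown_word_id


class Transcriber:
    """Hypothetical vote collector for words that OCR could not read."""

    def __init__(self, threshold=AGREEMENT_THRESHOLD):
        self.threshold = threshold
        self.votes = {}     # unknown_word_id -> Counter of proposed readings
        self.voters = {}    # unknown_word_id -> set of user ids already counted
        self.accepted = {}  # unknown_word_id -> reading that reached the threshold

    def submit(self, challenge, user_id, control_answer, unknown_answer):
        """Return True if the user passes the human test (security word correct)."""
        if normalize(control_answer) != normalize(challenge.control_word):
            return False  # failed the known word, so the guess is not counted

        word_id = challenge.unknown_word_id
        if word_id in self.accepted:
            return True  # already settled; nothing more to record

        voters = self.voters.setdefault(word_id, set())
        if user_id in voters:
            return True  # assumption: one vote per user, so agreement is "separate"
        voters.add(user_id)

        counts = self.votes.setdefault(word_id, Counter())
        reading = normalize(unknown_answer)
        counts[reading] += 1
        if counts[reading] >= self.threshold:
            self.accepted[word_id] = reading
        return True


if __name__ == "__main__":
    transcriber = Transcriber()
    challenge = WordPairChallenge(control_word="harbor", unknown_word_id="scan-001")
    for user in ("ann", "bob", "cal"):
        transcriber.submit(challenge, user, "harbor", "morning")
    print(transcriber.accepted)
```

In this sketch, a user who mistypes the security word is rejected and the guess at the unknown word is thrown away, which is one plausible way to keep bots and careless typists from polluting the record.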
He has paired up with the New York Times to digitize its archives stretching back to 1851. According to his calculations, 200 million people a day are spending an average of 10 seconds deciphering these words before moving on with their registration or purchase. With just a fraction of that unparalleled access to human brainpower, the project has been able to digitize 1.3 billion words so far with 99% accuracy. So the next time you bemoan having to fill out a form, keep in mind you may be reading a word from the Civil War. NPR interviews Marc Frons from the project:
Marc Frons, chief technology officer of digital operations for The Times, says the pace is astonishing. Each month, the project digitizes about two years’ worth of newspapers.
“Next year, if all goes well, we can do as many as 70 years, which would be almost the entire rest of the archive that is not digitized,” says Frons. “It’s just pretty cool when you’re signing up for a Web site and you see the reCAPTCHA sign. You sort of know, ‘Gee, I’m helping digitize part of The New York Times.’”