3 November 2017
Google is definitely one of the smartest businesses around. If they can harvest the brain power of 750 million internet users to digitize books, which internet company could be smarter?
If you haven’t been living under Olumo rock for the last decade, you’ve probably seen this:
It’s called a Captcha, and it was used to make sure that a website user is actually human, because computers had (and, to an extent, still have) a tough time reading text like that.
It literally stands for “Completely Automated Public Turing Test to Tell Computers and Humans Apart.”
And when thousands of websites started using it in the mid-2000s to ensure robots don’t screw them over, roughly 200 million captchas were typed every single day.
And assuming each one takes about 10–12 seconds, that’s approximately a gazillion minutes (more precisely, 33 to 40 million minutes, or well over half a million hours, every day) of essentially wasted human effort.
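If you want to check that back-of-the-envelope math yourself, here is a minimal sketch. It only uses the two figures quoted above (200 million captchas a day, 10–12 seconds each); the function name `daily_effort` is made up for illustration.

```python
# Figures quoted in the post; everything else here is just arithmetic.
CAPTCHAS_PER_DAY = 200_000_000


def daily_effort(seconds_per_captcha):
    """Return the total human time spent on captchas in one day."""
    total_seconds = CAPTCHAS_PER_DAY * seconds_per_captcha
    return {
        "minutes": total_seconds / 60,
        "hours": total_seconds / 3600,
    }


if __name__ == "__main__":
    # Low and high ends of the 10-12 second estimate.
    for seconds in (10, 12):
        effort = daily_effort(seconds)
        print(
            f"{seconds}s each: {effort['minutes']:,.0f} minutes "
            f"({effort['hours']:,.0f} hours) per day"
        )
```

So “a gazillion” works out to somewhere between 33 and 40 million minutes a day, depending on how fast you type.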
Luis von Ahn, one of the creators of Captcha, explains in this amazing TEDx talk how he realized that this massive human effort could be put to use digitizing books, making them searchable and easily accessible.
So, they built reCaptcha. You may have noticed that at some point, you had to type two words instead of just one:
Here, one word is used to genuinely check if you’re human. The other word comes from a scanned copy of an old book and is shown to you to, basically, ask you to digitize it.
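The two-word trick can be sketched in a few lines of code. This is a toy illustration, not reCaptcha’s actual implementation: the class name `TwoWordChallenge`, the word IDs, and the rule that three matching answers settle a word are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical threshold: how many matching answers we want before we
# trust a transcription of the unknown word.
VOTES_NEEDED = 3


class TwoWordChallenge:
    """Toy model: one known control word, one unknown scanned word."""

    def __init__(self):
        # unknown word id -> Counter of the transcriptions people typed
        self.votes = {}

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Return True if the user passes the human check."""
        # Step 1: the control word has a known answer, so it is the real test.
        if control_answer.strip().lower() != control_truth.strip().lower():
            return False
        # Step 2: the user looks human, so count their reading of the
        # unknown word as one vote.
        counter = self.votes.setdefault(unknown_id, Counter())
        counter[unknown_answer.strip().lower()] += 1
        return True

    def transcription(self, unknown_id):
        """Return the agreed transcription, or None if there are too few votes."""
        counter = self.votes.get(unknown_id)
        if not counter:
            return None
        word, count = counter.most_common(1)[0]
        return word if count >= VOTES_NEEDED else None


if __name__ == "__main__":
    challenge = TwoWordChallenge()
    for _ in range(3):
        challenge.submit("morning", "morning", "scan-17", "upon")
    print(challenge.transcription("scan-17"))
```

The point of the design is that you never know which word is the test, so you have to type both properly, and anyone who fails the control word gets their guess for the unknown word thrown away.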
Google then acquired it in 2009. And guess what? ReCaptcha was used to help digitize the books on Google Books, to make them searchable.
We all did it together. And we did it for free!
And Google has moved on to image recognition using reCaptcha to get labeled data sets for its AI research:
And we are still doing it for free. Time to add “Senior Data Labeling Expert at Google Inc” to the resume.