The Common Crawl Foundation has made 5 billion indexed web pages readily available for free.

The Common Crawl Foundation has indexed 5 billion web pages and made the data available to anyone for free on the Amazon EC2/S3 cloud computing infrastructure. In practice, this means that tech innovators looking to challenge Google and create the next best search engine can do so more easily, quickly and cheaply.

To access the information, users will need to set up their own Amazon EC2 Hadoop cluster and pay for the time they use it. There are no upfront costs for running Hadoop on Amazon EC2; usage is billed per instance hour.
