Monday, August 6, 2007

Amazon EC2, S3 + Hadoop = Open Source Utility Computing on Google Scale

Google is the undisputed king of scalability. The High Scalability blog has collected the open secrets of the Google architecture. Two important components are the Google File System (GFS) for storage and MapReduce for processing.
These distributed technologies are the foundations of Google's scalable and reliable storage and processing clusters.

The open source Hadoop project implements these functions, so you can build a Google-like cluster in your own data center. If you do not have hundreds of servers at your disposal, ask Amazon for help: the Amazon EC2 and S3 utility computing services can host your cluster at a reasonable price. Check out the tutorial by Tom White, which illustrates how to use Hadoop and Amazon Web Services together to process a large collection of web access logs.
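
To give a flavor of the kind of job such a cluster runs, here is a minimal single-machine sketch of the MapReduce pattern applied to web access logs. It is plain Python, not Hadoop's API and not the code from the tutorial. The sample log lines, the function names, and the choice to count hits per requested path are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample lines in Common Log Format; not data from the tutorial.
SAMPLE_LOG = [
    '10.0.0.1 - - [06/Aug/2007:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [06/Aug/2007:10:00:01 +0000] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [06/Aug/2007:10:00:02 +0000] "GET /index.html HTTP/1.1" 304 0',
    'malformed line',
]


def map_line(line):
    """Map step: emit (path, 1) for each well-formed request line."""
    parts = line.split('"')
    if len(parts) < 3:
        return []  # no quoted request section, skip the line
    request = parts[1].split()
    if len(request) < 2:
        return []  # request section lacks a path
    return [(request[1], 1)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one path."""
    return key, sum(values)


def count_hits(lines):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


hits = count_hits(SAMPLE_LOG)
print(hits)
```

On a real cluster the framework would spread the map and reduce calls over many machines and read the input from distributed storage. The sketch only shows how the work is split into independent steps.
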
Yahoo has recently announced its support for Hadoop:

"Looking ahead and thinking about how the economics of large scale computing continue to improve, it's not hard to imagine a time when Hadoop and Hadoop-powered infrastructure is as common as the LAMP (Linux, Apache, MySQL, Perl/PHP/Python) stack that helped to power the previous growth of the Web."


bob said...

Interesting thoughts. How about Open Sourcing Amazon EC2 and S3 as well?

That would be a game changer!

See my blog:

Geekr said...

Thanks for your comment, Bob. Your blog is very interesting!