Apache Hadoop is a system for building shared storage and processing infrastructures for large volumes of data (multiple terabytes or petabytes). Hadoop clusters are used by a wide range of projects for a growing number of web players (
Dotcoms were the first companies to see their data volume grow exponentially. Many have based their business model on the processing of these data. Both Google and Facebook derive most of their income from data analysis for advertising purposes. Unable to wait for traditional software editors, these companies have invested heavily in the development of new softwares to cope with this explosion while exploiting new concepts. Today, thanks to the Open Source model, these technologies are present in a large number of industries and become a key component of many government companies and services.
Hadoop is the open source implementation of two Google papers. The first, published in 2003, describes the architecture of