Storage and massive processing with Hadoop

By David WORMS

Nov 26, 2010

Big Data

Apache Hadoop is a system for building shared storage and processing infrastructures for large volumes of data (multiple terabytes or petabytes). Hadoop clusters support a wide range of projects at a growing number of web companies (Yahoo!, eBay, Facebook, LinkedIn, Twitter), and their size continues to increase. Yahoo! runs Hadoop on 45,000 machines, with its largest cluster counting 4,000 servers and 40 PB, while Facebook reported storing 20 PB on a single HDFS (Hadoop Distributed File System) cluster.

Dotcoms were the first companies to see their data volume grow exponentially. Many have based their business model on processing this data: both Google and Facebook derive most of their income from data analysis for advertising purposes. Unable to wait for traditional software vendors, these companies invested heavily in developing new software to cope with this explosion while exploiting new concepts. Today, thanks to the Open Source model, these technologies are present in a large number of industries and have become a key component of many companies and government services.

Hadoop is the open source implementation of two Google papers. The first, published in 2003, describes the architecture of GFS (the Google File System). The second, published in 2004, introduces the MapReduce paradigm. At that time, Doug Cutting, today at Cloudera, was working on Nutch, an open source project of the Apache Foundation that includes a web crawler and a search engine. Nutch's storage and computing requirements led to an implementation of Google's work that would become Hadoop.
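To give a feel for the MapReduce paradigm mentioned above, here is a minimal single-process sketch of a word count in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration only, and a real cluster would spread the map and reduce steps over many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence, in memory (illustrative only)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

Because each call to the map step only looks at one document, and each call to the reduce step only looks at one key, both steps can in principle run in parallel on separate machines, which is what makes the model suited to very large data sets.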
