
Apache Hadoop MapReduce

MapReduce is a distributed data processing framework. It is part of Apache Hadoop and runs on top of Apache HDFS.

This framework enables efficient processing of large amounts of data distributed across multiple nodes.

During a MapReduce job, the data is split into chunks that are processed in parallel by the MapReduce tasks. The two main tasks of MapReduce are:

  • Mapper: Mapper tasks process input records one by one and emit key/value pairs, where the key identifies a group and the value carries the result of the operation applied to the record.
  • Reducer: Reducer tasks process the mapper outputs grouped by key and perform an aggregation operation for each group (see the WordCount sketch below).
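
To make these two roles concrete, here is a minimal sketch of the classic WordCount job written against the org.apache.hadoop.mapreduce API. The class names are illustrative and the input and output paths are passed on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: reads one line at a time and emits a (word, 1) pair per token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: receives all the counts emitted for a given word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configures and submits the job; input and output paths come from args.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Once packaged in a JAR, the job is typically submitted with `hadoop jar wordcount.jar WordCount <input path> <output path>`: the mapper emits (word, 1) pairs and the reducer sums the counts for each word.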

The results of each processing step are persisted, with the final job output stored in HDFS. In case of failure, MapReduce re-executes only the failed tasks rather than the whole job, which makes the processing fault tolerant.
