Cloudera
Cloudera is a company founded in 2008 in Palo Alto, California. The company is specialized in software related to Apache Hadoop and offers its own Hadoop distribution. The first distribution of Cloudera appeared in 2009. In 2019, cloudera formalized the merger with Hortonworks, a concurent actor of the big data market.
The Cloudera Development Platform (CDP), the current distribution, as well as Cloudera Distribution Hadoop (CDH), still widely deployed, are popular distributions of Apache Hadoop and are used in the field of high performance computing (HPC) and big data applications. The essential components include management tools, monitoring services, distributed storage, distributed computing, scheduling, security, ... The Cloudera's distributions target enterprise customers with both on-premise and in the cloud environments.
- Learn more
- Official website
Related articles

Introducing Trunk Data Platform: the Open-Source Big Data Distribution Curated by TOSIT
Categories: Big Data, DevOps & SRE, Infrastructure | Tags: Ansible, Ranger, DevOps, Hortonworks, Hadoop, HBase, Knox, Spark, Cloudera, CDP, CDH, Open source, TDP
Ever since Cloudera and Hortonworks merged, the choice of commercial Hadoop distributions for on-prem workloads essentially boils down to CDP Private Cloud. CDP can be seen as the “best of both worlds…
Apr 14, 2022

Using Cloudera Deploy to install Cloudera Data Platform (CDP) Private Cloud
Categories: Big Data, Cloud Computing | Tags: Ansible, Cloudera, CDP, Cluster, Data Warehouse, Vagrant, IaC
Following our recent Cloudera Data Platform (CDP) overview, we cover how to deploy CDP private Cloud on you local infrastructure. It is entirely automated with the Ansible cookbooks published by…
Jul 23, 2021

An overview of Cloudera Data Platform (CDP)
Categories: Big Data, Cloud Computing, Data Engineering | Tags: SDX, Data Analytics, Big Data, Cloud, Cloudera, CDP, CDH, Data Hub, Data Lake, Data Warehouse
Cloudera Data Platform (CDP) is a cloud computing platform for businesses. It provides integrated and multifunctional self-service tools in order to analyze and centralize data. It brings security and…
Jul 19, 2021

Cloudera CDP and Cloud migration of your Data Warehouse
Categories: Big Data, Cloud Computing | Tags: Azure, Cloudera, Data Hub, Data Lake, Data Warehouse
While one of our customer is anticipating a move to the Cloud and with the recent announcement of Cloudera CDP availability mi-september during the Strata conference, it seems like the appropriate…
By David WORMS
Dec 16, 2019

Notes on the Cloudera Open Source licensing model
Categories: Big Data | Tags: CDSW, License, Cloudera Manager, Open source
Following the publication of its Open Source licensing strategy on July 10, 2019 in an article called “our Commitment to Open Source Software”, Cloudera broadcasted a webinar yesterday October 2…
By David WORMS
Oct 25, 2019

Running Apache Hive 3, new features and tips and tricks
Categories: Big Data, Business Intelligence, DataWorks Summit 2019 | Tags: Druid, Hive, JDBC, LLAP, Hadoop, Kafka, Release and features
Apache Hive 3 brings a bunch of new and nice features to the data warehouse. Unfortunately, like many major FOSS releases, it comes with a few bugs and not much documentation. It is available since…
Jul 25, 2019

Introduction to Cloudera Data Science Workbench
Categories: Data Science | Tags: Azure, Cloudera, Docker, Git, Kubernetes, Machine Learning, MLOps, Notebook
Cloudera Data Science Workbench is a platform that allows Data Scientists to create, manage, run and schedule data science workflows from their browser. Thus it enables them to focus on their main…
Feb 28, 2019

Apache Hadoop YARN 3.0 – State of the union
Categories: Big Data, DataWorks Summit 2018 | Tags: YARN, GPU, Hortonworks, Hadoop, HDFS, MapReduce, Cloudera, Data Science, Docker, Release and features
This article covers the ”Apache Hadoop YARN: state of the union” talk held by Wangda Tan from Hortonworks during the Dataworks Summit 2018. What is Apache YARN? As a reminder, YARN is one of the two…
May 31, 2018

Cloudera Sessions Paris 2017
Categories: Big Data, Events | Tags: EC2, Altus, CDSW, SDX, PaaS, Azure, Cloudera, CDH, Data Science
Adaltas was at the Cloudera Sessions on October 5, where Cloudera showcased their new products and offerings. Below you’ll find a summary of what we witnessed. Note: the information were aggregated in…
Oct 16, 2017

Exposing Kafka on two different networks
Categories: Infrastructure | Tags: Cyber Security, VLAN, Kafka, Cloudera, CDH, Network
A Big Data setup usually requires you to have multiple networking interface, let’s see how to set up Kafka on more than one of them. Kafka is a open-source stream processing software platform system…
Jul 22, 2017

MiNiFi: Data at Scales & the Values of Starting Small
Categories: Big Data, DevOps & SRE, Infrastructure | Tags: MiNiFi, NiFi, C++, HDP, HDF, IOT, Cloudera
This conference presented rapidly Apache NiFi and explained where MiNiFi came from: basically it’s a NiFi minimal agent to deploy on small devices to bring data to a cluster’s NiFi pipeline (ex: IoT…
Jul 8, 2017

Composants for CDH and HDP
Categories: Big Data | Tags: Flume, Hive, Sqoop, Zookeeper, Hortonworks, HDP, Hadoop, Oozie, Cloudera, CDH
I was interested to compare the different components distributed by Cloudera and HortonWorks. This also gives us an idea of the versions packaged by the two distributions. At the time of this writting…
By David WORMS
Sep 22, 2013

The state of Hadoop distributions
Categories: Big Data | Tags: Hortonworks, Intel, Oracle, Hadoop, Cloudera
Apache Hadoop is of course made available for download on its official webpage. However, downloading and installing the several components that make a Hadoop cluster is not an easy task and is a…
By David WORMS
May 11, 2013

Hadoop development cluster of virtual machines with static IP using VirtualBox
Categories: Infrastructure | Tags: Ambari, Hortonworks, Red Hat, VirtualBox, VM, VMware, Cloudera, Network
A few days ago, I explained how to set up a cluster of virtual machine with static IPsand Internet access suitable to host your Hadoop cluster locally for development. At the time I made use of VMWare…
By David WORMS
Mar 14, 2013

Virtual machines with static IP for your Hadoop development cluster
Categories: Infrastructure | Tags: Ambari, Hortonworks, Red Hat, VirtualBox, VM, VMware, Cloudera, Network
While I am about to install and test Ambari, this article is the occasion to illustrate how I set up my development environment with multiple virtual machines. Ambari, the deployment and monitoring…
By David WORMS
Feb 27, 2013

Storage and massive processing with Hadoop
Categories: Big Data | Tags: Hadoop, HDFS, Storage
Apache Hadoop is a system for building shared storage and processing infrastructures for large volumes of data (multiple terabytes or petabytes). Hadoop clusters are used by a wide range of projects…
By David WORMS
Nov 26, 2010