Cloudera

Cloudera is a company founded in 2008 in Palo Alto, California. The company specializes in software related to Apache Hadoop and offers its own Hadoop distribution. The first Cloudera distribution appeared in 2009. In 2019, Cloudera completed its merger with Hortonworks, a competitor in the big data market.

The Cloudera Data Platform (CDP), the current distribution, and Cloudera's Distribution including Apache Hadoop (CDH), still widely deployed, are popular distributions of Apache Hadoop used in high performance computing (HPC) and big data applications. Their essential components include management tools, monitoring services, distributed storage, distributed computing, scheduling, and security. Cloudera's distributions target enterprise customers in both on-premises and cloud environments.

Related articles

CDP part 3: Data Services activation on CDP Public Cloud environment

Categories: Big Data, Cloud Computing, Infrastructure | Tags: Infrastructure, AWS, Big Data, Cloudera, CDP

One of the big selling points of Cloudera Data Platform (CDP) is its mature managed service offering. These services are easy to deploy on-premises, in the public cloud or as part of a hybrid solution. The…

By Albert KONRAD

Jun 27, 2023

CDP part 5: user permissions management on CDP Public Cloud

Categories: Big Data, Cloud Computing, Data Governance | Tags: Ranger, Cloudera, CDP, Data Warehouse

When you create a user or a group in CDP, it requires permissions to access resources and use the Data Services. This article is the fifth in a series of six: CDP part 1: introduction to end-to-end…

By Tobias CHAVARRIA

Jul 18, 2023

CDP part 2: CDP Public Cloud deployment on AWS

Categories: Big Data, Cloud Computing, Infrastructure | Tags: Infrastructure, AWS, Big Data, Cloud, Cloudera, CDP, Cloudera Manager

The Cloudera Data Platform (CDP) Public Cloud provides the foundation upon which full-featured data lakes are created. In a previous article, we introduced the CDP platform. This article is the second…

By Albert KONRAD

Jun 19, 2023

CDP part 1: introduction to end-to-end data lakehouse architecture with CDP

Categories: Cloud Computing, Data Engineering, Infrastructure | Tags: Data Engineering, Hortonworks, Iceberg, AWS, Azure, Big Data, Cloud, Cloudera, CDP, Cloudera Manager, Data Warehouse

Cloudera Data Platform (CDP) is a hybrid data platform for big data transformation, machine learning and data analytics. In this series we describe how to build and use an end-to-end big data…

By Stephan BAUM

Jun 8, 2023

CDP part 6: end-to-end data lakehouse ingestion pipeline with CDP

Categories: Big Data, Data Engineering, Learning | Tags: NiFi, Business intelligence, Data Engineering, Iceberg, Spark, Big Data, Cloudera, CDP, Data Analytics, Data Lake, Data Warehouse

In this hands-on lab session we demonstrate how to build an end-to-end big data solution with Cloudera Data Platform (CDP) Public Cloud, using the infrastructure we have deployed and configured over…

By Tobias CHAVARRIA

Jul 24, 2023

Keycloak deployment in EC2

Categories: Cloud Computing, Data Engineering, Infrastructure | Tags: Security, EC2, Authentication, AWS, Docker, Keycloak, SSL/TLS, SSO

Why use Keycloak? Keycloak is an open-source identity provider (IdP) using single sign-on (SSO). An IdP is a tool to create, maintain, and manage identity information for principals and to provide…

By Stephan BAUM

Mar 14, 2023

Data platform requirements and expectations

Categories: Big Data, Infrastructure | Tags: Data Engineering, Data Governance, Data Analytics, Data Hub, Data Lake, Data lakehouse, Data Science

A big data platform is a complex and sophisticated system that enables organizations to store, process, and analyze large volumes of data from a variety of sources. It is composed of several…

By David WORMS

Mar 23, 2023

Introducing Trunk Data Platform: the Open-Source Big Data Distribution Curated by TOSIT

Categories: Big Data, DevOps & SRE, Infrastructure | Tags: DevOps, Hortonworks, Ansible, Hadoop, HBase, Knox, Ranger, Spark, Cloudera, CDP, CDH, Open source, TDP

Ever since Cloudera and Hortonworks merged, the choice of commercial Hadoop distributions for on-prem workloads essentially boils down to CDP Private Cloud. CDP can be seen as the “best of both worlds…

By Leo SCHOUKROUN

Apr 14, 2022

Using Cloudera Deploy to install Cloudera Data Platform (CDP) Private Cloud

Categories: Big Data, Cloud Computing | Tags: Ansible, Cloudera, CDP, Cluster, Data Warehouse, Vagrant, IaC

Following our recent Cloudera Data Platform (CDP) overview, we cover how to deploy CDP Private Cloud on your local infrastructure. It is entirely automated with the Ansible cookbooks published by…

By Alexander HOFFMANN

Jul 23, 2021

An overview of Cloudera Data Platform (CDP)

Categories: Big Data, Cloud Computing, Data Engineering | Tags: SDX, Big Data, Cloud, Cloudera, CDP, CDH, Data Analytics, Data Hub, Data Lake, Data lakehouse, Data Warehouse

Cloudera Data Platform (CDP) is a cloud computing platform for businesses. It provides integrated and multifunctional self-service tools in order to analyze and centralize data. It brings security and…

By Alexander HOFFMANN

Jul 19, 2021

Cloudera CDP and Cloud migration of your Data Warehouse

Categories: Big Data, Cloud Computing | Tags: Azure, Cloudera, Data Hub, Data Lake, Data Warehouse

While one of our customers is anticipating a move to the Cloud, and with the recent announcement of Cloudera CDP availability in mid-September during the Strata conference, it seems like the appropriate…

By David WORMS

Dec 16, 2019

Notes on the Cloudera Open Source licensing model

Categories: Big Data | Tags: CDSW, License, Cloudera Manager, Open source

Following the publication of its Open Source licensing strategy on July 10, 2019 in an article called “our Commitment to Open Source Software”, Cloudera broadcasted a webinar yesterday October 2…

By David WORMS

Oct 25, 2019

Running Apache Hive 3, new features and tips and tricks

Categories: Big Data, Business Intelligence, DataWorks Summit 2019 | Tags: JDBC, LLAP, Druid, Hadoop, Hive, Kafka, Release and features

Apache Hive 3 brings a bunch of nice new features to the data warehouse. Unfortunately, like many major FOSS releases, it comes with a few bugs and not much documentation. It has been available since…

By Gauthier LEONARD

Jul 25, 2019

Introduction to Cloudera Data Science Workbench

Categories: Data Science | Tags: Azure, Cloudera, Docker, Git, Kubernetes, Machine Learning, MLOps, Notebook

Cloudera Data Science Workbench is a platform that allows Data Scientists to create, manage, run and schedule data science workflows from their browser. Thus it enables them to focus on their main…

By Mehdi ELALAMI

Feb 28, 2019

Apache Hadoop YARN 3.0 – State of the union

Categories: Big Data, DataWorks Summit 2018 | Tags: GPU, Hortonworks, Hadoop, HDFS, MapReduce, YARN, Cloudera, Data Science, Docker, Release and features

This article covers the “Apache Hadoop YARN: state of the union” talk held by Wangda Tan from Hortonworks during the Dataworks Summit 2018. What is Apache YARN? As a reminder, YARN is one of the two…

By Lucas BAKALIAN

May 31, 2018

Cloudera Sessions Paris 2017

Categories: Big Data, Events | Tags: Altus, CDSW, SDX, EC2, Azure, Cloudera, CDH, Data Science, PaaS

Adaltas was at the Cloudera Sessions on October 5, where Cloudera showcased their new products and offerings. Below you’ll find a summary of what we witnessed. Note: the information was aggregated in…

By César BEREZOWSKI

Oct 16, 2017

Exposing Kafka on two different networks

Categories: Infrastructure | Tags: Cyber Security, VLAN, Kafka, Cloudera, CDH, Network

A Big Data setup usually requires multiple network interfaces; let’s see how to set up Kafka on more than one of them. Kafka is an open-source stream processing software platform system…

By César BEREZOWSKI

Jul 22, 2017

MiNiFi: Data at Scales & the Values of Starting Small

Categories: Big Data, DevOps & SRE, Infrastructure | Tags: MiNiFi, NiFi, C++, HDF, Cloudera, HDP, IOT

This conference briefly presented Apache NiFi and explained where MiNiFi came from: basically it’s a minimal NiFi agent to deploy on small devices to bring data to a cluster’s NiFi pipeline (ex: IoT…

By César BEREZOWSKI

Jul 8, 2017

Components for CDH and HDP

Categories: Big Data | Tags: Flume, Hortonworks, Hadoop, Hive, Oozie, Sqoop, Zookeeper, Cloudera, CDH, HDP

I was interested in comparing the different components distributed by Cloudera and Hortonworks. This also gives us an idea of the versions packaged by the two distributions. At the time of this writing…

By David WORMS

Sep 22, 2013

The state of Hadoop distributions

Categories: Big Data | Tags: Hortonworks, Intel, Oracle, Hadoop, Cloudera

Apache Hadoop is of course made available for download on its official webpage. However, downloading and installing the several components that make a Hadoop cluster is not an easy task and is a…

By David WORMS

May 11, 2013

Virtual machines with static IP for your Hadoop development cluster

Categories: Infrastructure | Tags: Ambari, Hortonworks, Red Hat, VirtualBox, VM, VMware, Cloudera, Network

While I am about to install and test Ambari, this article is the occasion to illustrate how I set up my development environment with multiple virtual machines. Ambari, the deployment and monitoring…

By David WORMS

Feb 27, 2013

Hadoop development cluster of virtual machines with static IP using VirtualBox

Categories: Infrastructure | Tags: Ambari, Hortonworks, Red Hat, VirtualBox, VM, VMware, Cloudera, Network

A few days ago, I explained how to set up a cluster of virtual machines with static IPs and Internet access suitable to host your Hadoop cluster locally for development. At the time I made use of VMware…

By David WORMS

Mar 14, 2013

Storage and massive processing with Hadoop

Categories: Big Data | Tags: Hadoop, HDFS, Storage

Apache Hadoop is a system for building shared storage and processing infrastructures for large volumes of data (multiple terabytes or petabytes). Hadoop clusters are used by a wide range of projects…

By David WORMS

Nov 26, 2010
