Cloudera Distribution Platform (CDH)

CDP is the plateform born from the merger between Cloudera and Hortonworks which provide a complete solution from the Edge to AI all with enterprise-grade security and governance. It is available both on premises and on the major Cloud providers.

Related articles

CDP part 6: end-to-end data lakehouse ingestion pipeline with CDP

CDP part 6: end-to-end data lakehouse ingestion pipeline with CDP

Categories: Big Data, Data Engineering, Learning | Tags: NiFi, Business intelligence, Data Engineering, Iceberg, Spark, Big Data, Cloudera, CDP, Data Analytics, Data Lake, Data Warehouse

In this hands-on lab session we demonstrate how to build an end-to-end big data solution with Cloudera Data Platform (CDP) Public Cloud, using the infrastructure we have deployed and configured over…

Tobias CHAVARRIA

By Tobias CHAVARRIA

Jul 24, 2023

CDP part 5: user permissions management on CDP Public Cloud

CDP part 5: user permissions management on CDP Public Cloud

Categories: Big Data, Cloud Computing, Data Governance | Tags: Ranger, Cloudera, CDP, Data Warehouse

When you create a user or a group in CDP, it requires permissions to access resources and use the Data Services. This article is the fifth in a series of six: CDP part 1: introduction to end-to-end…

Tobias CHAVARRIA

By Tobias CHAVARRIA

Jul 18, 2023

CDP part 4: user management on CDP Public Cloud with Keycloak

CDP part 4: user management on CDP Public Cloud with Keycloak

Categories: Big Data, Cloud Computing, Data Governance | Tags: EC2, Big Data, CDP, Docker Compose, Keycloak, SSO

Previous articles of the serie cover the deployment of a CDP Public Cloud environment. All the components are ready for use and it is time to make the environment available to other users to explore…

Tobias CHAVARRIA

By Tobias CHAVARRIA

Jul 4, 2023

CDP part 3: Data Services activation on CDP Public Cloud environment

CDP part 3: Data Services activation on CDP Public Cloud environment

Categories: Big Data, Cloud Computing, Infrastructure | Tags: Infrastructure, AWS, Big Data, Cloudera, CDP

One of the big selling points of Cloudera Data Platform (CDP) is their mature managed service offering. These are easy to deploy on-premises, in the public cloud or as part of a hybrid solution. The…

Albert KONRAD

By Albert KONRAD

Jun 27, 2023

CDP part 2: CDP Public Cloud deployment on AWS

CDP part 2: CDP Public Cloud deployment on AWS

Categories: Big Data, Cloud Computing, Infrastructure | Tags: Infrastructure, AWS, Big Data, Cloud, Cloudera, CDP, Cloudera Manager

The Cloudera Data Platform (CDP) Public Cloud provides the foundation upon which full featured data lakes are created. In a previous article, we introduced the CDP platform. This article is the second…

Albert KONRAD

By Albert KONRAD

Jun 19, 2023

CDP part 1: introduction to end-to-end data lakehouse architecture with CDP

CDP part 1: introduction to end-to-end data lakehouse architecture with CDP

Categories: Cloud Computing, Data Engineering, Infrastructure | Tags: Data Engineering, Hortonworks, Iceberg, AWS, Azure, Big Data, Cloud, Cloudera, CDP, Cloudera Manager, Data Warehouse

Cloudera Data Platform (CDP) is a hybrid data platform for big data transformation, machine learning and data analytics. In this series we describe how to build and use an end-to-end big data…

Stephan BAUM

By Stephan BAUM

Jun 8, 2023

Keycloak deployment in EC2

Keycloak deployment in EC2

Categories: Cloud Computing, Data Engineering, Infrastructure | Tags: Security, EC2, Authentication, AWS, Docker, Keycloak, SSL/TLS, SSO

Why use Keycloak Keycloak is an open-source identity provider (IdP) using single sign-on (SSO). An IdP is a tool to create, maintain, and manage identity information for principals and to provide…

Stephan BAUM

By Stephan BAUM

Mar 14, 2023

Spark on Hadoop integration with Jupyter

Spark on Hadoop integration with Jupyter

Categories: Adaltas Summit 2021, Infrastructure, Tech Radar | Tags: Infrastructure, Jupyter, Spark, YARN, CDP, HDP, Notebook, TDP

For several years, Jupyter notebook has established itself as the notebook solution in the Python universe. Historically, Jupyter is the tool of choice for data scientists who mainly develop in Python…

Aargan COINTEPAS

By Aargan COINTEPAS

Sep 1, 2022

Introducing Trunk Data Platform: the Open-Source Big Data Distribution Curated by TOSIT

Introducing Trunk Data Platform: the Open-Source Big Data Distribution Curated by TOSIT

Categories: Big Data, DevOps & SRE, Infrastructure | Tags: Ranger, DevOps, Hortonworks, Ansible, Hadoop, HBase, Knox, Spark, Cloudera, CDP, CDH, Open source, TDP

Ever since Cloudera and Hortonworks merged, the choice of commercial Hadoop distributions for on-prem workloads essentially boils down to CDP Private Cloud. CDP can be seen as the “best of both worlds…

Leo SCHOUKROUN

By Leo SCHOUKROUN

Apr 14, 2022

Using Cloudera Deploy to install Cloudera Data Platform (CDP) Private Cloud

Using Cloudera Deploy to install Cloudera Data Platform (CDP) Private Cloud

Categories: Big Data, Cloud Computing | Tags: Ansible, Cloudera, CDP, Cluster, Data Warehouse, Vagrant, IaC

Following our recent Cloudera Data Platform (CDP) overview, we cover how to deploy CDP private Cloud on you local infrastructure. It is entirely automated with the Ansible cookbooks published by…

Alexander HOFFMANN

By Alexander HOFFMANN

Jul 23, 2021

An overview of Cloudera Data Platform (CDP)

An overview of Cloudera Data Platform (CDP)

Categories: Big Data, Cloud Computing, Data Engineering | Tags: SDX, Big Data, Cloud, Cloudera, CDP, CDH, Data Analytics, Data Hub, Data Lake, Data lakehouse, Data Warehouse

Cloudera Data Platform (CDP) is a cloud computing platform for businesses. It provides integrated and multifunctional self-service tools in order to analyze and centralize data. It brings security and…

Alexander HOFFMANN

By Alexander HOFFMANN

Jul 19, 2021

Build your open source Big Data distribution with Hadoop, HBase, Spark, Hive & Zeppelin

Build your open source Big Data distribution with Hadoop, HBase, Spark, Hive & Zeppelin

Categories: Big Data, Infrastructure | Tags: Maven, Hadoop, HBase, Hive, Spark, Git, Release and features, TDP, Unit tests

The Hadoop ecosystem gave birth to many popular projects including HBase, Spark and Hive. While technologies like Kubernetes and S3 compatible object storages are growing in popularity, HDFS and YARN…

Leo SCHOUKROUN

By Leo SCHOUKROUN

Dec 18, 2020

Rebuilding HDP Hive: patch, test and build

Rebuilding HDP Hive: patch, test and build

Categories: Big Data, Infrastructure | Tags: Maven, GitHub, Java, Hive, Git, Release and features, TDP, Unit tests

The Hortonworks HDP distribution will soon be deprecated in favor of Cloudera’s CDP. One of our clients wanted a new Apache Hive feature backported into HDP 2.6.0. We thought it was a good opportunity…

Leo SCHOUKROUN

By Leo SCHOUKROUN

Oct 6, 2020

Installing Hadoop from source: build, patch and run

Installing Hadoop from source: build, patch and run

Categories: Big Data, Infrastructure | Tags: Maven, Java, LXD, Hadoop, HDFS, Docker, TDP, Unit tests

Commercial Apache Hadoop distributions have come and gone. The two leaders, Cloudera and Hortonworks, have merged: HDP is no more and CDH is now CDP. MapR has been acquired by HP and IBM BigInsights…

Leo SCHOUKROUN

By Leo SCHOUKROUN

Aug 4, 2020

Should you move your Big Data and Data Lake to the Cloud

Should you move your Big Data and Data Lake to the Cloud

Categories: Big Data, Cloud Computing | Tags: DevOps, AWS, Azure, Cloud, CDP, Databricks, GCP

Should you follow the trend and migrate your data, workflows and infrastructure to GCP, AWS and Azure? During the Strata Data Conference in New-York, a general focus was put on moving customer’s Big…

Joris RUMMENS

By Joris RUMMENS

Dec 9, 2019

Canada - Morocco - France

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and increase the time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.

Support Ukrain