Trunk Data Platform (TDP)

Trunk Data Platform (TDP) is a fully open source big data distribution based on the Apache ecosystem. The initiative is incubated by The Open Source I Trust (TOSIT), a French association whose mission is to promote open source among large companies and institutions.

The TDP distribution is based on the open source versions of big data components of the Apache ecosystem. As part of the TDP project, these components are compiled, tested and deployed automatically.

The TDP distribution defines and qualifies a set of versioned components that interact with each other. In addition, it provides the community with tools for deploying platforms. The resulting stack is versioned and evolves along the following axes:

  • The evolution of the components that compose it by integrating new versions and applying/backporting fixes;
  • Adding new features to the source code of the TDP project.

Any new development has a ripple effect: all the components are recompiled, the tests are validated again, and a new version of the distribution is released following the recommendations of Semantic Versioning (SemVer).
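The SemVer rule mentioned above can be made concrete with a minimal Python sketch of how a `MAJOR.MINOR.PATCH` version is bumped. The text does not say how TDP maps its changes to bump types, so this mapping is an assumption for illustration: an incompatible change bumps MAJOR, a new backward-compatible feature bumps MINOR, and a fix or backport bumps PATCH. The `parse` and `bump` functions and the version strings are hypothetical. They are not part of tdp-lib or any TDP tool, and pre-release or build-metadata suffixes are not handled.

```python
from typing import Literal

# Hypothetical change categories, following the SemVer specification
Change = Literal["major", "minor", "patch"]


def parse(version: str) -> tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version: str, change: Change) -> str:
    """Return the next version for the given kind of change."""
    major, minor, patch = parse(version)
    if change == "major":
        # Incompatible change: reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backward-compatible feature: reset PATCH
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Backward-compatible fix, e.g. a backported patch
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("1.2.3", "patch"))  # 1.2.4
print(bump("1.2.3", "minor"))  # 1.3.0
print(bump("1.2.3", "major"))  # 2.0.0
```

Comparing parsed tuples, for example `parse("1.10.0") > parse("1.9.0")`, orders versions numerically. A plain string comparison would wrongly place `1.10.0` before `1.9.0`.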

To ensure service continuity, the first versions released are aligned with those of the HDP 2.6.5 and HDP 3.1.5 distributions. The list of supported components includes: Hadoop (HDFS, YARN, MapReduce), Hive & Tez, Spark, Ranger, HBase, Phoenix, Knox, Oozie, NiFi, Kafka, and ZooKeeper.

Related articles

New TDP website launched

Categories: Big Data | Tags: Programming, Ansible, Hadoop, Python, TDP

The new TDP (Trunk Data Platform) website is online. We invite you to browse its pages to discover the platform, stay informed, and cultivate contact with the TDP community. TDP is a completely open…

By David WORMS

Oct 3, 2023

Installation Guide to TDP, the 100% open source big data platform

Categories: Big Data, Infrastructure | Tags: Infrastructure, VirtualBox, Hadoop, Vagrant, TDP

The Trunk Data Platform (TDP) is a 100% open source big data distribution, based on Apache Hadoop and compatible with HDP 3.1. Initiated in 2021 by EDF, the DGFiP and Adaltas, the project is governed…

By Paul FARAULT

Oct 18, 2023

Dive into tdp-lib, the SDK in charge of TDP cluster management

Categories: Big Data, Infrastructure | Tags: Programming, Ansible, Hadoop, Python, TDP

All the deployments are automated and Ansible plays a central role. With the growing complexity of the code base, a new system was needed to overcome the Ansible limitations which will enable us to…

By Guillaume BOUTRY

Jan 24, 2023

Spark on Hadoop integration with Jupyter

Categories: Adaltas Summit 2021, Infrastructure, Tech Radar | Tags: Infrastructure, Jupyter, Spark, YARN, CDP, HDP, Notebook, TDP

For several years, Jupyter notebook has established itself as the notebook solution in the Python universe. Historically, Jupyter is the tool of choice for data scientists who mainly develop in Python…

By Aargan COINTEPAS

Sep 1, 2022

TDP workshop: Become a TDP power user from your terminal

Categories: Events, Learning | Tags: DevOps, Ansible, Hadoop, Open source, TDP

The TDP CLI is used to deploy and operate your TDP services. It relies on tdp-lib to provide control and flexibility at your fingertips. Some time ago, we announced the public release of TDP - Trunk…

By Paul FARAULT

Jun 17, 2022

Big data infrastructure internship

Categories: Big Data, Data Engineering, DevOps & SRE, Infrastructure | Tags: Infrastructure, Hadoop, Big Data, Cluster, Internship, Kubernetes, TDP

Job description: Big Data and distributed computing are at the core of Adaltas. We accompany our partners in the deployment, maintenance, and optimization of some of the largest clusters in France…

By Stephan BAUM

Dec 2, 2022

Introducing Trunk Data Platform: the Open-Source Big Data Distribution Curated by TOSIT

Categories: Big Data, DevOps & SRE, Infrastructure | Tags: DevOps, Hortonworks, Ansible, Hadoop, HBase, Knox, Ranger, Spark, Cloudera, CDP, CDH, Open source, TDP

Ever since Cloudera and Hortonworks merged, the choice of commercial Hadoop distributions for on-prem workloads essentially boils down to CDP Private Cloud. CDP can be seen as the “best of both worlds…

By Leo SCHOUKROUN

Apr 14, 2022

Reliable and reproducible Linux installation with NixOS

Categories: Infrastructure, Learning | Tags: Linux, Packaging, VM, NixOS, TDP

When using an operating system, upgrading packages or installing new ones are common tasks that introduce the risk of affecting the stability of the system. NixOS is a Linux distribution that ensures…

By Florent MOUAFFO

Feb 8, 2022

Nix introduction, main concepts and commands

Categories: Infrastructure, Learning | Tags: Arch Linux, CentOS, Linux, OS X, Packaging, Ubuntu, NixOS, TDP

Nix is a functional package manager for Linux and other Unix systems, making the management of packages more reliable and easy to reproduce. With a traditional package manager, when updating a package…

By Florent MOUAFFO

Feb 1, 2022

Internship in Big Data infrastructure with TDP

Categories: Infrastructure, Learning | Tags: Cyber Security, DevOps, Java, Hadoop, IaC, Internship, TDP

Job description: Big Data and distributed computing are at Adaltas’ core. We support our partners in the deployment, maintenance and optimization of some of France’s largest clusters. Adaltas is also an…

By Daniel HARTY

Oct 25, 2021

Build your open source Big Data distribution with Hadoop, HBase, Spark, Hive & Zeppelin

Categories: Big Data, Infrastructure | Tags: Maven, Hadoop, HBase, Hive, Spark, Git, Release and features, TDP, Unit tests

The Hadoop ecosystem gave birth to many popular projects including HBase, Spark and Hive. While technologies like Kubernetes and S3 compatible object storages are growing in popularity, HDFS and YARN…

By Leo SCHOUKROUN

Dec 18, 2020

Rebuilding HDP Hive: patch, test and build

Categories: Big Data, Infrastructure | Tags: Maven, Java, Hive, Git, GitHub, Release and features, TDP, Unit tests

The Hortonworks HDP distribution will soon be deprecated in favor of Cloudera’s CDP. One of our clients wanted a new Apache Hive feature backported into HDP 2.6.0. We thought it was a good opportunity…

By Leo SCHOUKROUN

Oct 6, 2020

Installing Hadoop from source: build, patch and run

Categories: Big Data, Infrastructure | Tags: Maven, Java, LXD, Hadoop, HDFS, Docker, TDP, Unit tests

Commercial Apache Hadoop distributions have come and gone. The two leaders, Cloudera and Hortonworks, have merged: HDP is no more and CDH is now CDP. MapR has been acquired by HPE and IBM BigInsights…

By Leo SCHOUKROUN

Aug 4, 2020

Canada - Morocco - France

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases into projects in production, how to reduce their costs, and how to shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.

Support Ukraine