
A presentation of OKDP, the Open Kubernetes Data Platform
By Mori HUANG
Jun 29, 2026
OKDP (Open Kubernetes Data Platform) is a service platform running on Kubernetes infrastructure dedicated to end-to-end data management. It aims to provide a robust, enterprise-grade solution for the industry through a commitment to open-source standards. The project is in active development, and here is a preview of what it offers.
Initiated by the French Public Finances Directorate General (DGFiP), the project now gathers many contributors, including companies such as Orange, CGI, Kubotal, and Adaltas.
Under the Hood
OKDP consists of two main layers: Data & AI modules, and a Control Plane made up of a server and a web interface.
Data & AI modules
OKDP provides a catalog of pre-integrated open-source tools that can be deployed independently on Kubernetes. Install the full stack or cherry-pick only the components you need. Services are organized by function:
- Lakehouse & Analytics
- Orchestration
- Data Science
- Visualization & BI
- Ingestion & Streaming (planned)
- AI/MLOps (planned)
Control Plane
The Control Plane is the backbone of the platform. Acting as a governance and automation layer, it integrates all tools and handles authentication, multi-tenancy, resource management, and observability across the entire stack. It is composed of a server and a web UI:
- OKDP Server provides a unified REST API for managing deployments, clusters, GitOps repositories, etc.
- OKDP UI provides a unified interface to deploy, configure, and monitor OKDP Platform components.
Main objectives
Sovereignty by nature
OKDP is 100% open-source and cloud-native. According to the 2025 CNCF Cloud Native Survey, 98% of organizations have used cloud-native techniques, and 82% of container users now run Kubernetes in production (up from 66% in 2023). As a mature CNCF-hosted project, Kubernetes has established itself as the de facto operating system of the modern enterprise. Inheriting the open-source philosophy of the TOSIT TDP project, in which both the orchestrator and its underlying components are fully open-source, OKDP lets users avoid vendor lock-in while benefiting from a cloud-native architecture that is portable, scalable, and resilient across cloud providers and their managed Kubernetes services (AKS, EKS, GKE, etc.).
Modularity and durability
OKDP offers a highly adaptable architecture that scales according to specific workload requirements. The goal is to let users build a tailor-made platform around their specific needs, so components can be deployed independently, at any granularity. Each component is distributed as a Helm chart, so it can be integrated into any existing Kubernetes environment alongside existing tooling, with or without the OKDP Control Plane.
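Because each component ships as its own Helm chart, cherry-picking a subset comes down to installing only those charts. The Python sketch below only builds `helm upgrade --install` command lines for a chosen subset; it does not run them. The chart references under `registry.example.com`, the `okdp` namespace, and the helper `helm_install_commands` are hypothetical placeholders, since this article does not give the published OKDP chart locations.

```python
import shlex

# HYPOTHETICAL chart references: the real OKDP chart names, registry and
# versions are not given in the article; substitute the published ones.
HYPOTHETICAL_CHARTS = {
    "spark": "oci://registry.example.com/okdp/spark",
    "trino": "oci://registry.example.com/okdp/trino",
    "jupyterlab": "oci://registry.example.com/okdp/jupyterlab",
    "airflow": "oci://registry.example.com/okdp/airflow",
    "superset": "oci://registry.example.com/okdp/superset",
}


def helm_install_commands(components, namespace="okdp", values_files=None):
    """Build (but do not run) one `helm upgrade --install` argv per component."""
    values_files = values_files or {}
    commands = []
    for name in components:
        if name not in HYPOTHETICAL_CHARTS:
            raise ValueError(f"unknown component: {name!r}")
        cmd = [
            "helm", "upgrade", "--install",  # install, or upgrade if already present
            name,                            # release name
            HYPOTHETICAL_CHARTS[name],       # chart reference
            "--namespace", namespace,
            "--create-namespace",
        ]
        # Optional per-component values file overriding the chart defaults.
        if name in values_files:
            cmd += ["--values", values_files[name]]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Cherry-pick two components only and print shell-ready commands.
    for cmd in helm_install_commands(["trino", "superset"],
                                     values_files={"trino": "trino-values.yaml"}):
        print(shlex.join(cmd))
```

Using `upgrade --install` keeps the command idempotent: it installs the release on the first run and upgrades it on later runs, which suits adding or removing components over time.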
Leveraging cloud-native design, the platform provides a standardized framework that limits the accumulation of technical debt and simplifies lifecycle management. This foundation supports continuous modernization through automated updates and modular upgrades, helping keep the data stack current and sustainable.
Data centric
As a comprehensive data management ecosystem, OKDP orchestrates the entire data lifecycle with built-in governance, serving all participants in the data ecosystem, including data scientists, data engineers, AI engineers, and others. Some key use cases include:
- High-performance SQL analytics and large-scale distributed data processing over petabytes of data.
- Daily ETL data pipeline reading data from and writing to S3-compatible storage.
- Interactive data analysis via JupyterLab, with visualization and exploration through Apache Superset.
- Machine learning and end-to-end AI/MLOps pipeline for data science teams.
- Real-time streaming and BI for operational analytics.
Together, these capabilities streamline cross-team data sharing, eliminate costly storage silos, and minimize redundancy across the enterprise.
The upcoming release this year
OKDP will be released on September 14, 2026; the team is currently focused on finalizing the last few features and refining the user experience.
Environments
The following components are included in the upcoming release, organized by function:
Lakehouse & Analytics
- Apache Spark:
Spark is a unified analytics engine for large-scale data processing, supporting both batch and streaming workloads across multiple language APIs.
- Trino:
Trino is a high-performance, distributed SQL query engine designed to query large-scale data sets across disparate sources via a single interface.
- Apache Polaris:
Polaris is a cloud-native catalog for Apache Iceberg, providing centralized metadata management and cross-engine interoperability.
- Hive Metastore:
Hive Metastore is a centralized metadata repository for data lakes and big data analytics.
Data Science
- JupyterLab:
JupyterLab is a web-based IDE-style environment for notebooks and code. Its flexible interface enables users to configure workflows in data science, machine learning, and scientific computing.
Orchestration & Governance
- Apache Airflow:
Airflow is an open-source platform designed for workflow orchestration, allowing teams to develop, schedule, and monitor their workflows.
Visualization & BI
- Apache Superset:
Superset is a data exploration and visualization platform for business intelligence.
More components are planned for future releases, including ingestion, streaming, and AI/MLOps tooling.
Helm and GitOps
Helm is a package manager for Kubernetes that helps define, install, and upgrade applications, and makes it easy to version charts and publish them to a registry. Combined with GitOps tools such as Argo CD or Flux CD, this infrastructure-as-code (IaC) approach improves operational efficiency, traceability, and flexibility in application management.
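As an illustration of the Helm plus GitOps pairing, the sketch below generates an Argo CD `Application` manifest that points Argo CD at a Helm chart; once the manifest is committed to Git, Argo CD keeps the cluster in sync with it. The field layout follows Argo CD's `argoproj.io/v1alpha1` Application resource, but the repository URL, chart name, version, value key and the helper `argocd_helm_application` are hypothetical, and nothing here describes how OKDP itself wires up GitOps.

```python
import json

# In-cluster API server address Argo CD uses for the cluster it runs in.
IN_CLUSTER = "https://kubernetes.default.svc"


def argocd_helm_application(name, chart, repo_url, version, namespace, overrides=None):
    """Return an Argo CD Application that deploys `chart` from a Helm repository.

    `overrides` maps Helm value paths to string values, passed as `helm.parameters`.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,        # Helm repository URL
                "chart": chart,             # chart name within that repository
                "targetRevision": version,  # chart version pinned in Git
                "helm": {
                    "parameters": [
                        {"name": k, "value": v} for k, v in (overrides or {}).items()
                    ]
                },
            },
            "destination": {"server": IN_CLUSTER, "namespace": namespace},
            "syncPolicy": {
                # Sync automatically, delete removed resources, revert manual edits.
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }


if __name__ == "__main__":
    app = argocd_helm_application(
        name="superset",
        chart="superset",
        repo_url="https://charts.example.com/okdp",  # hypothetical repository
        version="1.0.0",                              # hypothetical version
        namespace="bi",
        overrides={"replicaCount": "2"},              # hypothetical value key
    )
    # JSON is valid YAML, so the output can be committed to Git as-is.
    print(json.dumps(app, indent=2))
```

Pinning `targetRevision` to an exact chart version means an upgrade becomes a reviewed Git commit rather than a manual `helm upgrade` against the cluster.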
Future objectives
Consider the first release a starting point for the OKDP project. There’s a lot more on the way; here are the primary objectives for the future.
AI/ML lifecycle
OKDP plans to integrate Kubeflow, MLflow, and LLM Serving for AI & MLOps capabilities, allowing teams to better manage the AI/ML lifecycle.
Ingestion, streaming & governance
Support is upcoming for complex ETL ingestion workflows, streaming, and real-time processing, with the integration of Apache NiFi, Apache Kafka, and Apache Flink.
GitOps CD tools
Combined with Helm charts, a GitOps CD tool provides capabilities such as automatic deployment and synchronization, version control, and drift detection. OKDP plans to include Argo CD for GitOps functionality.
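Drift detection boils down to comparing the desired state stored in Git with the live state reported by the cluster. The toy Python function below (the name `find_drift` is hypothetical) shows only that core idea on plain dictionaries; it is not Argo CD's algorithm, which performs considerably more normalization and supports configurable ignore rules.

```python
def find_drift(desired, live, path=""):
    """Return sorted dotted paths where `live` differs from `desired`.

    Only keys declared in `desired` are compared, so fields the cluster adds
    on its own (such as `status`) are not reported as drift.
    """
    drift = []
    for key, want in desired.items():
        here = f"{path}.{key}" if path else key
        if key not in live:
            drift.append(here)  # declared in Git but missing in the cluster
        elif isinstance(want, dict) and isinstance(live[key], dict):
            drift.extend(find_drift(want, live[key], here))  # compare nested fields
        elif live[key] != want:
            drift.append(here)  # value changed in the cluster
    return sorted(drift)


if __name__ == "__main__":
    desired = {"spec": {"replicas": 2, "image": "trino:1.0"}}  # from Git
    live = {"spec": {"replicas": 3, "image": "trino:1.0"}, "status": {"ready": 3}}  # from cluster
    print(find_drift(desired, live))  # → ['spec.replicas']
```

A GitOps tool with self-healing enabled would then re-apply the desired value for each reported path, bringing `spec.replicas` back to 2.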
Planned tooling
The following tooling can be expected in the future in accordance with the objectives outlined above.
AI/ML lifecycle
- Kubeflow:
Kubeflow is an AI/ML platform comprising multiple projects targeting the deployment and management of AI/ML workloads on Kubernetes.
- MLflow:
MLflow is an AI engineering platform for agents, LLMs, and ML models. It focuses on experiment tracking, model evaluation capabilities, a production model registry, and model deployment tools.
- LLM Serving:
LLM Serving is the process of hosting and delivering an LLM or AI model in production to handle user prompts and generate responses. The specific tooling has not yet been defined.
Ingestion, streaming & governance
- Apache NiFi:
NiFi is a dataflow system designed for ETL pipelines and real-time data processing. It allows users to create, schedule, and monitor data flows.
- Apache Kafka:
Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, and low-latency data pipelines.
- Apache Flink:
Flink is a distributed engine for stream and batch data processing tasks.
GitOps CD tools
- Argo CD:
Argo CD is a declarative CD (Continuous Delivery) tool for Kubernetes that keeps the configuration of components synchronized with the desired state in the Git repository.
Orchestration & Governance
- OpenMetadata:
OpenMetadata is a unified metadata platform offering a centralized, single source of truth for all the metadata in an organization.
Ready to start?
You can explore the official OKDP website and follow the roadmap. A concise introduction and a webinar video are available on this page: What’s New in OKDP - Open Kubernetes Data Platform (in French).
A full-stack platform can be quickly spun up using the OKDP sandbox repository (note that a complete redesign of the UI is shipping soon). Several examples demonstrating how to work with the OKDP platform are included in the OKDP example repository. In addition, potential contribution areas for the project are listed in the OKDP use cases repository.
Meanwhile, TOSIT actively welcomes community contributions and engagement, so if you find the project interesting, please reach out and join the weekly user meeting. Meeting notes can be found in the meeting notes repository.