Delta Lake

Delta Lake is a storage layer that sits on top of an existing data lake and is compatible with Apache Spark. It helps tackle data reliability issues and manage the data lifecycle. The underlying storage format is Parquet, an open-source columnar format. Delta Lake enables ACID transactions, scalable metadata handling, data versioning, schema enforcement, and schema evolution. It also supports updates and deletes. It is available both as an open-source project and as a managed service on Databricks.
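The ACID transactions and data versioning mentioned above rest on Delta Lake's transaction log (the `_delta_log` directory stored next to the Parquet files): each commit is a numbered JSON file of add/remove actions, and the table state at any version is rebuilt by replaying commits. The following is a minimal, self-contained toy sketch of that idea, not the real Delta Lake implementation or API; the class and file names are illustrative only.

```python
import json
import os
import tempfile

class ToyDeltaLog:
    """Toy model of a Delta-style transaction log: one JSON file per commit."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def commit(self, actions):
        # Each commit gets the next version number as a zero-padded file name,
        # mirroring how Delta names its commit files in _delta_log.
        version = len(os.listdir(self.path))
        fname = os.path.join(self.path, f"{version:020d}.json")
        with open(fname, "w") as f:
            json.dump(actions, f)
        return version

    def files_at(self, version=None):
        """Replay add/remove actions up to `version` to list the live data
        files -- this replay is what makes 'time travel' possible."""
        commits = sorted(os.listdir(self.path))
        if version is not None:
            commits = commits[: version + 1]
        live = set()
        for c in commits:
            with open(os.path.join(self.path, c)) as f:
                for action in json.load(f):
                    if action["op"] == "add":
                        live.add(action["file"])
                    elif action["op"] == "remove":
                        live.discard(action["file"])
        return sorted(live)

log = ToyDeltaLog(tempfile.mkdtemp())
log.commit([{"op": "add", "file": "part-000.parquet"}])     # version 0
log.commit([{"op": "add", "file": "part-001.parquet"}])     # version 1
log.commit([{"op": "remove", "file": "part-000.parquet"}])  # version 2

print(log.files_at())   # current table: ['part-001.parquet']
print(log.files_at(1))  # time travel to version 1: both files still live
```

Because a reader only ever sees the files referenced by committed log entries, a failed write leaves no visible trace: this is the core of how a log of atomic commits provides ACID guarantees over plain Parquet files.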

Related articles

Importing data to Databricks: external tables and Delta Lake

