Delta Lake

Delta Lake is a storage layer on top of existing data lake. It is compatible with Apache Spark. It helps tackling data reliability issues and manage data lifecycle. Underlying storage format is Parquet, a columnar open-source format. Delta Lake enables ACID transactions, scalable metadata handling, data versioning, schema enforcement and schema evolution. It also supports updates and deletes. It is available in open-source. version or managed version on Databricks.

Related articles

Comparison of database architectures: data warehouse, data lake and data lakehouse

Comparison of database architectures: data warehouse, data lake and data lakehouse

Categories: Big Data, Data Engineering | Tags: Data Governance, Infrastructure, Iceberg, Parquet, Spark, Data Lake, Lakehouse, Data Warehouse, File Format

Database architectures have experienced constant innovation, evolving with the appearence of new use cases, technical constraints, and requirements. From the three database structures we are comparing…

Gonzalo ETSE

By Gonzalo ETSE

May 17, 2022

Internship in Data Engineering

Internship in Data Engineering

Categories: Front End, Learning | Tags: Metrics, Monitoring, Hive, Kafka, Delta Lake, Elasticsearch, IaC, Internship, Kubernetes, Streaming

Job Description Data is a valuable business asset. Some call it the new oil. The data engineer collects, transform and refine ​​raw data into information that can be used by business analysts and data…

David WORMS

By David WORMS

Oct 25, 2021

Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI

Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI

Categories: Data Engineering, Learning | Tags: Cloud, Data Lake, Databricks, Delta Lake, MLflow

Self-paced trainings are proposed by Databricks inside their Academy program. The price is $ 2000 USD for unlimited access to the training courses for a period of 1 year, but also free for customers…

Anna KNYAZEVA

By Anna KNYAZEVA

May 26, 2021

Data versioning and reproducible ML with DVC and MLflow

Data versioning and reproducible ML with DVC and MLflow

Categories: Data Science, DevOps & SRE, Events | Tags: Data Engineering, Databricks, Delta Lake, Git, Machine Learning, MLflow, Storage

Our talk on data versioning and reproducible Machine Learning proposed to the Data + AI Summit (formerly known as Spark+AI) is accepted. The summit will take place online the 17-19th November…

Experiment tracking with MLflow on Databricks Community Edition

Experiment tracking with MLflow on Databricks Community Edition

Categories: Data Engineering, Data Science, Learning | Tags: Spark, Databricks, Deep Learning, Delta Lake, Machine Learning, MLflow, Notebook, Python, Scikit-learn

Introduction to Databricks Community Edition and MLflow Every day the number of tools helping Data Scientists to build models faster increases. Consequently, the need to manage the results and the…

Importing data to Databricks: external tables and Delta Lake

Importing data to Databricks: external tables and Delta Lake

Categories: Data Engineering, Data Science, Learning | Tags: Parquet, AWS, Amazon S3, Azure Data Lake Storage (ADLS), Databricks, Delta Lake, Python

During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a later…

Canada - Morocco - France

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and increase the time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.

Support Ukrain