Delta Lake is a storage layer that sits on top of an existing data lake and is compatible with Apache Spark. It helps tackle data reliability issues and manage the data lifecycle. The underlying storage format is Parquet, an open-source columnar format. Delta Lake adds ACID transactions, scalable metadata handling, data versioning, schema enforcement, and schema evolution, and it supports updates and deletes. It is available as an open-source project or as a managed service on Databricks.
During a machine learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of models developed at a later…
May 21, 2020