Data lineage is the process of documenting and monitoring the origin, transformations and movements of data throughout its life cycle. It provides a comprehensive and transparent view of how data is collected, transformed and used across the different systems, processes and applications within an organization.
The visibility it offers helps ensure data quality, security and compliance. It allows us to answer questions such as: Where does this data come from? How has it been modified or processed? Where is it stored? Who has access to it? This detailed understanding of the data journey is essential for making informed decisions, supporting data governance, facilitating audits and complying with regulations such as the GDPR (General Data Protection Regulation) in the European Union, or other data privacy and security standards.
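To make the question "where does this data come from?" concrete, lineage can be pictured as a directed graph in which jobs connect input datasets to output datasets. The sketch below is only an illustration of that idea. The `LineageGraph` class and the dataset and job names are hypothetical, and it does not use or reflect the OpenLineage API.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage store (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps each output dataset to the (source dataset, job) pairs that produced it.
        self._inputs = defaultdict(set)

    def record(self, job, inputs, output):
        """Record that `job` read `inputs` and wrote `output`."""
        for source in inputs:
            self._inputs[output].add((source, job))

    def upstream(self, dataset):
        """Return every dataset that directly or indirectly feeds `dataset`."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for source, _job in self._inputs.get(current, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


# Hypothetical pipeline: two jobs, four datasets.
graph = LineageGraph()
graph.record("clean_orders", ["raw_orders"], "orders_clean")
graph.record("build_report", ["orders_clean", "customers"], "sales_report")

print(sorted(graph.upstream("sales_report")))
```

Walking the graph backwards from `sales_report` lists every dataset it depends on, which is the kind of answer a lineage system is meant to provide during an audit or an incident investigation.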
OpenLineage is an open-source specification for data lineage. The specification is complemented by Marquez, its reference implementation. Since its launch in late 2020, OpenLineage has been a presence…
Dec 19, 2023