Data Lake

A Data Lake is a central repository for data coming from various sources, where the emphasis is on storing data rapidly and at low cost, at the expense of a well-defined structure. A wide variety of data can be stored in a data lake: structured data (such as the rows and columns of a classical RDBMS), semi-structured data (CSV, XML, and JSON files), and unstructured data (images, videos, emails, web pages…). The data is stored untouched in its raw format, which keeps it flexible for later usage. In practice, data lakes are often based on the Hadoop framework.
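
The "store raw, structure later" idea can be illustrated with a minimal sketch. It assumes a local folder stands in for the lake storage (in practice this would more likely be HDFS or an object store). The function names `ingest_raw` and `read_records`, as well as the `raw/<source>/` layout, are hypothetical and chosen only for illustration.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def ingest_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Store the payload byte for byte under a per-source folder (hypothetical layout)."""
    target = lake_root / "raw" / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)  # no parsing, no validation: the data stays untouched
    return target


def read_records(path: Path) -> list[dict]:
    """Apply a structure only at read time, based on the file extension."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        # Assumption: one JSON document per line
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if path.suffix == ".csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"no reader registered for {path.suffix}")


# A temporary directory plays the role of the lake in this sketch
lake = Path(tempfile.mkdtemp())

# Structured, semi-structured and unstructured payloads are all accepted as-is
csv_path = ingest_raw(lake, "crm", "customers.csv", b"id,name\n1,Alice\n2,Bob\n")
json_path = ingest_raw(lake, "web", "events.json", b'{"user": 1, "action": "login"}\n')
image_path = ingest_raw(lake, "uploads", "logo.png", b"\x89PNG\r\n\x1a\n")

# Structure is imposed later, by whoever consumes the data
customers = read_records(csv_path)
events = read_records(json_path)
```

Ingestion never inspects the content, which is what keeps it fast and cheap. The cost is moved to the reader, who must know (or discover) how to interpret each file.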

Related articles

Cloudera CDP and Cloud migration of your Data Warehouse

Categories: Big Data, Cloud Computing | Tags: Cloudera, Data Lake, Data Warehouse, Azure

While one of our customers is anticipating a move to the Cloud and with the recent announcement of Cloudera CDP availability in mid-September during the Strata conference, it seems like the appropriate…

By David WORMS

Dec 16, 2019

Innovation, project vs product culture in Data Science

Categories: Data Science, Data Governance | Tags: DevOps, Agile, Scrum

Data Science carries the jobs of tomorrow. It is closely linked to the understanding of the business use cases, the behaviors and the insights that will be extracted from existing data. The stakes are…

By David WORMS

Oct 8, 2019

Data Lake ingestion best practices

Categories: Big Data, Data Engineering | Tags: Avro, Hive, NiFi, ORC, Spark, File Format, Data Governance, HDF, Operation, Protocol Buffers, Registry, Schema, Data Lake

Creating a Data Lake requires rigor and experience. Here are some good practices around data ingestion both for batch and stream architectures that we recommend and implement with our customers…

By David WORMS

Jun 18, 2018

Oracle and Hive, how data are published?

Categories: Big Data | Tags: Hive, Sqoop, Oracle, Data Lake

In the past few days, I’ve published 3 related articles: a first one covering the option to integrate Oracle and Hadoop, a second one explaining how to install and use the Oracle SQL Connector with…

By David WORMS

Jul 6, 2013

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases into projects in production, how to reduce their costs, and how to shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.