Home 2018-06-06T08:30:40+00:00

Big Data

Data Engineering

Data collection, data preparation, data lake, data governance

Data Science

Writing algorithms, Spark, machine learning, exploration, statistics, Python, R

Data Streaming

Message Bus, Key Performance Indicator (KPI), Threshold Detection, Time Window Queries, Intelligent Behaviors

Data Analytics

Visualization, notebooks

Latest articles

Curing the Kafka blindness with the UI manager

June 20th, 2018 | Categories: Big Data, DataWorks Summit 2018

Today, it is really difficult for developers, operators and managers to visualize and monitor what happens in a Kafka cluster. This article covers a new graphical interface to oversee Kafka. It was presented by George Vetticaden, [...]

A CoreOS development cluster with Vagrant and VirtualBox

June 20th, 2018 | Categories: Container, DevOps

Following CoreOS’s instructions on how to set up a development environment in VirtualBox did not work out well for me. Here are the steps I followed to get Container Linux up and running with Vagrant. [...]

Data Lake ingestion best practices

June 18th, 2018 | Categories: Data Engineering, DevOps

Creating a Data Lake requires rigor and experience. Here are some good practices for data ingestion, for both batch and stream architectures, that we recommend and implement with our customers. [...]

DataWorks Summit 2018: A few days speaking Hadoop

June 5th, 2018 | Categories: DataWorks Summit 2018

The Adaltas crew went to the DataWorks Summit 2018 held in Berlin on the 18th and 19th of April 2018. On this occasion, we compiled a series of articles about the conferences that have marked [...]

Accelerating query processing with materialized views in Apache Hive

May 31st, 2018 | Categories: Data Engineering, DataWorks Summit 2018

Jesus Camacho Rodriguez from Hortonworks gave a talk, “Accelerating query processing with materialized views in Apache Hive”, about the new materialized view feature coming in Apache Hive 3.0. This article covers the main principle of [...]

YARN and GPU Distribution for Machine Learning

May 30th, 2018 | Categories: Data Science, DataWorks Summit 2018

This article goes over the fundamental principles of Machine Learning and what tools are currently used to run machine learning algorithms. We will then see how a resource manager such as YARN can be useful [...]

Apache Metron in the Real World

May 29th, 2018 | Categories: Cyber Security, DataWorks Summit 2018, Events

Apache Metron is a storage and analytics platform specializing in cyber security. This talk demonstrated the usages and capabilities of Apache Metron in the real world. The presentation [...]