Python

Auto-scaling Druid with Kubernetes

Apache Druid is an open-source analytics data store that, thanks to its distributed nature and its reliance on memory, could leverage the auto-scaling abilities of Kubernetes. I was inspired by the talk “Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes” by Jinchul Kim during DataWorks Summit 2019 Europe in Barcelona. […]

Apache Beam: a unified programming model for data processing pipelines

In this article, we will review the concepts, the history and the future of Apache Beam, which may well become the new standard for defining data processing pipelines. […]

Get in control of your workflows with Apache Airflow

Below is a compilation of my notes taken during the presentation of Airflow by Christian Trebing from BlueYonder. Introduction. Use case: how to handle data coming in regularly from customers? Option 1: use CRON: only time triggers; hard error handling; inconvenient when overlapping. Option 2: writing a workflow processing tool: start is easy [...]

July 17th, 2016 | Categories: Events, Tech Radar