DataWorks Summit 2019

Articles related to the event at Cloudera DataWorks Summit in Berlin on March 19-21, 2019.

Running Apache Hive 3, new features and tips and tricks

Apache Hive 3 brings a bunch of new and nice features to the data warehouse. Unfortunately, like many major FOSS releases, it comes with a few bugs and not much documentation. It is available since July 2018 as part of HDP3 (Hortonworks Data Platform version 3). I will first review the new features available with [...]

By |2019-07-25T22:40:14+00:00July 25th, 2019|Categories: Big Data, DataWorks Summit 2019|Tags: , , , , , , , |0 Comments

Auto-scaling Druid with Kubernetes

Apache Druid is an open-source analytics data store which could leverage the auto-scaling abilities of Kubernetes due to its distributed nature and its reliance on memory. I was inspired by the talk “Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes” by Jinchul Kim during DataWorks Summit 2019 Europe in Barcelona. […]