Big Data

Curing the Kafka blindness with the UI manager

Today it’s really difficult for developers, operators and managers to visualize and monitor what happens in a Kafka cluster. This article covers a new graphical interface to oversee Kafka. It was presented by George Vetticaden, VP of Product Management at Hortonworks, during the DataWorks Summit at the San Jose Conference Center in June 2018. […]

June 20th, 2018 | Categories: Big Data, DataWorks Summit 2018

TensorFlow on Spark 2.3: The Best of Both Worlds

The integration of TensorFlow with Spark has a lot of potential and creates new opportunities. […]

Running Enterprise Workloads in the Cloud with Cloudbreak

This article is based on Peter Darvasi and Richard Doktorics’ talk "Running Enterprise Workloads in the Cloud" at the DataWorks Summit 2018 in Berlin. It presents Cloudbreak, Hortonworks’ automated deployment tool for cloud environments, describes and comments on the features that Peter and Richard explained in their talk, and gives some personal guidelines on when and why [...]

May 28th, 2018 | Categories: Big Data, DataWorks Summit 2018

Omid: Scalable and highly available transaction processing for Apache Phoenix

Apache Omid provides a transactional layer on top of key/value NoSQL databases. In practice, it is usually used on top of Apache HBase. […]

May 24th, 2018 | Categories: Big Data, DataWorks Summit 2018, Events

Apache Beam: a unified programming model for data processing pipelines

In this article, we will review the concepts, the history and the future of Apache Beam, which may well become the new standard for defining data processing pipelines. […]

Present and future of Hadoop workflow scheduling: Oozie 5.x

During the DataWorks Summit Europe 2018 in Berlin, I had the opportunity to attend a breakout session on Apache Oozie. The session covered the new features released in Oozie 5.0 and the features planned for Oozie 5.x, which are the main subject of this article. The speakers also spent some time discussing Apache Ambari’s Workflow Scheduler and its way [...]

May 23rd, 2018 | Categories: Big Data, DataWorks Summit 2018

Essential questions about Time Series

Today, the bulk of Big Data is temporal. We see it in the media and among our customers: smart meters, banking transactions, smart factories, connected vehicles … IoT and Big Data go hand in hand. […]

March 19th, 2018 | Categories: Big Data, Data Engineering

HDP cluster supervision

With the current growth of Big Data technologies, more and more companies are building their own clusters in the hope of getting value from their data. One main concern while building these infrastructures is the capacity to continuously monitor the cluster's health and report issues as fast as possible. This is where supervision comes in. [...]

July 5th, 2017 | Categories: Big Data