About César Berezowski

Big Data consultant at Adaltas since 2015, César enjoys exploring and experimenting with new technologies in addition to his day-to-day work.

From Dockerfile to Ansible Containers

Presentation by Tomas Tomecek from Red Hat’s containerization team. This talk introduced the Dockerfile format and the Ansible Container tool, then compared the two. […]

October 25th, 2017 | Categories: Open Source Summit Europe 2017

Exposing Kafka on two different networks

The setup described in this article uses CDH 5.7.1 with Kafka 2.0.1.5, installed using parcels. One of the clusters we are working on has the following network configuration: a "data" network exposing our edge, Kafka and master nodes to the outside world, and an "internal" network dedicated to the cluster for our worker nodes. We use Kafka for data [...]

July 22nd, 2017 | Categories: Blog

Change Ambari’s topbar color

We recently had a client with multiple environments (Production, Integration, Testing, ...) running on HDP and managed using one Ambari instance per cluster. One of the questions that came up was the following: we need a way to distinguish our environments when on Ambari, and the cluster name is visually not enough; how can [...]

July 9th, 2017 | Categories: Hack

MiNiFi: Data at Scales & the Values of Starting Small

This post is part of the series on the DataWorks Summit 2017 (ex-Hadoop Summit). The speaker is Aldrin Piri from Hortonworks. This talk briefly presented Apache NiFi and explained where MiNiFi came from: basically, it is a minimal NiFi agent to deploy on small devices to bring data to a cluster's NiFi pipeline (e.g. IoT). Here are [...]

July 8th, 2017 | Categories: Blog, Events

Get in control of your workflows with Apache Airflow

Presentation by Christian Trebing from BlueYonder. Introduction. Use case: how to handle data coming in regularly from customers? Option 1: use cron; only time triggers, hard error handling, inconvenient when jobs overlap. Option 2: write a workflow processing tool; starting is easy, but you soon reach limits: invest much more than envisioned of work with [...]

July 17th, 2016 | Categories: Events

Apache Apex: next gen Big Data analytics

Presentation by Thomas Weise from DataTorrent (developers of Apex). Introduction: Apache Apex is an in-memory distributed parallel stream processing engine, like Flink or Storm. However, it is built with native Hadoop integration in mind: YARN is used for resource management and scheduling, and HDFS is used to store persistent state. Application development model: A stream [...]

July 17th, 2016 | Categories: Events

EclairJS – Putting a Spark in Web Apps

Presentation by David Fallside from IBM; images extracted from the presentation. Introduction: Web app development has moved from Java to NodeJS and JavaScript. It provides a simple and rich environment with NPM. EclairJS is a NodeJS library that provides bindings to a Spark application: an RDD is bound to a JS object that is made [...]

July 17th, 2016 | Categories: Events