
About César Berezowski

Big Data consultant at Adaltas since 2015. I enjoy discovering new things and experimenting with new technologies in addition to my day-to-day work.

Apache Flink: past, present and future

Apache Flink is a little gem which deserves a lot more attention. Let’s dive into Flink’s past, its current state and the future it is heading toward by following the keynotes and presentations at Flink Forward 2018. […]

November 5th, 2018 | Categories: Big Data, Data Engineering

From Dockerfile to Ansible Containers

Presentation by Tomas Tomecek from Red Hat’s containerization team. The talk introduced the Dockerfile format and the Ansible Container tool, then compared the two. […]

October 25th, 2017 | Categories: Open Source Summit Europe 2017

Exposing Kafka on two different networks

A Big Data setup often requires multiple network interfaces, so let’s see how to set up Kafka on more than one of them. Kafka is an open-source stream-processing platform that works like a distributed publish/subscribe messaging system. It is designed for high throughput, with built-in partitioning, replication, and fault tolerance. [...]
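One common way to do this, assuming Kafka 0.10.2 or later (named listeners, KIP-103) and not necessarily the method used in the post, is to declare one listener per network and advertise a host name that is reachable from each network. The sketch below is a hypothetical Python helper that renders the four relevant `server.properties` entries and checks a few constraints Kafka enforces. The listener names `INTERNAL`/`EXTERNAL`, the ports and all host names are illustrative placeholders.

```python
"""Render the listener section of a Kafka broker's server.properties.

Sketch only: it uses Kafka's named-listener settings, which may differ
from the exact approach in the post. Host names are hypothetical.
"""

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Listener:
    name: str             # listener name, e.g. "INTERNAL" or "EXTERNAL"
    bind_host: str        # address to bind; "0.0.0.0" means all interfaces
    advertised_host: str  # name that clients on this network can resolve
    port: int
    protocol: str = "PLAINTEXT"  # security protocol used on this listener


def _join(listeners: list[Listener], fmt: Callable[[Listener], str]) -> str:
    # Kafka expects comma-separated entries, one per listener
    return ",".join(fmt(x) for x in listeners)


def render_listener_config(listeners: list[Listener], inter_broker: str) -> str:
    names = [x.name for x in listeners]
    if len(set(names)) != len(names):
        raise ValueError("listener names must be unique")
    if len({x.port for x in listeners}) != len(listeners):
        # Kafka rejects two listeners on the same port (recent versions
        # allow an IPv4/IPv6 exception that this sketch ignores)
        raise ValueError("each listener needs its own port")
    if any(x.advertised_host == "0.0.0.0" for x in listeners):
        raise ValueError("0.0.0.0 cannot be advertised to clients")
    if inter_broker not in names:
        raise ValueError(f"unknown inter-broker listener: {inter_broker}")

    lines = [
        "listeners=" + _join(listeners, lambda x: f"{x.name}://{x.bind_host}:{x.port}"),
        "advertised.listeners="
        + _join(listeners, lambda x: f"{x.name}://{x.advertised_host}:{x.port}"),
        "listener.security.protocol.map="
        + _join(listeners, lambda x: f"{x.name}:{x.protocol}"),
        # brokers talk to each other over this listener
        f"inter.broker.listener.name={inter_broker}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_listener_config(
        [
            Listener("INTERNAL", "0.0.0.0", "broker1.internal.example.com", 9092),
            Listener("EXTERNAL", "0.0.0.0", "broker1.example.com", 9093),
        ],
        inter_broker="INTERNAL",
    ), end="")
```

With such a configuration, clients on the internal network would bootstrap against port 9092 and external clients against 9093; each receives metadata pointing at the host name advertised for the listener it connected through.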

July 22nd, 2017 | Categories: Blog

Change Ambari’s topbar color

We recently had a client with multiple environments (Production, Integration, Testing, ...) running on HDP, each cluster managed by its own Ambari instance. One of the questions that came up was the following: we need a way to tell our environments apart in Ambari, as the cluster name alone is not visually distinctive enough; how can [...]

July 9th, 2017 | Categories: Hack

MiNiFi: Data at Scales & the Values of Starting Small

This post is part of the DataWorks Summit 2017 (formerly Hadoop Summit) series. The speaker was Aldrin Piri from Hortonworks. The talk briefly presented Apache NiFi and explained where MiNiFi came from: it is essentially a minimal NiFi agent deployed on small devices (e.g. IoT) to bring data into a cluster’s NiFi pipeline. Here are [...]

July 8th, 2017 | Categories: Blog, Events

Get in control of your workflows with Apache Airflow

Presentation by Christian Trebing from BlueYonder. Introduction: the use case is how to handle data coming in regularly from customers. Option 1: use cron, which offers only time triggers, makes error handling hard, and is inconvenient when runs overlap. Option 2: write a workflow processing tool; starting is easy, but you soon reach its limits and invest much more work than envisioned with [...]
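Most of the cron pain points listed above come down to dependencies and failure handling. As a hypothetical, in-process toy (not Airflow’s API), the sketch below runs tasks in dependency order, retries a failing task a few times, and marks downstream tasks as not run when an upstream task ultimately fails. In Airflow itself these ideas are expressed as a DAG of operators with a per-task `retries` setting; the function name `run_workflow` and the state strings here are illustrative only.

```python
"""Toy workflow runner: dependency order, retries, upstream-failure skipping.

Hypothetical sketch for illustration only; this is not Airflow's API.
"""

from collections.abc import Callable
from graphlib import TopologicalSorter


def run_workflow(
    tasks: dict[str, Callable[[], None]],
    deps: dict[str, set[str]],
    retries: int = 2,
) -> dict[str, str]:
    """Run each task once its upstream tasks succeeded; return task -> state."""
    unknown = {d for ds in deps.values() for d in ds} | set(deps)
    unknown -= set(tasks)
    if unknown:
        raise ValueError(f"dependencies on unknown tasks: {sorted(unknown)}")

    graph = {name: deps.get(name, set()) for name in tasks}
    status: dict[str, str] = {}
    # static_order() yields each task after all of its dependencies
    # (and raises graphlib.CycleError if the graph has a cycle)
    for name in TopologicalSorter(graph).static_order():
        if any(status[up] != "success" for up in graph[name]):
            status[name] = "upstream_failed"  # do not run on bad input
            continue
        for _attempt in range(1 + retries):
            try:
                tasks[name]()
            except Exception:  # any error counts as a failed attempt
                status[name] = "failed"
            else:
                status[name] = "success"
                break
    return status


if __name__ == "__main__":
    steps = {
        "fetch": lambda: print("fetching customer files"),
        "clean": lambda: print("cleaning"),
        "load": lambda: print("loading into warehouse"),
    }
    print(run_workflow(steps, {"clean": {"fetch"}, "load": {"clean"}}))
```

A real scheduler adds what this toy leaves out: triggers, retry delays, persisted state and protection against overlapping runs.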

July 17th, 2016 | Categories: Events

Apache Apex: next-gen Big Data analytics

Presentation by Thomas Weise from DataTorrent (the developers of Apex). Introduction: Apache Apex is an in-memory, distributed, parallel stream processing engine, like Flink or Storm. However, it is built with native Hadoop integration in mind: YARN is used for resource management and scheduling, and HDFS is used to store persistent state. Application development model: A stream [...]

July 17th, 2016 | Categories: Events