
About César Berezowski

Big Data consultant @ Adaltas since 2015. I enjoy discovering and experimenting with new technologies in addition to my day-to-day work.

What’s new in Apache Spark 2.3?

Let’s dive into the features offered by the new 2.3 release of Apache Spark. […]

From Dockerfile to Ansible Containers

Presentation by Tomas Tomecek from Red Hat’s containerization team. This talk introduced the Dockerfile format and the Ansible Container tool, then compared the two. […]

October 25th, 2017 | Categories: Open Source Summit Europe 2017

Exposing Kafka on two different networks

A Big Data setup usually requires multiple network interfaces; let’s see how to set up Kafka on more than one of them. Kafka is an open-source stream processing platform which functions like a distributed publish/subscribe messaging system. It is designed for high throughput, with built-in partitioning, replication, and fault tolerance. [...]

July 22nd, 2017 | Categories: Blog

Change Ambari’s topbar color

We recently had a client with multiple environments (Production, Integration, Testing, ...) running on HDP and managed using one Ambari instance per cluster. One of the questions that came up was the following: we need a way to distinguish our environments when on Ambari, and the cluster name is visually not enough; how can [...]

July 9th, 2017 | Categories: Hack

MiNiFi: Data at Scales & the Values of Starting Small

This post is part of the series on the DataWorks Summit 2017 (ex-Hadoop Summit). The speaker is Aldrin Piri from Hortonworks. This conference briefly presented Apache NiFi and explained where MiNiFi came from: basically, it's a minimal NiFi agent to deploy on small devices to bring data to a cluster's NiFi pipeline (e.g. IoT). Here are [...]

July 8th, 2017 | Categories: Blog, Events

Get in control of your workflows with Apache Airflow

Presentation by Christian Trebing from BlueYonder. Introduction. Use case: how to handle data coming in regularly from customers? Option 1: use CRON (only time triggers; hard error handling; inconvenient when overlapping). Option 2: writing a workflow processing tool (start is easy; soon you reach limits: invest much more than envisioned of work with [...]

July 17th, 2016 | Categories: Events

Apache Apex: next gen Big Data analytics

Presentation by Thomas Weise from DataTorrent (developers of Apex). Introduction: Apache Apex is an in-memory distributed parallel stream processing engine, like Flink or Storm. However, it is built with native Hadoop integration in mind: YARN is used for resource management and scheduling, and HDFS is used to store persistent state. Application development model: A stream [...]

July 17th, 2016 | Categories: Events

EclairJS – Putting a Spark in Web Apps

Presentation by David Fallside from IBM; images extracted from the presentation. Introduction: Web app development has moved from Java to NodeJS and JavaScript. It provides a simple and rich environment with NPM. EclairJS is a NodeJS library that provides bindings to a Spark application: an RDD is bound to a JS object that is made [...]

July 17th, 2016 | Categories: Events