Introduction

In a data processing landscape where Apache Spark gets most of the spotlight, let’s step into the background and check out one of its competitors, Apache Flink.

Duration: 6h
Format: Talk

Presentation

As a data engineer in a Big Data world, you have to know your technologies and which one best fits your use case. Flink’s processing engine, designed for data streams, is in many respects the best in its field, but it is often overlooked by customers because of Spark’s presence on the market.

Apache Flink excels at processing both unbounded and bounded data sets and provides out-of-the-box APIs for developing stateful streaming applications, from log monitoring to a website’s backend, machine learning and content recommendation.

You will come out of this talk with an understanding of Flink’s internals (and how they differ from Spark’s) and of how to set it up and start using it quickly.

Author

Cesar is a Big Data & Hadoop solution architect and data engineer with 3 years of experience in distributed systems. He has designed and built data ingestion workflows and real-time services, and has helped his clients identify their needs. He is versatile across Big Data platforms, from cluster planning, design and deployment to prototyping, industrializing and maintaining applications in collaboration with users, analysts, data scientists, engineers and operations teams.

Recently he has been working at Renault Digital to build a data streaming platform on Spark, and he keeps up with Flink in the hope of getting a client to work with it.