kafka

Spark Streaming part 1: build data pipelines with Spark Structured Streaming

Spark Structured Streaming is a new engine, introduced with Apache Spark 2, for processing streaming data. It is built on top of the existing Spark SQL engine and the Spark DataFrame. The Structured Streaming engine shares the same API as the Spark SQL engine and is as easy to use. Spark Structured Streaming [...]

April 18th, 2019 | Categories: Big Data, Data Engineering

Deploying a secured Flink cluster on Kubernetes

When deploying secured Flink applications inside Kubernetes, you are faced with two choices. Assuming your Kubernetes is secure, you may rely on the underlying platform or rely on Flink native solutions to secure your application from the inside. Note that these two solutions are not mutually exclusive. […]

October 8th, 2018 | Categories: Big Data, Cyber Security

Lando: Deep Learning used to summarize conversations

Lando is an application that summarizes conversations, using Speech To Text to transcribe the record of a meeting into text and Deep Learning techniques to summarize its contents. It allows users to quickly understand the context of the conversation. During the course of our internship at Adaltas, we worked on a new project called Lando to [...]

Curing the Kafka blindness with the UI manager

Today it’s really difficult for developers, operators and managers to visualize and monitor what happens in a Kafka cluster. This article covers a new graphical interface to oversee Kafka. It was presented by George Vetticaden, VP Management product at Hortonworks, during the DataWorks Summit at the San Jose Conference Center in June 2018. […]

June 20th, 2018 | Categories: Big Data, DataWorks Summit 2018

Exposing Kafka on two different networks

A Big Data setup usually requires multiple network interfaces. Let’s see how to set up Kafka on more than one of them. Kafka is an open-source stream-processing platform which functions like a distributed publish/subscribe messaging system. It is designed for high throughput with built-in partitioning, replication, and fault tolerance. [...]

July 22nd, 2017 | Categories: Blog