Apache Kafka
Related articles

Comparison of different file formats in Big Data
Categories: Big Data, Data Engineering | Tags: Analytics, Avro, HDFS, Hive, Kafka, MapReduce, ORC, Batch processing, Big Data, CSV, Data Analytics, Data structures, Database, JSON, Protocol Buffers, Hadoop, Parquet, Spark, Kubernetes, XML
In data processing, there are different types of file formats to store your data sets. Each format has its own pros and cons depending on the use case and exists to serve one or several purposes…
By Aida NGOM
Jul 23, 2020

Policy enforcing with Open Policy Agent
Categories: Cyber Security, Data Governance | Tags: Kafka, Ranger, Authorization, REST, Cloud, Kubernetes, SSL/TLS
Open Policy Agent is an open-source multi-purpose policy engine. Its main goal is to unify policy enforcement across the cloud native stack. The project was created by Styra and it is currently…
Jan 22, 2020

Should you move your Big Data and Data Lake to the Cloud
Categories: Big Data, Cloud Computing | Tags: DevOps, AWS, Cloud, CDP, Databricks, GCP, Azure
Should you follow the trend and migrate your data, workflows and infrastructure to GCP, AWS and Azure? During the Strata Data Conference in New York, a general focus was put on moving customer’s Big…
Dec 9, 2019

Internship Data Science & Data Engineer - ML in production and streaming data ingestion
Categories: Data Engineering, Data Science | Tags: Flink, Kafka, DevOps, Hadoop, HBase, Spark, Kubernetes, Python
Context: The exponential evolution of data has turned the industry upside down by redefining data storage, processing and data ingestion pipelines. Mastering these methods considerably facilitates…
By David WORMS
Nov 26, 2019

InfraOps & DevOps Internship - build a Big Data & Kubernetes PaaS
Categories: Big Data, Containers Orchestration | Tags: Kafka, DevOps, LXD, NoSQL, Hadoop, Spark, Ceph, Kubernetes
Context: The acquisition of a high-capacity cluster is in line with Adaltas’ desire to build a PaaS-type offering to use and to provide Big Data and container orchestration platforms. The platforms are…
By David WORMS
Nov 26, 2019

Machine Learning model deployment
Categories: Big Data, Data Engineering, Data Science, DevOps & SRE | Tags: DevOps, Operation, AI, Cloud, Machine Learning, MLOps, On-premises, Schema
“Enterprise Machine Learning requires looking at the big picture […] from a data engineering and a data platform perspective,” lectured Justin Norman during the talk on the deployment of Machine…
Sep 30, 2019

Running Apache Hive 3, new features and tips and tricks
Categories: Big Data, Business Intelligence, DataWorks Summit 2019 | Tags: Druid, Hive, Kafka, JDBC, LLAP, Hadoop, Release and features
Apache Hive 3 brings a bunch of nice new features to the data warehouse. Unfortunately, like many major FOSS releases, it comes with a few bugs and not much documentation. It has been available since…
Jul 25, 2019

Spark Streaming part 1: build data pipelines with Spark Structured Streaming
Categories: Data Engineering, Learning | Tags: Kafka, Apache Spark Streaming, Big Data, Streaming, Spark
Spark Structured Streaming is a new engine introduced with Apache Spark 2 used for processing streaming data. It is built on top of the existing Spark SQL engine and the Spark DataFrame. The…
Apr 18, 2019

Deploying a secured Flink cluster on Kubernetes
Categories: Big Data | Tags: Flink, HDFS, Kafka, Elasticsearch, Encryption, Kerberos, SSL/TLS
When deploying secured Flink applications inside Kubernetes, you are faced with two choices. Assuming your Kubernetes is secure, you may rely on the underlying platform or rely on Flink native…
By David WORMS
Oct 8, 2018

Lando: Deep Learning used to summarize conversations
Categories: Data Science, Learning | Tags: Deep Learning, Micro Services, Open API, Kubernetes, Neural Network, Node.js
Lando is an application to summarize conversations using Speech To Text to translate the written record of a meeting into text and Deep Learning techniques to summarize contents. It allows users to…
By Yliess HATI
Sep 18, 2018

Curing the Kafka blindness with the UI manager
Categories: Big Data | Tags: Ambari, Kafka, Ranger, Hortonworks, HDP, HDF, JMX, UI
Today it’s really difficult for developers, operators and managers to visualize and monitor what happens in a Kafka cluster. This article covers a new graphical interface to oversee Kafka. It was…
Jun 20, 2018

Apache Metron in the Real World
Categories: Cyber Security, DataWorks Summit 2018 | Tags: Algorithm, HDFS, Kafka, NiFi, Solr, Storm, Elasticsearch, pcap, RDBMS, Metron, Spark, Data Science, SQL
Apache Metron is a storage and analytic platform specialized in cyber security. This talk was about demonstrating the usages and capabilities of Apache Metron in the real world. The presentation was…
May 29, 2018

Exposing Kafka on two different networks
Categories: Infrastructure | Tags: Kafka, Cloudera, Cyber Security, Network, VLAN, CDH
A Big Data setup usually requires you to have multiple network interfaces. Let’s see how to set up Kafka on more than one of them. Kafka is an open-source stream processing software platform system…
Jul 22, 2017

Apache Apex: next gen Big Data analytics
Categories: Data Science, Events, Tech Radar | Tags: Apex, Flink, Kafka, Storm, Tools, Hadoop, Data Science, Machine Learning
Below is a compilation of my notes taken during the presentation of Apache Apex by Thomas Weise from DataTorrent, the company behind Apex. Introduction: Apache Apex is an in-memory distributed parallel…
Jul 17, 2016

State of the Hadoop open-source ecosystem in early 2013
Categories: Big Data | Tags: Flume, Kafka, Mesos, Phoenix, Pig, Hadoop, Mahout, Data Science
Hadoop is already a large ecosystem and my guess is that 2013 will be the year when it grows even larger. There are some pieces that we no longer need to present. ZooKeeper, HBase, Hive, Pig, Flume…
By David WORMS
Jul 8, 2013