Blog

MiNiFi: Data at Scales & the Values of Starting Small

This post is part of the series on the Dataworks Summit 2017 (ex-Hadoop Summit). The speaker is Aldrin Piri from Hortonworks. This talk briefly presented Apache NiFi and explained where MiNiFi came from: basically, it's a minimal NiFi agent to deploy on small devices to bring data to a cluster's NiFi pipeline (e.g. IoT). Here are [...]

By | 2017-07-24T21:37:13+00:00 July 8th, 2017|Categories: Blog, Events|0 Comments

HDP cluster supervision

About: With the current growth of Big Data technologies, more and more companies are building their own clusters in the hope of getting some value from their data. One main concern while building these infrastructures is the ability to continuously monitor the cluster's health and report issues as fast as possible. This is where supervision comes in. [...]

By | 2017-11-21T20:08:44+00:00 July 5th, 2017|Categories: Big Data|0 Comments

Get in control of your workflows with Apache Airflow

Presentation by Christian Trebing from BlueYonder. Introduction. Use case: how to handle data coming in regularly from customers? Option 1: use CRON (only time triggers; hard error handling; inconvenient when overlapping). Option 2: writing a workflow processing tool (start is easy; soon you reach limits: invest much more than envisioned of work with [...]

By | 2017-07-24T21:37:13+00:00 July 17th, 2016|Categories: Events|0 Comments

Apache Apex : next gen Big Data analytics

Presentation by Thomas Weise from DataTorrent (developers of Apex). Introduction: Apache Apex is an in-memory distributed parallel stream processing engine, like Flink or Storm. However, it is built with native Hadoop integration in mind: YARN is used for resource management and scheduling; HDFS is used to store persistent state. Application development model: A stream [...]

By | 2017-07-24T21:37:13+00:00 July 17th, 2016|Categories: Events|0 Comments

EclairJS – Putting a Spark in Web Apps

Presentation by David Fallside from IBM; images extracted from the presentation. Introduction: Web app development has moved from Java to NodeJS and JavaScript. It provides a simple and rich environment with NPM. EclairJS is a NodeJS library that provides bindings to a Spark application: An RDD is bound to a JS object that is made [...]

By | 2017-07-24T21:37:14+00:00 July 17th, 2016|Categories: Events|0 Comments

Apache Apex with Apache SAMOA

Traditional Machine Learning - Batch Oriented - Supervised - most common - Training and Scoring - One time model building - Data set - Training: Model building - Holdout: Parameter tuning - Test: Accuracy Online Machine Learning - Streaming - Change - Dynamically adapt to new patterns in Data - Change over time (concept drift) [...]

By | 2017-07-24T19:57:56+00:00 July 17th, 2016|Categories: Events|0 Comments

Hive, Calcite and Druid

BI/OLAP requires interactive visualization of complex data streams: real-time bidding events, user activity streams, voice call logs, network traffic flows, firewall events, application KPIs. Traditional solutions: RDBMS (MySQL...): don't scale, need caching, but ad hoc queries remain slow. Key/value store (HBase...): quick but takes forever to compute (pre-materialization of data). Context: Created in 2011, open-sourced [...]

By | 2017-11-21T20:08:31+00:00 July 14th, 2016|Categories: Big Data|0 Comments

Network Namespace without Docker

Let's imagine the following use case: I am connected to several networks (wlan0, eth0, usb0). I want to choose which network I'm gonna use when I launch apps. My app doesn't allow me to choose a specific interface, it's delegated to the OS that chooses the default one. I could of course use Docker, which [...]

By | 2017-10-24T12:16:02+00:00 July 6th, 2016|Categories: Blog, Hack|0 Comments

A simple connect middleware to transpile CoffeeScript files

This new module, called connect-coffee-script, is a Connect middleware used to serve JavaScript files written in CoffeeScript. This middleware is to be used by Connect or any Connect-compatible framework such as Express and Zappa. For those not familiar with CoffeeScript, it is a language that transpiles into JavaScript. […]

By | 2017-11-21T20:07:51+00:00 July 4th, 2016|Categories: Hack|0 Comments