Blog

Get in control of your workflows with Apache Airflow

Presentation by Christian Trebing from BlueYonder. Introduction. Use case: how to handle data coming in regularly from customers? Option 1: use CRON (only time triggers, hard error handling, inconvenient when runs overlap). Option 2: writing a workflow processing tool. The start is easy, but soon you reach limits: invest much more than envisioned of work with [...]
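To make the "start is easy" remark about a home-grown workflow tool concrete, here is a minimal sketch in Python. It is not Airflow's API and not code from the presentation: the function `run_workflow`, the task names, and the status strings are all hypothetical, chosen only to illustrate dependency ordering and basic error handling, the two things plain CRON does not give you.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, depends_on):
    """Run callables in dependency order (hypothetical helper, not Airflow).

    tasks:      dict mapping a task name to a zero-argument callable
    depends_on: dict mapping a task name to the names it must wait for
    Returns a dict mapping each task name to "success", "failed" or "skipped".
    """
    graph = {name: depends_on.get(name, ()) for name in tasks}
    status = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = depends_on.get(name, ())
        # A task only runs when everything it depends on succeeded.
        if any(status[up] != "success" for up in upstream):
            status[name] = "skipped"
            continue
        try:
            tasks[name]()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status


log = []


def validate():
    # Simulated failure, to show how downstream tasks are skipped.
    raise ValueError("malformed customer file")


tasks = {
    "fetch": lambda: log.append("fetch"),
    "validate": validate,
    "load": lambda: log.append("load"),
    "notify": lambda: log.append("notify"),
}
depends_on = {"validate": ["fetch"], "load": ["validate"], "notify": ["fetch"]}

result = run_workflow(tasks, depends_on)
print(result)
```

A sketch like this covers ordering and failure propagation in a few lines. Retries, scheduling, backfills, and monitoring are where the effort grows, which is the limit the excerpt alludes to.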

July 17th, 2016 | Categories: Events

Apache Apex: next gen Big Data analytics

Presentation by Thomas Weise from DataTorrent (developers of Apex). Introduction. Apache Apex is an in-memory distributed parallel stream processing engine, like Flink or Storm. However, it is built with native Hadoop integration in mind: YARN is used for resource management and scheduling, and HDFS is used to store persistent state. Application development model. A stream [...]

July 17th, 2016 | Categories: Events

EclairJS – Putting a Spark in Web Apps

Presentation by David Fallside from IBM, images extracted from the presentation. Introduction. Web app development has moved from Java to NodeJS and JavaScript, which provide a simple and rich environment with NPM. EclairJS is a NodeJS library that provides bindings to a Spark application: an RDD is bound to a JS object that is made [...]

July 17th, 2016 | Categories: Events

Apache Apex with Apache SAMOA

Traditional Machine Learning: batch oriented; supervised (most common); training and scoring; one-time model building; data set split into training (model building), holdout (parameter tuning) and test (accuracy). Online Machine Learning: streaming; change; dynamically adapt to new patterns in data; change over time (concept drift) [...]
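The contrast between one-time model building and online adaptation can be sketched with a toy example in Python. This is not SAMOA or Apex code: the "model" is just an estimate of a mean, and the class `OnlineMean`, its `alpha` forgetting rate, and the synthetic stream are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class OnlineMean:
    """Exponentially weighted running mean, updated one observation at a time."""

    alpha: float = 0.2  # hypothetical forgetting rate: higher adapts faster
    value: float = 0.0
    seen: int = 0

    def update(self, x: float) -> float:
        if self.seen == 0:
            self.value = x  # the first observation initialises the estimate
        else:
            # Move the estimate a fraction alpha towards the new observation.
            self.value += self.alpha * (x - self.value)
        self.seen += 1
        return self.value


def batch_mean(training_set):
    """One-time model building: computed once, never revisited."""
    return sum(training_set) / len(training_set)


# Synthetic stream whose distribution changes halfway (concept drift).
stream = [1.0] * 50 + [10.0] * 50

batch_model = batch_mean(stream[:50])  # trained on the data seen before the drift

online_model = OnlineMean()
for x in stream:
    online_model.update(x)

print(f"batch estimate:  {batch_model:.2f}")
print(f"online estimate: {online_model.value:.2f}")
```

The batch estimate stays at the pre-drift level because it is never rebuilt, while the online estimate follows the stream towards the new level.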

July 17th, 2016 | Categories: Events

Hive, Calcite and Druid

BI/OLAP requires interactive visualization of complex data streams: real-time bidding events, user activity streams, voice call logs, network traffic flows, firewall events, application KPIs. Traditional solutions. RDBMS (MySQL...): don't scale, need caching, but ad hoc queries remain slow. Key/value store (HBase...): quick but takes forever to compute (pre-materialization of data). Context. Created in 2011, open-sourced [...]

July 14th, 2016 | Categories: Big Data

Network Namespace without Docker

Let's imagine the following use case: I am connected to several networks (wlan0, eth0, usb0). I want to choose which network I'm gonna use when I launch apps. My app doesn't allow me to choose a specific interface, it's delegated to the OS that chooses the default one. I could of course use Docker, which [...]

July 6th, 2016 | Categories: Blog, Hack

A simple connect middleware to transpile CoffeeScript files

This new module called connect-coffee-script is a Connect middleware used to serve JavaScript files written in CoffeeScript. This middleware is to be used by Connect or any Connect-compatible framework such as Express and Zappa. For those not familiar with CoffeeScript, it is a language which compiles into JavaScript. […]

July 4th, 2016 | Categories: Hack

The Red Hat Storage offering and its integration with Hadoop

I had the opportunity to be introduced to Red Hat Storage and Gluster during a presentation given jointly by Red Hat France and the company StartX. I have compiled my notes here, at least partially. I will finish with the integration between Red Hat Storage and Hadoop, more specifically what can be expected from it before carrying out an experiment in [...]

July 3rd, 2016 | Categories: Big Data

Virtual machines with static IP for your Hadoop development cluster

As I am about to install and test Ambari, this article is an opportunity to illustrate how I set up my development environment with multiple virtual machines. Ambari, the deployment and monitoring tool for Hadoop clusters, will be the subject of a yet to be written article. My virtual environment is VMware but VirtualBox has [...]

July 27th, 2013 | Categories: Hack

Catch ‘uncaughtException’ error in your mocha test

This isn't the first time I have faced this situation. Today, I finally found the time and energy to look for a solution. In your mocha test, let's say you need to test an expected "uncaughtException" event, the Node.js technique to catch the uncatchable. Easy, just register an "uncaughtException" listener on the process event emitter. Well, [...]

July 27th, 2013 | Categories: Hack