Home 2017-11-23T09:09:31+00:00


Data Engineering

Data collection, data preparation, data lakes, data governance

Data Science

Writing algorithms, Spark, machine learning, exploration, statistics, Python, R

Data Streaming

Message Bus, Key Performance Indicator (KPI), Threshold Detection, Time Window Queries, Intelligent Behaviors

Data Analytics

Visualization, notebooks

Latest articles

Merging multiple files in hadoop

By | July 12th, 2013 | Categories: Big Data

This is a command I use to concatenate the files stored in Hadoop HDFS that match a glob expression into a single file. It uses the "getmerge" utility of "hadoop fs", but unlike "getmerge", the [...]
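For readers who want to try the idea before opening the full article, here is a minimal sketch in Python. It is not necessarily the author's command. It assumes a `hadoop` client on the PATH and relies on `hadoop fs -cat`, which expands glob patterns against HDFS and writes the matching files to standard output. Redirecting that stream into one local file produces a merged copy. The helper names `build_cat_command` and `merge_hdfs_glob` are hypothetical.

```python
import subprocess
from typing import List


def build_cat_command(hdfs_glob: str) -> List[str]:
    """Build a `hadoop fs -cat` invocation for an HDFS glob pattern.

    The pattern is passed as a single argv element with no local shell
    involved, so the glob is expanded by Hadoop against HDFS rather than
    by a local shell against the local filesystem.
    """
    return ["hadoop", "fs", "-cat", hdfs_glob]


def merge_hdfs_glob(hdfs_glob: str, local_path: str) -> None:
    """Stream every HDFS file matching `hdfs_glob` into one local file.

    Hypothetical helper: assumes the `hadoop` CLI is installed and
    configured to reach the target cluster.
    """
    with open(local_path, "wb") as out:
        # check=True raises CalledProcessError if hadoop exits non-zero
        subprocess.run(build_cat_command(hdfs_glob), stdout=out, check=True)
```

For example, `merge_hdfs_glob("/user/me/logs/part-*", "merged.txt")` (hypothetical paths) would write every matching part file into `merged.txt`. Passing an argument list instead of a shell string keeps the local shell from expanding the `*` itself.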

Maven 3 behind a proxy

By | July 11th, 2013 | Categories: Hack

Maven 3 isn't so different from its predecessor, version 2. You will migrate most of your projects between the two versions quite easily. That wasn't the case a few years ago between versions 1 and [...]

The state of Hadoop distributions

By | July 11th, 2013 | Categories: Big Data

Apache Hadoop is of course available for download from its official webpage. However, downloading and installing the various components that make up a Hadoop cluster is a daunting task. [...]

Node CSV version 0.2.7

By | July 9th, 2013 | Categories: Node.js

As I release version 0.2.7 of the CSV parser for Node.js, I’m stopping here to write a few lines about what made it into this release. […]