Blog

Maven 3 behind a proxy

Maven 3 isn't so different from its previous version, Maven 2. You will migrate most of your projects between the two versions quite easily. That wasn't the case a few years ago between versions 1 and 2. However, it took me some time to find out how to properly configure my proxy settings, and this article [...]

July 11th, 2013 | Categories: Hack

The state of Hadoop distributions

Apache Hadoop is, of course, available for download from its official web page. However, downloading and installing the several components that make up a Hadoop cluster is a daunting task. Below is a list of the main distributions that include Hadoop. This follows an article published a few days ago about [...]

July 11th, 2013 | Categories: Big Data

Definitions of machine learning algorithms present in Apache Mahout

Apache Mahout is a machine learning library built for scalability. Its core algorithms for clustering, classification and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. It contains various algorithms, which we define below. Each of them may come with multiple implementations. A majority, but not all, of the [...]

July 8th, 2013 | Categories: Big Data

About the new BSD license and how it differs from other BSD licenses

As a non-restrictive open source license, the “new BSD license” is commonly used across the Node.js community. However, it is only one of the available BSD licenses, alongside the original “BSD license” and the “Simplified BSD License”, also known as the “FreeBSD License”. It may seem confusing as to how you should choose your license [...]

July 8th, 2013 | Categories: Data Governance

State of the Hadoop open-source ecosystem in early 2013

Hadoop is already a large ecosystem, and my guess is that 2013 will be the year it grows even larger. Some pieces no longer need an introduction: ZooKeeper, HBase, Hive, Pig, Flume, Oozie, Avro, Sqoop, Cascading, Cascalog, HUE and Mahout, to name a few. At the same time, many open source [...]

July 8th, 2013 | Categories: Big Data

Oracle and Hive: how are data published?

In the past few days, I've published three related articles: the first covering the options to integrate Oracle and Hadoop, the second explaining how to install and use the Oracle SQL Connector with HDFS, and the third explaining how to install and use the Oracle SQL Connector with Hive. Those last two [...]

July 6th, 2013 | Categories: Big Data

Tutorial for creating and publishing a new Node.js module

In this tutorial, I provide complete instructions for creating a new Node.js module: writing the code in CoffeeScript, publishing it on GitHub, sharing it with fellow Node.js developers through NPM, testing it with Mocha, Should and JsCoverage, and integrating it with Travis. Because of its simplicity, this module could also be used as a scaffolding [...]

July 4th, 2013 | Categories: Node.js

Components for CDH and HDP

I was interested in comparing the different components distributed by Cloudera and Hortonworks. This also gives us an idea of the versions packaged by the two distributions. At the time of this writing, April 2013, I am comparing the Cloudera distribution 4.2.0 and the Hortonworks Data Platform 2.0.0. CDH 4.2.0: bigtop-jsvc 1.0.10-cdh4.2.0, bigtop-tomcat 6.0.35-cdh4.2.0, datafu [...]

July 2nd, 2013 | Categories: Big Data

Traversing arrays asynchronously in Node.js with Each

Node.js libraries for managing and simplifying asynchronous calls are legion. They are the kind of library everyone writes for themselves and eventually publishes. Their goal is to reduce the spaghetti code made of nested callbacks. I am no exception. After a year and a half of intensive use, I think that it [...]

July 29th, 2012 | Categories: Node.js