Adaltas works with open source web technologies. Our focus is on rich Internet applications based on HTML5, the server-side Node.js stack, NoSQL storage and big data processing, notably on the Hadoop platform.

[en] Catch ‘uncaughtException’ error in your mocha test

This isn’t the first time I have faced this situation. Today, I finally found the time and energy to look for a solution. Let’s say that, in your mocha test, you need to test an expected “uncaughtException” event, the Node.js technique to catch the uncatchable. Easy: just register an “uncaughtException” listener on the process event emitter. Well, not so easy, and not so complicated either.
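One reason this is “not so easy” is that a test runner typically installs its own “uncaughtException” listener, which fails the test before yours gets a say. The following is a minimal sketch of one possible workaround, not necessarily the solution from the full article: the helper name `captureUncaughtException` is hypothetical, and it relies only on the standard `process.listeners`, `process.removeAllListeners`, `process.once` and `process.on` calls.

```javascript
// Hypothetical helper (an assumption, not taken from the original article):
// swap out every 'uncaughtException' listener, including the one installed
// by the test runner, for a one-shot handler, then put the originals back.
function captureUncaughtException(onError) {
  // Copy of the listeners currently registered on the process emitter.
  const saved = process.listeners('uncaughtException');
  process.removeAllListeners('uncaughtException');
  process.once('uncaughtException', (err) => {
    // Restore the original listeners before handing the error over.
    for (const listener of saved) {
      process.on('uncaughtException', listener);
    }
    onError(err);
  });
}

// Usage inside a mocha test (assumes mocha's `it` and Node's `assert`):
//
//   it('reports the error thrown asynchronously', (done) => {
//     captureUncaughtException((err) => {
//       assert.strictEqual(err.message, 'boom');
//       done();
//     });
//     setTimeout(() => { throw new Error('boom'); }, 0);
//   });
```

Note that listeners originally registered with `once` are restored here with `on`, which is good enough for a sketch but worth refining in real code.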

[en] Hadoop development cluster of virtual machines with static IP using VirtualBox

A few days ago, I explained how to set up a cluster of virtual machines with static IPs and Internet access, suitable for hosting your Hadoop cluster locally for development. At the time, I used VMware. I’m coming back to the same topic, this time using the VirtualBox manager.

I decided to give VirtualBox a chance as an alternative to VMware for multiple reasons. The installation of CentOS partially failed at the end, and I needed to reboot the machine. No real consequences, but not a thing I appreciate. VirtualBox is free and open source, whereas VMware isn’t open source and is even sold commercially on OS X. Another goody I was interested in is the ability to choose the IP address range for my internal network; I have limited memory dedicated to those sorts of things. After many trials, I managed to install the VMware tools only once; don’t ask me how I did it, another trauma. Finally, I have the sweet, hypothetical idea of scripting the virtual machine provisioning and installation process. If I’m not wrong, that shouldn’t be a problem with VirtualBox.

[en] Virtual machines with static IP for your Hadoop development cluster

As I am about to install and test Ambari, this article is an opportunity to illustrate how I set up my development environment with multiple virtual machines. Ambari, the deployment and monitoring tool for Hadoop clusters, will be the subject of a yet-to-be-written article. My virtual environment is VMware, but VirtualBox has the same network functionality and should work as well.

Note: I have since written a similar article covering additional functionality and using VirtualBox.

What’s really important here is to assign each virtual machine a fixed IP address that won’t change over time. I personally work on a MacBook Pro laptop, and I found it very frustrating to restart each of the Hadoop components whenever I received new IP addresses while switching between networks. Additionally, the setup should provide an Internet gateway.

[en] Splitting an HDFS file into multiple Hive tables

I am going to show how to split a file stored as CSV inside HDFS into multiple Hive tables based on the content of each record. The context is simple. We are using Flume to collect logs from all over our datacenter through syslog. The stream is dumped into HDFS files partitioned by minute. Oozie listens for newly created directories and, when one is ready, distributes its content across various Hive tables, one for each log category.

For example, we want ssh logs to go to the ssh table. If we cannot determine which category a log record belongs to, we dump it into an “xlogs” table. Later on, when appropriate new rules are added, we should be able to iterate through the “xlogs” table and dispatch its records across the appropriate tables.
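The routing logic described above can be sketched independently of Hive. The snippet below is only an illustration in JavaScript, not the Hive-based implementation the article covers: the rule list, the regular expression, the sample records and the function names `categorize` and `dispatch` are all assumptions made for the example.

```javascript
// Hypothetical rules: each one maps a target table name to a test
// applied to the raw log record.
const rules = [
  { table: 'ssh', matches: (record) => /\bsshd\b/.test(record) },
];

// Return the table of the first matching rule, or 'xlogs' when none applies.
function categorize(record, activeRules = rules) {
  const rule = activeRules.find((r) => r.matches(record));
  return rule ? rule.table : 'xlogs';
}

// Group records by target table name.
function dispatch(records, activeRules = rules) {
  const tables = {};
  for (const record of records) {
    const table = categorize(record, activeRules);
    (tables[table] = tables[table] || []).push(record);
  }
  return tables;
}
```

Once a new rule exists, the same `dispatch` call can be run over the records previously parked in “xlogs”, passing the extended rule list as the second argument.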

[en] Merging multiple files in Hadoop

This is a command I used to concatenate the files stored in Hadoop HDFS matching a globbing expression into a single file. It uses the “getmerge” utility of “hadoop fs” but, contrary to “getmerge”, the final merged file isn’t put into the local filesystem but inside HDFS.

[en] Maven 3 behind a proxy

Maven 3 isn’t so different from its predecessor, version 2. You will migrate most of your projects quite easily between the two versions. That wasn’t the case a few years ago between versions 1 and 2. However, it took me some time to find out how to properly configure my proxy settings, and this article is an opportunity to share the result and keep it for later reference.