This isn’t the first time I have faced this situation. Today, I finally found the time and energy to look for a solution. Let’s say that, in your Mocha test, you need to test an expected “uncaughtException” event, the Node.js technique to catch the uncatchable. Easy: just register an “uncaughtException” listener on the process event emitter. Well, not so easy, but not so complicated either.
Adaltas works with open source web technologies. Our focus is on rich Internet applications based on HTML5, the server-side Node.js stack, NoSQL storage, and big data processing with Hadoop.
A few days ago, I explained how to set up a cluster of virtual machines with static IPs and Internet access, suitable for hosting your Hadoop cluster locally for development. At the time, I used VMware. I’m getting back to the same topic, but this time using VirtualBox.
I decided to give VirtualBox a chance as an alternative to VMware, for multiple reasons. The installation of CentOS partially failed at the end: I needed to reboot the machine. No real consequences, but not something I appreciate. VirtualBox is free and open source, while VMware isn’t open source and is even commercially distributed on OS X. Another goody I was interested in is the ability to choose the IP address range of my internal network; I have limited memory dedicated to that sort of thing. After many trials, I managed to install the VMware tools only once; don’t ask me how I did it, another trauma. Finally, I have the sweet hypothetical idea of scripting the virtual machine provisioning and installation process. If I’m not wrong, that shouldn’t be a problem with VirtualBox.
Apache Hadoop is of course available for download from its official web page. However, downloading and installing the several components that make up a Hadoop cluster is a daunting task. Below is a list of the main distributions that include Hadoop. This follows an article published a few days ago about the Hadoop ecosystem.
While I am about to install and test Ambari, this article is the occasion to illustrate how I set up my development environment with multiple virtual machines. Ambari, the deployment and monitoring tool for Hadoop clusters, will be the subject of a yet-to-be-written article. My virtual environment is VMware, but VirtualBox has the same network functionality and should work as well.
Note: I have since written a similar article covering additional functionalities and using VirtualBox.
What’s really important here is to assign each virtual machine a fixed IP address that won’t change over time. I personally work on a MacBook Pro laptop, and I found it very frustrating to restart each of the Hadoop components whenever I received new IP addresses while switching between networks. Additionally, the setup should also provide an Internet gateway.
I am going to show how to split a file stored as CSV inside HDFS into multiple Hive tables based on the content of each record. The context is simple: we are using Flume to collect logs from all over our datacenter through syslog. The stream is dumped into HDFS files partitioned by minute. Oozie listens for newly created directories and, when one is ready, distributes its content across various Hive tables, one for each log category.
For example, we want ssh logs to go to the ssh table. If we cannot determine which category a log record belongs to, we dump it into an “xlogs” table. Later on, when appropriate new rules are added, we should be able to iterate through the “xlogs” table and dispatch its records across the appropriate tables.
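Hive’s multi-table INSERT can express this dispatch in a single pass over the staged data. A minimal sketch, assuming a staging table `logs_staging` with a `category` column (both names hypothetical):

```sql
-- Hypothetical table and column names: one pass over the staged records,
-- appending each one to the table matching its category, and everything
-- unrecognized to "xlogs" for later re-dispatch.
FROM logs_staging
INSERT INTO TABLE ssh   SELECT * WHERE category = 'ssh'
INSERT INTO TABLE xlogs SELECT * WHERE category IS NULL OR category NOT IN ('ssh');
```

Adding a new rule later means adding one more `INSERT INTO TABLE … WHERE` branch and re-running the same statement over “xlogs”.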
This is a command I used to concatenate files stored in Hadoop HDFS matching a globbing expression into a single file. It uses the “getmerge” utility of “hadoop fs” but, contrary to “getmerge” alone, the final merged file isn’t put into the local filesystem but inside HDFS.
Maven 3 isn’t so different from its previous version 2. You will migrate most of your projects quite easily between the two versions. That wasn’t the case a few years ago between versions 1 and 2. However, it took me some time to find out how to properly configure my proxy settings, and this article is the occasion to share the result and keep it for later consultation.
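For reference, proxy settings live in the `<proxies>` section of `~/.m2/settings.xml`. A minimal sketch, with placeholder host and port values:

```xml
<!-- Hypothetical fragment of ~/.m2/settings.xml; host and port are placeholders -->
<settings>
  <proxies>
    <proxy>
      <id>corporate-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.example.com</host>
      <port>8080</port>
      <nonProxyHosts>localhost|127.0.0.1</nonProxyHosts>
    </proxy>
  </proxies>
</settings>
```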
While releasing version 0.2.7 of the CSV parser for Node.js, I stop here to drop a few lines about what has made it into this release.
Hadoop is already a large ecosystem and my guess is that 2013 will be the year it grows even larger. Some pieces no longer need an introduction: ZooKeeper, HBase, Hive, Pig, Flume, Oozie, Avro, Sqoop, Cascading, Cascalog, HUE, and Mahout, to name a few. At the same time, many open source projects are appearing on GitHub and elsewhere, or being introduced to the Apache Incubator.
As a non-restrictive open source license, the “New BSD License” is commonly used across the Node.js community. However, it is only one of the BSD licenses available, alongside the original “BSD License” and the “Simplified BSD License” or “FreeBSD License”. It may seem confusing as to how you should choose your license based on the clauses you wish to endorse.