Adaltas works with open source web technologies. Our focus is on rich Internet applications based on HTML5, the server-side Node.js stack, NoSQL storage and big data processing, notably on the Hadoop platform.
By request, the article below is a translation of an earlier post published on February 19, 2012.
Today, I finally decided to spend a little time with Travis. That small green image at the top of GitHub project home pages has intrigued me more and more these past few days. Actually, to be completely honest, this is not exactly how my evening began. First, after two years of good and loyal service, I decided to drop Expresso and give Mocha a chance. And since I had grown used to the few small functions with which Expresso enriches the assert module, I had to find a replacement, which led me to the Should module. It was quite pleasant to see how well these two modules complement each other, in the purest Unix tradition: small, powerful and a good citizen.
A few days ago, we ran a test to compare various Hive file formats and compression methods. Some of those file formats are native to HDFS and apply to all Hadoop users. The test suite is composed of similar Hive queries which create a table, optionally set a compression type and load the same dataset into the new table. Across the queries, we tested the “sequence file”, “text file” and “RCFILE” formats and the “default”, “bz”, “gz”, “LZO” and “Snappy” compression codecs.
I am publishing two new Hive UDAFs to help with maps in Apache Hive. The source code is available on GitHub in two Java classes, “UDAFToMap” and “UDAFToOrderedMap”, or you can download the jar file. The first function converts an aggregation into a map and internally uses a Java HashMap. The second function extends the first one: it converts an aggregation into an ordered map and internally uses a Java TreeMap.
Today, I finally decided to spend some time with Travis. For some time now, that little green image at the top of GitHub project home pages has been intriguing me. Well, to be totally honest, this isn’t how I started my evening. First, after 2 years of good and faithful service, I decided to drop Expresso and give Mocha a chance. Because Expresso enriches the assert module with one or two functions which I had become addicted to, I also had to find the same functionality in another assertion library, which led me to try Should. It is very pleasant to see those two working together, as in the Unix tradition: small, powerful and naturally integrated.
Chances are that, if you code in CoffeeScript, you often find yourself facing a JavaScript exception telling you a problem occurred on a specific line. The problem is that the line number in question is the one in the generated JavaScript, not the one in your CoffeeScript source. Even worse, if you generate your JavaScript transparently, you won’t have any JavaScript file to look into, and the whole process of finding where the error occurred is even more frustrating.
Well, it seems that a future version of JavaScript could come to the rescue, but not for a few months. In the meantime, here’s a bit of fun writing a small Bash script that may save you some time.
We are releasing Node Mecano on GitHub, which gathers common functions used while deploying systems. The idea was to group those functions into a comprehensive library.
In the next few weeks, we will be exploring the storage and analytics of a large generated dataset. This dataset is composed of CRM tables associated with one time series table of about 7,000 billiard rows.
Before importing the dataset into Hive, we will be exploring different optimization options expected to impact speed and storage size.
Last Friday, an hour before my customer’s doors closed for the weekend, a co-worker came to me. He had just finished exporting 9 CSV files from Oracle, which he wanted to import into Greenplum so that our customer could start testing on Monday morning.
The problem as presented was quite simple. He needed a quick solution (less than an hour, coding included) to transform all the dates in the source CSV files into a format suitable for Greenplum. While Oracle exported dates in the form ‘DD/MM/YYYY’, Greenplum was picky enough to expect dates in the form ‘YYYY-MM-DD’.
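To make the transformation concrete, here is a minimal Python sketch of such a date rewrite. It is not the solution used that Friday, which this excerpt does not show. It assumes comma-separated files and that each date fills a whole field. The names `convert_field` and `convert_csv` are hypothetical.

```python
import csv
import io
import re

# Assumption: a date occupies a whole CSV field, formatted as DD/MM/YYYY.
DATE_RE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


def convert_field(value):
    """Return YYYY-MM-DD if the field is a DD/MM/YYYY date, else the field unchanged."""
    match = DATE_RE.fullmatch(value.strip())
    if match is None:
        return value
    day, month, year = match.groups()
    return f"{year}-{month}-{day}"


def convert_csv(source, target):
    """Copy CSV rows from source to target, rewriting every date field."""
    reader = csv.reader(source)
    writer = csv.writer(target, lineterminator="\n")
    for row in reader:
        writer.writerow([convert_field(field) for field in row])


# Small in-memory demonstration with made-up sample rows.
sample = io.StringIO("id,created,label\n1,25/12/2011,foo\n")
result = io.StringIO()
convert_csv(sample, result)
print(result.getvalue(), end="")
```

Going through the `csv` module rather than a plain text substitution keeps quoted fields intact, at the cost of only matching dates that stand alone in a field. For real exports, `source` and `target` would be files opened with `newline=""`.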