David Worms

About David Worms

Passionate about programming, data, and entrepreneurship, I take part in shaping Adaltas into a team of talented engineers who share their skills and experiences.

Node CSV version 0.2 with streaming API

Announced in August, version 0.2 of the Node CSV parser has just been released. This version is a major enhancement, as it aligns the parser with Node.js best practices with regard to streams. The CSV parser behaves both as a Stream Writer and a Stream Reader. Be careful: to achieve this goal, a [...]
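As a rough sketch of what this streaming behavior enables (my illustration, not code from the post, and assuming the 0.2 module can simply be piped), the parser can sit in the middle of a pipe chain:

```javascript
// Hypothetical usage sketch: the parser acting as both a writable and a readable stream.
var fs = require('fs');
var csv = require('csv'); // Node CSV parser, version 0.2

fs.createReadStream('./input.csv')              // source: raw CSV bytes
  .pipe(csv())                                  // the parser consumes them (Stream Writer)
  .pipe(fs.createWriteStream('./output.csv'));  // and re-emits CSV (Stream Reader)
```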

By David Worms | July 2nd, 2012 | Categories: Node.js | 0 Comments

Two Hive UDAFs to convert an aggregation to a map

I am publishing two new Hive UDAFs to help with maps in Apache Hive. The source code is available on GitHub in two Java classes, “UDAFToMap” and “UDAFToOrderedMap”, or you can download the jar file. The first function converts an aggregation into a map and internally uses a Java HashMap. The second function extends [...]

By David Worms | March 6th, 2012 | Categories: Big Data | 0 Comments

CoffeeScript, how do I debug that damn JS line?

Update April 12th, 2012: pull request adding error reporting to CoffeeScript with line mapping. Chances are that, if you code in CoffeeScript, you often find yourself facing a JavaScript exception telling you a problem occurred on a specific line. The problem is that the line number in question is the one of the generated JavaScript, not [...]
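A small sketch of the symptom (my own example, not the post's): compiling a CoffeeScript snippet from Node and running it produces a stack trace pointing at the generated JavaScript lines rather than the original .coffee lines.

```javascript
// Illustration only: error line numbers refer to the compiled JavaScript.
var CoffeeScript = require('coffee-script');

var source = [
  'greet = (name) ->',
  '  throw new Error "boom in greet"',
  'greet "world"'
].join('\n');

var js = CoffeeScript.compile(source); // generated JavaScript, different line layout
try {
  eval(js);                            // run the compiled output
} catch (e) {
  // The trace reports a line of the generated JS,
  // not line 2 of the CoffeeScript source above.
  console.log(e.stack);
}
```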

By David Worms | February 15th, 2012 | Categories: Node.js | 0 Comments

OS module on steroids with the SIGAR Node binding

Today we are announcing the first release of the Node binding to the SIGAR library. Visit the project website or the source code repository on GitHub. SIGAR is a cross-platform interface for gathering system information. From the project website, such information includes: system memory, swap, CPU, load average, uptime, logins; per-process memory, CPU, credential [...]
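For comparison, and with no claim about the binding's actual API, here is the baseline that Node's built-in os module already reports; the SIGAR binding is announced as going well beyond this (per-process metrics, logins, and so on):

```javascript
// Baseline: system information available from Node's built-in os module.
var os = require('os');

console.log('total memory :', os.totalmem());
console.log('free memory  :', os.freemem());
console.log('load average :', os.loadavg()); // 1, 5 and 15 minute averages
console.log('uptime (s)   :', os.uptime());
console.log('cpus         :', os.cpus().length);
```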

By David Worms | January 11th, 2012 | Categories: Node.js | 0 Comments

Timeseries storage in Hadoop and Hive

In the next few weeks, we will be exploring the storage and analytics of a large generated dataset. This dataset is composed of CRM tables associated with one time series table of about 7,000 billion rows. Before importing the dataset into Hive, we will be exploring different optimization options expected to impact speed and storage size. [...]

By David Worms | January 10th, 2012 | Categories: Big Data | 0 Comments

How Node CSV parser may save your weekend

Last Friday, an hour before my customer closed its doors for the weekend, a co-worker came to me. He had just finished exporting 9 CSV files from Oracle, which he wanted to import into Greenplum so that our customer could start testing on Monday morning. The problem, as he exposed it, was quite [...]
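A sketch of the kind of reshaping involved (my own illustration: the fromPath, toPath, and transform names reflect the parser's 2011-era API as I remember it, and the file paths and column changes are made up):

```javascript
// Hypothetical sketch: read an Oracle CSV export, tweak each row,
// and write a file suitable for loading into Greenplum.
var csv = require('csv');

csv()
  .fromPath('./oracle_export.csv')     // made-up input path
  .transform(function (row) {
    // Example adjustment: reorder columns and trim whitespace.
    return [row[0].trim(), row[2], row[1]];
  })
  .toPath('./greenplum_import.csv');   // made-up output path
```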

By David Worms | December 13th, 2011 | Categories: Node.js | 0 Comments