HBase

Clusters and workloads migration from Hadoop 2 to Hadoop 3

Hadoop 2 to Hadoop 3 migration is a hot subject. How should you upgrade your clusters? Which features of the new release may solve current problems and bring new opportunities? How are your current processes impacted? Which migration strategy is the most appropriate for your organization? […]

July 25th, 2018 | Categories: Big Data

Data Lake ingestion best practices

Creating a Data Lake requires rigor and experience. Here are some good practices for data ingestion, for both batch and stream architectures, that we recommend and implement with our customers. […]

June 18th, 2018 | Categories: Data Engineering, DevOps

Omid: Scalable and highly available transaction processing for Apache Phoenix

Apache Omid provides a transactional layer on top of key/value NoSQL databases. In practice, it is usually used on top of Apache HBase. […]

May 24th, 2018 | Categories: Big Data, DataWorks Summit 2018, Events

Essential questions about Time Series

Today, the bulk of Big Data is temporal. We see it in the media and among our customers: smart meters, banking transactions, smart factories, connected vehicles … IoT and Big Data go hand in hand. […]

March 19th, 2018 | Categories: Big Data, Data Engineering

HDFS and Hive storage – comparing file formats and compression methods

A few days ago, we conducted a test to compare various Hive file formats and compression methods. Among those file formats, some are native to HDFS and apply to all Hadoop users. The test suite is composed of similar Hive queries which create a table, optionally set a compression type and load [...]

March 13th, 2012 | Categories: Data Engineering

Two Hive UDAFs to convert an aggregation to a map

I am publishing two new Hive UDAFs to help with maps in Apache Hive. The source code is available on GitHub in two Java classes, “UDAFToMap” and “UDAFToOrderedMap”, or you can download the jar file. The first function converts an aggregation into a map and internally uses a Java HashMap. The second function extends [...]
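As a rough illustration of the first function's idea, here is a minimal plain-Java sketch of collecting grouped key/value pairs into a HashMap. The class `ToMapAggregator` and its methods are hypothetical: they loosely mirror the init/iterate/terminatePartial/merge/terminate lifecycle of a classic Hive UDAF evaluator, but the sketch has no Hive dependency and is not the UDAFToMap source code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "aggregate rows into a map"; not the UDAFToMap source.
// Method names loosely mirror the classic Hive UDAF evaluator lifecycle.
class ToMapAggregator<K, V> {
    // Partial state for one group, backed by an unordered HashMap.
    private Map<K, V> state = new HashMap<>();

    // Reset the state before processing a new group.
    void init() {
        state = new HashMap<>();
    }

    // Called once per input row: record the key/value pair.
    // Assumption: null keys are skipped and a later value for a key wins.
    boolean iterate(K key, V value) {
        if (key != null) {
            state.put(key, value);
        }
        return true;
    }

    // Hand a copy of this task's partial result to the next stage.
    Map<K, V> terminatePartial() {
        return new HashMap<>(state);
    }

    // Fold in a partial result produced by another task.
    boolean merge(Map<K, V> other) {
        if (other != null) {
            state.putAll(other);
        }
        return true;
    }

    // Final map returned for the group.
    Map<K, V> terminate() {
        return new HashMap<>(state);
    }
}
```

Because the state is a HashMap, the iteration order of the returned map is unspecified.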

March 6th, 2012 | Categories: Data Engineering

Hadoop installation on OSX in pseudo-distributed mode

The chosen operating system is OSX, but the procedure is much the same on any Unix environment because most of the software is downloaded from the Internet, uncompressed and configured manually. Only a few packages are installed with MacPorts, and these are easily found through equivalent tools such as Apt and Yum. Since the downloaded [...]

December 1st, 2010 | Categories: Hack

Node HBase, a Node.js client for Apache HBase

HBase is a "column family" database from the Hadoop ecosystem, built on the model of Google BigTable. HBase can accommodate very large volumes of data (terabytes or petabytes) while maintaining high availability and fast response times. Adaltas has published a Node.js client for HBase whose code is available on GitHub and which uses the REST [...]
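The client itself targets Node.js; to illustrate what a REST-based HBase client does under the hood, here is a minimal Java sketch, assuming the excerpt refers to HBase's REST gateway. It only builds (does not send) the HTTP request for a single row, `GET /<table>/<row>` with an `Accept: application/json` header, and decodes a cell value, since the gateway's JSON representation Base64-encodes cell values. The class `HBaseRestSketch`, the base URL and the table/row names are hypothetical placeholders.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds a GET request for one HBase row through the
// REST gateway. Base URL, table and row key are placeholders for illustration.
class HBaseRestSketch {
    static HttpRequest getRowRequest(String baseUrl, String table, String rowKey) {
        // URL-encode each path segment so row keys with special characters stay valid.
        String path = "/" + encode(table) + "/" + encode(rowKey);
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                // Request JSON; cell values in the response are Base64-encoded.
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // URLEncoder does form encoding, so turn '+' back into "%20" for a path segment.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Decode a Base64 cell value from the JSON response into a UTF-8 string.
    static String decodeCell(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }
}
```

For example, `getRowRequest("http://localhost:8080", "my_table", "row 1")` targets `http://localhost:8080/my_table/row%201`; assuming a gateway is listening at that address, the request could then be sent with `java.net.http.HttpClient`.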

November 1st, 2010 | Categories: Big Data