Big Data

Ambari – How to blueprint

As infrastructure engineers at Adaltas, we deploy Hadoop clusters. A lot of them. Our clients usually choose an enterprise-ready distribution like HDP or CDH with their built-in cluster deployment and management solutions, namely Ambari and Cloudera Manager. These tools offer an easy way to deploy clusters through their well-documented and straightforward [...]

January 17th, 2018 | Categories: Big Data

HDP cluster supervision

With the current growth of Big Data technologies, more and more companies are building their own clusters in the hope of getting some value from their data. One main concern while building these infrastructures is the capacity to continuously monitor the cluster's health and report issues as fast as possible. This is where supervision comes in. [...]

July 5th, 2017 | Categories: Big Data

Hive, Calcite and Druid

BI/OLAP requires interactive visualization of complex data streams: real-time bidding events, user activity streams, voice call logs, network traffic flows, firewall events, application KPIs. Traditional solutions: RDBMS (MySQL...) don't scale and need caching, but ad hoc queries remain slow; key/value stores (HBase...) are quick but take forever to compute (pre-materialization of data). Context: created in 2011, open-sourced [...]

July 14th, 2016 | Categories: Big Data

The Red Hat Storage offering and its integration with Hadoop

I had the opportunity to be introduced to Red Hat Storage and Gluster during a presentation given jointly by Red Hat France and the company StartX. I have compiled my notes here, at least partially. I will finish with the integration between Red Hat Storage and Hadoop, and more specifically what can be expected from it before running an experiment in [...]

July 3rd, 2016 | Categories: Big Data

Oracle to Apache Hive with the Oracle SQL Connector

In a previous article published last week, I introduced the choices available to connect Oracle and Hadoop. In a follow-up article, I covered the Oracle SQL Connector, its installation and integration with Apache Hadoop, and more specifically how to declare a file present inside HDFS, the Hadoop filesystem, as a database table inside the [...]

July 27th, 2013 | Categories: Big Data

Kerberos and delegation tokens security with WebHDFS

WebHDFS is an HTTP REST server bundled with the latest version of Hadoop. What interests me in this article is digging into security with the Kerberos and delegation token functionalities. I will cover its usage from the command line and from a programming language perspective. Don't crawl the web looking for a command to start [...]

July 25th, 2013 | Categories: Big Data

Apache Hive Essentials How-to by Darren Lee

Recently, I've been asked to review a new book on Apache Hive called "Apache Hive Essentials How-to", written by Darren Lee and published by Packt Publishing. In short, I sincerely recommend it. I focus here on what I liked the most and on the things I would have personally liked to read about. Looking [...]

July 23rd, 2013 | Categories: Big Data

Splitting an HDFS file into multiple Hive tables

I am going to show how to split a file stored as CSV inside HDFS into multiple Hive tables based on the content of each record. The context is simple. We are using Flume to collect logs from all over our datacenter through syslog. The stream is dumped into HDFS files partitioned by minute. Oozie [...]

July 15th, 2013 | Categories: Big Data

Options to connect and integrate Hadoop with Oracle

I will list the different tools and libraries available to us developers to integrate Oracle and Hadoop. The Oracle SQL Connector for HDFS described below is covered in more detail in a follow-up article. To summarize, we have Sqoop, originally from Cloudera and now part of Apache, a Sqoop plugin from MapQuest [...]

July 15th, 2013 | Categories: Big Data

Testing the Oracle SQL Connector for Hadoop HDFS

Using Oracle SQL Connector for HDFS, you can use Oracle Database to access and analyze data residing in HDFS files or a Hive table. You can also query and join data in HDFS or a Hive table with other database-resident data. If required, you can also load data into the database using SQL. For an [...]

July 15th, 2013 | Categories: Big Data