
Adaltas works with open source web technologies. Our focus is on rich Internet applications based on HTML5, the server-side Node.js stack, NoSQL storage and big data processing with Hadoop.

[en] Remote connection with SSH

While teaching big data and Hadoop, a student asked me about SSH and how to use it. I’ll discuss the protocol and the tools that benefit from it.

Lately, I’ve been installing Hadoop with its core components including Kerberos and LDAP servers from a single host to full clusters using SSH to remotely and securely run commands and access files.
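The kind of automation described above usually relies on key-based authentication declared once in the SSH client configuration; a minimal sketch, where the host name, user and key path are assumptions:

```
# ~/.ssh/config — alias a cluster node so every command reuses the same
# user and private key (no password prompt once the public key is deployed)
Host node1
  HostName node1.cluster.example.com
  User hadoop
  IdentityFile ~/.ssh/id_rsa
```

With such an entry in place, `ssh node1 'hostname'` runs a single command on the remote machine and `scp hadoop.tar.gz node1:/tmp/` copies a file to it, which is all a cluster installation script needs.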

[en] Crawl your website including a login form with PhantomJS

With PhantomJS, we start a headless WebKit browser and pilot it with our own scripts. Said differently, we write a script in JavaScript or CoffeeScript which controls an Internet browser and manipulates the webpage loaded inside. In the past, I’ve used a similar solution called Selenium. PhantomJS is much faster: it doesn’t start a graphical browser (that’s what headless stands for) and you can inject your own JavaScript inside the page (I don’t remember being able to do such a thing with Selenium).
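As an illustration, a minimal PhantomJS script might look like the sketch below; it runs under the `phantomjs` binary (not Node.js), and the URL and form selectors are assumptions:

```javascript
// Sketch of a PhantomJS crawl script filling a login form;
// the target URL and the element selectors are assumptions.
var page = require('webpage').create();
page.open('http://example.com/login', function (status) {
  if (status !== 'success') {
    phantom.exit(1);
  }
  // The function passed to page.evaluate executes inside the loaded
  // page, with access to its DOM — the "injection" mentioned above.
  var title = page.evaluate(function () {
    document.querySelector('#username').value = 'user';
    document.querySelector('#password').value = 'secret';
    document.querySelector('form').submit();
    return document.title;
  });
  console.log(title);
  phantom.exit();
});
```
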

[en] Kerberos and delegation tokens security with WebHDFS

WebHDFS is an HTTP REST server bundled with the latest versions of Hadoop. What interests me in this article is digging into security with the Kerberos and delegation token functionalities. I will cover its usage from both the command line and a programming language perspective.
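To give an idea of the API before digging in, the sketch below only builds the WebHDFS URLs involved; the host, port and file path are assumptions:

```javascript
// Sketch of the two-step WebHDFS security flow; the NameNode host,
// HTTP port and file path below are assumptions.
const base = 'http://namenode.example.com:50070/webhdfs/v1';

// 1. Authenticated through Kerberos (SPNEGO), ask the NameNode
//    for a delegation token:
const getToken = new URL(base + '/');
getToken.searchParams.set('op', 'GETDELEGATIONTOKEN');
getToken.searchParams.set('renewer', 'hdfs');

// 2. Subsequent calls pass the obtained token in the `delegation`
//    parameter instead of authenticating with Kerberos again:
const read = new URL(base + '/tmp/sample.txt');
read.searchParams.set('op', 'OPEN');
read.searchParams.set('delegation', 'TOKEN_STRING_FROM_STEP_1');

console.log(getToken.href);
console.log(read.href);
```

The token returned by the first call lets later requests, typically issued by distributed tasks, proceed without holding a Kerberos ticket.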

[en] Tutorial for creating and publishing a new Node.js module

In this tutorial, I provide complete instructions for creating a new Node.js module: writing the code in CoffeeScript, publishing it on GitHub, sharing it with other Node.js fellows through NPM, testing it with Mocha, Should and JsCoverage, and integrating it with Travis CI. Because of its simplicity, this module could also be used as scaffolding to create your own modules. It also complies with what I consider module best practices. Additionally, I recommend listening to the GitHub audiocast titled “a module authoring show”.

[en] Oracle and Hive: how is data published?

In the past few days, I’ve published 3 related articles: a first one covering the options to integrate Oracle and Hadoop, a second one explaining how to install and use the Oracle SQL Connector with HDFS, and a third one explaining how to install and use the Oracle SQL Connector with Hive. Those last two articles raised some questions, for which I have compiled the notes below.

The documentation says:

The Oracle external table is not a “live” Hive table. When changes are made to a Hive table, you must use the ExternalTable tool to either republish the data or create a new external table.

I wasn’t sure how to interpret this, particularly the parts “changes are made to a Hive table” and “republish the data”. Does “Hive table” refer to the data or the schema?
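My reading is that “republish” regenerates the location files of the Oracle external table so they point at the current Hive data; a hedged sketch of the ExternalTable tool invocation, where the configuration file path is an assumption:

```
hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
  oracle.hadoop.exttab.ExternalTable \
  -conf /home/oracle/movies_hive.xml \
  -publish
```
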

[en] Oracle to Apache Hive with the Oracle SQL Connector

In a previous article published last week, I introduced the choices available to connect Oracle and Hadoop. In a follow-up article, I covered the Oracle SQL Connector, its installation and integration with Apache Hadoop and more specifically how to declare a file present inside HDFS, the Hadoop filesystem, as a database table inside the Oracle database.

Below I will complement the integration between Oracle and Hadoop with the integration of the Apache Hive data warehouse system.

[en] Testing the Oracle SQL Connector for Hadoop HDFS

Using Oracle SQL Connector for HDFS, you can use Oracle Database to access and analyze data residing in HDFS files or a Hive table. You can also query and join data in HDFS or a Hive table with other database-resident data. If required, you can also load data into the database using SQL. For an Oracle user, HDFS files and Hive tables are hidden behind external tables.

This article describes how to install and use the Oracle SQL Connector for Hadoop. I am only covering the integration with HDFS. Another article describes how to further configure the SQL connector to integrate with Hive.

[en] Options to connect and integrate Hadoop with Oracle

I will list the different tools and libraries available to us developers in order to integrate Oracle and Hadoop. The Oracle SQL Connector for HDFS mentioned below is covered in more detail in a follow-up article.

To summarize, we have Sqoop, originally from Cloudera and now part of Apache; a Sqoop plugin from Quest Software; and the Oracle Big Data Connectors, a family of four distinct products: Oracle Loader for Hadoop (OLH), Oracle SQL Connector for HDFS, Oracle R Connector for Hadoop and Oracle Data Integrator Application Adapter for Hadoop.