While teaching big data and Hadoop, a student asked me about SSH and how to use it. I’ll discuss the protocol and the tools that let you benefit from it.
Adaltas works with open source web technologies. Our areas of expertise include rich Internet applications based on HTML5, the server-side Node.js stack, NoSQL storage and big data processing, notably on the Hadoop platform.
WebHDFS is an HTTP REST server bundled with the latest version of Hadoop. What interests me in this article is digging into security with Kerberos and delegation tokens. I will cover its usage from the command line and from a programming language perspective.
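As a rough illustration of the programming-language side, here is a minimal Python sketch that only builds WebHDFS REST URLs, without contacting any server. The `/webhdfs/v1` prefix and the `op`, `renewer` and `delegation` query parameters come from the WebHDFS REST API; the helper name `webhdfs_url`, the host, the port and the token value are assumptions made up for this example.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for an HDFS path and an operation.

    Hypothetical helper for illustration; it performs no network access.
    """
    # The operation always goes in the "op" query parameter.
    query = urlencode({"op": op, **params})
    # quote() escapes special characters but keeps "/" separators intact.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Assumed NameNode address and user; replace with real values.
HOST, PORT = "namenode.example.com", 50070

# Step 1: URL used to request a delegation token (the request itself would
# be authenticated with Kerberos/SPNEGO on a secured cluster).
token_url = webhdfs_url(HOST, PORT, "/", "GETDELEGATIONTOKEN", renewer="alice")

# Step 2: URL for a later call that passes the token instead of
# re-authenticating. "TOKEN" is a placeholder, not a real token.
list_url = webhdfs_url(HOST, PORT, "/user/alice", "LISTSTATUS", delegation="TOKEN")

print(token_url)
print(list_url)
```

The same URLs can be passed to any HTTP client, including `curl` on the command line.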
In this tutorial, I provide complete instructions for creating a new Node.js module, writing the code in CoffeeScript, publishing it on GitHub, sharing it with fellow Node.js developers through NPM, testing it with Mocha, Should and JsCoverage, and integrating it with Travis. Because of its simplicity, this module can also be used as scaffolding to create your own modules. I will also cover what I consider to be module best practices. Additionally, I recommend listening to this audiocast on GitHub titled “a module authoring show”.
In the past few days, I’ve published three related articles: the first covering the options for integrating Oracle and Hadoop, the second explaining how to install and use the Oracle SQL Connector with HDFS, and the third explaining how to install and use the Oracle SQL Connector with Hive. The last two articles raised some questions, for which I have compiled the notes below.
The documentation says:
The Oracle external table is not a “live” Hive table. When changes are made to a Hive table, you must use the ExternalTable tool to either republish the data or create a new external table.
I wasn’t sure how to interpret this, particularly the parts “changes are made to a Hive table” and “republish the data”. Does “Hive table” refer to the data or to the schema?
In a previous article published last week, I introduced the choices available to connect Oracle and Hadoop. In a follow-up article, I covered the Oracle SQL Connector, its installation and its integration with Apache Hadoop, and more specifically how to declare a file stored in HDFS, the Hadoop filesystem, as a table inside the Oracle database.
Using Oracle SQL Connector for HDFS, you can use Oracle Database to access and analyze data residing in HDFS files or a Hive table. You can also query and join data in HDFS or a Hive table with other database-resident data. If required, you can also load data into the database using SQL. For an Oracle user, HDFS files and the Hive tables are hidden behind external tables.
This article describes how to install and use the Oracle SQL Connector for Hadoop. I am only covering the integration with HDFS. Another article describes how to further configure the SQL connector to integrate with Hive.
I will list the different tools and libraries available to us developers for integrating Oracle and Hadoop. The Oracle SQL Connector for HDFS described below is covered in more detail in a follow-up article.
To summarize, we have Sqoop, originally from Cloudera and now part of Apache; a Sqoop plugin from MapQuest; and the Oracle Big Data Connectors, a family of four distinct products: Oracle Loader for Hadoop (OLH), Oracle SQL Connector for HDFS, Oracle R Connector for Hadoop and Oracle Data Integrator Application Adapter for Hadoop.
I was interested in comparing the different components distributed by Cloudera and Hortonworks. This also gives us an idea of the versions packaged by the two distributions. At the time of writing, April 2013, I am comparing the Cloudera distribution 4.2.0 and the Hortonworks Data Platform 2.0.0.