The operating system used here is OS X, but the procedure differs little on any Unix environment, because most of the software is downloaded from the Internet, uncompressed and configured manually. Only a few packages are installed with MacPorts, and these are easy to find in equivalent tools such as Apt and Yum. Since the downloaded software is written in Java, it should behave the same way in other environments.

This environment is configured in pseudo-distributed mode to best simulate the behavior of a cluster on a single machine. In this mode, each Hadoop daemon runs in its own JVM.
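To make pseudo-distributed mode more concrete, here is a minimal Python sketch that renders the kind of site files such a setup typically relies on. The property names (fs.default.name, dfs.replication, mapred.job.tracker) are the standard Hadoop 0.20 ones, but the host and port values are assumptions chosen for illustration, and the helper render_site_xml is hypothetical rather than part of the procedure.

```python
import xml.etree.ElementTree as ET

# Assumed values for a single-machine setup. The ports are illustrative
# and are not taken from the procedure described in this document.
PSEUDO_DISTRIBUTED = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:8020"},
    "hdfs-site.xml": {"dfs.replication": "1"},  # a single DataNode, so one replica
    "mapred-site.xml": {"mapred.job.tracker": "localhost:8021"},
}


def render_site_xml(properties):
    """Return the content of a Hadoop *-site.xml file as a string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print each file name followed by its generated content.
    for filename, properties in PSEUDO_DISTRIBUTED.items():
        print(filename)
        print(render_site_xml(properties))
```

Pointing every address at localhost and setting the replication factor to 1 is what lets all the daemons cooperate on one machine as if they formed a small cluster.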

The procedure covers the installation of the following software:

Choice of versions

Installing the software from the SVN repositories ran into an incompatibility: Hive requires the latest stable version of Hadoop (0.20.2), whereas Sqoop requires the SVN version of Hadoop. For this reason, we opted for the versions distributed by Cloudera. Based on stable releases, they include many of the patches present in the SVN repositories and are tested by some of the best experts in the community.

However, some features had not yet been included when the distribution was released, so some of us also use versions compiled from the SVN repositories. The software in question is HBase and Hive; their manual installation is not covered below.

Installation

The procedure described here assumes that Xcode and MacPorts are already present on the system.

The Cloudera distribution used is CDH3beta2, which is not the most recent, but the procedure is the same if you download the latest versions from the Cloudera website.

MacPorts dependencies

Setting up SSH

Preparing the installation directory

Extracting packages

Setting up the environment

Software configuration

Use

Starting services
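Once the services have been started, one way to confirm that each daemon is running in its own JVM is to inspect the output of the JDK's jps tool. The sketch below assumes the five daemons of a Hadoop 0.20-era pseudo-distributed setup. The daemon list and the helper names are assumptions for illustration, and other components such as Hue are not checked.

```python
import subprocess

# Assumed list of daemons for a Hadoop 0.20-era pseudo-distributed setup.
EXPECTED_DAEMONS = {
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "JobTracker",
    "TaskTracker",
}


def running_java_processes(jps_output):
    """Parse jps output ("<pid> <main class>" per line) into a set of names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the sorted names of expected daemons absent from the jps output."""
    return sorted(set(expected) - running_java_processes(jps_output))


if __name__ == "__main__":
    # jps ships with the JDK and must be on the PATH for this call to work.
    output = subprocess.run(
        ["jps"], capture_output=True, text=True, check=True
    ).stdout
    missing = missing_daemons(output)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All expected daemons are running")
```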

Stopping services

Administration

If the installation went smoothly, the following URLs should be available:

  • Hadoop Map/Reduce Administration: http://localhost:50030

  • Hadoop File System Browser: http://localhost:50070

  • Hadoop Task Tracker Status: http://localhost:50060

  • Hue: http://localhost:8088
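
The availability of these pages can also be checked from a script. The following Python sketch probes each URL listed above and reports whether it answers. The function names http_probe and check_services are hypothetical, and the three-second timeout is an arbitrary assumption.

```python
import urllib.error
import urllib.request

# URLs taken from the list above.
ADMIN_URLS = {
    "Hadoop Map/Reduce Administration": "http://localhost:50030",
    "Hadoop File System Browser": "http://localhost:50070",
    "Hadoop Task Tracker Status": "http://localhost:50060",
    "Hue": "http://localhost:8088",
}


def http_probe(url, timeout=3.0):
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return False


def check_services(urls, probe=http_probe):
    """Map each service name to True (reachable) or False (unreachable)."""
    return {name: probe(url) for name, url in urls.items()}


if __name__ == "__main__":
    for name, reachable in check_services(ADMIN_URLS).items():
        print(f"{name}: {'OK' if reachable else 'unreachable'}")
```

The probe is passed in as a parameter so that the check can be exercised without a running cluster, for example by substituting a function that always returns True.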