Ambari – How to blueprint

As infrastructure engineers at Adaltas, we deploy Hadoop clusters. A lot of them. Let’s see how to automate this process with REST requests.

Web wizards are really handy for deploying one or two clusters, but filling in hundreds of fields and copy/pasting values quickly becomes painful when deploying a dozen of them. This is where automation comes in.

Our clients usually choose an enterprise-ready distribution like Hortonworks HDP or Cloudera CDH with their built-in cluster deployment and management solutions, namely Ambari and Cloudera Manager. These tools offer an easy way to deploy clusters through their well-documented and straightforward UIs. In this article, we will focus on HDP’s deployment tool, Ambari, and its cluster definition files: blueprints.

What are blueprints

Blueprints in an Ambari environment can mean two things. The first meaning is the following, taken directly from Ambari’s documentation:

Ambari Blueprints are a declarative definition of a cluster. With a Blueprint, you specify a Stack, the Component layout and the Configurations to materialize a Hadoop cluster instance (via a REST API) without having to use the Ambari Cluster Install Wizard.

This is the global definition of the Ambari Blueprint technology. In practice, this technology comes down to two JSON files submitted one after the other to Ambari’s REST API.

One of these files, the first to be submitted, is the second meaning of a blueprint. It represents a template that can be used for as many cluster deployments as we like.
Since it can be used to define multiple clusters over various environments, it has to be as generic as possible.

The second file to be submitted sets all the properties that are limited to one cluster instance. We’ll call it the cluster file. Ambari takes the information from the previously submitted blueprint file and enriches it with the cluster file to launch the deployment process. Properties set in the cluster file override those of the blueprint file when both define them.
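The override rule can be pictured with a small sketch. It only illustrates the precedence described above and is not Ambari's actual merge code; the function name `effective_config` and the sample property values are hypothetical.

```python
def effective_config(blueprint_props, cluster_props):
    """Return the blueprint properties with cluster-level values taking precedence."""
    merged = dict(blueprint_props)   # start from the generic template
    merged.update(cluster_props)     # cluster-specific values win on conflict
    return merged


# Hypothetical values, for illustration only.
blueprint_props = {"fs.trash.interval": "360", "hadoop.tmp.dir": "/tmp/hadoop"}
cluster_props = {"fs.trash.interval": "1440"}

result = effective_config(blueprint_props, cluster_props)
print(result)
```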

This is what a cluster deployment using Ambari’s Blueprints looks like:

  1. Install and configure Ambari to be ready to receive a cluster deployment request
  2. Create and submit the “blueprint.json” file via the REST API to Ambari
  3. Create and submit the “cluster.json” file via the REST API to Ambari
  4. Wait for the deployment process to end
  5. Tune the configurations set by Ambari’s stack advisor

File structure – blueprint.json

The blueprint.json file has three categories at its root:

  • Blueprints, where the blueprint’s global information is set. This includes the stack name and version, and the security type.
  • host_groups, which defines host profiles and the components that are deployed on each of them.
  • configurations, with most of the non-default configurations of these components.

At this point, your JSON file should look like this:
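As a minimal sketch of that layout, the snippet below builds the three root categories as a Python dictionary and prints them as JSON. The variable name `blueprint` is an illustrative choice, and the categories are left empty on purpose.

```python
import json

# Skeleton of blueprint.json: the three root categories, still empty.
blueprint = {
    "Blueprints": {},        # stack name, stack version, security type
    "host_groups": [],       # host profiles and their components
    "configurations": [],    # non-default component configurations
}

print(json.dumps(blueprint, indent=2))
```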

Category content – blueprints

Ambari supports multiple stacks to deploy. The most used is Hortonworks’ HDP, which is what we’ll use in our example. As for the security type, choose between NONE and KERBEROS. You might want to add a custom kerberos_descriptor, but we did not need one in our case, so we won’t cover it further.

Here’s an easy and functional sample of your Blueprints category for a kerberized HDP 2.6 cluster:
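A sketch of such a category, written as a Python dictionary and dumped to JSON. The `stack_name`, `stack_version` and `security` keys follow the Ambari blueprint format; the variable name is illustrative.

```python
import json

# "Blueprints" category for a kerberized HDP 2.6 cluster.
blueprints_category = {
    "Blueprints": {
        "stack_name": "HDP",
        "stack_version": "2.6",
        "security": {"type": "KERBEROS"},  # or {"type": "NONE"}
    }
}

print(json.dumps(blueprints_category, indent=2))
```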

Category content – host groups

Host groups define templates to apply to groups of hosts in your cluster.

A host group template can define the following information:

  • The components that will be deployed on each host mapped to this profile
  • The number of hosts expected to match this profile
  • Some custom configurations to be applied only to this type of host
  • A name that best represents hosts of this profile

Some examples of host groups you might want to define in this section: management nodes, worker nodes, master nodes, edge nodes…

Note that you’ll probably have to define multiple master node profiles as they usually do not share the same components.

For HDP 2.6, these are the available components:

Here is a host group sample for worker nodes:
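A possible worker profile could look like the sketch below. The group name `worker`, the cardinality and the choice of components are assumptions made for illustration; adapt them to the services you actually deploy.

```python
import json

# Host group template for worker nodes (illustrative values).
worker_group = {
    "name": "worker",                 # hypothetical profile name
    "cardinality": "3",               # expected number of hosts, as a string
    "components": [
        {"name": "DATANODE"},         # HDFS worker daemon
        {"name": "NODEMANAGER"},      # YARN worker daemon
    ],
    "configurations": [],             # optional overrides for this profile only
}

print(json.dumps({"host_groups": [worker_group]}, indent=2))
```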

Category content – configurations

This is where you will put most of your custom configurations.

There is no need to set every configuration property for every component you plan to deploy. Most of them have default values defined by their component, and Ambari comes with a stack advisor that automatically sets some others based on your infrastructure. Add only the ones that you alone can define; there are already plenty of those.

The structure of a configuration item is the following:
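As a sketch, a configuration item maps one configuration category to its properties. The nested `properties` key used here mirrors what Ambari produces when exporting a blueprint from an existing cluster; the helper name `config_item` and the sample property value are illustrative.

```python
import json


def config_item(category, properties):
    """Wrap a dict of properties in the {category: {"properties": {...}}} layout."""
    return {category: {"properties": dict(properties)}}


item = config_item("core-site", {"fs.trash.interval": "360"})
print(json.dumps(item, indent=2))
```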

A configuration category is a set of properties that can usually be found in a single configuration file. Some common examples are: core-site, hdfs-site, hadoop-env, zookeeper-env…

To get an exhaustive list of the configuration categories supported by Ambari, you can either export a blueprint from an existing cluster with the same components deployed on it, or look at the configuration sections in the UI. Be aware that Ambari may divide a category into several sections. For example, the “core-site” category appears as “Advanced core-site” and “Custom core-site” in the UI, but is defined simply as “core-site” in a blueprint file.

Also, a good practice is to let Ambari handle the resource sizing of your components first, and then tune it through the UI.

There is, however, one configuration category that is not in the UI and does not belong to any of the components you want to deploy: cluster-env. This is a special category for Ambari’s own properties, which Ambari uses to know how it should deploy your cluster. If you have ever deployed a cluster through the UI, you will notice that its properties are the ones found in the Misc tab.

So, here’s a part of what the configurations category could contain:
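The sketch below is one possible excerpt. The property values are illustrative defaults, and placing the `%HOSTGROUP::zk_node%` token in `ha.zookeeper.quorum` is an assumption to verify against your Ambari version, since the conversion is not supported on every property. The final lines simply list which host group tokens the excerpt relies on.

```python
import json
import re

# Excerpt of the "configurations" category (illustrative values).
configurations = [
    {"cluster-env": {"properties": {
        "smokeuser": "ambari-qa",     # user running the service checks
        "user_group": "hadoop",       # common group of the service users
    }}},
    {"core-site": {"properties": {
        "fs.trash.interval": "360",
        # Token to be replaced with the hosts mapped to the "zk_node" group.
        "ha.zookeeper.quorum": "%HOSTGROUP::zk_node%:2181",
    }}},
]

# List the host group names referenced through %HOSTGROUP::...% tokens.
tokens = re.findall(r"%HOSTGROUP::(\w+)%", json.dumps(configurations))
print(tokens)
```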

In the previous example, you can see a value called %HOSTGROUP::zk_node%. This is a variable that will be replaced by all the hostnames mapped to the host group “zk_node”. Be cautious when using it though, as the conversion is not yet supported on all properties.

Properties that are known to handle the %HOSTGROUP::hg_name% conversion:

When a property does not support the conversion and requires actual hostnames, define it in the cluster.json file instead (see section “File structure – cluster.json” below).

File structure – cluster.json

While the blueprint.json file represents the template of your cluster deployment, the cluster.json file is the instantiation of your deployment. This means that it is specific to one cluster and contains hard-coded values.

The cluster.json file has five categories at its root:

  • blueprint, the name of the blueprint (template) that you previously created. Its name is defined at its submission.
  • host_groups, which maps the hostnames of your infrastructure to their profile defined in the blueprint.
  • configurations, with the properties that are specific to this cluster deployment.
  • security, which has the same value as the security property of the “Blueprints” section of the blueprint.
  • credentials, the KDC connection information for a kerberized cluster.

At this point, your JSON file should look like this:
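As a minimal sketch, the five root categories can be laid out as follows. The blueprint name `my_blueprint` is hypothetical, and the list-valued categories are left empty on purpose.

```python
import json

# Skeleton of cluster.json: the five root categories.
cluster = {
    "blueprint": "my_blueprint",          # hypothetical name of the submitted blueprint
    "host_groups": [],                    # hostname-to-profile mapping
    "configurations": [],                 # cluster-specific properties
    "security": {"type": "KERBEROS"},     # same value as in the blueprint
    "credentials": [],                    # KDC connection information
}

print(json.dumps(cluster, indent=2))
```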

Category content – host groups

Unlike in the blueprint, host groups in the cluster.json file are used to map real hosts to a previously defined template.

Its structure is fairly straightforward: you just set the name of the template (a.k.a. group) and assign a list of hosts to it:

To take the worker template as an example again, here’s what it would look like:
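A sketch of that mapping for a `worker` group. The helper name `map_hosts` and the hostnames are hypothetical; the `name`, `hosts` and `fqdn` keys follow the Ambari cluster creation format.

```python
import json


def map_hosts(group_name, fqdns):
    """Map a list of fully qualified hostnames to a host group of the blueprint."""
    return {"name": group_name, "hosts": [{"fqdn": fqdn} for fqdn in fqdns]}


# Hypothetical hostnames, for illustration only.
worker_mapping = map_hosts(
    "worker",
    ["worker01.example.com", "worker02.example.com", "worker03.example.com"],
)

print(json.dumps({"host_groups": [worker_mapping]}, indent=2))
```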

Category content – configurations

Most of your configuration properties should have been defined in the blueprint.json file as they can be used in various cluster implementations.

However, there are two types of properties that are limited to a specific deployment:

  • user and infrastructure dependent configurations
  • configurations that do not handle the %HOSTGROUP::hg_name% conversion

You might also want to add properties that rely on the previously mentioned ones.

In the first category, you’ll mostly find database connection information, authentication credentials, and business-related properties like YARN queues.

This is a sample of these configurations:
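The sketch below shows what such deployment-specific properties might look like: a Hive metastore database connection and a YARN queue layout. The hostnames, user and queue names are hypothetical, and the exact keys your Ambari version expects are best checked against a blueprint exported from an existing cluster.

```python
import json

# Deployment-specific properties (hypothetical values).
cluster_configurations = [
    {"hive-site": {"properties": {
        "javax.jdo.option.ConnectionURL": "jdbc:mysql://db01.example.com/hive",
        "javax.jdo.option.ConnectionUserName": "hive",
    }}},
    {"capacity-scheduler": {"properties": {
        "yarn.scheduler.capacity.root.queues": "default,etl",
        "yarn.scheduler.capacity.root.default.capacity": "60",
        "yarn.scheduler.capacity.root.etl.capacity": "40",
    }}},
]

# Sanity check: the capacities of sibling queues should add up to 100.
scheduler = cluster_configurations[1]["capacity-scheduler"]["properties"]
total_capacity = sum(int(v) for k, v in scheduler.items() if k.endswith(".capacity"))
print(total_capacity)
print(json.dumps(cluster_configurations, indent=2))
```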

The following properties are known to not handle the %HOSTGROUP::hg_name% conversion:

The configuration category keeps the same structure as in the blueprint.json file. Here’s a sample:
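As a sketch of the idea, the snippet below writes actual hostnames into a property instead of relying on a host group token. The category `example-site`, the property `example.zookeeper.hosts`, the helper `join_hosts` and the hostnames are all placeholders; they do not name a real property known to lack the conversion.

```python
import json


def join_hosts(fqdns, port):
    """Build a comma-separated host:port list from actual hostnames."""
    return ",".join(f"{fqdn}:{port}" for fqdn in fqdns)


# Hypothetical hostnames of the ZooKeeper nodes of this cluster.
zk_hosts = ["zk01.example.com", "zk02.example.com", "zk03.example.com"]

hardcoded_configurations = [
    {"example-site": {"properties": {
        "example.zookeeper.hosts": join_hosts(zk_hosts, 2181),
    }}},
]

print(json.dumps(hardcoded_configurations, indent=2))
```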

Category content – credentials

For the same reasons as the properties set in the configurations section, KDC credentials of a secure cluster have to be defined on a deployment basis.

Its structure is as follows:
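A sketch of a credentials entry follows. The alias `kdc.admin.credential` and the `TEMPORARY` type follow the Ambari documentation for kerberized blueprint deployments; the principal and the `KDC_ADMIN_PASSWORD` environment variable are assumptions made for illustration.

```python
import os

# KDC administrator credentials for this deployment.
credentials = [
    {
        "alias": "kdc.admin.credential",
        "principal": "admin/admin@EXAMPLE.COM",   # hypothetical principal
        # Read the password from the environment rather than hard-coding it.
        "key": os.environ.get("KDC_ADMIN_PASSWORD", "changeme"),
        "type": "TEMPORARY",
    }
]

# Print the field names only, to avoid leaking the password.
print(sorted(credentials[0]))
```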

Ambari REST API usage

Be sure to have a running Ambari server and agents to send the blueprint to. The remote Ambari repository also has to be reachable for components like Ambari-Infra or Ambari-Metrics.

Repositories registration

First, register the HDP repositories to use for this deployment. This can be done using the following request:
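In generic form, the registration is a PUT on the repository resource of a given stack version and operating system. The sketch below only builds the URL and the body; the helper names and the Ambari address are hypothetical, and the resource path should be checked against the documentation of your Ambari version.

```python
from urllib.parse import quote


def repository_url(ambari, stack, version, os_type, repo_id):
    """Build the URL of a repository resource in Ambari's REST API."""
    parts = ["stacks", stack, "versions", version,
             "operating_systems", os_type, "repositories", repo_id]
    return ambari.rstrip("/") + "/api/v1/" + "/".join(quote(p, safe="") for p in parts)


def repository_body(base_url):
    """Build the JSON body of the PUT request that sets the repository base URL."""
    return {"Repositories": {"base_url": base_url, "verify_base_url": True}}


# Hypothetical Ambari address.
url = repository_url("http://ambari.example.com:8080", "HDP", "2.6", "redhat7", "HDP-2.6")
print(url)
```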

To register HDP 2.6 for RedHat 7 from Hortonworks’ public repositories, use the following:
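A sketch of that request using Python's standard library. The Ambari address, the `admin:admin` credentials and the exact repository base URL are assumptions; replace them with your own values. The request is built but not sent, so the snippet can be inspected safely.

```python
import base64
import json
import urllib.request

AMBARI = "http://ambari.example.com:8080"            # hypothetical Ambari address
AUTH = base64.b64encode(b"admin:admin").decode()     # assumed credentials; change them

url = (AMBARI + "/api/v1/stacks/HDP/versions/2.6"
       "/operating_systems/redhat7/repositories/HDP-2.6")
body = {
    "Repositories": {
        # Assumed public repository URL; check the one matching your HDP release.
        "base_url": "http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.6.4.0",
        "verify_base_url": True,
    }
}

request = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    method="PUT",
    headers={
        "Authorization": "Basic " + AUTH,
        "X-Requested-By": "ambari",   # header required by Ambari on write requests
    },
)

print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```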

Blueprint files submission

As seen above, start by submitting the blueprint template file. For the ${blueprint_name}, use the same value as the “blueprint” property of your cluster.json file.
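The submission is a POST of the blueprint.json content to the blueprints resource. The sketch below builds that request with the standard library; the function name `blueprint_request`, the Ambari address and the default `admin:admin` credentials are assumptions.

```python
import base64
import json
import urllib.request


def blueprint_request(ambari, blueprint_name, blueprint, user="admin", password="admin"):
    """Build the POST request that registers a blueprint under the given name."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{ambari.rstrip('/')}/api/v1/blueprints/{blueprint_name}",
        data=json.dumps(blueprint).encode("utf-8"),
        method="POST",
        headers={"Authorization": "Basic " + token, "X-Requested-By": "ambari"},
    )


# Hypothetical address, name and (empty) blueprint content.
req = blueprint_request(
    "http://ambari.example.com:8080",
    "my_blueprint",
    {"Blueprints": {}, "host_groups": [], "configurations": []},
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```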

Finally, submit the definition of the current cluster. It will take ${cluster_name} as the name of the cluster.
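This last call is a POST of the cluster.json content to the clusters resource, which starts the deployment. As before, this is a sketch: the function name `cluster_request`, the address, the cluster name and the credentials are hypothetical, and the request is built without being sent.

```python
import base64
import json
import urllib.request


def cluster_request(ambari, cluster_name, cluster, user="admin", password="admin"):
    """Build the POST request that creates a cluster from a submitted blueprint."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{ambari.rstrip('/')}/api/v1/clusters/{cluster_name}",
        data=json.dumps(cluster).encode("utf-8"),
        method="POST",
        headers={"Authorization": "Basic " + token, "X-Requested-By": "ambari"},
    )


# Hypothetical address, cluster name and minimal cluster definition.
cluster_req = cluster_request(
    "http://ambari.example.com:8080",
    "my_cluster",
    {"blueprint": "my_blueprint", "host_groups": []},
)
print(cluster_req.get_method(), cluster_req.full_url)
# To actually send it: urllib.request.urlopen(cluster_req)
```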

Conclusion

Even with blueprints, there are plenty of configuration parameters to set. In fact, it may even take longer to create a single blueprint than to fill in all the fields of each service in Ambari’s web wizard. You’ll want to use blueprints when deploying multiple clusters, or when creating and destroying environments automatically.

To do this, more than just blueprints is required. For example, for one of our customers, we use Puppet to automate the host preparation and the installation of Ambari’s server and agents. Once done, it runs a custom-built Ruby script that generates the blueprint.json and cluster.json files and submits them to the newly installed Ambari. The same can be done with Ansible, or even with a custom orchestration engine like the one we wrote, Nikita.

In conclusion, Ambari’s blueprints enable the automation of an HDP (or other distribution) deployment, but can hardly do it alone. Choose the tools that fit you best or that are already used by your company, and create a JSON builder for the blueprint.json and cluster.json files.

2018-03-20T14:48:09+00:00 | January 17th, 2018 | Categories: Big Data, DevOps