Kerberos and delegation tokens security with WebHDFS

By David WORMS

Jul 25, 2013

Categories: Cyber Security | Tags: HDFS, Big Data, HTTP, Kerberos

WebHDFS is an HTTP REST server bundled with recent versions of Hadoop. What interests me in this article is digging into security with Kerberos and delegation tokens. I will cover their usage from the command line and from a programming language perspective.

Don’t crawl the web looking for a command to start it. It is already available as part of the NameNode HTTP interface, on port 50070 by default.

Let’s review how a URL is built. Considering a NameNode running on a host named “nn” with the default port of 50070, all URLs start with “http://nn:50070”. The URL path is then prefixed by “/webhdfs/v1”, a version marker that lets WebHDFS clients talk to clusters running different Hadoop versions. The remainder of the URL path is the HDFS path pointing to a file or a directory. In the URL query parameters, the “op” parameter names the operation to execute, for example “LISTSTATUS” to list the content of a directory.
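To make the URL structure concrete, here is a minimal sketch of a shell helper that assembles the pieces described above. The function name `webhdfs_url` and its argument order are my own assumptions for illustration, not something WebHDFS provides, and the helper does no URL encoding of the path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a WebHDFS URL from its parts.
# Usage: webhdfs_url HOST PORT HDFS_PATH OP [key=value ...]
webhdfs_url() {
  local host="$1" port="$2" path="$3" op="$4"
  shift 4
  # Scheme and authority, the fixed "/webhdfs/v1" prefix, then the HDFS path
  local url="http://${host}:${port}/webhdfs/v1${path}?op=${op}"
  # Any remaining arguments are appended as extra query parameters
  local param
  for param in "$@"; do
    url="${url}&${param}"
  done
  printf '%s\n' "$url"
}

# Prints: http://nn:50070/webhdfs/v1/user/test?op=LISTSTATUS
webhdfs_url nn 50070 /user/test LISTSTATUS
```

The HDFS path is expected to start with a slash, since it is appended directly after the version prefix.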

Speaking of URLs, here’s an interesting side note. There is no support for HTTPS at the moment. Going through the WebHDFS umbrella issue in Jira, I found no mention of implementing it. Maybe this is because Kerberos avoids transmitting passwords in clear text, or maybe we should just create an issue ourselves? A solution is to put a secured HTTP proxy server in front of WebHDFS.

So a basic URL to list the content of the directory “/user/test” is:

curl -s "http://nn:50070/webhdfs/v1/user/test?op=LISTSTATUS"
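The response is a JSON document in which each entry of the directory carries a “pathSuffix” field. As a rough sketch, the entry names can be pulled out with the same `grep -Po` approach used later in this article. The function name `list_names` and the sample response are made up for illustration, the sample keeps only two fields per entry, and `-P` requires GNU grep built with PCRE support.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "pathSuffix" of every entry found in a
# LISTSTATUS response read from stdin (one name per line).
list_names() {
  grep -Po '"pathSuffix":"\K[^"]*'
}

# Trimmed, made-up response with the shape
# {"FileStatuses":{"FileStatus":[ ... ]}}
sample='{"FileStatuses":{"FileStatus":[{"pathSuffix":"a.txt","type":"FILE"},{"pathSuffix":"logs","type":"DIRECTORY"}]}}'

# Prints "a.txt" then "logs"
printf '%s\n' "$sample" | list_names
```

Against a real cluster, the output of the curl command above would be piped into `list_names`. A proper JSON parser is the safer choice for anything beyond a quick check.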

So how do we secure this request? WebHDFS offers two solutions. The examples below obtain the Kerberos ticket from a keytab instead of a password.

The first uses Kerberos to send the request. curl knows how to do this with the --negotiate option. The empty "-u :" is required to activate authentication in curl; the actual credentials come from the Kerberos ticket cache. Here’s an example:

kinit -kt /etc/security/keytabs/test.headless.keytab test && {
  curl -s --negotiate -u : "http://nn:50070/webhdfs/v1/user/test?op=LISTSTATUS"
  kdestroy
}

The second obtains a delegation token through a Kerberos request and then uses that token to send further requests. Said differently, it uses the first method to get the token and then simply passes the token in the URLs. In this example, we obtain the token and then destroy our Kerberos ticket to show that the ticket is no longer needed. In the final request, we add the “delegation” parameter to the URL.

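kinit -kt /etc/security/keytabs/test.headless.keytab test && {
  # Request a delegation token while the Kerberos ticket is valid
  token=$(curl -s --negotiate -u : "http://nn:50070/webhdfs/v1/?op=GETDELEGATIONTOKEN")
  # Extract the "urlString" value from the JSON response
  token=$(echo "$token" | grep -Po 'urlString":"\K[^"]*')
  kdestroy
  # The ticket is gone: only the token authenticates this request
  curl -s "http://nn:50070/webhdfs/v1/user/test?delegation=${token}&op=LISTSTATUS"
}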
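The two token-handling steps can also be isolated into small functions, which makes them easy to try without a cluster. This is only a sketch: the names `extract_token` and `with_delegation` are hypothetical, the sample response and its token value are made up, and `grep -P` again assumes GNU grep with PCRE support.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a GETDELEGATIONTOKEN response on stdin and
# print the value of its "urlString" field.
extract_token() {
  grep -Po '"urlString":"\K[^"]*'
}

# Hypothetical helper: append the token to a URL as the "delegation"
# query parameter, using "?" or "&" depending on the URL.
with_delegation() {
  local url="$1" token="$2"
  case "$url" in
    *\?*) printf '%s&delegation=%s\n' "$url" "$token" ;;
    *)    printf '%s?delegation=%s\n' "$url" "$token" ;;
  esac
}

# Made-up response with the shape {"Token":{"urlString":"..."}}
response='{"Token":{"urlString":"EXAMPLE_TOKEN"}}'
token=$(printf '%s\n' "$response" | extract_token)

# Prints: http://nn:50070/webhdfs/v1/user/test?op=LISTSTATUS&delegation=EXAMPLE_TOKEN
with_delegation "http://nn:50070/webhdfs/v1/user/test?op=LISTSTATUS" "$token"
```

The order of the query parameters does not matter, so appending “delegation” after “op” is equivalent to the URL used in the example above.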
