Apache Oozie
Related articles
Present and future of Hadoop workflow scheduling: Oozie 5.x
Categories: Big Data, DataWorks Summit 2018 | Tags: Hive, Oozie, Sqoop, HDP, REST, Hadoop, CDH
During the DataWorks Summit Europe 2018 in Berlin, I had the opportunity to attend a breakout session on Apache Oozie. It covers the new features released in Oozie 5.0, including future features of…
May 23, 2018
Execute Python in an Oozie workflow
Categories: Data Engineering | Tags: Oozie, Elasticsearch, REST, Python
Oozie workflows allow you to use multiple actions to execute code. However, doing so with Python can be a bit tricky, so let’s see how to do it. I’ve recently designed a workflow that would interact…
Mar 6, 2018
Components for CDH and HDP
Categories: Big Data | Tags: Flume, Hive, Oozie, Sqoop, Zookeeper, Cloudera, Hortonworks, HDP, Hadoop, CDH
I was interested in comparing the different components distributed by Cloudera and Hortonworks. This also gives us an idea of the versions packaged by the two distributions. At the time of this writing…
By David WORMS
Sep 22, 2013
Splitting HDFS files into multiple Hive tables
Categories: Data Engineering | Tags: Flume, HDFS, Hive, Oozie, Pig, SQL
I am going to show how to split a CSV file stored in HDFS into multiple Hive tables based on the content of each record. The context is simple. We are using Flume to collect logs from all over our…
By David WORMS
Sep 15, 2013