Apache Oozie

Apache Oozie is an open source Java web application released under the Apache License 2.0. It is a job scheduler system designed to manage and run jobs from the Hadoop stack in a distributed environment.

An Oozie workflow is a set of actions organized in a Directed Acyclic Graph (DAG). The order of the tasks, as well as the workflow's start and end rules, are determined by the control nodes, while the execution of tasks is triggered by the action nodes. Oozie comes pre-loaded with a variety of Hadoop ecosystem actions (including Apache MapReduce and Apache Pig), as well as system-specific jobs (such as shell scripts).
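The split between control nodes and action nodes can be illustrated with a minimal Python sketch. It is not the Oozie API: the node names, the `run_workflow` function and the dictionary layout are all hypothetical, and real workflows are declared in XML rather than in code. The sketch only shows the idea that action nodes perform work and name the next node for success or failure, while control nodes mark where the graph starts and ends.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory model of a workflow DAG; this is NOT the Oozie API.
# Each action node runs a task and names the next node on success ("ok")
# or on failure ("error").
ACTIONS: Dict[str, dict] = {
    "clean-data": {"run": lambda: True, "ok": "aggregate", "error": "fail"},
    "aggregate": {"run": lambda: True, "ok": "end", "error": "fail"},
}

# Control nodes: the start node points at the first action, and the terminal
# nodes map to the final status of the workflow (illustrative status names).
START = "clean-data"
TERMINALS: Dict[str, str] = {"end": "SUCCEEDED", "fail": "KILLED"}


def run_workflow(
    actions: Dict[str, dict], start: str, terminals: Dict[str, str]
) -> Tuple[List[str], str]:
    """Walk the graph from the start node until a terminal node is reached."""
    path: List[str] = []
    visited = set()
    node = start
    while node not in terminals:
        if node in visited:
            # A DAG never revisits a node, so this signals a malformed graph.
            raise ValueError(f"cycle detected at {node!r}")
        visited.add(node)
        path.append(node)
        action = actions[node]
        succeeded: bool = action["run"]()
        node = action["ok"] if succeeded else action["error"]
    path.append(node)
    return path, terminals[node]


path, status = run_workflow(ACTIONS, START, TERMINALS)
print(path, status)
```

Replacing the `run` callable of `clean-data` with one that returns `False` sends the walk to the `fail` node instead, which is how an error transition short-circuits the rest of the graph in this model.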

The Oozie Coordinator allows you to run Oozie workflows on a recurring schedule, according to data availability, or when an event occurs. A workflow job is launched when these conditions are met.
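As a rough illustration of combining a time schedule with a data availability check, the following Python sketch lists the scheduled run times in a window and marks each one as ready or still waiting. The `plan_runs` function, the `data_available` callback and the status labels are assumptions made for this example; they do not reproduce how Oozie evaluates coordinator definitions.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterator


def nominal_times(
    start: datetime, end: datetime, frequency: timedelta
) -> Iterator[datetime]:
    """Yield each scheduled time in [start, end) at the given frequency."""
    current = start
    while current < end:
        yield current
        current += frequency


def plan_runs(
    start: datetime,
    end: datetime,
    frequency: timedelta,
    data_available: Callable[[datetime], bool],
) -> Dict[datetime, str]:
    """Mark each scheduled time READY if its input data exists, else WAITING.

    `data_available` is a hypothetical callback standing in for a check that
    the input dataset for that scheduled time has been produced.
    """
    return {
        t: "READY" if data_available(t) else "WAITING"
        for t in nominal_times(start, end, frequency)
    }


# Example: a daily schedule over three days, with data present for two of them.
produced = {datetime(2021, 1, 1), datetime(2021, 1, 2)}
plan = plan_runs(
    start=datetime(2021, 1, 1),
    end=datetime(2021, 1, 4),
    frequency=timedelta(days=1),
    data_available=lambda t: t in produced,
)
for when, state in plan.items():
    print(when.date(), state)
```

In this model a run whose data is missing is kept as waiting rather than dropped, so it can be launched later once the data arrives.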

An Oozie bundle is a combination of multiple coordinator and workflow jobs whose lifecycle you manage together.

Related articles

Internship in Big Data infrastructure with TDP

Categories: Infrastructure, Learning | Tags: Cyber Security, DevOps, Java, Hadoop, IaC, Internship, TDP

Job Description Big Data and distributed computing is at Adaltas’ core. We support our partners in the deployment, maintenance and optimization of some of France’s largest clusters. Adaltas is also an…

By Daniel HARTY

Oct 25, 2021

Introducing Apache Airflow on AWS

Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: PySpark, Learning and tutorial, Airflow, Oozie, Spark, AWS, Docker, Python

Apache Airflow offers a potential solution to the growing challenge of managing an increasingly complex landscape of data management tools, scripts and analytics processes. It is an open-source…

By Aargan COINTEPAS

May 5, 2020

Clusters and workloads migration from Hadoop 2 to Hadoop 3

Categories: Big Data, Infrastructure | Tags: Slider, YARN, Erasure Coding, Rolling Upgrade, HDFS, Spark, Docker

Hadoop 2 to Hadoop 3 migration is a hot subject. How to upgrade your clusters, which features present in the new release may solve current problems and bring new opportunities, how are your current…

By Lucas BAKALIAN

Jul 25, 2018

Present and future of Hadoop workflow scheduling: Oozie 5.x

Categories: Big Data, DataWorks Summit 2018 | Tags: Sqoop, HDP, REST, Hadoop, Hive, Oozie, CDH

During the DataWorks Summit Europe 2018 in Berlin, I had the opportunity to attend a breakout session on Apache Oozie. It covers the new features released in Oozie 5.0, including future features of…

By Leo SCHOUKROUN

May 23, 2018

Execute Python in an Oozie workflow

Categories: Data Engineering | Tags: REST, Oozie, Elasticsearch, Python

Oozie workflows allow you to use multiple actions to execute code, however doing so with Python can be a bit tricky, let’s see how to do that. I’ve recently designed a workflow that would interact…

By César BEREZOWSKI

Mar 6, 2018

Composants for CDH and HDP

Categories: Big Data | Tags: Flume, Sqoop, Hortonworks, HDP, Hadoop, Hive, Oozie, Zookeeper, Cloudera, CDH

I was interested in comparing the different components distributed by Cloudera and Hortonworks. This also gives us an idea of the versions packaged by the two distributions. At the time of this writing…

By David WORMS

Sep 22, 2013

Splitting HDFS files into multiple hive tables

Categories: Data Engineering | Tags: Flume, Pig, HDFS, Hive, Oozie, SQL

I am going to show how to split a CSV file stored inside HDFS as multiple Hive tables based on the content of each record. The context is simple. We are using Flume to collect logs from all over our…

By David WORMS

Sep 15, 2013
