The Adaltas crew went to the DataWorks Summit 2018, held in Berlin on the 18th and 19th of April 2018. On this occasion, we compiled a series of articles about the talks that made the strongest impression on us.
- Omid: Scalable and highly available transaction processing for Apache Phoenix
Apache Omid provides a transactional layer on top of key/value NoSQL databases. This article reflects my understanding of Apache Omid, drawn from the online documentation and from Ohad Shacham's presentation at the conference.
- Apache Beam: a unified programming model for data processing pipelines
Apache Beam is an open source implementation of the Dataflow model, originally developed at Google, for expressing robust, out-of-order data processing pipelines in a variety of languages for both stream and batch architectures. The article is based on the presentation “Present and future of unified, portable and efficient data processing with Apache Beam” by Davor Bonaci.
- Apache Hadoop YARN 3.0 – State of the union
This article covers the “Apache Hadoop YARN: state of the union” talk given by Wangda Tan from Hortonworks about the present and future of Apache YARN.
- Accelerating query processing with materialized views in Apache Hive
Jesus Camacho Rodriguez from Hortonworks gave a talk, “Accelerating query processing with materialized views in Apache Hive”, about the new materialized view feature coming in Apache Hive 3.0.
- Present and future of Hadoop workflow scheduling: Oozie 5.x
This article covers the new features released in Oozie 5.0 and those planned for later 5.x releases, as well as Apache Ambari’s Workflow Scheduler and the way it lets you design and visualize Apache Oozie workflows.
- TensorFlow on Spark 2.3: the best of both worlds
This article is about the new features of the 2.3 release of Apache Spark, an open source framework for Big Data computation on clusters. It focuses on the integration of TensorFlow with Spark and the benefits of combining them.
- YARN and GPU Distribution for Machine Learning
This article goes over the fundamental principles of machine learning and the tools currently used to run machine learning algorithms. It then shows how a resource manager such as YARN can be useful in this context and how it helps these algorithms run smoothly.
- What’s new in Apache Spark 2.3?
This article combines two talks, “Apache Spark 2.3 boosts advanced analytics & deep learning” by Yanbo Liang and “ORC Improvement in Apache Spark 2.3” by Dongjoon Hyun, to dive into the new features offered by the 2.3 release of Apache Spark.
- Running Enterprise Workloads in the Cloud with Cloudbreak
This article is based on Peter Darvasi and Richard Doktorics’ talk “Running Enterprise Workloads in the Cloud” at the DataWorks Summit 2018 in Berlin.
- Apache Metron in the Real World
Apache Metron is a storage and analytics platform specialised in cybersecurity. The talk, given by Dave Russell, demonstrated the uses and capabilities of Apache Metron in the real world.