MiNiFi: Data at Scales & the Values of Starting Small



This talk gave a quick overview of Apache NiFi and explained where MiNiFi came from: essentially, MiNiFi is a minimal NiFi agent deployed on small devices to bring their data into a NiFi pipeline running on a cluster (e.g. IoT).

This post is part of our series on the DataWorks Summit 2017 (formerly Hadoop Summit). The talk was given by Aldrin Piri from Hortonworks. Here are the main points.

Apache NiFi

Apache NiFi is a system that answers the following question:

In a connected world where everything and anything can be a producer, how do you bring your data to the consumer?

It collects data from various sources, applies logic and transformations to it, and then makes it available to other frameworks or writes it to a filesystem.
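The collect / transform / deliver cycle can be pictured as a chain of small processing steps. The following is a minimal Python sketch under simplifying assumptions: records are plain dictionaries, each "processor" is a generator function, and delivery just returns a list. The function names (`collect`, `drop_invalid`, `add_unit`, `run_flow`) are hypothetical and do not correspond to NiFi's actual processor API.

```python
from typing import Callable, Iterable, Iterator

Record = dict
Processor = Callable[[Iterable[Record]], Iterator[Record]]


def collect(sources: list[Iterable[Record]]) -> Iterator[Record]:
    """Ingest: read every record from each (hypothetical) source in turn."""
    for source in sources:
        yield from source


def drop_invalid(records: Iterable[Record]) -> Iterator[Record]:
    """Logic: discard records that carry no value."""
    for r in records:
        if r.get("value") is not None:
            yield r


def add_unit(records: Iterable[Record]) -> Iterator[Record]:
    """Operation: enrich every record with a unit field."""
    for r in records:
        yield {**r, "unit": "celsius"}


def run_flow(sources: list[Iterable[Record]], processors: list[Processor]) -> list[Record]:
    """Chain the processors lazily, then deliver the result (here, an in-memory list)."""
    stream: Iterable[Record] = collect(sources)
    for proc in processors:
        stream = proc(stream)
    return list(stream)


# Usage: two hypothetical sensors, one of which emits an invalid reading.
sensor_a = [{"id": 1, "value": 21.5}]
sensor_b = [{"id": 2, "value": None}, {"id": 3, "value": 19.0}]
result = run_flow([sensor_a, sensor_b], [drop_invalid, add_unit])
```

Because each step only consumes and produces records, steps can be reordered or swapped without touching the sources or the consumer, which is the property a visual flow editor builds on.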

Its key features are:

  • Guaranteed delivery
  • Data buffering
  • Prioritized queuing
  • Flow-specific Quality of Service (latency vs throughput, loss tolerance)
  • Data provenance
  • Recovery / recording a rolling log of fine-grained history
  • Visual command & control
  • Flow templates
  • Pluggable / multi-role security
  • Designed for extension
  • Clustering
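Two of the features above, prioritized queuing and data buffering, can be illustrated with a toy model of the queue that sits between two processing steps. In NiFi itself these behaviours are configured on each connection (prioritizers and back-pressure thresholds); the `BoundedPriorityQueue` class below is a hypothetical Python stand-in written for this post, not NiFi's API. It serves lower priority numbers first, breaks ties in arrival order, and refuses new items once full instead of dropping data.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Hypothetical model of a queue between two processors.

    Lower `priority` values are served first; ties are served FIFO.
    When `max_items` is reached, `offer` refuses new items (back-pressure)
    rather than dropping data, so the upstream side keeps them buffered.
    """

    def __init__(self, max_items: int):
        self.max_items = max_items
        self._heap: list[tuple[int, int, object]] = []
        self._seq = itertools.count()  # insertion order, used to break priority ties

    def offer(self, item: object, priority: int) -> bool:
        if len(self._heap) >= self.max_items:
            return False  # back-pressure: caller must retry later
        heapq.heappush(self._heap, (priority, next(self._seq), item))
        return True

    def poll(self) -> object:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


# Usage: an urgent alert overtakes routine readings; the fourth offer is refused.
q = BoundedPriorityQueue(max_items=3)
accepted = [
    q.offer("routine-1", 5),
    q.offer("alert", 1),
    q.offer("routine-2", 5),
    q.offer("routine-3", 5),
]
order = [q.poll() for _ in range(len(q))]
```

Refusing the offer, rather than silently discarding the oldest item, is what keeps this kind of buffering compatible with guaranteed delivery: the pressure propagates upstream until the downstream step catches up.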

It uses FlowFiles to carry data through its pipelines. A FlowFile pairs the binary content with a set of key/value attributes (metadata), much like an HTTP message pairs a body with headers, which makes it possible to retrace each file's provenance. FlowFiles make NiFi data-agnostic; the system is nevertheless designed to support plugins for format-specific operations.
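To make the FlowFile idea concrete, here is a minimal Python sketch of such a structure: opaque content, a dictionary of attributes, and a list of provenance events. The class and method names are hypothetical. The `uuid` attribute and the event names `CREATE` and `ATTRIBUTES_MODIFIED` are borrowed from NiFi's vocabulary for readability, but this is an illustration of the concept, not NiFi's implementation.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceEvent:
    event_type: str      # e.g. "CREATE", "ATTRIBUTES_MODIFIED"
    component: str       # name of the step that touched the FlowFile
    timestamp: datetime


@dataclass
class FlowFile:
    """Hypothetical FlowFile: payload + metadata + lineage."""
    content: bytes  # opaque payload: never parsed, hence data-agnostic
    attributes: dict[str, str] = field(default_factory=dict)  # like HTTP headers
    history: list[ProvenanceEvent] = field(default_factory=list)

    @classmethod
    def create(cls, content: bytes, component: str, **attrs: str) -> "FlowFile":
        ff = cls(content, {"uuid": str(uuid.uuid4()), **attrs})
        ff._record("CREATE", component)
        return ff

    def with_attribute(self, key: str, value: str, component: str) -> "FlowFile":
        # Returns a new FlowFile: content is shared, attributes and history are copied.
        ff = FlowFile(self.content, {**self.attributes, key: value}, list(self.history))
        ff._record("ATTRIBUTES_MODIFIED", component)
        return ff

    def _record(self, event_type: str, component: str) -> None:
        self.history.append(ProvenanceEvent(event_type, component, datetime.now(timezone.utc)))


# Usage: component names are illustrative labels only.
ff = FlowFile.create(b"\x00\x17", "GetSensor", filename="reading.bin")
ff2 = ff.with_attribute("mime.type", "application/octet-stream", "UpdateAttribute")
```

Only the metadata changes as the FlowFile moves through the flow; the payload bytes are shared, and the accumulated history is what lets you answer "where did this file come from and what happened to it".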

Apache MiNiFi

NiFi is very capable, but it needs a lot of computing resources to run and is therefore mostly confined to data centers. As a result, data provenance only starts at the data center's entry point.

With this in mind, the NiFi team bundled the core libraries (FlowFile format, tagging support, the site-to-site protocol and provenance generation) without the full processing framework, web server and UI, and developed two agents:

  • In Java, far lighter than the full NiFi service
  • In C++, with an even smaller footprint than the Java agent

The Java agent is heavily based on the original NiFi code, whereas the C++ agent is a complete rewrite optimized for performance and is well suited to sensor networks.

The smallest option is to build a dedicated client for a given platform on top of the bundled libraries (iOS / Android SDK, …).

That is MiNiFi: an embedded NiFi agent that enables data provenance right at the producer.
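The role of such an agent can be sketched in a few lines: wrap each reading with metadata on the device, keep it in a local buffer, and forward it in batches whenever the uplink is available. This is a hypothetical Python illustration, not the MiNiFi Java or C++ API. `EdgeAgent`, `send_batch`, the attribute names and the `RECEIVE@device` provenance label are all assumptions made for the example; `send_batch` merely stands in for the site-to-site protocol, and a real agent would persist its buffer to disk rather than keep it in memory.

```python
from collections import deque
from typing import Callable


class EdgeAgent:
    """Hypothetical minimal edge agent (not MiNiFi's real API)."""

    def __init__(self, device_id: str, send_batch: Callable[[list[dict]], bool]):
        self.device_id = device_id
        self.send_batch = send_batch  # stand-in for site-to-site; True on success
        self.buffer: deque[dict] = deque()  # in-memory only in this sketch

    def collect(self, payload: bytes, seq: int) -> None:
        # Metadata and lineage are attached on the device, at the producer.
        self.buffer.append({
            "content": payload,
            "attributes": {"source.device": self.device_id, "seq": str(seq)},
            "provenance": [f"RECEIVE@{self.device_id}"],
        })

    def flush(self, batch_size: int = 100) -> int:
        """Ship buffered items in batches; keep them if a send fails."""
        sent = 0
        while self.buffer:
            batch = [self.buffer[i] for i in range(min(batch_size, len(self.buffer)))]
            if not self.send_batch(batch):
                break  # uplink down: items stay buffered for the next attempt
            for _ in batch:
                self.buffer.popleft()
            sent += len(batch)
        return sent


# Usage: a fake uplink that is down at first, then comes back.
uplink_up = False
received: list[dict] = []


def fake_send(batch: list[dict]) -> bool:
    if uplink_up:
        received.extend(batch)
        return True
    return False


agent = EdgeAgent("sensor-42", fake_send)
for i in range(3):
    agent.collect(f"t={20 + i}".encode(), seq=i)
first = agent.flush(batch_size=2)   # nothing sent, everything kept
after_first = len(agent.buffer)
uplink_up = True
second = agent.flush(batch_size=2)  # two batches: 2 items, then 1
```

Items are removed from the buffer only after a successful send, so an intermittent link delays data instead of losing it, and every forwarded item already carries its device-side lineage.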

NiFi Ecosystem

To extend NiFi further, the following components are in the works:

  • Configuration management of flows & versioning
  • Extension repositories
  • Variable registry
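A variable registry lets the same flow definition run in different environments by resolving placeholders at runtime. The Python sketch below illustrates the idea only: it substitutes `${name}` placeholders (the syntax NiFi's Expression Language uses) from a list of scopes, where the closest scope wins. The `resolve` function and the variable names are hypothetical, and NiFi's actual registry is more involved than this.

```python
import re
from collections import ChainMap

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def resolve(value: str, *scopes: dict[str, str]) -> str:
    """Replace each ${name} with its first definition, searching scopes in order."""
    registry = ChainMap(*scopes)

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in registry:
            raise KeyError(f"undefined variable: {name}")
        return registry[name]

    return _PLACEHOLDER.sub(lookup, value)


# Usage: a group-level value overrides the global one.
global_vars = {"kafka.host": "broker-prod:9092", "env": "prod"}
group_vars = {"env": "staging"}
prop = resolve("${kafka.host}/topic-${env}", group_vars, global_vars)
```

Failing loudly on an undefined variable, instead of leaving the placeholder in place, surfaces configuration mistakes before data starts flowing.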

Final thoughts

With the announcement of MiNiFi, the Apache NiFi team aims to make NiFi the leading ETL tool for the new IoT world.
