MiNiFi: Data at Scales & the Values of Starting Small
Jul 8, 2017
This talk briefly introduced Apache NiFi and explained where MiNiFi came from: basically, it is a minimal NiFi agent deployed on small devices (e.g. IoT) to bring their data into a cluster's NiFi pipeline.
This post is part of our series on the DataWorks Summit 2017 (formerly Hadoop Summit). The speaker was Aldrin Piri from Hortonworks. Here are the main points.
Apache NiFi is a system answering the following question:
In a connected world where everything and anything can be a producer, how do you bring your data to the consumer?
It lets you collect data from a variety of sources, apply logic and operations to it, and then make it available to other frameworks or push it into a filesystem.
Its key features are:
- Guaranteed delivery
- Data buffering
- Prioritized queuing
- Flow-specific Quality of Service (latency vs throughput, loss tolerance)
- Data provenance
- Recovery / recording a rolling log of fine-grained history
- Visual command & control
- Flow templates
- Pluggable / multi-role security
- Designed for extension
NiFi moves data through its pipeline as FlowFiles: a format that stores binary content together with associated metadata, much like an HTTP message with its body and headers. This metadata makes it possible to retrace where a piece of data came from. Because the content is treated as opaque bytes, FlowFiles let NiFi stay data-agnostic. The system is nevertheless designed to support plugins for format-specific operations.
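To make the "binary content plus metadata" idea concrete, here is a minimal Python sketch. It is not NiFi's actual FlowFile implementation or API: the class `FlowFileSketch`, the helper `tag_source`, the attribute names and the event labels are all hypothetical, chosen only to illustrate how opaque content, attributes and a provenance trail can travel together.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid


@dataclass
class FlowFileSketch:
    """Hypothetical stand-in for a FlowFile: opaque bytes plus string attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)
    provenance: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Give every record an identifier so it can be traced later.
        self.attributes.setdefault("uuid", str(uuid.uuid4()))

    def record(self, event: str, component: str) -> None:
        # Append an (illustrative) provenance entry: what happened, and where.
        self.provenance.append(f"{event}:{component}")


def tag_source(flow_file: FlowFileSketch, source: str) -> FlowFileSketch:
    """Add a 'source' attribute without ever inspecting the content."""
    flow_file.attributes["source"] = source
    flow_file.record("attributes_modified", "tag_source")
    return flow_file


# Usage: the content could be any format, the pipeline only sees bytes.
ff = FlowFileSketch(content=b"\x00\x01\x02")
ff.record("created", "sensor_reader")
tag_source(ff, "sensor-42")
print(ff.attributes["source"], ff.provenance)
```

The point of the sketch is that `tag_source` never looks at `content`: routing and tracing decisions rely only on the attributes, which is what allows a pipeline of this kind to handle any data format.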
NiFi is powerful, but it requires a lot of computing power to run and is therefore mostly confined to data centers, which means that data provenance only starts at the data center's entry point.
With this in mind, the NiFi team bundled the libraries providing the FlowFile format, tagging support, the site-to-site protocol and provenance generation, leaving out the full processing framework, web server and UI, and developed two clients:
- In Java, far less resource-hungry than the original NiFi service
- In C++, smaller than the Java one
The Java implementation is heavily based on the original NiFi, whereas the C++ one is a complete rewrite optimized for performance and is well suited to sensor networks.
There is also the smallest option: develop a specific client using the bundled libraries for a specific platform (iOS / Android SDK, …)
That’s MiNiFi, an embedded NiFi client enabling data provenance directly from the producer.
To further the extension of NiFi, the following components are coming:
- Configuration management of flows & versioning
- Extension repositories
- Variable registry
With the announcement of MiNiFi, the Apache NiFi team aims to make NiFi the best ETL tool for the new IoT world.