Presentation by David Fallside from IBM, images extracted from the presentation.
EclairJS is a Node.js library that provides JavaScript bindings to Apache Spark:
- Each RDD is bound to a JS object that is made immutable
- Spark operators are transparently mapped to JS functions (e.g. flatMap, filter, …)
- Every mapped Spark operator returns a promise
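To make the list above concrete, here is a minimal TypeScript sketch of an immutable dataset handle whose operators return promises. `LocalDataset` and its methods are hypothetical names made up for this illustration. It runs entirely in memory and is not the EclairJS API, whose operators map onto real Spark operators.

```typescript
// Hypothetical sketch, NOT the EclairJS API: an in-memory, immutable
// dataset whose operators return Promises.
class LocalDataset<T> {
  private readonly items: readonly T[];

  constructor(items: readonly T[]) {
    // Copy and freeze the data, then the object itself, so neither the
    // caller nor later code can mutate this dataset.
    this.items = Object.freeze([...items]);
    Object.freeze(this);
  }

  // Each operator resolves to a NEW dataset; the receiver never changes.
  map<U>(fn: (x: T) => U): Promise<LocalDataset<U>> {
    return Promise.resolve(new LocalDataset(this.items.map(fn)));
  }

  filter(pred: (x: T) => boolean): Promise<LocalDataset<T>> {
    return Promise.resolve(new LocalDataset(this.items.filter(pred)));
  }

  flatMap<U>(fn: (x: T) => U[]): Promise<LocalDataset<U>> {
    const out: U[] = [];
    for (const x of this.items) out.push(...fn(x));
    return Promise.resolve(new LocalDataset(out));
  }

  // Action-like call: resolves to a plain copy of the data.
  collect(): Promise<T[]> {
    return Promise.resolve([...this.items]);
  }
}

async function demo(): Promise<void> {
  const lines = new LocalDataset(["to be", "or not"]);
  const words = await lines.flatMap((line) => line.split(" "));
  const short = await words.filter((w) => w.length <= 2);
  console.log(await short.collect()); // [ 'to', 'be', 'or' ]
  console.log(await lines.collect()); // [ 'to be', 'or not' ], unchanged
}

void demo();
```

Because every operator resolves to a fresh frozen object, `lines` still holds its original two entries after the chain has run.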
Promises let EclairJS emulate the way Spark uses its DAG:
- Transformations return a new object and are added to the DAG
- Actions execute the whole DAG to produce a result
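The transformation/action split can be sketched the same way. In this hypothetical `LazyDataset` (again not EclairJS code), a transformation only creates a node that points at its parent, and the `collect()` action walks that lineage from the source and replays every step, resolving a promise with the result. Real Spark DAGs can branch and are split into stages; this sketch assumes a single linear chain to keep it short.

```typescript
// Hypothetical sketch, NOT EclairJS code: transformations only record
// steps; an action replays the recorded lineage to compute a result.
type Step = (rows: unknown[]) => unknown[];

class LazyDataset<T> {
  private readonly label: string;
  private readonly step: Step;
  // Link to the dataset this one was derived from (null for a source).
  private readonly parent: LazyDataset<unknown> | null;

  private constructor(label: string, step: Step, parent: LazyDataset<unknown> | null) {
    this.label = label;
    this.step = step;
    this.parent = parent;
    Object.freeze(this);
  }

  static of<T>(data: readonly T[]): LazyDataset<T> {
    const copy = [...data];
    return new LazyDataset<T>("source", () => [...copy], null);
  }

  // Transformations: return a new node, compute nothing yet.
  map<U>(fn: (x: T) => U): LazyDataset<U> {
    return new LazyDataset<U>("map", (rows) => (rows as T[]).map(fn), this as LazyDataset<unknown>);
  }

  filter(pred: (x: T) => boolean): LazyDataset<T> {
    return new LazyDataset<T>("filter", (rows) => (rows as T[]).filter(pred), this as LazyDataset<unknown>);
  }

  // Nodes ordered from the source down to this one.
  private chain(): LazyDataset<unknown>[] {
    const nodes: LazyDataset<unknown>[] = [];
    let n: LazyDataset<unknown> | null = this as LazyDataset<unknown>;
    while (n !== null) {
      nodes.unshift(n);
      n = n.parent;
    }
    return nodes;
  }

  // Inspect the recorded DAG without executing it.
  lineage(): string[] {
    return this.chain().map((n) => n.label);
  }

  // Action: execute every recorded step, oldest first.
  collect(): Promise<T[]> {
    let rows: unknown[] = [];
    for (const n of this.chain()) rows = n.step(rows);
    return Promise.resolve(rows as T[]);
  }
}

let calls = 0;
const multiplesOf20 = LazyDataset.of([1, 2, 3, 4])
  .map((x) => {
    calls += 1;
    return x * 10;
  })
  .filter((x) => x % 20 === 0);

console.log(multiplesOf20.lineage().join(" -> ")); // source -> map -> filter
console.log(calls); // 0: no step has run yet
void multiplesOf20.collect().then((rows) => console.log(rows, calls)); // [ 20, 40 ] 4
```

Storing parent links instead of intermediate arrays is what keeps transformations cheap: the work is only paid for once an action such as `collect()` runs.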
EclairJS has two main components:
- Client: JS API, installed with npm
- Server: JS code that maps the API onto Spark’s Java API and runs inside the JVM on Oracle’s Nashorn engine; it has to be run as its own process, separate from the client
The server also relies on Jupyter Notebook to expose the WebSocket endpoint through which client and server communicate
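Since the client runs in Node.js and the server in the JVM, every recorded operation has to cross the WebSocket as data. The sketch below assumes, purely for illustration, that the client sends a JSON message listing each step together with its function's source text, and that the server rebuilds and replays those functions. `encodeJob`, `runJob` and the message shape are hypothetical and are not EclairJS's actual protocol; the "server" here is a plain TypeScript stand-in rather than Nashorn, and no real socket is opened.

```typescript
// Hypothetical wire format, NOT the real EclairJS protocol. Functions
// cannot be sent as JSON, so this sketch ships their source text.
interface WireStep {
  op: "map" | "filter";
  fn: string; // source code of the user's function
}

interface JobMessage {
  data: number[];
  steps: WireStep[];
}

type ClientStep = ["map" | "filter", (x: number) => unknown];

// Client side: describe the job as JSON text, i.e. what would be sent
// over the WebSocket.
function encodeJob(data: number[], steps: ClientStep[]): string {
  const msg: JobMessage = {
    data,
    steps: steps.map(([op, f]) => ({ op, fn: f.toString() })),
  };
  return JSON.stringify(msg);
}

// Server-side stand-in: rebuild each function from its source and replay
// the steps in order. Only self-contained functions survive this; a
// closure over client-side variables would fail once rebuilt.
function runJob(wire: string): unknown[] {
  const msg = JSON.parse(wire) as JobMessage;
  let rows: unknown[] = msg.data;
  for (const step of msg.steps) {
    const f = new Function(`return (${step.fn});`)() as (x: unknown) => unknown;
    rows = step.op === "map" ? rows.map((x) => f(x)) : rows.filter((x) => Boolean(f(x)));
  }
  return rows;
}

const wire = encodeJob([1, 2, 3, 4], [
  ["map", (x) => x * 3],
  ["filter", (x) => x % 2 === 0],
]);
console.log(wire);
console.log(runJob(wire)); // [ 6, 12 ]
```

Rebuilding functions from received source text executes arbitrary code, so a design like this is only acceptable between trusted endpoints.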
In terms of performance, according to the presentation, Spark’s native Java API is much faster, but EclairJS runs twice as fast as PySpark, Spark’s Python API.
EclairJS seems to be a great project if you need to integrate Spark jobs into a web application.