Presentation by David Fallside from IBM; the images are extracted from the presentation.


Web application development has moved from Java to NodeJS and JavaScript, which provide a simple and rich environment through NPM.

EclairJS is a NodeJS library that provides bindings to a Spark application:

  • An RDD is bound to a JS object that is made immutable
  • Spark operators are transparently mapped to JS functions (ex: flatMap, filter, …)
  • Every mapped Spark operator returns a promise

Using promises makes it possible to emulate Spark’s use of the DAG:

  • Transformations return a new object and are added to the DAG
  • Actions execute the whole DAG to get a result
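
To make the transformation/action split concrete, here is a minimal toy model in plain JavaScript. It is not EclairJS code: `LazyDataset` and its methods are hypothetical names invented for this sketch, and the "DAG" is reduced to a simple chain of steps. It only illustrates the idea of immutable handles, recorded transformations, and actions that return a promise.

```javascript
'use strict';

// Toy model only: LazyDataset is a hypothetical class, not part of EclairJS.
class LazyDataset {
  constructor(source, steps = []) {
    this.source = source;              // input array, standing in for a data source
    this.steps = Object.freeze(steps); // recorded transformations (a simple chain here)
    Object.freeze(this);               // the handle itself cannot be modified
  }

  // Transformations: record the step and return a new handle; nothing runs yet.
  flatMap(fn) {
    const step = (rows) => rows.flatMap((row) => fn(row));
    return new LazyDataset(this.source, [...this.steps, step]);
  }

  filter(fn) {
    const step = (rows) => rows.filter((row) => fn(row));
    return new LazyDataset(this.source, [...this.steps, step]);
  }

  // Actions: run every recorded step and resolve a promise with the result.
  collect() {
    return new Promise((resolve) => {
      resolve(this.steps.reduce((rows, step) => step(rows), [...this.source]));
    });
  }

  count() {
    return this.collect().then((rows) => rows.length);
  }
}

const lines = new LazyDataset(['to be or', 'not to be']);
const words = lines.flatMap((line) => line.split(' '));    // recorded, not executed
const longWords = words.filter((word) => word.length > 2); // recorded, not executed

// Only the action triggers the work.
longWords.collect().then((result) => console.log(result));
```

Each transformation leaves the original handle untouched, so `lines` can still be reused to build a different chain after `words` has been derived from it.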

EclairJS - Code semantics


EclairJS has two main components:

  • Client: JS API, installed with NPM
  • Server: JS code that provides the Java mapping and runs in the JVM using Oracle Nashorn, has to be run

The server also uses Jupyter Notebook to provide a WebSocket endpoint between the client and the server.
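
The client/server split can be pictured as a request/reply exchange in which each reply resolves a promise on the client. The sketch below is a generic illustration of that pattern, not EclairJS's actual wire protocol: two in-process `EventEmitter` channels stand in for the WebSocket, and the message fields (`id`, `dataset`, `op`) and the `request` helper are assumptions made up for the example.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the WebSocket connection: two in-process channels.
const toServer = new EventEmitter();
const toClient = new EventEmitter();

// "Server" side: owns the data and does the actual work.
const serverData = { numbers: [1, 2, 3, 4] };
toServer.on('message', (raw) => {
  const { id, dataset, op } = JSON.parse(raw);
  const rows = serverData[dataset];
  // This sketch treats 'count' specially; any other op returns the rows.
  const result = op === 'count' ? rows.length : rows;
  toClient.emit('message', JSON.stringify({ id, result }));
});

// "Client" side: each request gets an id; the matching reply resolves its promise.
let nextId = 0;
const pending = new Map();
toClient.on('message', (raw) => {
  const { id, result } = JSON.parse(raw);
  pending.get(id)(result);
  pending.delete(id);
});

function request(dataset, op) {
  return new Promise((resolve) => {
    const id = nextId++;
    pending.set(id, resolve);
    toServer.emit('message', JSON.stringify({ id, dataset, op }));
  });
}

request('numbers', 'count').then((n) => console.log('count:', n));
```

The point of the pattern is that the client only ever holds lightweight handles and promises, while the data and the computation stay on the server side.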

EclairJS - Architecture


In terms of performance, Spark’s native Java API is much faster; however, EclairJS is twice as fast as Spark’s PySpark API.

EclairJS - Performances


EclairJS seems to be a great project if you need to integrate Spark jobs into a web application.