EclairJS – Putting a Spark in Web Apps

Presentation by David Fallside from IBM; the images are taken from the presentation.

Introduction

Web app development has moved from Java to NodeJS and JavaScript, which provide a simple yet rich environment through NPM.

EclairJS is a NodeJS library that provides JavaScript bindings for writing Spark applications:

  • An RDD is bound to an immutable JS object
  • Spark operators are transparently mapped to JS functions (e.g. flatMap, filter, …)
  • Every mapped Spark operator returns a promise
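
To make the binding idea above concrete, here is a minimal TypeScript sketch of an immutable, RDD-like wrapper. It is not the EclairJS API: the class `FrozenRDD` and its methods are hypothetical, and the data lives in a local array rather than on a Spark cluster. It only illustrates the listed properties: the wrapper is frozen with `Object.freeze`, operators are ordinary JS methods that each return a new object, and results come back through a Promise. In this simplified version only `collect()` returns a Promise; the transformations return new wrappers directly.

```typescript
// Hypothetical sketch, NOT the EclairJS API: a local, in-memory stand-in
// that illustrates an immutable wrapper whose operators are plain JS methods.
class FrozenRDD<T> {
  private readonly items: readonly T[];

  constructor(items: readonly T[]) {
    // Copy and freeze the data so later changes to the input array are not visible.
    this.items = Object.freeze([...items]);
    // Freeze the wrapper itself: its properties cannot be reassigned or extended.
    Object.freeze(this);
  }

  // Each operator returns a NEW wrapper and leaves `this` untouched.
  map<U>(fn: (x: T) => U): FrozenRDD<U> {
    return new FrozenRDD(this.items.map((x) => fn(x)));
  }

  filter(pred: (x: T) => boolean): FrozenRDD<T> {
    return new FrozenRDD(this.items.filter((x) => pred(x)));
  }

  flatMap<U>(fn: (x: T) => readonly U[]): FrozenRDD<U> {
    return new FrozenRDD(this.items.flatMap((x) => fn(x)));
  }

  // The result is delivered through a Promise, as a remote call would be.
  collect(): Promise<T[]> {
    return Promise.resolve([...this.items]);
  }
}
```

For example, `new FrozenRDD(["a b", "c"]).flatMap((line) => line.split(" ")).collect()` resolves to `["a", "b", "c"]`, while the original wrapper still holds its two unchanged lines.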

Promises let EclairJS emulate the way Spark uses its DAG:

  • Transformations return a new object and add a node to the DAG
  • Actions execute the whole DAG to compute a result
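
The transformation/action split can be illustrated with another hypothetical TypeScript sketch (`LazyDataset` and `LineageNode` are invented names, not part of EclairJS or Spark). Transformations only record a lineage node; nothing is computed until an action such as `collect()` or `count()` evaluates the chain and settles a Promise. The lineage here is a simple linear chain, the most basic form of DAG; Spark's real scheduler also handles branching and shared lineage, splits the graph into stages, and distributes the work across a cluster.

```typescript
// Hypothetical sketch, NOT EclairJS code: a lazily evaluated dataset.
// Each transformation records a lineage node; only actions run it.
type LineageNode<T> = {
  readonly describe: string;
  readonly run: () => T[];
};

class LazyDataset<T> {
  private readonly node: LineageNode<T>;

  private constructor(node: LineageNode<T>) {
    this.node = node;
  }

  static of<T>(items: readonly T[]): LazyDataset<T> {
    return new LazyDataset<T>({ describe: "source", run: () => [...items] });
  }

  // Transformation: wraps the parent node in a new one; computes nothing yet.
  map<U>(fn: (x: T) => U): LazyDataset<U> {
    const parent = this.node;
    return new LazyDataset<U>({
      describe: `${parent.describe} -> map`,
      run: () => parent.run().map((x) => fn(x)),
    });
  }

  filter(pred: (x: T) => boolean): LazyDataset<T> {
    const parent = this.node;
    return new LazyDataset<T>({
      describe: `${parent.describe} -> filter`,
      run: () => parent.run().filter((x) => pred(x)),
    });
  }

  // Actions: walk the recorded chain back to the source and settle a Promise.
  collect(): Promise<T[]> {
    return new Promise<T[]>((resolve) => resolve(this.node.run()));
  }

  count(): Promise<number> {
    return new Promise<number>((resolve) => resolve(this.node.run().length));
  }

  // Human-readable view of the recorded lineage.
  lineage(): string {
    return this.node.describe;
  }
}
```

Because each node only reaches its parent through a closure, calling an action twice recomputes the whole chain, much as Spark recomputes an RDD that has not been cached.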

[Figure: EclairJS - Code semantics]

Architecture

EclairJS has two main components:

  • Client: JS API, installed with NPM
  • Server: JavaScript code that maps calls onto Spark's Java API and runs inside the JVM on Oracle's Nashorn engine; unlike the client, it has to be started and run separately

The server also relies on Jupyter Notebook to expose a WebSocket endpoint through which the client and the server communicate.
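
One way to picture this client/server split is a client that only records operator calls and a server that replays them. The sketch below assumes an invented JSON message format (`JobMessage`, `RemoteDatasetStub` and `handleMessage` are all hypothetical); it does not reproduce EclairJS's actual protocol and leaves out the WebSocket transport and Jupyter entirely, so the "server" runs in the same process. User functions are shipped as source text via `Function.prototype.toString()` and rebuilt with the `Function` constructor, which is one possible technique, not necessarily the one EclairJS uses.

```typescript
// Hypothetical wire format, for illustration only; this is NOT
// EclairJS's real client/server protocol.
type Op = { op: "map" | "filter"; fnSource: string };
type JobMessage = { kind: "job"; source: number[]; ops: Op[]; action: "collect" | "count" };

// Client side: records operator calls instead of executing them.
class RemoteDatasetStub {
  private readonly source: number[];
  private readonly ops: readonly Op[];

  constructor(source: number[], ops: readonly Op[] = []) {
    this.source = source;
    this.ops = ops;
  }

  // Functions travel as source text, so they must not capture outer variables.
  map(fn: (x: number) => number): RemoteDatasetStub {
    return new RemoteDatasetStub(this.source, [...this.ops, { op: "map", fnSource: fn.toString() }]);
  }

  filter(fn: (x: number) => boolean): RemoteDatasetStub {
    return new RemoteDatasetStub(this.source, [...this.ops, { op: "filter", fnSource: fn.toString() }]);
  }

  // The JSON string is what a client would send over the WebSocket.
  toMessage(action: JobMessage["action"]): string {
    const msg: JobMessage = { kind: "job", source: this.source, ops: [...this.ops], action };
    return JSON.stringify(msg);
  }
}

// Server side: parses the message and replays the recorded operators in order.
function handleMessage(raw: string): number | number[] {
  const msg = JSON.parse(raw) as JobMessage;
  let data = [...msg.source];
  for (const step of msg.ops) {
    // Rebuild the function from its source text. Never do this with untrusted input.
    const fn = new Function(`return (${step.fnSource});`)() as (x: number) => unknown;
    data = step.op === "map" ? data.map((x) => fn(x) as number) : data.filter((x) => Boolean(fn(x)));
  }
  return msg.action === "count" ? data.length : data;
}
```

Shipping source text means a function cannot rely on variables from its surrounding scope, and evaluating received code is only acceptable between trusted endpoints.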

[Figure: EclairJS - Architecture]

Performance

In terms of performance, Spark's native Java API is much faster than EclairJS; however, EclairJS is twice as fast as PySpark, Spark's Python API.

[Figure: EclairJS - Performance]

Conclusion

EclairJS seems to be a great project if you need to integrate Spark jobs into a web application.

Published July 17th, 2016 | Category: Events

About the Author:

Big Data consultant @ Adaltas since 2015, I enjoy discovering things and experimenting with new technologies alongside my day-to-day work.
