EclairJS - Putting a Spark in Web Apps

By David WORMS

Jul 17, 2016

Presentation by David Fallside from IBM. The images are extracted from the presentation.

Introduction

Web application development has largely moved from Java to NodeJS and JavaScript, which offer a simple and rich environment built around NPM.

EclairJS is a NodeJS library that provides bindings to a Spark application:

  • An RDD is bound to a JS object that is made immutable
  • Spark operators are transparently mapped to JS functions (e.g. flatMap, filter, …)
  • Every Spark operator mapped returns a promise

Using promises makes it possible to emulate the way Spark uses its DAG:

  • Transformations return a new object and are added to the DAG
  • Actions execute the whole DAG to get a result
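
The semantics listed above can be sketched with a small self-contained model. It is not the EclairJS API: the `LazyDataset` class and its methods are hypothetical names chosen for illustration, and a plain array stands in for the distributed data. Transformations only record a step and return a new frozen object, while actions return a promise and run every recorded step.

```javascript
// Toy model of the semantics described above. This is NOT the EclairJS API:
// LazyDataset and its methods are hypothetical names used for illustration.
class LazyDataset {
  constructor(source, steps = []) {
    this.source = source; // input array standing in for distributed data
    this.steps = steps;   // recorded transformations (a linear "DAG")
    Object.freeze(this);  // the handle itself is immutable
  }

  // Transformations: record the step and return a new handle; nothing runs.
  flatMap(fn) {
    return new LazyDataset(this.source, [...this.steps, (rows) => rows.flatMap(fn)]);
  }

  filter(fn) {
    return new LazyDataset(this.source, [...this.steps, (rows) => rows.filter(fn)]);
  }

  // Actions: execute every recorded step and resolve with the result.
  collect() {
    return Promise.resolve().then(() =>
      this.steps.reduce((rows, step) => step(rows), this.source)
    );
  }

  count() {
    return this.collect().then((rows) => rows.length);
  }
}

const lines = new LazyDataset(["to be or", "not to be"]);
const words = lines.flatMap((line) => line.split(" ")); // recorded, not executed
const onlyTo = words.filter((word) => word === "to");   // recorded, not executed
const countPromise = onlyTo.count();                    // action: runs the pipeline
countPromise.then((n) => console.log(n)); // prints 2
```

In this sketch, `count()` is the only point where work happens, which mirrors how an action triggers the execution of the DAG.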

EclairJS - Code semantics

Architecture

EclairJS has two main components:

  • Client: JS API, installed with NPM
  • Server: JavaScript code that provides the Java mapping and runs in the JVM using Oracle Nashorn; it must be running for the client to connect

The server also uses Jupyter Notebook to provide a WebSocket endpoint between the client and the server.
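
To make the client/server split more concrete, here is a minimal sketch of how an operator call could travel as text and be rebuilt by a JavaScript engine on the other side. Everything in it is an assumption made for illustration: the function names `encodeOperatorCall` and `applyOperatorCall`, the message fields and the `rdd-1` handle are invented and do not describe EclairJS's actual wire format. Both halves run locally here, with a plain array in place of Spark data.

```javascript
// Hypothetical sketch: the message fields below are invented for illustration
// and are not EclairJS's protocol.

// Client side: describe an operator call as a JSON string that could be sent
// over a WebSocket connection.
function encodeOperatorCall(targetId, operator, callback) {
  return JSON.stringify({
    target: targetId,              // assumed server-side handle of the dataset
    operator: operator,            // e.g. "filter" or "flatMap"
    callback: callback.toString()  // function source only; closures are not captured
  });
}

// Receiving side: rebuild the function from its source text and apply it.
// A plain array stands in for the data held by the server.
function applyOperatorCall(message, rows) {
  const call = JSON.parse(message);
  const fn = new Function("return (" + call.callback + ");")();
  return rows[call.operator](fn);
}

const wireMessage = encodeOperatorCall("rdd-1", "filter", function (n) {
  return n % 2 === 0;
});
const evens = applyOperatorCall(wireMessage, [1, 2, 3, 4]);
console.log(evens); // [ 2, 4 ]
```

Since only the function source is serialized in this sketch, a callback cannot rely on variables from its enclosing scope; any such value would have to be sent explicitly.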

EclairJS - Architecture

Performance

In terms of performance, Spark’s native Java API is much faster. However, according to the presentation, EclairJS is twice as fast as Spark’s PySpark API.

EclairJS - Performance

Conclusion

EclairJS seems to be a great project if you need to integrate Spark jobs into a web application.
