Blog, last published articles

Publish Spark SQL DataFrame and RDD with Spark Thrift Server

The distributed and in-memory nature of the Spark engine makes it an excellent candidate to expose data to clients which expect low latencies. Dashboards, notebooks, BI studios and KPI-based reporting tools are examples of such clients, and they commonly speak the JDBC/ODBC protocols. Spark Thrift Server may be used in various fashions. It can run independently as Spark standalone [...]

March 25th, 2019 | Categories: Big Data, Data Engineering | 0 Comments

Multihoming on Hadoop

Multihoming, which means attaching multiple networks to one node, is one of the main components for managing the heterogeneous network usage of an Apache Hadoop cluster. This article introduces the concept and its applications for real-world businesses. […]

March 5th, 2019 | Categories: Adalas Summit 2018, Big Data, Data Engineering | 0 Comments

Introduction to Cloudera Data Science Workbench

Cloudera Data Science Workbench is a platform that allows data scientists to create, manage, run and schedule data science workflows from their browser. It thus lets them focus on their main task, deriving insights from data, without thinking about the complexity that lies in the background. CDSW was released after Cloudera’s acquisition of [...]

Apache Knox made easy!

Apache Knox is the secure entry point of a Hadoop cluster, but can it also be the entry point for my REST applications? […]

Installing Kubernetes on CentOS 7

This article explains how to install a Kubernetes cluster. I will dive into what each step does so you can build a thorough understanding of what is going on. […]

January 29th, 2019 | Categories: Adalas Summit 2018, Container, DevOps, Uncategorized | 0 Comments

Self-sovereign identities with verifiable claims

Towards a trusted, personal, persistent, and portable digital identity for all. […]

Applying Deep Reinforcement Learning to Poker

We will cover the subject of Deep Reinforcement Learning, more specifically the Deep Q Learning algorithm introduced by DeepMind, and then we'll apply a version of this algorithm to the game of Poker. Reinforcement learning: Machine Learning and Deep Learning have become hot topics in recent years. With the recent improvements in parallel [...]

January 9th, 2019 | Categories: Data Science, Deep Learning | 1 Comment

LXD: The Missing Piece

LXD stands for Linux Container Daemon. It is yet another container technology, but LXD is very different and stands apart from the pack. It is not necessarily better, much faster or more secure, but it resolves an issue that other containers don't. Many of us moved too fast from traditional Virtual Machines to Application containers because [...]

December 28th, 2018 | Categories: Container, DevOps | 2 Comments

Monitoring a production Hadoop cluster with Kubernetes

Monitoring a production-grade Hadoop cluster is a real challenge, and the monitoring setup needs to evolve constantly. The software we use today is based on Nagios. It is very efficient for the simplest checks, but it cannot meet the need for more complex verification. In this article, we will propose an architecture [...]

CodaLab – Data Science competitions

CodaLab Competition is a platform for code execution in the field of Data Science. It is a web interface on which a user can submit code or results and compare themselves to others. Let’s see how it works and how to install CodaLab On-Premise. […]

December 17th, 2018 | Categories: Big Data, Data Science | 0 Comments