Data Science

TensorFlow installation on Docker

TensorFlow is open source software from Google for numerical computation using a graph representation: vertices (nodes) represent mathematical operations, and edges represent N-dimensional data arrays (tensors). TensorFlow runs on CPU or GPU (using CUDA®). The architecture is flexible and highly scalable. It can be deployed on smartphones, desktops, servers, or even server clusters. Installation CPU Only [...]

August 5th, 2019 | Categories: Container, Data Science, Learning | 0 Comments

Spark Streaming part 4: clustering with Spark MLlib

Spark MLlib is an Apache Spark library offering scalable implementations of various supervised and unsupervised Machine Learning algorithms. The Spark framework can thus serve as a platform for developing Machine Learning systems. An ML model developed with Spark MLlib can be combined with a low-latency streaming pipeline created with Spark Structured Streaming. The K-means clustering algorithm [...]

July 11th, 2019 | Categories: Big Data, Data Engineering, ML | 1 Comment

Introduction to Cloudera Data Science Workbench

Cloudera Data Science Workbench (CDSW) is a platform that allows Data Scientists to create, manage, run and schedule data science workflows from their browser. It lets them focus on their main task, deriving insights from data, without thinking about the underlying complexity. CDSW was released after Cloudera’s acquisition of [...]

Applying Deep Reinforcement Learning to Poker

We will cover the subject of Deep Reinforcement Learning, more specifically the Deep Q Learning algorithm introduced by DeepMind, and then we'll apply a version of this algorithm to the game of Poker. Reinforcement learning Machine Learning and Deep Learning have become hot topics in recent years. With the recent improvements in parallel [...]

January 9th, 2019 | Categories: Data Science, Deep Learning | 1 Comment

CodaLab – Data Science competitions

CodaLab Competition is a platform for code execution in the field of Data Science. It is a web interface on which a user can submit code or results and compare themselves to others. Let’s see how it works and how to install CodaLab On-Premise. […]

December 17th, 2018 | Categories: Big Data, Data Science | 0 Comments

Main advantages of GraphQL as an alternative to REST

GraphQL is based on a simple idea: moving the assembly of a request from the server to the client. The client sees a single strongly typed schema instead of multiple REST endpoints and builds the query it wants. My first REST-based web application, or SPA (Single Page Application) as we call them lately, [...]

November 27th, 2018 | Categories: Big Data, Data Science | 0 Comments

Lando: Deep Learning used to summarize conversations

Lando is an application that summarizes conversations, using Speech To Text to transcribe the recording of a meeting into text and Deep Learning techniques to summarize the content. It allows users to quickly understand the context of the conversation. During the course of our internship at Adaltas, we worked on a new project called Lando to [...]

Deep learning on YARN: running TensorFlow and friends on a Hadoop cluster

With the arrival of Hadoop 3, YARN offers more flexibility in resource management. It is now possible to perform Deep Learning analysis on GPUs with specific development environments, leveraging available resources. This article is based on the presentation of Wangda Tan, Apache Hadoop PMC member, at the DataWorks Summit 2018. It mostly focuses on [...]

July 24th, 2018 | Categories: Data Science, DataWorks Summit 2018 | 0 Comments

YARN and GPU Distribution for Machine Learning

This article goes over the fundamental principles of Machine Learning and the tools currently used to run Machine Learning algorithms. We will then see how a resource manager such as YARN can be useful in this context and how it can help the algorithms run smoothly. This article stems from a conference at [...]

May 30th, 2018 | Categories: Data Science, DataWorks Summit 2018 | 2 Comments