Data Science

YARN and GPU Distribution for Machine Learning

This article covers the fundamental principles of machine learning and the tools currently used to run machine learning algorithms. We will then see how a resource manager such as YARN can be useful in this context and how it can help these algorithms run smoothly. This article stems from a conference at [...]

2018-06-07T10:25:09+00:00 | May 30th, 2018 | Categories: Data Science, DataWorks Summit 2018 | 2 Comments

TensorFlow on Spark 2.3: The Best of Both Worlds

The integration of TensorFlow with Spark has a lot of potential and creates new opportunities. […]

Hadoop and R with RHadoop

RHadoop is a bridge between R, a language and environment for statistically exploring data sets, and Hadoop, a framework that allows for the distributed processing of large data sets across clusters of computers. RHadoop consists of three components, each an R package: rmr, rhdfs and rhbase. Below, we will present each of those [...]

2018-06-05T22:37:20+00:00 | July 19th, 2012 | Categories: Data Science | 0 Comments