About Pierre Sauvage

Passionate about computer science since childhood and programming as a hobby since adolescence, Pierre attended an engineering school, specializing in Information Systems with a Big Data option. He began his career in an IoT research laboratory, where he studied distributed systems both theoretically and practically. Pierre then joined Adaltas. Today he is a Big Data & Hadoop Solution Architect and Data Engineer with over 4 years of hands-on experience in Hadoop and 5 years of experience with distributed systems. He designs, develops and maintains data processing workflows and real-time services, and brings clients a unified and consistent vision of data management and workflows across their different data sources and business requirements. He steps in at all levels of data platforms, from planning, design and architecture to cluster deployment, administration and maintenance, as well as prototyping and application development in collaboration with business users, analysts, data scientists, engineering and operational teams. He is also an experienced educator: he regularly gives courses and training on Big Data for various engineering and master's schools, facilitating knowledge transfer and the training of teams.

TensorFlow installation on Docker

TensorFlow is open source software from Google for numerical computation using a graph representation: vertices (nodes) represent mathematical operations, and edges represent N-dimensional data arrays (tensors). TensorFlow runs on CPU or GPU (using CUDA®). The architecture is flexible and highly scalable. It can be deployed on smartphones, desktops/servers, or even server clusters. Installation CPU Only [...]

August 5th, 2019 | Categories: Container, Data Science, Learning
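
The graph representation mentioned in the TensorFlow teaser above can be illustrated with a small sketch. This is a hypothetical toy model in plain Python, not the TensorFlow API: the names `Node`, `constant`, `add` and `multiply` are invented for illustration, and tensors are simplified to one-dimensional lists of floats.

```python
# Hypothetical toy dataflow graph; this is NOT the TensorFlow API.
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Tensor = List[float]  # stand-in for an N-dimensional array (1-D here)


@dataclass
class Node:
    """A vertex: one mathematical operation applied to its input tensors."""
    name: str
    op: Callable[..., Tensor]
    inputs: Sequence["Node"] = field(default_factory=tuple)

    def evaluate(self) -> Tensor:
        # Each input edge carries the tensor produced by an upstream node.
        return self.op(*(node.evaluate() for node in self.inputs))


def constant(name: str, values: Sequence[float]) -> Node:
    # A source node with no inputs; it simply emits its values.
    return Node(name, lambda: list(values))


def add(a: Node, b: Node) -> Node:
    # Element-wise addition of the two incoming tensors.
    return Node("add", lambda x, y: [p + q for p, q in zip(x, y)], (a, b))


def multiply(a: Node, b: Node) -> Node:
    # Element-wise multiplication of the two incoming tensors.
    return Node("mul", lambda x, y: [p * q for p, q in zip(x, y)], (a, b))


x = constant("x", [1.0, 2.0, 3.0])
y = constant("y", [4.0, 5.0, 6.0])

# Building the graph computes nothing; evaluate() walks it on demand.
result = multiply(add(x, y), y).evaluate()
print(result)  # [20.0, 35.0, 54.0]
```

Separating graph construction from evaluation is the point of the sketch: the same graph description could, in principle, be executed on different hardware.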

Druid and Hive integration

This article covers the integration between Hive Interactive (LLAP) and Druid. One can see it as a complement to the Ultra-fast OLAP Analytics with Apache Hive and Druid article. Tools description: Hive and Hive LLAP. Hive is an environment allowing SQL queries on data stored in HDFS. The following executors can be configured in Hive: Map [...]

June 17th, 2019 | Categories: Blog, Data Engineering

Kubernetes Storage Primitives for Stateful Workloads

This article is based on the presentation “Introduction to Kubernetes Storage Primitives for Stateful Workloads” from the OSS Convention Prague 2017 by the {Code} team. […]

Advanced multi-tenant Hadoop and Zookeeper protection

Zookeeper is a critical component of Hadoop's high-availability operation. It protects itself by limiting the maximum number of connections (maxConns = 400). However, Zookeeper does not protect itself intelligently: it refuses connections once the threshold is reached. In such a case, the core components (HBase RegionServers / HDFS ZKFC) will no longer be able [...]

July 5th, 2017 | Categories: Big Data, Infrastructure

Apache Apex with Apache SAMOA

Traditional Machine Learning
- Batch oriented
- Supervised (most common)
- Training and scoring
- One-time model building
- Data set
  - Training: model building
  - Holdout: parameter tuning
  - Test: accuracy

Online Machine Learning
- Streaming
- Change
- Dynamically adapt to new patterns in data
- Change over time (concept drift) [...]

July 17th, 2016 | Categories: Data Science, Events

Network Namespace without Docker

Let's imagine the following use case: I am connected to several networks (wlan0, eth0, usb0). I want to choose which network I'm going to use when I launch apps. My app doesn't let me choose a specific interface; the choice is delegated to the OS, which picks the default one. I could of course use Docker, which [...]

July 6th, 2016 | Categories: Blog, Hack