Learning
At Adaltas, knowledge sharing takes the form of skills transfer to our clients, tailor-made training, frequent articles, Open Source contributions, and teaching in several schools and universities.
Related articles

TDP workshop: Become a TDP power user from your terminal
Categories: Events, Learning | Tags: Ansible, DevOps, Hadoop, Open source, TDP
The TDP CLI is used to deploy and operate your TDP services. It relies on tdp-lib to provide control and flexibility at your fingertips. Some time ago, we announced the public release of TDP - Trunk…
By Paul FARAULT
Jun 17, 2022

NixOS: Enabling LXD virtual machines using Flakes
Categories: Hack, Learning | Tags: GitHub, Learning and tutorial, Linux, LXD, Packaging, VM, NixOS, Open source
Nixpkgs is an ever-increasing collection of software packages for Nix and NixOS. Even with more than 80,000 packages, you can easily run into a situation where there is a functionality that is not yet…
May 13, 2022

Reliable and reproducible Linux installation with NixOS
Categories: Infrastructure, Learning | Tags: Arch Linux, CentOS, Linux, OS X, Packaging, Ubuntu, VM, NixOS, TDP
When using an operating system, upgrading packages or installing new ones are common tasks that introduce the risk of affecting the stability of the system. NixOS is a Linux distribution that ensures…
Feb 8, 2022

Nix introduction, main concepts and commands
Categories: Infrastructure, Learning | Tags: Arch Linux, CentOS, Linux, OS X, Packaging, Ubuntu, NixOS, TDP
Nix is a functional package manager for Linux and other Unix systems, making the management of packages more reliable and easy to reproduce. With a traditional package manager, when updating a package…
Feb 1, 2022

Blockchain 101: Blockchains and Consensus Mechanisms
Categories: Adaltas Summit 2021, Infrastructure, Learning | Tags: Cryptography, Infrastructure, Blockchain, Consensus
Cryptocurrencies are booming in 2021, with a market cap moving from 750 to more than 3,000 billion dollars. Let’s face it, this is mainly due to speculation. A lot of people involved do not have a…
Jan 18, 2022

Spring 2022 internship - building a Data Lab
Categories: Data Science, Learning | Tags: MongoDB, Spark, Argo CD, Elasticsearch, Internship, Kubernetes, OpenID Connect, PostgreSQL
Job Description Over the last few years, we developed the ability to use computers to process large amounts of data. The ecosystem evolved over a large offering of tools and libraries and the creation…
By David WORMS
Nov 24, 2021

H2O in practice: a protocol combining AutoML with traditional modeling approaches
Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python, XGBoost
H2O comes with a lot of functionalities. The second part of the series H2O in practice proposes a protocol to combine AutoML modeling with traditional modeling and optimization approaches. The objective…
Nov 12, 2021

Internship in Big Data infrastructure with TDP
Categories: Infrastructure, Learning | Tags: Cyber Security, DevOps, Java, Hadoop, IaC, Internship, TDP
Job Description Big Data and distributed computing are at Adaltas’ core. We support our partners in the deployment, maintenance and optimization of some of France’s largest clusters. Adaltas is also an…
By Daniel HARTY
Oct 25, 2021

Internship in Data Engineering
Categories: Front End, Learning | Tags: Hive, Metrics, Monitoring, Kafka, Delta Lake, Elasticsearch, IaC, Internship, Kubernetes, Streaming
Job Description Data is a valuable business asset. Some call it the new oil. The data engineer collects, transforms and refines raw data into information that can be used by business analysts and data…
By David WORMS
Oct 25, 2021

Internship in Web Technologies
Categories: Front End, Learning | Tags: DevOps, LDAP, React.js, CI/CD, Docker, GraphQL, IaC, Internship, Kubernetes, Node.js, OAuth2
Job Description As part of its Big Data activities, Adaltas Academy is an information-sharing platform bringing together articles, training content, and a knowledge base. The users of the platform are…
By David WORMS
Oct 14, 2021

H2O in practice: a Data Scientist feedback
Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python
Automated machine learning (AutoML) platforms are gaining popularity and becoming a new important tool in the data scientists’ toolbox. A few months ago, I introduced H2O, an open-source platform for…
Sep 29, 2021

Adaltas Summit 2021, 2nd edition in Corsica
Categories: Adaltas Summit 2021, Learning | Tags: Ansible, Hadoop, Spark, Azure, Blockchain, Deep Learning, Docker, Terraform, Kubernetes, Node.js
For its second edition, the whole Adaltas crew is gathering in Corsica for a whole week, with 2 days dedicated to technology on the 23rd and the 24th of September 2021. After a year and a half of sanitary…
By David WORMS
Sep 21, 2021

Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI
Categories: Data Engineering, Learning | Tags: Cloud, Data Lake, Databricks, Delta Lake, MLflow
Self-paced training courses are offered by Databricks as part of their Academy program. The price is $2,000 USD for unlimited access to the training courses for a period of 1 year, but also free for customers…
May 26, 2021

TensorFlow Extended (TFX): the components and their functionalities
Categories: Big Data, Data Engineering, Data Science, Learning | Tags: Beam, Data Engineering, Pipeline, CI/CD, Data Science, Deep Learning, Deployment, Machine Learning, MLOps, Open source, Python, TensorFlow
Putting Machine Learning (ML) and Deep Learning (DL) models in production certainly is a difficult task. It has been recognized as more failure-prone and time-consuming than the modeling itself, yet…
Mar 5, 2021

Faster model development with H2O AutoML and Flow
Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python
Building Machine Learning (ML) models is a time-consuming process. It requires expertise in statistics, ML algorithms, and programming. On top of that, it also requires the ability to translate a…
Dec 10, 2020

Experiment tracking with MLflow on Databricks Community Edition
Categories: Data Engineering, Data Science, Learning | Tags: Spark, Databricks, Deep Learning, Delta Lake, Machine Learning, MLflow, Notebook, Python, Scikit-learn
Introduction to Databricks Community Edition and MLflow Every day the number of tools helping Data Scientists to build models faster increases. Consequently, the need to manage the results and the…
Sep 10, 2020

Importing data to Databricks: external tables and Delta Lake
Categories: Data Engineering, Data Science, Learning | Tags: Parquet, AWS, Amazon S3, Azure Data Lake Storage (ADLS), Databricks, Delta Lake, Python
During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a later…
May 21, 2020

Optimization of Spark applications in Hadoop YARN
Categories: Data Engineering, Learning | Tags: Tuning, Hadoop, Spark, Python
Apache Spark is an in-memory data processing tool widely used in companies to deal with Big Data issues. Running a Spark application in production requires user-defined resources. This article…
Mar 30, 2020

MLflow tutorial: an open source Machine Learning (ML) platform
Categories: Data Engineering, Data Science, Learning | Tags: AWS, Azure, Databricks, Deep Learning, Deployment, Machine Learning, MLflow, MLOps, Python, Scikit-learn
Introduction and principles of MLflow With increasingly cheaper computing power and storage and at the same time increasing data collection in all walks of life, many companies integrated Data Science…
Mar 23, 2020

TensorFlow installation on Docker
Categories: Containers Orchestration, Data Science, Learning | Tags: CPU, Jupyter, Linux, AI, Deep Learning, Docker, TensorFlow
TensorFlow is Open Source software from Google for numerical computation using a graph representation: vertices (nodes) represent mathematical operations; edges represent N-dimensional data array…
Aug 5, 2019

Spark Streaming part 4: clustering with Spark MLlib
Categories: Data Engineering, Data Science, Learning | Tags: Apache Spark Streaming, Spark, Big Data, Clustering, Machine Learning, Scala, Streaming
Spark MLlib is an Apache Spark library offering scalable implementations of various supervised and unsupervised Machine Learning algorithms. Thus, the Spark framework can serve as a platform for…
Jun 27, 2019

Spark Streaming part 2: run Spark Structured Streaming pipelines in Hadoop
Categories: Data Engineering, Learning | Tags: Apache Spark Streaming, Spark, Python, Streaming
Spark can process streaming data on a multi-node Hadoop cluster relying on HDFS for the storage and YARN for the scheduling of jobs. Thus, Spark Structured Streaming integrates well with Big Data…
May 28, 2019

Spark Streaming part 1: build data pipelines with Spark Structured Streaming
Categories: Data Engineering, Learning | Tags: Apache Spark Streaming, Kafka, Spark, Big Data, Streaming
Spark Structured Streaming is a new engine introduced with Apache Spark 2 used for processing streaming data. It is built on top of the existing Spark SQL engine and the Spark DataFrame. The…
Apr 18, 2019

First Class Functions in Python
Categories: Hack, Learning | Tags: Programming, Python
I recently watched a talk by Dave Cheney about first class functions in Go. Python supports first class functions too, so can we use them in the same ways? Absolutely. I have been using Python for a…
Apr 15, 2019

CodaLab – Data Science competitions
Categories: Data Science, Adaltas Summit 2018, Learning | Tags: Database, Infrastructure, MySQL, Machine Learning, Node.js, Python
CodaLab Competition is a platform for code execution in the field of Data Science. It is a web interface on which a user can submit code or results and compare themselves to others. Let’s see how it…
Dec 17, 2018

One week to discuss technology in a Moroccan riad
Categories: Adaltas Summit 2018, Learning | Tags: Flink, CDSW, Gatsby, React.js, Hadoop, Knox, Data Science, Deep Learning, Kubernetes, Node.js
Adaltas is organizing its first conference this year, from the 22nd to the 26th of October. On the agenda of these 5 days of conference: discuss technology in one of the most beautiful riads of Marrakech. Mix the…
By David WORMS
Oct 11, 2018

Lando: Deep Learning used to summarize conversations
Categories: Data Science, Learning | Tags: Micro Services, Open API, Deep Learning, Internship, Kubernetes, Neural Network, Node.js
Lando is an application to summarize conversations using Speech To Text to translate the written record of a meeting into text and Deep Learning techniques to summarize contents. It allows users to…
By Yliess HATI
Sep 18, 2018

Notes after Katacoda Training on Kubernetes Container Orchestration
Categories: Containers Orchestration, Learning | Tags: Helm, Ingress, Kubeadm, CNI, Micro Services, Minikube, Kubernetes
A few weeks ago, I dedicated two days to following the tutorials available on Katacoda, the interactive learning platform for Kubernetes or any other container orchestration platform. I’m sharing my…
By David WORMS
Dec 14, 2017

Scaling massive, real-time data pipelines with Go
Categories: Open Source Summit Europe 2017, Learning | Tags: Algorithm, Data structures, Go Lang, Pipeline, Protocols, Network
Last week at the Open Source Summit in Prague, Jean de Klerk held a talk called Scaling massive, real-time data pipelines with Go. This article goes over the main points of the talk, detailing the…
Nov 21, 2017

Apache Hive Essentials How-to by Darren Lee
Categories: Business Intelligence, Learning | Tags: Hive, UDF, Hadoop, File Format, SQL
Recently, I’ve been asked to review a new book on Apache Hive called “Apache Hive Essentials How-to” (edit: the second edition is now available) written by Darren Lee and published by Packt Publishing…
By David WORMS
Apr 23, 2013

Hadoop and HBase installation on OSX in pseudo-distributed mode
Categories: Big Data, Learning | Tags: Hue, Infrastructure, Hadoop, HBase, Big Data, Deployment
The operating system chosen is OSX, but the procedure is not so different for any Unix environment because most of the software is downloaded from the Internet, uncompressed and set up manually. Only a…
By David WORMS
Dec 1, 2010