Learning
The sharing of knowledge at Adaltas is reflected in the transfer of skills to our clients, the implementation of tailor-made training, our frequent publications of articles, our Open Source contributions as well as teaching in several schools and universities.
Related articles
Faster model development with H2O AutoML and Flow
Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python
Building Machine Learning (ML) models is a time-consuming process. It requires expertise in statistics, ML algorithms, and programming. On top of that, it also requires the ability to translate a…
Dec 10, 2020
Experiment tracking with MLflow on Databricks Community Edition
Categories: Data Engineering, Data Science, Learning | Tags: Spark, Deep Learning, Databricks, Delta Lake, Machine Learning, MLflow, Notebook, Python, Scikit-learn
Introduction to Databricks Community Edition and MLflow Every day the number of tools helping Data Scientists to build models faster increases. Consequently, the need to manage the results and the…
Sep 10, 2020
Importing data to Databricks: external tables and Delta Lake
Categories: Data Engineering, Data Science, Learning | Tags: Parquet, AWS, Amazon S3, Azure Data Lake Storage (ADLS), Databricks, Delta Lake, Python
During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a later…
May 21, 2020
Optimisation of Spark applications in Hadoop YARN
Categories: Data Engineering, Learning | Tags: Spark, Tuning, Hadoop, Python
Apache Spark is an in-memory data processing tool widely used in companies to deal with Big Data issues. Running a Spark application in production requires user-defined resources. This article…
Mar 30, 2020
MLflow tutorial: an open source Machine Learning (ML) platform
Categories: Data Engineering, Data Science, Learning | Tags: Deep Learning, AWS, Databricks, Deployment, Machine Learning, Azure, MLflow, MLOps, Python, Scikit-learn
Introduction and principles of MLflow With increasingly cheaper computing power and storage and at the same time increasing data collection in all walks of life, many companies integrated Data Science…
Mar 23, 2020
TensorFlow installation on Docker
Categories: Containers Orchestration, Data Science, Learning | Tags: CPU, Deep Learning, Docker, Jupyter, Linux, AI, TensorFlow
TensorFlow is an Open Source software from Google for numerical computation using a graph representation: Vertex (nodes) represent mathematical operations Edges represent N-dimensional data array…
Aug 5, 2019
Spark Streaming part 4: clustering with Spark MLlib
Categories: Data Engineering, Data Science, Learning | Tags: Spark, Apache Spark Streaming, Big Data, Streaming, Clustering, Machine Learning, Scala
Spark MLlib is an Apache’s Spark library offering scalable implementations of various supervised and unsupervised Machine Learning algorithms. Thus, Spark framework can serve as a platform for…
Jul 11, 2019
Spark Streaming part 2: run Spark Structured Streaming pipelines in Hadoop
Categories: Data Engineering, Learning | Tags: Spark, Apache Spark Streaming, Streaming, Python
Spark can process streaming data on a multi-node Hadoop cluster relying on HDFS for the storage and YARN for the scheduling of jobs. Thus, Spark Structured Streaming integrates well with Big Data…
May 28, 2019
Spark Streaming part 1: build data pipelines with Spark Structured Streaming
Categories: Data Engineering, Learning | Tags: Kafka, Spark, Apache Spark Streaming, Big Data, Streaming
Spark Structured Streaming is a new engine introduced with Apache Spark 2 used for processing streaming data. It is built on top of the existing Spark SQL engine and the Spark DataFrame. The…
Apr 18, 2019
First Class Functions in Python
Categories: Hack, Learning | Tags: Programming, Python
I recently watched a talk by Dave Cheney about first class functions in Go. Python supports first class functions too, so can we use them in the same ways? Absolutely. I have been using Python for a…
Apr 15, 2019
CodaLab – Data Science competitions
Categories: Data Science, Adaltas Summit 2018, Learning | Tags: Database, Infrastructure, MySQL, Machine Learning, Node.js, Python
CodaLab Competition is a platform for code execution in the field of Data Science. It is a web interface on which a user can submit code or results and compare themselves to others. Let’s see how it…
Dec 17, 2018
One week to discuss technology in a Moroccan riad
Categories: Adaltas Summit 2018, Learning | Tags: Flink, CDSW, Deep Learning, Gatsby, React.js, Hadoop, Knox, Kubernetes, Node.js
Adaltas organise the year its first conference between the 22 and 26 of October. On the agenda of these 5 days of conference: discuss technology in one of the most beautiful riad of Marrakech. Mix the…
By David WORMS
Oct 11, 2018
Lando: Deep Learning used to summarize conversations
Categories: Data Science, Learning | Tags: Deep Learning, Micro Services, Open API, Kubernetes, Neural Network, Node.js
Lando is an application to summarize conversations using Speech To Text to translate the written record of a meeting into text and Deep Learning technics to summarize contents. It allows users to…
By Yliess HATI
Sep 18, 2018
Notes after Katacoda Training on Kubernetes Container Orchestration
Categories: Containers Orchestration, Learning | Tags: Helm, Ingress, Kubeadm, CNI, Micro Services, Minikube, Kubernetes
A few weeks ago, I dedicated two days to follow the turorials available on Katacoda, the interactive learning platform for Kubernetes or any other container orchestration platform. I’m sharing my…
By David WORMS
Dec 14, 2017
Scaling massive, real-time data pipelines with Go
Categories: Open Source Summit Europe 2017, Learning | Tags: Algorithm, Data structures, Go, Network, Pipeline, Protocols
Last week at the Open Source Summit in Prague, Jean de Klerk held a talk called Scaling massive, real-time data pipelines with Go. This article goes over the main points of the talk, detailing the…
Nov 21, 2017
Apache Hive Essentials How-to by Darren Lee
Categories: Business Intelligence, Learning | Tags: Hive, UDF, Hadoop, File Format, SQL
Recently, I’ve been ask to review a new book on Apache Hive called “Apache Hive Essentials How-to” written by Darren Lee and published by Packt Publishing. To say it short, I sincerely recommend it. I…
By David WORMS
Apr 23, 2013
Hadoop and HBase installation on OSX in pseudo-distributed mode
Categories: Big Data, Learning | Tags: Big Data, Hue, Infrastructure, Hadoop, HBase, Deployment
The operating system chosen is OSX but the procedure is not so different for any Unix environment because most of the software is downloaded from the Internet, uncompressed and set manually. Only a…
By David WORMS
Dec 1, 2010