
Adaltas: talented Open Source consultants collaborating with your teams.


Adaltas is a team of consultants with a focus on Open Source, Big Data and distributed systems based in France, Canada and Morocco.

  • Architecture, audit and digital transformation
  • Cloud and on-premise operation
  • Complex application and ingestion pipelines
  • Efficient and reliable solutions delivery

Latest articles

JS monorepos in prod 2: project versioning and publishing

Categories: DevOps & SRE, Front End | Tags: CI/CD, Git, JavaScript, Unit tests, Monorepo, Node.js, Release and features

One great advantage of a monorepo is to maintain coherent versions between packages and to automate version creation and package publication. This article covers the versioning and…

By David WORMS

Jan 11, 2021
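The article above deals with keeping coherent versions across the packages of a monorepo. As a minimal sketch of the underlying layout (assuming a Yarn/npm workspaces setup, which is one common choice and not necessarily the exact tooling the article series uses; all names are made up), a root package.json declares every package so that a release tool can version and publish them together:

```json
{
  "name": "my-monorepo",
  "private": true,
  "workspaces": [
    "packages/*"
  ]
}
```

A tool such as Lerna can then compute a coherent version across all declared packages and publish them in one pass.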

JS monorepos in prod 1: project initialization

Categories: DevOps & SRE, Front End | Tags: Git, JavaScript, Monorepo, Node.js, Release and features

Every project journey begins with initialization. When your overall project is composed of multiple projects, it is tempting to create one Git repository per project. In Node.js, a project…

By David WORMS

Jan 5, 2021

Build your open source Big Data distribution with Hadoop, HBase, Spark, Hive & Zeppelin

Categories: Big Data, Infrastructure | Tags: Hive, Maven, Spark, Git, Unit tests, Hadoop, HBase, Release and features

The Hadoop ecosystem gave birth to many popular projects including HBase, Spark and Hive. While technologies like Kubernetes and S3 compatible object storages are growing in popularity, HDFS and YARN…

By Leo SCHOUKROUN

Dec 18, 2020

Faster model development with H2O AutoML and Flow

Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python

Building Machine Learning (ML) models is a time-consuming process. It requires expertise in statistics, ML algorithms, and programming. On top of that, it also requires the ability to translate a…

OAuth2 and OpenID Connect for microservices and public applications (Part 2)

Categories: Containers Orchestration, Cyber Security | Tags: CNCF, JSON, LDAP, Micro Services, OAuth2, OpenID Connect

When using OAuth2 and OpenID Connect, it is important to understand how the authorization flow takes place, who shall call the Authorization Server, and how to store the tokens. Moreover, microservices and…

By David WORMS

Nov 20, 2020

OAuth2 and OpenID Connect, a gentle and working introduction (Part 1)

Categories: Containers Orchestration, Cyber Security | Tags: CNCF, Go, JAMstack, LDAP, Kubernetes, OpenID Connect

Understanding OAuth2, OpenID and OpenID Connect (OIDC), how they relate, how communications are established, and how to architect your application with the given access, refresh and ID tokens…

By David WORMS

Nov 17, 2020

Connecting to ADLS Gen2 from Hadoop (HDP) and Nifi (HDF)

Categories: Big Data, Cloud Computing, Data Engineering | Tags: HDFS, NiFi, Authentication, Authorization, Hadoop, Azure Data Lake Storage (ADLS), Azure, OAuth2

As data projects built in the Cloud become more and more frequent, a common use case is to interact with Cloud storage from an existing on-premise Big Data platform. Microsoft Azure recently…

By Gauthier LEONARD

Nov 5, 2020

Rebuilding HDP Hive: patch, test and build

Categories: Big Data, Infrastructure | Tags: Hive, Maven, Git, GitHub, Java, Unit tests, Release and features

The Hortonworks HDP distribution will soon be deprecated in favor of Cloudera’s CDP. One of our clients wanted a new Apache Hive feature backported into HDP 2.6.0. We thought it was a good opportunity…

By Leo SCHOUKROUN

Oct 6, 2020

Data versioning and reproducible ML with DVC and MLflow

Categories: Data Science, DevOps & SRE, Events | Tags: Data Engineering, Git, Databricks, Delta Lake, Machine Learning, MLflow, Storage

Our talk on data versioning and reproducible Machine Learning, proposed to the Data + AI Summit (formerly known as Spark+AI), has been accepted. The summit will take place online on November 17-19…

Experiment tracking with MLflow on Databricks Community Edition

Categories: Data Engineering, Data Science, Learning | Tags: Spark, Deep Learning, Databricks, Delta Lake, Machine Learning, MLflow, Notebook, Python, Scikit-learn

Every day, the number of tools helping Data Scientists to build models faster increases. Consequently, the need to manage the results and the…

Version your datasets with Data Version Control (DVC) and Git

Categories: Data Science, DevOps & SRE | Tags: DevOps, Git, Infrastructure, Operation, SCM

Using a Version Control System such as Git for source code is a good practice and an industry standard. Considering that projects focus more and more on data, shouldn’t we have a similar approach such…

By Grégor JOUET

Sep 3, 2020

Plugin architecture in JavaScript and Node.js with Plug and Play

Categories: Front End, Node.js | Tags: Asynchronous, DevOps, JavaScript, Programming, Agile, Open source, Release and features

Plug and Play helps library and application authors to introduce a plugin architecture into their code. It simplifies complex code execution with well-defined interception points, also called hooks…

By David WORMS

Aug 28, 2020
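The hook mechanism described above can be sketched in a few lines of plain JavaScript. This is not the Plug and Play API itself, only an illustration of the interception-point idea; all names below are made up:

```javascript
// Toy illustration of hooks: a registry where plugins attach
// handlers to a named interception point of the host application.
const hooks = new Map();

// A plugin registers a handler for a named hook.
function register(name, handler) {
  if (!hooks.has(name)) hooks.set(name, []);
  hooks.get(name).push(handler);
}

// The host triggers the hook: each handler may transform the value,
// and the (possibly modified) result flows back to the host.
function call(name, value) {
  for (const handler of hooks.get(name) || []) {
    value = handler(value);
  }
  return value;
}

// Example: a plugin intercepts the "config" hook to enrich the configuration.
register('config', (config) => ({ ...config, verbose: true }));

console.log(call('config', { port: 8080 }));
// { port: 8080, verbose: true }
```

The host stays in control of when each hook fires, while plugins decide what happens there; a real library adds asynchronous handlers and error handling on top of this pattern.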

Installing Hadoop from source: build, patch and run

Categories: Big Data, Infrastructure | Tags: HDFS, Maven, Docker, Java, LXD, Unit tests, Hadoop

Commercial Apache Hadoop distributions have come and gone. The two leaders, Cloudera and Hortonworks, have merged: HDP is no more and CDH is now CDP. MapR has been acquired by HPE and IBM BigInsights…

By Leo SCHOUKROUN

Aug 4, 2020

Download datasets into HDFS and Hive

Categories: Big Data, Data Engineering | Tags: Analytics, HDFS, Hive, Big Data, Data Analytics, Data Engineering, Data structures, Database, Hadoop, Data Lake, Data Warehouse

Nowadays, the analysis of large amounts of data is becoming more and more feasible thanks to Big Data technologies (Hadoop, Spark,…). This explains the explosion of the data volume and the…

By Aida NGOM

Jul 31, 2020

Comparison of different file formats in Big Data

Categories: Big Data, Data Engineering | Tags: Analytics, Avro, HDFS, Hive, Kafka, MapReduce, ORC, Spark, Batch processing, Big Data, CSV, Data Analytics, Data structures, Database, JSON, Protocol Buffers, Hadoop, Parquet, Kubernetes, XML

In data processing, there are different file formats to store your data sets. Each format has its own pros and cons depending on the use case and exists to serve one or several purposes…

By Aida NGOM

Jul 23, 2020
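As a toy illustration of the trade-off the article above compares (hypothetical data, and only the two textual formats easy to produce inline): CSV states field names once in a header, while JSON repeats them in every record, which makes it more verbose but self-describing:

```javascript
// Serialize the same records as CSV and as JSON and compare sizes.
const records = [
  { id: 1, name: 'alice', score: 42 },
  { id: 2, name: 'bob', score: 17 },
];

// CSV: one header line, then one line of values per record.
const header = Object.keys(records[0]).join(',');
const csv = [header, ...records.map((r) => Object.values(r).join(','))].join('\n');

// JSON: field names are repeated inside every record.
const json = JSON.stringify(records);

console.log(csv.length < json.length); // true: CSV does not repeat field names
```

Binary formats such as Avro, ORC and Parquet push the same idea further with schemas, compression and columnar layouts, which is what the full comparison covers.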

Automate a Spark routine workflow from GitLab to GCP

Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: Airflow, Spark, CI/CD, Learning and tutorial, GitLab, GCP, Terraform

A workflow consists of automating a succession of tasks to be carried out without human intervention. It is an important and widespread concept which particularly applies to operational environments…

By Ferdinand DE BAECQUE

Jun 16, 2020

Importing data to Databricks: external tables and Delta Lake

Categories: Data Engineering, Data Science, Learning | Tags: Parquet, AWS, Amazon S3, Azure Data Lake Storage (ADLS), Databricks, Delta Lake, Python

During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a later…

Introducing Apache Airflow on AWS

Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: Airflow, Oozie, Spark, PySpark, Docker, Learning and tutorial, AWS, Python

Apache Airflow offers a potential solution to the growing challenge of managing an increasingly complex landscape of data management tools, scripts and analytics processes. It is an open-source…

By Aargan COINTEPAS

May 5, 2020

Canada - Morocco - France

International locations

10 rue de la Kasbah
2393 Rabat
Morocco

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases into projects in production, how to reduce their costs and how to accelerate their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.