Articles published in 2021

GitOps in practice, deploy Kubernetes applications with ArgoCD

Categories: Containers Orchestration, DevOps & SRE, Adaltas Summit 2021 | Tags: Argo CD, Argo Workflows, CI/CD, Git, GitOps, IaC, Kubernetes, MLOps

GitOps is a set of practices to deploy applications using Git. Application definitions, configurations, and connectivity are stored in version control software such as Git. Git then serves as…

By Paul-Adrien CORDONNIER

Dec 16, 2021

JS monorepos in prod 6: CI/CD, continuous integration and deployment with Travis CI

Categories: DevOps & SRE, Front End | Tags: CI/CD, Monorepo, Node.js, Unit tests

Implementing continuous integration (CI) and continuous deployment (CD) on a monorepo is quite complex due to the diversity of responsibilities between developers and the need to coordinate…

By David WORMS

Dec 6, 2021

Spring 2022 internship - building a Data Lab

Categories: Data Science, Learning | Tags: MongoDB, Kafka, Spark, Argo CD, Cloud, Elasticsearch, IaC, Internship, Keycloak, Kubernetes, OpenID Connect, PostgreSQL, Prometheus, TFX

Job Description Over the last few years, we developed the ability to use computers to process large amounts of data. The ecosystem evolved into a large offering of tools and libraries and the creation…

By David WORMS

Nov 24, 2021

CSV package for Node.js version 6

Categories: Node.js | Tags: Data Engineering, Refactoring, CSV, File Format, Release and features

Version 6 of the CSV package for Node.js is released along with its sub-projects. Here are the latest versions…

By David WORMS

Nov 15, 2021

H2O in practice: a protocol combining AutoML with traditional modeling approaches

Categories: Data Science, Learning | Tags: PySpark, Automation, JDBC, R, Avro, Hadoop, HDFS, Hive, ORC, Parquet, Cloud, CSV, H2O, Machine Learning, MLOps, On-premises, Open source, Python, Scala, XGBoost

H2O comes with a lot of functionality. The second part of the series H2O in practice proposes a protocol to combine AutoML modeling with traditional modeling and optimization approaches. The objective…

Internship in Big Data infrastructure with TDP

Categories: Infrastructure, Learning | Tags: Ranger, YARN, Arch Linux, CentOS, Cyber Security, Data Engineering, DevOps, Java, Ansible, Hadoop, HDFS, Hive, Knox, MapReduce, Oozie, Spark, Zookeeper, Big Data, Terraform, IaC, Internship, TDP

Job Description Big Data and distributed computing are at Adaltas’ core. We support our partners in the deployment, maintenance and optimization of some of France’s largest clusters. Adaltas is also an…

By Daniel HARTY

Oct 25, 2021

Internship in Data Engineering

Categories: Front End, Learning | Tags: Metrics, Monitoring, Hadoop, Hive, Kafka, Cloud, Delta Lake, Elasticsearch, IaC, Internship, Kubernetes, MLflow, Prometheus, Streaming, TFX

Job Description Data is a valuable business asset. Some call it the new oil. The data engineer collects, transforms, and refines raw data into information that can be used by business analysts and data…

By David WORMS

Oct 25, 2021

Internship in Web Technologies

Categories: Front End, Learning | Tags: DevOps, LDAP, React.js, CI/CD, Docker, GraphQL, IaC, Internship, Kubernetes, Node.js, OAuth2

Job Description As part of its Big Data activities, Adaltas Academy is an information-sharing platform bringing together articles, training content, and a knowledge base. The users of the platform are…

By David WORMS

Oct 14, 2021

H2O in practice: a Data Scientist feedback

Categories: Data Science, Learning | Tags: PySpark, Automation, JDBC, R, Avro, Hadoop, HDFS, Hive, ORC, Parquet, Cloud, CSV, H2O, Machine Learning, MLOps, On-premises, Open source, Python, Scala

Automated machine learning (AutoML) platforms are gaining popularity and becoming a new important tool in the data scientists’ toolbox. A few months ago, I introduced H2O, an open-source platform for…

Adaltas Summit 2021, 2nd edition in Corsica

Categories: Adaltas Summit 2021, Learning | Tags: Ansible, Hadoop, Spark, Argo CD, Azure, Blockchain, Deep Learning, Docker, Terraform, Kubernetes, Node.js

For its second edition, the whole Adaltas crew is gathering in Corsica for a full week, with 2 days dedicated to technology on the 23rd and 24th of September 2021. After a year and a half of sanitary…

By David WORMS

Sep 21, 2021

Running your Travis CI builds locally with Docker

Categories: DevOps & SRE, Front End | Tags: Bash, Tools, CI/CD, Monorepo, Node.js, Unit tests

Setting up the environment to run the tests on a CI/CD can take a few roundtrips between your host machine and the CI/CD running remotely. For every attempt, you’ll have to commit and publish your…

By David WORMS

Sep 6, 2021

Using Cloudera Deploy to install Cloudera Data Platform (CDP) Private Cloud

Categories: Big Data, Cloud Computing | Tags: Ansible, Cloudera, CDP, Cluster, Data Warehouse, Vagrant, IaC

Following our recent Cloudera Data Platform (CDP) overview, we cover how to deploy CDP Private Cloud on your local infrastructure. It is entirely automated with the Ansible cookbooks published by…

By Alexander HOFFMANN

Jul 23, 2021

An overview of Cloudera Data Platform (CDP)

Categories: Big Data, Cloud Computing, Data Engineering | Tags: SDX, Big Data, Cloud, Cloudera, CDP, CDH, Data Analytics, Data Hub, Data Lake, Lakehouse, Data Warehouse

Cloudera Data Platform (CDP) is a cloud computing platform for businesses. It provides integrated and multifunctional self-service tools in order to analyze and centralize data. It brings security and…

By Alexander HOFFMANN

Jul 19, 2021

Modern Python part 2: write unit tests & enforce Git commit conventions

Categories: DevOps & SRE | Tags: GitHub, Git, Monorepo, pandas, Python, Unit tests

Good software engineering practices always bring a lot of long-term benefits. For example, writing unit tests permits you to maintain large codebases and ensures that a specific piece of your code…

By Faouzi BRAZA

Jun 24, 2021

Modern Python part 3: run a CI pipeline & publish your package to PyPI

Categories: DevOps & SRE | Tags: GitHub, CI/CD, Git, Python, Release and features, Unit tests

To propose a well-maintained and usable Python package to the open-source community or even inside your company, you are expected to accomplish a set of critical steps. First ensure that your code is…

By Faouzi BRAZA

Jun 28, 2021

Desacralizing the Linux overlay filesystem in Docker

Categories: Containers Orchestration, Infrastructure | Tags: DevOps, File system, Linux, Docker

Overlay filesystems (also called union filesystems) are a fundamental technology in Docker to create images and containers. They allow creating a union of directories to create a filesystem. Multiple…

By David WORMS

Jun 3, 2021

Modern Python part 1: start a project with pyenv & poetry

Categories: DevOps & SRE | Tags: Git, Python, Release and features, Unit tests

When learning a programming language, the focus is essentially on understanding the syntax, the code style, and the underlying concepts. With time, you become sufficiently comfortable with the…

By Faouzi BRAZA

Jun 9, 2021

Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI

Categories: Data Engineering, Learning | Tags: AWS, Azure, Cloud, Data Hub, Data Lake, Data Warehouse, Databricks, Delta Lake, GCP, Machine Learning, MLflow

Self-paced training courses are offered by Databricks inside their Academy program. The price is $2,000 USD for unlimited access to the training courses for a period of 1 year, but also free for customers…

By Anna KNYAZEVA

May 26, 2021

JS monorepos in prod 5: merging Git repositories and preserve commit history

Categories: DevOps & SRE, Node.js | Tags: Bash, DevOps, GitHub, NPM, Packaging, Git, GitOps, JavaScript, Monorepo, Node.js, Open source

At Adaltas, we maintain several open-source Node.js projects organized as Git monorepos and published on NPM. We shared our experience working with Lerna monorepos in a set of articles: Part…

By Sergei KUDINOV

May 21, 2021

Find your way into data related Microsoft Azure certifications

Categories: Cloud Computing, Data Engineering | Tags: Data Governance, AWS, Azure, Azure Data Lake Storage (ADLS), Azure Data Catalog, Azure Data Factory, Data Science, GCP

Microsoft Azure has certification paths for many technical job roles such as developer, Data Engineer, Data Scientist and solution architect among others. Each of these certifications consists of…

By Barthelemy NGOM

Apr 14, 2021

Bridging the DBnomics Swagger/OpenAPI schema with GraphQL

Categories: DevOps & SRE, Front End | Tags: Data Engineering, Database, Front-end, Gatsby, JAMstack, React.js, REST, API, GraphQL, JavaScript, Network, Node.js, Schema

While writing a long and tedious document today, I came across DBnomics, an open platform federating economic datasets. Browsing its website and APIs, I found their OpenAPI schema (aka Swagger…

By David WORMS

Apr 8, 2021

Apache Liminal: when MLOps meets GitOps

Categories: Big Data, Containers Orchestration, Data Engineering, Data Science, Tech Radar | Tags: Data Engineering, CI/CD, Data Science, Deep Learning, Deployment, Docker, GitOps, Kubernetes, Machine Learning, MLOps, Open source, Python, TensorFlow

Apache Liminal is open-source software that proposes a solution to deploy end-to-end Machine Learning pipelines. It centralizes all the steps needed to construct Machine Learning…

By Aargan COINTEPAS

Mar 31, 2021

Storage size and generation time in popular file formats

Categories: Data Engineering, Data Science | Tags: Automation, Data structures, Metrics, Avro, Hadoop, HDFS, Hive, MapReduce, ORC, Parquet, Batch processing, Big Data, Data Lake, Data Warehouse, File Format, JavaScript Object Notation (JSON)

Choosing an appropriate file format is essential, whether your data transits on the wire or is stored at rest. Each file format comes with its own advantages and disadvantages. We covered them in a…

By Barthelemy NGOM

Mar 22, 2021

TensorFlow Extended (TFX): the components and their functionalities

Categories: Big Data, Data Engineering, Data Science, Learning | Tags: Beam, Data Engineering, Pipeline, CI/CD, Data Science, Deep Learning, Deployment, Machine Learning, MLOps, Open source, Python, TensorFlow

Putting Machine Learning (ML) and Deep Learning (DL) models in production certainly is a difficult task. It has been recognized as more failure-prone and time consuming than the modeling itself, yet…

JS monorepos in prod 4: unit testing with Mocha and Should.js

Categories: DevOps & SRE, Front End | Tags: Automation, CI/CD, Git, GitOps, Monorepo, Node.js, Unit tests

Unit testing is essential for every long-term project and allows you to break down the functionality of your code into isolated, testable units. Indeed, the main goal of a unit test is to verify if an…

By David WORMS

Feb 25, 2021

JS monorepos in prod 3: commit enforcement and changelog generation

Categories: DevOps & SRE, Front End | Tags: Gatsby, NPM, CI/CD, Git, JavaScript, Monorepo, Node.js, Release and features, Unit tests

Conventional Commits introduces a structured format for commit messages. It standardizes the messages among all the contributors. This makes them more readable and easy to automate. It simplifies the…

By David WORMS

Feb 2, 2021

JS monorepos in prod 2: project versioning and publishing

Categories: DevOps & SRE, Front End | Tags: Gatsby, NPM, CI/CD, Git, GitOps, JavaScript, Monorepo, Node.js, Release and features, Unit tests

One great advantage of a monorepo is to maintain coherent versions between packages and to automate the version creation and the publication of packages. This article covers the versioning and…

By David WORMS

Jan 11, 2021

JS monorepos in prod 1: project initialization

Categories: DevOps & SRE, Front End | Tags: Gatsby, NPM, Git, GitOps, JavaScript, Monorepo, Node.js, Release and features

Every project journey begins with the step of initialization. When your overall project is composed of multiple projects, it is tempting to create one Git repository per project. In Node.js, a project…

By David WORMS

Jan 5, 2021

Canada - Morocco - France

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs, and how to shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.

Support Ukraine