Articles published in 2021
![GitOps in practice, deploy Kubernetes applications with ArgoCD](/static/9e1709eabfab9ffcecf963cbb6e15a84/0fd76/gitops-argo-cd.png)
GitOps in practice, deploy Kubernetes applications with ArgoCD
Categories: Containers Orchestration, DevOps & SRE, Adaltas Summit 2021 | Tags: Argo CD, Argo Workflows, CI/CD, Git, GitOps, IaC, Kubernetes, MLOps
GitOps is a set of practices to deploy applications using Git. Application definitions, configurations, and connectivity are stored in a version control system such as Git. Git then serves as…
Dec 16, 2021
![JS monorepos in prod 6: CI/CD, continuous integration and deployment with Travis CI](/static/40985ab4ad872b58f6b9aa28b1c76a9a/0fd76/continuous-integration-deployment.png)
JS monorepos in prod 6: CI/CD, continuous integration and deployment with Travis CI
Categories: DevOps & SRE, Front End | Tags: CI/CD, Monorepo, Node.js, Unit tests
Implementing continuous integration (CI) and continuous deployment (CD) on a monorepo is quite complex due to the diversity of responsibilities among developers and the need to coordinate…
By David WORMS
Dec 6, 2021
![CSV package for Node.js version 6](/static/99ed73431c694f29b4979cbe15f921c0/0fd76/csv.png)
CSV package for Node.js version 6
Categories: Node.js | Tags: Data Engineering, Refactoring, CSV, File Format, Release and features
Version 6 of the CSV package for Node.js is released along with its sub projects. Here are the latest versions…
By David WORMS
Nov 15, 2021
![Spring 2022 internship - building a Data Lab](/static/4bbe295af85e4e2f1485a1a6f092e267/0fd76/build-datalab.png)
Spring 2022 internship - building a Data Lab
Categories: Data Science, Learning | Tags: MongoDB, Kafka, Spark, Argo CD, Cloud, Elasticsearch, IaC, Internship, Keycloak, Kubernetes, OpenID Connect, PostgreSQL, Prometheus, TFX
Job Description Over the last few years, we developed the ability to use computers to process large amounts of data. The ecosystem evolved into a large offering of tools and libraries and the creation…
By David WORMS
Nov 24, 2021
![Internship in Big Data infrastructure with TDP](/static/d66a0a4b8575c5e10368b311407ed019/0fd76/internship-tdp.png)
Internship in Big Data infrastructure with TDP
Categories: Infrastructure, Learning | Tags: Arch Linux, CentOS, Cyber Security, Data Engineering, DevOps, Java, Ansible, Hadoop, HDFS, Hive, Knox, MapReduce, Oozie, Ranger, Spark, YARN, Zookeeper, Big Data, Terraform, IaC, Internship, TDP
Job Description Big Data and distributed computing are at Adaltas’ core. We support our partners in the deployment, maintenance and optimization of some of France’s largest clusters. Adaltas is also an…
By Daniel HARTY
Oct 25, 2021
![H2O in practice: a protocol combining AutoML with traditional modeling approaches](/static/1fa130ce2060d6d5efe8fdbedc6ed3d8/0fd76/h2o-automl-protocol.png)
H2O in practice: a protocol combining AutoML with traditional modeling approaches
Categories: Data Science, Learning | Tags: PySpark, Automation, JDBC, R, Avro, Hadoop, HDFS, Hive, ORC, Parquet, Cloud, CSV, H2O, Machine Learning, MLOps, On-premises, Open source, Python, Scala, XGBoost
H2O comes with a lot of functionalities. The second part of the series H2O in practice proposes a protocol to combine AutoML modeling with traditional modeling and optimization approaches. The objective…
Nov 12, 2021
![Internship in Data Engineering](/static/c8926b82c0987458b240d7355f83283f/0fd76/data-engineering.png)
Internship in Data Engineering
Categories: Front End, Learning | Tags: Metrics, Monitoring, Hadoop, Hive, Kafka, Cloud, Delta Lake, Elasticsearch, IaC, Internship, Kubernetes, MLflow, Prometheus, Streaming, TFX
Job Description Data is a valuable business asset. Some call it the new oil. The data engineer collects, transforms, and refines raw data into information that can be used by business analysts and data…
By David WORMS
Oct 25, 2021
![Internship in Web Technologies](/static/e28d33d6f9a9871fc1a5ca73d69957d7/0fd76/technologies-web.png)
Internship in Web Technologies
Categories: Front End, Learning | Tags: DevOps, LDAP, React.js, CI/CD, Docker, GraphQL, IaC, Internship, Kubernetes, Node.js, OAuth2
Job Description As part of its Big Data activities, Adaltas Academy is an information-sharing platform bringing together articles, training content, and a knowledge base. The users of the platform are…
By David WORMS
Oct 14, 2021
![H2O in practice: a Data Scientist feedback](/static/12298157086f95aa3e94c715d4f08041/0fd76/h2o_puzzle.png)
H2O in practice: a Data Scientist feedback
Categories: Data Science, Learning | Tags: PySpark, Automation, JDBC, R, Avro, Hadoop, HDFS, Hive, ORC, Parquet, Cloud, CSV, H2O, Machine Learning, MLOps, On-premises, Open source, Python, Scala
Automated machine learning (AutoML) platforms are gaining popularity and becoming a new important tool in the data scientists’ toolbox. A few months ago, I introduced H2O, an open-source platform for…
Sep 29, 2021
![Using Cloudera Deploy to install Cloudera Data Platform (CDP) Private Cloud](/static/4f226ea2dd45a23339eeea153729889c/0fd76/cdp-installation.png)
Using Cloudera Deploy to install Cloudera Data Platform (CDP) Private Cloud
Categories: Big Data, Cloud Computing | Tags: Ansible, Cloudera, CDP, Cluster, Data Warehouse, Vagrant, IaC
Following our recent Cloudera Data Platform (CDP) overview, we cover how to deploy CDP Private Cloud on your local infrastructure. It is entirely automated with the Ansible cookbooks published by…
Jul 23, 2021
![Running your Travis CI builds locally with Docker](/static/b89d3cd615b382581e7e23fdeda6c1ae/0fd76/travis-ci-docker.png)
Running your Travis CI builds locally with Docker
Categories: DevOps & SRE, Front End | Tags: Bash, Tools, CI/CD, Monorepo, Node.js, Unit tests
Setting up the environment to run the tests on a CI/CD can take a few roundtrips between your host machine and the CI/CD running remotely. For every attempt, you’ll have to commit and publish your…
By David WORMS
Sep 6, 2021
![Modern Python part 3: run a CI pipeline & publish your package to PyPI](/static/8a16ae3d8fff2f1c6c547775aa80650f/0fd76/modern-python-3.png)
Modern Python part 3: run a CI pipeline & publish your package to PyPI
Categories: DevOps & SRE | Tags: CI/CD, Git, GitHub, Python, Release and features, Unit tests
To propose a well-maintained and usable Python package to the open-source community or even inside your company, you are expected to accomplish a set of critical steps. First ensure that your code is…
By Faouzi BRAZA
Jun 28, 2021
![Adaltas Summit 2021, 2nd edition in Corsica](/static/156d29c3318963d7724ea4a55bb10ea0/0fd76/adaltas-summit-2021.png)
Adaltas Summit 2021, 2nd edition in Corsica
Categories: Adaltas Summit 2021, Learning | Tags: Ansible, Hadoop, Spark, Argo CD, Azure, Blockchain, Deep Learning, Docker, Terraform, Kubernetes, Node.js
For its second edition, the whole Adaltas crew is gathering in Corsica for a full week, with 2 days dedicated to technology: the 23rd and the 24th of September 2021. After a year and a half of sanitary…
By David WORMS
Sep 21, 2021
![Modern Python part 2: write unit tests & enforce Git commit conventions](/static/8bd4095e2c6c6433a7b5e016d898a30b/0fd76/modern-python-2.png)
Modern Python part 2: write unit tests & enforce Git commit conventions
Categories: DevOps & SRE | Tags: Git, GitHub, Monorepo, pandas, Python, Unit tests
Good software engineering practices always bring a lot of long-term benefits. For example, writing unit tests permits you to maintain large codebases and ensures that a specific piece of your code…
By Faouzi BRAZA
Jun 24, 2021
![An overview of Cloudera Data Platform (CDP)](/static/2c65eb2f707b2f349dccc928a8ea12d9/0fd76/cdp-overview.png)
An overview of Cloudera Data Platform (CDP)
Categories: Big Data, Cloud Computing, Data Engineering | Tags: SDX, Big Data, Cloud, Cloudera, CDP, CDH, Data Analytics, Data Hub, Data Lake, Data lakehouse, Data Warehouse
Cloudera Data Platform (CDP) is a cloud computing platform for businesses. It provides integrated and multifunctional self-service tools in order to analyze and centralize data. It brings security and…
Jul 19, 2021
![Modern Python part 1: start a project with pyenv & poetry](/static/483c641dae9b8225c67c2ebabbf50f51/0fd76/modern-python.png)
Modern Python part 1: start a project with pyenv & poetry
Categories: DevOps & SRE | Tags: Git, Python, Release and features, Unit tests
When learning a programming language, the focus is essentially on understanding the syntax, the code style, and the underlying concepts. With time, you become sufficiently comfortable with the…
By Faouzi BRAZA
Jun 9, 2021
![JS monorepos in prod 5: merging Git repositories and preserve commit history](/static/b4705e058e2548f2b140f09b51424f95/0fd76/migrating-to-monorepo.png)
JS monorepos in prod 5: merging Git repositories and preserve commit history
Categories: DevOps & SRE, Node.js | Tags: Bash, DevOps, NPM, Packaging, Git, GitHub, GitOps, JavaScript, Monorepo, Node.js, Open source
At Adaltas, we maintain several open-source Node.js projects organized as Git monorepos and published on NPM. We shared our experience working with Lerna monorepos in a set of articles: Part…
May 21, 2021
![Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI](/static/d7b671cfb4c15e7aade3bd71a0e119c3/0fd76/databricks-selfpaced.png)
Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI
Categories: Data Engineering, Learning | Tags: AWS, Azure, Cloud, Data Hub, Data Lake, Data Warehouse, Databricks, Delta Lake, GCP, Machine Learning, MLflow
Databricks offers self-paced training courses as part of its Academy program. The price is $2,000 USD for unlimited access to the training courses for a period of 1 year, but also free for customers…
May 26, 2021
![Bridging the DBnomics Swagger/OpenAPI schema with GraphQL](/static/b7f9351255c80c9a3f0041848708b134/0fd76/dbnomics-graphql.png)
Bridging the DBnomics Swagger/OpenAPI schema with GraphQL
Categories: DevOps & SRE, Front End | Tags: Data Engineering, Database, Front-end, Gatsby, JAMstack, React.js, API, GraphQL, JavaScript, Network, Node.js, REST, Schema
While writing a long and tedious document today, I came across DBnomics, an open platform federating economic datasets. Browsing its website and APIs, I found their OpenAPI schema (aka Swagger…
By David WORMS
Apr 8, 2021
![Apache Liminal: when MLOps meets GitOps](/static/41c2e11fb046683b549d030f3b3528b4/0fd76/apache-liminal.png)
Apache Liminal: when MLOps meets GitOps
Categories: Big Data, Containers Orchestration, Data Engineering, Data Science, Tech Radar | Tags: Data Engineering, CI/CD, Data Science, Deep Learning, Deployment, Docker, GitOps, Kubernetes, Machine Learning, MLOps, Open source, Python, TensorFlow
Apache Liminal is open-source software that offers a solution for deploying end-to-end Machine Learning pipelines. It centralizes all the steps needed to construct Machine Learning…
Mar 31, 2021
![Desacralizing the Linux overlay filesystem in Docker](/static/18c6dd774251c1af4332364c3e9f7e6f/0fd76/linux-docker-overlay.png)
Desacralizing the Linux overlay filesystem in Docker
Categories: Containers Orchestration, Infrastructure | Tags: DevOps, File system, Linux, Docker
Overlay filesystems (also called union filesystems) are a fundamental technology in Docker for creating images and containers. They combine several directories into a single filesystem. Multiple…
By David WORMS
Jun 3, 2021
![Find your way into data related Microsoft Azure certifications](/static/966361b29b19ad1652f6b6dbe3422f67/0fd76/azure-certification.png)
Find your way into data related Microsoft Azure certifications
Categories: Cloud Computing, Data Engineering | Tags: Data Governance, AWS, Azure, Azure Data Lake Storage (ADLS), Azure Data Catalog, Azure Data Factory, Data Science, GCP
Microsoft Azure has certification paths for many technical job roles such as developer, Data Engineer, Data Scientist and solution architect among others. Each of these certifications consists of…
Apr 14, 2021
![Storage size and generation time in popular file formats](/static/0eb8d1f2b2259c47ff8da109641b8a22/0fd76/file-format-storage.png)
Storage size and generation time in popular file formats
Categories: Data Engineering, Data Science | Tags: Automation, Data structures, Metrics, Avro, Hadoop, HDFS, Hive, MapReduce, ORC, Parquet, Batch processing, Big Data, Data Lake, Data Warehouse, File Format, JavaScript Object Notation (JSON)
Choosing an appropriate file format is essential, whether your data transits on the wire or is stored at rest. Each file format comes with its own advantages and disadvantages. We covered them in a…
Mar 22, 2021
![TensorFlow Extended (TFX): the components and their functionalities](/static/7e26dafa6ce1f0b0f3c1b9ff6fd6f48f/0fd76/tfx-overview.png)
TensorFlow Extended (TFX): the components and their functionalities
Categories: Big Data, Data Engineering, Data Science, Learning | Tags: Beam, Data Engineering, Pipeline, CI/CD, Data Science, Deep Learning, Deployment, Machine Learning, MLOps, Open source, Python, TensorFlow
Putting Machine Learning (ML) and Deep Learning (DL) models in production is certainly a difficult task. It has been recognized as more failure-prone and time-consuming than the modeling itself, yet…
Mar 5, 2021
![JS monorepos in prod 3: commit enforcement and changelog generation](/static/9525d4eb0945502ef82bc5e09ecdbc71/0fd76/js-monorepos-commits-changelog.png)
JS monorepos in prod 3: commit enforcement and changelog generation
Categories: DevOps & SRE, Front End | Tags: Gatsby, NPM, CI/CD, Git, JavaScript, Monorepo, Node.js, Release and features, Unit tests
Conventional Commits introduces a structured format for commit messages. It standardizes the messages among all the contributors. This makes them more readable and easier to automate. It simplifies the…
By David WORMS
Feb 2, 2021
![JS monorepos in prod 2: project versioning and publishing](/static/d45f0a263110af7d21339a6e1affb71e/0fd76/js-monorepos-versioning-publishing.png)
JS monorepos in prod 2: project versioning and publishing
Categories: DevOps & SRE, Front End | Tags: Gatsby, NPM, CI/CD, Git, GitOps, JavaScript, Monorepo, Node.js, Release and features, Unit tests
One great advantage of a monorepo is to maintain coherent versions between packages and to automate the version creation and the publication of packages. This article covers the versioning and…
By David WORMS
Jan 11, 2021
![JS monorepos in prod 4: unit testing with Mocha and Should.js](/static/1ae27cf9576ee80eeb41149c776121eb/0fd76/js-monorepos-unit-testing.png)
JS monorepos in prod 4: unit testing with Mocha and Should.js
Categories: DevOps & SRE, Front End | Tags: Automation, CI/CD, Git, GitOps, Monorepo, Node.js, Unit tests
Unit testing is essential for every long-term project and allows you to break down the functionalities of your code into isolated testable units. Indeed, the main goal of a unit test is to verify if an…
By David WORMS
Feb 25, 2021
![JS monorepos in prod 1: project initialization](/static/3f882ef75ac77b9233c7fc7e77937a91/0fd76/js-monorepos-initialization.png)
JS monorepos in prod 1: project initialization
Categories: DevOps & SRE, Front End | Tags: Gatsby, NPM, Git, GitOps, JavaScript, Monorepo, Node.js, Release and features
Every project journey begins with the step of initialization. When your overall project is composed of multiple projects, it is tempting to create one Git repository per project. In Node.js, a project…
By David WORMS
Jan 5, 2021