Articles published in 2021
GitOps in practice, deploy Kubernetes applications with ArgoCD
Categories: Containers Orchestration, DevOps & SRE, Adaltas Summit 2021 | Tags: Argo CD, Argo Workflows, CI/CD, Git, GitOps, IaC, Kubernetes, MLOps
GitOps is a set of practices to deploy applications using Git. Application definitions, configurations, and connectivity are to be stored in a version control software such as Git. Git then serves as…
Dec 16, 2021
JS monorepos in prod 6: CI/CD, continuous integration and deployment with Travis CI
Categories: DevOps & SRE, Front End | Tags: CI/CD, Monorepo, Node.js, Unit tests
Implementing continuous integration CI and continuous deployment (CD) on a monorepo is quite complex due to the diversity of multiple responsibilities between developers and the need to coordinate…
By David WORMS
Dec 6, 2021
Spring 2022 internship - building a Data Lab
Categories: Data Science, Learning | Tags: MongoDB, Kafka, Spark, Argo CD, Cloud, Elasticsearch, IaC, Internship, Keycloak, Kubernetes, OpenID Connect, PostgreSQL, Prometheus, TFX
Job Description Over the last few years, we developed the ability to use computers to process large amounts of data. The ecosystem evolved over a large offering of tools and libraries and the creation…
By David WORMS
Nov 24, 2021
CSV package for Node.js version 6
Categories: Node.js | Tags: Data Engineering, Refactoring, CSV, File Format, Release and features
Version 6 of the package for Node.js is released along its sub projects. Here are the latest versions: version , latest version was NPM version , latest version was NPM version , latest version…
By David WORMS
Nov 15, 2021
H2O in practice: a protocol combining AutoML with traditional modeling approaches
Categories: Data Science, Learning | Tags: PySpark, Automation, JDBC, R, Avro, Hadoop, HDFS, Hive, ORC, Parquet, Cloud, CSV, H2O, Machine Learning, MLOps, On-premises, Open source, Python, Scala, XGBoost
H20 comes with a lot of functionalities. The second part of the series H2O in practice proposes a protocol to combine AutoML modeling with traditional modeling and optimization approach. The objective…
Nov 12, 2021
Internship in Big Data infrastructure with TDP
Categories: Infrastructure, Learning | Tags: Arch Linux, CentOS, Cyber Security, Data Engineering, DevOps, Java, Ansible, Hadoop, HDFS, Hive, Knox, MapReduce, Oozie, Ranger, Spark, YARN, Zookeeper, Big Data, Terraform, IaC, Internship, TDP
Job Description Big Data and distributed computing is at Adaltas’ core. We support our partners in the deployment, maintenance and optimization of some of France’s largest clusters. Adaltas is also an…
By Daniel HARTY
Oct 25, 2021
Internship in Data Engineering
Categories: Front End, Learning | Tags: Metrics, Monitoring, Hadoop, Hive, Kafka, Cloud, Delta Lake, Elasticsearch, IaC, Internship, Kubernetes, MLflow, Prometheus, Streaming, TFX
Job Description Data is a valuable business asset. Some call it the new oil. The data engineer collects, transform and refine raw data into information that can be used by business analysts and data…
By David WORMS
Oct 25, 2021
Internship in Web Technologies
Categories: Front End, Learning | Tags: DevOps, LDAP, React.js, CI/CD, Docker, GraphQL, IaC, Internship, Kubernetes, Node.js, OAuth2
Job Description As part of its Big Data activities, Adaltas Academy is an information-sharing platform bringing together articles, training content, and a knowledge base. The users of the platform are…
By David WORMS
Oct 14, 2021
H2O in practice: a Data Scientist feedback
Categories: Data Science, Learning | Tags: PySpark, Automation, JDBC, R, Avro, Hadoop, HDFS, Hive, ORC, Parquet, Cloud, CSV, H2O, Machine Learning, MLOps, On-premises, Open source, Python, Scala
Automated machine learning (AutoML) platforms are gaining popularity and becoming a new important tool in the data scientists’ toolbox. A few months ago, I introduced H2O, an open-source platform for…
Sep 29, 2021
Adaltas Summit 2021, 2nd edition in corsica
Categories: Adaltas Summit 2021, Learning | Tags: Ansible, Hadoop, Spark, Argo CD, Azure, Blockchain, Deep Learning, Docker, Terraform, Kubernetes, Node.js
For its second edition, the whole Adaltas crew is gathering in Corsica for a whole week with 2 days dedicated to technology the 23rd and the 24th of september 2021. After a year and a half of sanitary…
By David WORMS
Sep 21, 2021
Running your Travis CI builds locally with Docker
Categories: DevOps & SRE, Front End | Tags: Bash, Tools, CI/CD, Monorepo, Node.js, Unit tests
Setting up the environment to run the tests on a CI/CD can take a few roundtrips between your host machine and the CI/CD running remotely. For every attempt, you’ll have to commit and publish your…
By David WORMS
Sep 6, 2021
Using Cloudera Deploy to install Cloudera Data Platform (CDP) Private Cloud
Categories: Big Data, Cloud Computing | Tags: Ansible, Cloudera, CDP, Cluster, Data Warehouse, Vagrant, IaC
Following our recent Cloudera Data Platform (CDP) overview, we cover how to deploy CDP private Cloud on you local infrastructure. It is entirely automated with the Ansible cookbooks published by…
Jul 23, 2021
An overview of Cloudera Data Platform (CDP)
Categories: Big Data, Cloud Computing, Data Engineering | Tags: SDX, Big Data, Cloud, Cloudera, CDP, CDH, Data Analytics, Data Hub, Data Lake, Data lakehouse, Data Warehouse
Cloudera Data Platform (CDP) is a cloud computing platform for businesses. It provides integrated and multifunctional self-service tools in order to analyze and centralize data. It brings security and…
Jul 19, 2021
Modern Python part 3: run a CI pipeline & publish your package to PiPy
Categories: DevOps & SRE | Tags: CI/CD, Git, GitHub, Python, Release and features, Unit tests
To propose a well-maintained and usable Python package to the open-source community or even inside your company, you are expected to accomplish a set of critical steps. First ensure that your code is…
By Faouzi BRAZA
Jun 28, 2021
Modern Python part 2: write unit tests & enforce Git commit conventions
Categories: DevOps & SRE | Tags: Git, GitHub, Monorepo, pandas, Python, Unit tests
Good software engineering practices always bring a lot of long-term benefits. For example, writing unit tests permits you to maintain large codebases and ensures that a specific piece of your code…
By Faouzi BRAZA
Jun 24, 2021
Modern Python part 1: start a project with pyenv & poetry
Categories: DevOps & SRE | Tags: Git, Python, Release and features, Unit tests
When learning a programming language, the focus is essentially on understanding the syntax, the code style, and the underlying concepts. With time, you become sufficiently comfortable with the…
By Faouzi BRAZA
Jun 9, 2021
Desacralizing the Linux overlay filesystem in Docker
Categories: Containers Orchestration, Infrastructure | Tags: DevOps, File system, Linux, Docker
Overlay filesystems (also called union filesystems) is a fundamental technology in Docker to create images and containers. They allow creating a union of directories to create a filesystem. Multiple…
By David WORMS
Jun 3, 2021
Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI
Categories: Data Engineering, Learning | Tags: AWS, Azure, Cloud, Data Hub, Data Lake, Data Warehouse, Databricks, Delta Lake, GCP, Machine Learning, MLflow
Self-paced trainings are proposed by Databricks inside their Academy program. The price is $ 2000 USD for unlimited access to the training courses for a period of 1 year, but also free for customers…
May 26, 2021
JS monorepos in prod 5: merging Git repositories and preserve commit history
Categories: DevOps & SRE, Node.js | Tags: Bash, DevOps, NPM, Packaging, Git, GitHub, GitOps, JavaScript, Monorepo, Node.js, Open source
At Adaltas, we maintain several open-source Node.js projects organized as Git monorepos and published on NPM. We shared our experience to work with Lerna monorepos in a set of articles: Part…
May 21, 2021
Find your way into data related Microsoft Azure certifications
Categories: Cloud Computing, Data Engineering | Tags: Data Governance, AWS, Azure, Azure Data Lake Storage (ADLS), Azure Data Catalog, Azure Data Factory, Data Science, GCP
Microsoft Azure has certification paths for many technical job roles such as developer, Data Engineer, Data Scientist and solution architect among others. Each of these certifications consists of…
Apr 14, 2021
Bridging the DBnomics Swagger/OpenAPI schema with GraphQL
Categories: DevOps & SRE, Front End | Tags: Data Engineering, Database, Front-end, Gatsby, JAMstack, React.js, API, GraphQL, JavaScript, Network, Node.js, REST, Schema
While redacting a long and fastidious document today, I came across DBnomics, an open platform federating economic datasets. Browsing its website and APIs, I found their OpenAPI schema (aka Swagger…
By David WORMS
Apr 8, 2021
Apache Liminal: when MLOps meets GitOps
Categories: Big Data, Containers Orchestration, Data Engineering, Data Science, Tech Radar | Tags: Data Engineering, CI/CD, Data Science, Deep Learning, Deployment, Docker, GitOps, Kubernetes, Machine Learning, MLOps, Open source, Python, TensorFlow
Apache Liminal is an open-source software which proposes a solution to deploy end-to-end Machine Learning pipelines. Indeed it permits to centralize all the steps needed to construct Machine Learning…
Mar 31, 2021
Storage size and generation time in popular file formats
Categories: Data Engineering, Data Science | Tags: Automation, Data structures, Metrics, Avro, Hadoop, HDFS, Hive, MapReduce, ORC, Parquet, Batch processing, Big Data, Data Lake, Data Warehouse, File Format, JavaScript Object Notation (JSON)
Choosing an appropriate file format is essential, whether your data transits on the wire or is stored at rest. Each file format comes with its own advantages and disadvantages. We covered them in a…
Mar 22, 2021
TensorFlow Extended (TFX): the components and their functionalities
Categories: Big Data, Data Engineering, Data Science, Learning | Tags: Beam, Data Engineering, Pipeline, CI/CD, Data Science, Deep Learning, Deployment, Machine Learning, MLOps, Open source, Python, TensorFlow
Putting Machine Learning (ML) and Deep Learning (DL) models in production certainly is a difficult task. It has been recognized as more failure-prone and time consuming than the modeling itself, yet…
Mar 5, 2021
JS monorepos in prod 4: unit testing with Mocha and Should.js
Categories: DevOps & SRE, Front End | Tags: Automation, CI/CD, Git, GitOps, Monorepo, Node.js, Unit tests
Unit testing is essential for every long-term project and allows you to pull down functionalities of your code into isolated testable units. Indeed the main goal of a unit test is to verify if an…
By David WORMS
Feb 25, 2021
JS monorepos in prod 3: commit enforcement and changelog generation
Categories: DevOps & SRE, Front End | Tags: Gatsby, NPM, CI/CD, Git, JavaScript, Monorepo, Node.js, Release and features, Unit tests
Conventional Commits introduces a structured format for commit messages. It standardizes the messages among all the contributors. This makes them more readable and easy to automate. It simplifies the…
By David WORMS
Feb 2, 2021
JS monorepos in prod 2: project versioning and publishing
Categories: DevOps & SRE, Front End | Tags: Gatsby, NPM, CI/CD, Git, GitOps, JavaScript, Monorepo, Node.js, Release and features, Unit tests
One great advantage of a monorepo is to maintain coherent versions between packages and to automatize the version creation and the publication of packages. This article covers the versioning and…
By David WORMS
Jan 11, 2021
JS monorepos in prod 1: project initialization
Categories: DevOps & SRE, Front End | Tags: Gatsby, NPM, Git, GitOps, JavaScript, Monorepo, Node.js, Release and features
Every project journey begins with the step of initialization. When your overall project is composed of multiple projects, it is tempting to create one Git repository per project. In Node.js, a project…
By David WORMS
Jan 5, 2021