Git
Related articles
JS monorepos in prod 3: commit enforcement and changelog generation
Categories: DevOps & SRE, Front End | Tags: CI/CD, Git, JavaScript, Unit tests, Monorepo, Node.js, Release and features
Conventional Commits introduces a structured format for commit messages. It standardizes the messages among all the contributors, which makes them more readable and easier to automate. It simplifies the…
By David WORMS
Feb 2, 2021
JS monorepos in prod 2: project versioning and publishing
Categories: DevOps & SRE, Front End | Tags: CI/CD, Git, JavaScript, Unit tests, Monorepo, Node.js, Release and features
One great advantage of a monorepo is the ability to maintain coherent versions across packages and to automate version creation and package publication. This article covers the versioning and…
By David WORMS
Jan 11, 2021
JS monorepos in prod 1: project initialization
Categories: DevOps & SRE, Front End | Tags: Git, JavaScript, Monorepo, Node.js, Release and features
Every project journey begins with the step of initialization. When your overall project is composed of multiple projects, it is tempting to create one Git repository per project. In Node.js, a project…
By David WORMS
Jan 5, 2021
Build your open source Big Data distribution with Hadoop, HBase, Spark, Hive & Zeppelin
Categories: Big Data, Infrastructure | Tags: Hive, Maven, Spark, Git, Unit tests, Hadoop, HBase, Release and features
The Hadoop ecosystem gave birth to many popular projects including HBase, Spark and Hive. While technologies like Kubernetes and S3 compatible object storages are growing in popularity, HDFS and YARN…
Dec 18, 2020
Rebuilding HDP Hive: patch, test and build
Categories: Big Data, Infrastructure | Tags: Hive, Maven, Git, GitHub, Java, Unit tests, Release and features
The Hortonworks HDP distribution will soon be deprecated in favor of Cloudera’s CDP. One of our clients wanted a new Apache Hive feature backported into HDP 2.6.0. We thought it was a good opportunity…
Oct 6, 2020
Data versioning and reproducible ML with DVC and MLflow
Categories: Data Science, DevOps & SRE, Events | Tags: Data Engineering, Git, Databricks, Delta Lake, Machine Learning, MLflow, Storage
Our talk on data versioning and reproducible Machine Learning, proposed to the Data + AI Summit (formerly known as Spark + AI Summit), has been accepted. The summit will take place online on November 17-19…
Sep 30, 2020
Version your datasets with Data Version Control (DVC) and Git
Categories: Data Science, DevOps & SRE | Tags: DevOps, Git, Infrastructure, Operation, SCM
Using a Version Control System such as Git for source code is a good practice and an industry standard. Considering that projects focus more and more on data, shouldn’t we have a similar approach such…
By Grégor JOUET
Sep 3, 2020
Automate a Spark routine workflow from GitLab to GCP
Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: Airflow, Spark, CI/CD, Learning and tutorial, GitLab, GCP, Terraform
A workflow consists of automating a succession of tasks that are carried out without human intervention. It is an important and widespread concept which applies particularly to operational environments…
Jun 16, 2020
InfraOps & DevOps Internship - build a Big Data & Kubernetes PaaS
Categories: Big Data, Containers Orchestration | Tags: Kafka, Spark, DevOps, LXD, NoSQL, Hadoop, Ceph, Kubernetes
Context: the acquisition of a high-capacity cluster is in line with Adaltas’ goal of building a PaaS-type offering to use and provide Big Data and container orchestration platforms. The platforms are…
By David WORMS
Nov 26, 2019
Introduction to Cloudera Data Science Workbench
Categories: Data Science | Tags: Cloudera, Docker, Git, Kubernetes, Machine Learning, Azure, Notebook
Cloudera Data Science Workbench is a platform that allows Data Scientists to create, manage, run and schedule data science workflows from their browser. Thus it enables them to focus on their main…
Feb 28, 2019