DevOps and Site Reliability Engineering (SRE)
DevOps is a part of corporate culture: a set of principles that a company aspires to and follows over the long term. Supporters of this culture value collaboration, experimentation and a willingness to learn. Everyone involved in a DevOps culture focuses on one goal throughout the entire software delivery lifecycle (not just development and operations): the rapid delivery of stable, high-quality software, from concept to customer or user.
The automation of software development, testing and deployment through Continuous Delivery (CD) is a recognized key factor for DevOps. Automation enables faster software delivery and helps ensure the solutions have the quality, security and stability they need.
Objectives
Defining and contributing to:
- Service Level Indicator (SLI)
- Service Level Objective (SLO)
- Service Level Agreement (SLA)
- Service risk, level of availability and error budget
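To make the relationship between these terms concrete, here is a minimal sketch of how an error budget can be derived from an SLO and compared with a measured SLI. It assumes a request-based availability SLI (successful requests divided by total requests). The names `error_budget` and `ErrorBudgetReport` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudgetReport:
    sli: float                 # measured ratio of successful requests
    allowed_failures: float    # failures the SLO tolerates over the window
    remaining_failures: float  # budget left (negative once overspent)
    slo_met: bool              # True while the SLI is at or above the SLO


def error_budget(slo: float, total_requests: int, failed_requests: int) -> ErrorBudgetReport:
    """Compare a measured availability SLI with its SLO (hypothetical helper)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: what was actually measured over the window.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: the share of requests the SLO allows to fail.
    allowed = (1.0 - slo) * total_requests
    return ErrorBudgetReport(
        sli=sli,
        allowed_failures=allowed,
        remaining_failures=allowed - failed_requests,
        slo_met=sli >= slo,
    )


if __name__ == "__main__":
    report = error_budget(slo=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"SLI: {report.sli:.4%}")
    print(f"Allowed failures: {report.allowed_failures:.0f}")
    print(f"Remaining budget: {report.remaining_failures:.0f}")
    print(f"SLO met: {report.slo_met}")
```

With a 99.9% SLO over one million requests, about 1,000 failures are tolerated, so 400 failures leave roughly 600 in the budget. An SLA would typically sit on top of such an objective as a contractual commitment, which this sketch does not model.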
Collaboration
Works together with the application developers:
- Change management
- Set common goals
- Ensure production delivery
- Improve system reliability
Responsibilities
Involved and responsible for:
- Monitoring and alerting
- Capacity planning and availability
- Latency, performance and efficiency
- Emergency response and automation
Articles related to DevOps
Categories: DevOps & SRE, Front End | Tags: Monorepo, Node.js
Unit testing is essential for every long-term project and allows you to break down the functionality of your code into isolated, testable units. Indeed, the main goal of a unit test is to verify if an…
By David WORMS
Feb 25, 2021
JS monorepos in prod 3: commit enforcement and changelog generation
Categories: DevOps & SRE, Front End | Tags: CI/CD, Git, JavaScript, Unit tests, Monorepo, Node.js, Release and features
Conventional Commits introduces a structured format for commit messages. It standardizes the messages among all the contributors. This makes them more readable and easy to automate. It simplifies the…
By David WORMS
Feb 2, 2021
JS monorepos in prod 2: project versioning and publishing
Categories: DevOps & SRE, Front End | Tags: CI/CD, Git, JavaScript, Unit tests, Monorepo, Node.js, Release and features
One great advantage of a monorepo is the ability to maintain coherent versions between packages and to automate the version creation and the publication of packages. This article covers the versioning and…
By David WORMS
Jan 11, 2021
JS monorepos in prod 1: project initialization
Categories: DevOps & SRE, Front End | Tags: Git, JavaScript, Monorepo, Node.js, Release and features
Every project journey begins with the step of initialization. When your overall project is composed of multiple projects, it is tempting to create one Git repository per project. In Node.js, a project…
By David WORMS
Jan 5, 2021
Data versioning and reproducible ML with DVC and MLflow
Categories: Data Science, DevOps & SRE, Events | Tags: Data Engineering, Git, Databricks, Delta Lake, Machine Learning, MLflow, Storage
Our talk on data versioning and reproducible Machine Learning, proposed to the Data + AI Summit (formerly known as Spark+AI), has been accepted. The summit will take place online on 17-19 November…
Sep 30, 2020
Version your datasets with Data Version Control (DVC) and Git
Categories: Data Science, DevOps & SRE | Tags: DevOps, Git, Infrastructure, Operation, SCM
Using a Version Control System such as Git for source code is a good practice and an industry standard. Considering that projects focus more and more on data, shouldn’t we have a similar approach such…
By Grégor JOUET
Sep 3, 2020
Machine Learning model deployment
Categories: Big Data, Data Engineering, Data Science, DevOps & SRE | Tags: DevOps, Operation, AI, Cloud, Machine Learning, MLOps, On-premises, Schema
“Enterprise Machine Learning requires looking at the big picture … from a data engineering and a data platform perspective,” lectured Justin Norman during the talk on the deployment of Machine…
Sep 30, 2019
Rook with Ceph doesn't provision my Persistent Volume Claims!
Categories: DevOps & SRE | Tags: PVC, Linux, Rook, Ubuntu, Ceph, Cluster, Kubernetes
Ceph installation inside Kubernetes can be provisioned using Rook. Currently doing an internship at Adaltas, I was in charge of participating in the setup of a Kubernetes (k8s) cluster. To avoid…
Sep 9, 2019
Spark Streaming part 3: DevOps, tools and tests for Spark applications
Categories: Big Data, Data Engineering, DevOps & SRE | Tags: Spark, Apache Spark Streaming, DevOps, Learning and tutorial
Whenever services are unavailable, businesses experience large financial losses. Spark Streaming applications can break, like any other software application. A streaming application operates on data…
Jun 19, 2019
Monitoring a production Hadoop cluster with Kubernetes
Categories: DevOps & SRE | Tags: Thrift, Docker, Elasticsearch, Grafana, Prometheus, Shinken, Hadoop, Knox, Cluster, Kubernetes, Node, Node.js, Python
Monitoring a production grade Hadoop cluster is a real challenge and needs to be constantly evolving. The software we use today is based on Nagios. Very efficient when it comes to the simplest…
Dec 21, 2018
Hadoop cluster takeover with Apache Ambari
Categories: Big Data, DevOps & SRE, Adaltas Summit 2018 | Tags: Ambari, Automation, HDP, iptables, Kerberos, Nikita, REST, Systemd, Cluster, Node, Node.js
We recently migrated a large production Hadoop cluster from a “manual” automated install to Apache Ambari; we called this the Ambari Takeover. This is a risky process and we will detail why this…
Nov 15, 2018
KVM machines for Vagrant on Archlinux
Categories: DevOps & SRE | Tags: Arch Linux, KVM, Linux, Virtualization, VM, Vagrant
Vagrant supports different providers to manage virtualization. In a Linux environment, you can dramatically improve VM performance by using the libvirt provider and the KVM hypervisor. This tutorial…
Sep 19, 2018
Publishing guidelines
Categories: DevOps & SRE | Tags: Arch Linux, KVM, Markdown, VM, Vagrant
This is as much a set of guidelines targeting everyone publishing content on the web as rules for reviewers to ensure no validation is forgotten before submitting for publication. It mostly targets…
By David WORMS
Feb 26, 2018
Ambari - How to blueprint
Categories: Big Data, DevOps & SRE | Tags: Ambari, Ranger, Automation, DevOps, Operation, REST
As infrastructure engineers at Adaltas, we deploy Hadoop clusters. A lot of them. Let’s see how to automate this process with REST requests. While really handy for deploying one or two clusters, the…
Jan 17, 2018
Apache Thrift vs REST
Categories: DevOps & SRE, Open Source Summit Europe 2017 | Tags: Thrift, gRPC, HTTP, JSON, REST
Adaltas recently attended the Open Source Summit Europe 2017 in Prague. I had the opportunity to follow a presentation made by Randy Abernethy and Jens Geyer of RM-X, a cloud native consulting company…
Oct 28, 2017
From Dockerfile to Ansible Containers
Categories: Containers Orchestration, DevOps & SRE, Open Source Summit Europe 2017 | Tags: Ansible, Docker, Docker Compose, pip, Shell, YAML
This talk was an introduction to the Dockerfile format and to the Ansible Container tool, followed by a comparison of both. It was held by Tomas Tomecek from Red Hat’s containerization team. The Dockerfile…
Oct 25, 2017
Multi-Repo, Multi-Node Gating at Massive Scale
Categories: Cloud Computing, DevOps & SRE, Open Source Summit Europe 2017 | Tags: Ansible, CI/CD, Infrastructure, Jenkins, Red Hat, Zuul, OpenStack
This is a recap and personal review of Monty Taylor’s presentation of OpenStack’s Continuous Integration tool Zuul at the Open Source Summit 2017 in Prague (not to be confused with Netflix’s Zuul project…
Oct 24, 2017
MiNiFi: Data at Scales & the Values of Starting Small
Categories: Big Data, DevOps & SRE, Infrastructure | Tags: MiNiFi, NiFi, Cloudera, C++, HDP, HDF, IOT
This talk briefly presented Apache NiFi and explained where MiNiFi came from: basically, it’s a minimal NiFi agent to deploy on small devices to bring data to a cluster’s NiFi pipeline (e.g., IoT…
Jul 8, 2017
HDP cluster monitoring
Categories: Big Data, DevOps & SRE, Infrastructure | Tags: Alert, Ambari, HDP, Metrics, Monitoring, REST
With the current growth of Big Data technologies, more and more companies are building their own clusters in the hope of getting some value from their data. One main concern while building these infrastructures…
Jul 5, 2017
Hive Metastore HA with DBTokenStore: Failed to initialize master key
Categories: Big Data, DevOps & SRE | Tags: Hive, Bug, Infrastructure
This article describes my little adventure around a startup error with the Hive Metastore. It should be reproducible with any secure installation, meaning with Kerberos, with high availability enabled…
By David WORMS
Jul 21, 2016
A fresh look at testing Node.js projects: Mocha, Should and Travis
Categories: DevOps & SRE, Node.js | Tags: CI/CD, DevOps, JavaScript, Mocha, Unit tests, Node.js
Today, I finally decided to spend some time around Travis. For a few weeks now, that little green image on top of many GitHub homepages has been bugging me. Well, to be totally honest, this isn…
By David WORMS
Feb 19, 2012
Announcing Mecano, a set of functions for system deployment
Categories: DevOps & SRE, Node.js | Tags: Automation, CoffeeScript, Infrastructure, JavaScript, Open source
Update July 2016: Mecano is now renamed Nikita. We are releasing Node Mecano on GitHub, which gathers common functions used while deploying systems. The idea was to group those functions into a…
By David WORMS
Feb 12, 2012