Site Reliability Engineering (SRE)

SRE is a set of practices that grew out of Google's experience of treating operations as a software problem. Its commitment to the full service lifecycle enables organizations to successfully build, deploy, monitor, and maintain software systems. SRE combines technical and cultural aspects with the shared objective of meeting expected reliability targets.

The five basic principles of the DevOps philosophy, and how SRE implements them, are:

  1. Break down organizational silos

Large companies have complex organizational structures in which many teams work separately in "silos". Each team sees only part of the whole, which leads to inefficiency. The task of DevOps and SRE is to align teams with each other around shared goals and a common vision.

  2. Accept failures in the product lifecycle

Service Level Indicators (SLIs) and Service Level Objectives (SLOs) are used to assess failures. An SLI measures an aspect of service behavior over time, such as availability or response time. An SLO is the target value that the indicator must meet. Each failure leads to a reassessment and optimization of the objectives. SRE quantifies the acceptable level of risk as an "error budget", which teams can spend to test limits and make more radical changes in order to innovate faster.

  3. Implement changes in small, quick steps

Like DevOps, SRE encourages continuous improvement through small and frequent development steps.

  4. Use standard tools and automation

Incompatibility and integration issues between technologies create silos, even in a DevOps environment. SRE introduces common technologies and shared access to information across IT teams. SRE's policy is to automate manual tasks that are repetitive, reactive, and produce no lasting improvement. Automation should free up capacity for work that brings long-term benefits.

  5. Base reliability on measurement data

The various stakeholders need to agree on a common way to measure reliability and on what to do when the value is out of specification. Key DevOps metrics are the number of deployments over time, the time from commit to release, the number of failed deployments, and the time required to recover.
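
To make the relationship between SLI, SLO, and error budget from principle 2 more concrete, here is a minimal Python sketch. It assumes an availability SLI based on request counts and a fixed evaluation window; the `Slo` class, the function names, and the 30-day default are illustrative assumptions, not part of any standard SRE tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical availability objective over a fixed window."""
    target: float          # e.g. 0.999 means 99.9% of requests must succeed
    window_days: int = 30  # assumed evaluation window


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    if total_requests == 0:
        return 1.0  # no traffic, so nothing failed
    return good_requests / total_requests


def allowed_downtime_minutes(slo: Slo) -> float:
    """Error budget expressed as minutes of full outage per window."""
    return (1.0 - slo.target) * slo.window_days * 24 * 60


def error_budget_remaining(slo: Slo, good_requests: int, total_requests: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_failures = (1.0 - slo.target) * total_requests
    actual_failures = total_requests - good_requests
    if allowed_failures == 0:
        # A 100% target (or no traffic) leaves no budget to spend.
        return 0.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures
```

Under these assumptions, a 99.9% target over 30 days allows roughly 43 minutes of full outage. With 1,000,000 requests of which 600 failed, about 40% of the budget remains, so the team could still afford riskier changes. A negative value would signal that the focus should shift back to reliability.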

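The four DevOps metrics named in principle 5 can be derived from a simple deployment log. The sketch below is one possible way to compute them; the `Deployment` record and its field names are hypothetical, and recovery time is assumed to run from the failed release until service is restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of one deployment; field names are illustrative."""
    committed_at: datetime
    released_at: datetime
    failed: bool = False
    recovered_at: Optional[datetime] = None  # set only for failed deployments


def _mean(deltas: list) -> Optional[timedelta]:
    """Average of a list of timedeltas, or None when the list is empty."""
    return sum(deltas, timedelta()) / len(deltas) if deltas else None


def summarize(deployments: list, period_days: int) -> dict:
    """Compute the four metrics over an observation period given in days."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    lead_times = [d.released_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [
        d.recovered_at - d.released_at
        for d in failures
        if d.recovered_at is not None
    ]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "mean_lead_time": _mean(lead_times),       # commit to release
        "failed_deployments": len(failures),
        "mean_recovery_time": _mean(recoveries),   # failed release to recovery
    }
```

Keeping the computation in one function gives stakeholders a single agreed definition for each metric, which is the point of the principle. In practice the records would come from a CI/CD system rather than being built by hand.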