Back in the day, applications were monolithic and we could use an IP address to reach a service. With virtual machines (VMs), multiple hosts started to appear on the same physical machine, each running its own apps. Things were still much the same with VMs as with physical machines, since services were still reachable through an IP.
With microservices, things changed, and people were not prepared. With Mesos or Kubernetes, applications are split into many services with many replicas, and a bunch of services make up an application. Applications may share the same IP and physical host but use different ports. They can also use random IPs and random ports, and start or stop at any time… Change became the only constant.
How do we keep our service registry up to date, keep DNS consistent across servers, set up gateways…?
Following the Wikipedia definition, service discovery is the automatic detection of devices and services offered by these devices on a computer network. A service discovery protocol (SDP) is a network protocol that helps accomplish service discovery.
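To make "automatic detection" concrete, here is a minimal sketch of an in-memory service registry, assuming a heartbeat/TTL model (one common approach, not the only one): instances register, keep refreshing their entry, and are dropped automatically once they stop doing so. `ServiceRegistry`, `heartbeat` and `lookup` are hypothetical names for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple

Address = Tuple[str, int]


class ServiceRegistry:
    """Hypothetical in-memory registry; instances must heartbeat within `ttl` seconds."""

    def __init__(self, ttl: float = 10.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._instances: Dict[str, Dict[Address, float]] = {}

    def register(self, service: str, host: str, port: int) -> None:
        # Registering and heartbeating are the same operation: refresh the timestamp.
        self._instances.setdefault(service, {})[(host, port)] = self.clock()

    heartbeat = register

    def deregister(self, service: str, host: str, port: int) -> None:
        self._instances.get(service, {}).pop((host, port), None)

    def lookup(self, service: str) -> List[Address]:
        """Return live instances, evicting those whose last heartbeat is older than ttl."""
        now = self.clock()
        entries = self._instances.get(service, {})
        for addr in [a for a, seen in entries.items() if now - seen > self.ttl]:
            del entries[addr]
        return sorted(entries)
```

An instance that crashes without deregistering simply stops heartbeating and disappears from `lookup` after `ttl` seconds, which keeps the registry up to date without manual cleanup.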
The main requirements are:
- Up to date
These are typical requirements, but they are not enough:
- Don’t communicate with unhealthy services
- Subsetting: limiting the pool of potential backend tasks with which a client task interacts
- Load distribution smarter than round robin (for example, based on host capacity, neighbours, or responsibilities)
- Minimize waste of resources
- Protection of individual tasks against overload, implemented at the communication layer (rather than in the app)
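The requirements above can be sketched as follows. This assumes two specific, swapped-in techniques that the list does not name: deterministic subsetting in the style described in Google's SRE book (clients in the same "round" shuffle the backends with a shared seed and take disjoint slices), and "power of two choices" load balancing (sample two healthy backends, send to the less loaded one). Function names and the `in_flight` counter are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set


def deterministic_subset(backends: Sequence[str], client_id: int, subset_size: int) -> List[str]:
    """Give each client a small, stable slice of the backends.

    Clients are grouped into rounds; every client in a round shuffles the backend
    list with the same seed and takes a distinct, non-overlapping slice of it.
    """
    if not 0 < subset_size <= len(backends):
        raise ValueError("subset_size must be between 1 and len(backends)")
    subset_count = len(backends) // subset_size
    round_id = client_id // subset_count
    shuffled = list(backends)
    random.Random(round_id).shuffle(shuffled)  # same seed -> same order within a round
    subset_id = client_id % subset_count
    start = subset_id * subset_size
    return shuffled[start:start + subset_size]


def pick_backend(subset: Sequence[str], healthy: Set[str], in_flight: Dict[str, int],
                 rng: random.Random) -> str:
    """Power of two choices: sample two healthy backends, keep the less loaded one."""
    candidates = [b for b in subset if b in healthy]  # never talk to unhealthy tasks
    if not candidates:
        raise RuntimeError("no healthy backend in subset")
    if len(candidates) == 1:
        return candidates[0]
    a, b = rng.sample(candidates, 2)
    return a if in_flight.get(a, 0) <= in_flight.get(b, 0) else b
```

Subsetting bounds the number of connections each client holds, while comparing only two random candidates avoids both the blindness of round robin and the herd effect of always picking the single least-loaded backend.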
The role of the proxy (for example Nginx, Marathon-lb) is to transparently forward connections from the static port that exposes the service to the outside world to the dynamically assigned host/port of the containerized service.
Pros:
- Single point of change, just update the proxy configuration
- No in-app dependencies: no need to update the app
- Fine-grained load balancing
Cons:
- Single Point of Failure (SPoF)
- Additional hop, and exposure to man-in-the-middle (MitM) attacks
- Usually requires a common protocol: the proxy needs to know what a packet looks like and how it should be routed
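The "single point of change" idea can be sketched as regenerating the proxy configuration whenever the set of instances changes. The helper below is hypothetical; it renders a standard Nginx `upstream`/`server` fragment (meant to be included in the `http` context) from a list of discovered host/port pairs. Writing the file and reloading Nginx (for example with `nginx -s reload`) is left out.

```python
from typing import Iterable, Tuple


def render_nginx_config(service: str, listen_port: int,
                        instances: Iterable[Tuple[str, int]]) -> str:
    """Render an Nginx fragment that maps a static port to dynamic instances."""
    instances = sorted(instances)
    if not instances:
        # Nginx rejects an upstream block with no servers in it.
        raise ValueError(f"no instances for {service}")
    servers = "\n".join(f"    server {host}:{port};" for host, port in instances)
    return (
        f"upstream {service} {{\n{servers}\n}}\n"
        f"server {{\n"
        f"    listen {listen_port};\n"
        f"    location / {{\n"
        f"        proxy_pass http://{service};\n"
        f"    }}\n"
        f"}}\n"
    )
```

Clients only ever see the static `listen_port`; when instances move, only this generated file changes, which is the main advantage of the proxy approach.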
Custom SRV records in the DNS are used to route traffic to the containers.
Pros:
- No extra hops
- No SPoF: clients can keep working with stale data
- Protocol independent
Cons:
- In-app dependency: if we need to change anything (e.g. load balancing), the application needs reconfiguration and possibly redeployment
- Local load balancing: no control over global load distribution, no protection against denial of service
- Cache invalidation: clients may keep using stale records until they expire
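With SRV records, the client itself decides which instance to call. A minimal sketch of the selection rules from RFC 2782: only the lowest-priority group is considered, and within it a record is chosen with probability proportional to its weight (weight-0 records are placed first and rarely win). Fetching the records from DNS is assumed to happen elsewhere; `SrvRecord` and `select_srv` are hypothetical names, and only the first pick of the RFC's full ordering procedure is shown.

```python
import random
from typing import List, NamedTuple, Optional


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def select_srv(records: List[SrvRecord], rng: Optional[random.Random] = None) -> SrvRecord:
    """Pick one SRV record following the weighted selection in RFC 2782."""
    if not records:
        raise ValueError("no SRV records")
    rng = rng or random.Random()
    best = min(r.priority for r in records)
    group = [r for r in records if r.priority == best]  # lower priority value wins
    group.sort(key=lambda r: r.weight != 0)  # weight-0 records go first (stable sort)
    total = sum(r.weight for r in group)
    threshold = rng.randint(0, total)  # uniform, inclusive on both ends
    running = 0
    for record in group:
        running += record.weight
        if running >= threshold:
            return record
    return group[-1]  # not reached: running ends at total >= threshold
```

Because every client runs this logic on its own, each one balances load locally and has no view of what other clients are doing, which is the "local load balancing" drawback listed above.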
A service mesh is a dedicated infrastructure layer that makes service-to-service communication secure, fast, reliable, observable and manageable.
It focuses on solving communication problems between containers. It addresses routing, rerouting with graceful degradation as services fail, and secure inter-service communication. In traditional apps, this logic is built directly into the application.
As architectures become increasingly segmented into multiple apps and multiple in-app services, moving communication logic out of the application eases development and integration while making communication more resilient. Just as applications shouldn’t be writing their own TCP stack, they also shouldn’t be managing their own load balancing logic, their own service discovery, or their own retry and timeout policies. With a service mesh, the application code doesn’t need to know about network topology, service discovery, load balancing or connection management.
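To show what "retry and timeout policies" means in practice, here is a minimal sketch of the kind of logic that would otherwise live in every application: a per-attempt timeout handed to the call, and retries with exponential backoff on transient errors only. The function name, defaults and the choice of `ConnectionError`/`TimeoutError` as "transient" are assumptions for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[float], T], attempts: int = 3, timeout: float = 1.0,
                      base_backoff: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` with exponential backoff; `call` receives a per-attempt timeout."""
    last_error: Exception = RuntimeError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call(timeout)
        except (ConnectionError, TimeoutError) as exc:  # retry only transient failures
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_backoff * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
    raise last_error
```

With a service mesh, a policy like this is configured once in the sidecar instead of being reimplemented (slightly differently) in each service and language.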
A service mesh consists of two parts: the sidecar and the controller.
The sidecar is like a plug-in for a service. It is a utility container that runs next to the service's container, with loose coupling between the two. A sidecar container is reusable and can be paired with numerous containers. It is responsible for:
- service discovery
- health checking, registration/deregistration
- routing and load balancing: when two services need to talk, they go through the sidecar
- authentication and authorization (AuthN/Z), refusing unauthorized connections
- protocol abstraction, transparent TLS encryption, HTTP/1 to HTTP/2 upgrade
- metrics/tracing: success rates, request volumes, and latencies
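A few of these responsibilities can be sketched in-process. The hypothetical `Sidecar` below covers only round-robin routing that skips unhealthy endpoints and basic metrics (request count, success rate, latency); it does not attempt TLS or AuthN/Z. Endpoints are plain callables standing in for network calls to other instances, which is an assumption for illustration.

```python
import itertools
import time
from typing import Callable, Dict, List


class Sidecar:
    """Hypothetical sidecar: the app calls `request`, the sidecar handles the rest."""

    def __init__(self, endpoints: Dict[str, Callable[[str], str]],
                 clock: Callable[[], float] = time.perf_counter):
        self.endpoints = endpoints  # name -> callable standing in for a remote instance
        self.healthy = set(endpoints)
        self._rr = itertools.cycle(sorted(endpoints))
        self.clock = clock
        self.requests = 0
        self.successes = 0
        self.latencies: List[float] = []

    def mark_unhealthy(self, name: str) -> None:
        self.healthy.discard(name)  # e.g. after a failed health check

    def request(self, payload: str) -> str:
        # Round-robin over endpoints, skipping unhealthy ones.
        for _ in range(len(self.endpoints)):
            name = next(self._rr)
            if name in self.healthy:
                break
        else:
            raise ConnectionError("no healthy endpoint")
        start = self.clock()
        self.requests += 1
        try:
            result = self.endpoints[name](payload)
            self.successes += 1
            return result
        finally:
            self.latencies.append(self.clock() - start)  # recorded for metrics

    def success_rate(self) -> float:
        return self.successes / self.requests if self.requests else 1.0
```

The application only ever calls `request`; which instance answers, and the bookkeeping around it, stay outside the application code.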
The controller is the brain of the system, responsible for managing and configuring proxies to route traffic, as well as enforcing policies at runtime. It gathers data from the sidecars, notifies them about how they should behave, and talks to the scheduler about which instances are running and which ones should be up or down… It also takes care of deployments.
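The controller's feedback loop can be sketched as follows: it collects instance state (from the scheduler or from sidecar health reports), derives a routing table containing only healthy instances, and pushes it to every subscribed sidecar. `Controller`, `report` and `subscribe` are hypothetical names; real control planes push configuration through their own APIs.

```python
from typing import Callable, Dict, List

RoutingTable = Dict[str, List[str]]


class Controller:
    """Hypothetical controller: turns instance state into routing tables for sidecars."""

    def __init__(self) -> None:
        self.instances: Dict[str, Dict[str, bool]] = {}  # service -> {addr: healthy}
        self.subscribers: List[Callable[[RoutingTable], None]] = []

    def subscribe(self, push: Callable[[RoutingTable], None]) -> None:
        self.subscribers.append(push)  # a sidecar registers to receive updates
        push(self.routing_table())  # send the current state immediately

    def report(self, service: str, addr: str, healthy: bool) -> None:
        """Called with scheduler events or sidecar health reports."""
        self.instances.setdefault(service, {})[addr] = healthy
        self._broadcast()

    def remove(self, service: str, addr: str) -> None:
        self.instances.get(service, {}).pop(addr, None)
        self._broadcast()

    def routing_table(self) -> RoutingTable:
        # Only healthy instances are advertised to sidecars.
        return {svc: sorted(a for a, ok in addrs.items() if ok)
                for svc, addrs in self.instances.items()}

    def _broadcast(self) -> None:
        table = self.routing_table()
        for push in self.subscribers:
            push(table)
```

Sidecars never talk to the scheduler directly; they just apply whatever table the controller last pushed.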
- No load balancing
- No circuit breaking/retry policy
- No tracing
At the time of writing, Linkerd and Istio are two open-source projects that are considered mature. Linkerd can run on Kubernetes, DC/OS, and a cluster of host machines. It is part of the Cloud Native Computing Foundation, is built on top of Finagle, and can use gRPC. Istio currently runs only on Kubernetes. In Istio, the sidecar proxies (based on the Envoy proxy) form the data plane, and the controller is called the control plane.