Docker
Docker is an open-source project that pioneered the use of container technologies, initially with LXC, the container runtime it relied on at the time. Unlike virtual machines, which emulate hardware, containers share the resources of the host operating system and are much more efficient. Docker leverages Linux kernel features, including cgroups and namespaces, to isolate processes and ensure they run independently with the expected resources. Docker also streamlines building containers and sharing and versioning images.
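To make the kernel features mentioned above more concrete, here is a minimal Python sketch that lists the namespaces and cgroups the current process belongs to. It is not how Docker itself works internally; it only reads the standard Linux `/proc` interface. The function names `read_namespaces` and `read_cgroups` are hypothetical helpers chosen for illustration, and the sketch assumes a Linux host (on other systems it simply returns empty results).

```python
import os
from pathlib import Path


def read_namespaces(pid="self"):
    """Return a mapping of namespace type to identifier for a process.

    On Linux, each entry under /proc/<pid>/ns is a symbolic link whose
    target looks like 'pid:[4026531836]'. Two processes that share a
    namespace see the same identifier. When the directory is missing
    (non-Linux system or unknown pid), an empty dict is returned.
    """
    ns_dir = Path("/proc") / str(pid) / "ns"
    namespaces = {}
    if not ns_dir.is_dir():
        return namespaces
    for entry in sorted(ns_dir.iterdir()):
        try:
            namespaces[entry.name] = os.readlink(entry)
        except OSError:
            # Some entries may be unreadable without privileges.
            continue
    return namespaces


def read_cgroups(pid="self"):
    """Return the raw lines of /proc/<pid>/cgroup, or [] when unavailable."""
    cgroup_file = Path("/proc") / str(pid) / "cgroup"
    try:
        return cgroup_file.read_text().splitlines()
    except OSError:
        return []


if __name__ == "__main__":
    # Print one line per namespace type, then the cgroup memberships.
    for name, ident in read_namespaces().items():
        print(f"{name:10s} {ident}")
    for line in read_cgroups():
        print(line)
```

Running the script on the host and again inside a container would typically show different namespace identifiers, which is one way to observe the isolation described above.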
Docker images are built from a Dockerfile and uploaded to online registries such as Docker Hub, where they can be shared publicly with the community or privately within your organization. By design, Docker enforces modularity and allows developers to pack, ship, and run any application as a lightweight, portable, and self-sufficient container. By simplifying their usage and packaging all the underlying technologies into a coherent product, Docker was the main driver of the adoption of container technologies across the industry.
Related tags
- Kubernetes
Related articles

Deploy your containerized AI applications with nvidia-docker
Categories: Containers Orchestration, Data Science | Tags: containerd, DevOps, Learning and tutorial, NVIDIA, Container, Docker, Keras, TensorFlow
More and more products and services are taking advantage of the modeling and prediction capabilities of AI. This article presents the nvidia-docker tool for integrating AI (Artificial Intelligence…
Mar 24, 2022

Internship in Web Technologies
Categories: Front End, Learning | Tags: DevOps, LDAP, React.js, CI/CD, Docker, GraphQL, IaC, Internship, Kubernetes, Node.js, OAuth2
Job Description As part of its Big Data activities, Adaltas Academy is an information-sharing platform bringing together articles, training content, and a knowledge base. The users of the platform are…
By David WORMS
Oct 14, 2021

Adaltas Summit 2021, 2nd edition in Corsica
Categories: Adaltas Summit 2021, Learning | Tags: Ansible, Hadoop, Spark, Azure, Blockchain, Deep Learning, Docker, Terraform, Kubernetes, Node.js
For its second edition, the whole Adaltas crew is gathering in Corsica for a whole week with 2 days dedicated to technology the 23rd and the 24th of september 2021. After a year and a half of sanitary…
By David WORMS
Sep 21, 2021

Desacralizing the Linux overlay filesystem in Docker
Categories: Containers Orchestration, Infrastructure | Tags: DevOps, File system, Linux, Docker
Overlay filesystems (also called union filesystems) is a fundamental technology in Docker to create images and containers. They allow creating a union of directories to create a filesystem. Multiple…
By David WORMS
Jun 3, 2021

Apache Liminal: when MLOps meets GitOps
Categories: Big Data, Containers Orchestration, Data Engineering, Data Science, Tech Radar | Tags: Data Engineering, CI/CD, Data Science, Deep Learning, Deployment, Docker, GitOps, Kubernetes, Machine Learning, MLOps, Open source, Python, TensorFlow
Apache Liminal is an open-source software which proposes a solution to deploy end-to-end Machine Learning pipelines. Indeed it permits to centralize all the steps needed to construct Machine Learning…
Mar 31, 2021

Installing Hadoop from source: build, patch and run
Categories: Big Data, Infrastructure | Tags: Maven, Java, LXD, Unit tests, Hadoop, HDFS, Docker, TDP
Commercial Apache Hadoop distributions have come and gone. The two leaders, Cloudera and Hortonworks, have merged: HDP is no more and CDH is now CDP. MapR has been acquired by HP and IBM BigInsights…
Aug 4, 2020

Introducing Apache Airflow on AWS
Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: Airflow, PySpark, Learning and tutorial, Oozie, Spark, AWS, Docker, Python
Apache Airflow offers a potential solution to the growing challenge of managing an increasingly complex landscape of data management tools, scripts and analytics processes. It is an open-source…
May 5, 2020

Expose a Rook-based Ceph cluster outside of Kubernetes
Categories: Containers Orchestration | Tags: Debug, Rook, Ceph, Container, Docker, Kubernetes
We recently deployed a LXD based Hadoop cluster and we wanted to be able to apply size quotas on some filesystems (ie: service logs, user homes). Quota is a built in feature of the Linux kernel used…
Apr 16, 2020

Install and debug Kubernetes inside LXD
Categories: Containers Orchestration | Tags: Debug, Linux, LXD, Container, Docker, Kubernetes, Node
We recently deployed a Kubernetes cluster with the need to maintain clusters isolation on our bare metal nodes across our infrastructure. We knew that Virtual Machines would provide the required…
Feb 4, 2020

Policy enforcing with Open Policy Agent
Categories: Cyber Security, Data Governance | Tags: Ranger, REST, Kafka, Authorization, Cloud, Kubernetes, SSL/TLS
Open Policy Agent is an open-source multi-purpose policy engine. Its main goal is to unify policy enforcement across the cloud native stack. The project was created by Styra and it is currently…
Jan 22, 2020

Logstash pipelines remote configuration and self-indexing
Categories: Data Engineering, Infrastructure | Tags: Docker, Elasticsearch, Kibana, Logstash, Log4j
Logstash is a powerful data collection engine that integrates in the Elastic Stack (Elasticsearch - Logstash - Kibana). The goal of this article is to show you how to deploy a fully managed Logstash…
Dec 13, 2019

Machine Learning model deployment
Categories: Big Data, Data Engineering, Data Science, DevOps & SRE | Tags: DevOps, Operation, AI, Cloud, Machine Learning, MLOps, On-premises, Schema
“Enterprise Machine Learning requires looking at the big picture […] from a data engineering and a data platform perspective,” lectured Justin Norman during the talk on the deployment of Machine…
Sep 30, 2019

TensorFlow installation on Docker
Categories: Containers Orchestration, Data Science, Learning | Tags: CPU, Jupyter, Linux, AI, Deep Learning, Docker, TensorFlow
TensorFlow is an Open Source software from Google for numerical computation using a graph representation: Vertex (nodes) represent mathematical operations Edges represent N-dimensional data array…
Aug 5, 2019

Introduction to Cloudera Data Science Workbench
Categories: Data Science | Tags: Azure, Cloudera, Docker, Git, Kubernetes, Machine Learning, MLOps, Notebook
Cloudera Data Science Workbench is a platform that allows Data Scientists to create, manage, run and schedule data science workflows from their browser. Thus it enables them to focus on their main…
Feb 28, 2019

Installing Kubernetes on CentOS 7
Categories: Containers Orchestration | Tags: CentOS, cgroups, CNCF, DevOps, Infrastructure, Namespaces, Red Hat, VM, Ceph, Docker, Kubernetes
This article explains how to install a Kubernetes cluster. I will dive into what each step does so you can build a thorough understanding of what is going on. This article is based on my talk from the…
Jan 29, 2019

LXD: The Missing Piece
Categories: Containers Orchestration | Tags: CPU, Linux, LXD, VM, Docker, Kubernetes
LXD stands for Linux Container Daemon. Yet another container technology. But LXD is very different. It stands apart from the pack. It is not necessarily better nor much faster nor more secure! But it…
Dec 28, 2018

Monitoring a production Hadoop cluster with Kubernetes
Categories: DevOps & SRE | Tags: Thrift, Grafana, Shinken, Hadoop, Knox, Cluster, Docker, Elasticsearch, Kubernetes, Node, Node.js, Prometheus, Python
Monitoring a production grade Hadoop cluster is a real challenge and needs to be constantly evolving. The software we use today is based on Nagios. Very efficient when it comes to the simplest…
Dec 21, 2018

Microsoft introduces Cloud Native Application Bundles
Categories: Containers Orchestration | Tags: CLI, Helm, Packaging, Docker, Kubernetes
At DockerCon EU 2018 in Barcelona, Matt Butcher, Principal Engineer at Microsoft and inventor of Helm, introduced CNAB, Cloud Native Application Bundles, a packaging format for distributed…
Dec 4, 2018

Clusters and workloads migration from Hadoop 2 to Hadoop 3
Categories: Big Data, Infrastructure | Tags: Slider, YARN, Erasure Coding, Rolling Upgrade, HDFS, Spark, Docker
Hadoop 2 to Hadoop 3 migration is a hot subject. How to upgrade your clusters, which features present in the new release may solve current problems and bring new opportunities, how are your current…
Jul 25, 2018

Apache Hadoop YARN 3.0 – State of the union
Categories: Big Data, DataWorks Summit 2018 | Tags: YARN, GPU, Hortonworks, Hadoop, HDFS, MapReduce, Cloudera, Data Science, Docker, Release and features
This article covers the ”Apache Hadoop YARN: state of the union” talk held by Wangda Tan from Hortonworks during the Dataworks Summit 2018. What is Apache YARN? As a reminder, YARN is one of the two…
May 31, 2018

YARN and GPU Distribution for Machine Learning
Categories: Data Science, DataWorks Summit 2018 | Tags: YARN, GPU, Machine Learning, Neural Network, Storage
This article goes over the fundamental principles of Machine Learning and what tools are currently used to run machine learning algorithms. We will then see how a resource manager such as YARN can be…
By Grégor JOUET
May 30, 2018

Mesos Introduction
Categories: Containers Orchestration, Open Source Summit Europe 2017 | Tags: Mesos, Container Orchestration, CUDA, GPU, Container, Data Science, Docker
Apache Mesos is an open source cluster management project designed to implement and optimize distributed systems. Mesos enables the management and sharing of resources in a fine and dynamic way…
Nov 15, 2017

From Dockerfile to Ansible Containers
Categories: Containers Orchestration, DevOps & SRE, Open Source Summit Europe 2017 | Tags: Ansible, pip, Shell, YAML, Docker, Docker Compose
This talk was an introduction to the Dockerfile format and to Ansible container’s tool and then a comparison of both. It was hold by Tomas Tomecek from Red Hat’s containerization team. The Dockerfile…
Oct 25, 2017

Kubernetes Storage Primitives for Stateful Workloads
Categories: Cloud Computing, Containers Orchestration, Open Source Summit Europe 2017 | Tags: Container Storage Interface (CSI), PVC, Azure, Docker, GCE, Kubernetes, Storage
This article is based on the presentation “Introduction to Kubernetes Storage Primitives for Stateful Workloads” from the OSS Convention Prague 2017 by the {Code} team. So, let’s start, what is…
Oct 28, 2017

Nobody* puts Java in a Container
Categories: Containers Orchestration, Open Source Summit Europe 2017, Infrastructure | Tags: cgroups, Java, JRE, JVM, Namespaces, Docker
This talk was about the issues of putting Java in a container and how, in its latest version, the JDK is now more aware of the container it is running in. The presentation is led by Joerg Schad…
Oct 28, 2017

Network Namespace without Docker
Categories: Hack | Tags: DNS, Linux, Namespaces, VLAN, Docker, Network
Let’s imagine the following use case: I am connected to several networks (wlan0, eth0, usb0). I want to choose which network I’m gonna use when I launch apps. My app doesn’t allow me to choose a…
Jul 6, 2016

What's new in Apache Spark 2.3?
Categories: Data Engineering, DataWorks Summit 2018 | Tags: Arrow, ORC, PySpark, Tuning, Spark, Spark MLlib, Data Science, Docker, Kubernetes, pandas, Streaming
Let’s dive into the new features offered by the 2.3 distribution of Apache Spark. This article is a composition of the following talks seen at the DataWorks Summit 2018 and additional research: Apache…
May 23, 2018