Docker

Related articles

Introducing Apache Airflow on AWS

Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: Airflow, Oozie, Spark, PySpark, Docker, Learning and tutorial, AWS, Python

Apache Airflow offers a potential solution to the growing challenge of managing an increasingly complex landscape of data management tools, scripts and analytics processes. It is an open-source…

Aargan COINTEPAS

By Aargan COINTEPAS

May 5, 2020

Expose a Rook-based Ceph cluster outside of Kubernetes

Categories: Containers Orchestration | Tags: Container, Debug, Docker, Rook, Ceph, Kubernetes

We recently deployed a LXD based Hadoop cluster and we wanted to be able to apply size quotas on some filesystems (ie: service logs, user homes). Quota is a built in feature of the Linux kernel used…

Leo SCHOUKROUN

By Leo SCHOUKROUN

Apr 16, 2020

Install and debug Kubernetes inside LXD

Categories: Containers Orchestration | Tags: Container, Debug, Docker, Linux, LXD, Kubernetes

We recently deployed a Kubernetes cluster with the need to maintain clusters isolation on our bare metal nodes across our infrastructure. We knew that Virtual Machines would provide the required…

Leo SCHOUKROUN

By Leo SCHOUKROUN

Feb 4, 2020

Policy enforcing with Open Policy Agent

Categories: Cyber Security, Data Governance | Tags: Kafka, Ranger, Authorization, Cloud, REST, Kubernetes, SSL/TLS

Open Policy Agent is an open-source multi-purpose policy engine. Its main goal is to unify policy enforcement across the cloud native stack. The project was created by Styra and it is currently…

Leo SCHOUKROUN

By Leo SCHOUKROUN

Jan 22, 2020

Logstash pipelines remote configuration and self-indexing

Categories: Data Engineering, Infrastructure | Tags: Docker, Elasticsearch, Kibana, Logstash, Log4j

Logstash is a powerful data collection engine that integrates in the Elastic Stack (Elasticsearch - Logstash - Kibana). The goal of this article is to show you how to deploy a fully managed Logstash…

Paul-Adrien CORDONNIER

By Paul-Adrien CORDONNIER

Dec 13, 2019

Machine Learning model deployment

Categories: Big Data, Data Engineering, Data Science, DevOps & SRE | Tags: Cloud, DevOps, On-premise, Operation, Schema, AI, Machine Learning, MLOps

“Enterprise Machine Learning requires looking at the big picture … from a data engineering and a data platform perspective,” lectured Justin Norman during the talk on the deployment of Machine…

Oskar RYNKIEWICZ

By Oskar RYNKIEWICZ

Sep 30, 2019

TensorFlow installation on Docker

Categories: Containers Orchestration, Data Science, Learning | Tags: CPU, Deep Learning, Docker, Jupyter, Linux, AI, TensorFlow

TensorFlow is an Open Source software from Google for numerical computation using a graph representation: Vertex (nodes) represent mathematical operations Edges represent N-dimensional data array…

Pierre SAUVAGE

By Pierre SAUVAGE

Aug 5, 2019

Introduction to Cloudera Data Science Workbench

Categories: Data Science | Tags: Cloudera, Docker, Git, Kubernetes, Machine Learning, Azure, Notebook

Cloudera Data Science Workbench is a platform that allows Data Scientists to create, manage, run and schedule data science workflows from their browser. Thus it enables them to focus on their main…

Mehdi ELALAMI

By Mehdi ELALAMI

Feb 28, 2019

Installing Kubernetes on CentOS 7

Categories: Containers Orchestration | Tags: CentOS, cgroups, CNCF, DevOps, Docker, Infrastructure, Namespaces, Red Hat, VM, Ceph, Kubernetes

This article explains how to install a Kubernetes cluster. I will dive into what each step does so you can build a thorough understanding of what is going on. This article is based on my talk from the…

Arthur BUSSER

By Arthur BUSSER

Jan 29, 2019

LXD: The Missing Piece

Categories: Containers Orchestration | Tags: CPU, Docker, Linux, LXD, VM, Kubernetes

LXD stands for Linux Container Daemon. Yet another container technology. But LXD is very different. It stands apart from the pack. It is not necessarily better nor much faster nor more secure! But it…

Tariq SAHNOUNI

By Tariq SAHNOUNI

Dec 28, 2018

Monitoring a production Hadoop cluster with Kubernetes

Categories: DevOps & SRE | Tags: Thrift, Docker, Elasticsearch, Graphana, Node.js, Prometheus, Shinken, Hadoop, Knox, Kubernetes, Python

Monitoring a production grade Hadoop cluster is a real challenge and needs to be constantly evolving. The software we use today is based on Nagios. Very efficient when it comes to the simplest…

Paul-Adrien CORDONNIER

By Paul-Adrien CORDONNIER

Dec 21, 2018

Microsoft introduces Cloud Native Application Bundles

Categories: Containers Orchestration | Tags: CLI, Docker, Helm, Packaging, Kubernetes

At DockerCon EU 2018 in Barcelona, Matt Butcher, Principal Engineer at Microsoft and inventor of Helm, introduced CNAB, Cloud Native Application Bundles, a packaging format for distributed…

Arthur BUSSER

By Arthur BUSSER

Dec 4, 2018

Clusters and workloads migration from Hadoop 2 to Hadoop 3

Categories: Big Data, Infrastructure | Tags: HDFS, Slider, Spark, YARN, Docker, Erasure Coding, Rolling Upgrade

Hadoop 2 to Hadoop 3 migration is a hot subject. How to upgrade your clusters, which features present in the new release may solve current problems and bring new opportunities, how are your current…

Lucas BAKALIAN

By Lucas BAKALIAN

Jul 25, 2018

Apache Hadoop YARN 3.0 – State of the union

Categories: Big Data, DataWorks Summit 2018 | Tags: HDFS, MapReduce, YARN, Cloudera, Docker, GPU, Hortonworks, Release and features, Hadoop

This article covers the ”Apache Hadoop YARN: state of the union” talk held by Wangda Tan from Hortonworks during the Dataworks Summit 2018. What is Apache YARN? As a reminder, YARN is one of the two…

Lucas BAKALIAN

By Lucas BAKALIAN

May 31, 2018

YARN and GPU Distribution for Machine Learning

Categories: Data Science, DataWorks Summit 2018 | Tags: YARN, GPU, Machine Learning, Neural Network, Storage

This article goes over the fundamental principles of Machine Learning and what tools are currently used to run machine learning algorithms. We will then see how a resource manager such as YARN can be…

Grégor JOUET

By Grégor JOUET

May 30, 2018

What's new in Apache Spark 2.3?

Categories: Data Engineering, DataWorks Summit 2018 | Tags: Arrow, ORC, Spark, PySpark, Docker, Streaming, Tuning, Spark MLlib, Kubernetes, pandas

Let’s dive into the new features offered by the 2.3 distribution of Apache Spark. This article is a composition of the following talks seen at the DataWorks Summit 2018 and additional research: Apache…

César BEREZOWSKI

By César BEREZOWSKI

May 23, 2018

Mesos Introduction

Categories: Containers Orchestration, Open Source Summit Europe 2017 | Tags: Mesos, Container, Container Orchestration, CUDA, Docker, GPU

Apache Mesos is an open source cluster management project designed to implement and optimize distributed systems. Mesos enables the management and sharing of resources in a fine and dynamic way…

Louis BIANCHERIN

By Louis BIANCHERIN

Nov 15, 2017

Kubernetes Storage Primitives for Stateful Workloads

Categories: Cloud Computing, Containers Orchestration, Open Source Summit Europe 2017 | Tags: Docker, Container Storage Interface (CSI), PVC, GCE, Kubernetes, Azure, Storage

This article is based on the presentation “Introduction to Kubernetes Storage Primitives for Stateful Workloads” from the OSS Convention Prague 2017 by the {Code} team. So, let’s start, what is…

Pierre SAUVAGE

By Pierre SAUVAGE

Oct 28, 2017

Nobody* puts Java in a Container

Categories: Containers Orchestration, Open Source Summit Europe 2017, Infrastructure | Tags: cgroups, Docker, Java, JRE, JVM, Namespaces

This talk was about the issues of putting Java in a container and how, in its latest version, the JDK is now more aware of the container it is running in. The presentation is led by Joerg Schad…

Paul-Adrien CORDONNIER

By Paul-Adrien CORDONNIER

Oct 28, 2017

From Dockerfile to Ansible Containers

Categories: Containers Orchestration, DevOps & SRE, Open Source Summit Europe 2017 | Tags: Ansible, Docker, Docker Compose, pip, Shell, YAML

This talk was an introduction to the Dockerfile format and to Ansible container’s tool and then a comparison of both. It was hold by Tomas Tomecek from Red Hat’s containerization team. The Dockerfile…

César BEREZOWSKI

By César BEREZOWSKI

Oct 25, 2017

Network Namespace without Docker

Categories: Hack | Tags: DNS, Docker, Linux, Namespaces, Network, VLAN

Let’s imagine the following use case: I am connected to several networks (wlan0, eth0, usb0). I want to choose which network I’m gonna use when I launch apps. My app doesn’t allow me to choose a…

Pierre SAUVAGE

By Pierre SAUVAGE

Jul 6, 2016

Canada - Morocco - France

International locations

10 rue de la Kasbah
2393 Rabbat
Canada

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and increase the time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.