Docker
Related articles
Installing Hadoop from source: build, patch and run
Categories: Big Data, Infrastructure | Tags: HDFS, Maven, Docker, Java, LXD, Unit tests, Hadoop
Commercial Apache Hadoop distributions have come and gone. The two leaders, Cloudera and Hortonworks, have merged: HDP is no more and CDH is now CDP. MapR has been acquired by HP and IBM BigInsights…
Aug 4, 2020
Introducing Apache Airflow on AWS
Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: Airflow, Oozie, Spark, PySpark, Docker, Learning and tutorial, AWS, Python
Apache Airflow offers a potential solution to the growing challenge of managing an increasingly complex landscape of data management tools, scripts and analytics processes. It is an open-source…
May 5, 2020
Expose a Rook-based Ceph cluster outside of Kubernetes
Categories: Containers Orchestration | Tags: Container, Debug, Docker, Rook, Ceph, Kubernetes
We recently deployed an LXD-based Hadoop cluster and we wanted to be able to apply size quotas on some filesystems (i.e. service logs, user homes). Quota is a built-in feature of the Linux kernel used…
Apr 16, 2020
Install and debug Kubernetes inside LXD
Categories: Containers Orchestration | Tags: Container, Debug, Docker, Linux, LXD, Kubernetes, Node
We recently deployed a Kubernetes cluster with the need to maintain cluster isolation on our bare metal nodes across our infrastructure. We knew that Virtual Machines would provide the required…
Feb 4, 2020
Policy enforcing with Open Policy Agent
Categories: Cyber Security, Data Governance | Tags: Kafka, Ranger, Authorization, REST, Cloud, Kubernetes, SSL/TLS
Open Policy Agent is an open-source multi-purpose policy engine. Its main goal is to unify policy enforcement across the cloud native stack. The project was created by Styra and it is currently…
Jan 22, 2020
Logstash pipelines remote configuration and self-indexing
Categories: Data Engineering, Infrastructure | Tags: Docker, Elasticsearch, Kibana, Logstash, Log4j
Logstash is a powerful data collection engine that integrates into the Elastic Stack (Elasticsearch - Logstash - Kibana). The goal of this article is to show you how to deploy a fully managed Logstash…
Dec 13, 2019
Machine Learning model deployment
Categories: Big Data, Data Engineering, Data Science, DevOps & SRE | Tags: DevOps, Operation, AI, Cloud, Machine Learning, MLOps, On-premises, Schema
“Enterprise Machine Learning requires looking at the big picture … from a data engineering and a data platform perspective,” lectured Justin Norman during the talk on the deployment of Machine…
Sep 30, 2019
TensorFlow installation on Docker
Categories: Containers Orchestration, Data Science, Learning | Tags: CPU, Deep Learning, Docker, Jupyter, Linux, AI, TensorFlow
TensorFlow is open source software from Google for numerical computation using a graph representation: vertices (nodes) represent mathematical operations, while edges represent N-dimensional data array…
Aug 5, 2019
Introduction to Cloudera Data Science Workbench
Categories: Data Science | Tags: Cloudera, Docker, Git, Kubernetes, Machine Learning, Azure, Notebook
Cloudera Data Science Workbench is a platform that allows Data Scientists to create, manage, run and schedule data science workflows from their browser. Thus it enables them to focus on their main…
Feb 28, 2019
Installing Kubernetes on CentOS 7
Categories: Containers Orchestration | Tags: CentOS, cgroups, CNCF, DevOps, Docker, Infrastructure, Namespaces, Red Hat, VM, Ceph, Kubernetes
This article explains how to install a Kubernetes cluster. I will dive into what each step does so you can build a thorough understanding of what is going on. This article is based on my talk from the…
Jan 29, 2019
LXD: The Missing Piece
Categories: Containers Orchestration | Tags: CPU, Docker, Linux, LXD, VM, Kubernetes
LXD stands for Linux Container Daemon. Yet another container technology. But LXD is very different. It stands apart from the pack. It is not necessarily better nor much faster nor more secure! But it…
Dec 28, 2018
Monitoring a production Hadoop cluster with Kubernetes
Categories: DevOps & SRE | Tags: Thrift, Docker, Elasticsearch, Grafana, Prometheus, Shinken, Hadoop, Knox, Cluster, Kubernetes, Node, Node.js, Python
Monitoring a production-grade Hadoop cluster is a real challenge and needs to be constantly evolving. The software we use today is based on Nagios. Very efficient when it comes to the simplest…
Dec 21, 2018
Microsoft introduces Cloud Native Application Bundles
Categories: Containers Orchestration | Tags: CLI, Docker, Helm, Packaging, Kubernetes
At DockerCon EU 2018 in Barcelona, Matt Butcher, Principal Engineer at Microsoft and inventor of Helm, introduced CNAB, Cloud Native Application Bundles, a packaging format for distributed…
Dec 4, 2018
Clusters and workloads migration from Hadoop 2 to Hadoop 3
Categories: Big Data, Infrastructure | Tags: HDFS, Slider, Spark, YARN, Docker, Erasure Coding, Rolling Upgrade
Hadoop 2 to Hadoop 3 migration is a hot subject. How to upgrade your clusters, which features present in the new release may solve current problems and bring new opportunities, how are your current…
Jul 25, 2018
Apache Hadoop YARN 3.0 – State of the union
Categories: Big Data, DataWorks Summit 2018 | Tags: HDFS, MapReduce, YARN, Cloudera, Docker, GPU, Hortonworks, Hadoop, Data Science, Release and features
This article covers the “Apache Hadoop YARN: state of the union” talk held by Wangda Tan from Hortonworks during the DataWorks Summit 2018. What is Apache YARN? As a reminder, YARN is one of the two…
May 31, 2018
YARN and GPU Distribution for Machine Learning
Categories: Data Science, DataWorks Summit 2018 | Tags: YARN, GPU, Machine Learning, Neural Network, Storage
This article goes over the fundamental principles of Machine Learning and what tools are currently used to run machine learning algorithms. We will then see how a resource manager such as YARN can be…
By Grégor JOUET
May 30, 2018
What's new in Apache Spark 2.3?
Categories: Data Engineering, DataWorks Summit 2018 | Tags: Arrow, ORC, Spark, PySpark, Docker, Streaming, Tuning, Spark MLlib, Data Science, Kubernetes, pandas
Let’s dive into the new features offered by the 2.3 distribution of Apache Spark. This article is a composition of the following talks seen at the DataWorks Summit 2018 and additional research: Apache…
May 23, 2018
Mesos Introduction
Categories: Containers Orchestration, Open Source Summit Europe 2017 | Tags: Mesos, Container, Container Orchestration, CUDA, Docker, GPU, Data Science
Apache Mesos is an open source cluster management project designed to implement and optimize distributed systems. Mesos enables the management and sharing of resources in a fine-grained and dynamic way…
Nov 15, 2017
Kubernetes Storage Primitives for Stateful Workloads
Categories: Cloud Computing, Containers Orchestration, Open Source Summit Europe 2017 | Tags: Docker, Container Storage Interface (CSI), PVC, GCE, Kubernetes, Azure, Storage
This article is based on the presentation “Introduction to Kubernetes Storage Primitives for Stateful Workloads” from the OSS Convention Prague 2017 by the {Code} team. So, let’s start, what is…
Oct 28, 2017
Nobody* puts Java in a Container
Categories: Containers Orchestration, Open Source Summit Europe 2017, Infrastructure | Tags: cgroups, Docker, Java, JRE, JVM, Namespaces
This talk was about the issues of putting Java in a container and how, in its latest version, the JDK is now more aware of the container it is running in. The presentation is led by Joerg Schad…
Oct 28, 2017
From Dockerfile to Ansible Containers
Categories: Containers Orchestration, DevOps & SRE, Open Source Summit Europe 2017 | Tags: Ansible, Docker, Docker Compose, pip, Shell, YAML
This talk was an introduction to the Dockerfile format and to the Ansible Container tool, followed by a comparison of the two. It was given by Tomas Tomecek from Red Hat’s containerization team. The Dockerfile…
Oct 25, 2017
Network Namespace without Docker
Categories: Hack | Tags: DNS, Docker, Linux, Namespaces, Network, VLAN
Let’s imagine the following use case: I am connected to several networks (wlan0, eth0, usb0). I want to choose which network I’m gonna use when I launch apps. My app doesn’t allow me to choose a…
Jul 6, 2016