Storage

Storage is the capacity to retain digital information on a computer component. In practice, storage is organised in a hierarchy: hot data, which requires fast but costly access, is placed closer to the CPU, while cold data is kept further away on slower but persistent devices, sometimes accessed through the network. Fast but volatile storage is most often called "memory". The main characteristics of storage include volatility, mutability, accessibility, addressability, capacity, performance, energy use and security.
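The hierarchy idea can be illustrated with a minimal Python sketch. It assumes a toy two-tier design in which a small, fast, volatile tier sits in front of a larger, slower, persistent one, and recently read data is promoted into the fast tier. The class name `TieredStore`, the `hot_capacity` parameter and the least-recently-used eviction policy are illustrative assumptions, not taken from any particular system. Real hierarchies (CPU caches, RAM, SSD, HDD, network storage) have more tiers and more elaborate policies.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical two-tier store: a small, fast, volatile tier in front of a
    larger, slower, persistent tier. Both tiers are in-process structures here;
    the "cold" dict only stands in for a disk or network device."""

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # fast tier, ordered from least to most recently used
        self.cold = {}            # slow tier, stands in for persistent storage
        self.hot_hits = 0
        self.cold_hits = 0

    def put(self, key, value):
        # Writes always reach the persistent tier, so data survives eviction from the hot tier.
        self.cold[key] = value
        # Keep the cached copy consistent if the key is already hot.
        if key in self.hot:
            self.hot[key] = value
            self.hot.move_to_end(key)

    def get(self, key):
        if key in self.hot:
            self.hot_hits += 1
            self.hot.move_to_end(key)  # mark as most recently used
            return self.hot[key]
        value = self.cold[key]  # raises KeyError if the key was never stored
        self.cold_hits += 1
        self._promote(key, value)
        return value

    def _promote(self, key, value):
        # Copy recently read data into the hot tier, evicting the least recently used entry.
        self.hot[key] = value
        self.hot.move_to_end(key)
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)


# Usage: a hot tier holding at most two entries.
store = TieredStore(hot_capacity=2)
for k in "abc":
    store.put(k, k.upper())
store.get("a")  # served from the cold tier, then promoted
store.get("a")  # served from the hot tier
store.get("b")  # cold, promoted
store.get("c")  # cold, promoted; "a" is evicted as least recently used
```

After these reads, the hot tier holds only "b" and "c", and three of the four reads had to go to the slow tier. The trade-off mirrors the paragraph above: the fast tier is small and loses its content on eviction, while the persistent tier keeps everything at a higher access cost.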

Related articles

Rook with Ceph doesn't provision my Persistent Volume Claims!

Categories: DevOps & SRE | Tags: PVC, Linux, Rook, Ubuntu, Ceph, Kubernetes

A Ceph installation inside Kubernetes can be provisioned using Rook. While doing an internship at Adaltas, I took part in setting up a Kubernetes (k8s) cluster. To avoid…

By Eyal CHOJNOWSKI

Sep 9, 2019

Running Apache Hive 3, new features and tips and tricks

Categories: Big Data, Business Intelligence, DataWorks Summit 2019 | Tags: Druid, Hive, Kafka, JDBC, LLAP, Release and features, Hadoop

Apache Hive 3 brings a bunch of new and nice features to the data warehouse. Unfortunately, like many major FOSS releases, it comes with a few bugs and not much documentation. It is available since…

By Gauthier LEONARD

Jul 25, 2019

Apache Flink: past, present and future

Categories: Data Engineering | Tags: Flink, Pipeline, Streaming, Kubernetes, Machine Learning, SQL

Apache Flink is a little gem which deserves a lot more attention. Let’s dive into Flink’s past, its current state and the future it is heading to by following the keynotes and presentations at Flink…

By César BEREZOWSKI

Nov 5, 2018

YARN and GPU Distribution for Machine Learning

Categories: Data Science, DataWorks Summit 2018 | Tags: YARN, GPU, Machine Learning, Neural Network, Storage

This article goes over the fundamental principles of Machine Learning and what tools are currently used to run machine learning algorithms. We will then see how a resource manager such as YARN can be…

By Grégor JOUET

May 30, 2018

Notes after Katacoda Training on Kubernetes Container Orchestration

Categories: Containers Orchestration, Learning | Tags: Helm, Ingress, Kubeadm, CNI, Micro Services, Minikube, Kubernetes

A few weeks ago, I dedicated two days to following the tutorials available on Katacoda, the interactive learning platform for Kubernetes and other container orchestration platforms. I’m sharing my…

By David WORMS

Dec 14, 2017

Kubernetes Storage Primitives for Stateful Workloads

Categories: Cloud Computing, Containers Orchestration, Open Source Summit Europe 2017 | Tags: Docker, Container Storage Interface (CSI), PVC, GCE, Kubernetes, Azure, Storage

This article is based on the presentation “Introduction to Kubernetes Storage Primitives for Stateful Workloads” from the OSS Convention Prague 2017 by the {Code} team. So, let’s start, what is…

By Pierre SAUVAGE

Oct 28, 2017

Kubernetes 1.8

Categories: Containers Orchestration, Open Source Summit Europe 2017 | Tags: containerd, CRD, Network, OCI, RBAC, Release and features, Kubernetes

The 1.8 release of Kubernetes brings a lot of new things. With 2,500+ pull requests, 2,000+ commits and 400+ committers, Kubernetes added 39 new features in this version. This is the richest release in terms…

By Younes YASSINE

Oct 24, 2017

Hive, Calcite and Druid

Categories: Big Data | Tags: Analytics, Druid, Hive, Database, Hadoop

BI/OLAP requires interactive visualization of complex data streams, such as real-time bidding events, user activity streams, voice call logs, network traffic flows, firewall events and application KPIs. Traditional…

By David WORMS

Jul 14, 2016

Red Hat Storage Gluster and its integration with Hadoop

Categories: Big Data | Tags: HDFS, GlusterFS, Red Hat, Hadoop, Storage

I had the opportunity to be introduced to Red Hat Storage and Gluster in a joint presentation by Red Hat France and the company StartX. I have compiled my notes here, at least partially. I will…

By David WORMS

Jul 3, 2015

State of the Hadoop open-source ecosystem in early 2013

Categories: Big Data | Tags: Flume, Kafka, Mesos, Phoenix, Pig, Hadoop, Mahout

Hadoop is already a large ecosystem and my guess is that 2013 will be the year where it grows even larger. Some pieces no longer need an introduction: ZooKeeper, HBase, Hive, Pig, Flume…

By David WORMS

Jul 8, 2013

Merging multiple files in Hadoop

Categories: Hack | Tags: HDFS, File system, Hadoop

This is a command I used to concatenate the files stored in Hadoop HDFS matching a globbing expression into a single file. It uses the “getmerge” utility but, contrary to “getmerge”, the final…

By David WORMS

Jan 12, 2013

HDFS and Hive storage - comparing file formats and compression methods

Categories: Big Data | Tags: Analytics, Hive, ORC, File Format, Parquet

A few days ago, we conducted a test to compare various Hive file formats and compression methods. Among those file formats, some are native to HDFS and apply to all Hadoop users. The…

By David WORMS

Mar 13, 2012

Two Hive UDAF to convert an aggregation to a map

Categories: Data Engineering | Tags: Hive, File Format, Java, HBase

I am publishing two new Hive UDAFs to help with maps in Apache Hive. The source code is available on GitHub in two Java classes: “UDAFToMap” and “UDAFToOrderedMap” or you can download the jar file. The…

By David WORMS

Mar 6, 2012

Timeseries storage in Hadoop and Hive

Categories: Data Engineering | Tags: HDFS, Hive, CRM, File Format, timeseries, Tuning, Hadoop

In the next few weeks, we will be exploring the storage and analytics of a large generated dataset. This dataset is composed of CRM tables associated with one time series table of about 7,000 billiard rows…

By David WORMS

Jan 10, 2012

Storage and massive processing with Hadoop

Categories: Big Data | Tags: HDFS, Hadoop, Storage

Apache Hadoop is a system for building shared storage and processing infrastructures for large volumes of data (multiple terabytes or petabytes). Hadoop clusters are used by a wide range of projects…

By David WORMS

Nov 26, 2010

Canada - Morocco - France

International locations

10 rue de la Kasbah
2393 Rabat
Morocco

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and how to shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.