Amazon Simple Storage Service (S3)

Amazone S3 is a scalable, web-based cloud storage service for application programs and online backup and archiving of data at high speed and at low cost. To make web scale computing for developers as simple as possible, Amazon S3 was intentionally designed to offer a minimal feature set. Amazon S3 is an object storage service. The concept of object storage is different from file and block storage. Each object is filed together with an identification number and associated metadata. Applications use this identification number to access an object.

Related articles

Importing data to Databricks: external tables and Delta Lake

Categories: Data Engineering, Data Science, Learning | Tags: Parquet, AWS, Amazon S3, Azure Data Lake Storage (ADLS), Databricks, Delta Lake, Python

During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a later…

Introducing Apache Airflow on AWS

Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: Airflow, Oozie, Spark, PySpark, Docker, Learning and tutorial, AWS, Python

Apache Airflow offers a potential solution to the growing challenge of managing an increasingly complex landscape of data management tools, scripts and analytics processes. It is an open-source…

Aargan COINTEPAS

By Aargan COINTEPAS

May 5, 2020

Cloudera CDP and Cloud migration of your Data Warehouse

Categories: Big Data, Cloud Computing | Tags: Cloudera, Data Hub, Data Lake, Data Warehouse, Azure

While one of our customer is anticipating a move to the Cloud and with the recent announcement of Cloudera CDP availability mi-september during the Strata conference, it seems like the appropriate…

David WORMS

By David WORMS

Dec 16, 2019

Should you move your Big Data and Data Lake to the Cloud

Categories: Big Data, Cloud Computing | Tags: DevOps, AWS, Cloud, CDP, Databricks, GCP, Azure

Should you follow the trend and migrate your data, workflows and infrastructure to GCP, AWS and Azure? During the Strata Data Conference in New-York, a general focus was put on moving customer’s Big…

Joris RUMMENS

By Joris RUMMENS

Dec 9, 2019

Hadoop Ozone part 3: advanced replication strategy with Copyset

Categories: Infrastructure | Tags: HDFS, Ozone, Cluster, Kubernetes, Node

Hadoop Ozone provide a way of setting a ReplicationType for every write you make on the cluster. Right now is supported HDFS and Ratis but more advanced replication strategies can be achieved. In this…

Hadoop Ozone part 2: tutorial and getting started of its features

Categories: Infrastructure | Tags: HDFS, CLI, Learning and tutorial, REST, Ozone, Amazon S3, Cluster

The releases of Hadoop Ozone come with a handy docker-compose file to try out Ozone. The below instructions provide details on how to use it. You can also use the Katacoda training sandbox which…

Hadoop Ozone part 1: an introduction of the new filesystem

Categories: Infrastructure | Tags: HDFS, Ozone, Cluster, Kubernetes

Hadoop Ozone is an object store for Hadoop. It is designed to scale to billions of objects of varying sizes. It is currently in development. The roadmap is available on the project wiki. This article…

Canada - Morocco - France

International locations

10 rue de la Kasbah
2393 Rabbat
Canada

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and increase the time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.