Amazon Web Services (AWS)
Amazon Web Services is a division of Amazon that offers an on-demand cloud computing platform with more than 175 services. Among them are AWS S3 and AWS EC2, which let you, respectively, host your data and rent virtual machines to run various types of workloads.
Related articles
Importing data to Databricks: external tables and Delta Lake
Categories: Data Engineering, Data Science, Learning | Tags: Parquet, AWS, Amazon S3, Azure Data Lake Storage (ADLS), Databricks, Delta Lake, Python
During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a later…
May 21, 2020
Introducing Apache Airflow on AWS
Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: Airflow, Oozie, Spark, PySpark, Docker, Learning and tutorial, AWS, Python
Apache Airflow offers a potential solution to the growing challenge of managing an increasingly complex landscape of data management tools, scripts and analytics processes. It is an open-source…
May 5, 2020
Snowflake, the Data Warehouse for the Cloud, introduction and tutorial
Categories: Business Intelligence, Cloud Computing | Tags: Cloud, Data Lake, Data Science, Data Warehouse, Snowflake
Snowflake is a SaaS-based data-warehousing platform that centralizes, in the cloud, the storage and processing of structured and semi-structured data. The increasing generation of data produced over…
Apr 7, 2020
MLflow tutorial: an open source Machine Learning (ML) platform
Categories: Data Engineering, Data Science, Learning | Tags: Deep Learning, AWS, Databricks, Deployment, Machine Learning, Azure, MLflow, MLOps, Python, Scikit-learn
Introduction and principles of MLflow: With increasingly cheaper computing power and storage and at the same time increasing data collection in all walks of life, many companies integrated Data Science…
Mar 23, 2020
Cloudera CDP and Cloud migration of your Data Warehouse
Categories: Big Data, Cloud Computing | Tags: Cloudera, Data Hub, Data Lake, Data Warehouse, Azure
While one of our customers is anticipating a move to the Cloud and with the recent announcement of Cloudera CDP availability in mid-September during the Strata conference, it seems like the appropriate…
By David WORMS
Dec 16, 2019
Should you move your Big Data and Data Lake to the Cloud
Categories: Big Data, Cloud Computing | Tags: DevOps, AWS, Cloud, CDP, Databricks, GCP, Azure
Should you follow the trend and migrate your data, workflows and infrastructure to GCP, AWS and Azure? During the Strata Data Conference in New York, a general focus was put on moving customer’s Big…
Dec 9, 2019
Google Cloud Summit Paris Notes
Categories: Events | Tags: AWS, Cloud, GCP, Kubernetes, Azure, On-premises
Google held the 2019 edition of its yearly Summit in Paris on the 18th of June. This year’s event was the biggest yet in Paris, which reflects Google’s commitment to position itself in the French market…
Jun 26, 2019
Running Enterprise Workloads in the Cloud with Cloudbreak
Categories: Big Data, Cloud Computing, DataWorks Summit 2018 | Tags: Cloudbreak, HDP, Operation, Hadoop, AWS, GCP, Azure, OpenStack
This article is based on Peter Darvasi and Richard Doktorics’ talk Running Enterprise Workloads in the Cloud at the DataWorks Summit 2018 in Berlin. It presents Hortonworks’ automated deployment tool…
May 28, 2018