Cloud Computing
Deploying a cloud big data infrastructure that delivers agility, efficiency, cost control and better analytics, while accounting for security and legacy constraints, is not a trivial task. Neither is managing an elastic pool of resources in a multi-tenant environment while meeting SLAs, preserving data integrity and keeping costs under control.
We architect, deploy and operate hybrid public and private cloud solutions, built on multiple offerings, on a daily basis. We’ve been involved in different approaches to cloud migration, from “Lift & Shift” to complete re-platforming. These experiences provide our consultants with the depth and breadth of skills needed to help you navigate, customize and operate the new normal.
Our consultants are involved throughout the project life cycle, from the feasibility study to project delivery.
Cloud migration
- Gather and document the requirements (functional and nonfunctional)
- Architect the solution based on those requirements
- Roadmap definition and project planning
- Test, optimize, cut-off processes
- Public cloud services comparison
Cloud operation and optimization
- Audit infrastructure, processes and costs
- Infrastructure deployment automation
- Define and achieve desired state and processes (SLOs, SLAs)
- Infrastructure, networking and service operation
- Cost analysis and optimization
Cloud integration and development
- Technology and services qualification and validation
- Data ingestion/preparation pipelines
- Data loading and connections
- Machine Learning algorithms
- Stream and batch processing
Articles related to Cloud
Connecting to ADLS Gen2 from Hadoop (HDP) and Nifi (HDF)
Categories: Big Data, Cloud Computing, Data Engineering | Tags: HDFS, NiFi, Authentication, Authorization, Hadoop, Azure Data Lake Storage (ADLS), Azure, OAuth2
As data projects built in the Cloud are becoming more and more frequent, a common use case is to interact with Cloud storage from an existing on-premises Big Data platform. Microsoft Azure recently…
Nov 5, 2020
Automate a Spark routine workflow from GitLab to GCP
Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: Airflow, Spark, CI/CD, Learning and tutorial, GitLab, GCP, Terraform
A workflow consists of automating a succession of tasks to be carried out without human intervention. It is an important and widespread concept which particularly applies to operational environments…
Jun 16, 2020
Introducing Apache Airflow on AWS
Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: Airflow, Oozie, Spark, PySpark, Docker, Learning and tutorial, AWS, Python
Apache Airflow offers a potential solution to the growing challenge of managing an increasingly complex landscape of data management tools, scripts and analytics processes. It is an open-source…
May 5, 2020
Snowflake, the Data Warehouse for the Cloud, introduction and tutorial
Categories: Business Intelligence, Cloud Computing | Tags: Cloud, Data Lake, Data Science, Data Warehouse, Snowflake
Snowflake is a SaaS-based data-warehousing platform that centralizes, in the cloud, the storage and processing of structured and semi-structured data. The increasing generation of data produced over…
Apr 7, 2020
Cloudera CDP and Cloud migration of your Data Warehouse
Categories: Big Data, Cloud Computing | Tags: Cloudera, Data Hub, Data Lake, Data Warehouse, Azure
With one of our customers anticipating a move to the Cloud and the recent announcement of Cloudera CDP availability in mid-September during the Strata conference, it seems like the appropriate…
By David WORMS
Dec 16, 2019
Should you move your Big Data and Data Lake to the Cloud
Categories: Big Data, Cloud Computing | Tags: DevOps, AWS, Cloud, CDP, Databricks, GCP, Azure
Should you follow the trend and migrate your data, workflows and infrastructure to GCP, AWS and Azure? During the Strata Data Conference in New York, a general focus was put on moving customer’s Big…
Dec 9, 2019
Insert rows in BigQuery tables with complex columns
Categories: Cloud Computing, Data Engineering | Tags: GCP, BigQuery, Schema, SQL
Google’s BigQuery is a cloud data warehousing system designed to process enormous volumes of data with several features available. Out of all those features, let’s talk about the support of Struct…
Nov 22, 2019
Running Enterprise Workloads in the Cloud with Cloudbreak
Categories: Big Data, Cloud Computing, DataWorks Summit 2018 | Tags: Cloudbreak, HDP, Operation, Hadoop, AWS, GCP, Azure, OpenStack
This article is based on Peter Darvasi and Richard Doktorics’ talk Running Enterprise Workloads in the Cloud at the DataWorks Summit 2018 in Berlin. It presents Hortonworks’ automated deployment tool…
May 28, 2018
Micro Services
Categories: Cloud Computing, Containers Orchestration, Open Source Summit Europe 2017 | Tags: Mesos, CNCF, DNS, Encryption, gRPC, Istio, Linkerd, Micro Services, MITM, Proxy, Service Mesh, Kubernetes, SPOF, SSL/TLS
Back in the day, applications were monolithic and we could use an IP address to access a service. With virtual machines (VM), multiple hosts started to appear on the same machine with multiple apps…
By David WORMS
Nov 14, 2017
Kubernetes Storage Primitives for Stateful Workloads
Categories: Cloud Computing, Containers Orchestration, Open Source Summit Europe 2017 | Tags: Docker, Container Storage Interface (CSI), PVC, GCE, Kubernetes, Azure, Storage
This article is based on the presentation “Introduction to Kubernetes Storage Primitives for Stateful Workloads” from the OSS Convention Prague 2017 by the {Code} team. So, let’s start, what is…
Oct 28, 2017
Multi-Repo, Multi-Node Gating at Massive Scale
Categories: Cloud Computing, DevOps & SRE, Open Source Summit Europe 2017 | Tags: Ansible, CI/CD, Infrastructure, Jenkins, Red Hat, Zuul, OpenStack
This is a recap and personal review of Monty Taylor’s presentation of OpenStack’s Continuous Integration tool Zuul at the Open Source Summit 2017 in Prague (not to be confused with Netflix’ Zuul project…
Oct 24, 2017
Node.js is now integrated to the Microsoft Azure platform
Categories: Cloud Computing, Tech Radar | Tags: Linux, Cloud, Azure, Node.js
Node is now a first-class citizen in the Microsoft Azure cloud environment alongside .NET, Java and PHP. This integration is the logical consequence of Microsoft’s involvement in the development of…
By David WORMS
Dec 11, 2011