Data Analytics
Related articles

An overview of Cloudera Data Platform (CDP)
Categories: Big Data, Cloud Computing, Data Engineering | Tags: SDX, Data Analytics, Big Data, Cloud, Cloudera, CDP, CDH, Data Hub, Data Lake, Data Warehouse
Cloudera Data Platform (CDP) is a cloud computing platform for businesses. It provides integrated, multifunctional self-service tools to analyze and centralize data. It brings security and…
Jul 19, 2021

Download datasets into HDFS and Hive
Categories: Big Data, Data Engineering | Tags: Hive, Business intelligence, Data Analytics, Data Engineering, Data structures, Database, Hadoop, HDFS, Big Data, Data Lake, Data Warehouse
Nowadays, the analysis of large amounts of data is becoming more and more feasible thanks to Big Data technologies (Hadoop, Spark, …). This explains the explosion of the data volume and the…
By Aida NGOM
Jul 31, 2020

Comparison of different file formats in Big Data
Categories: Big Data, Data Engineering | Tags: ORC, Batch processing, Business intelligence, Data structures, Protocol Buffers, Avro, HDFS, Parquet, Big Data, CSV, JavaScript Object Notation (JSON), Kubernetes
In data processing, there are different file formats for storing your data sets. Each format has its own pros and cons depending on the use case and exists to serve one or several purposes…
By Aida NGOM
Jul 23, 2020

Auto-scaling Druid with Kubernetes
Categories: Big Data, Business Intelligence, Containers Orchestration | Tags: EC2, Druid, CNCF, Container Orchestration, Data Analytics, Helm, Metrics, OLAP, Operation, Cloud, Kubernetes, Prometheus, Python
Apache Druid is an open-source analytics data store which could leverage the auto-scaling abilities of Kubernetes due to its distributed nature and its reliance on memory. I was inspired by the talk…
Jul 16, 2019

Druid and Hive integration
Categories: Big Data, Business Intelligence, Tech Radar | Tags: Druid, Hive, Data Analytics, LLAP, OLAP, SQL
This article covers the integration between Hive Interactive (LLAP) and Druid. One can see it as a complement to the Ultra-fast OLAP Analytics with Apache Hive and Druid article. Tools description…
Jun 17, 2019

Hadoop and R with RHadoop
Categories: Business Intelligence, Data Science | Tags: Thrift, Data Analytics, Learning and tutorial, R, Hadoop, HBase, HDFS, MapReduce
RHadoop is a bridge between R, a language and environment to statistically explore data sets, and Hadoop, a framework that allows for the distributed processing of large data sets across clusters of…
By David WORMS
Jul 19, 2012