Hive

Publish Spark SQL DataFrame and RDD with Spark Thrift Server

The distributed, in-memory nature of the Spark engine makes it an excellent candidate for exposing data to clients that expect low latency. Dashboards, notebooks, BI studios, and KPI-based reporting tools are typical examples, and they commonly speak the JDBC/ODBC protocols. Spark Thrift Server can be used in various ways. It can run independently as Spark standalone [...]

March 25th, 2019 | Categories: Big Data, Data Engineering | 0 Comments

Clusters and workloads migration from Hadoop 2 to Hadoop 3

Migrating from Hadoop 2 to Hadoop 3 is a hot topic. How do you upgrade your clusters? Which features of the new release may solve current problems and bring new opportunities? How are your current processes impacted? Which migration strategy is the most appropriate for your organization? […]

July 25th, 2018 | Categories: Big Data | 0 Comments

Data Lake ingestion best practices

Creating a Data Lake requires rigor and experience. Here are some good practices for data ingestion, in both batch and streaming architectures, that we recommend and implement with our customers. […]

June 18th, 2018 | Categories: Data Engineering, DevOps | 1 Comment

Essential questions about Time Series

Today, the bulk of Big Data is temporal. We see it in the media and among our customers: smart meters, banking transactions, smart factories, connected vehicles … IoT and Big Data go hand in hand. […]

March 19th, 2018 | Categories: Big Data, Data Engineering | 0 Comments