Database

MariaDB integration with Hadoop

During a workshop with one of our customers, Adaltas identified a potential risk in using MariaDB's High Availability (HA) strategy. Since the customer selected Cloudera's CDH 5 distribution, the reasoning below is based on Cloudera's official documentation. However, it applies to all Hadoop distributions, including Hortonworks. Cloudera lists the various databases supported in HA [...]

July 31st, 2017 | Categories: Big Data, Infrastructure

Hive, Calcite and Druid

BI/OLAP requires interactive visualization of complex data streams: real-time bidding events, user activity streams, voice call logs, network traffic flows, firewall events, and application KPIs. Traditional solutions: RDBMS (MySQL...) don't scale and need caching, yet ad hoc queries remain slow; key/value stores (HBase...) are quick but take forever to compute (pre-materialization of data). Context: created in 2011, open-sourced [...]

July 14th, 2016 | Categories: Big Data

Installing and using MADlib with PostgreSQL on OSX

We cover basic installation and usage of PostgreSQL and MADlib on OSX and Ubuntu. Instructions for other environments should be similar. PostgreSQL is an open source database with enterprise features that MySQL often lacks. MADlib is an open source library that extends a PostgreSQL or Greenplum database with functionality for scalable in-database analytics. [...]

July 7th, 2012 | Categories: Data Science