MariaDB integration with Hadoop

During a workshop with one of our customers, Adaltas identified a potential risk in using MariaDB’s High Availability (HA) strategy with Hadoop.

Since the customer selected Cloudera’s CDH 5 distribution, the reasoning below is based on Cloudera’s official documentation. However, it applies equally to other Hadoop distributions, including Hortonworks.

Cloudera lists the various databases supported in HA on its website:
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/admin_cm_ha_dbms.html
and, in the case of MariaDB, redirects the user to the replication documentation:
https://mariadb.com/kb/en/mariadb/setting-up-replication/

That page describes the older replication strategy. Since version 10.0, MariaDB has offered a replication strategy based on global transaction IDs (GTIDs):
https://mariadb.com/kb/en/global-transaction-id/

One hypothesis is that Cloudera’s documentation does not reflect the latest developments in MariaDB. However, Cloudera explicitly states that the GTID replication mode is not supported in the case of MySQL: “Cloudera Manager installation fails if GTID-based replication is enabled in MySQL”. For its part, the MariaDB documentation specifies that its implementation of GTID is not identical to that of MySQL: “Note that MariaDB and MySQL have different GTID implementations, and that these are not compatible with each other”.
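One visible sign of the difference between the two implementations is the identifier format itself: a MariaDB GTID is written as `domain_id-server_id-sequence` (for example `0-1-100`), while a MySQL GTID is written as `source_uuid:transaction_id`. The sketch below only illustrates that difference; the function name `gtid_flavor` and the regular expressions are our own, and it says nothing about how Cloudera's components behave.

```python
import re

# MariaDB GTID: domain_id-server_id-sequence, e.g. "0-1-100"
MARIADB_GTID = re.compile(r"^\d+-\d+-\d+$")

# MySQL GTID: source_uuid:transaction_id, optionally an interval "a-b"
MYSQL_GTID = re.compile(
    r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}:\d+(-\d+)?$"
)


def gtid_flavor(gtid: str) -> str:
    """Classify a single GTID string as 'mariadb', 'mysql' or 'unknown'.

    Hypothetical helper for illustration only: it looks at the textual
    format and does not contact any server.
    """
    gtid = gtid.strip()
    if MARIADB_GTID.match(gtid):
        return "mariadb"
    if MYSQL_GTID.match(gtid):
        return "mysql"
    return "unknown"
```

A tool written to parse one format will not recognize the other, which is consistent with the incompatibility stated in the MariaDB documentation.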

Doubt therefore remains as to whether the components deployed by Cloudera are compatible with a MariaDB instance configured with GTID.

On Hortonworks’ community and documentation sites, we could not find any additional information. Hortonworks confirms that it does not support GTID in MySQL, but does not mention MariaDB or give further details:
https://community.hortonworks.com/questions/2172/hive-metastore-ha-mysql-replication-for-failover-p.html

Possible actions:

  • Set up a test installation to validate the integration. Even if the validation succeeds, doubt will remain as to stability in production;
  • Raise the question with Cloudera support in order to obtain an official response.
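
As a starting point for the first action, it helps to confirm which replication mode a test replica actually uses. In MariaDB, the output of `SHOW SLAVE STATUS` includes a `Using_Gtid` column whose value is `No`, `Slave_Pos` or `Current_Pos`. The sketch below assumes the row has already been fetched by a database driver and is passed in as a dictionary; the function name `replica_uses_gtid` is hypothetical, and treating a missing column as "no GTID" is our assumption.

```python
def replica_uses_gtid(status_row: dict) -> bool:
    """Return True if a SHOW SLAVE STATUS row indicates GTID replication.

    status_row maps column names to values for one replica connection.
    Assumption: if the Using_Gtid column is absent, we treat the replica
    as using the classic binary log file/position mode.
    """
    value = str(status_row.get("Using_Gtid", "No")).strip().lower()
    return value != "no"


# Example rows, written by hand for illustration (not real server output).
classic_replica = {"Slave_IO_Running": "Yes", "Using_Gtid": "No"}
gtid_replica = {"Slave_IO_Running": "Yes", "Using_Gtid": "Slave_Pos"}

print(replica_uses_gtid(classic_replica))
print(replica_uses_gtid(gtid_replica))
```

Such a check only tells which mode is active. It does not prove that the Hadoop components behave correctly under that mode.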