MariaDB integration with Hadoop
During a workshop with one of our customers, Adaltas identified a potential risk in using MariaDB's High Availability (HA) strategy.
Since the customer selected Cloudera's CDH 5 distribution, the reasoning below is based on Cloudera's official documentation. However, it applies to all Hadoop distributions, including Hortonworks.
Cloudera lists the various databases supported in HA on its website:
and, in the case of MariaDB, redirects the user to the replication documentation:
The latter documentation describes the old replication strategy. Recently, version 10.0 of MariaDB introduced a replication strategy based on global transaction IDs (GTIDs):
One hypothesis is that Cloudera's documentation does not reflect the latest developments in MariaDB. However, Cloudera explicitly states that the GTID replication mode is not supported with MySQL: "Cloudera Manager installation fails if GTID-based replication is enabled in MySQL". On the other hand, the MariaDB documentation specifies that its GTID implementation is not identical to MySQL's: "Note that MariaDB and MySQL have different GTID implementations, and that these are not compatible with each other".
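The incompatibility is visible in the GTID format itself. A MariaDB GTID is three dash-separated integers (domain ID, server ID, sequence number, e.g. `0-1-100`), while a MySQL GTID is the source server's UUID followed by a colon and a transaction number (e.g. `3E11FA47-71CA-11E1-9E33-C80AA9429562:23`). The minimal sketch below tells the two apart for a single GTID string; the function names are hypothetical, and it does not handle GTID lists or sets such as the comma-separated values MariaDB reports for several replication domains.

```python
import re

# MariaDB GTID: domain_id-server_id-sequence_number, e.g. "0-1-100".
MARIADB_GTID = re.compile(r"^(\d+)-(\d+)-(\d+)$")

# MySQL GTID: source server UUID, a colon, then a transaction number
# (or a single interval), e.g. "3E11FA47-71CA-11E1-9E33-C80AA9429562:23".
MYSQL_GTID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}:\d+(-\d+)?$"
)


def gtid_flavor(gtid: str) -> str:
    """Return 'mariadb', 'mysql' or 'unknown' for a single GTID string."""
    gtid = gtid.strip()
    if MARIADB_GTID.match(gtid):
        return "mariadb"
    if MYSQL_GTID.match(gtid):
        return "mysql"
    return "unknown"


def parse_mariadb_gtid(gtid: str) -> tuple:
    """Split a MariaDB GTID into (domain_id, server_id, sequence_number)."""
    match = MARIADB_GTID.match(gtid.strip())
    if match is None:
        raise ValueError(f"not a MariaDB GTID: {gtid!r}")
    domain_id, server_id, seq_no = match.groups()
    return int(domain_id), int(server_id), int(seq_no)


print(gtid_flavor("0-1-100"))
print(gtid_flavor("3E11FA47-71CA-11E1-9E33-C80AA9429562:23"))
print(parse_mariadb_gtid("0-1-100"))
```

A tool written against one format cannot parse the other, which is one concrete reason why support for MySQL GTID replication says nothing about MariaDB GTID replication, and vice versa.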
A doubt therefore remains as to whether the components deployed by Cloudera are compatible with MariaDB configured with GTID replication.
On Hortonworks' community and documentation site, we could not find any additional information. Hortonworks confirms that it does not support GTID in MySQL, without mentioning MariaDB or providing further details:
We see two ways to remove this doubt:
- Set up an installation to validate the integration; even if the validation succeeds, a doubt will remain as to its stability in operation;
- Escalate the question to Cloudera support in order to obtain an official response.
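For the first option, a validation installation only proves something if the test replica really replicates in GTID mode. In MariaDB 10.0 and later, `SHOW SLAVE STATUS` reports this in its `Using_Gtid` column, with the values `No`, `Current_Pos` or `Slave_Pos`. The sketch below assumes one row of that statement has already been fetched into a column-to-value dictionary by whatever client library is in use (not shown); the helper name `replica_uses_gtid` is hypothetical.

```python
from typing import Any, Mapping


def replica_uses_gtid(slave_status: Mapping[str, Any]) -> bool:
    """Return True if a MariaDB replica is replicating in GTID mode.

    `slave_status` is assumed to be one row of SHOW SLAVE STATUS as a
    column -> value mapping. A missing column is treated as 'No'.
    """
    using_gtid = str(slave_status.get("Using_Gtid", "No")).strip().lower()
    # 'Current_Pos' and 'Slave_Pos' both mean the replica connects by GTID.
    return using_gtid in {"current_pos", "slave_pos"}


print(replica_uses_gtid({"Using_Gtid": "Slave_Pos"}))
print(replica_uses_gtid({"Using_Gtid": "No"}))
```

Running such a check before and after installing the CDH components helps confirm that the test environment matches the configuration whose support is in question.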