Change Data Capture with Amazon RDS MariaDB and Pivotal Cloud Foundry

Recently I needed to implement change data capture (CDC) against an existing MariaDB database running in Amazon RDS.  The applications using this database were written in a variety of languages (Perl, PHP, Node, Java) and were spread across several different environments, so the capture really needed to happen at the database itself rather than in application code.  Amazon RDS gives us fantastic operational benefits, but also some limitations that we feared would hamper our ability to do this.  We have been making great use of Pivotal Web Services (PWS), an implementation of Cloud Foundry, and there was a big appetite to have this solution run within our PWS space.

Maxwell’s Daemon

After much Google kung fu, we were pleased to stumble upon the solution: Maxwell's Daemon.  Straight from the site:

…reads MySQL binlogs and writes row updates to Kafka as JSON. Maxwell has a low operational bar and produces a consistent, easy to ingest stream of updates. It allows you to easily “bolt on” some of the benefits of stream processing systems without going through your entire code base to add (unreliable) instrumentation points.

This sounded like exactly what we were looking for!  The only questions were: would it work with Amazon RDS, and could it run in Cloud Foundry?

Configuration in Amazon RDS

This was surprisingly straightforward, and the steps come right from the Maxwell documentation:

To run Maxwell against RDS (either Aurora or MySQL), you will need to do the following:

  • set binlog_format to “ROW”. Do this in the “parameter groups” section. For a Mysql-RDS instance this parameter will be in a “DB Parameter Group”, for Aurora it will be in a “DB Cluster Parameter Group”.
  • set up RDS binlog retention by calling mysql.rds_set_configuration('binlog retention hours', 24) on the server (see the snippet just after this list).
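
For reference, a quick sanity check from any MySQL client connected to the RDS endpoint looks something like this (the rds_set_configuration call is the one from the Maxwell documentation; rds_show_configuration simply confirms the setting took):

-- binlog_format comes from the parameter group and should report ROW
SHOW VARIABLES LIKE 'binlog_format';

-- keep binlogs around long enough for Maxwell to read them
CALL mysql.rds_set_configuration('binlog retention hours', 24);
CALL mysql.rds_show_configuration;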

Configuration for Pivotal Web Services (Cloud Foundry)

This also was pretty straightforward.  One additional requirement we had was to use Amazon SQS instead of either Kafka or Kinesis.  Maxwell allows for STDOUT output of changes as JSON, so we wrapped the whole thing in Perl and sent the CDC items to an SQS queue for other workers within Cloud Foundry to consume.  The only tricky item to find was an environment variable to tell the Java buildpack that we were smarter than it:

JBP_CONFIG_JAVA_MAIN: '{java_main_class: "DoesNotMatter"}'

So the full manifest for us ended up being (obviously with sensitive information removed):

---
applications:
- name: maxwell
  memory: 1G
  instances: 1
  host: maxwell
  path: .
  command: '/usr/bin/perl ./run.pl'
  health-check-type: none
  buildpack: https://github.com/cloudfoundry/java-buildpack.git
  no-route: true
  env:
    JBP_CONFIG_JAVA_MAIN: '{java_main_class: "DoesNotMatter"}'
    AWS_SECRET_ACCESS_KEY: ""
    AWS_REGION: ""
    AWS_SQS_URL: ""
    AWS_ACCESS_KEY_ID: ""
    COMPOSER_GITHUB_OAUTH_TOKEN: ""
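
Assuming the manifest is saved as manifest.yml in the same directory as run.pl and the unpacked Maxwell distribution (bin/maxwell), deploying is then just a standard push:

cf push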

Below is the simple Perl program that starts Maxwell (again with sensitive information removed).  Note that for this proof of concept the program ignores all tables not named "events" or "users", and that the app bundles the Amazon::SQS::Simple module for Perl.

#!/usr/bin/perl -I/home/vcap/app/perl5/lib/perl5

use strict;
use warnings;
use Amazon::SQS::Simple;
use JSON;

# The Java buildpack installs a JRE under the app directory; Maxwell needs it on the PATH.
$ENV{'JAVA_HOME'} = "/home/vcap/app/.java-buildpack/open_jdk_jre";
$ENV{'PATH'}      = $ENV{'PATH'} . ":" . $ENV{'JAVA_HOME'} . "/bin";

my $sqs   = Amazon::SQS::Simple->new( "USER", "PASS" );
my $queue = $sqs->GetQueue('QUEUENAME');

# Start Maxwell with the stdout producer and read its output line by line.
open( MAXWELL, "./bin/maxwell --user='USER' --password='PASS' --host='HOSTNAME' --producer=stdout | " )
    or die "Could not start maxwell: $!";

while ( my $message = <MAXWELL> )
{
        chomp $message;

        # Maxwell also logs to stdout; only lines starting with '{' are row events.
        if ( $message =~ /^\{/ )
        {
                my $decoded_json = decode_json($message);

                # For this proof of concept, forward only changes to the events and users tables.
                if ( $decoded_json->{'table'} =~ /events|users/ )
                {
                        $queue->SendMessage($message);
                        print "*** Sending - $message\n";
                }
        }
        #print $message, "\n";    # uncomment to echo every line Maxwell emits
}

close(MAXWELL);
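
For context, each line Maxwell emits on STDOUT is a single JSON object per row change, which is what the table check above is keying off.  An insert into the users table looks roughly like this (field values here are purely illustrative):

{"database":"mydb","table":"users","type":"insert","ts":1477053217,"xid":23396,"commit":true,"data":{"id":1,"email":"someone@example.com"}}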

Hopefully this helps someone who is looking to seamlessly replicate constantly changing data out of an existing MySQL/MariaDB system.
