Multi-Source Replication From Dolt to MySQL

TECHNICAL
15 min read

One of MySQL's greatest features is its robust replication support. MySQL replication provides scalability, reliability, and high availability benefits for MySQL-based systems and is extremely widely used in production deployments of MySQL. Here at DoltHub, we've built Dolt, the world's first version-controlled database, and although Dolt is a completely new codebase (written in Golang by the way), it was designed to be compatible with MySQL's dialect of SQL, wire protocol, and tooling. So naturally, supporting MySQL's replication protocol was something we always intended on doing. Today, we support MySQL-to-Dolt replication as well as Dolt-to-MySQL replication (as well as our own, custom protocol for Dolt-to-Dolt replication). We've also shown other ways to use the MySQL replication protocol with Dolt, such as in change data capture use cases with Debezium.

Last week, one of our customers asked about using Dolt with MySQL multi-source replication. MySQL multi-source replication allows a MySQL server to replicate data from multiple source servers. This feature is useful for consolidating data from multiple database servers into a single server, such as when you have data sharded across multiple servers and want to create a single view of all the data for analytics or data warehouse queries. Since Dolt already supports acting as a MySQL source server, we figured this would work, and sure enough it does! Note that Dolt doesn't support replicating from multiple sources, but it does support being a source for MySQL replication.

In this blog post, we'll walk through how to run multiple Dolt servers and have them each sync their data to a single MySQL server using multi-source replication.

MySQL Multi-Source Replication

At its core, the MySQL replication protocol is fairly straightforward. It's been around over two decades and has evolved a bit to add new features and optimize data type serialization, but the basic concepts have remained very stable. A replica is configured to connect to a source server, and once connected, the source server streams binary events that describe data changes being committed on the source server. If you're curious about more details on how the protocol actually works, check out our deep dive blog on the MySQL replication protocol.

Multi-source replication is just a small addition on top of this basic idea. A replica server is configured for two or more sources and each source server is identified with an assigned replication channel name. On the replica, instead of running START REPLICA; you'll run START REPLICA FOR CHANNEL '<channel_name>'; for each channel.

Multi-source replication

While multi-source replication is easy to set up, there are a few important caveats to be aware of...

The most important point to understand is that MySQL does not do any conflict resolution when applying data updates from multiple sources. For example, if source1 and source2 both change the value of a row, one of those updates will end up being overwritten by the other, without any notification. Depending on the order of updates, and how the data is used, this could cause problems for your application and you won't know it has happened.

Additionally, you need to be careful with applying schema changes while replication is running. For example, if you change a table's schema on source1, it will replicate to the replica server and, depending on your schema, if there are any updates still coming in from source2, then those could cause errors and crash replication if the new schema isn't compatible with those data changes. Generally, the safest way to modify table schemas in a multi-source replication setup, is to stop data changes on the sources, then stop replication, make the schema changes, ensure all schemas are in sync on the source servers and replica server, and then restart replication and allow data changes on the source servers again.

If you keep those caveats in mind, multi-source replication can be a useful tool for consolidating data from multiple sources into a single server. Use cases where data is disjoint, such as combining data from multiple shards, can be a particularly good fit for multi-source replication since there are no concerns about conflicts in the data updates between the sources.

Demo

Now that you understand how MySQL multi-source replication works, let's run through a demo showing how to use two Dolt servers for the source servers replicating to a single MySQL replica in a multi-source replication topology. In this demo, we're using Dolt version 1.43.10 and MySQL 8.4. Note that between MySQL 8.0, MySQL 8.4 (the current LTS release), and MySQL 9.0 (the current innovation release), there have been some changes to authentication options that may affect what parameters you need to pass when initializing your MySQL server.

Start the Dolt Source Servers:

The first thing we need to do is create our two source servers. Each of our source servers is a Dolt SQL server, but there's no reason you couldn't have a mix of Dolt servers and MySQL servers for the source servers in this topology. In the commands below, we're creating a new directory for our Dolt database, initializing it to create the database, and then setting some configuration parameters.

cd /tmp
mkdir dolt-servers && cd dolt-servers
mkdir dolt1 && cd dolt1
dolt init --fun
dolt sql -q "SET @@PERSIST.log_bin=1;"
dolt sql -q "SET @@PERSIST.gtid_mode=ON;"
dolt sql -q "SET @@PERSIST.enforce_gtid_consistency=ON;"
dolt sql -q "SET @@PERSIST.server_id=11228;"
dolt sql -q "SET @@PERSIST.server_uuid='0c06d583-0000-421c-b2f1-9a6b03305f97';"
dolt sql-server --port 11228 --loglevel DEBUG

Let's quickly break down what each of these configuration settings does:

  • log_bin controls whether binary logging is enabled or not. The MySQL replication protocol streams data from the binary log, so binary logging must be enabled for replication to work. Note that while binary logging is enabled by default for MySQL, currently for Dolt, you have to explicitly turn on binary logging.
  • gtid_mode controls whether GTIDs (Global Transaction IDentifiers) are enabled. When enabled on a MySQL system, a unique transaction ID is assigned to every executed transaction, which makes it easier to track the stream of transactions from a source server. GTIDs are the preferred way for MySQL replication systems to track the position of replication data in a replication stream, and the only mode supported by Dolt, so GTID_MODE must be enabled.
  • enforce_gtid_consistency is another GTID related setting that controls whether the server enforces GTID consistency checks. This setting should be enabled to ensure that the server is correctly tracking the replication stream.
  • server_id is a positive integer that uniquely identifies the server in a replication topology. Each server in a replication topology MUST have a unique server_id. For simplicity, we're using 11228 here to match the port number on which we're running this server.
  • server_uuid is another unique identifier for the server, similar to server_id. For historical reasons around how the MySQL protocol evolved, it depends on both the server_id and server_uuid identifiers uniquely identifying each server. If these values are not unique for each server in the replication topology, the replica will not be able to correctly track the replication streams and will halt replication. Normally, server_uuid doesn't need to be set, since it is automatically generated by Dolt and MySQL, but in this case, since we're running multiple Dolt servers on the same host, we explicitly set it to ensure that each server has a unique UUID and that they don't end up using the same UUID from any global configuration.

When you run the dolt sql-server command, the server will start up and dump logs to the console. Leave this terminal open so you can see the logs and then open a new terminal to start up the second Dolt server.

cd /tmp/dolt-servers
mkdir dolt2 && cd dolt2
dolt init --fun
dolt sql -q "SET @@PERSIST.log_bin=1;"
dolt sql -q "SET @@PERSIST.gtid_mode=ON;"
dolt sql -q "SET @@PERSIST.enforce_gtid_consistency=ON;"
dolt sql -q "SET @@PERSIST.server_id=11229;"
dolt sql -q "SET @@PERSIST.server_uuid='0c06d583-0001-421c-b2f1-9a6b03305f97';"
dolt sql-server --port 11229 --loglevel DEBUG

These commands are almost exactly identical to the commands we used to start the first Dolt source server, with the exception that of course we're setting different unique identifier values. Leave this terminal open as well so you can see the logs from our second source server. At the end of this demo, when you are ready to stop the Dolt servers, just press CTRL+C in each terminal to stop each dolt process.

Start the MySQL Replica Server:

Now that we have two source servers running, it's time to start up the MySQL replica server. A reminder that for this demo, I'm using MySQL 8.4. If you're using a different version of MySQL, you may need to adjust the parameters passed to the mysqld command to initialize and start your replica server. For example, with MySQL 8.0, the --mysql-native-password=ON is not needed, and if specified will prevent the server from starting up.

# Create a new directory for the MySQL database
cd /tmp
mkdir -p mysqlReplica/mysql_data && cd mysqlReplica
chmod -R 0777 .

# Initialize the new MySQL data directory
mysqld --no-defaults --user=mysql \
	--initialize-insecure \
	--datadir=./mysql_data

# Start up the MySQL server on port 1230
mysqld --no-defaults \
    --datadir=./mysql_data \
    --gtid-mode=ON \
    --skip-replica-start=ON \
    --enforce-gtid-consistency=ON \
    --port=1230 \
    --server-id=1234 \
    --socket=mysql-1230.sock \
    --general_log_file=./general_log \
    --log-bin=./log_bin \
    --slow_query_log_file=./slow_query_log \
    --log-error=./log_error \
    --pid-file=pid-1230.pid \
    --mysql-native-password=ON

After running the second mysql command, your MySQL replica server should be running on port 1230 and the terminal should hang without printing any output. If the mysqld command stops, then it means the server is no longer running, and you should look in the mysql_data directory at the log_error file to see what went wrong. After this demo, when you are ready to stop the MySQL server, you can run mysqladmin -uroot --protocol TCP --port 1230 shutdown to stop the server.

Configure and Start Replication:

Now that we've got all three servers running, let's log into the MySQL server and configure multi-source replication. To start a SQL shell to the MySQL replica, open a new terminal and run:

mysql -uroot --protocol TCP --port 1230

Once we're connected to our MySQL replica through a SQL shell, we'll configure each replication source, using a CHANGE REPLICATION SOURCE statement, and start replication using the START REPLICA statement. Note that in each statement, we specify FOR CHANNEL '<channel_name>' to uniquely identify each replication source.

CHANGE REPLICATION SOURCE TO SOURCE_HOST='localhost', SOURCE_PORT=11228, SOURCE_USER='root', SOURCE_AUTO_POSITION=1 FOR CHANNEL 'dolt-11228';

START REPLICA FOR CHANNEL 'dolt-11228';

CHANGE REPLICATION SOURCE TO SOURCE_HOST='localhost', SOURCE_PORT=11229, SOURCE_USER='root', SOURCE_AUTO_POSITION=1 FOR CHANNEL 'dolt-11229';

START REPLICA FOR CHANNEL 'dolt-11229';

Once replication has started, you can check the status of replication using the SHOW REPLICA STATUS FOR CHANNEL '<channel_name>' \G statement to view the status. This shows you the status of the specified replication channel, including the position in the replication stream, the last transaction executed, and any errors that have occurred.

Replicate Data

Now that we've got all our servers running and connected, let's get to the fun part and replicate some data!

In our example, we're modeling an e-commerce platform that handles orders from customers all over the world. With millions of orders daily, our data is sharded by region, so we set up our first Dolt source server with a database named orders_na to handle orders from North America and our second Dolt source server with a database named orders_eu to handle orders from Europe. We want to consolidate all of our order data into a single MySQL replica server for analytics and reporting.

In a new terminal, open a SQL shell to the first Dolt source server, running on port 11228, using the following command:

mysql -uroot --protocol TCP --port 11228

In the SQL shell for this Dolt source server, create the orders_na database and the orders table. Leave this SQL shell open so we can come back to it later and insert some data.

create database orders_na;
use orders_na;
CREATE TABLE orders (
    id CHAR(36) PRIMARY KEY DEFAULT (UUID()),
    name VARCHAR(255),
    address VARCHAR(255),
    email VARCHAR(255),
    order_date TIMESTAMP
);

-- Create a Dolt commit containing the changes to create the initial schema
CALL dolt_commit('-Am', 'Creating orders table');

Next, let's set up the second Dolt source server, running on port 11229. Open a SQL shell to port 11229 with the following command:

mysql -uroot --protocol TCP --port 11228

In that SQL shell, create the orders_eu database and the orders table. Leave this SQL shell open, too, so we can come back to it later and insert some data.

create database orders_eu;
use orders_eu;
CREATE TABLE orders (
    id CHAR(36) PRIMARY KEY DEFAULT (UUID()),
    name VARCHAR(255),
    address VARCHAR(255),
    email VARCHAR(255),
    order_date TIMESTAMP
);

-- Create a Dolt commit containing the changes to create the initial schema
CALL dolt_commit('-Am', 'Creating orders table');

Our schema has been set up in our two source servers, so let's take a quick look at our MySQL replica and make sure that those two databases have replicated over. In the SQL shell for the MySQL replica that you opened earlier, run the SHOW DATABASES statement:

SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| orders_eu          |
| orders_na          |
| performance_schema |
| sys                |
+--------------------+
6 rows in set (0.00 sec)

SELECT * FROM orders_na.orders;
Empty set (0.00 sec)

SELECT * FROM orders_eu.orders;
Empty set (0.00 sec)

If you see the two sharded databases showing up and an empty orders table in each one, then replication is working correctly and we're ready to insert some data.

Jump back over to the SQL shell for the first Dolt source server and insert some data into the orders table in the orders_na database:

INSERT INTO orders_na.orders (name, address, email, order_date) VALUES
    ('Alice Johnson', '123 Maple St, Springfield, IL', 'alice.johnson@example.com', '2024-10-01 14:32:00'),
    ('Bob Smith', '456 Oak St, Metropolis, NY', 'bob.smith@example.com', '2024-10-02 09:45:00'),
    ('Catherine Lee', '789 Pine St, Gotham, NJ', 'catherine.lee@example.com', '2024-10-03 16:20:00'),
    ('Daniel Kim', '321 Birch St, Star City, CA', 'daniel.kim@example.com', '2024-10-04 11:10:00'),
    ('Emily Chen', '654 Cedar St, Central City, TX', 'emily.chen@example.com', '2024-10-05 08:05:00'),
    ('Frank Wright', '987 Elm St, Coast City, FL', 'frank.wright@example.com', '2024-10-06 13:50:00');

-- Create a Dolt commit; this isn't strictly necessary for replication, but it's a good practice to commit your changes.
-- We'll use this commit history later to demonstrate Dolt's version control features.
CALL dolt_commit('-Am', 'Adding NA orders');

And then switch over to the SQL shell for the second Dolt source server and insert some data into that shard:

INSERT INTO orders_eu.orders (name, address, email, order_date) VALUES
    ('Sophie Müller', '12 Hauptstraße, Munich, Germany', 'sophie.mueller@example.de', '2024-10-07 10:15:00'),
    ('Luca Rossi', '5 Via Roma, Milan, Italy', 'luca.rossi@example.it', '2024-10-08 15:40:00'),
    ('Emma Dubois', '33 Rue de Rivoli, Paris, France', 'emma.dubois@example.fr', '2024-10-09 09:25:00'),
    ('Pieter Janssen', '78 Keizersgracht, Amsterdam, Netherlands', 'pieter.janssen@example.nl', '2024-10-10 14:55:00'),
    ('Ana García', '10 Calle Mayor, Madrid, Spain', 'ana.garcia@example.es', '2024-10-11 12:30:00');

-- Create a Dolt commit; this isn't strictly necessary for replication, but it's a good practice to commit your changes.
-- We'll use this commit history later to demonstrate Dolt's version control features.
CALL dolt_commit('-Am', 'Adding EU orders');

And finally, back in the SQL shell for our MySQL replica, let's check to confirm that both sources have replicated their data to the replica server.

SELECT * FROM orders_na.orders;
+--------------------------------------+---------------+--------------------------------+---------------------------+---------------------+
| id                                   | name          | address                        | email                     | order_date          |
+--------------------------------------+---------------+--------------------------------+---------------------------+---------------------+
| 0e63ba56-de57-4780-a425-463d3c084c39 | Daniel Kim    | 321 Birch St, Star City, CA    | daniel.kim@example.com    | 2024-10-04 04:10:00 |
| 51559fcc-0c4b-451f-b176-a2858e5ba1a1 | Frank Wright  | 987 Elm St, Coast City, FL     | frank.wright@example.com  | 2024-10-06 06:50:00 |
| 5d94639c-f7bf-40bf-abb8-6fcaab69df47 | Catherine Lee | 789 Pine St, Gotham, NJ        | catherine.lee@example.com | 2024-10-03 09:20:00 |
| 71ebd5a5-db75-4987-bca8-dc84a1851085 | Emily Chen    | 654 Cedar St, Central City, TX | emily.chen@example.com    | 2024-10-05 01:05:00 |
| 7423ab46-a6a9-489b-9526-b0dcf66a053f | Bob Smith     | 456 Oak St, Metropolis, NY     | bob.smith@example.com     | 2024-10-02 02:45:00 |
| 75bde129-3af5-484c-be1f-e7807f74ffa5 | Alice Johnson | 123 Maple St, Springfield, IL  | alice.johnson@example.com | 2024-10-01 07:32:00 |
+--------------------------------------+---------------+--------------------------------+---------------------------+---------------------+
6 rows in set (0.00 sec)

SELECT * FROM orders_eu.orders;
+--------------------------------------+----------------+------------------------------------------+---------------------------+---------------------+
| id                                   | name           | address                                  | email                     | order_date          |
+--------------------------------------+----------------+------------------------------------------+---------------------------+---------------------+
| 0353eaa3-5d4d-4c73-9965-c781b5390bf4 | Pieter Janssen | 78 Keizersgracht, Amsterdam, Netherlands | pieter.janssen@example.nl | 2024-10-10 07:55:00 |
| 14532d8f-6c41-4284-9438-5c2bd7d207bd | Emma Dubois    | 33 Rue de Rivoli, Paris, France          | emma.dubois@example.fr    | 2024-10-09 02:25:00 |
| 32af14a3-71ce-4bc9-b5e9-7560d7b893a8 | Sophie Müller  | 12 Hauptstraße, Munich, Germany          | sophie.mueller@example.de | 2024-10-07 03:15:00 |
| b1a17473-e8da-4583-af8f-98ec2bb2292d | Luca Rossi     | 5 Via Roma, Milan, Italy                 | luca.rossi@example.it     | 2024-10-08 08:40:00 |
| c036a471-1d35-4b70-af98-6f79296b25e8 | Ana García     | 10 Calle Mayor, Madrid, Spain            | ana.garcia@example.es     | 2024-10-11 05:30:00 |
+--------------------------------------+----------------+------------------------------------------+---------------------------+---------------------+
5 rows in set (0.00 sec)

And sure enough, every record from the orders_na shard and the orders_eu shard have been correctly replicated to our MySQL multi-source replica.

Using Dolt's Version Control Features

Now that we've seen the basics on using multi-source replication, let's try out some of Dolt's version control features and see how they work with replication.

Let's say that the most recent commit on the orders_na shard was a mistake and we need to revert it. We could use the dolt_revert() stored procedure to create a new commit on the database's commit graph that reverses the changes made in any commit, but if the commit we're trying to reverse is the most recent commit, then it's easier and cleaner to use the dolt_reset() stored procedure to move the branch HEAD pointer back to the previous commit, essentially erasing any sign of that most recent commit from the commit graph. (Technically, that commit is still accessible if you know the hash and haven't run garbage collection, and tools like Dolt's reflog can help you find those previous commit hashes.)

Back in the SQL shell for our second Dolt source server, in the orders_eu database, run the following command to move the branch HEAD pointer back one commit, to the commit where we created the empty orders table:

CALL dolt_reset('--hard', 'HEAD~');
+--------+
| status |
+--------+
|      0 |
+--------+
1 row in set (0.02 sec)

SELECT * FROM orders;
Empty set (0.00 sec)

Since our data changed on the source server, we expect those same changes to be replicated to our MySQL replica. Let's jump back to the SQL shell for our replica server and check the orders_eu.orders table to see if was correctly updated:

SELECT * FROM orders_eu.orders;
Empty set (0.00 sec)

And just like we expected, when we use Dolt's version control features to change the data on a source server, those changes replicate to the MySQL replica server to make its data match.

Now that we have all the data from all our shards together in one server, it's easy for us to write analytic queries that cross each shard database to help us better understand worldwide orders. For example, we could write a query that calculates the total number of orders worldwide, or look at patterns in how the time orders are placed varies across regions.

Future Ideas for Multi-Source Replication

MySQL's support for multi-source replication opens up some interesting possibilities for ways to set up replication topologies with Dolt and MySQL servers. For example, today, Dolt-to-MySQL replication is limited to replicating a single branch to a MySQL server. This limitation comes from MySQL's requirement that each replication channel must have a unique server_id and server_uuid. It would be possible to create a way to configure binary logging for multiple branches on a Dolt database, and to configure a unique server_id and server_uuid for each of those branches, so that a single Dolt server could serve multiple unique replication channels that a MySQL server could connect to using multi-source replication. This could be an interesting pattern for customers who are using Dolt branching to shard or isolate data in their Dolt databases and need to synchronize multiple shards into a central MySQL server. We don't currently have plans to build this, but we're always happy to adjust our roadmap based on what customers tell us they need, so don't be shy to let us know if this is a feature that would help you out!

Summary

Dolt's support for MySQL's replication protocol opens up many possibilities for integrating Dolt into systems with MySQL databases. Dolt supports Dolt-to-MySQL replication where Dolt data is replicated to a MySQL server, as well as MySQL-to-Dolt replication where MySQL data is replicated to a Dolt server. As we saw in this blog post, you can even setup multiple Dolt servers to sync into a single MySQL server. We focused on MySQL replication options in this post, but Dolt also supports an optimized replication protocol for replicating between two or more Dolt servers, including a mode for hot-standby replication so you can quickly change the main database server in a failover scenario.

If you want to talk about replication, database development, or version control, come join us on the DoltHub Discord server! We're always happy to help people get started with Dolt and talk about how Dolt can fit into different use cases.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.