Announcing Postgres to Doltgres Replication
Introduction
Last month we published a deep dive on how to build a Postgres logical replication system in Go. Today we're excited to announce the release of this feature, available in the latest version of Doltgres.
Why replicate from Postgres to Doltgres?
We built replication from Postgres to Doltgres because an
early customer wanted to track how the data in his primary Postgres database was changing over
time. This is a great use case for Doltgres in its pre-beta-release days: you can use it in addition
to your primary Postgres server, instead of switching database products to one not quite ready for
production. It's effectively free in terms of performance from the perspective of your existing
Postgres installation, and you get a complete version history of every change made to your primary
database that you can query with the standard set of Dolt functions, like
DOLT_DIFF()
.
Let's see how to set it up.
Enabling Postgres to Doltgres replication
Getting Doltgres replicating from Postgres involves making configuration changes on both Postgres and the Doltgres sides. All of this is detailed in the official docs, but here's a quick overview.
First, turn on the correct level of replication in Postgres. This requires a server restart.
% echo "wal_level = logical" >> /var/lib/postgresql/data/postgresql.conf
Then run some commands as the postgres super user to create some replication slots on the primary:
% export DBNAME=postgres
% export TABLE=t1
% export SLOTNAME=doltgres_slot
% psql "dbname=$DBNAME replication=database" -c "CREATE PUBLICATION $SLOTNAME FOR TABLE $TABLE;"
% psql "dbname=$DBNAME replication=database" -c "CREATE_REPLICATION_SLOT $SLOTNAME LOGICAL pgoutput;"
Finally, start up doltgres
with an appropriate config file that enables replication.
log_level: info
behavior:
read_only: false
autocommit: true
persistence_behavior: load
dolt_transaction_commit: true
user:
name: "doltgres"
password: "password"
listener:
host: localhost
port: 5433
max_connections: 100
read_timeout_millis: 28800000
write_timeout_millis: 28800000
postgres_replication:
postgres_server_address: 127.0.0.1
postgres_user: postgres
postgres_password: password
postgres_database: postgres
postgres_port: 5432
slot_name: doltgres_slot
Let's see a demo of this in action.
Demo
First I created a table t1
on both the primary and the replica:
create table t1 (a int primary key, b int);
Then I started replication on the primary for this table using the psql
commands above. Finally, I
started the Doltgres replica with the config file above. Then I connected to the primary Postgres
server with psql
and started issuing some queries.
postgres=# insert into t1 values (1, 2);
INSERT 0 1
postgres=# insert into t1 values (3, 4);
INSERT 0 1
postgres=# delete from t1 where a = 1;
DELETE 1
postgres=# update t1 set b = 10;
UPDATE 1
I see a bunch of replication information in the log for the Doltgres replica:
2024/04/22 16:31:12 BeginMessage: &{{66} 0/542E9468 2024-04-22 16:31:12.85403 -0700 PDT 1882880}
2024/04/22 16:31:12 replicating query: START TRANSACTION
2024/04/22 16:31:12 Sent Standby status message with WALWritePosition = 0/542E9219, WALApplyPosition = 0/542E9301
2024/04/22 16:31:12 XLogData (*pglogrepl.RelationMessageV2) => WALStart 0/0 ServerWALEnd 0/0 ServerTime 2024-04-22 16:31:12.856936 -0700 PDT
2024/04/22 16:31:12 Sent Standby status message with WALWritePosition = 0/542E9219, WALApplyPosition = 0/1
2024/04/22 16:31:12 XLogData (*pglogrepl.InsertMessageV2) => WALStart 0/542E9300 ServerWALEnd 0/542E9300 ServerTime 2024-04-22 16:31:12.856975 -0700 PDT
2024/04/22 16:31:12 replicating query: INSERT INTO public.t1 (a, b) VALUES (1, 2)
2024/04/22 16:31:12 Sent Standby status message with WALWritePosition = 0/542E9219, WALApplyPosition = 0/542E9301
2024/04/22 16:31:12 XLogData (*pglogrepl.CommitMessage) => WALStart 0/542E9498 ServerWALEnd 0/542E9498 ServerTime 2024-04-22 16:31:12.856988 -0700 PDT
2024/04/22 16:31:12 CommitMessage: &{{67} 0 0/542E9468 0/542E9498 2024-04-22 16:31:12.85403 -0700 PDT}
2024/04/22 16:31:12 replicating query: COMMIT
2024/04/22 16:31:12 Writing LSN 0/542E9468 to file
Now when I connect to the Doltgres replica with psql
, I can see that the table has the expected contents:
doltgres=> select * from t1;
a | b
---+----
3 | 10
(1 row)
And I can query the history to see how the data changed on the primary. Here's the log of all changes:
doltgres=> select * from dolt_log;
commit_hash | committer | email | date | message
----------------------------------+---------------+------------------+---------------------+----------------------------
gdi6ioptbm6qda4uuqt38mnh01qv3bef | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:31:51 | Transaction commit
79ap78nes15k6ta5dknp6cqgbglutk4d | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:31:35 | Transaction commit
3l3mami9v0qu1ggel39d4smsl9j27aia | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:31:27 | Transaction commit
nv6899loipsbm3955vnvo8g1o74gaqho | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:31:12 | Transaction commit
65oq43mtpghr8376an0nhthad868nt74 | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:30:05 | Transaction commit
54vfim5l5mccsskhlfl97l4egpl84087 | Zach Musgrave | zach@dolthub.com | 2024-03-18 22:17:49 | Initialize data repository
(6 rows)
And here's examining a few of the diffs in these transactions:
doltgres=> select * from dolt_diff('HEAD~', 'HEAD', 't1');
to_a | to_b | to_commit | to_commit_date | from_a | from_b | from_commit | from_commit_date | diff_type
------+------+-----------+-------------------------+--------+--------+-------------+-------------------------+-----------
3 | 10 | HEAD | 2024-04-22 23:31:51.382 | 3 | 4 | HEAD~ | 2024-04-22 23:31:35.073 | modified
(1 row)
doltgres=> select * from dolt_diff('HEAD~2', 'HEAD', 't1');
to_a | to_b | to_commit | to_commit_date | from_a | from_b | from_commit | from_commit_date | diff_type
------+------+-----------+-------------------------+--------+--------+-------------+-------------------------+-----------
| | HEAD | 2024-04-22 23:31:51.382 | 1 | 2 | HEAD~2 | 2024-04-22 23:31:27.714 | removed
3 | 10 | HEAD | 2024-04-22 23:31:51.382 | 3 | 4 | HEAD~2 | 2024-04-22 23:31:27.714 | modified
(2 rows)
All of the various table
functions and system
tables to explore the
history of the tables work as you would expect. For example, if I want to see the table as it
existed at every commit in its history, I can use the dolt_history_t1
system table:
doltgres=> select * from dolt_history_t1;
a | b | commit_hash | committer | commit_date
---+----+----------------------------------+---------------+---------------------
3 | 10 | gdi6ioptbm6qda4uuqt38mnh01qv3bef | Zach Musgrave | 2024-04-22 23:31:51
3 | 4 | 79ap78nes15k6ta5dknp6cqgbglutk4d | Zach Musgrave | 2024-04-22 23:31:35
1 | 2 | 3l3mami9v0qu1ggel39d4smsl9j27aia | Zach Musgrave | 2024-04-22 23:31:27
3 | 4 | 3l3mami9v0qu1ggel39d4smsl9j27aia | Zach Musgrave | 2024-04-22 23:31:27
1 | 2 | nv6899loipsbm3955vnvo8g1o74gaqho | Zach Musgrave | 2024-04-22 23:31:12
(5 rows)
Conclusion
Doltgres is free and open source, so go check it out if you're interested in replicating your PostgreSQL primary to a data store with built-in diff capabilities. Doltgres isn't ready for production use yet, but because this feature works alongside your existing Postgres installation, it's zero risk to try it out. Doltgres can be a great way to understand and keep track of how your Postgres data is changing over time.
Have questions about DoltgreSQL or Postgres replication? Join us on Discord to talk to our engineering team and meet other Dolt users.