Announcing Postgres to Doltgres Replication

April 23, 2024

5 min read

Introduction

Last month we published a deep dive on how to build a Postgres logical replication system in Go. Today we're excited to announce the release of this feature, available in the latest version of Doltgres.

Why replicate from Postgres to Doltgres?

We built replication from Postgres to Doltgres because an early customer wanted to track how the data in his primary Postgres database was changing over time. This is a great use case for Doltgres in its pre-beta-release days: you can use it in addition to your primary Postgres server, instead of switching database products to one not quite ready for production. It's effectively free in terms of performance from the perspective of your existing Postgres installation, and you get a complete version history of every change made to your primary database that you can query with the standard set of Dolt functions, like DOLT_DIFF().

Let's see how to set it up.

Enabling Postgres to Doltgres replication

Getting Doltgres replicating from Postgres involves making configuration changes on both Postgres and the Doltgres sides. All of this is detailed in the official docs, but here's a quick overview.

First, turn on the correct level of replication in Postgres. This requires a server restart.

% echo "wal_level = logical" >> /var/lib/postgresql/data/postgresql.conf

Then run some commands as the postgres super user to create some replication slots on the primary:

% export DBNAME=postgres
% export TABLE=t1
% export SLOTNAME=doltgres_slot
% psql "dbname=$DBNAME replication=database" -c "CREATE PUBLICATION $SLOTNAME FOR TABLE $TABLE;"
% psql "dbname=$DBNAME replication=database" -c "CREATE_REPLICATION_SLOT $SLOTNAME LOGICAL pgoutput;"

Finally, start up doltgres with an appropriate config file that enables replication.

log_level: info

behavior:
  read_only: false
  autocommit: true
  persistence_behavior: load
  dolt_transaction_commit: true

user:
  name: "doltgres"
  password: "password"

listener:
  host: localhost
  port: 5433
  max_connections: 100
  read_timeout_millis: 28800000
  write_timeout_millis: 28800000

postgres_replication:
  postgres_server_address: 127.0.0.1
  postgres_user: postgres
  postgres_password: password
  postgres_database: postgres
  postgres_port: 5432
  slot_name: doltgres_slot

Let's see a demo of this in action.

Demo

First I created a table t1 on both the primary and the replica:

create table t1 (a int primary key, b int);

Then I started replication on the primary for this table using the psql commands above. Finally, I started the Doltgres replica with the config file above. Then I connected to the primary Postgres server with psql and started issuing some queries.

postgres=# insert into t1 values (1, 2);
INSERT 0 1
postgres=# insert into t1 values (3, 4);
INSERT 0 1
postgres=# delete from t1 where a = 1;
DELETE 1
postgres=# update t1 set b = 10;
UPDATE 1

I see a bunch of replication information in the log for the Doltgres replica:

2024/04/22 16:31:12 BeginMessage: &{{66} 0/542E9468 2024-04-22 16:31:12.85403 -0700 PDT 1882880}
2024/04/22 16:31:12 replicating query: START TRANSACTION
2024/04/22 16:31:12 Sent Standby status message with WALWritePosition = 0/542E9219, WALApplyPosition = 0/542E9301
2024/04/22 16:31:12 XLogData (*pglogrepl.RelationMessageV2) => WALStart 0/0 ServerWALEnd 0/0 ServerTime 2024-04-22 16:31:12.856936 -0700 PDT
2024/04/22 16:31:12 Sent Standby status message with WALWritePosition = 0/542E9219, WALApplyPosition = 0/1
2024/04/22 16:31:12 XLogData (*pglogrepl.InsertMessageV2) => WALStart 0/542E9300 ServerWALEnd 0/542E9300 ServerTime 2024-04-22 16:31:12.856975 -0700 PDT
2024/04/22 16:31:12 replicating query: INSERT INTO public.t1 (a, b) VALUES (1, 2)
2024/04/22 16:31:12 Sent Standby status message with WALWritePosition = 0/542E9219, WALApplyPosition = 0/542E9301
2024/04/22 16:31:12 XLogData (*pglogrepl.CommitMessage) => WALStart 0/542E9498 ServerWALEnd 0/542E9498 ServerTime 2024-04-22 16:31:12.856988 -0700 PDT
2024/04/22 16:31:12 CommitMessage: &{{67} 0 0/542E9468 0/542E9498 2024-04-22 16:31:12.85403 -0700 PDT}
2024/04/22 16:31:12 replicating query: COMMIT
2024/04/22 16:31:12 Writing LSN 0/542E9468 to file

Now when I connect to the Doltgres replica with psql, I can see that the table has the expected contents:

doltgres=> select * from t1;
 a | b
---+----
 3 | 10
(1 row)

And I can query the history to see how the data changed on the primary. Here's the log of all changes:

doltgres=> select * from dolt_log;
           commit_hash            |   committer   |      email       |        date         |          message
----------------------------------+---------------+------------------+---------------------+----------------------------
 gdi6ioptbm6qda4uuqt38mnh01qv3bef | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:31:51 | Transaction commit
 79ap78nes15k6ta5dknp6cqgbglutk4d | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:31:35 | Transaction commit
 3l3mami9v0qu1ggel39d4smsl9j27aia | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:31:27 | Transaction commit
 nv6899loipsbm3955vnvo8g1o74gaqho | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:31:12 | Transaction commit
 65oq43mtpghr8376an0nhthad868nt74 | Zach Musgrave | zach@dolthub.com | 2024-04-22 23:30:05 | Transaction commit
 54vfim5l5mccsskhlfl97l4egpl84087 | Zach Musgrave | zach@dolthub.com | 2024-03-18 22:17:49 | Initialize data repository
(6 rows)

And here's examining a few of the diffs in these transactions:

doltgres=> select * from dolt_diff('HEAD~', 'HEAD', 't1');
 to_a | to_b | to_commit |     to_commit_date      | from_a | from_b | from_commit |    from_commit_date     | diff_type
------+------+-----------+-------------------------+--------+--------+-------------+-------------------------+-----------
    3 |   10 | HEAD      | 2024-04-22 23:31:51.382 |      3 |      4 | HEAD~       | 2024-04-22 23:31:35.073 | modified
(1 row)

doltgres=> select * from dolt_diff('HEAD~2', 'HEAD', 't1');
 to_a | to_b | to_commit |     to_commit_date      | from_a | from_b | from_commit |    from_commit_date     | diff_type
------+------+-----------+-------------------------+--------+--------+-------------+-------------------------+-----------
      |      | HEAD      | 2024-04-22 23:31:51.382 |      1 |      2 | HEAD~2      | 2024-04-22 23:31:27.714 | removed
    3 |   10 | HEAD      | 2024-04-22 23:31:51.382 |      3 |      4 | HEAD~2      | 2024-04-22 23:31:27.714 | modified
(2 rows)

All of the various table functions and system tables to explore the history of the tables work as you would expect. For example, if I want to see the table as it existed at every commit in its history, I can use the dolt_history_t1 system table:

doltgres=> select * from dolt_history_t1;
 a | b  |           commit_hash            |   committer   |     commit_date
---+----+----------------------------------+---------------+---------------------
 3 | 10 | gdi6ioptbm6qda4uuqt38mnh01qv3bef | Zach Musgrave | 2024-04-22 23:31:51
 3 |  4 | 79ap78nes15k6ta5dknp6cqgbglutk4d | Zach Musgrave | 2024-04-22 23:31:35
 1 |  2 | 3l3mami9v0qu1ggel39d4smsl9j27aia | Zach Musgrave | 2024-04-22 23:31:27
 3 |  4 | 3l3mami9v0qu1ggel39d4smsl9j27aia | Zach Musgrave | 2024-04-22 23:31:27
 1 |  2 | nv6899loipsbm3955vnvo8g1o74gaqho | Zach Musgrave | 2024-04-22 23:31:12
(5 rows)

Conclusion

Doltgres is free and open source, so go check it out if you're interested in replicating your PostgreSQL primary to a data store with built-in diff capabilities. Doltgres isn't ready for production use yet, but because this feature works alongside your existing Postgres installation, it's zero risk to try it out. Doltgres can be a great way to understand and keep track of how your Postgres data is changing over time.

Have questions about DoltgreSQL or Postgres replication? Join us on Discord to talk to our engineering team and meet other Dolt users.

Blog

Introduction

Why replicate from Postgres to Doltgres?

Enabling Postgres to Doltgres replication

Demo

Conclusion

Get started with Dolt