Accessing Dolt Data with the Postgres mysql_fdw
Dolt is a full featured SQL database which supports Git-like version control features, including branch, diff, merge, clone, push and pull. It's wire protocol and supported SQL dialect are based on MySQL.
Here at DoltHub, it's not uncommon for our users to be software firms that are already heavily invested in PostreSQL. We've recently released changes to Dolt to improve its compatibility with the MySQL Foreign Data Wrapper for PostgreSQL. Using the foreign data wrapper, you can easily access your Dolt tables from your existing PostgreSQL database and do things like perform queries which join Dolt tables with Postgres tables and run DML operations against Dolt tables.
This is a short blog post to show how to get started with mysql_fdw running against a Dolt SQL server.
Overview
Foreign data wrappers are extensions to Postgres which allow a given server to access data which resides outside of the Postgres RDBMS using regular SQL queries. There are a variety of such extensions, including extensions which allow it to access various other RDBMS systems (MySQL, Postgres, SQLite, Oracle, etc.), access plain text files as if they were relational data (CSV, JSON, Parquet, etc.), access various non-relational databases (LDAP, DynamoDB, Redis, etc.), and even access arbitrary internet protocols (Git, IMAP, www, etc.). Suffice to say, it's a very powerful model and there is some really cool stuff out there.
Because Dolt speaks the MySQL wire protocol, you can utilize the MySQL foreign data wrapper to access your Dolt tables within from within your Postgres database. In order to see it in action, we will do the following things:
-
Get a dolt database with some interesting data. We will be working with a
clone
of a bounties database. -
Run a Dolt sql-server instance to which the Postgres server will connect.
-
Run a Postgres server with the mysql_fdw installed. We will be working with a docker image which already has postgres and the FDW installed.
-
Run the appropriate DDL commands against the PostgreSQL server in order to tell it how to the remote Dolt server, which tables it should expect to find, and what their schema will be.
At the end of our demo, mysql_fdw is running within the postgres server to allow for access to the tables in the foreign dolt database. The running server instances look like:
Let's jump in and see it all in action.
Get a Dolt database
First we will clone an existing bounties database from dolthub.com. Make
certain you are working with the latest dolt release, which includes the
changes to improve compatibility with mysql_fdw
:
$ dolt clone dolthub/standard-charge-files
cloning https://doltremoteapi.dolthub.com/dolthub/standard-charge-files
$ cd standard-charge-files
Run dolt sql-server
Now we will run dolt sql-server
, exposing the standard_charge_files
database. We want the root
user on the database to have a password, so we
will create a small configuration file to run it against:
$ DOLT_ROOT_PASSWORD=`openssl rand -hex 16`
$ cat > config.yaml <<EOF
user:
name: root
password: "$DOLT_ROOT_PASSWORD"
EOF
$ dolt sql-server --config config.yaml &
To make sure everything is working, we can run a quick query:
$ echo 'select count(*) from hospitals;' | mysql -u root --password="$DOLT_ROOT_PASSWORD" standard_charge_files
mysql: [Warning] Using a password on the command line interface can be insecure.
count(*)
7232
Run Postgres and Configure mysql_fdw
For ease of demonstration, we're going to run Postgres from a docker image with
mysql_fdw
already installed.
$ export POSTGRES_PASSWORD=`openssl rand -hex 16`
$ docker run --publish 5432:5432 -d --rm -e POSTGRES_PASSWORD toleg/postgres_mysql_fdw:15.2_fdw2.9.0
$ PGPASSWORD="$POSTGRES_PASSWORD" psql -h 127.0.0.1 --user postgres -v DOLT_ROOT_PASSWORD="$DOLT_ROOT_PASSWORD"
Type "help" for help.
postgres=# \dt
Did not find any relations.
Now we can use mysql_fdw to connect to our dolt sql-server:
postgres=# create extension mysql_fdw;
CREATE EXTENSION
postgres=# CREATE SERVER dolt_server FOREIGN DATA WRAPPER mysql_fdw OPTIONS (host 'host.docker.internal', port '3306');
CREATE SERVER
postgres=# CREATE USER MAPPING FOR postgres SERVER dolt_server OPTIONS (username 'root', password :'DOLT_ROOT_PASSWORD');
CREATE USER MAPPING
At this point, we can import the entire schema of the foreign tables, if we want:
postgres=# IMPORT FOREIGN SCHEMA standard_charge_files FROM SERVER dolt_server INTO public;
IMPORT FOREIGN SCHEMA
This will import foreign table definitions for every table in the
standard_charge_files
database. We can inspect the imported tables with psql directive \dE
:
postgres=# \dE+
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+----------------+---------------+----------+-------------+---------+-------------
public | additional_npi | foreign table | postgres | permanent | 0 bytes |
public | hospitals | foreign table | postgres | permanent | 0 bytes |
public | npi_tin | foreign table | postgres | permanent | 0 bytes |
(3 rows)
And we can run queries against the remote tables:
postgres=# select count(*) from npi_tin;
count
-------
10164
(1 row)
postgres=# select npi, tin, standard_charge_file_url from hospitals natural join npi_tin where standard_charge_file_url <> '' limit 10;
npi | tin | standard_charge_file_url
------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------
1487866315 | 560529945 | https://chsprodrsrcegrporg385.blob.core.windows.net/cms-price-transparency/560529945_atrium-health-cabarrus_standardcharges.json
1487866315 | 560529945 | https://chsprodrsrcegrporg385.blob.core.windows.net/cms-price-transparency/560529945_atrium-health-cabarrus_standardcharges.json
1497859649 | 370960170 | https://www.nm.org/-/media/northwestern/resources/patients-and-visitors/billing-and-insurance/northwestern-memorial-hospital.xlsx
1497859649 | 370960170 | https://www.nm.org/-/media/northwestern/resources/patients-and-visitors/billing-and-insurance/northwestern-memorial-hospital.xlsx
1558463745 | 352007327 | https://www.comhs.org/-/media/Files/PriceTransparency/St%20Mary%20Medical%20Center_standardcharges2021.ashx
1114924834 | 221487330 | https://www.saintpetershcs.com/SaintPeters/media/SaintPeters/About%20Us/SaintPeters_Hospital-Transparency-Upload-121621.pdf
1114924834 | 221487330 | https://www.saintpetershcs.com/SaintPeters/media/SaintPeters/About%20Us/SaintPeters_Hospital-Transparency-Upload-121621.pdf
1669424354 | 455540852 | https://www.legacyhealth.org/-/media/Files/CSV/Price-Transparency/93-0281321_legacy-silverton-medical-center_standardcharges.json
1346753118 | 823349983 | https://kingwood247er.com/wp-content/uploads/2022/05/823349983_Kingwood-Emergency-Hospital_Standardcharges.csv
1124021811 | 710246565 | https://irp.cdn-website.com/d9154e9e/files/uploaded/charge%20master%205.2.pdf
(10 rows)
Getting Dolt Specific
So far, all of this functionality is available against any RDBMS being accessed
from a foreign data wrapper. We're excited that dolt's compatibility with
mysql_fdw
has recently improved, but one major reason people use Dolt is to
gain access to its version control features, and those aren't being showcased
above. The reality is that this approach is great for read-only use cases, but
there are some existing limitations which make dolt's version control features
less accessible when accessing them through Postgres + mysql_fdw
:
-
Some dolt features rely on stored procedures, like
call dolt_commit()
,call dolt_merge()
andcall dolt_checkout()
. As far as we know, these are currently unaccessible frommysql_fdw
. -
Some dolt features rely on dynamic schema (database) designations, in the form of
database_name/branch_name
, for example, to access different branches and revisions of the database.
Importing dolt-specific metadata tables
Depending on your use case, these constraints may be a bit of a show stopper.
There are some ways to work within them a bit. For example, you can easily
manually define a foreign table for a few of of the built-in Dolt tables like
dolt_status
and dolt_log
:
postgres=# CREATE FOREIGN TABLE dolt_status (
table_name text NOT NULL,
staged smallint NOT NULL,
status text NOT NULL
) SERVER dolt_server
OPTIONS (dbname 'standard_charge_files', table_name 'dolt_status');
CREATE FOREIGN TABLE
postgres=# CREATE FOREIGN TABLE dolt_log (
commit_hash text NOT NULL,
committer text NOT NULL,
email text NOT NULL,
date timestamp NOT NULL,
message text NOT NULL
) SERVER dolt_server
OPTIONS (dbname 'standard_charge_files', table_name 'dolt_log');
CREATE FOREIGN TABLE
postgres=# SELECT * FROM dolt_status;
table_name | staged | status
------------+--------+--------
(0 rows)
postgres=# SELECT * FROM dolt_log LIMIT 1;
commit_hash | committer | email | date | message
----------------------------------+-----------+------------------+---------------------+----------------------
e4j80uq3v2qj3jus3goajnas11ubtk0f | spacelove | alec@dolthub.com | 2023-04-05 18:44:49 | Dolthub User: rl1987+
| | | | Accepted PR: 6
(1 row)
Using mysql_fdw
, we can run DML against any MySQL table where the first column
is a UNIQUE or PRIMARY KEY. After we run the DML, we can see the changes
reflected in dolt_status
.
postgres=# INSERT INTO hospitals (enrollment_id) VALUES ('-1');
INSERT 0 1
postgres=# SELECT * FROM dolt_status;
table_name | staged | status
------------+--------+----------
hospitals | 0 | modified
(1 row)
postgres=# DELETE FROM hospitals WHERE enrollment_id = '-1';
DELETE 1
postgres=# SELECT * FROM dolt_status;
table_name | staged | status
------------+--------+--------
(0 rows)
Importing tables from a non-default branch
We can also import the tables from a specific non-default branch or commit by specifying the branch in the database name. For example:
postgres=# CREATE FOREIGN TABLE tuopuh_npi_tin (
npi int,
tin varchar(16383)
) SERVER dolt_server
OPTIONS (dbname 'standard_charge_files/tuopuhsicl3i2kblhkdpdr8ik3vmjvf8', table_name 'npi_tin');
CREATE FOREIGN TABLE
postgres-# \dE
List of relations
Schema | Name | Type | Owner
--------+----------------+---------------+----------
public | additional_npi | foreign table | postgres
public | dolt_log | foreign table | postgres
public | dolt_status | foreign table | postgres
public | hospitals | foreign table | postgres
public | npi_tin | foreign table | postgres
public | tuopuh_npi_tin | foreign table | postgres
(6 rows)
postgres=# SELECT count(*) FROM tuopuh_npi_tin;
count
-------
10164
(1 row)
This same technique works for dolt_log
or dolt_status
against a different
Dolt branch HEAD as well. If you have just a few Dolt HEADs you need to access
on the read path of your integration with your existing Postgres database,
this can be a very workable approach. It would perhaps be more practical to
import the various branch HEADs as foreign tables in their own, HEAD specific
Postgres schemas, and then to query across them within the Postgres context.
Overall we can access a number of versioned features of the Dolt database from
within Postgres using the mysql_fdw
, but write-heavy workloads and workloads
which require access to advanced version control features like branching and
diffs remain most accessible using the MySQL protocol against the Dolt
sql-server instance.
We know potential customers are committed to Postgres and we will continue to
work on better Dolt support for the Postgres use case. We are excited to expose
normal Dolt tables and Dolt system tables to Postgres servers through the
mysql_fdw
.
Conclusion
We saw how to run a postgres server with mysql_fdw
installed and how to use
mysql_fdw to connect to a dolt sql-server instance. We imported the tables in
the Dolt server as Postgres foreign tables and were table to run queries
against them, including writes when the table's schema met specific
constraints. We saw how to import Dolt specific metadata tables and specific
branches and revisions of the foreign Dolt database as well.