Doltgres Data Imports
Lately, we've been hard at work building Doltgres, our version-controlled Postgres-compatible database. Doltgres gives you the power of a SQL relational database, combined with the distributed version-control features of Git. Doltgres is currently in Alpha, and we're getting close to hitting our Beta milestone. In this post, we're focusing on one specific Beta milestone requirement: support for importing Postgres data dumps.
Postgres Data Dumps
When new customers come to try out Dolt or Doltgres, one of the very first things they usually want to do is load a data dump from an existing database and see how well things work. If a potential customer can't get this first step working quickly, we find they typically aren't willing to spend much more time trying. They often don't reach out for help or file issues on GitHub; they just move on. So getting imports to load cleanly was one of our initial big areas of focus with Dolt, and we're giving Doltgres the same attention now.
Supporting database dump files is challenging because database schemas often use a LOT of features across SQL's large surface area, particularly DDL. For customers to import dumps from a wide variety of real-world schemas, we need broad support across Postgres' DDL syntax.
We've been mining dump files from the internet, testing them, and then cataloging, prioritizing, and fixing the gaps we find. There's been a LOT of progress closing those gaps, and we've now reached the point where the first of these Postgres dumps loads cleanly into Doltgres. In the next section, we'll demo importing that dump.
Import Demo
Let's walk through a demo of importing a PostgreSQL dump from the internet into Doltgres. We'll be using the french-towns-communes-francaises dataset from the morenoh149/postgresDBSamples repository on GitHub. This dataset was originally part of the now-defunct pgFoundry site, which hosted many Postgres resources and datasets.
If you don't have the `doltgres` binary on your machine yet, go ahead and install it.
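One way to do that, as a hedged sketch: download a release tarball from GitHub and put the binary on your PATH. The asset name below is a placeholder; check the releases page for the right file for your platform.

```bash
# Hedged install sketch for a Unix-like machine. "<platform-asset>" is a
# placeholder; pick the actual tarball for your OS/architecture from
# https://github.com/dolthub/doltgresql/releases
curl -LO "https://github.com/dolthub/doltgresql/releases/latest/download/<platform-asset>.tar.gz"
tar -xzf "<platform-asset>.tar.gz"
# Copy the extracted doltgres binary somewhere on your PATH:
sudo cp ./doltgres /usr/local/bin/
```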
Next, start up the `doltgres` server:
> doltgres
INFO[0000] Creating root@localhost superuser
INFO[0000] Server ready. Accepting connections.
WARN[0000] secure_file_priv is set to "", which is insecure.
WARN[0000] Any user with GRANT FILE privileges will be able to read any file which the sql-server process can read.
WARN[0000] Please consider restarting the server with secure_file_priv set to a safe (or non-existent) directory.
Now that we've got a Doltgres server running, let's import some data! We first use `wget` to download the raw french-towns-communes-francaises.sql file from GitHub, then we use the Postgres SQL shell, `psql`, to feed those SQL statements into the Doltgres server:
wget "https://raw.githubusercontent.com/morenoh149/postgresDBSamples/refs/heads/master/french-towns-communes-francaises/french-towns-communes-francaises.sql"
PGPASSWORD=password psql -Upostgres -hlocalhost < french-towns-communes-francaises.sql
If everything works correctly, you should see the following output in your terminal, indicating what statements are being executed from the SQL file:
BEGIN
CREATE TABLE
CREATE TABLE
CREATE TABLE
SET
SET
SET
COPY 26
COPY 100
COPY 36684
pg_catalog.setval
-------------------
26
(1 row)
pg_catalog.setval
-------------------
100
(1 row)
pg_catalog.setval
-------------------
36684
(1 row)
COMMIT
ANALYZE
ANALYZE
ANALYZE
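If a dump doesn't load cleanly for you, one handy trick is psql's standard ON_ERROR_STOP variable, which makes psql halt at the first failed statement instead of continuing past errors. That makes it much easier to pinpoint exactly which statement a dump trips on:

```bash
# Same import command as above, but stop at the first error.
PGPASSWORD=password psql -Upostgres -hlocalhost -v ON_ERROR_STOP=1 < french-towns-communes-francaises.sql
```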
From here, we can log into the database, look around, and run some sample queries:
PGPASSWORD=password psql -Upostgres -hlocalhost
psql (15.12 (Homebrew), server 15.0)
Type "help" for help.
postgres=> \dt
List of relations
Schema | Name | Type | Owner
--------+-------------+-------+----------
public | departments | table | postgres
public | regions | table | postgres
public | towns | table | postgres
(3 rows)
postgres=> select count(*) from towns;
count
-------
36684
(1 row)
postgres=> select * from towns limit 1;
id | code | article | name | department
-------+------+---------+-----------+------------
16070 | 309 | | Souprosse | 40
(1 row)
Now that we've confirmed our data import looks good, let's create a Doltgres commit. This creates a commit in our database's commit graph that we can always refer back to:
call dolt_commit('-Am', 'Initial data import for french-towns-communes-francaises dataset');
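As a quick sanity check, we can also query the `dolt_status` system table (assuming Doltgres exposes the same `dolt_status` table that Dolt does). An empty result means the working set is clean and everything made it into the commit:

```sql
-- No rows returned means there are no uncommitted changes left.
select * from dolt_status;
```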
We can take a look at the `dolt_log` system table to see the commit log for our current branch:
postgres=> select commit_hash, date, message from dolt_log;
commit_hash | date | message
----------------------------------+---------------------+------------------------------------------------------------------
l9iltg5a2667l4kosag49sh0tn4srvmj | 2025-03-19 22:27:00 | Initial data import for french-towns-communes-francaises dataset
8aht4u32pjbkmgiqqj7ql2uetq9j7a6g | 2025-03-19 22:25:32 | CREATE DATABASE
rikg506s6mbkeq9lould58cuelltavnm | 2025-03-19 22:25:32 | Initialize data repository
(3 rows)
And sure enough, we see our new commit is the most recent commit reported in the commit log.
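That commit hash is what lets us refer back to this exact state later. Assuming Doltgres supports the same `AS OF` syntax that Dolt uses for point-in-time queries, we can read the towns table exactly as it existed at that commit (the hash below is the one from our log above):

```sql
-- Query the table as of a specific commit in the commit graph.
select count(*) from towns as of 'l9iltg5a2667l4kosag49sh0tn4srvmj';
```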
What's Next?
As excited as we are about getting a first Postgres data dump from the internet to cleanly load into Doltgres, we're not stopping to celebrate just yet. We've been testing a lot of data dumps, and there are many more gaps we're targeting to knock out quickly.
In addition to lots of smaller features we've got in our backlog, here are a few of the larger features we're looking at next:
- `TEXT` types in primary, secondary, and foreign keys – Postgres uses a clever approach, named TOAST, that allows efficient storage of large text or binary content by storing it inline in a row when it's small enough and moving larger content to out-of-row storage. This enables `TEXT` types to be stored and searched efficiently enough to be used in primary and secondary indexes, as well as in foreign keys. `TEXT` types are commonly used in Postgres indexes, so we're excited to deliver this and support more Postgres schemas in Doltgres (see the sketch after this list). We've already got a working prototype for this support, so stay tuned for an announcement on this one soon.
- User-defined PL/pgSQL functions – We recently released initial support for user-defined PL/pgSQL functions. We've got many of the basic statements covered (e.g. loops, branching, assignment, selecting from queries), but there are still a few areas we're fleshing out (e.g. exception handling, variable aliases, cursors).
- Triggers – Now that we've got support for user-defined functions, using those functions as triggers on tables is a logical next step. Triggers are a widely used and powerful feature in relational databases, so we're excited to get them working in Doltgres soon.
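To make these items concrete, here's a sketch of the kind of vanilla Postgres DDL these features will let Doltgres import. Nothing below is Doltgres-specific; it's just the sort of schema real dumps commonly contain:

```sql
-- A TEXT column used as a primary key (what the TOAST-backed TEXT index work enables):
CREATE TABLE articles (
    slug       TEXT PRIMARY KEY,
    body       TEXT NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- A small PL/pgSQL function...
CREATE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- ...attached to the table as a trigger, the logical next step described above:
CREATE TRIGGER articles_touch
    BEFORE UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION touch_updated_at();
```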
We Need Your Help!
Doltgres has been making a TON of progress in the past few months, and we'll keep announcing support for new features going forward.
You can help us out by taking Doltgres for a spin and sharing feedback on any gaps or problems you hit! Try importing your Postgres schemas to Doltgres and let us know how it goes. If you can share your schema file with us, that's even better! If you don't have an existing Postgres data dump, you can try out the steps in the getting started guide.
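If you'd like to generate a fresh dump from an existing Postgres database to test with, plain `pg_dump` SQL output is the kind of file we've been exercising. A minimal example (`mydb` is a placeholder database name):

```bash
# Dump schema and data as plain SQL. --no-owner and --no-privileges skip
# role-specific statements that don't matter when loading into a fresh server.
pg_dump --no-owner --no-privileges -h localhost -U postgres mydb > mydb-dump.sql
```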
Come join us on the DoltHub Discord server and let us know what's missing for you, or open up an issue on GitHub for us.