Creating multiple databases in Dolt

FEATURE RELEASESQL
6 min read

Dolt is a SQL database with git-like version control capabilities, the only SQL database you can branch and merge, clone and fork, push and pull like a git repository. Until very recently, Dolt was lacking a very basic ability that almost every other SQL database has: the ability to CREATE and DROP databases via SQL statements. Today we're excited to announce that this functionality is available in the most recent release.

This blog post is about how the new feature works, and a discussion of the design decisions that led to us delaying releasing the feature for so long.

Dolt database == git repository?

Dolt is git for data, and early in our interface design we leaned very heavily into the git metaphor, copying git exactly. If you know a command for git, it works the same for dolt. You can even google questions about git, and the answers will be applicable to identical problems in dolt.

% git clone myOrg/myRepo            % dolt clone myOrg/myDb
% git checkout -b feature-branch    % dolt checkout -b feature-branch
% git add file                      % dolt add table
% git commit -m "message"           % dolt commit -m "message"

And of course, to initialize a dolt database, you didn't spin up a database server and execute create database myDb. You called dolt init, just like you would to initialize a git repo.

% git init            % dolt init

This was a stumbling block for many customers, especially those that came to us with data from another database they were trying to import into dolt. Most of them wanted to use some standard database dump tool, like pg_dump or mysqldump, to export their data. And the first thing most of those tools do is:

DROP DATABASE IF EXISTS myDb;
CREATE DATABASE myDb;

We used a couple hacky workarounds to make these statements parse with Dolt but not actually do anything, so they worked with the dump and import workflow. But you still had to dolt init before you started your SQL server, or it wouldn't work.

In fact, if you tried to run dolt sql or dolt sql-server outside of a dolt database directory, you would see an error:

% dolt sql
The current directory is not a valid dolt repository.
run: dolt init before trying to run this command

In retrospect, hewing so close to the git standard, while a good choice for getting our customers to quickly understand the CLI and versioning semantics, was a mistake for the CREATE DATABASE use case. And it was long overdue to be corrected.

dolt sql outside a dolt database directory

Now, when you run dolt sql in any directory, it just works. If you ran the command in a directory with no databases, then there are no databases present in the shell. If you create new ones, they get created in subdirectories of the current one.

% dolt sql
# Welcome to the DoltSQL shell.
# Statements must be terminated with ';'.
# "exit" or "quit" (or Ctrl-D) to exit.
> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
+--------------------+
> create database mydb;
> use mydb;
Database changed
mydb> create table test (a int primary key);
mydb> insert into test values (1);
Query OK, 1 row affected
mydb> select dolt_commit('-am', 'new table');
+----------------------------------+
| dolt_commit('-am', 'new table')  |
+----------------------------------+
| 13v8ns8rdu0c1ac4rbql095m1pcfh46m |
+----------------------------------+
mydb> create database foo;
mydb> use foo;
Database changed
foo> create table test2 (a int primary key);
foo> show tables;
+-------+
| Table |
+-------+
| test2 |
+-------+
foo> exit
Bye

Let's examine what these commands did. Both new databases were created in new directories rooted at the current working directory. We can cd into those directories to examine them:

zachmu@zachmu-laptop:Users/zachmu/sql% ls
foo  mydb
zachmu@zachmu-laptop:Users/zachmu/sql% cd mydb
zachmu@zachmu-laptop:zachmu/sql/mydb% dolt status
On branch main
nothing to commit, working tree clean
zachmu@zachmu-laptop:zachmu/sql/mydb% dolt log
commit 13v8ns8rdu0c1ac4rbql095m1pcfh46m
Author: Zach Musgrave <zach@dolthub.com>
Date:   Sat Nov 20 16:45:28 -0800 2021

        new table

commit 2d7l94ummql83qrap9dthblmgn7eoga5
Author: Zach Musgrave <zach@dolthub.com>
Date:   Sat Nov 20 16:44:50 -0800 2021

        Initialize data repository

zachmu@zachmu-laptop:zachmu/sql/mydb% cd ../foo
zachmu@zachmu-laptop:zachmu/sql/foo% dolt status
On branch main
Untracked files:
  (use "dolt add <table|doc>" to include in what will be committed)
        new table:      test2

dolt sql (and dolt sql-server) now automatically look for databases in the current working directory, and in subdirectories of that directory. Let's see what happens when we run dolt sql from the new foo database we created above.

zachmu@zachmu-laptop:Users/zachmu/sql% cd foo
zachmu@zachmu-laptop:zachmu/sql/foo% pwd
/c/Users/zachmu/sql/foo
zachmu@zachmu-laptop:zachmu/sql/foo% dolt sql -q 'show databases';
+--------------------+
| Database           |
+--------------------+
| foo                |
| information_schema |
+--------------------+
zachmu@zachmu-laptop:zachmu/sql/foo% dolt sql -q 'create database newDb';
zachmu@zachmu-laptop:zachmu/sql/foo% ls
newDb
zachmu@zachmu-laptop:zachmu/sql/foo% dolt sql -q 'show databases';
+--------------------+
| Database           |
+--------------------+
| foo                |
| information_schema |
| newDb              |
+--------------------+
zachmu@zachmu-laptop:zachmu/sql/foo%

So in the above example, the database foo is rooted in the directory /c/Users/zachmu/sql/foo, and any new databases created get their own subdirectory there. Because we started the sql shell in /c/Users/zachmu/sql/foo, the newDb database directory gets created at /c/Users/zachmu/sql/foo/newDb.

The nesting only works one level. If we cd .. back to our original db directory, we only see the two original databases we created, not newDb.

achmu@zachmu-laptop:zachmu/sql/foo% cd ../
zachmu@zachmu-laptop:Users/zachmu/sql% pwd
/c/Users/zachmu/sql
zachmu@zachmu-laptop:Users/zachmu/sql% dolt sql -q "show databases";
+--------------------+
| Database           |
+--------------------+
| foo                |
| information_schema |
| mydb               |
+--------------------+

Dropping databases now works too, and physically deletes data

This was a point of contention in the office, because dolt's unique selling point is being a versioned database, and DROP DATABASE is unrecoverable. We went back and forth on this several times, debating whether DROP DATABASE should just do a logical delete (by moving the dropped database to a recycling bin or similar), or actually physically delete the database files. But the problem with not physically deleting the database files is that we would need to introduce special administrative procedures to do the physical delete, since that's what a customer expects DROP DATABASE to actually do. So for now, when you DROP DATABASE, it removes the directory from the filesystem.

zachmu@zachmu-laptop:Users/zachmu/sql% ls
foo  mydb
zachmu@zachmu-laptop:Users/zachmu/sql% dolt sql -q "drop database foo";
zachmu@zachmu-laptop:Users/zachmu/sql% ls
mydb

After releasing this feature we had a partial change of heart: we didn't want DROP DATABASE to be the only thing you can do to a dolt database that's not reversible via source control operations. So we reached a compromise: by default DROP DATABASE does nothing and prints a warning. To make it actually delete a database on disk, you have to set a special session variable that's off by default. This will become the default in our next release.

Same commit graph or separate?

In the above examples, all of the databases have their own commit graph and history, and are versioned independently. They cohabitate in the same server, but there's no way to create a branch across all of them, or commit an atomic update to more than one in the same transaction. Similarly, if you want to push them up to DoltHub, you have to run one push command each.

All of this is by design, and matches the functionality that our customers are asking for now.

But there's another way to think of multiple databases, which most RDBMS systems provide as an option. For example, in Postgres, when you CREATE DATABASE, by default what you're doing is mostly just creating a namespace. It's still possible to update multiple databases atomically in a single transaction. This is also true in MySQL (as long as they share the same storage engine).

In the future we anticipate that Dolt customers will want similar functionality, so that a dolt database will contain multiple schemas that share a commit graph, making it possible to branch across them all, or commit them all in the same transaction (or dolt commit). In fact, for most customers we think this should be the default behavior, and creating a database with a separate commit graph will be an unusual use case.

Conclusion

Have an opinion on multiple databases in dolt? Have an idea for how you might use dolt at your company but need some help getting started? Come over to our Discord to talk to the team and ask questions.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.