Creating multiple databases in Dolt
Dolt is a SQL database with git-like version
control capabilities, the only SQL database you can branch and merge,
clone and fork, push and pull like a git repository. Until very
recently, Dolt was lacking a very basic ability that almost every
other SQL database has: the ability to CREATE
and DROP
databases
via SQL statements. Today we're excited to announce that this
functionality is available in the most recent release.
This blog post is about how the new feature works, and a discussion of the design decisions that led to us delaying releasing the feature for so long.
Dolt database == git repository?
Dolt is git for data, and early in our interface design we leaned very
heavily into the git metaphor, copying git
exactly. If you know a
command for git
, it works the same for dolt
. You can even google
questions about git
, and the answers will be applicable to identical
problems in dolt
.
% git clone myOrg/myRepo % dolt clone myOrg/myDb
% git checkout -b feature-branch % dolt checkout -b feature-branch
% git add file % dolt add table
% git commit -m "message" % dolt commit -m "message"
And of course, to initialize a dolt database, you didn't spin up a
database server and execute create database myDb
. You called dolt init
, just like you would to initialize a git repo.
% git init % dolt init
This was a stumbling block for many customers, especially those that
came to us with data from another database they were trying to import
into dolt. Most of them wanted to use some standard database dump
tool, like pg_dump
or mysqldump
, to export their data. And the
first thing most of those tools do is:
DROP DATABASE IF EXISTS myDb;
CREATE DATABASE myDb;
We used a couple hacky workarounds to make these statements parse with
Dolt but not actually do anything, so they worked with the dump and
import workflow. But you still had to dolt init
before you started
your SQL server, or it wouldn't work.
In fact, if you tried to run dolt sql
or dolt sql-server
outside
of a dolt database directory, you would see an error:
% dolt sql
The current directory is not a valid dolt repository.
run: dolt init before trying to run this command
In retrospect, hewing so close to the git standard, while a good
choice for getting our customers to quickly understand the CLI and
versioning semantics, was a mistake for the CREATE DATABASE
use
case. And it was long overdue to be corrected.
dolt sql
outside a dolt database directory
Now, when you run dolt sql
in any directory, it just works. If you
ran the command in a directory with no databases, then there are no
databases present in the shell. If you create new ones, they get
created in subdirectories of the current one.
% dolt sql
# Welcome to the DoltSQL shell.
# Statements must be terminated with ';'.
# "exit" or "quit" (or Ctrl-D) to exit.
> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
+--------------------+
> create database mydb;
> use mydb;
Database changed
mydb> create table test (a int primary key);
mydb> insert into test values (1);
Query OK, 1 row affected
mydb> select dolt_commit('-am', 'new table');
+----------------------------------+
| dolt_commit('-am', 'new table') |
+----------------------------------+
| 13v8ns8rdu0c1ac4rbql095m1pcfh46m |
+----------------------------------+
mydb> create database foo;
mydb> use foo;
Database changed
foo> create table test2 (a int primary key);
foo> show tables;
+-------+
| Table |
+-------+
| test2 |
+-------+
foo> exit
Bye
Let's examine what these commands did. Both new databases were created
in new directories rooted at the current working directory. We can
cd
into those directories to examine them:
zachmu@zachmu-laptop:Users/zachmu/sql% ls
foo mydb
zachmu@zachmu-laptop:Users/zachmu/sql% cd mydb
zachmu@zachmu-laptop:zachmu/sql/mydb% dolt status
On branch main
nothing to commit, working tree clean
zachmu@zachmu-laptop:zachmu/sql/mydb% dolt log
commit 13v8ns8rdu0c1ac4rbql095m1pcfh46m
Author: Zach Musgrave <zach@dolthub.com>
Date: Sat Nov 20 16:45:28 -0800 2021
new table
commit 2d7l94ummql83qrap9dthblmgn7eoga5
Author: Zach Musgrave <zach@dolthub.com>
Date: Sat Nov 20 16:44:50 -0800 2021
Initialize data repository
zachmu@zachmu-laptop:zachmu/sql/mydb% cd ../foo
zachmu@zachmu-laptop:zachmu/sql/foo% dolt status
On branch main
Untracked files:
(use "dolt add <table|doc>" to include in what will be committed)
new table: test2
dolt sql
(and dolt sql-server
) now automatically look for
databases in the current working directory, and in subdirectories of
that directory. Let's see what happens when we run dolt sql
from the
new foo
database we created above.
zachmu@zachmu-laptop:Users/zachmu/sql% cd foo
zachmu@zachmu-laptop:zachmu/sql/foo% pwd
/c/Users/zachmu/sql/foo
zachmu@zachmu-laptop:zachmu/sql/foo% dolt sql -q 'show databases';
+--------------------+
| Database |
+--------------------+
| foo |
| information_schema |
+--------------------+
zachmu@zachmu-laptop:zachmu/sql/foo% dolt sql -q 'create database newDb';
zachmu@zachmu-laptop:zachmu/sql/foo% ls
newDb
zachmu@zachmu-laptop:zachmu/sql/foo% dolt sql -q 'show databases';
+--------------------+
| Database |
+--------------------+
| foo |
| information_schema |
| newDb |
+--------------------+
zachmu@zachmu-laptop:zachmu/sql/foo%
So in the above example, the database foo
is rooted in the directory
/c/Users/zachmu/sql/foo
, and any new databases created get their own
subdirectory there. Because we started the sql shell in
/c/Users/zachmu/sql/foo
, the newDb
database directory gets created
at /c/Users/zachmu/sql/foo/newDb
.
The nesting only works one level. If we cd ..
back to our original
db directory, we only see the two original databases we created, not
newDb
.
achmu@zachmu-laptop:zachmu/sql/foo% cd ../
zachmu@zachmu-laptop:Users/zachmu/sql% pwd
/c/Users/zachmu/sql
zachmu@zachmu-laptop:Users/zachmu/sql% dolt sql -q "show databases";
+--------------------+
| Database |
+--------------------+
| foo |
| information_schema |
| mydb |
+--------------------+
Dropping databases now works too, and physically deletes data
This was a point of contention in the office, because dolt's unique
selling point is being a versioned database, and DROP DATABASE
is
unrecoverable. We went back and forth on this several times, debating
whether DROP DATABASE
should just do a logical delete (by moving the
dropped database to a recycling bin or similar), or actually
physically delete the database files. But the problem with not
physically deleting the database files is that we would need to
introduce special administrative procedures to do the physical delete,
since that's what a customer expects DROP DATABASE
to actually
do. So for now, when you DROP DATABASE
, it removes the directory
from the filesystem.
zachmu@zachmu-laptop:Users/zachmu/sql% ls
foo mydb
zachmu@zachmu-laptop:Users/zachmu/sql% dolt sql -q "drop database foo";
zachmu@zachmu-laptop:Users/zachmu/sql% ls
mydb
After releasing this feature we had a partial change of heart: we
didn't want DROP DATABASE
to be the only thing you can do to a dolt
database that's not reversible via source control operations. So we
reached a compromise: by default DROP DATABASE
does nothing and
prints a warning. To make it actually delete a database on disk, you
have to set a special session variable that's off by default. This
will become the default in our next release.
Same commit graph or separate?
In the above examples, all of the databases have their own commit
graph and history, and are versioned independently. They cohabitate in
the same server, but there's no way to create a branch across all of
them, or commit an atomic update to more than one in the same
transaction. Similarly, if you want to push
them up to
DoltHub, you have to run one push
command
each.
All of this is by design, and matches the functionality that our customers are asking for now.
But there's another way to think of multiple databases, which most
RDBMS systems provide as an option. For example, in Postgres, when you
CREATE DATABASE
, by default what you're doing is mostly just
creating a namespace. It's still possible to update multiple databases
atomically in a single transaction. This is also true in MySQL (as
long as they share the same storage engine).
In the future we anticipate that Dolt customers will want similar
functionality, so that a dolt database will contain multiple schemas
that share a commit graph, making it possible to branch across them
all, or commit them all in the same transaction (or dolt commit
). In
fact, for most customers we think this should be the default behavior,
and creating a database with a separate commit graph will be an
unusual use case.
Conclusion
Have an opinion on multiple databases in dolt? Have an idea for how you might use dolt at your company but need some help getting started? Come over to our Discord to talk to the team and ask questions.