Dolt Pro Tips

November 20, 2024

4 min read

Dolt is the world's first and only version controlled SQL database. We've been iterating on the concept for over six years now.

We're the world's experts in database version control. Along the way we've learned a few things. This article collects some of those "pro tips" into a handy list with links to more information.

Connect to Branches

There are a number of ways to use branches and history in Dolt. For those coming from Git, the most common way to switch branches is a git checkout. Dolt supports the equivalent command on the command line: dolt checkout. This will work exactly as you would expect. It will switch the branch to the named branch. It also resets tables to HEAD if the argument is a table name. What can't checkout do?

Unlike Git, Dolt also supports a SQL procedure equivalent aptly named dolt_checkout(). This will do the same thing as dolt checkout on the command line with a caveat. dolt_checkout() will switch the branch for the SQL session. New sessions will connect to the default branch. This differs from the command line behavior as the checked out branch is stored in the database metadata meaning restarting your computer and navigating back to your Dolt database on disk will keep the same checked out branch.

This gets especially confusing in the case of database connection pools where the active branch for a connection could be the checked out branch from a previous session. For this reason, we recommend avoiding dolt_checkout() in the SQL context and connecting to a branch instead. If you specify, <db>/<branch> in your connection string you will be connected to the <branch> branch, no checkout required. This plays nicely with most connection pools.

Use UUID Primary Leys

Dolt is a decentralized database. It is designed such that multiple copies can iterate independently, come together at a later time, and merge changes together. Astute database folks wonder what Dolt does with auto_increment keys. Aren't those guaranteed conflicts? I insert the next row on main and you insert a different next row on feature. Those rows both get the same key. Good luck with the merge.

If you're running a single server, Dolt is smart and maintains a global auto_increment counter for all branches. This helps but it doesn't help in the decentralized case. If two servers don't know anything about each other, they can't coordinate on an auto_increment counter.

For this reason, we recommend using UUID keys instead. You can even have them auto-generated using this handy bit of SQL.

create table t_uuid(
    id varchar(36) default(uuid()) primary key,
    c1 varchar(100)
);

Schema Overriding

Most databases have a single schema. If you want to change the schema, it is an involved process often called a schema migration. There are a bunch of tools and processes to support database schema migrations.

Dolt is the first database that can support multiple schema versions of the same database, either through history queries or schema changes on branches. This really freaks out some tools, especially Object Relational Mappers (ORMs). It's a pain to code against multiple schemas. History or branch queries that access different schema can break application code.

Dolt has a feature called schema overriding where Dolt returns all the queries run with the schema of a specific branch or commit. Added a column recently and want a history query to return with a NULL for that column? Use schema overriding. Schema overriding makes Dolt look like a more traditional single schema database so it works better with ORMs.

Collect Garbage

Dolt makes a lot of disk garbage, especially on import. A 1GB MySQL dump import can end up being 10-20GB on disk after importing it into Dolt. After garbage collection, Dolt will reduce that size back down to around 1GB. Garbage collection can be run using dolt gc on the command line or call dolt_gc() in SQL. Garbage collect regularly when using Dolt.

Automated garbage collection is coming soon.

JSON Also Merges

The general rule of Dolt merge conflicts is "if two branches modify the same row, column pair to different values, a merge conflict is generated". The exception to this rule is JSON columns. Dolt is smart enough to merge JSON documents that do not modify the same key. Merges at scale are fast given Dolt's use of Prolly Trees to store JSON and all other table data.

This is a three-way merge of JSON documents:

Base:

{
  "a": 1,
  "b": 2,
  "c": 3
}

Left:

{
  "a": 2,
  "b": 2,
  "c": 3
}

Right:

{
  "a": 1,
  "b": 2,
  "c": 2
}

Result:

{
  "a": 2,
  "b": 2,
  "c": 2
}

You can turn this behavior off and have multiple writes to JSON column, row pairs throw a conflict, but we think JSON merge really helps the utility of JSON columns in Dolt. So, leave it on.

Protect Your Branches

Dolt supports standard users and grants permissions. You can use these to specify fine-grained permissions for your tables.

Users and grants know nothing about branches. Dolt implements an additional layer of permissions called branch permissions. Want to lock down writes to the main branch but allow writes on other branches? This is what branch permissions are for. Dolt supports admin and write branch permissions. Any user can read from any branch as long as they have the correct table level permissions.

Unversioned Tables

Do you have tables in your database that you don't want tracked by Dolt version control? Dolt ignore allows you to have normal tables that are not tracked by version control, similar to .gitignore. Ignored tables are not global. The tables exist on a branch. To ignore tables add them to the dolt_ignore system table.

Undrop a Database

We've all been there. I bet you or someone you know has accidentally dropped a production database. Because of version control, Dolt is more resilient to a number of operator errors than a traditional database. For a while, however, drop database was Dolt's kryptonite. A drop database would delete the database and all its history, making it unrecoverable.

But no longer. Dolt now supports dolt_undrop() to restore dropped databases. We've had some customers thank us for this feature.

`remotesrv` is a lightweight remote

Like Git, Dolt supports the concept of remotes. A remote is a copy of the database, usually on another computer, that you interact with via clone, push, pull, and fetch. DoltHub, like GitHub, is the canonical example of a remote.

Dolt supports a number of other remotes. If you're looking for a simple HTTP remote with no graphical user interface, some customers use remotesrv, an open source Dolt remote available via the Dolt Git repository. Just compile and run it for a simple remote accessible via HTTP.

Conclusion

With all those tips, you're now a Dolt Pro yourself. Need more tips? Stop by our Discord and just ask. Someone is always around to help.

Blog