Transactions in a Database with Branches
Dolt is Git for data, a SQL database that you can fork, clone, branch, and merge. Today we're excited to announce the alpha release of SQL transactions!
This blog will cover how transactions work in Dolt and how this differs from more traditional RDBMS transaction implementations.
What are transactions?
Transactions are a very old concept in relational databases. They're a way to safely manage the shared state of a database across many different readers and writers, so that two people can edit the same database at the same time. They're used for a lot of things in database applications, but broadly speaking they have three primary goals:
-
Atomic units of work. Transactions let you issue multiple
INSERT
,UPDATE
, andDELETE
statements, then guarantee that either all of them go into effect at once, or none of them do. This happens when youCOMMIT
orROLLBACK
the transaction, respectively. -
Isolation. During a long-running transaction, you don't want other clients to see your changes until after you
COMMIT
. And you don't want to see any changes they are making either. -
Locking. The database needs to keep track of who is editing which rows, so that they are applied in a consistent order. It should be impossible for someone's changes to be silently overwritten by another transaction.
Consider this table:
mysql> describe cities;
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| name | varchar(100) | NO | PRI | NULL | |
| population | int | YES | | NULL | |
+------------+--------------+------+-----+---------+-------+
2 rows in set (0.02 sec)
Let's say that I have two clients editing this table at the same time. This is what they see:
Client #1 Client #2
mysql> select * from cities; mysql> select * from cities;
+---------+------------+ +---------+------------+
| name | population | | name | population |
+---------+------------+ +---------+------------+
| Houston | 100 | | Houston | 100 |
+---------+------------+ +---------+------------+
1 row in set (0.01 sec) 1 row in set (0.01 sec)
mysql> insert into cities values mysql> insert into cities values
("Miami", 200); ("New York", 200);
Query OK, 1 row affected (0.01 sec) Query OK, 1 row affected (0.01 sec)
mysql> insert into cities values mysql> insert into cities values
("Boston", 300); ("Seattle", 300);
Query OK, 1 row affected (0.01 sec) Query OK, 1 row affected (0.01 sec)
mysql> select * from cities; mysql> select * from cities;
+---------+------------+ +----------+------------+
| name | population | | name | population |
+---------+------------+ +----------+------------+
| Boston | 300 | | New York | 200 |
| Houston | 100 | | Houston | 100 |
| Miami | 200 | | Seattle | 300 |
+---------+------------+ +----------+------------+
3 rows in set (0.00 sec) 3 rows in set (0.00 sec)
mysql> commit; mysql> commit;
Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec)
mysql> select * from cities; mysql> select * from cities;
+----------+------------+ +----------+------------+
| name | population | | name | population |
+----------+------------+ +----------+------------+
| Boston | 300 | | New York | 200 |
| Houston | 100 | | Houston | 100 |
| Miami | 200 | | Seattle | 300 |
| New York | 200 | +----------+------------+
| Seattle | 300 | 3 rows in set (0.00 sec)
+----------+------------+
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from cities;
+----------+------------+
| name | population |
+----------+------------+
| Boston | 300 |
| Houston | 100 |
| Miami | 200 |
| New York | 200 |
| Seattle | 300 |
+----------+------------+
At the end of the example, Client #2 actually commits first, which begins a new transaction before Client #1 commits their changes. So they don't see Client #1's changes until they start another new transaction.
Transactions in Dolt
In Dolt, transactions work a little differently. Dolt is the only SQL database you can branch and merge. And in Dolt, every transaction you commit performs a merge as well.
This is easiest to see with an example. We'll modify our table above to add one more column:
mysql> desc cities;
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| name | varchar(100) | NO | PRI | NULL | |
| state | varchar(100) | YES | | NULL | |
| population | int | YES | | NULL | |
+------------+--------------+------+-----+---------+-------+
When a transaction commits, it performs a merge using the current
globally shared state. You can imagine this working exactly as in git
branch and merge: every time you start a transaction you are checking
out a new branch, and every time you commit you're merging it back to
master
.
So far this isn't any different from a normal database. But let's say that two clients want to add a row with the same key to a table. What happens?
In MySQL and other RDBMS's, the second transaction blocks until the first commits, at which point you'll see an error in the second transaction:
Client #1 Client #2
mysql> select * from cities; mysql> select * from cities;
+---------+------------+ +---------+------------+
| name | population | | name | population |
+---------+------------+ +---------+------------+
| Houston | 100 | | Houston | 100 |
+---------+------------+ +---------+------------+
1 row in set (0.01 sec) 1 row in set (0.01 sec)
mysql> insert into cities values mysql> insert into cities values
("Miami", 200); ("Miami", 200);
Query OK, 1 row affected (0.01 sec) -- Query blocks until Client #1 commits
mysql> commit;
Query OK, 0 rows affected (0.01 sec) ERROR 1062 (23000): Duplicate entry 'Miami' for key 'cities.PRIMARY'
Dolt is different. If two clients add the same row to a table, there's no blocking and no error. It's same as if two branches of the database inserted the same rows and then wanted to merge back to master. Since the data is the same, there's no conflict.
On the other hand, what happens if two clients add the same key to a
table, but the non-key columns aren't the same? That's an error, since
there's no way to resolve the merge conflict without human help. The
second transaction to COMMIT
must roll back and retry their work.
The same thing happens with two conflicting UPDATE
statements to the
same row, or when one client UPDATES
a row that another client
DELETES
.
Finally, there's one more interesting wrinkle. If two clients both modify an existing row, but they don't modify the same columns, then there's no conflict, and both sets of modifications succeed. Dolt is the only SQL database in the world that can do this.
Compatibility
Depending on your application and its needs, the above flexibility might actually be a liability. Databases have been using transaction semantics similar to MySQL's for a long time, and many applications have built business logic around them.
Luckily, the transaction merge algorithm is flexible enough to
accommodate most of these concerns, and we'll be adding additional
merge modes to be more compatible with traditional RDBMS solutions for
those applications that require it. We imagine setting a system
variable or a custom property on the transaction itself. But for this
initial alpha release, transactions will behave the same way existing
command line customers expect from a dolt merge
command.
If you need different semantics from your transactions, please let us know.
Other transactional statements
The above examples only deal with START TRANSACTION
, COMMIT
, and
ROLLBACK
, which are the more commonly seen transactional
statements. But Dolt supports the full range of MySQL transaction
statements, which means SAVEPOINT
and ROLLBACK TO SAVEPOINT
, if
you need those too.
@@autocommit
works as well, and is on by default. It works exactly
the same as it does in MySQL.
For now, every transaction uses the REPEATABLE_READ
isolation level,
which is the default in MySQL and means that transactions don't see
any changes made by other transactions. MySQL has several other
isolation levels that you can read about
here. These
other isolation levels will be coming later, as customers need them.
Using transactions in Dolt today
For this alpha release, you'll need to take a couple extra steps to get it working. First, set an environment variable:
export DOLT_ENABLE_TRANSACTIONS=true
Then start your server with a config file that enables multiple
concurrent connections by setting the max_connections
field higher
than 1, as described in the docs.
A database with branches
Transactions are really just table stakes for a fully functional database. Dolt does some interesting things in terms of merging units of work without conflicts or locks, but what's really exciting about Dolt as a transactional database is the prospect of running a database serving multiple branches simultaneously, from the same host.
This is still a work in progress not ready to release, but it's what we're building towards in the near future. When you start a Dolt SQL server, clients can connect to a particular HEAD (a branch or other ref) in addition to the default. For example, you might use a connection string that looks like this:
"server=127.0.0.1;uid=root;pwd=12345;database=mydb:branches/feature"
"server=127.0.0.1;uid=root;pwd=12345;database=mydb:workspaces/fromWeb"
Similarly, you'll be able to change the branch you're connected to in the same sesssion by setting a session variable:
SET @@mydb_head_ref = "branches/feature";
SET @@mydb_head_ref = "workspaces/fromWeb";
Or by issuing a special SQL statement:
SELECT DOLT_CHECKOUT("feature");
And of course, you'll be able to merge one branch into another on a live server:
SELECT DOLT_MERGE("feature");
What's really exciting about this idea is that it extends the concept of transactions and makes it possible to persist them beyond a single session, without interfering with any other client's state. Simply make a branch for a user, connect them to it, and let them make as many changes as they want, on whatever schedule they want. Then when they're happy with the changes, merge them back into the production branch.
We're still very early in development for this concurrency model, and we'll have a lot more to say and show off in the coming weeks and months. Stay tuned!
Try it out today
If you haven't tried Dolt yet, download it and get started!
This is an alpha release under heavy development, and it would be great to hear from early adopters about what's missing for their own application's needs. Come join us on Discord to say hi and let us know what you think!