Supporting AUTO_INCREMENT
Dolt is a database built for collaboration and data distribution. It's "Git for Data,"
a SQL database you can branch, merge, diff, clone, fork, push and pull. We intend to become a fully MySQL
compatible database. Today, we're announcing support for AUTO_INCREMENT
columns in Dolt.
The Basics
For the unfamiliar, AUTO_INCREMENT
is column property that allows you to generate sequential primary keys for your
data, automatically. If your data doesn't have a natural unique identifier, the database can create one for you,
removing this logic from the application layer.
dolt> create table auto_inc (
-> auto_pk int PRIMARY KEY AUTO_INCREMENT,
-> myStr text
-> );
dolt> INSERT INTO auto_inc (myStr) VALUES ("abc"),("xyz"),("dupe"),("dupe");
dolt> SELECT * FROM auto_inc;
+---------+-------+
| auto_pk | myStr |
+---------+-------+
| 1 | abc |
| 2 | xyz |
| 3 | dupe |
| 4 | dupe |
+---------+-------+
Unadultered, auto increment will generate ascending, sequential keys. If inserted values skip over a range of keys, auto increment will insert after the latest keys so that key order always matches insert order:
dolt> INSERT INTO auto_inc VALUES (10,"asdf");
dolt> INSERT INTO auto_inc (myStr) VALUES ("aoeu");
dolt> SELECT * FROM auto_inc;
+---------+-------+
| auto_pk | myStr |
+---------+-------+
| 1 | abc |
| 2 | xyz |
| 3 | dupe |
| 4 | dupe |
| 10 | asdf |
| 11 | aoeu |
+---------+-------+
The Implementation
At face value, auto increment seems to generate the next value in the sequence using max(val) + 1
. The initial
implementation of auto increment in Dolt made this assumption and generated auto increment values using a sub query.
However, a closer inspection of the MySQL documentation
shows a more complex and stateful behavior. MySQL auto increment values can be "burned" either by deletion or through
a rolled-back transaction. If the largest key value is deleted, we still don't want to reuse it:
dolt> DELETE FROM auto_inc WHERE auto_pk = 11;
dolt> INSERT INTO auto_inc (myStr) VALUES ("newest");
dolt> SELECT * FROM auto_inc;
+---------+--------+
| auto_pk | myStr |
+---------+--------+
| 1 | abc |
| 2 | xyz |
| 3 | dupe |
| 4 | dupe |
| 10 | asdf |
| 12 | newest |
+---------+--------+
Properly following MySQL's auto increment semantics meant creating some table-level state to track auto increment
sequences. Dolt's SQL functionality is implemented using go-mysql-server
an open-source SQL engine. go-mysql-server
allows queries over an arbitrary persistence layer by providing a set of
golang interfaces for the persistence layer to implement. The interface for supporting AUTO_INCREMENT
is:
type AutoIncrementTable interface {
Table
// GetAutoIncrementValue gets the next AUTO_INCREMENT value.
GetAutoIncrementValue(*Context) (interface{}, error)
// AutoIncrementSetter returns an AutoIncrementSetter.
AutoIncrementSetter(*Context) AutoIncrementSetter
}
// AutoIncrementSetter provides support for altering a table's AUTO_INCREMENT sequence.
type AutoIncrementSetter interface {
// SetAutoIncrementValue sets a new AUTO_INCREMENT value.
SetAutoIncrementValue(*Context, interface{}) error
// Close finalizes the set operation, persisting the result.
Close() error
}
The persistence layer is responsible for providing and maintaining the auto increment sequence. The query engine is
responsible for inserting rows correctly given the next value in the sequence. go-mysql-server
even supports mutating
this table-level state via an ALTER TABLE
query:
blog> ALTER TABLE auto_inc AUTO_INCREMENT = 20;
blog> INSERT INTO auto_inc (myStr) VALUES ("twenty");
blog> SELECT * FROM auto_inc;
+---------+--------+
| auto_pk | myStr |
+---------+--------+
| 1 | abc |
| 2 | xyz |
| 3 | dupe |
| 4 | dupe |
| 10 | asdf |
| 12 | newest |
| 20 | twenty |
+---------+--------+
The Tradeoffs
Choosing MySQL compatibility was a central decision in Dolt's product evolution. It was important to emulate a known
quantity in order to ease adoption and flatten Dolt's learning curve. Mixing the functionality of MySQL and Git creates
a tool that feels familiar the first time you use it.
However, we found plenty of gotchas as we implement
more of MySQL's functionality. In some more obscure cases we've decided to deviate from MySQL's behavior when it makes
sense to. While it was important to create table state for correct auto increment semantics, it's less clear whether
we should implement the LAST_INSERT_ID()
function. This function has unintuitive and, at times, undefined
semantics. From the MySQL documentation:
For a multiple-row insert, LAST_INSERT_ID() and mysql_insert_id() actually return the AUTO_INCREMENT key from the first of the inserted rows. This enables multiple-row inserts to be reproduced correctly on other servers in a replication setup.
You should be aware that, if you mix references to LAST_INSERT_ID() and LAST_INSERT_ID(expr), the effect is undefined.
MySQL is so widely used that even a poorly defined API such as this is relied upon in some contexts.
Eventually we will replicate these idiosyncrasies. Dolt users can still inspect a table's auto increment state
using the information_schema
database:
dolt> SELECT table_name,`auto_increment` FROM information_schema.tables WHERE table_name = 'auto_inc';
+------------+----------------+
| table_name | auto_increment |
+------------+----------------+
| auto_inc | 20 |
+------------+----------------+
Conclusion
As we build out Dolt's SQL functionality (and correctness) the product continues to become more powerful and more usable. Auto increment is just the latest example. We're committed to solving data collaboration and distribution. We believe Dolt is the tool to do it. Let us know what you'd like to see us build next!