Unlocking Time-Travel for Data and Code
Imagine, for a moment, that both code and data were not constrained to the present. With a "standard" database, your data lives only in the present. Sure, with backups you're able to recall the past, but to do so means to fully immerse yourself in the past by spinning up a new server with the backup's data. You cannot have data from both the past and the present at the same time...or at least, you couldn't before Dolt.
The Power of Version Control
Yes, Dolt is a full-fledged SQL database implementing MySQL's dialect.
Yes, Dolt is a versioned database, earning it the nickname Git for data.
But one thing the marketing buzzwords don't express is how powerful this marriage of versioning and a full-fledged database actually is. Let's take Git for example. Git is just for versioning your source code, and let's narrow that down even further to just the history aspect. You're able to checkout code from any commit, from any branch, at any point in time. You can build that code, run it, test it...but it exists on an isolated island that is its commit. The relationship between two commits explores code ownership and evolution, but not much outside of that, as the two commits are likely incompatible due to interface and logical changes.
This fundamentally differs from how Dolt relates two commits. Two commits—two potentially different sets of data—both have value separately and in combination with one another. Taking a contrived example of a table of names and addresses. That table will evolve over time as people move. The initial state of that table, however, potentially carries just as much at-present value as the current state. That the table has changed is, in itself, additional data. With Dolt, you can tap into that additional data, which I can show with a quick example:
$ mkdir example
$ cd example
$ dolt init
Successfully initialized dolt data repository.
$ dolt sql <<SQL
> CREATE TABLE houses (name VARCHAR(200) PRIMARY KEY, address VARCHAR(300));
> INSERT INTO houses VALUES ('John Doe', '1234 Drive St'), ('Jane Doe', '2345 Street Ave'), ('Jack Doe', '3456 Avenue Way');
> SQL
Query OK, 3 rows affected (0.00 sec)
$ dolt add -A
$ dolt commit -m "first set of people"
commit 965f2290jef6l9rhgmmq9bh909oq2l1i (HEAD -> main)
Author: Daylon Wilkins <daylon@dolthub.com>
Date: Wed Jan 18 12:00:00 -0800 2023
first set of people
$ dolt sql -q "REPLACE INTO houses VALUES ('Jane Doe', '4567 Way Park');"
Query OK, 2 rows affected (0.00 sec)
$ dolt commit -am "jane moved"
commit 54i43tbng92462hlhtlinttsouc1h03t (HEAD -> main)
Author: Daylon Wilkins <daylon@dolthub.com>
Date: Wed Jan 18 12:00:01 -0800 2023
jane moved
$ dolt sql <<SQL
> SELECT
> old.name AS 'Name',
> old.address AS 'Old',
> new.address AS 'New'
> FROM houses AS new
> JOIN houses AS OF 'HEAD~1' AS old ON
> old.name=new.name AND
> old.address!=new.address;
> SQL
+----------+-----------------+---------------+
| Name | Old | New |
+----------+-----------------+---------------+
| Jane Doe | 2345 Street Ave | 4567 Way Park |
+----------+-----------------+---------------+
This example takes advantage of AS OF
, and we use it to gather the data of who moved between our two commits.1
This drives home that, in the world of versioning, the data between any two commits is additional data, regardless of whether those commits represent code or data.
As most people are used to the idea of versioning through source code, the value of this additional data is limited, but not so with a versioned database such as Dolt.
In fact, versioning as a concept is more powerful when applied to data than when applied to source code.
So what about Stored Procedures?
I've spent most of this blog talking about a single aspect of versioning—exploring history through commits—so let's go over how stored procedures operate in this space. As I'm sure most are aware, databases have stored procedures, which are basically programs that are run directly by the database. On the surface, they seem to run into the same limitations that versioning source code faces, except for one major difference: stored procedures are, fundamentally, programs built to operate on data.
This transitively gives stored procedures in Dolt all of the benefits of working with versioned data in Dolt. However, stored procedures are, themselves, just a form of data. When referring to standard source code, this distinction doesn't carry much importance, but when talking about programs that are meant solely to work with data, this distinction revolutionizes how one thinks about the relationship between code and data. Dolt allows you to truly explore the relationship of data over time, and that extends to stored procedures as well. This means that you can run stored procedures from arbitrary commits using data from arbitrary commits.
This does come with a few limitations though. What has happened in the past may be queried, but it cannot be altered.2 This means that stored procedures, when referencing past data, may not modify that data. To relate this to versioning source code using Git, you can checkout any commit that you want, but as soon as you edit a file, you're put into a state where you're now working on a new commit. Back to stored procedures, another limitation is that any references to data within the stored procedure must be valid at the queried point in history. If a table does not exist at some point in time, or the schema has changed and no longer carries the desired column, then the result is indeterminate and thus an error. Besides those two limitations, well, there are no limitations.
Controlling Stored Procedures Across Time
I gave an example earlier of accessing the data of how a table changed, so let's look at how to do something similar for stored procedures. As I mentioned, stored procedures are another form of data, and we allow you to separately control whether you want to reference a specific stored procedure, or whether you want to allow that stored procedure to access data from a specific point. You can, of course, do both at the same time as well.
Let's create a very simple repository with a single table, a single column, and a single stored procedure. We'll use the CLI again, but this example is a breeze to replicate by running Dolt as a SQL server.3
Running Dolt as a SQL server is also incredibly easy, and you can find an example in the blog post where I talk about branch permissions. You'll be hard-pressed to find a database that is as easy to use and get up-and-running as Dolt.
$ mkdir example
$ cd example
$ dolt init
Successfully initialized dolt data repository.
$ dolt sql <<SQL
> CREATE TABLE simple (v1 BIGINT);
> INSERT INTO simple VALUES (2);
> CREATE PROCEDURE proc() SELECT v1 * 10 FROM simple;
> CALL DOLT_ADD('-A');
> CALL DOLT_COMMIT('-m', 'single table and procedure');
> CALL DOLT_BRANCH('other');
> DROP PROCEDURE proc;
> CREATE PROCEDURE proc() SELECT v1 * 100 FROM simple;
> UPDATE simple SET v1 = 3;
> CALL DOLT_ADD('-A');
> CALL DOLT_COMMIT('-m', 'updated table and procedure');
> SQL
Query OK, 1 row affected (0.00 sec)
...
We now have a table with a single value of 3
, and a stored procedure that takes the value in that table and multiplies it by 100
.
We also have a branch named other
that references the original commit, which had a table containing 2
and a stored procedure that multiplies the value in the table by 10
.
First, let's call the stored procedure like one normally would.
$ dolt sql -q 'CALL proc();'
+-------------------+
| (simple.v1 * 100) |
+-------------------+
| 300 |
+-------------------+
Exactly what we would expect. Now, let's reference our first stored procedure, but still use it with the current data.
$ dolt sql -q 'CALL `example/other`.proc();'
+------------------+
| (simple.v1 * 10) |
+------------------+
| 30 |
+------------------+
Normally you're able to specify which database you'd like to execute the stored procedure from, and if a database isn't provided (like in the previous statement), we assume the current database. We've enhanced that functionality, allowing you to specify additional information about that database, which in this case is the branch. This allows you to reference only the stored procedure's definition, while still using the current data.
If you want to reference the old data while using the current definition, then you can do so using the AS OF
syntax I introduced earlier.
$ dolt sql -q 'CALL proc() AS OF "other";'
+-------------------+
| (simple.v1 * 100) |
+-------------------+
| 200 |
+-------------------+
Finally, you can combine the two to reference the previous definition along with the previous data.
$ dolt sql -q 'CALL `example/other`.proc() AS OF "other";'
+------------------+
| (simple.v1 * 10) |
+------------------+
| 20 |
+------------------+
There are a few things that I'm not covering in these examples, such as how we propagate the commit state of definitions and data to nested CALL
statements, but much of that can be inferred to work as you would expect.
I hope that, by now, you've seen how powerful version control actually is when applied to data, and especially when used in combination with stored procedures. Dolt is the only database that is able to truly tap into the data that exists both discretely and implicitly within a versioned system. Once you've discovered that power, then you begin to understand just how valuable Dolt truly is.
We hope that you'll join us as we continue to work on Dolt. You can keep up to date with us through Twitter, or you can chat directly with us through Discord.
- I used
HEAD~1
to reference the previous commit, but it's valid to use other identifying information such as the commit hash as well. Also, Dolt has a suite of stored procedures and system tables available, along with standard Git-influenced commands from the CLI, so it's perhaps easier to accomplish this exact example using those. This was just for the sake of demonstration, and also to introduce theAS OF
syntax.↩ - The past may technically be altered by rewriting history since it's all data, but that's a different topic for another time.↩
- I chose to use the CLI as it's super easy to show as a single code block.↩