Dolt Diff Magic: Part 3 – Diffing tables at two revisions
Welcome back to the third post in this series exploring the tricks you can do with Dolt's magical versioning features! 🪄
In the first post of this series,
we explored how the dolt_diff
system table enables you to see what tables have changed, when they changed, and who changed them.
In the second post of this series,
we explored the dolt_diff_$tablename
system tables and demonstrated how you can see each individual data change
in your tables.
In this post, we're exploring use cases that require comparing a table's data at two revisions in order to calculate
all the data differences between those two points in the commit history. As we'll see, this is a particularly
useful tool for cases such as merging data across branches.
Introducing...
Before we jump into examples, let's quickly introduce the two Dolt features that we'll be using to compare table data at two revisions...
dolt_commit_diff_$tablename
System Table
The primary tool we'll use to compare a table's data at two revisions is the
dolt_commit_diff_$tablename
system table.
This system table exists for every user created table in your database.
It works a bit differently from dolt_diff_$tablename
that we looked at in part two of this series:
- You must specify a
to_commit
andfrom_commit
as filter expressions when using this table, otherwise you will get an error. - The commits you can work with aren’t limited to the current checked out branch – you can specify any two commits from any branches in your database. This makes this system table very useful for diffing changes to a table before you merge from another branch, as we'll see shortly.
dolt_commit_diff_$tablename
shows you the cumulative diff between the two commits you specified – not a log of each individual delta applied in a commit. In other words, if you change a single row 10 times betweento_commit
andfrom_commit
, you’ll only see one row returned fromdolt_commit_diff_$tablename
with the cumulative diff of all the changes to that row between those two commits. (If you don't remember how this is different from thedolt_diff_$tablename
system table, check out the second post in this series.)
Let's look at a quick example, using
the DoltDiffMagic database hosted on DoltHub.
We can use the dolt_commit_diff_inventory
system table to compare any two revisions of the data in our inventory
table.
By specifying two revisions in the to_commit
and from_commit
filter expressions, we can see exactly how
each row in our table changed, including the previous values in each changed row at from_commit
, the new values
in each row at to_commit
, commit metadata, and how the row changed (added
, removed
, or modified
).
To try this out locally, install Dolt,
clone the DoltDiffMagic database (dolt clone jfulghum/DoltDiffMagic
), then launch the Dolt SQL shell from that
new directory (dolt sql
) to connect to the database. Alternatively, you can access the database on DoltHub and
run this query and view the results directly from your browser.
select *
from dolt_commit_diff_inventory
where to_commit='d4q5m9ks9f8b4eq1rjskicnsaiq343cn' and
from_commit='0s185scjtbok85bg834embv7f7ueabkd';
+------------+----------+------------+----------+----------------------------------+-----------------------------------+-----------+------------+--------------+------------+----------------------------------+----------------------------------+-----------+
| to_name | to_price | to_item_id | to_count | to_commit | to_commit_date | from_name | from_price | from_item_id | from_count | from_commit | from_commit_date | diff_type |
+------------+----------+------------+----------+----------------------------------+-----------------------------------+-----------+------------+--------------+------------+----------------------------------+----------------------------------+-----------+
| NULL | NULL | NULL | NULL | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | soup | 2 | 1 | 91 | 0s185scjtbok85bg834embv7f7ueabkd | 2022-03-18 21:24:44.55 +0000 UTC | removed |
| blender | 15 | 3 | 99 | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | blender | 20 | 3 | 80 | 0s185scjtbok85bg834embv7f7ueabkd | 2022-03-18 21:24:44.55 +0000 UTC | modified |
| camera | 50 | 4 | 6 | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | camera | 50 | 4 | 7 | 0s185scjtbok85bg834embv7f7ueabkd | 2022-03-18 21:24:44.55 +0000 UTC | modified |
| walkman | 10 | 5 | 11 | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | walkman | 10 | 5 | 12 | 0s185scjtbok85bg834embv7f7ueabkd | 2022-03-18 21:24:44.55 +0000 UTC | modified |
| puppy chow | 15 | 6 | 8 | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | NULL | NULL | NULL | NULL | 0s185scjtbok85bg834embv7f7ueabkd | 2022-03-18 21:24:44.55 +0000 UTC | added |
| gum | 1 | 7 | 12 | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | NULL | NULL | NULL | NULL | 0s185scjtbok85bg834embv7f7ueabkd | 2022-03-18 21:24:44.55 +0000 UTC | added |
+------------+----------+------------+----------+----------------------------------+-----------------------------------+-----------+------------+--------------+------------+----------------------------------+----------------------------------+-----------+
dolt_diff()
Table Function
We'll primarily be using the dolt_commit_diff_$tablename
system table in this post, but I also want to introduce a new feature
we recently added to access diff information in SQL: the
dolt_diff()
system table function.
This new interface is very similar to the dolt_commit_diff_$tablename
system table that we just learned about, but there are
two key differences:
- Instead of being exposed as a table, this functionality is exposed as a table function. This is our first table function in Dolt 🎉, and there are some restrictions on how table functions can currently be used in queries. We hope to expand table function support and add other table functions, such as JSON_TABLE in the future.
- The
dolt_diff()
table function allows you to see the exact to and from schemas of your data. This is particularly helpful when you need to look at two revisions of your table that have different schemas. We'll see an example of this later.
Let's test out the dolt_diff()
table function to calculate the same diff from the dolt_commit_diff_$tablename
example above.
You can run this query in your local clone of the
DoltDiffMagic database, or you can
execute it directly on DoltHub.
Notice that, as expected, the same diff data is returned, but the syntax for using the dolt_diff()
table function is slightly different.
We'll see a more advanced example of dolt_diff()
later that shows why you'd want to consider using the dolt_diff()
system table
instead of the dolt_commit_diff_$tablename
system table.
select *
from dolt_diff('0s185scjtbok85bg834embv7f7ueabkd', 'd4q5m9ks9f8b4eq1rjskicnsaiq343cn', "inventory");
+------------+----------+------------+----------+----------------------------------+-----------------------------------+-----------+------------+--------------+------------+----------------------------------+----------------------------------+-----------+
| to_name | to_price | to_item_id | to_count | to_commit | to_commit_date | from_name | from_price | from_item_id | from_count | from_commit | from_commit_date | diff_type |
+------------+----------+------------+----------+----------------------------------+-----------------------------------+-----------+------------+--------------+------------+----------------------------------+----------------------------------+-----------+
| NULL | NULL | NULL | NULL | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | soup | 2 | 1 | 91 | 0s185scjtbok85bg834embv7f7ueabkd | 2022-03-18 21:24:44.55 +0000 UTC | removed |
| blender | 15 | 3 | 99 | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | blender | 20 | 3 | 80 | 0s185scjtbok85bg834embv7f7ueabkd | 2022-03-18 21:24:44.55 +0000 UTC | modified |
| camera | 50 | 4 | 6 | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | camera | 50 | 4 | 7 | 0s185scjtbok85bg834embv7f7ueabkd | 2022-03-18 21:24:44.55 +0000 UTC | modified |
| walkman | 10 | 5 | 11 | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | walkman | 10 | 5 | 12 | 0s185scjtbok85bg834embv7f7ueabkd | 2022-03-18 21:24:44.55 +0000 UTC | modified |
| puppy chow | 15 | 6 | 8 | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | NULL | NULL | NULL | NULL | 0s185scjtbok85bg834embv7f7ueabkd | 2022-03-18 21:24:44.55 +0000 UTC | added |
| gum | 1 | 7 | 12 | d4q5m9ks9f8b4eq1rjskicnsaiq343cn | 2022-03-31 16:02:54.613 +0000 UTC | NULL | NULL | NULL | NULL | 0s185scjtbok85bg834embv7f7ueabkd | 2022-03-18 21:24:44.55 +0000 UTC | added |
+------------+----------+------------+----------+----------------------------------+-----------------------------------+-----------+------------+--------------+------------+----------------------------------+----------------------------------+-----------+
Diffs in Action
Alright, now that you know about the dolt_commit_diff_$tablename
system tables and the dolt_diff()
system table function,
let's put them to use in a couple of realistic scenarios where we need to inspect the differences in our data.
Merging Branches
The first big use case for diffing table data at two revisions that we'll look at today is merging changes across branches.
Branches are a powerful feature in Dolt that
allows you to logically organize and isolate your changes until they're ready to be merged with your main branch, or any other branch.
When you merge changes across branches, you need to review those changes before merging to catch any problems.
Using dolt_commit_diff_$tablename
makes it super easy to view those diffs as part of your merging process.
Continuing our inventory example from the previous post in this series, let's say that our store does quarterly inventory
checks to count all the physical inventory and update the state in our database. Counting the inventory is done incrementally
on a separate branch (inventory/2022-Q2
) and then after all the inventory is counted, those changes are merged to the main branch to go live.
If you checkout both the main
and inventory/2022-Q2
branches, you can
run this query locally against Dolt, or you can
execute it directly on DoltHub in your web browser.
select to_name, to_item_id, from_count, to_count
from dolt_commit_diff_inventory
where from_commit=HASHOF('main')
and to_commit=HASHOF('inventory/2022-Q2');
+----------+------------+------------+----------+
| to_name | to_item_id | from_count | to_count |
+----------+------------+------------+----------+
| nintendo | 2 | 9 | 8 |
| walkman | 5 | 11 | -11 |
| gum | 7 | 12 | 13 |
+----------+------------+------------+----------+
It looks like the data entry for walkman inventory was messed up! Someone must have accidentally entered that negative sign when recording the inventory count. Good thing we diffed our branch against main before we merged these changes over!
Comparing Data Across Schema Changes
Generally, we recommend using dolt_commit_diff_$tablename
, but when the schema of a table has significant changes between
the to_commit
and from_commit
revisions, you can use the dolt_diff()
system table function to see the full schema
as it existed at both the to_commit
and from_commit
revisions of your table.
Let's check out an example of diff'ing across schema changes. In our DoltDiffMagic database, we've got another table
called store_hours
that holds a text representation of the store's hours:
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| day_of_week | int | NO | PRI | | |
| hours | varchar(255) | NO | | | |
+-------------+--------------+------+-----+---------+-------+
On the v2
branch of our database, we've moved to a new and improved schema:
+-------------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------+------+-----+---------+-------+
| day_of_week | int | NO | PRI | | |
| open | int | YES | | | |
| close | int | YES | | | |
+-------------+------+------+-----+---------+-------+
When we use dolt_commit_diff_$tablename
, the schema at the tip of the current branch is used to create the to_
and from_
columns.
As you can see below, when we run the query below with dolt_commit_diff_store_hours
, we can't see all the columns in the two
versions of the tables, because the schema has diverged significantly.
select to_day_of_week, from_hours, to_hours
from dolt_commit_diff_store_hours
where from_commit=hashof("main") and to_commit=hashof("v2");
+----------------+------------+----------+
| to_day_of_week | from_hours | to_hours |
+----------------+------------+----------+
| 0 | 12 to 6 | NULL |
| 1 | 10 to 6 | NULL |
| 2 | 10 to 6 | NULL |
| 3 | 10 to 6 | NULL |
| 4 | 10 to 6 | NULL |
| 5 | 10 to 6 | NULL |
| 6 | closed | NULL |
+----------------+------------+----------+
The dolt_diff()
table function is able to generate the to_
and from_
columns more flexibly though – it includes a
to_
column for every column in the table's schema at the to
revision, and a from_
column for every column in the table's schema at the from
revision you
specify, allowing you to see the full changes between the two revisions.
select to_day_of_week, from_hours, to_open, to_close
from dolt_diff("main", "v2", "store_hours");
+----------------+------------+---------+----------+
| to_day_of_week | from_hours | to_open | to_close |
+----------------+------------+---------+----------+
| 0 | 12 to 6 | 12 | 18 |
| 1 | 10 to 6 | 10 | 18 |
| 2 | 10 to 6 | 10 | 18 |
| 3 | 10 to 6 | 10 | 18 |
| 4 | 10 to 6 | 10 | 18 |
| 5 | 10 to 6 | 10 | 18 |
| 6 | closed | NULL | NULL |
+----------------+------------+---------+----------+
Presto!
That wraps up this third post on Dolt's diff features. I hope you enjoyed learning about
the dolt_commit_diff_$tablename
system table, the
dolt_diff()
system table function, and
how they let you easily diff the contents of your tables across two revisions or
branches in your Dolt database.
Got questions on how to get started with Dolt, or how to use any of Dolt's diff features? Come join us on Discord and let us know! We’d love to connect with you and help you get started using Dolt!