Announcing `dolt add --patch`

FEATURE RELEASE
4 min read

SQL and Git had a baby, and we named this bundle of joy Dolt. My last couple posts here on the blog have been about Git Workspaces and Dolt Workspaces, and that was to lay the groundwork for today's announcement: Dolt now supports add --patch!

A quick recap on Workspaces: When working with your source code and data, sometimes the need arises to commit only a subset of the changes you've made, and to leave other changes uncommitted. In Git the mechanism to do this is the git add --patch command. In Dolt you have the ability to update the dolt_workspace_* tables. Now, with dolt add --patch, Git users will feel more at home.

Let's dive in!

The Scenario

I'm going to use the same setup as in my last blog post where we have crowd sourced data that we have varying degrees of confidence in. The idea is that you are going to receive data which you need to evaluate and accept as part of your workflow. It may be the case that some data is not worth saving.

Start by creating a Dolt database using the commands line tools. All SQL examples will be run in the Dolt SQL Shell:

demo$ dolt init --fun
Successfully initialized dolt data repository.
demo$ dolt sql
# Welcome to the DoltSQL shell.
# Statements must be terminated with ';'.
# "exit" or "quit" (or Ctrl-D) to exit. "\help" for help.
demo/main>

We'll store that data in a table called ratings which is created with the following:

CREATE TABLE ratings (
    id INT NOT NULL AUTO_INCREMENT,
    rating FLOAT CHECK (rating >= 0 AND rating <= 100),
    confidence FLOAT CHECK (confidence >= -1.0 AND confidence <= 1.0), 
    PRIMARY KEY (id)
    );

Seed 5 rows with 0.0 values:

INSERT INTO ratings (rating, confidence) VALUES (0, 0), (0, 0), (0, 0), (0, 0), (0, 0);

Finally create a commit for that:

CALL DOLT_COMMIT("-Am", "seed ratings table");

Running add --patch

To simulate a data import, we are going to update all rows with random values, add a row, and drop a row:

UPDATE ratings SET rating = RAND() * 100, confidence = (RAND() * 2 - 1);
DELETE FROM ratings WHERE ID = 3;
INSERT INTO ratings (rating, confidence) VALUES (98,1.0);

Which leaves the ratings table in the following state:

select * from ratings;
+----+-----------+------------+
| id | rating    | confidence |
+----+-----------+------------+
| 1  | 13.405175 | -0.2208576 |
| 2  | 65.623566 | -0.8340564 |
| 4  | 54.157528 | 0.9325458  |
| 5  | 92.079475 | -0.9895077 |
| 6  | 98        | 1          |
+----+-----------+------------+

The diff view, CLI dolt diff or SQL Shell \diff will show you something like this (but with color):

$ dolt diff
diff --dolt a/ratings b/ratings
--- a/ratings
+++ b/ratings
+---+----+-----------+------------+
|   | id | rating    | confidence |
+---+----+-----------+------------+
| < | 1  | 0         | 0          |
| > | 1  | 13.405175 | -0.2208576 |
| < | 2  | 0         | 0          |
| > | 2  | 65.623566 | -0.8340564 |
| - | 3  | 0         | 0          |
| < | 4  | 0         | 0          |
| > | 4  | 54.157528 | 0.9325458  |
| < | 5  | 0         | 0          |
| > | 5  | 92.079475 | -0.9895077 |
| + | 6  | 98        | 1          |
+---+----+-----------+------------+

As you can see, even though there was an update to row 3 in the first update, there was also a removal. As a result in the diff you can see that the sum total of the changes was to delete the row. The update to that row was lost because we didn't commit it. The other previously existing rows have new values, and row id 6 was added. The table at HEAD is always ephemeral if you don't commit it.

Now, if you run the dolt add --patch command, the summary of changes made and a prompt will be presented to the users:

$ dolt add --patch
Table                              Added / Modified / Removed
=====                              =====   ========   =======
ratings                            1       4          1
==============
Table: ratings
==============
+---+----+-----------+------------+
|   | id | rating    | confidence |
+---+----+-----------+------------+
| < | 1  | 0         | 0          |
| > | 1  | 13.405175 | -0.2208576 |
+---+----+-----------+------------+
Stage this row [y,n,q,a,d,s,?]?

You can also run the command from within the dolt sql Shell. Here is a capture to show you pretty colors!

Terminal Capture

Details

The example above uses the y and n commands to individually select rows to stage for commit. There are other options which you can see with ? command:

...
Stage this row [y,n,q,a,d,s,?]? ?
y - stage the current change
n - do not stage the current change
q - quit
a - add all changes in this table
d - do not stage any further changes in this table
s - show summary of unstaged changes and start over
? - show this help

Each command takes immediate effect when executed. If you enter y to a couple rows, then q, the two rows at the start will be staged and ready for commit.

Also, by default all tables will be presented in the summary in alphabetical order and each can be added to. If you add the optional table arguments to the command, you will be prompted for changes to those tables only:

dolt add --patch tableA tableB

Under the covers, dolt add --patch is a veneer on top of updates to dolt_workspace_* tables. First we determine which tables have unstaged changes using the dolt_status system table. Then we gather the changes from the unstaged tables using the workspace tables. Each change in those tables is then presented to the user for inclusion, and the appropriate UPDATE calls are made to the dolt_workspace_* table to achieve the desired outcome. For large datasets, evaluating each row will be infeasible and you may consider using SQL directly. You can mix and match approaches since the dolt_workspace_* tables are always calculated directly from your current session state.

Wrapping Up

dolt add --patch is going to be very familiar to all you Git users in the audience. We decided to keep it as simple as possible because the real power comes with using SQL to update the dolt_workspace_* tables, but there is probably more functionality we could add if you ask for it. Possibly more applicable to application builders - you can take inspiration from this tool to build your own domain specific tools to leverage the power of the Workspace. Come on over to the Dolt Discord and let us know what you're going to build!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.