Column Tags

Dolt is the world's first and only version-controlled SQL database. To support versioning, Dolt has some unique properties compared to other SQL databases.

Some of these properties are good, like Dolt's unique, custom storage engine. Some of these properties have proven problematic, like column tags. Column tags are so problematic that we've written 934 blogs and not one of them mentions column tags.

This article rectifies this injustice and finally, once and for all, explains column tags. The next time we talk about column tags will be when we deprecate them.

What are Column Tags?

Column tags are unique identifiers for columns. The Dolt command line interface has a command, dolt schema tags, that displays them.

$ mkdir column_tags_example
$ cd column_tags_example 
$ dolt init
Successfully initialized dolt data repository.
$ dolt sql -q "create table t (id int primary key auto_increment, words varchar(100))" 
$ dolt schema tags  
+-------+--------+------+
| table | column | tag  |
+-------+--------+------+
| t     | id     | 2831 |
| t     | words  | 8169 |
+-------+--------+------+

What sorcery is this?

History of Column Tags

To understand Dolt column tags, it helps to walk back in time and learn their history.

Dolt was initially built as a data sharing tool. It was only later that Dolt fully embraced the Online Transaction Processing (OLTP) use case and became a full-fledged MySQL-compatible database.

In the data sharing use case, one of the primary concerns we had was "forking storage". To be an efficient data sharing platform, we believed that as much data as possible needed to be shared across versions and that determining the differences between versions (i.e., diff) had to be fast. Dolt's core data structure, the Prolly Tree, accomplishes both of these feats. However, a Prolly Tree can only represent one table schema. If you add a column to a table, Dolt makes a brand new Prolly Tree, forking storage and making diffs between versions before and after the schema change slow.

Forking Storage

In the case of column additions, forking storage is a necessary evil. Dolt forks storage on column additions to this day. However, for column renames, we could avoid forking storage by not relying on the name of the column to look up values in the Prolly Tree and instead generating and using a unique identifier to represent that column. If the column is renamed, no big deal: the identifier stays the same. And so the column tag was born.

No Forks
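To make the rename case concrete, here is a minimal Python sketch of the idea. It is not Dolt's storage engine: the TaggedTable class and its layout are hypothetical, and the tags are simply the ones from the dolt schema tags output earlier. Rows are keyed by tag, so renaming a column only touches the name-to-tag mapping.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedTable:
    """Toy table whose rows are keyed by column tag, not column name."""

    # Maps the user-visible column name to its tag (hypothetical layout).
    tags_by_name: dict[str, int] = field(default_factory=dict)
    # Each row maps tag -> value, so stored rows never mention column names.
    rows: list[dict[int, object]] = field(default_factory=list)

    def insert(self, **values: object) -> None:
        self.rows.append({self.tags_by_name[n]: v for n, v in values.items()})

    def rename_column(self, old: str, new: str) -> None:
        # Only the schema changes; the stored rows are left untouched.
        self.tags_by_name[new] = self.tags_by_name.pop(old)

    def select(self, name: str) -> list[object]:
        tag = self.tags_by_name[name]
        return [row.get(tag) for row in self.rows]


table = TaggedTable(tags_by_name={"id": 2831, "words": 8169})
table.insert(id=1, words="hello")
rows_before = [dict(row) for row in table.rows]

table.rename_column("words", "phrase")
print(table.select("phrase"))     # ['hello']
print(table.rows == rows_before)  # True: the stored rows did not change
```

If rows were keyed by name instead, the rename would have to rewrite every row, which is the storage fork the tag was meant to avoid.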

Later we realized that forking storage really doesn't matter for most users. Storage is cheap and diffing across schema changes is an uncommon operation. So, column tags seem like a classic premature optimization.

Even worse, from the early stages, we had regrets. In the beginning, column tags were explicit: the user had to define them when creating a table. You would make a SQL comment in your CREATE TABLE statements and define the tag. Unfortunately, no other database has column tags, so no one understood them. Column tags were a barrier to adoption because people got confused when they tried to create a table, a necessary action early in the product adoption pipeline.

create table t (
    id int primary key, /* tag 1 */
    words varchar(100) /* tag 2 */
);

So, we hid column tags from the user and made column tags randomly generated. Now, we supported standard SQL data definition language (DDL) and Dolt was easier for new users to adopt.

However, another key feature of version control is history independence. No matter how you create a table, if the table is the same, the table must look exactly the same in storage. What this means for column tags is, if two branches add the same column, they must receive the same column tag. That case is somewhat easy. Use the name, type, and other attributes of the column to generate a deterministic column tag. But it gets complicated. If I add a column on a branch and call it foo then rename it to bar, it must have the same tag as a column I add on another branch that I name bar from the start. We made a valiant attempt to generate the same tags for the same columns and it worked for a while. Most of the time, the column tag didn't matter. It was mostly the same when it had to be.
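The easy case and the hard case can be seen in a few lines of Python. This is only a sketch: toy_tag is a hypothetical function that hashes a column's table, name, and type, and it is not the algorithm Dolt uses.

```python
import hashlib


def toy_tag(table: str, column: str, sql_type: str) -> int:
    """Derive a tag from column attributes (illustrative, not Dolt's algorithm)."""
    key = f"{table}\x00{column}\x00{sql_type}".encode()
    # Use the first 8 bytes of the digest as an integer tag.
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


# Easy case: two branches add the same column independently, so the tags agree.
branch_a = toy_tag("t", "bar", "int")
branch_b = toy_tag("t", "bar", "int")
print(branch_a == branch_b)  # True

# Hard case: a third branch adds "foo" and later renames it to "bar".
# The tag was fixed when the column was created, so it is still derived
# from "foo" and no longer matches a column named "bar" from the start.
branch_c = toy_tag("t", "foo", "int")
print(branch_c == branch_a)  # False
```

Keeping the tag stable across a rename is the whole point of tags, so any purely attribute-based scheme runs into this tension.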

Dolt and Dolt users could mostly just ignore column tags, except when Dolt had to do a schema merge. Dolt relied on column tags to determine whether two columns were the same for the purpose of merge. This sometimes worked and sometimes didn't. In the times it didn't, the user would get a schema conflict with some message about columns not having the same tag. "My column definition is the same on branch a and branch b. Why will this not merge? What are tags?" Since users had never encountered tags before, they were understandably lost.

So, we made another attempt to remove tags from the schema merge process and only rely on them as a fallback in some corner cases where they are needed. This approach has been largely successful. This is the state we are in today. Most users will never need to know about column tags.

So Why Should I Care About Column Tags?

We've made improvements to Dolt to be less reliant on column tags, but we occasionally see a customer hit an issue with a column tag conflict. This typically happens when schemas have been modified on both branches that are being merged together. The logic that assigns column tags is deterministic, but it does depend on the current state of the table – in other words, it's not truly history independent. That means that if two branches make a different set of changes to a table, even if they get to the same final schema, the column tags may end up being different.
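Here is a toy Python model of how a deterministic scheme can still depend on history. Everything in it is an assumption made for illustration: base_tag is a deliberately weak hash chosen so that two column names collide, and add_column picks the next free tag when the preferred one is taken. Dolt's real tag generation is different, but the shape of the problem is the same.

```python
def base_tag(column: str) -> int:
    # Deliberately weak toy hash so that "ab" and "ba" collide.
    return sum(ord(ch) for ch in column) % 1000


def add_column(schema: dict[str, int], column: str) -> None:
    """Assign the first free tag at or after the column's base tag."""
    tag = base_tag(column)
    used = set(schema.values())
    while tag in used:
        tag = (tag + 1) % 1000
    schema[column] = tag


# Both branches end up with the same two columns, added in a different order.
branch1: dict[str, int] = {}
for name in ("ab", "ba"):
    add_column(branch1, name)

branch2: dict[str, int] = {}
for name in ("ba", "ab"):
    add_column(branch2, name)

print(branch1)  # {'ab': 195, 'ba': 196}
print(branch2)  # {'ba': 195, 'ab': 196}
print(set(branch1) == set(branch2), branch1 == branch2)  # True False
```

Each step is deterministic, yet the final tags depend on the order of the changes, which is what "not truly history independent" means here.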

Like we described earlier, this typically isn't a problem, since Dolt will try to match columns by name, but... it can cause a problem if two columns, even in different tables, try to use the same column tag. This is when customers will hit an error message about a "column tag conflict".

How to Resolve a Column Tag Merge Issue

The Dolt CLI provides two ways to manually change the column tags on a branch:

  • dolt schema update-tag <table> <column> <tag>
  • dolt schema copy-tags <from-branch>

The first command, dolt schema update-tag, updates a single column's tag to a manually specified value on the currently checked out branch. This is useful if you have a column tag conflict for a single column, as we'll see in an example below.

The second command, dolt schema copy-tags, is more advanced and will sync the column tags on the current branch with the column tags on the specified branch. This is useful if you have many column tag conflicts to deal with, although this is a rarer case and we recommend starting with dolt schema update-tag for a more targeted approach.

Note that both of these commands are run from the Dolt CLI and currently require you to stop a running sql-server before you can execute them. However, we're actively working on exposing update-tag through the SQL interface so that sql-server users don't need to stop their sql-server to fix tag conflicts.

Using dolt schema update-tag

Let's take a look at an example of a column tag conflict and how to use update-tag to fix it.

In our example database, we have two tables, t and z, on two branches, main and branch1.

Here are the column tags on the main branch:

% dolt schema tags
+-------+--------+-------+
| table | column | tag   |
+-------+--------+-------+
| t     | pk     | 15476 |
| t     | c1     | 7423  |
| t     | c2     | 345   |
| z     | pk     | 497   |
+-------+--------+-------+

Here we check out branch1 and list its column tags:

% dolt checkout branch1
Switched to branch 'branch1'

% dolt schema tags
+-------+--------+-------+
| table | column | tag   |
+-------+--------+-------+
| t     | pk     | 15476 |
| t     | c1     | 7423  |
| z     | pk     | 345   |
+-------+--------+-------+

There are two important things to notice in the output above:

  • On main, a new column, c2, has been added to table t.
  • On branch1, schema changes have caused the pk column on table z to change its tag to 345, which just happens to be the same tag assigned to the new t.c2 column.

If we try to merge the changes from main into branch1, we'll hit an error because two columns are using the same column tag. Like we briefly mentioned earlier, this can happen because column tag generation isn't truly history independent. Let's try merging main into branch1 and see what happens...

% dolt merge main
cannot create column c2 on table t, the tag 345 was already used in table z

Just like we expected, Dolt isn't able to complete this merge, because it finds that adding the new t.c2 column to branch1 would result in two columns having the same column tag, so we see the error message about tag 345 being used on multiple columns.
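The check behind this error can be approximated in a few lines of Python. This is a sketch of the idea rather than Dolt's merge code: tag_conflicts is a hypothetical helper, and the two dictionaries are transcribed by hand from the dolt schema tags output shown above.

```python
Column = tuple[str, str]  # (table, column)


def tag_conflicts(
    ours: dict[Column, int], theirs: dict[Column, int]
) -> list[tuple[int, Column, Column]]:
    """Return (tag, their_column, our_column) where one tag names two columns."""
    ours_by_tag = {tag: col for col, tag in ours.items()}
    conflicts = []
    for col, tag in theirs.items():
        owner = ours_by_tag.get(tag)
        # A conflict is a tag we already use for a *different* column.
        if owner is not None and owner != col:
            conflicts.append((tag, col, owner))
    return conflicts


main = {("t", "pk"): 15476, ("t", "c1"): 7423, ("t", "c2"): 345, ("z", "pk"): 497}
branch1 = {("t", "pk"): 15476, ("t", "c1"): 7423, ("z", "pk"): 345}

print(tag_conflicts(branch1, main))
# [(345, ('t', 'c2'), ('z', 'pk'))]
```

The single result says that main's t.c2 wants tag 345, which branch1 already uses for z.pk.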

To manually work around this, we can look at the output of dolt schema tags and use the dolt schema update-tag command to assign one column a different tag. It's a good practice to standardize the tags based on what main has, so we'll use dolt schema update-tag on branch1 to change z.pk to match the column tag on main, 497, and then commit that change:

% dolt schema update-tag z pk 497

% dolt commit -am "updating column tag for z.pk"
commit hdi18sj105f8bg70ql0j1uj9aeiec7m4 (HEAD -> branch1) 
Author: Jason Fulghum <jason@dolthub.com>
Date:  Tue May 13 12:25:36 -0700 2025

        updating column tag for z.pk

Now that we've resolved the conflicting tag, we should be able to merge from main to branch1:

% dolt merge main
Updating 6n6sj08s4u43rerda9v4b1mpjfn3dtki..h4iu76bs97uvojqmd60n6rpenmkmjkfc
commit 6n6sj08s4u43rerda9v4b1mpjfn3dtki (HEAD -> branch1) 
Merge: hdi18sj105f8bg70ql0j1uj9aeiec7m4 h4iu76bs97uvojqmd60n6rpenmkmjkfc
Author: Jason Fulghum <jason@dolthub.com>
Date:  Tue May 13 12:26:25 -0700 2025

        Merge branch 'main' into branch1

t | 0 
1 tables changed, 0 rows added(+), 0 rows modified(*), 0 rows deleted(-)

And sure enough, the merge is able to run successfully now after we resolved the duplicate tag.
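If more than one column is out of step, the update-tag invocations can be worked out mechanically. The Python sketch below assumes you have already copied the dolt schema tags output for both branches into dictionaries. The update_tag_commands helper is hypothetical and only prints commands in the dolt schema update-tag <table> <column> <tag> form shown earlier. It does not run them, and it does not check whether a target tag is already in use by another column on the current branch.

```python
Column = tuple[str, str]  # (table, column)


def update_tag_commands(
    current: dict[Column, int], target: dict[Column, int]
) -> list[str]:
    """List update-tag commands that align shared columns with the target branch."""
    commands = []
    for (table, column), tag in sorted(current.items()):
        wanted = target.get((table, column))
        # Only columns present on both branches with differing tags need a fix.
        if wanted is not None and wanted != tag:
            commands.append(f"dolt schema update-tag {table} {column} {wanted}")
    return commands


main = {("t", "pk"): 15476, ("t", "c1"): 7423, ("t", "c2"): 345, ("z", "pk"): 497}
branch1 = {("t", "pk"): 15476, ("t", "c1"): 7423, ("z", "pk"): 345}

for command in update_tag_commands(branch1, main):
    print(command)
# dolt schema update-tag z pk 497
```

For this example the sketch arrives at the same single command we ran by hand.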

Using dolt schema copy-tags

The dolt schema update-tag command allows you to update a single column's tag to a value that you select. This is typically the best tool to correct a column tag conflict. However, in very rare cases, we have seen many column tag conflicts at once, and using update-tag to fix each one would be tedious. In those cases, the dolt schema copy-tags <from-branch> command can help by syncing all the column tags on the currently checked out branch with the column tags from another branch.

Let's take another look at the previous column tag conflict example and see how we could instead use dolt schema copy-tags to correct it.

As a reminder, here are the column tags on the main branch:

% dolt schema tags
+-------+--------+-------+
| table | column | tag   |
+-------+--------+-------+
| t     | pk     | 15476 |
| t     | c1     | 7423  |
| t     | c2     | 345   |
| z     | pk     | 497   |
+-------+--------+-------+

And here are the column tags on branch1:

% dolt schema tags               
+-------+--------+-------+
| table | column | tag   |
+-------+--------+-------+
| t     | pk     | 15476 |
| t     | c1     | 7423  |
| z     | pk     | 345   |
+-------+--------+-------+

From the previous example, we know that merging main into branch1 fails because of the column tag conflict where two different columns are both using 345 as their tag. If we check out branch1, we can sync all the tags on branch1 with the tags on main by running:

% dolt schema copy-tags main
changing z.pk to 346
syncing t.c2 to 345
syncing z.pk to 497

2 column tags synced from branch main

Note that unlike dolt schema update-tag, dolt schema copy-tags will automatically commit the changes, so you don't need to manually run dolt commit.

Now if we look at our tags on branch1, we should see that they match main:

% dolt schema tags
+-------+--------+-------+
| table | column | tag   |
+-------+--------+-------+
| t     | pk     | 15476 |
| t     | c1     | 7423  |
| z     | pk     | 497   |
+-------+--------+-------+

And sure enough, we are now able to successfully merge main into branch1:

% dolt merge main 
Updating mr1t7jv0tf3c384pn6gogbdibt42f1df..h4iu76bs97uvojqmd60n6rpenmkmjkfc
commit mr1t7jv0tf3c384pn6gogbdibt42f1df (HEAD -> branch1) 
Merge: tp5gvvjl8r939s5vccpdji96u0q4b0ic h4iu76bs97uvojqmd60n6rpenmkmjkfc
Author: Jason Fulghum <jason@dolthub.com>
Date:  Tue May 13 12:41:46 -0700 2025

        Merge branch 'main' into branch1

t | 0 
1 tables changed, 0 rows added(+), 0 rows modified(*), 0 rows deleted(-)

Conclusion

As you can see, column tags can surprise users and get a little messy. Hopefully, you never run into a column tag issue. If you do, this article can help, and the Dolt team is always available on Discord. As I said at the start, I hope the next time we write about column tags is their deprecation announcement.
