What Should Happen when there is a Schema Conflict?

TECHNICAL
7 min read

Lately, I've been writing a lot about three-way merge. Dolt is the first and only SQL database to provide three-way merge of both schema and data. Being first means we get to define how three-way works for SQL databases. Defining behavior can sometimes be tricky.

As you can see from a number of three-way merge examples, most of the time three-way merges of databases produce obvious and intuitive results. However, there are a number of non-obvious edge cases, including merges that generate schema conflicts.

Schema Conflicts

These edge cases are so non-obvious, I had an error based on an old state of Dolt. I stated you could resolve schema conflicts but currently you cannot. Why? This blog will explain.

What is a Schema Conflict?

A schema conflict is the result of two schemas Dolt cannot automatically merge. There's a complex set of rules on which schema changes can merge and which schema changes generate a conflict. The rules are intuitive and are probably understood by a few examples.

  1. If one side of the merge changes a type of a column from an int to a float and the other side changes the same column from an int to a varchar(10), this is a schema conflict.
  2. If one side of the merge adds a NOT NULL constraint to a column and the other side adds a default, this is a schema conflict.
  3. If one side changes the type of a column and the other side deletes the column, this is a schema conflict.
  4. If one side adds a column of type int and the other side adds a column of the same name but of type varchar(10), this is a schema conflict.

As you can see, schema conflicts generally happen if two branches modify the same column in a different way.

A good detailed example complete with SQL can be found in my three-way merge examples blog.

What happens now when there is a Schema Conflict?

When we first introduced schema conflicts, in the event of a conflict the user could view the conflict using the dolt_schema_conflicts table. After that, the user had the ability choose "ours" or "theirs" using dolt_conflicts_resolve() to resolve the schema conflict.

But dolt_conflicts_resolve() didn't behave as one might expect in the schema conflict case. In the data conflict case, dolt_conflicts_resolve() takes the "ours" or "theirs" value for conflicting rows but any non-conflicting rows are merged cleanly. In the schema conflicts case, dolt_conflicts_resolve() would take the schema and data from the specified side of the merge. No data merge would happen. For instance, if you picked "ours", your table would look exactly like it did before the merge, ignoring any data changes that happened on the branch you were trying to merge. This was non-obvious to users. Thanks Dolt, you didn't really "merge" anything. We created an issue and disallowed the use of dolt_conflicts_resolve() in the schema conflicts case. We throw the following error when dolt_conflicts_resolve() is called.

three_way_merge/main> call dolt_conflicts_resolve('--ours', 'employees');
Error 1105 (HY000): Unable to automatically resolve schema conflicts since data changes may not have been fully merged yet.
To continue, abort this merge (dolt merge --abort) then apply ALTER TABLE statements to one side of this merge to get the two schemas in sync with the desired schema, then rerun the merge.
To track resolution of this limitation, follow https://github.com/dolthub/dolt/issues/6616

We throw up our hands! The user has no option to resolve a merge with a schema conflict. Their only option right now is to abort the merge. We tell them to manually resolve the schema conflicts and try the merge again.

Why is this hard?

So, the reason this is hard is because schema merge is a step in the merge process. When everything merges cleanly, a Dolt merge works like this:

  1. User initiates a merge
  2. Schema merge
  3. Data merge
  4. Commit

This all happens cleanly and without user input. The result of the merge is intuitive. Your tables would look like you would expect them to.

In the event of a data conflict and constraint violation, a Dolt merge works like this:

  1. User initiates a merge
  2. Schema merge
  3. Data merge
  4. Resolve any data conflicts
  5. Resolve any constraint violations
  6. Commit

User input is required but importantly, it happens at the end when all the automatic bits of Dolt merge are completed. One resolution may be to delete the table being merged that generated conflicts, and Dolt is fine with that.

First and foremost, Dolt merge right now only has one stage to resolve conflicts. Dolt merge is not interactive. Some interactive workflow is required to make schema conflict resolution work. After you resolve your schema conflicts, you could imagine running something like dolt_merge('--continue') and Dolt would attempt a data merge based on the schema conflict resolution you provided, failing if the merge could not be further processed. This is work but it's not the hard part.

The hard part is, with a schema conflict, the data merge is dependent on the users resolution of the schema conflict. It may be unclear how to process the data merge after the user resolves the schema conflict. What if the user resolves the conflict with a table schema that Dolt has no idea how to map the data onto? I'll walk through a concrete, illustrative example to better exhibit the difficulty.

Example

Let's say you have a simple weights table with an integer id primary key and second integer column called weight. This table has two values in it.

schema_conflicts/main> create table weights(id int primary key, weight int);
schema_conflicts/main*> insert into weights values (0,1), (1,2);
Empty set (0.01 sec)

schema_conflicts/main*> call dolt_commit('-Am', 'Merge base');
+----------------------------------+
| hash                             |
+----------------------------------+
| g0r60aqn09igdcjt0o1gnm3gr0ch0rp2 |
+----------------------------------+
1 row in set (0.02 sec)

I make an add-units branch where I will change the type of the weight column to varchar(30) and append some units to the integers making them strings. On main, I will change the column type to float and add a decimal place. First, the add-units branch:

schema_conflicts/main> call dolt_checkout('-b', 'add-units');
+--------+--------------------------------+
| status | message                        |
+--------+--------------------------------+
| 0      | Switched to branch 'add-units' |
+--------+--------------------------------+
1 row in set (0.01 sec)

schema_conflicts/add-units> alter table weights modify column weight varchar(30);
schema_conflicts/add-units*> update weights set weight=CONCAT(weight, ' pounds') where id=0;
Empty set (0.01 sec)

schema_conflicts/add-units*> update weights set weight=CONCAT(weight, ' stones') where id=1;
Empty set (0.01 sec)

schema_conflicts/add-units*> \diff
diff --dolt a/weights b/weights
--- a/weights
+++ b/weights
 CREATE TABLE `weights` (
   `id` int NOT NULL,
-  `weight` int,
+  `weight` varchar(30),
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
+---+----+----------+
|   | id | weight   |
+---+----+----------+
| < | 0  | 1        |
| > | 0  | 1 pounds |
| < | 1  | 2        |
| > | 1  | 2 stones |
+---+----+----------+
schema_conflicts/add-units*> call dolt_commit('-am', 'Added units to weight');
+----------------------------------+
| hash                             |
+----------------------------------+
| 3h9t3pn2bb41qorq9239n9f4cfhu7cge |
+----------------------------------+
1 row in set (0.01 sec)

Now, on main:

schema_conflicts/main> alter table weights modify column weight float;
schema_conflicts/main*> update weights set weight=weight + 0.1;
Empty set (0.01 sec)

schema_conflicts/main> \diff
diff --dolt a/weights b/weights
--- a/weights
+++ b/weights
 CREATE TABLE `weights` (
   `id` int NOT NULL,
+  `weight` float,
-  `weight` int,
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
+---+----+--------+
|   | id | weight |
+---+----+--------+
| < | 0  | 1      |
| > | 0  | 1.1    |
| < | 1  | 2      |
| > | 1  | 2.1    |
+---+----+--------+
schema_conflicts/main*> call dolt_commit('-am', 'More precision');
+----------------------------------+
| hash                             |
+----------------------------------+
| e8mudluu2dr6mlhpke2dip83h4gm9bkh |
+----------------------------------+
1 row in set (0.01 sec)

And now the merge, which will schema conflict. We must remember to turn off @@autocommit so we can view the merge conflicts – otherwise they will be automatically rolled back by the @@autocommit behavior.

schema_conflicts/main> set @@autocommit=0;
schema_conflicts/main> call dolt_merge('add-units');
+------+--------------+-----------+-----------------+
| hash | fast_forward | conflicts | message         |
+------+--------------+-----------+-----------------+
|      | 0            | 1         | conflicts found |
+------+--------------+-----------+-----------------+
1 row in set (0.00 sec)

schema_conflicts/main*> select * from dolt_status;
+------------+--------+-----------------+
| table_name | staged | status          |
+------------+--------+-----------------+
| weights    | 0      | schema conflict |
+------------+--------+-----------------+
1 row in set (0.00 sec)

schema_conflicts/main*> select * from dolt_schema_conflicts;
+------------+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+----------------------------------------------------------------------------+
| table_name | base_schema                                                       | our_schema                                                        | their_schema                                                      | description                                                                |
+------------+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+----------------------------------------------------------------------------+
| weights    | CREATE TABLE `weights` (                                          | CREATE TABLE `weights` (                                          | CREATE TABLE `weights` (                                          | different column definitions for our column weight and their column weight |
|            |   `id` int NOT NULL,                                              |   `id` int NOT NULL,                                              |   `id` int NOT NULL,                                              |                                                                            |
|            |   `weight` int,                                                   |   `weight` float,                                                 |   `weight` varchar(30),                                           |                                                                            |
|            |   PRIMARY KEY (`id`)                                              |   PRIMARY KEY (`id`)                                              |   PRIMARY KEY (`id`)                                              |                                                                            |
|            | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin; | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin; | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin; |                                                                            |
+------------+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+----------------------------------------------------------------------------+

How can this schema conflict be resolved? One reasonable merge would be to change the type of the weight column on main to varchar(30) creating a data conflict on both rows. Then, we would resolve the data conflict to 1.1 pounds and 2.1 stones. Another resolution could be to keep the float type in the weight column and then ignore the changes made on the add-units branch. A third resolution might be revert the type change to the weight column on the add-units branch, add a second column of type varchar(10) called units, put pounds and stones in that column, and then cleanly data merge the two tables.

The challenge is, how can we allow the user enough user interface to modify both sides of the merge such that all those outcomes are possible? Right now, we say abort the merge, make any modifications you need to to get a clean schema merge, and then try again.

Conclusion

Schema conflicts get complicated quickly. If you have any ideas on what you'd like to see for a schema merge and schema conflicts workflow, we'd love to hear them. Come by our Discord and let us know.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.