Announcing Fast Diffing and Merging of JSON Documents

August 15, 2024


There's no one-size-fits-all approach when it comes to modeling data. Sometimes you want to use a file system. Sometimes you need a relational database. Other times you need a semi-structured file format, like JSON or flatbuffers. (Aside: we love flatbuffers and use them internally in Dolt's storage system.)

But no matter the shape of your data, it should be version controlled. We think that file systems, relational data, and structured documents should all be able to live side-by-side in one place and be version controlled together. And we believe that Dolt can be that place.

Dolt is the world's first version-controlled SQL database. Think "Git for Data", with all the great Git features that help ensure that you'll never lose your data or its version history: branching, diffing, merging, even rebasing. Dolt does it all.

And Dolt is good for more than just relational data: you can use it to store file systems and documents too. You can build a file system on top of Dolt.¹ And like most modern SQL engines, Dolt supports JSON documents as a column type and provides standard functions for traversing, filtering, and mutating JSON values.

But good version control doesn't just mean having a history; it means being able to compare histories, make branches, collaborate in parallel, and then merge those changes back together. When the changes between two branches are small, comparing them should be fast, regardless of how large your database is. Since the inception of Dolt, Prolly Trees have allowed us to do that for relational data. But we wanted to do that for all your data.
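The property behind "fast when the changes are small" can be illustrated with a toy hash tree. The sketch below is not Dolt's implementation: it builds an in-memory tree over sorted key/value pairs with a fixed fanout, gives every node a SHA-256 digest of its contents, and diffs two trees by skipping any pair of subtrees whose digests match. The names `Node`, `build`, and `diff` and the fanout of 4 are hypothetical choices for the example. Real Prolly Trees choose node boundaries from the content itself so that an insertion does not shift every later boundary; this toy simply falls back to comparing contents directly when boundaries do not line up.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    digest: str           # SHA-256 over this node's contents (covers the whole subtree)
    first: object         # smallest key in the subtree, None if empty
    items: tuple = ()     # leaf payload: sorted ((key, value), ...)
    children: tuple = ()  # internal payload: (Node, ...)


def _sha(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def make_leaf(items) -> Node:
    items = tuple(sorted(items))
    return Node(_sha("leaf" + repr(items)), items[0][0] if items else None, items=items)


def make_internal(children) -> Node:
    children = tuple(children)
    digest = _sha("node" + "".join(c.digest for c in children))
    return Node(digest, children[0].first, children=children)


def build(pairs, fanout: int = 4) -> Node:
    """Build a tree over key/value pairs with a FIXED fanout (hypothetical
    simplification: Prolly Trees pick node boundaries from the content)."""
    pairs = sorted(pairs)
    level = [make_leaf(pairs[i:i + fanout]) for i in range(0, len(pairs), fanout)]
    if not level:
        return make_leaf([])
    while len(level) > 1:
        level = [make_internal(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


def flatten(node: Node) -> dict:
    if not node.children:
        return dict(node.items)
    out = {}
    for child in node.children:
        out.update(flatten(child))
    return out


def diff(a: Node, b: Node, stats: dict) -> dict:
    """Return {key: (old, new)} for every key whose value differs between a and b.
    stats["visited"] counts the node pairs examined."""
    stats["visited"] = stats.get("visited", 0) + 1
    if a.digest == b.digest:
        return {}  # identical subtrees: skip them without reading their contents
    if a.children and b.children and [c.first for c in a.children] == [c.first for c in b.children]:
        # Same child boundaries, so any key lives in the same child on both sides.
        changes = {}
        for ca, cb in zip(a.children, b.children):
            changes.update(diff(ca, cb, stats))
        return changes
    # Leaves, or boundaries that do not line up: compare the contents directly.
    old, new = flatten(a), flatten(b)
    return {k: (old.get(k), new.get(k)) for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


base = build((f"k{i:05d}", i) for i in range(10_000))
edited = build((f"k{i:05d}", -1 if i == 1234 else i) for i in range(10_000))
stats = {}
print(diff(base, edited, stats))  # {'k01234': (1234, -1)}
print(f"{stats['visited']} node pairs examined for 10,000 keys")
```

With one value changed among 10,000 keys, `diff` only examines the nodes along the path to that key and their siblings, a count that grows with the depth of the tree rather than with the number of keys.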

JSON in Dolt

Since the start of the year, Dolt has supported three-way merging of JSON documents: if two branches made simultaneous changes to a JSON document stored within a Dolt table, Dolt could reconcile those changes automatically. But unlike merging tables, this process required scanning both versions of the document in their entirety to find the differences. This was still incredibly useful for preventing merge conflicts, which are the bane of users everywhere and the killer of productivity. But the cost of those full scans made it infeasible for large documents. We knew we could do better.
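To make "three-way merge" concrete, here is a minimal sketch for parsed JSON values, in the spirit of the whole-document approach described above. It is illustrative only and does not reproduce Dolt's actual merge rules: objects are merged key by key, while arrays and scalars are treated as single values, so two different concurrent edits to the same array are reported as a conflict. `merge_json`, `MergeConflict`, and the sample ship data are hypothetical.

```python
import json

MISSING = object()  # sentinel: the key is absent on this side


class MergeConflict(Exception):
    """Raised when both branches changed the same location in different ways."""


def _same(a, b) -> bool:
    # Compare canonical JSON text so JSON true and 1 are not confused
    # (in Python, True == 1).
    if a is MISSING or b is MISSING:
        return a is b
    return json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)


def merge_json(base, ours, theirs, path: str = "$"):
    """Three-way merge of parsed JSON values. Objects are merged key by key;
    arrays and scalars are treated as single values."""
    if _same(ours, theirs):
        return ours    # both sides agree (or neither changed anything)
    if _same(ours, base):
        return theirs  # only "theirs" changed this location
    if _same(theirs, base):
        return ours    # only "ours" changed this location
    if all(isinstance(v, dict) for v in (base, ours, theirs)):
        merged = {}
        for key in dict.fromkeys([*base, *ours, *theirs]):  # ordered union of keys
            value = merge_json(base.get(key, MISSING), ours.get(key, MISSING),
                               theirs.get(key, MISSING), f"{path}.{key}")
            if value is not MISSING:  # MISSING here means the key was deleted
                merged[key] = value
        return merged
    raise MergeConflict(f"both branches changed {path}")


base = {"name": "Kestrel", "hull": 100, "tags": ["fighter"]}
ours = {**base, "hull": 120}             # one branch buffs the hull
theirs = {**base, "name": "Kestrel II"}  # another branch renames the ship
print(merge_json(base, ours, theirs))
# {'name': 'Kestrel II', 'hull': 120, 'tags': ['fighter']}
```

Note that this naive version serializes and walks whole values to compare them, so its cost grows with document size even when the edits are tiny: the same kind of full-document work described in the paragraph above.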

We recently announced that we had improved how we store JSON internally in order to speed up lookup and mutation operations without compromising read speeds. We also discussed in more detail exactly how we accomplished this. One of our goals with this migration was to bring the same fast merging capability to JSON documents.

JSON Objects Now Merge Fast

And now we have. We're pleased to say that Dolt can now store, inspect, filter, even transform and mutate documents of virtually limitless size. Small changes to a document are instant, regardless of the total document size. And diffing and merging these changes between branches scales only with the size of the changed data.

Where is this Useful?

This unlocks a ton of new potential for using Dolt to manage all sorts of data. For instance, one place where people are already using Dolt is in configuration management. This is especially common in game development. To showcase the power of Dolt, we made our own fork of the open-source game Endless Sky, which uses a Dolt database to store all of the game's data.

Sometimes this configuration data is well-structured and fits well in tables with a fixed schema. Sometimes it's more polymorphic and is best represented in a semi-structured format like JSON. When we made our fork of Endless Sky, we used both approaches, which you can see for yourself by exploring the game's database on DoltHub.

Endless Sky is a small game, but even much larger games model their data the same way, with a mix of relational data and semi-structured documents, stored side-by-side in SQL tables. It doesn't matter how large these documents get: even JSON documents on the order of gigabytes can be tracked this way, with multiple developers modifying them concurrently. And thanks to Dolt's structural sharing, the complete version history of these documents can be tracked in a space-efficient manner.
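Structural sharing of this kind generally comes from splitting data into chunks at content-defined boundaries and storing each chunk under the hash of its bytes, so two versions that differ by a small edit reuse almost all of each other's chunks. The sketch below illustrates that general technique on a flat serialized JSON string using a polynomial rolling hash. It is not Dolt's chunker: the window size, boundary mask, sample data, and the functions `chunk` and `store` are hypothetical choices for the example.

```python
import hashlib
import json

WINDOW = 16                   # bytes of context that decide a boundary (hypothetical)
MASK = (1 << 6) - 1           # cut when the low 6 bits are zero: roughly 64-byte chunks
BASE = 257
MOD = (1 << 61) - 1           # Mersenne prime modulus for the rolling hash
POW = pow(BASE, WINDOW, MOD)  # BASE**WINDOW, used to drop the byte leaving the window


def chunk(data: bytes) -> list[bytes]:
    """Split data wherever the rolling hash of the last WINDOW bytes matches MASK."""
    pieces, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD  # remove the oldest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            pieces.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        pieces.append(data[start:])
    return pieces


def store(data: bytes, blobs: dict[str, bytes]) -> list[str]:
    """Save each chunk under its SHA-256 hash and return the ordered chunk ids."""
    ids = []
    for piece in chunk(data):
        key = hashlib.sha256(piece).hexdigest()
        blobs[key] = piece  # an identical chunk from another version is stored once
        ids.append(key)
    return ids


doc = {f"ship{i}": {"hull": i * 7 % 1000, "shields": i * 13 % 500} for i in range(2000)}
v1 = json.dumps(doc).encode()
doc["ship1000"]["hull"] = 9999  # one small edit
v2 = json.dumps(doc).encode()

blobs: dict[str, bytes] = {}
ids1 = store(v1, blobs)
unique_after_v1 = len(blobs)
ids2 = store(v2, blobs)
print(f"v1: {len(ids1)} chunks, new chunks added by v2: {len(blobs) - unique_after_v1}")
assert b"".join(blobs[k] for k in ids2) == v2  # v2 is fully recoverable from the store
```

Because a boundary depends only on the preceding 16 bytes, chunking resynchronizes shortly after the edit, so storing the second version adds only a handful of new chunks rather than a second full copy.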

Some of our Enterprise Support customers are game developers, and they love the value that Dolt gives to their workflow. No matter how big their data gets, Dolt is up to the challenge.

This isn't the end, either. We believe that Dolt can be the best SQL database for storing structured data, and we're not going to rest until Dolt is the best place to version control all of your important data. Is there something you'd like to see Dolt do? Drop us a line on Discord or file an issue on GitHub. We take user feedback and suggestions very seriously.

¹ The SQLite Archive specification provides a standard way to implement a file system on top of a SQL database. That said, it's probably a better idea to just track your files in Git and have your Dolt database store a Git commit hash instead.
