Faster Large Database Access with `mmap`

Large Dolt databases are slow to interact with on the dolt command line. Most of the slowness comes from loading the required storage file indexes. Nick was frustrated by this, so he implemented a solution. If you have a running dolt sql-server, we now give you an option to keep the storage indexes in memory using mmap, resulting in faster startup time for the dolt CLI connecting to that server. This article explains in more detail.

What is mmap?

Have you ever tried to run the dolt command line in the directory of a large database? It's really slow.

$ du -h .dolt
1.3T	.dolt/noms/oldgen
1.3T	.dolt/noms
  0B	.dolt/temptf
  0B	.dolt/stats/.dolt/noms/oldgen
 13M	.dolt/stats/.dolt/noms
  0B	.dolt/stats/.dolt/temptf
 13M	.dolt/stats/.dolt
 13M	.dolt/stats
1.3T	.dolt
$ time dolt log -n 1
commit biah3dkofnsmjc37m6ttivp9qvtoa1nm (HEAD -> main) 
Author: timsehn <tim@dolthub.com>
Date:  Mon Oct 13 11:43:29 -0700 2025

        11,348,100 pages imported

dolt log -n 1  4.48s user 28.19s system 82% cpu 39.538 total

Most of this time is spent loading storage file indexes into memory.

This slow startup time frustrated Nick. He wanted a solution. There's not much we can do for dolt commands that execute without a running dolt sql-server. We just have to load all those indexes. But with a long-running process like the dolt sql-server, we had some options.

Enter mmap, which stands for "memory-map". mmap maps the contents of a file directly into a process's address space. The operating system pages the data in on first access and keeps it in memory, making subsequent accesses fast by avoiding disk reads. This is a perfect solution for loading and keeping storage file indexes in memory for subsequent dolt CLI invocations to use.
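To make the idea concrete, here is a minimal sketch of reading a slice of a file through a read-only memory map. It uses Python's standard mmap module rather than Dolt's Go implementation, and the helper name read_index_slice and the temporary demo file are hypothetical stand-ins for a real storage index.

```python
import mmap
import os
import tempfile


def read_index_slice(path: str, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at `offset`, read through a memory map."""
    with open(path, "rb") as f:
        # length=0 maps the whole file; ACCESS_READ works on Unix and Windows.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing copies only the requested bytes out of the mapping.
            return mapped[offset:offset + length]


# Hypothetical stand-in for a storage index file: a 6-byte header plus 256 bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER" + bytes(range(256)))
    demo_path = tmp.name

first = read_index_slice(demo_path, 0, 6)
tail = read_index_slice(demo_path, 6 + 250, 6)
os.remove(demo_path)

print(first)
print(list(tail))
```

This sketch opens and closes the mapping on every call to stay short. A long-running server would instead keep the mapping open so later reads hit memory that is already resident.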

Nick implemented the mmap option in this Pull Request, which shipped in Dolt release 1.58.1. You can enable it in your Dolt config with the command dolt config --set "mmap_archive_indexes" true if your database is also in archive format, which is not the default yet.

It's not on by default because it has some downsides, most notably:

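  1. mmap doesn't play well with Go's runtime scheduler, potentially causing performance issues. A page fault on mapped memory blocks the whole OS thread, and the scheduler has no way to see it coming.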
  2. The code is different on *nix and Windows systems, adding complexity.

The change only really affects performance of the dolt command line for large databases that also have a running dolt sql-server and are in archive format. That's a lot of ifs. So, hiding the feature behind a configuration flag is the right choice for now given the risk/reward trade-off.

Prerequisite

Your database needs to be in archive format. Archive format will be the default format of Dolt 2.0. It saves 30-50% of disk space so it's good on its own. It also enables a feature where we mmap the storage indexes of the archive files. To convert an existing database, run dolt archive:

$ dolt archive

Or, if you're starting a new database, turn on automatic garbage collection into the archive format in your config.yaml like so:

behavior:
  auto_gc_behavior:
    enable: true
    archive_level: 1

You'll want that setting anyway to help us test for Dolt 2.0 where those settings will be the default.

Enable mmap

After getting your database in archive format, you enable mmap using the mmap_archive_indexes key in dolt config. You set that value to true using the following command.

$ dolt config --set "mmap_archive_indexes" true

Now, start a dolt sql-server, open another shell and interact with that database using the dolt CLI.
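If you want to compare timings before and after flipping the setting, the shell's time builtin is enough. As an alternative, here is a small sketch of a wall-clock timer in Python. The helper name time_command is hypothetical, and the smoke check runs the Python interpreter itself so the sketch works without dolt installed.

```python
import subprocess
import sys
import time
from typing import Sequence, Tuple


def time_command(argv: Sequence[str]) -> Tuple[float, int]:
    """Run argv, discard its output, and return (wall-clock seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(
        list(argv), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


# Smoke check using the current Python interpreter as the command.
elapsed, code = time_command([sys.executable, "-c", "pass"])
print(f"exit {code} after {elapsed:.3f}s")

# In a Dolt database directory you would time the CLI instead, for example:
#   elapsed, code = time_command(["dolt", "log", "-n", "1"])
```

This measures only wall-clock time, which corresponds to the "real" line in the output of time.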

Performance

Let's use the aforementioned 1.3 TB Wikipedia import. By the way, I'm still working on the import. It's my white whale.

With mmap_archive_indexes off:

$ time dolt log -n 1
commit 2ior69uu1m0i9f299sjufag931hsvbuh (HEAD -> main, remotes/origin/main)
Author: timsehn <tim@dolthub.com>
Date:  Fri Sep 12 17:05:06 +0000 2025

        10,760,600 pages imported


real    0m8.109s
user    0m27.750s
sys     1m3.136s

With mmap_archive_indexes on:

$ time dolt log -n 1
commit 2ior69uu1m0i9f299sjufag931hsvbuh (HEAD -> main, remotes/origin/main)
Author: timsehn <tim@dolthub.com>
Date:  Fri Sep 12 17:05:06 +0000 2025

        10,760,600 pages imported


real    0m0.529s
user    0m0.404s
sys     0m0.584s

That's a ~15X speed up, from 8.1s down to half a second. Pretty impressive. Note, both timings are faster than the 39.5s at the top, which was measured on my Mac laptop with the database unarchived. It took more than 64GB of memory to archive this database, so I archived it and tested the mmap_archive_indexes setting on a large EC2 instance. No matter what, this setting makes large database interactions much, much faster.
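The ratio of the two "real" times above can be checked with a few lines of Python. The helper parse_real is a hypothetical name, and the only inputs are the two values copied from the time output in this section.

```python
import re


def parse_real(value: str) -> float:
    """Convert a shell `time` value such as '0m8.109s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


off = parse_real("0m8.109s")  # real time with mmap_archive_indexes off
on = parse_real("0m0.529s")   # real time with mmap_archive_indexes on
speedup = off / on

print(f"{off:.3f}s -> {on:.3f}s is a {speedup:.1f}x speedup")
# prints: 8.109s -> 0.529s is a 15.3x speedup
```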

By the way, archiving the 1.3 TB Wikipedia import makes it 821GB. Another big win.
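As a quick sanity check, the space saved can be worked out from those two sizes. The sketch below assumes the 1.3 TB and 821GB figures are directly comparable, and it computes the saving under both decimal (1 TB = 1000 GB) and binary (1 TiB = 1024 GiB) readings, since du -h output does not make the unit explicit here.

```python
def savings_fraction(before_gb: float, after_gb: float) -> float:
    """Fraction of disk space saved going from before_gb to after_gb."""
    return 1.0 - after_gb / before_gb


after_gb = 821.0
saved_decimal = savings_fraction(1.3 * 1000, after_gb)  # assumes decimal units
saved_binary = savings_fraction(1.3 * 1024, after_gb)   # assumes binary units

print(f"decimal units: {saved_decimal:.1%} saved")
print(f"binary units:  {saved_binary:.1%} saved")
# prints:
# decimal units: 36.8% saved
# binary units:  38.3% saved
```

Either reading lands inside the 30-50% range quoted for the archive format.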

Conclusion

We're testing the mmap archive indexes setting right now and considering making it the default in Dolt 2.0. In the meantime, the archive format without mmap will become the default for new databases very soon. We continue to improve Dolt's storage format transparently behind the scenes. If you want to help us test mmap archive indexes, or you find a bug in Dolt when you've enabled it, please come by our Discord or cut a GitHub Issue.
