Faster Large Database Access with `mmap`
Large Dolt databases are slow to interact with on the `dolt` command line. Most of the slowness comes from loading required storage file indexes. Nick was frustrated by this, so he implemented a solution. If you have a running `dolt sql-server`, we now give you an option to keep the storage indexes in memory using `mmap`, resulting in faster startup time for the `dolt` CLI connecting to that server. This article explains in more detail.
What is `mmap`?
Have you ever tried to run the `dolt` command line in the directory of a large database? It's really slow.
```
$ du -h .dolt
1.3T .dolt/noms/oldgen
1.3T .dolt/noms
0B .dolt/temptf
0B .dolt/stats/.dolt/noms/oldgen
13M .dolt/stats/.dolt/noms
0B .dolt/stats/.dolt/temptf
13M .dolt/stats/.dolt
13M .dolt/stats
1.3T .dolt
$ time dolt log -n 1
commit biah3dkofnsmjc37m6ttivp9qvtoa1nm (HEAD -> main)
Author: timsehn <tim@dolthub.com>
Date: Mon Oct 13 11:43:29 -0700 2025
11,348,100 pages imported
dolt log -n 1 4.48s user 28.19s system 82% cpu 39.538 total
```
Most of this time is spent loading storage file indexes into memory.
This slow startup time frustrated Nick, and he wanted a solution. There's not much we can do for `dolt` commands that execute without a running `dolt sql-server`: we just have to load all those indexes. But with a long-running process like `dolt sql-server`, we had some options.
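Enter `mmap`, short for "memory map". `mmap` is a system call that maps a file into a process's address space so its bytes can be read like ordinary memory. The operating system loads pages lazily on first access and keeps them in its page cache, which is shared across processes, so later accesses avoid disk reads. That makes it a good fit for keeping storage file indexes in memory: once the long-running `dolt sql-server` has touched them, subsequent `dolt` CLI invocations can use them without going back to disk.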
Nick implemented the `mmap` option in this Pull Request, which shipped in Dolt release 1.58.1. If your database is in archive format, which is not the default yet, you can enable it in your Dolt config with the command `dolt config --set "mmap_archive_indexes" true`.
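It's not on by default because it has some downsides, most notably:
- `mmap` doesn't play well with Go's runtime scheduler, potentially causing performance issues. A page fault on mapped memory blocks the OS thread running a goroutine without the scheduler being told, unlike a blocking system call.
- The code is different on *nix and Windows systems, adding complexity.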
The change only really affects performance of the `dolt` command line for large databases that also have a running `dolt sql-server` and are in archive format. That's a lot of ifs. So, given the risk/reward trade-off, hiding the feature behind a configuration flag is the right choice for now.
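Those conditions can be sketched as an opt-in gate. This is a hypothetical helper, not Dolt's implementation: only the `mmap_archive_indexes` key name comes from this article, while the string-map config and the `isArchive` input are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// useMmapForIndexes is a hypothetical gate deciding whether a storage file's
// index should be memory-mapped. Only the key name comes from the article;
// the string-map config and the isArchive input are illustrative.
func useMmapForIndexes(config map[string]string, isArchive bool) bool {
	if !isArchive {
		// Only archive-format files take the mmap path.
		return false
	}
	v, ok := config["mmap_archive_indexes"]
	if !ok {
		return false // off unless explicitly enabled
	}
	on, err := strconv.ParseBool(v)
	return err == nil && on // unparseable values fall back to off
}

func main() {
	cfg := map[string]string{"mmap_archive_indexes": "true"}
	fmt.Println(useMmapForIndexes(cfg, true))                 // true
	fmt.Println(useMmapForIndexes(cfg, false))                // false: not archive format
	fmt.Println(useMmapForIndexes(map[string]string{}, true)) // false: flag unset
}
```

Treating anything other than an explicit true value as off keeps the risky path opt-in, matching the choice to ship it behind a flag.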
Prerequisite
Your database needs to be in archive format. Archive format will be the default format in Dolt 2.0. It saves 30-50% of disk space, so it's good on its own. It also enables a feature where we `mmap` the storage indexes of the archive files. To convert an existing database, run:
```
$ dolt archive
```
Or, if you're starting a new database, turn on automatic garbage collection into the archive format in your `config.yaml`, like so:

```yaml
behavior:
  auto_gc_behavior:
    enable: true
    archive_level: 1
```
You'll want those settings anyway, to help us test for Dolt 2.0, where they will be the default.
Enable `mmap`
After getting your database into archive format, you enable `mmap` using the `mmap_archive_indexes` key in `dolt config`. Set that value to `true` using the following command:

```
dolt config --set "mmap_archive_indexes" true
```
Now, start a `dolt sql-server`, open another shell, and interact with that database using the `dolt` CLI.
Performance
Let's use the aforementioned 1.3 TB Wikipedia import. By the way, I'm still working on the import. It's my white whale.
With `mmap_archive_indexes` off:
```
$ time dolt log -n 1
commit 2ior69uu1m0i9f299sjufag931hsvbuh (HEAD -> main, remotes/origin/main)
Author: timsehn <tim@dolthub.com>
Date: Fri Sep 12 17:05:06 +0000 2025
10,760,600 pages imported
real 0m8.109s
user 0m27.750s
sys 1m3.136s
```
With `mmap_archive_indexes` on:
```
$ time dolt log -n 1
commit 2ior69uu1m0i9f299sjufag931hsvbuh (HEAD -> main, remotes/origin/main)
Author: timsehn <tim@dolthub.com>
Date: Fri Sep 12 17:05:06 +0000 2025
10,760,600 pages imported
real 0m0.529s
user 0m0.404s
sys 0m0.584s
```
That's a roughly 15X speedup, from 8 seconds down to half a second. Pretty impressive. Note that both runs are faster than the roughly 40 seconds the unarchived copy took on my Mac laptop at the top of this article. It took more than 64GB of memory to archive this database, so I tested the `mmap_archive_indexes` setting on a large EC2 instance. No matter what, this setting makes large database interactions much, much faster.
By the way, archiving the 1.3 TB Wikipedia import makes it 821GB. Another big win.
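For reference, here is the arithmetic behind both wins, using the `real` times from the two runs above and treating 1.3 TB as 1,300 GB:

```latex
\text{speedup} = \frac{8.109\,\text{s}}{0.529\,\text{s}} \approx 15.3
\qquad
\text{disk savings} = 1 - \frac{821\,\text{GB}}{1300\,\text{GB}} \approx 37\%
```

The disk savings land inside the 30-50% range quoted for archive format.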
Conclusion
We're testing the `mmap_archive_indexes` setting right now and considering making it the default in Dolt 2.0. In the meantime, the archive format without `mmap` will become the default for new databases very soon. We continue to improve Dolt's storage format transparently, behind the scenes. If you want to help us test `mmap` archive indexes, or you find a bug in Dolt with it enabled, please come by our Discord or cut a GitHub issue.