Failing 100 Real World Postgres Dumps
We're continuing to make progress on DoltgreSQL, which is a version of Dolt built to be a drop-in replacement for PostgreSQL. Dolt is a drop-in replacement for MySQL that is built from the ground up with Git-influenced version control features. This means that you can use branch, merge, diff, and more with your data and schema. As we're aiming to be a complete drop-in replacement for Postgres, we want to ensure that we're focusing our efforts on features that are critical to the highest number of users. This has led to downloading over 100 Postgres SQL dumps, and adding them to our test suite!
Why download 100 SQL dumps?
Building a version-controlled database comes with a unique set of challenges, and we've had to tackle many of those while building Dolt. When starting work on Doltgres, we already had a large number of problems solved since we can reuse Dolt's storage engine. However, we had a new challenge: determining what to actually work on first. It's a straightforward question, and the answer is just as straightforward: work on what people actually use.
Being straightforward doesn't mean that it's a useful answer though, as it just creates another question: what are people actually using?
I would assume that not many people are using CREATE OPERATOR FAMILY
; however, that's just an assumption.
Without much information, we decided to work on features that we knew were used heavily in Dolt, but considering Postgres is used quite differently than MySQL, such an approach didn't help for all of the features that Postgres contains and MySQL doesn't.
Eventually, we had a prospective Doltgres user reach out to us, and we asked for them to share their schema.
This showed us that some of our assumptions didn't hold.
For example, we didn't think CREATE TYPE
would be widely used, but it was in the first schema we received, so we promptly got to work on it.
Since then, we've received more schemas, and some of them have "just worked" due to the progress that we've made on Doltgres. We've also added a ton more metrics to our CI to track progression, such as the official Postgres regression tests. However, even with all of this progress being made, we still weren't sure if we were focusing in the right areas. Since we knew that SQL dumps are made by actual users for their actual use cases, we decided to download over 100 public dumps from GitHub to get as close as possible to a real-world compatibility scenario.
Only 3 successes
We wanted these dumps to become a part of our testing framework, so we built a scaffold that allows us to load all of the dumps into Doltgres and report whether it sees any errors. Shockingly, of the >100 dumps added, only 3 reported zero errors. We definitely expected failures, but this was pretty much a failure of every single one. Thankfully, this gave us a lot of data on what to prioritize, as users should at least be able to load their Postgres dumps into Doltgres. Some of these failures were expected, such as support for extensions. My last blog talked about how we're adding support for native Postgres extensions, and this data lets us get a real-world spread of which ones are the most commonly used.
Some of these failures were a bit unexpected, as we had already done the primary implementation work, but ran into syntactical differences.
For example, CREATE SEQUENCE
allows for setting the owning column of a sequence, and we already had this capability in Doltgres.
However, what we didn't have was the ability to change that owner via ALTER SEQUENCE
, which was a very simple task, albeit one we had not done (fixing this added another 6 successes!).
Other errors cause cascading errors, such that a file that reports thousands of errors really only has one.
One great example of that being CREATE DATABASE ... WITH TEMPLATE
, where we don't yet support the TEMPLATE
portion.
This causes the database creation to fail, and therefore every statement after it to also fail since they all rely on the database existing.
We're actively working through these since they're the best litmus test we currently have for real-world compatibility. Some of these issues are quirks of how pg_dump chooses to backup and restore data, and therefore don't quite reflect server usage (it's unlikely that users are commonly adding comments to extensions). However, in our quest to be a drop-in replacement, it is our mission to fix all of these errors that users may encounter.
We accept your dumps!
If you have a Postgres dump that won't load in Doltgres, then you should definitely reach out to us so we can ensure we're prioritizing what's important for you. There are a few different ways to do this.
First, you can file an issue on GitHub, stating the query that is failing and the error message that you're receiving. During the import, it may report an error, but it may not report the query that the error originated from, so this is how you can find that query:
- Stop your running Doltgres server
- Add the line
log_level: debug
to yourconfig.yaml
file (example config) - Restart your Doltgres server and try to import the dump again
- In the Doltgres server output, you should see an "error running query" message, with the error contained within
error="..."
- Take the preceding query (in
query="..."
), alongside the error message, and include those in the GitHub issue.
You can also get in touch with us through Discord, where you could even send us your dump so we can directly ensure that it works in its entirety. As stated earlier, we've had several users do this for us and it has helped both us and them.
Lastly, feel free to follow us on Twitter/X to stay up-to-date on Doltgres and Dolt! The power of a version-controlled database is something that must be experienced to be believed, so you should definitely try these out! As an engineer, I'm continually amazed at the ways that users are able to take advantage of these projects, as they're truly a new paradigm in databases. Thank you for reading my blog post!