How to fix bugs in 24 hours or less
Here at DoltHub, we made a pledge to fix Dolt correctness bugs in 24 hours or less. We're proud of this pledge and we work hard to uphold it. But how is this possible?
Response times to issues in the software industry vary wildly. It's not uncommon to never get a response, let alone a fix for your issue. Not at DoltHub. Find a bug in Dolt and we'll fix it in 24 hours or less. This time includes us releasing a new version. It's kind of DoltHub's superpower. This blog explains how we do it.
I don't believe you...
Here is our GitHub issue queue. Anything that is cut by "not us" gets automatically labeled "Customer Issue". We also manually add that label to issues we cut on behalf of customers, usually after a discussion on our Discord.
Now, let's look at the closed issues. It's a bit tough to see given GitHub's date precision, but here are some recent examples of fast turnaround.
Description: Datetime conversion issue using a case insensitive collation
Bug: https://github.com/dolthub/dolt/issues/7781
Fix: https://github.com/dolthub/go-mysql-server/pull/2482
Release: https://github.com/Homebrew/homebrew-core/pull/170482
Elapsed time: ~1 day
Description: Panic when using a BEFORE INSERT in a trigger
Bug: https://github.com/dolthub/dolt/issues/7720
Fix 1: https://github.com/dolthub/go-mysql-server/pull/2442
Fix 2: https://github.com/dolthub/go-mysql-server/pull/2446
Release: https://github.com/Homebrew/homebrew-core/pull/169481
Elapsed time: ~1 day
As you can see, we're responsive, quickly fix the issue, and cut a Dolt release so the customer can easily get access to the fix.
OK. I believe you. Tell me how.
Identify Quickly
It goes without saying but in order to fix issues quickly, you need to pay attention to incoming issues. At DoltHub, someone owns the bug queue, currently James. This person is responsible for following incoming issues and responding when one comes in.
I back him up and respond if I see an issue first. One of my first tasks every morning is checking the issue queue. So, usually at worst, you're getting eight hours until your first response.
Prioritize
It's crucial to note the difference between a bug and a feature. Our 24-hour pledge applies to bugs, not features. We fix bugs fast. Features don't get the same treatment. Dolt has a well defined set of specifications. If it works in MySQL and not in Dolt, it's a bug. If it works in Git and not in Dolt, it's a bug. This really helps prioritize issues quickly because we don't have ambiguity around "Is this an issue?"
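The triage rule above is simple enough to write down as code. The following is a minimal sketch, not DoltHub's actual tooling: the `Observation` type, its fields, and the `IsBug` function are hypothetical names introduced only for illustration, where "reference" means MySQL for SQL behavior or Git for version control behavior.

```go
package main

import "fmt"

// Observation records how one statement or command behaved in the
// reference system (MySQL or Git) and in Dolt.
// Hypothetical type, for illustration only.
type Observation struct {
	WorksInReference bool // behaves as expected in MySQL or Git
	WorksInDolt      bool // behaves the same way in Dolt
}

// IsBug applies the rule from the text: if it works in the reference
// system and not in Dolt, it is a bug and falls under the 24-hour pledge.
func IsBug(o Observation) bool {
	return o.WorksInReference && !o.WorksInDolt
}

func main() {
	// Works in MySQL, fails in Dolt: a bug.
	fmt.Println(IsBug(Observation{WorksInReference: true, WorksInDolt: false}))
	// Works in both: nothing to fix.
	fmt.Println(IsBug(Observation{WorksInReference: true, WorksInDolt: true}))
}
```

The point of the sketch is that the decision needs only two observations and no debate, which is what makes prioritization fast.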
After we know it's a bug, someone actually has to do the work. We have a bug queue owner. This person is the first line of defense. If a bug comes in, that is their number one priority. If the stack overflows, they have the ability to draft others to the bug fix cause. Very rarely do incoming bugs require more than one software engineer's attention, which brings us to our next point.
Code Quality
Your code has to be good enough that you don't get a large flow of bugs. Our 24-hour pledge is new in 2024. Dolt went 1.0 in May 2023. It took another year or so of improvement before we were in a position to turn bug fixes around quickly and consistently.
Test Quality
When you do fix bugs, the code must not be a "house of cards". Bug fixes can't cause different bugs. Dolt has a battery of automated regression tests to prevent this.
In build, we have:
- Engine Tests - ~42,000 Golang tests to test SQL engine behavior.
- Bats Tests - ~2,500 bash script tests to test the Dolt CLI and server.
- Client tests - ~100 additional tests written in various languages, testing client connectivity to a Dolt server.
- SQL Correctness Tests - Also known as sqllogictest. A suite of ~6M SQL queries verified against what MySQL returns.
Additionally, nightly and on release, we run:
- Performance Tests - Sysbench and TPC-C compared against MySQL to detect performance regressions.
- Fuzzer - Continuously merge randomly generated databases to test merge.
As you can see, Dolt is extremely well tested. As mentioned, behavior is well specified, which lends itself well to testing.
Release Automation
Finally, once we do fix an issue, we want to get it into the customer's hands in an official release. Often, once the change makes it into main, we'll let the customer know that if they want the fix urgently, they can build from source.
That said, for a DoltHub engineer, releasing Dolt is just picking a version number and running a GitHub Action. The build completes in less than an hour. So, often, a customer bug fix triggers a release.
Why it matters
Getting bug fixes into the hands of customers quickly is much appreciated. Customers love it. We get a lot of "Wows". That is probably reason enough.
However, there's a more specific reason to fix issues quickly, related to building an open source SQL database. Dolt is meant to be run at the core of your application stack. Trust is a key deciding factor in adoption. If your database is down, your application is down. Competing products are 30 (MySQL) and 40 (Postgres) years old, respectively. Those years have built hard-won trust. Dolt's version control features make it a compelling alternative for many MySQL and Postgres use cases, but it's only 5 years old, a baby in database years. We must build trust in the technology one customer fix at a time.
Conclusion
Found a bug in Dolt? We'll fix it in 24 hours or less. Please cut an issue. Want to talk about Dolt's promise and how we achieve it? Come by our Discord and we're happy to discuss.