How to Fix a Bug in Dolt's SQL Engine
Dolt is an open-source SQL database that has Git-like functionality, including branch, merge, clone, push and pull. As we attract more and more users with various use cases and ways of integrating Dolt into their existing workflows and systems, it's not rare for Dolt and its SQL engine to need a little bit of work to support a new client library or upstream dependency. This blog post is a short tutorial about how to make and test changes in Dolt and its SQL engine, and how to submit those changes for inclusion in Dolt itself.
Prerequisites
Before we get started, you should know that:
-
Dolt is largely written in golang and you should install the golang sdk to work with it.
-
Dolt's SCM is git and you will need to download and install it to fetch the source code and create pull requests.
-
Dolt's source code is hosted on github and you will want to create an account and follow the instructions for authenticating to GitHub with SSH if you don't already have one.
An Example Code Change
For the purposes of this blog post, we're going to work through fixing and testing a recent bug that was filed against Dolt. In the bug report, a user is using RMariaDB, a database connector library for R, and is seeing some values of certain timestamps fail to insert. The server is returning an error:
time = data.frame(time = as.POSIXct("2014-01-01 19:00:00"))
try(dbWriteTable(dolt_conn, "time", time, overwrite = TRUE))
#> Error : value "2014-1-2" can't be converted to time.Time [1105]
Let's start by checking out the relevant projects, building dolt, and seeing if we can reproduce the error based on the error message and the bug report.
Getting the Source Code
Dolt's primary repository is at dolthub/dolt on GitHub. The majority of the logic for SQL expression handling lives in a separate project, dolthub/go-mysql-server. We're going to need to work on both of these projects. Our first step is to fork the projects so that we have our own fork to push to and submit PRs from. On both repository pages on GitHub, while signed in, click "Fork" in the upper right corner of the repository page. You will get your own copy of the repository:
Then we clone those forks to a workspace on our computer:
~ $ mkdir dolt_workspace
~/dolt_workspace $ cd dolt_workspace
~/dolt_workspace $ git clone git@github.com:reltuk/dolt.git
~/dolt_workspace $ git clone git@github.com:reltuk/go-mysql-server.git
By default, the dolt repository builds and links against a pinned version of
go-mysql-server, so to develop the two together we need to point our checkout
to our local go-mysql-server. The go module root for dolt lives in //go
,
while the go module root for go-mysql-server lives in //
. We will also take
the chance to create a local git branch for our changes:
~/dolt_workspace $ cd go-mysql-server
~/dolt_workspace/go-mysql-server $ git checkout -b aaron/fix-2088-time-convert-bug
~/dolt_workspace/go-mysql-server $ cd ../dolt/go
~/dolt_workspace/dolt/go $ git checkout -b aaron/fix-2088-time-convert-bug
~/dolt_workspace/dolt/go $ go mod edit -replace github.com/dolthub/go-mysql-server=../../go-mysql-server
It's reasonable at this point to check that everything is setup and working as expected by running dolt's unit tests (some of these packages unfortunately take a bit of time for the tests to run, and this can take a few minutes):
~/dolt_workspace/dolt/go $ go test ./...
? github.com/dolthub/dolt/go/cmd/dolt [no test files]
ok github.com/dolthub/dolt/go/cmd/dolt/cli 0.843s
ok github.com/dolthub/dolt/go/cmd/dolt/commands 2.547s
? github.com/dolthub/dolt/go/cmd/dolt/commands/cnfcmds [no test files]
? github.com/dolthub/dolt/go/cmd/dolt/commands/credcmds [no test files]
? github.com/dolthub/dolt/go/cmd/dolt/commands/cvcmds [no test files]
? github.com/dolthub/dolt/go/cmd/dolt/commands/indexcmds [no test files]
? github.com/dolthub/dolt/go/cmd/dolt/commands/schcmds [no test files]
ok github.com/dolthub/dolt/go/cmd/dolt/commands/sqlserver 1.198s
? github.com/dolthub/dolt/go/cmd/dolt/commands/tblcmds [no test files]
ok github.com/dolthub/dolt/go/cmd/dolt/errhand 0.300s
? github.com/dolthub/dolt/go/cmd/git-dolt [no test files]
? github.com/dolthub/dolt/go/cmd/git-dolt/commands [no test files]
ok github.com/dolthub/dolt/go/cmd/git-dolt/config 0.206s
? github.com/dolthub/dolt/go/cmd/git-dolt/doltops [no test files]
? github.com/dolthub/dolt/go/cmd/git-dolt/env [no test files]
ok github.com/dolthub/dolt/go/cmd/git-dolt/utils 0.311s
? github.com/dolthub/dolt/go/cmd/git-dolt-smudge [no test files]
.
.
.
? github.com/dolthub/dolt/go/store/util/outputpager [no test files]
? github.com/dolthub/dolt/go/store/util/profile [no test files]
? github.com/dolthub/dolt/go/store/util/progressreader [no test files]
ok github.com/dolthub/dolt/go/store/util/random 0.294s
ok github.com/dolthub/dolt/go/store/util/sizecache 0.244s
? github.com/dolthub/dolt/go/store/util/status [no test files]
? github.com/dolthub/dolt/go/store/util/tempfiles [no test files]
? github.com/dolthub/dolt/go/store/util/test [no test files]
ok github.com/dolthub/dolt/go/store/util/verbose 0.312s
ok github.com/dolthub/dolt/go/store/util/writers 0.292s
ok github.com/dolthub/dolt/go/store/valuefile 0.471s
? github.com/dolthub/dolt/go/utils/3pdeps [no test files]
? github.com/dolthub/dolt/go/utils/copyrightshdrs [no test files]
? github.com/dolthub/dolt/go/utils/remotesrv [no test files]
Finally, let's install dolt from source and see if we can make progress on
reproducing the bug. Here, we just attempt to insert the same string that
appears in the error message into a timestamp
column:
~/dolt_workspace/dolt/go $ go install ./cmd/dolt
~/dolt_workspace/dolt/go $ cd ../../
~/dolt_workspace $ mkdir test_db
~/dolt_workspace $ cd test_db
~/dolt_workspace/test_db $ dolt init
Successfully initialized dolt data repository.
~/dolt_workspace/test_db $ dolt sql
# Welcome to the DoltSQL shell.
# Statements must be terminated with ';'.
# "exit" or "quit" (or Ctrl-D) to exit.
test_db> create table has_timestamp (t timestamp);
test_db> insert into has_timestamp values ('2014-1-2');
value "2014-1-2" can't be converted to time.Time
Brilliant, we've managed to reproduce the bug. Let's track down where that error is coming from.
Tracking Down the Error
Hopefully this isn't too hard:
$ ~/dolt_workspace $ grep -R "can't be converted to time.Time"
aaronson@Aarons-MacBook-Pro dolt_workspace % grep -R "can\'t be converted to time.Time" .
./go-mysql-server/benchmark/metadata.go: // TODO: value "1996-01-02" can't be converted to time.Time
./go-mysql-server/benchmark/metadata.go: // TODO: value "1996-03-13" can't be converted to time.Time
./go-mysql-server/benchmark/metadata.go: // TODO: value "1996-03-13" can't be converted to time.Time
./go-mysql-server/benchmark/metadata.go: // TODO: value "1996-03-13" can't be converted to time.Time
./go-mysql-server/sql/datetimetype.go: ErrConvertingToTime = errors.NewKind("value %q can't be converted to time.Time")
so we can open go-mysql-server/sql/datetimetype.go
in an editor and look for
uses of ErrConvertingToTime
. We quickly find the only place it is returned:
func (t datetimeType) ConvertWithoutRangeCheck(v interface{}) (time.Time, error) {
var res time.Time
switch value := v.(type) {
case string:
if value == zeroDateStr || value == zeroTimestampDatetimeStr {
return zeroTime, nil
}
parsed := false
for _, fmt := range TimestampDatetimeLayouts {
if t, err := time.Parse(fmt, value); err == nil {
res = t.UTC()
parsed = true
break
}
}
if !parsed {
return zeroTime, ErrConvertingToTime.New(v)
}
and going to the definition of TimestampDatetimeLayouts
, we find:
// TimestampDatetimeLayouts hold extra timestamps allowed for parsing. It does
// not have all the layouts supported by mysql. Missing are two digit year
// versions of common cases and dates that use non common separators.
//
// https://github.com/MariaDB/server/blob/mysql-5.5.36/sql-common/my_time.c#L124
TimestampDatetimeLayouts = []string{
"2006-01-02 15:04:05.999999",
"2006-01-02",
"2006-1-2 15:4:5.999999",
time.RFC3339,
time.RFC3339Nano,
"2006-01-02T15:04:05",
"20060102150405",
"20060102",
"2006/01/02",
"2006-01-02 15:04:05.999999999 -0700 MST", // represents standard Time.time.UTC()
}
Notably present is a "2006-1-2 15:4:5.999999"
, which looks like it might
handle single-digit months and days in a date that comes with a timestamp. But
there is no corresponding "2006-1-2"
in that list, which would possibly
handle the case we're trying to fix. Let's start by looking for a unit test we
could write that would demonstrate the bug in this ConvertWithoutRangeCheck
function.
We open up datetimetype_test.go
, and look through the top level functions:
func TestDatetimeCompare(t *testing.T) {
func TestDatetimeCreate(t *testing.T) {
func TestDatetimeCreateInvalidBaseTypes(t *testing.T) {
func TestDatetimeConvert(t *testing.T) {
func TestDatetimeString(t *testing.T) {
for our purposes, TestDatetimeConvert
looks the most promising, and digging
into the implementation, it's a table driven
test with a pretty
straight forward structure of giving an input to Convert
and asserting a
matching output and error value. We can just add tests for each of the three
handled time types:
diff --git a/sql/datetimetype_test.go b/sql/datetimetype_test.go
index d9551262..a3d385a9 100644
--- a/sql/datetimetype_test.go
+++ b/sql/datetimetype_test.go
@@ -150,6 +150,9 @@ func TestDatetimeConvert(t *testing.T) {
{Date, time.Date(2012, 12, 12, 12, 12, 12, 12, time.UTC),
time.Date(2012, 12, 12, 0, 0, 0, 0, time.UTC), false},
{Date, "2010-06-03", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
+ {Date, "2010-6-3", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
+ {Date, "2010-6-03", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
+ {Date, "2010-06-3", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
{Date, "2010-06-03 12:12:12", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
{Date, "2010-06-03 12:12:12.000012", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
{Date, "2010-06-03T12:12:12Z", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
@@ -161,6 +164,9 @@ func TestDatetimeConvert(t *testing.T) {
{Datetime, time.Date(2012, 12, 12, 12, 12, 12, 12, time.UTC),
time.Date(2012, 12, 12, 12, 12, 12, 12, time.UTC), false},
{Datetime, "2010-06-03", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
+ {Datetime, "2010-6-3", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
+ {Datetime, "2010-06-3", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
+ {Datetime, "2010-6-03", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
{Datetime, "2010-06-03 12:12:12", time.Date(2010, 6, 3, 12, 12, 12, 0, time.UTC), false},
{Datetime, "2010-06-03 12:12:12.000012", time.Date(2010, 6, 3, 12, 12, 12, 12000, time.UTC), false},
{Datetime, "2010-06-03T12:12:12Z", time.Date(2010, 6, 3, 12, 12, 12, 0, time.UTC), false},
@@ -177,6 +183,9 @@ func TestDatetimeConvert(t *testing.T) {
{Timestamp, time.Date(2012, 12, 12, 12, 12, 12, 12, time.UTC),
time.Date(2012, 12, 12, 12, 12, 12, 12, time.UTC), false},
{Timestamp, "2010-06-03", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
+ {Timestamp, "2010-6-3", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
+ {Timestamp, "2010-6-03", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
+ {Timestamp, "2010-06-3", time.Date(2010, 6, 3, 0, 0, 0, 0, time.UTC), false},
{Timestamp, "2010-06-03 12:12:12", time.Date(2010, 6, 3, 12, 12, 12, 0, time.UTC), false},
{Timestamp, "2010-06-03 12:12:12.000012", time.Date(2010, 6, 3, 12, 12, 12, 12000, time.UTC), false},
{Timestamp, "2010-06-03T12:12:12Z", time.Date(2010, 6, 3, 12, 12, 12, 0, time.UTC), false},
and then try running the tests:
dolt_workspace/go-mysql-server/sql $ go test .
aaronson@Aarons-MacBook-Pro sql % go test . -count 1
--- FAIL: TestDatetimeConvert (0.00s)
--- FAIL: TestDatetimeConvert/DATE_2010-6-3_2010-06-03_00:00:00_+0000_UTC (0.00s)
datetimetype_test.go:255:
Error Trace: datetimetype_test.go:255
Error: Received unexpected error:
value "2010-6-3" can't be converted to time.Time
...
--- FAIL: TestDatetimeConvert/DATE_2010-6-03_2010-06-03_00:00:00_+0000_UTC (0.00s)
datetimetype_test.go:255:
Error Trace: datetimetype_test.go:255
Error: Received unexpected error:
value "2010-6-03" can't be converted to time.Time
...
...
FAIL
FAIL github.com/dolthub/go-mysql-server/sql 0.181s
FAIL
Perfect. Let's go ahead and see if we can fix it by adding the new pattern in
TimestampDatetimeLayouts
.
diff --git a/sql/datetimetype.go b/sql/datetimetype.go
index 2f00e324..f50b82af 100644
--- a/sql/datetimetype.go
+++ b/sql/datetimetype.go
@@ -62,6 +62,7 @@ var (
TimestampDatetimeLayouts = []string{
"2006-01-02 15:04:05.999999",
"2006-01-02",
+ "2006-1-2",
"2006-1-2 15:4:5.999999",
time.RFC3339,
time.RFC3339Nano,
and test again:
dolt_workspace/go-mysql-server/sql $ go test .
ok github.com/dolthub/go-mysql-server/sql 0.327s
Seems to work. So now we have a unit test demonstrating the problem and we've
made that test pass. There's one other type of test we can add if we want to.
The original bug was reported in the context of an insert, and it would be
great to codify the fact that INSERT INTO table_with_datetime (timestamp) VALUES ("2006-6-3")
should always work. For that kind of test,
go-mysql-server
has a directory of integration tests that lives at
//enginetest
. If we look around a little, we will find a file called
insert_queries.go
and we can look through that file for places where we're
doing integration tests of inserts into datetime columns. Eventually, we might
come with something like:
diff --git a/enginetest/insert_queries.go b/enginetest/insert_queries.go
index 5b88fad5..a4daf1f3 100644
--- a/enginetest/insert_queries.go
+++ b/enginetest/insert_queries.go
@@ -190,6 +190,12 @@ var InsertQueries = []WriteQueryTest{
SelectQuery: "SELECT * FROM typestable WHERE id = 999;",
ExpectedSelect: []sql.Row{{int64(999), nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil}},
},
+ {
+ WriteQuery: `INSERT INTO typestable (id, ti, da) VALUES (999, '2021-09-1', '2021-9-01');`,
+ ExpectedWriteResult: []sql.Row{{sql.NewOkResult(1)}},
+ SelectQuery: "SELECT id, ti, da FROM typestable WHERE id = 999;",
+ ExpectedSelect: []sql.Row{{int64(999), sql.MustConvert(sql.Timestamp.Convert("2021-09-01")), sql.MustConvert(sql.Date.Convert("2021-09-01"))}},
+ },
{
WriteQuery: `INSERT INTO typestable SET id=999, i8=null, i16=null, i32=null, i64=null, u8=null, u16=null, u32=null, u64=null,
f32=null, f64=null, ti=null, da=null, te=null, bo=null, js=null, bl=null;`,
which hopefully captures the spirit of the behavior we want to verify. We can also verify that this fixes the behavior all the way at the dolt layer:
~/dolt_workspace/dolt/go $ go install ./cmd/dolt
~/dolt_workspace/dolt/go $ cd ../../test_db
~/dolt_workspace/test_db $ dolt sql
# Welcome to the DoltSQL shell.
# Statements must be terminated with ';'.
# "exit" or "quit" (or Ctrl-D) to exit.
test_db> insert into has_timestamp values ('2014-1-2');
Query OK, 1 row affected
Committing and Opening a Pull Request
At this point, the bug is fixed in Dolt's SQL engine. We need to submit a pull
request to the upstream project (dolt/go-mysql-server
) so that the fix can
land in Dolt's pinned version of the dependency. Let's go ahead and do that.
~/dolt_workspace/go-mysql-server $ git commit -am 'Fix string to datetime convert of YYYY-M-D.'
[aaron/fix-2088-time-convert-bug 81dce573] Fix string to datetime convert of YYYY-M-D.
3 files changed, 16 insertions(+)
~/dolt_workspace/go-mysql-server $ git push origin aaron/fix-2088-time-convert-bug:aaron/fix-2088-time-convert-bug
Enumerating objects: 13, done.
Counting objects: 100% (13/13), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (7/7), 803 bytes | 803.00 KiB/s, done.
Total 7 (delta 6), reused 4 (delta 4)
remote: Resolving deltas: 100% (6/6), completed with 6 local objects.
remote:
remote: Create a pull request for 'aaron/fix-2088-time-convert-bug' on GitHub by visiting:
remote: https://github.com/reltuk/go-mysql-server/pull/new/aaron/fix-2088-time-convert-bug
remote:
To github.com:reltuk/go-mysql-server.git
* [new branch] aaron/fix-2088-time-convert-bug -> aaron/fix-2088-time-convert-bug
I can open the provided link in my browser
(https://github.com/reltuk/go-mysql-server/pull/new/aaron/fix-2088-time-convert-bug)
and open the PR by clicking the Create pull request
button:
Submitting the PR will get some automated tests run against the branch and will
notify the project maintainers that someone has some changes they would like to
land in go-mysql-server
. If there are failures in the automated tests, the
maintainers will let you know what steps you need to take to fix things up.
Otherwise, the maintainers will provide feedback, or, if everything looks good,
they will merge the PR.
Getting the Changes into Dolt
Once a change lands in go-mysql-server
, it will make its way Dolt's main
branch through automation. A PR will be opened against dolthub/dolt
to bump
the dependency, and automated tests will be run against the dolt
repository.
Someone from the dolt
team will merge the PR (or a subsequent one with
equivalent effect). However, if you want to add automated tests to dolt itself
that assert the behavior you have fixed in go-mysql-server
, please feel free
to submit a separate PR for that. For this particular bug fix, we did add bats
tests at the dolt
layer
to assert the behavior we want. It's also worth noting that the enginetest
integration tests from go-mysql-server
also get run against Dolt as the
storage engine by dolt's unit test setup as well.
Conclusion
Contributing to a new open source project can have a high barrier to entry, including discovering toolchains and developer tools dependencies, code organization, build and test processes, etc. The go ecosystem is quite homogenous in this regard and that makes it easier to jump into Dolt and go-mysql-server than equivalently sized projects in other language ecosystems might be. But there are still specific dependencies that need to be in place, subtleties of code organization to take into account and specific workflows that need to be known about and worked within. Hopefully this short overview has given a good concrete example of how to make progress on improving and testing Dolt's SQL engine. Some major takeaways:
-
Dolt depends on go-mysql-server and fixing SQL engine bugs in particular will often involve changing go-mysql-server, which lives in a separate Git repository. You can make good iterative progress by checking out and branching both repositories and adding a local dependency override to
dolt/go/go.mod
with thego mod edit ...
invocation used above. -
go-mysql-server
has two major types of tests: unit tests that live next to the implementation of the expression, function, analysis phase, plan nodes, etc., and engine tests, which live in//enginetest
. Engine tests function more like high level integration and functional tests. Many changes to go-mysql-server will merit adding some of both types of tests. -
Once a
go-mysql-server
PR is accepted and merged, it will make its way todolt
's master branch automatically. For many types of improvements to the SQL support in Dolt, it may not be necessary to make any changes to the Dolt repository itself. If you do need to or want to make changes to the Dolt repository itself, the procedure looks largely the same, just with opening a PR against the Dolt repository.
We hope this blog post can help anyone looking to navigate and contribute to the Dolt code base more effectively in the future.
If you're looking to jump into the Dolt or go-mysql-server code bases, we use
the tag good first issue
on our github issues to tag things that are probably
near at hand, unambiguous, and easy to test and review. Feel free to check out
the current list of issues in
go-mysql-server
or in
dolt.