Mind The convT

SQL
4 min read

We're using Go to write Dolt, the world's first version-controlled SQL database. This blog discusses how we sped up our table scans by 24% by avoiding interfaces at critical junctions.

Interfaces Are Handy

They modularize code along lines of common behavior. The Pet and Person types below both implement String() methods, and can therefore be assigned to variables of type Stringer:

Type Stringer interface {
  String() string
}

Type Person struct {
  name string
}

Func (p Person) String() {
  Return "hi my name is " + p.name
}

Type Pet struct {
  name string
}

Func (p Pet) String() {
  Return "hi my name is " + p.name
}

Fullfillment of those behaviors can be enforced at compile-time. The lines below will fail to compile if Person or Pet do not implement Stringer:

Var _ Stringer = Person{}
Var _ Stringer = Pet{}

Behaviors as APIs define boundaries between logical layers. The Catalog class below has different implementations between our memory and dolt storage layers, but the engine logic is oblivious to how the integrator decides to satisfy the behavior.

type Catalog interface {
	DatabaseProvider
	FunctionProvider
	TableFunctionProvider
	ExternalStoredProcedureProvider
	StatsProvider

	CreateDatabase(ctx *Context, dbName string, collation CollationID) error
	RemoveDatabase(ctx *Context, dbName string) error
	Table(ctx *Context, dbName, tableName string) (Table, Database, error)
	DatabaseTable(ctx *Context, db Database, tableName string) (Table, Database, error)
	TableAsOf(ctx *Context, dbName, tableName string, asOf interface{}) (Table, Database, error)
	DatabaseTableAsOf(ctx *Context, db Database, tableName string, asOf interface{}) (Table, Database, error)
}

Interfaces help aggregate higher order logic based on behavior. The function below tries to print the most verbose string behavior an object supports, or falls back to the built-in value formatter:

func DebugString(a any) string {
  Switch a := a.(Type) {
  Case DebugStringer:
    Return a.DebugString()
  Case Stringer:
     Return a.String()
  default:
     Return fmt.Sprintf("%v", a)
  }
}

They can even help us work around import cycles. The pattern below exposes a method in its place of use when defining it in a more standard location would otherwise cause a conflict. While not ideal, poking interface holes can be more practical that spending days shifting dependency cycles around:

type BehaviorInterface interface {
  Behavior()
}

Func doThing(a any) error {
  B, ok := a.(BehaviorInterface)
  If !ok {
    Return fmt.Errorf("expected type %T to implement BehaviorInterface", a)
  }
  b.Behavior()
  Return
}

But: a thing interfaces do not do is make code faster. Changing our memory->wire conversions to use concrete rather than interface types improved our tablescan performance by 24%.

Interface Benchmarks

Here is a quick comparison between an Itoa wrapper and a version that returns an interface (playground):

var res_ string

//go:noinline
func f1(i int) string {
	return strconv.Itoa(i)
}

//go:noinline
func f2(i int) interface{} {
	return strconv.Itoa(i)
}


func BenchmarkConcrete(b *testing.B) {
	var res string
	for i := 0; i < b.N; i++ {
		res = f1(i)
	}
	res_ = res
}

func BenchmarkInterface(b *testing.B) {
	var res string
	for i := 0; i < b.N; i++ {
		res = f2(i).(string)
	}
	res_ = res
}

The interface version is twice as slow for the same reason it is flexible, the return value is abstracted into a heap-allocated interface (pointer + type identifier):

goos: darwin
goarch: arm64
pkg: github.com/dolthub/go-mysql-server/sql/types
cpu: Apple M3 Pro
BenchmarkInterface
BenchmarkInterface-12    	26910858	        45.84 ns/op	      23 B/op	       2 allocs/op
BenchmarkConcrete
BenchmarkConcrete-12     	52958406	        23.76 ns/op	       7 B/op	       0 allocs/op

intf-return

Tablescans

Converting an integer to a string seems like a trivial example, but that's what tablescans actually do in production. Here is a profile for one of our sysbench tablescans:

types-scan-prof

The query pulls values off disk (right half above) just to push them to the wire (left half). Our liberal use of interfaces mean that we spent ~12% of CPU time running convT variations:

scan-convt

Converting a number from the in-memory representation to a byte string looks something like this:

func (t NumberTypeImpl_) SQL(ctx *sql.Context, v interface{}) (sqltypes.Value, error) {
	if vt, _, err := t.Convert(v); err == nil {
		switch t.baseType {
		case sqltypes.Int8, sqltypes.Int16, sqltypes.Int24, sqltypes.Int32, sqltypes.Int64:
			dest = strconv.AppendInt(nil, mustInt64(vt), 10)
...

We take advantage of the sql.Type interface's Convert method to funnel all values towards first (1) a range-appropriate integer type, and then (2) an explicit int64. The input can be anything -- strings, floats, and json strings will all be appropriately cast. For all of the reasons listed earlier, the interface organization and behavioral patterns helped us quickly reach 100% MySQL correctness sooner.

But Convert returns an interface, which as we saw is particularly expensive at the memory->wire SQL conversion. We can benchmark types.IntType.SQL the same way as our toy example above. Both convert a number into a byte string.

func BenchmarkNumI64SQL(b *testing.B) {
	var res sqltypes.Value
	t := Int64
	ctx := sql.NewEmptyContext()
	for i := 0; i < b.N; i++ {
		res, _ = t.SQL(ctx, nil, i)
	}
	result_ = res
}

i64-before

We can reformat the conversion to avoid runtime.convT64 by rearranging the operation order. We first convert to an int64, and afterwards correct for type-specific range issues:

func (t NumberTypeImpl_) SQLUint8(ctx *sql.Context, v interface{}) ([]byte, error) {
	num, _, err := convertToUint64(t, v)
	if err != nil {
		return nil, err
	}
	if num > math.MaxUint8 {
		num = uint64(math.MaxUint8)
	}
	dest = strconv.AppendUint(nil, num, 10)

	return dest, nil
    }

Now all calls return concrete interfaces:

i64-after

Here is a side-by-side comparison of time and memory saved by skipping the interface pointer malloc.

goos: darwin
goarch: arm64
pkg: github.com/dolthub/go-mysql-server/sql/types
cpu: Apple M3 Pro
BenchmarkNumI64SQLBefore-12    	18769036	        62.69 ns/op	      24 B/op	       2 allocs/op
BenchmarkNumI64SQLAfter-12    	23965848	        50.62 ns/op	      16 B/op	       1 allocs/op

This might seem like a small change, but skipping just half of our source of convT's improved sysbench tablescans by 24% (including saved GC time) measured by queries/second. The other half of convTs occur at the storage->key value memory layer. Those require more work to skip, but will probably have the same impact.

Summary

Interfaces are a distinguishing feature that elevates Golang's code organization. But their ease of implementation can interfere with peformance. Rewriting certain key bottlenecks improves Dolt's tablescan performance by 24%. If the second half of optimizations has the same impact, eliding convT's will improve tablescan performance by almost half!

If you have any questions about Dolt, databases, or Golang performance reach out to us on Twitter, Discord, and GitHub!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.