fmt.Sprintf vs String Concat
String concatenation isn't the most elegant-looking code:
getFieldName := gf.tableName + "+" + gf.ColName
But it's quite a bit faster than fmt.Sprintf, which arguably looks more organized:
getFieldName := fmt.Sprintf("%s+%s", gf.tableName, gf.ColName)
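The gap is easy to measure yourself with testing.Benchmark. This is a minimal sketch, not Dolt's benchmark harness; the helper functions concatName and sprintfName are hypothetical stand-ins for the field-name code above:

```go
package main

import (
	"fmt"
	"testing"
)

// concatName builds the name with the + operator.
func concatName(table, col string) string {
	return table + "." + col
}

// sprintfName builds the same name with fmt.Sprintf.
func sprintfName(table, col string) string {
	return fmt.Sprintf("%s.%s", table, col)
}

func main() {
	concat := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = concatName("mytable", "mycol")
		}
	})
	sprintf := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = sprintfName("mytable", "mycol")
		}
	})
	fmt.Println("concat: ", concat)
	fmt.Println("sprintf:", sprintf)
}
```

Running this with go run prints ns/op for each variant; on our machines the concat version is consistently faster.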
We are not the first to notice this. But we recently got bitten by the common wisdom that string operations are infrequent and small compared to network and IO overhead. For servers in remote data centers, this is certainly true. But changing from fmt.Sprintf to + improves Dolt's broad query performance for a local server by 1-3% (benchmarks below):
goos: darwin
goarch: arm64
pkg: github.com/dolthub/dolt/go/performance/microsysbench
                          │ before.txt   │ after.txt
                          │ sec/op       │ sec/op        vs base
OltpPointSelect-12          22.85µ ± 13%   21.61µ ±  1%  -5.44% (p=0.002 n=6)
OltpJoinScan-12             267.5µ ±  2%   259.2µ ±  1%  -3.13% (p=0.002 n=6)
ProjectionAggregation-12    11.76m ± 14%   11.67m ±  9%       ~ (p=0.818 n=6)
SelectRandomPoints-12       74.46µ ±  1%   71.45µ ±  1%  -4.05% (p=0.002 n=6)
SelectRandomRanges-12      101.69µ ±  1%   98.14µ ±  4%  -3.49% (p=0.015 n=6)
geomean                     222.4µ         214.9µ        -3.39%

                          │ before.txt    │ after.txt
                          │ B/op          │ B/op           vs base
OltpPointSelect-12          19.88Ki ±  0%   19.46Ki ± 0%   -2.13% (p=0.002 n=6)
OltpJoinScan-12             164.6Ki ±  0%   162.8Ki ± 0%   -1.08% (p=0.002 n=6)
ProjectionAggregation-12    2.930Mi ± 14%   2.945Mi ± 8%        ~ (p=0.937 n=6)
SelectRandomPoints-12       68.56Ki ±  0%   67.87Ki ± 0%   -1.01% (p=0.002 n=6)
SelectRandomRanges-12       99.81Ki ±  0%   99.46Ki ± 0%   -0.36% (p=0.002 n=6)
geomean                     146.4Ki         145.2Ki        -0.82%

                          │ before.txt   │ after.txt
                          │ allocs/op    │ allocs/op     vs base
OltpPointSelect-12          443.0  ±  0%   416.0  ± 0%   -6.09% (p=0.002 n=6)
OltpJoinScan-12             6.019k ±  0%   5.903k ± 0%   -1.93% (p=0.002 n=6)
ProjectionAggregation-12    70.30k ± 14%   70.69k ± 8%        ~ (p=0.937 n=6)
SelectRandomPoints-12       1.255k ±  0%   1.211k ± 0%   -3.51% (p=0.002 n=6)
SelectRandomRanges-12       2.375k ±  0%   2.353k ± 0%   -0.93% (p=0.002 n=6)
geomean                     3.544k         3.458k        -2.41%
It is becoming increasingly difficult for Dolt to get broad percentage level wins, especially with 4-line changes:
func (p *GetField) String() string {
- return fmt.Sprintf("%s.%s", p.table, p.name)
+ return p.table + "." + p.name
}
So this perf improvement matters. We will talk a little about how fmt.Sprintf works in this blog. Moving forward we will use string concatenation when possible and keep an eye out for formatting hot paths.
How fmt.Sprintf Works
What does fmt.Sprintf do? The function takes a format string and a series of arguments boxed as type any. A simplified pseudocode Sprintf might look like this:
type formatter struct {
	buf   []byte
	lastI int
	argI  int
}

func (f *formatter) Sprintf(format string, args ...any) string {
	for i := 0; i < len(format); i++ {
		if format[i] != '%' {
			continue
		}
		// flush the literal text since the last format symbol
		f.buf = append(f.buf, format[f.lastI:i]...)
		i++
		switch format[i] {
		case 's':
			f.fmtString(args[f.argI])
			f.argI++
		case 'd':
			f.fmtInt(args[f.argI])
			f.argI++
		case '%': // percent literal, passthrough
			f.buf = append(f.buf, '%')
		default:
			panic("unknown format sequence")
		}
		f.lastI = i + 1
	}
	// flush the trailing literal text, if any
	f.buf = append(f.buf, format[f.lastI:]...)
	return string(f.buf)
}
We only included %s and %d symbols, omitted more thorough formatting considerations, and avoided pooling formatter objects. But the core loop captures the heart of the process:
- Find the next format symbol
- Write the literal text between the previous format symbol and this one
- Write the next argument based on the current symbol
- Write the trailing literal text if the format string does not end with a format symbol
Besides state tracking, the bulk of logical complexity resides in type-specific formatters:
func (f *formatter) fmtString(a any) {
	switch s := a.(type) {
	case string:
		f.buf = append(f.buf, s...)
	case fmt.Stringer:
		f.buf = append(f.buf, s.String()...)
	default:
		panic(fmt.Sprintf("unable to format %T as string", a))
	}
}

func (f *formatter) fmtInt(a any) {
	switch d := a.(type) {
	case int:
		f.buf = strconv.AppendInt(f.buf, int64(d), 10)
	default:
		panic(fmt.Sprintf("unable to format %T as int", a))
	}
}
Why Is fmt.Sprintf Slow?
The any interface is the first overhead. Boxing an argument adds an 8-byte type-descriptor word alongside the data pointer (and often a heap allocation for the value) to accommodate any data type that we would like to print, and fmt automatically uses the Stringer interface to bridge the gap between structural types and format symbols.
The second problem is that we're allocating objects to track formatting state and intermediate buffers for writing new strings. We've committed to making a new string, and these allocations are table stakes.
The third issue is the bulk and flexibility of code required for full correctness. The standard library code for this is several hundred lines and highly branched.
Why Is String Concat Fast?
A + operator in Go is internally converted to an accumulate/reduce operation. All arguments are collected onto a stack and passed to the runtime.concatstrings function. concatstrings counts the total bytes in the result, and uses the count to determine whether the result fits into a stack-allocated [32]byte buffer or whether we need to allocate a heap buffer.
The core loop looks like:
s, b := rawstringtmp(buf, strLength)
for _, x := range args {
copy(b, x)
b = b[len(x):]
}
return s
The way Dolt uses string concat almost always hits the fast path. Our string list fits in registers and the output fits within the default byte buffer. The bottleneck operation for *GetField.String() in some cases might be loading the heap-resident *GetField.
Final Notes
I've spent several years doing SQL compiler transformations to simplify expression execution, and I am curious whether the Go compiler could check format strings for only %s symbols and replace fmt.Sprintf with runtime.concatstrings. I do not understand the Go intermediate representations well enough to answer this question, but I will continue researching!
In the meantime we will keep trying to make Dolt faster.
If you have any questions about Dolt, databases, or Golang performance reach out to us on Twitter, Discord, and GitHub!