Dolt Supports Cached Prepared Statements
Last year we announced prepared statement
support,
a simple form of query parameterization. True prepared statements save
plans to be executed later, often multiple times. But the original work
only progressed as far as satisfying the handler API. The ComStmtPrepare
command
discarded plans that had to be recompiled for each ComStmtExecute
call.
Today we are excited to reintroduce prepared statements, now with server side caching! This addition is available in the latest version of Dolt, the first application database that can diff, merge, and revert with Git versioning semantics.
Prepared statements let a SQL client peek into the server's machinery and choose how and when to compile a query plan. A client that frontloads this work benefits from query caching and direct query parameterization. Locking in a query plan before considering user input can also protect against SQL injection, one of the many reasons object relational mappers (ORM's) make heavy use of prepared statements. All of the code referenced in this blog is freely available on GitHub. Join us as we walkthrough the nuances of this advanced feature!
How Do Prepared Statements Work
Prepared statements are an advanced performance and maintainability optimization for application databases.
Starting simple, consider a query template invoked several times for an import workload:
db.MustExec("INSERT INTO objects (label, bbox) VALUES ('cat', [1,2,3,4])")
db.MustExec("INSERT INTO objects (label, bbox) VALUES ('rabbit', [1,2,3,4])")
db.MustExec("INSERT INTO objects (label, bbox) VALUES ('dog', [1,2,3,4])")
> INSERT INTO objects (label, bbox) VALUES ('cat', [1,2,3,4]);
> INSERT INTO objects (label, bbox) VALUES ('rabbit', [1,2,3,4]);
> INSERT INTO objects (label, bbox) VALUES ('dog', [1,2,3,4]);
For each query above, we compile and then execute the same template three times with three sets of parameters. One way of improving this setup might format the parameters into a string:
var batch []struct{
label string
bbox string
} {...}
b := strings.Builder{}
b.WriteString("INSERT INTO objects (label, bbox) VALUES ")
sep := ""
for _, r := range batch {
b.WriteString(fmt.Sprintf("%s(%s, %s)", sep, r.label, r.bbox))
sep = ", "
}
db.MustExec(b.String())
> INSERT INTO objects (label, bbox) VALUES ('cat', [1,2,3,4]), ('rabbit', [1,2,3,4]), ('dog', [1,2,3,4]);
This is more efficient and flexible because we can parameterize a list of values. But it is also unwieldy. You have to stare at the code for awhile to understand what is happening. String substitution is also subject to SQL-injections, and the query still has to compile every time we want to insert data.
Prepared statements compile the query once, followed by executions with user specific parameters called "bindvars":
stmt, err := db.Preparex(`INSERT INTO objects (label, bbox) VALUES (?, ?)`)
for _, r := range batch {
stmt.MustExec(r.label, r.bbox)
}
> INSERT INTO objects (label, bbox) VALUES ('cat', [1,2,3,4]);
> INSERT INTO objects (label, bbox) VALUES ('rabbit', [1,2,3,4]);
> INSERT INTO objects (label, bbox) VALUES ('dog', [1,2,3,4]);
For the visually inclined, the diagram below shows how regular statements pass through the entire analyzer for each query, while prepared statements cache a partial plan.
Prepared statements offer an interface tradeoff. We push control to the SQL Server in exchange for a new backend contract for a prepared query. Whether this contract is a good or bad thing comes down to personal preference. Prepared statements sometimes require more code and more asynchronous thinking. Often a separation of concerns and repeat parameterization simplifies the codebase and query lifecycle.
Why Prepareds
ORMs as a domain favor prepared statements. The ORM that I used for the above examples, sqlx, translates between Go native structs and database rows for simple CRUD queries. But with prepareds, ORMs provide value beyond simply hiding cursor iteration and formatting result sets.
SQL Injection
Prepared statements prevent SQL injection attacks. A SQL injection
happens when a porous backend application parameterizes
arbitrary text. For example, a web server might string-format the
query SELECT * from objects where id=%s
based on a browser provided
parameter. A malicious user navigating to
www.myapp.com/user?uid='1;drop table users'
might delete all of your data! Prepared statements enforce a query plan
that can only substitute literal bindvars at execution time, preventing
arbitrary execution logic.
Performance
The first version of Dolt prepared statements caches query plans, but yields mixed performance wins. The benchmarks below compare insert speed with and without prepared statements:
var Result interface{}
func BenchmarkPrepStmt(b *testing.B) {
db := newdb()
stmt, err := db.Preparex(`INSERT INTO objects (label, bbox) VALUES (?, ?)`)
if err != nil {
b.Fatal(err)
}
var res interface{}
for n := 0; n < b.N; n++ {
res = stmt.MustExec("label", "box")
}
Result = res
}
func BenchmarkExec(b *testing.B) {
db := newdb()
var res interface{}
for n := 0; n < b.N; n++ {
res = db.MustExec(`INSERT INTO objects (label, bbox) VALUES ('label', 'bbox')`)
}
Result = res
}
The three result columns below correspond to (1) test name, (2)
iteration count, and (3) average execution time in
nanoseconds per operation. The third value is the most useful,
indicating the time it took to roundtrip a client MustExec
to the
server and back again as spooled results:
~/g/s/g/m/prep-stmt-tutorial > go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/dolthub/prep-stmt-tutorial
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkPrepStmt-12 378 3302280 ns/op
BenchmarkExec-12 374 3323406 ns/op
PASS
The execution time is equivalent, which is unfortunate given how simple and common the query is.
We can up the ante with a beefier query. Joins and subqueries take longer to compile, and caching those should should yield more impressive results:
query := `
insert into objects (label, bbox)
select a.label, a.bbox
from (select ?, ?) as a(label, bbox)
join other
on a.label = other.x;`
stmt, err := db.Preparex(query)
Sure enough, the newer query is about 35% faster when prepared:
~/g/s/g/m/prep-stmt-tutorial > go test -bench=.
BenchmarkPrepStmt-12 1000 1241423 ns/op
BenchmarkExec-12 650 1683874 ns/op
But we can do better! Ideally, the analyzer tees up an optimized plan that stops short of bindvar substitution. The plan is cached, and every new set of parameters launches a short finalization phase. The work required finalize should be small and proportional to the number of bindvar replacements. Small prepared queries have the potential for similar performance benefits.
Summary
We discussed prepared statement caching, a new Dolt feature that enables query parameterization and prevents SQL injection. Your ORM probably uses prepared statements behind the scenes to clean and cache simple INSERT and SELECT queries.
This feature is a steady improvement since implementing the prepare API last year. We also teased the potential for even juicier performance wins in the future.
If you have any questions about Dolt, databases, or Golang performance reach out to us on Twitter, Discord, and GitHub!