Hacking Go's Runtime with Generics
Today's blog is about maphash, a new open-source package for hashing arbitrary (comparable) Golang types. We'll do a deep-dive on how its implemented as well as some interesting tid-bits on the Golang runtime.
Here at DoltHub, we love Golang! We've been pretty happy with Golang as our primary development language. It has a strong open-source community, an excellent tool chain, and the simplicity of the language itself facilitates a productive developer experience. We're using it to build Dolt, a MySQL-compatible version-controlled database. This blog is the latest in our technical Golang series, check them out if you're looking for more fun Golang content.
Hashing All the Things
maphash
is a minimal package meant for one purpose: put the power of Golang's builtin hash into developers hands.
Any comparable
Golang type can be hashed using a maphash.Hasher
. The entire public API can be
summarized as follows:
type Hasher[K comparable] struct {
...
}
func NewHasher[K comparable]() Hasher[K] {
...
}
func (h Hasher[K]) Hash(key K) uint64 {
...
}
In Golang the comparable type constraint is the union of
all types that implement the operators ==
and !=
. This includes all scalar types, channels, interfaces, arrays, and
structs of other comparable
types. Essentially anything you can use as the key of map
you can hash with maphash
So what's the point? The motivation for this package came while implementing a concurrent, stripe-locked cache for Dolt. The design looked something like this:
type Cache[K comparable, V any] struct {
stripes [64]map[K]V
locks [64]sync.RWLock
}
Using generics, we can create a flexible container class backed by a builtin map
. However, as we try to implement the
accessor methods for this class, we get stuck writing a generic function to choose which stripe each key belongs to:
// getStripe returns the stripe that |key| belongs to.
func (c Cache[K, V]) getStripe(key K) int8 {
???
}
What we need here is a hash function for a generic, comparable
type K
.
Background
The concept of exposing a generic hash function is not new to Golang. This idea first appeared in the core Golang repo
in a 2017 issue submitted by Byran Mills of Google's Golang team. It
suggests adding a builtin function for hashing comparable
types as a building block for custom containers and data
structures. The proposal argues that Golang developers are forced to work around language when writing hash-based data
structures like Tries or concurrent hash maps.
Developers who want an efficient hash-based data structure have only the builtin map
to choose from.
When this issue was originally created in 2017, Golang didn't have generics, so the proposed interfaced lacked a concrete type:
func Local(interface{}) uintptr
This API seems reasonable at first, but it leaves the compiler without the type information it needs to implement this function efficiently. The secret sauce of what makes Go's builtin map so fast is that the compiler generates custom hash functions for each map type declared in the source code. These generated functions know how to traverse a specific data type and use AES instructions to hash them with special hardware support. Unfortunately, all this magic is hidden inside the runtime, and it depends on the compiler knowing a static type at build-time.
In go 1.19
, the standard library added a new maphash package that exposed
utilities for hashing string
s and []byte
slices using the AES-based runtime implementations. These are certainly
welcome additions to the language, but they still lack the flexibility needed to write generic container types like
Tries. Earlier this year, a new issue proposed expanding this package to
hash any comparable
type. But unless this proposal gets accepted, we're left with only one option: hacking.
Hacking the Runtime
As a pretext, we know the compiler is capable of generating the exact hash functions we want. And further, we know that
using a map[T]V
will trigger the compiler to generate a function that will hash a comparable
type T
. Within the
runtime, these custom hash functions are accessed through a maptype:
type maptype struct {
typ _type
key *_type
elem *_type
bucket *_type // internal type representing a hash bucket
// function for hashing keys (ptr to key, seed) -> hash
hasher func(unsafe.Pointer, uintptr) uintptr
keysize uint8 // size of key slot
elemsize uint8 // size of elem slot
bucketsize uint16 // size of bucket
flags uint32
}
Each usage of a map
takes a *maptype
and an instance of a map
(*hmap
):
// v := map[k]
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
...
}
// v, ok := map[k]
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
...
}
// delete(map, k)
func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
...
}
So how can we get access to this? Using Golang's unsafe package, we can work around the type
system and get access to the private fields within these runtime types. This is what it looks like in our new maphash
package:
func getRuntimeHasher[K comparable]() (h hashfn, seed uintptr) {
a := any(make(map[K]struct{}))
i := (*mapiface)(unsafe.Pointer(&a))
h, seed = i.typ.hasher, uintptr(i.val.hash0)
return
}
type hashfn func(unsafe.Pointer, uintptr) uintptr
type mapiface struct {
typ *maptype
val *hmap
}
// mirrors go/src/runtime/type.go
type maptype struct {
typ _type
key *_type
elem *_type
bucket *_type
// function for hashing keys (ptr to key, seed) -> hash
hasher func(unsafe.Pointer, uintptr) uintptr
keysize uint8
elemsize uint8
bucketsize uint16
flags uint32
}
In getRuntimeHasher[K comparable]()
, start by declaring a map[K]struct{}
. This declaration makes the compiler create
the hash function we want. Next we cast our map
to type any
, forcing the compiler to add additional metadata do
describe the dynamic type of the variable a
. For map
s, this metadata is a pointer to a maptype
object, which is
just what we need to access the hash function! Using an unsafe.Pointer
we can cast a
to mapiface
(a runtime type),
get our hasher
, and construct a maphash.Hasher
:
type Hasher[K comparable] struct {
hash hashfn
seed uintptr
}
func NewHasher[K comparable]() Hasher[K] {
h, s := getRuntimeHasher[K]()
return Hasher[K]{hash: h, seed: s}
}
Using a maphash.Hasher
we can create hash-based datastructures keyed by UUID
s, time.Time
, or any other comparable
type. And thanks to the compiler, Hasher
is able to take advantage of optimized AES instructions. Hashing a string
with maphash
is just as fast as the standard
library version!
goos: darwin
goarch: amd64
pkg: github.com/dolthub/maphash
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkCompareStringHasher
BenchmarkCompareStringHasher/string_hasher
BenchmarkCompareStringHasher/string_hasher-12 237970230 5.341 ns/op 0 B/op 0 allocs/op
BenchmarkCompareStringHasher/std_string_hasher
BenchmarkCompareStringHasher/std_string_hasher-12 207972664 5.759 ns/op 0 B/op 0 allocs/op
PASS
A Humble Request
If you're thinking that this code is an abuse of Golang, I agree with you. The maphash
package, like map
itself, is
tightly coupled to the current runtime
implementation. Golang's map
implementation has been fairly stable over the
last several major releases, but any changes would need to be ported to maphash
. Pretty fragile! Currently, maphash
uses build tags to restrict
its supported versions to go 1.18
and go 1.19
. We'll expand this support with future major version releases.
If you're thinking there's a better way to do this, I agree with you again! If I could make a request, I would ask you
to comment and upvote the open issue to expand the standard library version
of maphash
to include the following:
// Comparable returns the hash of comparable value v with the given seed
// such that Comparable(s, v1) == Comparable(s, v2) if v1 == v2.
// If v contains a floating-point NaN, then the hash is non-deterministically random.
func Comparable[T comparable](seed Seed, v T) uint64
Wrapping Up
Thanks for reading! I hope you learned something about Golang internals and I hope you give maphash a try. I'd like to give a shout-out here to Dave Cheney and his excellent blogs about Golang and its runtime. In particular his blog on maps was incredibly helpful background for this work. I also hope you get a chance to checkout Dolt a Git-like SQL database built using Golang. If you're interested in learning more about Golang or Dolt, join us on our Discord!