Loading Native Postgres Extensions

DOLTGRES
5 min read

We're continuing to make progress on DoltgreSQL, which is a version of Dolt built to be a drop-in replacement for PostgreSQL. Dolt is a MySQL compatible database that is built from the ground up with Git-influenced version control features. This means that you can use branch, merge, diff, and more with your data and schema. As a part of our drop-in replacement plan, we've consistently been asked a question that is core for many existing Postgres users: are extensions supported? I'm glad to say that we are working on extension support, and we've actually been able to make some tangible progress! While we still have a lot more development to do, it's worthwhile to go over what we've achieved so far.

What are extensions?

For those that may be unaware, extensions are a collection of various items (functions, types, etc.), all packaged together to enable some larger functionality. For example, one popular extension is PostGIS, which adds the ability to work with geospatial data within Postgres. This extension is maintained by a separate organization, and a lot of the core processing that the extension does is implemented as its own library. Postgres provides an API that the library is developed against, and this allows the extension to access the underlying engine of Postgres to accomplish a wide range of tasks.

Using extensions outside of Postgres

Doltgres is not a fork of Postgres. Doltgres is written in Go and built on top of prolly trees, which are a novel data structure that underlies the ability of Dolt and Doltgres to version data and schemas, at scale, in realtime. This means that we have to reimplement all of the functionality of Postgres on top of our custom engine. Every statement, every type, and every function is written according to the documentation and observed effects, but the logic is specific to our custom engine. We've been very successful with this approach for Dolt, having achieved 100% correctness last year in sqllogictests. As a user, the underlying logic doesn't matter, just the results.

This isn't the case for extensions though. The API that Postgres exposes correlates, in many ways, to the internals of Postgres. While the API does have a degree of separation to ensure some form of backward compatibility, it's structured based on how Postgres itself is structured. Not only that, the exposed API is in the C language, which doesn't quite align with our usage of Go. Go does work with C code using cgo, but it's primarily geared for developers wanting to incorporate external libraries written in C/C++ into their Go project. Adding onto this, we support three primary operating systems as targets: Windows, Linux, and MacOS.

Because of this, we've long felt that being a drop-in replacement for Postgres was going to have to add the caveat of "as long as you don't use extensions". Of course that would be a blocker for many users, so over time we began to look into implementing some of the most popular extensions directly into Doltgres. For some extensions, this would be relatively easy, but for others like PostGIS, the task seemed monumental.

Well, Rome wasn't built in a day, so work began on incorporating support for the easy extensions. Some extensions are just SQL files, so as long as we support the statements, we could support those extensions. This wouldn't enable support for many extensions, but it would be a start. However, while working on this, I felt like it had to be possible to integrate the larger extensions with libraries, without having to implement all of them. No matter how many we implement, it would mean that we could fundamentally never support those who build their own extensions. After all, they're just library files, and loading libraries is core to many programs, so why not try it?

Loading native libraries

There are two main challenges when it comes to dynamically loading a library at runtime: loading the library, and linking the symbols. The symbols can be thought of as the API function signatures, and that "signatures" portion is crucial. As long as you can provide a matching signature, your own function will be called. This shifts the goal quite a bit. Rather than implementing the functions for each extension, we could implement the functions of the API. This is still a monumental task as the API is quite large, but it turns out that we don't quite have to implement the entire API at once. Libraries only link against the symbols that they need to function. This means that we can approach our API implementation incrementally, targeting the extensions that are the highest priority.

I wanted to start small, so I decided to go with uuid-ossp, which is an extension that ships with a standard Postgres installation. On Windows (my primary development platform), I utilized a tool called dumpbin, which tells me all of the symbols that the library expects, as well as the expected source of the symbol. The API symbols are linked to the postgres.exe binary. We can create our own library file and use a module-definition file to essentially spoof the postgres.exe name with our own library's name. With the symbols that are needed as well, I wrote stubs for the functions in Go, using cgo. Since my functions are written in standard Go, it will make interfacing with the Doltgres internals much easier.

With that part out of the way, we next needed to load the library. Windows provides an API function that allows us to add libraries to the library search path, and so we add our created library to that path using SetDllDirectoryW, and then load our library into memory. Afterward, we load the extension's library, and it sees our created library as though it were the actual postgres.exe binary, and we're able to make calls to the functions contained within. Once I saw that it was calling my stubbed functions, I wrote an actual implementation for the functions, and I successfully got UUIDs from the calls. It's a success!

What's next?

Although it's not fully hooked up, it proves that the approach can work. To verify, I also made sure that we could load that same extension in both Linux and MacOS, and while they have different methods, we're able to successfully load our own logic across each supported platform. Now that we know it's possible, all that's left to do is integrate this proof-of-concept into Doltgres. This isn't quite so trivial, considering the extensions expect incoming data to match a very specific format defined by the API, and we've come up with our own formats and internal machinery since it's an independent and original project.

The other large challenge deals with actually loading these extensions. Right now, we're relying on an existing Postgres installation, since they usually come with a set of default extensions. What about handling custom ones that a user wants to add? What happens when someone pushes a database that makes use of an extension and another user pulls it and they don't have the extension installed? Are extensions enabled on a per-branch basis, and if so what happens when branches are merged? These are questions that a normal Postgres instance doesn't need to worry about, but because of our versioning features, we have to tackle these. It's one of the exciting aspects of working on such an innovative database, but it definitely comes with its own challenges as a developer! Before starting this project, I had never heard of a module-definition file before, so I'm always glad to learn new things.

To the future

DoltgreSQL continues to improve, and we are excited to get extensions added. They're such a requested feature, and I'm glad we were able to break ground and begin implementing them into Doltgres. If there are any features that are missing that are critical to your workflow, then let us know! You can find us on Twitter/X and chat with us on Discord. We also welcome all issue reports! Thank you for reading the blog post!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.