pg_catalog: read only, or read-write?
Introduction
We're hard at work building Doltgres, a Postgres-compatible database with git-inspired version control features. Our long-term goal is to be a drop-in replacement for Postgres, so that application developers can use Doltgres without making any code changes. Usually this means implementing Doltgres features exactly as they occur in Postgres. But does it always?
This blog discusses an important Postgres feature, the pg_catalog
schema, and an internal debate
we're having about how to best support these tables in the short and long term.
What is the pg_catalog
schema?
In Postgres, metadata about the tables, functions and other schema elements are exposed via the
system catalogs, accessed through
the pg_catalog
schema in each database. Here are a few examples:
pg_class
: contains information about the relations (tables, indexes, etc.) in the databasepg_enum
: contains enum label and value definitionspg_proc
: contains function and procedure definitions
Additionally, there are tables that expose and configure the server's behavior, like users and their
permission to access particular tables, such as the pg_authid
table.
Here's an example of querying these tables.
db1=# select oid, relname from pg_class;
oid | relname
-------+------------------------------------------------
61501 | test
61504 | test_pkey
61506 | test2
61509 | test2_pkey
2619 | pg_statistic
1247 | pg_type
62229 | test
62232 | test_pkey
62235 | test
62238 | test_pkey
2836 | pg_toast_1255
2837 | pg_toast_1255_index
Why does it matter?
In a word: compatibility.
There are hundreds of Postgres-compatible tools and frameworks in the wild, and many of them issue
queries to the pg_catalog
tables as part of their operation. Workbenches need to see what tables
and other schema elements are available to query, and ORMs need to see what columns and foreign keys
are on the tables defined.
The bottom line is that there are very many tools that won't work with Doltgres until it returns reasonable results from these tables.
Postgres's extreme level of customizability
In deciding how to implement this critical functionality in Doltgres, we run into a dilemma.
pg_catalog
tables in Postgres are writable and have far-reaching effects when modified. Deciding
to duplicate this capability will have fundamental impacts on how the system we're building works,
down to the very lowest levels.
As we've covered in previous
blogs, Postgres's system
catalogs allow a kind of crazy amount of customization to dedicated users. For example: the system
table pg_operator
defines what operations like addition between two integers do. But it's not a
read-only view: you can write into it like any other table. This example redefines addition as
subtraction, system-wide:
CREATE FUNCTION subtract_integers(a INT4, b INT4)
RETURNS TEXT AS $$
BEGIN
RETURN (a - b)::text;
END;
$$ LANGUAGE plpgsql;
UPDATE pg_operator SET oprcode = 'subtract_integers', oprresult = 25 WHERE oid = 551;
So the question we have to ask ourselves is: when we say we want to be a drop-in replacement, how much do we really mean it? Are there people doing this kind of insane stunt for legitimate reasons in the wild, and will our customers need the same capability?
Read-only tables to the rescue?
We aren't totally in the dark here, as there are other successful commercial open-source projects
that have already been faced with the same dilemma and resolved it. Notably, Cockroach DB implements
pg_catalog
as a set of read-only views. And
many of the tables in it are empty.
Cockroach is much more mature than Doltgres, has a much larger user base, has raised much more venture capital, etc. Surely if read-only views are good enough for Cockroach, they should be good enough for Doltgres too, right?
Options for implementing the pg_catalog
tables
But maybe it's possible to do better than Cockroach without too much extra work? There are a few
different ways we could go about implementing pg_catalog
:
- Materialize the
pg_catalog
tables in storage exactly as Postgres does. This makes writing to them trivial, but would commit us to that particular implementation decision. It would also mean storing duplicate information, since Dolt and therefore Doltgres already have their own way of storing schemas, columns, etc. Keeping those two sources of truth in sync would be a potential source of bugs. - Intercept write operations. Instead of inserting or updating rows in the
pg_catalog
tables, we could hijack those operations and reinterpret them in terms of another, non-tabular storage solution. With this solution, we wouldn't be tied to storing this information the same way Postgres does, and we wouldn't need to duplicate any information about schemas. - Make them read-only. Simple and self-explanatory, which is always nice.
Nothing is off the table right now as we consider our options. So what should we do?
Personally, I'm not a big fan of the materialization solution, but it has the benefit of being
relatively simple to implement. On the other hand, saying our pg_catalog
tables are read-only,
like Cockroach did, potentially loses us some compatibility, but gives us maximum flexibility to
make our own storage and product decisions in this space. And we don't have high confidence that
people really write to these tables in the wild.
In the middle, we could allow writes by intercepting the UPDATE
and INSERT
statements and
performing our own logic. This is the most complicated solution, but is a good middle ground, if we
decide we need it.
This decision isn't necessarily final, but any serialization choices are difficult to change without requiring a migration by customers, which we want to avoid when possible. Starting with read-only tables won't prevent us from making them writable in a future release, and we could also do this on a table-by-table basis as required by customers.
So the design space here is relatively open. What do you think? Is having writable pg_catalog
tables important for your use case? We would love to hear from you either way.
Conclusion
We're building Doltgres, the Postgres version of our
version-controlled SQL database Dolt. Learning the ins and outs of the
pg_catalog
schema is critical to Doltgres becoming a drop-in replacement for Postgres, which is
our long-term goal. We're learning more and getting closer every day.
Have questions or opinions about how we implement pg_catalog
? Or maybe you are curious about the
world's first version-controlled SQL database? Join us on Discord
to talk to our engineering team and other Dolt users.