pg_catalog: read only, or read-write?

June 11, 2024

4 min read

Introduction

We're hard at work building Doltgres, a Postgres-compatible database with git-inspired version control features. Our long-term goal is to be a drop-in replacement for Postgres, so that application developers can use Doltgres without making any code changes. Usually this means implementing Doltgres features exactly as they occur in Postgres. But does it always?

This blog discusses an important Postgres feature, the pg_catalog schema, and an internal debate we're having about how to best support these tables in the short and long term.

What is the `pg_catalog` schema?

In Postgres, metadata about the tables, functions and other schema elements are exposed via the system catalogs, accessed through the pg_catalog schema in each database. Here are a few examples:

pg_class: contains information about the relations (tables, indexes, etc.) in the database
pg_enum: contains enum label and value definitions
pg_proc: contains function and procedure definitions

Additionally, there are tables that expose and configure the server's behavior, like users and their permission to access particular tables, such as the pg_authid table.

Here's an example of querying these tables.

db1=# select oid, relname from pg_class;
    oid  |                    relname
  -------+------------------------------------------------
   61501 | test
   61504 | test_pkey
   61506 | test2
   61509 | test2_pkey
    2619 | pg_statistic
    1247 | pg_type
   62229 | test
   62232 | test_pkey
   62235 | test
   62238 | test_pkey
    2836 | pg_toast_1255
    2837 | pg_toast_1255_index

Why does it matter?

In a word: compatibility.

There are hundreds of Postgres-compatible tools and frameworks in the wild, and many of them issue queries to the pg_catalog tables as part of their operation. Workbenches need to see what tables and other schema elements are available to query, and ORMs need to see what columns and foreign keys are on the tables defined.

The bottom line is that there are very many tools that won't work with Doltgres until it returns reasonable results from these tables.

Postgres's extreme level of customizability

In deciding how to implement this critical functionality in Doltgres, we run into a dilemma. pg_catalog tables in Postgres are writable and have far-reaching effects when modified. Deciding to duplicate this capability will have fundamental impacts on how the system we're building works, down to the very lowest levels.

As we've covered in previous blogs, Postgres's system catalogs allow a kind of crazy amount of customization to dedicated users. For example: the system table pg_operator defines what operations like addition between two integers do. But it's not a read-only view: you can write into it like any other table. This example redefines addition as subtraction, system-wide:

CREATE FUNCTION subtract_integers(a INT4, b INT4)
RETURNS TEXT AS $$
BEGIN
    RETURN (a - b)::text;
END;
$$ LANGUAGE plpgsql;
UPDATE pg_operator SET oprcode = 'subtract_integers', oprresult = 25 WHERE oid = 551;

So the question we have to ask ourselves is: when we say we want to be a drop-in replacement, how much do we really mean it? Are there people doing this kind of insane stunt for legitimate reasons in the wild, and will our customers need the same capability?

Read-only tables to the rescue?

We aren't totally in the dark here, as there are other successful commercial open-source projects that have already been faced with the same dilemma and resolved it. Notably, Cockroach DB implements pg_catalog as a set of read-only views. And many of the tables in it are empty.

Cockroach is much more mature than Doltgres, has a much larger user base, has raised much more venture capital, etc. Surely if read-only views are good enough for Cockroach, they should be good enough for Doltgres too, right?

Options for implementing the `pg_catalog` tables

But maybe it's possible to do better than Cockroach without too much extra work? There are a few different ways we could go about implementing pg_catalog:

Materialize the pg_catalog tables in storage exactly as Postgres does. This makes writing to them trivial, but would commit us to that particular implementation decision. It would also mean storing duplicate information, since Dolt and therefore Doltgres already have their own way of storing schemas, columns, etc. Keeping those two sources of truth in sync would be a potential source of bugs.
Intercept write operations. Instead of inserting or updating rows in the pg_catalog tables, we could hijack those operations and reinterpret them in terms of another, non-tabular storage solution. With this solution, we wouldn't be tied to storing this information the same way Postgres does, and we wouldn't need to duplicate any information about schemas.
Make them read-only. Simple and self-explanatory, which is always nice.

Nothing is off the table right now as we consider our options. So what should we do?

Personally, I'm not a big fan of the materialization solution, but it has the benefit of being relatively simple to implement. On the other hand, saying our pg_catalog tables are read-only, like Cockroach did, potentially loses us some compatibility, but gives us maximum flexibility to make our own storage and product decisions in this space. And we don't have high confidence that people really write to these tables in the wild.

In the middle, we could allow writes by intercepting the UPDATE and INSERT statements and performing our own logic. This is the most complicated solution, but is a good middle ground, if we decide we need it.

This decision isn't necessarily final, but any serialization choices are difficult to change without requiring a migration by customers, which we want to avoid when possible. Starting with read-only tables won't prevent us from making them writable in a future release, and we could also do this on a table-by-table basis as required by customers.

So the design space here is relatively open. What do you think? Is having writable pg_catalog tables important for your use case? We would love to hear from you either way.

Conclusion

We're building Doltgres, the Postgres version of our version-controlled SQL database Dolt. Learning the ins and outs of the pg_catalog schema is critical to Doltgres becoming a drop-in replacement for Postgres, which is our long-term goal. We're learning more and getting closer every day.

Have questions or opinions about how we implement pg_catalog? Or maybe you are curious about the world's first version-controlled SQL database? Join us on Discord to talk to our engineering team and other Dolt users.

Blog