psql's Hidden Queries

TECHNICALDOLTGRES
10 min read

Introduction

We're hard at work building Doltgres, a Postgres-compatible database with git-inspired version control features. Before we built Doltgres we built Dolt, which is MySQL-compatible instead. One of the first things we needed to understand and get working was how Postgres handles informational queries from its shell.

This blog post discusses what the informational commands from psql do under the hood and how we support them in Doltgres.

What are informational queries?

Informational queries are queries of database metadata to examine what databases, tables, columns, and other schema entities exist. They're a quick way to figure out the shape of the data and how to start querying it.

For example, in MySQL you can use the DESCRIBE keyword to get a simplified view of a table's schema:

mysql> describe t1;
+-------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-----+---------+-------+
| a     | int  | NO   |     | NULL    |       |
| b     | int  | YES  |     | NULL    |       |
+-------+------+------+-----+---------+-------+
2 rows in set (0.00 sec)

MySQL has a bunch of different useful, easy-to-use informational queries built right into the MySQL dialect, things like this:

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| common             |
| test               |
| test1              |
| world              |
+--------------------+

mysql> show tables;
+---------------------+
| Tables_in_test      |
+---------------------+
| a                   |
| about               |
| ai                  |
| ...                 |
| t1                  |
+---------------------+

These work in the MySQL shell, but there's nothing special about the MySQL shell here: it just sends the queries to the server, which interprets them as normal statements and executes them. This means the same syntax works on any MySQL connection or application, not just the MySQL shell.

Postgres, meanwhile, doesn't have this built in syntactic sugar. But it does ship with a shell called psql, which supports a bunch of informational commands you can view with the \? command:

 Informational
    (options: S = show system objects, + = additional detail)
    \d[S+]                 list tables, views, and sequences
    \d[S+]  NAME           describe table, view, sequence, or index
    \da[S]  [PATTERN]      list aggregates
    \dA[+]  [PATTERN]      list access methods
    \dAc[+] [AMPTRN [TYPEPTRN]]  list operator classes
    \dAf[+] [AMPTRN [TYPEPTRN]]  list operator families
    \dAo[+] [AMPTRN [OPFPTRN]]   list operators of operator families
    \dAp[+] [AMPTRN [OPFPTRN]]   list support functions of operator families
    \db[+]  [PATTERN]      list tablespaces
    \dc[S+] [PATTERN]      list conversions

    ...

You run these commands in the shell and get the results. For example, this one is roughly equivalent to MySQL's SHOW TABLES.

doltgres-> \d
           List of relations
 Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
 public | test_table | table | postgres
(1 row)

The problem is that these commands only work in the psql shell. If you want this functionality in your application, or in another shell, you're out of luck.

But not to worry. We are here to peel back the curtain and show you what these commands are actually doing so you can use them in other contexts as needed.

psql commands under the hood: Listing tables

Behind the scenes, when you run one of the informational commands supported by psql, the shell intercepts it and translates it to a query on the pg_catalog tables. Let's look at the example above, which lists all relations.

doltgres-> \d
           List of relations
 Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
 public | test_table | table | postgres
(1 row)

To get this result, psql is sending this SQL query to the Postgres server:

SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind 
    WHEN 'r' THEN 'table' 
    WHEN 'v' THEN 'view' 
    WHEN 'm' THEN 'materialized view' 
    WHEN 'i' THEN 'index' 
    WHEN 'S' THEN 'sequence' 
    WHEN 't' THEN 'TOAST table' 
    WHEN 'f' THEN 'foreign table' 
    WHEN 'p' THEN 'partitioned table' 
    WHEN 'I' THEN 'partitioned index' 
  END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
     LEFT JOIN pg_catalog.pg_am am ON am.oid = c.relam
WHERE c.relkind IN ('r','p','v','m','S','f','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname !~ '^pg_toast'
      AND n.nspname <> 'information_schema'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;

That's quite a bit more complicated than \d! But as you can see, it resolves into a relatively straightforward query on the pg_catalog.pg_class table and a few others. pg_catalog is a special built-in schema that Postgres uses to store information about types, tables, and other database schema entities. Unlike the Information schema, which is part of the SQL standard, the pg_catalog schema is custom to Postgres and reflects internal architectural decisions particular to how Postgres organizes its data.

Let's look at a few other examples.

Listing indexes

MySQL has the query SHOW INDEXES from t1, which tells you what indexes exist on the table named. The closest equivalent from psql is \di, which tells you all indexes in the current database.

postgres=# \di
                                List of relations
  Schema  |            Name            | Type  |  Owner   |         Table
----------+----------------------------+-------+----------+-----------------------
 postgres | test_schema_pkey           | index | postgres | test_schema
 public   | a.b.c_pkey                 | index | postgres | a.b.c
 public   | t10_pkey                   | index | postgres | t10
 public   | t1_pkey                    | index | postgres | t1
 public   | t2_pkey                    | index | postgres | t2
 public   | t_bit_pkey                 | index | postgres | t_bit
 public   | t_bit_varying_pkey         | index | postgres | t_bit_varying
 public   | t_box_pkey                 | index | postgres | t_box
 public   | t_bytea_pkey               | index | postgres | t_bytea
 public   | t_cidr_pkey                | index | postgres | t_cidr
 public   | t_circle_pkey              | index | postgres | t_circle
 public   | t_inet_pkey                | index | postgres | t_inet
 public   | t_interval_pkey            | index | postgres | t_interval
 public   | t_line_pkey                | index | postgres | t_line
 public   | t_lseg_pkey                | index | postgres | t_lseg
 public   | t_macaddr_pkey             | index | postgres | t_macaddr
 public   | t_money_pkey               | index | postgres | t_money
 public   | t_path_pkey                | index | postgres | t_path
 public   | t_pg_lsn_pkey              | index | postgres | t_pg_lsn
 public   | t_pkey                     | index | postgres | t
 public   | t_point_pkey               | index | postgres | t_point
 public   | t_polygon_pkey             | index | postgres | t_polygon
 public   | t_smallserial_pkey         | index | postgres | t_smallserial
 public   | t_timestamp_with_zone_pkey | index | postgres | t_timestamp_with_zone
 public   | t_tsquery_pkey             | index | postgres | t_tsquery
 public   | t_tsvector_pkey            | index | postgres | t_tsvector
 public   | t_uuid_pkey                | index | postgres | t_uuid
 public   | t_xml_pkey                 | index | postgres | t_xml
 public   | test_pkey                  | index | postgres | test

You can filter this down to a single index by naming it:

# \di t_pkey
            List of relations
Schema |  Name  | Type  |  Owner   | Table
--------+--------+-------+----------+-------
public | t_pkey | index | postgres | t
(1 row)

Behind the scenes, psql is running this query:

SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind 
    WHEN 'r' THEN 'table' 
    WHEN 'v' THEN 'view' 
    WHEN 'm' THEN 'materialized view' 
    WHEN 'i' THEN 'index' 
    WHEN 'S' THEN 'sequence' 
    WHEN 't' THEN 'TOAST table' 
    WHEN 'f' THEN 'foreign table' 
    WHEN 'p' THEN 'partitioned table' 
    WHEN 'I' THEN 'partitioned index' 
END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
     LEFT JOIN pg_catalog.pg_am am ON am.oid = c.relam
WHERE c.relkind IN ('r','p','v','m','S','f','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname !~ '^pg_toast'
      AND n.nspname <> 'information_schema'
  AND pg_catalog.pg_table_is_visible(c.oid);

What's interesting is that most of the CASE statement is rendered useless by the WHERE clause, but it's still in there, presumably copied from another similar query. Note also that the pg_catalog tables themselves are filtered from the result, as are the information_schema tables and other Postgres internal tables.

Listing databases

MySQL has SHOW DATABASES, which is equivalent to the \l command in psql:

doltgres=> \l
                                                                   List of databases
        Name        |  Owner   | Encoding |          Collate           |           Ctype            | ICU Locale | Locale Provider | Access privileges
--------------------+----------+----------+----------------------------+----------------------------+------------+-----------------+-------------------
 doltgres           | postgres | UTF8     | English_United States.1252 | English_United States.1252 |            | libc            |
 information_schema | postgres | UTF8     | English_United States.1252 | English_United States.1252 |            | libc            |
(2 rows)

Under the hood, psql is executing this query:

SELECT
  d.datname as "Name",
  pg_catalog.pg_get_userbyid(d.datdba) as "Owner",
  pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding",
  CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS "Locale Provider",
  d.datcollate as "Collate",
  d.datctype as "Ctype",
  d.daticulocale as "ICU Locale",
  NULL as "ICU Rules",
  pg_catalog.array_to_string(d.datacl, E'\n') AS "Access privileges"
FROM pg_catalog.pg_database d
ORDER BY 1;

What's curious is that altough the pg_catalog schema is technically local to the current database, it obviously does contain information about other databases than the one currently connected.

Describing a table

psql's built-in way to describe a table is arguably more useful than MySQL's DESCRIBE keyword. You just use \d with the name of the table you want to examine:

postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           | not null |
 name   | text    |           |          |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (id)
Publications:
    "doltgres_slot"
    "pglogrepl_demo"

Here I'm getting not just the columns of the table, but also indexes on it, as well as some related info like which publications refer to this table. Pretty neat!

If you look at the queries psql is sending to the server, you'll see there's a lot of them. Unlike other psql commands, the command to describe a table is not a straightforward query and clearly involves a lot of back and forth and interpretation between shell and server. The psql shell is smart enough to omit any elements that aren't present and only show you relevant results. Here's what it's doing under the hood, annotated with comments to help you understand what's happening.

-- Getting the table's OID for use in further queries
SELECT c.oid,
  n.nspname,
  c.relname
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname OPERATOR(pg_catalog.~) '^(t1)$' COLLATE pg_catalog.default
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 2, 3;

-- Getting which kind of schema elements are in this table (using the OID from the first query)
SELECT c.relchecks, 
    c.relkind, 
    c.relhasindex, 
    c.relhasrules, 
    c.relhastriggers, 
    c.relrowsecurity, 
    c.relforcerowsecurity, 
    false AS relhasoids, 
    c.relispartition, '', 
    c.reltablespace, 
    CASE 
        WHEN c.reloftype = 0 THEN '' 
        ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text 
    END, c.relpersistence, c.relreplident, am.amname
FROM pg_catalog.pg_class c
 LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)
LEFT JOIN pg_catalog.pg_am am ON (c.relam = am.oid)
WHERE c.oid = '62222';

-- Getting column attributes like name, collation, nullability
SELECT a.attname,
  pg_catalog.format_type(a.atttypid, a.atttypmod),
  (SELECT pg_catalog.pg_get_expr(d.adbin, d.adrelid, true)
   FROM pg_catalog.pg_attrdef d
   WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
  a.attnotnull,
  (SELECT c.collname FROM pg_catalog.pg_collation c, pg_catalog.pg_type t
   WHERE c.oid = a.attcollation AND t.oid = a.atttypid AND a.attcollation <> t.typcollation) AS attcollation,
  a.attidentity,
  a.attgenerated
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = '62222' AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum;

-- Getting index information
SELECT c2.relname, 
    i.indisprimary, 
    i.indisunique, 
    i.indisclustered, 
    i.indisvalid, 
    pg_catalog.pg_get_indexdef(i.indexrelid, 0, true),
    pg_catalog.pg_get_constraintdef(con.oid, true), 
    contype, 
    condeferrable, 
    condeferred, 
    i.indisreplident, 
    c2.reltablespace
FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i
  LEFT JOIN pg_catalog.pg_constraint con ON (conrelid = i.indrelid AND conindid = i.indexrelid AND contype IN ('p','u','x'))
WHERE c.oid = '62222' AND c.oid = i.indrelid AND i.indexrelid = c2.oid
ORDER BY i.indisprimary DESC, c2.relname;

-- Getting policy and permission information (not relevant to results above)
SELECT pol.polname, pol.polpermissive,
  CASE WHEN pol.polroles = '{0}' THEN NULL ELSE pg_catalog.array_to_string(array(select rolname from pg_catalog.pg_roles where oid = any (pol.polroles) order by 1),',') END,
  pg_catalog.pg_get_expr(pol.polqual, pol.polrelid),
  pg_catalog.pg_get_expr(pol.polwithcheck, pol.polrelid),
  CASE pol.polcmd
    WHEN 'r' THEN 'SELECT'
    WHEN 'a' THEN 'INSERT'
    WHEN 'w' THEN 'UPDATE'
    WHEN 'd' THEN 'DELETE'
    END AS cmd
FROM pg_catalog.pg_policy pol
WHERE pol.polrelid = '62222' ORDER BY 1;

-- Getting column statistics (not relevant to results above)
SELECT oid, 
    stxrelid::pg_catalog.regclass, 
    stxnamespace::pg_catalog.regnamespace::pg_catalog.text AS nsp, 
    stxname,
    pg_catalog.pg_get_statisticsobjdef_columns(oid) AS columns,
    'd' = any(stxkind) AS ndist_enabled,
    'f' = any(stxkind) AS deps_enabled,
    'm' = any(stxkind) AS mcv_enabled,
    stxstattarget
FROM pg_catalog.pg_statistic_ext
WHERE stxrelid = '62222'
ORDER BY nsp, stxname;

-- Getting publication information
SELECT pubname
     , NULL
     , NULL
FROM pg_catalog.pg_publication p
     JOIN pg_catalog.pg_publication_namespace pn ON p.oid = pn.pnpubid
     JOIN pg_catalog.pg_class pc ON pc.relnamespace = pn.pnnspid
WHERE pc.oid ='62222' and pg_catalog.pg_relation_is_publishable('62222')
UNION
SELECT pubname
     , pg_get_expr(pr.prqual, c.oid)
     , (CASE WHEN pr.prattrs IS NOT NULL THEN
         (SELECT string_agg(attname, ', ')
           FROM pg_catalog.generate_series(0, pg_catalog.array_upper(pr.prattrs::pg_catalog.int2[], 1)) s,
                pg_catalog.pg_attribute
          WHERE attrelid = pr.prrelid AND attnum = prattrs[s])
        ELSE NULL END) FROM pg_catalog.pg_publication p
     JOIN pg_catalog.pg_publication_rel pr ON p.oid = pr.prpubid
     JOIN pg_catalog.pg_class c ON c.oid = pr.prrelid
WHERE pr.prrelid = '62222'
UNION
SELECT pubname
     , NULL
     , NULL
FROM pg_catalog.pg_publication p
WHERE p.puballtables AND pg_catalog.pg_relation_is_publishable('62222')
ORDER BY 1;

-- Getting inherited tables (not present in above output)
SELECT c.oid::pg_catalog.regclass
FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i
WHERE c.oid = i.inhparent AND i.inhrelid = '62222'
  AND c.relkind != 'p' AND c.relkind != 'I'
ORDER BY inhseqno;

-- More inheritance information
SELECT c.oid::pg_catalog.regclass, c.relkind, inhdetachpending, pg_catalog.pg_get_expr(c.relpartbound, c.oid)
FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i
WHERE c.oid = i.inhrelid AND i.inhparent = '62222'
ORDER BY pg_catalog.pg_get_expr(c.relpartbound, c.oid) = 'DEFAULT', c.oid::pg_catalog.regclass::pg_catalog.text;

This example is much, much more complex than the other ones, and writing these queries yourself (or interpreting the results the way that psql does) requires a pretty deep understanding of postgres's internal storage details. Personally, I would take DESCRIBE t1 any day of the week.

Revealing the hidden queries

I obtained these hidden queries by connecting psql to a running Doltgres server, where I turned on debug logging to print every query received. Then I can just pull the queries out of the log:

DEBU[0005] Received message: Query { SELECT n.nspname as "Schema",
  c.relname as "Name", ...

You can also use a packet-sniffing tool like Wireshark for this, but make sure to disable SSL on your connection or you won't be able to read the contents of the TCP packets.

% PGSSLMODE=disable PGPASSWORD=password psql -h 127.0.0.1 -U postgres

Conclusion

We're building Doltgres, the Postgres version of our version-controlled SQL database Dolt. Learning the ins and outs of how Postgres organizes its data is critical to Doltgres becoming a drop-in replacement for Postgres, which is our long-term goal. We're learning more and getting closer every day.

Have questions on psql queries? Or maybe you are curious about the world's first version-controlled SQL database? Join us on Discord to talk to our engineering team and other Dolt users.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.