Dolt CI Schema Details

DOLTHUBREFERENCE
13 min read

Near the end of 2024 we announced the release of continuous integration (CI) testing on DoltHub. This feature allows you to run automated tests against your DoltHub databases, just like you do for your GitHub repositories. If you'd like to try this out for yourself, check out our docs for an introduction to some general concepts and a guide on getting started.

In an earlier post we outlined our design for adding CI to DoltHub and DoltLab, one modeled largely on GitHub's CI platform, GitHub Actions. Today, we'll be taking a closer look at the technical implementation for adding CI to DoltHub and DoltLab, and we'll do so by starting with CI's implementation in Dolt.

As mentioned in the design blog, we wanted CI configuration for DoltHub and DoltLab to live in and travel with each Dolt database. To do this, we decided to create a set of internal tables used to persist configuration.

In addition, we wanted a user to be able to supply Dolt with a "workflow" yaml file to define CI on their database. This of course is modeled on the Workflow yaml file used on GitHub to configure CI on a repository.

Deciding to model CI on DoltHub in this way, led us to the following schema design for Dolt's internal CI tables.

Schema details

Dolt stores CI configuration in the following internal tables. These are created and committed to a Dolt database when a user runs dolt ci init.

  • dolt_ci_workflows
  • dolt_ci_workflow_events
  • dolt_ci_workflow_event_triggers
  • dolt_ci_workflow_event_trigger_branches
  • dolt_ci_workflow_jobs
  • dolt_ci_workflow_steps
  • dolt_ci_workflow_saved_query_steps
  • dolt_ci_workflow_saved_query_step_expected_row_column_results

These tables are written to and read by the dolt ci command so that users don't need to manually edit each of these tables when configuring CI. To better understand what each table is used for, let's consider the following sample workflow file.

# sample workflow.yaml

name: "my first workflow"
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
      - alt
jobs:
  - name: "check schema"
    steps:
      - name: "ensure two tables exist"
        saved_query_name: "check two tables"
        expected_rows: "== 2"
        expected_columns: "== 1"

dolt_ci_workflows

CREATE TABLE `dolt_ci_workflows` (
  `name` varchar(2048) COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `created_at` datetime(6) NOT NULL,
  `updated_at` datetime(6) NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;

A Workflow for Dolt and DoltHub is similar to a workflow on GitHub. It's a yaml file that defines what CI jobs checks or tests should run and when they should run. These files are identified by their top-level name field, which must be unique among workflows. The dolt_ci_workflows table stores this name, as well as when it was created and last updated.

dolt_ci_workflow_events

CREATE TABLE `dolt_ci_workflow_events` (
  `id` varchar(36) NOT NULL,
  `event_type` int NOT NULL,
  `workflow_name_fk` varchar(2048) COLLATE utf8mb4_0900_ai_ci NOT NULL,
  PRIMARY KEY (`id`),
  KEY `workflow_name_fk` (`workflow_name_fk`),
  CONSTRAINT `dolt_ci_workflow_events_ibfk_1` FOREIGN KEY (`workflow_name_fk`) REFERENCES `dolt_ci_workflows` (`name`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;

Events are actions that trigger a workflow to run. These can be things like new data being pushed to DoltHub, or a user opening a pull-request with proposed data changes. Once these actions occur on a database with CI configured, DoltHub will execute the relevant workflow's Jobs.

The dolt_ci_workflow_events table has a foreign key reference to the dolt_ci_workflows table, since events are related to a specific workflow. The event_type is defined here as well, and refers to a type of event that triggers a workflow to run, like the aforementioned push or pull_request events.

dolt_ci_workflow_event_triggers

CREATE TABLE `dolt_ci_workflow_event_triggers` (
  `id` varchar(36) NOT NULL,
  `event_trigger_type` int NOT NULL,
  `workflow_event_id_fk` varchar(36) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `workflow_event_id_fk` (`workflow_event_id_fk`),
  CONSTRAINT `dolt_ci_workflow_event_triggers_ibfk_1` FOREIGN KEY (`workflow_event_id_fk`) REFERENCES `dolt_ci_workflow_events` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;

The dolt_ci_workflow_event_triggers stores more specific, conditional information about the Event that triggers a workflow. This is done with the event_trigger_type column, which stores the conditional information. Take for example the on section defined in the sample workflow file above.

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
      - alt

This section defines two types of events that trigger a workflow: push and pull_request, but also defines the conditional information for each event.

For the push event, the workflow should only run if a push is made to the main branch. And similarly, the pull_request event should only trigger this workflow to run if a pull request activity occurs on a pull request involving either the main or alt branches.

Dolt defines this conditional information as event trigger types, and stores it in the event_trigger_type column. Possible types are:

const (
	WorkflowEventTriggerTypeUnspecified WorkflowEventTriggerType = iota
	WorkflowEventTriggerTypeBranches
	WorkflowEventTriggerTypeActivityOpened
	WorkflowEventTriggerTypeActivityClosed
	WorkflowEventTriggerTypeActivityReopened
	WorkflowEventTriggerTypeActivitySynchronized
)

The WorkflowEventTriggerTypeBranches type indicates that an event has a branch condition, and the name of the branch is stored in the dolt_ci_workflow_event_trigger_branches table.

The other types refer to "activities" that can occur on a pull request, like opening, closing, reopening, or synchronizing. If the sample workflow included an activity condition, for example:

on:
  pull_request:
    branches:
      - main
      - alt
    activities:
      - opened
      - reopened

A row for each event_trigger_type, WorkflowEventTriggerTypeActivityOpened and WorkflowEventTriggerTypeActivityReopened, associated with the pull_request event would be stored in the dolt_ci_workflow_event_triggers table.

dolt_ci_workflow_event_trigger_branches

CREATE TABLE `dolt_ci_workflow_event_trigger_branches` (
  `id` varchar(36) NOT NULL,
  `branch` varchar(1024) COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `workflow_event_triggers_id_fk` varchar(36) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `workflow_event_triggers_id_fk` (`workflow_event_triggers_id_fk`),
  CONSTRAINT `dolt_ci_workflow_event_trigger_branches_ibfk_1` FOREIGN KEY (`workflow_event_triggers_id_fk`) REFERENCES `dolt_ci_workflow_event_triggers` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;

The dolt_ci_workflow_event_trigger_branches table, as mentioned above, is a simple join table that has a foreign key reference to the dolt_ci_workflow_event_triggers table for entries of type WorkflowEventTriggerTypeBranches.

dolt_ci_workflow_jobs

CREATE TABLE `dolt_ci_workflow_jobs` (
  `id` varchar(36) NOT NULL,
  `name` varchar(1024) COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `created_at` datetime(6) NOT NULL,
  `updated_at` datetime(6) NOT NULL,
  `workflow_name_fk` varchar(2048) COLLATE utf8mb4_0900_ai_ci NOT NULL,
  PRIMARY KEY (`id`),
  KEY `workflow_name_fk` (`workflow_name_fk`),
  CONSTRAINT `dolt_ci_workflow_jobs_ibfk_1` FOREIGN KEY (`workflow_name_fk`) REFERENCES `dolt_ci_workflows` (`name`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;

Jobs define what should happen when a workflow is run, specifically, the programmatic work. This work is broken down into steps, or ordered, individual units of work. Each job is associated with a parent workflow, and must have a unique name.

dolt_ci_workflow_steps

CREATE TABLE `dolt_ci_workflow_steps` (
  `id` varchar(36) NOT NULL,
  `name` varchar(1024) COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `step_order` int NOT NULL,
  `step_type` int NOT NULL,
  `created_at` datetime(6) NOT NULL,
  `updated_at` datetime(6) NOT NULL,
  `workflow_job_id_fk` varchar(36) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `workflow_job_id_fk` (`workflow_job_id_fk`),
  CONSTRAINT `dolt_ci_workflow_steps_ibfk_1` FOREIGN KEY (`workflow_job_id_fk`) REFERENCES `dolt_ci_workflow_jobs` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;

Steps, in the general sense, are any check or test or specific single unit of work that should be run as part of a job. Steps have a step_order, which indicates the order in which they should be run, and this is determined by the order in which they are defined in the workflow file.

Though designed with this generalization in mind, Dolt currently only supports a specific step_type, the "saved query step". Steps of this type are stored in the dolt_ci_workflow_saved_query_steps table, but additional types and corresponding tables will be added in the future as Dolt's CI support matures.

dolt_ci_workflow_saved_query_steps

CREATE TABLE `dolt_ci_workflow_saved_query_steps` (
  `id` varchar(36) NOT NULL,
  `saved_query_name` varchar(2048) COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `expected_results_type` int NOT NULL,
  `workflow_step_id_fk` varchar(36) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `workflow_step_id_fk` (`workflow_step_id_fk`),
  CONSTRAINT `dolt_ci_workflow_saved_query_steps_ibfk_1` FOREIGN KEY (`workflow_step_id_fk`) REFERENCES `dolt_ci_workflow_steps` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;

The Saved Query Step runs a specific saved query against the Dolt database. The dolt_ci_workflow_saved_query_steps table stores the name of the saved query and the type of expected results to use to evaluate the results of the saved query. The actual SQL query associated with the saved query is stored in the database's dolt_query_catalog table. Let's return to the sample workflow file to get a better understanding of how this works.

jobs:
  - name: "check schema"
    steps:
      - name: "ensure two tables exist"
        saved_query_name: "check two tables"
        expected_rows: "== 2"
        expected_columns: "== 1"

Above, we see the "check schema" job that consists of a single step, the "ensure two tables exist" step. The name of the job and the name of the step indicates the general purpose of this CI check—that it should verify that the database contains exactly two tables.

In our example, the saved query name "check two tables" is mapped to the SQLquery show tables;. We can see this by inspecting the dolt_query_catalog table.

dolt sql -q "select * from dolt_query_catalog;"
+------------------+---------------+------------------+--------------+-------------+
| id               | display_order | name             | query        | description |
+------------------+---------------+------------------+--------------+-------------+
| check two tables | 1             | check two tables | show tables; |             |
+------------------+---------------+------------------+--------------+-------------+

The name of a saved query is used as the id in the dolt_query_catalog table. This name is stored in the dolt_ci_workflow_saved_query_steps table as the saved_query_name, so it can be looked up and run during workflow execution.

When this saved query is executed, the results of the "show tables;" query might look like this:

test % dolt sql -q "show tables;"
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| t2             |
+----------------+

The returned output from the saved query is then a single column, "Tables_in_test", with two rows, "t1" and "t2".

This brings us to the expected_results_type column of the dolt_ci_workflow_saved_query_steps table. It indicates the type of "expected results" that should be used to evaluate the results returned by executing the saved query.

In the same way that Dolt CI will evolve to support more types of steps, it will also evolve to support more types of "expected results". Currently, Dolt CI only supports the "expected row and column count results type", which stores expected counts in the dolt_ci_workflow_saved_query_step_expected_row_column_results table.

dolt_ci_workflow_saved_query_step_expected_row_column_results

CREATE TABLE `dolt_ci_workflow_saved_query_step_expected_row_column_results` (
  `id` varchar(36) NOT NULL,
  `expected_column_count_comparison_type` int NOT NULL,
  `expected_column_row_comparison_type` int NOT NULL,
  `expected_column_count` bigint NOT NULL,
  `expected_row_count` bigint NOT NULL,
  `created_at` datetime(6) NOT NULL,
  `updated_at` datetime(6) NOT NULL,
  `workflow_saved_query_step_id_fk` varchar(36) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `workflow_saved_query_step_id_fk` (`workflow_saved_query_step_id_fk`),
  CONSTRAINT `dolt_ci_workflow_saved_query_step_expected_row_column_results_ibfk_1` FOREIGN KEY (`workflow_saved_query_step_id_fk`) REFERENCES `dolt_ci_workflow_saved_query_steps` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;

As indicated by the name, the dolt_ci_workflow_saved_query_step_expected_row_column_results table stores the number of expected rows and columns returned from a saved query. Take for example the sample workflow file again.

      steps:
        saved_query_name: "check two tables"
        expected_rows: "== 2"
        expected_columns: "== 1"

The file indicates that for the saved query "check two tables" which runs the show tables; query, we expect 2 rows and 1 column to be returned in the SQL results. These expected counts are stored in the expected_row_count and expected_column_count columns respectively.

Additionally, the expected_row_count_comparison_type and expected_column_count_comparison_type column indicates the type of comparison operation that should be used to assert the expected count matches the actual count. In the file snippet above, the == operator indicates equality, so if the expected count does not equal the actual count, the workflow run will fail.

Available comparison types are:

const (
	WorkflowSavedQueryExpectedRowColumnComparisonTypeUnspecified WorkflowSavedQueryExpectedRowColumnComparisonType = iota
	WorkflowSavedQueryExpectedRowColumnComparisonTypeEquals // ==
	WorkflowSavedQueryExpectedRowColumnComparisonTypeNotEquals // !=
	WorkflowSavedQueryExpectedRowColumnComparisonTypeLessThan // <
	WorkflowSavedQueryExpectedRowColumnComparisonTypeGreaterThan // >
	WorkflowSavedQueryExpectedRowColumnComparisonTypeLessThanOrEqual // <=
	WorkflowSavedQueryExpectedRowColumnComparisonTypeGreaterThanOrEqual // >=
)

Checking out the diff

Using the above schema definitions in Dolt, we've been able to make continuous integration testing on data with DoltHub, a reality.

To further illustrate these tables in action, let's import our sample workflow file into a Dolt database and check out the diff of the successful import. The dolt ci import command will parse our workflow file and update the various CI tables on our behalf.

# make a new database

% mkdir test
% cd test
test % dolt init

# initialize CI

test % dolt ci init
# create the sample workflow.yaml

test % cat <<EOF > workflow.yaml
name: "my first workflow"
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
      - alt
jobs:
  - name: "check schema"
    steps:
      - name: "ensure two tables exist"
        saved_query_name: "check two tables"
        expected_rows: "== 2"
        expected_columns: "== 1"
EOF
# import the workflow.yaml

test % dolt ci import workflow.yaml
# check out the diff

test % dolt diff HEAD~1
diff --dolt a/dolt_ci_workflow_event_trigger_branches b/dolt_ci_workflow_event_trigger_branches
--- a/dolt_ci_workflow_event_trigger_branches
+++ b/dolt_ci_workflow_event_trigger_branches
+---+--------------------------------------+--------+--------------------------------------+
|   | id                                   | branch | workflow_event_triggers_id_fk        |
+---+--------------------------------------+--------+--------------------------------------+
| + | 80de7968-8938-4cc3-b90c-e1bf26666d93 | main   | 98ae18c7-a5ef-4b75-b3db-2b498a8928f7 |
| + | 951d916c-3787-4cf6-903e-51d4d435a38b | alt    | 98ae18c7-a5ef-4b75-b3db-2b498a8928f7 |
| + | ba78997d-1a8a-4805-acbe-7d1c36bcf519 | main   | f8dab257-b845-4a84-9714-cd31a20e028e |
+---+--------------------------------------+--------+--------------------------------------+
diff --dolt a/dolt_ci_workflow_event_triggers b/dolt_ci_workflow_event_triggers
--- a/dolt_ci_workflow_event_triggers
+++ b/dolt_ci_workflow_event_triggers
+---+--------------------------------------+--------------------+--------------------------------------+
|   | id                                   | event_trigger_type | workflow_event_id_fk                 |
+---+--------------------------------------+--------------------+--------------------------------------+
| + | 98ae18c7-a5ef-4b75-b3db-2b498a8928f7 | 1                  | ed21a1be-958c-4553-a911-50f40085dd80 |
| + | f8dab257-b845-4a84-9714-cd31a20e028e | 1                  | a0038a05-87b5-4b79-a026-0d7addeeab37 |
+---+--------------------------------------+--------------------+--------------------------------------+
diff --dolt a/dolt_ci_workflow_events b/dolt_ci_workflow_events
--- a/dolt_ci_workflow_events
+++ b/dolt_ci_workflow_events
+---+--------------------------------------+------------+-------------------+
|   | id                                   | event_type | workflow_name_fk  |
+---+--------------------------------------+------------+-------------------+
| + | a0038a05-87b5-4b79-a026-0d7addeeab37 | 1          | my first workflow |
| + | ed21a1be-958c-4553-a911-50f40085dd80 | 2          | my first workflow |
+---+--------------------------------------+------------+-------------------+
diff --dolt a/dolt_ci_workflow_jobs b/dolt_ci_workflow_jobs
--- a/dolt_ci_workflow_jobs
+++ b/dolt_ci_workflow_jobs
+---+--------------------------------------+--------------+---------------------+---------------------+-------------------+
|   | id                                   | name         | created_at          | updated_at          | workflow_name_fk  |
+---+--------------------------------------+--------------+---------------------+---------------------+-------------------+
| + | 0d88b842-263e-4743-8b9d-9ab189fe2dd6 | check schema | 2025-01-28 11:03:02 | 2025-01-28 11:03:02 | my first workflow |
+---+--------------------------------------+--------------+---------------------+---------------------+-------------------+
diff --dolt a/dolt_ci_workflow_saved_query_step_expected_row_column_results b/dolt_ci_workflow_saved_query_step_expected_row_column_results
--- a/dolt_ci_workflow_saved_query_step_expected_row_column_results
+++ b/dolt_ci_workflow_saved_query_step_expected_row_column_results
+---+--------------------------------------+---------------------------------------+-------------------------------------+-----------------------+--------------------+---------------------+---------------------+--------------------------------------+
|   | id                                   | expected_column_count_comparison_type | expected_column_row_comparison_type | expected_column_count | expected_row_count | created_at          | updated_at          | workflow_saved_query_step_id_fk      |
+---+--------------------------------------+---------------------------------------+-------------------------------------+-----------------------+--------------------+---------------------+---------------------+--------------------------------------+
| + | f9b23079-7d78-423a-869a-e6ae4a05e357 | 1                                     | 1                                   | 1                     | 2                  | 2025-01-28 11:03:02 | 2025-01-28 11:03:02 | 31bab1b9-52ca-47ea-b3a8-4ad5e8cb29a4 |
+---+--------------------------------------+---------------------------------------+-------------------------------------+-----------------------+--------------------+---------------------+---------------------+--------------------------------------+
diff --dolt a/dolt_ci_workflow_saved_query_steps b/dolt_ci_workflow_saved_query_steps
--- a/dolt_ci_workflow_saved_query_steps
+++ b/dolt_ci_workflow_saved_query_steps
+---+--------------------------------------+------------------+-----------------------+--------------------------------------+
|   | id                                   | saved_query_name | expected_results_type | workflow_step_id_fk                  |
+---+--------------------------------------+------------------+-----------------------+--------------------------------------+
| + | 31bab1b9-52ca-47ea-b3a8-4ad5e8cb29a4 | check two tables | 1                     | dc857754-eed3-49dd-9e2a-32873f87a240 |
+---+--------------------------------------+------------------+-----------------------+--------------------------------------+
diff --dolt a/dolt_ci_workflow_steps b/dolt_ci_workflow_steps
--- a/dolt_ci_workflow_steps
+++ b/dolt_ci_workflow_steps
+---+--------------------------------------+-------------------------+------------+-----------+---------------------+---------------------+--------------------------------------+
|   | id                                   | name                    | step_order | step_type | created_at          | updated_at          | workflow_job_id_fk                   |
+---+--------------------------------------+-------------------------+------------+-----------+---------------------+---------------------+--------------------------------------+
| + | dc857754-eed3-49dd-9e2a-32873f87a240 | ensure two tables exist | 1          | 1         | 2025-01-28 11:03:02 | 2025-01-28 11:03:02 | 0d88b842-263e-4743-8b9d-9ab189fe2dd6 |
+---+--------------------------------------+-------------------------+------------+-----------+---------------------+---------------------+--------------------------------------+
diff --dolt a/dolt_ci_workflows b/dolt_ci_workflows
--- a/dolt_ci_workflows
+++ b/dolt_ci_workflows
+---+-------------------+---------------------+---------------------+
|   | name              | created_at          | updated_at          |
+---+-------------------+---------------------+---------------------+
| + | my first workflow | 2025-01-28 11:03:02 | 2025-01-28 11:03:02 |
+---+-------------------+---------------------+---------------------+

The diff output shows the new rows added to each table, and our database is now configured for CI! Once we push the changes to DoltHub, they'll be live.

Conclusion

Dolt CI is available now on DoltHub and DoltLab, and with new features and improvements on the way.

If you'd like to discuss any particular use-cases you have for CI on your Dolt databases, or want to request a feature or two, we'd love to hear from you, so swing by our Discord.

Thanks for reading and don't forget to check out each of our cool products below:

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.