Dolt CI Schema Details
Near the end of 2024 we announced the release of continuous integration (CI) testing on DoltHub. This feature allows you to run automated tests against your DoltHub databases, just like you do for your GitHub repositories. If you'd like to try this out for yourself, check out our docs for an introduction to some general concepts and a guide on getting started.
In an earlier post we outlined our design for adding CI to DoltHub and DoltLab, one modeled largely on GitHub's CI platform, GitHub Actions. Today, we'll be taking a closer look at the technical implementation for adding CI to DoltHub and DoltLab, and we'll do so by starting with CI's implementation in Dolt.
As mentioned in the design blog, we wanted CI configuration for DoltHub and DoltLab to live in and travel with each Dolt database. To do this, we decided to create a set of internal tables used to persist configuration.
In addition, we wanted a user to be able to supply Dolt with a "workflow" yaml
file to define CI on their database. This of course is modeled on the Workflow yaml
file used on GitHub to configure CI on a repository.
Deciding to model CI on DoltHub in this way, led us to the following schema design for Dolt's internal CI tables.
Schema details
Dolt stores CI configuration in the following internal tables. These are created and committed to a Dolt database when a user runs dolt ci init.
- dolt_ci_workflows
- dolt_ci_workflow_events
- dolt_ci_workflow_event_triggers
- dolt_ci_workflow_event_trigger_branches
- dolt_ci_workflow_jobs
- dolt_ci_workflow_steps
- dolt_ci_workflow_saved_query_steps
- dolt_ci_workflow_saved_query_step_expected_row_column_results
These tables are written to and read by the dolt ci command so that users don't need to manually edit each of these tables when configuring CI. To better understand what each table is used for, let's consider the following sample workflow file.
# sample workflow.yaml
name: "my first workflow"
on:
push:
branches:
- main
pull_request:
branches:
- main
- alt
jobs:
- name: "check schema"
steps:
- name: "ensure two tables exist"
saved_query_name: "check two tables"
expected_rows: "== 2"
expected_columns: "== 1"
dolt_ci_workflows
CREATE TABLE `dolt_ci_workflows` (
`name` varchar(2048) COLLATE utf8mb4_0900_ai_ci NOT NULL,
`created_at` datetime(6) NOT NULL,
`updated_at` datetime(6) NOT NULL,
PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
A Workflow for Dolt and DoltHub is similar to a workflow on GitHub. It's a yaml
file that defines what CI jobs checks or tests should run and when they should run. These files are identified by their top-level name
field, which must be unique among workflows. The dolt_ci_workflows
table stores this name, as well as when it was created and last updated.
dolt_ci_workflow_events
CREATE TABLE `dolt_ci_workflow_events` (
`id` varchar(36) NOT NULL,
`event_type` int NOT NULL,
`workflow_name_fk` varchar(2048) COLLATE utf8mb4_0900_ai_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `workflow_name_fk` (`workflow_name_fk`),
CONSTRAINT `dolt_ci_workflow_events_ibfk_1` FOREIGN KEY (`workflow_name_fk`) REFERENCES `dolt_ci_workflows` (`name`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
Events are actions that trigger a workflow to run. These can be things like new data being pushed to DoltHub, or a user opening a pull-request with proposed data changes. Once these actions occur on a database with CI configured, DoltHub will execute the relevant workflow's Jobs.
The dolt_ci_workflow_events
table has a foreign key reference to the dolt_ci_workflows
table, since events are related to a specific workflow. The event_type
is defined here as well, and refers to a type of event that triggers a workflow to run, like the aforementioned push
or pull_request
events.
dolt_ci_workflow_event_triggers
CREATE TABLE `dolt_ci_workflow_event_triggers` (
`id` varchar(36) NOT NULL,
`event_trigger_type` int NOT NULL,
`workflow_event_id_fk` varchar(36) NOT NULL,
PRIMARY KEY (`id`),
KEY `workflow_event_id_fk` (`workflow_event_id_fk`),
CONSTRAINT `dolt_ci_workflow_event_triggers_ibfk_1` FOREIGN KEY (`workflow_event_id_fk`) REFERENCES `dolt_ci_workflow_events` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
The dolt_ci_workflow_event_triggers
stores more specific, conditional information about the Event that triggers a workflow. This is done with the event_trigger_type
column, which stores the conditional information. Take for example the on
section defined in the sample workflow file above.
on:
push:
branches:
- main
pull_request:
branches:
- main
- alt
This section defines two types of events that trigger a workflow: push
and pull_request
, but also defines the conditional information for each event.
For the push
event, the workflow should only run if a push is made to the main
branch. And similarly, the pull_request
event should only trigger this workflow to run if a pull request activity occurs on a pull request involving either the main
or alt
branches.
Dolt defines this conditional information as event trigger types, and stores it in the event_trigger_type
column. Possible types are:
const (
WorkflowEventTriggerTypeUnspecified WorkflowEventTriggerType = iota
WorkflowEventTriggerTypeBranches
WorkflowEventTriggerTypeActivityOpened
WorkflowEventTriggerTypeActivityClosed
WorkflowEventTriggerTypeActivityReopened
WorkflowEventTriggerTypeActivitySynchronized
)
The WorkflowEventTriggerTypeBranches
type indicates that an event has a branch condition, and the name of the branch is stored in the dolt_ci_workflow_event_trigger_branches
table.
The other types refer to "activities" that can occur on a pull request, like opening, closing, reopening, or synchronizing. If the sample workflow included an activity condition, for example:
on:
pull_request:
branches:
- main
- alt
activities:
- opened
- reopened
A row for each event_trigger_type
, WorkflowEventTriggerTypeActivityOpened
and WorkflowEventTriggerTypeActivityReopened
, associated with the pull_request
event would be stored in the dolt_ci_workflow_event_triggers
table.
dolt_ci_workflow_event_trigger_branches
CREATE TABLE `dolt_ci_workflow_event_trigger_branches` (
`id` varchar(36) NOT NULL,
`branch` varchar(1024) COLLATE utf8mb4_0900_ai_ci NOT NULL,
`workflow_event_triggers_id_fk` varchar(36) NOT NULL,
PRIMARY KEY (`id`),
KEY `workflow_event_triggers_id_fk` (`workflow_event_triggers_id_fk`),
CONSTRAINT `dolt_ci_workflow_event_trigger_branches_ibfk_1` FOREIGN KEY (`workflow_event_triggers_id_fk`) REFERENCES `dolt_ci_workflow_event_triggers` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
The dolt_ci_workflow_event_trigger_branches
table, as mentioned above, is a simple join table that has a foreign key reference to the dolt_ci_workflow_event_triggers
table for entries of type WorkflowEventTriggerTypeBranches
.
dolt_ci_workflow_jobs
CREATE TABLE `dolt_ci_workflow_jobs` (
`id` varchar(36) NOT NULL,
`name` varchar(1024) COLLATE utf8mb4_0900_ai_ci NOT NULL,
`created_at` datetime(6) NOT NULL,
`updated_at` datetime(6) NOT NULL,
`workflow_name_fk` varchar(2048) COLLATE utf8mb4_0900_ai_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `workflow_name_fk` (`workflow_name_fk`),
CONSTRAINT `dolt_ci_workflow_jobs_ibfk_1` FOREIGN KEY (`workflow_name_fk`) REFERENCES `dolt_ci_workflows` (`name`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
Jobs define what should happen when a workflow is run, specifically, the programmatic work. This work is broken down into steps, or ordered, individual units of work. Each job is associated with a parent workflow, and must have a unique name.
dolt_ci_workflow_steps
CREATE TABLE `dolt_ci_workflow_steps` (
`id` varchar(36) NOT NULL,
`name` varchar(1024) COLLATE utf8mb4_0900_ai_ci NOT NULL,
`step_order` int NOT NULL,
`step_type` int NOT NULL,
`created_at` datetime(6) NOT NULL,
`updated_at` datetime(6) NOT NULL,
`workflow_job_id_fk` varchar(36) NOT NULL,
PRIMARY KEY (`id`),
KEY `workflow_job_id_fk` (`workflow_job_id_fk`),
CONSTRAINT `dolt_ci_workflow_steps_ibfk_1` FOREIGN KEY (`workflow_job_id_fk`) REFERENCES `dolt_ci_workflow_jobs` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
Steps, in the general sense, are any check or test or specific single unit of work that should be run as part of a job. Steps have a step_order
, which indicates the order in which they should be run, and this is determined by the order in which they are defined in the workflow file.
Though designed with this generalization in mind, Dolt currently only supports a specific step_type
, the "saved query step". Steps of this type are stored in the dolt_ci_workflow_saved_query_steps
table, but additional types and corresponding tables will be added in the future as Dolt's CI support matures.
dolt_ci_workflow_saved_query_steps
CREATE TABLE `dolt_ci_workflow_saved_query_steps` (
`id` varchar(36) NOT NULL,
`saved_query_name` varchar(2048) COLLATE utf8mb4_0900_ai_ci NOT NULL,
`expected_results_type` int NOT NULL,
`workflow_step_id_fk` varchar(36) NOT NULL,
PRIMARY KEY (`id`),
KEY `workflow_step_id_fk` (`workflow_step_id_fk`),
CONSTRAINT `dolt_ci_workflow_saved_query_steps_ibfk_1` FOREIGN KEY (`workflow_step_id_fk`) REFERENCES `dolt_ci_workflow_steps` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
The Saved Query Step runs a specific saved query against the Dolt database. The dolt_ci_workflow_saved_query_steps
table stores the name of the saved query and the type of expected results to use to evaluate the results of the saved query. The actual SQL query associated with the saved query is stored in the database's dolt_query_catalog
table. Let's return to the sample workflow file to get a better understanding of how this works.
jobs:
- name: "check schema"
steps:
- name: "ensure two tables exist"
saved_query_name: "check two tables"
expected_rows: "== 2"
expected_columns: "== 1"
Above, we see the "check schema" job that consists of a single step, the "ensure two tables exist" step. The name of the job and the name of the step indicates the general purpose of this CI check—that it should verify that the database contains exactly two tables.
In our example, the saved query name "check two tables" is mapped to the SQLquery show tables;
. We can see this by inspecting the dolt_query_catalog
table.
dolt sql -q "select * from dolt_query_catalog;"
+------------------+---------------+------------------+--------------+-------------+
| id | display_order | name | query | description |
+------------------+---------------+------------------+--------------+-------------+
| check two tables | 1 | check two tables | show tables; | |
+------------------+---------------+------------------+--------------+-------------+
The name of a saved query is used as the id
in the dolt_query_catalog
table. This name is stored in the dolt_ci_workflow_saved_query_steps
table as the saved_query_name
, so it can be looked up and run during workflow execution.
When this saved query is executed, the results of the "show tables;" query might look like this:
test % dolt sql -q "show tables;"
+----------------+
| Tables_in_test |
+----------------+
| t1 |
| t2 |
+----------------+
The returned output from the saved query is then a single column, "Tables_in_test", with two rows, "t1" and "t2".
This brings us to the expected_results_type
column of the dolt_ci_workflow_saved_query_steps
table. It indicates the type of "expected results" that should be used to evaluate the results returned by executing the saved query.
In the same way that Dolt CI will evolve to support more types of steps, it will also evolve to support more types of "expected results". Currently, Dolt CI only supports the "expected row and column count results type", which stores expected counts in the dolt_ci_workflow_saved_query_step_expected_row_column_results
table.
dolt_ci_workflow_saved_query_step_expected_row_column_results
CREATE TABLE `dolt_ci_workflow_saved_query_step_expected_row_column_results` (
`id` varchar(36) NOT NULL,
`expected_column_count_comparison_type` int NOT NULL,
`expected_column_row_comparison_type` int NOT NULL,
`expected_column_count` bigint NOT NULL,
`expected_row_count` bigint NOT NULL,
`created_at` datetime(6) NOT NULL,
`updated_at` datetime(6) NOT NULL,
`workflow_saved_query_step_id_fk` varchar(36) NOT NULL,
PRIMARY KEY (`id`),
KEY `workflow_saved_query_step_id_fk` (`workflow_saved_query_step_id_fk`),
CONSTRAINT `dolt_ci_workflow_saved_query_step_expected_row_column_results_ibfk_1` FOREIGN KEY (`workflow_saved_query_step_id_fk`) REFERENCES `dolt_ci_workflow_saved_query_steps` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin;
As indicated by the name, the dolt_ci_workflow_saved_query_step_expected_row_column_results
table stores the number of expected rows and columns returned from a saved query. Take for example the sample workflow file again.
steps:
saved_query_name: "check two tables"
expected_rows: "== 2"
expected_columns: "== 1"
The file indicates that for the saved query "check two tables" which runs the show tables;
query, we expect 2 rows and 1 column to be returned in the SQL results. These expected counts are stored in the expected_row_count
and expected_column_count
columns respectively.
Additionally, the expected_row_count_comparison_type
and expected_column_count_comparison_type
column indicates the type of comparison operation that should be used to assert the expected count matches the actual count. In the file snippet above, the ==
operator indicates equality, so if the expected count does not equal the actual count, the workflow run will fail.
Available comparison types are:
const (
WorkflowSavedQueryExpectedRowColumnComparisonTypeUnspecified WorkflowSavedQueryExpectedRowColumnComparisonType = iota
WorkflowSavedQueryExpectedRowColumnComparisonTypeEquals // ==
WorkflowSavedQueryExpectedRowColumnComparisonTypeNotEquals // !=
WorkflowSavedQueryExpectedRowColumnComparisonTypeLessThan // <
WorkflowSavedQueryExpectedRowColumnComparisonTypeGreaterThan // >
WorkflowSavedQueryExpectedRowColumnComparisonTypeLessThanOrEqual // <=
WorkflowSavedQueryExpectedRowColumnComparisonTypeGreaterThanOrEqual // >=
)
Checking out the diff
Using the above schema definitions in Dolt, we've been able to make continuous integration testing on data with DoltHub, a reality.
To further illustrate these tables in action, let's import our sample workflow file into a Dolt database and check out the diff of the successful import. The dolt ci import command will parse our workflow file and update the various CI tables on our behalf.
# make a new database
% mkdir test
% cd test
test % dolt init
# initialize CI
test % dolt ci init
# create the sample workflow.yaml
test % cat <<EOF > workflow.yaml
name: "my first workflow"
on:
push:
branches:
- main
pull_request:
branches:
- main
- alt
jobs:
- name: "check schema"
steps:
- name: "ensure two tables exist"
saved_query_name: "check two tables"
expected_rows: "== 2"
expected_columns: "== 1"
EOF
# import the workflow.yaml
test % dolt ci import workflow.yaml
# check out the diff
test % dolt diff HEAD~1
diff --dolt a/dolt_ci_workflow_event_trigger_branches b/dolt_ci_workflow_event_trigger_branches
--- a/dolt_ci_workflow_event_trigger_branches
+++ b/dolt_ci_workflow_event_trigger_branches
+---+--------------------------------------+--------+--------------------------------------+
| | id | branch | workflow_event_triggers_id_fk |
+---+--------------------------------------+--------+--------------------------------------+
| + | 80de7968-8938-4cc3-b90c-e1bf26666d93 | main | 98ae18c7-a5ef-4b75-b3db-2b498a8928f7 |
| + | 951d916c-3787-4cf6-903e-51d4d435a38b | alt | 98ae18c7-a5ef-4b75-b3db-2b498a8928f7 |
| + | ba78997d-1a8a-4805-acbe-7d1c36bcf519 | main | f8dab257-b845-4a84-9714-cd31a20e028e |
+---+--------------------------------------+--------+--------------------------------------+
diff --dolt a/dolt_ci_workflow_event_triggers b/dolt_ci_workflow_event_triggers
--- a/dolt_ci_workflow_event_triggers
+++ b/dolt_ci_workflow_event_triggers
+---+--------------------------------------+--------------------+--------------------------------------+
| | id | event_trigger_type | workflow_event_id_fk |
+---+--------------------------------------+--------------------+--------------------------------------+
| + | 98ae18c7-a5ef-4b75-b3db-2b498a8928f7 | 1 | ed21a1be-958c-4553-a911-50f40085dd80 |
| + | f8dab257-b845-4a84-9714-cd31a20e028e | 1 | a0038a05-87b5-4b79-a026-0d7addeeab37 |
+---+--------------------------------------+--------------------+--------------------------------------+
diff --dolt a/dolt_ci_workflow_events b/dolt_ci_workflow_events
--- a/dolt_ci_workflow_events
+++ b/dolt_ci_workflow_events
+---+--------------------------------------+------------+-------------------+
| | id | event_type | workflow_name_fk |
+---+--------------------------------------+------------+-------------------+
| + | a0038a05-87b5-4b79-a026-0d7addeeab37 | 1 | my first workflow |
| + | ed21a1be-958c-4553-a911-50f40085dd80 | 2 | my first workflow |
+---+--------------------------------------+------------+-------------------+
diff --dolt a/dolt_ci_workflow_jobs b/dolt_ci_workflow_jobs
--- a/dolt_ci_workflow_jobs
+++ b/dolt_ci_workflow_jobs
+---+--------------------------------------+--------------+---------------------+---------------------+-------------------+
| | id | name | created_at | updated_at | workflow_name_fk |
+---+--------------------------------------+--------------+---------------------+---------------------+-------------------+
| + | 0d88b842-263e-4743-8b9d-9ab189fe2dd6 | check schema | 2025-01-28 11:03:02 | 2025-01-28 11:03:02 | my first workflow |
+---+--------------------------------------+--------------+---------------------+---------------------+-------------------+
diff --dolt a/dolt_ci_workflow_saved_query_step_expected_row_column_results b/dolt_ci_workflow_saved_query_step_expected_row_column_results
--- a/dolt_ci_workflow_saved_query_step_expected_row_column_results
+++ b/dolt_ci_workflow_saved_query_step_expected_row_column_results
+---+--------------------------------------+---------------------------------------+-------------------------------------+-----------------------+--------------------+---------------------+---------------------+--------------------------------------+
| | id | expected_column_count_comparison_type | expected_column_row_comparison_type | expected_column_count | expected_row_count | created_at | updated_at | workflow_saved_query_step_id_fk |
+---+--------------------------------------+---------------------------------------+-------------------------------------+-----------------------+--------------------+---------------------+---------------------+--------------------------------------+
| + | f9b23079-7d78-423a-869a-e6ae4a05e357 | 1 | 1 | 1 | 2 | 2025-01-28 11:03:02 | 2025-01-28 11:03:02 | 31bab1b9-52ca-47ea-b3a8-4ad5e8cb29a4 |
+---+--------------------------------------+---------------------------------------+-------------------------------------+-----------------------+--------------------+---------------------+---------------------+--------------------------------------+
diff --dolt a/dolt_ci_workflow_saved_query_steps b/dolt_ci_workflow_saved_query_steps
--- a/dolt_ci_workflow_saved_query_steps
+++ b/dolt_ci_workflow_saved_query_steps
+---+--------------------------------------+------------------+-----------------------+--------------------------------------+
| | id | saved_query_name | expected_results_type | workflow_step_id_fk |
+---+--------------------------------------+------------------+-----------------------+--------------------------------------+
| + | 31bab1b9-52ca-47ea-b3a8-4ad5e8cb29a4 | check two tables | 1 | dc857754-eed3-49dd-9e2a-32873f87a240 |
+---+--------------------------------------+------------------+-----------------------+--------------------------------------+
diff --dolt a/dolt_ci_workflow_steps b/dolt_ci_workflow_steps
--- a/dolt_ci_workflow_steps
+++ b/dolt_ci_workflow_steps
+---+--------------------------------------+-------------------------+------------+-----------+---------------------+---------------------+--------------------------------------+
| | id | name | step_order | step_type | created_at | updated_at | workflow_job_id_fk |
+---+--------------------------------------+-------------------------+------------+-----------+---------------------+---------------------+--------------------------------------+
| + | dc857754-eed3-49dd-9e2a-32873f87a240 | ensure two tables exist | 1 | 1 | 2025-01-28 11:03:02 | 2025-01-28 11:03:02 | 0d88b842-263e-4743-8b9d-9ab189fe2dd6 |
+---+--------------------------------------+-------------------------+------------+-----------+---------------------+---------------------+--------------------------------------+
diff --dolt a/dolt_ci_workflows b/dolt_ci_workflows
--- a/dolt_ci_workflows
+++ b/dolt_ci_workflows
+---+-------------------+---------------------+---------------------+
| | name | created_at | updated_at |
+---+-------------------+---------------------+---------------------+
| + | my first workflow | 2025-01-28 11:03:02 | 2025-01-28 11:03:02 |
+---+-------------------+---------------------+---------------------+
The diff output shows the new rows added to each table, and our database is now configured for CI! Once we push the changes to DoltHub, they'll be live.
Conclusion
Dolt CI is available now on DoltHub and DoltLab, and with new features and improvements on the way.
If you'd like to discuss any particular use-cases you have for CI on your Dolt databases, or want to request a feature or two, we'd love to hear from you, so swing by our Discord.
Thanks for reading and don't forget to check out each of our cool products below:
- Dolt—it's Git for data.
- Doltgres—it's Dolt + PostgreSQL.
- DoltHub—it's GitHub for data.
- DoltLab—it's GitLab for data.
- Hosted Dolt—it's RDS for Dolt databases.
- Dolt Workbench—it's a SQL workbench for Dolt databases.