So you want a Temporal Database?
When I was writing my opus on Slowly Changing Dimension, I came across the term "temporal database". I had not heard of a "temporal database" before. I had heard of "temporal tables" but not a "temporal database". After doing a bit of research, I decided a "So you want..." blog was in order.
Here at DoltHub, we built Dolt, the world's first and only version controlled SQL database. Is a version controlled database considered a temporal database? If so, how? What's the difference? This article attempts to answer this and other questions about temporal databases and version controlled databases. Finally, we conclude with a survey of your options if you want a temporal database, focusing on open source offerings.
What is a Temporal Database?
Wikipedia offers the following definition:
A temporal database stores data relating to time instances. It offers temporal data types and stores information relating to past, present and future time.
This is a bit hard for me to parse. I'll offer a simpler definition. A temporal database is a database with built-in time period and history functionality.
Before temporal databases you would use Soft Deletes and/or Slowly Changing Dimension in your application to mimic the functionality of a temporal database. Temporal databases provide that functionality at the database layer, potentially obviating the need for soft deletes or slowly changing dimension at the application layer.
Let's break it down even further.
What does Database mean?
When most people hear database, they think tables and SQL. In the case of temporal databases, there are a few NoSQL options, mostly document and graph databases. I'll discuss SQL and NoSQL databases in the products section of this article. Since I'm more familiar with SQL as we built Dolt, the world's first version controlled SQL database, I'll focus on tables and SQL for the explanation sections.
What does Temporal mean?
This is where we get a little mind-bendy. For temporal databases, we define three types of time:
- Valid Time - The period in which the record is true.
- Transaction Time - The time the system changed a record.
- Decision Time - The time the fact in the record was deemed true.
If you implement one, two or three of these types of time you are a Uni-, Bi-, and Tri-temporal database, respectively. This kind of seems like non-helpful jargon to me. But people use it in their marketing so some people must like it.
Note, valid time and decision time is a way to model data and is not generally applicable to all data. Transaction time is applicable to all data to produce audit logs instead of using soft deletes. Thus, transaction time or system time functionality is the temporal database feature most widely used.
This makes more sense if you walk through an example. I used this example to explain Slowly Changing Dimension.
I'm going to have an employee table with employee_id
, last_name
, first_name
, and title
columns. We're going to have an employee last name, "Sehn", first name, "Tim", start as a "Software Engineer Intern", become a "Software Engineer", and finally become a "Software Engineer Manager". This is a simplified version of a table one would expect to find in most Enterprise Resource Planning (ERP) software.
In a non-temporal world, your table would evolve like this. No history. Just how the record looks at the time you read it. The table starts as:
+-------------+-----------+------------+--------------------------+
| employee_id | last_name | first_name | title |
+-------------+-----------+------------+--------------------------+
| 1 | Sehn | Tim | Software Engineer Intern |
+-------------+-----------+------------+--------------------------+
Then becomes:
+-------------+-----------+------------+-------------------+
| employee_id | last_name | first_name | title |
+-------------+-----------+------------+-------------------+
| 1 | Sehn | Tim | Software Engineer |
+-------------+-----------+------------+-------------------+
And finally becomes:
+-------------+-----------+------------+---------------------------+
| employee_id | last_name | first_name | title |
+-------------+-----------+------------+---------------------------+
| 1 | Sehn | Tim | Software Engineer Manager |
+-------------+-----------+------------+---------------------------+
To make this example temporal, I need to add time. Let's say these promotions are effective January 1, 2020, 2021, and 2022. I'm also going to reflect a pretty typical, but inefficient, promotion process. The committee who meets to discuss promotions meets late on January 7. Human Resources does not enter the information until January 14. The promotion is effective January 1. Thus, we have a valid time, a transaction time, and a decision time.
This is how that table would look in a normal SQL database. In addition to our core columns, we must add valid_from
, valid_to
, decision_from
, decision _to
, transaction_from
, and transaction_to
. The application would be responsible for handling the updating and inserting of the valid_from
, valid_to
, decision_from
, decision _to
, transaction_from
, and transaction_to
columns. In these examples we use NULL
to represent now or no end point. Note, modeling a table this way requires adding columns to the primary key. I added transaction_from
for this example.
show create table employees
+-----------+------------------------------------------------------------------+
| Table | Create Table |
+-----------+------------------------------------------------------------------+
| employees | CREATE TABLE `employees` ( |
| | `id` int NOT NULL, |
| | `last_name` varchar(100), |
| | `first_name` varchar(100), |
| | `title` varchar(100), |
| | `valid_from` datetime(6), |
| | `valid_to` datetime(6), |
| | `decision_from` datetime(6), |
| | `decision_to` datetime(6), |
| | `transaction_from` datetime(6) NOT NULL, |
| | `transaction_to` datetime(6), |
| | PRIMARY KEY (`id`,`transaction_from`) |
| | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin |
+-----------+------------------------------------------------------------------+
select * from employees
+----+-----------+------------+---------------------------+---------------------+---------------------+---------------------+-------------+---------------------+---------------------+
| id | last_name | first_name | title | valid_from | valid_to | decision_from | decision_to | transaction_from | transaction_to |
+----+-----------+------------+---------------------------+---------------------+---------------------+---------------------+-------------+---------------------+---------------------+
| 1 | Sehn | Tim | Software Engineer Intern | 2020-01-01 00:00:00 | 2021-01-01 00:00:00 | 2020-01-07 00:00:00 | NULL | 2020-01-14 00:00:00 | 2020-01-21 00:00:00 |
| 1 | Sehn | Tim | Software Engineer | 2021-01-01 00:00:00 | 2022-01-01 00:00:00 | 2021-01-07 00:00:00 | NULL | 2021-01-14 00:00:00 | 2022-01-21 00:00:00 |
| 1 | Sehn | Tim | Software Engineer Manager | 2022-01-01 00:00:00 | NULL | 2022-01-07 00:00:00 | NULL | 2022-01-14 00:00:00 | NULL |
+----+-----------+------------+---------------------------+---------------------+---------------------+---------------------+-------------+---------------------+---------------------+
How would this table change in a temporal database? For SQL, temporal functionality was added to the SQL standard in SQL:2011. We'll lean on that syntax here.
Time Periods
The first thing temporal databases add is the concept of a time period. Temporal databases allow you to define a time period and add constraints like "does not overlap" to time periods. For our example, we would model valid_from
, valid_to
, decision_from
, decision _to
, transaction_from
, and transaction_to
as time periods and likely add a "without overlaps" constraint.
Moreover, we could add one or more of these ranges to the primary key instead of just adding transaction_from
. You can also add foreign key constraints from other tables on these ranges.
+-----------+------------------------------------------------------------------+
| Table | Create Table |
+-----------+------------------------------------------------------------------+
| employees | CREATE TABLE `employees` ( |
| | `id` int NOT NULL, |
| | `last_name` varchar(100), |
| | `first_name` varchar(100), |
| | `title` varchar(100), |
| | `valid_from` datetime(6), |
| | `valid_to` datetime(6), |
| | `decision_from` datetime(6), |
| | `decision_to` datetime(6), |
| | `transaction_from` datetime(6) NOT NULL, |
| | `transaction_to` datetime(6), |
| | PERIOD FOR valid(`valid_from`, `valid_to`) |
| | PERIOD FOR decision(`decision_from`, `decision_to`) |
| | PERIOD FOR transaction(`transaction_from`, `transaction_to`) |
| | UNIQUE (id, valid WITHOUT OVERLAPS) |
| | UNIQUE (id, decision WITHOUT OVERLAPS) |
| | UNIQUE (id, transaction WITHOUT OVERLAPS) |
| | PRIMARY KEY (`id`,`transaction_from`) |
| | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin |
+-----------+------------------------------------------------------------------+
History
The second thing a temporal database adds is automated history. In a temporal database, the transaction_from
and transaction_to
columns can be hidden and automatically updated by the system. The system knows when transactions are being made. You can then access history via special SQL queries like select * from employees as of system time '2021-09-04 00:00:00'
or select * from employees versions between '2021-09-04 00:00:00' and '2022-09-04 00:00:00'
.
+-----------+-----------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------+-----------------------------------------------------------------------------------------+
| employees | CREATE TABLE `employees` ( |
| | `id` int NOT NULL, |
| | `last_name` varchar(100), |
| | `first_name` varchar(100), |
| | `title` varchar(100), |
| | `valid_from` datetime(6), |
| | `valid_to` datetime(6), |
| | `decision_from` datetime(6), |
| | `decision_to` datetime(6), |
| | PERIOD FOR valid(`valid_from`, `valid_to`) |
| | PERIOD FOR decision(`decision_from`, `decision_to`) |
| | UNIQUE (id, valid WITHOUT OVERLAPS) |
| | UNIQUE (id, decision WITHOUT OVERLAPS) |
| | PRIMARY KEY (`id`,`transaction_from`) |
| | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin WITH SYSTEM VERSIONING |
+-----------+-----------------------------------------------------------------------------------------+
Is a Version Controlled Database a Temporal Database?
Yes. From the temporal database wikipedia:
Temporal databases were one of the earliest forms of data version control, and influenced the development of modern data versioning systems.
We built a cool version controlled database so we're biased, but we agree. Version controlled databases by default store history and are thus uni-temporal databases. If the version controlled database supports time periods, it can be made bi-temporal or tri-temporal. No one has asked for time periods in Dolt yet. If you want them file an issue.
Why you need a Temporal Database
History
The primary reason for a temporal database is history. As storage becomes cheaper, why not keep the history of all your tables? Internal audit wants an audit log. Your warehouse uses the history for Change Data Capture. You can rollback easily if someone runs a bad query. In retrospect, it's kind of weird database updates can be destructive.
Beyond history is version control: the ability to diff, branch, and merge different histories. There is a new class of temporal databases that are also version controlled databases.
Better Time Modeling
If you have time period data in your data model, temporal databases allow you to model that data more efficiently. The database ensures data quality instead of your application. Having overlapping time constraints in primary keys and foreign keys is powerful.
How do Temporal Databases work?
Time periods are implemented using custom syntax and code. Where things get interesting is how databases store all the history of tables efficiently.
History Tables
History tables are like Soft Deletes or Slowly Changing Dimension but the database does it for you. The tables are hidden from you and accessed via the temporal SQL syntax. The issue with these solutions is scale. History tables do not share storage between versions. Particularly for wide schemas, storing the history of every edit can be expensive in storage.
Structural Sharing
There is a new class of storage systems built specifically for temporal and version controlled databases that only stores the differences between versions. There are a bunch of different approaches and algorithms. The approach is new so a standard has not emerged.
Dolt implements a data structure like a B-tree called a Prolly tree that can be compared in time proportional to the difference between the trees. These trees can be placed in a Merkle DAG to give full Git-style version control including diffs, branches, and merges. I'm biased but I think this is the best approach to structural sharing for databases.
Products
I have divided temporal databases into three product categories: temporal tables in SQL databases, NoSQL temporal databases, and version controlled databases.
Temporal Tables in SQL Databases
The "Big 3" closed source SQL databases all support some flavor of temporal tables: SQL Server, Oracle, and IBM Db2. In these articles I focus on open source solutions but if you are interested in proprietary solutions, feel free to contact sales at any of these large companies.
As fas as open source solutions to temporal tables you have MariaDB or Postgres (via extension). Notably absent are MySQL and SQLite. Of these two solutions MariaDB supports temporal tables in the core product and sticks to the SQL:2011 standard most closely. I would recommend MariaDB for those interested in temporal tables in existing SQL databases. It's what I used to demonstrate the syntax in the above example.
The issue with these solutions is scale. Temporal tables in these databases are implemented on top of hidden history tables. History tables do not share storage between versions. Particularly for wide schemas, storing the history of every edit can be expensive in storage.
NoSQL Temporal Databases
I ignored MarkLogic as it is freeware, not open source.
SirixDB
- Tagline
- An embeddable, temporal, evolutionary database system
- Initial Release
- September 2012
- GitHub
- https://github.com/sirixdb/sirix
SirixDB has been around for over a decade. The GitHub project is not super active. I almost did not include it given the inactivity.
SirixDB is an immutable database that advertises itself as an embeddable, temporal, evolutionary database system. You can store XML of JSON data in SirixDB. It is written in Java.
SirixDB supports structural sharing of database revisions using "a novel versioning algorithm called sliding snapshot". This algorithm works at the database page level. The linked article goes into fairly good detail. From a versioning perspective the ideas seem pretty sound.
XTDB
- Tagline
- General-purpose bitemporal database for SQL, Datalog & graph queries.
- Initial Release
- September 2020
- GitHub
- https://github.com/xtdb/xtdb
This is my first time researching XTDB though I have come across it in other contexts. XTDB is supported by JUXT, a financial services consulting company. I assume they saw the need for XTDB in their financial services consulting which makes sense given the need for audit logs in financial applications.
XTDB is a novel database as it allows you to stitch together any transaction log and document the storage engine you want into a temporal database. They call this architecture an "unbundled architectural approach". They put the transaction log at the center of the architecture which they call "turning the database inside out". I like the way they market this. My issue with their approach is it seems like a lot of set up and infrastructure to run. From this image, it looks like I need to run three pieces of infrastructure to get a running system. Maybe the flexibility is worth the complexity?
XTDB supports SQL though it is not a native SQL database. Tables are stored as documents. I would call XTDB's SQL support "SQL-like". Read their documentation for more details. To access historical data you use the Datalog interface, specifically the time travel or History APIs. Writes seem to be natively versioned.
Anyway, the XTDB project is very active and new. I expect it to continue to get better in the future.
Version Controlled Databases
Terminus DB
- Tagline
- Making Data Collaboration Easy
- Initial Release
- October 2019
- GitHub
- https://github.com/terminusdb/terminusdb
TerminusDB is a NoSQL version controlled database so it could be in either category. TerminusDB is a graph database. Since I started writing these So you want... blogs, TerminusDB has been Dolt's partner in the version controlled database category. They recently branched out and released TerminusCMS and VectorLink. Hopefully, they are still grinding on their core versioned graph database. We don't want to be alone down here.
Anyway, TerminusDB has full schema and data versioning capability on a graph database using a custom query language called Web Object Query Language (WOQL). WOQL is schema optional. TerminusDB also has the option to query JSON directly, similar to MongoDB, giving users a more document database style interface.
The versioning syntax is exposed via TerminusDB Console or a command line interface. The versioning metaphors are similar to Git. You branch, push, and pull. See their documentation for more information.
Dolt
- Tagline
- It's Git for Data
- Initial Release
- August 2019
- GitHub
- https://github.com/dolthub/dolt
Finally, we have Dolt. Dolt implements Git-style versioning on a SQL database. Dolt is the only SQL database you can diff, branch, and merge. These features are in addition to temporal time travel features.
Dolt is technically uni-temporal as it implements a built in log via the Git-style commit system. Dolt does not implement time period functionality but if it did it could be made bi-temporal or tri-temporal. If you want time period support, file an issue.
Dolt implements as of
and versions between
queries. In addition to timestamps these queries can be made using commits, branches, or tags, augmenting time travel functionality.
Let's see how Dolt works for our example. We make a Dolt commit every time we want to access the database temporally.
mysql> create table employees (
`employee_id` int,
`last_name` varchar(100),
`first_name` varchar(100),
`title` varchar(100),
primary key(`employee_id`));
Query OK, 0 rows affected (0.02 sec)
mysql> call dolt_commit('-Am', 'Created Table');
+----------------------------------+
| hash |
+----------------------------------+
| kbas1fkidnht61ekeivf28a3r4circ6g |
+----------------------------------+
1 row in set (0.02 sec)
mysql> insert into employees values (1,'Sehn','Tim','Software Engineer Intern');
Query OK, 1 row affected (0.02 sec)
mysql> call dolt_commit('-Am', 'Added Tim Sehn');
+----------------------------------+
| hash |
+----------------------------------+
| hp705lj2ehefnuk6352fs05nm3m0cc6t |
+----------------------------------+
1 row in set (0.01 sec)
mysql> update employees set title='Software Engineer' where employee_id=1;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> call dolt_commit('-Am', 'Rehired Tim Sehn full time');
+----------------------------------+
| hash |
+----------------------------------+
| k27ecdhqh3igk46a23mcqq0mg90pjre5 |
+----------------------------------+
1 row in set (0.02 sec)
mysql> update employees set title='Software Engineer Manager' where employee_id=1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> call dolt_commit('-Am', 'Promoted Tim Sehn');
+----------------------------------+
| hash |
+----------------------------------+
| 0vlrmae4kc2d10guqp5vdijlnefs0sjb |
+----------------------------------+
1 row in set (0.02 sec)
mysql> select * from employees where employee_id=1;
+-------------+-----------+------------+---------------------------+
| employee_id | last_name | first_name | title |
+-------------+-----------+------------+---------------------------+
| 1 | Sehn | Tim | Software Engineer Manager |
+-------------+-----------+------------+---------------------------+
1 row in set (0.01 sec)
mysql> select * from dolt_history_employees where employee_id=1;
+-------------+-----------+------------+---------------------------+----------------------------------+-----------+-------------------------+
| employee_id | last_name | first_name | title | commit_hash | committer | commit_date |
+-------------+-----------+------------+---------------------------+----------------------------------+-----------+-------------------------+
| 1 | Sehn | Tim | Software Engineer Manager | 0vlrmae4kc2d10guqp5vdijlnefs0sjb | root | 2023-06-21 18:32:24.81 |
| 1 | Sehn | Tim | Software Engineer | k27ecdhqh3igk46a23mcqq0mg90pjre5 | root | 2023-06-21 18:31:58.486 |
| 1 | Sehn | Tim | Software Engineer Intern | hp705lj2ehefnuk6352fs05nm3m0cc6t | root | 2023-06-21 18:31:04.552 |
+-------------+-----------+------------+---------------------------+----------------------------------+-----------+-------------------------+
We're a little biased but we think temporal databases are going to converge with version controlled databases. Once you store history, you may as well have diff, branch, and merge as well. If you're interested in learning more, come by our Discord and chat.