Dolt for Beginners: Schemas

REFERENCE
7 min read

The Dolt for Beginners series focuses on topics people getting started with Dolt will likely be interested in. Most of our other blog articles go pretty deep so we want a space to talk about topics that experts may find boring.

Dolt for Beginners

Dolt is a SQL database. In SQL databases, data is stored in tables. Tables define named columns with types and some other attributes. The set of table definitions, plus some other advanced stuff, is called a schema. This blog will show you how to build a really basic schema for your Dolt database.

Install Dolt

The first thing you need to do to follow along with this article is install Dolt. Dolt is not complicated software. There is no complicated install process. You don't need a Docker container or multiple dependencies installed. You download the single Dolt program and run it.

We have even more convenient ways to install Dolt for every platform.

Once you have Dolt installed, you need to open a terminal like "Terminal" on Mac or Powershell on Windows. Make sure Dolt is on your PATH by typing dolt. If everything is working you'll be greeted by a help message with all the valid Dolt commands. For those familiar with the Git command line, Dolt should look pretty similar.

$ dolt
Valid commands for dolt are
                init - Create an empty Dolt data repository.
              status - Show the working tree status.
                 add - Add table changes to the list of staged table changes.
                diff - Diff a table.
... <trimmed for length>
... <trimmed for length>
... <trimmed for length>
              reflog - Show history of named refs.
              rebase - Reapplies commits on top of another base tip
                  ci - Commands for working with Dolt continuous integration configuration.

Create a Database

We're going to focus on the command-line interface (CLI) to Dolt today. You can also run Dolt as a server and connect to is with a SQL Workbench of your choice. But for demos and blog articles, the CLI is handy.

Open a shell, like Terminal on Mac, or Powershell on Windows. Navigate to the directory where you want your Dolt databases stored. I put mine in ~/dolt. Let's call our database schema_example. To create a database, we use dolt init. Dolt names databases after the directory they are created in so we make a schema_example directory and cd into it.

$ mkdir schema_example
$ cd schema_example 
$ dolt init
Successfully initialized dolt data repository.

We can see that we have a new database with the handy SHOW DATABASES SQL statement. We can now execute any SQL against Dolt using dolt sql -q.

$ dolt sql -q "SHOW DATABASES"
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| schema_example     |
+--------------------+

Those other two databases, information_schema and mysql are provided for MySQL compatibility. You can safely ignore them for now.

Create a Table

Now, it's time to create a simple table. I will make a people table with first_name, last_name, and location columns. I create a table with the CREATE TABLE SQL statement. Again, I can run any SQL from the Dolt CLI. All of my columns are sets of characters of varying length, so in SQL terms, my columns are of type varchar or "variable length character". varchar columns require a maximum length so I'll set that to 100 to be safe.

$ dolt sql -q "CREATE TABLE people(first_name varchar(100), last_name varchar(100), location varchar(100))"

Looks like that worked. At least I did not get an error. Let's make sure using SHOW TABLES.

$ dolt sql -q "SHOW TABLES"
+--------------------------+
| Tables_in_schema_example |
+--------------------------+
| people                   |
+--------------------------+

Let's also make sure I got the table schema I expected. To inspect the schema of a table, I can use one of two SQL statements, DESCRIBE or SHOW CREATE TABLE. I prefer SHOW CREATE TABLE because it shows me the CREATE TABLE statement I would need to run to recreate the table I have. Let's SHOW CREATE TABLE people.

$ dolt sql -q "SHOW CREATE TABLE people"
+--------+------------------------------------------------------------------+
| Table  | Create Table                                                     |
+--------+------------------------------------------------------------------+
| people | CREATE TABLE `people` (                                          |
|        |   `first_name` varchar(100),                                     |
|        |   `last_name` varchar(100),                                      |
|        |   `location` varchar(100)                                        |
|        | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin |
+--------+------------------------------------------------------------------+

As you can see, I got the table I expected. It has three columns all of varchar(100) type. You can ignore the advanced junk (ie. ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin) after the table definition. Those are the default engine, character set, and collation for Dolt.

Add a Primary Key

The table we made above is called a "keyless table" because it has no primary key. A primary key is a unique identifier for each row in the table. SQL databases use primary keys for a number of purposes including speeding up queries. In addition to those purposes, Dolt uses primary keys to identify rows across versions. In the SQL database world, you almost always want your tables to have a primary key. This is why every service you sign up for gives you an account number. That's the primary key for the system.

To add a primary key to our table, we will use an ALTER TABLE query. Whereas CREATE TABLE makes a new table schema, ALTER TABLE modifies an existing table schema. We will add a numerical column called id. Dolt supports a number of numerical types but we'll make our id an integer. We tell Dolt this is the primary key column by appending primary key to the column definition.

$ dolt sql -q "alter table people add column id int primary key"

Now, let's inspect our handy-work with SHOW CREATE TABLE.

$ dolt sql -q "SHOW CREATE TABLE people"                        
+--------+------------------------------------------------------------------+
| Table  | Create Table                                                     |
+--------+------------------------------------------------------------------+
| people | CREATE TABLE `people` (                                          |
|        |   `first_name` varchar(100),                                     |
|        |   `last_name` varchar(100),                                      |
|        |   `location` varchar(100),                                       |
|        |   `id` int NOT NULL,                                             |
|        |   PRIMARY KEY (`id`)                                             |
|        | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin |
+--------+------------------------------------------------------------------+

Now, I have a keyed people table.

Create Another Table

SQL databases can have multiple tables. Let's add a locations table to our database. This table will have a id column as the primary key, this time from the start, and city, state, and country columns all of type varchar(100).

$ dolt sql -q "CREATE TABLE location(id int primary key, city varchar(100), state varchar(100), country varchar(100))"

Now, you'll remember I have a location column of type varchar(100) in my original people table. Let's change that to a location id so it will be easy to join these two tables together in the future. To do that, I change the column name and type of the location column in the people table using ALTER TABLE. This requires two alters, one to rename the column and one to change the type.

$ dolt sql -q "alter table people rename column location to location_id"
$ dolt sql -q "alter table people modify column location_id int"   

Now, let's inspect the state of my two tables.

$ dolt sql -q "SHOW CREATE TABLE people"
+--------+------------------------------------------------------------------+
| Table  | Create Table                                                     |
+--------+------------------------------------------------------------------+
| people | CREATE TABLE `people` (                                          |
|        |   `first_name` varchar(100),                                     |
|        |   `last_name` varchar(100),                                      |
|        |   `location_id` int,                                             |
|        |   `id` int NOT NULL,                                             |
|        |   PRIMARY KEY (`id`)                                             |
|        | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin |
+--------+------------------------------------------------------------------+

$ dolt sql -q "SHOW CREATE TABLE location" 
+----------+------------------------------------------------------------------+
| Table    | Create Table                                                     |
+----------+------------------------------------------------------------------+
| location | CREATE TABLE `location` (                                        |
|          |   `id` int NOT NULL,                                             |
|          |   `city` varchar(100),                                           |
|          |   `state` varchar(100),                                          |
|          |   `country` varchar(100),                                        |
|          |   PRIMARY KEY (`id`)                                             |
|          | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin |
+----------+------------------------------------------------------------------+

This is a common schema pattern you'll see in a lot of databases. In order to reduce the width of tables and to make sure common information is only stored once, database schemas often factor common data into separate tables "related" by id columns. This is why SQL databases are often referred to as relational databases. There's even a special relation you can create called a foreign key that formalizes and enforces rules on this relation. But that's a bit advanced and this is our beginners series.

Create a View

The last element of schema I'll touch on in this article is called a "view". A view is a table derived from other tables in your database using a SQL query. You select from views just like you would select from tables but the data is generated using SQL. So you define a SQL query that makes a view of other tables.

In my example, I'll create a view that lists people and their location all in one flat table called alldata. To do this I must join the people and locations table. The SQL to do this is a simple JOIN.

select first_name, last_name, city, state, country 
from people 
join location on people.location_id=location.id

Now I send that SQL into a CREATE VIEW statement to create a view.

$ dolt sql -q "CREATE VIEW alldata as select first_name, last_name, city, state, country from people join location on people.location_id=location.id"

Like tables, to inspect the schema of a view I use SHOW CREATE VIEW.

$ dolt sql -q "SHOW CREATE VIEW alldata"
+---------+-----------------------------------------------------------------------------------------------------------------------------------------+----------------------+----------------------+
| View    | Create View                                                                                                                             | character_set_client | collation_connection |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------+----------------------+----------------------+
| alldata | CREATE VIEW `alldata` AS select first_name, last_name, city, state, country from people join location on people.location_id=location.id | utf8mb4              | utf8mb4_0900_bin     |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------+----------------------+----------------------+

Looks good to me.

Make sure it all Works

Now, to make sure this all works, let's insert a couple records into each table and select from the alldata view.

$ dolt sql -q "insert into location values (0, 'Santa Monica', 'CA', 'USA'), (1, 'Seattle', 'WA', 'USA')";  
Query OK, 2 rows affected (0.01 sec)
$ dolt sql -q "insert into people values ('Tim', 'Sehn', 0, 0), ('Zach', 'Musgrave', 1, 1)"
Query OK, 2 rows affected (0.01 sec)
$ dolt sql -q "select * from alldata"
+------------+-----------+--------------+-------+---------+
| first_name | last_name | city         | state | country |
+------------+-----------+--------------+-------+---------+
| Tim        | Sehn      | Santa Monica | CA    | USA     |
| Zach       | Musgrave  | Seattle      | WA    | USA     |
+------------+-----------+--------------+-------+---------+

My schema is working as expected. The alldata view joined the data from the two tables together and got rid of those pesky, distracting id columns.

Conclusion

Just like other SQL databases, Dolt uses schema to define the shape of your data. Data is stored in databases and tables. Views can be created to make derived tables. We just scratched the surface of Dolt schemas but hopefully you have a better idea on how to build a schema for your Dolt tables so you can take advantage of Dolt's unique version control features. More questions? Come by our Discord and just ask. We're here to help.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.