Why you need a Git-style Remote for your Database

REFERENCE
9 min read

Dolt is the world's first version-controlled SQL database. It merges, pun intended, all the features of a SQL database like MySQL with the version control functionality of Git.

Git has the concept of a remote. Remotes are the things you clone, push, pull, and fetch from. You usually have one called origin. GitHub and GitLab are popular Git remotes you've probably heard of.

Dolt also has remotes. DoltHub and DoltLab are Dolt remotes. DoltHub is named after GitHub and DoltLab is named after GitLab. GitHub started as a fully hosted solution and moved to on-premises deployments later. GitLab started as a self-deployed on-premises remote and later sold enterprise features. Thus, for Dolt, DoltHub is our fully hosted solution and DoltLab is our self-deployed offering.

People coming to Dolt from a Git perspective understand remotes. They want a backup and collaboration platform for their Dolt databases. But people coming from a SQL database perspective often fail to see the use case for remotes. This blog will explain what remotes are and what they can be used for.

Remotes

What are Remotes?

Remotes are an offline copy of your Dolt database in another location, usually a different network connected computer. If you are familiar with Git remotes, Dolt remotes behave in the exact same way. So, if you know Git, feel free to skip this section and go on to the next section.

You can see the remotes you have referenced by calling the dolt remote command from the command line or running select * from dolt_remotes in SQL. You can create a new remote using dolt remote add or call dolt_remote('add',...). A Dolt database can have multiple remotes.

$ mkdir yay-remotes
$ cd yay-remotes 
$ dolt init
Successfully initialized dolt data repository.
$ dolt remote add origin tim/yay-remotes
$ dolt remote add doltlab https://doltlab.dolthub.com:50051/timsehn/yay-remotes
$ dolt remote
doltlab
origin
$ dolt sql -q "select * from dolt_remotes"
+---------+-------------------------------------------------------+-----------------------------------------+--------+
| name    | url                                                   | fetch_specs                             | params |
+---------+-------------------------------------------------------+-----------------------------------------+--------+
| doltlab | https://doltlab.dolthub.com:50051/timsehn/yay-remotes | ["refs/heads/*:refs/remotes/doltlab/*"] | {}     |
| origin  | https://doltremoteapi.dolthub.com/tim/yay-remotes     | ["refs/heads/*:refs/remotes/origin/*"]  | {}     |
+---------+-------------------------------------------------------+-----------------------------------------+--------+

NOTE: Dolt defaults to DoltHub as a remote. Thus, I did not need a full URL to add DoltHub as a remote.

Clone

Usually, you aren't starting a new database and instead you are cloning a database from somewhere. You clone using the dolt clone command or dolt_clone() procedure. In this case, your first remote, called origin, is set up for you as the remote you cloned from. We set up a database on the DoltLab demonstration instance called yay-clones.

Test database

$ dolt clone https://doltlab.dolthub.com:50051/timsehn/yay-clones
cloning https://doltlab.dolthub.com:50051/timsehn/yay-clones
$ cd yay-clones 
$ dolt sql -q "select * from dolt_remotes"
+--------+------------------------------------------------------+----------------------------------------+--------+
| name   | url                                                  | fetch_specs                            | params |
+--------+------------------------------------------------------+----------------------------------------+--------+
| origin | https://doltlab.dolthub.com:50051/timsehn/yay-clones | ["refs/heads/*:refs/remotes/origin/*"] | {}     |
+--------+------------------------------------------------------+----------------------------------------+--------+

Push

When we make changes locally and want those changes to show up on the remote, we push the changes. You push using the dolt push command or dolt_push() procedure.

$ dolt sql -q "insert into t values (1, 'I made this change on my laptop')" 
Query OK, 1 row affected (0.00 sec)
$ dolt diff
diff --dolt a/t b/t
--- a/t
+++ b/t
+---+----+---------------------------------+
|   | id | words                           |
+---+----+---------------------------------+
| + | 1  | I made this change on my laptop |
+---+----+---------------------------------+
$ dolt commit -am "A change on Tim's laptop"
commit tigj0nmr3ium7j2oadahs106m1qn52r6 (HEAD -> main) 
Author: timsehn <tim@dolthub.com>
Date:  Wed Apr 24 11:14:18 -0700 2024

        A change on Tim's laptop

$ dolt push
| Uploading...
To https://doltlab.dolthub.com:50051/timsehn/yay-clones
 * [new branch]          main -> main

Test database push

Pull

When changes happen on the remote and we want to merge those changes into our local copy, we pull the changes. You pull using the dolt pull command or dolt_pull() procedure.

Test database pull

$ dolt pull
\ Pulling...Fast-forward
Updating tigj0nmr3ium7j2oadahs106m1qn52r6..q5n7pbfrm5b1ogeepf1cul58aqo6fotc
Everything up-to-date
$ dolt sql -q "select * from t"
+----+---------------------------------+
| id | words                           |
+----+---------------------------------+
| 0  | Yay                             |
| 1  | I made this change on my laptop |
| 2  | I made these changes on DoltLab |
+----+---------------------------------+

Fetch

When changes happen on the remote and we want to get those changes locally, but we don't want to change any of our local branches with a merge, we fetch the changes. You fetch using the dolt fetch command or the dolt_fetch() procedure.

Test database fetch

$ dolt fetch
$ dolt sql -q "select * from t"
+----+---------------------------------+
| id | words                           |
+----+---------------------------------+
| 0  | Yay                             |
| 1  | I made this change on my laptop |
| 2  | I made these changes on DoltLab |
+----+---------------------------------+

$ dolt sql -q "select * from t as of 'origin/main'"
+----+--------------------------------------------+
| id | words                                      |
+----+--------------------------------------------+
| 0  | Yay                                        |
| 1  | I made this change on my laptop            |
| 2  | I made these changes on DoltLab            |
| 3  | A change on DoltLab I do not want to merge |
+----+--------------------------------------------+

$ dolt diff origin/main
diff --dolt a/t b/t
--- a/t
+++ b/t
+---+----+--------------------------------------------+
|   | id | words                                      |
+---+----+--------------------------------------------+
| - | 3  | A change on DoltLab I do not want to merge |
+---+----+--------------------------------------------+

Remote Options

When we were explaining remotes, we showed off the DoltHub and DoltLab remote options. DoltHub and DoltLab are remotes with a web user interface. These web interfaces enable features like permissions, pull requests, and many others that we'll dig into later. Dolt supports a few other remotes as well that we'll explain here.

File System

The simplest remote is a file system remote. You can set up a directory on your machine to push and pull from. This can be a network mounted drive if you want to use this remote for backup or collaboration with others.

$ ls /tmp/
com.apple.launchd.H8y88Rzpnm	powerlog
mysql.sock
$ dolt remote add file file:///tmp/yay-clones 
$ dolt push file main
- Uploading...
To file:///tmp/yay-clones
 * [new branch]          main -> main
$ ls /tmp
com.apple.launchd.H8y88Rzpnm	powerlog
mysql.sock			yay-clones

Obviously, a file system has no web user interface and thus, no pull request workflow or other web features.

DoltHub

DoltHub

DoltHub is a fully hosted remote accessible via the internet. DoltHub is modeled after GitHub. DoltHub supports a full permissions model, viewing and editing databases via an integrated SQL workbench, a pull request workflow, issues, forks, and many more features. DoltHub is free to host public data and $50/month plus storage for private data. DoltHub is the simplest, most reliable remote option for your Dolt databases.

DoltLab

DoltHub

DoltLab is a DoltHub-style remote you deploy in your own network. DoltLab has the same web user interface as DoltHub and thus, has all the same features as DoltHub. DoltLab is for people who want a DoltHub-style remote for collaboration but don't want their Dolt database leaving their network. DoltLab requires a Ubuntu machine and some set up. So, you probably want to use DoltHub unless you have a local requirement.

Dolt SQL Server

You can also configure a running Dolt SQL Server as a remote, allowing you to use a running Dolt database as your remote. The remote uses SQL users and grants to establish which users can use the remote.

Hosted Dolt has an option to act as a remote. This is the easiest way to show off the feature, though any running Dolt SQL Server can enable the option. Once you've deployed a Hosted Dolt and enabled the "Expose Remote Endpoint" option, your deployments page will have the following instructions.

Hosted Clone

And you'll be able to clone, push, pull, and fetch your Hosted Dolt database as well as access it as a running SQL server.

dolt clone https://timsehn-yay-remotes.dbs.hosted.doltdb.com/yay-remotes --user "dqk2fjq4mzvr5gw6"
cloning https://timsehn-yay-remotes.dbs.hosted.doltdb.com/yay-remotes
$ cd yay-remotes 
$ dolt sql -q "insert into t values (1, 'I made this locally and pushed to a running hosted SQL server')" 
Query OK, 1 row affected (0.00 sec)
$ dolt commit -am "I made this on my Mac laptop" 
commit 0p3ak2g05vs8dieesnqslidtetm2h25t (HEAD -> main) 
Author: timsehn <tim@dolthub.com>
Date:  Wed Apr 24 12:45:44 -0700 2024

        I made this on my Mac laptop

$ dolt push origin main --user "dqk2fjq4mzvr5gw6"
| Uploading...
To https://timsehn-yay-remotes.dbs.hosted.doltdb.com/yay-remotes
 * [new branch]          main -> main

And now in the hosted workbench, voila!

Workbench push

By default, remotes are offline which means you cannot connect directly to them using a MySQL client. You connect to remotes with clone, push, pull and fetch. When you enable your Dolt SQL server to act as a remote, you can push and pull from it but you can also read from it and write to it using a MySQL client. It is online.

$ mysql -h"timsehn-yay-remotes.dbs.hosted.doltdb.com" -u"dqk2fjq4mzvr5gw6" -p"<redacted>" yay-remotes -e "select * from t"
+----+---------------------------------------------------------------+
| id | words                                                         |
+----+---------------------------------------------------------------+
|  0 | I made this in the hosted workbench                           |
|  1 | I made this locally and pushed to a running hosted SQL server |
+----+---------------------------------------------------------------+

Remote Advantages

As you are probably starting to see, remotes are a powerful Git capability. Dolt adds remotes to SQL databases. Let's go through some of the things remotes are used for to show you some advantages of using Dolt as your SQL database.

Verifiable Backups

One of the advantages of decentralized version control like Git is a full copy and history of source code files exists on every clone. Essentially, every clone is a backup of the code up to that point in history. This means all clones must be destroyed for you to lose data. Decentralized systems are far more resilient to data loss than centralized systems.

Dolt brings decentralized power to the traditionally centralized relational databases. A remote and clones can act as multiple backups for your critical databases.

Moreover, a remote can act as a check against data corruption or tampering. A known good read-only copy of a database can be stored remotely. To test for tampering, you run through the commit log of the remote comparing the commit hashes against the copy you are verifying. If the commit log is the same, the data is the same, guaranteed.

Debug on a Clone

Let's say a user has identified a bug but it is only reproducible with production data. In many environments, the production database is locked down to prevent catastrophic reads or writes. With Dolt and a remote, getting a copy of the production database is as simple as making a clone. Clone a copy of the production database to your laptop and debug on a production copy without worrying about breaking anything. You can even modify the data if you need to.

Collaboration

Remotes enable decentralized collaboration. Multiple, independent copies of the database can be evolving in parallel, pushing to different branches on the remote. When two branches want to coordinate, a merge process is used. On DoltHub and DoltLab, this merge process is a Pull Request which adds integrated human review to the process. This collaboration process enables the open source ecosystem and we think it can enable an open data ecosystem as well.

Permissions

A remote is a good place to enforce permissions. Who has read and write access to the database? Who can update the main branch? Do you need to go through a Pull Request to update main? Who must review said Pull Request? All of these policies are implemented and enforced on the remote.

Pull Requests

As mentioned in the collaboration section, Pull Requests are the most popular merge process use for collaboration. Pull requests implement integrated human and/or automated review of changes via differences (or diffs) to a branch before merge. Dolt combined with DoltHub or DoltLab, give you Pull Requests on database changes.

Dolt is the only SQL database that can tell you quickly what data and schema changed between two copies. Dolt is built from the storage engine up to give you this capability. Pull requests leverage this capability.

Issues

DoltHub and DoltLab implement built-in issue tracking. Issues allow for you to communicate about your database with others or the world. Found a problem with the data? Create an issue. Think a secondary index over a couple columns would be helpful? Create an issue. The maintainer of the database will see it and respond accordingly.

Forks

Do you want your own writable copy of a database but don't want to ask the maintainer for write permissions? This is when you create a fork. Forks enable permission-less, decentralized collaboration, a staple of the open source community. Dolt in combination with DoltHub or DoltLab bring the power of forks to the open data community.

Conclusion

Dolt brings the power of Git-style remotes to the SQL database. With Dolt you have a number of remote options including DoltHub and DoltLab. These remotes come with powerful web interfaces that enable features you know and love like Pull Requests, Forks, and Issues. Dolt can run as a MySQL replacement without a remote, but adding a remote enables a number of powerful use cases. Curious? Come by our Discord and let's discuss whether Dolt with a remote is right for your use case.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.