Using DoltHub as a Remote for Hosted Dolt
We have two web products for Dolt: DoltHub and Hosted Dolt. DoltHub is GitHub for Dolt and Hosted is AWS RDS for Dolt. Each product has different use cases depending on your goals, but there are also reasons to use DoltHub and Hosted together. Easily build an online application based on a crowd-sourced database on DoltHub or isolate changes to your production database and sync on your schedule.
To interact with DoltHub from your Hosted instance, you can use DoltHub as a
remote. We have SQL
remote operations such as dolt_clone
, dolt_pull
, and
dolt_push
that let you interact with remotes using SQL. However until recently we didn't
have an authentication model for Hosted that lets you clone private databases from DoltHub
or push to DoltHub databases you have write permissions to. You can now add Dolt
credentials to your Hosted instance to enable these actions.
How it works
When you use Dolt on the command line to
interact with DoltHub or DoltLab to pull private databases or push to databases you have
permission to, you're required to run dolt
login
. This creates a new
public/private keypair for authenticating with doltremoteapi and adds it to your global
Dolt configuration. It then opens up your browser to your credential settings
page on DoltHub where you can add the
public key.
Since you don't have access to the command line from your Hosted instance, the process for adding credentials is a little different. When you create a deployment and select "Expose Dolt credentials" or add credentials to an existing deployment, our deployment manager service will generate Dolt credentials and add them to the global Dolt configuration on your instance. We store the public key so it is available in the settings tab of your deployment. Once you add this key to DoltHub, you can use SQL remote commands to clone, push, pull, and fetch from public or private databases.
An example
A few months ago Tim wrote a blog about connecting Dolt via Hosted to Google Looker Studio to show how Dolt can add a versioning layer to your data or metrics. This blog was written before cloning private DoltHub databases was available. Usually people want their metrics data to be private, so we'll add to the example in the blog by showing how you can interact with a private database on DoltHub from Hosted.
First, I took Tim's simple_example
database on DoltHub and
copied it to a new private database named dolthub/user_metrics
.
Next, I create a new deployment on Hosted
and check the Expose Dolt credentials
check box from the form.
Once the deployment has started, the public key will be populated in the settings tab of the deployment page (this may take a few minutes). Note that any deployment administrator will have access to this key.
If I accidentally expose my key or decide I want to remove it, I also have those options there.
I click on "Add to DoltHub" to add my public key to DoltHub. This will open my DoltHub credentials settings page.
Now that I have my credentials set up, I can connect to the Hosted instance using the
information in the Connectivity tab and clone my dolthub/user_metrics
database.
% mysql -h"dolthub-metrics-example.dbs.hosted.doltdb.com" -u"[username]" -p"[password]"
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.9-Vitess
Copyright (c) 2000, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> call dolt_clone('dolthub/user_metrics');
+--------+
| status |
+--------+
| 0 |
+--------+
1 row in set (0.71 sec)
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| user_metrics |
+--------------------+
3 rows in set (0.05 sec)
mysql> use user_metrics;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+------------------------+
| Tables_in_user_metrics |
+------------------------+
| user_counts |
+------------------------+
1 row in set (0.04 sec)
Now I can do whatever I want with my metrics data, including following these steps to connect my Hosted instance to Looker.
Finance reviews our metrics charts and finds a hole in the data suggesting an outage.
Finance decides they want to show estimated DAU and MAU metrics for those four days, but
we don't trust Finance to have direct access to our production database. Instead, they can
fill in the estimated user_counts
for the missing metrics for March 1-4 on DoltHub using
the spreadsheet editor and create a pull request. They don't even need to know SQL!
We want to review the new chart from the change before we merge it to main
. We can pull
that branch to our Hosted instance and use it to create a new branch in
Looker.
mysql> call dolt_pull('origin', 'outage-estimates');
+--------------+-----------+
| fast_forward | conflicts |
+--------------+-----------+
| 1 | 0 |
+--------------+-----------+
1 row in set (1.08 sec)
mysql> select * from dolt_diff('main...outage-estimates', 'user_counts');
+---------------+--------+--------+------------------+-------------------------+-----------------+----------+----------+----------------------------------+------------------------+-----------+
| to_count_date | to_dau | to_mau | to_commit | to_commit_date | from_count_date | from_dau | from_mau | from_commit | from_commit_date | diff_type |
+---------------+--------+--------+------------------+-------------------------+-----------------+----------+----------+----------------------------------+------------------------+-----------+
| 2023-03-01 | 9 | 27 | outage-estimates | 2023-05-17 00:21:13.635 | 2023-03-01 | 0 | 0 | ufqn61fplkokuthk8fgdf2jgf94kv7mj | 2023-05-17 00:17:48.16 | modified |
| 2023-03-02 | 9 | 27 | outage-estimates | 2023-05-17 00:21:13.635 | 2023-03-02 | 0 | 0 | ufqn61fplkokuthk8fgdf2jgf94kv7mj | 2023-05-17 00:17:48.16 | modified |
| 2023-03-03 | 9 | 27 | outage-estimates | 2023-05-17 00:21:13.635 | 2023-03-03 | 0 | 0 | ufqn61fplkokuthk8fgdf2jgf94kv7mj | 2023-05-17 00:17:48.16 | modified |
| 2023-03-04 | 9 | 27 | outage-estimates | 2023-05-17 00:21:13.635 | 2023-03-04 | 0 | 0 | ufqn61fplkokuthk8fgdf2jgf94kv7mj | 2023-05-17 00:17:48.16 | modified |
+---------------+--------+--------+------------------+-------------------------+-----------------+----------+----------+----------------------------------+------------------------+-----------+
4 rows in set (0.05 sec)
Looks much better!
But I want to make an update to the DAU for March 1. I can make the change from Hosted on
a branch and push it back to DoltHub using dolt_push
.
mysql> call dolt_checkout('-b', 'outage-update');
+--------+
| status |
+--------+
| 0 |
+--------+
1 row in set (0.05 sec)
mysql> update user_counts set dau=10 where count_date='2023-03-01';
Query OK, 1 row affected (0.13 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> call dolt_commit('-am', 'Update DAU estimate for 2023-03-01');
+----------------------------------+
| hash |
+----------------------------------+
| h3q00k0ej2uvrcgube0ue7ck27lsfms8 |
+----------------------------------+
1 row in set (0.05 sec)
mysql> call dolt_push('origin', 'outage-update');
+---------+
| success |
+---------+
| 1 |
+---------+
1 row in set (2.47 sec)
And I will see the new branch in my database on DoltHub.
From there, our changes are approved and merged into the main
branch on DoltHub. One
more dolt_pull
will update our main
branch on Hosted.
mysql> call dolt_pull('origin', 'main');
+--------------+-----------+
| fast_forward | conflicts |
+--------------+-----------+
| 1 | 0 |
+--------------+-----------+
1 row in set (1.44 sec)
mysql> select * from dolt_log as of 'main';
+----------------------------------+-----------+---------------------+-------------------------+----------------------------------------------------------------------------+
| commit_hash | committer | email | date | message |
+----------------------------------+-----------+---------------------+-------------------------+----------------------------------------------------------------------------+
| h3q00k0ej2uvrcgube0ue7ck27lsfms8 | taylor | taylor@dolthub.com | 2023-05-17 00:34:57.659 | Update DAU estimate for 2023-03-01 |
| 20hrgf4mh8qq2naeq56ofpojlvpk6o7c | finance | finance@dolthub.com | 2023-05-17 00:21:13.635 | Fill in estimated user DAU and MAU for outage |
| 0os9q2pu14od0ms1hick3ridmtjth0at | timsehn | tim@dolthub.com | 2023-01-27 00:54:17.634 | Simulated metrics outage |
| 5jsvf0vlpeb0bp60iuoo20hvtsc34ipn | timsehn | tim@dolthub.com | 2023-01-27 00:50:49.836 | Base generated data |
| ikfqomtfnrt0b428pjt5n8v02fq9st6m | timsehn | tim@dolthub.com | 2023-01-23 20:43:02.062 | New data |
| tiu5jb99mrqt8k059gvdvphss76kmeo2 | timsehn | tim@dolthub.com | 2023-01-23 19:49:57.939 | Made table with a single row |
| lvp259t93l6jjjvup7nhu29je1lvlto9 | timsehn | tim@dolthub.com | 2023-01-23 19:47:52.786 | Initialize data repository |
+----------------------------------+-----------+---------------------+-------------------------+----------------------------------------------------------------------------+
Conclusion
Using DoltHub as a remote for Hosted can be a great way to build an application on top of a database you see on DoltHub or to isolate writes to your production database.
If you have any feedback on the Hosted product or want to use Hosted Dolt for your database, make a feature request on GitHub or reach out to us on Discord.