Dolt + Looker Studio
A couple weeks ago I wrote about how to use Dolt with Apache Superset. Dolt provides powerful versioning features to Superset charts and dashboards. Dolt "just works" with Superset using the MySQL connector so it's really easy to get started. The power of branches and history queries improve the efficiency of your data teams and integrity of your data.
I received feedback from a reader of my weekly email that he would like to learn how to connect Dolt to Google Looker Studio to get the same benefits. I live to serve. Here we go.
Why Dolt + Looker Studio?
We covered similar reasons in the Superset Blog but just in case you missed that one, let's do it again.
Dolt adds a versioning layer to your data or metrics. As a data scientist or analyst, you inspect a dashboard or table and something looks fishy. Was this number always 10? I thought I saw 8 the last time I ran this report? When did that change? Dolt allows your data scientists and data analysts to answer audit questions on your data and metrics by providing an audit log of every cell in the database: who changed what, when? Data versioning will improve the efficiency of your data team.
In this scenario, Dolt would version the tables output by your data warehouse jobs, think nightly metrics runs. This is a natural place to hook versioned Dolt databases up to your dashboarding infrastructure, like Looker.
In addition to auditing metrics, Dolt makes your metrics layer safe for writes. Want to preserve a forecasting model but not have it affect production data? Do it on a branch. Branches can greatly improve data team collaboration. Data analyses that currently happen in spreadsheets or notebooks can be shared and collaborated on more easily.
Why Looker Studio over Apache Superset?
Apache Superset is a service you deploy yourself, whereas Looker Studio is a cloud service akin to other Google products like Sheets or Docs. So, if you don't want to run your own tool, Looker Studio is for you.
Looker Studio does have some issues with large databases. When querying a SQL source, Looker Studio limits you to 150,000 rows whereas Superset lets you go hog wild since it's on your own infrastructure. For this reason, I preferred Superset but I'm a bit of a power user. If you don't want to run anything yourself, Looker Studio is for you.
Set Up Dolt for Looker Studio
Looker Studio is a hosted cloud service so you have to make your Dolt database visible to the internet. To do this, you have two options:
- Use Hosted Dolt.
- Poke a hole to the internet from your machine.
I'll cover both methods here.
Hosted Dolt
Hosted Dolt, as the name suggests, is a cloud hosted Dolt. Click a few buttons and you have a running Dolt database accessible via the internet. It's billed by the minute and instances start at $50/month.
Go to Hosted Dolt, sign in, and click create deployment. You will be greeted by a page that looks like this. Select the t2-medium instance type (that's the cheapest). Give your deployment a name, I called mine "looker".
Wait a couple minutes for the database to start. I now have a deployment in Hosted Dolt. This page will have connectivity information.
Now, I can connect to the Hosted Dolt instance from my laptop using the mysql
client. The connection string is on the page. I'm then going to use call dolt_clone()
to grab the simple_example database I used in the Superset article onto the Hosted instance from DoltHub. Look how clean and simple it is to copy a database in Dolt. Dolt and DoltHub are powerful data sharing tools.
$ mysql -h"timsehn-looker.dbs.hosted.doltdb.com" -u"<redacted>" -p"<redacted>"
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.7.9-Vitess
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> call dolt_clone('timsehn/simple_example');
+--------+
| status |
+--------+
| 0 |
+--------+
1 row in set (0.33 sec)
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| simple_example |
| mysql |
+--------------------+
3 rows in set (0.04 sec)
mysql> use `simple_example`;
Database changed
mysql> show tables;
+--------------------------+
| Tables_in_simple_example |
+--------------------------+
| user_counts |
+--------------------------+
1 row in set (0.04 sec)
You're all set. That's all you need to do to get Dolt running with an example database.
Local Machine
You always have the option to run Dolt on your local machine but you then must make your local machine visible to the internet. This requires a few steps.
First, I need to clone the simple_example
database and start a dolt sql-server
to get a MySQL compatible server going on my local machine.
$ dolt clone timsehn/simple_example
$ cd simple_example
$ dolt sql-server
Starting server with Config HP="localhost:3306"|T="28800000"|R="false"|L="info"|S="/tmp/mysql.sock"
Now, you need to poke a hole to the internet from your local machine. The best service I know of to do this is ngrok
. You'll need an account but otherwise, it is free.
I downloaded ngrok
on my Mac using Homebrew, but once you sign up there are a bunch of ways to install it.
$ brew install ngrok
==> Downloading https://bin.equinox.io/a/an2KvqCaZyY/ngrok-v3-3.1.1-stable-darwi
######################################################################## 100.0%
==> Installing Cask ngrok
==> Linking Binary 'ngrok' to '/opt/homebrew/bin/ngrok'
🍺 ngrok was successfully installed!
Once I signed up I ran ngrok config add-authtoken
with the authtoken
they gave me. Then I ran:
$ ngrok tcp 3306
That terminal gets taken over and you should see something like this.
In my case, I now have a Dolt database on the internet at hostname 2.tcp.ngrok.io
port 11756
. Let me prove it.
$ mysql -h 2.tcp.ngrok.io -P 11756 -uroot
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.7.9-Vitess
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use simple_example;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+--------------------------+
| Tables_in_simple_example |
+--------------------------+
| user_counts |
+--------------------------+
1 row in set (0.15 sec)
You're all set. Your database is exposed to the internet. Ask a friend to connect and run drop database simple_example
. This method is not recommended for production databases or data you want to keep secret. If you have this type of data, at least change the permissions so root
with no password doesn't work.
Create a Data Source
For the rest of this article, I'm going to assume you went the Hosted Dolt method. It's more secure and more in line with the Looker Studio hosted model.
To use the ngrok method, just enter the host and port ngrok gave you and use the user root
with no password. I tested it with Looker and it works.
Back to the Hosted Dolt version. I need to create the data source in Looker Studio. To do this, I first need to download the SSL certificate to authenticate my connection. We get our certificate from our Hosted Dolt deployment page. Click the little download icon next to hosted_ca.pem
.
After you've downloaded that file, head over to Looker Studio and log in with your Google account.
Let's create a data source in Looker from our Hosted Dolt instance. I click "Create > Data source" in the top left corner of the Looker Studio homepage. I search for mysql and use the provided MySQL connector.
I'm going to call this data source "simple_example/main", main for main branch. I enter all the information from the hosted connectivity page, click enable SSL, and use that certificate I told you to download. I left off the username and password for paranoia sake. Enter the user and password hosted provided you.
When you have it all in, click "AUTHENTICATE". This should bring up a user_counts
table. Select that and click "CONNECT". Success looks like this:
You now have a data source connected to Dolt that you can use in reports!
Create a Report
Now it's time to create a Report using the Dolt data source I just created. To do this, click the "CREATE REPORT" button on the top right of the data source page. You'll be greeted with this dialog.
I click "ADD TO REPORT" and selected "Don't show me this again". I now have a simple data table on page.
I'm going to make a report with one page of raw data and the other page with a chart. First the raw data, I expand the provided table to take up the whole page. I then add the Daily Active Users (dau
), and Monthly Active Users (mau
) dimensions. Finally, I get rid of the row number and pagination under the style tab.
Simple enough. Now I'll add a page with a time series chart. I click "Page > Add page". Then I click "Add a chart" and select the line graph. I expand the chart to be the full page. It's not quite right so I remove "Record count" under metrics and add dau
and mau
. I now have the chart I want.
The cool thing about Looker is that I can now click View to see what my report will look like when it's not in edit mode. Looks great if I don't say so myself.
Unfortunately, Finance doesn't think so! Look at that gaping hole because of a metrics outage.
Use a Branch
As explained in the Superset blog post:
We've all been there. The metrics pipeline was down for the first four days of March and the data pipeline team has been working tirelessly to recover the missing data. Until the data is recovered, it's really distracting to have that hole in the data. Temporarily, we want to show estimated DAU and MAU metrics for those four days. We want to have an audit log of what we did so no one questions the integrity of our business metrics.
With Dolt, we can either put the broken data on a branch or make the dashboard reference an estimated branch. For this example, we'll switch the dashboard to an estimated branch. Let assume I fixed the four data points on the estimated branch.
How would I show this new chart in Looker? It's easy. I need to make a data source for my branch.
Going back into edit mode on my report, I click "Add data". I'm greeted with the familiar Data Source page. Again I search for MySQL and select it. I re-enter all my credential information and click AUTHENTICATE. This time instead of clicking the user_counts
table, I select custom query and enter the following AS OF
query.
select * from user_counts as of 'estimated';
This selects the user_counts
table as it exists on the 'estimated' branch. I call this data source "simple_example/estimated", estimated for estimated branch.
I then duplicate the chart page and change the data source to my new estimated branch data source.
After I change the data source, I can see that the data is fixed in this branch. I click View and run off to show Finance we're all good. Our estimated data is fine.
Dolt gives you full audit capability here. As mentioned in the Superset blog:
Once the data is fixed on the main branch I can go back to that chart. Because of Dolt, I have an audit log of the whole event, ensuring the integrity of our business metrics.
Conclusion
As you can see Dolt works just as well with Google Looker Studio as it does with Apache Superset. You can use Hosted Dolt or your local machine to serve data to Looker Studio. Dolt provides powerful data versioning capabilities to charts and dashboards no matter your tool of choice. Want me to show off how another tool integrates with Dolt? Come by our Discord.