How to run DoltLab without egress

DoltLab is the self-hosted version of DoltHub, a web-based remote for your Dolt databases. In recent weeks, DoltLab customers have reached out to us looking to run their DoltLab instances from within a closed, internal network where egress traffic is heavily restricted or blocked altogether. This means that their DoltLab instance is prevented from making outbound HTTP requests. Earlier versions of DoltLab would error if egress traffic was restricted.

Most often, DoltLab customers who require their instance to run without egress need it to comply with their company's internal data security policies. And, while neither DoltHub nor DoltLab transports remote data off-instance, when it comes to data security, it's understandable that companies prefer to play it safe by simply blocking all outbound requests.

So, after helpful collaboration with our customers about their requirements and use cases, we released DoltLab v2.3.3, which supports running a DoltLab instance without egress access.

Interestingly, earlier versions of DoltLab made two unintentional egress calls that were removed in DoltLab v2.3.3, but were tricky to track down.

The first of these unintentional egress calls occurred whenever a DoltLab Job was run. When the Job started, it logged the version of the Dolt binary contained in the Job. It did this by executing the dolt version command, and in recent months this command was updated to make an outbound HTTP request to api.github.com. It does this to check whether a newer Dolt binary is available to download, and notifies the user if there is. But this is not relevant to DoltLab, so we disabled this Dolt feature in DoltLab v2.3.3.

The second unintentional egress call DoltLab made prior to v2.3.3 was to Stripe's API, via the inclusion of an HTML script in the DoltLab DOM. DoltLab does not use Stripe, but DoltHub uses it to process payments. And, since both products share source code, this DoltHub-only script was mistakenly being added in DoltLab. Upon further investigation, we found that an import of the @stripe/stripe-js NPM package was injecting this script on DoltLab whenever it detected that the stripe-js script was missing from the DOM! This too has been fixed in DoltLab v2.3.3.

In the remainder of today's blog I'll cover how to set up and run DoltLab v2.3.3 in a restricted environment without egress. Please note that, at the time of this writing, DoltLab Enterprise requires egress access. DoltLab Enterprise makes egress calls to a licensing server in order to validate the license of the enterprise instance and authorize the use of its features. However, fully offline DoltLab Enterprise support is under construction and will be available in the coming weeks.

Prerequisites

As of DoltLab v2.3.3, there are only two types of outbound HTTP calls made by DoltLab.

First, a DoltLab instance emits first-party ("phone-home") metrics that let our team know how many instances are running in the wild. We use these metrics to help secure funding for DoltLab's ongoing development and support, so we encourage users to not disable them, though doing so is easy.

Second, a DoltLab instance needs to pull the container images for its various services from a public AWS ECR repository. If you've used DoltLab before, you know that it runs via Docker Compose, and when you start your DoltLab instance, the first thing you'll see is the service images being pulled to the host from public.ecr.aws.

root@ip:/home/ubuntu/doltlab# ./start.sh
0b2a9c59ab673ef79d6eb1fd7c84c16b5fb5d5c952a6efc53f15eeb29c058ff3
Pulling doltlabdb (public.ecr.aws/dolthub/doltlab/dolt-sql-server:v2.3.3)...
v2.3.3: Pulling from dolthub/doltlab/dolt-sql-server
7478e0ac0f23: Pull complete
c013805c2f1c: Pull complete
aace6430adbe: Pull complete
4b1b141afe4e: Pull complete
b6b8e5d82846: Pull complete
84d93b369a57: Pull complete
4f4fb700ef54: Pull complete
3c5f26355296: Pull complete
616210399853: Extracting [=========>                                         ]  7.471MB/38.67MB
3555462ef609: Download complete

Additionally, some of DoltLab's features, like Jobs, will pull service images for the Job that is queued to run. This happens silently under the hood and is orchestrated by DoltLab's main API service.

What's important to note here is that, to run a DoltLab instance on a host without egress, you'll need to perform two steps to prevent the instance from making both types of outbound calls.

Disabling metrics on DoltLab is simple, and we can do this later in the process with a small edit to DoltLab's installer_config.yaml. Avoiding outbound calls to public.ecr.aws, on the other hand, requires you to pre-load the service images onto your host before you attempt to start the instance.

Let's walk through an example DoltLab v2.3.3 deployment to better illustrate what this entails.

If you're running an older DoltLab instance and want to upgrade to v2.3.3, be sure to stop your old instance using the ./stop.sh script before continuing.

DoltLab v2.3.3 without egress

Prior to the release of DoltLab v2.3.3, the service images for DoltLab were only available from the public ECR repository. However, starting with v2.3.3, we release all of DoltLab's service images in a single zip file. This enables you to load the images onto the host, so they won't need to be pulled from the public repository.

To do this, download the zip for DoltLab v2.3.3 and the accompanying zip for the service images, and then upload these zip files onto your DoltLab host. You can do this by downloading these files on a host that has egress access, saving the files on a physical drive, then uploading them to the DoltLab host from the drive.

# download zip files on a host with egress access
root@ip:/home/ubuntu# curl -LO https://doltlab-releases.s3.amazonaws.com/linux/amd64/doltlab-v2.3.3.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 19.9M  100 19.9M    0     0  10.4M      0  0:00:01  0:00:01 --:--:-- 10.4M
root@ip:/home/ubuntu# curl -LO https://doltlab-releases.s3.amazonaws.com/linux/amd64/doltlab-service-images-v2.3.3.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3129M  100 3129M    0     0  20.2M      0  0:02:34  0:02:34 --:--:-- 21.9M
# copy the zip files to the host without egress access using a physical drive
root@ip:/home/ubuntu# ls
doltlab-service-images-v2.3.3.zip  doltlab-v2.3.3.zip

Please be aware that the service images zip file is quite large. For v2.3.3 it is 3.13 GB.
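Because the files travel between hosts on a physical drive, it can be worth confirming that they arrived intact before unzipping them. The following is a minimal sketch, not part of the DoltLab release: the function names and the SHA256SUMS manifest file are assumptions made for illustration, and it relies on the GNU coreutils sha256sum tool being available on both hosts.

```shell
# Hypothetical helpers; the SHA256SUMS manifest is an assumption for this
# sketch and is not shipped with DoltLab.

# Run on the host WITH egress, pointing at the directory holding the zips.
write_manifest() {
  local dir="$1"
  ( cd "$dir" && sha256sum -- *.zip > SHA256SUMS )
}

# Run on the host WITHOUT egress after copying the zips and the manifest over.
# Returns non-zero if any listed file is missing or its checksum differs.
verify_manifest() {
  local dir="$1"
  ( cd "$dir" && sha256sum --check --quiet SHA256SUMS )
}
```

For example, run write_manifest /home/ubuntu on the first host, copy SHA256SUMS to the drive alongside the two zip files, and then run verify_manifest /home/ubuntu on the DoltLab host.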

Once the zip files are uploaded to the DoltLab host, ensure that any older version of DoltLab has been stopped. Next, unzip the service images zip file.

root@ip:/home/ubuntu# unzip doltlab-service-images-v2.3.3.zip -d service-images
Archive:  doltlab-service-images-v2.3.3.zip
  inflating: service-images/doltremoteapi-server-v2.3.3.tar
  inflating: service-images/dolthub-server-v2.3.3.tar
  inflating: service-images/file-importer-v2.3.3.tar
  inflating: service-images/dolt-sql-server-v2.3.3.tar
  inflating: service-images/pull-merge-v2.3.3.tar
  inflating: service-images/envoy-v1.28-latest.tar
  inflating: service-images/dolthubapi-server-v2.3.3.tar
  inflating: service-images/fileserviceapi-server-v2.3.3.tar
  inflating: service-images/query-job-v2.3.3.tar
  inflating: service-images/dolthubapi-graphql-server-v2.3.3.tar

Inside the unzipped directory you'll see a tarball for each service DoltLab depends on. These files will need to be loaded into Docker, so DoltLab can use them.

To load them into Docker, cd into the service-images directory and run the docker load command (shorthand for docker image load) on each service file. Be sure not to omit any service file, and do not change the tags of the loaded images.

root@ip:/home/ubuntu/service-images# docker load < doltremoteapi-server-v2.3.3.tar
Loaded image: public.ecr.aws/dolthub/doltlab/doltremoteapi-server:v2.3.3
root@ip:/home/ubuntu/service-images# docker load < dolthub-server-v2.3.3.tar
Loaded image: public.ecr.aws/dolthub/doltlab/dolthub-server:v2.3.3
root@ip:/home/ubuntu/service-images# docker load < file-importer-v2.3.3.tar
Loaded image: public.ecr.aws/dolthub/doltlab/file-importer:v2.3.3
root@ip:/home/ubuntu/service-images# docker load < dolt-sql-server-v2.3.3.tar
Loaded image: public.ecr.aws/dolthub/doltlab/dolt-sql-server:v2.3.3
root@ip:/home/ubuntu/service-images# docker load < pull-merge-v2.3.3.tar
Loaded image: public.ecr.aws/dolthub/doltlab/pull-merge:v2.3.3
root@ip:/home/ubuntu/service-images# docker load < envoy-v1.28-latest.tar
Loaded image: envoyproxy/envoy:v1.28-latest
root@ip:/home/ubuntu/service-images# docker load < dolthubapi-server-v2.3.3.tar
Loaded image: public.ecr.aws/dolthub/doltlab/dolthubapi-server:v2.3.3
root@ip:/home/ubuntu/service-images# docker load < fileserviceapi-server-v2.3.3.tar
Loaded image: public.ecr.aws/dolthub/doltlab/fileserviceapi-server:v2.3.3
root@ip:/home/ubuntu/service-images# docker load < query-job-v2.3.3.tar
Loaded image: public.ecr.aws/dolthub/doltlab/query-job:v2.3.3
root@ip:/home/ubuntu/service-images# docker load < dolthubapi-graphql-server-v2.3.3.tar
Loaded image: public.ecr.aws/dolthub/doltlab/dolthubapi-graphql-server:v2.3.3
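The ten commands above can also be expressed as a loop, which makes it harder to skip a tarball by accident. This is only a sketch: load_service_images is a hypothetical helper name, and the DOCKER variable is an assumption added so the loop can be dry-run with DOCKER=echo. It uses docker load -i, which reads the tarball from a file rather than from stdin.

```shell
# Load every service image tarball found in a directory.
# DOCKER is a hypothetical override for dry runs (e.g. DOCKER=echo);
# it defaults to the real docker CLI.
load_service_images() {
  local dir="${1:-.}" tarball
  for tarball in "$dir"/*.tar; do
    # Skip the unexpanded glob when the directory has no tarballs.
    [ -e "$tarball" ] || continue
    "${DOCKER:-docker}" load -i "$tarball" || return 1
  done
}

# Usage on the DoltLab host:
#   load_service_images /home/ubuntu/service-images
```

The function stops at the first failed load so that a corrupted tarball is noticed immediately instead of being buried in the output.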

Once the images are loaded into Docker, you should be able to see them with the docker image ls command (although the "created" dates will be incorrect).

root@ip:/home/ubuntu/service-images# docker image ls
REPOSITORY                                                 TAG            IMAGE ID       CREATED        SIZE
public.ecr.aws/dolthub/doltlab/dolthub-server              v2.3.3         759cfc4e4310   11 days ago    2.94GB
public.ecr.aws/dolthub/doltlab/dolthubapi-graphql-server   v2.3.3         4d05fedef003   11 days ago    2.31GB
public.ecr.aws/dolthub/doltlab/dolt-sql-server             v2.3.3         eab19a52eef8   11 days ago    243MB
envoyproxy/envoy                                           v1.28-latest   a37e999f9612   12 days ago    150MB
public.ecr.aws/dolthub/doltlab/pull-merge                  v2.3.3         f1059932c427   4 months ago   289MB
public.ecr.aws/dolthub/doltlab/file-importer               v2.3.3         c13254c33ccb   4 months ago   296MB
public.ecr.aws/dolthub/doltlab/query-job                   v2.3.3         6a3e47fe8fd2   4 months ago   289MB
public.ecr.aws/dolthub/doltlab/doltremoteapi-server        v2.3.3         d2546e7c90c4   N/A            224MB
public.ecr.aws/dolthub/doltlab/dolthubapi-server           v2.3.3         923c5d21ca84   N/A            274MB
public.ecr.aws/dolthub/doltlab/fileserviceapi-server       v2.3.3         96265469c401   N/A            156MB
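Rather than comparing the listing by eye, the check can be scripted. The sketch below is illustrative only: missing_images is a hypothetical helper, and its list of expected names is copied from the "Loaded image:" lines shown earlier for v2.3.3, so it may not match other releases. It reads repository:tag pairs on stdin, which docker image ls can produce through its --format option.

```shell
# Print each expected image that is absent from the list on stdin.
# The expected names mirror the "Loaded image:" output for v2.3.3.
missing_images() {
  local version="$1" loaded name image rc=0
  loaded="$(cat)"
  for name in doltremoteapi-server dolthub-server file-importer dolt-sql-server \
              pull-merge dolthubapi-server fileserviceapi-server query-job \
              dolthubapi-graphql-server; do
    image="public.ecr.aws/dolthub/doltlab/${name}:${version}"
    printf '%s\n' "$loaded" | grep -qxF "$image" || { echo "missing: $image"; rc=1; }
  done
  # The envoy image carries its own tag rather than the DoltLab version.
  printf '%s\n' "$loaded" | grep -qxF "envoyproxy/envoy:v1.28-latest" \
    || { echo "missing: envoyproxy/envoy:v1.28-latest"; rc=1; }
  return "$rc"
}

# Usage on the DoltLab host:
#   docker image ls --format '{{.Repository}}:{{.Tag}}' | missing_images v2.3.3
```

A non-zero exit status means at least one image still has to be loaded, which would otherwise cause DoltLab to attempt a pull from the public repository.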

Your DoltLab instance will no longer attempt to pull any images from the public ECR repository, since they're already loaded on the host.

Now it's time to configure your new DoltLab instance.

Unzip the DoltLab zip file and cd into the doltlab directory.

root@ip:/home/ubuntu/service-images# cd ../
root@ip:/home/ubuntu# ls
doltlab-service-images-v2.3.3.zip  doltlab-v2.3.3.zip  service-images
root@ip:/home/ubuntu# unzip doltlab-v2.3.3.zip -d doltlab
Archive:  doltlab-v2.3.3.zip
  inflating: doltlab/smtp_connection_helper
  inflating: doltlab/installer
  inflating: doltlab/installer_config.yaml
root@ip:/home/ubuntu# cd doltlab

If you have a previous installation of DoltLab on this host, you can simply copy the installer_config.yaml of your previous installation into this doltlab directory, replacing the default one. Just be sure to edit the version field of the old installer_config.yaml to be v2.3.3.

# installer_config.yaml

version: "v2.3.3"
# ...
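If you prefer to script the version bump instead of editing the file by hand, something like the following could work. It is a sketch under assumptions: set_doltlab_version is a hypothetical helper, and it assumes the config contains a single unindented version: line as in the snippet above. A .bak copy of the original file is kept next to it.

```shell
# Set the top-level `version` field of a copied installer_config.yaml.
# Assumes one unindented `version:` line; keeps a backup as <file>.bak.
set_doltlab_version() {
  local config="$1" version="$2"
  sed -i.bak "s/^version:.*/version: \"${version}\"/" "$config"
}

# Usage:
#   set_doltlab_version ./installer_config.yaml v2.3.3
```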

Otherwise, if this is your first time installing DoltLab, you can follow the steps on our Start DoltLab documentation page to edit the installer_config.yaml for your particular setup. This entails defining the host and specifying the passwords your instance should use.

Finally, you'll need to make one additional change to the installer_config.yaml file which will prevent your instance from making the metrics-related egress calls we discussed earlier. Edit installer_config.yaml once more and set the metrics_disabled field to true.

# installer_config.yaml
# ...

## First-party metrics can be disabled by setting `metrics_disabled: true`. Default is `false`.
metrics_disabled: true
# ...
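Before running the installer, a quick sanity check can confirm that both edits made it into the file. The helper below is a hypothetical sketch, not a DoltLab tool. It only greps for the two fields discussed here, it does not parse YAML, and a trailing comment on either line would make it report a failure.

```shell
# Succeeds only if the config pins the expected version and disables metrics.
# Note: dots in the version are treated as regex wildcards, which is
# acceptable for a sanity check but not a strict match.
check_offline_config() {
  local config="$1" version="$2"
  grep -Eq "^version:[[:space:]]*\"?${version}\"?[[:space:]]*$" "$config" \
    || { echo "version is not ${version}" >&2; return 1; }
  grep -Eq '^metrics_disabled:[[:space:]]*true[[:space:]]*$' "$config" \
    || { echo "metrics_disabled is not true" >&2; return 1; }
}

# Usage:
#   check_offline_config ./installer_config.yaml v2.3.3 && ./installer
```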

Save your edits, and run the installer binary to generate DoltLab's static assets.

root@ip:/home/ubuntu/doltlab# ./installer
2024-10-01T21:22:43.488Z	INFO	cmd/main.go:554	Successfully configured DoltLab	{"version": "v2.3.3"}

2024-10-01T21:22:43.488Z	INFO	cmd/main.go:560	To start DoltLab, use:	{"script": "/home/ubuntu/doltlab/start.sh"}
2024-10-01T21:22:43.488Z	INFO	cmd/main.go:565	To stop DoltLab, use:	{"script": "/home/ubuntu/doltlab/stop.sh"}

You can now start your v2.3.3 DoltLab instance by running the ./start.sh script, and your instance will not be making any egress requests!

Conclusion

I want to extend a big thank you to the customer who worked closely with us to get this use case supported. We're always trying to improve our products and are so grateful when the community reaches out to let us know how we can better support them!

If you haven't heard yet, you can reach us anytime on Discord. We'd love to chat with you and learn more about how you want to use Dolt and DoltLab.

Thanks for reading and don't forget to check out each of our cool products below:
