Deploying a Production Ready DoltLab Instance, An Example
This year we launched DoltLab, the self-hosted version of DoltHub. In February, we released the latest version of DoltLab, v0.2.0, which included a number of features and bug fixes. We are actively working on DoltLab's next release, which focuses on improving the DoltLab administrator's experience and on making it a bit easier to submit bug reports and service logs to our team.
As part of our work to make DoltLab a high-quality product, we've recently launched an internal DoltLab instance we use as a staging environment for upcoming DoltLab releases. We've set up this instance to model what DoltLab administrators should do to more easily deploy their own DoltLab instances.
Today I'll be demonstrating how we deployed a DoltLab instance to this staging environment in a way that allows you to also deploy a DoltLab instance to a production ready environment.
We will be deploying our own production DoltLab instance in the near future. Our production instance will be used largely for enterprise client demonstrations, but will be available to the public for viewing, querying, and cloning databases.
Let's get started!
TL;DR Deploying a Production Ready DoltLab Instance
Build an AMI
To aid us in sanely deploying any version of DoltLab to our internal staging environment, we opted to create a custom AMI that installs and configures the dependencies the host machine needs in order to deploy DoltLab shortly after booting.
We also use Packer and Terraform to create the AMI and provision the necessary AWS resources. Please note, a cloud provider is not required to use DoltLab. Internally, we've opted to deploy DoltLab on AWS EC2 since AWS is DoltHub's current cloud provider.
To build an AMI with Packer, we created a file called doltlab_ami.pkr.hcl that includes the following:
packer {
required_plugins {
amazon = {
version = ">= 1.0.4"
source = "github.com/hashicorp/amazon"
}
}
}
variable "sha" {
type = string
}
variable "stage" {
type = string
}
source "amazon-ebs" "ubuntu" {
ami_name = "doltlab-${var.stage}-${var.sha}"
instance_type = "m5a.xlarge"
source_ami_filter {
filters = {
name = "ubuntu/images/*ubuntu-focal-20.04-amd64-server-*"
root-device-type = "ebs"
virtualization-type = "hvm"
}
most_recent = true
owners = ["XXXXXXXXXX"]
}
ssh_username = "ubuntu"
}
build {
name = "doltlab"
source "source.amazon-ebs.ubuntu" {
region = "us-east-1"
assume_role {
role_arn = "arn:aws:iam::XXXXXXXXXXXXXX:role/DoltLabAMIBuilder"
}
}
provisioner "file" {
source = "authorized_keys"
destination = "/home/ubuntu/.ssh/authorized_keys"
}
provisioner "file" {
source = "ubuntu-bootstrap.sh"
destination = "/home/ubuntu/ubuntu-bootstrap.sh"
}
provisioner "file" {
source = "openssl.conf"
destination = "/home/ubuntu/openssl.conf"
}
# create self-signed tls cert for aws-smtp-relay
provisioner "shell" {
inline = [
"openssl req -new -x509 -config /home/ubuntu/openssl.conf -days 24855 -out /home/ubuntu/aws-smtp-relay.crt -keyout /home/ubuntu/aws-smtp-relay.key",
"sudo cp /home/ubuntu/aws-smtp-relay.crt /usr/local/share/ca-certificates/",
"sudo update-ca-certificates",
]
}
# install aws-smtp-relay
provisioner "shell" {
inline = [
"curl -LO https://go.dev/dl/go1.17.7.linux-amd64.tar.gz",
"sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.17.7.linux-amd64.tar.gz",
"sudo /usr/local/go/bin/go install github.com/blueimp/aws-smtp-relay@v1.1.0",
]
}
# create aws-smtp-relay service definition `aws-smtp-relayd`
provisioner "file" {
source = "aws-smtp-relayd.service"
destination = "/tmp/aws-smtp-relayd.service"
}
# create aws-smtp-relay service `aws-smtp-relayd` and enable it to start on boot
provisioner "shell" {
inline = [
"sudo chmod 664 /tmp/aws-smtp-relayd.service",
"sudo chown root:root /tmp/aws-smtp-relayd.service",
"sudo mv /tmp/aws-smtp-relayd.service /etc/systemd/system/aws-smtp-relayd.service",
"sudo systemctl daemon-reload",
"sudo systemctl enable aws-smtp-relayd",
]
}
provisioner "shell" {
inline = [
"sudo apt install unzip",
"curl \"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip\" -o \"awscliv2.zip\"",
"unzip awscliv2.zip",
"sudo ./aws/install",
]
}
post-processor "manifest" {
output = "doltlab_ami_manifest.json"
strip_path = true
}
}
At the start of the file, after declaring the amazon plugin in the packer block, we define two variables whose values are supplied at build time: sha and stage.
We use these variables to differentiate our AMIs based on the commit sha of our DoltLab source code and the stage, or context, this AMI is used in. To build a development AMI we'd set stage=dev, and for production we'd set stage=prod. We can then build the AMI by running:
sha=`git rev-parse HEAD`
packer build -var sha="$sha" -var stage="$stage" ./doltlab_ami.pkr.hcl
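One way to wrap the build command above is a small script that rejects unexpected stage values before invoking Packer. This is a minimal sketch rather than our actual tooling: the function names, and the restriction to the dev and prod stages, are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around `packer build`; the names are illustrative only.

# Print the AMI name that doltlab_ami.pkr.hcl produces for a stage and sha.
ami_name() {
  local stage="$1" sha="$2"
  printf 'doltlab-%s-%s\n' "$stage" "$sha"
}

# Succeed only for the stages discussed in this post (assumption: dev, prod).
valid_stage() {
  case "$1" in
    dev|prod) return 0 ;;
    *) return 1 ;;
  esac
}

# Validate the stage, then build the AMI from the current commit.
build_ami() {
  local stage="$1" sha
  valid_stage "$stage" || { echo "unknown stage: $stage" >&2; return 1; }
  sha="$(git rev-parse HEAD)" || return 1
  echo "building $(ami_name "$stage" "$sha")"
  packer build -var sha="$sha" -var stage="$stage" ./doltlab_ami.pkr.hcl
}
```

After sourcing the file, running build_ami dev from the repository root would build the development AMI.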
The source block in our file defines the host machine and operating system we'll use as the base image for building DoltLab's AMI. Currently we use an m5a.xlarge instance with 4 vCPUs and 16 GB of memory. This is the same instance type we'll use to run DoltLab. This host will run Ubuntu 20.04, as DoltLab v0.2.0 is currently only available for Linux.
The build block specifies the IAM role, DoltLabAMIBuilder, which has permission to build this AMI, and then defines a series of provisioner blocks that do the heavy lifting in this Packer configuration. Let's look at what each of these provisioner blocks contains.
The first provisioner block copies a local file, authorized_keys, containing authorized ssh keys to /home/ubuntu/.ssh/authorized_keys on our DoltLab host. This gives all authorized DoltLab developers and administrators ssh access to the host.
The second provisioner block copies a local file, ubuntu-bootstrap.sh, to /home/ubuntu/ubuntu-bootstrap.sh on the DoltLab host. This script, originally created for this video blog and available here, makes installing DoltLab's dependencies a simple one-line command, so we just add it to our host.
The third provisioner block copies a local file, openssl.conf, to /home/ubuntu/openssl.conf. A subsequent provisioner block references this file to generate a self-signed TLS certificate used for connecting to an SMTP relay server we'll run next to DoltLab. We will look more closely at that a bit later. For now, here are the contents of our openssl.conf file:
[req]
default_bits=2048
default_md=sha256
default_keyfile=aws-smtp-relay.key
encrypt_key=no
prompt=no
distinguished_name=distinguished_name
x509_extensions=x509_ext
[x509_ext]
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
nsCertType=server
keyUsage=digitalSignature,keyEncipherment
extendedKeyUsage=serverAuth
subjectAltName=@alt_names
[distinguished_name]
commonName=localhost
[alt_names]
IP=<AWS EIP>
DNS=<A Record DNS name>
Notice that the end of the file contains two key-value pairs, IP=<AWS EIP> and DNS=<A Record DNS name>. For our internal DoltLab deployment, as with all production deployments, we want our DoltLab AMI to run on a host that resolves to the same IP address and DNS name every time we launch it.
So, we provisioned an AWS EIP and a DNS A Record mapped to the EIP. For the SMTP relay to recognize the host's connection, we add the EIP as the value of IP, and the DNS A Record name as the value of DNS.
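If the EIP and DNS name differ between stages, the [alt_names] section could be generated rather than edited by hand. The helper below is a hypothetical sketch, and the address and hostname in the usage note are placeholders, not values from our deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the [alt_names] section of openssl.conf
# for a given Elastic IP and DNS A Record name.
render_alt_names() {
  local eip="$1" dns_name="$2"
  printf '[alt_names]\nIP=%s\nDNS=%s\n' "$eip" "$dns_name"
}
```

For example, render_alt_names 203.0.113.10 doltlab.example.com could be appended to a copy of openssl.conf that stops just before [alt_names]. Once the certificate has been generated, openssl x509 -in aws-smtp-relay.crt -noout -text shows the subject alternative names it actually carries.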
The next provisioner block is the first of the shell blocks in our doltlab_ami.pkr.hcl file. This block generates the aforementioned TLS certificate using openssl and adds the certificate to the host's certificate store.
The provisioner block after that is also a shell block. It installs golang on the host, which we use immediately to install the AWS SES SMTP relay our DoltLab instance will use to send emails.
Setting up an SMTP relay allows DoltLab to send emails through our existing AWS SES account using IAM roles associated with the host machine. We opted to rely on the host's IAM roles for sending emails so that we do not need to pass secret values to the EMAIL_USERNAME and EMAIL_PASSWORD environment variables required by DoltLab's start script. This provides an additional layer of security for production environments.
The next provisioner block is another file block that copies a local file, aws-smtp-relayd.service, to /tmp/aws-smtp-relayd.service. This file is a simple systemd service definition for the aws-smtp-relay server we installed above. Here's what aws-smtp-relayd.service contains:
[Unit]
Description=aws-smtp-relay service
[Service]
Environment="AWS_REGION=<Region>"
ExecStart=/root/go/bin/aws-smtp-relay -c /home/ubuntu/aws-smtp-relay.crt -k /home/ubuntu/aws-smtp-relay.key -s
[Install]
WantedBy=multi-user.target
This service definition enables us to run the SMTP relay server as a daemon process managed by systemctl. In the next provisioner block, we register this service with systemctl, calling it aws-smtp-relayd, and enable the server process to start when the host boots using sudo systemctl enable aws-smtp-relayd.
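After an instance boots from the AMI, it can be useful to confirm that the relay really is registered and running before starting DoltLab. The check below is a small sketch; the function name is our own invention, while systemctl is-enabled and systemctl is-active are standard systemctl subcommands.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the unit is both enabled at boot
# and currently running. Defaults to the aws-smtp-relayd unit.
relay_ready() {
  local unit="${1:-aws-smtp-relayd}"
  systemctl is-enabled --quiet "$unit" && systemctl is-active --quiet "$unit"
}
```

A typical use would be relay_ready || sudo systemctl status aws-smtp-relayd, which prints the unit's status only when something is wrong.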
Finally, our last provisioner block is another shell block that installs the aws CLI tool on the host. We will use this tool in the EC2 launch template to associate the newly launched EC2 instance with the stable AWS EIP we provisioned earlier. This ensures any instance deployed with this AMI can assign itself our EIP.
Create a Launch Template
After building the AMI with Packer, we created a launch template for our DoltLab deployments that enables launching new EC2 instances using our AMI. Here's what our Terraform file declaring this launch template looks like:
locals {
doltlab-user-data = <<USERDATA
#!/bin/bash
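# request an IMDSv2 session token, then capture instance id
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
InstanceID=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/instance-id)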
# capture eip allocation id
AllocateID=$(aws ec2 describe-tags --filters "Name=tag:Name,Values=dev-eip" --query "Tags[0].ResourceId" --output text)
# Assigning Elastic IP to Instance
aws ec2 associate-address --instance-id $InstanceID --allocation-id $AllocateID
USERDATA
}
resource "aws_launch_template" "doltlab" {
name = "doltlab"
image_id = local.packer_ami_id
instance_type = "m5a.xlarge"
iam_instance_profile {
name = aws_iam_instance_profile.doltlab-instance-profile.name
}
network_interfaces {
associate_public_ip_address = true
subnet_id = aws_subnet.subnet.id
security_groups = [
aws_security_group.security_group.id
]
}
tags = {
Name = "doltlab-dev-instance"
}
block_device_mappings {
device_name = "/dev/sda1"
ebs {
volume_size = 2048
delete_on_termination = true
}
}
metadata_options {
http_endpoint = "enabled"
instance_metadata_tags = "enabled"
}
update_default_version = true
user_data = base64encode(local.doltlab-user-data)
}
Skipping the USERDATA block for now and looking at the aws_launch_template definition, we can see that this launch template will deploy an m5a.xlarge EC2 instance, with local.packer_ami_id set to the ID of the AMI we created with Packer.
The network_interfaces block contains our VPC's subnet ID and the security group to attach to the launched instances.
Configuring the host's security groups for DoltLab is a very important step, since very specific ports must be open to run DoltLab successfully. Here is our security group configuration:
resource "aws_security_group" "security_group" {
name = "doltlab-dev"
description = "Security group for doltlab development"
vpc_id = aws_vpc.vpc.id
}
resource "aws_security_group_rule" "egress" {
description = "Allow doltlab instances to egress"
type = "egress"
protocol = "-1"
from_port = 0
to_port = 0
security_group_id = aws_security_group.security_group.id
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "ssh" {
description = "Allow doltlab instances to ingress ssh"
type = "ingress"
protocol = "tcp"
from_port = 22
to_port = 22
security_group_id = aws_security_group.security_group.id
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "http" {
description = "Allow http connections"
type = "ingress"
protocol = "tcp"
from_port = 80
to_port = 80
security_group_id = aws_security_group.security_group.id
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "doltlab_remote_data_server" {
description = "Allow connections to remote file server"
type = "ingress"
protocol = "tcp"
from_port = 100
to_port = 100
security_group_id = aws_security_group.security_group.id
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "doltlab_remote_api" {
description = "Allow connections to doltlab remote api"
type = "ingress"
protocol = "tcp"
from_port = 50051
to_port = 50051
security_group_id = aws_security_group.security_group.id
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "doltlab_file_service_api" {
description = "Allow connections to doltlab file service api"
type = "ingress"
protocol = "tcp"
from_port = 4321
to_port = 4321
security_group_id = aws_security_group.security_group.id
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "doltlab_aws_smtp_relay" {
description = "Allow connections to aws-smtp-relay"
type = "ingress"
protocol = "tcp"
from_port = 1025
to_port = 1025
security_group_id = aws_security_group.security_group.id
cidr_blocks = ["0.0.0.0/0"]
}
Most of these ports should look familiar if you've seen our DoltLab v0.2.0 setup blog or our DoltLab video blog. But since we are running an SMTP relay in this context, we also open port 1025.
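Once an instance is up and DoltLab is running, a quick way to confirm the security group rules behave as intended is to probe each port from another machine. The script below is a sketch under a few assumptions: the function names are hypothetical, bash and the coreutils timeout command are available, and a port only reports as open when a service is actually listening on it.

```shell
#!/usr/bin/env bash
# Ports opened by the security group rules above.
DOLTLAB_PORTS="22 80 100 4321 50051 1025"

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, hence the explicit bash call.
check_port() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Report the state of every DoltLab port on the given host.
check_doltlab_ports() {
  local host="$1" port
  for port in $DOLTLAB_PORTS; do
    if check_port "$host" "$port"; then
      echo "port ${port}: open"
    else
      echo "port ${port}: unreachable"
    fi
  done
}
```

Running check_doltlab_ports with the instance's DNS name prints one line per port.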
Getting back to the aws_launch_template declaration, we can see that the block_device_mappings block provisions 2 TB of EBS disk for use on our host. We don't plan on pushing or uploading extremely large amounts of data to our DoltLab staging site (for scale, the completed FBI-NIBRS database on DoltHub is 1 TB by itself!), but if that changes, we would increase the amount of disk we provision here to support our use case.
Finally, we enable the metadata_options for the host so that we can retrieve some important information from EC2's metadata service, and include the USERDATA we defined at the top of the file in the user_data field. This USERDATA block runs when the host boots, and it's what allows the host to dynamically map its own IP to our provisioned EIP.
Looking closely at the USERDATA now, we can see the steps that allow this IP reassignment.
First, we fetch the InstanceID of the newly launched instance by curling the EC2 metadata endpoint:
InstanceID=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/instance-id)
Next, we fetch the allocation ID associated with the EIP we provisioned, using the aws CLI tool we installed in our AMI, and store it in AllocateID:
AllocateID=$(aws ec2 describe-tags --filters "Name=tag:Name,Values=dev-eip" --query "Tags[0].ResourceId" --output text)
Then we just need to use associate-address to map this new host to our EIP:
aws ec2 associate-address --instance-id $InstanceID --allocation-id $AllocateID
Now, when launching an instance from this launch template, you can see the IP remapping happen in the AWS console shortly after launch. It's kinda cool!
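A more defensive variant of these steps could fail fast when the metadata request or the EIP lookup comes back empty, instead of passing a blank value to associate-address. The following is a sketch, not what our launch template runs: the function names are hypothetical, and it assumes the aws CLI prints None in text output when the tag query matches nothing.

```shell
#!/usr/bin/env bash
# Sketch of a defensive EIP association; function names are illustrative.
IMDS="http://169.254.169.254/latest"

# Request a short-lived IMDSv2 session token from the metadata service.
imds_token() {
  curl -sf -X PUT "${IMDS}/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300"
}

# Look up this instance's ID using the session token.
instance_id() {
  local token
  token="$(imds_token)" || return 1
  curl -sf -H "X-aws-ec2-metadata-token: ${token}" "${IMDS}/meta-data/instance-id"
}

# Resolve the allocation ID of the EIP tagged Name=<tag>.
# Assumption: text output is "None" when no tag matches.
eip_allocation_id() {
  local tag="$1" id
  id="$(aws ec2 describe-tags --filters "Name=tag:Name,Values=${tag}" \
        --query "Tags[0].ResourceId" --output text)" || return 1
  [ -n "$id" ] && [ "$id" != "None" ] || return 1
  printf '%s\n' "$id"
}

# Associate the tagged EIP with this instance, stopping on any miss.
associate_eip() {
  local tag="$1" iid aid
  iid="$(instance_id)" || { echo "could not read instance id" >&2; return 1; }
  aid="$(eip_allocation_id "$tag")" || { echo "no EIP tagged ${tag}" >&2; return 1; }
  aws ec2 associate-address --instance-id "$iid" --allocation-id "$aid"
}
```

In user data this would end with a single call such as associate_eip dev-eip, so a failed lookup shows up in the boot logs rather than as a confusing CLI error.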
Start DoltLab
The final remaining step is to ssh into the newly launched host and start DoltLab. To do this, first run the script that installs DoltLab's dependencies and downloads the version of DoltLab you want to run (which should be v0.2.0 or higher):
chmod +x ubuntu-bootstrap.sh
sudo ./ubuntu-bootstrap.sh with-sudo v0.2.0
After the script finishes, the host will have an unzipped directory called doltlab that contains the resources needed to run DoltLab using the doltlab/start-doltlab.sh script. This script will start DoltLab's services using docker-compose in daemon mode.
There is one additional change we need to make in order to enable DoltLab to successfully connect to our aws-smtp-relayd process listening on 1025.
We need to modify doltlab/docker-compose.yaml. Under the doltlabapi section, in the volumes definition, we mount the host's certificate bundle (which now includes the certificate used for TLS connections to the SMTP relay server) into the container running doltlabapi. This file should be mounted to the same path:
...
  doltlabapi:
    ...
    volumes:
      ...
      - /etc/ssl/certs/ca-certificates.crt:/etc/ssl/certs/ca-certificates.crt
    ...
...
Once we've updated our docker-compose.yaml file, we need to change groups in our shell so we can run docker without using sudo:
sudo newgrp docker
Now we can run the doltlab/start-doltlab.sh script with the proper environment variables. Note that EMAIL_USERNAME and EMAIL_PASSWORD are required by the script, but can be set to nonsense values, as the host's IAM roles will be used to authenticate emails sent by DoltLab:
HOST_IP=<Host IP or DNS Name> \
POSTGRES_PASSWORD=<Password> \
DOLTHUBAPI_PASSWORD=<Password> \
POSTGRES_USER=dolthubadmin \
EMAIL_USERNAME=not-used \
EMAIL_PASSWORD=not-used \
EMAIL_PORT=1025 EMAIL_HOST=<Host IP or DNS Name> \
NO_REPLY_EMAIL=<An Email Address to Receive No Reply Messages> \
./start-doltlab.sh
HOST_IP will contain the DNS A Record we provisioned for our DoltLab instance, and we also supply this value to EMAIL_HOST since our SMTP relay is also running on this host. POSTGRES_PASSWORD and DOLTHUBAPI_PASSWORD can be chosen by the deployer, but POSTGRES_USER must be dolthubadmin.
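Since a missing variable is easy to overlook in a command this long, a pre-flight check can catch mistakes before any containers start. This is a hypothetical sketch: the function name is ours, and the variable list is simply copied from the command above.

```shell
#!/usr/bin/env bash
# Variables passed to start-doltlab.sh in the command above.
REQUIRED_VARS="HOST_IP POSTGRES_PASSWORD DOLTHUBAPI_PASSWORD POSTGRES_USER EMAIL_USERNAME EMAIL_PASSWORD EMAIL_PORT EMAIL_HOST NO_REPLY_EMAIL"

# Print each problem found; return non-zero if there is at least one.
check_doltlab_env() {
  local name value status=0
  for name in $REQUIRED_VARS; do
    # Indirect lookup of the variable named by $name (names come from the fixed list).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      status=1
    fi
  done
  # POSTGRES_USER has a fixed required value.
  if [ "${POSTGRES_USER:-}" != "dolthubadmin" ]; then
    echo "POSTGRES_USER must be dolthubadmin"
    status=1
  fi
  return "$status"
}
```

With the variables exported in the current shell, check_doltlab_env && ./start-doltlab.sh only starts DoltLab when everything is set.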
After the script completes, the running DoltLab services can be seen with docker ps:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c1087c9f6004 public.ecr.aws/dolthub/doltlab/dolthub-server:v0.2.0 "docker-entrypoint.s…" 9 days ago Up 9 days 3000/tcp doltlab_doltlabui_1
a63aade4a36e public.ecr.aws/dolthub/doltlab/dolthubapi-graphql-server:v0.2.0 "docker-entrypoint.s…" 9 days ago Up 9 days 9000/tcp doltlab_doltlabgraphql_1
5b2cad62d4e5 public.ecr.aws/dolthub/doltlab/dolthubapi-server:v0.2.0 "/app/go/services/do…" 9 days ago Up 9 days doltlab_doltlabapi_1
e6268950f987 public.ecr.aws/dolthub/doltlab/doltremoteapi-server:v0.2.0 "/app/go/services/do…" 9 days ago Up 9 days 0.0.0.0:100->100/tcp, :::100->100/tcp, 0.0.0.0:50051->50051/tcp, :::50051->50051/tcp doltlab_doltlabremoteapi_1
52f39c016537 public.ecr.aws/dolthub/doltlab/fileserviceapi-server:v0.2.0 "/app/go/services/fi…" 9 days ago Up 9 days doltlab_doltlabfileserviceapi_1
0f952e7c7007 envoyproxy/envoy-alpine:v1.18-latest "/docker-entrypoint.…" 9 days ago Up 9 days 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:4321->4321/tcp, :::4321->4321/tcp, 10000/tcp doltlab_doltlabenvoy_1
204e0274798b public.ecr.aws/dolthub/doltlab/postgres-server:v0.2.0 "docker-entrypoint.s…" 9 days ago Up 9 days 5432/tcp doltlab_doltlabdb_1
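Rather than reading this output by eye, a short check can confirm that every service has a running container. The sketch below uses the service names from the output above; the function name is hypothetical, and it matches on the service name alone because the container name prefix and suffix may differ between docker-compose versions.

```shell
#!/usr/bin/env bash
# Service names taken from the docker ps output above.
DOLTLAB_SERVICES="doltlabui doltlabgraphql doltlabapi doltlabremoteapi doltlabfileserviceapi doltlabenvoy doltlabdb"

# Print services with no running container; return non-zero if any are missing.
missing_doltlab_services() {
  local running svc status=0
  running="$(docker ps --format '{{.Names}}')" || return 2
  for svc in $DOLTLAB_SERVICES; do
    # Require a separator on both sides so doltlabapi does not match
    # doltlabremoteapi or doltlabfileserviceapi.
    if ! printf '%s\n' "$running" | grep -Eq "[_-]${svc}[_-]"; then
      echo "not running: $svc"
      status=1
    fi
  done
  return "$status"
}
```

Calling missing_doltlab_services prints nothing and returns 0 when all seven services are up.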
Conclusion
If you're using DoltLab, or want to start, please don't hesitate to contact us here or on Discord in the #doltlab channel. We are happy to help you out and make sure that DoltLab delivers great value to support your use case.
Stay tuned for more DoltLab updates headed your way soon!