Productionizing Dolt
So you've been using Dolt for a while, and you are ready to take the next step and use it in production. Whether you run your production environment on-prem or in the cloud, there are several steps you should take to ensure that your Dolt database is production ready. Any production system should be monitored, backed up, and have a plan for disaster recovery. Dolt is no different. In this post, I'll cover some of the things you should consider when taking Dolt to production.
Configuring Your Server
Dolt sql-server can be configured using command line arguments or with a YAML config file. Command line configuration only provides access to a subset of the options, so we will launch Dolt with a config file like so:
dolt sql-server --config config.yaml
Initially we will configure the network host and port that our server will listen on. We will add more to our config file as we go, but our initial file looks like:
listener:
  host: 0.0.0.0
  port: 3306
This will start our server listening on port 3306 on all network interfaces. You can change this to be whatever is appropriate for your environment.
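Before pointing clients at the server, it is worth verifying that the listener is actually accepting connections. A minimal sketch; the `listener_ready` helper is our own, not part of Dolt:

```python
import socket

def listener_ready(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Check the listener we just configured, e.g.:
# listener_ready("127.0.0.1", 3306)
```

This only confirms the TCP listener is up; a real health check would also run a trivial query over a MySQL connection.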
Users and Grants
The first thing you will want to do is lock down who has access to your databases. Dolt uses standard MySQL user and grant statements to control user access and privileges. Additionally, we will want to change the root account username and password. The root account is the default superuser account that has full access to the database. We will set up the location where users and grants get stored and set our root user at the same time by adding the following entries to our server's config.yaml file.
privilege_file: "privs.json"
user:
  name: "my-root-user-name"
  password: "my-strong-password"
Setting the privilege_file will store our users and grants in a file called privs.json in the working directory of the server. You may change the file's storage location, but having a known location will help us later when we need to back up our database server.
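The lock-down itself happens with ordinary SQL once the server is running. A hedged sketch that renders the statements; the database, user, and host names are placeholders, and in practice you would execute these over a MySQL connection:

```python
def lockdown_statements(db: str, app_user: str, host: str, password: str) -> list[str]:
    """Build standard MySQL user/grant statements for a least-privilege app user.

    Dolt accepts these statements as-is; only the names here are examples.
    """
    return [
        f"CREATE USER '{app_user}'@'{host}' IDENTIFIED BY '{password}';",
        # Grant only what the application needs, not ALL PRIVILEGES.
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON {db}.* TO '{app_user}'@'{host}';",
        "FLUSH PRIVILEGES;",
    ]

for stmt in lockdown_statements("mydb", "app", "%", "app-password"):
    print(stmt)
```

Restricting the application user to DML on a single database means a leaked application credential cannot drop tables or alter grants.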
Monitoring and Alerting
Prerequisites
Monitoring is a critical part of any production system. It allows you to track the health of your system and catch issues before they become problems. Before you can monitor your Dolt database, you need to choose a monitoring solution for both your metrics and logs, and implement a way to access and view your data. Some solutions, like AWS CloudWatch, provide both metrics and logs in a single offering, along with a means of building dashboards to view your data, all using Amazon's cloud console to provide secure access. Other solutions, like the ELK Stack (Elasticsearch, Logstash, Kibana), give you more control of your data but require more setup and maintenance. Additionally, you will need to handle authentication and access control to ensure that only authorized users can access your monitoring data. Many other popular monitoring solutions exist as well.
Metrics
Now that you have a place to store your metrics and logs, you will want to track both system and application metrics. I will leave system metric tracking to you, as cloud instances, instances in your datacenter, and Kubernetes deployments each have different metrics you will want to track, as does the operating system you are running on.
Dolt provides a single HTTP endpoint that must be configured to expose metrics. To do this we will add a metrics section to the config. This section contains both a host and a port entry and should look something like:
metrics:
  host: "0.0.0.0"
  port: 8000
Again, you should change these values to whatever is appropriate for your environment. Our server here will listen on port 8000 for requests to /metrics. This endpoint returns a Prometheus-compatible response that can be scraped by your monitoring solution (the Prometheus metrics format is very popular and is compatible with most systems). You can inspect the metrics by simply sending a command line request or using a browser to access the endpoint.
~>curl http://127.0.0.1:8000/metrics
# HELP dss_concurrent_connections Number of clients concurrently connected to this instance of dolt sql server
# TYPE dss_concurrent_connections gauge
dss_concurrent_connections 0
# HELP dss_concurrent_queries Number of queries concurrently being run on this instance of dolt sql server
# TYPE dss_concurrent_queries gauge
dss_concurrent_queries 0
# HELP dss_connects Count of server connects
# TYPE dss_connects counter
dss_connects 0
# HELP dss_disconnects Count of server disconnects
# TYPE dss_disconnects counter
dss_disconnects 0
# HELP dss_dolt_version The version of dolt currently running on the machine
# TYPE dss_dolt_version gauge
dss_dolt_version 1.9529745e+07
# HELP dss_query_duration Histogram of dolt sql server query runtimes
# TYPE dss_query_duration histogram
dss_query_duration_bucket{le="0.01"} 0
dss_query_duration_bucket{le="0.1"} 0
dss_query_duration_bucket{le="1"} 0
dss_query_duration_bucket{le="10"} 0
dss_query_duration_bucket{le="100"} 0
dss_query_duration_bucket{le="1000"} 0
dss_query_duration_bucket{le="+Inf"} 0
dss_query_duration_sum 0
dss_query_duration_count 0
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.8962e-05
go_gc_duration_seconds{quantile="0.25"} 0.000175573
go_gc_duration_seconds{quantile="0.5"} 0.000192408
go_gc_duration_seconds{quantile="0.75"} 0.000298023
go_gc_duration_seconds{quantile="1"} 0.000298023
go_gc_duration_seconds_sum 0.000684966
go_gc_duration_seconds_count 4
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 27
# ... and many more metrics
Something to note is that there is no authentication on the metrics endpoint. This means that anyone who can reach the endpoint can see the metrics, so you will want to make sure that the endpoint is not exposed to the public internet.
Inspecting the returned metrics, you will see that metrics beginning with the prefix dss_ are our Dolt SQL Server application metrics, while metrics beginning with go_ are Go runtime metrics.
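Because the endpoint speaks the Prometheus text format, a few lines of parsing are enough to pull a single value out of a scrape for quick sanity checks before a full monitoring stack is in place. The parser below is a simplified sketch of our own that skips labeled series:

```python
def parse_metric(body: str, name: str):
    """Extract a single unlabeled metric value from a Prometheus text exposition."""
    for line in body.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        parts = line.split()
        if len(parts) == 2 and parts[0] == name:
            return float(parts[1])
    return None

sample = """\
# TYPE dss_concurrent_connections gauge
dss_concurrent_connections 0
# TYPE dss_connects counter
dss_connects 12
"""
print(parse_metric(sample, "dss_connects"))  # -> 12.0
```

In production you would let Prometheus (or an equivalent scraper) handle this, but the same parsing is handy in smoke tests and deploy scripts.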
Alerting
Having metrics is great, but you aren't going to sit and watch them all day. You need a way to be alerted when something goes wrong. For this you will want to integrate with an alerting system. Many monitoring solutions provide alerting as part of their offering, but dedicated alerting systems are available as well.
Alerting systems allow you to set up rules that trigger alerts when certain conditions are met. For example, you could set up an alert that triggers when query latency exceeds a certain threshold, or when CPU usage is too high. These systems generally provide ways to set up on-call rotations, and to escalate alerts that are not acknowledged within a certain amount of time.
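As a concrete example, the dss_query_duration histogram shown earlier exposes _sum and _count series, so mean query latency is simply their ratio, and an alert rule can fire when that ratio crosses a threshold. A minimal sketch of the evaluation logic; the threshold value is illustrative:

```python
def mean_latency_alert(duration_sum: float, duration_count: float,
                       threshold_seconds: float = 1.0) -> bool:
    """Fire when the mean query duration exceeds the threshold."""
    if duration_count == 0:
        return False  # no queries yet, nothing to alert on
    return duration_sum / duration_count > threshold_seconds

# e.g. 120 queries taking 300s in total -> mean 2.5s -> alert fires
print(mean_latency_alert(300.0, 120.0))  # -> True
```

Real alerting systems evaluate rules like this over a sliding window of scrapes, which smooths out momentary spikes.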
Logging
So now you have metrics and alerts, but what do you do when you get an alert? You need logs. Logs are a critical part of troubleshooting and debugging issues in your system. By default, Dolt logs to stdout and stderr, but you can configure it to log to a file as well using the command line flag --out-and-err. This will log both stdout and stderr to a file. Additionally, you can configure the log level in your dolt sql-server config file. The log_level entry can be set to one of debug, info, warning, error, or fatal. The default is info.
log_level: "info"
If we are writing our logs to a file, we will need to set up log rotation to prevent the log file from growing too large. Most logging solutions provide log rotation as part of their offering, but a standalone tool like logrotate works as well.
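If you capture Dolt's output yourself, for example with a wrapper process that tails and re-emits the server's log stream, Python's standard library handles rotation directly. A sketch of our own, not anything Dolt ships; the logger name and limits are illustrative:

```python
import logging
from logging.handlers import RotatingFileHandler

def rotating_logger(path: str, max_bytes: int = 10_000_000,
                    backups: int = 5) -> logging.Logger:
    """A logger that rolls its file over at max_bytes, keeping `backups` old files."""
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("dolt-wrapper")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Rotated files are written as path.1, path.2, and so on, with the oldest dropped once `backups` copies exist.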
Once we are happy with the way our logs are being written and rotated, we can send them to our log aggregation system. This system will store and index our logs so that we can search and filter them. The specifics will depend on the system you are using, but most systems provide a way to send logs over HTTP or TCP. As with our metrics, we will need to set up authentication and access control to ensure that only authorized users can read the logs, as it is possible that sensitive information could be logged.
Disaster Recovery
No matter how good your database is, things can go wrong. Disks can fail, data can be corrupted, and servers can go down. You need to be prepared for these events, and have a plan in place to recover from them.
Backups
The first step in being able to recover from a disaster is to have a backup of your data. With a backup, we can restore our data to a point in time before the disaster occurred. There are a couple of strategies we can use to back up our data:
call dolt_push
The simplest way to back up your data is to set up a Dolt remote for each database and use dolt_push to push your data to the remote. This works well for backing up one branch of a database at a time, but if you want to inspect and push all branches of all databases, you'll need to write a script to do so. dolt_push pushes incremental updates of your database to the remote, which is more efficient than storing the entire database each time. Note, however, that it does not back up changes to the working set of a branch, only committed data.
If disaster does strike, and you want to stand up a new Dolt sql-server instance with the data from the remotes, you can simply clone the remote for each database.
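The per-branch scripting mentioned above can be sketched in a few lines. This builds the statements you would run against the server; the database, branch, and remote names are placeholders, and in practice you would execute the statements over a MySQL connection:

```python
def push_all_statements(databases: dict, remote: str = "backup") -> list:
    """For each database and each of its branches, emit a dolt_push call.

    `databases` maps database name -> list of branch names. A real script
    would discover both by querying the server first.
    """
    stmts = []
    for db, branches in databases.items():
        stmts.append(f"USE `{db}`;")
        for branch in branches:
            stmts.append(f"CALL dolt_push('{remote}', '{branch}');")
    return stmts

for s in push_all_statements({"mydb": ["main", "feature-x"]}):
    print(s)
```

Run on a schedule, this keeps every branch of every database mirrored to its remote between full backups.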
call dolt_backup
The alternative is to use the dolt_backup stored procedure. Unlike dolt_push, dolt_backup backs up the entire database, including all branches and their working sets, tags, and remote tracking refs. You will still need to back up each database individually.
dolt_backup can be executed in two ways. The first is to configure a named URL for the backup and then execute CALL dolt_backup('--sync', 'named-url'); to back up the database to the named URL. You can also skip the URL naming step and execute CALL dolt_backup('--sync-url', <DESTINATION>); to back up the database to the URL directly. The backup URL format supports aws, gs, and file URL schemes. Examples of each are:
aws://[DYNAMO_TABLE:/S3_BUCKET_NAME]/path/to/backup/to
gs://BUCKET_NAME/path/to/backup/to
file://path/to/backup/to
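A periodic backup job then just needs to render the CALL statement with an appropriate destination. A sketch of our own that builds a timestamped file:// destination; the base path and database name are placeholders:

```python
from datetime import datetime, timezone

def backup_statement(base_dir: str, db: str, now: datetime = None) -> str:
    """Render a dolt_backup sync to a timestamped file:// destination."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return f"CALL dolt_backup('--sync-url', 'file://{base_dir}/{db}/{stamp}');"

print(backup_statement("/var/backups/dolt", "mydb"))
```

Timestamped destinations give you point-in-time copies to roll back to; syncing repeatedly to a single URL instead keeps one up-to-date copy.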
Because dolt_backup runs as a stored procedure in a SQL session, we want to set up credentials to access the backup storage location on a per-user basis. This is currently only supported for AWS backups, and is done by providing an AWS credentials file and an AWS region in the server's config.yaml file.
user_session_vars:
  - name: "SQL_USER_NAME"
    vars:
      aws_credentials_file: "/path/to/aws/credentials/file"
      aws_credentials_region: "AWS_REGION_WHERE_BACKUP_WILL_BE_STORED"
This will allow only the user named SQL_USER_NAME to use the credentials stored in the file /path/to/aws/credentials/file to access AWS resources like the S3 bucket where the backup will be stored and the DynamoDB table where the backup metadata will be stored. An example of the credentials file is:
[default]
aws_access_key_id=ASIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
aws_session_token = IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZVERYLONGSTRINGEXAMPLE
Additional Things to Backup
In addition to backing up your data, you will want to back up your server configuration and your users and grants. This is as simple as copying our config.yaml and privs.json files to our backup storage location. The script that takes periodic backups of all of your databases should copy these files as well.
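Copying those files alongside the data backups can be sketched with the standard library; the destination path is a placeholder:

```python
import shutil
from pathlib import Path

def backup_config(dest_dir: str,
                  files: tuple = ("config.yaml", "privs.json")) -> list:
    """Copy the server config and privilege files into dest_dir; return copied paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in files:
        src = Path(name)
        if src.exists():  # skip quietly if the server keeps the file elsewhere
            copied.append(str(shutil.copy2(src, dest / src.name)))
    return copied
```

shutil.copy2 preserves file timestamps, which makes it easy to tell later which backup a given config copy belongs to.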
Replication
Backups are great, but what if you could recover from disaster with little to no downtime? With one or more read replicas, you can do just that. Read replicas are copies of your database that are kept in sync with the primary database. If the primary database goes down, you can promote a read replica to be the primary database and continue serving traffic.
Replication is a complex topic that is covered in depth in our replication documentation.
Conclusion
Taking any system to production is a big step, and Dolt is no different. There are numerous considerations and steps you need to take to ensure that your system is production ready. If you have a lot of experience taking systems to production, and you already have monitoring, alerting, and a backup storage location in place, then this shouldn't be anything you haven't done before. However, if you don't have experience with these things, or if you just don't want to deal with them yourself, Hosted Dolt is a great option. We take care of all of these things for you, so you can focus on building your application. Additionally, if an alarm does sound, or if you are having production issues, we have a team of experienced Dolt engineers who will respond to alerts and user-created incident reports. Check out https://hosted.doltdb.com/ for more information, or come and talk to us on Discord.