Automate Your Database Workflow with the New DoltHub API

February 24, 2023


We are excited to announce the release of the new DoltHub API, designed to make it easier to manage and collaborate on databases programmatically. The DoltHub API offers a range of features that can help streamline your work and improve productivity.

We had a number of users, especially during bounties, programming against the "unofficial" GraphQL API to automate things like creating Pull Requests. We've designed the new "official" API with those use cases in mind.

Key Features of the DoltHub API

The first release covers four operations that help you automate common database tasks:

Creating databases: With the new DoltHub API, you can create new databases programmatically using a POST request to the database endpoint. This can help you automate the process of creating and setting up new databases, like when you are using DoltHub as a replication broker.
Creating pull requests: You can create pull requests on your databases to automate data insert and update workflows. You can open a pull request by making a POST request to the {owner}/{database}/pulls endpoint.
Adding pull request comments: The API also enables you to add comments to pull requests programmatically, allowing automated tools to post comments if your data quality control checks fail.
Merging pull requests: Finally, merge pull requests programmatically to help streamline the process of integrating changes into your databases after automated tests pass. You can merge a pull request by making a POST request to the {owner}/{database}/pulls/{pull_id}/merge endpoint, and get the merge operation status by sending a GET request to the same endpoint with the operation_name.

How to use the DoltHub API

To get started with the DoltHub API, you first need to create a DoltHub account. You'll need an API token to authenticate your API requests, which you can create in your account settings. The API is built using standard REST principles, and all responses are returned in JSON format.
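Every example in this post sends the same authorization header to the same base URL, so a script that makes several calls may want to build both in one place. The following is a minimal sketch, not part of the API: API_BASE, auth_headers, and endpoint are hypothetical names chosen for illustration, and the header format simply mirrors the examples below.

```python
# Hypothetical helpers; none of these names come from DoltHub itself.
API_BASE = "https://www.dolthub.com/api/v1alpha1"


def auth_headers(token):
    """Return the authorization header in the form used by this post's examples."""
    return {"authorization": "token {}".format(token)}


def endpoint(*parts):
    """Join path segments onto the API base URL."""
    return "/".join([API_BASE] + [str(part) for part in parts])


if __name__ == "__main__":
    print(endpoint("dolthub", "museum-collections", "pulls"))
    print(auth_headers("YOUR_API_TOKEN"))
```

The returned values can be passed straight to requests.post or requests.get as the url and headers arguments.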

Creating Databases

To create a new database, make a POST request to the database endpoint, providing the ownerName, repoName, and visibility for the new database.

Here's an example of how to create a new database called museum-collections under the organization dolthub using an authorization token.

import requests

url = 'https://www.dolthub.com/api/v1alpha1/database'

headers = {
  'authorization': 'token YOUR_API_TOKEN'
}

data = {
  "description": "Records from museums around the world.",
  "ownerName": "dolthub",
  "repoName": "museum-collections",
  "visibility": "public"
}

response = requests.post(url, headers=headers, json=data)

The JSON response returned:

{
  "status": "Success",
  "description": "Records from museums around the world.",
  "repository_owner": "dolthub",
  "repository_name": "museum-collections",
  "visibility": "public"
}
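The sample responses in this post all carry a "status" field set to "Success". A script might want to stop early when it sees anything else. The sketch below shows one way to do that. ensure_success is a hypothetical helper, and the idea that a failed call returns a different status value (or none at all) is an assumption, so check the API documentation for the actual error shape.

```python
def ensure_success(payload):
    """Return the parsed JSON payload, or raise if it does not report success.

    Assumption: a failed call does not carry "status": "Success".
    """
    if payload.get("status") != "Success":
        raise RuntimeError("DoltHub API call did not succeed: {!r}".format(payload))
    return payload


if __name__ == "__main__":
    sample = {
        "status": "Success",
        "repository_owner": "dolthub",
        "repository_name": "museum-collections",
        "visibility": "public",
    }
    print(ensure_success(sample)["repository_name"])
```

With the requests example above, this would be called as ensure_success(response.json()).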

Creating Pull Requests

To create a pull request on a database, make a POST request to the {owner}/{database}/pulls endpoint.

Here is an example of opening a pull request on the museum-collections database with data from the Los Angeles County Museum of Art (LACMA). This data was added to the lacma branch of liuliu/museum-collections, a fork of the dolthub/museum-collections database owned by liuliu. We would like to eventually merge the lacma branch from liuliu/museum-collections into the main branch of dolthub/museum-collections.

import requests

from_owner_name, to_owner_name, database_name = "liuliu", "dolthub", "museum-collections"

url = "https://www.dolthub.com/api/v1alpha1/{}/{}/pulls".format(to_owner_name, database_name)

headers = {
  'authorization': 'token YOUR_API_TOKEN'
}

data = {
  "title": "LACMA data",
  "description": "Records from the Los Angeles County Museum of Art.",
  "fromBranchOwnerName": from_owner_name,
  "fromBranchRepoName": database_name,
  "fromBranchName": "lacma",
  "toBranchOwnerName": to_owner_name,
  "toBranchRepoName": database_name,
  "toBranchName": "main"
}

response = requests.post(url, headers=headers, json=data)

A successful JSON response includes a pull_id:

{
  "status": "Success",
  "title": "LACMA data",
  "description": "Records from the Los Angeles County Museum of Art.",
  "from_owner_name": "liuliu",
  "from_repository_name": "museum-collections",
  "from_branch_name": "lacma",
  "to_owner_name": "dolthub",
  "to_repository_name": "museum-collections",
  "to_branch_name": "main",
  "pull_id": "66"
}
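The request body repeats the database name on both sides, and the pull_id in the response is what later calls need. The sketch below wraps both steps. The function names are hypothetical, and it assumes the fork keeps the same database name as the upstream, as in the example above.

```python
def pull_request_payload(title, description, from_owner, from_branch,
                         to_owner, database, to_branch="main"):
    """Build the JSON body for a pull request from a fork to its upstream.

    Assumption: the fork and the upstream share one database name.
    """
    return {
        "title": title,
        "description": description,
        "fromBranchOwnerName": from_owner,
        "fromBranchRepoName": database,
        "fromBranchName": from_branch,
        "toBranchOwnerName": to_owner,
        "toBranchRepoName": database,
        "toBranchName": to_branch,
    }


def pull_comments_url(created):
    """Derive the comments endpoint from a successful create-pull response."""
    return "https://www.dolthub.com/api/v1alpha1/{}/{}/pulls/{}/comments".format(
        created["to_owner_name"], created["to_repository_name"], created["pull_id"]
    )


if __name__ == "__main__":
    body = pull_request_payload("LACMA data", "LACMA records.", "liuliu", "lacma",
                                "dolthub", "museum-collections")
    print(body["toBranchName"])
```

Passing response.json() from the request above to pull_comments_url yields the URL used in the comment example in the next section.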

Creating Pull Request Comments

To make a comment on a pull request, like when our automated data quality tests pass, make a POST request to the {owner}/{database}/pulls/{pull_id}/comments endpoint.

Here is an example of adding a pull request comment using an authorization token.

import requests

url = 'https://www.dolthub.com/api/v1alpha1/dolthub/museum-collections/pulls/66/comments'

headers = {
  'authorization': 'token YOUR_API_TOKEN'
}

data = {
  "comment": "The pull request looks good!"
}

response = requests.post(url, headers=headers, json=data)

The JSON response:

{
  "status": "Success",
  "repository_owner": "dolthub",
  "repository_name": "museum-collections",
  "pull_id": "66",
  "comment": "The pull request looks good!"
}
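For the automated data quality use case, the comment text would usually come from the check results rather than a fixed string. Here is a rough sketch of that step. build_check_comment and the check names are invented for illustration, and only the "comment" key comes from the request body shown above.

```python
def build_check_comment(results):
    """Turn check results into a pull request comment body.

    results maps a check name to True (passed) or False (failed).
    The check names are whatever your own tooling defines.
    """
    failed = sorted(name for name, passed in results.items() if not passed)
    if failed:
        text = "Data quality checks failed: {}.".format(", ".join(failed))
    else:
        text = "All {} data quality checks passed.".format(len(results))
    return {"comment": text}


if __name__ == "__main__":
    # Hypothetical check names, for illustration only.
    print(build_check_comment({"no_null_ids": True, "valid_dates": False}))
```

The returned dictionary can be sent as the json argument of the POST request to the comments endpoint.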

Merging Pull Requests

To merge a pull request, make a POST request to the {owner}/{database}/pulls/{pull_id}/merge endpoint.

Here is an example of merging pull request #66 on the museum-collections database using an authorization token. Note that the merge is asynchronous: the request creates an operation that you poll to get the result.

import requests

url = 'https://www.dolthub.com/api/v1alpha1/dolthub/museum-collections/pulls/66/merge'

headers = {
  'authorization': 'token YOUR_API_TOKEN'
}

response = requests.post(url, headers=headers)

A successful JSON response includes the operation_name:

{
  "status": "Success",
  "repository_owner": "dolthub",
  "repository_name": "museum-collections",
  "pull_id": "66",
  "operation_name": "operations/b09a9221-9dcb-4a15-9ca8-a64656946f12"
}

To poll the operation and check its status, you can use the operation_name from the returned response of the merge request to query the API. Once the operation is complete, the response will contain a job_id field indicating the job that's running the merge, as well as other information such as the repository_owner, repository_name, and pull_id.

Keep in mind that the time it takes for the merge operation to complete can vary depending on the size of the pull request and the complexity of the changes being merged.

import requests

url = 'https://www.dolthub.com/api/v1alpha1/dolthub/museum-collections/pulls/66/merge'

headers = {
  'authorization': 'token YOUR_API_TOKEN'
}

data = {
  "operationName": "operations/b09a9221-9dcb-4a15-9ca8-a64656946f12"
}

response = requests.get(url, headers=headers, json=data)

The JSON response includes a job_id field:

{
  "status": "Success",
  "operation_name": "operations/b09a9221-9dcb-4a15-9ca8-a64656946f12",
  "done": true,
  "job_id": "1",
  "repository_owner": "dolthub",
  "repository_name": "museum-collections",
  "pull_id": "66"
}

You can view the job status on the DoltHub website. We will provide an API endpoint for querying jobs in the future.
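Because the merge is asynchronous, a script typically repeats the status request until the operation finishes. The loop below is one possible sketch. wait_for_merge is a hypothetical helper, and it assumes the "done" field is false or absent while the operation is still in progress, which the sample response above does not confirm. The status request itself is passed in as a callable, so the loop does not depend on any particular HTTP library.

```python
import time


def wait_for_merge(fetch_status, interval_seconds=2.0, timeout_seconds=300.0,
                   sleep=time.sleep):
    """Call fetch_status() until the payload reports done, then return it.

    fetch_status is any zero-argument callable returning the parsed JSON
    of the status request. Raises TimeoutError if the deadline passes.
    Assumption: "done" is false or missing while the merge is running.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        payload = fetch_status()
        if payload.get("done"):
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("merge operation did not finish in time")
        sleep(interval_seconds)


if __name__ == "__main__":
    # Simulated status responses standing in for real API calls.
    fake = iter([{"done": False}, {"done": True, "job_id": "1"}])
    print(wait_for_merge(lambda: next(fake), interval_seconds=0.01))
```

In a real script, fetch_status could be lambda: requests.get(url, headers=headers, json=data).json(), reusing the url, headers, and data from the polling example above.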

Conclusion

The new DoltHub API makes it easier than ever to work with your databases programmatically. Whether you're a developer, data scientist, or one of our bounty hunters, the DoltHub API has you covered. See our API documentation for more information. Try it out today and let us know what you think on Discord.
