Getting Started: SQLAlchemy and Doltgres

Here at DoltHub, we built Doltgres, the world's first version-controlled PostgreSQL database. Doltgres lets you branch, fork, clone, merge, and diff your relational data, in all the same ways Git lets you work with files. Doltgres is PostgreSQL-compatible so all the tools you're used to using with PostgreSQL work the same way with Doltgres, like SQLAlchemy.

SQLAlchemy is an extremely popular Python Object Relational Mapper (ORM) that augments standard Python PostgreSQL connection libraries. The main features are reflection, a query builder, and a mapped object interface for writes.

SQLAlchemy has been very popular with users of Dolt, our MySQL-compatible version controlled database, and we're excited to announce that SQLAlchemy works with Doltgres now, too. Two years ago we published a getting started walkthrough on using Dolt with SQLAlchemy and this post provides a similar getting started walkthrough for using SQLAlchemy with Doltgres.

Doltgres + SQLAlchemy

TLDR; The Code

If you don't want to run through the tutorial and just want the demo.py code to play with, it is available in this GitHub repository. demo.py is runnable if you have a Doltgres SQL Server started on the default port (i.e. 5432) with a database named sql_alchemy_big_demo created.

You must have Git, Doltgres, Python, psycopg2, and SQLAlchemy installed. Then, in one terminal start the Doltgres server:

$ doltgres
INFO[0000] Creating root@localhost superuser            
INFO[0000] Server ready. Accepting connections.         
WARN[0000] secure_file_priv is set to "", which is insecure. 
WARN[0000] Any user with GRANT FILE privileges will be able to read any file which the sql-server process can read. 
WARN[0000] Please consider restarting the server with secure_file_priv set to a safe (or non-existent) directory. 

In another terminal, test connectivity to the server by logging in and creating the sql_alchemy_big_demo database:

$ PGPASSWORD=password psql -Upostgres -hlocalhost
psql (15.12 (Homebrew), server 15.0)
Type "help" for help.

postgres=> CREATE DATABASE sql_alchemy_big_demo;
CREATE DATABASE

Then, in one more terminal, clone the doltgres-sqlalchemy-getting-started repository from GitHub and run the demo script:

$ mkdir ~/sqlalchemy-demo && cd ~/sqlalchemy-demo

$ git clone https://github.com/dolthub/doltgres-sqlalchemy-getting-started.git
Cloning into 'doltgres-sqlalchemy-getting-started'...
remote: Enumerating objects: 59, done.
remote: Counting objects: 100% (59/59), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 59 (delta 17), reused 52 (delta 16), pack-reused 0
Receiving objects: 100% (59/59), 15.05 KiB | 2.15 MiB/s, done.
Resolving deltas: 100% (17/17), done.

$ cd doltgres-sqlalchemy-getting-started

$ python demo.py
Using branch: main
Active branch: main
Resetting to commit: doltu614up273ideb65ntr9r8jl8q1t2
Tables in database:
	employees
	employees_teams
	teams
Created commit: dbbp9lbdprorffhv82ughtr0qndut3mi
...
...
...

You can run the demo script multiple times. It resets Doltgres to the init commit and deletes all the branches so it works the same on each subsequent execution. The code shows off table creation, Doltgres commits, reading Doltgres system tables using reflection, rollback using Doltgres reset, branching, and merging all with SQLAlchemy flair.

Install Doltgres, psycopg2, and SQLAlchemy

Doltgres is a single binary and is really easy to install – download it from the GitHub releases and put it on your PATH.

For the rest of this blog, I'm going to assume you are on a *NIX based system and use bash when interacting on the command line.

I will also assume you have Python and pip installed on your machine. There are many ways to install Python, and they are fraught with peril. To get the Python packages you need for this article, run:

$ pip install psycopg2
$ pip install sqlalchemy

You are all set. You have everything you need.

Start the Doltgres Server

The first thing we need to do is get our Doltgres server running so that SQLAlchemy has something to connect to. To start a server on the default PostgreSQL port, just run doltgres. You should see output like this:

$ doltgres
INFO[0000] Creating root@localhost superuser            
INFO[0000] Server ready. Accepting connections.         
WARN[0000] secure_file_priv is set to "", which is insecure. 
WARN[0000] Any user with GRANT FILE privileges will be able to read any file which the sql-server process can read. 
WARN[0000] Please consider restarting the server with secure_file_priv set to a safe (or non-existent) directory. 

The terminal will just hang there. Any errors will be printed in this terminal. Leave this terminal open and open a new one.

Create a new Database

For this demo, we need a database named sql_alchemy_big_demo. We'll open a connection to the Doltgres server using the psql command and use CREATE DATABASE to create the new sql_alchemy_big_demo database.

$ PGPASSWORD=password psql -Upostgres -hlocalhost
psql (15.12 (Homebrew), server 15.0)
Type "help" for help.

postgres=> CREATE DATABASE sql_alchemy_big_demo;
CREATE DATABASE

Connect SQLAlchemy

This is where we get to start using Python! I'm going to assume you are running the script using the process described in the TLDR section. In the following sections, I'll include Python code snippets and explain what they do and why. I don't expect you to run this code in the Python shell. It's a bit too complicated for that.

So, in SQLAlchemy there is a concept of an engine. This is the "home base" for your database connectivity. You make an engine using the create_engine() function and a connection string. For Doltgres you use the PostgreSQL flavor connection string like so:

engine = create_engine(
    "postgresql+psycopg2://postgres@127.0.0.1:5432/sql_alchemy_big_demo"
)

You'll notice in the demo script, I've wrapped engine creation in a function called dolt_checkout() that takes a branch name. This is a useful abstraction in SQLAlchemy. If you tack a /<branch name> onto the end of a connection string in Doltgres, it connects to that branch. If you don't specify a branch in the connection string, it connects to the default branch, in most cases main. Thus, making the SQLAlchemy engine branch-aware allows us to persist branch information across connections.

engine = dolt_checkout('main')

def dolt_checkout(branch):
    engine_base = "postgresql+psycopg2://postgres@127.0.0.1:5432/sql_alchemy_big_demo"
    # Branches can be "checked out" via connection string. We make heavy use
    # of reflection in this example for system tables so passing around an
    # engine instead of a connection is best for this example.
    engine = create_engine(
        engine_base + "/" + branch
    )
    print("Using branch: " + branch)
    return engine

Once you've created an engine, it's standard SQLAlchemy to connect and run queries. The first thing we do is print the active branch. This function is a wrapper around the Doltgres function active_branch().

print_active_branch(engine)

def print_active_branch(engine):
    stmt = text("select active_branch()")
    with engine.connect() as conn:
        results = conn.execute(stmt)
        rows = results.fetchall()
        active_branch = rows[0][0]
        print("Active branch: " + active_branch)

This prints:

Using branch: main
Active branch: main

Note, there is also a dolt_checkout() function that switches branches for a connection. We chose an engine-based approach to switching branches because SQLAlchemy's documentation seems to encourage using multiple connections. Reflection in particular, which we use a lot in this demo, requires an engine, not a connection.
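
If you want the connection-scoped route instead, a minimal sketch looks like this (the switch_branch() helper is mine for illustration, not part of demo.py):

def switch_branch(conn, branch):
    # Connection-scoped alternative to the engine-based dolt_checkout() above.
    # Only this connection moves to the new branch; other connections keep theirs.
    conn.execute(text("SELECT dolt_checkout('" + branch + "')"))
    rows = conn.execute(text("select active_branch()")).fetchall()
    print("Connection is now on branch: " + rows[0][0])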

Create Tables

Now, it's time to make some tables. In this example, our database will have three tables: employees, teams, and employees_teams. We can define these in the standard SQLAlchemy way: create a MetaData() object, define the tables on the MetaData object using the Table() class, and at the end call the create_all() method on our MetaData object.

We turn off auto increment using autoincrement=False on all the primary keys. In SQLAlchemy, auto increment is on by default. In Doltgres, auto increment values are shared across branches to decrease the risk of conflicts. However, Doltgres best practice is to have your application specify keys if it can because auto increment values are not shared across clones. More information on clones vs branches can be found here.

setup_database(engine)

def setup_database(engine):
    metadata_obj = MetaData()

    # This is standard SQLAlchemy without the ORM
    employees_table = Table(
        "employees",
        metadata_obj,
        Column("id", Integer, primary_key=True, autoincrement=False),
        Column("last_name", String(255)),
        Column("first_name", String(255))
    )

    teams_table = Table(
        "teams",
        metadata_obj,
        Column("id", Integer, primary_key=True, autoincrement=False),
        Column("name", String(255))
    )

    employees_teams_table = Table(
        "employees_teams",
        metadata_obj,
        Column("employee_id",
               ForeignKey("employees.id"),
               primary_key=True,
               autoincrement=False),
        Column("team_id",
               ForeignKey("teams.id"),
               primary_key=True,
               autoincrement=False)
    )

    metadata_obj.create_all(engine)

Note, Doltgres supports foreign keys, secondary indexes, triggers, check constraints, and stored procedures. It's a modern, feature-rich PostgreSQL database.
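
For example, a secondary index can be declared right alongside the table definitions above, and metadata_obj.create_all(engine) will create it along with the tables. This is a sketch for illustration; the index name and column choice are mine, not part of demo.py:

from sqlalchemy import Index

# Hypothetical secondary index on employee last names. Attaching it to the
# employees table means create_all() issues the CREATE INDEX for us.
Index("ix_employees_last_name", employees_table.c.last_name)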

Then, to examine what we've done, we list the tables by querying information_schema.tables (what psql's \dt shows you) and print the results.

print_tables(engine)

def print_tables(engine):
    # Raw SQL here to show what we've done
    with engine.connect() as conn:
        result = conn.execute(text("SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'"))

        print("Tables in database:")
        for row in result:
            table = row[0]
            print("\t" + table)

This outputs:

Tables in database:
	employees
	employees_teams
	teams

The employees, teams, and employees_teams tables are created.

Make a Doltgres Commit

Now, it's time to make a Doltgres commit. Doltgres commits and SQL transaction commits are different. Git and SQL both have the concept of a commit, and because Doltgres combines the two, it must support both. This can be confusing. A Doltgres commit makes an entry in the commit log for versioning purposes. A SQL transaction commit is required to persist your database writes to disk so other connections can see them.

To make a Doltgres commit, we need to use the dolt_commit() function. Doltgres exposes version control write operations as functions. The naming of these functions follows the Git command line standard. git add on the Git command line becomes dolt_add() as a Doltgres SQL function. Arguments mimic Git as well. If you know Git, you already know how to use Doltgres.

The Python code below shows how we can execute the dolt_commit() function. Note, Doltgres has a staging area, just like Git, so we need to dolt_add() all the tables before making a commit. The resulting code looks like:

dolt_commit(engine, "Jason <jason@dolthub.com>", "Created tables")

def dolt_commit(engine, author, message):
    # Doltgres exposes version control writes as functions
    # Here, we use text to execute functions.
    with engine.connect() as conn:
        # -A means all tables
        conn.execute(
            text("SELECT dolt_add('-A')")
        )
        # --skip-empty so this does not fail if there is nothing to commit
        result = conn.execute(
            text("SELECT dolt_commit('--skip-empty', '--author', '"
                 + author
                 + "', '-m', '"
                 + message
                 + "')")
        )
        commit = None
        for row in result:
            commit = row[0]
        if ( commit ):
            print("Created commit: " + commit )

And running it results in the following output:

Created commit: 5ahj2vcifi29ibs48f7bp91sm752ub2i

Examine the Log

Let's examine the Doltgres commit log. Doltgres version control read operations are exposed in SQL as custom system tables or table functions. The commit log can be read using the dolt_log system table, named after its git log command-line equivalent. Again, if you know Git, you already know how to use Doltgres.

To automatically load the dolt_log schema using SQLAlchemy, we use reflection. We pass autoload_with=engine to the Table constructor to create the dolt_log table object. We then use the standard SQLAlchemy query builder to select and order the log elements we want to print. The resulting code looks like so:

print_commit_log(engine)

def print_commit_log(engine):
    # Examine a doltgres system table, dolt_log, using reflection
    metadata_obj = MetaData()
    print("Commit Log:")

    dolt_log = Table("dolt_log", metadata_obj, autoload_with=engine)
    stmt = select(dolt_log.c.commit_hash,
                  dolt_log.c.committer,
                  dolt_log.c.message
                  ).order_by(dolt_log.c.date.desc())

    with engine.connect() as conn:
        results = conn.execute(stmt)
        for row in results:
            commit_hash = row[0]
            author      = row[1]
            message     = row[2]
            print("\t" + commit_hash + ": " + message + " by " + author)

And it outputs the following:

Commit Log:
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Jason
	do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by Jason
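
Since dolt_log is just a system table, you can also skip reflection and read it with plain SQL. A quick sketch, not part of demo.py:

def print_commit_log_raw(engine):
    # Same information as print_commit_log(), without reflection or the query builder.
    stmt = text("SELECT commit_hash, committer, message FROM dolt_log ORDER BY date DESC")
    with engine.connect() as conn:
        for row in conn.execute(stmt):
            print("\t" + row.commit_hash + ": " + row.message + " by " + row.committer)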

Insert Some Data

Now, we're going to populate the tables with some data. Again, we use reflection to build table objects. Then we use standard SQLAlchemy insert syntax to insert rows.

insert_data(engine)

def load_tables(engine):
    metadata_obj = MetaData()

    employees = Table("employees", metadata_obj, autoload_with=engine)
    teams = Table("teams", metadata_obj, autoload_with=engine)
    employees_teams = Table("employees_teams",
                            metadata_obj,
                            autoload_with=engine)

    return (employees, teams, employees_teams)

def insert_data(engine):
    (employees, teams, employees_teams) = load_tables(engine)

    # This is standard SQLAlchemy
    stmt = insert(employees).values([
        {'id':0, 'last_name':'Sehn', 'first_name':'Tim'},
        {'id':1, 'last_name':'Hendriks', 'first_name':'Brian'},
        {'id':2, 'last_name':'Son', 'first_name':'Aaron'},
        {'id':3, 'last_name':'Fitzgerald', 'first_name':'Brian'}
        ])
    with engine.connect() as conn:
        conn.execute(stmt)
        conn.commit()

    stmt = insert(teams).values([
        {'id':0, 'name':'Engineering'},
        {'id':1, 'name':'Sales'}
    ])
    with engine.connect() as conn:
        conn.execute(stmt)
        conn.commit()

    stmt = insert(employees_teams).values([
        {'employee_id':0, 'team_id':0},
        {'employee_id':1, 'team_id':0},
        {'employee_id':2, 'team_id':0},
        {'employee_id':0, 'team_id':1},
        {'employee_id':3, 'team_id':1},
    ])
    with engine.connect() as conn:
        conn.execute(stmt)
        conn.commit()

Don't forget to conn.commit(). This commits the SQL transaction. SQLAlchemy has autocommit off by default so every SQL transaction must be explicitly committed. Again, both Git and SQL have the concept of a commit. Doltgres supports both so it can get a little confusing.
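
As an aside, if you prefer not to call conn.commit() after every write, SQLAlchemy can put connections in autocommit mode via an execution option. This is a sketch, not part of demo.py, and the Marketing row is made up for illustration; Doltgres commits via dolt_commit() remain a separate step:

# Every statement on this engine commits its own SQL transaction automatically.
autocommit_engine = engine.execution_options(isolation_level="AUTOCOMMIT")

(employees, teams, employees_teams) = load_tables(autocommit_engine)
with autocommit_engine.connect() as conn:
    # Hypothetical row, just for illustration; no conn.commit() needed in this mode.
    conn.execute(insert(teams).values([{'id': 2, 'name': 'Marketing'}]))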

Let's make sure our inserts worked by displaying a summary table. SQLAlchemy comes with an impressive query builder that supports all manner of complex SQL queries. In this example, I construct a three table join. I'm expecting a schema change later in the demo so I defensively coded for that.

print_summary_table(engine)

def print_summary_table(engine):
    (employees, teams, employees_teams) = load_tables(engine)

    print("Team Summary")

    # Get all employees columns because we change the schema
    columns = []
    for column in employees.c:
        if ( column.key == "id" ):
            continue
        columns.append(column.key)

    # Must convert to tuple
    columns = tuple(columns)

    # Doltgres supports up to 12 table joins. Here we do a 3 table join.
    stmt = select(teams.c.name,
                  employees.c[columns]
                  ).select_from(
                      employees
                  ).join(
                      employees_teams,
                      employees.c.id == employees_teams.c.employee_id
                  ).join(
                      teams,
                      teams.c.id == employees_teams.c.team_id
                  ).order_by(teams.c.name.asc())
    with engine.connect() as conn:
        results = conn.execute(stmt)
        for row in results:
            team_name  = row[0]
            last_name  = row[1]
            first_name = row[2]
            start_date = ''
            if ( len(row) > 3 ):
                if ( row[3] ):
                    start_date = row[3].strftime('%Y-%m-%d')

            print("\t" + team_name + ": " + first_name + " " + last_name + " " + start_date)

Which results in the following output:

Team Summary
	Engineering: Tim Sehn
	Engineering: Brian Hendriks
	Engineering: Aaron Son
	Sales: Tim Sehn
	Sales: Brian Fitzgerald

The CEO at DoltHub wears many hats indeed.

Examine the Status and Diff

To see what tables changed, you can use the dolt_status system table. In a recurring theme, I again use reflection to automatically load the schema.

print_status(engine)

def print_status(engine):
    metadata_obj = MetaData()
    dolt_status = Table("dolt_status", metadata_obj, autoload_with=engine)

    print("Status")
    stmt = select(dolt_status.c.table_name, dolt_status.c.status)
    with engine.connect() as conn:
        results = conn.execute(stmt)
        rows = results.fetchall()
        if ( len(rows) > 0 ):
            for row in rows:
                table  = row[0]
                status = row[1]
                print("\t" + table + ": " + status)
        else:
            print("\tNo tables modified")

Now that I see which tables changed, I want to see what rows changed in those tables. Doltgres is built from the ground up to provide fast differences (i.e. diffs) between table versions, even for very large tables. Let's see how we can use Doltgres diff functionality with SQLAlchemy. In Doltgres, there is a dolt_diff_<table> system table for each user-defined table and a dolt_diff() table function. The system table works better with reflection so we'll use it here. Notice that I filter the diff table down to only WORKING changes so I only see changes that aren't staged or committed.

print_diff(engine, "employees")

def print_diff(engine, table):
    metadata_obj = MetaData()

    print("Diffing table: " + table)
    dolt_diff = Table("dolt_diff_" + table,
                      metadata_obj,
                      autoload_with=engine)

    # Show only working set changes
    stmt = select(dolt_diff).where(dolt_diff.c.to_commit == 'WORKING')
    with engine.connect() as conn:
        results = conn.execute(stmt)
        for row in results:
            # I use a dictionary here because dolt_diff_<table> is a wide table
            row_dict = row._asdict()
            # Then I use pprint to display the results
            pprint(row_dict)

The resulting output looks like so:

Status
	employees: modified
	employees_teams: modified
	teams: modified
Diffing table: employees
{'diff_type': 'added',
 'from_commit': '5ahj2vcifi29ibs48f7bp91sm752ub2i',
 'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 716000),
 'from_first_name': None,
 'from_id': None,
 'from_last_name': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Tim',
 'to_id': 0,
 'to_last_name': 'Sehn'}
{'diff_type': 'added',
 'from_commit': '5ahj2vcifi29ibs48f7bp91sm752ub2i',
 'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 716000),
 'from_first_name': None,
 'from_id': None,
 'from_last_name': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Brian',
 'to_id': 1,
 'to_last_name': 'Hendriks'}
{'diff_type': 'added',
 'from_commit': '5ahj2vcifi29ibs48f7bp91sm752ub2i',
 'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 716000),
 'from_first_name': None,
 'from_id': None,
 'from_last_name': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Aaron',
 'to_id': 2,
 'to_last_name': 'Son'}
{'diff_type': 'added',
 'from_commit': '5ahj2vcifi29ibs48f7bp91sm752ub2i',
 'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 716000),
 'from_first_name': None,
 'from_id': None,
 'from_last_name': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Brian',
 'to_id': 3,
 'to_last_name': 'Fitzgerald'}

I transformed the row results to a dictionary and used pprint to save myself a bunch of diff display code that would have complicated this demo. I think it looks pretty.
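
The dolt_diff() table function mentioned above reads the same information without reflection; since reflection doesn't apply to table functions, you'd call it with text(). A sketch, assuming the (from_revision, to_revision, table_name) argument order that Dolt documents:

def print_diff_table_function(engine, table):
    # HEAD..WORKING covers the same range as the dolt_diff_<table> query above.
    stmt = text("SELECT * FROM dolt_diff('HEAD', 'WORKING', '" + table + "')")
    with engine.connect() as conn:
        for row in conn.execute(stmt):
            pprint(row._asdict())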

Before we go onto the next section, let's commit our changes to the Doltgres commit graph.

dolt_commit(engine,
            "Aaron <aaron@dolthub.com>",
            "Inserted data into tables")
print_commit_log(engine)

Resulting in the following output:

Created commit: lf30j7u2pmrna1h4ld3gg7j9gpn9etho
Commit Log:
	lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Jason
	do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by Jason

Oh no. I made a mistake.

Dolt and Doltgres have powerful rollback capabilities. Let's imagine I accidentally drop a table. The foreign keys will prevent me from dropping employees or teams but employees_teams is not safe from my wrath!

drop_table(engine, "employees_teams")

def drop_table(engine, table):
    (employees, teams, employees_teams) = load_tables(engine)

    if ( table == "employees"):
        employees.drop(engine)
    elif ( table ==  "teams" ):
        teams.drop(engine)
    elif ( table == "employees_teams" ):
        employees_teams.drop(engine)
    else:
        print(table + ": Not found")

As we can see from the status and the table list, it is gone.

print_status(engine)
print_tables(engine)
Status
	employees_teams: deleted
Tables in database:
	employees
	teams

In a traditional database, this could be disastrous. In Doltgres, we can get it back with a simple SELECT dolt_reset('--hard'). This function takes an optional commit. If no commit is specified, it resets to the HEAD commit.

dolt_reset_hard(engine, None)
print_status(engine)
print_tables(engine)

def dolt_reset_hard(engine, commit):
    if ( commit ):
        stmt = text("SELECT dolt_reset('--hard', '" + commit + "')")
        print("Resetting to commit: " + commit)
    else:
        stmt = text("SELECT dolt_reset('--hard')")
        print("Resetting to HEAD")

    with engine.connect() as conn:
        results = conn.execute(stmt)
        conn.commit()

Now, the output:

Resetting to HEAD
Status
	No tables modified
Tables in database:
	employees
	employees_teams
	teams

Doltgres makes operating databases less error prone. You can always back out changes you have in progress or rewind to a known good state.

Change data on a branch

Doltgres is the only PostgreSQL database with branches and merges. We've seen some PostgreSQL databases with branches, but none that support merging those branches. Here I show you how to create and switch branches. I'll make some changes and commit them. Later, I'll merge all my changes together. Think of a branch as a really long SQL transaction.

First, you need to create a branch. We can create a branch with the dolt_branch() function. In the Python code, I also consult the dolt_branches system table to make sure the branch does not already exist. We use the dolt_checkout() Python function described in the Connect SQLAlchemy section to switch branches via the engine object.

dolt_create_branch(engine, 'modify_data')
engine = dolt_checkout('modify_data')

def dolt_create_branch(engine, branch):
    # Check if branch exists
    metadata_obj = MetaData()

    dolt_branches = Table("dolt_branches", metadata_obj, autoload_with=engine)
    stmt = select(dolt_branches.c.name).where(dolt_branches.c.name == branch)
    with engine.connect() as conn:
        results = conn.execute(stmt)
        rows = results.fetchall()
        if ( len(rows) > 0 ):
             print("Branch exists: " + branch)
             return

    # Create branch
    stmt = text("SELECT dolt_branch('" + branch + "')")
    with engine.connect() as conn:
        results = conn.execute(stmt)
        print("Created branch: " + branch)

Now that we're on a new branch, it's safe to make changes. The main branch will remain unchanged as I make these changes. This function inserts, updates, and deletes using the SQLAlchemy query builder.

modify_data(engine)

def modify_data(engine):
    (employees, teams, employees_teams) = load_tables(engine)

    update_stmt = update(employees).where(employees.c.first_name == 'Tim'
                                          ).values(first_name='Timothy')

    insert_emp_stmt = insert(employees).values([
        {'id':4, 'last_name':'Wilkins', 'first_name':'Daylon'}
        ])
    insert_et_stmt = insert(employees_teams).values([
        {'employee_id':4, 'team_id':0}
    ])

    delete_stmt = delete(employees_teams).where(
        employees_teams.c.employee_id == 0
    ).where(employees_teams.c.team_id == 1)

    with engine.connect() as conn:
        conn.execute(update_stmt)
        conn.execute(insert_emp_stmt)
        conn.execute(insert_et_stmt)
        conn.execute(delete_stmt)
        conn.commit()

Let's inspect what we've done to make sure it looks good.

print_status(engine)
print_diff(engine, 'employees')
print_diff(engine, 'employees_teams')
print_summary_table(engine)
Using branch: modify_data
Status
	employees: modified
	employees_teams: modified
Diffing table: employees
{'diff_type': 'modified',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
 'from_first_name': 'Tim',
 'from_id': 0,
 'from_last_name': 'Sehn',
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Timothy',
 'to_id': 0,
 'to_last_name': 'Sehn'}
{'diff_type': 'added',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
 'from_first_name': None,
 'from_id': None,
 'from_last_name': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Daylon',
 'to_id': 4,
 'to_last_name': 'Wilkins'}
Diffing table: employees_teams
{'diff_type': 'removed',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
 'from_employee_id': 0,
 'from_team_id': 1,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_employee_id': None,
 'to_team_id': None}
{'diff_type': 'added',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
 'from_employee_id': None,
 'from_team_id': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_employee_id': 4,
 'to_team_id': 0}
Team Summary
	Engineering: Timothy Sehn
	Engineering: Brian Hendriks
	Engineering: Aaron Son
	Engineering: Daylon Wilkins
	Sales: Brian Fitzgerald

Daylon is added to the engineering team on the modify_data branch. Tim is no longer on the Sales team. Engineering for life!

Finally, let's commit these changes so we can make different changes on another branch.

dolt_commit(engine, 'Brian <brian@dolthub.com>', 'Modified data on branch')
print_commit_log(engine)
Created commit: 0o07b2f3r788t9nn2vg139maviu49fd4
Commit Log:
	0o07b2f3r788t9nn2vg139maviu49fd4: Modified data on branch by Brian
	lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Jason
	do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by Jason

Change schema on another branch

We're going to make a schema change on another branch and make some data modifications using the SQLAlchemy Session interface. The Session interface is the true alchemy in SQLAlchemy. You bind Python objects to tables, and when you add, modify, or delete those Python objects, calling Session.commit() automatically updates the records in your tables. Truly lead-into-gold stuff.

Below, you'll see we check out the main branch so the new branch has the correct base. Then we create a new branch called modify_schema and run the modify_schema() function, which adds a start_date column and populates it using the Session interface. Finally, we use status and diff to show off what changed.

engine = dolt_checkout('main')
dolt_create_branch(engine, 'modify_schema')
engine = dolt_checkout('modify_schema')
print_active_branch(engine)
modify_schema(engine)
print_status(engine)
print_diff(engine, "employees")
print_summary_table(engine)

def modify_schema(engine):
    (employees, teams, employees_teams) = load_tables(engine)

    # SQLAlchemy does not support table alters so we use text
    stmt = text('alter table employees add column start_date date')
    with engine.connect() as conn:
        conn.execute(stmt)
        conn.commit()

    # Update using the SQLAlchemy Session interface
    class Base(DeclarativeBase):
        pass

    class Employee(Base):
        __tablename__ = "employees"
        id: Mapped[int] = mapped_column(primary_key=True)
        last_name: Mapped[str] = mapped_column(String(255))
        first_name: Mapped[str] = mapped_column(String(255))
        start_date: Mapped[Date] = mapped_column(Date)

        def __repr__(self) -> str:
            return f"Employee(id={self.id!r}, last_name={self.last_name!r}, first_name={self.first_name!r}, start_date={self.start_date!r})"

    session = Session(engine)
    Tim = session.get(Employee, 0)
    Tim.start_date = "2018-08-06"

    BHeni = session.get(Employee, 1)
    BHeni.start_date = "2018-08-06"

    Aaron = session.get(Employee, 2)
    Aaron.start_date = "2018-08-06"

    Fitz = session.execute(select(Employee).filter_by(last_name="Fitzgerald")).scalar_one()
    Fitz.start_date = "2021-04-19"

    session.commit()

This outputs the following:

Using branch: main
Created branch: modify_schema
Using branch: modify_schema
Active branch: modify_schema
Status
	employees: modified
Diffing table: employees
{'diff_type': 'modified',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
 'from_first_name': 'Tim',
 'from_id': 0,
 'from_last_name': 'Sehn',
 'from_start_date': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Tim',
 'to_id': 0,
 'to_last_name': 'Sehn',
 'to_start_date': datetime.date(2018, 8, 6)}
{'diff_type': 'modified',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
 'from_first_name': 'Brian',
 'from_id': 1,
 'from_last_name': 'Hendriks',
 'from_start_date': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Brian',
 'to_id': 1,
 'to_last_name': 'Hendriks',
 'to_start_date': datetime.date(2018, 8, 6)}
{'diff_type': 'modified',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
 'from_first_name': 'Aaron',
 'from_id': 2,
 'from_last_name': 'Son',
 'from_start_date': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Aaron',
 'to_id': 2,
 'to_last_name': 'Son',
 'to_start_date': datetime.date(2018, 8, 6)}
{'diff_type': 'modified',
 'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
 'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
 'from_first_name': 'Brian',
 'from_id': 3,
 'from_last_name': 'Fitzgerald',
 'from_start_date': None,
 'to_commit': 'WORKING',
 'to_commit_date': None,
 'to_first_name': 'Brian',
 'to_id': 3,
 'to_last_name': 'Fitzgerald',
 'to_start_date': datetime.date(2021, 4, 19)}
Team Summary
	Engineering: Tim Sehn 2018-08-06
	Engineering: Brian Hendriks 2018-08-06
	Engineering: Aaron Son 2018-08-06
	Sales: Tim Sehn 2018-08-06
	Sales: Brian Fitzgerald 2021-04-19

As you can see, my defensive coding in the Insert Some Data section paid off and employee start dates are displayed. This looks good, so we'll commit it.

dolt_commit(engine, 'Jason <jason@dolthub.com>', 'Modified schema on branch')
print_commit_log(engine)
Created commit: apf44oec5ctk3cu0e0s2ot90pgpr2f1s
Commit Log:
	apf44oec5ctk3cu0e0s2ot90pgpr2f1s: Modified schema on branch by Jason
	lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Tim
	do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by timsehn

Merge it all together

Now for our last trick. We will merge all the branches together and show the resulting summary table. To merge, you use the dolt_merge() function.

engine = dolt_checkout('main')
print_active_branch(engine)
print_commit_log(engine)
print_summary_table(engine)
dolt_merge(engine, 'modify_data')
print_summary_table(engine)
print_commit_log(engine)
dolt_merge(engine, 'modify_schema')
print_commit_log(engine)
print_summary_table(engine)

def dolt_merge(engine, branch):
    stmt = text("SELECT dolt_merge('" + branch + "')")
    with engine.connect() as conn:
        results = conn.execute(stmt)
        rows = results.fetchall()
        commit       = rows[0][0]
        fast_forward = rows[0][1]
        conflicts    = rows[0][2]
        print("Merge Complete: " + branch)
        print("\tCommit: " + commit)
        print("\tFast Forward: " + str(fast_forward))
        print("\tConflicts: " + str(conflicts))

This outputs the following. It's beautiful. You can see the data and schema evolving as we merge.

Using branch: main
Active branch: main
Commit Log:
	lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Jason
	do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by Jason
Team Summary
	Engineering: Tim Sehn
	Engineering: Brian Hendriks
	Engineering: Aaron Son
	Sales: Tim Sehn
	Sales: Brian Fitzgerald
Merge Complete: modify_data
	Commit: 0o07b2f3r788t9nn2vg139maviu49fd4
	Fast Forward: 1
	Conflicts: 0
Team Summary
	Engineering: Timothy Sehn
	Engineering: Brian Hendriks
	Engineering: Aaron Son
	Engineering: Daylon Wilkins
	Sales: Brian Fitzgerald
Commit Log:
	0o07b2f3r788t9nn2vg139maviu49fd4: Modified data on branch by Brian
	lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Tim
	do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by timsehn
Merge Complete: modify_schema
	Commit: 2jmueascj0cnrba7pl46m9ok5et4ccmg
	Fast Forward: 0
	Conflicts: 0
Commit Log:
	2jmueascj0cnrba7pl46m9ok5et4ccmg: Merge branch 'modify_schema' into main by postgres
	apf44oec5ctk3cu0e0s2ot90pgpr2f1s: Modified schema on branch by Tim
	0o07b2f3r788t9nn2vg139maviu49fd4: Modified data on branch by Brian
	lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
	5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Tim
	do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by timsehn
Team Summary
	Engineering: Timothy Sehn 2018-08-06
	Engineering: Brian Hendriks 2018-08-06
	Engineering: Aaron Son 2018-08-06
	Engineering: Daylon Wilkins
	Sales: Brian Fitzgerald 2021-04-19

Notice the first merge was a fast-forward merge just like in Git. Doltgres will detect schema and data conflicts if you make them.
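
When a merge does hit conflicts, dolt_merge() reports a non-zero conflict count and the conflicted tables show up in the dolt_conflicts system table. A minimal sketch of how you might check, assuming the (table, num_conflicts) schema that Dolt documents for dolt_conflicts:

def print_conflicts(engine):
    metadata_obj = MetaData()
    dolt_conflicts = Table("dolt_conflicts", metadata_obj, autoload_with=engine)

    with engine.connect() as conn:
        rows = conn.execute(select(dolt_conflicts)).fetchall()
        if len(rows) == 0:
            print("No conflicts")
        for row in rows:
            # Each row names a conflicted table and its conflicting row count.
            print("\t" + row[0] + ": " + str(row[1]) + " conflicts")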

Conclusion

Phew. That was a lot! If you made it this far, congratulations. You are now ready to build your own Doltgres application using Python and SQLAlchemy. If this wasn't enough to get you started or you've still got more questions, come by our Discord and we'll help you out!
