Getting Started: SQLAlchemy and Doltgres
Here at DoltHub, we built Doltgres, the world's first version-controlled PostgreSQL database. Doltgres lets you branch, fork, clone, merge, and diff your relational data, in all the same ways Git lets you work with files. Doltgres is PostgreSQL-compatible, so the tools you're used to using with PostgreSQL, like SQLAlchemy, work the same way with Doltgres.
SQLAlchemy is an extremely popular Python Object Relational Mapper (ORM) that augments standard Python PostgreSQL connection libraries. The main features are reflection, a query builder, and a mapped object interface for writes.
SQLAlchemy has been very popular with users of Dolt, our MySQL-compatible version controlled database, and we're excited to announce that SQLAlchemy works with Doltgres now, too. Two years ago, we published a getting started walkthrough on using Dolt with SQLAlchemy; this post provides a similar getting started walkthrough for using SQLAlchemy with Doltgres.
TLDR; The Code
If you don't want to run through the tutorial and just want the demo.py code to play with, it is available in this GitHub repository. demo.py is runnable if you have a Doltgres SQL server started on the default port (i.e. 5432) with a database named sql_alchemy_big_demo created.
You must have Git, Doltgres, Python, psycopg2, and SQLAlchemy installed. Then, in one terminal start the Doltgres server:
$ doltgres
INFO[0000] Creating root@localhost superuser
INFO[0000] Server ready. Accepting connections.
WARN[0000] secure_file_priv is set to "", which is insecure.
WARN[0000] Any user with GRANT FILE privileges will be able to read any file which the sql-server process can read.
WARN[0000] Please consider restarting the server with secure_file_priv set to a safe (or non-existent) directory.
In another terminal, test connectivity to the server by logging in and creating the sql_alchemy_big_demo database:
$ PGPASSWORD=password psql -Upostgres -hlocalhost
psql (15.12 (Homebrew), server 15.0)
Type "help" for help.
postgres=> CREATE DATABASE sql_alchemy_big_demo;
CREATE DATABASE
Then, in one more terminal, clone the doltgres-sqlalchemy-getting-started repository from GitHub and run the demo script:
$ mkdir ~/sqlalchemy-demo && cd ~/sqlalchemy-demo
$ git clone https://github.com/dolthub/doltgres-sqlalchemy-getting-started.git
Cloning into 'doltgres-sqlalchemy-getting-started'...
remote: Enumerating objects: 59, done.
remote: Counting objects: 100% (59/59), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 59 (delta 17), reused 52 (delta 16), pack-reused 0
Receiving objects: 100% (59/59), 15.05 KiB | 2.15 MiB/s, done.
Resolving deltas: 100% (17/17), done.
$ cd doltgres-sqlalchemy-getting-started
$ python demo.py
Using branch: main
Active branch: main
Resetting to commit: doltu614up273ideb65ntr9r8jl8q1t2
Tables in database:
employees
employees_teams
teams
Created commit: dbbp9lbdprorffhv82ughtr0qndut3mi
...
...
...
You can run the demo script multiple times. It resets Doltgres to the init commit and deletes all the branches so it works the same on each subsequent execution. The code shows off table creation, Doltgres commits, reading Doltgres system tables using reflection, rollback using Doltgres reset, branching, and merging all with SQLAlchemy flair.
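If you're curious how the script gets back to a clean slate, the reset boils down to a hard Doltgres reset to the init commit plus deleting the extra branches. Here's a minimal sketch, assuming the init commit hash from the demo output above and the branch names the demo creates later:

from sqlalchemy import create_engine, text

# A minimal sketch of resetting the demo database to a clean slate. The commit
# hash is the init commit printed in the demo output; the branch names are the
# ones the demo creates later.
engine = create_engine(
    "postgresql+psycopg2://postgres@127.0.0.1:5432/sql_alchemy_big_demo"
)
with engine.connect() as conn:
    # Rewind main to the init commit
    conn.execute(text("SELECT dolt_reset('--hard', 'doltu614up273ideb65ntr9r8jl8q1t2')"))
    # Delete the demo branches so the next run starts fresh
    conn.execute(text("SELECT dolt_branch('-D', 'modify_data')"))
    conn.execute(text("SELECT dolt_branch('-D', 'modify_schema')"))
    conn.commit()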
Install Doltgres, psycopg2, and SQLAlchemy
Doltgres is a single binary and is really easy to install – download it from the GitHub releases and put it on your PATH.
For the rest of this blog, I'm going to assume you are on a *NIX based system and use bash when interacting on the command line. I will also assume you have python and pip installed on your machine. There are many ways to install python and they are fraught with peril. To get the Python packages you need for this article, run:
$ pip install psycopg2
$ pip install sqlalchemy
You are all set. You have everything you need.
Start the Doltgres Server
The first thing we need to do is get our Doltgres server running so that SQLAlchemy has something to connect to. To start a server on the default PostgreSQL port, just run doltgres. You should see output like this:
$ doltgres
INFO[0000] Creating root@localhost superuser
INFO[0000] Server ready. Accepting connections.
WARN[0000] secure_file_priv is set to "", which is insecure.
WARN[0000] Any user with GRANT FILE privileges will be able to read any file which the sql-server process can read.
WARN[0000] Please consider restarting the server with secure_file_priv set to a safe (or non-existent) directory.
The terminal will just hang there. Any errors will be printed in this terminal. Leave this terminal open and open a new one.
Create a new Database
For this demo, we need a database named sql_alchemy_big_demo. We'll open a connection to the Doltgres server using the psql command and use CREATE DATABASE to create the new sql_alchemy_big_demo database.
$ PGPASSWORD=password psql -Upostgres -hlocalhost
psql (15.12 (Homebrew), server 15.0)
Type "help" for help.
postgres=> CREATE DATABASE sql_alchemy_big_demo;
CREATE DATABASE
Connect SQLAlchemy
This is where we get to start using Python! I'm going to assume you are running the script using the process described in the TLDR section. In the following sections, I'll include Python code snippets and explain what they do and why. I don't expect you to run this code in the Python shell. It's a bit too complicated for that.
So, in SQLAlchemy there is a concept of an engine. This is the "home base" for your database connectivity. You make an engine using the create_engine() function and a connection string. For Doltgres, you use the PostgreSQL flavor connection string like so:
engine = create_engine(
    "postgresql+psycopg2://postgres@127.0.0.1:5432/sql_alchemy_big_demo"
)
You'll notice in the demo script, I've wrapped engine creation in a function called dolt_checkout() that takes a branch name. This is a useful abstraction in SQLAlchemy. If you tack a /<branch name> onto the end of a connection string in Doltgres, it connects to that branch. If you don't specify a branch in the connection string, it connects to the default branch, in most cases main. Thus, making the SQLAlchemy engine branch-aware allows us to persist branch information across connections.
engine = dolt_checkout('main')

def dolt_checkout(branch):
    engine_base = "postgresql+psycopg2://postgres@127.0.0.1:5432/sql_alchemy_big_demo"

    # Branches can be "checked out" via connection string. We make heavy use
    # of reflection in this example for system tables so passing around an
    # engine instead of a connection is best for this example.
    engine = create_engine(
        engine_base + "/" + branch
    )

    print("Using branch: " + branch)

    return engine
Once you've created an engine, it's standard SQLAlchemy to connect and run queries. The first thing we do is print the active branch. This function is a wrapper around the Doltgres function active_branch().
print_active_branch(engine)

def print_active_branch(engine):
    stmt = text("select active_branch()")

    with engine.connect() as conn:
        results = conn.execute(stmt)
        rows = results.fetchall()
        active_branch = rows[0][0]
        print("Active branch: " + active_branch)
This prints:
Using branch: main
Active branch: main
Note, there is also a dolt_checkout() function that switches branches for a connection. We chose an engine-based approach to switch branches because SQLAlchemy seems to encourage multiple connections in their documentation. Reflection in particular, which we use a lot in this demo, requires an engine, not a connection.
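For reference, here's a minimal sketch of that connection-scoped alternative; the branch name 'feature' is just a placeholder:

from sqlalchemy import text

# A minimal sketch of switching branches on a single connection with the
# dolt_checkout() function instead of a branch-aware engine. 'feature' is a
# placeholder branch name.
with engine.connect() as conn:
    conn.execute(text("SELECT dolt_checkout('feature')"))
    # Statements on this connection now run against the 'feature' branch
    result = conn.execute(text("select active_branch()"))
    print(result.fetchall()[0][0])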
Create Tables
Now, it's time to make some tables. In this example, our database will have three tables: employees, teams, and employees_teams. We can define these in the standard SQLAlchemy way: create a MetaData() object, define the tables on the MetaData object using the Table() class, and at the end run the create_all() method on our MetaData object.

We turn off auto increment using autoincrement=False on all the primary keys. In SQLAlchemy, auto increment is on by default. In Doltgres, auto increment values are shared across branches to decrease the risk of conflicts. However, Doltgres best practice is to have your application specify keys if it can because auto increment values are not shared across clones. More information on clones vs branches can be found here.
setup_database(engine)

def setup_database(engine):
    metadata_obj = MetaData()

    # This is standard SQLAlchemy without the ORM
    employees_table = Table(
        "employees",
        metadata_obj,
        Column("id", Integer, primary_key=True, autoincrement=False),
        Column("last_name", String(255)),
        Column("first_name", String(255))
    )

    teams_table = Table(
        "teams",
        metadata_obj,
        Column("id", Integer, primary_key=True, autoincrement=False),
        Column("name", String(255))
    )

    employees_teams_table = Table(
        "employees_teams",
        metadata_obj,
        Column("employee_id",
               ForeignKey("employees.id"),
               primary_key=True,
               autoincrement=False),
        Column("team_id",
               ForeignKey("teams.id"),
               primary_key=True,
               autoincrement=False)
    )

    metadata_obj.create_all(engine)
Note, Doltgres supports foreign keys, secondary indexes, triggers, check constraints, and stored procedures. It's a modern, feature-rich PostgreSQL database.
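As a quick illustration (this isn't part of demo.py), a check constraint and a secondary index can be attached to a table definition with standard SQLAlchemy constructs; the constraint and index names below are made up for this sketch:

from sqlalchemy import MetaData, Table, Column, Integer, String, CheckConstraint, Index

# A hypothetical variation on the teams table showing a check constraint and a
# secondary index. The constraint and index names are made up for this sketch.
metadata_obj = MetaData()
teams_table = Table(
    "teams",
    metadata_obj,
    Column("id", Integer, primary_key=True, autoincrement=False),
    Column("name", String(255)),
    CheckConstraint("name <> ''", name="name_not_empty"),
    Index("team_name_idx", "name")
)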
Then, to examine what we've done, we query information_schema.tables (the SQL equivalent of psql's \dt or MySQL's show tables) and print the results.
print_tables(engine)

def print_tables(engine):
    # Raw SQL here to show what we've done
    with engine.connect() as conn:
        result = conn.execute(text(
            "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'"
        ))

        print("Tables in database:")
        for row in result:
            table = row[0]
            print("\t" + table)
This outputs:
Tables in database:
employees
employees_teams
teams
The employees, teams, and employees_teams tables are created.
Make a Doltgres Commit
Now, it's time to make a Doltgres commit. Doltgres commits and SQL transaction commits are different. Git and SQL both have commits, and Doltgres is a combination of the two, so it must support both. This can be confusing. A Doltgres commit makes an entry in the commit log for versioning purposes. A SQL transaction commit is required to persist your database writes to disk so other connections can see them.
To make a Doltgres commit, we need to use the dolt_commit() function. Doltgres exposes version control write operations as functions. The naming of these functions follows the Git command line standard. git add on the Git command line becomes dolt_add() as a Doltgres SQL function. Arguments mimic Git as well. If you know Git, you already know how to use Doltgres.

The Python code below shows how we can execute the dolt_commit() function. Note, Doltgres has a staging area, just like Git, so we need to dolt_add() all the tables before making a commit. The resulting code looks like:
dolt_commit(engine, "Jason <jason@dolthub.com>", "Created tables")
def dolt_commit(engine, author, message):
# Doltgres exposes version control writes as functions
# Here, we use text to execute functions.
with engine.connect() as conn:
# -A means all tables
conn.execute(
text("SELECT dolt_add('-A')")
)
# --skip-empty so this does not fail if there is nothing to commit
result = conn.execute(
text("SELECT dolt_commit('--skip-empty', '--author', '"
+ author
+ "', '-m', '"
+ message
+ "')")
)
commit = None
for row in result:
commit = row[0]
if ( commit ):
print("Created commit: " + commit )
And running it results in the following output:
Created commit: 5ahj2vcifi29ibs48f7bp91sm752ub2i
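As an aside, the string concatenation above is fine for a demo, but SQLAlchemy's text() also supports bound parameters, which is the safer habit for anything user-supplied. A minimal sketch of the same commit call with parameters:

from sqlalchemy import text

# A minimal sketch of dolt_commit() using bound parameters instead of string
# concatenation; the behavior matches the demo's version.
def dolt_commit_params(engine, author, message):
    with engine.connect() as conn:
        conn.execute(text("SELECT dolt_add('-A')"))
        result = conn.execute(
            text("SELECT dolt_commit('--skip-empty', '--author', :author, '-m', :message)"),
            {"author": author, "message": message}
        )
        row = result.fetchone()
        if row and row[0]:
            print("Created commit: " + row[0])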
Examine the Log
Let's examine the Doltgres commit log. Doltgres version control read operations are exposed in SQL as custom system tables or table functions. The commit log can be read using the dolt_log system table, named after its git log command line equivalent. Again, if you know Git, you already know how to use Doltgres.
To automatically load the dolt_log schema using SQLAlchemy, we use reflection. We pass autoload_with=engine to the Table constructor to create the dolt_log table object. We then use the standard SQLAlchemy query builder to select and order the log elements we want to print. The resulting code looks like so:
print_commit_log(engine)

def print_commit_log(engine):
    # Examine a doltgres system table, dolt_log, using reflection
    metadata_obj = MetaData()

    print("Commit Log:")

    dolt_log = Table("dolt_log", metadata_obj, autoload_with=engine)

    stmt = select(dolt_log.c.commit_hash,
                  dolt_log.c.committer,
                  dolt_log.c.message
                  ).order_by(dolt_log.c.date.desc())

    with engine.connect() as conn:
        results = conn.execute(stmt)
        for row in results:
            commit_hash = row[0]
            author = row[1]
            message = row[2]
            print("\t" + commit_hash + ": " + message + " by " + author)
And it outputs the following:
Commit Log:
5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Jason
do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by Jason
Insert Some Data
Now, we're going to populate the tables with some data. Again, we use reflection to build table objects. Then we use standard SQLAlchemy insert syntax to insert rows.
insert_data(engine)

def load_tables(engine):
    metadata_obj = MetaData()
    employees = Table("employees", metadata_obj, autoload_with=engine)
    teams = Table("teams", metadata_obj, autoload_with=engine)
    employees_teams = Table("employees_teams",
                            metadata_obj,
                            autoload_with=engine)

    return (employees, teams, employees_teams)

def insert_data(engine):
    (employees, teams, employees_teams) = load_tables(engine)

    # This is standard SQLAlchemy
    stmt = insert(employees).values([
        {'id':0, 'last_name':'Sehn', 'first_name':'Tim'},
        {'id':1, 'last_name':'Hendriks', 'first_name':'Brian'},
        {'id':2, 'last_name':'Son', 'first_name':'Aaron'},
        {'id':3, 'last_name':'Fitzgerald', 'first_name':'Brian'}
    ])
    with engine.connect() as conn:
        conn.execute(stmt)
        conn.commit()

    stmt = insert(teams).values([
        {'id':0, 'name':'Engineering'},
        {'id':1, 'name':'Sales'}
    ])
    with engine.connect() as conn:
        conn.execute(stmt)
        conn.commit()

    stmt = insert(employees_teams).values([
        {'employee_id':0, 'team_id':0},
        {'employee_id':1, 'team_id':0},
        {'employee_id':2, 'team_id':0},
        {'employee_id':0, 'team_id':1},
        {'employee_id':3, 'team_id':1},
    ])
    with engine.connect() as conn:
        conn.execute(stmt)
        conn.commit()
Don't forget to conn.commit(). This commits the SQL transaction. SQLAlchemy has autocommit off by default, so every SQL transaction must be explicitly committed. Again, both Git and SQL have the concept of a commit. Doltgres supports both, so it can get a little confusing.
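If you'd rather not call conn.commit() yourself, SQLAlchemy's engine.begin() hands you a connection inside a transaction that commits automatically when the block exits cleanly. A minimal sketch; the 'Marketing' row is a placeholder, not part of the demo data:

from sqlalchemy import insert

# engine.begin() opens a transaction and commits it when the block exits
# without an error. The 'Marketing' row here is just a placeholder.
(employees, teams, employees_teams) = load_tables(engine)
with engine.begin() as conn:
    conn.execute(insert(teams).values([{'id': 2, 'name': 'Marketing'}]))
# No conn.commit() needed; the transaction was committed on exit.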
Let's make sure our inserts worked by displaying a summary table. SQLAlchemy comes with an impressive query builder that supports all manner of complex SQL queries. In this example, I construct a three table join. I'm expecting a schema change later in the demo so I defensively coded for that.
print_summary_table(engine)

def print_summary_table(engine):
    (employees, teams, employees_teams) = load_tables(engine)

    print("Team Summary")

    # Get all employees columns because we change the schema
    columns = []
    for column in employees.c:
        if ( column.key == "id" ):
            continue
        columns.append(column.key)
    # Must convert to tuple
    columns = tuple(columns)

    # Doltgres supports up to 12 table joins. Here we do a 3 table join.
    stmt = select(teams.c.name,
                  employees.c[columns]
                  ).select_from(
                      employees
                  ).join(
                      employees_teams,
                      employees.c.id == employees_teams.c.employee_id
                  ).join(
                      teams,
                      teams.c.id == employees_teams.c.team_id
                  ).order_by(teams.c.name.asc())

    with engine.connect() as conn:
        results = conn.execute(stmt)
        for row in results:
            team_name = row[0]
            last_name = row[1]
            first_name = row[2]

            start_date = ''
            if ( len(row) > 3 ):
                if ( row[3] ):
                    start_date = row[3].strftime('%Y-%m-%d')

            print("\t" + team_name + ": " + first_name + " " + last_name + " " + start_date)
Which results in the following output:
Team Summary
Engineering: Tim Sehn
Engineering: Brian Hendriks
Engineering: Aaron Son
Sales: Tim Sehn
Sales: Brian Fitzgerald
The CEO at DoltHub wears many hats indeed.
Examine the Status and Diff
To see what tables changed, you can use the dolt_status system table. In a recurring theme, I again use reflection to automatically load the schema.
print_status(engine)

def print_status(engine):
    metadata_obj = MetaData()
    dolt_status = Table("dolt_status", metadata_obj, autoload_with=engine)

    print("Status")

    stmt = select(dolt_status.c.table_name, dolt_status.c.status)

    with engine.connect() as conn:
        results = conn.execute(stmt)
        rows = results.fetchall()
        if ( len(rows) > 0 ):
            for row in rows:
                table = row[0]
                status = row[1]
                print("\t" + table + ": " + status)
        else:
            print("\tNo tables modified")
Now that I see which tables changed, I want to see what rows changed in those tables. Doltgres is built from the ground up to provide fast differences (i.e. diffs) between table versions, even for very large tables. Let's see how we can use Doltgres diff functionality with SQLAlchemy. In Doltgres, there is a dolt_diff_<table> system table for each user defined table and a dolt_diff() table function. The system table works better with reflection so we'll use it here. Notice, I filter the diff table down to only WORKING changes so I only see changes that aren't staged or committed.
print_diff(engine, "employees")
def print_diff(engine, table):
metadata_obj = MetaData()
print("Diffing table: " + table)
dolt_diff = Table("dolt_diff_" + table,
metadata_obj,
autoload_with=engine)
# Show only working set changes
stmt = select(dolt_diff).where(dolt_diff.c.to_commit == 'WORKING')
with engine.connect() as conn:
results = conn.execute(stmt)
for row in results:
# I use a dictionary here because dolt_diff_<table> is a wide table
row_dict = row._asdict()
# Then I use pprint to display the results
pprint(row_dict)
The resulting output looks like so:
Status
employees: modified
employees_teams: modified
teams: modified
Diffing table: employees
{'diff_type': 'added',
'from_commit': '5ahj2vcifi29ibs48f7bp91sm752ub2i',
'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 716000),
'from_first_name': None,
'from_id': None,
'from_last_name': None,
'to_commit': 'WORKING',
'to_commit_date': None,
'to_first_name': 'Tim',
'to_id': 0,
'to_last_name': 'Sehn'}
{'diff_type': 'added',
'from_commit': '5ahj2vcifi29ibs48f7bp91sm752ub2i',
'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 716000),
'from_first_name': None,
'from_id': None,
'from_last_name': None,
'to_commit': 'WORKING',
'to_commit_date': None,
'to_first_name': 'Brian',
'to_id': 1,
'to_last_name': 'Hendriks'}
{'diff_type': 'added',
'from_commit': '5ahj2vcifi29ibs48f7bp91sm752ub2i',
'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 716000),
'from_first_name': None,
'from_id': None,
'from_last_name': None,
'to_commit': 'WORKING',
'to_commit_date': None,
'to_first_name': 'Aaron',
'to_id': 2,
'to_last_name': 'Son'}
{'diff_type': 'added',
'from_commit': '5ahj2vcifi29ibs48f7bp91sm752ub2i',
'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 716000),
'from_first_name': None,
'from_id': None,
'from_last_name': None,
'to_commit': 'WORKING',
'to_commit_date': None,
'to_first_name': 'Brian',
'to_id': 3,
'to_last_name': 'Fitzgerald'}
I transformed the row results to a dictionary and used pprint to save myself a bunch of diff display code that would have complicated this demo. I think it looks pretty.
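For completeness, the dolt_diff() table function covers the same ground without a per-table system table. It doesn't reflect as nicely, so a sketch with raw text() looks something like this; the 'main' and 'WORKING' revisions and the argument order follow Dolt's table function documentation, so treat them as illustrative:

from pprint import pprint
from sqlalchemy import text

# A hedged sketch of the dolt_diff() table function as an alternative to the
# dolt_diff_<table> system table. Revisions and argument order are illustrative.
def print_diff_function(engine, table):
    stmt = text("SELECT * FROM dolt_diff('main', 'WORKING', '" + table + "')")
    with engine.connect() as conn:
        for row in conn.execute(stmt):
            pprint(row._asdict())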
Before we go onto the next section, let's commit our changes to the Doltgres commit graph.
dolt_commit(engine,
            "Aaron <aaron@dolthub.com>",
            "Inserted data into tables")

print_commit_log(engine)
Resulting in the following output:
Created commit: lf30j7u2pmrna1h4ld3gg7j9gpn9etho
Commit Log:
lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Jason
do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by Jason
Oh no. I made a mistake.
Dolt and Doltgres have powerful rollback capabilities. Let's imagine I accidentally drop a table. The foreign keys will prevent me from dropping employees or teams, but employees_teams is not safe from my wrath!
drop_table(engine, "employees_teams")
def drop_table(engine, table):
(employees, teams, employees_teams) = load_tables(engine)
if ( table == "employees"):
employees.drop(engine)
elif ( table == "teams" ):
teams.drop(engine)
elif ( table == "employees_teams" ):
employees_teams.drop(engine)
else:
print(table + ": Not found")
As we can see from the status and table list, it is gone.
print_status(engine)
print_tables(engine)
Status
employees_teams: deleted
Tables in database:
employees
teams
In a traditional database, this could be disastrous. In Doltgres, we can get it back with a simple SELECT dolt_reset('--hard'). This function takes an optional commit. If no commit is specified, it resets to the HEAD commit.
dolt_reset_hard(engine, None)
print_status(engine)
print_tables(engine)
def dolt_reset_hard(engine, commit):
    if ( commit ):
        stmt = text("SELECT dolt_reset('--hard', '" + commit + "')")
        print("Resetting to commit: " + commit)
    else:
        stmt = text("SELECT dolt_reset('--hard')")
        print("Resetting to HEAD")

    with engine.connect() as conn:
        results = conn.execute(stmt)
        conn.commit()
Now, the output:
Resetting to HEAD
Status
No tables modified
Tables in database:
employees
employees_teams
teams
Doltgres makes operating databases less error prone. You can always back out changes you have in progress or rewind to a known good state.
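You can also rewind past the working set by handing a commit hash from the log to the same helper. This isn't something the demo script does, but a minimal sketch using the functions above would look like this; the hash is the "Created tables" commit from the earlier output and stands in for whatever known good commit you choose:

# A minimal sketch of rewinding to a known good commit. Pick the hash from
# print_commit_log(); this one is the 'Created tables' commit shown earlier.
known_good = '5ahj2vcifi29ibs48f7bp91sm752ub2i'
dolt_reset_hard(engine, known_good)
print_commit_log(engine)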
Change data on a branch
Doltgres is the only PostgreSQL database with branches and merges. We've seen some PostgreSQL databases with branches, but none that support merging those branches. Here I show you how to create and switch branches. I'll make some changes and commit them. Later, I'll merge all my changes together. Think of a branch as a really long SQL transaction.
First, you need to create a branch. We can create a branch with the dolt_branch() function. In the Python code, I also consult the dolt_branches system table to make sure the branch does not already exist. We use the dolt_checkout() function described in the Connect SQLAlchemy section to switch branches using the engine object.
dolt_create_branch(engine, 'modify_data')
engine = dolt_checkout('modify_data')

def dolt_create_branch(engine, branch):
    # Check if branch exists
    metadata_obj = MetaData()
    dolt_branches = Table("dolt_branches", metadata_obj, autoload_with=engine)

    stmt = select(dolt_branches.c.name).where(dolt_branches.c.name == branch)

    with engine.connect() as conn:
        results = conn.execute(stmt)
        rows = results.fetchall()
        if ( len(rows) > 0 ):
            print("Branch exists: " + branch)
            return

    # Create branch
    stmt = text("SELECT dolt_branch('" + branch + "')")

    with engine.connect() as conn:
        results = conn.execute(stmt)
        print("Created branch: " + branch)
Now that we're on a new branch, it's safe to make changes. The main branch will remain unchanged as I make these changes. This function inserts, updates, and deletes using the SQLAlchemy query builder.
modify_data(engine)

def modify_data(engine):
    (employees, teams, employees_teams) = load_tables(engine)

    update_stmt = update(employees).where(employees.c.first_name == 'Tim'
                                          ).values(first_name='Timothy')
    insert_emp_stmt = insert(employees).values([
        {'id':4, 'last_name':'Wilkins', 'first_name':'Daylon'}
    ])
    insert_et_stmt = insert(employees_teams).values([
        {'employee_id':4, 'team_id':0}
    ])
    delete_stmt = delete(employees_teams).where(
        employees_teams.c.employee_id == 0
    ).where(employees_teams.c.team_id == 1)

    with engine.connect() as conn:
        conn.execute(update_stmt)
        conn.execute(insert_emp_stmt)
        conn.execute(insert_et_stmt)
        conn.execute(delete_stmt)
        conn.commit()
Let's inspect what we've done to make sure it looks good.
print_status(engine)
print_diff(engine, 'employees')
print_diff(engine, 'employees_teams')
print_summary_table(engine)
Using branch: modify_data
Status
employees: modified
employees_teams: modified
Diffing table: employees
{'diff_type': 'modified',
'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
'from_first_name': 'Tim',
'from_id': 0,
'from_last_name': 'Sehn',
'to_commit': 'WORKING',
'to_commit_date': None,
'to_first_name': 'Timothy',
'to_id': 0,
'to_last_name': 'Sehn'}
{'diff_type': 'added',
'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
'from_first_name': None,
'from_id': None,
'from_last_name': None,
'to_commit': 'WORKING',
'to_commit_date': None,
'to_first_name': 'Daylon',
'to_id': 4,
'to_last_name': 'Wilkins'}
Diffing table: employees_teams
{'diff_type': 'removed',
'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
'from_employee_id': 0,
'from_team_id': 1,
'to_commit': 'WORKING',
'to_commit_date': None,
'to_employee_id': None,
'to_team_id': None}
{'diff_type': 'added',
'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
'from_employee_id': None,
'from_team_id': None,
'to_commit': 'WORKING',
'to_commit_date': None,
'to_employee_id': 4,
'to_team_id': 0}
Team Summary
Engineering: Timothy Sehn
Engineering: Brian Hendriks
Engineering: Aaron Son
Engineering: Daylon Wilkins
Sales: Brian Fitzgerald
Daylon is added to the engineering team on the modify_data branch. Tim is no longer on the Sales team. Engineering for life!
Finally, let's commit these changes so we can make different changes on another branch.
dolt_commit(engine, 'Brian <brian@dolthub.com>', 'Modified data on branch')
print_commit_log(engine)
Created commit: 0o07b2f3r788t9nn2vg139maviu49fd4
Commit Log:
0o07b2f3r788t9nn2vg139maviu49fd4: Modified data on branch by Brian
lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Jason
do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by Jason
Change schema on another branch
We're going to make a schema change on another branch and make some data modifications using the SQLAlchemy Session interface. The Session interface is the true alchemy in SQLAlchemy. You bind Python objects to tables and when you add, modify, or delete those Python objects, calling Session.commit() automatically updates the records in your tables. Truly lead-into-gold stuff.
Below, you'll see we check out the main branch so the new branch has the correct base branch. Then, we create a new branch called modify_schema. Then, we run the modify_schema() function, which adds a start date column and populates it using the Session interface. We finally use status and diff to show off what changed.
engine = dolt_checkout('main')
dolt_create_branch(engine, 'modify_schema')
engine = dolt_checkout('modify_schema')
print_active_branch(engine)
modify_schema(engine)
print_status(engine)
print_diff(engine, "employees")
print_summary_table(engine)
def modify_schema(engine):
    (employees, teams, employees_teams) = load_tables(engine)

    # SQLAlchemy does not support table alters so we use text
    stmt = text('alter table employees add column start_date date')
    with engine.connect() as conn:
        conn.execute(stmt)
        conn.commit()

    # Update using the SQLAlchemy Session interface
    class Base(DeclarativeBase):
        pass

    class Employee(Base):
        __tablename__ = "employees"

        id: Mapped[int] = mapped_column(primary_key=True)
        last_name: Mapped[str] = mapped_column(String(255))
        first_name: Mapped[str] = mapped_column(String(255))
        start_date: Mapped[Date] = mapped_column(Date)

        def __repr__(self) -> str:
            return f"Employee(id={self.id!r}, last_name={self.last_name!r}, first_name={self.first_name!r}, start_date={self.start_date!r})"

    session = Session(engine)

    Tim = session.get(Employee, 0)
    Tim.start_date = "2018-08-06"
    BHeni = session.get(Employee, 1)
    BHeni.start_date = "2018-08-06"
    Aaron = session.get(Employee, 2)
    Aaron.start_date = "2018-08-06"
    Fitz = session.execute(select(Employee).filter_by(last_name="Fitzgerald")).scalar_one()
    Fitz.start_date = "2021-04-19"

    session.commit()
This outputs the following:
Using branch: main
Created branch: modify_schema
Using branch: modify_schema
Active branch: modify_schema
Status
employees: modified
Diffing table: employees
{'diff_type': 'modified',
'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
'from_first_name': 'Tim',
'from_id': 0,
'from_last_name': 'Sehn',
'from_start_date': None,
'to_commit': 'WORKING',
'to_commit_date': None,
'to_first_name': 'Tim',
'to_id': 0,
'to_last_name': 'Sehn',
'to_start_date': datetime.date(2018, 8, 6)}
{'diff_type': 'modified',
'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
'from_first_name': 'Brian',
'from_id': 1,
'from_last_name': 'Hendriks',
'from_start_date': None,
'to_commit': 'WORKING',
'to_commit_date': None,
'to_first_name': 'Brian',
'to_id': 1,
'to_last_name': 'Hendriks',
'to_start_date': datetime.date(2018, 8, 6)}
{'diff_type': 'modified',
'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
'from_first_name': 'Aaron',
'from_id': 2,
'from_last_name': 'Son',
'from_start_date': None,
'to_commit': 'WORKING',
'to_commit_date': None,
'to_first_name': 'Aaron',
'to_id': 2,
'to_last_name': 'Son',
'to_start_date': datetime.date(2018, 8, 6)}
{'diff_type': 'modified',
'from_commit': 'lf30j7u2pmrna1h4ld3gg7j9gpn9etho',
'from_commit_date': datetime.datetime(2025, 7, 18, 19, 6, 18, 771000),
'from_first_name': 'Brian',
'from_id': 3,
'from_last_name': 'Fitzgerald',
'from_start_date': None,
'to_commit': 'WORKING',
'to_commit_date': None,
'to_first_name': 'Brian',
'to_id': 3,
'to_last_name': 'Fitzgerald',
'to_start_date': datetime.date(2021, 4, 19)}
Team Summary
Engineering: Tim Sehn 2018-08-06
Engineering: Brian Hendriks 2018-08-06
Engineering: Aaron Son 2018-08-06
Sales: Tim Sehn 2018-08-06
Sales: Brian Fitzgerald 2021-04-19
As you can see, my defensive coding in the Insert Some Data section paid off and employee start dates are displayed. This looks good, so we'll commit it.
dolt_commit(engine, 'Jason <jason@dolthub.com>', 'Modified schema on branch')
print_commit_log(engine)
Created commit: apf44oec5ctk3cu0e0s2ot90pgpr2f1s
Commit Log:
apf44oec5ctk3cu0e0s2ot90pgpr2f1s: Modified schema on branch by Jason
lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Jason
do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by Jason
Merge it all together
Now for our last trick. We will merge all the branches together and show the resulting summary table. To merge, you use the dolt_merge() function.
engine = dolt_checkout('main')
print_active_branch(engine)
print_commit_log(engine)
print_summary_table(engine)
dolt_merge(engine, 'modify_data')
print_summary_table(engine)
print_commit_log(engine)
dolt_merge(engine, 'modify_schema')
print_commit_log(engine)
print_summary_table(engine)
def dolt_merge(engine, branch):
    stmt = text("SELECT dolt_merge('" + branch + "')")

    with engine.connect() as conn:
        results = conn.execute(stmt)
        rows = results.fetchall()
        commit = rows[0][0]
        fast_forward = rows[0][1]
        conflicts = rows[0][2]

        print("Merge Complete: " + branch)
        print("\tCommit: " + commit)
        print("\tFast Forward: " + str(fast_forward))
        print("\tConflicts: " + str(conflicts))
This outputs the following. It's beautiful. You can see the data and schema evolving as we merge.
Using branch: main
Active branch: main
Commit Log:
lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Jason
do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by Jason
Team Summary
Engineering: Tim Sehn
Engineering: Brian Hendriks
Engineering: Aaron Son
Sales: Tim Sehn
Sales: Brian Fitzgerald
Merge Complete: modify_data
Commit: 0o07b2f3r788t9nn2vg139maviu49fd4
Fast Forward: 1
Conflicts: 0
Team Summary
Engineering: Timothy Sehn
Engineering: Brian Hendriks
Engineering: Aaron Son
Engineering: Daylon Wilkins
Sales: Brian Fitzgerald
Commit Log:
0o07b2f3r788t9nn2vg139maviu49fd4: Modified data on branch by Brian
lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Jason
do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by Jason
Merge Complete: modify_schema
Commit: 2jmueascj0cnrba7pl46m9ok5et4ccmg
Fast Forward: 0
Conflicts: 0
Commit Log:
2jmueascj0cnrba7pl46m9ok5et4ccmg: Merge branch 'modify_schema' into main by postgres
apf44oec5ctk3cu0e0s2ot90pgpr2f1s: Modified schema on branch by Jason
0o07b2f3r788t9nn2vg139maviu49fd4: Modified data on branch by Brian
lf30j7u2pmrna1h4ld3gg7j9gpn9etho: Inserted data into tables by Aaron
5ahj2vcifi29ibs48f7bp91sm752ub2i: Created tables by Jason
do1tbd13h57jlsm86gqeknnm7nob41uj: Initialize data repository by Jason
Team Summary
Engineering: Timothy Sehn 2018-08-06
Engineering: Brian Hendriks 2018-08-06
Engineering: Aaron Son 2018-08-06
Engineering: Daylon Wilkins
Sales: Brian Fitzgerald 2021-04-19
Notice the first merge was a fast-forward merge just like in Git. Doltgres will detect schema and data conflicts if you make them.
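If a merge does produce conflicts, Doltgres surfaces them through system tables rather than failing silently. A hedged sketch of checking for them after a merge, reflecting the Dolt-style dolt_conflicts summary table (treat the exact table and columns as illustrative):

from pprint import pprint
from sqlalchemy import MetaData, Table, select

# A hedged sketch of listing conflicted tables after a merge. dolt_conflicts
# is the Dolt-style summary system table; treat its columns as illustrative.
def print_conflicts(engine):
    metadata_obj = MetaData()
    dolt_conflicts = Table("dolt_conflicts", metadata_obj, autoload_with=engine)
    with engine.connect() as conn:
        for row in conn.execute(select(dolt_conflicts)):
            pprint(row._asdict())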
Conclusion
Phew. That was a lot! If you made it this far, congratulations. You are now ready to build your own Doltgres application using Python and SQLAlchemy. If this wasn't enough to get you started or you've still got more questions, come by our Discord and we'll be happy to help you out!