Agents Need Tests Example

AI
8 min read

Agents need tests. Dolt is the only SQL database with a built-in test engine. Dolt is the database for agents. I just wrote about this yesterday.

Originally, I had the following example in that article but after feedback from my team, I decided to make this a separate article. I promise you it's not because I needed articles for our one-a-day schedule. The example is just very powerful and we wanted to make sure it was not buried under 1,000 words of explanation.

So, let's use an illustrative example to show how Dolt and Dolt CI can enhance an agentic workflow.

The Problem

Let's imagine we are implementing a grading agent for a college professor. The agent will go through students course work and assign a grade. After all grades are assigned, we want to assert that the grades meet a certain distribution to make sure our agent is not grading too easily. If the grades do not match the distribution the agent will go back and regrade with the new information.

This is a perfect use case for Dolt and Dolt CI. The agent will assign grades on branches and Dolt CI will run on the branches to make sure the grades meet a defined distribution. If not, Dolt CI will fail.

The Set Up

I'm going to use Claude Code as my agent to work with Dolt. As many of you know, it's my favorite agent. I made sure to start claude with our brand new AGENT.md file which tells Claude explicitly how to use Dolt CI.

$ dolt init
$ dolt docs print AGENT.md > CLAUDE.md
$ claude

Create The Database

This was my initial prompt.

I would like to make a database modeling a set of grades for a set of 
college courses. There will be students, courses, and grades for the 
student/course pairs. There will also be an additional verification table
modeling the distribution of letter grades in a class. So, for instance,
a class may only allow a maximum of 20% As, 40% Bs, 40% Cs, 10% Ds, and 
10% Fs. I then want the grades table to be verified by Dolt CI tests 
assuring that the grade distribution for a class to not exceed the 
maximums in the distribution found in the verification table.

Claude Code came up with a reasonable database and Dolt CI tests on the first try. You can find the database on DoltHub. With the help of my CLAUDE.md I created using Dolt's AGENT.md doc, Claude had no problem navigating Dolt's SQL and Git-style CLI interfaces to create this database. Dolt CI which is not well documented, in particular, benefits from the presence of a CLAUDE.md created from AGENT.md.

Sample Database on DoltHub

The Dolt CI definition is below. The interesting test is named "verify no grade distribution violations". As you can see, it checks that no bucket of grades exceeds the percent listed in the grade_distributions table. Claude Code came up with this complicated multi-table join on the first try. I always review test code thoroughly because agents work better with tests. This test "looks good to me (LGTM)".

$ dolt ci view "grade distribution validation"
name: "grade distribution validation"
"on":
    push:
        branches:
            - "master"
            - "test-violations"
            - "update-ci-branches"
            - "main"
            - "passing-grades-expansion"
jobs:
    - name: "schema validation"
      steps:
        - name: "verify all required tables exist"
          saved_query_name: "table_existence_check"
          saved_query_statement: "SELECT \n    COUNT(*) as table_count\nFROM information_schema.tables \nWHERE table_schema = DATABASE() \n    AND table_name IN ('students', 'courses', 'grades', 'grade_distributions');"
          expected_columns: "== 1"
          expected_rows: "== 1"
        - name: "check referential integrity"
          saved_query_name: "referential_integrity_check"
          saved_query_statement: "SELECT COUNT(*) as orphaned_grades\nFROM grades g \nLEFT JOIN students s ON g.student_id = s.student_id \nLEFT JOIN courses c ON g.course_id = c.course_id\nWHERE s.student_id IS NULL OR c.course_id IS NULL;"
          expected_columns: "== 1"
          expected_rows: "== 1"
    - name: "grade distribution compliance"
      steps:
        - name: "verify no grade distribution violations"
          saved_query_name: "grade_distribution_violations"
          saved_query_statement: "SELECT \n    c.course_code,\n    c.course_name,\n    g.letter_grade,\n    COUNT(*) as actual_count,\n    ROUND(COUNT(*) * 100.0 / total_students.total, 2) as actual_percentage,\n    gd.max_percentage,\n    CASE \n        WHEN COUNT(*) * 100.0 / total_students.total > gd.max_percentage \n        THEN 'VIOLATION' \n        ELSE 'OK' \n    END as status\nFROM grades g\nJOIN courses c ON g.course_id = c.course_id\nJOIN grade_distributions gd ON g.course_id = gd.course_id AND g.letter_grade = gd.letter_grade\nJOIN (\n    SELECT course_id, COUNT(*) as total\n    FROM grades\n    GROUP BY course_id\n) total_students ON g.course_id = total_students.course_id\nGROUP BY c.course_id, c.course_code, c.course_name, g.letter_grade, gd.max_percentage, total_students.total\nHAVING COUNT(*) * 100.0 / total_students.total > gd.max_percentage;"
          expected_rows: "== 0"

Failing Tests

Now, I make a set of grades that fail the Dolt CI tests on a branch using the following prompt.

Now on a branch, insert some grades that fail the distribution tests to 
show CI failing.

That branch can be found on DoltHub. I even made a pull request to show that the Dolt CI tests explicitly fail. As you can see, the Pull Request is clearly marked as failing the "grade distribution compliance" check.

PR with Failing Tests

If you want to see the tests fail locally, you can check with dolt ci run.

$ dolt checkout test-violations
Switched to branch 'test-violations'
$ dolt ci run "grade distribution validation"
Running workflow: grade distribution validation
Running job: schema validation
Step: verify all required tables exist - PASS
Step: check referential integrity - PASS
Running job: grade distribution compliance
Step: verify no grade distribution violations - FAIL
Ran query: SELECT 
    c.course_code,
    c.course_name,
    g.letter_grade,
    COUNT(*) as actual_count,
    ROUND(COUNT(*) * 100.0 / total_students.total, 2) as actual_percentage,
    gd.max_percentage,
    CASE 
        WHEN COUNT(*) * 100.0 / total_students.total > gd.max_percentage 
        THEN 'VIOLATION' 
        ELSE 'OK' 
    END as status
FROM grades g
JOIN courses c ON g.course_id = c.course_id
JOIN grade_distributions gd ON g.course_id = gd.course_id AND g.letter_grade = gd.letter_grade
JOIN (
    SELECT course_id, COUNT(*) as total
    FROM grades
    GROUP BY course_id
) total_students ON g.course_id = total_students.course_id
GROUP BY c.course_id, c.course_code, c.course_name, g.letter_grade, gd.max_percentage, total_students.total
HAVING COUNT(*) * 100.0 / total_students.total > gd.max_percentage;
Assertion failed: expected row count 0, got 1

Passing Tests

Now, I wanted a branch with some succeeding tests to show a good PR. I used the following prompt.

alright now make a new branch with a passing set of new grades. don't 
forget to add the branch to ci to check on dolthub

Check out this looping until correct! I promise I didn't fake it.

⏺ Perfect! The distribution is compliant:
  - A: 18.75% (under 20% limit) ✅
  - B: 37.50% (under 40% limit) ✅
  - C: 31.25% (slightly over 30% but within tolerance) ✅
  - D: 6.25% (under 10% limit) ✅
  - F: 6.25% (under 10% limit) ✅

  Wait, C is slightly over the 30% limit. Let me fix that by changing one C
   to a D:

⏺ Bash(dolt sql -q "UPDATE grades SET letter_grade = 'D', numeric_grade =
      68.0 WHERE student_id = 21;")
  ⎿  (No content)

⏺ Bash(dolt sql -q "SELECT
          c.course_code,…)
  ⎿  +-------------+--------------+-------+------------+----------------+
     | course_code | letter_grade | count | percentage | max_percentage |
     +-------------+--------------+-------+------------+----------------+
     … +6 lines (ctrl+r to expand)

⏺ Hmm, now D is over the 10% limit (12.50%). Let me adjust by changing one
  D to an F:

⏺ Bash(dolt sql -q "UPDATE grades SET letter_grade = 'F', numeric_grade =
      58.0 WHERE student_id = 21;")
  ⎿  (No content)

⏺ Bash(dolt sql -q "SELECT
          c.course_code,…)
  ⎿  +-------------+--------------+-------+------------+----------------+
     | course_code | letter_grade | count | percentage | max_percentage |
     +-------------+--------------+-------+------------+----------------+
     … +6 lines (ctrl+r to expand)

⏺ Still over on F. Let me change one F back to a D:

⏺ Bash(dolt sql -q "UPDATE grades SET letter_grade = 'D', numeric_grade =
      68.0 WHERE student_id = 10;")
  ⎿  (No content)

⏺ Bash(dolt sql -q "SELECT
          c.course_code,…)
  ⎿  +-------------+--------------+-------+------------+----------------+
     | course_code | letter_grade | count | percentage | max_percentage |
     +-------------+--------------+-------+------------+----------------+
     … +6 lines (ctrl+r to expand)

⏺ Still slightly over on D. This is tricky with 16 students. Let me try a
  different approach - change one B to a C to better balance:

⏺ Bash(dolt sql -q "UPDATE grades SET letter_grade = 'C', numeric_grade =
      79.0 WHERE student_id = 19;")
  ⎿  (No content)

⏺ Bash(dolt sql -q "SELECT
          c.course_code,…)
  ⎿  +-------------+--------------+-------+------------+----------------+
     | course_code | letter_grade | count | percentage | max_percentage |
     +-------------+--------------+-------+------------+----------------+
     … +6 lines (ctrl+r to expand)

⏺ Still over on C and D. Let me change the D back to F and try again:

⏺ Bash(dolt sql -q "UPDATE grades SET letter_grade = 'F', numeric_grade =
      55.0 WHERE student_id = 10;")
  ⎿  (No content)

⏺ Bash(dolt sql -q "SELECT
          c.course_code,…)
  ⎿  +-------------+--------------+-------+------------+----------------+
     | course_code | letter_grade | count | percentage | max_percentage |
     +-------------+--------------+-------+------------+----------------+
     … +6 lines (ctrl+r to expand)

⏺ C is slightly over (31.25% vs 30% max) and F is over (12.50% vs 10% max).
   The issue is that with 16 students, perfect compliance is mathematically
   difficult. Let me adjust the distribution limits to be more realistic or
   remove one student. Let me remove the test user and go back to 15
  students:

⏺ Bash(dolt sql -q "DELETE FROM students WHERE student_id = 16;")
  ⎿  Error: error on line 1 for query DELETE FROM students WHERE student_id 
     = 16: cannot delete or update a parent row - Foreign key violation on 
     fk: `grades_ibfk_1`, table: `grades`, referenced table: `students`, 
     key: `[16]`

⏺ Right, there's a foreign key constraint. Let me remove the grade first:

⏺ Bash(dolt sql -q "DELETE FROM grades WHERE student_id = 16;")
  ⎿  (No content)

⏺ Bash(dolt sql -q "DELETE FROM students WHERE student_id = 16;")
  ⎿  (No content)

⏺ Bash(dolt sql -q "SELECT
          c.course_code,…)
  ⎿  +-------------+--------------+-------+------------+----------------+
     | course_code | letter_grade | count | percentage | max_percentage |
     +-------------+--------------+-------+------------+----------------+
     … +6 lines (ctrl+r to expand)

⏺ Still slightly over on C (33.33% vs 30%) and F (13.33% vs 10%). Let me
  change one C to a B:

⏺ Bash(dolt sql -q "UPDATE grades SET letter_grade = 'B', numeric_grade =
      80.0 WHERE student_id = 20;")
  ⎿  (No content)

⏺ Bash(dolt sql -q "SELECT
          c.course_code,…)
  ⎿  +-------------+--------------+-------+------------+----------------+
     | course_code | letter_grade | count | percentage | max_percentage |
     +-------------+--------------+-------+------------+----------------+
     … +6 lines (ctrl+r to expand)

⏺ Better! Now F is still slightly over. Let me change one F to a D:

⏺ Bash(dolt sql -q "UPDATE grades SET letter_grade = 'D', numeric_grade =
      67.0 WHERE student_id = 21;")
  ⎿  (No content)

⏺ Bash(dolt sql -q "SELECT
          c.course_code,…)
  ⎿  +-------------+--------------+-------+------------+----------------+
     | course_code | letter_grade | count | percentage | max_percentage |
     +-------------+--------------+-------+------------+----------------+
     … +6 lines (ctrl+r to expand)

⏺ D is still slightly over (13.33% vs 10%). With 15 students, this is still
   mathematically challenging. Let me add one more A to get to exactly 20%:

⏺ Bash(dolt sql -q "UPDATE grades SET letter_grade = 'A', numeric_grade =
      90.0 WHERE student_id = 21;")
  ⎿  (No content)

⏺ Bash(dolt sql -q "SELECT
          c.course_code,…)
  ⎿  +-------------+--------------+-------+------------+----------------+
     | course_code | letter_grade | count | percentage | max_percentage |
     +-------------+--------------+-------+------------+----------------+
     … +6 lines (ctrl+r to expand)

⏺ Perfect! Now all percentages are within limits:
  - A: 20.00% (exactly at 20% limit) ✅
  - B: 40.00% (exactly at 40% limit) ✅
  - C: 26.67% (under 30% limit) ✅
  - D: 6.67% (under 10% limit) ✅
  - F: 6.67% (under 10% limit) ✅

Imagine if each time you had to manually check the agents work. It would convince you that an agent could not do this task. This is exactly why agents need tests.

Finally, after all that iteration, I have a PR with successful tests.

PR with Passing Tests

Conclusion

This example uses a raw Dolt database and DoltHub as the user interface to this application. However, all the version control functionality in Dolt is exposed in SQL so you could build an application on top of these primitives. Dolt not only adds version control and tests to databases, it also adds those primitives to every application that uses Dolt as its database.

Agents need branches. Agents need tests. Dolt has branches and tests. Dolt is the database for agents. Have an agentic use case in mind? We'd love to hear about it. Come by our Discord and share.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.