Dolt for SOX Compliance
In 2017 Snap Inc went public which meant that there was a rush to become Sarbanes Oxley (SOX) compliant. At that time I was the lead developer on the ad delivery team which was directly responsible for one of Snap's revenue streams. As a result I spent countless hours with our internal compliance teams and external auditors setting up SOX compliance controls. In this blog I'm going to spend a little bit of time talking about what SOX compliance controls are and how Dolt would have helped build compliance controls that are easier to set up and work with.
What is SOX
The Sarbanes-Oxley Act aka SOX was legislation passed following the Enron Scandal of the early 2000s. The goal of SOX is to prevent tampering with systems or data that could be considered materially relevant to financial reporting. In the case of ad delivery this meant nearly everything needed to be covered by SOX compliance controls. This included:
- The code for the ad server, and the code of programs that generated data that fed into the ad delivery system. If the code was compromised it could be used to generate fake ad impressions or clicks.
- Data such as ad auction data, ad impression logs, and ad click logs which was used to bill advertisers. Being able to tamper with this data could result in fraudulent charges to advertisers.
- The machine learning models that were used to optimize ad delivery. If these models were tampered with it could result in ads being shown to the wrong people, or not shown at all which would result in lost revenue.
- The code responsible for deploying the ad server and the deployment logs which tracked which version of the code was deployed to which servers and when. If the deployment system was compromised the ad server could be replaced with a malicious version generating fraudulent revenue.
- The infrastructure that runs the ad delivery system. If the infrastructure is compromised it can be used to generate fake ad impressions or clicks.
There were many other things covered by SOX at Snap, but these were the things that the ad delivery team was directly responsible for.
What are SOX Compliance Controls
SOX compliance controls are a set of rules and procedures that are put in place to ensure that the systems and data that are covered by SOX are not tampered with. There are many types of controls here I'll be covering some of them and discussing how Dolt's features can be used to implement or improve these controls.
Change Management
Change management is the process by which changes to the systems or data are tracked and reviewed. Requiring reviews protects against accidental changes, ensures the quality of the changes, and guarantees that no single person can make changes that would result in fraud. By tracking all changes you can ensure that any unauthorized changes are detected, and are attributable to the person who made them.
At Snap we used Git to track changes to the code, and required the code to be reviewed. But what
about the data? Well data such as configuration files would be tracked in Git along with the code.
Configuration in Git required us to generate new build artifacts to deploy the configuration. Our deployment system would then need to spin
up new hardware and gradually transition from the old version to the new. This is not a trivial process, and is usually
done slowly and monitored to make sure there are no issues with the new version.
Configuration stored in Dolt, a version controlled database would have greatly simplified this process. Using Dolt we could have tracked changes to the data in the same way that we tracked changes to the code. If we used DoltHub, DoltLab, Dolt Workbench, or our upcoming Dolt Desktop Workbench, we could have required separate pull requests and reviews for changes to the data. Once the changes were merged to main, there are many options for how we could have gotten configuration changes to our servers that wouldn't have required building, and deploying the full ad serving stack.
These concepts are not limited to configuration data. They can be used with any data that can be stored in a database and there
are many cases where Dolt is a better fit for change management than Git. As an
example Dolt has many benefits over Git for large text files containing
structured data. Be it large csv files, json documents, or some other large text based file, Git
is a poor choice. Line based diff and merge is not the ideal fit for structured data. Additionally large text files can
be difficult to edit. Searching for the line you want to change and maintaining the structure when editing can be challenging.
Dolt is a better fit for this type of data as it allows you to query the database and use SQL INSERT
and UPDATE
statements to make changes.
Separation of Duties
In order to be able to develop, debug, and deploy software someone will need administrative access to the code, the data, and the infrastructure. However, there is no reason that a single person needs administrative access to everything. By separating access to different users you ensure that no single person can run amok without being detected. Access can be split even further having different users with different access levels to each different piece of software, each type of data, and each piece of infrastructure.
Dolt has fine-grained access controls that allow you to control who can read, write, modify, and delete data in each table of each database. Dolt even supports branch level access controls. This means that you can protect certain branches and allow developers to make modifications on other branches without being allowed to merge their changes to other branches. This is a powerful feature that allows you to separate access to different users but allows all users to work on the data and submit pull requests for an administrator to review.
$ # Create a test user and lock down the main branch so that only the account 'root' has write access
$ dolt sql-client --user=root
# Welcome to the Dolt MySQL client.
# Statements must be terminated with ';'.
# "exit" or "quit" (or Ctrl-D) to exit.
mysql> SELECT * FROM dolt_branch_control;
+----------+--------+------+------+-------------+
| database | branch | user | host | permissions |
+----------+--------+------+------+-------------+
| % | % | % | % | write |
+----------+--------+------+------+-------------+
1 row in set (0.00 sec)
mysql> CREATE USER testuser@localhost;
mysql> GRANT ALL ON *.* TO testuser@localhost;
mysql> INSERT INTO dolt_branch_control VALUES ('example', 'main', 'root', '%', 'write');
mysql> SELECT * FROM dolt_branch_control;
+----------+--------+------+------+-------------+
| database | branch | user | host | permissions |
+----------+--------+------+------+-------------+
| example | main | root | % | write |
+----------+--------+------+------+-------------+
1 row in set (0.00 sec)
$ # try to update the config table on main as testuser
$ dolt sql-client --user=testuser
# Welcome to the Dolt MySQL client.
# Statements must be terminated with ';'.
# "exit" or "quit" (or Ctrl-D) to exit.
mysql> UPDATE config SET value = 'true' WHERE `key` = 'enable_back_to_back_ads';
Error 1105: `testuser`@`%` does not have the correct permissions on branch `main`
Data Integrity
Data integrity is the process by which data is protected from unauthorized changes and data loss. In cases where it is not possible to review every change to the data, you can use cryptographic signatures, or other means to ensure that the data has not been modified. Dolt stores data in a Merkle DAG. This means that the state of the database at any moment in time can be represented by a cryptographically secure hash. This hash can be used to ensure that the data has not been tampered with. If the data is modified the hash will not match the data. Additionally, Dolt stores the way in which the data has changed over time, and who changed it in an immutable commit graph. Even if the data is tampered with the commit graph will show that along with who tampered with it.
Data loss can come in many forms. One example is data loss due to hardware issues. When running applications at scale it
is only a matter of time until you run into a corrupted disk, or a server that goes down and cannot be recovered. In these
cases Dolt supports both backups and replication. Another example is data loss caused by human error
or issues with application code. There have certainly been many times when rows were deleted in error, updates were run without a where
clause, or a table or database was dropped. In most of these scenarios you would need to go to a backup in order to recover.
With Dolt you can use dolt_undrop()
to recover from an accidentally dropped database, and you can revert commits that caused by bad INSERT
, UPDATE
, DELETE
, or
DROP <TABLE>
statements. This is a powerful feature that allows you to recover from data loss much more quickly than
restoring from a backup.
Auditing
For the purpose of compliance, the ability to audit the systems and data is critical. This includes the ability to review logs, and other data to ensure that the systems and data are being used in a compliant manner. We've already talked about how the Dolt commit graph is used for change management, and data integrity, but the ability to query the commit graph, and query the data diffs provides a very powerful tool for auditing.
ads/main> select * from dolt_log;
+----------------------------------+----------------+-------------------+---------------------+----------------------------+
| commit_hash | committer | email | date | message |
+----------------------------------+----------------+-------------------+---------------------+----------------------------+
| 76qerj11u38il8rb1ddjn3d6kivqamk2 | Brian Hendriks | brian@dolthub.com | 2024-10-08 01:17:37 | just a little tampering |
| gram7r07nvcfahi0ku1jl0sdm936n9hg | Brian Hendriks | brian@dolthub.com | 2024-10-08 01:16:15 | some impressions |
| 7oen8e22633v11v9p6gpb535r3i9h88u | Brian Hendriks | brian@dolthub.com | 2024-10-07 22:52:09 | Initial Tables and Data |
| l86n1ifds32jb3roirt2nqf08ev4c3up | Brian Hendriks | brian@dolthub.com | 2024-10-07 21:02:36 | Initialize data repository |
+----------------------------------+----------------+-------------------+---------------------+----------------------------+
4 rows in set (0.00 sec)
ads/main> select from_bid_micro_usd, to_bid_micro_usd from dolt_diff_serves where to_commit = '76qerj11u38il8rb1ddjn3d6kivqamk2';
+--------------------+------------------+
| from_bid_micro_usd | to_bid_micro_usd |
+--------------------+------------------+
| 1004 | 2000 |
| 1004 | 2000 |
| 1009 | 2000 |
| 1001 | 2000 |
| 1007 | 2000 |
| 1006 | 2000 |
| 1001 | 2000 |
| 1008 | 2000 |
| 1005 | 2000 |
| 1003 | 2000 |
| 1003 | 2000 |
| 1001 | 2000 |
| 1002 | 2000 |
+--------------------+------------------+
13 rows in set (0.01 sec)
Conclusion
In this blog I've talked about some of the SOX compliance controls that are used to protect systems and data. I've also talked about how Dolt can be used to implement or improve these controls. Dolt is a powerful tool that can be used to track changes to data, separate access to data, ensure data integrity, and audit the data. If you are working on a project that requires SOX compliance controls, or if you are working on a project that requires any of the features that Dolt provides, I encourage you to give Dolt a try. If you have any questions about Dolt reach out to us on Discord.