Transferring Data In and Out of Air-Gapped Networks
A high-security network can have what is called an "air gap". An air-gapped network is a network with no physical connections to other networks. To get data in or out of an air-gapped network, someone must physically carry the data across the air gap, connect to the network, and perform a data transfer or synchronization operation.
Anyone who has worked with an air-gapped network knows that, while secure, it is a pain to maintain and operate. Any time you want to get software or data in or out, someone must physically cross the gap. If multiple people are updating things in the network, it's hard to know whether there are conflicting updates. Moreover, the network is air-gapped because the software and data inside it are sensitive, so you don't want to maintain much information about the network outside the air gap.
Pictured below is a highly skilled expert crossing an air gap. Seems difficult.
Fortunately, new decentralized version control tools like Git for files and Dolt for databases can help get data in and out of air-gapped networks. This blog will explain what an air-gapped network is, where it is generally deployed, and some tools that make managing the software and data inside it easier.
What is an Air-Gapped Network?
An air-gapped computer or network is one that has no network interfaces, either wired or wireless, connected to outside networks. Pictured below, you have a standard internet-accessible network on the left. On the right, you have an air-gapped network. Note that the air-gapped network has no internet connectivity and relies on enhanced physical security to control data moving in or out.
Obviously, in an air-gapped network you must remove Wi-Fi and any other wireless connectivity from the devices inside it. Most hacks of air-gapped environments involve getting data out via some sort of over-the-air means.
To move data between the outside world and the air-gapped system, it is necessary to write data to a physical medium such as a laptop or removable drive, and physically move the data in and out of the network. The less sophisticated the device used to transfer the data, the higher the security. In some high-security air-gapped settings, data transfer via any electronic medium is prohibited. Only manual data entry is permitted in the air-gapped network. This is usually complemented by enhanced physical security that prevents any unwanted electronics from crossing the air gap.
At the network layer, a unidirectional gateway or data diode can be used to allow one-way data transfer, either in or out. A network designed this way has a one-way air gap.
Benefits
The main benefit of an air-gapped network is security. The threat profile of an air-gapped network is much smaller and simpler than that of a network connected to the internet. The network is more observable because all traffic is generated within the network itself. An air-gapped network is also far simpler: fewer hardware and software components are required to operate it.
Disadvantages
The main disadvantages of an air-gapped network are reduced operability and convenience. Software updates in particular can be very difficult. Some security professionals have argued that the security of an air-gapped network is not worth the inability to keep software up to date. As mentioned, getting software or data in and out of the network is intentionally difficult. Coordinating multiple updates and figuring out who has changed what can be a challenge.
Where are Air-Gapped Networks Used?
Military/governmental systems
In my professional experience, government and military are the largest users of air-gapped networks. The government runs many systems of national security interest. It is common for these systems to be physically inaccessible from other networks.
Financial systems
Financial systems are often on air-gapped networks. If the financial system is not highly transactional and contains sensitive information, the enhanced security provided by an air-gapped network can be useful. It is also common to have an air-gapped backup of important financial system data for disaster recovery.
Industrial control systems
Industrial supervisory control and data acquisition (SCADA) systems are often air-gapped. These systems control large, potentially dangerous manufacturing equipment. For instance, the computer systems that control an oil and gas refinery would be air-gapped for safety. Industrial systems generally do not require internet access because the industrial setting is a closed environment.
Lottery machines
National and state lottery machines or random number generators are often required to be completely isolated from networks to prevent lottery fraud.
Life-critical systems
Systems where malfunction can cause loss of life are often air-gapped. Such systems include controls of nuclear power plants, air traffic control systems, and computerized medical equipment.
Very simple systems
Finally, very simple systems like a home thermostat or washing machine are often air-gapped. This is not done for security. It is a byproduct of design simplicity and cost.
Getting Data In and Out
Getting data in and out of air-gapped networks is challenging by design. For this reason, there are a number of tools and technologies to help get data in and out of air-gapped networks securely. Some examples are data diodes like Owl Defense, thin clients like Forcepoint, and secure file transfer like 4sft. These technologies are military focused, very specific, and closed source.
We have a better way!
We here at DoltHub created the world's first version controlled database, called Dolt. Think Git and MySQL had a baby. A number of potential customers have inquired about how to use Dolt to move data in or out of air-gapped networks. After discussing this use case with these potential customers, we think the Git model can be especially useful for transferring data in and out of air-gapped networks.
Git
Git can quickly compute the differences (i.e., a diff) between two sets of files, such as the old and new contents of a directory. This diff functionality is incredibly useful when transferring data into or out of an air-gapped network. Outside the air-gapped network, you have a view of what the file structure should look like. Once you get inside the air-gapped network, you compare what you brought with what is already there. If you are satisfied with the result, you transfer the files. If not, you back out the change. This is especially useful for software configuration files.
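As a rough sketch of that compare-then-transfer flow, here is one way it could look using git bundle, which packs a repository into a single file that fits on removable media. The directory names (outside, inside), the file app.conf, and the bundle name update.bundle are all made up for illustration, and a temporary directory stands in for both sides of the gap.

```shell
# Sketch only: outside/, inside/, app.conf and update.bundle are hypothetical names.
set -eu

work="$(mktemp -d)"   # stands in for both networks and the removable drive

# Outside the air gap: a repository holding the desired state.
git init -q "$work/outside"
git -C "$work/outside" config user.name "Example User"
git -C "$work/outside" config user.email "user@example.com"
echo "setting=1" > "$work/outside/app.conf"
git -C "$work/outside" add app.conf
git -C "$work/outside" commit -q -m "Baseline configuration"
branch="$(git -C "$work/outside" symbolic-ref --short HEAD)"

# Pretend an earlier trip already delivered the baseline inside the gap.
git clone -q "$work/outside" "$work/inside"

# Outside: make a change and pack the branch into one file for the drive.
echo "setting=2" > "$work/outside/app.conf"
git -C "$work/outside" commit -q -am "Raise setting to 2"
git -C "$work/outside" bundle create "$work/update.bundle" "$branch"

# Inside: check the bundle, fetch it, and compare before accepting anything.
git -C "$work/inside" bundle verify "$work/update.bundle"
git -C "$work/inside" fetch -q "$work/update.bundle" "$branch"
changed="$(git -C "$work/inside" diff --name-only HEAD FETCH_HEAD)"
git -C "$work/inside" --no-pager diff --stat HEAD FETCH_HEAD

# Satisfied with the diff: fast-forward to the incoming commit.
git -C "$work/inside" merge -q --ff-only FETCH_HEAD
echo "Files updated: $changed"
```

If the diff had shown something unexpected, skipping the final merge would leave the files inside the gap untouched.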
Git has other useful functionality that can be leveraged in air-gapped network data transfer. Git is a single open source program, so it should be easy to get it security approved on an air-gapped system. Git identifies each commit by an immutable hash, which can be used as a way to summarize the contents of a directory. Git has branches and merges, so multiple versions of the system can be maintained simultaneously.
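To illustrate the hash-as-summary idea, here is a small sketch that builds the same directory contents in two separate repositories and compares their tree hashes. The helper make_repo, the directory names, and conf/app.conf are invented for this example. I compare tree hashes rather than commit hashes because a commit hash also covers the author, message, and timestamp.

```shell
# Sketch only: make_repo, outside/, inside/ and conf/app.conf are hypothetical.
set -eu

work="$(mktemp -d)"

# Build a one-commit repository. $1 = directory name, $2 = commit message.
make_repo() {
    git init -q "$work/$1"
    git -C "$work/$1" config user.name "Example User"
    git -C "$work/$1" config user.email "user@example.com"
    mkdir -p "$work/$1/conf"
    echo "port=8080" > "$work/$1/conf/app.conf"
    git -C "$work/$1" add .
    git -C "$work/$1" commit -q -m "$2"
}

make_repo outside "Snapshot taken outside the gap"
make_repo inside "Snapshot taken inside the gap"

# The tree hash depends only on file names, modes, and contents.
outside_tree="$(git -C "$work/outside" rev-parse 'HEAD^{tree}')"
inside_tree="$(git -C "$work/inside" rev-parse 'HEAD^{tree}')"

# The commit hash also covers the message, so these two differ.
outside_commit="$(git -C "$work/outside" rev-parse HEAD)"
inside_commit="$(git -C "$work/inside" rev-parse HEAD)"

if [ "$outside_tree" = "$inside_tree" ]; then
    echo "Directory contents match: $inside_tree"
else
    echo "Directory contents differ"
fi
```

Reading one short hash on each side of the gap is a lot less error-prone than comparing a directory listing by eye.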
Let's look at an example. First, I'll make the directory I want to version in Git and initialize a Git repository.
$ mkdir airgap
$ cd airgap
$ git init
Initialized empty Git repository in /Users/timsehn/dolthub/git/airgap/.git/
Then, I'll make a simple bash script to generate 100 random files.
$ cat generate.sh
#!/bin/bash
set -e
echo "Generating test folders"
mkdir -p ./parent_{0..9}/child_{0..9}
for file in ./parent_{0..9}/child_{0..9}/test.txt; do
    head -c 100 /dev/urandom > "$file"
done
I run the script and commit all the output to Git.
$ bash ./generate.sh
Generating test folders
$ git add .
$ git commit -am "Added a bunch of files"
[main (root-commit) 2697508] Added a bunch of files
101 files changed, 95 insertions(+)
create mode 100644 generate.sh
create mode 100644 parent_0/child_0/test.txt
create mode 100644 parent_0/child_1/test.txt
create mode 100644 parent_0/child_2/test.txt
...
...
create mode 100644 parent_9/child_7/test.txt
create mode 100644 parent_9/child_8/test.txt
create mode 100644 parent_9/child_9/test.txt
Finally, I pick a file at random and overwrite it so Git can tell me what has changed. This simulates an updated file being transferred across the air gap.
$ echo "Look what I did" > parent_8/child_4/test.txt
$ git diff
diff --git a/parent_8/child_4/test.txt b/parent_8/child_4/test.txt
index ad698ad..54c28e0 100644
--- a/parent_8/child_4/test.txt
+++ b/parent_8/child_4/test.txt
@@ -1,3 +1 @@
-^Gޙ
-<D5>^Ma<D3>~<87>^\u<98>H<84>~<F1><DE>^^<85>^O<FD><9B>
-<8A>Ǐ^Y<F0>R<88><82><D1>9^L^F<D0>+a<C2><F2>^T<80><BE>~H+U<A8>גʃ%^V^P6<80><E1><<F3>ɰ}^_$<F0><A7>^T<FF><D2><C8>A]<D3>^C<E4><87><F9><F9><80>SjTk^^WW<CC>Y<AD>^\^^B<A1><ED>
\ No newline at end of file
+Look what I did
See how easy it is to find what's changed in a bunch of files. Imagine the same process when bringing in new configuration for software in an air-gapped network.
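Here is a minimal sketch of the review-then-accept-or-back-out step for a configuration file. The file server.conf, its settings, and the approved variable are placeholders I made up. In practice the decision would come from whoever reviews the diff.

```shell
# Sketch only: server.conf and the "approved" variable are hypothetical.
set -eu

repo="$(mktemp -d)"
git init -q "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

# Known-good state already committed inside the air gap.
echo "max_connections=100" > "$repo/server.conf"
git -C "$repo" add server.conf
git -C "$repo" commit -q -m "Known-good configuration"

# A file from the removable drive overwrites the working copy.
echo "max_connections=9999" > "$repo/server.conf"

# Review what the transfer changed.
git -C "$repo" status --short
git -C "$repo" --no-pager diff --stat

# Stand-in for the reviewer's decision.
approved="no"
if [ "$approved" = "yes" ]; then
    git -C "$repo" commit -q -am "Accept transferred configuration"
else
    # Back out: restore tracked files from the last commit.
    git -C "$repo" checkout -- .
fi

restored="$(cat "$repo/server.conf")"
echo "server.conf now contains: $restored"
```

Note that git checkout -- . only restores tracked files. Brand new files copied in from the drive would still be sitting there and would need separate cleanup.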
Dolt
Dolt brings Git functionality to SQL database tables instead of files. Dolt allows you to find differences in large sets of structured data rather than in files. Traditionally, structured data has been difficult to compare across an air gap, but Dolt fixes this issue.
Let's look at an example. First, I'll make the directory I want to store my Dolt database in and initialize it.
$ mkdir airgap
$ cd airgap
$ dolt init
Successfully initialized dolt data repository.
Then, I'll create a table and seed it with 10,000 rows.
$ dolt sql -q "create table airgap (
id int primary key auto_increment,
random_text varchar(100))"
To make a random string, I use this funky SQL.
$ dolt sql -q "insert into airgap(random_text) select left(md5(rand()), 30)"
Query OK, 1 row affected (0.00 sec)
$ dolt sql -q "select * from airgap"
+----+--------------------------------+
| id | random_text |
+----+--------------------------------+
| 1 | 224cca58ce43d912bcad83c865436f |
+----+--------------------------------+
And I repeat it 10,000 times.
$ for i in {1..10000}
for> do
for> dolt sql -q "insert into airgap(random_text) select left(md5(rand()), 30)"
for> done
Query OK, 1 row affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
...
...
...
Now, I make a Dolt commit so I can refer back to this point later.
$ dolt add .
$ dolt commit -am "Created table and seeded it with random data"
I randomly change one of the strings, simulating new data coming across the air gap. This could be generated from a CSV load, a script, or any other database update method.
$ dolt sql -q 'update airgap set random_text="Look what I did" where id=ceiling(rand()*10000)'
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
$ dolt diff
diff --dolt a/airgap b/airgap
--- a/airgap @ q19i1spcs4d60de2rept4vu5rm9p4mkk
+++ b/airgap @ s0qb4ktjlctppapedls61svl8ejgah0e
+---+------+--------------------------------+
| | id | random_text |
+---+------+--------------------------------+
| < | 7292 | 3b62156b16389339344a7882fee318 |
| > | 7292 | Look what I did |
+---+------+--------------------------------+
Dolt quickly and easily tells me what row changed. Dolt has a custom storage engine so fast diff scales to hundreds of millions of rows.
Conclusion
Air-gapped networks are secure but hard to maintain. As you can see, Git and Dolt provide a great way to compare data in files or tables when you bring it into an air-gapped network. Inspired? Come by our Discord for help getting started.