Learn SQL with Real Data using Dolt
Dolt is a SQL database with Git-style versioning. DoltHub is a place on the internet to share Dolt databases. We think these tools can help people learn and perfect their SQL skills like no other database. This blog explains how.
Get started quickly
One of the issues with learning SQL is the set up. Most databases are complicated to set up. They are client/server, you need permissions, and the proper client version. There's just a lot to set up before you can run your first SQL query.
You can run SQL queries on the web using DoltHub, as long as they take less than 20 seconds to execute. For most queries that should be plenty. Here's an example. Explore data with SQL on the web, no messy set up required.
Once you start wanting more complicated queries or making writes, you'll need a copy of Dolt locally. With Dolt, you get a functioning SQL database with data in just a couple commands. Install Dolt, pop open a terminal, run dolt clone
of some database on DoltHub, and you can run a SQL query. Here's the process:
$ sudo bash -c 'curl -L https://github.com/dolthub/dolt/releases/latest/download/install.sh | sudo bash'
$ dolt clone dolthub/nba-players
$ cd nba-players
$ dolt sql -q "show tables"
That's it. That complicated database set up process that would take you at least a day, takes you less than 5 minutes.
SELECT queries with real data
With Dolt, the data travels along with the database. You clone
a database with data in it. Start practicing your SELECT queries on real data that interests you: sports, machine learning, language, weather, or civics. DoltHub's Library of real datasets is growing constantly and new data is written every day.
You don't need to practice your SELECT queries on sample data. Get inspired by inspecting real data without worrying about affecting production. Produce some novel insights or visualizations.
$ dolt sql -q "describe players"
+------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+-------+
| id | BIGINT | NO | PRI | | |
| full_name | LONGTEXT | YES | | | |
| first_name | LONGTEXT | YES | | | |
| last_name | LONGTEXT | YES | | | |
| is_active | BIT(1) | YES | | | |
+------------+----------+------+-----+---------+-------+
$ dolt sql -q "select * from players where full_name='Michael Jordan'"
+-----+----------------+------------+-----------+-----------+
| id | full_name | first_name | last_name | is_active |
+-----+----------------+------------+-----------+-----------+
| 893 | Michael Jordan | Michael | Jordan | 0 |
+-----+----------------+------------+-----------+-----------+
$ dolt sql -q "select pts from career_totals_regular_season as c join players as p on p.id=c.player_id where p.full_name='Michael Jordan'"
+-------+
| pts |
+-------+
| 32292 |
+-------+
Practice write queries without fear
With other real data, you need to worry about your UPDATE, INSERT, or DELETE queries. Modifying schema with CREATE or ALTER statements is often dangerous as well. Not with Dolt. Dolt provides you with your own copy of the database. You can change it without fear of messing up anyone else's work. If you make a mistake, you can roll back to where you started with dolt checkout
or dolt reset --hard
.
$ dolt sql -q "alter table players add column favorite boolean"
$ dolt sql -q "update players set favorite=1 where full_name='Michael Jordan'"
Query OK, 1 row affected
Rows matched: 1 Changed: 1 Warnings: 0
$ dolt reset --hard
$ dolt sql -q "select * from players where full_name='Michael Jordan'"
+-----+----------------+------------+-----------+-----------+
| id | full_name | first_name | last_name | is_active |
+-----+----------------+------------+-----------+-----------+
| 893 | Michael Jordan | Michael | Jordan | 0 |
+-----+----------------+------------+-----------+-----------+
See your diffs
Every write query produces a diff in schema and data so you can see what you did after you did it. Look how easy it is to see what you changed in the above example.
$ dolt sql -q "alter table players add column favorite boolean"
$ dolt diff
diff --dolt a/players b/players
--- a/players @ nndfp4rtlch8nvvbkmb3nj0tebjuiq5t
+++ b/players @ elaqgikgtahhnrpenlb0aeuo1c2ffd3r
CREATE TABLE players (
`id` BIGINT NOT NULL COMMENT 'tag:3235'
`full_name` LONGTEXT COMMENT 'tag:2637'
`first_name` LONGTEXT COMMENT 'tag:3445'
`last_name` LONGTEXT COMMENT 'tag:2133'
`is_active` BIT(1) COMMENT 'tag:3765'
+ `favorite` TINYINT COMMENT 'tag:14373'
PRIMARY KEY (`id`)
);
+-----+----+-----------+------------+-----------+-----------+----------+
| < | id | full_name | first_name | last_name | is_active | |
| > | id | full_name | first_name | last_name | is_active | favorite |
+-----+----+-----------+------------+-----------+-----------+----------+
+-----+----+-----------+------------+-----------+-----------+----------+
$ dolt sql -q "update players set favorite=1 where full_name='Michael Jordan'"
Query OK, 1 row affected
Rows matched: 1 Changed: 1 Warnings: 0
$ dolt diff
diff --dolt a/players b/players
--- a/players @ rb4eb40m1engfbb5b62n3eo7qgpih3k4
+++ b/players @ elaqgikgtahhnrpenlb0aeuo1c2ffd3r
CREATE TABLE players (
`id` BIGINT NOT NULL COMMENT 'tag:3235'
`full_name` LONGTEXT COMMENT 'tag:2637'
`first_name` LONGTEXT COMMENT 'tag:3445'
`last_name` LONGTEXT COMMENT 'tag:2133'
`is_active` BIT(1) COMMENT 'tag:3765'
+ `favorite` TINYINT COMMENT 'tag:14373'
PRIMARY KEY (`id`)
);
+-----+-----+----------------+------------+-----------+-----------+----------+
| < | id | full_name | first_name | last_name | is_active | |
| > | id | full_name | first_name | last_name | is_active | favorite |
+-----+-----+----------------+------------+-----------+-----------+----------+
| < | 893 | Michael Jordan | Michael | Jordan | 0 | <NULL> |
| > | 893 | Michael Jordan | Michael | Jordan | 0 | 1 |
+-----+-----+----------------+------------+-----------+-----------+----------+
$ dolt reset --hard
$ dolt diff
$
The last diff is empty because you put the database back to where you started. I hope you can see how fun and forgiving practicing SQL with Dolt can be.
Contribute back your work
If you make a cool update to a database you are working with, you can contribute the work back to the database owner and have it displayed on DoltHub. Created a cool view or sample query, commit it and share it on DoltHub. Added an index to a few columns that you commonly query, commit it and share it on DoltHub. Whenever you make a change you can submit a Pull Request and have the data maintainer review your work. Your SQL practice can add real value to the DoltHub community and you can get real feedback from an expert.
Work with a friend
It's more fun to learn together. If you and a friend want to learn SQL together, you can use DoltHub as a way to collaborate. Whenever you make a change you can submit a Pull Request and have your friend review your work. You don't need to be sitting in the same place. The same way open source developers collaborate from all over the world to build software projects, you can collaborate on learning and building database projects.
Learn Git at the same time
Dolt matches Git's model and syntax exactly so while you're learning Dolt, you are also learning Git. If you are learning SQL, it's probably safe to assume that learning Git would also be useful. Dolt and Git work exactly the same. In Git, you're working with files. In Dolt, you're working with tables. Where you run git add
or git branch
, you run dolt add
or dolt branch
. Git and Dolt take a bit of getting used to but once you understand them, the tools are really powerful.
Conclusion
Dolt and DoltHub change the game when it comes to learning SQL. Get set up quickly, read and write to real databases, see your diffs, and share and collaborate remotely. It's never been easier to perfect your SQL skills. Don't be afraid to experiment and make mistakes.