- REFERENCE, 9 min read
So you want Database Versioning?
Here at DoltHub, we've had a lot of success with our "So you want..." series of blog posts helping people find Dolt when they are looking for it. Dolt is a lot of things. Dolt is a version controlled database, a Git database, Git for data, data…
- SQL, 11 min read
Implementing indexed joins
Happy Valentine's Day from all of us at DoltHub! You are the reason we do what we do! In honor of the holiday, we want to talk about how much we love making queries faster. We're going to examine how our...
- FEATURE RELEASE, 4 min read
LICENSE.md and README.md in Dolt
Dolt and DoltHub strive to be the best data distribution platform on the internet. Having documentation versioned alongside data, and a standard, easy way to read the documentation online are features we admire in Git and GitHub. Following ...
- FEATURE RELEASE, SQL, 7 min read
Introducing SQL VIEW Support in Dolt
Dolt is a SQL database with Git-style versioning and distribution. The most recent releases of Dolt introduced support for SQL views that are stored as part of, and versioned along with, a Dolt repository. This provides a great way for data sets ...
- REFERENCE, 8 min read
Dolt and DoltHub: Getting Started
Dolt is a SQL database with Git-style versioning. In Git the unit of versioning is files. In Dolt, the unit of versioning is SQL tables. Dolt will eventually support 100% of the Git command line and 100% of MySQL SQL. Moreover, anything you can d...
- DATASET, 4 min read
Mapping Income Inequality using IRS SOI Data
In a previous blog I showed how the history of a dataset can be queried using the Dolt history tables, and in the first part of this two-part blog I covered the IRS SOI data. In this second part I use the IRS SOI data along with doltpy ...
- DATASET, 6 min read
IRS Sources Of Income Dataset
Every year the IRS publishes a treasure trove of data. It contains over a hundred different metrics which provide insight into the finances of American taxpayers. Even more compelling, they provide this information at ZIP code granularity, which…
- FEATURE RELEASE, WEB, 2 min read
Querying DoltHub Repositories with SQL
Since its launch in 2008, GitHub has catalyzed the open source software world and accelerated the culture of software collaboration. Source control was an old idea at that point, but GitHub offered a centralized place to discover and collaborate...
- SQL, 8 min read
Access to Everything Through SQL
When we started developing Dolt our vision was to deliver Git functionality for data. Where Git versions files, Dolt versions tables. We implemented table-based diff and conflict logic and shipped the initial version. As we started to use Do...
- FEATURE RELEASE, WEB, 4 min read
DoltHub Redesign
Dolt is a database and a data format. DoltHub is a way of hosting and collaborating on Dolt databases. We decided to redesign DoltHub to make it more user-friendly. We are excited to announce that we have released the resu...
- SQL, 5 min read
Getting to one 9 of SQL correctness in Dolt
A few months ago we finally settled on a good way to measure the correctness of Dolt's SQL engine: the sqllogictest package, first developed for SQLite and since used as a benchmark for lots of other database implementations. SQLite hit u...
- 5 min read
The History of Data Exchange
IBM and General Electric invented the first databases in the early 1960s. It was only by the early 1970s that enough data had accumulated in databases that the need to transfer data between databases emerged. Enter the Comma Separated Values (CSV…
- DATASET, 5 min read
Maintained Wikipedia ngrams dataset in Dolt
Wikipedia is the largest and most popular general reference work on the internet, making it a powerful tool for predictive language modeling. Wikipedia releases a dump of all its articles and pages twice a month, and we created a dataset of...
- DATASET, 5 min read
2 billion primes in a Dolt table
Since releasing Dolt, we have often been asked how it scales. How many rows and how many gigs can you get into a Dolt dataset before things start breaking badly? Answering this question in practice is kind of difficult, simply because it'...
- 2 min read
No Food, One Problem. Have Food, Many Problems.
I have been a huge EconTalk fan for over ten years. On his podcast with Sebastian Junger, Russ Roberts brought up what he called a Chinese proverb: "No food, one problem. Have food, many problems." The wisdom of this saying really resonated ...
- DATASET, 4 min read
ImageNet in Dolt
ImageNet is a dataset maintained by the Stanford Vision Lab. It seems to have fallen into disrepair. The links to download the image labels are broken. We have managed to procure all four released versions of the labeled images and import them ...
- FEATURE RELEASE, 7 min read
Tracking Data Changes with Dolt Blame
Ever look at some data and wonder where a particular value came from, how long it's been there, or what the reason for changing it was? This is important information, but current data storage formats don't track or expose it—certainly not in a…
- 3 min read
Dolt: A Database with Branches
As we discussed in the Where Is the Data Catalog? blog post, Dolt is a database designed for internet-scale collaboration. There are databases with differences, history, rollback, and audit logging. We think the Git semantics of Dolt provi...
- DATASET, 4 min read
WordNet in Dolt
The Princeton WordNet database is on DoltHub. This blog entry will be about how it got there and how to use it. WordNet is distributed natively from Princeton as a compilable custom database. You can also download the database files only b...
- SQL, 4 min read
Testing Dolt's SQL Engine
When we first started writing Dolt, we weren't thinking about SQL functionality. We just knew we wanted a way to package data sets to make them easy to share, collaborate and merge -- to do for data what Git did for source code. But as we de...
- 3 min read
Dolt: A Simple Example
When Dolt and DoltHub first went into private beta, we were surprised that the Iris dataset was the dataset people first tried to put in Dolt. If you are looking for that dataset, we have uploaded it to DoltHub. In this article, we're going t...