US Presidential Election $25,000 Database Bounty Review
On December 14, we launched our first data bounty, offering participants a share of $25,000 for wrangling US presidential precinct-level data. The bounty ended yesterday. How did it go? This blog entry answers that question.
Dolt is a SQL database with Git-style versioning. It's the first SQL database you can branch and merge. DoltHub is a place on the internet to share and collaborate on Dolt databases. Without both, data bounties would not be possible.
The Results
We built the best open database of US Precinct-level Election results on the internet.
Here are some statistics:
- 15.5M cells edited. 1.7GB of data collected
- All 51 "states" covered for 2016. 38 states covered for 2020.
- 100% of the vote covered for 2016. 78% for 2020.
- 75 Pull Requests (PRs) accepted across 6 bounty participants.
- Top bounty participant earned over $10,000.
We're very excited to mail the checks to all the folks who worked hard to make this bounty a success.
What's Next?
Now that the bounty is complete, we must figure out what to do with this data. How do we get the most out of our investment in the data?
Clean up the data we got
Given the heavily keyed nature of this data, we decided early on not to accept PRs for standardization or normalization of things like party names. Because most columns are part of the primary key, Dolt sees almost any change as a deletion of one row and a corresponding addition of another. Thus, changes like that give the corrector bounty credit for the whole row instead of just the cell he or she changed. We queued up a number of these types of changes and we'll execute them now that the bounty is over.
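To make the primary key point concrete, here is a minimal sketch of a key-based row diff in Python. It is not Dolt's actual implementation, and the column names (`state`, `precinct`, `party`, `votes`) and sample values are hypothetical. It only illustrates why renaming a value in a key column shows up as one row removed and one row added, while a change to a non-key column shows up as a single modified row.

```python
def diff_by_key(old_rows, new_rows, key_cols):
    """Diff two lists of row dicts, matching rows on their primary key."""
    def index(rows):
        # Map each row's primary key tuple to the full row.
        return {tuple(r[c] for c in key_cols): r for r in rows}

    old, new = index(old_rows), index(new_rows)
    removed = [old[k] for k in old.keys() - new.keys()]
    added = [new[k] for k in new.keys() - old.keys()]
    modified = [(old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]]
    return added, removed, modified


# Hypothetical schema: every column except "votes" is part of the primary key.
KEY = ("state", "precinct", "party")

before = [{"state": "XX", "precinct": "P1", "party": "Dem.", "votes": 100}]
# Normalizing a party name touches a key column.
fix_party = [{"state": "XX", "precinct": "P1", "party": "Democratic", "votes": 100}]
# Correcting a vote count touches only a non-key column.
fix_votes = [{"state": "XX", "precinct": "P1", "party": "Dem.", "votes": 101}]

key_change = diff_by_key(before, fix_party, KEY)
value_change = diff_by_key(before, fix_votes, KEY)

for label, (added, removed, modified) in [("party fix", key_change), ("votes fix", value_change)]:
    print(label, "added:", len(added), "removed:", len(removed), "modified:", len(modified))
```

In this sketch the party-name fix registers as a whole-row removal plus a whole-row addition, which is the reason a one-cell normalization would have been credited as a full row under the bounty's scoring.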
Find some users
We collected the best US presidential precinct-level results on the internet. Now, we want people to use them. We think the data may be good enough to bootstrap an open data community around it. The people who use it would have an incentive to finish it as new state-level data is released. We will reach out to the people at Open Elections and MIT Elections Lab and start a conversation. If you know anyone else who could use this data, come let us know in our Discord.
Run another bounty?
We're still missing 12 states and about 22% of the vote from 2020. If we want complete data and an open data community can't be bootstrapped, we may run another bounty in a couple of months to finish the dataset. Let us know if you'd be interested in another bounty or in the complete dataset.
Conclusion
We think DoltHub Bounties may be the fastest, cheapest way to build databases from open data. For $25,000 and 8 weeks, we were able to assemble a 1.7GB database of election results.
Our plan is to run at least one bounty per month for the rest of the year across a number of distinct data disciplines. We're running a hospital price transparency bounty right now. We'll be launching two more over the next month or so. Hang out in our Discord to keep up to date. Start wrangling data as your new side hustle.