October Dataset Spotlight
Every month we highlight some interesting datasets on DoltHub. The focus is on new or updated datasets but sometimes we shed fresh light on a classic.
For those new to Dolt and DoltHub, Dolt is Git for data. Git versions files. Dolt versions SQL tables. DoltHub is a place on the internet to share Dolt repositories.
We think the way we share data with each other is broken and we think Dolt is the fix. Whenever you see a link to a CSV, JSON, or XML file, you should think of Dolt. Whenever you see an API but want all the data, not just a few entries, you should think of Dolt. We are working hard to move data shared in these formats to Dolt. This series of blogs will update you on our progress.
NBA Player Statistics
Link: dolthub/nba-players
Contributor: dolthub
First Published: May 5, 2020
The LA Lakers won the NBA Bubble championship. As LA residents, we at DoltHub congratulate our team. We also updated the NBA player statistics dataset with the NBA Bubble statistics. Thanks as well to the user jacob, who added a draft history table using our new fork and pull request feature. Distributed data collaboration is coming. We can feel it.
Supreme Court Cases
Link: dolthub/us-supreme-court-cases
Contributor: dolthub
First Published: April 23, 2020
The Supreme Court has been in the news a lot lately. We have a Supreme Court case transcript dataset, complete with data on the justices. We made one pull request to record Ruth Bader Ginsburg's passing and another for Amy Coney Barrett. The diff and pull request workflow is on full display in this dataset.
Primes
Link: bblank/math
Contributor: bblank
First Published: October 11, 2020
What is it about prime numbers? First, Zach put the first two billion primes in a Dolt repository. Then bblank showed up with bigger ambitions ("Math database for storing results of complex calculations.") and did the exact same thing. We'll see what bblank comes up with next.
End SARS
Link: emekaboris/EndSars
Contributor: emekaboris
First Published: October 17, 2020
End SARS is a movement to stop police brutality in Nigeria. SARS stands for "Special Anti-Robbery Squad", a particularly notorious arm of the Nigerian police. This dataset contains all the tweets with the #EndSARS hashtag. We're glad to see Dolt being used to help social movements all across the globe.
Cloud Native Computing Foundation
Link: cncf/landscape
Contributor: cncf
First Published: September 24, 2020
The Cloud Native Computing Foundation published a dataset of interesting cloud technologies. The dataset is a good place to start if you want a survey of cloud technologies, including whether each one is open source and how to contact its owners.
Conclusion
That's it for this month. For Dolt and DoltHub to continue to exist, we need a community of data publishers to emerge. Help us build that community by publishing. We published a blog post on how to publish with SQL and another on how to publish CSVs.
That said, if you want data in Dolt format but don't have the time or expertise to import and maintain it, send us a note or chat with us on Discord. We're happy to be an open data provider for your projects.