$10,000 Jails and Prisons Data Bounty
What are data bounties?
DoltHub data bounties are our way of getting people to use Dolt: in each bounty, we build meaningful datasets together using Git-style pull requests. If you haven't participated in a bounty already, why not start now? We pay out $10-15k over the course of each one. Past bounties have included:
- the largest open database of hospital chargemasters
- the largest open database of housing prices
- a huge dataset of NBA players data
Incidentally, DoltHub bounties are how I learned python and SQL and helped me land this job at DoltHub. Anyone can get started, even you! Really!
So as we wrap up our $15,0000 hospitals bounty (thanks to all who participated!), allow me to introduce our next project:
The $10,000 US Local Jails Bounty
The world has its eyes on US corrections. We lock up more people than any other country, with 573 people in prison per 100,0000 nationwide. Data on individual prisons is tough to come by, but a number of projects have data science teams working on it. We're taking an even greater step in the direction of transparency by crowdsourcing the data collection and making the database open-source.
With this bounty we hope to:
- get a clear picture of what kind of public-facing information exists
- track prison occurrences (suicides/deaths)
- look at jails that systematically hold onto large numbers of unconvicted people
- jumpstart other projects that depend on the scraped inmate data
Right now we are only looking at aggregate data (total populations, not inmate-level.) This data typically only comes around once every few years when a census is taken. Instead, our participants will be scraping and updating this data as we go, and their work will be laying the foundation for future data transparency projects in this space.
Other initiatives have been similar, but we'll be the first to see how far we can get without an organized data-science team.
Data validation
Data validation is crucial on any serious project involving justice or public health. The bounty system incentivizes data validation: since bounties are scored per cell edit, other players can poach cells by finding mistakes in each others' work. Two source_url
columns give us additional validation power.
The main table: inmate_population_snapshots
We'll be mainly working on the inmate_population_snapshots
and incidents
tables. The idea is to get as much fine-grained information as we can (short of personally-identifying information) from different local jails (including race, offense, deaths, suicides, and citizenship information.) Statistics that come from the federal government are typically aggregated and do not allow for local-level inference.
This is the kind of data we're looking for:
The schema here was inspired by the National Archive of Criminal Justice Data's 2019 census of US jails.
CREATE TABLE `inmate_population_snapshots` (
`id` char(21) NOT NULL,
`snapshot_date` date NOT NULL,
`total` int,
`male` int,
`female` int,
`other_gender` int,
`white` int,
`black` int,
`hispanic` int,
`asian` int,
`american_indian` int,
`multi_racial` int,
`other_race` int,
`probation` int,
`parole` int,
`felony` int,
`misdemeanor` int,
`other_offense` int,
`convicted` int,
`unconvicted` int,
`citizen` int,
`noncitizen` int,
`citizen_convicted` int,
`citizen_unconvicted` int,
`noncitizen_convicted` int,
`noncitizen_unconvicted` int,
`juvenile` int,
`juvenile_male` int,
`juvenile_female` int,
`death_row_condemned` int,
`solitary_confinement` int,
`source_url` varchar(2043) NOT NULL,
`source_url_2` varchar(2043),
PRIMARY KEY (`id`,`snapshot_date`),
KEY `id` (`id`),
CONSTRAINT `1aslttls` FOREIGN KEY (`id`) REFERENCES `jails` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
And for incidents:
CREATE TABLE `incidents` (
`id` char(21) NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
`deaths` int,
`fights` int,
`total_assaults` int,
`inmate_on_inmate_assaults` int,
`inmate_on_staff_assaults` int,
`suicides` int,
`escapes` int,
`source_url` varchar(2043) NOT NULL,
`source_url_2` varchar(2043),
PRIMARY KEY (`id`,`start_date`,`end_date`),
KEY `id` (`id`),
CONSTRAINT `o3jqb74m` FOREIGN KEY (`id`) REFERENCES `jails` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
For more information on the bounty schema see the README. We hope you'll join us by making your first pull request!
Get started
Join our mission and be a part of our experiment. Join our Discord channel to ask us questions, particularly if you need help getting started with Dolt.