Assembling a Grand Catalog—A Data Bounty Retrospective | DoltHub Blog
Should you use DoltHub Bounties for your data-wrangling needs? Our bounty partners wanted to assemble a “master” catalog of all the college courses taught in the United States. For them, the answer was easy.
To recap, a partner approached us with a request to create a database of US College Course Catalogs. At first we were a bit hesitant, because the data wouldn’t be open after the bounty ended. But we’re intrepid souls, and we ended up giving it a try. We’re excited to share the results in this blog post.
The Virtues of Data Bounties
So far, hosting bounties has delivered a flywheel of benefits for DoltHub. First, bounties encourage people to use and get familiar with the product. We find that Dolt and DoltHub give programmers compelling new powers, but only once they overcome the initial learning hurdles. Second, bounties attract participants to collect and curate the dataset, and the finished data then draws people to Dolt and DoltHub in its own right. Users who just want to browse the data, without needing revision control features, have a reason to install Dolt and use its built-in SQL engine. Lastly, the data is fascinating. What excites us most about Dolt is this potential to source riveting datasets.
The Results
After running the bounty for one month with a $10,000 prize pool, we received structured course catalog information from 65 schools. The data totals nearly 7 GB and spans as far back as 1984. We received 77 PRs from 13 different community members eager to claim their share of the prize, at a cost of under $200 per school.
The final scoreboard for our National Course Catalog Data Bounty.
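The per-school figure follows directly from the results above. This quick check uses only the numbers reported in this post: the prize pool, the school count, the PR count and the contributor count.

```python
# Figures reported in the results section of this post.
PRIZE_POOL_USD = 10_000
SCHOOLS = 65
PULL_REQUESTS = 77
CONTRIBUTORS = 13

# Average spend per school and average PRs per participant.
cost_per_school = PRIZE_POOL_USD / SCHOOLS
prs_per_contributor = PULL_REQUESTS / CONTRIBUTORS

print(f"Cost per school: ${cost_per_school:,.2f}")        # $153.85
print(f"PRs per contributor: {prs_per_contributor:.1f}")  # 5.9
```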
After an initial kernel of work describing and clarifying the “shape” of the data via a schema, bounties take on a life of their own. Add a bit of gamification, and the data collects itself! Compare that to the challenges of mining this data in-house. You would either tax your development team or outsource the project to an agency or freelancers.
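To make the “shape” step concrete, here is a minimal sketch of what a course-catalog schema might look like. The table and column names are hypothetical, not the bounty’s actual schema. The sketch uses Python’s built-in `sqlite3` only so that it runs anywhere. Dolt itself speaks MySQL-compatible SQL, but the idea of pinning down columns and a primary key up front is the same.

```python
import sqlite3

# Hypothetical schema: one row per course, per school, per catalog year.
# The composite primary key stops the same course from being inserted twice,
# which keeps submissions from many contributors consistent.
SCHEMA = """
CREATE TABLE courses (
    school_name   TEXT    NOT NULL,
    catalog_year  INTEGER NOT NULL,
    course_code   TEXT    NOT NULL,  -- e.g. 'CS 101'
    title         TEXT    NOT NULL,
    description   TEXT,
    credits       REAL,
    PRIMARY KEY (school_name, catalog_year, course_code)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO courses VALUES (?, ?, ?, ?, ?, ?)",
    ("Example College", 2020, "CS 101", "Intro to Programming", None, 3.0),
)

# The schema itself rejects a duplicate key.
try:
    conn.execute(
        "INSERT INTO courses VALUES (?, ?, ?, ?, ?, ?)",
        ("Example College", 2020, "CS 101", "Duplicate Row", None, 3.0),
    )
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```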
Ease and Affordability
It’s important not to underestimate the amount of work involved. Gathering the data for each school requires understanding the structure of its online course catalog and building a custom scraper for it. The individual scrapers share design patterns, but there are limited opportunities for code reuse. In short, there’s no way we could match the affordability and ease of the bounty model with more traditional approaches.
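The “shared patterns, limited reuse” point can be sketched as a small base class. The pipeline plumbing is common to every scraper, while each school needs its own parsing logic. Everything here is hypothetical, including the `CatalogScraper` and `ExampleCollegeScraper` names, the `Course` fields and the HTML markup. Fetching is stubbed out with a string so the sketch runs offline. Real catalog sites vary far more than this, which is exactly why the `parse` step rarely transfers between schools.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Course:
    code: str
    title: str


class CatalogScraper(ABC):
    """Shared pipeline; each subclass supplies only school-specific parsing."""

    @abstractmethod
    def parse(self, html: str) -> list[Course]:
        """Turn one catalog page into courses (different for every school)."""

    def run(self, pages: list[str]) -> list[Course]:
        # Reusable part: parse every page, drop duplicates, keep first-seen order.
        seen = set()
        courses = []
        for html in pages:
            for course in self.parse(html):
                if course not in seen:
                    seen.add(course)
                    courses.append(course)
        return courses


class ExampleCollegeScraper(CatalogScraper):
    # Hypothetical markup: <li class="course"><b>CS 101</b> Intro to Programming</li>
    PATTERN = re.compile(r'<li class="course"><b>([^<]+)</b>\s*([^<]+)</li>')

    def parse(self, html: str) -> list[Course]:
        return [Course(code.strip(), title.strip())
                for code, title in self.PATTERN.findall(html)]


page = ('<ul><li class="course"><b>CS 101</b> Intro to Programming</li>'
        '<li class="course"><b>MATH 201</b> Linear Algebra</li></ul>')
print(ExampleCollegeScraper().run([page, page]))
```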
A Transmogrification
Bounty participants are motivated by a myriad of factors. Of course they want to increase their share of the prize. Yet we find the pull of the dollar is rivaled by friendly competitive spirit, i.e. the desire to outdo one’s collaborators. Participants also feel a sense of stewardship over the database and take pride in anticipating how the dataset they assemble will be used. Self-interest and altruism thus dovetail here. The partners who sponsored the prize pool wished to remain anonymous and asked that we make the dataset private at the bounty’s conclusion, which dampened community interest a little. Even so, the remaining motivators were enough to drive impressive levels of participation.
We’ve stumbled upon something a little magical: a way to transmogrify the tedium and doldrums of data cleaning into a fun and engaging contest. We’ll be discussing the bounty results with our partners over the next week, and you may see a version 2.0 of this bounty. There’s wonderful interest from all corners.
Our top-earning participant took home a whopping $5.6k and had an edit count of more than 48 million.
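One simple way to picture how an edit-count scoreboard could translate into payouts is a pro-rata split of the prize pool. Treat this as an illustrative assumption, not a statement of DoltHub’s exact scoring rules. The contributor names and edit counts below are made up.

```python
def split_prize(pool_usd: float, edits: dict[str, int]) -> dict[str, float]:
    """Assumed pro-rata rule: each share equals that contributor's fraction of all edits."""
    total = sum(edits.values())
    if total == 0:
        # Nobody contributed, so nobody is paid.
        return {name: 0.0 for name in edits}
    return {name: pool_usd * count / total for name, count in edits.items()}


# Hypothetical contributors and edit counts, for illustration only.
payouts = split_prize(10_000, {"alice": 60, "bob": 30, "carol": 10})
for name, amount in payouts.items():
    print(f"{name}: ${amount:,.2f}")
```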
Info-oleum
What information exists out there in an unrefined, latent form that would advance your projects? What insights might you gain if you could skip the laborious process of gathering and cleaning that information? Our partners wanted deep insight into what’s being taught across the nation, and a data bounty was the answer. Your questions will be different, but we are eager to discuss them all the same. Reach out to us via our Discord!
You fund the bounty prize pool; we design, publicize, and administer the data bounty; and out the other side come massive, structured datasets.
Let’s take a closer look. Suppose we had a dire need for some semi-structured data; our national course catalog serves as an instructive example. Suppose also that a programmer could design and deploy scrapers at a pace of one catalog per five hours. Say we assigned 4 programmers for 15 hours a week each. At the end of one month, we’d have data from 48 schools (not quite the 65 from the bounty!). Those 240 programmer-hours might cost $22,500 at the low end and could reasonably run as much as $36,000. All that, and we would still fall short of what the data bounty accomplished for only $10,000.
Of course, we’re assuming that the team has the spare bandwidth to pursue this project. Most of us don’t have 4 engineers with 15 free hours a week to dedicate to data acquisition and transformation. Realistically, an in-house team could only take this on by sidelining existing projects. So the work would need to be outsourced, or the in-house team expanded to accommodate it. Add the cost of talent acquisition to the raw cost of the programmer-hours, and we could easily exceed $50,000 to accomplish what the data bounty did.
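The in-house estimate in the last two paragraphs can be reproduced in a few lines. The inputs are the hypothetical ones stated there:

- 5 hours per catalog
- 4 programmers at 15 hours a week each
- a one-month window, taken here as four weeks
- a cost range of $22,500 to $36,000

The script only makes the per-school comparison with the bounty explicit. It does not include the talent-acquisition costs mentioned above.

```python
HOURS_PER_CATALOG = 5
PROGRAMMERS = 4
HOURS_PER_WEEK = 15
WEEKS = 4  # treating "one month" as four weeks

programmer_hours = PROGRAMMERS * HOURS_PER_WEEK * WEEKS  # 240
in_house_schools = programmer_hours // HOURS_PER_CATALOG  # 48

# Cost per school: in-house (low and high estimates) vs. the bounty.
scenarios = [
    ("in-house, low", 22_500, in_house_schools),
    ("in-house, high", 36_000, in_house_schools),
    ("bounty", 10_000, 65),
]
for label, cost, schools in scenarios:
    print(f"{label:>15}: ${cost / schools:,.2f} per school ({schools} schools)")
```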