ShareDB - A Licensing Model and Ecosystem for Data Sharing

Collaboratively sourced shared data sets are at the core of many of the most societally important “big data” problems. Consider personalized medicine: the promise is that by comparing a new patient’s symptoms, genome, and demographics to those of millions of earlier patients, it will be possible to develop personalized therapies that revolutionize medicine. Doing this requires hospitals, pharmaceutical companies, sequencing labs, and others to create shared data repositories for analysis. As part of the NSF North East BD Hub, we are building a data sharing ecosystem, along the lines of what exists in the open-source software community, that enables data providers to share data easily while still enforcing certain requirements. Although services and licenses (e.g., Creative Commons) for open data exist today, their emphasis is on attribution, and they frequently sidestep the most difficult issues by stipulating that data is available for anyone to use however they wish. Instead, we propose to develop standardized data sharing agreements that provide easy-to-understand terms that have worked well in individual data sharing agreements.

This is a joint project with MIT and Drexel.

Tim Kraska
Carsten Binnig
