Genomics, the study of organisms’ entire genomes, holds great promise to advance biological knowledge and facilitate the development of new diagnostics and therapeutics. Genomics research has benefited greatly from various policies requiring the rapid disclosure of nucleotide sequence data in public databases. The result is a genomic data commons, a widely accessible repository of information from which all members of the scientific community can draw. Notably, this intensely productive space operates almost completely outside of formal intellectual property law through a combination of public funding, agency policy, and communal norms.
The genomic data commons has attracted significant scholarly interest, both because of its great potential to advance biomedical research and because of its broader lessons about the nature of commons-based productivity. For instance, Jorge Contreras has charted the evolution of the genomic data commons from a system that essentially disseminates information into the public domain into a more complex, “polycentric” governance institution for managing knowledge resources. This paper, which grows out of Brett Frischmann, Michael Madison, and Kathy Strandburg’s project to study commons governance, explores less-appreciated but highly significant complexities of managing genomic information. In so doing, it seeks to shed greater light on the nature of commons in general.
In particular, this paper focuses on the governance challenges of correcting, updating, and annotating vast amounts of sequence data in the commons. Most legal accounts of the genomic data commons focus on researchers’ initial provisioning of data and on other scientists’ access to those data. Delving into the science of genome sequencing, assembly, and annotation, however, this paper highlights the indeterminate nature of sequence data and related information. Quite simply, the genomic data commons is full of errors and gaps. Accordingly, this paper examines four approaches for correcting, completing, and updating existing data: contributor-centric data management, third-party biocuration, community-based wikification, and specialized databases and genome browsers. It argues that these approaches reveal a deep tension between centralization and fragmentation of control within the genomic data commons, a tension that can be mitigated through a strategy of replication.
On one hand, contributor-centric data management and third-party biocuration represent mechanisms for centralizing control over data. In these models, the original data contributor or database manager has an almost exclusive ability to update existing records. On the other hand, wiki-based annotation fragments control throughout the community, exploiting the power of peer production and parallel data analysis to augment existing data records. Both centralization and fragmentation have their pros and cons, and this paper argues that stakeholders can capture the best of both worlds by exploiting the nonrivalry of information. In particular, researchers are engaged in a strategy of replication, employing specialized databases and genome browsers that combine centralized, archival data and widespread community input to provide more textured, value-added renderings of genomic information. Among other advantages, this approach has important epistemological implications, as it both reflects and reveals that genomic knowledge is the product of social consensus.
More broadly, this study reveals that the genomic data commons is both less and more of a commons than previously thought. On one hand, it features a highly centralized data architecture: the efforts of thousands of genomic researchers around the world feed into a consortium of three publicly sponsored databases, which members of the community may not modify directly. On the other hand, this knowledge system represents a set of commons on top of a commons. At one level, it is an archival data repository emerging from a global community of scientists. At another level, however, the genomic data commons also encompasses many sub-communities (often organized around model organisms) that develop their own specialized databases and nomenclatures. Additionally, user groups develop meta-tools such as genome browsers and freely distribute them throughout the community, thus helping to make genomic data more intelligible.
Furthermore, this study highlights the strong role of centralization and standardization in the effective operation of a commons. The commons is often perceived as an open space free of government intervention and insulated from market demands. Indeed, the genomic data commons has been structured quite deliberately to operate outside the legal and economic influence of patents. However, the genomic data commons underscores that commons-based productivity systems are not simply “free-for-alls” lacking order or regulation. Too much control, and the power of sharing, parallel processing, and peer production goes unrealized. Too little control, and the commons dissipates into chaos and entropy. Truly effective commons strike a balance between centralization and fragmentation.
Peter Lee is Professor of Law and Chancellor's Fellow at the University of California, Davis, School of Law. He can be reached at ptrlee at ucdavis.edu.