For the Unlocking the Black Box conference -- April 2, 2016 at Yale Law School
Medicine is a notoriously
unpredictable science. A treatment that provides a miraculous recovery for one
patient may do nothing for the next. A new chemotherapy drug may extend patient
lives by two years on average, but that average is made up of some patients who
live many years longer and some patients whose lives are not extended at all,
or are even shortened. And with new drugs growing ever more expensive, personalizing
medicine is increasingly important, so that doctors can predict disease
risk and choose treatments tailored for individual patients.
This unpredictability has a simple
cause. The human body is extraordinarily complex, with countless variables
affecting each person differently: genetic variations, biological pathways,
protein-expression patterns, metabolite concentrations, and exercise habits,
to name just a few. And only a few of these variables
are well-understood by scientists. When a drug doesn’t work, then, or a patient
develops a rare disease, it could be because of some genetic variation, or a
particular metabolite concentration, or several of these things acting together
in ways doctors may never understand.
Black-box
medicine—the use of big data and sophisticated machine-learning techniques
in opaque medical applications—could be the answer. It takes significant time, money, and luck
for scientists to discover the precise combination of variables that makes a
drug work or not—if it can be discovered that way at all—but with enough data,
a machine-learning algorithm could find a predictive correlation much more
rapidly. Using datasets of genetic and health information, then, researchers
can uncover previously unknown connections between patient characteristics,
symptoms, and medical conditions. And these connections promise to yield new
diagnostic tests and treatments and to enable individually tailored medical
decisions.
Big-data techniques are only as
powerful as the input data and the methods used to analyze those data. Health
care is especially ripe for a big-data
revolution, though, because of the sheer quantity of data available:
researchers can obtain an endless variety of data points from literally
millions of patients. And because assembling and analyzing such large-scale
datasets is becoming easier
and cheaper by the hour, many different researchers, from both industry and
the academy, are working on ways of using data for everything from guiding
choices between different drugs to best allocating scarce hospital resources
among different patients.
The sheer scale and scope of
health data available to researchers, and the sensitivity of that data, lead to
two related but opposing problems. The first problem is algorithmic
accountability. Because of the black-box nature of big-data techniques and the
sheer complexity of biological systems, it can be difficult or impossible to
know whether the conclusions drawn are incomplete, inaccurate, or biased,
whether because of limitations in the data or the analysis or because of
intentional interference. These
conclusions can sometimes be validated by researchers or government agencies,
but doing so can be expensive and difficult, and can require access to the same
extensive medical data from which the conclusions were drawn.
The second problem is privacy.
Medical information is some of the most private and sensitive information that
exists, and black-box medicine requires access to a lot of that information. It
also creates new information, like predictions based on the models developed
with big data. And this information may be used in ways that harm individuals,
whether through marketing, sales to others, or discrimination in employment,
insurance, or other decisions. Even when it is not used in these ways, its
collection, disclosure, and use can infringe individual autonomy and decisional
privacy.
These two problems are
interrelated because efforts to reduce one will usually make the other worse.
The solution to the accountability problem is to validate black-box models, but
that requires access to more information, which can exacerbate the privacy
problem. And the solution to the privacy problem is to limit the amount of
information to which researchers, companies, and the government have access,
but that can make it harder to validate models and easier to hide or overlook
algorithmic problems. Algorithms need to be validated to ensure high-quality
medicine, but at the same time, a data free-for-all would eviscerate patient
privacy.
Solutions to the accountability
and privacy problems, then, must consider the broader effects on black-box
medicine. We propose three pillars to an effective verification system that
respects patient privacy. The first is a system of limitations on the collection,
use, and dissemination of medical data, so that data gathered to develop and
verify black-box algorithms is not also used for illegitimate
purposes. The second is a system of independent gatekeepers to govern access
to, and transmission of, patient data, so that government and independent
researchers can work to verify big-data models. And the third is robust
information-security provisions, so that unintended outsiders cannot obtain,
use, or disseminate patient data. The design of these verification systems can
draw on the ongoing debate over the disclosure of clinical-trial
data, which has addressed related issues of how to promote data sharing
without sacrificing patient privacy. Such privacy-protecting verification
offers the best means of ensuring that black-box medicine lives up to its
promise.
Roger
Ford and Nicholson Price are
Assistant Professors of Law at the University of
New Hampshire School of Law. They
welcome comments on this ongoing project and can be reached at roger.ford or
nicholson.price, both at law.unh.edu.