For the Unlocking the Black Box conference -- April 2, 2016 at Yale Law School
Medicine is a notoriously
unpredictable science. A treatment that provides a miraculous recovery for one
patient may do nothing for the next. A new chemotherapy drug may extend patient
lives by two years on average, but that average is made up of some patients who
live many years longer and some patients whose lives are not extended at all,
or are even shortened. And with new drugs growing ever more expensive, personalizing
medicine is increasingly important, so that doctors can predict disease
risk and choose treatments tailored for individual patients.
This unpredictability has a simple
cause. The human body is extraordinarily complex, with countless variables
affecting each person differently: genetic variations, biological pathways,
protein-expression patterns, metabolite concentrations, and exercise habits,
to name just a few. And only a few of these variables
are well-understood by scientists. When a drug doesn’t work, then, or a patient
develops a rare disease, it could be because of some genetic variation, or a
particular metabolite concentration, or several of these things acting together
in ways doctors may never understand.
Black-box
medicine—the use of big data and sophisticated machine-learning techniques
in opaque medical applications—could be the answer. It takes significant time, money, and luck
for scientists to discover the precise combination of variables that makes a
drug work or not—if it can be discovered that way at all—but with enough data,
a machine-learning algorithm could find a predictive correlation much more
rapidly. Using datasets of genetic and health information, then, researchers
can uncover previously unknown connections between patient characteristics,
symptoms, and medical conditions. And these connections promise to yield new
diagnostic tests and treatments and to enable individually tailored medical
decisions.
Big-data techniques are only as
powerful as the input data and the methods used to analyze those data. Health
care is especially ripe for a big-data
revolution, though, because of the sheer quantity of data available:
researchers can obtain an endless variety of data points from literally
millions of patients. And because assembling and analyzing such large-scale
datasets is becoming easier
and cheaper by the hour, many different researchers, from both industry and
the academy, are working on ways of using data for everything from guiding
choices between different drugs to best allocating scarce hospital resources
among different patients.
The sheer scale and scope of
health data available to researchers, and the sensitivity of that data, lead to
two related but opposing problems. The first problem is algorithmic
accountability. Because of the black-box nature of big-data techniques and the
sheer complexity of biological systems, it can be difficult or impossible to
know whether the conclusions drawn are incomplete, inaccurate, or biased,
whether because of limitations in the data or the analysis or because of
intentional interference. These
conclusions can sometimes be validated by researchers or government agencies,
but doing so can be expensive and difficult, and can require access to the same
extensive medical data from which the conclusions were drawn.
The second problem is privacy.
Medical information is some of the most private and sensitive information that
exists, and black-box medicine requires access to a lot of that information. It
also creates new information, like predictions based on the models developed
with big data. And this information may be used in ways that harm individuals,
whether through marketing, sales to others, or discrimination in employment,
insurance, or other decisions. Even when it is not used in these ways, its
collection, disclosure, and use can infringe individual autonomy and decisional
privacy.
These two problems are
interrelated because efforts to reduce one will usually make the other worse.
The solution to the accountability problem is to validate black-box models, but
that requires access to more information, which can exacerbate the privacy
problem. And the solution to the privacy problem is to limit the amount of
information to which researchers, companies, and the government have access,
but that can make it harder to validate models and easier to hide or overlook
algorithmic problems. Algorithms need to be validated to ensure high-quality
medicine, but at the same time, a data free-for-all would eviscerate patient
privacy.
Solutions to the accountability
and privacy problems, then, must consider the broader effects on black-box
medicine. We propose three pillars to an effective verification system that
respects patient privacy. The first is a system of limitations on the collection,
use, and dissemination of medical data, so that data gathered to develop and
verify black-box algorithms is not also used for illegitimate
purposes. The second is a system of independent gatekeepers to govern access
to, and transmission of, patient data, so that government and independent
researchers can work to verify big-data models. And the third is robust
information-security provisions, so that unintended outsiders cannot obtain,
use, or disseminate patient data. The design of these verification systems can
draw on the ongoing debate over the disclosure of clinical-trial
data, which has addressed related issues of how to promote data sharing
without sacrificing patient privacy. Such privacy-protecting verification
offers the best means of ensuring that black-box medicine lives up to its
promise.
Roger
Ford and Nicholson Price are
Assistant Professors of Law at the University of
New Hampshire School of Law. They
welcome comments on this ongoing project and can be reached at roger.ford or
nicholson.price, both at law.unh.edu.