Wednesday, October 31, 2018

Big Data: Destroyer of Informed Consent

A. Michael Froomkin
Consent, that is ‘notice and choice,’ is a fundamental concept in the U.S. approach to data privacy, as it reflects principles of individual autonomy, freedom of choice, and rationality. Big Data, however, makes the traditional approach to informed consent incoherent and unsupportable, and indeed calls the entire concept of consent, at least as currently practiced in the U.S., into question.

Big Data kills the possibility of true informed consent because one purpose of big data analytics is, by its very nature, to find unexpected patterns in data. Informed consent requires at the very least that the person requesting the consent know what she is asking the subject to consent to. In principle, we hope that before the subject agrees she too comes to understand the scope of the agreement. But with big data analytics, particularly those based on Machine Learning, neither party to that conversation can know what the data may be used to discover.

Nor, given advances in re-identification, can either party know how likely it is that any given attempt to de-identify personal data will succeed. Informed consent, at least as we used to understand it, is simply not possible if medical data is to become part of Big Data, and ever so much more so if researchers intend to link personal health records with data streams drawn from non-medical sources because what we will learn with the information cannot be predicted. Similar—indeed, maybe worse—problems arise with big data analytics uses outside the context of medical research, especially as informed consent seemed a plausible solution to the problem of routinized or non-existent consent for data acquisition.

Big Data just keeps getting bigger and thus more attractive to researchers in everything from medicine to marketing. The data arise from multiple sources: some are transactional, including both online and offline commerce; some are communications such as texts, email, and video chat; some are collected from personal devices including cell phones, cell phone apps, wearable health monitors, smart watches, so-called smart home technology, and other internet-of-things devices.

Another important source of Big Data is self-surveillance, in which people document their activities (and, importantly, those of others) via Twitter, Facebook, Instagram, and other platforms. Increasingly, also, personal information is collected via the operation of remote sensors, whether security systems such as CCTV (increasingly paired with facial recognition software), license plate readers, or the myriad data collection programs that form so-called Smart City initiatives. On deck are connected cars, implantable devices, and the dreams of the next startup.

Since the key promise of big data is the discovery of unexpected patterns, medical and public health researchers will seek to link patient records with other ‘non-patient’ data streams (indeed the UK government has said it plans to link social media to NHS health records to do predictive care). And why not? Once one has consent for using the patient data, the temptation to link it when possible to other data, usually acquired without the need for consent, or with mere ‘ordinary’ rather than informed consent, is sure to be irresistible given the potentially enormous benefits to public health.

Medical treatment and human subjects research are unusual in the U.S. in that for them U.S. law imposes a substantially heightened consent requirement. U.S. law requires physicians in all but the most critical emergency situations to get “informed consent” from patients before they undergo medical treatment. Similarly, the so-called “Common Rule,”[1] the federal regulation that since 1991 has governed federally funded or sponsored human subjects research, requires researchers to obtain informed consent from research subjects. Impending revisions to the Common Rule reflect a modern understanding of informed consent: Information provided to the subject (or her representative) must “begin with a concise and focused presentation of the key information that is most likely to assist a prospective subject or legally authorized representative in understanding the reasons why one might or might not want to participate in the research.” Information provided in an informed consent form “must be presented in sufficient detail relating to the research, and must be organized and presented in a way that does not merely provide lists of isolated facts, but rather facilitates the prospective subject's or legally authorized representative's understanding of the reasons why one might or might not want to participate.”

Unfortunately, the Revised Common Rule addresses the tension between informed consent and Big Data by, in effect, creating a work-around to informed consent. The Revised Common Rule will permit researchers to get “broad consent” (“prospective consent to unspecified future research”) instead of requiring informed consent, or even ordinary consent, on a case-by-case basis. Disclosure to the subject must provide a “general description” of the “types of research” that may be conducted with the private information or biospecimens, as well as “the types of institutions or researchers that might conduct research with” them. The disclosures must suffice to put a reasonable person on notice that they might not have consented to some of the specific research studies had they known what they were, which is a long way from true informed consent. Researchers armed with sufficiently broad consent for the storage, maintenance, and use of identifiable biospecimens and data may make any secondary research uses of the individual’s identifiable biospecimens and data without the need to secure any additional consent, and may do so indefinitely if that is the permission they asked for and received.

While getting virtual carte blanche from patients and research subjects may solve the formal legal problem from the researcher's point of view, it makes a mockery of informed consent and should raise some ethical qualms for human subjects research, especially since informed consent has been the cornerstone of conducting ethical research involving humans. Furthermore, given the unforeseeable leaps in technology (who foresaw Big Data 20 years ago?) there are good reasons to think consent should sunset rather than be made eternal.

Maybe we will develop techniques that at least protect individual privacy from exposure via Big Data research, even if they do not necessarily address any moral objections that persons might have to their information being used for a particular type of research. Meanwhile, informed consent is all we have, or had.

Even before the growth of Big Data, there were good reasons to criticize the regimes in which we required just ordinary, not-especially-informed consent for most waivers of privacy rights, and did not require even that much to permit the capture of information streams ‘in public.’ Sociology-based critiques suggested that most people ignore most notices most of the time. Cognitive critiques suggested that even if people do look at disclosures, they likely do not understand them, not to mention that as notices and requests for consent proliferate, people become desensitized to them and tune them out. Indeed, it seems likely that even if people suddenly had perfect information about the devices that watch them they would not be able to use that information well. The simplest form of the bounded rationality claim has to do with the amount of time it would take to process all the information and make rational decisions. More far-reaching forms of the claim invoke various cognitive limits constraining our ability to weigh risks and uncertainties, and our tendency to over-optimism. Meanwhile, I and others have argued that failing to require even basic consent before allowing unfettered capture of personally identifiable information in ‘public’ amounts to a form of privacy pollution. Requiring more and better consent, that is, informed consent, seemed one solution to these problems, even if informed consent was far from perfect.

Informed consent is not perfect, and indeed is subject to some of the same critiques as ordinary, oft uninformed, consent. Nevertheless, the Common Rule has been the closest thing to a gold standard of consent, a set of requirements that usually put private industry to shame. By undermining (indeed erasing) the possibility of genuine informed consent, Big Data poses a challenge to informed consent, and indeed to all consent-based regimes designed to protect privacy. Even the logic of the EU’s GDPR, under which data processors must get consent for every new use of PII, is threatened if the knowledge needed to give informed consent does not exist at the time the consent is requested. The Revised Common Rule surrenders to this development by creating the new category of ‘broad consent,’ which will be as broad as the drafters of a consent form can make it, potentially eternal, and which is certainly not going to be informed as we previously understood the term. The justification for the change is that the public health benefits could be enormous. That might be true, but one needs also to consider the side-effects of undermining the gold standard of consent. Not only does the shift from true informed consent create an ethical challenge for medical researchers, but it likely will lower the consent ceiling for all private research whether based on home genetic testing, remote sensing in public places, or Facebook likes.

If real consent to Big-Data-based research is not possible, the only way for individuals to preserve control over their data and biospecimens, and over the uses that others make of them, will be to choose not to share them in the first place. The alternatives are bleak: either people reduce their willingness to share for science (and, less bleakly, in the non-health data realm for marketing), or we must learn to view consent in the Big Data era as what it truly has become: the gift that keeps on giving, for which the donor-subject receives neither a profit share nor even a tax deduction.

A. Michael Froomkin is Laurie Silvers & Mitchell Rubenstein Distinguished Professor of Law, University of Miami; Member, University of Miami Center for Computational Science; and Affiliated Fellow, Yale Information Society Project.



[1] Codified (as amended in 2005) at 45 C.F.R. §§ 46.101-.124 (2018).