Consent, that is ‘notice and
choice,’ is a fundamental concept in the U.S. approach to data privacy, as it
reflects principles of individual autonomy, freedom of choice, and rationality.
Big Data, however, makes the traditional approach to informed consent
incoherent and unsupportable, and indeed calls the entire concept of consent,
at least as currently practiced in the U.S., into question.
Big Data kills the possibility of true informed consent because a central purpose of big data analytics is, by its very nature, to find unexpected patterns in data. Informed
consent requires at the very least that the person requesting the consent know
what she is asking the subject to consent to. In principle, we hope that before
the subject agrees she too comes to understand the scope of the agreement. But
with big data analytics, particularly those based on machine learning, neither party to that conversation can know
what the data may be used to discover.
Nor, given advances in
re-identification, can either party know how likely it is that any given
attempt to de-identify personal data will succeed. Informed consent, at least as we used to understand it, is simply not possible if medical data is to become part of Big Data, and it is still less possible if researchers intend to link personal health records with data streams drawn from non-medical sources, because what will be learned from the information cannot be predicted.
Similar, and perhaps worse, problems arise when big data analytics are used outside the context of medical research, especially since informed consent once seemed a plausible solution to the problem of routinized or non-existent consent for data acquisition.
Big Data just keeps getting bigger
and thus more attractive to researchers in everything from medicine to
marketing. The data arise from multiple sources: some are transactional,
including both online and offline commerce; some are communications such as
texts, email, and video chat; and some are collected from personal devices including cell phones, cell phone apps, wearable health monitors, smart watches, so-called smart home technology, and other internet-of-things devices.
Another important source of Big
Data is self-surveillance, in which people document their activities—and
importantly, those of others—via Twitter, Facebook, Instagram, and other
platforms. Increasingly, personal information is also collected via the operation of remote sensors, whether security systems such as CCTV
(increasingly paired with facial recognition software), license plate readers,
or the myriad data collection programs that form so-called Smart City
initiatives. On deck are connected cars, implantable devices, and the dreams of
the next startup.
Since the key promise of big data
is the discovery of unexpected patterns, medical and public health researchers
will seek to link patient records with other ‘non-patient’ data streams (indeed, the UK government has said it plans to link social media to NHS health records
to provide predictive care). And why not? Once one has consent to use the patient data, the temptation to link it, where possible, to other data (usually acquired without the need for consent, or with mere ‘ordinary’ rather than informed consent) is sure to be irresistible given the potentially enormous benefits to public health.
Medical treatment and human subjects research are unusual in the U.S. in that the law imposes on them a substantially heightened consent requirement. U.S. law requires physicians in all but the most critical emergency situations to get “informed consent” from patients before they undergo medical treatment. Similarly, the so-called “Common Rule,”[1] the federal regulation that since 1991 has governed federally funded or sponsored human subjects research, requires that researchers obtain informed consent from their subjects. Impending revisions
to the Common Rule reflect a modern understanding of informed consent:
Information provided to the subject (or her representative) must “begin with a
concise and focused presentation of the key information that is most likely to
assist a prospective subject or legally authorized representative in
understanding the reasons why one might or might not want to participate in the
research.” Information provided in an informed consent form “must be presented
in sufficient detail relating to the research, and must be organized and
presented in a way that does not merely provide lists of isolated facts, but
rather facilitates the prospective subject's or legally authorized
representative's understanding of the reasons why one might or might not want
to participate.”
Unfortunately, the Revised Common
Rule addresses the tension between informed consent and Big Data by, in effect,
creating a work-around to informed consent. The Revised Common Rule will permit
researchers to get “broad consent” (“prospective consent to unspecified future research”) instead of requiring informed consent, or even ordinary consent, on a
case-by-case basis. Disclosure to the subject must provide a “general
description” of the “types of research” that may be conducted with the private
information or biospecimens, as well as “the types of institutions or
researchers that might conduct research with” them. The disclosures must
suffice to put a reasonable person on notice that they might not have consented
to some of the specific research studies had they known what they were—a long
way from true informed consent.
Researchers armed with sufficiently broad consent for the storage,
maintenance, and use of identifiable biospecimens and data may make any secondary research uses of them
without the need to secure any additional consent—indefinitely, if that is the
permission they asked for and received.
While getting virtual carte blanche from patients and research
subjects may solve the formal legal problem from the researcher's point of
view, it makes a mockery of informed consent and should raise some ethical
qualms for human subjects research, especially since informed consent has been
the cornerstone of conducting ethical research involving humans. Furthermore,
given the unforeseeable leaps in technology (who foresaw Big Data 20 years ago?), there are good reasons to think consent should sunset rather than be made
eternal.
Maybe we will develop techniques that at least protect individual privacy from exposure via Big Data research, even if they do not necessarily address any moral objections that people might have to their information being used for a particular type of research. Meanwhile, informed
consent is all we have—or had.
Even before the growth of Big Data,
there were good reasons to criticize the regimes in which we required just ordinary, not-especially-informed consent for most waivers of privacy rights, and did not require even that much to permit the capture of information
streams ‘in public.’ Sociology-based
critiques suggested that most people ignore most notices most of the time. Cognitive critiques suggested that even if people do look at disclosures,
they likely do not understand them, not to mention that as
notices and requests for consent proliferate, people become desensitized to
them and tune them out. Indeed,
it seems likely that even if people suddenly had perfect information about the
devices that watch them, they would not be able to use that information well.
The simplest form of the bounded rationality claim has to do with the amount of
time it would take to process all the information and make rational decisions.
More far-reaching forms of the claim invoke various cognitive limits
constraining our ability to weigh risks and uncertainties, and our tendency toward over-optimism. Meanwhile, I and others have argued that failing to require even basic
consent before allowing unfettered capture of personally identifiable
information in ‘public’ amounts to a form of privacy pollution. Requiring
more and better consent—informed consent—seemed one solution to these problems,
even if informed consent was far from perfect.
Informed
consent is not perfect, and indeed is subject to some of the same critiques as
ordinary, oft uninformed, consent. Nevertheless, the Common Rule has been the
closest thing to a gold standard of consent, a set of requirements that usually
put private industry to shame. By undermining, indeed erasing, the possibility of genuine informed consent, Big Data poses a challenge not only to informed consent itself but to all consent-based regimes designed to protect privacy. Even the logic
of the EU’s GDPR, under which data processors must get consent for every new
use of PII, is threatened if the knowledge needed to give informed consent does
not exist at the time the consent is requested. The Revised Common Rule
surrenders to this development by creating the new category of ‘broad consent,’
which will be as broad as the drafters of a consent form can make it,
potentially eternal, and which is certainly not going to be ‘informed’ as we previously
understood the term. The justification for the change is that the public health
benefits could be enormous. That might be true, but one needs also to consider
the side-effects of undermining the gold standard of consent. Not only does the
shift from true informed consent create an ethical challenge for medical
researchers, but it likely will lower the consent ceiling for all private
research, whether based on home genetic testing, remote sensing in public
places, or Facebook likes.
If real
consent to Big-Data-based research is not possible, the only way for individuals to preserve control over their data and biospecimens, and over the uses that others make of them, will be to choose not to share them in the first place. The alternatives are bleak: either people
reduce their willingness to share for science (and, less bleakly, in the
non-health data realm for marketing), or we must learn to view consent in the
Big Data era as what it truly has become: the gift that keeps on giving, for
which the donor-subject receives neither a profit share nor even a tax
deduction.
A. Michael Froomkin is Laurie Silvers & Mitchell Rubenstein Distinguished Professor of Law, University of Miami; Member, University of Miami Center for Computational Science; and Affiliated Fellow, Yale Information Society Project.