Balkinization: Industrial Policy for Big Data

Guest Blogger

Frank Pasquale

For the conference on Innovation Law Beyond IP at Yale Law School

The bigger the data set, the more correlations one can observe (and exploit). For example, if you’re childless, shop for clothing online, spend a lot on cable TV, and drive a minivan, data brokers are probably going to assume you’re heavier than average. We know that drug companies may use that data to recruit research subjects. Marketers could use the data to target ads for diet aids, or for types of food that research reveals to be particularly favored by people who are childless, shop for clothing online, spend a lot on cable TV, and drive a minivan.

We may also reasonably assume that the data can be put to darker purposes: for example, to offer credit on worse terms to the obese (stereotype-driven assessment of looks and abilities reigns from Silicon Valley to experimental labs). And perhaps some day it will be put to higher purposes: for example, identifying "obesity clusters" that might be linked to overexposure to some contaminant.

To summarize: let's roughly rank these biosurveillance goals as:

1) Curing illness or precursors to illness (identifying the obesity cluster; clinical trial recruitment)

2) Helping match those offering products to those wanting them (food marketing)

3) Promoting the classification and de facto punishment of certain groups (identifying a certain class as worse credit risks)

At present, law does not do enough to recognize how valuable goals like 1) are, and how destructive 3) could become. In fact, to the extent 1 is highly regulated, and 3 is unregulated, law may perversely help channel capital into discriminatory ventures and away from socially productive ones.

"So deregulate all of it!", a well-funded lobby might reply. But we need to update anti-discrimination law and policy, not simply give up on it in the face of big-data driven construction of new minorities. Reputation intermediaries outside the health sector are now using data not covered by HIPAA to impute health conditions to individuals. As the former CIO of Google (& CEO of ZestFinance) puts it, "[A]ll data is credit data, we just don’t know how to use it yet." A lawyer might respond: "all data is health data," too, and should be subject to HIPAA and HITECH strictures.

We need to distinguish between innovation and discrimination. If a firm like ZestFinance finds out that the obese (or people with minivans) are worse credit risks, and imposes a higher interest rate on them, I question whether that is "innovation" as valuable as, say, finding better ways of curing a disease, growing food, or cooking a meal. It may, instead, merely be a way for industry to arrogate to itself a quasi-juridical role of punishing one group and forcing them to generate more rents for the finance sector.

A recent review of Julia Angwin's excellent book "Dragnet Nation" (a muckraking take on privacy) concluded that its "lack of a more radical critique of digital capitalism may say more about the scope of the problem than our paucity of solutions." But some academics and activists are addressing fundamental issues. Our innovation (and privacy) law must recognize that a cancer cure is of greater value than a tool that helps companies avoid hiring people who are likely to have cancer. The ever-insightful Tarleton Gillespie offers one way of doing so:

The third party data broker who buys data from an e-commerce site I frequent, or scrapes my publicly available hospital discharge record, or grabs up the pings my phone emits as I walk through town [is] building commercial value on my data, but offer[s] no value to me, my community, or society in exchange. So what I propose is a “pay it back tax” on data brokers. . . .

If a company collects, aggregates, or scrapes data on people, and does so not as part of a service back to those people . . . then they must grant access to their data and 10% of their revenue to non-profit, socially progressive uses of that data. This could mean they could partner with a non-profit, provide them funds and access to data, to conduct research. Or, they could make the data and dollars available as a research fund that non-profits and researchers could apply for. Or, as a nuclear option, they could avoid the financial requirement by providing an open API to their data. . . . I think there could be valuable partnerships: Turnstyle’s data might be particularly useful for community organizations concerned about neighborhood flow or access for the disabled; health data could be used by researchers or activists concerned with discrimination in health insurance. There would need to be parameters for how that data was used and protected by the non-profits who received it, and perhaps an open access requirement for any published research or reports.

Gillespie's proposal addresses a core problem of our increasingly big-data-driven (and intermediary-driven) economy: law's agnosticism as to the ultimate productive value of what innovators are doing. Yiren Lu recently asked, "Why do . . . smart, quantitatively trained engineers, who could help cure cancer or fix healthcare.gov, want to work for a sexting app?" The answer is pretty obvious: the money. If we develop an elaborate set of laws that channels billions of dollars to at best reallocative (and at worst, flat out discriminatory) endeavors, we shouldn't be surprised when tech talent flocks to them.

If we want entrepreneurs to use big data for higher ends, we have to change the incentives. The question is not: "should the US have an industrial policy for big data?"--we already have a highly dysfunctional one. We should, instead, focus on improving returns for those who contribute to real gains in productivity.

Frank Pasquale is a professor of law at The University of Maryland. He can be reached at pasquale.frank at gmail.com.

Posted 3:49 PM by Guest Blogger