Saturday, November 03, 2018

The Devil is in the Data

Guest Blogger

Oliver J. Kim

At various points in our nation’s health history, a new technological advance has been hyped as the silver bullet for our healthcare system. Of course, it is an axiom of law and public policy that technology advances far faster than the law can keep up, which is why we are coming together for this conference. Without legal, policy, and ethical guidelines to balance innovation, these breakthroughs may lead to unforeseen or even negative consequences for our society, even as we try to make healthcare more affordable and accessible.

One area that I focus on is how technology can be leveraged to reduce health disparities. Concerns about disparities often focus on the relationship between innovation and costs: if these disruptive technologies are available only to those who can best afford them, they will continue to widen the healthcare and digital divides in our society.

But there is another area of concern: who is actually in the data? The simplest way to illustrate this concern came from Jerry Smith, the Vice President of Data Sciences and Artificial Intelligence at Cognizant, at a Politico forum on AI. Type “grandpa” into Google’s image search and see what pictures come up. The vast majority of images are of old white men, and when I did my search for this blog, I scrolled through seven rows before I spotted an African American and down to the twentieth before I saw a second. Perhaps because it is close to Halloween, I spotted a zombie grandpa and a SpongeBob grandpa before seeing an image even remotely depicting someone of my paternal grandpa’s ethnicity.

There is a Catch-22 about equity in the use of big data. Among many communities of color, often those most hurt by health disparities and most in need of greater healthcare access, there is a historic mistrust of the healthcare system. Many individuals may fear giving up data because they are uncertain who will have access to it and how it may be used against them in unforeseen ways. But without this data, we are building systems that may not reflect our society as a whole.

We know of numerous examples of medical experiments on low-income black communities. These events still have far-reaching effects: as Harriet Washington wrote in Medical Apartheid, “Mainstream medical scientists, journals, and even some news media fail to evaluate these fears in the light of historical and scientific fact and tend instead to dismiss all such doubts and fears as antiscience.” These concerns resonate even today in various aspects of care: in a community study of Washtenaw County, Michigan, African-American participants in a focus group revealed that they were reluctant to share information about their end-of-life wishes because they feared it could be used against them to ration their care. Current political trends may also make patients, particularly those seeking care that is either stigmatized or at odds with federal policy, fearful of sharing data or even accessing care.

But the datasets that inform our technologies may be biased towards a whiter, more affluent construct of American society and fail to pick up on the nuances needed to create a richer, more accurate picture of society as a whole. For example, the term “Asian American” refers to a wide array of very different ethnicities with varied cultures, languages, socioeconomic statuses, and immigrant experiences. Being able to parse out this diversity has huge implications, particularly in health policy, for the Asian American-Pacific Islander (AAPI) community. One often-cited example is that the incidence of colorectal cancer appears to be similar between whites and Asian Americans as a whole, but when data on Asian Americans was disaggregated, researchers found that certain Asian ethnicities have lower screening rates. In other words, if AAPIs are viewed as a whole, it would be difficult to notice that difference, but if the data is sliced further, it is possible to see significant variation. Data disaggregation is a huge issue for AAPI organizations such as the Asian American & Pacific Islander Health Forum, of which I am a board member.

Some of technology’s limits are due to the biases of its human creators. Often in designing a policy or a product, we may fail to meet people where they are. For example, the means that patients use to access patient portals, or to get online in general, can present a barrier that keeps some communities from fully accessing their data. For many African American and Latino patients, a smartphone, not a desktop computer or a tablet, is the most common device for going online. However, such devices may not be suitable for accessing health records: “Although it is possible for patients with smartphones to access any available computer-based PHR using their mobile devices, websites that are not optimized for mobile use can be exceedingly difficult to navigate using the relatively small-sized smartphone screens.” Moreover, federal Medicare and Medicaid incentives for the meaningful use of electronic medical records “do not require that PHRs be easily accessible via mobile devices.”

If our data is “bedeviled” because it is incomplete, yet the potential sources of the missing data (many of them individuals who have strong feelings about the healthcare system and value their privacy) are reluctant to share, how do we exorcise this devil in the data? Indeed, tools such as artificial intelligence and machine learning threaten to exacerbate health disparities and mistrust in the healthcare system if they are built on a data infrastructure that does not truly look like American society.

What can the law do to address these issues? In a forthcoming paper for the conference, I’ll discuss tools that policymakers could use to help diversify health data by encouraging an environment of trust, security, and accountability between patients and the research community. Policymakers can regulate, and even prohibit, behavior that runs counter to their policy goals. For example, a series of federal laws (including Section 185 of the Medicare Improvements for Patients and Providers Act, Section 3002 of the Health Information Technology for Economic and Clinical Health Act, and Section 4302 of the Affordable Care Act) were supposed to encourage more rigorous reporting requirements for Medicare, Medicaid, and the Children’s Health Insurance Program as well as federally certified EMRs. Such richer data sets would “represent a powerful new set of tools to move us closer to our vision of a nation free of disparities in health and health care.” However, such requirements are only useful if they are actually used and enforced.

We have high hopes for using data to improve care: “For example, epigenetic data, if validated in large-scale data, could be used to address health disparities and environmental justice.” That “if,” though, is crucial, and many demons need to be exorcised from the data before the hype over such data and its related uses meets our actual reality. As Dr. Smith noted, “All the data we get from our lives by its nature has biases built into it.” Bias doesn’t necessarily mean animus, but it does mean we need to think through the data, including how it was collected and whom it represents, before accepting it at face value.

Oliver J. Kim is Adjunct Professor of Law at the University of Pittsburgh, and Principal, Mousetrap Consulting. You can reach him by e-mail at oliver at
