Thursday, March 31, 2016

Accountable Algorithms

Guest Blogger

Joshua A. Kroll, Joanna Huey, Solon Barocas, Edward W. Felten, Joel R. Reidenberg, David G. Robinson, and Harlan Yu, Princeton University

For the “Unlocking the Black Box” Conference, April 2 at Yale Law School

Computers make decisions across a wide and growing spectrum of modern life: from consequential decisions such as counting votes, assigning visas, or approving credit to mundane ones such as controlling the internal operation of cars, aircraft, and home appliances. Automated decision making will only grow in importance.

However, as several recent news stories demonstrate, the governance of these systems is lacking, in large part because the approaches used to govern human-mediated decision systems do not translate well to automated ones. Examples from just the past year make the point: televisions that may transmit what is said in your living room to a foreign company; vehicle control computers programmed to detect emissions testing and reconfigure the car to defeat the test standards; and startups using machine learning and nontraditional credit scoring standards to make loans, triggering discrimination concerns. In each case, the pattern is the same: software-mediated decisions do not receive the same level of oversight as human-mediated decisions, nor is it easy to trust that they can be effectively overseen.

One approach to solving these problems is transparency: if only we knew what software was running for a given decision and what data were fed to it, so the argument goes, we could determine whether the decision was made in a way that is socially, politically, or legally acceptable. However, on its own, transparency cannot address many of these concerns.

Technologists think about trust and assurance for computer systems somewhat differently than policymakers do, seeking strong formal guarantees or trustworthy digital evidence that a system works as intended or complies with a rule or policy objective, rather than simple assurances that a piece of software acts in a certain way. This is because software is a discrete object: observations of particular inputs and outputs can be generalized only with knowledge of the program's internal structure. Additionally, a computer system can run software that does anything a computer is capable of doing, and it is very hard to tell which software is actually running in any particular instance.

Disclosure of software source code or the data being fed into it can also run up against legitimate security and privacy concerns. Some systems or their data are even subject to legal restrictions on disclosure, while others might be protected by trade secret doctrine or contractual nondisclosure obligations. Many systems, such as those used for selecting people or packages for enhanced security checking, could not be made transparent without risking some strategic gaming by the subjects of those systems, rendering them less effective. As a result, full transparency is not always desirable or practical.

Even when we do know what software a system is running, there are fundamental technical limitations on how well that software can be analyzed to determine its behavior (these limits are important in practice, as any researcher trying to identify malware knows well). And in at least some cases, even understanding the software is not of much use: when a program interacts with and depends on its environment (as most interesting software does in at least some way), we can only be certain of its behavior if we understand the precise nature of those interactions. For example, the software code implementing a lottery might tell us that the lottery is a fair random selection, but it does not tell us how to repeat any particular lottery or whether the announced results of the lottery were the result of a single fair execution or the result of post-selection across many runs of the software. That is, transparency is not always helpful in determining what a computer system actually did.
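The lottery concern above is precisely the problem that a standard commit-and-reveal scheme addresses. The following is a minimal illustrative sketch (not the protocol from our paper; all names are hypothetical): the operator publishes a hash commitment to a secret random seed before the drawing, and the selection is a deterministic function of that seed, so anyone can later re-run the drawing and rule out post-selection across many runs.

```python
import hashlib
import secrets

def commit(seed: bytes) -> str:
    """Publish this hash before the drawing; it binds the operator to the seed."""
    return hashlib.sha256(seed).hexdigest()

def draw(seed: bytes, entrants: list[str]) -> str:
    """Deterministic selection: anyone holding the revealed seed can re-run it."""
    digest = hashlib.sha256(seed + ",".join(entrants).encode()).digest()
    index = int.from_bytes(digest, "big") % len(entrants)
    return entrants[index]

# The operator picks a secret seed and publishes only its commitment.
seed = secrets.token_bytes(32)
published_commitment = commit(seed)

entrants = ["alice", "bob", "carol"]
winner = draw(seed, entrants)

# After announcing the winner, the operator reveals the seed. Anyone can
# check that the seed matches the earlier commitment and that re-running
# the draw yields the announced winner, ruling out post-selection.
assert commit(seed) == published_commitment
assert draw(seed, entrants) == winner
```

Note that the drawing becomes repeatable without the code of the lottery ever being disclosed: the commitment and the revealed seed are enough.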

Even if the inputs and outputs of a system are made transparent, it may be difficult or impossible to determine facts of interest about that system. For example, many systems, such as those used for credit and hiring decisions, must collect a large amount of sensitive demographic information in the course of their normal operation but are legally barred from using much of that information as a basis for decisions. A credit application may well contain the applicant's gender, or an address from which race or religion can readily be inferred. If only the inputs and outputs are made transparent, we fundamentally cannot know whether these protected classes of information are part of the basis for an individual decision. When applied correctly and to a broad set of inputs, however, this sort of black-box testing can help elucidate certain classes of discrimination issues.
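One form such black-box testing can take is a paired test: query the system twice per applicant, varying only a protected attribute, and flag any difference in outcome. The sketch below (a toy example with hypothetical names, not a real audit tool) illustrates the idea, and also its limits: a clean result is evidence about the probed inputs, not proof about the system.

```python
# A toy decision function standing in for an opaque system.
# In a real audit we could only query it, not read it.
def approve_credit(applicant: dict) -> bool:
    score = applicant["income"] / 1000 + applicant["years_employed"] * 2
    return score >= 50

def paired_test(decide, applicants, protected_key, values):
    """Black-box paired test: run each applicant through the system once per
    value of the protected attribute and record any divergent outcomes."""
    disparities = []
    for applicant in applicants:
        outcomes = {v: decide({**applicant, protected_key: v}) for v in values}
        if len(set(outcomes.values())) > 1:
            disparities.append((applicant, outcomes))
    return disparities

applicants = [
    {"income": 52000, "years_employed": 3, "gender": "F"},
    {"income": 41000, "years_employed": 1, "gender": "M"},
]
# An empty result means no outcome flipped with the protected attribute
# for these probes; it cannot establish what the system does elsewhere.
print(paired_test(approve_credit, applicants, "gender", ["F", "M"]))
```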

Fortunately, techniques from computer science can help build accountable algorithms. Our work proposes an alternate regime in which transparency is supplemented by design changes that make transparency more effective. We propose, for example, that policymakers should think about computer systems in terms of invariants, or facts about a system that are true regardless of input or interaction with the environment, just as computer scientists do. In this way, the oversight process can focus on what it actually needs to learn from a computer system and ensure that it is possible to discover this information. Even when all or portions of a computer system must be kept secret, careful review of what needs to be known about that system can allow accountability.
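To make the notion of an invariant concrete, here is a minimal sketch (all names hypothetical) of a decision rule designed so that a useful invariant holds for every possible input: the decision depends only on permitted fields, because the rule is never shown anything else.

```python
# The invariant to be guaranteed regardless of input: decisions depend
# only on these fields, no matter what else an application contains.
PERMITTED = {"income", "years_employed"}

def restricted_view(applicant: dict) -> dict:
    """Strip everything but permitted fields before the rule ever sees them."""
    return {k: v for k, v in applicant.items() if k in PERMITTED}

def decide(applicant: dict) -> bool:
    visible = restricted_view(applicant)
    return visible["income"] / 1000 + visible["years_employed"] * 2 >= 50

# The invariant holds by construction: two applicants identical on the
# permitted fields receive identical decisions, whatever else they contain.
a = {"income": 60000, "years_employed": 2, "gender": "F", "zip": "08544"}
b = {"income": 60000, "years_employed": 2, "gender": "M", "zip": "10001"}
assert decide(a) == decide(b)
```

An overseer who can verify this structure learns something that no number of input-output observations alone could establish.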

As an example, consider a healthcare system that handles confidential patient data. Although it is very challenging to determine with certainty that the system is secure against hacking, by encrypting the data while it is stored, the system's designer can be more confident that even if the system is compromised, the confidential data will not be leaked. In this case, one need not see the full internals of the system to understand the gain from encryption. It is enough to know when the system encrypts or decrypts data, how it stores the data encryption keys, and when and how those keys are used.

In fact, it is often possible to achieve oversight with only partial transparency, using techniques from computer science to build relationships between what is disclosed and what must be kept secret. These relationships can ensure properties of interest such as fairness or procedural regularity: the property that decisions are made under a pre-announced rule consistently applied in all cases. We describe how to achieve procedural regularity with partial transparency using a combination of cryptographic techniques. We then describe how procedural regularity serves as the basis for investigating other questions of interest, such as whether a computer system is discriminatory or compliant with applicable law.
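The shape of such a scheme can be sketched in a few lines. This is an illustrative simplification, not the construction from our paper: here the "commitment" is a plain hash of a published rule description, whereas real schemes use hiding commitments and zero-knowledge proofs so that the rule itself can remain secret. All names are hypothetical.

```python
import hashlib
import json

# The decision maker publishes a commitment to the rule before any
# decisions are made (here, a hash of the rule's description).
POLICY_SOURCE = "approve if application['score'] >= 700"

def policy(application: dict) -> bool:
    """The pre-announced decision rule (a stand-in for real policy code)."""
    return application["score"] >= 700

policy_commitment = hashlib.sha256(POLICY_SOURCE.encode()).hexdigest()

def decide_with_receipt(application: dict) -> dict:
    """Each decision carries a receipt tying it to the committed rule."""
    record = {
        "application": application,
        "outcome": policy(application),
        "policy_commitment": policy_commitment,
    }
    record["receipt"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    return record

# An overseer who later learns the rule can check that it hashes to the
# published commitment and that re-applying it reproduces each recorded
# outcome: one rule, consistently applied in every case.
record = decide_with_receipt({"score": 720})
assert hashlib.sha256(POLICY_SOURCE.encode()).hexdigest() == record["policy_commitment"]
assert policy(record["application"]) == record["outcome"]
```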

In conclusion, our work on accountable algorithms shows that transparency alone is not enough: we must have transparency of the right information about how a system works. Transparency and the evaluation of computer systems as inscrutable black boxes, against which we can only test the relationship of inputs and outputs, both fail on their own to provide even the most basic procedural safeguards for automated decision making. And without a notion of procedural regularity on which to base analysis, it is fruitless to ask whether a computer system is fair or compliant with norms of law, politics, or social acceptability. Fortunately, the tools of computer science provide the means to build computer systems that are fully accountable. Both transparency and black-box testing play a part, but if we are to have accountable algorithms, we must design for this goal from the ground up.

