16 September 2013

Consumer Privacy Reviews and Anonymisation

Ethics review committees for consumer research? 'Consumer Subject Review Boards: A Thought Experiment' by Ryan Calo in (2013) 66 Stanford Law Review Online comments [PDF] that
The adequacy of consumer privacy law in America is a constant topic of debate. The majority position is that United States privacy law is a “patch-work,” that the dominant model of notice and choice has broken down, and that decades of self-regulation have left the fox in charge of the henhouse. 
A minority position chronicles the sometimes surprising efficacy of our current legal infrastructure. Peter Swire describes how a much-maligned disclosure law improved financial privacy not by informing consumers, but by forcing firms to take stock of their data practices. Deirdre Mulligan and Kenneth Bamberger argue, in part, that the emergence of the privacy professional has translated into better privacy on the ground than what you see on the books. 
There is merit to each view. But the challenges posed by big data to consumer protection feel different. They seem to gesture beyond privacy’s foundations or buzzwords, beyond “fair information practice principles” or “privacy by design.” The challenges of big data may take us outside of privacy altogether into a more basic discussion of the ethics of information. The good news is that the scientific community has been heading down this road for thirty years. I explore a version of their approach here. 
Part I discusses why corporations study consumers so closely, and what harm may come of the resulting asymmetry of information and control. Part II explores how established ethical principles governing biomedical and behavioral science might interact with consumer privacy.
Calo goes on to comment that
People have experimented on one another for hundreds of years. America and Europe of the twentieth century saw some particularly horrible abuses. In the 1970s, the U.S. Department of Health, Education, and Welfare commissioned twelve individuals, including two law professors, to study the ethics of biomedical and behavioral science and issue detailed recommendations. The resulting Belmont Report — so named after an intensive workshop at the Smithsonian Institute’s Belmont Conference Center — is a statement of principles that aims to assist researchers in resolving ethical problems around human-subject research.
The Report emphasizes informed consent — already a mainstay of consumer privacy law. In recognition of the power dynamic between experimenter and subject, however, the Report highlights additional principles of “beneficence” and “justice.” Beneficence refers to minimizing harm to the subject and society while maximizing benefit — a kind of ethical Learned Hand Formula. Justice prohibits unfairness in distribution, defined as the undue imposition of a burden or withholding of a benefit. The Department of Health, Education, and Welfare published the Belmont Report verbatim in the Federal Register and expressly adopted its principles as a statement of Department policy.
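As a gloss on the "ethical Learned Hand Formula" line above (my illustration rather than Calo's): Judge Hand's negligence test from United States v. Carroll Towing weighs the burden of a precaution against the expected loss it would avert, and beneficence can be read as the research-ethics analogue of that balance.

```latex
% Learned Hand Formula: a defendant is negligent when the burden of taking a
% precaution, B, is less than the probability of harm, P, times the magnitude
% of the resulting loss, L:
B \;<\; P \cdot L
% Beneficence runs a similar comparison ex ante for research: expected harm to
% subjects and society is to be minimized and weighed against expected benefit.
```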
Today, any academic researcher who would conduct experiments involving people is obligated to comply with robust ethical principles and guidelines for the protection of human subjects, even if the purpose of the experiment is to benefit those people or society. The researcher must justify her study in advance to an institutional, human subject review board (IRB) comprised of peers and structured according to specific federal regulations. But a private company that would conduct experiments involving thousands of consumers using the same basic techniques, facilities, and personnel faces no such obligations, even where the purpose is to profit at the expense of the research subject.
Subjecting companies to the strictures of the Belmont Report and academic institutional review would not be appropriate. Firms must operate at speed and scale, protect trade secrets, and satisfy investors. Their motivations, cultures, and responsibilities differ from one another, let alone universities. And that is setting aside the many criticisms of IRBs in their original context as plodding or skewed. Still, companies interested in staying clear of scandal, lawsuit, and regulatory action could stand to take a page from biomedical and behavioral science.
The thought experiment is simple enough: the Federal Trade Commission, Department of Commerce, or industry itself commissions an interdisciplinary report on the ethics of consumer research. The report is thoroughly vetted by key stakeholders at an intensive conference in neutral territory (say, the University of Washington). As with the Belmont Report, the emphasis is on the big picture, not any particular practice, effort, or technology. The articulation of principles is incorporated in its entirety in the Federal Register or an equivalent. In addition, each company that conducts consumer research at scale creates a small internal committee comprised of employees with diverse training (law, engineering) and operated according to predetermined rules. Initiatives clearly intended to benefit consumers could be fast-tracked, whereas, say, an investigation of how long moviegoers will sit through commercials before demanding a refund would be flagged for further review.
The result would not be IRBs applying the Belmont Report. I suspect Consumer Subject Review Boards (CSRBs) would be radically different. I am not naïve enough to doubt that any such effort would be rife with opportunities to pervert and game the system. But the very process of systematically thinking through ethical consumer research and practice, coupled with a set of principles and bylaws that help guide evaluation, should enhance the salutary dynamics proposed by Mulligan, Bamberger, Swire, and others.
Industry could see as great a benefit as consumers. First, a CSRB could help unearth and head off media fiascos before they materialize. No company wants to be the subject of an article in a leading newspaper with the title “How Companies Learn Your Secrets.” Formalizing the review of new initiatives involving consumer data could help policy managers address risk. Second, CSRBs could increase regulatory certainty, perhaps forming the basis for an FTC safe harbor if sufficiently robust and transparent. Third, and most importantly, CSRBs could add a measure of legitimacy to the study of consumers for profit. Any consumer that is paying attention should feel like a guinea pig, running blindly through the maze of the market. And guinea pigs benefit from guidelines for ethical conduct.
I offer CSRBs as a thought experiment, not a panacea. The accelerating asymmetries between firms and consumers must be domesticated, and the tools we have today feel ill suited. We need to look at alternatives. No stone, particularly one as old and solid as research ethics, should go unturned.
'Privacy and Data-Based Research' by Ori Heffetz and Katrina Ligett offers a perspective on big data and deanonymisation, asking 
What can we, as users of microdata, formally guarantee to the individuals (or firms) in our dataset, regarding their privacy? We retell a few stories, well-known in data-privacy circles, of failed anonymization attempts in publicly released datasets. We then provide a mostly informal introduction to several ideas from the literature on differential privacy, an active literature in computer science that studies formal approaches to preserving the privacy of individuals in statistical databases. We apply some of its insights to situations routinely faced by applied economists, emphasizing big-data contexts.
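For readers who have not met the formal object the abstract gestures at, the standard definition from the differential privacy literature (due to Dwork and co-authors; stated here for orientation rather than quoted from the paper) is:

```latex
% A randomized mechanism M is \varepsilon-differentially private if, for every
% pair of datasets D and D' that differ in one individual's record, and for
% every set S of possible outputs,
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\, \Pr[\,M(D') \in S\,]
% Small \varepsilon means the published output barely depends on any single
% person's data; that is the formal guarantee a researcher could offer subjects.
```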
The authors conclude
Privacy concerns in the face of unprecedented access to big data are nothing new. More than thirty-five years ago, Dalenius (1977) already discusses "the proliferation of computerized information system[s]" and "the present era of public concern about 'invasion of privacy'." But as big data get bigger, so do the concerns. Greely (2007) discusses genomic databases, concluding that
[t]he size, the cost, the breadth, the desired broad researcher access, and the likely high public profile of genomic databases will make these issues especially important to them. Dealing with these issues will be both intellectually and politically difficult, time-consuming, inconvenient, and possibly expensive. But it is not a solution to say that "anonymity" means only "not terribly easy to identify," . . . or that "informed consent" is satisfied by largely ignorant blanket permission.
Replacing "genomic databases" with "big data" in general, our overall conclusion may be similar. The stories in the first part of this paper demonstrate that relying on intuition when attempting to protect subject privacy may not be enough. Moreover, privacy failures may occur even when the raw data are never publicly released and only some seemingly innocuous function of the data, such as a statistic, is published. The purpose of these stories is to increase awareness.
The ideas from the differential privacy literature we introduce in the second part of this paper provide one formal way for thinking about the notion of privacy that researchers may want to guarantee to subjects. They also provide a framework, or a tool, for thinking quantitatively about privacy-accuracy tradeoffs. We would like to see more such thinking among data-based researchers. In particular, with computer scientists using phrases such as "the amount of privacy loss" and "the privacy budget," the time seems ripe for more economists to join the conversation. Is a certain lifetime amount of ε a basic right? Is privacy a term in the utility function that can in principle be compared against the utility from access to accurate data? Should fungible, transferable ε be allowed to be sold in markets from private individuals to potential data users, and if so, what would its price be? Should a certain privacy budget be allocated across interested users of publicly owned (e.g., Census) data, and if so, how? Such questions are beginning to receive attention, as mentioned above. Increased attention may eventually bring change to common practices.
What kind of changes could one envision? In the third part of our paper we discuss specific applications of differential privacy to concrete situations, highlighting some limitations. When big data means large n, an increasing number of common computations can be achieved in a differentially private manner, with little cost to precision. It is not inconceivable that within a few years, many of the computations that have been, and those that are yet to be, proven achievable in theory will be applied in practice. Echoing Dwork and Smith (2010), who "would like to see a library of differentially private versions of the algorithms in R and SAS," we would be happy to have a differentially private option in estimation commands in Stata. But ready-to-use, commercial-grade applications will not be developed without sufficient demand from potential users. We hope that the incorporation of privacy considerations into the vocabulary of empirical researchers will help raise demand, and stimulate further discussion and research, including, we hope, regarding additional approaches to privacy.
Until such applications are available, it might be wise to pause and reconsider researchers' promises and, more generally, obligations to subjects. When researchers (and IRBs!) are confident that the data pose only negligible privacy risks (e.g., some innocuous small surveys and lab experiments), it may be preferable to replace promises of anonymity with promises for "not terribly easy" identification or, indeed, with no promises at all. In particular, researchers could explicitly inform consenting subjects that a determined attacker may be able to identify them in posted data, or even learn things about them merely by looking at the empirical results of a research paper. We caution against taking the naive alternate route of simply refraining from making harmless data publicly available; freedom of information, access to data, transparency, and scientific replication are all dear to us. Of course, the tradeoffs, and in particular the question of what privacy risks are negligible and what data are harmless, should be carefully considered and discussed; a useful question to ask ourselves may resemble a version of the old newspaper test: would our subjects mind if their data were identified and published in the New York Times?
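The "differentially private option in estimation commands" the authors hope for is easiest to picture with the Laplace mechanism, the workhorse of the literature they draw on. The sketch below is my own illustration, not code from the paper: a differentially private mean for values assumed to lie in a known bounded range, plus a toy split of a fixed privacy budget across two queries (basic composition). The bounds, the synthetic data, and the function name are all assumptions made for the example.

```python
import numpy as np

def dp_mean(x, lower, upper, epsilon, rng):
    """Epsilon-differentially private mean via the Laplace mechanism.

    Values are clipped to [lower, upper], so changing one record can move the
    mean by at most (upper - lower) / n; adding Laplace noise with scale
    sensitivity / epsilon then yields epsilon-differential privacy.
    """
    x = np.clip(np.asarray(x, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(x)
    return x.mean() + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=10.5, sigma=0.5, size=50_000)  # synthetic "microdata"

# Privacy-accuracy tradeoff: smaller epsilon buys stronger privacy at the cost
# of a noisier estimate; with large n the cost to precision stays small.
for eps in (0.01, 0.1, 1.0):
    print(f"epsilon={eps:<5} dp mean = {dp_mean(incomes, 0, 200_000, eps, rng):,.0f}")
print(f"clipped sample mean     = {np.clip(incomes, 0, 200_000).mean():,.0f}")

# Basic composition: answering two queries with epsilon/2 each spends a total
# privacy budget of epsilon, which is the sense in which a "lifetime amount"
# of epsilon could be allocated across queries or users.
total_eps = 0.5
q1 = dp_mean(incomes, 0, 200_000, total_eps / 2, rng)
q2 = dp_mean(incomes, 0, 200_000, total_eps / 2, rng)
```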
'The Illusory Privacy Problem in Sorrell v. IMS Health' [PDF] from 2011 by Jane Yakowitz and Daniel Barth-Jones comments -
Those in the habit of looking for privacy invasions can find them everywhere. This phenomenon is on display in the recent news coverage of Sorrell v. IMS Health Inc., a case currently under review by the Supreme Court. The litigation challenges a Vermont law that would limit the dissemination and use of prescription drug data for the purposes of marketing to physicians by pharmaceutical companies. The prescription data at issue identify the prescribing physician and pharmacy, but provide only limited detail about the patients (for example, the patient’s age in years and gender). Nevertheless, privacy organizations like the Electronic Frontier Foundation (EFF) and the Electronic Privacy Information Center (EPIC) have filed amici curiae briefs sounding distress alarms for patient privacy. A recent New York Times article describes the case as one that puts the privacy interests of “little people” against the formidable powers of “Big Data.” The fear is that, in the information age, data subjects could be re-identified using the vast amount of auxiliary information available about each of us in commercial databases and on the internet. 
Such fears have already motivated the Federal Trade Commission to abandon the distinction between personally identifiable and anonymized data in their Privacy By Design framework. If the Department of Health and Human Services (HHS) were to follow suit, the result would be nothing short of a disaster for the public, since de-identified health data are the workhorse driving numerous health care systems improvements and medical research activities. 
Luckily, we do not actually face a grim choice between privacy and public health. This short article describes the small but growing literature on de-anonymization—the ability to re-identify a subject in anonymized research data. When viewed rigorously, the evidence that our medical secrets are at risk of discovery and abuse is scant.