There was a great passage about “the creep factor” in Alexis Madrigal’s recent interview with Gibu Thomas, who runs innovation at Walmart.
This notion of “the creep factor” seems fairly central as we think about the future of privacy regulation. When companies use our data for our benefit, we know it and we are grateful for it. We happily give up our location data to Google so they can give us directions, or to Yelp or Foursquare so they can help us find the best place to eat nearby. We don’t even mind when they keep that data if it helps them make better recommendations in the future. Sure, Google, I’d love it if you could do a better job predicting how long it will take me to get to work at rush hour! And yes, I don’t mind that you are using my search and browsing habits to give me better search results. In fact, I’d complain if someone took away that data and I suddenly found that my search results just weren’t as good as they used to be!
But we also know when companies use our data against us, or sell it on to people who do not have our best interests in mind.
When credit was denied not because of your ability to pay but because of where you lived or your racial identity, that was called “redlining,” after the practice of drawing a red line on the map to demarcate geographies where loans or insurance would be denied or made more costly. Well, there’s a new kind of redlining in the 21st century. The Atlantic calls it data redlining.
In some ways, the worst-case scenario The Atlantic sketches is tinfoil-hat stuff. There is no indication that State Farm Insurance is actually doing those things, but we can see from the example where the boundaries of fair use and analysis might lie. It seems to me that insurance companies are quite within their rights to offer lower rates to people who agree to drive responsibly, and to verify the consumer’s claims of how many miles they drive annually, but if my insurance rates suddenly spike because of data about formerly private legal behavior, like the risk profile of where I work or drive for personal reasons, I have reason to feel that my data is being used unfairly against me.
Similarly, if I don’t have equal access to the best prices on an online site, because the site has determined that I have either the capacity or willingness to pay more, my data is being used unfairly against me.
The right way to deal with data redlining is not to prohibit the collection of data, as so many misguided privacy advocates seem to urge, but rather, to prohibit its misuse once companies have that data. As David Brin, author of the prescient 1998 book on privacy, The Transparent Society, noted in a conversation with me last night, “It is intrinsically impossible to know if someone does not have information about you. It is much easier to tell if they do something to you.”
Furthermore, because data is so useful in personalizing services for our benefit, any attempt to prohibit its collection will quickly be outrun by consumer preference, much as the Germans simply routed around France’s famed Maginot Line at the outset of World War II. For example, we are often asked today by apps on our phone if it’s OK to use our location. Most of the time, we just say “yes,” because if we don’t, the app just won’t work. Being asked is an important step, but how many of us actually understand what is being done with the data that we have agreed to surrender?
The right way to deal with data redlining is to think about the possible harms to the people whose data is being collected, and primarily to regulate those harms, rather than the collection of the data itself, which can also be put to powerful use for those same people’s benefit. When people were denied health coverage because of pre-existing conditions, that was their data being used against them; this is now restricted by the Affordable Care Act. By contrast, the privacy rules in HIPAA, the 1996 Health Insurance Portability and Accountability Act, which set overly strong safeguards around the data itself rather than regulating its use, have had a chilling effect on many kinds of medical research, as well as on patients’ access to their very own data!
Another approach is shown by legal regimes such as the one governing insider trading: once you have certain data, you are subject to new rules, rules that may actually encourage you to avoid gathering certain kinds of data. If you have material nonpublic information obtained from insiders, you can’t trade on that knowledge, while knowledge gained by public means is fair game.
I know there are many difficult corner cases to think through. But the notion of whether data is being used for the benefit of the customer who provided it (either explicitly, or implicitly through his or her behavior), or is being used against the customer’s interests by the party that collected it, provides a pretty good test of whether or not we should consider that collecting party to be “a creep.”
For more information on big data and privacy — and to get involved in the conversation — subscribe to the free Data Newsletter.
Editor’s note: we’re running a series of five excerpts from our forthcoming book Designing for Emerging Technologies, a compilation of works by industry experts in areas of user experience design related to genomics, robotics, the Internet of Things, and the Industrial Internet of Things.
In this excerpt, author — and editor of Designing for Emerging Technologies — Jonathan Follett addresses designers’ roles as new technologies begin to blur the boundaries between design and engineering for software, hardware, and biotech.
Technology extends our grasp, making it possible for us to achieve our goals rapidly and efficiently; but it also places its own set of demands upon us. The fields of industrial design, graphic design, and software user experience design have all evolved in response to these demands — a need for a human way to relate to and interact with our new tools. Graphic design makes information depicted in printed media clear, understandable, and beautiful; industrial design makes products elegant, usable, and humane; and user experience design makes the interaction with our digital tools and services efficient and even pleasurable.
The future of design is to envision humanity’s relationship to technology and each other — whether we’re struggling with fear and loathing in reaction to genetically altered foods, the moral issues of changing a child’s traits to suit a parent’s preferences, the ethics guiding battlefield robots, or the societal implications of a 150-year extended lifetime. Now, more than ever, designers have the opportunity to help define the parameters of and sculpt the interactions between man and technology.
The evolution of these fields will create opportunities for influencing humanity’s progress as a species, making the tumult and disruption of the Information Revolution look like a minor blip by comparison. While the miracles of our Information Age are many, the technologies of computers, the Internet, and mobile devices primarily serve to accelerate human communication, collaboration, and commerce. Without dismissing their importance, we observe that many existing models of interaction have merely moved from the physical to the digital realm — becoming cheaper, faster and, perhaps, better in the process. E-mail replaces the post, e-commerce replaces the brick-and-mortar store, and so on. By contrast, we can see, especially in the technologies of robotics and genomics, the potential for tremendous change, disruption on a wide scale, and the re-making of our current order in substantial fashion.
Designers have only just begun to think about the implications of emerging technologies for the human condition. We must prepare for the transformation of our field of practice — moving from design as facilitation, shaping the interface and workflow, to design as the arbiter, driving the creation of the technology itself and applying our understanding of interaction, form, information, and artistry to new areas.
To balance those asking, “How can this be done?” we must ask, “Why should we do this, to what end, and for whose benefit?” We must move from being passive receptors of new technology to active participants in its formation. As design thinkers and practitioners, we’re called to serve as a bridge between technology and humanity, to be explorers and actively seek out new opportunities in areas that are not yet obvious. We’re on the eve of some of the most significant technological changes to ever grace our world, and whether these changes serve everyone or just a few will be up to us.
In the coming years, as the boundaries between design and engineering for software, hardware, and biotech continue to blur, those who began their professional lives as industrial designers, computer engineers, user experience practitioners, and scientists will find that the trajectory of their careers takes them into uncharted territory. Like the farmers who moved to the cities to participate in the birth of the Industrial Revolution, we can’t imagine all of the outcomes of our work. But if history is any indicator, the convergence of these technologies will be greater than the sum of their parts. If we are prepared to take on such challenges, then we only have to ask: “What stands in the way?”
If you are interested in the collision of hardware and software, and other aspects of the convergence of physical and digital worlds, subscribe to the free Solid Newsletter.
What happens if emerging technology and automation result in a world of abundance, where anyone at any time can produce anything they need and there’s no need for jobs? In his recent Strata keynote, James Burke warned that society is not prepared for scarcity (and the value it brings) to be a thing of the past — an eventuality Burke predicts will occur in the next 40 years or so. This topic kicks off a discussion among Jim Stogdill, Jon Bruner, and me that we recorded while at Strata.
If you liked this article, you might be interested in a new report, “Building a Solid World,” that explores the key trends and developments that are accelerating the growth of a software-enhanced, networked physical world. (Download the free report.)
Thrust into controversy by Edward Snowden’s first revelations last year, President Obama belatedly welcomed a “conversation” about privacy. As cynical as you may feel about US spying, that conversation with the federal government has now begun. In particular, the first of three public workshops took place Monday at MIT.
Given the locale, a focus on the technical aspects of privacy was appropriate for this discussion. Speakers cheered the value of data (often invoking the “big data” buzzword), delineated the trade-offs between accumulating useful data and preserving privacy, and introduced technologies that could analyze encrypted data without revealing facts about individuals. Two more workshops will be held in other cities, one focusing on ethics and the other on law.
A narrow horizon for privacy
Having a foot in the hacker community and hearing news all the time about new technical assaults on individual autonomy, I found the circumscribed scope of the conference disappointing. The consensus on stage was that the collection of personal information was toothpaste out of the tube, and that all we could do in response was promote oral hygiene. Much of the discussion accepted the conventional view that deriving value from data has to play tug of war with privacy protection. But some speakers pushed back with the hope that technology could produce a happy marriage between the rivals of data analysis and personal data protection.
No one recognized that people might manage their own data and share it at their discretion, an ideal pursued by the Vendor Relationship Management movement and many health care reformers. As an audience member pointed out, no one on stage addressed technologies that prevent the collection of personal data, such as Tor onion routing (which was sponsored by the US Navy).
Although speakers recognized that data analysis could disadvantage individuals, either through errors or through efforts to control us, they barely touched on the effects of analysis on groups.
Finally, while the Internet of Things came up in passing, along with the difficulty of preserving privacy in an age of social networking, speakers did not emphasize the explosion of information that will flood the Internet over the next few years. This changes the context for personal data, both in its power to improve life and its power to hurt us.
One panelist warned that the data being collected about us increasingly doesn’t come directly from us. I think that’s not yet true, but soon it may be. The Boston Globe just reported that a vast network of vehicle surveillance is run by private industry, unfettered by the Fourth Amendment or discrimination laws (and providing police with their data). If people can be identified by the way they walk, privacy may well become an obsolete notion. But I’m not ready to give up yet on data collection.
In any case, I felt honored to hear and interact with the impressive roster of experts and the well-informed audience members who showed up on Monday. Just seeing Carol Rose of the Massachusetts ACLU sit next to John DeLong of the NSA would be worth a trip downtown. A full house was expected, but a winter storm kept many potential attendees stuck in Washington, DC, or other points south of Boston.
Questions the government is asking itself, and us
John Podesta, a key adviser to the Clinton and Obama administrations, addressed us by phone after the winter storm grounded his flight. He referred to the major speech delivered by President Obama on January 17 of this year, and said that he was leading a working group formed afterward to promote an “open, interoperable, secure, and reliable Internet.”
It would be simplistic, however, to attribute administration interest in privacy to the flak emerging from the Snowden revelations. The government has been trying to cajole industries to upgrade security for years, and launched a cybersecurity plan at the same time as Podesta’s group. Federal agencies have also been concerned for some time with promoting more online collaboration and protecting the privacy of participants, notably in the National Strategy for Trusted Identities in Cyberspace (NSTIC) run by the National Institute of Standards and Technology (NIST).
Yes, I know, these were the same folks who passed NSA mischief on to standards committees, seriously weakening some encryption mechanisms. These incidents can remind us that the government is a large institution pursuing different and sometimes conflicting goals. We don’t have to give up on it on that account and stop pressing our values and issues.
The relationship between privacy and identity may not be immediately clear, but a serious look at one must involve the other. That understanding underlies a series I wrote on identity.
Threats to our autonomy don’t end with government snooping. Industries want to know our buying habits and insurers want to know our hazards. MIT professor Sam Madden said that data from the sensors on cell phones can reveal when automobile drivers make dangerous maneuvers. He also said that the riskiest group of drivers (young males) reduces risky maneuvers by up to 78% if they know they’re being monitored. How do you feel about this? Are you viscerally repelled by such move-by-move snooping? What if your own insurance costs went down and there were fewer fatalities on the highways?
But there is no bright line dividing government from business. Many commenters complained that large Internet businesses shared user data they had collected with the NSA. I have pointed out that the concentration of Internet infrastructure made government surveillance possible.
Revelations that the NSA collected data related to international trade, even though there’s no current evidence it affected negotiations, make one wonder whether government spies have cited terrorism as an excuse to pursue other goals of interest to businesses, particularly when we were tapping the phone calls of leaders of allies such as Germany and Brazil.
Podesta said it might be time to revisit the Fair Information Practices that have guided laws in both the US and many other countries for decades. (The Electronic Privacy Information Center has a nice summary of these principles.)
Podesta also identified a major challenge to our current legal understanding of privacy: the shift from predicated searching to non-predicated, or pattern, searching. The jargon can be understood as follows: a predicated search can be a simple database query that verifies a relationship you expect to find, such as whether people who reserve hotel rooms also reserve rental cars. A non-predicated search turns up totally unanticipated relationships, such as the famous incident in which a retailer revealed a customer’s pregnancy.
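To make the distinction concrete, here is a minimal sketch in Python (the reservations table and its data are invented for illustration). The predicated query tests one relationship we already suspect; the pattern search scans every pairing of products with no hypothesis in advance.

```python
import sqlite3
from itertools import combinations

# Hypothetical reservations table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (customer TEXT, product TEXT)")
conn.executemany("INSERT INTO reservations VALUES (?, ?)", [
    ("alice", "hotel"), ("alice", "rental_car"),
    ("bob", "hotel"), ("carol", "flight"), ("carol", "rental_car"),
])

# Predicated search: verify one relationship we already suspect exists.
n_both = conn.execute("""
    SELECT COUNT(*) FROM reservations a JOIN reservations b
      ON a.customer = b.customer
    WHERE a.product = 'hotel' AND b.product = 'rental_car'
""").fetchone()[0]
print("customers with hotel AND rental car:", n_both)

# Non-predicated (pattern) search: test every pair of products for
# co-occurrence, with no hypothesis in advance.
rows = conn.execute("SELECT customer, product FROM reservations").fetchall()
baskets = {}
for customer, product in rows:
    baskets.setdefault(customer, set()).add(product)
for a, b in combinations(sorted({p for _, p in rows}), 2):
    support = sum(1 for items in baskets.values() if {a, b} <= items)
    if support:
        print(f"{a} + {b}: {support} customer(s)")
```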
Podesta asked us to consider what’s different about big data, what business models are based on it, what uses there are for it, and whether we need research on privacy protection during analytics. Finally, he promised a report on law enforcement in about three months.
Later in the day, US Secretary of Commerce Penny Pritzker offered some further questions: What principles of trust do businesses have to adopt? How can privacy in data be improved? How can we be more accountable and transparent? How can consumers understand what they are sharing and with whom? How can government and business reduce the unanticipated harm caused by big data?
Incentives and temptations
The morning panel trumpeted the value of data analysis, while acknowledging privacy concerns. Panelists came from medicine, genetic research, the field of transportation, and education. Their excitement over the value of data was so infectious that Shafi Goldwasser of the MIT Computer Science and Artificial Intelligence Laboratory later joked that it made her want to say, “Take my data!”
I think an agenda lay behind the choice of a panel dangling before us an appealing future in which we can avoid cruising for parking spots, make better use of college courses, and even cure disease through data sharing. By contrast, the people who snoop on social networking sites in order to withdraw insurance coverage were not on the panel; they would have had a harder time justifying their use of data, and their presence would have highlighted the deceptive enticements of data snooping. Big data offers amazing possibilities in the aggregate. Statistics can establish relationships among large populations that unveil useful advice to individuals. But judging each individual by principles established through data analysis is pure prejudice. It leads to such abuses as labeling a student dissolute because he posts a picture of himself at a party, or withdrawing disability insurance from someone who dares to boast of his capabilities on a social network.
Having our cake
Can technology save us from a world where our most intimate secrets are laid at the feet of large businesses? A panel on privacy-enhancing techniques suggested it might.
Data analysis without personal revelations is the goal; the core techniques behind it are algorithms that compute useful results from encrypted data. In principle, encrypted data is indistinguishable from random noise, and traditionally it would defeat the point of encryption if any information at all could be derived from it. The new technologies relax this absolute randomness just enough to let someone search for values, compute a sum, or perform more complex calculations on encrypted values.
Goldwasser characterized this goal as extracting data without seeing it. For instance, what if we could determine whether any faces in a surveillance photo match suspects in a database, without identifying the innocent people in the photo? What if we could uncover evidence of financial turmoil from the portfolios of stockholders without knowing what each stockholder holds?
Nickolai Zeldovich introduced his CryptDB research, which is used by Google for encrypted queries in BigQuery. CryptDB ensures that any value will be represented by the same encrypted value everywhere it appears in a field, and can also support some aggregate functions. This means you can request the sum of values in a field and get the right answer without having access to any individual values. Different layers of protection can be chosen, each trading off functionality for security to a different degree.
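As a rough sketch of the idea behind that deterministic layer (my illustration, not CryptDB’s actual code), the following uses a keyed hash as a stand-in for a deterministic cipher: equal plaintexts always yield equal tokens, so the server can match and group values it cannot read. Real CryptDB uses a decryptable deterministic cipher wrapped in layered “onions” of encryption.

```python
import hmac, hashlib

KEY = b"column-specific secret key"  # held by a trusted proxy, not the server

def det_token(value: str) -> str:
    """Deterministic 'encryption' stand-in: equal plaintexts always map to
    equal tokens, so the server can match and group them blindly.
    (CryptDB itself uses a decryptable deterministic cipher, not a hash.)"""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

# What the server stores: tokens only, never plaintext.
stored = [det_token(city) for city in ["boston", "austin", "boston"]]

# A trusted client-side proxy rewrites WHERE city = 'boston' into a
# token comparison the server can evaluate without learning the value.
query_token = det_token("boston")
matches = sum(1 for t in stored if t == query_token)
print(matches)  # 2 -- equality works on ciphertexts
```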
MIT professor Vinod Vaikuntanathan introduced homomorphic encryption, which produces an encrypted result from encrypted data, allowing the user to get the result without seeing any of the input data. This was one of the few cutting-edge ideas introduced at the workshop. Although homomorphic encryption was suggested as early as 1978, no one could figure out how to make it work until 2009, and viable implementations such as HElib and HCrypt have emerged only recently.
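Fully homomorphic schemes are too involved to sketch here, but the simpler, additively homomorphic Paillier scheme (published in 1999) conveys the flavor: multiplying two ciphertexts produces an encryption of the sum of their plaintexts. Here is a toy version, with insecurely small primes chosen only for readability:

```python
import math, random

# Toy Paillier keypair with insecurely small primes (demo only).
p, q = 293, 433
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
g = n + 1
mu = pow(lam, -1, n)  # modular inverse of lambda mod n

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    # L(x) = (x - 1) // n recovers the message from c^lambda mod n^2.
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

a, b = encrypt(20), encrypt(22)
# The additive homomorphism: multiplying ciphertexts adds plaintexts.
print(decrypt((a * b) % n2))  # 42, computed without decrypting a or b
```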
The white horse that most speakers wanted to ride is “differential privacy,” an unintuitive term that comes from a formal definition of privacy protection: any result returned from a query would be substantially the same whether or not you were represented by a record in that data. When differential privacy is in place, nobody can re-identify your record or even know whether you exist in the database, no matter how much prior knowledge they have about you. A related term is “synthetic data sets,” which refers to the practice of offering data sets that are scrambled and muddied by random noise. These data sets are carefully designed so that queries can produce the right answer (for instance, “how many members are male and smoke but don’t have cancer?”), but no row of data corresponds to a real person.
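One standard construction for achieving differential privacy, offered here as my own illustration rather than anything demonstrated at the workshop, is the Laplace mechanism: add random noise scaled to the query’s sensitivity divided by the privacy parameter epsilon, so that one person’s presence or absence barely shifts the distribution of answers.

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def private_count(records, predicate, epsilon=0.1):
    """A counting query has sensitivity 1: adding or removing one person
    changes the true answer by at most 1, so Laplace noise of scale
    1/epsilon masks any individual's presence or absence."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical records echoing the query quoted above.
people = [{"male": i % 2 == 0, "smokes": i % 3 == 0, "cancer": False}
          for i in range(500)]
print(private_count(people,
                    lambda r: r["male"] and r["smokes"] and not r["cancer"]))
```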
Cynthia Dwork, a distinguished scientist at Microsoft Research and one of the innovators in differential privacy, presented an overview that was fleshed out by Harvard professor Salil Vadhan. He pointed out that such databases make it unnecessary for a privacy expert to approve each release of data because even a user with special knowledge of a person can’t re-identify him.
These secure database queries offer another level of protection: checking the exact queries that people run. Vaikuntanathan indicated that homomorphic encryption would be complemented by a functional certification service, which is a kind of mediator that accepts queries from users. The server would check a certificate to ensure the user has the right to issue that particular query before carrying it out on the database.
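Here is a sketch of how such a mediator might behave; the certificate store and query names are invented for illustration.

```python
# Hypothetical mediator: certificates list the query templates each
# analyst may run; anything else is refused before touching the data.
CERTIFICATES = {
    "analyst-17": {"count_by_region", "average_age"},
}

APPROVED_QUERIES = {
    "count_by_region": lambda db: {r: len(v) for r, v in db.items()},
    "average_age": lambda db: (sum(a for v in db.values() for a in v) /
                               sum(len(v) for v in db.values())),
}

def run_query(user: str, query_name: str, db):
    # Check the certificate before the query ever reaches the database.
    if query_name not in CERTIFICATES.get(user, set()):
        raise PermissionError(f"{user} holds no certificate for {query_name}")
    return APPROVED_QUERIES[query_name](db)

db = {"north": [34, 51], "south": [29]}
print(run_query("analyst-17", "count_by_region", db))  # {'north': 2, 'south': 1}
try:
    run_query("analyst-17", "list_raw_records", db)    # not certified
except PermissionError as e:
    print(e)
```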
The ongoing threat to these technologies is the possibility of chipping away at privacy by submitting many queries, possibly on multiple data sets, that could cumulatively isolate the information on a particular person. Other challenges remain as well.
The use of these techniques will also require changes to laws and regulations that make assumptions based on current encryption methods.
Technology lawyer Daniel Weitzner wrapped up the panel by describing technologies that promote information accountability: determining through computational monitoring how data is used and whether a given use complies with laws and regulations.
Information accountability involves several steps.
Challenges include making a policy language sufficiently expressive to represent the law without becoming too complex to compute over. The language must also accommodate incompleteness and inconsistency, because laws don’t always provide complete answers.
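As a toy illustration of why such an evaluator needs more than true and false (my own sketch, not Weitzner’s formalism), a rule can permit a use, forbid it, or simply decline to answer when the law is silent:

```python
from enum import Enum

class Ruling(Enum):
    PERMIT, FORBID, UNDETERMINED = range(3)

# Toy policy rule: inspects a described use of data and may decline to
# answer, since real statutes don't cover every case.
def hipaa_like_rule(use):
    if use.get("data") != "health":
        return Ruling.UNDETERMINED          # rule is silent off its domain
    if use.get("purpose") == "treatment":
        return Ruling.PERMIT
    if not use.get("consent"):
        return Ruling.FORBID
    return Ruling.UNDETERMINED

def evaluate(use, rules):
    rulings = {rule(use) for rule in rules}
    if Ruling.FORBID in rulings:
        return Ruling.FORBID                # any applicable prohibition wins
    if Ruling.PERMIT in rulings:
        return Ruling.PERMIT
    return Ruling.UNDETERMINED              # the law gives no complete answer

print(evaluate({"data": "health", "purpose": "marketing", "consent": False},
               [hipaa_like_rule]))          # Ruling.FORBID
print(evaluate({"data": "location"}, [hipaa_like_rule]))  # Ruling.UNDETERMINED
```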
The last panel of the day considered some amusing and thought-provoking hypothetical cases in data mining. Several panelists dismissed the possibility of restricting data collection but called for more transparency in its use. We should know what data is being collected and who is getting it. One panelist mentioned Deborah Estrin, who calls for companies to give us access to “data about me.” Discarding data after a fixed period of time can also protect us, and is particularly appealing because old data is often of no use in new environments.
Weitzner held out hope on the legal front. He suggested that when President Obama announced a review of the much-criticized Section 215 of the Patriot Act, he was issuing a subtle message that the Fourth Amendment would get more consideration. Rose said that revelations about the power of metadata prove that it’s time to strengthen legal protections and force law enforcement and judges to treat metadata like data.
Privacy and dignity
To me, Weitzner validated his role as conference organizer by grounding discussion on basic principles. He asserted that privacy means letting certain people handle data without allowing other people to do so.
I interpret that statement as a protest against notorious court rulings on “expectations of privacy.” According to US legal doctrine, we cannot put any limits on government access to our email messages or to data about whom we phoned because we shared that data with the companies handling our email and phone calls. This is like people who hear that a woman was assaulted and say, “The way she dresses, she was asking for it.”
I recognize that open data can feed wonderful, innovative discoveries and applications. We don’t want a regime where someone needs permission for every data use, but we do need ways for the public to express their concerns about their data.
It would be great to have a kind of Kickstarter or Indiegogo for data, where companies asked not for funds but for our data. However, companies could not sign up as many people this way as they can get now by surfing Twitter or buying data sets. It looks like data use cannot avoid becoming an issue for policy, whoever sets and administers it. Perhaps subsequent workshops will push the boundaries of discussion farther and help us form a doctrine for our decade.