Data-Smart City Pod

The Intersection of Privacy, Technology, and Bias with Dr. Latanya Sweeney

Episode Summary

Professor Steve Goldsmith discusses data privacy, bias prevention, and tech policy with renowned expert Dr. Latanya Sweeney.

Episode Notes

In this episode Professor Goldsmith interviews Dr. Latanya Sweeney, an influential expert in the areas of tech policy, data privacy, bias, and regulation. Previously Chief Technologist at the Federal Trade Commission and a pivotal influence on the privacy rule of the Health Insurance Portability and Accountability Act (HIPAA), Dr. Sweeney discusses how cities can better create inclusive and transparent services, why biased data leads to biased results, and how one simple challenge in grad school changed her whole view of technology and computers.

Music credit: Summer-Man by Ketsa

About Data-Smart City Solutions

Data-Smart City Solutions, housed at the Bloomberg Center for Cities at Harvard University, is working to catalyze the adoption of data projects at the local government level by serving as a central resource for cities interested in this emerging field. We highlight best practices, top innovators, and promising case studies while also connecting leading industry, academic, and government officials. Our research focus is the intersection of government and data, ranging from open data and predictive analytics to civic engagement technology. We seek to promote the combination of integrated, cross-agency data with community data to better discover and preemptively address civic problems. To learn more, visit us online and follow us on Twitter.

Episode Transcription

Betsy Gardner:

Hi, this is Betsy Gardner, Senior Editor at the Harvard Kennedy School and producer of the Data-Smart City Pod. Since we started this podcast, we've had great support from our listeners. And to make sure that you don't miss an episode, please find us under the new Data-Smart City Pod channel, wherever you listen. Make sure to subscribe so you get each episode. And thanks for listening.

Stephen Goldsmith:

Welcome back. This is Steve Goldsmith, Professor of Urban Policy at Harvard's Kennedy School, welcoming you to another one of our podcasts with one of the country's leading experts on privacy, technology, and innovation. Latanya Sweeney is the Professor of Practice of Government and Technology at the Kennedy School, and is also on the Harvard Faculty of Arts and Sciences. Dr. Sweeney has a background of great interest to many of our listeners, as she has had various government roles as well. So I'll call you by your first name. You can call me by my first name since we're friends. So welcome, Latanya, to our podcast.

Latanya Sweeney:

Oh, Stephen, it's great to be here. Thank you for having me on.

Stephen Goldsmith:

You're welcome. So we spend a fair amount of time, in particular, with state and local officials and others who care a lot about innovation and the way cities work. Let's start with a little bit about your interesting background in terms of how you got to privacy and technology, in particular, and a little bit about your last couple of jobs, and then we'll dig into the issue at hand.

Latanya Sweeney:

Great. Well, all my life, from when I was a young girl, I loved mathematics and I always wanted to be a mathematician. I was raised by my great-grandparents, who had no idea what that meant or where that leads a young woman interested in mathematics. And I always wanted to be a mathematician until I didn't. It was the moment I took a computer programming course in high school. From that moment forward, everything I had loved about math came alive in computer programming. Instead of abstract numbers and equations on paper, all of a sudden you could build things that interacted with the real world and changed people's lives. And in my poor high school, nothing was exempt from it. Everything from art classes to sports activities benefited from some program that I was determined to write for it.

And I just really wanted to go on and build a thinking machine, a computer that would think like humans. Eventually, I became a PhD student in computer science at MIT, and well on my way to building my thinking machine, I made my first real breakthrough. One day, I was walking through the lounge and I heard an ethicist say, "Computers are evil." And so I was like, "Well, I've got to stop and fix her thinking." And literally, the year was 1996, and she really foretold the future.

She talked about how technology was changing and breaking all of our social contracts. In particular, she was concerned about the availability of cheap hard drive space and what that was doing in terms of both people's collection of data and the ability to share that data widely. And she used, as one of her examples, a data set that had been shared here in Massachusetts of state employees, their families, and retirees. One copy had been sold to industry, and another copy was given to a researcher.

And I then told her, "Look at all of the amazing things that this data set could do. Look at the lives it could change, the costs that might be reduced because this data is available and being shared." And she said, "Yeah, but is it anonymous?" It didn't have anybody's name, or address, or Social Security number in it, but it did have their demographics: their month, day, and year of birth, their gender, and their five-digit zip code. So I did a quick mathematical calculation. There are 365 days in a year. Let's say people live 100 years, and the database had two genders. That's 73,000 unique combinations.

But I happened to know that the typical five-digit zip code in Massachusetts had only about 20,000 people. That meant that, in general, that combination would be unique for individuals. So now, my argument to this ethicist is unfolding right before my face. I am not going to be undone. This must be okay. I have to prove her wrong. So I run up to city hall in Cambridge, and for 20 bucks, I got the Cambridge voter list. It came on two five-and-a-quarter-inch floppies. If I tell this story to my students, they have no idea what a five-and-a-quarter-inch floppy disk is. But for those who appreciate it, you know what I mean.

I went to get the Cambridge voter list because William Weld was the governor of Massachusetts at that time. He had collapsed, and information about his collapse was in that health data. And sure enough, six people on the voter list had the same date of birth that he did, and only three of them were men, and he was the only one in his five-digit zip code. Date of birth, gender, and zip code were unique for Governor Weld. And because they were unique in the voter list, they were also unique in the health data. By linking on them, I was able to attach his identity directly to his health record.
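(For readers who want to see the mechanics behind this story, here is a minimal sketch of the kind of linkage Dr. Sweeney describes. The file names, column names, and data are hypothetical illustrations, not her original data or code.)

```python
# Minimal sketch of a re-identification linkage attack on "anonymous" data
# that still carries the quasi-identifiers described in the episode:
# date of birth, gender, and five-digit ZIP code.
import pandas as pd

# Back-of-the-envelope uniqueness argument from the episode:
# ~365 birthdays/year * ~100 years of age * 2 genders = 73,000 combinations,
# versus roughly 20,000 residents in a typical Massachusetts ZIP code,
# so most (dob, gender, zip) combinations point to a single person.
combinations = 365 * 100 * 2
print(combinations)  # 73000

# Hypothetical public voter roll: names plus the quasi-identifiers.
voters = pd.read_csv("voters.csv")          # columns: name, dob, gender, zip
# Hypothetical "anonymous" health data: no names, same quasi-identifiers.
health = pd.read_csv("health_records.csv")  # columns: dob, gender, zip, diagnosis

# Keep only health rows whose quasi-identifier combination is unique,
# since those are the ones that can be pinned to exactly one person.
keys = ["dob", "gender", "zip"]
unique_health = health.drop_duplicates(subset=keys, keep=False)

# Joining on the shared quasi-identifiers re-attaches names to records.
reidentified = unique_health.merge(voters, on=keys, how="inner")
print(reidentified[["name", "diagnosis"]].head())
```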

Like I said, the year was 1996. That was the experiment that went around the world. Literally a month later, I was testifying before Congress, because it wasn't just that one data set; that was the best practice around the world. And what that experiment really showed was that technology had changed in a way that revealed it wasn't our past practices that had been our protection for privacy in this case. It was the absence of technology. The technology had changed to a new kind of model, a new way of sharing data, and all the benefits that I described were promised. But at the same time, it had this other consequence.

And just so you know, that happened at exactly the time the United States Congress was debating what became known as the HIPAA Privacy Rule, and this example appears in the preamble. I'm not responsible for HIPAA, but I'm just saying that's the way it is. The experiment certainly had an impact. We ran that same experiment against 1990 census data, and I was able to predict that 87% of the population in the United States was unique based on date of birth, gender, and zip code. And so it precipitated those changes.

It's with that foundation that I've come to this field. And of course, privacy was the first wave of adverse consequences. I went on to do single experiments that illuminate the situation for others, in algorithmic fairness and, of course, around the 2016 election.

Stephen Goldsmith:

That's an interesting story, and I can see your passion as you made these discoveries along the way. So just one question, and then I want to ask you about a number of specific issues: what did you concentrate on at the FTC? I mean, I can think of all sorts of ways you would be involved in regulating the use of data. So which challenges did you take on there?

Latanya Sweeney:

That was a fantastic job. That's like the job you always want. At first, you might think that seems odd: who wants to work in the federal government if you're not a federal government type? But the chief technologist was a really new position at that time. And when you're up in the C-suite, the entire organization seems completely flat. I could never actually remember who worked where, because it was so flat.

And what was so amazing is that that flatness and the lack of expectation for the job gave me free run of the place to get involved in every kind of activity that the FTC does, and I did get involved in every kind of activity that the FTC does: everything from policy formation and recommendations to Congress to investigations. The Federal Trade Commission is responsible for fair advertising practices and gets involved in monopolies and deceitful actions.

Because most of the tech companies are American companies, the FTC functions a lot as the sort of police department of the internet. My engagements went everywhere from actual investigations to setting up a team to do research and build out infrastructure for the FTC to do its job in new technological settings. The FTC also has a Bureau of Economics, so I also posed economic models and theories about technology monopolies and things like that.

So it was a fantastic experience. I got to play in all the pools in which the FTC plays, and I was exposed to just a tremendous number of unforeseen consequences of technology. Whatever I thought the situation was before, if I thought, "Yes, technology's causing some conflicts with society," when I got to the FTC it was like, "Oh my God, it is a tidal wave of conflicts coming our way. What do we do?" So it was an exciting time.

Stephen Goldsmith:

You've applied what you learned there and in your earlier work to, say, cities, right? Many cities, as you know, have built-in practices that end up discriminating or leaving people out, whether that's intentional or not; there are practices that have developed over time that create discriminatory impact. Trying to get more of those cities to use technology to uncover that discrimination makes them worry, because, "Hey, there's a privacy issue," but there's also the algorithmic bias issue in the data itself. So you're one of the country's experts on technology innovation. How would you think about, or what would you tell, state and local leaders about how they should go about using technology to uncover discrimination without aggravating discrimination?

Latanya Sweeney:

Well, discovering it and learning whether it exists is different from deploying the data in a way that creates a disparate impact. Those are two different things. I can take the data and learn lots of things about it: how biased the data itself might be, and other really important things that could help me deliver services better or be more responsive to the community. I should go ahead and learn those things from data; I shouldn't be afraid of learning from data. And then, like you said, in government we also build tools for service delivery, and we build other kinds of tools and mechanisms, and those are two different hats and two different sets of criteria. When I build a tool that I'm trying to launch, the first thing I should ask myself is a question that Cathy O'Neil often asks: "Who does this not work for? Where does this go wrong?"

One of the things, when we're the designers of new technology or of a new tool, is we get excited about all the new benefits it's going to launch, and we get blindsided in exactly the same way tech companies do. We get so focused on the new shiny thing it's going to deliver, we forget about what adverse consequences might be. And what's interesting when we are the designer is the remedy is usually something really simple. It's not like it takes a big thing, but it becomes a big thing if we don't identify it early. Because once we roll out our new tool and then somebody exposes it, then we're in a totally different place. We're upset about it. We want to defend it. We want to hide behind it.

And that goes for everything. We've seen this with something as simple as a website. You got money, you hired a new web developer, and they give you a spiffy new colorful website. Does it work for people who are blind? You still have to serve them. Does it work for people who don't speak English? You have to begin to ask yourself, "Who is this new shiny thing working for, and who does it not work for?" I use the example of websites because the ones that most people tend to go to are at the state level, and some are at the county level, and we did a survey and found a huge number of them are not ADA compliant. They make it very difficult for people with disabilities to use. So I use that as an example: when we're building the shiny new thing, we have to have some discipline to ask ourselves, "Who are all the stakeholders, and what are the issues for this new shiny thing from their standpoint?"

We don't want to ask the question to the extent that we can't move forward. Because for any new tool we build, there's a long list of stakeholder issues. And it doesn't matter how great the technology is. It's just going to be a long list. And you're not going to address all of them because you can't. But what you do want to address are the ones that are most likely to have the biggest adverse impact, and are likely to happen quickly in the process, and for a significant number of people. Those are the ones that you would want to address in the design of the technology itself.

On the other hand, there's using data so that I can understand things better. And sometimes, of course, the new tool might itself be a tool that's using data, like a new AI that's learning patterns off of that data. I have to ask myself, again, the question, as you pointed out so well, of to what extent the data itself is representative of bias. I often give an example of bias in data here at Harvard with the students. I'm old enough that when I was a teaching fellow helping a professor with a course, I would walk into the classroom in computer science and it was pretty much a sea of young white men. If you walk into a computer science class today, it's like you walked into a young version of the United Nations. These are some of the best and brightest minds from around the world together in this classroom.

Now, say we want to build an admissions algorithm. If I use the data from years ago, it'll figure out other proxies, and the admissions decisions that the algorithm makes based on the data from decades ago will end up looking like a sea of white guys again, even if I take out race and even if I take out gender. On the other hand, if I took today's student body and used that to train it, then the next group of students will look more like the United Nations. This is how algorithms work. That's why we want to use them: we want them to help us figure out patterns that we don't even see, help us find hidden relationships that we may not realize are there. But at the same time, sometimes it's baking in our own biases. Sometimes it's leading us to a place we don't want to go. And so we have to be ever cognizant of that and even test for that.
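(To make the proxy point concrete, here is a minimal synthetic sketch of the effect Dr. Sweeney describes. All data, feature names, and thresholds are made up for illustration; this is not any real admissions system. Even with the protected attribute removed, a model trained on historically skewed outcomes reproduces the skew through a correlated proxy feature.)

```python
# Synthetic illustration: a model trained only on a proxy feature and a
# test score, with the protected attribute dropped, still reproduces the
# historical admissions skew.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Protected attribute (never shown to the model): 1 = historically favored group.
group = rng.integers(0, 2, size=n)

# A proxy feature (say, which high school an applicant attended) that
# happens to correlate strongly with group membership.
proxy = group + rng.normal(0, 0.3, size=n)

# Test score: genuinely similar across groups.
score = rng.normal(0, 1, size=n)

# Historical admissions: mostly driven by group membership, not merit.
admitted = (2.0 * group + 0.5 * score + rng.normal(0, 1, size=n)) > 1.5

# Train on features that exclude the protected attribute entirely.
X = np.column_stack([proxy, score])
model = LogisticRegression().fit(X, admitted)
pred = model.predict(X)

print("Historical admit rate, favored vs. other:",
      admitted[group == 1].mean(), admitted[group == 0].mean())
print("Model's admit rate,    favored vs. other:",
      pred[group == 1].mean(), pred[group == 0].mean())
```

The second line of output tracks the first: the model rediscovers the old disparity through the proxy, which is why removing race or gender columns alone does not remove the bias.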

Stephen Goldsmith:

Do you recommend, at the local level, whether it's for security or privacy or bias, and they're three separate interlocking subjects, that there'd be a separate legislatively-appointed oversight committee for that subject? Or should there be forensic auditing? Should there be a privacy officer in every department? What do you think a best practice would look like in today's city or state government?

Latanya Sweeney:

Oh, that's a tough question. That's a really tough question. We've certainly seen conversations around the country with all variations of answers to it. And I think ultimately, the groups who are on the ground will make a decision about what that's going to look like. So let me wiggle out of answering the question by exposing more of what the question involves: what are the pieces, and how do they work out?

So one way to think about this is: I have a vision for a project. If my vision says, "I want to make sure it doesn't have this kind of bias, it doesn't do this and that," and I give a laundry list of "make sure it doesn't" items, that kind of vision or ethical guidance can get us kicked off to a good start, but it doesn't necessarily mean that the outcome will actually adhere to those things.

Because those might be your marching orders, but then when you're trying to get the thing done, you're trying to just wrangle this data, or you're trying to build this tool, it's hard work. If it was easy, we would just buy it. It's not easy. And it's often the first time it's being done. And so you've got to do a lot of hard work. And now, you've got this laundry list. You're like, "It might get discounted, because there's no time," or "It might be left to the end to be considered," or "It's so much more work. Now, I've got to go back and reanalyze, 'Does it have this bias?'"

Take that website example I gave: "I want to build a spiffy new website for my voters, or for people to check their property taxes," or whatever. I don't have time to think about all these other people who may not be able to use it; maybe we'll think about that later. And so that would be a reason the guidance doesn't carry through. Instead, here's what we want at each stage. At the vision stage, a list of principles. At the development or design stage, in which you're constructing these things, what you really need is a kind of risk assessment or a stakeholder design: "I'm trying to build this technology. What are the big, important stakeholder issues that I have to make a part of my decisions?"

And then when it comes out the other end, if there is a board that's going to review it, you're going to pass with flying colors, because you had principles when you started, you had a stakeholder risk assessment that you did during the design, and you can show that to the board and you should be good to go. So that example shows you how the three work together in an organization that has all three. If I have one of the three, it's certainly better than zero of the three, but you have to set your expectations accordingly.

If you only have the last one, the board that reviews afterwards, you can just feel the political nightmare that that is. Somebody has just worked really hard and spent a lot of money to do this new thing, only for you to say, "It's all wrong. We can't use it." So I don't know that any one of them is the right answer, as much as elements of all three are important.

Stephen Goldsmith:

Last question or two. In that last comment, what's the role of the non-professional expert, which let's just call the community for the purpose of this question? How do you involve the community? And the community is a big, general word, so it doesn't really express who from the community should participate, and obviously that depends on the subject. But what would a best practice look like that involves the community? And the last part of that question: we had a member of our chief data group who was a chief data officer in New York City. I think he was a graduate of an HBCU. And his view was that if the participants in the designing of the algorithm were not more representative of the community, the design would, by definition, miss something, because the people who were participating couldn't see it. So how do you think about additional perspectives, context, and community participation in these issues that we've been discussing?

Latanya Sweeney:

Yeah. Another great question. So think about the sweet spot among those three: the vision that you set up at the front, the stakeholder design while you're designing it, or the review afterwards. To me, the most critical one is that middle one, the person who's making all these little micro decisions in the design stage. Because now, if they got something wrong, it's easy to fix. There's no embarrassment. The money's not all gone. I can fix it. If it wasn't on the list of dos and don'ts, I could still incorporate it and still make sure it adheres to them, and I know that I've adhered to stakeholders' needs. So that middle piece really speaks to your question.

Does the team that's doing the design itself have to reflect diversity? There's an advantage and a disadvantage. I'll stick to this website example I've been using throughout. It's a heck of a lot easier to forget about some stakeholders when they're not present. When they're present, any hidden assumptions you have about users become really clear. If I was building a website, one group I would probably not think about at all is people who don't use websites. Because you're building a website, your thinking is that everyone who's going to come here is going to know what "click" means; they're going to know what they're supposed to do when they see it.

This really came to a head with vaccination websites when vaccines were first being given out. You had to go to a website, but there were just lots of communities that didn't have internet, lots of people who didn't know how to use a website. And many of them were in the same groups that the sites were trying to reach. They do use phones and other things, but people didn't think that through. So the design can be biased by who's at the table doing it. But the designers can help offset that by taking some time to think about, "Who are all the stakeholders involved? Let me think it through, and let me get some representation or some way of assessing what I'm trying to construct or achieve from the stakeholders' perspective."

What I worry about with a litmus test that says the diversity has to be in the group is that it doesn't mean they're actually thinking through all the stakeholders, and it's not really clear that all the stakeholder interests would emerge in that view. Certainly, it's better than having none: if all the members of the design team are the same, then they're likely to have big blinders on. But the way to offset the blinders is to strategically study or do an assessment of, "What are the key stakeholder issues and how am I going to address them? Which ones are the important ones to address? And I'll address those."

So sometimes, when we think about stakeholder design, we might think in terms of including communities. There are ways to do that. We've done that before, where we have almost like a focus group where we say, "We're thinking about doing this kind of thing. What do you think about it?" and begin to elicit things from communities. A lot of times, though, it's people who are experts, who can walk between the two, whose expertise is in a particular stakeholder area but who also understand the stakeholders and can talk to them.

Stephen Goldsmith:

Well, we could talk to you for four or five hours. You have so many insights and so much enthusiasm, but we've already kept you longer than we promised. So this is Steve Goldsmith, professor at the Kennedy School, talking with my colleague Latanya Sweeney, Professor of Practice of Government and Technology, about her wonderful insights. No wonder you're a national leader. We hope to get you back. Thank you for your time.

Betsy Gardner:

If you liked this podcast, please visit us at datasmartcities.org, or follow us @datasmartcities on Twitter. And remember to subscribe at the new Data-Smart City Pod channel on Spotify, Apple Podcasts, or wherever you listen. This podcast was produced by me, Betsy Gardner, and hosted by Professor Steve Goldsmith. We're proud to be the central resource for cities interested in the intersection of government, data, and innovation. Thanks for listening.