The Failure of Predictive Algorithms

I mean, certainly models are used more if you include in that not just economic models like we are discussing, but things like algorithms. Think about all the VC-funded, big data, AI companies nowadays. They are all based on the assumption that predictive analytics, AI, or whatever you want to call it, will be able to solve problems. So they are inherently assuming that modeling works, or worked well enough, for a company like Uber, for all those companies that are using an algorithm as their basic business model. What do you think? I mean, give me your opinion of that.

I mean, full disclosure, I have an algorithmic auditing company. The point of my company is to poke at that assumption, and the very short answer is that it's a very narrow view. Most algorithms work narrowly, in the way that the company that built them and is deploying them wants them to work, but they probably fail in a lot of other ways. Those failures might be unimportant to that company, but they might matter to the targets of the algorithm, or show up as some other side effect of the algorithm.

So, just going back to the way that economists thought about derivatives. The way they talked about it, the way they thought about it, it "worked" for them. Put that in quotes, because that's what you'll find over and over again with models: models keep "working" for the people that are using them, whether that's because the data looked good and they weren't looking at other data, or because it worked for them politically, or because they kept getting better and better jobs when they talked about how great these models were. You can even think of it as a kind of corruption, because working politically for them is still working for them. I'm just saying that that is a very narrow view. The real question isn't, does this work for you? Because yes, it does. You wouldn't be doing it if it didn't. The real question is, for whom does this fail?

Right. For whom does this fail. That's the question that isn't being asked. It wasn't being asked then, and it still isn't being asked now. But also, the definition of "working for you" can itself be misleading. I mean, what does it mean to be working?

Yeah. It works for you for the moment. Maybe you've got a promotion or something.

Yes. But if it brought down your company, is that really working for you?

If you've got another job, yes. I mean, that's the thing. People don't quite understand how cynical the world of finance was at that time. I would talk to people about it, like, "Oh, this model seems flawed. As a business model that seems dangerous for the company." "Oh, but I'm just going to jump ship when it fails and I'll get another job," and that was the assumption. So it's a very, very narrow perspective.

Yeah, and "working," in the case of many of those models that we saw fail during the crisis, simply meant short-term profit. I mean, it's very simple. It's very money-based. For the algorithms that I think about now, let's talk about the Facebook Newsfeed algorithm. It "works" for Facebook. That ends up translating into money, but in the short term, the more direct definition is engagement: keeping people on Facebook. So we're going to privilege the Newsfeed items that keep people on Facebook, and we're going to demote the items that people tend to leave Facebook after reading or seeing. Just that one thing. Of course it is aligned with profit, because the longer people stay on Facebook, the more they click on ads, and the more money Facebook makes.
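To make that mechanism concrete, here is a minimal Python sketch of engagement-driven ranking. The items and the predicted-engagement scores are invented for illustration; this is not Facebook's actual system, just the general shape of "privilege whatever keeps people on the site."

```python
# A toy feed ranker: each item carries a model's predicted probability
# that the viewer stays on the site after seeing it. Ranking by that
# score alone privileges whatever keeps people engaged, outrage included.

items = [
    {"headline": "Calm, nuanced policy explainer", "predicted_engagement": 0.21},
    {"headline": "Outrageous claim about the other side", "predicted_engagement": 0.87},
    {"headline": "Friend's vacation photos", "predicted_engagement": 0.55},
]

# "Success" here is defined purely as keeping the user on the platform.
feed = sorted(items, key=lambda item: item["predicted_engagement"], reverse=True)

for item in feed:
    print(f'{item["predicted_engagement"]:.2f}  {item["headline"]}')
```

Nothing in that loop asks whether an item is true, informative, or divisive; the outrageous item wins simply because it scores highest on the one metric being optimized.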
So that's their narrower definition of working. They're like, "This is working because we're making more money." It's very clear that that's their incentive. But what we've seen in the last few years, and it was pretty predictable actually, looking back at it, is that this also privileged things that we find outrageous. Because why do we stay on Facebook? To argue. Things that we find divisive. Why do we stay on Facebook? To get outraged. To fight with each other. Or to be part of a group that excludes others.

Yeah, or even to be radicalized, and to find your people, your new radicalized people. There are all sorts of stories we've heard. What it doesn't privilege is thoughtful discussion that makes you go and do your own research at a library. We all know that that's not happening. So we've seen that when Facebook optimizes to its bottom line, its definition of success, which is profit, it's also optimizing directly away from our definition of success, which is being informed and not fighting.

Then there's an even further irony to it, which is that by optimizing to that narrow definition of working for them, they put their own company in the crosshairs. In other words, it may not work for them in the longer term; it may be an extremely corrosive thing for the company itself in the bigger picture. Yet they've been focused on the very short term. Short-termism, of course, is one of the great problems of our age.

Yeah. But to back up for a minute, I want to ask you about what an algorithm is, because it is a term that's thrown around all the time.

Yes.

I'm a music and math person, and there is a beauty and crystalline structure to mathematics that makes people think it doesn't lie. Mathematics indeed does have proofs, and mathematics itself doesn't lie. But the assumptions behind mathematics most certainly can lie. Walk me through a definition for people who maybe don't understand, when they throw around the word algorithm: what is an algorithm?

Okay. I'm just going to back up and disagree with one thing you just said, which is that I feel like axioms in mathematics, if stated as axioms, are not lies; they're just assumptions. The thing that we're going to see in my explanation of what an algorithm is, is that it's not mathematics at all, actually.

Right.

So what is an algorithm? When I say algorithm, I really mean predictive algorithm. Because taken straight up, an algorithm just means a series of instructions, and that's not what I mean. I mean a predictive algorithm. And what I mean by that is that almost every algorithm I will discuss is predicting, and not just predicting something, predicting a person. Most of the examples I can talk about predict a person. Are you going to pay back this loan? Are you going to have a car crash? How much should we charge you for car insurance? Are you going to get sick? How much should we charge you for health insurance? Are you going to do a good job at this job? Should we hire you? Are you going to get rearrested after leaving prison? That's your crime risk score. It's a prediction, a scoring system. To be even more precise, it's a scoring system on humans. If your score is above 77, you get the job. If it's below 77, you don't get the job. Simply that kind of thing. But more generally, a predictive algorithm is an algorithm that predicts success. Success is the thing I've just been mentioning in those examples. Are you going to click? Are you going to get in a car crash? Those are the definitions of success: a specific event.
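Here is what that kind of scoring-with-a-cutoff setup looks like, as a small hedged sketch: the feature names and weights below are made up for illustration, and the cutoff of 77 is taken from the example just mentioned.

```python
# A toy human-scoring system: a person is reduced to a few features,
# a linear formula turns them into a score, and a hard cutoff makes
# the decision. The person being scored never sees the weights.

WEIGHTS = {"years_experience": 4.0, "referral": 10.0, "resume_gap": -8.0}
BASE = 50.0
CUTOFF = 77  # at or above the cutoff you get the job; below it you don't

def score(person: dict) -> float:
    return BASE + sum(WEIGHTS[k] * person.get(k, 0) for k in WEIGHTS)

applicant = {"years_experience": 6, "referral": 1, "resume_gap": 1}
s = score(applicant)
print(f"score = {s:.0f} -> {'hired' if s >= CUTOFF else 'rejected'}")
# score = 76 -> rejected: one point below an opaque line, with no appeal
```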
And the reason we have to be so precise about that is that you train your algorithm on historical data. So you go back 10 years, 20 years. This is what I did when I was working as a quant in finance: you look for statistical patterns. In particular, you're looking for initial conditions that later led to success. So, people like you got raises. People like you got hired. People like you got promoted in this company. So we're going to hire you, because we think your chances of getting a raise, getting promoted, and staying at the company are good.

Because you match the pattern of people who had that happen.

Exactly. So the inherent assumption is that things that happened in the past will predict what will happen again. But we have to define what that means. What particular thing is going to happen? That's the definition of success. So really, to build a predictive algorithm, you just need past data and this definition of success, and that's it. Then you can propagate patterns from the past into the future.

So how does that play out? I heard you give a wonderful real-world example with your...

Oh yeah, with my kids?

...your kids, yeah.

So I talk about this because I really do think it's a no-brainer. It's not complicated. It's something we do every day. Sometimes I give the example of getting dressed in the morning. What am I going to wear? You have a lot of memories. It doesn't have to be formal. It doesn't have to be in a database. It's just memories in your head. Things I wore in the past: was I comfortable? If being comfortable is your definition of success today, you have a lot of memories to decide what to wear if you want to be comfortable. If you want to look professional, then you have memories to help you look professional.

Another example I like to give, though, one that shows more of the social structure of predictive algorithms and how things can go wrong, is cooking dinner for my family. So I cook dinner for my three sons and my husband. I want to know what to cook, so I think back to my memories of cooking for them: this guy likes carrots, but only when they're raw. This guy doesn't eat pasta, but he likes bread. Then I cook a meal. Of course, it depends on what ingredients are in my kitchen, so that's data I need to know. How much time do I have? That's also data. But at the end of the day I cook something, we eat it together, and then I assess whether it was successful. That's when you need to know my definition of success, and my definition of success is: did my kids eat vegetables? I say this because I want to contrast it against my youngest son, Wilty, whose only goal in life is to eat Nutella. His definition of success, if he were in charge, would be: did I get to have Nutella?

So there are two lessons to learn from that. The first is that it matters what the definition of success is. Because I'm not just asking to know; I'm asking because I'm going to remember: this was successful, this wasn't, this was. In the future, I'm going to optimize to success. I'm going to make more and more of the meals that were successful in the past, because I think they'll be successful again. That's how we do it. We optimize to success. The meals that I make with my definition of success are very different from the meals I would make under my son's definition of success.
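That dinner example is already a complete predictive algorithm, so it translates directly into code. This is a minimal sketch: the meal history below is invented, and the two definitions of success are just the ones from the story, passed in as functions.

```python
# Past data plus a definition of success: the whole recipe for a
# predictive algorithm. The same history, scored under two different
# definitions of success, recommends two different meals.

meal_history = [
    {"meal": "stir-fry with raw carrots", "ate_vegetables": True,  "had_nutella": False},
    {"meal": "bread and Nutella",         "ate_vegetables": False, "had_nutella": True},
    {"meal": "pasta with salad",          "ate_vegetables": True,  "had_nutella": False},
    {"meal": "bread and Nutella",         "ate_vegetables": False, "had_nutella": True},
]

def best_meal(history, success):
    """Optimize to success: pick the meal with the highest past success rate."""
    rates = {}
    for record in history:
        wins, total = rates.get(record["meal"], (0, 0))
        rates[record["meal"]] = (wins + success(record), total + 1)
    return max(rates, key=lambda meal: rates[meal][0] / rates[meal][1])

# The parent's definition of success versus the son's:
print(best_meal(meal_history, lambda r: r["ate_vegetables"]))  # a vegetable meal
print(best_meal(meal_history, lambda r: r["had_nutella"]))     # bread and Nutella
```

Whoever gets to pass in the `success` function controls what the algorithm recommends, which is exactly the point about power that comes next.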
So that's one really important point. But the other, just as important, point is that I get to decide what the definition of success is, because I'm in charge and my son is not in charge. So the point I'm trying to make is, it's about power. Predictive algorithms are optimized to success as defined by their owners, their builders, their deployers, and it's all about the power dynamic. So when we're scoring people, the people who are being scored might not agree with the definition of success, but they don't get a vote. The people who own the algorithm, the scorers, are the ones who say, "Here's what I mean by a good score." That could be seriously different for the person who's being scored. And for that matter, many of the examples I wrote about in my book are just unfair. They're simply unfair. Never mind the definition of success, which might even be defined in a reasonable way; the score is actually computed in an unfair way, and the people who are being scored have really no appeal system.

So the typical situation is a real power relationship, and most of the examples I just gave are like that: insurance, credit, getting a job, or sentencing to prison. All of those are examples where the standard setup is that the company who uses this scoring system licenses the predictive algorithm, the scoring system, from some third party. That third party scores the person and tells the big company what the score is. The person being scored can't ask any questions, because these guys don't even know how it works. They typically have a licensing agreement that stipulates that the big company will never get to see the secret sauce of the scoring system.

Because it's a trade secret?

It's a trade secret. So it's really opaque and it's often unfair. There's just nothing that the person being scored can really do about it, and at the same time they are missing out on really important financial opportunities, or job opportunities, or even going to prison.

Well, this is really a problem, because when people see that something has been done by a computer, and that the magic word algorithm is involved, they think, well, that's objective, because the computer did it. People didn't do it. It's number crunching, and it must be right. I don't like the result, but it must be right. And then the people who are in charge of it feel like they have in some way moved the responsibility for the decision over to some black box that has some magic secret sauce, or whatever you want to call it. Because it's mathematical and algorithmic, it's objective. That is just not true, is it?

It's really not true, but you said it well. That is the assumption going in, and that's what I call the blind trust, the blind trust that I'm pushing back against. That's what I do now. I push back against this blind trust that we have. Now, it's true that an algorithm is not idiosyncratically favoring certain friends. It's not nepotistic. You could imagine a hiring manager who simply lets their buddies get a job even though their buddies aren't qualified under the official rules. Algorithms don't do that. They're not partial to a specific person. But they are inherently discriminatory in as much as the data that they're trained on is discriminatory. So for example, if we train an algorithm to look for people who in the past were given promotions, in a company where men got promoted over women, or tall men got promoted over short men, then that algorithm would be trained on that data and would replicate that bias.
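A small sketch of that garbage-in, garbage-out point. The promotion records below are fabricated to make the bias obvious, and the "model" is just a frequency count per group, but it picks up the same pattern a real classifier would extract from the same data.

```python
# Train on historical promotions at a company that promoted men over women.
# The "model" is just the observed promotion rate per group, which is
# exactly the pattern a real classifier would learn from this data.

history = [
    ("man", True), ("man", True), ("man", False), ("man", True),
    ("woman", False), ("woman", False), ("woman", True), ("woman", False),
]

def train(records):
    counts = {}
    for group, promoted in records:
        wins, total = counts.get(group, (0, 0))
        counts[group] = (wins + promoted, total + 1)
    return {group: wins / total for group, (wins, total) in counts.items()}

model = train(history)
print(model)  # {'man': 0.75, 'woman': 0.25}: the historical bias, replicated

# A rule like "interview if predicted promotion chance > 0.5" now screens
# out every woman, regardless of her actual ability to do the job.
```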
So there's no sense in which it's actually objective. To be objective would mean something else. It would be closer to something like: what does it mean to be good at your job? Let's measure the person's ability to do their job. That would be closer to objective. That's not what we typically see with algorithms. We're training on success, but success is really a very bad proxy for underlying worth. It's actually difficult to measure somebody's underlying worth, their underlying ability. You can try to get there through various means, but most of the means that we've developed are known to be biased. Biased against minorities, biased against women, biased against the usual suspects. So what we end up doing is replicating that bias in the algorithms. In some sense you could just say, "Okay, we're just doing what we used to do. No better, no worse." I would argue that it's actually a little worse, because now we are also imbuing this stuff with our blind trust. We think we've done our job, and we think, okay, great, now it's fair. We don't have to worry about it. In fact, we do have to worry about it.

There are at least two ways that this goes off track. One is the people who are selecting the tools and the elements that feed into this, deciding what's important and what isn't. But even if you didn't have that, and you're just looking at past patterns, the past patterns are going to be whatever they are. If it's machine learning, the machine is learning from that data itself, and you're still going to reflect the fact that the company only ever hired men.

Yeah. You're distinguishing, and I was conflating, those two things. I think it's really important to distinguish them. There are lots of different ways algorithms can mess up, and one of them is bad data: garbage in, garbage out. If it's biased data, it's going to propagate the bias. That's what we just discussed. But then there's also the definition of success itself. You can take perfectly good data and turn it into a garbage algorithm by defining success the way Facebook defines success, making everyone worse off and dissolving democracy.
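Those two failure modes can be put side by side in one sketch. Everything here is hypothetical: the histories are invented, and the "predictor" is the same trivial frequency model as above, just to show where each failure enters the pipeline.

```python
# Failure mode 1: biased data, reasonable-looking definition of success.
# Failure mode 2: clean data, bad definition of success.

def build_predictor(history, success):
    """Past data plus a definition of success: that's the whole recipe."""
    rates = {}
    for features, outcome in history:
        wins, total = rates.get(features, (0, 0))
        rates[features] = (wins + success(outcome), total + 1)
    return lambda features: rates[features][0] / rates[features][1]

# 1. Garbage in, garbage out: the bias enters through the data.
biased_history = [("man", "promoted"), ("man", "promoted"), ("woman", "passed_over")]
hire = build_predictor(biased_history, lambda outcome: outcome == "promoted")
print(hire("woman"))  # 0.0: the old bias, now automated and trusted

# 2. Garbage objective: the data is fine, but the target is wrong.
clean_history = [("outrage_post", "long_session"), ("calm_post", "short_session")]
rank = build_predictor(clean_history, lambda outcome: outcome == "long_session")
print(rank("outrage_post"))  # 1.0: engagement up, discourse down
```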