The Limits of Political Debate

I.B.M. taught a machine to debate policy questions. What can it teach us about the limits of rhetorical persuasion?
We need A.I. to be more like a machine, supplying troves of usefully organized information. It can leave the bullshitting to us. Illustration by Angie Wang

In February, 2011, an Israeli computer scientist named Noam Slonim proposed building a machine that would be better than people at something that seems inextricably human: arguing about politics. Slonim, who had done his doctoral work on machine learning, works at an I.B.M. Research facility in Tel Aviv, and he had watched with pride a few days before as the company’s natural-language-processing machine, Watson, won “Jeopardy!” Afterward, I.B.M. sent an e-mail to thousands of researchers across its global network of labs, soliciting ideas for a “grand challenge” to follow the “Jeopardy!” project. It occurred to Slonim that they might try to build a machine that could defeat a champion debater. He made a single-slide presentation, and then a somewhat more elaborate one, and then a more elaborate one still, and, after many rounds competing against many other I.B.M. researchers, Slonim won the chance to build his machine, which he called Project Debater. Recently, Slonim told me that his only wish was that, when it was time for the actual debate, Project Debater be given the voice of Scarlett Johansson. Instead, it was given a recognizably robotic voice, less flexible, and more punctuated, than Siri’s. A basic principle of robotics is that the machine shouldn’t ever trick human beings into thinking that they are interacting with any person at all, let alone one whom Esquire has twice named the “Sexiest Woman Alive.”

Scientific work inside the biggest corporations can sometimes feel as insulated and speculative as it does in an academic lab. It wasn’t hard to imagine that businesses might make use of Slonim’s programming—that is, that they might substitute a very persuasive machine for the humans who now interact with their customers. However, Slonim’s Tel Aviv-based team was not supposed to think about any of that—they were only supposed to win a debate. To Slonim, that was a lot to ask. I.B.M. had built computers that had beaten human champions at chess, and then at trivia, and this had left the impression that A.I. was close to “humanlike intelligence,” Slonim told me. He considered that “a misleading conception.” Slonim is trim and pale, with a shaved head and glasses, and in place of the usual boosterism about artificial intelligence he has a slight sheepishness about how new the technology is. To him, the debate project was a half-step out into reality. Debate is a game, like trivia or chess, in that it has specific rules and structures, which can be codified and taught to a machine. But it is also like real life, in that the goal is to persuade a human audience to change their minds—and to do that the machine needed to know something about how they thought about the world.

Machine learning Slonim knew from his doctoral work; when it came to debate, his only authority was national—Israelis, he pointed out to me, argue voluminously, and he thought that his own family argued even more voluminously than most. But I.B.M.’s vast resources were brought to bear on the project, and, slowly, during a politically tumultuous decade, Project Debater took shape—it was a sort of education. The young machine learned by scanning the electronic library of LexisNexis Academic, composed of news stories and academic journal articles—a vast account of the details of human experience. One engine searched for claims, another for evidence, and two more engines characterized and sorted everything that the first two turned up. If Slonim’s team could get the design right, then, in the short amount of time that debaters are given to prepare, the machine could organize a mountain of empirical information. It could win on evidence.
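
Even without I.B.M.’s models, that division of labor is easy to caricature. The Python sketch below is a minimal, invented illustration of the four-engine pattern: one routine hunts for claims, another for evidence, and two downstream stages characterize and rank whatever the first two turn up. The cue lists, class names, and scoring rule are all assumptions made for the example, with crude keyword matching standing in for whatever learned components I.B.M. actually built.

```python
# A toy claim-and-evidence pipeline, loosely patterned on the article's
# description: one engine proposes claims, another proposes evidence, and
# downstream stages characterize and rank what the first two found. Every
# name and heuristic here is invented; keyword cues stand in for models.

from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    kind: str = ""      # "claim" or "evidence", filled in downstream
    score: float = 0.0

CLAIM_CUES = ("should", "must", "we believe", "it is clear")
EVIDENCE_CUES = ("study", "percent", "according to", "found that")

def find_claims(sentences, topic):
    """Engine 1: keep sentences that mention the topic and sound assertive."""
    return [Passage(s) for s in sentences
            if topic in s.lower() and any(c in s.lower() for c in CLAIM_CUES)]

def find_evidence(sentences, topic):
    """Engine 2: keep sentences that mention the topic and cite data."""
    return [Passage(s) for s in sentences
            if topic in s.lower() and any(c in s.lower() for c in EVIDENCE_CUES)]

def characterize(passages, kind):
    """Engine 3: label each passage; length is a crude stand-in for relevance."""
    for p in passages:
        p.kind = kind
        p.score = len(p.text) / 100.0
    return passages

def rank(passages, top_k=5):
    """Engine 4: sort everything the first engines turned up."""
    return sorted(passages, key=lambda p: p.score, reverse=True)[:top_k]

corpus = [
    "A 2014 study found that preschool access raised graduation rates by 9 percent.",
    "Governments should subsidize preschool for low-income families.",
    "The weather in Tel Aviv was pleasant that February.",
]
found = (characterize(find_claims(corpus, "preschool"), "claim")
         + characterize(find_evidence(corpus, "preschool"), "evidence"))
for p in rank(found):
    print(f"[{p.kind}] {p.text}")
```

The point of the structure, toy or not, is the one Slonim was after: retrieval is kept separate from judgment, so the ranking stage can be swapped for something smarter without touching the search.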

In 2016, a debate champion was consulting on the project, and he noticed that, for all of its facility in extracting facts and claims, the machine just wasn’t thinking like a debater. Slonim recalled, “He told us, ‘For me, debating whether to ban prostitution, or whether to ban the sale of alcohol, this is the same debate. I’m going to use the same arguments. I’m just going to massage them a little bit.’ ” If you were arguing for banning prostitution or alcohol, you might point to the social corrosion of vice; if you were arguing against, you might warn of a black market. Slonim realized that there were a limited number of “types of argumentation,” and these were patterns that the machine would need to learn. How many? Dan Lahav, a computer scientist on the team who had also been a champion debater, estimated that there were between fifty and seventy types of argumentation that could be applied to just about every possible debate question. For I.B.M., that wasn’t so many. The second phase of Project Debater’s education, as Slonim described it, was somewhat handmade: his experts wrote their own modular arguments, relying in part on the Stanford Encyclopedia of Philosophy and other texts. They were trying to train the machine to reason like a human.
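
The consultant’s observation lends itself to a small illustration. In the sketch below, a handful of argument “templates” are stored once and reused across motions, each massaged with topic-specific wording; the templates and their labels are invented here, a toy stand-in for the fifty-odd types Lahav had in mind rather than anything the team actually wrote.

```python
# A toy library of reusable "types of argumentation": the same template
# serves many motions, massaged with topic-specific details. The template
# text and labels below are invented for illustration.

ARGUMENT_TEMPLATES = {
    "social_corrosion": (
        "Unchecked {subject} erodes community norms and imposes costs "
        "on people who never chose to participate."
    ),
    "black_market": (
        "Banning {subject} will not eliminate it; it will push {subject} "
        "underground, beyond regulation and taxation."
    ),
    "individual_liberty": (
        "Adults are entitled to make their own choices about {subject}, "
        "even choices that others find unwise."
    ),
}

def build_case(subject, side):
    """Select templates for one side of a 'ban X' motion and fill them in."""
    keys = (["social_corrosion"] if side == "for"
            else ["black_market", "individual_liberty"])
    return [ARGUMENT_TEMPLATES[k].format(subject=subject) for k in keys]

# The same templates serve very different motions, as the consultant said:
for motion in ("prostitution", "the sale of alcohol"):
    print(f"Against banning {motion}:")
    for line in build_case(motion, "against"):
        print(" -", line)
```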

In February, 2019, the machine had its first major public debate, hosted by Intelligence Squared, in San Francisco. The opponent was Harish Natarajan, a thirty-one-year-old British economic consultant, who, a few years earlier, had been the runner-up in the World Universities Debating Championship. Before they appeared onstage, each contestant was given the topic and assigned a side, then allotted fifteen minutes to prepare: Project Debater would argue that preschools should be subsidized by the public, and Natarajan that they should not. Project Debater scrolled through LexisNexis, assembling evidence and categorizing it. Natarajan did nothing like that. (When we spoke, he recalled that his first thought was to wonder at the topic: Was subsidizing preschools actually controversial in the United States?) Natarajan was kept from seeing Project Debater in action before the test match, but he had been told that it had a database of four hundred million documents. “I was, like, ‘Oh, good God.’ So there was nothing I could do in multiple lifetimes to absorb that knowledge,” Natarajan told me. Instead, he would concede that Project Debater’s information was accurate and challenge its conclusions. “People will say that the facts speak for themselves, but in this day and age that is absolutely not true,” Natarajan told me. He was prepared to lay a subtle trap. The machine would be ready to argue yes, expecting Natarajan to argue no. Instead, he would say, “Yes, but . . .”

The machine, a shiny black tower, was placed stage right, and spoke in an airy, bleating voice, one that had been deliberately calibrated to sound neither exactly like a human’s nor exactly like a robot’s. It began with a scripted joke and then unfurled its argument: “For decades, research has demonstrated that high-quality preschool is one of the best investments of public dollars, resulting in children who fare better on tests and have more successful lives than those without the same access.” The machine went on to cite supportive findings from studies: investing in preschool reduced costs by improving health and the economy, while also reducing crime and welfare dependence. It quoted a statement made in 1973 by the former “Prime Minister Gough Whitlam” (the Prime Minister of Australia, that is), who said subsidizing preschool was the best investment that a society could make. If that all sounded a bit high-handed, Project Debater also quoted the “senior leaders at St. Joseph’s RC primary school,” sprinkling in a reference to ordinary people, just as a politician would. Project Debater could sound a bit like a politician, too, in its offhand invocation of moral first principles. Of preschools, it said, “It is our duty to support them.” What duties, I wondered, did the machine and audience share?

Natarajan, who stood behind a podium at stage left, wore a gray three-piece suit and spoke in a clipped, confident voice. His decision not to challenge the evidence that Project Debater had assembled had a liberating effect: it allowed him to argue that the machine had taken the wrong approach to the question, drawing attention to the fact that one contestant was a human and the other was not. “There are multiple things which are good for society,” he said. “That could be, in countries like the United States, increased investment in health care, which would also often have returns for education”—an investment that Project Debater’s sources would probably also deem beneficial. Natarajan had identified the sort of expert-inflected, anti-poverty argument that the machine had attempted, and, rather than competing on the facts, he relied on a certain type of argumentation—taking in the tower of electricity a few feet from him, with its Darth Vader sheen, and identifying it as a dreamy idealist.

The first time I watched the San Francisco debate, I thought that Natarajan won. He had taken the world that Project Debater described and tipped it on its side, so that the audience wondered whether the computer was looking at things from the right angle, and that seemed the decisive maneuver. In the room, the audience voted for the human, too: I.B.M. had beaten Kasparov, and beaten the human champions of “Jeopardy!,” but it had come up short against Harish Natarajan.

But, when I watched the debate a second time, and then a third, I noticed that Natarajan had never really rebutted Project Debater’s basic argument, that preschool subsidies would pay for themselves and produce safer and more prosperous societies. When he tried to, he could be off the cuff to the point of ridiculousness: at one point, Natarajan argued that preschool could be “actively harmful” because it could force a preschooler to recognize that his peers were smarter than he was, which would cause “huge psychological damage.” By the end of my third viewing, it seemed to me that man and machine were not so much competing as demonstrating different ways of arguing. Project Debater was arguing about preschool. Natarajan was doing something at once more abstract and recognizable, because we see it all the time in Washington, and on the cable networks and in everyday life. He was making an argument about the nature of debate.

I sent the video of the debate to Arthur Applbaum, a political philosopher who is the Adams Professor of Political Leadership and Democratic Values at Harvard’s Kennedy School, and who has long written about adversarial systems and their shortcomings. “First of all, these Israeli A.I. scientists were enormously clever,” Applbaum told me. “I have to say, it’s nothing short of magic.” But, Applbaum asked, magic to what end? (Like Natarajan, he wanted to tilt the question on its side.) The justification for having an artificial intelligence summarize and channel the ways in which people argue was that it might shed light on the underlying issue. Applbaum thought that this justification sounded pretty weak. “If we have people who are skilled in doing this thing, and we listen to them doing this thing, will we have a deeper, more sophisticated understanding of the political questions that confront us, and therefore be better-informed citizens? That’s the underlying value claim,” Applbaum said. “Straightforwardly: No.”

As Applbaum saw it, the particular adversarial format chosen for this debate had the effect of elevating technical questions and obscuring ethical ones. The audience had voted Natarajan the winner of the debate. But, Applbaum asked, what had his argument consisted of? “He rolled out standard objections: it’s not going to work in practice, and it will be wasteful, and there will be unintended consequences. If you go through Harish’s argument line by line, there’s almost no there there,” he said. Natarajan’s way of defeating the computer, at some level, had been to take a policy question and strip it of all its meaningful specifics. “It’s not his fault,” Applbaum said. There was no way that he could match the computer’s fact-finding. “So, instead, he bullshat.”

I.B.M. has staged public events like the San Francisco debate as man versus machine, in a way that emphasizes the competition between the two. But, at their current level, A.I. technologies operate more like a mirror: they learn from us and tell us something about the limits of what we know and how we think. Slonim’s team had succeeded, imperfectly, in teaching the machine to mimic the human mode of debate. We—or, at least, Harish Natarajan—are still better at that. But the machine was far better at the other part—the collection and analysis of evidence, both statistical and observed. Did subsidized preschool benefit society or not? One of the positions was correct. Project Debater was more likely to assemble a strong case for the correct answer, but less likely to persuade a human audience that it was true. What the audience in the hall wanted from Project Debater was for it to be more like a human: more fluid and emotional, more adept at manipulating abstract concepts. But what we need from the A.I., if the goal is a more specific and empirical way of arguing, is for it to be more like a machine, supplying troves of usefully organized information and leaving the bullshit to us.

Whether you spend years inside the world of debate, as Slonim’s consultants and Natarajan did, or just a few days, as I did recently, you tend to see its patterns everywhere. Turn on CNN and you will quickly find politicians or pundits transforming a specific question into an abstract one. When I reached Slonim on a video call last week, I found that he had grown a salt-and-pepper beard since the San Francisco debate, which made him look older and more reflective. There was a note of idealism that I hadn’t heard before. He had been working on using Project Debater’s algorithms to analyze which arguments were being made online to oppose COVID-19 vaccination. He hoped, he told me, that they might be used to make political argument more empirical. Perhaps, one day, everyone would have an argument check on their smartphones, much as they have a grammar check, and this would help them make arguments that were not only convincing but true. Slonim said, “It’s an interesting question, to what extent this technology can be used to enhance the abilities of youngsters to analyze complex topics in a more rational manner.” I found it moving that the part of the technology that held the most transformative potential, to make argument more empirical and true, was also what made Project Debater seem most computer-like and alien. Slonim thought that this was a project for the next generation, one that might outlive the current levels of political polarization. He said, ruefully, “Our generation is perhaps lost.”