Transcript for Tuomas Sandholm: Poker and Game Theory
SPEAKER_00
00:00 - 01:29
The following is a conversation with Tuomas Sandholm. He's a professor at CMU and co-creator of Libratus, which is the first AI system to beat top human players in the game of heads-up no-limit Texas hold'em. He has published over 450 papers on game theory and machine learning, including a best paper at NIPS 2017, now renamed NeurIPS, which is where I caught up with him for this conversation. His research and companies have had wide-reaching impact in the real world, especially because he and his group not only propose new ideas, but also build systems to prove that these ideas work in the real world. This conversation is part of the MIT course on Artificial General Intelligence and the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D. And now, here's my conversation with Tuomas Sandholm. Can you describe at a high level the game of poker, heads-up Texas hold'em, for people who might not be familiar with this card game?
SPEAKER_01
01:29 - 02:47
Yeah, happy to. So heads-up no-limit Texas hold'em has really emerged in the AI community as the main benchmark for testing these application-independent algorithms for imperfect-information game solving. And this is a game that's actually played by humans. You don't see it that much on TV or in casinos, for various reasons, but you do see it in some expert-level casinos, and you see it in the best poker movies of all time. It's actually an event in the World Series of Poker, but mostly it's played online, and typically for pretty big sums of money, and it's a game that usually only experts play. So if you go to your home game on a Friday night, it probably is not going to be heads-up no-limit Texas hold'em. It might be no-limit Texas hold'em in some cases, but typically with a big group, and it's not as competitive. Heads-up means it's two-player, so it's really me against you: am I better or are you better, much like chess or Go in that sense, but an imperfect-information game, which makes it much harder, because I have to deal with issues of things that you know that I don't know, and things that I know that you don't know, instead of the pieces being nicely laid out on the board for both of us to see.
SPEAKER_00
02:47 - 03:05
So in Texas hold'em, there are two cards that only you see. Yeah. And then there's a gradual layout of cards that add up overall to five cards that everybody can see. Yeah. The imperfect nature of the information is the two cards that you're holding in front of you.
SPEAKER_01
03:05 - 03:32
Yeah. So as you said, you first get two cards each in private. And then there's a betting round. Then you get three cards in public on the table, and there's a betting round. Then you get a fourth card in public on the table, and there's a betting round. Then you get a fifth card on the table, and there's a betting round. So there's a total of four betting rounds and four tranches of information revelation, if you will. Only the first tranche is private, and then it's public from there.
SPEAKER_00
03:33 - 04:27
And this is probably by far the most popular game in AI, and for the general public, in terms of imperfect information. So this is probably the most popular spectator game to watch, right? Which is why it's a super exciting game to tackle. So it's on the order of chess, I would say, in terms of popularity, in terms of the AI community setting it as the bar of what is intelligence. So in 2017, Libratus, how do you pronounce it? Libratus. Libratus. A little Latin there, a little bit of Latin. Libratus beat four expert human players. Can you describe that event? What did you learn from it? What was it like? What was the process, in general, for people who have not read the papers about it?
SPEAKER_01
04:28 - 05:56
Yeah, so the event was that we invited four of the top 10 players, who are specialist players in heads-up no-limit Texas hold'em, which is very important, because this game is actually quite different from the multiplayer version. We brought them in to Pittsburgh to play at the Rivers Casino for 20 days. We wanted to get 120,000 hands in, because we wanted to get statistical significance. That's a lot of hands for humans to play, even for these top pros, who normally play fairly quickly, so we couldn't just have one of them play that many hands. For 20 days they were playing basically morning to evening, and we raised $200,000 as a little incentive for them to play. And the setting was such that they didn't all just get $50,000; we actually paid them out based on how each did against the AI, so they had an incentive to play as hard as they could, whether they were way ahead, way behind, or right at the mark of beating the AI. And you didn't make any money, unfortunately. Right, no, we couldn't make any money. Originally, a couple of years earlier, I had actually explored whether we could play for money, because that would of course be interesting as well, to play against the top people for money, but the Pennsylvania Gaming Board said no. So this was much like an exhibition, like for a musician or a boxer or something like that.
SPEAKER_00
05:56 - 06:18
Nevertheless, you were keeping track of the money, and Libratus won close to $2 million, I think. So if it had been for real money, if you had been able to earn money, that would have been a quite impressive and inspiring achievement. Just a few details: what were the players looking at? I mean, were they behind a computer? What was the interface like?
SPEAKER_01
06:18 - 06:53
Yes, they were playing much like they normally do. These top players, when they play this game, they play mostly online, so they're used to playing through a UI, and they did the same thing here. So there was this layout, you could imagine: there's a table on a screen, the human sitting there, and then there's the AI sitting there, and the screen shows everything that's happening, the cards coming out, and shows the bets being made. And we also had the betting history for the human, so if the human forgot what had happened in the hand so far, they could actually reference back and so forth.
SPEAKER_00
06:53 - 06:57
Is there a reason they were given access to the betting history?
SPEAKER_01
06:57 - 07:12
Well, it didn't really matter; they wouldn't have forgotten anyway. These are top-quality people. But we just wanted to put it out there, so it's not a question of a human forgetting and the AI somehow trying to take advantage of a better memory.
SPEAKER_00
07:12 - 07:24
So what was that like? I mean, that was an incredible accomplishment. So what did it feel like before the event? Did you have doubt, hope? Where was your confidence at?
SPEAKER_01
07:24 - 08:23
Yeah, that's a great question. So 18 months earlier, I had organized a similar brains-versus-AI competition with a previous AI called Claudico, and we couldn't beat the humans. This time around, it was only 18 months later, and I knew that this new AI, Libratus, was way stronger, but it's hard to say how you'll do against the top humans before you try. So I thought we had about a 50-50 shot, and the international betting sites put us as a four-to-one or five-to-one underdog. So it's kind of interesting that people really believe in people over AI. People don't just believe in themselves; they have overconfidence in other people as well, compared to the performance of AI. And yeah, so we were a four-to-one or five-to-one underdog, and even after three days of beating the humans in a row, we were still 50-50 on the international betting sites.
SPEAKER_00
08:23 - 09:29
Do you think there's something special and magical about poker, in the way people think about it? In a sense, even in chess, I mean, there are no Hollywood movies about chess; poker is the star of many movies. And there's this feeling that certain human facial expressions and body language, eye movements, all these tells, are critical to poker. Like, you can look into somebody's soul and understand their betting strategy and so on. So do you think that is possibly why people have confidence in humans, because AI systems cannot, in this construct, perceive these kinds of tells? They're only looking at betting patterns and nothing else: betting patterns and statistics. So if you step back to human players, human versus human, what's more important to you? What's the role of these tells, these ideas that we romanticize?
SPEAKER_01
09:29 - 10:40
Yeah. So I'll split it into two parts. One is why humans trust humans more than AI and have overconfidence in humans. I think that's not really related to the tells question. It's just that they've seen how good these top players are, and they're really fantastic, so it's just hard to believe that an AI could beat them. So I think that's where that comes from, and that's actually maybe a more general lesson about AI: until you've seen it outperform a human, it's hard to believe that it could. But then the tells: a lot of these top players are so good at hiding tells that, among the top players, it's actually not really worth it for them to invest a lot of effort trying to find tells in each other, because they're so good at hiding them. So yes, at the kind of Friday evening game, tells are going to be a huge thing. You can read other people, and if you're a good reader, you'll read them like an open book. But at the top levels of poker now, the tells become a smaller and smaller aspect of the game as you go up.
SPEAKER_00
10:41 - 11:05
The amount of strategies, the amount of possible actions, is very large, 10 to the power of 100 plus. So there has to be some abstraction. I've read a few of the related papers; the AI has to form some abstractions of various hands and actions. So what kinds of abstractions are effective for the game of poker?
SPEAKER_01
11:05 - 11:45
Yeah, so you're exactly right. When you have a game tree that's 10 to the 161, especially in an imperfect-information game, it's way too large to solve directly, even with our fastest equilibrium-finding algorithms. So you want to abstract it first. And abstraction in games is much trickier than abstraction in MDPs or other single-agent settings, because you have these abstraction pathologies: if I have a finer-grained abstraction, the strategy that I can get from it for the real game might actually be worse than the strategy I can get from a coarser-grained abstraction. So you have to be very careful.
SPEAKER_00
11:45 - 11:54
Now, the kinds of abstractions, just to zoom out: we're talking about hand abstractions, and then there are betting strategies.
SPEAKER_01
11:54 - 13:16
Betting actions, yes. So, to talk about games in general, there's information abstraction, which is the abstraction of what chance does, and this would be the cards in the case of poker. And then there's action abstraction, which is abstracting the actions of the actual players, which would be bets in the case of poker. Yourself and the other players? Yes, yourself and the other players. For information abstraction, we are completely automated. So these are algorithms; we do what we call potential-aware abstraction, where we don't just look at the value of the hand, but also how it might materialize into good or bad hands over time. It's a certain kind of bottom-up process, with integer programming and clustering and various aspects of how you build this abstraction. And then there's the action abstraction. There, it's largely based on how humans and other AIs have played this game in the past. But in the beginning we actually used an automated action abstraction technology, which is provably convergent, in that it finds the optimal combination of bet sizes, but it's not very scalable, so we couldn't use it for the whole game; we used it for the first couple of betting actions.
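The potential-aware idea can be illustrated with a toy sketch: hands are grouped not by their current strength alone, but by the distribution of strengths they may reach later. The hand histograms and the use of plain k-means with Euclidean distance here are simplifying assumptions for illustration; the actual system uses more sophisticated distance measures and integer programming.

```python
def kmeans(histograms, k, iters=20):
    """Cluster equity histograms into k buckets (abstract 'hands')."""
    centers = [list(h) for h in histograms[:k]]  # simple deterministic init
    assign = [0] * len(histograms)
    for _ in range(iters):
        # Assign each hand to the nearest cluster center.
        assign = [min(range(k),
                      key=lambda c: sum((h - m) ** 2
                                        for h, m in zip(hist, centers[c])))
                  for hist in histograms]
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [histograms[i] for i, a in enumerate(assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Two hypothetical hands with the same average equity (0.5) but very
# different potential: a "made hand" vs. a "drawing hand".
made_hand = [0.0, 0.0, 1.0, 0.0, 0.0]   # always ends up medium strength
draw_hand = [0.5, 0.0, 0.0, 0.0, 0.5]   # ends up very weak or very strong
hands = [made_hand, draw_hand, list(made_hand), list(draw_hand)]
labels = kmeans(hands, k=2)
# Potential-aware clustering separates them, even though a pure
# expected-value abstraction would lump them together.
assert labels[0] == labels[2] and labels[1] == labels[3]
assert labels[0] != labels[1]
```

The point of the example: both hands have identical expected strength, so only the shape of the future-strength distribution distinguishes them.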
SPEAKER_00
13:16 - 13:37
So what's more important: the strength of the hand, so the information abstraction, or how you play them, the actions? You know, the romanticized notion, again, is that it doesn't matter what hands you have, that the actions, the betting, may be the way you win, no matter what hands you have.
SPEAKER_01
13:37 - 14:10
That's why you have to play a lot of hands, so that the role of luck gets smaller. You could otherwise get lucky, get some good hands, and then you're going to win the match. Even with thousands of hands, you can get lucky, because there's so much variance in no-limit Texas hold'em: if we both go all in, there's a huge amount of variance. So there are these massive swings in no-limit Texas hold'em. That's why you have to play not just thousands, but over a hundred thousand hands, to get statistical significance.
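A rough back-of-the-envelope sketch of why so many hands are needed: if per-hand results are roughly independent with standard deviation sigma, the standard error of the measured win rate shrinks as one over the square root of the number of hands. The sigma and edge values below are illustrative assumptions, not measured figures from the match.

```python
import math

def standard_error(sigma_per_hand, n_hands):
    """Standard error of the mean win rate after n_hands hands."""
    return sigma_per_hand / math.sqrt(n_hands)

sigma = 10.0   # assumed std dev per hand, in big blinds (illustrative)
edge = 0.1     # a hypothetical 10 bb/100 edge = 0.1 bb per hand

for n in (1_000, 10_000, 120_000):
    se = standard_error(sigma, n)
    # The edge only becomes clearly detectable once it spans a few
    # standard errors.
    print(n, round(se, 4), edge / se > 2)
```

Under these assumptions, at 1,000 or even 10,000 hands the noise still swamps the edge; only around the 100,000-hand scale does the signal clear a two-standard-error bar.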
SPEAKER_00
14:11 - 14:22
So let me ask this question another way: if you didn't even look at your hands, but your opponent didn't know that, how well would you be able to do?
SPEAKER_01
14:23 - 14:41
Oh, that's a good question. Actually, I heard a story that this Norwegian female poker player called Annette Obrestad actually won a tournament by doing exactly that. But that would be extremely rare. So you can't really play well that way.
SPEAKER_00
14:41 - 15:25
So the hands do have some role to play. Yes. So Libratus, as far as I understand, did not use learning methods, deep learning. Is there room for learning? You know, there's no reason why it couldn't combine with an AlphaGo-type approach for estimating the quality of a state, a value function estimator. What are your thoughts on this? Maybe as compared to another algorithm, which I'm not that familiar with, DeepStack, the engine that does use deep learning; it's unclear how well it does, but it nevertheless uses deep learning. So what are your thoughts about learning methods to aid in the way that Libratus plays the game of poker?
SPEAKER_01
15:25 - 18:08
Yeah, so as you said, Libratus did not use learning methods and played very well without them. Since then, we have actually, here at NeurIPS, a couple of papers on things that do use learning techniques, deep learning in particular, in sort of the way you're talking about, where it's learning an evaluation function. But in imperfect-information games, unlike, let's say, in Go, or now also in chess and shogi, it's not sufficient to learn an evaluation for a state, because the value of an information set depends not only on the exact state; it also depends on both players' beliefs. Like, if I have a bad hand, I'm much better off if the opponent thinks I have a good hand, and vice versa: if I have a good hand, I'm much better off if the opponent believes I have a bad hand. So the value of a state is not just a function of the cards. It depends, if you will, on the path of play, but only to the extent that it's captured in the belief distributions. So that's why it's not as simple as it is in perfect-information games. And I don't want to say it's simple there either; it's of course very complicated computationally there too, but at least conceptually it's very straightforward: there's a state, there's an evaluation function, you can try to learn it. Here, you have to do something more. And what we do in one of these papers is we allow the opponent to take different strategies at the leaves of the search tree, if you will. That's a different way of doing it, and it doesn't assume a particular way that the opponent plays, but it allows the opponent to choose from a set of different continuation strategies. And that forces us to not be too optimistic in the lookahead search. That's one way you can do sound lookahead search in imperfect-information games, which is very difficult.
And you were asking about DeepStack. What they did was very different from what we do, either in Libratus or in this new work. They were randomly generating various situations in the game, then doing the lookahead from there to the end of the game, as if that were the start of a different game. And then they were using deep learning to learn the values of those states, but the states were not just the physical states; they included belief distributions.
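The idea of not being too optimistic below the search horizon can be sketched in a few lines: instead of assuming one fixed opponent policy beyond the depth limit, each leaf is valued as if the opponent can pick whichever of several continuation strategies is worst for us. The payoff numbers are made-up placeholders, and this is a conceptual sketch of the principle, not the published algorithm.

```python
def depth_limited_value(actions_to_leaf_values):
    """Our best action value in a zero-sum game, where each depth-limited
    leaf is evaluated pessimistically: the opponent selects the
    continuation strategy that is worst for us."""
    return max(min(cont_values)
               for cont_values in actions_to_leaf_values.values())

# Pretend two of our actions lead to leaves whose value depends on which
# of three continuation strategies the opponent adopts below the horizon:
actions = {
    "aggressive": [6.0, 1.0, 2.0],  # great vs one continuation, bad vs another
    "balanced":   [3.0, 2.5, 2.8],  # solid against all continuations
}
# The robust choice is "balanced": its worst case (2.5) beats the
# worst case of "aggressive" (1.0).
assert depth_limited_value(actions) == 2.5
```

A single fixed continuation policy could make "aggressive" look like a 6.0 action; letting the opponent choose removes that over-optimism.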
SPEAKER_00
18:09 - 18:23
When you talk about lookahead, for DeepStack or for Libratus, does it mean considering every possibility for how the game can evolve? Are we talking about this exponential growth of a tree?
SPEAKER_01
18:23 - 19:14
Yes, we're talking about exactly that, much like you do in alpha-beta search or Monte Carlo tree search, but with different techniques. So there's a different search algorithm, and then we have to deal with the leaves differently. If you think about what Libratus did, we didn't have to worry about this, because we only did it at the end of the game, so we would always terminate in a real situation and we would know what the payouts are. It didn't do these depth-limited lookaheads. But now, in this new paper, which I think is called depth-limited search for imperfect-information games, we can actually do sound depth-limited lookahead. So we can start to do the lookahead from the beginning of the game on, because doing the full search for this whole long game is too complicated. In Libratus, we were just doing it for the end.
SPEAKER_00
19:14 - 19:23
So on the other side, there's a belief distribution. Is it explicitly modeled, what kind of beliefs the opponent might have?
SPEAKER_01
19:23 - 20:09
Yeah, it is explicitly modeled, but it's not assumed. The beliefs are actually output, not input. Of course, the starting beliefs are input, but they just follow from the rules of the game, because we know that the dealer deals uniformly from the deck. So I know that every pair of cards that you might have is equally likely. I know that for a fact; it just follows from the rules of the game. Of course, except the two cards that I have; I know you don't have those. You have to take that into account; it's called card removal, and it's very important. Is the dealing always coming from a single deck in heads-up? Yes, so you can assume a single deck. So with card removal, you know that if I have the Ace of Spades, I know you don't have the Ace of Spades.
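The starting beliefs described above can be computed directly: uniform over all opponent hole-card pairs drawn from the deck, minus the cards I hold. A minimal sketch (the card encoding is just an illustrative convention):

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "shdc"  # spades, hearts, diamonds, clubs
DECK = [r + s for r in RANKS for s in SUITS]  # 52 cards, e.g. "As" = ace of spades

def initial_beliefs(my_cards):
    """Uniform belief over the opponent's hole cards, with card removal."""
    remaining = [c for c in DECK if c not in my_cards]
    pairs = list(combinations(remaining, 2))
    p = 1.0 / len(pairs)
    return {pair: p for pair in pairs}

beliefs = initial_beliefs({"As", "Kd"})
# 50 remaining cards -> C(50, 2) = 1225 equally likely opponent hands,
# and no possible hand contains the Ace of Spades I hold.
assert len(beliefs) == 1225
assert all("As" not in pair for pair in beliefs)
```

From here, play proceeds by reweighting this distribution as actions are observed; the uniform prior itself is forced by the rules of the game, as the answer above notes.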
SPEAKER_00
20:10 - 20:19
So in the beginning, your belief is basically the fact that it's a fair dealing of hands, but how do you start to adjust that belief?
SPEAKER_01
20:19 - 21:24
Well, that's where the beauty of game theory comes in. Nash equilibrium, which John Nash introduced in 1950, defines what rational play is when you have more than one player. These are pairs of strategies, where strategies are contingency plans, one for each player, such that neither player wants to deviate to a different strategy, given that the other doesn't deviate. But as a side effect, you get the beliefs, from Bayes' rule. So in these imperfect-information games, Nash equilibrium doesn't just define strategies; it also defines beliefs for both of us, at each state. They're called information sets: at each information set in the game, there's a set of different states that we might be in, but I don't know which one we're in. Nash equilibrium tells me exactly what the probability distribution over those real-world states is in my mind.
SPEAKER_00
21:24 - 21:28
How does Nash equilibrium give you that distribution?
SPEAKER_01
21:28 - 22:10
I'll do a simple example. You know the game rock paper scissors. We can draw it as player 1 moves first and then player 2 moves, but of course it's important that player 2 doesn't know what player 1 moved; otherwise player 2 would win every time. So we can draw that as an information set, where player 1 makes one of three moves first, and then there's an information set for player 2, so player 2 doesn't know which of those nodes we're at. But once we know the strategy for player 1, Nash equilibrium will say that you play one-third rock, one-third paper, one-third scissors. From that I can derive my beliefs at the information set: that they're one-third, one-third, one-third.
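The example above can be checked in a few lines: against the uniform (1/3, 1/3, 1/3) strategy, every pure response earns 0, so no deviation helps, which is what makes the uniform pair an equilibrium, and player 2's belief at the information set is just player 1's mixed strategy, by Bayes' rule.

```python
MOVES = ["rock", "paper", "scissors"]

# PAYOFF[(i, j)]: payoff to player 2 for playing j against player 1's i
PAYOFF = {
    ("rock", "rock"): 0,      ("rock", "paper"): 1,      ("rock", "scissors"): -1,
    ("paper", "rock"): -1,    ("paper", "paper"): 0,     ("paper", "scissors"): 1,
    ("scissors", "rock"): 1,  ("scissors", "paper"): -1, ("scissors", "scissors"): 0,
}

p1_strategy = {m: 1 / 3 for m in MOVES}

def expected_value(response):
    """Player 2's expected payoff for a pure response to p1_strategy."""
    return sum(p1_strategy[m] * PAYOFF[(m, response)] for m in MOVES)

# Every pure response has expected value 0: no profitable deviation.
assert all(abs(expected_value(r)) < 1e-12 for r in MOVES)

# Player 2's belief at the information set (which node am I at?) is the
# prior times player 1's strategy, normalized: uniform here.
total = sum(p1_strategy.values())
belief = {m: p1_strategy[m] / total for m in MOVES}
assert all(abs(b - 1 / 3) < 1e-12 for b in belief.values())
```

This is the whole mechanism in miniature: the equilibrium strategy pins down the distribution over nodes inside the information set, with no opponent data needed.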
SPEAKER_00
22:10 - 22:21
So Bayes' rule gives you that. But is that specific to a particular player, or is it something you quickly update based on observations?
SPEAKER_01
22:21 - 22:52
No, game theory isn't really player-specific. That's also why we don't need any data. We don't need any history of how these particular humans played in the past, or how any human or AI had played before. It's all about rationality. The AI just thinks about what a rational opponent would do, and what it would do if it were rational, and that's the idea of game theory. So it's really a data-free, opponent-free approach.
SPEAKER_00
22:52 - 22:56
So it comes from the design of the game as opposed to the design of the player.
SPEAKER_01
22:56 - 23:25
Exactly. There's no opponent modeling per se. I mean, we've done some work on combining opponent modeling with game theory, so you can exploit weak players even more, but that's another strand, and in Libratus we didn't turn that on, because I decided that these players were too good. When you start to exploit an opponent, you typically open yourself up to exploitation, and these guys have so few holes to exploit, and they're the world's leading experts in counter-exploitation. So I decided that we were not going to turn that stuff on.
SPEAKER_00
23:25 - 23:49
Actually, I saw a few of the papers on exploiting opponents. It sounded very interesting to explore. Do you think there's room for exploitation generally, outside of Libratus? Are there differences between people that could be exploited, maybe not just in poker, but in general interactions and negotiations, all these other domains that you're considering?
SPEAKER_01
23:50 - 25:03
Yeah, definitely. We've done some work on that, and I really like the work that hybridizes the two. So you figure out what a rational opponent would do, and by the way, that's safe in these two-player zero-sum games, because if the opponent does something irrational, yes, it might throw off my beliefs, but the amount the player can gain by throwing off my beliefs is always less than they lose by playing poorly. So it's safe. But still, if somebody's weak as a player, you might want to play differently to exploit them more. You can think about it this way: a game-theoretic strategy is unbeatable, but it doesn't maximally beat the opponent. So the winnings per hand might be better with a different strategy. And the hybrid is that you start from a game-theoretic approach, and then, as you gain data about the opponent in certain parts of the game tree, in those parts of the game tree you start to tweak your strategy more and more toward exploitation, while still staying fairly close to the game-theoretic strategy, so as to not open yourself up to exploitation too much.
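A toy sketch of that hybrid idea: start at the equilibrium strategy and, as observations of the opponent accumulate, shift toward the exploitative best response, while capping how far you drift. The cap, the weighting schedule, and the example strategies below are illustrative assumptions, not the published method.

```python
def hybrid_strategy(equilibrium, best_response, n_observations,
                    max_shift=0.5, halfway=100):
    """Mix equilibrium and exploitative strategies; the weight on
    exploitation grows with data but never exceeds max_shift."""
    w = max_shift * n_observations / (n_observations + halfway)
    return {a: (1 - w) * equilibrium[a] + w * best_response[a]
            for a in equilibrium}

eq = {"bet": 0.6, "check": 0.4}   # hypothetical equilibrium strategy
br = {"bet": 1.0, "check": 0.0}   # best response to an observed leak

with_no_data = hybrid_strategy(eq, br, 0)
with_data = hybrid_strategy(eq, br, 1000)
assert with_no_data == eq              # no data: pure equilibrium play
assert with_data["bet"] > eq["bet"]    # lots of data: tilted to exploit
# The cap keeps us within max_shift of equilibrium, limiting our own
# exploitability.
assert with_data["bet"] <= 0.5 * eq["bet"] + 0.5 * br["bet"] + 1e-9
```

The design intuition matches the answer above: the exploitation weight is earned by data, and the cap encodes "stay fairly close to the game-theoretic strategy."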
SPEAKER_00
25:03 - 25:15
How do you do that? Do you try to vary up strategies, make it unpredictable? Is it like tit-for-tat strategies in the prisoner's dilemma?
SPEAKER_01
25:17 - 25:29
Well, that's a repeated game, the simple prisoner's dilemma repeated. But even there, there's no proof that says that that's the best thing to do. Experimentally, though, it actually does do well.
SPEAKER_00
25:29 - 26:07
So what kinds of games are there, first of all? I don't know if this is something that you could just summarize. There are perfect-information games, where all the information is on the table. There are imperfect-information games. There are repeated games that you play over and over. There are zero-sum games; there are non-zero-sum games. Yeah. And there's a really important distinction you're making between two players versus more players. So what other games are there, and what's the difference, for example, between the two-player game and more players? What are the key differences?
SPEAKER_01
26:07 - 27:54
Right. So let me start from the basics. A repeated game is a game where the same exact game is played over and over. In these extensive-form games, think about the tree form, maybe with these information sets to represent imperfect information, you can have kinds of repetitive interactions; even repeated games are a special case of that, by the way. But the game doesn't have to be exactly the same. It's like in sourcing: yes, we're going to see the same supply base year to year, but what I'm buying is a little different every time, and the supply base is a little different every time, so it's not really repeated. A truly repeated game is actually very rare in the world, so repeated games are really a very coarse model of what's going on. Then, if you move up from simple repeated matrix games, not all the way to extensive-form games but in between, there are stochastic games, where you can think of it like these little matrix games, and when you take an action and your opponent takes an action, they determine not which game I'm going to next, but the distribution over next games where I might be going. So that's a stochastic game. So it's matrix games, repeated games, stochastic games, extensive-form games; that is from less to more general. And poker is an example of the last one, so it's really in the most general setting, extensive-form games, and that's what the AI community has been working on and being benchmarked on, with this heads-up no-limit Texas hold'em.
SPEAKER_00
27:54 - 27:58
Can you describe extensive-form games? What's the model there?
SPEAKER_01
27:58 - 29:30
Yeah, so it's really the tree form, if you're familiar with that. Like in chess, there's the search tree, versus a matrix; the matrix is called the matrix form, or bimatrix form, or normal-form game. Here you have the tree form, so you can actually do certain types of reasoning there that you lose the information to do when you go to the normal form. There's a certain form of equivalence: if you go from the tree form and say that every possible contingency plan is a strategy, then I can actually go back to the normal form, but I lose some information from the lack of sequentiality. Then the multiplayer-versus-two-player distinction is an important one. Two-player zero-sum games are conceptually easier and computationally easier. They're still huge, like this one, but they're conceptually and computationally easier in that, conceptually, you don't have to worry about which equilibrium the other guy is going to play when there are multiple, because any equilibrium strategy is a best response to any other equilibrium strategy. So I can play a different equilibrium from you, and we'll still get the right values of the game. That falls apart even with two players when you have general-sum games. Even without cooperation? Just even without cooperation. So there's a big gap from two-player zero-sum to two-player general-sum, or even to three-player zero-sum. That's a big gap, at least in theory.
SPEAKER_00
29:30 - 29:47
Can you, maybe not mathematically, provide the intuition for why it all falls apart with three or more players? It seems like you should still be able to have a Nash equilibrium that's instructive, that holds.
SPEAKER_01
29:47 - 30:38
Okay, so it is true that all finite games have a Nash equilibrium. This is what John Nash actually proved. So they do have a Nash equilibrium; that's not the problem. The problem is that there can be many, and then there's a question of which equilibrium to select. If you select your strategy from a different equilibrium and I select mine, then what does that mean? In these non-zero-sum games, we may both lose some benefit by, if you will, being simply stupid about it; we could actually both be better off if we did something else. Yes. And with three players, you get other problems too, like collusion. Maybe you and I can gang up on a third player, and we can do radically better by colluding. So there are lots of issues that come up there.
SPEAKER_00
30:38 - 31:21
So Noam Brown, the student you worked with on this, mentioned, I looked through the AMA and read it, that the ability of poker players to collaborate would make the game harder. He was asked the question, or both of you were asked, how would you make the game of poker beyond being solvable by current AI methods? And he said that there are not many ways of making poker more difficult, but collaboration or cooperation between players would make it extremely difficult. So can you provide the intuition behind why that is, if you agree with that idea?
SPEAKER_01
31:21 - 31:55
Yeah, so we've done a lot of work on coalitional games, and we actually have a paper here at NeurIPS on that, with my other student Gabriele Farina and some other collaborators. I actually just came back from the poster session where we presented it. So when you have collusion, it's a different problem, and it typically gets even harder. Even the game representations, some of them, don't really allow good computation, so we actually introduced a new game representation for that.
SPEAKER_00
31:55 - 32:14
Is that kind of cooperation part of the model? Do you have information about the fact that other players are cooperating, or is it just this chaos where nothing is known? So some things are unknown. Can you give an example of a collusion-type game? Is it just like bridge?
SPEAKER_01
32:14 - 33:01
Yeah, so think about bridge. When you and I are on a team, our payoffs are the same. The problem is that we can't talk. So when I get my cards, I can't whisper to you what my cards are; that would not be allowed. So we have to somehow coordinate our strategies ahead of time, and only ahead of time. And then there are certain signals we can talk about, but they have to be such that the other team also understands them. So that's an example where the coordination is already built into the rules of the game. But in many other situations, like auctions or negotiations or diplomatic relationships, or poker, it's not really built in, but it still can be very helpful for the colluders.
SPEAKER_00
33:02 - 33:28
You write somewhere that in negotiations you come to the table with a prior, like a strategy of what you're willing to do and not willing to do, those kinds of things. So, now moving away from poker, moving beyond poker into other applications like negotiations, how do you start applying this to other domains, even real-world domains that you've worked on?
SPEAKER_01
33:29 - 34:00
Yeah, I actually have two startup companies doing exactly that. One is called Strategic Machine, and that's for kinds of business applications: gaming, sports, all sorts of things like that, any applications of this to business and to sports and to gaming, to various types of things in finance, electricity markets, and so on. And the other is called Strategy Robot, where we are taking this to military, security, cybersecurity, and intelligence applications.
SPEAKER_00
34:00 - 34:14
I think you also worked a little bit in, how do you put it, advertisement, sort of ad-suggestion kinds of things. That's another company, Optimized Markets.
SPEAKER_01
34:14 - 34:23
That's much more about combinatorial markets and optimization-based technology. It's not using these game-theoretic reasoning technologies.
SPEAKER_00
34:23 - 34:44
Okay, so at a high level, how do you think about our ability to use game theory and its concepts to model human behavior? Do you think human behavior is amenable to this kind of modeling, outside of poker games? Where have you seen it done successfully in your work?
SPEAKER_01
34:44 - 35:03
I'm not sure the goal really is modeling humans. For example, if I'm playing a zero-sum game, I don't really care whether the opponent is actually following my model of rational behavior, because if they're not, that's even better for me.
SPEAKER_00
35:03 - 36:20
Right. So with these games, the prerequisite is that you formalize the interaction in some way that is amenable to analysis. And you've done this amazing work with mechanism design, designing games that have certain outcomes. But I'll tell you an example from my world of autonomous vehicles. We're studying pedestrians, and pedestrians and cars negotiate in this nonverbal communication. There's a weird dance of tension where pedestrians are basically saying, "I trust that you won't kill me," and so, as a jaywalker, I will step onto the road even though I'm breaking the law. And there's this tension, and we really don't know how to model that well when trying to model intent. People sometimes bring up ideas of game theory and so on. Do you think that aspect of human behavior can use these kinds of imperfect-information approaches to modeling? How do you start to attack a problem like that, when you don't even know how to design the game that describes the situation in order to solve it?
SPEAKER_01
36:20 - 37:37
Okay, so I haven't really thought about jaywalking, but one thing that I think could be a good application in autonomous vehicles is the following. Let's say that you have fleets of autonomous cars operated by different companies. So maybe here's the Waymo fleet and here's the Uber fleet. If you think about the rules of the road, they define certain legal constraints, but that still leaves a huge strategy space open. As a simple example, when cars merge, you know how humans merge: they slow down, look at each other, and try to merge. Wouldn't it be better if these situations were all pre-negotiated? Then we could actually merge at full speed, and we'd know: if this is the situation, this is how we do it, and it's all going to be faster. But there are way too many situations to negotiate manually, so you could use automated negotiation. This is the idea, at least: you could use automated negotiation to negotiate all of these situations, or many of them, in advance. And of course it might be that, hey, maybe you're not going to always let me go first. Maybe you'd say, okay, in these situations I'll let you go first, but in exchange you're going to let me go first in these other situations. So it's this huge combinatorial negotiation.
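To make the idea concrete, here is a minimal sketch of what pre-negotiating such situations could look like. The fleet names, situation names, and payoff numbers are all invented for illustration; the point is only that when fleets value right-of-way differently across recurring situations, an automated negotiation can assign each situation to the fleet that values it more, making the joint outcome better than an arbitrary split.

```python
# Toy sketch (all numbers invented): two fleets pre-negotiate who goes
# first in each recurring traffic situation.
# value_X[situation] = seconds saved by fleet X if it gets to go first.
value_A = {"merge_on_ramp": 5, "left_turn": 1, "lane_closure": 4}
value_B = {"merge_on_ramp": 2, "left_turn": 6, "lane_closure": 3}

# A simple negotiated outcome: each situation goes to the fleet that
# values it more, maximizing the total time saved across both fleets.
negotiated = {s: ("A" if value_A[s] >= value_B[s] else "B") for s in value_A}
total_saved = sum(
    value_A[s] if winner == "A" else value_B[s]
    for s, winner in negotiated.items()
)
# A real version would negotiate over packages of situations, possibly
# with side payments, which is the huge combinatorial negotiation
# described above.
```

Here fleet A concedes the left turn in exchange for priority in the other two situations, and both are better off than if either fleet always went first.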
SPEAKER_00
37:37 - 37:46
And in that example of merging, do you think there's room to model those situations as an imperfect-information game, or do you really want to consider it to be perfect information?
SPEAKER_01
37:46 - 37:49
No, that's a good question. Yeah, that's a good question.
SPEAKER_00
37:49 - 37:56
Do you pay a price for assuming that you don't know everything?
SPEAKER_01
37:56 - 38:11
Yeah, I don't know. It's certainly much easier; games with perfect information are much easier. So if you can get away with it, you should. But if the real situation is one of imperfect information, then you're going to have to deal with imperfect information.
SPEAKER_00
38:11 - 38:47
Great. So what lessons have you learned from the Annual Computer Poker Competition? It's an incredible accomplishment of AI. You look at the history of Deep Blue, AlphaGo, these kinds of moments when AI stepped up, an engineering effort and a scientific effort combined, to beat the best human players. So what do you take away from this whole experience? What have you learned about designing AI systems that play these kinds of games, and what does that mean for AI in general, for the future of AI development?
SPEAKER_01
38:47 - 40:58
Yeah, so that's a good question. There's so much to say about it. I do like this type of performance-oriented research, although in my group we go all the way from idea to theory, to experiments, to big system fielding, to commercialization. So we span that spectrum. But I think that in a lot of situations in AI, you really have to build the big systems and evaluate them at scale before you know what works and what doesn't. And we've seen that in the computational game theory community: there are a lot of techniques that look good in the small but then cease to look good in the large. And we've also seen that there are a lot of techniques that look superior in theory, and I really mean in terms of convergence rates, like first-order methods with better convergence rates than the CFR-based algorithms, yet the CFR-based algorithms are the fastest in practice. So it really tells me that you have to test these things in reality. The theory isn't tight enough, if you will, to tell you which algorithms are better than others, and you have to look at these things in the large, because any projections you do from the small can, at least in this domain, be very misleading. So that's from a science and engineering perspective. From a personal perspective, it's been just a wild experience. With the first Brains vs. AI man-machine poker competition that we organized, and there had been previous competitions for other poker games, by the way, but for heads-up no-limit this was the first, I probably became the most hated person in the world of poker.
And I didn't mean to. People were scared to play poker because there would be all these superhuman AIs running around taking their money, and, you know, all of that. The comments were just really aggressive, super aggressive. I got everything just short of death threats.
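The CFR-based algorithms mentioned above are built around regret matching. As a hedged illustration (the textbook idea, not the actual Libratus implementation), here is regret matching in self-play on rock-paper-scissors: each player shifts probability toward actions it regrets not having played, and the time-averaged strategies approach the uniform Nash equilibrium.

```python
# Regret matching self-play on rock-paper-scissors; a minimal sketch of
# the core idea behind CFR-style algorithms (not the Libratus code).
PAYOFF = [  # PAYOFF[a][b] = row player's payoff; rock, paper, scissors
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def regret_matching(iterations=5000):
    # Seed player 0 with a spurious regret for rock, so the dynamics
    # do not sit at the symmetric fixed point from the first step.
    regret = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strat_sum = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        # Current strategy: positive regrets, normalized (else uniform).
        strats = []
        for p in range(2):
            pos = [max(r, 0.0) for r in regret[p]]
            total = sum(pos)
            strats.append([x / total for x in pos] if total > 0 else [1 / 3] * 3)
            for a in range(3):
                strat_sum[p][a] += strats[p][a]
        # Expected payoff of each pure action against the opponent's mix.
        u0 = [sum(strats[1][b] * PAYOFF[a][b] for b in range(3)) for a in range(3)]
        u1 = [sum(strats[0][a] * -PAYOFF[a][b] for a in range(3)) for b in range(3)]
        ev0 = sum(strats[0][a] * u0[a] for a in range(3))
        ev1 = sum(strats[1][b] * u1[b] for b in range(3))
        for a in range(3):
            regret[0][a] += u0[a] - ev0
            regret[1][a] += u1[a] - ev1
    # The *average* strategies, not the last iterates, converge
    # toward the Nash equilibrium (1/3, 1/3, 1/3).
    return [[x / iterations for x in strat_sum[p]] for p in range(2)]
```

The current strategies cycle, but the averages settle near uniform, which is the guarantee that makes this family of algorithms work for equilibrium finding in zero-sum games.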
SPEAKER_00
40:58 - 41:11
Do you think the same was true for chess? Because they just completed the World Championship in chess, and humans have basically started ignoring the fact that there are AI systems that now outperform humans. They still enjoy the game, and it's still a beautiful game.
SPEAKER_01
41:12 - 41:50
That's right, and I think the same thing happens in poker. So I didn't think of myself as somebody who was going to kill the game, and I don't think I did. I've really learned to love this game. I wasn't a poker player before, but I've learned so many nuances about it from these AIs, and they've really changed how the game is played, by the way. They have these very Martian ways of playing poker, and the top humans are now incorporating those types of strategies into their own play. So if anything, to me, our work has made poker a richer, more interesting game for humans to play, not something that is going to steer humans away from it entirely.
SPEAKER_00
41:50 - 42:28
Just a quick comment on something you said: that's a little bit rare in academia sometimes. It's pretty brave to put your ideas to the test the way you described, saying that sometimes good ideas don't work when you actually try to apply them at scale. So where does that come from? What advice would you give to people? What drives you in that sense? Were you always this way? It takes a brave person, I guess is what I'm saying, to test their ideas, to see if this thing actually works against top human players, and so on.
SPEAKER_01
42:28 - 42:39
I don't know about brave, but it takes a lot of work. It takes a lot of work and a lot of time to make something big and to organize an event and things like that.
SPEAKER_00
42:39 - 42:48
What drives you in that effort? Because you could still, I would argue, get a best paper award at NIPS, as you did in '17, without doing this. That's right. Yes.
SPEAKER_01
42:51 - 43:25
So in general, I believe it's very important to do things in the real world and at scale, and that's really where the proof is; the proof is in the pudding, if you will. In this particular case, it was also kind of a competition between different groups for many years as to who could be the first to beat the top humans at heads-up no-limit Texas Hold'em. So it became kind of a competition over who would get there first.
SPEAKER_00
43:26 - 44:31
Yeah, so a little friendly competition can do wonders for progress. Yes. So, the topic of mechanism design, which is really interesting, is also kind of new to me, except as an observer of, I don't know, politics; I'm an observer of mechanisms. But you have this paper on automated mechanism design that I quickly read. So mechanism design is designing the rules of the game so that you get a certain desirable outcome, and you have this work on doing so in an automated fashion, as opposed to fine-tuning it by hand. What have you learned from those efforts? If you look, say, at something complex like our political system, can we design our political system, in an automated fashion, to have outcomes that we want? Can we design something like traffic lights to be smart, so that we get outcomes that we want? What are the lessons you draw from that work?
SPEAKER_01
44:31 - 44:59
Yeah, so I still very much believe in the automated mechanism design direction. But it's not a panacea. There are impossibility results in mechanism design saying that there is no mechanism that accomplishes objective X in class C. So there's no way, using any mechanism design tools, manual or automated, to do certain things within such a class.
SPEAKER_00
44:59 - 45:04
Can you describe that again? So, meaning it's impossible to achieve that objective? Yeah.
SPEAKER_01
45:04 - 46:29
Yes, impossible. So these are not statements about human ingenuity, that somebody might come up with something smart. These are proofs that if you want to accomplish properties X in class C, that is not doable with any mechanism. The good thing about automated mechanism design is that we're not really designing for a class; we're designing for a specific setting at a time. So even if there's an impossibility result for the whole class, it doesn't mean that all of the cases in the class are impossible; it just means that some of the cases are impossible. So we can actually carve these islands of possibility within these known impossible classes. And we've actually done that. One of the famous results in mechanism design is the Myerson–Satterthwaite theorem, by Roger Myerson and Mark Satterthwaite, from 1983. It's an impossibility of efficient trade under imperfect information. We showed that in many settings you can avoid that and get efficient trade anyway, depending on how you design the game. And of course, it doesn't in any way contradict the impossibility result; that's still there, but it just finds spots within this impossible class where, in those spots, you don't have the impossibility.
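As a hedged numerical illustration of the Myerson–Satterthwaite flavor of inefficiency (this is the classic textbook bilateral-trade setup with uniform values, not the specific mechanisms from the work discussed here): in the linear equilibrium of the split-the-difference double auction, trade happens only when the buyer's value exceeds the seller's cost by at least 1/4, so some efficient trades are lost.

```python
import random

def simulate(n=200_000, seed=0):
    """Monte Carlo for bilateral trade: buyer value v and seller cost c
    drawn uniformly from [0, 1] (illustrative textbook setup)."""
    rng = random.Random(seed)
    first_best = 0.0   # gains if trade happens whenever v > c
    realized = 0.0     # gains under the 1/2 double auction's equilibrium
    for _ in range(n):
        v, c = rng.random(), rng.random()
        gain = v - c
        if gain > 0:
            first_best += gain
        # Linear Bayes-Nash equilibrium of the split-the-difference
        # double auction: trade occurs iff bid >= ask, i.e. v >= c + 1/4.
        bid = (2 / 3) * v + 1 / 12
        ask = (2 / 3) * c + 1 / 4
        if bid >= ask:
            realized += gain
    return first_best / n, realized / n

# Theory: first-best gains are 1/6 and equilibrium gains are 9/64, so
# the mechanism captures 27/32 of the available surplus, about 84%.
```

The 16% shortfall is the impossibility showing up concretely; carving out settings where that gap can be closed is what the islands-of-possibility results are about.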
SPEAKER_00
46:29 - 47:05
Sorry if I'm going a bit philosophical, but what lessons do you draw toward, like I mentioned, politics or human interaction? Designing mechanisms not just for these kinds of trading or auction settings, purely formal games, but for human interaction, like a political system. Do you think it's applicable to politics, to business, to negotiations, these kinds of things, designing rules that have certain outcomes?
SPEAKER_01
47:05 - 47:07
Yeah, yeah, I do think so.
SPEAKER_00
47:07 - 47:09
Have you seen that successfully done?
SPEAKER_01
47:10 - 48:30
It hasn't really... oh, do you mean mechanism design or automated mechanism design? So mechanism design itself has had fairly limited success so far. There are certain cases, but most real-world situations are actually not sound from a mechanism design perspective, even in those cases where they've been designed by very knowledgeable mechanism design people. People are typically just taking some insights from the theory and applying those insights in the real world, rather than applying the mechanisms directly. One famous example is the FCC spectrum auctions. I've also had a small role in those, and very good economists who know game theory have been working on them. Yet the rules that are designed in practice there are such that bidding truthfully is not the best strategy. Usually in mechanism design, we try to make things easy for the participants, so that telling the truth is the best strategy. But even in these very high-stakes auctions, where you have tens of billions of dollars' worth of spectrum being auctioned, truth-telling is not the best strategy. And by the way, nobody even knows a single optimal bidding strategy for those auctions.
SPEAKER_00
48:30 - 48:34
What's the challenge of coming up with an optimal one? Is it that there are a lot of players and imperfect information?
SPEAKER_01
48:34 - 48:46
It's not that there are a lot of players, but there are many items for sale. And these mechanisms are such that even with just two items, or one item, bidding truthfully wouldn't be the best strategy.
SPEAKER_00
48:47 - 49:17
If you look at the history of AI, it's marked by seminal events: AlphaGo beating the world champion human Go player. I would put Libratus winning at heads-up no-limit Hold'em as one such event. What do you think is the next such event, whether in your own work or in the AI community broadly, that might be out there and would surprise the world?
SPEAKER_01
49:18 - 50:23
So that's a great question, and I don't really know the answer. In terms of game solving, heads-up no-limit Texas Hold'em really was the one remaining widely agreed-upon benchmark. So that was the big milestone. Now, are there other things? Yeah, certainly there are, but there is not one that the community has really focused on. So what could be other things? There are groups working on StarCraft; there are groups working on Dota 2; these are video games. Or you could have, say, Diplomacy or Hanabi, things like that. These are recreational games, but none of them is really acknowledged as the main next challenge problem, the way chess or Go or heads-up no-limit Texas Hold'em was. So I don't really know what is, or what will be, the next benchmark in the game-solving space. I hope there will be a next benchmark, because having different groups working on the same problem really drove these application-independent techniques forward very quickly over ten years.
SPEAKER_00
50:23 - 50:33
Is there an open problem that excites you as you start moving away from recreational games into real-world games, like, say, stock market trading?
SPEAKER_01
50:33 - 50:54
Yeah, so that's kind of where I am now. I'm probably not going to work as hard on these recreational benchmarks. I'm doing two startups on game-solving technology, Strategic Machine and Strategy Robot, and we're really interested in pushing this stuff into practice.
SPEAKER_00
50:54 - 51:18
What do you think would be a really powerful, surprising result, if you can say? I mean, five or ten years from now, something you would say is statistically not very likely, but if there were a breakthrough, what would it achieve?
SPEAKER_01
51:18 - 52:14
Yeah, so overall, we're in a very different situation in game theory than we are in, let's say, machine learning. In machine learning, it's a fairly mature technology: it's very broadly applied and has proven success in the real world. In game solving, there are almost no applications yet. We have just become superhuman, which in machine learning, you could argue, happened in the '90s, if not earlier, at least on certain complex supervised learning applications. Now, I know you're not asking about it this way, you're asking about the technology breakthrough, but I think the big breakthrough would be to show that, hey, maybe most of, let's say, military planning or most of business strategy will actually be done using computational game theory. That's what I would like to see as the next five- or ten-year goal.
SPEAKER_00
52:14 - 52:43
Maybe you can explain this to me; forgive me if it's an obvious question, but machine learning methods, neural networks, suffer from not being transparent, not being explainable. With game-theoretic methods, say a Nash equilibrium, when you see the different solutions, in the context of military operations, once you see the strategies, do they make sense? Are they explainable? Or do they suffer from the same problems that neural networks do?
SPEAKER_01
52:44 - 53:36
So that's a good question. I would say a little bit yes and no. What I mean by that is that these game-theoretic strategies, let's say a Nash equilibrium, have provable properties. So it's unlike, let's say, deep learning, where you kind of cross your fingers and hope it'll work, and then after the fact, when you have the weights, you're still crossing your fingers hoping it will work. Here you know that the solution quality is there; there are provable solution-quality guarantees. Now, that doesn't necessarily mean that the strategies are human-understandable. That's a whole other problem. So I think deep learning and computational game theory are in the same boat in that sense; both are difficult to understand. But at least the game theory techniques have these guarantees of security.
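A tiny worked example of such a guarantee, using rock-paper-scissors rather than poker for brevity: the exploitability of a strategy is how much a best-responding opponent wins against it per game. The uniform Nash strategy has exploitability zero, which is exactly the provable security property, while any deviation from it can be exploited.

```python
# Row player's payoff matrix for rock-paper-scissors (zero-sum).
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

def exploitability(strategy):
    """Value a best-responding opponent gets per game against `strategy`.
    Zero means the strategy is unexploitable: the Nash guarantee."""
    return max(
        # The opponent's payoff is the negation of ours (zero-sum).
        -sum(p * PAYOFF[a][o] for a, p in enumerate(strategy))
        for o in range(3)
    )

nash = [1 / 3, 1 / 3, 1 / 3]
biased = [0.5, 0.25, 0.25]   # over-plays rock
# exploitability(nash) is 0.0; exploitability(biased) is 0.25,
# achieved by an opponent who simply always plays paper.
```

This number is computable before the strategy is ever deployed, which is the contrast with deep learning drawn above, though it says nothing about whether a human can read the strategy.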
SPEAKER_00
53:36 - 53:49
Do you see business operations, strategic operations, or even military strategies in the future being, at least as strong candidates, proposed by automated systems? Do you see that?
SPEAKER_01
53:49 - 53:55
Yeah, I do. But that's more of a belief than a substantiated fact.
SPEAKER_00
53:56 - 54:37
Depending on where you land on optimism or pessimism, that's, to me, an exciting future, especially if there are provable things in terms of optimality. So looking into the future, there are a few folks worried, especially when you look at the game of poker being solved, which is probably one of the last benchmarks in terms of games. They worry about the future and the existential threats of artificial intelligence, the negative impact, in whatever form, on society. Is that something that concerns you, or are you more optimistic about the positive impacts of AI?
SPEAKER_01
54:37 - 55:50
Well, I am much more optimistic about the positive impacts. Just in my own work, with what we've done so far: we run the nationwide kidney exchange, and hundreds of people are walking around alive today who wouldn't be otherwise. And it's increased employment. You have a lot of people now running kidney exchanges and, at transplant centers interacting with the kidney exchange, extra surgeons, nurses, anesthesiologists, hospitals, all of that. So employment is increasing from that, and the world is becoming a better place. Another example is combinatorial sourcing auctions. We ran 800 large-scale combinatorial sourcing auctions from 2001 to 2010 in a previous startup of mine called CombineNet, and we increased supply chain efficiency on that $60 billion of spend by 12.6%. So that's over $6 billion of efficiency improvement in the world. And this is not shifting value from somebody to somebody else; it's pure efficiency improvement, like, in trucking, less empty driving. So there's less waste, less carbon footprint, and so on.
SPEAKER_00
55:50 - 55:59
Those are huge positive impacts in the near term. But let's stay on this for a little longer, because I think game theory has a role to play here.
SPEAKER_01
55:59 - 56:10
Well, let me actually come back with one more thing on that. I think AI is also going to make the world much safer. That's another aspect that often gets overlooked.
SPEAKER_00
56:10 - 56:54
Let me ask this question then; maybe you can speak to the "safer" part. I talked to Max Tegmark and Stuart Russell, who are very concerned about existential threats of AI, and often the concern is about value misalignment: AI systems operating toward goals that are not the same as those of human civilization, of human beings. It seems like game theory has a role to play there, to make sure the values are aligned with human beings. I don't know if that's how you think about it; if not, how do you think AI might help with this problem? How might AI make the world safer?
SPEAKER_01
56:56 - 58:04
Yeah, I think this value misalignment is a fairly theoretical worry, and I haven't really seen it, because I do a lot of real applications, and I don't see it anywhere. The closest I've seen was really the following type of mental exercise. I had this argument in the late '80s when we were building this transportation optimization system, and somebody had heard that it's a good idea to have high utilization of assets. So they told me, hey, why don't you put that in as the objective? And we didn't even put it in as an objective, because I just showed them that if you had that as your objective, the solution would be to load your trucks full and drive in circles. Nothing would ever get delivered, and you'd have 100% utilization. So yeah, I know this phenomenon; I've known it for over 30 years, but I've never seen it actually be a problem in reality. And yes, if you have the wrong objective, the AI will optimize it to the hilt, and it's going to hurt more than some human solving it in a half-baked way, with some human insight too. But I just haven't seen that materialize in practice.
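The truck anecdote can be captured in a few lines. This is an invented toy, not the actual 1980s system: with utilization as the objective, the degenerate plan wins; with deliveries as the objective, the sensible plan wins.

```python
# Toy objective-misspecification example (all numbers invented).
# Each plan: (name, hours driving with a full truck, deliveries made)
plans = [
    ("load, deliver, return empty", 5, 8),
    ("load up and drive in circles", 10, 0),
]

SHIFT_HOURS = 10

def utilization(plan):
    _, loaded_hours, _ = plan
    return loaded_hours / SHIFT_HOURS  # fraction of shift spent loaded

def deliveries(plan):
    return plan[2]

# Optimizing the proxy objective picks the useless plan; optimizing
# what we actually care about picks the sensible one.
best_by_utilization = max(plans, key=utilization)
best_by_deliveries = max(plans, key=deliveries)
```

The optimizer is not misbehaving in either case; it faithfully maximizes whatever objective it is given, which is the whole point of the anecdote.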
SPEAKER_00
58:05 - 59:00
There's this gap, which you've actually put your finger on very clearly just now, between theory and reality that's very difficult to put into words: between what you can theoretically imagine, the worst possible cases, or at least bad cases, and what usually happens in reality. So, for example, and maybe it's something you can comment on, I grew up in the Soviet Union, and there are currently around 10,000 nuclear weapons in the world. For many decades, it has been theoretically surprising to me that nuclear war has not broken out. Do you think about this aspect from a game-theoretic perspective in general? Why is that true? Why, when in theory you can see how things could go terribly wrong, have they somehow not?
SPEAKER_01
59:00 - 01:00:07
Yeah, so I do think about that a lot. I think the two biggest threats we're facing as mankind are climate change and nuclear war; those are my main two worries. And I've thought about trying to do something about climate change twice. Two of my startups have actually commissioned studies of what we could do on those things, and we didn't really find a sweet spot, but I'm still keeping an eye out, in case there's something where we could actually provide a market solution, an optimization solution, or some other technology solution. For example, pollution credit markets were what we were looking at then, and it was much more a lack of political will that made those markets unsuccessful, rather than bad market design. So I could go in and make a better market design, but that wouldn't really move the needle very much if there's no political will; in the US, the Chicago market, at least, was just shut down, and so on. Then it doesn't really matter how great the market design was.
SPEAKER_00
01:00:07 - 01:00:31
So on the nuclear side: global warming is a more slowly encroaching problem, while nuclear weapons have been here for a while; it's an obvious problem that's just been sitting there. How do you think about the mechanism design there that has made everything seem stable? And are you still extremely worried?
SPEAKER_01
01:00:31 - 01:01:14
I am still extremely worried. So you probably know the simple game theory of MAD, mutually assured destruction. It doesn't require any computation; with small matrices you can actually convince yourself that the game is such that nobody wants to initiate. But that's a very coarse-grained analysis, and it really works in a situation where you have two superpowers, or a small number of superpowers. Now things are very different. You have smaller nukes, so the threshold for initiating is lower, and you have smaller countries and non-nation actors who may get a nuke, and so on. So I think it's riskier now than maybe ever before.
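The "small matrices" analysis can be written out explicitly. The payoff numbers below are invented placeholders; the only structural assumption is that retaliation is assured, so any first strike leads to mutual destruction. Under that assumption, mutual restraint is an equilibrium: neither side gains by initiating.

```python
# Coarse-grained MAD sketch (payoffs invented; -100 = mutual destruction).
HOLD, STRIKE = "hold", "strike"
ACTIONS = (HOLD, STRIKE)

# payoff[(row, col)] = (row player's payoff, column player's payoff).
# Retaliation is assumed assured, so any strike destroys both sides.
payoff = {
    (HOLD, HOLD): (0, 0),
    (HOLD, STRIKE): (-100, -100),
    (STRIKE, HOLD): (-100, -100),
    (STRIKE, STRIKE): (-100, -100),
}

def is_equilibrium(a, b):
    """True if neither player gains by unilaterally deviating from (a, b)."""
    row_ok = all(payoff[(a, b)][0] >= payoff[(d, b)][0] for d in ACTIONS)
    col_ok = all(payoff[(a, b)][1] >= payoff[(a, d)][1] for d in ACTIONS)
    return row_ok and col_ok

# Mutual restraint is stable; a unilateral first strike never is.
```

The coarse-grained caveat is visible in the model itself: with more players, smaller weapons, or retaliation no longer assured, the payoff table changes and the stability conclusion no longer follows.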
SPEAKER_00
01:01:15 - 01:01:37
And which idea or application of AI, you've talked about it a little bit, is most exciting to you right now? I mean, you're here at NIPS, NeurIPS now. You have a few excellent pieces of work, but what are you thinking about for the future? What's the most exciting thing, or one of the exciting things?
SPEAKER_01
01:01:37 - 01:02:17
The number one thing for me right now is coming up with these scalable techniques for game solving and applying them in the real world. I'm still very interested in market design as well, and we're doing that in Optimized Markets, but what I'm most interested in, number one right now, is Strategic Machine and Strategy Robot: getting that technology out there and seeing, as you are in the trenches doing applications, what technology gaps still need to be filled. It's so hard to just put your feet on the table and imagine what needs to be done; when you're actually doing real applications, the applications tell you what needs to be done, and I really enjoy that interaction.
SPEAKER_00
01:02:17 - 01:03:06
Is it a challenging process to take some of the state of the art you're working on and have the various players in industry, or the military, or people who could really benefit from it, actually use it? What's that process like? In autonomous vehicles, we work with automotive companies, and in many ways they're a little bit old-fashioned. It's difficult: they really want to use this technology, and there would clearly be a significant benefit, but the systems aren't quite in place to easily have it integrated, in terms of data, in terms of compute, in terms of all these kinds of things. So is that one of the bigger challenges that you're facing, and how do you tackle that challenge?
SPEAKER_01
01:03:06 - 01:03:39
Yeah, I think that's always a challenge. It's slowness and inertia, really, of "let's do things the way we've always done them." You just have to find the internal champions at the customer who understand that, hey, things can't be the same way in the future, otherwise bad things are going to happen. And in autonomous vehicles, it's actually very interesting that the carmakers, who are very traditional, are doing it, but at the same time you have tech companies who had nothing to do with cars or transportation, like Google and Baidu, really pushing on autonomous cars. I find that fascinating.
SPEAKER_00
01:03:39 - 01:04:05
Clearly, you're super excited about these ideas actually having an impact in the world. In terms of the technology, in terms of ideas and research, are there directions that you're also excited about, whether that's some of the approaches you talked about for imperfect-information games, or applying deep learning to some of these problems? Is there something that excites you on the research side of things?
SPEAKER_01
01:04:05 - 01:04:48
Yeah, lots of different things in game solving. Solving even bigger games; games where you have hidden player actions as well. Poker is a game where really the chance actions are hidden, or some of them are hidden, but the player actions are public. Then multiplayer games of various sorts, collusion, opponent exploitation, and even longer games: games that basically go on forever, but are not repeated. So think extensive-form games that go on forever. What would that even look like? How do you represent that? How do you solve it?
SPEAKER_00
01:04:48 - 01:04:52
What's an example of a game like that? Is this some of the stochastic games that you imagine?
SPEAKER_01
01:04:52 - 01:06:02
Let's say business strategy: not just modeling a particular interaction, but thinking about the business from here to eternity. Or let's say military strategy: it's not like war is going to go away. How do you think about military strategy that goes on forever? How do you even model that? How do you know whether a move somebody made was good? And so on. So that's one direction. I'm also very interested in learning much more scalable techniques for integer programming. We had a nice ICML paper this summer on that: the first automated algorithm configuration paper that has theoretical generalization guarantees. So if I see this many training examples and I tune my algorithm in this way, it's going to have good performance on the real distribution, which I have not seen. It's kind of interesting: algorithm configuration has been going on seriously for at least 17 years now, and there had not been any generalization theory before.
SPEAKER_00
01:06:02 - 01:06:10
Well, this is really exciting, and it's a huge honor to talk to you. Thank you so much, Tuomas. Thank you for bringing Libratus to the world, and for all the great work you're doing.
SPEAKER_01
01:06:10 - 01:06:12
Well, thank you very much. It's been fun. Good questions.