Transcript for #94 – Ilya Sutskever: Deep Learning

SPEAKER_01

00:00 - 03:09

The following is a conversation with Ilya Sutskever, co-founder and chief scientist of OpenAI, one of the most cited computer scientists in history with over 165,000 citations, and to me, one of the most brilliant and insightful minds ever in the field of deep learning. There are very few people in this world who I would rather talk to and brainstorm with about deep learning, intelligence, and life in general than Ilya, on and off the mic. This was an honor and a pleasure. This conversation was recorded before the outbreak of the pandemic. For everyone feeling the medical, psychological, and financial burden of this crisis, I'm sending love your way. Stay strong. We're in this together. We'll beat this thing. This is the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at lexfridman, spelled F-R-I-D-M-A-N. As usual, I'll do a few minutes of ads now and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience. This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Since Cash App allows you to buy Bitcoin, let me mention that cryptocurrency in the context of the history of money is fascinating. I recommend The Ascent of Money as a great book on this history. Both the book and audiobook are great. Debits and credits on ledgers started around 30,000 years ago. The US dollar was created over 200 years ago, and Bitcoin, the first decentralized cryptocurrency, was released just over 10 years ago. So given that history, cryptocurrency is still very much in its early days of development, but it's still aiming to, and just might, redefine the nature of money. So again, if you get Cash App from the App Store or Google Play and use the code LEXPODCAST, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping advance robotics and STEM education for young people around the world. And now, here's my conversation with Ilya Sutskever. You were one of the three authors, with Alex Krizhevsky and Geoff Hinton, of the famed AlexNet paper that is arguably the paper that marked the big catalytic moment that launched the deep learning revolution. Take us back to that time. What was your intuition about neural networks, about the representational power of neural networks? And maybe you could mention how that evolved over the next few years, up to today, over the 10 years.

SPEAKER_00

03:10 - 04:50

Yeah, I can answer that question. At some point in about 2010 or 2011, I connected two facts in my mind. Basically, the realization was this. At some point, we realized that we can train very large, I shouldn't say very large, they're tiny by today's standards, but large and deep neural networks end to end with backpropagation. At some point, different people obtained this result; I obtained this result. The first moment in which I realized that deep neural networks are powerful was when James Martens invented the Hessian-free optimizer in 2010 and trained a 10-layer neural network end to end without pre-training, from scratch. And when that happened, I thought, this is it. Because if you can train a big neural network, a big neural network can represent a very complicated function. Because if you have a neural network with 10 layers, it's as though you allow the human brain to run for some number of milliseconds; neuron firings are slow. And so in maybe 100 milliseconds, your neurons only fire 10 times, so it's also kind of like 10 layers. And in 100 milliseconds, you can perfectly recognize any object. So I thought, so I already had the idea then, that we need to train a very big neural network on lots of supervised data, and then it must succeed, because we can find the best neural network. And then there's also theory that if you have more data than parameters, you won't overfit. Today we know that actually this theory is very incomplete and you won't overfit even if you have less data than parameters, but definitely, if you have more data than parameters, you won't overfit.

SPEAKER_01

04:50 - 05:03

So the fact that neural networks were heavily overparameterized wasn't discouraging to you? You were thinking about the theory that the huge number of parameters, the fact that there's a huge number of parameters, is okay. It's going to be okay.

SPEAKER_00

05:03 - 05:39

I mean, there was some evidence before that it was okay. But the theory, the theory was that if you had a big data set and a big neural net, it is going to work. The overparameterization just didn't really figure much as a problem. I thought, well, with images you're just going to add some data augmentation and it's going to be okay. So where was any doubt coming from? The main doubt was, can we train a big enough neural net with backpropagation, if you really have enough compute? Backpropagation I thought would work; the thing which wasn't clear was whether there would be enough compute to get a very convincing result. And then at some point Alex Krizhevsky wrote these insanely fast CUDA kernels for training convolutional neural nets, and that was, bam, let's do this. Let's get ImageNet, and it's going to be the greatest thing.

SPEAKER_01

05:40 - 06:11

Was most of your intuition from empirical results, by you and by others? Like, just actually demonstrating that a piece of program can train a 10-layer neural network? Or was there some pen-and-paper, or marker-and-whiteboard, thinking intuition? Because you just connected a 10-layer large neural network to the brain. So you just mentioned the brain. So in your intuition about neural networks, does the human brain come into play as an intuition builder?

SPEAKER_00

06:11 - 07:18

Definitely. I mean, you know, you gotta be precise with these analogies between artificial neural networks and the brain. But there is no question that the brain is a huge source of intuition and inspiration for deep learning researchers, all the way from Rosenblatt in the 60s. Like, if you look at the whole idea of a neural network, it is directly inspired by the brain. You had people like McCulloch and Pitts who were saying, hey, you got these neurons in the brain. And hey, we recently learned about the computer and automata. Can we use some ideas from the computer and automata to design some kind of computational object that's going to be simple, computational, and kind of like the brain? And they invented the neuron. So they were inspired by it back then. Then you had the convolutional neural network from Fukushima, and then later Yann LeCun, who said, hey, if you limit the receptive fields of a neural network, it's going to be especially suitable for images, and that turned out to be true. So there was a very small number of examples where analogies to the brain were successful. And yeah, I thought, well, probably an artificial neuron is not that different from the brain if you squint hard enough. So let's just assume it is and roll with it.

SPEAKER_01

07:18 - 07:42

So we're now at a time where deep learning is very successful. So let us squint less, let's open our eyes and say, what to you is an interesting difference between the human brain and artificial neural networks? Now, I know you're probably not an expert, neither a neuroscientist nor a biologist, but loosely speaking, what's the difference between the human brain and artificial neural networks that's interesting to you for the next decade or two?

SPEAKER_00

07:43 - 08:17

That's a good question to ask. What is an interesting difference between the brain and our artificial neural networks? So I feel like today, artificial neural networks... we all agree that there are certain dimensions in which the human brain vastly outperforms our models. But I also think that there are some ways in which our artificial neural networks have a number of very important advantages over the brain. Looking at the advantages versus disadvantages is a good way to figure out what is the important difference. So the brain uses spikes, which may or may not be important.

SPEAKER_01

08:17 - 08:24

Yes, that's a really interesting question. Do you think it's important or not? That's one big architectural difference between artificial neural networks and the brain.

SPEAKER_00

08:25 - 09:32

It's hard to tell, but my prior is not very high, and I can say why. You know, there are people who are interested in spiking neural networks. And basically, what they figured out is that they need to simulate the non-spiking neural networks in spikes, and that's how they're going to make them work. If you don't simulate the non-spiking neural networks in spikes, it's not going to work, because the question is, why should it work? And that connects to questions around backpropagation and questions around deep learning. You got this giant neural network. Why should it work at all? Why should the learning rule work at all? It's not self-evident, especially if you just started in the field and you read the very early papers. You can say, hey, people are saying, let's build neural networks. That's a great idea because the brain is a neural network, so it would be useful to build neural networks. Now, let's figure out how to train them. It should be possible to train them properly, but how? And so the big idea is the cost function. That's the big idea. The cost function is a way of measuring the performance of the system according to some measure.
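To make the cost function idea concrete, here is a minimal sketch in Python with NumPy; the toy data, single-weight model, and learning rate are illustrative choices, not anything from the conversation. A single scalar cost measures the system's performance, and gradient descent repeatedly nudges the parameter to reduce it.

```python
import numpy as np

# Toy data: y is roughly 3 * x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y = 3.0 * x + 0.1 * rng.normal(size=(100, 1))

w = np.zeros((1, 1))  # the single parameter of a one-weight "network"
lr = 0.1              # learning rate

for step in range(200):
    pred = x @ w                              # forward pass
    cost = np.mean((pred - y) ** 2)           # the cost function: one scalar measure of performance
    grad = 2.0 * x.T @ (pred - y) / len(x)    # gradient of the cost with respect to the parameter
    w -= lr * grad                            # gradient descent step that lowers the cost

print(cost, w[0, 0])  # cost near zero, w near 3.0
```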

SPEAKER_01

09:32 - 09:50

By the way, that is a big... actually, let me think. Is that a difficult idea to arrive at, and how big of an idea is it, that there's a single cost function? Sorry, let me take a pause. Is supervised learning a difficult concept to come to?

SPEAKER_00

09:50 - 09:50

I don't know.

SPEAKER_01

09:51 - 10:15

All concepts are very easy in retrospect. Yeah, that's why it seems trivial now. But the reason I asked that, and we'll talk about it, is: are there other things? Are there things that don't necessarily have a cost function, maybe have many cost functions, or maybe have dynamic cost functions, or maybe totally different kinds of architectures? Because we have to think like that in order to arrive at something new, right?

SPEAKER_00

10:15 - 10:46

So the only, so the good examples of things that don't have clear cost functions are GANs. In a GAN, you have a game. So instead of thinking of a cost function that you want to optimize, where you know that you have an algorithm, gradient descent, which will optimize the cost function, and then you can reason about the behavior of your system in terms of what it optimizes; with a game, you say, I have a game, and I'll reason about the behavior of the system in terms of the equilibrium of the game. But it's all about coming up with these mathematical objects that help us reason about the behavior of our system.
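For reference, the original GAN objective is usually written as a two-player minimax game rather than a single cost to be minimized; each player has its own objective, and the interesting object is the equilibrium:

$$\min_{G}\;\max_{D}\;\;\mathbb{E}_{x\sim p_{\text{data}}}\big[\log D(x)\big]\;+\;\mathbb{E}_{z\sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]$$

where $D$ is the discriminator, $G$ the generator, and $p_z$ the noise distribution.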

SPEAKER_01

10:47 - 10:54

Right, that's interesting. So for a GAN, there isn't one; it's kind of, the cost function is emergent from the competition.

SPEAKER_00

10:54 - 11:14

I don't know if it has a cost function. I don't know if it's meaningful to talk about the cost function of a GAN. It's kind of like the cost function of biological evolution, or the cost function of the economy. You can talk about regions towards which it will go, but I don't think the cost function analogy is the most useful.

SPEAKER_01

11:14 - 11:48

So evolution doesn't... that's really interesting. So if evolution doesn't really have a cost function, like a cost function based on something akin to our mathematical conception of a cost function, then do you think cost functions in deep learning are holding us back? You just kind of mentioned that the cost function is a nice, profound idea. Do you think that's a good idea? Do you think it's an idea we'll go past? So self-play starts to touch on that a little bit in reinforcement learning systems.

SPEAKER_00

11:48 - 12:20

That's right. Self-play, and also ideas around exploration, where you're trying to take actions that surprise a predictor. I'm a big fan of cost functions. I think cost functions are great and they serve us really well, and I think that whenever we can do things with cost functions, we should. And, you know, maybe there is a chance that we will come up with some yet another profound way of looking at things that will involve cost functions in a less central way. But I don't know, I think cost functions are... I mean, I would not bet against cost functions.

SPEAKER_01

12:20 - 12:31

Are there other things about the brain that pop into your mind that might be different and interesting for us to consider in designing artificial neural networks? We talked about spiking a little bit.

SPEAKER_00

12:33 - 12:44

I mean, one thing which may potentially be useful: I think neuroscientists figured out something about the learning rule of the brain; they're talking about spike-timing-dependent plasticity, and it would be nice if some people would just study that in simulation.

SPEAKER_01

12:45 - 12:47

Sorry, spike-timing-dependent plasticity?

SPEAKER_00

12:47 - 13:16

Yeah, that's right, STDP. It's a particular learning rule that uses spike timing to figure out how to determine how to update the synapses. So it's kind of like, if a synapse fires into the neuron before the neuron fires, then it strengthens the synapse, and if the synapse fires into the neuron shortly after the neuron fires, then it weakens the synapse. Something along this line. I'm 90% sure it's right, so if I said something wrong here, don't get too angry.
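For reference, a minimal sketch of the pair-based STDP window being described here, in Python; the exponential form, amplitudes, and time constant are illustrative textbook-style choices, not something stated in the conversation.

```python
import numpy as np

def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Weight change as a function of pre/post spike times (milliseconds).

    If the presynaptic spike arrives before the postsynaptic spike
    (t_pre < t_post), the synapse is strengthened; if it arrives just
    after, it is weakened. The effect decays exponentially with the
    time difference.
    """
    dt = t_post - t_pre
    if dt > 0:                          # pre before post: potentiation
        return a_plus * np.exp(-dt / tau)
    return -a_minus * np.exp(dt / tau)  # pre after post: depression

print(stdp_delta_w(t_pre=0.0, t_post=5.0))   # positive: strengthen
print(stdp_delta_w(t_pre=5.0, t_post=0.0))   # negative: weaken
```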

SPEAKER_01

13:16 - 14:13

But you sounded really good saying it. But the timing, that's one thing that's missing; the temporal dynamics is not captured. I think that's like a fundamental property of the brain, the timing of the signals. Well, recurrent neural networks... but you think of that as, I mean, that's a very crude, simplified, what's that called? There's a clock, I guess, to recurrent neural networks. It seems like the brain is the continuous version of that, the generalization, where all possible timings are possible, and then within those timings is contained some information. Do you think recurrent neural networks can capture the same kind of phenomena as the timing that seems to be important for the brain, in the firing of neurons in the brain?

SPEAKER_00

14:13 - 14:31

I mean, I think recurrent neural networks are amazing, and I think they can do anything we'd want a system to do. Right now recurrent neural networks have been superseded by transformers, but maybe one day they'll make a comeback. Maybe they'll be back. We'll see.

SPEAKER_01

14:32 - 14:50

Let me, on a small tangent, say: do you think they'll be back? So much of the breakthroughs recently, that we'll talk about, in natural language processing and language modeling have been with transformers, which don't emphasize recurrence. Do you think recurrence will make a comeback?

SPEAKER_00

14:50 - 15:00

Well, some kind of recurrence, I think, very likely. Recurrent neural networks as they're typically thought of, for processing sequences, I think that's also possible.

SPEAKER_01

15:02 - 15:07

What is, to you, a recurrent neural network? Generally speaking, I guess. What is a recurrent neural network?

SPEAKER_00

15:07 - 15:20

You have a neural network which maintains a high dimensional hidden state. And then when an observation arrives, it updates its high dimensional hidden state through its connections in some way.
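That definition translates almost directly into code; a minimal sketch of a vanilla recurrent cell in Python with NumPy, where the dimensions, tanh nonlinearity, and random weights are arbitrary illustrative choices, not anything from the conversation.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, hidden_dim = 8, 32

W_xh = rng.normal(scale=0.1, size=(hidden_dim, obs_dim))     # observation -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden (the recurrence)
b = np.zeros(hidden_dim)

def step(h, x):
    """When an observation arrives, update the hidden state through the connections."""
    return np.tanh(W_xh @ x + W_hh @ h + b)

h = np.zeros(hidden_dim)                  # the high-dimensional hidden state it maintains
for x in rng.normal(size=(10, obs_dim)):  # a sequence of 10 observations
    h = step(h, x)
```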

SPEAKER_01

15:20 - 15:50

So do you think, you know, that's what expert systems did, right? Symbolic AI, the knowledge bases. Growing a knowledge base is maintaining a hidden state, which is the knowledge base, and growing it by some sequential processing. Do you think of it more generally in that way, or is it simply the more constrained form of a hidden state, with sort of gating units, that we think of today with LSTMs and such?

SPEAKER_00

15:51 - 16:12

I mean, the hidden state is technically what you described, the hidden state that goes inside the LSTM or the RNN or something like this. But then what should be contained... you know, if you want to make the expert system analogy, I mean, you could say that the knowledge is stored in the connections, and then the processing is done in the hidden state.

SPEAKER_01

16:13 - 16:49

Yes, could you say that? So, do you think there's a future of building large-scale knowledge bases within the neural networks? Definitely. So we're going to pause on that confidence, because I want to explore that. Well, let me zoom back out and ask, back to the history of ImageNet: neural networks have been around for many decades, as you mentioned. What do you think were the key ideas that led to their success, that ImageNet moment, and beyond, the success in the past 10 years?

SPEAKER_00

16:49 - 16:56

Okay, so the question is, just to make sure I didn't miss anything: the key ideas that led to the success of deep learning over the past 10 years?

SPEAKER_01

16:56 - 17:01

Exactly. Even though the fundamental thing behind Deep Learning has been around for much longer.

SPEAKER_00

17:04 - 18:34

The key idea about deep learning, or rather the key fact about deep learning before deep learning started to be successful, is that it was underestimated. People who worked in machine learning simply didn't think that neural networks could do much. People didn't believe that large neural networks could be trained. People thought that, well, there was a lot of debate going on in machine learning about what are the right methods and so on. And people were arguing because there was no way to get hard facts. And by that I mean, there were no benchmarks which were truly hard, where if you do really well on them, then you can say, look, here is my system. That's when you switch from... that's when this field becomes a little bit more of an engineering field. So in terms of deep learning, to answer the question directly, the ideas were all there. The thing that was missing was a lot of supervised data and a lot of compute. Once you have a lot of supervised data and a lot of compute, then there is a third thing that is needed as well, and that is conviction. Conviction that if you take the right stuff, which already exists, and mix it with a lot of data and a lot of compute, it will in fact work. And so that was the missing piece. You had the data, you needed the compute, which showed up in terms of GPUs, and you needed the conviction to realize that you need to mix them together.

SPEAKER_01

18:35 - 19:32

So that's really interesting. So I guess the presence of compute and the presence of supervised data allowed the empirical evidence to do the convincing of the majority of the computer science community. So I guess there's a key moment with Jitendra Malik and Alexei "Alyosha" Efros, who were very skeptical, right? And then there's Geoffrey Hinton, who was the opposite of skeptical. And there was a convincing moment, and I think ImageNet served as that moment. That's right. And they represented this kind of, where the big pillars of the computer vision community, kind of, the wizards got together, and then all of a sudden there was a shift. And it's not enough for the ideas to all be there and the compute to be there; you have to convince the cynicism that existed. That's interesting, that people just didn't believe for a couple of decades.

SPEAKER_00

19:33 - 20:13

Yeah, well, but it's more than that. When you put it this way, it sounds like, well, those silly people who didn't believe, what were they missing? But in reality, things were confusing because neural networks really did not work on anything, and they were not the best method on pretty much anything as well. And it was pretty rational to say, yeah, this stuff doesn't have any traction. And that's why you need to have these very hard tasks which produce undeniable evidence. And that's how we make progress. And that's why the field is making progress today, because we have these hard benchmarks which represent true progress. And this is why we are able to avoid endless debate.

SPEAKER_01

20:15 - 20:53

So, incredibly, you've contributed some of the biggest recent ideas in AI, in computer vision, language, natural language processing, reinforcement learning, sort of everything in between. Maybe not GANs. There may not be a topic you haven't touched, and of course the fundamental science of deep learning. What is the difference to you between vision, language, and, as in reinforcement learning, action, as learning problems, and what are the commonalities? Do you see them as all interconnected, or are they fundamentally different domains that require different approaches?

SPEAKER_00

20:55 - 23:04

Okay, that's a good question. Machine learning is a field with a lot of unity, a huge amount of unity. What I mean by unity is overlap of ideas, overlap of principles. In fact, there are only one or two or three principles, which are very, very simple, and then they apply in almost the same way to the different modalities, to the different problems. And that's why today, when someone writes a paper on improving optimization of deep learning in vision, it improves the different NLP applications and it improves the different reinforcement learning applications. So I would say that computer vision and NLP are very similar to each other. Today they differ in that they have slightly different architectures: we use transformers in NLP and we use convolutional neural networks in vision. But it's also possible that one day this will change and everything will be unified with a single architecture. Because if you go back a few years ago in natural language processing, there were a huge number of architectures; every different tiny problem had its own architecture. Today, it's just one transformer for all those different tasks. And if you go back in time even more, you had even more fragmentation, and every little problem in AI had its own little sub-specialization and its own little collection of skills, people who would know how to engineer the features. Now it's all been subsumed by deep learning. We have this unification. And so I expect vision to become unified with natural language as well. Or rather, I shouldn't say expect; I think it's possible. I don't want to be too sure, because I think the convolutional neural net is very computationally efficient. RL is different. RL does require slightly different techniques, because you really do need to take actions, you really need to do something about exploration, your variance is much higher. But I think there is a lot of unity even there. And I would expect, for example, that at some point there will be some broader unification between RL and supervised learning, where somehow the RL will be making decisions to make the supervised learning go better, and it will be, I imagine, one big black box, and you just shovel things into it and it just figures out what to do with whatever you shovel at it.

SPEAKER_01

23:05 - 23:24

I mean, reinforcement learning has some aspects of language and vision combined, almost. There's elements of a long term memory that you should be utilizing and there's elements of a really rich sensory space. So it seems like it's like the union of the two or something like that.

SPEAKER_00

23:24 - 23:33

I'd say something slightly differently. I'd say that reinforcement learning is neither, but it naturally interfaces and integrates with the two of them.

SPEAKER_01

23:34 - 23:43

You think action is fundamentally different? So, yeah, what is interesting, what is unique about the problem of learning to act?

SPEAKER_00

23:43 - 24:04

Well, so one example, for instance, is that when you learn to act, you are fundamentally in a non-stationary world, because as your actions change, the things you see start changing. You experience the world in a different way. And this is not the case for the more traditional static problem, where you have some distribution and you just apply a model to that distribution.

SPEAKER_01

24:06 - 24:14

Do you think it's a fundamentally different problem, or is it just a more difficult generalization of the problem of understanding?

SPEAKER_00

24:14 - 24:54

I mean, it's a question of definitions almost. There is a huge amount of commonality for sure. You take gradients, you try to approximate gradients in both cases. In the case of reinforcement learning, you have some tools to reduce the variance of the gradients; you do that. There's lots in common. You use the same neural net in both cases, you compute the gradient, you apply Adam in both cases. So, I mean, there's lots in common for sure, but there are some small differences which are not completely insignificant. It's really just a matter of your point of view, what frame of reference, how much you want to zoom in or out as you look at these problems.

SPEAKER_01

24:54 - 25:09

Which problem do you think is harder? People like Noam Chomsky believe that language is fundamental to everything, so it underlies everything. Do you think language understanding is harder than visual scene understanding, or vice versa?

SPEAKER_00

25:09 - 25:19

I think that asking if a problem is hard is likely wrong. I think the question is a little bit wrong, but I want to explain why. So what does it mean for a problem to be hard?

SPEAKER_01

25:21 - 25:41

Okay, the not interesting, dumb answer to that is: there's a benchmark, and there's human-level performance on that benchmark, and how much effort is required to reach the human level on that benchmark. So from the perspective of how long until we get to human level on a very good benchmark.

SPEAKER_00

25:43 - 26:13

Yeah, like, I understand what you mean by that. So what I was going to say is that a lot of it depends on... you know, once you solve a problem, it stops being hard, and that's always true. But whether something is hard or not depends on what our tools can do today. So, you know, you say today, true human-level language understanding and visual perception are hard in the sense that there is no way of solving the problem completely in the next three months. Right? So I agree with that statement. Beyond that, my guess would be as good as yours.

SPEAKER_01

26:13 - 26:19

I don't know. Okay, so you don't have a fundamental intuition about how hard language understanding is.

SPEAKER_00

26:19 - 26:52

I think I'd change my mind and say that language is probably going to be harder. I mean, it depends on how you define it. Like, if you mean absolute top-notch, 100% language understanding, I'll go with language. But then, if I show you a piece of paper with letters on it, is that... do you see what I mean? You have a vision system, you say it's the best human-level vision system. I open a book and I show you letters. Is really understanding how these letters form into words and sentences and meaning part of the vision problem? Where does vision end and language begin?

SPEAKER_01

26:53 - 27:38

Yeah, so Chomsky would say it starts at language. So Vision is just a little example of the kind of structuring and fundamental hierarchy of ideas that's already represented in our brain somehow that's represented through language. But where does Vision stop and language begin? That's a really interesting question. So one possibility is that it's impossible to achieve really deep understanding in either images or language without basically using the same kind of system. So you're going to get the other for free.

SPEAKER_00

27:38 - 28:01

I think it's pretty likely that, yes, if we can get one, our machine learning is probably that good that we can get the other. But it's not 100% sure. And also, I think a lot of it really does depend on your definitions, definitions of, like, perfect vision. Because reading is vision, but should it count?

SPEAKER_01

28:01 - 28:14

Yeah, to me, my definition is: a system looked at an image, and then a system looked at a piece of text and told me something about that, and I was really impressed.

SPEAKER_00

28:15 - 28:22

That's relative. You'll be impressed for half an hour and then you're going to say, well, I mean, all the systems do that, but here's the thing they don't do.

SPEAKER_01

28:22 - 29:14

Yeah, but I don't have that with humans. Humans continue to impress me. Is that true? Well, the ones... okay, so I'm a fan of monogamy. I like the idea of marrying somebody, being with them for several decades. So I believe in the fact that, yes, it's possible to have somebody continuously giving you pleasurable, interesting, witty, new ideas, friends. Yeah, I think so. They continue to surprise you. The surprise, it's, you know, that injection of randomness seems to be a nice source of, yeah, continued inspiration, like with the humor. I think, yeah, that would be... it's a very subjective task, but I think if you have enough humans in the room...

SPEAKER_00

29:15 - 29:37

Yeah, I understand what you mean. I feel like I misunderstood what you meant by impressing you. I thought you meant to impress you with its intelligence, with how well it understands an image. I thought you meant something like, I'm going to show you a really complicated image and it's going to get it right, and you're going to say, wow, that's really cool. The systems of, you know, January 2020 have not been doing that.

SPEAKER_01

29:37 - 30:03

Yeah, I think it all boils down to, like, the reason people click "like" on stuff on the internet, which is that it makes them laugh. So it's like humor or wit or insight. I'm sure we'll get that as well. So forgive the romanticized question, but looking back to you, what is the most beautiful or surprising idea in deep learning, or AI in general, that you've come across?

SPEAKER_00

30:03 - 30:39

So I think the most beautiful thing about deep learning is that it actually works. And I mean it, because you got these ideas, you got the little neural network, you got the backpropagation algorithm. And then you've got some theories as to, you know, this is kind of like the brain, so maybe if you make it large, if you make the neural network large and you train it on a lot of data, then it will do the same function the brain does. And it turns out to be true. That's crazy. And now you just train these neural networks and you make them larger, and they keep getting better. And I find it unbelievable. I find it unbelievable that this whole AI stuff with neural networks works.

SPEAKER_01

30:39 - 30:47

Have you built up an intuition of why? Are there little bits and pieces of intuitions, of insights, of why this whole thing works?

SPEAKER_00

30:48 - 31:03

I mean, some, definitely. With optimization, we now have, you know, huge amounts of empirical reasons to believe that optimization should work on most of the problems we care about.

SPEAKER_01

31:04 - 31:31

Do you have insights of why? So, you just said empirical evidence. Is most of your intuition from empirical evidence? Sort of like, evolution is empirical; it shows you that, look, this evolutionary process seems to be a good way to design organisms that survive in their environment. But it doesn't really get you to the insights of how the whole thing works.

SPEAKER_00

31:31 - 32:17

I think a good analogy is physics. You know how you say, hey, let's do some physics calculation and come up with some new physics theory and make some predictions? But then you've got to run the experiment. You know, you've got to run the experiment. It's important. So it's a bit the same here, except that maybe sometimes the experiment came before the theory. But it still is the case: you have some data and you come up with some predictions. So, yeah, let's make a big neural network, let's train it, and it's going to work much better than anything before it, and it will in fact continue to get better as you make it larger. And it turns out to be true. That's amazing, when a theory is validated like this. You know, it's not a mathematical theory; it's more of a biological theory almost. So I think there are not-terrible analogies between deep learning and biology. I would say it's like the geometric mean of biology and physics. That's deep learning.

SPEAKER_01

32:17 - 32:33

The geometric mean of biology and physics. I think I'm going to need a few hours to wrap my head around that. Because just to find the geometric, just to find the set of what biology represents.

SPEAKER_00

32:33 - 32:48

Well, in biology things are really complicated, and it's really, really hard to have good predictive theory. And in physics, people make these super precise theories with amazing predictions. And in machine learning, we're kind of in between.

SPEAKER_01

32:48 - 33:07

Kind of in between. But it'd be nice if machine learning somehow helped us discover the unification of the two, as opposed to sort of the in-between. But you're right, you're kind of trying to juggle both. So do you think there are still beautiful and mysterious properties in neural networks that are yet to be discovered?

SPEAKER_00

33:07 - 34:20

Definitely. I think that we are still massively underestimating deep learning. What do you think it'll look like? Like what? If I knew... So, but if you look at all the progress from the past 10 years, I would say most of it... I would say there've been a few cases where some things that felt like really new ideas showed up, but by and large it was, every year, it's okay, deep learning goes this far. Nope, it actually goes further. And then the next year, okay, now this is peak deep learning, we're really done. Nope, it goes further. It just keeps going further each year. So that means that we keep underestimating it, we keep not understanding it. It has surprising properties all the time. Do you think it's getting harder and harder to make progress? It depends on what we mean. I think the field will continue to make very robust progress for quite a while. I think for individual researchers, especially people who are doing research, it can be harder because there is a very large number of researchers right now. I think that if you have a lot of compute, then you can make a lot of very interesting discoveries, but then you have to deal with the challenge of managing a huge compute cluster to run your experiments. It's a little bit harder.

SPEAKER_01

34:20 - 34:48

So I'm asking all these questions that nobody knows the answer to, but you're one of the smartest people I know, so I'm going to keep asking. So let's imagine all the breakthroughs that happen in the next 30 years in deep learning. Do you think most of those breakthroughs can be done by one person with one computer? Sort of, in the space of breakthroughs, do you think big compute and large efforts will be necessary?

SPEAKER_00

34:49 - 34:55

I mean, I can't be sure. When you say one computer, you mean how, how large?

SPEAKER_01

34:55 - 34:59

You're, you're clever. I mean, one GPU.

SPEAKER_00

34:59 - 35:34

I see. I think it's pretty unlikely. I think it's pretty unlikely. I think that the stack of deep learning is starting to be quite deep. If you look at it, you've got everything from the ideas, the systems to build the data sets, the distributed programming, the building of the actual cluster, the GPU programming, putting it all together. So now the stack is getting really deep, and I think it can be quite hard for a single person to become world-class in every single layer of the stack.

SPEAKER_01

35:35 - 35:51

What about what, like, Vladimir Vapnik really insists on, which is taking MNIST and trying to learn from very few examples? So, being able to learn more efficiently. Do you think there will be breakthroughs in that space that may not need the huge compute?

SPEAKER_00

35:51 - 36:21

I think there will be a large number of breakthroughs in general that will not need a huge amount of compute. So maybe I should clarify that. I think that some breakthroughs will require a lot of compute, and I think building systems which actually do things will require a huge amount of compute. That one is pretty obvious. If you want to do X and X requires a huge neural net, you've got to get a huge neural net. But I think there will be lots of... I think there is lots of room for very important work being done by small groups and individuals.

SPEAKER_01

36:22 - 36:38

Can you maybe, sort of on the topic of the science of deep learning, talk about one of the recent papers that you released, "Deep Double Descent: Where Bigger Models and More Data Hurt"? I think it's a really interesting paper. Can you describe the main idea?

SPEAKER_00

36:39 - 37:02

Yeah, definitely. So what happened is that over the years, some small number of researchers noticed that it is kind of weird that when you make the neural network larger, it works better, and it seems to go in contradiction with statistical ideas. And then some people made an analysis showing that actually you get this double descent bump, and what we've done was to show that double descent occurs for pretty much all practical deep learning systems.

SPEAKER_01

37:03 - 37:13

And that it would be also... so can you step back, what's the x-axis and the y-axis of a double descent plot?

SPEAKER_00

37:13 - 38:55

Okay, great. So you can do things like, you can take your neural network and you can start increasing its size slowly while keeping your data set fixed. So if you increase the size of the neural network slowly, and if you don't do early stopping, that's a pretty important detail, then when the neural network is really small, you make it larger and you get a very rapid increase in performance. Then you continue to make it larger, and at some point performance will get worse. And it gets the worst exactly at the point at which it achieves zero training error, precisely zero training loss. And then as you make it larger, it starts to get better again. And it's kind of counterintuitive, because you'd expect deep learning phenomena to be monotonic. And it's hard to be sure what it means, but it also occurs in the case of linear classifiers, and the intuition basically boils down to the following. When you have a large dataset and a small model, then small, tiny random... so, basically, what is overfitting? Overfitting is when your model is somehow very sensitive to the small, random, unimportant stuff in your dataset, in the training dataset precisely. So if you have a small model and you have a big dataset, and there may be some randomness, you know, some training cases are randomly in the dataset and others may not be there, the small model is kind of insensitive to this randomness, because it's the same, there is pretty much no uncertainty about the model when the dataset is large.
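A minimal sketch of the kind of sweep being described, in Python with NumPy, using random-feature linear regression as a stand-in model; the task, feature counts, and noise level are illustrative assumptions, not details from the paper. The training set stays fixed, each model is fit to near-zero training loss with no early stopping, and the test error typically rises near the point where the number of features matches the number of training examples and falls again beyond it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 40, 500, 10

# A fixed toy regression dataset.
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
w_true = rng.normal(size=d)
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true

def random_relu_features(X, W):
    return np.maximum(X @ W, 0.0)

for n_features in [5, 10, 20, 40, 80, 160, 640]:   # model size grows, data stays fixed
    W = rng.normal(size=(d, n_features))
    F_train = random_relu_features(X_train, W)
    F_test = random_relu_features(X_test, W)
    # Fit to (near) zero training loss, no early stopping:
    # minimum-norm least-squares solution via the pseudoinverse.
    beta = np.linalg.pinv(F_train) @ y_train
    test_mse = np.mean((F_test @ beta - y_test) ** 2)
    print(n_features, round(test_mse, 2))  # error tends to peak near n_features == n_train
```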

SPEAKER_01

38:55 - 39:12

So, okay, at the very basic level, to me the most surprising thing is that neural networks don't overfit every time, very quickly, before ever being able to learn anything, given the huge number of parameters.

SPEAKER_00

39:13 - 40:07

So here is one way, okay, so let me try to give the explanation and maybe that will work. So you've got a huge neural network; you have a huge number of parameters. Now let's pretend everything is linear, which it's not, let's just pretend. Then there is this big subspace where the neural network achieves zero error, and SGD is going to find approximately the point, right? Approximately the point with the smallest norm in that subspace. And that can also be proven to be insensitive to the small randomness in the data when the dimensionality is high. But when the dimensionality of the data is equal to the dimensionality of the model, then there is a one-to-one correspondence between all the datasets and the models. So small changes in the dataset actually lead to large changes in the model, and that's why performance gets worse. So this is the best explanation, more or less.
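In symbols, this is the standard minimum-norm least-squares fact (standard linear algebra, not a formula from the conversation): with more parameters than examples, many weight vectors fit the data exactly, and gradient descent started from zero converges to the smallest-norm one,

$$\hat{w} \;=\; \arg\min_{w}\,\|w\|_2 \quad \text{s.t.} \quad Xw = y, \qquad \hat{w} \;=\; X^{\top}\!\left(XX^{\top}\right)^{-1} y \;=\; X^{+}y,$$

which is stable when the number of parameters is much larger than the number of examples, and most sensitive to the data right at the point where the two match.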

SPEAKER_01

40:09 - 40:15

So then it would be good for the model to have more parameters, so to be bigger than the data.

SPEAKER_00

40:15 - 40:37

That's right, but only if you don't early stop. If you introduce early stopping as your regularization, you can make the double descent bump almost completely disappear. What is early stopping? Early stopping is when you train your model and you monitor your test, or validation, performance, and then if at some point validation performance starts to get worse, you say, okay, let's stop training. You're good, you're good enough.
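A minimal sketch of the early-stopping loop just described, in framework-agnostic Python; the model, train_step, and validate callables and the patience value are illustrative placeholders, not anything specific from the conversation.

```python
def train_with_early_stopping(model, train_step, validate, max_epochs=100, patience=5):
    """Stop training once validation performance stops improving."""
    best_val = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step(model)            # one pass over the training data
        val_loss = validate(model)   # monitor held-out (validation) performance
        if val_loss < best_val:
            best_val = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                # validation kept getting worse; stop training
    return model
```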

SPEAKER_01

40:37 - 40:42

So the magic happens after that moment, so you don't want to do that early stopping?

SPEAKER_00

40:42 - 40:46

Well, if you don't do the early stopping, you get a very pronounced double descent.

SPEAKER_01

40:47 - 40:50

Do you have any intuition why this happens?

SPEAKER_00

40:50 - 41:27

The double descent? Or early stopping? No, the double descent. So, yeah, the intuition is basically this: when the dataset has as many degrees of freedom as the model, then there is a one-to-one correspondence between them, and so small changes to the dataset lead to noticeable changes in the model. So your model is very sensitive to all the randomness; it is unable to discard it. Whereas it turns out that when you have a lot more data than parameters, or a lot more parameters than data, the resulting solution will be insensitive to small changes in the dataset.

SPEAKER_01

41:27 - 41:34

So it's able to nicely discard the small changes, the randomness. Exactly.

SPEAKER_00

41:34 - 41:36

The spurious correlation, which you don't want.

SPEAKER_01

41:37 - 41:56

Jeff Hinton suggested we need to throw away backpropagation... we already kind of talked about this a little bit, but he suggested we need to throw away backpropagation and start over. I mean, of course, some of that is a little bit of wit and humor, but what do you think? What could be an alternative method of training neural networks?

SPEAKER_00

41:56 - 42:09

Well, the thing that he said precisely is that, to the extent that you can't find back propagation in the brain, it's worth seeing if we can learn something from how the brain learns, but back propagation is very useful and we should keep using it.

SPEAKER_01

42:09 - 42:33

Well, you're saying that once we discover the mechanism of learning in the brain, or any aspects of that mechanism, we should also try to implement that in neural networks, if it turns out that we can't find backpropagation in the brain. Well, so I guess your answer to that is that backpropagation is pretty damn useful, so why are we complaining?

SPEAKER_00

42:33 - 43:00

I mean, I personally am a big fan of backpropagation. I think it's a great algorithm, because it solves an extremely fundamental problem, which is finding a neural circuit subject to some constraints. And I don't see that problem going away. So that's why I think it's pretty unlikely that we'll have anything which is going to be dramatically different. It could happen, but I wouldn't bet on it right now.

SPEAKER_01

43:00 - 43:08

So let me ask a sort of big-picture question. Do you think neural networks can be made to reason?

SPEAKER_00

43:08 - 43:34

Why not? Well, if you look, for example, at AlphaGo or AlphaZero: the neural network of AlphaZero plays Go, which we all agree is a game that requires reasoning, better than 99.9% of all humans. Just the neural network, without the search, just the neural network itself. Doesn't that give us an existence proof that neural networks can reason?

SPEAKER_01

43:34 - 44:35

To push back and disagree a little bit: we all agree that Go is reasoning... I think I agree. I don't think it's trivial. So, obviously, reasoning, like intelligence, is a loose, gray-area term a little bit. Maybe you disagree with that. But yes, I think it has some of the same elements of reasoning. Reasoning is almost akin to search. There's a sequential element of step-wise consideration of possibilities, and sort of building on top of those possibilities in a sequential manner until you arrive at some insight. So, yeah, I guess playing Go is kind of like that. And when you have a single neural network doing that without search, that's kind of like that. So there's an existence proof in a particular constrained environment that a process akin to what many people call reasoning exists. But a more general kind of reasoning, so off the board...

SPEAKER_00

44:35 - 44:37

Well, why not? There exists this proof.

SPEAKER_01

44:38 - 44:55

Which one? Us humans? Yes. Okay. All right. So, do you think the architectures that will allow neural networks to reason will look similar to the neural network architectures we have today?

SPEAKER_00

44:55 - 45:26

I think it will... I think, well, I don't want to make overly definitive statements. I think it's definitely possible that the neural networks that will produce the reasoning breakthroughs of the future will be very similar to the architectures that exist today. Maybe a little bit more recurrent, maybe a little bit deeper. But these neural nets are so insanely powerful. Why wouldn't they be able to learn to reason? Humans can reason, so why can't neural networks?

SPEAKER_01

45:26 - 45:36

Do you think the kind of stuff we've seen neural networks do is a kind of just weak reasoning? So it's not a fundamentally different process. Again, this is stuff nobody knows the answer to.

SPEAKER_00

45:36 - 46:01

So when it comes to our neural networks, what I would say is that neural networks are capable of reasoning. But if you train a neural network on a task which doesn't require reasoning, it's not going to reason. This is a well-known effect, where the neural network will solve exactly the problem that you pose in front of it, in the easiest way possible.

SPEAKER_01

46:01 - 46:25

Right. That takes us to one of the brilliant ways you've described neural networks, which is that you've referred to neural networks as the search for small circuits, and maybe general intelligence as the search for small programs, which I found to be a very compelling metaphor. Can you elaborate on that difference?

SPEAKER_00

46:26 - 47:22

Yeah, so the thing which I said precisely was that if you can find the shortest program that outputs the data at your disposal, then you will be able to use it to make the best prediction possible. And that's a theoretical statement which can be proven mathematically. Now, you can also prove mathematically that finding the shortest program which generates some data is not a computable operation. No finite amount of compute can do this. So then, with neural networks, neural networks are the next best thing that actually works in practice. We are not able to find the best, the shortest program which generates our data, but we are able to find a small, but now that statement should be amended, even a large circuit which fits our data in some way.
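For reference, the theoretical object behind this is Kolmogorov complexity, and Solomonoff-style induction built on it. In standard notation (not quoted from the conversation), the complexity of a string $x$ is the length of the shortest program that outputs it on a universal machine $U$,

$$K_U(x) \;=\; \min\{\,\ell(p) \;:\; U(p) = x\,\},$$

and this quantity is uncomputable: no algorithm can return $K_U(x)$ for every $x$, which is why a trainable circuit is used as the practical stand-in.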

SPEAKER_01

47:22 - 47:26

I think what you meant by the small circuit is the smallest needed circuit.

SPEAKER_00

47:27 - 48:02

Well, the thing I would change now... back then I really hadn't fully internalized the overparameterization results, the things we know about overparameterized neural nets. Now I would phrase it as a large circuit whose weights contain a small amount of information, which I think is what's going on. If you imagine the training process of a neural network as slowly transmitting entropy from the data set to the parameters, then somehow the amount of information in the weights ends up being not very large, which would explain why they generalize so well.

SPEAKER_01

48:02 - 48:19

So the large circuit might be one that's helpful for the... for the generalization. Yeah, something like this. But do you see it as important to be able to try to learn something like programs?

SPEAKER_00

48:19 - 48:53

I mean, if we can, definitely. I think the answer is kind of yes, if we can do it, we should do it. It's the reason we are pushing on deep learning. The fundamental reason, the root cause, is that we are able to train them. So training comes first. We've got our pillar, which is the training pillar. And now we are trying to contort our neural networks around the training pillar; we've got to stay trainable. This is an invariant we cannot violate.

SPEAKER_01

48:53 - 49:01

And so being trainable means starting from scratch, knowing nothing, you can actually pretty quickly converge towards knowing a lot.

SPEAKER_00

49:01 - 49:12

Or even slowly. But it means that, given the resources at your disposal, you can train the neural net and get it to achieve useful performance.

SPEAKER_01

49:12 - 49:14

Yeah, that's a pillar we can't move away from.

SPEAKER_00

49:14 - 49:24

That's right. Because if you say, hey, let's find the shortest program, well, we can't do that. So it doesn't matter how useful that would be; we can't do it.

SPEAKER_01

49:24 - 49:46

So we want... so do you think, you kind of mentioned that neural networks are good at finding small circuits, or large circuits. Do you think then that the matter of finding small programs is just a matter of the data? No. So the size or the type of data, sort of, giving it programs?

SPEAKER_00

49:46 - 50:13

Well, I think the thing is that right now there are no good precedents of people successfully finding programs really well. And so the way you'd find programs is you'd train a deep neural network to do it, basically. Which is the right way to go about it. But there aren't good illustrations; it hasn't been done yet. But in principle, it should be possible.

SPEAKER_01

50:13 - 50:21

Can you elaborate a little bit? What's your intuition about "in principle"? Or, put another way, you don't see why it's not possible?

SPEAKER_00

50:21 - 50:42

Well, it's kind of, it's more a statement of: I think that it's unwise to bet against deep learning. If it's a cognitive function that humans seem to be able to do, then it doesn't take too long for some deep neural net to pop up that can do it too.

SPEAKER_01

50:42 - 51:18

Yeah, I'm there with you. I've stopped betting against neural networks at this point, because they continue to surprise us. What about long-term memory? Can neural networks have long-term memory, or something like knowledge bases? So, being able to aggregate important information over long periods of time that would then serve as useful sort of representations of state that you can make decisions by. So, have a long-term context based on which you make a decision.

SPEAKER_00

51:18 - 51:44

So, in some sense, the parameters already do that. The parameters are an aggregation of the entirety of the neural net's experience, and so they count as the long-term knowledge. And people have trained various neural nets to act as knowledge bases, and, you know, people have investigated language models as knowledge bases. So there is work, there is work there.

SPEAKER_01

51:44 - 53:04

Yeah, but in some sense, do you think, in every sense, it's all just a matter of coming up with a better mechanism of forgetting the useless stuff and remembering the useful stuff? Because right now, I mean, there's not been mechanisms that remember really long-term information. What do you mean by that, precisely? I like the word "precisely". I'm thinking of the kind of compression of information that knowledge bases represent. Sort of creating a... now, I apologize for my sort of human-centric thinking about what knowledge is, because neural networks aren't necessarily interpretable with the kind of knowledge they have discovered. But a good example for me is a knowledge base being able to build up, over time, something like the knowledge that Wikipedia represents. It's a really compressed, structured knowledge base. Obviously not the actual Wikipedia, or the language, but like a semantic web, the dream that the semantic web represented. So it's a really nice compressed knowledge base, or something akin to that, in the non-interpretable sense, as neural networks would have it.

SPEAKER_00

53:04 - 53:08

Well, the neural networks would be not interpretable if you look at their weights, but their output should be very interpretable.

SPEAKER_01

53:09 - 53:15

Okay, so how do you make very smart neural networks like language models interpretable?

SPEAKER_00

53:15 - 53:19

Well, you ask them to generate some text, and the text they generate will generally be interpretable.

SPEAKER_01

53:19 - 53:50

Do you find that the epitome of interpretability? Like, can you do better? Because you can't... okay, I'd like to know what it knows and what it doesn't know. I would like the neural network to come up with examples where it's completely dumb and examples where it's completely brilliant. And the only way I know how to do that now is to generate a lot of examples and use my human judgment. But it would be nice if the neural network had some awareness about it.

SPEAKER_00

53:50 - 54:39

100%. I'm a big believer in self-awareness, and I think that neural net self-awareness will allow for capabilities like the ones you describe: for them to know what they know and what they don't know, and for them to know where to invest to increase their skills most optimally. And to your question of interpretability, there are actually two answers to that question. One answer is, you know, we have the neural net, so we can analyze the neurons and we can try to understand what the different neurons and different layers mean. And you can actually do that, and OpenAI has done some work on that. But there is a different answer, which is, I would say, the human-centric answer, where you say, you know, you look at a human being, you can't read... you know, how do you know what a human being is thinking? You ask them. You say, hey, what do you think about this? What do you think about that?

SPEAKER_01

54:39 - 55:47

And you get some answers. The answers you get are sticky, in the sense that you already have a mental model of the human being. You already have an understanding of, like, a big conception of that human being, how they think, what they know, how they see the world, and then everything you ask, you're adding on to that. And that stickiness seems to be one of the really interesting qualities of the human being, that information is sticky. You seem to remember the useful stuff, aggregate it well, and forget most of the information that's not useful. That process... but that's also pretty similar to the process that neural networks do; it's just that neural networks are much crappier at this time. It doesn't seem to be fundamentally that different. But just to stick on reasoning for a little longer: you said, why not? Why can't it reason? What's a good, impressive feat, a benchmark, of reasoning for you, that you'll be impressed by if neural networks were able to do it?

SPEAKER_00

55:47 - 55:59

Is that something you already have in mind? Well, I think writing really good code. I think proving really hard theorems, solving open-ended problems with out-of-the-box solutions.

SPEAKER_01

56:03 - 56:06

And, sort of, theorem-proving type mathematical problems?

SPEAKER_00

56:06 - 56:34

Yeah, I think those are a very natural example as well. You know, if you can prove an unproven theorem, then it's hard to argue that it doesn't reason. And by the way, this comes back to the point about hard results: deep learning as a field is very fortunate, because we have the ability to sometimes produce these unambiguous results. And when they happen, the debate changes, the conversation changes.

SPEAKER_01

56:34 - 56:41

And then, of course, just like you said, people kind of take it for granted and say that wasn't actually a hard problem.

SPEAKER_00

56:42 - 56:46

Well, I mean, at some point, you probably run out of hard problems.

SPEAKER_01

56:46 - 57:10

Yeah, that whole mortality thing is kind of a sticky problem that we haven't quite figured out. Maybe we'll solve that one. I think one of the fascinating things in your entire body of work, but also the work at OpenAI recently, one of the conversation changers, has been in the world of language models. Can you briefly try to describe the recent history of using neural networks in the domain of language and text?

SPEAKER_00

57:11 - 58:27

Well, there's been lots of history. I think the Elman network was a small, tiny recurrent neural network applied to language back in the 80s. So the history is, you know, fairly long at least. And the thing that changed the trajectory of neural networks and language is the thing that changed the trajectory of all of deep learning, and that's data and compute. So suddenly you move from small language models, which learn a little bit. And with language models in particular, there's a very clear explanation for why they need to be large to be good: they're trying to predict the next word. So when the model doesn't know anything yet, it will notice very broad-stroke, surface-level patterns, like sometimes there are characters and there is a space between those characters. You notice that pattern. And you'll notice that sometimes there is a comma and the next character is a capital letter. You'll notice that pattern. Eventually you may start to notice that there are certain words that occur often. You may notice that spellings are a thing. You may notice syntax. And when you get really good at all these, you start to notice the semantics. You start to notice the facts. But for that to happen, the language model needs to be larger.
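To make "trying to predict the next word" concrete, here is a minimal sketch of that objective on a toy string: the training targets are just the inputs shifted by one position, scored with cross-entropy. The toy text and the bigram-style model, which only conditions on the current character, are hypothetical; real language models condition on the whole preceding context.

```python
# Minimal sketch of the next-token prediction objective: targets are the inputs
# shifted by one position. The model below is a toy bigram-style predictor that
# only sees the current character; the objective is the point, not the model.
import torch
import torch.nn as nn

text = "the cat sat on the mat. the dog sat on the mat."
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
ids = torch.tensor([stoi[ch] for ch in text])

inputs, targets = ids[:-1], ids[1:]            # next-character targets

model = nn.Sequential(
    nn.Embedding(len(vocab), 32),              # character embeddings
    nn.Linear(32, len(vocab)),                 # distribution over the next character
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(300):
    logits = model(inputs)                     # (seq_len - 1, vocab_size)
    loss = nn.functional.cross_entropy(logits, targets)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(loss.item())  # the loss falls as surface-level patterns get absorbed
```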

SPEAKER_01

58:28 - 59:09

So let's linger on that. That's where you and Noam Chomsky disagree. You think we're actually taking incremental steps, that a larger network and larger compute will be able to get to the semantics, be able to understand language, without what Noam likes to think of as a fundamental understanding of the structure of language, without imposing your theory of language onto the learning mechanism. So you're saying you can learn from raw data the mechanism that underlies language.

SPEAKER_00

59:10 - 01:00:03

Well, I think it's pretty likely, but I also want to say that I don't really know precisely what Chomsky means when he talks about it. You said something about imposing your structure on language. I'm not 100% sure what he means, but empirically it seems that when you inspect those larger language models, they exhibit signs of understanding the semantics, whereas the small language models do not. We saw that a few years ago when we did work on the sentiment neuron. We trained a small LSTM to predict the next character in Amazon reviews. And we noticed that when you increase the size of the LSTM from 500 LSTM cells to 4,000 LSTM cells, then one of the neurons starts to represent the sentiment of the article, sorry, of the review. Now why is that? Sentiment is a pretty semantic attribute, it's not a syntactic attribute.
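To make that setup concrete, here is a toy sketch of the idea: a character-level LSTM trained only to predict the next character of review text, whose individual hidden units can later be probed for a sentiment-like signal. The two tiny reviews, the hidden size, and the particular unit probed are hypothetical; the actual experiment used a far larger LSTM trained on millions of Amazon reviews.

```python
# Sketch of the sentiment-neuron setup: unsupervised next-character prediction on
# review text, followed by probing individual hidden units. Everything here is a
# toy stand-in for the real large-scale experiment.
import torch
import torch.nn as nn

reviews = ["this product is wonderful, i love it", "terrible quality, broke in a day"]
chars = sorted(set("".join(reviews)))
stoi = {c: i for i, c in enumerate(chars)}

class CharLSTM(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, 16)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.head(h), h                 # next-char logits and hidden states

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):                         # no sentiment labels anywhere
    for r in reviews:
        ids = torch.tensor([[stoi[c] for c in r]])
        logits, _ = model(ids[:, :-1])
        loss = nn.functional.cross_entropy(logits.squeeze(0), ids[0, 1:])
        opt.zero_grad()
        loss.backward()
        opt.step()

# After (far more) training, one can look for a single unit whose activation
# tracks whether the review is positive or negative.
for r in reviews:
    ids = torch.tensor([[stoi[c] for c in r]])
    _, h = model(ids)
    print(r[:20], h[0, -1, 0].item())           # probe one (hypothetical) unit
```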

SPEAKER_01

01:00:04 - 01:00:09

And for people who might not know, I don't know if that's a standard term, but sentiment is whether it's a positive or negative review.

SPEAKER_00

01:00:09 - 01:00:27

That's right, like is the person happy with something or is the person not happy with something. And so here we had very clear evidence that a small neural net does not capture sentiment, while a large neural net does. And why is that? Well, our theory is that at some point you run out of syntax to model, and you start to focus on something else.

SPEAKER_01

01:00:28 - 01:00:36

And with size, you quickly run out of syntax to model and then you really start to focus on the semantics. This would be the idea.

SPEAKER_00

01:00:36 - 01:00:50

That's right. And so I don't want to imply that our models have complete semantic understanding, because that's not true. But they definitely are showing signs of semantic understanding, partial semantic understanding, while the smaller models do not show those signs.

SPEAKER_01

01:00:51 - 01:01:00

Can you take a step back and say, what is GPT-2, which is one of the big language models that was a conversation changer in the past couple of years?

SPEAKER_00

01:01:00 - 01:01:26

Yes, so GPT-2 is a transformer with one and a half billion parameters that was trained on about 40 billion tokens of text, which were obtained from web pages that were linked to from Reddit articles with more than three upvotes. And what's the transformer? The transformer is the most important advance in neural network architecture in recent history.
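For readers who want to see what the released model does in practice, here is a hedged sketch of sampling from a public GPT-2 checkpoint via the third-party Hugging Face transformers package; that package, the "gpt2" checkpoint name, and the prompt are assumptions of this sketch, not part of the conversation.

```python
# Hedged sketch: sampling text from the released GPT-2 weights using the
# third-party Hugging Face `transformers` package (not OpenAI's training code).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # smallest public checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The most important advance in neural network architectures"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_length=60, do_sample=True, top_k=50)
print(tokenizer.decode(output[0]))
```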

SPEAKER_01

01:01:26 - 01:01:38

What is attention, maybe, too? Because I think that's the interesting idea, not necessarily technically speaking, but the idea of attention versus maybe what recurrent neural networks represent.

SPEAKER_00

01:01:38 - 01:02:46

Yeah, so the thing is, the transformer is a combination of multiple ideas simultaneously, of which attention is one. Do you think attention is the key? No, it's a key, but it's not the key. The transformer is successful because it is the simultaneous combination of multiple ideas, and if you were to remove either idea, it would be much less successful. So the transformer uses a lot of attention, but attention had existed for a few years, so that can't be the main innovation. The transformer is designed in such a way that it runs really fast on the GPU, and that makes a huge amount of difference. This is one thing. The second thing is that the transformer is not recurrent, and that is really important too, because it is more shallow and therefore much easier to optimize. So in other words, it uses attention, it is a really great fit for the GPU, and it is not recurrent, so therefore less deep and easier to optimize. And the combination of those factors makes it successful. So now it makes great use of your GPU, it allows you to achieve better results for the same amount of compute, and that's why it's successful.
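For the mechanics of the attention operation being discussed, here is a minimal single-head scaled dot-product attention sketch; the dimensions and random inputs are illustrative, and a real transformer adds multiple heads, residual connections, layer normalization, and feed-forward blocks. Note that the whole computation is a few large matrix multiplications over the full sequence at once, which is part of why it maps so well onto a GPU compared with a step-by-step recurrence.

```python
# Single-head scaled dot-product attention: each position forms a query, compares
# it against every key, and takes a softmax-weighted average of the values.
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, seq_len, d)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (batch, seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)           # rows sum to 1
    return weights @ v                            # (batch, seq_len, d)

x = torch.randn(2, 5, 16)                         # toy batch of 5-token sequences
wq, wk, wv = (torch.nn.Linear(16, 16) for _ in range(3))
out = attention(wq(x), wk(x), wv(x))
print(out.shape)  # torch.Size([2, 5, 16])
```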

SPEAKER_01

01:02:48 - 01:03:03

Were you surprised by how well transformers worked and GPT-2 worked? You worked on language, you had a lot of great ideas before transformers came about in language, so you got to see the whole set of revolutions before and after. Were you surprised?

SPEAKER_00

01:03:04 - 01:03:50

Yeah, a little. I mean, it's hard to remember because you adapt really quickly, but it definitely was surprising. It definitely was. In fact, you know what, I'll retract my statement. It was pretty amazing. It was just amazing to see it generate text like this. And you've got to keep in mind that at that time we had seen all this progress in GANs, and the samples produced by GANs were just amazing, you have these realistic faces, but text hadn't really moved that much. And suddenly we moved from, you know, whatever GANs were in 2015 to the best, most amazing GANs in one step. That was really stunning. Even though theory predicted, yeah, you train a big language model, of course you should get this, to see it with your own eyes is something else.

SPEAKER_01

01:03:52 - 01:04:24

And yet, we adapt really quickly, and now there are some cognitive scientists writing articles saying that GPT-2 models don't really understand language. So we adapt quickly to how amazing it is that they're able to model the language so well. So what do you think is the bar? For what? For impressing us that it... I don't know. Do you think that bar will continuously be moved? Definitely.

SPEAKER_00

01:04:24 - 01:04:55

I think when you start to see really dramatic economic impact, that's when. I think that's, in some sense, the next barrier. Because right now, if you think about the work in AI, it's really confusing. It's really hard to know what to make of all these advances. It's kind of like, okay, you've got an advance, and now you can do more things, and you've got another improvement, and you've got another cool demo. At some point, people who are outside of AI can no longer distinguish this progress anymore.

SPEAKER_01

01:04:55 - 01:05:14

So we were talking offline about translating Russian to English, and how there's a lot of brilliant work in Russian that the rest of the world doesn't know about. That's true for Chinese, it's true for a lot of scientific and artistic work in general. Do you think translation is the place where we're going to see the big economic impact?

SPEAKER_00

01:05:14 - 01:05:44

I don't know, I think there is a huge number of applications. First of all, I want to point out that translation already today is huge. I think billions of people interact with big chunks of the internet primarily through translation. So translation is already huge, and it's hugely positive, too. I think self-driving is going to be hugely impactful, and it's unknown exactly when it happens, but again, I would not bet against deep learning.

SPEAKER_01

01:05:44 - 01:05:52

So there's deep learning in general, and there's deep learning for self-driving. Yes, deep learning for self-driving, but I was talking about language models.

SPEAKER_00

01:05:52 - 01:05:54

I see, I see.

SPEAKER_01

01:05:55 - 01:05:58

Just to check, you're not seeing a connection between driving and language?

SPEAKER_00

01:05:58 - 01:06:01

No, no. Okay. All right, both use neural nets.

SPEAKER_01

01:06:01 - 01:06:20

There might be a poetic connection. I think there might be some, like you said, there might be some kind of unification toward a kind of multitask transformer that can take on both language and vision tasks. That would be an interesting unification. Now, let's see, what more can I ask about GPT-2?

SPEAKER_00

01:06:22 - 01:06:29

It's simple, so there's not much to ask. You take a transformer, you make it bigger, you give it more data, and suddenly it does all those amazing things.

SPEAKER_01

01:06:29 - 01:06:47

Yeah, one of the beautiful things is that GPT-2, that transformers, are fundamentally simple to explain, to train. Do you think bigger will continue to show better results in language? Probably. So what are the next steps for GPT-2, do you think?

SPEAKER_00

01:06:48 - 01:07:30

I mean, I think for sure seeing what a larger version can do is one direction. Also, there are many questions. There's one question which I'm curious about, which is the following: right now GPT-2 is fed all this data from the internet, which means that it needs to memorize all those random facts about everything on the internet. And it would be nice if the model could somehow use its own intelligence to decide what data it wants to accept and what data it wants to reject, just like people. People don't learn all data indiscriminately, we are super selective about what we learn. And I think this kind of active learning would be very nice to have.

SPEAKER_01

01:07:31 - 01:08:14

Yeah, listen, I love active learning. So let me ask, on the selection of data, can you elaborate on that a little bit more? I have this kind of sense that the optimization of how you select data, the active learning process, is going to be a place for a lot of breakthroughs, even in the near future, because there haven't been many breakthroughs there that are public. I feel like there might be private breakthroughs that companies keep to themselves, because it's a fundamental problem that has to be solved if you want to solve self-driving, if you want to solve a particular task. What do you think about that space in general?

SPEAKER_00

01:08:14 - 01:08:36

Yeah, so I think that for something like active learning, or in fact for any kind of capability like active learning, the thing that it really needs is a problem. It needs a problem that requires it. It's very hard to do research on a capability if you don't have a task, because then what's going to happen is you will come up with an artificial task, get good results, but not really convince anyone.

SPEAKER_01

01:08:37 - 01:08:47

Right. We're now past the stage where getting a result on MNIST, some clever formulation of MNIST, will convince people.

SPEAKER_00

01:08:47 - 01:09:08

That's right. In fact, you could quite easily come up with a simple active learning scheme on MNIST and get a 10x speedup, but then so what? I think that active learning will naturally arise as problems that require it pop up. That's my take on it.
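For concreteness, here is a hedged sketch of the kind of simple uncertainty-sampling scheme he is alluding to, using scikit-learn's small built-in digits dataset as a stand-in for MNIST; the seed, budgets, and classifier are arbitrary choices.

```python
# Toy active-learning loop: the model repeatedly picks the unlabeled examples it is
# least confident about and adds them to its training set.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

labeled = list(range(20))                     # tiny initial labeled set
unlabeled = list(range(20, len(X_pool)))

for round_ in range(10):
    clf = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
    probs = clf.predict_proba(X_pool[unlabeled])
    uncertainty = 1.0 - probs.max(axis=1)     # least-confident examples first
    picks = np.argsort(-uncertainty)[:20]
    chosen = [unlabeled[i] for i in picks]
    labeled.extend(chosen)
    unlabeled = [i for i in unlabeled if i not in set(chosen)]
    print(round_, clf.score(X_test, y_test))  # accuracy as the labeled set grows
```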

SPEAKER_01

01:09:09 - 01:10:14

There's another interesting thing that OpenAI brought up with GPT-2, which is, when you create a powerful artificial intelligence system, it was unclear what kind of detrimental effect it would have once you released GPT-2, because if you have a model that can generate pretty realistic text, you can start to imagine it would be used by bots in ways we can't even imagine. There's this nervousness about what's possible to do. You did a brave and profound thing, which is you started a conversation about this: how do we release powerful artificial intelligence models to the public, if we do at all? How do we privately discuss with others, even competitors, how we manage the use of these systems, and so on? So from that whole experience, you released a report on it, but in general, are there any insights that you've gathered from just thinking about this, about how you release models like this?

SPEAKER_00

01:10:14 - 01:11:27

I mean, I think my take on this is that the field of AI has been in a state of childhood, and now it's exiting that state and entering a state of maturity. What that means is that AI is very successful and also very impactful, and its impact is not only large but also growing. And so for that reason, it seems wise to start thinking about the impact of our systems before releasing them, maybe a little bit too soon rather than a little bit too late. And with the case of GPT-2, like I mentioned earlier, the results really were stunning, and it seemed plausible, it didn't seem certain, but it seemed plausible that something like GPT-2 could easily be used to reduce the cost of disinformation. And so there was a question of what's the best way to release it, and a staged release seemed logical. A small model was released, and there was time to see how many people used these models. There have been lots of really cool applications, and there haven't been any negative applications we know of. And so eventually the larger model was released. But also, other people replicated similar models.

SPEAKER_01

01:11:27 - 01:11:52

That we know of. So in your view, staged release is at least part of the answer to the question of what do we do once we create a system like this? It's part of the answer, yes. Are there any other insights? Like, say you don't want to release the model at all because it's useful to you for whatever the business is.

SPEAKER_00

01:11:52 - 01:11:56

Well, there are plenty of people who don't release models already, right?

SPEAKER_01

01:11:56 - 01:12:36

Of course, but is there some moral, ethical responsibility, when you have a very powerful model, to sort of communicate? Like, just as you said, when you had GPT-2, it was unclear how much it could be used for misinformation. It's an open question, and getting an answer to that might require that you talk to other really smart people outside of your particular group. Can you please tell me there's some optimistic pathway for people across the world to collaborate on these kinds of cases? Or is it still really difficult for one company to talk to another company?

SPEAKER_00

01:12:36 - 01:12:50

So it's definitely possible. It's definitely possible to discuss these kinds of models with colleagues elsewhere and to get their take on what to do. How hard is it, though?

SPEAKER_01

01:12:50 - 01:12:55

I mean, do you see that happening?

SPEAKER_00

01:12:55 - 01:13:15

I think that's a place where it's important to gradually build trust between companies, because ultimately all the AI developers are building technology which is going to be increasingly powerful. And so the way to think about it is that ultimately we're all in it together.

SPEAKER_01

01:13:15 - 01:13:48

Yeah, I tend to believe in the better angels of our nature, but I do hope that when you build a really powerful AI system in a particular domain, you also think about the potential negative consequences. It's an interesting and scary possibility that there would be a race for AI development that would push people to close that development and not share ideas with others.

SPEAKER_00

01:13:50 - 01:13:58

I don't love this. I've been a pure academic for 10 years. I really like sharing ideas, and it's fun and exciting.

SPEAKER_01

01:13:58 - 01:14:10

Let's talk about AGI a little bit. What do you think it takes to build a system of human-level intelligence? We talked about reasoning, we talked about long-term memory, but in general, what does it take?

SPEAKER_00

01:14:10 - 01:14:19

Well, I can't be sure, but I think deep learning plus maybe another small idea.

SPEAKER_01

01:14:19 - 01:14:44

Do you think self-play will be involved? You've spoken about the powerful mechanism of self-play, where systems learn by exploring the world in a competitive setting against other entities that are similarly skilled to them, and so they incrementally improve this way. Do you think self-play will be a component of building an AGI system?

SPEAKER_00

01:14:44 - 01:15:56

Yeah. So what I would say is, to build AGI, I think it's going to be deep learning plus some ideas, and I think self-play will be one of those ideas. Self-play has this amazing property that it can surprise us in truly novel ways. For example, pretty much every self-play system, both our Dota bot, and, I don't know if you've seen it, OpenAI had a release about multi-agent where you had two little agents playing hide and seek, and of course also AlphaZero, they all produced surprising behaviors. They all produced behaviors that we didn't expect. They are creative solutions to problems. And that seems like an important part of AGI that our systems don't exhibit routinely right now. And so that's why I like this area, this direction, because of its ability to surprise us. And an AGI system would surprise us, not just with random surprise, but by finding the surprising solution to a problem that's also useful.
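For readers unfamiliar with the mechanism, here is self-play boiled down to a cartoon: fictitious self-play on rock-paper-scissors, where the agent repeatedly best-responds to the average of its own past strategies, and its play drifts toward the game's mixed equilibrium. The game and the update rule are stand-ins; the systems mentioned above use neural network policies and vastly richer environments.

```python
# Fictitious self-play on rock-paper-scissors: best-respond to your own history.
import numpy as np

payoff = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)    # row player's payoff matrix

counts = np.array([1.0, 0.0, 0.0])              # history of strategies played so far
for step in range(10000):
    past_self = counts / counts.sum()           # the opponent is your own average play
    best_response = np.argmax(payoff @ past_self)
    counts[best_response] += 1.0                # play it and add it to the history

print(counts / counts.sum())                    # approaches [1/3, 1/3, 1/3]
```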

SPEAKER_01

01:15:56 - 01:16:30

Right. Now, a lot of the self-play mechanisms have been used in the game context, or at least in the simulation context. How far along the path to AGI do you think will be done in simulation? How much faith or promise do you have in simulation versus having to have a system that operates in the real world, whether it's the real world of digital real-world data, or the actual physical world of robotics?

SPEAKER_00

01:16:30 - 01:16:38

I don't think it's an either-or. I think simulation is a tool, it has certain strengths and certain weaknesses, and we should use it.

SPEAKER_01

01:16:38 - 01:17:21

Yeah, but okay, I understand that's true. But one of the criticisms of self-play, one of the criticisms of reinforcement learning, is that its current results, while amazing, have been demonstrated in simulated environments or very constrained physical environments. Do you think it's possible to escape the simulated environments and be able to learn in non-simulated environments? Or do you think it's possible to simulate, in a photorealistic and physics-realistic way, the real world, such that we can solve real problems with self-play

SPEAKER_00

01:17:22 - 01:17:49

in simulation. So I think that transfer from simulation to the real world is definitely possible, and it has been exhibited many times by many different groups. It's been especially successful in vision. Also, OpenAI in the summer demonstrated a robot hand which was trained entirely in simulation, in a certain way that allowed for sim-to-real transfer to occur. Is this the stuff with the Rubik's Cube? Yeah, that's right.

SPEAKER_01

01:17:49 - 01:17:57

I wasn't aware it was trained in simulation. It was trained in simulation entirely. Really? So it wasn't trained in the physical world, the hand wasn't trained?

SPEAKER_00

01:17:58 - 01:18:11

No. 100% of the training was done in simulation. And the policy that was learned in simulation was trained to be very adaptive. So adaptive that when you transferred it, it could very quickly adapt to the physical world.

SPEAKER_01

01:18:11 - 01:18:18

So the kind of perturbations, with the giraffe or whatever the heck it was, those weren't part of the simulation?

SPEAKER_00

01:18:18 - 01:18:34

Well, the policy was trained to be robust to many different things, but not the kind of perturbations we had in the video. So it was never trained with a glove, it was never trained with a stuffed giraffe.
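A hedged sketch of the recipe being described, often called domain randomization: every training episode samples different physics, so the learned controller has to work across all of them and, with luck, also in the one real world it never saw. The one-dimensional point-mass simulator, the proportional controller, and the random-search training are toy stand-ins for the actual robot-hand setup.

```python
# Domain-randomization cartoon: train a controller across randomly sampled
# dynamics so that it copes with physical conditions it has never seen exactly.
import random

def rollout(gain, mass, friction, steps=200, dt=0.05):
    pos, vel, target = 0.0, 0.0, 1.0
    for _ in range(steps):
        force = gain * (target - pos)              # simple proportional policy
        acc = (force - friction * vel) / mass
        vel += acc * dt
        pos += vel * dt
    return -abs(target - pos)                      # reward: finish near the target

def evaluate(gain, episodes=50):
    # Each episode gets its own randomized "world", as the simulator would.
    return sum(rollout(gain,
                       mass=random.uniform(0.5, 3.0),
                       friction=random.uniform(0.1, 2.0))
               for _ in range(episodes)) / episodes

random.seed(0)
best_gain, best_score = None, float("-inf")
for _ in range(100):                               # crude random-search "training"
    gain = random.uniform(0.0, 10.0)
    score = evaluate(gain)
    if score > best_score:
        best_gain, best_score = gain, score

print(best_gain, best_score)  # a single gain that copes with many randomized worlds
```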

SPEAKER_01

01:18:34 - 01:18:49

So in theory, these are novel perturbations. Correct. It's not in theory, in practice those are novel perturbations. Well, that's okay. That's a small-scale but clean example of transfer from the simulated world to the physical world.

SPEAKER_00

01:18:49 - 01:19:12

Yeah, and I will also say that I expect the transfer capabilities of deep learning to increase in general, and the better the transfer capabilities are, the more useful simulation will become. Because then you could experience something in simulation and learn a moral of the story, which you could then carry with you to the real world, right? As humans do all the time when they play computer games.

SPEAKER_01

01:19:14 - 01:19:37

So let me ask sort of an embodied question, staying on AGI for a sec. Do you think an AGI system needs to have a body? Do we need to have some of those human elements of self-awareness, consciousness, sort of fear of mortality, sort of self-preservation in the physical space, which comes with having a body?

SPEAKER_00

01:19:37 - 01:20:07

I think having a body would be useful, I don't think it's necessary, but I think it's very useful to have a body for sure, because you can learn things which cannot be learned without a body. But at the same time, I think that if you don't have a body, you could compensate for it and still succeed. You think so? Yes. Well, there is evidence for this. For example, there are many people who were born deaf and blind, and they were able to compensate for the lack of modalities. I'm thinking about Helen Keller specifically.

SPEAKER_01

01:20:08 - 01:20:36

So even if you're not able to physically interact with the world... I mean, that actually wasn't what I was getting at. Maybe let me ask a more particular question. I'm not sure if it's connected to having a body or not, but the idea of consciousness, and a more constrained version of that, self-awareness. Do you think an AGI system should have consciousness? Whatever the heck you think consciousness is, since we can't quite define it.

SPEAKER_00

01:20:36 - 01:20:50

Yeah, it's a hard question to answer, given how hard it is to define it. Do you think it's useful to think about? I mean, it's definitely interesting. It's fascinating. I think it's definitely possible that our systems will be conscious.

SPEAKER_01

01:20:50 - 01:21:04

Do you think that's an emergent thing that just comes from... do you think consciousness could emerge from the representation that's stored within neural networks? So that it naturally just emerges as you're able to represent more and more of the world?

SPEAKER_00

01:21:04 - 01:21:21

Well, I'd make the following argument: humans are conscious, and if you believe that artificial neural nets are sufficiently similar to the brain, then there should at least exist artificial neural nets which would be conscious too.

SPEAKER_01

01:21:21 - 01:21:23

You're leaning on that existence proof pretty heavily.

SPEAKER_00

01:21:23 - 01:21:30

Okay. That's the best answer I can give.

SPEAKER_01

01:21:30 - 01:21:47

No, I know. There's still an open question of whether there's some magic in the brain that we're not seeing. I don't mean non-materialistic magic, but the brain might be a lot more complicated and interesting than we give it credit for.

SPEAKER_00

01:21:47 - 01:21:54

If that's the case, then it should show up and at some point we will find out that we can't continue to make progress. But I think it's unlikely.

SPEAKER_01

01:21:55 - 01:22:22

So we talked about consciousness, but let me ask about another poorly defined concept, intelligence. Again, we've talked about reasoning, we've talked about memory. What do you think is a good test of intelligence for you? Are you impressed by the test that Alan Turing formulated with the imitation game of natural language? Is there something in your mind you would be deeply impressed by if a system were able to do it?

SPEAKER_00

01:22:23 - 01:22:59

I mean, lots of things. There is a certain frontier of capabilities today, and there exist things outside of that frontier, and I would be impressed by any such thing. For example, I would be impressed by a deep learning system which solves a very pedestrian task, like a machine translation or computer vision task, and which never makes a mistake a human wouldn't make under any circumstances. I think that is something which has not yet been demonstrated, and I would find it very impressive.

SPEAKER_01

01:22:59 - 01:23:06

Yeah, so right now they make mistakes, and they might be more accurate than a human being, but they still make a different set of mistakes.

SPEAKER_00

01:23:06 - 01:23:29

I would guess that a lot of the skepticism that some people have about deep learning comes from when they look at the mistakes and they say, well, those mistakes make no sense. Like, if you understood the concept, you wouldn't make that mistake. And I think that changing that would inspire me. That would be, yes, this is progress.

SPEAKER_01

01:23:29 - 01:23:53

Yeah, that's a really nice way to put it. But I also just don't like that human instinct to criticize a model as not intelligent. That's the same instinct we have when we criticize any group of creatures as the other, because it's very possible that GPT-2 is much smarter than human beings at many things.

SPEAKER_00

01:23:53 - 01:23:56

That's definitely true. It has a lot more breadth of knowledge.

SPEAKER_01

01:23:56 - 01:24:01

Yes, breadth of knowledge and even perhaps depth on certain topics.

SPEAKER_00

01:24:03 - 01:24:12

It's kind of hard to judge what depth means, but there's definitely a sense in which humans don't make mistakes that these models do.

SPEAKER_01

01:24:12 - 01:24:51

The same is applied to autonomous vehicles. The same is probably going to continue being applied to a lot of artificial intelligence systems. In the 21st century, the process of analyzing the progress of AI is the search for one case where the system fails in a big way where humans would not, and then many people write articles about it, and then broadly the public gets convinced that the system is not intelligent. We pacify ourselves by thinking it's not intelligent because of this one anecdotal case. And this seems to keep happening.

SPEAKER_00

01:24:51 - 01:25:16

Yeah, there is truth to that, although I'm sure plenty of people are also extremely impressed by the systems that exist today. But I think this connects to the earlier point we discussed: it's just confusing to judge progress in AI. You have a new robot demonstrating something, how impressed should you be? And I think that people will start to be impressed once AI starts to really move the needle on GDP.

SPEAKER_01

01:25:17 - 01:25:33

So you're one of the people that might be able to create an AGI system here, not you alone, but you and OpenAI. If you do create an AGI system and you get to spend an evening with it, him, her, what would you talk about, do you think?

SPEAKER_00

01:25:35 - 01:25:50

The very first time? Well, the first time I would just ask all kinds of questions and try to get it to make a mistake, and I would be amazed that it doesn't make mistakes, and just keep asking broad questions.

SPEAKER_01

01:25:50 - 01:26:00

What kind of questions, do you think? Would they be factual, or would they be personal, emotional, psychological? What do you think? All of the above.

SPEAKER_00

01:26:03 - 01:26:10

Would you ask for advice? Definitely. I mean, why would I limit myself talking to a system like this?

SPEAKER_01

01:26:10 - 01:27:03

Now, again, let me emphasize the fact that you truly are one of the people that might be in the room where this happens. So let me ask a sort of profound question. I've just talked to a Stalin historian, and I've been talking to a lot of people who study power. Abraham Lincoln said, nearly all men can stand adversity, but if you want to test a man's character, give him power. I would say the power of the 21st century, maybe the 22nd, but hopefully the 21st, would be the creation of an AGI system, and the people who have control, direct possession and control, of the AGI system. So what do you think, after spending that evening having a discussion with the AGI system, what do you think you would do?

SPEAKER_00

01:27:03 - 01:27:57

Well, the ideal world I'd like to imagine is one where humanity are like the board members of a company, where the AGI is the CEO. So the picture which I would imagine is you have some kind of different entities, different countries or cities, and the people that live there vote for what the AGI that represents them should do, and the AGI that represents them goes and does it. I think a picture like that, I find very appealing. And you could have multiple AGIs, you'd have an AGI for a city, for a country, and it would be trying, in effect, to take the democratic process to the next level. And the board can always fire the CEO, essentially press the reset button, say.

SPEAKER_01

01:27:59 - 01:28:10

That's a beautiful vision, as long as it's possible to press the reset button.

SPEAKER_00

01:28:10 - 01:28:38

Do you think it will always be possible to press the reset button? So I think that it's definitely possible to build... So the question that I really understand from you is, will humans, will people, have control over the AI systems that they build? Yes. And my answer is, it's definitely possible to build AI systems which will want to be controlled by their humans.

SPEAKER_01

01:28:38 - 01:28:51

Wow, that's part of their... So it's not just that they can't help but be controlled, but that the very purpose of their existence is to be controlled.

SPEAKER_00

01:28:51 - 01:29:26

In the same way that human parents generally want to help their children, they want their children to succeed. It's not a burden for them. They are excited to help the children, to feed them and to dress them and to take care of them. And I believe with high conviction that the same will be possible for an AGI. It will be possible to program an AGI, to design it in such a way, that it will have a similar deep drive that it will be delighted to fulfill, and the drive will be to help humans flourish.

SPEAKER_01

01:29:28 - 01:30:24

But let me take a step back to that moment where you create the AGI system. I think this is a really crucial moment. And between that moment and the democratic board members with the AGI at the head, there has to be a relinquishing of power. As George Washington, despite all the bad things he did, one of the big things he did is he relinquished power. He, first of all, didn't want to be president, and even when he became president, he didn't keep serving indefinitely, as most dictators do. Do you see yourself being able to relinquish control over an AGI system, given how much power you could have over the world? At first financial, just make a lot of money, right, and then control, by having possession of the AGI system?

SPEAKER_00

01:30:24 - 01:30:39

I find it trivial to do that. I find it trivial to relinquish this kind of power. I mean, the kind of scenario you are describing sounds terrifying to me, that's all. I would absolutely not want to be in that position.

SPEAKER_01

01:30:39 - 01:30:53

Do you think you represent the majority or the minority of people in the AI community? Well, I mean, it's an open question and an important one. Are most people good, is another way to ask it.

SPEAKER_00

01:30:53 - 01:31:04

So I don't know if most people are good, but I think that when it really counts, people can be better than we think.

SPEAKER_01

01:31:04 - 01:31:17

That's beautifully put. Are there specific mechanisms you can think of for aligning AGI values to human values? Do you think about these problems of continued alignment as we develop the AI systems?

SPEAKER_00

01:31:17 - 01:32:24

Yeah, definitely. In some sense, the kind of question which you are asking, if you translate it to today's terms, is a question about how to get an RL agent that's optimizing a value function which itself is learned. And if you look at humans, humans are like that, because the reward function, the value function of humans, is not external, it is internal. There are definite ideas of how to train a value function, basically an objective, an as-objective-as-possible perception system that will be trained separately to recognize, to internalize, human judgments on different situations. And then that component would be integrated as the base value function for some more capable RL system. You could imagine a process like this. I'm not saying this is the process, I'm saying this is an example of the kind of thing you could do.
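A minimal sketch of the kind of component being described: a small model trained, in an ordinary supervised way, on human judgments of situations, which a more capable RL system could later query as its base value function. The feature vectors, the stand-in labels, and the network are all made up for illustration.

```python
# Sketch: learn a "judgment" model from human ratings, then reuse it as a reward.
import torch
import torch.nn as nn

# Hypothetical dataset: feature vectors describing situations plus human scores in [0, 1].
situations = torch.randn(256, 8)
human_scores = torch.sigmoid(situations[:, :2].sum(dim=1, keepdim=True))  # stand-in labels

judgment_model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt = torch.optim.Adam(judgment_model.parameters(), lr=1e-2)

for step in range(500):                            # plain supervised training
    pred = judgment_model(situations)
    loss = nn.functional.mse_loss(pred, human_scores)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Later, an RL agent could query the frozen model instead of a hand-written reward.
def reward(situation_features):
    with torch.no_grad():
        return judgment_model(situation_features).item()

print(reward(torch.randn(1, 8)))
```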

SPEAKER_01

01:32:24 - 01:32:35

So on that topic of the objective functions of human existence, what do you think is the objective function that's implicit in human existence? What's the meaning of life?

SPEAKER_00

01:32:45 - 01:33:09

I think the question is wrong in some way. I think that the question implies that there is an objective answer, which is an external answer: you know, your meaning of life is X. I think what's going on is that we exist, and that's amazing, and we should try to make the most of it, and try to maximize our own value and enjoyment of our very short time while we do exist.

SPEAKER_01

01:33:10 - 01:33:24

It's funny, because acting does require an objective function. It's definitely there in some form, but it's difficult to make it explicit, maybe impossible to make it explicit, I guess is what you're getting at. And that's an interesting fact of an RL environment.

SPEAKER_00

01:33:25 - 01:33:43

Well, but I was making a slightly different point, which is that humans want things, and their wants create the drives that cause them to act. You know, our wants are our objective functions, our individual objective functions. We can later decide that we want to change, that what we wanted before is no longer good, and we want something else.

SPEAKER_01

01:33:44 - 01:34:13

But because that's so dynamic, there's got to be something underlying it. Sort of Freud, there's the sexual stuff, there are people who think it's the fear of death, and there's also the desire for knowledge, and all these kinds of things, procreation, sort of all the evolutionary arguments. There seems to be, there might be, some kind of fundamental objective function from which everything else emerges, but it seems very difficult to make explicit.

SPEAKER_00

01:34:13 - 01:34:46

I think that there probably is an evolutionary objective function, which is to survive and procreate and make sure your children succeed. That would be my guess, but it doesn't give an answer to the question of what's the meaning of life. I think you can see how humans are part of this big process, this ancient process. We exist on a small planet, and that's it. So, given that we exist, try to make the most of it, and try to enjoy more and suffer less, as much as we can.

SPEAKER_01

01:34:46 - 01:35:00

Let me ask two silly questions about life. One, do you have regrets, moments that, if you went back, you would do differently? And two, are there moments that you're especially proud of, that made you truly happy?

SPEAKER_00

01:35:01 - 01:35:30

So I can answer both questions. Of course, there's a huge number of choices and decisions that I've made that, with the benefit of hindsight, I wouldn't have made, and I do experience some regret. But you know, I try to take solace in the knowledge that at the time, I did the best I could. And in terms of things that I'm proud of, I'm very fortunate to have things I'm proud of, and they made me happy for some time, but I don't think that that is the source of happiness.

SPEAKER_01

01:35:31 - 01:35:46

So your academic accomplishments, all the papers, you're one of the most cited people in the world, all of the breakthroughs I mentioned in computer vision and language and so on, what is the source of happiness and pride for you?

SPEAKER_00

01:35:46 - 01:36:21

I mean, all those things are a source of pride, for sure. I'm very grateful for having done all those things, and it was very fun to do them. But where happiness comes from... my current view is that happiness comes, to a very large degree, from the way we look at things. You know, you can have a simple meal and be quite happy as a result, or you can talk to someone and be happy as a result as well. Or conversely, you can have a meal and be disappointed that the meal wasn't a better meal. So I think a lot of happiness comes from that, but I'm not sure, I don't want to be too confident.

SPEAKER_01

01:36:22 - 01:36:44

Being humble in the face of the uncertainty seems to also be part of this whole happiness thing. Well, I don't think there's a better way to end it than the meaning of life and discussions of happiness. So, thank you so much. You've given me a few incredible ideas, you've given the world many incredible ideas. I really appreciate it, and thanks for talking today.

SPEAKER_00

01:36:44 - 01:36:46

Thanks for stopping by. I really enjoyed it.

SPEAKER_01

01:36:47 - 01:37:36

Thanks for listening to this conversation with Ilya Sutskever, and thank you to our presenting sponsor, Cash App. Please consider supporting the podcast by downloading Cash App and using code LexPodcast. If you enjoy this podcast, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman. And now, let me leave you with some words from Alan Turing on machine learning: instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education, one would obtain the adult brain. Thank you for listening, and hope to see you next time.