Transcript for #258 – Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning
SPEAKER_01
00:00 - 07:11
The following is a conversation with Yann LeCun, his second time on the podcast. He is the Chief AI Scientist at Meta, formerly Facebook, a professor at NYU, Turing Award winner, one of the seminal figures in the history of machine learning and artificial intelligence, and someone who is brilliant and opinionated in the best kind of way, and so is always fun to talk to. And now, a quick few second mention of each sponsor. Check them out in the description. It's the best way to support this podcast. First is Public Goods, an online shop I use for household products. Second is Indeed, a hiring website. Third is Roka, my favorite sunglasses and prescription glasses. Fourth is NetSuite, business software for managing HR, financials, and other details. And fifth is Magic Spoon, low-carb, keto-friendly cereal. So the choice is business, health, or style. Choose wisely, my friends. And now onto the full ad reads. As always, no ads in the middle. I try to make this interesting, but if you skip them, please still check out our sponsors. I enjoy their stuff, maybe you will too. This show is brought to you by Public Goods, the one-stop shop for affordable, sustainable, healthy household products. I use their hand soap, toothpaste, and toothbrush. They've got a bunch of other stuff too, but that's what comes to mind. The products often have this minimalist black and white design that I just absolutely find beautiful. I love it. I love minimalism in design. It doesn't go over the top. It doesn't have all these extra things and features that you don't need, just the essentials. I think it's hard to explain, but there's something about the absence of things that can take up your attention that allows you to truly be attentive to what matters in life. Anyway, go to publicgoods.com slash lex or use code LEX at checkout to get $15 off your first order, plus you will receive your choice of either a free pack of bamboo straws or reusable food storage wraps. Visit publicgoods.com slash lex or use code LEX at checkout. This show is also brought to you by Indeed, a hiring website. I've used them as part of many hiring efforts I've done for the teams I've led in the past. Most of those were for engineering, for research efforts. They have tools like Indeed Instant Match, giving you quality candidates whose resumes on Indeed fit your job description immediately. For the past few months I've been going through this process of building up a team of folks to help me, so I have been doing quite a bit of hiring. It's a treacherous and an exciting process, because you get to meet some friends. So it's a beautiful process, but I think it's one of the most important processes in life. It's selecting the group of people whom you spend your days with. And so you gotta use the best tools for the job. Indeed, I think, is an excellent tool. Right now, you can get a free $75 sponsored job credit to upgrade your job post at indeed.com slash lex. Terms and conditions apply. Go to indeed.com slash lex. This show is also brought to you by Roka, the makers of glasses and sunglasses that I love wearing for their design, feel, and innovation on material, optics, and grip. Roka was started by two All-American swimmers from Stanford and it was born out of an obsession with performance. I like the way they feel. I like the way they look. 
Whether I'm running, like, a fast-paced run, we're talking about eight-minute mile or faster, or doing a slow-paced run, nine, ten-minute mile, along the river in the heat or in the cold, or if I'm just wearing my suit out on the town, however that expression goes, I'm not sure, but they look classy with a suit, they look badass in running gear. It's just my go-to sunglasses. Check them out for both prescription glasses and sunglasses at roka.com and enter code LEX to save 20% on your first order. That's roka.com and enter code LEX. This show is also brought to you by NetSuite. NetSuite allows you to manage financials, human resources, inventory, e-commerce, and many more business-related details in one place. I'm not sure why I was doing upspeak on that sentence. Maybe because I'm very excited about NetSuite. Anyway, there's a lot of messy things you have to get right when running a company. If you're a business owner, if you're an entrepreneur, if you're a founder of a startup, this is something you have to think about. You have to use the best tools for the job, to make sure all the messy things required to run a business are taken care of for you, so you can focus on the things that you're best at, where your brilliance shines. If you are starting a business, I wish you the best of luck. It's a difficult journey, but it's worth it. Anyway, right now special financing is back. Head to netsuite.com slash lex to get their one-of-a-kind financing program. That's netsuite.com slash lex. netsuite.com slash lex. This episode is also brought to you by Magic Spoon, the OG, not quite OG, but really old school sponsor of this podcast that I love. It's a low-carb, keto-friendly cereal. It has zero grams of sugar. It's delicious. I don't say that enough. It really is delicious. Given that it's zero grams of sugar, it's very surprising how delicious it is. 13 to 14 grams of protein, only 4 net grams of carbs, and 140 calories in each serving. You can build your own box or get a variety pack with available flavors of cocoa, fruity, frosted, peanut butter, blueberry, and cinnamon. Cocoa is my favorite. It's the flavor of champions. I don't know why I keep saying that, but it seems to be true. Anyway, Magic Spoon has a 100% happiness guarantee, so if you don't like it, they will refund it. Who else will give you a 100% happiness guarantee? Go to magicspoon.com slash lex and use code LEX at checkout to save $5 off your order. That's magicspoon.com slash lex and use code LEX. This is the Lex Fridman podcast, and here's my conversation with Yann LeCun. You co-wrote the article, "Self-Supervised Learning: The Dark Matter of Intelligence." Great title, by the way. With Ishan Misra. So let me ask, what is self-supervised learning and why is it the dark matter of intelligence?
SPEAKER_00
07:11 - 07:52
I'll start with the dark matter part. There is obviously a kind of learning that humans and animals are doing that we currently are not reproducing properly with machines, or with AI. So the most popular approaches to machine learning today, or paradigms, I should say, are supervised learning and reinforcement learning. And they are extremely inefficient. Supervised learning requires many samples for learning anything. And reinforcement learning requires a ridiculously large number of trials and errors for, you know, a system to learn anything. And that's why we don't have self-driving cars.
SPEAKER_01
07:55 - 08:16
That was a big leap from one to the other, okay? So to solve difficult problems, you have to have a lot of human annotation for supervised learning to work, and to solve those difficult problems with reinforcement learning, you have to have some way to maybe simulate that problem, such that you can do the large-scale kind of learning that reinforcement learning requires.
SPEAKER_00
08:16 - 09:34
Right, so how is it that, you know, most teenagers can learn to drive a car in about 20 hours of practice, whereas even with millions of hours of simulated practice, a self-driving car can't actually learn to drive itself properly? And so obviously we're missing something, right? And it's quite obvious for a lot of people that the immediate response you get from people is, well, humans use their background knowledge to learn faster. And they're right. Now, how was that background knowledge acquired? And that's the big question. So now you have to ask, you know, how do babies in the first few months of life learn how the world works? Mostly by observation, because they can hardly act in the world. And they learn an enormous amount of background knowledge about the world that may be the basis of what we call common sense. This type of learning is not learning a task, it's not being reinforced for anything, it's just observing the world and figuring out how it works. Building world models, learning world models. How do we do this? And how do we reproduce this in machines? So self-supervised learning is one instance, or one attempt, at trying to reproduce this kind of learning.
SPEAKER_01
09:35 - 09:47
Okay, so you're looking at just observation. So not even the interacting part of a child. It's just sitting there watching Mom and Dad walk around, pick up stuff, all that. That's what we mean by background knowledge.
SPEAKER_00
09:47 - 09:52
Perhaps not even watching Mom and Dad just, you know, watch the world go by.
SPEAKER_01
09:52 - 10:17
Just having eyes open or having eyes closed, or the very act of opening and closing eyes, that the world appears and disappears, all that basic information. And you're saying, in order to learn to drive, like, the reason humans are able to learn to drive quickly, some faster than others, is because of the background knowledge. They've been able to watch cars operate in the world in the many years leading up to it, the physics of basic objects, all that kind of stuff.
SPEAKER_00
10:17 - 11:03
That's right. I mean, the basic physics of objects, you don't even need to know how it works exactly, right? Because that you can learn fairly quickly. I mean, the example I use very often is you're driving next to a cliff. And you know in advance, because of your understanding of intuitive physics, that if you turn the wheel to the right, the car will veer to the right, will run off the cliff, fall off the cliff, and nothing good will come out of this, right? But if you are a sort of, you know, tabula rasa reinforcement learning system that doesn't have a model of the world, you have to repeat falling off this cliff thousands of times before you figure out it's a bad idea. And then a few more thousand times before you figure out how to not do it. And then a few more million times before you figure out how to not do it in every situation you ever encounter.
SPEAKER_01
11:04 - 11:37
So self-supervised learning still has to have some source of truth being told to it by somebody. And you have to figure out a way, without human assistance or without a significant amount of human assistance, to get that truth from the world. So the mystery there is how much signal is there, how much truth is there that the world gives you, whether it's the human world, like you watch YouTube or something like that, or it's the more natural world. So how much signal is there?
SPEAKER_00
11:37 - 13:21
So here is the trick. There is way more signal in a self-supervised setting than there is in either a supervised or reinforcement setting. And this goes to my analogy of the cake, the, you know, LeCake, as someone has called it, where you try to figure out how much information you ask the machine to predict and how much feedback you give the machine at every trial. In reinforcement learning, you give the machine a single scalar. You tell the machine, you did good, you did bad. And you only tell this to the machine once in a while. When I say you, it could be the universe telling the machine, right? But it's just one scalar. And so as a consequence, you cannot possibly learn something very complicated without many, many, many trials where you get many, many feedbacks of this type. In supervised learning, you give a few bits to the machine at every sample. If you're training a system on, you know, recognizing images on ImageNet, there's 1,000 categories, that's a little less than 10 bits of information per sample. But in the self-supervised learning setting, ideally, we don't know how to do this yet, but ideally, you would show a machine a segment of a video and then stop the video and ask the machine to predict what's going to happen next. So you let the machine predict, and then you let time go by and show the machine what actually happened, and hope the machine will learn to do a better job at predicting next time around. There's a huge amount of information you give the machine, because it's an entire video clip of the future after the video clip you fed it in the first place.
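As a rough illustration of that last idea, here is a minimal sketch, assuming PyTorch, of a "hide the future, ask the machine to predict it" objective. The model, frame dimension, and data loader are hypothetical placeholders, and a plain regression loss like this one glosses over the uncertainty problem that comes up later in the conversation.

```python
# A toy self-supervised "predict the future" objective (illustrative only).
# Each clip is split into past frames and one held-out future frame; the model
# encodes the past and regresses the future. Real systems must handle the fact
# that many futures are plausible; a plain MSE loss averages over them.
import torch
import torch.nn as nn

class FramePredictor(nn.Module):
    def __init__(self, frame_dim: int, hidden: int = 256):
        super().__init__()
        self.encoder = nn.GRU(frame_dim, hidden, batch_first=True)  # encodes the past frames
        self.decoder = nn.Linear(hidden, frame_dim)                  # predicts the next frame

    def forward(self, past):                  # past: (batch, time, frame_dim)
        _, h = self.encoder(past)             # h: (num_layers, batch, hidden)
        return self.decoder(h[-1])            # predicted next frame: (batch, frame_dim)

model = FramePredictor(frame_dim=1024)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for clip in loader:                           # `loader` is a placeholder yielding (batch, time, frame_dim)
    past, future = clip[:, :-1], clip[:, -1]  # hide the last frame, ask the model to fill it in
    loss = nn.functional.mse_loss(model(past), future)
    opt.zero_grad(); loss.backward(); opt.step()
```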
SPEAKER_01
13:22 - 14:15
So both for language and for vision, there's this subtle, seemingly trivial construction, but maybe that's representative of what is required to create intelligence, which is filling in the gap. So it sounds dumb. Is it possible you can solve all of intelligence in this way? For language, just give a sentence and continue it, or give a sentence where there's a gap in it, some words blanked out, and you fill in what words go there. For vision, you give a sequence of images and predict what's going to happen next, or you fill in what happened in between. Do you think it's possible that formulation alone, as a signal for self-supervised learning, can solve intelligence for vision and language?
SPEAKER_00
14:15 - 15:12
I think that's our best shot at the moment. So whether this will take us all the way to, you know, human-level intelligence or something, or just cat-level intelligence, is not clear. But among all the possible approaches that people have proposed, I think it's our best shot. So I think this idea of an intelligent system filling in the blanks, either, you know, predicting the future, inferring the past, filling in missing information. You know, I'm currently filling in the blank of what is behind your head and what your head looks like from the back, because I have, you know, basic knowledge about how humans are made. And I don't know what you're going to say, at which point you're going to speak, whether you're going to move your head this way or that way, which way you're going to look. But I know you're not going to just dematerialize and reappear three meters down the hall, you know, because I know what's possible and what's impossible according to physics.
SPEAKER_01
15:12 - 15:19
So you have a model of what's possible, and you'd be very surprised if that happens, and then you'd have to reconstruct your model.
SPEAKER_00
15:19 - 15:40
Right. So that's the model of the world. It's what tells you what fills in the blanks. So given your partial information about the state of the world, given by your perception, your model of the world fills in the missing information, and that includes predicting the future, retrodicting the past, filling in things you don't immediately perceive.
SPEAKER_01
15:40 - 16:10
And that doesn't have to be purely generic vision information or generic language. You can go to specifics, like predicting what control decision you make when you're driving in a lane. You have a sequence of images from a vehicle, and then you have information, if you recorded it on video, of where the car ended up going, so you can go back in time and predict where the car went based on the visual information. That's very specific, domain-specific.
SPEAKER_00
16:11 - 17:16
Right, but the question is whether we can come up with a generic method for training machines to do this kind of prediction or filling in the blanks. So right now, this type of approach has been unbelievably successful in the context of natural language processing. Every modern natural language processing system is pre-trained in a self-supervised manner to fill in the blanks. You show it a sequence of words, you remove 10% of them, and then you train some gigantic neural net to predict the words that are missing. And once you've pre-trained that network, you can use the internal representation learned by it as input to something that you train supervised, or whatever. That's been incredibly successful. Not so successful in images, although it's making progress, and it's based on manual data augmentation. We can go into this later. But what has not been successful yet is training from video. So getting a machine to learn to represent the visual world, for example, by just watching video. Nobody has really succeeded in doing this.
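A minimal sketch, assuming PyTorch, of the masked-word pre-training objective he describes here; the vocabulary size, architecture, and mask token are illustrative placeholders, not any particular system.

```python
# A toy masked-word objective in the spirit of the pre-training described above
# (illustrative; real systems use much larger transformers and vocabularies).
import torch
import torch.nn as nn

VOCAB, DIM, MASK_ID = 30000, 512, 0   # assumed vocabulary size and mask token id

embed = nn.Embedding(VOCAB, DIM)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True), num_layers=4)
to_vocab = nn.Linear(DIM, VOCAB)      # gives a score for every word in the dictionary

def masked_lm_loss(tokens):           # tokens: (batch, seq_len) of word ids
    mask = torch.rand_like(tokens, dtype=torch.float) < 0.10   # hide ~10% of the words
    corrupted = tokens.masked_fill(mask, MASK_ID)
    hidden = encoder(embed(corrupted))        # internal representation, reusable downstream
    logits = to_vocab(hidden)                 # (batch, seq_len, VOCAB)
    # only the removed words contribute to the loss
    return nn.functional.cross_entropy(logits[mask], tokens[mask])
```

After pre-training, the `hidden` representations (rather than the word scores) are what you would feed to a downstream supervised task.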
SPEAKER_01
17:16 - 17:59
Okay, well, let's kind of give a high-level overview. What's the difference in kind and in difficulty between vision and language? So you said people haven't been able to really crack the problem of vision, it's still open in terms of self-supervised learning, but that may not necessarily be because it's fundamentally more difficult. Maybe, like, when we're talking about achieving, like, passing the Turing test, in the full spirit of the Turing test, language might be harder than vision. That's not obvious. So in your view, which is harder, or perhaps are they just the same problem, where the farther we get in solving each, the more we realize it's all the same thing? It's all the same cake.
SPEAKER_00
17:59 - 20:42
I think what I'm looking for are methods that make them look essentially like the same cake, but currently they're not. And the main issue with learning world models, or learning predictive models, is that the prediction is never a single thing, because the world is not entirely predictable. It may be deterministic or stochastic, we can get into the philosophical discussion about it, but even if it's deterministic, it's not entirely predictable. And so, if I play a short video clip and then I ask you to predict what's going to happen next, there are many, many plausible continuations for that video clip, and the number of continuations grows with the interval of time that you're asking the system to make a prediction for. And so one big question with self-supervised learning is how you represent this uncertainty, how you represent multiple discrete outcomes, how you represent a sort of continuum of possible outcomes, et cetera. And you know, if you are a sort of classical machine learning person, you say, oh, you just represent a distribution, right? And that we know how to do when we're predicting words, missing words in a text, because you can have a neural net give a score for every word in the dictionary. It's a big list of numbers, 100,000 or so. You can turn them into a probability distribution that tells you, when I say a sentence, the cat is chasing the blank in the kitchen, you know, there are only a few words that make sense there. You know, it could be a mouse, or it could be a laser spot, or something like that, right? And if I say the blank is chasing the blank in the savanna, you also have a bunch of plausible options for those two words, right? Because you have kind of an underlying reality that you can refer to to sort of fill in those blanks. You cannot say for sure, in the savanna, if it's a lion or a cheetah or whatever, you cannot know if it's a zebra or a wildebeest or whatever, but it will be that kind of thing. But you can represent the uncertainty by just a long list of numbers. If I do the same thing with video, when I ask you to predict a video clip, it's not a discrete set of potential frames. You have to have some way of representing an infinite number of plausible continuations of multiple frames in, you know, a high-dimensional continuous space. And we just have no idea how to do this properly.
SPEAKER_01
20:42 - 20:44
Uh, finite, high dimensional.
SPEAKER_00
20:44 - 20:47
So like you could find a high dimensional. Yes.
SPEAKER_01
20:47 - 21:17
Just like the words, they try to get it, uh, down to a small finite set, of, like, under a million, something like that. I mean, it's kind of ridiculous that we're doing a distribution over every single possible word for language, and it works. It feels like that's a really dumb way to do it. Like, there seems to be, like, there should be some more compressed representation of the distribution of the words.
SPEAKER_00
21:17 - 21:17
You're right about that.
SPEAKER_01
21:18 - 21:25
And so, I agree. Do you have any interesting ideas about how to represent all of reality in a compressed way, such that you can form a distribution over it?
SPEAKER_00
21:25 - 23:06
That's one of the big questions, you know, how do you do that? Right, I mean, another thing that really is stupid about, I shouldn't say stupid, but simplistic about current approaches to self-supervised learning in NLP, in text, is that not only do you represent a giant distribution over words, but for multiple words that are missing, those distributions are essentially independent of each other. And you don't pay too much of a price for this. So in the sentence that I gave earlier, if it gives a certain probability for lion and cheetah, and then a certain probability for, you know, gazelle, wildebeest, and zebra, those two probabilities are independent of each other. And it's not the case that those things are independent. Lions actually attack, like, bigger animals than cheetahs do. So, you know, there is a huge independence hypothesis in this process, which is not actually true. The reason for this is that we don't know how to represent properly distributions over combinatorial sequences of symbols, essentially, because the number grows exponentially with the length of the sequence. And so we have to use tricks for this, but those tricks don't really get around it, they don't even deal with it. So the big question is, would there be some sort of abstract latent representation of text that would say that when I switch lion for cheetah, I also have to switch zebra for gazelle?
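In symbols, the simplification he is pointing at: masked-word models score each blank separately, so the joint distribution over the two missing words is implicitly factorized, which is exactly what loses the lion-zebra versus cheetah-gazelle coupling. The notation below is illustrative.

```latex
% Independence assumption made by current masked-word prediction:
P(w_1, w_2 \mid \text{context}) \;\approx\; P(w_1 \mid \text{context}) \, P(w_2 \mid \text{context})
% What reality requires is a coupled joint distribution, e.g.
% P(\text{lion}, \text{zebra}) and P(\text{cheetah}, \text{gazelle}) high,
% P(\text{cheetah}, \text{zebra}) lower -- the factorized form cannot express this.
```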
SPEAKER_01
23:07 - 23:44
Yes, so this independence assumption. Let me throw some criticism at you that I often hear, and see how you respond. So this kind of filling in the blanks is just statistics. You're not learning anything, like the deep underlying concepts. You're just mimicking stuff from the past. You're not learning anything new such that you can use it to generalize about the world. Or, okay, let me just say the crude version, which is: it's just statistics, it's not intelligence. What do you have to say to that? What do you usually say to that if you kind of hear this kind of thing?
SPEAKER_00
23:44 - 23:55
I don't get into those discussions because they are kind of pointless. So first of all, it's quite possible that intelligence is just statistics. It's just statistics of a particular kind.
SPEAKER_01
23:55 - 24:01
Well, this is the philosophical question. Is it possible that intelligence is just statistics?
SPEAKER_00
24:03 - 25:31
But what kind of statistics? So if you're asking the question, do the models of the world that we learn have some notion of causality? Yes. So if the criticism comes from people who say, you know, current machine learning systems don't care about causality, which by the way is wrong, you know, I agree with them. You should, you know, your model of the world should have your actions as one of the inputs, and that will drive you to learn causal models of the world where you know what intervention in the world will cause what result. Or you can do this by observation of other agents acting in the world and observing the effect, other humans, for example. So I think at some level of description, intelligence is just statistics. But that doesn't mean you won't have models that have, you know, deep mechanistic explanations for what goes on. The question is how do you learn them? That's the question I'm interested in. Because, you know, a lot of people who actually voice their criticism say that those mechanistic models have to come from someplace else. They have to come from human designers, they have to come from, I don't know what. And obviously, we learn them. Or if we don't learn them as individuals, nature learned them for us through evolution. So regardless of what you think, those models have been learned somehow.
SPEAKER_01
25:32 - 26:57
So if you look at the human brain, just like when we humans introspect about how the brain works, it seems like when we think about what intelligence is, we think about the high-level stuff, like the models we've constructed, concepts, like in cognitive science, concepts of memory and reasoning modules, almost like these high-level modules. Is that a good analogy? Like, are we ignoring the dark matter, the basic low-level mechanisms, just like we ignore the way the operating system works when we're just using the high-level software? We're ignoring that at the low level, the neural network might be doing something like statistics, sorry to use this word probably incorrectly, doing this kind of filling-the-gap kind of learning. It's just kind of updating the model constantly in order to be able to predict the raw sensory information and adjust the model when the prediction is wrong. But, like, when we look at our brain at the high level, it feels like we're playing chess, like we're playing with high-level concepts, and we're stitching them together, and we're putting them into long-term memory. But really, what's going on underneath is something we're not able to introspect, which is this kind of simple, large neural network that's just filling in the gaps.
SPEAKER_00
26:58 - 29:52
Right. Well, okay, so there's a lot of questions and answers there. Okay, so first of all, there's a whole school of thought in neuroscience, computational neuroscience in particular, that likes the idea of predictive coding, which is really related to the idea I was talking about, of self-supervised learning. So everything is about prediction. The essence of intelligence is the ability to predict, and everything the brain does is trying to predict everything from everything else. Okay, and that's really sort of the underlying principle, if you want. Self-supervised learning is trying to kind of reproduce this idea of prediction as kind of an essential mechanism of task-independent learning, if you want. The next step is, what kind of intelligence are you interested in reproducing? And of course, we all think about trying to reproduce high-level cognitive processes in humans. But with machines, we're not even at the level of reproducing the learning processes of, you know, a cat brain. You know, the most intelligent of our intelligent systems don't have as much common sense as a house cat. So how is it that cats learn? And cats don't do a whole lot of reasoning. They certainly have causal models. They certainly have, because many cats can figure out how they can act on the world to get what they want. They certainly have a fantastic model of intuitive physics, certainly of the dynamics of their own bodies, but also of prey and things like that. So they're pretty smart. They only do this with about 800 million neurons. We are not anywhere close to reproducing this kind of thing. So to some extent, I could say, let's not even worry about the high-level cognition and, you know, long-term planning and reasoning that humans can do until we figure out, like, you know, can we even reproduce what cats are doing? Now that said, this ability to learn world models, I think, is the key to the possibility of learning machines that can also reason. So whenever I give a talk, I say there are three main challenges in machine learning. The first one is getting machines to learn to represent the world, and I'm proposing self-supervised learning for that. The second is getting machines to reason in ways that are compatible with gradient-based learning, because this is what deep learning is all about, really. And the third one is something we have no idea how to solve, at least I have no idea how to solve: can we get machines to learn hierarchical representations of action plans? You know, we know how to train them to learn hierarchical representations of perception, you know, with convolutional nets and things like that, and transformers. But what about action plans? Can we get them to spontaneously learn good hierarchical representations of actions?
SPEAKER_01
29:52 - 29:53
Also, gradient based.
SPEAKER_00
29:54 - 30:04
Yeah, all of that needs to be somewhat differentiable so that you can apply sort of gradient-based learning, which is really what deep learning is about.
SPEAKER_01
30:04 - 30:23
So it's background knowledge, the ability to reason in a way that's differentiable, that is somehow connected, deeply integrated with that background knowledge, or builds on top of that background knowledge. And then, given that background knowledge, be able to make hierarchical plans in the world.
SPEAKER_00
30:24 - 34:03
So if you take classical optimal control, there's something in classical optimal control called model predictive control. And it's been around since the early 60s, and it's been used, for example, to compute trajectories of rockets. And the basic idea is that you have a predictive model of the rocket, let's say, or whatever system you intend to control, which, given the state of the system at time t, and given an action that you're taking on the system, so for a rocket it would be the thrust and, you know, all the controls you can have, it gives you the state of the system at time t plus delta t, right? So basically a differential equation, something like that. And if you have this model, and you have this model in the form of some sort of neural net, or some sort of set of formulas that you can backpropagate gradients through, you can do what's called model predictive control, or gradient-based model predictive control. So you can unroll that model in time, you feed it a hypothesized sequence of actions, and then you have some objective function that measures how well, at the end of the trajectory, the system has succeeded or matched what you wanted to do. If it's a robot arm, have you grasped the object you want to grasp? If it's a rocket, are you at the right place near the space station? Things like that. And by backpropagation through time, and again, this was invented in the 1960s by optimal control theorists, you can figure out what is the optimal sequence of actions that will get my system to the best final state. That's a form of reasoning. It's basically planning, and a lot of planning systems in robotics are actually based on this. And you can think of this as a form of reasoning. So to take the example of the teenager driving a car again, you have a pretty good dynamical model of the car. It doesn't need to be very accurate. But you know that if you turn the wheel to the right, and there is a cliff, you're going to run off the cliff. You don't need to have a very accurate model to predict that. And you can run this in your mind and decide not to do it for that reason, because you can predict in advance that the result is going to be bad. So you can sort of imagine different scenarios and then employ, or take, the first step in the scenario that is most favorable, and then repeat the process of planning. That's called receding horizon model predictive control. So all those things have names, going back decades. Now, in classical optimal control, the model of the world is not generally learned. There are sometimes a few parameters you have to identify, that's called system identification. But generally, the model is mostly deterministic and mostly built by hand. So the big question of AI, I think the big challenge of AI for the next decade, is how we get machines to learn predictive models of the world that deal with uncertainty and deal with the real world in all its complexity. So it's not just the trajectory of a rocket, which you can reduce to first principles. It's not even just the trajectory of a robot arm, which again you can model with careful mathematics. But it's everything else, everything we observe in the world, you know, people's behavior, physical systems that involve collective phenomena, like water, or trees and branches in a tree or something, or complex things that humans have no trouble developing abstract representations and pretty clear models for, but we still don't know how to do with machines.
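A minimal sketch, assuming PyTorch, of the gradient-based model-predictive control loop he describes: unroll a differentiable dynamics model over a candidate action sequence, score the final state, backpropagate through time to improve the actions, and in receding-horizon fashion execute only the first action before replanning. The dynamics and cost functions here are toy placeholders, not any real system.

```python
# Gradient-based model predictive control, in toy form.
import torch

def dynamics(state, action):            # x_{t+1} = f(x_t, a_t); a stand-in linear model
    return state + 0.1 * action

def cost(state, goal):                  # how far the final state is from what we wanted
    return ((state - goal) ** 2).sum()

def plan(state, goal, horizon=20, steps=100, lr=0.1):
    actions = torch.zeros(horizon, state.shape[-1], requires_grad=True)
    opt = torch.optim.SGD([actions], lr=lr)
    for _ in range(steps):
        x = state
        for t in range(horizon):        # unroll the model through time
            x = dynamics(x, actions[t])
        loss = cost(x, goal)
        opt.zero_grad(); loss.backward(); opt.step()   # backpropagation through time
    # receding horizon: execute only the first action, then replan from the new state
    return actions.detach()[0]

first_action = plan(torch.zeros(2), torch.tensor([1.0, -1.0]))
```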
SPEAKER_01
34:03 - 34:28
Where do you put, in these three, maybe in the planning stage, the game-theoretic nature of this world, where your actions not only respond to the dynamic nature of the world, the environment, but also affect it? So if there's other humans involved, is this point number four, or is it somehow integrated into the hierarchical representation of action, in your view?
SPEAKER_00
34:28 - 34:42
I think it's integrated. It's just that now your model of the world has to deal with it, it just makes it more complicated. The fact that humans are complicated and not easily predictable makes your model of the world much more complicated, that much more complicated.
SPEAKER_01
34:43 - 35:51
Well, there's a chess, I mean, I suppose chess is an analogy. So, Monte Carlo tree search. I mean, there's a, I go, you go, I go, you go. Like, Andrej Karpathy recently gave a talk at MIT about car doors. I think there was some machine learning too, but mostly car doors. And there's a dynamic nature to the car, like the person opening the door, but he wasn't talking about that. He was talking about the perception problem, of the ontology of what defines a car door, this big philosophical question. But to me it was interesting because, like, it's obvious that the person opening the car door, they're trying to get out, like here in New York, trying to get out of the car. You're slowing down, it's going to signal something, you're speeding up, it's going to signal something, and that's a dance. It's an asynchronous chess game, I don't know. So it feels like it's not just, I mean, I guess you can integrate all of them into one giant model, like the entirety of these little interactions, because it's not as complicated as chess, it's just like a little dance. We do like a little dance together, and then we figure it out.
SPEAKER_00
35:52 - 38:31
Well, in some ways, it's way more complicated than chess, because it's continuous, it's uncertain in a continuous manner. It doesn't feel more complicated, but it doesn't feel more complicated because that's what we've evolved to solve. This is the kind of problem we've evolved to solve, and so we're good at it, because, you know, nature has made us good at it. Nature has not made us good at chess. We completely suck at chess. In fact, that's why we designed it as a game, to be challenging. And if there is something that, you know, recent progress in chess and Go has made us realize, it is that humans are terrible at those things. Like, really bad. You know, there was a story, right before AlphaGo, that, you know, the best Go players thought they were maybe two or three stones behind, you know, an ideal player that they would call God. In fact, no, they are like nine or ten stones behind. We're just bad. So we're not good at it because we have limited working memory. We're not very good at doing this tree exploration that computers are much better at doing than we are. But we are much better at learning differentiable models of the world. I mean, I said differentiable, I should say, not differentiable in the sense that, you know, we run backprop through it, but in the sense that our brain has some mechanism for estimating gradients of some kind, and that's what makes us efficient. So if you have an agent that consists of a model of the world, which, you know, in the human brain is basically the entire front half of your brain, an objective function, which in humans is a combination of two things. There is your intrinsic motivation module, which is the basal ganglia. That's the thing that measures pain and hunger and things like that, like immediate feelings and emotions. And then there is, you know, the equivalent of what people in reinforcement learning call a critic, which is a sort of module that predicts ahead what the outcome of a situation will be. So it's not an objective function itself, but it's sort of, you know, a trained predictor of the ultimate objective function. And that also is differentiable. And so if all of this is differentiable, your cost function, your critic, your, you know, your world model, then you can use gradient-based type methods to do planning, to do reasoning, to do learning, you know, to do all the things that we'd like an intelligent agent
SPEAKER_01
38:33 - 38:47
to do. And gradient-based learning, like, what's your intuition, that's probably at the core of what can solve intelligence? So you don't need, like, logic-based reasoning, in your view?
SPEAKER_00
38:47 - 40:10
I don't know how to make logic-based reasoning compatible with efficient learning. Okay, I mean, there is a big question, perhaps: first, does the brain optimize any kind of objective function at all? Second, if it does optimize an objective function, does it do it by some sort of gradient estimation? It doesn't need to be backprop, but some way of estimating the gradient in an efficient manner, whose complexity is on the same order of magnitude as, you know, actually running the inference. Because you can't afford to do things like, you know, perturbing a weight in your brain to figure out what the effect is, and then sort of, you know, estimating the gradient by perturbation. To me, it seems very implausible that the brain uses some sort of, you know, zeroth-order, black-box, gradient-free optimization, because it's so much less efficient than gradient optimization. So it has to have a way of estimating gradients.
SPEAKER_01
40:11 - 40:35
Is it possible that some kind of logic-based reasoning emerges in pockets? As a useful, like you said, if the brain optimizes an objective function, maybe there's some mechanism for creating objective functions, a mechanism for creating knowledge bases, for example, that can then be queried. Like maybe an efficient representation of knowledge that's learned in a gradient-based way, or something like that.
SPEAKER_00
40:35 - 40:53
Well, so I think there are a lot of different types of intelligence. So first of all, I think the type of logical reasoning that we think about, that is, you know, maybe stemming from, you know, sort of classical AI of the 1970s and 80s, I think humans use that relatively rarely.
SPEAKER_01
40:55 - 41:03
And are not particularly good at it, but we judge each other based on our ability to solve those rare problems. It's called, like, IQ tests.
SPEAKER_00
41:03 - 41:07
Think so. Like, I'm not very good at chess.
SPEAKER_01
41:07 - 41:09
Yes, I'm judging you this whole time.
SPEAKER_00
41:09 - 41:15
Well, actually, with your, with your, you know, heritage, I'm sure you're good at chess.
SPEAKER_01
41:15 - 41:18
So stereotypes, not all stereotypes that you're
SPEAKER_00
41:20 - 42:47
Well, I'm terrible at chess. But I think perhaps another type of intelligence that we have is this ability of building models of the world from, reasoning obviously, but also from data. And those models generally are more analogical. So it's reasoning by simulation and by analogy, where you use one model and apply it to a new situation, even though you've never seen that situation. You can sort of connect it to a situation you encountered before, and your reasoning is more akin to some sort of internal simulation. So you're kind of simulating what's happening when you're building, I don't know, a box out of wood or something, right? You can imagine in advance what would be the result of cutting the wood in this particular way. Are you going to use screws or nails, or whatever? When you are interacting with someone, you also have a model of that person, and you sort of interact with that person, you know, having this model in mind, to kind of tell the person what you think is useful to them. So I think this ability to construct models of the world is basically the essence of intelligence, and the ability to use it then to plan actions that will fulfill a particular criterion, of course, is necessary as well.
SPEAKER_01
42:47 - 43:47
So I'm going to ask you a series of impossible questions, as I've been doing. So if that's the fundamental sort of dark matter of intelligence, this ability to form a background model, what's your intuition about how much knowledge is required? You know, with dark matter, you can put a percentage on the composition of the universe, how much of it is dark matter, how much of it is dark energy. How much information do you think is required to be a house cat? So you have to be able to, when you see a box, go in it, when you see a human, compute the most evil action. If there's a thing that's near an edge, you knock it off. All of that, plus the extra stuff you mentioned, which is a great self-awareness of the physics of your own body in the world. How much knowledge is required, do you think, to solve it? I don't even know how to measure an answer to that question.
SPEAKER_00
43:47 - 44:57
I'm not sure how to measure it, but whatever it is, it fits in about 800 million neurons, or the representation of it does. Everything, all the knowledge, everything, right? It's less than a billion. A dog is two billion, but a cat is less than one billion. And so multiply that by a thousand and you get the number of synapses. And I think almost all of it is learned through this, you know, sort of self-supervised learning. Although, you know, I think a tiny sliver is learned through reinforcement learning, and certainly very little through classical supervised learning, although it's not even clear how supervised learning actually works in the biological world. So I think almost all of it is self-supervised learning. But it's driven by the sort of ingrained objective functions that a cat or a human has at the base of their brain, which kind of drives their behavior. So nature tells us you're hungry. It doesn't tell us how to feed ourselves. That's something that the rest of our brain has to figure out.
SPEAKER_01
44:57 - 45:26
What's interesting is that there might be deeper objective functions than that driving the whole thing. So hunger may be some kind of, you know, if you go to, like, neurobiology, it might be just the brain trying to maintain homeostasis. So hunger is just one of the human-perceivable symptoms of the brain being unhappy with the way things are currently. It could be just, like, one really dumb objective function at the core.
SPEAKER_00
45:26 - 47:11
But that's how behavior is driven. The fact that the basal ganglia drives us to do things that are different from, say, an orangutan, or certainly a cat, is what makes human nature versus orangutan nature versus cat nature. So for example, our basal ganglia drives us to seek the company of other humans. And that's because nature has figured out that we need to be social animals for our species to survive, and it's true of many primates. It's not true of orangutans. Orangutans are solitary animals. They don't seek the company of others. In fact, they avoid them. In fact, they scream at them when they come too close, because they are territorial. Because for their survival, you know, evolution has figured out that's the best thing. I mean, they're occasionally social, of course, for, you know, reproduction and stuff like that, but they're mostly solitary. So all of those behaviors are not part of intelligence. You know, people say, oh, you're never going to have intelligent machines, because, you know, human intelligence is social. But then you look at orangutans, you look at octopuses. Octopuses never know their parents. They barely interact with any others. And they get to be really smart in less than a year, like half a year. You know, by the time they're adults, and in two years, they're dead. So there are things that we think as humans are intimately linked with intelligence, like social interaction, like language. I think we give way too much importance to language as a substrate of intelligence, as humans, because we think our reasoning is so linked with language.
SPEAKER_01
47:11 - 47:27
So to just solve the house cat intelligence problem, you think you could do it on a desert island? You could pretty much do it with just a few things. You could just have a cat sitting there, looking at the waves, at the ocean waves, and figure a lot of it out.
SPEAKER_00
47:27 - 48:01
It needs to have sort of, you know, the right set of drives to kind of, you know, get it to do the thing and learn the appropriate things, right? Like, for example, you know, baby humans are driven to, they long to stand up and walk, okay? That desire is hard-wired. How to do it precisely is not. That's learned. But the desire to walk, move around, and stand up, that's sort of, you know, it's very simple to hardwire this kind of stuff.
SPEAKER_01
48:01 - 48:17
Oh, like the desire to, well, that's interesting. You're hardwired to want to walk? That's not, uh, there's got to be a deeper need for walking. I think it was probably socially imposed by society, that you need to walk, because all the others are bipedal.
SPEAKER_00
48:17 - 48:25
No. You can see that with animals that, you know, probably walk without ever watching any other members of the species.
SPEAKER_01
48:25 - 48:36
It seems like a scary thing to have to do, because you suck at bipedal walking at first. It seems crawling is much safer, much more, like, why are you in a hurry?
SPEAKER_00
48:37 - 48:49
Well, because you have this thing that drives you to do it, you know, which is sort of part of human development. Is that understood, actually? Not entirely.
SPEAKER_01
48:49 - 48:54
No, what is, what's the reason to get on two feet? It's really hard. Like, most animals don't get on two feet.
SPEAKER_00
48:54 - 48:59
Well, they get on four feet, you know, many mammals get on four feet. And very quickly, some of them extremely quickly.
SPEAKER_01
49:00 - 49:08
But I don't, you know, like, from the last time I've interacted with a table, that's much more stable than a thing on two legs. It's just a really hard problem.
SPEAKER_00
49:08 - 49:11
Yeah, I mean, birds have figured it out with two feet.
SPEAKER_01
49:11 - 49:18
Technically, we can go into ontology. They have four. I guess they have two feet. They have two feet, chickens.
SPEAKER_00
49:18 - 49:20
You know, dinosaurs were on two feet. Many of them.
SPEAKER_01
49:20 - 49:45
Allegedly. I'm just now learning that T-Rex was eating grass, not other animals. T-Rex might have been a friendly pet. What do you think about, I don't know if you've looked at the test for general intelligence that François Chollet put together. I don't know if you got a chance to look at that kind of thing. What's your intuition about how to solve an IQ type of test?
SPEAKER_00
49:45 - 49:52
I don't know, I think it's so outside of my radar screen that it's not really relevant, I think, in the short term.
SPEAKER_01
49:52 - 50:04
Well, I guess one way to ask it, another way, perhaps closer to what you work on, is like, how do you solve MNIST with very little example data?
SPEAKER_00
50:04 - 51:24
Right. And the answer to this is probably, sort of, pre-training. Just learn to represent images, and then learning, you know, to recognize handwritten digits on top of this will only require a few samples. And we observe this in humans, right? You show a young child a picture book with a couple of pictures of an elephant, and that's it. The child knows what an elephant is. And we see this today with practical systems, where we train image recognition systems with an enormous amount of images that are not completely supervised, only very weakly supervised. For example, you can train a neural net to predict whatever hashtags people type on Instagram. You can do this with billions of images, because there's billions per day that are showing up. So the amount of training data is essentially unlimited. And then you take the output representation, you know, a couple of layers down from the output, of what the system learned, and feed this as input to a classifier for any object in the world that you want, and it works really well. So that's transfer learning, okay, or weakly supervised transfer learning. People are making very fast progress using self-supervised learning for this kind of scenario as well. And, you know, my guess is that that's going to be the future.
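A minimal sketch, assuming PyTorch and torchvision, of the transfer-learning recipe he describes: take representations a couple of layers down from the output of a pre-trained network, freeze them, and train a small classifier on a handful of labeled examples. The backbone choice, weight string, class count, and data loader are illustrative placeholders, not the actual Instagram-hashtag system.

```python
# Weakly/pre-trained representation + small supervised head (transfer learning).
import torch
import torch.nn as nn
import torchvision

backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")  # stand-in pre-trained network
backbone.fc = nn.Identity()            # keep features from just below the output layer
for p in backbone.parameters():
    p.requires_grad = False            # freeze the pre-trained representation

classifier = nn.Linear(2048, 10)       # tiny head for a hypothetical 10-class downstream task
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)

for images, labels in small_labeled_loader:   # placeholder: only a few labeled samples per class
    with torch.no_grad():
        feats = backbone(images)               # reuse the learned representation
    loss = nn.functional.cross_entropy(classifier(feats), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```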
SPEAKER_01
51:24 - 51:54
For self-supervised learning, how much cleaning do you think is needed for filtering out malicious signal, or whatever the better term is? Because, like, a lot of people use hashtags on Instagram to get, like, good SEO, that doesn't fully represent the contents of the image. Like, they'll put a picture of a cat and hashtag it with, like, science, awesome, fun, I don't know. Why would you put science? That's not very good SEO.
SPEAKER_00
51:54 - 52:18
The way my colleagues who worked on this project at Facebook, now Meta, Meta AI, a few years ago dealt with this is they only selected something like 17,000 tags that correspond to kind of physical things or situations, that have some visual content. So you wouldn't have, like, hashtag TBT or anything like that.
SPEAKER_01
52:19 - 52:24
Also, they keep a very select set of hashtags.
SPEAKER_00
52:24 - 52:31
It's still on the order of 10 to 20,000, so it's fairly large.
SPEAKER_01
52:31 - 52:42
Can you tell me about data augmentation? What the heck is data augmentation, and how is it used, maybe in contrastive learning, for video? What are some cool ideas here?
SPEAKER_00
52:42 - 56:39
Right. So data augmentation. First, data augmentation is the idea of artificially increasing the size of your training set by distorting the images that you have in ways that don't change the nature of the image, right? So you take, say, MNIST. You can do data augmentation on MNIST, and people have done this since the 1990s. So you take an MNIST digit and you shift it a little bit, or you change the size, or rotate it, skew it, et cetera. And it works better. If you train a supervised classifier with augmented data, you're going to get better results. Now, it's become really interesting over the last couple of years, because a lot of self-supervised learning techniques to pre-train vision systems are based on data augmentation. And the basic technique is originally inspired by techniques that I worked on in the early nineties, and that Geoff Hinton worked on in the early nineties, there was sort of parallel work. I used to call this Siamese networks. So basically, you take two identical copies of the same network, they share the same weights, and you show two different views of the same object. Either those two different views may have been obtained by data augmentation, or maybe it's two different views of the same scene from a camera that you moved, or at different times, or something like that, or two pictures of the same person, things like that. And then you train this neural net, those two identical copies of this neural net, to produce an output representation vector in such a way that the representations for those two images are as close to each other as possible, as identical to each other as possible, right? Because you want the system to basically learn a function that will be invariant, that will not change, whose output will not change, when you transform those inputs in those particular ways, right? So that's easy to do. What's complicated is how do you make sure that when you show two images that are different, the system will produce different representations? Because if you don't have a specific provision for this, the system will just ignore the inputs. When you train it, it will end up ignoring the input and just produce a constant vector that is the same for every input. That's called a collapse. Now, how do you avoid collapse? So there are two ideas. One idea that I proposed in the early 90s with my colleagues at Bell Labs, Jane Bromley and a couple of other people, which we now call contrastive learning, is to have negative examples, right? So you have pairs of images that you know are different, and you show them to the network, to those two copies, and then you push the two output vectors away from each other. And that will eventually guarantee that things that are semantically similar produce similar representations, and things that are different produce different representations. We actually came up with this idea for a project of doing signature verification. So we would collect multiple signatures from the same person and then train on that to produce the same representation, and then force the system to produce different representations for different signatures. This was actually a problem proposed by people from what was a subsidiary of AT&T at the time, called NCR. And they were interested in storing a representation of the signature in the 80 bytes of the magnetic strip of a credit card. So we came up with this idea of having neural networks with 80 outputs, 
you know, that we would quantize to bytes so that we could encode the signature. And that actually was then used to check whether the signature matches or not. So then your signature would run through the neural net, and then you would compare the output vector to whatever is stored on your card. And it actually worked. It worked, but they ended up not using it, because nobody cares, actually. I mean, the American financial payment system is incredibly lagging in that respect compared to Europe.
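A minimal sketch, assuming PyTorch, of the Siamese-network contrastive setup he describes: one encoder applied to both inputs (so the weights are shared by construction), a pull term for matching pairs, and a margin-based push term for negative pairs. The architecture, input size, and margin value are illustrative; the 80-dimensional output is just a nod to the 80-byte anecdote.

```python
# Siamese network with a contrastive (positive/negative pair) loss.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 80))

def contrastive_loss(x1, x2, same: torch.Tensor, margin: float = 1.0):
    # same = 1.0 for pairs from the same person/class, 0.0 for negative pairs
    z1, z2 = encoder(x1), encoder(x2)          # same module applied twice: shared weights
    dist = nn.functional.pairwise_distance(z1, z2)
    pull = same * dist.pow(2)                                       # positives: pull together
    push = (1 - same) * torch.clamp(margin - dist, min=0).pow(2)    # negatives: push apart
    return (pull + push).mean()
```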
SPEAKER_01
56:39 - 56:43
Oh, with the signatures. What's the purpose of signatures anyway? This is very.
SPEAKER_00
56:43 - 59:06
Everybody looks at them, nobody cares. Yeah. So that's contrastive learning. Right, so you need positive and negative pairs. And the problem with that is that, you know, even though I had the original paper on this, I'm actually not very positive about it, because it doesn't work in high dimension. If your representation is high-dimensional, there are just too many ways for two things to be different. And so you would need lots and lots and lots of negative pairs. So there is a particular implementation of this, which is relatively recent, from actually the Google Toronto group, where Geoff Hinton is the senior member there, and it's called SimCLR, S-I-M-C-L-R. It's basically a particular way of implementing this idea of contrastive learning, with a particular objective function. Now, what I'm much more enthusiastic about these days is non-contrastive methods. So other ways to guarantee that the representations will be different for different inputs. And it's actually based on an idea that Geoff Hinton proposed in the early 90s with his student at the time, Sue Becker. And it's based on the idea of maximizing the mutual information between the outputs of the two systems. You only show positive pairs. You only show pairs of images that you know are somewhat similar. And you train the networks to be informative, but also to be as informative of each other as possible. So basically, one representation has to be predictable from the other, essentially. He proposed that idea, had a couple of papers in the early 90s, and then nothing was done about it for decades. And I kind of revived this idea together with my postdocs at FAIR, particularly a postdoc called Stéphane Deny, who is now a junior professor in Finland, at Aalto University. We came up with something called Barlow Twins, which is a particular way of maximizing the information content of a vector using some hypotheses. And we have kind of another version of it that's more recent, called VICReg, V-I-C-R-E-G. That means variance-invariance-covariance regularization. And it's the thing I'm the most excited about in machine learning in the last 15 years. I mean, I'm really, really excited about this.
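A rough sketch, assuming PyTorch, of what a VICReg-style objective looks like: an invariance term on positive pairs, plus variance and covariance regularizers that prevent collapse without any negative pairs. The coefficients and epsilon are illustrative values, not necessarily the published ones.

```python
# Non-contrastive VICReg-style loss on two embeddings z1, z2 of the same image.
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, inv_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    # invariance: the two views of the same image should map to similar vectors
    inv = F.mse_loss(z1, z2)

    # variance: each embedding dimension should keep a minimum spread across the batch
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    var = torch.relu(1.0 - std1).mean() + torch.relu(1.0 - std2).mean()

    # covariance: decorrelate the dimensions so they don't carry redundant information
    def off_diag_cov(z):
        z = z - z.mean(dim=0)
        c = (z.T @ z) / (z.shape[0] - 1)
        return (c - torch.diag(torch.diag(c))).pow(2).sum() / z.shape[1]
    cov = off_diag_cov(z1) + off_diag_cov(z2)

    return inv_w * inv + var_w * var + cov_w * cov
```

The key design point, per the discussion above, is that only positive pairs are needed: collapse is prevented by the variance and covariance terms rather than by pushing negative pairs apart.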
SPEAKER_01
59:06 - 59:21
What kind of data augmentation is useful for that non-contrastive learning method? Are we talking about, does that not matter that much, or is it a very important part of the step? Are you generating the images that are similar but sufficiently different?
SPEAKER_00
59:21 - 01:01:19
Yeah, that's right. It's an important step, and it's also an annoying step, because you need to have the knowledge of which data augmentations you can apply that do not change the nature of the object. The standard scenario, which a lot of people working in this area are using, is a set of distortions. Basically you do geometric distortions: one just shifts the image a little bit, which is called cropping; another one changes the scale a little bit; another one rotates it; another one changes the colors, like a shift in color balance or saturation; another one blurs it; another one adds noise. So you have a catalog of standard things, and people try to use the same ones for different algorithms so that they can compare. But some self-supervised algorithms can actually deal with much more aggressive data augmentation and some can't, so that kind of makes the whole thing difficult. But that's the kind of distortions we're talking about. So you train with those distortions, and then you chop off the last layer or couple of layers of the network, you use the representation as input to a classifier, you train the classifier on ImageNet, let's say, or whatever, and measure the performance. Interestingly enough, these methods are really good at eliminating the information that is irrelevant, which is the distortion between those images. They do too good a job of eliminating it. As a consequence, you cannot use the representations from those systems for things like object detection and localization, because that information is gone. So the type of data augmentation you need depends on the task you eventually want the system to solve, and the standard data augmentations that we use today are only appropriate for object recognition or image classification. They're not appropriate for things like...
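The kind of standard augmentation catalog described above, written as a torchvision pipeline. This is a common recipe, not the exact one used by any particular self-supervised method, and the parameter values are illustrative.

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),           # shift / rescale ("cropping")
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),  # color balance / saturation shifts
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),     # blur
    transforms.ToTensor(),
])

# Two "views" of the same image, fed to the two copies of the network:
# view1, view2 = augment(img), augment(img)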
SPEAKER_01
01:01:19 - 01:01:29
Can you help me understand why? With the localization, you're saying it's just not good at finding where things are, so that's why it can't be used for localization?
SPEAKER_00
01:01:30 - 01:01:49
No, it's just that you train the system by giving it an image and the same image shifted and scaled, and you tell it that's the same image. So the system is basically trained to eliminate the information about position and size. So now you can't use that representation for things like where an object is or what size it is.
SPEAKER_01
01:01:49 - 01:02:25
Like a bounding box, to be able to actually... Okay, so it can still find the object in the image, it's just not very good at finding the exact boundaries of that object. Interesting. Which raises an interesting sort of philosophical question: how important is object localization anyway? We're obsessed with measuring image segmentation, obsessed with perfectly knowing the boundaries of objects, when arguably that's not that essential to understanding the contents of the scene.
SPEAKER_00
01:02:25 - 01:03:03
On the other hand, I think evolutionarily, the first vision systems in animals were basically all about localization, very little about recognition. And in the human brain, you have two separate pathways for recognizing the nature of a scene or an object and for localizing objects. You use the first pathway, called the ventral pathway, for telling what you're looking at. The other pathway, the dorsal pathway, is used for navigation, for grasping, for everything else. And basically, a lot of the things you need for survival are localization and detection.
SPEAKER_01
01:03:03 - 01:03:17
Is similarity learning, or contrastive learning with these non-contrastive methods, the same as understanding something? Just because you know a distorted cat is the same as a non-distorted cat, does that mean you understand what it means to be a cat?
SPEAKER_00
01:03:18 - 01:03:22
To some extent, I mean, it's a superficial understanding, obviously.
SPEAKER_01
01:03:22 - 01:03:31
But like, what is the ceiling of this method, do you think? Is this just one trick on the path to doing self-supervised learning? Can we go really, really far?
SPEAKER_00
01:03:32 - 01:05:23
I think we can go really far. If we figure out how to use techniques of that type, perhaps very different in the details, to train a system from video to do video prediction, essentially, I think we'll have a path towards, I wouldn't say unlimited, but a path towards some level of physical common sense in machines. And I also think that the ability to learn how the world works from a high-throughput channel like vision is a necessary step towards real artificial intelligence. In other words, I believe in grounded intelligence. I don't think we can train a machine to be intelligent purely from text, because I think the amount of information about the world that's contained in text is tiny compared to what we need to know. People have attempted to do this for 30 years, right? The Cyc project and things like that: basically writing down all the facts that are known and hoping that some sort of common sense will emerge. I think it's basically hopeless. But let me take an example. I describe a situation to you: I take an object, I put it on the table, and I push the table. It's completely obvious to you that the object will be pushed with the table, right? Because it's sitting on it. There is no text in the world, I believe, that explains this. And so if you train a machine, as powerful as it could be, your GPT-5000 or whatever it is, it's never going to learn about this. That information is just not present in any text.
SPEAKER_01
01:05:23 - 01:05:50
Well, with the Cyc project, the dream, I think, is to have, like, 10 million facts like that that give you a head start, like a parent guiding you. Now, we humans don't need a parent to tell us that the table will move, sorry, that the smartphone will move with the table. But we get a lot of guidance in other ways. So it's possible that we can give it a quick shortcut.
SPEAKER_00
01:05:50 - 01:05:53
What about a cat? A cat knows that.
SPEAKER_01
01:05:53 - 01:07:01
No, but they evolved that. So no, they learned it, like us, the physics of stuff? Yeah. Well, so you're putting a lot of the intelligence onto the nurture side, not the nature side. Yes. We seem to have... there's a very inefficient, arguably, process of evolution that got us from bacteria to who we are today. Started from the bottom, now we're here. The question is: how fundamental is the nature part, the hardware? Is there any way to shortcut it if it's fundamental? And if it's not, if most of intelligence, most of the cool stuff we've been talking about, is mostly nurture, mostly trained, figured out by observing the world, so that we can form that big, beautiful, sexy background model you're talking about just by sitting there, then okay, maybe it is self-supervised learning all the way down.
SPEAKER_00
01:07:01 - 01:07:37
Whatever it is that makes human intelligence different from other animals, which a lot of people think is language and logical reasoning and that kind of stuff, it cannot be that complicated, because it only popped up in the last million years. Yeah. And it only involves less than one percent of the genome, which is the difference between the human genome and chimps', or whatever. So it can't be that complicated, it can't be that fundamental. Most of the complicated stuff already exists in cats and dogs and certainly in primates, nonhuman primates.
SPEAKER_01
01:07:39 - 01:07:55
Yeah, that little extra thing with humans might be just something about social interaction and the ability to maintain ideas across a collective of people. It sounds very dramatic and very impressive, but it probably isn't, mechanistically speaking.
SPEAKER_00
01:07:55 - 01:07:59
It is, but we're not there yet. I mean, this is number 634 in the list of problems we have to solve.
SPEAKER_01
01:08:05 - 01:08:42
So basic physics of the world is number one. Just a quick tangent on data augmentation: a lot of it is hard-coded versus learned. Do you have any intuition that maybe there could be some weird data augmentation, like a generative type of data augmentation, doing something weird to images which then improves the similarity learning process? So not just dumb, simple distortions. But you're shaking your head, maybe saying that even simple distortions are enough.
SPEAKER_00
01:08:42 - 01:09:34
No, I think data augmentation is a temporary, necessary evil. So what people are working on now is two things. One is trying to translate to images the type of self-supervised learning people use in language, which is basically a denoising autoencoder method. So you take an image, you mask some parts of it, and then you train some giant neural net to reconstruct the parts that are missing. Until very recently, there were no working methods for that: all the autoencoder-type methods for images weren't producing very good representations. But there's a paper now coming out of the FAIR group that actually works very well. So that doesn't require data augmentation, it requires only masking.
SPEAKER_01
01:09:37 - 01:09:40
Only masking? For images?
SPEAKER_00
01:09:40 - 01:09:55
Right. So you mask part of the image and you train a system, which in this case is a transformer, because the transformer represents the image as non-overlapping patches, so it's easy to mask patches and things like that.
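A toy sketch of the masking step for the masked-autoencoder idea discussed here: split the image into non-overlapping patches and hide a random subset. The patch size and mask ratio are illustrative assumptions.

import torch

def random_mask_patches(img, patch=16, mask_ratio=0.75):
    # img: (C, H, W). Returns the visible patches plus the kept / masked indices.
    C, H, W = img.shape
    patches = img.unfold(1, patch, patch).unfold(2, patch, patch)  # (C, H/p, W/p, p, p)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, C * patch * patch)
    perm = torch.randperm(patches.shape[0])
    n_keep = int(patches.shape[0] * (1 - mask_ratio))
    keep_idx, mask_idx = perm[:n_keep], perm[n_keep:]
    return patches[keep_idx], keep_idx, mask_idx

visible, keep_idx, mask_idx = random_mask_patches(torch.randn(3, 224, 224))
# A transformer encoder would see only `visible`; a decoder is trained to
# reconstruct the masked patches from it.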
SPEAKER_01
01:09:55 - 01:10:01
Okay, then my question transfers to that problem, the masking: why should the mask be a square or a rectangle?
SPEAKER_00
01:10:02 - 01:10:12
It doesn't matter much. I think we're probably going to come up in the future with ways to mask that are kind of random, essentially.
SPEAKER_01
01:10:12 - 01:10:37
I mean, they are random already. But no, I mean something that's challenging, like optimally challenging. So maybe it's a metaphor that doesn't apply, but it seems like with data augmentation or masking there's an interactive element to it. You're almost playing with an image, the way we play with an image in our minds.
SPEAKER_00
01:10:37 - 01:12:13
Now, it's like dropout. It's like Boltzmann machine training. Every time you see a percept, you perturb it in some way, and then the principle of the training procedure is to minimize the difference of the output, or the representation, between the clean version and the corrupted version, essentially. And you can do this in real time. Boltzmann machines work like this: you show a percept, you tell the machine that's a good configuration of activities of your input neurons, and then you either let the rest of the neurons go their merry way without clamping them to values, or you only clamp a subset of them. And what you're doing is training the system so that the stable state of the entire network is the same regardless of whether it sees the entire input or only part of it. Denoising autoencoder methods are basically the same thing: you're training a system to reproduce the input, to complete inputs, filling in the blanks, regardless of which parts are missing. That's really the underlying principle. And you could imagine even in the brain some sort of similar principle, where neurons kind of oscillate: they detect the activity, and then temporarily they kind of shut off to force the rest of the system to reconstruct the input without their help. You can imagine more or less biologically plausible processes of that kind.
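A minimal sketch of that clean-versus-corrupted principle, in PyTorch. The encoder, the corruption, and the use of a plain MSE target are placeholders for illustration, not any specific published method.

import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))

def corrupt(x, drop_prob=0.5):
    # Randomly zero out parts of the input, like masking or dropout on the percept.
    return x * (torch.rand_like(x) > drop_prob).float()

x = torch.randn(32, 784)          # a batch of "percepts"
with torch.no_grad():
    target = encoder(x)           # representation of the clean input (held fixed here)
pred = encoder(corrupt(x))        # representation of the corrupted input
loss = F.mse_loss(pred, target)   # make the two representations agree
loss.backward()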
SPEAKER_01
01:12:13 - 01:12:39
Some of that. And I guess with this denoising autoencoder and masking and data augmentation, you don't have to worry about being super efficient. You can just do as much as you want and get better over time. Because I was thinking you might want to be clever about the way you do all these procedures, but that's only if it's somehow costly to do every iteration, and it's not really.
SPEAKER_00
01:12:42 - 01:13:09
And then there is data augmentation without explicit data augmentation, which is data augmentation by waiting. That's the sort of video prediction: you observe a video clip, you observe the continuation of that video clip, and you try to learn a representation, using the joint embedding architectures, in such a way that the representation of the future clip is easily predictable from the representation of the observed clip.
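A rough sketch of that joint-embedding idea for video: the representation of the future clip should be predictable from the representation of the observed clip. The encoder, predictor, and clip shapes below are toy placeholders, and the stop-gradient on the target is just one common anti-collapse trick, not necessarily the one he has in mind.

import torch
import torch.nn as nn
import torch.nn.functional as F

clip_encoder = nn.Sequential(nn.Flatten(), nn.Linear(16 * 3 * 32 * 32, 128))  # toy clip encoder
predictor = nn.Linear(128, 128)   # predicts the future representation from the observed one

past = torch.randn(8, 16, 3, 32, 32)    # 8 observed clips of 16 small frames each
future = torch.randn(8, 16, 3, 32, 32)  # their continuations

z_past = clip_encoder(past)
with torch.no_grad():
    z_future = clip_encoder(future)     # target representation (stop-gradient)
loss = F.mse_loss(predictor(z_past), z_future)
loss.backward()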
SPEAKER_01
01:13:10 - 01:13:18
Do you think YouTube has enough raw data from which to learn how to be a cat?
SPEAKER_00
01:13:18 - 01:13:19
I think so.
SPEAKER_01
01:13:19 - 01:13:23
So the amount of data is not the constraint.
SPEAKER_00
01:13:23 - 01:13:30
No, but it would require some selection, I think. Some selection, or maybe the right type of data.
SPEAKER_01
01:13:30 - 01:14:00
You can fall down the rabbit hole of just cat videos; you might need to watch some lectures or something, wouldn't it? How meta would that be, if it watches lectures about intelligence, watches your lectures, and learns from that how to be intelligent? I don't think there would be enough. Do you find multimodal learning interesting? We've been talking about vision and language; combining those together, maybe audio, all those kinds of things.
SPEAKER_00
01:14:00 - 01:14:59
There are a lot of things that I find interesting in the short term, but they are not addressing the important problems that I think are really the big challenges. So I think things like multitask learning, continual learning, adversarial issues, those have great practical interest in the relatively short term, possibly, but I don't think they're fundamental. Active learning, even to some extent reinforcement learning, I think those things will become either obsolete or useless or easy once we've figured out how to do self-supervised representation learning, or learning predictive world models. And so I think that's what the entire community should be focusing on, at least people who are interested in fundamental questions, in really pushing the envelope of AI towards the next stage. But of course, there's a huge amount of very interesting work to do on practical questions that have short-term impact.
SPEAKER_01
01:15:00 - 01:17:02
Well, you know, it's difficult to talk about the temporal scale, because all of human civilization will eventually be destroyed, because the Sun will die out, and even if we're successful at colonization across the galaxy, eventually the entirety of it will just become giant black holes. And that's going to take a while, though. But what I'm saying is that that logic can be used to say it's all meaningless. I'm saying all that to say that multitask learning, which you're calling practical or pragmatic or whatever, might be the thing that achieves something very akin to intelligence while we're trying to solve the more general problem of self-supervised learning of background knowledge. The reason I bring that up, maybe one way to ask that question: I've been very impressed by what Tesla Autopilot is doing. I don't know if you've had a chance to glance at it, but it's a particular example of multitask learning, where they're literally taking the problem, like, I don't know, Charles Darwin studying animals, they're studying the problem of driving and asking, okay, what are all the things you have to perceive? And the way they're solving it is, one, there's an ontology they're bringing to the table, where you formulate a bunch of different tasks, like over a hundred tasks or something like that, that are involved in driving. And then they're deploying it and getting data back from people that run into trouble, and then trying to figure out: do we add tasks? Do we focus on each individual task separately? I would classify Andrej Karpathy's talk into two topics: one was about doors, and the other one about how much ImageNet sucks, and he was able to go back and forth on those two. ImageNet sucks, meaning you can't just use a single benchmark; you have to have a giant suite of benchmarks to measure how well your system is doing.
SPEAKER_00
01:17:06 - 01:22:12
Now, it's very clear that if you're faced with an engineering problem that you need to solve in a relatively short time, particularly if you have Elon Musk breathing down your neck, you're going to have to take shortcuts. You might think that the right long-term solution involves some fancy self-supervised learning, but you have Elon Musk breathing down your neck, and this involves human lives. And so you have to basically just do the systematic engineering and fine-tuning and refinements and trial and error and all that stuff. There's nothing wrong with that. That's called engineering. That's called putting technology out in the world, and you have to kind of ironclad it before you do this. So, so much for grand ideas and principles. But I'm placing myself quite a bit upstream of this. It's not that I only think about platonic forms; eventually I want this stuff to get used, but it's okay if it takes five or ten years for the community to realize this is the right thing to do. I've done this before; it's been the case before that I've made that case. If you look back to the mid-2000s, for example, and you ask yourself the question: okay, I want to recognize cars or faces or whatever, I can use convolutional nets or I can use more conventional computer vision techniques, using things like corner detectors or SIFT features and sticking an SVM on top. At that time, the datasets were so small that those methods that used more engineering worked better than convnets. There was just not enough data for convnets, and convnets were a little slow with the kind of hardware that was available at the time. And there was a sea change when datasets became bigger and GPUs became available. Those were two of the main factors that basically made people change their minds. You can look at the history of all the sub-branches of AI or pattern recognition, and there's a similar trajectory, where people start by engineering the hell out of it. Be it optical character recognition, speech recognition, computer vision, like image recognition in general, natural language understanding, translation, things like that: you start by engineering the hell out of it. You start by incorporating all of the prior knowledge you have about image formation, about the shape of characters, about morphological operations, about feature extraction, Fourier transforms, Zernike moments, whatever. People have come up with thousands of ways of representing images so that they could be easily classified afterwards. Same for speech recognition: it took decades for people to figure out a good front end to pre-process speech signals so that the information about what is being said is preserved, but most of the information about the identity of the speaker is gone. And same for text: you do named entity recognition, and you parse, and you do tagging of the parts of speech, and you build this sort of tree representation of clauses, before you can do anything. So that's how it starts: just engineer the hell out of it. And then you start having data,
and maybe you have more powerful computers, maybe you know something about statistical learning, so you start using machine learning, and it's usually a small sliver on top of your handcrafted system where you extract features by hand. And nowadays the standard way of doing this is to train the entire thing end-to-end with a deep learning system, and it learns its own features. Speech recognition systems nowadays, or OCR systems, are completely end-to-end. It's some giant neural net that takes raw waveforms and produces a sequence of characters coming out. There is no Markov model, there is no language model that is explicit, other than something that's ingrained in the sort of neural language model. Same for translation, same for all kinds of stuff. So you see this continuous evolution from less and less handcrafting to more and more learning. And I think Andrej has been tweeting about this as well.
SPEAKER_01
01:22:12 - 01:23:07
So, I mean, we might disagree about this, maybe not, maybe just on this one little piece at the end. You mentioned active learning. It feels like active learning, the selection of data, and also interactivity, needs to be part of this giant neural network. You cannot just be an observer to do self-supervised learning. Well, self-supervised learning is just the word, but whatever this giant stack of a neural network that's automatically learning is, my intuition is that you have to have a system, whether it's a physical robot or a digital robot, that's interacting with the world, doing so in a flawed way, and improving over time, in order to do the self-supervised learning. You can't just give it a giant sea of data.
SPEAKER_00
01:23:07 - 01:23:30
Okay, I agree and I disagree. I agree in two ways. The first way I agree is that if you want, and you certainly need, a causal model of the world that allows you to predict the consequences of your actions, then to train that model you need to take actions. You need to be able to act in the world and see the effect in order to learn causal models of the world.
SPEAKER_01
01:23:30 - 01:23:38
So this is not obvious, because you can observe others, and you can infer that they're similar to you, and then you can learn from that.
SPEAKER_00
01:23:38 - 01:24:42
Yeah, but then you have to kind of hardwire that part, right? You know, mirror neurons and all that stuff. And it's not clear to me how you would do this in a machine. So I think the action part would be necessary for having causal models of the world. The second reason it may be necessary, or at least more efficient, is that active learning basically goes for the part of the world that you don't know. Obviously there is uncertainty about your world, about how the world behaves, and you can resolve this uncertainty by systematic exploration of the parts that you don't know. And if you know that you don't know, it makes you curious; you kind of look into situations like that. Across the animal world, different species have different levels of curiosity, depending on how they're built. Cats and rats are incredibly curious, dogs not so much, I mean, less.
SPEAKER_01
01:24:42 - 01:24:46
Yeah. It could be useful to have that kind of curiosity.
SPEAKER_00
01:24:46 - 01:25:13
Well, curiosity just makes the process faster; it doesn't make the process exist. So what learning process is it that active learning makes more efficient? I'm asking that question first. We haven't answered that question yet. So I'd worry about active learning once this question is solved. That's the more fundamental question to ask.
SPEAKER_01
01:25:13 - 01:25:46
And if active learning or interaction increases the efficiency of the learning, sometimes it becomes very different if the increase is several orders of magnitude, right? That's true. But fundamentally it's still the same thing: building up the intuition about how to construct, in a self-supervised way, background models, efficiently or inefficiently, is the core problem. What do you think about Yoshua Bengio talking about consciousness and all of these kinds of concepts?
SPEAKER_00
01:25:46 - 01:29:26
Okay, I don't know what consciousness is, but it's a good opener. To some extent, a lot of the things that are said about consciousness remind me of the questions people were asking themselves in the 17th or 18th century, when they discovered how the eye works and the fact that the image at the back of the eye was upside down, because you have a lens. On your retina, the image that forms is an image of the world, but it's upside down. So how is it that you see the world right side up? And with what we know today in science, we realize this question doesn't make any sense, or it's kind of ridiculous in some way. So I think a lot of what is said about consciousness is of that nature. Now, that said, there are a lot of really smart people for whom I have a lot of respect talking about the topic, people like David Chalmers, who is a colleague of mine at NYU. I have kind of an unorthodox, folk, speculative hypothesis about consciousness. So we were talking about world models. I think our entire prefrontal cortex is basically the engine for a world model. But when we are attending to a particular situation, we focus on that situation; we basically cannot attend to anything else. And that seems to suggest that we have only one world-model engine in our prefrontal cortex. That engine is configurable to the situation at hand: we are building a box out of wood, or we are driving down the highway, or playing chess. We basically have a single model of the world that we configure to the situation at hand, which is why we can only attend to one task at a time. Now, if there is a task that we do repeatedly, it goes from the sort of deliberate reasoning, using the model of the world and prediction, and perhaps something like model predictive control, which I was talking about earlier, to something that is more subconscious, that becomes automatic. I don't know if you've ever played against a chess grandmaster. I get wiped out in, you know, ten plies, right? And I have to think about my move for, like, 15 minutes, and the person in front of me, the grandmaster, just reacts within seconds. He doesn't need to think about it; it's become subconscious, because it's basically just pattern recognition at this point. Same with driving: the first few hours you drive a car, you're really attentive, you can't do anything else, and then after 20, 30, 50 hours of practice, it's subconscious, you can talk to the person next to you, things like that. Unless the situation becomes unpredictable, and then you have to stop talking. So that suggests you only have one world model in your head, and it might suggest the idea that consciousness is basically the module that configures this world model of yours. You need to have some sort of executive overseer that configures your world model for the situation at hand. And that leads to the really curious conclusion that consciousness is not a consequence of the power of our minds but of the limitation of our brains: because we have only one world model, we have to be conscious. If we had as many world models as there are situations we encounter, we could do all of them simultaneously, and we wouldn't need the executive control that we call consciousness.
SPEAKER_01
01:29:26 - 01:30:34
Yeah, interesting. And somehow, maybe that executive controller... I mean, the hard problem of consciousness: there's some kind of chemistry and biology that's creating a feeling, like it feels like something to experience these things. That's kind of the hard question: what the heck is that, and why is that useful? Maybe the more pragmatic question: why is it useful to feel like this is really you experiencing this, versus just information being processed? It could be just a very useful side effect of the way we evolved, that it's just very useful to feel a sense of ownership of the decisions you make, of the perceptions you make, of the model you're trying to maintain. Like, you own this thing, and it's the only one you've got, and if you lose it, it's going to really suck, and so you should really send the brain some signals about it. What ideas do you believe might be true that most, or at least many, people disagree with, let's say in the space of machine learning?
SPEAKER_00
01:30:35 - 01:32:07
Well, it depends who you talk to. But certainly there is a bunch of people who are nativists, who think that a lot of the basic things about the world are hardwired in our minds. Things like the fact that the world is three-dimensional, for example: is that hardwired? Things like object permanence: is that something we learn before the age of three months or so, or are we born with it? And there is a lot of disagreement among cognitive scientists on this. I think those things are actually very simple to learn. Is it the case that the oriented edge detectors in V1 are learned, or are they hardwired? I think they are learned. They might be learned before birth, because it's really easy to generate signals from the retina that will train edge detectors. And again, those are things that can be learned within minutes of opening your eyes. I mean, since the 1990s, we have algorithms that can learn oriented edge detectors completely unsupervised with the equivalent of a few minutes of real time. So those things can be learned. And there are also those MIT experiments where you plug the optic nerve into the auditory cortex of a baby ferret, and that auditory cortex essentially becomes a visual cortex. So clearly, there's learning taking place there. So a lot of the things people think are so basic that they need to be hardwired, I think a lot of those things are learned, because they are easy to learn.
SPEAKER_01
01:32:07 - 01:32:17
So you put a lot of value in the power of learning. What kinds of things do you suspect might not be learned? Is there something that could not be learned?
SPEAKER_00
01:32:18 - 01:33:38
So your intrinsic drives are not learned. They are the things that make humans human, or make cats different from dogs, right? The basic drives that are kind of hardwired in our basal ganglia. I mean, there are people who are working on this kind of stuff; it's called intrinsic motivation in the context of reinforcement learning. These are objective functions where the reward doesn't come from the external world; it's computed by your own brain. Your own brain computes whether you're happy or not, right? It measures your degree of comfort or discomfort. And because it's your brain computing this, presumably it also knows how to estimate gradients of it. So it's easier to learn when your objective is intrinsic. So that has to be hardwired. The critic that makes long-term predictions of the outcome, which is the eventual result of this, that's learned. And perception is learned, and your model of the world is learned. But let me give an example of how the critic may be learned. If I reach across the table and I pinch your arm, it's a complete surprise for you. You would not have expected this.
SPEAKER_01
01:33:38 - 01:33:42
I was expecting it the whole time, but yes, right. Let's say for the sake of the story, yes.
SPEAKER_00
01:33:43 - 01:34:15
Okay. Your basal ganglia is going to light up, because it's going to hurt, right? And now your model of the world includes the fact that I may pinch you if I approach my hand to your arm. Yeah, don't trust humans. Right. So if I try again, you're going to recoil, and that's your critic, your predictor of your ultimate pain, that predicts that something bad is going to happen, and you recoil to avoid it.
SPEAKER_01
01:34:15 - 01:34:17
So even that can be learned.
SPEAKER_00
01:34:17 - 01:34:37
That is definitely learned. This is also what allows you to define subgoals, right? So the fact that, as a school child, you wake up in the morning and you go to school: it's not because you necessarily like waking up early and going to school, but you know that there is a long-term objective you're trying to optimize.
SPEAKER_01
01:34:37 - 01:35:29
So Ernest Becker, I'm not sure if you're familiar with the philosopher, he wrote the book The Denial of Death, and his idea is that one of the core motivations of human beings is our terror of death, our fear of death. That's what makes us unique from cats. Cats are just surviving; they do not have a deep cognizance, an introspection, that over the horizon is the end. And he says, I mean, there's terror management theory and all these psychological experiments that show, basically, this idea that all of human civilization, everything we create, is kind of trying to forget, even if for a brief moment, that we're going to die. When do you think humans understand that they're going to die? Is it learned early on?
SPEAKER_00
01:35:31 - 01:35:43
I don't know at what point... I mean, it's a question: at what point do you realize what death really is? And I think most people don't actually realize what death is, right? I mean, most people believe that you go to heaven or something.
SPEAKER_01
01:35:43 - 01:37:21
Right. So the pushback on that, what Ernest Becker says and Sheldon Solomon, all those folks, and I find those ideas a little bit compelling, is that there are moments in life, early in life, when a lot of this stuff happens, when you do deeply experience the terror of this realization. And all the things you think about, religion, all those kinds of things that we associate more with teenage years and later, we're talking about way earlier, like seven or eight years old, something like that. You realize, holy crap, this is the mystery, the terror. It's almost like you're a little prey, a little baby deer sitting in the darkness of the woods, looking all around you, and there's darkness full of terror. And that realization says, okay, I'm gonna go back into the comfort of my mind, where there is deep meaning, where maybe I can pretend I'm immortal, in whatever way, whatever kind of idea I can construct to help me believe that I'm immortal. Religion helps with that. You can delude yourself in all kinds of ways, like losing yourself in the busyness of each day, having little goals in mind, all those kinds of things, to think it's going to go on forever. And you kind of know you're going to die, and it's going to be sad, but you don't really understand that you're going to die. So that's their idea, and I find it compelling, because it does seem to be a core, unique aspect of human nature that we're able to really understand that this life is finite.
SPEAKER_00
01:37:21 - 01:37:42
That seems important. There are a bunch of different things there. First of all, I don't think there is a qualitative difference between us and cats in that respect. I think the difference is that we just have a better ability to predict in the long term, and so we have a better understanding of how the world works, so we have a better understanding of the finiteness of life and things like that.
SPEAKER_01
01:37:42 - 01:37:45
We have a better planning engine than cats.
SPEAKER_00
01:37:45 - 01:37:47
Yeah.
SPEAKER_01
01:37:47 - 01:37:50
But was that the motivation for the planning?
SPEAKER_00
01:37:50 - 01:38:24
Well, I think it's just a side effect of the fact that we have a better planning engine, because, as I said, the essence of intelligence is the ability to predict. And so, because we're smarter, as a side effect we also have this ability to make predictions about our own future existence, or lack thereof. You say religion helps with that; I think religion hurts, actually. It makes people worry about what's going to happen after their death, et cetera. If you believe that you just don't exist after death, it kind of solves the problem, at least.
SPEAKER_01
01:38:24 - 01:38:30
You're saying if you don't believe in God, you don't worry about what happens after death? I don't know.
SPEAKER_00
01:38:30 - 01:38:35
You worry about this life, because that's the only one you have.
SPEAKER_01
01:38:36 - 01:39:09
Well, I don't know. What I would say, as Becker says, and I agree with him more than not, is that you do deeply worry. If you believe there's no God, there's still a deep worry about the mystery of it all. How does that make any sense, that it just ends? I don't think we can truly understand that this ends. I mean, so much of our life, the consciousness, the ego, is invested in this being.
SPEAKER_00
01:39:09 - 01:39:16
Science keeps bringing humanity down from its pedestal. Yeah, and that's just another example of it.
SPEAKER_01
01:39:16 - 01:39:49
That's wonderful, but for us individual humans, we don't like to be brought down from a pedestal. But see, you're fine with it because, well, what Ernest Becker would say is that you're fine with it because that's just a more peaceful existence for you, but you're not really fine, you're hiding from it. In fact, some of the people that experience the deepest trauma early in life will often, before they seek extensive therapy, say that they're fine. It's like when you talk to people who are truly angry: "How are you doing?" "I'm fine." The question is, what's really going on?
SPEAKER_00
01:39:49 - 01:40:02
I had a near-death experience. I had a very bad motorbike accident when I was 17. But that didn't have any impact on my reflections on that topic.
SPEAKER_01
01:40:02 - 01:40:50
So I'm basically just playing a bit of devil's advocate, pushing back and wondering: is it truly possible to accept that? And the flip side, which is more interesting, I think, for AI and robotics, is how important is it to have this as one of the suite of motivations: not just to avoid falling off the roof or something like that, but to ponder the end of the ride. If you listen to the Stoics, it's a great motivator. It adds a sense of urgency. So to truly fear death, or to be cognizant of it, might give a deeper meaning and urgency to the moment, to live fully.
SPEAKER_00
01:40:51 - 01:42:05
I mean, maybe, I don't disagree with that. What motivates me here is knowing more about human nature. I think human nature and human intelligence is a big mystery. It's a scientific mystery, in addition to a philosophical one, et cetera. But, you know, I'm a believer in science. And I do have kind of a belief that, for complex systems like the brain or the mind, the way to understand them is to try to reproduce them with artifacts that you build, because you know what's essential to them when you try to build them. The same way, and I've used this analogy with you before, I believe, we only started to understand aerodynamics when we started building airplanes, and that helped us understand how birds fly. So I think there's kind of a similar process here, where we don't have a full theory of intelligence, but building intelligent artifacts will perhaps help us develop some underlying theory that encompasses not just artificial implementations, but also human and biological intelligence in general.
SPEAKER_01
01:42:05 - 01:42:57
So you're an interesting person to ask this question about all kinds of other intelligent entities or intelligences. What are your thoughts about the Turing test or the Chinese room kind of question? If we create an AI system that exhibits a lot of the properties of intelligence and consciousness, how comfortable are you thinking of that entity as intelligent or conscious? You're trying to build systems now that have intelligence, and there are metrics about their performance, but that metric is external. So are you okay calling a thing intelligent? Or are you going to be like most humans and once again be unhappy to be brought down from the pedestal of consciousness slash intelligence?
SPEAKER_00
01:42:57 - 01:45:12
No, I would be very happy to understand more about human nature, the human mind, and human intelligence through the construction of machines that have similar abilities. And if a consequence of this is to bring humanity one notch down from its pedestal, I'm just fine with it. That's just the reality of life. So I'm fine with that. Now, you were asking me about opinions I have that a lot of people may disagree with. I think, if we think about the design of autonomous intelligent systems, so assuming that we are somewhat successful at getting machines to learn predictive models of the world, we build intrinsic motivation objective functions to drive the behavior of that system, the system also has perception modules that allow it to estimate the state of the world, and then it has some way of figuring out a sequence of actions to optimize a particular objective. If it has a critic of the type I was describing before, the thing that makes you recoil your arm the second time I try to pinch you, then an intelligent autonomous machine will have emotions. I think emotions are an integral part of autonomous intelligence. If you have an intelligent system that is driven by intrinsic motivation, by objectives, if it has a critic that allows it to predict in advance whether the outcome of a situation is going to be good or bad, it's going to have emotions. It's going to have fear when it predicts that the outcome is going to be bad and is something to avoid, and it's going to have elation when it predicts it's going to be good. If it has drives to relate with humans, in some ways the way humans have, it's going to be social, right? And so it's going to have emotions about attachment and things of that type. So the sort of sci-fi thing where you see Commander Data having an emotion chip that you can turn off, right...
SPEAKER_01
01:45:12 - 01:45:46
I think that's ridiculous. So, I mean, here's the difficult philosophical and social question. Do you think there will be a time, like a civil rights movement for robots, where, okay, forget the movement, but a discussion, like at the Supreme Court, that particular kinds of robots, particular kinds of systems, deserve the same rights as humans, because they can suffer just as humans can, all those kinds of things?
SPEAKER_00
01:45:46 - 01:46:02
Well, perhaps, perhaps not. Imagine that humans... that you could die and be restored. Like, you could be sort of 3D-printed and your brain could be reconstructed in its finest details.
SPEAKER_01
01:46:02 - 01:46:14
Our ideas of rights would change in that case. If you can always just... there's always a backup, you could always restore. Yeah. Maybe the importance of murder would go down a notch.
SPEAKER_00
01:46:14 - 01:46:53
That's right. But also your desire to do dangerous things, like skydiving or car racing or that kind of stuff, would probably increase, or airplane aerobatics or that kind of stuff, right? We'd be fine doing a lot of those things, or exploring dangerous areas and things like that. It would kind of change our relationship with risk. Now, it's very likely that robots would be like that, because they'll be based on technology that is somewhat similar, and you can always have a backup.
SPEAKER_01
01:46:54 - 01:47:01
So it's possible, I don't know if you like video games, but there's a game called Diablo.
SPEAKER_00
01:47:01 - 01:47:03
My sons are huge fans of this.
SPEAKER_01
01:47:03 - 01:47:06
Yes.
SPEAKER_00
01:47:06 - 01:47:09
In fact, they made a game that's inspired by it.
SPEAKER_01
01:47:09 - 01:47:11
Awesome. Like built a game.
SPEAKER_00
01:47:11 - 01:47:19
My three sons have a game design studio between them. That's awesome. They came out with a game last year... no, this was early last year, about a year ago.
SPEAKER_01
01:47:20 - 01:47:50
That's awesome. But in Diablo, there's something called hardcore mode, where if you die, that's it, you're gone. And so it's possible with AI systems, for them to be able to operate successfully and for us to treat them in a certain way because they have to be integrated into human society, that they have to be able to die, no copies allowed. In fact, copying would be illegal. It's possible with humans as well, like cloning will be illegal, even when it's possible.
SPEAKER_00
01:47:50 - 01:47:57
Well, cloning is not copying, right? I mean, you don't reproduce the mind of the person, their experience. It's just a delayed twin.
SPEAKER_01
01:47:58 - 01:48:17
But then, we were talking about how with computers you would be able to copy, you would be able to perfectly save, pickle, the mind state. And it's possible that that would be illegal, because it would destroy the motivation of the system.
SPEAKER_00
01:48:18 - 01:49:35
Okay, so let's say you have a domestic robot, sometime in the future. Yes. And the domestic robot comes to you somewhat pre-trained; it can do a bunch of things. Yes. But it has a particular personality that makes it slightly different from the other robots, because that makes them more interesting. And then, because it's lived with you for five years, you've grown some attachment to it, and vice versa, and it's learned a lot about you. Or maybe it's not a physical robot, maybe it's a virtual assistant that lives in your augmented reality glasses or whatever, the "Her" movie type of thing. And that system, to some extent, the intelligence in that system, is a bit like your child, or maybe your student, in the sense that there's a lot of you in that machine now, right? And if it were a living thing, you would do this for free, right? If it's your child, your child can then live his or her own life, and the fact that they learned stuff from you doesn't mean that you have any ownership of it. But if it's a robot that you've trained, perhaps you have some intellectual property claim.
SPEAKER_01
01:49:35 - 01:49:43
Intellectual property, probably. Oh, I thought you meant, like, sentimental value, in the sense that it's part of you. Well, first, the sentimental value, right?
SPEAKER_00
01:49:43 - 01:50:00
So you would lose a lot if that were to be destroyed and you had no backup. You would lose a lot of investment. It's kind of like a person dying, a friend of yours dying, or a co-worker, something like that.
SPEAKER_01
01:50:00 - 01:50:15
But you also have intellectual property rights in the sense that that system is fine-tuned to your particular existence. So it's now a very unique instantiation of whatever the original background model was that it arrived with.
SPEAKER_00
01:50:16 - 01:50:57
And then there are issues of privacy, right? Because now imagine that the robot has its own kind of volition and decides to work for someone else, or thinks life with you is untenable or whatever. Now, all the things that system learned from you: how can you delete all the personal information that that system knows about you? That would be kind of an ethical question. Like, can you erase the mind of an intelligent robot to protect your privacy? You can't do this with humans. You can ask them to shut up, but you don't have complete power over them.
SPEAKER_01
01:50:57 - 01:51:25
You can't erase humans. Yeah, that's the problem with relationships. You know, if you break up, you can't erase the other human. I think it will have to be the same thing with robots. There has to be some risk to our interactions to truly experience them deeply, it feels like. So you have to be able to lose your robot friend, and have that robot friend go tweeting about how much of an asshole you were.
SPEAKER_00
01:51:25 - 01:51:30
But then are you allowed to, you know, murder the robot to protect your private information?
SPEAKER_01
01:51:30 - 01:51:49
The robot that has decided to leave? I can imagine a situation where, for certain robots, it's almost like regulation: if you declare your robot to be, let's call it sentient, or something like that, like, this robot is designed for human interaction, then you're not allowed to murder these robots. It's the same as murdering humans.
SPEAKER_00
01:51:50 - 01:51:55
Well, what if you make a backup of the robot that you preserve on a hard drive, or the equivalent in the future?
SPEAKER_01
01:51:55 - 01:52:00
That might be illegal, just like piracy is illegal.
SPEAKER_00
01:52:00 - 01:52:13
But it's your own robot, right? But you can't... But then you can wipe it out, which is great. So this robot doesn't know anything about you anymore, but you still have it: technically it still exists, because you backed it up.
SPEAKER_01
01:52:13 - 01:53:04
And then there'll be these great speeches at the Supreme Court, saying: sure, you can erase the mind of the robot, just like you can erase the mind of a human, we both can suffer. There'll be some epic, Obama-type character with a speech about how the robots and the humans are the same: we can both suffer, we can both hope, we can both raise families, all that kind of stuff. It's interesting, like you said: emotion seems to be a fascinating, powerful aspect of human interaction and human-robot interaction, and if they're able to exhibit emotions, at the end of the day, that's probably going to make us deeply consider robot rights, like what we value in humans, what we value in other animals. That's why robots and AI are great.
SPEAKER_00
01:53:04 - 01:53:15
It makes us ask... You asked about the Chinese room type of thing: is it real if it looks real? I think the Chinese room argument is a ridiculous one.
SPEAKER_01
01:53:17 - 01:53:56
So for people who don't know, the Chinese room... I don't even know how to formulate it well, but basically, you can mimic the behavior of an intelligent system by just following a giant algorithm, a code book, that tells you exactly how to respond in each case. But is that really intelligent? It's like a giant lookup table: when this person says this, you answer this; when this person says that, you answer that. And if you understand how that works, this giant, nearly infinite lookup table, is that really intelligence? Because intelligence seems to be a mechanism that's much more interesting and complex than a lookup table.
SPEAKER_00
01:53:56 - 01:55:17
I don't think so. So the real question comes down to: do you think you can mechanize intelligence in some way, even if that involves learning? And the answer is, of course, yes, there's no question. Then there's a second question, which is: assuming you can reproduce intelligence on different hardware than biological hardware, you know, with computers, can you match human intelligence in all the domains in which humans are intelligent? Is it possible? So this is kind of the hypothesis of strong AI. The answer to this, in my opinion, is an unqualified yes. This will happen at some point. There's no question that machines at some point will become more intelligent than humans in all domains where humans are intelligent. This is not for tomorrow; it's going to take a long time, regardless of what Elon and others have claimed or believed. This is a lot harder than many of those guys think it is. And many of those guys who thought it was simpler five years ago now think it's hard, because it's been five years and they realize it's going to take a lot longer. That includes a bunch of people at DeepMind, for example.
SPEAKER_01
01:55:17 - 01:55:50
But I do want to push back a little, in defense of the DeepMind folks, or Elon, or Demis: sometimes your role is to create deadlines that are nearer rather than farther away, to create a sense of urgency, because you have to believe the impossible is possible in order to accomplish it. There's of course a flip side to that coin, but it's a weird thing: you can't be too cynical if you want to get something done. Absolutely, I agree with that. But, I mean, you have to inspire people, right, to work on ambitious things.
SPEAKER_00
01:55:53 - 01:56:17
So, you know, it's certainly a lot harder than we believed, but there's no question in my mind that this will happen. And now people are kind of worried: what does that mean for humans? They are going to be brought down from their pedestal a bunch of notches with that. And is that going to be good or bad? I mean, it's just going to give us more power, right? It's an amplifier for human intelligence, really.
SPEAKER_01
01:56:18 - 01:56:44
So, speaking of doing cool, ambitious things, FAIR, the Facebook AI Research group, has recently celebrated its eighth birthday, or maybe you can correct me on that. Looking back, what have been the successes, the failures, the lessons learned from the eight years of FAIR? And maybe you can also give the context of where the newly minted Meta AI fits in; how does it relate to FAIR?
SPEAKER_00
01:56:44 - 02:00:42
Right, so let me tell you a bit about the organization of all this. Yeah, FAIR was created almost exactly eight years ago. It wasn't called FAIR yet; it took that name a few months later. At the time I joined Facebook, there was a group called the AI Group that had about a dozen people, like ten engineers and two scientists, something like that. I ran it for three and a half years as a director: hired the first few scientists, kind of set up the culture and organized it, explained to the Facebook leadership what fundamental research was about and how it can work within industry, and why it needs to be open, and everything. And I think it's been an unqualified success, in the sense that FAIR has simultaneously produced top-level research and advanced the science and the technology, provided tools, open-source tools like PyTorch and many others, but at the same time has had a direct, or mostly indirect, impact on Facebook, now Meta, in the sense that a lot of the systems Meta is built around now are based on research projects that started at FAIR. If you were to take deep learning out of Facebook services now, and Meta more generally, the company would literally crumble. I mean, it's completely built around AI these days, and it's really essential to the operations. So what happened after three and a half years is that I changed role; I became chief scientist. So I'm not doing the day-to-day management of FAIR anymore. I'm more thinking about strategy and things like that, and I conduct my own research with my own research group, working on self-supervised learning and things like this, which I didn't have time to do when I was director. So now FAIR is run by Joelle Pineau and Antoine Bordes together, because FAIR has now split into two: there's something called FAIR Labs, which is sort of bottom-up, scientist-driven research, and FAIR Accel, which is slightly more organized, for bigger projects that require a little more focus and more engineering support and things like that. So Joelle leads FAIR Labs and Antoine Bordes leads FAIR Accel, and it's all over the world. There's no question that the leadership of the company believes this was a very worthwhile investment, and what that means is that it's there for the long run. So there is, if you want to talk in these terms, which I don't like, a business model, if you want, where FAIR, despite being a very fundamental research lab, brings a lot of value to the company, mostly indirectly, through other groups. Now, what happened three and a half years ago, when I stepped down, was also the creation of Facebook AI, which was basically a larger organization that covers FAIR, so FAIR is included in it, but also has other organizations that are focused on applied research or advanced development of AI technology that is more focused on the products of the company. But it's still research. I mean, there are a lot of papers coming out of those organizations, and the people there are awesome and wonderful to interact with. But it serves as a way to kind of scale up, if you want, AI technology which may be very experimental, sort of lab prototypes, into things that are usable.
SPEAKER_01
02:00:42 - 02:00:51
So FAIR is a subset of Meta AI. Does FAIR just keep the F? Do you see it keeping the name — nobody cares what the F stands for?
SPEAKER_00
02:00:51 - 02:00:57
We'll know the answer probably by the end of 2021.
SPEAKER_01
02:00:57 - 02:00:58
It's not a giant change — MAIR instead of FAIR.
SPEAKER_00
02:01:00 - 02:01:14
Well, MAIR doesn't sound too good. But you know, the brand people are kind of deciding on this, and they've been hesitating for a while now, and they tell us they're going to come up with an answer as to whether FAIR is going to change its name or whether we're going to change the meaning of the F.
SPEAKER_01
02:01:15 - 02:01:19
That's a good call. I would keep FAIR and change the meaning of the F. That would be my preference.
SPEAKER_00
02:01:19 - 02:01:23
I would turn the F into fundamental.
SPEAKER_01
02:01:23 - 02:01:25
Oh, that's really good. Oh, that's really good.
SPEAKER_00
02:01:25 - 02:02:13
Yeah, so this would be Meta Fundamental AI Research — Meta FAIR. But you know, people will call it FAIR. Yeah, exactly — I like it. And now Meta AI is part of Reality Labs. So, you know, Meta — the new Facebook, or just called Meta now — is kind of divided into Facebook, Instagram, WhatsApp, and Reality Labs. Reality Labs is about AR, VR, telepresence, communication technology, and stuff like that. You can think of it as the sort of new products and technology part of Meta.
SPEAKER_01
02:02:14 - 02:02:17
Is that where the touch sensing for robots sits? I saw that you were posting about that.
SPEAKER_00
02:02:17 - 02:02:20
No, the touch sensing for robots is part of FAIR, actually.
SPEAKER_01
02:02:20 - 02:02:21
That's FAIR? Oh, it is.
SPEAKER_00
02:02:21 - 02:02:29
Yeah. But there's the other one, the haptic glove, right? That's more Reality Labs.
SPEAKER_01
02:02:29 - 02:02:52
That's Reality Labs Research. But by the way, the touch sensor is super interesting — integrating that modality into the whole sensing suite is very interesting. So what do you think about the metaverse? What do you think about this whole expansion of the view of the role of Facebook and Meta in the world?
SPEAKER_00
02:02:52 - 02:03:46
Well, the metaverse really should be thought of as the next step in the internet: trying to make the experience of being connected with other people or with content more compelling. We evolved to live in 3D environments where we can see other people, we can talk to them when we're near them, and people who are far away can't hear us — things like that, right? So there are a lot of social conventions that exist in the real world that we can try to transpose. Now, how compelling is it eventually going to be? Is it going to be the case that people are going to be willing to do this if they have to wear a huge pair of goggles all day?
SPEAKER_01
02:03:46 - 02:03:52
Maybe not. But then again, if the experience is sufficiently compelling, maybe so.
SPEAKER_00
02:03:52 - 02:04:12
Or if the device you have to wear is just basically a pair of glasses — technology can make sufficient progress for that. AR is a much easier concept to grasp: you're going to have augmented reality glasses that basically contain some sort of virtual assistant that can help you in your daily life.
SPEAKER_01
02:04:12 - 02:04:21
But at the same time, with AR you have to contend with reality. With VR, you can completely detach yourself from reality, so it gives you freedom. It might be easier to design worlds in VR.
SPEAKER_00
02:04:22 - 02:04:36
Yeah, but you can imagine the metaverse being mixed, right? You can have objects that exist in the metaverse that pop up on top of the real world, or that only exist in virtual reality.
SPEAKER_01
02:04:36 - 02:04:37
Okay, let me ask the hard question.
SPEAKER_00
02:04:39 - 02:04:40
because all of this was easy.
SPEAKER_01
02:04:40 - 02:04:58
This was easy. The Facebook, now Meta, social network has been painted by the media as a net negative for society — even destructive and evil at times. You've pushed back against this, defending Facebook. Can you explain your defense?
SPEAKER_00
02:04:58 - 02:08:39
Yeah, so the company that is being described in some media is not the company we know when we work inside it. And you know, there could be claims that a lot of employees are uninformed about what really goes on in the company, but I'm a vice president — I have a pretty good vision of what goes on. I don't know everything, obviously; I'm not involved in everything, certainly not in decisions about content moderation or anything like this, but I have a decent vision of what goes on. And this evil that is being described, I just don't see it. I think there is an easy story to buy, which is that for all the bad things in the world, and the reason your friends believe crazy stuff, there's an easy scapegoat in social media in general and Facebook in particular. But we have to look at the data. Is it the case that Facebook, for example, polarizes people politically? Are there academic studies that show this? Is it the case that teenagers think less of themselves if they use Instagram more? Is it the case that people get more riled up against the opposite side in a debate or a political opinion if they are more on Facebook or if they are less? And study after study shows that none of this is true. These are independent studies by academics, not funded by Facebook or Meta — a study from Stanford, a study by some of my colleagues at NYU, actually, with whom I have no connection. There's a recent study where they paid people — I think it was in the former Yugoslavia, I don't remember exactly what part — they paid people not to use Facebook for a while in the period before the anniversary of the Srebrenica massacres, right? So people get riled up: should we have a commemoration, a memorial kind of thing, or not? So they paid a bunch of people to not use Facebook for a few weeks. And it turns out that those people ended up being more polarized than they were at the beginning, and the people who were more on Facebook were less polarized. There's a study from economists at Stanford that tried to identify the causes of increasing polarization in the US, and it's been going on for 40 years, since before Mark Zuckerberg was born — continuously. So there is a cause, and it's not Facebook or social media. You could say social media just accelerated it, but no — it's basically a continuous evolution, by some measures of polarization, in the US. And then you compare this with other countries — like the western half of Germany, because you can't go back 40 years on the east side, or Denmark, or other countries — and they use Facebook just as much, and they're not getting more polarized; they're getting less polarized. So if you want to look for a causal relationship there, you can find a scapegoat, but you can't find a cause. Now, if you want to fix the problem, you have to find the right cause, and what drives me up the wall is that people are now accusing Facebook of bad deeds that are done by others, and we're not doing anything about those others. And by the way, those others include the owner of the Wall Street Journal, in which all of those articles were published.
SPEAKER_01
02:08:39 - 02:09:21
So I should mention that I'm talking to Schrep — Mike Schroepfer — on this podcast, and also Mark Zuckerberg, and these are probably conversations I can have with them. Because it's very interesting to me: even if Facebook has some measurable negative effect, you can't just consider that in isolation; you have to consider all the positive ways it connects us. It's like that with every technology. You can't just say there's an increase in division. Yes, probably Google's search engine has created an increase in division, but we have to consider how much information it has brought to the world. I'm sure Wikipedia created more division if you just look at the division. We have to look at the full context and ask whether they made the world better.
SPEAKER_00
02:09:21 - 02:09:53
When the printing press was invented, the first books that were printed were things like the Bible, and that allowed people to read the Bible by themselves, not get the message uniquely from priests. In Europe, that created the Protestant movement and 200 years of religious persecution and wars. So that's a bad side effect of the printing press. Social networks aren't nearly as bad as the printing press, but nobody would say the printing press was a bad idea.
SPEAKER_01
02:09:55 - 02:10:33
Yeah, a lot of this is perception, and there are a lot of different incentives operating here. Maybe a quick comment — since you're one of the top leaders at Facebook, at Meta, sorry, that's in the tech space: I'm sure Facebook involves a lot of incredible technological challenges that need to be solved. A lot of it probably is in the compute infrastructure, the hardware — it's just a huge amount. Maybe you can give me context about how much of Schrep's life is AI, how much of it is low-level compute, how much of it is flying all around doing business stuff? And the same for Mark Zuckerberg.
SPEAKER_00
02:10:34 - 02:11:51
They really focused on AI. I mean, certainly in the run-up to the creation of FAIR, and for at least a year after that, if not more, Mark was very, very much focused on AI and was spending quite a lot of effort on it. And that's his style: when he gets interested in something, he reads everything about it. He read some of my papers, for example, before I joined. And he takes a lot of notes, right? And Schrep is really into it also. Schrep has something I've tried to preserve too, despite my not-so-young age, which is a sense of wonder about science and technology, and he certainly has that. He's also a wonderful person — I mean, as a manager, dealing with people and everything. Mark also, actually. They're very human people. In the case of Mark, it's shocking how human he is given his trajectory. The picture of him that is painted in the press is just completely wrong.
SPEAKER_01
02:11:51 - 02:12:11
Yeah. But you have to know how to play the press, so I put some of that responsibility on him too. It's like being the conductor of an orchestra: you have to play the press and the public in a certain kind of way, where you convey your true self to them.
SPEAKER_00
02:12:11 - 02:12:16
It's a different kind of skill, and he's probably not the best at it, so yeah.
SPEAKER_01
02:12:18 - 02:12:32
You have to learn it. And it's sad to see — I haven't talked to him about it — but Schrep is slowly stepping down. It's always sad to see folks who have been there for a long time slowly step away. I guess it's time.
SPEAKER_00
02:12:32 - 02:12:57
I think he's done the things he set out to do, and you know, he's got family priorities and stuff like that. And I understand — after 13 years or something, it's been a good run. Which in Silicon Valley is basically a lifetime. Yeah, you know, it's dog years.
SPEAKER_01
02:12:57 - 02:13:24
So, NeurIPS, the conference, just wrapped up. Let me go back to something else. You posted that a paper you co-authored was rejected from NeurIPS — as you said, proudly, in quotes, "rejected." Can you describe this paper and what the idea was? And maybe this is also a good opportunity to ask what the pros and cons are — what works and what doesn't — about the review process.
SPEAKER_00
02:13:25 - 02:16:37
Yeah, let me talk about the paper first and the review process afterwards. The paper is called VICReg — I mentioned it before: variance, invariance, covariance regularization. It's a non-contrastive learning technique for what I call joint embedding architectures. Siamese nets are an example of a joint embedding architecture. So — let me back up a little bit. If you want to do self-supervised learning, you can do it by prediction. Let's say you want to train a system to predict video, right? You show it a video clip and you train the system to predict the continuation of that video clip. Now, because you need to handle uncertainty — because there are many continuations that are plausible — you need a way for the system to be able to produce multiple predictions. And the only way I know to do this is through what's called a latent variable. So you have some sort of hidden vector, a variable that you can vary over a set or draw from a distribution, and as you vary this vector over a set, the prediction varies over a set of plausible predictions. That's called a generative latent variable model. Okay, now there is an alternative to this for handling uncertainty: instead of directly predicting the next frames of the clip, you also run those through another neural net. So you now have two neural nets — one that looks at the initial segment of the video clip, another one that looks at the continuation during training. And what you're trying to do is learn a representation of those two video clips that is maximally informative about the video clips themselves, but such that you can easily predict the representation of the second video clip from the representation of the first one. You can formalize this in terms of maximizing mutual information and stuff like that, but it doesn't matter. What you want is informative representations of the two video clips that are mutually predictable. But that means there are a lot of details in the second video clip that are irrelevant. Let's say the video clip consists of a camera panning across a scene. There's going to be a piece of the room that is going to be revealed, and I can somewhat predict what that room is going to look like, but I may not be able to predict the details of the texture of the ground and where the tiles are ending and stuff like that, right? Those are irrelevant details that perhaps my representation will eliminate. And so what I need is to train this second neural net in such a way that whenever the continuation of the video clip varies over all the plausible continuations, the representation doesn't change.
SPEAKER_01
02:16:37 - 02:16:46
Got it. So yeah, yeah, got it. Over the space of representations, doing the same kind of thing as you do with similarity learning. Right.
SPEAKER_00
02:16:47 - 02:18:35
So these are two ways to handle multimodality in a prediction, right? In the first way, you parameterize the prediction with a latent variable, but you predict pixels, essentially. In the second one, you don't predict pixels; you predict an abstract representation of the pixels, and you guarantee that this abstract representation has as much information as possible about the input, but drops all the stuff that you really can't predict, essentially. I used to be a big fan of the first approach, and in fact, in that blog post with Ishan Misra, the "dark matter of intelligence" post, I was kind of advocating for it. In the last year and a half, I've completely changed my mind. I'm now a big fan of the second one. And it's because of a small collection of algorithms that have been proposed over the last year and a half or two years to do this, including VICReg, a predecessor of it called Barlow Twins, which I mentioned, a method from our friends at DeepMind called BYOL, and a bunch of others now that work similarly. They're all based on this idea of joint embedding. Some of them have an explicit criterion that is an approximation of mutual information. Some others, like BYOL, work, but we don't really know why. There have been lots of theoretical papers on why BYOL works — no, it's not that, because we take that out and it still works — so there's a big debate. But the important point is that we now have a collection of non-contrastive joint embedding methods, which I think are the best thing since sliced bread. So I'm super excited about this, because I think it's our best shot at techniques that would allow us to build pretty good world models, and at the same time learn hierarchical representations of the world, where what matters about the world is preserved and what is irrelevant is eliminated.
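[Editor's note: to make the variance-invariance-covariance idea concrete, here is a minimal PyTorch-style sketch of a VICReg-like loss on two embeddings of a pair of views. The loss weights, the epsilon, and the exact structure are illustrative assumptions, not the paper's precise recipe.]

```python
import torch
import torch.nn.functional as F

def vicreg_style_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    # z_a, z_b: (batch, dim) embeddings of two views (e.g., clip and continuation)
    n, d = z_a.shape

    # Invariance term: the two representations should predict each other.
    sim_loss = F.mse_loss(z_a, z_b)

    # Variance term: keep each dimension's std above 1 to prevent collapse.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = torch.mean(F.relu(1.0 - std_a)) + torch.mean(F.relu(1.0 - std_b))

    # Covariance term: decorrelate dimensions by penalizing off-diagonal covariance.
    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    cov_loss = (off_diag(cov_a) ** 2).sum() / d + (off_diag(cov_b) ** 2).sum() / d

    return sim_w * sim_loss + var_w * var_loss + cov_w * cov_loss
```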
SPEAKER_01
02:18:36 - 02:18:44
Yeah, by the way, are the representations before and after in time — over a sequence of images — or is it for single images?
SPEAKER_00
02:18:44 - 02:18:59
It could be either — for a single image or for a sequence. And it doesn't have to be images; this could be applied to text, this could be applied to just about any signal. I'm looking for methods that are generally applicable, that are not specific to one particular modality — it could be audio or whatever.
SPEAKER_01
02:18:59 - 02:19:05
Got it. So what's the story behind this paper? The paper describes one such method?
SPEAKER_00
02:19:05 - 02:20:11
This is the VICReg method. So this is co-authored — the first author is a student called Adrien Bardes, who is a resident PhD student at FAIR Paris, co-advised by me and Jean Ponce, a professor at École Normale Supérieure who is also a research director at Inria. It's a wonderful program in France where PhD students can basically do their PhD in industry, and that's kind of what's happening here. This paper is a follow-up on the Barlow Twins paper, with a former postdoc of mine, with Jure Zbontar and a bunch of other people from FAIR. And one of the main criticisms from the reviewers is that VICReg is not different enough from Barlow Twins. But my impression is that it's Barlow Twins with a few bugs fixed, essentially, and in the end this is what people will use. So, you know, I'm used to stuff of mine being rejected.
SPEAKER_01
02:20:11 - 02:20:14
So it might be rejected, but in effect accepted, because people use it?
SPEAKER_00
02:20:14 - 02:20:16
Well, it's already cited a bunch of times.
SPEAKER_01
02:20:16 - 02:21:09
So, I mean, that leads to the deeper question about peer review and conferences. Computer science as a field is kind of unique in that conferences are highly prized — that's one thing. And it's interesting because the peer review process there is similar, I suppose, to journals, but it's accelerated — well, not accelerated significantly, but it goes fast. It's a nice way to get stuff out quickly: to get peer reviewed quickly, to present it to the community quickly — not quickly, but quicker. But nevertheless, it has many of the same flaws of peer review, because a limited number of people look at it, there's bias, and the following: if you want to do new ideas, you're going to get pushback. There are self-interested people who can infer who submitted it and be cranky about it, all that kind of stuff.
SPEAKER_00
02:21:09 - 02:23:25
Yeah, I mean, there are a lot of social phenomena there. There's one social phenomenon, which is that because the field has been growing exponentially, the vast majority of people in the field are extremely junior. That's just a consequence of the field growing, right? Once the size of the field starts saturating, you'll have less of that problem of reviewers being very inexperienced. A consequence of this is that young reviewers — I mean, there's a phenomenon where reviewers try to make their life easy, and making your life easy when reviewing a paper is very simple: you just have to find a flaw in the paper, right? So basically they see their task as finding flaws in papers, and most papers have flaws, even the good ones. So it's easy to do that; your job as a reviewer is easier if you just focus on this. But what's important is: is there a new idea in that paper that is likely to influence people? It doesn't matter if the experiments are not that great, if the protocol is so-so, things like that, as long as there is a worthy idea in it that will influence the way people think about the problem, even if others eventually make it better. I think that's really what makes a paper useful. And this combination of social phenomena creates a disease that has plagued other fields in the past, like speech recognition, where basically people chase numbers on benchmarks. It's much easier to get a paper accepted if it brings an incremental improvement on a mainstream, well-accepted method or problem. And those are, to me, boring papers. They're not useless, right — industry thrives on that kind of progress — but they're not the ones that bring new concepts and new ideas. So papers that really try to make new advances generally don't make it. Now, thankfully, we have arXiv.
SPEAKER_01
02:23:26 - 02:24:05
arXiv, exactly. And then there are open-review type situations, and, I mean, Twitter is a kind of open review. I'm a huge believer that review should be done by thousands of people, not two people. I agree. And so do you see a future — a lot of it is already the present, but a growing future — where it'll just be arXiv, and you're presenting at an ongoing continuous conference called Twitter slash the internet slash arXiv Sanity — Andrej just released a new version — so just not being so elitist about this?
SPEAKER_00
02:24:05 - 02:25:57
It's not a question of being elitist or not. It's a question of basically having recommendations and a seal of approval for people who don't see themselves as having the ability to evaluate papers by themselves. So it saves time: if you rely on other people's opinions, and you trust those people or those groups to evaluate a paper for you, that saves you time because you don't have to scrutinize the paper as much — it's brought to your attention. There's a whole idea of collective recommender systems there. I actually thought about this a lot, about 10, 15 years ago, because there were discussions at NIPS, and we were about to create ICLR with Yoshua Bengio, and so I wrote a document describing a reviewing system, which basically was: you post your paper on some repository — let's say arXiv, or now it could be OpenReview. And then you can form a reviewing entity, which is equivalent to the reviewing board of a journal or the program committee of a conference; you have to list the members. Then that reviewing entity can choose to review a particular paper, spontaneously or not. There is no exclusive relationship anymore between a paper and a venue or reviewing entity: any reviewing entity can review any paper, or may choose not to. And then it gives an evaluation — it's not "published" or not, it's just an evaluation and a comment, which would be public and signed by the reviewing entity. And if it's signed by the reviewing entity, you know it's one of the members of that entity. So if the reviewing entity is, you know, "Lex Fridman's preferred papers," you know it's Lex Fridman writing the review.
SPEAKER_01
02:25:57 - 02:26:09
Yes. For me, that's a beautiful system, I think, but in addition to that, it feels like there should be a reputation system for the reviewers.
SPEAKER_00
02:26:09 - 02:26:13
For the reviewing entities — not the reviewers individually, the reviewing entities — sure.
SPEAKER_01
02:26:13 - 02:26:49
But even within that, for the reviewers too. Because there's another thing here: it's not just the reputation, it's an incentive for an individual person to do a great job. Right now, in the academic setting, the incentive is kind of internal — just wanting to do a good job. But honestly, that's not a strong enough incentive to do a really good job of evaluating a paper, of finding the beautiful amidst the mistakes and the flaws and all that kind of stuff. Like, if you're the person who first discovered a powerful paper, and you get to be proud of that discovery, then that gives you a huge incentive.
SPEAKER_00
02:26:49 - 02:27:16
That's a big part of my proposal, actually, where I describe that if your evaluation of papers is predictive of future success, then your reputation as a reviewing entity should go up. So yeah, exactly. I even had a master's student — a master's student in library science and computer science, actually — work out exactly how that should work, with formulas and everything.
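[Editor's note: as a purely hypothetical illustration of the reviewing-entity idea sketched above — the actual proposal's formulas are not reproduced here — a minimal data model might look like this. All names, fields, and the reputation update rule are invented for the example.]

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    paper_id: str
    score: float          # the entity's published evaluation, in [0, 1]

@dataclass
class ReviewingEntity:
    name: str
    members: list[str]
    reviews: list[Review] = field(default_factory=list)
    reputation: float = 1.0

    def evaluate(self, paper_id: str, score: float) -> None:
        # Any entity may choose to review any paper; the review is public and signed.
        self.reviews.append(Review(paper_id, score))

    def update_reputation(self, outcomes: dict[str, float], lr: float = 0.1) -> None:
        # outcomes: observed later "success" of papers (e.g., citations), normalized to [0, 1].
        for r in self.reviews:
            if r.paper_id in outcomes:
                error = abs(r.score - outcomes[r.paper_id])
                # Predictive evaluations (small error) raise reputation; poor ones lower it.
                self.reputation += lr * (0.5 - error)
```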
SPEAKER_01
02:27:16 - 02:27:20
But so in terms of implementation, do you think that's something that's doable?
SPEAKER_00
02:27:20 - 02:28:19
I mean, I've been talking about this to various people, like Andrew McCallum, who started OpenReview. And the reason we picked OpenReview for ICLR initially, even though it was very early days for them, is that my hope was that ICLR would eventually inaugurate this type of system. ICLR kept the idea of open reviews — the reviews are published with the paper, which I think is very useful — but in many ways it's reverted to more of a conventional type of conference for everything else. And I mean, I don't run it — I'm just the president of the foundation — and the people who run it should make decisions about how to run it; I'm not going to tell them, because they're volunteers and I'm really thankful that they do that. But I'm saddened by the fact that we're not being innovative enough.
SPEAKER_01
02:28:19 - 02:28:30
Yeah, I hope that changes. Because the communication of science broadly, and the communication of computer science ideas in particular, is how you make those ideas have impact, I think.
SPEAKER_00
02:28:30 - 02:29:07
Yeah, and I think a lot of this is because people have in their mind an objective, which is fairness for authors and the ability to count points, basically, and assign credit accurately. But that comes at the expense of the progress of science. So to some extent, we're slowing down the progress of science without actually achieving fairness. And we're not achieving fairness — there are still biases. We do double-blind review, but the biases are still there, just different kinds of biases.
SPEAKER_01
02:29:08 - 02:29:32
You write that the phenomenon of emergence — collective behavior exhibited by large collections of simple elements in interaction — is one of the things that got you into neural nets in the first place. I love cellular automata. I love simple interacting elements and the things that emerge from them. Do you think we understand how complex systems can emerge from such simple components that interact simply?
SPEAKER_00
02:29:33 - 02:30:10
No, we don't. It's a big mystery. It's a mystery for physicists, it's a mystery for biologists. How is it that the universe around us seems to be increasing in complexity and not decreasing? That is a curious property of physics: despite the second law of thermodynamics, evolution and learning, et cetera, seem, at least locally, to increase complexity rather than decrease it. So perhaps the ultimate purpose of the universe is just to get more complex.
SPEAKER_01
02:30:11 - 02:30:31
To have these small pockets of beautiful complexity. Does that — cellular automata, those kinds of emergence and complex systems — give you some intuition or guide your understanding of machine learning systems and neural networks and so on? Or are these, for you, right now, disparate concepts?
SPEAKER_00
02:30:31 - 02:32:44
Well, it's what got me into it. I discovered the existence of the perceptron when I was a college student, by reading a book — it was a debate between Chomsky and Piaget, and Seymour Papert from MIT was kind of singing the praises of the perceptron in that book. It was the first time I heard about a learning machine. So I started digging through the literature, and I found those books, which were basically transcriptions of workshops or conferences from the 50s and 60s about self-organizing systems. There was a series of conferences on self-organizing systems, and books on this — some of them you can actually get at the Internet Archive in digital form. And there are fascinating articles in there by this guy whose name has been largely forgotten, Heinz von Foerster, an Austrian physicist who emigrated to the U.S. and worked on self-organizing systems in the 50s. In the 60s, at the University of Illinois at Urbana-Champaign, he created the Biological Computer Laboratory, the BCL, which was all about neural nets. Unfortunately, that was toward the end of the popularity of neural nets, so that lab never really thrived. But he wrote a bunch of papers about self-organization and the mystery of self-organization. An example he gives: imagine you are in space, there's no gravity, and you have a big box with magnets in it — rectangular magnets with a north pole on one end and a south pole on the other. You shake the box gently, and the magnets will stick to each other and probably form a complex structure. That could be an example of self-organization. And there are lots of examples — neural nets are an example of self-organization in many respects. And it's a bit of a mystery what is possible with this: pattern formation in physical systems, in chaotic systems, things like that, the emergence of life. How does that happen? It's a big puzzle for physicists as well.
SPEAKER_01
02:32:44 - 02:33:11
It feels like understanding the mathematics of emergence in some constrained situations might help us create intelligence — like, help us add a little spice to the systems — because in complex systems with emergence, you seem to be able to get a lot from a little. And so that seems like a shortcut to big leaps in performance.
SPEAKER_00
02:33:11 - 02:34:49
But there's a missing concept that we don't have, and it's something I've also been fascinated by since my undergrad days: how you measure complexity. We don't actually have good ways of measuring it — or at least we don't have good ways of interpreting the measures that we have at our disposal. Like, how do we measure the complexity of something, right? There are all those things like Kolmogorov-Chaitin complexity: the length of the shortest program that would generate a string can be thought of as the complexity of that string. I've been fascinated by that concept. The problem is that this complexity is defined only up to a constant, which can be very large. There are similar concepts derived from Bayesian probability theory, where the complexity of something is essentially the negative log of its probability, and you have a complete equivalence between the two things. And then you would think probability is something that's well defined mathematically, which means complexity is well defined. But it's not true: you need to have a model of the distribution, and you may need a prior if you're doing Bayesian inference. And that prior plays the same role as the choice of the computer with which you measure Kolmogorov complexity. So every measure of complexity we have has some arbitrariness to it — an additive constant which can be arbitrarily large. So how can we come up with a good theory of how things become more complex if we don't have a good measure of complexity?
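[Editor's note: for reference, the two notions of complexity mentioned here are the standard textbook ones; nothing below is specific to this conversation.]

```latex
\begin{align}
  K_U(x) &= \min\{\, |p| : U(p) = x \,\}
    && \text{Kolmogorov complexity w.r.t.\ universal machine } U \\
  |K_U(x) - K_V(x)| &\le c_{U,V} \quad \forall x
    && \text{invariant only up to a machine-dependent additive constant} \\
  L_p(x) &= -\log_2 p(x)
    && \text{code length under a chosen probabilistic model (prior) } p
\end{align}
```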
SPEAKER_01
02:34:49 - 02:35:24
Which we need. One way people study this is in astrobiology — the people who study the origin of life are trying to recreate life in the laboratory, and the more interesting case, the alien one, is when we go to other planets: how do we recognize life? Because we associate some level of complexity with life. We have to have concrete algorithms for measuring the level of complexity we see, in order to know the difference between life and non-life.
SPEAKER_00
02:35:25 - 02:36:34
And the problem is that complexity is in the eye of the beholder. Let me give you an example. If I give you an image of the MNIST digits, right, and I flip through the MNIST digits, there is obviously some structure to it, because there is local structure — neighboring pixels are correlated across the entire dataset. Now imagine I apply a random permutation to all the pixels — a fixed random permutation. Now I show you those images; they will look really disorganized to you, more complex. In fact, they're not more complex in absolute terms; they're exactly the same as the originals, right? And if you knew what the permutation was, you could undo it. Imagine I give you special glasses that undo the permutation — all of a sudden, what looked complicated becomes simple. So if you have humans on one end, and another race of aliens that sees the universe through permutation glasses, what we perceive as simple is horribly complicated to them — it's basically heat.
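[Editor's note: a small illustrative sketch of the permutation example, assuming MNIST-shaped 28x28 images. The permutation is invertible, so the scrambled images carry exactly the same information; only their appearance to a perceiver that expects local structure changes.]

```python
import numpy as np

rng = np.random.default_rng(0)
perm = rng.permutation(28 * 28)      # one fixed permutation for the whole dataset
inv_perm = np.argsort(perm)          # the "special glasses" that undo it

def scramble(images):
    # images: (n, 28, 28) array of digit images
    flat = images.reshape(len(images), -1)
    return flat[:, perm].reshape(-1, 28, 28)

def unscramble(images):
    flat = images.reshape(len(images), -1)
    return flat[:, inv_perm].reshape(-1, 28, 28)

# For any batch x: unscramble(scramble(x)) == x, so nothing is added or removed
# in absolute terms; only how complex it looks to our visual system changes.
```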
SPEAKER_01
02:36:34 - 02:36:35
Yeah. Heat. Yeah.
SPEAKER_00
02:36:35 - 02:36:42
And what they perceive as simple looks to us like random fluctuations. It's heat.
SPEAKER_01
02:36:42 - 02:36:47
So it's truly in the eye of the beholder. Yeah. It depends on what kind of glasses you're wearing.
SPEAKER_00
02:36:47 - 02:36:47
Right.
SPEAKER_01
02:36:47 - 02:36:50
What kind of algorithm you're running in your perception system.
SPEAKER_00
02:36:50 - 02:37:02
So I don't think we'll have a theory of intelligence, self-organization, evolution, things like this, until we have a good handle on a notion of complexity — which we know is in the eye of the beholder.
SPEAKER_01
02:37:04 - 02:37:12
Yeah, it's sad to think that we might not be able to detect or interact with alien species because we're wearing different glasses.
SPEAKER_00
02:37:12 - 02:37:33
Because their notion of locality might be different from ours. Yeah. This actually connects with fascinating questions in physics at the moment, like modern physics, quantum physics, like, you know, questions about, like, you know, can we recover the information that's lost in a black hole and things like this, right? And that relies on notions of complexity, which, you know, I find this fascinating.
SPEAKER_01
02:37:33 - 02:37:46
Can you describe your personal quest to build an expressive electronic wind instrument EWI? What is it? What does it take to build it?
SPEAKER_00
02:37:46 - 02:38:23
Well, I'm a tinkerer. I like building things, I like building things with combinations of electronics and mechanical stuff. I have a bunch of different hobbies, but probably my first one was building model airplanes and stuff like that, and I still do that to some extent. But also electronics — I taught myself electronics before I studied it. And the reason I taught myself electronics is because of music. My cousin was an aspiring electronic musician, and he had an analog synthesizer, and I was basically modifying it for him and building a sequencer and stuff like that for him. I was in high school when I was doing this.
SPEAKER_01
02:38:23 - 02:38:31
Is that where the interest in progressive rock comes from, like the '80s? What's the greatest band of all time, according to Yann LeCun?
SPEAKER_00
02:38:31 - 02:38:51
Oh, there are too many of them, but it's a combination of, you know, Mahavishnu Orchestra, Weather Report, Yes, Genesis — Genesis with Peter Gabriel — Gentle Giant, things like that.
SPEAKER_01
02:38:51 - 02:38:56
Great. Okay, so this love of electronics and this love of music combined together.
SPEAKER_00
02:38:56 - 02:41:07
Right, so I actually used to play Baroque and Renaissance music, and I played in an orchestra when I was in high school and in my first years of college. I played the recorder, crumhorn, a little bit of oboe, things like that. So I'm a wind instrument player. But I always wanted to play improvised music, even though I don't know anything about it. And the only way I figured, short of learning to play the saxophone, was to play an electronic wind instrument — the fingering is similar to a saxophone, but you have a wide variety of sounds because you control a synthesizer with it. So I've had a bunch of those, going back to the late 80s, from either Yamaha or Akai — they're the two main manufacturers of those — so I've been playing them for several decades. But I've never been completely satisfied with them because of the lack of expressivity. Those things are somewhat expressive — they measure the breath pressure, they measure the lip pressure, and you have a number of parameters you can vary with your fingers — but they're not really as expressive as an acoustic instrument, right? You hear John Coltrane play two notes and you know it's John Coltrane; it's got a unique sound. Or Miles Davis — you can hear it's Miles Davis playing the trumpet, because the sound reflects their physiognomy, basically; the shape of the vocal tract shapes the sound. So how do you do this with an electronic instrument? Many years ago I met a guy called David Wessel. He was a professor at Berkeley and created the center for music technology there, and he was interested in that question. So I kept thinking about this for many years. And finally, because of COVID, I was in my workshop — my workshop also serves as my Zoom room and home office, and this is in New Jersey — and I started really getting serious about building my own EWI.
SPEAKER_01
02:41:07 - 02:41:17
What else is going on in the New Jersey workshop? Is there some crazy stuff you've built, or left behind on the workshop floor?
SPEAKER_00
02:41:17 - 02:41:28
A lot of crazy stuff — electronics built around microcontrollers of various kinds, and flying contraptions.
SPEAKER_01
02:41:28 - 02:41:29
So you still love flying?
SPEAKER_00
02:41:30 - 02:41:55
It's a family disease. My dad got me into it when I was a kid — he was building model airplanes when he was a kid. He was a mechanical engineer, and he taught himself electronics also. He built his own radio control systems in the late 60s, early 70s. And so that's what got me into it — I mean, he got me into engineering and science and technology.
SPEAKER_01
02:41:55 - 02:42:04
You also have an interest in and appreciation of flight in other forms — like drones, quadcopters — or is it essentially model airplanes?
SPEAKER_00
02:42:04 - 02:42:25
You know, before drones were a consumer product, I built my own — building the controller with a microcontroller, with gyroscopes and accelerometers for stabilization, writing the firmware for it. And then when it became a standard thing you could buy, it was boring, so I stopped doing it. It wasn't fun anymore.
SPEAKER_01
02:42:25 - 02:42:51
Yeah, you were doing it before it was cool. What advice would you give to a young person today, in high school or college, who dreams of doing something big, like Yann LeCun — let's talk in the space of intelligence — who dreams of having a chance to solve some fundamental problem in the space of intelligence, both in their career and just in life, being somebody who was part of creating something special?
SPEAKER_00
02:42:52 - 02:43:54
So, try to get interested in big questions — things like: what is intelligence, what is the universe made of, what is life all about, things like that. Even crazy big questions, like: what is time? Nobody knows what time is. And then learn basic things, basic methods, either from math, from physics, or from engineering — things that have a long shelf life. Like, if you have a choice between learning mobile programming on iPhone or quantum mechanics, take quantum mechanics. Because you're going to learn things that you had no idea existed. You may never be a quantum physicist, but you will learn about path integrals, and path integrals are used everywhere. It's the same formula that you use for Bayesian integration and stuff like that.
SPEAKER_01
02:43:55 - 02:44:08
So the ideas, the little ideas within quantum mechanics, within some of these more solidified fields, will have a longer shelf life. They'll be used somehow, used indirectly, in your work.
SPEAKER_00
02:44:08 - 02:46:53
Learn classical mechanics, where you learn about Lagrangians, for example, which is a hugely useful concept for all kinds of different things. Learn statistical physics, because all the math that comes up in machine learning basically comes out of what was figured out by statistical physicists in the late 19th and early 20th century — and for some of it, more recently, by people like Giorgio Parisi, who just got the Nobel Prize, for the replica method among other things. It's used for a lot of different things: variational inference, for example — that math comes from statistical physics. So a lot of those basic courses — if you do engineering, take signal processing, and you'll learn about Fourier transforms. Again, something super useful; it's the basis of things like graph neural nets, which is an entirely new sub-area of AI, machine learning, deep learning that I think is super promising for all kinds of applications. Something very promising, if you're more interested in applications, is the application of AI, machine learning, and deep learning to science — to sciences that can help solve big problems in the world. I have colleagues at Meta, at FAIR, who started a project called Open Catalyst. It's an open, collaborative project. The idea is to use deep learning to help design new chemical compounds or materials that would facilitate the separation of hydrogen from oxygen. If you can efficiently separate hydrogen from oxygen with electricity, you solve climate change — it's as simple as that. Because you cover some random desert with solar panels, you have them work all day producing hydrogen, and then you ship the hydrogen wherever it's needed — you don't need anything else. You have controllable power that can be transported anywhere. So if we had large-scale, efficient energy storage technology like producing hydrogen, we would solve climate change. Here's another way to solve climate change: figuring out how to make fusion work. The problem with fusion is that you make a super-hot plasma, and the plasma is unstable and you can't control it. Maybe with deep learning, you can find controllers that would stabilize the plasma and make practical fusion reactors. That's very speculative, but it's worth trying because the payoff is huge. There's a group at Google working on this, led by John Platt.
SPEAKER_01
02:46:53 - 02:47:02
So, convert as many problems in science — in physics, biology, and chemistry — into learnable problems, and see if a machine can learn them.
SPEAKER_00
02:47:03 - 02:48:43
Right. I mean, there are properties of complex materials that we don't understand from first principles, for example. So if we could design new materials, we could make more efficient batteries, we could maybe make faster electronics — there are a lot of things we can imagine doing — or lighter materials for cars or airplanes, things like that, maybe better fuel cells. There's all kinds of stuff we can imagine. If we had good fuel cells, hydrogen fuel cells, we could use them to power airplanes or cars, and we wouldn't have CO2 emission problems for transportation anymore. So there are a lot of those things where I think AI can be used. And this is not even talking about all the medicine and biology and everything like that, right? Protein folding — figuring out how you could design a protein that sticks to another protein at a particular site, because that's how you design drugs in the end. So deep learning would be useful there; it would be enormous progress if we could use it for that. Here's an example from recent materials physics: you take a monatomic layer of graphene — so it's just carbon on a hexagonal mesh, a single atom thick — you put another one on top, you twist them by some magic number of degrees, three degrees or something, and it becomes a superconductor. Nobody has any idea why.
SPEAKER_01
02:48:43 - 02:48:47
I want to know how that was discovered. But is that the kind of thing machine learning could actually discover?
SPEAKER_00
02:48:47 - 02:50:14
Well, maybe not. But there is a hint, perhaps, that with machine learning we could train a system to basically be a phenomenological model of some complex emergent phenomenon — and superconductivity is one of those — where the phenomenon is too difficult to describe from first principles with the usual sort of reductionist methods, but we could have deep learning systems that predict the properties of a system from a description of it, after being trained with sufficiently many samples. So Pascal Fua, at EPFL, has a startup company that basically trained a convolutional net to predict the aerodynamic properties of solids. And you can generate as much data as you want by just running computational fluid dynamics: you give it a wing or a shape of some kind, you run computational fluid dynamics, and you get as a result the drag and lift and all that stuff, right? So you can generate lots of data and train a neural net to make those predictions, and now what you have is a differentiable model of, let's say, drag and lift as a function of the shape of that solid. And so by gradient descent, you can optimize the shape so you get the properties you want.
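[Editor's note: a minimal sketch of the differentiable-surrogate idea described here. The small network stands in for a surrogate assumed to be pre-trained on CFD-generated (shape, drag, lift) data, and the flat shape vector is a hypothetical parameterization for illustration — not the actual system from Pascal Fua's group.]

```python
import torch
import torch.nn as nn

# Placeholder surrogate: predicts (drag, lift) from a 64-dim shape descriptor.
# Assumed already trained on pairs generated by computational fluid dynamics.
surrogate = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                          nn.Linear(256, 256), nn.ReLU(),
                          nn.Linear(256, 2))

shape = torch.randn(64, requires_grad=True)   # optimizable shape parameters
opt = torch.optim.Adam([shape], lr=1e-2)

for step in range(200):
    drag, lift = surrogate(shape)
    # Minimize drag while keeping lift near a target value (here 1.0).
    loss = drag + (lift - 1.0) ** 2
    opt.zero_grad()
    loss.backward()        # gradients flow through the differentiable surrogate
    opt.step()
```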
SPEAKER_01
02:50:16 - 02:50:56
Yeah, that's incredible. And on top of all that, you should probably read a little bit of literature and a little bit of history, for inspiration and for wisdom, because after all, all of these technologies will have to work in a human world. Yes. And the human world is complicated. This was an amazing conversation — I'm really honored you talked with me today. Thank you for all the amazing work you're doing at FAIR, at Meta, and thank you for being so passionate after all these years about everything that's going on. You're a beacon of hope for the machine learning community. Thank you so much for spending your valuable time with me today.
SPEAKER_00
02:50:56 - 02:50:59
That was awesome. Thanks for having me on. That was a pleasure.
SPEAKER_01
02:51:00 - 02:51:23
Thanks for listening to this conversation with Yann LeCun. To support this podcast, please check out our sponsors in the description. And now, let me leave you with some words from Isaac Asimov: "Your assumptions are your windows on the world. Scrub them off every once in a while, or the light won't come in." Thank you for listening, and hope to see you next time.