Transcript for François Chollet: Keras, Deep Learning, and the Progress of AI
SPEAKER_01
00:00 - 01:48
The following is a conversation with François Chollet. He's the creator of Keras, which is an open source deep learning library that is designed to enable fast, user-friendly experimentation with deep neural networks. It serves as an interface to several deep learning libraries, most popular of which is TensorFlow, and it was integrated into the TensorFlow main code base a while ago. Meaning, if you want to create, train, and use neural networks, probably the easiest and most popular option is to use Keras inside TensorFlow. Aside from creating an exceptionally useful and popular library, François is also a world-class AI researcher and software engineer at Google. And he's definitely an outspoken, if not controversial, personality in the AI world, especially in the realm of ideas around the future of artificial intelligence. This is the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. And now, here's my conversation with François Chollet. You're known for not sugar-coating your opinions and speaking your mind about ideas in AI, especially on Twitter. It's one of my favorite Twitter accounts. So what's one of the more controversial ideas you've expressed online and gotten some heat for? How do you pick?
SPEAKER_02
01:51 - 03:56
Yeah, no, I think if you go through the trouble of maintaining a Twitter account, you might as well speak your mind, you know? Otherwise, what's even the point of having a Twitter account? It's like having a nice car and just leaving it in the garage. Yeah, so what's one thing for which I got a lot of pushback? Perhaps that time I wrote something about the idea of intelligence explosion, and I was questioning the idea and the reasoning behind this idea. And I got a lot of pushback on that. I got a lot of flak for it. So yeah, the intelligence explosion — I'm sure you're familiar with the idea, but it's the idea that if you were to build general problem-solving AI algorithms, well, the problem of building such an AI, that itself is a problem that could be solved by your AI, and maybe it could be solved better than what humans can do. So your AI could start tweaking its own algorithm, could start making a better version of itself, and so on, iteratively, in a recursive fashion. And so you would end up with an AI with exponentially increasing intelligence. And I was basically questioning this idea. First of all, because the notion of intelligence explosion uses an implicit definition of intelligence that doesn't sound quite right to me. It considers intelligence as a property of a brain that you can consider in isolation, like the height of a building, for instance. But that's not really what intelligence is. Intelligence emerges from the interaction between a brain, a body — like, embodied intelligence — and an environment. And if you're missing one of these pieces, then you cannot really define intelligence anymore. So just tweaking a brain to make it smarter and smarter doesn't actually make any sense to me.
SPEAKER_01
03:56 - 05:48
So first of all, you're crushing the dreams of many people, right? So let's look at, like, Sam Harris, a lot of physicists, Max Tegmark — people who think, you know, the universe is an information-processing system, our brain is kind of an information-processing system. So what's the theoretical limit? It seems naive to think that our own brain is somehow the limit of the capabilities of this information-processing system — I'm playing devil's advocate here — and if you're able to build something that's on par with the brain, the process that builds it just continues, and it will improve exponentially. So that's the logic that's used, actually, by almost everybody that is worried about superhuman intelligence. So you're trying to — most people who are skeptical of that are kind of like, this thought process doesn't feel right. That's true for me as well. The whole thing is shrouded in mystery, where you can't really say anything concrete, but you could say this doesn't feel right, it doesn't feel like that's how the brain works. And you're trying, with your blog post, to make that a little more explicit. So one idea is that the brain doesn't exist alone, it exists within the environment. So you can't just exponentially improve the brain; you'd have to somehow improve the environment and the brain together, almost, in order to create something that's much smarter, in some kind of — of course, we don't have a definition of intelligence.
SPEAKER_02
05:48 - 07:10
That's correct. I don't think, if you look at very smart people today — even humans, not even talking about AIs — I don't think their brain and the performance of their brain is the bottleneck to their expressed intelligence, to their achievements. You cannot just tweak one part of the system, like of this brain-body-environment system, and expect capabilities — like, what emerges out of the system — to just explode exponentially. Because anytime you improve one part of a system with many interdependencies like this, there's a new bottleneck that arises. And I don't think even today, for very smart people, their brain is the bottleneck to the sort of problems they can solve. In fact, many very smart people today, you know, they're not actually solving any big scientific problems. They're not Einstein. They're like Einstein, but, you know, in the patent clerk days. Like, Einstein became Einstein because this was a meeting of a genius with a big problem at the right time, right? But maybe this meeting could have never happened, and then Einstein would have just been a patent clerk, right? And in fact, many people today are probably like genius-level smart, but you wouldn't know, because they're not actually expressing it.
SPEAKER_01
07:11 - 07:39
Well, that's brilliant. So we can think of the world — Earth, but also the universe — as just a space of problems. So all these problems and tasks are roaming it, of various difficulty. And there's agents, creatures like ourselves and animals and so on, that are also roaming it. And then you get coupled with a problem, and then you solve it. But without that coupling, you can't demonstrate your quote-unquote intelligence.
SPEAKER_02
07:39 - 07:59
Exactly. Intelligence is the meeting of great problem-solving capabilities with a great problem. And if you don't have the problem, you're not really expressing any intelligence. All you're left with is potential intelligence, like the performance of your brain, you know, how high your IQ is, which in itself is just a number, right?
SPEAKER_01
07:59 - 08:22
So you mentioned problem-solving capacity. Yeah. What do you think of as problem-solving capacity? Can you try to define intelligence? Like, what does it mean to be more or less intelligent? Is it completely coupled to a particular problem, or is there something a little bit more universal?
SPEAKER_02
08:22 - 09:30
Yeah, I do believe all intelligence is specialized intelligence. Even human intelligence has some degree of generality — well, all intelligent systems have some degree of generality — but they're always specialized in one category of problems. So human intelligence is specialized in the human experience, and that shows at various levels. That shows in some prior knowledge that's innate, that we have at birth: knowledge about things like agents, goal-driven behavior, visual priors about what makes an object, priors about time, and so on. That shows also in the way we learn. For instance, it's very, very easy for us to pick up language. It's very, very easy for us to learn certain things because we are basically hard-coded to learn them. And we are specialized in solving certain kinds of problems, and we are quite useless when it comes to other kinds of problems. For instance, we are not really designed to handle very long-term problems. We have no capability of seeing the very long term. We don't have...
SPEAKER_01
09:32 - 09:44
...much working memory, you know. So how do you think about long-term? Do you think about long-term planning — are we talking about a scale of years, millennia? What do you mean by long-term we're not very good at?
SPEAKER_02
09:45 - 11:16
Well, human intelligence is specialized in the human experience, and the human experience is very short. Like, one lifetime is short. Even within one lifetime, we have a very hard time envisioning, you know, things on a scale of years. Like, it's very difficult to project yourself at a scale of five years, a scale of ten years, and so on. We can only solve fairly narrowly scoped problems. So when it comes to solving bigger problems, larger-scale problems, we are not actually doing it on an individual level — so it's not actually our brain doing it. We have this thing called civilization, right, which is itself a sort of problem-solving system, a sort of artificial intelligence system. And it's not running on one brain, it's running on a network of brains. In fact, it's running on much more than a network of brains. It's running on a lot of infrastructure, like books and computers and the internet and human institutions and so on. And that is capable of handling problems on a much greater scale than any individual human. Computer science, for instance — that's an institution that solves problems, and it is superhuman, right? It operates on a greater scale; it can solve much bigger problems than an individual human could. And science itself — science as a system, as an institution — is a kind of artificial intelligence problem-solving algorithm that is superhuman.
SPEAKER_01
11:16 - 12:13
Yeah, well, computer science is like a theorem prover at a scale of thousands, maybe hundreds of thousands, of human beings. At that scale, what do you think is an intelligent agent? So there's us humans at the individual level. There are millions, maybe billions, of bacteria on our skin — that's at the smaller scale. You can even go to the particle level, as systems that behave, you could say, intelligently in some ways. And then you can look at Earth as a single organism. You can look at our galaxy and even the universe as a single organism. So how do you think about scale in defining intelligent systems? And we're here at Google — there are millions of devices doing computation in a distributed way. How do you think about intelligence and scale?
SPEAKER_02
12:13 - 13:50
You can always characterize anything as a system, I think. People who talk about things like intelligence explosion tend to focus on one agent, which is basically one brain — like, one brain considered in isolation, like a brain in a jar that's controlling a body in a very, like, top-down kind of fashion, and that body is pursuing goals in an environment. So it's a very hierarchical view. You have the brain at the top of the pyramid, then you have the body just plainly receiving orders, and then the body is manipulating objects in the environment and so on. So everything is subordinate to this one thing, this epicenter, which is the brain. But in real life, intelligent agents don't really work like this. There is no strong delimitation between the brain and the body to start with. You have to look not just at the brain, but at the nervous system. But then the nervous system and the body are not really two separate entities. So you have to look at the entire animal as one agent. But then you start realizing, as you observe an animal over any length of time, that a lot of the intelligence of an animal is actually externalized. That's especially true for humans. A lot of our intelligence is externalized. When you write down some notes, that is externalized intelligence. When you write a computer program, you are externalizing cognition. So it's externalized in books, it's externalized in computers, the internet, in other humans. It's externalized in language and so on. So there is no hard delimitation of what makes an intelligent agent. It's all about context.
SPEAKER_01
13:54 - 14:23
AlphaGo is better at Go than the best human player. You know, there are levels of skill here. So do you think there's such a concept as an intelligence explosion in a specific task? And then, well, yeah, do you think it's possible to have a category of tasks on which you do have something like an exponential growth of ability to solve that particular problem?
SPEAKER_02
14:24 - 15:37
I think if you consider a specific vertical, it's probably possible to some extent. I also don't think we have to speculate about it, because we have real-world examples of recursively self-improving intelligent systems. For instance, science is a problem-solving system, a knowledge-generation system — like, a system that experiences the world in some sense and then gradually understands it and can act on it. And that system is superhuman, and it is clearly recursively self-improving, because science feeds into technology, technology can be used to build better tools, better computers, better instrumentation and so on, which in turn can make science faster, right? So science is probably the closest thing we have today to a recursively self-improving superhuman AI. And you can just observe, you know, is scientific progress actually exploding — which in itself is an interesting question. You can use that as a basis to try to understand what will happen with a superhuman AI that has a science-like behavior.
SPEAKER_01
15:38 - 15:54
Let me linger on it a little bit more. What is your intuition why an intelligence explosion is not possible? Like, taking the scientific — all the scientific revolutions — why can't we slightly accelerate that process?
SPEAKER_02
15:55 - 21:25
So you can absolutely accelerate any problem-solving process. So recursive self-improvement is absolutely a real thing. But what happens with a recursively self-improving system is typically not explosion, because no system exists in isolation, and so tweaking one part of the system means that, suddenly, another part of the system becomes a bottleneck. And if you look at science, for instance — which is clearly a recursively self-improving, clearly a problem-solving system — scientific progress is not actually exploding. If you look at science, what you see is the picture of a system that is consuming an exponentially increasing amount of resources, but it's having a linear output in terms of scientific progress. And maybe that will seem like a very strong claim. Many people are actually saying that, you know, scientific progress is exponential. But when they're claiming this, they're actually looking at indicators of resource consumption by science — for instance, the number of papers being published, the number of patents being filed and so on, which are just completely correlated with how many people are working on science today, right? So it's actually an indicator of resource consumption. But what you should look at is the output — is progress in terms of the knowledge that science generates, in terms of the scope and significance of the problems that we solve. And some people have actually been trying to measure that. Like Michael Nielsen, for instance — he had a very nice paper, I think that was last year, about it. So his approach to measuring scientific progress was to look at the timeline of scientific discoveries over the past, you know, 150 years. And for each major discovery, ask a panel of experts to rate the significance of the discovery. And if the output of science as an institution were exponential, you would expect the temporal density of significance to go up exponentially — either because there's a faster rate of discoveries, or because the discoveries are increasingly more important. And what actually happens if you plot this temporal density of significance, measured in this way, is that you see very much a flat graph. You see a flat graph across all disciplines — across physics, biology, medicine, and so on. And it actually makes a lot of sense if you think about it, because think about the progress of physics 110 years ago, right? It was a time of crazy change. Think about the progress of technology, you know, 170 years ago, when we started, you know, replacing horses with cars, when we started having electricity and so on. It was a time of incredible change, and today is also a time of very fast change. But it would be an unfair characterization to say that today, technology and science are moving way faster than they did 50 years ago or 100 years ago. And if you do try to rigorously plot the temporal density of significance, you do see very flat curves. And you can check out the paper that Michael Nielsen had about this idea. And so the way I interpret it is, as you make progress in a given field, in a given subfield of science, it becomes exponentially more difficult to make further progress — like the very first person to work on information theory. If you enter a new field, and it's still the very early years, there's a lot of low-hanging fruit you can pick. But the next generation of researchers is going to have to dig much harder, actually, to make smaller discoveries —
probably a larger number of smaller discoveries — and to achieve the same amount of impact, you're going to need a much greater headcount. And that's exactly the picture you're seeing with science: the number of scientists and engineers is in fact increasing exponentially. The amount of computational resources that are available to science is increasing exponentially and so on. So the resource consumption of science is exponential, but the output in terms of progress, in terms of significance, is linear. And the reason why is because, even though science is recursively self-improving — meaning that scientific progress turns into technological progress, which in turn helps science; if you look at computers, for instance, they are products of science, and computers are tremendously useful in speeding up science; the internet, same thing, the internet is a technology that's made possible by various scientific advances, and itself, because it enables scientists to network, to communicate, to exchange papers and ideas much faster, it is a way to speed up the scientific process. So even though you're looking at a recursively self-improving system, it is consuming exponentially more resources to produce the same amount of problem-solving.
SPEAKER_01
21:26 - 21:49
So this is a fascinating way to paint it, and certainly that holds for the deep learning community, right? If you look at the temporal — what did you call it? — the temporal density of significant ideas: if you look at deep learning, I think, I'd have to think about that, but if you really look at significant ideas in deep learning, they might even be decreasing.
SPEAKER_02
21:49 - 22:54
So I do believe the per-paper significance is decreasing, but the amount of papers is still, today, exponentially increasing. So I think, if you look at an aggregate, my guess is that you would see linear progress. If you were to sum the significance of all papers, you would see roughly linear progress. And in my opinion, it is not a coincidence that you're seeing linear progress in science despite exponential resource consumption. I think the resource consumption is dynamically adjusting itself to maintain linear progress, because we as a community expect linear progress — meaning that if we start investing less and seeing less progress, it means that suddenly there are some lower-hanging fruits that become available, and someone's going to step up and pick them, right? So it's very much like a market for discoveries and ideas.
SPEAKER_01
22:54 - 23:17
But there's another fundamental part you're highlighting, which is the hypothesis that science — or, like, the space of ideas — any one path you travel down, it gets exponentially more difficult to get new ideas. And your sense is that that's going to hold across our mysterious universe.
SPEAKER_02
23:18 - 25:14
Yes, well, exponential progress triggers exponential friction, so that if you tweak one part of the system, suddenly some other part becomes a bottleneck. For instance, let's say you develop some device that measures its own acceleration, and then it has some engine, and it outputs even more acceleration in proportion to its own acceleration, and you drop it somewhere. It's not going to reach infinite speed, because it exists in a certain context, so the air around it is going to generate friction, and it's going to block it at some top speed. And even if you were to consider the broader context and lift the bottleneck there — like the bottleneck of friction — then some other part of the system would start stepping in and creating exponential friction, maybe the speed of light or whatever. And this definitely holds true when you look at the problem-solving algorithm that is being run by science as an institution, science as a system. As you make more and more progress, despite having this recursive self-improvement component, you are encountering exponential friction. Like, the more researchers you have working on different ideas, the more overhead you have in terms of communication across researchers. If you look at — you were mentioning quantum mechanics, right? Well, if you want to start making significant discoveries today, significant progress in quantum mechanics, there is an amount of knowledge you have to ingest which is huge. So there's a very large overhead to even start to contribute. There's a large amount of overhead to synchronize across researchers and so on. And of course, the significant practical experiments are going to require exponentially expensive equipment, because the easier ones have already been run, right?
SPEAKER_01
25:14 - 25:25
So your sense is there's no way of escaping this kind of friction with artificial intelligence systems.
SPEAKER_02
25:26 - 26:41
Yeah, no, I think science is a very good way to model what would happen with a superhuman, recursively self-improving AI. That's my intuition. It's not like a mathematical proof of anything. That's not my point. I'm not trying to prove anything. I'm just trying to make an argument to question the narrative of intelligence explosion, which is quite a dominant narrative. And you do get a lot of pushback if you go against it. Because, so for many people, right, AI is not just a subfield of computer science. It's more like a belief system — like this belief that the world is headed towards an event, the singularity, past which, you know, AI will become — will go exponential, very much — and the world will be transformed, and humans will become obsolete. And if you go against this narrative — because it is not really a scientific argument, but more of a belief system — it is part of the identity of many people. If you go against this narrative, it's like you're attacking the identity of people who believe in it. It's almost like saying God doesn't exist, or something, right? So you do get a lot of pushback if you try to question these ideas.
SPEAKER_01
26:42 - 27:37
First of all, I believe most people — they might not be as eloquent or explicit as you're being — but most people in computer science, and most people who actually have built anything that you could call AI, quote-unquote, would agree with you. They might not be describing it in the same kind of way. So the pushback you're getting is from people who get attached to the narrative, not from a place of science, but from a place of imagination. So why do you think that's so appealing? Because the usual dream that people have when you create a superintelligent system past the singularity — what people imagine is somehow always destructive. If you were to put on your psychology hat, why is it so appealing to imagine the ways that all of human civilization will be destroyed?
SPEAKER_02
27:37 - 28:21
I think it's a good story. You know, it's a good story. And very interestingly, it mirrors religious stories, right — religious mythology. If you look at the mythology of most civilizations, it's about the world being headed towards some final event in which the world will be destroyed, and some new world order will arise that will be mostly spiritual — like the apocalypse followed by a paradise, probably. It's a very appealing story on a fundamental level, and we all need stories. We all need stories to structure the way we see the world, especially at timescales that are beyond our ability to make predictions.
SPEAKER_01
28:21 - 29:02
So, on a more serious, non-exponential-explosion question: do you think there will be a time when we'll create something like human-level intelligence, or intelligent systems that will make you sit back and be just surprised at, damn, how smart this thing is? That doesn't require exponential growth or exponential improvement. But what's your sense of a timeline and so on, where you'll be really surprised at certain capabilities? And we'll talk about limitations in deep learning. So when do you think in your lifetime you'll be really damn surprised?
SPEAKER_02
29:03 - 29:24
Around 2013, 2014, I was many times surprised by the capabilities of deep learning, actually. That was before we had assessed exactly what deep learning could do and could not do, and it felt like a time of immense potential. And then we started, you know, narrowing it down. But I was very surprised, so it has already happened.
SPEAKER_01
29:24 - 29:44
Was there a moment? There must have been a day in there where your surprise was almost bordering on the belief of the narrative that we just discussed. Was there a moment — because you've written quite eloquently about the limits of deep learning — was there a moment when you thought that maybe deep learning is limitless?
SPEAKER_02
29:47 - 30:55
No, I don't think I've ever believed this. What was really shocking is that it worked. That it worked at all, yeah. But there's a big jump between being able to do really good computer vision and human-level intelligence. So I don't think at any point I was under the impression that the results we got in computer vision meant that we were anywhere close to human-level intelligence. I don't think we're anywhere close to human-level intelligence. I do believe that there's no reason why we won't achieve it at some point. I also believe that, you know, it's the problem with talking about human-level intelligence that, implicitly, you're considering an axis of intelligence with different levels. But that's not really how intelligence works. Intelligence is very multi-dimensional. And so there's the question of capabilities, but there's also the question of being human-like, and these are two very different things. Like, you can build potentially very advanced intelligent agents that are not human-like at all, and you can also build very human-like agents, and these are two very different things, right?
SPEAKER_01
30:55 - 31:16
Let's go from the philosophical to the practical. Can you give me a history of Keras and all the major deep learning frameworks that you kind of remember in relation to Keras — and in general, TensorFlow, Theano, the old days? Can you give a brief overview, Wikipedia-style history, and your role in it, before we return to AGI discussions?
SPEAKER_02
31:16 - 32:00
Yeah, that's a broad topic. So I started working on Keras — it was not named Keras at the time; I actually picked the name just the day I was going to release it. So I started working on it in February 2015. And at the time, there weren't many people working on deep learning, and the software tooling was not really developed. So the main deep learning library was Caffe, which was mostly C++. Why do you say Caffe was the main one? Caffe was vastly more popular than Theano in late 2014, early 2015. Caffe was the one library that everyone was using for computer vision.
SPEAKER_01
32:00 - 32:03
And computer vision was the most popular problem.
SPEAKER_02
32:03 - 34:42
Absolutely. Like, convnets were the subfield of deep learning that everyone was working on. So myself, in late 2014, I was actually interested in RNNs, in recurrent neural networks, which was a very niche topic at the time, right? It really took off later, around 2016. And so I was looking for good tools. I'd used Torch 7, I'd used Theano — I used Theano a lot in Kaggle competitions — I'd used Caffe. And there was no good solution for RNNs at the time. Like, there was no reusable open-source implementation of an LSTM, for instance. So I decided to build my own, and at first, the pitch for it was that it was going to be mostly around LSTMs, recurrent neural networks. It was going to be in Python. An important decision at the time, that was kind of not obvious, is that the models would be defined via Python code, which was kind of going against the mainstream at the time, because Caffe and so on — like all the big libraries — were actually going with the approach of setting configuration files in YAML to define models. So some libraries were using code to define models, like Torch 7, but it was not Python. Lasagne was a Theano-based, very early library that was, I think, developed — I'm not sure exactly — probably late 2014, and it was Python as well, on top of Theano. And so I started working on something, and the value proposition at the time was that, not only was it what I think was the first reusable open-source implementation of an LSTM, you could combine RNNs and convnets with the same library, which was not really possible before — like, Caffe was only doing convnets. And it was kind of easy to use, because, before that, I was actually using scikit-learn, and I loved scikit-learn for its usability. So I drew a lot of inspiration from scikit-learn when I made Keras. It's almost like scikit-learn for neural networks. The fit function — exactly, the fit function — like, reducing a complex training loop to a single function call, right? And of course, you know, some people will say this is hiding a lot of details, but that's exactly the point, right? The magic is the point, right? So it's magical, but in a good way — it's magical in the sense that it's delightful, yeah.
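As an illustration of the scikit-learn-style workflow described above — defining the model in Python code and reducing the training loop to a single fit call — here is a minimal sketch using today's tf.keras API; the dataset and layer sizes are purely illustrative, not from the conversation.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Toy data: 1,000 samples of 20 features, binary labels (illustrative only).
x_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(1000,))

# The model is defined in Python code, not in a configuration file.
model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The entire training loop is reduced to a single call, like scikit-learn's fit().
model.fit(x_train, y_train, epochs=5, batch_size=32)
```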
SPEAKER_01
34:43 - 35:11
I'm actually quite surprised. I didn't know it was born out of a desire to implement RNNs and LSTMs. That's fascinating. So you were actually one of the first people to really try to get the major architectures together. And it's also interesting — I mean, to realize that that was a design decision at all, defining the model in code. Just putting myself in your shoes, when the YAML approach, especially Caffe, was the most popular — it was the most popular by far.
SPEAKER_02
35:11 - 35:12
If I was...
SPEAKER_01
35:13 - 35:27
If I were — yeah, I didn't like the YAML thing, but it seems to make more sense to put the definition of a model in a configuration file. So that's an interesting, gutsy move to stick with defining it in code.
SPEAKER_02
35:27 - 37:44
Just if you look back — other libraries were doing it as well, but it was definitely the minority option. Yeah. Okay, Keras, and then... Keras — so I released Keras in March 2015, and it got users pretty much from the start. So the deep learning community was very, very small at the time, and lots of people were starting to be interested in LSTMs. So it was released at the right time, because it was offering an easy-to-use LSTM implementation exactly at the time when lots of people started to be intrigued by the capabilities of RNNs — RNNs for NLP. So it grew from there. Then I joined Google about six months later, and that was actually completely unrelated to Keras. I actually joined a research team working on image classification, mostly — like, computer vision. So I was doing computer vision research at Google initially. And immediately when I joined Google, I was exposed to the early internal version of TensorFlow. And the way it was pitched to me at the time — and that was definitely the state of it at the time — is that this was an improved version of Theano. So immediately I knew I had to port Keras to this new TensorFlow thing. But I was actually very busy as a new Googler, so I had no time to work on that. Then in November — I think it was November 2015 — TensorFlow got released, and it was kind of like my wake-up call that, hey, I had to actually go and make it happen. So in December, I ported Keras to run on top of TensorFlow. But it was not exactly a port; it was more like a refactoring, where I was abstracting away all the backend functionality into one module, so that the same code base could run on top of multiple backends — so on top of TensorFlow or Theano. And for the next year, Theano stayed as the default option. It was, you know, easier to use, it was somewhat less buggy, it was much faster, especially when it came to RNNs. But eventually, you know, TensorFlow overtook it, right?
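A rough sketch of the kind of refactoring described here: all low-level operations are routed through one backend module, so the model-building code never touches a specific engine directly. The module layout and function names below are hypothetical, for illustration only — not the actual Keras internals.

```python
# backend.py (hypothetical): every low-level op goes through this one module,
# so layer code written against it runs unchanged on either backend.
_BACKEND = "tensorflow"  # could be switched to "theano"

if _BACKEND == "tensorflow":
    import tensorflow as tf

    def dot(x, y):
        # Matrix multiplication via the TensorFlow engine.
        return tf.matmul(x, y)

    def relu(x):
        return tf.nn.relu(x)
else:
    import theano.tensor as T

    def dot(x, y):
        # Same operation, expressed with Theano tensors.
        return T.dot(x, y)

    def relu(x):
        return T.nnet.relu(x)
```

Layer and model code then calls only `dot`, `relu`, and the rest of this module, which is what lets a single code base target multiple backends.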
SPEAKER_01
37:44 - 38:01
And TensorFlow, the early TensorFlow, had similar architectural decisions as Theano, right? So it was a natural transition. Yeah, absolutely. So, I mean, that was still Keras as a side, almost fun project.
SPEAKER_02
38:01 - 39:34
Right. Yeah, so it was not my job assignment; I was doing it on the side. And even though it grew to have, you know, a lot of users for a deep learning library at the time, like throughout 2016, I wasn't doing it as my main job. So things started changing in, I think it must have been maybe October 2016, so one year later. So Rajat, who was the lead on TensorFlow, basically showed up one day in our building where I was doing research — so I did a lot of computer vision research, also collaborations with Christian Szegedy on deep learning for theorem proving, which was a really interesting research topic. And so Rajat was saying, hey, we saw Keras, we like it, we saw that you're at Google — why don't you come over for, like, a quarter and work with us? And I was like, yeah, that sounds like a great opportunity, let's do it. And so I started working on integrating the Keras API into TensorFlow more tightly. What followed is a sort of temporary, TensorFlow-only version of Keras that was in tf.contrib for a while, and that finally moved to TensorFlow core. And, you know, I've never actually gotten back to my old team doing research.
SPEAKER_01
39:34 - 40:32
Well, it's kind of funny that somebody like you, who dreams of, or at least sees the power of, AI systems that reason, and theorem proving, which we'll talk about, has also created a system that makes the most basic kind of Lego building — that is, deep learning — super accessible, super easy, so beautifully so. It's a funny irony that you're responsible for both things. But so, TensorFlow 2.0 — there's a sprint, I don't know how long it'll take, but there's a sprint to the finish. What are you working on these days? What are you excited about in 2.0? I mean, eager execution — there are so many things that just make it a lot easier to work with. What are you excited about? And what's also really hard? What are the problems you have to kind of solve?
SPEAKER_02
40:32 - 42:32
So I've spent the past year and a half working on TensorFlow 2. It's been a long journey. I'm actually extremely excited about it. I think it's a great product. It's a delightful product compared to TensorFlow 1. We've made huge progress. So on the Keras side, what I'm really excited about is that, so, you know, previously Keras has been this very easy-to-use, high-level interface to do deep learning. But if you wanted a lot of flexibility, the Keras framework was probably not the optimal way to do things, compared to just writing everything from scratch. So in some way, the framework was getting in the way. And in TensorFlow 2, you don't have this at all, actually. You have the usability of the high-level interface, but you have the flexibility of the lower-level interface. And you have this spectrum of workflows where you can get more or less usability and flexibility trade-offs depending on your needs, right? You can write everything from scratch, and you get a lot of help doing so by, you know, subclassing models and writing some custom training loops using eager execution. It's very flexible, it's very easy to debug, it's very powerful. But all of this integrates seamlessly with higher-level features, up to the classic Keras workflows, which are very scikit-learn-like and are ideal for the data scientist, machine learning engineer type of profile. So now you can have the same framework offering the same set of APIs that enable a spectrum of workflows that are more or less high-level, that are suitable for profiles ranging from researchers to data scientists and everything in between.
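To make the "spectrum of workflows" concrete, here is a small sketch showing both ends with TensorFlow 2: the high-level compile-and-fit path, and a hand-written training loop using eager execution on the same model. The data and model sizes are illustrative.

```python
import numpy as np
import tensorflow as tf

# Toy regression data (illustrative).
x = np.random.rand(256, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

# High-level end of the spectrum: compile + fit, scikit-learn style.
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=2, batch_size=32, verbose=0)

# Lower-level end of the spectrum: write your own training loop with eager
# execution and GradientTape, while reusing the exact same model object.
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

for batch_x, batch_y in dataset:
    with tf.GradientTape() as tape:
        pred = model(batch_x, training=True)
        loss = loss_fn(batch_y, pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```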
SPEAKER_01
42:32 - 43:24
Yeah, so that's super exciting. And it's not just that — it's connected to all kinds of tooling. You can go on mobile with TensorFlow Lite, you can go in the cloud with serving and so on, and it's all connected together. Now, some of the best software ever written is often done by one person, sometimes two. So with Google, you're now seeing sort of Keras having to be integrated into TensorFlow, which I'm sure has a ton of engineers working on it. So I'm sure there are a lot of tricky design decisions to be made. How does that process usually happen, from at least your perspective? What are the debates like? Is there a lot of thinking, considering different options and so on?
SPEAKER_02
43:24 - 44:14
Yes. So a lot of the time I spend at Google is actually discussing design, right — writing design docs, participating in design review meetings and so on. This is, you know, as important as actually writing the code. So there's a lot of thought and a lot of care that is taken in coming up with these decisions and taking into account all of our users, because TensorFlow has this extremely diverse user base, right? It's not like just one user segment where everyone has the same needs. We have small-scale production users, large-scale production users. We have startups, we have researchers — you know, it's all over the place. And we have to cater to all of their needs.
SPEAKER_01
44:14 - 44:43
If I just look at the standards debates of C++ or Python, there are some heated debates. Do you have those at Google? I mean, they're not heated in terms of emotion, but there are probably multiple ways to do it right. So how do you arrive, through those design meetings, at the best way to do it, especially in deep learning, where the field is evolving as you're doing it? Is there some magic to it? Is there some magic to the process?
SPEAKER_02
44:43 - 46:17
I don't know if there's magic to the process, but there definitely is a process. So making design decisions is about satisfying a set of constraints, but also trying to do so in the simplest way possible, because this is what can be maintained, this is what can be expanded in the future. So you don't want to naively satisfy the constraints by, for each capability you need, adding one argument to your API, and so on. You want to design APIs that are modular and hierarchical, so that they have an API surface that is as small as possible. And you want this modular, hierarchical architecture to reflect the way that domain experts think about the problem. Because as a domain expert, when you're reading about a new API — you're reading a tutorial or some docs pages — you already have a way that you're thinking about the problem. You already have certain concepts in mind, and you're thinking about how they relate together. And when you're reading docs, you're trying to build, as quickly as possible, a mapping between the concepts featured in your API and the concepts in your mind. So you're trying to map your mental model as a domain expert to the way things work in the API. So you need an API and an underlying implementation that reflect the way people think about these things.
SPEAKER_01
46:17 - 46:20
So you minimize the time it takes them to do the mapping.
SPEAKER_02
46:20 - 46:40
Yes — you minimize the time, the cognitive load there is in ingesting this new knowledge about your API. An API should not be self-referential or referring to implementation details. It should only be referring to domain-specific concepts that people already understand.
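One way to read the modular-and-hierarchical point is the contrast below: a flat call where every capability becomes one more argument, versus small composable objects that map onto domain concepts (layer, optimizer, loss). This is an illustrative sketch, not a quote of any actual API discussion.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Flat design (hypothetical): each new capability adds another argument,
# so the API surface grows without structure.
def train_network(x, y, n_layers=2, units=64, activation="relu",
                  optimizer="sgd", lr=0.01, momentum=0.9, loss="mse",
                  batch_size=32, epochs=10, dropout=0.0):
    ...  # placeholder body; the point is the sprawling signature

# Modular, hierarchical design: each domain concept is its own small object,
# and the objects compose, so the mental model maps directly onto the code.
model = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(1),
])
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss=keras.losses.MeanSquaredError(),
)
```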
SPEAKER_01
46:40 - 46:47
Brilliant. So what's the future of Keras and TensorFlow look like? What does TensorFlow 3.0 look like?
SPEAKER_02
46:47 - 47:39
So that's kind of too far in the future for me to answer, especially since I'm not even the one making these decisions. But from my perspective, which is just one perspective among many different perspectives on the TensorFlow team, I'm really excited by developing even higher-level APIs — higher-level than Keras. I'm really excited by hyperparameter tuning, by automated machine learning, AutoML. I think the future is not just, you know, defining a model like you assemble Lego blocks and then calling fit on it. It's more like an automagical model that would just look at your data and optimize the objective you're after. So that's what I'm looking into.
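A very minimal sketch of what automated hyperparameter tuning can look like — a simple random search over a made-up search space, with an illustrative build function. Dedicated tools go much further (searching architectures and preprocessing too); this just shows the idea.

```python
import random
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy classification data (illustrative).
x = np.random.rand(500, 10).astype("float32")
y = np.random.randint(0, 2, size=(500,))

def build_model(units, learning_rate):
    model = keras.Sequential([
        layers.Dense(units, activation="relu", input_shape=(10,)),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Random search over a small hyperparameter space; keep the best validation score.
best_acc, best_config = 0.0, None
for _ in range(5):
    config = {"units": random.choice([16, 32, 64, 128]),
              "learning_rate": random.choice([1e-2, 1e-3, 1e-4])}
    model = build_model(**config)
    history = model.fit(x, y, validation_split=0.2, epochs=3, verbose=0)
    acc = max(history.history["val_accuracy"])
    if acc > best_acc:
        best_acc, best_config = acc, config

print(best_config, best_acc)
```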
SPEAKER_01
47:40 - 47:48
Yeah, so you put the baby into a room with the problem and come back a few hours later with the fully solved problem.
SPEAKER_02
47:48 - 47:58
Exactly. It's not like a box of Legos, right? It's more like the combination of a kid that's really good at Legos and a box of Legos, and it's just building the thing on its own.
SPEAKER_01
47:58 - 48:43
Very nice. So that's an exciting future, and I think there's a huge amount of applications and revolutions to be had, under the constraints of the discussion we previously had. But what do you think of the current limits of deep learning, if we look specifically at these function approximators that try to generalize from data? You've talked about local versus extreme generalization. You mentioned that neural networks don't generalize well, humans do, so there's this gap. And you've also mentioned that extreme generalization requires something like reasoning to fill those gaps. So how can we start trying to build systems like that?
SPEAKER_02
48:43 - 51:13
All right, yeah, so this is by design, right? Deep learning models are like huge parametric models, differentiable — so, continuous — that go from an input space to an output space. And they're trained with gradient descent, so they're trained pretty much point by point. They are learning a continuous geometric morphing from an input vector space to an output vector space. And because this is done point by point, a deep neural network can only make sense of points in experience space that are very close to things that it has already seen in the training data. At best, it can do interpolation across points. But that means, in order to train your network, you need a dense sampling of the input-cross-output space — almost a point-by-point sampling — which can be very expensive if you're dealing with complex real-world problems, like autonomous driving, for instance, or robotics. It's doable if you're looking at a subset of the visual space, but even then, it's still fairly expensive — you still need millions of examples. And it's only going to be able to make sense of things that are very close to what it has seen before. And in contrast to that, well, of course, you have human intelligence, but even if you're not looking at human intelligence, you can look at very simple rules, algorithms. If you have a symbolic rule, it can actually apply to a very, very large set of inputs, because it is abstract. It is not obtained by doing a point-by-point mapping. For instance, if you try to learn a sorting algorithm using a deep neural network, well, you're very much limited to learning, point by point, what the sorted representation of each specific list looks like. But instead, you could have a very, very simple sorting algorithm written in a few lines — maybe it's just, you know, two nested loops — and it can process any list at all, because it is abstract, because it is a set of rules. So deep learning is really like point-by-point geometric morphings, trained with gradient descent. Meanwhile, abstract rules can generalize much better. And I think the future needs to combine the two.
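The sorting contrast is easy to make concrete: a few lines of abstract rules handle any list, of any length, with values never seen before, whereas a network trained point by point only covers inputs near its training data. A minimal sketch:

```python
def bubble_sort(items):
    """Two nested loops: an abstract rule that works for any list,
    of any length, with values never seen before."""
    items = list(items)
    n = len(items)
    for i in range(n):
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

print(bubble_sort([3, 1, 2]))            # [1, 2, 3]
print(bubble_sort([10**9, -5, 42, 7]))   # generalizes far outside any "training data"

# A neural network trained to sort fixed-length lists, by contrast, learns a
# point-by-point geometric mapping: it needs dense sampling of the input space
# and degrades on lengths and value ranges it has not seen.
```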
SPEAKER_01
51:13 - 51:37
So how do you think we combine the two? How do we combine good point-by-point functions with programs, which is what symbolic AI type systems are? At which level does the combination happen? And obviously we're jumping into the realm where there are no good answers — it's just kind of ideas and intuitions and so on.
SPEAKER_02
51:37 - 52:48
Well, if you look at the really successful AI systems today, I think they are already hybrid systems that are combining symbolic AI with deep learning. For instance, successful robotics systems are already mostly model-based, rule-based — things like planning algorithms and so on. At the same time, they're using deep learning as perception modules. Sometimes they're using deep learning as a way to inject fuzzy intuition into a rule-based process. If you look at a system like a self-driving car, it's not just one big end-to-end neural network — you know, that wouldn't work at all, precisely because in order to train that, you would need a dense sampling of experience space when it comes to driving, which is completely unrealistic, obviously. Instead, the self-driving car is mostly symbolic — you know, it's software, it's programmed by hand. So it's mostly based on explicit models, in this case mostly 3D models of the environment around the car, but it's interfacing with the real world using deep learning modules, right?
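A schematic of the hybrid pattern described here: a learned perception module produces a compact world state, and handwritten rules do the planning on top of it. Everything below is a stand-in for illustration — not an actual self-driving stack, and the model is faked.

```python
import numpy as np

class FakePerceptionModel:
    """Stand-in for a trained deep learning model (illustrative only)."""
    def predict(self, batch):
        # Pretend output: [estimated lane offset, obstacle probability].
        return np.array([[0.1, 0.05]])

def perceive(camera_frame, model):
    """Deep learning as the perception module: raw pixels in,
    a small symbolic-ish world state out."""
    lane_offset, obstacle_prob = model.predict(camera_frame[None, ...])[0]
    return {"lane_offset": float(lane_offset), "obstacle": obstacle_prob > 0.5}

def plan(world_state):
    """Handwritten, rule-based planning on top of the perceived state."""
    if world_state["obstacle"]:
        return {"throttle": 0.0, "brake": 1.0, "steer": 0.0}
    # Simple proportional steering back toward the lane center.
    return {"throttle": 0.3, "brake": 0.0, "steer": -0.5 * world_state["lane_offset"]}

frame = np.zeros((64, 64, 3), dtype="float32")
print(plan(perceive(frame, FakePerceptionModel())))
```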
SPEAKER_01
52:48 - 53:05
So the deep learning there serves as the way to convert the raw sensory information into something usable by the symbolic systems. Okay, well, let's linger on that a little more. So, dense sampling from input to output — you said it's obviously very difficult.
SPEAKER_02
53:05 - 53:10
Is it possible? In the case of self-driving, you mean? Let's say self-driving.
SPEAKER_01
53:10 - 53:20
Self-driving for many people. Let's not even talk about self-driving. Let's talk about steering. So staying inside the lane.
SPEAKER_02
53:22 - 53:27
Lane following, yeah. That's definitely a problem you can solve with an end-to-end deep learning model. But that's like one small subset.
SPEAKER_01
53:27 - 53:41
Hold on a second here. I don't know how you're jumping from the extremes so easily, because I disagree with you on that. I think — well, it's not obvious to me that you can solve lane following.
SPEAKER_02
53:41 - 54:07
No, it's not obvious. I think it's doable. I think in general, you know, there is no hard limitation to what you can learn with a deep neural network, as long as the search space is rich enough, is flexible enough, and as long as you have this dense sampling of the input-cross-output space. The problem is that this dense sampling could mean anything from 10,000 examples to, like, trillions and trillions.
SPEAKER_01
54:09 - 54:37
So that's my question. What's your intuition? And if you could just give it a chance and think about what kind of problems can be solved by getting huge amounts of data and thereby creating a dense mapping. So let's think about natural language dialogue — the Turing test. Do you think the Turing test can be solved with a neural network alone?
SPEAKER_02
54:38 - 54:59
Well, the Turing test is all about tricking people into believing they're talking to a human. And I don't think that's actually very difficult, because it's more about exploiting human perception and not so much about intelligence. There's a big difference between mimicking intelligent behavior and actual intelligent behavior.
SPEAKER_01
54:59 - 55:31
So, okay, let's look at maybe the Alexa Prize and so on — the different formulations of the natural language conversation that are less about mimicking and more about maintaining a fun conversation that lasts for 20 minutes. That's a little less about mimicking and more about — I mean, it's still mimicking, but it's more about being able to carry forward a conversation with all the tangents that happen in dialogue and so on. Do you think that problem is learnable with this kind of neural network that does the point-to-point mapping?
SPEAKER_02
55:31 - 55:40
So I think it would be very, very challenging to do this with deep learning. I don't think it's out of the question either. I don't rule that out.
SPEAKER_01
55:40 - 55:49
The space of problems that could be solved with a large neural network — what's your sense about the space of those problems? So, useful problems for us.
SPEAKER_02
55:49 - 56:19
In theory, it's infinite. You can solve any problem. In practice, well, deep learning is a great fit for perception problems — in general, any problem which is not really amenable to explicit handcrafted rules, or rules that you can generate by exhaustive search over some program space. So perception, artificial intuition, as long as you have a sufficient amount of training data.
SPEAKER_01
56:20 - 56:46
And that's the question. I mean, perception — there's interpretation and understanding of the scene, which seems to be outside the reach of current perception systems. So do you think larger networks will be able to start to understand the physics of the scene, the three-dimensional structure and relationships of objects in the scene and so on? Or is that where something like symbolic AI has to step in?
SPEAKER_02
56:46 - 57:11
Well, it's always possible to solve these problems with deep learning; it's just extremely inefficient. An explicit, rule-based, abstract model would be a far better, more compressed representation of physics than learning just a mapping between: in this situation, this thing happens; if you change the situation slightly, then this other thing happens; and so on.
SPEAKER_01
57:11 - 57:39
Do you think it's possible to automatically generate the programs that would require that kind of reasoning? Or does it have to — so the way expert systems failed is that so many facts about the world had to be encoded in. Do you think it's possible to learn those logical statements that are true about the world and their relationships? I mean, that's kind of what theorem proving at a basic level is trying to do, right?
SPEAKER_02
57:39 - 59:19
Yeah, except it's much harder to formalize statements about the world compared to formalizing mathematical statements. Statements about the world tend to be subjective. So can you learn rule-based models? Yes, definitely. That's the field of program synthesis. However, today we just don't really know how to do it. So it's very much an open research problem. And so we are limited in what we can do today — if you compare it to deep learning, this is like the 90s. Meaning that we already have existing solutions, we are starting to have some basic understanding of what this is about, but it's still a field that is in its infancy. There are very few people working on it, there are very few real-world applications. So the one real-world application I'm aware of is Flash Fill in Excel. It's a way to automatically learn very simple programs to format cells in an Excel spreadsheet from a few examples — for instance, learning a way to format a date, things like that.
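A toy illustration of the Flash Fill idea: enumerate programs from a tiny domain-specific language and keep one that is consistent with the given input-output examples. The DSL, primitives, and examples below are invented for illustration — real program synthesis systems use far richer languages and much smarter search.

```python
from itertools import product

# A tiny DSL of string transformations (invented for illustration).
PRIMITIVES = {
    "lower": str.lower,
    "upper": str.upper,
    "strip": str.strip,
    "first_word": lambda s: s.split()[0] if s.split() else s,
    "identity": lambda s: s,
}

def synthesize(examples, max_depth=2):
    """Brute-force search over compositions of primitives, returning the first
    program consistent with every (input, output) example."""
    names = list(PRIMITIVES)
    for depth in range(1, max_depth + 1):
        for combo in product(names, repeat=depth):
            def program(s, combo=combo):
                for name in combo:
                    s = PRIMITIVES[name](s)
                return s
            if all(program(i) == o for i, o in examples):
                return combo
    return None

# Two examples are enough to pin down a program in this toy space.
examples = [("  Grace Hopper ", "grace"), ("Alan Turing", "alan")]
print(synthesize(examples))  # e.g. ('lower', 'first_word')
```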
SPEAKER_01
59:19 - 59:48
Oh, that's fascinating. Yeah. You know, okay, that's a fascinating topic. I always wonder, when I provide a few samples to Excel, what it's able to figure out. Like, just giving it a few dates — what are you able to figure out from the pattern I just gave you? That's a fascinating question, and it's fascinating whether that's learnable — patterns like that. And you're saying they're working on that. So how big is the toolbox currently? Are we completely in the dark? You said the 90s in terms of progress on this?
SPEAKER_02
59:48 - 01:00:07
No, so I would say maybe the 90s is even too optimistic, because by the 90s we already understood backprop. We already understood the engine of deep learning, even though we couldn't really see its full potential quite yet. Today, I don't think we have found the engine of program synthesis.
SPEAKER_01
01:00:07 - 01:00:09
So we're in the winter before backprop.
SPEAKER_02
01:00:09 - 01:00:51
Yeah, in a way, yes. So I do believe program synthesis, and in general discrete search over rule-based models, is going to be a cornerstone of AI research in the next century. And that doesn't mean we're going to drop deep learning. Deep learning is immensely useful — like, being able to learn a very flexible, adaptable parametric model, that's actually immensely useful. All it's doing is pattern recognition, but being good at pattern recognition, given lots of data, is just extremely powerful. So we are still going to be working on deep learning, and we're going to be working on program synthesis, and we're going to be combining the two in increasingly automated ways.
SPEAKER_01
01:00:53 - 01:01:52
So let's talk a little bit about data. You've tweeted that about 10,000 deep learning papers have been written about hard-coding priors about a specific task in a neural network architecture, and that it works better than a lack of a prior. Basically, summarizing all these efforts: they put a name to an architecture, but really what they're doing is hard-coding some priors that improve the performance. But to get straight to the point, it's probably true. So you say that you can always buy performance — in quotes, "performance" — by either training on more data, better data, or by injecting task information into the architecture or the pre-processing. However, this is not informative about the generalization power of the techniques used, the fundamental ability to generalize. Do you think we can go far by coming up with better methods for this kind of cheating — for better methods of large-scale annotation of data, so building better priors?
SPEAKER_02
01:01:52 - 01:01:54
If you've made it, it's not cheating anymore.
SPEAKER_01
01:01:54 - 01:02:13
Right. I'm joking about the cheating, but large-scale — so basically, I'm asking about something that hasn't, from my perspective, been researched too much: exponential improvement in annotation of data. Yeah.
SPEAKER_02
01:02:13 - 01:02:37
Do you often think about — I think it's actually being researched quite a bit. You just don't see publications about it, because, you know, people who publish papers are going to publish on known benchmarks; sometimes they're going to introduce a new benchmark. People who actually have real-world, large-scale deep learning problems — they're going to spend a lot of resources on data annotation and on their data annotation pipelines, but you don't see any papers about it. That's interesting.
SPEAKER_01
01:02:37 - 01:02:42
So you think there are certainly resources being poured into it, but do you think there's innovation happening?
SPEAKER_02
01:02:42 - 01:04:34
Oh yeah, that's true. To clarify the point in the tweet: machine learning in general is the science of generalization. You want to generate knowledge that can be reused across different datasets, across different tasks. And if instead you're looking at one dataset, and then you are hard-coding knowledge about this task into your architecture, this is no more useful than training a network and then saying, oh, I found these weight values perform well, right? So David Ha — I don't know if you know David — had a paper the other day about weight-agnostic neural networks. And this is a very interesting paper, because it really illustrates the fact that an architecture, even without weights — an architecture is knowledge about a task. It encodes knowledge. And when it comes to architectures that are handcrafted by researchers, in some cases it is very, very clear that all they are doing is artificially re-encoding the template that corresponds to the proper way to solve the task encoded in a given dataset. I don't know if you've looked at the bAbI dataset, for instance, which is about natural language question answering. It is generated by an algorithm, so these are question-answer pairs generated by an algorithm; the algorithm is following a certain template. It turns out, if you craft a network that literally encodes this template, you can solve this dataset with nearly 100% accuracy. But that doesn't actually tell you anything about how to solve question answering in general, which is the point.
SPEAKER_01
01:04:34 - 01:05:08
The question, just to linger on it, whether it's from the data side or from the size of the network: I don't know if you read the blog post by Rich Sutton, The Bitter Lesson. Yeah. The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective. So as opposed to figuring out methods that can generalize effectively, do you think we can get pretty far by just having something that leverages computation, and then the improvement of computation?
SPEAKER_02
01:05:08 - 01:07:27
Yeah, so I think Rich is making a very good point, which is that a lot of these papers, which are actually all about manually hard-coding prior knowledge about the task into some system, and it doesn't even have to be a deep learning architecture, just some system, right, these papers are not actually making any long-term impact. Instead, what's making really long-term impact is very simple, very general systems that are really agnostic to all these tricks, because these tricks do not generalize. And of course, the one general and simple thing that you should focus on is that which leverages computation, because the availability of large-scale computation has been increasing exponentially, following Moore's law. So if your algorithm is all about exploiting this, then your algorithm is suddenly exponentially improving. So I think Rich is definitely right. However, he is right about the past 70 years; it's an assessment of the past 70 years. I am not sure that this assessment will still hold true for the next 70 years. It might to some extent; I suspect it will not, because the truth of this assessment is a function of the context in which this research took place, and the context is changing. Like, Moore's law might not be applicable anymore in the future, for instance. And I do believe that when you tweak one aspect of a system, when you exploit one aspect of a system, some other aspect starts becoming the bottleneck. Let's say you have unlimited computation. Well, then data is the bottleneck. And I think we're already starting to be in a regime where our systems are so large in scale and so data-hungry that data, the quality of data and the scale of data, is the bottleneck. And in this environment, the bitter lesson from Rich is not going to be true anymore, right? So I think we are going to move from a focus on scale of computation to a focus on data efficiency.
SPEAKER_01
01:07:27 - 01:07:48
Data efficiency. So that's getting to the question of symbolic AI, but to linger on the deep learning approaches: do you have hope for either unsupervised learning or reinforcement learning, which are ways of being more data-efficient in terms of the amount of data they need that requires human annotation?
SPEAKER_02
01:07:48 - 01:08:04
So unsupervised learning and reinforcement learning are frameworks for learning, but they are not like any specific technique. So usually when people say reinforcement learning, what they really mean is deep reinforcement learning, which is one specific approach, which is actually very questionable.
SPEAKER_01
01:08:04 - 01:08:11
The question I was asking was unsupervised learning with deep neural networks and deep reinforcement learning.
SPEAKER_02
01:08:11 - 01:08:54
Well, these are not really data-efficient, because you're still leveraging, you know, this huge parametric model trained point by point with gradient descent. It is more efficient in terms of the number of annotations, the density of annotations you need. The idea being to learn the latent space in which the data is organized and then map the sparse annotations into it. And sure, I mean, that's clearly a very good idea. It's not really a topic I would be working on, but it's clearly a good idea. So it would get us to solve some problems; it will get us incremental improvements in labeled-data efficiency.
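A minimal sketch of the idea just described: learn a latent space from unlabeled data, then map only a small number of labels into it. The dataset, layer sizes, and number of labeled examples are arbitrary choices for illustration, not anything discussed here.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Stand-in dataset: a large pool of unlabeled inputs plus a small labeled subset.
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# 1. Unsupervised stage: an autoencoder learns a latent space from ALL inputs, no labels.
encoder = keras.Sequential([layers.Dense(128, activation="relu"),
                            layers.Dense(32, activation="relu")])
decoder = keras.Sequential([layers.Dense(128, activation="relu"),
                            layers.Dense(784, activation="sigmoid")])
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=5, batch_size=256, verbose=0)

# 2. Supervised stage: map a sparse set of annotations into that latent space.
n_labeled = 1000                      # arbitrary: a small fraction of the dataset
encoder.trainable = False             # keep the learned latent space fixed
classifier = keras.Sequential([encoder, layers.Dense(10, activation="softmax")])
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
classifier.fit(x_train[:n_labeled], y_train[:n_labeled],
               epochs=10, batch_size=64, verbose=0)
```

The point is only that the dense supervision happens in the learned latent space, so far fewer labels are needed than if the classifier were trained from raw pixels alone.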
SPEAKER_01
01:08:54 - 01:09:04
Do you have concerns about short term or long term threats from AI from artificial intelligence?
SPEAKER_02
01:09:04 - 01:09:07
Yes, definitely to some extent.
SPEAKER_01
01:09:07 - 01:09:09
And what's the shape of those concerns?
SPEAKER_02
01:09:09 - 01:11:35
This is actually something I've briefly written about. The capabilities of deep learning technology can be used in many ways that are concerning, from mass surveillance, with things like facial recognition, and in general tracking lots of data about everyone, and then being able to make sense of this data to do identification, to do prediction. That's concerning. That's something that's being very aggressively pursued by totalitarian states like China. One thing I am very much concerned about is that our lives are increasingly online, increasingly digital, made of information, made of information consumption and information production, our digital footprint, I would say. And if you absorb all of this data and you are in control of where you consume information, you know, social networks and so on, recommendation engines, then you can build a sort of reinforcement loop for human behavior. You can observe the state of your mind at time t, you can predict how you would react to different pieces of content, how to get you to move your mind in a certain direction, and then you can feed you the specific piece of content that will move you in a specific direction. And you can do this at scale, at scale in terms of doing it continuously in real time, and you can also do it at scale in terms of scaling this to many, many people, to entire populations. So potentially, artificial intelligence, even in its current state, if you combine it with the internet, with the fact that all of our lives are moving to digital devices and digital information consumption and creation, what you get is the possibility to achieve mass manipulation of behavior and mass psychological control. And this is a very real possibility.
SPEAKER_01
01:11:35 - 01:12:13
Yeah, so you're talking about any kind of recommender system. Let's look at the YouTube algorithm, Facebook, anything that recommends content you should watch next. And it's fascinating to think that there are some aspects of human behavior that you can, you know, pose as a problem of: does this person hold Republican beliefs or Democratic beliefs? And that's a trivial, that's an objective function, and you can optimize, and you can measure, and you can turn everybody into a Republican or everybody into a Democrat. Absolutely.
SPEAKER_02
01:12:13 - 01:15:41
Yeah, I do believe it's true. So the human mind, if you look at the human mind as a kind of computer program, it has a very large exploit surface, right? It has many, many vulnerabilities, ways in which you can control it. For instance, when it comes to your political beliefs, this is very much tied to your identity. So for instance, if I'm in control of your news feed on your favorite social media platform, this is actually where you're getting your news from, and of course I can choose to only show you news that will make you see the world in a specific way. But I can also create incentives for you to post about some political beliefs. And then when I get you to express a statement, if it's a statement that I, as the controller, want to reinforce, I can just show it to people who will agree, and they will like it, and that will reinforce the statement in your mind. If it's a statement I want you to abandon, I can, on the other hand, show it to opponents, right, who will attack you. And because they attack you, at the very least, next time you will think twice about posting it, but maybe you will even stop believing this, because you got pushback. Right. So there are many ways in which social media platforms can potentially control your opinions. And today, all of these things are already being decided by AI algorithms. These algorithms do not have any explicit political goal today. Well, potentially they could, like if some totalitarian government takes over, you know, social media platforms and decides that now we're going to use this not just for mass surveillance but also for mass opinion control and behavior control, very bad things could happen. But what's really fascinating and actually quite concerning is that even without an explicit intent to manipulate, you're already seeing very dangerous dynamics in terms of how these content recommendation algorithms behave. Because right now the goal, the objective function of these algorithms, is to maximize engagement, right? Which seems very innocuous at first. However, it is not, because content that will maximally engage people, get people to react in an emotional way, get people to click on something, is very often content that is not healthy to the public discourse. For instance, fake news is far more likely to get you to click on it than real news, simply because it is not constrained to reality, so it can be as outrageous, as surprising, as good a story as you want, because it's artificial.
SPEAKER_01
01:15:43 - 01:16:47
Right, yeah. To me that's an exciting world, because so much good can come from it. There's an opportunity to educate people, to balance people's worldview with other ideas. So there are so many objective functions. The space of objective functions that create better civilizations is large, arguably infinite. But there's also a large space that creates division and destruction, civil war, a lot of bad stuff. And the worry is, naturally, probably that space is bigger, first of all. And if we don't explicitly think about what kind of effects are going to be observed from different objective functions, then we're going to get into trouble. But the question is, how do we get into rooms and have discussions, inside Google, inside Facebook, inside Twitter, and think about, okay, how can we drive up engagement and at the same time create a good society? Is it even possible to have that kind of philosophical discussion?
SPEAKER_02
01:16:48 - 01:18:58
I think, again, from my perspective, I would feel rather uncomfortable with companies that are in control of these news feed algorithms making explicit decisions to manipulate people's opinions or behaviors, even if the intent is good, because that's a very totalitarian mindset. So instead, what I would like to see, and it's probably never going to happen because it's not very realistic, but that's actually something I really care about, I would like all these algorithms to present configuration settings to their users, so that the users can actually make the decision about how they want to be impacted by these information recommendation, content recommendation algorithms. For instance, as a user of something like YouTube or Twitter, maybe I want to maximize learning about a specific topic. So I want the algorithm to feed my curiosity, which is in itself a very interesting problem. So instead of maximizing my engagement, it will maximize how fast and how much I'm learning, and it will also take into account the accuracy, hopefully, of the information I'm learning. So, yeah, the user should be able to determine exactly how these algorithms are affecting their lives. I don't want any entity making decisions about in which direction they're going to try to manipulate me, right? I want technology... So AI, these algorithms, are increasingly going to be our interface to a world that is increasingly made of information, right? And I want everyone to be in control of this interface, to interface with the world on their own terms. So if someone wants these algorithms to serve their own personal growth goals, it should be possible to configure these algorithms in such a way.
SPEAKER_01
01:18:58 - 01:20:21
Yeah, but so I know it's painful to have explicit decisions, but there are underlying explicit decisions, which is some of the most beautiful fundamental philosophy that we have before us, which is personal growth. If I want to watch videos from which I can learn, what does that mean? So if I have a checkbox that says emphasize learning, there's still an algorithm with explicit decisions in it that would promote learning. What does that mean for me? Like, for example, I watched a documentary on flat earth theory, I guess. I learned a lot. I'm really glad I watched it. A friend recommended it to me, because I don't have such an allergic reaction to crazy people as my fellow colleagues do. But it was very eye-opening, and for others it might not be. For others, they might just get turned off by that, same with the Republican and Democrat worldviews. It's a non-trivial problem. And first of all, if it's done well, I don't think it's something that wouldn't happen, that YouTube wouldn't be promoting, or Twitter wouldn't be. It's just a really difficult problem, how to give people control.
SPEAKER_02
01:20:22 - 01:20:52
Well, it's mostly an interface design problem. Right. The way I see it, you want to create technology that's like a mentor or a coach or an assistant, so that it's not your boss, right? You are in control of it. You are telling it what to do for you. And if you feel like it's manipulating you, it's not actually doing what you want, and you should be able to switch to a different algorithm, right?
SPEAKER_01
01:20:52 - 01:21:43
So that's fine-tuned control. There's a kind of interesting human collaboration there. I mean, that's how I see autonomous vehicles too: giving as much information as possible, and you learn that dance yourself. Yeah, Adobe, I don't know if you use Adobe products like Photoshop. They're trying to see if they can inject YouTube into their interface, basically allowing it to show you all these videos, because everybody's confused about what to do with all the features. So basically teach people by linking to videos; in that way, it's an assistant that uses videos as a basic element of information. Okay, so what, practically, should people do to try to fight against abuses of these algorithms, or algorithms that manipulate us?
SPEAKER_02
01:21:43 - 01:23:32
Honestly, it's a very difficult problem, because to start with, there is very little public awareness of these issues. Very few people think there's anything wrong with their news feed algorithm, even though there is actually something wrong already, which is that it's trying to maximize engagement most of the time, which has very negative side effects. So ideally, the very first thing is to stop trying to purely maximize engagement, stop trying to propagate content based on popularity, and instead take into account the goals and the profiles of each user. One example: for instance, when I look at topic recommendations on Twitter, you know, they have this news tab with topic recommendations, it's always the worst garbage, because it's content that appeals to the smallest common denominator of all Twitter users, because they're purely trying to optimize popularity, purely trying to optimize engagement. But that's not what I want. So they should put me in control of some setting, so that I define what's the objective function that Twitter is going to be following to show me this content. And honestly, this is all about interface design. It's not realistic to give users control of a bunch of knobs that define the algorithm. Instead, we should put them in charge of defining the objective function. Let the user tell us what they want to achieve, how they want this algorithm to impact their lives.
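A hypothetical sketch of what "the user defines the objective function" could look like in code. Every name, signal, and weight here is invented for illustration; it is not any platform's actual API.

```python
from dataclasses import dataclass

@dataclass
class Item:
    predicted_engagement: float   # e.g. click / watch-time model output, in [0, 1]
    predicted_learning: float     # e.g. estimated educational value, in [0, 1]
    predicted_accuracy: float     # e.g. estimated factual reliability, in [0, 1]

def user_objective(item: Item, weights: dict) -> float:
    """Score an item with weights chosen by the user, not by the platform."""
    return (weights["engagement"] * item.predicted_engagement
            + weights["learning"] * item.predicted_learning
            + weights["accuracy"] * item.predicted_accuracy)

# A user who wants to maximize learning rather than engagement:
my_weights = {"engagement": 0.1, "learning": 0.6, "accuracy": 0.3}

candidates = [Item(0.9, 0.1, 0.2), Item(0.4, 0.8, 0.9), Item(0.6, 0.5, 0.7)]
feed = sorted(candidates, key=lambda it: user_objective(it, my_weights), reverse=True)
```

The design choice being illustrated: the platform still supplies the predictive models, but the weighting that turns predictions into a ranking belongs to the user.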
SPEAKER_01
01:23:32 - 01:23:41
So do you think it's that, or do they provide an individual, article-by-article reward structure, where you give a signal: I'm glad I saw this, or I'm glad I didn't?
SPEAKER_02
01:23:41 - 01:24:11
So, like a Spotify-type feedback mechanism. Yeah. It works to some extent. I'm kind of skeptical about it, because the only way the algorithm can work is to attempt to relate your choices with the choices of everyone else, which might, you know... If you have an average profile, that works fine. I'm sure Spotify recommendations work fine if you just like mainstream stuff. If you don't, it can be... it's not optimal at all, actually.
SPEAKER_01
01:24:11 - 01:24:17
It'll be an inefficient search for the part of the Spotify world that represents you.
SPEAKER_02
01:24:17 - 01:24:31
So it's a tough problem, but do note that even a feedback system like what Spotify has does not give me control over what the algorithm is trying to optimize for.
SPEAKER_01
01:24:33 - 01:24:45
Well, public awareness, which is what we're doing now, is a good place to start. Do you have concerns about long-term existential threats of artificial intelligence?
SPEAKER_02
01:24:45 - 01:25:21
Well, as I was saying, our world is increasingly made of information. AI algorithms are increasingly going to be our interface to this world of information, and somebody will be in control of these algorithms. And that puts us in all kinds of potentially bad situations, right? It has risks. It has risks coming from potentially large companies wanting to optimize their own goals, maybe profit, maybe something else. Also from governments, which might want to use these algorithms as a means of control of the population.
SPEAKER_01
01:25:21 - 01:25:24
Do you think there's a substantial threat that could arise from that?
SPEAKER_02
01:25:24 - 01:25:33
So kind of existential threat. So maybe you're referring to the singularity narrative where robots just take over.
SPEAKER_01
01:25:33 - 01:26:00
Well, I don't mean Terminator robots, and I don't believe it has to be a singularity. We're just talking about, just like you said, the algorithm controlling masses of populations. The existential threat being that we would hurt ourselves, much like a nuclear war would hurt ourselves, that kind of thing. I don't think that requires a singularity; it requires a loss of control over the AI algorithms.
SPEAKER_02
01:26:00 - 01:27:49
Yes. So I do agree there are concerning trends. Honestly, I wouldn't want to make any long-term predictions. I don't think today we really have the capability to see what the dangers of AI are going to be in 50 years, in 100 years. I do see that we are already faced with concrete and present dangers surrounding the negative side effects of content recommendation systems, of news feed algorithms, and concerning algorithmic bias as well. We are delegating more and more decision processes to algorithms. Some of these algorithms are handcrafted, some are learned from data, but we are delegating control. Sometimes it's a good thing, sometimes not so much. And there is, in general, very little supervision of this process, right? So we are still in this period of very fast change, even chaos, where society is restructuring itself, turning into an information society, which itself is turning into an increasingly automated information processing society. And, well, yeah, I think the best we can do today is try to raise awareness around some of these issues. And I think we are actually making good progress. If you look at algorithmic bias, for instance, three years ago, even two years ago, very, very few people were talking about it, and now all the big companies are talking about it. Often not in a very serious way, but at least it is part of the public discourse. You see people in Congress talking about it. And it all started from raising awareness.
SPEAKER_01
01:27:49 - 01:28:17
So in terms of the alignment problem, trying to, as we allow algorithms, even just recommender systems on Twitter, encode human values and make moral decisions, decisions that touch on ethics, how hard do you think that problem is? How do we have loss functions in neural networks that have some component, some fuzzy component, of human morals?
SPEAKER_02
01:28:19 - 01:28:57
Well, I think this is really all about objective function engineering, which is probably going to be increasingly a topic of concern in the future. For now, we are just using very naive loss functions, because the hard part is not actually what you're trying to minimize; it's everything else. But as the everything else is going to be increasingly automated, we're going to be focusing our human attention on increasingly high-level components, like what's actually driving the whole learning system, like the objective function. So loss function engineering is going to be... loss function engineer is probably going to be a job title in the future, you know.
SPEAKER_01
01:28:57 - 01:29:09
And then the tooling you're creating with Keras essentially takes care of all the details underneath and basically the human expert is needed for exactly that.
SPEAKER_02
01:29:09 - 01:29:28
Yes, that's the engineer's job. Keras is the interface between the data you're collecting and the business goals. And your job as an engineer is going to be to express your business goals and your understanding of your business or your product, your system, as a kind of loss function or a kind of set of constraints.
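To make "expressing a business goal as a loss function" concrete, here is a minimal, hypothetical Keras sketch. The scenario and the penalty weight are invented for illustration: suppose under-predicting demand costs the business more than over-predicting it, so that asymmetry is written directly into the loss.

```python
import tensorflow as tf
from tensorflow import keras

def asymmetric_business_loss(under_penalty=3.0):
    """Squared error that charges `under_penalty` times more for under-prediction."""
    def loss(y_true, y_pred):
        error = y_true - y_pred
        # error > 0 means the model under-predicted the true demand.
        weight = tf.where(error > 0,
                          under_penalty * tf.ones_like(error),
                          tf.ones_like(error))
        return tf.reduce_mean(weight * tf.square(error))
    return loss

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss=asymmetric_business_loss(under_penalty=3.0))
```

Constraints could be expressed in a similar way, for example as additional penalty terms inside the same loss.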
SPEAKER_01
01:29:28 - 01:29:36
Does the possibility of creating an AGI system excite you or scare you or bore you?
SPEAKER_02
01:29:36 - 01:30:06
So, intelligence can never really be general. At best it can have some degree of generality, like human intelligence. It always has some specialization, in the same way that human intelligence is specialized in a certain category of problems; it is specialized in the human experience. And when people talk about AGI, I'm never quite sure if they're talking about very, very smart AI, so smart that it's even smarter than humans, or they're talking about human-like intelligence, because these are actually different things.
SPEAKER_01
01:30:06 - 01:30:25
Let's say, presumably, I'm impressing you today with my humanness. So imagine that I was in fact a robot. What does that mean? I'm impressing you with natural language processing. Maybe if you weren't able to see me, maybe this was a phone call.
SPEAKER_02
01:30:25 - 01:31:59
Okay, so a companion. So that's very much about building human-like AI. And you're asking me, is this an exciting perspective? Yes, I think so. Not so much because of what artificial human-like intelligence could do, but from an intellectual perspective. I think if you could build truly human-like intelligence, that means you could actually understand human intelligence, which is fascinating. Human-like intelligence is going to require emotions, it's going to require consciousness, which are not things that would normally be required by an intelligent system. If you look at, you know, we were mentioning earlier on, science as a superhuman problem-solving agent or system: it does not have consciousness, it doesn't have emotions. In general, I see consciousness as being on the same spectrum as emotions. It is a component of the subjective experience that is meant very much to guide behavior generation, right? It's meant to guide your behavior. In general, human intelligence and animal intelligence have evolved for the purpose of behavior generation, including in social contexts. So that's why we actually need emotions, that's why we need consciousness. An artificial intelligence system developed in a different context may well never need them, may well never be conscious, like science.
SPEAKER_01
01:32:01 - 01:32:12
On that point, I would argue it's possible to imagine that there are echoes of consciousness in science when viewed as an organism, that science is conscious.
SPEAKER_02
01:32:12 - 01:32:22
So, I mean, how would you go about testing this hypothesis? How do you probe the subjective experience of an abstract system like science?
SPEAKER_01
01:32:24 - 01:32:37
Well, probing any subjective experience is impossible, because I'm not science, I'm Lex. So I can't probe another entity's subjective experience, any more than the bacteria on my skin can probe mine.
SPEAKER_02
01:32:37 - 01:32:44
You're Lex. I can ask you questions about your subjective experience and you can answer me, and that's how I know you're conscious.
SPEAKER_01
01:32:45 - 01:32:52
Yes, but that's because we speak the same language. Perhaps we would have to speak the language of science in order to ask it.
SPEAKER_02
01:32:52 - 01:33:10
I don't think so. Consciousness, just like the emotions of pain and pleasure, is not something that inevitably arises from any sort of sufficiently intelligent information processing. It is a feature of the mind, and if you've not implemented it explicitly, it is not there.
SPEAKER_01
01:33:11 - 01:33:16
So you think it's an emergent feature of a particular architecture?
SPEAKER_02
01:33:16 - 01:34:04
It's a feature of the mind, in a sense. So again, the subjective experience is all about guiding behavior. If the problems you're trying to solve don't really involve an embodied agent, maybe in a social context, generating behavior and pursuing goals like this, and if you look at science, that's not really what's happening, even though it is a form of artificial intelligence, in the sense that it is solving problems, it is accumulating knowledge, accumulating solutions, and so on. So if you're not explicitly implementing a subjective experience, implementing certain emotions and implementing consciousness, it's not going to just spontaneously emerge.
SPEAKER_01
01:34:04 - 01:34:11
Yeah, but for a human-like intelligence system that has consciousness, do you think it needs to have a body?
SPEAKER_02
01:34:13 - 01:34:20
Yes, definitely. I mean, it doesn't have to be a physical body, right? There's not that much difference between a realistic simulation and the real world.
SPEAKER_01
01:34:20 - 01:34:23
So there has to be something you have to preserve kind of thing.
SPEAKER_02
01:34:23 - 01:34:29
Yes, but human-like intelligence can only arise in a human-like context.
SPEAKER_01
01:34:29 - 01:35:13
You need other humans in order for you to demonstrate that you have human-like intelligence, essentially. Yes. So what kind of test and demonstration would be sufficient for you to demonstrate human-like intelligence? Just out of curiosity. You've talked about it in terms of theorem proving and program synthesis, and I think you've written that there are no good benchmarks for this. Yeah, that's one of the problems. So let's talk about program synthesis. What do you imagine is a good, I think they're related questions, for human-like intelligence and for program synthesis, what's a good benchmark for either or both?
SPEAKER_02
01:35:13 - 01:36:51
Right. So, I mean, you're actually asking two questions. One is about quantifying intelligence and comparing the intelligence of an artificial system to the intelligence of a human. And the other is about the degree to which this intelligence is human-like. They're actually two different questions. So you mentioned earlier the Turing test. Well, I actually don't like the Turing test, because it's very lazy. It's all about completely bypassing the problem of defining and measuring intelligence, and instead delegating it to a human judge or a panel of human judges. So it's a total cop-out. If you want to measure how human-like an agent is, I think you have to make it interact with other humans. Maybe it's not necessarily a good idea to have these other humans be the judges; maybe you should just observe its behavior and compare it to what a human would actually have done. When it comes to measuring how smart, how clever an agent is, and comparing that to the degree of human intelligence, well, we're already talking about two things, right? The degree, kind of like the magnitude, of an intelligence, and its direction, right? Like the norm of a vector and its direction. The direction is human-likeness, and the magnitude, the norm, is intelligence. You could call it intelligence.
SPEAKER_01
01:36:51 - 01:36:59
So the direction, the space of directions that are human-like, is very narrow?
SPEAKER_02
01:36:59 - 01:38:30
So, how would you measure the magnitude of intelligence in a system in a way that also enables you to compare it to that of a human? Well, if you look at different benchmarks for intelligence today, they're all too focused on skill at a given task: skill at playing chess, skill at playing Go, skill at playing Dota. And I think that's not the right way to go about it, because you can always beat a human at one specific task. The reason why our skill at playing Go or juggling or anything is impressive is because we're expressing this skill within a certain set of constraints. If you remove the constraints, the constraints that we have one lifetime, that we have this body, and so on; if you remove the context, if you have unlimited training data, if you have, for instance in juggling, no restriction on the hardware, then achieving arbitrary levels of skill is not very interesting and says nothing about the amount of intelligence you've achieved. So if you want to measure intelligence, you need to rigorously define what intelligence is, which in itself, you know, is a very challenging problem. And do you think that's possible? To define intelligence? Yes, absolutely. I mean, many people have provided some definition. I have my own definition.
SPEAKER_01
01:38:30 - 01:38:32
Where does your definition begin if it doesn't end?
SPEAKER_02
01:38:33 - 01:39:49
Well, I think intelligence is essentially the efficiency with which you turn experience into generalizable programs. What that means is: it's the efficiency with which you turn a sampling of experience space into the ability to process a larger chunk of experience space. So measuring skill across many different tasks can be one proxy for measuring intelligence, but if you want to use skill as the measure, you should control for two things: you should control for the amount of experience that your system has, and the priors that your system has. If you look at two agents and you give them the same priors and the same amount of experience, one of the agents is going to learn programs, representations, something, a model, that will perform well on a larger chunk of experience space than the other. And that is the smarter agent.
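A rough way to write down that intuition as a formula. This is only an illustrative paraphrase of what was just said, not a formal definition given in the conversation:

```latex
% Illustrative schematic only: intelligence as the efficiency of converting
% priors plus experience into generalization over unseen experience space.
\[
\text{Intelligence}(A) \;\approx\;
\frac{\text{breadth of experience space $A$ can handle after learning}}
     {\text{priors}(A) \;+\; \text{experience}(A)}
\]
% Holding the denominator fixed (same priors, same amount of experience), the
% agent whose learned programs generalize to the larger chunk of experience
% space is, by this definition, the smarter one.
```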
SPEAKER_01
01:39:49 - 01:39:59
So if you fix the experience, which agent generates better programs, better meaning more generalizable? That's really interesting. That's a very nice, clean definition of...
SPEAKER_02
01:39:59 - 01:40:29
By the way, in this definition it is already very obvious that intelligence has to be specialized, because you're talking about experience space, you're talking about segments of experience space, you're talking about priors, and you're talking about experience. All of these things define the context in which intelligence emerges. And you can never look at the totality of experience space, right? So intelligence has to be specialized.
SPEAKER_01
01:40:29 - 01:40:41
But it can be sufficiently large, the experience space. Even though it's specialized, there's a certain point when the experience space is large enough that it might as well be general. It feels general, it looks general.
SPEAKER_02
01:40:41 - 01:41:19
So, I mean, it's very relative. For instance, many people would say human intelligence is general. In fact, it is quite specialized. You know, we can definitely build systems that start from the same innate priors that humans have at birth, because we already understand fairly well what sort of priors we have as humans. Many people have worked on this problem, most notably Elizabeth Spelke from Harvard, with her work on what she calls core knowledge. It is very much about trying to determine and describe what priors we are born with.
SPEAKER_01
01:41:19 - 01:41:24
Like language skills and stuff.
SPEAKER_02
01:41:24 - 01:42:23
So we have some pretty good understanding of what priors we are born with. And I've actually been working on a benchmark for the past couple of years, you know, on and off. I hope to be able to release it at some point. The idea is to measure the intelligence of systems by controlling for priors, controlling for amount of experience, and by assuming the same priors as what humans are born with, so that you can actually compare these scores with human intelligence; you can actually have humans take the same test in a way that's fair. And, importantly, such a benchmark should be such that any amount of practicing does not increase your score. So, try to picture a game where no matter how much you play this game, that does not change your skill at the game. Can you picture that?
SPEAKER_01
01:42:25 - 01:42:30
As a person who deeply appreciates practice, I cannot, actually.
SPEAKER_02
01:42:30 - 01:43:39
There's actually a very simple trick. In order to come up with a task... so, the only thing you can measure is skill at a task. Yes. All tasks are going to involve priors. The trick is to know what they are and to describe them, and then you make sure that this is the same set of priors as what humans start with. So you create a task that assumes these priors, that exactly documents these priors, so that the priors are made explicit and there are no other priors involved. And then you generate a certain number of samples in experience space for this task. And this, for one task, assuming that the task is new for the agent, that's one test of this definition of intelligence that we set up. And now you can scale that to many different tasks, where each task should be new to the agent, and should be human-interpretable and solvable, so that you can actually have a human take the same test. And then you can compare the score of your machine and the score of your human.
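A toy-scale, hypothetical sketch of the evaluation protocol just described: every task is new to the agent, only documented priors are assumed, practice on one task does not carry over to the next, and a human could take the same test. The task family, the agent, and the scoring here are all invented for illustration; this is not the actual benchmark.

```python
import random

def make_task(rng):
    """One 'task': learn an unknown symbol-substitution rule from a few examples."""
    symbols = list("ABCD")
    rule = dict(zip(symbols, rng.sample(symbols, len(symbols))))  # hidden mapping
    def sample_pair():
        x = "".join(rng.choice(symbols) for _ in range(4))
        return x, "".join(rule[c] for c in x)
    demos = [sample_pair() for _ in range(5)]     # small sample of experience space
    tests = [sample_pair() for _ in range(20)]    # held-out chunk of experience space
    return demos, tests

class InductionAgent:
    """A minimal agent: induce the per-symbol mapping from the demonstrations."""
    def learn(self, demos):
        self.mapping = {}
        for x, y in demos:
            self.mapping.update(zip(x, y))
    def solve(self, x):
        return "".join(self.mapping.get(c, c) for c in x)

rng = random.Random(0)
scores = []
for _ in range(10):                               # many independent, novel tasks
    demos, tests = make_task(rng)
    agent = InductionAgent()                      # fresh agent per task: no practice effect
    agent.learn(demos)
    scores.append(sum(agent.solve(x) == y for x, y in tests) / len(tests))
print("average generalization score:", sum(scores) / len(scores))
```

The key property is in the loop: a fresh agent per novel task, so the score reflects generalization from a few samples rather than accumulated practice.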
SPEAKER_01
01:43:39 - 01:43:46
Which could be, let's say, it could even start as something like MNIST, just as long as you start with the same set of priors.
SPEAKER_02
01:43:46 - 01:44:15
Yeah, so the problem with MNIST is that humans are already trained to recognize digits, right? But let's say we're considering objects that are not digits, some completely arbitrary patterns. Well, humans already come with visual priors about how to process that. So in order to make the game fair, you would have to isolate these priors and describe them, and then express them as computational rules.
SPEAKER_01
01:44:15 - 01:44:33
Having worked a lot with vision science people, that's exceptionally difficult. A lot of progress has been made; there have been a lot of good tests basically reducing all of human vision to some good priors. We're still probably far away from doing that perfectly, but as a start for a benchmark, that's an exciting possibility.
SPEAKER_02
01:44:33 - 01:45:15
Yeah. So Spelke actually lists objectness as one of the core knowledge priors: objectness, core objectness. So we have priors about objectness, about the visual space, about time, about agents, about goal-oriented behavior. We have many different priors. But what's interesting is that, sure, we have this pretty diverse and rich set of priors, but it's also not that diverse, right? We are not born into this world with a ton of knowledge about the world, only with a small set of core knowledge.
SPEAKER_01
01:45:16 - 01:45:45
Yeah, it's hard. Do you have a sense that that set is not that large? But just even the nature of time, which we kind of integrate pretty effectively through all of our perception, all of our reasoning... Do you have a sense of how easy it is to encode those priors? Maybe it requires building a universe, and then the human brain, in order to encode those priors? Or do you have a hope that it
SPEAKER_02
01:45:46 - 01:47:16
can be explicitly listed. Well, you have to keep in mind that any knowledge about the world we are born with is something that had to be encoded into our DNA by evolution at some point, and DNA is a very, very low-bandwidth medium. It's extremely long and expensive to encode anything into DNA, because first of all, you need some sort of evolutionary pressure to guide this writing process, and then, the higher-level the information you're trying to write, the longer it's going to take, and the thing in the environment that you're trying to encode knowledge about has to be stable over this duration. So you can only encode into DNA things that constitute an evolutionary advantage, which is actually a very small subset of all possible knowledge about the world. You can only encode things that are stable, that are true, over very, very long periods of time, typically millions of years. For instance, we might have some visual prior about the shape of snakes, about what makes a face, the difference between a face and a non-face. But consider this interesting question: do we have any innate sense of the visual difference between a male face and a female face? What do you think?
SPEAKER_01
01:47:17 - 01:47:22
For humans? I mean, I would have to look back into evolutionary history, at when the genders emerged.
SPEAKER_02
01:47:22 - 01:47:40
But yeah, I mean, the faces of humans are quite different from the faces of great apes, right? Yeah. Like, you probably couldn't tell the face of a female chimpanzee from the face of a male chimpanzee.
SPEAKER_01
01:47:40 - 01:47:43
Yeah, I don't think most humans could do that.
SPEAKER_02
01:47:43 - 01:48:13
So we do have innate knowledge of what makes a face, but it's actually impossible for us to have any DNA-encoded knowledge of the difference between a female human face and a male human face, because that knowledge, that information, came into the world very recently, given the slowness of the process of encoding knowledge into DNA.
SPEAKER_01
01:48:13 - 01:48:22
Yeah, that's interesting. That's a really powerful argument: DNA is low-bandwidth and it takes a long time to encode, and that naturally creates a very efficient encoding.
SPEAKER_02
01:48:22 - 01:48:54
And one important consequence of this is that, yes, we are born into this world with a bunch of knowledge, sometimes high-level knowledge about the world, like the rough shape of a snake or the rough shape of a face. But importantly, because this knowledge takes so long to write, almost all of this innate knowledge is shared with our cousins, with the great apes. So it is not actually this innate knowledge that makes us special.
SPEAKER_01
01:48:54 - 01:49:06
But to throw it back at you from earlier in our discussion, that encoding might also include the entirety of the environment of Earth.
SPEAKER_02
01:49:07 - 01:49:48
To some extent. It can include things that are important to survival and reproduction, things for which there is some evolutionary pressure, and things that are stable, constant, over very, very long time periods. And, honestly, it's not that much information. Besides the bandwidth constraint and the constraints of the writing process, there are also memory constraints. Like, the part of DNA that deals with the human brain is actually very small. It's, you know, on the order of megabytes, right? There's not that much high-level knowledge about the world you can encode.
SPEAKER_01
01:49:48 - 01:50:01
That's quite brilliant, and hopeful for the benchmark you're referring to, of encoding priors. I actually look forward to it. I'm skeptical you can do it in a couple of years, but hopefully...
SPEAKER_02
01:50:02 - 01:50:10
I've been working on it. Honestly, it's a very simple benchmark, and it's not like a big breakthrough or anything. It's more like a fun side project, right?
SPEAKER_01
01:50:10 - 01:50:23
But so was ImageNet. These fun side projects could launch entire groups of efforts toward creating reasoning systems and so on. And I think, yeah, that's the goal.
SPEAKER_02
01:50:23 - 01:50:34
It's trying to measure strong generalization, to measure the strength of abstraction in our minds, in our minds and in artificial intelligence agents.
SPEAKER_01
01:50:34 - 01:50:51
And if there's anything true about this science organism, it's that its individual cells love competition, and benchmarks encourage competition. So that's an exciting possibility. Do you think an AI winter is coming, and how do we prevent it?
SPEAKER_02
01:50:52 - 01:53:12
Not really. An AI winter is something that would occur when there's a big mismatch between how we are selling the capabilities of AI and the actual capabilities of AI. And today, deep learning is creating a lot of value, and it will keep creating a lot of value, in the sense that these models are applicable to a very wide range of problems that are relevant today, and we are only just getting started with applying these algorithms to every problem they could be solving. So deep learning will keep creating a lot of value for the time being. What's concerning, however, is that there's a lot of hype around deep learning and around AI. There are lots of people overselling the capabilities of these systems, not just the capabilities, but also overselling the fact that they might be more or less, you know, brain-like, giving a kind of mystical aspect to these technologies, and also overselling the pace of progress. Which, you know, might look fast in the sense that we have this exponentially increasing number of papers, but again, that's just a simple consequence of the fact that we have ever more people coming into the field. It doesn't mean the progress is actually exponentially fast. Let's say you're trying to raise money for your startup or your research lab. You might want to tell, you know, a grander story to investors about how deep learning is just like the brain, and how it can solve all these incredible problems like self-driving and robotics and so on, and maybe you can tell them that the field is progressing so fast and we are going to have AGI within 15 years or even 10 years, and none of this is true. And every time you say these things, and an investor or a decision-maker believes them, well, this is like the equivalent of taking on credit card debt, but for trust, right? And maybe this will be what enables you to raise a lot of money, but ultimately you are creating damage, you are damaging the field.
SPEAKER_01
01:53:12 - 01:53:28
That's the concern, and that's what happened with the other AI winters. You actually tweeted about this with autonomous vehicles, right? Almost every single company has now promised that they will have full autonomous vehicles by 2021, 2020. That's a good example of
SPEAKER_02
01:53:31 - 01:53:36
the consequences of over-hyping the capabilities of AI and the pace of progress.
SPEAKER_01
01:53:36 - 01:54:07
Because I work a lot, especially recently, in this area, I have a deep concern about what happens when all these companies, after they've invested billions, have a meeting and say: first of all, do we actually have an autonomous vehicle? The answer will definitely be no. And second: wait a minute, we've invested one, two, three, four billion dollars into this, and we've made no profit. And the reaction to that may be to go very hard in another direction. That might impact even other industries.
SPEAKER_02
01:54:07 - 01:55:44
And that's what we call an AI winter: when there is backlash, where no one believes any of these promises anymore, because they turned out to be big lies the first time around. And this will definitely happen to some extent for autonomous vehicles, because the public and decision-makers have been convinced, around 2015, by the people who were trying to raise money for their startups and so on, that L5 driving was coming, maybe in 2016, maybe 2017, 2018. Now in 2019 we're still waiting for it. So I don't believe we're going to have a full-on AI winter, because we have these technologies that are producing a tremendous amount of real value. But there is also too much hype, so there will be some backlash, especially because, you know, some startups are trying to sell the dream of AGI, and the fact that AGI is going to create infinite value. Like, AGI is a free lunch: if you can develop an AI system that passes a certain threshold of IQ or something, then suddenly you have infinite value. And, well, there are actually lots of investors buying into this idea, and, you know, they will wait maybe 10, 15 years, and nothing will happen. And the next time around, well, maybe there will be a new generation of investors; no one will care. You know, human memory is very short after all.
SPEAKER_01
01:55:44 - 01:56:06
I don't know about you, but because I've spoken about AGI sometimes, poetically, I get a lot of emails from people. It's usually a large manifesto where they say to me that they have created an AGI system, or they know how to do it, and there's a long write-up of how to do it.
SPEAKER_02
01:56:06 - 01:56:07
I get those emails as well.
SPEAKER_01
01:56:07 - 01:56:15
They feel a little bit like they're generated by an AI system, actually, but there's usually no doubt.
SPEAKER_02
01:56:15 - 01:56:23
Maybe that's recursive self-improvement right there. Yeah, exactly. You have a Transformer generating crank papers.
SPEAKER_01
01:56:23 - 01:57:13
So the question is, because you have a good radar for crank papers, how do we know they're not onto something? When you start to talk about AGI, or anything like reasoning benchmarks and so on, something that doesn't have a benchmark, it's really difficult to know. I mean, I talk to Jeff Hawkins, who's really looking at neuroscience approaches, and there are echoes of really interesting ideas, at least in Jeff's case, that he's showing. How do you usually think about this, about preventing yourself from being too narrow-minded and elitist about deep learning? It has to work on these particular benchmarks, otherwise it's trash.
SPEAKER_02
01:57:13 - 01:57:37
The thing is, intelligence does not exist in the abstract. Intelligence has to be applied. So if you don't have a benchmark, an improvement on some benchmark, maybe it's a new benchmark, maybe it's not something we've been looking at before, but you do need a problem that you're trying to solve. You're not going to come up with a solution without a problem.
SPEAKER_01
01:57:38 - 01:57:48
So for general intelligence... I mean, you've clearly highlighted generalization. If you want to claim that you have an intelligent system, it should come with a benchmark.
SPEAKER_02
01:57:48 - 01:58:54
Yes, it should display capabilities of some kind. It should show that it can create some form of value, even if it's a very artificial form of value. And that's also the reason why you don't actually need to care about telling apart which papers have hidden potential and which do not. Because if there is a new technique that's actually creating value, this is going to be brought to light very quickly, because it's actually making a difference. So that's the difference between something that is ineffective and something that is actually useful. And ultimately, usefulness is our guide, not just in this field, but if you look at science in general. Maybe there are many, many people over the years who have had some really interesting theories of everything, but they were just completely useless. And you don't actually need to tell the interesting theories from the useless theories. All you need is to see, you know, is this actually having an effect on something else? Is this actually being used? Is this making an impact or not?
SPEAKER_01
01:58:54 - 01:59:00
That's beautifully put. I mean, the same applies to quantum mechanics, to string theory, to the holographic principle.
SPEAKER_02
01:59:00 - 01:59:19
We are doing deep learning because it works. You know, before it started working, people considered those working on neural networks as cranks, pretty much; no one was working on this anymore. And now it's working, which is what makes it valuable. It's not about being right; it's about being effective.
SPEAKER_01
01:59:19 - 01:59:34
And nevertheless, the individual entities of this scientific mechanism, just like Yoshua Bengio or Yann LeCun, while being called cranks, stuck with it, right? Yeah. And so, as individual agents, even if everyone's laughing at us, we should just stick with it.
SPEAKER_02
01:59:36 - 01:59:40
If you believe you have something, you should stick with it and see if it's true.
SPEAKER_01
01:59:40 - 01:59:45
That's a beautiful, inspirational message to end on. Francois, thank you so much for talking today. That was amazing.
SPEAKER_02
01:59:45 - 01:59:46
Thank you.