Transcript for #73 – Andrew Ng: Deep Learning, Education, and Real-World AI
SPEAKER_00
00:00 - 02:59
The following is a conversation with Andrew Ng, one of the most impactful educators, researchers, innovators, and leaders in the artificial intelligence and technology space in general. He co-founded Coursera and Google Brain, launched deeplearning.ai, Landing AI, and the AI Fund, and was the chief scientist at Baidu. As a Stanford professor, and with Coursera and deeplearning.ai, he has helped educate and inspire millions of students, including me. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at lexfridman, spelled F-R-I-D-M-A-N. As usual, I'll do one or two minutes of ads now and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience. This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Brokerage services are provided by Cash App Investing, a subsidiary of Square and member SIPC. Since Cash App allows you to buy Bitcoin, let me mention that cryptocurrency, in the context of the history of money, is fascinating. I recommend The Ascent of Money as a great book on this history. Debits and credits on ledgers started over 30,000 years ago. The US dollar was created over 200 years ago. And Bitcoin, the first decentralized cryptocurrency, was released just over 10 years ago. So given that history, cryptocurrency is still very much in its early days of development, but it's still aiming to, and just might, redefine the nature of money. So again, if you get Cash App from the App Store or Google Play and use the code LEXPODCAST, you'll get $10, and Cash App will also donate $10 to FIRST, one of my favorite organizations that is helping to advance robotics and STEM education for young people around the world.
And now, here's my conversation with Andrew Ng. The courses you taught on machine learning at Stanford, and later on Coursera, which you co-founded, have educated and inspired millions of people. So let me ask you, what people or ideas inspired you to get into computer science and machine learning when you were young? When did you first fall in love with the field, is another way to put it.
SPEAKER_01
03:01 - 04:42
Growing up in Hong Kong and Singapore, I started learning to code when I was five or six years old. At that time, I was learning the BASIC programming language, and I would take these books, and they would tell you, type this program into your computer. So I'd type that program into my computer, and as a result of all that typing, I would get to play these very simple shoot-'em-up games that I had implemented on my little computer. So I thought it was fascinating as a young kid that I could write this code, though it was really just copying code from a book into my computer, to then play these cool little video games. Another moment for me was when I was a teenager and my father, who's a doctor, was reading about expert systems and about neural networks. So he got me some of these books, and I thought it was really cool that you could write a computer that started to exhibit intelligence. Then I remember doing an internship while I was in high school, this was in Singapore, where I did a lot of photocopying as an office assistant. The highlight of my job was when I got to use the shredder. So the teenage me remembers thinking, boy, this is a lot of photocopying. If only I could write software, build a robot, something to automate this, maybe I could do something else. So I think a lot of my work since then has centered on the theme of automation. Even the way I think about machine learning today, we're very good at writing learning algorithms that can automate things that people can do. Or even launching the first MOOCs, massive open online courses, that later led to Coursera, I was trying to automate what could be automatable in how I was teaching on campus.
SPEAKER_00
04:42 - 04:52
The process of education, you tried to automate parts of that, to have more impact from a single teacher, a single educator.
SPEAKER_01
04:52 - 05:19
Yeah, I felt, you know, I was teaching machine learning to about 400 students a year at the time, and I found myself filming the exact same video every year, telling the same jokes in the same room. And I thought, why am I doing this? Why don't we just take last year's video? Then I can spend my time building a deeper relationship with students. So that process of thinking through how to do that, that led to the first MOOCs that we launched.
SPEAKER_00
05:19 - 05:30
And then you have more time to write new jokes. Are there favorite memories from your early days at Stanford, teaching thousands of people in person, and then millions of people online?
SPEAKER_01
05:31 - 06:45
You know, teaching online, what not many people know was that a lot of those videos were shot between the hours of 10 p.m. and 3 a.m. Launching the first MOOC at Stanford, we had announced the course, and about a hundred thousand people had signed up. We had just started to write the code and we had not yet actually filmed the videos. So we had a lot of pressure, a hundred thousand people waiting for us to produce the content. So many Fridays and Saturdays, I would go out, have dinner with my friends, and then I would think, okay, do you want to go home now, or do you want to go to the office to film videos? And the thought of being able to help a hundred thousand people potentially learn machine learning, fortunately, that made me think, okay, I'm going to go to my office, go to my tiny recording studio. I would adjust my Logitech webcam, adjust my Wacom tablet, make sure my little mic was on, and then I would start recording, often until 2 a.m. I think I'm fortunate that it doesn't show that it was recorded that late at night, but it was really inspiring, the thought that we could create content to help so many people learn about machine learning.
SPEAKER_00
06:45 - 07:10
How did that feel? The fact that you're probably somewhat alone, maybe a couple of friends, recording with a Logitech webcam, and kind of going home alone at one or two a.m. at night, and knowing that that's going to reach sort of thousands of people, eventually millions of people. What's that feeling like? I mean, is there a feeling of just satisfaction of pushing through?
SPEAKER_01
07:10 - 08:03
I think it's humbling, and I wasn't thinking about what I was feeling. I think one thing I'm proud to say we got right from the early days was, I told my whole team back then that the number one priority is to do what's best for learners, do what's best for students. And so when I went to the recording studio, the only thing on my mind was, what can I say, how can I design my slides, what do I need to draw, to make these concepts as clear as possible for learners. You know, I've seen that sometimes for instructors it's tempting to say, hey, let's talk about my work. Maybe if I teach you about my research, someone will cite my papers a couple more times. And I think one thing we got right, launching the first few MOOCs and later building Coursera, was putting in place that bedrock principle of let's just do what's best for learners and figure out everything else later. And I think having that as a guiding principle turned out to be really important to the rise of the MOOC movement.
SPEAKER_00
08:04 - 08:17
And the kind of learner you were imagining in your mind is as broad as possible, as global as possible. So you really tried to reach as many people interested in machine learning and AI as possible.
SPEAKER_01
08:17 - 08:48
I really wanted to help anyone that had an interest in machine learning to break into the field. And I think sometimes people ask me, hey, why are you spending so much time explaining gradient descent? And my answer was, if I look at what I think learners need and would benefit from, I felt that having that good understanding of the foundations, coming back to the basics, would put them in better stead to then build a long-term career. So I tried to consistently make decisions on that principle.
SPEAKER_00
08:48 - 09:37
One of the things you actually revealed to the narrow AI community at the time, and to the world, is that the amount of people who are actually interested in AI is much larger than we imagined. By you teaching the class and how popular it became, it showed that, wow, this isn't just a small community of people who go to NeurIPS. It's much bigger. It's developers, it's people from all over the world. I mean, I'm Russian, so everybody in Russia is really interested. There's a huge number of programmers who are interested in machine learning, in India, China, South America, everywhere. There are just millions of people who are interested in machine learning. Did that help you get a sense of just how many people are interested, from your perspective?
SPEAKER_01
09:37 - 11:21
I think the numbers grew over time. A lot of things that maybe feel like they came out of nowhere, the inside story is that building them took years. It's one of those overnight successes that took years to get there. My first foray into this type of online education was when we were filming my Stanford class and sticking the videos on YouTube, and then we uploaded the homework and so on. But, you know, it was basically the hour-and-fifteen-minute videos that we put on YouTube. And then we had four or five other versions of websites that I had built, most of which you would never have heard of because they reached small audiences, but that allowed me to iterate, allowed my team and me to iterate, to learn what ideas work and what doesn't. For example, one of the features I was really excited about and really proud of was, I built this website where multiple people could be logged into the website at the same time. Today, if you go to a website, you know, if you're logged in and then I want to log in, you need to log out if it's the same browser, the same computer. But I thought, well, what if two people, say you and me, were watching a video together in front of a computer? What if a website could have you type your name and password, have me type my name and password, and then the computer knows both of us are watching together, and it gives both of us credit for anything we do as a group? Built this feature, rolled it out in a school in San Francisco. We had about twenty-something users. The teacher there at Sacred Heart Cathedral Prep was just great. And guess what? Zero people used this feature. It turns out people studying online want to watch the videos by themselves, so they can play back and pause at their own speed, rather than in groups. So that was one example of a tiny lesson learned, one of the many that allowed us to hone in on the set of features that work.
SPEAKER_00
11:22 - 11:49
And it sounds like a brilliant feature. So I guess the lesson to take from that is that there can be something that looks amazing on paper, and then nobody uses it; it doesn't actually have the impact that you think it might have. So, yeah, I see that you really went through a lot of different features and a lot of ideas to arrive at the final, at Coursera, the final kind of powerful thing that showed the world that MOOCs can educate millions.
SPEAKER_01
11:49 - 12:59
And I think with the rise of the machine learning movement as well, it didn't come out of nowhere. Instead, what happened was, as more people learned about machine learning, they would tell their friends, and their friends would see how it's applicable to their own work, and then the community kept on growing. And it's still growing. You know, I don't know what percentage of all developers will be AI developers in the future. I could easily see it being more than 50 percent, right? Because AI developers, broadly construed, are not just the people doing the machine learning modeling, but also the infrastructure, the pipelines, all the software surrounding the core machine learning model. Maybe it's even bigger. I feel like today almost every software engineer has some understanding of the cloud. Not all; maybe, you know, a microcontroller developer doesn't need to use the cloud. But I feel like the vast majority of software engineers today are at least touching the cloud. I think in the future, maybe we'll approach nearly a hundred percent of all developers being, you know, in some way, an AI developer, or at least having an appreciation of machine learning.
SPEAKER_00
12:59 - 13:44
And my hope is that there's this kind of effect where there are people who are not really interested in software engineering, like biologists, chemists, and physicists, even mechanical engineers, all these disciplines that are now more and more sitting on large datasets. They didn't think they were interested in programming until they have this dataset and they realize there's this set of machine learning tools that allow them to use the dataset. So they actually learn to program, and they become new programmers. So, not just, as you mentioned, a larger percentage of developers becoming machine learning people; it seems like more and more, the kinds of people who are becoming developers is also growing significantly.
SPEAKER_01
13:44 - 14:55
Yeah, I think once upon a time, only a small part of humanity was literate, you know, could read and write. And maybe you thought, maybe not everyone needs to learn to read and write. You know, you just go listen to a few monks read to you, and maybe that was enough. Or maybe we just need a handful of authors to write the bestsellers, and then no one else needs to write. But what we found was that by giving as many people, you know, in some countries almost everyone, basic literacy, it dramatically enhanced human-to-human communication. And we can now write for an audience of one, such as if I send you an email or you send me an email. I think in computing, we're still in that phase where so few people know how to code that the coders mostly have to code for relatively large audiences. But if everyone, well, most people, became developers at some level, similar to how most people in developed economies are somewhat literate, I would love to see the owners of a mom-and-pop store be able to write a little bit of code to customize the TV display for their special this week. I think it would enhance human-to-computer communication, which is becoming more and more important today as well.
SPEAKER_00
14:56 - 15:12
So you think it's possible that machine learning becomes kind of similar to literacy, where, yeah, like you said, the owners of the mom-and-pop shop, so basically everybody in all walks of life, would have some degree of programming capability?
SPEAKER_01
15:13 - 16:20
I could see society getting there. There's one interesting thing. You know, if I go talk to the mom-and-pop store, if I talk to a lot of people in their daily professions, I previously didn't have a good story for why they should learn to code. You know, we could give them some reasons. But what I found with the rise of machine learning and data science is that I think the number of people with a concrete use for data science in their daily lives, in their jobs, may be even larger than the number of people with a concrete use for software engineering. For example, if you run a small mom-and-pop store, I think if you can analyze the data about your sales, your customers, there's actually real value there, maybe even more than traditional software engineering. So I find that for a lot of my friends in various professions, be it recruiters or accountants or people that work in factories, which I deal with more and more these days, I feel like if they were data scientists at some level, they could immediately use that in their work. So I think that data science and machine learning may be an even easier entry into the developer world for a lot of people than software engineering.
SPEAKER_00
16:20 - 16:44
That's interesting, and I agree with that. That's beautifully put. We live in a world where most courses and talks have slides, PowerPoint, Keynote, and yet you famously often still use a marker and a whiteboard. The simplicity of that is compelling and, for me at least, fun to watch. So let me ask, why do you like using a marker and whiteboard, even on the biggest of stages?
SPEAKER_01
16:46 - 17:20
I think it depends on the concepts you want to explain. For mathematical concepts, it's nice to build up the equation one piece at a time, and the whiteboard marker, or the pen and stylus, is a very easy way to build up an equation, to build up a complex concept one piece at a time while you're talking about it. And sometimes that enhances understandability. The downside of writing is that it's slow, and so if you want a long sentence, it's very hard to write that. So I think there are pros and cons, and sometimes I use slides, and sometimes I use a whiteboard or a stylus.
SPEAKER_00
17:20 - 18:35
The slowness of a whiteboard is also its upside, because it forces you to reduce everything to the basics. Some of your talks involve the whiteboard, and it's remarkable how you go very slowly and you really focus on the most simple principles. That enforces a kind of minimalism of ideas that, surprisingly to me at least, is great for education. Like, a great talk, I think, is not one that has a lot of content. A great talk is one that just clearly says a few simple ideas. And I think the whiteboard somehow enforces that. Pieter Abbeel, who's now one of the top roboticists and reinforcement learning experts in the world, was your first PhD student. I bring him up just because I imagine that must have been an interesting time in your life. Do you have any favorite memories of working with Pieter, your first student, in those uncertain times, especially before deep learning really sort of blew up? Any favorite memories from those times?
SPEAKER_01
18:35 - 19:17
Yeah, I was really fortunate to have had Pieter Abbeel as my first PhD student, and I think even my long-term professional success builds on early foundations, or early work, that Pieter was so critical to, so I'm really grateful to him for working with me. You know, what not a lot of people know is just how hard research was. Pieter's PhD thesis was using reinforcement learning to fly helicopters. And so, you know, even today the website heli.stanford.edu, H-E-L-I dot stanford dot edu, is still up, and you can watch videos of us using reinforcement learning to make a helicopter fly upside down, fly loops, rolls. It's cool.
SPEAKER_00
19:17 - 19:29
It's one of the most incredible robotics videos ever, so please do watch it. Oh, yeah. Thanks. It's inspiring. That's from, like, 2008, or '07 or '06, that range.
SPEAKER_01
19:29 - 19:30
So it was over 10 years old.
SPEAKER_00
19:30 - 19:32
That was really inspiring to a lot of people, yeah.
SPEAKER_01
19:32 - 21:41
And what not many people see is how hard it was. So Pieter and Adam Coates and Morgan Quigley and I were working on various versions of the helicopter, and a lot of things did not work. For example, one of the hardest problems we had was, when the helicopter is flying around upside down, doing stunts, how do you figure out the position, how do you localize the helicopter? So we tried all sorts of things. Having one GPS unit doesn't work, because when you fly upside down, the GPS unit is facing down, so you can't see the satellites. So we experimented with having two GPS units, one facing up, one facing down, so if you flip over, that didn't work, because the downward-facing one couldn't synchronize if you're flipping quickly. Morgan Quigley was exploring this crazy, complicated configuration of specialized hardware to interpret GPS signals, looking at FPGAs, completely insane. Spent about a year working on that. Didn't work. So I remember Pieter, great guy, him and me sitting down in my office, looking at some of the latest things we had tried that didn't work, and saying, you know, darn it, what now? Because we had tried so many things and it just didn't work. In the end, what we did, and Adam Coates was crucial to this, was put cameras on the ground, and use cameras on the ground to localize the helicopter. And that solved the localization problem, so that we could then focus on the reinforcement learning, and invent new reinforcement learning techniques, to actually make the helicopter fly. And, you know, in hindsight, when we were doing this work at Stanford, around that time there were a lot of reinforcement learning theoretical papers, but not a lot of practical applications. So the autonomous helicopter work, flying helicopters, was one of the few practical applications of reinforcement learning at the time, which caused it to become pretty well known.
I feel like we might have almost come full circle today. There's so much hype, so much excitement about reinforcement learning, but again, we're hunting for more applications of all of these great ideas that the community has come up with.
SPEAKER_00
21:42 - 22:10
What was the drive, in the face of the fact that most people were doing theoretical work? What motivated you, in the uncertainty and the challenges, to do the applied work, to get the helicopter, the actual system, to work? Yeah, in the face of fear, uncertainty, the setbacks that you mentioned for localization. You like stuff that works in the physical world. It's back to the shredder.
SPEAKER_01
22:10 - 23:32
You know, I like theory, but when I work on theory myself, and this is personal taste, I'm not saying anyone else should do what I do, but when I work on theory, I personally enjoy it more if I feel that the work I do will influence people, have positive impact, will help someone. I remember, many years ago, I was speaking with a mathematics professor, and I kind of just asked, hey, why do you do what you do? And he had stars in his eyes when he answered. This mathematician, not from Stanford, a different university, he said, I do what I do because it helps me to discover truth and beauty in the universe. He had stars in his eyes when he said that. And I thought, that's great. I don't want to do that. I think it's great that someone does that; I fully support the people that do it, a lot of respect for them. But I am more motivated when I can see a line to how the work that my teams and I are doing helps people. The world needs all sorts of people; I'm just one type. I don't think everyone should do things the same way as I do. But when I delve into either theory or practice, if I persist and have the conviction that here's a path for it to help people, I find that more satisfying, to have that conviction.
SPEAKER_00
23:32 - 23:49
That's your path. You were a proponent of deep learning before it gained widespread acceptance. What did you see in this field that gave you confidence? What was your thinking process like in that first decade of the, I don't know what that's called, the 2000s, the aughts?
SPEAKER_01
23:51 - 25:58
Yeah, I can tell you the thing we got wrong and the thing we got right. The thing we really got wrong was the early importance of unsupervised learning. So, in the early days of Google Brain, we put a lot of effort into unsupervised learning rather than supervised learning. And there was this argument, I think it was around 2005, after NeurIPS, at that time called NIPS, but now NeurIPS, had ended. Geoff Hinton and I were sitting in the cafeteria outside, you know, the conference. We had lunch and were chatting, and Geoff pulled out a napkin and started sketching this argument on the napkin. It was very compelling; I'll repeat it. The human brain has about a hundred trillion, so that's 10 to the 14, synaptic connections. You will live for about 10 to the 9 seconds, that's 30 years. You actually live for 2 times 10 to the 9, maybe 3 times 10 to the 9, seconds, but let's say 10 to the 9. So if each synaptic connection, each weight in your brain's neural network, is just a one-bit parameter, that's 10 to the 14 bits you need to learn in up to 10 to the 9 seconds of your life. So, via this simple argument, which has a lot of problems, it's very simplified, that's 10 to the 5 bits per second you need to learn in your life. And I have a one-year-old daughter. I am not pointing out 10 to the 5 bits per second of labels to her. And I think I'm a very loving parent, but I'm just not going to do that. So, from this very crude, definitely problematic argument, there's just no way that most of what we know comes through supervised learning. Where you can get so many bits of information is from sucking in images, audio, experiences in the world. And so that argument, and there are a lot of known flaws with this argument, really convinced me that there's a lot of power to unsupervised learning.
I still think unsupervised learning is really important, but in the early days, you know, 10, 15 years ago, a lot of us thought that was the path forward.
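The napkin argument Ng recounts is simple enough to check as a back-of-the-envelope calculation. Here is a minimal sketch, using only the rough orders of magnitude quoted in the conversation, not measured values:

```python
# A sketch of the napkin argument using the rough orders of magnitude
# from the conversation (not measured neuroscience values).

synapses = 1e14          # ~10^14 synaptic connections in the human brain
lifetime_seconds = 1e9   # ~10^9 seconds is roughly 30 years
bits_per_synapse = 1     # assume each connection stores just one bit

total_bits = synapses * bits_per_synapse          # bits to be "learned"
bits_per_second = total_bits / lifetime_seconds   # implied learning rate

print(f"{bits_per_second:.0e} bits per second")   # -> 1e+05 bits per second
```

The conclusion hinges on the gap between that implied rate of roughly 10^5 bits per second and the far smaller amount of labeled data a person could plausibly receive, which is what pointed toward unsupervised learning.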
SPEAKER_00
25:58 - 26:03
I see. So you're saying that perhaps was the wrong intuition for the time.
SPEAKER_01
26:03 - 27:01
For the time, that was the part we got wrong. The part we got right was the importance of scale. So Adam Coates, another wonderful person, I was fortunate to work with him. He was in my group at Stanford at the time, and Adam had run these experiments at Stanford showing that the bigger we trained a learning algorithm, the better its performance. It was a graph that Adam generated, you know, x-axis, y-axis, lines going up and to the right: the bigger you make this thing, the better its performance accuracy. And it was really based on that chart that Adam generated that gave me the conviction that if we could scale these models way bigger than what we could on the few CPUs we had at Stanford, we could get even better results. It was really based on that one figure that Adam generated that gave me the conviction to approach Sebastian Thrun to pitch, you know, starting a project at Google, which became the Google Brain project.
SPEAKER_00
27:01 - 27:23
You co-founded Google Brain. And there, the intuition was that scale will bring performance for the system, so we should chase larger and larger scale. And I think people don't realize how groundbreaking of an idea that is. It's simple, but it's a groundbreaking idea, that bigger datasets will result in better performance.
SPEAKER_01
27:23 - 28:00
It was controversial at the time. Some of my well-meaning friends, you know, senior people in the machine learning community, I won't name names, but people, some of whom we know, my well-meaning friends came and were trying to give me friendly advice, like, hey, Andrew, why are you doing this? This is crazy. The innovation is in the neural network architecture; look at these architectures people are building. You're just going to go for scale? That's a bad career move. So my well-meaning friends, some of them, were trying to talk me out of it. But I find that if you want to make a breakthrough, you sometimes have to have conviction and do something before it's popular, since that lets you have a bigger impact.
SPEAKER_00
28:00 - 28:54
Let me ask you, just a small tangent on that topic. I find myself arguing with people, saying that greater scale, especially in the context of active learning, so very carefully selecting the dataset, but growing the scale of the dataset, is going to lead to even further breakthroughs in deep learning. And there's currently pushback against that idea, that larger datasets are no longer enough, that you want to increase the efficiency of learning, you want to make better learning mechanisms. And I personally believe that just bigger datasets, with the same learning methods we have now, will still result in better performance. What's your intuition, at this time, on those dueling sides? Do we need to come up with better architectures for learning, or can we just get bigger datasets that will improve performance?
SPEAKER_01
28:55 - 29:43
I think both are important, and it's also problem-dependent. So for a few datasets, we may be approaching, you know, the Bayes error rate, or approaching or surpassing human-level performance, and then there's that theoretical ceiling that we will never surpass, the Bayes error rate. But then I think there are plenty of problems where we're still quite far from either human-level performance or from the Bayes error rate, and bigger datasets, with neural networks, without further algorithmic innovation, will be sufficient to take us further. But on the flip side, if we look at the recent breakthroughs using, you know, transformer networks and language models, it was a combination of novel architecture, but also scale had a lot to do with it. If we look at what happened with GPT-2 and BERT, I think scale was a large part of the story.
SPEAKER_00
29:43 - 30:03
Yeah, what's not often talked about is the scale of the dataset it was trained on, and the quality of that dataset, because there was some, it was like Reddit threads that were upvoted highly. So there was already some weak supervision on a very large dataset, that people don't often talk about, right?
SPEAKER_01
30:04 - 30:36
I find that today we have mature processes for managing code, things like Git, version control. It took a long time to evolve those good processes. I remember when my friends and I were emailing each other C++ files in email. But then we had CVS, then Subversion, then Git, and maybe something else in the future. We're much less mature in terms of tools for managing data, thinking about how to clean data, how to deal with very hot, messy data problems. I think there's a lot of innovation there still to be had.
SPEAKER_00
30:36 - 30:39
I love the idea that you were versioning through email.
SPEAKER_01
30:39 - 32:29
I'll give you one example. When we work with manufacturing companies, it's not at all uncommon for there to be multiple labels that disagree with each other, right? So we were doing work on visual inspection. We'll take, say, a plastic part and show it to one inspector, and the inspector, sometimes very opinionated, goes, clearly that's a defect, this scratch, unacceptable, let's reject this part. Take the same part to a different inspector, different, very opinionated: clearly the scratch is small, it's fine, don't throw it away, you're going to make us lose money. And then sometimes you take the same plastic part, show it to the same inspector in the afternoon as opposed to in the morning, and, very opinionated, in the morning they say, clearly it's okay, and in the afternoon, equally confident, clearly this is a defect. So what is the AI team supposed to do, if sometimes even one person doesn't agree with himself or herself in the span of a day? So I think these are the types of very practical, very messy data problems that my teams wrestle with. In the case of the large consumer internet companies, where you have a billion users, you have a lot of data, you don't worry about it; just take the average, it kind of works. But in the case of other industry settings, we don't have big data. If you have small data, very small datasets, maybe you have a hundred defective parts, or a hundred examples of a defect. If you have only a hundred examples, these little labeling errors, you know, if 10 of your 100 labels are wrong, that's 10 percent of your data, that has a big impact. So how do you clean this up? What are you supposed to do? This is an example of the types of things that my teams, this is a Landing AI example, are wrestling with, to deal with small data, which comes up all the time once you're outside consumer internet.
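As a small illustration of the bookkeeping this involves, here is a minimal, hypothetical sketch of flagging parts whose repeated inspections disagree. This is not Landing AI's actual tooling, and the part IDs and labels are made up; it is just one way to surface the conflicts described above:

```python
from collections import defaultdict

# Hypothetical inspection records: (part_id, label) pairs, where the same
# part may be labeled by different inspectors, or by the same inspector
# at different times of day.
inspections = [
    ("part_01", "defect"), ("part_01", "ok"),      # inspectors disagree
    ("part_02", "ok"),     ("part_02", "ok"),      # consistent
    ("part_03", "defect"), ("part_03", "defect"),  # consistent
    ("part_04", "ok"),     ("part_04", "defect"),  # same inspector, AM vs PM
]

# Collect the distinct labels each part received.
labels_by_part = defaultdict(set)
for part_id, label in inspections:
    labels_by_part[part_id].add(label)

# Parts with more than one distinct label are candidates for re-labeling,
# or for tightening the written labeling standard.
conflicting = sorted(p for p, labels in labels_by_part.items() if len(labels) > 1)
print(conflicting)  # -> ['part_01', 'part_04']
```

In the small-data regime Ng describes, each conflict flagged this way is a meaningful fraction of the dataset, which is why resolving disagreements matters so much.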
SPEAKER_00
32:29 - 32:44
Yeah, that's fascinating. So then you invest more effort and time in thinking about the actual labeling process. What are the labels? How are disagreements resolved? All those kinds of pragmatic, real-world problems. That's a fascinating space.
SPEAKER_01
32:44 - 33:11
Yeah. I find, actually, when I'm teaching at Stanford, I increasingly encourage students to try to find their own project for the end-of-term project, rather than just downloading someone else's nicely cleaned data set. It's actually much harder if you need to go and define your own problem and find your own data set, rather than going to one of the several very good websites with clean, scoped data sets that you could just work on.
SPEAKER_00
33:12 - 33:49
You're now running three efforts: the AI Fund, Landing AI, and deeplearning.ai. As you've said, the AI Fund is involved in creating new companies from scratch, Landing AI is involved in helping already established companies do AI, and deeplearning.ai is for the education of everyone else, of individuals interested in getting into the field and excelling in it. So let's perhaps talk about each of these areas. First, deeplearning.ai: the basic question, how does a person interested in deep learning get started in the field?
SPEAKER_01
33:49 - 34:02
Deeplearning.ai is working to create courses to help people break into AI. So my machine learning course that I taught through Stanford remains one of the most popular courses on Coursera.
SPEAKER_00
34:02 - 34:24
To this day, it's probably one of the first courses. If I asked somebody how they got into machine learning, or how they fell in love with machine learning, or what got them interested, it always goes back to Andrew Ng at some point. The amount of people you've influenced is ridiculous. So for that, I'm sure I speak for a lot of people when I say a big thank you.
SPEAKER_01
34:24 - 34:50
No, yeah, thank you. I was once reading a news article, I think it was MIT Technology Review, and I'm going to mess up this statistic, but I remember reading an article that said something like a third of all programmers are self-taught. I may have the number wrong; maybe it was two thirds. But I remember reading that article and thinking, this doesn't make sense. Everyone is self-taught, because ultimately you teach yourself. I don't teach people.
SPEAKER_00
34:51 - 34:57
So, yeah, so how does one get started in deep learning, and where does deeplearning.ai fit into that?
SPEAKER_01
34:58 - 35:32
So the deep learning specialization offered by deeplearning.ai was, I think, Coursera's top specialization. It might still be. So it's a very popular way for people to take that specialization, to learn about everything from neural networks, to how to tune a neural network, to what is a ConvNet, to what is an RNN or a sequence model, or what is an attention model. The deep learning specialization steps everyone through those algorithms, so you deeply understand them and can implement them and use them for whatever.
SPEAKER_00
35:32 - 35:42
From the very beginning? So what would you say are the prerequisites for somebody to take the deep learning specialization, in terms of maybe math or programming background?
SPEAKER_01
35:42 - 36:09
Yeah, you need to understand basic programming, since there are programming exercises in Python. And the math prereq is quite basic, so no calculus is needed. If you know calculus, great, you'll get better intuitions, but we really tried to teach that specialization without requiring calculus. So I think high school math would be sufficient. If you know how to multiply two matrices, I think that's great.
SPEAKER_00
36:09 - 36:11
So a little basic linear algebra is great.
SPEAKER_01
36:12 - 36:37
Basic linear algebra, even very, very basic linear algebra, and some programming. I think that people who have done the machine learning course will find the deep learning specialization a bit easier, but it's also possible to jump into the deep learning specialization directly. It'll be a little bit harder, since we tend to go faster over concepts like how gradient descent works and what an objective function is, which are covered more slowly in the machine learning course.
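Since gradient descent and objective functions come up here, a minimal sketch of the idea (a toy one-dimensional objective of my own choosing, not a course exercise): repeatedly step opposite the gradient to minimize f(w) = (w - 3)^2.

```python
# Toy objective f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0   # initial guess
lr = 0.1  # learning rate

for _ in range(100):
    w -= lr * grad(w)  # step against the gradient

print(round(w, 4))  # converges toward the minimum at w = 3
```

The same update rule, applied to the loss of a neural network with respect to its weights, is what training amounts to.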
SPEAKER_00
36:37 - 36:46
Could you briefly mention some of the key concepts in deep learning that students should learn, that you envision them learning in the first few months, or the first year or so?
SPEAKER_01
36:46 - 37:59
So if you take the deep learning specialization, you learn the foundations of what a neural network is: how you build up a neural network from a single logistic unit, to a stack of layers, to different activation functions, and how you train a neural network. One thing I'm very proud of in that specialization is that we go through a lot of practical know-how of how to actually make these things work. What are the differences between different optimization algorithms? What do you do if the algorithm overfits? How do you tell if the algorithm is overfitting? When do you collect more data? When should you not bother to collect more data? I find that even today, unfortunately, there are engineers that will spend six months trying to pursue a particular direction, such as collecting more data, because we've all heard that more data is valuable. But sometimes you could run some tests and figure out six months earlier that for this particular problem, collecting more data isn't going to cut it. So just don't spend six months collecting more data; spend your time modifying the architecture or trying something else. So we go through a lot of the practical know-how, so that when you take the deep learning specialization, you have those skills to be very efficient in how you build these networks.
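The kind of test alluded to above can be sketched as a simple bias/variance check on training versus validation error; the thresholds here are illustrative assumptions of mine, not numbers from the specialization.

```python
def diagnose(train_err, val_err, target_err, gap_tol=0.02):
    """Rough diagnostic for whether collecting more data will help.

    If training error is already far above the target and close to
    validation error, the model underfits (high bias): more data
    won't help, so change the architecture instead. If validation
    error is much worse than training error, the model overfits
    (high variance): more data or regularization may help.
    """
    if train_err > target_err and (val_err - train_err) < gap_tol:
        return "high bias: more data won't help; change the model"
    if (val_err - train_err) >= gap_tol:
        return "high variance: more data or regularization may help"
    return "close to target: refine further or ship"

# Hypothetical numbers: training and validation error both stuck high.
print(diagnose(train_err=0.15, val_err=0.16, target_err=0.05))
```

Running this check early is exactly what saves the six months spent collecting data that was never going to close the gap.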
SPEAKER_00
37:59 - 38:22
So dive right in to play with the network, to train it, to do inference on a particular data set, to build up the intuition about it, without building it up too big, to where you spend, like you said, six months building up your big project without building intuition on a small aspect of the data that could already tell you everything you need to know about it.
SPEAKER_01
38:23 - 39:44
Yes, and also the systematic frameworks of thinking for how to go about building practical machine learning. Maybe to make an analogy: when we learn to code, we have to learn the syntax of some programming language, right, be it Python or C++ or Octave or whatever. But the equally important, or maybe even more important, part of coding is understanding how to string together these lines of code into coherent things. So when should you put something into a function? When should you not? How do you think about abstraction? Those frameworks are what make a programmer efficient, even more than understanding the syntax. I remember when I was an undergrad at Carnegie Mellon, one of my friends would debug their code by first trying to compile it, and it was C++ code, and they wanted to get rid of syntax errors as quickly as possible. So how do you do that? Well, they would delete every single line of code with a syntax error. Really efficient for getting rid of syntax errors, horrible for debugging. So we learn how to debug. And in machine learning, the way you debug a machine learning program is very different than the way you, you know, use a debugger and trace through the code in traditional software engineering. It's an evolving discipline, but I find that the people who are really good at debugging machine learning algorithms are easily 10x, maybe 100x, faster at getting something to work.
SPEAKER_00
39:45 - 40:03
And the basic process of debugging is, so the bug in this case is, why isn't this thing learning and improving? Sort of going into the questions of overfitting and all those kinds of things. That's the logical space that the debugging is happening in with neural networks.
SPEAKER_01
40:04 - 40:28
Yeah, often the question is, why doesn't it work yet, or can I expect it to eventually work? And what are the things I could try? Change the architecture? More data? More regularization? A different optimization algorithm? Different types of data? Answering those questions systematically, so that you don't spend six months heading down a blind alley before someone comes and says, why did you spend six months doing this?
SPEAKER_00
40:29 - 40:43
What concepts in deep learning do you think students struggle with the most? Or, sort of, what is the biggest challenge for them, where once they get over that hill, it hooks them and inspires them and they really get it?
SPEAKER_01
40:45 - 41:53
Similar to learning mathematics, I think one of the challenges of deep learning is that there are a lot of concepts that build on top of each other. If you ask me what's hard about mathematics, I have a hard time pinpointing one thing. Is it addition, subtraction? Is it carrying? Is it multiplication? There's all this stuff. I think one of the challenges of learning math and learning certain technical fields is that there are a lot of concepts, and if you miss a concept, then you're kind of missing the prerequisite for something that comes later. So in the deep learning specialization, we try to break down the concepts to maximize the odds of each component being understandable, so that when you move on to the more advanced things, when we learn about ConvNets, hopefully you have enough intuition from the earlier sections to then understand why we structure ConvNets in a certain way, and eventually why we build RNNs and LSTMs or attention models in a certain way, building on top of the earlier concepts. Are you doing a lot of teaching as well? Do you have a favorite, this-is-the-hard-concept moment in your teaching?
SPEAKER_00
41:57 - 43:09
I don't think anyone's ever turned the interview on me. I think that's a really good question. Yeah, it's really hard to capture the moment when they struggle. I think you put it really eloquently. I do think there are moments, like aha moments, that really inspire people. I think for some reason reinforcement learning, especially deep reinforcement learning, is a really great way to inspire people and get them to grasp what neural networks can do. Even though neural networks are really just a part of the deep RL framework, it's a really nice way to paint the entirety of the picture of a neural network being able to learn from scratch, knowing nothing, and explore the world and pick up lessons. I find that a lot of the aha moments happen when you use deep RL to teach people about neural networks, which is counterintuitive. I find that a lot of the inspiration, the fire in people's eyes, the passion, comes from the RL world. Do you find reinforcement learning to be a useful part of the teaching process or not?
SPEAKER_01
43:09 - 44:12
I still teach reinforcement learning in one of my Stanford classes, and my PhD thesis was on reinforcement learning, right? But I find that if I'm trying to teach students the most useful techniques for them to use today, I end up shrinking the amount of time I talk about reinforcement learning. It's not what's working today. Now, our world changes so fast; maybe it'll be totally different in a couple of years. But I think we need a couple more things for reinforcement learning to get there. One of my teams is looking into reinforcement learning for some robotic control tasks, so I see the applications. But if you look at it as a percentage of all of the impact of the types of things we do, it's small, at least today, outside of playing video games and a few other games. Actually, at NeurIPS, a bunch of us were standing around saying, hey, what's your best example of an actual deployed reinforcement learning application? And, you know, this was among senior machine learning researchers, right? And again, there are some emerging ones, but there are not that many great examples.
SPEAKER_00
44:12 - 45:14
Well, I think you're absolutely right. The sad thing is there hasn't been a big, impactful, real-world application of reinforcement learning. I think its biggest impact to me has been in the toy domain, in the game domain, in small examples. That's what I mean: for educational purposes, it seems to be a fun thing to explore and play with. But I think, from your perspective, and I think that might be the best perspective, if you're trying to educate with a simple example in order to illustrate how this can actually grow to scale and have a real-world impact, then perhaps focusing on the fundamentals of supervised learning in the context of, you know, a simple data set, even like an MNIST data set, is the right path to take. I just, the amount of fun I've seen people have with reinforcement learning has been great, but not in the applied, impactful, real-world setting. So it's a trade-off: how much impact do you want to have versus how much fun you want to have?
SPEAKER_01
45:14 - 46:09
That's really cool. And I feel like the world actually needs all sorts. Even within machine learning, I feel like deep learning is so exciting, but an AI team shouldn't just use deep learning. I find that my teams use a portfolio of tools, and maybe that's not the exciting thing to say, but some days we use a neural net, some days we use PCA. Actually, the other day I was sitting down with my team, looking at PCA residuals, trying to figure out what's going on with PCA applied to a manufacturing problem. Some days we use a probabilistic graphical model, some days we use a knowledge graph, which is one of the things that has tremendous industry impact, but the amount of chatter about knowledge graphs in academia is really thin compared to the actual real impact. So I think reinforcement learning should be in that portfolio, and then it's about balancing how much we teach all of these things. The world should have diverse skills; it'd be sad if everyone just learned one narrow thing.
SPEAKER_00
46:09 - 46:27
Yeah, the diverse skills will help you discover the right tool for the job. What is the most beautiful, surprising, or inspiring idea in deep learning to you? Something that captivated your imagination. Is it the scale, the performance that can be achieved with scale, or is there some other idea?
SPEAKER_01
46:29 - 48:51
I think that if my only job was being an academic researcher, and I had an unlimited budget and didn't have to worry about short-term impact and could only focus on long-term impact, I'd probably spend all my time doing research on unsupervised learning. I still think unsupervised learning is a beautiful idea. At both this past NeurIPS and ICML, I was attending workshops and listening to various talks about self-supervised learning, which is one vertical segment, maybe a subset, of unsupervised learning that I'm excited about. Maybe I'll just summarize the idea. Or I guess you know the idea? No, please describe it. So here's an example of self-supervised learning. Let's say we grab a lot of unlabeled images off the internet, so we have infinite amounts of this type of data. I'm going to take each image and rotate it by a random multiple of 90 degrees, and then I'm going to train a supervised neural network to predict what was the original orientation: so was it rotated 90 degrees, 180 degrees, 270 degrees, or zero degrees? So you can generate an infinite amount of labeled data, because you rotated the image, so you know the ground-truth label. And various researchers have found that by taking unlabeled data, making up labeled data sets, and training a large neural network on these tasks, you can then take the hidden-layer representation and transfer it to a different task. Very powerful. Learning word embeddings, where we take a sentence, delete a word, and predict the missing word, which is one of the ways we learn word embeddings, is another example. And I think there's now this portfolio of techniques for generating these made-up tasks. Another one, called jigsaw, would be: you take an image, cut it up into a 3x3 grid, so nine puzzle pieces, jumble up the nine pieces, and have a neural network predict which of the nine-factorial possible permutations it came from.
So many groups, including, you know, OpenAI, Pieter Abbeel has been doing some work on this, Facebook, Google Brain, I think DeepMind. Oh, and actually, Aaron van den Oord did a lot of great work on the CPC objective. So many teams are doing exciting work, and I think this is a way to generate infinite labeled data. And I find this a very exciting piece of unsupervised learning.
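The rotation pretext task described above can be sketched as follows. This is an illustration of the data generation step only (the network training is omitted, and `rotate90` and `make_rotation_examples` are names of my own invention, not from any particular paper's code); images are represented as nested lists.

```python
import random

def rotate90(img, k):
    """Rotate a 2-D list-of-lists image clockwise by k quarter-turns."""
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img

def make_rotation_examples(images, rng=random):
    """Turn unlabeled images into a labeled 4-way classification task.

    Each image gets a random rotation of 0, 90, 180, or 270 degrees;
    the label is which rotation was applied. A network trained to
    predict this label learns representations whose hidden layers can
    be transferred to other tasks.
    """
    return [(rotate90(img, k), k)
            for img in images
            for k in [rng.randrange(4)]]
```

Because the label is manufactured from the transformation itself, the supply of labeled training examples is limited only by the supply of unlabeled images.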
SPEAKER_00
48:51 - 48:59
Long-term, do you think that's going to unlock a lot of power in machine learning systems, this kind of unsupervised learning?
SPEAKER_01
48:59 - 49:43
I don't think it's the whole enchilada. I think it's just a piece of it. And I think this one piece, self-supervised learning, is starting to get traction. We're very close to it being useful. Well, word embeddings are already really useful. I think we're getting closer and closer to it having significant real-world impact, maybe in computer vision and video. But I think this concept, and I think there will be other concepts around it, other unsupervised learning things I've worked on and been excited about. I was really excited about sparse coding and ICA, slow feature analysis. I think all of these are ideas that various of us were working on about a decade ago, before we all got distracted by how well supervised learning was doing.
SPEAKER_00
49:43 - 49:50
Yeah. So we would return to the fundamentals of representation learning that really started this movement of deep learning.
SPEAKER_01
49:51 - 49:57
I think there's a lot more work one could explore around this theme of ideas, and other ideas, to come up with better algorithms.
SPEAKER_00
49:57 - 50:09
So if we could return to maybe talk quickly about the specifics of deeplearning.ai, the deep learning specialization perhaps: how long does it take to complete the course, would you say?
SPEAKER_01
50:10 - 50:49
The official length of the deep learning specialization is, I think, 16 weeks, so about four months, but you can go at your own pace. So if you subscribe to the deep learning specialization, there are people that have finished it in less than a month by working more intensely and studying more intensely. So it really depends on the individual. When we created the deep learning specialization, we wanted to make it very accessible and very affordable. And with Coursera and deeplearning.ai's education mission, one of the things that's really important to me is that if there's someone for whom paying anything is a financial hardship, they can just apply for financial aid and get it for free.
SPEAKER_00
50:51 - 51:11
If you were to recommend a daily schedule for people learning, whether it's through the deep learning specialization or just learning in the world of deep learning, what would you recommend? How should they go about it day to day? Is there specific advice about their journey in the world of deep learning and machine learning?
SPEAKER_01
51:12 - 52:22
I think getting into the habit of learning is key, and that means regularity. So, for example, we send out a weekly newsletter, The Batch, every Wednesday, so people know it's coming every Wednesday, and you can spend a little bit of time on Wednesday catching up on the latest news through The Batch. And for myself, I've picked up the habit of spending some time every Saturday and every Sunday reading or studying, so I don't wake up on Saturday and have to make a decision about whether I feel like reading or studying today or not. It's just what I do. And the fact that it's a habit makes it easier. So I think if someone can get into that habit, it's like, you know, just like we brush our teeth every morning. I don't think about it. If I thought about it, it's a little bit annoying to have to spend two minutes doing that, but because it's a habit, it takes no cognitive load. It would be so much harder if you had to make that decision every morning. Actually, that's also the reason I wear the same thing every day: it's one less decision, I just get up and wear my blue shirt. So I think if you get that habit, that consistency of studying, then it actually feels easier.
SPEAKER_00
52:23 - 52:53
So, yeah, it's kind of amazing. In my own life, I play guitar every day. I force myself to, at least for five minutes, play guitar. It's a ridiculously short period of time, but because I've gotten into that habit, it's incredible what you can accomplish in a period of a year or two years. You can become, you know, exceptionally good at certain aspects of a thing by just doing it every day for a very short period of time. It's kind of a miracle that that's how it works. It adds up over time.
SPEAKER_01
52:53 - 53:19
Yeah, and I think it's often not about the burst of sustained effort and the all-nighters, because you can only do that a limited number of times. It's the sustained effort over a long time. I think reading two research papers is a nice thing to do, but the power is not in reading two research papers; it's in reading two research papers a week for a year. Then you've read a hundred papers, and you actually learn a lot when you read a hundred papers.
SPEAKER_00
53:19 - 53:38
So, regularity, and making learning a habit. Do you have other general study tips, particularly for deep learning, that people should apply in their process of learning? Are there some recommendations or tips you have as they learn?
SPEAKER_01
53:38 - 54:16
One thing I still do, when I'm trying to study something really deeply, is take handwritten notes. Now, I know there are a lot of people that take the deep learning courses during their commute or something, where it may be more awkward to take notes, so I know it may not work for everyone. But when I'm taking courses on Coursera, you know, and I still take some every now and then, the most recent one I took was a course on clinical trials, because I was curious about that, I got out my little Moleskine notebook, and I was sitting at my desk, just taking down notes on what the instructor was saying. And we know that that act of taking notes, preferably handwritten notes, increases retention.
SPEAKER_00
54:17 - 54:25
So as you're sort of watching the video, kind of pausing maybe, and then taking the basic insights down on paper.
SPEAKER_01
54:25 - 55:09
Yeah. So there have been a few studies; if you search online, you'll find some of them. Taking handwritten notes, because handwriting is slower, as we were saying just now, causes you to recode the knowledge in your own words more, and that process of recoding promotes long-term retention. This is as opposed to typing, which is fine. Again, typing is better than nothing, and taking a class and not taking notes is better than not taking the class at all. But comparing handwritten notes and typing: most people can type faster than they can handwrite notes. And so when people type, they're more likely to just transcribe verbatim what they heard, and that reduces the amount of recoding, and that actually results in less long-term retention.
SPEAKER_00
55:09 - 55:21
I don't know what the psychological effect there is, but it's so true. There's something fundamentally different about typing and handwriting. I wonder what that is. I wonder if it is as simple as the time it takes, that writing is slower.
SPEAKER_01
55:21 - 56:16
Yeah, and because you can't write as many words, you have to take whatever they said and summarize it into fewer words, and that summarization process requires deeper processing of the meaning, which then results in better retention. That's fascinating. I think, you know, because of Coursera, I've spent so much time studying pedagogy. It's actually one of my passions; I really love learning how to more efficiently help others learn. One of the things I do, both in creating videos and when we write The Batch, is I try to think: is one minute spent with us going to be a more efficient learning experience than one minute spent anywhere else? And we really try to, you know, make the time efficient for learners, because everyone's busy. So when we're editing, I often tell my teams, every word needs to fight for its life, and if you can delete a word, delete it, so as not to waste the learner's time.
SPEAKER_00
56:17 - 56:45
That's so amazing that you think that way, because there are millions of people that are impacted by your teaching, and sort of that one minute spent has a ripple effect, right, through years of time, which is fascinating. How does one make a career out of an interest in deep learning? Do you have advice for people? We just talked about sort of the beginning, the early steps, but if you want to make it an entire life journey, or at least a journey of a decade or two, how do you do it?
SPEAKER_01
56:46 - 58:56
So the most important thing is to get started, right? And I think in the early parts of a career, coursework, like the deep learning specialization, is a very efficient way to master this material. Because, you know, the instructors, be it me or someone else, or, you know, Laurence Moroney, who teaches our TensorFlow specialization, and other things we're working on, spend effort to try to make it time-efficient for you to learn new concepts. So coursework is actually a very efficient way for people to learn concepts in the beginning parts of breaking into a new field. In fact, one thing I see at Stanford: some of my PhD students want to jump into research right away, and I actually tend to say, look, in your first couple of years as a PhD student, spend time taking courses, because it lays the foundation. It's fine if you're less productive in your first couple of years; you'll be better off in the long term. Beyond a certain point, there's material that doesn't exist in courses, because it's too cutting-edge and the courses haven't been created yet. And there's some practical experience that we're not yet that good at teaching in a course. So I think after exhausting the efficient coursework, most people then need to go on to, ideally, work on projects, and maybe also continue their learning by reading blog posts and research papers and things like that. Doing projects is really important. And again, I think it's important to start small. Just do something. Today you read about deep learning and it feels like, oh, all these people are doing such exciting things; if I'm not building a neural network that changes the world, then what's the point? Well, the point is, sometimes building that tiny neural network, you know, be it MNIST, or upgrading to Fashion-MNIST, or whatever, doing your own fun hobby project, that's how you gain the skills to let you do bigger and bigger projects. I find this to be true at the individual level and also at the organizational level.
For a company to become good at machine learning, sometimes the right thing to do is not to tackle the giant project; it's instead to do the small project that lets the organization learn, and then build up from there. But this is true both for individuals and for companies.
SPEAKER_00
58:57 - 59:16
Taking the first step, and then taking small steps, is the key. Should students pursue a PhD, do you think? You can do so much, that's one of the fascinating things about machine learning: you can have so much impact without ever getting a PhD. So what are your thoughts? Should people go to grad school? Should people get a PhD?
SPEAKER_01
59:17 - 01:00:04
I think that there are multiple good options, of which doing a PhD could be one of them. I think that if someone's admitted to a top PhD program, you know, at MIT, Stanford, top schools, I think that's a very good experience. Or if someone gets a job at a top organization, on a top AI team, I think that's also a very good experience. There are some things you still need a PhD to do. If someone's aspiration is to be a professor at a top academic university, you just need a PhD to do that. But if your goal is to, you know, start a company, build a company, do great technical work, I think a PhD is a good experience, but I would look at the different options available to someone: where are the places you can get a job, where are the places you can get into a PhD program, and kind of weigh the pros and cons of those.
SPEAKER_00
01:00:05 - 01:00:49
Just to linger on that for a little bit longer, what final dreams and goals do you think people should have? What options should they explore? You can work in industry for a large company, Google, Facebook, Baidu, all these large companies that already have huge teams of machine learning engineers. You can also, within industry, work in sort of research groups, kind of like Google Research, Google Brain. You can also, like we said, be a professor in academia. And what else? Oh, you can also build your own company, you can do a startup. Is there anything that stands out between those options, or are they all beautiful, different journeys that people should consider?
SPEAKER_01
01:00:50 - 01:02:18
I think the thing that affects your experience most is less whether you're in this company versus that company, or academia versus industry. I think the thing that affects your experience most is who are the people you interact with on a daily basis. So even if you look at some of the large companies, the experience of individuals in different teams is very different, and what matters most is not the logo above the door when you walk into the giant building every day. What matters most is who are the ten people, who are the thirty people, you interact with every day. So I actually tend to advise people, if you get a job offer from a company, ask who your manager is, who your peers are, who you're actually going to talk to. We're all social creatures; we tend to, you know, become more like the people around us, and if you're working with great people, you will learn faster. Or if you get admitted, if you get a job at a great company or a great university, maybe the logo you walk in under is great, but if you're actually stuck on some team doing work that doesn't excite you, then that's actually a really bad experience. So this is true for universities and for large companies. For small companies, you can kind of figure out who you'd be working with quite quickly. And I tend to advise people: if a company refuses to tell you who you'll work with, if they say, oh, join us, the rotation system will figure it out, I think that's a worrying answer, because it means you may not actually get sent to a team with great peers and great people to work with.
SPEAKER_00
01:02:18 - 01:02:56
That's actually really profound advice that we sometimes sweep aside and don't consider too rigorously or carefully. The people around you matter so much; especially when you accomplish great things, it seems the great things are accomplished because of the people around you. So it's not about whether you learn this thing or that thing, or, like you said, the logo that hangs up top; it's the people. That's fascinating. And it's such a hard search process, just like finding the right friends, or somebody to get married to, and that kind of thing. It's a very hard search problem.
SPEAKER_01
01:02:58 - 01:03:14
Yeah, but I think when someone interviews at a university or a research lab or a large corporation, it's good to insist on just asking: who are the people, who is my manager? And if they refuse to tell me, I'm going to think, well, maybe that's because you don't have a good answer. It may not be someone I'd like.
SPEAKER_00
01:03:14 - 01:03:25
And if you don't particularly connect or something feels off with the people, then don't stick to it. That's a really important signal to consider.
SPEAKER_01
01:03:27 - 01:03:43
And actually, in my Stanford class, CS230, as well as in an ACM talk, I gave a long talk on career advice, including on the job search process and some of these issues. So if you can find those videos online, I'll point people to them.
SPEAKER_00
01:03:43 - 01:03:57
Beautiful. So the AI Fund helps AI startups get off the ground, or perhaps you can elaborate on all the fun things it's involved with. What's your advice? How does one build a successful AI startup?
SPEAKER_01
01:03:59 - 01:04:33
You know, in terms of the kind of value: a lot of startup failures come from building a product that no one wanted. You have cool technology, but who's going to use it? So I tend to be very outcome-driven and customer-obsessed. Ultimately, we don't get to vote on whether we succeed or fail; the customer is the only one that gets a thumbs-up or thumbs-down vote in the long term. In the short term, you know, there are various people who get various votes, but in the long term, that's what really matters.
SPEAKER_00
01:04:33 - 01:04:41
So as you build a startup, you have to constantly ask the question, will the customer give a thumbs up on this?
SPEAKER_01
01:04:41 - 01:05:20
I think so. I think startups that are very customer-focused, customer-obsessed, that deeply understand the customer and are oriented to serve the customer, are more likely to succeed. With the proviso that I think all of us should only do things that we think create social good and move the world forward. I personally don't want to build addictive digital products just to sell ads. There are things that could be lucrative that I won't do. But if we can find ways to serve people in meaningful ways, I think those can be great things to do, whether in an academic setting or a corporate setting or a startup setting.
SPEAKER_00
01:05:20 - 01:05:24
So can you give me the idea of why you started the AI Fund?
SPEAKER_01
01:05:26 - 01:07:20
I remember when I was leading the AI group at Baidu, I had two parts to my job. One was to build an AI engine to support the existing businesses, and that was running smoothly. The second part of my job at the time was to try to systematically initiate new lines of business using the company's AI capabilities. You know, the self-driving car team came out of my group. The smart speaker team, similar to what Amazon Echo or Alexa is in the US, also came out of my group; we actually announced it before Amazon did, so Baidu wasn't following Amazon. And I found that to be actually the most fun part of my job. So what I wanted to do was build AI Fund as a startup studio to systematically create new startups from scratch. With all of the things we can now do with AI, I think the ability to build new teams to go after this rich space of opportunities is a very important mechanism to get done the projects that will move the world forward. I've been fortunate to have built a few teams that had a meaningful positive impact, and I felt that we might be able to do this in a more systematic, repeatable way. A startup studio is a relatively new concept; there are maybe dozens of startup studios right now, but I feel like all of us, many teams, are still trying to figure out how to systematically build companies with a high success rate. Even a lot of my venture capital friends seem to be more and more building companies rather than just investing in companies. I find it a fascinating thing to do: to figure out the mechanisms by which we can systematically create successful teams, successful businesses, in areas that we find meaningful.
SPEAKER_00
01:07:20 - 01:07:30
So a startup studio is a place and a mechanism for startups to go from zero to success, to try to develop a blueprint?
SPEAKER_01
01:07:31 - 01:07:48
It's actually a place for us to build startups from scratch. We often bring in founders and work with them, or maybe even have existing ideas that we match founders with, and then this hopefully launches into successful companies.
SPEAKER_00
01:07:48 - 01:07:57
So how close are you to figuring out a way to automate the process of starting from scratch and building a successful AI startup?
SPEAKER_01
01:07:58 - 01:08:52
I think we've been constantly improving and iterating on our processes, on how we do that. Things like: how many customer calls do we need to make in order to get customer validation? How do you make sure this technology can be built? Quite a lot of our businesses need cutting-edge machine learning algorithms, the kind of algorithms that were developed in the last one or two years, and even if something works in a research paper, it turns out taking it to production is really hard. There are a lot of issues in making these things work in real life that are not widely addressed in academia. So how do you validate that this is actually doable? Or how do you get access to the specialized domain knowledge, be it in education or healthcare or whatever sector you're focusing on? So I think we've been getting much better at giving entrepreneurs a high success rate, but I think the whole world is still in the early phases of figuring this out.
SPEAKER_00
01:08:52 - 01:09:00
But do you think there are some aspects of that process that are transferable from one startup to another?
SPEAKER_01
01:09:00 - 01:10:19
Very much so. Starting a company, to most entrepreneurs, is a really lonely thing, and I've seen so many entrepreneurs not know how to make certain decisions. Like, how do you do B2B sales? If you don't know that, this is really hard. Or how do you market this efficiently, other than buying ads, which is really expensive? Are there more efficient tactics than that? For a machine learning project, basic decisions can change the course of whether the machine learning product works or not. There are so many hundreds of decisions that entrepreneurs need to make, and making mistakes in a couple of key decisions can have a huge impact on the fate of the company. So I think a startup studio provides a support structure that makes starting a company much less of a lonely experience. And also, when facing these key decisions, like trying to hire your first VP of engineering: what's a good selection criterion? How do you source? Should I hire this person or not? By having an ecosystem around the entrepreneurs to help, I think we help them at the key moments, and hopefully make it significantly more enjoyable, with higher success rates.
SPEAKER_00
01:10:19 - 01:10:25
So there's somebody to brainstorm with in these very difficult decision points.
SPEAKER_01
01:10:25 - 01:10:32
And also to help them recognize what they may not even realize is a key decision point.
SPEAKER_00
01:10:32 - 01:10:34
That's the first and probably the most important part, yeah.
SPEAKER_01
01:10:35 - 01:11:30
I can say one other thing. I think building companies is one thing, but I feel like it's really important that we build companies that move the world forward. For example, within the AI Fund team, there was once an idea for a new company that, if it had succeeded, would have resulted in people watching a lot more videos in a certain narrow vertical type of video. I looked at it; the business case was fine, the revenue case was fine, but I just said, I don't want to do this. I don't actually want to have a lot more people watching this type of video. It wasn't educational. It was entertaining, maybe. And so I killed the idea, on the basis that I didn't think it would actually help people. So whether building companies, or working at enterprises, or doing personal projects, I think it's up to each of us to figure out what's the difference we want to make in the world.
SPEAKER_00
01:11:31 - 01:11:41
With Landing AI, you help already-established companies grow their AI and machine learning efforts. How does a large company integrate machine learning into their efforts?
SPEAKER_01
01:11:42 - 01:12:53
AI is a general-purpose technology, and I think it will transform every industry. Our community has already transformed, to a large extent, the software internet sector. Most software internet companies, beyond the top five or six, or three or four, already have reasonable machine learning capabilities or are getting there; there's still room for improvement. But when I look outside the software internet sector, everything from manufacturing, agriculture, healthcare, logistics, transportation, there are so many opportunities that very few people are working on. So I think the next wave for AI is for us to also transform all of those other industries. There was a McKinsey study estimating $13 trillion of global economic growth from AI. US GDP is $19 trillion, so $13 trillion is a big number. Or PwC estimated $16 trillion. So whatever the number is, it's large. But the interesting thing to me was that a lot of that impact would be outside the software internet sector. So we need more teams to work with these companies to help them adopt AI. I think this is one thing that will help drive global economic growth and make humanity more powerful.
SPEAKER_00
01:12:53 - 01:13:01
And like you said, the impact is there. So what are the best industries, the biggest industries where AI can help perhaps outside the software tech sector?
SPEAKER_01
01:13:01 - 01:14:27
Frankly, I think it's all of them. Some of the ones I'm spending a lot of time on are manufacturing, agriculture, and looking into healthcare. For example, in manufacturing we do a lot of work in visual inspection, where today there are people standing around using the human eye to check if, you know, this plastic part or this smartphone or this thing has a scratch or a dent or something in it. We can instead use a camera to take a picture, use an algorithm, deep learning and other things, to check if it's defective or not, and this helps factories improve yield and improve quality and improve throughput. It turns out the practical problems we run into are very different from the ones you might read about in most research papers. The data sets are really small, so we face small-data problems. The factories keep changing the environment, so it works well on your test set, but guess what, something changes in the factory: the lights go on or off. Recently there was a factory in which a bird flew through and pooped on something, and that changed stuff, increasing our need for robustness to all the changes that happen in a factory. I find that we run into a lot of practical problems that are not as widely discussed in academia, and it's really fun being on the cutting edge solving these problems, maybe before many people are even aware that there is a problem.
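The lighting-change failure mode Ng describes can be sketched with a toy simulation: a trivial brightness-threshold "defect detector" is fit under one lighting condition and then evaluated after the lights change. (Illustrative toy only; the data, the thresholding model, and all numbers are invented. Real visual inspection uses actual images and deep networks, not a one-feature threshold.)

```python
import random

random.seed(0)

def make_images(n, lighting, defect_rate=0.5):
    """Simulate per-part brightness readings from a factory camera.

    Defective parts reflect less light, so their mean brightness is lower.
    `lighting` shifts every reading, mimicking the factory lights changing.
    """
    data = []
    for _ in range(n):
        defective = random.random() < defect_rate
        base = 80 if defective else 120
        brightness = base + lighting + random.gauss(0, 10)
        data.append((brightness, defective))
    return data

def fit_threshold(train):
    # Simplest possible "model": midpoint between the two class means.
    ok = [b for b, d in train if not d]
    bad = [b for b, d in train if d]
    return (sum(ok) / len(ok) + sum(bad) / len(bad)) / 2

def accuracy(threshold, data):
    # Predict "defective" whenever brightness falls below the threshold.
    return sum((b < threshold) == d for b, d in data) / len(data)

train = make_images(1000, lighting=0)      # conditions when data was collected
threshold = fit_threshold(train)

same = make_images(1000, lighting=0)       # test set drawn like the training set
shifted = make_images(1000, lighting=-40)  # the lights went dim

print(f"accuracy, same lighting:  {accuracy(threshold, same):.2f}")
print(f"accuracy, lights changed: {accuracy(threshold, shifted):.2f}")
```

The point is the gap between the two numbers: the model looks fine on data drawn like its training set and collapses under a shift it never saw, which is why small-data robustness dominates these deployments.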
SPEAKER_00
01:14:27 - 01:14:50
And that's such a fascinating space. You're absolutely right. But what is the first step that a company should take? It's a scary leap into this new world, going from the human eye inspecting things to digitizing that process, having a camera, having an algorithm. What's the first step? What's the early journey that you recommend, that you see these companies taking?
SPEAKER_01
01:14:51 - 01:16:10
I published a document called the AI Transformation Playbook. It's online, and I taught it briefly in the AI For Everyone course on Coursera. It's about the longer-term journey that companies should take, but the first step is actually to start small. I've seen a lot more companies fail by starting too big than by starting too small. Take even Google. Most people don't realize how hard it was and how controversial it was in the early days. When I started Google Brain, it was controversial. People thought deep learning, neural networks, had been tried before and didn't work. Why would you want to do deep learning? So my first internal customer at Google was the Google Speech team, which is not the most lucrative project at Google, not the most important; it's not web search or advertising. But by starting small, my team helped the speech team build a more accurate speech recognition system, and this caused their peers, other teams, to start to have more faith in deep learning. My second internal customer was the Google Maps team, where we used computer vision to read house numbers from Street View images to more accurately locate houses in Google Maps, to improve the quality of the geodata. And it was only after those two successes that I then started the more serious conversation with the Google Ads team.
SPEAKER_00
01:16:10 - 01:16:19
So there's a ripple effect: you show that it works in these cases, and it just propagates through the entire company, the sense that this thing has a lot of value and use for us.
SPEAKER_01
01:16:20 - 01:16:54
I think the early small-scale projects help the teams gain faith, but also help the teams learn what these technologies do. I still remember our first GPU server; it was a server under some guy's desk. And that taught us early, important lessons about how you have multiple users share a set of GPUs, which was really not obvious at the time, but those early lessons were important. We learned a lot from that first GPU server that later helped the teams think through how to scale up to much larger deployments.
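The GPU-sharing lesson can be sketched as a tiny allocator: a fixed pool of device ids handed out under a semaphore, so a user blocks until a device frees up. (A hypothetical illustration of the scheduling problem, not how Google Brain actually solved it; real clusters use schedulers such as Slurm or Kubernetes device plugins.)

```python
import threading
from contextlib import contextmanager

class GpuPool:
    """Hand out GPU ids from a fixed pool; callers block until one is free."""

    def __init__(self, num_gpus):
        self.free = list(range(num_gpus))
        self.available = threading.Semaphore(num_gpus)
        self.lock = threading.Lock()

    @contextmanager
    def acquire(self):
        self.available.acquire()          # block until some GPU is free
        with self.lock:
            gpu_id = self.free.pop()
        try:
            yield gpu_id                  # caller runs its job on this device
        finally:
            with self.lock:               # return the device to the pool
                self.free.append(gpu_id)
            self.available.release()

pool = GpuPool(num_gpus=2)
with pool.acquire() as gpu:
    print(f"training on GPU {gpu}")
```

The design choice worth noting is the context manager: a crashed job still releases its device via `finally`, which is exactly the kind of non-obvious sharing detail Ng says those early servers taught.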
SPEAKER_00
01:16:54 - 01:17:00
Are there concrete challenges that companies face that you see as important for them to solve?
SPEAKER_01
01:17:01 - 01:18:56
I think building and deploying machine learning systems is hard. There's a huge gulf between something that works in a Jupyter notebook on your laptop and something that runs in a production deployment setting in a factory or an agriculture plant or whatever. I see a lot of people get something working on their laptop and say, wow, look what I've done, and that's great, that's hard, that's a very important first step. But a lot of teams underestimate the rest of the steps needed. So, for example, I've heard this exact same conversation between a lot of machine learning people and business people. The machine learning person says, look, my algorithm does well on the test set, and a clean test set that I didn't peek at. And the business person says, thank you very much, but your algorithm sucks, it doesn't work. And the machine learning person says, no, wait, I did well on the test set. And I think there is a gulf between what it takes to do well on a test set on your hard drive and what it takes to work well in a deployment setting. Some common problems: robustness and generalization. You deploy something in the factory; maybe they chop down a tree outside the factory, so the tree no longer covers the window and the lighting is different, and the test set changes. And in machine learning, especially in academia, we don't know how to deal with test set distributions that are dramatically different from the training set distribution. There is research on this, stuff like domain adaptation, transfer learning; there are people working on it, but we're really not good at it. So how do you actually get this to work, given that your test set distribution is going to change? And also, if you look at the number of lines of code in the software system, the machine learning model is maybe 5 percent, or even less, of the entire software system you need to build. So how do you get all that work done, and make it reliable and systematic?
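Ng's "5 percent" point, and the worry about shifting test distributions, can be made concrete with a sketch of the scaffolding that surrounds a model in production: input validation, drift monitoring against training statistics, and a fallback to human review. (All class names, thresholds, and the stand-in model here are invented for illustration; this is a minimal sketch, not Landing AI's deployment stack.)

```python
import statistics
from collections import deque

def model_predict(features):
    """Stand-in for the trained model: the '5 percent' of the system."""
    return sum(features) > 1.0

class ProductionWrapper:
    """A slice of the other 95 percent: validation, drift checks, fallback."""

    def __init__(self, train_mean, window=100, drift_tolerance=3.0):
        self.train_mean = train_mean          # statistic saved from training
        self.recent = deque(maxlen=window)    # rolling window of live inputs
        self.drift_tolerance = drift_tolerance

    def predict(self, features):
        # 1. Input validation: reject malformed sensor readings outright.
        if not features or any(f is None for f in features):
            return {"ok": False, "reason": "invalid input"}
        # 2. Drift monitoring: compare live inputs with training statistics.
        self.recent.append(statistics.mean(features))
        drifted = (len(self.recent) == self.recent.maxlen and
                   abs(statistics.mean(self.recent) - self.train_mean)
                   > self.drift_tolerance)
        # 3. Fallback: on drift, route to a human rather than trust a model
        #    whose deployment distribution no longer matches its test set.
        if drifted:
            return {"ok": False, "reason": "input drift, route to human"}
        return {"ok": True, "defective": model_predict(features)}

wrapper = ProductionWrapper(train_mean=0.5)
print(wrapper.predict([0.4, 0.6]))   # {'ok': True, 'defective': False}
```

Even in this toy, the "model" is one line and the safeguards are most of the code, mirroring the lines-of-code ratio Ng describes.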
SPEAKER_00
01:18:56 - 01:19:03
So good software engineering work is fundamental here to building a successful machine learning system.
SPEAKER_01
01:19:03 - 01:21:00
Yes, and the software system needs to interface with people's workflows. Machine learning automates tasks: we take one task out of the many tasks that are done in a factory. A factory does lots of things; one task is visual inspection. If we automate that one task, it can be really valuable, but you may need to redesign a lot of other tasks around that one task. For example, say the machine learning algorithm says this part is defective. What are you supposed to do? Do you throw it away? Do you get a human to double-check? Do you rework it or fix it? So you need to redesign a lot of tasks around the thing you've now automated, plan for the change management, make sure that the software you write is consistent with the new workflow, and take the time to explain to people what will happen. So I think what Landing AI has become good at, and I think we learned by making mistakes and through painful experiences: what we've become good at is working with our partners to think through all the things beyond just the machine learning model in the Jupyter notebook, to build the entire system, manage the change process, and figure out how to deploy it in a way that has an actual impact. The processes that the large software tech companies use for deployment don't work for a lot of other scenarios. For example, when I was leading large speech teams, if the speech recognition system went down, what happened? Alarms went off, and then someone like me would say, hey, you 20 engineers, please fix this. But if a system goes down in a factory, there are not 20 machine learning engineers sitting around that you can page and have fix it. So how do you handle the maintenance, the DevOps, the MLOps, the other aspects of this?
So these are concepts that I think Landing AI and a few other teams are on the cutting edge of, but we don't even have systematic terminology yet to describe some of the stuff we do, because I think we're inventing it on the fly.
SPEAKER_00
01:21:02 - 01:21:12
So you mentioned some people are interested in discovering mathematical beauty and truth in the universe, and you're interested in having a big positive impact in the world.
SPEAKER_01
01:21:12 - 01:21:14
Let me ask a- The two are not inconsistent.
SPEAKER_00
01:21:14 - 01:21:53
No, they're not. I'm only half joking, because you're probably interested a little bit in both. But let me ask a romanticized question. So much of your work and our discussion today has been on applied AI, maybe you can even call it narrow AI, where the goal is to create systems that automate some specific process that adds a lot of value to the world. But there's another branch of AI, starting with Alan Turing, that dreams of creating human-level or superhuman-level intelligence. Is this something you dream of as well? Do you think we human beings will ever build a human-level or superhuman-level intelligence system?
SPEAKER_01
01:21:54 - 01:22:05
I would love to get to AGI and I think humanity will, but whether it takes 100 years or 500 or 5000 I find hard to estimate.
SPEAKER_00
01:22:05 - 01:22:18
Some folks have worries about the different trajectories that path would take, even existential threats of an AGI system. Do you have such concerns, whether in the short term or the long term?
SPEAKER_01
01:22:19 - 01:22:53
I do worry about the long-term fate of humanity. I do wonder about it, the way I worry about overpopulation on the planet Mars. Just not today. I think there will be a day when, maybe, someday in the future, Mars will be polluted and there will be children dying there, and someone will look back at this video and say, how was Andrew so heartless that he didn't care about all these children dying on the planet Mars? And I apologize to the future viewer. I do care about the children, but I just don't know how to productively work on that today.
SPEAKER_00
01:22:53 - 01:23:25
Your picture will be in the dictionary for the people who were ignorant about the overpopulation of Mars. Yes. So that's a long-term problem. Is there something in the short term we should be thinking about, in terms of aligning the values of our AI systems with the values of us humans? Something that Stuart Russell and other folks are thinking about: as these systems develop more and more, we want to make sure they represent the better angels of our nature, the ethics, the values of our society.
SPEAKER_01
01:23:26 - 01:25:33
You know, if you take self-driving cars, the biggest problem with self-driving cars is not that there's some trolley dilemma where you need to teach the car ethics. So, you know, how many times, when you're driving your car, have you faced this moral dilemma of who to crash into? I think self-driving cars will run into that problem roughly as often as we do when we drive our cars. The bigger problem with self-driving cars is when there's a big white truck across the road, and what you should do is brake and not crash into it, and the self-driving car fails and crashes into it. So I think we need to solve that problem first. I think the problem with some of these discussions about AGI alignment, the paperclip problem, is that they are a huge distraction from the much harder problems that we actually need to address today. Some hard problems we need to address today: I think bias is a huge issue. I worry about wealth inequality. AI and the internet are causing an acceleration of concentration of power, because we can now centralize data and use AI to process it, and so it affects industry after industry. The internet industry has a lot of winner-take-most or winner-take-all dynamics, but AI is infecting all these other industries, giving them winner-take-most or winner-take-all flavors as well. Look at what Uber and Lyft did to the taxi industry; we're doing this type of thing to a lot of industries. So we're creating tremendous wealth, but how do we make sure that the wealth is fairly shared? And then, how do we help people whose jobs are displaced? I think education is part of it, and there may be even more that we need to do beyond education. I think bias is a serious issue, and there are nefarious uses of AI, like deep fakes being used for nefarious purposes.
So I worry about some teams, maybe accidentally, and I hope not deliberately, making a lot of noise about problems in the distant future rather than focusing on some of the much harder problems.
SPEAKER_00
01:25:33 - 01:25:55
Yeah, they overshadow the problems we have already today that are exceptionally challenging, like those you said, and even the silly ones, but the ones that have a huge impact, like the lighting variation outside your factory window. That ultimately is what makes the difference between, like you said, the Jupyter notebook and something that actually transforms an entire industry, potentially. Yeah.
SPEAKER_01
01:25:55 - 01:26:11
And I think, for some companies, when a regulator comes to you and says, look, your product is messing things up, fixing it may have a revenue impact. Well, it's much more fun to talk to them about how you promise not to wipe out humanity than to face the actually really hard problems we face.
SPEAKER_00
01:26:13 - 01:26:30
So your life has been a great journey, from teaching, to research, to entrepreneurship. Two questions. One, are there regrets, moments that, if you went back, you would do differently? And two, are there moments you're especially proud of, moments that made you truly happy?
SPEAKER_01
01:26:30 - 01:27:19
You know, I've made so many mistakes. It feels like every time I discover something, I go, why didn't I think of this, you know, five years earlier, or even ten years earlier? And sometimes I read a book and I go, I wish I'd read this book ten years ago; my life would have been so different. Although, that happened recently, and then I was thinking, if only I'd read this book when we were starting Coursera, it could have been so much better. But then I discovered that the book hadn't yet been written when we were starting Coursera, so that made me feel better. But I find that in the process of discovery, we keep on finding out things that seem so obvious in hindsight, and it always takes us so much longer than I wish to figure them out.
SPEAKER_00
01:27:20 - 01:27:35
So on the second question: are there moments in your life that, when you look back, you're especially proud of, or especially happy about, that fill you with happiness and fulfillment?
SPEAKER_01
01:27:35 - 01:28:13
Well, two answers. One is my daughter, Nova. Yes, of course. No matter how much time I spend with her, I just can't spend enough time with her. Congratulations. Thank you. And then second is helping other people. I think to me, the meaning of life is helping others achieve whatever are their dreams, and also to try to move the world forward by making humanity more powerful as a whole. So the times that I felt most happy and most proud were when I felt someone else allowed me the good fortune of helping them a little bit on the path to their dreams.
SPEAKER_00
01:28:14 - 01:29:18
I think there's no better way to end it than talking about happiness and the meaning of life. So it's a huge honor. On behalf of me and millions of people, thank you for all the work you've done. Thank you for talking to us. Thank you so much, thanks. Thanks for listening to this conversation with Andrew Ng. And thank you to our presenting sponsor, Cash App. Download it, use code LexPodcast. You'll get $10, and $10 will go to FIRST, an organization that inspires and educates young minds to become science and technology innovators of tomorrow. If you enjoy this podcast, subscribe on YouTube, give it five stars on Apple Podcast, support it on Patreon, or simply connect with me on Twitter at Lex Fridman. And now, let me leave you with some words of wisdom from Andrew Ng. Ask yourself: if what you're working on succeeds beyond your wildest dreams, would you have significantly helped other people? If not, then keep searching for something else to work on. Otherwise, you're not living up to your full potential. Thank you for listening, and hope to see you next time.