Transcript for #381 – Chris Lattner: Future of Programming and AI
SPEAKER_01
00:00 - 06:55
The following is a conversation with Chris Lattner. It's his third time on this podcast. As I've said many times before, he's one of the most brilliant engineers in modern computing, having created the LLVM Compiler Infrastructure Project, the Clang compiler, the Swift programming language, a lot of key contributions to TensorFlow and TPUs as part of Google. He served as Vice President of Autopilot Software at Tesla, was a software innovator and leader at Apple, and now he co-created a new full-stack AI infrastructure for distributed training, inference, and deployment on all kinds of hardware, called Modular, and a new programming language called Mojo, that is a superset of Python, giving you all the usability of Python but with the performance of C/C++. In many cases, Mojo code has demonstrated over 35,000x speedup over Python. If you love machine learning, if you love Python, you should definitely give Mojo a try. This programming language, this new AI framework and infrastructure, and this conversation with Chris is mind-blowing. I love it. It gets pretty technical at times, so I hope you hang on for the ride. And now a quick few-second mention of each sponsor. Check them out in the description. It's the best way to support this podcast. We've got iHerb for health, Numerai for the world's hardest data science tournament, and InsideTracker for tracking your biological data. Choose wisely, my friends. Also, if you want to work with our team, our amazing team, we're always hiring. Go to lexfridman.com/hiring. And now, onto the full ad reads. As always, no ads in the middle. I try to make this interesting, but if you skip them, if you must, my friends, please still check out the sponsors. I enjoy their stuff, maybe you will too. This show is brought to you by iHerb, a website, a place we can go and get high-quality, selected-just-for-you health and wellness products for great value: inexpensive, affordable. I get fish oil over there. It's one of the main supplements I've taken for a long, long, long time, in pill form. Makes me feel like I'm oiling the machine that is the human body and the human mind. Even just saying that makes you wonder: what is the power of the placebo effect in all of this? I'm actually a big believer in the power of the human mind, coupled with the effectiveness of medication and supplements and nutrition and diet and exercise, all of it. If you couple the belief that the thing will work with stuff that actually works, it's like a supercharge. There's something about the mind allowing the thing to work, and maybe the belief that it works reduces stress and has kind of secondary and tertiary effects that you can't even comprehend on the entirety of the biological system that is a human body. It's so fascinating. And it's so difficult to do good studies on that, because the whole point is you want to study the effect of the entirety of the lifestyle and diet decisions you make on the entirety of the human organism, the billions of organisms that make up a single organism that is you. Anyway, get 22% off with promo code LEX when you go to iherb.com/lex. This show is also brought to you by Numerai, a hedge fund that uses AI and machine learning to make investment decisions. They created a tournament, a challenge for all machine learning gurus to come and compete against each other, to build the best predictive models for financial markets. The stakes are high.
These are the kinds of problems in the machine learning space that I really care about: real-world problems with high stakes, not toy problems, not ImageNet. Now, ImageNet and all that kind of stuff is good for exploring little ideas, the nuances of architectures, training procedures, cool little ideas in the entirety of the pipeline of how to do machine learning, or for education purposes. But if you want to really develop ideas that work in the real world, you should be working on real-world data where the stakes are high. And this is probably one of the hardest tournaments for machine learning in the world. Head over to numer.ai/lex to sign up for the tournament and hone your machine learning skills. That's numer.ai/lex for a chance to play against me and win a share of the tournament prize pool. This show is also brought to you by InsideTracker, a service I use to track biological data. As I was saying, the entirety of the biological organism that is you: the individual cells, all the life force that makes up the details of the cells, and the proteins and the blood and the organs and the individual systems that are interconnected to each other in these incredibly complex ways, that are all asynchronously communicating with each other through chemistry, through physics, through electrical signals, through mechanical signals, through whatever the hell signals that I can't even find the right words for. It's an incredible system. How is it possible that this thing works at all? I'm looking out into the distance now as I say these words, and my visual cortex is processing all of it and somehow makes sense of all of it. This is incredible. Anyway, measuring data on this incredible organism is good, and companies that allow you to measure it in order to make lifestyle decisions are obviously the future. That's why I support InsideTracker. You can get special savings for a limited time when you go to insidetracker.com/lex. This is the Lex Fridman podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Chris Lattner. It's been, I think, two years since we last talked. In that time, you somehow went and co-created a new programming language called Mojo. It's optimized for AI. It's a superset of Python. Let's look at the big picture. What is the vision for Mojo?
SPEAKER_00
06:55 - 08:24
Well, I mean, I think I have to zoom out. So I've been working on a lot of related technologies for many, many years. So I've worked on LLVM and a lot of things in mobile and servers and things like this. But the world's changing, and what's happened with AI is we have new GPUs and new machine learning accelerators and other ASICs and things like that that make AI go real fast. At Google I worked on TPUs. That's one of the biggest, largest-scale deployed systems that exist for AI. And really what you see is, if you look across all of the things that are happening in the industry, there's this new compute platform coming. And it's not just about CPUs, or GPUs, or NPUs, or IPUs, or whatever, all the PUs. It's about how do we program these things? And so for software folks like us, it doesn't do us any good if there's this amazing hardware that we can't use. And one of the things you find out really quick is that having the theoretical capability of programming something, and having the world's power and the innovation of all the smart people in the world get unleashed on something, can be quite different. And so really where Mojo came from was starting from a problem of: we need to be able to take machine learning, take the infrastructure underneath it, and make it way more accessible, way more usable, way more understandable by normal people and researchers and other folks that are not themselves experts in GPUs and things like this. And then through that journey, we realized: hey, we need syntax for this. We need to do a programming language.
SPEAKER_01
08:25 - 09:00
So one of the main features of the language, I say so only partially in jest, is that it allows the file extension to be an emoji, the fire emoji, which is one of the first emojis I've ever seen used as a file extension in my life. And then you ask yourself the question: why in the 21st century are we not using Unicode for file extensions? This is an epic decision, I think clearly the most important decision in Mojo. But you could also just use .mojo as the file extension.
SPEAKER_00
09:00 - 09:08
Well, so, okay, so take a step back. I mean, come on, Lex, you think the world's ready for this? This is a big moment in the world, right? This is where it's released into the world. This is innovation.
SPEAKER_01
09:09 - 09:18
I mean, it really is kind of brilliant. Emojis are such a big part of our daily lives. Why is it not in programming?
SPEAKER_00
09:18 - 09:39
Well, and like, you take a step back and look at what file extensions are, right? They're basically metadata. Right, and so why are we spending all the screen space on them, and all the stuff also, you know, you have them stacked up next to text files and PDF files and whatever else. Like, if you're going to do something cool, you want it to stand out, right? Emojis are colorful, they're visual, they're beautiful, right?
SPEAKER_01
09:40 - 09:44
What's been the response so far? Is there support from, like, Windows, the operating system, in displaying it, like, in the file explorer?
SPEAKER_00
09:44 - 11:09
Yeah, the one problem I've seen is that Git doesn't escape it right, and so it thinks the fire emoji is unprintable, and so it prints out weird hex things if you use the command-line Git tool. But everything else, as far as I'm aware, works fine. And I have faith that Git can be improved. So Git, it's fine. GitHub is fine. Yeah, GitHub is fine. Visual Studio Code, Windows, like all this stuff, totally ready, because people have internationalization in the normal part of their paths. So let's just take the next step, right? The response has been somewhere between "oh wow, that makes sense, cool, I like new things" to "oh my god, you're killing my baby, what are you talking about, this can never happen, how am I going to type this?", like all these things. And so this is something where I think that the world will get there. We don't have to bet the whole farm on this. I think we can provide both paths, but I think it'll be great. Can we have emojis as part of the code? Yeah, so I mean, let's just provide that. So I think that we have partial support for that. It's probably not fully done yet, but yeah, you can do that. For example, in Swift, you can do that for sure. So an example we gave at Apple was the dogcow. Yeah. So that's a classical Mac heritage thing. And so it's the dog and the cow emoji together. And that could be a variable name. But of course, the internet went and made pile-of-poop for everything. Yeah. So, you know, if you want to name your function pile-of-poop, then you can totally go to town, and see how that goes over in code review.
SPEAKER_01
11:12 - 11:21
Okay, so let me just ask a bunch of random questions. So is Mojo primarily designed for AI, or is it a general-purpose programming language?
SPEAKER_00
11:21 - 12:34
Yeah, good question. So it's AI-first, and so AI is driving a lot of the requirements. And so Modular is building and designing and driving Mojo forward. It's not because it's an interesting project theoretically to build; it's because we need it. And so what we at Modular are really tackling is the AI infrastructure landscape and the big problems in AI: the reasons that it is so difficult to use at scale and adopt and deploy, and all these big problems in AI. And so we're coming at it from that perspective. Now, when you do that, when you start tackling these problems, you realize that the solution to these problems isn't actually an AI-specific solution. And so while we're doing this, we're building Mojo to be a fully general programming language. And that means that you can obviously tackle GPUs and CPUs and these AI things, but it's also a really great way to build NumPy and other things like that. Or just, if you look at what many Python libraries are today, often they're a layer of Python for the API, and they end up being C and C++ code underneath them. That's very true in AI; that's true in lots of other domains as well. And so anytime you see this pattern, that's an opportunity for Mojo to help simplify the world and help people have one thing.
SPEAKER_01
12:34 - 12:43
Optimize through simplification, by having one thing. So you mentioned Modular. Mojo is the programming language, Modular is the whole software stack.
SPEAKER_00
12:43 - 14:18
So just over a year ago, we started this company called Modular. Yeah. Okay. What Modular is about is taking AI and up-leveling it into the next generation, right? And so if you take a step back, what's gone on in the last five, six, seven, eight years is that we've had things like TensorFlow and PyTorch and these other systems come in. You've used them. You know this. And what's happened is these things have grown like crazy. They get tons of users in production deployment scenarios. It's being used to power so many systems. I mean, AI is all around us now. It used to be controversial years ago, but now it's a thing. But the challenge with these systems is that they haven't always been thought out with current demands in mind. So you think about it: where were all of them eight years ago? Well, they didn't exist, right? AI has changed so much, and a lot of what people are doing today is very different than when these systems were built. Meanwhile, the hardware side of this has become a huge mess. There are tons of new chips and accelerators, and every big company is announcing a new chip every day, it feels like. And so between that, you have a moving system on one side, a moving system on the other side, and it just turns into a gigantic mess, which makes it very difficult for people to actually use AI, particularly in production deployment scenarios. And so what Modular is doing is we're helping build out that software stack to help solve some of those problems, so that people can be more productive and get more AI research into production. Now, Mojo is a really, really, really important piece of that. And so it is part of that engine and part of the technology that allows us to solve these problems.
SPEAKER_01
14:18 - 14:31
So Mojo is a programming language that allows you to do high-level programming and low-level programming, to do all kinds of programming in that spectrum, that gets you closer and closer to the hardware.
SPEAKER_00
14:32 - 14:35
So, take a step back. So, Lex, what do you love about Python?
SPEAKER_01
14:35 - 14:41
Oh, boy. Where do I begin? What is love? What do I love about Python?
SPEAKER_00
14:41 - 14:43
You're a guy who knows love, I know this.
SPEAKER_01
14:43 - 15:16
Yes. How intuitive it is. How it feels like I'm writing natural language, English. How I can not just write but read other people's code, and how I can understand it faster. It's more condensed than other languages, like the ones I'm really familiar with, like C++ and C. There's a bunch of sexy little features. Yeah. We'll probably talk about some of them, but list comprehensions and stuff like this.
SPEAKER_00
15:17 - 15:25
And don't forget the entire ecosystem of all the packages. It's huge. If you want to do anything, there's always a package.
SPEAKER_01
15:25 - 15:44
Yeah, so it's not just the ecosystem of the packages, it's also the ecosystem of the humans that build them. That's an interesting dynamic. I think something about the usability and the ecosystem makes the thing viral. It grows, and it's a virtuous cycle.
SPEAKER_00
15:44 - 16:11
Well, there were many things that went into that. I think that ML was very good for Python. And so I think that TensorFlow and PyTorch, these systems embracing Python, really took off and helped Python grow. But I think that the major thing underlying it is that Python's like the universal connector. Right, it really helps bring together lots of different systems, so you can compose them and build out larger systems without having to understand how they work. But then what is the problem with Python?
SPEAKER_01
16:11 - 16:15
Well, I guess you could say several things, but probably that it is slow.
SPEAKER_00
16:15 - 16:27
I think that's usually what people complain about. Right, and so, I mean, other people complain about tabs and spaces versus curly braces or whatever, but I mean, those people are just wrong, because it is actually just better to use indentation.
SPEAKER_01
16:29 - 17:13
Wow, strong words. So actually, let's take that tangent. Let's talk about all kinds of design decisions. Oh, come on, Lex, you can push me on it, I can take it. Listen, I've recently left Emacs for VS Code. Okay. The kind of hate mail I had to receive, because on the way to doing that, I also said I've considered Vim. Yep. And chose not to, and went with VS Code. You're just wading into deep religions, right? Anyway, tabs is an interesting design decision, and so you've really written a new programming language here. Yes, it is a superset of Python, but you could make a bunch of different interesting decisions here. And you chose actually to stick with Python in terms of some of the syntax.
SPEAKER_00
17:13 - 18:32
Well, so let me explain why, right? So, I mean, you can explain this in many rational ways. I think that indentation is beautiful, but that's not a rational explanation, right? But I can defend it rationally, right? So, first of all, Python won. It has millions of programmers. It is huge. It's everywhere. It owns machine learning, right? So, factually, it is the thing, right? Second of all, if you look at it, C code, C++ code, Java, whatever, Swift: curly-brace languages also run through formatting tools and get indented. And if they're not indented correctly, first of all, it'll twist your brain around, and it can lead to bugs. There are notorious bugs that have happened across time where the indentation was wrong or misleading, and it wasn't formatted right, and so it turned into an issue. And so what ends up happening in modern, large-scale code bases is people run automatic formatters. So now what you end up with is indentation and curly braces. Well, if you're going to have, you know, the notion of grouping, why not have one thing, right, and get rid of all the clutter and have a more beautiful thing? Right. Also, if you look at many of these languages, it's like, okay, well, you can have curly braces, or you can omit them if there's one statement, or you just enter this entire world of complicated design space that objectively you don't need if you have Python-style indentation.
SPEAKER_01
18:32 - 18:44
So yeah, I would love to actually see statistics on errors made because of indentation, like how many errors are made in Python versus in C++ that have to do with basic formatting, all that kind of stuff.
SPEAKER_00
18:44 - 19:03
I would love to see that. I think it's probably pretty minor, because once you get — like, you use VS Code, I do too. So if you get VS Code set up, it does the indentation for you, generally. Right. And so, you know, it's actually really nice to not have to fight it. And then what you can see is the editor is telling you how your code will work by indenting it, which I think is pretty cool.
SPEAKER_01
19:03 - 19:11
I honestly don't think I've ever — I don't remember having an error in Python because I indented stuff wrong.
SPEAKER_00
19:11 - 20:20
So, I mean, I think that, again, this is a religious thing, and so I can joke about it. I realize that this is such a polarizing thing and everyone wants to argue about it, so I like poking at the bear a little bit, right? But frankly, right, come back to the first point: Python won. Like, it's huge, it's in AI, it's the right thing. For us, we see Mojo being an incredible part of the Python ecosystem. We're not looking to break Python or change it or quote-unquote fix it. We love Python. It's just that our view is that Python is not done yet. And so if you look at, you know — you mentioned Python being slow. Well, there are a couple of different things that go into that, which we can talk about if you want. But one of them is it just doesn't have those features that you would use to do C-like programming. And so if you say, okay, well, I'm forced out of Python and into C for certain use cases — well, then what we're doing is we're saying: okay, why is that? Can we just add those features that are missing from Python back up to Mojo? And then you can have everything that's great about Python, all the things you're talking about that you love, plus not be forced out of it when you do something a little bit more computationally intense or weird or hardware-y or whatever it is that you're doing.
SPEAKER_01
20:20 - 20:30
Well, I mean, many questions I want to ask, but high level again: is it a compiled or an interpreted language? So Python is just-in-time compilation. What's Mojo?
SPEAKER_00
20:31 - 22:16
So Mojo has a complicated answer. It does all the things. So it's interpreted, it's JIT-compiled, and it's statically compiled. And so this is for a variety of reasons. So one of the things that makes Python beautiful is that it's very dynamic. And because it's dynamic, one of the things they added is this powerful metaprogramming feature. And so if you look at something like PyTorch or TensorFlow — or, I mean, even a simple use case — like, you define a class that has the plus method, right? You can overload the dunder methods, like dunder-add, for example, and then the plus method works on your class. And so it has very nice and very expressive dynamic metaprogramming features. In Mojo, we want all those features to come in. Like, we don't want to break Python; we want it all to work. But the problem is, you can't run those super dynamic features on an embedded processor or on a GPU, right? Or if you could, you probably don't want to, just because of the performance. And so we entered this question of saying: okay, how do you get the power of this dynamic metaprogramming into a language that has to be super efficient in specific cases? And so what we did was we said: okay, we'll take that interpreter — Python has an interpreter in it, right? Take the interpreter and allow it to run at compile time. And so now what you get is compile-time metaprogramming. And so this is super interesting and super powerful, because one of the big advantages you get is Python-style expressive APIs. You get the ability to have overloaded operators. And if you look at what happens inside of, like, PyTorch, for example, with automatic differentiation and eager mode, all these things — they're using these really dynamic and powerful features at runtime. But we can take those features and lift them so that they run at compile time.
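To make the dunder-method point concrete, here is a minimal sketch in early-2023 Mojo syntax (Mojo was at roughly a 0.1 at the time of this conversation, so details may have shifted since; `Vec2` is a hypothetical type used only for illustration):

```mojo
@value
struct Vec2:
    var x: Float64
    var y: Float64

    # Defining the dunder method __add__ is what makes `+` work on this type.
    fn __add__(self, rhs: Vec2) -> Vec2:
        return Vec2(self.x + rhs.x, self.y + rhs.y)

fn main():
    let a = Vec2(1.0, 2.0)
    let b = Vec2(3.0, 4.0)
    let c = a + b   # dispatches to __add__
    print(c.x)      # 4.0
    print(c.y)      # 6.0
```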
SPEAKER_01
22:16 - 22:23
So — because C++ does metaprogramming with templates, but it's really messy.
SPEAKER_00
22:23 - 23:59
It's super messy. It was accidental — I mean, different people have different interpretations. My interpretation is that it was made accidentally powerful. It was not designed to be Turing-complete, for example, but that was discovered along the way, accidentally. And so there have been a number of languages in this space. And so they usually have templates, or code instantiation and copying features of various sorts. Some more modern languages — or some newer languages, let's say, that are fairly unknown, like Zig, for example — say: okay, well, let's take all those things you can do at runtime and allow them to happen at compile time. And so one of the problems with C++ — I mean, which is one of many problems with C++ — there we go, strong words. I mean, everybody hates me for a variety of reasons anyway, I'm sure, right? Or it's just the way they show love. I've read enough C++ code to earn a little bit of grumpiness with C++. But one of the problems with it is that the metaprogramming system, templates, is just a completely different universe from the normal runtime programming world. And so if you do metaprogramming and programming, it's just like a different universe: different syntax, different concepts, different stuff going on. And so, again, one of our goals with Mojo is to make things really easy to use, easy to learn, so there's a natural stepping stone. And so as you do this, you say: okay, well, I do programming at runtime, and I do programming at compile time — why are these different things?
SPEAKER_01
23:59 - 24:12
How hard is that to pull off? Because that sounds, to me as a fan of metaprogramming in C++ even — how hard is it to pull that off? That sounds really, really exciting, because you can do the same style of programming at compile time and at runtime.
SPEAKER_00
24:12 - 25:22
That's really, really exciting. Yep. And so, I mean, in terms of the compiler implementation details, it's hard. I won't be shy about that. It's super hard. It requires — I mean, what Mojo has underneath the covers is a completely new approach to the design of the compiler itself. And so this builds on technologies like MLIR, which I already mentioned, but it also includes other things like caching and interpreters and JIT compilers and other stuff like that. You have an interpreter inside the compiler. Yes. And so it really takes the standard model of programming languages and kind of twists it and unifies it with the runtime model, which I think is really cool. And to me, the value of that is that, again, many of these languages have metaprogramming features — like, they grow macros or something, right? Lisp, right? I know your roots, right? And this is a powerful thing, right? And so, you know, if you go back to Lisp, one of the most powerful things about it is that it said that the metaprogramming and the programming are the same. Right. And so that made it way simpler, way more consistent, way easier to understand and reason about, and it made it more composable. So if you've got a library, you can use it both at runtime and at compile time, which is pretty cool.
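A hedged sketch of that "same code at compile time and runtime" idea, again in early-2023 Mojo syntax: `alias` asks the compiler's built-in interpreter to evaluate an expression during compilation (`factorial` is just an illustrative function, not anything from Mojo's standard library).

```mojo
fn factorial(n: Int) -> Int:
    # An ordinary function, callable at runtime like any other.
    if n <= 1:
        return 1
    return n * factorial(n - 1)

# `alias` runs the very same function at compile time via the interpreter;
# the result is baked into the binary as a constant.
alias FACT10 = factorial(10)

fn main():
    print(FACT10)        # computed during compilation
    print(factorial(5))  # computed at runtime
```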
SPEAKER_01
25:22 - 25:38
Yeah. And for machine learning, metaprogramming, I think we could generally say, is extremely useful. And so you get features — I'm jumping around, but there's the feature of autotuning and adaptive compilation that just blows my mind.
SPEAKER_00
25:38 - 26:30
Well, so okay, let's come back to that. So what is machine learning? What is a machine learning model? Like, you take a PyTorch model off the shelf. It's really interesting to me, because what PyTorch and TensorFlow and all these frameworks are kind of pushing compute into is this abstract specification of a compute problem, which then gets mapped in a whole bunch of different ways. And so this is why it became a metaprogramming problem. You want to be able to say: cool, I have this neural net — now run it with batch size 1000, do a mapping across the batch. Or: okay, I want to take this problem and now run it across 1000 CPUs or GPUs. And so this problem of describing the compute, and then mapping it and transforming it — it's actually very profound, and that's one of the things that makes machine learning systems really special.
SPEAKER_01
26:30 - 27:01
Maybe you can describe autotuning, and how you pull it off. I mean, I guess adaptive compilation is what we're talking about as metaprogramming. Yeah, how do you pull off autotuning? I mean, is it as profound as I think it is? You know, we mentioned list comprehensions — to me, from a quick glance at Mojo, which, by the way, I absolutely have to dive into as I realize how amazing this is, it looks like just an incredible feature for machine learning people.
SPEAKER_00
27:01 - 29:14
Yeah, well, so what is autotuning? So take a step back. Autotuning is a feature in Mojo. Very, very little of what we're doing is actually research. Like, many of these ideas have existed in other systems, in other places, and so what we're doing is we're pulling together good ideas, remixing them, and making them into hopefully a beautiful system, right? And so with autotuning, the observation is that, it turns out, hardware systems and algorithms are really complicated. Turns out maybe you don't actually want to know how the hardware works, right? A lot of people don't, right? And so there are lots of really smart hardware people — and I know a lot of them — where they know everything about: okay, the cache size is this, and the number of registers is that, and if you use this vector length, it's going to be super efficient because it maps directly onto what the hardware can do. All the stuff that goes into these things — the GPU has SMs and has a warp size of whatever, right? The tile size of a TPU is 128 — like, these factoids, right? My belief is that most normal people — and I love hardware people also, I'm not trying to offend literally everybody on the internet — but most programmers actually don't want to know this stuff, right? And so you come at it from the perspective of: how do we allow people to build code that's both more abstracted and more portable? Because, you know, it could be that the vector length changes, or the cache size changes, or the tile size of your matrix changes, or — an A100 versus an H100 versus a Volta versus whatever GPU have different characteristics, right? A lot of the algorithms that you run are actually the same, but the parameters, these magic numbers you have to fill in, end up being really fiddly numbers that an expert has to go figure out. And so what autotuning says is: okay, well, guess what? There's a lot of compute out there, right? So instead of having humans go randomly try all the things, or do a grid search, or go search some complicated multidimensional space — we have computers to do that, right? And so what autotuning does is you can say: hey, here's my algorithm. If it's a matrix operation or something like that, you can say: okay, I'm going to carve it up into blocks, I'm going to do those blocks in parallel, and I want this with 128 things that I'm running on, I want it this way or that way or whatever. And you can say: hey, go see which one is actually, empirically, better on this system.
SPEAKER_01
29:15 - 29:18
And then the result of that you cache for that system. Yep.
SPEAKER_00
29:18 - 29:38
You cache it. And so, come back to twisting your compiler brain, right? So not only does the compiler have an interpreter that's used to do metaprogramming — that compiler, that interpreter, that metaprogramming, now has to actually take your code and go run it on a target machine, see which one it likes the best, and then stitch it in, and then keep going.
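As a concrete sketch: the launch-era Mojo manual exposed this through an experimental `autotune` primitive together with an `@adaptive` decorator and a separate search/evaluator step. That API has since been reworked, so treat the imports and calls below as approximate, and the kernel itself as purely illustrative:

```mojo
from algorithm import vectorize
from autotune import autotune  # early-2023 module path; experimental API

@adaptive  # marks a function the compiler may fork into several variants
fn fill_ones(buf: DTypePointer[DType.float32], n: Int):
    # One compilation variant is created per candidate width; a search step
    # (elided here) benchmarks them on the target machine and caches the winner.
    alias width = autotune(4, 8, 16, 32)

    @parameter
    fn store[w: Int](i: Int):
        buf.simd_store[w](i, SIMD[DType.float32, w](1.0))

    vectorize[width, store](n)
```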
SPEAKER_01
29:38 - 29:40
Right. So part of the compilation is machine specific.
SPEAKER_00
29:41 - 30:39
Yeah, well, I mean, this is an optional feature, so you don't have to use it for everything. But yeah — one of the things that we're on a quest for is ultimate performance. Yes, right, ultimate performance is important for a couple of reasons. So if you're an enterprise, you're looking to save costs on compute and things like this: ultimate performance translates to, you know, fewer servers. If you care about the environment: hey, better performance leads to more efficiency. I mean, you could joke and say, like, you know, Python's bad for the environment, and so if you move to Mojo, it's at least 10x better, just out of the box. And then keep going, right? But performance is also interesting because it leads to better products. And so in the space of machine learning, if you reduce the latency of a model, so it runs faster — so every time you query the server running the model, it takes less time — well, then the product team can go and make the model bigger. Well, that actually makes it so you have a better experience as a customer. And so a lot of people care about that.
SPEAKER_01
30:39 - 30:54
So for autotuning, for, like, tile size — you mentioned 128 for TPU — you would specify, like, a bunch of options to try, just in the code, just a simple statement. And then you could just set it and forget it, and wherever it compiles, it'll actually be the fastest.
SPEAKER_00
30:55 - 31:55
Yeah, exactly. And the beauty of this is that it helps you in a whole bunch of different ways, right? So often what will happen is, you know, you've written a bunch of software yourself, right? You wake up one day and say: I have an idea. I go cut some code, I get it to work, I forget about it and move on with life. I come back six months or a year or two years or three years later, you dust it off and you go use it again in a new environment. And maybe your GPU is different, maybe you're running on a server instead of a laptop, maybe whatever. And so the problem now is — okay, well, again, not everybody cares about performance, but if you do, you say: okay, I want to take advantage of all these new features. I don't want to break the old thing, though. Right, and the typical way of handling this kind of stuff before — you know, if you're talking about C++ templates, or you're talking about C with macros — is you end up with ifdefs; you get all these weird things layered in that make the code super complicated. And then how do you test it? Right, it becomes this crazy-complexity, multidimensional space you have to worry about. And, you know, that just doesn't scale very well.
SPEAKER_01
31:57 - 32:13
Actually, let me just jump around to some specific features. Like, the increase in performance here that we're talking about can be just insane. You write that Mojo can provide a 35,000x speedup over Python.
SPEAKER_00
32:13 - 34:00
How does it do that? Yeah — and it can do even more, but we'll get to that. So, first of all, when we say that, we're talking about what's called CPython. It's the default Python that everybody uses: when you type python3, that's typically the one you get, right? CPython is an interpreter. And so interpreters have an extra layer of bytecodes and things like this that they have to go read, parse, and interpret, and that makes them kind of slow from that perspective. And so one of the first things we do is we move to a compiler. And just moving to a compiler, getting the interpreter out of the loop, is a 2x to 5x to 10x speedup, depending on the code. So just out of the gate, just using more modern techniques. Right. Now, if you do that, one of the things you can do is start to look at how CPython lays out data. And so one of the things that CPython did — and this isn't part of the Python spec necessarily, this is just a set of decisions — is that if you take an integer, for example, it'll put it in an object, because in Python everything's an object. And so they do the very logical thing of keeping the memory representation of all objects the same. So all objects have a header, they have payload data, and what this means is that every time you pass around an object, you're passing around a pointer to the data. Well, this has overhead. It turns out that modern computers don't like chasing pointers very much, and things like this. It means that you have to allocate the data, manage it, reference-count it, which is the way that Python uses to keep track of memory. And so this has a lot of overhead. And so if you say: okay, let's get that out of the heap, out of a box, out of an indirection, and into the registers — that's another 10x.
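A hedged sketch of what "into the registers" looks like in early-2023 Mojo, where a typed `fn` gives the compiler enough information to skip the boxed-object machinery entirely (function names are illustrative; Mojo's dynamic `def` support was itself still maturing at the time):

```mojo
# Python-style `def`: values are dynamic objects, so each operation goes
# through pointer chasing and reference counting on heap-allocated boxes.
def add_dynamic(a, b):
    return a + b

# Typed Mojo `fn`: Int is a plain machine integer, so this can compile down
# to a single add instruction with no boxing, no heap, no refcounting.
fn add_typed(a: Int, b: Int) -> Int:
    return a + b
```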
SPEAKER_01
34:00 - 34:06
So it adds up — the reference counting, on every single thing you create, that adds up.
SPEAKER_00
34:06 - 35:12
And if you look at people complaining about the Python GIL, this is one of the things that hurts parallelism, and that's because of the reference counting. So the GIL and reference counting are very tightly intertwined in Python. It's not the only thing, but they're very tightly intertwined. And so then you lean into this and say: okay, cool. Well, modern computers can do more than one operation at a time. So they have vectors. What is a vector? Well, a vector allows you to, instead of taking one piece of data, doing an add or a multiply, and then picking up the next one, do 4 or 8 or 16 or 32 at a time. Well, Python doesn't expose that, because of reasons. And so now you can say: okay, well, you can adopt that. Then you have threads, and you have additional things that you can control in memory. And so what Mojo allows you to do is start taking advantage of all these powerful things that have been built into hardware over time. And the library gives very nice features, so you can just say "parallelize" and do things in parallel. So these are very powerful weapons against slowness, which is why people have been, I think, having fun just taking code and making it go fast, because it's kind of an adrenaline rush to see how fast you can get things.
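For instance, here is a minimal sketch of explicit vectorization in early-2023 Mojo, using the SIMD type and the standard library's `vectorize` helper (module paths are as of that release and may have moved since; `scale_in_place` is a made-up example). The library's `parallelize` helper plays the analogous role across CPU cores:

```mojo
from algorithm import vectorize

fn scale_in_place(buf: DTypePointer[DType.float32], n: Int, factor: Float32):
    @parameter
    fn scale[w: Int](i: Int):
        # Load w floats at once, scale them all, store them back:
        # one SIMD operation in place of w scalar ones.
        buf.simd_store[w](i, factor * buf.simd_load[w](i))

    # Apply `scale` across the buffer at the given SIMD width,
    # with the leftover tail elements handled automatically.
    vectorize[8, scale](n)
```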
SPEAKER_01
35:12 - 35:28
Before we talk about some of the interesting stuff with parallelization and all that, let's first talk about the basics. We talked about the indentation. So this thing looks like Python. It's sexy and beautiful like Python, as I mentioned. Is it a typed language? So what's the role of types?
SPEAKER_00
35:28 - 37:23
Yeah, good question. So Python has types: it has strings, it has integers, it has dictionaries and all that stuff, but they all live at runtime. Right. And because all those types live at runtime in Python, you don't have to spell them. Python also has this whole typing thing going on now, and a lot of people use it — I'm not talking about that; that's kind of a different thing, we can come back to it if you want. But typically, you know, you just say: I have a def, and my def takes two parameters, and you call them a and b, and you don't put a type. Okay, so that is great, but what that does is force what's called a consistent representation. So these things have to be a pointer to an object with the object header, and they all have to look the same. And then when you dispatch a method, you go through all the same indirection paths, no matter what the receiver or what that type is. So what Mojo does is allow you to have more than one kind of type. And so it allows you to say: okay, cool, I have an object, and objects behave like they do in Python — fully dynamic, and that's all great. And for many things, classes, that's all very powerful and very important. But if you want to say: hey, it's an integer, and it's 32 bits or 64 bits or whatever it is, or it's a floating-point value in one of these formats — well, then the compiler can take that, and it can use it to do way better optimization. Again, getting rid of the indirections — that's huge. It means you can get better code completion, because the compiler knows what the type is, and so it knows what operations work on it. And so that's actually pretty huge. And so what Mojo does is allow you to progressively adopt types into your program. And so you can start — again, it's compatible with Python — and then you can add however many types you want, wherever you want them. And if you don't want to deal with it, you don't have to deal with it. And so one of our opinions on this is that it's not that types are the right thing or the wrong thing; it's that they're a useful thing.
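Concretely, progressive typing in early-2023 Mojo looks roughly like this — the same logic with and without declarations (names are illustrative):

```mojo
# Untyped, Python-style: maximally flexible, dynamically dispatched.
def double(x):
    return x * 2

# Progressively typed: the same logic, plus declarations the compiler
# can check at compile time and optimize down to machine arithmetic.
fn double_typed(x: Int) -> Int:
    return x * 2
```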
SPEAKER_01
37:23 - 37:35
Which is kind of optional. It's not strict typing; you don't have to specify a type. Exactly. Okay, so it's starting from the thing that Python is kind of reaching toward right now, with trying to inject types into it.
SPEAKER_00
37:35 - 37:37
Yeah, with a very different approach, but yes, yes.
SPEAKER_01
37:37 - 38:07
What's the different approach? I'm actually one of the people who has not been using types very much in Python. Why, you might ask? Well, because I know the importance — it's like, adults use strict typing, and so I refuse to grow up, in that sense. It's a kind of rebellion. But I just know that it probably reduces the amount of errors — even just, forget about the performance improvements, it probably reduces errors when you do strict typing.
SPEAKER_00
38:07 - 40:40
Yeah, so, I mean, I think it's interesting for you to look at that, right? And the reason I'm giving you a hard time is that there's this cultural norm, this pressure, that there has to be a right way to do things. You know, grown-ups only do it one way, and if you don't do that, you should feel bad, right? Like, some people feel like Python's a guilty pleasure or something, and it's like: when I get serious, I need to go rewrite it, right? Exactly. I mean, cool — I understand history, and I understand kind of where this comes from, but I don't think it has to be a guilty pleasure. Yeah. So if you look at that, you say: why do you have to rewrite it? Well, you have to rewrite it to deploy. Well, why do you want to deploy? Well, you care about performance, or you care about portability, or you want, you know, a tiny thing on the server that has no dependencies — you have objectives that you're trying to attain. So what if Python could achieve those objectives? So if you want types — well, maybe you want types because you want to make sure you're passing around the right thing. Sure, you can add a type. If you don't care — you're prototyping some stuff, you're hacking some things out, you're pulling some random code off the internet — it should just work. Right, and you shouldn't be pressured. You shouldn't feel bad about doing the thing that feels good. Now, if you're on a team, you're working at some massive internet company and you have 400 million lines of Python code — well, they may have a house rule that you use types, right? Because it makes it easier for different humans to talk to each other and understand what's going on, and it catches bugs at scale, right? And so there are lots of good reasons why you might want to use types, but that doesn't mean that everything should use them all the time. So what Mojo does is say: cool, well, allow people to use types. And if you use types, you get nice things out of it, right? You get better performance and things like this. But Mojo is a fully compatible superset of Python. And so that means it has to work without types. It has to support all the dynamic things. It has to support all the packages. It has to support list comprehensions and things like this. And so that starting point, I think, is really important. And you can look at why I care so much about this — there are many different aspects of it, one of which is that the world went through a very challenging migration from Python 2 to Python 3. Right, and this migration took many years, and it was very painful for many teams, right? There's a lot that went on in that. I'm not an expert in all the details, and honestly don't want to be. I don't want the world to have to go through that, right? And, you know, people can ignore Mojo if it's not their thing — that's cool — but if they want to use Mojo, I don't want them to have to rewrite all their code.
SPEAKER_01
40:41 - 40:58
Yeah, I mean, the superset part is — there's just so much brilliant stuff here that is definitely incredible. We'll talk about that. First of all, how is typing implemented differently in Python versus Mojo? How is this heterogeneous flexibility implemented?
SPEAKER_00
40:58 - 43:04
Yeah, so I'm not a full expert on the whole backstory of types in Python, so I'll give you my understanding. My understanding is that, like many dynamic languages, the ecosystem went through a phase where people went from writing scripts to writing huge code bases in Python, at scale. And at scale, it kind of helps to have types. People want to be able to reason about interfaces: do you expect a string or an int? Like, what are these basic things, right? And so what the Python community started doing is it started saying: okay, let's have tools on the side. Checker tools, right? They go and infer some invariants, check for bugs, try to identify things. These are called static analysis tools, generally. And so these tools run over your code and try to look for bugs. What ended up happening is there were so many of these things, so many different weird patterns and different approaches to specifying the types and different things going on, that the Python community realized and recognized: hey, there's a thing here. And so what they started to do is standardize the syntax for adding types to Python. Now, one of the challenges they had is that they're coming from this fragmented world where there are lots of different tools; they have different trade-offs and interpretations, and types mean different things to them. And so if you look at types in Python, according to the Python spec, the types are ignored. According to the Python spec, you can write pretty much anything in a type position — technically, you can write any expression. Now, that's beautiful, because you can extend it, you can do cool things, you can build your own tools, you can build your own house linter or something like that. But it's also a problem, because any existing Python program may be using different tools, and they have different interpretations. And so if you adopt somebody's package into your ecosystem and try to run the tool you prefer, it may throw out tons of weird errors and warnings and problems, just because it's incompatible with how these things work. Also, because they were added late and they're not checked by the Python interpreter, they're always more of a hint than a requirement. Also, the CPython implementation can't use them for performance.
SPEAKER_01
43:04 - 43:10
And that's the big one, right? So you can't utilize them for the compilation, for the just-in-time compilation. Okay.
SPEAKER_00
43:10 - 43:48
Exactly. And this all comes back to the design principle that they're kind of hints. The definitions are a little bit murky. It's unclear exactly what the interpretation is in a bunch of cases. And so because of that, even if you wanted to, it's really difficult to use them to say: it is going to be an int, and if it's not, that's a problem, right? A lot of code would break if you did that. So in Mojo — you can still use those kinds of type annotations, that's fine — but in Mojo, if you declare a type and you use it, then it means it is going to be that type, and the compiler helps you check it and enforce it, and it's safe. It's not a best-effort hint kind of a thing.
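A small sketch of the difference, again in early-2023 Mojo (the function is hypothetical):

```mojo
fn takes_int(x: Int) -> Int:
    return x + 1

fn main():
    print(takes_int(41))      # fine: prints 42
    # takes_int("forty-one")  # rejected by the Mojo compiler at compile time,
    #                         # where a CPython type hint would simply be ignored
```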
SPEAKER_01
43:48 - 43:59
So if you try to shove a string-type thing into an integer, you get an error from the compiler at compile time. Nice. Okay. What kind of basic types are there?
SPEAKER_00
43:59 - 46:04
Yeah. So Mojo is pretty hardcore in terms of what it tries to do in the language. The philosophy there is — again, if you look at Python, Python is a beautiful language because it's so extensible, right? And so all of the different things in Python, like for loops and plus and all these things, can be accessed through these underbar-underbar methods. Okay. So you have to ask: if I can make something that is super fast, if I can go all the way down to the metal, why do I need to have integers built into the language? And so what Mojo does is it says: okay, well, we can have this notion of structs. So in Python you have classes; in Mojo you can also have structs. Classes are dynamic, structs are static. Cool. We can get high performance. We can write C++-kind-of code with structs if you want. These things mix and work beautifully together. But what that means is that you can go and implement strings and ints and floats and arrays and all that kind of stuff in the language, right? And that's really cool, because, to me, as an idealizing compiler-language type of person, what I want to do is get the magic out of the compiler and put it in the libraries. Because if we can build an integer that's beautiful and has an amazing API and does all the things you'd expect an integer to do, but you don't like it — maybe you want a big integer, maybe you want a, like, sideways integer, whatever the space of integers is — then you can do that, and it's not a second-class citizen. And if you look at certain other languages — like C++, which I also love and use a lot — int is hardcoded in the language, but complex is not. And so it's kind of weird that you have this std::complex class, and you have int, and complex tries to look like a natural numeric type and things like this, but integers and floating point have these special promotion rules and other things that are magic and hacked into the compiler. And because of that, you can't actually make something that works like the built-in types.
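To illustrate the "magic in the library, not the compiler" point, here is a hedged sketch of a library-defined complex type in early-2023 Mojo syntax. It picks up operators through the same dunder methods the built-in types use, so it is not a second-class citizen (this struct is illustrative, not Mojo's actual standard library):

```mojo
@value
struct Complex:
    var re: Float64
    var im: Float64

    fn __add__(self, rhs: Complex) -> Complex:
        return Complex(self.re + rhs.re, self.im + rhs.im)

    fn __mul__(self, rhs: Complex) -> Complex:
        # (a+bi)(c+di) = (ac - bd) + (ad + bc)i
        return Complex(
            self.re * rhs.re - self.im * rhs.im,
            self.re * rhs.im + self.im * rhs.re,
        )

fn main():
    let z = Complex(1.0, 2.0) * Complex(3.0, -1.0) + Complex(0.5, 0.0)
    print(z.re)  # 5.5
    print(z.im)  # 5.0
```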
SPEAKER_01
46:05 - 46:19
Is there something provided as a standard? Because, you know, since it's AI-first, numerical types are so important here. So is there something like a nice standard implementation of integers and floats?
SPEAKER_00
46:19 - 46:46
Yeah, so we're still building all that stuff out. So we provide ints and floats and all that kind of stuff. We also provide buffers and tensors and things like that that you'd expect in an ML context. Honestly, we need to keep designing and redesigning and working with the community to build that out and make it better. That's not our strength right now. Give us six months or a year, and I think it'll be way better. But the power of putting it in the library means we can have teams of experts that aren't compiler engineers who can help us design and refine and drive it forward.
SPEAKER_01
46:46 - 47:01
So one of the exciting things we should mention here is that this is new and fresh. This cake is unbaked — it's almost baked, you can tell it's delicious — but it's not fully ready to be consumed. Yep.
SPEAKER_00
47:01 - 47:52
That's very fair. It is very useful, but it's most useful if you're a super-low-level programmer right now, and what we're doing is working our way up the stack. And so the way I would look at Mojo today, in May of 2023, is that it's like a 0.1. So I think that, you know, a year from now, it's going to be way more interesting to a variety of people. But we decided to release it early so that people can get access to it and play with it, and we can build it with the community. We have a big roadmap, fully published — we're being transparent about this — and a lot of people are involved in this stuff. And so what we're doing is we're really optimizing for building the thing the right way. And building it the right way while working with the community is kind of interesting, because everybody wants it yesterday. And so sometimes, you know, there are some dynamics there, but it's the right thing.
SPEAKER_01
47:52 - 48:08
So there's a Discord also, so the dynamics are pretty interesting. Sometimes a community can be very chaotic and introduce a lot of stress. Guido famously quit over the stress of the walrus operator. I mean, you know, the straw.
SPEAKER_00
48:08 - 48:10
Maybe that was the straw that broke the camel's back.
SPEAKER_01
48:10 - 48:24
Exactly. And so it could be very stressful for you as a developer. But can you just — a tangent upon a tangent — is it stressful to work through the design of various features here, given that the community is so recently involved?
SPEAKER_00
48:25 - 50:08
So, I've been doing open development and community stuff for decades. Somehow this has happened to me. So I've learned some tricks, but the thing that always gets me is I want to make people happy, right? And maybe not all people happy all of the time, but generally I want people to be happy, right? And so the challenge is that, again, we're tapping into some deep-seated, long-running tensions and pressures, both in the Python world, but also in the AI world and the hardware world and things like this. And so people just want to move faster. And so again, our decision was: let's release this early, let's get people access to it so they can play with it, and let's build in the open. Which — we could have, you know, had the language monks sitting in the cloister up on the hilltop, beavering away, trying to build something. But in my experience, you get something that's way better if you work with the community. And so yes, it can be frustrating, it can be challenging for lots of people involved. And, you know — you mentioned our Discord — we have over 10,000 people on the Discord, 11,000 people or something. Keep in mind, we released Mojo like two weeks ago. So it's very cool, but what that means is that those 10 or 11 thousand people all want something different. And so what we've done is we've tried to say: okay, cool, here's our roadmap. And the roadmap isn't completely arbitrary; it's based on: here's the logical order in which to build these features or add these capabilities and things like that. And what we've done is we've spun really fast on bug fixes. And so we actually have very few bugs, which is cool — I mean, for a project in this state — but then what we're doing is we're dropping in features very deliberately.
SPEAKER_01
50:08 - 50:26
I mean, this is fun to watch, because you've got two gigantic communities: the hardware, like, systems engineers, and then you have the machine learning Python people that are, like, higher level. Yeah. And it's just two, like, armies — like, they've been at war.
SPEAKER_00
50:26 - 51:17
Yeah, it's been a war, right? And so it's like a Tolkien novel or something. So here's the thing. Again, it's super funny for something that's only been out for two weeks, right? People are so impatient, right? But okay, cool — let's fast-forward a year. In a year's time, Mojo will be actually quite amazing and solve tons of problems and be very good. People will still have these problems, right? And so you look at this, and the way I look at it is to say: okay, well, we're solving big, longstanding problems. To me — again, working on major problems — I want to make sure we do it right, right? There's a responsibility you feel, because if you mess it up, right? There are very few opportunities to do projects like this and have them really have impact on the world. If we do it right, then maybe we can take those feuding armies and actually heal some of those wounds.
SPEAKER_01
51:17 - 51:23
This feels like a speech by George Washington or Abraham Lincoln or something.
SPEAKER_00
51:23 - 51:42
And you look at this and it's like: okay, well, how different are we? Yeah, we all want beautiful things. We all want something that's nice. We all want to be able to work together. We all want our stuff to be used, right? And so if we can help heal that — now, I'm not optimistic that all people will use Mojo and stop using C++, okay, that's not my goal, right? But if we can heal some of that, I think that'd be pretty cool.
SPEAKER_01
51:42 - 51:47
Yeah, and we'll start by putting the people who like braces into the Gulag. No.
SPEAKER_00
51:47 - 51:53
So there are proposals for adding braces to Mojo. We just — that's not going to be a thing. We tell them no.
SPEAKER_01
51:53 - 52:13
Okay. Anyway, so there's a lot of amazing features on the roadmap, and a lot already implemented. It'd be awesome to ask you about some of them, if you're up for it. So the other performance improvement comes from immutability. So what's this var and this let thing that we've got going on?
SPEAKER_00
52:13 - 53:38
What's immutability? Yeah, so one of the things that is useful — it's not always required, but it's useful — is knowing whether something can change out from underneath you, right? And so in Python, you have a pointer to an array, right? And you pass that pointer to the array around to things. If you pass it into a function, it may take it and squirrel it away in some other data structure. So you get your array back, and you go to use it, and now somebody else is putting stuff in your array. How do you reason about that? It gets to be very complicated, and it leads to a lot of bugs. And so one of the things that — again, this is not something Mojo forces on you, but something Mojo enables — is a thing called value semantics. And what value semantics do is take collections like arrays and dictionaries — also tensors and strings and things like this, which are much higher level — and make them behave like proper values. And so it makes it look like, if you pass these things around, you get a logical copy of all the data. And so if I pass you an array — sure, it's your array, you can go do what you want to it. You're not going to hurt my array. Now, that is an interesting and very powerful design principle, and it defines away a ton of bugs. You have to be careful to implement it in an efficient way. Is the performance hit significant? Generally not, if you implement it the right way, but it requires a lot of very low-level getting-the-language-right bits.
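On the let/var piece specifically, a minimal sketch in Mojo as of May 2023 (`let` has since been removed from the language, so this reflects the version being discussed here):

```mojo
fn main():
    let limit = 10    # immutable binding: reassigning it is a compile-time error
    var count = 0     # mutable binding
    while count < limit:
        count += 1
    # limit += 1      # error: cannot assign to a 'let' binding
    print(count)
```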
SPEAKER_01
53:38 - 53:44
I'd assume there'd be a huge performance hit, because the benefit is really nice: you don't run into those bugs. Absolutely.
SPEAKER_00
53:44 - 53:53
Well, the trick is you can't do copies. So you have to provide the behavior of copying without doing the copy. Yeah.
SPEAKER_01
53:53 - 53:56
How do you do that?
SPEAKER_00
53:56 - 56:30
It's not magic. It's actually pretty cool. Well, first, before we talk about how that works, let's talk about how it works in Python. So in Python, you define a person class, or, better example, a database class. The database class has an array of records, something like that. And the problem is that if you pass a record, a class instance, into the database, it'll take hold of that object and assume it has it. And if you're passing an object in, you have to know the database is going to keep it, and therefore you shouldn't change it after you put it in the database. You just have to kind of know that. And so you roll out version one of the database. Of course, Lex uses his own database, right? Because you built it, you understand how it works. Somebody else joins the team. They don't know this, and so now they suddenly get bugs. You, having maintained the database, shake your fist. By the tenth time this happens, you're like, okay, we have to do something different. And so what you do is you change your Python code, and you change your database class to copy the record every time you add it. This is what's called a defensive copy: inside the database, if somebody passes something in, I make my own copy of it, and then they can go do whatever and they're not going to break my thing. These are basically the two design patterns. If you look in PyTorch, for example, this is cloning a tensor: there's a specific thing, you have to know where to call it, and if you don't call it in the right place, you get these bugs. And this is state of the art, right? So, a different approach, used in many languages, and I worked with it in Swift, is to say, okay, well, let's provide value semantics. We want to provide the view that you get a logically independent copy, but we want to do that lazily. And so what we do is say, okay, if you pass something into a function, it doesn't actually make a copy. What it actually does is just increment a reference to it. If you pass it around, stick it in your database, and come back out of the stack, nobody's copied anything, and when the caller lets go of it, well, then you've just handed it off to the database. You've transferred it, and no copies were made. On the other hand, if your coworker hands you a record, you stick it in the database, and then you go to town and start modifying it, what happens is you get a copy, lazily, on demand. And so this gives you copies only when you need them. It defines away the bugs, but it also generally reduces the number of copies in practice.
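Here's a toy copy-on-write sketch in Python of the lazy-copy idea described above; real implementations hook reference counts at a much lower level, and the COWList class is hypothetical:

```python
class COWList:
    """Copy-on-write list: shares storage until the first mutation.
    Real systems use reference counts instead of an explicit flag."""
    def __init__(self, data=None):
        self._data = data if data is not None else []
        self._shared = False          # True once another COWList aliases us

    def share(self):
        # Hand out a logical copy without copying the elements.
        self._shared = True
        other = COWList(self._data)
        other._shared = True
        return other

    def append(self, x):
        if self._shared:              # copy lazily, only on first write
            self._data = list(self._data)
            self._shared = False
        self._data.append(x)

    def __repr__(self):
        return f"COWList({self._data!r})"

a = COWList([1, 2, 3])
b = a.share()      # no element copy happens here
b.append(4)        # b copies now; a is untouched
print(a, b)        # COWList([1, 2, 3]) COWList([1, 2, 3, 4])
```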
SPEAKER_01
56:30 - 56:43
But the implementation details are tricky here. Yeah. So this is, yes, something with reference counting, but making it performant across a number of different kinds of objects.
SPEAKER_00
56:44 - 57:50
Yeah, so you need a couple of things. This concept has existed in many different worlds, so again, it's not novel research at all. The magic is getting the design right so that you can do this in a reasonable way. And there are a number of components that go into it. One is that when you're passing values around, you don't want to do extra reference counting for no good reason; we talked about Python and reference counting and the expense of doing that. So you have to make sure you're efficient and you transfer ownership instead of duplicating references, which is a very low-level problem. You also have to adopt this when you build the data structures. You know, Mojo has to be compatible with Python, so of course the default list is a reference-semantic list that works the way you'd expect in Python. But then you have to design a value-semantic list, implement it, and implement the logic within it. And so the role of the language here is to provide all the low-level hooks that allow the author of the type to express this behavior, without forcing it in all cases or hard-coding it into the language itself.
SPEAKER_01
57:50 - 57:54
But there's ownership, so you're constantly transferring and tracking who owns this thing.
SPEAKER_00
57:54 - 58:37
Yes. And so there's a whole system called ownership, and this is related to work done in the Rust community. The Swift community has also done a bunch of work, and there are a bunch of other languages that have pieces of this. C++ actually has copy constructors and destructors and things like that. I mean, C++ has everything, so it has move constructors and this whole world of things. This is a body of work that's been developing for many, many years now, and Mojo takes some of the best ideas out of all these systems and remixes them in a nice way, so that you get the power of something like the Rust programming language, but you don't have to deal with it when you don't want to, which is a major thing in terms of teaching and learning and being able to use and scale these systems.
SPEAKER_01
58:38 - 58:48
How does that play with argument conventions? What are they, and why are they important? How do value semantics and ownership transfer work with the arguments that are passed in?
SPEAKER_00
58:48 - 01:00:57
So if you go deep into systems programming land, and again, this isn't something for everybody, what you encounter is types that get weird. If you're used to Python, you think about everything as: I can just copy it around, I can change it and mutate it, it's all cool. In systems programming land, you get into things like an atomic number, or a mutex, or a uniquely owned database handle. These are types you can't necessarily copy. Sometimes you can't even move them to a different address. And so what Mojo allows you to do is express: hey, I don't want a copy of this thing, I want to actually just get a reference to it. And by doing that, you can say, okay, if I'm defining something weird, like an atomic number... an atomic number is an area in memory that multiple threads can access at the same time without locks. And so the definition of an atomic number is that multiple different things are poking at it; therefore they have to agree on where it is, and you can't just move it out from underneath one of them, because that kind of breaks what it means. So that's an example of a type that you can't copy and can't move: once you create it, it has to stay where it is. Now look at other examples, like a database handle. Okay, well, how do you copy a database handle? Do you copy the whole database? That's not something you necessarily want to do. There are a lot of types like that, where you want to be able to say they're uniquely owned: there's always exactly one of this thing, and if I create it, I don't copy it. And so what Mojo allows you to do is say: hey, I want to pass this around by reference without copying it. It has a borrowed convention, so you can say: you can use it, but you don't get to change it. Or you can pass it by mutable reference, and then you get a reference to it and you can change it. And so it manages all that kind of stuff.
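Python has no ownership checker, but a run-time toy can sketch the unique-ownership idea; Mojo and Rust enforce this at compile time, and every name below is hypothetical:

```python
class UniqueHandle:
    """Toy uniquely-owned resource: copying is forbidden, and moving
    invalidates the source, loosely mimicking ownership transfer."""
    def __init__(self, name):
        self._name = name

    def move(self):
        # Transfer ownership: the old handle becomes unusable.
        taken = UniqueHandle(self._name)
        self._name = None
        return taken

    def query(self):
        if self._name is None:
            raise RuntimeError("use after move")
        return f"querying {self._name}"

    def __copy__(self):
        raise TypeError("UniqueHandle cannot be copied")

db = UniqueHandle("prod-db")
db2 = db.move()        # ownership transferred
print(db2.query())     # fine: db2 owns the handle now
try:
    db.query()         # the old name is dead
except RuntimeError as e:
    print("error:", e)
```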
SPEAKER_01
01:00:57 - 01:01:15
So it's a really nice implementation of what C++ has, you know, references and smart pointers, different kinds of smart pointers that you can explicitly define. This allows you that. But you're saying that's more like the weird case versus the common case.
SPEAKER_00
01:01:15 - 01:01:59
Well, it depends. I mean, I don't think I'm a normal person, so I'm not one to call other people weird. But if you talk to a normal Python programmer, you're typically not thinking about this; it's a lower level of abstraction. If you talk to a C++ programmer, and certainly if you talk to a Rust programmer (again, they're not weird, they're delightful, these are all good people), those folks think about this all the time. And so I look at it as a spectrum, from very deep, low-level systems, where I'm going to go poke the bits and care about how they're laid out in memory, all the way up to application and scripting and other things like this. It's not that anybody is right or wrong. It's about: how do we build one system that scales?
SPEAKER_01
01:01:59 - 01:02:22
By the way, the idea of an atomic number has always brought me deep happiness, because the flip side of it, the idea that threads can just modify stuff asynchronously, the whole idea of concurrent programming, is a source of infinite stress for me.
SPEAKER_00
01:02:23 - 01:04:10
Well, so this is where you zoom out, get out of programming languages and compilers, and just look at what the industry has done. My mind is constantly blown by this. You look at Moore's Law: for a long time, single-thread performance just got faster and faster and faster for free. But then physics and other things intervened; power consumption and other constraints started to matter. And so what ended up happening is we went from single-core computers to multicore, then we went to accelerators, and this trend toward specialization of hardware is only going to continue. For years, us programming language nerds and compiler people have been saying, okay, well, how do we tackle multicore? For a while it was like, well, multicore is the future, we have to get on top of it. And now there are chips with hundreds of cores in them, and what happened? I'm super inspired by the fact that, in the face of this, the machine learning people invented the idea of a tensor. And what is a tensor? A tensor is an arithmetic and algebraic concept; it's an abstraction around a gigantic parallelizable data set. And because of that, and because of things like TensorFlow and PyTorch, we're able to express the math of the system. That enables automatic differentiation, enables all these cool things. And because you have that abstract representation, you can map it onto these parallel machines without having to manually control, okay, put that right here, put that right there. And this has enabled an explosion in AI compute.
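As a small illustration of that abstraction, assuming PyTorch is installed: the same tensor expression runs unchanged on whatever device is available, with the framework doing the mapping:

```python
import torch

# The same tensor math, with no per-device code: the framework maps the
# matmul onto whichever parallel hardware the tensors live on.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)

c = torch.relu(a @ b)   # expressed as math; scheduled onto the device for us
print(c.device, c.shape)
```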
SPEAKER_01
01:04:10 - 01:04:49
Accelerators, all this stuff. And so that's super exciting. What about deployment, the execution across multiple machines? So you write that the Modular compute platform dynamically partitions models with billions of parameters and distributes their execution across multiple machines, enabling unparalleled efficiency... note the use of "unparalleled" in that sentence. Anyway: enabling unparalleled efficiency, scale, and reliability for the largest workloads. So how do you do this abstraction of distributed deployment of large models?
SPEAKER_00
01:04:49 - 01:09:27
Yeah, so there are a whole bunch of really interesting tensions that go into that. I'll take a random walk through it. If you go back and replay the history of machine learning, and I mean the brief, most recent history, because as you know, this goes very deep. I knew Lex when he had an AI podcast. Yeah. Right. So if you look at just TensorFlow and PyTorch, which is pretty recent history in the big picture: TensorFlow is all about graphs. PyTorch, I think pretty unarguably, ended up winning, and why did it win? Mostly because of usability. And the usability of PyTorch is, I think, a huge testament to the power of taking abstract theoretical technical concepts and bringing them to the masses. Now, the challenge with the TensorFlow-versus-PyTorch design points was that TensorFlow was kind of difficult for researchers to use, but actually pretty good for deployment. PyTorch is really good for researchers, but not super great for deployment. And so I think we as an industry have been struggling. Look at what deploying a machine learning model today means: you have researchers who are wicked smart, of course, but they're wicked smart at model architecture and data and calculus; they're wicked smart in various domains. They don't want to know anything about hardware or deployment or clusters, things like this. And so what happens is the people who train the model throw it over the fence, and other people try to deploy it. Well, any time team A does X, throws it over the fence, and team B does Y, you have a problem, because of course it never works the first time. So you throw it over the fence, they figure out, okay, it's too slow, it doesn't use the right operator, the tool crashes, whatever the problem is, and then they have to throw it back over the fence. And every time you throw a thing over a fence, it takes three weeks of project managers and meetings. And so what we see today is that getting models into production can take weeks or months; that's not atypical. I talk to a lot of people, say a VP of software at some internet company trying to deploy a model, asking: why do I need a team of 45 people, when it's so easy to train a model? Why can't I deploy it? And if you dig into this, every layer is problematic. Look at the language piece, and this is the tip of the iceberg, a very exciting tip of the iceberg: you've got Python on one side and C++ on the other. Python doesn't really deploy. I mean, it can, theoretically, technically, in some cases, but often production teams want to get things out of Python so they get their performance and control and whatever else. So Mojo can help with that. Then look at serving: you mentioned gigantic models. Well, a gigantic model won't fit on one machine. So now you have this model, it's written in Python, it has to be rewritten in C++. Now it also has to be carved up so that half of it runs on one machine and half on another, or maybe it runs on ten machines. Suddenly the complexity is exploding. And the reason is that if you look into TensorFlow and PyTorch, these systems weren't really designed for this world.
They were designed back in the day when we were starting out, and it was a different, much simpler world. You wanted to run ResNet-50 or some ancient model architecture like that. It ran on exactly one GPU; it was a completely different world, and that was the major breakthrough at the time. The world has changed, and the challenge is that the TensorFlow and PyTorch systems weren't actually designed for LLMs; that was not a thing. Now, TensorFlow actually has amazing power in terms of scale and deployment, and Google is, maybe not unmatched, but incredible in its capabilities at gigantic scale. But many researchers are using PyTorch, and PyTorch doesn't have those same capabilities, and so what Modular can do is help with that. Now, take a step back and ask: what is Modular doing? Modular has a bitter enemy that we're fighting against in the industry. And it's one of these things where everybody knows it, but nobody is usually willing to talk about it. The bitter enemy, the bitter thing that we have to destroy, that we're all struggling with, and it's all around us, like fish can't see water: it's complexity.
SPEAKER_01
01:09:27 - 01:09:29
Sure, yes, complexity.
SPEAKER_00
01:09:29 - 01:14:11
That was very quick of you. And so if you look at it: all these accelerators, all the software stacks that go with the accelerators, there's massive complexity over there. Look at what's happening on the modeling side: a massive amount of complexity, things changing all the time, people keep inventing; it turns out the research is not done. And so people want to be able to move fast. Transformers are amazing, but there's diversity even within transformers, and what's the next transformer? And look at serving: also huge amounts of complexity. It turns out all the cloud providers have their very weird but very cool hardware for networking and all this kind of stuff, and it's all very complicated, and people aren't using it. Look at classical serving: there's this whole world of people who know how to write high-performance servers with zero-copy networking and fancy asynchronous I/O and all these fancy things in the serving community, and very little of that has made it into the machine learning world. And why is that? Well, because, again, these systems have been built up over many years. They haven't been rethought; there hasn't been a first-principles approach. So what Modular is doing is saying, okay: we've built many of these things. I've worked on TensorFlow and TPUs and things like that. Other folks on our team have worked on PyTorch core. We've worked on ONNX Runtime, on many of these other systems, on the Apple accelerators and all that kind of stuff. Our team is quite amazing. And one of the things that roughly everybody is grumpy about is that when you're working on one of these projects, you have a first-order goal: get the hardware to work, get the system to enable one more model, get this product out the door, enable this specific workload, or solve this problem for this product team. Nobody's been given a chance to actually take that step back. And so we as an industry didn't take two steps forward; we took like 18 steps forward in terms of all this really cool technology across compilers and systems and runtimes and heterogeneous computing. And all this technology has been, I wouldn't say beautifully designed, but proven, in different quadrants. You look at Google with TPUs: massive, huge exaflops of compute, strapped together into machines that researchers are programming in Python in a notebook. That's huge. That's amazing. It's incredible. And you look at the technology that goes into it, and the algorithms are actually quite general. Lots of other hardware and lots of other teams out there don't have the specialization, or the years working on it, or the budget that Google does. They should be getting access to the same algorithms, but they just don't. And so what Modular is saying is: cool, this is not research anymore. We've built auto-tuning in many systems. We've built programming languages; I've implemented C++, I've implemented Swift, I've implemented many of these things. It's hard, but it's not research. And you look at accelerators: well, there are a bunch of different weird kinds of accelerators, but they actually cluster together. And you look at GPUs.
Well, there are a couple of major vendors of GPUs, and they maybe don't always get along, but their architectures are very similar. You look at CPUs: CPUs are still super important for the deployment side of things, and you see new architectures coming out from all the cloud providers and things like this, and they're all super important to the world. But they don't have the 30 years of development that the entrenched players do. And so what Modular can do is say: okay, all this complexity, it's not bad complexity, it's actually innovation, and it's happening for good reasons. But I have sympathy for the poor software people. I mean, I'm generally a software person too. I love hardware, but software people want to build applications and products and solutions that scale over many years. They don't want to build a solution for one generation of hardware with one vendor's tools. And because of this, they need something that scales with them. They need something that works on cloud and mobile, because their product manager said, hey, I want it to have low latency, or it's better for personalization, or whatever they decide. Products evolve. And so the challenge with the machine learning technology and infrastructure we have today is that it's all point solutions. And because they're all point solutions, as your product evolves, you have to switch to a different technology stack or a different vendor, and that slows down progress.
SPEAKER_01
01:14:12 - 01:14:26
So basically, a lot of the things we've developed in those little silos for machine learning tasks, you want to make first-class citizens of a general-purpose programming language that can be compiled across all these kinds of hardware.
SPEAKER_00
01:14:26 - 01:15:23
Well, so it's not really about a programming language. The programming language is a component of the mission. And the mission, our joking, not-literal mission, is to save the world from terrible AI software. Excellent. Okay. So if you look at that mission, you need a syntax, so yes, you need a programming language. And we wouldn't have had to build the programming language if one existed: if Python was already good enough, then cool, we would have just used it. We're not doing very large-scale, expensive engineering projects for the sake of it; it's to solve a problem. It's also about accelerators. It's also about exotic numerics, and BFloat16, and matrix multiplications and convolutions and this kind of stuff. Within the stack, there are things like kernel fusion. That's an esoteric but really important thing that leads to much better performance and much more general research hackability.
SPEAKER_01
01:15:23 - 01:15:53
And that's enabled by the ASICs; that's enabled by certain hardware. Well, where's the dance between... I mean, several questions here. How do you add a piece of hardware to this stack? If I have this genius invention of a specialized accelerator, how do I add that to the Modular framework? And also, how does Modular, as a standard, start to define the kind of hardware that should be developed?
SPEAKER_00
01:15:53 - 01:17:12
Yeah. So let me take a step back and talk about the status quo. If you go back to TensorFlow 1 and PyTorch 1, that kind of time frame (these have all evolved and gotten way more complicated, so let's go back to the glorious simple days): these things were basically CPUs and CUDA. And so what you'd do is say, go do a dense layer, and a dense layer has a matrix multiplication in it. So when you say that, you're saying, go do this big operation, a matrix multiplication. If it's on a GPU, kick off a CUDA kernel. If it's on a CPU, go do an Intel algorithm or something like that, with the Intel MKL. Okay. Now, that's really cool if you're either NVIDIA or Intel, right? But then more hardware comes in. On one axis, you have more hardware coming in. On the other, you have an explosion of innovation in AI. And so what happened with both TensorFlow and PyTorch is that the explosion of innovation in AI means it's not just about matrix multiplication and convolution: these things now have like 2,000 different operators. And on the other hand, you have I don't know how many pieces of hardware out there. It's a lot. It's not even hundreds; it's probably thousands, across all of Edge and across all the different things that are used at scale. Yeah, exactly.
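A toy sketch of that status-quo dispatch; the names are hypothetical, and real frameworks route each of their ~2,000 operators to hand-written vendor kernels such as cuDNN or MKL:

```python
import numpy as np

# Status-quo style: every (operator, device) pair needs its own kernel.
def matmul_cpu(a, b):      # stand-in for an MKL call
    return a @ b

def matmul_gpu(a, b):      # stand-in for a CUDA kernel launch
    raise NotImplementedError("no GPU in this toy example")

KERNELS = {
    ("matmul", "cpu"): matmul_cpu,
    ("matmul", "gpu"): matmul_gpu,
    # ... now multiply this table by ~2,000 ops and every new chip that ships
}

def dispatch(op, device, *args):
    return KERNELS[(op, device)](*args)

a, b = np.ones((4, 8)), np.ones((8, 2))
print(dispatch("matmul", "cpu", a, b).shape)   # (4, 2)
```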
SPEAKER_01
01:17:12 - 01:17:17
I mean, it's everywhere. It's not just a handful of TPU alternatives.
SPEAKER_00
01:17:17 - 01:17:28
Correct. It's every phone, often with many different chips inside of it from different vendors. AI is everywhere. It's a thing.
SPEAKER_01
01:17:28 - 01:17:35
Right? Why are they all making their own chips? Like, why is everybody making their own thing? And, be quick: is that a good thing?
SPEAKER_00
01:17:35 - 01:22:30
So, Chris's philosophy on hardware? Yeah. Right. So my philosophy is that there isn't one right solution. I think that, again, we're at the end of Moore's Law, and specialization happens. If you're training GPT-5, you want some crazy supercomputer datacenter thingy. If you're making a smart camera that runs on batteries, you want something that looks very different. If you're building a phone, you want something that looks very different. If you have something like a laptop, you want something that looks maybe similar, but at a different scale. AI ends up touching all of our lives, robotics, lots of different things. And these have different power envelopes, different trade-offs in terms of the algorithms; there are new innovations in sparsity and other data formats and things like that. So hardware innovation, I think, is a really good thing, and what I'm interested in is unlocking that innovation. There's also analog and quantum and all the really weird stuff. If somebody can come up with a chip that uses analog computing and is 100x more power efficient, think what that would mean for the daily impact on the products we use. That would be huge. Now, if you're building an analog computer, you may not be a compiler specialist. These are different skill sets. You can hire some compiler people, maybe, if you're running a company, but it turns out these are really exotic, new-generation compilers. This is a different thing. And so if you take a step back and come back to the status quo: the status quo is that if you're Intel or NVIDIA, you keep up with the industry, and you chase: okay, there are 1,900 operators, now there are 2,000, now there are 2,100, and you have a huge team of people trying to keep up and tune and optimize. Even when one of the big guys comes out with a new generation of their chip, they have to go back and rewrite all these things. So really, it's only powered by having hundreds of people all frantically trying to keep up, and what that does is keep out the little guys. And sometimes they're not-so-little guys: big guys that are just not in those dominant positions. And so what has been happening, and you talk about the rise of new exotic accelerators, is that people have been trying to turn this from a "let's go write lots of special kernels" problem into a compiler problem. And so we, and I contributed to this as well, initially went into a "let's make this a compiler problem" phase. Much of the industry is still in that phase, by the way, so I won't say this phase is over. The idea is to say: look, what a compiler does is provide a much more general, extensible, hackable interface for dealing with the general case. And within machine learning algorithms, for example, people figured out that, hey, if I do a matrix multiplication and then a ReLU, the classic activation function, it's way faster to do one pass over the data and apply the ReLU to the output while writing out the data, because the ReLU is just a maximum operation, max with zero. And so it's an amazing optimization: take matmul and ReLU, squish them together into one operation. Now we have matmul-ReLU. Well, wait a second. If I do that, I just went from having two operators to three.
But then I figure out, okay, well, there are a lot of activation functions. What about leaky ReLU? What about the million other things that are out there? And so as I start fusing these in, I get permutations of all these algorithms. So what the compiler people said is: hey, cool, I will go enumerate all the algorithms, enumerate all the pairs, and actually generate a kernel for you. And I think this has been very, very useful for the industry. It's one of the things that powers Google TPUs; PyTorch is rolling out really cool compiler stuff with Triton, this other technology, and things like this. And so the compiler people are coming into the fore and saying, awesome, this is a compiler problem, we'll compile it. Here's the problem: not everybody is a compiler person. I love compiler people, trust me, but not everybody can or should be a compiler person. It turns out there are people who know analog computers really well, or know some GPU internal architecture thing really well, or know some crazy sparse numeric algorithm at the cusp of research, but they're not compiler people. And so one of the challenges with this new wave of turning everything into a compiler is that it has excluded a ton of people. So look at what Mojo does, what the Modular stack does: it brings programmability back into this world. It enables, I wouldn't say normal people, but a new, different kind of delightful nerd, who cares about numerics or hardware or things like this, to express that in the stack and extend the stack without having to go hack the compiler itself.
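A toy Python/NumPy illustration of that enumerate-the-pairs idea; a real compiler applies the activation in the same pass that writes the matmul's output tiles rather than composing functions, so this only sketches the combinatorics:

```python
import numpy as np

ACTIVATIONS = {
    "relu":       lambda x: np.maximum(x, 0.0),
    "leaky_relu": lambda x: np.where(x > 0, x, 0.01 * x),
}

def make_fused_matmul(act_name):
    """Generate a 'fused' matmul+activation kernel for one pair."""
    act = ACTIVATIONS[act_name]
    def fused(a, b):
        # A real fused kernel applies act() while the output is still in
        # registers or cache, skipping a full round trip to memory.
        return act(a @ b)
    return fused

# Enumerate all pairs, the way early ML compilers generated kernels.
FUSED = {name: make_fused_matmul(name) for name in ACTIVATIONS}

a, b = np.random.randn(4, 8), np.random.randn(8, 3)
print(FUSED["relu"](a, b).min() >= 0)   # True: negatives clamped to zero
```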
SPEAKER_01
01:22:30 - 01:22:36
So extend the stack on the algorithm side. Yeah. And then on the hardware side.
SPEAKER_00
01:22:36 - 01:23:14
Yeah. So again, go back to the simplest example, int. What both Swift and Mojo and other things like this did is say: okay, pull the magic out of the compiler and put it in the standard library. And that's what Modular is doing with the engine we're providing and this very deep technology stack, which goes into heterogeneous runtimes and a whole bunch of really cool things. It allows that stack to be extended and hacked and changed, by researchers, by hardware innovators, and by people who know things we don't know, because, you know, we may have some smart people, but it turns out we don't have all the smart people.
SPEAKER_01
01:23:14 - 01:23:15
What are heterogeneous run times?
SPEAKER_00
01:23:15 - 01:26:08
Yeah, so what is heterogeneous? Heterogeneous just means many different kinds of things together. The simple example you might come up with is a CPU and a GPU. A simple heterogeneous computer says: I'll run my data loading and preprocessing and other algorithms on the CPU, and then once the data is in the right shape, I shove it into the GPU, do a lot of matrix multiplications and convolutions, get it back out, do some reductions and summaries, and then shove it across the network to another machine. So you've got what are effectively two computers, a CPU and a GPU, talking to each other, working together in a heterogeneous system. But that was ten years ago. Look at a modern cell phone. You've got CPUs, and not just CPUs: there are big.LITTLE configurations, multiple different kinds of CPUs working together, and they're multicore. You've got GPUs. You've got neural network accelerators. You've got dedicated hardware blocks for media, for video decode and JPEG decode and things like this. So you've got this massively complicated system, and it isn't just cell phones; every laptop these days is doing the same thing. And all these blocks can run at the same time and need to be choreographed. And so again, one of the cool things about machine learning is that it's moving things to data-flow graphs, to higher levels of abstraction, to tensors: representations that don't specify how to do the algorithm, which gives the system a lot more flexibility in how to translate or map or compile it onto the system you have. And what you need, at the bottommost part of the layer, is a way for all these devices to talk to each other. This is one thing I'm very passionate about; I mean, I'm a nerd, but all these machines and all these systems are effectively parallel computers running at the same time, sending messages to each other, fully asynchronously. And this is actually a small version of the same problem you have in a data center. In a data center, you have multiple different machines, sometimes very specialized, sometimes with GPUs on one node and disks on other nodes, so you get a much larger-scale heterogeneous computer. What ends up happening is you have this multi-layer abstraction of hierarchical parallelism and hierarchical asynchronous communication. And again, my enemy is complexity: by taking that away from being different specialized systems at every part of the stack, and having more consistency and uniformity, I think we can help lift the world, make it much simpler, and actually get it used.
SPEAKER_01
01:26:09 - 01:26:18
Well, how do you leverage the strengths of these different specialized systems? Looking inside this smartphone, there are what, five, six computers in there, essentially. How do you, without trying to minimize it, make it explicit which computer is supposed to be used for which operation?
SPEAKER_00
01:26:18 - 01:27:35
Yeah, so there's a pretty well-known algorithm, and what you're doing is looking at two factors. One is the cost of sending data from one thing to another, because it takes time to get it from one side of the chip to the other, and things like this. The other is the time it takes to do an operation on a particular block. So take CPUs. CPUs are fully general; they can do anything. But then you have a neural net accelerator that's really good at matrix multiplications. And so you say, okay, well, if my workload is all matrix multiplications: I start up, I send the data over to the neural net thing, it does the matrix multiplications, and when it's done, it sends me back the result. All is good. So the simplest thing is just saying: do matrix operations over there. But then it gets a little more complicated, because you can do matrix multiplications on a GPU, on a neural net accelerator, or on a CPU, and they'll have different trade-offs in costs, and it's not just matrix multiplication. So what you actually do is look at it as a graph of compute that you want to partition. You look at the communication, the bisection bandwidth and the overhead of sending all these different things, build a model for it, and then decide: it's an optimization problem. Where do I want to place this compute?
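A minimal sketch of that cost-model placement, with invented numbers; real partitioners optimize over a whole graph rather than greedily per op:

```python
# Toy cost-model placement: for each op, pick the device that minimizes
# compute time plus (if we switch devices) a data-transfer penalty.
COMPUTE = {
    ("matmul",  "cpu"): 1.00, ("matmul",  "gpu"): 0.20, ("matmul",  "npu"): 0.10,
    ("reshape", "cpu"): 0.05, ("reshape", "gpu"): 0.30, ("reshape", "npu"): 0.50,
}
TRANSFER = 0.15  # flat cost of moving tensors between devices (invented)

def place(ops):
    placement, prev = [], None
    for op in ops:
        def cost(dev):
            move = TRANSFER if prev is not None and dev != prev else 0.0
            return COMPUTE[(op, dev)] + move
        best = min(("cpu", "gpu", "npu"), key=cost)
        placement.append((op, best))
        prev = best
    return placement

print(place(["reshape", "matmul", "matmul", "reshape"]))
# [('reshape', 'cpu'), ('matmul', 'npu'), ('matmul', 'npu'), ('reshape', 'cpu')]
```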
SPEAKER_01
01:27:35 - 01:27:46
So it's the old-school theoretical computer science problem: scheduling. And then, presumably, it's possible to somehow magically include auto-tuning in this?
SPEAKER_00
01:27:48 - 01:29:22
Absolutely. So, in my opinion, and this is just my opinion, not something everybody agrees on: the world benefits from simple and predictable systems at the bottom that you can control. But once you have a predictable execution layer, you can build lots of different policies on top of it. One policy can be that the human programmer says: do that here, do that here, do that here, fully manually controlling everything, and the system should just do it. Then you quickly get into the mode of, I don't want to have to tell it what to do. So the next logical step people typically take is to write some terrible heuristic: if it's a matrix multiplication, do it over there; if it's floating point, do it on the GPU; if it's integer, do it on the CPU, something like that. Then people care more and more, and you say, okay, well, let's actually make sure it's better. Let's get into auto-tuning. Let's actually search the space to decide what's actually better. Well, then you get into this problem where you realize this is not a small space. This is a hyper-dimensional space that you cannot exhaustively search. So, do you know of algorithms that are good at searching very complicated spaces? Don't tell me you're going to turn this into a machine learning problem. So then you turn it into a machine learning problem, and then you have a space of genetic algorithms and reinforcement learning and all these things. And you include that in the stack, in the Modular stack. Yeah.
SPEAKER_01
01:29:22 - 01:29:26
And where does it live? Is it a separate thing, or is it part of the compilation?
SPEAKER_00
01:29:26 - 01:30:10
So you start from simple and predictable models, so you can have full control, and you can have coarse-grained knobs, so the system doesn't have to do any of this. But if you really care about getting the last ounce out of a problem, you can use additional tools. And the cool thing is, you don't want to do this every time you run a model: you want to figure out the right answer and then cache it. Once you do that, you can say, okay, cool: I can get up and running very quickly, I can get good execution out of my system, and I can decide if something's important. If it's important, I can throw a bunch of machines at it and do a big, expensive search over the space, using whatever technique I feel like, whatever is appropriate to the problem. And then when I get the right answer, cool, I can just start using it.
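A minimal search-then-cache sketch; real auto-tuners search scheduling and tiling spaces and persist results per hardware target, and the cache file here is hypothetical:

```python
import json, os, random, time
import numpy as np

CACHE_FILE = "tuning_cache.json"   # hypothetical cache location

def run_with_block(a, b, block):
    """Blocked matmul; the block size is the knob we tune."""
    n = a.shape[0]
    out = np.zeros((n, n))
    for i in range(0, n, block):
        for j in range(0, n, block):
            out[i:i+block, j:j+block] = a[i:i+block, :] @ b[:, j:j+block]
    return out

def autotune(n=256):
    cache = json.load(open(CACHE_FILE)) if os.path.exists(CACHE_FILE) else {}
    key = f"matmul_{n}"
    if key in cache:                       # tuned before: reuse the answer
        return cache[key]
    a, b = np.random.randn(n, n), np.random.randn(n, n)
    best, best_t = None, float("inf")
    for block in (16, 32, 64, 128, 256):   # search the (tiny) space once
        t0 = time.perf_counter()
        run_with_block(a, b, block)
        t = time.perf_counter() - t0
        if t < best_t:
            best, best_t = block, t
    cache[key] = best
    json.dump(cache, open(CACHE_FILE, "w"))  # cache so we never search again
    return best

print("best block size:", autotune())
```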
SPEAKER_01
01:30:11 - 01:31:08
So you can get out of this trade-off between, okay, am I going to spend forever tuning a thing, or do I get up and running quickly with a quality result. These are actually not in contention with each other if the system is designed to scale. You started with a little bit of a whirlwind overview of how you get the 35,000x speedup or more over Python. Jeremy Howard did a really great presentation of the basics: here's the code, here's how you get the speedup. Like you said, that's something people can probably follow along with to see how you get this gigantic speedup. But can you speak to the machine learning task in general? How do you make some of this code fast, in specifics? What would you say is the main bottleneck for machine learning tasks? Are we talking about matmul, matrix multiplication? How do you make that fast?
SPEAKER_00
01:31:08 - 01:31:41
So, I mean, if you just look at the Python problem, you can say, how do I make Python faster? A lot of people have been working on the "I'm going to make Python 2x or 10x faster" thing, and there have been a ton of projects in that vein. Mojo started from: what can the hardware do? What is the limit of physics? How fast is the silicon? And then, how do I express that? It wasn't anchored on making Python a little bit faster; it's saying: cool, I know what the hardware can do, let's unlock that.
SPEAKER_01
01:31:41 - 01:31:50
I just want to say how gutsy that is: to be in the meeting and, as opposed to asking how we get an incremental improvement, asking, what can the physics do?
SPEAKER_00
01:31:50 - 01:35:52
I mean, maybe I'm a special kind of nerd, but you look at that: what is the limit of physics? How fast can things go? When you start looking at that, typically it ends up being a memory problem. Today, particularly with these specialized accelerators, the problem is that you can do a lot of math within them, but you get bottlenecked sending data back and forth to memory, whether it's local memory or distant memory or disk or whatever it is. And that bottleneck, particularly as training sizes get large and you start doing tons of inference all over the place, becomes a huge bottleneck for people. So again, we went through a phase of many years where people took the special case and hand-tuned it and tweaked it and tricked it out; they knew exactly how the hardware worked, they knew the model, and they made it fast. It didn't generalize. So you can make ResNet-50 or AlexNet or Inception v1 fast, because the models are small; they fit in your head. But as the models get bigger and more complicated, and the machines get more complicated, it stops working. And so this is where things like kernel fusion come in. So what is kernel fusion? It's this idea of saying: let's avoid going to memory, and let's do that by building a new hybrid kernel, a numerical algorithm that keeps things in the accelerator instead of writing them all the way out to memory. What's happened with these accelerators is you get multiple levels of memory. In a GPU, for example, you'll have global memory and local memory and all these things. If you zoom way into how hardware works, the register file is itself a memory: the registers are like an L0 cache. And so a lot of taking advantage of the hardware ends up being fully utilizing its capability. And this has a number of problems, one of which is, again, the complexity disaster. There's too much hardware. Even if you just look at the chips from one vendor, like Apple or Intel, each version of the chip comes out with new features, and they change things so that it takes more or less time to do different things. And you can't rewrite all the software whenever a new chip comes out. So this is where you need a much more scalable approach, and that's what Mojo and the Modular stack provide: the infrastructure and the system for factoring all this complexity, and then allowing people to express algorithms, auto-tuning for example, in a more portable way, so that when a new chip comes out, you don't have to rewrite it all. So, to me, I kind of joke: what is a compiler? There are many ways to explain that. You convert thing A into thing B; you convert source code to machine code. You can talk about many things compilers do. But to me, it's about a bag of tricks. It's a system and a framework on which you can hang complexity, a system that can generalize and work on problems bigger than fit in one human's head. And so what a good stack, what the Modular stack, provides is the ability to walk up to it with a new problem, and it'll generally work quite well.
The typical state of the art today is: you walk up, particularly if you're deploying, with a new model, you try to push it through the converter, and the converter crashes. That's crazy. The state of ML tooling today is not anything a C programmer would ever accept. It's always been this flaky set of tooling that's never been integrated well and never worked together, because it wasn't designed together. It's built by different teams, different hardware vendors, different systems, different internet companies, each trying to solve their own problems. And so what we get is this fragmented, terrible mess of complexity.
SPEAKER_01
01:35:53 - 01:36:04
So, on the specifics: Jeremy showed this. There's the vectorize function, which I guess is built into Mojo.
SPEAKER_00
01:36:04 - 01:36:07
The vectorize he showed is built into the library.
SPEAKER_01
01:36:07 - 01:36:18
Into the library, the standard library. Vectorize and parallelize: vectorize is more low-level, parallelize is higher-level. And there's the tiling thing, which is how he demonstrated the...
SPEAKER_00
01:36:20 - 01:37:30
auto-tune, I think. So think about this in hierarchical levels of abstraction. If you zoom all the way into a compute problem, you have one floating-point number. And then, okay, I could do things one at a time in an interpreter; that's pretty slow. I can get to doing one at a time in a compiler. Then I can get to doing 4 or 8 or 16 at a time with vectors: that's called vectorization. Then you can say, hey, what a multicore computer is, is basically a bunch of computers: they're all independent computers that can talk to each other, and they share memory. And so what parallelize does is say, okay, run multiple instances on different cores, and now they can all work together on a problem. So what you're doing is going out to the next level each time, and asking, how do I take advantage of this? Then tiling is a memory optimization: it says, okay, let's make sure we keep the data close to the compute part of the problem, instead of sending it all back and forth through memory every time I load a block.
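A small Python sketch of those levels, from interpreter loop to vectorized to parallelized; timings vary by machine, and for work this small the parallel overhead can dominate:

```python
import time
import numpy as np
from multiprocessing import Pool

def double_chunk(chunk):
    # One worker doubles its slice of the array (a separate "computer").
    return chunk * 2.0

def timed(label, fn):
    t0 = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - t0:.3f}s")

if __name__ == "__main__":
    x = np.random.randn(2_000_000)

    # Level 0: one element at a time in the interpreter. Slowest.
    timed("interpreter loop", lambda: [v * 2.0 for v in x])

    # Level 1: vectorized; NumPy's compiled loop can use SIMD lanes.
    timed("vectorized", lambda: x * 2.0)

    # Level 2: parallelized; chunks of the array on different cores.
    def parallel():
        with Pool(4) as p:
            p.map(double_chunk, np.array_split(x, 4))
    timed("parallel x4", parallel)
```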
SPEAKER_01
01:37:30 - 01:37:35
And the block size: that's how you get to the auto-tune, to make it optimized.
SPEAKER_00
01:37:36 - 01:38:41
Yeah, well, all of these details matter so much for good performance. This is another funny thing about machine learning and high-performance computing that's very different from the C compilers we all grew up with, where if you get a new version of GCC or a new version of Clang, maybe something goes 1% faster. Compiler engineers will work really, really hard to get half a percent out of your C code, something like that. But when you're talking about an accelerator, or an AI application, or these kinds of algorithms, things people used to write in Fortran, for example: if you get it wrong, it's not 1% or 5%. It could be 2x or 10x. And if you think about it, you really want to make use of the full memory you have, the cache, for example. But if you use too much space, it doesn't fit in the cache, and now you're thrashing all the way back out to main memory, and these can be 2x, 10x, major performance differences. And so getting these magic numbers and these thresholds right is actually quite important.
SPEAKER_01
01:38:41 - 01:38:52
So you mentioned that Mojo is a superset of Python. Can you run Python code as if it's Mojo code?
SPEAKER_00
01:38:54 - 01:40:06
Yes. And this has two sides to it. So, Mojo's not done yet; I'll caveat the claim with that. But already, we see people take small pieces of Python code and move them over, without changing them, and get 12x speedups. Somebody was just tweeting about that yesterday, which is pretty cool. And again, interpreters versus compilers, right? This is without changing any code, without JIT compiling or doing anything fancy; this is just basic stuff, moved straight over. Now, Mojo will continue to grow out, and as it does, it will have more and more features. Our North Star is to be a full superset of Python, so you can bring over basically arbitrary Python code and have it just work. It may not always be 12x faster, but it should be at least as fast, and way faster in many cases. That's the goal. Now, it'll take time to do that, and Python is a complicated language. There are not just the obvious things, but also non-obvious things that are complicated: we have to be able to talk to CPython packages, to talk to the C API, and there are a bunch of pieces to that.
SPEAKER_01
01:40:07 - 01:40:27
So, to make the obvious explicit: to run Python code, that means you have to run all the Python packages and libraries. Yeah. So what's the relationship between Mojo and CPython, the interpreter? Presumably it would be tasked with getting those packages to work.
SPEAKER_00
01:40:27 - 01:42:12
Yep. So in the fullness of time, Mojo will solve all these problems, and you'll be able to move Python packages over and run them in Mojo without CPython. Without CPython, someday. It's not today, but someday. And that'll be a beautiful day, because then you get a whole bunch of advantages, massive speedups and things like this. And you can do it one package at a time. Exactly. But we're not willing to wait for that. Python is too important; that ecosystem is too broad. We want to build Mojo out, and we want to do it the right way, without intense time pressure; we're obviously moving fast. And so what we do is say, okay, well, let's make it so you can import an arbitrary existing package. Arbitrary, including one you wrote yourself on your local disk; not just a standard package, an arbitrary package, and import it using CPython, which runs all the packages. So we built an integration layer where we can actually use CPython, again, being practical, to load and use all the existing packages as they are. The downside is that you don't get the benefits of Mojo for those packages: they run as fast as they do the traditional CPython way. But what that does is give you an incremental migration path. And so if you say, hey, cool, the Python ecosystem is vast, and I want all of it to just work, but certain things are really important... say I'm doing weather forecasting: I want to be able to load all the data and work with it, and then I have my own crazy algorithm inside. Normally I'd write that in C++. If I can write it in Mojo and have one system that scales, that's way easier to work with.
SPEAKER_01
01:42:12 - 01:42:20
Is it hard to build that layer that runs CPython? Because is there some communication back and forth?
SPEAKER_00
01:42:20 - 01:44:40
Yes, it's complicated. I mean, we make it look easy, but it is complicated. What we do is use the existing CPython interpreter, so it's running its own bytecode, and that's how it provides full compatibility. It gives us CPython objects, and we use those objects as they are. That way we're fully compatible with all the CPython objects, and not just the Python part, but also the C packages, the C libraries underneath them, because they're often hybrid. So we can fully run, and we're fully compatible with, all of that. And the way we do it is we have to play by the rules: we keep objects in that representation when they come from that world. What's the representation being used in memory? You have to know a lot about how the CPython interpreter works. It has, for example, reference counting, but also different rules about how to pass pointers around and things like this. It's super low-level and fiddly, and it's not like Python; it's how the interpreter works. So all of that gets exposed, and then you have to define wrappers around the low-level C code. What this means is you have to know not only C, which is a different world from Python, obviously, and not only Python, but the wrappers and the interpreter and the implementation details and the conventions. It's just a really complicated mess. And when you do that, suddenly you have a debugger that debugs Python but can't step into the C code, so you have this two-world problem. By pulling this all into Mojo, what you get is one world. You get the ability to say: cool, I have untyped, very dynamic, beautiful, simple code. Okay, now I care about performance, for whatever reason; there are lots of reasons you might care. So you add types, you parallelize things, you vectorize things, you use these general techniques to solve the problem, and you do it all while staying in the system. And if that one Python package is really important to you, you can move it to Mojo and get massive performance benefits on it. There are other advantages too: if you like types, it's nice if they're enforced rather than being hints; some people like that. And you can do all of this incrementally as you go.
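For a feel of the conventions involved, here's CPython's reference counting observed from Python itself; sys.getrefcount is a real API, though the exact counts are implementation details and can vary by version:

```python
import sys

# CPython objects carry a reference count; every pointer an interop layer
# hands out must be balanced, or objects leak or die early.
obj = [1, 2, 3]
print(sys.getrefcount(obj))   # e.g. 2: `obj` plus getrefcount's argument

alias = obj                   # another reference, no copy of the list
print(sys.getrefcount(obj))   # one higher than before

del alias
print(sys.getrefcount(obj))   # back down: an interop layer must track
                              # exactly these increments and decrements
```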
SPEAKER_01
01:44:40 - 01:44:50
So a different perspective on this would be: why Mojo, instead of making CPython faster, or redesigning CPython?
SPEAKER_00
01:44:50 - 01:45:53
Yeah. Well, I mean, you could argue Mojo is redesigning CPython. But why not make CPython faster and better and things like that? There are lots of people working on that. There's a team at Microsoft that's really improving it. I think CPython 3.11 came out in October or so, and it was 15%, 20% faster across the board, which is pretty huge, given how mature Python is. And so that's awesome; I love it. But it doesn't run on GPUs. It doesn't do AI stuff; it doesn't do vectors. Do I want 25% better, or 35,000 times better? So I'm a huge fan of that work, by the way, and it composes well with what we're doing. It's not like we're fighting; it's goodness for the world, but it's a different path. And again, we're not working forward from making Python a little bit faster; we're working backwards from the limit of physics.
SPEAKER_01
01:45:53 - 01:46:02
What's the process of porting Python code to Mojo? What's involved in that? Is there tooling for it?
SPEAKER_00
01:46:02 - 01:46:15
Not yet. We're missing some basic features right now, and we're continuing to drop in new features on roughly a weekly basis. But, you know, in the fullness of time, give us a year and a half, maybe two years.
SPEAKER_01
01:46:15 - 01:46:17
Is it an automatable process?
SPEAKER_00
01:46:17 - 01:46:20
So when we're ready, it will be very automatable.
SPEAKER_01
01:46:20 - 01:46:28
Yes. Is it automatable? Like, is it possible to automate, in the general case, the Python to Mojo conversion?
SPEAKER_00
01:46:28 - 01:47:30
Yeah. Well, so this is why, among other reasons, being a superset matters. First of all, it's like C versus C++. Can you move C code to C++? Yes, you can move C code to C++, and then you can adopt classes, you can adopt templates, you can adopt references, whatever C++ features you want. Before you move your C code to C++, you can't use templates in C. And if you leave it as C, fine, you can't use the cool features, but it still works, and C and C++ work together. So that's the analogy. Now, here, it's not that Python is bad and Mojo is good. Mojo just gives you superpowers. If you want to stay with Python, that's cool. But the tooling should actually be very beautiful and simple, because we're doing the hard work of defining a superset.
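As a rough illustration of that migration story in Python's own world (this is ordinary gradual typing with standard annotations, not Mojo syntax): the untyped function keeps working untouched, and types get adopted piece by piece.

```python
# Legacy, untyped code: still valid, still runs.
def mean(xs):
    return sum(xs) / len(xs)

# The same code after "adopting the feature": annotations added,
# behavior unchanged. (The list[float] syntax assumes Python 3.9+.)
def mean_typed(xs: list[float]) -> float:
    return sum(xs) / len(xs)

print(mean([1, 2, 3]), mean_typed([1.0, 2.0, 3.0]))  # 2.0 2.0
```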
SPEAKER_01
01:47:30 - 01:47:38
Right. So there are several things to say there. But also, the conversion tooling should probably give you hints as to how you can improve the code.
SPEAKER_00
01:47:38 - 01:49:27
Yeah, exactly. Once you're in the new world, then you can build all kinds of cool tools that say: hey, should you adopt this feature? We haven't built those tools yet, but I fully expect they will exist, and then you can, quote, modernize your code, or however you want to look at it. So, I mean, one of the things that I think is really interesting about Mojo is that there have been a lot of projects to improve Python over the years, everything from getting Python to run on the Java virtual machine, to PyPy, which is a JIT compiler. There are tons of these projects out there that have been working on improving Python in various ways. They fall into one of two camps. PyPy is a great example of the camp that is trying to be compatible with Python. Even there, not really: it doesn't work with all the C packages and stuff like that, but they're trying to be compatible with Python. Then there's another category of these things where they say: well, Python is too complicated, and I'm going to cheat on the edges. For example, integers in Python can be arbitrary-size integers. If you care about an integer fitting in a register and going fast on a computer, that's really annoying. So you can choose to pass on that. You can say: well, people don't really use big integers that often, therefore I'm just not going to do it, and it will be fine. But then you're not a Python superset. Or you can do the hard thing and say: okay, this is Python, and you can't be a superset of Python without being a superset of Python. That's a really hard technical problem, but in my opinion it's worth it. And it's worth it because it's not about any one package, it's about the ecosystem. It's about what Python means for the world. It also means we don't want to repeat the Python 2 to Python 3 transition; we want people to be able to adopt this stuff quickly. And so by doing that work, we can help lift people.
SPEAKER_01
01:49:27 - 01:49:39
Yeah, there are really interesting technical and philosophical challenges to making a language a superset of another language. It breaks my brain a little bit.
SPEAKER_00
01:49:39 - 01:50:50
Well, it paints you into corners. But again, I'm very happy with Python. All joking aside, I think the indentation thing is not actually the important part of the problem. The fact that Python has amazing dynamic metaprogramming features, and that they translate to beautiful static metaprogramming features, I think is profound. I think that's huge. I've talked with Guido about this: Python was not designed to do what we're doing. That was not the reason they built it this way. But because they really cared, and they were very thoughtful about how they designed the language, it scales very elegantly into this space. If you look at other languages, for example C and C++: if you're building a superset, you get stuck with the design decisions of the subset. C++ is way more complicated because of C and the legacy than it would have been if they had theoretically designed a from-scratch thing. And there are lots of people right now trying to make C++ better, saying: a re-syntaxed C++ is going to be great, we'll just change all the syntax. But if you do that, now suddenly you have zero packages. You don't have compatibility.
SPEAKER_01
01:50:50 - 01:51:02
So, if you could just linger on that: what are the biggest challenges of keeping that superset status? What are the things you're struggling with? Does it all boil down to having a big integer?
SPEAKER_00
01:51:03 - 01:53:16
No, I mean, it's a lot of other things. Usually it's the long tail of weird things. So let me give you a war story. A war story in this space: go way back in time. A project I worked on is called Clang. What Clang is, is a C/C++ parser. When I started working on Clang, it must have been 2006 or 2007 when I first started working on it. It's funny how time flies. Yeah. I started that project and I'm like: okay, well, I want to build a C and C++ parser for LLVM. At the time, my view was: GCC is yucky. This is me in earlier times. It's yucky, it's unprincipled, it has all these weird features, all these bugs. So I'm going to build a standards-compliant C and C++ parser. It's going to be beautiful. It'll be amazing, well engineered, all the cool things an engineer wants to do. And so I started implementing it, building it out, building it out. And then I got to #include <stdio.h>. And all of the headers in the world use all the GCC stuff. Okay. And so, again, come back away from theory, back to reality. I was at a fork in the road. I could have built an amazingly beautiful academic thing that nobody would ever use. Or I could say: well, it's yucky in various ways. All these design mistakes, accidents, history, the legacy. At that point, GCC was over 20 years old. By the way, now LLVM is over 20 years old. It's funny how time catches up with you, right? And so you say: okay, well, what is easier? As an engineer, it's actually much easier for me to go implement long-tail compatibility with weird features, even if they're distasteful, and just do the hard work: figure it out, reverse engineer it, understand where it is, write a bunch of test cases, try to understand the behavior. It's way easier to do all that work as an engineer than it is to go talk to all the C programmers, argue with them, and try to get them to rewrite their code.
SPEAKER_01
01:53:18 - 01:53:20
Right. And because that breaks a lot more things.
SPEAKER_00
01:53:20 - 01:53:36
Yeah. And you have realities like: nobody even understands how the code works, because it was written by a person who quit 10 years ago. Software is kind of frustrating that way, but that's how the world works.
SPEAKER_01
01:53:36 - 01:53:40
Yeah, unfortunately, you can never be this perfect beautiful thing.
SPEAKER_00
01:53:41 - 01:54:10
Well, there are occasions in which you get to build, you know, you invent a new data structure or something like that, and there's this beautiful algorithm that makes you super happy. I love that moment. But when you're working with people, you're working with code, in dusty old code bases and things like this, it's not about what's theoretically beautiful, it's about what's practical, what's real, what people actually use. And I don't meet a lot of people who say: I want to rewrite all my code, just for the sake of it.
SPEAKER_01
01:54:10 - 01:54:38
By the way, there could be machine possibilities, and we'll probably talk about it, where AI can help rewrite some code. That might be a farther-out feature, but it's a really interesting one. How that could be a tool in the battle against the monster of complexity that you mentioned. Yeah. You mentioned Guido, the benevolent dictator for life of Python. What does he think about Mojo? Have you talked with him about it?
SPEAKER_00
01:54:39 - 01:56:59
I have talked with him about it. He found it very interesting. We actually talked with Guido before it launched, so he was aware of it before it went public. I have a ton of respect for Guido, for a bunch of different reasons. We talked about the walrus operator; Guido is pretty amazing in terms of steering such a huge and diverse community and driving it forward, and I think Python is what it is thanks to him. And so to me it was really important, starting to work on Mojo, to get his feedback, get his input, and get his eyes on this. Now, a lot of what I think hard about is: how do we not fragment the community? We don't want a Python 2 to Python 3 thing. That was really painful for everybody involved. And so we spent quite a bit of time talking about that, and about some of the tricks I learned from Swift, for example. In the migration to Swift, we managed to not just convert Objective-C into a slightly prettier Objective-C, which we did; we then converted, not entirely, but almost an entire community to a completely different language. And so there are a bunch of tricks you learn along the way that are directly relevant to what we do. This is where, for example, we leverage CPython while bringing up the new thing. That approach is, I think, proven, and it comes from experience. And so Guido was very interested. Python is really his legacy, it's his baby, and I have tons of respect for that. Incidentally, I see Mojo as a member of the Python family. I'm not trying to take Python away from Guido and from the Python community, and so to me it's really important that we're a good member of that community. And so, yeah, you would have to ask Guido this, but I think he was very interested in this notion of: cool, Python gets beaten up for being slow; maybe there's a path out of that. And, you know, look at the far outside case on this, and I'm not saying this is Guido's perspective, but there's this path of saying: suddenly Python can go all the places it's never been able to go before. And that means Python can go even further and have even more impact on the world.
SPEAKER_01
01:56:59 - 01:57:04
So in some sense, Mojo could be seen as Python 4.0.
SPEAKER_00
01:57:06 - 01:57:07
I would not say that.
SPEAKER_01
01:57:07 - 01:57:12
I think that would drive a lot of people really crazy because of the PTSD of the 2-to-3 transition.
SPEAKER_00
01:57:12 - 01:57:20
I'm willing to annoy people about Emacs versus Vim. But not that one. That might be a little bit far, even for me. My skin may not be that thick.
SPEAKER_01
01:57:20 - 01:58:11
But the point is, the step to it being a superset and allowing all these capabilities, I think, is the evolution of a language. It feels like an evolution of a language. So he's interested in the ideas you're playing with, but also concerned about the fragmentation. So what are the ideas you've learned, what are you thinking about, for how to avoid fragmenting the community, where the Pythonistas and the, I don't know what to call the Mojo people, Magicians? I like it. Where they can coexist happily, share code, and basically have these big code bases that are using CPython and more and more moving to Mojo. These were lessons I learned from Swift, and here we face very similar problems. In Swift, you had Objective-C: super dynamic.
SPEAKER_00
01:58:14 - 02:01:25
They have very different syntaxes, right? But you're talking to people who have large-scale code bases. I mean, Apple's got the largest-scale code base of Objective-C code, right? And none of the companies, none of the iOS developers, none of the other developers want to rewrite everything all at once, so you want to be able to adopt things a piece at a time. And a thing that I found worked very well in the Swift community, and this was when Swift was very young, was saying: okay, you have a million-line-of-code Objective-C app. Don't rewrite it all. But when you implement a new feature, go implement that new class using Swift. Now, this turns out to be a very wonderful thing for an app developer, but it's a huge challenge for the compiler team and the systems people that are implementing it. That's right. And this comes back to the trade-off between doing the hard thing that enables scale versus doing the theoretically pure and ideal thing. And so Swift adopted and built a lot of machinery to deeply integrate with the Objective-C runtime, and we're doing the same thing with Python. Now, what happened in the case of Swift is that Swift as a language got more and more mature over time. Incidentally, Mojo is a much simpler language than Swift in many ways, and so I think Mojo will develop way faster than Swift, for a variety of reasons. But as a language gets more mature, in parallel with that you have new people starting new projects. And when the language is mature and somebody's starting a new project, that's when they say: okay, cool, I'm not dealing with a million lines of code. I'll just start and use the new thing for my whole stack. Now, the problem is, again, you come back to: we're communities, we're people that work together. If you build a new subsystem or new feature in Swift, or you build a new thing in Mojo, then you want it to end up being used on the other side. And so you need to work on integration back the other way. It's not just Mojo talking to Python, it's also Python talking to Mojo. And so what I would love to see, not next month, but over the course of time, is people that are building these packages, like NumPy or TensorFlow, these packages that are half Python, half C++, saying: okay, cool, I want to get out of this Python/C++ world into a unified world, so I'll move to Mojo, but I can't give up all my Python clients. These libraries get used by everybody, and they're not all going to switch all at once, and maybe never. Well, the way we should do that is we should build Python interfaces to the Mojo types. That's what we did in Swift, and it worked great. I mean, it was a huge implementation challenge for the compiler people, but there are only a dozen of those compiler people and there are millions of users. And so it's a very expensive, capital-intensive, skill-set-intensive problem, but once you solve it, it really helps adoption, it really helps the community progressively adopt technologies. And so I think this approach will work quite well with the Python and the Mojo world.
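A sketch of that "Python interfaces to the faster thing" pattern, in plain Python: the _fastops module here is hypothetical, but the fallback shape mirrors what CPython's own standard library does (for example, heapq preferring its C accelerator when present).

```python
def dot(xs, ys):
    """Pure-Python reference implementation."""
    return sum(x * y for x, y in zip(xs, ys))

try:
    # Prefer a compiled implementation when one is installed; callers
    # never see the difference. `_fastops` is a hypothetical module name.
    from _fastops import dot
except ImportError:
    pass  # keep the pure-Python version

print(dot([1, 2, 3], [4, 5, 6]))  # 32
```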
SPEAKER_01
02:01:25 - 02:01:52
So for a package: port it to Mojo and then create a Python interface. Yep. So, just to linger on these packages, NumPy, PyTorch, TensorFlow: how do they play nicely together? Let's talk about the machine learning ones. Is Mojo's kind of vision to replace PyTorch and TensorFlow, or to incorporate them? What's the relationship?
SPEAKER_00
02:01:52 - 02:02:08
All right, so let's take a step back. I wear many hats. So you're angling in on the Mojo side. Yes, Mojo is a programming language, and so it can help solve the C/C++/Python mess that's happening.
SPEAKER_01
02:02:08 - 02:02:11
The fire emoji got me. I'm sorry. We should be talking about Modular.
SPEAKER_00
02:02:11 - 02:03:22
Yes. Yes. Okay. So the fire emoji is amazing. I love it. It's a big deal. But the other side of this is that the fire emoji is in service of solving some big AI problems. And the big AI problems are, again, this fragmentation, this hardware nightmare, this explosion of new potential that's not getting felt by the industry. So when you look at how the modular engine helps TensorFlow and PyTorch: it's not replacing them. In fact, when I talk to people, again, they don't like to rewrite all their code. You have people that are using a bunch of PyTorch, a bunch of TensorFlow; they have models they've been building over the course of many years. And when I talk to them, there are a few exceptions, but generally they don't want to rewrite all their code. So what we're doing is saying: okay, well, you don't have to rewrite all your code. What happens is the modular engine goes in there, underneath TensorFlow and PyTorch. It's fully compatible and just provides better performance, better predictability, better tooling. It's a better experience that helps lift TensorFlow and PyTorch and makes them even better. I love Python, I love TensorFlow, I love PyTorch. This is about making the world better, because we need AI to go further.
SPEAKER_01
02:03:22 - 02:03:41
But if I have a process that trains a model, and a process that performs inference on that model, and the model itself: what should I do with that, in the long arc of history? If I use PyTorch to train it, should I rewrite stuff in Mojo if I care about performance?
SPEAKER_00
02:03:42 - 02:04:56
Oh, so, I mean, again, it depends. If you care about performance, then writing it in Mojo is going to be way better than writing it in Python. But if you look at LLM companies, for example, if you look at OpenAI, rumored, and many of the other folks working on LLMs and other innovative machine learning models: on the one hand, they're innovating in the data collection, the billions of parameters, the model architecture, the RLHF, all the cool things people are talking about. But on the other hand, they're spending a lot of time writing CUDA kernels. And so you say: wait a second, how much faster could all this progress go if they were not hand-writing all these CUDA kernels? Now, there are a few technologies out there, and people have been working on this problem for a while, but they try to solve subsets of the problem, again kind of fragmenting the space. And so what Mojo provides for these kinds of companies is the ability to say: cool, I can have a unifying theory. And again, the better-together, the unifying theory, the two-world problem, or the three-world problem, or the N-world problem: this is the thing that is slowing people down. And so as we solve this problem, I think it will be very helpful for making this whole cycle go faster.
SPEAKER_01
02:04:56 - 02:05:24
So obviously, we've talked about the transition from Objective-C to Swift, and you designed that programming language. You've also talked quite a bit about the use of Swift in the machine learning context. Why did you decide to move away from, maybe, an intense focus on Swift for the machine learning context, versus designing a new programming language that happens to be a superset of Python?
SPEAKER_00
02:05:24 - 02:05:26
It's an irrational set of life choices I make.
SPEAKER_01
02:05:26 - 02:05:32
Did you go to the desert and meditate on it? Okay. All right.
SPEAKER_00
02:05:32 - 02:06:15
It was bold and needed, I think; sometimes those leaps are difficult to take. Yeah, well, okay, I mean, I think there are a couple of different things. I actually left Apple back in January 2017, so it's been a number of years since I left Apple, and the reason I left Apple was to do AI. And again, I won't comment on Apple and AI, but at the time I wanted to get in and understand the technology, understand the applications, the workloads. So I went to dive deep into applied AI and then the technology underneath it, right?
SPEAKER_01
02:06:15 - 02:06:20
You found yourself at Google. And that's, like, when TPUs were, yeah, waking up.
SPEAKER_00
02:06:20 - 02:08:04
Exactly. So I found myself at Google, and Jeff Dean, who's a rock star, as you know. In 2017, TensorFlow was really taking off and doing incredible things, and I was attracted to Google to help them with TPUs. TPUs are an innovative hardware accelerator platform that have now, I think, proven out at massive scale and done incredible things. And so one of the things this led into was a bunch of different projects, which I'll skip over, one of which was this Swift for TensorFlow project. That project was a research project. The idea was: okay, let's look at innovative new programming models where we can get a fast programming language, we can get automatic differentiation into the language; let's push the boundaries of these things in a research setting. Now, that project lasted two, three years, and there were some really cool outcomes. One thing that's really interesting: I gave a talk at an LLVM conference in 2018, and it seems like so long ago, about graph program abstraction, which is basically the thing that's in PyTorch 2. PyTorch 2 and Dynamo are all about this graph program abstraction thing from Python bytecodes. And so a lot of the research that was done ended up percolating out through the industry and influencing things, and I think it's super exciting and awesome to see that. But the Swift for TensorFlow project itself did not work out super well, and there were a couple of different problems with it. One of which is that, you may have noticed, Swift is not Python. There are a few people that write Python code. Yes. And so it turns out that all of ML is pretty happy with Python.
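For reference, the graph program abstraction he's describing surfaces in PyTorch 2 as torch.compile, which traces Python bytecode into a graph behind the scenes. A minimal sketch, assuming PyTorch 2.x is installed:

```python
import torch

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

# Dynamo traces f's Python bytecode into a graph and compiles it;
# the call site looks identical to eager PyTorch.
compiled_f = torch.compile(f)

x = torch.randn(4)
print(compiled_f(x))  # ~[1., 1., 1., 1.]
```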
SPEAKER_01
02:08:04 - 02:08:16
It's actually a problem that other programming languages have as well that they're not Python. We'll probably maybe briefly talk about Julia, which is a very interesting, beautiful programming language, but it's not Python.
SPEAKER_00
02:08:16 - 02:09:25
Exactly. And so if you're saying, I'm going to solve a machine learning problem, where all the programmers are Python programmers, and the first thing you tell them is to switch to a different language: well, your new thing may be good or bad or whatever, but if it's a new thing, the adoption barrier is massive. It's still possible. Yeah, absolutely: the world changes and evolves, and there's definitely room for new and good ideas, but it just makes it so much harder. So, lesson learned: Swift is not Python, and people are not always in search of learning a new thing for the sake of learning a new thing, and if you want to be compatible with all the world's code, it turns out you meet the world where it is. The second lesson learned is that Swift, as a very fast and efficient language, kind of like Mojo but a different take on it, really worked well with eager mode. Eager mode is something PyTorch does, and it proved out really well; it enables really expressive, dynamic, easy-to-debug programming. TensorFlow at the time was not set up for that.
SPEAKER_01
02:09:26 - 02:09:29
Let's say it was not set up for that at the time. The timing is also important in this world.
SPEAKER_00
02:09:29 - 02:09:40
Yeah, and TensorFlow is a good thing, and it has many, many strengths, but you could say Swift for TensorFlow was a good idea, except for the Swift part and except for the TensorFlow part.
SPEAKER_01
02:09:42 - 02:09:47
So, Swift, because it's not Python, and TensorFlow, because it's not... It wasn't set up for eager mode at the time.
SPEAKER_00
02:09:47 - 02:10:28
Yeah, it is at this point. Exactly. And so, one of the things about that, in the context of it being a research project: I'm very happy with the fact that we built a lot of really cool technology. We learned a lot of things, and I think the ideas went on to have influence in other systems like PyTorch, which a few people use, right? And so I think that's super cool. And for me personally, I learned so much from it, and I think a lot of the engineers that worked on it also learned a tremendous amount. And so I think that's just really exciting to see. And I'm sorry the project didn't work out. I wish it did, of course. But it was a research project, and so you're there to learn from it.
SPEAKER_01
02:10:30 - 02:10:58
What's interesting to think about is the evolution of programming, as we come up with this whole new set of algorithms in machine learning and artificial intelligence, and what's going to win out. It could be a new programming language. I just mentioned Julia; I think there are a lot of ideas behind Julia that Mojo shares. What are your thoughts about Julia, then?
SPEAKER_00
02:10:58 - 02:12:47
So I will have to say that when we launched Mojo, one of the biggest things I didn't predict was the response from the Julia community. Okay, let me take a step back. I've known the Julia folks for a really long time. They were early adopters of LLVM, a long time ago. They've been pushing the state of the art in a bunch of different ways. Julia is a really cool system. I had always thought of Julia as a mostly scientific-computing-focused environment, and I thought that was its focus. I neglected to understand that one of their missions is to appeal to Python people as well, and so I think that was my error, for not understanding that, and I could have been maybe more sensitive to that. But there are major differences between what Mojo is doing and what Julia is doing. As you say, Julia is not Python. And one of the things a lot of the Julia people came out and said is: okay, well, if we put a ton more energy and money and engineering into Julia, maybe that would be better than starting Mojo. I mean, maybe that's true, but it still won't make Julia into Python. So if you work backwards from the goal of: let's build something for Python programmers without requiring them to relearn syntax, then Julia just isn't that. It's a different thing. And so if you anchor on: I love Julia and I want Julia to go further, then you can look at it from a different lens. But the lens we were coming at it from was: hey, everybody is using Python; Python's syntax isn't broken; let's take what's great about Python and make it even better. So it's just a different starting point. I think Julia is a great language, the community is a lovely community, they're doing really cool stuff, but it's just a slightly different angle.
SPEAKER_01
02:12:48 - 02:13:00
But it does seem that Python is quite sticky. Is there some philosophical, almost, thing you could say about why Python, by many measures, seems to be the most popular programming language in the world?
SPEAKER_00
02:13:01 - 02:14:25
Well, I can tell you things I love about it; maybe that's one way to answer the question. Huge package ecosystem. Super lightweight and easy to integrate. It has very low start-up time. What do you mean by start-up time, like the learning curve or what? Well, if you look at certain other languages, like Java, for example: it takes a long time to JIT-compile all the things, and then the VM starts up, and the garbage collector kicks in, and it revs its engines, and then it can plow through a lot of stuff. Python is like scripting: it just goes. Python has very low compile time, so you're not sitting there waiting. Python integrates into notebooks in a very elegant way that makes exploration super interactive, and it's awesome. Python is also almost the glue of computing, because it has such a simple object representation that a lot of things plug into it. And that dynamic metaprogramming thing we were talking about also enables really expressive and beautiful APIs. So there are lots of technical things Python has done where you can say: okay, wow, this is actually a pretty amazing thing. And any one of those, people can neglect; people just talk about indentation and ignore the fundamental things. But then you also look at the community side. Python owns machine learning. Machine learning is pretty big. Yeah, and it's growing. It's growing and important, right?
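One small, concrete taste of that dynamic metaprogramming point: __getattr__ lets a thin Python object intercept and forward arbitrary attribute access, the kind of hook that makes Python such good glue. This proxy class is purely illustrative, not from any particular library.

```python
class LoggingProxy:
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so any method
        # of the wrapped object can be forwarded transparently.
        print(f"accessing {name!r}")
        return getattr(self._wrapped, name)

items = LoggingProxy([3, 1, 2])
items.sort()           # prints: accessing 'sort'
print(items._wrapped)  # [1, 2, 3]
```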
SPEAKER_01
02:14:25 - 02:14:39
And so it has this reputation, this prestige, through machine learning, to where, if you're a new programmer thinking about which programming language to use: well, I should probably care about machine learning, therefore let me try Python. And it kind of builds and builds and builds.
SPEAKER_00
02:14:39 - 02:14:58
And it even goes back before that. Like, my kids learn Python. Not because I'm telling them to learn Python, but because that's what they're being taught. Well, they also learn Scratch, and things like this, too. But it's because Python is taught everywhere, because it's easy to learn, and because it's pervasive.
SPEAKER_01
02:14:58 - 02:15:09
Back in my day, we learned Java and C++. Yeah. Well, uphill, both directions. But yes, I guess Python is the main language for teaching software engineering in schools now.
SPEAKER_00
02:15:09 - 02:15:31
Yeah, well, if you look at this, there's these growth cycles. Right? If you look at what causes things to become popular and then gain in popularity, there's reinforcing feedback loops and things like this. And I think Python has done, again, the whole community has done a really good job of building those growth loops and help propel the ecosystem. And I think that, again, you look at what you can get done with just a few lines of code. It's amazing.
SPEAKER_01
02:15:32 - 02:16:03
So this kind of self-building loop is interesting to understand, because when you look at Mojo, what it stands for, some of the features, it seems sort of clear that this is a good direction for programming languages to evolve in, in the machine learning community. But it's still not obvious that it will win, because of this, whatever, engine of popularity and virality. Is there something you can speak to? Like, how do you get people to switch?
SPEAKER_00
02:16:03 - 02:16:12
Yeah. Well, I mean, I think the viral growth loop is just what people do in code. I think the Unicode file extensions are what I'm betting on. I think that's going to be the thing.
SPEAKER_01
02:16:12 - 02:16:16
Yeah. Tell the kids that you can use the fire emoji.
SPEAKER_00
02:16:16 - 02:17:34
Exactly. Exactly. Well, in all seriousness, I'll give you two opposite answers. One is: I hope that if it's useful, if it solves problems, and people care about having those problems solved, they'll adopt the tech. That's kind of the simple answer. When you're looking to get tech adopted, the question is: is it solving an important problem people need solved? And is the adoption cost low enough that they're willing to make the switch, cut over, and take the pain up front so that they can actually do it? And so hopefully Mojo will be that for a bunch of people. And, you know, people building these hybrid packages are suffering; it's really painful. So I think we have a good shot at helping people. But the other side is: it's okay if people don't use Mojo. It's not my job to say everybody should do this. I'm not saying Python is bad. I love Python, CPython, all these implementations, because the Python ecosystem is not just CPython; it's also a bunch of different implementations with different trade-offs, and this ecosystem is really powerful and exciting, as are other programming languages. It's not like TypeScript or something is going to go away, right? This is not a winner-take-all thing. And so I hope Mojo is exciting and useful to people, but if it's not, that's also fine.
SPEAKER_01
02:17:34 - 02:18:36
But I also wonder what the use case would be for why you should try Mojo, practically speaking. It seems like there's entertainment: there's the dopamine hit of saying, holy shit, this little piece of code is ten times faster in Mojo. And that's before I even get to thirty-five thousand. I mean, just even that is the dopamine hit that every programmer sort of dreams of: the optimization. It's also a drug that can pull you in and have you waste way too much of your life optimizing and over-optimizing, right? But what do you see as the common case? This is very hard to predict, of course, but if you look 10 years from now and Mojo is super successful, what do you think would be the thing that people try, and then use regularly, and it kind of grows and grows and grows?
SPEAKER_00
02:18:36 - 02:22:04
So you talk about the dopamine hit. Again, humans are not one thing. Some people love rewriting their code, learning new things, and throwing themselves into the deep end to try out new things. In my experience, most people don't; they're too busy, they have other things going on. By and large, most people don't like: I want to rewrite all my code. But even those people, the too-busy people, the people that don't actually care about the language and just care about getting stuff done: those people do like learning new things. So you talk about the dopamine rush of 10x faster: wow, that's cool, I want to do that again. It's also: here's a thing I heard about in a different domain, and I don't have to rewrite all my code; I can learn a new trick. Well, that's called growth. And so one thing that I think is cool about Mojo, and it will take a little bit of time for the blog posts and the books and all that kind of stuff to develop, and for the language to get further along, but what we're doing, you talk about types: you can start with the world you already know, and you can progressively learn new things and adopt them where it makes sense. If you never do that, that's cool; you're not a bad person. If you get really excited and want to go all the way into the deep end and rewrite everything, that's cool too. But I think the middle path is actually the more likely one, where you come across a new idea and you discover: wow, that makes my code way simpler, way more beautiful, way faster, way whatever. And I think that's what people like. Now, if you fast-forward, you said 10 years out: I can give you a very different answer on that. If you go back and look at what computers looked like 20 years ago: every 18 months they got faster, for free. Two times faster every 18 months, like clockwork. It was free. You go back 10 years, and we entered this world where suddenly we had multi-core CPUs and we had GPUs. And if you squint and turn your head, what a GPU is, is a many-core, very simple CPU kind of thing, sort of. So 10 years ago it was CPUs, and GPUs for graphics. Today we have CPUs, graphics, and AI, because AI is so important, because the compute is so demanding, because of the smart cameras and the watches and all the different places AI needs to work in our lives. It's caused this explosion of hardware. And so part of my thesis, part of my belief of where computing goes if you look out 10 years from now, is that it's not going to get simpler. Physics isn't going back to where we came from. It's only going to get weirder from here on out. And so to me, the exciting part about what we're building is that it's about building that universal platform on which the world can continue to get weird. Because, again, I don't think it's avoidable; it's physics. But we can help lift people, help them scale, help them do things with it, without rewriting their code every time a new device comes out. And I think that's pretty cool. So if Mojo can help with that problem, then I think it will hopefully be quite interesting and quite useful to a wide range of people, because there's so much potential, and maybe analog computers will become a thing or something, right? We need to get into a mode where we can move this programming model forward.
But to do so in a way where we're lifting people and growing them, instead of forcing them to rewrite all their code and making their heads explode.
SPEAKER_01
02:22:04 - 02:22:11
Do you think there'll be a few major libraries that go mojo first?
SPEAKER_00
02:22:11 - 02:22:20
Well, so, I mean, the modular engine is all Mojo. So, again, come back to: we're not building Mojo because it's fun. We're building Mojo because we had to, to solve these accelerator problems.
SPEAKER_01
02:22:20 - 02:22:23
That's the origin story. But I mean, the ones that are currently Python?
SPEAKER_00
02:22:23 - 02:23:12
Yeah. So I think a number of these projects will. And one of the things, and this is just my best guess: each of the package maintainers has, I'm sure, plenty of other things going on, and people don't really like rewriting code just for the sake of rewriting code. But sometimes people are excited about adopting a new idea. It turns out that while rewriting code is generally not people's first choice, redesigning something while you rewrite it, using the rewrite as an excuse to redesign, can lead to the 2.0 of your thing that's way better than the 1.0, right? So I have no idea; I can't predict that. But there are a lot of these places where, again, if you have a package that is half C and half Python, you can solve that pain, make it easier to move things faster, make it easier to debug and evolve your
SPEAKER_01
02:23:12 - 02:23:27
tech. So adopting Mojo kind of makes sense to start with, and then it gives you this opportunity to rethink these things. So the two big gains are: there's a performance gain, and then there's the portability to all kinds of different devices.
SPEAKER_00
02:23:28 - 02:24:58
And there's safety, right? You talk about real types. I'm not saying this is for everybody, but that's actually a pretty big thing. Yeah, types. And so there are a bunch of different aspects to the value Mojo provides. And it's funny for me: I've been working on these kinds of technologies and tools for too many years now. You look at Swift, and we talked about Swift for TensorFlow, but Swift as a programming language: Swift is now 13 years old from when I started it, because I started it in 2010, if I remember. And that project, which I was involved with for 12 years or something, has gone through its own really interesting story, and it's a mature, successful, used-by-millions-of-people system. Certainly not dead yet. But in going through that story, I learned a tremendous amount about building languages, about building compilers, about working with communities and things like this. And that experience I'm helping channel and bring directly into Mojo. And, you know, other systems, same thing: building and iterating and evolving things. So you look at this LLVM thing I worked on 20 years ago, and you look at MLIR: a lot of the lessons learned in LLVM got fed into MLIR, and I think MLIR is a way better system than LLVM was. And, you know, Swift is a really good system, and it's amazing, but I hope that Mojo will take the next step forward in terms of design.
SPEAKER_01
02:24:58 - 02:25:11
In terms of running Mojo, people can play with it. What's the Mojo Playground, from the interface perspective and from the hardware perspective? What is this incredible thing running on?
SPEAKER_00
02:25:12 - 02:25:44
Yeah, so right now, here we are, two weeks after launch. We decided that, okay, we have this incredible set of technology that we think might be good, but we have not given it to lots of people yet. And so we were very conservative and said: let's put it in a notebook in the cloud. So if it crashes, we can do something about it; we can monitor and track that. And so, again, things are still super early, but we're having like one person a minute sign up, with over 70,000 people two weeks in. It's kind of crazy.
SPEAKER_01
02:25:44 - 02:25:48
So you can sign up to the Mojo Playground and use it in the cloud.
SPEAKER_00
02:25:48 - 02:26:16
Yeah, in your browser. And what that's running on is cloud VMs. And so you share a machine with a bunch of other people, but it turns out there are a bunch of them now, because there are a lot of people. And so what you're doing is getting free compute and getting to play with the thing in a kind of limited, controlled way, so that we can make sure it doesn't totally crash and be embarrassing, right? So now, a lot of the feedback we've gotten is that people want to download it locally.
SPEAKER_01
02:26:16 - 02:26:21
So you're working on that right now? The goal is to be able to download it locally too?
SPEAKER_00
02:26:21 - 02:28:48
That's what people expect, and so we're working on that right now. We just want to make sure we do it right. And I think this is one of the lessons I learned from Swift, by the way. When we launched Swift, and it actually feels like forever ago, it was 2014, it was super exciting. The team had worked on Swift for a number of years in secrecy. Four years into this development, roughly, about 250 people at Apple knew about it. Okay, so a secret. Apple's good at secrecy, and it was a secret project. And so we launched this at WWDC, with a bunch of hoopla and excitement, and said: developers are going to be able to develop and submit apps to the App Store in three months. Okay, well, several interesting things happened. First of all, we learned that, hey, it had a lot of bugs. It was not actually production quality, and it was extremely stressful trying to get it working for a bunch of people. And so what happened was we went from zero to, I don't know how many developers Apple had at the time, but a lot of developers, overnight, and they ran into a lot of bugs, and it was really embarrassing, and it was very stressful for everybody involved. It was also very exciting, because everybody was excited about it. The other thing I learned is that, when that happened, roughly every software engineer at Apple who did not know about the project had their head explode when it was launched, because they didn't know it was coming. They're like: wait, what is this? I signed up to work for Apple because I love Objective-C. Why is there a new thing? Now, what that meant, practically, is that the push from launch to, first of all, the fall release, but then to 2.0 and 3.0 and all the way forward, was super painful for the engineering team, and for myself; it was very stressful. The developer community was very grumpy about it, because they're like: okay, wait a second, you're changing and breaking my code, and we have to fix the bugs, and it was just a lot of tension and friction on all sides. There was a lot of technical debt in the compiler, because we had to run really fast and go implement the thing and unlock the use case, and, you know, it's not right, but you never have time to go back and do it right. And I'm very proud of the Swift team, because they came so far and made so much progress over this time since launch. It's pretty incredible, and Swift is a very, very good thing. But I just don't want to do that again.
SPEAKER_01
02:28:48 - 02:28:53
So this time you're iterating more through the development process.
SPEAKER_00
02:28:53 - 02:29:47
And so what we're doing is we're not launching it when it's hopefully 0.9 with no testers. We're launching it and saying it's 0.1. And so we're setting expectations, saying: okay, well, don't use this for production. If you're interested in what we're doing, we'll do it in an open way and we can do it together, but don't use it in production yet. We'll get there, but let's do it the right way. And I'm also saying we're not in a race. The thing that I want to do is build the world's best thing. Because if you do it right, and it lifts the industry, it doesn't matter if it takes an extra two months. Two months is worth waiting for. And so doing it right, and not being overwhelmed with technical debt and things like this, is, again, war wounds, lessons learned, whatever you want to say. I think it's absolutely the right thing to do, even though right now people are very frustrated that you can't download it, or that it doesn't have feature X, or something like this.
SPEAKER_01
02:29:48 - 02:30:10
What have you learned in the little bit of time since it's been released into the wild? What have people been complaining about, feature X or Y or Z, and what have they been excited about? Like, detailed things versus the big stuff. I think everyone would be very excited about the big vision.
SPEAKER_00
02:30:10 - 02:31:52
Yeah, yeah. Well, so, I mean, I've been very pleased. In fact, we've been massively overwhelmed with the response, which is a good problem to have. It's kind of like a success disaster. Yeah. And so, I mean, if you go back in time to when we started Modular, which was not yet a year and a half ago, so it's still a pretty new company, a new team, small but a very good team of people: we started with extreme conviction that there's a set of problems that we need to solve, and that if we solve them, then people will be interested in what we're doing. But, again, you're building basically in secret. You're trying to figure it out; creation's a messy process. You're having to go down different paths and understand what you want to do and how to explain it. Often, when you're doing disruptive and new kinds of things, just knowing how to explain it is super difficult. And so when we launched, we hoped people would be excited. But, you know, I'm an optimist, and I also didn't want to get ahead of myself. And when people found out about Mojo, I think their heads exploded a little bit. Here's, I think, a pretty credible team that has built some languages and some tools before, so they have some lessons learned, and they're tackling some of the deep problems in the Python ecosystem and giving it the love and attention it should be getting. And I think people got very excited about that. So, I mean, there are people excited about ownership and taking a step beyond Rust; there are people that are very excited about that. There are people excited about: I just made Game of Life go 400 times faster, and things like that, and that's really cool. And there are people that are really excited about the: okay, I really hate writing stuff in C++. Same for me.
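The community's Game of Life code isn't reproduced in the transcript, so as a stand-in, here's the kind of rewrite that produces those speedups in the Python world: a vectorized life step using NumPy array operations instead of nested Python loops.

```python
import numpy as np

def life_step(grid):
    # Count each cell's eight neighbors at once by shifting the whole
    # grid (np.roll wraps around, so the board is toroidal).
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Conway's rules: live cells survive with 2-3 neighbors,
    # dead cells come alive with exactly 3.
    alive = (neighbors == 3) | ((grid == 1) & (neighbors == 2))
    return alive.astype(grid.dtype)

rng = np.random.default_rng(0)
world = rng.integers(0, 2, size=(8, 8))
print(life_step(world))
```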
SPEAKER_01
02:31:52 - 02:31:55
Like systems engineers, they're stepping up, like, yeah, yeah.
SPEAKER_00
02:31:55 - 02:32:02
So that's me, by the way, also. I do want to stop writing C++.
SPEAKER_01
02:32:02 - 02:32:11
But, um, I get third-person excitement when people tweet: hey, I made this Game of Life code, or whatever, faster. You're like...
SPEAKER_00
02:32:11 - 02:32:26
Yeah. And also, let me cast blame out to the people who deserve it. Sure. These terrible people who convinced me to do some of this. Yes. Jeremy Howard. Yes. That guy.
SPEAKER_01
02:32:27 - 02:32:29
Well, he's been pushing for this kind of thing.
SPEAKER_00
02:32:29 - 02:32:30
He's wanted this for years.
SPEAKER_01
02:32:30 - 02:32:32
Yeah, he's wanted this for a long long time.
SPEAKER_00
02:32:32 - 02:32:33
He's wanted this for years.
SPEAKER_01
02:32:33 - 02:32:54
And for people who don't know Jeremy Howard, he's one of the most legit people in the machine learning community. He's grassroots. He really teaches; he's an incredible educator, an incredible teacher, but also legit as a machine learning engineer himself. Yes. And he's been running fast.ai, and looking, I think, for exactly what you've done.
SPEAKER_00
02:32:54 - 02:33:51
Exactly. And so, I mean, I met Jeremy pretty early on, but the first time I sat up and said, this guy is ridiculous, was when I was at Google and we were bringing up TPUs. We had a whole team of people, and there was this competition called DAWNBench: who can train ImageNet fastest. Yes. And Jeremy and one of his researchers crushed Google. Not through sheer force of an amazing amount of compute and the number of TPUs and stuff like that; he just decided that progressive image resizing was the right way to train the model, and if each epoch is faster, the whole thing goes vroom, right? And I'm like: this guy is incredible. So, anyways, come back to: where is Mojo coming from? Chris finally listened to Jeremy. It's all his fault.
SPEAKER_01
02:33:51 - 02:34:19
There's a kind of very refreshing, pragmatic view that he has about machine learning. I don't know if it's a mix of desire for efficiency, but it's ultimately grounded in a desire to make machine learning more accessible to a lot of people. I don't know what that is. I guess that's coupled with efficiency and performance, but it's not just an obsession with performance.
SPEAKER_00
02:34:19 - 02:36:03
So a lot of AI and AI research ends up having to go fast enough to get scale. A lot of people don't actually care about performance, particularly on the research side, until it allows them to have a bigger data set. And so suddenly now you care about distributed computing, all this exotic HPC stuff; you don't actually want to know about that, you just want to be able to do more experiments faster, and do so with bigger data sets. And so Jeremy has been really pushing limits. One of the things I'll say about Jeremy, and there are many things I could say about Jeremy, because I'm kind of a fanboy of his, is that it all fits in his head. Jeremy actually takes the time, where many people don't, to really dive deep into: why is the beta parameter of the Adam optimizer equal to this? He'll go survey and understand all the activation functions and the trade-offs, and why everybody doing this kind of model picks that thing, not just trying different values; like, really, what is going on here? And as a consequence of that, he makes the time, he spends the time, to understand things at a depth that a lot of people don't. And, as you say, he then brings it out and teaches people, and his mission is to help lift people. His website says: making AI uncool again. It's about: forget about the hype; it's actually practical and useful; let's teach people how to do this. Now, the problem Jeremy struggles with is that he's pushing the envelope. Research isn't about staying on the happy path, the well-paved road. And a lot of the systems today are these really fragile, fragmented things with special cases along that happy path, and if you fall off the happy path, you get eaten by an alligator.
SPEAKER_01
02:36:03 - 02:36:18
So, Python has this giant ecosystem of packages and a package repository. Do you have ideas for how to do that well for Mojo? How to do a repository of packages?
SPEAKER_00
02:36:18 - 02:36:30
Well, so that's another really interesting problem that I knew about, but I didn't understand how big of a problem it was: Python packaging. A lot of people have very big pain points and a lot of scars from Python packaging.
SPEAKER_01
02:36:30 - 02:36:34
Oh, you mean building and distributing?
SPEAKER_00
02:36:34 - 02:36:37
Yes, managing dependencies and versioning and all this stuff.
SPEAKER_01
02:36:37 - 02:36:40
So from the perspective of if you want to create your own package.
SPEAKER_00
02:36:40 - 02:38:04
Yes. Or you want to build on top of a bunch of other people's packages, and then they get updated, and things like this. Now, I'm not an expert in this, so I don't know the answer. I think this is one of the reasons it's great that we work as a team, and there are other really good, inspired people involved. One of the things I've heard from smart people who've done a lot of this is that packaging becomes a huge disaster when you get Python and C together. And so if you have this problem where you have code split between Python and C, now not only do you have to package the C code, you have to build the C code. C doesn't have a package manager, right? C doesn't have a dependency version management system. And so I'm not an expert on the state of the art in all the different Python package managers, but my understanding is that's a massive part of the problem, and I think Mojo solves that part of the problem directly, head-on. Now, one of the things I think we'll do with the community, and, again, we're not solving all the world's problems at once, we have to be kind of focused to start with, is that I think we'll have an opportunity to re-evaluate packaging. And so I think we can come back and say: okay, well, given the new tools and technologies and the cool things we've built, because we have not just syntax, we have an entirely new compiler stack that works in a new way, maybe there are other innovations we can bring together, and maybe we can help solve that problem.
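A minimal sketch (assuming setuptools) of why the hybrid case hurts: the moment a package contains C, its packaging must also describe how to build that C, something pure-Python packages never worry about. The package and file names here are hypothetical.

```python
from setuptools import setup, Extension

setup(
    name="hybridpkg",
    version="0.1",
    py_modules=["hybridpkg"],           # the Python half ships as source
    ext_modules=[
        # The C half has to be compiled on the user's machine, or
        # prebuilt as a wheel for every platform you care about.
        Extension("hybridpkg._fast", sources=["fast.c"]),
    ],
)
```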
SPEAKER_01
02:38:04 - 02:38:37
Almost as a tangent to that question, from the user perspective on packages: it was always surprising to me that it was not easier to explore and find packages, you know, with pip install. It's an incredible ecosystem, and it's just interesting that it still, I think, hasn't been made easy to discover packages, to do search and discovery.
SPEAKER_00
02:38:37 - 02:39:21
Well, I mean, it's kind of funny, because this is one of the challenges of these intentionally decentralized communities. And so I don't know what the right answer is for Python; I don't even know the right answer for Mojo. There are many people who have much more informed opinions than I do. But it's interesting if you look at open source communities. There's Git: Git is fully decentralized, and anybody can do it any way they want. But then there's GitHub. And GitHub, centralized and commercial in that case, really helped pull things together, helped solve some of the discovery problems, and helped build a more consistent community. And so maybe there are opportunities for something like a GitHub for Mojo.
SPEAKER_01
02:39:21 - 02:39:31
Yeah. Although even GitHub, and I might be wrong on this, but the search and discovery for GitHub was not that great. I still use Google search.
SPEAKER_00
02:39:31 - 02:39:45
Yeah. Well, I mean, maybe that's because GitHub doesn't want to replace Google search. And I think there is room for specialized solutions to specific problems. But, sure, I don't know. I don't know the right answer for GitHub either. They can go figure that out.
SPEAKER_01
02:39:46 - 02:39:51
But the point is to have an interface that's usable, that's accessible to people with all different skill levels.
SPEAKER_00
02:39:51 - 02:40:21
Well, and again, what's the benefit of standards, right? Standards allow you to build the next-level-up ecosystem and next-level-up infrastructure and next-level-up things. And so again, come back to: I hate complexity. C plus Python is complicated. It makes everything more difficult to deal with. It makes it difficult to port, to move code around, to work with; all these things get more complicated. I mean, I'm not an expert, but maybe Mojo can help a little bit by helping reduce the amount of C in this ecosystem and making it easier to work with.
SPEAKER_01
02:40:21 - 02:40:45
So the kinds of packages that are hybrid in nature would be a natural fit to move to Mojo. Which is a lot of them. A lot of them, especially the ones doing interesting stuff computationally. Let me ask you about some features. Yeah. So we talked about indentation, obviously. Then there's types: it's a typed language, or optionally typed. Is that the right way to say it?
SPEAKER_00
02:40:45 - 02:40:52
It's either optionally or progressively typed, I think. People have very strong opinions on the right word to use. Yeah. I don't know.
SPEAKER_01
02:40:52 - 02:41:02
I look forward to your letters. So there's var versus let. Let is for constants, and var is...
SPEAKER_00
02:41:02 - 02:41:05
Yeah, var makes it mutable, so you can reassign.
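A minimal sketch of what this looks like, assuming early Mojo syntax from around the time of this conversation (the language has kept evolving since, and `let` was later removed):

```mojo
fn main():
    let pi = 3.14159   # `let` binds an immutable value; reassigning it is a compile-time error
    var count = 0      # `var` declares a mutable variable
    count += 1         # fine: var allows reassignment
    print(pi, count)
```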
SPEAKER_01
02:41:05 - 02:41:23
Okay. Then there's function overloading. Oh, okay. Yeah. That's a source of happiness for me. But function overloading, I guess, is that for performance? Or, why does Python not have function overloading?
SPEAKER_00
02:41:25 - 02:43:40
So I can speculate. Python is a dynamic language, and the way it works is that Python and Objective-C are actually very similar worlds if you ignore syntax. Objective-C is straight-line derived from Smalltalk, a really venerable, interesting language that much of the world has forgotten about, but the people that remember it generally love it. And the way Smalltalk works is that every object has a dictionary in it, and the dictionary maps from the name of a function, or the name of a value within an object, to its implementation. And so the way you call a method in Objective-C is you say: go look up 'foo'. I get a pointer to the function back, and then I call it. Okay, that's how Python works too. And so now the problem with that is that within a Python object's dictionary, all the keys are strings, and it's a dictionary, so you can only have one entry per name. You think it's as simple as that? I think it's as simple as that. And so now, why did they never fix this? Why did they not change it to not be a dictionary, or do other things? Well, you don't really have to in Python, because it's dynamic. And so you can say: I get into the function, and if I got passed an integer, do some dynamic test for it; is it a string, or another thing? There's an additional challenge, which is, even if you did support overloading, say you have a version of a function for integers and a version for strings: even if you could put both in that dictionary, you'd have to have the caller do the dispatch. And so every time you call the function, you'd have to ask: is it an integer or a string? And you'd have to figure out where to do that test. So in a dynamic language, overloading is something you don't have to have. But now you get into a typed language. And in Python, if you subscript with an integer, you typically get one element out of a collection; if you subscript with a range, you get a different thing out, right? And so often in typed languages, you'll want to be able to express the fact that, cool, I have different behavior depending on what I actually pass into this thing. And if you can model that, it can make things safer, more predictable, and faster, and all these things.
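A minimal sketch of the contrast, in assumed early Mojo syntax (my example, not code from the conversation): in Python, one dictionary entry per name means a single function body has to test types at runtime, while Mojo overloads are resolved statically.

```mojo
# With overloading, the compiler picks the right version from the argument
# type at compile time; no runtime isinstance-style test is needed.
fn describe(x: Int):
    print("got an Int:", x)

fn describe(x: Float64):
    print("got a Float64:", x)

fn main():
    describe(42)     # resolves to the Int overload
    describe(3.14)   # resolves to the Float64 overload
```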
SPEAKER_01
02:43:41 - 02:43:50
It somehow feels safer, but it also feels empowering. And in terms of clarity, you don't have to define whole different functions.
SPEAKER_00
02:43:50 - 02:44:15
Yeah, well, this is also one of the challenges with the existing Python typing systems: in practice, if you take subscript, a lot of these functions don't have one signature. They actually have different behavior in different cases. And so this is why it's difficult to retrofit this into existing Python code and make it play well with typing. You kind of have to design for that.
SPEAKER_01
02:44:15 - 02:44:34
Okay, so there's an interesting distinction that people who program Python might be interested in, which is def versus fn. So it's two different ways to define a function, and fn is a stricter version of def. What's the coolness that comes from the strictness?
SPEAKER_00
02:44:34 - 02:46:23
So here you get into: what is the trade-off with a superset? If you're doing a superset, you've decided compatibility with existing code is the important thing, even if some of the decisions that were made are maybe not what you'd choose. So that means you put a lot of time into compatibility, and that means you get locked into decisions of the past, even if they may not have been a good thing. Now, systems programmers typically like to control things. Not in all cases, of course, and even systems programmers are not one thing, right? But often you want predictability. And so one of the things that Python has, for example, as you know, is that if you define a variable, you just say x equals four. I have a variable named x. Now I say some_long_name equals 17, then print out some_long_name. Oops, I typoed it, right? Well, the Python compiler doesn't know, in all cases, what you're defining and what you're using, and whether you typoed the use of it or the definition. And so for people coming from typed languages, again, I'm not saying it's right or wrong, but that drives them crazy, because they want the compiler to tell them: you typoed the name of this thing. And so what fn does is it turns on, as you say, a strict mode. It says, okay, you have to actually, intentionally declare your variables before you use them. That gives you more predictability, more error checking, and things like this. But you don't have to use it. And this is a way that Mojo is both compatible, because defs work the same way defs have always worked, and also provides a new alternative that gives you more control, and allows certain kinds of people that have a different philosophy to be able to express that and get that.
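A minimal sketch of the strict mode being described, again assuming early Mojo syntax:

```mojo
# In an `fn`, variables must be declared before use, so the typo Chris
# describes becomes a compile-time error instead of a silent runtime surprise.
fn strict_example():
    var some_long_name = 17
    some_long_name = 18        # fine: declared and mutable
    # some_long_nane = 19      # compile-time error: undeclared name, typo caught
    print(some_long_name)
```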
SPEAKER_01
02:46:23 - 02:46:27
But usually if you're writing Mojo code from scratch, you'll be using fn.
SPEAKER_00
02:46:28 - 02:46:45
It depends, again, it depends on your mentality, right? It's not that def is Python and fn is Mojo. Mojo has both, and it loves both, right? It really depends on what you're doing. Are you playing around and scripting something out? Is it a one-off throwaway script?
SPEAKER_01
02:46:45 - 02:46:50
Cool, Python is great at that. I'll still be using fn, but yeah. Well, so I love strictness.
SPEAKER_00
02:46:50 - 02:47:00
Okay, well, so you like control. You like suffering, right? Yes. How many, how many, how many pull-ups?
SPEAKER_01
02:47:00 - 02:47:03
I've lost count at this point.
SPEAKER_00
02:47:03 - 02:47:17
So I mean, that's cool. I love you for that. Yeah. And I love other people who like strict things, right? But I don't want to say that that's the right thing because Python's also very beautiful for hacking around and doing stuff in research and these other cases where you may not want that.
SPEAKER_01
02:47:17 - 02:47:32
Yeah, I just feel like, maybe I'm wrong on this, but it feels like strictness leads to faster debugging. So in terms of going, even on a small project, from zero to completion. I guess it depends how many bugs you usually generate.
SPEAKER_00
02:47:32 - 02:48:25
Well, so, I mean, it's again lessons learned from looking at the ecosystem. If you study some of these languages over time, like the Ruby community, for example: Ruby is a pretty well-developed, pretty established community, but along their path, they really invested in unit testing. I think the Ruby community has really pushed forward the state of the art of testing, because they didn't have a type system that caught a lot of bugs at compile time. And so you can have the best of both worlds. You can have good testing and good types and things like this. But I thought it was really interesting to see how certain challenges get solved. And in Python, for example, the interactive notebook kinds of experiences and things like this are really amazing. And if you typo something, it doesn't matter, it just tells you. That's fine, right? And so I think the trade-offs are very different if you're building a large-scale production system versus exploring in a notebook.
SPEAKER_01
02:48:25 - 02:48:34
And speaking of control, the hilarious thing is, if you look at code I write just for myself, for fun, it's littered with asserts everywhere. Okay.
SPEAKER_00
02:48:34 - 02:48:37
It's, it's a kind of... Yeah, you like control.
SPEAKER_01
02:48:37 - 02:48:44
It's basically saying, uh, in a dictatorial way, this should be true now, otherwise everything stops.
SPEAKER_00
02:48:46 - 02:48:55
That is the sign. I love you, man. That is the sign of somebody who likes control. And so yes, I think that you'd like fn.
SPEAKER_01
02:48:55 - 02:49:04
I think with Mojo, there will be exceptions. Yes, I definitely will be using it in research. Exceptions are called errors. Why is it called errors?
SPEAKER_00
02:49:05 - 02:51:33
So we, I mean, we use the same thing as Python, right? But we implement it in a very different way. And so if you look at other languages, we'll pick on C++, a favorite: C++ has a thing called zero-cost exception handling. Okay. And this is, in my opinion, something to learn lessons from; the name is nicer than the reality. And so the way zero-cost exception handling works is that it's called zero-cost because, if you don't throw an exception, there's supposed to be no overhead for the non-error code. It takes the error path out of the common path. It does this by making throwing an error extremely expensive. And so if you actually throw an error with a C++ compiler using exceptions, it goes and looks up in tables on the side and does all this stuff. Throwing an error could be like 10,000 times more expensive than returning from a function. Also, it's called zero-cost exceptions, but it's not zero-cost by any stretch of the imagination, because it massively blows out your code, your binary. It also adds a whole bunch of different paths because of destructors and other things like that that exist in C++. And it reduces the number of optimizations and has all these effects. And so the thing that was called zero-cost exceptions really ain't. Okay. Now, if you fast forward to newer languages, and this includes Swift and Rust and Go and now Mojo. Python's a little bit different because it's interpreted, so it's got a different thing going on. But if you look at compiled languages, many newer languages say: okay, let's not do that zero-cost exception handling thing. Let's actually treat throwing an error the same as returning a variant: returning either the normal result or an error. Now, programmers generally don't want to deal with all the typing machinery and pushing around a variant. And so you use all the syntax that Python gives us, for example try and except, functions that raise, and things like this. You can mark your functions as raising, stuff like this, if you want to control that. The language can provide syntax for it, but under the hood, the way the computer executes it, throwing an error is basically as fast as returning something.
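A minimal sketch of the errors-as-values model, my example in assumed early Mojo syntax (not code from the conversation):

```mojo
# A raising function is marked `raises`. Under the hood the compiler lowers
# `raise` to returning a variant of (normal result | error), so the error
# path costs roughly as much as a normal return.
fn parse_positive(x: Int) raises -> Int:
    if x < 0:
        raise Error("expected a non-negative value")
    return x

fn main() raises:
    try:
        print(parse_positive(-1))
    except e:
        print("caught an error")
```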
SPEAKER_01
02:51:33 - 02:51:37
So it's exactly the same, from the computer's perspective.
SPEAKER_00
02:51:37 - 02:53:12
And so this is actually, I mean, it's a fairly nerdy thing, right? Which is why I love it. But this has a huge impact on the way you design your APIs. So in C++, huge communities turn off exceptions, because the cost is just so high. The zero-cost cost is so high, right? And so that means you can't actually use exceptions in many libraries. And even for the people that do use them: okay, how and when do you want to pay the cost? If I try to open a file, should I throw an error? Well, what if I'm probing around looking for something, checking whether a file exists on a path? If it's really slow to do that, maybe I want another function that doesn't throw an error but returns an error code instead, and now I have two different versions of the same thing. And so it causes you to fork your APIs. And so, you know, one of the things I learned at Apple and still love is that the art of API design is actually really profound. I think this is something Python has also done a pretty good job at in terms of building out this large-scale package ecosystem; it's about having standards and things like this. And so we wouldn't want to end up in a mode where there's this theoretical feature that exists in the language, but people don't use it in practice. Now, I'll also say one of the other really cool things about this implementation approach is that it can run on GPUs and it can run on accelerators and things like this. The standard zero-cost exception thing would never work on an accelerator. And so this is also part of how Modular can scale all the way down to little embedded systems and up to running on GPUs and things like that.
SPEAKER_01
02:53:12 - 02:53:30
Can you maybe say, is there some high-level way to describe the challenge of exceptions and how they work in code, during compilation? Just this idea of percolating up a thing, an error.
SPEAKER_00
02:53:30 - 02:54:54
Yeah. Yeah. So the way to think about it is, think about a function that doesn't return anything, just as a simple case, right? And so you have function one calls function two, calls function three, calls function four, and along that call stack there are try blocks. And so if you have function one calls function two, and function two has the try block, and then within it, it calls function three: well, what happens if function three throws? Well, actually, start simpler: what happens if it returns? Well, if it returns, it's supposed to go back out and continue executing, then fall off the bottom of the try block and keep going, and all is good. If the function throws, you're supposed to exit the current function and then get into the except clause, right? Then do whatever code is there, and then keep falling on and going on. And so the way that a compiler like Mojo's works is that the call to that function, which happens within the try block, calls the function, and instead of returning nothing, it actually returns a variant between nothing and an error. And so if you return normally, fall off the bottom or do a return, you return the nothing case; and if you throw an error, you return the variant that says "I'm an error", right? So when you get to the call, you say: okay, cool, I call the function, and hey, I know locally I'm in a try block. And so I call the function, and then I check to see what it returned. If it has that error thing, jump to the except clause.
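Roughly, the lowering he's describing looks like hand-writing the variant yourself. A sketch with a hypothetical ToyResult struct (Mojo's internal representation is not spelled this way; this just shows the shape of the transformation):

```mojo
# The "raise" becomes returning the error case; the "try" becomes a branch
# on which case came back.
@value
struct ToyResult:
    var is_error: Bool
    var value: Int

fn callee(x: Int) -> ToyResult:
    if x < 0:
        return ToyResult(True, 0)    # hand-written stand-in for `raise`
    return ToyResult(False, x * 2)   # normal return carries the value

fn main():
    let r = callee(-1)               # the call that sits inside the try block
    if r.is_error:
        print("except path")         # jump to the except clause
    else:
        print("normal path:", r.value)
```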
SPEAKER_01
02:54:55 - 02:54:57
And that's all done for you behind the scenes.
SPEAKER_00
02:54:57 - 02:55:23
Exactly. And so the compiler does all this for you. And, I mean, if you dig into how this stuff works in Python, it gets a little bit more complicated, because you have finally blocks, which you now need to go into and do some stuff, and those can also throw and return. Wait, what? Yeah. This stuff matters for compatibility. And these can nest. There's with clauses, and with clauses are kind of like finally blocks with some special stuff going on.
SPEAKER_01
02:55:23 - 02:55:32
And so there's nesting in general, nesting of anything. Nesting of functions should be illegal. Well, it just feels like it adds a level of complexity.
SPEAKER_00
02:55:32 - 02:55:46
Lex, I'm merely an implementer. And so this is, again, one of the trade-offs you get when you decide on a superset: you get to implement a full-fidelity implementation of the thing you decided is good.
SPEAKER_01
02:55:48 - 02:56:01
Yeah, I mean, we can complain about the reality of the world and shake our fists, but it always feels like you shouldn't be allowed to do that, like to declare functions inside functions.
SPEAKER_00
02:56:01 - 02:56:05
Oh, wait, wait, wait. What would the Lisp guy say?
SPEAKER_01
02:56:05 - 02:56:10
No, I understand that, but Lisp is what I used to do in college.
SPEAKER_00
02:56:10 - 02:56:11
And now you've grown up.
SPEAKER_01
02:56:12 - 02:56:15
You know, we've all done things in college we're not proud of.
SPEAKER_00
02:56:15 - 02:56:22
I'm not. I'm not.
SPEAKER_01
02:56:22 - 02:56:46
So speaking of which, I don't think you have nested functions implemented yet in Mojo.
SPEAKER_00
02:56:46 - 02:56:49
We don't have lambda syntax, but we do have nested functions.
SPEAKER_01
02:56:49 - 02:57:10
So there's a few things on the roadmap that it would be cool to just fly through, because it's interesting to see how many features there are in a language, small and big, that have to be implemented. Yeah. So first of all, there's tuple support. And that has to do with some very specific aspects of it, like the parentheses or no parentheses thing.
SPEAKER_00
02:57:10 - 02:57:13
Yeah. This is just a totally syntactic thing or something.
SPEAKER_01
02:57:13 - 02:57:18
Okay. But it's cool still. So, keyword arguments in functions.
SPEAKER_00
02:57:19 - 02:57:25
Yeah, so this is where in Python you can say: call a function with x equals four, and x is the name of the argument.
SPEAKER_01
02:57:25 - 02:57:29
That's a nice sort of self-documenting feature.
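For reference, this is the Python spelling of the feature. Keyword arguments were still a roadmap item for Mojo at this point, so treat this as a sketch of where the language was headed rather than working Mojo of the time:

```mojo
def resize(image, width=640, height=480):
    print("resizing", image, "to", width, "x", height)

def main():
    # Keyword arguments name the parameters at the call site, which documents
    # intent and lets you skip defaults you don't care about.
    resize("photo.png", height=1080)
```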
SPEAKER_00
02:57:29 - 02:58:47
Yeah, I mean, again, this isn't rocket science to implement; it's just one more thing on the list. The bigger features are things like traits. Traits are for when you want to define something abstract. When you get into typed languages, you need the ability to write generics. And so you want to say: I want to write this function, and I want it to work on all things that are arithmetic-like. Well, what is arithmetic-like? Arithmetic-like is a categorization of a bunch of types. And you can define it in different ways, and I'm not going to go into ring theory or something, but you can say it's arithmetic-like if you can add, subtract, multiply, divide, for example. And so what you're saying is there's a set of traits that apply to a broad variety of types. There are all these types that are arithmetic-like, all these integers and floating-point types, there's this category of types. And then I can define, on an orthogonal axis, algorithms that work against types with those properties. And so this is, again, a widely known thing. It's been implemented in Swift and Rust and many languages, also Haskell, which is where everybody learns their traits from. But we need to implement that, and that will enable a new level of expressivity.
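A sketch of what that could look like. Traits were not implemented in Mojo at the time of this conversation; the spelling below follows how Mojo later shipped the feature, so take it as illustrative:

```mojo
# A trait names a category of types by the operations they support.
trait ArithmeticLike:
    fn __add__(self, rhs: Self) -> Self: ...
    fn __mul__(self, rhs: Self) -> Self: ...

# On the orthogonal axis: one generic algorithm that works for any type
# satisfying the trait.
fn square[T: ArithmeticLike](x: T) -> T:
    return x * x
```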
SPEAKER_01
02:58:48 - 02:58:49
Oh, so classes.
SPEAKER_00
02:58:49 - 02:58:50
Yeah.
SPEAKER_01
02:58:50 - 02:59:12
Classes. This is a big deal, still to be implemented. You said lambda syntax, and there's detailed stuff, like whole-module import, support for top-level code at file scope, and then global variables also. So, being able to have variables at the top level, outside of a function.
SPEAKER_00
02:59:12 - 02:59:45
And so this comes back to where Mojo came from, and the fact that this is version 0.1, right? Modular is building an AI stack, and an AI stack involves a bunch of problems: working with hardware, writing high-performance kernels, doing this kernel fusion thing we talked about, getting the most out of the hardware. And so we've really prioritized and built Mojo to solve that part of the problem. Right now our North Star is building out and supporting all the things, and we're making incredible progress. By the way, Mojo is only about seven months old. So that's another interesting thing.
SPEAKER_01
02:59:45 - 03:00:02
I mean, part of the reason I wanted to mention some of these things is that there's a lot to do, and it's pretty cool. You sometimes take for granted how much there is in a programming language, how many cool features you rely on. And this is a nice reminder, when you lay it out as a to-do list.
SPEAKER_00
03:00:02 - 03:00:24
Yeah, but also, if you look at it, it's amazing how much is already there. And you take it for granted that a value, if you define it, will get destroyed automatically. That little feature itself is actually really complicated, given the way the ownership system has to work, and the way that works within Mojo is a huge step forward from what Rust and Swift have done.
SPEAKER_01
03:00:24 - 03:00:28
But can you say that again? When a value, you define it, it's destroyed automatically?
SPEAKER_00
03:00:28 - 03:02:25
Yeah, so say you have a string, right? You define a string on the stack, whatever that means, in your local function. So whether it be in a def or an fn, you just say x equals "hello world". Well, if your string type requires you to allocate memory, then when it's destroyed, you have to deallocate it. In Python and Mojo, you define that with a __del__ method, right? Where does that get run? Well, it gets run sometime between the last use of the value and the end of the program. And here you get into garbage collection, you get into all these long-debated religions and trade-offs and things like this. This is a hugely, hotly contested world. If you look at C++, the way this works is that if you define a variable, or a set of variables within a function, they get destroyed in last-in-first-out order, so it's like nesting. This has a huge problem, because if you define a big scope, and you define a whole bunch of values at the top, and then you use them, and then you run a whole bunch of code that doesn't use them, they don't get destroyed until the very end of that scope. This also destroys tail calls, so no good functional programming, right? It has a bunch of different impacts on reference counting optimizations and things like this, a bunch of very low-level things. And so Mojo takes a different approach from any language I'm familiar with: it destroys values as soon as possible. And by doing that, you get better memory use, you get better predictability, you get tail calls, you get better ownership tracking, you get a bunch of other things. There's a bunch of these very simple things that are very fundamental, that are built into Mojo today, that nobody generally talks about, but when they don't work right, you find out and you have to complain.
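A small sketch of the eager-destruction behavior, assuming early Mojo syntax for __init__ and __del__ (my example, not his):

```mojo
# Mojo runs __del__ right after a value's last use, not at end of scope.
struct Resource:
    var id: Int

    fn __init__(inout self, id: Int):
        self.id = id
        print("acquire", id)

    fn __del__(owned self):
        print("release", self.id)

fn main():
    let r = Resource(1)
    print("using", r.id)           # last use of r; __del__ can run right here
    print("doing unrelated work")  # r is already destroyed by this point
```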
SPEAKER_01
03:02:25 - 03:02:31
Is it trivial to know the soonest possible point to delete a thing that's not going to be used again?
SPEAKER_00
03:02:31 - 03:02:51
Yeah, well, I mean, it's generally trivial: it's after the last use of it. So if you define x as a string, and then you have some use of x somewhere in your code within that scope. So it's scope-based? Yeah, exactly. You can only use something within its scope. And so it doesn't wait until the end of the scope to delete it; it destroys it after the last use.
SPEAKER_01
03:02:52 - 03:02:56
So there's kind of some very eager machine that's just sitting there and deleting.
SPEAKER_00
03:02:56 - 03:03:09
Yeah, and it's all in the compiler, so it's not at runtime, which is also cool. And this is actually non-trivial, because you have control flow, and so it gets complicated pretty quickly. Getting that right was not easy.
SPEAKER_01
03:03:09 - 03:03:13
Oh, so you have to insert deletes in a lot of places. Potentially, yeah.
SPEAKER_00
03:03:13 - 03:03:40
The compiler has to reason about this, and this is where, again, it's experience from building languages and not getting this right the first time. You get another chance to do it, and you get basic things like this right. It's extremely powerful when you do that. And so there's a bunch of things like that that combine together. And this comes back to: you get a chance to do it the right way, so do it the right way, and make sure that every brick you put down is really good, so that when you put more bricks on top of it, they stack up to something that's beautiful.
SPEAKER_01
03:03:40 - 03:04:00
Well, there are also the design discussions within the company. Do they have to be about particular details, like the implementation of particular small features? Because some features that seem small might really require big design decisions.
SPEAKER_00
03:04:01 - 03:06:09
Well, so let me give you another example of this. Python has a feature called async await. It's a new feature, I mean, in the long arc of history; a relatively new feature that allows way more expressive asynchronous programming. Again, this is a thing where Python is beautiful, and they did things that are great for Mojo for completely different reasons. The reason async await got added to Python, as far as I know, is because Python doesn't do threads well. Python does support threads, it's just not its strength. But you want to work with networking and other things like that that can block. And so they added this feature called async await. It's also seen in other languages, like Swift and JavaScript and many other places as well. Async await in Mojo is amazing, because we have a high-performance heterogeneous compute runtime underneath the covers that then allows non-blocking IO, so you get full use of your accelerator. That's huge. It's actually a really important part of fully utilizing a machine. You talk about design discussions: it took a lot of discussions, and it probably will require more iteration. My philosophy with Mojo is that we have a small team of really good people that are pushing forward, and they're very good at the extremely deep stuff, knowing how the compiler and runtime and all the low-level pieces work together. But they're not perfect; same thing as the Swift team, right? And this is one of the reasons we released Mojo much earlier: so we can get feedback. We've already renamed a keyword due to community feedback. We used an ampersand, and now it's named inout. We're not renaming existing Python keywords, because that breaks compatibility. We're naming the things we're adding, and making sure they are designed well; we get usage experience, we iterate and work with the community. Because, again, if you scale something really fast and everybody writes code against it and they start using it in production, then it's impossible to change. And so you want to learn from people, you want to iterate and work on that early on, and this is where design discussions are actually quite important.
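For concreteness, the renamed keyword he mentions is an argument convention. A minimal sketch in the post-rename spelling, assuming early Mojo syntax:

```mojo
# `inout` (originally spelled with an ampersand) lets the callee mutate the
# caller's variable in place.
fn increment(inout x: Int):
    x += 1

fn main():
    var count = 0
    increment(count)
    print(count)  # prints 1
```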
SPEAKER_01
03:06:09 - 03:06:14
Could you incorporate an emoji into the language, into the main language?
SPEAKER_00
03:06:14 - 03:06:17
Like, like, do you have a favorite one?
SPEAKER_01
03:06:17 - 03:06:30
I really like, in terms of humor, ROFL, rolling on the floor laughing. So what would the use case for that be? Like an exception of some sort? I don't know.
SPEAKER_00
03:06:30 - 03:06:32
Should totally file a feature request.
SPEAKER_01
03:06:34 - 03:06:37
Or maybe a heart one. It has to be a heart one.
SPEAKER_00
03:06:37 - 03:06:41
People have told me that I'm insane. So this is, I'm liking this.
SPEAKER_01
03:06:41 - 03:06:48
I'm going to use the viral nature of the internet to actually get this passed.
SPEAKER_00
03:06:48 - 03:07:01
I mean, it's funny you come back to the flame emoji file extension, right? You know, we have the option to use the flame emoji, and even that concept caused, for example, some people to get upset: now I've seen everything.
SPEAKER_01
03:07:03 - 03:07:13
Yeah, there's something kind of reinvigorating about it. It's like, oh, that's possible? That's really cool. For some reason that makes everything else feel possible, actually.
SPEAKER_00
03:07:13 - 03:07:26
I think the real world is ready for this stuff. And so, you know, when we have a package manager, we'll clearly have to innovate by having the compiled package thing be the little box with the bow on it, right? I mean, it has to be done.
SPEAKER_01
03:07:26 - 03:07:33
It has to be done. Is there some stuff on the roadmap that you're particularly stressed about or excited about that you think about a lot?
SPEAKER_00
03:07:33 - 03:09:27
I mean, as a today snapshot, which will be obsolete tomorrow: the lifetimes stuff is really exciting. Lifetimes give you safe references to memory without dangling pointers. This has been done in languages like Rust before, and we have a new approach, which is really cool. I'm very excited about that. That'll be out to the community very soon. The traits feature is really a big deal, and it's blocking a lot of API design, so there's that; I think that's really exciting. A lot of it is these kinds of table-stakes features. One of the things that is, again, also lessons learned with Swift, is that programmers in general like to add syntactic sugar. And so it's like, oh well, this annoying thing: in Python you have to spell it, what is it, __add__. Why can't I just use plus? Def plus, come on. Why can't I just do that, right? And so, a little bit of syntactic sugar. It makes sense, it's beautiful, it's obvious. We're trying not to do that, for two different reasons. One of which is, again, lessons learned: Swift has a lot of syntactic sugar, which maybe was good, maybe not, I don't know. But because it's such an easy and addictive thing to do (sugar makes your blood sugar go crazy, right?), the community will really dig into that and want to do a lot of it, and I think it's very distracting from building the core abstractions. The second is: we want to be a good member of the Python community. We want to work with the broader Python community. And yeah, we're pushing forward a bunch of systems programming features, and we need to build them out to understand them. But once we get a long ways forward, I want to make sure we go back to the Python community and say: okay, let's do some design reviews, let's actually talk about this stuff, let's figure out how we want this all to work together. And syntactic sugar just makes all of that more complicated.
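The spelled-out version the sugar discussion refers to: in both Python and Mojo, plus on your own type comes from the __add__ dunder method rather than any new operator-definition syntax. A small sketch in assumed early Mojo syntax:

```mojo
# Implementing __add__ is what makes `+` work; no extra sugar required.
@value
struct Meters:
    var value: Float64

    fn __add__(self, rhs: Meters) -> Meters:
        return Meters(self.value + rhs.value)

fn main():
    let total = Meters(1.5) + Meters(2.5)
    print(total.value)  # 4.0
```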
SPEAKER_01
03:09:27 - 03:09:33
And, yeah, list comprehensions, that's yet to be implemented. And my favorite, I mean, dictionaries.
SPEAKER_00
03:09:34 - 03:09:37
Yeah, there are some interesting basics still missing. It's 0.1.
SPEAKER_01
03:09:37 - 03:09:38
Right, it's version 0.1.
SPEAKER_00
03:09:38 - 03:09:41
But nonetheless, it's already quite interesting and useful.
SPEAKER_01
03:09:41 - 03:10:32
As you've mentioned, Modular is very new, Mojo is very new, it's a relatively small team. Yeah. And it's building up this gigantic stack, this incredible stack that's going to perhaps define the future of development of our AI overlords. We just hope it will be useful, as do all of us. So what have you learned from this process of building up a team? Maybe one question is: how do you hire great programmers, great people that operate in this compiler, hardware, machine learning, software interface design space?
SPEAKER_00
03:10:32 - 03:11:59
So building a company is just as interesting, in different ways, as building a language. Different skill sets, different things, but super interesting. And I've built a lot of teams at a lot of different places. If we zoom in from the big problem into recruiting: well, so here's our problem, okay, I'll be very straightforward about this. We started Modular with a lot of conviction about: we understand the problems, we understand the customer pain points, we need to work backwards from the suffering in the industry, and if we solve those problems, we think it will be useful for people. But the problem is that the people we need to hire, as you say, are all these super specialized people that have jobs in the big tech world, right? And, you know, I don't think we have product-market fit challenges the way a normal startup does, because right now everybody's using AI, and so many of them are suffering, and they want help. And so, again, we started with strong conviction. But you have to hire and recruit the best, and the best all have jobs. And so what we've done is said: okay, well, let's build an amazing culture. Start with that. That's usually not something a company starts with; usually you hire a bunch of people, and then people start fighting, and it turns into a gigantic mess, and then you try to figure out how to improve your culture later. My co-founder Tim in particular is super passionate about making sure that's right, and we've spent a lot of time early on to make sure we can scale.
SPEAKER_01
03:11:59 - 03:12:13
Sorry, before we get to the second thing: what makes for a good culture?
SPEAKER_00
03:12:13 - 03:16:06
So, I mean, there are many different cultures, and I've learned many things from many places, several very unique, almost famously unique cultures. And in some of them I learned what to do, and in some of them I learned what not to do. We want an inclusive culture. I believe in amazing people working together. And I've seen cultures where you have amazing people and they're fighting each other. I've seen amazing people and they're told what to do: shut up, line up, and do what I say, it doesn't matter if it's the right thing, do it, right? And neither of those is right. And I've seen people that have no direction; they're just kind of floating in different places. And so a lot of it starts with having a clear vision, and we have a clear vision of what we're doing. Now, I kind of grew up at Apple in my engineering life, and so a lot of the Apple DNA rubbed off on me, and my co-founder Tim is also a strong product guy. And what we learned is that you don't work forward from building cool technology. You don't come up with cool products and think about the features they'll have and the big checkboxes and stuff like this. Because if you go talk to customers, they don't actually care about your product. They don't care about your technology. What they care about is their problems. And if your product can help solve their problems, well, hey, they might be interested in that. And so if you speak to them about their problems, if you understand and have compassion, if you understand what people are working with, then you can work backwards to building an amazing product. So the vision starts with the problem, and then we can work backwards to the technology. And at Apple, it was pretty famously said that there are a hundred no's for every yes. I would refine that and say there are a hundred not-yets for every yes. Famously, if you go back to the iPhone, for example: the iPhone 1, many people laughed at it because it didn't have 3G, it didn't have copy and paste. A year later, okay, finally it has 3G, but it still doesn't have copy and paste; it's a joke, nobody will ever use this product, blah, blah, blah. Well, year three, it had copy and paste, and people stopped talking about it. And so being laser focused, having conviction, understanding what the core problems are, and giving the team the space to build the right tech is really important. Also, coming back to recruiting: you have to pay well. So we have to pay industry-leading salaries and have good benefits and things like this. That's a big piece. We're a remote-first company, and remote-first has a very strong set of pros and cons. On the one hand, you can hire people from wherever they are, and you can attract amazing talent even if they live in unusual places. On the other hand, you have time zones. On the other hand, you have, like...
Everybody on the internet will fight if they don't understand each other. And so we've had to learn how to have a system where we actually fly people in, and we get the whole company together periodically, and then we get work groups together, and we plan and execute together. It's interesting, the in-person brainstorming, I guess you lose that. But maybe you don't; maybe if you get to know each other well and you trust each other, maybe you can do that. Well, so when the pandemic first hit, I mean, I'm curious about your experience too, the first thing I missed was having whiteboards. Yeah. Right. Those design discussions are high intensity: work through things, get things done, work through the problem of the day, understand where you are, figure out and solve the problem, and move forward. Yeah. But we figured out ways to work around that, with all the screen sharing and other things like that that we do. The thing I miss now is sitting down at a lunch table with the team. The spontaneous things, the coffee bar things, bumping into each other, getting to know people outside of the transactional solve-the-problem-over-Zoom mode.
SPEAKER_01
03:16:07 - 03:16:33
And I think there's just a lot of stuff, and I'm not an expert at this, I don't know who is, hopefully there are some people, but there's stuff that's somehow missing on Zoom, even with the whiteboard. If you have a room with one person at the whiteboard and three other people at a table, first of all, there's a social aspect of that, where you're just shooting the shit a little bit, almost like...
SPEAKER_00
03:16:33 - 03:16:34
Yeah.
SPEAKER_01
03:16:34 - 03:17:08
And people are just kind of coming in, and yeah, that. But also there are breakout discussions that happen for seconds at a time, maybe an inside joke. There's this interesting dynamic that happens that's missing on Zoom. You're bonding. Yeah, you're bonding. And through that bonding, you get the excitement. Certain ideas are complete bullshit, and you'll see that in the faces of others, which you won't necessarily see on Zoom. It feels like that should be possible to do without being in person.
SPEAKER_00
03:17:08 - 03:17:48
Well, I mean, being in person is a very different thing. Yeah, it's worth it, but you can't always do it. And so, again, we're still learning, and we're all still learning as a humanity with this new reality, right? But what we found is that getting people together, whether it be a team or the whole company or whatever, is worth the expense, because people work together and are happier after that. There's a maximum period of time you can go without it, and things start getting frayed. Pull people together, and then you realize we're all working together and we see things the same way. We work through the disagreement or the misunderstanding where we were talking past each other, and then you work much better together. And so things like that, I think, are really quite important.
SPEAKER_01
03:17:48 - 03:17:56
What about people that are specialized in very different aspects of the stack working together? What are some interesting challenges there?
SPEAKER_00
03:17:56 - 03:19:02
Yeah, well, I mean, there are lots of interesting people. As you can tell, I'm, you know, hard to deal with too. But you are one of the most lovely people. So there are different philosophies in building teams. Some people say: hire 10x programmers, and that's the only thing that matters, whatever that means. What I believe in is building well-balanced teams: teams that have people that are different in them. If you have all generals and no troops, or all troops and no generals, or you have all people that think in one way and not the other way, what you get is a very biased and skewed and weird situation where people end up being unhappy. And so what I like to do is build teams of people where they're not all the same. You know, we do have teams that are focused on runtime or compiler or GPU or whatever the specialty is, but people bring a different take and have a different perspective, and I look for people that complement each other. Particularly if you look at leadership teams and things like this, you don't want everybody thinking the same way. You want people bringing different perspectives and experiences. And so I think that's really important.
SPEAKER_01
03:19:03 - 03:19:07
That's teams, but what about building a company as ambitious as Modular?
SPEAKER_00
03:19:07 - 03:20:58
What are the interesting questions there? I mean, so many. One of the things I love about... okay, so Modular is the first company I built from scratch, and one of the first things that was profound was: I'm not cleaning up somebody else's mess. That's liberating. It's super liberating. And also, many of the projects I've built in the past have not been core to the company. Swift is not Apple's product, right? MLIR is not Google's revenue machine, right? It's important, but it's like working on the accounting software for a retail giant or something. It's enabling infrastructure and technology. At Modular, the tech we're building is here to solve people's problems; it is directly the thing we're giving to people. And so this is a really big difference. And what it means for me as a leader, but also for many of our engineers, is they're working on the thing that matters. And for compiler people and things like that, that's usually not the case, right? And so that's also pretty exciting and quite nice. One of the ways this manifests is that it makes it easier to make decisions. One of the challenges I've had in other worlds is, it's like: okay, well, community matters somehow for the goodness of the world, or open source matters theoretically, but I don't want to pay for a t-shirt, right, or some swag. Well, t-shirts cost ten bucks each; you can have a hundred t-shirts for a thousand dollars. To a megacorp, a thousand dollars is uncountable, they can't count that low, right? But justifying it and getting the t-shirts is hard. By the way, if you'd like a t-shirt...
SPEAKER_01
03:20:59 - 03:21:03
I would 100% like a t-shirt. Are you joking?
SPEAKER_00
03:21:03 - 03:21:08
You can have a fire emoji t-shirt. I will, I will treasure this.
SPEAKER_01
03:21:08 - 03:21:11
I will pass it down to my grandchildren.
SPEAKER_00
03:21:11 - 03:21:20
And so, you know, it's very liberating to decide: I think that Lex should have a t-shirt, right? And it becomes very simple, because I like Lex.
SPEAKER_01
03:21:20 - 03:22:41
This is awesome. So, I have to ask you about one of the interesting developments with large language models: they're able to generate code really well recently. To a degree that I struggle to understand, because it forces me to ask questions about the nature of programming, the nature of thought. Because the language models are able to predict the kind of code I was about to write so well that it makes me wonder how unique my brain is, and where the valuable ideas actually come from. Like, how much do I contribute, in terms of ingenuity and innovation, to the code I write, or the design, and that kind of stuff? Whether, when you stand on the shoulders of giants, you're really doing anything, and what LLMs are helping you do is stand on the shoulders of giants as you program. There are mistakes they make that are interesting, that you learn from. But I would just love to get your opinion, first at a high level, on the impact of large language models when they do programs, when they generate code.
SPEAKER_00
03:22:42 - 03:24:06
Yeah, well, I don't know where it all goes. I'm an optimist, and I'm a human optimist. The things I've seen are that a lot of the LLMs are really good at crushing LeetCode problems; they can reverse a linked list like crazy. Well, it turns out there are a lot of instances of that on the internet, and it's a pretty stock thing. And so if you want standard questions answered, LLMs can memorize all the answers, and that can be amazing. They also do generalize out from that, and there's good work on that. But I think that, in my experience, building things, building something like Mojo, building an applied solution to a problem, is also about working with people. It's about understanding the problem. What is the product that you want to build? What are the use cases? Who are the customers? You can't just go survey all the customers, because they'll tell you they want a faster horse; maybe they need a car, right? And so a lot of it comes down to... I don't feel like we have to compete with LLMs. They help automate a ton of the mechanical stuff out of the way. And just like we all try to scale through delegation and things like this, delegating rote things to an LLM is an extremely valuable approach that will help us all scale and be more productive. I think it's a fascinating companion. But I don't think that means we're going to be done with coding.
SPEAKER_01
03:24:06 - 03:24:49
But there's power in it as a companion. And from there, I would love to zoom in on Mojo a little bit. Do you think about LLMs generating Mojo code and helping, sort of... when you design a new programming language, it almost seems like it would be nice, almost as a way to learn how the language is supposed to be used, for them to be trained on some of the best examples of it. Well, I do lead an AI company, so maybe there will be a Mojo LLM at some point. But if your question is, how do we make a language that's suitable for LLMs? Yeah, I think...
SPEAKER_00
03:24:51 - 03:25:23
I think the cool thing about LLMs is you don't have to. If you look at English, or any of these other terrible languages that we as humans deal with on a continuous basis, they were never designed for machines. And yet they're the intermediate representation, the exchange format, that we humans use to get stuff done. And so these programming languages, they're an intermediate representation between the human and the computer, or the human and the compiler, roughly. And so I think the LLMs will have no problem learning whatever keywords we pick.
SPEAKER_01
03:25:23 - 03:25:38
Maybe the fire emoji is going to break it; it doesn't tokenize. No, maybe the reverse: it will actually enable it. Because one of the issues I could see with being a superset of Python is there could be confusion from the gray area, like mixing the two.
SPEAKER_00
03:25:40 - 03:26:13
But I'm a human optimist, and I'm also an LLM optimist. I think we'll solve that problem. But you look at that and say, okay, well, reducing the rote things, right? It turns out computers are very particular, and they really want things just so. They really want the indentation to be right. They really want the colon to be there on your else, or else they'll complain, right? I mean, compilers can do better at this, but LLMs can totally help solve that problem. So I'm very happy about the new predictive coding and Copilot-type features and things like this, because I think it will all just make us more productive.
SPEAKER_01
03:26:13 - 03:26:29
It's still messy and fuzzy and uncertain, unpredictable. So, is there a future you see, given how big of a leap GPT-4 was, where you start to see something like LLMs inside a compiler?
SPEAKER_00
03:26:29 - 03:27:41
I mean, you could do that. Yeah, absolutely. I think that'd be interesting. Well, I mean, it would be very expensive: compilers run fast and are very efficient, and LLMs are currently very expensive. There are on-device LLMs and other things going on, so maybe there's an answer there. I think one of the things I haven't seen enough of is... LLMs to me amazingly tap into the creative potential of hallucinations, right? If you're doing creative brainstorming or creative writing or things like that, the hallucinations work in your favor. If you're writing code that has to be correct because you're going to ship it in production, then maybe that's not actually a feature. And so I think there has been research and work on building algebraic reasoning systems, figuring out things that feel more like proofs. There could be interesting work in terms of building more reliable, at-scale systems, and that could be interesting. But if you chase that rabbit hole down, the question then becomes: how do you express your intent to the machine? And so maybe you want the LLM to provide the spec, but you have a different kind of net that then actually implements the code.
SPEAKER_01
03:27:41 - 03:28:06
All right, so use it for documentation and inspiration versus the actual implementation. Yeah, exactly. If it's successful, Modular will be the thing that runs, I say jokingly, our AI overlords: AI systems that are used across, I know it's a cliche term, all kinds of things, at scale.
SPEAKER_00
03:28:06 - 03:28:10
So I'll joke and say, like, the AI should be written in Mojo.
SPEAKER_01
03:28:10 - 03:28:47
Yeah, the AI should be written in Mojo. You're joking, but it's also possible that it's not a joke, that a lot of the ideas behind Modular seem like the natural set of ideas that would enable, at scale, training and inference of AI systems. So I have to ask the big philosophical question about human civilization. Folks like Eliezer Yudkowsky are really concerned about the threat of AI. Do you think about the good and the bad that can happen with at-scale deployment of AI systems?
SPEAKER_00
03:28:47 - 03:29:08
Well, so I've thought a lot about it, and there are a lot of different parts of this problem, everything from job displacement to Skynet, things like this, and you can zoom into sub-parts of this problem. I'm not super optimistic about AGI being solved next year. I don't think that's going to happen, personally.
SPEAKER_01
03:29:08 - 03:29:16
So you have a kind of zen-like calm about it? Because there's a nervousness, because the leap of GPT-4 seemed so big.
SPEAKER_00
03:29:16 - 03:29:17
Sure.
SPEAKER_01
03:29:17 - 03:29:22
It's like we're almost at some kind of transition, in your opinion, you're thinking.
SPEAKER_00
03:29:22 - 03:30:39
Well, so, I mean, there are a couple of things going on there. One is, I'm sure GPT-5 and 7 and 19 will also be huge leaps. They're also getting much more expensive to run, and so there may be a limiting function in terms of just the expense of training and running them; that could be a limiter that slows things down. But I think the bigger limiter, outside of Skynet taking over... and I don't spend any time thinking about that, because if Skynet takes over and kills us all, then I'll be dead, so I don't worry about it. There are other things to worry about; I'll focus on those and not worry about that one. But I think the other thing I'd say is that AI moves quickly, but humans move slowly, and we adapt slowly. And so what I expect to happen is, just like any technology diffusion, there's the promise, and then the application takes time to roll out. And so I'm not even too worried about autonomous cars taking away all the taxi drivers' jobs. Remember, autonomy was going to be solved by 2020. I believed it. And so I think that, on the one hand, we can see amazing progress, but on the other hand, the reality is a little bit more complicated, and it may take longer to roll out than you might expect.
SPEAKER_01
03:30:39 - 03:31:05
Well, that's in the physical space. I do think, in the digital space, the stuff that's built on top of LLMs, you know, the millions of apps that could be built on top of them, and that could run on millions of devices, millions of types of devices... Yeah. I just think the rapid effect that has on human civilization could be truly transformative.
SPEAKER_00
03:31:07 - 03:31:13
It's hard to predict. And then I think it depends on whether you're an optimist or a pessimist or a masochist.
SPEAKER_01
03:31:13 - 03:31:19
Just as a clarification: optimist about the whole of human civilization?
SPEAKER_00
03:31:19 - 03:31:43
And so I look at it as saying: okay, cool. Well, yeah, I do, right. And so some people say, oh my god, it's going to destroy us all, how do we prevent that? I look at it from: is it going to unlock us all? You talk about coding: it's going to make it so I don't have to do all the repetitive stuff. Well, suddenly, that's a very optimistic way to look at it. And you look at what a lot of these technologies have done to improve our lives, and I want that to go faster.
SPEAKER_01
03:31:43 - 03:32:02
What do you think the future of programming looks like, in the next 10, 20, 30, 50 years, with LLMs, with Mojo, with Modular? Like, the vision for the devices, the hardware, to the compiler, to the different stacks of software.
SPEAKER_00
03:32:02 - 03:33:41
Well, so, what I want, coming back to my arch-nemesis: it's complexity, right? So again, me being the optimist: if we drive down complexity, we can make these tools, these technologies, these cool hardware widgets accessible to way more people. And so what I'd love to see is more personalized experiences, more things, the research getting into production instead of being lost in papers, and these things impacting people's lives by getting into products. One of the things I'm a little bit concerned about is that right now, the big companies are investing huge amounts of money and driving the top line of AI capability forward really quickly. But if it means you have to have $100 million to train a model, or more, $100 billion, well, that's going to make it very concentrated, with very few people in the world that can actually do this stuff. I would much rather see lots of people across the industry be able to participate and use this. And, you know, a lot of great research has been done in the health world, looking at detecting pathologies and doing radiology with AI, doing all these things. Well, the problem today is that to deploy and build these systems, you have to be an expert in radiology and an expert in AI. And if we can break down the barriers, so that more people can use AI techniques, and it's more like programming Python, which roughly everybody can do if they want to, then I think we'll get a lot more practical application of these techniques, in a lot more nichey, cool, but narrower domains. And I think that's going to be really cool.
SPEAKER_01
03:33:41 - 03:33:45
Do you think we'll have more or less programmers in the world than now?
SPEAKER_00
03:33:46 - 03:33:59
Well, so I think we'll have way more programmers, but they may not consider themselves to be programmers; there'd be a different name for it. I mean, do you consider somebody that uses, you know... I think that arguably the most popular programming language is Excel.
SPEAKER_02
03:33:59 - 03:34:00
Yeah.
SPEAKER_00
03:34:02 - 03:35:01
Right. Yep. And so do they consider themselves to be programmers? Maybe not. I mean, some of them make crazy macros and stuff like that. But you mentioned Steve Jobs: it's the bicycle for the mind, letting you go faster, right? And so as we look forward, what is AI? I look at it as, hopefully, a new programming paradigm, like object-oriented programming, right? If you want to write a cat detector, you don't use for loops. It turns out that's not the right tool for the job, right? And so right now, unfortunately, well, I mean, it's not unfortunate, it's just kind of where things are, AI is this weird, different thing. It's not integrated into programming languages and normal toolchains; all the tooling is really weird and doesn't work right, and you have to babysit it, and every time you switch hardware it's different. It shouldn't be that way. When you change that, when you fix that, suddenly these tools and technologies become way easier to use, and you can start using them for many more things. So that's why I'm excited about it.
SPEAKER_01
03:35:01 - 03:35:22
What kind of advice would you give to somebody in high school right now, or maybe early college, who's curious about programming and feeling like the world is changing really quickly here? What kind of stuff should they learn? What kind of stuff should they work on? Should they finish college, go work at a company, or build something of their own?
SPEAKER_00
03:35:22 - 03:36:56
What do you think? Well, so, I mean, one of the things I'd say is that you'll be most successful if you work on something you're excited by. So don't get the book and read it cover to cover and study and memorize and recite and flashcard. Go build something. Like, go solve a problem. Go build the thing that you want to exist, go build an app, go train a model, go build something and actually use it, and set a goal for yourself. And if you do that, then, you know, there's the success, there's the adrenaline rush, there's the achievement, there's the unlock. If you keep setting goals and you keep doing things and building things, learning by building is really powerful. In terms of career advice, I mean, everybody's different; it's very hard to give generalized advice. I can only speak as, you know, a compiler nerd: if everybody's going left, sometimes it's pretty cool to go right. Just because everybody's doing a thing doesn't mean you have to do the same thing and follow the herd. In fact, I think that sometimes the most exciting paths in life come from being curious about things that nobody else actually focuses on, right? And it turns out that understanding deeply the parts of the problem that people want to take for granted makes you extremely valuable and specialized in ways that the herd is not. And so, again, there's lots of room for specialization, lots of room for generalists, lots of room for different parts of the problem, but just because everybody's doing one thing doesn't mean you should necessarily do it too.
SPEAKER_01
03:36:56 - 03:37:31
And now the herd is using Python, so if you want to be a rebel, go check out Mojo, and help Chris and the rest of the world fight the arch-nemesis of complexity. Because simple is beautiful. There you go. Chris, you're an incredible person. You've been so kind to me ever since we met. You've been extremely supportive. I'll be forever grateful for that. Thank you for being who you are, for being legit, for being kind, and for fighting this really interesting problem of how to make AI accessible to a huge number of people and a huge number of devices.
SPEAKER_00
03:37:31 - 03:37:49
Yeah, well, Lex, you're a pretty special person too, right? And so, I mean, one of the funny things about you is that besides being curious and pretty damn smart, you're actually willing to push on things. And I think that you've got an agenda to, like, make the world think, which I think is a pretty good agenda.
SPEAKER_01
03:37:49 - 03:37:52
It's a pretty good one. Thank you so much for talking today, Chris.
SPEAKER_00
03:37:52 - 03:37:52
Yeah. Thanks, Lex.
SPEAKER_01
03:37:54 - 03:38:12
Thanks for listening to this conversation with Chris Lattner. To support this podcast, please check out our sponsors in the description. And now, let me leave you with some words from Isaac Asimov: "I do not fear computers. I fear the lack of them." Thank you for listening, and hope to see you next time.