Daniel Lenton, Ivy Interview | Craft of Open Source Podcast
What Ivy uniquely enables is for anybody with one line of code to take any model, library, or function from any other version of any other framework. In a single line, they can bring it directly into their project.
There is fragmentation in the field of AI that no one seems to be taking seriously. Daniel Lenton, the CEO of Ivy, observed this firsthand: so many frameworks, models, infrastructures, and hardware platforms that collaboration becomes hard. To help solve this problem, Ivy was created to unify all Machine Learning (ML) frameworks.
In this episode, Daniel talks about the way they made it possible to fit these pieces together. Daniel also takes us through his journey prior to Ivy, the walls they hit in the labs, and the lessons they learned overcoming them. Talking about Ivy’s growth, he shares their success on GitHub and Discord and where they see themselves heading in the future.
Tune in to find out how Ivy uniquely allows you to bring any model, library, or function into your project with one line of code.
—
I’ve got a super interesting guest, partly from a personal point of view. Daniel Lenton, who is CEO of Ivy, did almost exactly the same field of study in his PhD that I did in my Master’s 27 years ago. Daniel, welcome.
Thank you so much for the kind introduction. It’s great to be here. I didn’t know that. Were you doing robotics and things like this?
Let’s start with that. I studied artificial intelligence and software engineering between 1993 and 1997. I then did an MSc in Cognitive Science, which was the first year that the University of Birmingham put on that course. That’s how new it was. It was this pretty new field of, “Let’s get people together from the Computer Science faculty, the Philosophy faculty, the Linguistics faculty, etc. Mash them all together and try to figure out if that helps move things forward.” At the time, it didn’t do anything because it was the deepest and darkest days of the AI winter. My Master’s thesis had 60 perceptrons in the neural network.
Was it a single-layer network, or was it multi-layer?
It was a three-layer network, and I had to write that entire thing by hand in C++. The most I could wring out of it was a 486 at the time. We had a SunOS workstation, and you had to go and set the thing training at a nice level, and then come back in the morning and hope you hadn’t pissed too many people off by consuming resources.
That’s when neural nets were uncool.
Do you want to tell us about your academic background and what you’ve been working on?
First of all, I got into neural networks after they were uncool, so I don’t quite have that same claim, but I got in relatively early. Quick background: I initially did Mechanical Engineering at Imperial College. I do like engineering. The reason I went into that is because I was probably won over by some very good marketing at my high school: wind turbines, a green future, and all these things that I was quite passionate about. I went into it but realized I wasn’t as into aerodynamics, planes, and gearboxes as I was into the coding side of things. We did a super high-level course on MATLAB and I was like, “I like this a lot more.” I liked watching Python and C++ YouTube videos even though there weren’t courses for those.
I did hobby robotics projects and things in my summers; that was three months of summer. It wasn’t easy to change over to a Computer Science degree because, in the UK, you can only have one extra year of student loan, and by that time it was too late for me to change. I stuck with engineering, but I did everything in my power to choose the most robotics and computer-science-related options that I could; there was a little bit of room for that. I was quite fortunate that my master’s supervisor pretty much let me self-supervise my project because it was in an area they weren’t a super expert in, so I needed to dive deep into vision myself.
Anyway, the master’s finished, and thankfully, on the back of all of that, I leapfrogged into a Computer Science PhD in robotics, also at Imperial College. This is where I was in my element, in a robotics lab with Andrew Davison. Starting out, I was quite good at coding, and we used a bit of C++ with these Xbox Kinect point-cloud cameras. We used a bit of Python too, but nothing like proper, complex software engineering, because it was always one project hacked together over a few months to get something working. I met some amazing software engineers in the lab and learned a ton from them.
This is quite a long story. We were using Caffe at the beginning. I joined in 2016, when AI was on the rise but it was still pre-transformer-paper. We were using C++ and Caffe to start with, then TensorFlow when that came out in 2016. Probably half of us moved to PyTorch in 2017. I interned at Amazon on their drone program in 2018, where we were using MXNet. Others interned at DeepMind, where they started to use JAX around 2019. We had people joining from Japan who were using Chainer, another framework.
There was this huge explosion of frameworks. It was hard for researchers in our lab to collaborate because everyone was doing the same fundamental 3D vision, robotics, etc. There were a lot of shared themes and shared knowledge, but no shared code bases, because of the fragmentation. I found this annoying. It was a nightmare for new PhD students, and I wanted to do something about it.
I ended up diving deep into this project called Ivy, which initially enabled the creation of framework-agnostic libraries. It wasn’t about converting code or anything like that. It was about writing a library for mechanics, computer vision, or robotics once, in a way that was abstracted such that it could work with any of the frameworks we were all using. We could have these shared code bases and shared models. That worked well.
As time went on, I felt as though the fragmentation in the lab was reflected in the entire field, and nobody seemed to be taking the problem too seriously. Things like ONNX are great but still have a lot of downsides in my opinion, so I ended up diving even deeper down this engineering rabbit hole, at the expense of the research that I’m also passionate about; in another life, I’ll keep doing that as well. I became obsessed with this interoperability problem and certainly still don’t look back with any regrets, because it’s quite a fundamental problem that could benefit a lot of people. That’s super exciting to me.
The company was formed at the beginning of 2022. At that time, it was a handful of stars on GitHub with myself as the sole contributor, not too much to look at. In the last several months, we’ve gone from a few hundred stars to getting on for 11,000 stars. We have over 11,000 in our Discord now and nearly 1,000 contributors, people meaningfully contributing while it’s still pre-release. The actual first release is coming out in June 2023. I’m pretty confident about that, so I’m not too scared to say it’ll be coming out.
Going back to my experience when I was working with neural networks, it was a virtual robot. Back then, you wrote the code for a perceptron yourself, and object-oriented programming lends itself quite well to a neural network. It’s very hard for people not directly working within the field of AI and machine learning to follow. You mentioned the transformer paper; I know that it’s important and seminal, but I don’t quite understand what it means. First of all, if we decompose things, there’s a model that comes out at the end that you can save and send to someone and they can load. There’s the engine that runs the network at runtime, and then there’s a language you write in to design the network. How do these pieces fit together?
It depends on how granular we go, but on those three points, yes. My way of thinking about the model is quite abstract: it’s pure mathematics. The model is a mathematical function that’s completely devoid of any hardware or engine. It’s purely numbers, additions, and matrix multiplications. This is why you can imagine taking ChatGPT and reimplementing it in a new framework, because ChatGPT is, under the hood, a big, complex mathematical function. It transcends the medium on which it’s being executed.
Conceptually, we can think about it as a very big equation.
It’s a super complex mathematical equation. These things all overlap a bit. Engines are more general than just machine learning. There’s this big software hierarchy in general, a lot of which is not machine-learning-related: hardware, CPUs, assembly language, compilers, and LLVM, the compiler infrastructure that tries to unify different hardware vendors.
At some point, you typically get up to a C++ layer; pretty much all of the machine learning frameworks are compiled from C++. On top of the pre-compiled, pre-written C++ kernels, which in the case of the GPU include pre-written CUDA code to parallelize these kernels on the hardware, you have these high-level packaged functions, like matmul. For those who are slightly into programming, anyone familiar with NumPy, this is the level we’re talking about when we talk about these frameworks.
Matrix multiplication, addition of entire arrays, subtraction of entire arrays: exposing these element-wise functions at this API level and composing them together is the machine learning framework. That is what PyTorch is. PyTorch, TensorFlow, JAX, arguably NumPy as well, and several others then form this fragmented set of frameworks.
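To make that API level concrete, here is a minimal sketch (not from the interview): the same mathematical function, a matmul followed by an element-wise add, expressed against two of the fragmented framework APIs Daniel names.

```python
import numpy as np
import torch

def model_numpy(x, w, b):
    # matrix multiplication plus element-wise addition, NumPy style
    return np.matmul(x, w) + b

def model_torch(x, w, b):
    # the identical mathematics, PyTorch style
    return torch.matmul(x, w) + b

x = np.random.rand(2, 3).astype(np.float32)
w = np.random.rand(3, 4).astype(np.float32)
b = np.random.rand(4).astype(np.float32)

out_np = model_numpy(x, w, b)
out_pt = model_torch(torch.from_numpy(x), torch.from_numpy(w), torch.from_numpy(b))

# same function, same numbers, two different APIs
assert np.allclose(out_np, out_pt.numpy(), atol=1e-6)
```

The model itself, as Daniel says, is just the mathematics; only the surrounding API differs.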
Ivy is not only trying to unify the frameworks but also things below the frameworks, which include the compiler infrastructure and the hardware. We see an increasing number of hardware vendors coming onto the scene, an explosion of them, as well as new compiler infrastructure. Many of these are very specifically focused on AI because of how big a market it is.
There are a lot of hardware vendors popping up that are not making super general computers like CPUs, but things that are good at executing neural networks in different memory and performance regimes. We want all of those interconnected so that you can always have access to the best combination for the job, whatever that might mean for you.
Another thing I’d say quickly, because you mentioned programming individual neurons and things like this. I have no idea what the status was back in the ’90s, but now at least, a matmul is what happens: most networks represent a fully connected layer such that all of the layer-1-to-layer-2 fully dense interconnections are represented as a single matrix multiplication. At least for the user, there’s no individual programming of neurons and their connections; that’s what matmul is. Sometimes when people see a big network diagram, you might think each weight exists explicitly in its own separate object or something, but it’s just a big matmul.
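A small illustrative sketch of that point: the per-neuron view and the single-matmul view of a fully connected layer compute exactly the same thing.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)        # activations of layer 1 (8 neurons)
W = rng.standard_normal((8, 4))   # all 8 * 4 = 32 connection weights at once
b = rng.standard_normal(4)        # biases of layer 2 (4 neurons)

# Per-neuron view: each output neuron sums its weighted inputs.
per_neuron = np.array([x @ W[:, j] + b[j] for j in range(4)])

# What frameworks actually do: the whole layer is one matmul.
whole_layer = x @ W + b

assert np.allclose(per_neuron, whole_layer)
```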
There are several layers of abstraction going on here that keep getting built on top of those things.
Also, for anyone reading along, we have a page which is basically our related work section. It’s Unify.ai/docs/ivy/overview/related_work.html. If you go into our docs and type in “related work,” there’s a big image where we show this stack at quite a granular level. Certainly, as you’re saying, even that image, which has 8 or 9 levels in the hierarchy, is a simplification to some extent.
In terms of the actual hardware that’s doing the learning for these models, and then running them once they’ve been trained, has that hardware been successfully abstracted away for several years now? Is it that you almost don’t care whether you’re running on the video card in my computer or some specially designed ASIC in Google Cloud somewhere?
To some extent, it has. Each framework wants to make things as simple as possible, but they don’t necessarily have aligned incentives to connect everything equally well. Take TPUs, which are Google’s Tensor Processing Units. TensorFlow and JAX, which are both frameworks used extensively inside Google, have much better support for TPUs.
PyTorch now has its own foundation, but it has its origins much more with Meta. There is something called Torch XLA, but there isn’t the same incentive for the main people advocating for and supporting PyTorch, a lot of deep learning teams at Meta and so on; they’re not super bothered about making sure that Torch supports TPUs well. So when you go down the stack, there are all these layers with sparse interconnections between them. Every framework has the ability to go down to the hardware without you caring about it; they just don’t have access to all of it.
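As a hedged illustration of those framework-specific paths down to hardware, here is how two frameworks expose device placement through their own, non-interchangeable APIs. Which devices actually exist depends on the machine, so treat this as a sketch rather than something that runs everywhere.

```python
import torch
import jax
import jax.numpy as jnp

# PyTorch: explicit device strings and .to(), with its strongest
# support historically on CUDA GPUs
x = torch.randn(2, 2)
if torch.cuda.is_available():
    x = x.to("cuda")

# JAX: device_put onto whatever backend is present (CPU, GPU, or TPU)
y = jnp.ones((2, 2))
y = jax.device_put(y, jax.devices()[0])
```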
Our thinking is, if you don’t have access to all of them and you’re working with a sparse subset, there’s a high chance that you have a suboptimal solution for whatever you care about, because there are a lot of great things by other people that you don’t have connections to. We’re taking a connecting approach. I also wanted to say, because it could be easy for me to paint a picture where everything’s fragmented, everyone just wants lock-in, and we’re the only people trying to unify anything: that’s not the case.
There have been standards emerging at various levels, some a bit lower down. There’s something called the Open Neural Network Exchange (ONNX), which is trying to create a standardized representation on top of hardware vendors. There’s also OpenXLA, released by Google, which again is technically vendor-agnostic, but it’s mostly relevant for TPUs right now.
What we are doing differently is, rather than saying, “Here’s the standard. Now everybody can plug into this,” we’re saying, “No, we’re never going to get consensus. Let’s proactively bind into everything, so we don’t need consensus and we don’t expect or need anybody else to do work for us. We’re going to make sure that every conceivable endpoint is connected as part of our interconnected web.”
We’re going to do it in such a way that it’s scalable, it’s not too much work for us, and our efforts are arguably aligned with everyone else’s, because we’re bringing them along for the ride and making their work interoperate more easily with everything else. There’s no other tool doing that in quite the same way at the moment. There are others I could mention as well; I’m probably adding more complexity, but hopefully it’s helpful.
It is strange because it feels like I’m peeking through into a different dimension. I understand all these layers in a SaaS, cloud computing world, but it’s amazing how quickly this feels like peering into a different world. All these concepts and abstractions are mind-boggling. Prior to Ivy, what would be a typical wall that you’d hit in your labs?
I have to be honest and say it was a slightly different time. In our lab, the origin of Ivy was directly addressing a deep fragmentation problem of the frameworks at the research level. That particular part of the fragmentation has to some extent subsided. We had people using TensorFlow, JAX, PyTorch, and Chainer, and some still using Caffe because they could have more control over memory management, but now the research community, at least at Western universities, has converged more towards PyTorch.
There are a lot of things Ivy can be used for, but to pick one, one of the things Ivy will be very useful for is taking existing PyTorch code and making it run faster. There are a lot of demonstrations of this. If you look at Hugging Face’s posts and efforts, a common theme is that by reimplementing very popular PyTorch models in JAX, they can get almost, or even over, a 70 times speedup, which is huge for enterprises deploying these things.
The recipe is: use JAX instead of PyTorch, run it on a TPU instead of a GPU, do a few tricks that only JAX offers, and you get this huge speedup. This is just one example, but you can have a sophisticated toolchain under PyTorch, the main framework with a lot of clever people working on it, and still there’s a much bigger world down the stack that doesn’t connect to all these things. If we can unlock that, there’s a lot of value to create. That would probably be quite high on the list.
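For readers curious what “tricks that only JAX offers” can look like, here is a minimal sketch of one of them, whole-function JIT compilation via XLA. The actual speedup depends entirely on the model and hardware; the 70x figure above comes from the Hugging Face work Daniel mentions, not from this snippet.

```python
import jax
import jax.numpy as jnp

def mlp(x, w1, w2):
    # a tiny two-layer network, expressed as pure functions of arrays
    return jnp.tanh(x @ w1) @ w2

fast_mlp = jax.jit(mlp)  # traced once, compiled by XLA, then reused

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (64, 128))
w1 = jax.random.normal(key, (128, 256))
w2 = jax.random.normal(key, (256, 10))

out = fast_mlp(x, w1, w2)                      # first call compiles
out = fast_mlp(x, w1, w2).block_until_ready()  # later calls run the compiled kernel
```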
More broadly, another thing a lot of our early users are interested in is flexibility in prototyping. What Ivy uniquely enables is for anybody, with one line of code, to take any model, library, or function from any version of any other framework and bring it directly into their project. An example: let’s say your entire project is in PyTorch, and you have your own trainer classes and data loader classes for experimenting.
Now DeepMind releases a new state-of-the-art model that gets a Nature paper and it’s amazing. Everyone’s jaw drops, but it’s all in JAX, because DeepMind publishes in JAX. You can’t quickly try this thing out in your pipeline. You can painstakingly reimplement it, but the devil is very much in the detail in machine learning, so subtle deviations are hard to detect.
You don’t have full confidence unless you spend a decent amount of time on it, and you’re bottlenecked by your own implementation efforts. In reality, there are a ton of amazing things coming out all the time. If you could get these things converted over in a few seconds, you could try many more ideas much more quickly and get better products to market faster. So another big point is this speed-of-thought experimentation that we unlock with Ivy. I could list more, because what we’re doing is quite a fundamental piece of tech; it’s not verticalized at all, and there are a lot of different ways people are interested in using it, but these are some of the top ones.
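As a hedged sketch of that “one line of code” workflow: Ivy was pre-release at the time of this conversation, so while the ivy.transpile call is mentioned later in the interview, the exact signature and the imported module here are assumptions for illustration only.

```python
import ivy
import torch

# hypothetical: some newly released JAX model you want inside
# your existing PyTorch pipeline
from some_jax_repo import jax_model  # assumed module, for illustration

# the "one line": convert the JAX model into a PyTorch-compatible one
# (argument names are an assumption about the pre-release API)
torch_model = ivy.transpile(jax_model, source="jax", to="torch")

# it then drops straight into your existing PyTorch trainer
x = torch.randn(1, 3, 224, 224)
preds = torch_model(x)
```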
Let’s say I’ve got something running in Stack A, and then I read or hear you say, “I can get a 70X improvement if I use Ivy to port it to Stack B.” Are the outputs coming out of Stack B guaranteed to be exactly the same as the ones coming out of A?
They’re not exactly the same down to the exact 1s and 0s, because our abstraction level and our intermediate representation sit at the level of the mathematical functional API, quite a high level. Mathematically, tan or sin or cos are well defined, but at a low level there can be approximations that differ slightly. We have extensive unit testing to verify that, within a very small rounding error, we get the same results.
What we’ve also shown, and this was something we showed in our ODSC West talk in November 2022 and will also be one of the demos released in a few weeks, is that this holds for a very large model. You can take a DeepMind model, Perceiver IO, a very deep perception model, and transpile it to JAX, PyTorch, etc. When you pass the same image in as input, you get exactly the same predictions. It knows that it’s a dog with the same confidence and so on.
Even with these rounding errors, and it’s true that we never get exactly the same floating-point representation for any output, the errors don’t propagate to create noise at the end. It’s very stable, which is a good thing. The user can also test this themselves: if they have their PyTorch model and they do this one line of code to convert it to JAX, which is now 70 times faster, they can put the same input in, do a quick unit test, and make themselves confident. We verify that as well.
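A minimal version of the quick unit test being described: feed the same input to the original and converted models and check that the outputs agree within a small floating-point tolerance, rather than bit-for-bit.

```python
import numpy as np

def outputs_match(model_a, model_b, x, atol=1e-4):
    """Check two models agree on the same input within a tolerance."""
    out_a = np.asarray(model_a(x))
    out_b = np.asarray(model_b(x))
    # allclose, not equality: low-level approximations of sin, cos,
    # tanh, etc. legitimately differ in the last few bits per backend
    return np.allclose(out_a, out_b, atol=atol)
```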
It’s more about having a certain confidence that they’re behaving the same.
Another thing with Ivy is that it’s very good for converting for training as well, whether you want to re-finetune the model or take the architecture and retrain it. The rounding differences are completely irrelevant in that case anyway, because the mathematics is correct and training will hone away the slight deviations. From our experiments, the loss in accuracy from doing this translation is something in the order of a fraction of a percent, maybe 0.2%. For many cases, a 70 times speedup for a 0.2% drop, particularly when that can be recovered with some finetuning, is a worthwhile tradeoff.
I was looking at your star history. You guys have done a cracking job; you’ve come out of the gate and smashed it. What was the story in terms of the first line of code, and then when it started to get put up on GitHub? Was that you by yourself trying to solve this problem that you had locally?
The truth is, when the company was first formed, there weren’t resources; I could never have built Ivy into what it’s meant to be on my own. I had the idea out there and I had version zero, basically. When I first put it out there in 2021, I was still finishing my PhD and it was a part-time thing. I started to take it more seriously in January 2022, which is also when we got our first small investment.
We indexed most computer science departments in the world and got emails sent around for internships. We put job ads on every job board we could find. We weren’t worried about only trying to get people from top universities. We were like, “No, let’s get as many people to know about this as possible,” because my hypothesis is, and continues to be, that computer science talent is relatively uniformly distributed across the globe, much more so than in fields that depend on insider knowledge.
Like law: you can’t get that insider knowledge unless you go to certain law schools or something, which is very bad for social mobility. In comparison, software engineering is thankfully relatively democratized, because internet access is increasing all over the globe all the time, and there are amazing free resources to learn all of this.
We just wanted to get as many people hearing about this as possible. We have a fully remote setup, so someone’s location doesn’t concern us too much. We’re trying to get the best people regardless of where they went to school or what their CV looks like; we pretty much disregard the CV. All of this is to say that this was a bit of a catalyst, because a lot of students started hearing about this and applying. That got the ball rolling, and now word-of-mouth continues to push things forward, because we don’t do that outreach as much anymore.
I think that was a bit of a catalyst. People said, “This seems cool,” and told their friends about it, and now we have a large community of 11,500 on Discord where everyone is very excited for the release to come out. Another thing, just to put the cherry on top: it helps that our project is embarrassingly parallel.
This is a common thing that we haven’t talked about explicitly on the show before. There are some open source projects, especially ones with a plugin concept where 80% of the code base is plugins; I always think Home Assistant is the perfect example. They’ve got thousands of contributors. Then you’ve got others, like Flagsmith, where the work is very serial and that makes it difficult. You can’t choose that; it’s the hand you’re dealt.
Ivy is like that too; it’s quite modular, and everyone can contribute something. We benefit from that, and it helps because every applicant needs to make a pull request, and that pull request needs to get accepted. It’s the first phase of the assessment. Pull requests aren’t compared against each other; it’s just an entry bar, and then the actual assessment starts, because every pull request is different. It’s real, meaningful work, so comparing them would be apples-to-oranges anyway.
Some people will luckily get a pull request that’s pretty quick; some are a bit longer, but it’s still a baseline that gives some entry bar. That means we have 1,000 contributors, partly through that application process and partly through people telling their friends. Adding a new pull request to Ivy can take as little as 30 minutes, because adding one function to one framework’s frontend or backend can be done quite quickly.
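As a heavily hedged sketch of what such a one-function contribution can look like: frontend functions in a project like this are typically thin wrappers that map one framework’s API onto the unified functions. The file layout, decorators, and exact names in Ivy’s codebase are assumptions here, not its documented internals.

```python
import ivy

# e.g. contributing torch.sin to a PyTorch frontend: a thin wrapper
# that routes the call through the unified ivy function
def sin(input):
    return ivy.sin(input)

# a slightly composite example in the same style
def sinh(input):
    return ivy.sinh(input)
```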
It’s this thing that people can just plug into, which helps to increase people’s sense of involvement, but also their actual involvement. It helps everybody feel part of the community, and they genuinely are part of it, because we could not have done this without such a large team of people working on it. It’s the perfect project in that sense.
Have you found that commercial organizations have been contributing to it, or is it more hobbyists or people who have a specific problem that they want to solve?
For now, it’s predominantly hobbyists, researchers, and students. This will change when the release comes out a few weeks from now, but the version that’s available publicly is a version that requires people to write quite a lot of Ivy code, which isn’t that commercially relevant right now. What I suspect will change once Ivy is properly usable: the whole ability to convert code between frameworks, to deploy very efficiently, to take a PyTorch model and run it on JAX; these are not publicly accessible yet. Once people are using it more seriously, there’s this property where it’s very easy for people to fix their own bugs.
Let’s say you want to run a PyTorch model in your organization, which could be an enterprise. You want to run it 70 times faster, and you want to do that right now with one line of code, but there’s one function missing in the PyTorch frontend or one function missing in the JAX backend. When you try to do ivy.transpile, it’s going to say, “This almost worked. These two things are missing. Here’s where to open a pull request.”
Once you do that, you set bleeding-edge mode to true so that you’re not bound to the pip version and can pull straight from the repository, and then it’ll work. You’re only bottlenecked by getting those PRs accepted. This is where we can start to increase commercial engagement and contributions, because people are fixing their own bugs.
At the moment, the community is more people for whom the story resonates and who want to be part of the solution. We don’t have meaningful usage yet; people’s engagement is, “This is awesome. I want to be part of this movement.” They’re contributing towards that first major release.
Looking at your GitHub, there are 1,700 issues and nearly 200 pull requests. How are you managing that? I mention this from time to time: the HashiCorp Terraform folks gave up. They admitted, “We’re not going to accept any new issues or pull requests.” It doesn’t feel like your numbers are quite there yet, but in terms of the tooling, I’m always curious to know: at that scale, do you feel like there are tools missing from GitHub that would be helpful?
I can give you a bit of an overview of our process. First of all, we have about 30 engineers on the team at the moment, and all of them have a recurring responsibility to review pull requests. In the beginning, I reviewed them all because it was only me. Then the first intern we hired was responsible for reviewing them with me. At some point, I stopped doing it myself because I had other things to do. Now, every engineer on the team does it, which means it parallelizes without anybody spending all day reviewing code.
It’s something people like to do, and we very much frame it as engagement with the community. It’s not always the most efficient way to get the work done; quite often it isn’t, but it’s long-term valuable because we’re onboarding people and explaining things. Whether or not they get an internship, they become familiar with the process, and there’s someone else in our community who’s meaningfully contributed. We want the experience to be as good as possible for them, and people enjoy doing it as well.
In terms of the other processes, we’ve done a load of bespoke stuff. We have a GitHub enterprise plan, and a lot of people on the team use GitHub Codespaces, for example, because environment setup became a nightmare. I personally use PyCharm with its Docker integration just so we can all have the same consistent environment, but that needs a paid plan and some people find that annoying, so we ended up going with Codespaces. We use every single minute of our CI; it’s crazy.
We have so many tests. We use Hypothesis for property-based testing, and we have a whole team dedicated to maintaining the CI. We have a MongoDB account to store the test results, with dynamic dashboards built in React to view it all. We’ve put a lot of effort into our tools. There are also a few things that are a bit more AI-based. One thing we’re working on, and it’s not super secret, is exploring using AI to automatically create pull requests, because our tasks are incredibly modular and quite self-contained.
We’re training our own LLMs, a bit like Codex or StarCoder but a bit different, trying to make them write the frontend implementations and run our tests. It’s very easy for us to make the tests because they’re consistency-based, using Hypothesis; we don’t write the test cases by hand. The AI has access to the test, the failures of the test, and the stack trace, and it can iterate on its own previous implementation. If it gets stuck, then hopefully we at least have a good starting point, a good template for an engineer to pick up the pieces without having to write the whole thing from scratch. This is something that will accelerate the workflow if we get it right.
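A small sketch of the property-based, consistency-style testing described here: instead of hand-writing cases, Hypothesis generates arrays, and the test asserts two backends stay consistent on them. The specific function under test is illustrative, not one of Ivy’s actual tests.

```python
import numpy as np
import torch
from hypothesis import given
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays

@given(arrays(np.float32, (3, 3),
              elements=st.floats(-10, 10, width=32)))
def test_sin_consistent_across_backends(x):
    # Hypothesis generates the inputs; we only state the property:
    # the two implementations agree within floating-point tolerance
    expected = np.sin(x)
    result = torch.sin(torch.from_numpy(x)).numpy()
    assert np.allclose(expected, result, atol=1e-5)
```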
Another thing we did find a bit annoying: we wanted to use Issues, but we never found the interface that nice. We wanted to use Discussions, but never found that interface that nice either. It also ended up fragmented, because we also wanted to use Discord. We use Discord for all of our internal chat on private channels, rather than Slack or something, just so we’re all one closer community.
If a question or discussion is about public, open source code, then we always encourage the team to have that conversation in a public channel, not a private one, to close the loop with the community. But having another discussion on GitHub, and then maybe Google Chat on top, was too much, and the interface wasn’t great. There’s nothing that screams at us as missing from the workflow yet, but there are always things that can be improved. That’s how I see it.
There’s a huge amount of potential value creation happening in a very short period of time, especially with OpenAI. In the SaaS world, companies are very open and sharing in terms of open source tooling. Where there’s a commercial entity from the start, are people still a little bit guarded, as in, “That feels like valuable IP and I don’t want to share it,” or is that unfounded and people are super sharing?
I’ve been following all of that. In AI, it varies. Hugging Face is super sharing, although I’ll say there’s a slightly different license on some of their new stuff. Broadly speaking, it’s all open source models, and it’s their engine for running things that makes money. Their value creation has been through community mindshare more than revenue.
Increasingly so. One thing I’d say is that there is a trend as AI moves away from research lab spinouts and hobbyists, which is all it was only a few years ago. Back then, everything was open sourced; that was how you got attention, because everyone in the field was a researcher. If it’s not open source and they can’t play with it, people aren’t going to care about it.
It’s also because research has held a lot of power. The reason Google, Meta, OpenAI, and all these companies court researchers is that the thought leaders in AI are predominantly, first and foremost, researchers pursuing research careers, and what motivates people there is much less the money. It’s partly activism and genuine curiosity, but it’s also the prestige of their name, etching it into the permanent record of research history.
Certainly, when I was thinking about jobs a few years ago, if one company says, “You can keep publishing papers and have a high salary,” and another says, “You’re building an in-house product, there’s no public record of what you’ve done, and you just have to rely on a reference from your supervisor,” the latter is less attractive. As things become properly commercial, maybe that swings a bit, partly because at these top companies the money is on another level.
Another point is that maybe you don’t need the top researchers to be in the top enterprises to make a ton of money. Even if all the researchers and creative, thought-leading people want to stay somewhere open, doing open source, and AI therefore remains fundamentally open source at the cutting edge, that wouldn’t stop companies from commercializing the research of a few years before. ChatGPT, to some extent, is oldish research with mass scaling and some tricks. It’s not as simple as that, but it might not even be necessary to have thought-leading researchers in those institutions for them to do massive things and become big companies.
It’s a turbulent, transitional time, it seems. A lot of AI companies are not open source at all, but many still are. Open source alternatives to things like ChatGPT are doing very well. Google had an employee memo saying they have no moat, and I can see that there’s very little moat apart from perhaps the branding, the user experience, and the initial mindshare. If someone can make another version of ChatGPT but cheaper, then open source is going to be an important part of the story. That was a bit meandering.
Do you remember going from an iPhone 3 to an iPhone 4? It was like, “This is amazing.” It feels like we’re very much there with the AI models; the difference between 3.5 and 4 for ChatGPT was huge. If that slows down, then things might calm down a little bit, do you think?
You should take my opinion with a pinch of salt, because I’ve been so deep in infrastructure these last few months that I’ve not been able to unpack what’s happening at the research frontier. It could be the case that we’re running out of data and therefore hit a ceiling, and we fundamentally need new algorithmic innovations.
Maybe the reason GPT-4 was a bit underwhelming compared to ChatGPT is that the exciting thing about ChatGPT wasn’t the technology; the technology was well-known in the research community. It was the super simple interface. We’d been using GPT-3 for a couple of years for various things, like helping to respond to questions, but only ever with an API key. Suddenly, the number of people who could engage with this probably multiplied by 1,000. That was a huge watershed moment.
GPT-4 is a lot better in lots of ways, but it’s not quite the fundamental shift in human accessibility to AI that ChatGPT was. There’s still a lot of progress at the compute and hardware level, NVIDIA GPUs and everything. With the amount of investment now flooding in, I feel as though computing resources will get cheaper and more efficient, because there’s this sense that AI is the new Bitcoin, the new crypto.
There are going to be a lot of amazing capabilities unlocked because of that, and I’d still expect exponential growth. I also don’t think AGI is five years away. It depends on how you define AGI, but there’s going to be something totally indistinguishable from a person in conversation within the next few decades, for sure. I don’t think we’ve hit a ceiling, personally, but I could be wrong.
Do you worry that there’s potentially too much money still going into the field?
What there most likely needs to be is regulation of products. The research shouldn’t stop. There’s an interesting talk with Yann LeCun and Andrew Ng that I watched where they said the same thing. I feel like stopping the research is not good, and how would that even work in practice? Russia or China are only going to be a few years behind in the grand scheme of things. If we stop, others won’t, so what would that achieve? It will continue. What we need to do is think very carefully, not about stopping the research and scientific discovery, but about the policy around the products.
Misinformation is a big potential application of it. Then again, I also don’t think it’s necessarily quite as existential as it might seem, because there’s already a huge suppression and indoctrination machine in many countries anyway; they’re already meddling with elections. That’s probably where the risk comes from: the ability to detect when you’re being manipulated gets harder, and there’s this huge parallelization of effort where, if a country wants to meddle in another country’s election, it can synthesize millions of people online super convincingly, flood likes, and pile comments on top. If the internet as a whole cannot distinguish who’s real and who’s not, then that’s a big issue.
I have a pretty uninformed view and haven’t thought too much about it, to be honest, but my gut feeling is that one of the big safeguards will be good authentication, where the “I’m not a robot” button you click on Google and all these things get way more strict. You can’t have robots or AI agents going out doing crazy stuff en masse on the internet, because that is existentially worrying for human beliefs; we’re so malleable. If someone wants to use AI to sway elections, they can already do that easily, without superintelligence and without newspapers.
I do think the human captchas are getting more and more surreal as well. That’s fun.
I imagine maybe what you need is an iris scanner. Maybe every time you log on, it needs to open up your camera. There needs to be some super secure way that guarantees it’s a real camera scanning your iris, or a touch sensor. That level of assurance will need to go up, and we need to make sure there aren’t millions of robots manipulating human consciousness and human opinion.
Can you explain the transformer algorithm to me, or the paper, in two minutes? What was so seminal about it?
I can, but I haven’t gone through it in a while. It enables you to learn keys and values so you can attend to yourself. It’s like, “Here’s the information I have. Now I want to attend to parts of this information well, and I want to learn how to do that.” When you do a dictionary lookup, you have a mapping of keys and values, and you take the ones that are most interesting and use those. What attention does is make a soft, differentiable version of that. Imagine you have a dictionary, you look up a key, and you take that value. That’s a hard, discrete process where you just take 1 number, or 5 numbers, out of a dictionary of 500.
Instead, you can do a soft approximation, a differentiable version of that, where you read from all of them a bit, reading from some more strongly than others. It’s a soft version of attending to all of your information, learning the keys and the values, and then creating a new matrix based on these. That’s my high-level version of it. I’ve not read the paper in two years, so I could give something much better if I read it again, but this is the high-level gist: learning to attend to your own information, and learning those weights.
This is structural within the network model, is that right?
Exactly. It’s handled with matmuls and then a dot product or something; it’s all within the network itself. The actual architecture of the network has these blocks in it. No matter what you do, the architecture and the mathematical control flow contain these processes, and you are necessarily learning the weights of those processes. This is how the reasoning must be done; it’s the substrate upon which the reasoning can be implemented. All the learning can do is modify those weights, one step at a time: we’re attending to our information, and the learning figures out how best to do that to solve the problem.
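For readers who want the “soft dictionary lookup” made concrete, here is a minimal sketch of scaled dot-product attention: instead of picking one value by key, softmax produces a weighting over all keys, and the output reads a little from every value, most strongly from the best matches.

```python
import numpy as np

def soft_attention(Q, K, V):
    # similarity of each query to every key (all matmuls, as described)
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # softmax turns hard selection into a differentiable weighting
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # read from all values at once, weighted by relevance
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 16)) for _ in range(3))
out = soft_attention(Q, K, V)  # shape (5, 16)
```

In a real transformer, Q, K, and V are themselves learned projections of the input, which is the “learning the keys and the values” part Daniel describes.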
Finally, what’s next for Ivy, both as a project and as a business?
To be honest, we’re not running before we can walk; we’re only thinking about the next few months ahead, certainly the rest of this year. We want to provide, and verify that we’re providing, a lot of value. We’re in a position where we’re not desperate to jump on a particular business model, and we’re blessed to be backed by investors who understand that this could be quite transformative. First of all, we want to verify that this is of general use across the field, get adoption and mindshare, and not worry too much about the other things first.
We’re thinking about that as well. We want to release this in June 2023. Once it’s out there, we’re also working with a lot of amazing projects as part of our Google Summer of Code this year, 2023. We’re working with Hugging Face, implementing some Ivy models in the Hugging Face repository, and with Kornia, PyTorch Geometric, gradSLAM, and ten others. Only three of them were officially selected, but we’re going to partner with all those projects and many others anyway. The idea is to get a lot of blog posts written alongside these amazing projects, and maybe get notebooks onto their master branch and our master branch, and so on.
Just as an example of what one of those might look like, take Kornia. Kornia is an amazing computer vision library for PyTorch. What we can do is say, “There’s a new function called Kornia.TensorFlow or Kornia.JAX,” and suddenly the entire vision library and all of its hundreds of functions are usable in your framework of choice. We can give that project a high-level function which simply wraps ivy.transpile. This could be quite powerful to demonstrate en masse: with ten lines of code per pull request, we can make every project out there instantly framework-agnostic. We want to see how far we can go with that, how useful it can be, and how people use it.
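A heavily hedged sketch of what such a ten-line wrapper could look like: the function name, the ivy.transpile arguments, and the idea of transpiling a whole module at once are all assumptions for illustration; the real integration may differ.

```python
import ivy
import kornia

def kornia_to_jax():
    # hypothetical wrapper: transpile the whole PyTorch library so its
    # functions become callable on JAX arrays (assumed API shape)
    return ivy.transpile(kornia, source="torch", to="jax")

kornia_jax = kornia_to_jax()
# e.g. kornia_jax.color.rgb_to_grayscale(jnp_image)
```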
We’re also getting design partners. We’re working with some enterprises and getting early feedback from their engineering teams on how they’re finding the features. We’re going to get it out there, see what people are doing, ask how they’re finding it, learn what value we’re creating, be an absorbent sponge, and iterate. Hopefully, we keep refining the things that are good and useful.
It’s always tough, especially when you’re growing at the rate you are. It’s super hard not to feel constantly on the back foot, tilting backward off the back foot.
Completely, in a field like this. The AI hype has been real for a few months now, so there are a number of companies with these broad, sweeping, hugely general pitches, “We’re the X of Y,” and also all these much more verticalized things coming up, like training LLMs specific to one company. We’re doing a much more general infrastructure play, and that’s a strength as well: it’s super broadly applicable, it can do lots of very different things, and it can be used in different ways. Finding the right verticals to market ourselves with is an ongoing thing, precisely because it can do lots of things.
Maybe it turns out that 70% of the usage is converting PyTorch to JAX to make it quicker. I still think there’s merit, in terms of future-proofing ourselves and adding a moat, in doing things the right way and being truly unified, because PyTorch isn’t necessarily the winner of the whole race, with things like Modular and Mojo around. That’s a whole other topic, but this is how I think about it.
Daniel, thanks so much. That was super interesting; I’m not disappointed in that last hour, personally. Congratulations, and it’s great to hear that attitude of keeping your ear to the ground. Super respect to you for doing that. Thanks again for your time.
Thank you. Likewise. It’s been great to chat with you. Let’s chat again at some point in the future. Thanks.
Important Links
Daniel Lenton is the Founder of Ivy, which is on a mission to unify all Machine Learning (ML) frameworks. Join them on the journey at lets-unify.ai!