Meilisearch
Ben: I’m super excited to introduce Thomas Payet onto the show. Can you pronounce it first so that I don't get it wrong?
Thomas: I'm the CEO and Cofounder of Meilisearch.
Where did the name come from?
We spent a day brainstorming on that. Meili is the God of Forgetfulness in Norse mythology. We are trying to solve forgetfulness with the search engine.
We're interested to learn about your projects and talk about them. In my research before we started, I was looking at your repository. For most of the folks who come on this show, if you look at their GitHub repos and scroll all the way back to the first commit, it's normally a whole, fully formed project that has been imported from some private repository or somewhere else.
Your one is up there. Ansible has got a great first commit as well. Yours is five methods or something. I don't know if you remember the first commit, there's almost nothing there. You can go all the way back to the origins of the projects. Ansible is ten lines of Python. I don't think anyone's going to beat that. Yours is pretty good. What was the cause or the reason for that first commit?
We were working together at a company called Veepee in France, which is a big eCommerce company. We were working on the search engine at the time. It was an Elasticsearch that was misconfigured. We started to work on a good alternative. We wanted our own technology. We benchmarked it, and we wanted something much simpler and more realistic. We began to work on a version in Golang. What is now Meilisearch started as a side project, because he wanted to compete with us on the relevancy and the speed that we were providing on the project, and because he always liked Rust. He has been there since the first years. He has been contributing to the language. Since we started to work on that project, he wanted to compete with us. We saw some achievements in Rust. The first commits come from there.
You were working for a company that let three people go off and build two prototype search engines. Why did you leave that company? That company sounds great.
For full context, we were in a team which has a partnership with our school, 42.
We had this opportunity of working half time on a company's project and half time on our school projects. One of the projects that they brought us was that the search engine was not giving good results. When you are making millions a day in revenue, having a better search can change a lot of things. They wanted to invest some time in it. They give students the opportunity to explore and make proofs of concept for them. If they want to take the project back into the IT department, they do so. We had this opportunity of working for three months on prototyping a search engine. We started to check all the algorithms and tried to reverse engineer what they were trying to explain in our breakfasts.
We had this experience of using Elastic, so there were a lot of things that we knew we wanted and that we knew we didn't want in the search engine we were designing. The direction for us was to prove that we could make a different technology. Even from Veepee’s point of view, the idea for them was to create those kinds of teams and maybe sell that technology to other companies. This is what we had in mind at the beginning. We tried to continue in that way, but at some point, we realized that we could sell our search engine, and what we had learned from building it, and make it available to different companies. Maybe Veepee was too slow for us. It's a big company.
To take a decision about funding, there were two stories. We decided to leave Veepee. We left all the code base. At the beginning, it was about finding some ways to make some revenue because we didn't have a job anymore. From the beginning, the search engine was open source. We are developers. It was obvious to us that the code should be available on GitHub. It’s just that we didn't have any documentation, tests, or whatever. It was just there: because we are developers, it would be open source. We started to work like this, trying to find ways to make revenue. We ended up working with Louis Vuitton.
They became our first customer. They organized a hackathon around search. We had, not expertise, but experience that we could use. We sold them what is Meilisearch now. They are using it now in production, but we said, “We want to be contractors, but the code that we are writing for you will be open source. We will keep everything about the code and it will be open source.” They didn't care about the code. They wanted a product that works in the end. They don't care about the technology. This is the beginning of Meilisearch. It funded the three-person team.
The code was already open source. After a few months of thinking about what we wanted to do with this knowledge that we have, this search engine that we knew a lot of developers might want to use, we realized at some point that our whole strategy should be around open source. It was obvious to us and we had to invest much more time into what makes a good open source project in order to get more people to use Meilisearch, and to get some contributors as well. Also, to make sure that we are building something, not just from our point of view, but with the community in mind and what they need when they think about the search engine.
Elastic has come up quite a lot on this show. I've never gone deeply into it. I worked with it quite a few years ago now, when it was still fairly young. I quite liked it, but I guess before that, there was nothing. Do you think the things that you guys didn't like about Elastic, or found hard to deal with, were in a way one of the reasons that Meilisearch has been successful, in that what you didn't like about it was the same as what other folks were experiencing as well? For a while, for four years or so, it was the only game in town if you didn't want to do something proprietary.
It's an amazingly big project. Even now, it's still the default option when it comes to search engines or search. You don't have to think twice. You know someone who is already using Elastic, and everyone's talking about it. The product is very complete. With the search engine, you can do a lot of stuff. It’s based on Lucene, the search library. Elastic is providing the JSON API and the replication. Still, you have to understand how Elastic works, how you want to configure it, and what is happening.
Elastic can be powerful because you get what you need, but you have to spend so much time understanding what's happening and what you want it to do, because you can do what we are doing with Meilisearch and you can also handle logs, and those are two different use cases. Elastic can do everything, but you still have to maintain it. What we are trying to do with Meilisearch, and this is what we felt at that time, is to build a customer-facing search. We don't want to deal with terabytes of data. It's out of scope for us. We are trying to provide a great search experience for customers that is easy for developers to implement.
You might be used to Google search or the search on Amazon, which are very good, but every time you go on a different website, something much smaller than the major websites, such as an eCommerce website, the search is awful. You have to adapt. You have to try to understand how the search engine is working so you can adapt your way of querying it. You're like, “If I type this, it searches in the title only, not in the description. Maybe this is the data I'm looking for.”
It's 2020, so we can do better. This is where it makes sense to us. We want to use technology. It's one of the most common features in any app and website and still, it's always very bad. There is Meilisearch now. Thanks. It's open source. Any developer can install it. What we want to see in a few years is that we don't want to use a broken search bar anymore. That wouldn't make sense.
You left the company that you were working at. You then got to take those learnings from Elastic and those version ones that you'd built with those guys. It's a perfect opportunity. There's this trope that you should never rewrite things, but if you don't own the IP to version one, then you have to. Every engineer thinks that if they could rewrite everything from scratch, it would be way better the second time. You got an opportunity to do that because you would have been breaking the law if you hadn't. That's a great opportunity. It sounds like you were much more focused on the open source aspects of the project than on any commercial aspects of it. Did you have a plan from day one like, “This is how we're going to make some money and put food on the table?”
The plan was to make a company out of what we could do, building Meilisearch, and that was it. We didn't think about open source. We didn't think, “We should do it.” It quickly became obvious that open source was a way to get our technology known. What we are struggling with now is that we were open source first and now we want to make a business out of it. This is where we face most difficulties, because we are trying things, but still, for the search engine that we want to build, we want every feature to be in the open source package. For the last few months, we have been trying to define the vision that we have for the company rather than for the project.
This is work that we are doing with our investors, because they do trust us to build the best search technology. Now the next step for us is to show that we can build a sustainable business around this open source project. It's a great opportunity to work in open source and be able to think about how to make the project sustainable, because there are a lot of projects that have difficulty being sustainable. The costs cannot pay themselves on the project. This is something that we are thinking about. We are developing a cloud offering for search. We are thinking about how we will provide support and enterprise plans and research. I'm very glad Meilisearch will be open source and that we are building a cool search technology that developers will use.
You were committing code to that repository right from day one. At some point in the near future, there was a product that someone else like myself could use and was useful. How long did that take to get to that point?
We worked with Louis Vuitton for eight months, maybe. By the end of those eight months, we had a working search engine that was not specific to Louis Vuitton. We had in mind that our code was open source and we wanted something that other people could use. That was the time we were thinking, “We've been working with Louis Vuitton, but we don't want to be contractors for long. We don't want to do this forever. What's next?” This is where we thought about making a proper investment in open source. We had to build proper documentation. We had to build a proper README. We had to build a proper project around the code. The code was open source, but now we wanted to have an open source project.
You start to think about how you can onboard people to use your search engine as fast as possible, because it's usable, but only for you. You have to write the quick start and the whole documentation on how it works and why you designed it that way. We started to provide some integrations in Ruby, Python, and JavaScript so that users could use Meilisearch as fast as possible. The issue that we had at this time is that trying a search API can be time-consuming because you have to index your documents and then you have to try to search. Even if you try to search in your terminal or something, you don't have the wow effect that you could have on a web UI with a search-as-you-type experience.
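To make that index-then-search loop concrete, here is a minimal sketch in Python. It is a toy in-memory prefix index, not Meilisearch's engine or API: the `TinyIndex` class, its method names, and the sample documents are all hypothetical, and the matching rule (exact words plus a prefix match on the last word) is an assumption chosen only to mimic a search-as-you-type feel.

```python
from collections import defaultdict


class TinyIndex:
    """Toy in-memory index (hypothetical; not Meilisearch's implementation)."""

    def __init__(self):
        self.docs = {}                    # doc id -> original document
        self.postings = defaultdict(set)  # word -> ids of documents containing it

    def add_documents(self, documents):
        # Step 1: index. Every word of every field is recorded with its doc id.
        for doc in documents:
            self.docs[doc["id"]] = doc
            for key, value in doc.items():
                if key == "id":
                    continue
                for word in str(value).lower().split():
                    self.postings[word].add(doc["id"])

    def search(self, query):
        # Step 2: search. Complete words must match exactly; the last word
        # is treated as a prefix so results can update on each keystroke.
        words = query.lower().split()
        if not words:
            return []
        *complete, last = words
        ids = {
            doc_id
            for word, doc_ids in self.postings.items()
            if word.startswith(last)
            for doc_id in doc_ids
        }
        for word in complete:
            ids &= self.postings.get(word, set())
        return [self.docs[doc_id] for doc_id in sorted(ids)]


index = TinyIndex()
index.add_documents([
    {"id": 1, "title": "Leather handbag", "description": "Small black bag"},
    {"id": 2, "title": "Canvas backpack", "description": "Large travel bag"},
])
# Simulate a user typing one query after another.
for typed in ["ba", "back", "black ba"]:
    print(typed, "->", [doc["title"] for doc in index.search(typed)])
```

Nothing can be searched until `add_documents` has run, which is the friction described above: the interesting part only shows up after the indexing step, and only feels impressive once it is wired to a UI that queries on every keystroke.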
We had to provide all of these so that developers could try Meilisearch as fast as possible. They can have that wow effect, and they get to invest more time in using Meilisearch, maybe giving us some feedback and opening issues. By the end of 2019, we had spent two months working on the documentation, the first integrations, and the README, and then we began to communicate about it. The great thing is that the issue that we were solving with Meilisearch was something quite common for them. Our story was not completely new to other developers. They would quickly understand what we were doing and why we were doing it. People got to try a very fast, intelligent search engine.
For the first eight months maybe, we had 200 stars, but it was organic. People came by the repository and did not try it, but thought, “This is a search engine. Maybe I can come back later.” When we started to invest in the documentation and the integrations, this is where we tried to communicate much more. We grew to 1,000 stars maybe. At that point, Hacker News visibility brought us maybe 3,000 or 4,000 stars. When you get the stars, you don't get the users instantly, because people are just starring the project and, hopefully, they will come back 3 or 6 months later. We saw the usage growing a few months after the stars that we got on GitHub.
We went through a process of finding the people that had starred our project. I'd send them an email or find them on LinkedIn or whatever. We did it all by ourselves. It was completely organic. Half the people wrote back going, “How did you find me? What are you doing?” It backfired a little bit because it freaks people out. You've got that problem where people star your projects. I generally only look over my starred projects if I'm like, “What was that thing that I found the other day?” There are dozens of projects in there that I'll never go back to.
Looking at your history, for those who don't know, there's a great website called Star-History.com, which has an index of GitHub projects and their star history. I'm obsessed with it. I can see that in March 2020 you were at the top of Hacker News. You put on 1,500 stars in one day. It took off from there. Your rank and live attack is a completely different world. It's amazing how that happens.
If people come from Hacker News and land on your repository, you get the star, but I'm not sure that you will get the usage, because of what you just said. It was the right timing to have the Hacker News post maybe a few months after we began to truly have the documentation and a README that people could understand very quickly. We got to iterate a lot on how we wanted to share the message about what we are doing, how we are doing it, and how you should use it before getting people to land on the README.
How come you decided to write it in Rust? Was there a big fight or did the Rust go on? Talk about that.
The first one was easier. I think it is easier to recruit Golang developers. It was quite obvious to us that when you are writing a search engine, you need performance. There are a few different languages that you can use. There is C++, Golang, and Rust. C++ was very frightening to us because of the maintenance that you might need for that kind of code base. It was comparable with Rust. We were not sure about using Rust from the beginning because the community was not that big. You can sometimes miss some basic libraries that you can find in other languages. If you want to do replication there, it's very easy to do so. In Rust, there is no standard yet. This is something we have been struggling with for the past few years.
Rust is a low-level language, but one of the most modern ones. The way they are thinking about the language, they invest a lot of time. There are maybe not that many new features about the language, but it's always well designed. We can feel it when we are developing because it was easy for us to maintain what we did a few years back. This is what we want. We want to move fast, but we want to do something with high quality standards. It was a bet a few years ago.
We also had some issues with the garbage collector. We were doing benchmarks, and the garbage collector was making a pass every ten minutes or something, and we saw a degradation of performance in the search queries because of that. There is little flexibility in terms of the data structures that you can use. It's much easier to go lower level in Rust.
Rust doesn't have a garbage collector. Do you think it was helpful from the vibes of the projects and people thinking, “Cool?” It's unfair to say cooler. More interesting to people, maybe.
It's getting cooler now. A few years ago, it was not. The JavaScript ecosystem was booming. If you were doing a JavaScript project, that was something good. Rust was still not that well known. Now we say we are doing this search engine in Rust, and people are like, “We should rebuild everything in Rust.” People are crazy about it, but that wasn't the case a few years ago. We benefit now from having this built in Rust.
The other interesting thing about the space, because we had talked about Elastic earlier, is that there was Algolia. They've been around for quite a while. They position themselves differently to where Elastic sits. They're much more about developer experience and not having to spend ages designing the schema and tweaking the knobs and all that stuff to try and tune the results. Were you focused much more on replacing Elastic? How did Algolia play into that?
We have been big fans of Algolia from the beginning. In Paris, it's one of the coolest tech startups or tech companies now. Algolia was a lot of inspiration for us, but we don't have in mind that we could replace Algolia at some point, because the way I see it, they are in a different business now. They're focusing on big eCommerce. At the beginning it was a search engine, but now they are marketing a whole approach to navigating your website to engage more people. All the features that they provide now are made to get more return on investment or make people buy more on your website.
This is not what we have in mind. We are focusing on the search engine. Sometimes it's quite frustrating because people come from Algolia. They're not happy about the price and they expect us to have the same dashboards and features, but this is not what we want to do in general. We are happy that you are using Meilisearch, but we are building this tech to make search great. This is our scope. We got a lot of inspiration from Algolia, at least from the first blog posts in which Algolia explained how it designed its search engine, and from what Algolia used to be at the beginning.
It was something developer-friendly from the beginning, which is less the case now, but it was well known for its documentation and API. The focus shifted. We are happy with that, because if you're already a big eCommerce company and you want to promote certain kinds of products, I would advise you to go to Algolia. If you want those kinds of specific, AI-based recommendations, or you want people to scroll down, Algolia is the right tool for it. It may be expensive. But if you want to build the search feature, the search bar on your website, you want search as you type, and you want something simple to use, Meilisearch may be of use.
We are not talking that much about Algolia, even on our website, because I don't think we are competitors in any sense. Rather, we are competing with Elastic on the subset of things you can do with Meilisearch, because there is also a set of things you can do with Elastic that you cannot do with Meilisearch. We compete with Elastic on those things. We want developers to stop using Elastic to build front-end search, because it doesn't make sense to use it for that. We want people to use Meilisearch. It will be much easier to maintain and implement. It will be faster. It will be more relevant. But we do not target the companies that Algolia is dealing with.
Looking at the Algolia website now, they look almost like an eCommerce provider. That’s a lot easier in a way. That’s the use case that they're trying to solve, rather than more of a general case. In terms of the community involvement, what's worked and what hasn't worked for you, in terms of trying to get community involvement, one of the most common themes that I've found, especially with companies that are commercial as well as open source, is that it's still the core team that is doing 95% of the commits. If you're in the guts of the beast, all of the commits are the founders or the core engineering team. Have you managed to solve that problem in any way? What worked for you?
From the beginning, the search engine has been in Rust. It is very easy to contribute to. To a lot of people coming to Meilisearch, we were saying, “You want to contribute to Meilisearch? There are all these integrations that you can work on, because people need those and it's easy to contribute to an API wrapper in Ruby for Meilisearch.” We are focusing on onboarding contributors because I believe that a lot of developers have the competencies and the skills to contribute to open source, but they don't know how. They don't know what maintainers are expecting. We are trying to document a lot of what we expect from pull requests and what the process will be. It's not because you open a pull request that we will merge it in a few days, because there are a lot of processes happening, and this is not what we're expecting.
We got a lot of people who came through those integrations and contributed because they were happy to see that we were welcoming to contributors and that we were waiting for them. The way we work now is that everything should be public by default. Even if you don't get contributions, you will get feedback about what you're doing. We have this very strong, close circle of contributors who are helping us by testing the engine and opening some pull requests. With each release, they are even updating the integrations. They are part of the team now, and instead of documenting our internal processes on Notion, we put them on GitHub.
We have this product and specification repository in which our product manager is working completely in public. He's saying, “Those are the next features. We are prioritizing them for this and this reason. We want it to behave like this. What do you think?” We are asking in the Slack channel; we need feedback about the design, the naming, everything. This is the contribution that we are expecting. What worked for us was trying to build this community. We are less good at being visible and getting a lot of developers to come to us, but we are way better with this close community of contributors that we already have.
What we want to be better at now is to be more visible and to, not industrialize, but scale that process of onboarding contributors and making them come back regularly. We realized that maybe 15% of the pull requests across all our GitHub repositories were made by external contributors. This is quite good.
People don't know the actual number, but they have a gut feel, and they’re probably pretty accurate about it. The other thing, and we experienced this as well, is keeping twelve language SDKs in feature parity. If you add a feature to the platform, you need to put that into each SDK. That's a ton of work. We have way more activity on our SDKs from random people on the internet than we do on our core platform. That's where they touch the product as a developer.
If they care about having a caching mechanism in Ruby, then they can go and write that, and it's fairly easy. It's in the language that they know how to work with because they've chosen that SDK. With SDK-based projects, it's a lot of work, like documenting how they work. We've gone through this process of rewriting the interfaces and a lot of the engines of all of the server-side SDKs, because when we started writing them, you'd have a day to write the Ruby one, and then write the Golang one. We didn't put any thought into making the experience of those languages consistent with each other. It’s tons of stuff. You don't realize that it matters. Developers can feel it.
We had this thought of maybe automating a lot of that development using OpenAPI or stuff like that, because you can generate them automatically. I'm glad we didn't, because the whole point of Meilisearch is to make search easy and make developers happy and productive. It's different in every language because every language has its own expectations and way of doing things. Maybe we are slower in producing new integrations, but it's because we are spending a lot of time getting to know the language.
It's tough as well. If you've got an untyped asynchronous language and then a typed synchronous one, you have completely different expectations of how they behave. We've found that hard. With computer-generated SDKs, you'd rather use the REST interface. You mentioned investors earlier. You've gone on Hacker News and your star count is starting to spiral up into the atmosphere. At that point, you're getting emails every day from VCs that you've never heard of, desperate to give you some money. What happened then?
We wanted to work out how to continue this company, because we didn't want to continue to work as contractors like we had done before. This is not something we wanted to scale. One of the choices that we had was this: if we invested a decent amount of time and steered the company towards open source, we thought that we could build a community that would allow us to bring investors into the adventure. That is what we did in March 2020, but we didn't get that much mail. In France a few years ago, open source wasn’t that well known. There was the booming open source, the hot tech startup, but that was it. We spent a lot of time talking with French investors, but we had to explain what open source was and what the vision behind it was, which was much less difficult with UK investors.
They have a lot of open source companies that they've been working with. This can help. They have companies at the same stage as us. We spent maybe 5 to 6 months on the first fundraise. We didn't get the right feeling about French investors. Someone helped us and introduced us to the UK investors. It's not that people did not understand; it's that we, as developers, were not able to explain it correctly.
At first, it was about shaping the whole pitch and turning what we were trying to explain in technical words into something much more businessy. This is how we got to talk with other UK investors. Now we are even working with some investors from Silicon Valley. This is the beginning for us, but we will get to meet those bigger tech companies and get inspiration from them. There are a lot of open source tech companies that we have to learn from. We are French and European. We are proud of it, but there are a lot of things that are maybe done differently in the US. We can still learn a lot from it. We don't want to become American. We don't want to be a Silicon Valley company. But in the way they are building tech companies, there is a lot we can take from. This is what we are doing, and it is our job now.
You’re at 23,000 stars now. Super big projects. Amazing feat. How do you track your progress now? If you could move one number about the project and the business, what would that be?
We are following GitHub stars, but we are also following the number of instances that run for more than seven days and that receive a few requests. We have some analytics. It's not like in SaaS; you cannot have everything. You just have anonymous feedback.
Like a weekly heartbeat thing almost.
We get to estimate a number for what we call Meilisearch in production. This is not accurate because some people might deactivate it. Maybe the way we are checking it is not good, but we get a sense of how people are using it. This is what is useful to us. We also get more requests from companies that are using Meilisearch. They want to work with us now because they need to make sure that they are using Meilisearch the right way. They want us to help them configure Meilisearch, maybe to provide some support, to answer their questions. Those are the flags that tell us that we are going in the right direction. With the open source project, we are targeting developers. Now that we are building this cloud offering on the side, we get to see that we have more companies interested in joining the beta. We have companies asking us for some support. We are in the right direction and position, but now it's about making sure that we do things the right way, one step at a time.
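The "instances in production" estimate described above can be sketched as a simple filter over anonymous heartbeats. This is only an illustration in Python: the `Heartbeat` record, its field names, and the `min_requests` threshold are assumptions, since the conversation only says that an instance should run for more than seven days and receive a few requests.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Heartbeat:
    """One anonymous report from a running instance (hypothetical shape)."""
    instance_id: str      # random id, not tied to a user
    started_at: datetime  # when the instance was launched
    seen_at: datetime     # when this report was received
    requests: int         # requests served since the previous report


def count_production_instances(heartbeats, min_uptime=timedelta(days=7), min_requests=1):
    """Count instances seen running longer than min_uptime that served requests."""
    longest_uptime = {}
    total_requests = {}
    for hb in heartbeats:
        age = hb.seen_at - hb.started_at
        previous = longest_uptime.get(hb.instance_id, timedelta(0))
        longest_uptime[hb.instance_id] = max(previous, age)
        total_requests[hb.instance_id] = total_requests.get(hb.instance_id, 0) + hb.requests
    return sum(
        1
        for instance_id, uptime in longest_uptime.items()
        if uptime > min_uptime and total_requests[instance_id] >= min_requests
    )
```

Instances that have turned reporting off never show up in `heartbeats`, so a count like this can only be a lower bound, which matches the caveat that the number is not accurate.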
The open source project is the most important thing. This is not something that we will forget about in a few months because we have this cloud offering. That wouldn't make sense at all. You have to make a choice at some point and say, “We should think about monetization and growing the community and stuff.” It's very easy for us developers to invest in the product, making sure that we are building the right thing, but there is all this stuff around it to make sure that we can make Meilisearch sustainable in the long run. We have to think about this.
In terms of what's coming next, especially on the technical side, is there anything that you've got in mind or that you're working on that you're excited about?
On the technical side, yes. We have a lot of difficulties. There are a lot of things. We are thinking about replication first. We want Meilisearch to be able to scale horizontally. This is something quite difficult for us. This is a challenge that we are thinking about every month.
That was one of my questions. The architecture, is it distributed by default?
Not yet. We are thinking about it. We also want to make Meilisearch more flexible. There are a lot of offerings around what they call computing at the edge, like what Cloudflare is doing with Cloudflare Workers. Because Meilisearch could be compiled, we could have Meilisearch running in all the CDNs that you have around and make search fast for everyone. There are a lot of challenges around that. For example, because we don't have a file system, where do you keep the state of the index? How do you want to share it around the world? Are you using history?
There are also some limits on the computation time that you can use on those workers. For all the search, we might use those edge workers. Those are the things we are exploring right now. One of the key things about the search engine and the search experience is being able to answer fast. Since you don't have the search engine in your browser, you will have this network latency in any case.
If you have network latency, you can feel that it's not instant, and you might think about it. We want to reduce the friction between the human and the computer. It has to be fast. One of the ways we think about the future of Meilisearch is being able to provide search on edge platforms. It’s cheap also. That should be the future of search for us. The big challenge that we have now is making sure that we can run Meilisearch on this kind of network.
If you're running with a relatively small index, you can push it out into the edge. For certain workloads, you don't have to worry about the state at all.
In a few milliseconds, you have to push. That's very easy, but it's also something quite difficult. As for the story of Meilisearch, we did get to rewrite everything. We started a few years ago and we worked on that iteration for maybe a year and a half or two years. A few years ago, we were working on a big refactor of the code. We made the merge in July or August 2021. For a long time, we had few features released in the search engine because we were waiting for this new engine. Once we got this new engine, we released a big feature every month for the following four months.
It was very easy for us, but now that we've released all these features that were necessary for us to become a decent technology option, the big issues that we have are about indexation speed. Not search speed, although we still have to think about it, but indexation speed, which is something we might have missed. It's different for every use case. For us, we are using Meilisearch for products, the documentation, or whatever. Indexation time doesn't matter because we have maybe a few hundred documents, so it’s easy. For people who have big indexation workloads happening every day, you cannot have a few hours of indexing, because then you cannot react to whatever issue the business is facing. The big thing that we are focusing on now is making indexation faster.
We've done a ton of work. We’re about to launch an Edge platform for our SaaS products as well, for exactly the same reasons. It’s interesting because the compute has been solved altogether. The only problem now is state. There were five different data sources that we looked at. We ended up using DynamoDB, but I still feel like there's a better product that doesn't exist yet. That would be more appropriate and would work better for us. At that point, you're making those choices based on exactly what’s your use case. I love the idea of search at the edge. There are some things that are easy to do and some that are challenging. That's pretty challenging, but it sounds like a great challenge to take upon.
Thomas, it's been super interesting talking to you. I want to thank you for your time again. Thank you for the product. If there's any particular colleagues or members of the community you want to give a shout out to before we go, if there's anyone on your mind who has done some heroic pull request or something. Even someone who's done 10,000 lines of code that you've had to say no to. That's always a good one.
We have this contributor called BB. It’s his name on GitHub. I don't know how he got that login on GitHub. He's been working with us from the beginning. He’s always testing what we are doing and giving us feedback. He’s way more experienced than us in the team. He gives us this senior touch in what we are doing. He answers all the questions. We were wondering where he was. He had a new child. We wanted to congratulate him. That's amazing. We wanted to make sure that he is doing well, and it seems he is, so we are very happy about it.
We have had to reject a massive pull request that was doing something crazy. I want to congratulate you again. Thank you for licensing the project the way you have and putting it first to serve. It's great to hear. I wish you luck in the future.
Thank you very much, Ben. Thank you for having me.