DataStax

Interview with Patrick McFadin: Apache Cassandra and Developer Relations, DataStax

By Ben Rometsch on October 28, 2022

Many people consider Cassandra the very first database that hasn't failed them. It can run in multiple data centers and withstand all kinds of terrible failures.


👉 Check out our open-source feature flagging system, Flagsmith, on GitHub! I'd appreciate your feedback ❤️

I am with Patrick McFadin for another episode of the show. Patrick, it's great to have you on. You've got a really interesting and deep open-source background. Do you want to give us a quick run-through of who you are, what you've been up to, and what you're working on now?

My background is in engineering. Formally, I'm a computer engineer, but infrastructure seems to have ruled my life. My career pretty much spans the beginning of the internet, and running infrastructure on the internet is how I got involved with Apache software in the mid-'90s, when it was just starting to be a thing. Everyone started using Apache HTTPd, and I was one of them.

As my career in open source evolved, so did my career in infrastructure. In the early 2000s, when we were all trying to scale like crazy, I felt like that was the time I was trying to get off proprietary software the most, because it was getting in my way. That's when I really got religion and went whole hog into the database realm. I've been at DataStax working on Apache Cassandra for the past several years. I also spend a lot of time working with other data infrastructure projects like Pulsar, Spark, and Flink. Apache, fill in the name with something that is about data. There you go.

You've reminded me of something. I was working for an IT consultancy, and my first job was at the BBC, helping them move their website from a homepage with a bunch of links on it to something that changed once a day, which was a big lift at the time. I remember that was the first time I touched an Apache HTTPd server. Those of you who are younger than Patrick or me would probably be surprised to find out that most people were paying for their web server at that point in time. There was the Netscape web server, and you had to pay them for it.

O'Reilly had one. O'Reilly was originally a book company. Tim O'Reilly bought a company that was selling a web server. He got into this by saying, "This internet seems to be a thing. I'm going to get into the software game." O'Reilly sold a web server. You could buy it. It came in a book with a CD-ROM, and you'd plug it in and go, "Look at us. We're on the web."

I remember the BBC choosing to use Apache, and it being more of a curiosity at the time. Java being open source was a curiosity too. How did you get involved with the Apache Software Foundation, then? Was that way back in the mists of time?

My first real hardcore open-source work was on the Linux kernel. I was working at a US university, and we didn't have any money. I was working on kernel network drivers at the time, when we were moving over to more application-based work. I remember Brian Behlendorf on the mailing lists. I talked to him a few times because we were trying to get HTTP working. That was before the Apache Software Foundation became a thing.

It was the world that I lived in, and I was a part of it. Interestingly enough, I spent more time on the Tomcat server because we were running a lot of stuff in Java. That was another area where you paid heavily for the server, and we jumped onto Tomcat. Where I got most of my experience with the ASF in the early days, around 1999 to 2000, was trying to run workloads on the internet, not just serving webpages. I've always contributed to projects but have never been an ASF member. I can fill out my membership card.

I remember I used Tomcat a hell of a lot from the late '90s through 2010. I've mentioned this on the show a few times: back then, you'd pay a huge amount for an application server. I remember working on a big e-commerce store for the Woolworths website in the UK. The licensing cost for those app servers was well into six figures.

IBM made a pretty good showing. IBM WebSphere was one of the top dogs. It seems like my open-source career has always been about fleeing proprietary licensing, and that was one example. When I realized we were turning into an IBM shop, I shuddered. WebSphere was great. I used it a lot, and it worked for what we needed to do, but every time I wanted to do something, I had to call a salesperson.

We have customers that still run production stuff on WebSphere. Stuff like that has a long shelf life. Cassandra is interesting, technically. It burst out of the gate and onto the scene. I remember it was a pretty big deal at the time.

Not anymore, right?

It is now a super crowded market. What I'm trying to say is that Cassandra seemed to be one of the first generation of projects that people used to move away from the idea that you'd only have a single data store. Everyone was running on SQL, ORMs, or object-relational databases. Because of the scale these products needed to work at, the idea emerged that it was okay to have four data stores with different characteristics. I remember Cassandra coming out as one of those because it had some unusual properties, at least from the point of view of someone who has spent their life writing from a relational standpoint.

Open-source databases are not new. We have all done time in the barrel with MySQL and Postgres, two very popular open-source databases. Cassandra burst onto the scene in 2010. It was in the first class of NoSQL databases. You're right. There was this explosion of, "This database from 30 years ago doesn't work anymore." That was only because we were hitting our stride when it came to scaling workloads on the internet.

That's when companies like Facebook or Google were making headlines, because every time they did something at scale, everyone paid attention. That was when it was all starting to pick up. We weren't talking about thousands of users. We were talking about millions and potentially even a billion. The discussion was, "Who's going to be the first to have a billion users on their platform?" In 2008, MySQL was the solid choice, but for what Facebook was doing, they had 1,800 sharded MySQL servers running their platform. I worked with a lot of folks at Facebook and spent a lot of time there working on stuff. Everybody was trying to hold the thing together with duct tape.

That's probably close to two orders of magnitude lower than where they are now in terms of the volume of stuff.

It was clear that what worked great back then was not the right choice anymore. Several databases popped up at that time, and Cassandra was one of them. One thing that helped Cassandra move faster was that it was an Apache project, which makes it easier for people to adopt because they trust the license and the community: it's not a company that owns it. The other thing is the way that it works.

From the very beginning, it was based on the Dynamo paper. If you take a time machine and go back, everybody was talking about the Dynamo paper because it said, "This is exactly how you can scale a database geographically and maintain uptime in a different way." Its prescription was very clear: shared nothing. Every node acts independently but works as a team, and the paper spelled out how they work as a team, how you keep your data consistent, and how you deal with failures. As a result, a lot of databases were based on the Dynamo paper, and Cassandra was one of them. Some people might remember Riak, another popular database at the time. There was another one called Voldemort.
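The partitioning scheme from the Dynamo paper is the piece most of these Dynamo-style databases share: keys are hashed onto a ring, each node owns a position on it, and a key is stored on the next N nodes walking clockwise, with no central coordinator deciding placement. The Python sketch below is a toy illustration of that idea, not Cassandra's implementation. The MD5-based `token` function, the one-token-per-node layout, and the `Ring` and `replicas` names are assumptions for the example; Cassandra itself uses virtual nodes, the Murmur3 partitioner by default, and rack- and data-center-aware replica placement.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a 64-bit position on the ring (MD5 is an illustrative choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy Dynamo-style ring: one token per node, N replicas walking clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Place every node on the ring at its own token, sorted clockwise.
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str):
        """Return the distinct nodes that store `key`, first owner first."""
        # First node whose token is strictly past the key's token; wrap to 0 at the end.
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.entries)
        count = min(self.replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42"))  # three distinct nodes; which ones depends on the hashes
```

Because each node owns only a slice of the ring, removing a node only changes placement for keys whose replica set included it; every other key keeps exactly the same replicas. That locality is what lets nodes join, leave, or fail without reshuffling the whole data set.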

I remember Riak.

Fun fact: Voldemort was a Dynamo-based database hatched at LinkedIn by the same team that brought us Kafka. Jun Rao, who worked on Voldemort and Kafka, is also a Cassandra committer and on the PMC. We were all doing the same thing at the same time: "We're trying to solve these scale problems." Cassandra was elegant enough to solve the problem. Time and time again, people said, "This is the first database that hasn't failed on me."

The exciting thing was that you could run it active-active in multiple data centers, and it could withstand all kinds of terrible failures. At the time, that was pretty much magic. Those stories kept proliferating. Companies like Facebook, Netflix, and Google were all talking about how they were using Cassandra somewhere, and everyone paid attention.

This was back when you had to write your own database engine if you wanted to build something on a global scale. I remember reading about how Google had written its own file system. That blew my mind. Picture someone sitting in a meeting and going, "Why don't we write our own file system?" because it would be more effective. How did that work? Did Facebook hand it over?

They didn't hand over anything. This is a fun story. Facebook at the time was going so fast. I remember they were running with scissors, as Mark Zuckerberg put it. They were trying to do anything they could to stay ahead of their scale problems. Avinash and Prashant, the two engineers at Facebook who were working on Cassandra, got into a bit of a contest with the HBase folks, and they lost.

They said, "Okay." The history is much clearer now, but they were like, "Fine," and they donated it to the ASF. I don't think anyone at Facebook knew that they did that. It showed up as an incubating project at the ASF. There are not a lot of other Facebook projects at the Apache Software Foundation. That was the one that got away.

For the benefit of readers, myself included, let's talk about using it as a database. You've talked about the infrastructure and reliability aspects, but as a developer, it has a radically different set of properties from a regular relational database, right?

One of my strongest backgrounds is relational theory. Over the years, we've built on the work of Date and Codd, and this has been established in computer science for a long time. A relational database is built on the idea that you have a domain of data and build models around that domain. You say, "Here's how I would model it," and there are relationships. I then use joins to make those relationships happen. Relational data models are good for saving storage. At a time when storing data was very expensive, they reduced the amount of data that needed to be stored because you could normalize all your data: "Here's a little bit of data over here, and here's a little bit of data over here, and I can join them to get more data."

That was the premise of relational databases. What we realized, and this is where I was always running into trouble, is that joining is a very expensive operation. You're trading off: "Space is no longer as expensive, but compute is. Performance is the expensive part." How do we flip that around? Cassandra is built on the idea that you design your data models first and work through your data. We don't worry as much about how much data is being stored; storage isn't what it's meant to optimize. You're right: one of the first things people had trouble with was data modeling in Cassandra.

Cassandra does have a language called CQL. It looks a lot like SQL because it has tables, and you can do SELECT, INSERT, and things like that, but there are no joins. The first thing that trips people up is, "Wait a minute. How do I join?" What it does is force developers to think further ahead: take your application, look at the queries you need, and build your data models to match those queries. That's becoming more realistic now because we build things like microservices. One service owns one thing, and that one service may have one table in Cassandra. That works out great.

There are MVC controllers, things like React, and so on, and they match up with that pretty well. The need for joins has gone down greatly. As a matter of fact, one of the things most people do to speed up a relational database is denormalize their data into one table anyway. It's not as crazy anymore, but it is the biggest thing to get through in your initial work with Cassandra, because other databases like Mongo say, "Throw your JSON here. We'll figure it out for you later."
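To make "build your data models to match your queries" concrete, here is a toy in-memory model in Python. It is not CQL and not how Cassandra stores data; it only mimics the shape: each table is keyed by a partition key, rows inside a partition are kept sorted by a clustering key, and the only read is a single-partition lookup. The library tables `books_by_author` and `books_by_genre`, and the `QueryTable` and `add_book` helpers, are hypothetical names for the illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    isbn: str
    title: str
    author: str
    genre: str
    year: int


class QueryTable:
    """Toy table: rows grouped by partition key, ordered by clustering key."""

    def __init__(self, partition_key, clustering_key):
        self.partition_key = partition_key    # function: row -> partition
        self.clustering_key = clustering_key  # function: row -> sort key within a partition
        self.partitions = defaultdict(dict)

    def insert(self, row):
        # Writing a row whose key already exists overwrites it (an upsert).
        partition = self.partitions[self.partition_key(row)]
        partition[self.clustering_key(row)] = row

    def select(self, partition_value):
        """The only read this table serves: one partition, in clustering order."""
        partition = self.partitions.get(partition_value, {})
        return [partition[key] for key in sorted(partition)]


# One table per query the (hypothetical) library app needs to answer.
books_by_author = QueryTable(lambda b: b.author, lambda b: (b.year, b.isbn))
books_by_genre = QueryTable(lambda b: b.genre, lambda b: (b.title, b.isbn))


def add_book(book):
    # Denormalize on write: the same book goes into every table that serves a query.
    books_by_author.insert(book)
    books_by_genre.insert(book)
```

After `add_book(Book("isbn-1", "Beta", "Author One", "fiction", 1999))`, a call such as `books_by_author.select("Author One")` answers "books by this author, oldest first" without a join, because the write path already did that work. The duplication across tables is the denormalization trade-off described here: spend storage so every read stays a single-partition lookup. The overwrite in `insert` loosely mirrors the fact that a CQL INSERT with an existing primary key replaces the row.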

I remember talking to some people who had been using it. The thing I couldn't get my head around is that if you ask ten engineers to build a relational database for a library, which was always the canonical example, they'd probably come up with ten almost identical sets of tables and relations. The way it was explained to me with Cassandra was that there are probably ten different ways you could model that data, and it's a question of what you want to do with it in the future.

That's an important bit flip. I've seen it many times when you go from relational, where I'm modeling my domain. It's funny because I was an Oracle DBA, and I remember the pride of the massive, very complex SQL statement where you did joins and all these things: "You did a left outer join." There was a bit of swagger whenever you had a massive SQL statement that did something very elegant.

That was almost like writing code. The bit flip, when you see the Matrix, is when you realize, "The query I need is modeled very simply in this table with a SELECT clause and a good partition key. I'm going to get ten-millisecond response times on this all day with billions of records." You will get there, but you have to let go of the past.

I remember spending some time in San Francisco in 2001. We were working on the Williams-Sonoma website. They had the entity-relationship diagram on the wall, and it was 78 pieces of A4. You had one little bit of the wall that was your bit, and you were only looking at a tenth of it. We had an Oracle guy there who was writing SQL that wrote SQL to do stuff. That was his approach: "This is how I'm going to use it."

It's because you could only keep a small part of that in your head at one time before you were off in a part of the database you knew nothing about. It's interesting how radically those things have changed. Tell us a bit about the story. In 2010, I remember it was Cassandra and Mongo, and Riak was out there too. They were much more on the commercial side. I remember people were throwing all their stuff at Cassandra without really knowing what it was or how it was best used. Where did the project go from there?

It's a sordid story, for sure. Anyone who's reading knows open source is full of drama. DataStax, as a company, decided to back this open-source project, and that's a pretty common story in open source: there's an open-source project, and then there's a company trying to commercialize some piece of it. We've struggled with that for a long time. It's hard to be an open-source company, I'll say that now, because you have tough, conflicting ideals: "We're a company. We should be making money, but I love open source and giving away software." Those conflict, hard. As I said, I've been doing this for several years, and I've never gotten it perfect.

The important thing was that we were capitalized in a way that let us put a lot of engineering effort into Cassandra. An early project needs a lot of engineering time. If it's a hobbyist project, it's going to slowly drip out features and bug fixes. There are two sides to commercialization. The more common one is an open-source company that says, "We're going to try to make money and offer this open-source project," but there's also a large company adopting and depending on that open-source project. Cassandra had both going on at the same time, which made it take off fast. Netflix and Apple decided that this was their database of the future and put an enormous amount of money and talent towards it.

DataStax said, "Yes, we're trying to make a company from this." If you look at the commits from 2010 to 2015, it moved so fast. One of the things that upset a lot of people in the community was how much it changed. It was constantly moving forward, with new features all the time. It was hard to keep up, but it solidified because Apple and Netflix were solving problems at their scale. They were saying, "We have real problems," solving them, and committing the fixes to the open-source code. At DataStax, we had customers telling us, "This is what we need in a database." We were all contributing, and all these problems were being manifested into code.

As any good story goes, whenever you have all these competing interests, they start competing. We had a bit of a falling out in the community for a while. There was a lot of anger and finger-pointing, because when something becomes popular, it becomes a bigger deal than before. We went through this bad adolescent phase of not talking to each other and being angry. We've since gotten much better. We had some counseling and went through therapy. The project has matured a lot.

I know I'm rushing through history, but this is a good template for any open-source project. You have that explosion because it's popular, and everyone wants to be a part of it. Then you reach a tipping point: when it becomes so popular that everyone wants a piece of the pie, that is where great projects go to die. Cassandra could have gone that way. We struggled for a long time, years. We didn't ship code for a long time, and now we're back on pace.

I see this play out a lot, or it feels like it does, where you've got these #realproblems: "I've got a billion users. I need to do a trillion of something, and I need to solve the problem for that." Ninety-nine percent of software engineers in the world don't have a billion users and need things to work for different classes of problems. Was that the crux of the issue?

Partially. At the time, let's say around 2015, which is a good marker, almost every large-scale internet company was using Cassandra, and they still are. Pick up your phone, and you're probably using Cassandra all day long. I don't care what phone you have or what app you're using; you're using Cassandra. It's that ubiquitous now.

It was because companies were saying, "Our business now depends on this, and we don't like what you're doing to the project." From any company's point of view, you could say, "Our interests are not being met, and all our investment in it is in jeopardy." That builds up anger or resentment, or you get into tribalism, which we did. Tribalism is a thing.

This is probably a good reflection of the world as it is now: "They are trying to ruin us." That happens because everybody was so dependent on having a healthy, strong Cassandra, but then there are feature comparisons and questions like, "Which way do we want the project to go?" With that kind of stuff, you have to agree. You can't just take it and run. You could say Mongo has it easy when it comes to open source, because they own the project. They say, "We want to go this direction," and everyone is either okay with it, or they can leave.

Have there been any long-lived forks? Has it spurred off into different projects that stand by themselves now?

For Cassandra, there was only one fork that got any notoriety. It was in Russia. These engineers had built an ACID-compliant version of Cassandra; they were scratching their own itch. They said, "We want to bring this back to the main line," but they never could because it had drifted so far from it. As far as I know, they went back to regular Cassandra, because that's a feature we're adding to Cassandra now.

Probably a good part of this story is that we didn't have an intense amount of forking going on. No more forking. The F word was not used. Every large contributor to Cassandra had their own local fork, but it was always about trying to get changes back to the main line. You don't want to stray too far, because when you get too far away, you might as well name it something else.

Can you talk a little bit about the relationship DataStax has with Cassandra? How did that come about, and what's the licensing around it?

I'm glad we're talking about this in 2022, because the story is much better, and I can look back with hindsight on a few things. We had DataStax Enterprise, which was based on Cassandra. It was always some version of Cassandra that we certified and modified with some other things. We added Solr, Spark, and some other things to enhance it. We added security, packaged it up, etc. That was the flagship product around Cassandra, and it was very popular. A lot of folks signed up for it mainly because they wanted support. They wanted someone to call.

Wasn't there anyone else at the time?

There were a few. Probably the biggest fork in the community was of users and experts. There was a company called The Last Pickle, which I love: a group of ex-DataStax people and some experts in the community. They had a great consultancy. When you had a Cassandra problem, you could call The Last Pickle. There's another company called Digitalis, made up of some DataStax people who left, and they still do Cassandra consulting now. These are all still very good friends of mine, and I love to see that. The thing we forked the most was expertise, not code.

There are other companies that are trying to get involved. Aiven is a company trying to create expertise around open-source projects, and Cassandra is one of them. I'm fine with that. Forking code is sad, but forking people is awesome because that means that there are more people out there helping others get something awesome going.

If you look at our engineering, a lot of DataStax engineers went to work for other companies that were using Cassandra, like Netflix, Apple, Bloomberg, etc. The expertise started to wander off as well. That's all good, because they're still contributing to Cassandra every day. It helped us stay together as a dysfunctional but working family. I talk to people every day who work at three or four different companies, but we're all working on one thing, Cassandra.

In terms of the business that DataStax is now, how does it see itself?

From DataStax's business point of view, it was clear that DataStax Enterprise had its place in the market, but we realized that we needed to be a cloud database more than anything. Our biggest competitor, from both an open-source and a commercial standpoint, is a cloud database. I don't know if you recall this, but from 2018 to 2019, there was a lot of handwringing about cloud eating open source: "They're going to vacuum up all these open-source projects and leave everyone with nothing."

That's when there was a lot of panic in the open-source community: "We're going to need new licenses. That's what we need: SSPL, BSL, and all these non-open-source licenses." Unfortunately, a lot of companies made that switch. It has left a real dent in the open-source database world, because now we have a lot of databases that are no longer open source out of fear of Amazon, Google, and the rest. At DataStax, what we decided to do was put all our expertise into running Cassandra for other people.

That became DataStax Astra, which is doing amazingly well. We launched it a couple of years ago, and there were earlier incarnations before that. It is clear that people love Cassandra, but they want someone else to run it. Cassandra, being the type of database it is, is purpose-built for running in the cloud. Because of how it works, our cloud service can run across multiple clouds at the same time. If you have workloads in Google, Amazon, and Azure, you can run across all three at the same time. It's not that hard.

Do you think that's one of the reasons there hasn't been an AWS Cassandra?

They do have one, funnily enough. Amazon has a Cassandra-like clone called Keyspaces. It's based on some Apache code, but Keyspaces only runs on Amazon. That's the thing. Microsoft has a CQL layer on Cosmos DB, which is somewhat compatible, but not 100%. What that shows me is that Cassandra is a mainstream database: everyone else is creating clones of it.

It's interesting, because you talked about the Dynamo paper, and AWS has DynamoDB. It's also interesting to me because these things are very specialized to certain use cases. Elasticsearch is a very generalist product, and that was the one that started it all: everyone was terrified when they discovered that Amazon was making X amount more out of that project than Elastic was. It's a very generalist thing. You can solve tons of different search problems with it, whereas you're getting into more specific workloads. I wasn't aware that Cosmos had it.

It's CQL. Meanwhile, Elasticsearch got forked into OpenSearch; that's what we got out of it. It's two projects now, and Amazon is claiming the high ground: "No, we're really open source, not that fake open source." What happens when you have open-source software is that people can make choices and split things. What I work on every day is keeping Cassandra together. Let's not break up the family here. Let's try to keep this thing together. I work with Amazon and Microsoft engineers. It's all good in open source. We're all working towards the same thing.

A big thing at DataStax is that we develop new features for Cassandra, but instead of doing it in trunk, the main line of the Cassandra code base, we try things out in Astra. One of the things we had to figure out as the project grew up was how to introduce new major features without disruption. That's where our Cassandra Enhancement Proposals, or CEPs, come in. This is also part of the maturation of any open-source project. Instead of a random person committing a large block of code into trunk, now we have a process where we talk about what we want to do and get to a design session where we say, "Here are the limitations."

CEPs are amazing, and I'm proud of them as part of our community. A lot of other open-source communities have been watching as well, because the process is comprehensive. It lets us do the ideation first and work through the details before getting to code, so nothing comes as a big surprise. That's what we've been using quite heavily at DataStax now. When we come up with an idea, we start implementing it in Astra to test it out, and we learn a lot. Then we create a CEP.

If you look at the CEPs, most of the ones we've created are based on, "We tried this thing. We think it would be awesome for open source. What do you think?" Things like new indexing schemes, where people say, "That would be good, but X, Y, Z," and we donate that code. This creates a nice downstream contribution mechanism that works in open source. We're not overbearing. We ask, not demand, and we work together. This is what we're going to see as success in the project.

How did you come up with that process? Was that something you came up with entirely yourselves organically? Did you take ideas from other projects or organizations?

I can't point to just one. A lot of open-source projects do this. Postgres does it, and Kafka has KIPs, Kafka Improvement Proposals. It's not new. If you think it's new, look around, because open-source projects that have matured and have a large, stable contributor base always gravitate towards some process instead of just committing code. We looked at a lot of the ones out there, and it was awesome and interesting. One of the things that sticks in my mind is ApacheCon 2019 in Las Vegas. We were sitting in the Flamingo Hotel.

Most of the project's Apache committers, PMC members, and contributors were there, sitting cross-legged on the carpet in a hallway and debating how we should go about committing code. It was such an open-sourcey hippie moment. I loved it. No one else would do this. We were sitting around arguing, everyone on the carpet, and I thought that was amazing. No one was standing up and being overbearing. It's those kinds of moments. We framed out some things and had a long debate on the mailing list, but everything happened out in the open. There are no committees secretly devising things and telling people what to do.

This is interesting because, for a lot of the folks I speak to on this show, the thing they're driving towards is increasing their star count, the number of people contributing, and growth. It's a lot more obvious to them how they define success. How do you define success now, when you're being used by most of the apps on your phone?

It's constantly being redefined. We went from a world where, to install the software, you had to be an expert in infrastructure and basically DevOps to make it work, to now being cloud native. The next ten years will be all cloud native. What does that mean? Our next success criterion is going to be transforming Cassandra into the best cloud native database to build services with. There's a massive renaissance happening in the Cassandra world right now.

The amount happening in the project and the code base is staggering. We've got some big aspirations that we'll meet, and we have some of the best engineers in the world working on this. Success for us is a stable, useful, appropriate product for people to use. We love having an engaged and vibrant community; that's what I try to work on all the time. When I see an engineer with that look on their face like, "This does solve my problem," that's success. It's hard.

You're involved in the CNCF as well. How different or similar are the two foundations you've been working across?

It's funny; I had to figure that one out myself. Shameless plug: Jeff Carpenter, who wrote Cassandra: The Definitive Guide, and I just finished a book for O'Reilly called Managing Cloud Native Data on Kubernetes. We wrote that book to force ourselves to bridge that gap and work out where this all fits. To summarize as best I can, the need for this database has not gone away.

The need for it to be a service is more important than ever. It comes down to accessibility: "Who in the world can use it? Do you need to be a really good distributed systems engineer, or can you be a JavaScript developer, click a button, and get what you need?" That's a success factor for me, and that's where cloud native is going. That's what it means. I want my kids to use Cassandra. I never want them to run it.

Is that the fundamental difference between the two entities?

It's the transition between the last ten years of Cassandra and the next ten years. It's a sad moment, but if you're open source, you're fine. The measure is how many people won't know that they're actually using Cassandra. It makes it disappear.

It's interesting as well. I remember us talking about working on this stuff in the '90s. As a software engineer, it would not be unusual to go into a data center and cut your hands on the hardware there. Later, it wasn't unusual to be hand-configuring Tomcat. Getting Tomcat and Apache to talk to each other back in the day always left me thinking, "I'm never quite sure if I'm doing this right." Over time, those things have bled away. Now, you can start using DynamoDB and put 100 billion records in it without cutting your hands editing XML documents anywhere.

That's it. You said it right there. If you have to edit XML, you should be sad. At DataStax, we saw this, and we're putting our thumb on the scale. We want to help move this along, so we've started two new open-source projects around making Cassandra cloud native. We are still heavily contributing to Cassandra, and now we have some big codebase changes coming in that will change the project. Around it, we have our K8ssandra project, which is open source. This is where I do a lot of my work in the CNCF. K8ssandra is a complete package for running Cassandra on Kubernetes.

It's easy. You can install K8ssandra and have a Cassandra cluster hands-off. It does everything you need, including running all the maintenance tasks like compaction and repairs. We also have Stargate, which puts cloud native APIs on top of Cassandra, including a document API, so you no longer have to do data modeling up front. You can run Mongo-style workloads on Cassandra, or use GraphQL or REST. These are all the required parts and pieces that we have to orchestrate.

When you install K8ssandra, you get started and you get Cassandra. We're pushing the idea that you don't have to cut your hands on XML. You just install something. It's useful. It's pretty disposable. We're moving Cassandra towards serverless, so you don't have to think about scaling or adding more nodes. That should not be your worry in 2025. If you're worried about how to scale your database, you must be doing retro computing.

It's interesting as well because there's a common trope that things are getting more complicated for engineers. The flip side is the story that WhatsApp had 600 million users and 12 engineers, or something around that ratio. You are probably painfully aware of how many engineers you'd have needed to handle 600 million users years ago. It's easy to lose sight of that achievement because people are saying, "Kubernetes is too complicated. Front-end JavaScript frameworks are too complicated."

In terms of the infrastructure load for web-scale stuff, I was working at a bank, and I remember they had someone whose job at lunchtime was to watch the Sun E10K server load and then press a button that would stop people from being able to log in because they couldn't handle any more users. The number of concurrent users was 900 or so, and they had E10K machines. I wouldn't say it was state-of-the-art, but that's how people lived.

Basically, you lived in a constant state of panic. I cut my teeth hard in infrastructure dealing with continuous scaling problems. As I said, I don't want my kids to have to worry about that. That's another problem. You mentioned having few engineers. There's a quote that I put in one of the chapters that I wrote: "Progress in technology is when we have the ability to be more lazy."

You see that in manufacturing worldwide. There are fewer manufacturing jobs because we automate everything. Look at the gigafactory Tesla is building. There are three people in there, and it's building everything. That's progress. The same is true with data infrastructure. We should be able to scale to billions and billions and have a few people watching over it. That seems like progress.

Patrick, that's a great place to finish. It's been super interesting talking to you, and it's been great to hear your story. I admire the fact that you've stuck with the project for so long. It's been inspiring.

A lot of people ask me, "Why have you been at DataStax for so many years?" The only answer I can give them is that I came to this company because I wanted to contribute as much code to Cassandra as I could. DataStax has tried to live up to my values, so I still work here. There have been times when they haven't, but it was worth the fight.

Patrick, thanks again. It's been great to hear your story, and thanks for your time.

Thanks, Ben. I appreciate it.

About Patrick McFadin
