Project Calico Interview | Craft of Open Source Podcast
We caught the wave with Kubernetes. We had this solution that seemed to fit so nicely.
Ben Rometsch and Matt Althauser sit down with Shaun Crampton, Principal Engineer at Tigera, who shares his experiences as a core developer on Project Calico. He talks about the project's origins at the tail end of the era of forklift moves into OpenStack, and how the team split networking from network security, focusing on a dynamic firewall approach. He opens up about writing the first 10,000 lines of code, Calico's identity-based policy, and the value it offers to clients. Shaun also provides insights on what may be next for Kubernetes and how open-source projects can get ready for whatever change lies ahead.
---
Welcome back, everybody. For some updates, it has been raining in London. It's depressing as hell but it's great to have a bit of an intro. It's great to have Shaun Crampton with me. Shaun in the background has a vise and a spirit level, which is the first time that's happened. I've had a few oscillators but I've never had a vise. That's pretty good. Shaun, welcome to the show.
Thanks. I'm pleased to meet you.
Do you want to give us a bit of background? You grew up in the UK but spent some time in the valley.
I grew up in Leeds and then ended up working for a software company in North London, Metaswitch. Through various turns of events, I got asked to move out to the Bay Area. We were setting up a new office in the Bay Area. We already had an office in Alameda. They wanted to set up a new office in Palo Alto to do R&D and fun stuff. We went for that. 6 months turned into 5 years out there and then, rather than go down the green card route, I came back to the UK.
I carried on working on the project that I was working on when I was out in San Francisco. I came back to London. That was Project Calico at that company. Eventually, that was spun out as a new company. I followed the project to the new company. I had a small London team joining in on that with a bigger team in the Bay Area. That's grown into Tigera, which is the company I still work for.
Do you want to give us a bit of an overview? I like these episodes where I'm naive to the problems that Calico is trying to solve. I'm one of those people that don't have much idea about it.
I can give you our origin story. One of the founders of the project is Christopher Liljenstolpe. His background was in enormous intercontinental provider networking, the likes of Cable & Wireless and that sort of thing. In that world, they have a target: if something goes wrong with the network, they need to be able to get someone out of bed, get the group of people together who need to be on a call to make decisions, fix the problem, reroute the traffic and reconfigure the switches within fifteen minutes.
That was his background. He moved away from that and was doing some fun stuff in the Bay Area. He was working for a biotech startup. He got into the position where he was managing the OpenStack rig with a few racks and hundreds of virtual machines and that sort of stuff. He did an upgrade to the OpenStack cluster and the networking broke.
Forty-eight hours later, he was still debugging the networking. This is someone who has built networks at an intercontinental scale, backbone networks with tens of millions of subscribers, that were very complicated, and yet they're not as complicated as an OpenStack network running on this small cluster in his lab. His brainwave was, "What if we built the OpenStack network in the same way that I build intercontinental networks?"
He turned each of the hosts into a router where the routes get aggregated up to a core, instead of solving the network security questions with lots of virtualized networking, which is where it gets complicated. You have overlays that are running to all of the nodes. You might have ten different overlays going to all of the nodes. Each of these overlay networks is bridged with a virtual device that's on another node. Even networking between two VMs on a single node, you might go through five hops to get through all of the virtual routers. It might go over ten tunnels. It's very complicated.
We were moving out of the era where people were doing forklift moves into OpenStack and those kinds of things. People were starting to develop their applications for it. All that complexity was starting to show its age. He thought, "What if we built it as I built those networks? Use BGP to advertise routes around. It's a very simple use of BGP: learn the local VM routes, advertise them up into the network and then do away with all of the complexity of all of these overlays. What if we did that?"
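To make the host-as-router picture concrete, here is a rough sketch in Go that shells out to iproute2 and shows the kind of routes that would end up in a node's kernel routing table under this model. The addresses, interface names and next hops are invented for illustration; on a real Calico node the aggregate routes are learned over BGP rather than added by hand.

```go
package main

// Rough sketch of the "every host is a router" idea: each local workload gets
// a /32 route pointing at its own interface, and remote workloads are reached
// via ordinary routes towards the other hosts. Addresses, interface names and
// next hops are made up for illustration.

import (
	"fmt"
	"os/exec"
)

func ipRoute(args ...string) {
	out, err := exec.Command("ip", append([]string{"route"}, args...)...).CombinedOutput()
	if err != nil {
		fmt.Printf("ip route %v failed: %v: %s\n", args, err, out)
	}
}

func main() {
	// Local workload: traffic to 10.244.0.193 goes straight to its tap device.
	ipRoute("add", "10.244.0.193/32", "dev", "tap1234", "scope", "link")

	// Remote hosts: one aggregate route per host. No overlays, no tunnels,
	// just plain IP routing end to end.
	ipRoute("add", "10.244.1.0/26", "via", "192.168.0.12")
	ipRoute("add", "10.244.2.0/26", "via", "192.168.0.13")
}
```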
That was a thought experiment that led to Calico. Our networking approach went down that path but there's a second question, "I've removed the overlays, which were providing the isolation between traffic. How do I deal with that problem?" The idea is to split the problem so that you can do the networking efficiently, then take the network policy and isolation question and solve it separately using a dynamic firewall approach where we have an agent that runs on each host.
It's listening for the status of the network and the network policy and calculating a very reduced subset of policy that needs to apply on this particular node because of this particular set of workloads that are on that node. It applies that policy using whatever techniques we decide on. Initially, that was using iptables, which is a proven firewall technology that ships inside the kernel, and ipsets, which allow you to have large lists of IP addresses and things like that.
You can efficiently manage a very large list of allow-and-deny IP addresses. We split those two questions and it worked. We got some adoption in OpenStack and some big adopters mainly because the approach that we were taking was good if you were network-savvy and you were designing a modern data center. These BGP techniques weren't entirely new.
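Below is a minimal sketch of that dynamic-firewall idea, not the real Calico agent (Felix): the Policy and Workload types and the toy "key==value" selector are invented for illustration. The shape is the point: skip any policy that doesn't select a workload on this node, put the big IP lists into an ipset, and reference the whole set from a single iptables rule.

```go
package main

// Hypothetical, simplified stand-ins for a policy data model; the real agent
// is far more sophisticated.

import (
	"fmt"
	"os/exec"
	"strings"
)

type Policy struct {
	Name       string
	Selector   string   // e.g. "role==db"
	AllowedIPs []string // source IPs allowed to reach the selected workloads
}

type Workload struct {
	Name   string
	Labels map[string]string
	IfName string // host-side interface, e.g. "cali1234" or "tap1234"
}

// matches is a toy selector check: "role==db" matches Labels["role"] == "db".
func matches(selector string, labels map[string]string) bool {
	parts := strings.SplitN(selector, "==", 2)
	return len(parts) == 2 && labels[parts[0]] == parts[1]
}

func run(name string, args ...string) error {
	if out, err := exec.Command(name, args...).CombinedOutput(); err != nil {
		return fmt.Errorf("%s %v: %v: %s", name, args, err, out)
	}
	return nil
}

// applyLocalPolicy programs only the policies that select a workload on this host.
func applyLocalPolicy(policies []Policy, local []Workload) error {
	for _, pol := range policies {
		for _, wl := range local {
			if !matches(pol.Selector, wl.Labels) {
				continue // irrelevant on this node; nothing to program
			}
			setName := "allow-" + pol.Name
			// The (potentially huge) list of allowed sources lives in an ipset...
			if err := run("ipset", "-exist", "create", setName, "hash:ip"); err != nil {
				return err
			}
			for _, ip := range pol.AllowedIPs {
				if err := run("ipset", "-exist", "add", setName, ip); err != nil {
					return err
				}
			}
			// ...and a single iptables rule references the whole set.
			if err := run("iptables", "-A", "FORWARD",
				"-o", wl.IfName, "-m", "set", "--match-set", setName, "src",
				"-j", "ACCEPT"); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	policies := []Policy{{Name: "db-ingress", Selector: "role==db", AllowedIPs: []string{"10.244.1.5", "10.244.2.9"}}}
	local := []Workload{{Name: "db-0", Labels: map[string]string{"role": "db"}, IfName: "cali1234"}}
	if err := applyLocalPolicy(policies, local); err != nil {
		fmt.Println(err)
	}
}
```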
Big companies were starting to adopt those for building out their data centers. It fits nicely into that architecture. We were building this about the time when Kubernetes started to come along. It was before Kubernetes. I don't know if you want me to explain Kubernetes or if your audience would be familiar with that.
Generally, people will be familiar with it but not from a networking point of view. Out of interest, I've rarely needed to work at that level of the operating system and the stack. I'm often curious when I'm fiddling around with stuff in AWS, trying to figure out how they manage all of that, because it seems horrendously complex. It seems odd as well that there must be billions of virtual network devices in AWS and 10,000 times fewer physical network cards. To step back a bit, pre-Kubernetes, how were people handling this stuff? Were they building it all by hand and working with the very primitive things that Linux provides?
Before Kubernetes came along, which is an orchestrator for containers that lets you run lots of containers per host, the big move before that had been from physical hardware where you racked a physical piece of equipment, SSH-ed into it and ran commands to set it up and then it lived for years running whatever software you put on it. The big move before that had been the likes of VMware and then OpenStack coming in as the open-source competitor to VMware. It has been the same transition to virtualization but people were using virtual machines instead of containers.
A virtual machine has a lot higher overhead than a container because it's simulating a virtual CPU and running an entire kernel inside the virtualized hardware and everything. Over time, those overheads have been chipped away. When you have a running virtual machine, it has very low overheads because the hardware has been changed to allow efficient cooperation between the underlying system and the virtual machine but you still had some overhead there. You still had particularly a big overhead when you want to start and stop a virtual machine.
When you want to scale up and scale down, you have to boot an operating system. Even an optimized image is going to take ten seconds to boot minimum. The images are quite big and because they're virtualizing hardware, you have to ship around a whole disk image. You have this whole ecosystem that built up with people having these massive filer systems that have a fast network to shift all those images around and all that stuff. That's what people used to do.
I like to think of the switch to Kubernetes and containers as the next order-of-magnitude shift that we have. There's an order of magnitude increase in churn rates and the ability to dynamically spin things up when we moved to virtual machines. When you move to containers, there's another order of magnitude. In an OpenStack cluster, you might have ten VMs started per second or something across a large cluster that is doing lots of churn but in a Kubernetes cluster, 100 pods started per second is normal.
When you're scaling something up and you could burst to 1,000 if you're scaling something big, it's the next order of magnitude. That's where we started to find a good niche for Calico's approach because we were doing network aggregation, which means you have an IP address per pod on a particular host but we allocate all of those from a subset of the network. We carve out a little chunk of the network, assign it to the host and then allocate the individual IP addresses. From that, we can advertise into the network, "This chunk is here. This CIDR lives on this host."
We don't need to do anything when we churn pods. You can churn ten pods per second and we reuse those IP addresses on the host without needing to disrupt or transmit any information around the network. We had this security piece where we designed that and then improved that over time to keep up and do those high levels of churn but the architecture was already pretty good for getting from 10 to 100 to 1,000 churn operations per second.
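Here is a small sketch of that block-based address management. The /26 block size and the helper are illustrative rather than Calico's actual IPAM code, but the principle is the same: advertise one aggregate per node and let pod churn stay local.

```go
package main

// Carve small blocks out of the cluster's pod CIDR, hand one block to each
// node, and let the node allocate individual pod IPs locally.

import (
	"fmt"
	"net/netip"
)

// nthBlock returns the nth /26 block (64 addresses) carved out of the pod CIDR.
func nthBlock(podCIDR netip.Prefix, n int) netip.Prefix {
	addr := podCIDR.Addr()
	for i := 0; i < n*64; i++ {
		addr = addr.Next()
	}
	return netip.PrefixFrom(addr, 26)
}

func main() {
	podCIDR := netip.MustParsePrefix("10.244.0.0/16")

	// Suppose this node was assigned block #3. Only this one aggregate route
	// ever needs to be advertised into the network over BGP.
	block := nthBlock(podCIDR, 3)
	fmt.Println("advertise:", block) // 10.244.0.192/26 via this node

	// Pod IPs are then handed out purely locally from the block; pods coming
	// and going never change what the rest of the network needs to know.
	ip := block.Addr()
	for i := 0; i < 3; i++ {
		ip = ip.Next()
		fmt.Println("pod IP:", ip)
	}
}
```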
That's the way we built that out. We have been building for OpenStack. OpenStack was quite mature. We were getting a bit of traction but there were lots of stable deployments of OpenStack. There weren't many new OpenStack things coming, whereas Kubernetes was the up-and-coming thing. We had the right technology to scale Kubernetes clusters in that way, applying network policy at the same time scaling the network efficiently and it went from there.
When you were building this original implementation, did you have in your mind that you wanted to build something that was a generic solution rather than something that had a moderately hard dependency on OpenStack? It's tremendously hard to project into the future how these workloads are going to be run. It's constantly shifting. Kubernetes feels like this Goliath. I'm old enough to remember when VMware was in that position, and you very rarely hear of it now. A lot of that technology has moved on and been opened up. When you were writing the first 10,000 lines of code, was it like, "We need to be careful to make a general solution here rather than something that's tied to OpenStack?"
We decided to have an external data store. OpenStack is very heavily plugin-based anyway but we decided not to store our data in OpenStack because it was quite awkward to add new things into that, especially as an outside team. We would need to get it all merged into the OpenStack deployment model and then it would have to fan out to all the OpenStack distributions. It's like adding something to the kernel. You might get it working, but then...
You need to wait two and a half years.
We decided to have an external data store. We settled on etcd quite early on because it seemed well-suited to the type of information that we were trying to get around. The architecture naturally flowed. You had a plugin to OpenStack that wrote things into etcd in our data model. That flowed down to the individual nodes, which were implementing the networking and the policy. It naturally flowed. We had this data model that we were maintaining in etcd. If we came to another platform, we added support for Docker first, had a libnetwork plugin into Docker and then eventually hit on Kubernetes. It seemed to be getting a lot of traction so we doubled down on Kubernetes.
It naturally worked out. That data model was not far off what we needed. The model for containers and virtual machines is almost the same. You have hosts and they have multiple workloads on them. We were prescient enough to call our data model entry a workload instead of a virtual machine but apart from that, networking a Kubernetes-style pod and a virtual machine is identical. They have a Linux network interface that sticks out into the host network namespace on the host. In the case of an OpenStack VM, it's called tap1234. In the case of Kubernetes, it gets created by us but we called it cali1234. Beyond that, it was remarkably similar.
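For a flavour of what that looks like, here is a hedged sketch of an orchestrator plugin writing a platform-neutral workload record into etcd with the etcd v3 client. The key layout and the fields are invented for illustration rather than being Calico's actual data model, but the point is that the same shape serves an OpenStack VM (tap1234) and a Kubernetes pod (cali1234).

```go
package main

// An orchestrator plugin writes a platform-neutral workload record into etcd;
// per-node agents watch the prefix and react to adds, updates and deletes.

import (
	"context"
	"encoding/json"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// Workload is deliberately platform-neutral: the same shape describes an
// OpenStack VM (interface "tap1234") or a Kubernetes pod ("cali1234").
type Workload struct {
	Host      string            `json:"host"`
	Name      string            `json:"name"`
	Interface string            `json:"interface"`
	IPs       []string          `json:"ips"`
	Labels    map[string]string `json:"labels"`
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	wl := Workload{
		Host:      "node-1",
		Name:      "frontend-abc123",
		Interface: "cali1234",
		IPs:       []string{"10.244.0.193"},
		Labels:    map[string]string{"app": "frontend"},
	}
	data, err := json.Marshal(wl)
	if err != nil {
		panic(err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	if _, err := cli.Put(ctx, "/workloads/node-1/frontend-abc123", string(data)); err != nil {
		panic(err)
	}
}
```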
We caught the wave with Kubernetes. We had this solution that seemed to fit so nicely. Kubernetes didn't ship with any security policy in the box in its first release. We worked with the Kubernetes team to define the first set of policies that they were adding. We gave feedback from our experience. The networking approach that they had adopted was very compatible with what we were pushing. Through their history at Google, they had come to the conclusion that every pod should have its own IP address. That was going to be the cloud-native way of doing things.
Every pod gets an IP address and then it has full access to all of its ports and everything. It's a bit like a little virtual machine. That fits beautifully with our approach of how every workload should get an IP address and then we aggregate them up through the network. That's very efficient. Early on, we got into discussions with them and helped to define that network policy API that was added. We were putting in our experience. We made a few changes to fit nicely with the eventual conclusion. It quickly married up. We were early into the Kubernetes network policy arena because we had something ready to go with a little bit of tweaking.
For a layperson like myself, what benefits does it provide if you're lucky or unlucky enough to run a Kubernetes cluster like you do?
Kubernetes is an orchestrator for containerized workloads. They call them pods. A pod has potentially multiple Docker containers or whatever type of container you have running inside it. Kubernetes architecture right at its core is declarative. You tell Kubernetes, "I want there to exist five pods running these images with these environment variables set up. I want them to expose these ports through a service and so on." You put that in the Kubernetes API and then the Kubernetes orchestrator looks at that, calculates what it needs to do to make that happen and away it goes.
If there's any disruption, let's say it sets up the five pods and then someone goes in and deletes one of those pods manually or kills a node or any one of these things, the reconciliation loop of Kubernetes gets triggered whenever it detects any disturbance and goes and checks, "Do I still have five pods? I've only got 4 but there should be 5. I'll start a new one." It maintains this homeostasis in your cluster running your suite of workloads that build up into your overall application. It solves problems like service discovery and things like that.
You can say, "The pods with these labels set should be represented as a service," and then it assigns an IP address and a domain name to your service. When you connect to the service, the service domain name maps to the IP address of the service and then that maps to one of the backing pods and so on. It's a well-designed orchestrator for building complex microservice applications where you need to be able to load up a bunch of applications that all need to talk to each other. You want replicas of this and you want them to maintain various properties of the cluster.
A big prerequisite for these microservices applications is being able to have network connectivity between all of your pods. That's what they all do. Calico is one of the options there. We're one of the most popular options if not the most popular. I've got a statistic here too. We power 2 million nodes and, at a rough guess, have 1.3 billion pulls on Docker Hub. We're powering a lot of that. There are also other things. A lot of the clouds have their own CNI plugins and networking plugins that work there as well. We integrate with some of those to provide the policy.
The policy side of the equation allows you to implement a tight policy that says, "This group of workloads should be able to talk to these but not these others." We call it an identity-based policy. When you have a lot of churn in your workloads, an IP address belongs to one workload one minute and, if that workload is deleted, might get reused by another the next, so you don't want to base your policy on the current IP addresses and so on. You want to say, "This group of pods with these labels set on them belongs to this group and these ones don't," and then allow traffic between these based on their labels.
We have a policy engine that takes that all in, monitors the cluster for changes and keeps updating the network policy in real time as things change. Compare that with something like an old-fashioned network. Getting a firewall rule changed meant submitting a ticket to the security team, who would then apply the rule. It would be a fully specified rule, "I want to open this port on this server." That doesn't work when you're scheduling containers or pods at 100 or 1,000 pods per second. You can't submit 100 or 1,000 tickets to your firewall team. They need to scale up quite a lot for that.
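As a concrete example of identity-based policy, here is roughly what such a rule looks like when built with the upstream Kubernetes NetworkPolicy types in Go: the selectors are labels, never IP addresses. The label values and the port are placeholders for illustration.

```go
package main

// Pods labelled role=frontend may reach pods labelled role=backend on TCP 8080.

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"sigs.k8s.io/yaml"
)

func main() {
	port := intstr.FromInt(8080)
	proto := corev1.ProtocolTCP

	policy := networkingv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{Name: "frontend-to-backend"},
		Spec: networkingv1.NetworkPolicySpec{
			// Which pods the policy applies to, selected by label, not by IP.
			PodSelector: metav1.LabelSelector{
				MatchLabels: map[string]string{"role": "backend"},
			},
			Ingress: []networkingv1.NetworkPolicyIngressRule{{
				From: []networkingv1.NetworkPolicyPeer{{
					// Which peers may connect, again selected by label.
					PodSelector: &metav1.LabelSelector{
						MatchLabels: map[string]string{"role": "frontend"},
					},
				}},
				Ports: []networkingv1.NetworkPolicyPort{{
					Protocol: &proto,
					Port:     &port,
				}},
			}},
		},
	}

	// Print the equivalent YAML manifest. A policy engine such as Calico's
	// keeps the packet filters in sync as pods with these labels come and go.
	out, err := yaml.Marshal(policy)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```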
I do remember those days. It's funny how you go from physically owning the machine to then having either a physical machine or a VM but you would look after it. Every few days, you would log in and update anything that needed updating. You talked about the next order of magnitude after that. They always feel like the last step, "Why would you need more than the VM?" You're like, "This is much more efficient." In detail, when you say you're powering that many nodes, that's not the container itself. That's the thing that's managing the container and providing network services to that container.
I jotted down some numbers. Five hundred thousand Kubernetes clusters are powered by Calico. The average size of a cluster must be about 4 or 5 nodes, so mostly small clusters.
I bet that's a very long tail, I would expect. In terms of the origins and the commercials of the project and how you got it up and running back in those OpenStack days, how did that work? Was there a corporate sponsor? How did it start?
We started inside this other company, Metaswitch. We were a mini-startup team trying to build this thing out inside that company but it grew to a point where it needed a VC-style investment to grow it. The parent company wasn't willing to do that. They're not an investment company. The R&D project wasn't going to pay off in the right timescale for Metaswitch to keep sponsoring it.
At that point, the leaders of the project decided, "Maybe we should spin this out." They went and got funding and that's where Tigera came from. We spun out as a new company. It was mainly that team in the Bay Area that spun out. There was a small group of us in London too. Since then, we have expanded. We have a Vancouver office and a Cork office as well.
The project was in a bit of an awkward state being inside Metaswitch. They needed more predictable paths to revenue for projects, whereas an open-source project, it's a risky gambit, "How are you going to commercialize it?" As Tigera, the approach we have taken is to go for this open-core model where we have an open-source project and stand behind it. It's used massively across the world. Our model is to attract people to that by making it a solid, reliable, performant and great solution in and of itself. There's a group of people who are already familiar with Calico whom we can hopefully sell on our enterprise version, which has additional features on top of the open source.
I'm always interested in the levers for that because we have a similar model. We're always arguing about where that line is drawn. What gets you those sales? Why do people give you money?
It's very hard to make it work. Few companies have made that work. Red Hat is probably the main example that I can think of. We decided against that early. Over time, it's got a lot clearer but at first, we started adding features for things like observability into your cluster. When you deploy a network policy, for example, if you're in a small cluster and you're tinkering and applying a network policy, then you can see everything that's happening in your cluster.
If packets start being dropped, you can tie it to, "I did this thing and then this application stopped." You can easily debug it but once you get to a large scale, you need tools to inspect the cluster and expose, "What packets are being dropped across the cluster? How can I understand everything that's going on in my cluster? What connects to what?" We added a user interface that in real-time shows you what's being blocked, how much traffic is flowing through each policy and those kinds of things.
Flow logs and the things built on them were our first commercial pieces. At first, we were finding our feet. We added quite a few small commercial-only features that were targeted at specific deals where we've got an enterprise customer who said, "I'll pay to do this." We put that in the enterprise-only thing. Those small features have come back to bite us a little bit in terms of maintenance because you've got little sprinkles of differences.
It becomes harder to merge the open source into the enterprise fork each time you add one of those tiny features, whereas when you add a big feature, it tends to be whole new components that you maintain on the enterprise-only side. That's one of the pain points. Flow logs were our first killer feature. We added a bunch of enterprise features like integrating with your enterprise authentication system.
You mean the workloads themselves?
I was mainly thinking of our admin interface and then much more advanced RBAC so you can have different teams managing different policies and that kind of thing. More advanced policy features were among the early ones we added to support multiple teams who have different requirements. You have a security team that wants to set a baseline policy that cannot be overridden and then that's protecting the cluster. You can let the developers open specific ports for their particular workloads, which is where Kubernetes' network policy is focused.
You've got those more advanced policy features. We built out in a lot of directions. We're going in a direction of a much broader security solution for your cluster. There's a lot of demand out there. The Kubernetes environment is starting to head toward the early majority phase in the adoption cycle, so we're getting a lot more people who want a packaged solution that offers every type of security they need. You can sell something like, "Here's the package. It gives you SOC 2 compliance, FIPS and all these compliance things that you need to be able to sell your security team on your Kubernetes cluster now being secure." You can move on to something else because you've got ten hats as the DevOps person in a small organization.
We have also built a SaaS product on top of our enterprise product to make it much easier to deploy into small clusters and that kind of stuff. Those are the areas we have invested in. We have some commercial-only additions. They're mainly in that overall security solution package. We call it runtime defense or runtime security, where we're monitoring what your pods are doing at the syscall level and things like that. We have chosen to put that in our enterprise offering because it's separate from the network and network security pieces.
If a pod suddenly starts sending out the entire database erroneously, someone gets a call at 3:00 in the morning.
That's the hope, to be able to catch that kind of thing. A simple example would be, "This pod tried to write to /etc/passwd." What business does any pod have writing to that file? You can detect things like that and more advanced patterns as well.
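As a toy illustration of that kind of rule, the sketch below watches /etc/passwd for writes using inotify via the fsnotify library. Real runtime-security tooling works at the syscall/eBPF level and can attribute an event to a specific container, which this sketch cannot; it only shows the idea of flagging behaviour no workload should exhibit.

```go
package main

// Alert when anything modifies /etc/passwd.

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()

	// Watch the directory so edits done via rename-over-the-top are seen too.
	if err := watcher.Add("/etc"); err != nil {
		log.Fatal(err)
	}

	for {
		select {
		case event := <-watcher.Events:
			if event.Name == "/etc/passwd" &&
				event.Op&(fsnotify.Write|fsnotify.Create|fsnotify.Rename) != 0 {
				log.Printf("ALERT: %s modified (%v); no workload should do this", event.Name, event.Op)
			}
		case err := <-watcher.Errors:
			log.Println("watch error:", err)
		}
	}
}
```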
In terms of contributions and stuff, I'm going to guess that it's a fairly technically complex project. Would I be right in saying that the contributors tend to work for larger organizations that are wanting to upstream stuff that they have been hacking around themselves?
The thrust of the project is the Tigera team. We're fairly centralized but we have had a few big contributions where we have worked with other organizations. Microsoft contributed a Windows port of the Calico data plane, the policy engine and so on, speaking to Windows networking technology instead of iptables. We had a team at Cisco whom we're still working with closely. They're very active. They have contributed a VPP implementation that uses their user-mode networking stack.
I'm getting to the point where I don't know the acronyms.
It's a very high-performance specialized networking stack that replaces the Linux kernel's network stack. We had a reasonable level of pluggability there. It was easy for Microsoft and the VPP team to come in and add to Calico rather than needing to build their own solution. We have taken advantage of that as well to add a BPF data plane to Calico. We have the same policy model. We do all the same tricks with networking. We're investing in the BPF policy engine for reasons like improved performance and being able to do fun things that you can't do in iptables. We have had those few big contributions. The second set of contributions we get is from highly technical users who are scratching an itch. We get a few of those as well.
How much effort do you need to put in to get to a point where you can start contributing a line of code?
For a very long time, it was hard to contribute to Calico. We adopted a micro-repo model way back when, with each component in its own repo. The first version of the product was in Python. When we moved to Go as we moved into the Kubernetes ecosystem, we decided Go was the way to go. Everything is written in Go. It's the ecosystem. We moved to that.
Having lots of repos that separate up your components made it a complete nightmare for anyone to contribute because they would have to go into one repo to add a field to our data model, update the pins to point at that updated version and then submit the PR to the other repo. It's very complicated. The second we combined everything into one mono repo where everything is together, we got a serious uptick in those itch-scratching contributions. It becomes a tractable problem. If you're familiar with Calico, you know what our components are. You can go from one to the other. We have had a lot more since we did that. The project is about seven years old in 2023.
One of the things that we did a couple of years ago is we had our front end and our API as separate repositories and then combined them. It immediately made a difference in terms of our engagement on GitHub because you didn't have two issue trackers. You didn't have stars going to two different repositories and all that stuff. It made a big difference. I could imagine if you had fifteen or something that it would be a big improvement.
All those things made our lives much easier, even for the folks working on it full-time. We had automated scripts for updating our pins and so on but it would take hours for things to trickle through the repos. We got stuck on that model for too long. It's much better. It brings all the activities together. You don't get a bug report sometimes on a component repo and sometimes a duplicate report in the main Calico repo where our docs lived. We have 1 repo, 1 stream of issues and 1 stream of PRs. You can see what's going on. All those things multiplied up because we had more repos in open source. It had reached something like 8 or 9 and that was the breaking point. We sometimes spent more time on pin updates than we spent on actual work. I'm very glad to have got that sorted.
You mentioned the adoption cycle of Kubernetes. Do you have any inclination or thoughts on what's next? There has been talk about single-process kernels. As we're speaking, it sounds quite wacky. If you look at things through a technical lens, it does feel like there are a bunch more optimizations to come. Can you see something emerging in the mist in front of us? Do you spend a lot of time thinking and worrying about that, trying to be aware of where the next Kubernetes is coming from?
A little bit. A lot of the people who have been playing around with that stuff have been building it on top of Kubernetes. People are adding single-process mini-VMs but you can represent that as a pod with a special flag. The teams that have worked on that have adapted the networking so that an ordinary CNI plugin like Calico can network one of those pods. I do think at the very high end where people are trying to do high throughput or they have demanding workloads, there's a lot of room for innovation in running policies and things on the network card, enclaves and things like that.
Various people are trying to do that but the barrier to entry for that stuff is quite high. You need to be a cloud provider who's already doing it and has a team devoted to it probably or you need to be big to have the investment in all the clever stuff. From an isolation point of view, some folks would love to see all the Kubernetes control plane components running in an isolated and hardened environment. Each pod is isolated from a network or a CPU point of view.
AMD has encrypted memory and stuff. Each process or mini-VM can have encrypted memory. Even if you were to break out of your process and access someone else's memory, it's encrypted with a key that's embedded in the CPU and the CPU knows when you're switching the process. There are all kinds of things happening like that but at the moment, they all seem to fit into the Kubernetes mold. I don't know if there are any things on the horizon. There are competitors to Kubernetes like Nomad. Kubernetes seems to have momentum at the moment. Nomad is similar but has some advantages.
It's interesting. It makes me think that there's a possibility that Kubernetes is still there but it fades into the background a little bit more for the bulk of engineering. I used to write iptables rules. I still don't understand them. It's still there. There's a bunch of stuff around Linux kernels and virtual devices that doesn't disappear but the vast majority of the world's software engineers don't have to worry about it.
That's an important trend. People are looking more for packaged Kubernetes solutions where either someone else is managing that or it's something that they deploy and it has batteries included with everything loaded in. That's a trend. There's a bit of a movement around serverless where you don't even have a container. You just write some code and deploy that code into a serverless environment and the serverless environment handles persistence, scaling and all those kinds of things. Having played with that a little bit, I still feel like that hasn't quite reached the maturity level where it's a good way to write software because as soon as things get bigger than you can fit in one function, you end up in a bit of a mess there.
Shaun, it has been super interesting. I was never quite sure how much mileage I would get out of something this low-level but it has been fascinating. There's the fact that you've managed to find your way through these seismic shifts in how people do this stuff. There have been projects that wanted to come on the show that tied themselves to technologies that have fallen by the wayside. These are choppy waters that you're sailing through at the moment.
We had a great team with a lot of solid experience in building solid and reliable software. That was the pedigree of the team that spun out. We have been able to bring quite a bit of maturity into the startup that we built around it. There's a lot of luck. We built something for OpenStack but the way we built it scaled to the next order of magnitude, which was what was needed at the right time. I feel lucky about that. I still remember feeling like it was taking off.
I went to do a talk at KubeCon. It was a KubeCon in London that had 400 attendees or something. In 2023, it's 15,000. They moved my talk about networking and security from a little stage to the main stage. Everybody wanted to come to see it. I was expecting to talk to 50 people and ended up effectively doing a keynote. That was when it felt like there was something here. It's exciting.
That's great to hear. Thanks so much for your time. Good luck. It would be interesting to see where things go in the next few years for sure.
Thanks for your time. It was fun.
Take care.
Distinguished Engineer at Tigera leading development of the Calico dataplanes
Specialties: Go, Python, Networking, Calico, Android development, iPhone development, GWT, JavaScript, AJAX, HTML, Java, Server architecture, Project management.