Cables2Clouds

Kubernetes Networking for Network Engineers - C2C034

The Art of Network Engineering Episode 34

What if the future of cloud-native networking could revolutionize everything you thought you knew about Kubernetes? Join us on this episode of Cables 2 Clouds as we continue our "Cloud Demystified" series with a deep dive into Kubernetes networking. We're thrilled to have Nicolas Vibert, a seasoned pro from Isovalent with nearly two decades of experience at Cisco, VMware, and HashiCorp. Together, we explore the essentials of Kubernetes networking through the innovative lens of Cilium, a CNI specifically designed for cloud-native environments. 

Nico shares his unique journey of learning Kubernetes from a network engineer’s perspective, emphasizing the critical role of hands-on experience and mentorship. We also discuss the creation of hands-on labs and educational materials tailored for network engineers. This segment is loaded with analogies to help traditional network professionals grasp key Kubernetes concepts with ease.

Ever wondered how Kubernetes orchestrates its complex networking operations? We break down the intricacies of the Kubernetes control plane, likening it to traditional network engineering concepts for clarity. Discover the limitations of Kubernetes' default networking tool, kube-proxy, and why modern CNIs like Cilium offer a more efficient solution for large-scale deployments. Nico explains how Cilium leverages eBPF maps for effective traffic routing and load balancing within Kubernetes clusters. Tune in for invaluable insights into the evolving landscape of cloud-native networking solutions.

Check out the Fortnightly Cloud Networking News
https://docs.google.com/document/d/1fkBWCGwXDUX9OfZ9_MvSVup8tJJzJeqrauaE6VPT2b0/

Visit our website and subscribe: https://www.cables2clouds.com/
Follow us on Twitter: https://twitter.com/cables2clouds
Follow us on YouTube: https://www.youtube.com/@cables2clouds/
Follow us on TikTok: https://www.tiktok.com/@cables2clouds
Merch Store: https://store.cables2clouds.com/
Join the Discord Study group: https://artofneteng.com/iaatj

Nicolas Vibert:

Again, Cilium is one of the CNIs, and it was built for Kubernetes and the cloud-native space. The challenge is that what you get by default with Kubernetes is something called kube-proxy.

Chris Miles:

Welcome to the Cables to Clouds podcast. Cloud adoption is on the rise, and many network infrastructure professionals are being asked to adopt a hybrid approach. As individuals who have already started this journey, we would like to empower those professionals with the tools and the knowledge to bridge the gap.

Alex Perkins:

Hello and welcome back to the Cables to Clouds podcast. I will be your host this week. I am at Bumps in the Wire on socials. Joining me, as always, are Chris and Tim, at BGP Mane and at Juan Golbez. I almost forgot your handle there for a second, Tim.

Tim McConnaughy:

It's fine.

Chris Miles:

Alex, I just realized you didn't even say your name. Yeah, you're just assuming everyone knows who you are. You didn't even say your own name.

Alex Perkins:

Oh, wow, yeah. Well, I'm Alex Perkins. So we've got a new installment of our Cloud Demystified series this week, and this is going to be more of a series than some of the past ones. We're still planning on doing a podcast episode and then a live demo walkthrough, but the difference is this is definitely going to need more than one part, because we're going to be talking about Kubernetes Networking 101. This is something I think we've been talking about doing since we started the podcast, and we've got a great guest who's going to guide us through it. He's right in the middle of all this and definitely comes from a very strong networking background, and that is Nicolas Vibert. Why don't you go ahead and introduce yourself?

Nicolas Vibert:

Hi everyone, thanks for having me on. So I'm Nico. I'm a Senior Staff Technical Marketing Engineer, very long title, at Isovalent. And, yeah, I've been in the networking space for close to 20 years now, which is quite scary, and deep into the world of cloud-native and Kubernetes networking at Isovalent.

Alex Perkins:

And you've been at a lot of different places, right? I mean, you've been at Cisco, you've been at VMware, now Isovalent. You've been kind of all over all the trends and everything throughout the years as well, right?

Nicolas Vibert:

Yeah, I guess I've seen a fair bit of different trends across my career. I started more in networking, working for a Cisco channel VAR doing network support, then went on to design and implementation, did my CCNA, CCNP, eventually did my CCIE, and went to work for Cisco, working more on design and consultancy. And then I got excited by the prospect of network virtualization.

Nicolas Vibert:

So I went to work at VMware on NSX, then worked on network automation at HashiCorp, and now I'm working on almost the next evolution of NSX and network virtualization, which is Cilium, which is what I'm really excited about.

Alex Perkins:

Yeah, see, I even forgot to mention HashiCorp. So yeah, you've definitely been all over the place. A lot of heavy hitters, yeah, big names in the industry.

Nicolas Vibert:

Yeah, I guess when I think about networking, I think, you know, what are the cool areas that you can go and explore? And I think about cloud networking, right, that's something I'm passionate about; I literally co-wrote a book about cloud networking a few years ago. And then network automation, again, that's something I dived into a few years ago, learning Python and Go and getting involved in the world of NetDevOps. So that's two areas. And then the third cool area, I would say, is Kubernetes networking. I think as a network engineer you can go into any of these fields and you'll be okay for the next five, ten years of your career. But yeah, I've explored all three of them myself.

Chris Miles:

It's funny that you can say it like that: if you lock in on this new thing, you'll be solid for the next five years. That's how volatile this industry is. Whereas if you're a doctor, you'll be good for probably the rest of your life. It's the treadmill, right? The tech treadmill.

Tim McConnaughy:

Assuming it's adopted as a technology that everyone loves, then you're probably good for five years at least, right? Yeah.

Nicolas Vibert:

Sometimes you have to pivot a little bit just to extend your career, your lifespan, absolutely, absolutely, even though it makes you uncomfortable, for sure.

Chris Miles:

Yeah, I'm sure the Token Ring guys out there were all, you know, a little bitter for a while, but they're all right. Yeah, for sure.

Alex Perkins:

All right, so before we really dive into the meat of the episode, let's discuss the elephant in the room, if you will. You know, Cisco recently announced, it was right around Christmas time I think, that they were buying

Alex Perkins:

Isovalent. As far as industry impact, what do you guys think about this purchase and what it means for Cisco moving forward, and for networking as a whole? Because Cisco, whether people like it or not, really guides a lot of the networking industry, so a purchase like this is definitely something to pay attention to, and I'd love to hear your thoughts about that particular acquisition.

Nicolas Vibert:

Yeah, happy to share my thoughts. These are not the company's thoughts, right, it's just my own personal take. Right, of course. Now that the disclaimer is out, I'm excited.

Nicolas Vibert:

I would say this came as a surprise. I had no idea this was in the works but, on the other hand, it just made absolute sense to join Cisco, to suddenly be able to scale with Cisco. There's nobody bigger in networking. Cisco was actually one of the initial investors in Isovalent when the company started in 2017. But I just thought, you know, they had missed the boat and somebody else would come in, or maybe we would go and, you know, IPO eventually. So, like I say, a surprise, but I'm excited.

Nicolas Vibert:

I think the other interesting aspect is we're not actually joining, like, the data center BU; we're not being merged into ACI. Not that we won't, I'm sure we'll integrate with ACI eventually, but we're going into the security BU, yeah, which is more software-centric, and so there are going to be interesting integration points. There's going to be, I'm sure, a lot of education about what some of the Cisco products can do and how we can integrate with them, and education about our perspective coming from the Kubernetes space and how we work together. So yeah, I'm excited. There's lots of unknown, but good unknown.

Alex Perkins:

It's great that you called out that you're going into the security BU, because we did talk about this. We had a news episode where we covered the acquisition, and a big point of our discussion was: why the security BU? It seems like they're just integrating with everything and becoming bigger and bigger. So it makes a lot of sense, and you guys have a lot of mature products coming out, especially on the security side as well.

Nicolas Vibert:

For sure, and I think it's common knowledge that Cisco had started to look at building some kind of cloud firewall product using eBPF, which is one of the technologies underpinning Cilium, and they realized it would take them, you know, six, twelve, eighteen months to build something. So it was either, you know, do it yourself, or buy the company with some of the founders behind eBPF. And Cisco has always been flush with cash, so that's never a problem.

Tim McConnaughy:

I don't know. After that $28 billion price tag they paid for Splunk, I was expecting them to take a little bit of a break.

Chris Miles:

It's got to be a bit rewarding, too, on that front, that the work you guys have been doing is validated to a degree, right? Because obviously we are currently on a podcast that's about cloud networking; it's a very niche market and not very big. So it's got to be validating, at least, to have eyes from Cisco on making that kind of acquisition. Obviously, this is a space that is going to drive business and needs to have innovation within it as well, so I'd say it's a good thing for all of us in the long run, right? Yeah, for sure.

Nicolas Vibert:

It's a successful exit. Again, we'll see; we can talk again in a couple of years' time just to see if it worked out. But yeah, I'm buzzing.

Alex Perkins:

Okay, let's start breaking into the Kubernetes networking side of this, and I promise it will tie back into Isovalent and what you guys do; it'll all make sense throughout the theme of the episode. All right, so let's start with learning K8s, or Kubernetes, right, a lot of people shorthand it as K8s. It seems very overwhelming and scary, and a lot of network engineers don't even know where to begin. So I guess why don't we start there? What was your own personal journey like to start learning about it, and do you have any recommendations for that path?

Nicolas Vibert:

I think my journey started with fear. There you go.

Tim McConnaughy:

Most do, as they should. Yeah.

Nicolas Vibert:

But I really felt like, you know, I've been able to pick up a lot of different technologies, and this was one I could see coming down the line. I could see it was becoming ubiquitous. You know, I was at VMware for almost six years, and it was clearly becoming part of the infrastructure world, the cloud world, and I would go and learn everything else apart from Kubernetes. I was just like, no, this is way too complicated, I just can't get my head around it. But yeah, eventually for me, and I'm sure for many people, it's about getting your hands on it, talking to people, finding great mentors to help you and teach you in words that you might understand. I think a lot of the Kubernetes documentation is pretty good, but it's not necessarily geared towards people like me, right?

Alex Perkins:

Yeah, I think that's the biggest thing, right? I mean, like we said, we've wanted this episode for a while, but honestly there's not many people that come from a real strong networking background that are in this space. Obviously we've got you on here, but I honestly cannot think of many other people that work in this space and can bridge that gap for anyone. So did that even exist when you started, or is that something you just figured out through trial and error?

Nicolas Vibert:

No, there are, I would guess, a handful of people. I can think of a few people at Isovalent who, you know, worked at Juniper or Nicira. But yeah, there aren't a lot of us that kind of pivoted, partially because, you know, network engineers are traditionally resistant to change, without offending anybody. That's true, they are, yeah. And also because the material wasn't there for me.

Nicolas Vibert:

Eventually I pivoted to Kubernetes because I saw Cilium and realized that was going to be the next big thing. And I saw the people behind NSX, who essentially created software-defined networking, were really pushing Cilium. So I thought, well, there is something there; that's going to be my next opportunity.

Alex Perkins:

Yeah, that makes sense. And so you've told us you're creating a course, right, something to kind of bridge this gap a little? Do you want to plug that?

Nicolas Vibert:

Yeah, and going back to how I would learn and what I'd recommend studying.

Nicolas Vibert:

What I do, and what the rest of my teammates do in technical marketing, is educate people about Cilium and about Kubernetes networking.

Nicolas Vibert:

So we created about 25 to 30 hands-on labs; you can find them on isovalent.com, free labs. As soon as you start a lab, a Kubernetes cluster is deployed in a couple of minutes and you get to try a lot of the features and use cases, and we try to take you on a journey of learning, starting with some of the basic concepts and going into a number of different use cases. We've had thousands of people taking these labs, and I think they've been pretty useful, and we'll have some more exciting labs coming out next. So that's what I would recommend if anybody wants to learn about Kubernetes networking and Cilium. And something else I'm working on, as you mentioned, is some kind of guide or instruction manual for network engineers that will introduce Kubernetes and Kubernetes networking, and it's really in the words of a network engineer, introducing lots of analogies and parallels with what people already know. Hopefully that should come out in the next few weeks.

Chris Miles:

Yeah, you're gonna tell us what version of VTP is running in Kubernetes, right?

Nicolas Vibert:

You know, I had to really think about it, look back at some of the concepts. One thing we might talk about is kube-proxy, and in my head I was thinking back to things like CEF, fast switching, express forwarding, this kind of stuff. So I had to rack my brain on some of the concepts, and maybe some of the similarities I suggest won't apply very well, but I will test them on you and see how it goes. All right?

Alex Perkins:

Well, speaking of that, why don't we start by covering some of the basic building blocks of Kubernetes? You don't have to cover every single one, right, but let's just start with some of the basics so we can build up to the CNIs and what Cilium does.

Nicolas Vibert:

Okay. So it's all about the applications; Kubernetes is all about how we deploy applications in a scalable, repeatable, fast way, essentially. Applications today tend to be broken down into lots of little microservices. We're moving from the model of the monolith, where you have one giant application which is hard to upgrade, to the microservice model, where you have lots of small containers that together make up an application. The smallest unit that you can deploy in Kubernetes is a pod, and that's just one or more containers. A pod will have one IP address, and the containers in the pod will share that IP address. It's kind of similar to a virtual machine, if you're familiar with that, but at the container level. Now that you've got your pod, you're deploying it on an actual node, a Kubernetes node, and again that's similar to the virtualized world where you have an ESXi host, if you're familiar with VMware.

Nicolas Vibert:

So again, that's a similar concept. You'd have a node where you have pods running, and then you'd have multiple nodes, and a node can be a virtual machine or it can be a bare-metal server. So you'll have your multiple nodes with multiple pods, and all these nodes together make up a cluster. So that's the basic stuff. Then one of the key aspects of Kubernetes is that it's a scheduler: it will determine on which node a pod can run. Which is to say, how much memory or CPU does this application require? And, oh, there's space left on this node, I will put that pod there. So, based on your requirements, it will automatically schedule your pods onto a specific node. That's what Kubernetes is primarily known as: a scheduler.
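For readers following along, here is a minimal sketch of what Nico is describing: a single pod with resource requests the scheduler can use to pick a node. The names, image, and numbers are purely illustrative, not anything from the episode.

```yaml
# A minimal Pod: one container, one IP shared by everything in the pod.
# The resource requests are what the scheduler compares against the free
# CPU and memory on each node when it decides where to place the pod.
apiVersion: v1
kind: Pod
metadata:
  name: web-frontend          # hypothetical name, for illustration only
  labels:
    app: web                  # labels matter later for services and policies
spec:
  containers:
    - name: nginx
      image: nginx:1.25       # any container image would do here
      resources:
        requests:
          cpu: "250m"         # a quarter of a CPU core
          memory: "128Mi"
```

Applying a file like this with kubectl apply hands the intent to the API server, and the scheduler finds a node with enough room for it.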

Alex Perkins:

So, yeah, real quick, sorry. I was going to say, Tim and Chris, we can all just jump in whenever, right? Because I'm sure, on the networking side, there are going to be plenty of questions.

Tim McConnaughy:

Yeah, and I'm not that great with Kubernetes, not yet, I just haven't needed to be, although I understand the concept pretty well: it's the orchestration engine for orchestrating, building, and allocating resources, and bringing up and tearing down these microservices.

Nicolas Vibert:

Yes, that's how it almost started. Kubernetes came out of Google; it grew out of a project called Borg, and it was open sourced about 10 years ago now, if I recall. So, yeah, it's mainly known as an orchestrator, a scheduler, but it's become so much more than this. It's become an extremely extensible distributed system. It can be used for lots of different things that have almost no relevance to containers; it can be used almost as a generic API, which we can probably talk about in a different episode. But yes, to begin with, as an orchestrator, it decides where the pod should go, based on the specifications that the Kubernetes engineer or operator provides.

Chris Miles:

Yeah, and just the fact that you tied it that way: an IP address is typically tied directly to an individual pod, and those pods are very ephemeral and can live in different places, so obviously the networking can get very complex. Do you want to look at that and how that's typically done, how it was done in the old way? Because I think the sentiment originally, when Kubernetes was put together, was that there was just a little line at the end saying, hey, somebody should figure out the networking for all this stuff later at some point. So yeah, let's touch on that.

Nicolas Vibert:

Yeah, and we'll talk about IP addresses in Kubernetes and how they essentially become irrelevant, because pods come and go. You might destroy a pod and it will come back on a different node with a different IP address. So you can't rely on IP addresses like we've done in the past where, you know, when I used to manage a data center, I knew this VLAN means, oh, this is the VLAN where this service is deployed and that's on this rack. And for specific servers that were important to me, I remembered the IP address. The IP address was the metadata of my application.

Nicolas Vibert:

But with Kubernetes you have such a big churn of applications. One of the things that Kubernetes does is autoscale as well. So when you need more pods because of demand, it can automatically add more pods to your cluster, and therefore, again, pods come in and pick up IP addresses, and you just don't know which IP addresses they're going to pick. There's no predictability here. So, yeah, IP addresses are irrelevant, and we'll talk about this because that's quite important as well when you start thinking about security and observability.

Alex Perkins:

And real quick, so in a default setup, right, because pods just spin up at will, isn't there a default behavior where all IPs need to be able to talk to everything? Is that within a cluster

Nicolas Vibert:

or within a node? Within the cluster. That's the fundamental Kubernetes networking principle: all pods should be able to communicate with any other pods without the use of NAT.

Alex Perkins:

Simple, right? That sounds really easy for an ephemeral setup. Yeah, that sounds very simple.

Nicolas Vibert:

So that's really the principle, that's how things should be done. What defines a lot of the networking in Kubernetes is something called a CNI, and CNI represents a couple of different things. CNI stands for Container Network Interface, and again, I'm trying to explain it in simpler terms here, but essentially it's a way to define the networking standards in Kubernetes, and a network plugin that implements these standards is also referred to as a CNI. So I guess the formal description of CNI is actually the standards; you can think of it almost as the IETF RFC standards for the networking world, it's a similar concept, the CNI represents a set of standards. But when we talk about CNIs, we mostly talk about the implementations of said standards.

Tim McConnaughy:

Is the CNI, then a piece of like a microservice or a piece of software that actually enforces or implements that networking standard?

Nicolas Vibert:

Yeah, essentially. Take Cilium, for example. Cilium is one of the most popular CNIs, and it comes with an agent that will go and configure the nodes with all the networking logic, and it comes with an operator that does more cluster-wide actions, more of a control plane, and it also comes with a UI and some additional tooling. So I guess, if you think about the CNI, it's almost like when you do vendor selection for your data center. You might be looking at Arista, Cisco, Juniper, whatever. You have to do the same for Kubernetes, where you have to go and select your CNI, because that's going to be one of the most critical decisions you make in your cluster or in your environment.

Chris Miles:

I was going to say, one thing that always confused me about this was, when I heard the term CNI, I thought of a single network interface, like there's one big network interface that everything is processed through, but it's more of a distributed thing. Like you said, I think the idea of a distributed control plane out to the entire cluster is probably a better way to look at it, right? Defining policy and rule sets, things like that. Is that a better way to look at it than how I was previously looking at it?

Nicolas Vibert:

Yeah, and that's a good point.

Nicolas Vibert:

Cilium, just like Kubernetes, is a highly distributed platform. A lot of the aspects of Cilium are defined in the Cilium configuration, but also a lot of the configuration is pushed to the Kubernetes API server. As I said before, Kubernetes is massively extensible as an API. What you tend to do is send your desired configuration to the Kubernetes API server, which is part of the Kubernetes control plane, and Kubernetes will take your desired intent, and your intent might be "I want X amount of pods" or "I want all these containers," and Kubernetes will go and take care of these and deploy them for you.

Nicolas Vibert:

And we have this in intent-based networking, where you have your network source of truth, where you define your network and then you push it and you expect the network to automatically do that for you. It's the same in Kubernetes. You send your intent, which you typically do in a YAML file, with all your desired pods, desired networking configuration, Cilium configuration, BGP configuration. You send it to the Kubernetes API server, which is almost like a database, if you like, and then Cilium will go and implement this for you. I hope that helps with clarification.
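As a concrete illustration of that intent-based flow, this is roughly what declaring "I want three copies of this pod" looks like in YAML; you send it to the API server and the controllers, together with the CNI, make reality match. The application name, labels, and image are invented for the sketch.

```yaml
# Declarative intent: "keep 3 replicas of this pod running."
# Kubernetes reconciles the cluster until the observed state matches this
# spec, much like Terraform reconciles infrastructure against a plan.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout                  # hypothetical application name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:v1   # placeholder image
```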

Tim McConnaughy:

So that's kind of like the Terraform model, right, where you're declaring, I forget the word I'm looking for, I completely lost the word, where you're declaring the end state, right? You're just saying, here's my end state, feed it into the system, and then what comes out should match that end state, basically.

Nicolas Vibert:

Yeah, yeah, yeah. So there's a process of reconciliation that happens to ensure that what exists, what has been deployed, matches the intent of the user. Very similar to Terraform in many ways.

Tim McConnaughy:

Yeah, what always really screws with my head with Kubernetes is, as a network engineer especially as an old network engineer now, I think a lot in the physical layer, even though I work in the cloud networking space now, I think of it more like a physical layer, if you will.

Tim McConnaughy:

So that's the part where it always messes me up. I'm thinking, here we have racks of compute with top-of-rack switches or whatever, and whether or not it's abstracted from us is irrelevant, right? In the data center, that's what it is. In the cloud, that's really what it is, but you just see the API version of that, the front end. So when we're talking about Kubernetes as a solution, we're really talking about a stack of compute, whatever amount of compute we can throw at this thing, and this compute is connected to some top-of-rack switch somewhere, and they're connected to each other, almost like a VMware vCenter setup would be. So that helps me ground it a little bit. With Kubernetes, really, we're talking about orchestrated compute, and the orchestrated networking between it is the CNI, right? How all these pods should communicate, the rule set.

Nicolas Vibert:

Yeah, yeah, 100%. And one thing the CNI can also do is establish a network overlay, which you would need, you know, something like VXLAN, right? That's what you would have with ACI, what you would have with NSX. That's how we can enable that connectivity between all your different pools of compute, by building a network overlay. And essentially, your pods have no idea about the underlying layers of the network; they just talk to each other.

Alex Perkins:

That's a good point. I guess I would ask, why don't people just use the native, built-in Kubernetes networking stuff? I'm setting you up here, right?

Nicolas Vibert:

Yeah, there are, I guess, a few different models. And again, Cilium is one of the CNIs, and it was built for Kubernetes and the cloud-native space. The challenge is, what you get by default with Kubernetes is something called kube-proxy, and what kube-proxy will do is configure some of the load balancing and some of the NAT-ing required when you do load balancing, and something we can talk about later is the actual load balancing within the cluster. But kube-proxy is essentially based on a technology which is 25 years old, which is iptables. Oh, wow, okay. Yeah, iptables wasn't designed for an environment where, as I mentioned, there is a massive churn of

Tim McConnaughy:

IPs. Ephemeral, right? An ephemeral environment.

Nicolas Vibert:

Yeah, I totally get it. So the problem with something like kube-proxy and iptables is you tend to have these rules in your iptables configuration, and you could have hundreds of rules, and you almost have to process the rules in a serialized manner. So a packet might need to go through a list of thousands and thousands of rules before it can hit a match, which, especially at scale, can add massive latency to your environment. Again, it's simply because kube-proxy, which is kind of the default networking tool within Kubernetes, was based on a technology which is outdated.

Tim McConnaughy:

Not only that, but updating all those iptables rules all the time, every time worker nodes are spinning up. Oh my God, yeah, I can already see the nightmare without even knowing the solution. That's crazy.

Nicolas Vibert:

Yeah. So you know, that's often a reason why engineers will start looking at a proper CNI, if you like. The CNI is responsible for a lot of things. For example, by default, if you want to do, say, IPv6, or if you want to do network security, you need a CNI that can support and implement this. It doesn't just come out of the box.

Alex Perkins:

Yeah, security with iptables would be a nightmare, when, like you said, you have to serially read through everything and try to apply the security while everything's spinning up and down. It just sounds like a nightmare.

Chris Miles:

Yeah, I guess, what is the counter-argument? Because I've just never really expanded on this. A lot of application owners, at least the ones that I interact with typically in the cloud space, are like, oh, why do we need added security on top of this? You know, everything's TLS encrypted, and within Kubernetes everything's using mutual TLS, right? So even if things can communicate, it's all encrypted, right? So what is the counter-argument for why you need to inject network security into Kubernetes specifically?

Nicolas Vibert:

Well, so first, mutual TLS only happens if you've put the right tools in place, right? One option, if you want to do mutual authentication, is to implement it in your app itself, but that would be a nightmare to manage, and you don't want your app developers to have to handle this, so you need to offload that function to something else, and that's typically done with a service mesh or a CNI. And it's just a principle of network security: I think network security policies are something that is required for regulation and compliance.

Nicolas Vibert:

Compliance, yeah, compliance, all that stuff. Yeah, for regulation and compliance you have to adopt some least-privilege slash zero-trust model. Again, by default, Kubernetes is like, yeah, great, you can all talk to each other, I'm not going to put any kind of restrictions in place, have fun. Which is great when you want to get started and learn, but as soon as you're serious and running serious workloads with sensitive data, you need to restrict an application to only what it's supposed to do.

Alex Perkins:

Right, well, and most people, most places, aren't going to spin up a separate Kubernetes cluster for every single application either, right? So you're going to need some kind of separation in there. Yeah, exactly.

Nicolas Vibert:

And I guess that's another interesting thing to think about: how we segment today in a network. We've got all our VLANs, or maybe a VRF, and this stuff doesn't really exist in Kubernetes; we don't really have the same concept. We have the concept of a namespace in Kubernetes, which is typically how you would logically group some entities. You may have an app in a namespace, you might have a tenant in a namespace, and that's the most similar concept to a VLAN, to some extent, when you think of a VLAN as a logical group of network entities. That makes sense.
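To make the least-privilege and namespace points a little more concrete, here is a hedged sketch of a standard Kubernetes NetworkPolicy. Every name and label is invented for illustration; the point is simply that selection happens by namespace and label, never by IP address, and that you need a CNI that actually enforces these policies.

```yaml
# Once a policy selects the backend pods, only traffic it explicitly
# allows gets in: here, pods labelled app=frontend in the same
# namespace may reach pods labelled app=backend on TCP/8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production-hr        # hypothetical namespace, the "VLAN-ish" grouping
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```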

Tim McConnaughy:

So real quick, I know it's not on the list, but now you've got me asking. We see the problem with basic kube-proxy, right? So, without going too far down the well, how does a CNI like Cilium solve that problem, that challenge?

Nicolas Vibert:

So one of the things that we do with Cilium is we use a technology called eBPF, which doesn't actually stand for anything anymore. It essentially started as the successor to something called BPF, which was a Linux technology used by tcpdump to hook into the network traffic, and I still use tcpdump on a very frequent basis. Essentially, that's a way to insert yourself into the traffic, copy it, and just visualize what's going on. eBPF is the extended version of it, if you like, the next-generation version, and it's a way to hook into the network and run networking, security, and observability features. Without going into too much detail, Cilium comes with an eBPF-based kube-proxy replacement which is far more effective at dealing with, again, the ephemeral IP addresses. We can maybe share some notes with some of the comparisons, but it uses something like eBPF hash tables, which enables you to quickly find the rule instead of having to go through thousands and thousands of rules for your traffic.

Alex Perkins:

We should call this out real quick too. So when you install the Cilium CNI, does it actually remove kube-proxy, or is it side-by-side, or how does that part work?

Nicolas Vibert:

So you can do it in a couple of different ways. You have something called strict replacement mode, where you just wipe out kube-proxy, and you also have a partial kube-proxy replacement where you have both of them side by side. For example, if you use Cilium on an Azure Kubernetes Service cluster, you could have both of them deployed, but kube-proxy wouldn't do very much. So yeah, you wouldn't really see rules in your kube-proxy once you start using eBPF and Cilium for it.
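For reference, enabling that replacement is typically a couple of install-time settings on the Cilium Helm chart. Treat this as a sketch only: the exact option names and accepted values (strict and partial modes versus a simple true/false) have changed across Cilium releases, and the addresses below are placeholders.

```yaml
# Hypothetical Helm values for installing Cilium without kube-proxy.
# kubeProxyReplacement asks Cilium to handle service load balancing in eBPF;
# k8sServiceHost/Port let the agent reach the API server once kube-proxy's
# own service handling is gone.
kubeProxyReplacement: true        # older releases used "strict" / "partial" here
k8sServiceHost: "10.0.0.10"       # placeholder API server address
k8sServicePort: 6443
```

Something like `helm install cilium cilium/cilium -n kube-system -f values.yaml` would then apply it, but check the Cilium documentation for the values that match your version.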

Alex Perkins:

Okay, I think we should go over some of the common use cases for Cilium. You mentioned load balancing, and I don't know if you want to take a step back and talk about how it's done natively and how it's done differently with Cilium, but I think load balancing might be a good place to start.

Nicolas Vibert:

Yeah, so load balancing. I mean, that's clearly a critical part of any network, right? And if you think about a traditional network, what we have is a virtual IP fronting a pool of real servers, right? Or a backend. Yeah, yeah, backends. But in your virtual IP you have to essentially write the IP addresses of the backends, of the real servers, and again, that's not going to work in Kubernetes, where you get pods, backends, that come in and out and you just don't know. If you had to update the pool of real servers every time a pod comes in, it would just take forever. So the way we create pools of servers is by using labels, and labels are a way to categorize your workloads and group them together. Again, as I said before, IP was like your metadata and how you understood and categorized your workloads in the old world. In the new world you have to use some different metadata instead of IP addresses, because IP addresses don't mean anything.

Chris Miles:

That's right. So when we talk about load balancing in that case, it's purely from an ingress perspective then, right? It's not necessarily load balancing within the system itself, it's all about ingress, is that right? It's both.

Nicolas Vibert:

So Kubernetes defines different APIs for how things should work, and you have these things called Services. Services are how you define the load balancing model, and you have something called ClusterIP, which is just used for internal load balancing within the cluster. So for a pod to access a set of pods within the same cluster, that will be a ClusterIP service, and that's just for internal use. But again, what you have to do here when you create your pool of backends is, instead of saying load balance traffic between IP A, IP B, IP C, you say load balance traffic between all the pods with this specific label. A label is just a way to assign some metadata to any object in your cluster, and the label could be, this is a production workload, or production HR, or test sales. So it's just a way to assign some information.
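Here is roughly what that looks like as a ClusterIP Service, with every name and label invented for the sketch: instead of listing backend IPs, the Service just says that any pod carrying these labels is a valid backend, however briefly it lives.

```yaml
# Internal (ClusterIP) load balancing: the virtual IP fronts whichever
# pods currently carry the matching labels, no backend IPs listed anywhere.
apiVersion: v1
kind: Service
metadata:
  name: hr-api                     # hypothetical service name
spec:
  type: ClusterIP
  selector:
    app: hr-api
    environment: production        # "production HR", expressed as labels
  ports:
    - protocol: TCP
      port: 80                     # the port clients inside the cluster use
      targetPort: 8080             # the port the pods actually listen on
```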

Tim McConnaughy:

Some meaningful data. Yeah, exactly. So real quick, because this is where I always get lost as a network engineer. I've got a packet coming from some microservice destined to some other microservice, right? And the ingress to that microservice that I'm trying to reach is a virtual IP, a front-end IP that's orchestrated by the CNI, and the backend of that is a bunch of labels. Who's tracking the ephemeral part? Because, like you said, IPs don't mean anything, because we reuse them, we toss them, we get rid of them. But in the moment when a packet is coming in, how does Kubernetes actually send the packet on to the backend service? I mean, you can't send it with just a label as the address, right? How does that piece

Nicolas Vibert:

work. It would also depend on which CNI you're using, right? Well, let's assume Cilium in this case. Yeah, so Cilium would just be using eBPF maps to essentially route the traffic to the right place. It's just the way Cilium can steer traffic, using the eBPF-based process to push the traffic in the right direction.

Tim McConnaughy:

And is eBPF tracking then, like we said, the backend will be any pod that has this label, right? So does eBPF, oh my god, I can't talk, does eBPF then have a label-to-pod-IP mapping or something that's going on there?

Nicolas Vibert:

So Cilium has something that I should have explained earlier: we have this concept of endpoints and identity. An endpoint is, let's say, a container with an IP address; it will be referred to as an endpoint, and every endpoint will have an identity. Again, this is a way to map things, because IP addresses are irrelevant, we can't just rely on them, so we have to use labels to determine the identity of an endpoint, and that's how we can enforce security.
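A hedged sketch of how that identity model shows up in Cilium's own policy resource: the endpointSelector picks endpoints by their labels (their identity), never by IP. The labels and names are illustrative, and the CRD details can differ between Cilium versions.

```yaml
# Identity-based policy: endpoints are selected by label, and the eBPF
# datapath enforces the rule regardless of which IPs the pods happen
# to hold at any given moment.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
```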

Tim McConnaughy:

Yeah, so I see that. And again, this will be my last. I won't belabor the point any longer. But to me I get completely why IP addresses don't matter. Right, we can't build a policy, we can't build a rule set, we can't build any meaningful thing based on an IP address. But when it's time to actually route the packet and get it where it needs to go, I assume there's still a destination IP in the packet header somewhere.

Nicolas Vibert:

Yes, yeah, which is, again, tracked by

Tim McConnaughy:

Cilium. Got it, got it. That's the piece that always gets me. Yeah, that exists.

Alex Perkins:

Is there any integration? Does the Kubernetes database, the etcd database, does it keep track of any of this stuff too? Is there some talking between eBPF calling on the Kubernetes API to get some of this information? Is that some of the intelligence that's built into this process?

Nicolas Vibert:

Yeah, so Cilium would frequently poll the Kubernetes API server for this kind of information, just to keep track of where everything is and keep it all updated.

Tim McConnaughy:

Okay, that makes sense.

Alex Perkins:

And as new pods get spun up or spun down right, yeah, exactly. So that's some of the reconciliation that would happen between there.

Chris Miles:

Yeah, one thing that I struggle to understand sometimes is, thinking about Kubernetes, it's obviously a very ephemeral environment, there are things changing all the time, and the observability piece sounds like it would be a nightmare, because every network issue that ever gets reported is always about a specific snapshot in time. Is there something within Cilium that you can look at to know, like, oh, this user reported this issue at 12:36 PM? Are you having to correlate a lot of that? Or how does a network professional interact with that system?

Nicolas Vibert:

Yeah, and that's, I guess, part of what Cilium can do, again with eBPF. The way we do it is we hook into the Linux networking stack, and one way to compare it is with service insertion, if you know what I mean, whether it's in the cloud, or, yeah, a bump in the wire.

Nicolas Vibert:

Yeah, yeah. eBPF lets you do it in the Linux kernel, so it's high performance, within the Linux kernel, not in user space, but you can do it safely without causing problems, which is usually the concern; you don't want to run things in the kernel because you don't want to break anything. It lets us observe all the traffic that is entering the host and also the pods, and we can use this with Cilium to essentially observe all the traffic coming in and out of the pods. It's similar in a way to something like NetFlow or sFlow, where you can essentially see all the flows from A to B, you know, your source port, destination port, protocol, which you can then export, if you wanted to, for forensics purposes.

Nicolas Vibert:

Like you said, you want to look back at when something happened. What's cool about this is it's also aware of the Kubernetes context, because, again, if I tell you that IP address 10.1.1 was the source of your problem, you know that two hours later that address will be allocated to somebody else. So we can correlate, because Cilium is tracking all of this, right? We can correlate to say this flow was from that pod, on that node, in that namespace, in that cluster, and it was sending traffic to that other pod, for example, or to that service. But yeah, when you have so many layers of abstraction, which you have in Kubernetes, you need some strong observability, or otherwise it's a nightmare to manage. In the best case, it still seems like it'd be a nightmare to do any kind of observability correlation.

Tim McConnaughy:

And you need a data center in a mountain in Utah to store all the insights.

Alex Perkins:

Yeah, you know, feel free to show off the products here, but I think your suite is called Hubble, right? Is that part of Cilium? You guys have a lot of stuff built around the observability piece, where it's like a whole product suite that can do all this stuff too.

Nicolas Vibert:

Yeah, and maybe I should also clarify this: the Cilium project itself is an open-source project. It was donated to the Cloud Native Computing Foundation a couple of years ago, so it's not even ours as an Isovalent project anymore. It was created by Isovalent, but it's been donated. So it's free, open source, and it comes with tools like Hubble that let you do the observability. That's just to clarify. And then Isovalent obviously has an enterprise product offering which is based on the open-source version but with some additional features. But yeah, just to make sure people are aware.

Chris Miles:

One last question: can you do a packet capture?

Tim McConnaughy:

Can we tcpdump from Kubernetes on a worker node or something?

Nicolas Vibert:

So you can't necessarily do it natively with Cilium today, but you can still do it, and you can see it in the labs that we've been running. As I mentioned, we have something called the Cilium agent that can build all the VXLAN tunnels, and it's also where we run BGP; for example, it's the Cilium agent that will build a BGP session with your top-of-rack device. Oh, all right. So in the labs that we've got, when I want to show things that Cilium supports, like BGP Graceful Restart or eBGP multi-hop, I will do a tcpdump on the Cilium agent, capture that traffic, and then go and show the user how to do that. Nice.
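For the BGP piece, the open-source project exposes this through Kubernetes resources too. The sketch below uses the CiliumBGPPeeringPolicy CRD as it has shipped in recent releases; the ASNs, peer address, and node label are placeholders, and newer Cilium versions restructure this API, so verify the schema against the documentation for your release.

```yaml
# Hypothetical BGP peering: nodes matching the selector run a BGP speaker
# inside the Cilium agent, peer with a top-of-rack device, and advertise
# each node's pod CIDR.
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: rack1-peering
spec:
  nodeSelector:
    matchLabels:
      rack: rack1                    # placeholder node label
  virtualRouters:
    - localASN: 65001                # placeholder ASNs
      exportPodCIDR: true
      neighbors:
        - peerAddress: "192.0.2.1/32"   # placeholder ToR peer, CIDR notation
          peerASN: 65000
```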

Alex Perkins:

Okay, I hate to wrap this up, just because there's still a lot more. Like I said in the beginning, I think we're definitely going to need multiple parts to this; I want to talk about BGP and security stuff. So real quick, Chris, Tim, do you guys have any last questions before we let Nico wrap up?

Tim McConnaughy:

Not a question. What you just said is accurate, right? Literally all we've been talking about is the CNI, inter-cluster or intra-cluster. At this point there is so much more to talk about when we start talking about Kubernetes talking to the outside world, or getting traffic from the outside world. Even an ingress pattern we could probably spend a whole hour or more on, right? So I'm looking forward to the demo stuff, really looking forward to sharing that with everybody. I think for network engineers, it's really about being able to tie Kubernetes (I can't wait to read what you're putting out, by the way) to networking and just make it real, right? Just make it real so somebody can grasp it.

Chris Miles:

Yeah, ditto, not much more to add there. I think we're really going to get some value out of seeing a demonstration of it. But I mean, this has been great kind of going through the building blocks, because I don't think I've ever seen it laid out exactly like this. I know Nico puts out a lot of content around this kind of stuff, which is really great and I recommend it to anyone. So, yeah, excited for more. We'll have you back, for sure.

Nicolas Vibert:

Awesome. Well, thanks for having me, and, yeah, look out for more content coming out from us on isovalent.com. And again, thanks for having me. Awesome, yep.

Alex Perkins:

And if you haven't, like Chris was saying, followed Nico on LinkedIn, it's a constant stream of really good stuff. So definitely do that, and we'll add a bunch of links in the show notes for this.

Tim McConnaughy:

Yeah, we need the labs that he mentioned, like the Isovalent labs, those are really good. Yeah, we'll get those in the show notes and everything too. Yeah, absolutely.

Alex Perkins:

And also, for this episode, I really want to call out: if anyone listening has any specific questions or things that they think we should cover in future parts, please leave a message on LinkedIn, on X/Twitter, send us an email, right?

Tim McConnaughy:

All the above, yeah, yeah.

Chris Miles:

It's like saying, man, it's like saying fetch. It's going to happen. It's going to happen, all right guys. It's going to happen.

Alex Perkins:

It's going to happen. All right, guys. So that's all we got for this week, and we'll be back with more Kubernetes Networking 101. Thanks for stopping by. Hi everyone, it's Alex, and this has been the Cables to Clouds podcast. Thanks for tuning in today. If you enjoyed our show, please subscribe to us in your favorite podcatcher, as well as subscribe and turn on notifications for our YouTube channel to be notified of all of our new episodes. Follow us on socials at Cables2Clouds. You can also visit our website for all of the show notes at cables2clouds.com. Thanks again for listening and see you next time.
