Tariq Islam 0:07 We do intros first, right? Yes.

Jamie Duncan 0:13 Thanks for joining us on the K files. Each week we take on a different topic in the Kubernetes community and the IT industry.

Tariq Islam 0:19 Depending on the week, we may find ourselves knee-deep in kernel headers, magic quadrants, or even startup pitch decks. Our goal

John Osborne 0:26 is to close each file with a sense of completeness and satisfaction for us and anyone listening.

Tariq Islam 0:36 Okay, welcome to Episode Two of the K files. The general topic for today is multi-tenancy. I don't know that all of us here at the K files feel passionately about it, but we run into it a lot, and it is often a topic of discussion with the folks we talk to in different organizations, in the community, and in the industry. I'm always presented with questions like, "Well, Tariq, what level of multi-tenancy should we be aiming for?" In fact, I had that discussion just yesterday, around what level a given organization should be at. I was talking to one org that had tried one method and lived through a spectacular failure, where the blast radius of a very large cluster impacted them in a very, very bad way. So they said, "You know what, we're going to back off a little bit, maybe go one level down, so the blast radius is not so great." So there are some pros and cons, I think, and we wanted to run through some patterns here and see what areas of concern folks need to be aware of as they look at different patterns.

Jamie Duncan 1:41 So let's set the table a little bit around multi-tenancy, and we're speaking specifically to Kubernetes here, obviously. There are different layers. You can have a tenant object, and a tenant object is Coke versus Pepsi. It's not project A versus project B. Is that what you're thinking?
Are you just thinking about how we are isolating any project from anything else? Or how we are isolating projects' data that really shouldn't be talking to one another for business or security reasons?

Tariq Islam 2:08 It's actually more complex than that. I have a love-hate relationship with this topic, just because there's so much to consider, even to the point where you have to think about what a tenant is. Yes, we're going to talk about multi-tenancy, but first let's define what a tenant is. More often than not, at least in my experience, and I'd love to hear from you guys, it's been a business question. Is it a line of business? Is it a team? Is it multiple lines of business grouped through some sort of enterprise concept? It varies. Every organization is a bit of a snowflake in how it's composed, so what a tenant is for a given business is going to depend. That's usually the first question I have. And for the listeners out there, I think this is probably the first question you want to be able to answer: what is a tenant to me?

Jamie Duncan 3:05 And I think that's a tough pill for a lot of people who walk into Kubernetes thinking it's going to solve all their problems. They think it's got a tenancy model, and it does in a lot of ways, meaning you can slice and dice it 100 different ways. But it's not going to tell you which one. It's not going to tell you how to organize your team. It's not going to tell you that you need geographic isolation as opposed to business-purpose isolation, as opposed to whatever, and they really want us to walk in the door with that answer. So I think you're exactly right.

Tariq Islam 3:39 There's been some work done.
I mean, there are high-level categories, areas of concern to look out for, and we're assuming that the tenant has been defined here.

Jamie Duncan 3:50 Right. You know, we'll even define a tenant, and we'll

Tariq Islam 3:53 Okay. Yeah, that's actually a good point. Let's assume, for the sake of argument, one popular pattern that I've seen as far as tenancy goes: what a tenant tends to be is a line of business within an organization. A specific line of business might have one or more application teams or workload teams, you name it. They might even have their own subset of IT operations. But a line of business is kind of its own silo of excellence. Analogous to that might be a product team at a software vendor or something like that. If we agree on the line of business being the tenant, there are certain areas of concern, and these areas are pretty much common no matter what level of tenancy you choose. There's a number of them. I won't go through the full list, but there are things like infrastructure administration, and when I say infrastructure, I'm talking about the Kubernetes infrastructure. So things like provisioning, operations, DR (disaster recovery), the service mesh, things of that nature. There's the roles.

Jamie Duncan 5:02 So you're saying, like, who controls building out your VMs? Basically, who's got the API keys to Amazon?

Tariq Islam 5:09 Well, definitely that. But also things within the cluster itself.

Jamie Duncan Is that the cluster admin?

Tariq Islam Exactly. Yeah. As far as actors go, or, let's say, the persona that owns this, we're talking about the cluster admin. That's exactly right.

Jamie Duncan Gotcha.

John Osborne 5:26 Okay, thanks for clarifying. Just laying the groundwork.
We all agree that a cluster that runs in production should not span any non-production environments, right? You want to dedicate it. That's the thing: customers say, "Hey, with nodes, I can just schedule these nodes for production and these ones for test or dev." And I think we've all seen enough reasons not to do that, with different cascading failures or other things that can happen. I'm going to raise my hand there and say I may have had to recommend that in some weird one-off scenarios before. I think we've lived and learned. If a customer had asked me that three years ago, I would have said maybe, and now we've seen enough reasons that the number of times we should be saying maybe is close to zero.

Jamie Duncan 6:12 Yeah, I 100% agree. But sometimes you take the club you've got, right? You've got the clubs in your bag, and you've got to get to the hole. It's a bad scenario, agreed, but it's better than no scenario. But yeah, 100% agree. You do not want to commingle prod and non-prod.

Tariq Islam 6:31 Yeah. So, having gone through just that administration aspect, that's just one area of concern, where we have the cluster admin. There are other actors, right? We talked about what a tenant is. There are also other personas. You have the cluster admin, obviously; you have your developers, your security administrators. You also have tenant personas like the tenant developer, because again, we're talking about tenancy in terms of the organization. But some of the other areas of concern that we need to worry about are things like roles. There, we're talking about not just the Kubernetes RBAC model; we're also worried about identity management outside of Kubernetes and how that's going to integrate in.
And I feel like that's always a huge integration problem, right? I've got Active Directory, or I've got some sort of cloud-based IDM. How am I going to integrate that with Kubernetes and RBAC? That tends to be a bit of a challenge for folks to grok. And then there's general access control for workloads, deployment management, security, logging and monitoring, chargeback and showback, resource isolation and sharing. These are all areas of concern when we're talking about multi-tenancy. These are very deep areas that need to be explored. It's doable, but it takes work.

John Osborne 7:45 There was this talk at KubeCon from Liz Rice around ARP spoofing your cluster, and she showed that if you have a layer 2 network plugin and a pod gets hacked, then you can basically take over DNS for all of that node, and then you can do all these things that you're really not supposed to be doing. That was new to people. And when she said that she had contacted Kubernetes security about it and that was the expected behavior, I think there were a lot of people saying, "Oh, well, I thought things were a little more secure out of the box." That's not always the case, especially if everything is running in a container. Certain containers will have root privileges. If you've seen the runtime security containers that are out there, they all run as privileged, and there are operators that run as privileged. So I do feel like things can go sideways, and we are still learning. There is a good talk by Ian Coldwater and Duffie Cooley, who you work with now, Jamie, around, you know, five different ways you can abuse default Kubernetes clusters, which we'll put in the show notes as well. I'd recommend that. I feel like we're always finding out new ways that a cluster can fall over.
There was, of course, the breach at Target, I mean, not the breach at Target, the outage at Target with Kubernetes, where there was just this cascading failure. There's a really good blog about that which we can put in the show notes. But there are just a lot of places in Kubernetes, like between the API server and the kubelet and etcd, where things can get overwhelmed.

Tariq Islam 9:09 In Episode One of this podcast, we introduced the podcast's purpose: we look for patterns, and we try to signal those patterns out from all the noise that we see in our industry. And I feel like this is an area where patterns could really play a big role. For example, when you look at OpenShift, or GKE, or EKS, or AKS, a lot of these Kubernetes distributions, when you provision them, the platforms themselves make certain common-sense default configurations on behalf of the person provisioning. Certain flags are preset; there are certain defaults there. One good example might be OpenShift security context constraints, or maybe pod security policies for GKE. But I think multi-tenancy is an area where we have a finite set of patterns, where we can have high-level tenancy profiles, and for each of those profiles we could set certain defaults. I feel like we could take a lot of the headache away and still give folks the nerd knobs to tune whatever they need to tune at the platform level.
I think this is an area of opportunity for really every player in the industry to give something to the enterprise and say, "Hey, if you're looking for that right level of multi-tenancy, here's your T-shirt-size profile set where you can get started," whether it's line-of-business level, application-team level, company level, that type of thing.

What we would consider hard multi-tenancy at Red Hat, and I've read that Google considers this the same way, is that you need two layers of isolation to consider something multi-tenant. Kubernetes doesn't have that everywhere. There's one API server; there's one instance of certain things running.

At the end of the day, there are patterns that we can offset against. And when I say we, I'm talking about vendors in the industry, potentially, or the industry or the community coming together to provide open source solutions to this. If you wanted to start at the line-of-business level, or go down to the application-team level, you can go with whatever level you want. But just provide some T-shirt-sized, sane default configurations that you could apply to your clusters, so you don't have to dig into each and every single Kubernetes resource and configure them to orchestrate with one another to achieve a certain level of multi-tenancy, which is what folks are having to do right now. And that's hard.

John Osborne 12:05 This is a good place for the community to do better, I feel like. If you look historically at some of the things like the seccomp profiles and other things that haven't been applied by default, the rationale has largely been, "Well, we don't want to break any existing applications." And Kubernetes has been out for a while.
So there are a lot of applications on it. But I think this is something where, with Open Policy Agent, which we'll talk about in the next episode, we can provide reasonable out-of-the-box defaults that are more secure and open the door to more multi-tenant workloads. I think there will always be places in the cluster where you'll need more hard-line isolation. But I do think that something like Open Policy Agent, or some other policies that can be shipped from the community, could help solve some of the challenges around retrofitting a secure-by-default policy. We know that the configuration you give to a customer is largely going to be the one they stick with, right? If things aren't secure from day one, they're usually not secure on day N. So this is somewhere I feel like the community could provide more of a configurable option beyond just the out-of-the-box options, which are more based around backwards compatibility and so forth.

Jamie Duncan 13:25 Yeah, I think a lot of that we talked about on the previous episode. John, I think you were the one that brought it up: people who are completely capable ops-side DevOps engineers, who have never racked a server, who have never dealt with the lower underpinnings inside the Linux kernel. And when you start making containers secure, that's what you're dealing with. You're dealing with Linux capabilities. You're dealing with all of these things that enable or disable very low-level functionality for a process running inside the Linux user space. And if you haven't dealt with them, turning on a secure profile by default is a non-starter, because they simply don't have the vocabulary to deal with it. So I think that's a lot of the problem that we're running into around this.
We're saying, "Make your application secure," and they do the things they know to do. They set the security profiles on their AWS VPCs, and they set all the IAM roles properly. They do all the things at the abstraction layer they're experts in, but they don't understand the Linux API, the kernel API abstraction layer, nearly as well. So they don't know to go turn those knobs as effectively.

John Osborne 14:42 If you haven't been working with Kubernetes or containers, you probably don't know that a lot of these knobs are inside the Linux kernel. When I started five years ago with Kubernetes, when it was just starting to hit GA, I still thought that you needed to be root to run Apache on port 80, because you had to be root to run on any port under 1024. And that's not the case. There's a Linux capability that allows you to do that, and root's been cut up into all these capabilities. I didn't know that existed. I had worked on Linux for 15 years. I had a Linux certification. I had run workloads in production on Linux for years. And as an admin, I did not know that. So there are a lot of people out there that don't know a lot about Linux capabilities, Linux namespaces, and things like that, and there's still an education going on there.

Jamie Duncan 15:36 Yeah, so they take what they find in blog posts and copy and paste it. And if it's a bad blog post, they have bad security. And unfortunately, the vast majority of them are bad blog posts. So, Tariq, getting back to multi-tenancy. I think we diverged a little bit, and I'm trying to bring it back around. Are there any best practices? Are there patterns that are identifiable and useful? Or in our current ecosystem, does everything have to be cut from whole cloth?
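John's port 80 remark above can be made concrete with a small sketch. This is only an illustration, not something discussed on the show: the helper names `needs_bind_capability` and `security_context_for` are hypothetical, and the dictionary simply mirrors the `securityContext` fields of a Kubernetes container spec (`capabilities.drop`, `capabilities.add`, `allowPrivilegeEscalation`). It assumes the Linux default of 1024 as the privileged-port boundary.

```python
import json

# Linux default boundary for privileged ports. On newer kernels it can be
# changed with the net.ipv4.ip_unprivileged_port_start sysctl.
PRIVILEGED_PORT_LIMIT = 1024


def needs_bind_capability(port: int) -> bool:
    """Return True if binding this port normally needs CAP_NET_BIND_SERVICE."""
    return 0 < port < PRIVILEGED_PORT_LIMIT


def security_context_for(port: int) -> dict:
    """Hypothetical helper: drop every capability, then add back only
    NET_BIND_SERVICE when the container listens on a privileged port."""
    capabilities = {"drop": ["ALL"]}
    if needs_bind_capability(port):
        capabilities["add"] = ["NET_BIND_SERVICE"]
    return {"allowPrivilegeEscalation": False, "capabilities": capabilities}


if __name__ == "__main__":
    # Port 80 keeps exactly one capability; port 8080 keeps none.
    print(json.dumps(security_context_for(80), indent=2))
    print(json.dumps(security_context_for(8080), indent=2))
```

The point of the sketch is the shape of the decision: full root is rarely needed, only the one capability that the workload actually uses. Whether a non-root process inside the container can exercise an added capability depends on the container runtime and the image, so treat this as a starting point rather than a guarantee.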
Tariq Islam 16:05 Yeah. As far as patterns across the board go, it's really just those areas of concern. That's as far down as we've gotten. The ones that I rattled off earlier: administration, RBAC, logging and monitoring, security, things like that. There are some subtopics that we can tease out of those high-level categories. But at the end of the day, to your point about doing this from whole cloth, it is a lot of work. There's no templating out there. There are no sane defaults being afforded, at least not in an aggregate sense, where I can just go to a company or an enterprise and say, "Hey, here you go. You're looking for this level of multi-tenancy? Start here." There's no starting point, I think, is probably the best way of putting it. You have to create your own starting point and then iterate off of that.

Jamie Duncan 16:57 Do you think it's possible to have one?

Tariq Islam 16:58 Absolutely. Yeah. The sane defaults are really just configurations against the Kubernetes API. Some of them are probably going to have to be at the infrastructure provider level, but even some of those can be made pretty common. And anything in-cluster, yeah, certainly. You could easily create a profile for that, at least I think you could.

Jamie Duncan 17:20 And I'm thinking less about sane defaults. I'm thinking more about a rubric to help define multi-tenancy better, to give them that starting point. You call it T-shirt sizes. When I was dealing with OpenStack more often, we always called them takeout menu options.
Give me the number six.

John Osborne 17:37 I kind of thought about this: a nice decision tree would actually be helpful. Even just a few years ago, there were customers that would really push back on running Docker containers as root from Docker Hub in production. So there's a certain amount of education involved, but then there is a kind of decision tree, because different customers have different requirements. And like Tariq was saying, this touches on a lot of different things, even scheduling and quality-of-service tiers. The amount of failover space that you would need for a cluster really changes when you have different types of workloads and configurations involved. So it ends up touching all these things. If I work with the Department of Defense, they actually have classifications of security, and that's a pretty hard line. Other customers in the commercial space might call it something different. But there does need to be a decision tree or guidance around, "Here are your considerations. You want to go this way, towards a single-tenant model for a cluster, if you have this hard line, and here's what we would define as that." Some good guidance out there would be helpful.

Jamie Duncan 18:47 And Tariq, you mentioned the various cloud providers. Who should be writing that? Who should be spearheading this sort of rubric around solving this problem better? Because I think you're right. There is no cut-and-dried answer. There is no "apply this pod security policy, write these rules in Rego and use OPA or Gatekeeper to apply them, set your containers to all run with this capability and not this capability." There is no one size fits all. There's no shoe-size solution. Who writes the understanding? Who does the teaching?
Is it the people that make Kubernetes? Is it the people that provide a platform for Kubernetes to run on? Is it a security company? Are we making a startup?

Tariq Islam 19:29 Frankly, that's a hard question. I don't have an answer to that. I don't know who, I mean, in terms of

John Osborne 19:39 There are CIS benchmarks out there. Those would be good enough for a lot of use cases, but not all. So that would be part of the decision tree that, I think, could be created.

Tariq Islam 19:51 And I don't know if a single provider, or even multiple providers, should own it. I would love for it to be a community-driven thing, something more intrinsic to the Kubernetes project, perhaps. But ultimately you're going to have to have some sort of portability behind it. So at best, the ownership would have to be some sort of framework that vendors take and extend off of, I guess. I don't really know how we would model that. This is a tough one, because there's so much value, I think, in establishing a "start here" with a T-shirt-size form of multi-tenancy on Kubernetes as a platform. There's just too much money to be made, frankly.

John Osborne 20:37 Yeah, yeah.

Jamie Duncan 20:37 Yeah. Already designed. Because, yeah.

Tariq Islam 20:40 Exactly. Exactly.

John Osborne 20:42 A lot of it comes down to the people that you talk to and their individual motives for Kubernetes. At the beginning, I was talking about these IT providers who want to stop shadow IT from happening within their organization, and they see Kubernetes as that outlet. But then there are also people who have other needs and motives. It's part of the discussion that you have with customers when you go on site.
And that's why I think it's really hard to codify a lot of these things. But even if it only covered the gray areas or the considerations for customers, it would help. There's a good multi-cluster doc in the community that touches on some of this stuff. Something that just focused on multi-tenancy would probably be good for customers to read as well.

You mentioned it would have to be some sort of framework. And in my head, as John was talking, I was kind of designing that framework, going from A to Z really quickly. It would have to be a framework of frameworks. I don't know of anyone that's trying to do that now. Have you seen anyone trying to solve this problem programmatically?

Jamie Duncan 21:41 Look, PSPs, pod security policies, are a small subset of it, but then, depending on the day of the week, they're being deprecated. And then there's Open Policy Agent, like John mentioned, which we're going to be talking about in depth in an episode or two. Right now that is sort of an admission controller: whether or not a resource gets created on a cluster is based on a rule set, which is part of what we're talking about around tenancy in a cluster. Is there a framework out there around this? Or are we all just out in the tall grass for the foreseeable future?

Tariq Islam 22:10 I think it's more the latter. I don't know of anyone that's actively working on this in the community. I think it's also just where the community is right now. In two or three years, I think the community will settle down a little bit, frankly. This is a completely separate topic of discussion, by the way, but right now I think Kubernetes is just struggling through the growing pains of commercialization and finally making its way legitimately into the enterprise as a viable platform.
So once that's done, I think the next growing pain will be how to make higher-order capabilities like multi-tenancy easier. I just don't think we're there yet, in terms of ownership or features and functions, because the API is still changing. We're still trying to figure out how to surface things, and what tools and technologies are really going to be the canonical set of the ecosystem. All of that is still up in the air. So from the K files perspective, we're talking about a pattern that enterprises need to be aware of, because this is something they'll have to do if they want to adopt containers and Kubernetes in any way, shape, or form. But as far as making it super easy to do, where it's more like a T-shirt size or a push-button thing, that's a ways away.

Jamie Duncan 23:35 So we're not going to close this file today, unfortunately. There's one thing, just throwing it out there. I finally had a chance to start catching up on The Podlets, which used to be called The Kubelets. I think they rebranded after VMware's acquisition of Heptio. It's a handful of people from Heptio who are now VMware engineers. Not doing a plug here, but I was listening to it and catching up, and they had Kelsey Hightower on right after KubeCon, so that's where I'm at in catching up on it. When you were talking about things surfacing, it reminded me of something he was talking about, which is things leaking, which is sort of surfacing too much. And the pattern he was talking about, which gets into this whole thing, is that we're actually running into customers out in the world who are depending on Kubernetes, who are writing their applications assuming they're going to run on top of Kubernetes.
And that's just a horrible idea to have. You want to write your application to depend on your application.

John Osborne 24:40 That's what operators do, though.

Jamie Duncan 24:42 Kind of. It's not exactly what he was talking about. He was talking more along the lines of literally querying the Kubernetes API from within an application that runs on top of Kubernetes, and actually having Kubernetes leak into the application. I've honestly never seen a customer doing it before. Maybe I'm just naive, or I'm just not far enough down that pathway. But he said it, and I was like, "Oh my God, why would you even do that?" And yeah, people are probably doing that a lot. It's sort of like what you were saying about surfacing. I guess when you surface too much, you become too dependent on the platform. It popped into my mind when you brought that up, because it is sort of that we've gone too far around figuring out tenant objects, and we've broken the blood-brain barrier between the platform and the application itself, which invalidates pretty much any kind of rules or portability.

John Osborne 25:38 There are certain workloads, though, where you kind of need to query the Kube API. For instance, if you want to query other applications that are similar to yours in your namespace.

Jamie Duncan 25:46 It wasn't service discovery, but more like looking up RBAC. Go listen to it. I'll drop the link to that specific episode in the show notes. Obviously, service discovery kind of stuff is a requirement. Yeah, sorry, this was something different, and it does line up. I don't want to misquote him, and I don't have it fresh in my brain. So, what you're suggesting, well, not what you're suggesting,
but what they were talking about was folks designing applications where the application itself is basically making Kube API calls. Yeah. Wow. Does it give you a little Inception nausea?

Tariq Islam 26:26 I kind of hate to say this, but you're locking yourself into Kubernetes at the application level, and I can't fathom why you would want to do that. At that point, if you're baking Kubernetes API calls into your application, just write a freaking controller.

Jamie Duncan 26:48 Yeah, no, I think you're exactly right.

Tariq Islam 26:49 Right. I'll check out that podcast too.

John Osborne 26:51 It's been on my list. I saw some things on Twitter about it.

Jamie Duncan 26:54 I was like, I should check that one out, and I'm well impressed with it. I started listening to it earlier this week, I'm catching up on a lot more, and I'm well impressed so far. So, Tariq, we're not going to get to the bottom of this, because it turns out this is potentially just a chasm of despair if you do it wrong, and there's not a one-size-fits-all model or a T-shirt or shoe-size kind of thing. But can you do a three- or four-minute summary? If someone was starting to have this discussion tomorrow and had to know three or four facts, or some short number, what's the best way to get off on the right foot and start this process for their team, for their organization?

Tariq Islam 27:34 So, just to summarize multi-tenancy in a few bullet points, which is actually ironic. The first thing is the tenant, right?
Figure out what the tenant means to you as an organization. How are you defining that? Is your tenant a line of business, an application team, the org itself? This is largely going to define your level of multi-tenancy and how big your clusters are going to be, and that also is going to impact your blast radius when something inevitably goes wrong with the infrastructure.

The second bullet point would be areas of concern. We mentioned those earlier in this episode: things like administration, security, networking, access control, deployment, even supply chain, logging, monitoring, things of that nature. For each of those areas of concern, there are going to be sub-areas or subtopics where you'll need to think about how you architect those things. Kubernetes usually does provide the appropriate constructs for implementing those areas of concern.

At that point, it's skill sets. I think this is a major shortage industry-wide in general: skilling up on the constructs in Kubernetes itself and figuring out how to actually implement these things to orchestrate your starting point for multi-tenancy.

And the reason I put it that way, and this is probably the last bullet point, is that we're just not at a point yet, in the industry or the community, where we can provide sane defaults or T-shirt sizes, if you will, for multi-tenancy. This is something that every organization is going to have to live through for the time being. I think it's a worthwhile effort. Even if we do come out with templates, T-shirt sizes, sane defaults, and things like that, I think it's still worth the effort to get down to the nitty-gritty details and figure out how the multi-tenancy is being implemented and what constructs are being used for each area of concern.
And breaking it down that way, I think, really helps establish at least a framework. This is an opportunity for organizations to establish their own frameworks for multi-tenancy on top of Kubernetes. It's a very powerful ecosystem, and a very powerful area that is yet to be explored. So that's how I would summarize it in just those few points.

John Osborne 29:55 Great. Well, thanks, everyone, for joining us this week on the K files, where we took on the topic of multi-tenancy and discussed the state of the union for it. Next episode we'll discuss multi-cluster, which leads into the multi-tenancy topic as well. So we look forward to seeing you then. Thanks for listening.
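As a rough illustration of the "T-shirt size" tenancy profiles discussed in this episode, the sketch below turns a size name into a Namespace and a ResourceQuota for one tenant. Everything in it is an assumption made for illustration: the `TenancyProfile` class, the `PROFILES` table, the `tenant_manifests` function, and especially the numbers, which are placeholders and not recommendations. The manifests are plain dictionaries that follow the shape of the Kubernetes `v1` Namespace and ResourceQuota resources; a real profile would also need to cover the other areas of concern from the summary, such as RBAC, network policy, and logging.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TenancyProfile:
    """Hypothetical per-tenant limits for one T-shirt size."""
    cpu: str
    memory: str
    pods: int


# Placeholder sizes, chosen only to make the example concrete.
PROFILES = {
    "small": TenancyProfile(cpu="4", memory="8Gi", pods=20),
    "medium": TenancyProfile(cpu="16", memory="32Gi", pods=100),
    "large": TenancyProfile(cpu="64", memory="128Gi", pods=500),
}


def tenant_manifests(tenant: str, size: str) -> list:
    """Build a Namespace and a ResourceQuota for one tenant.

    Raises KeyError if the size is not one of the known profiles."""
    profile = PROFILES[size]
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": tenant, "labels": {"tenant": tenant}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{tenant}-quota", "namespace": tenant},
        "spec": {
            "hard": {
                "requests.cpu": profile.cpu,
                "requests.memory": profile.memory,
                "pods": str(profile.pods),
            }
        },
    }
    return [namespace, quota]


if __name__ == "__main__":
    # Emit the manifests as JSON, which kubectl accepts as well as YAML.
    for manifest in tenant_manifests("payments", "small"):
        print(json.dumps(manifest, indent=2))
```

Keeping the profile table separate from the manifest builder reflects the "sane defaults plus nerd knobs" idea from the conversation: an organization could start from a size and then override individual fields as it learns what each tenant needs.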