Tariq Islam 0:07 We do intros first, right? Yes.
Jamie Duncan 0:13 Thanks for joining us on the K Files. Each week we take on a different topic in the Kubernetes community and the IT industry.
Tariq Islam 0:19 Depending on the week, we may find ourselves knee-deep in kernel headers, Magic Quadrants, or even startup pitch decks.
John Osborne 0:26 Our goal is to close each file with a sense of completeness and satisfaction.
Jamie Duncan 0:33 Hello, and thanks for joining us for episode three of the K Files. If you haven't listened to the previous episodes, my name is Jamie Duncan. I'm a staff architect at VMware, and I have with me, as always, John Osborne. Everybody say hi to John Osborne.
John Osborne 0:47 Yes. So I'm the Kubernetes leader in the field for Red Hat public sector.
Jamie Duncan 0:51 Awesome. And also Tariq Islam.
Tariq Islam 0:54 Hey, everyone, I'm Tariq Islam. I'm a sales engineering manager here at Google. Tariq went into management; I don't know what to think of that. But I did it, man, I took the leap.
John Osborne 1:03 Oh, if Google can scale some Tariqs out there, then that's going to be a pretty awesome team.
Jamie Duncan 1:10 Exactly. Big, big vote of confidence. So I think we're going to do a quick get-to-know-us thing. We're three episodes in, so we're basically veterans at this. The question: what was your favorite movie, let's say, when you were 14? You go first, Tariq.
Tariq Islam 1:24 Yeah, that's probably a pretty easy one for me. That would have to be Return of the Jedi, and it has been since I was, like, seven. Even after all of the prequels, the sequels, the cartoons, although I will say the Clone Wars cartoon is pretty phenomenal, and season seven is coming out next month for those of you that are fans. But as far as movies go, it would have to be Return of the Jedi. It just has everything you'd ever want.
John Osborne 1:53 That's awesome. I have to say Top Gun. By picking that movie, I feel like I'm instantly outing myself as being white and of a certain age, but it was probably my favorite movie then and now. With Top Gun 2 coming out this summer, I'm thinking about heading out to San Diego. I used to live out in San Diego, and we actually used to do a Top Gun day where we would basically hire a driver with a bus, go to all the filming sites, wear Top Gun clothes, and drink. It was a lot of fun, before kids and stuff, of course. But I'm thinking about heading out there when the sequel comes out this summer, so we'll see.
Jamie Duncan 2:32 Okay, that's really awesome. Cool, I would never have guessed. So Tariq, you're a Star Wars guy. I always leaned in the Star Trek direction, so at 14 something like Star Trek II would have been my favorite movie. I like the Star Trek universe more than the Star Wars universe, but I'm a big fan of both, so nobody come at me.
Tariq Islam 2:58 I'll just judge in silence. What's your take on the new Star Wars movies?
Jamie Duncan 3:01 My wife and I just went and saw the last one, Episode Nine, a couple weeks ago, and it's definitely for the fans, right? It was, let's put bows on as many of these storylines as we can and make everyone happy. The first two I loved. I know the second one got dogged a little bit, but I thought it was awesome. The first one was great, the second one I liked a lot, and the third one was totally enjoyable.
There were a few spots where it was pretty obvious, oh, you could have dug into this and gotten something really cool out of it, but the movie would have also been five hours long. So I get where they were coming from. I liked it a lot. Agreed? Did it get the Star Wars nerd seal of approval?
Tariq Islam 3:40 It did. I think a lot of critics didn't like it so much, but I think it hit all the right points. It's up there for me. I think a lot of the criticism is probably unfounded.
Jamie Duncan 3:52 Anyway, yeah, so on to the topic of the day. On the K Files we try to pick one topic and dig into it as deep as we can in about a half hour. What we're going to be talking about today, and John, I think you're going to be primarily leading this conversation, since this was your idea and you've really been driving it, is Kubernetes multi-cluster. Not federation, but multi-cluster, and how that's either enabled or enhanced by GitOps. Is that an accurate thought?
John Osborne 4:23 Yeah, I think so. I mean, GitOps is larger than just multi-cluster, but I think multi-cluster and edge will be the primary use cases that it ends up driving once we see larger adoption, although there are plenty of companies already doing this in production with this kind of pattern.
Jamie Duncan 4:40 So maybe we start out by defining a couple of these terms, or I guess the two terms that we picked for today's topic. Do we have a consensus definition of multi-cluster? And we're getting crickets.
Tariq Islam 4:55 So yeah, I mean, I guess the best way I would define multi-cluster is having a grouping of clusters, with that grouping being the tenant level, where you're trying to do a common thing. You have a common goal, whether it's for your workloads or for how you operate, lifecycle management, but across clearly divided clusters: managing them uniformly, configuring them uniformly, deploying to them, and things like that.
Jamie Duncan 5:21 At least in my world, Cluster API is sort of the de facto standard for that. We get a source of truth to define our Kubernetes deployments, and we deploy them as needed. So we're going to be talking about multi-cluster, and we're going to be talking about it within the context of GitOps. But John, how do you define GitOps?
John Osborne 5:42 Well, probably the one-sentence headline I'd give it is that it's just a pattern for how to do continuous delivery. In the context of Kubernetes, it usually means having an operator, or an agent (we call agents operators), inside the cluster that treats Git as the single source of truth. Git has all of the configuration, not only for your applications but for all the Kubernetes manifest files and other things, and the agent uses Git to sync and diff against the cluster, so that you know the desired state in your cluster is always the state you have in Git.
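As a rough illustration of the pull-based pattern John describes here, below is a minimal sketch of a GitOps-style sync loop. It is not Flux or Argo CD; the repo URL, manifest path, branch name, and interval are invented for the example, and it assumes git and kubectl are available on the PATH. Real agents add drift detection, pruning, health assessment, and multi-tenancy on top of this basic loop.

```python
"""A minimal GitOps-style sync loop: poll a Git repo and apply it to the cluster.

Illustrative only. Real agents such as Flux or Argo CD add drift detection,
pruning, health checks, and multi-tenancy on top of this basic idea.
"""
import subprocess
import time

REPO_URL = "https://example.com/platform/cluster-config.git"  # hypothetical config repo
CLONE_DIR = "/tmp/cluster-config"
MANIFEST_DIR = "clusters/prod"        # directory of Kubernetes manifests inside the repo
SYNC_INTERVAL_SECONDS = 60


def run(*cmd: str) -> subprocess.CompletedProcess:
    """Run a command and fail loudly if it exits non-zero."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


def sync_once() -> None:
    # 1. Make the local checkout match the remote: Git is the single source of truth.
    try:
        run("git", "-C", CLONE_DIR, "fetch", "origin")
        run("git", "-C", CLONE_DIR, "reset", "--hard", "origin/main")
    except subprocess.CalledProcessError:
        run("git", "clone", REPO_URL, CLONE_DIR)

    # 2. Compare the desired state in Git with the live state in the cluster.
    manifests = f"{CLONE_DIR}/{MANIFEST_DIR}"
    diff = subprocess.run(["kubectl", "diff", "-f", manifests], capture_output=True, text=True)
    if diff.returncode == 0:
        print("cluster already matches Git; nothing to do")
        return

    # 3. Reconcile: apply whatever Git says the cluster should look like.
    print(diff.stdout)
    run("kubectl", "apply", "-f", manifests)
    print("cluster reconciled to Git")


if __name__ == "__main__":
    while True:
        sync_once()
        time.sleep(SYNC_INTERVAL_SECONDS)
```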
And then all the workflows and everything else end up basically revolving around Git. Communication with a team, issues, everything is really done through pull requests, and Git stays that single source of truth.
Jamie Duncan 6:28 You mentioned operators and things running within the cluster that reconcile and make sure the cluster is always the way it's supposed to be. I read the definition of GitOps about a week ago, because I'd always just heard the word and immediately dismissed it, but when we said we were going to talk about this, I actually went and looked at the accepted definition. What I internalized, using part of what you said, was using Git as your source of truth and using Git as the launch point of your workflows. Haven't we been doing that just as common sense for a long time? You do a git push, and you trigger any number of things. Is that the definition?
John Osborne 7:08 Yeah, I think that's pretty accurate. A lot of this stuff isn't new, right? You can go back ten years and there are books from Gene Kim and Jez Humble and Dr. Nicole Forsgren, the Accelerate material, about putting everything into a single source of truth. Over time that's evolved, and the single source of truth is now Git. I don't know what you guys are seeing, but even with customers that have a lot of older applications, pretty much everyone I see is using Git at this point. So the infrastructure-as-code model isn't new. But over time the workflows around some of the offerings out there, like GitHub and GitLab, have become much more robust. The old model was putting all your infrastructure as code into one big repository and then having all this other automated tooling, Ansible and Bash and Jenkins and all these external tools, make imperative-style changes to your cluster. I think we can do it a better way now, and it's more declarative: you bring that into the cluster and get rid of the hundred different tools that would have done it externally. You have an operator that can sync with Git because it knows the state of the cluster, and then it can do more robust things as well. You can do disaster recovery that way, because you have everything in Git and can turn everything back on. You can do more things in a declarative way, because when you script things out you don't always know the existing state of the cluster; things can break, you might have to roll back or roll forward, and since you have everything in Git, if you need to roll backwards you have everything exactly as it was. And there are other benefits too. I was watching one of the KubeCon talks, I'm still going through the KubeCon videos from November, and some of them have these great tidbits in them. I was watching one on auditing, and you sometimes find these gems: one of the engineers was saying that the default auditing settings that come with Kubernetes actually won't catch any CRDs. I think it's set to a star or something for the APIs, but that really only covers the core Kubernetes APIs. Any auditing that you want to do on a CRD, you actually have to explicitly add to the audit settings in Kubernetes.
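Since audit policy is easy to get wrong in exactly the way John describes, here is a hedged sketch of a check you could run in CI against your own audit policy file: it warns when no rule covers a given CRD's API group. The file path and group name are invented, it assumes PyYAML is installed, and it deliberately ignores the other dimensions of audit rule matching (users, verbs, namespaces, levels).

```python
"""Warn if a Kubernetes audit policy has no rule covering a given API group.

A rough CI check, assuming PyYAML is available. Audit rule matching has more
dimensions (users, verbs, namespaces, levels) than this script looks at.
"""
import sys
import yaml

POLICY_FILE = "audit-policy.yaml"        # hypothetical path to the cluster's audit policy
CRD_GROUPS = ["example.mycompany.com"]   # API groups of the CRDs you care about


def rule_covers_group(rule: dict, group: str) -> bool:
    if "nonResourceURLs" in rule:
        return False  # rule is about non-resource endpoints, not API objects
    resources = rule.get("resources")
    if resources is None:
        return True   # no resource selector means the rule matches every resource
    return any(r.get("group", "") == group for r in resources)


def main() -> int:
    with open(POLICY_FILE) as f:
        policy = yaml.safe_load(f)

    missing = [
        group
        for group in CRD_GROUPS
        if not any(rule_covers_group(rule, group) for rule in policy.get("rules", []))
    ]
    if missing:
        print(f"audit policy never mentions these API groups: {missing}")
        return 1
    print("all CRD groups are covered by at least one audit rule")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```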
A lot of times I see customers who want to use auditing, but it's more of a checkbox; they haven't really gotten the full tracing and everything around it. Something like Git provides a really nice log for auditing, and that's all built in for you, so it can solve some of those problems. I work in the federal space, so there are literally programs and classification levels where, if something happens to a cluster, nobody is supposed to touch it: you basically burn that environment to the ground and then have to rebuild it. Using GitOps with Kubernetes is a good model for that, because you can easily rebuild the cluster, and you have that separation and isolation. The interesting thing is what we've seen from Weaveworks and Intuit. Weaveworks created Flux, which is one of the solutions out there, and Intuit created Argo, which is another; we can talk about those in a second. They actually broke out their practice of keeping an application repository with all the code and a separate repository with all the configuration manifest files, and we've seen patterns like that emerging too. Tariq, I'd like to get your take on that, because if you go back to the Google SRE book that came out a few years ago, Google had a different take on it. Since their application teams really run their own code in production for, I think, six or nine months before they hand off to the SRE teams, you basically put all your configuration manifest files inside the master branch of your Git repo with all your code, and you hermetically seal that into one thing. I think in the enterprise, having the two repos to isolate the teams, infrastructure versus applications, would probably be the better option.
Tariq Islam 11:23 I could see that if you had the Google SRE model, combining them into one repository might make more sense, but the model that you just described, where enterprises are more comfortable with separate repos, I think that's what we're seeing more and more. Understanding that Google has its own way of doing things, it's really the same principle, right? The repo structure, we've seen variations of that depending on what the enterprise is most comfortable with, where they're coming from, how their teams are set up. But on principle, as you mentioned, having that separation in the minds of the infrastructure and operations teams and the development teams is just easier to conceptualize, so that they're not having to worry about, okay, how am I going to manage these things when my source of truth is intermixed with my application code? Keeping it separate, I think, is probably a good starter pattern.
John Osborne 12:22 What about what Google has with Anthos? Don't they have a configuration management aspect to it that does something with GitOps?
Tariq Islam 12:28 Yeah, yeah, we do. So I know you mentioned Flux in particular.
So Anthos does have a configuration management component that basically does active reconciliation against the source of truth that lives in Git.
John Osborne 12:42 Do you provide any best practices for customers around having a separate repo for your configuration files versus combining them into one repo, or is that just custom per customer?
Tariq Islam 12:53 Yeah, it's going to be more centric to the enterprise, what they need, and how their teams want to run. For us, that's really what the configuration management piece is for.
Jamie Duncan 13:07 Let me hammer on what you were describing, John. Okay, so let's just assume we have multiple repositories. I want to put a pin in this for now, but I'm kind of interested in the idea of: okay, the app developers have a repo with the code, the ops teams have a repo with the configs, do the security teams have their own repo? Because we're kind of in this everything-Ops model, DevOps, SecOps, NetOps. Does everyone get their own, and then we all go forward? Let's put a pin in that for a second. Help me understand a little better. It sounded like what you were describing, when you were talking about auditing, okay, I fully understand: I do a git push, I kick off a workflow to deploy a cluster or modify a cluster or deploy an application, it pulls everything from Git, and that becomes my source of truth. For the most part we've been doing that for a long time. But did I hear right, or did I misunderstand, when you said it becomes a two-way street, where things are pushed back into the Git repo to generate logging events?
John Osborne 14:09 I think in most cases you're just changing the Git repo, and then it's a pull. It's a pull request, or you're pulling that merge request, pulling that information in from the cluster or from outside the cluster; Argo can do external reconciliation or on-cluster reconciliation. But I think the first question, the one you said to put a pin in, actually ties into this, because one of the biggest challenges here will be that in this model you've really separated your CI from your CD. And I think that works, because we've talked about CI/CD for a long time, but most customers already had a completely separate CD process from the CI process. When you go to a complete hard-line separation between them, you really need to be able to validate everything in your CI pipeline and know it's going to work when you actually do a deployment, because developers aren't going to be touching the cluster; they're only going to be touching what goes into Git. And with all the security tools that are out there in Kubernetes, you need a way to essentially smoke test what you're doing from a CI perspective, so you know it's going to work when it gets deployed. I know in a future episode we're going to talk about Open Policy Agent, OPA, but that's a really good tool for this model, because it's just a lightweight binary and you can easily plug it into your CI pipeline to smoke test things.
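The real tool for that smoke test is OPA itself, which the next episode covers; policies are normally written in Rego and evaluated with OPA, conftest, or Gatekeeper. Purely to illustrate the shape of such a CI gate, here is a plain-Python stand-in that fails the build if a rendered manifest uses an unpinned image tag or a privileged container. The glob pattern and the rules are invented for the example, and it assumes PyYAML.

```python
"""A plain-Python stand-in for an OPA/conftest-style CI gate on rendered manifests.

Fails the pipeline if any container uses an unpinned image or runs privileged.
In practice you would express these rules in Rego and evaluate them with OPA.
"""
import glob
import sys
import yaml

MANIFEST_GLOB = "rendered/**/*.yaml"   # hypothetical output of your CI render step


def container_violations(doc: dict) -> list:
    violations = []
    spec = doc.get("spec", {}).get("template", {}).get("spec", {})  # Deployments and friends
    for container in spec.get("containers", []):
        image = container.get("image", "")
        if image.endswith(":latest") or ":" not in image:
            violations.append(f"{container.get('name')}: unpinned image {image!r}")
        if container.get("securityContext", {}).get("privileged"):
            violations.append(f"{container.get('name')}: privileged container")
    return violations


def main() -> int:
    failures = []
    for path in glob.glob(MANIFEST_GLOB, recursive=True):
        with open(path) as f:
            for doc in yaml.safe_load_all(f):
                if doc:
                    failures += [f"{path}: {v}" for v in container_violations(doc)]
    for failure in failures:
        print(failure)
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main())
```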
Now, you talked about whether it's a two-way street. I think ideally, no. But there are also use cases like a lot of the security tools out there. They have all these runtime agents; I think Twistlock has one that does the CIS benchmarking, and Aqua does one that does CIS benchmarking. There's Sysdig and all these other tools whose runtime agents are going to run on the cluster, right? So how do they provide feedback and annotations as well? I think that's something that's still being thought out. But in most cases, for most people, they'll be running this kind of one-way model where everything just gets put into Git and everything gets changed from there. Because I don't know about you all, but a lot of times, at least in the early days with containers especially, we'd talk to customers about environmental deltas and differences, about hot-patching your system versus how you're really supposed to patch the container, which then goes through your CI/CD pipeline and gets deployed into production so that you have a consistent baseline across environments. And they're like, well, if something's on fire, I'm still going to go hot-patch the production system, right? And then I try to work backwards with that configuration after the fact, because I have to patch production as fast as possible. Hopefully GitOps can do a better job of that, because you're just going to patch in Git, the agent will immediately pull that change, and you don't have to worry about the manual, error-prone way of patching a live system and then going back and patching your dev system in a similar fashion and having it go through the CI/CD pipeline.
Jamie Duncan 17:06 That sounds a lot to me like what we talked about when I sold Ansible for a living, where the golden state of Ansible is to never SSH into a server. Even if you had a zero-day issue, even if you had an actively compromised system, you would still write your Ansible code to fix it and then push the Ansible out. That idea of, we don't hot-patch, we fix the code that fixes the system and then rely on the distribution mechanism to go fix the systems.
Tariq Islam 17:34 We're expanding the premise of Kubernetes, with application management being declarative, and pushing that out to the infrastructure, to the cluster itself. I think that best practice is probably the biggest value here for GitOps. Jamie, to your point earlier, you mentioned that it's a buzzword, but I think it's a buzzword because everyone has been thinking about it and doing it in some form or another, with Ansible, Terraform, things like that. But the marriage of the GitOps principle with the Kubernetes declarative model is kind of a match made in heaven, because Kubernetes is already declarative for application management, and with Cluster API we can now do the exact same thing for infrastructure and operations teams to manage entire environments. I think the uptake here is going to depend on getting the infrastructure and operations teams that are responsible for clusters and configuration, as well as the security teams and the networking teams, because it really spans those three personas, skilled up on this declarative model, because we've spent the last three decades in a very imperative world where we were SSHing into servers. And so now we're pushing that development ecosystem, that ethos of: if you want to make a change to what you're shipping, which in this case is the infrastructure, which is Kubernetes, you're going to have to fork, you're going to have to modify the code, and then you're going to have to submit a pull request that gets reviewed. Going through that whole process is something that I think is relatively new for that whole space, because so far it's been, oh, something is wrong, let me just go into production, run an imperative command, and oops, I fat-fingered something, so now I just blew up production. That's what we're trying to move away from, and you can now.
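To make the "patch Git, not the box" workflow concrete, here is a hedged sketch of what a hotfix looks like under GitOps: bump the image tag in the config repo, commit, and open a pull request for review, and the in-cluster agent rolls it out once the change merges. The repo path, file, tags, and branch name are invented, and it assumes git plus the GitHub CLI (gh) for the pull-request step.

```python
"""Hotfix under GitOps: change the config repo and open a PR, never the live system.

Paths, tags, and branch names are invented. Assumes git and the GitHub CLI (gh).
"""
import pathlib
import subprocess

CONFIG_REPO = pathlib.Path("/home/me/src/cluster-config")  # hypothetical config repo checkout
MANIFEST = CONFIG_REPO / "apps/payments/deployment.yaml"
OLD_TAG = "payments:1.4.2"
NEW_TAG = "payments:1.4.3"   # image with the security patch, built by CI
BRANCH = "hotfix/payments-1.4.3"


def run(*cmd: str) -> None:
    subprocess.run(cmd, cwd=CONFIG_REPO, check=True)


def main() -> None:
    # 1. Branch off the current source of truth.
    run("git", "fetch", "origin")
    run("git", "checkout", "-b", BRANCH, "origin/main")

    # 2. The "patch" is just a text change to the desired state.
    MANIFEST.write_text(MANIFEST.read_text().replace(OLD_TAG, NEW_TAG))

    # 3. Commit and push; the change is reviewed as a pull request, and the
    #    in-cluster agent applies it once it merges.
    run("git", "commit", "-am", f"hotfix: bump payments image to {NEW_TAG}")
    run("git", "push", "-u", "origin", BRANCH)
    run("gh", "pr", "create", "--fill")


if __name__ == "__main__":
    main()
```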
Jamie Duncan 19:30 Tariq, quit reading my resume.
John Osborne 19:32 The reality is, if you've ever been on call and you come in at two in the morning, you're not doing the best practice. You're doing the now practice, so you can go home.
Jamie Duncan 19:43 And that's a great way to put it.
John Osborne 19:45 But yeah, I do think the downstream organizational ramifications are something people have to figure out. There was actually a really good KubeCon panel on this topic. And Jamie, you yelled at someone on Twitter recently, I forget who, for saying they'd never learned anything from a panel. I feel like people who think that are just bad listeners, because there was a great panel at KubeCon around GitOps, and one of the engineers from Intuit was saying that, organizationally, what they do is spin up all these clusters, and the people deploying apps don't have any access to the clusters at all. They might have access to the relational databases and other cloud resources they use, but using the GitOps model and that separation of concerns, they don't need cluster access; they can have that handoff and separation between the teams. So I do think it can go a long way. When I was researching GitOps for this podcast, I also came across a quote from Craig McLuckie, one that I've used on slides with customers before because I really like it. It's one of my favorite Kubernetes quotes, and it's basically that one of the bigger opportunities Kubernetes provides is the opportunity to move from a ticket-driven infrastructure to an API-driven infrastructure. We've seen in organizations doing transformations here at Red Hat that one of the great ways to scale an organization is to move into this more API-driven model where you build out teams, and this isn't new information, your teams are focused on apps, they're cross-functional, and you have these APIs in between them, so it's more API-driven than ticket-driven. GitOps is one of the ways you can extend that model and also move toward immutable infrastructure.
Jamie Duncan 21:41 So let me see if I can net this out, and actually pull in some of the concepts that we talked about in the first two episodes as well, going into that sort of API-driven infrastructure.
You don't need a crystal ball at this point to understand that's where the IT industry is moving, and I don't know if it's over two years or twelve years, but over the next period of time, toward teams and infrastructure and applications all separated by APIs. All of that makes all the sense in the world. And really, the part of what Tariq said a few minutes ago that drove it home for me was that it's not that the GitOps concepts are new. I've been using Travis CI for five years, and there are certainly people who were using stuff like that way before I did, where Git becomes your source of truth and Git becomes the trigger point for your workflows. But Kubernetes and that concept really amplified each other a lot, and turned what was common sense into a very overloaded term, GitOps, something that belongs on VC slide decks now. Then you take it even further when you start talking about custom controllers inside Kubernetes and operators, and leveraging that so I can have an API that controls my infrastructure, custom controllers that can do things like Cluster API, where now I have an API that represents the infrastructure for my Kubernetes cluster in a completely new way, and I'm able to maintain all of that as code. So we're really seeing this convergence of Kubernetes, with its ability to have custom resources and custom controllers, and this concept of Git as your source of truth, everything as data, everything as code. Is this the crucible? Is this where it all converges into a diamond? I think so.
Tariq Islam 23:37 To me, this is the grail. I think we're getting to the point where this is where everyone really should be. Except what we're missing here is a lot of the instrumentation. It's one thing to have a single source of truth and an operator that syncs and lets you push changes out, but as far as things like observability around what's happening, verification, validation, things of that nature, that's still not even half baked; it's barely baked, I would say.
John Osborne 24:13 I think that's actually one of the biggest challenges moving forward with this model. Because if I type kubectl apply, which is what the operator is doing at the end, that'll probably be the last step, right? It gets everything from Git and does a kubectl apply. If that fails, the operator can report that status back. But the scary part is, what if that works but the app still breaks? That's really where you need observability, and how we do that, we've got Prometheus metrics and tracing and all these things, but I think a lot of organizations are still trying to figure out basic monitoring, to be honest. How do we know that the application is broken? Google, I think Tariq touched on this, has these four golden signals that they use for observability, and there are all sorts of companies out there, like Datadog, that do a good job as well. But figuring out the workflow and marrying it to the observability aspect, where something breaks but it's non-trivial, latency goes from 100 milliseconds to 400 milliseconds, and even figuring out whether that increased latency matters to your application or your end user, and notifying the people that own the applications, because now they're completely separate from the delivery, how all of that works is probably going to be one of the biggest challenges, I think, in moving to this model.
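John's example, where the apply succeeds but latency quietly climbs from 100ms to 400ms, is the kind of failure you catch with a metrics query rather than a sync status. As a hedged sketch, this asks a Prometheus server for a service's 99th-percentile request latency and complains when it crosses a budget; the Prometheus URL, metric name, service label, and threshold are all invented for the example.

```python
"""Ask Prometheus for p99 request latency and flag it when it crosses a threshold.

The URL, metric name, and threshold are invented; real setups wire this into
alerting (Alertmanager, Datadog monitors, and so on) rather than a one-off script.
"""
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://prometheus.example.internal:9090"   # hypothetical endpoint
QUERY = (
    'histogram_quantile(0.99, '
    'sum(rate(http_request_duration_seconds_bucket{service="payments"}[5m])) by (le))'
)
THRESHOLD_SECONDS = 0.2   # complain if p99 latency exceeds 200ms


def p99_latency() -> float:
    params = urllib.parse.urlencode({"query": QUERY})
    with urllib.request.urlopen(f"{PROMETHEUS_URL}/api/v1/query?{params}") as resp:
        body = json.load(resp)
    # Instant-query results look like: {"data": {"result": [{"value": [<ts>, "<val>"]}]}}
    return float(body["data"]["result"][0]["value"][1])


if __name__ == "__main__":
    latency = p99_latency()
    if latency > THRESHOLD_SECONDS:
        print(f"p99 latency {latency * 1000:.0f}ms exceeds {THRESHOLD_SECONDS * 1000:.0f}ms budget")
    else:
        print(f"p99 latency {latency * 1000:.0f}ms is within budget")
```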
Jamie Duncan 25:30 I think that's lagging behind because we've never had it. When you look at most of this stuff, we've had CI/CD for years; we've had infrastructure since the '40s, since Bletchley Park; we've had servers, we've had computers. So for infrastructure represented by physical systems or VMs or containers, whatever, we had something to build on. Monitoring, even up/down monitoring, with Nagios and Zabbix we got okay at that. Every NOC at every major company on the planet is running WhatsUp Gold. We know when servers go down. But like you said, we don't know when applications get slow, and we don't know when only parts of applications get slow. So that observability, if Tariq called this the grail, I see it more as a table, a thing to build, and we're still missing a leg or two. I think you're exactly right, but we don't have anything to build on with observability, so it's all net new, which means it's going to take a lot longer to get right.
Tariq Islam 26:38 It's not new, exactly. John, you mentioned the four golden signals. Those are meant for user-facing systems, right? You've got latency, traffic, errors, saturation. But they don't necessarily apply the same way to infrastructure. As someone on the infrastructure team, I still need some signals to know whether my infrastructure is healthy or not. So it's a common pattern, but it's a completely different set of signals, and I just don't think we've been doing this whole GitOps thing long enough to tease out what those signals should be, categorically. In the same way that we have the four golden signals for user-facing systems, we need a similar set of golden signals for non-user-facing systems, and in this case that's the Kubernetes infrastructure, which is quite a lot.
John Osborne 27:34 Yeah, it's definitely something that can get involved. Honestly, I think a lot of companies should consider outsourcing their observability to a company like Datadog, because a lot of customers are still in the old model where it's really monitoring, very deterministic: I've got this many messages in my message queue, so I need to page someone, or this disk is getting full. Whereas in these more complex systems it's not as deterministic to figure out what is going to break. So I think that'll be a challenge as well. One of the other things I think we'll see, just to put my security hat on for a second: if I'm a hacker now, I'm looking for high-value targets, and there's no higher-value target than Git. It's always been that way, but now that Git has 100% of the keys to the castle, I'm going after Git in this model. So I still think we'll need to think about how we sign things. We need to do secrets better; that's something the community is working on. I'm going to need to be able to sign things like manifest files, probably with multiple keys. I'm probably going to need multi-factor authentication on my Git repository. If I'm running my Git repository on prem, I'm going to need to be updating and patching the Git server itself. That's something we'll really need to think about as we move toward this, because man, if I'm a hacker, I'm looking for any CVE I can find in GitLab or GitHub or any of those.
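One narrow, concrete piece of what John is describing, trusting what is in Git before you deploy it, is refusing to sync a commit that is not signed. Here is a hedged sketch of such a gate built on git's own GPG verification; key distribution, multiple signers, MFA on the Git host, and secrets management are separate problems it does not touch.

```python
"""Refuse to deploy a config-repo commit unless it carries a valid GPG signature.

A narrow sketch: it relies on `git verify-commit`, which trusts whatever keys are
in the local GPG keyring. Key distribution and rotation are out of scope here.
"""
import subprocess
import sys

CLONE_DIR = "/tmp/cluster-config"   # hypothetical checkout of the config repo


def commit_is_signed(ref: str = "HEAD") -> bool:
    # `git verify-commit` exits non-zero if the commit is unsigned or the
    # signature does not verify against the local keyring.
    result = subprocess.run(
        ["git", "-C", CLONE_DIR, "verify-commit", ref],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(result.stderr.strip() or "commit is not signed")
    return result.returncode == 0


if __name__ == "__main__":
    if not commit_is_signed():
        print("refusing to sync: deploy commits must be GPG-signed")
        sys.exit(1)
    print("signature verified; proceeding with sync")
```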
Jamie Duncan 29:20 It's funny, you've been thinking about the tooling, and we've been talking about the tooling for the past five or six minutes, this combination of things it takes to bring GitOps into complete reality: declarative infrastructure, API-driven everything, effective observability and traceability, all of it. I've been sitting here thinking about the people. John, you just outlined the things that don't exist yet, the reasons we're not there yet. I'm thinking about a network admin, and trying to explain to them how to do a pull request or a merge request, and how that alone is going to take months of effort.
John Osborne 30:05 Yeah, I think there are lots of limitations there. Even in a declarative model, where I declare what I want my infrastructure to look like, with the storage and the networking and all that, there's communication involved. If I'm a developer and I push that stuff, while you're the ops person or the network person, that stuff already has to be ready, right? Those PVCs have to be ready to be claimed, and all of that. So you have to think about making it an API handoff, not a ticket-driven handoff, which means you have to front-load it and make sure there's a pool of resources there for all those engineers to use.
Jamie Duncan 30:37 I think the human element here is going to be the anchor, the thing holding us back even further. The tools could exist tomorrow and be fully production ready. How are we going to bring in that dev-sec-net-ops mentality? I always pick on the networking people as the furthest behind on the technical scale in this world. I talked to a network admin that still uses telnet. In no way, shape, or form is that awesome. He was talking about telnetting around, and I was like, are you just talking about SSH? No, telnet.
John Osborne 31:21 I started off my career in telco, so I always have a place in my heart for telnetting.
Jamie Duncan 31:25 Me too, man. I learned how to bundle cables with wax string and that kind of stuff, but I also understand that it's not effective. So anyway, the question, and maybe we can use this to wrap it all together: how are we going to get the people there? Because operations teams have a long way to go. They understand version control at this point, but they really don't understand the flexibility and the power sitting inside a Git repo, or at least a lot of them don't. Networking and security teams are even further behind than that. If you find a networking team that uses version control, hire them all, right now.
On the whole, they're much further behind. So looking at it from the human perspective, how do we get them there? How do we bring GitOps to fruition? How do we bring everyone into Tariq's holy grail?
John Osborne 32:15 You need to start with an adventurous, cutting-edge kind of team, right? Every organization has teams that are more adventurous about adopting new technologies than others, so you have to identify which teams those are going to be. But then, in my opinion, it's going to be really critical that you have cross-functional stakeholders on that team, because you've moved into an API handoff versus a ticket-driven handoff, so you have to have all the pools of resources ready. When you're making declarative API calls, you need to make sure the actual cluster can meet the requests coming in. And from a security perspective, you need a way to validate all the security controls and mappings in your CI and confirm that's what's running in the cluster, right? If you're using something like OPA or some other scanning tool, anything that would work or fail in production, you need to make sure you can accurately account for in your CI. To do that the right way, you really need a cross-functional team. So find an adventurous team, get the stakeholders you need, and then, like everything else, it's about building small wins and bubbling those up into larger cultural changes. But you always have to start by demonstrating value first with a small team. Tariq, anything you want to add?
Tariq Islam 33:37 I was going to double down on what John just said about finding that cross-functional team. I'm not talking about a DevOps team; I feel like at this point everybody's got a DevOps team. It's more about finding, to John's point, the more forward-leaning folks from each discipline, whether security, networking, ops, or dev, and creating the team, but also having them look at this not as a tool but as a way of working. How do we want to work? How do we want to operate? That really needs to be the focus, so that each persona can bring their perspective into that model. How do I want to work as someone on the networking team, in a declarative fashion? What do I care about? What do I need to define? What do I need to push out? And how do I observe that? I think these are the big questions each persona needs to ask, and that last one is the most important one. How do I observe what I'm putting into this source of truth? How do I become proactive, or how do I react to an event? How do I monitor these events? How do I even get alerted to something that may or may not have happened? Verification, validation, the whole nine yards. Having this cross-functional team is a fantastic first step, and then having each persona bring their unique perspective into this declarative, single-source-of-truth model, and treating it as a way of working, is probably the pattern.
Jamie Duncan 35:08 Yeah, I think that's a great way to summarize it.
So to put a bow on GitOps, I think we can all agree that the industry is further down this road than we've ever been, driven in particular by the API-centric workflows inside Kubernetes, and by leveraging all of the CI/CD capabilities that are housed inside Git and that Git can bring to fruition for us. But we're not there yet. A lot of the humans aren't there yet, and a lot of the tooling isn't there yet; some of it doesn't exist at all, in particular around observability and traceability. There is work going on in the community, and John has plugged Datadog a little bit; I've actually had some dealings with them and will agree they do really good work in that field, but none of it is complete. Being able to really observe traffic and predict problems based on that data, that's the big shortcoming. So the humans, and then the observability and traceability aspects, are what stand between us and GitOps. I think we're getting there; the tooling is significantly better than it's been for anything else I've ever worked on. So again, I don't think we can close this file. There are probably three startups that none of us have ever heard of that are going to completely change this game in the next six months.
John Osborne 36:26 For Kubernetes, the projects out there right now are Flux and Argo. Flux has a little bit more of a churn-and-burn startup aspect to it; they have an operator that watches your container images and then updates your manifest files for you. Argo came out of Intuit, and it's a little more enterprise-y, with things like single sign-on and pruning, so if you change something in your cluster it can actually clean up the old resources before it deploys the new one. They announced at KubeCon, and I'll put this in the show notes, that they're actually going to work together on an open source project called the GitOps Engine, which is going to handle the Venn diagram overlap, everything they both do that's the same. If that works out well, and there's definitely a challenge there with the community dynamic of "I built this and you built that, how do we combine them," they'll hopefully converge into a project called Argo Flux; I think that's the long-term vision. And with that baseline convergence, if someone builds something in the future, hopefully they just build on it. Hopefully this GitOps Engine ends up being like the kubeadm of GitOps, where kubeadm does 80% of what anyone would need, and if anyone wants something more opinionated they can build on top of it. With the GitOps Engine, if I need to do something very opinionated with GitOps, I can just build on this open source project that will hopefully do 80% of it, and then every other vendor can have their own opinion on top of that.
Jamie Duncan 37:53 Cool, and we'll make sure we get all those in the show notes. I think that's a good spot for us to close the file folder on GitOps. Like we said a couple of times in the discussion, the next episode, hopefully about two weeks from this one going live, will be focusing on OPA, pronounced "oh-pa," the way the community wants it.
But we're going to dig deep into OPA, which is a pretty critical component when you start looking at this API-driven-everything design. There are a whole lot of security ideas inside Kubernetes, but there are also some big holes, and OPA plugs a pretty big one. So look for that here in about two weeks. And thanks a lot for taking the time to listen to the K Files. Thanks, everyone. See you in two weeks.