
You Need Ops to AIOps
Does Artificial Intelligence + Machine Learning + Data Science = AIOps? Ask ten different people and you're likely to get ten different definitions of AIOps, but the key is this: it isn't a product you buy, it's a capability you build. And you need to understand Ops before doing AIOps. In this episode, Red Hat CTO Chris Wright chats with Marcel Hild about why data science tools don't replace human intelligence, and what AIOps might look like if we do it right.
Transcript
00:01 - Chris Wright
Say someone offered you a million dollars or a magical penny that doubles in value every day for 31 days. The immediate gain is appealing, but the long-term reward is ultimately the better choice. Technology and innovation can help us create scalable and resilient systems more efficiently. And with every dollar we save, we create an opportunity to invest in our business. But without relying on magic, what's going to create our next wave of exponential efficiency?
00:32 - Host
INTRO ANIMATION
00:41 - Chris Wright
We can collect and analyze data about our systems to make decisions. We can create self-healing infrastructure and event-driven automation. And we can adopt a DevOps mindset to gain agility and rapidly deliver services to customers without compromising quality. But to tap into that exponential efficiency, we obviously turn our attention to AIOps. [Background voice] AIOps.
01:10 - Chris Wright
What? Come on. Of course, it's the AI that breaks. All right, let's break this down. By AIOps we mean AI plus DevOps, or the next step in how we manage our systems and the services running on them. By AI we mean intelligence, and by intelligence, we mean data. Well, that's a machine, right? But we can't just start throwing data into models and expect them to be intelligent. We need clean and curated data, and it's not just data: our human expertise guides the training and refinement of models to ensure they're giving the best recommendations. It's easy to get caught up in the hype of AI, but we can't approach it as a plug-and-play technology with immediate returns. There are a few things we really have to consider and understand to set AIOps up for success. To hear more about what we need to anticipate, let's talk to Marcel Hild. Hey Marcel, how are you doing?
02:11 - Marcel Hild
Hey Chris, what's up?
02:13 - Chris Wright
So there are a lot of definitions of AIOps, but we know what we're trying to achieve. It's about helping teams and systems operate better through the use of data and data analysis, much in the way humans do today, but leveraging AI. So what do we need to do just to get started?
02:34 - Marcel Hild
The first thing you need to understand is that before you can do AI, you need to do Ops. So understand the problem domain really well, understand what SRE means and what SLIs and SLOs mean, and understand how you operate your systems. In the end, AIOps is not a product that you buy off the shelf; it's more like a capability you build in your teams. Look at the data science tooling that is applied here: baselining, which means finding a common baseline for the time series in your monitoring data; correlation, meaning how A correlates with B; or predicting the future. You use that for anomaly detection, where you say, "Oh, I predicted the future of this, but it didn't turn out that way, so it must be an anomaly." That's all table stakes for data scientists. The tools that you get are just tools. They don't come with any embedded intelligence. So you will always train those tools on the observations, on the data that you collect in your own data center.
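To make the "table stakes" techniques Marcel lists a little more concrete, here is a minimal sketch in plain Python. It assumes a single metric sampled at regular intervals; the function names, the trailing-window size, the threshold `k`, and the sample latency values are all hypothetical. It uses the simplest possible "prediction" (the mean of a trailing window as the baseline) and flags a point as anomalous when it lands more than `k` standard deviations from that prediction. It also includes a Pearson correlation between two metrics. This is an illustration, not how any particular AIOps product works.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def detect_anomalies(series: Sequence[float], window: int = 5, k: float = 3.0) -> List[int]:
    """Return indices whose value deviates from the trailing-window baseline.

    Hypothetical sketch: the "prediction" is just the mean of the previous
    `window` points, and the tolerance is `k` population standard deviations.
    """
    anomalies: List[int] = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        predicted = mean(history)          # baseline used as a naive forecast
        spread = pstdev(history)           # how noisy the recent history is
        # Avoid a zero tolerance on perfectly flat history.
        tolerance = k * spread if spread > 0 else 1e-9
        if abs(series[i] - predicted) > tolerance:
            anomalies.append(i)            # prediction missed: call it an anomaly
    return anomalies


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length metric series."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have the same length (at least 2)")
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (va * vb) ** 0.5


# Hypothetical monitoring data: request latency in milliseconds.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 97, 250, 101, 100]

if __name__ == "__main__":
    print(detect_anomalies(latency_ms, window=5, k=3.0))
    print(pearson([10, 20, 30, 40, 50], [15, 25, 35, 45, 55]))
```

In this sample data, only the spike at index 8 is flagged. Once the spike enters the trailing window, it inflates the baseline's spread for the next few points. This is one reason real systems train on curated history from their own environment instead of relying on a fixed rule like this one.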
03:42 - Chris Wright
That's an important insight. And I know we have expert systems today that leverage automation, so we can take events from event-driven automation and do remediation in a self-healing infrastructure. But going to AIOps feels like a leap. We've seen advances in other parts of AI that create foundation models for whole areas of the field, like GPT-3 for natural language processing, or look at image processing with ImageNet. What are we doing for IT systems?
04:08 - Marcel Hild
So I think the current state of the art is that we get better at using the data at hand, the data we collect in our own environment. Maybe we get more input features, and we get faster and more accurate. But what is lacking is taking the knowledge that is built up elsewhere, maybe at the vendor, maybe at some other site, and making it accessible to the community, to other sites, to other customers, so that not everybody has to learn from scratch what a database failure or a cluster outage looks like. It's like training an ImageNet model to identify cats: if you then want to identify your own cat at home, which the model hasn't seen yet, it will still identify it as a cat.
05:13 - Chris Wright
That's a great analogy. And I see open source and community collaboration as a fantastic way to build that collective knowledge and then distribute it through open source projects. And the transition from people doing all this work to including machine learning models and AI to help is a cultural shift. It's a fundamental change in the people and process part of any technology, which is always the hard part of a transition. So how do you see that impacting AIOps, and what do you look out for?
05:54 - Marcel Hild
Through every revolution, people have feared that their jobs would be gone, but in the end, we ended up with machines doing the chores for us and us being more in the driver's seat. The same is true for the operational domain. Take root cause analysis: the machine will not actually tell you what the root cause is; you will still need engineers to find it. But now you have an AI that remembers the thousands of cases it has solved before and tells you, maybe look there, maybe the root cause is over there. So I think our lives will be more fun, and we'll have better tools at our command. I mean, who doesn't like powerful tools?
06:41 - Chris Wright
Well, having personally spent hours sifting through logs with grep, sed, and awk to find root causes, I love leveraging machines to help get work done rapidly and to focus on the key areas where human creativity comes to bear. And I think it's that machine-augmented human intelligence that helps us do a better job of operating systems. This has been great, Marcel. Thank you so much for your time.
07:14 - Marcel Hild
It's been a pleasure.
07:16 - Chris Wright
It's important to understand that AIOps is not a replacement for DevOps. It's an evolution of operations with all the same responsibilities, but it augments what we do with automation and machine learning. It's not just about collecting more data or processing it faster. It's about applying the right tools to the right problems in the right ways, and making sure that the skills of the operations teams are put to the best use. It's not the machines that are intelligent, it's the humans: machine-augmented human intelligence.
07:48 - Host
OUTRO ANIMATION
About the show
Technically Speaking
What’s next for enterprise IT? No one has all the answers—But CTO Chris Wright knows the tech experts and industry leaders who are working on them.
