Ever spend an afternoon spiraling down a wormhole? That’s how it feels sometimes for site reliability engineers and application performance managers as they try to figure out what went wrong. Troubleshooting is hard, and often takes unexpected twists and turns.
For decades, the rule of thumb has revolved around correlation: understanding how CPU usage and network speed relate, for example. Engineers look at histograms and try to ascertain the relationship between data feeds and the trouble.
All of that history might be about to go out the window, however, as a new approach promises to upend this time-intensive and often frustrating exercise. What’s this new magic sauce? Say hello to Structural Causal Models!
On a recent episode of InsideAnalysis, serial industry innovator Ellen Rubin threw a curve ball at us: “Okay, so I’m gonna shake your world. Forget about correlation. Correlation is history.”
Say what?
“The way people think about correlation is, ‘Stuff’s going wrong and there are a whole bunch of things I got alerted on. I’m looking at the picture that I have of my topology and the incidents that I’m seeing, that all my tools are telling me about. And I’m a human being and I’m trying to figure out how those relate to each other.’ Mostly what I’m looking at is anomaly detection together with correlation.”
She continued: “That is the state of the world right now. And what we’re saying at Causely and what we as the founding team really believe in very strongly is: In the end, that comes down to providing human beings with pieces of information and updated pictures of their environments in reality. But still they have to piece together what’s going on.”
This is where Causal AI steps in, offering a paradigm shift in IT operations. By leveraging causal models, Causely automates root cause analysis (RCA), slashing troubleshooting times and ensuring application reliability.
Demystifying Causal AI: Beyond Correlation
Traditional AI excels at identifying correlations. It can tell you that ice cream sales and shark attacks both rise in the summer. But correlation doesn’t imply causation. Causal AI, on the other hand, digs deeper to understand cause-and-effect relationships. It recognizes that summer heat drives both trends, and that neither one causes the other.
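The ice cream and shark example can be reproduced with a small simulation. The sketch below is illustrative only: the numbers are made up, and the variable names (`temperature`, `ice_cream`, `shark_attacks`) are assumptions rather than real data. It generates two series that both depend on temperature, then compares their raw correlation with the correlation left over once temperature has been regressed out of each.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: temperature is the common cause of both series.
temperature = [rng.uniform(10, 35) for _ in range(5000)]
ice_cream = [20 * t + rng.gauss(0, 50) for t in temperature]
shark_attacks = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

raw = pearson(ice_cream, shark_attacks)
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(shark_attacks, temperature))

print(f"raw correlation:           {raw:.2f}")
print(f"after controlling for heat: {adjusted:.2f}")
```

The raw correlation should come out strongly positive, while the adjusted one should sit close to zero. That gap is the signature of a common cause rather than a direct link between the two series.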
In the context of IT operations, this distinction is paramount. Imagine an application performance issue. Observability tools might show fluctuations in database queries and server load. But what caused this? Was it a recent code deployment, a surge in user traffic, or an underlying hardware malfunction?
Causal AI tackles this challenge by building causal models. These models represent the relationships between different components within a system. By analyzing vast amounts of data, Causely’s platform learns these relationships and how changes in one variable can impact others.
Here’s a breakdown of how Causal AI empowers Causely:
• Automated Root Cause Analysis: When an anomaly is detected, Causely’s causal models analyze the data to pinpoint the root cause. This eliminates the time-consuming and error-prone process of manual troubleshooting, freeing up IT staff for more strategic tasks.
• Actionable Insights: Beyond identifying the root cause, Causal AI provides clear, actionable insights. It doesn’t just tell you what’s wrong; it suggests how to fix it. This reduces downtime and ensures faster issue resolution.
• Proactive Problem Prevention: Causal models enable Causely to predict potential issues before they occur. By anticipating the impact of changes or identifying emerging bottlenecks, proactive measures can be taken to prevent disruptions altogether.
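To make the idea of a causal model more concrete, here is a minimal sketch of root cause ranking over a tiny, hand-written causal graph. It is not Causely’s algorithm, which the article does not describe. The graph, the node names (`db_disk_full`, `bad_checkout_deploy`, and so on) and the overlap-based scoring are all assumptions chosen for illustration. Each candidate root cause is scored by how closely the symptoms it would be expected to produce match the symptoms actually observed.

```python
from collections import deque

# Hypothetical causal graph: each key is a cause, each value lists its direct effects.
CAUSAL_EDGES = {
    "db_disk_full": ["db_slow_queries"],
    "db_slow_queries": ["api_high_latency"],
    "api_high_latency": ["frontend_timeouts", "checkout_errors"],
    "bad_checkout_deploy": ["checkout_errors"],
}


def downstream(graph, node):
    """Every effect reachable from node, found by breadth-first search."""
    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for effect in graph.get(current, []):
            if effect not in seen:
                seen.add(effect)
                queue.append(effect)
    return seen


def root_candidates(graph):
    """Nodes that cause something but are not themselves caused by anything."""
    all_effects = {effect for effects in graph.values() for effect in effects}
    return [node for node in graph if node not in all_effects]


def rank_root_causes(graph, observed):
    """Score each candidate by Jaccard overlap of expected vs. observed symptoms."""
    scores = []
    for candidate in root_candidates(graph):
        expected = downstream(graph, candidate)
        union = expected | observed
        score = len(expected & observed) / len(union) if union else 0.0
        scores.append((candidate, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Four alerts fire at once; the ranking collapses them into candidate causes.
alerts = {"db_slow_queries", "api_high_latency",
          "frontend_timeouts", "checkout_errors"}
for cause, score in rank_root_causes(CAUSAL_EDGES, alerts):
    print(f"{cause}: {score:.2f}")
```

In this toy graph, `db_disk_full` explains all four alerts and ranks first, while `bad_checkout_deploy` explains only one. The same `downstream` helper can also be read as a what-if query: `downstream(CAUSAL_EDGES, "db_slow_queries")` lists what would be expected to degrade if database queries slowed down, which is the kind of reasoning proactive prevention relies on. A production system would need probabilities, timing and a learned topology, none of which this sketch attempts.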
Causely: Putting Causal AI into Action
Causely is the brainchild of industry veterans with a deep understanding of the challenges faced by DevOps and SRE teams. Their flagship offering is the first Causal AI platform specifically designed for IT operations.
Let’s look at how Causely leverages Causal AI to transform IT and cloud operations:
• Focus on Application Health: Causely moves the focus from monitoring individual metrics to understanding the overall health of an application. By analyzing the interplay between different components, it identifies the root cause of issues that impact user experience and business outcomes.
• Reduced Time to Resolution: Traditional RCA can be a time-consuming exercise, involving manual analysis of log data and code. Causely automates this process, significantly reducing the time it takes to identify and resolve issues. This translates to improved application uptime and reduced business impact.
• Improved Developer Productivity: With faster RCA, developers are freed from the time-consuming task of troubleshooting. This allows them to focus on core development activities, leading to faster innovation and shorter release cycles.
• Scalability and Resilience: Causely’s causal models can adapt to complex and dynamic IT environments. As applications evolve and new components are added, the platform automatically learns and updates its models, ensuring continuous application health across diverse environments.
The Marriage of OpenTelemetry and Causal AI: A Powerful Union
Causely recognizes the importance of standardized data collection. They leverage OpenTelemetry, an industry-wide initiative for generating telemetry data from cloud-native applications. This ensures that Causely receives high-quality, consistent data, allowing its causal models to function optimally.
OpenTelemetry offers several benefits for Causely:
• Reduced Data Noise: Through SDK-level sampling and Collector-side filtering, OpenTelemetry can strip out irrelevant information so that Causely receives only the data it needs for accurate root cause analysis. This minimizes the processing power required and streamlines the identification of root causes.
• Simplified Integration: OpenTelemetry’s standardized approach facilitates seamless integration with various monitoring tools and technologies within an IT ecosystem.
This synergy between OpenTelemetry and Causal AI empowers Causely to deliver a holistic solution for IT operations, streamlining data collection and leveraging causal models for unparalleled root cause analysis.
Rubin concluded: “To some extent, the more observability tries to solve the problem in the industry, the more problems get created because how many hundreds of alerts can you look at? People simply can’t keep track of it and they don’t understand it. And sometimes the alerts that you’re getting are the symptoms and not the actual cause.”
And that’s where Causely excels, and it’s arguably hitting the market just in time. Be on the lookout for more from this startup!
About Eric Kavanagh
Career media professional who designs and manages an array of Web-based research and media products, including The Briefing Room, World Matters, and Hot Technologies, as well as DM Radio and InsideAnalysis, which are both now broadcast coast-to-coast in 25+ markets, reaching upwards of 1 million listeners per episode. Recognized as a luminary in the field of Big Data. Recognized by Techopedia and Big Data Republic as one of the top experts to follow on Twitter.