Water systems make a good analogy for understanding the world of data. Data lakes are enormous centralized repositories, while data streams flow freely at high speeds and high volumes. Data lakehouses bring the order of the warehouse to the vastness of the lake.
There’s even stagnant water in the world of information: those locations and repositories where data has ‘gone bad’, meaning data channels (there’s another water reference) where information has become stale, is stored incorrectly, or contains duplicates that were never removed.
The water analogy also extends to observability, where teams analyze an ever-growing number of logs, metrics and traces to ensure systems are working at their peak. So argues Eduardo Silva.
Silva is a co-founder of San Francisco-based Calyptia, a first-mile data observability platform and the creators and primary maintainers of Fluentd and Fluent Bit. Silva and fellow co-founder Anurag Gupta launched Calyptia to help organizations with their observability challenges. Today, Calyptia provides Fluent Bit-based products and services that enhance and integrate with organizations’ existing observability tools.
“Just as the water gets analyzed and tested for safety in two places (at the reservoir and in the pipelines), observability data needs to be analyzed and acted on while it is still flowing through the pipeline. Waiting for the data to arrive in the data lake is insufficient, and failure to maintain constant insight into the quality of water (and data) could lead to disaster,” said Silva.
As businesses in every industry become more data-driven, the amount of data generated will continue to increase. The IDC Global DataSphere, a measure of how much new data is created, captured, replicated, and consumed each year, is expected to double in size from 2022 to 2026. That’s leading to a growing demand for observability tools and some emerging best practices for implementing them.
Inside Analysis spoke to Calyptia’s Silva to get some insight into what 2023 has in store for observability professionals.
Observability in 2023
Embracing open standards and a vendor-neutral approach will be key.
“As companies begin to put together their observability practice, it’s easy to put all the focus on data analysis and centralizing all of its information. Choosing a single vendor to accomplish that is tempting but also short-sighted. Being locked into a single provider can hold an enterprise’s data hostage to rising licensing and storage costs: a lethal mistake to make as companies are keeping a close eye on IT budgets in 2023,” said Silva.
For true (and immediate) observability, a wide variety of products and platforms need to connect to each other. A CNCF survey reported that 72% of respondents employ up to nine different logging, metrics and tracing tools, while 23% use between 10 and 15.
Given that huge array of solutions, Silva says it’s critical to select something that integrates with them all. That’s why he says open source tools are essential to meet today’s observability challenges. With open standards and a vendor-neutral approach, users with multiple back ends or tools don’t have to worry that their data will be tied to one endpoint at the expense of another.
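Fluent Bit, which Calyptia’s team maintains, is a concrete example of that vendor-neutral approach: a single pipeline can fan the same stream out to multiple back ends, so no one vendor owns the data. A minimal sketch of a classic Fluent Bit config (the log path, hostnames and ports here are placeholders, not a recommended setup):

```ini
# Collect application logs once...
[INPUT]
    Name   tail
    Path   /var/log/app/*.log
    Tag    app.logs

# ...and deliver the same stream to two different back ends.
# Neither destination is privileged; adding or swapping one is
# a config change, not a re-instrumentation project.
[OUTPUT]
    Name   es
    Match  app.logs
    Host   elasticsearch.internal
    Port   9200

[OUTPUT]
    Name   loki
    Match  app.logs
    host   loki.internal
    port   3100
```

Because each `[OUTPUT]` matches on the tag rather than being wired to the source, the pipeline stays portable across vendors.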
Reducing rising cloud costs
In a recent presentation at the Monte Carlo Impact Summit, Redpoint Ventures Managing Director Tomasz Tunguz shared 9 Predictions for Data in 2023. Leading off his list is a prediction that cloud data warehouses will process 75% of workloads by 2024.
According to Silva and team, as companies across all industries become more data-driven and more reliant on the cloud, they will begin to encounter unforeseen costs — in both time and money — hidden in the cloud infrastructure. Those costs include the egress fees cloud providers charge when a company wants to move data from one place to another.
“A vendor-neutral observability platform that simplifies pipelines can cut down on those excessive fees and help companies avoid cloud sticker shock. It’s expensive to route, store and analyze data in a back-end database, and those costs skyrocket as distributed, dynamic IT explodes the amount of data you need to store and process. Analyzing data at the source saves teams time typically spent configuring and maintaining those pipelines. By understanding event data more quickly and comprehensively, teams can identify what data is essential and what is not. That non-essential or duplicative data can be routed to a lower-cost back-end for long-term storage and immediate cost savings,” explained Calyptia’s Silva.
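The tiered-routing idea Silva describes can be sketched with Fluent Bit’s `rewrite_tag` filter: low-value records are re-tagged and sent to cheap object storage, while only actionable events reach the expensive analytics back end. The field name `level`, the bucket, and the hosts below are hypothetical placeholders:

```ini
[INPUT]
    Name   tail
    Path   /var/log/app/*.log
    Tag    app.logs

# Re-tag debug/info chatter so it can be matched separately;
# "false" drops the original record after re-emitting it.
[FILTER]
    Name   rewrite_tag
    Match  app.logs
    Rule   $level ^(debug|info)$ archive.logs false

# Low-value data goes to low-cost long-term storage...
[OUTPUT]
    Name    s3
    Match   archive.logs
    bucket  log-archive
    region  us-east-1

# ...while warnings and errors go to the analytics back end.
[OUTPUT]
    Name   es
    Match  app.logs
    Host   elasticsearch.internal
    Port   9200
```

The cost saving comes from the split itself: storage-heavy data never incurs the indexing and query costs of the analytics tier.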
Avoiding the noise
It’s been more than a decade since Marc Andreessen claimed “software is eating the world,” and it’s no less true today. As software consumes every department within an enterprise, the environment becomes wildly complex very fast.
“It’s becoming more important than ever to collect and route data where you need it so that engineers can focus on creating business value rather than connecting data sources to data backends,” said Silva. “By applying business logic early (in the pipelines, to extend our water analogy), teams can identify what data is essential and begin to structure it before sending it to the back end. Along with the cost savings mentioned earlier, eliminating the complexity of configuring and maintaining your observability pipelines also significantly impacts system performance.”
Keeping the water (and data) pure
To return to the increasingly relevant water-based analogy for observability in 2023, Silva says that a city’s water system and a company’s data system both work best when all the components work together seamlessly. In both cases, solid infrastructure reduces user costs and frustrating downtime.
“Like with the water supply, the end users of data don’t want to have to think about the complexities of how everything works; instead, they want to just be able to get the data they need with confidence that the pipeline is flowing correctly and securely,” concludes Silva.