If you want more, better, and different analytic insights, you need more, better, and different sources of data.

This is especially true of data science and machine learning (ML) engineering. Rather than merely surfacing insights, these practices focus on productionizing them: identifying specific applications or use cases, engineering solutions to address them, and putting these solutions into production. The resultant products use machine intelligence to automate different kinds of actions in response to events. But, again, all of this depends on access to more, better, and different sources of data.

The good news is that the SaaS cloud is a potential goldmine in this regard; the bad news is that few businesses are doing enough to mine this resource. “One of the biggest things we’ve noticed is that a lot of the SaaS applications in the world today are very point-in-time-centric,” says Joe Gaska, CEO of Grax, a provider of SaaS data management services. Gaska contrasts the point-in-time bias of SaaS apps with the big-picture historical view he says is a prerequisite for analytics.

“For your personal life or your family history … if you really look at it and if you really kind of want to understand change or behavior, it really [comes down to] reviewing and understanding historical data,” Gaska said during a recent Inside Analysis webinar. “Understanding … that history and owning that history more importantly is crucial for many of the companies that we’re working with today.”

The problem is that the SaaS cloud does not necessarily make this easy.

“Data in the cloud today is really locked away on other people’s servers,” he told analyst Eric Kavanagh, referring to Salesforce and other cloud providers. Even though Grax started out as a purveyor of backup and recovery services for Salesforce, Gaska says that customers kept bringing it another, related problem: the need to capture data history, too. “How do we free that information and make sure that it’s available for our historical review was crucial for a lot of our customers.”

The two problems are related, Gaska realized, because the backup and recovery use case naturally lends itself to the data history use case: backup and recovery is data history, and vice versa.

“Again, disaster recovery, point-in-time restore, compliance, auditing, all of those require historical data,” he pointed out. “But if we capture all of our history, what other very interesting things can we do with that history in the future,” Gaska continued, “if you’re thinking about [Amazon] SageMaker or artificial intelligence or machine learning, all of those are founded with large historical training sets.”
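
That claim is easy to make concrete. The sketch below is a minimal illustration, not Grax’s implementation: it shows how a series of point-in-time backups, replayed in order, doubles as a versioned change history of the sort that ML training sets are built from. The record shape, field names, and snapshot cadence are all hypothetical.

```python
# Minimal sketch: replaying point-in-time snapshots into a change history.
# The "Opportunity"-style records and nightly cadence below are hypothetical.
from datetime import datetime

# Hypothetical nightly backups of the same CRM records, keyed by record id.
snapshots = [
    (datetime(2023, 1, 1), {"opp-1": {"stage": "Prospecting", "amount": 10000}}),
    (datetime(2023, 1, 2), {"opp-1": {"stage": "Negotiation", "amount": 12000}}),
    (datetime(2023, 1, 3), {"opp-1": {"stage": "Closed Won", "amount": 12000}}),
]

def replay_history(snapshots):
    """Diff consecutive snapshots into a per-record log of versions."""
    history, previous = [], {}
    for taken_at, records in snapshots:
        for record_id, fields in records.items():
            if fields != previous.get(record_id):   # record changed (or is new)
                history.append({"id": record_id, "as_of": taken_at, **fields})
            previous[record_id] = fields
    return history

# Each emitted version is a timestamped row: exactly the shape a
# historical training set takes.
for version in replay_history(snapshots):
    print(version)
```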

The pluses and minuses of built-in SaaS analytics

Grax has a dog in this fight, true, but Gaska makes a good point: the SaaS cloud – the circumstances attending its uptake and adoption, the conventions governing its use, and, not least, its built-in constraints, especially with respect to data access – has had both positive and negative effects on business analytics. This seems counterintuitive, and, insofar as “success” with analytics is equated with insight into specific business function areas, it is. On the plus side, then, SaaS apps offer powerful capabilities for analyzing and understanding the current state of each SaaS domain. Cloud-based sales analytics, for example, are easy to use and, on balance, quite powerful, as are cloud-based analytics in other function areas. Salesforce, especially, exposes a rich variety of built-in analytic functions and capabilities, all of which can be accessed from inside core Salesforce apps.

On the minus side, the SaaS cloud model complicates historical analysis, first within individual business function areas and, second, across all of them. The reason has to do with the point-in-time bias of SaaS apps that Gaska referred to: SaaS analytics tend to privilege the current state of a particular business function area. Lacking a multi-domain SaaS suite – i.e., one that supports CRM, finance, HR, etc. – the business must extract and integrate data from disparate SaaS apps to obtain a panoptic view of operations across all of its function areas.

Moreover, even though some SaaS providers offer customers a built-in means of preserving and analyzing their data histories, the SaaS provider itself owns and controls this data; access to (and analysis of) it is usually afforded via built-in tools or features. In effect, then, most SaaS subscribers rent their historical data from their SaaS providers. This is a feature, not a bug, of the SaaS cloud: first, data access is mediated by API endpoints, and most providers limit how customers may use these APIs to extract data; second, hyperscale cloud platforms, especially, tend to charge premium rates for data egress. These and other factors militate against moving data out of the SaaS cloud.
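
To see what that API-mediated access looks like in practice, here is a minimal sketch of paging records out of a Salesforce-style REST query endpoint. Pagination via nextRecordsUrl follows Salesforce’s documented REST API; the instance URL, token, and HTTP 429 backoff are illustrative assumptions, reflecting the fact that providers meter and throttle extraction-style API use.

```python
# Minimal sketch: bulk extraction through a Salesforce-style REST query API.
# The instance URL and token are placeholders; the 429 handling is a
# defensive assumption, since providers cap API calls per rolling window.
import time
import requests

INSTANCE = "https://example.my.salesforce.com"  # hypothetical org URL
TOKEN = "...access-token..."                    # obtained via OAuth, elided

def extract_all(soql: str):
    """Yield every record for a SOQL query, following pagination."""
    url = f"{INSTANCE}/services/data/v58.0/query"
    params = {"q": soql}
    headers = {"Authorization": f"Bearer {TOKEN}"}
    while url:
        resp = requests.get(url, headers=headers, params=params)
        if resp.status_code == 429:             # throttled: back off, retry
            time.sleep(int(resp.headers.get("Retry-After", 30)))
            continue
        resp.raise_for_status()
        payload = resp.json()
        yield from payload["records"]
        # Salesforce returns a relative nextRecordsUrl until "done" is true.
        next_path = payload.get("nextRecordsUrl")
        url = f"{INSTANCE}{next_path}" if next_path else None
        params = None                           # locator is baked into the URL

for row in extract_all("SELECT Id, StageName, Amount FROM Opportunity"):
    print(row["Id"], row["StageName"])
```

Even this simple loop runs up against the constraints Gaska describes: every page counts against the subscriber’s API allocation, and the extracted history still has to be landed, versioned, and stored somewhere the business actually owns.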

“Customers [should] retain every version of their history, retain ownership of that historical data forever so if they want to do analytics, if they want to do machine learning, if they want to do artificial intelligence, the historical data becomes essentially gold,” Gaska told Inside Analysis’s Kavanagh. 

Ownership of data is a prerequisite for success with analytics

Gaska’s point respecting data ownership is also important. Like it or not, success with analytics is equal parts opportunity and responsibility. On the one hand, it is the responsibility of the business to own and administer the acquisition, integration, and management of the historical data that is grist for analytic discovery. On the other hand, the very discipline of responsible data ownership – and, specifically, the data management and analytic practices that take root and develop on the basis of this discipline – positions the business to succeed with analytics: that is, to consume the panoptic insights produced by ad hoc query, analytic discovery, data science, ML engineering, and other practices. Analytic opportunity is contingent on analytic responsibility; there’s no way around it. “Customers should own their history. History is the source of extreme knowledge—whether you choose to use it in artificial intelligence now or in 5 to 10 years,” Gaska urged.

About Vitaly Chernobyl

Vitaly Chernobyl is a technologist with more than 40 years of experience. Born in Moscow in 1969 to Ukrainian academics, Chernobyl solved his first differential equation when he was 7. By the early 1990s, Chernobyl, then in his early 20s, along with his eldest brother, Semyon, had settled in New Rochelle, NY. During this period, he authored a series of now-classic Usenet threads that explored the design of Intel’s then-new i860 RISC microprocessor. In addition to dozens of technical papers, he is the co-author, with Pavel Chichikov, of Eleven Ecstatic Discourses: On Programming Intel’s Revolutionary i860.