Cloud deployment and distributed computing concepts are challenging the assumptions built into the architecture of most BI tools. The shift occurring in the technology underpinnings of the BI market is still at an early stage, but the direction is obvious: away from client-server models and toward distributed computing.
The historic mindset of BI vendors is built around core assumptions of client-server computing: each application runs on a server (or a cluster of servers, each running a copy of the software). The BI tool runs on a server. It presumes centralized data in a single database, most commonly behind a SQL pipe. It’s even visible in the server- and core-based licensing models.
Cloud deployment for a product with a client-server architecture is no different than it is in the data center: install the software inside a virtual machine running elsewhere. You get many of the benefits of running in the cloud, but the product architecture has not changed.
You don’t get the benefits of elastic scaling. Scaling up capacity works just as it does in the data center. If the BI product can’t run transparently across multiple servers in a cluster, scaling up means a virtual machine with more resources. The problem is that VMs are generally not as large as the largest servers you can find, so the ceiling is lower in the cloud than in a data center.
If the product can run as multiple servers then it’s a bit more amenable to cloud deployment, but not much. You still provision the (virtual) server resources, then install the software, then adjust the configuration of the cluster and balance your load. If your scaling problem is the size of the largest single unit of work, then multi-server deployment won’t help – you are still limited by the capacity of a single server or VM in this model of parallel scaling.
The other side of elastic scaling is to reduce resources when they aren’t needed, which entails doing the same set of tasks in reverse. Many products support basic server addition and subtraction, since the servers are independent, but that’s the problem: the products assume “server” is the unit of resource.
“Cloud” is not a technology; it’s an architecture. In the same way that software built for the mainframe made assumptions about the environment in which it operates, software built for client-server architectures made a different set of assumptions.
This is why we saw the rise of an entirely new set of vendors who seemed to appear from nowhere to become some of the largest software vendors in the industry. Almost no vendors of data technologies (databases, integration tools, query or reporting products) carried over from the mainframe market into the later stage client-server market of the late 1990s to early 2000s. We are at a stage in cloud adoption where the BI vendors are beginning to recognize that the way their software was built, deployed and managed is not well suited to the way cloud architectures work.
For example, with elastic scaling, the product itself should address the necessity of adding and removing resources, not an administrator. To do this in most server-centric BI products is not an easy-to-automate task, nor is it a prebuilt feature.
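To make the contrast concrete, here is a minimal sketch of the kind of logic a cloud-native product embeds so that scaling decisions come from the software rather than an administrator. The thresholds, worker limits, and the idea of sizing against a query backlog are illustrative assumptions, not any vendor’s actual API.

```python
# Sketch of product-managed elastic scaling: the product itself decides
# how many workers it needs from observed load, instead of waiting for
# an administrator to provision servers. All numbers are illustrative.

def desired_workers(queued_queries: int, per_worker_capacity: int,
                    min_workers: int = 1, max_workers: int = 16) -> int:
    """Pick a worker count that covers the current query backlog."""
    needed = -(-queued_queries // per_worker_capacity)  # ceiling division
    return max(min_workers, min(max_workers, needed))

# 37 queued queries, each worker handles 10 concurrently -> 4 workers.
print(desired_workers(37, 10))
# No backlog -> scale back down to the configured minimum of 1.
print(desired_workers(0, 10))
```

The point is not the arithmetic but where it lives: in a server-centric product, this decision loop doesn’t exist, so a human has to notice the load and provision capacity by hand.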
Engineering a product for reliability is likewise a challenge. Software designed for client-server expects reliability and redundancy at the hardware layer. Software designed for the cloud assumes that the resources are unreliable, so it manages resiliency at the product layer and treats resource loss (e.g. a server disappearing) as a minor and temporary problem. In order to do this, the protocols for coordinating workers and remembering the state of work have added layers of complexity that don’t exist in the client-server world.
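One common shape for that coordination layer is heartbeat tracking plus re-queuing: the system remembers which worker holds which task, and when a worker stops reporting in, its work goes back on the queue. The sketch below illustrates the pattern under stated assumptions; the names, timeout, and in-memory queues are hypothetical, standing in for what real systems do with distributed coordination services.

```python
# Sketch of treating resource loss as routine: tasks are tracked
# centrally, workers heartbeat, and work held by a vanished worker is
# simply re-queued. Names and the timeout are illustrative assumptions.
import time
from collections import deque

HEARTBEAT_TIMEOUT = 5.0  # seconds of silence before a worker is presumed gone

pending = deque(["task-1", "task-2", "task-3"])
assigned = {}    # worker_id -> task currently held by that worker
last_seen = {}   # worker_id -> timestamp of its last heartbeat

def heartbeat(worker_id: str) -> None:
    last_seen[worker_id] = time.monotonic()

def assign(worker_id: str):
    """Hand the next pending task to a live worker."""
    heartbeat(worker_id)
    if pending:
        task = pending.popleft()
        assigned[worker_id] = task
        return task
    return None

def reap_lost_workers(now: float) -> list:
    """Re-queue work from workers that stopped heartbeating."""
    lost = [w for w, t in last_seen.items() if now - t > HEARTBEAT_TIMEOUT]
    for w in lost:
        if w in assigned:
            pending.appendleft(assigned.pop(w))  # the task outlives the worker
        del last_seen[w]
    return lost

assign("worker-a")                                # worker-a takes task-1
lost = reap_lost_workers(time.monotonic() + 10)   # simulate worker-a vanishing
print(lost, pending[0])                           # task-1 is back at the front
```

This bookkeeping is exactly the added complexity the paragraph above describes: it has no equivalent in a client-server product that assumes the hardware underneath it will stay up.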
It takes a wholesale rearchitecting of a product, from the inside out, to take full advantage of a new environment. Otherwise the new environment is simply emulating the old one, carrying forward some of the same limitations of the old and preventing use of new capabilities.
Vendors in the BI and analysis tools market are starting to introduce new products that are designed and built with the assumption that cloud is the environment they will be running in. They may be offered as a service, but there is no hard requirement that this be the case.
Companies like Arcadia Data [https://www.arcadiadata.com/] are early entrants in this new architecture for BI. The product is engineered to deploy in multiple environments, from a private cloud to running natively on a Hadoop cluster, whether internal or cloud-hosted as a service.
Unlike client-server products that want to exist in a separate environment and copy the data – as was also the case with early BI-on-Hadoop products that required predefined data cubes stored separately from the data in HDFS – Arcadia can access the data locally (if the software is installed locally) or access the data from a remote location, cloud to cloud.
The cloud BI market is still in an early stage of development. Companies like Arcadia Data are showing how BI can work differently and add capabilities that are not possible in a conventional client-server architecture.