This article is an excerpt from internal Radiant Advisors research.
Data abstraction, also referred to as data virtualization (DV), is increasingly recognized as a popular route to centralization, one that tackles challenges of manageability, consistency and security. For database administrators, it simplifies data management because physical data can be managed behind a stable access point.
At a conceptual level, data abstraction is where a data object represents physical data and the user is not, and does not need to be, aware of its actual physical representation or persistence in order to work with that object. The data abstraction becomes a “mapping” of user data needs and semantic context to physical data elements, services or code. The benefits of data abstraction, then, derive from decoupling data consumers from data sources. Data consumers only need to be concerned with their access point, which allows physical data to be managed (moved, cleansed, consolidated, permissioned) without disrupting them. For example, a database view or synonym mapped to a physical database table remains available, unchanged, to a data consumer even while the underlying table's definition changes, its records are reloaded or its storage location moves.
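The view example above can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3` module, not a description of any particular DV product; the table names `customer_v1` and `customer_v2`, the view name `customer` and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Physical table plus a view that acts as the consumer's access point.
conn.executescript("""
    CREATE TABLE customer_v1 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer_v1 VALUES (1, 'Acme'), (2, 'Globex');
    CREATE VIEW customer AS SELECT id, name FROM customer_v1;
""")


def read_customers(connection):
    """The consumer only knows the access point: the view named 'customer'."""
    return connection.execute(
        "SELECT id, name FROM customer ORDER BY id"
    ).fetchall()


before = read_customers(conn)

# Physical change: new table layout, records reloaded, view remapped.
conn.executescript("""
    CREATE TABLE customer_v2 (
        customer_id INTEGER PRIMARY KEY,
        full_name   TEXT,
        region      TEXT
    );
    INSERT INTO customer_v2 SELECT id, name, 'unknown' FROM customer_v1;
    DROP VIEW customer;
    CREATE VIEW customer AS
        SELECT customer_id AS id, full_name AS name FROM customer_v2;
    DROP TABLE customer_v1;
""")

after = read_customers(conn)
print(before == after)
```

The consumer function `read_customers` is never edited, yet it keeps working after the physical table is replaced, which is the decoupling the paragraph describes.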
Since abstracted data objects are mappings captured as metadata, they are very lightweight definitions. They do not persist any data and are therefore quick to create, update and delete as needed. Data abstraction is valuable because of this agility: data from multiple sources can be defined and related quickly, without data movement or persistence. The result is faster time to value, easier handling of change and less risk to the business.
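To make the "no data movement, no persistence" point concrete, here is a small sketch in which two separate in-memory SQLite databases stand in for independent sources. The schema names `crm` and `billing`, the tables and the view `customer_billing` are assumptions made for illustration. The view is only a stored definition, so a change in a source is visible the next time the view is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two separate databases stand in for two independent data sources.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.executescript("""
    CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL);
    INSERT INTO crm.customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO billing.invoices VALUES (1, 100.0), (1, 50.0), (2, 75.0);

    -- A TEMP view may reference tables in any attached database.
    -- It stores only the query definition, never the rows.
    CREATE TEMP VIEW customer_billing AS
        SELECT c.name AS name, SUM(i.amount) AS total_billed
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name;
""")


def totals():
    """Query the abstracted object; rows are computed at read time."""
    rows = conn.execute("SELECT name, total_billed FROM customer_billing")
    return dict(rows.fetchall())


first = totals()

# A change in one source shows up without reloading or copying anything.
conn.execute("INSERT INTO billing.invoices VALUES (2, 25.0)")
second = totals()
```

Dropping or redefining `customer_billing` touches only metadata, which is why such objects are cheap to create, update and delete.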
The growing reality is that companies have more and more data sources of interest to manage. Integration and consolidation can no longer keep up with the demand for a single repository of integrated, cleansed data that serves as a single version of the truth (SVoT). Whenever a technology becomes too complex or too numerous to manage, abstraction is the way to detach the physical world from its logical counterpart. We have seen this trend in nearly every other layer of the technology stack: storage area networks manage thousands of disk drives as logical mount points, and network routing and addresses are represented by virtual local area networks and virtual private networks (VLANs and VPNs). Even operating systems are now virtualized, running as guests on hypervisors on servers in the cloud. Databases are no different when it comes to the benefits of abstraction.
With a single data access point, whether persisted or virtualized, companies can ensure data consistency, increase quality through reusability and governance, and monitor and enforce security in one location, while providing data consumers with simplified navigation.
Putting Abstraction in Context
On the heels of centralization comes the next concept: context. A semantic context layer is a virtualized abstraction layer in which some form of semantic context, usually business terminology, is applied to the data abstraction. Virtualizing database tables or files does not provide semantic context if the virtual data object still carries the application-specific naming of its original data elements. Semantic context exists when the virtualized data objects represent the context in which the user, typically a business user, needs to work with the data. Semantic context layers are considered “closed systems” when they are embedded in other applications that benefit from centralized semantic context and data connectivity, such as BI tools.
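A rough sketch of the difference between plain virtualization and semantic context follows. The physical table `CUST_MST`, its column names and the business terms in `SEMANTIC_MAP` are invented for illustration; a real semantic layer would keep such mappings in its own metadata repository and would validate identifiers rather than interpolate them directly into SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical application-specific naming in the physical source.
conn.executescript("""
    CREATE TABLE CUST_MST (CUST_NO INTEGER, CUST_NM TEXT, CR_LMT_AMT REAL);
    INSERT INTO CUST_MST VALUES (100, 'Acme', 5000.0);
""")

# Metadata only: business term -> physical column.
SEMANTIC_MAP = {
    "Customer Number": "CUST_NO",
    "Customer Name": "CUST_NM",
    "Credit Limit": "CR_LMT_AMT",
}


def build_view_sql(view_name, physical_table, mapping):
    """Generate a view that exposes physical columns under business names."""
    columns = ", ".join(
        f'{physical} AS "{term}"' for term, physical in mapping.items()
    )
    return f'CREATE VIEW "{view_name}" AS SELECT {columns} FROM {physical_table}'


conn.execute(build_view_sql("Customer", "CUST_MST", SEMANTIC_MAP))

cursor = conn.execute('SELECT * FROM "Customer"')
business_columns = [column[0] for column in cursor.description]
rows = cursor.fetchall()
print(business_columns)
```

A view that simply re-exposed `CUST_NO` and `CUST_NM` would be virtualized but would carry no semantic context; the mapping to business terms is what adds it.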
Semantic context layers can exist without virtualized databases. Here too, a centralized repository of business context is used to represent the data elements of underlying disparate databases and files. From the users’ perspective, the data objects appear in familiar business context and their proper usage becomes inherent, while metadata masks the complexity of the data mappings. This is why information delivery applications that offer user self-service and ad hoc usage must rely on an abstracted translation layer. Additionally, exposing these “business data objects” drives reusability, and therefore consistency, across information applications and business decisions, while minimizing rework and the risk of reports that express the same business context with different code.
Another benefit of a semantic context layer is that it accommodates the realization that an SVoT is highly unlikely. Realistically, there are multiple perspectives on data, and multiple business contexts (or abstractions) can be created on the same set of base data. This allows more business-specific data objects to be created without data duplication or transformation jobs to manage. An example is the different business interpretations of a customer, depending on the business unit or process involved. While data from several operational sources is integrated, the perspectives of a financial customer, sales customer, order management customer or customer support customer may each vary slightly and involve different attributes of interest to that particular business unit or process.
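The customer example can be sketched as two views over one base table. The table `customer_base`, its columns and the rules that define a "finance customer" or "support customer" are assumptions chosen only to show the pattern; real definitions would come from each business unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    -- One integrated set of base data (hypothetical columns).
    CREATE TABLE customer_base (
        id           INTEGER PRIMARY KEY,
        name         TEXT,
        credit_limit REAL,
        open_tickets INTEGER
    );
    INSERT INTO customer_base VALUES
        (1, 'Acme',    5000.0, 0),
        (2, 'Globex',     0.0, 3),
        (3, 'Initech', 1200.0, 1);

    -- Finance perspective: customers with credit extended.
    CREATE VIEW finance_customer AS
        SELECT id, name, credit_limit
        FROM customer_base
        WHERE credit_limit > 0;

    -- Support perspective: customers with open tickets.
    CREATE VIEW support_customer AS
        SELECT id, name, open_tickets
        FROM customer_base
        WHERE open_tickets > 0;
""")


def names(view):
    """Return customer names as seen through one business perspective."""
    query = f"SELECT name FROM {view} ORDER BY id"
    return [row[0] for row in conn.execute(query)]


finance = names("finance_customer")
support = names("support_customer")

# Only one physical table exists; both perspectives are definitions.
table_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
```

Each business unit gets its own attributes and its own notion of who counts as a customer, while the base data is stored once and no transformation job has to run.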
Today, data abstraction is among the fastest and most flexible ways to integrate data for business usage, because it requires no data migration or additional storage. Adding a semantic context layer focuses that abstraction on business consumption: abstracted data objects are exposed through a single, centralized access point for applications within the system, and the data stays firmly ensconced in business context.