The modern data warehouse is no longer a single system serving all enterprise users; it has grown into an ecosystem of diverse systems, geographically distributed across different technology platforms, with data shared among them. With the growth of these platforms comes the challenge of managing workloads across them.
Traditional database systems have always faced performance and scalability challenges on an integrated platform when executing workloads, whether from user-generated SQL or from SQL activity generated by ETL (extract, transform and load) or CDC (change data capture) processes. Performance degrades further when executing mixed workloads of loading data, querying data and supporting user-generated ad hoc analytical queries. Some categories of data generate workloads that are more complex to manage than others.
The situation today is even more complex with the advent of Big Data and its processing complexities. Most database vendors have already forayed into the Big Data world with their own connectors and associated technologies.
Figure 1 shows the most common data footprint for enterprises operating across many time zones and regions. The datacenters may be spread across different parts of the world or, in other cases, the servers and systems may sit within a single datacenter in the USA. Managing data across these datacenters presents many challenges:
- Replicating data across the datacenters
- Managing workloads across the datacenters, following the follow-the-sun model
- Managing user access to the appropriate data across datacenters
- Managing query balancing across systems and datacenters
- Managing routing and access of data and systems
Managing data in this type of situation requires a combination of algorithms and optimization models: intelligent network routers, flexible data adapters, high-speed data movement, data replication and rebuild, and systems stabilization and optimization. While most of the workload-related algorithms have matured over the last few years, there are still issues with the management and processing of data and queries across the network and systems.
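As an illustration of the routing piece, the sketch below assigns an incoming query to the least-loaded datacenter that actually hosts the required dataset — a minimal form of the query balancing described above. All names and structures here are hypothetical, not a real product API:

```python
from dataclasses import dataclass, field

@dataclass
class Datacenter:
    """Hypothetical view of one datacenter in the distributed warehouse."""
    name: str
    capacity: int                          # max concurrent queries
    datasets: set = field(default_factory=set)
    active: int = 0                        # queries currently running

    def load(self) -> float:
        return self.active / self.capacity

def route_query(dataset: str, datacenters: list) -> Datacenter:
    """Send the query to the least-loaded datacenter holding the dataset."""
    candidates = [dc for dc in datacenters if dataset in dc.datasets]
    if not candidates:
        raise LookupError(f"no datacenter hosts {dataset!r}")
    target = min(candidates, key=lambda dc: dc.load())
    target.active += 1                     # account for the new query
    return target
```

A production router would also weigh replication lag, network distance, and follow-the-sun time windows; this sketch only captures the load-balancing intuition.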
Figure 2 illustrates the situation experienced by system users in the diverse, workload-based data warehouse. The mixed workload of business intelligence, analytics, ETL and Big Data queries creates such congestion across the entire infrastructure, from storage to network to server and database, that performance and quality of service take a severe beating. Workarounds and optimizations are available, but these tools and techniques cannot integrate across diverse platforms. Even if the enterprise is on the same platform but runs different versions of the database on a different server footprint, managing these systems becomes a bigger issue. The lack of an integrated systems and database management platform that can seamlessly orchestrate the infrastructure while maintaining quality of service and performance has become the most complex hurdle for enterprises today.
To understand these requirements better, we need to understand the workloads better. The most common types of workloads include the following:
- Business intelligence workloads – Standard and ad hoc reports executed by business users during different times of the day
- Analytical workloads – Data processing-related workloads generated by statistical and data mining applications and users
- Data processing and loading – Workloads generated by data loading, transformation, and management tools and programs
- Security overhead
- Web and non-web related application traffic
- Database traffic
- Application-related data traffic
- Storage workloads – Related to handling the storage and retrieval of data
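Before a workload manager can assign resources, it has to classify each incoming request into one of these categories. A minimal rule-based sketch of such a classifier follows; the metadata tags and class names are assumptions for illustration, not a standard interface:

```python
from enum import Enum

class WorkloadClass(Enum):
    BI = "business_intelligence"       # standard and ad hoc reports
    ANALYTICS = "analytics"            # statistical / data mining jobs
    ETL = "data_processing"            # loading and transformation
    OTHER = "other"                    # security, web, storage traffic, etc.

def classify(query_meta: dict) -> WorkloadClass:
    """Crude rule-based classification from a query's metadata tags."""
    source = query_meta.get("source", "")
    if source in ("report_server", "dashboard"):
        return WorkloadClass.BI
    if source in ("statistics_engine", "data_mining"):
        return WorkloadClass.ANALYTICS
    if source in ("etl_tool", "cdc_stream"):
        return WorkloadClass.ETL
    return WorkloadClass.OTHER
```

Real workload managers classify on richer signals (user group, estimated cost, time of day), but even this simple tagging is enough to route each class to its own resource pool.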
These workloads can become complex depending on the size and types of data that will be managed in the enterprise. Figure 3 shows the future state requirement for managing workloads in the next generation data warehouse, business intelligence and analytics architecture.
As shown in Figure 3, the future state data architecture requires a two-part management system. One part of this solution is related to managing the workloads and assigning them to the right systems based on the resources needed. This is similar to traffic patterns in China where vehicles are assigned lanes to avoid undue congestion, and the flow of traffic is uninterrupted. The other part of the system is designed to provide the user application with intelligent assistance to manage the query generation based on the resources available. This is similar to cars of the future with lane and park assist. This kind of streamlining in the infrastructure layers of the data warehouse and Big Data platforms is where the workload management fabric will need to evolve. All the major database vendors are making efforts to get to this state of nirvana. What will it take to provide this kind of workload management capability? There are a few conditions:
- Users need to be on the same database family
- Users can be on different hardware and database versions
- User workloads need to be at least broadly defined to serve as a starter kit
- User data requirements need to be documented, including future growth projections
- LDAP security architecture needs to be in place
- Standardization of tools and technologies in the enterprise needs to be adopted
With some or all of these requirements in place, we can build a future state platform where data is isolated by workload environment and users can seamlessly move between environments, as each of them is self-contained to manage any type of workload related to its data.
What are the benefits of such a system? Let us look at the different benefits that can be derived from this type of architecture:
- Increased performance efficiencies
- Streamlined systems with optimal performance
- Effective management of user expectations
- Utilization of all infrastructure investments across the geographies
- Increased ROI from infrastructure investments
- Lower TCO from an infrastructure spend perspective
- Standardization of tools and technologies
There are a few steps that every enterprise needs to take to get to these types of benefits:
- Decide which set of heterogeneous technologies to standardize on. There are caveats here – the databases need to be on a similar platform, while applications can run on any platform.
- Determine how much time will be needed before ROI can be realized.
- Set the overall spending on the new infrastructure. How much can be saved from retired platforms over what period of time?
- Calculate the TCO and ROI in the new architecture.
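The TCO and ROI arithmetic behind the last two steps can be sketched with two small helpers; the dollar figures in the note below are purely illustrative:

```python
def payback_period_years(upfront_cost: float, annual_savings: float) -> float:
    """Years until cumulative savings from retired platforms cover the new spend."""
    if annual_savings <= 0:
        return float("inf")        # the investment never pays back
    return upfront_cost / annual_savings

def simple_roi(total_benefit: float, total_cost: float) -> float:
    """Net benefit as a fraction of total cost over the evaluation period."""
    return (total_benefit - total_cost) / total_cost
```

For example, a $2M new-infrastructure spend offset by $500K per year in savings from retired platforms pays back in four years; if five years of benefits total $2.5M, the simple ROI is 25%. Real TCO models would also fold in migration, licensing, and operations costs.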
Creating a program with a roadmap, answering these questions and moving to an implementation phase will ensure success in turning this future state vision into reality.