The inimitable Leonard Nimoy immortalized the power of logic within the realm of decision making. While the concept itself dates back thousands of years, Nimoy’s Mr. Spock persona in Star Trek emblazoned the value of logical thinking into the minds of millions. As the consistently stable counterbalance to the often emotional Captain Kirk, he helped steer the starship Enterprise on a steady path, working to ensure the success and safety of her crew.
By definition, logic conjures the power of mathematics, thus vaulting science over art in the hierarchy of dimensions that should be combined by responsible decision makers. Numbers don’t lie, and for those who maintain proper context when analyzing them, powerful insights can be gleaned. That said, there is always an art to every part of senior management. But in the world of numeric values, logic should arguably take the lead role, at least most of the time.
Perhaps this is one reason why a concept called the “logical data warehouse” took hold a number of years ago. As opposed to a traditional data warehouse, which physically marshals structured (relational) data within one super-charged database, the logical data warehouse employs a decidedly federated view of data. The idea is to provide a strategic view of governed, trusted data that spans multiple databases and other information systems.
Anyone who has spent time in the trenches of an enterprise data warehouse program can understand the many reasons why a logical approach evolved. Simply put, data has gravity; it doesn’t really like to be moved. What’s more, moving data can become a rather costly initiative, especially as a warehouse grows in size and prominence. The associated expenses include not just software and hardware, but also personnel costs to manage the many feeds of data that get pulled in from source systems and channeled into the warehouse.
Of equal importance is the issue of quality, especially as it pertains to mistakes. Any time data is transformed in order to be loaded into a data warehouse, there are opportunities for human error. All it takes is one field aligned improperly and a fountain of data quality problems can ensue. Such mistakes can be difficult to find, especially after the person who committed them leaves the company. While documentation practices are clearly improving industry-wide (largely because they’re increasingly being automated), there’s always room for error, or even misinterpretation.
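To make the misaligned-field problem concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the column names, the sample rows and the helper functions are invented for illustration and do not come from any particular warehouse or loading tool. It loads rows by position into a target layout in which two columns have been swapped, then runs a simple type check that catches the damage.

```python
import csv
import io

# Hypothetical source extract: three columns in a known order.
SOURCE = (
    "customer_id,signup_date,balance\n"
    "101,2015-03-01,250.00\n"
    "102,2015-03-02,75.50\n"
)

# Hypothetical target layout with two columns swapped by mistake.
MISALIGNED_TARGET = ["customer_id", "balance", "signup_date"]
CORRECT_TARGET = ["customer_id", "signup_date", "balance"]


def load(source_text, target_columns):
    """Load rows by position, ignoring the source header names."""
    reader = csv.reader(io.StringIO(source_text))
    next(reader)  # skip the header row; a positional load never reads it
    return [dict(zip(target_columns, row)) for row in reader]


def is_number(value):
    """Return True if the value parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def bad_rows(rows):
    """Flag rows whose balance field does not hold a number."""
    return [row for row in rows if not is_number(row["balance"])]


rows = load(SOURCE, MISALIGNED_TARGET)
flagged = bad_rows(rows)
print(f"{len(flagged)} of {len(rows)} rows failed the balance check")
```

The load itself raises no error, which is what makes this kind of mistake hard to find. Only a check on the loaded values reveals that dates have landed in the balance column.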
And then came Hadoop. Until Yahoo! turned Hadoop over to the Apache Software Foundation, the database market was relatively stable. The concept of NoSQL had not really taken flight, even though there were already many databases on the market that did not conform to the relational model. There were document databases, column-oriented databases and graph databases. But the use cases for those alternatives were uncommon, at least compared to the preponderance of relational databases in production.
Hadoop disrupted this market by providing a cost-effective means of storing and analyzing data. Suddenly, organizations could keep all kinds of information about their business. Log files, social media data, life sciences data, clickstream data – hitherto, these data sources were largely viewed as too costly to store and, thus, were discarded to make room for higher-priority data.
Much like the explosion of data warehouse appliances from 2005 through 2012 (a trend that arguably peaked when IBM bought Netezza for $1.78 billion in 2010), the proliferation of Hadoop data stores altered the information landscape in a significant way. Suddenly, there was a veritable ton of valuable data sitting in these new kinds of repositories. Consequently, there came a new challenge: finding people with good enough Hadoop skills to leverage all that data.
As with any market disruption in the information industry, there is more than one way to tackle the associated challenges. While all the major players (and countless smaller firms) are busy offering training courses for the Hadoop ecosystem, new tools and platforms are coming onto the market at the same time. One such tool, specifically designed to address this new world of data management, is IBM Fluid Query, a software feature in Big Blue’s PureData portfolio.
Fluid Query could be described as super-charged data federation, specifically designed to enable the blending of data from multiple types of data stores, including relational data systems such as traditional warehouses, plus the panoply of Big Data stores that continue to proliferate. Fluid Query makes use of standard SQL constructs, such that anyone with good SQL skills can make use of this tool for facilitating analysis.
Rich Hughes, IBM Marketing Program Manager for Data Warehousing, explains: “Fluid Query is the physical bridge whereby a query is pushed efficiently to where the data resides, whether it is in your data warehouse or in your Hadoop environment. Other benefits made possible by Fluid Query include: better exploitation of Hadoop as a ‘Day 0’ archive that is queryable with conventional SQL, combining hot data from PureData with colder data from Hadoop, and archiving colder data from PureData to Hadoop to relieve resources on the data warehouse.”
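The general idea of blending hot and cold data with ordinary SQL can be sketched with an analogy. The example below does not use Fluid Query or any IBM interface. It relies on SQLite's `ATTACH DATABASE` statement from Python's standard library to put two separate databases behind one connection, with the table names, columns and figures all invented for illustration. One database stands in for the warehouse and the other for an archive store.

```python
import sqlite3

# Stand-in for the warehouse holding recent ("hot") rows.
conn = sqlite3.connect(":memory:")

# Stand-in for a separate archive store holding older ("cold") rows.
conn.execute("ATTACH DATABASE ':memory:' AS archive")

# Hypothetical tables with the same shape in both stores.
conn.execute("CREATE TABLE main.sales (sale_year INTEGER, region TEXT, amount REAL)")
conn.execute("CREATE TABLE archive.sales (sale_year INTEGER, region TEXT, amount REAL)")

conn.executemany(
    "INSERT INTO main.sales VALUES (?, ?, ?)",
    [(2015, "East", 120.0), (2015, "West", 80.0)],
)
conn.executemany(
    "INSERT INTO archive.sales VALUES (?, ?, ?)",
    [(2012, "East", 100.0), (2013, "West", 50.0)],
)

# One standard SQL statement that spans both stores.
FEDERATED_QUERY = """
SELECT region, SUM(amount) AS total
FROM (
    SELECT region, amount FROM main.sales
    UNION ALL
    SELECT region, amount FROM archive.sales
)
GROUP BY region
ORDER BY region
"""

totals = conn.execute(FEDERATED_QUERY).fetchall()
print(totals)  # [('East', 220.0), ('West', 130.0)]
```

The analyst writes a single query and never copies the archive rows into the warehouse. A real federation layer has far more to handle, including query pushdown, security and differing SQL dialects, none of which this toy example attempts.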
The logic behind the Fluid Query concept recognizes that the days of marshaling all relevant data into a physical data warehouse are numbered. The modern world of data is simply too big, too unwieldy and too volatile to conform to such a rigid schema. Instead, organizations will increasingly look for ways to virtually manage the data assets they want to use for analysis. The warehouse will have a long tail, no doubt, but that tail is no longer wagging the dog of Big Data.
Nonetheless, the data quality problems of old will still lurk throughout all those reservoirs and will need to be addressed in the business logic and transformation formulae that get leveraged by tools such as Fluid Query. Careful is as careful does, and any data professional who plans to succeed will forever be on guard for human error. To quote the eternal Mr. Spock: “It is curious how often you humans manage to obtain that which you do not want.”