Not everything happens for a reason in the world of information management. Not every table or field in a database got where it wound up via some master plan. More often than not, a company’s information architecture has grown and evolved organically, like a sort of digital mycelium, spreading underground for years, ultimately providing the infrastructure for all manner of analytical insights to blossom somewhere down the line.
The obvious casualties of these “accidental architectures” (as companies like EMC and Talend are calling them) are the elusive goals of clarity and certainty. That’s why residential construction engineers take a vastly more disciplined approach when working with their architect counterparts. You wouldn’t want an accidental architecture for your three-story home, would you? No one in their right mind would want any such thing.
And yet these hodgepodge information architectures exist everywhere. You could argue that the Internet itself, in all its glory and splendor, is one gigantic accidental architecture. Oh, sure, there are many decidedly engineered pillars which help sustain the massive construct called the World Wide Web, things like DNS servers, for example. But still, only a fool would assert that the entirety of the Internet has been carefully designed for optimal efficiency.
There are, however, certain undeniable trends which drive the practical implementations of information systems. By no means is the challenge of heterogeneity new. Movements like Service-Oriented Architecture (SOA) were focused on bringing some rhyme and reason to the otherwise scatterplot world of information and application landscapes. Today, SOA is not widely discussed, but that’s because its guiding principles now largely rule the roost.
One of the most significant disruptions to the status quo some years back was the concept of a data warehouse. Business analysts had realized the hard way that querying the operational systems of the 1980s and ‘90s was simply not effective or efficient. Transactional systems were designed to transact, not analyze. Back in those days, processors were relatively slow, storage was rather expensive, and enterprise software designers were scarce.
Nonetheless, the data warehousing industry was born, and offered great promise. This was conceivably the dawn of the true “information architecture” because now companies were actively extracting data from operational systems, then loading that data into standalone warehouses which were specifically designed to provide a foundation for running reports and enabling analysts to do ad hoc queries.
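For the sake of illustration, here is a minimal sketch of that extract-transform-load rhythm in Python; the source and target databases, the “orders” table and the transformation itself are hypothetical stand-ins, not any particular vendor’s tooling.

    # etl_sketch.py -- a minimal, illustrative nightly extract-transform-load job.
    # The databases and the "orders" schema below are hypothetical.
    import sqlite3

    def run_nightly_etl(source_path="operational.db", target_path="warehouse.db"):
        src = sqlite3.connect(source_path)   # transactional (OLTP) system
        tgt = sqlite3.connect(target_path)   # standalone analytical warehouse

        # Extract: pull yesterday's raw transactions from the operational system.
        rows = src.execute(
            "SELECT order_id, customer_id, amount, order_date FROM orders "
            "WHERE order_date >= date('now', '-1 day')"
        ).fetchall()

        # Transform: reshape rows into a reporting-friendly form.
        facts = [(order_id, customer_id, round(amount, 2), order_date)
                 for order_id, customer_id, amount, order_date in rows]

        # Load: append into a fact table that analysts query for reports.
        tgt.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                    "(order_id, customer_id, amount, order_date)")
        tgt.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", facts)
        tgt.commit()
        src.close()
        tgt.close()

    if __name__ == "__main__":
        run_nightly_etl()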
But a funny thing happened on the way to that coveted 360 degree view of the enterprise. Consultants, vendors and end users alike eventually realized that the data warehouse is more like a mere mortal than the eternal being some had hoped it might be. Turns out that getting all that data consolidated in one place really does take a great deal of time, effort, process, software, hardware and personnel. And all that Extract-Transform-Load (ETL) work! Ugh!
And then there are the politics. Because of its great expense, detailed processes, personnel and network requirements, a data warehouse is a political thing. Getting access to it requires some clout. Managing to get your new data set included in the blessed warehouse takes even more political will. Actually driving the overall direction of the project? That typically takes place high in the chain of corporate command, often involving the CIO.
But the demand for insights throughout any organization has the persistence of surface tension: it’s always there, forever pulling. That’s one key reason why data warehouse appliances burst onto the scene back in the mid-aughts. Netezza (the brainchild of the inimitable Foster Hinshaw), Greenplum (a la Luke Lonergan), Vertica (from Dr. Michael Stonebraker) and a variety of other powerful-and-easy-to-deploy solutions began to proliferate.
You can do the math to figure out why: Too many executives got tired of dealing with the strains of the warehouse, and decided to go their own way. Data marts cropped up everywhere, often yielding targeted value for their champions, at the expense of enterprise-wide data quality. This silo scenario brought us right back to square one. The coveted strategic view of the company suffered another setback.
Enter the data federation vendors. Pioneers in that space like Composite Software (recently acquired by Cisco, the networking powerhouse — hint, hint) focused on creating what real-time data warehousing visionary Michael Haisten long ago dubbed an “enterprise backplane.” Via data federation, end users could get access to key data sets without disrupting operational systems, and without necessarily needing to connect to the data warehouse.
Formerly referred to as Enterprise Information Integration, this practice hit its stride as organizations tried to rein in all those appliance-borne data marts. Upstarts like Denodo Technologies got into the game, and even the venerable Informatica (which built its revenue streams on ETL) ultimately got the federation religion. Other players, like Quest with its Toad line of products, worked on delivering a virtual switchboard for data management.
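To make the federation idea concrete, here is a minimal sketch, assuming two hypothetical regional order systems: a thin virtual layer fans one logical query out to the systems that already hold the data and unions the results, with nothing copied into a central warehouse first.

    # federation_sketch.py -- a toy sketch of data federation / virtualization.
    # The two regional order databases below are hypothetical; the point is that
    # the data stays where it lives and the view is assembled at query time.
    import sqlite3

    SOURCES = ["orders_emea.db", "orders_amer.db"]   # live operational systems

    def federated_query(sql):
        """Push the same query to every source and merge the result sets."""
        merged = []
        for path in SOURCES:
            conn = sqlite3.connect(path)
            try:
                merged.extend(conn.execute(sql).fetchall())
            finally:
                conn.close()
        return merged

    # One logical view of recent orders, stitched together on the fly.
    recent_orders = federated_query(
        "SELECT order_id, customer_id, amount FROM orders "
        "WHERE order_date >= date('now', '-7 day')"
    )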
Fast forward to today, and another range of innovations has further paved the way for a federated future. The speed of NoSQL engines (DataStax boasts a million writes per second), coupled with parallel processing and multi-core chips, has opened the door to what might just be called a new age in information architectures. This is no small thing.
At the cutting edge of this new era, we find companies like EnterpriseWeb and Pneuron. The former has created a fully dynamic, just-in-time data and application fabric, which can be used to create a real-time and flexible information architecture. The latter has developed an exquisite platform for bolstering so-called accidental architectures with as-needed services like data quality, risk management and other staples of IT functionality, all delivered as, when and where they’re needed. EnterpriseWeb’s Dave Duggal calls this kind of approach “late binding.”
And so, the waning of the warehousing era gives rise to the real-time architecture. This has massive implications for how organizations can and should invest in data management software, hardware, personnel and services. As the former co-host of DM Radio, Jim Ericson, used to say, it’ll be “horses for courses” going forward, meaning each company will create its own unique mix of information systems.
Perhaps the best news out of this whole trend is that we no longer need to worry about force-fitting some root-level normalization onto our information architectures. Sure, they got here largely by accident, but we can now forgo shoving round pegs into square holes. We can start leaving data where it RESTs, call upon it as needed, and focus on getting things done on the business side of the house.
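As one last illustrative sketch, here is what “leaving data where it RESTs” can look like in practice, assuming a purely hypothetical billing service endpoint: no nightly load, just an on-demand request to the system of record when a business question actually comes up.

    # rest_in_place_sketch.py -- calling on data where it lives, as needed.
    # The endpoint and response shape are hypothetical; only the pattern matters.
    import json
    import urllib.request

    def fetch_open_invoices(customer_id):
        """Ask the (hypothetical) billing service for one customer's open invoices."""
        url = (f"https://billing.example.com/api/customers/"
               f"{customer_id}/invoices?status=open")
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read())

    # Invoked only when the question arises -- not as part of a batch load.
    # invoices = fetch_open_invoices("C-1042")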
It’s a new dawn. It’s a new day.