Inside Analysis

Voracity and the Philosophy of Data

This interview is part of The Bloor Group’s research program, Philosophy of Data, which has been underwritten by IRI, The CoSort Company.

Eric Kavanagh: Ladies and gentlemen, hello and welcome back once again to A Philosophy of Data. It’s yours truly, Eric Kavanagh, today sitting with David Friedland from IRI, The CoSort Company, the folks who have been kind enough to sponsor our whole endeavor here on the Philosophy of Data. We’re going to talk about different aspects of data management today and try to understand how organizations can revisit their information architectures and their policies to take advantage of all this new fun stuff in the industry, all this new big data for example, while also taking account of their small data, which I think we all know is what runs the businesses today. So we’re going to talk about a number of different things, but first of all, David, welcome to the show.

David Friedland: Thank you, Eric. I am glad to be here.

Eric Kavanagh: Sure thing. So I’ll just drop my thoughts on the Philosophy of Data and see how it relates to data management, and we’ll kind of take it from there. It seems to me that the neat thing about philosophy is that it really challenges you to rethink your presumptions, theories, and conclusions, take a hard look at your view of the world, and try to better understand which direction you should go in the future to achieve what you’re trying to achieve. So in the world of data, it’s highly applicable right now, I think, because of many different factors: social, mobile, analytics, cloud. All of these forces are coalescing right now and frankly disrupting the status quo.

In my opinion, philosophy is highly relevant to data management as a practice these days because we have these interesting disruptive forces that are fundamentally changing how data gets used, what it is used for, and what it generates for the business at the end of the day. So I’ll turn that over to you to give us some thoughts here, because let’s face it, you’re a veteran in the field. You’ve been doing this stuff for decades now with IRI, which is best known as The CoSort Company. Give us your thoughts on the changing nature of data management and how you guys are trying to address that.

David Friedland: Well first, I agree with what you said about the new opportunities that big data is providing, but also that it is still small data running businesses today. It is increasingly clear, though, that more and more people expect big data to come in and run businesses, especially if they can find new ways to leverage it, like in data lakes. Anyway, I really do agree that philosophy is relevant to all of it, and I guess we’ll be talking through it as you have in the earlier interviews with our colleagues.

The tricky thing is when different people have different philosophies. Trying, as a data management vendor, to accommodate those philosophies is the hard part, because those philosophies can manifest in the ways that our users collect their data, store and process it, protect and look at it, hold on to it and disseminate it, pretty much everything. And in the 25-plus years I’ve been around IRI and CoSort, we’ve seen and heard a lot, so we just know, as we release our new data management platform, that it needs to support all of that. That’s the IRI Voracity platform, and it has to provide the centrality and flexibility to handle data throughout its life cycle, whether it is big or small, in the ways that people, companies, and even governments need. But I guess we’ll be moving through all that, and we can talk about philosophy along the way too.

Eric Kavanagh: I think platform really is the key, because without a platform, let’s face it, you have a whole array of disjointed systems, and that is what causes many of the issues that companies encounter with things like data quality or even data readiness, even being able to get data to the right person at the right time. It seems to me philosophy can help out because, in a way, philosophy helps you define policies, and policies are what govern behavior around data, right?

David Friedland: Exactly, and we all know that policies are a huge part of the data management lifecycle. They play a role in everything, at least everything our platform is addressing: data discovery, integration, migration, governance, and analytics. Those are the five main activity pillars of Voracity, if you will. Although I guess when you talk about policies, governance is the one that maybe most people think of first. It’s the enterprise information and data governance architects who make the rules: policies, let’s say, about finding data in the discovery phase or protecting it in the governance phase. Policies also speak to how, let’s say, the data warehouse architect has to do his transformations, or how a database administrator is going to model or migrate his data, replicate it, or even subset it.

Ultimately though, it’s the business user who also needs to have policies with respect to how they’re going to analyze the data. They have to think about what they’re trying to learn from it, so policies can dictate that too and maybe even what they’re allowed to learn from it, right? Especially in the area of big data and big brother. So again, that takes us back to the philosophy of data and how it’s governed. In the old days, the owner of the data was the one who paid to put it there, right? But I’m not sure that today, people whose data has been collected feel the same way.

Eric Kavanagh: That’s interesting. It’s a good point, and I think it really does speak to the importance of policy, of having policies in place that people can understand: they can read what the policies are and know what the policies do. And of course that’s what governance is all about. So governance is about understanding which policies need to be in place for issues like privacy, for example, which are really important, certainly for organizations that are regulated. But also just in general, policy speaks to the manner in which people are supposed to use data. And I think that’s where the philosophy comes in. You really do need to be thoughtful as you design policies and as you manage them over time, right?

David Friedland: Not only thoughtful, but compliant. The data handling policies that you have may or may not square with the philosophy of a customer, or with the rules of an auditor who has to enforce the data privacy laws, especially in Western markets right now, or all of the other regulations that are applicable to the collection of data. As Gwen Thomas knows, IRI was early to that party when we started masking data in flat files about ten years ago, and now we’re in the middle of a pretty healthy industry protecting personally identifiable information. Now the sun is rising in the Pacific Rim too, where they’re finally aligning policies and tools with data protection mandates and practices.

Eric Kavanagh: Yeah, and I have to think, having studied regulations specifically in financial services and health care, that regulators are going to be a lot happier working with you if they can see that you have made an effort to define policies in accordance with those rules and regulations. Which means, again, if you have a platform and you can point the auditor or regulator to your policies, that’s going to go a long way in your favor, right?

David Friedland: It is, so it’ll be important to have those policies effectively codified in your processes and in your platform. That is easier said than done, but we see in Voracity, our platform, for example, how much is involved. Off the top of my head, starting with discovery, we’re talking about policies you might put in place for data and forensic metadata searches (who gets to find what), or for the data models or schemas that are built to tie those things together. Are you sure, for example, that you want to match a patient up to a disease?

There are ramifications to those decisions. That kind of stuff can even play out in data profiling or in diagram decisions that reflect what’s been built. But the even bigger deal in Voracity is the data classes and the class libraries that our users are defining in the discovery phase, because those will be the precursors to the transformation and data protection rules that get applied to those data classes. So you have to know what your policies are in order to set up those data rules in the first place, and that goes back to your philosophy about the use of data and its security. But even more mundane stuff can get caught up in philosophical and policy debates.

How you manage your data class libraries and your other metadata in Voracity, for example, like the IRI data definition files, the IRI job scripts, or the entire ETL workflows and everything else that might be in the workflow: all of that is at least subject to access questions, if nothing else. Then there’s the DDL, the SQL procedures, batch files, scheduling, even team sharing details, artifacts from Hadoop, let’s say performance or audit logs, whatever else. It might matter because you might have policies on how those assets are published or persisted. Let’s think also about lineage analysis, or compliance auditors who want to see how data was modified.

In Voracity, we use query-ready XML logs which contain all that run-time information, metadata, and application detail, because you have to know who amassed what, where, and when, for example. And looking at the self-documenting IRI script portion of the log, at least, you’d be able to glean why a function was performed. It is really helpful when the logs make it easy to review those things (as well as performance).
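To make the “query-ready” idea concrete, here is a minimal sketch of what interrogating such a run log might look like. The XML layout below is entirely hypothetical (IRI’s actual log schema isn’t shown in this interview), but it illustrates how a structured log lets an auditor ask who ran what, when, against which sources:

```python
import xml.etree.ElementTree as ET

# Hypothetical run-log layout; the real IRI schema will differ.
log = ET.fromstring("""
<run user="dba01" started="2015-06-01T02:00:05">
  <script>nightly_etl.scl</script>
  <source name="ORDERS" records="1200000"/>
  <target name="orders_masked.csv" records="1184302"/>
</run>""")

# Who ran which job, and when
print(log.get("user"), "ran", log.findtext("script"), "at", log.get("started"))

# What was written where: the kind of question a compliance auditor asks
for target in log.iter("target"):
    print("wrote", target.get("records"), "records to", target.get("name"))
```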

Eric Kavanagh: I really like this concept of self documentation because you’re not leaving something up to the end user to manually document it. That’s been one of the big issues over time in software development, I would say, since the dawn of software development itself, right? It’s very difficult to document everything you’re doing unless it’s done automatically, and that’s what you’ve baked into this platform, right?

David Friedland: Exactly, and it is a big part of our philosophy, too, as a vendor. Not for our source code necessarily, but for what people are doing with our product. We’ve always believed in open and explicit metadata to define people’s data and the manipulations they do on that data. That mantra goes back at IRI to 1992, when we created CoSort’s sort control language from the syntax of the VAX/VMS sort utility, because Digital had probably the simplest 4GL ever made. That makes it easy to read the job scripts and the data loads, so we revived and extended that language with things like SQL constructs, formatting, even data masking functions, and all the parameters that support data integration today, as well as data cleansing and reporting. So now, everything that we do with data really is documented in the modern-day version of that language.

Even if you’re working with Voracity job wizards or diagram and dialog editors, at the end of the day they create those scripts, and the scripts are really easy to understand and work with, and our users love that. They love the choice of scripts versus diagrams, especially when they know that the opposite, the dark world, exists: a selfish philosophy behind not having scripts that they can read. In fact, some vendors go so far as to obscure their metadata on purpose. They design it to be hard to read or access, or maybe they’ll even call it a “trade secret,” because they really prefer that you’re addicted to their GUI. Without that metadata access, it’s also hard to convert jobs, or even to understand what’s going on, when users want to leave to go to another product. So we believe that metadata has to be open in order to reveal how data gets defined and manipulated over time, especially in a governed infrastructure, and we believe that the metadata has to stay open for users’ freedom of movement, too, if they want to switch vendors.

Eric Kavanagh: Yeah, right, and those are several really good points you made there. As I think about being a regulator, for example, putting on that hat, or being someone on the client side who comes into the organization, they need to be able to find out what information is out there. That’s called “discovery,” right? If you have the kind of platform that you’re describing, then a new person can very quickly, at least one hopes, think about and look through various aspects of the information landscape, but also see those policies to understand how the data is being used. That’s going to put the person on a nice clear path to understanding what the data is, how it’s being used, and whether or not it’s aligned with current business needs and regulations, right?

David Friedland: All of that is so important, and that’s why, in the area of data discovery, we try to bring all those aspects together. That means having a way to search and learn what kind of data you have and then automatically classify it into the open metadata infrastructure that I just described, so that you can do what you need to do with those data class libraries like I mentioned before. That of course has to be possible regardless of where the data is sitting and where it came from, as well as the format that it’s in. You need to understand the content and the character of your data, meaning the kind of value that each entity contains, as well as the structure and relationships of those entities.

Again in Voracity, the IRI platform, you start with searches for values that might be, let’s say, in a lookup file, or that match patterns defined by regular expressions, or that reach the matching threshold of one of our fuzzy matching algorithms. We also build ER diagrams and statistical profiling reports on all this stuff, and even dump out some forensic metadata on dark data (unstructured text-based documents that are on the network). So while you’re searching through everything, you’re getting your data classified and you’re able to access what’s in there. You can also tie in metadata repositories, so you can scan attributes and relationships of information and then have it all converted into the IRI open metadata standard.

In that way, by having these data definition files in our Eclipse IDE for Voracity, you can work in a common way with all the different data elements in your data integration, migration, governance, and analytic jobs.
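As a rough illustration of the three search rules he names (lookup values, regular-expression patterns, and fuzzy matching against a threshold), here is a minimal, hypothetical Python sketch. The class names, lookup set, and threshold are invented for the example, and IRI’s actual classifiers are of course far more elaborate:

```python
import re
from difflib import SequenceMatcher

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # regex pattern rule
CITY_LOOKUP = {"boston", "chicago", "denver"}         # lookup-file rule, inlined here

def fuzzy_match(value, candidate, threshold=0.85):
    """Fuzzy rule: classify a value when its similarity reaches a set threshold."""
    return SequenceMatcher(None, value.lower(), candidate.lower()).ratio() >= threshold

def classify(value):
    if SSN_PATTERN.search(value):
        return "ssn"
    if value.lower() in CITY_LOOKUP or any(fuzzy_match(value, c) for c in CITY_LOOKUP):
        return "city"
    return "unclassified"

for v in ("042-68-4425", "Bostn", "widget"):
    print(v, "->", classify(v))
# 042-68-4425 -> ssn, Bostn -> city (fuzzy hit), widget -> unclassified
```

Once a value lands in a data class this way, downstream transformation or protection rules can be keyed to the class rather than to individual fields, which is the point he makes about class libraries feeding later phases.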

Eric Kavanagh: You’re reminding me of some of what Rick Sherman said in his interview, and I always love Rick because he’s such a pragmatist. He’s been around doing this stuff for at least 30 years, so he’s seen wave after wave of technology roll out, each time with a new set of promises, or sometimes I guess with the same old promises just spun in some new way. And I think his exact quote was, “We keep fooling ourselves, we keep trying to run after solutions that are going to solve world hunger, as opposed to recognizing the fact that data is hard.” That cracked me up.

Data is complex, and we need to accept that and work on it instead of just going for silver bullets, with data lakes of course being the latest trend, right? He said there’s a lot of value in data lakes, but really, the data lake doesn’t eliminate all the other hard work. It’s just a large set of data which then enables other complex data operations on the other side of the lake, if you will. And that’s really what you’re talking about here: having this lingua franca to be able to discover, assess, and then align data sets to your policies. That’s how you get compliance, but it also helps you achieve meaningful use of that data in your organization, right?

David Friedland: That it does. In the data lake context especially, where everything is so murky and undefined, at least initially, you’ve got to lend structure to that data, which of course is something that metadata will impose. And I think he’s right also about the world not being simple, because of the amount of data coming in, the variety of it, the velocity, the veracity of it, and that makes the discovery of it and the definition of the metadata, the lingua franca, all the more difficult.

That said, to leverage all this data, big and small, and all of its permutations, data officers and information architects have to be able to restructure or remodel that data, and then harmonize it as they stage it and govern it in their analytic support system, which is usually an enterprise data warehouse or a logical data warehouse. So that’s why we added all those discovery and classification tools, and the metadata definition tools that feed off them, in our Eclipse GUI, where you can design and deploy and manage the jobs, and why we use that same metadata across the board for ETL, cleansing, replication, masking, reporting, even schema migration. That’s also really where you’re making data integration possible, I think, and a whole lot easier.

Eric Kavanagh: Let’s talk about the integration side of the equation. Once you’ve discovered what you’ve got, then you start looking at the business needs for that data: how do you integrate it into the operational systems, for example, or even the analytical systems? This is the next big step, right? It’s integration. You don’t want to do that before you’ve done some reasonable level of discovery and understanding, of course, which is an ongoing process. But on the integration component itself, can you talk about what you’ve built into the platform to enable this comprehensive process of integrating all this data?

David Friedland: Since I touched on the discovery and metadata aspects of it all, I’ll try to answer this one in terms of how Voracity can actually integrate the data. The short answer is through ETL workflows that leverage our high-performance extraction engine and our high-performance CoSort transformation engine, or Hadoop engines for transforms, since we support both, especially because performance at volume has always been our sweet spot. Maybe it’s time to digress a little bit into our company history about that sweet spot, because I think it might throw some light on how we go about large-scale data integration, and maybe also on our philosophical path in the market.

Let’s start with the proposition that sorting is a big deal in data integration, right? We can maybe get into why that’s so a little later. The history lesson now: about 30 years ago, IRI created the Unix sort market with CoSort, our first product, and we were then moving big sort jobs off the mainframe. Remember I said it was 1992 when the CoSort SortCL program, that metadata language, was launched, and it started growing its transformation and reporting legs, becoming really the first program in Unix with parallel sorting, joins for flat files, and some advanced aggregation features, so much so that Hyperion made SortCL, the CoSort product, the ETL engine for their e-commerce applications back in 1999.

From that point on, other ETL and data warehousing solution providers were starting to embed our technology. But back then, we still hadn’t built our own full-blown integration package like the big companies have. Instead, we introduced IRI FACT, the fast extract parallel unload tool for very large databases, around 2004, I think. With that, and the presorted bulk loads we could do because of the CoSort engine, we had pretty much the fastest command-line ETL pipeline on the market.

Most of our users weren’t really GUI-minded anyway in those days because of their focus on performance, and we didn’t think much of the proprietary IDEs, the GUI development environments that were around back then. So we stayed focused on reaching a “terminal velocity” in sorting, as well as transformation and reporting, so that we could help people speed up or replace their BI jobs, their ETL operations, and even their SQL procedures, where we could transform a lot faster than the databases could. And actually we’re still doing that, and we’re still learning from the DBAs and the data warehouse architects and the data miners who’ve used our engines for years, because they like their speed, they like our company’s size and agility, and I think their bosses just like our prices.

Anyway, for the last several years, we’ve been spinning new products off that SortCL language in CoSort: for data migration, which is NextForm; for data masking, which is FieldShield; and for test data, which is RowGen. And then we finally went down the GUI road with Eclipse, which was becoming very popular then. So we used it to set up data integration workflows, the traditional ETL look and feel, the transform mapping diagrams, and we were also able to build job wizards and do it with good metadata management facilities that leverage that CoSort metadata. Of course, all that helps with data discovery, integration, migration, governance, and analytics, which really are the five major aspects of the platform.

Now that Eclipse, on which Voracity’s built, has become so rich, we think it’s going to help Voracity leapfrog other data integration tools, at least in ergonomic terms, because there are so many different ways you can design jobs in an Eclipse environment. So anyway, long story short, that’s how the technical stars aligned behind what we’re doing in data integration. Commercially, I perceive something similar in terms of star alignment at the same time, because we are seeing a market that wants data integration and analytic solutions together, but wants them faster and cheaper and easier to use. The megavendor ETL tools for data integration are too slow, generally, too hard to use, and way too expensive, if they have everything inside at all; that’s the feedback we’ve gotten so far.

We think that Voracity’s the only data integration platform that’s poised to hit all these issues at the same time, and that’s why we are pretty sure it can be used to help people speed up, or even leave, their existing ETL tool, with some automatic job conversion our partner makes available. So that’s the story in a nutshell about where we are on the data integration side.

Eric Kavanagh: You’re basically hinting at all that price-performance stuff, right? That’s always been one of the big issues that comes into play, especially over time, because a lot of times, and I know Rick has talked about this on DM Radio shows several times, companies will not fully appreciate the, let’s call it “legacy,” if you will, of an ETL process. Once those processes are in place, a lot of times it’s difficult to change them or even, frankly, to understand exactly what they are. There’s no documentation, for example. So price performance is one of those creeping problems that will come up on companies, and if they’re not strategic about dealing with all this moving of data around, and accepting the reality that you have to feed people the data that they want, you’re going to encounter some really unpleasant price issues down the road. So could you talk about price performance and how you managed to bake that into the platform as well?

David Friedland: Yep, I can, and you’re right, it’s critical that companies have the price performance they need, or the performance they need, let’s start with that, just to meet those mission-critical SLAs, the service level agreements, right? Especially when their inbound data is growing and the number of hours in a day is not. Let me just pick the industry scab here for a minute, because when you run jobs in the platforms that are around today, most of which are compiled Java programs or transforms running inside databases, they’re going to be less efficient. You’re dealing with multiple IO passes through the data, you’re dealing with partitioning, memory constraints, algorithmic weaknesses, things like that, and things are going to get tight.

Then on the price side, the CIOs are spending six to eight figures on appliances or in-memory databases, really expensive ETL, or on Hadoopers throwing Apache projects and clusters at what is really a software problem. So most of the world is either running inefficiently or overpaying, even if they can run their jobs at all, and that puts the problem of big data beyond reach. So for IRI and its mission for Voracity, we’re trying to unify data discovery, integration, migration, governance, and analytics with speed and price as the big attractions, at least until we become a megavendor and jack up our prices too, right? In the meantime, I’d say that price performance continues to be our most obvious competitive advantage, and that we’re going to retain that in the data integration market, and even in the big data market as well, now that Hadoop executions are running seamlessly in Voracity.

Remember too that the ETL we do is pipeable, where every piece of the pass is optimized in terms of IO and algorithms, and there are a lot of efficiencies to be had in the way that CoSort consolidates transforms and reports, puts them all together. So doing all that, and keeping a low overhead in terms of marketing and stuff like that, is how you can bake price performance into a solution like this.

Eric Kavanagh: Speaking of consolidation, that’s a pretty big issue you just mentioned. Can you shine some light on this particular aspect for those who might want to understand what you mean by consolidation? It sounds intriguing, but when the rubber meets the road, what’s actually happening there?

David Friedland: For instance, a Voracity designer might build an entire data replication or reporting job for data in a Mongo database using just one transform block that uses CoSort’s SortCL program. You’d have CoSort use its native Mongo data handling and a SortCL language script to specify a bunch of targets with whatever formats and layouts were needed in that example. Another example would be an ETL job where, let’s say, Oracle data is getting dumped out in parallel into a flat-file format by FACT, our fast extract tool, and that’s getting piped into a CoSort sort/join/aggregate/filter transform job, which also has data masking functions going on, even embedded BI frankly, in that same script. One of the targets could also be a presorted load file which then gets piped into a database load utility.

This whole piped ETL thing, which is great because it consolidates everything, isn’t even the whole story with respect to consolidation. It’s really what’s happening in that IRI CoSort SortCL job script under the hood in Voracity; that’s the transform block. In that same IO pass through that step, you’ve got a bunch of commands in that script that are doing many different transforms and mappings all at once, and pumping out custom detail and summary reports, XML files, or HTML-embedded web report files, as well as database table targets, or hand-off files or tables for BI tools to ingest. Even data in memory can go directly out of this process into, let’s say, a BIRT dashboard that will display in our Eclipse GUI when that job runs.
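To picture what a single-pass, multi-target job means in practice, here is a minimal procedural sketch in Python. It is not SortCL (whose syntax isn’t shown here), and the file names and rules are invented; it assumes a small three-column orders.csv (region, card number, amount). The point is only that one read of the source feeds a filter, a masking rule, a detail target, and a summary target simultaneously, instead of four separate passes:

```python
import csv
from collections import defaultdict

totals = defaultdict(float)

# One pass over the source feeds the filter, the masking rule, the detail
# target, and the summary aggregation all at the same time.
with open("orders.csv", newline="") as src, \
     open("detail_masked.csv", "w", newline="") as detail:
    out = csv.writer(detail)
    for region, card, amount in csv.reader(src):
        amt = float(amount)
        if amt <= 0:                           # filter rule
            continue
        masked = "*" * 12 + card[-4:]          # masking rule: keep last 4 digits
        out.writerow([region, masked, amt])    # target 1: masked detail file
        totals[region] += amt                  # feeding target 2: summary

with open("summary.csv", "w", newline="") as rpt:
    csv.writer(rpt).writerows(sorted(totals.items()))  # target 2: summary report
```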

This ability, or this wide-ranging set of abilities, to combine so much together in the same step and in the same product is what makes CoSort, and therefore Voracity, very powerful, even though very few people have been able to get past their cognitive dissonance about stuff like this because they’re so used to these one-step-at-a-time ETL tools. And there are even other permutations of consolidation we could talk about in Voracity, like our Hadoop integration, because the jobs you define in the default language of CoSort, SortCL, those scripts also run seamlessly in MapReduce, MapReduce 2, Spark, Storm, or Tez. So that means no additional coding when you have to, or want to, use Hadoop, and that’s really good for folks with Hadoop clusters who don’t want to keep an army of geeks around that long, right?

Then another example of consolidation would be the ability to combine the design and deployment of other programs and languages in this environment, because the IRI Voracity Workbench, the GUI in Eclipse, supports SQL procedures and C programs, Java, batch, Python, Perl, the R language, whatever else you want to plug into Eclipse and make part of your workflow. It’s all integrated into that consolidating design and runtime environment, so they run in the same workflow, and you really have the ability to make a whole application happen in that one IDE. So those are higher-level consolidation examples. I hope they were relevant too.

Eric Kavanagh: Sure. That’s all really interesting stuff, and I think you’re talking again about the value of a platform, but not just any platform, right? Because what you’re really digging into here is this whole concept of write once, deploy many times, across many different tools, and that really, to me, speaks to the heart of the value in this approach you’ve taken, because there are so many options these days. If you’re a new user in a company, for example, and you want to do some analysis, there may be any number of tools you’d like to bring to the table. So if there’s something at the foundational level like what you’re describing, that just enables end users to get to the job of analysis much faster, without having to go back and rewrite code that is specific to a particular tool, right?

David Friedland: Exactly. The whole point is to be a one-stop shop, so you can bring all that stuff together. You’re handling all the data, but you’re also getting all the functionality that those different specialty tools you’re talking about would normally have to bring into the mix. So we’re really, as much as possible, allowing people to avoid all that hassle, cost, and complexity, which entails administrative and financial overhead too. Having to design and maintain jobs in different tools and languages is not easy, and by having this consolidating platform, you eliminate all that.

Eric Kavanagh: That’s good stuff. I feel like exploring sort as a function, since you teed it up earlier. It’s my understanding, and correct me if I’m wrong, that the sort function is a foundational or fundamental function of a database, and it tends to be the most costly. So I’m guessing that is frankly why you guys built your platform around sort: to enable this high-powered capability from CoSort, which brought market attention to you many years ago, and which really enables the rest of your platform to achieve the price performance and the speed that you talk about. Is that about right?

David Friedland: It is. Really, the same value at the heart of CoSort is at the heart of Voracity, because CoSort is its default engine, and the CoSort SortCL program I mentioned drives most of the manipulations that people are going to do in the Voracity platform. So yeah, sort speed is a key. It’s a key to database and ETL performance, as you alluded to, because sorting is a very common transformation, and it’s normally an expensive one. It is done by most database load utilities as well, so when our CoSort engine can be used to sort files ahead of the load, you can bypass the database’s slower sort function and basically get those loads to happen a lot closer to IO speed, and that changes the equation for ETL and ELT operations as well.

What I think is more interesting, though, is that CoSort and Voracity jobs run the sort-driven transforms together. That means your sorts, your joins, and your aggregations run in the same job and in the same IO pass, unlike other approaches which put them in separate steps or stages. So you’re not only combining those together, but combining them with everything else that SortCL will do as well, like filtering, masking and encrypting data, cleansing it, validating it, converting it, remapping, and reporting. And the more you do at once, the more efficient and productive your whole system is going to be. In fact, it’s so much more efficient that you probably will need a lot less hardware.

I think we’ve probably kept a lot of people from needing not only a legacy ETL tool (as they’ve told us), but from having to buy a new database appliance, which would again just speed up an inefficient approach, and we’ve even let people delay the rollout of Hadoop, because in many cases a multi-core server running a CoSort or Voracity-type engine is good for operating on several terabytes at once. Beyond that, you can get Voracity to automatically run the same jobs in Hadoop, as I mentioned before, which will distribute the load across multiple nodes if you are rolling out a Hadoop cluster. But no matter how you look at this, and whatever you’re doing with big data at least, sorting is a key, and that’s why I think our company has done so well for so long.

Eric Kavanagh: Right. In this entire process that we’re talking about, by which data is identified, managed, used, et cetera, there are all sorts of steps along the way that you’re describing. You were getting into some nice gritty detail there. What I hear you saying is that there are comparisons between what CoSort is doing and, for example, what Hadoop has done for the world of parallel programming, which is all great stuff, but there are taxes. There’s a tax to doing a sort, for example. What you can do with CoSort, if I understand it correctly, is optimize that part of the process, which then expedites the rest of the whole timeline, right?

David Friedland: Right, and continuing the discussion about the value of sorting: especially when you can use a sort like CoSort to bypass or replace the default sorts that come with other packages and weigh them down, because those packages are not terribly good at the sort they shipped with. For example, in the database case I mentioned before, with the sort that’s in the load utility, you might want to do an online reorg, which is what a lot of people do. But there’s a better way: if you use CoSort to sort the data externally, outside the database, and use our offline reorg job wizard in Voracity, you can speed up the whole reorg process.

You’re doing an offline reorg at that point because you’re again bypassing the database: you’re doing a fast unload with our FACT tool to take the data out of the database, you’re then sorting with the CoSort engine, and then you have a presorted bulk load to go back in. That’s an offline reorg. And when the tables are in order, the query responses are going to get faster, because you’ve kept those tables sorted on the most common query key, right? The join key. So there’s a lot of efficiency to be had when you’ve got a better sort going on, for the database but outside the database.
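For readers who want the shape of an offline reorg in miniature, here is a toy sketch using Python’s built-in sqlite3 as a stand-in for the real pieces (FACT, the CoSort engine, and the Voracity wizard are not shown; the table and data are invented). The three steps are the point:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (id INTEGER, customer TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)",
               [(1, "zeta", 9.0), (2, "acme", 5.0), (3, "midco", 7.0)])

# 1. Fast unload: pull the rows out of the database (FACT's role)
rows = db.execute("SELECT id, customer, amount FROM sales").fetchall()

# 2. External sort on the most common query/join key (CoSort's role)
rows.sort(key=lambda r: r[1])

# 3. Presorted bulk reload: the table comes back physically ordered on that key
db.execute("DELETE FROM sales")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
db.commit()

print(db.execute("SELECT * FROM sales").fetchall())
# [(2, 'acme', 5.0), (3, 'midco', 7.0), (1, 'zeta', 9.0)]
```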

The same thing is true in the BI layer, because when you take the sort out of there, and you do the sorting and the other types of transforms to prepare data centrally, which is part of that whole data blending thing we’ll probably talk about, the BI tool only needs a centralized, presorted, and smaller subset of the data. Again, you’ve done the presorting and joining and aggregating and filtering for them so that they don’t have to. Just like most of the ETL tools, BI and analytic tools could sort and transform internally, but they kind of suck at it. Even with big input data, if they try to open up a big file or a table, half the time most of those tools will just crash. So it’s really better to do data preparation outside the BI layer and outside the database where you can, and even to accelerate somebody else’s slower ETL tool using our stuff.

Eric Kavanagh: Right. It’s all reminding me again of Rick Sherman’s comments. As we talk about things like self-service BI, I think the bottom line is that there are “gotchas” lurking out there everywhere, and these are the things that affect a particular person in the organization, or a department, when they encounter some long-running process that they just did not expect, especially if you start to try to pivot your perspective on something, right? You realize that maybe you’re going down the wrong path and you need to prepare your data in a different way. Those are the times when a long-running sort process can really truncate the analytical thought process itself, right? That’s just not good.

David Friedland: Right. Like I just said, you’ve got to prepare data out ahead of that layer, ahead of analytics. And part of that preparation, what in fact Rick Sherman back in ’03 called “data franchising” and what today is called “data preparation” or “data blending,” is the filtering and the transformation of data for BI tools, where, again, sorting is a big part. And as you pointed out, it can in fact be a long-running process, and for most folks it’s a superlinear proposition: as the data double, the resource requirements to sort it can triple, and that can slow things down unless you’ve got a robust, commercial-grade sort.
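His scaling intuition is easy to sanity-check. In the idealized comparison model, sorting costs on the order of n log n, so doubling the rows slightly more than doubles the work, and real external sorts that spill to disk add extra IO passes on top, which is where growth closer to tripling can come from. A quick back-of-the-envelope in Python:

```python
import math

def nlogn(n):
    """Idealized sort cost model: comparisons grow as n * log2(n)."""
    return n * math.log2(n)

base = nlogn(10**6)
for n in (10**6, 2 * 10**6, 4 * 10**6):
    print(f"{n:>9,} rows -> {nlogn(n) / base:.2f}x the work of 1M rows")
# 1,000,000 rows -> 1.00x; 2,000,000 -> 2.10x; 4,000,000 -> 4.40x
```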

Eric Kavanagh: Right. That’s a good point. Let’s talk about data migration, because obviously integration is a core part of connecting different systems, but sometimes you’re going to want to migrate off an old system, it does happen, onto a new platform. Can you talk about the Voracity perspective on this part of the overall data equation?

David Friedland: Sure. Data migration is the third of the five Voracity pillars, if you will, and certainly the migration of data is part of a lot of people’s processes in and around data integration, not to mention the folks who want to use data migration so they can leave, or let’s say replicate, the data that’s in a database. They want to leave the database, or let’s say leave a mainframe, and repurpose the data they have in those older systems in another environment. Our roots in data migration come from our CoSort history in so-called “right-sizing,” or offloading JCL sort and report jobs, because those jobs required us to remap the files and the data types that were on the mainframe over to Unix and Windows. For example, that meant converting COBOL or VSAM-type files, or translating EBCDIC and packed decimal data into ASCII and numeric fields.

Now Voracity, in terms of data migration, has got to do that and a lot more, like map relational to NoSQL or HDFS, migrate data between cloud apps, remap telco or machine data, who knows? If we don’t have built-in support for data through our CoSort process types, as we would for, let’s say, XML and COBOL, or Mongo, what have you, at least we can move data through ODBC, which is another great technology for people who are doing data migration. What’s good about ODBC is that it’s ubiquitous. Many companies have ODBC drivers, and JDBC drivers for that matter, for legacy data sources like IMS and Pick on the mainframe, and newer ones too, like Hive and Marketo. Some of those ODBC drivers are faster than you might think.

Interestingly also, when data sources are unstructured, ODBC drivers that have been made for them will artificially impose a structure for queries and things like that. So in our case, Voracity can work with that ODBC-connected data source just as it would with a SQL database and its own ODBC or JDBC driver, so it really kind of structures the unstructured, and you can get a lot more use out of it. It is true that a generic driver like that, one that is structuring data for normal processing, can add overhead or cost to the connection, but I think the benefits outweigh that, because we are again talking about being able to centrally discover, integrate, migrate, govern, and analyze data that’s in proprietary or unstructured sources in a standard way, and that means getting all the sources into Voracity’s metadata infrastructure for all those uses.
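The interchangeability he describes is visible in code. With an ODBC driver installed, a consuming program looks the same regardless of what is behind the DSN. This minimal Python sketch uses the third-party pyodbc library; the DSN, credentials, table, and column names are all hypothetical:

```python
import pyodbc  # third-party: pip install pyodbc; requires an installed ODBC driver

# A hypothetical DSN: it could front a legacy source (IMS, Pick) or a newer
# one (Hive, Marketo). The SQL below is identical either way, because the
# driver imposes a tabular structure on whatever sits behind it.
conn = pyodbc.connect("DSN=LegacySource;UID=etl;PWD=secret")
cursor = conn.cursor()

for row in cursor.execute("SELECT id, name, balance FROM accounts"):
    print(row.id, row.name, row.balance)

conn.close()
```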

Otherwise, you’d have to use some other type of extractor to get flat files out of those sources if you can: something like our FACT unloader for Oracle in that situation; or, if it’s dark, unstructured data, our discovery wizard to find and pull values out; or something like a Cassandra COPY command for a NoSQL database. So you’d have to use something else if there isn’t an ODBC-type driver for it.

Eric Kavanagh: You kind of touched on something there that I really didn’t even intend to talk about, but I think it’s a very hot topic. One of the most vexing challenges facing organizations today is this simple concept, this word, called “the cloud,” because a lot of companies are looking to offload to the cloud, or at least get some of their on-prem systems up into a cloud environment. In this world of hybrid cloud, I tell people all the time, it can be a very unpleasant environment if you don’t think through what you’re trying to do and how to get it done.

I’ve heard some pretty serious horror stories of companies trying to use the cloud for, let’s just say, HR data, when all of a sudden they realize that because they didn’t go all the way in, they’re stuck in between. Now they have to pay this cloud vendor a bunch of money while they still must maintain their on-premise systems, and they have all sorts of consistency problems. There are data quality problems; there are just all kinds of things that come out of that. Can you speak for just a second about how you guys address that? Because what I’m hearing is that the Voracity platform really can serve as an excellent conduit to cloud migration. Is that right?

David Friedland: Right, and you do have to think about cloud migration when you think about data migration, for sure, and because of the issues you mentioned, which shouldn’t really come as a surprise to anyone. Even though cloud is the “newer paradigm,” it’s not that new, and you’re going to see these problems occur as it becomes more and more prevalent. But I think you can address these cloud problems, as well as the whole opportunity that cloud brings, in simpler ways that I think are faster and cheaper, too.

At least the first way that we think we’re going to do that is with an on-premise data migration and ETL platform, Voracity running on prem, that supports sources of data in the cloud, like I mentioned before in the migration part. When you’re moving data in and out of these cloud apps, it has to get converted format-wise, and when you’re doing ETL to the cloud, you don’t want to be worrying about an off-site data integration system that’s not talking to your on-prem system, which is kind of the problem you alluded to, or at least the cause of it. Cost-wise, we think we’ve got some game there as well, because we’re setting up Voracity as a subscription type of service, so it’s kind of like a cloud app in that way: you’re paying for an integration platform as a service whether it’s running jobs in Amazon or in your own data center.

We do intend to move this thing into the cloud next, and for that reason we’re working to support more web services and storage formats, S3 at the moment, but to us the point is that cloud is really a source and a target for data first and foremost, and that’s data that needs to coexist with on-prem data and the applications running on premise. So to send on-premise data up to the cloud, or vice versa, you need to have the necessary connections to bring that data down from the cloud and deal with it, or, as they say, move it up once we’re running in the cloud.

Eric Kavanagh: Right, and that’s a good point, because cloud really is just another source and/or target, right? It’s just like any other database or any other system. You have some connection with the cloud, and you mentioned S3, of course, and Amazon. They’re one of the more popular platforms out there, but there are a lot of other companies looking to get serious about that. We’re seeing Microsoft, of course, get very serious with Azure. They’re a bit late to the game, obviously, but what I’m hearing you say has to be good news for an organization that’s trying to migrate from legacy environments into what you could call this next generation of solutions, many of which are going to be based in the cloud, right?

David Friedland: That’s what I’m saying. You just need one data management platform with all those connections to cloud sources and targets set up, and that’s what allows people with either local or remote data in the cloud to manage it centrally in one place and move it back and forth between their data center and cloud servers.

Eric Kavanagh: Yeah, and I think that’s a good segue back into the whole concept of governance. We’re talking about data governance, as opposed to broader topics like IT governance. Data governance is something I’ve talked about on many different shows, and it seems to me that the standard operating procedure for many organizations tends to be trying to control data governance either at the source, where it’s stored, like the database, or at the target, meaning the application that some person is using. To me, both of those approaches are fraught with potential problems and challenges. It seems to me, correct me if I’m wrong, that the best way to handle governance is in a central location in between source and target, and that’s what we mean by a platform, right?

David Friedland: Right. A platform is that place where you can both discover the data and establish your data governance policies and enforce them, right? You have to effectively steward the data, which means enforcing those governance policies. So in Voracity, you can impose governance on pretty much everything you’re doing with data, again starting with discovery and then moving into integration and migration and up into the analytic layer. There is a lot of governance that can happen along the way in terms of the policies you’re going to apply, starting even with data searches, like we talked about in data discovery earlier.

Then again, it’s who owns the data and who’s taking responsibility for it, who’s going to be the one transforming it. How will they do that transform, and why will they transform it or protect it? Then, is there an audit trail? What is the audit trail? Where is it, and who’s in charge of that? Lots of questions to answer, especially when data is sensitive. Personally identifiable information is the first thing that comes to mind: it has to be masked in certain ways to comply with the governance policies of an organization’s governance team, or with the data privacy laws that are applicable to your industry. That’s just the obvious stuff. Voracity users have to be able to do that, and they can, with rules for creating and controlling their data and their data class libraries like we talked about before, which then get translated or expressed through global transformation and protection rules.
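As a rough idea of what “data class rules expressed as global protection rules” can mean, here is a minimal, hypothetical Python sketch. The class names, rules, and record are invented, and a real platform attaches rules like these declaratively rather than in hand-written code; the sketch just shows a class-to-rule mapping being applied uniformly:

```python
import hashlib
import re

def redact_ssn(value):
    """Mask every digit except the last four (a common masking policy)."""
    return re.sub(r"\d", "*", value[:-4]) + value[-4:]

def hash_email(value):
    """Replace the value with a consistent one-way hash (pseudonymization)."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

# Hypothetical rule table: data class -> masking function
PROTECTION_RULES = {"ssn": redact_ssn, "email": hash_email}

def apply_policy(record, field_classes):
    """Mask each field according to the rule for its assigned data class."""
    return {field: PROTECTION_RULES.get(field_classes.get(field), lambda v: v)(value)
            for field, value in record.items()}

masked = apply_policy({"name": "Pat", "ssn": "123-45-6789", "email": "pat@example.com"},
                      {"ssn": "ssn", "email": "email"})
print(masked)  # {'name': 'Pat', 'ssn': '***-**-6789', 'email': '<12-char hash>'}
```

Because the rule is keyed to the data class rather than to a specific table or column, every field classified as, say, “ssn” in discovery gets the same treatment everywhere, which is the global consistency he is describing.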

Then there’s data quality and master data. Those are important too, so that you can scrub and unify data in the right way according to policy and share that good new standard data with the right people, right? As opposed to the wrong people. So, again, there’s a lot going on in data governance, but the nice thing about having this platform, as you said, is having a central place to define and administer all of these policies through all of those activity layers, and having it all incorporated with the common metadata infrastructure and a kind of audit trail where you can see everything that’s going on.

Eric Kavanagh: It really seems to me that you cannot maintain a comprehensible and defensible data governance platform unless you have some form of centralized metadata management too, right?

David Friedland: Right. Like I was alluding to earlier, you need that common environment for creating and using your data layout metadata, your data class libraries, your rule definitions, your workflows, your job scripts, and even your runtime and audit logs, and I probably missed something in that list. But at least in IRI Voracity, all of those pieces are working together, and they’re syntactically unified by that common metadata layer I mentioned, based on SortCL. So when you have a common way to describe disparate data sources and different kinds of jobs, you can, for one thing, at least keep your head on straight, and for another, you can keep building new and different applications in the same way, in the same language. So that’s one big thing to think about with respect to centralized metadata management.

We also think that people need a central way to secure, share, and change their metadata, and to track those changes, and we handle that fairly seamlessly because there are, happily, robust version control systems out there that people already know. They’re using them for source code control, and typically they know them from their programming days. You can use the same technologies in Voracity because it’s an Eclipse-based tool that supports systems like Subversion, CVS, and Git, which people use for source control. They can also be used for metadata asset control, because they’re all just Eclipse plugins, and they work with those central metadata assets, all of which are text-based anyway, exposed in the project explorer. So it’s really handy to have a source code control system that works as a metadata asset hub and control system.
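Because the assets are plain text, ordinary source control semantics apply to them unchanged. In Voracity this happens through Eclipse plugins; outside the IDE, the same idea can be sketched with, say, the third-party GitPython library. The repository path, file names, and their contents here are all hypothetical placeholders:

```python
from pathlib import Path
from git import Repo  # third-party: pip install GitPython

# Hypothetical repository of text-based metadata assets
repo = Repo.init("metadata_repo")
root = Path(repo.working_tree_dir)

# Data definition files and job scripts are plain text, so they diff,
# merge, and version just like source code.
(root / "customers.ddf").write_text("# data definition file (placeholder)\n")
(root / "nightly_etl.job").write_text("# job script (placeholder)\n")

repo.index.add(["customers.ddf", "nightly_etl.job"])
commit = repo.index.commit("Tighten the masking rule on the SSN data class")
print(commit.hexsha[:8], commit.message)
```

The commit history then doubles as the change-tracking audit trail he mentions: who changed which rule, when, and why.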

On top of that, there are also specialized tools for metadata lineage and impact analysis, but unless you wire those up to this central platform, they’re not going to be as effective. You need a centralized platform where you can run your metadata lineage and everything else to get that enterprise-wide view of metadata and data as it changes across the enterprise. If you silo that, you’re not going to get that view. So again, you need a central platform to manage your data and your metadata; otherwise, you’re going to have trouble managing either, or doing much in a compliant or repeatable way.

Eric Kavanagh: That’s a really good point, and I think it’s a good segue into analytics, because we talk about all this stuff all the time. If you’re doing analysis on ungoverned, dirty data, unclean data if you will, you’re just not going to get good results. The whole preparation process, from identification to access, discovery of course, which we’ve talked about, integration, migration, governance, all this stuff leads up to analysis. So let’s talk about where you guys fit in that whole analytics space and what you’re able to deliver to the end user who’s trying to understand the business by slicing and dicing data. Just kind of dig into the Voracity take on the analytic side.

David Friedland: Okay. The analytic component starts by going back to data discovery, integration, and governance, and we probably shouldn’t have glossed over data quality as much as we did in talking about these other things, because it is important for analytics to have data that’s not only been vetted, but cleaned up and validated ahead of those analytics. And I’m not talking about value outliers that you might want to keep, but more like making sure the data and sources you have are of the same type and don’t have invalid values. You don’t want to be graphing dates with a 13th month, for example. Garbage in, garbage out. You want to avoid that.

Anyway, once you’ve got the right data, the validated, cleaned, high-quality stuff that’s otherwise prepared, and maybe even masked for compliance, there are a number of ways that Voracity addresses analytics as well. In fact, there are three, with a fourth on the way. One is embedded BI, which is to say the ability to define custom-formatted report targets with business intelligence value in Voracity’s SortCL program from CoSort, which I mentioned a while ago, where you’re combining reporting with your transformations, which is very efficient. It may not be graphically very sexy in the output, because you’re just getting reports, but you’re getting a great deal of efficiency in both design and run time when you combine data transformation and reporting in the same job script and IO pass. People don’t realize that.

Anyway, we can do that because we use the same metadata not only for the source definitions and the field-level transformations of the data sources, but also for the layout of the detail and summary records in your target reports. And that can include other nice layout features like headers and footer records, and things that create business intelligence value like cross-calculations, statistical logic, trig functions and whatnot, as well as string lookups and manipulations; the list goes on. There’s a fair amount of analytic capability already built into the CoSort engine through its SortCL program for transformation and reporting, and Voracity of course supports all that in terms of design and display.

The second method for analytics that we’ve got built in is really the traditional data franchising, that is, data preparation, right? The blending of data for other BI tools that people are already using, like Business Objects, Cognos, MicroStrategy, or OBIEE, or some newer ones like QlikView, Spotfire, and Tableau. Again, that’s data blending, and it’s turned into a new market for specialty players like Alteryx and Trifacta. In any case, Voracity does that: the discovery of the data, the integration, the masking, and the munging of all the data for the benefit of those BI tools, because it’s doing that work and handing off the results in smaller XML, CSV, or database table subsets which those BI and analytic platforms can readily ingest.

When they consume our results, they can display them in their dashboard widgets, which is a lot faster. It’s just way faster that way, because we’ve taken the heavy lift away. We’ve pulled the transformation of data out of the BI layer, so now you’ve just created a centralized result set that a lot of different reporting tools can use. In fact, that is not only faster, but it saves space and it keeps all the data in sync. So data preparation is option two.

Option three is a kind of hybrid of the first two, where we marry our data preparation to someone else’s presentation more directly, in the same go. We’ve done two pretty cool tie-ups so far: one with BIRT, the business intelligence reporting tool for Eclipse that’s part of Voracity’s GUI, and the other is the add-on for Splunk that I mentioned earlier. Those are pretty tight integrations, so when you hit the display button, let’s say, in BIRT, or you go to index data for Splunk, you’re automatically running a CoSort or Voracity data preparation or ETL job at the same time, and those tools get to take the IRI results in memory. To the end user, the data preparation stuff is invisible, and the insights they’re getting out of BIRT or Splunk are just happening a hell of a lot faster.

The fourth and final way is a road we haven’t yet paved. We’re working on bundling a partner’s self-service BI tool into Voracity. It’s pretty cool from what I’ve seen so far. It’s powered by sharding, and it uses a pretty easy SQL-like query language to support a lot of different what-if analyses and mashups on data in different formats, and they’ve got lots of different display widgets. It’s really there for data scientists, actually. But the cool part about bundling a self-service BI tool into the Voracity layer is that those folks get to do their thing in a more governed environment, where you can do the data preparation and manage the metadata and everything else you need to do to organize the data they’re going to play with.

Eric Kavanagh: That’s a lot of different choices, and I think that’s good stuff, right? Because you’re going to have different perspectives and different organizations and different areas of expertise and so forth, so people like that. And people may or may not have already embraced a particular analytic engine or visualization platform, so you’re creating this holistic set of choices for speeding up what they have, or giving them enough to get started without buying something else, right? I think you’re right about marrying data governance to data blending, and that relates to another topic we haven’t really dug into yet, which is this whole concept of the data lake, right?

We’ve mentioned it previously and Rick Sherman’s comments and so forth, but this is something that Dan Lindstedt, I recall, was quite adamant about. He had this great quote where he had said something like if you simply buy Hadoop and dump your data into a data lake, well you’re going to lose lineage, and you’re going to lose a lot of context, and you’re going to lose a lot of metadata and other information about that data, which will cause you some serious issues down the road or even immediately, frankly. What can you say about your approach to enabling Hadoop without losing things like lineage?

David Friedland: The data lake, where Hadoop is one of the possible stores of data, is probably the best example of an ungoverned environment, right? It's the same issue as with the self-service BI tools I mentioned before: folks without a governed backbone are doing their analytic experiments in more or less of a mosh pit, so you've got to think about that. But Hadoop in the Voracity world, I think, is more sedate, or at least under control, by virtue of what we do and don't do for Hadoopers.

In the first place, we need the data in Hadoop to be in a structure that we can support. That's an opportunity as well, though some people might say it's a limitation, but it certainly concentrates the mind. We're talking typically about flat files, just like we would be handling in a local file system, and it might even mean an ODBC connection to Hive, say, or a SQL connection to a NoSQL database that exposes the data in Hadoop through SQL. If the data is unstructured, again, we do have some ability there: we can scan and import it with Voracity's dark data discovery wizard to get values out of it. With respect to audiovisual data we're not there yet, but there's a great deal of opportunity in handling big data that is more or less structured, or at least can have a structure imposed upon it, as I mentioned earlier.
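[Editor's note: as a concrete illustration of the Hive route David mentions, here's a minimal Python sketch using pyodbc. It assumes a Hive ODBC driver is installed and a DSN named "Hive" is configured; the DSN, table and column names are hypothetical.]

```python
import pyodbc

# Connect through an ODBC DSN pointing at HiveServer2 (hypothetical DSN name).
conn = pyodbc.connect("DSN=Hive", autocommit=True)
cur = conn.cursor()

# HiveQL is close enough to SQL that structured data sitting in HDFS can be
# queried through the same kind of connection used for a relational source.
cur.execute("SELECT region, amount FROM web_sales WHERE sale_year = 2015")
for region, amount in cur.fetchall():
    print(region, amount)

conn.close()
```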

For the types of HDFS data that are within our scope and within our reach, we can discover, integrate, migrate, govern and analyze them along with all the other sources of data that aren't in Hadoop. That again means having the necessary metadata connections and metadata management that we normally do, as we talked about before, including lineage. And that's true even when we use a Hadoop engine instead of the default CoSort engine to run those jobs, because they're defined in Voracity workflows that use the same job flow and task data that CoSort does. So there's a lot we're doing to enable people who have data in Hadoop, or who want to run their jobs in Hadoop.
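[Editor's note: the "same job definition, different engine" idea David describes is essentially a strategy pattern. This is a hypothetical Python sketch of the concept, not IRI's API: the job spec stays constant while the execution backend is swapped, which is what lets one set of metadata and lineage serve both engines.]

```python
from typing import Callable, Dict

JobSpec = Dict[str, str]  # e.g. {"name": "...", "source": "...", "target": "..."}

def run_local(job: JobSpec) -> str:
    # Stand-in for executing on a single-node, CoSort-style engine.
    return f"ran {job['name']} on the local engine"

def run_hadoop(job: JobSpec) -> str:
    # Stand-in for submitting the same job definition to a Hadoop-side engine.
    return f"submitted {job['name']} to the Hadoop engine"

ENGINES: Dict[str, Callable[[JobSpec], str]] = {
    "local": run_local,
    "hadoop": run_hadoop,
}

def run(job: JobSpec, engine: str = "local") -> str:
    # The job definition is identical either way; only the executor changes.
    return ENGINES[engine](job)

print(run({"name": "sales_etl"}, engine="hadoop"))
```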

Eric Kavanagh: We’ve covered a lot of ground in a fairly short period of time here, and I’d like to kind of sum up some of the key points from our other interviews, so Rick Sherman of course, Gwen Thomas, Dan Lindstedt, we’ve mentioned all those. We haven’t talked about Robin Bloor yet, my business partner, but it seems to me, even though we had great things to say, I could sum up most of the salient points like this.

Rick Sherman, I think, really hit the nail on the head when he talked about what he called a "silent disruption" in the data world. I love this concept. What he was saying is that, much more than was the case just five years ago, when he goes out to prospects and talks to them about what he can do, people just get it. He said more and more business people just don't need to be lectured on why data is an asset; they understand that intrinsically, which means he can spend his time getting into the details of their business and how data can help them. I thought that was a really interesting observation.

Then Gwen, for example, really talked about the importance of caring for data and being a thoughtful data steward. She is a wonderful person to have that perspective because if you look at her history and her role, frankly, founding the Data Governance Institute and being a true visionary in that space, you can really appreciate that she cares about this stuff, right? It's not just that she's worked in data; she cares about passing on that mantra and that belief system to help other organizations appreciate that data isn't just bits and bytes. Data is meaningful for the company, and it's very meaningful for the customers, the prospects, the partners and so forth, and having that perspective of a data shepherd, if you will, really can generate some positive results.

Then there’s Dan. I thought Dan took a different path but I really loved this because he talked about the criticality of getting processes right? Understand that small mistakes early in the process will result in error propagation and where you wind up with problems that get exponentially larger over time, which we’ve seen in the world today. It’s all over the place, right?

I thought it was really good that he brought us down to earth a little bit with all that. And then Robin, of course, being the philosopher's stone incarnate, I would have to say, did an excellent job of reminding us to think strategically about data, but also about specific types of applications, tools and platforms like Hadoop, by thinking strategically about the business needs and understanding how, where, when and why all these different tools and bits of functionality can come together.

That’s really the recipe for success in data management, especially as we segue from this whole legacy world which is here and it’s here to stay, it’s not going to go away any time soon, and this new world of opportunities like with big data and the cloud and so forth with all these new types of data, I think those words and that wisdom really continued to resonate at least with me, so if you think about this whole big picture, right? You’re not going to get any of that stuff done if you don’t have some kind of platform for managing the movement, the transformation, the discovery, the cleansing and so forth of data, so that would be my sort of overarching assessment of where we’ve been in this long journey and I’d just be curious to hear your take on what you think about all that.

David Friedland: Of course, and you know I'll give that to you, but first let me double back to echo some of your praise of the talks we've heard. I've heard them all as well. They were great, not only in this series but in all the other forums where you hear those gurus speak. I think their experience and observations, and hopefully some of ours too, will challenge everybody listening to think about a lot of things about data and its management, and about what we think a platform is and what it should do.

In fact, as I heard those discussions about the philosophical and business drivers around data, the technical issues involved in integrating big data and the cloud, and the concerns around handling data responsibly and blending it to help chart the course of the business, they all did seem to reinforce our vision for Voracity as a platform to unify and address all those issues. Because data drives our lives and our businesses now, and it's become so much bigger and more complex in terms of its variety, velocity and veracity, I think it's too hard to handle piecemeal anymore, at least for professionals who are serious about curating and exploiting all that data.

If you want to make those processes possible, and preferably easier, faster and more affordable, I do think a single platform that was built organically and thoughtfully, as we hope we've done with Voracity, to support all those things is the right answer for a lot of people who might not be thinking along those lines right now. But I think I've covered why those things are true and why they exist in Voracity, at least from the perspective of IRI's price-performance history with CoSort, and from all the Eclipse wizards we've created for the functionality we built for data architects, ETL job designers, compliance officers and data scientists, who can all come together.

As their awareness of Voracity grows, its adoption should follow, and ultimately I think it has the right chops to become the same kind of standard platform for centralized data management that CoSort became as a data utility. But again, those folks are going to have to try Voracity for themselves and decide if it has the versatility and the value they need, just like they did with CoSort. Unless you have a central platform, you're not going to be able to work on all that data and get done what you need to get done with it. And I think it's becoming obvious to folks now dealing with all the complexity of the specialty tools and the big data technologies emerging out there that those are essentially making a complex world even more complex.

Again, anything you can do to lend structure and a modicum of simplicity to this situation is a good thing, and it's an even better thing when you can find a way to do that in a one-stop shop that gets your work done faster and under budget.

Eric Kavanagh: Yes. I really have to thank you for your thoughtfulness and, obviously, your vision in sponsoring this whole webcast series, this podcast series I should say. It's been very educational for me, and I'm reminded, and really emboldened, by the fact that taking this philosophic approach really does help us challenge our presumptions, our understanding and our very thoughts about how we approach data and how we use data. That's always been the most fascinating part of philosophy for me: if you keep taking steps back, and you keep your mind open to applying different lenses to view the data and the situation you're trying to assess, you really can gain perspective.

Let’s fact it; the whole point of managing data in the world of analytics is to get perspective. That’s what we’re doing. We’re trying to understand what’s happening and figure out what we can do to change all that. We’ve kind of dug into a lot of these different aspects of what IRI’s built and it’s obvious to me that your platform does in fact enable a very responsible, thoughtful process around managing data and preparing it for analytics, and then to use it whatever you want for or to use whatever tool you want I should say. To analyze that data to generate the business insights and have the capacity to go change the business for the better, that’s the key, right?

David Friedland: That’s the idea. That’s right, and thank you for that observation about what we’re trying to do, and to echo those points, we are a company that is trying to do that very thing, and at the same time, happy to sponsor this series of podcasts on the philosophy of data because not only is it fun to wax philosophical sometimes, it’s also a useful grounding exercise, as you said, for those of us who are working with data everyday, and trying to develop solutions to deal with the problems of the day, and tomorrow.

As for our colleagues, Gwen Thomas, Rick Sherman, Dan Linstedt and Robin Bloor, they're all gurus in data governance and data management, so we do appreciate their insights, their experience and their intellect, and they feed into what we're building as well. They give everyone else a chance to rethink what they're doing with data, and that is very important right now, especially in the era of the internet of things, the era of big data, the era of digital business, the era of Big Brother, and everything else. This era gives us an opportunity to rethink what we're doing, and for that reason we think it's also an opportunity for the industry to look for a centralized data management platform like Voracity that can turn data into information effectively, safely, quickly, and again with a modicum of simplicity.

These colleagues, I think, did reaffirm IRI's direction behind this platform, especially because it's so comprehensive functionally, as I think you're observing: it offers the consolidation of I/Os and data management tasks, and it's got the third-party tie-ups and different ways to design and run jobs. So I think they also support the idea of a complete data preparation environment for analytics that manages the whole lifecycle of data, so that people can classify and profile their data, clean it up, transform it, protect it, and ultimately slice and dice it and see it in different ways, so they can act on it and finally get the value they collected all that data for in the first place.

Eric Kavanagh: That’s a really excellent point. I’m glad you threw in the concept of lifecycle in that conclusion there because to me, that does speak to the fact that data does have a lifecycle, right? Data starts somewhere, it has some journey and it gets used somewhere, and this lifecycle is ongoing, right? You have these static snapshots of a business, but it’s not going to generate a whole lot of value unless you have the context, right? The big picture comes when you understand that you get data from somewhere, you are somewhere, and you’re going to use that data for a specific purpose, right? Ideally, what you’re doing is you’re using data, your analysis and your processes as a rudder almost that you use to direct your organization where you want to go, right?

David Friedland: Exactly. Maybe the simplest data philosophy of all should just be making sure that you’ve got the right people, the right policies, and the right technology to get there.

Eric Kavanagh: Yeah, that's good stuff, and I love this rudder concept too, because it's not even like a steering wheel, right? With a steering wheel you're moving the front wheels at the front of the vehicle; a rudder sits behind, so it's a little bit looser if you think about it, a bit more indirect, I suppose. But I have to say, folks, this has been a wonderful exploration of the philosophy of data. If anyone is interested, send me an email and we'll keep the conversation going.

I think it’s going to be an ongoing process because frankly, it’s fun to learn about not just the technologies, but really how they’re used, and how businesses can rethink their architecture because I have to say, the time is now with the cloud, with big data and all these things that you, David, have been bringing up here. The time is now to focus on this stuff and to get serious about it and to prepare the next steps to what’s coming down the pike, because it’s coming pretty fast these days, the innovation waves just keep on coming.

With that, we’re going to conclude our conversation with David Friedland of IRI, The CoSort Company and The Philosophy of Data. Thanks so much for all of you out there listening. Please do share this with your friends and colleagues. We’ll talk to you next time, folks. Take care. Bye.

 
