The Data Vault and The Philosophy of Data

This interview is part of The Bloor Group’s research program, Philosophy of Data, which has been underwritten by IRI, The CoSort Company.

Eric Kavanagh: Ladies and gentlemen, hello and welcome back again to Philosophy of Data. My name is Eric Kavanagh. I’ll be your moderator for today’s conversation with one of the industry’s visionaries. I’m very pleased to have Daniel Linstedt on the line from Empowered Holdings. He is the inventor and architect of Data Vault 2.0. We’re really excited to be talking to you today. Welcome to the show, Dan.

Dan Linstedt: Thank you. A pleasure to be here.

Eric: Sure thing. With this program, we’re trying to better understand data management and find new perspectives for data managers to be able to explore their jobs in new and interesting ways and come up with new ideas, frankly. That’s where the whole concept of applying philosophy to the picture came into play, because I’m a Philosophy major from years ago, and I was always fascinated by the way a discussion and exploration of philosophy can open your mind and give you a new perspective for how to look at life or how to look at anything really. Of course, the study of knowledge, the study of wisdom, the study of being and existence, these are all interesting things. I thought of a great opening philosopher’s quote for today’s conversation: good old-fashioned Descartes, who said, “I think, therefore I am.” That’s really the business analyst’s mantra, right?

Dan: Yes, absolutely.

Eric: As you look at the world of data management – you’ve been in it for many years now – what drives you? What gets you rolling in the morning? What gets your juices flowing?

Dan: I’m all about solving business problems and, more than that, bridging the gap between the business and the technology or the technologist in the field.

Eric: How do you do that?

Dan: That’s a long story. There are a number of different ways, but one of the first things I like to think about is the fact that data is…unfortunately, data is what? In my particular case, and with my customers’ data, it’s absolutely useless or meaningless in most of its current states until or unless the business users can correlate it, assign value to it and, therefore, turn it into information. That’s where I come in.

Eric: You are an expert in the data warehousing field. This takes us back at least 30 years, if you go back to the earliest systems. Back then, there were a lot of constraints that arguably don’t exist today, at least not in the same sense. Nonetheless, the practices and the principles that were applied then are still relevant today. They are designed to help manage the flow of information and the design of information systems – things like the star schema, for example. Can you talk about what those are, how they work, and why they’re important?

Dan: Absolutely. These days, there are two sides to that coin. The first is the data side. I know I just said that data is essentially useless, but that’s at the business level, until an auditor steps into the picture and needs to look at how the data got turned into information at a particular point in time. In terms of the flows, the process flows, and the way business has handled this data or turned it into information, that’s changing rapidly in this market space. A lot of that has to do with the technology being, one would say, good enough or fast enough. That raises the question of what is fast enough or good enough, but we’re not going to get into that right now. For the most part, the technology has advanced far enough along to allow us to do things like managed self-service BI. By the way, I don’t like the term “self-service BI.” I don’t know if you remember enterprise information integration from the 1980s and early 1990s, and federated query and such?

Eric: Yes, sure.

Dan: Basically, that was a Pandora’s box, a free-for-all: just get a login to whatever source system and then you can service your own requests or your own needs. It was an early attempt at what we today call managed self-service BI. I like to say “managed” because IT is still necessary, governance is still necessary, metadata management is still necessary. You have to work with the business these days. In fact, more and more IT people have to become business analysts to some degree to help the business understand how to source the data, how to align it, and really how to turn it into information.

Overall, to answer your question, the whole flow is really all about correlating, aggregating, gathering all of this data from across usually 20 to 50 source systems or more and then putting it all together on the other side – the information side – and delivering it to the business on demand in a managed fashion. Again, I want to stress the “managed” part because, otherwise, the numbers don’t add up. You’ve got John over in accounting who creates a dashboard and he’s happy, but then Sam, who’s sitting right next to him in accounting, creates his dashboard, and the numbers don’t align or they don’t match. Management, governance, architecture, process flows, metadata, definitions – all of that flows together to create a common, seamless analytics platform.

This is what I like to work on.

Eric: I was going to say you bring up a really good point by calling out this moniker of self-service BI. I think what happens, oftentimes, is that various software vendors in our space are trying to come up with clever, compelling taglines to encourage people to use the product, but in many ways, they are a bit misleading. It’s like, with respect to software, that term “seamless.” That’s always a red flag for me because there’s really nothing that’s seamless; there are seams everywhere, and they are what holds together the application, the environment, the data warehouse, whatever the case may be. We have to be careful about how we view and understand the words that are used in marketing terms to describe what software does, right?

Dan: Exactly. My particular job these days, if I could pin it to a few overview-type statements, is to help companies establish the from and the to: the as-is state of where they are and the to-be state of where they’d like to be. Then I help them understand what training is necessary, what data integration is necessary, what information flows are necessary, and potentially where they’re lacking in areas like governance, master data management, and a variety of other things, including the managed self-service BI component. Underneath all of that are the moving parts and the pieces, which is where the Data Vault lives.

The Data Vault crosses those boundaries and helps the technologists on the IT side answer those questions in a more agile fashion, and helps the business collect the data in what we call a data warehouse. You notice we don’t usually call it an information warehouse, and that’s because it stores raw data that is somewhat correlated. Then, from there, we turn that into information. There’s a whole slew of rules around virtualization, how we get it out into star schemas, and what we do to present it at the business layers. That really is my job: to come in and assess where the company is today and help the company get to their vision for what they want tomorrow. Sometimes tomorrow is next month and sometimes tomorrow is two years from now.

I’ve got a customer in Australia I’ve been working with for probably 2-1/2 years now. It’s an enterprise initiative. They have 400 or 500 people in IT alone just working toward this grand initiative and grand vision, and I’ve gotten thank-yous from the CFO, let’s put it that way.

Eric: That’s fantastic. You brought up a good point when you alluded to the methodologies and the practices by which you put together these information systems, by which you gather the data, incorporate the data, and then start to make some meaning out of it. It seems to me that, inevitably, what happens is you have this bridge of methodology that ideally spans from the technology itself to the people who use the technology. Then, over time, the software vendors will try to automate whatever that method may be so that you get closer and closer to an easier path. You talked about how, with the Data Vault architecture, you try to walk people through this process, like the whole WYSIWYG concept, where you identify the business process involved and then you automate as much of that as you can in order to expedite the process of getting value from data, right?

Dan: Absolutely. A big part of that – some would call it secret sauce, but the people who point to Data Vault and say, “Oh, there’s a lot of secret sauce there,” don’t understand it, because, in reality, it’s all been published to the public domain. There’s really no secret sauce. Of course, my latest book covers everything from soup to nuts at an implementation and an IT level. In reality, the secret, if you really want to know, is something I like to call divide and conquer. I hearken this back to real life. When we’re faced with a big problem to solve, what’s the first thing we normally do? We break it down into smaller, bite-size, manageable chunks.

I always ask business management, “Why, then, do you insist on turning data off the source system right into information on the fly, through the pipeline?” and they usually start shaking their heads. You’re trying to solve this massive problem of enterprise corporate memory and data warehousing and auditability and flexibility at the same time you’re trying to solve business information delivery problems. I tell them, “Look, how well has that worked for you?” They shake their heads. They say, “That’s obviously why we brought you in.”

The next key to that success is divide and conquer. That’s the part that I teach them how to do. It relies on patterns. If you dig into frameworks, frameworks are great. You have the Zachman Framework, you have the CIF framework, you have the Data Warehousing 2.0 framework from Bill Inmon. You have all these frameworks. Even Agile, to some degree, is a framework. On top of that, you’ve got SAFe, for instance, the Scaled Agile Framework. If you dig into all these frameworks, you start to ask the question, “How? How do I build it? How do I get my IT people to generate things, to automate things? How do I get my tooling to work with these patterns? What are the patterns? What are the best practices for implementation?”

That’s where the Data Vault comes in. It’s a pattern-based solution that divides and conquers. By divide and conquer, what I mean is we separate the data – historical data storage and data warehousing; notice again we don’t call it information warehousing at this point – from the interpretation or interpolation. That stems from an old statement I never did like. The old statement in the market for many, many years used to be “create one version of the truth.” This is what we started life with in data warehousing. There is no single version of the truth. My truth is different than your truth, which is different than his truth. And, oh, by the way, the minute I learn something, my truth changes. Truth is subjective at best.

We separate the facts from the truth – or the truths, multiple truths that are all true at the same time across the organization – and then we separate the data storage, historical needs, and data warehousing aspects from the information delivery cycle. This is the power of the Data Vault. We teach people what these methods are, what these patterns are, what the integration components are that can be automated, how we turn data warehousing into a back-office, enterprise-focused vision while we turn information delivery back into the hands of the business user for managed self-service BI – automated 80% of the way there, because, obviously, there are some things that the business users just need to express and do themselves that can’t be automated.
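
[Editor’s note: to make the divide-and-conquer pattern concrete, here is a minimal Python sketch of the hub-and-satellite loading pattern that Data Vault 2.0 is built on: business keys land in a hub, descriptive attributes land in an append-only satellite, and every row carries a load date and record source for auditability. The table and column names are illustrative, not taken from the interview.]

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys: str) -> str:
    # Data Vault 2.0 style hash key: a deterministic digest of the
    # normalized business key, so every load computes the same key.
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def load_customer(row: dict, hub: dict, sat: list, record_source: str) -> None:
    """Split one raw source row into a hub insert (the key) and a satellite
    insert (the history). Raw data is stored as-is; no interpretation yet."""
    hk = hash_key(row["customer_id"])
    load_date = datetime.now(timezone.utc)

    # Hub: one row per unique business key. Never updated, never deleted.
    hub.setdefault(hk, {"customer_id": row["customer_id"],
                        "load_date": load_date,
                        "record_source": record_source})

    # Satellite: append-only attribute history, so an auditor can replay
    # exactly what was known at any point in time.
    sat.append({"hub_hash_key": hk,
                "load_date": load_date,
                "record_source": record_source,
                "name": row.get("name"),
                "region": row.get("region")})

# Information delivery (star schemas, dashboards) is derived downstream --
# the separation of facts from "truths" described above.
hub, sat = {}, []
load_customer({"customer_id": "C-1001", "name": "Acme", "region": "APAC"},
              hub, sat, record_source="CRM")
```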

Eric: That’s a really good point. You brought up this whole issue of integration, which is the key process that pulls all this information together. You have to be careful about when, where, how, and why you do data integration. You have to dot your i’s and cross your t’s. I think many of the problems that occur in the development and use of information systems result from practices like ETL not really being thought through, or not being managed and documented in a way that enables auditability, right?

Dan: Yes. That, again, hits at the core issue: how do you integrate versus how do you deliver information? How do you store data in an auditable fashion? How do you store one version of the facts – or are there multiple versions of the facts? The answer to that is yes, and they’re all correct at different points in time, which is a common data warehousing problem – versus how do you deliver information to the business users. Until businesses actually understand that they need to separate these two components, physically as well as in the data warehousing, BI, and even analytics spaces, they’re not going to succeed at the enterprise level.

Sure, you’re still going to have Joe in accounting or Sam in finance or Mary in manufacturing creating their own dashboards using data munging or data provisioning tools like Datameer or Domo or Alteryx – whatever the tool of the day is – but they’re not going to be focused on the enterprise version of information delivery until IT comes in and helps with the master data vision, and until the business decides to separate information delivery, or interpretation of the facts, from storage of the facts. That’s what we really teach in the Data Vault classes, and that’s how we help businesses operate better.

Eric: There’s one topic I’d like to dig into a bit, which is the hot topic of the day: the whole issue of big data and leveraging big data. You referenced Datameer, which is one of these new big data companies. One of the concerns I’ve had watching this whole new wave of innovation and new wave of tools come out is that we may not have learned all the lessons we needed to from the first wave, if you will, of data warehousing and business intelligence – issues like metadata management, auditability, and data governance, for example – and you see all this activity around the new tools.

We’re not going to discard the warehouse; we’re not going to discard the old way of doing things. Yes, there’s great potential in leveraging big data, but what we need is a very managed process around the evolution to what you’d call a hybrid, where you’ve got the data warehouse and all your traditional corporate data, and you want to find a responsible, meaningful way to incorporate the use of big data, right?

Dan: Right. Absolutely. Let’s address that issue. My customers today are asking the same questions as everyone else in the market, which is the topic you just brought up. I want to step back and say, “Wait a minute.” Again, let’s look at the definitions of the terms we’re using. We tend to use the term big data, unfortunately, too freely, when in reality the business might mean big information or, as IBM likes to put it, big insights. I think, really, there’s a separation, just like the separation of church and state: a separation of data and data storage, tooling and use, and, of course, types of information – structured, semi-structured. Some would dare say unstructured, but I’m not a fan of that either, because in order to make use of any sort of bits and bytes, you have to apply structure to them. Otherwise, you can’t interpret them. That’s for a whole other day. I wish I could show you a slide here that I use in my training, but let’s get into the big data notion of it.

If you look at something like Hadoop, it’s a great platform. But if you look at its common components, you tend to start asking the questions: where is the security? Where is the governance? Where is the metadata? Where is the master data? Where are the best practices and the processes for managing those environments? Then, of course, the Hadoop programmers who grew up in that world say, “It’s a free-for-all. We shouldn’t have to model. We shouldn’t have to understand. We should just dump all this data into what we call a data lake.”

I want to step back and say, “Wait a minute. That’s the wrong term.” If you look at a lake in the real world, one of the first things you’ll notice, if you’ve done any sort of marine biology or similar studies, is that lakes, just like oceans, are stratified. What is a stratum? A stratum is structure. The structure in the strata tells you what kind of fish live at which levels and depths of the lake because of the levels of oxygen, what kind of plants will live there, where things decompose, how much light gets through the water.

I like to say that if you’re really talking about a data lake, you’re talking about applying some form of structure to this data set. In technology speak, that can come in as schema-on-read, which is fine; schema-on-read versus schema-on-write is neither here nor there – that’s a performance issue, not a business issue. I like to say, “Look, if you’re just dumping data into what you call a data lake, I’m not willing to let you call it a data lake anymore.” In reality, it’s a data dump – at best, a data swamp or even a data junkyard.
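
[Editor’s note: a minimal sketch of the schema-on-read idea Dan mentions, assuming JSON-lines data in the lake; the field names are hypothetical. Nothing is enforced when the raw events are written – the structure, the stratification, is applied only when the data is read.]

```python
import json

# Raw events land untouched -- no schema enforced at write time.
raw_events = [
    '{"user": "u1", "amount": "19.99", "ts": "2016-05-01T12:00:00Z"}',
    '{"user": "u2", "amount": "oops", "ts": "2016-05-01T12:05:00Z"}',
]

def read_with_schema(lines):
    # Schema-on-read: structure and types are applied at consumption time.
    for line in lines:
        rec = json.loads(line)
        try:
            yield {"user": str(rec["user"]),
                   "amount": float(rec["amount"]),  # type enforced here
                   "ts": rec["ts"]}
        except (KeyError, ValueError):
            # Rows that don't fit the schema only surface when read --
            # which is why an unprofiled lake can hide a swamp.
            continue

print(list(read_with_schema(raw_events)))  # only u1's event survives
```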

The thing is, all this stuff is just heaped in, loaded in, dumped in. There’s no structure, there’s no understanding, there’s no vision for how to apply this data, or even an understanding of what you have in order to clean it up or do anything with it and turn it into value-added information for the business. Now all you’re doing is using a platform technology like Hadoop for storage purposes. Don’t get me wrong; that might be an okay objective. It depends on what the business use cases are on top of it. But, again, in order to apply any sort of business information to this data, you have to begin to stratify it, profile it, manage it, understand it, and apply structure to it, so that you can get results from it.

When we talk about tools like Datameer or Domo or Alteryx or any of the others out there, they’re very, very good tools; some are better than others because they offer governance, security, and lineage. These are the things I encourage my customers to look at in the big data space. I say, “Hey, look, you can’t throw the baby out with the bathwater. You can’t simply say, ‘I’m going to move my entire warehouse off traditional relational, drop all my best practices and processes and procedures that have been working for years, throw it in this Hadoop thing, and expect it to work.'” It’s not going to happen. The best practice that I teach my customers for big data is to integrate Hadoop as a platform – a hybrid platform or a hybrid solution – and then say, “Okay. Now you’ve got data analysts.”

I don’t know where this term data scientist comes into play, but I have to share a small rant on that one. I wrote a blog entry a while back on my blog, DanLinstedt.com, called “So You Think You Want to Be a Data Scientist?” I didn’t get very many responses, but, hey, so be it. In my opinion, if you’re going to be a scientist, you’re going to follow through with the scientific method. I have yet to meet an actual data scientist who follows through with the scientific method of testing, along with control sets and everything else. That’s my personal rant.

Again, it comes back to management and maintenance, metadata, governance, and best practices. You could take a tool like Datameer, which, by the way, I do prefer because they have a lot of customer case studies and success stories. You look at the tools in that category and you ask yourself, as a customer: where are you going to apply it, whom are you going to allow to use it, and are you willing to send people to training? Because these tools can be used in two lights. Most of them are highly focused on the end-user desktop experience. You build your models. Where? On the desktop. You build your data. Where? On the desktop.

What happens to the enterprise version of this data set? You could lose it. You could lose the vision for enterprise relationships and enterprise integration. Then what happens is this whole idea around data warehousing being a corporate asset begins to break down. Now you’ve lost focus and vision for information delivery at the corporate level. All those things that you worked so hard for over the last 15 years have just flown out the window.

This is what I talk about with the companies I work with on these ideas: I try to keep them focused on management and governance and security. I haven’t even begun to talk about security and privacy, and already there are a lot of red flags around these tools.

Eric: You bring up a good point about the importance of considering governance. To me, one way or another, an organization needs to have some overarching platform. From a vendor perspective, there are obviously vendors that deliver platforms for data management. IRI, The CoSort Company, which sponsored this whole series of podcasts and the e-book we’re going to publish, is one such company. There are some others out there, too. There is clear logic behind having at least one platform that governs what you could call the foundational level of your information management because, otherwise, if you have tools like Datameer pointed at systems that are not somehow incorporated into your information landscape, into that platform, then you start getting skew, basically. You start getting different views that are not aligned with your corporate vision, with your understanding of what your information assets say, right?

Dan: Exactly. This is why I talk so much about master data management as well. With the IRI tools, you can pull together … This is one thing I don’t like about the label master data management. People look at that and say, “Wait a minute. I can buy a master data management tool?” No, you can’t. Master data management is made up of two pieces: master data and data management. Data management is people, process, and technology. To my knowledge, no tool vendor has yet sold a person along with their tool, not to mention processes. I have a lot of bones to pick with the language the industry uses.

As far as IRI is concerned, I like their solution because we can govern the end-to-end processing in a central place. With governance comes the ability to manage. That’s really what this is about. You throw the Data Vault into that kind of a mix, and all of a sudden you have standards around the processes – the IT processes, the data processes, the information processes. You have standards around the data modeling constructs behind the scenes inside the warehouse. Whether you build your Data Vault model or data warehouse in your Hadoop system, in a traditional relational database, or in a hybrid, it doesn’t matter. You have the best practices outlined, you have risk and mitigation strategies, and, at the end of the day, you have automation components. If you subscribe to the viewpoint that the Data Vault brings to the table, you can automate 80% of it; then you can leverage it, then you can do things repeatably and consistently well. This is where a good tool like IRI’s can come to the table and help you get there faster.
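
[Editor’s note: the “automate 80%” claim rests on the fact that every Data Vault hub loads with the same pattern, so loading code can be generated from metadata instead of hand-written. A minimal, hypothetical sketch: the table names, the metadata layout, and the SQL template are illustrative only.]

```python
# One template plus a metadata table can generate all hub-loading SQL.
HUB_TEMPLATE = """\
INSERT INTO {hub} (hub_hash_key, {business_key}, load_date, record_source)
SELECT stg.hash_key, stg.{business_key}, stg.load_date, stg.record_source
FROM {staging} stg
WHERE NOT EXISTS
  (SELECT 1 FROM {hub} h WHERE h.hub_hash_key = stg.hash_key);"""

# Hypothetical metadata: which staging table feeds which hub, on which key.
HUB_METADATA = [
    {"hub": "hub_customer", "business_key": "customer_id",
     "staging": "stg_crm_customer"},
    {"hub": "hub_product", "business_key": "product_id",
     "staging": "stg_erp_product"},
]

# Generate the loading statements instead of hand-writing each one.
for meta in HUB_METADATA:
    print(HUB_TEMPLATE.format(**meta), end="\n\n")
```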

Eric: I think the real key – or one of the real keys, for me at least – is having visibility into what’s going on underneath: like you said, the data models, for example, and the processes for gathering data, and being able to see the data lineage. We should remember that every company, sooner or later, is going to lose someone or hire someone. If you don’t have clear visibility into how the information system works, new people are going to have a pretty hard time understanding what to do and how to change things. That’s the key. If you don’t have visibility into the nature of the information systems themselves, you’re not going to be able to change things very easily, and if you do, you’re going to wind up with errors, right?

Dan: Absolutely. That goes back to the components. If you look at master data management, there’s also now master process management, which has largely been ignored by the industry, and there is master lineage management, which, again, has been largely ignored by the industry. There are a number of components that have to achieve a mastery level for an enterprise-focused vision. Tools that allow us to control these end-to-end components or even generate or automate these end-to-end components are very, very valuable in the space we live in today.

That’s one of the things I want to bring to light for those listening to this podcast: if you simply buy Hadoop and put a Hadoop cluster in, you lose lineage. If you’re just dumping data right into a Hadoop store, you’ll lose lineage. It’s not there. You don’t know where the data came from, you don’t know how it got there, you don’t know how it’s defined. You lose metadata, you lose definitions, you lose enterprise focus and vision. Somebody using one of these managed self-service BI tools – if they’re not governed in terms of who gets to use which tools, who gets to log in, and who gets access to what data – can open an Excel spreadsheet on their desktop, upload it directly to Hadoop, and then make it available to the rest of the enterprise. That can be a serious problem.
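
[Editor’s note: the lineage Dan says gets lost is cheap to capture if it’s recorded at ingest time. A minimal, hypothetical sketch of a landing step that writes a lineage record alongside every file dropped into the store; the field names are illustrative.]

```python
import hashlib
import json
import os
from datetime import datetime, timezone

def land_file(path: str, source_system: str, lineage_log: str) -> None:
    """Record where a file came from, when it arrived, and a checksum of
    exactly what arrived -- the lineage a bare file dump throws away."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    record = {
        "file": os.path.basename(path),
        "source_system": source_system,
        "landed_at": datetime.now(timezone.utc).isoformat(),
        "sha256": digest,
    }
    with open(lineage_log, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")

# Usage: every ingest goes through land_file(), so "where did this come
# from, and how did it get here?" always has an answer.
```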

Eric: Yes, especially in terms of accessing information that is not certified, for example, or information that’s not aligned.

I think to myself, too, from an e-discovery perspective, if I’m an attorney and I’m representing a company that’s suing your company and I find out you have this ungoverned Hadoop cluster, I’m just going to dive right in and have some fun, right?

Dan: Oh, yeah, because you also don’t know … with the standard, plain-Jane Apache projects – although they’re working on this – you don’t know who’s accessing what, when, and where. All you see is a lot of information or data simply flowing into this data swamp. Again, having the right process at the business level and the right tooling at the technology level – and, of course, training – goes a long way toward making all of this work, but you do need a methodology that can help you govern all of this from soup to nuts, from end to end. That really helps your organization become agile, whether you’re three people, five people, 250, or 500 people in IT. Everyone should follow the same standards, the same principles, the same best practices.

Eric: I guess maybe we’ll close with one challenging question, because lots of organizations that have been around for some time and have data management practices in place will need someone like yourself to come in, navigate through what is already there, and then stitch together a new, more complete framework to help them manage their information. I’m guessing that can be a fairly sticky process. Can you walk us through what the procedure looks like, to first do the assessments and then incrementally move toward the final vision? That’s what you alluded to at the top of the call: breaking it down into bite-size pieces. From the perspective of someone who sees your vision and wants to bring you in, how do you manage that initial process, and then how do you get the team to collaborate?

Dan: Absolutely. The first thing I want to say is once people understand the value of Data Vault from end to end, the system of BI, they begin to see that it will most likely require a culture change. Some companies are averse to culture change and, at that point, will not move ahead with anything else that I have to offer, and that’s okay. That just means they’re not ready yet to solve their enterprise problems.

How do we get there from here? How do I help them? There are three different ways to give this a go. The first, which companies have chosen to do in the past, is to hire me for a two-week process called a kick-start, where I come in and do the training to get them all baselined. We take a small team. We establish up front, before I’m on site, what the statement of work needs to be and what the deliverables can be. Then we establish the sandbox environment. As soon as the training is done on day 3, we get the team into an Agile working mode. If they understand Agile, that’s even better.

By the way, I’m backed by my good friend Scott Ambler of Disciplined Agile Delivery. He backs the Data Vault as well, and I work with him to make sure that the Agile principles and processes in the Data Vault methods are verifiable. Then we move ahead with an actual implementation in the sandbox environment, solving a real business problem so that the work is not thrown away. That can be done in a two-week time frame unless the scope is much larger, in which case we end up with a six-week time frame, with two-week deliverables within those six weeks, meeting the business goals.

The second way a company can do this is a little more relaxed as an approach. They can bring me in for a training class and then bring me back, usually a couple of months later. Some of their developers have tried some things – some of it has worked and some of it hasn’t – so they bring me back in to look at what they’ve done or tried, assess what they built, and also help the business understand where they want to go, or whether they’re ready for the next step of culture change within a small team.

It really is all about building the value through experience and getting the business as well as the IT folks to understand that, hey, they don’t need to do as much rework or maintenance anymore, that they can automate these things, and that they can spend more time, and have more fun, building new solutions that meet business needs – being more valuable to the business.

The third way, which is atypical of my role at customer sites, is the enterprise brings me in – the executives, the directors. They say, “We’ve heard about the Data Vault from some of our tech folks. We’ve done some reading. We like what we see. Can you give us a 45-minute executive brief on the value propositions, on getting there from where we are?” I’ll go through an initial brief. Then they’ll spend probably a couple more weeks talking about my presentation. Then they usually set up some conference calls or web calls where I answer questions. After that, they bring me in for either the two-week assessment and training, or they bring me back for sandbox environments to help them on their way.

At the end of the day, many of my larger corporate customers, and even government agencies, will end up signing a long-term retainer with me. By long term, I mean it could be six months; it could be a year. The customer I have in Australia at the moment signed me to a four-year retainer. That’s not to say I’m working full time with them all the time; that’s not the way it works. We manage cost that way. I work a couple of hours a month here and there, and sometimes I go on site for two weeks at a shot. That’s the nature of how this goes.

Eric: You brought up a really good point I think we should close with, which is that the ultimate value to the company hinges on the fact that if you don’t get your foundation right … And I think you’re spot on by pointing out the business processes in particular – not just the technology; technology is only a tool – but the business processes and the methods that your organization and your people use. If you don’t button down processes, you’re never going to solve the problem, and you will always be creating more problems each and every day, which means various people on your team are going to wind up putting out fires on a regular basis forever. You have to get that foundation set. Then, upon that foundation, you can build all sorts of interesting views of the world. That foundation is absolutely mission-critical, and it’s what most companies, frankly, do not have these days, right?

Dan: That is absolutely correct. Just as a last thought here, I came from an industry … I cut my teeth in government contracting with the SEI’s CMM, function point analysis, lean initiatives, cycle time reduction, Six Sigma – lots of mouthfuls of acronyms, but those of you listening probably understand what I’m talking about. This was in the 1990s. In those particular instances, we had no choice. Ultimately, even though our product was a data warehouse, it actually made the difference in whether somebody lost their life or kept their life – resting, of course, on the entire team working together. The tolerances for failure need to go down. They absolutely have to go down in the data warehousing, BI, and analytics space. That’s where management and culture need to change. They have to have the right foundation to do this.

I try to impress upon the organizations I deal with that if they don’t think they’re ready for extreme auditability, or for a culture change that requires a foundation or foundational presence – if they’re still hung up on this idea of extreme RAD and throwing everything in a data lake to see what sticks – then I’m not for them. I probably would not be a good fit. If the enterprise has had these problems for years, hasn’t been able to solve them, and wants to move ahead and mature on the next cycle, then that’s where I can come in and really help them get there.

What I took away from writing compilers and linkers and assembly language – and from things like changing skew factors on disk to read MS-DOS disks where the skews were different – is that I started to understand the architecture of the machine. If you think about it, what would happen, and it does happen sometimes, if a BIOS goes haywire, if there’s an error in a BIOS? Well, the software vendors can’t fix it; the computer manufacturers have to fix it, and it might take months. There was a prime example a couple of years ago with a graphics card where they programmed the BIOS to do something and it killed the machine, just stopped it dead. I mean, I have one of these old-style … It was the Sandy Bridge problem, and you can look it up.

Anyway, the whole thing I learned from this is that down at the level of the machine – when you’re talking interrupts and I/O and timing and frequency and round-robin scheduling and all of that – you start to understand how data really works, and you start to understand the nature of data integration. But more than that, every single BIOS, every single machine, works in a fundamentally governed manner. It all works the same. All machines have interrupts. All machines have disk – well, the definition of disk has changed; you’ve got in-memory and whatnot – but all machines have memory, and all machines have some sort of interactive device and display, and they all work the same exact way. So I learned that patterns are extremely important for data management and data governance. That’s where a lot of these foundations came from.

Eric: Yeah, that’s great stuff, and you’re giving me ideas for another project somewhere down the line, because that’s something I would like to dig into more as well, to help people better understand what those patterns look like. Because, like you say, if you get them … This is what I love about machines, and it kills me to see – what’s his name? – Elon Musk and even Stephen Hawking saying, “Oh, machines are going to take over the world and it’s going to be Terminator.” I’m like, “No, it isn’t. Don’t worry about the machines; worry about the people designing the machines and using the machines. That’s the problem.” That’s been a problem for thousands of years. There’s nothing new there. It’s a classic case of displacement, right? Of displacing responsibility. We’re trying to say, “Oh, it’s going to be …” I refuse to believe any computer, or even a Hadoop cluster, is going to wake up and go, “Aha, let’s take over the world.” There’s no chance of that happening.

Nonetheless it is fascinating to think of the actual transaction of data movement, which of course is what everything is predicated on. Every application that you use and every computer everywhere leverages these patterns that you’re talking about at the infrastructure level.

Dan: Yes, and that is what has made me so successful with the Data Vault approach: I buried all of that inside the approach, and everything I do stems from those principles. I’ve been, historically, very, very good at finding problems in software – and getting vendors to fix them, of course. At one time I was QA for Borland Software in the heyday, when they were the number one software maker on the planet right behind Microsoft. I guess that would make them number two, but anyway.

All of that’s buried in the Data Vault, so when I go in to teach customers and companies, I don’t shy away from any of these questions, because I do know the answers. I’ve been down that road. I understand how data works, and I know exactly how to integrate it, make it fly, and automate all of that, because in assembly you don’t have a choice. You make one mistake in assembly and your whole program simply dies; the whole system just keels over. You have to have very, very low tolerance for errors, and it kills me to see people going through college who don’t learn these principles, these foundational principles of data management and data integration. They say, “I don’t want to take a compiler-type class because I’ll never write a compiler.” That’s not the point.

Eric: That’s exactly right. I’m with you a hundred percent, man. At some point – I mean, you’re busy and we’re busy, too – but at some point we should circle back and do some kind of a program on that, and then see if you can leverage it into e-learning, or a class, or something. I think that’s a really, really good idea and, like you say, it helps people understand the foundation upon which they’re building things. If you don’t know what the foundation is like, can you really understand what you’re building? The answer is no.

Dan: Right, and so that is where the heart of my philosophy of data lies.

Eric: That’s good stuff. Folks, I’ve been talking to Dan Linstedt of Empowered Holdings, architect and inventor of the Data Vault. Thank you so much for your time.

Dan: Thank you.

Eric: All right. Take care, folks. This has been another episode of Philosophy of Data. Bye bye.

One Response to "The Data Vault and The Philosophy of Data"

  • Jane Roberts
May 10, 2016 - 8:36 am

    Great talk, guys.
Metadata and data lineage are hot buttons with me. There’s no reason to abandon them just because you want to be agile. I appreciate the comments about Hadoop and the cautions against the data swamp.
    I’m looking forward to hearing more of these talks.
