Inside Analysis

Data Governance and the Philosophy of Data

This interview is part of The Bloor Group’s research program, Philosophy of Data, which has been underwritten by IRI, The CoSort Company.


Eric Kavanagh: Ladies and gentlemen, hello and welcome to the Philosophy of Data. My name is Eric Kavanagh. I’ll be your host for today’s conversation. It’s the very first in our project: trying to understand what a philosophy of data is all about and to help people of all walks of life figure out what to do with data, what not to do with data and how to keep an open mind about this whole industry of managing and working with data. I’m very pleased to bring on our first guest, Gwen Thomas, founder of the Data Governance Institute, who for the last couple of years has been working with the World Bank Group to mature its data practices. Wow, long history of success stories before that. Gwen, welcome to the show.

Gwen Thomas: Thank you so much.

Eric: You bet. So you really were a visionary in the data governance field. I’ve known about you for at least twelve years or so. I remember reading some of the early stuff that you put out there. You saw way down the road that there were going to be some issues around data governance. Do you want to just talk about where that all came from and how you got into the business?

Gwen: Sure. Actually, I got into the business in a circuitous manner, as many of us in data do. I studied to be a composer and studied systems and how various pieces work together to create this whole, and then decided I wasn’t going to work in that industry. I went off to do some other things. During a little gig as a tech writer, helping a data architect in a banking situation to discover some information that was required for compliance purposes and validate that, I discovered that I had this uncanny ability to picture where different pieces of data were stored together. I started sketching out a data model and she started laughing at me and said, “They brought in a ringer, right?” I said, “No,” and just discovered that my training and thinking in music was the same work and skillset that was needed to work in data. My brain was lighting up those days, and I made the decision, “I’m going to do a left turn,” and entered this industry.

So I did that and then worked my way up through various activities within consulting companies as I learned the lingo. It was really during the Sarbanes-Oxley years that I realized that the big consulting companies were focusing on the technical aspects of managing data and were not giving the same attention to the governance aspects. So I formed the Data Governance Institute and worked as a consultant in that field. Evidently, the rest of the world decided it was important also because the discipline exploded. I’ve been so thrilled to be a part of it and to help bring some of the frameworks and foundational activities that have helped others do this important work.

Eric: That is really cool stuff. It’s interesting, I did not know that you trained to be a composer, but if you think about an analogy of music to data, especially to the concept of data governance or management, it really does require very careful orchestration, right? You have many different voices; you have many different roles. They all sound different, there are different things that they do, but if you do it all correctly, you achieve a certain harmony and that’s when good things happen with data, right?

Gwen: Exactly. When we talk about musical ensembles of any type, it’s easy to talk about the blue sky situation where everyone’s in tune and everyone’s coming in on time. Especially, say, the score of a movie. Most people may not even be aware of it as it’s happening because it’s working in the background, the way data management works in the background of our businesses, or government or daily lives. Let me tell you, one piccolo out of tune screeching in there, coming in at the wrong time, and everyone will be aware of the music. It is those same types of ungoverned situations within the data world that I and others in this field are working to manage.

Eric: Yes, that’s a great analogy. I’ve got a music background as well, and it’s funny. There’s a blessing and curse to when you learn how to understand intonation and when something’s in tune. As soon as you understand pitch and intonation, you can really hear when something is out of tune, and it’s painful, right?

Gwen: It would be interesting to know how many of the people listening to the podcast have some sort of a musical background. When I do conferences, and I’m able to ask from the podium, typically about eighty percent in data management venues have music training in their background.

Eric: That is very interesting. Of course it’s a very complex field, it’s a beautiful field. Music is invisible to a certain degree, right?

Gwen: Mm-hmm (affirmative).

Eric: Data is invisible to a certain degree I suppose. I’ve also noticed the philosophical angle as well. You and several other people I’ve talked to about this book really do seem to appreciate the concept of philosophy of data. With your data governance background in mind, I was trying to think of a good philosopher and quote to throw out to you, and I came up with one. It’s one of my favorites from Immanuel Kant, who had his categorical imperative, I think he called it, who had said, in terms of human behavior, “Act only on that maxim which you can at the same time will as universal law.” Isn’t that interesting?

Gwen: Repeat the last part.

Eric: “Act only on that maxim which you can at the same time will as universal law.” In other words, any action you take, do so only if you could issue an edict to the world that anyone else in your situation would do exactly the same thing.

Gwen: Oh, my. Isn’t that something?

Eric: Isn’t that wild? It’s very pithy.

Gwen: Yes, yes. You know what? That’s fantastic, especially for the field of data governance, because the tempting thing about data governance is to only look at the end result and say, “Oh, it’s all about rules and standards and definitions, therefore I should take an authoritative approach towards this. It’s all about controls and my perspective is the only one that counts, therefore what I’ve just decided needs to be done is that which everyone else should do.” I have heard some people with maybe a little less experience in the field part from that philosophy. If we look at it at the … Oh, I think I’m revealing my philosophy of data here, or at least of data governance. If you peel back from those rules that you are proclaiming, and you look at the processes to develop the data that is subject to rules and the systems or environments in which the data exists, you very quickly realize that there are some standards and rules that are truly universal, but many more that are situational. So you start inserting some conditional statements into your edicts and if you work back from that, you realize, “Oh, for every conditional environment, you have stakeholders.”

To make this real, suppose you need to display on a report the age of a certain person at a certain time. The rule might be that the age is going to be displayed in years or years and months and days, or even more than that. If you look at all of the stakeholders who are using the data as well as the caregivers for the data that are tracing it along the way, the caregivers will tell you, “Oh no, you don’t enter the data that way. You have to enter it as a date of birth, and then you’ll calculate it from there. If you enter it as a conclusion, then one second goes by and your data is wrong.” It’s important to know, the caregivers will say, from your stakeholders, are they rounding it to a year? If so, are they rounding down to the lowest year, or do they really need the months and the days? Because that will inform how we capture the data and how we architect it for capture, or if you’re in the medical field, maybe we really are going down to the minute and second for the date of birth.

So going from the Kant statement to issuing our will, we have to, in the field of data governance, understand our stakeholders, understand how the data will be used and for what purposes so we can, within our communities along the data supply chain if you will … Then we can agree upon the appropriate rules for what to do with it, what not to do with it, how to structure it and what kind of quality we should put against it. So there, Kant is explained in data governance now.
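[Editor’s illustration: the date-of-birth example above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not anything described in the interview: the `Person` record and both function names are hypothetical, “age” is taken to mean completed years (rounding down), and someone born on February 29 is treated as having a birthday on March 1 in non-leap years, which is exactly the kind of rule stakeholders would need to confirm.]

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    # Hypothetical record: store the fact (date of birth), never the derived age.
    name: str
    date_of_birth: date


def age_in_years(dob: date, as_of: date) -> int:
    """Completed years between dob and as_of (always rounds down)."""
    if as_of < dob:
        raise ValueError("as_of is earlier than the date of birth")
    years = as_of.year - dob.year
    # Subtract one if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (dob.month, dob.day):
        years -= 1
    return years


def age_in_years_and_months(dob: date, as_of: date) -> tuple[int, int]:
    """Completed (years, months) for stakeholders who need finer granularity."""
    if as_of < dob:
        raise ValueError("as_of is earlier than the date of birth")
    months = (as_of.year - dob.year) * 12 + (as_of.month - dob.month)
    # The current month only counts once the day of birth has been reached.
    if as_of.day < dob.day:
        months -= 1
    return divmod(months, 12)


# Usage: the same stored fact answers different stakeholder questions.
person = Person(name="Example", date_of_birth=date(2000, 6, 15))
report_date = date(2024, 6, 14)
years_only = age_in_years(person.date_of_birth, report_date)
years_months = age_in_years_and_months(person.date_of_birth, report_date)
```

[Because the age is computed at report time from the stored date of birth, it never goes stale, and the rounding rule lives in one agreed-upon place rather than in each data-entry screen.]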

Eric: I just love it. I love Immanuel Kant. You reminded me of a couple other things that I’ll throw in that I think are pretty relevant, especially with your background. One is another ancient philosopher I’ve loved studying for years, Lao Tzu, who of course wrote the Tao Te Ching. In one of his other works, he had this line that I read on a plane one day, and I just got chills thinking about it again, because it was so powerful. This is three thousand years ago that he wrote, “When the laws are complex, the bandits will abound.”

Gwen: Now repeat it. “When the laws are complex …”

Eric: “The bandits will abound.”

Gwen: That’s wonderful.

Eric: Isn’t that wild? Because you talked about trying to come up with some simple rules for what to do with data, but then you have situations where you need conditions, you need to give some further direction. Of course, this is how rules are created. You can even look, for example, at the Bill of Rights, and you will notice that the early outline of the Bill of Rights is very short. As they go on, as you get to twelve, thirteen, fourteen and beyond, they get much longer. There is this growing complexity over time, and then if you look at some of the laws these days, they’re extremely complex. So as a data governance professional, I’m sure you’re forever trying to balance simplicity with granularity, right? Isn’t that one of the biggest challenges?

Gwen: Absolutely. You start with the highest level rules, and one we see frequently is data is an asset. Another that we frequently see is that data will be fit for purpose or fit for use. Then it becomes more and more complex to extrapolate what that means. With the quote, if we replace the word “bandit” with a word that means those who are looking out for their own objectives, their own priorities, their own interests, now it becomes totally applicable to daily life in a large organization with a lot of data spread across multiple systems that require everything from ongoing maintenance to minor changes to transformational or even disruptive changes. What are the groups of people that are involved in that, and what is their natural interest here?

Suppose I were to give, as a requirement, to those who are architecting a system that includes information about people and including their age. Now suppose I were to give that requirement to a developing company, maybe even an outsource company who does not understand what the uses are. How are they going to make it fit for all purposes if they don’t know that? Their incentive is probably going to be to do the fastest, cheapest, easiest way of capturing information. Most professionals would know not to capture current age. They would know to capture it as a field that can be calculated. Trust me, no one is going to go beyond birth date to the more granular information unless they are explicitly told to do so. That’s going to influence not just the extra ten minutes for architecting that one field, but every bit of the IT calculations throughout the systems going forward. Who would introduce that complexity unless it’s important to do so? Heaven knows, if we pay them on a fixed bid contract, we just added work that they would have to absorb.

The only banditry that I see within the caregiver or IT and data management environment might be occasional lack of understanding of requirements, which influences budgets, but the piece that I think is applicable to everyone is knowing that this is highly sensitive information that could be used along with other personally identifiable information, PII, for identity theft or misuse of information, blah, blah, blah. Now we all have to be aware of what types of access controls need to be put in there: preventative, detective, corrective, et cetera. How do we need to treat this sensitive information all along the data supply chain? That gets extremely complex. It requires design work to include systems thinking, flow chain thinking in a manner that is quite different from design work. It takes those who are managing the entire data management and IT environment to really step back and look at the situation, apply either their own or the corporation’s philosophy about what to do with and not do with it, including control and governance, and adjust roadmaps and schedules and budgets accordingly.

One of the biggest skills for working with data is the ability to work with the system as a whole. Very high level, thirty thousand foot thinking, to understand what are those small but highly significant decisions that are embedded within work, such as, “Hey, how shall I structure a date field?” Know where those decisions are embedded. Zoom in and ensure that the right governance is put in place there. Trust that those around you are going to work to those requirements. Ensure that your system has a way of verifying. Trust, but verify. Then have the flexibility to jump back up to your five thousand, ten thousand, thirty thousand foot view, and move on to the extra effort. We in data management tend to get exhausted a lot by that up and down work.

Eric: No, that’s a really good point. You’re speaking again to the importance of context awareness. It made me think of something that I learned from a teacher at TDWI a number of years ago, Maureen Clary. She gave a class called Power and Politics in Data or something like that.

Gwen: I took that same class. She’s brilliant.

Eric: Isn’t that a great class? She puts you in different perspectives and different roles and has you do role playing, and you realize that, “Hm, it’s not all peaches and cream as a manager,” or “It’s not all just grunt work as a frontline manager” or something like that. You get much more appreciation for the complexity of decisions that people need to make in certain roles, and therefore you’re more open minded about listening to people and talking to people and asking enough questions, if you are, for example, a systems designer. To your point, this data, if it’s used in systems, it must be machine readable.

Gwen: Right.

Eric: If it’s not machine readable, then it’s not really doing what it’s supposed to do. We have things like fuzzy logic and semantics. Semantics gets pretty fuzzy by definition, because you have different words that mean different things in different contexts. Semantics can be good for marshaling data into a certain area for analysis, but if you’re talking about the data that runs systems and that spits out checks, for example, or work orders or things of that nature, then you really have to be precise and you have to think through, to your point, about a date and a birth date, “Is this a dynamic field or is it a static field?” Your age changes every second every day, right?

Gwen: Mm-hmm (affirmative). Also, you have to distinguish between technology, which are the containers of data, the pipes, the pumps, the filters, the spigots that move it out into reporting technology, and the content. Data is the fluid that is moving through all of those pipes. If you are in charge of, let’s say, a juice plant, you know that you have different skillsets for managing the containers and the assembly lines and the trucks that contain them and the crates. You have different specialty for dealing with the orange juice that’s going into it. Early in the data governance field, we used that analogy a lot to point out that you would never ask an electrician or a mechanic to test the quality of the fluid going through the pipes. You would use a chemist for that. Likewise, there is this entirely different field of data management and governance that needs to work side by side with the IT workers.

This was fairly revolutionary at the time but is now common practice across most IT organizations that either have data management specialties within IT or have reorganized so instead of business and IT, it’s business, data and IT. It’s very gratifying to see that so many in the IT world are getting it. Now, for consumers of data, it may not be clear to them who needs to do what, and if we move all the way up to executive branches, it’s important for those of us who work in the field to help them understand the complexities of working with the information that drives their organizations or is an output from their organizations.

Eric: You brought up a really good point about the old way of looking at this environment that we work in of business and IT, a new way which appreciates and understands that IT consists of the tools with which we work with data. Data is its own separate plane. You had mentioned before we did this call, something in reference to our good buddy Rick Sherman from Athena IT Solutions, about this concept in a call we had the other day where he referred to a silent disruption.

Gwen: Love it.

Eric: Which I felt was so interesting. He was talking about how now there has been this significant change, a sea change in corporate America, wherein many more people naturally, intrinsically seem to understand the importance of data and the nature of data. Whereas, five, ten years ago, that was not the case. That’s a big change, right? That’s a big difference.

Gwen: Mm-hmm (affirmative). I completely support this. Although, I go back to the mainframe days. The importance was critical, but data workers were baked into the teams and they were also highly controlled. Of course the data controls were there, but move past the mainframe days and into our current environment, and it kind of got lost for a long time, but I’m seeing a lot of the awareness, much as Rick is. As a matter of fact, a wonderful situation at work. As you said, I’ve been helping the World Bank Group, especially the International Finance Corporation, which is the private sector. I’m helping them evolve their data management practices. Just very recently our CIO asked several of us to help plan some long-term and short-term evolution that will feed our funding and our prioritization and work plans. There’s twenty-five interdependent systems that all need to have the data flowing through them into a common environment. Her instructions just showed how wonderful it is to have this new vision.

She loves the visual of an iceberg with a certain portion above the waterline, and she said, “Now, coming from that, I want you to show those projects that are going to be visible to our end users within the corporation. Show the spending for that and show the responsibilities for it. Then under the water, let’s put that in too, and the bottom part will be those core platforms and technologies that are required to make this work.” That would be a lot of Rick’s area. The middle layer, just below the waterline, is the data management and governance. The work that it takes to ensure that the data is flowing through the systems in the right way so it can get above the waterline, and more importantly, that it is shaped and controlled and governed in the way it’s needed.

So she said, “I want you to go to the funders with these three layers and show them ‘You are probably aware of the top third, but these other two layers below the waterline are absolutely critical. Whether you have been aware of them or not, please understand that they are critical to get what you need. By the way, in those two layers, there is a portion of work that is business as usual. Some is enhancement. Some of it is absolutely transformational and disruptive.’” Our job once again includes a lot of that education. I was just so thrilled that that’s the way she wanted to present it. I think I’ll go back and share the term “silent disruption” with her. It’ll help her make the case.

Eric: Yes, it’s such an excellent analogy, too, because you think of the tip of the iceberg as the interface that some business user sees or the report that they receive at a particular point in time. If they really appreciate the amount of data and data process and technology that’s going on underneath the waterline, you can start to understand why it takes so long to get to a certain point in time, where there are such data quality issues that are deeply embedded in older systems. You know, from being in the industry for so long, especially when you have mergers and acquisitions or when you try to consolidate data from multiple different data sets, there are so many things that can go wrong. One small field out of alignment, and you get this propagation of errors that comes into play, right?

Gwen: I’m getting a wonderful visual here. Maybe a cartoon. We should get someone to draw it. Two icebergs next to each other and an executive standing on the top saying, “We’ll do an M & A. All we have to do is build a bridge between the two icebergs.” Totally ignoring what’s happening under the waterline.

Eric: Wow. That’s a really good point. A lot of people don’t appreciate just how heterogeneous and frankly unique so many information systems are, especially some of the older ones, but even some of the newer ones. There are changes that happen all the time, especially in some of these cloud-based solutions that will just choose to change the business model or the business process or something. If it’s a mission-critical system for you, and you weren’t alerted to the change, that could be somewhat disruptive.

Gwen: Mm-hmm (affirmative). Oh, yes. A lot of the under the waterline work that has taken place in data governance and architecture over the last ten years has been to arrive at methods of working under the waterline that will support business disruption. I’m going, of course, to Service-Oriented Architecture (SOA) as a philosophy and practice. The best description I’ve ever heard of that is moving from custom-molded sculptures to a LEGO approach. Standardize the LEGOs so that they can be disassembled and reassembled. That is certainly a disruptive approach to working with systems, and it can be more expensive for one-off efforts. It requires a balance of: are we doing something quickly and cheaply right now, or are we building for flexibility. Those are the kind of discussions that those of us who work in the data community, we get to have these all the time. I feel like I work in a philosophical environment. To those who don’t care about this, maybe it seems very boring and dull, but oh my goodness, the drama that occurs in making these kinds of decisions.

Eric: There’s another observation, I think you’ll appreciate, especially with your work these days. I remember doing some research back in 2000 and 2001, and I was on the World Bank website looking at the gross domestic product for all the different countries that they had listed. They had data in all the major countries, all the industrialized nations, for what their gross domestic product was. I remember that I had researched this for quite a period of time and would go back to the website and double check and triple check, like, “Was that a typo? Are they sure about this data?” What it showed was that the United States in 2000 had a GDP of roughly $10 trillion, and it said that the Russian Federation’s GDP for 2000 was something like $330 billion.

I remember thinking to myself, “How on Earth is that possible?” These were the two superpowers until the Soviet Union collapsed and then a lot of strange stuff happened that was a tremendous disruption to the global economy, but I’m asking myself, “How is it even remotely possible that one of two superpowers would be thirty times the size of the other? That just makes no sense.” The data that you see, that’s presented in the report, just seems way off. After I thought about it for a long time, I drew the conclusion that a big part of the reason why is because so much that took place in Russia was under the radar, was on the black market. There were so many rules by the government designed to take money from people officially, that they would just hide it. This is a very interesting dynamic it seems to me in terms of reporting and trying to ascertain information about what’s going on.

Just as a quick related story, I remember when the sequester went through here in the United States. I heard people in the government telling me that there were so many incredibly clever machinations going on of departments and groups basically hiding money. Finding any way they could to not lie, but not overtly reveal all of the assets that they had access to. It just kind of gets me back to this whole concept of rules and of how you enforce rules and how you govern operations. Let’s look at data as money. How do you get the information from people in a way that is productive, and how do you set up rules that do not encourage people from just hiding their data or keeping it under their tent so to speak? I’m curious to hear your take on all that.

Gwen: Well, fascinating. As I was hearing you talk about this, I was also picturing systems within systems, because you’ve got aggregated data from two entities, two countries, which are actually comprised of multiple systems. First of all, you concluded that your understanding of what that data was and where it came from exactly maps the understanding of those who presented it. I’m guessing that there was some guidance published, but not enough to satisfy you. So you concluded, “Oh, there’s the official system and then there’s the unofficial, unreported system within it.” Then you started breaking down even more from that. As you went to the sequester, I heard you talk about departments and systems and groups and people within that aggregated and you were hinting at different intents and objectives for each of those systems within the systems even down to the individuals within them. That is just a fantastic way, a holistic, synthesized way of thinking about how a single data point is created throughout multiple environments and ultimately, although there are IT systems involved, there are people behind this.

Now two of my best friends are a married couple. Steve Jordan is a child psychologist. Johanna Jordan is a family counselor. Over the years, we have all almost joked, joking/not joking, saying that we’re in the same business as working with people within systems and the counseling that is required to create win-wins for everyone and still meet the ultimate objective. It really influenced my work here, and also Buddhism has influenced this. If I had a choice, I would be a peace worker. Earlier in my career, I did not find a path towards that, but I did realize examples not dissimilar to that that you described. The world is full of unrest and hurt feelings and conflict and personal angst. That angst in individuals out in the world came from somewhere.

Just like ripples in a pond, I started following them to the center and the one that I saw is, “Oh, someone out in the world is upset. They had a bad day. Oh, they work in an environment with data. Oh, there was a conflict and their project just discovered that they have to add X number of unfunded hours to accommodate the minute and second for the age of something, and it’s not their fault. They didn’t know they had to do that.” Then you work your way into requirements and the systems and all the way down to the beginning where there was an undocumented rule, requirement and a lack of communication among all of those who are working with that data. I realized, “Oh, I am here. I happen to be sitting at the absolute center where pebbles are going into a pond. If they are not dealt with in the right way, those ripples will become ripples of unease and confusion and bad feelings, and they will ripple eventually out into the world.”

I took my practice to the place I was in and said, “I will attend to my field in a way that a peace worker would attend to it.” So yes, the business objective is make sure that the report shows the information we need, but the way that the caregiver community works with that can be influenced. I was so fascinated by this that I spoke with several other colleagues in the data community, people whose names you would recognize, Len Silverston and Anne Marie Smith and others, and we all sort of confessed to each other at a conference that we had all been doing the same sort of peace worker work within data management and governance. We’d had an unofficial community over the years to reinforce each other and even arrive upon common language that is kinder and gentler and more human-centric to help those involved in caring for the data to do so in a way that realizes objectives and also does not add unnecessary angst into our lives and the world as a whole.

Eric: This is great stuff. I love that you phrase it in this context of the data caregivers, because really, what you’re doing is you’re helping people reconcile and understand. I think that’s critical in this whole process as opposed to an old school perspective of “They had better do a good job or they’re fired,” or something. No one wants to live in fear. I think anyone who works with data is naturally going to be someone who is curious and wants to find out. You want to know how many sales did we make. You want to know what was the GDP. This is usually a very, I think, positive force, the desire to learn, to understand. I think that’s a wonderful perspective and a wonderful philosophy to view it in this caregiving perspective, and I have to say I think that your desire to be a peace worker is much needed in the halls of enterprise America these days, right?

Gwen: I suspect that there are many more who just do not acknowledge that their philosophy towards their work is similar.

Eric: Yes, and it’s really important, too, because to me, everyone should have a voice. Everyone does have a voice and they should have a voice. They should be heard. I think most problems in companies would not become big problems if the right person had felt able and willing to speak up at the right time. That’s a cultural issue, right? That’s a leadership issue, it’s a cultural issue. I saw this great quote on LinkedIn the other day. All these pithy quotes are flying around, like on Twitter. Some lady put up a post that said, “When I talk to managers, I get the feeling that they’re important. When I talk to leaders, I get the feeling that I am important.”

Gwen: Oh, I love that.

Eric: Isn’t that great? A good leader knows that everyone on their team knows something. A good leader knows there’s going to be conflict between these two people and those departments or whatever. There are natural conflicts, there are accidental conflicts, there are personality conflicts. There’s always going to be conflict out there.

Gwen: Peace is not lack of conflict. It’s managed conflict.

Eric: That’s right.

Gwen: That’s my quote.

Eric: Go ahead, say it again.

Gwen: Peace is not lack of conflict, it’s managed conflict.

Eric: Yes, that’s great. Let’s think about it, when you have peace, when you’re in a peaceful environment, you can reflect, right? You can even think of the words and what they mean in your analogy of pebbles in the water causing ripples. When those ripples go by, the reflection of the water gets disrupted. You can’t see, right?

Gwen: Mm-hmm (affirmative). Oh, good.

Eric: When it’s perfectly placid, then you get this beautiful reflection and you can see exactly what’s above you in the water.

Gwen: You have the time, the insight to be proactive, to spend your time planning for emerging capabilities, for evolution, for transformation, instead of reacting to the choppy water.

Yes. That’s the objective. Those of us who work near the epicenter of this, we recognize that we have limited ability to deal with some of the really important stuff: the privacy of data along the information supply chains, the ethics of using data. We may not be close enough to the actual usage to influence that, but we may be able to provide calm enough waters so that our counterparts who are focusing on that or focusing on how do we move the water into the technology systems and in doing so, how do we ensure that the objectives and the requirements around usage, privacy, ethics, quality, how do we ensure that those travel along with it so that the other counterparts can deal with those? We have to trust that that work is happening. We have to build communities, bridges between our different data-related communities, so hopefully we can help with the hand-off along the way. Some of our biggest gifts are to try to keep the water as calm as possible so that this other work can happen.

Eric: That’s a really good point. I’m realizing we should touch on at least one other subject area here because there are things that are changing dramatically in our world of information right now. Of course, the Web’s a huge part of that. Mobile is a huge part of that. You hear about the Internet of Things, the connected environment that we have. It’s a huge deal. We could talk all day about different aspects of it, but I guess what I would try to work through with you is how do we take the best practices of data governance that you and others in the field have ascertained over the years and curated and refined, and how do we make sure that the ethics baked into those practices get absorbed and understood and woven into all of these new big data information systems, things like Facebook and Google and LinkedIn and all these other mobile carriers and on and on? What can we do to ensure that issues like privacy and data sharing are respected going forward, and what can we do to basically pass the baton as this new area unfolds around us?

Gwen: Well, I’ll tell you, as you described that, it seems like such a big scary feeling that my first impulse is to say, “I don’t know.” Then a little voice inside my head says, “Well, that’s exactly data governance in 2003.” We’ve evolved there, so obviously it is possible. I think a lot of the answers were embedded in the way you asked the question. Obviously systems don’t yet think. They will, but the technology is ethics neutral. It delivers what is baked into it. One stream of this work is to identify what types of rules and capabilities and controls need to be included in the technology. The bigger part of that is communication between the various communities that develop the systems and the reporting and repositories, et cetera, and upstream from that is common understanding of intent. I’m trying not to use the word ethics here, so I’ll talk about intent instead.

That is all about the people. All about the people. Any framework for working on this has to be based upon the stakeholders, whether they are end users or working their way upstream through technology chains and data chains and understanding what are the most universal requirements and where those … I think I’m coming full circle in our discussion here, where a universal requirement, such as do no harm through the displaying of personal information inappropriately, how does that transfer to requirements along chains that have to be baked into the system, and what can we do even upstream from that foundationally as we architect data, as we shape it, as we shape the containers that the content would go into? I think you may have just described the work of a whole generation here to look at our IT systems, our data systems, from a humanistic standpoint and see how caregivers and users along the way can align their work, and how we can align with the technology vendors then that are creating systems and even more importantly, creating tools that are used for integration and data masking and all of the other under-the-water technology capabilities. How can we work together to ensure that the tools provide the level of protection that is needed as we move into this even more sharing environment?

Eric: Yeah, that’s really good stuff. There is another topic that should probably be thrown in here. Maybe we can end with this or almost end with this for this conversation. It’s this whole movement of open source. This is something I’ve been following back since 2005. You mentioned SOA, service oriented architecture. I remember doing the research while I was at The Data Warehousing Institute and thinking to myself, “Wait a minute, open source software combined with service oriented architecture, that doesn’t sound good for big, closed source software vendors.” I’m thinking that’s handwriting on the wall, and sooner or later you guys are going to be in trouble. Obviously Linux came along and became an industry standard, largely because of IBM’s investment. Then of course Hadoop comes along from Yahoo, and they turn over this Hadoop distributed file system and all of this innovation spins out of that.

Now we have all these vendors working around this foundation that is open source. I really see it just spreading out from the operating system to what you would call the data operating system and on out from there to really give visibility, because as you know, part of the complexity and part of the challenge, especially in trying to merge data from multiple systems, is that the different systems have proprietary business logic baked deep into them, which can be very difficult to surface. I think this is good news in general that the open source movement seems to be reaching out to the point where it’s going to subsume just about everything sooner or later, which I kind of think is good news for transparency in general, but I’m curious to know what you think.

Gwen: I have to say I agree with you. I don’t want to bash big systems. They are absolutely needed, but to use the metaphor from the automobile industry, which probably half of our audience already knows where I’m going with this. We went from completely bespoke automobile assembly by craftsmen to the age of the conveyor belt. Ford, of course, is best known for that. He, within a few years, drove I think it’s eighty percent of the cost down, something like that. He did it by building large inflexible assembly lines where you had to let quality issues flow past you because you couldn’t stop the assembly line. It was wonderful. It’s studied in business books. It’s great, but then they became so big and so inflexible that they were no longer meeting all the requirements. So then the Deming world, the Toyota way, changed and now instead of one big monolithic assembly line, you had component work going on. The experts came to the vehicle bringing their specialized tools, doing their specialized pieces. The automobile was seen as one system composed of smaller systems and it was the tools and connectors that made that possible.

So likewise, yes, we still have the giant systems out there, but the heroes of the current revolution here are the new tools, the new connectors, the new capabilities of dealing with connecting systems within a large system, of managing the data, masking the data, integrating the data. Long list of capabilities here. These may not get the same press as the big systems. Certainly the vendors of them and those who know how to train on them, they won’t have the same marketing budget as the big systems, but they are enabling the transformation and disruption of the way our systems work towards each other. So those of us who work under the waterline are very much aware of how important these vendors and specialists and techniques and processes are to enable this. I do suspect that in the not too distant future, those who manage the budgets are going to become just as aware as many of us are.

Eric: That’s great. You’ve managed to come full circle perfectly. The silent disruption strikes again.

Gwen: Oh, good.

Eric: Wow. Well, this has been wonderful. Folks, we’ve been talking to Gwen Thomas, the data governance visionary doing great work all over the world. Thank you so much for your time.

Gwen: This has been fascinating and I cannot wait to hear the others in the series and to compare notes with them also.

Eric: That sounds good. We’ll be sure to do a webcast at some point. Thank you so much, folks. You’re listening to the Philosophy of Data.

3 Responses to "Data Governance and the Philosophy of Data"

  • John O'Gorman
    December 14, 2015 - 11:04 am

    Musical Composition? Immanuel Kant? Bridges on Icebergs? In a discussion about Data Governance?? How did I miss this broadcast? 😀

    There is a theme that runs through this conversation, related to the comment about the ‘up and down’ nature of the work that data governance professionals do when it’s being done right. Sort of an ‘as above so below’ principle. Applying universals that work ‘on the ground’ and verifying that any changes to the abstractions still work in the field is the best way to go. If you can demonstrate to the various tribal factions that they not only gain a connection to the ecosystem but (in the context of those same universals) they can also keep their local language, you’re half way home.

    Of course, how you cover half of the remaining distance depends on your interpretation of Zeno’s paradox. The work is never ‘done’ and progress, not perfection, is the objective.

    John O’Gorman
    Principal and Chief Disambiguation Officer
    Quantum Semantics Inc

  • Gwen Thomas
    December 14, 2015 - 2:32 pm

    Zeno’s paradox! Good one, John 😀
    And yes, ‘as above so below’ is a much nicer way of describing jumping from 30K foot view to in the trenches and back again.
    Do you have any other examples you use to make this invisible field of data management & governance visible to our stakeholders?

    • John O'Gorman
      December 14, 2015 - 3:17 pm

      Hi Gwen;

      Your ‘out of tune, poorly timed piccolo’ is a great way to describe what we turn our attention to, and when. If it’s all properly orchestrated and balanced it usually means the band has agreed to standards while simultaneously appreciating that the pitch of the piccolo cannot be adjusted the same way as a violin or a French horn; and that they need to be.

      The ‘as above so below’ comment comes from my assertion that a proper abstraction of the constituent components of the ‘juice’ flowing around the applications and platforms can deliver a near seamless experience of, let’s call it, “high fidelity” information. That abstraction is based on the idea that raw information is what systems capture, enhance, store and distribute as data. The invisible part is the set of natural classes to which that “information in the wild” belongs. When information is captured correctly, and because the classes are universal, it is easy to draw a (relatively) straight line between what is acquired, how it is organized (governed and accessed) and how much easier it is to turn into *value added* information.

      This is longer than I anticipated, but to wrap up, most IT projects fail on semantics, that fuzzy part you and Eric talked about. While the user is talking ‘juice’ the technical guys are translating into pumps, pipes, tables and object classes, and that’s assuming they are using the same language. When I demonstrate that they are, as you put it, “on different planes” but actually using universal classes of names and terms, things get a lot more efficient.

      John O’
