Dr. Geoffrey Malafsky
CEO, Technik Interlytics LLC,
Chief Data Scientist, The Bloor Group
An Enterprise Data Strategy (EDS) is the data-centric companion to an organization’s business strategy, technology roadmap, security plan, and other high-priority facets of operating a modern business. Data has risen in importance to the point of deserving its own corporate plan because of several factors. One is Big Data technology, which provides powerful computing systems at low cost and enables storage and processing of the full range of organizational data. Another is the growth in the market value of data for targeted marketing, customer experience management, higher-efficiency operations, and many other Business to Consumer (B2C) and Business to Business (B2B) uses. Finally, legal and regulatory requirements, and their associated penalties, have introduced a serious threat to business costs, reputation, and sales for even inadvertent mishandling of people’s privacy and health data. This is highlighted by the impact of the European Union’s new General Data Protection Regulation (GDPR) on non-European companies: it covers personal data about individuals in the EU regardless of where the company processing the data is based, and it has shifted the notion of data ownership toward the person the data is about.
These factors create a significant change in the value and risk of an organization’s data. Consequently, an EDS is a foundational component of an executive team’s plan for how they operate their business profitably, legally, and with managed risk. In the past, it was common for only some large organizations to expend the resources to develop and implement an EDS. This made sense when an organization could process its data as needed, without the overhead of coordinating data systems and activities across departments and use cases, and accept the resulting outcomes and risks. Even where an EDS was made, it was predominantly policy and future roadmap rather than executable guidance with concrete metrics.
Each of the factors mentioned is enough of a disruptive force to warrant revisiting an EDS, but in combination their impact is substantial and a full modernization of the EDS is needed. Indeed, it would be wise for all organizations to start fresh and make an EDS without prior assumptions. Several aspects of the EDS may not have existed in prior versions. Rather than try to insert key items like executive roles (e.g., Chief Data Officer) and Big Data systems (whose low-latency clusters are incompatible with some traditional management and security approaches), it is better to build a coherent strategy that incorporates all current key issues.
There are three main sections in an EDS:
- Executive Guidance
- Governance
- Implementation
Executive Guidance should describe the goals and primary principles from a business perspective. The goals are typically framed as a mission statement, vision, or goals and objectives. Common goals are to enhance the interoperability and sharing of data across business groups to enable more granular analysis, greater timeliness, and improved operational efficiency. These should be directly linked to current business initiatives, especially those that have overt executive-level support such as new product ventures, reorganizations, and adjustments made in response to external reviewers’ suggestions. It is important that the goals not be too abstract, or else there is a risk the strategy will be viewed as a pro forma policy that busy workers can ignore.
The guiding principles should include the reasons why this strategy is being done now. That is, the impetus for the strategy should be embedded in the principles. For example, a principle requiring comprehensive security over a managed total lifecycle of data assets may arise from the organization proactively responding to publicized data breaches at other organizations. Some examples from my prior projects are listed below.
- The core value of Net-centricity is the data and information it distributes, not the connectivity, hardware, and software that enable the distribution.
- The entire lifecycle of data and information must be resourced and managed.
- Accountability and compliance for achieving data and information interoperability, discoverability, accessibility, understandability, trustability, and security will be metrics-based.
- Identify usage of data assets by business services and processes.
- Identify the lineage of data assets and position them within the information supply chain by linking them to their source data elements and to the other data assets created from them.
- Assign ownership of data assets and rate them from the data quality perspective.
One intrinsic issue is inter-department sharing and collaboration, which may be one of the more difficult challenges and makes the EDS as much about organizational dynamics as technology. This is realistic since many data challenges are inseparable from organizational and workflow issues. That is, the EDS should embrace the need for an integrated Organization-Process-Technology (OPT) perspective.
Governance has historically been more form than substance, overcome by the daily pressures of getting work done and by lackluster inter-departmental collaboration. Well-thought-out frameworks with roles and responsibilities were frequently championed by consultants and explained at conferences. Even when they were adopted, the rigors of following their procedures, coupled with new issues draining managerial attention, led to minimal enforcement. Yet governance is simultaneously critical to achieving the EDS’s goals and widely considered expendable at the working level. This portion of the EDS deserves interdisciplinary focus and ownership.
The first item to be defined for EDS Governance is an Executive Steering Committee (ESC). This should be composed of senior managers from all major departments influencing or influenced by the organization’s core assets. However, this is also the first part of the EDS that requires a judicious, organizationally sensible approach. While widespread support and inclusion are desirable, they should not be pursued at a significant risk that important actors will cease to participate or will try to use the ESC as a forum for their own issues. Either outcome will cause the ESC to stop functioning and lead rank-and-file members to believe the EDS is no longer important. Hence, determining the proper membership, meeting schedule, and actual control of the ESC requires careful planning.
The other primary Governance components are intertwined and should be defined in a coordinated manner. These include roles and responsibilities, performance metrics, and regularly scheduled program reviews for both technical and business outcomes. To coordinate properly, Governance has to determine the balance among the competing interests of capabilities, cost, risk, and prioritized needs. Layered on top of this assessment should be flexibility for adapting to performance metric outcomes, new business opportunities and challenges, and market-driven, ever-changing technology options. One primary reason to modernize the EDS is that the current technology market is changing rapidly, offering significant options for greater capability at lower cost and reduced risk in substantially new product forms. This means an inflexible EDS, bound to a lengthy development process and not intended to be changed for ten years, is obsolete before it is even completed.
There are multiple sets of roles and responsibilities promulgated in the data management industry. They share common types such as a leading senior manager, curators of data files, engineers who process the data and handle databases, and business domain analysts. One of the most common roles is the data steward, who is responsible for implementing governance over the data lifecycle. The challenge with this should be obvious. There are many types of data coming from many groups being used in many systems for many purposes. Even having stewards assigned in each group will not by itself overcome the hurdles to getting sufficiently detailed ground truth on all of these aspects to shepherd the data successfully. This is one of the main points where corporate data strategies, and indeed other corporate business and technical plans involving data, go awry. It would take a rare person with broad technical expertise and the charisma to overcome a multitude of organizational silos to do this job effectively.
The hidden complexity of the data steward’s job is one reason successfully implementing an EDS in a real environment is difficult. In addition to stating responsibilities, the EDS can also delineate the human competencies needed as Knowledge, Skills, and Abilities (KSA). Worker competency is a large field in itself, with leadership from both government and industry groups defining standard competencies and KSAs for career paths and job functions. This includes the associated learning and task performance guidance and metrics. For example, the O*NET guide, sponsored by the Department of Labor, defines the following for Clinical Data Managers.
- Tasks: design and validate clinical databases including logic checks; process clinical data; generate data queries; monitor work productivity or quality.
- Technology skills: database reporting software; object-oriented development software; analytical software.
- Knowledge: English language; computers and electronics; mathematics.
- Skills: critical thinking; active listening; reading comprehension; speaking; writing.
- Abilities: deductive reasoning; information ordering; oral comprehension; oral expression; written comprehension.
- Work activities: interacting with computers; getting information; processing information; documenting and recording information.
- Detailed work activities: evaluate data quality; create databases; prepare data; analyze data.
This is an impressive list of KSAs and work tasks for a job with a median salary of eighty-four thousand dollars in 2017. In practice, I have not seen this job filled by people having the full range of KSAs stated. However, many enterprise data management projects are structured on the assumption that this critical job is performed well and consistently, which exposes them to a high risk of poor outcomes. This is a good example of what is required to succeed with an EDS: objective, detailed assessments of which integrated OPT tasks will need to be done and what a realistic personnel plan must include, such as assigning a multidisciplinary team as overarching stewards rather than relying on individuals spread among different groups.
A new role has become popular due to this challenge: the Chief Data Officer (CDO), who is meant to have adequate organizational clout to guide, manage, and monitor all key corporate data issues. However, as many Chief Knowledge Officers (CKO) experienced when that role became prominent in the heyday of corporate Knowledge Management (KM), truly influencing daily work activities entails much more than issuing guidance and controlling project budgets. There are many entangled, undiscovered, deal-breaking issues within the OPT environment that must be sorted out before success is realized.
The EDS is a roadmap to data success through the real world with all the people, technical, and business challenges along the way. A successful CDO must prioritize competing interests in security, interoperability, Big Data collection, new monetization plans, and personnel training based on what they can do, what they need to do, and what will make the best impact while rallying their subordinates to do the hard work day by day. Step one is recognizing it is hard work and won’t be done in a day or even one hundred days but must become part of the business’s workflow fabric.
Roles and responsibilities should be developed in conjunction with program-level performance metrics and scheduled reviews. Examples of some typical activities from my clients are listed below. For each there should be a clear definition of its scope and activities, the concrete outcomes desired and ways to measure them, and the roles needed to perform it. Several activities will likely need more than one role, which not only reflects the reality of a modern data environment but also enhances the likelihood of continued success by avoiding artificial silos.
- Maintain Data Models
- Maintain Information Use Cases
- Perform Data Quality Assessments
- Perform Business Impact Analysis
- Rationalize Information Assets
- Manage User Access
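One way to keep these definitions honest is to record each activity in a structured form and flag the parts that are still undefined. The following is a minimal sketch in Python, not a prescribed tool: the class name `GovernanceActivity`, its fields, and the sample scope, outcome, metric, and role values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GovernanceActivity:
    """One governance activity with the attributes each activity should define."""
    name: str
    scope: str = ""
    outcomes: List[str] = field(default_factory=list)
    metrics: List[str] = field(default_factory=list)
    roles: List[str] = field(default_factory=list)

    def missing_parts(self) -> List[str]:
        """Return the names of the attributes that are still empty."""
        parts = {
            "scope": self.scope,
            "outcomes": self.outcomes,
            "metrics": self.metrics,
            "roles": self.roles,
        }
        return [label for label, value in parts.items() if not value]


# Hypothetical entries: the first is fully defined, the second is not.
activities = [
    GovernanceActivity(
        name="Perform Data Quality Assessments",
        scope="Core customer data sets",
        outcomes=["Quality rating per data set"],
        metrics=["Percent of records passing validation"],
        roles=["Data steward", "Business domain analyst"],
    ),
    GovernanceActivity(name="Manage User Access", roles=["Data engineer"]),
]

# Map each incomplete activity to the parts it still lacks.
gaps: Dict[str, List[str]] = {
    a.name: a.missing_parts() for a in activities if a.missing_parts()
}
```

A report like `gaps` could be brought to a scheduled program review so that activities lacking a scope, outcomes, metrics, or roles are visible rather than assumed.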
Performance metrics are key to continually determining how well the project is progressing and fueling decision-making on course corrections. It will be normal for changes to be needed and this should be embraced as good management. All aspects of the project plan should be open to review and adaptation. Key Performance Indicators (KPI) are popular as the performance metric framework because they can effectively communicate to managers in many layers and fields. Each KPI should be attached to a specific quantifiable and measurable objective. KPIs can be for aggregated results such as customer retention time per web site interaction based on using data to improve customer experiences, or for internal management such as percentage of core data records secured at-rest and in-transit with encryption.
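As a concrete illustration of the internal-management KPI mentioned above, the percentage of core data records secured at-rest and in-transit can be computed from a simple inventory. This is a minimal sketch under stated assumptions: the `RecordStatus` fields, the sample inventory, and the 90 percent target are hypothetical, and a real program would draw these facts from its own systems.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RecordStatus:
    """Hypothetical inventory entry describing one data record."""
    record_id: str
    is_core: bool
    encrypted_at_rest: bool
    encrypted_in_transit: bool


def encryption_coverage_pct(records: Iterable[RecordStatus]) -> float:
    """Percentage of core records encrypted both at-rest and in-transit."""
    core = [r for r in records if r.is_core]
    if not core:
        raise ValueError("inventory contains no core records")
    secured = sum(
        1 for r in core if r.encrypted_at_rest and r.encrypted_in_transit
    )
    return 100.0 * secured / len(core)


def kpi_met(value_pct: float, target_pct: float) -> bool:
    """A KPI is met when the measured value reaches its quantified target."""
    return value_pct >= target_pct


# Hypothetical inventory: four core records (three fully secured), one non-core.
inventory = [
    RecordStatus("r1", True, True, True),
    RecordStatus("r2", True, True, True),
    RecordStatus("r3", True, True, True),
    RecordStatus("r4", True, True, False),   # not encrypted in transit
    RecordStatus("r5", False, False, False),  # non-core, excluded from the KPI
]

coverage = encryption_coverage_pct(inventory)  # 3 of 4 core records secured
on_target = kpi_met(coverage, target_pct=90.0)  # assumed target of 90 percent
```

Tying the KPI to an explicit target keeps it attached to a quantifiable objective, and the same two functions can be rerun at each program review to track the trend.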
Implementing the strategy begins with assessing the applicability of, and level of effort required for, the many techniques available for managing and processing data over its lifecycle. Most of the common ones will be used in some manner, but it is important to maintain the balance between level of effort (LOE) and tangible benefits for each technique. For example, a good EDS program will capture the current ecosystem of data systems and the workflow of data collection, processing, and distribution in Enterprise Architecture (EA) products. However, in some cases prior projects have tried and failed to gather this information accurately and in detail because of the lack of documentation and the high LOE needed to identify it even for just one system. The correct balance is to define two sets of EA products: those needed for all systems, without which interoperability and other primary objectives cannot be met, and a smaller number of more granular products for the subset of most critical systems and use cases. These should be done in the early phases, while the EDS should declare the goal of growing the body of products as the organization evolves its data environment. This growth should be tracked in the program reviews using performance metrics.
You should notice that assessing the scope and use cases for EA overlaps the governance assessments and decision-making underlying the executive guidance. This is a foundational characteristic of a good EDS, namely, the cross-fertilization and constant realignment of all of its parts as the organization puts it into real action. These topics should be explicitly presented and discussed at program reviews.
Other common implementation techniques that can be very useful if performed with a clear understanding of LOE versus benefits are:
- Data modeling
- Metadata schema development
- Semantic standards for core data sets
- Common data repository
- Hybrid Cloud architecture
- Big Data processing in YARN, Spark, Hive
- At-rest encryption
- In-transit encryption
- OS kernel level access control
- Automated lineage analysis and query
- Manual QA/QC documentation per industry standards
- EA logical models
- EA physical models
- External technology reviews
- External program reviews
- Internal red-team reviews
- Supply chain approach to data lifecycle
- Data science toolkits with notebook capture
- Automated data QA analysis at all stages of workflow
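To make one item from this list concrete, automated lineage analysis and query can be pictured as a graph in which each data asset is linked to the assets it is built from. The sketch below is a simplified illustration, not a description of any particular lineage product: the asset names and the `SOURCES` mapping are hypothetical, and real lineage would be harvested from job metadata rather than typed by hand.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage: each asset maps to the assets it is directly built from.
SOURCES: Dict[str, List[str]] = {
    "crm_export": [],
    "web_logs": [],
    "customer_master": ["crm_export"],
    "engagement_scores": ["customer_master", "web_logs"],
    "retention_report": ["engagement_scores"],
}


def _reachable(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk returning every asset reachable from start."""
    seen: Set[str] = set()
    queue = deque(edges.get(start, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(edges.get(current, []))
    return seen


def upstream(asset: str, sources: Dict[str, List[str]]) -> Set[str]:
    """All assets that feed, directly or indirectly, into the given asset."""
    return _reachable(asset, sources)


def downstream(asset: str, sources: Dict[str, List[str]]) -> Set[str]:
    """All assets created, directly or indirectly, from the given asset."""
    consumers: Dict[str, List[str]] = {}
    for target, direct_sources in sources.items():
        for source in direct_sources:
            consumers.setdefault(source, []).append(target)
    return _reachable(asset, consumers)


report_inputs = upstream("retention_report", SOURCES)
crm_impact = downstream("crm_export", SOURCES)
```

An upstream query answers where a report's data came from, while a downstream query shows what would be affected if a source changed or was found to be of poor quality.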
There are many more techniques and tools. While some are significant opportunities, such as Big Data and Hybrid Clouds, others are potentially much-hyped minor palliatives for serious hurdles. This is a specific area that warrants in-depth consideration while developing the EDS. That is, do not simply adopt the latest industry fad, as it is unlikely to produce much-needed capability improvement and will squander valuable organizational resources.
Achieving these benefits will require discipline and belief. Hurdles will occur and easier pathways for immediate actions will appear tempting. The executives must believe that this investment will generate substantial benefits. This requires that the implementors practice disciplined adherence to the guiding principles and governance procedures. This does not mean governance is blindly followed. Rather, the performance metrics should determine when procedures work well and when they need to be modified or removed. A well-executed EDS will do so regularly as part of the reviews.
In my experience, a little deep analysis to craft a realistic and still ambitious EDS will transform the organization and several careers with large, tangible, impressive advances. From this well-crafted EDS, a true roadmap will make the journey smoother and more widely endorsed, and will produce admired achievements.