Data Governance

Foreword

For an organization aspiring to be data driven, it is imperative to put together a team that constantly oversees what data is captured and how it is captured, transformed and conformed within the target data repositories from which it is disseminated to the various stakeholders. From this foundation, the organization develops the business intelligence and analytics that drive the company. This team must also look to improve the process using people and technologies. Such a team is the Data Governance team.

 

Forming the Data Governance team

Data Governance is both a bottom-up and a top-down effort. Teams that see data issues first-hand must rally support from upper management so that it recognizes the need for governance. Once upper management is convinced, it has to provide executive support and constant guidance to nurture Data Governance in the organization.

The bottom-up push usually starts when individuals within business units or functions realize that they need additional data points, or that existing data points are problematic and need fixing. Such data gaps or issues lead the business stakeholders to approach the data warehouse (DW) development team and ask for a resolution. Once the DW team establishes that the gaps or issues do not originate in the data warehouse or data mart, it should work in tandem with the business to get executive support for initiating a Data Governance practice within the organization. Data gaps need to be fixed in the source systems. Some data issues could be fixed in the data warehouse when there is no bi-directional data flow between the DW and the source systems, but fixes in the DW should ideally be avoided and handled in the source systems instead, because the source may change how and what data is captured, which makes DW-side fixes costly to maintain.

Not having data governance across the organization will prove expensive in the long run. Key justification points include the inability to study operational efficiency, to understand customers, and to identify opportunities to grow top-line revenue or to increase bottom-line profit by reducing costs. Quantifying this cost as a number will help justify the need for Data Governance to management.

Executive Support

Any organization embarking on the Data Governance journey will need at least one executive crusader who will carry the message to other executives and garner support at the highest levels. All executives who lend their support must be committed to the initiative and willing to provide ongoing time and effort to ensure its success. These executives will also go on to form the Data Governance executive steering committee, which we will discuss under the organization of the Data Governance team. Having socialized the idea and garnered support, the lead executive can begin forming the team.

Team Composition

The Data Governance team should comprise the following roles:

  • Data Stewards
  • Data Governance leads
  • Data Governance Steering Committee members

Data Stewards – Data stewards are the people who are embedded within the business units (BUs) or functions and are closest to the data. They understand the day-to-day challenges that the BU faces when it cannot discern information from the data being captured in the source systems or the DW. They are responsible for identifying all data gaps and data issues, for tracking those issues, and for coming up with a solution by working closely with their technology counterparts. To ensure that the data stewards are doing their jobs, performance metrics must be tied to their job objectives and responsibilities. To begin with, existing people within a BU can play the role of data steward. Within a BU there can be several data stewards, depending on the size of the BU and the functional roles they play.

Data Governance Leads – Data Governance leads are designated individuals from each BU who represent the BU or function at the Data Governance council. They are responsible for understanding the challenges that their particular BU faces and for working closely with the data stewards within their BU to resolve the issues. They can also work with external technology teams and counterparts in other BUs to resolve issues.

Steering Committee Members – Typically, executives from each BU and function form the steering committee, which deliberates and decides on the challenges and their resolution based on the business justification provided by the Data Governance leads. They should be able to make executive decisions on priorities for implementing the solution proposed to resolve a data issue. Ensuring that progress is made on all data challenges is also a responsibility of this committee.

 

Data Governance Components

[Figure: Data Governance components]

Data Accuracy

Data accuracy can be broken down further into three areas:

  • Data Quality – Data quality is measured by whether the data received from the source system is recorded faithfully in the target system, which is primarily a Data Mart or a Data Warehouse. Data challenges in this area include truncated data, invalid or garbled data, and incorrect data.
  • Data Completeness or Coverage – Data completeness or coverage determines whether the target data repositories hold all of the data that originated from the source system. Typical data challenges include partially loaded rows and rows dropped because of data integrity problems such as nulls or missing mandatory columns.
  • Data Integrity – Data integrity is critical for relating data elements to one another and for integrating data sets from multiple data sources. Without data integrity, business intelligence and analysis would not be complete or deterministic.

Unless all of these areas are addressed, data accuracy will remain a challenge within an organization. Bear in mind that the highest possible Data Accuracy is something all organizations should strive for without being constrained by it. In practice, this means that organizations must start somewhere, measuring Data Accuracy constantly and improving it over time. Acceptable Data Accuracy differs for every business and function, so the Data Governance team must work with each business unit lead to determine what level of accuracy is acceptable for the metric being measured. For Finance, it might be that the data must be 100% accurate. For Operations, 95% or 98% accuracy may be acceptable. The tolerance for inaccuracy and data errors must therefore be defined for, and agreed upon by, each BU. Closely tied to Data Accuracy is Master Data Management (MDM), which conforms data to a standard that every part of the organization can follow and adhere to. This allows for a single version of the truth that we can attest to when it comes to data. MDM is a huge topic by itself and we will cover it separately.
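To make the idea of a per-BU tolerance concrete, here is a minimal Python sketch. It compares rows from a source extract with the rows loaded into the target and checks the result against the tolerance agreed for a BU. The record layout, the function names and the orphan-row count used as a stand-in for an integrity check are assumptions made for illustration; only the Finance and Operations percentages come from the examples above.

```python
from dataclasses import dataclass

# Hypothetical per-BU tolerances, taken from the examples in the text:
# Finance expects 100% accuracy, Operations accepts 95%.
ACCURACY_TOLERANCE = {"Finance": 1.00, "Operations": 0.95}


@dataclass
class AccuracyReport:
    completeness: float  # share of source rows that reached the target
    quality: float       # share of loaded rows whose values match the source
    orphans: int         # target rows with no matching source row (integrity)


def measure_accuracy(source: dict, target: dict) -> AccuracyReport:
    """Compare source and target rows keyed by a shared business key."""
    if not source:
        raise ValueError("source extract is empty; nothing to measure")
    loaded = [key for key in source if key in target]
    matching = [key for key in loaded if source[key] == target[key]]
    orphans = sum(1 for key in target if key not in source)
    completeness = len(loaded) / len(source)
    quality = len(matching) / len(loaded) if loaded else 0.0
    return AccuracyReport(completeness, quality, orphans)


def within_tolerance(report: AccuracyReport, business_unit: str) -> bool:
    """True when every measure meets the tolerance agreed for the BU."""
    threshold = ACCURACY_TOLERANCE[business_unit]
    return (
        report.completeness >= threshold
        and report.quality >= threshold
        and report.orphans == 0
    )


# Illustrative data: row 4 was dropped during the load and row 3 was truncated.
source_rows = {1: "ACME Corp", 2: "Globex", 3: "Initech", 4: "Umbrella"}
target_rows = {1: "ACME Corp", 2: "Globex", 3: "Init"}

report = measure_accuracy(source_rows, target_rows)
print(report)
print("Finance OK:", within_tolerance(report, "Finance"))
print("Operations OK:", within_tolerance(report, "Operations"))
```

Keeping the tolerances in one table means the number each BU agreed to is visible and easy to revisit, rather than buried in individual load jobs.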

Data Availability

Data Availability can be defined as making the right data available to the right audience at the right time to ensure that data is serving its purpose in making an organization data driven. The various aspects that drive data availability include:

  • Data Freshness – Each BU or function may have different needs in terms of how fresh or stale its data can be. This must be discussed and agreed upon by the data governance leads within their BU.
  • Data Load Frequency – Data load frequency is dictated by how often data needs to be refreshed. It must be defined, designed and implemented within the automated processes of data sourcing and aggregation.
  • Data Redundancy – Data redundancy is the replication of data, preferably in multiple data repositories in multiple data centers, so that business continuity can be achieved. Loss of data and the downtime needed to restore it can be costly for mission-critical operations within a business.
  • SLA – Data Availability includes the discussion, prioritization, design, architecture and implementation of data freshness, load frequency and redundancy, along with the data/BI platform that supports business continuity. All of these aspects must be defined and agreed upon by the business and technology partners through the data governance council. The outcome of the agreement must be captured in a Service Level Agreement (SLA) between the teams. Implementation that requires resources should pass through the steering committee for budget approval.
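Once a freshness agreement exists, it can be checked automatically. The sketch below is one possible way to do that in Python; the dataset names, the maximum ages and the fixed timestamps are hypothetical values chosen for the example, and a real check would read the last load time from the load process itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FreshnessSla:
    dataset: str
    business_unit: str
    max_age: timedelta  # how stale the data may become before the SLA is breached


def is_fresh(sla: FreshnessSla, last_loaded_at: datetime, now: datetime) -> bool:
    """True when the most recent load is within the agreed maximum age."""
    return now - last_loaded_at <= sla.max_age


# Hypothetical agreements: a daily refresh for Finance, hourly for Operations.
slas = [
    FreshnessSla("general_ledger", "Finance", timedelta(hours=24)),
    FreshnessSla("order_events", "Operations", timedelta(hours=1)),
]

# Fixed timestamps keep the example reproducible.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_loads = {
    "general_ledger": datetime(2024, 1, 2, 2, 0, tzinfo=timezone.utc),
    "order_events": datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc),
}

breaches = [s.dataset for s in slas if not is_fresh(s, last_loads[s.dataset], now)]
print("SLA breaches:", breaches)
```

Here the ledger was loaded 10 hours ago against a 24-hour limit, while the order events are 2.5 hours old against a 1-hour limit, so only the second dataset is reported as a breach.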

Data Accountability

Data Accountability means ensuring that data is available to the right individual with the right privileges, and that its use is auditable. It also ensures that data does not fall into the wrong hands, by upholding privacy practices and the confidentiality of sensitive company data. The broad topics under Data Accountability are as follows:

  • Data Privacy – Ensure data confidentiality is maintained through encryption, masking and/or denying access to private or confidential data.
  • Data Privileges – Control who within the organization, in which role, can view what data.
  • Data Security – Ensure data is transmitted securely between internal and external systems.
  • Data Auditability – Track who within the organization is using what data and, ideally, for what purpose.
  • Data Ownership – The concept of a data owner gives the designated Data Stewards someone with whom to discuss the nuances of the data and who can be held responsible for the resolution. Typically, product managers and operations managers should be data owners.
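Privileges, masking and auditability from the list above can be pictured together in a small Python sketch. The roles, column names and masking rule are all hypothetical, and the in-memory audit list stands in for durable storage. Transport security and encryption are out of scope for this example.

```python
from datetime import datetime, timezone

# Hypothetical role-to-column privileges; real roles would come from the
# organization's access management system.
ROLE_COLUMNS = {
    "analyst": {"customer_id", "region", "email"},
    "support": {"customer_id", "email"},
}
MASKED_COLUMNS = {"analyst": {"email"}}  # columns a role sees only in masked form

audit_log = []  # in practice this would be durable, append-only storage


def mask(value: str) -> str:
    """Keep the first character and hide the rest."""
    return value[:1] + "*" * (len(value) - 1)


def read_record(user: str, role: str, record: dict, purpose: str) -> dict:
    """Return only the columns the role may see, masking where required."""
    allowed = ROLE_COLUMNS.get(role, set())
    masked = MASKED_COLUMNS.get(role, set())
    view = {}
    for column, value in record.items():
        if column not in allowed:
            continue  # privilege check: deny by default
        view[column] = mask(str(value)) if column in masked else value
    # Auditability: record who read which columns and for what stated purpose.
    audit_log.append(
        {
            "user": user,
            "role": role,
            "columns": sorted(view),
            "purpose": purpose,
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return view


customer = {
    "customer_id": 42,
    "region": "EMEA",
    "email": "ann@example.com",
    "ssn": "123-45-6789",
}
analyst_view = read_record("jdoe", "analyst", customer, "churn analysis")
print(analyst_view)
print(audit_log[-1]["columns"])
```

Because unknown roles and unlisted columns fall through to an empty set, the sketch denies access by default, which is the safer starting point for confidential data.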

Data Standardization

Data Standardization can be summarized as enabling a single version of the truth for metrics and the underlying data from which they are derived. Organizations often struggle with different versions of the truth and end up second-guessing which numbers are right. Addressing this issue prevents a great deal of churn within teams and reduces escalations, unwanted fire drills and confusion, even at the highest levels of an organization. The following practices help ensure that data is standardized and that a single version of the truth is what the entire organization sees:

  • Data Definition – To begin with, it is important to define all data elements (machine or human created) within an organization. It is a long and arduous process, but it has to be done.
  • Metrics / KPI – Metrics are measurements that a business unit or function defines to gauge its performance and efficiency: its growth, revenue, profit, margin, customer satisfaction and reach, among other things. KPIs (Key Performance Indicators) are a subset of metrics that determine how a company is performing against its goals. They indicate whether a company is growing, declining or stagnant. Metrics and KPIs have to be defined, and how they are computed has to be understood, across the organization.
  • Processes – Processes are the steps taken within an organization, and within its business functions and business units, to fulfill the business model the company is executing to create the value chain for its customers. When processes are executed, data is captured by people and systems. It is important to define what input is required for each process and what output is created. Documenting this becomes critical for organizations to understand who expects what data (input) and how they process it to create what data (output). It also helps define the data owners for each business function or unit.
  • Master Data – Master data can be defined as the data that the entire organization uses to conduct its business. Product data and customer data are examples of master data that should be consistent across the company. Hence, managing this type of data using a hub-and-spoke model is highly important.
  • Metadata – Metadata can be defined as the classification of data into buckets, categories or hierarchies in order to better navigate data, tag data and segregate data for various purposes.
  • Source Systems – Understanding the source systems for a given data element becomes crucial when it comes to addressing data quality issues. A system could be a source for a data element without being its System of Record. A system of record is where the data is created for the first time. Understanding this subtlety and holding the systems of record responsible for fixing data quality is an important part of data governance.
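One way to picture how data definitions, owners, metadata tags and the source system versus system of record distinction fit together is a simple data dictionary. The Python sketch below is illustrative only: the class names, field names and the sample entry are hypothetical, not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataElement:
    name: str
    definition: str
    owner: str                  # accountable data owner
    system_of_record: str       # where the value is first created
    source_systems: tuple = ()  # systems that pass the value along
    tags: tuple = ()            # metadata used for classification


@dataclass
class DataDictionary:
    elements: dict = field(default_factory=dict)

    def register(self, element: DataElement) -> None:
        """Reject a second, competing definition for the same element."""
        if element.name in self.elements:
            raise ValueError(f"{element.name!r} is already defined")
        self.elements[element.name] = element

    def route_quality_issue(self, name: str) -> str:
        """Quality issues go to the system of record, not to a downstream source."""
        return self.elements[name].system_of_record


dictionary = DataDictionary()
dictionary.register(
    DataElement(
        name="customer_email",
        definition="Primary contact email supplied by the customer",
        owner="Product Manager, Accounts",
        system_of_record="crm",
        source_systems=("crm", "billing"),
        tags=("customer", "contact"),
    )
)
print(dictionary.route_quality_issue("customer_email"))
```

Refusing duplicate registrations is a deliberate choice in this sketch: it forces competing definitions of the same element to be reconciled instead of silently coexisting.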

 

Data Governance Process

 

[Figure: Data Governance process]

 

We have covered a lot of material on establishing Data Governance within an enterprise. Each topic and sub-topic is a larger discussion in its own right, and this is an attempt to give you the broad strokes of what it entails. Please feel free to reach out to me with feedback or for further clarification.

-BC
