What is Data Governance?
Governance is a term that most people are aware of, even if they only have a woolly idea of its definition. You might think of the Bank of England, the government, or the rules around how charities work. It evokes ownership and strategy, but it also sets the culture of an organisation.
Key principles of all types of governance include accountability, transparency, policy, risk-management, and structure.
Data governance is simply an extension of this: it covers not just the ownership of data, but also the standards around that data. Additional principles here include auditability, standardisation, quality, and change management.
Why is it Important?
The oversight, monitoring, and strategy introduced by good Data Governance bring significant benefits to any organisation:
- Improved decision making. With incomplete, siloed data, any decision is reduced to best guess. Further, if that data is present, but is inaccurate, an organisation may make confident decisions based on incorrect information.
Should an online retailer order more stock for their top-selling product? How many items are left, what are the return rates, is there warehouse capacity, what are the profit margins on it?
Better data allows those decisions to be made accurately and quickly, but great data can unlock insights that add whole new streams of opportunity. If the retailer knows about returned shoes earlier, they don’t need to order stock, they can contact the customer to offer different sizes, and they can prepare staff in the returns centre.
- Compliance. Organisations need to know which regulations apply and what they mean, and must ensure that they both comply and can prove it. The most serious breaches of GDPR carry fines of up to €20m, or 4% of global annual turnover, whichever is higher (a lower tier of up to €10m, or 2%, applies to less severe infringements), and violations can be treated as criminal offences in some member states.
The Cambridge Analytica scandal, where Facebook allowed app developers inappropriate access to user data, led to Facebook being fined $5bn and suffering a huge loss of trust.
- Data Security. Global cybercrime increases year on year, its cost is counted in trillions of US dollars, and it is predicted to have a greater impact on insurers than natural disasters.
Mitigations are crucial, but unless they are holistic and universal within an organisation, vulnerabilities are left open. As well as financial impacts, reputation, the viability of a business, and the moral imperative of safeguarding customers’ and employees’ data are at stake.
- Risk Mitigation. Having a holistic overview of their data allows organisations to better understand risk profiles and causes, and to consider potential mitigations. Whether your primary focus is financial, business, environmental, or ‘will that new killer feature be released before Cyber Monday’, having a critical eye on complete, relevant data means that surprises can be reduced, avoided or removed.
Building dashboards to monitor the latency of queries in a critical database, setting up alerting, and regularly reviewing trends will help to prevent the database falling over and bringing down your entire estate.
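As a rough illustration of the alerting idea, the sketch below checks the 95th-percentile query latency over a window of samples against a threshold. The 250 ms threshold, the function names, and the sample windows are all assumptions made for this example; a real setup would normally rely on your monitoring tool's own alert rules.

```python
from statistics import quantiles

# Hypothetical threshold: alert when 95th-percentile latency exceeds 250 ms.
P95_THRESHOLD_MS = 250.0


def p95(latencies_ms):
    """Return the 95th-percentile latency of a window of samples."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the result within the observed range.
    return quantiles(latencies_ms, n=20, method="inclusive")[-1]


def should_alert(latencies_ms, threshold_ms=P95_THRESHOLD_MS):
    """Return True when the window's p95 latency breaches the threshold."""
    return p95(latencies_ms) > threshold_ms


# Illustrative (made-up) latency samples in milliseconds.
healthy_window = [20, 22, 25, 19, 30, 24, 21, 23, 26, 28]
degraded_window = [20, 22, 25, 19, 30, 400, 450, 500, 26, 28]

print(should_alert(healthy_window))
print(should_alert(degraded_window))
```

Alerting on a percentile rather than the mean is a design choice: a handful of very slow queries can hide behind a healthy-looking average.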
How to Implement Data Governance From Scratch
Build a governance board
Form a group of people who will own the strategy around policy, security, transparency, and who are accountable for it. Ensure that a wide variety of roles are represented within this group, so that it has all the experience needed. Consider representatives from IT, legal, compliance, and business units. Building cross-functional ownership here brings balance to decisions, allows them to be made more quickly, and better ensures that the rest of the organisation feels included.
The role of this group is not to look after the day-to-day running of data (although the same people might do both things). In football terms, they are not referees, they are FIFA.
Define the rules
Create the principles by which your data should be governed, and set goals for what they should help you achieve:
- The Principles of Data Governance are the rules by which an organisation lives. Implementation of them will vary, but will universally include:
- Stewardship - define roles and assign people to them to oversee and enact the policies created. People in these roles are responsible for the management and oversight of data. Look out for situations where there is no ownership, or where everyone is responsible. Often people assume data is owned by IT, but it should be owned by a business representative who has approval authority for decisions about data within their domain.
- Data Quality - it’s crucial that data is complete, correct, and reliable in any organisation. Strategies, guidelines and mechanisms that verify and maintain the quality of data throughout its lifecycle should be implemented.
- Accessibility - data should be easy to find, and fast and reliable to access. Accessibility enables faster insight and a reduction of data siloes and duplication. However, it must be strongly balanced with privacy and security, working to the Principle of Least Privilege (PoLP). Access controls must be in place, as well as auditability and secure storage and transport.
- Consistency - partial duplicates, mixed formats, and the use of different units can all lead to inconsistency. Strong controls on data entering the organisation are the simplest and cheapest way to prevent it, but holistic, org-wide data models and tools should be used, as well as monitoring.
- Compliance - laws and regulations regarding data must be adhered to, with severe consequences for infractions. Audits and regular risk assessments should be performed to ensure that the organisation isn’t in breach. Regular training of staff and mechanisms to prevent breaches should also be put in place.
- Goals might include:
- Protecting stakeholders’ needs
- Reduction of costs through the removal of siloes
- Ensuring the transparency of processes
- Building more insightful data
- Reducing the risk and impact of cyber attacks
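The Data Quality and Consistency principles above lend themselves to automated checks at the point where data enters the organisation. The following is a minimal sketch, not a definitive implementation: the record schema (`sku`, `weight`, `weight_unit`) and the set of accepted units are hypothetical and chosen only for illustration.

```python
# Hypothetical schema: every record must carry these fields.
REQUIRED_FIELDS = ("sku", "weight", "weight_unit")

# Conversion factors to a single canonical unit (kilograms).
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def validate(record):
    """Return a list of data-quality problems found in one record."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Completeness: a field that is absent, None, or empty counts as missing.
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    unit = record.get("weight_unit")
    # Consistency: reject units we cannot convert to the canonical one.
    if unit not in (None, "") and unit not in TO_KG:
        problems.append(f"unknown unit {unit}")
    return problems


def normalise(record):
    """Return a copy of a valid record with its weight converted to kilograms."""
    factor = TO_KG[record["weight_unit"]]
    return {**record, "weight": record["weight"] * factor, "weight_unit": "kg"}


good = {"sku": "A1", "weight": 500, "weight_unit": "g"}
bad = {"sku": "", "weight": 2, "weight_unit": "stone"}

print(validate(good), normalise(good))
print(validate(bad))
```

Rejecting or flagging a record at entry is usually cheaper than repairing it after it has been copied into downstream systems.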
Take an audit
Before making any improvements, you need to know where the risks and gaps currently are in your estate.
- Look at how securely data is stored and transported. What level of encryption is used? Is data of mixed classification stored together?
- Is your infrastructure reliable enough? What would happen if the server crashed? What if someone accidentally hit ‘delete all’?
- Do you know where your data comes from? Could a malicious actor send in bad data, corrupting or overwhelming your databases? Can data be taken back out of the system?
- Inspect your data - do you have duplicate tables/databases/rows of data? Do you have incomplete or orphaned data? Is some of your data out of date, and should it be archived?
- Is your data discoverable, usable, complete? Is it a product in its own right? Do you have all the data you need to fulfil your business strategies?
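Parts of the data inspection can be automated. The sketch below, written under stated assumptions, looks for duplicate rows and orphaned rows in two in-memory tables. The `customers` and `orders` tables, their column names, and the helper functions are hypothetical; against a real database the same checks would more likely be expressed as queries.

```python
from collections import Counter


def find_duplicates(rows, key_fields):
    """Return the key tuples that appear in more than one row."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def find_orphans(child_rows, parent_rows, foreign_key, parent_key):
    """Return child rows whose foreign key matches no parent row."""
    parent_ids = {row[parent_key] for row in parent_rows}
    return [row for row in child_rows if row[foreign_key] not in parent_ids]


# Illustrative (made-up) data.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "a@example.com"},  # same email as customer 1
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 99},  # no such customer
]

print(find_duplicates(customers, ["email"]))
print(find_orphans(orders, customers, "customer_id", "id"))
```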
Prioritise these discoveries according to urgency and importance - what is urgent, illegal, or business critical? It can be useful to use an Eisenhower Matrix, which sorts tasks by urgency and importance, and then to pick out the biggest impact, least effort choices.
Do not try to fix everything at once.
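One way to make the prioritisation concrete is to score each audit finding and sort the list. This sketch assumes each finding has been tagged by hand as urgent and/or important, with a rough effort estimate. The findings, field names, and quadrant labels are illustrative only.

```python
# Quadrants of an Eisenhower Matrix, in the order they should be tackled.
QUADRANT_ORDER = ["do first", "schedule", "delegate", "drop"]


def quadrant(urgent, important):
    """Map an (urgent, important) pair to its Eisenhower quadrant."""
    if urgent and important:
        return "do first"
    if important:
        return "schedule"
    if urgent:
        return "delegate"
    return "drop"


def prioritise(findings):
    """Sort findings by quadrant, then by least effort within each quadrant."""
    return sorted(
        findings,
        key=lambda f: (
            QUADRANT_ORDER.index(quadrant(f["urgent"], f["important"])),
            f["effort_days"],
        ),
    )


# Hypothetical audit findings with hand-assigned tags and effort estimates.
findings = [
    {"name": "archive stale tables", "urgent": False, "important": False, "effort_days": 3},
    {"name": "unencrypted backups", "urgent": True, "important": True, "effort_days": 5},
    {"name": "no restore test", "urgent": True, "important": True, "effort_days": 2},
    {"name": "document data lineage", "urgent": False, "important": True, "effort_days": 10},
]

for finding in prioritise(findings):
    print(quadrant(finding["urgent"], finding["important"]), "-", finding["name"])
```

Working from the top of the sorted list, one or two items at a time, fits the advice not to fix everything at once.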