10 Data Sourcing Best Practices for Reporting
Data quality, and quality reports, can only be produced by dependable and repeatable processes. Let SolveXia and Yellowfin show you how.
SolveXia’s mission is to help transform our clients’ businesses through process automation. As part of this mission, we spend a lot of time with people learning how existing and prior automation projects were “not as successful as hoped for” (in other words, they failed). We value this knowledge greatly, since we strive to avoid repeating the mistakes of the past.
We also often work with our clients to improve existing reporting solutions or create new ones, ranging from team and departmental Excel-based reporting through to broader data warehousing and business intelligence solutions. Getting “into the trenches” with many of our clients has helped us understand just how big an impact data quality has on the credibility, usefulness and ultimately success of any reporting solution.
This article lists these observations across three broad categories: getting the right data, enforcing quality and being agile enough to cope with change.
1 – Let the desired business outcome dictate what data is required
The term often used here is ‘analysis paralysis’ (over-thinking a set of data to a point that a decision or action is never taken, in effect paralysing the outcome).
Many people fall into the trap of letting the data dictate the reporting that they produce, when in fact the dynamic should be reversed. The desired business outcome for the reporting should be the starting point in driving what data is sourced, how and where data quality efforts are focused and the establishment of rules and controls around how the data supply is managed and maintained going forward.
The key driver here is “focus”. With energy, time and money being finite, it is critical that we focus our limited resources on data sourcing activities that impact the desired business outcomes for the reporting.
- Always start data sourcing activities by understanding the key desired business outcomes for the reporting. Note: the question here is NOT “What data can we get” followed by “What can we do with it”. Rather it should be “What do we want/need to achieve through our reporting” followed by “what data do we need to support this outcome and how can we source it and ensure it is of a high quality”.
- Prioritise your data sourcing activities based on the business outcomes that the data supports.
2 – Profile Your Data
In general, the providers of your source data, whilst having intimate knowledge of their data, understandably won’t know, and sometimes won’t care a great deal, what you intend to do with their data.
Finding the provider of the data is step one. Once found, you will need to ensure that the data you are sourcing has a profile that meets your needs: that is, its structure, granularity, age, frequency and availability.
Getting a daily extract of data from your general ledger may sound like just what you need, but it is useless if the transaction-level insights you require can’t be derived because the data provided is aggregated at the account level. Likewise, the granularity may be perfect, but the value of your daily performance report is reduced if the data can only be provided on a weekly basis.
Conversing with the providers of the data and quizzing them on the profile of the data (e.g. How often can you provide the data? In what format? How granular? etc.) will help to ensure that what you get is fit for purpose. This also provides an opportunity for issues to be identified early, before the quality of your reporting is impacted.
- Take the time to profile the data that you need including the level of granularity, frequency with which it is provided, structure, format and the method of delivery.
- Meet with providers of the data and ensure that they not only can provide “the data” but that they understand and have an appreciation for the profile of the data that is required.
- Identify potential issues and inconsistencies early and put plans in place to either enhance the source of data, find an alternative source or adjust your reporting (including expectations).
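The profiling conversation above can be captured as a simple checklist in code. The following is a minimal sketch in Python rather than a prescribed implementation: the DataProfile fields, the granularity and frequency rankings and the example values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical orderings: a lower number means finer detail / more frequent delivery.
GRANULARITY_RANK = {"transaction": 0, "account": 1, "summary": 2}
FREQUENCY_RANK = {"daily": 0, "weekly": 1, "monthly": 2}


@dataclass
class DataProfile:
    granularity: str  # e.g. "transaction" or "account"
    frequency: str    # e.g. "daily" or "weekly"
    file_format: str  # e.g. "csv"


def profile_gaps(required, offered):
    """Return a list of ways the offered data falls short of what the report needs."""
    gaps = []
    if GRANULARITY_RANK[offered.granularity] > GRANULARITY_RANK[required.granularity]:
        gaps.append(f"granularity: need {required.granularity}, offered {offered.granularity}")
    if FREQUENCY_RANK[offered.frequency] > FREQUENCY_RANK[required.frequency]:
        gaps.append(f"frequency: need {required.frequency}, offered {offered.frequency}")
    if offered.file_format != required.file_format:
        gaps.append(f"format: need {required.file_format}, offered {offered.file_format}")
    return gaps


# The general ledger example: a daily extract that is aggregated at account level.
required = DataProfile("transaction", "daily", "csv")
offered = DataProfile("account", "daily", "csv")
gaps = profile_gaps(required, offered)
print(gaps)
```

An empty list would suggest the offered data is fit for purpose; any entries are the points to raise with the data provider before the reporting is built.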
3 – Get as close to the original source as possible
It is quite likely that one or more of the sources of data required for your reporting will, at first glance, reside in a report or spreadsheet that has been prepared by another person or department (rather than data extracted from a system or database). For example, say you want to provide insights on actual vs projected profitability of different sales channels requiring the use of a sales analysis spreadsheet which is manually prepared by some members of the finance team.
Each instance of manual intervention introduces a risk to the quality of your data which can have significant flow-on effects to your reporting process. In particular, it creates the opportunity for inconsistent or erroneous preparation of the source data and the possibility of unexpected structural changes to the source data in the future.
Whilst there will no doubt be cases where the manual spreadsheet is simply the only method to collect the required data (e.g. a staff member noting the names of attendees to a marketing event or seminar as they walk through the door), often a quick survey of what data is actually required for your reporting will reveal an opportunity to skip the manual spreadsheet and extract data straight from the “source” (i.e. tap into the “inputs” to the manual spreadsheet rather than utilise its “outputs”).
- Be cautious of using a manually prepared report or spreadsheet as a source of data.
- Look for opportunities to use the inputs into the manually prepared report or spreadsheet rather than its outputs.
- Collect the data you need from the earliest possible point of the data supply chain to limit exposure to uncontrolled manual intervention.
- If the manually prepared report or spreadsheet must be used, meet with the person/people responsible for its preparation and form an understanding of the quality expectations. Also, ensure that communication with the preparers is maintained going forward and that you are notified well in advance of any structural changes to the data they will be providing you.
4 – Keep it simple and consolidate sources. Avoid the temptation to hoard data.
The principle of hoarding extends to data. It tends to occur when you are spoilt for choice, with data duplicated across multiple sources, or when there is a tendency to collect and consume all data that is made available on the assumption that it will all be useful or that “somebody might need it someday”.
For example, when enquiring about data relating to sales for individual channels, the report preparer is told the data is available across multiple extracts which, in addition to the data they require, also contain information relating to inventory and taxes for the products that have been sold. Instead of focusing on what is really needed, the report preparer attempts to accommodate all of the data that is made available.
As the volume and variation of data increases, so does the complexity, effort required and the general level of headaches.
- Focus on what you need. Don’t fall victim to the allure of hoarding data.
5 – Set and manage expectations around data quality.
Some data, as you’d expect, is more important than other data. This is largely driven by the key business outcomes for having the reporting in the first place: your audience will tend to focus on, and have a limited tolerance for error in, certain pieces of reporting.
It is quite common for the recipients and users of reporting – particularly senior management – to focus on key areas of interest, such as a ratio, movement or comparison to provide them with indicators on the performance of the business. Needless to say, ensuring the supply of high quality data to these specific areas of interest is paramount.
It is critical that you understand the information being extracted and decisions that are being made from your reporting and thus determine the potential business impact of data quality issues. As the time and energy available to invest in defining, measuring and acting on data quality issues is finite, you must be able to prioritise and focus on areas that would have the greatest impact on the business.
- Determine and rank the business impact of data quality issues for each source of data.
- Distribute your time, energy and focus on defining and measuring data quality in proportion to the business impact.
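One way to make “in proportion to the business impact” concrete is to score each source and split the available effort accordingly. This is only a sketch: the source names, the 1 to 5 impact scale and the 16 hours of effort are hypothetical values, and a simple proportional split is an assumption rather than a rule from this article.

```python
def allocate_effort(impact_scores, total_hours):
    """Split the available hours across sources in proportion to business impact."""
    total_impact = sum(impact_scores.values())
    if total_impact <= 0:
        raise ValueError("at least one source needs a positive impact score")
    # Rank the sources so the highest-impact one is listed first.
    ranked = sorted(impact_scores.items(), key=lambda item: item[1], reverse=True)
    return {source: total_hours * score / total_impact for source, score in ranked}


# Hypothetical impact scores (1 = low, 5 = high) agreed with the report's audience.
impact_scores = {"hr_headcount": 2, "sales_ledger": 5, "marketing_events": 1}
plan = allocate_effort(impact_scores, total_hours=16)
for source, hours in plan.items():
    print(f"{source}: {hours:.1f} hours")
```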
6 – Catch data quality issues as early as possible.
The cost of identifying and fixing data issues increases dramatically the further the data has moved along the supply chain. A common phenomenon observed is the 1-10-100 rule whereby a data issue will cost the business $1 to fix at the beginning of its journey, $10 to fix if it is identified in the middle of its journey and $100 to fix if it is not identified until it has completed its journey and has been output for reporting.
For example, trying to understand and determine why the sales for the current month are negative by looking at a report or dashboard that has been populated from multiple streams of data processing, mapping and calculations is considerably more painstaking than simply checking the source sales data for negative values at the very beginning of its journey.
Once you have defined the business impact of any potential data quality issues and have identified metrics to measure the quality of data, the implementation of quality control processes early on in the data supply chain will help to reduce the cost of identifying and fixing quality issues.
- Implement processes for checking data quality as early in the data supply chain as possible.
- Always check/vet the individual source data. Even basic checks for consistency of format and structure can make a massive difference in the overall cost of resolving errors.
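The negative sales example above can be checked at the very start of the data's journey with a few lines of code. The sketch below uses Python's standard csv module; the field names and the sample extract are made up for illustration, and a real check would read the actual source file.

```python
import csv
import io


def find_negative_amounts(csv_text, amount_field="amount"):
    """Return data-row numbers (1-based, header excluded) with a negative or unreadable amount."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        try:
            amount = float(row[amount_field])
        except (KeyError, TypeError, ValueError):
            problems.append(row_number)  # missing or non-numeric amount
            continue
        if amount < 0:
            problems.append(row_number)
    return problems


# Hypothetical extract; in practice this text would be read from the source file.
source_extract = """channel,amount
online,1200.50
retail,-300.00
wholesale,n/a
"""
bad_rows = find_negative_amounts(source_extract)
print(bad_rows)
```

Running a check like this on each source file before any mapping or calculation takes place keeps the issue at the cheap end of the 1-10-100 scale.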
7 – Measure and act on data quality issues.
Once potential data quality issues have been identified, you will need simple and easy-to-interpret ways to measure the quality of the data going forward so that you can spot problems and act on them.
By creating metrics, you increase the visibility and oversight in relation to the potential data quality issues you have identified and help to ensure they are identified and acted upon on an ongoing basis.
The exercise of defining the metrics can be initiated by asking simple questions and determining measures to track each item. Is the data in the correct format (e.g. does the date field contain only dates)? Is the ordering or structure of the fields consistent with expectations? Is the number of records or the total sum value of certain fields within expectations or tolerances? Are there any duplicate records?
Generally, the measures you implement will catch errors on odd occasions (for example, you are provided with the wrong file for a particular set of source data) and will allow for corrections to be made early. Consistent data quality issues may, however, require the implementation of a more strategic process to cleanse or fix the data – for example performing a find and replace of an invalid character (e.g. “!”) from a field in the source data.
- Always define measures or metrics for the data quality.
- Implement processes to check source data against the metrics you have defined.
- Define tolerances and expectations and trigger alerts to the appropriate people when necessary.
- Address consistent data issues through the implementation of more strategic processes for cleansing the source data (instead of “quick fixes”).
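The questions above translate fairly directly into automated checks. The following is a minimal sketch under stated assumptions: the expected field list, the date format and the record-count tolerance are hypothetical, and a real implementation would send the failed checks to the appropriate people rather than print them.

```python
from datetime import datetime

EXPECTED_FIELDS = ["date", "channel", "amount"]  # assumed structure of the extract
EXPECTED_ROWS = (2, 5)                           # assumed tolerance for record count


def is_date(value):
    """True when the value parses as a year-month-day date such as 2024-01-31."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def quality_metrics(rows):
    """Evaluate each check from the text; True means the check passed."""
    keys = [tuple(row.get(name) for name in EXPECTED_FIELDS) for row in rows]
    return {
        "structure_ok": all(list(row.keys()) == EXPECTED_FIELDS for row in rows),
        "dates_ok": all(is_date(str(row.get("date", ""))) for row in rows),
        "row_count_ok": EXPECTED_ROWS[0] <= len(rows) <= EXPECTED_ROWS[1],
        "no_duplicates": len(set(keys)) == len(keys),
    }


# Hypothetical source rows: one badly formatted date and one duplicate record.
rows = [
    {"date": "2024-01-31", "channel": "online", "amount": "1200.50"},
    {"date": "31/01/2024", "channel": "retail", "amount": "300.00"},
    {"date": "2024-01-31", "channel": "online", "amount": "1200.50"},
]
metrics = quality_metrics(rows)
failed = [name for name, passed in metrics.items() if not passed]
print(failed)
```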
8 – Expect and embrace change.
The landscape you are operating in will most likely change. A merger/acquisition, new sales strategy, new HR direction or a change in focus on operational activities will bring with it a need for new information, insights and KPIs to aid decision making and compliance. In addition, the eco-system in and around your reporting may also change – for example changing your general ledger system.
Your reporting, and the data supply chains that support it, will tend to be designed for today, which is both reasonable and expected, as most of us do not possess a crystal ball. That being said, you can and should embrace the changing environment and plan for what are most likely inevitable changes in the future.
A great reporting solution can be undermined if it is suddenly put out of commission for two months due to a change in the supply of data that was not planned for ahead of time. Likewise, rigid data supply chains and reports that cannot accommodate changes to the supply of data or to reporting requirements may face the same fate.
- Write a shortlist of expected or possible changes in the future to both the reporting requirements and supply of data and score each item for the likelihood of it occurring and the impact that it would have (Low – High). Ensure that you have a plan in place to mitigate the impact of changes that rank Med – High.
- Facilitate communication to ensure that you are made aware well ahead of time of when a change is to be made impacting the supply of any data (e.g. a new GL system or discontinuing an Excel report that is used as a data source) or new reporting requirements.
- Manage expectations of sponsors around the ongoing maintenance and cost of adapting to changes.
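The shortlist exercise above can be kept as a small scored register. In this sketch the numeric values for Low, Med and High, the choice to multiply likelihood by impact and the threshold for requiring a plan are all assumptions made for illustration; the listed changes are hypothetical.

```python
# Assumed numeric scale for the Low / Med / High ratings mentioned in the text.
LEVELS = {"Low": 1, "Med": 2, "High": 3}
PLAN_THRESHOLD = 4  # assumption: Med x Med or worse needs a mitigation plan


def rank_changes(changes):
    """Score each (description, likelihood, impact) tuple; highest score first."""
    scored = [
        (description, LEVELS[likelihood] * LEVELS[impact])
        for description, likelihood, impact in changes
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical shortlist of possible changes.
shortlist = [
    ("New HR KPI requested", "Low", "Low"),
    ("General ledger system replaced", "Med", "High"),
    ("Excel source report discontinued", "High", "Med"),
]
ranked = rank_changes(shortlist)
needs_plan = [description for description, score in ranked if score >= PLAN_THRESHOLD]
print(needs_plan)
```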
9 – Establish change management controls and encourage knowledge transfer to simplify changes.
Now that we have accepted and embraced the likelihood of change (leading on from point 8), what happens when it is time to act? What will we do? Who should do it?
Good planning and change management controls will minimise the risk that future changes pose to the integrity, and thus the credibility and value, of the reporting.
What happens when you change CRM vendors eight months from now? Who will be responsible for transitioning the supply of critical source data for your reporting quickly whilst maintaining the same data profile (see point 2) and quality? Or how do you ensure that a request for a change to the reporting is responded to and implemented without adversely impacting existing reports?
Embedding controls for managing changes and ensuring that all stakeholders are equipped with or can easily access the required knowledge to implement the changes will be critical.
In the event of a required change, staff should be able to traverse through and identify and understand the individual supply chains used for reporting (which are ideally mapped or documented). Individuals, who have permission, can then make the necessary changes which will be signed off or approved by key stakeholders and recorded as “knowledge” for the future (i.e. updating documentation and keeping a log of changes made).
- Ensure that knowledge of the data supply chains is readily available through documentation.
- Establish procedures and rules for changes including who is allowed / responsible to request, make and approve changes.
- Ensure approved changes are recorded and logged and any supporting documentation and process maps updated to enable accurate knowledge transfer in the future.
10 – Create a controlled and auditable environment for business users to adjust or manipulate data for reporting.
For many organisations, the vital analysis and insights that staff are able to inject into the pool of data used for reporting present a double-edged sword.
On one hand, human expertise and experience, when introduced into a reporting process, ensures a level of business acumen and real-world understanding that would otherwise be missed by simply looking at the data. On the other hand, the old saying that too many cooks spoil the broth may also hold true. As the number of human hands that are allowed into the data supply chain increases, so does the risk of poor quality or inaccurate data surfacing in the reports.
Sustained data quality issues or the emergence of inaccuracies from human error will, over time, hamper the credibility of the reporting and reduce the value that it provides to the business. That being said, taking the extreme approach of configuring a “black box”, void of any human oversight or interaction may have the same issue as key adjustments are left wanting.
The facilitation of human interaction with the data supply chains in a controlled and auditable manner should be the main focus here. Ensuring that specific individuals can access data at specific points and perform a discrete set of adjustments and manipulations that are logged and traceable will provide the best of both worlds.
- Avoid uncontrolled manipulation and adjustment of data.
- Define points along the data supply chains where human interaction is expected and allowed and define rules around what can change and who can change it.
- Encourage workflow within the organisation to review and approve or decline changes to critical data. For example, a change that will impact the report should be reviewed and signed off by a responsible person.
- Ensure changes are logged and auditable so that the reporting can be traced back to the individual changes that have been made. This will facilitate the changes/adjustments while maintaining credibility.
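A controlled adjustment point can be as simple as a log that records who changed what and who signed it off. The sketch below is illustrative only: the Adjustment and AdjustmentLog classes, the user names and the rule that requesters cannot approve their own changes are assumptions, not features of any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Adjustment:
    record_id: str
    field_name: str
    old_value: str
    new_value: str
    requested_by: str
    approved_by: Optional[str] = None
    logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AdjustmentLog:
    """Keeps every requested adjustment and who, if anyone, approved it."""

    def __init__(self, editors, approvers):
        self.editors = set(editors)      # people allowed to request adjustments
        self.approvers = set(approvers)  # people allowed to sign them off
        self.entries = []

    def request(self, adjustment):
        if adjustment.requested_by not in self.editors:
            raise PermissionError(f"{adjustment.requested_by} may not adjust data")
        self.entries.append(adjustment)
        return adjustment

    def approve(self, adjustment, approver):
        if approver not in self.approvers or approver == adjustment.requested_by:
            raise PermissionError(f"{approver} may not approve this adjustment")
        adjustment.approved_by = approver

    def approved(self):
        """Only approved adjustments should flow through to the report."""
        return [entry for entry in self.entries if entry.approved_by is not None]


# Hypothetical users and a single adjustment to one record.
log = AdjustmentLog(editors={"analyst_a"}, approvers={"manager_b"})
change = log.request(Adjustment("INV-001", "amount", "100.00", "110.00", "analyst_a"))
log.approve(change, "manager_b")
print(len(log.entries), len(log.approved()))
```

Because every entry keeps the old value, the new value, the requester and the approver, a figure in the report can be traced back to the individual adjustments behind it.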