Reporting often relies on "data pipelines" to collect, combine and transform source data. With 62% of people relying on others to supply their data, here are 10 data sourcing best practices.
The 10 practices, explained in more detail below, include:
1. Letting the desired business outcome dictate what data you need.
3. Getting as close to the source as possible.
4. Consolidating sources and keeping it simple.
5. Setting and managing data quality expectations.
6. Catching issues early in the data journey.
7. Measuring and acting on data quality issues.
9. Implementing change management controls.
10. Allowing for data collaboration - in a controlled way.
Before we get started on best practices for reporting and data, let’s take a look at the definition of data sourcing.
If you’re wondering, “what is data sourcing?”, the answer starts with defining a data source. A data source is where the data used to run a report or gain information originates. For a database management system, the source is the database. For a computer program, the data source may be a spreadsheet, XML file, data sheet or hard-coded data within the program. Data sources differ depending on the computer system or program.
When it comes to data warehousing, a primary concern for the accuracy of information is where the data comes from. In such cases, and to help businesses run more efficiently, it is imperative that data is accurate, clean and properly protected. Data sourcing is the first step in any data warehousing project because without the data, you can’t do anything. After devising the right plan to obtain accurate information (data), the next step is to figure out how to store it consistently and in the same format so that when you run reports, you will be able to receive the right outcomes for decision-making.
The term most suitable to this topic would be 'Analysis Paralysis'. Companies often over-analyse their data sourcing issues, so much so that they forget to act.
Another common issue is when the data determines what reporting you produce. This dynamic should be the other way round. The desired business outcome for reporting must be the starting point in determining:
Companies must focus on data sourcing activities that have the most impact. To do this, you need to have a clear and concise understanding of the desired business outcomes.
Do the providers of your source data know who you are? Do they understand (or even care) what you intend to do with their data? It's OK (and normal) if the answer to these questions is no.
Finding the provider of the data is step one. Once found, you will need to ensure that the data you are sourcing has the profile that meets your needs. That is, the structure, granularity, age, frequency and availability of the data.
One company we worked with had requested a daily extract from their general ledger. After working their way to the front of their IT queue, the day finally arrived - a GL extract was now available each morning. Unfortunately, the extract was not granular enough for the transactional level insights required.
When it comes to data sourcing, you must communicate with the providers of the data. Take the time to ensure the data provider can answer critical questions like:
Doing this early on can also help mitigate risks around data quality.
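A profile check like this can be automated once an extract is available. The sketch below is a minimal illustration, not a definitive implementation: the field names, the one-day freshness limit and the `profile_gaps` helper are all assumptions invented for the example. It tests an extract for structure, granularity and age before any reporting is built on it.

```python
from datetime import date

# Hypothetical requirements for a transaction-level report (assumed names).
REQUIRED_FIELDS = {"transaction_id", "posted_on", "account", "amount"}
MAX_AGE_DAYS = 1  # assumption: the report expects a daily extract


def profile_gaps(rows, as_of):
    """Return a list of ways the extract falls short of the required profile."""
    if not rows:
        return ["extract is empty"]
    gaps = []
    # Structure: every required field must be present in the extract.
    missing = REQUIRED_FIELDS - set(rows[0])
    if missing:
        gaps.append("missing fields: " + ", ".join(sorted(missing)))
    # Granularity: one row per transaction, so IDs must exist and be unique.
    ids = [row.get("transaction_id") for row in rows]
    if None in ids or len(set(ids)) != len(ids):
        gaps.append("not transaction-level")
    # Age: the newest record must be recent enough.
    dates = [row["posted_on"] for row in rows if "posted_on" in row]
    if dates and (as_of - max(dates)).days > MAX_AGE_DAYS:
        gaps.append("data is stale")
    return gaps
```

Run against a summarised general ledger extract, a check of this kind would flag the granularity problem on day one rather than after the extract has gone live.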
It is common to source data from another report or spreadsheet - which is being prepared manually. For example, a sales performance report may rely on sales data prepared manually by other members of the finance team.
Each instance of manual intervention introduces risks to data quality. Inconsistent or erroneous upstream processing can have unexpected consequences for your reporting.
While you may sometimes not have a choice, it is always worth taking the time to survey what data is actually required for your reporting. This may highlight opportunities to remove manually prepared data sources or switch to system extracts.
The principle of hoarding extends to data. Companies often duplicate data sources. They also have a tendency to collect and store all data "just in case" they ever need it.
For example, a finance staff member enquires about data relating to sales for individual channels. They are told that data is available across several extracts. In addition to the data they require, these extracts also contain information relating to inventory and taxes. Instead of focusing on what is really needed, the staff member attempts to accommodate all of the data that is made available.
As the volume and variation of data increase, so do the complexity, the effort required and the general level of headaches.
Some data, as you’d expect, is more important than other data. The tolerance for error will differ from report to report. Your data sourcing must be led by where the focus of your audience lies.
For example, management may focus on a particular ratio, movement or comparison in your report. Ensuring the supply of high-quality data to these specific areas of interest is paramount.
It is critical that you understand the decisions that are being made from your reporting. This will allow you to determine the potential business impact of data quality issues. Ultimately, this will allow you to prioritise your data sourcing efforts.
The cost of data issues increases the further the data has moved along the supply chain. A common phenomenon observed is the 1-10-100 rule. That is, a data issue will cost:
The 1-10-100 rule is highly relevant for data sourcing. This is because data is often transformed several times during its journey. The effort to unravel these transformations late in the journey can be substantial.
Start by defining the business impact of potential data issues. Next, identify metrics to measure the quality of data along its journey. Finally, implement controls as early as possible into the data journey.
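The escalation behind the 1-10-100 rule can be illustrated with a few lines of Python. This is a minimal sketch: the multipliers come from the rule's name, while the stage names (`source`, `processing`, `report`) and the `relative_cost` helper are assumptions made for illustration, not figures from any particular system.

```python
# Relative cost of fixing one issue, taken from the name of the 1-10-100 rule.
# The stage names are assumptions for illustration.
COST_MULTIPLIER = {"source": 1, "processing": 10, "report": 100}


def relative_cost(issues_by_stage):
    """Sum the relative cost of issues, given the stage where each was caught."""
    return sum(
        COST_MULTIPLIER[stage] * count
        for stage, count in issues_by_stage.items()
    )


# The same 20 issues, caught at opposite ends of the data journey.
caught_early = relative_cost({"source": 20})
caught_late = relative_cost({"report": 20})
```

Under these assumptions, the 20 issues cost 20 units when caught at the source and 2,000 units when caught in the final report, which is the argument for placing controls as early as possible.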
After you identify potential data quality issues, you must start measuring for quality. This will help ensure you are alerted to errors and can act on them going forward.
Create metrics to increase the visibility and oversight for data quality issues. You will also need a way to ask simple questions when collecting data such as:
The purpose of these measures is to catch errors and allow for corrections to your data. Recurring data quality issues will need an automated solution for cleansing the data.
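As a rough sketch of what such measures might look like, the Python below computes two common metrics for a batch of records, completeness and duplicate rate, and lists any that fall outside a tolerance. The metric choices, the default thresholds and the function names are assumptions for illustration. In practice, the tolerances should come from the data quality expectations agreed for each report.

```python
def quality_metrics(rows, key, required):
    """Compute simple quality metrics for a batch of records (list of dicts)."""
    total = len(rows)
    if total == 0:
        return {"row_count": 0, "completeness": 0.0, "duplicate_rate": 0.0}
    # Completeness: share of rows where every required field has a value.
    complete = sum(
        all(row.get(field) not in (None, "") for field in required)
        for row in rows
    )
    # Duplicate rate: share of rows whose key repeats an earlier row.
    unique_keys = len({row.get(key) for row in rows})
    return {
        "row_count": total,
        "completeness": complete / total,
        "duplicate_rate": (total - unique_keys) / total,
    }


def breaches(metrics, min_completeness=0.98, max_duplicate_rate=0.0):
    """List the metrics that fall outside the agreed tolerances."""
    found = []
    if metrics["completeness"] < min_completeness:
        found.append("completeness")
    if metrics["duplicate_rate"] > max_duplicate_rate:
        found.append("duplicate_rate")
    return found
```

Tracking these figures for every load makes recurring issues visible, which in turn shows where an automated cleansing step is worth building.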
The landscape you are operating in will most likely change. Examples of changes that impact data sourcing include:
The direct impact of such changes is often changes to source data and evolving needs for reporting (outputs). For example, a management restructure may result in a new business hierarchy. As a result, KPI reporting will need to adapt to the new hierarchy.
Whilst nobody possesses a crystal ball, we should all expect and prepare for change. Reporting is undermined when it is not able to react to changes. Likewise, rigid and brittle data collection and processing can be the Achilles heel of a finance department.
Having accepted and embraced the likelihood of change, what happens when it is time to act? In particular, how will you ensure changes don't result in data quality issues? The answer lies in good planning and change management practices.
For example, imagine your company changes its CRM, a key data source for your reporting:
It is imperative that you embed controls for managing change. You must also ensure stakeholders have easy access to the knowledge they will need to implement changes. Staff must first be able to understand the data supply chains used for reporting. Next, those with permission should be able to make changes and have them reviewed and approved by stakeholders. These changes must also be documented for future knowledge.
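One way to picture these controls is as a change request that cannot be applied until someone other than the requester has approved it, and that records every action taken. The Python below is a minimal sketch under that assumption. The `ChangeRequest` class, its fields and the rule that requesters cannot approve their own changes are hypothetical, not a description of any particular tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ChangeRequest:
    """A proposed change to one step in a reporting data supply chain."""
    step: str
    description: str
    requested_by: str
    approved_by: Optional[str] = None
    history: list = field(default_factory=list)

    def _log(self, event):
        # Every action is timestamped so the change is documented.
        self.history.append((datetime.now(timezone.utc), event))

    def approve(self, reviewer):
        # Assumed rule: the requester cannot approve their own change.
        if reviewer == self.requested_by:
            raise PermissionError("changes must be reviewed by someone else")
        self.approved_by = reviewer
        self._log(f"approved by {reviewer}")

    def apply(self, change):
        # Refuse to run the change until it has been approved.
        if self.approved_by is None:
            raise PermissionError("change has not been approved")
        change()
        self._log("applied")
```

For a CRM replacement, the request might describe the remapping of customer fields in the extract step, with the `history` list serving as the documented record for whoever maintains the pipeline next.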
Our data sourcing recommendations:
Collaborative data analysis is a double-edged sword. Human expertise and experience, when introduced into reporting, ensure the relevance of the reports. That said, the more people involved, the greater the risk of data quality issues.
Recurring data quality issues resulting from human error will reduce the impact of your reporting. The alternative, a "black box" reporting environment devoid of collaboration, is simply too rigid. You must have a balanced approach to data sourcing.
The focus must be on facilitating human interaction in a controlled and auditable manner. Ensure that staff can access data at controlled points in the supply chain. When they do access data, create discrete ways for adjustments and manipulation that are:
Doing so will allow you to achieve the best of both worlds in your data analysis and reporting.
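A common pattern for controlled, auditable adjustments is to leave the source data untouched and store each manual adjustment as its own record, noting who made it and why. The sketch below assumes that pattern. The `Adjustment` fields and the `apply_adjustments` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjustment:
    """A manual adjustment, stored separately from the source data."""
    record_id: str
    field: str
    new_value: float
    made_by: str
    reason: str


def apply_adjustments(source, adjustments):
    """Return adjusted copies of the source records; the source is left untouched."""
    adjusted = {rid: dict(rec) for rid, rec in source.items()}
    for adj in adjustments:
        if adj.record_id not in adjusted:
            raise KeyError(f"unknown record: {adj.record_id}")
        adjusted[adj.record_id][adj.field] = adj.new_value
    return adjusted
```

Because the original records are never overwritten, any figure in a report can be traced back to either the source value or a named adjustment with a stated reason.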
Our Recommendations:
You can watch a recording of a webinar that we presented on data sourcing best practices below:
(How to source good data from SolveXia)