Overcoming Common Data Quality Issues with OWOX BI

Though we are in the age of data-driven technologies, more than 50% of senior executives are generally dissatisfied with the value obtained from analytics. One of the main reasons for this is poor data quality. Moreover, data quality issues at the very beginning of a task make all further actions useless: any decisions made will be based on faulty information, and for businesses, that’s bad news.

That’s why we’ve decided to share our many years of experience and expertise in working with data. Over the past year, we’ve talked a lot with our fellow analysts and conducted many interviews with world-class specialists. Now we want to walk you through the most frequent data quality issues and how to overcome them.

This article discusses the stages of a data workflow, what errors can occur at each stage, and why. Most importantly, we share approaches to solving these data quality difficulties.

Some definitions before we begin:

What is data quality?

In a nutshell (and in terms of marketing data), quality data is relevant, up-to-date data without errors and discrepancies. If we look up data quality on Wikipedia, we’ll see more than 10 (!) definitions. Furthermore, Wikipedia cites the latest research by DAMA NL into definitions of dimensions of data quality using ISO 9001 as a frame of reference.


Why is data quality important?

Whether you are a marketer, digital analyst, or any other decision-maker, you can’t rely on analytics based on data you don’t trust.

Data processing flow

In dealing with the vast amount of data that marketers and analysts use daily, it’s a challenge to eliminate errors and discrepancies completely. It’s extremely difficult to provide quality data to an end user immediately. However, data errors can be actively fought and proactively found. 

First of all, let’s look at the process of working with data and lay down the steps where data quality issues can appear:

  1. Measurement planning

  2. Primary data collection

  3. Raw data normalization

  4. Business data preparation

  5. Data visualization

In our experience, factual data errors usually appear in the first two steps, whereas data misuse happens more frequently during the last steps.

Data processing workflow

Let’s see in more detail what data quality issues can arise at these steps and how to solve them.

Step 1. Plan measurements

Even though there are no errors in data at this step, we cannot completely omit it. The devil is in the details, and collecting data for analysis begins with detailed planning. Our recommendation is to always start with an express analysis and carefully plan the collection of all the marketing data you need.

Skipping this step leads to an unstructured approach and insufficient data for new tasks or projects. The goal is to collect complete data from all sources, not scattered fragments. Without all the data, decisions and actions are flawed from the very beginning.

Let’s see what data you should collect before starting new projects:

  • User behavior data from your website and/or application

  • Cost data from advertising platforms

  • Call tracking, chatbot, and email data

  • Actual sales data from your CRM/ERP systems, etc.

Step 2. Collect primary data

Once you’ve created your measurement plan, let’s proceed to the primary data collection step. 

During this step, among all the other challenges you need to overcome, you have to consider controlling access to your data (it’s all about data security) and preparing in advance for the creation of your data storage or data warehouse.

If you want to gain complete control over your raw data without modifying it, we recommend using a single storage location with automated data import. For marketing needs, Google BigQuery remains the best option because of the Google ecosystem.
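
To make this concrete, here is a minimal sketch of what a raw cost table in BigQuery might look like. The project, dataset, and column names are assumptions for illustration, not a prescribed schema; partitioning by date keeps it cheap to re-import a single day when a platform restates its numbers.

```sql
-- A hypothetical raw ad-cost table (all names are assumptions).
-- Partitioning by date lets a single day be re-imported cheaply when an
-- advertising service retrospectively updates its numbers.
CREATE TABLE IF NOT EXISTS `my_project.raw_marketing.ad_costs`
(
  date         DATE NOT NULL,
  source       STRING,    -- e.g. 'google', 'facebook'
  medium       STRING,    -- e.g. 'cpc'
  campaign     STRING,
  currency     STRING,    -- currency code as reported by the platform
  impressions  INT64,
  clicks       INT64,
  cost         NUMERIC    -- cost in the platform's own currency
)
PARTITION BY date;
```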

Data quality difficulties you can come across at this step:

1.1 Getting incomplete and incorrect data from an advertising service’s API

What is this?

Advertising platforms and services collect vast amounts of valuable user behavior data, and the challenge is to get all of this information from these data sources in full, without damaging its completeness.

*An Application Programming Interface (API) is a part of the server that transmits data (receives requests and sends responses), interacting with the user every time the user visits a page on the website.

What’s wrong with that?

An advertising service collects data about a user’s actions; however, data may change retrospectively after it’s transferred, and the service API may be updated or unavailable for some time. As a result, part of the data simply won’t be delivered, and the overall quality will be corrupted. The analyst may not know about this and may use irrelevant data to prepare business data. Incomplete or inaccurate data leads to flawed insights and decisions: you invest time and effort yet remain uncertain about business performance and profitability. Complete and accurate data is essential for success.

What are the reasons?

Data can be incomplete or incorrect due to data loss, discrepancies, or duplicates. Common reasons for data quality issues include:

  • A new account on an advertising platform is created, but analysts aren’t informed and data from it isn’t collected.

  • An API doesn’t support dynamic parameters in UTM tags and doesn’t collect or transfer them. If you use unsupported parameters in UTM tags, you’ll see the parameter names instead of their values (for example, geo instead of the actual value substituted into the tag).

  • Advertising services often retrospectively update data on costs, clicks, and impressions in advertising accounts. However, not all analysts are aware of this and take it into account.

How to solve this problem?

Since you can’t control the service’s code when working with APIs directly, you can fight these problems by:

  • distributing areas of responsibility during data collection to gain better control over data harvesting.

  • using automated data import tools that support API changes, dynamic parameters, etc. In case data is unavailable, these tools can show existing data gaps and download data retrospectively.
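
To illustrate the second point, here is a hedged sketch of a gap check in BigQuery. The `ad_costs` table and the list of sources are assumptions; the query lists the days in the last 30 days for which no cost rows were imported for a given source, so those days can be re-downloaded.

```sql
-- Find date/source combinations with no imported cost rows in the last 30 days
-- (table and source names are assumptions for illustration).
WITH expected AS (
  SELECT day, source
  FROM UNNEST(GENERATE_DATE_ARRAY(
         DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY),
         DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))) AS day
  CROSS JOIN UNNEST(['google', 'facebook', 'twitter']) AS source
)
SELECT e.day, e.source
FROM expected AS e
LEFT JOIN `my_project.raw_marketing.ad_costs` AS c
  ON c.date = e.day AND c.source = e.source
WHERE c.date IS NULL          -- nothing was imported for this day and source
ORDER BY e.source, e.day;
```

A monitoring tool can run a query like this on a schedule and alert the responsible analyst, which is the kind of check automated import tools typically perform for you.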


1.2 Getting incomplete and incorrect data from a website

What is this?

By analyzing data from advertising services, we know how much we spend on advertising, while website user behavior data tells us how much we earn. Since business questions usually sound like “which advertising pays off, and which does not?” it’s essential to know the income/expense ratio.

What's wrong with that?

Data on website user behavior differs from cost data collected by advertising services because data on website user behavior

  • is collected directly by website owners

  • is significantly larger in volume than cost data from advertising services

In addition, if we don’t notice that user behavior data isn’t being collected and don’t eliminate the problem, that data is lost forever.

What are the reasons?

Errors can occur when collecting website data because:

  • Not all website pages have a Google Tag Manager (GTM) container. GTM collects data on the results of advertising campaigns and on user behavior. Therefore, if there’s no GTM container on the page, no data is collected.

  • The Google Cloud project isn’t paid on time, and therefore data collection is put on hold.

  • The correctness of data entered into application forms by website users (name, address, email, etc.) isn’t validated.

How to solve this problem?

  • As with collecting data from an API, the solution for website data collection is to use automated data import tools; in case data is unavailable, these tools can warn you of existing data errors.
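
As a sketch of the validation idea from the list of reasons above, the query below flags form submissions whose email field doesn’t look like a valid address; the `leads` table and its columns are hypothetical.

```sql
-- Flag form submissions with a missing or malformed e-mail address
-- (the `leads` table and its columns are hypothetical).
SELECT lead_id, email
FROM `my_project.raw_marketing.leads`
WHERE email IS NULL
   OR NOT REGEXP_CONTAINS(email, r'^[^@\s]+@[^@\s]+\.[^@\s]+$');
```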

2. Getting aggregated, sampled data

What is this?

Aggregated and sampled data is generalized data that appears in cases when not all data is processed and used for analysis and reporting. This happens when services like Google Analytics analyze only part of the data to reduce the load on servers and balance the speed and accuracy of data processing. Since sampling results in generalization, it leads to a lack of trust in the obtained results.

What’s wrong with that?

Sampled reports distort performance data, and that can cost you a fortune when it comes to money-related metrics such as goals, conversions, and revenue. Because of this, you risk not noticing a profitable advertising campaign and may turn it off due to distorted data in a report, or vice versa — you may spend all your money on inefficient campaigns.

What are the reasons?

In an attempt to create reports as soon as possible and save resources, systems apply sampling, aggregation, and filtering instead of processing massive data arrays.

How to solve this problem?

The only thing you can do to avoid data sampling is to collect raw data and constantly check data completeness throughout all your reports. This monitoring is best done automatically to rule out human error. For example, you can apply automatic testing of correct metrics collection on your website, as our client did with the help of OWOX BI.
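
One possible shape for such an automated completeness check, assuming a hypothetical hit-level `events` table in BigQuery, is to compare each day’s event volume with the preceding week and flag sharp drops, which usually signal broken collection rather than a real change in traffic.

```sql
-- Flag days whose event volume falls more than 30% below the average of the
-- previous 7 days (the `events` table and the 30% threshold are assumptions).
WITH daily AS (
  SELECT DATE(event_timestamp) AS day, COUNT(*) AS events
  FROM `my_project.raw_marketing.events`
  GROUP BY day
),
with_baseline AS (
  SELECT
    day,
    events,
    AVG(events) OVER (ORDER BY day ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS avg_prev_7d
  FROM daily
)
SELECT day, events, avg_prev_7d
FROM with_baseline
WHERE events < 0.7 * avg_prev_7d
ORDER BY day;
```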

Step 3. Normalize raw data

After collecting all the necessary data, it’s time to normalize it. At this step, analysts turn available information into the form required by the business. For example, we must get phone numbers into a single format.

Data normalization is a manual, routine “monkey job” that usually keeps analysts from more exciting tasks such as extracting useful insights from data. Not to mention that normalization alone usually takes up to 50% of an analyst’s work time.
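
For example, bringing phone numbers to a single format might look like the sketch below; the `crm_contacts` table and the default country code are assumptions.

```sql
-- Normalize phone numbers: keep digits only, then add a default country code
-- so '+1 (555) 123-4567' and '555.123.4567' end up identical.
SELECT
  contact_id,
  phone AS phone_raw,
  CASE
    WHEN LENGTH(digits) = 11 AND STARTS_WITH(digits, '1') THEN CONCAT('+', digits)
    WHEN LENGTH(digits) = 10 THEN CONCAT('+1', digits)  -- assumed default country code
    ELSE NULL                                           -- leave unrecognized formats for review
  END AS phone_normalized
FROM (
  SELECT contact_id, phone, REGEXP_REPLACE(phone, r'[^0-9]', '') AS digits
  FROM `my_project.raw_marketing.crm_contacts`
) AS t;
```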

Data quality difficulties one can come across at this stage:

1. Insertion, updating, and deletion dependencies

What is this?

These are troublesome side effects that appear during the normalization of unstructured data.

What’s wrong with that?

The common outcome of these data dependencies is that reporting systems discard such incorrect data while analyzing it. As a result, we end up with inaccurate reports that aren’t based on full data. For example, say we have a sessions object and an advertisements object. In sessions, we have data for days 10 to 20, and in advertisements, there’s data from days 10 to 15 (for some reason there is no cost data for days 16 to 20). Accordingly, either we lose data from advertisements for days 16 to 20, or data from sessions will only be available for days 10 to 15.
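
To make the example concrete, here is a minimal sketch assuming hypothetical daily `sessions_by_day` and `ad_costs_by_day` tables: an inner join would keep only days 10 to 15, while a full outer join keeps every day present in either object and marks the missing cost days with NULLs, so the gap stays visible instead of silently shrinking the report.

```sql
-- Keep all days from both objects; NULL cost values show exactly where
-- advertising data is missing (table names are assumptions).
SELECT
  COALESCE(s.date, a.date) AS date,
  s.sessions,
  a.cost
FROM `my_project.staging.sessions_by_day` AS s
FULL OUTER JOIN `my_project.staging.ad_costs_by_day` AS a
  ON s.date = a.date
ORDER BY date;
```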

What are the reasons?

Undesirable side effects appear when an advertising service API is changed, isn’t available, or returns incorrect data.

How to solve this problem?

In the same way that you check for data collection errors, you should always verify the data you work with. Moreover, if the specifics of data merging aren’t known to the user, then chances are mistakes will occur while normalizing the data. In practice, the best decision at this step is to develop a data quality monitoring system that alerts the person responsible for data quality in case of data anomalies. For this, you can use services such as OWOX BI with its embedded data monitoring functionality.


2. Different data formats, structures, and levels of detail

What is this?

Each advertising platform or service uses different data formats, currencies, and structures. As a result, after having collected data from all sources, you have to normalize or blend it into a single format/structure.

What’s wrong with that?

It’s a tough task to build a triangular fortress when all you have are round and oval pieces. When you have lots of data sets in different formats, you simply cannot create a report before unifying the data.

What are the reasons?

Advertising services use different data schemas. For instance, the same field can be named Product Name on one platform and Product Category on another. Another example is the use of different currencies in cost data: dollars in Twitter Ads and pounds in Facebook Ads.


How to solve this problem?

Before analyzing data, it must be converted to a single format; otherwise, nothing good will come out of your analysis. 

For example, you should merge user session data with advertising cost data to measure the impact of each particular traffic source or marketing channel and to see which advertising campaigns bring you more revenue. Of course, this can be done manually by utilizing scripts and SQL, but applying automated solutions is a better choice.
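
A hedged sketch of such a merge in BigQuery, assuming hypothetical cost, exchange-rate, and session tables: costs are first converted to a single currency, then joined to sessions by date, source, and medium.

```sql
-- Convert platform costs to USD via a daily exchange-rate table, then join
-- them to session data per traffic source (all table names are assumptions).
WITH costs_usd AS (
  SELECT
    c.date,
    c.source,
    c.medium,
    SUM(c.cost * r.rate_to_usd) AS cost_usd
  FROM `my_project.raw_marketing.ad_costs` AS c
  JOIN `my_project.reference.exchange_rates` AS r
    ON r.date = c.date AND r.currency = c.currency
  GROUP BY c.date, c.source, c.medium
)
SELECT
  s.date,
  s.source,
  s.medium,
  s.sessions,
  c.cost_usd
FROM `my_project.staging.sessions_by_source` AS s
LEFT JOIN costs_usd AS c
  ON c.date = s.date AND c.source = s.source AND c.medium = s.medium
ORDER BY s.date, c.cost_usd DESC;
```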

Step 4. Prepare business-ready data

Business-ready data is a cleaned final dataset in a structure that corresponds to the business model.

In other words, if you have gone through all the steps in working with data and have completed everything, you get a final dataset. It’s ready-made data that can be sent to any data visualization service such as Power BI, Tableau, Looker Studio (formerly Google Data Studio), or Google Sheets.

However, you shouldn’t confuse it with raw data on which you simply try to build a report. That approach leads to recurring issues, lengthy error detection, and business logic duplicated across SQL queries. Managing updates and changes becomes challenging, causing problems like updating cost data history after adjustments by advertising services or handling repurchased transactions.

Unfortunately, it’s common practice to skip this step and jump right into creating the report. 

However, our experience demonstrates that using a final dataset and preparing business-ready data for the modern world of analytics is a must. In the long run, it’s cheaper and easier to work with prepared data instead of running around raw data doing the same things again and again.

What data quality difficulties can appear during this step:

1. Lack of data definitions leads to discrepancies

What is this?

It’s difficult to control changes in transformation logic due to inconsistent or absent definitions of the types of data required throughout data processing.

What’s wrong with that?

When data is not clearly defined, users aren’t on the same page about data use: they aren’t sure which table or column to query, which filter to use, or who to ask for information about data objects. Besides, it takes too long to navigate through and understand all data objects from raw data, including their attributes, their place in the data model, and their relevance to each other.

What are the reasons?

A business hasn’t clearly defined its core data and its data model. Therefore, the logic for merging data is incomprehensible.

How to solve this problem?

First of all, don’t apply business logic to each report or dataset but use data modeling at the company level. Within the company, there should be a transparent business data model and control of the data lifecycle. This means that all definitions used must be clear. For example, end users should be aware of what conversion and website visitor metrics represent.
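
One common way to make such definitions explicit is to encode each metric exactly once, for example in a view that every report queries. The sketch below is an assumption-laden illustration: a hypothetical `orders` table, with a conversion defined as a completed order.

```sql
-- Define "conversion" once at the company level; every report queries this
-- view instead of re-implementing the logic (names are hypothetical).
CREATE OR REPLACE VIEW `my_project.business_data.conversions` AS
SELECT
  DATE(o.created_at) AS date,
  o.source,
  o.medium,
  COUNT(DISTINCT o.order_id) AS conversions   -- completed orders only
FROM `my_project.staging.orders` AS o
WHERE o.status = 'completed'
GROUP BY date, o.source, o.medium;
```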

Along with that, as it’s challenging to prepare and maintain up-to-date modeled data by hand, the answer lies in applying automated solutions (e.g. OWOX BI Transformation) that can clean, normalize, blend, and attribute your data so it’s business-ready and prepared for reporting.

Step 5. Visualize data

Visually presenting key metrics is the last step to make data work, so your data presentation should be both informative and user-friendly. Automated and properly configured visualizations can significantly reduce the time to find a problem; that is, you can perform more iterations with less effort over the same period to improve data quality.

Also, it’s important to remember that data visualization services like the popular Looker Studio (formerly Google Data Studio) cannot merge or transform data. If you need reports based on many data sources, our recommendation is to collect all the data you need beforehand into a single data storage to avoid any difficulties.

Data quality difficulties you can come across at this step:

1. Factual data errors

What are these?

When something goes wrong at the previous levels of data collection and normalization, data errors can end up in reports shown by data visualization services.

What’s wrong with that?

Reports created with factual data errors are a waste of time and money. They provide no profit or valuable insights into risk and growth zones for the business. Sorry, Mario, but the princess is in another castle!

What are the reasons?

Simply put, the visualized data isn’t relevant. However, to find out the source of those errors, you should go all the way back to verifying your data again.

How to solve this problem?

The only way to solve this problem is to thoroughly prepare data and monitor its quality before creating reports.

2. Broken SQL queries or too many edits to reports (and/or SQL queries)

What is this?

Data requirements are constantly changing, and SQL queries also change. As a result, the more complicated the reporting system becomes, the easier it is for it to break down.

What’s wrong with that?

There’s nothing wrong with changes, unless there are so many that it’s impossible to remember what was changed, where, and when. Eventually, even a carefully built reporting system can fall apart, since the SQL queries no longer work and there’s no correct data to visualize.

What are the reasons?

It’s quite a challenge to remember every small thing, so the typical mistake is to forget to apply edits on all datasets where they’re needed.

How to solve this problem?

Ideally, users should be able to generate marketing reports without using lots of SQL queries and without needing to apply lots of changes and/or edits.

3. Misunderstanding and misuse of collected data

What is this?

One of the most common problems is misunderstanding data (and, therefore, misusing it). This happens when a particular metric or parameter can be interpreted in more than one way. For example, say there’s a conversion metric in a report and this report is used by different users. One of the users thinks that a conversion means a website visit, while another thinks it means placing an order. However, there’s also a third person who thinks that this conversion metric is about delivered and purchased orders. As you can see, there are lots of potential interpretations, so you must make it clear what information is presented in the report.

What’s wrong with that?

If there’s no clear understanding of what data is used in reports and dashboards, there’s no guarantee that your decisions will be based on the facts on the ground.

What are the reasons?

An unclear explanation of metrics and parameters used in reports or an inappropriate type of data visualization can lead to poor decisions.

How to solve this problem?

Data verification doesn’t end when you make sure your input data is correct and relevant. That data can still be misused. To avoid this problem, end users must have access to complete, up-to-date, business-ready data with clear and precise explanations of what information is presented in the reports.



Reconsider your data relationships with OWOX BI

The OWOX BI team knows more than anyone how severe the data problem is, since each of our clients encounters it. We have made a product that allows analysts to automate the routine, deliver business value from data, and ensure data quality.

OWOX BI is a unified platform that empowers you to collect, prepare, and analyze all your marketing data. It automates data delivery from siloed sources to your analytics destination, ensuring data is always accurate and up to date.


By applying OWOX BI, you can get business-ready data according to your business model with transparent data quality monitoring and an easy-to-use report builder for unlocking insights without SQL or code.

Let’s look at how OWOX BI can help you at all the steps we mentioned above.

  • Plan your measurements. Create a measurement plan for your business or develop a system of metrics especially for your business needs with the help of our specialists.

  • Collect primary data. OWOX BI collects raw data from advertising services, on-site analytics, offline stores, call tracking systems, and CRM systems in your data storage. The platform works smoothly with large ad accounts and uploads all data regardless of the number of campaigns. Besides automatically importing cost data, OWOX BI recognizes dynamic parameters in UTM tags, converts costs into one currency, and monitors the relevance of data.

No more connectors and custom integrations.

  • Normalize raw data. When using OWOX BI, you don’t need to manually clean, structure, and process data. You’ll receive ready datasets in the clearest and most convenient structure. Moreover, at any time you can get a visual report on the relevance of data from advertising services uploaded to Google Analytics.

  • Prepare business data. With OWOX BI, you have trusted business-ready data at your fingertips. There’s no longer any need to create a new dataset for every new report, as you get prebuilt final datasets prepared according to your business data model. With up-to-date and unified data ready for further data segmentation, you can get insights into the speed of your business and increase the value of your data.


  • Visualize data. The OWOX BI platform lets you analyze and visualize your data wherever you want. Once your marketing data is ready, you can send it to the BI or visualization tool of your choice with a few clicks.


Book a free demo to see how OWOX BI guarantees data quality and how you can benefit from fully automated data management today!


Key takeaways

Quality analytics starts with quality data, and your data efforts should produce results the business can trust. That’s why you should proactively identify and fight data quality issues and discrepancies to get holistic performance reporting and improve the quality of your data. In general, improving data quality is an iterative process in which three steps can be distinguished:

  1. Examining the status of your data

  2. Localizing a problem that degrades data quality

  3. Fixing the problem

The first two steps are the most time-consuming: it’s difficult to identify that a problem exists and to understand in which situations an error arises. Once it’s clear what the problem is and where it’s located, fixing it turns out to be relatively easy.

Finally, you can create three documents to improve your data:

  • Data movement scheme

  • Responsibility matrix (who is responsible for what)

  • Business data model

FAQ

  • How to ensure data quality?

    You can ensure data quality by establishing clear data quality standards, implementing automated data validation tools, regularly monitoring and cleansing your data, and providing data governance training to staff.
  • How can I avoid data duplication?

    You can avoid data duplication by setting up a unique identifier for each record, performing regular deduplication and integration checks (see the sketch after this FAQ), and enforcing data entry standards.
  • What are the most common data quality issues?

    The most common data quality issues include missing, inaccurate, or incomplete data, inconsistent formatting, and duplication of data.
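
As a sketch of the deduplication step mentioned above, assuming a hypothetical `crm_contacts` table with an `updated_at` timestamp and email as the unique identifier:

```sql
-- Keep only the most recent row per identifier (e-mail) and write the result
-- to a clean table; all names are assumptions for illustration.
CREATE OR REPLACE TABLE `my_project.staging.crm_contacts_dedup` AS
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY email ORDER BY updated_at DESC) AS row_num
  FROM `my_project.raw_marketing.crm_contacts`
) AS t
WHERE row_num = 1;
```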