How to monitor data quality — a detailed guide

Preventing an error in data collection is easier than dealing with its consequences. The soundness of your business decisions depends on the quality of your data. In this article, we tell you how to check the quality of data at all stages of collection, from the statement of work to completed reports.

Want to be sure about the quality of your data? Leave it to OWOX BI. We’ll help you develop metrics and customize web analytics. With OWOX BI, you don’t need to look for connectors and clean up and process data. You’ll get ready data sets in an understandable and easy-to-use structure.

The importance of testing in web analytics

Unfortunately, many companies that spend substantial resources storing and processing data still make important decisions based on intuition and their own expectations instead of data.

Why does that happen? Distrust of data is exacerbated by situations where data provides an answer that’s at odds with the expectations of the decision-maker. In addition, if someone has encountered errors in data or reports in the past, they’re inclined to favor intuition. This is understandable, as a decision made on the basis of incorrect data may set you back rather than move you forward.

Imagine you have a multi-currency project. Your analyst has set up Google Analytics in one currency, and the marketer in charge of contextual advertising has set up cost importing into Google Analytics in another currency. As a result, you have an unrealistic return on ad spend (ROAS) in your advertising campaign reports. If you don’t notice this error in time, you may either disable profitable campaigns or increase the budget on loss-making ones.
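To see how large the distortion can be, here is a minimal sketch with made-up numbers: revenue is tracked in one currency while cost is imported in another without conversion. The amounts, the exchange rate, and the function name are illustrative assumptions, not values from a real project.

```python
def roas(revenue: float, cost: float) -> float:
    """Return on ad spend: revenue divided by advertising cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return revenue / cost


# Hypothetical campaign: revenue tracked in USD, cost imported in UAH.
revenue_usd = 1_000.0
cost_uah = 20_000.0
uah_per_usd = 40.0  # assumed exchange rate, for illustration only

# Wrong: the two amounts are in different currencies.
reported_roas = roas(revenue_usd, cost_uah)

# Right: convert the cost to the revenue currency first.
actual_roas = roas(revenue_usd, cost_uah / uah_per_usd)

print(f"reported ROAS: {reported_roas:.2f}")  # reported ROAS: 0.05
print(f"actual ROAS:   {actual_roas:.2f}")    # actual ROAS:   2.00
```

With these numbers, a campaign that actually returns 2.00 per unit spent shows up in the report as returning 0.05, which is exactly the kind of result that could get a profitable campaign disabled.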

In addition, developers are usually very busy, and implementing web analytics is a secondary task for them. While implementing new functionality (for example, a new design for an accessories block), developers may forget to check that data is still being collected in Google Analytics. As a result, when the time comes to evaluate the effectiveness of the new design, it turns out that data collection broke two weeks ago. Surprise.

We recommend testing web analytics data as early and as often as possible to minimize the cost of correcting an error.

Cost of correcting a mistake

Imagine you’ve made an error during the specification phase. If you find it and correct it immediately, the fix will be relatively cheap. If the error is revealed after implementation, when building reports, or even when making decisions, the cost of fixing it will be very high.

cost of correcting a mistake

How to implement data collection

Data collection typically consists of five key steps:

  1. Formulate a business challenge. Say you need to assess the efficiency of an algorithm for selecting goods in a recommendations block.
  2. The analyst or person responsible for data collection designs a system of metrics to be tracked on the site.
  3. The analyst sets up Google Analytics and Google Tag Manager.
  4. The analyst sends the terms of reference to developers for implementation.
  5. After the developers implement the metrics and set up data collection, the analyst works with reports.
stages of data collecting

At almost all of these stages, it’s very important to check your data. It’s necessary to test technical documentation, Google Analytics and Google Tag Manager settings and, of course, the quality of data collected on your site or in your mobile application.

Features of data collection testing

Before we go through each step, let’s take a look at some requirements for data testing:

  • You can’t test without tools. At a minimum, you’ll have to work with the developer console in a browser.
  • There’s no abstract expected result. You need to know exactly what you should end up with. We always have a certain set of parameters we need to collect for any user interaction with a site. And we know the values these parameters should take.
  • Special knowledge is necessary. At a minimum, you need to be familiar with the documentation for the web analytics tools you use, as well as with common industry practice and the experience of other market participants.

Testing documentation for website data collection

As we’ve mentioned, it’s much easier to correct an error if you catch it in the specifications. Therefore, checking documentation starts long before collecting data. Let’s figure out why we need to check your documentation.

Purposes of testing documentation:

  • Fix mistakes with little effort. An error in documentation is just an error in written text, so all you have to do is make a quick edit.
  • Prevent the need for changes in the future that may affect the site/application architecture.
  • Protect the analyst’s reputation. A specification that goes into development with errors could call into question the competence of the person who drafted it.

Most common errors in specifications:

  1. Typos. A developer may copy parameter names without reading them closely. This isn’t about grammatical or spelling errors, but rather about incorrect parameter names or incorrect values for those parameters.
  2. Ignoring fields when tracking events. For example, an error message may be ignored if a form wasn’t submitted successfully.
  3. Invalid field names and a mismatch with the enhanced ecommerce scheme. Implementing enhanced ecommerce with a dataLayer variable requires clear documentation. Therefore, it’s better to check all fields twice when drafting your specifications.
  4. No currency support for a multi-currency site. This problem is relevant to all revenue-related reports.
  5. Hit limits aren’t taken into account. For example, say there can be up to 30 different products on a catalog page. If we send view information for all products in a single hit, the hit is likely to exceed the size limit and won’t be sent to Google Analytics.
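One way a specification can account for hit limits is to require that product views be sent in several smaller hits. The sketch below is only an illustration of that idea: it assumes the Universal Analytics payload limit of 8192 bytes, approximates the size of each product impression from its URL-encoded fields, and uses hypothetical function names. A real budget should be lower than the full limit, because the rest of the hit also takes space.

```python
from urllib.parse import urlencode

MAX_HIT_BYTES = 8192  # Universal Analytics payload limit; verify for your setup


def impression_size(product: dict, position: int) -> int:
    """Approximate the encoded size of one product impression in a hit."""
    params = {f"il1pi{position}{key}": value for key, value in product.items()}
    return len(urlencode(params).encode("utf-8")) + 1  # +1 for the "&" separator


def split_impressions(products: list[dict], budget: int = MAX_HIT_BYTES) -> list[list[dict]]:
    """Group products so that each group's impressions fit into the byte budget."""
    chunks: list[list[dict]] = []
    current: list[dict] = []
    used = 0
    for product in products:
        size = impression_size(product, len(current) + 1)
        if current and used + size > budget:
            # The current hit is full: close it and start a new one.
            chunks.append(current)
            current, used = [], 0
            size = impression_size(product, 1)
        current.append(product)
        used += size
    if current:
        chunks.append(current)
    return chunks


# Hypothetical catalog page with 30 products.
products = [{"id": f"SKU{i}", "nm": "x" * 100} for i in range(30)]
chunks = split_impressions(products, budget=1000)
print([len(chunk) for chunk in chunks])
```

Each inner list corresponds to one hit. A single product that is larger than the budget still gets its own group, so such cases need separate handling.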

Testing Google Analytics and Google Tag Manager settings

The next step after you check your technical documentation is to check your Google Analytics and Google Tag Manager settings.

Why test Google Analytics and Google Tag Manager settings?

  • Ensure that parameters are correctly processed by data collection systems. Google Analytics and Google Tag Manager can be configured in parallel with the implementation of metrics on your site. And until the analyst is done, the data will not appear in Google Analytics.
  • Make it easier to test metrics embedded on the site. You’ll only need to concentrate on the developer’s part of the work. At the final stage, you’ll need to look for the cause of an error directly on the site, not in the platform settings.
  • Keep the cost of fixes low, as there’s no need to involve developers.

Most common errors in Google Analytics:

  1. A custom dimension or metric wasn’t created. This is especially relevant for Google Analytics 360 accounts, which can have up to 200 custom metrics and 200 custom dimensions (parameters). With that many, it’s very easy to miss one.
  2. The specified scope is incorrect. You won’t be able to catch this error during the dataLayer review phase or by reviewing the hit you’re sending, but when you create a report, you’ll see that the data doesn’t look as expected.
  3. You get a duplicate of an existing parameter. This error doesn’t affect the data being sent, but it may cause problems when checking and building reports.

Most common errors in Google Tag Manager:

  1. Parameters haven’t been added to the Universal Analytics tag or to the Google Analytics Settings variable.
  2. The index in the tag doesn’t match the parameter in Google Analytics, creating a risk that values will be passed to the wrong parameters. For example, say you specified the index of the number of users parameter in GTM for the item rating parameter. This error is likely to be found immediately when building reports, but by then you won’t be able to correct the data already collected.
  3. An invalid dataLayer variable name is specified. When you create a Data Layer Variable in Google Tag Manager, be sure to specify the exact name by which the variable will be found in the dataLayer array. If you mistype it or enter a different name, the variable will never be read from the dataLayer.
  4. Enhanced ecommerce tracking is not enabled.
  5. The firing trigger isn’t configured correctly. For example, the regular expression in the trigger is written incorrectly or there’s an error in the event name.
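The index mismatch described above can be caught with a simple side-by-side comparison. The following sketch assumes you have copied the index-to-name mapping by hand from the Google Analytics admin section and from the GTM tag settings. The function name and sample data are hypothetical, and it doesn’t call any Google API.

```python
def find_index_mismatches(ga_dimensions: dict[int, str], gtm_tag_fields: dict[int, str]) -> list[str]:
    """Compare custom dimension indexes defined in GA with those used in a GTM tag."""
    problems = []
    for index, name in sorted(gtm_tag_fields.items()):
        if index not in ga_dimensions:
            problems.append(f"index {index} ({name}) is not defined in Google Analytics")
        elif ga_dimensions[index] != name:
            problems.append(
                f"index {index}: tag sends '{name}', "
                f"but Google Analytics expects '{ga_dimensions[index]}'"
            )
    for index, name in sorted(ga_dimensions.items()):
        if index not in gtm_tag_fields:
            problems.append(f"index {index} ({name}) is never filled by the tag")
    return problems


# Hypothetical settings copied by hand from both interfaces.
ga_dimensions = {1: "userType", 2: "itemRating", 3: "pageType"}
gtm_tag_fields = {1: "userType", 2: "pageType"}

for problem in find_index_mismatches(ga_dimensions, gtm_tag_fields):
    print(problem)
```

In this example the tag sends the page type into the index reserved for the item rating, and index 3 is never filled.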

Testing the implementation of Google Analytics

The last stage of testing is testing directly on the site. This stage requires more technical knowledge because you’ll need to inspect the code, check how the container is installed, and read the logs. So you need to be savvy and use the right tools.

Why test embedded metrics?

  • Check that what’s implemented complies with the specifications and record any errors.
  • Check whether the values to be sent are adequate. Verify that each parameter transmits the value it’s supposed to transmit. For example, check that the product category parameter doesn’t carry the product name instead.
  • Give feedback to developers on the quality of implementation. Based on this feedback, developers can make changes to the site.

The most common mistakes:

  1. Not all scenarios are covered. For example, say an item can be added to the cart on the product, catalog, promo, or main page, that is, anywhere there’s a link to the item. With so many entry points, you can miss something.
  2. The task isn’t implemented on all pages. That is, for some pages or some section or directory, data isn’t collected at all or is only partially collected. To prevent such situations, we can draw up a checklist. In some cases, we can have as many as 100 checks for one function.
  3. Not all parameters are implemented; that is, the dataLayer is only partially implemented.
  4. The dataLayer scheme for enhanced ecommerce is broken. This is especially true for events such as adding items to the cart, moving between checkout steps, and clicking on items. One of the most common errors in implementing enhanced ecommerce is missing square brackets on the products array.
  5. The dataLayer uses an empty string instead of null or undefined to reset a parameter. In this case, Google Analytics reports contain rows with empty values. If you use null or undefined, the parameter won’t even be included in the hit you’re sending.
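Two of the mistakes above (a products value that isn’t an array and empty strings used to reset parameters) are easy to check mechanically. Here is a minimal sketch that assumes a dataLayer push has been copied from the console and parsed from JSON into a Python dictionary. The function name and the sample push are hypothetical, and the checks cover only these cases, not the full enhanced ecommerce scheme.

```python
def validate_ecommerce_push(push: dict) -> list[str]:
    """Return a list of problems found in one enhanced ecommerce dataLayer push."""
    ecommerce = push.get("ecommerce")
    if not isinstance(ecommerce, dict):
        return ["'ecommerce' object is missing"]

    errors = []
    for action, payload in ecommerce.items():
        if not isinstance(payload, dict) or "products" not in payload:
            continue  # e.g. currencyCode, or actions without a products field
        products = payload["products"]
        if not isinstance(products, list):
            errors.append(f"{action}.products must be an array (missing square brackets?)")
            continue
        for i, product in enumerate(products):
            if not isinstance(product, dict):
                errors.append(f"{action}.products[{i}] must be an object")
                continue
            if not product.get("id") and not product.get("name"):
                errors.append(f"{action}.products[{i}] needs 'id' or 'name'")
            for key, value in product.items():
                if value == "":
                    errors.append(f"{action}.products[{i}].{key} is an empty string")
    return errors


# Hypothetical push with both mistakes fixed except the empty brand.
push = {
    "event": "addToCart",
    "ecommerce": {
        "currencyCode": "EUR",
        "add": {"products": [{"id": "P1", "name": "T-shirt", "brand": ""}]},
    },
}
print(validate_ecommerce_push(push))  # ['add.products[0].brand is an empty string']
```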

Tools for checking data

Let’s take a closer look at the tools we use to test data.

Google Analytics Debugger

To get started, you need to install this extension in your browser and enable it. Then open the page you want to check and go to the Console tab in the developer tools. The information you see there is provided by the extension.

This screen shows the parameters that are transmitted with hits and the values that are transmitted for those parameters:

Google Analytics Debugger

There’s also an enhanced ecommerce block. You can find it in the console as ec:

In addition, error messages are displayed here, such as for exceeding the hit size limit.

If you need to check the composition of the dataLayer, the easiest way to do this is to type the dataLayer command in the console:

dataLayer command in the developer console

Here are all the parameters that are transmitted. You can study them in detail and verify them. Each action on the site is reflected in the dataLayer. Let’s say you have seven objects. If you click on an empty field and call the dataLayer command again, an eighth object should appear in the console.

Google Tag Manager Debugger

To access Google Tag Manager Debugger, open your Google Tag Manager account and click the Preview button:

Then open your site and refresh the page. A panel should appear at the bottom of the window that shows all the tags running on that page.

Google Tag Manager Debugger

Events that are added to the dataLayer are displayed on the left. By clicking on them, you can check the real-time composition of the dataLayer.

Testing mobile browsers and mobile applications

Features of mobile browser testing:

  • On smartphones and tablets, sites can be displayed in responsive mode or there can be a separate mobile version of the site. If you run the mobile version of a site on your desktop, it can differ from the same version on your phone.
  • In general, extensions cannot be installed in mobile browsers.
  • To compensate for this, you must enable Debug mode in the Universal Analytics tag or in the Google Analytics tracking code on the site.

Features of mobile application testing:

  • Working with application code requires more technical knowledge.
  • You’ll need a local proxy server to intercept hits. Because a device sends a large number of requests, you can filter them by the name of the application or the host to which they’re sent.
  • All hits are collected in Measurement Protocol format and require additional processing. Once hits have been collected and filtered, they must be copied and parsed into parameters. You can use any convenient tool to do this: Hit Builder, formulas in Google Sheets, or a JavaScript or Python app. It all depends on what’s more convenient for you. Plus, you’ll need knowledge of Measurement Protocol parameters to identify errors in the sent hits.

How to use your mobile browser

  1. Connect your mobile device to your laptop via USB.
  2. Open Google Chrome on your device.
  3. In the Chrome developer console, open the Remote Devices report:
Remote Devices report
  4. Confirm the connection to your device by clicking OK in the dialog box. Then select the tab you want to inspect and click Inspect.
  5. Now you can work with the developer console in standard mode, as in the browser. You will have all the familiar tabs: Console, Network, and others.

How to work with a mobile app

  1. To work with a mobile application, you must install and run a proxy server. We recommend Charles.
  2. Once your proxy server is installed, check which IP address the application connects to:
  3. Then take your device and configure the Wi-Fi connection through the proxy server using port 8888. This is the port Charles uses by default.
  4. After that, it’s time to collect hits. Note that in applications, hits are sent not to the collect endpoint but to batch. A batch is a single packaged request that carries multiple hits. First, this saves application resources. Second, if there are network problems, the hits are stored in the application and sent together as soon as the network connection is reestablished.
  5. Finally, the collected data must be parsed into parameters and checked, one by one, against the specifications.
table of parameters
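The parsing step can be done with a few lines of Python. This is a minimal sketch, assuming the body of a batch request has been copied from the proxy as text, with one URL-encoded hit per line. The function names and the sample body are hypothetical, and the list of required parameters covers only the basic Measurement Protocol fields (v, tid, cid, t).

```python
from urllib.parse import parse_qsl


def parse_batch(body: str) -> list[dict[str, str]]:
    """Split a batch request body into hits and each hit into parameters."""
    hits = []
    for line in body.splitlines():
        line = line.strip()
        if line:
            hits.append(dict(parse_qsl(line, keep_blank_values=True)))
    return hits


def missing_parameters(hit: dict[str, str], required=("v", "tid", "cid", "t")) -> list[str]:
    """Return the required parameters that are absent or empty in a hit."""
    return [name for name in required if not hit.get(name)]


# Hypothetical body copied from the proxy.
body = (
    "v=1&tid=UA-XXXXX-Y&cid=555&t=screenview&an=Shop&cd=Home\n"
    "v=1&tid=UA-XXXXX-Y&t=event&ec=cart&ea=add\n"
)

for number, hit in enumerate(parse_batch(body), start=1):
    print(number, hit["t"], "missing:", missing_parameters(hit))
```

Here the second hit has no cid, which is the kind of error you would then compare against the specifications.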

Checking data in Google Analytics reports

This step is the fastest and easiest. It also lets you make sure the data collected in Google Analytics makes sense. In your reports, you can check hundreds of different scenarios and look at indicators depending on the device, browser, etc. If you find any anomalies in the data, you can reproduce the scenario on a specific device and in a specific browser.

You can also use Google Analytics reports to check the completeness of data transferred to the dataLayer. That is, you can check whether the variable is filled in each scenario, whether it contains all parameters, whether the parameters take the correct values, and so on.

The most useful reports

We want to share the most useful reports (in our opinion). You can use them as a data collection checklist.

Let’s see what these reports look like in the interface and which of these reports you need to pay attention to first.

Product Performance report

The most valuable tab in this report is Shopping Behavior. It analyzes the completeness of data collection at each stage of enhanced ecommerce. That is, we can see whether Google Analytics receives data on product list views, clicks, product detail views, additions of products to and removals from the cart, and the purchases themselves.

Product Performance report

What should we pay attention to here? First, it’s very strange if you have zero values in any of the columns. Second, if you have more values at some stage than at the previous stage, you’re likely to have problems collecting data. For example, say the number of unique purchases of an item is greater than the number of checkouts. That’s weird and it’s worth paying attention to.
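Both checks can be applied to an exported report row by row. The sketch below assumes the report has been exported and each row converted to a dictionary. The column names and the function name are hypothetical. It treats any stage that exceeds the previous one as suspicious, which is a heuristic rather than proof of an error.

```python
FUNNEL = [
    "list_views",
    "list_clicks",
    "detail_views",
    "adds_to_cart",
    "checkouts",
    "unique_purchases",
]


def check_funnel(row: dict[str, int], stages: list[str] = FUNNEL) -> list[str]:
    """Flag zero values and stages that exceed the stage before them."""
    warnings = []
    for stage in stages:
        if row.get(stage, 0) == 0:
            warnings.append(f"{stage} is zero")
    for previous, current in zip(stages, stages[1:]):
        if row.get(current, 0) > row.get(previous, 0):
            warnings.append(
                f"{current} ({row[current]}) exceeds {previous} ({row.get(previous, 0)})"
            )
    return warnings


# Hypothetical row for one product.
row = {
    "list_views": 1000,
    "list_clicks": 300,
    "detail_views": 250,
    "adds_to_cart": 80,
    "checkouts": 40,
    "unique_purchases": 55,
}
print(check_funnel(row))  # ['unique_purchases (55) exceeds checkouts (40)']
```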

You can also switch between other parameters in this report, which should also be sent to Enhanced Ecommerce. For example, if you select Item Category as the primary dimension, you may see there are sales for certain categories of items but there are no views for these items, no adds to the cart, etc.

Top Events report

First of all, it’s necessary to walk through all parameters that are transmitted to Google Analytics and see what values each parameter takes. Usually it’s immediately clear whether everything is okay. More detailed analysis for each of the events can be carried out in custom reports.

Top Events report

Cost Analysis report

Another standard report that can be useful for checking the importing of expense data into Google Analytics is Cost Analysis.

We often see reports where there are expenses for some source or advertising campaign but there are no sessions. This can be caused by problems or errors in UTM tags. Alternatively, filters in Google Analytics may exclude sessions from a particular source. These reports need to be checked from time to time.
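If you export the report, this check is easy to repeat regularly. The following is a small sketch that assumes the exported rows have been loaded as dictionaries. The column names, the function name, and the sample rows are hypothetical.

```python
def sources_with_cost_but_no_sessions(rows: list[dict]) -> list[str]:
    """Return 'source / campaign' labels that have expenses but no sessions."""
    return [
        f"{row['source']} / {row['campaign']}"
        for row in rows
        if row["cost"] > 0 and row["sessions"] == 0
    ]


# Hypothetical rows from an exported Cost Analysis report.
rows = [
    {"source": "google", "campaign": "brand", "cost": 120.0, "sessions": 340},
    {"source": "facebook", "campaign": "spring_sale", "cost": 75.5, "sessions": 0},
    {"source": "newsletter", "campaign": "digest", "cost": 0.0, "sessions": 58},
]
print(sources_with_cost_but_no_sessions(rows))  # ['facebook / spring_sale']
```

For every label in the result, the next step is to check the UTM tags of that campaign and the view filters in Google Analytics.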

Custom reports

We would like to highlight the custom report that allows you to track duplicate transactions. It’s very easy to set up: the dimension must be Transaction ID and the metric must be Transactions.

custom report

Note that when a transaction ID has more than one transaction in the report, this means that information about the same order was sent more than once.

checking the duplication of transactions
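The same check can be run outside the interface on any list of transaction IDs, for example one taken from exported hits. This is a minimal sketch with a hypothetical function name and sample data.

```python
from collections import Counter


def duplicate_transactions(transaction_ids: list[str]) -> dict[str, int]:
    """Return transaction IDs that were sent more than once, with their counts."""
    counts = Counter(transaction_ids)
    return {tid: count for tid, count in counts.items() if count > 1}


# Hypothetical list of transaction IDs, one per purchase hit.
sent_ids = ["T-1001", "T-1002", "T-1002", "T-1003", "T-1002"]
print(duplicate_transactions(sent_ids))  # {'T-1002': 3}
```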

If you find a similar problem, read these detailed instructions on how to fix it.

Learn more about what to pay attention to when configuring web analytics and which reports to use for verifying data quality in our post on how to conduct an audit of website analytics.

Automatic email alerts

Google Analytics has a very good Custom Alerts tool that allows you to track important changes without viewing reports. For example, if you stop collecting information about Google Analytics sessions, you can receive an email notification.

Custom alerts in Google Analytics

We recommend that you set up notifications for at least these four metrics:

  • Number of sessions
  • Bounce rate
  • Revenue
  • Number of transactions

To set up notifications, see our post on automating reports in Google Analytics.
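The logic behind such an alert is a comparison of a metric with its value in a previous period. The sketch below illustrates that logic only. It is not the Custom Alerts implementation, and the 30 percent threshold, the function names, and the sample numbers are assumptions.

```python
def percent_change(current: float, previous: float) -> float:
    """Change of a metric relative to the previous period, in percent."""
    if previous == 0:
        raise ValueError("previous value must not be zero")
    return (current - previous) * 100 / previous


def should_alert(current: float, previous: float, max_drop_percent: float = 30.0) -> bool:
    """True if the metric dropped by at least max_drop_percent."""
    if previous == 0:
        return False  # nothing to compare against
    return percent_change(current, previous) <= -max_drop_percent


# Hypothetical daily sessions: yesterday vs. the same weekday a week earlier.
print(should_alert(current=600, previous=1000))  # True
print(should_alert(current=900, previous=1000))  # False
```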

Testing automation

In our experience, testing the dataLayer implementation on the site is the most difficult and time-consuming task, and it’s the bottleneck where mistakes are the most common.

To avoid problems with dataLayer implementation, checks must be done at least once a week. In general, the frequency should depend on how often you implement changes on the site. Ideally, you need to test the dataLayer after each significant change. It’s time-consuming to do this manually, so we decided to automate the process.

Why automate testing?

To automate testing, we’ve built a cloud-based solution that enables us to:

  • Check whether the dataLayer variable on the site matches the reference value
  • Check the availability and functionality of Google Tag Manager code
  • Check that data is sent to Google Analytics and OWOX BI
  • Collect error reports in Google BigQuery

Advantages of test automation:

  • Significantly increase the speed of testing. In our experience, you can test thousands of pages in a few hours.
  • Get more accurate results, since the human factor is excluded.
  • Lower the cost of testing, as you need fewer specialists.
  • Increase the frequency of testing, as you can run tests after each change to the site.

A simplified scheme of the algorithm we use:

algorithm of automatic testing

When you sign in to our app, you need to specify the pages you want to verify. You can do this by uploading a CSV file, specifying a link to the sitemap, or simply specifying a site URL, in which case the application will find the sitemap itself.

Then it’s important to specify the dataLayer scheme for everything to be tested: pages, events, and scenarios (sequences of actions, such as checkout). You can then use regular expressions to match page types to URLs.

After receiving all this information, our application runs through all pages and events as scheduled, checks each scenario, and uploads test results to Google BigQuery. Based on this data, we set up email and Slack notifications.
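To make the idea of matching page types to URLs and comparing the dataLayer with a reference more concrete, here is a small sketch. It isn’t the code of the application described above. The URL patterns, page types, required keys, and function names are all hypothetical, and the dataLayer contents are assumed to be already collected as a dictionary.

```python
import re
from typing import Optional

# Hypothetical rules: a regular expression for the URL path and the page type it maps to.
PAGE_TYPES = [
    (re.compile(r"^/$"), "home"),
    (re.compile(r"^/catalog/[^/]+/?$"), "catalog"),
    (re.compile(r"^/product/[^/]+/?$"), "product"),
]

# Hypothetical reference scheme: keys the dataLayer must contain for each page type.
REQUIRED_KEYS = {
    "home": {"pageType"},
    "catalog": {"pageType", "categoryName"},
    "product": {"pageType", "productId", "productPrice"},
}


def page_type(path: str) -> Optional[str]:
    """Return the first page type whose pattern matches the URL path."""
    for pattern, name in PAGE_TYPES:
        if pattern.search(path):
            return name
    return None


def check_page(path: str, data_layer: dict) -> list[str]:
    """Compare the dataLayer collected on a page with the reference scheme."""
    kind = page_type(path)
    if kind is None:
        return [f"{path}: no page type matches this URL"]
    missing = sorted(REQUIRED_KEYS[kind] - set(data_layer))
    return [f"{path}: missing '{key}' for page type '{kind}'" for key in missing]


print(check_page("/product/123", {"pageType": "product", "productId": "123"}))
```

The resulting messages are the kind of rows that could then be written to a table and used for notifications.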

P. S. If you need a full website audit, you can request consulting services from OWOX BI. Sign up for a demo and we’ll discuss the possibilities. 


FAQ

  • What is data testing?

    Data testing is the process of verifying and validating the accuracy, completeness, consistency, and validity of data used in a system or application. It involves various techniques and tools to identify and correct errors, inconsistencies, and discrepancies in the data.
  • Why is data testing important?

    Data testing is crucial to ensure that data is correct, reliable, and trustworthy. Inaccurate data can lead to wrong decisions, loss of revenue, and damage to reputation. Data testing helps to identify and fix data errors early on, saving time and resources and improving data quality.
  • What are the different types of data testing?

    There are several types of data testing, including functionality testing, integration testing, performance testing, security testing, and usability testing. Each type of testing evaluates different aspects of data quality and helps to ensure that data meets the required standards.