5 Reasons to Create Reports in Google BigQuery
Alexander Diduh, Web analyst at OWOX BI
Google Analytics is the world’s most used web analytics system. It’s easy to implement and provides a variety of out-of-the-box reports. All in all, Google Analytics is a great tool for analyzing basic KPIs. However, if you’ve ever tried performing deeper analysis of your website, you may have noticed that in some cases Google Analytics lacks the capabilities you need. This may be due to a number of factors: internal limitations of the system, scopes of dimensions and metrics, sampling and aggregation of reported data.
In this article, you’ll learn what challenges you might face when building reports in Google Analytics, and how to solve these challenges using the Google BigQuery cloud database.
1. Limitations on the number of dimensions and metrics in reports
The first problem you might encounter when working with Google Analytics is the limited number of dimensions and metrics you can use in reports. Take for example, you want to perform cohort analysis of your customers in order to evaluate the performance of advertising channels and optimize customer acquisition costs. Or, you need to monitor the customers’ LTV The revenue you get from a particular customer over the entire time they do business with your company. in each individual cohort during a reporting period, and approach each cohort differently to improve this indicator. To perform this analysis, you’ll need to break your users into cohorts and analyze data about each hit, session, and user, plus data about revenues and advertising costs.
The data can be processed in the Google Analytics interface, or passed on to a third-party service using an API (e.g. Core Reporting API v3). Whichever of the two options you choose, you’ll face a restriction on the maximum number of dimensions and metrics you can use in reports. To find out more about dimensions and metrics, read our blogpost.
In the Google Analytics interface, you can only use a maximum of 5 dimensions in one report, as seen on the screenshot of a custom report below:
Core Reporting API v3 has a limit of 7 dimensions in one query:
The number of metrics in the Core Reporting API v3 is limited to 10. In addition, an API request cannot consist only of dimensions; every request requires at least one metric. Let’s say you want to send emails to users who were active in the past month. You only need the User ID dimension in order to create the list of active users, but you’ll also have to add a metric, which is not very convenient.
In the Google Analytics interface, you can only choose a maximum of 25 metrics:
In Google BigQuery, there are no such constraints. Collecting and processing data in Google BigQuery helps overcome limitations on the number of dimensions and metrics in reports. Let’s say you want to segment your audience in order to run retargeting campaigns, create more relevant content, or adapt your website to better fit a popular browser. To do this, you’ll need to measure the number of pageviews, events, transactions, and sessions, generated by each user, and also for each of the user’s attributes: device, browser, browser version, city, country, etc.
If you try doing so in Google Analytics, your report will be limited to 5 dimensions:
With Google BigQuery, you can get all the information you need by simply listing the dimensions and metrics you want to see, in a single query. In the examples below, the source data to be queried was collected by OWOX BI Pipeline:
SELECT clientId, sessionId, date, device.deviceCategory deviceCategory, device.browser browser, device.browserVersion browserVersion, geoNetwork.city city, geoNetwork.country country, trafficSource.channelGrouping channel, trafficSource.source source, trafficSource.medium medium, totals.hits hits, totals.pageviews pageviews, totals.events events, totals.transactions transactions, totals.visits visits FROM [owox-demo:OWOXBI_Streaming.session_streaming_20160708]
The query will return all the data you need:
2. Compatibility of dimensions and metrics in reports
Each dimension and metric in Google Analytics has a scope. There are four types of dimensions, based on their scope:
- Hit-level — value is applied to each single hit — a user’s interaction with the website. For instance: "Page", "Page Title", "Event Category" etc.
- Session-level — value is applied to all hits in a session. For instance: "Source", "Medium", "Keyword".
- User-level — value is applied to all hits in sessions of a certain user. For instance: "Gender", "Age", "City".
- Product-level — value is applied to each product (Enhanced Ecommerce only). For instance: "Product", "Product Brand", "Product Category".
Metrics have only two levels of scope: Hit (e.g, Sessions, % New Sessions, Bounce Rate) and Product (e.g. Product Revenue, Unique Purchases, Quantity).
In Google Analytics, you can only combine dimensions and metrics if they share the same scope. This becomes a real problem if you need to combine dimensions and metrics of different scopes in one report. Let’s say you want to see how visitors interacted with the products of a certain brand, e.g. Gucci, and use this data to increase sales. Your task is to measure the number of sessions (the Sessions metric) in which users viewed Gucci products, and the number of users who viewed the product.
The compatibility of dimensions and metrics in a query can be validated using the Dimensions & Metrics Explorer. In our example, the Product Brand dimension is incompatible with the Users and Sessions metrics:
With Google BigQuery, it only takes a couple of seconds to run a simple query and see how users interacted with the brand:
SELECT hits.product.productBrand Brand, COUNT_DISTINCT(sessionId) Sessions, COUNT_DISTINCT(clientId) Users FROM [owox-demo:OWOXBI_Streaming.session_streaming_20160701] WHERE hits.product.productBrand='Gucci' AND hits.eCommerceAction.action_type='click' GROUP BY 1
The resulting report shows that 5 users have viewed Gucci products in 5 sessions:
Use this information to achieve your goals. For instance, you can measure the performance of your retargeting campaigns for Gucci products. Or send «special offer» emails to users who were interested in the products.
3. Data Sampling
Google Analytics applies sampling to process data faster when the amount of data exceeds a certain threshold (500 thousands of sessions at the property level with Google Analytics Standard and 1 million of sessions at the view level with Google Analytics 360). What does this mean? When there’s too much data, Google Analytics takes a sample of all data, analyzes the sample, and says that the results are applicable to all the data. For example, 20% of data is analyzed, the results are multiplied by 5, and shown in the report. This is convenient, quick, but rarely all that accurate. Let’s say for example, you need to count how many seeds are in an orange. The orange was cut into 10 wedges, you got one, and there’s a seed in it. You could probably multiply it by 10 and say that there are 10 seeds in the orange. But how can you be sure about that? What if other wedges have 0, or 2, or even 3 seeds?
Sampling can negatively affect the reliability of your reports. This is especially the case when reports are built for a long period of time or with a large number of dimensions and metrics.
How can you avoid sampling and make more accurate decisions? Pass all the data to Google BigQuery using OWOX BI Pipeline or Google BigQuery Export for Analytics 360 (to learn more about these two methods of data collection, read our blogpost). You’ll be able to query unsampled data, thus drawing more accurate conclusions. Moreover, it only takes Google BigQuery just a few seconds to process data, no matter if it’s daily or years’ worth of data you query.
4. Aggregated values in reports
Dimensions in Google Analytics can have multiple unique values assigned to them. For instance, the Gender dimension has only two potential values: Male or Female, while the Page dimension can have an infinite number of values (i.e. website URLs) assigned. Dimension-value pairs are stored as separate rows.
If the number of rows in in daily processed table exceeds 50 thousands (75 thousands for Google Analytics 360), lower volume dimension-value combinations are aggregated into the (other) entry. This is what it looks like in reports:
The example above shows that a significant percentage of pages (26.13%) visited by users is rolled up into the (other) row. That is, you can’t see the URLs, nor can you see the metrics, for the pages of your website that are visited the least often.
Data aggregation is applied not only to daily reports. Google Analytics has a limit of 1 million rows per report for any date range. Rows in excess of 1 million are also rolled up into the aggregate (other) row.
On the one hand, aggregated data is easy to work with. The basic KPIs are already presented in convenient reports. On the other hand, more complex tasks can only be performed with non-aggregated (raw) data. These tasks include tracking the entire customer journey on your website, creating custom attribution models to evaluate the efficiency of each advertising channel, targeting your ads more accurately, etc.
By collecting data in Google BigQuery, you can prevent aggregation in your reports and make sure you won’t miss any important details when analyzing the data.
5. Examples of reports in BigQuery
We’ve covered four major cases when Google Analytics is not enough. However, there are many more complex reports that can be built using non-aggregated data in Google BigQuery, but can’t be created with Google Analytics alone.
Let’s say you want to see user paths — the sequences of pages users visit on your website — and see which of the path users take the most often. This task can be mastered with a report in which you can select the base URL and see the previous and the next URLs for the chosen page. With Google Analytics, you can only see either entry or exit pages for the chosen URL, but with Google BigQuery, you can see both. Such report will help analyze and improve customer experience for each page. This includes changing the button position, improving the navigation menu, etc.
The URL sequence report can be easily created by querying the data in Google BigQuery:
SELECT IF(a.URL IS NOT NULL,a.URL,b.URL) BaseURL, IF(a.PreviousURL IS NOT NULL,a.PreviousURL,b.NextURL) URL, a.Number PreviousPageQuantity, b.Number NextPageQuantity FROM ( SELECT URL, PreviousURL, COUNT(PreviousURL) Number FROM ( SELECT sessionId, time, URL, number, LAG(URL) OVER (PARTITION BY sessionId ORDER BY time) PreviousURL FROM ( SELECT sessionId, hits.time time, hits.pagePath URL, RANK() OVER (PARTITION BY sessionId ORDER BY hits.time ASC) number FROM [owox-demo:OWOXBI_Streaming.session_streaming_20160711] WHERE hits.type='pageview' ORDER BY 1, 2, 3) ) WHERE URL='/products/id1222894335' GROUP BY URL, PreviousURL) a FULL OUTER JOIN EACH ( SELECT URL, NextURL, COUNT(NextURL) Number FROM ( SELECT sessionId, time, URL, number, LEAD(URL) OVER (PARTITION BY sessionId ORDER BY time) NextURL FROM ( SELECT sessionId, hits.time time, hits.pagePath URL, RANK() OVER (PARTITION BY sessionId ORDER BY hits.time ASC) number FROM [owox-demo:OWOXBI_Streaming.session_streaming_20160711] WHERE hits.type='pageview' ORDER BY 1, 2, 3)) WHERE URL='/products/id1222894335' GROUP BY URL, NextURL) b ON a.PreviousURL=b.NextURL WHERE PreviousURL IS NOT NULL OR NextURL IS NOT NULL ORDER BY 3 DESC, 4 DESC
As a result, instead of disparate data that you would have to combine manually, you will receive a ready-to-use report:
The results of the query can be easily visualized using Google Sheets or Google Data Studio:
The example above shows that users most often enter the product page in review after viewing the ID 1946305601 product page, and then most often proceed to view the ID 1236971198 product page. This information can be used, for example, to adjust the product recommendations on your website.
Google BigQuery makes it possible to create reports of virtually any complexity. For example, cohort analysis reports with all the indicators your business needs to see. A case in point would be the success story of Contentmart , who used cohort analysis with Google BigQuery to improve the LTV and broaden their customer base. To see more ready-made queries you can use for your projects, visit OWOX BI Help Center.
Google BigQuery is a powerful tool, an indispensable helper for marketers and analysts who need more than Google Analytics functionality. It helps overcome such challenges as the limit on the number of dimensions and metrics in reports, data sampling and data aggregation. Moreover, you can use it to create more complex and informative reports. There’s no reason not to give it a try, even more so because there’s a free Trial :)
Also, by tradition, we have a bonus for you: A query you can use to see the conversion paths and the revenue generated by customers who arrive via different channels, on different devices. Enter your email address, and get the query, an example of the resulting report, and the possible interpretations of the data, delivered straight to your inbox.
How much does it cost to use GBQ reports?The cost of using GBQ reports depends on the amount of data you are analyzing and the queries you are running. However, basic usage of GBQ reports is generally included in Google Cloud's free tier, which provides up to 1 terabyte of data analyzed per month free of charge.
Do I need a technical background to use GBQ reports?Although some technical knowledge of SQL and query language can be helpful, there are many resources available for beginners to learn how to utilize GBQ reports effectively, including online tutorials and courses.
What is the difference between standard Google Analytics reports and GBQ reports?Standard Google Analytics reports provide out-of-the-box data insights while GBQ reports provide more detailed insights based on custom queries and analyses, using the Google BigQuery platform.