How to Export Raw Data from Google Analytics
Why do you need to gather raw unsampled data? To get complete data for marketing reporting and correctly estimate advertising efficiency. In this article, we’ll explore how you can gather raw data and where to store it.
Table of contents
- Why you need to gather raw unsampled data
- Four ways to gather raw data
- Where to store collected data
- Short conclusions
Why you need to gather raw unsampled data
Google Analytics is an undisputed leader among web analytics services. It’s free, easy to work with, and it provides insights into the key KPIs of online businesses. However, the system has limitations that prevent you from digging deeper into the data and exploring it from all sides.
- Aggregation: the data you see in Google Analytics reports is always aggregated, and this process is beyond your control.
- Sampling, which can seriously distort your data and lead to wrong business decisions.
- Reports can contain only a limited number of parameters and metrics, and only in specific combinations.
- A limit on the number of rows in a report.
- Data processing time: if you use the free version of Google Analytics, you may need to wait 24 to 48 hours for the system to finish processing data.
Fortunately, most of these problems can be solved with raw data. Pretty cool, huh? Let’s figure out how to get raw data.
Four ways to gather raw data
OWOX BI Pipeline
Set up automatic raw data collection through OWOX BI Pipeline: all hits sent from your website to Google Analytics are also sent to Google BigQuery in parallel. As a result, each hit is available in GBQ within a few minutes.
Tables with session data are built according to the OWOX BI algorithm, which is described in detail in our help center. At the same time, OWOX BI uses a data structure compatible with the GA structure, for which many example SQL queries have already been written. This saves your team time when preparing reports.
Collecting raw hit data with OWOX BI gives you these benefits:
- User behavior data is transferred to Google BigQuery in real-time and without restrictions on the number of hits.
- In Google Analytics, the number of custom parameters is limited: 20 in the standard version and 200 in the paid version. In GBQ, you can collect as many custom parameters as you like and build deeper reports for detailed analysis.
- You can build reports in GBQ without sampling and restrictions on the number and compatibility of parameters and dimensions, for any period.
- Unlike Google Analytics, in BigQuery you can collect and use personal customer data, including email addresses and phone numbers.
- OWOX BI calculates the value of each session. Thanks to this, you can calculate ROI/ROAS for new and returning users and evaluate the effectiveness of advertising for different regions, product groups, landing pages, mobile versions, and applications.
- The service lets you retrospectively update data on costs, users, and transactions already uploaded to GBQ. You can take into account paid orders and returns after purchase, or, for example, find out what a new subscriber was doing on your website 30 days before registration.
- Every day OWOX BI compares data in your BigQuery with the information from GA and reports significant discrepancies. You don’t lose any important data that third-party trackers cannot provide.
How to set up raw data collection from a website in Google BigQuery using OWOX BI:
- Use your Google account to sign in.
- Choose what data you want to collect in GBQ, grant access, and create a data pipeline.
- Copy the tracking code and add it to your website in whatever way is convenient for you.
You can download detailed instructions on how to set up uploading raw data from your website to BigQuery.
Use Google Analytics APIs
Google Analytics lets you retrieve information using APIs. In particular, the Core Reporting API allows you to access dimensions and metrics for your chosen reporting view from outside of the GA interface.
Moreover, there’s a workaround that will enable you to access data for each hit and facilitate data integration. This can be achieved using API functionality together with custom dimensions in Google Analytics.
Custom dimensions can be used to capture, analyze, and visualize information that isn’t presented in Google Analytics by default. You can use custom dimensions as keys for combining information from GA and other systems, as well as to enhance your reports with information that’s relevant to your business. For example, you can save the User ID from your database and use it for integrating offline and online actions.
Examples of custom dimensions:
- Hit timestamp: a hit-scoped custom dimension that captures the exact timestamp when the hit happened, in the yyyy-mm-ddThh:mm:ss format with the timezone offset.
- Session ID: a session-scoped custom dimension that collects a unique, random value, used to identify hits that belong to the same session.
- Client ID: a session-scoped custom dimension that collects the unique value assigned to the client’s device from the _ga cookie.
- User ID: a hit-scoped custom dimension that collects the value representing a user who has logged in to your website, allowing you to identify all the sessions and hits of this particular user.
Credit for this list of examples goes to Simo Ahava for his great post on improving data collection with custom dimensions and Google Tag Manager.
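To make the value formats concrete, here is a minimal sketch of how three of these values could be produced. In practice they are usually set in the browser (for example, through Google Tag Manager), so the Python below is only an illustration. The function names are hypothetical, and the _ga cookie layout (`GA1.2.<random>.<timestamp>`) is an assumption about the common format rather than something stated in this article.

```python
import uuid
from datetime import datetime, timedelta, timezone


def hit_timestamp(now: datetime) -> str:
    """Format a timezone-aware datetime as yyyy-mm-ddThh:mm:ss plus the UTC offset."""
    return now.isoformat(timespec="seconds")


def new_session_id() -> str:
    """Return a random, unique value to attach to every hit of one session."""
    return uuid.uuid4().hex


def client_id_from_ga_cookie(cookie_value: str) -> str:
    """Take the last two dot-separated parts of the _ga cookie as the client ID.

    Assumes a value shaped like 'GA1.2.123456789.1600000000'.
    """
    parts = cookie_value.split(".")
    return ".".join(parts[-2:])


# Example values (the cookie and the date are made up for illustration).
example_tz = timezone(timedelta(hours=2))
example_time = datetime(2020, 5, 1, 12, 30, 45, tzinfo=example_tz)
print(hit_timestamp(example_time))
print(new_session_id())
print(client_id_from_ga_cookie("GA1.2.123456789.1600000000"))
```

Whatever generates the values, the point is that each hit carries keys that let you rebuild sessions and join GA data with other systems later.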
Why is an API not a perfect solution?
Will the Google Analytics API solve the problem of sampling? It depends on how much traffic your website gets. If traffic isn’t too high and you choose a short reporting period, sampling can be avoided. For a busy website or a long reporting period, however, you may have to split the period into short intervals and run hundreds of queries to get unsampled data.
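The idea of splitting a long period into short queries can be sketched as follows. This is only an illustration: it builds one set of Core Reporting API (v3) query parameters per day and does not send any requests. The view ID `12345678` and the use of `ga:dimension1` are placeholders, and the helper names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Iterator, List, Tuple


def daily_windows(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield one (start, end) pair per day in the inclusive range."""
    day = start
    while day <= end:
        yield day, day
        day += timedelta(days=1)


def build_query_params(view_id: str, start: date, end: date) -> Dict[str, str]:
    """Build query parameters for a single short reporting period."""
    return {
        "ids": f"ga:{view_id}",                # placeholder view ID
        "start-date": start.isoformat(),
        "end-date": end.isoformat(),
        "metrics": "ga:sessions",
        "dimensions": "ga:dimension1",         # assumed index of a custom dimension
        "samplingLevel": "HIGHER_PRECISION",
    }


# One query per day for January 2020 instead of a single month-long query.
queries: List[Dict[str, str]] = [
    build_query_params("12345678", s, e)
    for s, e in daily_windows(date(2020, 1, 1), date(2020, 1, 31))
]
print(len(queries), queries[0]["start-date"], queries[-1]["end-date"])
```

Even with short periods, it is worth checking each response for a sampling flag (in v3 the response includes a `containsSampledData` field) before trusting the numbers.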
Since the information will be exported from Google Analytics, all GA data processing conditions, including the compatibility of dimensions and metrics and the data processing time, will also apply. But for startups and small projects, the API might work as a temporary solution. Simo Ahava has a great post on his blog where he explores what can be considered flaws of the Google Analytics data schema.
There are also limits and quotas specific to the Google Analytics API, such as the number of dimensions and metrics in a query and the amount of data you can extract per day.
Moreover, you’ll need space to store the exported information. This brings us to a more sophisticated approach that will surely prove advantageous for your business.
BigQuery export for Google Analytics 360
Google Analytics 360 is not a cheap tool, but you get what you pay for and more. In addition to benefiting from the advanced features of the paid platform, Google Analytics 360 users can export raw hit- and session-level data from Google Analytics to Google BigQuery via native integration.
There are two export options you can choose between:
- Export data several times a day. With this option, each day you get one file exported with the previous day’s Google Analytics data and three other files exported with data for the current day. Data from linked Google services is also available.
- Export data continuously. This option allows you to get fresher data with exports to Google BigQuery every 10 to 15 minutes. In this case, Google BigQuery bills an additional $0.05 for each gigabyte of data processed. Note that data from services linked to Google Analytics — such as DoubleClick for Publishers, AdSense, and AdX — can only be exported for the previous day, on a daily basis.
When you initially link a Google Analytics view to Google BigQuery, Google Analytics 360 will automatically export 10 billion hits or 13 months’ worth of historical data into BigQuery. Isn’t that a great deal for those who have been struggling with sampling all this time?
And here’s the cherry on top: Google Analytics 360 users receive $500 per month in credit to cover the cost of importing, storing, and processing data in Google BigQuery.
Build your own connector
You can alternatively clone the hits you’re sending to Google Analytics and process that cloned information somewhere outside of GA. For this, you could try storing hits on your own servers or using a cloud-based solution. Hit-scoped data alone won’t give you source, medium, or campaign data, nor will it give you ad cost or location information. However, this approach will allow you to get raw hits as soon as they’re sent from your website and use them for purposes that don’t require session-level data — such as sending out timely transactional emails and identifying issues with website performance.
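As a rough sketch of what processing a cloned hit might look like, the snippet below parses a hit payload in the Measurement Protocol query-string format into a flat row that could be written to your own storage. The row layout, the `parse_hit` name, and the sample payload are assumptions made for illustration; a real connector would also need a collection endpoint, validation, and storage.

```python
from urllib.parse import parse_qs


def parse_hit(payload: str, received_at: str) -> dict:
    """Turn a URL-encoded hit payload into a flat row.

    Uses common Measurement Protocol parameters:
    cid (client ID), t (hit type), dp (document path).
    """
    params = {key: values[0] for key, values in parse_qs(payload).items()}
    return {
        "received_at": received_at,      # timestamp assigned by your own server
        "client_id": params.get("cid"),
        "hit_type": params.get("t"),
        "page_path": params.get("dp"),
        "raw": params,                   # keep every parameter for later use
    }


# A made-up pageview hit, as it might be sent from a website.
row = parse_hit(
    "v=1&t=pageview&tid=UA-XXXXX-Y&cid=555.777&dp=%2Fhome",
    "2020-05-01T12:30:45+00:00",
)
print(row["hit_type"], row["page_path"])
```

Keeping the full parameter set in `raw` is a deliberate choice here: hit-level data is only as useful as the fields you preserve.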
If you don’t want to spend time and money on designing your own connector, a team of OWOX BI analysts can help you. Sign up for a demo to request a meeting and find out the details about how OWOX can meet your business needs.
Where to store collected data
Whether you’re a small startup or a large enterprise, there are a number of factors to consider when choosing a data storage system. Whichever option you choose, here’s a quick outline of what you should look for:
- Data processing capabilities. Collecting raw data is good and all, but if you’re not able to process it and extract the information you need, this data will be of no use to you.
- The ability to scale flexibly based on your business requirements. As your business grows, you’ll want your warehouse to adapt accordingly.
- High-security standards. You have to be confident that your precious data is protected and fully under your control.
- Reasonable cost.
Luckily, there’s no need to reinvent the wheel: good services already exist, such as Google BigQuery, a Google Cloud Platform-based data warehouse designed for data analytics.
Why Google BigQuery?
Google BigQuery lets you store and process billions of rows (gigabytes to petabytes of data!) using SQL-like syntax. Incredible processing speed? Check. Scalability? Check. Unparalleled data security? Check. The service provides everything you need for advanced analysis of huge amounts of data.
Google BigQuery is a paid service, but you only pay for the amount of data stored and processed. The first 10 gigabytes stored and 1 terabyte processed per month are free. After that, Google charges $0.02 for each gigabyte stored and $5 for each terabyte processed. According to the terms of service at the time of writing, new BigQuery users also get a $300 credit to spend over 12 months.
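The pricing above can be turned into a quick back-of-the-envelope estimate. The sketch below uses only the figures quoted in this article ($0.02 per gigabyte stored, $5 per terabyte processed, 10 GB and 1 TB free per month); actual prices may differ, and the function name and example volumes are hypothetical.

```python
FREE_STORAGE_GB = 10           # free storage per month, as quoted above
FREE_PROCESSING_TB = 1         # free processing per month, as quoted above
STORAGE_PRICE_PER_GB = 0.02    # USD
PROCESSING_PRICE_PER_TB = 5.0  # USD


def monthly_cost(stored_gb: float, processed_tb: float) -> float:
    """Estimate the monthly bill in USD after subtracting the free tier."""
    storage = max(stored_gb - FREE_STORAGE_GB, 0) * STORAGE_PRICE_PER_GB
    processing = max(processed_tb - FREE_PROCESSING_TB, 0) * PROCESSING_PRICE_PER_TB
    return round(storage + processing, 2)


# Example: 110 GB stored and 3 TB processed in a month.
# Billable: 100 GB * $0.02 + 2 TB * $5 = $2 + $10.
print(monthly_cost(110, 3))
```

For small projects that stay under the free limits, the estimate comes out to zero.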
Why isn’t Google BigQuery a perfect solution?
If you’re completely new to GBQ, you might have to wrap your head around how information is organized in this service.
First of all, keep in mind that GBQ supports nested and repeated fields. Since Google Analytics data is organized into a hierarchical structure of hits, sessions, and users, you might need to learn how to query the data to access values from these nested and repeated fields.
Check out these convenient references for JOIN and FLATTEN clauses: you might need to use them a lot. Another thing to keep in mind when using BigQuery is that some metrics available in the Google Analytics interface won’t be calculated automatically, such as total users and total events.
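To show what flattening nested and repeated fields means, here is a small Python sketch that mimics the operation on in-memory records rather than in SQL. The field names (`fullVisitorId`, `visitId`, `hits`, `type`, `page.pagePath`) follow the GA export layout as we understand it, and the sample sessions are invented. The sketch also recomputes two metrics that aren’t calculated for you: total users and total events.

```python
from typing import Dict, Iterator, List

# Invented sample data: each session row holds a repeated "hits" field.
sessions: List[Dict] = [
    {"fullVisitorId": "A", "visitId": 1, "hits": [
        {"type": "PAGE", "page": {"pagePath": "/"}},
        {"type": "EVENT", "page": {"pagePath": "/"}},
    ]},
    {"fullVisitorId": "A", "visitId": 2, "hits": [
        {"type": "PAGE", "page": {"pagePath": "/pricing"}},
    ]},
    {"fullVisitorId": "B", "visitId": 3, "hits": [
        {"type": "PAGE", "page": {"pagePath": "/"}},
        {"type": "EVENT", "page": {"pagePath": "/cart"}},
    ]},
]


def flatten_hits(rows: List[Dict]) -> Iterator[Dict]:
    """Emit one flat row per hit, repeating the session-level fields."""
    for session in rows:
        for hit in session["hits"]:
            yield {
                "fullVisitorId": session["fullVisitorId"],
                "visitId": session["visitId"],
                "type": hit["type"],
                "pagePath": hit["page"]["pagePath"],
            }


flat_rows = list(flatten_hits(sessions))
total_users = len({s["fullVisitorId"] for s in sessions})      # distinct visitors
total_events = sum(1 for r in flat_rows if r["type"] == "EVENT")
print(len(flat_rows), total_users, total_events)
```

In BigQuery itself, the same result comes from flattening the repeated field in the query (FLATTEN in legacy SQL, UNNEST in standard SQL) and counting distinct visitor IDs.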
Short conclusions
Exporting raw data from Google Analytics is easier than it might seem at first glance. Whether you decide to invest in an out-of-the-box solution or create your own is up to you. Just don’t let this great asset go unused.
Leverage the data you collect. Look for new insights. Integrate. Experiment. Know the pulse of your website, and connect with customers when they need you most. And remember that you can always ask questions in the comments section below. We’ll gladly respond!
And if you want to learn more about Google Analytics and other analytics tools, subscribe to our newsletter. Every month you’ll get useful tips for modern marketers and analysts.