How to Export Raw Data from Google Analytics
In this article, we’ll explore why we need raw data, how we can gather it, and where to store it.
- Why we need to gather raw data
- Four ways to gather raw data
- Where to store collected data
- Summing up
Why we need to gather raw data
Raw data is, by definition, not processed in any way, which means you can analyze it as you please without having to wait for an analytics service to finish processing your hits and organizing them into reports.
Google Analytics (GA) is the undisputed leader among web analytics services. It’s powerful, it’s free, it’s easy to work with, and it gives you quick insights into key performance indicators for your online initiatives. However, Google Analytics has a number of limitations that make it difficult — sometimes even impossible — to delve deep into data and explore it from all possible angles.
- The data you see in Google Analytics reports is always processed, and much of that processing is beyond your control.
- Sampling, the nemesis of statistical accuracy, may seriously harm your decisions if you let it go unchecked.
- Reports can only be created with a limited number of dimensions and metrics, and only in certain combinations. There are also limits on the number of rows in your reports.
- Last but not least, if you’re using a standard free Google Analytics account, you have to wait as long as 24 to 48 hours for GA to finish processing your data.
Fortunately, most of these complications can be managed by collecting raw GA data. Sounds exciting, doesn’t it? Let’s find out how to get that data!
Four ways to gather raw data
OWOX BI Pipeline
Set up raw data export with OWOX BI Pipeline and all the hits you send to Google Analytics will be sent simultaneously to Google BigQuery, directly from your website, click by click, action by action. With this setup, hits become available in Google BigQuery within a matter of minutes.
Session data tables are computed using OWOX BI’s own sessionization algorithm; this process is described in detail in our Help Center. In this way, you can always get raw Google Analytics data, no matter how much traffic your website receives.
OWOX BI starts at $115 per month, but you can experience its benefits for free with a 14-day trial.
Collecting raw, unprocessed Google Analytics hits in Google BigQuery while also sending them to Google Analytics gives you even more benefits:
- While the number of custom dimensions is limited to 20 in the standard version of Google Analytics and 200 in the paid version, you can collect as many custom dimensions as you need in Google BigQuery by setting them up through Google Tag Manager.
- Reports can be created with any number and any combination of dimensions and metrics, for any period of time.
- User behavior data is sent to Google BigQuery in real time, without any hit limits.
- You can legally collect and process personal user data like email addresses and phone numbers.
Use Google Analytics APIs
Google Analytics lets you retrieve data using APIs. In particular, the Core Reporting API allows you to access dimensions and metrics for your chosen reporting view from outside of the GA interface.
Moreover, there’s a workaround that will enable you to access data for each hit and facilitate data integration. This can be achieved using API functionality together with custom dimensions in Google Analytics.
Custom dimensions can be used to capture, analyze, and visualize information that isn’t presented in Google Analytics by default. You can use custom dimensions as keys for combining information from Google Analytics and other systems, as well as to enhance your reports with information that’s relevant to your business. For example, you can save the User ID from your database and use it for integrating offline and online actions.
Examples of custom dimensions:
- Hit timestamp — a hit-scoped custom dimension that captures the exact timestamp when the hit happened, in the yyyy-mm-ddThh:mm:ss format with the timezone offset.
- Session ID — a session-scoped custom dimension that collects a unique, random value, used to identify hits that belong to the same session.
- Client ID — a session-scoped custom dimension that collects the unique value assigned to the client’s device from the _ga cookie.
- User ID — a hit-scoped custom dimension that collects the value representing a user who has logged in to your website, allowing you to identify all the sessions and hits of this particular user.
Credit for this list of examples goes to Simo Ahava and his great post on improving data collection with custom dimensions and Google Tag Manager.
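To make the workaround concrete, here's a minimal sketch of a request body for the Analytics Reporting API v4 `batchGet` method that pulls sessions broken down by custom dimensions. The view ID and the custom dimension indices (`ga:dimension1`, `ga:dimension2`) are placeholders — the indices depend on how the dimensions are configured in your GA property.

```python
def build_report_request(view_id, start_date, end_date, metrics, dimensions):
    """Build a request body for the Analytics Reporting API v4 batchGet method."""
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "metrics": [{"expression": m} for m in metrics],
                "dimensions": [{"name": d} for d in dimensions],
                # A larger page size means fewer paginated calls, which helps
                # you stay within the API's daily quota.
                "pageSize": 10000,
            }
        ]
    }

# Sessions broken down by two hypothetical custom dimensions
# (ga:dimension1 = hit timestamp, ga:dimension2 = session ID):
body = build_report_request(
    view_id="123456789",          # placeholder view (profile) ID
    start_date="2019-01-01",
    end_date="2019-01-07",
    metrics=["ga:sessions"],
    dimensions=["ga:dimension1", "ga:dimension2"],
)
# With google-api-python-client and valid credentials, you would execute it as:
# analytics.reports().batchGet(body=body).execute()
```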
Why is an API not a perfect solution?
Will the Google Analytics API solve the problem of sampling? It depends on how much traffic your website gets. If traffic isn’t too high and you choose a short reporting period, sampling can be avoided. Otherwise, you’ll have to run hundreds of queries to piece together unsampled data.
Since the data will be exported from Google Analytics, all GA data processing conditions, including the compatibility of dimensions and metrics and the data processing time, will also apply. But for startups and small projects, the API might work as a temporary solution. Simo Ahava has a great post on his blog where he explores what can be considered flaws of the Google Analytics data schema.
There are also limits and quotas specific to the Google Analytics API, such as the number of dimensions and metrics in a query and the amount of data you can extract per day.
Moreover, you’ll need space to store the exported data. Which brings us to a more sophisticated approach that will surely prove advantageous for your business.
BigQuery export for Google Analytics 360
Google Analytics 360 is not a cheap tool, but you get what you pay for and more. In addition to benefiting from the advanced features of the paid platform, Google Analytics 360 users can export raw hit- and session-level data from Google Analytics to Google BigQuery via a native integration.
There are two export options you can choose between:
- Daily export. With this option, each day you get one file exported with the previous day’s Google Analytics data and three intraday exports with data for the current day. Data from linked Google services is also available.
- Streaming export. This option allows you to get fresher data, with exports to Google BigQuery every 10 to 15 minutes. In this case, Google BigQuery bills an additional $0.05 for each gigabyte of data processed. Note that data from services linked to Google Analytics — such as DoubleClick for Publishers, AdSense, and AdX — can only be exported for the previous day, on a daily basis.
When you initially link a Google Analytics view to Google BigQuery, Google Analytics 360 will automatically export historical data into BigQuery — up to 10 billion hits or 13 months’ worth, whichever is smaller. Isn’t that a great deal for those who have been struggling with sampling all this time?
And here’s the cherry on top: Google Analytics 360 users receive $500 per month in credit to cover the cost of importing, storing, and processing data in Google BigQuery.
Build your own connector
You can alternatively clone the hits you’re sending to Google Analytics and process that cloned data somewhere outside of GA. For this, you could try storing hits on your own servers or using a cloud-based solution. Hit-scoped data alone won’t give you source, medium, or campaign data, nor will it give you ad cost or location information. However, this approach will allow you to get raw hits as soon as they’re sent from your website and use them for purposes that don’t require session-level data — such as sending out timely transactional emails and identifying issues with website performance.
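If you clone hits, your collector will receive the same URL-encoded payloads your site sends to GA. Here's a minimal sketch of the parsing step, assuming Measurement Protocol-style parameters (`v`, `tid`, `cid`, `t`, `dp`); the tracking ID and client ID values are placeholders.

```python
from urllib.parse import parse_qs

def parse_hit(payload):
    """Parse a cloned Measurement Protocol payload (a URL-encoded query
    string) into a flat dict, keeping only the first value of each field."""
    return {key: values[0] for key, values in parse_qs(payload).items()}

# A pageview hit as it might arrive at your collector endpoint
# (tid/cid values are placeholders):
hit = parse_hit("v=1&tid=UA-12345-1&cid=555.666&t=pageview&dp=%2Fpricing")
```

From here, each parsed hit can be appended to a log table or streamed to a queue for the real-time use cases mentioned above, such as transactional emails.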
Where to store collected data
Whether you’re a small startup or a large enterprise, there are a number of factors to consider when choosing a data storage system. Whichever option you choose, here’s a quick outline of what you should look for:
- Data processing capabilities. Collecting raw data is all well and good, but if you’re not able to process it and extract the information you need, this data will be of no use to you.
- The ability to scale flexibly based on your business requirements. As your business grows, you’ll want your warehouse to adapt accordingly.
- High security standards. You have to be confident that your precious data is protected and fully under your control.
- Reasonable cost.
Luckily for all of us, there’s no need to reinvent the wheel, especially since good services are already out there. I’m talking about Google BigQuery, a Google Cloud Platform-based data warehouse designed for data analytics.
Why Google BigQuery?
Google BigQuery allows storing and processing billions of rows (that’s gigabytes and petabytes of data!) using SQL-like syntax. Incredible processing speed? Check. Scalability? Check. Unparalleled data security? Check. The service provides everything you need for advanced analysis of huge amounts of data.
Google BigQuery is a paid service, but you only pay for the amount of data stored and processed. The first 10 gigabytes stored and 1 terabyte processed per month are free. After that, Google charges $0.02 per gigabyte per month for storage and $5 for each terabyte processed. According to the terms of service at the time of writing, new BigQuery users also get a $300 credit to spend over 12 months.
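A quick back-of-the-envelope calculator makes the pricing tangible — this sketch simply applies the free tiers and the per-unit rates quoted above (on-demand pricing, at the time of writing):

```python
def bigquery_monthly_cost(storage_gb, processed_tb):
    """Estimate the monthly BigQuery bill under on-demand pricing:
    the first 10 GB of storage and 1 TB of processing are free, then
    $0.02 per GB stored and $5 per TB processed."""
    storage_cost = max(storage_gb - 10, 0) * 0.02
    processing_cost = max(processed_tb - 1, 0) * 5.0
    return storage_cost + processing_cost

# A small project storing 110 GB and processing 3 TB per month:
# 100 GB * $0.02 + 2 TB * $5 = $12.00
cost = bigquery_monthly_cost(110, 3)
```

As the example shows, a modest analytics workload fits comfortably within a small budget, and projects inside the free tier pay nothing at all.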
Why isn’t Google BigQuery a perfect solution?
If you’re completely new to Google BigQuery, you might have to wrap your head around how data is organized in this service. First of all, keep in mind that Google BigQuery supports nested and repeated fields. Since Google Analytics data is organized into a hierarchical structure of hits, sessions, and users, you might need to learn how to query the data to access values from these nested and repeated fields. Check out these convenient references for JOIN and FLATTEN clauses: you might need to use them a lot. Another thing to keep in mind when using BigQuery is that some metrics available in the Google Analytics interface won’t be calculated automatically, such as total users and total events.
Exporting raw data from Google Analytics is easier than it might seem at first glance. Whether you decide to invest in an out-of-the-box solution or create your own is up to you. Just don’t let this great asset go unused.
Science is built up with facts, as a house is with stones. But a collection of facts is no more a science than a heap of stones is a house. — Henri Poincaré
Leverage the data you collect. Look for new insights. Integrate. Experiment. Know the pulse of your website, and connect with customers when they need you most. And remember that you can always ask questions in the comments section below. We’ll gladly respond!