How to Export Raw Data from Google Analytics
Google Analytics is the undisputed leader among web analytics services. It’s powerful, it’s free, it’s easy to work with, and it gives you quick insights into the key performance indicators for your online initiatives. What could be better? However, there are a number of limitations to the platform, which make it more difficult, and sometimes even impossible, to delve deep into the data and explore it from all angles.
Here’s the problem: the data you see in Google Analytics reports is always processed, and much of the processing is beyond your control. Sampling, the nemesis of statistical accuracy, may seriously harm your decisions if you leave it unattended. Reports can only be created with a limited number of compatible combinations of dimensions and metrics. There are limits to the number of rows in your reports. Last but not least, if you’re using the standard free Google Analytics account, you have to wait 24 to 48 hours for Google Analytics to finish processing the data.
Fortunately, most of these complications can be managed by collecting raw GA data. A raw data set is, by definition, not processed in any way, which means you can analyze it as you please, without having to wait for the analytics service to finish processing your hits and organizing them into reports. Sounds exciting, doesn’t it? So, let’s find out how to get that data!
Use Google Analytics APIs
Google Analytics provides a way to retrieve the data programmatically, using one or more APIs. In particular, the Core Reporting API allows you to access dimensions and metrics for your chosen reporting view from outside the interface.
Will you really get raw data in this way? In a word, no. Any data you’re exporting from Google Analytics, via any of its reporting APIs, has been processed, aggregated, and sessionized. Simo Ahava has a great post in his blog, where he explores what can be considered as flaws of the Google Analytics data schema.
However, the API is mentioned here for a reason. For starters, the data won’t be literally raw, that’s true. But by extracting smaller chunks of data for your chosen combinations of dimensions and metrics, you may achieve higher precision and a deeper level of detail in reports than the Google Analytics interface would allow. It’s a practice definitely worth keeping in your analytics arsenal.
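To make the idea of smaller chunks concrete, here is a minimal Python sketch that splits a date range into one query per day. It only builds the query parameters, using the parameter names of the Core Reporting API v3 (`ids`, `start-date`, `end-date`, `metrics`, `dimensions`, `samplingLevel`, `max-results`). The view ID, the dates, and the `daily_queries` helper are assumptions made up for illustration, and authenticating and sending the requests is left out.

```python
from datetime import date, timedelta


def daily_queries(view_id, start, end, metrics, dimensions):
    """Build one set of Core Reporting API (v3) parameters per day in [start, end]."""
    queries = []
    day = start
    while day <= end:
        queries.append({
            "ids": f"ga:{view_id}",
            "start-date": day.isoformat(),
            "end-date": day.isoformat(),
            "metrics": ",".join(metrics),
            "dimensions": ",".join(dimensions),
            "samplingLevel": "HIGHER_PRECISION",
            "max-results": 10000,
        })
        day += timedelta(days=1)
    return queries


queries = daily_queries(
    view_id="12345678",  # hypothetical view ID
    start=date(2017, 1, 1),
    end=date(2017, 1, 3),
    metrics=["ga:sessions", "ga:pageviews"],
    dimensions=["ga:sourceMedium", "ga:landingPagePath"],
)
for query in queries:
    print(query["start-date"], query["metrics"], query["dimensions"])
```

A shorter date range means fewer sessions per query, which makes it less likely that a query gets sampled. The trade-off is that more requests count against your API quota.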
Moreover, there’s a workaround that will enable you to access the data for each hit, and also facilitate data integration. This can be achieved by utilizing the API functionality together with custom dimensions in Google Analytics.
Custom dimensions can be used to capture, analyze, and present information that Google Analytics does not provide by default. You can use them as keys for combining the information from Google Analytics and other systems, as well as to enhance your reports with the information that is specifically relevant to your business. For example, the following custom dimensions let you identify each activity of each user entering your website:
- Hit timestamp: a hit-scoped custom dimension that captures the exact timestamp when the hit happened, in the yyyy-mm-ddThh:mm:ss format with the timezone offset.
- Session ID: a session-scoped custom dimension that collects a unique, random value, used to identify hits that belong to the same session.
- Client ID: a session-scoped custom dimension that collects the unique value assigned to the client’s device from the _ga cookie.
- User ID: a hit-scoped custom dimension that collects the value representing a user who has logged in to your website, allowing you to identify all the sessions and hits of this particular user.
Again, credit goes to Simo Ahava for his great post on improving data collection with custom dimensions and Google Tag Manager.
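Once these custom dimensions come back in your API export, each row can be tied to a specific hit. The Python sketch below shows one possible way to rebuild sessions from such rows: group by Session ID, then sort by hit timestamp. The row layout and the key names (`client_id`, `session_id`, `timestamp`, `page`) are assumptions for illustration, not what the API returns out of the box.

```python
from collections import defaultdict
from datetime import datetime


def rebuild_sessions(rows):
    """Group exported rows by session ID and order each group by hit timestamp."""
    sessions = defaultdict(list)
    for row in rows:
        sessions[row["session_id"]].append(row)
    for hits in sessions.values():
        # Timestamps use the yyyy-mm-ddThh:mm:ss format with a timezone offset.
        hits.sort(key=lambda r: datetime.fromisoformat(r["timestamp"]))
    return dict(sessions)


# Hypothetical rows, as they might look after exporting the custom dimensions.
rows = [
    {"client_id": "111.222", "session_id": "s1",
     "timestamp": "2017-03-01T10:05:00+02:00", "page": "/cart"},
    {"client_id": "111.222", "session_id": "s1",
     "timestamp": "2017-03-01T10:01:00+02:00", "page": "/"},
    {"client_id": "333.444", "session_id": "s2",
     "timestamp": "2017-03-01T10:03:00+02:00", "page": "/pricing"},
]

sessions = rebuild_sessions(rows)
for session_id, hits in sessions.items():
    print(session_id, [hit["page"] for hit in hits])
```

The same keys can serve as join keys when you combine Google Analytics data with records from other systems, for example matching User ID against your CRM.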
The catch is sampling. Eventually, you’ll have to deal with it, or end up slicing the data into frustratingly small pieces. Since the data will be exported from inside Google Analytics, all the data processing conditions, including the compatibility of dimensions and metrics and the data processing time, also apply. Not to mention the limits and quotas specific to the API, such as the number of dimensions and metrics in a query, or the amount of data you can extract per day. Take a look at another post in our blog for more information about the limitations of the Google Analytics APIs. Moreover, you’ll need space to store the exported data. This brings us to a more sophisticated approach, one that is likely to prove advantageous for your business.
Build your own connector
Clone the hits you’re sending to Google Analytics, and deploy those clone troops somewhere outside Google Analytics. For this, you could try storing hits on your own servers, or resort to a cloud-based solution. Hit-scoped data alone won’t give you information about the source of each hit (no source, medium, or campaign data), nor will it give you the ad cost or geo information. However, the approach will allow you to get raw hits as soon as they’re sent from your website, and use them for purposes that don’t require session-level data, such as sending out timely transactional emails or identifying issues in website performance.
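As a rough illustration of what the receiving side of such a connector might do, the Python sketch below parses a cloned hit and stamps it with the time it was received. It assumes the hit arrives as a Measurement Protocol style query string (`v`, `tid`, `cid`, `t`, `dp`). The `clone_hit` helper, the `received_at` field, and the sample payload are made up for this example, and the storage step is reduced to printing one JSON line.

```python
import json
from datetime import datetime, timezone
from urllib.parse import parse_qsl


def clone_hit(payload, received_at=None):
    """Parse a raw hit payload into a dict and record when it was received."""
    hit = dict(parse_qsl(payload, keep_blank_values=True))
    hit["received_at"] = (received_at or datetime.now(timezone.utc)).isoformat()
    return hit


# Hypothetical pageview hit, as it might be sent from the website.
payload = "v=1&tid=UA-XXXXX-Y&cid=555.777&t=pageview&dp=%2Fcheckout"
hit = clone_hit(
    payload,
    received_at=datetime(2017, 3, 1, 12, 0, tzinfo=timezone.utc),
)

# One JSON object per line is easy to append to a file or load into a warehouse.
print(json.dumps(hit))
```

Note that the parsed hit carries no source, medium, or campaign fields of its own, which is exactly the limitation described above.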
Where to collect the data
Whether you’re a small start-up or a large-scale enterprise, there are a number of factors to consider when choosing a data storage system. Whichever option you choose, here’s a quick outline of what you should look for:
- Data processing capabilities. Collecting raw data is good and all that, but if you’re not able to process it and extract the information you need, this data will be of no use to you.
- The ability to scale flexibly based on your business requirements. As your business grows, you’ll want your warehouse to adapt correspondingly.
- High security standards. You have to be confident that your precious data is protected and under your full control.
- Reasonable cost.
Luckily for all of us, there’s no need to reinvent the wheel, especially since good solutions are already out there. I’m talking about Google BigQuery, a Google Cloud Platform-based data warehouse designed for data analytics purposes.
Why Google BigQuery
Google BigQuery allows storing and processing billions of rows (that’s gigabytes and petabytes of data!) using SQL-like syntax. Incredible processing speed? Check. Scalability? Check. Unparalleled data security? Check. The service provides everything you need for advanced analysis of huge amounts of data.
Google BigQuery is a paid service, but you only pay for the amount of data stored and processed. The service charges $0.02 for each 1GB stored per month and $5 for each 1TB processed. The first 10GB stored and 1TB processed each month are free of charge. According to the terms of service at the time this article was written, new BigQuery users also get a $300 credit to spend over 12 months as a free trial.
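To get a feel for these numbers, here is a small Python sketch that estimates a monthly bill from the rates and free tiers quoted above. The `monthly_cost` helper and the sample volumes are assumptions for illustration. Actual pricing may have changed since this article was written, so treat the result as a rough estimate only.

```python
STORAGE_PRICE_PER_GB = 0.02   # USD per GB stored per month, as quoted above
QUERY_PRICE_PER_TB = 5.0      # USD per TB processed, as quoted above
FREE_STORAGE_GB = 10          # free storage tier per month
FREE_QUERY_TB = 1             # free processing tier per month


def monthly_cost(stored_gb, processed_tb):
    """Estimate the monthly bill in USD after subtracting the free tiers."""
    billable_gb = max(stored_gb - FREE_STORAGE_GB, 0)
    billable_tb = max(processed_tb - FREE_QUERY_TB, 0)
    return round(
        billable_gb * STORAGE_PRICE_PER_GB + billable_tb * QUERY_PRICE_PER_TB, 2
    )


# Hypothetical month: 110 GB stored and 3 TB processed.
print(monthly_cost(stored_gb=110, processed_tb=3))
```

In this hypothetical month, 100GB of billable storage costs $2 and 2TB of billable processing costs $10, so query volume rather than storage dominates the bill.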
If you are completely new to Google BigQuery, you might have to wrap your head around how the data is organized in the service. First of all, keep in mind that Google BigQuery supports nested and repeated fields. Since Google Analytics data is organized into a hierarchical structure of hits, sessions, and users, you will need to learn how to query values inside these nested or repeated fields. See the reference for the JOIN and FLATTEN clauses, as you might need to use them a lot. Another thing to keep in mind when using GBQ is that some metrics available in the Google Analytics interface, such as Total users or Total events, won’t be calculated automatically: you have to compute them in your own queries.
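If nested and repeated fields are new to you, the following Python sketch may help build intuition for what flattening does. It is not BigQuery code: it only mimics, on plain dictionaries, how one session record with a repeated `hits` field turns into one row per hit, and how a metric like Total users then has to be computed by hand. The field names (`fullVisitorId`, `visitId`, `hits`, `hitNumber`, `page.pagePath`) are modeled on the Google Analytics export schema, and the sample records are invented.

```python
def flatten_hits(sessions):
    """Turn nested session records into one flat row per hit."""
    rows = []
    for session in sessions:
        for hit in session["hits"]:  # "hits" is the repeated field
            rows.append({
                "fullVisitorId": session["fullVisitorId"],
                "visitId": session["visitId"],
                "hitNumber": hit["hitNumber"],
                "pagePath": hit["page"]["pagePath"],  # nested field
            })
    return rows


# Invented sample data: two sessions for visitor A, one for visitor B.
sessions = [
    {"fullVisitorId": "A", "visitId": 1, "hits": [
        {"hitNumber": 1, "page": {"pagePath": "/"}},
        {"hitNumber": 2, "page": {"pagePath": "/cart"}},
    ]},
    {"fullVisitorId": "A", "visitId": 2, "hits": [
        {"hitNumber": 1, "page": {"pagePath": "/"}},
    ]},
    {"fullVisitorId": "B", "visitId": 3, "hits": [
        {"hitNumber": 1, "page": {"pagePath": "/pricing"}},
    ]},
]

rows = flatten_hits(sessions)
# Total users is not stored anywhere: count distinct visitors yourself.
total_users = len({row["fullVisitorId"] for row in rows})
print(len(rows), total_users)
```

In BigQuery itself you would express the same two steps in SQL, flattening the repeated field and counting distinct visitor IDs.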
Utilize BigQuery Export for Google Analytics 360
Google Analytics 360 is not a cheap tool, but you get what you pay for and beyond. In addition to the advanced features of the paid platform, Google Analytics 360 users can export the raw hit- and session-level data from Google Analytics to Google BigQuery via the native integration.
There are two export options you can choose between:
- Data exported continuously. This option allows getting fresher data, exported to Google BigQuery every 10 to 15 minutes. Google BigQuery bills an additional $0.05 for each GB of data. Note that the data from the services linked to Google Analytics, such as DFP, AdSense or AdX, can only be exported for the previous day, on a daily basis.
- Data exported 3 times a day. With this option, you get 1 file exported daily with the previous day’s Google Analytics data, and 3 other files exported each day with the current day’s data. The data from the linked Google services is also available.
Once you initially link a Google Analytics view to Google BigQuery, Google Analytics 360 will automatically export 10 billion hits or 13 months’ worth of historical data into Google BigQuery. Isn’t that a great deal for those who’ve been struggling with sampling all this time?
Here’s the cherry on top: Google Analytics 360 users receive a $500-per-month coupon to cover the cost of importing, storing and processing the data in Google BigQuery.
Try OWOX BI Pipeline
With OWOX BI Pipeline, all the hits you send to Google Analytics will be sent simultaneously to Google BigQuery, directly from the website, click by click, action by action. Because of this, each hit becomes available in Google BigQuery within a matter of minutes. Session data tables are computed using OWOX BI’s own sessionization algorithm, which is described in detail in our Help Center. In this way, you always get Google Analytics raw data, no matter how much traffic your website receives. The service starts at $115/month and can be tried for free with a 14-day trial.
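If the term sessionization is unfamiliar, the Python sketch below shows the general idea on raw hits: consecutive hits from the same client belong to one session until a gap of inactivity is exceeded. To be clear, this is not the OWOX BI algorithm, which is documented in the Help Center. It is a deliberately simplified, timeout-only illustration that borrows the 30-minute inactivity window Google Analytics uses by default, and the `sessionize` helper and sample hits are invented.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity window


def sessionize(hits):
    """Assign a per-client session number to (client_id, timestamp) hits."""
    result = []
    last_seen = {}
    session_number = {}
    for client_id, ts in sorted(hits, key=lambda h: (h[0], h[1])):
        previous = last_seen.get(client_id)
        # First hit of a client, or a gap longer than the timeout: new session.
        if previous is None or ts - previous > SESSION_TIMEOUT:
            session_number[client_id] = session_number.get(client_id, 0) + 1
        last_seen[client_id] = ts
        result.append((client_id, ts, session_number[client_id]))
    return result


# Invented sample hits for two clients.
hits = [
    ("c1", datetime(2017, 3, 1, 10, 0)),
    ("c1", datetime(2017, 3, 1, 11, 0)),
    ("c2", datetime(2017, 3, 1, 10, 5)),
    ("c1", datetime(2017, 3, 1, 10, 10)),
]

for client_id, ts, number in sessionize(hits):
    print(client_id, ts.isoformat(), number)
```

A production algorithm also has to handle things this sketch ignores, such as campaign changes and attaching traffic source data to each session.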
Collecting the raw, unprocessed Google Analytics hits in Google BigQuery concurrently with sending them to Google Analytics gives you even more benefits:
- While the number of custom dimensions is limited to 20 in standard Google Analytics and 200 in the paid version, you can collect as many custom dimensions as you need in Google BigQuery by setting them up through Google Tag Manager.
- Reports can be created with any number and any combination of dimensions and metrics, for any period of time.
- User behavior data is sent to Google BigQuery in real time, without any hit limits.
- You can legally collect and use personal user data like email addresses and phone numbers.
Exporting raw data from Google Analytics is easier than it might seem at first glance. Whether you invest in an out-of-the-box solution or create your own is up to you. Just never let this great asset be a dead weight in your storage.
Science is built up with facts, as a house is with stones. But a collection of facts is no more a science than a heap of stones is a house.
Leverage the data you collect. Look for new insights. Integrate. Experiment. Know the pulse of your website, and connect with customers when they need you most. And remember that you can always ask your questions in the comment section below. We’ll gladly respond!