BigQuery data structure in Google: How to get started with cloud storage

4
551
Downloads
959.24 Kb

Google BigQuery is a cloud storage service that allows you to collect all your data in one system and easily analyze it using SQL queries. For data to be convenient to work with, it should be structured correctly. In this article, we’ll explain how to create tables and datasets for uploading to Google BigQuery.

Contents

Datasets: What they are and how to create one

To use Google BigQuery, you need to create a project in Google Cloud Platform (GCP). Upon registration, you’ll receive access to all Cloud Platform products during a free trial period and $300 to spend on these products within the next 12 months.

After creating a project in Google Cloud Platform, you need to add at least one dataset to Google BigQuery.

A dataset is a top-level container that’s used to organize and control access to your data. In simple terms, it’s a kind of folder in which your information is stored in the form of tables and views.

Open your project in GCP, go to the BigQuery tab, and click Create Dataset:

In the window that opens, specify a name for the dataset and the shelf life of a table. If you want tables with data to be deleted automatically, specify when exactly. Or leave the default Perpetual option so that tables can only be deleted manually.

The Processing site field is optional. By default, it’s set to US multiregion. You can find more information about regions for storing data in the help section.

How to add a table to load data into Google BigQuery

After creating a dataset, you need to add a table to which data will be collected. A table is a set of rows. Each row consists of columns, which are also called fields. There are several ways to create a table in BigQuery depending on the data source:

  • Manually create an empty table and set up a data schema for it
  • Create a table using the result of a previously calculated SQL query
  • Upload a file from your computer (in CSV, AVRO, JSON, Parquet, ORC, or Google Sheets format)
  • Instead of downloading or streaming data, you can create a table that refers to an external source: Cloud Bigtable, Cloud Storage, or Google Drive.

In this article, we’ll take a closer look at the first method: creating a table manually.

Step 1. Select the dataset to which you want to add the table, then click Create Table:

Step 2. In the Source field, select Empty Table, and in the Table Type field, select Table in the native format of the target object. Come up with a name for the table.

Important: The names of datasets, tables, and fields must be in Latin characters and contain only letters, numbers, and underscores.

Step 3. Specify the table schema. The schema consists of four components: two mandatory (column name and data type) and two optional (column mode and description). Properly selected types and field modes will facilitate work with the data.

Example schema in BigQuery:

Column names
In the column name, you need to specify the parameter for which each column is responsible: date, user_id, products, etc. Titles can contain only Latin letters, numbers, and underscores (maximum 128 characters). Identical field names are not allowed, even if their case is different.

Data type

When creating a table in BigQuery, you can use the following field types:

Modes
BigQuery supports the following modes for table columns:

Note: It isn’t necessary to fill in the Mode field. If no mode is specified, the default column is NULLABLE.

Column descriptions
If you wish, you can add a short description (no more than 1024 characters) for each column in the table in order to explain what a particular parameter means.

When you create an empty table in BigQuery, you need to set the schema manually. This can be done in two ways:
1. Click the Add field button and fill in the name, type, and mode for each column.

2. Enter the table schema as a JSON array using the Edit as text switch.

In addition, Google BigQuery can use automatic schema detection when loading data from CSV and JSON files.

This option works on the following principle: BigQuery selects a random file from the source you specify, scans up to 100 rows of data in it, and uses the results as a representative sample. It then checks each field in the uploaded file and tries to assign a data type to it based on the values ​​in the sample.

When loading Google files, BigQuery can change the name of a column to make it compatible with its own SQL syntax. Therefore, we recommend that you upload tables with English field names; if the names are in Russian, for instance, the system will rename them automatically. For example:

If, when loading data, the names of the columns were entered incorrectly or you want to change the names and types of the columns in an existing table, you can do this manually. We’ll tell you how.

How to make changes to the table schema

After loading data into Google BigQuery, the table layout may be slightly different from the original. For example, a field name may have changed because of a character that’s not supported in BigQuery, or the field type may be INTEGER instead of STRING. In this case, you can manually adjust the schema.

How to change a column name

Using a SQL query, select all the columns in the table and specify a new name for the column that you want to rename. In this case, you can overwrite the existing table or create a new one. Request examples:

    
#legacySQL
Select
date,
order_id,
order___________ as order_type, -- new field name
product_id
from [project_name:dataset_name.owoxbi_sessions_20190314]
    
    
#standardSQL
Select
* EXCEPT (orotp, ddat),
orotp as order_id,
ddat as date
from `project_name.dataset_name.owoxbi_sessions_20190314`
    

How to make changes to the data type in the schema

Using a SQL query, select all data from a table and convert the corresponding column to a different data type. You can use query results to overwrite an existing table or create a new one. Request example:

    
#standardSQL
Select
CAST (order_id as STRING) as order_id,
CAST (date as TIMESTAMP) as date
from `project_name.dataset_name.owoxbi_sessions_20190314`
    

How to change the column mode

You can change the column mode from REQUIRED to NULLABLE as described in the help documentation. The second option is to export the data to Cloud Storage and from there return it to BigQuery with the correct mode for all columns.

How to remove a column from the data schema

Use the SELECT * EXCEPT query to exclude a column (or columns), then write the query results to the old table or create a new one. Request example:

    
#standardSQL
Select
* EXCEPT (order_id)
from `project_name.dataset_name.owoxbi_sessions_20190314`
    

In addition, there’s a second way to change the schema that’s suitable for all tasks described above: export data and load it into a new table. To rename a column, you can upload data from BigQuery to Cloud Storage, then export it from Cloud Storage to BigQuery in a new table or overwrite the data in the old table using the Advanced Parameters:

You can read about other ways to change the table structure in the Google Cloud Platform help documentation.

Export and import data from/to Google BigQuery

You can download data from and upload data to BigQuery without the help of developers via the interface or a special add-on from OWOX BI. Let’s consider each method in detail.

Import data via the Google BigQuery interface

To upload the necessary information to storage – for example, data about users and offline orders – open your dataset, click Create Table, and select the data source: Cloud Storage, your computer, Google Drive, or Cloud Bigtable. Specify the path to the file, its format, and the name of the table where the data will be loaded:

After you click Create Table, a table will appear in your dataset.

Export data via the Google BigQuery interface

It’s also possible to upload processed data from BigQuery – for example, to create a report through the system interface. To do this, open the desired table with data and click the Export button:

The system will offer two options: view data in Google Data Studio or upload it to Google Cloud Storage. If you select the first option, you’ll immediately go to Data Studio, where you can save the report.

Choosing to export to Google Cloud Storage will open a new window. In it, you need to specify where to save the data and in what format.

Export and import data using an add-on from OWOX BI

The free OWOX BI BigQuery Reports add-on allows you to quickly and conveniently transfer data directly from Google BigQuery to Google Sheets and vice versa. Thus, you don’t need to prepare CSV files or use paid third-party services.

For example, say you want to upload offline order data to BigQuery to build a ROPO report. For this, you’ll need to:

  1. Install the BigQuery Reports add-on in your browser.
  2. Open your data file in Google Sheets and in the tab Add-ons, select OWOX BI BigQuery Reports → Upload data to BigQuery.
  3.  In the window that opens, select your project and dataset in BigQuery and enter the desired name for the table. Also, select the fields whose values you want to load. By default, the type of all fields is STRING, but we recommend choosing the data type according to the context (for example, for fields with numeric identifiers, choose INTEGER; for prices, FLOAT):
  1. Click the Start Upload button and your data will be loaded into Google BigQuery​

You can also use this add-on to export data from BigQuery to Google Sheets – for example, to visualize data or share it with colleagues who don’t have access to BigQuery. For this:

  1. Open Google Sheets. In the Add-ons tab, select OWOX BI BigQuery Reports → Add a new report:
  1. Then enter your project in Google BigQuery and select Add new query.
  2. In the new window, insert your SQL query. This could be a query that uploads data from a table to BigQuery or a query that pulls and calculates the necessary data.
  3. Rename the query to make it easy to find and launch it by clicking the Save & Run button.

To upload data from BigQuery to Google Sheets on a regular basis, you can enable scheduled data updates:

  1. In the Add-ons tab, select OWOX BI BigQuery ReportsSchedule report:
  1. In the window that opens, set the time and frequency for report updates and click Save:​

Why collect data in Google BigQuery?

If you haven’t yet appreciated the benefits of Google BigQuery cloud storage, we recommend trying it. With the help of OWOX BI, you can combine data from your website, advertising sources, and internal CRM systems into BigQuery in order to:

OWOX BI has a free trial period during which you can try all the features of the service.

TRY OWOX BI

You might also like