How to Restore Data Accidentally Deleted from Google BigQuery

icon Get in-depth insights

How to control and optimize costs for Google BigQuery

icon Get in-depth insights

How to control and optimize costs for Google BigQuery

Have you ever accidentally deleted an important table, view, or whole dataset in Google BigQuery? If you did, you probably understand the setback of not being able to proceed with tables and queries once data has been deleted. 

We've crafted an in-depth guide to assist you in restoring any unintentionally erased data, enabling further analysis without feeling stressed out.

Note: This article was updated in January 2024 to include the most recent practices in BigQuery data restoration, ensuring that our content remains up-to-date, and practical, and sticks to the latest changes in Google BigQuery - the best DWH for marketing data.

How to restore a deleted table

First things first, BigQuery retains deleted data for a certain period of time, typically 7 days, depending on the storage type and table settings. Also, the data can be restored at the table or dataset level.

So in order to restore a deleted table two conditions should met: 

  • The table was deleted less than even days ago;
  • You know the table name as well as the name of the dataset from which it was deleted.

To restore a table, open the Cloud Shell command line in the Google BigQuery interface:

Enter the following command:

bq cp mydataset.mytable@-3600000 mydataset.newtable

Example of the same command with table and dataset names:

bq cp OWOXBI_Reports.123_Transactions_withModels@-13600000 OWOXBI_Reports_Restore.123_Transactions_withModels

Where:

  • OWOXBI_Reports.123­_Transactions­_withModels — is the dataset and the table that has been deleted.
  • OWOXBI_Reports_Restore.123­_Transactions­_withModels — is the dataset and the name of the table within it where you want to restore the data.
  • @-13600000 — a distance in the past (13,600,000 milliseconds ago) when the table you are looking for still existed. For example, if a table was deleted 30 minutes ago, it’s sufficient to set the time interval @ -3600000, which is an hour ago (60 seconds ×60 minutes × 1000).

After entering the request, authorization is required. Click the Authorize button:

After the command is successfully run, the table will be restored:

You’ll see text similar to this:

Waiting on bqjob_r4ca30008c2e3147d_0000017af0d58e5e_1 ... (0s) Current status: DONE   
Table 'summer-drive-112011:OWOXBI_Reports_Restore.test_table@-600000' successfully copied to 'summer-drive-112011:OWOXBI_Reports_Restore.test_table'

While discussing the fundamentals of restoring deleted data in BigQuery, it's important to note that managing your expenses on this platform is equally crucial. Google has recently ramped up the pricing for on-demand queries in Google BigQuery. This might seem alarming, but have no fear!

We've got you covered. Our team has created an easy-to-follow checklist that will help you control and optimize your costs on the Google Cloud platform. 

Don't let the price spike hinder your data operations. Continue mastering BigQuery while keeping your budget in check. Download our guide now and confidently navigate your way through the Google Cloud platform.

Uncover in-depth insights

How to control and optimize costs for Google BigQuery

Download now

Bonus for readers

How to control and optimize costs for Google BigQuery

How to restore a deleted view

The method outlined above is not applicable to restoring a view in BigQuery; it's primarily designed for table restoration. 

Now, let's shift our focus toward how you can restore a remote view in BigQuery. There's a different procedure to this compared to restoring a table. It wouldn't take much of your time.

Google support recommends using Logs Explorer to restore a deleted view.

To find queries that updated a remote view, run the following query in Google Cloud Platform logs:

resource.type="bigquery_resource"
protoPayload.methodName="tableservice.update"
protoPayload.serviceData.tableUpdateRequest.resource.tableName.tableId="custom_events_attribution_VIEW"

Where:

  • tableservice.update — is the command to display the view update in logs
  • custom_events_attribution_VIEW — the name of the view

In the Select time range settings set the period during which changes could be made to the view (for example, one year):

When the command is executed, all queries that updated the view you are looking for will be displayed. Select the last timestamp query:

Copy the query from the log and recreate the view.

In addition, you can find the query that created the view you were looking for (before that, we searched for queries that updated the view). To do this, use the following command:

resource.type="bigquery_resource"
protoPayload.methodName="tableservice.insert"
protoPayload.serviceData.tableInsertRequest.resource.tableName.tableId="query_name_VIEW"

We recommend that you set the longest possible period to search the logs for the required entry.

If a view was created before logs began to be recorded, it won’t be possible to restore it. In this case, the tableservice.update command we described above can help.

How to restore a deleted dataset

If you delete a dataset, you won’t be able to restore it. You must create a new dataset with the same name, restore the tables and views that were in the deleted dataset, and move them to the new dataset.

In the first 24 hours after deleting a dataset, you cannot create a new dataset with the same name, but you can still find the names of tables and views that were part of the deleted dataset. However, the tables and views themselves will no longer be visible.

You can find views from a deleted dataset by name by searching in BigQuery:

The list of tables cannot be found through search, as tables don’t have a common component in their names.

To find a table name from a deleted dataset, use the following SQL query. In the query, specify the period during which the tables can be modified (creation_time BETWEEN). As a result, the query will return tables to which changes have been made within what time period. These will be the tables from the deleted dataset.

SELECT
  *
FROM (
  SELECT
    query,
    user_email,
    CONCAT(destination_table.project_id,".",destination_table.dataset_id,".",destination_table.table_id) AS destination_table,
    COUNT(job_id) AS job_ids,
    MAX(creation_time) AS lastUpdateDate
  FROM
    region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE
    creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 180 DAY)
    AND TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 0 DAY)
    AND state = 'DONE'
    AND CONCAT(destination_table.project_id,".",destination_table.dataset_id,".",destination_table.table_id) LIKE "%OWOXBI_Reports.%"
  GROUP BY
    1,
    2,
    3
  ORDER BY
    5 DESC)

As a result of the request, you’ll receive the following information:

  • query — the text of the query that made the change to destination_table
  • user _email — the user who started the destination_table update request
  • destination_table — the table that has been updated
  • job_ids — the number of query starts during the specified time interval (creation_time BETWEEN …)
  • lastUpdateDate — the last time the query was started (MAX(creation_time) AS lastUpdateDate)

Twenty-four hours after you delete a dataset, you can create a new dataset with the same name and begin restoring tables and views.

7 Tips to prevent similar problems in the future

Preventing future data loss in BigQuery is just as important as knowing how to restore it. You can protect your data by following some of these best practices: 

1. Create error logs

Error logs: Set up alerts and error log collection in your Google BigQuery project when you start working with the project.

These allow you to quickly identify if and when data has been deleted unintentionally. This will give you precious time to rectify the situation before it becomes a critical issue. Additionally, these logs can be used for efficient problem-solving and identifying trends in data loss. 

2. Do not name a table and view the same as a dataset

One of the fundamental principles of good data management revolves around effective naming conventions. When working with Google BigQuery, or any other data platform for that matter, it's crucial to differentiate between your datasets, tables and views appropriately. This is where naming plays a pivotal role. 

Essentially, if you name a table or view the same as a dataset, it can lead to mismatches and confusion not only for yourself but for anyone else who might be involved in maintaining or accessing the database. Imagine working through a massive dataset, only to find inconsistencies due to overlapping names. It’s bound to add unnecessary complications to your work. 

So, what can you do? 

  1. Firstly, establish a distinct naming pattern for your datasets, tables, and views. This will enable you to easily distinguish one from the other and avoid any mix-ups. For example, consider prefixing names with ds_ for datasets, tbl_ for tables, and vw_ for views.
  2. Secondly, consistently follow this pattern when creating new tables, views, or datasets. This way, even if someone new joins your team, they'll quickly understand your database structure.
  3. Lastly, make sure to explain your naming convention in the documentation. Clear communication is a key aspect of effective data management.

Naming your tables and views the same as a dataset may sound like a small error, but small errors have a way of snowballing into larger issues. Just like in all aspects of your work, attention to details can make or break the effectiveness of your operations. 

Also, make sure each part of your database has a clear, distinct name. This way, you can ensure your work with Google BigQuery is as smooth and efficient as possible.

Automate your digital marketing reporting

Manage and analyze all your data in one place! Access fresh & reliable data with OWOX BI — an all-in-one reporting and analytics tool

Start Free Trial
Automate your digital marketing reporting

3. Double-check the name of any table you want to delete from Google BigQuery

We can't stress enough the importance of verifying the table name before hitting that delete button.

One minor typo can potentially lead to devastating results.

Now, you might be wondering, "I'm always cautious. This won't happen to me."

But believe us when we say, even the most meticulous of us can make mistakes.

So, just take those few extra seconds to double-check. It's a small step, but it could save you from a massive headache down the line. We guarantee: you won't regret it..

4. Delete tables via queries, not via the user interface

With Google BigQuery, you've got a couple of options when it comes to deleting a table. But, as a wise data manager, it's recommended to embrace the practice of deleting tables via queries, not using the user interface. Why, you might ask? Well, let me illuminate. 

Deleting tables via queries provides a certain level of transparency and intuitiveness. This approach logs a record of the executed query, enabling you to trace the deletion queries if necessary. Moreover, the operation tends to be more immediate and direct rather than navigating through the graphical interface. 

For instance, to delete a table via a SQL query, you would use the 'DROP TABLE' SQL command like this: 

DROP TABLE `project_name.dataset_name.table_name`

With this approach, the responsibility is directly on you, hence the necessity to be cautious. Inaccurate input can lead to the deletion of unwanted tables. 

On the flip side, deleting tables via the user interface, while seemingly easy, can fall short in aspects of traceability and mistake-proofing. Sure, the UI provides a convenient "delete table" button, but this doesn't log a traceable record of the action. To top it off, there's always the chance of clicking in the wrong place and unintentionally deleting another table. Whoops! 

So, long story short, adopt and maintain the habit of deleting BigQuery tables via executing SQL queries—it may just save you from some forthcoming migraines and, of course, data losses.

You can also Execute the following query to delete the view:

DROP VIEW `project_name.dataset_name.view_name`

5. Add the entity type to dataset, table, and view names

By adding the entity type to your dataset, table, and view names, you are helping to clarify the function of these elements in the context of the whole system. It's like labeling drawers in a giant cabinet. Imagine this; you are working in a shared workspace with hundreds of other data scientists, each having unique datasets, tables, and views. Without a proper naming convention, it would be downright chaotic - and who wants to sift through the clutter? 

An example of an ideal naming convention might look like: 

  1. OWOXBI_report_dataset
  2. OWOXBI_report_table
  3. OWOXBI_report_view

6. Implement backup procedures

Regular Backups: Ensure regular and timely backups of your BigQuery data. This should be a non-negotiable habit. If you configure automatic backups, it will be much easier to restore the data. Keep in mind the importance of maintaining a balance between the frequency of your backups and the storage costs.   

Ensure Backup Redundancy: Your backup strategy should not be limited to a single copy in the same location as the source. It's always best to have redundant copies of your backups, preferably in diverse geographical locations. This way, in the event of a site failure, your data remains safe and recoverable. Cloud storage solutions often make this simpler by offering easy automation and redundancy options. 

Test Your Backup Strategy: A backup is only as good as a successful restore. Regularly test your recovery procedures to make sure they're working as expected. This way, when disaster strikes, you know you have a reliable solution to fall back on. Regular testing also gives you the opportunity to familiarize yourself with the process, so you’re not trying to figure things out in a panic. 

Save 70+ hours on data preparation

Spend time reaching your monthly KPIs instead of collecting the data or building reports

Start Free Trial
Automate your digital marketing reporting

7. Manage access control

Access Control: Restrict write and delete permissions on your BigQuery setup to only those who need it. Everyone who interacts with your BigQuery data should be trained on the importance of data integrity, and the potential drawbacks of data deletion.  

Moreover, use the principle of least privilege when you assign roles to users.

This basically means providing only the minimum permissions necessary to accomplish the task at hand. By setting these limits, you limit the risk of accidental or malicious data manipulation or deletion. This includes not just direct data deletion, but also accidental overwriting of queries or any changes that may lead to data loss. 

Taking these actions can significantly minimize the risk of future data loss in BigQuery. Remember, proactive protection is the most effective strategy against data loss.

How to reduce the scale of the problem if it happens again

  • Do not change queries in Schedule and Google Apps Script; the next day, restore all views and tables, after which Schedule and Google Apps Script will work correctly.
  • If the view or table was involved in setting up the attribution model, you will need to restart the attribution calculation after restoring the data to its previous location.

Hassle-free data analysis and reporting

Easily collect, prepare, and analyze marketing data. Stay on top of your marketing performance

Start Free Trial
Automate your digital marketing reporting

FAQ

Expand all Close all
  • Can I recover deleted data in Google BigQuery?

    Yes, it is possible to recover deleted data in Google BigQuery using the Point-In-Time Recovery feature.
  • How does Point-In-Time Recovery work in Google BigQuery?

    Point-In-Time Recovery allows you to restore your datasets to a previous point in time using backups that are taken every 24 hours.
  • How long does it take to restore a dataset using Point-In-Time Recovery in Google BigQuery?

    The time taken to restore a dataset using Point-In-Time Recovery in Google BigQuery depends on the size of the dataset and the amount of data that needs to be restored. It may take anywhere from a few minutes to several hours.

icon Get in-depth insights

How to control and optimize costs for Google BigQuery

icon Get in-depth insights

How to control and optimize costs for Google BigQuery