Content
- How to Restore Deleted BigQuery Tables
- How to Restore an Accidentally Deleted View
- How to Restore a Deleted Dataset
- 7 Tips to Prevent Similar Problems in The Future
- 1. Create Error Logs
- 2. Don't Name a Table and View the Same as a Dataset
- 3. Double-check the Name of any Table You Want to Delete
- 4. Delete Tables via SQL Queries, not via the User Interface
- 5. Add the Entity Type to the Dataset, Table, and View Names
- 6. Implement backup procedures
- 7. Manage Access Control
- How to Reduce the Scale of the Problem with Deleted Data if It Happens Again
How to Restore Data Accidentally Deleted from Google BigQuery
Vlada Malysheva, Creative Writer @ OWOX
Maxim Guzev, Digital Analyst at OWOX BI
Have you ever accidentally deleted an important table, view, or an entire dataset? If so, you know how much of a setback it is when reports and queries stop working because the data they depend on is gone.
We’ve crafted an in-depth guide that walks you through the recovery process for unintentionally deleted data, so you can continue your analysis without interruption or stress.
Note: This article was updated in July 2024 to include the most recent practices in BigQuery data restoration, ensuring that our content remains practical and up to date with the latest changes in Google BigQuery - the best DWH for storing and processing marketing data.
How to Restore Deleted BigQuery Tables
First things first: BigQuery retains deleted data for the duration of the time travel window, which is typically 7 days (it can be configured per dataset to be as short as two days). Data can be restored at the table or dataset level.
Understanding the backup and restore capabilities of BigQuery is crucial for effective data management.
So, to restore a deleted table, two conditions should be met:
- The table was deleted less than seven days ago;
- You know the table name as well as the name of the dataset from which it was deleted.
To restore a table, open the Cloud Shell command line from the Google Cloud console.
Enter the following command:
bq cp dataset_id.table_id@-3600000 dataset_id.new_table_id
Here is the same command with actual dataset and table names:
bq cp reports.transactions@-3600000 reports.transactions_new
Where:
- reports.transactions — the dataset and the deleted table within it.
- reports.transactions_new — the dataset and the name of the table into which you want to restore the data; if a table with this name already exists, it will be overwritten with the restored data.
- @-3600000 — a point in the past (3,600,000 milliseconds ago) at which the table you are looking for still existed.
For example, if a table was deleted less than 60 minutes ago, it’s sufficient to set the offset to @-3600000, which is an hour ago (60 minutes × 60 seconds × 1,000 milliseconds).
After entering the command, authorization is required. Click the Authorize button:
After the command is successfully run, the table will be restored:
You’ll see text similar to this:
Waiting on bqjob_r4ca30008c2e3147d_0000017af0d58e5e_1 ... (0s) Current status: DONE Table 'summer-drive-112011:OWOXBI_Reports_Restore.test_table@-600000' successfully copied to 'summer-drive-112011:OWOXBI_Reports_Restore.test_table'
The Google Cloud console can also be used to restore tables, offering an alternative way to manage table snapshots and overwrite existing tables with restored data.
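The snapshot decorator also accepts an absolute Unix timestamp in milliseconds instead of a negative offset, which can be easier to reason about. Here is a minimal sketch, assuming the hypothetical reports.transactions table from above; 1720000000000 is an example epoch value that you would replace with a moment (in milliseconds since the Unix epoch) at which your table still existed:
bq cp reports.transactions@1720000000000 reports.transactions_restored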
While discussing the fundamentals of restoring deleted data in BigQuery, it’s important to note that managing your expenses on this platform is equally crucial.
Google has recently ramped up the pricing for on-demand queries in Google BigQuery. This might seem alarming, but have no fear!
We’ve got you covered. Our team has created an easy-to-follow checklist that will help you control and optimize your costs on the Google Cloud platform.
Don’t let the price spike hinder your data operations. Continue mastering BigQuery while keeping your budget in check.
Download our optimization guide now and confidently navigate your way through the Google Cloud platform.
How to Restore an Accidentally Deleted View
The method outlined above applies only to tables; it can’t be used to restore a deleted view in BigQuery.
Now, let’s shift our focus to how you can restore a deleted view in BigQuery. The procedure differs from the one for tables, but it won’t take much of your time. Ensuring data integrity when restoring views is essential to maintain the accuracy and reliability of your data.
Google support recommends using Logs Explorer to restore a deleted view.
To find the queries that updated the deleted view, run the following filter in the Google Cloud Logs Explorer:
resource.type="bigquery_resource" protoPayload.methodName="tableservice.update" protoPayload.serviceData.tableUpdateRequest.resource.tableName.tableId="view_name"
Where:
- tableservice.update — the method name recorded in the logs when a view is updated
- view_name — the name of the deleted view
In the Select time range settings, set the period during which changes could be made to the view (for example, one year):
When the command is executed, all queries that update the view you are looking for will be displayed.
Select the entry with the most recent timestamp:
Copy the query from the log and recreate the view.
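Once you have the view’s SQL from the log entry, recreating the view takes a single DDL statement. Here is a minimal sketch, assuming a hypothetical project_name.dataset_name.view_name and using a placeholder SELECT where you would paste the query copied from the logs:
CREATE OR REPLACE VIEW `project_name.dataset_name.view_name` AS
-- paste the SELECT statement copied from the Logs Explorer entry here
SELECT *
FROM `project_name.dataset_name.source_table`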
You can also find the query that originally created the view (above, we searched for queries that updated it).
To do this, use the following filter (just replace the tableId value with your view’s name):
resource.type="bigquery_resource" protoPayload.methodName="tableservice.insert" protoPayload.serviceData.tableInsertRequest.resource.tableName.tableId="query_name_VIEW"
We recommend that you set the longest possible period to search the logs for the required entry.
If a view was created before log collection began, you won’t be able to find its creation query. In that case, the tableservice.update filter described above can help, provided the view was updated at least once while logging was enabled.
How to Restore a Deleted Dataset
If you delete a BigQuery dataset, you won’t be able to restore it directly.
1. First, you must create a new BigQuery dataset with the same name.
2. Then, you have to restore tables and views that were in the deleted BigQuery dataset, and move them to the new BigQuery dataset.
Adhering to a data retention policy can help prevent accidental data loss and ensure compliance with data management standards.
Note: For the first 24 hours after deleting a BigQuery dataset, you cannot create a new dataset with the same name, but you can still find the names of the tables and views that were part of the deleted dataset. The tables and views themselves, however, will no longer be visible.
You can find views from a deleted BigQuery dataset by name by searching in BigQuery:
The list of tables can’t be found through search in the same way, because table names don’t necessarily share a common component.
To find a table name from a deleted BigQuery dataset, use the following SQL query.
In the query, specify the period during which the tables can be modified (creation_time BETWEEN).
As a result, the query will return the tables that were modified within the specified period. These will be the tables from the deleted BigQuery dataset.
SELECT
  query,
  user_email,
  CONCAT(destination_table.project_id, ".", destination_table.dataset_id, ".", destination_table.table_id) AS destination_table,
  COUNT(job_id) AS job_ids,
  MAX(creation_time) AS lastUpdateDate
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 180 DAY)
    AND CURRENT_TIMESTAMP()
  AND state = 'DONE'
  AND CONCAT(destination_table.project_id, ".", destination_table.dataset_id, ".", destination_table.table_id) LIKE "%OWOXBI_Reports.%"
GROUP BY
  1, 2, 3
ORDER BY
  lastUpdateDate DESC
As a result of the request, you’ll receive the following information:
- query — the text of the query that made the change to destination_table
- user_email — the user who started the destination_table update query
- destination_table — the table that has been updated
- job_ids — the number of query starts during the specified time interval (creation_time BETWEEN …)
- lastUpdateDate — the last time the query was started (MAX(creation_time) AS lastUpdateDate)
Twenty-four hours after you delete a BigQuery dataset, you can create a new BigQuery dataset with the same name and begin restoring tables and views.
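When the 24 hours are up, the first step can be done with a single DDL statement. Below is a minimal sketch, assuming a hypothetical project my-project, the OWOXBI_Reports dataset name used in the query above, and the US location (recreate the dataset in the same location as the deleted one):
CREATE SCHEMA IF NOT EXISTS `my-project.OWOXBI_Reports`
OPTIONS (
  location = 'US'  -- use the same location as the deleted dataset
);
Once the dataset exists again, restore each table into it with the bq cp snapshot command from the first section, and recreate the views from the queries you found in Logs Explorer.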
Restore a Deleted Dataset Using the Time Travel Window
In Google BigQuery, recently deleted datasets can be recovered using the “Undelete Dataset” feature, available as part of Google’s pre-General Availability offerings.
This feature allows users to restore a dataset to its state at the time of deletion, including all objects, properties, and security settings within the dataset’s time travel window.
1. Open the Explorer panel: Navigate to the BigQuery interface and open the Explorer panel. Click your project name to expand the list and view the available datasets.
2. Choose a dataset: Select the dataset you wish to configure and expand the “Actions” option for it.
3. View and edit dataset details: Click “Open” to access the dataset’s details, then select “Edit details” to modify its settings.
4. Set the time travel window: Under “Advanced options,” choose the desired duration for the time travel window and click “Save” to apply your settings (or use the SQL statement below).
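If you prefer SQL, the same setting can be changed with an ALTER SCHEMA statement. Here is a minimal sketch, assuming a hypothetical dataset named my_dataset and the maximum seven-day window:
ALTER SCHEMA my_dataset
SET OPTIONS (
  max_time_travel_hours = 168  -- 7 days; the value must be a multiple of 24 between 48 and 168
);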
Google BigQuery offers various data recovery tools, including the 'Undelete Dataset' feature, to help users recover recently deleted datasets.
Restoration Limitations
- Security references that no longer exist are not restored.
- Links in other datasets referencing the deleted dataset are not automatically re-established.
- Business tags associated with the dataset are not restored.
- Materialized views and authorized resources (views, datasets, routines) require manual refresh and reauthorization.
Implementing data loss prevention measures can help mitigate the risks associated with the limitations of the restoration process.
Steps to Undelete a Dataset
- In the Google Cloud console, navigate to the BigQuery page.
- Use the query editor to execute the UNDROP SCHEMA statement: UNDROP SCHEMA DATASET_ID; replace DATASET_ID with the ID of your dataset (see the example below).
- Specify the dataset's location.
- Run the query.
UNDROP SCHEMA DATASET_ID;
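As a concrete illustration, here is a minimal sketch with a hypothetical project my-project and dataset reports; the project qualifier makes it explicit where the dataset is restored:
UNDROP SCHEMA IF NOT EXISTS `my-project.reports`;
The IF NOT EXISTS clause keeps the statement from failing if a dataset with that name has already been recreated; without it, you would get the ALREADY_EXISTS error described below.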
Possible Errors
- ALREADY_EXISTS: Occurs if a dataset with the same name exists in the target region.
- NOT_FOUND: The dataset is outside the time travel window or does not exist.
- ACCESS_DENIED: Insufficient permissions to perform the undelete operation.
7 Tips to Prevent Similar Problems in The Future
Preventing future data loss in BigQuery is just as important as knowing how to restore it. You can protect your data by following some of these best practices:
Implementing robust data protection strategies is essential to safeguard your data and prevent future data loss.
1. Create Error Logs
Error logs: Set up alerts and error log collection in your Google BigQuery project as soon as you start working on it.
These allow you to quickly identify if and when data has been deleted unintentionally.
This will give you precious time to rectify the situation before it becomes a critical issue.
Additionally, these logs can be used for efficient problem-solving and identifying trends in data loss. Effective data monitoring can help you quickly identify and address issues, minimizing the impact of data loss.
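If you’d rather stay inside BigQuery, you can also spot deletions after the fact by looking for DROP statements in the jobs metadata. Here is a minimal sketch, assuming your jobs run in the US region and that a seven-day lookback is enough for your case:
SELECT
  creation_time,
  user_email,
  statement_type,
  query
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND statement_type IN ('DROP_TABLE', 'DROP_VIEW')  -- deletions executed as SQL statements
ORDER BY
  creation_time DESC
Keep in mind that this only catches deletions that were run as SQL; tables deleted through the UI or the API are not query jobs, so alerts on your logs remain the safety net there.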
2. Don't Name a Table and View the Same as a Dataset
One of the fundamental principles of good data management revolves around effective naming conventions.
When working with Google BigQuery, or any other data platform for that matter, it’s crucial to differentiate between your datasets, tables, and views appropriately. This is where naming plays a pivotal role.
Establishing clear naming conventions is crucial for maintaining an organized and efficient database structure.
Essentially, if you name a table or view the same as a dataset, it can lead to mismatches and confusion not only for yourself but for anyone else who might be involved in maintaining or accessing the database.
Imagine working through a massive dataset only to find inconsistencies due to overlapping names. It’s bound to add unnecessary complications to your work.
So, what can you do?
- Firstly, establish a distinct naming pattern for your datasets, tables, and views. This will enable you to easily distinguish one from the other and avoid any mix-ups. For example, consider prefixing names with ds_ for datasets, tbl_ for tables, and vw_ for views.
- Secondly, consistently follow this pattern when creating new tables, views, or datasets. This way, even if someone new joins your team, they’ll quickly understand your database structure.
- Lastly, make sure to explain your naming convention in the documentation. Clear communication is a key aspect of effective data management.
Naming your tables and views the same as a dataset may sound like a small error, but small errors have a way of snowballing into larger issues.
Just like in all aspects of your work, attention to detail can make or break the effectiveness of your operations.
Also, make sure each part of your database has a clear, distinct name. This way, you can ensure your work with Google BigQuery is as smooth and efficient as possible.
3. Double-check the Name of any Table You Want to Delete
We can’t stress enough the importance of verifying the table name before hitting that delete button.
One minor typo can potentially lead to devastating results.
Now, you might be wondering, “I’m always cautious. This won’t happen to me.”
But believe us when we say that even the most meticulous of us can make mistakes.
So, just take those few extra seconds to double-check. It’s a small step, but it could save you from a massive headache down the line.
We guarantee you won’t regret it. Incorporating data validation steps can help prevent accidental deletions and ensure the accuracy of your data.
4. Delete Tables via SQL Queries, not via the User Interface
With Google BigQuery, you’ve got a couple of options when it comes to deleting a table.
But, as a wise data manager, it’s recommended to embrace the practice of deleting tables via queries, not using the user interface.
Why, you might ask? Let us explain.
Deleting tables via queries gives you transparency and control over the process.
This approach logs a record of the executed query, enabling you to trace the deletion queries if necessary.
Moreover, the operation tends to be more immediate and direct than navigating through the graphical interface. For instance, to delete a table via a query, you would use the ‘DROP TABLE’ SQL command like this:
DROP TABLE `project_name.dataset_name.table_name`
Executing queries directly provides a clear record of actions taken, enhancing traceability and accountability.
With this approach, the responsibility is directly on you, hence the necessity to be cautious. Inaccurate input can lead to the deletion of unwanted tables.
On the flip side, deleting tables via the user interface, while seemingly easy, falls short on traceability and mistake-proofing. Sure, the UI provides a convenient “delete table” button, but this doesn’t leave a record in your query history. To top it off, there’s always the chance of clicking in the wrong place and unintentionally deleting another table. Whoops!
So, long story short, adopt and maintain the habit of deleting BigQuery tables via executing SQL queries — it may just save you from some forthcoming migraines and, of course, data losses.
You can also execute the following query to delete a view:
DROP VIEW `project_name.dataset_name.view_name`
5. Add the Entity Type to the Dataset, Table, and View Names
By adding the entity type to your dataset, table, and view names, you are helping to clarify the function of these elements in the context of the whole system. It’s like labeling drawers in a giant cabinet.
Imagine this: you are working in a shared workspace with hundreds of other data scientists, each having unique datasets, tables, and views.
Without a proper naming convention, it would be downright chaotic - and who wants to sift through the clutter?
Implementing data classification can help you organize and manage your data more effectively.
An example of an ideal naming convention might look like:
- OWOXBI_report_dataset
- OWOXBI_report_table
- OWOXBI_report_view
6. Implement backup procedures
Regular Backups: Ensure regular and timely backups of your BigQuery data. This should be a non-negotiable habit.
If you configure automatic backups, it will be much easier to restore the data.
Keep in mind the importance of maintaining a balance between the frequency of your backups and the storage costs.
Developing a comprehensive data backup strategy is essential for ensuring data availability and recovery.
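One lightweight way to implement such backups is a BigQuery table snapshot, which stores only the data that differs from the base table. Here is a minimal sketch, assuming the hypothetical reports.transactions table and a 30-day retention period:
CREATE SNAPSHOT TABLE `reports.transactions_backup_20240701`
CLONE `reports.transactions`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)  -- remove the snapshot automatically after 30 days
);
Running a statement like this on a schedule, with a fresh date suffix in the snapshot name each time, gives you rolling restore points without manual effort.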
Ensure Backup Redundancy: Your backup strategy should not be limited to a single copy in the same location as the source.
It’s always best to have redundant copies of your backups, preferably in diverse geographical locations.
This way, in the event of a site failure, your data remains safe and recoverable. Cloud storage solutions often make this simpler by offering easy automation and redundancy options.
Test Your Backup Strategy: A backup is only as good as a successful restore. Regularly test your recovery procedures to make sure they’re working as expected.
This way, when disaster strikes, you know you have a reliable solution to fall back on. Regular testing also allows you to familiarize yourself with the process so you’re not trying to figure things out in a panic.
7. Manage Access Control
Access Control: Restrict writing and deleting permissions on your BigQuery setup to only those who need them.
Everyone who interacts with your BigQuery data should be trained on the importance of data integrity and the potential drawbacks of data deletion.
Effective access management helps protect your data from unauthorized access and potential breaches.
Moreover, use the principle of least privilege when you assign roles to users.
This means providing only the minimum permissions necessary to accomplish the task at hand.
By setting these boundaries, you reduce the risk of accidental or malicious data manipulation or deletion.
This covers not just direct data deletion but also accidental overwrites by queries or any other changes that may lead to data loss.
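If you manage dataset permissions with SQL, a least-privilege grant might look like the sketch below, assuming a hypothetical reports dataset and analyst account; the roles and members in your project will differ:
GRANT `roles/bigquery.dataViewer`
ON SCHEMA reports
TO "user:analyst@example.com";  -- read-only access to the dataset: the user can read data but not delete or overwrite it
Reserve roles that can delete tables, such as BigQuery Data Editor or Data Owner, for the few people who genuinely need them.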
Taking these actions can significantly minimize the risk of future data loss in BigQuery. Remember, proactive protection is the most effective strategy against data loss.
How to Reduce the Scale of the Problem with Deleted Data if It Happens Again
Do not change the queries in your scheduled queries or Google Apps Script. Wait until the next day, restore all the views and tables, and your scheduled queries and Apps Script will then work correctly without modification.
Having a robust data recovery plan in place can help you quickly recover from data loss and minimize its impact.
FAQ
- Can I recover deleted data in Google BigQuery?
Yes. Deleted tables and datasets can be recovered within BigQuery’s time travel window, for example by copying a table snapshot with bq cp or by running UNDROP SCHEMA for a dataset.
- How does point-in-time recovery work in Google BigQuery?
BigQuery’s time travel feature keeps a history of your data for a configurable window of two to seven days, so you can access or restore a table or dataset as it existed at any point within that window.
- How long does it take to restore a dataset in Google BigQuery?
The time it takes depends on the size of the dataset and the amount of data that needs to be restored. It may take anywhere from a few minutes to several hours.