All resources

What Is the REGEXP_EXTRACT_ALL Function in BigQuery?

The REGEXP_EXTRACT_ALL Function in BigQuery returns an array of all substrings that match a regular expression pattern within a given string.

The REGEXP_EXTRACT_ALL Function is useful when multiple patterns appear within the same text field, allowing analysts to extract all relevant matches at once instead of just the first occurrence. It’s a valuable tool for parsing, text mining, and data cleaning tasks involving unstructured data.

Use Cases for REGEXP_EXTRACT_ALL Function in BigQuery

The REGEXP_EXTRACT_ALL Function simplifies working with strings that contain repeated or multiple instances of data points.

  • Extracting Identifiers: Retrieve all product IDs, hashtags, or email addresses within a single text field.
  • Parsing Logs: Capture timestamps, error codes, or event labels from log entries.
  • Data Cleaning: Isolate numeric or keyword patterns from large text blobs.
  • Web Scraping: Extract all URL links, tags, or references from HTML content.

This function saves time by automating pattern-based extractions that would otherwise require multiple steps or manual filtering.

Practical Example of REGEXP_EXTRACT_ALL in BigQuery

Here’s an example of how REGEXP_EXTRACT_ALL works:

SELECT REGEXP_EXTRACT_ALL('order_123, order_456, order_789', r'order_\d+') AS extracted_orders;

Result:

["order_123", "order_456", "order_789"]


This query extracts all substrings that match the regular expression order_\d+. The result is an array of all matching order IDs found within the text. This ensures that every occurrence of the pattern is captured efficiently without overlap.

Key Features and Limitations of REGEXP_EXTRACT_ALL in BigQuery

The REGEXP_EXTRACT_ALL Function is powerful for pattern-based text extraction but comes with some considerations.
Key Features:

  • Extracts multiple non-overlapping substrings matching a regex pattern.
  • Supports advanced regex syntax for precise matching.
  • Returns an array, making it easy to work with multiple results in SQL.

Limitations:

  • Regex complexity can impact performance on large datasets.
  • Overly broad patterns may lead to unexpected matches.
  • Requires familiarity with regular expressions for accurate implementation.

Overall, it provides flexibility and control for analysts working with complex or repetitive text structures.

Best Practices for Testing REGEXP_EXTRACT_ALL Patterns

To use REGEXP_EXTRACT_ALL effectively, it’s essential to design and test regular expressions properly.

  • Test Before Running: Use tools like regex101 or BigQuery’s preview mode to verify patterns.
  • Keep Patterns Specific: Avoid overly generic regex patterns that capture unintended data.
  • Leverage Character Classes: Use [A-Za-z0-9] and quantifiers for precise control.
  • Document Regex Logic: Clearly describe what each pattern is meant to capture.
  • Combine with ARRAY Functions: Use ARRAY_LENGTH() or UNNEST() to further analyze extracted results.

These practices help maintain accuracy, readability, and scalability when implementing regex-based extractions.

Manage Complex Data Extraction with OWOX Data Marts

OWOX Data Marts Cloud helps analysts automate regex-based transformations like REGEXP_EXTRACT_ALL to handle text-heavy or unstructured datasets efficiently. It enables consistent SQL logic across data marts, schedules refreshes, and publishes extracted arrays directly to Google Sheets or BI tools. With centralized governance and no-code automation, OWOX ensures accuracy, scalability, and trust in every regex-driven extraction.

Enable Self-Service Analytics on top of your BigQuery Data
Get Started Free
Glossary terms

Learn more about analytics

Quick & easy explanations of the most important data terms

See all terms →
From the blog

Learn how teams ship analytics faster

Deep dives on data marts, governance, and modern reporting workflows.

See all articles →
What users are saying

Not testimonials. Comment threads.

From people who actually use the product. Each quote is attached to a specific claim.

A1
· re: warehouse integration
KP
Katya P.
BI Manager

Finally, a tool that doesn't ask business users to learn a new dashboarding UI. Our marketing team already knows Sheets. OWOX just delivers the right data.

C3
· re: governance
MR
Marco R.
Head of Data

Joinable data marts concept was the thing that sold us. We can now use the semantic layer without building one.

E7
· re: open source
JC
James C.
Data Analyst

Self-hosted the OSS version on Digital Ocean. Zero vendor lock-in. Contributed a Shopify connector back in week two.

Google Sheets in modern analytics

Google Sheets, powered by governed data marts

Google Sheets were never designed to be a system of record. With OWOX Data Marts, Sheets becomes a trusted analysis layer — powered by governed data marts defined upstream in your warehouse.

Business teams keep the flexibility they love
Data teams retain control over logic and definitions
No more fragile joins duplicated across spreadsheets
See how it works