A Comprehensive Guide to Using Window Functions in BigQuery

icon Get in-depth insights

Modern Data Management Guide

icon Get in-depth insights

Modern Data Management Guide

BigQuery window functions are essential for data analysts, business intelligence professionals, SQL developers, and data engineers. They provide powerful tools for performing advanced calculations and analysis of data. This guide will help you understand the importance of window functions in SQL querying for analytical tasks.

Learn how to perform calculations over a set of rows related to the current row, enhancing your ability to analyze data efficiently.

Window functions allow for detailed insights without collapsing rows into a single output. They simplify complex analyses that would otherwise require subqueries, making your data analysis more efficient and insightful.

Introduction to Window Functions in BigQuery

Window functions are powerful tools that allow you to perform calculations across table rows related to the current row.

Unlike aggregate functions, window functions do not collapse rows into a single output row, making them incredibly useful for analytical tasks.

They enable running totals, moving averages, and ranking without losing the row-level detail.

Why Are They Called Window Functions?

They are called window functions because they perform calculations across a "window" of table rows defined by the user.

This "window" can include a specific range of rows around the current row, creating a dynamic view for each calculation. The window can slide over the dataset, allowing for flexible and detailed analysis.

Full List of Functions That Can Be Used for Window Calculations

BigQuery offers a variety of window functions that allow you to perform complex calculations across rows of data, maintaining the individual row details. These functions can be categorized into three main types: aggregation functions, navigation functions, and numbering functions.

Navigation Functions

Navigation functions in BigQuery allow you to access data from other rows relative to the current row within the same window frame.

These functions enable you to perform tasks like retrieving values from previous or subsequent rows and computing specific percentile values.

Dive deeper with this read

Breaking Down Navigation Functions in BigQuery

Image for article: Breaking Down Navigation Functions in BigQuery

Function Name

    Function Description

FIRST_VALUE

Returns the value of the first row in the current window.

LAST_VALUE

Returns the value of the last row in the current window.

NTH_VALUE

Returns the value of the nth row in the current window.

LEAD

Returns the value of a row at a specified physical offset following the current row within the window.

LAG

Returns the value of a row at a specified physical offset preceding the current row within the window.

PERCENTILE_CONT

Calculates the specified percentile value with linear interpolation within the window.

PERCENTILE_DISC

Calculates the specified percentile value for discrete values within the window.

Numbering Functions

Numbering functions in BigQuery assign unique identifiers to rows within a specified window.

They are essential for tasks such as ranking rows, distributing rows into buckets, and assigning sequential numbers, facilitating detailed data analysis and reporting.

Function Name

    Function Description

RANK

Assigns a rank to each row within the partition, with gaps for ties.

DENSE_RANK

Assigns a rank to each row within the partition, without gaps for ties.

PERCENT_RANK

Computes the relative rank of a row as a percentage within the partition.

CUME_DIST

Calculates the cumulative distribution of a value within the partition.

NTILE

Divides rows into a specified number of buckets and assigns a bucket number to each row.

ROW_NUMBER

Assigns a unique sequential number to each row within the partition.

Dive deeper with this read

Numbering Functions in BigQuery: An In-depth Review

Image for article: Numbering Functions in BigQuery: An In-depth Review

Aggregation Functions

Aggregation functions in BigQuery perform calculations on a set of values to return a single scalar value.

They are used for summarizing data, such as computing totals, averages, and statistical measures across rows within a window.

Function Name

    Function Description

ANY_VALUE

Returns the value from any random row in the group.

ARRAY_AGG

Aggregates the values from the group into an array.

AVG

Calculates the average of non-NULL values in the group.

CORR

Computes the Pearson correlation coefficient between two columns, with the first column as the dependent variable.

COUNT

Counts the number of rows in the group.

COUNTIF

Counts the rows where the specified condition is TRUE.

COVAR_POP

Calculates the population covariance between pairs of numbers.

COVAR_SAMP

Calculates the sample covariance between pairs of numbers.

MAX

Returns the highest value from the non-NULL expressions in the group.

MIN

Returns the lowest value from the non-NULL expressions in the group.

ST_CLUSTERDBSCAN

Clusters geographies using the DBSCAN algorithm and assigns a cluster number to each row.

STDDEV_POP

Computes the population standard deviation of the values.

STDDEV_SAMP

Computes the sample standard deviation of the values.

STRING_AGG

Concatenates the non-NULL values in the group into a single STRING value.

SUM

Sums up the non-NULL values in the group.

VAR_POP

Computes the population variance of the values.

VAR_SAMP

Computes the sample variance of the values.

Dive deeper with this read

A Closer Look at BigQuery Aggregate Functions

Image for article: A Closer Look at BigQuery Aggregate Functions

A BigQuery Example of a Window Function Call

To demonstrate a BigQuery window function in action, let's consider an example involving sales employees in a company.

We have a table "EmployeeSales" with data as follows.

We'll use this JSON dataset below to illustrate how window functions can be used to perform advanced data analysis.

{'employee': 'Alice', 'region': 'North', 'sales': 150}
{'employee': 'Bob', 'region': 'North', 'sales': 200}
{'employee': 'Charlie', 'region': 'South', 'sales': 300}
{'employee': 'David', 'region': 'South', 'sales': 100}
{'employee': 'Eve', 'region': 'East', 'sales': 250}
{'employee': 'Frank', 'region': 'East', 'sales': 350}
{'employee': 'Grace', 'region': 'West', 'sales': 400}
{'employee': 'Hank', 'region': 'West', 'sales': 450}
{'employee': 'Ivy', 'region': 'West', 'sales': 300}

Here, we will use the RANK() window function to rank employees by their sales within each region.

SELECT
  employee,
  region,
  sales,
  RANK() OVER (
   PARTITION BY region 
   ORDER BY sales 
   DESC) AS sales_rank
FROM
  `owox-analytics.myDataset.employeeSales`;

Here:

  • SELECT: We select the employee, region, and sales columns to include them in our results.
  • RANK() OVER (PARTITION BY region ORDER BY sales DESC): This window function ranks employees within each region based on their sales, in descending order. The PARTITION BY clause ensures that ranking restarts within each region.
  • FROM owox-analytics.myDataset.employeeSales: Specifies the source table for our query.

In this example, we used the RANK() window function to rank sales employees within each region based on their sales performance. This powerful feature enables detailed and dynamic analysis without collapsing the data into summary rows, thus maintaining the granular details of each employee’s performance.

table

Access BigQuery Data at Your Fingertips

Make BigQuery corporate data accessible for business users. Easily query data, run reports, create pivots & charts, and enjoy automatic updates

Elevate Your Analytics

Understanding the Difference Between GROUP BY and Window Functions

GROUP BY aggregates data into a single row per group, summarizing data points like sums or averages. In contrast, window functions perform calculations across rows while maintaining the original row structure. You can compute cumulative totals, moving averages, and ranks without losing detailed row-level information.

GROUP BY is for summary results, while window functions allow for detailed, row-wise analysis. Window functions enable complex analytical tasks like running totals and moving averages, enhancing your data insights.

Syntax of BigQuery Window Functions Declaration

BigQuery window functions follow a specific syntax that allows for flexible and powerful data analysis. Understanding this syntax is crucial for effectively leveraging window functions in your queries. Below is the detailed structure of how to declare window functions in BigQuery.

OVER() Clause

The OVER() clause indicates that you're using a window function. You'd write your query as usual and include the aggregates you want alongside the other column names. Each aggregate is identified with the OVER() clause.

Example:

Suppose you wanted to extract employee sales results and show the average, highest, and lowest sales for each employee within the company.

Your query would look like this:

SELECT
  employee,
  sales,
  AVG(sales) OVER() AS average_sales,
  MIN(sales) OVER() AS lowest_sales,
  MAX(sales) OVER() AS highest_sales
FROM `owox-analytics.myDataset.employeeSales`;

Here:

  • SELECT: Selects the columns employee and sales from the EmployeeSales table.
  • AVG(sales) OVER(): Computes the average sales across the entire dataset.
  • MIN(sales) OVER(): Finds the lowest sales figure across the entire dataset.
  • MAX(sales) OVER(): Finds the highest sales figure across the entire dataset.
  • FROM owox-analytics.myDataset.employeeSales: Specifies the table from which the data is being queried.

Using the OVER() clause allows us to compute aggregate values such as average, minimum, and maximum sales without collapsing the data into a single row. This enables us to display these aggregate values alongside each employee's individual sales data.

PARTITION BY Clause

The PARTITION BY expression is straightforward to understand. It divides the rows into separate chunks (partitions), and the window function is then independently evaluated over each of these partitions. You can even use multiple expressions to partition on multiple fields at once.

Example:

Let's use an example to demonstrate the PARTITION BY clause, this time focusing on employee salary metrics within different company departments.

SELECT
  employee_id,
  department,
  salary,
  SUM(salary) 
  OVER(PARTITION BY department) AS total_salary_by_department,
  RANK() OVER(
   PARTITION BY department 
   ORDER BY salary DESC) AS rank_within_department
FROM `owox-analytics.myDataset.employee_data`;

Here:

  • SELECT: Selects the columns employee_id, department, and salary from the employee_data table.
  • SUM(salary) OVER(PARTITION BY department): Calculates the total salary for each department by partitioning the data by department.
  • RANK() OVER(PARTITION BY department ORDER BY salary DESC): Ranks employees within each department based on their salary in descending order.
  • FROM owox-analytics.myDataset.employee_data: Specifies the table from which the data is being queried.

In this example, the PARTITION BY clause creates partitions based on departments. This allows for a detailed employee salary analysis within departments, helping managers manage department budgets.

ORDER BY Clause

The ORDER BY clause within the OVER() function uses a different type of sliding window. When you use OVER(ORDER BY column_name), the window includes only those rows where the value of the specified column is less than or equal to that column’s value in the current row. This is particularly useful for calculating running totals and moving averages.

Example:

Let’s use an example involving cumulative sales in a retail environment. We have a table called daily_sales that records sales transactions by date.

sales_date

transaction_id

amount

2023-09-01

101

150.00

2023-09-01

102

200.00

2023-09-02

103

50.00

2023-09-03

104

300.00

2023-09-04

105

400.00

2023-09-05

106

250.00

2023-09-06

107

500.00

In this example, we want to calculate the cumulative sales amount by date.

SELECT
  sales_date,
  transaction_id,
  amount,
  SUM(amount) 
  OVER(ORDER BY sales_date) 
  AS cumulative_sales
FROM `owox-analytics.myDataset.daily_sales`;

Here:

  • SELECT: Selects the columns sales_date, transaction_id, and amount from the daily_sales table.
  • SUM(amount) OVER(ORDER BY sales_date): Calculates the cumulative sales amount up to each date by summing the amount column, ordered by sales_date.
  • FROM owox-analytics.myDataset.daily_sales: Specifies the source table.

Using the ORDER BY clause within the OVER() function allows for calculating cumulative metrics, such as running totals. In this example, it helps us track cumulative sales over time, providing a clear view of sales growth.

pipeline

Explore BigQuery Data in Google Sheets

Bridge the gap between corporate BigQuery data and business decisions. Simplify reporting in Google Sheets without manual data blending and relying on digital analyst resources availability

Simplify Analytics Now

Simple Practical Examples of Using Window Function Calls in BigQuery

Window functions in BigQuery enable powerful and flexible data analysis by performing calculations across sets of table rows. Below are some practical examples demonstrating the use of window functions in BigQuery.

Grand Total

The SUM() window function can be used to calculate the grand total of a specific column across all rows in a table. This is useful for obtaining a cumulative sum that spans the entire dataset without aggregating the data into a single row.

Example:

In this example, we will compute the grand total of sales for all departments within a company.

Here’s the sample data from the department_sales table:

item

sales

department

Laptop

150

Electronics

Tablet

120

Electronics

Smartphone

200

Electronics

Printer

80

Electronics

Gaming Console

90

Electronics

SELECT 
  item, 
  sales, 
  department, 
  SUM(sales) OVER() AS total_sales 
FROM `owox-analytics.myDataset.department_sales`;

Here:

  • SELECT: Selects the columns item, sales, and department from the DepartmentSales table.
  • SUM(sales) OVER(): Calculates the grand total of sales across all rows in the dataset.
  • FROM owox-analytics.myDataset.department_sales: Specifies the source table for the data.

This example demonstrates how to use the SUM() window function with the OVER() clause to compute the grand total of sales across all departments in a company. Maintaining the detailed data for each item allows us to analyze individual and total sales simultaneously.

Subtotal

The SUM() window function can compute subtotals for each category within a table. Using the PARTITION BY clause, the function calculates the sum of values for each partition. This is useful for summarizing data within specific groups while keeping individual row details.

Example

In this example, we will compute the subtotal of sales for each department within a company. This helps us understand each department's sales performance while maintaining detailed data for individual items.

Here’s the sample data from the department_sales table:

item

sales

department

Laptop

20

Electronics

Printer

15

Electronics

Mobile

30

Electronics

Tablets

25

Electronics

Gaming Console

10

Electronics

Monitor    

5

Electronics

SELECT 
  item, 
  sales, 
  department, 
  SUM(sales) OVER (
    PARTITION BY department
    ORDER BY sales
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS total_sales
FROM `owox-analytics.myDataset.department_sales`;

Here:

  • SELECT: Selects the columns item, sales, and department from the DepartmentSales table.
  • PARTITION BY department: Divides the result set into partitions based on each department.
  • ORDER BY sales: Orders the rows within each partition by sales.
  • ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Ensures the sum includes all rows within the partition.
  • FROM owox-analytics.myDataset.department_sales: Specifies the source table for the data.

This example demonstrates how to use the SUM() window function with the OVER() clause to compute subtotals for each department. By partitioning the data by department, we can calculate the total sales for each department while maintaining detailed sales data for individual items.

Cumulative Sum

The SUM() window function can compute a cumulative sum for each category within a table. By using the PARTITION BY and ORDER BY clauses, the function calculates the running total of values for each partition, ordered by a specified column.

Example

Let's use an example involving cumulative revenue for different product types within a company.

product_type

product

sales_date    

revenue

Electronics

Laptop

2023-09-01

1000.00

Electronics

Printer

2023-09-01

200.00

Electronics

Tablet

2023-09-02

500.00

Furniture

Desk

2023-09-03

300.00

Furniture

Chair

2023-09-04

150.00

Furniture

Sofa

2023-09-05

700.00

SELECT 
  product_type, 
  product, 
  sales_date, 
  revenue, 
  SUM(revenue) OVER (
    PARTITION BY product_type 
    ORDER BY sales_date 
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_revenue
FROM `owox-analytics.myDataset.product_revenue`;

Here:

  • SELECT: Selects the columns product_type, product, sales_date, and revenue from the ProductRevenue table.
  • PARTITION BY product_type: Divides the result set into partitions based on each product type.
  • ORDER BY sales_date: Orders the rows within each partition by sales date.
  • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Ensures the sum includes all rows from the start of the partition up to the current row.
  • FROM owox-analytics.myDataset.product_revenue: Specifies the source table for the data.

This example demonstrates using the SUM() window function with the OVER() clause to compute cumulative revenue for each product type. By partitioning the data by product type and ordering it by sales date, we can track the running total of revenue for each product type

Moving Average

The AVG() window function is useful for calculating a moving average, which helps analyze trends over a specific range of rows.

This technique is particularly valuable for identifying patterns and smoothing out short-term fluctuations in data.

Example:

This example calculates the moving average salary for employees based on their employee IDs.

employee_name

department    

employee_id

salary

Alice

HR

1

5000

Bob

HR

2

5500

Charlie

IT

3

6000

David    

IT

4

6500

Eve

IT

5

7000

SELECT
  employee_name,
  department,
  salary,
  AVG(salary) 
  OVER (
   ORDER BY employee_id 
   ROWS BETWEEN 2 
   PRECEDING AND CURRENT ROW) 
   AS moving_avg_salary
FROM `owox-analytics.myDataset.employees`;

Here:

  • SELECT: Selects the columns employee_name, department, and salary from the employees table.
  • AVG(salary) OVER (ORDER BY employee_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW): Calculates the moving average salary over the current row and the two preceding rows, ordered by employee_id.
  • FROM owox-analytics.myDataset.employees: Specifies the source table for the data.

This query computes the moving average salary for each employee, providing insights into salary trends and fluctuations within the organization over time.

Number of Items within a Range

The COUNT() window function is effective for determining the number of items within a specific range or partition. This is particularly useful for understanding the distribution and size of groups within your data set, such as counting the number of employees in each department.

Example

This example calculates the number of employees within each department.

employee_name

department    

employee_id

salary

Alice

HR

1

5000

Bob

HR

2

5500

Charlie

IT

3

6000

David    

IT

4

6500

Eve

IT

5

7000

SELECT
  employee_name,
  department,
  salary,
  COUNT(*) OVER (
  PARTITION BY department) 
  AS num_employees
FROM employees;

Here:

  • SELECT: Selects the columns employee_name, department, and salary from the employees table.
  • COUNT(*) OVER (PARTITION BY department): Counts the number of employees within each department.
  • FROM owox-analytics.myDataset.employees: Specifies the source table for the data.

This query calculates the number of employees in each department, clearly understanding departmental size and distribution.

Rank Items

The RANK() window function assigns a rank to each row within a partition of a result set based on a specified order. This function is particularly useful for identifying items' relative standing. It helps highlight top performers and compare values within grouped data.

Example

This example ranks employees within each department based on their salary in descending order.

employee_name

department    

employee_id

salary

Alice

HR

1

5000

Bob

HR

2

5500

Charlie

IT

3

6000

David    

IT

4

6500

Eve

IT

5

7000

SELECT
  employee_name,
  department,
  salary,
  RANK() OVER (
   PARTITION BY department 
   ORDER BY salary DESC) AS rank
FROM employees;

Here:

  • SELECT: Selects the columns employee_name, department, and salary from the employees table.
  • RANK() OVER (PARTITION BY department ORDER BY salary DESC): Assigns a rank to each employee within their department based on their salary in descending order.
  • FROM owox-analytics.myDataset.employees: Specifies the source table for the data.

This query ranks employees within each department by their salary, helping to identify top earners and overall salary distribution.

report-v2

Make Your Corporate BigQuery Data Smarter in Sheets

Transform Google Sheets into a dynamic data powerhouse for BigQuery. Visualize your data for wise, efficient, and automated reporting

Transform Your Reporting

Number Rows

The ROW_NUMBER() window function assigns unique sequential integers to rows within a partition of a result set. This is useful for easier referencing and detailed analysis of individual rows, particularly when you need a clear identifier for each row within grouped data.

Example

This example demonstrates assigning row numbers to sales records within different sales regions for better organization and analysis.

sales_person

region

sales_id

revenue

John

North

1

1000

Jane

North

2

1200

Joe

South

3

1100

Jill

South

4

900

Jack

East

5

1500

Jenny

East

6

1300

SELECT
  sales_person,
  region,
  revenue,
  ROW_NUMBER() OVER (
   PARTITION BY region 
   ORDER BY sales_id) AS row_num
FROM `owox-analytics.myDataset.sale_person_data`;

Here:

  1. SELECT: Selects the columns sales_person, region, and revenue from the sales table.
  2. ROW_NUMBER() OVER (PARTITION BY region ORDER BY sales_id): Assigns a unique sequential number to each sales record within their region, ordered by sales_id.
  3. FROM owox-analytics.myDataset.sale_person_data: Specifies the source table for the data.

This query assigns row numbers to sales records within each region, providing a unique identifier for each row. This helps in organizing and referencing sales data within different regions for detailed analysis.

Run Percentile Calculations

The PERCENT_RANK() window function is used to determine percentiles within a dataset, which is useful for finding the relative standing of values, such as the 90th percentile of test scores. This can help identify the distribution of data and highlight key performance metrics.

Example

In this example, we calculate the percentile rank of sales revenue for each sales person within their region.

sales_person

region

sales_id

revenue

John

North

1

1000

Jane

North

2

1200

Joe

South

3

1100

Jill

South

4

900

Jack

East

5

1500

Jenny

East

6

1300

SELECT
  sales_person,
  region,
  revenue,
  PERCENT_RANK() OVER (
   PARTITION BY region 
   ORDER BY revenue) AS rank
FROM `owox-analytics.myDataset.sale_person_data`;

Here:

  • SELECT: Selects the columns sales_person, region, and revenue from the sales table.
  • PERCENT_RANK() OVER (PARTITION BY region ORDER BY revenue): Calculates the percentile rank of each sales person’s revenue within their region.
  • FROM owox-analytics.myDataset.sale_person_data: Specifies the source table for the data.

This query computes the percentile rank of each salesperson’s revenue within their region, providing insights into how individual revenues compare to others in the same region.

Lead and Lag Analysis

The LEAD() and LAG() window functions allow for comparing current values with previous and future values, which is useful for detecting changes or growth, such as in financial or event sequence data.

Example

In this example, we analyze the previous and next sales revenue for each sales person within their region.

sales_person

region

sales_id

revenue

John

North

1

1000

Jane

North

2

1200

Joe

South

3

1100

Jill

South

4

900

Jack

East

5

1500

Jenny

East

6

1300

SELECT
  sales_person,
  region,
  revenue,
  LAG(revenue, 1) 
  OVER (P
   ARTITION BY region 
   ORDER BY revenue) 
   AS prev_revenue,
  LEAD(revenue, 1) 
  OVER (
   PARTITION BY region 
   ORDER BY revenue) 
   AS next_revenue
FROM 
   `owox-analytics.myDataset.sale_person_data`;

Here:

  • SELECT: Selects the columns sales_person, region, and revenue from the sales table.
  • LAG(revenue, 1) OVER (PARTITION BY region ORDER BY revenue): Accesses the revenue of the previous row within the same region, ordered by revenue.
  • LEAD(revenue, 1) OVER (PARTITION BY region ORDER BY revenue): Accesses the revenue of the next row within the same region, ordered by revenue.
  • FROM owox-analytics.myDataset.sale_person_data: Specifies the source table for the data.

This query provides insights into each salesperson's previous and next sales revenue within their region, allowing for a comparative analysis of revenue progression.

First and Last Value Identification

The FIRST_VALUE() and LAST_VALUE() window functions return the first and last values in an ordered partition. This is useful for identifying key events, such as the first and last sales date of a product.

Example

Here is a table with data we will use to analyze and demonstrate the functions.

sales_person

region

sales_id

revenue

John

North

1

1000

Jane

North

2

1200

Joe

South

3

1100

Jill

South

4

900

Jack

East

5

1500

Jenny

East

6

1300

With the following syntax, we identify each region's first and last sales revenue.

SELECT
  sales_person,
  region,
  revenue,
  FIRST_VALUE(revenue) 
  OVER (PARTITION BY region ORDER BY revenue) 
  AS first_revenue,
  LAST_VALUE(revenue) 
  OVER (PARTITION BY region ORDER BY revenue) 
  AS last_revenue
FROM `owox-analytics.myDataset.sale_person_data`;

Here:

  • SELECT: Selects the columns sales_person, region, and revenue from the sales table.
  • FIRST_VALUE(revenue) OVER (PARTITION BY region ORDER BY revenue): Retrieves the first revenue within each region.
  • LAST_VALUE(revenue) OVER (PARTITION BY region ORDER BY revenue): Retrieves the last revenue within each region.
  • FROM owox-analytics.myDataset.sale_person_data: Specifies the source table for the data.

This query identifies the first and last sales revenue within each region, providing a range of revenues within the region to better understand revenue distribution.

Time-to-Event Analysis

The LEAD() function can be used to calculate the following time for an event, such as time to purchase after a marketing campaign. This is useful for understanding the effectiveness of events or campaigns.

Example

In this example, we calculate the time to the next promotion for employees within their department.

SELECT
  employee_name,
  department,
  promotion_date,
  LEAD(promotion_date, 1) 
  OVER (
   PARTITION BY employee_name 
   ORDER BY promotion_date) 
   AS time_to_next_promotion
FROM `owox-analytics.myDataset.employee_promotions`;

Here:

  • SELECT: Selects the columns employee_name, department, and promotion_date from the employee_promotions table.
  • LEAD(promotion_date, 1) OVER (PARTITION BY department ORDER BY promotion_date): Calculates the date of the next promotion.
  • FROM owox-analytics.myDataset.employee_promotions: Specifies the source table for the data.

This query calculates the date to the next promotion for employees within their department, providing insights into promotion timelines and the effectiveness of career advancement strategies.

Calculate Standard Deviation

The STDDEV() window function computes the standard deviation of values within a partition, which is useful for understanding data variability, such as salary distributions within a department.

Example

In this example, we calculate the standard deviation of salaries within each department to understand salary variability.

SELECT
  employee_name,
  department,
  salary,
  STDDEV(salary) 
  OVER (PARTITION BY department) 
  AS salary_stddev
FROM `owox-analytics.myDataset.employees`;

Here:

  • SELECT: Select the columns employee_name, department, and salary from the employees' table.
  • STDDEV(salary) OVER (PARTITION BY department): Computes the standard deviation of salaries within each department.
  • FROM owox-analytics.myDataset.employees: Specifies the source table for the data.

This query calculates the standard deviation of salaries within each department, providing insights into salary variability and helping to identify departments with higher salary dispersion.

pipeline

Make Your Corporate BigQuery Data Smarter in Sheets

Transform Google Sheets into a dynamic data powerhouse for BigQuery. Visualize your data for wise, efficient, and automated reporting

Transform Your Reporting

Advanced Use Cases with Window Functions in BigQuery

Window functions in BigQuery offer powerful tools for performing complex calculations and analyses over data partitions. These advanced use cases enable deeper insights and more nuanced data interpretations, which are essential for data-driven decision-making.

Using BigQuery Window Functions Between Dates

Many use cases for window functions involve calculating rolling metrics between dates, providing insights into trends over specific time periods. These functions are ideal for tasks like moving averages or cumulative sums within a defined range.

Example

In this example, we will calculate the total sales amount for each product in 3-day and 5-day sliding windows.

product_id

sale_date

sale_amount

1

2023-09-01

100

1

2023-09-02

150

1

2023-09-02

200

1

2023-09-04

250

1

2023-09-05

300

1

2023-09-01

400

2

2023-09-03

500

2

2023-09-06

600

2

2023-09-08

700

SELECT 
  product_id, 
  sale_date, 
  sale_amount, 
  SUM(sale_amount) OVER (
    PARTITION BY product_id 
    ORDER BY UNIX_DATE(sale_date) 
    RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
  ) AS three_day_rolling_sales,
  SUM(sale_amount) OVER (
    PARTITION BY product_id 
    ORDER BY UNIX_DATE(sale_date) 
    RANGE BETWEEN 5 PRECEDING AND CURRENT ROW
  ) AS five_day_rolling_sales
FROM `owox-analytics.myDataset.product_sales`;

Here:

  • SELECT: Selects the columns product_id, sale_date, and sale_amount from the product_sales table.
  • SUM(sale_amount) OVER (PARTITION BY product_id ORDER BY UNIX_DATE(sale_date) RANGE BETWEEN 3 PRECEDING AND CURRENT ROW): Calculates the cumulative sum of sales for each product over the last 3 days, including the current day.
  • SUM(sale_amount) OVER (PARTITION BY product_id ORDER BY UNIX_DATE(sale_date) RANGE BETWEEN 5 PRECEDING AND CURRENT ROW): Calculates the cumulative sum of sales for each product over the last 5 days, including the current day.
  • FROM owox-analytics.myDataset.product_sales: Specifies the source table for the data.

This query uses BigQuery window functions to calculate rolling sales totals within different time intervals. This technique can be extended to other metrics and time intervals to suit various analytical needs.

💡 Want to master date manipulation and analysis in BigQuery? Check out our latest guide for a comprehensive overview of DATE functions, including syntax, usage, and practical examples.

Dive deeper with this read

BigQuery Date Functions Explained

Image for article: BigQuery Date Functions Explained

Add QUALIFY Clause to Use Filters

In many cases, you might want to filter the results of a Window Function. This can be achieved using the QUALIFY clause in BigQuery. The QUALIFY clause filters the results of window functions, including only rows that meet the specified condition. This is particularly useful for refining data analysis and obtaining focused results from complex queries.

Example

In this example, we will filter and keep only the top 3 salespersons based on their total sales in each region.

SELECT
  salesperson,
  region,
  total_sales,
  RANK() OVER (
   PARTITION BY region 
   ORDER BY total_sales DESC) 
   AS position
FROM
  `owox-analytics.myDataset.salesperson_data`
QUALIFY position <= 3;

Here:

  • SELECT: Selects the columns salesperson, region, and total_sales from the sales table.
  • RANK() OVER (PARTITION BY region ORDER BY sales_amount DESC): Calculates the rank of each salesperson within their region based on the sales_amount, in descending order.
  • FROM owox-analytics.myDataset.salesperson_data: Specifies the source table for the data.
  • QUALIFY position <= 3: Filters the results to include only the top 3 salespersons in each region.

This query uses the QUALIFY clause to filter and retain only the top 3 salespersons based on their total sales within each region. By combining the RANK() window function with the QUALIFY clause, we can efficiently filter and analyze top-performing individuals or entities within specific groups.

Applying Named Windows in a Window Frame Clause

Named windows in BigQuery allow you to define a window frame clause once and reuse it across multiple window functions. This improves query readability and maintainability by avoiding redundancy. You can efficiently manage and standardize complex analytical calculations over defined data partitions by applying named windows.

Example

Let's say we want to analyze the sales performance of different products.

SELECT
  product_id,
  sale_date,
  sale_amount,
  LAG(sale_amount) OVER(PARTITION BY product_id ORDER BY sale_date) AS previous_sales,
  FIRST_VALUE(sale_amount) OVER(PARTITION BY product_id ORDER BY sale_date) AS first_sales,
  SUM(sale_amount) OVER(yearly_ordered_sales) AS cumulative_sum_sales,
  AVG(sale_amount) OVER(yearly_ordered_sales) AS cumulative_average_sales
FROM product_sales
WINDOW 
  yearly_ordered_sales AS (PARTITION BY product_id RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);

Here:

  • SELECT: Selects the columns product_id, sale_date, and sales_amount from the product_sales table.
  • LAG(sales_amount) OVER(PARTITION BY product_id ORDER BY sale_date): Retrieves the sales amount of the previous day for each product.
  • FIRST_VALUE(sales_amount) OVER(PARTITION BY product_id ORDER BY sale_date): Retrieves the amount of the first sale within the partition for each product.
  • SUM(sales_amount) OVER(yearly_ordered_sales): Computes the cumulative sum of sales for each product over the years.
  • AVG(sales_amount) OVER(yearly_ordered_sales): Computes the cumulative average sales for each product over the years.
  • WINDOW: Defines named windows yearly_ordered_sales. The yearly_ordered_sales window partitions the data by product_id.

This example demonstrates how to use named windows in BigQuery to efficiently perform multiple window function calculations. Defining and reusing named windows allows you to streamline your queries and maintain consistent analysis across different metrics.

Best Practices to Apply When Using Window Functions in BigQuery

Using window functions in BigQuery can significantly enhance your data analysis capabilities, but it's important to follow best practices to ensure optimal performance and accuracy. Implementing these strategies can lead to more efficient and maintainable queries.

Skip NULL Values in Navigation Functions

When using navigation functions in BigQuery, there are times when NULL values need to be ignored to ensure accurate results. For instance, if you're looking for the first and second available preceding value, you might want to ignore the NULLs. This can be achieved using the IGNORE NULLS value expression.

Example

In this example, we ignore NULL values when looking for each product's previous two non-null sales amounts.

SELECT
  product_id,
  sale_date,
  sale_amount,
  NTH_VALUE(sale_amount, 1 IGNORE NULLS) OVER (PARTITION BY product_id ORDER BY sale_date ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS previous_sales1,
  NTH_VALUE(sale_amount, 2 IGNORE NULLS) OVER (PARTITION BY product_id ORDER BY sale_date ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS previous_sales2
FROM `owox-analytics.myDataset.product_sales`;

This query demonstrates how to use the IGNORE NULLS value expression to skip NULL values in navigation functions, ensuring that the results reflect meaningful data points.

Include Only Necessary Columns and Expressions

When working with window functions, it’s essential to include only the necessary columns and expressions in your queries. This practice minimizes the data processed, reducing computational load and improving query performance. Focusing on relevant data ensures that the results are both efficient and accurate.

Example

In this example, we calculate the average salary within each department while including only the necessary columns.

SELECT
  department,
  salary,
  AVG(salary) OVER (PARTITION BY department) AS avg_salary
FROM `owox-analytics.myDataset.employees`;

By including only the necessary columns (department and salary), this query reduces the data processed and focuses on the relevant information needed to calculate each department's average salary. This practice enhances query performance and ensures efficient data analysis.

Use Filters to Reduce Data Processing

Applying filters before using window functions can significantly reduce the amount of data processed, leading to more efficient and faster queries. By filtering out unnecessary rows early in the query, you can minimize the computational load and focus only on relevant data.

Example

In this example, we filter sales data to include only transactions from date 2023-09-01 before calculating the cumulative sales for each product.

SELECT
  product_id,
  sale_date,
  sale_amount,
  SUM(sale_amount) OVER (PARTITION BY product_id ORDER BY sale_date) AS cumulative_sales
FROM `owox-analytics.myDataset.product_sales`
WHERE sale_date = '2023-09-01';

By applying a filter to include only relevant transactions from 2023-09-01, this query reduces the amount of data processed by the window function, resulting in more efficient and faster computation.

table

Simplify BigQuery Reporting in Sheets

Easily analyze corporate data directly into Google Sheets. Query, run, and automatically update reports aligned with your business needs

Simplify Reporting Now

Prefer Window Functions for Problem-solving

Unlike aggregate functions, window functions allow you to perform calculations across sets of rows related to the current row without collapsing the result into a single output. This makes window functions ideal for tasks such as ranking, running totals, and moving averages, providing detailed insights and maintaining row-level detail.

Example

In this example, we use window functions to calculate running totals of sales amounts for each product.

SELECT
  product_id,
  sale_date,
  sales_amount,
  SUM(sales_amount) 
  OVER (PARTITION BY product_id ORDER BY sale_date) AS running_total
FROM `owox-analytics.myDataset.product_sales`;

This query demonstrates the use of window functions to solve the problem of calculating running totals. By using the SUM() window function, we maintain the granularity of each sale while providing cumulative sales amounts.

Be Explicit When Aliasing Columns and Naming Windows

Being explicit when aliasing columns and naming windows in BigQuery enhances the readability and maintainability of your SQL queries. Clear and descriptive aliases help to understand the purpose of each column and the logic behind window functions, making it easier to debug and optimize your code.

Example

In this example, we will analyze the sales data and use clear aliases and named windows to improve the query's readability and maintainability.

SELECT
  product_id,
  sale_date,
  sale_amount,
  SUM(sale_amount) OVER yearly_sales AS total_yearly_sales,
  AVG(sale_amount) OVER yearly_sales AS average_yearly_sales,
  SUM(sale_amount) OVER three_day_sales_window AS three_day_sales_total
FROM `owox-analytics.myDataset.product_sales`
WINDOW 
  yearly_sales AS (PARTITION BY product_id ORDER BY sale_date RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
  three_day_sales_window AS (PARTITION BY product_id ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW);

By being explicit in aliasing columns and naming windows, this query becomes more readable and easier to maintain. Descriptive aliases indicate the purpose of each calculated column. Named windows like yearly_sales and three_day_sales_window help to simplify the reuse of window definitions, making the query structure clear and logical.

Refer to Documentation for Accuracy

Referring to official documentation is crucial for ensuring accuracy and best practices when working with window functions in BigQuery. The documentation provides detailed information on syntax, function behavior, and edge cases, which helps write correct and efficient queries. Staying updated with the documentation ensures that you leverage the full potential of BigQuery's capabilities and avoid common pitfalls.

Understand the Order of Clause Execution

Understanding the order of clause execution in SQL queries is essential for writing efficient and accurate queries. In BigQuery, the execution order for various clauses (e.g., FROM, WHERE, GROUP BY, HAVING, WINDOW, SELECT, ORDER BY) impacts how the data is processed and filtered. Knowing this order helps in structuring queries correctly and avoiding logical errors.

Example

In this example, we use multiple clauses to filter, group, and order sales data while calculating cumulative sales using window functions.

Understanding the order of clause execution ensures that the query processes data correctly, leading to accurate results. Properly structuring queries according to this order helps avoid logical errors and optimizes performance.

SELECT
  product_id,
  sale_date,
  sale_amount,
  SUM(sale_amount) OVER (PARTITION BY product_id ORDER BY sale_date) AS cumulative_sales
FROM `owox-analytics.myDataset.product_sales`
GROUP BY product_id, sale_date, sale_amount
ORDER BY product_id, sale_date;

Explore the Power of BigQuery Functions

BigQuery offers a rich set of functions beyond window functions. Exploring and mastering these functions can significantly enhance your data analysis capabilities.

  • Conditional Expressions: Used to perform logic-based operations and return values based on specified conditions (e.g., CASE, IF).
  • String Functions: Functions that manipulate and format string data (e.g., CONCAT, SUBSTRING, UPPER, LOWER).
  • Conversion Functions: Convert data from one type to another (e.g., CAST, SAFE_CAST).
  • Array Functions: Functions to create, manipulate, and query arrays (e.g., ARRAY_AGG, UNNEST, ARRAY_LENGTH).
  • Timestamp Functions: Functions that handle and manipulate timestamp data (e.g., CURRENT_TIMESTAMP, TIMESTAMP_ADD, FORMAT_TIMESTAMP).
  • Data Manipulation Language: SQL statements used to manage data within tables (e.g., INSERT, UPDATE, DELETE).
  • Datetime Functions: Functions that work with date and time data types (e.g., CURRENT_DATE, DATE_ADD, EXTRACT).

Discover Advanced Insights with the OWOX BI BigQuery Reports Extension

The OWOX BI BigQuery Reports Extension is a powerful tool designed to help you gain deeper insights from your data stored in BigQuery. This extension enhances your data analysis capabilities by offering advanced reporting features, intuitive data visualization, and seamless integration with your existing BigQuery datasets.

report-v2

Seamless BigQuery Integration in Sheets

Get real-time, automated, and deeply insightful reporting at your fingertips. Connect BigQuery, run reports, and prepare dashboards in your favorite Google Sheets

Enhance Your Sheets Now

Moreover, the OWOX BI BigQuery Reports Extension simplifies creating complex reports by providing pre-built templates and customizable dashboards. This means you can spend less time on data preparation and more time on analysis and strategy.

Whether you're tracking key performance indicators (KPIs), analyzing customer behavior, or monitoring sales performance, the OWOX BI BigQuery Reports Extension provides the tools you need to transform raw data into actionable insights, helping you stay competitive in a data-driven world.

FAQ

Expand all Close all
  • What are window functions in BigQuery?

    Window functions in BigQuery perform calculations across a set of table rows related to the current row without collapsing rows into a single output. They allow for advanced data analysis, such as ranking, running totals, and moving averages, making them ideal for detailed, row-wise data insights.

  • How does the OVER() clause work in BigQuery window functions?

    The OVER() clause defines the window, or set of rows, for a window function to operate on. It specifies how to partition and order the data rows for calculations. Without arguments, it applies the function to all rows; with arguments, it customizes the row set.

  • What is the difference between GROUP BY and window functions in BigQuery?

    GROUP BY aggregates data into a single row per group, summarizing data. Window functions, however, maintain the row structure while performing calculations across related rows. This allows for more detailed, row-level insights while still providing aggregate-like calculations.

  • Can you give an example of a practical use case for window functions in BigQuery?

    A practical use case is calculating a moving average of sales. Using the AVG() function with the OVER() clause, you can compute the average sales over the last few days for each row, providing trends without complex subqueries.

  • What are some common window functions used in BigQuery?

    Common window functions include ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), LAG(), FIRST_VALUE(), LAST_VALUE(), SUM(), AVG(), MIN(), and MAX(). These functions facilitate various analyses like ranking, value navigation, and cumulative calculations.

  • What are the best practices for using window functions in BigQuery?

    Best practices include:

    • Using filters to limit data processed.
    • Including only necessary columns.
    • Skipping NULL values in navigation functions.
    • Being explicit when aliasing columns.
    • Regularly referring to documentation for accurate syntax.
    • These practices enhance query performance and maintainability

icon Get in-depth insights

Modern Data Management Guide

icon Get in-depth insights

Modern Data Management Guide