Wondering how to turn your data into decisions quickly and easily? BigQuery ML lets analysts build, train, and deploy machine learning models directly in their data environment, without complex setups. It offers interactive tutorials and enables SQL users to build models using familiar tools, making machine learning more accessible.
By integrating with Vertex AI, BigQuery ML improves predictive accuracy, simplifies data workflows, and helps organizations efficiently manage machine learning projects at scale.
Note: This blog post was originally published in February 2025 and updated in May 2025 to reflect the latest BigQuery ML features, use cases, and best practices for streamlined machine learning.
BigQuery ML integrates machine learning directly into Google BigQuery, allowing users to create and deploy models without moving data to external services. Because the data never leaves BigQuery, data analysts and machine learning practitioners can develop models faster in an environment they already know.
By avoiding data transfers to other Google Cloud AI services, BigQuery ML simplifies workflows, boosts productivity, and streamlines the process of developing and deploying machine learning models.
BigQuery ML simplifies machine learning by enabling data analysts to build models within Google BigQuery using SQL, working with large datasets directly where they live.
BigQuery ML supports various model types for different use cases, including linear and logistic regression, k-means clustering, time series forecasting (ARIMA), matrix factorization for recommendations, PCA for dimensionality reduction, and remote models trained with Vertex AI.
A model in BigQuery ML represents learned patterns from training data to identify trends and make predictions. To create a model, BigQuery ML uses the CREATE MODEL command, which initiates a Query Job that runs asynchronously within the BigQuery environment.
Prerequisites for BigQuery ML:
#1: GCP Account: Sign up and create a project.
#2: BigQuery API: Enable the BigQuery API in your project via the API Library.
Additionally, ensure that billing is active for your project and that you have the necessary roles – such as BigQuery Data Editor and BigQuery Job User – to create and manage models effectively. Once a model is created, BigQuery runs it as a Query Job. These jobs execute asynchronously, allowing users to continue working in the interface while the job runs in the background.
These roles provide the permissions required to handle data and execute machine learning jobs within BigQuery.
In the BigQuery console, start by selecting or creating a dataset to hold your data and machine learning models.
To add a new dataset, click 'Create dataset' and enter the required information.
For this article, we'll use the pre-cleaned and processed 'Google Analytics Sample' dataset from the BigQuery Public Datasets, adding it to our project for convenience.
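Even when training on a public dataset, the model itself must live in a dataset you own. As a minimal sketch, you can also create that dataset with SQL instead of the console; the name bqml below simply matches the model reference used later in this article, and the US location keeps it co-located with the public Google Analytics sample data:
-- Create a dataset to hold the models built in this article
CREATE SCHEMA IF NOT EXISTS bqml
OPTIONS(
  location = 'US'  -- same multi-region as the BigQuery public datasets
);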
To create a logistic regression model in BigQuery, use the CREATE MODEL statement.
Example:
This example demonstrates how to define and train a model to predict whether a transaction occurred based on user session data from the Google Analytics sample dataset.
CREATE OR REPLACE MODEL `bqml.new_sample_model`
OPTIONS(model_type='logistic_reg',
        input_label_cols=['purchase_made'])  -- tells BigQuery ML which column is the label
AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS purchase_made,  -- Target variable (binary classification)
  IFNULL(device.browser, "") AS browser,                   -- Input feature: Browser type
  device.deviceCategory AS device_category,                -- Input feature: Device category (mobile, desktop, tablet)
  IFNULL(geoNetwork.region, "") AS region,                 -- Input feature: Region
  IFNULL(totals.hits, 0) AS total_hits                     -- Input feature: Total hits count
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20171231'
LIMIT 150000;
Here, model_type='logistic_reg' trains a binary classifier; purchase_made is the target variable (1 if the session included a transaction, 0 otherwise); browser, device_category, region, and total_hits are the input features; and the _TABLE_SUFFIX filter restricts training to sessions from 2017.
To train and deploy a BigQuery ML model, first prepare and load your dataset. Then, use SQL to create and train the model. Reviewing training statistics is essential to understand how the model was trained and how well it performed.
BigQuery ML automatically splits the input data into training and evaluation sets, so you can measure how well the model generalizes and catch overfitting.
Before training, clean and organize your dataset. Create a BigQuery dataset, load relevant data (e.g., GA4), and set the correct data location (US/EU) for compliance. Use exploratory analysis or Google Data Studio to better understand the data before training.
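As an example of this kind of exploratory check, here is a simple profiling query against the Google Analytics sample data used in this article; it shows session and purchase counts per device category:
-- Profile sessions and purchases by device category before training
SELECT
  device.deviceCategory AS device_category,
  COUNT(*) AS sessions,
  COUNTIF(totals.transactions IS NOT NULL) AS sessions_with_purchase
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20171231'
GROUP BY device_category
ORDER BY sessions DESC;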
In BigQuery ML, you can enhance model training with the TRANSFORM clause, which performs feature engineering and data preprocessing directly in SQL as part of model creation. Models trained on well-prepared features are usually more accurate than models trained on raw data alone.
Because the transformations defined in TRANSFORM are stored with the model, they are applied consistently during both training and prediction.
This enables both feature selection and feature engineering, giving the model more relevant, refined inputs. The TRANSFORM clause helps data analysts and BI professionals ensure that their models use the most meaningful data, leading to more accurate predictions and analysis.
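As a rough illustration, here is a minimal sketch of a TRANSFORM clause applied to the Google Analytics sample data used earlier. The model name bqml.sample_model_transformed and the quartile bucketing of total_hits are illustrative choices, not part of the original example:
CREATE OR REPLACE MODEL `bqml.sample_model_transformed`
TRANSFORM(
  purchase_made,                                              -- label column, passed through unchanged
  browser,
  device_category,
  ML.QUANTILE_BUCKETIZE(total_hits, 4) OVER() AS hits_bucket  -- bucket raw hit counts into quartiles
)
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['purchase_made']
) AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS purchase_made,
  IFNULL(device.browser, "") AS browser,
  device.deviceCategory AS device_category,
  IFNULL(totals.hits, 0) AS total_hits
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20171231';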
To train a model in BigQuery ML, start by defining it with a standard SQL query. First, specify the model’s name, then define the features and any necessary data transformations using the TRANSFORM clause. Users can access detailed model information, including training statistics and options, to understand the model's capabilities.
Next, select the model type and set parameters, including the target column. Finally, execute an SQL query to select the input data from a table. The WHERE clause limits the number of rows scanned by the query, optimizing performance and reducing costs.
Model training time varies depending on complexity, from minutes for simple models to hours for more advanced models.
Track progress in the BigQuery UI's TRAINING tab.
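If you prefer SQL to the UI, the same training statistics can be queried with ML.TRAINING_INFO. A minimal sketch for the model created above:
-- Inspect per-iteration training and evaluation loss for the model
SELECT *
FROM ML.TRAINING_INFO(MODEL `bqml.new_sample_model`)
ORDER BY iteration;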
BigQuery automatically calculates key performance metrics based on the model type, such as precision, recall, accuracy, F1 score, log loss, and ROC AUC. You can view these in the EVALUATION tab or via ML.EVALUATE queries.
You can use the ML.EVALUATE function to compare the model's predicted values against actual outcomes, and its results can be further filtered and analyzed with standard SQL queries.
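As a minimal sketch for the logistic regression model created earlier, the query below evaluates it against one month of labeled sessions; the June 2017 range is an arbitrary illustrative choice, and you can omit the second argument to evaluate on the automatically reserved evaluation split instead:
-- Evaluate the model against labeled sessions from June 2017
SELECT *
FROM ML.EVALUATE(MODEL `bqml.new_sample_model`, (
  SELECT
    IF(totals.transactions IS NULL, 0, 1) AS purchase_made,
    IFNULL(device.browser, "") AS browser,
    device.deviceCategory AS device_category,
    IFNULL(geoNetwork.region, "") AS region,
    IFNULL(totals.hits, 0) AS total_hits
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170601' AND '20170630'
));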
These metrics help assess model performance. If the results are unsatisfactory, you can return to the model training step to fine-tune and improve performance.
Once you’re satisfied with your model’s performance, it’s time to use it for predictions on new, unseen data. In BigQuery, prepare a dataset with new data that wasn’t used during training, such as using a date range outside the training period.
Users can apply the trained model to predict metrics like purchases per user or transaction counts for new visitors using unseen data.
For example, predict the probability of the addedToCart event using data from December 31, 2020. Alternatively, you can export the model for deployment in production systems to generate online predictions.
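For the Google Analytics sample model built earlier, a prediction query follows the same pattern as evaluation, but without the label column. This is only a sketch; the December 2016 date range simply illustrates scoring sessions that fall outside the training window:
-- Score previously unseen sessions with the trained model
SELECT *
FROM ML.PREDICT(MODEL `bqml.new_sample_model`, (
  SELECT
    IFNULL(device.browser, "") AS browser,
    device.deviceCategory AS device_category,
    IFNULL(geoNetwork.region, "") AS region,
    IFNULL(totals.hits, 0) AS total_hits
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20161201' AND '20161231'
));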
BigQuery ML supports built-in models (e.g., linear regression, K-means) trained within BigQuery, and external models like deep neural networks via Vertex AI. By eliminating data movement, BigQuery ML speeds up development and reduces costs.
Pricing depends on the model type and usage, either through editions or on-demand. Prediction and evaluation functions are executed within BigQuery, with costs based on the model type and resources used.
BigQuery ML offers Enterprise and Enterprise Plus Editions with slot-based pricing for predictable costs on dedicated resources. Pricing and resource management vary between built-in and external models. By using familiar SQL tools, teams can manage resources efficiently and reduce costs.
Built-in Models: Built-in models like linear regression and K-means use shared BigQuery slots, enabling efficient resource use across ML and analytical tasks.
External Models: External models require ML_EXTERNAL reservations to prevent conflicts and ensure cost parity with BigQuery and other Google Cloud services.
BigQuery ML on-demand pricing is determined by the type of operation, including model creation, evaluation, inspection, and prediction. For iterative models, the CREATE MODEL statement is limited to 50 iterations, applying to both on-demand and edition pricing.
For time series models with auto-arima enabled, multiple candidate models are evaluated during training. The number of models can be controlled by the AUTO_ARIMA_MAX_ORDER setting. This impacts model creation costs, but for evaluation, inspection, and prediction, only the selected model is used, following standard query pricing.
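For illustration, a minimal sketch of capping that search space on a hypothetical daily time series table; the table name demo.daily_rides, its columns, and the model name are assumptions, not part of the original example:
-- Limit the ARIMA order search to reduce CREATE MODEL cost
CREATE OR REPLACE MODEL `demo.daily_rides_forecast`
OPTIONS(
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'service_date',
  time_series_data_col = 'total_rides',
  auto_arima_max_order = 3  -- fewer candidate (p, d, q) orders are evaluated during training
) AS
SELECT service_date, total_rides
FROM `demo.daily_rides`;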
BigQuery ML enables customers to create remote models using Vertex AI, Cloud AI APIs (Vision, NLP, Translate), or custom prediction endpoints. It integrates with Google Cloud resources for remote training and predictions. Pricing includes BigQuery processing costs and additional charges for services like text generation, translation, or image annotation.
For supervised tuning with Vertex AI LLMs, costs include BigQuery data processing, but GPU/TPU usage is billed separately. Use billing labels like billing_service = ‘bigquery_ml’ to track these charges across services.
For certain BigQuery ML model types, the complexity of the underlying algorithms makes it difficult to estimate resource usage up front, so the bytes processed for billing may only be calculated after the training process is complete.
After training, users can review resource usage and billing details through the model's statistics and metadata to understand any adjustments that were made.
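One way to review those figures afterwards is to query the jobs metadata for CREATE MODEL statements. This is a sketch that assumes your jobs run in the US region and that you have permission to read project-level job history:
-- Review bytes processed and billed for recent CREATE MODEL jobs
SELECT
  job_id,
  creation_time,
  total_bytes_processed,
  total_bytes_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE statement_type = 'CREATE_MODEL'
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY creation_time DESC;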
To manage costs in BigQuery ML, choose the right pricing model, use dry runs for cost estimates, and keep training efficient. Editions can help maintain predictable pricing, and simplifying queries or limiting training iterations can also reduce costs with little impact on model quality.
Choose BigQuery ML Enterprise or Enterprise Plus Editions for predictable, slot-based pricing over on-demand. These editions help avoid unexpected costs and speed up development by ensuring consistent resource availability. Built-in models share BigQuery slots, while external models use ML_EXTERNAL reservations for efficient resource management and cost parity with other Google Cloud services.
Use dry runs to estimate data processed and potential costs before running queries or training models. This helps catch costly mistakes early, manage resources efficiently, and avoid unexpected expenses. Make sure your project ID is set correctly to track costs and resource usage during dry runs.
Register models in the Vertex AI Model Registry to track, compare, and manage versions efficiently. This avoids unnecessary retraining, reduces costs, and improves model lifecycle control in BigQuery ML. Make sure your Google Cloud account has the right permissions and billing setup to manage registrations effectively.
Vertex AI is Google’s platform for developing, deploying, and managing AI/ML models. It integrates with BigQuery to prepare training data, manage features in the Vertex AI Feature Store, and support various training methods like AutoML and custom training. Models trained in BigQuery ML can be registered in Vertex AI for easier management and deployment.
Benefits of Managing BigQuery ML Models in Vertex AI
Register BigQuery ML models in the Vertex AI Model Registry to manage, version, and deploy them without exporting. This enables online predictions through a unified interface, with no separate serving container needed. The query below shows how to register a BigQuery ML model in Vertex AI.
Before registering BigQuery ML models in Vertex AI, enable the Vertex AI API in your Google Cloud project. You can do this via the APIs & Services page, the BigQuery UI, or by running the gcloud CLI command below.
gcloud --project PROJECT_ID services enable aiplatform.googleapis.com
To register BigQuery ML models in the Vertex AI Model Registry, the service or user account must have the Vertex AI Administrator role (roles/aiplatform.admin). You can grant this role using the Google Cloud CLI.
For a service account, use the following command:
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=serviceAccount:SERVICE_ACCOUNT_EMAIL \
    --role=roles/aiplatform.admin --condition=None
For a user account, use the following command:
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=user:USER_ACCOUNT --role=roles/aiplatform.admin \
    --condition=None
Use the MODEL_REGISTRY option in CREATE MODEL to register a BigQuery ML model in Vertex AI. Add VERTEX_AI_MODEL_ID and VERTEX_AI_MODEL_VERSION_ALIASES for version control. Once registered, models can be deployed and monitored with support from Vertex Explainable AI.
The following example shows how to create a time series forecasting model and register it during creation. Here’s a step-by-step guide:
Step 1: Import Time Series Data
First, upload the dataset to BigQuery and create a table. You can also use an existing dataset.
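One way to do the upload in SQL is with a LOAD DATA statement. This is a sketch under assumed inputs: the Cloud Storage path is a placeholder, and the column list assumes a simple daily-ridership CSV matching the table used in the next step:
-- Load a CSV from Cloud Storage into the time series table
LOAD DATA OVERWRITE `demo.transportation_revenue_insights`
(
  service_date DATE,
  total_rides INT64
)
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://my-bucket/cta_ridership.csv']  -- hypothetical bucket and file path
);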
Step 2: Create a Time Series Model
Using BigQuery ML syntax, we will create an ARIMA time series model:
CREATE OR REPLACE MODEL
  `demo.transportation_revenue_insights`
OPTIONS(MODEL_TYPE='ARIMA',
        TIME_SERIES_TIMESTAMP_COL='service_date',
        TIME_SERIES_DATA_COL='total_rides',
        HOLIDAY_REGION='us',
        MODEL_REGISTRY='vertex_ai',
        VERTEX_AI_MODEL_ID='cta_ridership_forecast') AS
SELECT
  service_date, total_rides
FROM
  `demo.transportation_revenue_insights`;
Here, MODEL_TYPE='ARIMA' selects a time series model; TIME_SERIES_TIMESTAMP_COL and TIME_SERIES_DATA_COL identify the date and metric columns; HOLIDAY_REGION='us' lets the model account for US holiday effects; and MODEL_REGISTRY='vertex_ai' with VERTEX_AI_MODEL_ID='cta_ridership_forecast' registers the trained model in the Vertex AI Model Registry under that ID.
Step 3: Evaluate the Model
Once the model is trained, evaluate it with this query:
SELECT
  *
FROM
  ML.EVALUATE(MODEL `demo.transportation_revenue_insights`);
This returns model parameters, evaluation metrics, and the best ARIMA model based on AIC.
Step 4: Forecast with the Model
Use the ML.FORECAST function to forecast ridership for the next 7 days:
SELECT
  *
FROM
  ML.FORECAST(MODEL `demo.transportation_revenue_insights`,
              STRUCT(7 AS horizon));
This query generates a 7-day forecast along with confidence intervals.
By following these steps, you not only create and evaluate a time series model but also register it in Vertex AI for easy management and deployment.
To register an existing BigQuery ML model in the Vertex AI Model Registry, use the ALTER MODEL statement.
ALTER MODEL [IF EXISTS] <model_name> SET OPTIONS
(vertex_ai_model_id="VERTEX_AI_MODEL_ID");
After the statement runs, the model's options reflect the Vertex AI model ID, confirming that it has been registered.
Once your time series forecasting model is registered in the Vertex AI Model Registry, you can deploy it to predict outcomes like purchases, monitor its performance, and use Vertex Explainable AI to understand the key factors driving forecasts. This helps improve control and transparency over your model’s predictions.
With BigQuery ML models registered in Vertex AI, you can assign a model ID and version aliases for easier management and versioning.
Specify a Vertex AI Model ID
Use the VERTEX_AI_MODEL_ID option when creating a model. It must be lowercase. If not set, the BigQuery ML model ID is used by default. To change it later, you’ll need to delete and recreate the model.
Specify a Vertex AI Model Alias
Use VERTEX_AI_MODEL_VERSION_ALIASES to assign aliases to model versions. This lets you refer to versions easily without using exact IDs.
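Putting both options together, here is a minimal sketch that re-creates the earlier logistic regression model with a Vertex AI model ID and a version alias; the ID 'ga_purchase_classifier' and the alias 'experimental' are illustrative choices:
CREATE OR REPLACE MODEL `bqml.new_sample_model`
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['purchase_made'],
  model_registry = 'vertex_ai',
  vertex_ai_model_id = 'ga_purchase_classifier',        -- must be lowercase
  vertex_ai_model_version_aliases = ['experimental']    -- refer to this version by alias instead of its ID
) AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS purchase_made,
  IFNULL(device.browser, "") AS browser,
  device.deviceCategory AS device_category,
  IFNULL(totals.hits, 0) AS total_hits
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20171231';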
BigQuery ML streamlines model building by letting users execute machine learning models directly within BigQuery using SQL. In contrast, traditional ML often involves complex workflows, data movement, and external tools.
Here is how the two approaches compare: BigQuery ML keeps data in place, relies on SQL that analysts already know, and handles training, evaluation, and prediction inside BigQuery, whereas traditional ML typically requires exporting data, coding in specialized frameworks, provisioning separate training infrastructure, and building deployment pipelines before a model can be used.
By simplifying model creation and deployment within a familiar environment, BigQuery ML offers a more efficient, accessible approach to machine learning compared to traditional workflows.
The OWOX Reports Extension for Google Sheets empowers users to transform raw data into actionable insights with ease. By connecting Google Sheets directly to your data sources, such as Google BigQuery and Google Analytics, it enables users to automate report generation, streamline data analysis, and eliminate the need for manual data imports.
With intuitive dashboards, charts, and reports, business users and data analysts can track key metrics, uncover trends, and make data-driven decisions without leaving Google Sheets. The extension simplifies data management, making it accessible to non-technical users while providing in-depth insights.
BigQuery ML allows users to build and deploy machine learning models directly within Google BigQuery using SQL. It simplifies the machine learning process by letting data analysts and engineers create models on large datasets without moving data, leveraging SQL knowledge to perform predictive analytics.
With BigQuery ML, you can create a variety of models including linear and logistic regression, time series (ARIMA), k-means clustering, matrix factorization for recommendations, PCA for dimensionality reduction, and custom models through external training with Vertex AI for more advanced use cases.
BigQuery ML integrates with Vertex AI for advanced machine learning tasks. You can register BigQuery ML models in the Vertex AI Model Registry to use features like online predictions, model monitoring, automated retraining, and MLOps capabilities, enhancing the scalability and management of machine learning workflows.
Built-in models are trained within BigQuery using SQL, like linear regression or ARIMA, while external models are trained outside BigQuery using services like Vertex AI. External models offer more customization and advanced capabilities, but require more setup and integration with other Google Cloud services.
BigQuery ML offers two pricing models: on-demand (based on bytes processed) and editions (slot-based). Model creation, evaluation, and prediction incur costs based on the data processed and model complexity. External models have additional costs for resources like Vertex AI, with flexible options based on usage.
To reduce costs in BigQuery ML, consider using dry runs to estimate data processed, selecting appropriate pricing models (on-demand or editions), using built-in models when possible, and optimizing queries. Additionally, leveraging Vertex AI's features like model monitoring and retraining can help manage resources efficiently.