CAST AS INT64 is an SQL type conversion that turns a value (like a string or float) into a 64-bit integer so your IDs, keys, and metrics behave predictably in joins, filters, and aggregations.
In real-world analytics data, the “same” field can arrive in different types: a user_id might be a string in event logs, an integer in CRM exports, and a float in a messy CSV. CAST AS INT64 is your move to force consistency. You’re telling SQL: “Treat this value as a 64-bit integer.” That reduces type mismatch errors and makes query logic easier to reason about.
INT64 (often called BIGINT in other SQL dialects) is a signed 64-bit integer type. It’s designed for whole numbers with a large range—useful when identifiers or counters can get big.
How it compares conceptually: FLOAT64 stores approximate decimal values, NUMERIC stores exact decimals, and INT64 stores whole numbers. If your value is meant to be a whole number (especially an identifier), INT64 is usually the safest “standard type” to converge on.
CAST AS INT64 shows up constantly in analytics work because raw data is rarely typed the way you want it. Common scenarios include normalizing join keys, cleaning string-typed CSV imports, and preparing consistent reporting tables.
Type casting is one of those skills that feels small—until it saves your pipeline at 2 a.m. The best part: it’s simple, explicit, and readable when done well.
The basic pattern is:
CAST(expression AS INT64)
Example conceptually: you select a field and cast it into INT64 for downstream use in joins or aggregations.
In practice, you’ll often cast inside SELECT statements, JOIN conditions, and sometimes in subqueries where you’re “normalizing” data before further transformations.
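As a sketch (assuming a BigQuery-style dialect; the raw_events table and its columns are hypothetical example names), a cast inside a SELECT looks like this:

```sql
-- Normalize a string-typed ID to INT64 before downstream joins.
SELECT
  CAST(user_id AS INT64) AS user_id,   -- STRING -> INT64
  event_name,
  event_timestamp
FROM raw_events;
```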
Two of the most common conversions are STRING → INT64 and FLOAT64 → INT64.
When casting a string, SQL attempts to interpret the characters as a number; if they can’t be parsed, the cast raises an error. When casting a float, the fractional component is lost (BigQuery rounds, with halfway cases rounding away from zero, while some other engines truncate), which is why you should be intentional about when you do it.
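For example (BigQuery semantics; the literals are chosen purely for illustration):

```sql
SELECT
  CAST('42' AS INT64) AS from_string,  -- 42
  CAST(3.7 AS INT64)  AS from_float,   -- 4 in BigQuery (rounds); 3 in engines that truncate
  CAST(2.5 AS INT64)  AS halfway;      -- 3 in BigQuery (halfway rounds away from zero)
```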
If you’re moving fast and want help drafting variations safely, it can be useful to generate and optimize SQL queries with AI, then review the casting rules and edge cases before shipping anything to production.
Real data loves being invalid. You’ll see empty strings, “N/A”, whitespace, or unexpected characters in fields that “should” be numeric. Here’s the key: decide what should happen when the value can’t be converted.
Practically, you can protect your queries by pre-validating strings (e.g., checking patterns), using conditional logic to return NULL when invalid, or isolating the cast into a staging step where you can audit failures.
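In BigQuery, SAFE_CAST returns NULL instead of raising an error when a value can’t be converted, which makes it a convenient guard. A sketch of both approaches (table and column names are hypothetical):

```sql
-- Option 1: SAFE_CAST returns NULL for unparseable values (BigQuery).
SELECT SAFE_CAST(user_id AS INT64) AS user_id
FROM raw_events;

-- Option 2: pre-validate with a pattern and return NULL explicitly.
SELECT
  CASE
    WHEN REGEXP_CONTAINS(TRIM(user_id), r'^-?\d+$')
      THEN CAST(TRIM(user_id) AS INT64)
    ELSE NULL
  END AS user_id
FROM raw_events;
```

The second form is more verbose but portable to engines without a safe-cast function, and it makes the validation rule explicit and auditable.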
Casting is powerful—but it’s not a magic wand. You’re changing the data type, and that can change how values behave. The trick is to cast with purpose: you cast to make the data more correct for the job you’re doing.
This is the hero use case. If one table stores order_id as STRING and another stores it as INT64, joining them directly can fail or produce incorrect results depending on the engine.
Common pattern: cast one side so both sides match, ideally to the “true” semantic type. For most identifiers, INT64 is a solid target because it’s compact and comparable.
Also, good key hygiene matters beyond just making joins work. If you’re designing tables or cleaning upstream models, think about key constraints and correct data types so you’re not constantly patching types at query time.
One more note: some IDs are sensitive (customer IDs, internal account numbers). If you’re standardizing types for joins and sharing datasets, consider whether you should obfuscate or protect identifiers first. It’s worth understanding data masking and handling sensitive IDs before you normalize and distribute “clean” keys everywhere.
Sometimes metrics arrive in weird types too: a count field as STRING, a “quantity” as FLOAT, or numeric values embedded in JSON and extracted as text. Casting to INT64 can make aggregations more consistent, especially when a metric is truly a whole number (like items_count, sessions, clicks).
That said, don’t cast everything to INT64 just because you can. If a metric can be fractional (conversion rate, revenue with cents, weighted averages), INT64 is the wrong destination. Use an appropriate numeric type for the level of precision you need.
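A sketch of that distinction (column names are hypothetical):

```sql
SELECT
  CAST(items_count AS INT64)       AS items_count,      -- whole-number metric: INT64 fits
  CAST(revenue AS NUMERIC)         AS revenue,          -- has cents: keep exact decimals
  CAST(conversion_rate AS FLOAT64) AS conversion_rate   -- fractional: not an INT64 candidate
FROM raw_orders;
```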
CAST AS INT64 is strict: it’s an integer type, meaning it can’t represent decimals. That leads to a few classic pitfalls: fractional values silently rounded or truncated, monetary amounts losing their cents, and queries erroring out on strings that can’t be parsed.
The mindset: cast where it improves correctness and reliability, not where it silently changes meaning.
These examples are about the daily grind: mismatched types, imported files, and reporting tables that need to behave. CAST AS INT64 is the small tool that keeps your warehouse logic from going off the rails.
Scenario: Your product events table stores user_id as STRING (because it came from JSON), but your users dimension table stores user_id as INT64. Your join fails or produces no matches.
Fix: cast the string user_id to INT64 in a safe staging step (or directly in the join if you’re exploring).
Example query pattern:
SELECT events.*, users.user_segment
FROM events
JOIN users
  ON CAST(events.user_id AS INT64) = users.user_id
This makes the join condition comparable. In production, you’d typically normalize user_id once (in a cleaned model) rather than casting repeatedly in every downstream query.
Scenario: A CSV export lands in your warehouse with everything typed as STRING: order_id, quantity, and even a “days_to_convert” field. You need a clean table for analysis.
A practical approach is to create a cleaned view/table that casts fields into their intended types:
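A minimal sketch of such a cleaned layer (view, table, and column names are hypothetical; SAFE_CAST is BigQuery-specific):

```sql
-- Cast string-typed CSV fields into their intended types once, in a view.
CREATE OR REPLACE VIEW clean_orders AS
SELECT
  SAFE_CAST(order_id AS INT64)        AS order_id,
  SAFE_CAST(quantity AS INT64)        AS quantity,
  SAFE_CAST(days_to_convert AS INT64) AS days_to_convert
FROM raw_csv_orders;
```

Downstream queries then read from clean_orders and never repeat the casts, and rows where a cast failed surface as NULLs you can audit.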
If your pipeline uses reusable logic (like stored procedures) to load and standardize recurring imports, you may end up applying casts inside that procedural layer. If that’s your world, see patterns for using CAST inside SQL stored procedures to keep transformations consistent and auditable.
Scenario: Your BI tool expects integer fields for relationships and filters, but your dataset contains mixed types (some report pages use “campaign_id” as a number, others as text). The result: broken filters, duplicated “same” values, and relationships that don’t link.
A clean reporting layer usually standardizes types so keys match across tables, filters treat the “same” value as one value, and relationships link reliably.
CAST AS INT64 is often the final “make it behave” step before a table becomes dashboard-ready.
In a data mart, consistency isn’t a nice-to-have. It’s the foundation that makes your joins deterministic, your metrics stable, and your reporting layer trustworthy.
Data marts typically have dimensional modeling patterns: fact tables (events, orders, sessions) joined to dimension tables (users, products, campaigns). Those relationships depend on keys lining up perfectly.
If keys aren’t typed consistently, you’ll see joins that fail or silently drop rows, duplicated “same” values in dimensions, and relationships that don’t link.
That’s why teams standardize keys early and enforce models using well-defined relationships, including primary and foreign keys in SQL. CAST AS INT64 is often part of that standardization step, especially when integrating event data, ad platforms, and CRM exports.
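One simple audit that pairs well with that standardization (table and column names are hypothetical): check that a cast key is still unique and non-null before treating it as a primary key.

```sql
-- Any rows returned indicate failed casts (NULLs) or duplicate keys.
SELECT
  SAFE_CAST(order_id AS INT64) AS order_id,
  COUNT(*) AS occurrences
FROM staging_orders
GROUP BY order_id
HAVING order_id IS NULL OR COUNT(*) > 1;
```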
In data mart workflows, casting to INT64 is a practical move to keep identifiers and counters consistent across layers (staging → marts → reporting). When your join keys and commonly-used metrics have stable types, everything downstream gets faster to build, easier to test, and harder to accidentally break.
Want a cleaner path from raw data to analysis-ready tables? Try OWOX Data Marts and build reporting-friendly datasets where your key fields are standardized early so joins and dashboards don’t turn into a daily debugging sport.