In part 1, we covered the Snowflake platform features that improve query performance: warehouse sizing, caching, materialized views, clustering, the Query Acceleration Service (QAS) and the Search Optimization Service. If you haven’t read that yet, start there.
This installment (part 2) focuses on the query-writing side of Snowflake query tuning. Bad query patterns are often the root cause of slow performance, and no amount of platform configuration fixes a query that’s fundamentally inefficient. We’ll cover four practices that consistently make a difference.
Now let’s get started!
4 best practices for Snowflake query tuning and Optimization
1) Select only the columns you need; skip the SELECT *
This one sounds obvious, but it’s necessary to understand why it matters in Snowflake specifically.
Snowflake stores data in a columnar format. Each column in a micro-partition is compressed and stored independently. When you run SELECT *, Snowflake has to decompress and transfer every column in every micro-partition it touches, even the columns your query ultimately ignores. That’s extra decompression work, extra memory usage and extra data movement between warehouse nodes.
Partition pruning doesn’t save you from this. Even if your WHERE clause prunes 90% of micro-partitions, the 10% that do get scanned still require every column to be decompressed when you use SELECT *. On a table with 50 columns where you only need 5, that’s roughly 10 times the I/O you’d otherwise incur.
Snowflake does apply column pruning when you name specific columns, so the fix is straightforward: list exactly what you need.
Avoid this
SELECT * FROM orders WHERE region = 'East';
Do this instead
SELECT order_id, order_date, order_total FROM orders WHERE region = 'East';
A few related habits worth building:
- Avoid pulling unnecessary columns into WHERE, JOIN, GROUP BY or ORDER BY clauses
- Use table aliases to keep column references precise and readable in multi-table queries
- Consider views or materialized views that expose only the relevant columns to downstream consumers
Note: using SELECT * inside a CTE or a subquery is generally fine, provided the outer query names specific columns. Snowflake’s optimizer is smart enough to push column projections down and avoid scanning columns that the outer query doesn’t reference.
2) Sort less, sort smarter
Sorting in Snowflake is expensive in a specific way. Because Snowflake distributes workloads across multiple worker nodes in a virtual warehouse, sorting a large dataset requires a data shuffle: intermediate results from one node have to move to another node so the global sort can be assembled. That shuffle costs CPU time, memory and network bandwidth. For very large result sets, the intermediate data spills to local SSD or even to remote cloud storage, which compounds the slowdown.
The core advice is simple: don’t add ORDER BY unless you actually need the output to be sorted. This applies especially inside CTEs and subqueries. Snowflake executes ORDER BY in a CTE even though the outer query may entirely override the ordering. That’s wasted compute with no benefit.
Wasteful: this ORDER BY has no effect on the final result
WITH ranked_orders AS ( SELECT order_id, customer_id, order_total FROM orders ORDER BY order_total DESC -- pointless here ) SELECT * FROM ranked_orders LIMIT 10;
Better: push the ORDER BY to where it actually matters
WITH ranked_orders AS ( SELECT order_id, customer_id, order_total FROM orders ) SELECT * FROM ranked_orders ORDER BY order_total DESC LIMIT 10;
When you do need to sort, a few practices reduce the cost:
Filter early. Apply WHERE clause filters before the sort to reduce the number of rows being sorted. The fewer rows going into the sort, the less shuffle and memory pressure you generate.
Always pair ORDER BY with LIMIT when you only need top-N rows. Snowflake pushes the LIMIT down into the sort operation, which means it can stop sorting once it has the top N values rather than sorting the entire result set.
The following query gets you the top 100 spenders without sorting the entire table.
SELECT customer_id, total_spent FROM customer_summary ORDER BY total_spent DESC LIMIT 100;
Pre-sort data at load time for recurring queries. If you frequently query a table with the same ORDER BY column, load the data sorted on that column. Once the data is physically organized that way in micro-partitions, many ORDER BY queries become much cheaper because Snowflake can take advantage of the existing order.
Use clustering keys on high-frequency sort columns. When data is clustered on the column you sort by most often, rows with similar values concentrate in the same micro-partitions. This doesn’t eliminate sorting, but it significantly reduces the number of micro-partitions involved and the amount of data that needs to be shuffled.
3) Prefer joins over correlated subqueries
The general advice to replace subqueries with joins is sound, but it’s worth understanding exactly when and why it applies in Snowflake because Snowflake’s query optimizer already handles a lot of this automatically.
Uncorrelated subqueries execute once and return a fixed value or result set. Snowflake handles these efficiently and they often perform comparably to an equivalent join. Rewriting them as joins is fine but not always necessary.
Correlated subqueries are the real concern. A correlated subquery references a column from the outer query, which means it re-executes for every row the outer query produces. In databases without a decorrelation optimizer, this creates an N+1 scan problem: if your outer query returns 1 million rows, the subquery runs 1 million times.
Snowflake’s query optimizer does attempt to decorrelate correlated subqueries and convert them to hash joins internally. For many standard patterns, this works well. But the optimizer doesn’t handle all cases, and complex or deeply nested correlated subqueries can still force row-by-row execution. The Query Profile is the best way to check: look for a “NestedLoopJoin” operator in the execution plan. If you see one, it’s a sign the subquery wasn’t decorrelated and should be rewritten as an explicit join.
Here’s a concrete example. Say you want order details for all customers in the East region:
Subquery approach: less efficient, harder to read
SELECT o.order_id, o.order_date, o.order_total, c.customer_name, c.customer_region FROM orders o JOIN ( SELECT customer_id FROM customers WHERE customer_region = 'East' ) cids ON o.customer_id = cids.customer_id JOIN customers c ON o.customer_id = c.customer_id ORDER BY o.order_date;
Notice that the subquery approach above actually joins customers twice. That’s a clear sign to simplify:
Direct join: one pass through customers, easier to read and optimize
SELECT o.order_id, o.order_date, o.order_total, c.customer_name, c.customer_region FROM orders o JOIN customers c ON o.customer_id = c.customer_id AND c.customer_region = 'East' ORDER BY o.order_date;
Beyond the correlated vs. uncorrelated distinction, CTEs are worth mentioning. A common table expression (CTE) doesn’t guarantee materialization in Snowflake. The optimizer may inline it, treating it like a subquery. But CTEs improve readability and make it easier to spot redundant logic. If the same subquery appears multiple times in a query, move it into a CTE. Snowflake can then decide whether to compute it once and reuse the result.
4) Use performance monitoring to find what actually needs tuning
Writing better SQL only helps if you know which queries to target. Snowflake provides two main ways to surface slow queries and diagnose what’s going wrong.
Snowsight Query History
The Query History view in Snowsight is the fastest starting point. Sort by duration to surface the slowest queries, then open the Query Profile for any query that looks slow. The Profile shows you the full execution plan with timing per operator, partition scan ratios and any spillage. It takes about 30 seconds to rule out the most common problems.
ACCOUNT_USAGE.QUERY_HISTORY
For systematic analysis across your entire account, query the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view. This view retains data for 365 days and exposes detailed metrics per query execution.
Here is how to find your 20 most expensive queries by total execution time in the past 30 days:
SELECT query_text, query_id, user_name, warehouse_name, ROUND(total_elapsed_time / 1000, 1) AS elapsed_seconds, partitions_scanned, partitions_total, bytes_spilled_to_local_storage, bytes_spilled_to_remote_storage FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP()) AND execution_status = 'SUCCESS' ORDER BY total_elapsed_time DESC LIMIT 20;
The columns to watch most closely:
- partitions_scanned vs. partitions_total: a high ratio means pruning isn’t working well
- bytes_spilled_to_local_storage: the warehouse ran out of memory and wrote to SSD
- bytes_spilled_to_remote_storage: worse than local spillage; the warehouse ran out of SSD too and wrote to cloud storage
If you only have access to the Information Schema rather than Account Usage, the QUERY_HISTORY() table function works similarly but only covers the past 7 days.
Canceling runaway queries
To cancel a query that’s consuming too many resources, use SYSTEM$CANCEL_QUERY. Note: this is a system function, not a stored procedure:
SELECT SYSTEM$CANCEL_QUERY('<query_id>');
You can find the query ID from the Query History view or from the QUERY_HISTORY results.
Setting query timeouts
To prevent any single query from monopolizing a warehouse indefinitely, set a timeout at the session or warehouse level:
For example, here is how to cancel any query that runs longer than 300 seconds on this warehouse
ALTER WAREHOUSE my_warehouse SET STATEMENT_TIMEOUT_IN_SECONDS = 300;
This doesn’t fix slow queries, but it stops them from blocking others and gives you a clear signal that something needs investigation.
Flexera’s Snowflake observability
If you are a Snowflake user who wants account-wide query monitoring beyond what Snowsight provides can look to Flexera One capabilities. These surface expensive query groups, identify workloads that contribute disproportionately to compute spend and surface tuning recommendations across the full query history. For teams managing large Snowflake environments with many users and warehouses, automated observability complements the manual profiling workflow well.
Conclusion
Snowflake can handle millions of queries, but it’s important to remember that queries can’t be run in any way to get the best optimal performance out of it. Utilizing best practices for Snowflake query tuning and optimization is the only viable option for running queries efficiently.
In Part 1 of this two-part article on how to optimize Snowflake queries, we talked about a number of Snowflake features that might make queries run faster. But, in this part (Part 2), we went over several typical pitfalls and common mistakes to avoid while writing Snowflake queries.
Here’s a summary of what we covered:
- Select only required columns; never use SELECT * in production queries
- Avoid ORDER BY unless the output genuinely needs to be sorted; never use it inside CTEs or subqueries
- Prefer explicit joins over correlated subqueries; check Query Profile for NestedLoopJoin operators
- Use ACCOUNT_USAGE.QUERY_HISTORY for systematic query monitoring and prioritize the highest-cost queries first
Taken together with Part 1’s coverage of warehouse sizing, caching, clustering and the Search Optimization Service, these practices give you a solid foundation for running Snowflake efficiently at any scale.
FAQs
How can sorting impact Snowflake query performance?
Sorting in Snowflake triggers a data shuffle across warehouse nodes, which consumes CPU, memory and network bandwidth. On large datasets, intermediate sort results can spill to local SSD or remote cloud storage, which is significantly slower than in-memory processing. The impact scales with data volume, making unnecessary ORDER BY clauses one of the easiest wins to address.
When should I use ORDER BY with LIMIT?
Always pair ORDER BY with a LIMIT clause when you only need the top N results. Snowflake pushes the LIMIT into the sort operation, which means it stops sorting once it has the top N values rather than sorting the full dataset. Running ORDER BY without LIMIT forces a full sort of every row, even if only the first few rows end up being used.
Should I always replace subqueries with joins in Snowflake?
Not always. Snowflake’s optimizer automatically decorrelates many correlated subqueries and converts them to hash joins internally. For uncorrelated subqueries, the performance difference is often negligible. The real concern is correlated subqueries that the optimizer can’t decorrelate, which can trigger row-by-row execution. Check the Query Profile for a NestedLoopJoin operator; if you see one, rewrite the subquery as an explicit join.
What is the difference between ACCOUNT_USAGE.QUERY_HISTORY and the QUERY_HISTORY table function?
ACCOUNT_USAGE.QUERY_HISTORY is a view in the Account Usage schema that retains query history for 365 days and covers the entire account. The QUERY_HISTORY() table function in the Information Schema is simpler to access but only retains data for the past 7 days. For ongoing monitoring and trend analysis, use the Account Usage view. For quick, recent lookups, the table function works fine.
How do I cancel a long-running query in Snowflake?
Use the SYSTEM$CANCEL_QUERY system function with the query ID as its argument. Get the query ID from Snowsight’s Query History or from ACCOUNT_USAGE.QUERY_HISTORY. Note that SYSTEM$CANCEL_QUERY is a function called with SELECT, not a stored procedure called with CALL.
How do I prevent long-running queries from monopolizing warehouse resources?
Set STATEMENT_TIMEOUT_IN_SECONDS at the warehouse, session or account level. A query that exceeds the timeout is automatically canceled. This protects shared warehouses from runaway queries while giving data teams a clear signal that a query needs investigation.
Does Snowflake automatically optimize SELECT * queries?
Partially. Snowflake applies projection pushdown, so if an outer query references only a subset of columns from an inner SELECT *, the optimizer may avoid scanning unused columns. But this optimization isn’t guaranteed in all cases, and relying on it as a substitute for explicit column selection is risky. The safe practice is to name only the columns you need, especially in queries that run frequently or at scale.
What causes a query in Snowflake to spill to remote storage?
Remote spillage happens when a query’s intermediate results exceed both available warehouse memory and the local SSD capacity of the warehouse nodes. Snowflake then writes overflow data to cloud object storage, which is orders of magnitude slower than SSD. Common causes are large, unsorted joins, GROUP BY operations on high-cardinality columns and ORDER BY without LIMIT on very large tables. The fix is usually to reduce the data volume entering the expensive operation, rewrite the query logic or scale up the warehouse size.
How do I find which queries contribute most to my Snowflake costs?
Query SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY and sort by total_elapsed_time or by credits consumed. Look for queries with high partitions_scanned to partitions_total ratios, significant bytes spilled or long elapsed times. These are the best candidates for optimization. Using a QUERY_TAG to label queries by team or project makes cost attribution even more precise at scale.
Why does ORDER BY inside a CTE waste resources?
Snowflake executes the ORDER BY statement inside the CTE even though the outer query may reorder or aggregate the results differently. The sort consumes CPU and memory with no benefit to the final output. Move ORDER BY to the outermost query and only apply it when the final result set genuinely needs to be ordered.