D4: Performance & Querying Week 6

DAY 40

Snowflake Caching Layers – Result, Metadata & Warehouse

Day 39 changed how data sits on disk so a query reads fewer micro-partitions. Today’s topic skips the read entirely. Snowflake keeps three caches that answer a query without redoing the work. The result cache returns a whole answer that was computed before. The metadata cache answers counts and ranges from statistics alone. The warehouse cache keeps recently read data on local disk. The exam tests where each one lives, whether it needs a warehouse, and what wipes it. Carry in two facts from yesterday’s teaser. The result cache lasts 24 hours from last use. The warehouse cache is cleared when the warehouse suspends.

🗣️ Plain-English First

Five terms run through this sub-objective. The word “cache” hides three different things that the exam keeps separate.

Term	What it sounds like	What it means in Snowflake
Result cache	A saved answer	The full result set of a past query, stored in the Cloud Services layer. A matching repeat query gets the stored answer back with no warehouse and no scan.
Metadata cache	Notes about the data	Statistics Snowflake keeps for every table: row count, per-column min and max, distinct counts. Some queries are answered from these numbers alone.
Warehouse cache	The warehouse’s memory	Table data the running warehouse pulled from remote storage and kept on its local SSD. It speeds up the next scan that needs the same data.
Cloud Services layer	A vague back end	The coordination layer that holds the result cache and the metadata cache. It runs without any virtual warehouse. That is why both caches answer queries for free.
USE_CACHED_RESULT	A cache on-switch	A session parameter that turns result-cache reuse on or off. It defaults to TRUE. It controls only the result cache, not the other two.

📘

Query Result Cache

What it stores and where

After any query runs, Snowflake stores the complete result set. This is the result cache. It lives in the Cloud Services layer, not on the warehouse. A repeat of the same query gets the stored answer back instantly. No warehouse runs and no table is scanned, so the reuse is free.

The cache is shared across the account. A result one user computed can serve another user. The second user needs the access privileges on the tables in the query. The Query Profile marks a reused result with a single node reading QUERY RESULT REUSE.

How long it lasts

A result is kept for 24 hours. Each reuse resets that 24-hour clock. The reset can repeat up to a ceiling of 31 days from the first run. After 31 days the result is purged whatever the reuse pattern was.

The exam builds a trap on this. A stem claims the result cache lasts forever, or that it expires on a fixed schedule. Both are wrong. The rule is 24 hours from last use, capped at 31 days from first execution. Candidates I have trained tend to remember the 24 hours and drop the 31-day ceiling. The ceiling is the part a harder question targets.

What it takes to reuse a result

Reuse is strict. Snowflake checks several conditions before it hands back a stored answer. The documentation lists these as the main ones:

The new query text matches the previous one exactly. A change in case, a table alias, or extra whitespace breaks the match.
The data in the referenced tables has not changed since the result was stored.
The micro-partitions behind the query are unchanged, so reclustering them also invalidates the result.
The role running the query holds the privileges on every table involved.
The query has no function that returns a fresh value each run, such as CURRENT_TIMESTAMP, RANDOM, or UUID_STRING.

The data condition is the one questions lean on. A single insert into a referenced table invalidates the cached result. The next run of that query recomputes from scratch. A stem that loads one row at noon, then asks whether a result cached that morning still serves, is testing this. The answer is no.

Turning reuse off

Reuse is on by default through the USE_CACHED_RESULT session parameter. Set it to FALSE to force a real run. This helps when timing a query for tuning. The setting applies at the account, user, or session level. It controls only the result cache.

🧮

Metadata Cache

What Snowflake keeps about every table

Snowflake tracks statistics for each table as data loads and changes. This is the metadata cache. It also sits in the Cloud Services layer. It records the row count, the per-column minimum and maximum, the number of distinct values, the null counts, and the micro-partition boundaries. These numbers update on every load and DML.

Because the answer is already in the statistics, some queries never touch table data. A SELECT COUNT(*) reads the stored row count. A MIN or MAX on a number, date, or timestamp column reads the stored range. No warehouse is needed and no micro-partition is scanned. The Query Profile shows a single METADATA-BASED RESULT node with zero bytes scanned.

The character-column limit

The metadata path covers numeric, date, and timestamp columns. It does not extend to MIN and MAX on a string column. A query asking for the smallest or largest VARCHAR value scans the data and needs a running warehouse. This is a fine distinction the exam can use to separate candidates. The metadata trick is real, but it stops at character types.

The metadata cache cannot be turned off. There is no parameter for it, unlike the result cache. It is always on and Snowflake maintains it.

The same statistics drive pruning

These per-micro-partition min and max values do more than answer aggregates. They drive the pruning from Day 39. A filter on a column lets Snowflake skip any micro-partition whose stored range cannot match. The metadata cache and partition pruning read the same numbers. One returns a count for free. The other narrows a scan.

⚙️

Warehouse Cache

Data kept on the warehouse’s local disk

When a warehouse scans a table, it pulls micro-partitions from remote storage. It keeps a copy of that data on the local SSD of its compute nodes. This is the warehouse cache, also called the local disk cache. A later query on the same warehouse can read from SSD instead of remote storage. Local reads are faster, so the repeat query runs quicker.

This cache sits in the compute layer, not in Cloud Services. It is local to one warehouse. A second warehouse does not see it and builds its own. That is the line the exam draws against the result and metadata caches. Any warehouse can use those two.

Suspend clears it

The warehouse cache is dropped when the warehouse suspends. A resume starts with an empty cache that fills again as queries run. A stem claiming the warehouse cache survives a suspend is wrong. The first query after a resume reads from remote storage and runs at cold speed.

The AUTO_SUSPEND setting drives a trade. A short timeout saves credits but drops the cache between queries. A longer timeout keeps the cache warm for repeat work and burns more idle credits. Snowflake suggests a longer auto-suspend for query and dashboard warehouses, where cache reuse pays off.

The three caches side by side

Cache	Layer	Needs a warehouse?	Survives suspend?	Can disable?
Result cache	Cloud Services	No	Not tied to a warehouse	Yes, USE_CACHED_RESULT
Metadata cache	Cloud Services	No	Not tied to a warehouse	No
Warehouse cache	Compute	Yes, it lives on the warehouse	No, dropped on suspend	No direct switch

Read the first two rows together. Both Cloud Services caches answer without a warehouse. That is why a suspended warehouse can still return a count or a cached result. The third row is the odd one out. The warehouse cache cannot exist without a running warehouse.

⚡

Cheat Sheet

Concept	What to remember	Exam keyword
Result cache retention	24 hours from last use, reset on each reuse, ceiling of 31 days from first run	“24 hours, 31-day max”
Result cache location	Cloud Services layer. No warehouse, no scan, free reuse	“Cloud Services”
Result cache reuse	Exact query text, unchanged data, role privileges, no run-time functions	“exact match”
Result cache invalidation	Any DML on a referenced table, or reclustering its micro-partitions	“data changed”
USE_CACHED_RESULT	Turns result-cache reuse on or off. Default TRUE. Result cache only	“default TRUE”
Metadata cache answers	COUNT(*), and MIN or MAX on numeric, date, or timestamp columns	“no warehouse”
Metadata cache limit	MIN or MAX on a string column scans data, not metadata	“character column scans”
Metadata cache control	Always on. Cannot be disabled	“cannot disable”
Warehouse cache	Local SSD on the warehouse. Speeds repeat scans on that warehouse	“local disk”
Warehouse cache on suspend	Dropped when the warehouse suspends. Rebuilds after resume	“cleared on suspend”

🎯

Exam Tip

🎯 Exam Tip

Two false claims show up the most. The result cache does not last forever. It holds for 24 hours from last use, with a 31-day ceiling from the first run. The warehouse cache does not survive a suspend. It is dropped. The next query reads cold from remote storage. A stem stating either as a permanent feature is the wrong answer.

Watch the suspended-warehouse setup. A question suspends the warehouse and runs SELECT COUNT(*) on a large table. The result comes back instantly. The metadata cache answered it from the stored row count. No warehouse was needed. The same setup with a MIN on a string column behaves differently, because that query scans data and the warehouse is off.

Sort the three caches by layer. The result cache and the metadata cache live in Cloud Services and need no warehouse. The warehouse cache lives on the warehouse and dies with it. If a question asks which caches a suspended warehouse can still use, the answer is the two in Cloud Services.

🛠️

Hands-On Lab

Type: LAB (read-only against the shared sample database, plus one session parameter) | Time: ~10 minutes | Credits: <0.2 | Prerequisite: any edition trial, the shared lab_xs warehouse, and read access to SNOWFLAKE_SAMPLE_DATA. This lab reads sample tables only. It creates and drops nothing. The single change is a session parameter. The final step resets it.

See the result cache reuse a whole answer. Run a query once, then run the identical text again.

SQL

USE WAREHOUSE lab_xs;

SELECT L_RETURNFLAG, COUNT(*) AS c
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.LINEITEM
GROUP BY L_RETURNFLAG;

-- Run the exact same query a second time:
SELECT L_RETURNFLAG, COUNT(*) AS c
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.LINEITEM
GROUP BY L_RETURNFLAG;

👀 Observe: The first run scans and takes seconds. The second returns in milliseconds. Open its Query Profile and you see one node reading QUERY RESULT REUSE. No scan happened on the second run.

Answer a count with the warehouse off. Suspend the warehouse, then run a fresh count that was not cached.

SQL

ALTER WAREHOUSE lab_xs SUSPEND;

SELECT COUNT(*)
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS;

👀 Observe: The count returns instantly and the warehouse stays suspended. The Query Profile shows METADATA-BASED RESULT with zero bytes scanned. The row count came from the metadata cache in Cloud Services, so no warehouse was needed.

Turn result reuse off for an honest scan timing. This stops the result cache from masking the warehouse cache in the next step.

SQL

ALTER SESSION SET USE_CACHED_RESULT = FALSE;

👀 Observe: Repeat queries now run for real. Any speed-up you see next comes from the warehouse SSD cache, not from a reused result.

Watch the warehouse cache warm up, then clear on suspend. Resume, scan twice, then bounce the warehouse and scan again.

SQL

ALTER WAREHOUSE lab_xs RESUME;

-- First scan, cold. Reads from remote storage.
SELECT SUM(O_TOTALPRICE)
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS;

-- Second scan, warm. Reads from local SSD.
SELECT AVG(O_TOTALPRICE)
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS;

-- Drop the cache by bouncing the warehouse.
ALTER WAREHOUSE lab_xs SUSPEND;
ALTER WAREHOUSE lab_xs RESUME;

-- Cold again. The SSD cache was cleared on suspend.
SELECT AVG(O_TOTALPRICE)
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS;

👀 Observe: The second scan is faster than the first because it read from SSD. The scan after the suspend and resume is slow again. The cache did not survive the suspend. That is the rule the exam tests.

✓

Cleanup. Reset the session parameter and suspend the warehouse. Nothing was created, so there is no object to drop.

SQL

ALTER SESSION UNSET USE_CACHED_RESULT;
ALTER WAREHOUSE lab_xs SUSPEND;
-- Keep lab_xs itself. It is the shared warehouse reused across days.
-- SNOWFLAKE_SAMPLE_DATA is read-only and was never modified.

❓

Practice Questions

Q1: How long does Snowflake keep a result in the query result cache? ▸

Options:

A. 24 hours from the first execution, then purged with no extension
B. 24 hours from the last reuse, up to a ceiling of 31 days from the first run
C. Permanently, until the underlying table is dropped
D. Only while the warehouse that ran the query stays active

✅ Answer: B

Why B: The 24-hour clock resets on each reuse. That reset can repeat until 31 days have passed since the first run, when the result is purged regardless.

Why not A: This drops the reset on reuse. A reused result extends past the first 24 hours.

Why not C: Nothing in the result cache is permanent. The 31-day ceiling is the hard limit.

Why not D: The result cache lives in Cloud Services and is not tied to any warehouse. It serves even when the warehouse is suspended.

Q2: A warehouse is suspended. A user runs SELECT COUNT(*) on a large table for the first time. The result returns instantly without resuming the warehouse. Which cache served it? ▸

Options:

A. The warehouse cache on local SSD
B. The result cache, from a prior identical run
C. The metadata cache in the Cloud Services layer
D. None, because the query resumed the warehouse and scanned the table

✅ Answer: C

Why C: A row count is stored in table metadata. Snowflake returns it from the Cloud Services layer with no warehouse and no scan.

Why not A: The warehouse cache only exists on a running warehouse. This one is suspended, so its SSD cache is gone.

Why not B: The stem says the query ran for the first time, so no result was stored to reuse.

Why not D: A metadata-answered count does not resume the warehouse. The result came from statistics, not a scan.

Q3: Which statement about the warehouse (local disk) cache is correct? ▸

Options:

A. It is shared across all warehouses in the account
B. It survives a suspend and resume of the warehouse
C. It is cleared when the warehouse suspends and rebuilds after a resume
D. It is turned off by setting USE_CACHED_RESULT to FALSE

✅ Answer: C

Why C: The warehouse cache lives on the warehouse’s local SSD. A suspend drops it. A resume starts with an empty cache that fills as queries run.

Why not A: The cache is local to one warehouse. The shared caches are the result and metadata caches in Cloud Services.

Why not B: This is the most common false claim on this topic. The suspend clears it.

Why not D: That parameter controls only the result cache. It has no effect on the warehouse cache.

Q4: A query result is cached. Which changes will stop that cached result from being reused on the next run? (Select TWO.) ▸

Options:

A. An INSERT into one of the tables the query reads
B. Running the identical query from a different virtual warehouse
C. Re-running the query with every keyword changed to lowercase
D. Suspending the warehouse that ran the original query
E. Granting a new role to an unrelated user

✅ Answer: A, C

Why A: A DML change to referenced data invalidates the stored result. The next run recomputes.

Why C: Reuse needs an exact text match. Changing case breaks the match, so the cache is not reused.

Why not B: The result cache is shared across warehouses. A different warehouse still reuses the result.

Why not D: The result cache lives in Cloud Services, independent of any warehouse. A suspend does not clear it.

Why not E: A grant to an unrelated user has no bearing on this query’s cached result.

Q5: What does the USE_CACHED_RESULT parameter control, and what is its default value? ▸

Options:

A. The warehouse cache, defaulting to FALSE
B. The metadata cache, defaulting to TRUE
C. Query result cache reuse, defaulting to TRUE
D. All three cache layers at once, defaulting to TRUE

✅ Answer: C

Why C: The parameter switches result-cache reuse on or off and is TRUE by default. Setting it to FALSE forces a real run.

Why not A: The warehouse cache has no on-switch parameter. The default shown here is also wrong.

Why not B: The metadata cache cannot be disabled at all. No parameter governs it.

Why not D: The parameter touches only the result cache, not all three layers.

❄️

Snowflake Documentation

Official docs for today’s topics. The exam pulls directly from these.

❄️

Using Persisted Query Results

docs.snowflake.com/en/user-guide/querying-persisted-results

→

❄️

Optimizing the Warehouse Cache

docs.snowflake.com/en/user-guide/performance-query-warehouse-cache

→

❄️

Micro-partitions and Data Clustering (metadata)

docs.snowflake.com/en/user-guide/tables-clustering-micropartitions

→

❄️

Parameters (USE_CACHED_RESULT)

docs.snowflake.com/en/sql-reference/parameters

→

📝 Recap

Today you learned: three caches that answer a query without redoing the work. The result cache stores a whole answer in Cloud Services for 24 hours from last use, with a 31-day ceiling from the first run. Reuse needs an exact query match, unchanged data, and the right privileges. Any DML on a referenced table invalidates it. The metadata cache answers COUNT(*) from statistics, with no warehouse. It also serves MIN and MAX on numeric, date, or timestamp columns. MIN and MAX on a string column still scan. The warehouse cache keeps recently read data on local SSD and is dropped when the warehouse suspends.

Key takeaway: sort by layer. The result cache and the metadata cache sit in Cloud Services and need no warehouse, so a suspended warehouse can still serve both. The warehouse cache lives on the warehouse and dies with it. Two false claims carry the most exam weight: the result cache lasting forever, and the warehouse cache surviving a suspend. Both are wrong.

Tomorrow (Day 41): SQL: Aggregate and Window Functions, plus QUALIFY. You meet ROW_NUMBER, RANK, DENSE_RANK, LEAD, and LAG, then filter their output without a subquery. Carry in two rules. QUALIFY filters window-function results the way HAVING filters groups. RANK skips numbers after a tie, while DENSE_RANK does not.

← Day 39 All 50 Days Day 41 →

About the Author

Abhay Krishnan

Senior Data & AI Consultant

Connect on LinkedIn

With over five years of data engineering experience at EY and Infosys, Abhay Krishnan specializes in building scalable data pipelines and cloud warehousing solutions. He is a certified SnowPro Core professional, alongside credentials in AWS and Azure. Abhay created this 50-day track to solve a problem he faced firsthand: the lack of a structured, free resource for Snowflake certification prep. Follow him on LinkedIn for more data engineering insights.

Snowflake Caching Layers – Result, Metadata & Warehouse

Query Result Cache

What it stores and where

How long it lasts

What it takes to reuse a result

Turning reuse off

Metadata Cache

What Snowflake keeps about every table

The character-column limit

The same statistics drive pruning

Warehouse Cache

Data kept on the warehouse’s local disk

Suspend clears it

The three caches side by side

Cheat Sheet

Exam Tip

Hands-On Lab

Practice Questions

Snowflake Documentation

Topics