Snowflake Caching Layers – Result, Metadata & Warehouse
Day 39 changed how data sits on disk so a query reads fewer micro-partitions. Today’s topic skips the read entirely. Snowflake keeps three caches that answer a query without redoing the work. The result cache returns a whole answer that was computed before. The metadata cache answers counts and ranges from statistics alone. The warehouse cache keeps recently read data on local disk. The exam tests where each one lives, whether it needs a warehouse, and what wipes it. Carry in two facts from yesterday’s teaser. The result cache lasts 24 hours from last use. The warehouse cache is cleared when the warehouse suspends.
Five terms run through this sub-objective. The word “cache” hides three different things that the exam keeps separate.
| Term | What it sounds like | What it means in Snowflake |
|---|---|---|
| Result cache | A saved answer | The full result set of a past query, stored in the Cloud Services layer. A matching repeat query gets the stored answer back with no warehouse and no scan. |
| Metadata cache | Notes about the data | Statistics Snowflake keeps for every table: row count, per-column min and max, distinct counts. Some queries are answered from these numbers alone. |
| Warehouse cache | The warehouse’s memory | Table data the running warehouse pulled from remote storage and kept on its local SSD. It speeds up the next scan that needs the same data. |
| Cloud Services layer | A vague back end | The coordination layer that holds the result cache and the metadata cache. It runs without any virtual warehouse. That is why both caches answer queries for free. |
| USE_CACHED_RESULT | A cache on-switch | A session parameter that turns result-cache reuse on or off. It defaults to TRUE. It controls only the result cache, not the other two. |
Query Result Cache
What it stores and where
After any query runs, Snowflake stores the complete result set. This is the result cache. It lives in the Cloud Services layer, not on the warehouse. A repeat of the same query gets the stored answer back instantly. No warehouse runs and no table is scanned, so the reuse is free.
The cache is shared across the account. A result one user computed can serve another user. The second user needs the access privileges on the tables in the query. The Query Profile marks a reused result with a single node reading QUERY RESULT REUSE.
How long it lasts
A result is kept for 24 hours. Each reuse resets that 24-hour clock. The reset can repeat up to a ceiling of 31 days from the first run. After 31 days the result is purged whatever the reuse pattern was.
The exam builds a trap on this. A stem claims the result cache lasts forever, or that it expires on a fixed schedule. Both are wrong. The rule is 24 hours from last use, capped at 31 days from first execution. Candidates I have trained tend to remember the 24 hours and drop the 31-day ceiling. The ceiling is the part a harder question targets.
What it takes to reuse a result
Reuse is strict. Snowflake checks several conditions before it hands back a stored answer. The documentation lists these as the main ones:
- The new query text matches the previous one exactly. A change in case, a table alias, or extra whitespace breaks the match.
- The data in the referenced tables has not changed since the result was stored.
- The micro-partitions behind the query are unchanged, so reclustering them also invalidates the result.
- The role running the query holds the privileges on every table involved.
- The query has no function that returns a fresh value each run, such as
CURRENT_TIMESTAMP,RANDOM, orUUID_STRING.
The data condition is the one questions lean on. A single insert into a referenced table invalidates the cached result. The next run of that query recomputes from scratch. A stem that loads one row at noon, then asks whether a result cached that morning still serves, is testing this. The answer is no.
Turning reuse off
Reuse is on by default through the USE_CACHED_RESULT session parameter. Set it to FALSE to force a real run. This helps when timing a query for tuning. The setting applies at the account, user, or session level. It controls only the result cache.
Metadata Cache
What Snowflake keeps about every table
Snowflake tracks statistics for each table as data loads and changes. This is the metadata cache. It also sits in the Cloud Services layer. It records the row count, the per-column minimum and maximum, the number of distinct values, the null counts, and the micro-partition boundaries. These numbers update on every load and DML.
Because the answer is already in the statistics, some queries never touch table data. A SELECT COUNT(*) reads the stored row count. A MIN or MAX on a number, date, or timestamp column reads the stored range. No warehouse is needed and no micro-partition is scanned. The Query Profile shows a single METADATA-BASED RESULT node with zero bytes scanned.
The character-column limit
The metadata path covers numeric, date, and timestamp columns. It does not extend to MIN and MAX on a string column. A query asking for the smallest or largest VARCHAR value scans the data and needs a running warehouse. This is a fine distinction the exam can use to separate candidates. The metadata trick is real, but it stops at character types.
The metadata cache cannot be turned off. There is no parameter for it, unlike the result cache. It is always on and Snowflake maintains it.
The same statistics drive pruning
These per-micro-partition min and max values do more than answer aggregates. They drive the pruning from Day 39. A filter on a column lets Snowflake skip any micro-partition whose stored range cannot match. The metadata cache and partition pruning read the same numbers. One returns a count for free. The other narrows a scan.
Warehouse Cache
Data kept on the warehouse’s local disk
When a warehouse scans a table, it pulls micro-partitions from remote storage. It keeps a copy of that data on the local SSD of its compute nodes. This is the warehouse cache, also called the local disk cache. A later query on the same warehouse can read from SSD instead of remote storage. Local reads are faster, so the repeat query runs quicker.
This cache sits in the compute layer, not in Cloud Services. It is local to one warehouse. A second warehouse does not see it and builds its own. That is the line the exam draws against the result and metadata caches. Any warehouse can use those two.
Suspend clears it
The warehouse cache is dropped when the warehouse suspends. A resume starts with an empty cache that fills again as queries run. A stem claiming the warehouse cache survives a suspend is wrong. The first query after a resume reads from remote storage and runs at cold speed.
The AUTO_SUSPEND setting drives a trade. A short timeout saves credits but drops the cache between queries. A longer timeout keeps the cache warm for repeat work and burns more idle credits. Snowflake suggests a longer auto-suspend for query and dashboard warehouses, where cache reuse pays off.
The three caches side by side
| Cache | Layer | Needs a warehouse? | Survives suspend? | Can disable? |
|---|---|---|---|---|
| Result cache | Cloud Services | No | Not tied to a warehouse | Yes, USE_CACHED_RESULT |
| Metadata cache | Cloud Services | No | Not tied to a warehouse | No |
| Warehouse cache | Compute | Yes, it lives on the warehouse | No, dropped on suspend | No direct switch |
Read the first two rows together. Both Cloud Services caches answer without a warehouse. That is why a suspended warehouse can still return a count or a cached result. The third row is the odd one out. The warehouse cache cannot exist without a running warehouse.
Cheat Sheet
| Concept | What to remember | Exam keyword |
|---|---|---|
| Result cache retention | 24 hours from last use, reset on each reuse, ceiling of 31 days from first run | “24 hours, 31-day max” |
| Result cache location | Cloud Services layer. No warehouse, no scan, free reuse | “Cloud Services” |
| Result cache reuse | Exact query text, unchanged data, role privileges, no run-time functions | “exact match” |
| Result cache invalidation | Any DML on a referenced table, or reclustering its micro-partitions | “data changed” |
| USE_CACHED_RESULT | Turns result-cache reuse on or off. Default TRUE. Result cache only | “default TRUE” |
| Metadata cache answers | COUNT(*), and MIN or MAX on numeric, date, or timestamp columns | “no warehouse” |
| Metadata cache limit | MIN or MAX on a string column scans data, not metadata | “character column scans” |
| Metadata cache control | Always on. Cannot be disabled | “cannot disable” |
| Warehouse cache | Local SSD on the warehouse. Speeds repeat scans on that warehouse | “local disk” |
| Warehouse cache on suspend | Dropped when the warehouse suspends. Rebuilds after resume | “cleared on suspend” |
Exam Tip
Two false claims show up the most. The result cache does not last forever. It holds for 24 hours from last use, with a 31-day ceiling from the first run. The warehouse cache does not survive a suspend. It is dropped. The next query reads cold from remote storage. A stem stating either as a permanent feature is the wrong answer.
Watch the suspended-warehouse setup. A question suspends the warehouse and runs SELECT COUNT(*) on a large table. The result comes back instantly. The metadata cache answered it from the stored row count. No warehouse was needed. The same setup with a MIN on a string column behaves differently, because that query scans data and the warehouse is off.
Sort the three caches by layer. The result cache and the metadata cache live in Cloud Services and need no warehouse. The warehouse cache lives on the warehouse and dies with it. If a question asks which caches a suspended warehouse can still use, the answer is the two in Cloud Services.
Hands-On Lab
lab_xs warehouse, and read access to SNOWFLAKE_SAMPLE_DATA. This lab reads sample tables only. It creates and drops nothing. The single change is a session parameter. The final step resets it.See the result cache reuse a whole answer. Run a query once, then run the identical text again.
USE WAREHOUSE lab_xs;
SELECT L_RETURNFLAG, COUNT(*) AS c
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.LINEITEM
GROUP BY L_RETURNFLAG;
-- Run the exact same query a second time:
SELECT L_RETURNFLAG, COUNT(*) AS c
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.LINEITEM
GROUP BY L_RETURNFLAG;QUERY RESULT REUSE. No scan happened on the second run.Answer a count with the warehouse off. Suspend the warehouse, then run a fresh count that was not cached.
ALTER WAREHOUSE lab_xs SUSPEND;
SELECT COUNT(*)
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS;METADATA-BASED RESULT with zero bytes scanned. The row count came from the metadata cache in Cloud Services, so no warehouse was needed.Turn result reuse off for an honest scan timing. This stops the result cache from masking the warehouse cache in the next step.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;Watch the warehouse cache warm up, then clear on suspend. Resume, scan twice, then bounce the warehouse and scan again.
ALTER WAREHOUSE lab_xs RESUME;
-- First scan, cold. Reads from remote storage.
SELECT SUM(O_TOTALPRICE)
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS;
-- Second scan, warm. Reads from local SSD.
SELECT AVG(O_TOTALPRICE)
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS;
-- Drop the cache by bouncing the warehouse.
ALTER WAREHOUSE lab_xs SUSPEND;
ALTER WAREHOUSE lab_xs RESUME;
-- Cold again. The SSD cache was cleared on suspend.
SELECT AVG(O_TOTALPRICE)
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS;Cleanup. Reset the session parameter and suspend the warehouse. Nothing was created, so there is no object to drop.
ALTER SESSION UNSET USE_CACHED_RESULT;
ALTER WAREHOUSE lab_xs SUSPEND;
-- Keep lab_xs itself. It is the shared warehouse reused across days.
-- SNOWFLAKE_SAMPLE_DATA is read-only and was never modified.Practice Questions
Options:
A. 24 hours from the first execution, then purged with no extension
B. 24 hours from the last reuse, up to a ceiling of 31 days from the first run
C. Permanently, until the underlying table is dropped
D. Only while the warehouse that ran the query stays active
Why B: The 24-hour clock resets on each reuse. That reset can repeat until 31 days have passed since the first run, when the result is purged regardless.
Why not A: This drops the reset on reuse. A reused result extends past the first 24 hours.
Why not C: Nothing in the result cache is permanent. The 31-day ceiling is the hard limit.
Why not D: The result cache lives in Cloud Services and is not tied to any warehouse. It serves even when the warehouse is suspended.
Options:
A. The warehouse cache on local SSD
B. The result cache, from a prior identical run
C. The metadata cache in the Cloud Services layer
D. None, because the query resumed the warehouse and scanned the table
Why C: A row count is stored in table metadata. Snowflake returns it from the Cloud Services layer with no warehouse and no scan.
Why not A: The warehouse cache only exists on a running warehouse. This one is suspended, so its SSD cache is gone.
Why not B: The stem says the query ran for the first time, so no result was stored to reuse.
Why not D: A metadata-answered count does not resume the warehouse. The result came from statistics, not a scan.
Options:
A. It is shared across all warehouses in the account
B. It survives a suspend and resume of the warehouse
C. It is cleared when the warehouse suspends and rebuilds after a resume
D. It is turned off by setting USE_CACHED_RESULT to FALSE
Why C: The warehouse cache lives on the warehouse’s local SSD. A suspend drops it. A resume starts with an empty cache that fills as queries run.
Why not A: The cache is local to one warehouse. The shared caches are the result and metadata caches in Cloud Services.
Why not B: This is the most common false claim on this topic. The suspend clears it.
Why not D: That parameter controls only the result cache. It has no effect on the warehouse cache.
Options:
A. An INSERT into one of the tables the query reads
B. Running the identical query from a different virtual warehouse
C. Re-running the query with every keyword changed to lowercase
D. Suspending the warehouse that ran the original query
E. Granting a new role to an unrelated user
Why A: A DML change to referenced data invalidates the stored result. The next run recomputes.
Why C: Reuse needs an exact text match. Changing case breaks the match, so the cache is not reused.
Why not B: The result cache is shared across warehouses. A different warehouse still reuses the result.
Why not D: The result cache lives in Cloud Services, independent of any warehouse. A suspend does not clear it.
Why not E: A grant to an unrelated user has no bearing on this query’s cached result.
Options:
A. The warehouse cache, defaulting to FALSE
B. The metadata cache, defaulting to TRUE
C. Query result cache reuse, defaulting to TRUE
D. All three cache layers at once, defaulting to TRUE
Why C: The parameter switches result-cache reuse on or off and is TRUE by default. Setting it to FALSE forces a real run.
Why not A: The warehouse cache has no on-switch parameter. The default shown here is also wrong.
Why not B: The metadata cache cannot be disabled at all. No parameter governs it.
Why not D: The parameter touches only the result cache, not all three layers.
Snowflake Documentation
Official docs for today’s topics. The exam pulls directly from these.
Today you learned: three caches that answer a query without redoing the work. The result cache stores a whole answer in Cloud Services for 24 hours from last use, with a 31-day ceiling from the first run. Reuse needs an exact query match, unchanged data, and the right privileges. Any DML on a referenced table invalidates it. The metadata cache answers COUNT(*) from statistics, with no warehouse. It also serves MIN and MAX on numeric, date, or timestamp columns. MIN and MAX on a string column still scan. The warehouse cache keeps recently read data on local SSD and is dropped when the warehouse suspends.
Key takeaway: sort by layer. The result cache and the metadata cache sit in Cloud Services and need no warehouse, so a suspended warehouse can still serve both. The warehouse cache lives on the warehouse and dies with it. Two false claims carry the most exam weight: the result cache lasting forever, and the warehouse cache surviving a suspend. Both are wrong.
Tomorrow (Day 41): SQL: Aggregate and Window Functions, plus QUALIFY. You meet ROW_NUMBER, RANK, DENSE_RANK, LEAD, and LAG, then filter their output without a subquery. Carry in two rules. QUALIFY filters window-function results the way HAVING filters groups. RANK skips numbers after a tie, while DENSE_RANK does not.
