D1: Architecture & Features Week 2

DAY 12

Apache Iceberg Tables & Dynamic Tables

Apache Iceberg tables are Snowflake’s open lakehouse format with native read and write. Dynamic Tables are declarative pipelines refreshed against a freshness target instead of a schedule. Today covers what each one is and the catalog modes Iceberg supports. The keyword TARGET_LAG beats both Streams+Tasks and Materialized Views for hands-off pipelines.

🗣️ Plain-English First

Term	Plain meaning
Apache Iceberg	An open-source table format. A specification, not a product. Data lives as Parquet files in cloud storage. Metadata files describe schema, partitions, and snapshots. Any engine can read and write the same files: Snowflake, Spark, Trino, Flink.
Catalog	A small service that tracks the current metadata file for each table. Engines hit the catalog first to find out which snapshot is “now”. Options in Snowflake: Snowflake itself, AWS Glue, Snowflake Open Catalog, or any Iceberg REST endpoint.
External volume	The Snowflake object that holds the cloud credentials. It tells Snowflake how to reach the S3, Azure, or GCS location where the Iceberg files live. Required for every Iceberg table, both catalog modes.
Snowflake-managed Iceberg	Catalog = SNOWFLAKE. Snowflake owns the writes. External engines read only. The right choice when Snowflake is the primary producer of the data.
Externally-managed Iceberg	Catalog = Glue, REST, or Open Catalog. External engines like Spark and Flink can write. Snowflake can also write. That capability went GA in October 2025.
Dynamic Table	A physical table whose contents are defined by a SELECT. Snowflake refreshes the result automatically and keeps it within a freshness target. Removes most of the Streams + Tasks boilerplate.
TARGET_LAG	The freshness target. `TARGET_LAG = '5 minutes'` tells Snowflake the dynamic table should never be more than five minutes behind the source. Snowflake picks the refresh schedule.

📘

Today’s Concept

Micro-Concept 1: Why Iceberg Exists

Yesterday covered external tables. Read-only, no Time Travel, limited schema handling. Useful for lightweight querying of files in cloud storage, not much else. Iceberg fixes those gaps. The format supports transactional writes, snapshot-based Time Travel, schema evolution, and multi-engine access against the same physical files.

The practical impact is concrete. A team can ingest data with Spark on Monday, transform it with Snowflake on Tuesday, and serve it to Trino on Wednesday. No copies between systems. That is the pitch for open table formats. It is why the C03 outline added Iceberg as a first-class table type.

Micro-Concept 2: The Three Parts of an Iceberg Table

Data files. Parquet, written to your cloud storage bucket (S3, Azure Blob, or GCS).
Metadata files. JSON files alongside the data. They describe the current schema, partition spec, and the chain of snapshots that make Time Travel work.
Catalog. The pointer to the current metadata file. Without a catalog, an engine cannot tell which snapshot is the latest committed version.

Snowflake adds two of its own objects on top:

External volume. The IAM-aware handle to your cloud storage location.
Catalog integration. Only required when the catalog is external. Not needed when CATALOG = SNOWFLAKE.

The external volume vs catalog integration distinction is exactly where candidates trip up. Many assume both are always required.

Micro-Concept 3: Two Catalog Modes for Iceberg in Snowflake

Mode	Catalog	Who can write?	Who can read?	When to choose
Snowflake-managed	`CATALOG = 'SNOWFLAKE'`	Snowflake only	Snowflake + external engines (Spark, Trino)	Snowflake is the primary producer; external engines just read
Externally-managed	Glue / REST / Open Catalog / Polaris	External engines, and Snowflake (GA Oct 2025)	Everyone	Multiple engines write; data lake-first organization

Both modes support full DML from Snowflake: INSERT, UPDATE, DELETE, MERGE. That is the headline feature. It is the single biggest jump over standard external tables.

Micro-Concept 4: Iceberg Beats Standard External Tables Where It Matters

Capability	Standard external table	Iceberg table
DML writes from Snowflake	No	Yes. INSERT, UPDATE, DELETE, MERGE all supported
Time Travel	No	Yes (via snapshots)
Schema evolution	Limited	Native (add/drop/rename columns)
Other engines (Spark, Trino) read same files	No	Yes
Cloning	No	Yes (Snowflake-managed)

If a scenario question lists “must support UPDATE from Snowflake”, “multi-engine access”, or “Time Travel on cloud storage data”, the answer is Iceberg. Standard external tables fail all three.

Micro-Concept 5: Dynamic Tables and What They Replace

Before Dynamic Tables, an incrementally-refreshed aggregate required four moving parts:

Create a Stream on the source table to track changes.
Create a Task on a schedule.
Write the Task body, a MERGE that consumes the Stream and applies changes to a target.
Own the orchestration: failures, retries, ordering between multiple targets in a DAG.

With a Dynamic Table, the same outcome is one statement:

SQL

CREATE DYNAMIC TABLE summary
  TARGET_LAG = '1 minute'
  WAREHOUSE  = lab_xs
AS
SELECT category, SUM(amount) total
FROM   orders
GROUP BY category;

Snowflake handles change detection. Snowflake handles incremental refresh where the SQL allows it, full refresh where it does not. Snowflake handles scheduling. If multiple dynamic tables chain together, Snowflake handles dependency ordering.

Micro-Concept 6: TARGET_LAG, the Freshness Knob

Declares the maximum staleness Snowflake will allow.
Snowflake decides when to refresh. You declare the lag, not a schedule.
No source change means no refresh, which means no compute. Billing only happens when work runs.
Minimum lag is 1 minute (60 seconds). For sub-minute freshness, fall back to Streams + Tasks or Snowpipe Streaming.
TARGET_LAG = DOWNSTREAM is a special value. The table refreshes only when a dynamic table that depends on it needs to refresh. The standard pattern for intermediate nodes in a DAG of dynamic tables.

Micro-Concept 7: Dynamic Tables vs Materialized Views vs Streams+Tasks

Need	Use
Simple aggregate of slow-changing data, sub-second freshness, narrow SQL subset, Enterprise+ only	Materialized View (Day 13)
Declarative pipeline with target freshness, full SQL, multiple chained transforms	Dynamic Table
Row-level CDC consumption with custom merge logic, you own the orchestration	Streams + Tasks (Day 33)

Scenario questions in this area come down to stem phrasing. “Declarative” or “automatic” or “without writing orchestration code” points to Dynamic Tables. “Sub-second” or “single-table aggregate” points to MVs. “Custom merge” or “row-level CDC consumption” points to Streams + Tasks.

Micro-Concept 8: Dynamic Iceberg Tables

Both features in one object: a Dynamic Table whose output is stored as an Iceberg table in cloud storage. Use it when downstream consumers are non-Snowflake engines. The refreshed result is then available to them in open format. Same TARGET_LAG syntax, plus CATALOG, EXTERNAL_VOLUME, and BASE_LOCATION.

⚡

Cheat Sheet

Concept	What to remember
Iceberg parts	Data files (Parquet) + metadata files + catalog. Snowflake adds external volume + (optionally) catalog integration.
Snowflake-managed catalog	`CATALOG = 'SNOWFLAKE'`. Snowflake writes; external engines read-only.
Externally-managed catalog	Glue / REST / Open Catalog. External engines and Snowflake both write (Snowflake write GA Oct 2025).
Iceberg DML	Full INSERT/UPDATE/DELETE/MERGE from Snowflake
Iceberg vs external tables	Iceberg supports writes, schema evolution, snapshot Time Travel. Standard external tables don’t.
Dynamic Table essentials	`CREATE DYNAMIC TABLE name TARGET_LAG = '...' WAREHOUSE = ... AS SELECT ...`
TARGET_LAG minimum	1 minute (60 seconds)
TARGET_LAG = ‘DOWNSTREAM’	Refresh only when a downstream dynamic table needs me
Refresh cost	Uses the named warehouse. No refresh = no cost. Snowflake decides timing.
Dynamic Table replaces	Streams + Tasks for most aggregation/transformation pipelines
Dynamic Iceberg Table	Stores the result as an Iceberg table in cloud storage. Combine both features.

🎯 Exam Tip

Three traps show up on Domain 1 questions in this area. (1) “Iceberg = read-only external table”. FALSE. Read-only describes standard external tables (Day 11). Iceberg supports full DML in both catalog modes. The question often disguises this by mentioning S3 or “files in cloud storage”. Those phrases alone do not make a table read-only. (2) “Dynamic Tables refresh continuously”. FALSE. Refresh is demand-driven against TARGET_LAG. If the source has not changed, Snowflake does not refresh, and you do not pay. The “continuous” wording is borrowed from Snowpipe. That phrasing is the catch. (3) “Use Streams + Tasks for declarative pipelines”. WRONG TOOL. Streams + Tasks is the opposite of declarative. You own every retry, every MERGE, every ordering decision. When a scenario says “the team does not want to manage orchestration”, “automated incremental processing”, or “single statement”, the answer is Dynamic Tables. I have seen this distinction tested in three different phrasings on practice banks. It is one of the highest-yield gotchas in sub-objective 1.5.

🛠️

Hands-On Lab

Type: LAB (guided) | Time: ~10 minutes | Credits: <0.05 | Prerequisite: lab_xs warehouse. | Note: Iceberg DDL is skeleton-only (needs an external volume + cloud bucket). Dynamic Table portion is fully runnable.

Build a base table for the Dynamic Table to consume.

SQL

USE ROLE SYSADMIN;
USE WAREHOUSE lab_xs;
ALTER WAREHOUSE lab_xs RESUME IF SUSPENDED;

CREATE OR REPLACE TABLE dt_source (
  id        INT,
  category  STRING,
  amount    FLOAT
);

INSERT INTO dt_source VALUES
  (1, 'A', 100),
  (2, 'B', 200),
  (3, 'A', 150);

SELECT * FROM dt_source;

👀 Observe: Three rows, two categories. The dynamic table in the next step will roll this up.

Create the Dynamic Table. One statement. No Stream, no Task, no MERGE.

SQL

CREATE OR REPLACE DYNAMIC TABLE dt_summary
  TARGET_LAG = '1 minute'
  WAREHOUSE  = lab_xs
AS
SELECT category, SUM(amount) AS total
FROM   dt_source
GROUP BY category;

SELECT * FROM dt_summary;
-- A → 250
-- B → 200

👀 Observe: The initial population runs synchronously at create time (the default INITIALIZE = ON_CREATE). From here on, refresh is automatic against the target lag.

Change the source and watch the refresh happen.

SQL

INSERT INTO dt_source VALUES (4, 'A', 300);

-- Wait ~60 seconds, then re-query
SELECT * FROM dt_summary;
-- A → 550 (was 250)
-- B → 200

👀 Observe: No task definition. No stream consumption. No MERGE. Snowflake detected the change and applied an incremental refresh.

Inspect refresh history. This is the view I check first when a dynamic table is misbehaving in production.

SQL

SELECT
  NAME,
  STATE,
  REFRESH_TRIGGER,
  REFRESH_START_TIME,
  REFRESH_END_TIME,
  ROWS_INSERTED, ROWS_UPDATED, ROWS_DELETED
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY())
WHERE NAME = 'DT_SUMMARY'
ORDER BY REFRESH_START_TIME DESC
LIMIT 10;

👀 Observe: Each row is one refresh. REFRESH_TRIGGER tells you why it ran (SCHEDULED, MANUAL, CREATION). The row counts confirm whether refresh was incremental or a full recompute. Snowsight → Monitoring → Dynamic Tables gives the same data in a UI view.

Concept-only: Iceberg DDL. Running this requires an external volume and a cloud bucket the volume can reach. Skip execution on a trial that is not set up for that. Read the syntax. Recognizing the three-clause shape on the exam is the goal.

SQL

-- ============= Snowflake-managed Iceberg (skeleton) =============
-- (Requires an external volume already created — see docs.)

CREATE OR REPLACE ICEBERG TABLE orders_iceberg
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG         = 'SNOWFLAKE'
  BASE_LOCATION   = 'iceberg/orders'
AS
  SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS;

-- Full DML works:
-- INSERT INTO orders_iceberg ...
-- UPDATE orders_iceberg SET ...
-- DELETE FROM orders_iceberg WHERE ...

-- ============= Dynamic Iceberg Table (skeleton) =============
CREATE OR REPLACE DYNAMIC ICEBERG TABLE orders_by_status
  TARGET_LAG     = '5 minutes'
  WAREHOUSE      = lab_xs
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG         = 'SNOWFLAKE'
  BASE_LOCATION   = 'iceberg/orders_by_status'
AS
  SELECT O_ORDERSTATUS, COUNT(*) c
  FROM   orders_iceberg
  GROUP BY O_ORDERSTATUS;

-- Concept check only — no need to run.

👀 Observe: Three clauses separate an Iceberg CREATE from a regular one: EXTERNAL_VOLUME, CATALOG, BASE_LOCATION. When an exam question mentions “external volume” in the stem, it is signposting Iceberg as the answer.

Cleanup.

SQL

DROP DYNAMIC TABLE IF EXISTS dt_summary;
DROP TABLE         IF EXISTS dt_source;
ALTER WAREHOUSE lab_xs SUSPEND;
-- KEEP day10_orders — needed for Days 13, 26, 31, 32, 35, 39

💡 Pro tip: ALTER DYNAMIC TABLE ... SUSPEND stops refreshes but leaves the data in place. Use this on production pipelines when you want to pause the bill without losing the current materialized state. The DAG resumes from where it left off when you RESUME.

❄️

Snowflake Documentation

Official docs for today’s topics.

❄️

Apache Iceberg Tables

docs.snowflake.com/en/user-guide/tables-iceberg

→

❄️

Dynamic Tables (Overview)

docs.snowflake.com/en/user-guide/dynamic-tables-about

→

❄️

CREATE DYNAMIC TABLE

docs.snowflake.com/en/sql-reference/sql/create-dynamic-table

→

🔗

External References

Catalog setup and dynamic-iceberg patterns.

Docs Snowflake

CREATE ICEBERG TABLE (all variants)

Docs Snowflake

Dynamic Iceberg Tables

❓

Practice Questions

Q1: A team needs Snowflake to act as the writer for an open-format table while still allowing Apache Spark to query the same files. Which Iceberg configuration meets this? ▸

Options:

A. A standard external table on the Parquet files
B. A Snowflake-managed Iceberg table (CATALOG = ‘SNOWFLAKE’)
C. A transient table with directory metadata
D. A materialized view referencing the cloud storage location

✅ Answer: B

Why B: Snowflake-managed Iceberg is exactly this pattern. Snowflake owns the writes. The metadata is written in the open Iceberg spec that Spark, Trino, and Flink can read. External engines are read-only against this catalog mode, and Snowflake does the DML.

Why not A: Standard external tables are read-only from Snowflake. The scenario says Snowflake needs to write. Re-read Day 11 if missed.

Why not C: Transient tables are Snowflake-internal storage. They are not in open format, and external engines cannot read them.

Why not D: A materialized view is a Snowflake-internal object. It is not Iceberg. It has nothing to do with cloud storage in open format. External engines cannot access it.

Q2: What does TARGET_LAG = '5 minutes' mean for a Dynamic Table? ▸

Options:

A. The dynamic table refreshes exactly every 5 minutes, regardless of source changes
B. The dynamic table is kept within ~5 minutes of being current relative to the source; Snowflake decides when to refresh, and skips refreshes when the source hasn’t changed
C. Queries against the dynamic table will time out after 5 minutes
D. The dynamic table caches results for 5 minutes

✅ Answer: B

Why B: TARGET_LAG is a freshness target, not a cron schedule. Snowflake picks refresh timing to keep the data within the lag window. If the source has not changed, no refresh runs and no compute is billed.

Why not A: The classic distractor. “Every 5 minutes” describes a Task schedule, not a Dynamic Table target lag. If A were true, Snowflake would burn compute on every cycle even when the source was idle.

Why not C: TARGET_LAG has nothing to do with query timeouts. That parameter is STATEMENT_TIMEOUT_IN_SECONDS.

Why not D: A dynamic table is a physical table, not a cache. The result set is materialized on disk; it does not expire on a timer.

Q3: Which statements about Iceberg tables in Snowflake are TRUE? (Select TWO) ▸

Options:

A. Iceberg tables in Snowflake support full DML: INSERT, UPDATE, DELETE, MERGE
B. Snowflake-managed Iceberg requires no external volume
C. The catalog tracks the current metadata file for an Iceberg table
D. Iceberg tables in Snowflake do not support Time Travel
E. Iceberg tables can only be queried by Snowflake, not by external engines

✅ Answer: A, C

Why A: Full DML works in both catalog modes. This is the defining capability that separates Iceberg from standard external tables.

Why C: The catalog is the first layer of the Iceberg spec. Its job is to hold the pointer to the current metadata file so any engine can find the latest committed snapshot.

Why not B: The external volume is required regardless of catalog mode. It is how Snowflake reaches the cloud storage location where the Parquet files and metadata files live.

Why not D: Iceberg supports snapshot-based Time Travel. Every commit creates a new snapshot, and historical snapshots are queryable through standard Snowflake Time Travel syntax.

Why not E: Multi-engine access is the whole point of the format. Spark, Trino, Flink, and other Iceberg-aware engines can read these tables.

Q4: A data engineering team wants a declarative incremental aggregate of an orders table that stays within 2 minutes of fresh, without writing any tasks or stream-consumption SQL. Which feature fits BEST? ▸

Options:

A. A Stream on the orders table + a scheduled Task that consumes it
B. A standard view
C. A Dynamic Table with TARGET_LAG = ‘2 minutes’
D. An external table over the orders files

✅ Answer: C

Why C: Three signals in the stem point straight to Dynamic Tables: “declarative”, “incremental”, and a target freshness window. The 2-minute lag is well above the 1-minute minimum, so it is a valid configuration.

Why not A: Streams + Tasks would technically deliver the same outcome. The stem explicitly rules it out: “without writing any tasks or stream-consumption SQL”. The trap catches candidates who recognize the CDC pattern and pick A without re-reading the constraint.

Why not B: A standard view recomputes the SELECT on every query. No incremental processing, no stored result, no freshness target. Bad fit when the source is large.

Why not D: External tables do not aggregate. They expose files as queryable rows, nothing more.

Q5: For a chain of Dynamic Tables where an intermediate table only needs to refresh when its downstream consumer needs to refresh, which TARGET_LAG setting is appropriate for the intermediate table? ▸

Options:

A. TARGET_LAG = '1 minute'
B. TARGET_LAG = 'DOWNSTREAM'
C. TARGET_LAG = NULL
D. TARGET_LAG = 0

✅ Answer: B

Why B: DOWNSTREAM is the special value for exactly this pattern. The intermediate table refreshes on demand, only when a dependent dynamic table needs fresh data. No wasted compute on a node whose output nobody is consuming.

Why not A: A fixed 1-minute lag forces refresh every minute regardless of downstream demand. On a deep DAG, that is a lot of wasted compute for intermediate stages.

Why not C: NULL is not a valid TARGET_LAG value. Every dynamic table must have either a time interval or DOWNSTREAM.

Why not D: Zero is below the 1-minute minimum and is not a valid setting. Sub-minute freshness needs Streams + Tasks or Snowpipe Streaming, not Dynamic Tables.

📝 Recap

Today you learned: Apache Iceberg is an open table format. In Snowflake it comes in two catalog modes. Snowflake-managed has Snowflake writing while external engines read. Externally-managed has both external engines and Snowflake writing. Both modes support full DML and snapshot-based Time Travel, going far beyond standard external tables. Required objects: external volume, plus a catalog integration if external. Dynamic Tables are declarative pipelines defined by a SELECT and a TARGET_LAG. Snowflake schedules refreshes automatically and skips them when the source has not changed. DOWNSTREAM is the special lag value for intermediate stages.

Key takeaway: Iceberg = open format multi-engine writes. Dynamic Tables = declarative pipelines without orchestration code. Both are first-class C03 table types in sub-objective 1.5.

Tomorrow (Day 13): View Types: Standard vs Materialized vs Secure. Materialized Views are Enterprise+ and not free. Secure Views are slower but required for cross-database sharing. The cost and visibility tradeoff between all three is the exam focus. Wraps up Week 2’s storage objects.

← Day 11 All 50 Days Day 13 →

About the Author

Abhay Krishnan

Senior Data & AI Consultant

Connect on LinkedIn

With over five years of data engineering experience at EY and Infosys, Abhay Krishnan specializes in building scalable data pipelines and cloud warehousing solutions. He is a certified SnowPro Core professional, alongside credentials in AWS and Azure. Abhay created this 50-day track to solve a problem he faced firsthand: the lack of a structured, free resource for Snowflake certification prep. Follow him on LinkedIn for more data engineering insights.

Apache Iceberg Tables & Dynamic Tables

Today’s Concept

Micro-Concept 1: Why Iceberg Exists

Micro-Concept 2: The Three Parts of an Iceberg Table

Micro-Concept 3: Two Catalog Modes for Iceberg in Snowflake

Micro-Concept 4: Iceberg Beats Standard External Tables Where It Matters

Micro-Concept 5: Dynamic Tables and What They Replace

Micro-Concept 6: TARGET_LAG, the Freshness Knob

Micro-Concept 7: Dynamic Tables vs Materialized Views vs Streams+Tasks

Micro-Concept 8: Dynamic Iceberg Tables

Cheat Sheet

Hands-On Lab

Snowflake Documentation

External References

Practice Questions

Topics