Review Week 5

DAY 35

Week 5 Recap

No new material today. Five weeks in, you have Domain 3 covered: stages, file formats, COPY INTO loading and unloading, and Snowpipe. The week also covered Snowpipe Streaming, Streams and Tasks, and the drivers, connectors, and integrations that link Snowflake to the outside. Domain 3 is 18% of the exam. Today is a 30-minute closed-book drill, the recurring Domain 3 traps, and the Week 5 Practice Test. Target 8 out of 10 to clear into Week 6.

🗣️ How to use today

Step	Time	What you do
1. Closed-book drill	30 min	Run the four drill tasks below. No tabs open. Write the SQL first, run it after.
2. Self-grade	5 min	Open Snowsight, run your SQL, score yourself honestly.
3. Read the gotchas	10 min	Speed-read the trap patterns below. These are the ones Domain 3 reuses in scenario form.
4. Take the Week 5 Practice Test	15 min	10 Domain 3 questions. Target 8 out of 10. Below that, re-read the day flagged in your wrong answers before Week 6.

🛠️

The 30-Minute Drill

Type: CLOSED-BOOK DRILL | Time: ~30 minutes | Credits: <0.05 | Rule: No docs, no Day 29-34 tabs. Step 1 reads day10_orders but never changes it. Write the answers first, run them after.

Stage, file format, then COPY INTO with PURGE. Stage a small file, load it into a fresh table, and confirm PURGE removes the staged file after a successful load.

SQL

-- Read-only source: day10_orders. Never modified in this drill.
CREATE OR REPLACE STAGE recap_stage;
CREATE OR REPLACE FILE FORMAT recap_csv
  TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1;
CREATE OR REPLACE TABLE drill_load (
  o_orderkey INT, o_custkey INT, o_orderstatus STRING);

-- Produce one small staged file by reading 100 rows (read-only SELECT):
COPY INTO @recap_stage/drill/
  FROM (SELECT o_orderkey, o_custkey, o_orderstatus
        FROM day10_orders LIMIT 100)
  FILE_FORMAT = (TYPE = CSV) HEADER = TRUE SINGLE = TRUE;

-- Load it, then PURGE the staged file:
COPY INTO drill_load
  FROM @recap_stage/drill/
  FILE_FORMAT = recap_csv
  PURGE = TRUE;

LIST @recap_stage/drill/;          -- empty: PURGE removed the file
SELECT COUNT(*) FROM drill_load;   -- 100

👀 Self-check: Run the load COPY INTO drill_load a second time without FORCE. It loads 0 rows. The load metadata already marks those files as loaded. If you expected duplicates, re-read Day 30.

Stream on a table, then watch the metadata. Create a stream, make changes, and prove that querying it does not advance the offset.

SQL

CREATE OR REPLACE TABLE drill_cdc (id INT, name STRING, status STRING);
INSERT INTO drill_cdc VALUES (1,'Alice','active'),(2,'Bob','active');

CREATE OR REPLACE STREAM drill_stream ON TABLE drill_cdc;
SELECT * FROM drill_stream;     -- empty: no changes since creation

INSERT INTO drill_cdc VALUES (3,'Charlie','active');
UPDATE drill_cdc SET status = 'inactive' WHERE id = 1;

SELECT id, name, status, METADATA$ACTION, METADATA$ISUPDATE
  FROM drill_stream;           -- run it twice: same rows both times

-- Only a DML consume advances the offset:
CREATE OR REPLACE TABLE drill_target LIKE drill_cdc;
INSERT INTO drill_target
  SELECT id, name, status FROM drill_stream;
SELECT * FROM drill_stream;     -- empty now: offset advanced

👀 Self-check: The UPDATE shows up as a DELETE plus an INSERT, both with METADATA$ISUPDATE = TRUE. If you predicted a single UPDATE row, re-read Day 33.

Snowpipe versus scheduled COPY INTO, in one line each. Write the use case for each from memory, then check.

CHECK

Snowpipe:
  Files land continuously and you want them loaded within minutes.
  Serverless. No warehouse. Default ON_ERROR = SKIP_FILE.

Scheduled COPY INTO:
  Files arrive in predictable batches on a clock or a task.
  Runs on your warehouse. Default ON_ERROR = ABORT_STATEMENT.

Snowpipe Streaming:
  Sub-second, row-level inserts straight from an app via the API.
  No staged files at all. Separate from COPY INTO.

👀 Self-check: If you wrote “Snowpipe needs a warehouse,” that is the trap. Snowpipe is serverless. The warehouse line belongs to scheduled COPY INTO. Re-read Day 32.

Storage integration versus API integration: three distinctions. List three differences without peeking.

CHECK

Storage integration:
  • Grants cross-account access to cloud storage (S3/Azure/GCS).
  • Keeps credentials OUT of the stage DDL. No keys in CREATE STAGE.
  • Used by external stages and external tables.

API integration:
  • Authorizes calls to an external HTTPS endpoint.
  • Required for external functions and for Git repositories.
  • Holds the proxy/provider config, not storage credentials.

👀 Self-check: Storage integration is about reading and writing files. API integration is about calling out to code or a Git repository. Mixing the two is a single-letter trap. Re-read Day 34.

Bonus speed round (60 seconds each):

CHECK

Q: What do @~, @%orders, and @my_stage point to?
A: User stage, table stage for ORDERS, named internal stage.

Q: After how many days does COPY load metadata expire?
A: 64 days. Past that, use LOAD_UNCERTAIN_FILES (or FORCE).

Q: Unload a fixed-point column to Parquet. What type lands?
A: DECIMAL. Floating-point lands as DOUBLE. Fixed-point is NOT DOUBLE.

Q: Which function gates a task on new stream changes?
A: SYSTEM$STREAM_HAS_DATA('stream_name').

Q: One COPY INTO @stage FROM table produced several files. Why?
A: Unload writes multiple parallel files by default. SINGLE=TRUE forces one.

👀 Self-check: 4 out of 5 right means you are tracking. 2 or fewer means stop here. Spend 30 minutes back in the days you slipped on, then take the test.

✓

Drill cleanup. Drop everything the drill created. day10_orders is left untouched.

SQL

DROP STREAM IF EXISTS drill_stream;
DROP TABLE  IF EXISTS drill_target;
DROP TABLE  IF EXISTS drill_cdc;
DROP TABLE  IF EXISTS drill_load;
DROP STAGE  IF EXISTS recap_stage;
DROP FILE FORMAT IF EXISTS recap_csv;
-- day10_orders is intentionally NOT dropped. It is needed through Day 39.

🎯

Week 5 Exam Gotchas

These are the patterns Domain 3 reuses in scenario form. Each one has a Tell, the trigger phrase the question uses to set the trap. Learn the Tells and you catch the trap before you finish reading the options.

Gotcha 1: Stage prefixes decode the question

@~ is your user stage. @%orders is the table stage for the table orders. @my_stage is a named stage. Tell: a stem that shows a prefix and asks which stage type it is. The symbol is the whole answer.

Gotcha 2: PUT and GET need a client, not a worksheet

PUT and GET move files between a local machine and a stage. A browser cannot reach the local file system. The Snowsight worksheet cannot run them. Use SnowSQL, the Snowflake CLI, or a driver. Tell: “uploaded the file by running PUT in a worksheet.”

Gotcha 3: Directory tables make a stage queryable

A directory table returns one row per file in the stage, with its name, size, and last-modified time. It is the metadata layer over unstructured files. Tell: “list the files in the stage and their metadata as rows.”

Gotcha 4: Presigned URL is the one a non-Snowflake tool can open

A scoped URL and a file URL still need Snowflake authentication. A presigned URL carries a temporary signature, so an external tool opens it with no Snowflake login. Tell: “share the file with a tool that has no Snowflake credentials.”

Gotcha 5: Encryption mismatch breaks the file URL

If the stage encryption setting does not match how the file was written, the download fails when you open the URL. Tell: “the file URL returns an error opening the file,” paired with a server-side encryption detail in the setup.

Gotcha 6: Default ON_ERROR differs by load type

Bulk COPY INTO defaults to ABORT_STATEMENT: one bad row fails the whole load. Snowpipe defaults to SKIP_FILE: the bad file is skipped and the rest continue. Tell: “bulk load, default error handling” points to ABORT_STATEMENT; “continuous pipe” points to SKIP_FILE.

Gotcha 7: PURGE deletes staged files after a successful load

PURGE = TRUE removes the source files from the stage once the load succeeds. A failed purge returns no error, so confirm with LIST. Tell: “remove the files from the stage automatically after loading.”

Gotcha 8: FORCE reloads regardless of history

FORCE = TRUE loads every named file again, even files already marked as loaded. It can create duplicates on purpose. Tell: “load the same files again right now,” with no mention of changed data.

Gotcha 9: Load metadata expires after 64 days

Snowflake tracks load status per file for 64 days from the file’s last-modified date. Inside that window, a re-run skips already-loaded files. Tell: the words “staged more than 64 days ago” or a date math setup in the stem.

Gotcha 10: Past 64 days, you need LOAD_UNCERTAIN_FILES or FORCE

Once metadata expires, COPY cannot tell whether a file loaded, so it skips it by default. Set LOAD_UNCERTAIN_FILES = TRUE (or use FORCE) to load it anyway. Tell: “the file is old and the COPY skipped it.”

Gotcha 11: A repeat COPY loads zero rows

Re-running the same COPY INTO on files already loaded returns 0 rows loaded, not duplicates. This is load-history protection working as designed. Tell: “ran the COPY twice and the second run loaded nothing.”

Gotcha 12: Unload writes many files by default

COPY INTO @stage FROM table writes multiple files in parallel. SINGLE = TRUE forces one file and is slower. MAX_FILE_SIZE caps each file’s size. Tell: “one unload command produced several files, why.”

Gotcha 13: Parquet preserves fixed-point as DECIMAL

On unload to Parquet, fixed-point columns become DECIMAL. Floating-point columns become DOUBLE. In training sessions I have run, the most-missed unload fact is exactly this. The trap option says fixed-point becomes DOUBLE. Tell: “unloaded a NUMBER column to Parquet, what type.”

Gotcha 14: HEADER=TRUE keeps column names on unload

Unload writes data only by default. HEADER = TRUE adds the column names as the first row. Tell: “the unloaded file should keep the column names.”

Gotcha 15: Snowpipe is serverless

Snowpipe uses Snowflake-managed compute. It does not run on a user warehouse. Tell: “which warehouse size should the pipe use.” The answer is none.

Gotcha 16: Snowpipe Streaming is a separate API

Snowpipe Streaming writes rows straight from an application with sub-second latency. It does not stage files and it does not use COPY INTO. Tell: “row-level, no staged files, lowest latency” points to Streaming, not classic Snowpipe.

Gotcha 17: A stream stores an offset, not data

A stream holds a pointer to a point in the source table’s history. It does not keep a copy of the changed rows. That is why you can create many streams cheaply. Tell: “the stream stores the changed data” is the false option.

Gotcha 18: Querying a stream does not advance it

A plain SELECT on a stream returns the changes and leaves the offset where it was. Only a DML statement that consumes the stream advances the offset. Tell: “ran SELECT on the stream and then it was empty” is wrong.

Gotcha 19: Stream types split by what they track

Standard tracks inserts, updates, and deletes. Append-only tracks inserts only on tables and views. Insert-only is for external tables. Tell: “track only new rows on a standard table” points to append-only.

Gotcha 20: Gate a task with SYSTEM$STREAM_HAS_DATA

A task can check SYSTEM$STREAM_HAS_DATA('s') in its WHEN clause and skip the run when nothing changed. Tasks run on a warehouse or serverless, and chain into DAGs. Tell: “run the task only when the stream has changes.”

Gotcha 21: Storage integration keeps keys out of DDL

A storage integration grants cross-account access to cloud storage without putting credentials in the CREATE STAGE statement. Tell: “load from S3 without embedding access keys in the stage.”

Gotcha 22: API integration powers external functions and Git

An API integration authorizes calls to an external endpoint. External functions need one. Git integration needs one too, and it clones a repo into a repository stage for version-controlled SQL and notebooks. A driver connects a programming language. A connector integrates an ecosystem such as Spark or Kafka. Tell: “connect a Git repository” or “call an external endpoint” points to API integration; “load from Kafka” points to the connector.

⚡

Week 5 Readiness Checklist

If you can confidently…	You’re solid on	Otherwise re-skim
Name the three stage prefixes and what each points to	Day 29	Day 29: Stages
Say why PUT and GET need a client, not a worksheet	Day 29	Day 29: PUT/GET tooling
Explain what a directory table returns	Day 29	Day 29: Directory tables
Pick the file URL type a non-Snowflake tool can open	Day 29	Day 29: File URLs
State the default ON_ERROR for bulk versus Snowpipe	Day 30, Day 32	Day 30: ON_ERROR
Explain PURGE, FORCE, and the 64-day window	Day 30	Day 30: COPY options
Use LOAD_UNCERTAIN_FILES for an expired file	Day 30	Day 30: Load history
Predict multiple versus single output files on unload	Day 31	Day 31: Unloading
Recall Parquet type mapping for fixed-point versus float	Day 31	Day 31: Parquet types
Set HEADER and MAX_FILE_SIZE on an unload	Day 31	Day 31: Unload options
Explain why Snowpipe needs no warehouse	Day 32	Day 32: Snowpipe
Distinguish Snowpipe from Snowpipe Streaming	Day 32	Day 32: Streaming
Describe what a stream stores	Day 33	Day 33: Streams
State what advances a stream offset	Day 33	Day 33: Offset rules
Match a stream type to a use case	Day 33	Day 33: Stream types
Gate a task on stream data with a WHEN clause	Day 33	Day 33: Tasks
Explain storage integration versus API integration	Day 34	Day 34: Integrations
Describe Git integration and driver versus connector	Day 34	Day 34: Connectors

🎯 Exam Tip

Domain 3 questions reward reading the verb and the default. Two distractor families dominate. The first swaps a default value: it offers SKIP_FILE where the load is bulk, or ABORT_STATEMENT where the load is Snowpipe. The first thing to fix is which load type the stem describes. The second family swaps a mechanism. It pairs Snowpipe with a warehouse, claims a stream stores data, or says a plain SELECT empties the stream. Name the mechanism before you read the options. On multi-select, the stem states how many to pick. Partial credit is rare on the COF-C03, so both answers must be right.

📝 Week 5 Wrap

You’ve covered: the three stage types, external stages, file formats, and directory tables. On loading, you have COPY INTO with its ON_ERROR, PURGE, FORCE, and 64-day rules. On unloading, you have single versus multiple files and Parquet type mapping. You also have Snowpipe, Snowpipe Streaming, and Streams and Tasks for change data capture. Finally, the drivers and connectors, plus the storage, API, and Git integrations that link Snowflake to the outside.

Take the Week 5 Practice Test (10 Domain 3 questions). Target 8 out of 10. If you score 6 or below, re-read the days flagged in your wrong answers before starting Week 6.

Tomorrow (Day 36): Week 6 begins. We move into Domain 4, Performance Optimization. It opens with Query Profile and Query Insights. You learn to read the operator tree and spot bytes spilled to local versus remote storage. You also learn to catch weak pruning, exploding joins, and the gap between queue time and execution time. That is the first day of the stretch that takes you through workload management, caching, clustering, and search optimization by the end of the week.

← Day 34 All 50 Days Take Practice Test →

About the Author

Abhay Krishnan

Senior Data & AI Consultant

Connect on LinkedIn

With over five years of data engineering experience at EY and Infosys, Abhay Krishnan specializes in building scalable data pipelines and cloud warehousing solutions. He is a certified SnowPro Core professional, alongside credentials in AWS and Azure. Abhay created this 50-day track to solve a problem he faced firsthand: the lack of a structured, free resource for Snowflake certification prep. Follow him on LinkedIn for more data engineering insights.

Week 5 Recap

The 30-Minute Drill

Week 5 Exam Gotchas

Gotcha 1: Stage prefixes decode the question

Gotcha 2: PUT and GET need a client, not a worksheet

Gotcha 3: Directory tables make a stage queryable

Gotcha 4: Presigned URL is the one a non-Snowflake tool can open

Gotcha 5: Encryption mismatch breaks the file URL

Gotcha 6: Default ON_ERROR differs by load type

Gotcha 7: PURGE deletes staged files after a successful load

Gotcha 8: FORCE reloads regardless of history

Gotcha 9: Load metadata expires after 64 days

Gotcha 10: Past 64 days, you need LOAD_UNCERTAIN_FILES or FORCE

Gotcha 11: A repeat COPY loads zero rows

Gotcha 12: Unload writes many files by default

Gotcha 13: Parquet preserves fixed-point as DECIMAL

Gotcha 14: HEADER=TRUE keeps column names on unload

Gotcha 15: Snowpipe is serverless

Gotcha 16: Snowpipe Streaming is a separate API

Gotcha 17: A stream stores an offset, not data

Gotcha 18: Querying a stream does not advance it

Gotcha 19: Stream types split by what they track

Gotcha 20: Gate a task with SYSTEM$STREAM_HAS_DATA

Gotcha 21: Storage integration keeps keys out of DDL

Gotcha 22: API integration powers external functions and Git

Week 5 Readiness Checklist

Topics