D3: Data Loading Week 5

DAY 32

Snowpipe & Snowpipe Streaming – Continuous Data Loading

Day 31 unloaded table rows out to files. Today turns loading the other direction into something continuous. A manual COPY INTO runs once, when you submit it. Snowpipe runs on its own and loads each file within minutes of its arrival. Snowpipe Streaming goes further. It writes rows straight into a table, with no files at all. Two of today’s facts, the ON_ERROR default and the serverless billing, get their own questions.

🗣️ Plain-English First

Today adds two services and three words the exam leans on. Each carries a default or a fact the stems assume. Read this table before the concepts. The Snowpipe and Snowpipe Streaming rows are the pair the exam contrasts most often.

Word you know	What it usually means	What it means in Snowflake
Pipe	A tube that carries water	A named Snowflake object that holds one `COPY INTO` statement. Snowpipe runs that statement to load staged files.
Serverless	Running without a server	Compute that Snowflake provides and manages for you. There is no virtual warehouse to create or size.
Auto-ingest	Take something in by itself	A pipe setting where a cloud event notification tells Snowpipe a new file landed. The load then starts without you.
Snowpipe	(a product name)	The serverless service that loads files in micro-batches, minutes after they reach a stage.
Snowpipe Streaming	(a product name)	A separate service that writes individual rows into a table through an SDK. No files, no stage, latency in seconds.

Why the serverless row matters today: Snowpipe needs no warehouse. A stem that claims “Snowpipe requires a warehouse you size” is false. Hold that one fact. Several questions fall out of it.

📘

Today’s Concept

Micro-Concept 1: Snowpipe, the Serverless Continuous Loader

Snowpipe loads files from a stage as soon as they arrive. It runs a COPY INTO statement stored inside a pipe. A pipe is a first-class Snowflake object. It holds one COPY INTO that names a source stage and a target table.

The load is serverless. Snowflake supplies and manages the compute. You never create or size a virtual warehouse for it. That is the first break from the manual loading of Day 30.

A manual COPY INTO runs once, the moment you submit it. To keep a table fresh, you would schedule that command to repeat. Snowpipe removes the schedule. It loads each file in micro-batches, within minutes of arrival.

SQL

-- A pipe wraps exactly one COPY INTO statement
CREATE PIPE sales_pipe
  AUTO_INGEST = TRUE
  AS
  COPY INTO sales
    FROM @sales_stage
    FILE_FORMAT = (TYPE = CSV);

Snowflake declines to promise a load latency. File size, file format, and the COPY complexity all change it. The honest planning number is minutes, not seconds, for Snowpipe. Snowpipe Streaming is the path when seconds matter.

Micro-Concept 2: How Snowpipe Finds a New File

Snowpipe has to learn that a file arrived. Two mechanisms do that. The first is auto-ingest. The second is the Snowpipe REST API.

Auto-ingest reads cloud event notifications. You set AUTO_INGEST = TRUE on the pipe. The cloud storage service sends a message when a file lands. Snowpipe reads that message and queues the file for loading.

The event service is different on each cloud provider. The exam tests this mapping by naming a cloud and asking for the service.

Cloud provider	Event service Snowpipe reads
Amazon Web Services	Amazon SQS, often fed by an SNS topic
Microsoft Azure	Event Grid
Google Cloud	Pub/Sub

The REST API is the second path. Your application calls a Snowpipe endpoint with a pipe name and a file list. This path uses AUTO_INGEST = FALSE. No cloud events take part. Auto-ingest needs an external stage, because the events come from cloud storage. The REST path and manual refresh also work on an internal stage.

One detail catches candidates. A pipe ignores files that were already staged before it was created. Those files raised no event. To load them, run ALTER PIPE ... REFRESH. REFRESH queues files staged within the last 7 days. In training sessions I have run, one Snowpipe surprise comes up most. Files staged before the pipe existed never load on their own. REFRESH inside the 7-day window is the fix.

SQL

-- Manual path: no cloud events needed
CREATE PIPE load_pipe
  AUTO_INGEST = FALSE
  AS
  COPY INTO load_test
    FROM @load_stage
    FILE_FORMAT = (TYPE = CSV);

-- Queue files already staged in the last 7 days
ALTER PIPE load_pipe REFRESH;

Micro-Concept 3: The ON_ERROR Default and the Metadata Window

Two Snowpipe defaults differ from a manual COPY INTO. Both sit in stems that read almost identically to the load side. The exam rewards knowing which value applies to which path.

The ON_ERROR default flips. A manual COPY INTO defaults to ABORT_STATEMENT, the Day 30 fact. Snowpipe defaults to SKIP_FILE. A bad file is skipped. The remaining files keep loading. A stem that says “by default” wants one of these two values, so read whether it describes bulk loading or Snowpipe.

Load metadata sits in a different place for a different length of time. A manual COPY INTO records its load history in the target table, kept for 64 days. Snowpipe records its history in the pipe object, kept for 14 days. Both stop a file of the same name from loading twice, even if its contents changed.

Behaviour	Bulk COPY INTO	Snowpipe
`ON_ERROR` default	`ABORT_STATEMENT`	`SKIP_FILE`
Compute	A warehouse you size	Serverless, Snowflake-managed
Load metadata	Target table, 64 days	Pipe object, 14 days
What triggers a load	You run the command	An event or a REST call

Micro-Concept 4: Snowpipe Streaming, Row-Level Ingest

Snowpipe Streaming is a separate service. It does not use files. It does not use COPY INTO. It writes individual rows straight into a target table through an SDK.

The latency drops from minutes to seconds. A client opens a channel and pushes rows. Snowflake commits them. They become queryable in seconds, sometimes under one second. This fits event streams such as clickstream data, IoT readings, and application logs.

Snowpipe Streaming complements Snowpipe. It does not replace it. Files still suit data that already arrives as files. Rows suit data that arrives as a continuous flow. In production projects I have worked on, the choice came down to file shape. Data already written as files went to Snowpipe. A live event feed went to Snowpipe Streaming.

Two architectures exist. The classic one uses a Java SDK and writes rows directly into the table. The newer high-performance one routes rows through a pipe object and reaches higher throughput. For the exam, the headline is the same across both. Snowpipe Streaming is row-level, file-free, and lower latency than Snowpipe.

Micro-Concept 5: The Snowpipe Cost Model

Snowpipe bills as serverless compute. There is no warehouse active-time charge, because there is no warehouse you control. Snowflake meters the load work and charges your account in credits.

One familiar control does not apply here. A resource monitor caps virtual warehouse credits. It cannot cap Snowpipe, because the Snowpipe compute is Snowflake-managed rather than a warehouse you own. The exam tests this point directly, so do not pair Snowpipe with a resource monitor in an answer.

File sizing drives the cost either way. The older model added a per-file fee on top of compute, so a flood of tiny files cost more than a few larger ones. Snowflake simplified this in December 2025 to a fixed credit charge per gigabyte ingested. The practitioner rule survives the change. Aim for files in the 100 to 250 MB compressed range and avoid a stream of tiny ones.

⚡

Cheat Sheet

Concept	What to remember	Exam keyword
Snowpipe	Serverless continuous loader. No virtual warehouse to size	“Snowpipe = serverless”
Pipe object	A named object holding one `COPY INTO` statement	“pipe wraps a COPY”
Latency	Minutes, in micro-batches. Snowflake will not quote a fixed number	“Snowpipe = minutes”
Detection	Auto-ingest (cloud events) or the Snowpipe REST API	“two ways to detect files”
Auto-ingest events	AWS = SQS (often via SNS). Azure = Event Grid. GCP = Pub/Sub	“event service per cloud”
ON_ERROR default	Snowpipe = `SKIP_FILE`. Bulk `COPY INTO` = `ABORT_STATEMENT`	“Snowpipe skips, bulk aborts”
Load metadata	Pipe object, 14 days. Bulk `COPY INTO` = target table, 64 days	“pipe 14, table 64”
ALTER PIPE REFRESH	Queues files staged in the last 7 days. For one-off fixes, not routine use	“REFRESH = last 7 days”
SYSTEM$PIPE_STATUS	Returns JSON: `executionState`, `pendingFileCount`, last error	“PIPE_STATUS check”
Snowpipe Streaming	Row-level, no files, no `COPY INTO`, SDK, latency in seconds	“Streaming = rows, no files”
Cost control	Serverless billing. A resource monitor cannot cap Snowpipe	“no resource monitor on Snowpipe”

🎯

Exam Tip

🎯 Exam Tip

Domain 3 continuous-loading questions cluster around four claims. Each one is true or false on a single fact.

First: “Snowpipe needs a warehouse.” False. Snowpipe is serverless and uses Snowflake-managed compute. The same trap appears as “attach a resource monitor to the Snowpipe warehouse.” A resource monitor caps only virtual warehouses you own.

Second: “The ON_ERROR default.” Read whether the stem says bulk loading or Snowpipe. Bulk COPY INTO defaults to ABORT_STATEMENT. Snowpipe defaults to SKIP_FILE. Same option name, opposite answer.

Third: “Snowpipe Streaming uses COPY INTO.” False. Streaming is a separate API. It writes rows through an SDK with no files and no stage. If the stem mentions files or COPY INTO, it is describing Snowpipe, not Snowpipe Streaming.

Fourth: “Which event service for this cloud?” Match the named cloud to its service. Azure is Event Grid. Google Cloud is Pub/Sub. AWS is SQS, often fed by SNS. Swapping these across clouds is the intended catch.

🛠️

Hands-On Lab

Type: LAB (guided) | Time: ~20 minutes | Credits: <0.5 | Prerequisite: lab_xs warehouse (Day 1) and day10_orders table (Day 10). Both should still be present. This lab only reads day10_orders; it is never modified. We build a pipe on an internal stage with AUTO_INGEST = FALSE, because a trial account cannot easily wire up cloud event notifications. The manual REFRESH path loads the same way.

Create a fresh stage, a file format, and a target table. The persistent day10_orders table is the read-only source. We unload a small sample from it to act as the “incoming” files a pipe would normally watch.

SQL

USE WAREHOUSE lab_xs;

CREATE OR REPLACE STAGE day32_stage
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

CREATE OR REPLACE FILE FORMAT csv_format
  TYPE = CSV;

CREATE OR REPLACE TABLE load_test (
  o_orderkey    NUMBER,
  o_orderstatus VARCHAR,
  o_totalprice  NUMBER(12,2)
);

-- Confirm the persistent source is present and untouched
SELECT COUNT(*) AS source_rows FROM day10_orders;

👀 Observe: source_rows returns roughly 1.5 million. The load_test table is empty. Nothing here writes to day10_orders; the next step reads from it.

Unload a sample into the stage. These become the staged files the pipe will load. We unload without a header so the file matches the plain csv_format.

SQL

COPY INTO @day32_stage/incoming/
  FROM (SELECT o_orderkey, o_orderstatus, o_totalprice
        FROM day10_orders LIMIT 5000)
  FILE_FORMAT = (TYPE = CSV)
  OVERWRITE = TRUE;

LIST @day32_stage/incoming/;

👀 Observe: LIST shows one or more .csv.gz files under incoming/. This is the unload form of COPY INTO from Day 31, used here only to seed the stage.

Create the pipe with AUTO_INGEST = FALSE. The pipe holds one COPY INTO from the stage into load_test.

SQL

CREATE OR REPLACE PIPE day32_pipe
  AUTO_INGEST = FALSE
  AS
  COPY INTO load_test
    FROM @day32_stage/incoming/
    FILE_FORMAT = (FORMAT_NAME = csv_format);

SHOW PIPES LIKE 'day32_pipe';
DESCRIBE PIPE day32_pipe;

👀 Observe: SHOW PIPES lists the pipe. DESCRIBE PIPE shows the stored COPY INTO definition. The pipe has loaded nothing yet, because the files were staged before it existed.

Refresh the pipe to queue the staged files. REFRESH loads files staged in the last 7 days. That window covers the ones from Step 2.

SQL

ALTER PIPE day32_pipe REFRESH;

-- Read the pipe status as JSON
SELECT SYSTEM$PIPE_STATUS('day32_pipe');

👀 Observe: SYSTEM$PIPE_STATUS returns a JSON string. Look for executionState set to RUNNING and a pendingFileCount above zero just after the REFRESH. The serverless load runs without a warehouse.

Confirm the rows landed. Give the serverless load a short moment, then check the table and the load history.

SQL

SELECT COUNT(*) AS loaded_rows FROM load_test;

SELECT *
  FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME  => 'load_test',
    START_TIME  => DATEADD('hour', -1, CURRENT_TIMESTAMP())));

👀 Observe: loaded_rows reaches 5000 once the load completes. COPY_HISTORY lists the file, its row count, and a status of Loaded. If loaded_rows is still zero, wait a few seconds and re-run; the load is asynchronous.

Clean up the lab objects and keep the source table.

SQL

DROP PIPE IF EXISTS day32_pipe;
DROP TABLE IF EXISTS load_test;
DROP FILE FORMAT IF EXISTS csv_format;

-- Remove the staged sample, then drop the stage
REMOVE @day32_stage/incoming/;
DROP STAGE IF EXISTS day32_stage;

-- Confirm the persistent table is untouched (Day 35 reads it)
SELECT COUNT(*) AS row_count FROM day10_orders;

👀 Observe: Every object built in this lab is dropped. The final SELECT COUNT(*) on day10_orders returns roughly 1.5 million rows, confirming the persistent Day 10 table is intact for later days.

📚

Snowflake Docs

Authoritative references for every fact in today’s post. The Snowpipe introduction and the Snowpipe Streaming overview are the two worth re-reading the day before the exam.

❄️

Snowpipe: Continuous Data Loading

docs.snowflake.com/en/user-guide/data-load-snowpipe-intro

→

❄️

Automate Loading With Cloud Messaging: Per-Cloud Event Services

docs.snowflake.com/en/user-guide/data-load-snowpipe-auto

→

❄️

Snowpipe Streaming Overview: Row-Level Ingest

docs.snowflake.com/en/user-guide/snowpipe-streaming/data-load-snowpipe-streaming-overview

→

🔗

External References

The ALTER PIPE reference covers the 7-day REFRESH window. The billing page documents the December 2025 pricing change.

Reference Snowflake Documentation

ALTER PIPE: REFRESH, the 7-Day Window, and Pipe Properties

Guide Snowflake Documentation

Snowpipe Costs: Serverless Billing and the Per-GB Model

❓

Practice Questions

Q1: What is the default value of the ON_ERROR option for a load performed by Snowpipe? ▸

Options:

A. ABORT_STATEMENT
B. SKIP_FILE
C. CONTINUE
D. SKIP_FILE_10%

✅ Answer: B

Why B: Snowpipe defaults ON_ERROR to SKIP_FILE. A file with errors is skipped. The other files keep loading. This differs from bulk loading on purpose.

Why not A: ABORT_STATEMENT is the default for a bulk COPY INTO <table>, not Snowpipe. Swapping the two paths is the intended trap. Re-read Day 30 if missed.

Why not C: CONTINUE loads the good rows and skips only the bad rows. It is a valid value, but not the Snowpipe default.

Why not D: SKIP_FILE_n and percentage forms exist as options, but none of them is the default value.

Q2: A Snowflake account is hosted on Microsoft Azure. Which service delivers event notifications that trigger auto-ingest Snowpipe? ▸

Options:

A. Amazon SQS
B. Event Grid
C. Pub/Sub
D. Azure Service Bus

✅ Answer: B

Why B: On Azure, auto-ingest Snowpipe reads Event Grid notifications. The cloud storage account sends an Event Grid message when a blob lands. Snowpipe then queues the file.

Why not A: Amazon SQS is the AWS service, often fed by an SNS topic. It is the right answer only for an AWS-hosted account.

Why not C: Pub/Sub is the Google Cloud service. Matching it to Azure is the cross-cloud catch the stem sets up.

Why not D: Azure Service Bus is a real messaging service, but Snowpipe auto-ingest on Azure uses Event Grid, not Service Bus.

Q3: Which statements describe Snowpipe compared to a manually scheduled COPY INTO? (Select TWO) ▸

Options:

A. Snowpipe uses Snowflake-managed serverless compute, with no warehouse to size.
B. Snowpipe loads files as they arrive, without a repeating schedule.
C. A resource monitor can cap the credits Snowpipe consumes.
D. Snowpipe delivers lower latency than Snowpipe Streaming.
E. Snowpipe loads every file in a single account-wide transaction.

✅ Answer: A and B

Why A and B: Snowpipe is serverless, so there is no warehouse to create or size. It is event-driven or REST-driven, so it loads each file soon after arrival rather than on a schedule you maintain.

Why not C: A resource monitor caps only virtual warehouses you own. Snowpipe compute is Snowflake-managed, so a resource monitor cannot cap it.

Why not D: Snowpipe Streaming is the lower-latency path, reaching seconds. Snowpipe runs in micro-batches measured in minutes.

Why not E: Snowpipe splits or combines files across one or more transactions. There is no single account-wide transaction.

Q4: Which statements about Snowpipe Streaming are correct? (Select TWO) ▸

Options:

A. It writes rows directly into a table through an SDK, with no files.
B. It achieves lower latency than file-based Snowpipe, reaching seconds.
C. It loads staged files by running a COPY INTO statement.
D. It requires an external stage to hold the incoming rows.
E. It replaces Snowpipe for all continuous loading.

✅ Answer: A and B

Why A and B: Snowpipe Streaming pushes individual rows through an SDK straight into the table. It skips files and stages entirely. That is how it reaches latency measured in seconds.

Why not C: COPY INTO is the file-based path used by bulk loading and Snowpipe. Snowpipe Streaming uses a separate row-level API.

Why not D: Streaming needs no stage, because it never writes files. The rows go straight to the table.

Why not E: Snowpipe Streaming complements Snowpipe. File-based data still loads through Snowpipe.

Q5: A team wants to cap the credits Snowpipe uses by attaching a resource monitor to it. What is the correct outcome? ▸

Options:

A. The resource monitor caps Snowpipe credits once attached.
B. A resource monitor cannot control Snowpipe credits, because Snowpipe uses Snowflake-managed serverless compute.
C. Snowpipe is free, so no cost control is needed.
D. Snowpipe is billed by the active time of the user’s current warehouse.

✅ Answer: B

Why B: Resource monitors control virtual warehouse credit usage. Snowpipe runs on Snowflake-managed serverless compute, not a warehouse you own, so a resource monitor cannot cap it.

Why not A: There is no Snowpipe warehouse for the monitor to attach to. The attachment does not apply.

Why not C: Snowpipe consumes credits as serverless compute. It is not free, so cost control still matters through file sizing.

Why not D: Snowpipe does not bill against your current warehouse. Its serverless compute is metered and billed separately.

📝 Recap

Today you learned: Snowpipe loads staged files continuously, in micro-batches, within minutes. It is serverless, so it needs no virtual warehouse. A pipe object holds one COPY INTO statement. Snowpipe finds files through auto-ingest (cloud events: SQS on AWS, Event Grid on Azure, Pub/Sub on GCP) or the REST API. Its ON_ERROR default is SKIP_FILE, where bulk loading uses ABORT_STATEMENT. Load metadata lives in the pipe for 14 days, against 64 days in the target table for bulk loads. ALTER PIPE ... REFRESH queues files staged in the last 7 days. Snowpipe Streaming is a separate, row-level service with no files and latency in seconds.

Key takeaway: Four facts carry most of the Day 32 questions. Snowpipe is serverless and cannot be capped by a resource monitor. The ON_ERROR default is SKIP_FILE, the opposite of bulk loading. The auto-ingest event service changes by cloud. Snowpipe Streaming is row-level with no COPY INTO. Get those four and the option syntax follows.

Tomorrow (Day 33): Streams and Tasks. We turn continuous loads into change data capture. A stream tracks inserts, updates, and deletes on a table without storing the data itself. A task runs SQL on a schedule, or fires only when a stream has data. Together they automate the pipeline.

← Day 31 All 50 Days Day 33 →

About the Author

Abhay Krishnan

Senior Data & AI Consultant

Connect on LinkedIn

With over five years of data engineering experience at EY and Infosys, Abhay Krishnan specializes in building scalable data pipelines and cloud warehousing solutions. He is a certified SnowPro Core professional, alongside credentials in AWS and Azure. Abhay created this 50-day track to solve a problem he faced firsthand: the lack of a structured, free resource for Snowflake certification prep. Follow him on LinkedIn for more data engineering insights.

Snowpipe & Snowpipe Streaming – Continuous Data Loading

Today’s Concept

Micro-Concept 1: Snowpipe, the Serverless Continuous Loader

Micro-Concept 2: How Snowpipe Finds a New File

Micro-Concept 3: The ON_ERROR Default and the Metadata Window

Micro-Concept 4: Snowpipe Streaming, Row-Level Ingest

Micro-Concept 5: The Snowpipe Cost Model

Cheat Sheet

Exam Tip

Hands-On Lab

Snowflake Docs

External References

Practice Questions

Topics