Snowflake Data Unloading & Parquet Type Mapping
Day 30 read staged files from a stage into tables. Today’s command sends those same rows the other way. The command keeps its name: COPY INTO <location>. The output defaults and the Parquet type rules are where the exam questions sit. One Parquet fact below appears as its own single-select question.
Unloading reuses one command and adds five new options. Each option has a default the exam assumes when the stem stays silent. Read this table before the concepts. The SINGLE and HEADER rows are tested most often.
| Word you know | What it usually means | What it means in Snowflake |
|---|---|---|
| Unload | Take cargo off a truck | Write table rows out to files in a stage. The reverse of loading. The command is COPY INTO <location>, with a stage path on the left. |
| SINGLE | One of something | A copy option for file count. FALSE is the default and writes many files in parallel. TRUE writes exactly one file. |
| MAX_FILE_SIZE | The biggest a file can be | An upper target in bytes for each unloaded file. Default is 16 MB. Snowflake aims for it but does not promise it. |
| HEADER | A heading at the top of a page | A copy option that writes the column names as the first row of each output file. Off by default. |
| PARTITION BY | Sort items into bins | A copy option that splits rows into separate files using an expression. It builds a folder layout inside the stage. |
Why the SINGLE row matters today: the default is multiple files, not one. Candidates who expect a single output file pick the wrong answer on the most basic unload question. Get the default right and the rest of the options follow from it.
Today’s Concept
Micro-Concept 1: COPY INTO <location>, the Unload Direction
The same COPY INTO command runs in two directions. Day 30 used the load form, stage to table. Today uses the unload form, table to stage. The stage path moves to the left of the FROM clause. That single change flips the direction.
The source can be a table. It can also be a SELECT query. A query source accepts the full Snowflake SQL syntax, including JOIN. This is the reverse of the load side. A load transformation could not join, but an unload query can.
-- Unload a whole table to a stage path
COPY INTO @my_stage/unload/
FROM sales
FILE_FORMAT = (TYPE = CSV);
-- Unload the result of a query, joins allowed
COPY INTO @my_stage/unload/
FROM (
SELECT s.order_id, c.region, s.amount
FROM sales s
JOIN customers c ON s.cust_id = c.cust_id
)
FILE_FORMAT = (TYPE = CSV);Three destinations are valid: an internal stage, an external stage, and a direct external location. With no options set, the output is CSV. The files are GZIP-compressed and UTF-8 encoded. Those three defaults sit behind many “what does the file look like” stems.
Micro-Concept 2: One File or Many, SINGLE and MAX_FILE_SIZE
The first decision an unload makes is file count. The SINGLE option controls it. The default is SINGLE = FALSE. That setting writes multiple files. Each parallel thread writes its own file. The names follow a pattern like data_0_0_0.csv.gz.
The file count scales with the warehouse. One server writes up to eight files at once. A larger warehouse has more servers, so it produces more, smaller files. This is the opposite of intuition: scaling up makes files smaller, not bigger.
Set SINGLE = TRUE to force one output file. It runs without the parallel split, so it is slower on large tables. With SINGLE = TRUE and no filename in the path, the file lands with no extension. Supply a filename and extension in the path to fix that.
| Option | Default | What it controls |
|---|---|---|
SINGLE | FALSE | One file (TRUE) versus many parallel files (FALSE). |
MAX_FILE_SIZE | 16 MB | Upper target in bytes per file. Raise it for fewer, larger files. |
MAX_FILE_SIZE sets the upper target for each file in a multi-file unload. The default is 16 MB (16,777,216 bytes). The maximum supported value is 5 GB for an external stage. The target is not a promise. A file can finish smaller when memory runs short. A single set of rows can push a file slightly past the limit.
Micro-Concept 3: HEADER and OVERWRITE
Two options decide what the output looks like and what happens to old files. Both default to off. Both catch candidates who assume the friendlier behaviour.
HEADER writes the column names as the first row. The default is FALSE, so unloaded files carry no header by default. Set HEADER = TRUE when a downstream tool expects a name row. On a recent migration, an unload left HEADER at its default. The loader on the far side expected a header line and rejected every file.
OVERWRITE decides what happens when a file of the same name already sits in the path. The default is FALSE. A second unload to a path with matching filenames raises an error instead of replacing them. Set OVERWRITE = TRUE to replace the old files. This is the unload-side equivalent of a name collision, not the 64-day load metadata from Day 30.
-- One file, with a header row, replacing any old file
COPY INTO @my_stage/unload/orders.csv
FROM sales
FILE_FORMAT = (TYPE = CSV)
SINGLE = TRUE
HEADER = TRUE
OVERWRITE = TRUE;Micro-Concept 4: Parquet Type Mapping, the Fact With Its Own Question
Unloading to Parquet changes column types in a fixed, documented way. The exam tests one mapping directly. A fixed-point column unloads to a Parquet DECIMAL column. The precision and scale are preserved. A NUMBER(12,2) total stays exact in the file.
A floating-point column unloads to a Parquet DOUBLE column. That is the pair the exam plays on. Fixed-point goes to DECIMAL. Floating-point goes to DOUBLE. The trap swaps them and claims fixed-point becomes DOUBLE. That claim is false.
A few other mappings are worth a glance. VARIANT, GEOGRAPHY, and GEOMETRY unload as JSON-encoded strings. TIMESTAMP_NTZ(9) unloads as milliseconds, not nanoseconds. Some types must be cast before they unload. You can also cast a number explicitly to choose its Parquet type.
-- Fixed-point NUMBER(12,2) lands as Parquet DECIMAL, precision kept
COPY INTO @my_stage/unload/parquet/
FROM (SELECT order_id, amount FROM sales)
FILE_FORMAT = (TYPE = PARQUET);
-- Cast explicitly to pick the Parquet integer width
COPY INTO @my_stage/unload/parquet/
FROM (SELECT CAST(order_id AS INT) FROM sales)
FILE_FORMAT = (TYPE = PARQUET);Micro-Concept 5: Partitioned Unload With PARTITION BY
The PARTITION BY option splits rows into separate files using an expression. Each distinct value of the expression gets its own path. The result is a folder layout in the stage. This is how Snowflake feeds a data lake that reads by directory.
The filenames carry the partition values and a data_ prefix. Snowflake also adds a UUID to each filename. The reason is INCLUDE_QUERY_ID. It defaults to TRUE the moment PARTITION BY is set. The UUID is the query ID of the unload statement.
Two cautions appear in the documentation. An ORDER BY paired with PARTITION BY does not guarantee row order in the files. Snowflake also advises partitioning on dates or timestamps rather than on sensitive string or numeric values. The partition values become visible in the file paths. In production projects I have worked on, PARTITION BY on a date column is the standard way to lay out lake data by day.
-- Split rows into one folder per order status
COPY INTO @my_stage/unload/parts/
FROM (SELECT order_id, order_status, amount FROM sales)
PARTITION BY ('status=' || order_status)
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
HEADER = TRUE;
-- Produces paths like .../parts/status=O/data_..._.csv.gzOne account-level detail rounds this out. Two parameters can block unloading entirely. PREVENT_UNLOAD_TO_INLINE_URL stops ad hoc unloads to a cloud URL written into the statement. PREVENT_UNLOAD_TO_INTERNAL_STAGES stops unloads to any internal stage. Both are governance controls an admin sets, not copy options.
Cheat Sheet
| Concept | What to remember | Exam keyword |
|---|---|---|
| Unload command | COPY INTO <location>. Table or query out to a stage | “unload = reverse COPY” |
| Query source | A SELECT source allows JOIN, unlike a load transform | “unload SELECT can JOIN” |
| Default output | CSV, GZIP-compressed, UTF-8, multiple files | “CSV GZIP by default” |
| SINGLE | Default FALSE = many parallel files. TRUE = one file | “SINGLE default FALSE” |
| File count vs warehouse | Bigger warehouse makes more, smaller files. 8 files per server | “scale up = smaller files” |
| MAX_FILE_SIZE | Default 16 MB. Max 5 GB external. Upper target, not a promise | “16 MB default” |
| HEADER | Default FALSE. Set TRUE for a column-name row | “HEADER default FALSE” |
| OVERWRITE | Default FALSE. Same-named files raise an error | “OVERWRITE = TRUE to replace” |
| Parquet fixed-point | NUMBER / DECIMAL unloads to Parquet DECIMAL, precision kept | “fixed-point = DECIMAL” |
| Parquet floating-point | FLOAT / DOUBLE unloads to Parquet DOUBLE | “float = DOUBLE” |
| PARTITION BY | Splits rows into files by an expression. Builds a folder layout | “PARTITION BY” |
| Partition filenames | Carry partition values plus a UUID (INCLUDE_QUERY_ID default TRUE) | “UUID in partition files” |
Exam Tip
Domain 3 unload questions cluster around three patterns. Each maps to one fact you can state in a sentence.
First: “What is the default?” The three that get tested are SINGLE = FALSE (multiple files), HEADER = FALSE (no header row), and MAX_FILE_SIZE = 16 MB. A stem that says “by default” wants one of these three. The common miss is assuming a single output file.
Second: “Parquet type mapping.” Fixed-point columns unload to DECIMAL. Floating-point columns unload to DOUBLE. The trap statement reverses the pair. Read the column type in the stem, then match it to the right Parquet type.
Third: “Which option builds a directory layout?” The stem describes a folder-per-value structure for a data lake. The answer is PARTITION BY. The distractors are SINGLE and MAX_FILE_SIZE. Both change file count and size but never build folders.
Hands-On Lab
lab_xs warehouse (Day 1) and day10_orders table (Day 10). Both should still be present. This lab only reads day10_orders; it is never modified.Set up a fresh stage for the unloaded files. The persistent day10_orders table is the read-only source for every step below.
USE WAREHOUSE lab_xs;
CREATE OR REPLACE STAGE day31_stage
ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');
-- Confirm the source table is present and untouched
SELECT COUNT(*) AS source_rows FROM day10_orders;source_rows returns roughly 1.5 million. The SNOWFLAKE_SSE setting on the stage is the Day 29 habit carried forward. Nothing in this lab writes to day10_orders; every step unloads from it.Multi-file unload with the default SINGLE = FALSE. A small MAX_FILE_SIZE forces several output files.
COPY INTO @day31_stage/unload/multi/
FROM day10_orders
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
HEADER = TRUE
MAX_FILE_SIZE = 5000000
OVERWRITE = TRUE;
LIST @day31_stage/unload/multi/;LIST returns several files named like data_0_0_0.csv.gz. The 5 MB cap split the table across files. HEADER = TRUE wrote a column-name row into each one. Parallel threads produced the multiple files.Single-file unload with SINGLE = TRUE. A named path gives the file a clean extension.
COPY INTO @day31_stage/unload/single/orders_sample.csv
FROM (SELECT o_orderkey, o_orderstatus, o_totalprice
FROM day10_orders LIMIT 1000)
FILE_FORMAT = (TYPE = CSV)
SINGLE = TRUE
HEADER = TRUE
OVERWRITE = TRUE;
LIST @day31_stage/unload/single/;LIST shows exactly one file under single/. SINGLE = TRUE skipped the parallel split. The filename in the path gave it the .csv extension. Without that name, the file would land with no extension.Parquet unload to see the type mapping. The o_totalprice column is a fixed-point NUMBER(12,2).
COPY INTO @day31_stage/unload/parquet/
FROM (SELECT o_orderkey, o_orderstatus, o_totalprice
FROM day10_orders LIMIT 10000)
FILE_FORMAT = (TYPE = PARQUET);
LIST @day31_stage/unload/parquet/;LIST returns one or more .parquet files. Inside them, o_totalprice is a Parquet DECIMAL column. The precision and scale of NUMBER(12,2) are preserved. A floating-point column here would have become DOUBLE instead.Partitioned unload into a folder layout. Each order status gets its own directory.
COPY INTO @day31_stage/unload/parts/
FROM (SELECT o_orderkey, o_orderstatus, o_totalprice
FROM day10_orders)
PARTITION BY ('status=' || o_orderstatus)
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
HEADER = TRUE
MAX_FILE_SIZE = 5000000;
LIST @day31_stage/unload/parts/;LIST shows subfolders such as status=O/, status=F/, and status=P/. The three order-status values became three partition paths. Each filename carries a UUID, because INCLUDE_QUERY_ID defaulted to TRUE once PARTITION BY was set.Clean up the unloaded files and keep the source table.
-- Remove every unloaded file under this stage
REMOVE @day31_stage/unload/;
-- Drop the stage created for this lab
DROP STAGE IF EXISTS day31_stage;
-- Confirm the persistent table is untouched (Days 32 and 35 read it)
SELECT COUNT(*) AS row_count FROM day10_orders;REMOVE clears all unloaded files in one statement. After the DROP STAGE, the lab leaves nothing behind. The final SELECT COUNT(*) on day10_orders returns roughly 1.5 million rows, confirming the persistent Day 10 table is intact for tomorrow.Snowflake Docs
Authoritative references for every fact in today’s post. The COPY INTO <location> page and the considerations page are the two worth re-reading the day before the exam.
External References
Two Snowflake guides that walk through the unload steps and the download path.
Practice Questions
Options:
A. DECIMAL
B. DOUBLE
C. INTEGER
D. VARCHAR
Why A: Fixed-point types (NUMBER and DECIMAL with defined precision and scale) unload to Parquet DECIMAL columns. The precision and scale carry over unchanged, so the values stay exact.
Why not B: DOUBLE is where floating-point columns land on a Parquet unload. Swapping it with fixed-point is the intended trap.
Why not C: INTEGER would drop the fractional part. It is never the mapping for a fixed-point column.
Why not D: VARCHAR would discard the numeric type entirely. Parquet keeps the number, not a string.
Options:
A. As a single file named after the source table.
B. As multiple files written by parallel threads.
C. As one file per micro-partition in the table.
D. As a single compressed archive of all rows.
Why B: SINGLE defaults to FALSE. The unload splits across parallel threads and writes multiple files. The count scales with the warehouse size.
Why not A: SINGLE = TRUE produces one file, but it is not the default. The default is many files.
Why not C: Output file count tracks parallel threads and MAX_FILE_SIZE, not the source micro-partition count.
Why not D: Files are GZIP-compressed one by one, not bundled into a single archive.
Options:
A. SINGLE
B. HEADER
C. OVERWRITE
D. COMPRESSION
Why B: HEADER defaults to FALSE. Snowflake writes no column-name row unless you set HEADER = TRUE.
Why not A: SINGLE controls file count. It has no effect on whether a header row appears.
Why not C: OVERWRITE decides whether existing files are replaced. It does not add a header.
Why not D: COMPRESSION sets the codec. A GZIP default still leaves the header off on its own.
Options:
A. 16 MB, an upper target for each generated file.
B. 5 GB, a fixed size that every file must reach.
C. 128 MB, the Parquet row-group size.
D. 256 MB, the limit for single-file mode.
Why A: MAX_FILE_SIZE defaults to 16 MB. It is an upper target, not a guarantee. Files can finish smaller. A single set of rows can push a file slightly past the limit.
Why not B: 5 GB is the maximum supported value for an external stage, not the default. Nothing forces a file to reach it.
Why not C: 128 MB is the Parquet row-group size, a separate internal detail unrelated to the default.
Why not D: 256 MB is not a documented default. The intended target value is 16 MB.
Options:
A. It writes partition values into the unloaded file names and paths.
B. It guarantees row order inside each file when combined with ORDER BY.
C. INCLUDE_QUERY_ID defaults to TRUE once PARTITION BY is set.
D. It only works when the unload target format is Parquet.
E. It requires SINGLE = TRUE to take effect.
Why A and C: PARTITION BY splits rows into separate files by an expression. The partition values appear in the file paths and names. With partitioning on, INCLUDE_QUERY_ID defaults to TRUE, so a UUID is added to each filename.
Why not B: Pairing ORDER BY with PARTITION BY does not guarantee the order survives in the files.
Why not D: Partitioned unload works for CSV and JSON as well as Parquet. It is not Parquet-only.
Why not E: SINGLE = TRUE forces one file. That conflicts with splitting rows across partition files.
Today you learned: COPY INTO <location> unloads a table or a query result to a stage. A query source can use JOIN, unlike a load transform. The default output is multiple GZIP-compressed CSV files in UTF-8. SINGLE = FALSE is the default, so a single file needs SINGLE = TRUE. MAX_FILE_SIZE defaults to 16 MB as an upper target, not a promise. HEADER and OVERWRITE both default to FALSE. On a Parquet unload, fixed-point columns become DECIMAL and floating-point columns become DOUBLE. PARTITION BY splits rows into a folder layout and adds a UUID to each filename.
Key takeaway: Three facts carry most of the Day 31 questions. The unload defaults are SINGLE = FALSE, HEADER = FALSE, and MAX_FILE_SIZE = 16 MB. The Parquet pair is fixed-point to DECIMAL, floating-point to DOUBLE. The directory-layout option is PARTITION BY. Get those three and the rest is option syntax.
Tomorrow (Day 32): Snowpipe and Snowpipe Streaming. We move from manual unloads to continuous ingest. Snowpipe is serverless and needs no warehouse. Its ON_ERROR default flips to SKIP_FILE, the number we flagged back on Day 30. Snowpipe Streaming drops latency to row-level inserts through a separate API.
