Snowpark, Snowflake Notebooks & Streamlit in Snowflake
Week 3 opens with the AI/ML developer stack: Snowpark, Notebooks, and Streamlit-in-Snowflake as one integrated developer surface. Snowpark hosts non-SQL code that runs inside Snowflake, which is the answer to official Sample Q5 and the most-tested sentence in sub-objective 1.6.
| Term | Plain meaning |
|---|---|
| Snowpark | A set of client libraries (Python, Java, Scala) plus a server-side runtime that lets you write code that runs inside Snowflake. No data movement, no external compute. |
| DataFrame | A table-like object you can manipulate in code. Snowpark’s DataFrame looks like pandas/Spark but compiles to SQL behind the scenes. |
| UDF / UDTF | User-Defined Function / User-Defined Table Function. Custom logic (often Python) that Snowflake stores and runs on a warehouse like a built-in function. |
| Stored procedure | A named, callable block of code (SQL, Python, Java, Scala, JavaScript). Called with CALL. Procedural: runs multiple statements, loops, conditionals. |
| Snowflake Notebook | An interactive Snowsight surface with cells. Each cell is SQL, Python, or Markdown. Like Jupyter, but lives inside Snowflake and uses a warehouse. |
| Streamlit in Snowflake | A native Snowflake object that hosts a Python web app. No external hosting, no separate URL infrastructure. Users open it through Snowsight. |
| Snowpark-Optimized warehouse | A warehouse type with ~16× the memory per node of a standard warehouse, built for memory-intensive Python (ML training, large UDFs). |
| Kernel / Runtime | The Python execution engine. In Notebooks, the kernel runs on either a warehouse (Warehouse Runtime) or a container (Container Runtime). |
Today’s Concept
Micro-Concept 1: Snowpark: The One Sentence
One sentence from today is non-negotiable:
Snowpark is how you host non-SQL code (Python, Java, Scala) so it runs inside Snowflake, on a warehouse, with no data movement.
That is the verbatim answer to official Sample Q5. The stem asks which feature lets a customer host non-SQL code (Java, Python) that operates on Snowflake data with high performance. The answer is Snowpark. Not External Functions, which call out to a third-party service. Not stored procedures on their own. Those are the typical callable container for Snowpark code, not the engine.
“Inside Snowflake” has three concrete components:
- Code lives in Snowflake, as a Snowpark stored procedure, UDF, or UDTF object.
- Compute is a Snowflake warehouse (standard or Snowpark-Optimized). Not your laptop, not a separate cluster.
- Data never leaves. The DataFrame operations translate to SQL, run on the warehouse, and return results in place.
The Snowpark client libraries can run anywhere: your laptop, a notebook, an application server. When that client calls session.sql(...) or operates on a DataFrame, the actual work happens on the Snowflake warehouse. Candidates in training sessions I have run often assume that because they typed Python on their laptop, the Python ran on their laptop. It did not.
Micro-Concept 2: Snowpark vs Stored Procedure vs External Function
This is the distinction the exam returns to repeatedly. Three patterns, three answers:
| You want to… | Use | Why |
|---|---|---|
| Run Python/Java/Scala code inside Snowflake, with no data leaving | Snowpark | Hosts the code on the warehouse. Can deliver as a stored procedure or UDF, but the engine is Snowpark. |
| Run a block of SQL (or other) statements as a single named, callable unit | Stored procedure | Procedural orchestration. Called with CALL. Often the wrapper for Snowpark code, but does not require Snowpark. |
| Call an external service (REST API, SaaS endpoint) from inside SQL | External Function | Sends data out to an external HTTPS endpoint via an API integration. Snowflake does not host the code; the third-party service does. |
| Apply custom logic inline inside a SELECT, on many rows | UDF (often a Snowpark UDF) | Returns a value; can be called many times per statement. Snowpark Python UDFs are the most common case. |
The key rule is direction. Snowpark = code comes into Snowflake. External Function = data goes out to an external endpoint. Remember the direction and you will answer the scenario question correctly.
Micro-Concept 3: Snowpark-Optimized Warehouses
Standard warehouses are sized for SQL workloads. Snowpark code can hit memory limits on a standard warehouse and spill to disk. ML training and large pandas-style operations are the common cases. Snowpark-Optimized warehouses fix that with roughly 16× the memory per node of a same-size standard warehouse.
| Property | Standard warehouse | Snowpark-Optimized warehouse |
|---|---|---|
| Memory per node | 1× (baseline) | ~16× baseline |
| Local SSD cache | 1× | ~10× |
| Minimum size | X-Small | Medium (XS/Small not supported in the classic mode) |
| Credit cost | 1× (baseline) | ~1.5× the same size’s standard credits/hour |
| Best for | SQL workloads, light Python | ML training, memory-heavy Python UDFs/SPROCs |
| Multi-cluster? | Yes (Enterprise+) | Yes (Enterprise+) |
| Query Acceleration Service? | Yes | No. Not supported. |
Four facts to remember for the exam: minimum size Medium, 16× memory, ~1.5× credit cost, and recommended for memory-intensive Snowpark workloads. ML training is the headline scenario. The newer RESOURCE_CONSTRAINT clause (released March 2025) does let you specify MEMORY_1X on an XSMALL warehouse. That is a configuration extension, not what the exam tests today. Stick to “Medium minimum” on test day. One common distractor says Snowpark-Optimized is “gated to Enterprise Edition or higher.” That is multi-cluster behaviour, not Snowpark-Optimized itself.
Micro-Concept 4: Snowflake Notebooks
A Snowflake Notebook is a Snowsight surface with cells. SQL cells, Python cells, and Markdown cells. They are not a separate edition. They are a Snowsight feature available across all editions.
Two facts make Notebooks exam-testable:
(a) Two runtime choices.
| Runtime | What it runs on | Best for |
|---|---|---|
| Warehouse Runtime | A virtual warehouse (the Notebook warehouse). Default Python 3.9 (3.10 also available). | General SQL + Python work, lightweight analytics, fast startup. |
| Container Runtime | A Snowpark Container Services compute pool (CPU or GPU). Many ML packages pre-installed. | ML training, GPU workloads, deep learning, custom Python packages. |
(b) Two warehouses, not one. A notebook on the Warehouse Runtime can use two warehouses at the same time:
- Notebook warehouse. Runs the kernel (the Python process driving the cells). Snowflake recommends X-Small here. The managed default is called
SYSTEM$STREAMLIT_NOTEBOOK_WH, provisioned per account. - Query warehouse. Runs the SQL/Snowpark queries the notebook submits. Can be larger to handle heavy queries on demand.
The split exists because the kernel warehouse stays active for the entire notebook session. You will see an EXECUTE NOTEBOOK query parked in Query History the whole time. The query warehouse only spins up when a cell actually submits work. Keeping them separate lets you pay X-Small rates for sit-and-think time, and run the bigger warehouse only when queries actually fire. On the exam, watch for stems describing this two-warehouse model. The kernel goes on the Notebook warehouse. The queries go on the Query warehouse.
Micro-Concept 5: Streamlit in Snowflake
Streamlit is an open-source Python library for building data apps with a handful of lines of code. Streamlit-in-Snowflake hosts those apps as first-class Snowflake objects. Not on an external server, not on a separate URL, not on your laptop.
Five facts to remember:
- Source lives in a named internal stage. The
STREAMLITobject points to it. External stages are not supported. - Access is RBAC. Viewers need USAGE on the database, schema, warehouse (or compute pool), and the STREAMLIT object itself.
- Two runtimes, mirroring Notebooks: Warehouse Runtime (the default) and Container Runtime (uses a compute pool).
- Warehouse runtime gives each viewer a personal app instance. Container runtime shares one long-running instance across all viewers. Cheaper at scale.
- Works with Snowpark, UDFs, and stored procedures. Anything you build with Snowpark can be wrapped in a Streamlit UI without leaving Snowflake.
Streamlit-in-Snowflake is the standard exam answer for any scenario that says “host a Python data app inside Snowflake without managing external infrastructure.” If the stem mentions Streamlit Community Cloud, a self-managed VM, or an external hosting platform, that is not Streamlit-in-Snowflake. Those are separate deployment options for the underlying open-source library.
Cheat Sheet
| Concept | What to remember |
|---|---|
| Snowpark one-liner | Host non-SQL code (Python, Java, Scala) inside Snowflake. No data movement. Sample Q5 answer. |
| Snowpark vs External Function | Snowpark = code comes IN. External Function = data goes OUT. |
| Snowpark-Optimized warehouse | 16× memory, ~1.5× credits, min size = Medium, NO Query Acceleration Service. |
| Best fit for Snowpark-Optimized | ML training, memory-intensive Python UDFs/stored procedures. |
| Notebook = SQL + Python + Markdown cells | Native to Snowsight. Default Python 3.9 (3.10 optional). |
| Notebook runtimes | Warehouse Runtime (on a warehouse) or Container Runtime (on a compute pool, CPU/GPU, for ML). |
| Notebook default warehouse | SYSTEM$STREAMLIT_NOTEBOOK_WH (Snowflake-managed, provisioned per account). |
| Two warehouses in a Notebook | Notebook warehouse (kernel, X-Small recommended) + Query warehouse (SQL pushdown, size on demand). |
| Streamlit-in-Snowflake = native object | Source on internal stage, RBAC for access, no external hosting. |
| Streamlit warehouse runtime | Each viewer gets a personal app instance. |
| Streamlit container runtime | All viewers share one long-running instance. More cost-effective at scale. |
| UDF vs stored procedure call rule | A single SQL statement can call many UDFs but only one stored procedure. |
| ML Models / Applications | Two of the 12 database objects (from Day 5). Both belong here in 1.6. |
Three near-identical-sounding distractors to keep straight:
(1) Snowpark hosts non-SQL code inside Snowflake on a warehouse. TRUE. This is the Sample Q5 answer and the most-tested sentence in sub-objective 1.6. (2) External Functions host non-SQL code inside Snowflake. FALSE. They call out to an external HTTPS endpoint, and the data is sent out. (3) Streamlit-in-Snowflake requires an external hosting service. FALSE. It is a native Snowflake object, opened through Snowsight, with no external infrastructure. One common trap: stems that mention “Java UDF for sentiment scoring” or “Python code with high performance on Snowflake data”. Those are Sample Q5 in disguise. The answer is Snowpark, even when the word Snowpark never appears in the question.
Hands-On Lab
day10_orders table from Day 10 should still exist (Section 14 of memory: it persists through Day 39).Confirm the Snowpark Python runtime is reachable from a regular worksheet. This proves Snowpark is server-side accessible. No local install required.
USE ROLE ACCOUNTADMIN;
USE WAREHOUSE lab_xs;
-- Inspect the Snowpark Python stored-procedure runtime
SHOW FUNCTIONS LIKE 'SYSTEM$ALLOW%' IN SNOWFLAKE.CORE;
-- Confirm day10_orders still exists from Week 2 (Day 10 lab)
SELECT COUNT(*) AS row_count FROM day10_orders;
-- Expect ~1.5M rows
day10_orders is your fixture for the next steps.Create a Snowpark Python stored procedure. This is the “host non-SQL code in Snowflake” pattern, the exact mechanism Sample Q5 asks about.
CREATE OR REPLACE PROCEDURE day15_top_status()
RETURNS TABLE (status STRING, order_count NUMBER)
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'run'
AS
$$
def run(session):
df = (session.table("day10_orders")
.group_by("O_ORDERSTATUS")
.count()
.sort("COUNT", ascending=False))
return df
$$;
-- Call it. The Python code runs on the warehouse, no data leaves Snowflake.
CALL day15_top_status();
run() function executes inside Snowflake on lab_xs. The DataFrame operations translate to SQL internally. This is Snowpark’s defining property: non-SQL code, hosted in Snowflake, no data movement.Create a Snowflake Notebook in Snowsight (GUI step).
In Snowsight: Projects » Notebooks » + Notebook. Fill in:
- Name:
day15_demo - Database / Schema: any database you have CREATE privileges on (e.g., your default database, public schema)
- Run on:
Warehouse - Python environment: Warehouse Runtime 1.0 (Python 3.9) is fine
- Query warehouse:
lab_xs - Notebook warehouse:
SYSTEM$STREAMLIT_NOTEBOOK_WH(Snowflake-managed default, recommended)
Click Create. The notebook opens with an empty cell.
EXECUTE NOTEBOOK query that stays running as long as the notebook is open. The Query warehouse only spins up when a SQL or Snowpark query is actually submitted from a cell.Add and run a SQL cell, then a Python cell. Inside the notebook:
In the first cell, set the cell type to SQL and paste:
SELECT O_ORDERSTATUS, COUNT(*) AS c
FROM day10_orders
GROUP BY 1
ORDER BY c DESC;
Click Run. The output renders as an interactive Streamlit DataFrame.
Add a second cell, set the type to Python, and paste:
from snowflake.snowpark.context import get_active_session
session = get_active_session()
df = session.table("day10_orders") \
.group_by("O_ORDERSTATUS") \
.count() \
.sort("COUNT", ascending=False)
df # In a Snowflake Notebook, just naming the DataFrame renders it.
Run the cell.
df.show() to get the same result.Create a Streamlit-in-Snowflake app (GUI step). This is the third surface in today’s stack.
In Snowsight: Projects » Streamlit » + Streamlit App. Fill in:
- Name:
day15_streamlit - Database / Schema: same as your notebook
- Run on:
Warehouse - App warehouse:
lab_xs
Click Create. Snowflake generates a starter streamlit_app.py. Replace its body with:
import streamlit as st
from snowflake.snowpark.context import get_active_session
st.title("Day 15 — Order Status Explorer")
session = get_active_session()
status_options = [r[0] for r in
session.sql("SELECT DISTINCT O_ORDERSTATUS FROM day10_orders").collect()]
picked = st.selectbox("Pick an order status:", status_options)
count = session.sql(
f"SELECT COUNT(*) AS c FROM day10_orders WHERE O_ORDERSTATUS = '{picked}'"
).collect()[0]["C"]
st.metric(label=f"Orders with status '{picked}'", value=f"{count:,}")
Click Run. The app renders inline.
Concept check (no need to run). A Snowpark-Optimized warehouse for a memory-intensive ML training stored procedure would be created like this. Most trial accounts will not run this cheaply, so this step is illustrative only.
-- Concept check only — minimum size is MEDIUM in the classic mode.
-- Costs roughly 1.5x a standard MEDIUM credit rate.
-- DO NOT RUN unless you intend to spend the credits.
/*
CREATE WAREHOUSE day15_ml_wh
WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
WAREHOUSE_SIZE = 'MEDIUM'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE;
*/
-- The newer RESOURCE_CONSTRAINT clause (GA March 2025) lets you do this:
/*
CREATE WAREHOUSE day15_ml_wh
WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
WAREHOUSE_SIZE = 'XSMALL'
RESOURCE_CONSTRAINT = 'MEMORY_1X'
INITIALLY_SUSPENDED = TRUE;
*/
-- For the exam, stick to the canonical answer: Snowpark-Optimized minimum = MEDIUM.
Cleanup. Remove the stored procedure and the GUI-created notebook + Streamlit app. day10_orders stays. It is the persistent fixture through Day 39.
DROP PROCEDURE IF EXISTS day15_top_status();
-- In Snowsight:
-- Projects » Notebooks » day15_demo » (ellipsis) » Delete
-- Projects » Streamlit » day15_streamlit » (ellipsis) » Delete
-- DO NOT drop day10_orders — it persists through Day 39.
EXECUTE NOTEBOOK query alive on its Notebook warehouse. The warehouse stays running until the session ends. I have seen idle notebooks run up meaningful overnight credit costs on a client project, because users closed the browser tab without ending the session. Always click the Active button at the top of the notebook and choose End session before walking away.Snowflake Documentation
External References
Snowpark-Optimized and runtime details.
Practice Questions
Options:
A. External Functions
B. Snowpark
C. Snowpipe
D. Snowflake Cortex SQL functions
Why B: Snowpark hosts non-SQL code (Java, Python, Scala) that executes inside Snowflake on a warehouse, with no data movement. This is the answer to official Sample Q5 and the most-tested fact in sub-objective 1.6. Whenever a stem mentions custom Java or Python with high performance on Snowflake data, the answer is Snowpark. The word itself need not appear in the question.
Why not A: External Functions call out to an HTTPS endpoint hosted by a third party. The data leaves Snowflake, the opposite direction of what the scenario describes.
Why not C: Snowpipe is for continuous data loading from a stage, not for hosting custom code. Re-read Day 4 if missed.
Why not D: Cortex SQL functions (SENTIMENT, SUMMARIZE, and so on) are pre-built and called from SQL. They cannot host the team’s own custom Java or Python code. Day 16 covers Cortex in detail.
Options:
A. A single executable statement can call only two stored procedures. In contrast, a single SQL statement can call multiple UDFs.
B. A single executable statement can call only one stored procedure. In contrast, a single SQL statement can call multiple UDFs.
C. A single executable statement can call multiple stored procedures. In contrast, multiple SQL statements can call the same UDFs.
D. Multiple executable statements can call more than one stored procedure. In contrast, a single SQL statement can call multiple UDFs.
Why B: The Snowflake documentation states the rule plainly. A single SQL statement can call multiple UDFs, but only one stored procedure. A stored procedure is invoked via CALL as its own independent statement. It cannot be embedded in an expression. UDFs return a value and can sit inline in SELECT lists, WHERE clauses, and so on. That is why many can appear in a single statement.
Why not A, C, D: All three distort the actual call rule. Option D’s “multiple statements can call multiple procedures” is technically true on its own but misses the comparison the documentation actually makes.
Why this matters for Day 15: Snowpark code ships either as a stored procedure (one call per statement, procedural) or as a UDF (many calls per statement, returns a value). Knowing which packaging fits which use case is exam-relevant.
Options:
A. The Notebook warehouse runs SQL queries; the Query warehouse runs the Python kernel.
B. The Notebook warehouse runs the Python kernel (and cell execution); the Query warehouse runs SQL and Snowpark pushdown queries.
C. Both warehouses split SQL traffic round-robin for load balancing.
D. The Query warehouse is only used when the notebook is closed.
Why B: The Notebook warehouse runs the kernel, the Python process driving the cells. It stays active for the entire session via an EXECUTE NOTEBOOK query. The Query warehouse spins up on demand to run the SQL and Snowpark queries submitted by the cells. Snowflake recommends a small Notebook warehouse (X-Small) paired with a separately-sized Query warehouse for heavy work.
Why not A: The roles are reversed. This is the classic flip-the-labels distractor. If you skim, you pick it.
Why not C: Snowflake does not load-balance round-robin between these two warehouses. They have distinct, non-overlapping roles.
Why not D: The Query warehouse is used while the notebook is actively running queries, not when it is closed.
Options:
A. Streamlit in Snowflake requires hosting on Streamlit Community Cloud and authenticates back to Snowflake via OAuth.
B. Streamlit in Snowflake is a native Snowflake object whose source files live on an internal stage, with access controlled through Snowflake RBAC.
C. Streamlit in Snowflake apps are only visible to the ACCOUNTADMIN role.
D. Streamlit in Snowflake apps can only read data via the Snowflake REST API.
Why B: Streamlit-in-Snowflake stores the app’s source files on a named internal stage. External stages are not supported. The STREAMLIT object is a first-class Snowflake securable. Viewers need USAGE on the database, schema, warehouse (or compute pool), and the STREAMLIT object. That is standard RBAC.
Why not A: External hosting is exactly what Streamlit-in-Snowflake removes. Streamlit Community Cloud is a separate deployment option for the open-source library and is not how Streamlit-in-Snowflake works.
Why not C: Any role granted the right privileges can view the app. ACCOUNTADMIN is not required, and granting it for viewing would be a security anti-pattern.
Why not D: Inside the app, Snowpark and SQL run against the app’s warehouse directly. There is no requirement to route through the REST API.
Options:
A. They provide roughly 16× the memory per node compared to a standard warehouse of the same size.
B. They are available starting from the X-Small size.
C. The minimum supported size (in the classic mode without the RESOURCE_CONSTRAINT clause) is Medium.
D. They are billed at the same credit rate as a standard warehouse of the same size.
E. They are only available on Business Critical Edition or higher.
Why A: Snowpark-Optimized warehouses provide approximately 16× the memory per node of a standard warehouse. This supports memory-intensive workloads like ML training without spilling to disk.
Why C: In the classic mode, the supported sizes start at Medium. X-Small and Small are not available. The newer RESOURCE_CONSTRAINT = 'MEMORY_1X' clause does let you run at XSMALL, but the standard exam answer remains “Medium minimum.”
Why not B: X-Small is not supported in the classic Snowpark-Optimized mode. This is the distractor designed to catch people who skip past the “in the classic mode” qualifier.
Why not D: Snowpark-Optimized warehouses cost roughly 1.5× the credits/hour of a standard warehouse of the same size, not the same rate.
Why not E: Snowpark-Optimized is available across editions. Multi-cluster capability is what is gated to Enterprise+, not the warehouse type itself. Re-read Day 9 if the multi-cluster edition gate is fuzzy.
Today you learned: Snowpark is how non-SQL code (Python, Java, Scala) runs inside Snowflake on a warehouse. That is the answer to Sample Q5 and the single most-tested fact in sub-objective 1.6. External Functions go in the opposite direction (data out to an HTTPS endpoint). Snowpark-Optimized warehouses give roughly 16× memory and 1.5× credits, with a Medium minimum size, for memory-hungry ML workloads. Snowflake Notebooks combine SQL and Python in cells. They use two warehouses (kernel on the Notebook warehouse, queries on the Query warehouse) and default to the managed SYSTEM$STREAMLIT_NOTEBOOK_WH. Streamlit-in-Snowflake hosts Python data apps as native Snowflake objects on an internal stage, accessed through Snowsight, governed by RBAC. No external hosting.
Key takeaway: Snowpark, Notebooks, and Streamlit-in-Snowflake share one design principle. Bring the code to the data, not the other way around. Once you internalise that, sub-objective 1.6 stops being a “memorise three products” exercise and becomes a “recognise which surface fits which scenario” exercise. The exam tests exactly that.
Tomorrow (Day 16). Cortex AI: SQL Functions + Cortex Search. Still in sub-objective 1.6, but the angle shifts from “host your own code” to “use Snowflake’s pre-built AI.” Cortex SQL functions are pure SQL (no Python required) for sentiment, summarisation, translation, completion, classification, and answer extraction. Cortex Search adds semantic and lexical retrieval on text columns. Expect one to two exam questions on the SQL-only nature of these functions.