Cortex Analyst & Snowflake ML Functions
Day 17 closes sub-objective 1.6 with two topics. Cortex Analyst converts natural language to SQL via a semantic model. Snowflake ML adds forecasting, anomaly detection, and classification as SQL-callable classes. The fact tested most often as a multi-select trap: ML Models are first-class database objects, one of the 12 from Day 5.
| Term | Plain meaning |
|---|---|
| Text-to-SQL | Take a natural-language question. Produce a SQL query that answers it. Cortex Analyst is Snowflake’s hosted text-to-SQL service. |
| Semantic model | A description of what your data means in business terms. It maps tables and columns to business names, synonyms, and measures. The LLM then understands “last quarter revenue” the way a human analyst would. Defined in YAML. |
| Logical table / Logical column | What the semantic model exposes to Cortex Analyst. A logical table points at a base table but layers business names, synonyms, and derived measures on top. |
| Dimension / Measure | Dimensions are the categorical “by what” columns: region, product, date. Measures are the numbers you aggregate: revenue, count, average order value. |
| Semantic view | A newer first-class Snowflake object that holds the semantic model inside the database instead of as a YAML file on a stage. Schema-level, governable by RBAC. |
| Time-series | Data captured over time: one timestamp column, one target value column. The input shape FORECAST and ANOMALY_DETECTION expect. |
| Snowflake ML built-ins | Pre-built ML capabilities exposed as SQL classes: FORECAST, ANOMALY_DETECTION, CLASSIFICATION. Train once with SQL, then call model!FORECAST(...) to predict. |
| Model Registry | A schema-level store of trained ML model objects. ML Models are first-class: one of the 12 from Day 5. |
Today’s Concept
Micro-Concept 1: Cortex Analyst, the One-Liner
One sentence to commit to memory:
Cortex Analyst answers business questions in natural language by converting them to SQL. The semantic model is the bridge between business terms and your database schema.
It is a Snowflake-hosted text-to-SQL service. A user asks “What was net revenue in EMEA last quarter?” Cortex Analyst consults the semantic model, generates SQL, and runs it on a warehouse you specify. It then returns the result along with the SQL it executed. Transparency is built in. The service is delivered as a REST API. The typical front-end is a Streamlit-in-Snowflake app or a chatbot, not a SQL worksheet.
Three architectural facts the exam returns to:
- A semantic model is required. Schema-only text-to-SQL fails on real workloads because raw column names do not carry business meaning. CUST_ID does not tell an LLM what a customer is. The semantic model is what makes the output usable. The exam tests this first.
- Data never leaves Snowflake. The LLMs are Snowflake-hosted (Meta Llama and Mistral by default). Only the semantic model’s metadata is passed to the model for SQL generation. The customer data itself stays inside the governance boundary.
- RBAC still applies. The generated SQL executes as the calling role. Masking policies, row access policies, and table grants all enforce. Cortex Analyst is not a way around security.
Micro-Concept 2: The Semantic Model, What’s Inside
A semantic model is a YAML document. The core structure looks like this:
name: sales_semantic_model
description: Sales analytics for global e-commerce
tables:
- name: orders # logical table name (business-friendly)
description: Customer orders, one row per order
base_table:
database: sales_db
schema: public
table: orders # physical table
dimensions: # "by what" columns
- name: region
synonyms: ['area', 'territory']
expr: customer_region
data_type: VARCHAR
- name: product_category
expr: prod_cat
data_type: VARCHAR
time_dimensions: # timestamp columns
- name: order_date
expr: o_orderdate
data_type: DATE
measures: # numeric values to aggregate
- name: revenue
synonyms: ['sales', 'net revenue']
expr: o_totalprice
data_type: NUMBER
default_aggregation: sum
verified_queries: # optional: example Q&A pairs that boost accuracy
- name: revenue_by_region_last_quarter
question: "What was revenue by region last quarter?"
sql: |
SELECT region, SUM(revenue) AS revenue
FROM orders
WHERE order_date >= DATEADD(QUARTER, -1, CURRENT_DATE)
GROUP BY region;
Names to recognise on the exam. Logical tables point at base tables. Dimensions are categorical “by-what” columns. Time dimensions are the timestamps Cortex Analyst uses to interpret time filters. Measures are numeric columns with default aggregations. Synonyms capture alternative business terms. Verified queries are an optional bank of curated question-and-SQL pairs that sharpen accuracy on similar questions.
Two ways to deliver the model:
- Legacy: YAML file on a Snowflake stage. Cortex Analyst reads it via the API call. Still fully supported.
- Recommended: Semantic View. A schema-level Snowflake object created with
CREATE SEMANTIC VIEW .... Governed by RBAC, shareable through standard sharing.
For C03, the question stems still describe the YAML-on-stage flow. Semantic Views are good background context. The exam’s standard phrasing remains “semantic model in YAML.”
Micro-Concept 3: Snowflake ML, the Built-in Function Families
Snowflake ML exposes pre-built machine learning as SQL-callable classes. Three families to know:
| Family | Class / Function | What it does | Input data shape |
|---|---|---|---|
| Time-series forecasting | FORECAST (under SNOWFLAKE.ML) | Predict future values of a numeric metric over time. Handles single and multi-series. | Timestamp column + target numeric column (+ optional series ID) |
| Anomaly detection | ANOMALY_DETECTION → DETECT_ANOMALIES | Flag outliers in time-series data based on a fitted forecast. | Same as FORECAST |
| Classification | CLASSIFICATION → PREDICT | Sort rows into two or more classes based on feature columns. | Features + a class label column (training); features only (prediction). |
The lifecycle is the same across all three. (1) Train with CREATE SNOWFLAKE.ML.<CLASS> ... against a training table or view. (2) Call model!FORECAST(...), model!DETECT_ANOMALIES(...), or model!PREDICT(...) to get predictions. (3) Reuse the model. It persists as a database object.
The most-tested fact on this topic: Snowflake ML built-ins are SQL-callable. No Python required. It is the same trap shape as the Cortex AI no-Python claim from Day 16. If an answer says “you must use Snowpark Python to call FORECAST,” that is the wrong option. You can drive these from Snowpark if you want to. You do not have to.
One terminology note that catches people. “Snowflake ML” is a broader umbrella. It also covers the Snowpark ML Python library, used by data scientists who train custom models in Python. It covers the Model Registry, which stores trained models regardless of how they were built. The SQL-callable built-in classes above are the C03-relevant subset.
Micro-Concept 4: ML Models Are First-Class Database Objects
This is the multi-select trap that connects Day 5 to Week 3. A trap I keep seeing in practice tests: a “which of these are database objects?” multi-select that mixes ML Models with account-level securables. Learners overthink it and skip ML Models. Don’t.
Recall the 12 from Day 5: Tables, Views, Schemas, Stages, File Formats, Sequences, UDFs, Stored Procedures, Pipes, Shares, ML Models, and Applications.
ML Models being on that list means:
- They live at the schema level, just like tables and views.
- They have owners and respond to standard RBAC: GRANT, REVOKE, ownership transfer.
- They show up in
SHOW MODELSand inSNOWFLAKE.ACCOUNT_USAGEmetadata views. - They can be cloned, dropped, replaced.
- They can be shared through Secure Data Sharing where applicable.
The exam trap is putting ML Models next to Roles, Users, Warehouses, or Resource Monitors. ML Models belong. The others are account-level securables and do not.
Micro-Concept 5: Cortex Analyst vs Cortex Search, Don’t Mix Them Up
These two get paired as distractors constantly. The line between them is clean:
| Cortex Analyst | Cortex Search | |
|---|---|---|
| Input data type | Structured (tables, columns) | Unstructured text (documents, descriptions) |
| Question style | “What’s the X for Y by Z?” (analytical) | “Find documents about X” (retrieval) |
| How it works | NL → SQL via semantic model | Hybrid retrieval (vector + lexical + reranker) |
| Output | A data result set + the SQL that produced it | A ranked list of matching text rows |
| Configuration artifact | Semantic model (YAML or Semantic View) | CREATE CORTEX SEARCH SERVICE on a text column |
| Primary use case | Self-service BI / talk-to-your-data | RAG / enterprise document search |
Read the stem carefully. “Users want to ask business questions and get back rows from our sales table” is Cortex Analyst. “Users want to find relevant support tickets from a free-text query” is Cortex Search. The two can sit together in one RAG application. On the exam they are separate answers.
Cheat Sheet
| Concept | What to remember |
|---|---|
| Cortex Analyst one-liner | NL → SQL using a semantic model. LLM-powered, REST-API-driven. |
| Cortex Analyst REQUIRES | A semantic model: YAML on a stage, or a Semantic View. Without one, accuracy collapses. |
| Semantic model contents | Logical tables, dimensions, time dimensions, measures, synonyms, verified queries. |
| Cortex Analyst role | CORTEX_USER covers all Cortex features and is granted to PUBLIC by default. CORTEX_ANALYST_USER gates Cortex Analyst separately when you need narrower access. |
| Where data lives | Stays in Snowflake. Only semantic-model metadata reaches the LLM during SQL generation. |
| Snowflake ML built-ins | SQL-callable: FORECAST, ANOMALY_DETECTION, CLASSIFICATION. No Python required. |
| ML built-in pattern | CREATE SNOWFLAKE.ML.<CLASS> → model!FORECAST(...) / model!DETECT_ANOMALIES(...) / model!PREDICT(...) |
| Time-series inputs | Timestamp column + target numeric column for single-series. Add a series-ID column for multi-series. |
| ML models are objects | One of the 12 database objects from Day 5. Schema-level, GRANTable, visible in SHOW MODELS. |
| The 12 objects | Tables, Views, Schemas, Stages, File Formats, Sequences, UDFs, Stored Procedures, Pipes, Shares, ML Models, Applications. |
| Cortex Analyst vs Search | Analyst = structured + NL→SQL. Search = unstructured text + retrieval. |
| Snowflake ML scope | SQL built-ins + Snowpark ML Python library + Model Registry. Built-ins are the SQL-callable subset. |
Three high-yield traps to internalise before the exam. (1) Cortex Analyst without a semantic model. The scenario reads “the team enabled Cortex Analyst and pointed it at their database, but accuracy was poor.” The fix is always “build a semantic model.” Schema-only text-to-SQL is the failure mode the semantic model exists to solve. It is not warehouse size, not edition, not retraining the LLM. (2) Snowflake ML requires Python. FALSE. FORECAST, ANOMALY_DETECTION, and CLASSIFICATION are SQL-callable. The Python-required option is the trap, same family as the Cortex AI no-Python claim from Day 16. (3) “Which of these are database objects?” multi-select. When the option list mixes ML Models or Applications with Roles, Users, Warehouses, or Resource Monitors: ML Models and Applications are IN. The rest are OUT. Roles, users, warehouses, and resource monitors are account-level securables. They do not belong on the 12-object list.
One more split worth memorising. A stem about “finding documents related to a question” and one about “answering business questions on structured tables” are easy to confuse. The first is Cortex Search. The second is Cortex Analyst. They co-exist in real apps. On the exam they are separate answers.
Hands-On Lab
day10_orders from Day 10 still alive (we’ll use it as time-series input).Build a time-series view over day10_orders. FORECAST needs a timestamp column and a numeric target. Aggregate orders by day.
USE ROLE ACCOUNTADMIN;
USE WAREHOUSE lab_xs;
-- Build a daily order-count time series from day10_orders
CREATE OR REPLACE VIEW day17_daily_orders AS
SELECT
O_ORDERDATE AS order_day,
COUNT(*)::NUMBER AS order_count
FROM day10_orders
GROUP BY O_ORDERDATE
ORDER BY O_ORDERDATE;
-- Confirm shape: should be ~2,400 daily rows (TPCH spans 1992-1998)
SELECT MIN(order_day), MAX(order_day), COUNT(*) FROM day17_daily_orders;
Train a forecasting model. CREATE SNOWFLAKE.ML.FORECAST trains a model and registers it as a schema-level database object.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST day17_order_forecast(
INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'day17_daily_orders'),
TIMESTAMP_COLNAME => 'order_day',
TARGET_COLNAME => 'order_count'
);
-- The model is now a schema-level object — one of the 12 from Day 5.
SHOW SNOWFLAKE.ML.FORECAST;
SHOW SNOWFLAKE.ML.FORECAST lists it under the current schema, same pattern as SHOW TABLES.Generate forecasts. Call the model’s FORECAST method directly in a FROM clause.
-- Predict the next 30 days
SELECT * FROM TABLE(day17_order_forecast!FORECAST(FORECASTING_PERIODS => 30));
-- Check how well the model fit historical data
CALL day17_order_forecast!SHOW_EVALUATION_METRICS();
model!FORECAST(...). The ! separator marks a method call on a class instance rather than a regular function call. Same pattern for !DETECT_ANOMALIES and !PREDICT.Detect anomalies on the same time series. Anomaly Detection uses the same input shape. Internally it fits a forecast and flags points outside the predicted range.
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION day17_order_anomalies(
INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'day17_daily_orders'),
TIMESTAMP_COLNAME => 'order_day',
TARGET_COLNAME => 'order_count'
);
-- Score the historical data for anomalies, 95% prediction interval
SELECT *
FROM TABLE(
day17_order_anomalies!DETECT_ANOMALIES(
INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'day17_daily_orders'),
TIMESTAMP_COLNAME => 'order_day',
TARGET_COLNAME => 'order_count',
CONFIG_OBJECT => {'prediction_interval': 0.95}
)
)
WHERE is_anomaly = TRUE
ORDER BY order_day
LIMIT 20;
is_anomaly is a boolean column in the output. Tightening the prediction interval to 99% flags fewer points; loosening to 80% flags more. That single parameter controls sensitivity.Concept check: Cortex Analyst setup (no need to run). Cortex Analyst is REST-API-driven. It needs a semantic model plus a Streamlit (or external) client to exercise end-to-end. What the exam tests is the shape of the semantic model, not the deployment plumbing.
-- Concept check only — this is the legacy YAML-on-stage flow.
-- 1. Create a stage to hold the semantic model
/*
CREATE OR REPLACE STAGE day17_semantic_stage
DIRECTORY = (ENABLE = TRUE);
-- 2. Upload the YAML file (PUT from SnowSQL/CLI — can't be done from Snowsight worksheet)
-- PUT file:///path/to/sales_semantic_model.yaml @day17_semantic_stage;
-- 3. Cortex Analyst is called via REST:
-- POST https://<account>.snowflakecomputing.com/api/v2/cortex/analyst/message
-- Body: { "messages": [...], "semantic_model_file": "@day17_semantic_stage/sales_semantic_model.yaml" }
-- The newer (recommended) flow uses a Semantic View instead of a stage:
CREATE OR REPLACE SEMANTIC VIEW day17_sales_sv
TABLES (orders AS day10_orders PRIMARY KEY (o_orderkey))
DIMENSIONS (
orders.region AS o_orderpriority,
orders.status AS o_orderstatus
)
METRICS (
orders.total_revenue AS SUM(o_totalprice),
orders.order_count AS COUNT(*)
);
-- Cortex Analyst can then point at this semantic view by name — no stage file needed.
*/
Confirm ML Models are first-class objects.
-- ML Models behave like tables/views from a metadata perspective.
SHOW SNOWFLAKE.ML.FORECAST;
SHOW SNOWFLAKE.ML.ANOMALY_DETECTION;
-- ACCOUNT_USAGE tracks them too
SELECT model_name, schema_name, model_type, created
FROM SNOWFLAKE.ACCOUNT_USAGE.ML_MODELS
WHERE deleted IS NULL
ORDER BY created DESC
LIMIT 10;
Cleanup. Drop the two ML model objects and the helper view. day10_orders stays.
DROP SNOWFLAKE.ML.FORECAST IF EXISTS day17_order_forecast;
DROP SNOWFLAKE.ML.ANOMALY_DETECTION IF EXISTS day17_order_anomalies;
DROP VIEW IF EXISTS day17_daily_orders;
-- day10_orders persists through Day 39.
SERVICE_TYPE = 'AI_SERVICES' in METERING_DAILY_HISTORY. That is the same line item as Cortex LLM usage. Useful when you start attributing spend on Day 26.Snowflake Documentation
Official docs for today’s topics.
External References
Forecasting and anomaly detection references.
Practice Questions
Options:
A. Increase the warehouse size used by Cortex Analyst.
B. Build a semantic model (YAML or Semantic View) that defines logical tables, dimensions, measures, and business synonyms.
C. Re-train the Cortex Analyst LLM on the customer’s data.
D. Move the data to a Business Critical account.
Why B: Cortex Analyst’s accuracy depends on a semantic model that maps business terms to physical schema. Pointed at raw tables, the LLM has only column names to work from. CUST_ID does not tell it what a customer is. Logical tables, dimensions, measures, and synonyms are what make the SQL output reliable.
Why not A: Warehouse size affects how fast the generated SQL runs. It has no effect on whether the LLM picks the right tables and columns. Classic “throw more compute at it” trap.
Why not C: Cortex Analyst does not train on customer data. The LLM is Snowflake-hosted and managed. Trap option designed to catch users who think LLMs always learn from their data.
Why not D: Edition has no effect on Cortex Analyst accuracy. Trap that exploits the general pattern of “higher edition unlocks more features.”
Options:
A. SNOWFLAKE.CORTEX.COMPLETE
B. SNOWFLAKE.ML.FORECAST
C. SNOWFLAKE.CORTEX.SUMMARIZE
D. Cortex Search service
Why B: SNOWFLAKE.ML.FORECAST trains a time-series forecasting model. The input is a table or view with a timestamp column and a numeric target. After training, model!FORECAST(...) returns predicted future values with confidence intervals. Everything runs in SQL.
Why not A: COMPLETE generates LLM text continuations. It does not produce numeric forecasts. Trap pairs an LLM function with a numeric-prediction stem to test whether the candidate knows the ML-vs-Cortex split. Re-read Day 16 if missed.
Why not C: SUMMARIZE condenses text passages. Same trap family as A.
Why not D: Cortex Search retrieves matching documents. It does not produce numeric predictions. Trap mixes Cortex Search into ML territory to test whether the candidate keeps Search and ML straight.
Options:
A. Roles
B. ML Models
C. Resource Monitors
D. Warehouses
E. Stored Procedures
Why B: ML Models are schema-level first-class database objects, one of the 12 from Day 5. They have owners, respond to GRANT and REVOKE, appear in SHOW commands, and are tracked in SNOWFLAKE.ACCOUNT_USAGE.ML_MODELS.
Why E: Stored Procedures are also on the 12-object list. Schema-level, created with CREATE PROCEDURE, owned by a role, grantable.
Why not A, C, D: Roles, Resource Monitors, and Warehouses are account-level securables. They live above the database/schema hierarchy and are not on the 12-object list. This is the recurring multi-select trap. The options always look plausible because the listed items are all “real Snowflake things.” The split is by scope, not by familiarity. Re-read Day 5 if missed.
Options:
A. No — Snowflake ML built-ins require Snowpark Python for training and prediction.
B. No — ML built-ins are only callable from Snowflake Notebooks, not regular worksheets.
C. Yes — FORECAST, ANOMALY_DETECTION, and CLASSIFICATION are SQL-callable classes; training and prediction are both done in pure SQL.
D. Yes, but only if the user holds the ACCOUNTADMIN role.
Why C: The Snowflake ML built-in classes (FORECAST, ANOMALY_DETECTION, CLASSIFICATION) are designed to be SQL-callable. Training uses CREATE SNOWFLAKE.ML.<CLASS> ... and prediction uses model!FORECAST(...). Any SQL interface works, no Python needed.
Why not A: The “must use Python” trap. Snowflake also ships a Python ML library, but the built-ins exist precisely so SQL-only analysts can forecast and detect anomalies. Re-read Day 16 if missed. Same trap shape as Cortex SQL functions.
Why not B: Snowsight worksheets, drivers, and the CLI all work. Notebooks are convenient, not required. Trap exploits the assumption that “ML always needs a notebook.”
Why not D: Privileges are controlled by schema and model object grants, not by ACCOUNTADMIN. Trap defaults to the most powerful role when the actual answer is normal RBAC.
Options:
A. Cortex Search
B. Cortex Analyst
C. SNOWFLAKE.CORTEX.EXTRACT_ANSWER
D. Search Optimization Service
Why B: Cortex Analyst is the text-to-SQL service built for this exact use case. It handles natural-language questions over structured data, with a semantic model providing the business-to-schema mapping. The response is a data result set plus the SQL it ran.
Why not A: Cortex Search retrieves matching text documents. Wrong tool for analytical questions over structured tables. Trap pairs the two Cortex services that share branding but solve different problems. Re-read Day 16 if missed.
Why not C: EXTRACT_ANSWER works against a single unstructured text passage, not over structured tables.
Why not D: Search Optimization Service accelerates point-lookup SQL. It does not translate natural language to SQL. Trap exploits the word “search.”
Today you learned: Cortex Analyst is Snowflake’s text-to-SQL service. It is REST-API-driven, LLM-powered, and anchored by a semantic model. The semantic model is a YAML file or a newer Semantic View. It defines logical tables, dimensions, time dimensions, measures, synonyms, and verified queries. Without the semantic layer, accuracy collapses. With it, business users can ask analytical questions and get governed SQL answers. Snowflake ML offers three SQL-callable built-in families: FORECAST (time-series), ANOMALY_DETECTION (outliers in time-series), and CLASSIFICATION (categorical). Training is CREATE SNOWFLAKE.ML.<CLASS>; prediction is model!METHOD(...). ML Models are first-class database objects, one of the 12 from Day 5, and the most frequent multi-select trap on this topic. Cortex Analyst handles structured questions (NL → SQL); Cortex Search handles unstructured text retrieval. Both are SQL-callable. Neither requires Python.
Key takeaway: Sub-objective 1.6 is closed. The full AI/ML developer surface now sits in one place. Snowpark runs your code in Snowflake. Notebooks provide interactive cells. Streamlit-in-Snowflake delivers apps as native objects. Cortex AI SQL functions are the built-in LLM calls. Cortex Search handles hybrid retrieval. Cortex Analyst converts NL to SQL via the semantic model. Snowflake ML offers SQL-callable forecast, anomaly detection, and classification, plus the Model Registry. One thread runs through everything. It all executes inside Snowflake, with no data movement, governed by RBAC, and reachable from SQL where it can be. Expect 5–7 questions across this sub-objective on the real exam.
Tomorrow (Day 18): RBAC + DAC Foundations. Week 3 leaves AI/ML and enters Domain 2: Account Management & Governance. Day 18 covers the bedrock. Role-Based Access Control (RBAC) handles centrally-managed privileges. Discretionary Access Control (DAC) covers owner-grants. Owner-grants is the answer to official Sample Q4 (“framework where an object owner grants access on the objects they own”). Three security days follow before Week 3’s recap and test.