D3: Data Loading Week 5

DAY 34

Snowflake Drivers, Connectors & Storage/API/Git Integrations

Day 33 automated change inside Snowflake with streams and tasks. Today connects Snowflake to the outside world. A driver lets a programming language open a session. A connector wires a whole data ecosystem like Spark or Kafka into Snowflake. An integration is a trust object that lets Snowflake reach external storage or HTTPS endpoints without embedding secrets. Two facts carry most of today’s questions. A storage integration keeps cloud credentials out of stage definitions. An API integration, not a storage integration, is what external functions and Git both need.

🗣️ Plain-English First

Five words anchor today’s topic. The exam tests which one solves which problem. Read this table before the concepts. The two integration rows are the ones stems try to swap.

Word you know	What it usually means	What it means in Snowflake
Driver	Something that operates a vehicle	A client library that lets one programming language connect to Snowflake and run SQL. JDBC, Python, and ODBC are examples.
Connector	A part that joins two things	A client that wires an external data system, such as Spark or Kafka, into Snowflake for reading or loading.
Integration	Combining parts into a whole	A named account-level object that grants Snowflake trusted access to an external service, without storing raw secrets in object DDL.
Storage integration	(not common usage)	The integration type that lets a stage reach cloud storage. The credentials live in the integration, not in the stage definition.
Git repository	A place where code is version-controlled	A Snowflake object that clones a remote Git repo. You reference its files like a stage and run them with EXECUTE IMMEDIATE FROM.

Why the integration rows matter today: a storage integration trusts cloud storage. An API integration trusts an HTTPS endpoint. A stem that hands a storage integration to an external function is wrong on the object. Several questions rest on that one swap.

📘 Today’s Concept

Micro-Concept 1: A Driver Connects One Programming Language

A driver is a client library you install in your application. It opens a session to Snowflake and runs SQL from your code. The driver is the bridge between one language and the Snowflake service.

Snowflake ships drivers for seven environments. The set is JDBC, ODBC, Python, Node.js, Go, .NET, and PHP. Every Snowflake driver uses TLS to secure the connection.

Two distinctions show up on the exam. JDBC and ODBC are standards-based. JDBC serves Java applications and many BI tools. ODBC serves C-based and SQL client tools. The Python connector, by contrast, implements the Python Database API v2.0 specification (PEP 249) and pairs well with libraries like pandas.

The word to hold is single. A driver connects from one runtime. It does not move a whole data platform into Snowflake. That job belongs to a connector, which is the next concept.

Micro-Concept 2: A Connector Integrates a Whole Ecosystem

A connector wires an external data system into Snowflake. It is broader than a driver. It carries data between a platform and Snowflake, with logic built for that platform.

The Spark connector is the first to know. It makes Snowflake a Spark data source for bi-directional read and write. It also pushes filters and query logic down into Snowflake, so Snowflake does the heavy work. This is the predicate and query pushdown the exam likes to name.

The Kafka connector is the second. It reads records from one or more Apache Kafka topics and loads them into Snowflake tables. It runs inside a Kafka Connect cluster.

One Kafka detail is worth a sentence on its own. The Snowflake Connector for Kafka can use Snowpipe Streaming to write rows directly into tables. This path skips staged files and gives lower latency than file-based loading. It is the row-based loading idea from Day 32, packaged inside the connector.

Hold the line this way. A driver connects from a language. A connector integrates a platform such as Spark or Kafka. A stem that calls Spark or Kafka a driver is wrong on the category.

Micro-Concept 3: A Storage Integration Keeps Credentials Out of the Stage

A storage integration is an account-level object. It stores a generated cloud identity for your external storage, plus a list of allowed and blocked locations. Its purpose is to let a stage reach cloud storage without embedding access keys in the stage definition.

The type is fixed for this use. You create it with TYPE = EXTERNAL_STAGE. A cloud administrator then grants that generated identity permission on the bucket or container. One storage integration can back many external stages.

The exam point is the credential question. Without an integration, a stage may carry raw keys in its DDL, so everyone who creates or alters the stage has to handle those secrets. With an integration, the stage names the integration and holds no secrets. This is the same external stage idea from earlier this week, now made credential-free.

Two facts round out the topic. Only the ACCOUNTADMIN role, or a role with the global CREATE INTEGRATION privilege, can create one. After creation you run DESCRIBE INTEGRATION to read the cloud identity Snowflake generated, then you trust that identity on the cloud side.

SQL

-- Storage integration: the stage holds no keys
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/myrole'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/path/');

-- The stage references the integration, not credentials
CREATE STAGE my_ext_stage
  STORAGE_INTEGRATION = s3_int
  URL = 's3://my-bucket/path/';
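
The privilege and identity steps around that DDL might look like the sketch below. The role names integration_admin and loader_role are hypothetical and are assumed to exist already. The property names in the comment are the ones DESCRIBE returns for an S3 storage integration.

```sql
-- Let a role other than ACCOUNTADMIN create integrations
-- (integration_admin is a hypothetical role name)
USE ROLE ACCOUNTADMIN;
GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE integration_admin;

-- Read the generated identity. For S3, look at the
-- STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID properties,
-- then add them to the IAM role's trust policy on the AWS side.
DESCRIBE INTEGRATION s3_int;

-- A role that creates stages on top of the integration needs USAGE on it
-- (loader_role is a hypothetical role name)
GRANT USAGE ON INTEGRATION s3_int TO ROLE loader_role;
```

None of these statements contains an access key. The only secret-like value is the generated identity, and that is trusted on the cloud side rather than stored in the stage.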

Micro-Concept 4: An API Integration Trusts an HTTPS Endpoint

An API integration is a different trust object. It grants Snowflake trusted access to an external HTTPS endpoint. It names the allowed URL prefixes and the provider type.

Two exam uses sit on this object. First, an external function needs an API integration to call a proxy service such as Amazon API Gateway. The provider is set with API_PROVIDER = aws_api_gateway, or the Azure and Google equivalents. The remote service behind the function runs on the cloud provider’s compute, outside Snowflake, not on a Snowflake virtual warehouse.

The second use is Git, covered in the next concept. A Git repository needs an API integration with API_PROVIDER = git_https_api.

Keep the two integrations apart. A storage integration trusts cloud storage for stages. An API integration trusts an HTTPS endpoint for external functions and Git. In training sessions I have run, the swap between these two is the single most common integration mistake.
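
A minimal sketch of the external-function pairing, assuming an Amazon API Gateway proxy is already deployed. The role ARN, the endpoint URL, and the function name echo_remote are placeholders for illustration, so the statements will not succeed until real values are substituted.

```sql
-- API integration for an Amazon API Gateway proxy
-- (the ARN and URL prefix are placeholders)
CREATE API INTEGRATION ext_func_int
  API_PROVIDER = aws_api_gateway
  API_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:role/REPLACE_ME'
  API_ALLOWED_PREFIXES = ('https://your-endpoint.example.com/')
  ENABLED = TRUE;

-- The external function names the integration and an endpoint URL
-- that falls under one of the allowed prefixes
CREATE EXTERNAL FUNCTION echo_remote(input_text VARCHAR)
  RETURNS VARIANT
  API_INTEGRATION = ext_func_int
  AS 'https://your-endpoint.example.com/echo';

-- Once the remote service exists, the function is called like any other
SELECT echo_remote('hello');
```

Note that no storage integration appears anywhere in this chain. The function's only trust object is the API integration.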

Micro-Concept 5: Git Integration Brings Version Control Into Snowflake

A Git repository object connects a remote Git repo to your account. It is new to the COF-C03 exam, so expect at least a recognition question on it.

The setup chains two objects. First you create an API integration with API_PROVIDER = git_https_api. Then you create the repository with CREATE GIT REPOSITORY, naming that integration, the origin URL, and an optional credentials secret.

The result is a clone inside Snowflake. It holds every branch, tag, and commit from the remote repo. You reference its files the way you reference a stage path: @my_repo/branches/main/file.sql.

The payoff is running version-controlled code in place. EXECUTE IMMEDIATE FROM runs a SQL file straight from the repository clone. You can also point a stored procedure or function handler at a Python file in the clone. The one fact to carry: Git integration relies on an API integration, never a storage integration.

SQL

-- 1. API integration for the Git provider
CREATE API INTEGRATION git_api_int
  API_PROVIDER = git_https_api
  API_ALLOWED_PREFIXES = ('https://github.com/my-account/')
  ENABLED = TRUE;

-- 2. The repository clone references that integration
CREATE GIT REPOSITORY my_repo
  API_INTEGRATION = git_api_int
  ORIGIN = 'https://github.com/my-account/my-repo.git';

-- 3. Run a SQL file straight from the clone
EXECUTE IMMEDIATE FROM @my_repo/branches/main/setup/create_tables.sql;
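
For a private repository, the optional credentials secret mentioned above enters the chain. The following is a sketch under assumed names: my_git_secret, git_api_int_private, my_private_repo, the user name, and the token placeholder are all hypothetical.

```sql
-- Secret holding the Git user name and a personal access token
-- (both values are placeholders)
CREATE SECRET my_git_secret
  TYPE = password
  USERNAME = 'my-git-user'
  PASSWORD = 'REPLACE_WITH_PERSONAL_ACCESS_TOKEN';

-- The API integration must list the secret as allowed
CREATE API INTEGRATION git_api_int_private
  API_PROVIDER = git_https_api
  API_ALLOWED_PREFIXES = ('https://github.com/my-account/')
  ALLOWED_AUTHENTICATION_SECRETS = (my_git_secret)
  ENABLED = TRUE;

-- The repository names both the integration and the secret
CREATE GIT REPOSITORY my_private_repo
  API_INTEGRATION = git_api_int_private
  GIT_CREDENTIALS = my_git_secret
  ORIGIN = 'https://github.com/my-account/my-private-repo.git';
```

The trust object is still an API integration. The secret only adds authentication on top of it.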

⚡ Cheat Sheet

Concept	What to remember	Exam keyword
Driver	Client library that connects one programming language to Snowflake and runs SQL	“driver = one language”
Driver list	JDBC, ODBC, Python, Node.js, Go, .NET, PHP. All use TLS	“seven drivers, all TLS”
Connector	Wires a whole data platform into Snowflake, not just one language	“connector = ecosystem”
Spark connector	Bi-directional read and write, with predicate and query pushdown	“Spark = pushdown”
Kafka connector	Reads Kafka topics into tables. With Snowpipe Streaming, writes rows, no staged files	“Kafka = topics to tables”
Storage integration	Account-level. Lets a stage reach cloud storage with no keys in the stage DDL	“storage int = no keys in stage”
Storage integration type	`TYPE = EXTERNAL_STAGE`. One integration backs many stages	“EXTERNAL_STAGE”
API integration	Trusts an HTTPS endpoint. Needed for external functions and Git	“API int = HTTPS endpoint”
External function	Requires an API integration (`aws_api_gateway`). Runs on cloud compute	“ext func = API int”
Git integration	New C03. `CREATE GIT REPOSITORY` over an API integration (`git_https_api`)	“Git = API int, not storage”

🎯 Exam Tip

Domain 3 connectivity questions cluster around four claims. Each is true or false on one fact.

First: “A storage integration stores your cloud access keys for reuse.” False. It stores a generated cloud identity that you trust on the cloud side. The point is that the stage holds no raw keys.

Second: “An external function needs a storage integration.” False. An external function needs an API integration, because it calls an HTTPS endpoint. A storage integration only serves stages reaching cloud storage.

Third: “Git integration uses a storage integration.” False. A Git repository uses an API integration with API_PROVIDER = git_https_api. This is the one fact that catches people on the new Git topic.

Fourth: “Spark and Kafka are Snowflake drivers.” False. They are connectors. Drivers connect one programming language, such as JDBC or Python. Connectors integrate a platform.

🛠️ Hands-On Lab

Type: CONCEPT WALKTHROUGH | Time: ~15 minutes | Credits: ~0 | Prerequisite: ACCOUNTADMIN role for integration DDL. Most integrations here need real cloud or Git setup to function, so several steps are skeletons you read and adapt. The runnable steps are SHOW INTEGRATIONS and the cleanup. This lab creates no persistent objects and never touches day10_orders.

List the integrations already in your account. This runs on any account. A new trial account usually returns none.

SQL

USE ROLE ACCOUNTADMIN;

SHOW INTEGRATIONS;

👀 Observe: the result lists any integration objects with type and category columns. Storage integrations show type EXTERNAL_STAGE with category STORAGE, and API integrations show category API. That storage-versus-API split is exactly the distinction the exam tests.
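
If the account already holds many integrations, the list can be narrowed by kind. A small sketch; both statements are safe to run and return an empty result when no integration of that kind exists.

```sql
-- Only storage integrations (the objects that back external stages)
SHOW STORAGE INTEGRATIONS;

-- Only API integrations (the objects behind external functions and Git)
SHOW API INTEGRATIONS;
```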

Read the storage integration skeleton. Substitute a real role ARN and bucket to make it work. The point is the shape: no keys appear here.

SQL

-- SKELETON: replace the ARN and bucket with your own
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:role/REPLACE_ME'
  STORAGE_ALLOWED_LOCATIONS = ('s3://your-bucket/path/');

-- Read the generated identity to trust on the cloud side
DESCRIBE INTEGRATION s3_int;

👀 Observe: DESCRIBE INTEGRATION returns the Snowflake-generated cloud identity. You grant that identity bucket access on AWS. The stage that uses this integration carries no access key of its own.

Read the API integration skeleton for external functions. The provider type tells the exam this is an API integration, not a storage one.

SQL

-- SKELETON: replace the role ARN and endpoint prefix
CREATE API INTEGRATION ext_func_int
  API_PROVIDER = aws_api_gateway
  API_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:role/REPLACE_ME'
  API_ALLOWED_PREFIXES = ('https://your-endpoint.example.com/')
  ENABLED = TRUE;

👀 Observe: the keyword is API_PROVIDER, set to a cloud API gateway. An external function later references this integration by name. The function’s logic runs on the cloud provider, not on a virtual warehouse.

Read the Git integration skeleton. Note the provider value changes to git_https_api. The repository then names this integration.

SQL

-- SKELETON: works against a public repo with no secret
CREATE API INTEGRATION git_api_int
  API_PROVIDER = git_https_api
  API_ALLOWED_PREFIXES = ('https://github.com/your-account/')
  ENABLED = TRUE;

CREATE GIT REPOSITORY my_repo
  API_INTEGRATION = git_api_int
  ORIGIN = 'https://github.com/your-account/your-repo.git';

-- List the synced files like a stage path
LS @my_repo/branches/main/;

👀 Observe: the same CREATE API INTEGRATION command serves Git, with the provider set to git_https_api instead of a cloud gateway. The repository clone exposes branches, tags, and commits as paths you can list and run.
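
To look inside the clone a little further, a sketch along these lines may help, assuming my_repo from the skeleton above was created successfully. The tag name v1.0 is hypothetical, so the last statement only works if the remote repo has a tag with that name.

```sql
-- Pull the latest commits from the remote into the clone
ALTER GIT REPOSITORY my_repo FETCH;

-- Inspect what the clone holds
SHOW GIT BRANCHES IN my_repo;
SHOW GIT TAGS IN my_repo;

-- Tags are addressable as paths, just like branches
LS @my_repo/tags/v1.0/;
```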

Clean up. Drop anything you created from the skeletons. These statements are safe to run even if a step above was skipped.

SQL

DROP GIT REPOSITORY  IF EXISTS my_repo;
DROP INTEGRATION     IF EXISTS git_api_int;
DROP INTEGRATION     IF EXISTS ext_func_int;
DROP INTEGRATION     IF EXISTS s3_int;

👀 Observe: every skeleton object is gone. Nothing here read or changed day10_orders, so the persistent Day 10 table is untouched for later days.

📚 Snowflake Docs

Authoritative references for every fact in today’s post. The drivers overview and the storage integration reference are the two worth re-reading the day before the exam.

❄️ Drivers: JDBC, ODBC, Python, Node.js, Go, .NET, and PHP
docs.snowflake.com/en/developer-guide/drivers

❄️ CREATE STORAGE INTEGRATION: Credential-Free Stage Access
docs.snowflake.com/en/sql-reference/sql/create-storage-integration

❄️ Using a Git Repository in Snowflake (New in C03)
docs.snowflake.com/en/developer-guide/git/git-overview

🔗 External References

The CREATE API INTEGRATION reference lists every provider value, including the Git and external-function gateways. The Kafka connector guide covers the Snowpipe Streaming path.

Reference Snowflake Documentation

CREATE API INTEGRATION: Provider Values for Git and External Functions

Guide Snowflake Documentation

Snowflake Connector for Kafka: Topics, Tables, and Snowpipe Streaming

❓ Practice Questions

Q1: Why would a team use a storage integration when creating an external stage?

Options:

A. To compress files automatically during loading.
B. To avoid embedding cloud access keys in the stage definition.
C. To speed up COPY INTO by adding more compute.
D. To store the loaded data inside the integration object.

✅ Answer: B

Why B: A storage integration holds a generated cloud identity that the cloud administrator trusts. The stage references the integration by name, so no raw keys live in the stage DDL.

Why not A: Compression is a file-format and load setting. It has nothing to do with the integration object.

Why not C: An integration grants access. It adds no compute and does not change COPY performance.

Why not D: The integration stores a trust identity, not table data. The data stays in cloud storage and the table.

Q2: Which statement correctly describes Git integration in Snowflake?

Options:

A. A Git repository object uses a storage integration to reach the remote repo.
B. A Git repository uses an API integration with API_PROVIDER = git_https_api.
C. Git integration copies only the main branch and discards tags.
D. Git files cannot be run; they can only be viewed in Snowsight.

✅ Answer: B

Why B: A Git repository clone needs an API integration whose provider is git_https_api. The repository then names that integration, the origin URL, and an optional secret.

Why not A: Git uses an API integration, not a storage integration. A storage integration only serves stages reaching cloud storage.

Why not C: The clone holds all branches, tags, and commits from the remote repo, not just main.

Why not D: You can run a repository file with EXECUTE IMMEDIATE FROM or point a handler at it.

Q3: A team builds an external function that calls an Amazon API Gateway endpoint. Which object must they create?

Options:

A. A storage integration with TYPE = EXTERNAL_STAGE.
B. A notification integration.
C. An API integration with API_PROVIDER = aws_api_gateway.
D. A security integration.

✅ Answer: C

Why C: An external function calls an HTTPS endpoint, so it needs an API integration. For an Amazon proxy service, the provider is aws_api_gateway.

Why not A: A storage integration serves stages reaching cloud storage, not function calls to an endpoint.

Why not B: A notification integration handles event messaging, such as Snowpipe auto-ingest. It does not back external functions.

Why not D: A security integration handles authentication flows like SSO or OAuth, not external function calls.

Q4: Which of the following correctly separate Snowflake drivers from connectors? (Select TWO)

Options:

A. JDBC is a driver that connects a Java application to Snowflake.
B. The Spark connector is a driver for a single programming language.
C. The Kafka connector is a driver installed in a Python script.
D. The Spark connector integrates an external platform with Snowflake.
E. ODBC is a connector that wires Kafka into Snowflake.

✅ Answer: A and D

Why A and D: A driver connects one programming language. JDBC serves Java. A connector integrates a whole platform. The Spark connector links Spark to Snowflake.

Why not B: Spark is an ecosystem, so its client is a connector, not a single-language driver.

Why not C: The Kafka connector runs in a Kafka Connect cluster. It is not a Python driver.

Why not E: ODBC is a driver for ODBC-based clients. It does not wire Kafka into Snowflake.

Q5: A data team wants bi-directional read and write between a Spark cluster and Snowflake, with filtering pushed into Snowflake. Which client fits?

Options:

A. The ODBC driver.
B. The Snowflake Connector for Spark.
C. A storage integration.
D. The Python connector only.

✅ Answer: B

Why B: The Spark connector makes Snowflake a Spark data source for read and write. It also pushes predicates and query logic down into Snowflake for performance.

Why not A: ODBC is a driver for ODBC-based clients. It does not provide Spark data-source integration or pushdown.

Why not C: A storage integration grants stage access to cloud storage. It moves no data between Spark and Snowflake.

Why not D: The Python connector connects Python code. It is not the Spark data-source integration with pushdown.

📝 Recap

Today you learned: A driver connects one programming language to Snowflake. The seven drivers are JDBC, ODBC, Python, Node.js, Go, .NET, and PHP, all over TLS. A connector integrates a whole platform. The Spark connector gives bi-directional read and write with query pushdown. The Kafka connector loads topics into tables. Its Snowpipe Streaming path writes rows without staged files. A storage integration lets a stage reach cloud storage with no keys in the stage DDL, using TYPE = EXTERNAL_STAGE. An API integration trusts an HTTPS endpoint. External functions and Git both rely on it. Git integration is new to C03. It uses an API integration with API_PROVIDER = git_https_api and clones a remote repo’s branches, tags, and commits.

Key takeaway: Two swaps carry most of the Day 34 questions. A storage integration trusts cloud storage, while an API integration trusts an HTTPS endpoint. Drivers connect a language, while connectors integrate a platform. Add the new Git fact: it rides on an API integration, never a storage integration. The rest is recognition of names.

Tomorrow (Day 35): Week 5 recap and a mixed Domain 3 quiz. The review pulls together stages, file formats, COPY INTO options, and unloading. It also revisits Snowpipe, Snowpipe Streaming, streams and tasks, and today’s drivers, connectors, and integrations. Ten mixed questions test the lot.

About the Author

Abhay Krishnan

Senior Data & AI Consultant

With over five years of data engineering experience at EY and Infosys, Abhay Krishnan specializes in building scalable data pipelines and cloud warehousing solutions. He is a certified SnowPro Core professional, alongside credentials in AWS and Azure. Abhay created this 50-day track to solve a problem he faced firsthand: the lack of a structured, free resource for Snowflake certification prep. Follow him on LinkedIn for more data engineering insights.