
How to Set Up and Use Snowflake in 2026


Introduction

Snowflake is a fully managed cloud data warehouse that separates storage from compute, offering elastic, near-unlimited scaling without infrastructure management. In 2026, with the rise of AI and real-time data pipelines, Snowflake shines with features like Snowpipe for streaming ingestion, Time Travel for audits, and zero-copy clones.

This intermediate tutorial walks you through setting up a complete environment: installing SnowSQL, creating warehouses and databases, loading data via internal stages, running advanced analytical queries, and optimizing costs. Whether you are migrating from a data lake or building a data mart, these concrete steps will boost your productivity. Imagine scaling your compute from X-Small to 6X-Large with one SQL command and no downtime. Ready to process terabytes in minutes?

Prerequisites

  • An active Snowflake account (free trial at snowflake.com)
  • SQL basics (JOINs, window functions)
  • SnowSQL installed (optional but recommended)
  • Test CSV files (e.g., 1 MB sales data)

Install SnowSQL

install_snowsql.sh
#!/bin/bash
# Download and run the SnowSQL installer (Linux x86_64; see Snowflake docs for the macOS .pkg)
curl -O https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/1.3/linux_x86_64/snowsql-1.3.5-linux_x86_64.bash
SNOWSQL_DEST=~/bin SNOWSQL_LOGIN_SHELL=~/.bashrc bash snowsql-1.3.5-linux_x86_64.bash

# Configure ~/.snowsql/config
mkdir -p ~/.snowsql
cat > ~/.snowsql/config << EOF
[connections]
accountname = xy12345.us-east-1
username = YOUR_USER
password = YOUR_PASSWORD
warehousename = COMPUTE_WH
rolename = SYSADMIN
EOF
chmod 600 ~/.snowsql/config   # the file holds credentials

# Test connection
~/bin/snowsql -a xy12345.us-east-1 -u YOUR_USER

This script downloads and installs SnowSQL, then writes your Snowflake credentials to ~/.snowsql/config. Replace the placeholders with your real values. SnowSQL is the official CLI for running SQL in batch or interactive mode, ideal for CI/CD automation. Avoid plaintext passwords in production; use key-pair authentication instead.
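Key-pair authentication takes a few OpenSSL commands. A minimal sketch, assuming an unencrypted private key for simplicity (encrypt it in production); file names and paths are illustrative:

```shell
#!/bin/bash
# Generate an RSA key pair for Snowflake key-pair auth
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

# Print the public key body (no header/footer lines) to paste into Snowflake:
#   ALTER USER YOUR_USER SET RSA_PUBLIC_KEY='<key body>';
grep -v -- "-----" rsa_key.pub | tr -d '\n'; echo

# Then point SnowSQL at the private key in ~/.snowsql/config:
#   private_key_path = /home/you/rsa_key.p8
```

Once the public key is registered on the user, you can drop the `password` line from the config entirely.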

Connect and Create a Warehouse

Once SnowSQL is installed, connect to your account. A warehouse is Snowflake's scalable compute unit (think of an auto-resizing cluster). Start with XS for testing and scale up to 6XL as workloads demand.

Create a Virtual Warehouse

create_warehouse.sql
USE ROLE SYSADMIN;

-- Create (or replace) an XS warehouse (~1 credit/hour while running)
CREATE OR REPLACE WAREHOUSE MY_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 300          -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1       -- multi-cluster settings require Enterprise edition
  MAX_CLUSTER_COUNT = 2
  SCALING_POLICY = 'STANDARD';

-- Use the warehouse
USE WAREHOUSE MY_WH;

-- Check recent load (the warehouse name is passed as a string)
SELECT * FROM TABLE(INFORMATION_SCHEMA.WAREHOUSE_LOAD_HISTORY(WAREHOUSE_NAME => 'MY_WH'));

This script creates a 'MY_WH' warehouse sized XS with auto-suspend after 5 minutes of inactivity to avoid unnecessary costs. AUTO_RESUME restarts it instantly. Check load via WAREHOUSE_LOAD_HISTORY to monitor credits used. Pitfall: Don't forget to manually SUSPEND in dev to save money.
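Resizing and suspending are each a single statement; for example, using the warehouse created above:

```sql
-- Scale up before a heavy job, then back down afterwards
ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'MEDIUM';
-- ... run the heavy query ...
ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'XSMALL';

-- Suspend manually at the end of a dev session
ALTER WAREHOUSE MY_WH SUSPEND;
```

Resizing takes effect for new queries; running queries finish on the old size.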

Structure the Database

Create a database and schema. Snowflake supports flexible namespaces with Time Travel (up to 90 days) for recoveries.

Create DB, Schema, and Table

create_db_schema.sql
USE ROLE SYSADMIN;

-- Create database with 7-day retention
CREATE DATABASE IF NOT EXISTS MY_DB
  DATA_RETENTION_TIME_IN_DAYS = 7;

USE DATABASE MY_DB;

-- Create schema
CREATE SCHEMA IF NOT EXISTS ANALYTICS;

USE SCHEMA ANALYTICS;

-- Table for sales data (clustered on sale_date)
CREATE TABLE sales (
  id INT AUTOINCREMENT,
  product VARCHAR(100),
  sale_date DATE,
  quantity INT,
  amount DECIMAL(10,2)
)
CLUSTER BY (sale_date);

-- Insert test data
INSERT INTO sales (product, sale_date, quantity, amount) VALUES
('Laptop', '2026-01-15', 1, 1200.00),
('Mouse', '2026-01-15', 10, 25.00);

Creates a 'MY_DB' database with retention for Time Travel, an 'ANALYTICS' schema, and a table clustered on sale_date to optimize scans (automatic partition pruning). CLUSTER BY speeds up range queries on the key. Test inserts validate the structure. Note: clustering keys only pay off on large tables (hundreds of GB and up); on a tiny test table they add reclustering cost for no benefit.
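To check whether a clustering key is actually helping, Snowflake exposes clustering statistics as JSON:

```sql
-- Depth/overlap statistics for the clustering key on sales
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(sale_date)');
```

Lower average depth means better pruning; watch this after large loads to decide whether reclustering is worth the credits.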

Load Data via Internal Stage

load_data.sql
-- Create internal stage
CREATE STAGE my_stage
  DIRECTORY = ( ENABLE = TRUE )
  ENCRYPTION = ( TYPE = 'SNOWFLAKE_SSE' );

-- Upload local file (PUT works from SnowSQL and drivers, not the browser worksheet)
PUT file:///path/to/sales.csv @my_stage AUTO_COMPRESS=TRUE;

-- Copy into table (default CSV format)
COPY INTO sales
FROM @my_stage/sales.csv.gz
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
ON_ERROR = 'CONTINUE';

-- Verify
SELECT COUNT(*) FROM sales;

Creates a secure internal stage for uploading compressed CSVs via PUT. COPY INTO parses and bulk loads with error handling. SKIP_HEADER ignores the header. For 1 TB+, use external S3/GCS stages. Pitfall: Match FILE_FORMAT to your CSV or get unexpected NULLs.
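Inline format options get repetitive; a named file format is reusable, and VALIDATION_MODE lets you dry-run a load against the stage from the previous step before committing any rows:

```sql
-- Reusable named format instead of inline options
CREATE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1;

-- Dry run: report parsing errors without loading anything
COPY INTO sales
FROM @my_stage/sales.csv.gz
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
VALIDATION_MODE = 'RETURN_ERRORS';
```

If the dry run comes back clean, rerun the COPY without VALIDATION_MODE to load for real.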

Advanced Analytical Queries

Leverage powerful analytics: window functions, JSON, semi-structured data.

Queries with WINDOW and JSON

advanced_queries.sql
-- Sales per day with running total
SELECT
  sale_date,
  SUM(amount) as daily_total,
  SUM(SUM(amount)) OVER (ORDER BY sale_date ROWS UNBOUNDED PRECEDING) as running_total
FROM sales
GROUP BY sale_date;

-- JSON variant support
CREATE TABLE events (
  data VARIANT
);

INSERT INTO events SELECT PARSE_JSON('{"user_id":1, "action":"click"}');

SELECT data:user_id::INT AS user_id, data:action::STRING AS action
FROM events
WHERE data:action::STRING = 'click';

The window query calculates a running total for trend analysis. VARIANT stores JSON natively without a fixed schema, ideal for event and IoT logs; the ::INT cast produces typed columns for analytics. This scales to billions of rows on Snowflake's MPP engine. Prefer native SQL over custom UDFs, which are harder for the optimizer to push down.
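JSON arrays can be exploded into rows with LATERAL FLATTEN, reusing the events table above (the "tags" payload here is illustrative):

```sql
-- One row per array element
INSERT INTO events SELECT PARSE_JSON('{"user_id":2, "tags":["promo","mobile"]}');

SELECT e.data:user_id::INT AS user_id,
       t.value::STRING     AS tag
FROM events e,
LATERAL FLATTEN(input => e.data:tags) t;
```

Each element of the array becomes its own row, joined back to the parent record's columns.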

Time Travel and Cloning

time_travel_clone.sql
-- Recover table from -1h (if retention >0)
CREATE TABLE sales_backup AS
SELECT * FROM sales AT (OFFSET => -3600);

-- Zero-copy clone (instant, no storage duplication)
CREATE TABLE sales_prod CLONE sales;

-- Update clone (independent)
UPDATE sales_prod SET amount = amount * 1.1 WHERE id = 1;

-- Check metadata (unquoted identifiers are stored uppercase)
SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'SALES_PROD';

Time Travel recovers history via OFFSET (seconds). CLONE creates a metadata-only view, perfect for dev/prod without storage costs. Updates are independent. Limit: default retention 1 day; increase for compliance.
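Time Travel also works with absolute timestamps, and UNDROP brings back whole objects. A quick sketch (the timestamp is illustrative and must fall inside the retention window):

```sql
-- Query the table as of a specific moment
SELECT * FROM sales AT (TIMESTAMP => '2026-01-15 10:00:00'::TIMESTAMP_LTZ);

-- Restore an accidentally dropped table
DROP TABLE sales_backup;
UNDROP TABLE sales_backup;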

Set Up Snowpipe for Streaming

snowpipe_setup.sql
-- External S3 stage (AWS example; prefer a STORAGE INTEGRATION over inline keys in production)
CREATE STAGE my_s3_stage
  URL = 's3://my-bucket/'
  CREDENTIALS = (AWS_KEY_ID='xxx' AWS_SECRET_KEY='yyy');

-- Auto-ingest pipe
CREATE PIPE sales_pipe
  AUTO_INGEST = TRUE
  AS
  COPY INTO sales
  FROM @my_s3_stage
  FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER=1);

-- Wire S3 event notifications to the pipe's SQS queue
-- (copy the ARN from the notification_channel column of SHOW PIPES)

-- Check pipe state
SELECT SYSTEM$PIPE_STATUS('sales_pipe');
SHOW PIPES;

Snowpipe ingests in near-real time via cloud notifications (S3 event → SQS). AUTO_INGEST makes the pipe consume those notifications automatically. Billing is serverless: per-second compute plus a small per-file overhead, so prefer files around 100-250 MB over thousands of tiny ones. For Kafka streams, use the Snowflake Kafka connector. Pitfall: test your credentials; misconfigured stages fail silently.
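Two more statements are useful when debugging a pipe: REFRESH backfills files that landed before notifications were wired up, and COPY_HISTORY shows what actually loaded:

```sql
-- Queue any staged files the pipe has not yet seen
ALTER PIPE sales_pipe REFRESH;

-- Load history for the target table over the last hour
SELECT * FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'SALES',
  START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())));
```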

Best Practices

  • Scale warehouses per workload: pair them with a RESOURCE_MONITOR for budget caps.
  • Cluster keys: choose the columns that appear most often in filter predicates; verify pruning with SYSTEM$CLUSTERING_INFORMATION.
  • Stages + COPY: always load compressed files (gzip typically shrinks CSV ~4x); partition S3 paths by date.
  • Zero-copy clones: use for QA/prod copies; storage is shared until rows diverge.
  • Monitor credits: query QUERY_HISTORY and the ACCOUNT_USAGE share to drive alerts.
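A resource monitor with a hard cap can be sketched as follows (quota, thresholds, and the monitor name are illustrative):

```sql
-- Resource monitors require the ACCOUNTADMIN role
USE ROLE ACCOUNTADMIN;

CREATE RESOURCE MONITOR monthly_cap
  WITH CREDIT_QUOTA = 100
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 80 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

-- Attach the cap to the warehouse
ALTER WAREHOUSE MY_WH SET RESOURCE_MONITOR = monthly_cap;
```

SUSPEND lets running queries finish; use SUSPEND_IMMEDIATE to kill them at the threshold.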

Common Errors to Avoid

  • Warehouse left running: drains credits; rely on AUTO_SUSPEND and suspend manually after dev sessions.
  • No clustering on huge tables: full scans on >1B rows lead to timeouts.
  • Missing FILE_FORMAT options: malformed loads (dates arriving as strings, unexpected NULLs).
  • Time Travel expired: set DATA_RETENTION_TIME_IN_DAYS (up to 90 on Enterprise) at creation.

Next Steps