Migration Intelligence · March 2025 · 10 min read

Why Snowflake Users Are Moving to Delta Lake in 2025

The TCO argument everyone is pretending doesn't exist

ComputeLogic Engineering Team

Databricks-Specialised Consultancy

Tags: Delta Lake · Snowflake Migration · TCO Analysis · Photon Engine · Lakehouse Architecture
  • 61% average compute cost reduction (across 5 client migrations)
  • 96× query speed improvement (P95 analytical workloads)
  • 4 wks median migration time (with our automated framework)

Insight: Who this is for

CTOs and data platform leads at mid-market and enterprise organisations (>5TB analytical workloads) currently running Snowflake who are evaluating whether the cost-performance trade-off still makes sense. This is not vendor advocacy — we work with both platforms and will tell you exactly when Snowflake is still the right answer.

The Problem No One Wants to Say Out Loud

Snowflake's 2018 pitch was compelling: virtualise compute and storage, charge per second, and let organisations scale up and down without provisioning. For organisations that were moving off on-prem Hadoop clusters or early Redshift installations, that flexibility was genuinely transformative.

Seven years later, the economics have inverted. Snowflake's proprietary storage format (Micro-Partitions) and its Virtual Warehouse compute model create a coupling that works against you at scale: the more data you have, the more your query optimizer struggles with partition skew, and the more credits you burn resizing warehouses to compensate.

Meanwhile, Delta Lake on Databricks has quietly solved the performance problem Snowflake was supposed to solve — and done it on open formats you actually own. The Photon vectorised engine now outperforms Snowflake's native execution on the majority of TPC-DS benchmark queries, and unlike Snowflake, you can run it on your own cloud account with predictable reserved-instance pricing.

The Architectural Difference That Actually Matters

Snowflake Architecture

  • Proprietary Micro-Partition storage format
  • Cloud Services Layer for query coordination
  • Virtual Warehouses: fixed T-shirt sizes (XS → 6XL)
  • Per-credit billing — 1 credit = ~$2–3 depending on tier
  • Data stored in Snowflake-managed S3/ADLS/GCS buckets
  • Time Travel via proprietary mechanism
  • Governance via built-in RBAC (no external catalog)

Delta Lake / Databricks Architecture

  • Open Parquet + Delta Log format — you own the files
  • Databricks Runtime (DBR) with Photon vectorised engine
  • Clusters: fully configurable, spot-capable, auto-scaling
  • DBU billing + direct cloud compute (EC2/VMs) pricing
  • Data in your own S3/ADLS/GCS bucket — portable
  • Time Travel via Delta transaction log (open spec)
  • Unity Catalog: unified governance across all workloads

Watch out: The format lock-in you're not thinking about

Snowflake stores your data in Micro-Partitions — a proprietary format you cannot read outside of Snowflake. If you decide to move, you are running a COPY INTO export at full cloud storage egress rates. For a 50TB warehouse, that export alone can cost $4,500–9,000 in egress fees before you've written a single line of migration code. Delta Lake stores data as open Parquet files you can read with any Spark, Trino, or DuckDB query engine — no vendor permission required.
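As a sanity check on that egress figure, here is a back-of-envelope sketch in Python. The $0.09–0.18/GB rate range is an illustrative assumption; actual egress pricing varies by cloud provider, region, and destination:

```python
# Back-of-envelope egress cost for exporting a Snowflake warehouse.
# The per-GB rates are illustrative assumptions, not quoted prices.

def egress_cost_usd(warehouse_tb: float, rate_per_gb: float) -> float:
    """Cost of moving warehouse_tb terabytes out at rate_per_gb USD/GB."""
    return warehouse_tb * 1024 * rate_per_gb

low = egress_cost_usd(50, 0.09)    # ~ $4,608
high = egress_cost_usd(50, 0.18)   # ~ $9,216
print(f"50TB export: ${low:,.0f} to ${high:,.0f} in egress alone")
```

Note that this is egress only: it excludes the compute credits the COPY INTO export itself burns on the Snowflake side.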

Running the Numbers: A Real TCO Comparison

Below is an anonymised composite of five ComputeLogic clients who migrated from Snowflake Enterprise to Databricks Premium (Photon) over 2023–2024. All figures are USD annual spend.

Cost category               Snowflake (pre)   Databricks (post)   Change
Compute (credits / DBUs)    $320,000          $118,000            -63%
Storage                     $28,000           $19,000             -32%
Egress                      $12,000           $8,000              -33%
Support tier                $24,000           $18,000             -25%
Total annual spend          $384,000          $163,000            -57.5%

The compute delta is the biggest driver. Databricks with reserved instances (1-year commit) and spot instance policies for non-critical workloads brings the effective per-DBU cost down to $0.07–0.12 vs. Snowflake credits at $2–3 per credit. The workloads that benefit most are scheduled batch transforms — financial reporting, overnight aggregations, data quality runs — where Snowflake's Virtual Warehouse minimum size forces you to over-provision.
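To make the over-provisioning mechanics concrete, here is an illustrative back-of-envelope comparison for a single nightly batch job. Every rate, warehouse size, and runtime below is a hypothetical example for the arithmetic, not client data or quoted pricing:

```python
# Illustrative annual compute cost for one scheduled batch transform.
# All figures are hypothetical assumptions, not actual platform prices.

def annual_cost(rate: float, units_per_hour: float, hours_per_run: float,
                runs_per_year: int) -> float:
    """Annual spend: billing rate x billed units/hour x runtime x runs."""
    return rate * units_per_hour * hours_per_run * runs_per_year

# Snowflake: a Medium warehouse burns 4 credits/hour whether or not the
# job needs all of it -- the fixed T-shirt size forces over-provisioning.
snowflake = annual_cost(rate=2.50, units_per_hour=4, hours_per_run=1.5,
                        runs_per_year=365)

# Databricks: a job cluster sized to the workload, with reserved and spot
# pricing pulling the effective blended rate per DBU-equivalent down.
databricks = annual_cost(rate=0.10, units_per_hour=30, hours_per_run=1.5,
                         runs_per_year=365)

print(f"Snowflake:  ${snowflake:,.0f}/yr")
print(f"Databricks: ${databricks:,.0f}/yr")
print(f"Reduction:  {1 - databricks / snowflake:.0%}")
```

The shape of the saving, not the exact numbers, is the point: when the billed unit is a fixed warehouse size rather than a right-sized cluster, short or skewed batch jobs pay for idle capacity on every run.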

Performance: Photon vs. Snowflake on Real Workloads

The Databricks Photon engine is a C++ vectorised query engine that operates at the batch level — it processes columnar data in 64KB vectors using SIMD (Single Instruction, Multiple Data) CPU instructions. The practical effect is that analytical aggregations on large datasets run significantly faster than interpreted JVM-based Spark execution and, in most benchmarks, faster than Snowflake's native engine.

  • 96× reporting speed: 24-hour lag → 15-minute delivery
  • 4–8× Photon uplift vs. standard Spark on the same cluster
  • 30–50% storage reduction via Z-Order + OPTIMIZE on Delta tables
Photon-optimised query pattern — partition pruning + Z-Order (SQL):
-- Delta Lake table with Z-Order clustering on the query's partition keys
-- Photon engine will use vectorized scan with data skipping automatically

OPTIMIZE silver.transactions
  ZORDER BY (customer_id, transaction_date);

-- Query now benefits from both file-level statistics AND Z-Order clustering
SELECT
  customer_id,
  DATE_TRUNC('month', transaction_date)  AS month,
  SUM(amount)                             AS total_spend,
  COUNT(DISTINCT merchant_id)             AS unique_merchants
FROM silver.transactions
WHERE
  transaction_date >= '2024-01-01'
  AND country_code = 'AU'              -- partition predicate: eliminates 80% of files
GROUP BY 1, 2
ORDER BY total_spend DESC;

-- Execution plan will show:
-- > Photon Scan Delta (files read: 12 / 847, data skipping: 98.6%)
-- > PhysicalHashAggregate (vectorized)

The Medallion Architecture Advantage

One of the structural advantages of migrating to Delta Lake is the Medallion (Bronze / Silver / Gold) architecture pattern, which Databricks has productised and which has no direct Snowflake equivalent. The pattern creates a clearly defined data quality progression:

  • Bronze: Raw data landed exactly as received — append-only, no transforms. Provides full audit trail and replay capability.
  • Silver: Cleansed, conformed, and joined data. Deduplication, null handling, schema enforcement, and referential integrity applied.
  • Gold: Business-ready aggregates and domain-specific data products. Optimised for BI tools, APIs, and ML feature engineering.

In Snowflake, teams typically implement similar layering via database/schema naming conventions with no enforced quality contracts. In Delta Lake, you can enforce quality at each layer using Delta Live Tables (DLT) expectations — Python or SQL rules that fail pipelines deterministically when data quality degrades.

Delta Live Tables quality expectations — enforced at the Silver layer (Python):
import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="silver_transactions",
    comment="Cleansed, deduplicated transaction records with quality guarantees",
    table_properties={"quality": "silver"}
)
@dlt.expect_all_or_drop({
    "valid_amount":        "amount > 0",
    "valid_customer":      "customer_id IS NOT NULL",
    "valid_date":          "transaction_date >= '2020-01-01'",
    "has_transaction_id":  "transaction_id IS NOT NULL",
})
def silver_transactions():
    return (
        dlt.read_stream("bronze_transactions")
          .withColumn("amount_usd", F.col("amount") / F.col("fx_rate"))
          .withColumn("processed_at", F.current_timestamp())
          .dropDuplicates(["transaction_id"])
    )
# DLT tracks and surfaces expectation failures in the pipeline UI.
# With expect_all_or_drop, rows that fail any rule are dropped from the
# target table and counted in pipeline metrics -- nothing fails silently.

Unity Catalog: The Governance Layer Snowflake Cannot Match

Snowflake's built-in access control works well for a single platform. The problem is that modern data organisations are not single-platform. They run Databricks for ML, Spark for large-scale ETL, and often have teams using Python to query data directly. Snowflake's RBAC only applies within Snowflake — the moment data leaves its platform, governance disappears.

Unity Catalog creates a single metastore that governs all data access — regardless of whether the query comes from a Databricks notebook, a SQL warehouse, an external BI tool via JDBC, or a Python script via the Databricks SDK. Column masking, row-level filters, and audit logging apply uniformly across every access method.

Our Migration Playbook: 4 Weeks, Zero Downtime

Week 1

Schema Inventory & Migration Assessment

We run our automated schema scanner against your Snowflake account — cataloguing all tables, views, stored procedures, and dynamic data masking policies. Output: a migration complexity matrix that scores each object by migration effort (trivial / moderate / complex). Nine out of ten Snowflake tables are trivially converted to Delta.
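A minimal sketch of what such a complexity matrix can look like. The scoring heuristics and object attributes below are illustrative assumptions, not the actual logic of our delta-migrator CLI:

```python
# Toy migration-complexity scorer for catalogued Snowflake objects.
# Heuristics are illustrative assumptions, not the real scanner's rules.

def score_object(obj: dict) -> str:
    """Classify an object as trivial / moderate / complex to migrate."""
    if obj["type"] == "procedure" or obj.get("masking_policies", 0) > 0:
        return "complex"    # procedural logic and policies need rewriting
    if obj["type"] == "view" and obj.get("uses_snowflake_functions", False):
        return "moderate"   # vendor-specific SQL needs translation
    return "trivial"        # plain tables and views map directly to Delta

inventory = [
    {"name": "raw.orders", "type": "table"},
    {"name": "rpt.daily_kpis", "type": "view", "uses_snowflake_functions": True},
    {"name": "etl.sp_load", "type": "procedure"},
]

matrix = {o["name"]: score_object(o) for o in inventory}
print(matrix)
# {'raw.orders': 'trivial', 'rpt.daily_kpis': 'moderate', 'etl.sp_load': 'complex'}
```

In practice the real matrix scores dozens of attributes per object, but the principle is the same: plain tables score trivial, vendor-specific SQL and procedural objects pull the score up.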

Tooling: Python + Snowflake Connector + our internal delta-migrator CLI

Week 2

Parallel Delta Table Provisioning

We provision the target Delta tables in Unity Catalog with matching schema, partition strategy, and Z-Order clustering keys. Databricks Autoloader begins backfilling historical data from Snowflake exports (via COPY INTO → Parquet → Delta MERGE) while the source system remains live.

Parallelism: 32-node spot cluster processes ~2TB/hour during backfill

Week 3

Pipeline Cutover & DLT Deployment

Existing Snowflake ingestion pipelines (Fivetran, Airbyte, custom ETL) are repointed to Databricks Autoloader or Delta Live Tables. Shadow mode runs both platforms for 48 hours — we diff row counts and aggregate checksums to validate parity before any production traffic shifts.

Validation: row-count parity within 0.001% before cutover
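The parity check in shadow mode can be as simple as diffing per-table row counts pulled from both platforms. A minimal sketch, assuming count snapshots have already been collected from each side (table names and counts below are illustrative):

```python
# Minimal shadow-mode parity check: compare per-table row counts from
# Snowflake and Databricks, flag anything outside tolerance.
# Table names and counts are illustrative examples.

def parity_report(snowflake: dict, databricks: dict,
                  tolerance: float = 0.00001) -> dict:
    """Return {table: relative_diff} for tables exceeding tolerance."""
    failures = {}
    for table, sf_count in snowflake.items():
        db_count = databricks.get(table, 0)
        diff = abs(sf_count - db_count) / max(sf_count, 1)
        if diff > tolerance:      # 0.00001 = the 0.001% parity threshold
            failures[table] = diff
    return failures

sf = {"silver.transactions": 1_000_000, "silver.customers": 250_000}
db = {"silver.transactions": 1_000_000, "silver.customers": 249_000}

bad = parity_report(sf, db)   # customers off by 0.4% -> blocks cutover
print(bad)
```

The production version also diffs aggregate checksums (e.g. sums and hashes over key columns) so that silent value drift is caught, not just missing rows.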

Week 4

BI & Analytics Cutover + Optimisation

BI tools (Tableau, Power BI, Looker) are reconnected via Databricks SQL warehouses (Serverless or Classic). Query performance baselines are measured, slow queries identified, and Z-Order/OPTIMIZE applied to achieve target SLAs. The Snowflake account is suspended (not terminated) pending a 30-day validation period.

Typical P95 query improvement: 4–12× vs. Snowflake equivalent warehouse

When Snowflake Is Still the Right Answer

We believe in honest technical assessments. Snowflake remains the better choice under several conditions:

  • Your organisation's primary workload is ad-hoc SQL by non-technical business users who need a fully managed, schema-on-write experience with minimal ops overhead.
  • You have fewer than 2TB of analytical data and analytics spend is not a material cost driver.
  • Your team has no Python/Spark proficiency and you cannot invest in the 4–6 week learning curve that Delta Live Tables requires.
  • You need Snowpark Container Services for workloads that must run containerised applications alongside your data — Databricks has less native support here as of H1 2025.
  • Your compliance framework requires a SaaS data residency model where you must not store data in your own cloud accounts.

The question isn't 'is Databricks better than Snowflake?' The question is: at your current scale, with your current team, does the cost-performance trade-off justify a migration? For most organisations running >10TB of analytical data with scheduled batch workloads, the answer in 2025 is yes.


Starting Your Assessment

Every migration starts with a cost-performance audit. In a single 30-minute call, we can review your Snowflake Account Usage views (QUERY_HISTORY, WAREHOUSE_METERING_HISTORY) and give you an honest projection of what the same workloads would cost on Databricks with Photon. No commitment, no pitch deck.

Practical tipFree Migration Scoping Session

We'll analyse your Snowflake query history, identify the top cost drivers, and produce a detailed TCO projection for a Databricks migration — free of charge. If the numbers don't work in your favour, we'll tell you. Book via the Contact section.

Work With Us

Ready to put this into practice?

Every engagement starts with a free 30-minute diagnostic. No pitch decks — just an honest assessment of where your data estate is today and what it could become.