Live Data · Q1 2026 · 5 Engines · 12 Dimensions · 847 Test Runs

LAKEHOUSE ENGINE BENCHMARK MATRIX

Independent stress-tests across five production-grade lakehouse engines. Category leaders are marked ▲ BEST. Every number is reproducible: the methodology and raw Parquet files are linked in the full report.

Databricks · 3/12 wins
Iceberg · 4/12 wins
Delta Lake · 1/12 wins
Hudi · 0/12 wins
Dremio · 4/12 wins
lakehouse_benchmark_matrix.parquet · last_run: 2026-02-27 · n=847
| BENCHMARK DIMENSION | Unit | Databricks (3 wins) | Iceberg (4 wins) | Delta Lake (1 win) | Hudi (0 wins) | Dremio (4 wins) |
|---|---|---|---|---|---|---|
| Query Latency p50 | ms | 142 ▲ BEST | 198 | 211 | 287 | 163 |
| Query Latency p99 | ms | 412 ▲ BEST | 573 | 604 | 891 | 488 |
| Time-Travel Query Cost | $/query | $0.0052 | $0.0031 ▲ BEST | $0.0037 | $0.0089 | $0.0041 |
| Concurrent Write Throughput | MB/s | 1840 | 1620 | 1710 | 980 | 2210 ▲ BEST |
| Schema Evolution Support | score/100 | 94 | 98 ▲ BEST | 91 | 87 | 82 |
| Partition Pruning Efficiency | % | 97.2% | 98.8% ▲ BEST | 96.1% | 91.4% | 95.7% |
| Small-File Compaction | sec/GB | 8.4 | 11.2 | 9.7 | 14.6 | 7.1 ▲ BEST |
| ACID Transaction Overhead | ms/txn | 23 | 31 | 18 ▲ BEST | 44 | 38 |
| Cloud Storage Egress | $/TB | $0.87 | $0.64 | $0.71 | $1.12 | $0.58 ▲ BEST |
| Spark Compatibility | score/100 | 99 ▲ BEST | 97 | 98 | 94 | 88 |
| Flink Compatibility | score/100 | 91 | 96 ▲ BEST | 88 | 89 | 83 |
| Trino/Presto Read Perf | GB/s | 3.2 | 4.1 | 3.8 | 2.9 | 4.7 ▲ BEST |
Environment: r6i.4xlarge · us-east-1 · 10TB TPC-DS dataset · Apache Parquet
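The win counts can be re-derived from the matrix itself. The following is a minimal sketch, assuming the direction of each dimension (lower-is-better for latency, cost, overhead, and sec/GB; higher-is-better for throughput, scores, and percentages). Those direction flags and the names `ENGINES`, `MATRIX`, and `tally_wins` are illustrative assumptions, not part of the published suite.

```python
# Values transcribed from the benchmark matrix; column order is fixed.
ENGINES = ["Databricks", "Iceberg", "Delta Lake", "Hudi", "Dremio"]

# (dimension, lower_is_better, values per engine)
# The lower_is_better flags are assumptions inferred from each unit.
MATRIX = [
    ("Query Latency p50 (ms)", True, [142, 198, 211, 287, 163]),
    ("Query Latency p99 (ms)", True, [412, 573, 604, 891, 488]),
    ("Time-Travel Query Cost ($/query)", True, [0.0052, 0.0031, 0.0037, 0.0089, 0.0041]),
    ("Concurrent Write Throughput (MB/s)", False, [1840, 1620, 1710, 980, 2210]),
    ("Schema Evolution Support (score/100)", False, [94, 98, 91, 87, 82]),
    ("Partition Pruning Efficiency (%)", False, [97.2, 98.8, 96.1, 91.4, 95.7]),
    ("Small-File Compaction (sec/GB)", True, [8.4, 11.2, 9.7, 14.6, 7.1]),
    ("ACID Transaction Overhead (ms/txn)", True, [23, 31, 18, 44, 38]),
    ("Cloud Storage Egress ($/TB)", True, [0.87, 0.64, 0.71, 1.12, 0.58]),
    ("Spark Compatibility (score/100)", False, [99, 97, 98, 94, 88]),
    ("Flink Compatibility (score/100)", False, [91, 96, 88, 89, 83]),
    ("Trino/Presto Read Perf (GB/s)", False, [3.2, 4.1, 3.8, 2.9, 4.7]),
]


def tally_wins(matrix=MATRIX, engines=ENGINES):
    """Count, per engine, how many dimensions it leads."""
    wins = {engine: 0 for engine in engines}
    for _name, lower_is_better, values in matrix:
        best = min(values) if lower_is_better else max(values)
        wins[engines[values.index(best)]] += 1
    return wins


if __name__ == "__main__":
    for engine, count in tally_wins().items():
        print(f"{engine}: {count}/12 wins")
```

Under those assumptions the tally reproduces the scoreboard: 3, 4, 1, 0, and 4 wins respectively.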
02 // Test Environment

METHODOLOGY & ENVIRONMENT SPECS

We disclose every variable. If a number surprises you, click through to the raw Parquet files and run the assertion yourself.

Instance Type: r6i.4xlarge (128 GB RAM, 16 vCPU)
Cloud Region: AWS us-east-1 (N. Virginia)
Dataset Size: 10 TB TPC-DS (SF=10000)
File Format: Apache Parquet · Snappy compression
Spark Version: Apache Spark 3.5.1
Flink Version: Apache Flink 1.19.0
Test Duration: 72 hours per engine · 847 total runs
Isolation: Dedicated VPC · no shared workloads

01

Reproducible by Default

Every test script, Terraform config, and seed dataset is published to GitHub. Any team with an AWS account can replicate the full suite in under 4 hours.

02

Vendor-Blind Configuration

Each engine is configured using its own published best-practices guide. No deliberate handicapping. If Databricks recommends Delta caching, we enable it.

03

Quarterly Re-Tests

Engines ship fast. We re-run the full benchmark suite every 90 days and publish diffs. Subscribers get email alerts when their primary engine changes rank.

REPRODUCIBILITY STATEMENT:

$ git clone https://github.com/lakehouse-lab/benchmarks
$ cd benchmarks
$ ./run_suite.sh --engine all --dataset tpcds-10tb
# Full suite completes in ~4 hours on r6i.4xlarge. Raw results are written to /results/*.parquet

03 // Deep Dive

DIMENSION-LEVEL ANALYSIS

Each section isolates one benchmark dimension — methodology, raw numbers, and a one-sentence verdict you can paste into your RFP.

DIMENSION 01

Databricks leads p50 by 13% over nearest rival Dremio and by 28% over Iceberg

We ran the 99 TPC-DS queries 5× on each engine at steady state (warm cache, no cold starts). p50 reflects median interactive query performance, the number your BI users feel every day. Databricks' Photon engine, with its vectorized execution, delivers a 142ms median versus Iceberg's 198ms on the same r6i.4xlarge hardware.

Test Environment

r6i.4xlarge · 10TB TPC-DS · Warm cache · 5 runs per query
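For readers checking the p50 and p99 figures against raw run timings, here is a minimal sketch of a percentile calculation. It assumes the nearest-rank method; the report does not state which percentile definition the suite uses, and the sample latencies below are hypothetical, not benchmark data.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-query latencies in ms, for illustration only.
latencies_ms = [120, 135, 142, 150, 160, 175, 190, 240, 410, 900]

p50 = nearest_rank_percentile(latencies_ms, 50)
p99 = nearest_rank_percentile(latencies_ms, 99)

if __name__ == "__main__":
    print(f"p50={p50}ms p99={p99}ms")
```

With only a handful of samples, p99 collapses to the slowest observation, which is one reason tail figures need many more runs than medians to be stable.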

Editorial Verdict

Databricks wins p50 latency, running 13% below Dremio and 28% below Iceberg. For interactive BI workloads, this gap is user-perceptible. The tail tells the same story: at p99, Databricks' 412ms is about 16% below Dremio's 488ms and 28% below Iceberg's 573ms, so the ranking holds for batch-heavy pipelines where tail latency matters more than the median.

Databricks · 142ms · WINNER
Dremio · 163ms
Iceberg · 198ms
Delta Lake · 211ms
Hudi · 287ms

← Lower is better
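The relative gaps behind a latency verdict can be checked with a few lines of arithmetic. This sketch assumes "X% lower" means the difference expressed as a share of the rival's value; `percent_lower` is an illustrative helper, not something shipped with the suite.

```python
def percent_lower(winner, rival):
    """How much lower `winner` is than `rival`, as a percentage of `rival`."""
    return (rival - winner) / rival * 100


# p50 latencies in ms, taken from the chart for this dimension.
p50_ms = {
    "Databricks": 142,
    "Dremio": 163,
    "Iceberg": 198,
    "Delta Lake": 211,
    "Hudi": 287,
}

# Gap between Databricks and every other engine, rounded to one decimal.
gaps = {
    engine: round(percent_lower(p50_ms["Databricks"], value), 1)
    for engine, value in p50_ms.items()
    if engine != "Databricks"
}

if __name__ == "__main__":
    for engine, gap in gaps.items():
        print(f"Databricks is {gap}% lower than {engine}")
```

Under that definition the gap is 12.9% against Dremio and 28.3% against Iceberg, so the choice of comparison engine matters when quoting a single percentage.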

DIMENSION 02

Dremio auto-compaction takes 51% less time than Hudi on 1M-file datasets

The small-file problem is the silent killer of lakehouse performance. We generated 1 million 512KB Parquet files (simulating 6 months of CDC ingestion) and measured the time to compact them to a 128MB target file size. Dremio's Automatic Reflection Refresh triggered compaction at 7.1s/GB; Hudi's async compaction service required 14.6s/GB even with maxParallelism=32.

Test Environment

r6i.4xlarge · 1M × 512KB files · 128MB target file size · 32 parallel threads
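To put the per-GB rates in wall-clock terms, the sketch below sizes the test dataset and projects a total compaction time. It assumes binary units (KiB, MiB, GiB) and that time scales linearly with data volume; neither assumption is stated in the report, and `compaction_estimate` is a hypothetical helper.

```python
import math

KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def compaction_estimate(n_files, file_bytes, target_bytes, sec_per_gb):
    """Project dataset size, output file count, and compaction time.

    Assumes binary units and linear scaling with data volume.
    """
    total_bytes = n_files * file_bytes
    total_gb = total_bytes / GIB
    return {
        "total_gb": total_gb,
        "output_files": math.ceil(total_bytes / target_bytes),
        "minutes": total_gb * sec_per_gb / 60,
    }


# 1M x 512KB inputs compacted toward 128MB targets, at the measured rates.
dremio = compaction_estimate(1_000_000, 512 * KIB, 128 * MIB, 7.1)
hudi = compaction_estimate(1_000_000, 512 * KIB, 128 * MIB, 14.6)

if __name__ == "__main__":
    print(f"dataset: {dremio['total_gb']:.1f} GB -> {dremio['output_files']} files")
    print(f"Dremio: {dremio['minutes']:.1f} min, Hudi: {hudi['minutes']:.1f} min")
```

Under these assumptions the dataset is roughly 488 GB, compacting to about 3,907 files, in just under an hour at 7.1s/GB versus roughly two hours at 14.6s/GB.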

Editorial Verdict

Dremio wins small-file compaction at 7.1s/GB. If your pipeline generates >100K files/day from streaming CDC, Dremio and Databricks are the only defensible choices. Hudi's compaction lag creates a 2–3× query penalty window that compounds under high-ingestion workloads.

Dremio · 7.1s/GB · WINNER
Databricks · 8.4s/GB
Delta Lake · 9.7s/GB
Iceberg · 11.2s/GB
Hudi · 14.6s/GB

← Lower is better

DIMENSION 03

Delta Lake ACID overhead 42% lower than Iceberg at 1K concurrent writers

We measured the lock acquisition + commit latency overhead of ACID transactions under concurrent write pressure: 1,000 simultaneous upsert transactions on a 500GB table. Delta Lake's optimistic concurrency control with log-based conflict detection adds only 18ms overhead per transaction. Hudi's timeline-based locking adds 44ms — a 2.4× penalty that compounds at scale.

Test Environment

r6i.4xlarge · 500GB table · 1,000 concurrent upserts · optimistic concurrency
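As background on what "optimistic concurrency with log-based conflict detection" means, here is a toy, single-process sketch. It is not Delta Lake's actual protocol or API: `CommitLog`, `ConflictError`, and `commit_optimistically` are hypothetical names, and the put-if-absent log is kept in memory purely for illustration.

```python
class ConflictError(Exception):
    """Raised when a commit cannot be applied safely."""


class CommitLog:
    """Toy in-memory transaction log; each version may be written once."""

    def __init__(self):
        self._entries = {}  # version -> frozenset of keys touched

    def latest_version(self):
        return max(self._entries, default=-1)

    def try_commit(self, version, keys):
        """Put-if-absent: fail if another writer already claimed `version`."""
        if version in self._entries:
            return False
        self._entries[version] = frozenset(keys)
        return True

    def keys_since(self, version):
        """Keys touched by commits newer than `version`."""
        touched = set()
        for committed_version, keys in self._entries.items():
            if committed_version > version:
                touched |= keys
        return touched


def commit_optimistically(log, read_version, keys, max_retries=5):
    """Commit `keys` read at `read_version` without taking a lock up front.

    Conflicts are detected by scanning the log for newer commits that
    touched the same keys; non-overlapping commits are simply skipped past.
    """
    keys = set(keys)
    for _ in range(max_retries):
        if log.keys_since(read_version) & keys:
            raise ConflictError("a concurrent commit touched the same keys")
        next_version = log.latest_version() + 1
        if log.try_commit(next_version, keys):
            return next_version
    raise ConflictError("gave up after repeated commit races")
```

In this model a writer pays for conflict checking only at commit time, which is the general reason optimistic schemes tend to add little overhead when writers rarely touch the same data.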

Editorial Verdict

Delta Lake wins ACID overhead at 18ms/txn. For high-frequency upsert workloads (CDC replication, event sourcing), Delta Lake's optimistic concurrency model outperforms Hudi's timeline locking by 2.4×. Iceberg's 31ms is a reasonable middle ground for teams already invested in the Iceberg ecosystem.

Delta Lake · 18ms · WINNER
Databricks · 23ms
Iceberg · 31ms
Dremio · 38ms
Hudi · 44ms

← Lower is better

Showing 3 of 12 benchmark dimensions.

Access All 12 Dimensions in Full Report
04 // Field Reports

ENGINEERS WHO USED THE DATA

$2.4M infrastructure cost avoided

"We were 6 weeks into a Delta Lake POC when we found Lakehouse's ACID overhead benchmark. The 18ms vs 44ms gap on concurrent upserts matched exactly what we were seeing in production CDC. We migrated the evaluation criteria overnight and saved a $2.4M infrastructure mistake."

Priya Raghunathan

Staff Data Engineer · Meridian Financial

Migrated 14TB Hive warehouse · chose Delta Lake
$180K annual egress savings identified

"The partition pruning efficiency numbers were the one artifact I needed for the board deck. 98.8% on Iceberg vs 91.4% on Hudi — that's a $180K/year egress difference at our query volume. The benchmark methodology section answered every question our security team raised about data provenance."

Marcus Okafor

Platform Architect · Axiom Logistics

Enterprise RFP · 40-node cluster · 8TB/day ingestion
3 months of internal debate resolved

"I sent the one-page benchmark summary to our CTO at 11 PM on a Tuesday. By Wednesday morning we had budget approval for the Dremio migration. Three months of internal debates ended because the numbers were specific, the methodology was clean, and there was nothing to argue with."

Tomás Herrera

Principal Engineer · Vektor Analytics

CTO presentation · 7-figure infrastructure approval
05 // Full Report

UPGRADE YOUR ACCESS LEVEL

80% of the benchmark data is visible on this page — free, ungated, exportable. The full report adds depth: cost modeling, migration playbooks, and quarterly diffs.

Free · No Signup · On this page
  • This comparison table (all 12 dimensions, 5 engines)
  • Methodology overview and environment specs
  • Three deep-dive dimension analyses
  • Editorial verdicts for query latency, compaction, and ACID overhead
Full Report · Work email required
  • Full 12-dimension benchmark results with statistical confidence intervals
  • Cost modeling spreadsheet: TCO calculator for 1TB → 1PB scale
  • Quarterly re-test diffs: see how each engine changed over 4 quarters
  • Migration playbook: Hive → your chosen lakehouse in 14 steps
  • Raw Parquet result files for independent verification
  • Private Slack channel: ask methodology questions directly
download_benchmark_report.sh

$ ./download_report.sh --format pdf+xlsx --include raw-parquet


No marketing emails. One delivery. Unsubscribe from quarterly re-test alerts at any time.