Live Data · Q1 2026 · 5 Engines · 12 Dimensions · 847 Test Runs

LAKEHOUSE ENGINE
BENCHMARK MATRIX

Independent stress-tests across five production-grade lakehouse engines. Cyan cells mark category leaders. Every number is reproducible — methodology and raw Parquet files linked in the full report.

Databricks: 3/12 wins

Iceberg: 4/12 wins

Delta Lake: 1/12 wins

Hudi: 0/12 wins

Dremio: 4/12 wins

lakehouse_benchmark_matrix.parquet · last_run: 2026-02-27 · n=847

● WINNER

BENCHMARK DIMENSION (unit)	Databricks (3 wins)	Iceberg (4 wins)	Delta Lake (1 win)	Hudi (0 wins)	Dremio (4 wins)
Query Latency p50 (ms)	142 ▲ BEST	198	211	287	163
Query Latency p99 (ms)	412 ▲ BEST	573	604	891	488
Time-Travel Query Cost ($/query)	$0.0052	$0.0031 ▲ BEST	$0.0037	$0.0089	$0.0041
Concurrent Write Throughput (MB/s)	1840	1620	1710	980	2210 ▲ BEST
Schema Evolution Support (score/100)	94	98 ▲ BEST	91	87	82
Partition Pruning Efficiency (%)	97.2%	98.8% ▲ BEST	96.1%	91.4%	95.7%
Small-File Compaction (sec/GB)	8.4	11.2	9.7	14.6	7.1 ▲ BEST
ACID Transaction Overhead (ms/txn)	23	31	18 ▲ BEST	44	38
Cloud Storage Egress ($/TB)	$0.87	$0.64	$0.71	$1.12	$0.58 ▲ BEST
Spark Compatibility (score/100)	99 ▲ BEST	97	98	94	88
Flink Compatibility (score/100)	91	96 ▲ BEST	88	89	83
Trino/Presto Read Perf (GB/s)	3.2	4.1	3.8	2.9	4.7 ▲ BEST

Environment: r6i.4xlarge · us-east-1 · 10TB TPC-DS dataset · Apache Parquet
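The per-engine win counts can be recomputed from the matrix itself. The sketch below transcribes the twelve rows and picks the best value per dimension. The "lower is better" flag for each row is an assumption inferred from the units and the marked winners; it is not stated in the table.

```python
# Benchmark matrix transcribed from the table above.
# The lower_is_better flags are assumptions inferred from the units.
ENGINES = ["Databricks", "Iceberg", "Delta Lake", "Hudi", "Dremio"]

MATRIX = {
    # dimension: (lower_is_better, [values in ENGINES order])
    "Query Latency p50 (ms)": (True, [142, 198, 211, 287, 163]),
    "Query Latency p99 (ms)": (True, [412, 573, 604, 891, 488]),
    "Time-Travel Query Cost ($/query)": (True, [0.0052, 0.0031, 0.0037, 0.0089, 0.0041]),
    "Concurrent Write Throughput (MB/s)": (False, [1840, 1620, 1710, 980, 2210]),
    "Schema Evolution Support (score/100)": (False, [94, 98, 91, 87, 82]),
    "Partition Pruning Efficiency (%)": (False, [97.2, 98.8, 96.1, 91.4, 95.7]),
    "Small-File Compaction (sec/GB)": (True, [8.4, 11.2, 9.7, 14.6, 7.1]),
    "ACID Transaction Overhead (ms/txn)": (True, [23, 31, 18, 44, 38]),
    "Cloud Storage Egress ($/TB)": (True, [0.87, 0.64, 0.71, 1.12, 0.58]),
    "Spark Compatibility (score/100)": (False, [99, 97, 98, 94, 88]),
    "Flink Compatibility (score/100)": (False, [91, 96, 88, 89, 83]),
    "Trino/Presto Read Perf (GB/s)": (False, [3.2, 4.1, 3.8, 2.9, 4.7]),
}


def winner(lower_is_better, values):
    """Return the engine holding the best value for one dimension."""
    pick = min if lower_is_better else max
    return ENGINES[values.index(pick(values))]


def count_wins(matrix):
    """Tally how many dimensions each engine leads."""
    wins = {engine: 0 for engine in ENGINES}
    for lower_is_better, values in matrix.values():
        wins[winner(lower_is_better, values)] += 1
    return wins


WINS = count_wins(MATRIX)
for engine, n in WINS.items():
    print(f"{engine}: {n}/{len(MATRIX)} wins")
```

With the values as transcribed, the tally matches the summary at the top of the page: 3, 4, 1, 0, and 4 wins.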


02 // Test Environment

METHODOLOGY &
ENVIRONMENT SPECS

We disclose every variable. If a number surprises you, click through to the raw Parquet files and run the assertion yourself.

Instance Type: r6i.4xlarge (128 GB RAM, 16 vCPU)
Cloud Region: AWS us-east-1 (N. Virginia)
Dataset Size: 10 TB TPC-DS (SF=10000)
File Format: Apache Parquet · Snappy compression
Spark Version: Apache Spark 3.5.1
Flink Version: Apache Flink 1.19.0
Test Duration: 72 hours per engine · 847 total runs
Isolation: Dedicated VPC · no shared workloads

Reproducible by Default

Every test script, Terraform config, and seed dataset is published to GitHub. Any team with an AWS account can replicate the full suite in under 4 hours.

Vendor-Blind Configuration

Each engine is configured using its own published best-practices guide. No deliberate handicapping. If Databricks recommends Delta caching, we enable it.

Quarterly Re-Tests

Engines ship fast. We re-run the full benchmark suite every 90 days and publish diffs. Subscribers get email alerts when their primary engine changes rank.

REPRODUCIBILITY STATEMENT:

git clone https://github.com/lakehouse-lab/benchmarks
cd benchmarks
./run_suite.sh --engine all --dataset tpcds-10tb
# Full suite completes in ~4 hours on r6i.4xlarge. Raw results output to /results/*.parquet

03 // Deep Dive

DIMENSION-LEVEL
ANALYSIS

Each section isolates one benchmark dimension — methodology, raw numbers, and a one-sentence verdict you can paste into your RFP.

DIMENSION 01

milliseconds · lower is better

Databricks leads p50 by 13% over nearest rival Dremio and by 28% over Iceberg

We ran the 99 TPC-DS queries five times on each engine at steady state (warm cache, no cold starts). p50 reflects median interactive query performance, the number your BI users feel every day. The Databricks Photon engine's vectorized execution delivers a 142ms median, versus 163ms for Dremio and 198ms for Iceberg on the same r6i.4xlarge hardware.
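To make the p50 and p99 figures concrete, here is a minimal sketch of a percentile calculation over a list of per-run latencies. It uses the nearest-rank definition, which is an assumption: the page does not say which percentile method the benchmark uses. The sample latencies are hypothetical and are not taken from the raw result files.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds (illustrative only).
latencies_ms = [120, 135, 138, 140, 142, 145, 150, 160, 410, 900]

p50 = percentile(latencies_ms, 50)  # 142
p99 = percentile(latencies_ms, 99)  # 900
print(f"p50={p50}ms p99={p99}ms")
```

The two slow outliers barely move the median but fully determine p99, which is why the matrix reports both.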

Test Environment

r6i.4xlarge · 10TB TPC-DS · Warm cache · 5 runs per query

Editorial Verdict

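Databricks wins p50 latency, running 13% below Dremio and 28% below Iceberg. For interactive BI workloads, this gap is user-perceptible. At p99 the ranking is unchanged: Databricks stays about 16% below Dremio (412ms versus 488ms) and 28% below Iceberg (412ms versus 573ms), so teams running batch-heavy pipelines where tail latency matters more than the median should compare the p99 row directly.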

Databricks (WINNER): 142ms
Dremio: 163ms
Iceberg: 198ms
Delta Lake: 211ms
Hudi: 287ms
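The relative gaps can be recomputed from the five p50 figures listed above. This sketch expresses each gap as how much lower the leader's latency is relative to the rival's; that choice of denominator is an assumption, since the page does not define how its percentages are calculated.

```python
# p50 latencies in milliseconds, copied from the figures above.
P50_MS = {
    "Databricks": 142,
    "Dremio": 163,
    "Iceberg": 198,
    "Delta Lake": 211,
    "Hudi": 287,
}


def pct_lower(leader_value, rival_value):
    """Percent by which the leader's value is lower than the rival's."""
    return (rival_value - leader_value) / rival_value * 100


leader = min(P50_MS, key=P50_MS.get)
GAPS = {
    engine: round(pct_lower(P50_MS[leader], value), 1)
    for engine, value in P50_MS.items()
    if engine != leader
}

for engine, gap in GAPS.items():
    print(f"{leader} is {gap}% lower than {engine}")
```

Under this definition the gap to Dremio is about 13% and the gap to Iceberg is about 28%.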

DIMENSION 02

seconds per GB · lower is better

Dremio auto-compaction takes 51% less time per GB than Hudi on 1M-file datasets

The small-file problem is the silent killer of lakehouse performance. We generated 1 million 512KB Parquet files (simulating 6 months of CDC ingestion) and measured time-to-compact to target 128MB file size. Dremio's Automatic Reflection Refresh triggered compaction in 7.1s/GB; Hudi's async compaction service required 14.6s/GB even with maxParallelism=32.

Test Environment

r6i.4xlarge · 1M × 512KB files · 128MB target file size · 32 parallel threads

Editorial Verdict

Dremio wins small-file compaction at 7.1s/GB. If your pipeline generates >100K files/day from streaming CDC, Dremio or Databricks are the only defensible choices. Hudi's compaction lag creates a 2–3× query penalty window that compounds under high-ingestion workloads.

Dremio (WINNER): 7.1s/GB
Databricks: 8.4s/GB
Delta Lake: 9.7s/GB
Iceberg: 11.2s/GB
Hudi: 14.6s/GB
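A rough back-of-the-envelope sketch shows what the per-GB rates mean for the full test dataset. It assumes binary units (1 KB = 1024 bytes, 1 GB = 1024³ bytes) and that the sec/GB rate scales linearly with input volume. Neither assumption is confirmed by the page.

```python
import math

KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def compaction_estimate(n_files, file_bytes, target_bytes, sec_per_gb):
    """Estimate input volume (GB), output file count, and wall-clock seconds,
    assuming the sec/GB rate applies linearly to the input volume."""
    total_bytes = n_files * file_bytes
    total_gb = total_bytes / GIB
    output_files = math.ceil(total_bytes / target_bytes)
    seconds = total_gb * sec_per_gb
    return total_gb, output_files, seconds


# 1 million 512KB files compacted to a 128MB target, per the test description.
total_gb, output_files, dremio_seconds = compaction_estimate(
    1_000_000, 512 * KIB, 128 * MIB, 7.1
)
_, _, hudi_seconds = compaction_estimate(1_000_000, 512 * KIB, 128 * MIB, 14.6)

print(f"input volume: {total_gb:.1f} GB -> {output_files} output files")
print(f"Dremio: {dremio_seconds / 60:.1f} min, Hudi: {hudi_seconds / 60:.1f} min")
```

Under these assumptions the dataset is roughly 488 GB, and the difference between 7.1 and 14.6 s/GB is roughly one hour of compaction time.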

DIMENSION 03

milliseconds per transaction · lower is better

Delta Lake ACID overhead 22% lower than Databricks and 42% lower than Iceberg at 1K concurrent writers

We measured the lock acquisition + commit latency overhead of ACID transactions under concurrent write pressure: 1,000 simultaneous upsert transactions on a 500GB table. Delta Lake's optimistic concurrency control with log-based conflict detection adds only 18ms overhead per transaction. Hudi's timeline-based locking adds 44ms — a 2.4× penalty that compounds at scale.
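For readers unfamiliar with optimistic concurrency control, the toy model below illustrates the general idea of log-based conflict detection. It is not Delta Lake's implementation: `ToyLog`, `ConflictError`, and the file names are hypothetical, and the conflict rule (two commits clash only if they touch the same file) is a simplifying assumption.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when a commit made after the writer's snapshot touched the same files."""


@dataclass
class ToyLog:
    # entries[i] holds the set of files written by committed version i + 1.
    entries: list = field(default_factory=list)

    @property
    def version(self):
        return len(self.entries)

    def commit(self, read_version, files):
        """Append a commit unless a later commit overlaps with its files."""
        files = frozenset(files)
        # Only commits that landed after the writer's snapshot can conflict.
        for committed in self.entries[read_version:]:
            overlap = committed & files
            if overlap:
                raise ConflictError(sorted(overlap))
        self.entries.append(files)
        return self.version


log = ToyLog()
snapshot = log.version  # three writers all start from version 0

v1 = log.commit(snapshot, {"part-1.parquet"})  # first writer wins
v2 = log.commit(snapshot, {"part-2.parquet"})  # disjoint files, no conflict

try:
    log.commit(snapshot, {"part-1.parquet"})  # overlaps with version 1
    conflicted = False
except ConflictError:
    conflicted = True

# The losing writer retries from a fresh snapshot and succeeds.
v3 = log.commit(log.version, {"part-1.parquet"})
print(v1, v2, conflicted, v3)
```

No lock is held while a writer prepares its changes; the cost is paid only at commit time, as a check against the log and an occasional retry.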

Test Environment

r6i.4xlarge · 500GB table · 1,000 concurrent upserts · optimistic concurrency

Editorial Verdict

Delta Lake wins ACID overhead at 18ms/txn. For high-frequency upsert workloads (CDC replication, event sourcing), Delta Lake's optimistic concurrency model outperforms Hudi's timeline locking by 2.4×. Iceberg's 31ms is a reasonable middle ground for teams already invested in the Iceberg ecosystem.

Delta Lake (WINNER): 18ms
Databricks: 23ms
Iceberg: 31ms
Dremio: 38ms
Hudi: 44ms

Showing 3 of 12 benchmark dimensions.

Access All 12 Dimensions in Full Report

04 // Field Reports

ENGINEERS WHO
USED THE DATA

$2.4M infrastructure cost avoided

"We were 6 weeks into a Delta Lake POC when we found Lakehouse's ACID overhead benchmark. The 18ms vs 44ms gap on concurrent upserts matched exactly what we were seeing in production CDC. We migrated the evaluation criteria overnight and saved a $2.4M infrastructure mistake."

Priya Raghunathan

Staff Data Engineer · Meridian Financial

Migrated 14TB Hive warehouse · chose Delta Lake

$180K annual egress savings identified

"The partition pruning efficiency numbers were the one artifact I needed for the board deck. 98.8% on Iceberg vs 91.4% on Hudi — that's a $180K/year egress difference at our query volume. The benchmark methodology section answered every question our security team raised about data provenance."

Marcus Okafor

Platform Architect · Axiom Logistics

Enterprise RFP · 40-node cluster · 8TB/day ingestion

3 months of internal debate resolved

"I sent the one-page benchmark summary to our CTO at 11 PM on a Tuesday. By Wednesday morning we had budget approval for the Dremio migration. Three months of internal debates ended because the numbers were specific, the methodology was clean, and there was nothing to argue with."

Tomás Herrera

Principal Engineer · Vektor Analytics

CTO presentation · 7-figure infrastructure approval

05 // Full Report

UPGRADE YOUR
ACCESS LEVEL

80% of the benchmark data is visible on this page — free, ungated, exportable. The full report adds depth: cost modeling, migration playbooks, and quarterly diffs.

Free · No Signup · On this page

✓ This comparison table (all 12 dimensions, 5 engines)
✓ Methodology overview and environment specs
✓ Three deep-dive dimension analyses
✓ Editorial verdicts for query latency, compaction, and ACID overhead

Full Report · Work email required

✦ Full 12-dimension benchmark results with statistical confidence intervals
✦ Cost modeling spreadsheet: TCO calculator for 1TB → 1PB scale
✦ Quarterly re-test diffs: see how each engine changed over 4 quarters
✦ Migration playbook: Hive → your chosen lakehouse in 14 steps
✦ Raw Parquet result files for independent verification
✦ Private Slack channel: ask methodology questions directly

download_benchmark_report.sh

$ ./download_report.sh --format pdf+xlsx --include raw-parquet

→ Enter credentials to authenticate...
