DuckDB includes a comprehensive benchmarking suite for measuring and validating performance across various workloads.
Running Benchmarks
Building the Benchmark Runner
From README.md:45 and benchmark/README.md:2, building with benchmarks enabled produces (see the build sketch after this list):
- The benchmark runner executable
- TPC-H data generator and queries
- Micro-benchmark suite
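A sketch of the build step, assuming the Makefile flags described in the README; exact flag names can vary between versions:

```bash
# Build DuckDB together with the benchmark runner and the TPC-H data generator.
BUILD_BENCHMARK=1 BUILD_TPCH=1 make
# The runner is then expected at build/release/benchmark/benchmark_runner.
```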
Listing Available Benchmarks
Benchmarks are grouped by directory under benchmark/:
- micro/ - Focused micro-benchmarks for specific operations
- tpch/ - TPC-H decision support queries
- tpcds/ - TPC-DS decision support queries
- trainbenchmark/ - Graph pattern matching queries
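To see what is registered, the runner can print the available benchmarks; the --list flag is an assumption to verify against benchmark/README.md:

```bash
build/release/benchmark/benchmark_runner --list
```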
Running a Single Benchmark
From benchmark/README.md:14, a single benchmark is run by passing its path to the runner:
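For example (the micro-benchmark path below is illustrative and may differ between versions):

```bash
build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark
```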
Using Regex Patterns
Run multiple related benchmarks by passing a regular expression that matches their paths:
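A sketch; the pattern follows the directory layout shown above:

```bash
# Run all NULL-handling micro-benchmarks in one go (illustrative pattern).
build/release/benchmark/benchmark_runner "benchmark/micro/nulls/.*"
```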
Running All Benchmarks
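Invoking the runner without arguments is expected to execute every registered benchmark; confirm against benchmark/README.md before relying on it:

```bash
# Runs the full suite; expect long runtimes.
build/release/benchmark/benchmark_runner
```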
Output Options
Save timing results to a file:
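A minimal sketch, assuming the runner's --out option; check the runner's usage output for the exact flag:

```bash
build/release/benchmark/benchmark_runner benchmark/tpch/sf1/q01.benchmark --out=timings.csv
```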
Benchmark Information
View benchmark metadata:
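Assuming an --info flag (an assumption; flag placement may differ):

```bash
build/release/benchmark/benchmark_runner benchmark/tpch/sf1/q01.benchmark --info
```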
Viewing Queries
Print the benchmark query:
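Assuming a --query flag that prints the SQL a benchmark runs:

```bash
build/release/benchmark/benchmark_runner benchmark/tpch/sf1/q01.benchmark --query
```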
Profiling Benchmarks
Generate a query profile:
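--profile is also referenced in the analysis tips below; the benchmark path here is illustrative:

```bash
build/release/benchmark/benchmark_runner benchmark/tpch/sf1/q01.benchmark --profile
```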
TPC-H Benchmarks
Overview
TPC-H is an industry-standard decision support benchmark consisting of 22 complex queries that simulate business intelligence workloads (see benchmark/tpch/sf1/tpch_sf1.benchmark.in:8).
Scale Factors
TPC-H supports multiple scale factors (SF):
- SF1 - 1 GB dataset (~6 million lineitem rows)
- SF10 - 10 GB dataset
- SF100 - 100 GB dataset
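Outside the benchmark runner, the same tables can be generated in a DuckDB shell with the tpch extension's dbgen call:

```sql
INSTALL tpch;
LOAD tpch;
CALL dbgen(sf = 1);   -- creates and populates the eight TPC-H tables at SF1
```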
Query Types
TPC-H queries cover:
- Aggregations - SUM, COUNT, AVG with GROUP BY
- Joins - Multi-way joins across fact and dimension tables
- Subqueries - Correlated and uncorrelated subqueries
- Sorting - ORDER BY with various columns
- Window functions - ROW_NUMBER, RANK, etc.
Schema
TPC-H models a wholesale supplier database:
- customer - Customer information
- orders - Customer orders
- lineitem - Order line items (largest table)
- part - Parts catalog
- supplier - Supplier information
- partsupp - Parts supplied by suppliers
- nation - Nation codes
- region - Geographic regions
Running TPC-H Benchmarks
Run a specific TPC-H query at SF1:
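A sketch, assuming per-query benchmark files under benchmark/tpch/sf1/:

```bash
build/release/benchmark/benchmark_runner benchmark/tpch/sf1/q01.benchmark
```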
Example TPC-H Query (Q1)
Q1 produces a pricing summary report and exercises (the full query follows this list):
- Filter pushdown on dates
- Multiple aggregation functions
- Arithmetic in aggregates
- GROUP BY and ORDER BY
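For reference, the standard TPC-H Q1 text, on which the benchmark is based:

```sql
SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity) AS sum_qty,
    sum(l_extendedprice) AS sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
    avg(l_quantity) AS avg_qty,
    avg(l_extendedprice) AS avg_price,
    avg(l_discount) AS avg_disc,
    count(*) AS count_order
FROM lineitem
WHERE l_shipdate <= DATE '1998-12-01' - INTERVAL 90 DAY
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
```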
TPC-DS Benchmarks
Overview
TPC-DS is a more complex decision support benchmark with 99 queries modeling retail product analytics (see benchmark/tpcds/sf1/tpcds_sf1.benchmark.in:8).
Key Differences from TPC-H
- More queries - 99 vs. 22 queries
- More complex - Deeper nesting, more joins
- More tables - 24 tables vs. 8 tables
- More features - ROLLUP, CUBE, window functions
Running TPC-DS Benchmarks
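Invocation mirrors TPC-H; the SF1 paths below are assumptions based on the group layout shown earlier:

```bash
# Single query
build/release/benchmark/benchmark_runner benchmark/tpcds/sf1/q01.benchmark
# All 99 queries at SF1
build/release/benchmark/benchmark_runner "benchmark/tpcds/sf1/.*"
```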
Micro-Benchmarks
Zone Maps
Zone map micro-benchmarks measure how effectively min/max metadata lets scans skip entire row groups (from benchmark/micro/zonemaps/zonemaps.benchmark:9).
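That benchmark can be run directly with the runner:

```bash
build/release/benchmark/benchmark_runner benchmark/micro/zonemaps/zonemaps.benchmark
```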
Aggregation Benchmarks
- Simple aggregates (SUM, COUNT, AVG)
- GROUP BY with various cardinalities
- DISTINCT aggregates
- Multiple aggregates in one query
Join Benchmarks
- Hash joins
- Nested loop joins
- Index joins
- Join ordering optimization
String Benchmarks
- String functions (UPPER, LOWER, SUBSTRING)
- LIKE and regex matching
- String aggregation
- String comparison
Performance Characteristics
Columnar Storage Benefits
DuckDB’s columnar storage excels at:
- Aggregations - Process only needed columns
- Compression - Similar values compress well
- SIMD operations - Vectorized execution on columns
- Zone map pruning - Skip irrelevant row groups
Query Performance Patterns
Fast queries:
- Highly selective filters with zone map pruning
- Aggregations over few columns
- Sorted data matching query ORDER BY
Slower queries:
- Full table scans without filters
- High-cardinality GROUP BY
- Complex join graphs without statistics
- Random access patterns
Memory Management
DuckDB automatically:
- Spills to disk when memory exceeds limits
- Uses streaming operators for large results
- Optimizes hash table sizes based on cardinality
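These behaviors can also be steered explicitly for benchmark runs; both settings exist in DuckDB, and the values here are illustrative:

```sql
SET memory_limit = '8GB';                  -- cap DuckDB's working memory
SET temp_directory = '/tmp/duckdb_spill';  -- where larger-than-memory operators spill
```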
Creating Custom Benchmarks
Benchmark File Format
Create a .benchmark file with the following fields (a sketch follows the list):
- name - Display name
- group - Category for organization
- load - Setup queries (run once)
- run - Benchmark query (run multiple times)
- result - Optional expected result format
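An illustrative sketch built from those fields; compare against an existing file under benchmark/micro/ for the exact header and section syntax:

```text
# name: my_sum_benchmark
# group: [micro]

load
CREATE TABLE integers AS SELECT range AS i FROM range(1000000);

run
SELECT sum(i) FROM integers;
```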
Running Custom Benchmarks
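Assuming the file above is saved as benchmark/micro/my_sum_benchmark.benchmark (a hypothetical path), it runs like any built-in benchmark:

```bash
build/release/benchmark/benchmark_runner benchmark/micro/my_sum_benchmark.benchmark
```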
Performance Analysis Tips
- Consistent environment - Disable CPU frequency scaling, close other applications
- Multiple runs - Average at least 5 runs to account for variance
- Warm cache - Run once to warm OS page cache, then measure
- Profile first - Use --profile to identify bottlenecks before optimizing
- Compare apples to apples - Same scale factor, same hardware, same configuration
- Monitor resources - Check CPU, memory, and I/O during benchmarks
- Version control - Record DuckDB version and configuration:
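One way to capture this alongside the results (PRAGMA version reports the DuckDB version and source id; paths assume the source build described above):

```bash
echo "PRAGMA version;" | build/release/duckdb   # DuckDB version and source id
git rev-parse HEAD                              # commit of the benchmarked tree
```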
See Also
- Query Optimization - Understanding query plans
- Indexing Strategies - Using indexes and zone maps effectively