DuckDB employs a comprehensive query optimizer that transforms logical query plans into efficient physical execution plans through multiple optimization passes.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/duckdb/duckdb/llms.txt
Use this file to discover all available pages before exploring further.
Optimizer Architecture
The optimizer is implemented insrc/optimizer/optimizer.cpp and applies a series of transformation rules in a specific order. Each optimization pass can be individually enabled or disabled through configuration.
Optimization Phases
DuckDB’s optimizer runs the following key optimization passes:- Expression Rewriting - Simplifies expression trees without changing plan structure
- Filter Pushdown - Moves filter predicates closer to data sources
- Join Ordering - Determines optimal join order using cost-based optimization
- Statistics Propagation - Propagates column statistics through the query plan
- Column Pruning - Removes unused columns early in execution
- Common Subexpression Elimination - Eliminates redundant computations
Using EXPLAIN
TheEXPLAIN command shows how DuckDB will execute your query:
EXPLAIN ANALYZE:
- Actual execution time for each operator
- Number of rows processed
- Memory usage
- Chosen physical operators
Expression Rewriting Rules
DuckDB applies numerous expression simplification rules (seesrc/optimizer/rule/):
Constant Folding
Evaluates constant expressions at optimization time:src/optimizer/rule/constant_folding.cpp
Arithmetic Simplification
Simplifies arithmetic expressions:src/optimizer/rule/arithmetic_simplification.cpp
Conjunction Simplification
Simplifies boolean expressions:src/optimizer/rule/conjunction_simplification.cpp
Comparison Simplification
Optimizes comparison operations:src/optimizer/rule/comparison_simplification.cpp
LIKE Optimizations
Converts LIKE patterns to more efficient operations:src/optimizer/rule/like_optimizations.cpp
Filter Optimization
Filter Pushdown
The filter pushdown optimizer (src/optimizer/filter_pushdown.cpp) moves filter predicates as close to the data source as possible:
Filter Pullup
Filter pullup (src/optimizer/filter_pullup.cpp) extracts filters from subqueries when beneficial:
Join Optimization
Cost-Based Join Ordering
DuckDB uses dynamic programming to find optimal join orders (seesrc/optimizer/join_order/).
The cost model (src/optimizer/join_order/cost_model.cpp) estimates join costs based on:
- Cardinality estimates - Expected number of output rows
- Base relation sizes - Statistics from table scans
- Join selectivity - Estimated filtering effect of join predicates
Cardinality Estimation
The cardinality estimator (src/optimizer/join_order/cardinality_estimator.cpp) uses:
- Table statistics (row counts, distinct values)
- Column histograms
- Filter selectivity estimates
- Join selectivity based on foreign key relationships
Join Elimination
Removes unnecessary joins when possible:src/optimizer/join_elimination.cpp
Statistics and Cost Estimation
Statistics Propagation
The statistics propagator (src/optimizer/statistics_propagator.cpp) flows statistics through the query plan:
- Min/Max values - Range of values in a column
- Null percentage - Fraction of NULL values
- Distinct count - Number of unique values
- Column correlation - Relationships between columns
Using Statistics
Statistics enable optimizations like:- Filter elimination - Removing always-true or always-false filters
- Join ordering - Choosing smaller build sides
- Zone map pruning - Skipping row groups (see Indexing)
Advanced Optimizations
Common Subexpression Elimination (CSE)
Eliminates duplicate expressions:price * 1.15 computation is performed only once.
Implementation: src/optimizer/cse_optimizer.cpp
Top-N Optimization
Converts ORDER BY + LIMIT into a more efficient Top-N heap operation:src/optimizer/topn_optimizer.cpp
Limit Pushdown
Pushes LIMIT operations down through projections and unions:src/optimizer/limit_pushdown.cpp
Unnest Rewriting
Optimizes UNNEST operations by moving them to projections: Implementation:src/optimizer/unnest_rewriter.cpp
Late Materialization
Delays fetching full row data until necessary, fetching only row IDs through filters and joins: Implementation:src/optimizer/late_materialization.cpp
Optimizer Configuration
Disable specific optimizers for debugging:expression_rewriterfilter_pushdownfilter_pullupjoin_orderstatistics_propagationcommon_subexpressionstop_n
Performance Tips
- Let the optimizer work - Complex predicates are often optimized automatically
- Use EXPLAIN ANALYZE - Verify that filters are being pushed down
- Check statistics - Run
ANALYZE table_nameto update statistics - Avoid optimizer barriers - Some functions prevent pushdown
- Use appropriate types - Implicit casts can prevent optimizations
See Also
- Indexing Strategies - Using indexes and zone maps
- Performance Benchmarks - Measuring query performance