DuckDB uses a vectorized execution model combined with push-based pipelining and automatic parallelization to achieve high performance on analytical queries. This execution strategy is fundamentally different from traditional row-at-a-time execution.
Vectorized Execution
Location: src/execution/
What is Vectorization?
Instead of processing one row at a time, DuckDB processes data in vectors - batches of rows stored in columnar format. This approach provides significant performance benefits over traditional row-at-a-time execution.
Vector Size
From src/include/duckdb/common/vector_size.hpp:15-21, the standard vector size is 2,048 rows, chosen to:
- Fit comfortably in CPU cache (L1/L2)
- Enable SIMD (Single Instruction Multiple Data) operations
- Balance between batch size and memory pressure
- Be a power of 2 for efficient bit operations
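The contrast between the two models can be sketched in Python. This is a conceptual illustration only - real DuckDB vectors are columnar C++ arrays - but it shows how vectorized execution processes one batch of a column per operation instead of one row per call chain:

```python
# Conceptual contrast between row-at-a-time and vectorized execution.
# Pure-Python sketch; not DuckDB code.

VECTOR_SIZE = 2048  # DuckDB's standard vector size

def row_at_a_time(rows):
    """Volcano-style: one pass through the operator logic per row."""
    out = []
    for row in rows:                    # per-row interpretation overhead
        if row["age"] > 18:             # filter
            out.append(row["age"] * 2)  # projection
    return out

def vectorized(ages):
    """Vectorized: each operation processes a whole batch of one column."""
    out = []
    for start in range(0, len(ages), VECTOR_SIZE):
        vec = ages[start:start + VECTOR_SIZE]  # one vector of up to 2048 values
        sel = [a for a in vec if a > 18]       # filter over the batch
        out.extend(a * 2 for a in sel)         # projection over the batch
    return out

rows = [{"age": a} for a in (10, 20, 30)]
assert row_at_a_time(rows) == vectorized([10, 20, 30]) == [40, 60]
```

Both produce identical results; the vectorized form amortizes dispatch costs across the batch.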
Data Chunks
Location: src/common/types/data_chunk.cpp
Vectors are organized into DataChunks - the fundamental unit of data flow in DuckDB.
From src/common/types/data_chunk.cpp:23-35:
DataChunk contains:
- Multiple column vectors (one per column)
- Count: Number of valid rows (≤ 2048)
- Capacity: Maximum rows (typically 2048)
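A minimal sketch of that structure (names and methods are illustrative, not DuckDB's actual API):

```python
# Simplified DataChunk: one vector per column, plus a row count bounded
# by the chunk capacity. Illustrative sketch, not DuckDB's C++ API.

STANDARD_VECTOR_SIZE = 2048

class DataChunk:
    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}  # one vector per column
        self.capacity = STANDARD_VECTOR_SIZE                # max rows per chunk

    @property
    def count(self):
        """Number of valid rows currently in the chunk."""
        return len(next(iter(self.columns.values())))

    def append_row(self, **values):
        if self.count >= self.capacity:
            raise OverflowError("chunk full; start a new DataChunk")
        for name, value in values.items():
            self.columns[name].append(value)

chunk = DataChunk(["name", "age"])
chunk.append_row(name="ada", age=36)
chunk.append_row(name="alan", age=41)
assert chunk.count == 2 and chunk.count <= chunk.capacity
```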
Benefits of Vectorization
CPU Cache Efficiency
Consecutive memory access patterns maximize CPU cache hits
SIMD Instructions
Process multiple values with a single CPU instruction (AVX2, AVX-512)
Branch Prediction
Reduce branch mispredictions with batch-oriented logic
Function Call Overhead
Amortize function call costs across 2048 rows instead of 1
SIMD Example
Modern CPUs can process multiple values simultaneously: a single 256-bit AVX2 instruction, for example, adds eight 32-bit integers at once.
Physical Operators
Location: src/execution/physical_operator.cpp
The execution engine consists of physical operators that implement actual query operations.
From src/execution/physical_operator.cpp:19-27:
Common Physical Operators
| Operator | Description | Example |
|---|---|---|
| TableScan | Read data from storage | FROM users |
| Filter | Apply WHERE predicates | WHERE age > 18 |
| Projection | Select specific columns | SELECT name, email |
| HashJoin | Join using hash table | INNER JOIN orders ON ... |
| HashAggregate | GROUP BY with aggregates | GROUP BY country |
| Order | Sort results | ORDER BY created_at DESC |
| Limit | Restrict output rows | LIMIT 100 |
| Window | Window functions | ROW_NUMBER() OVER (...) |
Operator Interface
From src/execution/physical_operator.cpp:97-100:
- Takes an input DataChunk (or pulls from child operators)
- Processes the data
- Produces an output DataChunk
- Returns a status (more data, finished, etc.)
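A stripped-down sketch of that interface (names are illustrative; DuckDB's real C++ interface is richer, with separate source, operator, and sink roles):

```python
# Simplified physical operator interface: consume one input chunk,
# produce one output chunk, report a status. Illustrative only.

from enum import Enum, auto

class OperatorResult(Enum):
    HAVE_MORE_OUTPUT = auto()
    NEED_MORE_INPUT = auto()
    FINISHED = auto()

class Filter:
    """A streaming operator: processes each chunk independently."""
    def __init__(self, predicate):
        self.predicate = predicate

    def execute(self, chunk):
        out = [row for row in chunk if self.predicate(row)]
        # A filter never buffers, so it always wants the next input chunk.
        return out, OperatorResult.NEED_MORE_INPUT

adults = Filter(lambda row: row["age"] > 18)
out, status = adults.execute([{"age": 10}, {"age": 30}])
assert out == [{"age": 30}]
assert status is OperatorResult.NEED_MORE_INPUT
```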
Push-Based Execution Model
DuckDB uses a push-based (also called “data-centric”) execution model where data flows from operators to their parents.
Pull vs Push
Pull-Based (Volcano/Iterator): each operator repeatedly calls its child to pull the next batch of rows. Push-Based: source operators drive execution, pushing each chunk up through the pipeline.
Benefits of Push-Based
Better Pipelining
Data flows through multiple operators without materialization. A chunk can be filtered, projected, and aggregated in a single pass.
Cache Efficiency
Data stays hot in CPU cache as it flows through the pipeline.
Parallelism
Multiple pipelines can run concurrently on different cores.
Morsel-Driven Parallelism
Work is divided into small chunks (morsels) that can be distributed across threads.
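The pull vs push contrast can be sketched with a tiny scan → filter pipeline (illustrative only; real DuckDB operators exchange DataChunks, not Python lists):

```python
# Pull vs push, sketched with a scan -> filter pipeline over two chunks.

data = [[1, 2, 3], [4, 5, 6]]

# Pull-based (Volcano): the parent asks the child for the next chunk.
class PullScan:
    def __init__(self, chunks):
        self.it = iter(chunks)
    def get_chunk(self):
        return next(self.it, None)

class PullFilter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def get_chunk(self):
        chunk = self.child.get_chunk()  # pull from below
        return None if chunk is None else [v for v in chunk if self.pred(v)]

pulled = []
pull = PullFilter(PullScan(data), lambda v: v % 2 == 0)
while (c := pull.get_chunk()) is not None:
    pulled.extend(c)

# Push-based: the source drives execution, pushing each chunk into its
# parent while the chunk is still hot in cache.
pushed = []
def push_scan(chunks, sink):
    for chunk in chunks:
        sink(chunk)

push_scan(data, lambda chunk: pushed.extend(v for v in chunk if v % 2 == 0))
assert pulled == pushed == [2, 4, 6]
```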
Query Optimization
Location:src/optimizer/
Before execution, queries are optimized through multiple passes. See Architecture for details.
Key Optimizations for Execution
1. Predicate Pushdown
Parallel Execution
DuckDB automatically parallelizes query execution across available CPU cores.
Thread Count
From src/execution/physical_operator.cpp:56-73:
Morsel-Driven Parallelism
Work is divided into morsels (typically row groups of ~122K rows) that can be processed independently.
- Load balancing: fast threads can pick up more morsels
- No synchronization overhead during processing
- Cache-friendly: Each thread works on contiguous data
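A minimal sketch of morsel-driven scheduling, using a shared queue so that faster threads naturally claim more morsels (illustrative only; morsel size is shrunk for readability):

```python
# Morsel-driven parallelism sketch: threads repeatedly grab the next
# morsel from a shared queue and merge partial results under a lock.

import queue
import threading

MORSEL_SIZE = 4  # tiny for illustration; DuckDB morsels are ~122K rows

data = list(range(20))
morsels = queue.Queue()
for start in range(0, len(data), MORSEL_SIZE):
    morsels.put(data[start:start + MORSEL_SIZE])

total = 0
lock = threading.Lock()

def worker():
    global total
    while True:
        try:
            morsel = morsels.get_nowait()  # claim the next unit of work
        except queue.Empty:
            return                         # no morsels left: this thread is done
        partial = sum(morsel)              # process the morsel independently
        with lock:
            total += partial               # merge the partial result

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert total == sum(range(20))
```

No coordination happens while a morsel is being processed; synchronization is only needed when merging results.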
Configuring Parallelism
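The thread count can be adjusted per connection with a standard DuckDB setting (it defaults to the number of available CPU cores):

```sql
-- Use 4 threads for this connection
SET threads = 4;
-- Inspect the current value
SELECT current_setting('threads');
```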
Pipeline Execution
Operators are organized into pipelines - sequences of operators that can run without materializing intermediate results.
Pipeline Breakers
Some operators require materialization and break the pipeline:
| Operator | Why It Breaks Pipeline |
|---|---|
| HashJoin (build side) | Must build complete hash table before probing |
| HashAggregate | Must see all rows before finalizing groups |
| Order | Must collect all rows before sorting |
| Window (some) | Depends on window frame specification |
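The effect of breakers on pipeline construction can be sketched as follows (a simplification: real plans are DAGs, and hash joins break only on the build side):

```python
# Sketch: cut a linear operator chain into pipelines at pipeline breakers.
# Each breaker must fully materialize before the next pipeline can start.

BREAKERS = {"HashAggregate", "Order", "HashJoinBuild", "Window"}

def split_into_pipelines(operators):
    """Split an operator chain into pipelines at breaker boundaries."""
    pipelines, current = [], []
    for op in operators:
        current.append(op)
        if op in BREAKERS:        # a breaker ends the current pipeline
            pipelines.append(current)
            current = []
    if current:
        pipelines.append(current)
    return pipelines

plan = ["TableScan", "Filter", "HashAggregate", "Order", "Limit"]
assert split_into_pipelines(plan) == [
    ["TableScan", "Filter", "HashAggregate"],  # streams until aggregation
    ["Order"],                                 # must see all groups first
    ["Limit"],                                 # streams sorted output
]
```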
Example Query Execution
- Each vector flows from scan through filter to aggregator
- No intermediate materialization
- Parallel: Multiple threads scan different row groups
- Wait for all aggregation to complete
- Stream results to sort
- Sort completes, stream sorted results
- Stop after 10 rows (early termination)
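A query of this shape exercises all of the phases above (table and column names are hypothetical):

```sql
-- scan -> filter -> aggregate, then sort, then limit
SELECT country, count(*) AS users
FROM users
WHERE age > 18
GROUP BY country
ORDER BY users DESC
LIMIT 10;
```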
Adaptive Execution
Location: src/execution/adaptive_filter.cpp
DuckDB adapts execution strategies based on runtime statistics:
Adaptive Filtering
With multiple filter predicates, DuckDB reorders their evaluation based on observed selectivity:
- If cheap_column = 'value' eliminates 99% of rows → evaluate it first
- If both predicates are selective → use cost-based ordering
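The idea can be sketched as follows. This is an illustration of selectivity-based reordering, not DuckDB's actual implementation (adaptive_filter.cpp measures runtime cost, not just pass rates):

```python
# Adaptive filter ordering sketch: track how often each predicate passes,
# and evaluate the most selective predicate (lowest pass rate) first, so
# later predicates see as few rows as possible.

class AdaptiveFilter:
    def __init__(self, predicates):
        self.predicates = predicates
        self.stats = [[0, 0] for _ in predicates]  # [passed, seen] per predicate

    def order(self):
        """Indices of predicates, most selective first."""
        def pass_rate(i):
            passed, seen = self.stats[i]
            return passed / seen if seen else 1.0
        return sorted(range(len(self.predicates)), key=pass_rate)

    def apply(self, rows):
        out = []
        for row in rows:
            keep = True
            for i in self.order():
                self.stats[i][1] += 1
                if self.predicates[i](row):
                    self.stats[i][0] += 1
                else:
                    keep = False
                    break  # later predicates never see this row
            if keep:
                out.append(row)
        return out

f = AdaptiveFilter([lambda r: r % 2 == 0, lambda r: r < 3])
out = f.apply(list(range(100)))
assert out == [0, 2]
assert f.order()[0] == 1  # "r < 3" proved far more selective, so it runs first
```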
Adaptive Hash Joins
Hash joins adapt based on data size:
- Small data: in-memory hash table
- Medium data: partitioned hash table
- Large data: disk-based partitioning with spilling to disk
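The strategy choice amounts to comparing the build side against available memory, as in this sketch (the thresholds are illustrative, not DuckDB's actual values):

```python
# Size-based strategy selection for the hash join build side (illustrative).

def choose_join_strategy(build_bytes, memory_limit):
    if build_bytes <= 0.25 * memory_limit:
        return "in-memory hash table"
    if build_bytes <= memory_limit:
        return "partitioned hash table"
    return "disk-based partitioning (spill to disk)"

limit = 1 << 30  # assume a 1 GiB memory limit
assert choose_join_strategy(10 << 20, limit) == "in-memory hash table"
assert choose_join_strategy(512 << 20, limit) == "partitioned hash table"
assert choose_join_strategy(4 << 30, limit).startswith("disk-based")
```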
Performance Monitoring
EXPLAIN ANALYZE
See actual execution statistics:
- Actual row counts vs estimates
- Execution time per operator
- Memory usage
- Parallelism degree
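Prefix any query with EXPLAIN ANALYZE to collect these statistics (the table name here is hypothetical):

```sql
EXPLAIN ANALYZE SELECT count(*) FROM users WHERE age > 18;
```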
Profiling
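DuckDB also ships a built-in profiler that can be enabled per connection:

```sql
-- Enable profiling; output formats include 'query_tree' and 'json'
PRAGMA enable_profiling = 'json';
-- Optionally write profiles to a file instead of stdout
PRAGMA profiling_output = 'profile.json';
```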
Execution Best Practices
Execution Summary
Vectorized
Process 2,048 rows at a time for CPU efficiency
Push-Based
Data flows through pipelines without materialization
Parallel
Automatic parallelization across CPU cores
Adaptive
Runtime adaptation based on actual data characteristics
Next Steps
- Understand the Storage System that feeds execution
- Learn about Data Types and their processing
- Review the complete Architecture