DuckDB is structured as a modular analytical database system. This guide provides an overview of the codebase architecture to help you navigate and contribute to the project.
Source Code Organization
The DuckDB source tree is organized under the src/ directory. The major subsystems live in parser/, planner/, optimizer/, execution/, catalog/, storage/, transaction/, and function/, each covered below.
Query Processing Pipeline
Queries in DuckDB flow through a multi-stage pipeline: parse, plan, optimize, execute.

1. Parser
Location: src/parser/
The parser is the entry point for SQL queries.
Key Components
- libpg_query integration: DuckDB uses PostgreSQL’s parser (libpg_query)
- Custom AST: Transforms the PostgreSQL parse tree into DuckDB’s internal representation
- Output: Tree of SQLStatement, ParsedExpression, and TableRef nodes
Main Classes
- SQLStatement - Base class for all statements (SELECT, INSERT, etc.)
- ParsedExpression - Represents expressions (columns, operators, functions)
- TableRef - Represents table references in FROM clauses
Example
The parser turns SELECT name, age FROM users WHERE age > 18 into a SelectStatement containing:
- Select list: ColumnRef(name), ColumnRef(age)
- From: BaseTableRef(users)
- Where: ComparisonExpression(age > 18)
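The parse tree above can be sketched in Python. The class names mirror DuckDB's C++ types, but the fields and layout are simplified assumptions for illustration, not the real implementation:

```python
# Illustrative Python analogue of DuckDB's parse-tree classes.
# Field names are simplified assumptions, not the real C++ layout.
from dataclasses import dataclass

@dataclass
class ParsedExpression:
    """Base class for parsed expressions."""

@dataclass
class ColumnRef(ParsedExpression):
    name: str

@dataclass
class ComparisonExpression(ParsedExpression):
    op: str
    left: ParsedExpression
    right: object

@dataclass
class TableRef:
    """Base class for FROM-clause references."""

@dataclass
class BaseTableRef(TableRef):
    table: str

@dataclass
class SelectStatement:
    select_list: list
    from_table: TableRef
    where: object = None

# Parse tree for: SELECT name, age FROM users WHERE age > 18
stmt = SelectStatement(
    select_list=[ColumnRef("name"), ColumnRef("age")],
    from_table=BaseTableRef("users"),
    where=ComparisonExpression(">", ColumnRef("age"), 18),
)
print(stmt.from_table.table)  # users
```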
2. Planner
Location: src/planner/
Converts parsed AST into a logical query plan.
Responsibilities
- Binding: Resolve table/column names using the catalog
- Type resolution: Determine result types for expressions
- Logical plan creation: Build a tree of LogicalOperator nodes
Key Components
- Binder - Resolves symbols and creates bound nodes
- LogicalOperator - Base class for logical plan nodes
- Operators:
  - LogicalGet - Table scan
  - LogicalFilter - WHERE clause
  - LogicalProjection - SELECT list
  - LogicalJoin - JOIN operations
  - LogicalAggregate - GROUP BY aggregations
Example Plan
For the query above, the planner produces:

LogicalProjection(name, age)
  LogicalFilter(age > 18)
    LogicalGet(users)

3. Optimizer
Location: src/optimizer/
Transforms logical plans into more efficient equivalent plans.
Optimization Techniques
Rule-based optimizations:
- Predicate pushdown: Move filters closer to data sources
- Projection pushdown: Only read required columns
- Expression rewriting: Simplify expressions
- Common subexpression elimination: Reuse computed values
- Join ordering: Find optimal join sequence
- Join type selection: Choose hash join vs. nested loop join
- Index selection: Use available indexes
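Predicate pushdown, the first rule above, can be sketched as a small tree rewrite. The plan nodes below are toy stand-ins for DuckDB's LogicalOperator classes, not the real implementation:

```python
# Toy predicate pushdown: a Filter directly above a Get is folded
# into the scan, so the filter is applied while reading the table.
from dataclasses import dataclass

@dataclass
class LogicalGet:
    table: str
    pushed_filters: tuple = ()

@dataclass
class LogicalFilter:
    predicate: str
    child: object

@dataclass
class LogicalProjection:
    columns: tuple
    child: object

def pushdown(plan):
    """Recursively push filters that sit directly above a scan into it."""
    if isinstance(plan, LogicalProjection):
        return LogicalProjection(plan.columns, pushdown(plan.child))
    if isinstance(plan, LogicalFilter) and isinstance(plan.child, LogicalGet):
        get = plan.child
        return LogicalGet(get.table, get.pushed_filters + (plan.predicate,))
    return plan

before = LogicalProjection(("name", "age"),
                           LogicalFilter("age > 18", LogicalGet("users")))
after = pushdown(before)
print(after)  # Projection over a Get that applies "age > 18" during the scan
```

A real optimizer applies many such rules repeatedly until the plan stops changing.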
Key Classes
- Optimizer - Main optimization driver
- Rule - Base class for optimization rules
- OptimizerExtension - Extension point for custom optimizations
Example Transformation
Before optimization, the filter sits above the scan:

LogicalProjection(name, age)
  LogicalFilter(age > 18)
    LogicalGet(users)

After predicate and projection pushdown, the scan itself applies the filter and reads only the needed columns:

LogicalProjection(name, age)
  LogicalGet(users, filter: age > 18, columns: [name, age])

4. Execution
Location: src/execution/
Executes the optimized query plan and produces results.
Architecture
- Physical plan: Converts LogicalOperator to PhysicalOperator
- Push-based execution: Operators push data chunks to their parents
- Vectorized processing: Operates on batches of rows (default: 2048)
- Parallel execution: Automatic parallelization of operators
Key Components
- PhysicalOperator - Base class for execution operators
- DataChunk - Vectorized batch of rows (columnar format)
- ExecutionContext - Runtime state and resources
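Chunk-at-a-time processing in the spirit of DataChunk can be sketched in pure Python. The column layout and helper names here are illustrative assumptions, not DuckDB's API:

```python
# Sketch of vectorized execution: data is split into columnar batches
# (up to 2048 values per column, like a DataChunk), and an operator
# processes a whole batch in one pass instead of row by row.
VECTOR_SIZE = 2048

def chunks(columns, size=VECTOR_SIZE):
    """Yield columnar batches of at most `size` rows."""
    n = len(next(iter(columns.values())))
    for start in range(0, n, size):
        yield {name: col[start:start + size] for name, col in columns.items()}

def filter_chunk(chunk, column, predicate):
    """Build a selection vector, then apply it to every column."""
    sel = [i for i, v in enumerate(chunk[column]) if predicate(v)]
    return {name: [col[i] for i in sel] for name, col in chunk.items()}

data = {"name": ["ada", "bob", "cyd"], "age": [22, 17, 35]}
for chunk in chunks(data):
    print(filter_chunk(chunk, "age", lambda v: v > 18))
# {'name': ['ada', 'cyd'], 'age': [22, 35]}
```

Operating on batches amortizes interpretation overhead and keeps the inner loops cache- and SIMD-friendly, which is the point of the real vectorized engine.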
Physical Operators
| Logical | Physical | Description |
|---|---|---|
| LogicalGet | PhysicalTableScan | Sequential table scan |
| LogicalFilter | PhysicalFilter | Apply filter predicates |
| LogicalJoin | PhysicalHashJoin | Hash-based join |
| LogicalAggregate | PhysicalHashAggregate | Hash-based aggregation |
| LogicalProjection | PhysicalProjection | Compute expressions |
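The build/probe strategy behind PhysicalHashJoin can be sketched in a few lines of Python. This is a toy version for intuition; the real operator is vectorized and parallel:

```python
# Toy hash join: build a hash table on one side's join key,
# then probe it with the other side and emit matching pairs.
def hash_join(left, right, key):
    # Build phase: hash the left side on the join key.
    table = {}
    for row in left:
        table.setdefault(row[key], []).append(row)
    # Probe phase: stream the right side and emit matches.
    out = []
    for row in right:
        for match in table.get(row[key], []):
            out.append({**match, **row})
    return out

users  = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]
orders = [{"id": 1, "total": 30}, {"id": 1, "total": 12}]
print(hash_join(users, orders, "id"))
# [{'id': 1, 'name': 'ada', 'total': 30}, {'id': 1, 'name': 'ada', 'total': 12}]
```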
Vectorized Execution
DuckDB processes data in columnar batches: each DataChunk holds up to 2048 values per column, and every operator consumes and produces whole chunks rather than single rows.

Core Components
Catalog
Location: src/catalog/
Manages database metadata and schema information.
Structure
The catalog is hierarchical: the top-level Catalog contains schemas, and each schema holds entries for tables, views, and functions.
Key Classes
- Catalog - Top-level catalog container
- SchemaCatalogEntry - Schema (namespace) entry
- TableCatalogEntry - Table metadata
- StandardEntry - Base class for catalog entries
Usage
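As an illustration of the hierarchy, here is a toy in-memory catalog in Python. The method names are hypothetical, not DuckDB's API; the real catalog is C++ and transactional:

```python
# Toy catalog mirroring the Catalog -> schema -> table hierarchy.
class Catalog:
    def __init__(self):
        self.schemas = {}          # schema name -> {table name -> entry}

    def create_schema(self, name):
        self.schemas[name] = {}

    def create_table(self, schema, table, columns):
        self.schemas[schema][table] = {"columns": columns}

    def get_table(self, schema, table):
        # Binding resolves names through lookups like this one.
        return self.schemas[schema][table]

cat = Catalog()
cat.create_schema("main")
cat.create_table("main", "users", ["name", "age"])
print(cat.get_table("main", "users")["columns"])  # ['name', 'age']
```

During planning, the Binder performs lookups of exactly this shape to resolve table and column names.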
Storage
Location: src/storage/
Manages physical data storage, both in-memory and on-disk.
Components
Buffer Manager:
- Manages the in-memory buffer pool
- Handles eviction and loading of blocks
- Coordinates with disk I/O

Table storage:
- Row groups: Data organized in horizontal partitions
- Column segments: Columnar storage within row groups
- Compression: Automatic compression of column data

Concurrency:
- MVCC (Multi-Version Concurrency Control)
- Optimistic concurrency control
- ACID guarantees
Storage Format
On disk, a DuckDB database is a single file: a header followed by fixed-size blocks that hold the row groups and column segments described above.
Transaction Management
Location: src/transaction/
Provides ACID transaction support.
Features
- Isolation: Snapshot isolation using MVCC
- Atomicity: All-or-nothing commit/rollback
- Durability: Write-Ahead Logging (WAL)
- Consistency: Constraint enforcement
Key Classes
- Transaction - Represents an active transaction
- TransactionManager - Coordinates the transaction lifecycle
- WriteAheadLog - Ensures durability
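Snapshot isolation via MVCC can be sketched with versioned values: every write is tagged with the transaction that made it, and a reader sees only versions that existed at its snapshot. This is a deliberately simplified Python illustration, not DuckDB's implementation:

```python
# Sketch of snapshot isolation: each key stores a list of
# (txn_id, value) versions; a reader with snapshot S sees the
# newest version whose txn_id is <= S.
class VersionedStore:
    def __init__(self):
        self.versions = {}   # key -> list of (txn_id, value)
        self.next_txn = 1

    def begin(self):
        """Start a transaction; its id doubles as its snapshot."""
        txn = self.next_txn
        self.next_txn += 1
        return txn

    def write(self, txn, key, value):
        self.versions.setdefault(key, []).append((txn, value))

    def read(self, snapshot_txn, key):
        # Newest version created at or before the snapshot wins.
        visible = [v for t, v in self.versions.get(key, []) if t <= snapshot_txn]
        return visible[-1] if visible else None

store = VersionedStore()
t1 = store.begin()
store.write(t1, "balance", 100)
t2 = store.begin()                 # snapshot taken here
t3 = store.begin()
store.write(t3, "balance", 50)     # later write, invisible to t2
print(store.read(t2, "balance"))   # 100
```

The real system also tracks commit status and garbage-collects versions no active transaction can see.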
Functions
Location: src/function/
Built-in functions (scalar, aggregate, table).
Function Types
Scalar functions compute one output value per input row (e.g. UPPER, ABS). Aggregate functions fold many rows into a single value (e.g. SUM, AVG), and table functions produce entire result sets (e.g. range).

Development Workflow
Making Changes
Identify the component
Determine which part of the codebase to modify:
- SQL syntax changes → parser/
- New operator → planner/ + execution/
- Optimization → optimizer/
- Storage format → storage/
- Built-in function → function/
Implement changes
Follow DuckDB coding standards:
- Use tabs for indentation
- CamelCase for types and functions
- snake_case for variables
- Const correctness
- Smart pointers over raw pointers
Code Navigation Tips
Finding implementations:
- Set a breakpoint in the parser for your query type
- Follow through planner binding
- Watch optimizer transformations
- Step into physical operator execution
Extension System
DuckDB supports loadable extensions.

Extension Structure
Extension Entry Point
Debugging Tips
Enable Debug Logging
Query Profiling
Explain Plans
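A quick way to inspect optimizer and execution behavior from SQL is with EXPLAIN and the profiling pragma. These statements are standard DuckDB; the users table is a hypothetical example:

```sql
-- Show the optimized physical plan without running the query:
EXPLAIN SELECT name, age FROM users WHERE age > 18;

-- Run the query and show the plan annotated with actual timings:
EXPLAIN ANALYZE SELECT name, age FROM users WHERE age > 18;

-- Enable profiling output for subsequent queries in this session:
PRAGMA enable_profiling;
```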
Resources
- Building - Build DuckDB from source
- Testing - Write and run tests
- Contributing - Contribution guidelines
- Source README - Official source documentation