DuckDB uses a columnar storage format optimized for analytical workloads, combined with a single-file architecture for simplicity and portability. The storage system provides full ACID guarantees through Multi-Version Concurrency Control (MVCC) and Write-Ahead Logging (WAL).
Single-File Database Design
Location: src/storage/
DuckDB stores an entire database in a single file (conventionally with a .duckdb extension) or fully in memory. The single-file design provides several benefits:
- Portability: Copy a single file to move the entire database
- Simplicity: No complex directory structures or multiple files to manage
- Versioning: Easy to version, back up, and restore
- Zero Config: No server setup or configuration required
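A minimal sketch of this workflow using the DuckDB Python client (file names are illustrative): because the whole database lives in one file, copying that file is a complete backup.

```python
import duckdb
import shutil

# Create (or open) a single-file database; the file is created on first use.
con = duckdb.connect("sales.duckdb")
con.execute("CREATE TABLE IF NOT EXISTS orders(id INTEGER, amount DECIMAL(10, 2))")
con.execute("INSERT INTO orders VALUES (1, 19.99), (2, 5.00)")
con.close()  # closing flushes outstanding changes to sales.duckdb

# Portability: copying the one file copies the whole database.
shutil.copy("sales.duckdb", "sales_backup.duckdb")

# An in-memory database needs no file at all.
mem = duckdb.connect()  # equivalent to duckdb.connect(":memory:")
print(mem.execute("SELECT 42").fetchall())
```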
File Structure
The database file contains:
- File Header: Magic bytes “DUCK”, version info, configuration
- Block Manager: Manages storage blocks on disk
- Metadata: Table schemas, statistics, indexes
- Data Blocks: Actual table data in columnar format
- Write-Ahead Log (WAL): Transaction log for durability
See src/include/duckdb/storage/storage_info.hpp:83-98.
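As a rough illustration, the magic bytes can be spotted from Python. This is a hypothetical helper, not part of any DuckDB API; the precise header layout (checksum prefix, version fields, and so on) is defined in storage_info.hpp and may change between storage versions.

```python
def looks_like_duckdb_file(path: str) -> bool:
    """Peek at the start of a file and look for the 'DUCK' magic bytes."""
    with open(path, "rb") as f:
        header = f.read(32)      # more than enough to cover the magic
    return b"DUCK" in header     # the magic follows a small checksum prefix

print(looks_like_duckdb_file("sales.duckdb"))
```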
Columnar Storage Format
Unlike row-oriented databases (such as SQLite) that store entire rows together, DuckDB stores data column by column. This layout is optimal for analytical queries, which typically access only a subset of columns.
Row-Oriented vs Columnar
In a row-oriented (OLTP) database, all fields of a row are stored contiguously, so reading one column still drags in every other column of each row. In a columnar layout, the values of each column are stored contiguously, so a scan touches only the columns it needs.
Benefits of Columnar Storage
Better Compression
Similar values are stored together, leading to higher compression ratios. For example, a column of country names will have many repeated values.
Reduced I/O
Only read the columns you need. If a query only accesses 3 out of 50 columns, you only read ~6% of the data.
Vectorized Processing
Columnar data is ideal for SIMD operations and cache-friendly batch processing.
Better Statistics
Column-level min/max values enable efficient query pruning and optimization.
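The reduced-I/O point is easy to see in practice. A sketch using the DuckDB Python client (table and column names are made up): only the columns referenced by the query are read from storage.

```python
import duckdb

con = duckdb.connect()  # in-memory example

# A "wide" table: analytical queries usually touch only a few of its columns.
con.execute("""
    CREATE TABLE events AS
    SELECT
        range     AS id,
        range % 5 AS country_id,
        random()  AS col_a,
        random()  AS col_b,
        random()  AS col_c
    FROM range(1000000)
""")

# Only 'country_id' and 'col_a' are scanned; 'id', 'col_b', and 'col_c'
# are never read, which is where the columnar layout saves I/O.
print(con.execute("""
    SELECT country_id, avg(col_a)
    FROM events
    GROUP BY country_id
""").fetchall())
```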
Row Groups
Data is organized into row groups: horizontal partitions of the table containing a fixed number of rows (122,880 rows by default in current versions). See src/include/duckdb/storage/storage_info.hpp:22-54.
Row Group Structure
Each row group contains:
- Column segments: Compressed column data
- Version information: For MVCC transaction visibility
- Statistics: Min/max values, null counts, distinct counts
- Metadata pointers: Links to compression info and data blocks
See src/storage/table/row_group.cpp:32-36.
Why Row Groups?
- Parallelism: Different threads can scan different row groups concurrently
- Pruning: Skip entire row groups based on statistics (zone maps)
- Compression: Each row group can use optimal compression for its data
- Memory Management: Load/unload row groups as needed
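Row groups can be inspected from SQL. A short sketch using DuckDB's pragma_storage_info table function from Python (the exact set of columns it returns may vary slightly between DuckDB versions):

```python
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE t AS SELECT range AS i, range % 100 AS grp FROM range(500000)")

# pragma_storage_info exposes per-segment metadata, including which row group
# each segment belongs to, its row count, and the compression used.
rows = con.execute("""
    SELECT row_group_id, column_name, count, compression
    FROM pragma_storage_info('t')
    ORDER BY row_group_id, column_name
""").fetchall()

for r in rows[:8]:
    print(r)
```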
Block Management
Storage is divided into fixed-size blocks that are the unit of I/O and caching. See src/include/duckdb/storage/storage_info.hpp:29-64:
- Default allocation: 256 KB
- Configurable range: 16 KB to 256 KB
- Header overhead: 8 bytes
Buffer Manager
Location: src/storage/buffer_manager.cpp
The buffer manager handles:
- Memory allocation and caching: Keeps frequently accessed blocks in memory
- Eviction policies: LRU-based eviction when memory is full
- Pinning: Keep critical blocks in memory
- Dirty tracking: Track modified blocks for write-back
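The buffer manager's memory budget is visible from SQL. A small sketch, assuming the memory_limit setting and the database_size pragma as found in current DuckDB releases (reported column names may differ slightly between versions):

```python
import duckdb

con = duckdb.connect("buffered.duckdb")

# Cap how much memory the buffer manager may use for cached blocks; once the
# limit is reached, least-recently-used blocks are evicted.
con.execute("SET memory_limit = '512MB'")

# PRAGMA database_size reports block-level storage statistics as well as
# current memory usage and the configured limit.
print(con.execute("PRAGMA database_size").fetchall())
```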
Compression
Location: src/storage/compression/
DuckDB applies per-column compression automatically, choosing the best algorithm based on data characteristics.
Compression Algorithms
| Algorithm | Best For | Description |
|---|---|---|
| Uncompressed | Random data | No compression overhead |
| Constant | All same values | Stores single value |
| RLE (Run-Length Encoding) | Repeated values | [A,A,A,B,B,C] → [A×3,B×2,C×1] |
| Dictionary | Low cardinality | String → Integer mapping |
| BitPacking | Small integers | Pack values in fewer bits |
| Frame of Reference | Clustered numbers | Store offset from base |
| ALP (Adaptive Lossless) | Floating-point | Specialized for doubles |
| FSST | Strings | Symbol table compression |
| Chimp/Patas | Time series | Specialized for floating-point time-series data |
Compression is transparent to queries - data is automatically decompressed during scans.
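A sketch of how to observe this from Python: a low-cardinality string column is a natural candidate for dictionary or FSST compression, and the algorithm actually chosen can be read back from pragma_storage_info after a checkpoint (the exact choice depends on the data and the DuckDB version).

```python
import duckdb

con = duckdb.connect("compressed.duckdb")

# A low-cardinality string column. Compression is chosen per column segment
# when the data is written to disk.
con.execute("""
    CREATE OR REPLACE TABLE visits AS
    SELECT CASE range % 4
               WHEN 0 THEN 'DE'
               WHEN 1 THEN 'FR'
               WHEN 2 THEN 'NL'
               ELSE 'US'
           END AS country
    FROM range(1000000)
""")
con.execute("CHECKPOINT")  # flush to disk so compression is applied

# Inspect which algorithm the checkpointer picked for each segment.
print(con.execute("""
    SELECT DISTINCT column_name, compression
    FROM pragma_storage_info('visits')
""").fetchall())
```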
ACID Properties
DuckDB provides full ACID (Atomicity, Consistency, Isolation, Durability) transaction support.
Atomicity
Transactions are all-or-nothing: either every change in a transaction is applied, or none of them are.
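A minimal sketch of atomicity from the DuckDB Python client (table and values are illustrative): if anything fails before COMMIT, the ROLLBACK undoes both updates.

```python
import duckdb

con = duckdb.connect("acid.duckdb")
con.execute("CREATE OR REPLACE TABLE accounts(id INTEGER, balance INTEGER)")
con.execute("INSERT INTO accounts VALUES (1, 100), (2, 100)")

# Move money between accounts: both updates commit together or not at all.
con.execute("BEGIN TRANSACTION")
try:
    con.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    con.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
    con.execute("COMMIT")
except Exception:
    con.execute("ROLLBACK")  # neither update is visible after a rollback

print(con.execute("SELECT * FROM accounts ORDER BY id").fetchall())
```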
Consistency
Database constraints (such as PRIMARY KEY, UNIQUE, NOT NULL, and CHECK) are enforced, so a transaction that would violate them is rejected.
Isolation
DuckDB uses Multi-Version Concurrency Control (MVCC) to provide snapshot isolation.
How MVCC Works
- Each transaction gets a unique timestamp
- Multiple versions of rows can exist simultaneously
- Readers see a consistent snapshot (no locks needed)
- Writers create new versions without blocking readers
See src/transaction/duck_transaction_manager.cpp:36-78.
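Snapshot isolation can be observed from the user's side with two connections. A sketch assuming the Python client's connection.cursor(), which in current releases behaves as a second connection to the same database with its own transaction state:

```python
import duckdb

db = duckdb.connect("mvcc.duckdb")
db.execute("CREATE OR REPLACE TABLE items(id INTEGER)")
db.execute("INSERT INTO items VALUES (1)")

reader = db.cursor()  # second connection sharing the same database
writer = db.cursor()

# The reader starts a transaction and therefore pins a snapshot.
reader.execute("BEGIN TRANSACTION")
print(reader.execute("SELECT count(*) FROM items").fetchone())  # (1,)

# The writer commits a new row without blocking the reader; the reader's
# snapshot does not include it.
writer.execute("INSERT INTO items VALUES (2)")
print(reader.execute("SELECT count(*) FROM items").fetchone())  # still (1,)

reader.execute("COMMIT")
print(reader.execute("SELECT count(*) FROM items").fetchone())  # now (2,)
```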
Durability
Write-Ahead Log (WAL)
Location: src/storage/write_ahead_log.cpp
The WAL ensures durability by logging all changes before they’re applied to the database file.
WAL Process:
- Transaction modifies data → changes written to WAL first
- WAL entry flushed to disk
- Transaction commits
- Changes asynchronously applied to main database file
- Periodically, checkpoint flushes all changes and truncates WAL
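A small durability sketch from Python, assuming the on-disk WAL file is named after the database file (database.duckdb.wal) and that PRAGMA database_size reports the WAL size as in current releases:

```python
import duckdb

con = duckdb.connect("wal_demo.duckdb")
con.execute("CREATE TABLE IF NOT EXISTS messages(msg VARCHAR)")

# Each committed change is logged in the WAL (wal_demo.duckdb.wal) before it
# counts as durable; the main file is updated lazily at checkpoint time.
con.execute("INSERT INTO messages VALUES ('hello')")

# database_size reports, among other things, the current WAL size.
print(con.execute("PRAGMA database_size").fetchall())

con.close()

# Reopening restores the checkpointed state plus the WAL: the row is still there.
con = duckdb.connect("wal_demo.duckdb")
print(con.execute("SELECT * FROM messages").fetchall())
```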
See src/storage/write_ahead_log.cpp:27-36.
Crash Recovery
If the database was not shut down cleanly, recovery on the next startup works as follows:
- Database opens and detects incomplete transactions in WAL
- Replays committed transactions from WAL
- Rolls back uncommitted transactions
- Database returns to consistent state
Checkpointing
Location: src/storage/checkpoint_manager.cpp
Checkpointing is the process of:
- Flushing all in-memory changes to the database file
- Writing updated metadata and statistics
- Truncating the WAL
Checkpoints happen:
- Automatically when the WAL grows too large
- On clean database shutdown
- When explicitly requested via CHECKPOINT
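A short sketch of triggering and tuning checkpoints from Python, assuming the CHECKPOINT statement and the checkpoint_threshold setting as found in current DuckDB releases:

```python
import duckdb

con = duckdb.connect("checkpoint_demo.duckdb")
con.execute("CREATE TABLE IF NOT EXISTS t(i INTEGER)")
con.execute("INSERT INTO t SELECT * FROM range(100000)")

# Explicit checkpoint: flush outstanding changes into the main database file
# and truncate the WAL.
con.execute("CHECKPOINT")

# Lower the automatic checkpoint threshold so the WAL is flushed more often
# (useful for write-heavy workloads, as noted under Best Practices below).
con.execute("SET checkpoint_threshold = '64MB'")
```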
Storage Internals Summary
Performance Characteristics
- Sequential Scans: Excellent - columnar format and compression minimize I/O
- Aggregations: Excellent - only the needed columns are read, with vectorized processing
- Point Lookups: Good - supported via indexes, but not the primary use case
- Bulk Inserts: Excellent - batch-oriented, optimized for analytics
Best Practices
Choose Appropriate Types
Use the smallest type that fits your data: TINYINT (1 byte) vs BIGINT (8 bytes) affects both storage size and performance.
Partition Large Tables
For very large datasets, consider partitioning by date or other keys to improve pruning.
Checkpoint Regularly
For write-heavy workloads, periodic checkpoints prevent WAL from growing too large.
Configuration Options
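A sketch of a few storage-related settings from Python. The values are illustrative, and the settings named here (memory_limit, threads, checkpoint_threshold) are those available in current DuckDB releases; see the official configuration reference for the full list.

```python
import duckdb

con = duckdb.connect("configured.duckdb")

con.execute("SET memory_limit = '2GB'")            # buffer manager memory cap
con.execute("SET threads = 4")                     # worker threads for parallel scans
con.execute("SET checkpoint_threshold = '256MB'")  # WAL size that triggers a checkpoint

# Current values can be inspected through duckdb_settings().
print(con.execute("""
    SELECT name, value
    FROM duckdb_settings()
    WHERE name IN ('memory_limit', 'threads', 'checkpoint_threshold')
""").fetchall())
```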
Next Steps
- Learn about Query Execution and how queries process this data
- Explore Data Types and their storage characteristics
- Read the Architecture Overview