DuckDB provides two main mechanisms for accelerating data access: ART (Adaptive Radix Tree) indexes for point and range lookups, and zone maps for pruning data during scans.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/duckdb/duckdb/llms.txt
Use this file to discover all available pages before exploring further.
ART Indexes
Overview
DuckDB uses ART (Adaptive Radix Tree) indexes, an efficient in-memory index structure that adapts to data distribution. ART indexes are implemented insrc/execution/index/art/.
From src/include/duckdb/execution/index/art/art.hpp:43:
- Point lookups (
=) - Range scans (
<,>,BETWEEN) - Prefix matching for strings
- NULL handling
- Multiple column indexes
Creating ART Indexes
Create a unique index:Supported Data Types
Fromsrc/execution/index/art/art.cpp:56, ART indexes support:
BOOLINT8,INT16,INT32,INT64,INT128UINT8,UINT16,UINT32,UINT64,UINT128FLOAT,DOUBLEVARCHAR
When to Use ART Indexes
ART indexes are beneficial for:1. Primary Key Constraints
2. Foreign Key Lookups
3. Range Queries
4. Sorting
Indexes can accelerate ORDER BY:When NOT to Use ART Indexes
Avoid indexes when:- High cardinality relative to table size - Scanning may be faster
- Frequent bulk updates - Index maintenance overhead
- Analytical queries scanning most rows - Zone maps are better (see below)
- Low selectivity filters - Returning >10-15% of rows usually favors scans
Index Internals
ART is a radix tree with adaptive node sizes:- Node4 - Up to 4 children (smallest node)
- Node16 - Up to 16 children
- Node48 - Up to 48 children
- Node256 - Up to 256 children (largest node)
src/execution/index/art/art.cpp:84:
Managing Indexes
Drop an index:Zone Maps
Overview
Zone maps (also called min-max indexes) store minimum and maximum values for each column segment. DuckDB uses zone maps automatically for all columns in persistent storage. Zone maps enable data skipping - entire row groups can be skipped if their min/max ranges don’t overlap with query predicates.How Zone Maps Work
DuckDB stores data in row groups (default ~122K rows). Each row group maintains statistics per column:- Minimum value
- Maximum value
- NULL count
- Distinct count (approximate)
src/storage/table/row_group.cpp:515:
Zone Map Pruning Example
Consider a table with timestamps:- Check each row group’s min/max
event_date - Skip row groups where
max < '2024-03-15'ormin > '2024-03-15' - Only scan row groups where the range overlaps
Zone Map Benefits
Automatic - No manual index creation required Zero overhead - Statistics collected during data ingestion Effective for:- Range filters on sorted/clustered columns
- Equality predicates
- MIN/MAX aggregates
- IN clauses with small sets
Optimizing for Zone Maps
1. Sort Data by Filter Columns
Sorting data maximizes zone map effectiveness:2. Use Appropriate Data Types
Narrower types create tighter bounds:3. Partition-Like Queries
Structure data to match query patterns:Zone Map Examples
Frombenchmark/micro/zonemaps/zonemaps.benchmark:9:
d and e have perfect zone map pruning (all row groups have same min=max).
Column a has excellent pruning for range queries since it’s sorted.
Column c has moderate pruning due to cycling values.
Viewing Zone Map Effectiveness
UseEXPLAIN ANALYZE to see pruning:
- Row groups scanned vs. total row groups
- “Filters” section showing pushed-down predicates
- Actual rows processed
Index vs Zone Map Decision Guide
| Scenario | Recommendation |
|---|---|
| Primary key lookups | ART index |
| Foreign key joins | ART index |
| Point queries (=) on high-cardinality columns | ART index |
| Range scans on sorted analytical data | Zone maps (no index needed) |
| Analytical aggregations | Zone maps (no index needed) |
| OLTP-style random access | ART index |
| Full table scans with filters | Zone maps (no index needed) |
| ORDER BY + LIMIT on small results | ART index |
Best Practices
- Start without indexes - DuckDB’s zone maps often suffice for analytical queries
-
Profile before indexing - Use
EXPLAIN ANALYZEto identify slow queries - Index for constraints - Always use indexes for PRIMARY KEY and UNIQUE constraints
- Sort for analytics - Cluster data by common filter columns for zone map effectiveness
-
Multi-column indexes - Put most selective columns first:
-
Monitor index size - Check index memory usage:
-
Rebuild indexes after bulk loads - Drop and recreate for better structure:
See Also
- Query Optimization - Understanding the query optimizer
- Performance Benchmarks - Measuring index performance