Documentation Index Fetch the complete documentation index at: https://mintlify.com/duckdb/duckdb/llms.txt
Use this file to discover all available pages before exploring further.
Quick Start
This guide will get you up and running with DuckDB quickly, covering both CLI usage and embedded C programming.
Using the CLI
The fastest way to start using DuckDB is through the command-line interface.
Start DuckDB
Launch the DuckDB CLI: This opens an in-memory database. To use a persistent database:
Create a table and insert data
CREATE TABLE users (id INTEGER , name VARCHAR , email VARCHAR );
INSERT INTO users VALUES
( 1 , 'Alice' , 'alice@example.com' ),
( 2 , 'Bob' , 'bob@example.com' ),
( 3 , 'Charlie' , 'charlie@example.com' );
Query your data
SELECT * FROM users WHERE id > 1 ;
Output: ┌───────┬─────────┬──────────────────────┐
│ id │ name │ email │
│ int32 │ varchar │ varchar │
├───────┼─────────┼──────────────────────┤
│ 2 │ Bob │ bob@example.com │
│ 3 │ Charlie │ charlie@example.com │
└───────┴─────────┴──────────────────────┘
Querying Files Directly
One of DuckDB’s most powerful features is the ability to query data files directly without importing:
SELECT * FROM 'data.csv' WHERE amount > 100 ;
DuckDB automatically detects file formats and handles schema inference, making data exploration incredibly fast.
Embedded C Example
For applications that need embedded analytics, DuckDB provides a simple C API. Here’s a complete working example:
#include "duckdb.h"
#include <stdio.h>
int main () {
duckdb_database db = NULL ;
duckdb_connection con = NULL ;
duckdb_result result;
if ( duckdb_open ( NULL , & db) == DuckDBError) {
fprintf (stderr, "Failed to open database \n " );
goto cleanup;
}
if ( duckdb_connect (db, & con) == DuckDBError) {
fprintf (stderr, "Failed to open connection \n " );
goto cleanup;
}
if ( duckdb_query (con, "CREATE TABLE integers(i INTEGER, j INTEGER);" , NULL ) == DuckDBError) {
fprintf (stderr, "Failed to query database \n " );
goto cleanup;
}
if ( duckdb_query (con, "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);" , NULL ) == DuckDBError) {
fprintf (stderr, "Failed to query database \n " );
goto cleanup;
}
if ( duckdb_query (con, "SELECT * FROM integers" , & result) == DuckDBError) {
fprintf (stderr, "Failed to query database \n " );
goto cleanup;
}
// print the names of the result
idx_t row_count = duckdb_row_count ( & result);
idx_t column_count = duckdb_column_count ( & result);
for ( size_t i = 0 ; i < column_count; i ++ ) {
printf ( " %s " , duckdb_column_name ( & result, i));
}
printf ( " \n " );
// print the data of the result
for ( size_t row_idx = 0 ; row_idx < row_count; row_idx ++ ) {
for ( size_t col_idx = 0 ; col_idx < column_count; col_idx ++ ) {
char * val = duckdb_value_varchar ( & result, col_idx, row_idx);
printf ( " %s " , val);
duckdb_free (val);
}
printf ( " \n " );
}
cleanup:
duckdb_destroy_result ( & result);
duckdb_disconnect ( & con);
duckdb_close ( & db);
}
Understanding the C API
Initialize database and connection
duckdb_database db = NULL ;
duckdb_connection con = NULL ;
// Open database (NULL = in-memory)
duckdb_open ( NULL , & db );
// Create connection to database
duckdb_connect (db, & con );
Pass a filename to duckdb_open() for persistent storage: duckdb_open ( "mydata.db" , & db );
Execute queries
// Queries that don't return results (DDL, DML)
duckdb_query (con, "CREATE TABLE integers(i INTEGER, j INTEGER);" , NULL );
duckdb_query (con, "INSERT INTO integers VALUES (3, 4), (5, 6);" , NULL );
// Queries that return results
duckdb_result result;
duckdb_query (con, "SELECT * FROM integers" , & result );
Process query results
// Get result metadata
idx_t row_count = duckdb_row_count ( & result );
idx_t column_count = duckdb_column_count ( & result );
const char * col_name = duckdb_column_name ( & result , 0 );
// Access values
for ( size_t row = 0 ; row < row_count; row ++ ) {
for ( size_t col = 0 ; col < column_count; col ++ ) {
char * val = duckdb_value_varchar ( & result, col, row);
printf ( " %s " , val);
duckdb_free (val); // Important: free allocated strings
}
}
Clean up resources
duckdb_destroy_result ( & result );
duckdb_disconnect ( & con );
duckdb_close ( & db );
Always clean up in reverse order of creation.
Embedded C++ Example
The C++ API provides a more ergonomic interface with RAII and exceptions:
#include "duckdb.hpp"
using namespace duckdb ;
int main () {
// Open database (nullptr = in-memory)
DuckDB db ( nullptr );
// Create connection
Connection con (db);
// Execute queries
con . Query ( "CREATE TABLE integers(i INTEGER)" );
con . Query ( "INSERT INTO integers VALUES (3)" );
// Query and print results
auto result = con . Query ( "SELECT * FROM integers" );
result -> Print ();
}
The C++ API automatically handles resource cleanup through destructors, making it less error-prone than the C API.
Common SQL Operations
Here are some common analytical queries you can run:
Aggregation
Window Functions
Joins
Complex Types
SELECT
department,
COUNT ( * ) as employee_count,
AVG (salary) as avg_salary,
MAX (salary) as max_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC ;
Working with Real Data
DuckDB excels at analyzing real-world data files:
-- Analyze a large CSV file without importing
SELECT
DATE_TRUNC( 'month' , timestamp ) as month ,
COUNT ( * ) as events,
COUNT ( DISTINCT user_id) as unique_users
FROM 'logs.csv'
WHERE event_type = 'purchase'
GROUP BY month
ORDER BY month ;
-- Query multiple Parquet files at once
SELECT
product_category,
SUM (revenue) as total_revenue
FROM 'sales_data/*.parquet'
GROUP BY product_category;
When querying large files, DuckDB automatically uses streaming and parallel processing, but keep in mind that complex operations may require significant memory.
Next Steps
Now that you understand the basics, explore more advanced features:
SQL Reference Learn about DuckDB’s rich SQL dialect and advanced features
Data Import Deep dive into importing and exporting data in various formats
Client APIs Explore Python, R, Java, and other client libraries
Extensions Extend DuckDB with additional functionality
Tips and Tricks
Use DESCRIBE table_name; to see table schema
Use .tables to list all tables (CLI)
Use .schema table_name to see CREATE statement (CLI)
Use EXPLAIN to see query execution plans
Enable profiling with PRAGMA enable_profiling; to optimize queries
Use COPY command for fast bulk data import/export