
DuckDB provides multiple convenient methods for importing data from various file formats. Data import is designed to be simple and intuitive, often requiring just a single SQL statement.

Direct File Querying

The simplest way to import data is to reference files directly in the FROM clause. DuckDB automatically detects the file format and schema.

CSV Files

Query CSV files directly without creating a table:
SELECT * FROM 'myfile.csv';
DuckDB automatically:
  • Detects delimiters (comma, tab, pipe, etc.)
  • Infers column types
  • Handles quoted strings and escape characters
  • Processes headers
-- Delimiters, types, and headers are detected without any configuration
SELECT * FROM 'data.csv';
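To inspect the schema DuckDB inferred, DESCRIBE the query (a minimal sketch using the same data.csv file):
-- Show the inferred column names and types
DESCRIBE SELECT * FROM 'data.csv';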

Parquet Files

Parquet files work the same way as CSV:
SELECT * FROM 'myfile.parquet';
Parquet offers several advantages:
  • Columnar storage format
  • Built-in compression
  • Type information preserved
  • Faster query performance for analytical workloads
Parquet files are ideal for analytical queries because DuckDB can skip reading unnecessary columns and row groups.
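As a sketch, assuming a hypothetical sales.parquet file with customer and amount columns, a projection reads only those two columns from disk:
-- Only the referenced columns are read from the file
SELECT customer, amount FROM 'sales.parquet';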

The COPY Command

The COPY command loads data from files into tables and exports table data to files.

COPY FROM - Import Data

Load data from a CSV file into a table:
CREATE TABLE test (
  a INTEGER, 
  b INTEGER, 
  c VARCHAR(10)
);

COPY test FROM 'test.csv';
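If a file does not match the defaults, COPY accepts options such as DELIMITER and HEADER (a sketch, with a hypothetical pipe-delimited file):
-- Load a pipe-delimited file that includes a header row
COPY test FROM 'pipe_data.csv' (DELIMITER '|', HEADER);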

COPY TO - Export Data

Export table data to a CSV file:
-- Export entire table
COPY test TO 'output.csv';

-- Export query results
COPY (
  SELECT a, b 
  FROM test 
  WHERE a < 4000
) TO 'filtered_output.csv';
COPY returns the number of rows copied, making it easy to verify import/export operations.
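COPY TO is not limited to CSV; a FORMAT option selects other output formats, for example Parquet (output file name assumed):
-- Export the table as a Parquet file
COPY test TO 'output.parquet' (FORMAT PARQUET);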

INSERT Statements

INSERT with VALUES

Insert individual rows using the VALUES clause:
INSERT INTO test VALUES (11, 22, 'hello');
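Several rows can be inserted in one statement by listing multiple tuples:
INSERT INTO test VALUES (11, 22, 'hello'), (12, 23, 'world');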

INSERT with SELECT

Insert data from a query, including file queries:
-- Insert from another table
INSERT INTO test2
SELECT * FROM test WHERE a > 100;

-- Insert from CSV file
INSERT INTO users
SELECT * FROM 'users.csv';

-- Insert with transformation
INSERT INTO users
SELECT 
  id,
  UPPER(name) AS name,
  email
FROM 'users.csv'
WHERE email IS NOT NULL;

INSERT BY NAME

Insert data by matching column names instead of positions:
INSERT INTO users BY NAME
SELECT email, name, id FROM 'input.csv';
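Target columns that the query does not mention are filled with their default values. A minimal sketch, assuming users also had a hypothetical signup_date column:
-- signup_date (hypothetical) is absent from the query and receives its default
INSERT INTO users BY NAME
SELECT id, email, name FROM 'input.csv';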

Reading Functions

DuckDB provides specialized functions for reading different file formats with fine-grained control.

read_csv Function

The read_csv function provides advanced CSV reading options:
SELECT * FROM read_csv('data.csv');
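Named parameters override auto-detection when needed (the values here are illustrative):
-- Pipe-delimited file with a header row; skip one leading line
SELECT * FROM read_csv('data.csv', delim = '|', header = true, skip = 1);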

read_parquet Function

The read_parquet function reads one or more Parquet files:
SELECT * FROM read_parquet('data.parquet');

-- Multiple files
SELECT * FROM read_parquet(['file1.parquet', 'file2.parquet']);

-- Glob pattern
SELECT * FROM read_parquet('data/**/*.parquet');

Import Options

Handling Errors

Control how DuckDB handles malformed data:
-- Ignore rows with errors
SELECT * FROM read_csv('data.csv', ignore_errors = TRUE);

-- Pad missing columns with NULL
SELECT * FROM read_csv('data.csv', null_padding = TRUE);

Union by Name

Combine files whose columns appear in different orders or are missing from some files:
CREATE TABLE users_age (
  id INTEGER NOT NULL,
  name VARCHAR(10) NOT NULL,
  email VARCHAR,
  age INTEGER
);

INSERT INTO users_age
SELECT *
FROM read_csv(
  'data/*.csv', 
  union_by_name = TRUE,
  ignore_errors = TRUE
);

Glob Patterns

Read multiple files with pattern matching:
SELECT * FROM 'data/*.csv';
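When reading many files, the filename parameter of the reader functions adds a column indicating each row's source file (a small sketch):
-- Adds a filename column with each row's source file
SELECT * FROM read_csv('data/*.csv', filename = true);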

Type Detection and Conversion

DuckDB automatically detects types from files, but you can also specify them explicitly:
-- Automatic type detection
SELECT * FROM 'data.csv';

-- Explicit type specification
SELECT * FROM read_csv('data.csv', columns = {
  'id': 'INTEGER',
  'created_at': 'TIMESTAMP',
  'amount': 'DECIMAL(10,2)'
});

-- Type pushdown from target table
CREATE TABLE timestamps(ts TIMESTAMP, dt DATE);
INSERT INTO timestamps 
SELECT ts, ts FROM read_csv('timestamp.csv');
When inserting into a table, DuckDB can push down type information to the CSV reader, improving performance and error handling.

Best Practices

For large datasets, use Parquet format instead of CSV (a conversion example follows this list) for:
  • Faster query performance
  • Smaller file sizes
  • Type safety
  • Column pruning benefits
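A one-time conversion is a single COPY statement (file names assumed):
-- Convert a large CSV file to Parquet once, then query the Parquet file
COPY (SELECT * FROM 'large_data.csv') TO 'large_data.parquet' (FORMAT PARQUET);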

Performance Tips

  1. Use Parquet for analytical workloads: Parquet’s columnar format is optimized for DuckDB
  2. Leverage glob patterns: Read multiple files in parallel
  3. Specify types when known: Avoid auto-detection overhead for large files
  4. Use COPY for bulk loads: COPY is optimized for loading large datasets

Error Handling

-- Robust import with error handling
CREATE TABLE users (
  id INTEGER NOT NULL,
  name VARCHAR(10) NOT NULL,
  email VARCHAR
);

INSERT INTO users
SELECT *
FROM read_csv(
  'users.csv',
  ignore_errors = TRUE,  -- Skip malformed rows
  null_padding = TRUE    -- Handle missing columns
);

Examples

Complete Import Workflow

-- Create table
CREATE TABLE sales (
  order_id INTEGER,
  customer VARCHAR,
  amount DECIMAL(10,2),
  order_date DATE
);

-- Import from CSV
COPY sales FROM 'sales_data.csv';

-- Verify import
SELECT COUNT(*), SUM(amount) FROM sales;

-- Export filtered results
COPY (
  SELECT * FROM sales WHERE order_date >= '2024-01-01'
) TO 'sales_2024.parquet' (FORMAT PARQUET);