
DuckDB provides multiple convenient methods for importing data from various file formats. Data import is designed to be simple and intuitive, often requiring just a single SQL statement.

Direct File Querying

The simplest way to import data is to reference files directly in the FROM clause. DuckDB automatically detects the file format and schema.

CSV Files

Query CSV files directly without creating a table:
SELECT * FROM 'myfile.csv';
DuckDB automatically:
  • Detects delimiters (comma, tab, pipe, etc.)
  • Infers column types
  • Handles quoted strings and escape characters
  • Processes headers
-- Delimiters, types, and headers are detected without any configuration
SELECT * FROM 'data.csv';
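To inspect the schema DuckDB inferred, DESCRIBE the query (a minimal sketch using the same data.csv file):
-- Show the inferred column names and types
DESCRIBE SELECT * FROM 'data.csv';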

Parquet Files

Parquet files work the same way as CSV:
SELECT * FROM 'myfile.parquet';
Parquet offers several advantages:
  • Columnar storage format
  • Built-in compression
  • Type information preserved
  • Faster query performance for analytical workloads
Parquet files are ideal for analytical queries because DuckDB can skip reading unnecessary columns and row groups.
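As a sketch, assuming a hypothetical sales.parquet file with customer and amount columns, a projection reads only those two columns from disk:
-- Only the referenced columns are read from the file
SELECT customer, amount FROM 'sales.parquet';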

The COPY Command

The COPY command loads data from files into tables and exports table data to files.

COPY FROM - Import Data

Load data from a CSV file into a table:
CREATE TABLE test (
  a INTEGER, 
  b INTEGER, 
  c VARCHAR(10)
);

COPY test FROM 'test.csv';
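If a file does not match the defaults, COPY accepts options such as DELIMITER and HEADER (a sketch, with a hypothetical pipe-delimited file):
-- Load a pipe-delimited file that includes a header row
COPY test FROM 'pipe_data.csv' (DELIMITER '|', HEADER);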

COPY TO - Export Data

Export table data to a CSV file:
-- Export entire table
COPY test TO 'output.csv';

-- Export query results
COPY (
  SELECT a, b 
  FROM test 
  WHERE a < 4000
) TO 'filtered_output.csv';
COPY returns the number of rows copied, making it easy to verify import/export operations.
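COPY TO is not limited to CSV; a FORMAT option selects other output formats, for example Parquet (output file name assumed):
-- Export the table as a Parquet file
COPY test TO 'output.parquet' (FORMAT PARQUET);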

INSERT Statements

INSERT with VALUES

Insert individual rows using the VALUES clause:
INSERT INTO test VALUES (11, 22, 'hello');
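Several rows can be inserted in one statement by listing multiple tuples:
INSERT INTO test VALUES (11, 22, 'hello'), (12, 23, 'world');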

INSERT with SELECT

Insert data from a query, including file queries:
-- Insert from another table
INSERT INTO test2
SELECT * FROM test WHERE a > 100;

-- Insert from CSV file
INSERT INTO users
SELECT * FROM 'users.csv';

-- Insert with transformation
INSERT INTO users
SELECT 
  id,
  UPPER(name) AS name,
  email
FROM 'users.csv'
WHERE email IS NOT NULL;

INSERT BY NAME

Insert data by matching column names instead of positions:
INSERT INTO users BY NAME
SELECT email, name, id FROM 'input.csv';
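Target columns that the query does not mention are filled with their default values. A minimal sketch, assuming users also had a hypothetical signup_date column:
-- signup_date (hypothetical) is absent from the query and receives its default
INSERT INTO users BY NAME
SELECT id, email, name FROM 'input.csv';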

Reading Functions

DuckDB provides specialized functions for reading different file formats with fine-grained control.

read_csv Function

The read_csv function provides advanced CSV reading options:
SELECT * FROM read_csv('data.csv');
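Named parameters override auto-detection when needed (the values here are illustrative):
-- Pipe-delimited file with a header row; skip one leading line
SELECT * FROM read_csv('data.csv', delim = '|', header = true, skip = 1);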

read_parquet Function

The read_parquet function reads one or more Parquet files:
SELECT * FROM read_parquet('data.parquet');

-- Multiple files
SELECT * FROM read_parquet(['file1.parquet', 'file2.parquet']);

-- Glob pattern
SELECT * FROM read_parquet('data/**/*.parquet');

Import Options

Handling Errors

Control how DuckDB handles malformed data:
-- Ignore rows with errors
SELECT * FROM read_csv('data.csv', ignore_errors = TRUE);

-- Pad missing columns with NULL
SELECT * FROM read_csv('data.csv', null_padding = TRUE);

Union by Name

Combine files whose columns appear in different orders or are missing from some files:
CREATE TABLE users_age (
  id INTEGER NOT NULL,
  name VARCHAR(10) NOT NULL,
  email VARCHAR,
  age INTEGER
);

INSERT INTO users_age
SELECT *
FROM read_csv(
  'data/*.csv', 
  union_by_name = TRUE,
  ignore_errors = TRUE
);

Glob Patterns

Read multiple files with pattern matching:
SELECT * FROM 'data/*.csv';
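When reading many files, the filename parameter of the reader functions adds a column indicating each row's source file (a small sketch):
-- Adds a filename column with each row's source file
SELECT * FROM read_csv('data/*.csv', filename = true);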

Type Detection and Conversion

DuckDB automatically detects types from files, but you can also specify them explicitly:
-- Automatic type detection
SELECT * FROM 'data.csv';

-- Explicit type specification
SELECT * FROM read_csv('data.csv', columns = {
  'id': 'INTEGER',
  'created_at': 'TIMESTAMP',
  'amount': 'DECIMAL(10,2)'
});

-- Type pushdown from target table
CREATE TABLE timestamps(ts TIMESTAMP, dt DATE);
INSERT INTO timestamps 
SELECT ts, ts FROM read_csv('timestamp.csv');
When inserting into a table, DuckDB can push down type information to the CSV reader, improving performance and error handling.

Best Practices

For large datasets, use Parquet format instead of CSV (a conversion example follows this list) for:
  • Faster query performance
  • Smaller file sizes
  • Type safety
  • Column pruning benefits
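A one-time conversion is a single COPY statement (file names assumed):
-- Convert a large CSV file to Parquet once, then query the Parquet file
COPY (SELECT * FROM 'large_data.csv') TO 'large_data.parquet' (FORMAT PARQUET);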

Performance Tips

  1. Use Parquet for analytical workloads: Parquet’s columnar format is optimized for DuckDB
  2. Leverage glob patterns: Read multiple files in parallel
  3. Specify types when known: Avoid auto-detection overhead for large files
  4. Use COPY for bulk loads: COPY is optimized for loading large datasets

Error Handling

-- Robust import with error handling
CREATE TABLE users (
  id INTEGER NOT NULL,
  name VARCHAR(10) NOT NULL,
  email VARCHAR
);

INSERT INTO users
SELECT *
FROM read_csv(
  'users.csv',
  ignore_errors = TRUE,  -- Skip malformed rows
  null_padding = TRUE    -- Handle missing columns
);

Examples

Complete Import Workflow

-- Create table
CREATE TABLE sales (
  order_id INTEGER,
  customer VARCHAR,
  amount DECIMAL(10,2),
  order_date DATE
);

-- Import from CSV
COPY sales FROM 'sales_data.csv';

-- Verify import
SELECT COUNT(*), SUM(amount) FROM sales;

-- Export filtered results
COPY (
  SELECT * FROM sales WHERE order_date >= '2024-01-01'
) TO 'sales_2024.parquet' (FORMAT PARQUET);