
DuckDB provides comprehensive support for JSON data, including reading JSON files directly, querying JSON structures, and working with the JSON data type.

Reading JSON Files

Query JSON files directly without manual data loading:
SELECT * FROM 'data.json';
DuckDB automatically detects JSON format and structure.
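To inspect the schema DuckDB inferred, DESCRIBE can be used (a sketch, assuming a file named data.json exists in the working directory):

```sql
-- Show the column names and types DuckDB inferred from the file
DESCRIBE SELECT * FROM 'data.json';
```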

Using read_json_auto

For more control over how JSON files are read (e.g. to pass explicit options), use the read_json_auto function:
SELECT * FROM read_json_auto('movies.ndjson');

JSON File Formats

DuckDB supports multiple JSON file formats:

Newline-Delimited JSON (NDJSON)

{"id": 1, "name": "O Brother, Where Art Thou?"}
{"id": 2, "name": "Home for the Holidays"}
{"id": 3, "name": "The Firm"}
SELECT * FROM 'movies.ndjson';

JSON Array

[
  {"id": 1, "name": "O Brother, Where Art Thou?"},
  {"id": 2, "name": "Home for the Holidays"},
  {"id": 3, "name": "The Firm"}
]
SELECT * FROM 'movies.json';

Multiple Files and Union

Combine JSON files with different schemas:
SELECT * FROM read_json_auto(
  ['file1.ndjson', 'file2.json'],
  union_by_name = true
)
ORDER BY id;
Missing fields are filled with NULL values.

Reading JSON Objects

Preserve the raw JSON structure with the read_json_objects functions:
SELECT * FROM read_json_objects_auto('data.ndjson');
This returns each JSON object as a single JSON column:
{"id":1,"name":"O Brother, Where Art Thou?"}
{"id":2,"name":"Home for the Holidays"}
{"id":3,"name":"The Firm"}
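The returned column is named json, so the raw objects can still be queried with the extraction operators (a sketch, assuming the same data.ndjson file as above):

```sql
-- Extract a field from each raw JSON object; the single column
-- produced by read_json_objects_auto is named "json"
SELECT json->>'name' AS name
FROM read_json_objects_auto('data.ndjson');
```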

The JSON Data Type

DuckDB provides a native JSON type for storing and querying JSON data:
CREATE TABLE events (
  id INTEGER,
  data JSON
);

INSERT INTO events VALUES 
  (1, '{"user": "alice", "action": "login"}'),
  (2, '{"user": "bob", "action": "purchase", "amount": 99.99}');

JSON Functions

DuckDB provides rich functions for working with JSON data:

json_extract

Extract values from JSON:
SELECT json_extract(
  '{"my_field": {"my_nested_field": ["goose", "duck"]}}',
  '$.my_field.my_nested_field[1]'
);
-- Result: "duck"

json_extract_string

Extract values as strings (removes quotes):
SELECT json_extract_string(
  '{"my_field": {"my_nested_field": ["goose", "duck"]}}',
  '/my_field/my_nested_field/1'
);
-- Result: duck

json_extract_path

Alias for json_extract:
SELECT json_extract_path(
  '{"user": {"name": "alice"}}',
  '/user/name'
);
-- Result: "alice"

Handling NULL Values

JSON functions properly handle NULL:
SELECT json_extract('{"foo": null}', '$.foo');
-- Result: null (JSON null)
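By contrast, a path that does not exist yields SQL NULL rather than JSON null, which matters when testing for absent keys:

```sql
-- Missing key: returns SQL NULL, not the JSON null value
SELECT json_extract('{"foo": null}', '$.bar');
-- Result: NULL (SQL NULL; test with IS NULL)
```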

JSON Structure Analysis

Analyze JSON structure:
SELECT json_structure('{"duck":"goose"}'->'duck');
-- Result: "VARCHAR"

Querying Nested JSON

Work with complex nested structures:
SELECT 
  id,
  data->>'user' AS user,
  data->>'action' AS action,
  CAST(data->>'amount' AS DECIMAL) AS amount
FROM events;
Result:
id | user  | action   | amount
---+-------+----------+-------
1  | alice | login    | NULL
2  | bob   | purchase | 99.99

Array Operations

Extract array elements by index (JSONPath array indices are zero-based):
SELECT json_extract('[1, 2, 42]', '$[2]');
-- Result: 42
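DuckDB also provides json_array_length for measuring arrays:

```sql
-- Number of elements in a JSON array
SELECT json_array_length('[1, 2, 42]');
-- Result: 3
```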

Writing JSON Files

Export data as JSON:
COPY (
  SELECT id, name FROM movies
) TO 'output.json' (FORMAT JSON);

Hive Partitioning with JSON

Read JSON files from Hive-partitioned directories:
SELECT j, count(*) 
FROM read_json_auto(
  'data/j=*/*.json',
  hive_partitioning = true
)
GROUP BY j 
ORDER BY j;

Common Patterns

Extract Multiple Fields

SELECT 
  data->>'$.user.id' AS user_id,
  data->>'$.user.name' AS user_name,
  data->>'$.timestamp' AS "timestamp"
FROM events;

Filter by JSON Content

SELECT * FROM events
WHERE data->>'action' = 'purchase';

Convert JSON to Table

CREATE TABLE structured_events AS
SELECT 
  id,
  data->>'user' AS user,
  data->>'action' AS action,
  CAST(data->>'amount' AS DECIMAL) AS amount
FROM events;

Aggregate by JSON Field

SELECT 
  data->>'user' AS user,
  COUNT(*) AS action_count
FROM events
GROUP BY data->>'user';

Performance Tips

  • Use json_extract_string when you need string values (avoids JSON type overhead)
  • For frequent queries on JSON fields, consider extracting them into regular columns
  • JSON files are less efficient than Parquet for analytical queries; consider converting large datasets
  • Use column projection when reading JSON files: SELECT field1, field2 FROM 'data.json'
  • Enable union_by_name when reading multiple files with varying schemas
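The Parquet conversion mentioned above takes a single COPY statement (a sketch, assuming a hypothetical large.ndjson input file):

```sql
-- Convert a JSON dataset to Parquet for faster repeated analytics
COPY (SELECT * FROM read_json_auto('large.ndjson'))
TO 'large.parquet' (FORMAT PARQUET);
```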

Error Handling

Handle malformed JSON gracefully:
SELECT * FROM read_json_auto('data.json',
  ignore_errors = true
);

Virtual Columns

Access file metadata:
SELECT 
  *,
  filename,
  parse_filename(filename) AS parsed_name
FROM read_json_auto('data/*.ndjson');