DuckDB provides comprehensive support for JSON data, including reading JSON files directly, querying JSON structures, and working with the JSON data type.
Reading JSON Files
Query JSON files directly without manual data loading:
SELECT * FROM 'data.json';
DuckDB automatically detects JSON format and structure.
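To check what DuckDB inferred, you can describe the query (assuming a `data.json` file as above):

```sql
-- Shows the detected column names and types without materializing the result
DESCRIBE SELECT * FROM 'data.json';
```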
Using read_json_auto
For more control over JSON reading:
SELECT * FROM read_json_auto('movies.ndjson');
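When auto-detection is not enough, read_json accepts explicit options; one possible invocation (the file name is a placeholder):

```sql
-- Declare the layout and column types instead of relying on detection
SELECT *
FROM read_json('movies.json',
               format = 'array',
               columns = {id: 'INTEGER', name: 'VARCHAR'});
```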
DuckDB supports multiple JSON file formats:
Newline-Delimited JSON (NDJSON)
{"id": 1, "name": "O Brother, Where Art Thou?"}
{"id": 2, "name": "Home for the Holidays"}
{"id": 3, "name": "The Firm"}
SELECT * FROM 'movies.ndjson';
JSON Array
[
{"id": 1, "name": "O Brother, Where Art Thou?"},
{"id": 2, "name": "Home for the Holidays"},
{"id": 3, "name": "The Firm"}
]
SELECT * FROM 'movies.json';
Multiple Files and Union
Combine JSON files with different schemas:
SELECT * FROM read_json_auto(
['file1.ndjson', 'file2.json'],
union_by_name = true
)
ORDER BY id;
Missing fields are filled with NULL values.
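As an illustration (the file contents are hypothetical), rows from a file that lacks a column get NULL in that column:

```sql
-- file1.ndjson: {"id": 1, "name": "A"}
-- file2.json:   [{"id": 2, "rating": 4.5}]
SELECT * FROM read_json_auto(
    ['file1.ndjson', 'file2.json'],
    union_by_name = true
);
-- id | name | rating
--  1 | A    | NULL
--  2 | NULL | 4.5
```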
Reading JSON Objects
Preserve raw JSON structure with read_json_objects_auto:
SELECT * FROM read_json_objects_auto('data.ndjson');
This returns each JSON object as a single JSON column:
{"id":1,"name":"O Brother, Where Art Thou?"}
{"id":2,"name":"Home for the Holidays"}
{"id":3,"name":"The Firm"}
The JSON Data Type
DuckDB provides a native JSON type for storing and querying JSON data:
CREATE TABLE events (
    id INTEGER,
    data JSON
);
INSERT INTO events VALUES
    (1, '{"user": "alice", "action": "login"}'),
    (2, '{"user": "bob", "action": "purchase", "amount": 99.99}');
JSON Functions
DuckDB provides rich functions for working with JSON data:
Extract values from JSON:
SELECT json_extract(
    '{"my_field": {"my_nested_field": ["goose", "duck"]}}',
    '$.my_field.my_nested_field[1]'
);
-- Result: "duck"
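The same value can be reached with DuckDB's arrow operator, which accepts object keys and array indices and can be chained:

```sql
-- -> extracts as JSON; chaining walks nested fields and array positions
SELECT '{"my_field": {"my_nested_field": ["goose", "duck"]}}'::JSON
       -> 'my_field' -> 'my_nested_field' -> 1;
-- Result: "duck"
```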
Extract values as strings (removes quotes):
SELECT json_extract_string(
    '{"my_field": {"my_nested_field": ["goose", "duck"]}}',
    '/my_field/my_nested_field/1'
);
-- Result: duck
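The ->> operator is the operator form of json_extract_string and accepts the same paths:

```sql
-- ->> extracts as VARCHAR, dropping the surrounding quotes
SELECT '{"my_field": {"my_nested_field": ["goose", "duck"]}}'::JSON
       ->> '$.my_field.my_nested_field[1]';
-- Result: duck
```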
Alias for json_extract:
SELECT json_extract_path(
    '{"user": {"name": "alice"}}',
    '/user/name'
);
-- Result: "alice"
Handling NULL Values
JSON functions properly handle NULL:
SELECT json_extract('{"foo": null}', '$.foo');
-- Result: null (JSON null)
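A path that is absent from the document and a NULL document both yield SQL NULL rather than JSON null; a quick sketch:

```sql
-- Path not present in the document: SQL NULL
SELECT json_extract('{"foo": null}', '$.bar');
-- NULL input document: SQL NULL
SELECT json_extract(NULL, '$.foo');
```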
JSON Structure Analysis
Analyze JSON structure:
SELECT json_structure('{"duck":"goose"}' -> 'duck');
-- Result: "VARCHAR"
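Applied to a whole document, json_structure reports the inferred type of every field:

```sql
SELECT json_structure('{"duck": "goose"}');
-- Result: {"duck":"VARCHAR"}
```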
Querying Nested JSON
Work with complex nested structures:
SELECT
id,
data ->> 'user' AS user,
data ->> 'action' AS action,
CAST(data ->> 'amount' AS DECIMAL) AS amount
FROM events;
Result:
id | user | action | amount
---+-------+----------+-------
1 | alice | login | NULL
2 | bob | purchase | 99.99
Array Operations
Extract array elements:
SELECT json_extract('[1, 2, 42]', 2);
-- Result: 42
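Related helpers exist for arrays; for instance, json_array_length returns the element count:

```sql
SELECT json_array_length('[1, 2, 42]');
-- Result: 3
```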
Writing JSON Files
Export data as JSON:
COPY (
    SELECT id, name FROM movies
) TO 'output.json' (FORMAT JSON);
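By default this writes newline-delimited JSON; the ARRAY option switches the output to a single JSON array:

```sql
COPY (
    SELECT id, name FROM movies
) TO 'output.json' (FORMAT JSON, ARRAY true);
```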
Hive Partitioning with JSON
Read JSON files from Hive-partitioned directories:
SELECT j, count(*)
FROM read_json_auto(
    'data/j=*/*.json',
    hive_partitioning = true
)
GROUP BY j
ORDER BY j;
Common Patterns
Extract Nested Fields
SELECT
data ->> '$.user.id' AS user_id,
data ->> '$.user.name' AS user_name,
data ->> '$.timestamp' AS timestamp
FROM events;
Filter by JSON Content
SELECT * FROM events
WHERE data ->> 'action' = 'purchase';
Convert JSON to Table
CREATE TABLE structured_events AS
SELECT
id,
data ->> 'user' AS user,
data ->> 'action' AS action,
CAST(data ->> 'amount' AS DECIMAL) AS amount
FROM events;
Aggregate JSON Arrays
SELECT
data ->> 'user' AS user,
COUNT(*) AS action_count
FROM events
GROUP BY data ->> 'user';
Performance Tips
Use json_extract_string when you need string values (avoids JSON type overhead)
For frequent queries on JSON fields, consider extracting them into regular columns
JSON files are less efficient than Parquet for analytical queries; consider converting large datasets
Use column projection when reading JSON files: SELECT field1, field2 FROM 'data.json'
Enable union_by_name when reading multiple files with varying schemas
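The Parquet conversion mentioned in the tips is a single COPY statement:

```sql
-- One-time conversion; subsequent analytical queries read the Parquet file
COPY (SELECT * FROM 'data.json') TO 'data.parquet' (FORMAT PARQUET);
```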
Error Handling
Handle malformed JSON gracefully:
SELECT * FROM read_json_auto(
    'data.json',
    ignore_errors = true
);
Virtual Columns
Access file metadata:
SELECT
    *,
    filename,
    parse_filename(filename) AS parsed_name
FROM read_json_auto('data/*.ndjson');