
DuckDB provides comprehensive support for JSON data, including reading JSON files directly, querying JSON structures, and working with the JSON data type.

Reading JSON Files

Query JSON files directly without manual data loading:
SELECT * FROM 'data.json';
DuckDB automatically detects JSON format and structure.
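To inspect the schema DuckDB inferred, DESCRIBE can be used (a sketch, assuming a file named data.json exists in the working directory):

```sql
-- Show the column names and types DuckDB inferred from the file
DESCRIBE SELECT * FROM 'data.json';
```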

Using read_json_auto

For more control over how JSON files are read (e.g. to pass explicit options), use the read_json_auto function:
SELECT * FROM read_json_auto('movies.ndjson');

JSON File Formats

DuckDB supports multiple JSON file formats:

Newline-Delimited JSON (NDJSON)

{"id": 1, "name": "O Brother, Where Art Thou?"}
{"id": 2, "name": "Home for the Holidays"}
{"id": 3, "name": "The Firm"}
SELECT * FROM 'movies.ndjson';

JSON Array

[
  {"id": 1, "name": "O Brother, Where Art Thou?"},
  {"id": 2, "name": "Home for the Holidays"},
  {"id": 3, "name": "The Firm"}
]
SELECT * FROM 'movies.json';

Multiple Files and Union

Combine JSON files with different schemas:
SELECT * FROM read_json_auto(
  ['file1.ndjson', 'file2.json'],
  union_by_name = true
)
ORDER BY id;
Missing fields are filled with NULL values.

Reading JSON Objects

Preserve the raw JSON structure with the read_json_objects functions:
SELECT * FROM read_json_objects_auto('data.ndjson');
This returns each JSON object as a single JSON column:
{"id":1,"name":"O Brother, Where Art Thou?"}
{"id":2,"name":"Home for the Holidays"}
{"id":3,"name":"The Firm"}
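The returned column is named json, so the raw objects can still be queried with the extraction operators (a sketch, assuming the same data.ndjson file as above):

```sql
-- Extract a field from each raw JSON object; the single column
-- produced by read_json_objects_auto is named "json"
SELECT json->>'name' AS name
FROM read_json_objects_auto('data.ndjson');
```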

The JSON Data Type

DuckDB provides a native JSON type for storing and querying JSON data:
CREATE TABLE events (
  id INTEGER,
  data JSON
);

INSERT INTO events VALUES 
  (1, '{"user": "alice", "action": "login"}'),
  (2, '{"user": "bob", "action": "purchase", "amount": 99.99}');

JSON Functions

DuckDB provides rich functions for working with JSON data:

json_extract

Extract values from JSON:
SELECT json_extract(
  '{"my_field": {"my_nested_field": ["goose", "duck"]}}',
  '$.my_field.my_nested_field[1]'
);
-- Result: "duck"

json_extract_string

Extract values as strings (removes quotes):
SELECT json_extract_string(
  '{"my_field": {"my_nested_field": ["goose", "duck"]}}',
  '/my_field/my_nested_field/1'
);
-- Result: duck

json_extract_path

Alias for json_extract:
SELECT json_extract_path(
  '{"user": {"name": "alice"}}',
  '/user/name'
);
-- Result: "alice"

Handling NULL Values

JSON functions properly handle NULL:
SELECT json_extract('{"foo": null}', '$.foo');
-- Result: null (JSON null)
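By contrast, a path that does not exist yields SQL NULL rather than JSON null, which matters when testing for absent keys:

```sql
-- Missing key: returns SQL NULL, not the JSON null value
SELECT json_extract('{"foo": null}', '$.bar');
-- Result: NULL (SQL NULL; test with IS NULL)
```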

JSON Structure Analysis

Analyze JSON structure:
SELECT json_structure('{"duck":"goose"}'->'duck');
-- Result: "VARCHAR"

Querying Nested JSON

Work with complex nested structures:
SELECT 
  id,
  data->>'user' AS user,
  data->>'action' AS action,
  CAST(data->>'amount' AS DECIMAL) AS amount
FROM events;
Result:
id | user  | action   | amount
---+-------+----------+-------
1  | alice | login    | NULL
2  | bob   | purchase | 99.99

Array Operations

Extract array elements by index (JSONPath array indices are zero-based):
SELECT json_extract('[1, 2, 42]', '$[2]');
-- Result: 42
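DuckDB also provides json_array_length for measuring arrays:

```sql
-- Number of elements in a JSON array
SELECT json_array_length('[1, 2, 42]');
-- Result: 3
```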

Writing JSON Files

Export data as JSON:
COPY (
  SELECT id, name FROM movies
) TO 'output.json' (FORMAT JSON);

Hive Partitioning with JSON

Read JSON files from Hive-partitioned directories:
SELECT j, count(*) 
FROM read_json_auto(
  'data/j=*/*.json',
  hive_partitioning = true
)
GROUP BY j 
ORDER BY j;

Common Patterns

Extract Multiple Fields

SELECT 
  data->>'$.user.id' AS user_id,
  data->>'$.user.name' AS user_name,
  data->>'$.timestamp' AS "timestamp"
FROM events;

Filter by JSON Content

SELECT * FROM events
WHERE data->>'action' = 'purchase';

Convert JSON to Table

CREATE TABLE structured_events AS
SELECT 
  id,
  data->>'user' AS user,
  data->>'action' AS action,
  CAST(data->>'amount' AS DECIMAL) AS amount
FROM events;

Aggregate by JSON Field

SELECT 
  data->>'user' AS user,
  COUNT(*) AS action_count
FROM events
GROUP BY data->>'user';

Performance Tips

  • Use json_extract_string when you need string values (avoids JSON type overhead)
  • For frequent queries on JSON fields, consider extracting them into regular columns
  • JSON files are less efficient than Parquet for analytical queries; consider converting large datasets
  • Use column projection when reading JSON files: SELECT field1, field2 FROM 'data.json'
  • Enable union_by_name when reading multiple files with varying schemas
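The Parquet conversion mentioned above takes a single COPY statement (a sketch, assuming a hypothetical large.ndjson input file):

```sql
-- Convert a JSON dataset to Parquet for faster repeated analytics
COPY (SELECT * FROM read_json_auto('large.ndjson'))
TO 'large.parquet' (FORMAT PARQUET);
```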

Error Handling

Handle malformed JSON gracefully:
SELECT * FROM read_json_auto('data.json',
  ignore_errors = true
);

Virtual Columns

Access file metadata:
SELECT 
  *,
  filename,
  parse_filename(filename) AS parsed_name
FROM read_json_auto('data/*.ndjson');