JSON Parser
The JSON parser is used for parsing incoming JSON data. To use it, set `engine` to `json` and use a JSON path in your schema's `source`.
To parse newline-delimited JSON (NDJSON), set `engine` to `ndjson`. Each line is treated as a separate JSON document.
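The Python sketch below illustrates what "one document per line" means by splitting NDJSON text into a list of parsed values. It is not Optimus's implementation. `parse_ndjson` is a hypothetical name, and skipping blank lines and reporting the failing line number are assumptions made for the sketch.

```python
import json
from typing import Any, List


def parse_ndjson(text: str) -> List[Any]:
    """Parse newline-delimited JSON: each non-blank line is one JSON document."""
    documents: List[Any] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines (e.g. a trailing newline) are ignored
        try:
            documents.append(json.loads(line))
        except json.JSONDecodeError as err:
            # Report which line failed instead of a position in the whole file.
            raise ValueError(f"line {line_number}: {err.msg}") from err
    return documents


print(parse_ndjson('{ "name": "Alice", "age": 29 }\n{ "name": "Bob", "age": 34 }\n'))
```

Each parsed line becomes one element of the result. The task schema is then applied to each document separately, which is why the NDJSON example further down produces a list.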
JSON Path Syntax
| Code | Description |
|---|---|
| `.` | The dot operator denotes a child element of the current element, e.g. `parent.child`. |
| `*` | Wildcard matching zero or more characters, e.g. `foo*` matches `foo` as well as `foobar`. |
| `?` | Wildcard matching exactly one character, e.g. `fo?` matches `foo` but not `foobar`. |
| `\` | Escapes special characters such as `.`, `*` and `?`. |
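To make the rules in the table concrete, here is a simplified Python sketch of a path resolver that handles only `.`, `*`, `?` and `\`. It is not GJSON or Optimus code. It walks objects only, with no array indices or other GJSON features, and when a wildcard matches several keys it assumes the first key in document order wins. The names `split_path`, `segment_to_regex` and `get` are hypothetical.

```python
import re
from typing import Any, List


def split_path(path: str) -> List[str]:
    """Split a path on unescaped dots; escapes are kept for segment_to_regex."""
    segments: List[str] = []
    current = ""
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "\\" and i + 1 < len(path):
            current += path[i : i + 2]  # keep the escape pair, so an escaped dot is not a separator
            i += 2
            continue
        if ch == ".":
            segments.append(current)
            current = ""
        else:
            current += ch
        i += 1
    segments.append(current)
    return segments


def segment_to_regex(segment: str) -> "re.Pattern[str]":
    """Translate one segment: * -> any run, ? -> one char, \\x -> literal x."""
    parts: List[str] = []
    i = 0
    while i < len(segment):
        ch = segment[i]
        if ch == "\\" and i + 1 < len(segment):
            parts.append(re.escape(segment[i + 1]))  # escaped char matches literally
            i += 2
            continue
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def get(document: Any, path: str) -> Any:
    """Return the value at `path`, taking the first key that matches each segment."""
    current = document
    for segment in split_path(path):
        if not isinstance(current, dict):
            return None  # this sketch only walks objects, not arrays
        pattern = segment_to_regex(segment)
        key = next((k for k in current if pattern.fullmatch(k)), None)
        if key is None:
            return None
        current = current[key]
    return current


doc = {"data": {"person": {"name": "Alice", "age": 29}}, "foobar": 1, "a.b": 2}
print(get(doc, "data.person.name"))  # Alice
print(get(doc, "data.per*.age"))     # 29
print(get(doc, "foo*"))              # 1 ("*" also matches "bar")
print(get(doc, "fo?"))               # None ("?" is exactly one character)
print(get(doc, r"a\.b"))             # 2 (the escaped dot is part of the key)
```

Each segment must match a whole key (`fullmatch`), which is why `fo?` does not match `foobar`.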
Any `source` starting with a dot (`.`) is treated as a relative path and uses the parent schema's `source` as its root. If the parent schema has type `array`, a relative path resolves against each element of that array.
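The relative-path rules can be sketched in Python as a small schema walker. This is a hypothetical illustration, not Optimus's resolver. `lookup`, `resolve` and `apply_schema` are made-up names, paths are plain dotted keys without wildcards, and treating a node with no `source` as inheriting its parent's value is an assumption. Applied to the schema and data from the JSON example below, it produces the same output.

```python
from typing import Any, Dict, Optional


def lookup(root: Any, path: str) -> Any:
    """Follow a plain dotted path (no wildcards) from `root`; None if missing."""
    current = root
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def resolve(schema: Dict[str, Any], document: Any, context: Any) -> Any:
    """Resolve one schema node.

    `document` is the whole parsed input; `context` is the value selected by the
    parent node and is the root for relative (dot-prefixed) paths.
    """
    source: Optional[str] = schema.get("source")
    if source is None:
        value = context  # assumption: no source means "reuse the parent's value"
    elif source.startswith("."):
        value = lookup(context, source[1:])  # relative: start at the parent
    else:
        value = lookup(document, source)  # absolute: start at the document root

    if schema["type"] == "object":
        return {
            name: resolve(child, document, value)
            for name, child in schema.get("properties", {}).items()
        }
    if schema["type"] == "array":
        # Each element becomes the context for `items`, so relative paths
        # inside `items` resolve against that element.
        return [resolve(schema["items"], document, element) for element in value or []]
    return value  # scalars are returned as-is (no type coercion in this sketch)


def apply_schema(schema: Dict[str, Any], document: Any) -> Any:
    """Top-level entry point: the document root is the initial context."""
    return resolve(schema, document, document)


document = {"data": {"person": {"name": "Alice", "age": 29, "friends": [
    {"name": "Bob", "age": 34}, {"name": "Carol", "age": 47}]}}}
schema = {
    "type": "object",
    "source": "data.person",
    "properties": {
        "name": {"type": "string", "source": ".name"},
        "age": {"type": "integer", "source": "data.person.age"},
        "friends": {
            "type": "array",
            "source": "data.person.friends",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "source": ".name"},
                    "age": {"type": "integer", "source": ".age"},
                },
            },
        },
    },
}
print(apply_schema(schema, document))
```

Passing each array element as the new context is what makes `.name` inside `items` refer to a friend's name rather than Alice's.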
You can test out your JSON paths on the GJSON Playground.
Examples
JSON
Content of https://example.com/data.json:
```json
{
  "data": {
    "person": {
      "name": "Alice",
      "age": 29,
      "friends": [
        {
          "name": "Bob",
          "age": 34
        },
        {
          "name": "Carol",
          "age": 47
        }
      ]
    }
  }
}
```
Optimus task config:
```yaml
tasks:
  - task: scrape
    input: https://example.com/data.json
    engine: json
    schema:
      type: object
      source: data.person # could be omitted if we weren't using a relative path for the `name` property below
      properties:
        name:
          type: string
          source: .name # as the path starts with a dot, it is relative to the parent (i.e. `data.person`)
        age:
          type: integer
          source: data.person.age
        friends:
          type: array
          source: data.person.friends
          items:
            type: object
            properties:
              name:
                type: string
                source: .name # since the parent is an array, paths starting with a dot resolve to the array's items
              age:
                type: integer
                source: .age
```
Output:
```yaml
name: Alice
age: 29
friends:
  - name: Bob
    age: 34
  - name: Carol
    age: 47
```
Newline Delimited JSON
Content of https://example.com/data.json:
```json
{ "name": "Alice", "age": 29 }
{ "name": "Bob", "age": 34 }
{ "name": "Carol", "age": 47 }
```
Optimus task config:
```yaml
tasks:
  - task: scrape
    input: https://example.com/data.json
    engine: ndjson
    schema:
      type: object
      properties:
        name:
          type: string
          source: name
        age:
          type: integer
          source: age
```
Output:
```yaml
- name: Alice
  age: 29
- name: Bob
  age: 34
- name: Carol
  age: 47
```