Parsing Data
Parse engines are used to parse incoming data using a schema. The syntax is based on JSON Schema with a few additions.
String
The string type is used for text.
Syntax
schema:
  type: string
Integer
The integer type is used for whole numbers (i.e. no fractions), which can be positive, negative or zero.
Fractional components of numbers are dropped, e.g. 1.9 becomes 1.
Syntax
schema:
  type: integer
Example
tasks:
  - task: parse
    input:
      - '-42'
      - '1.9'
      - '0'
      - '12'
    schema:
      type: integer
Outputs:
- -42
- 1
- 0
- 12
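The truncation rule can be illustrated outside the parse engine. The following is a minimal Python sketch, not the engine's implementation: the helper name parse_integer is hypothetical, and it assumes truncation toward zero, which the text above only confirms for positive values.

```python
from decimal import Decimal


def parse_integer(raw: str) -> int:
    """Hypothetical helper: parse a numeric string and drop any fraction."""
    # Decimal avoids binary floating point rounding; int() truncates toward zero.
    return int(Decimal(raw))


inputs = ["-42", "1.9", "0", "12"]
outputs = [parse_integer(value) for value in inputs]
print(outputs)
```

Under that assumption a negative input such as '-1.9' would become -1 rather than -2; check the engine's actual behaviour before relying on it.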
Number
The number type is used for floating point numbers (i.e. numbers with a fractional component).
Syntax
schema:
  type: number
Boolean
The boolean type can be either true or false. Booleans can be parsed from numbers and strings.
Values that are parsed as true: 1, t, T, true, TRUE, True.
Values that are parsed as false: 0, f, F, false, FALSE, False.
Syntax
schema:
  type: boolean
Example
tasks:
  - task: parse
    input:
      - '1'
      - '0'
      - 'FALSE'
    schema:
      type: boolean
Outputs:
- true
- false
- false
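The accepted values amount to a simple lookup. The following Python sketch mirrors the two lists above; parse_boolean is a hypothetical name, and raising an error for any other value is an assumption, since the text does not say how unlisted values are handled.

```python
TRUE_VALUES = {"1", "t", "T", "true", "TRUE", "True"}
FALSE_VALUES = {"0", "f", "F", "false", "FALSE", "False"}


def parse_boolean(raw) -> bool:
    """Hypothetical helper mirroring the documented value lists."""
    text = str(raw)  # numbers such as 1 and 0 are compared by their text form
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    # Assumption: the documentation does not cover values outside the lists.
    raise ValueError(f"not a recognised boolean: {raw!r}")


outputs = [parse_boolean(value) for value in ["1", "0", "FALSE"]]
print(outputs)
```

Note that the lists are case-specific: a value such as tRuE appears in neither list, so this sketch rejects it.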
Array
The array type is used for lists of a single type. You could, for example, have an array of booleans, strings or numbers. It is not possible, however, to have an array that contains multiple types.
If the root schema is an array, most exporters will consider each element of the array a separate record. To keep an array inside a single record, wrap it in an object type.
Syntax
schema:
  type: array
  items: <schema-definition>
Example
tasks:
  - task: parse
    engine: json
    input: '{ "data": [ "foo", "bar", "baz" ] }'
    schema:
      type: array
      source: 'data'
      items:
        type: string
        source: '.'
Outputs:
- foo
- bar
- baz
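To make the one-record-per-element behaviour concrete, here is a rough Python sketch of the same extraction. It is an illustration only: parse_string_array is a hypothetical helper, and rejecting mixed-type arrays with an exception is an assumption about how the single-type rule might be enforced.

```python
import json


def parse_string_array(raw: str, source: str) -> list:
    """Hypothetical helper: read the list stored under `source`."""
    document = json.loads(raw)
    items = document[source]
    # Assumption: a mixed-type array is treated as an error.
    if not all(isinstance(item, str) for item in items):
        raise TypeError("arrays may only contain a single type")
    return list(items)


records = parse_string_array('{ "data": [ "foo", "bar", "baz" ] }', "data")
for record in records:
    print(record)  # each element is emitted as its own record
```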
Object
The object type defines a key-value map. The keys must be strings. Each mapping of a key to a value is referred to as a property.
If a property name (i.e. a key) is added to the required list, any object that doesn’t contain that property will be skipped.
Syntax
schema:
  type: object
  properties:
    <property-name>: <schema-definition>
  required:
    - <property-name>
Example
In the following example we’re parsing a few JSON objects. Notice how the third input is omitted from the results because name is listed as a required property.
tasks:
  - task: parse
    engine: json
    input:
      - '{ "name": "Alice" }'
      - '{ "name": "Bob", "age": 34 }'
      - '{ "age": 45 }'
    schema:
      type: object
      properties:
        name:
          type: string
        age:
          type: integer
      required:
        - name
Outputs:
- name: Alice
- name: Bob
  age: 34
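The effect of the required list can be sketched in a few lines of Python. This is not how the parse engine is implemented; parse_objects and its parameters are hypothetical names chosen for illustration, and the sketch ignores type conversion of the property values.

```python
import json


def parse_objects(raw_inputs, properties, required):
    """Hypothetical helper: keep declared properties, skip incomplete objects."""
    records = []
    for raw in raw_inputs:
        document = json.loads(raw)
        # Skip any object that lacks one of the required properties.
        if any(name not in document for name in required):
            continue
        # Keep only the properties declared in the schema that are present.
        records.append({name: document[name] for name in properties if name in document})
    return records


records = parse_objects(
    ['{ "name": "Alice" }', '{ "name": "Bob", "age": 34 }', '{ "age": 45 }'],
    properties=["name", "age"],
    required=["name"],
)
print(records)
```

Optional properties such as age are simply left out of a record when the input does not provide them.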