Parquet Parser
The Parquet parser is currently experimental and may change in future versions.
The Parquet parser is used for parsing incoming Parquet data. To use it, set `decoder` to `parquet`. Fields can be mapped to a different name by using the Parquet field names in your schema's `source`.
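As a minimal sketch of field mapping, assume a hypothetical Parquet column named `user_id` that should appear as `userId` in the decoded output. The column and property names are illustrative only; the keys (`decoder`, `schema`, `properties`, `source`) are the ones used in the example at the end of this page.

```yaml
decoder: parquet
schema:
  type: object
  properties:
    # Output field name (hypothetical)
    userId:
      type: string
      # Name of the column in the Parquet file (hypothetical)
      source: user_id
```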
Types
You will need to specify the corresponding `type` and `format`. See Parsing Data for more information about formats.
Primitive Types
Parquet Type | Schema Type | Schema Format |
---|---|---|
boolean | boolean | |
int32 | integer | int32 |
int64 | integer | int64 |
int96 | n/a | |
float | number | float32 |
double | number | float64 |
binary | string | bytes |
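The table above could be applied as in the following sketch. The column names (`active`, `count`, `score`, `payload`) are hypothetical; each property simply pairs a Parquet primitive type with the schema type and format listed in its row.

```yaml
decoder: parquet
schema:
  type: object
  properties:
    active:          # Parquet boolean (hypothetical column)
      type: boolean
    count:           # Parquet int64 (hypothetical column)
      type: integer
      format: int64
    score:           # Parquet double (hypothetical column)
      type: number
      format: float64
    payload:         # Parquet binary (hypothetical column)
      type: string
      format: bytes
```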
Logical Types
Strings
| ConvertedType | LogicalType | Schema Type | Schema Format |
|---|---|---|---|
| UTF8 | STRING | string | |
| ENUM | | string | |
| | UUID | string | uuid |
Numeric
ConvertedType | LogicalType | Schema Type | Schema Format |
---|---|---|---|
INT_8 | INT(8,true) | integer | int8 |
INT_16 | INT(16,true) | integer | int16 |
INT_32 | INT(32,true) | integer | int32 |
INT_64 | INT(64,true) | integer | int64 |
UINT_8 | INT(8,false) | integer | uint8 |
UINT_16 | INT(16,false) | integer | uint16 |
UINT_32 | INT(32,false) | integer | uint32 |
UINT_64 | INT(64,false) | integer | uint64 |
Temporal
Convert to date/time
| ConvertedType | LogicalType | Schema Type | Schema Format |
|---|---|---|---|
| TIME_MILLIS | TIMESTAMP(isAdjustedToUTC=true,unit=MILLIS) | string | date-time |
| TIME_MICROS | TIMESTAMP(isAdjustedToUTC=true,unit=MICROS) | string | date-time |
| | TIMESTAMP(isAdjustedToUTC=true,unit=NANOS) | string | date-time |
Convert to integer
| ConvertedType | LogicalType | Schema Type | Schema Format |
|---|---|---|---|
| TIME_MILLIS | TIMESTAMP(isAdjustedToUTC=true,unit=MILLIS) | integer | int32 |
| TIME_MICROS | TIMESTAMP(isAdjustedToUTC=true,unit=MICROS) | integer | int64 |
| | TIMESTAMP(isAdjustedToUTC=true,unit=NANOS) | integer | int64 |
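The two temporal tables describe alternative ways to read the same kind of column. A sketch, assuming a hypothetical column `created_at` annotated as TIMESTAMP(isAdjustedToUTC=true,unit=MICROS); only one of the two options would be active for a given property.

```yaml
decoder: parquet
schema:
  type: object
  properties:
    created_at:              # hypothetical column name
      # Option 1: convert to a date/time string
      type: string
      format: date-time
      # Option 2: keep the integer value instead
      # type: integer
      # format: int64
```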
Repeated Types
To read repeated types, use the schema type `array` with the Parquet type defined in `items`.
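For a repeated primitive field, the configuration might look like the sketch below. It assumes a hypothetical Parquet column declared as `repeated int32 scores`, and that repeated primitives follow the same `array`/`items` pattern as repeated groups.

```yaml
decoder: parquet
schema:
  type: object
  properties:
    scores:                  # hypothetical column: repeated int32 scores
      type: array
      items:
        # Type and format of each element, taken from the primitive types table
        type: integer
        format: int32
```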
Example
Parquet definition:
message record {
  required binary name (STRING);
  required int32 age (INT(32,false));
  repeated group friends {
    required binary friend_name (STRING);
  }
}
Decoder configuration:
decoder: parquet
schema:
  type: object
  properties:
    name:
      type: string
    age:
      type: integer
      format: uint32
    friends:
      type: array
      items:
        type: object
        properties:
          name:
            type: string
            source: friend_name