Parquet Parser

The Parquet parser is currently experimental and may change in future versions.

The Parquet parser is used for parsing incoming Parquet data. To use it, set decoder to parquet. To map a field to a different name, set source in your schema to the Parquet field name.
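As a minimal sketch of the renaming described above, the following configuration exposes a Parquet column under a different name. The column name user_name and the output name name are hypothetical and chosen only for illustration.

```yaml
# Assumes the incoming Parquet data has a string column called user_name
# (hypothetical); it is exposed as "name" in the parsed record.
decoder: parquet
schema:
  type: object
  properties:
    name:
      type: string
      source: user_name
```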

Types

For each field, specify the schema type and format that correspond to its Parquet type, as listed in the tables below. See Parsing Data for more information about formats.

Primitive Types

Parquet Type  Schema Type  Schema Format
boolean       boolean
int32         integer      int32
int64         integer      int64
int96         n/a
float         number       float32
double        number       float64
binary        string       bytes
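
To illustrate the primitive type mappings, here is a sketch of a schema that reads one column of several primitive types. The message definition in the comment and all field names (active, count, total, ratio, payload) are hypothetical.

```yaml
# Hypothetical Parquet definition, shown for reference only:
#   message record {
#     required boolean active;
#     required int32 count;
#     required int64 total;
#     required double ratio;
#     required binary payload;
#   }
decoder: parquet
schema:
  type: object
  properties:
    active:
      type: boolean       # boolean needs no format
    count:
      type: integer
      format: int32
    total:
      type: integer
      format: int64
    ratio:
      type: number
      format: float64     # Parquet double
    payload:
      type: string
      format: bytes       # un-annotated binary
```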

Logical Types

Strings

ConvertedType  LogicalType  Schema Type  Schema Format
UTF8           STRING       string
ENUM                        string
               UUID         string       uuid
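
A short sketch of the string mappings, assuming a hypothetical record with a STRING column title and a UUID column id. In the Parquet format, UUID annotates a 16-byte fixed_len_byte_array; the field names are assumptions.

```yaml
# Hypothetical Parquet definition, shown for reference only:
#   message record {
#     required binary title (STRING);
#     required fixed_len_byte_array(16) id (UUID);
#   }
decoder: parquet
schema:
  type: object
  properties:
    title:
      type: string        # STRING needs no format
    id:
      type: string
      format: uuid
```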

Numeric

ConvertedType  LogicalType    Schema Type  Schema Format
INT_8          INT(8,true)    integer      int8
INT_16         INT(16,true)   integer      int16
INT_32         INT(32,true)   integer      int32
INT_64         INT(64,true)   integer      int64
UINT_8         INT(8,false)   integer      uint8
UINT_16        INT(16,false)  integer      uint16
UINT_32        INT(32,false)  integer      uint32
UINT_64        INT(64,false)  integer      uint64

Temporal

Convert to date/time

ConvertedType  LogicalType                                  Schema Type  Schema Format
TIME_MILLIS    TIMESTAMP(isAdjustedToUTC=true,unit=MILLIS)  string       date-time
TIME_MICROS    TIMESTAMP(isAdjustedToUTC=true,unit=MICROS)  string       date-time
               TIMESTAMP(isAdjustedToUTC=true,unit=NANOS)   string       date-time

Convert to integer

ConvertedType  LogicalType                                  Schema Type  Schema Format
TIME_MILLIS    TIMESTAMP(isAdjustedToUTC=true,unit=MILLIS)  integer      int32
TIME_MICROS    TIMESTAMP(isAdjustedToUTC=true,unit=MICROS)  integer      int64
               TIMESTAMP(isAdjustedToUTC=true,unit=NANOS)   integer      int64
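
The two temporal tables describe alternative ways of reading the same column. As a sketch, assuming a hypothetical column created_at annotated as TIMESTAMP(isAdjustedToUTC=true,unit=MICROS), the schema below converts it to a date/time string; the commented lines show the integer alternative.

```yaml
# created_at is a hypothetical column name.
decoder: parquet
schema:
  type: object
  properties:
    created_at:
      type: string
      format: date-time
      # To keep the raw value instead, use the integer mapping
      # listed for MICROS:
      #   type: integer
      #   format: int64
```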

Repeated Types

To read a repeated field, use the schema type array and describe the Parquet type of its elements in items.
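
For instance, a repeated field of plain strings might be described as follows. This is a sketch under the assumption that repeated primitive fields are handled the same way as repeated groups; the field name tags is hypothetical.

```yaml
# Hypothetical Parquet definition, shown for reference only:
#   message record {
#     repeated binary tags (STRING);
#   }
decoder: parquet
schema:
  type: object
  properties:
    tags:
      type: array
      items:
        type: string      # type of each repeated element
```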

Example

Parquet definition:

message record {
  required binary name (STRING);
  required int32 age (INT(32,false));
  repeated group friends {
    required binary friend_name (STRING);
  }
}

Decoder configuration:

decoder: parquet
schema:
  type: object
  properties:
    name:
      type: string
    age:
      type: integer
      format: uint32
    friends:
      type: array
      items:
        type: object
        properties:
          name:
            type: string
            source: friend_name