Custom Parsers (JavaScript)

For more complex parsing jobs, you can write custom parsers in JavaScript. To do so, set engine to javascript and put your JavaScript code in the source field of the root schema.

Note: Although the JavaScript engine is fairly performant, it is slower than the native engines, so avoid it for tasks that another parser can handle.

JavaScript API

The Optimus Mine JavaScript engine supports ES2015 with a few minor additions, listed below.

Code        Description
$input      The $input variable contains the incoming data.
$callback   The $callback function is used to pass data to the next task.
require     Used to import modules. Imported modules need to be CommonJS modules using ES5 syntax.
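
To try a script body before putting it in a schema, it can help to mimic $input and $callback under Node.js. The following is a minimal sketch, not part of Optimus Mine: the runParser helper is hypothetical, it assumes Node.js is available locally, and it does not provide require to the script.

```javascript
// Hypothetical local harness, NOT part of Optimus Mine. It only mimics the
// $input variable and $callback function so a script can be tried in Node.js.
function runParser(source, input) {
  const results = [];
  // Expose $input and $callback to the script as function parameters.
  const script = new Function('$input', '$callback', source);
  // Collect every value the script passes to $callback.
  script(input, (value) => results.push(value));
  return results;
}

// The script body is what would go in the source field of the root schema.
const source = `
  const [start, end] = $input.split('-').map(Number);
  for (let i = start; i <= end; i++) {
    $callback('page-' + i);
  }
`;

const pages = runParser(source, '1-3');
console.log(pages); // [ 'page-1', 'page-2', 'page-3' ]
```

Each call to $callback adds one entry to the returned array, which corresponds to one value being passed to the next task.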

Limitations

Each schema can have only one script, which must be set on the source property of the root schema.

Examples

Generating URLs for scraping

In the following example we’re using the JavaScript engine to build a list of URLs to be consumed by a scrape task.

tasks:
  - task: parse
    engine: javascript
    input: '1-10'
    schema:
      type: string
      source: |
        // Convert both bounds to numbers so the loop compares them numerically.
        const [start, end] = $input.split('-').map(Number);
        for (let i = start; i <= end; i++) {
          $callback(`https://example.com/products/page-${i}.html`);
        }

Output:

https://example.com/products/page-1.html
https://example.com/products/page-2.html
https://example.com/products/page-3.html
https://example.com/products/page-4.html
https://example.com/products/page-5.html
https://example.com/products/page-6.html
https://example.com/products/page-7.html
https://example.com/products/page-8.html
https://example.com/products/page-9.html
https://example.com/products/page-10.html

Parsing more complex data structures

Because JavaScript can only be used on the root schema, the script must pass $callback an object that matches the structure defined in the schema, and the sub-schemas must omit the source property. You can, however, still apply filters to the sub-schemas as you would with any other parser.

The example below shows how to achieve this.

tasks:
  - task: parse
    engine: javascript
    input: 'Bob:34:02071234567'
    schema:
      type: object
      source: |
        const [name, age, phone] = $input.split(':');
        $callback({ name, age, phone });
      properties:
        name:
          type: string
        age:
          type: integer
        phone:
          type: string
          filter:
            type: phone
            country: GB

Output:

name: Bob
age: 34
phone: +4402071234567
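
The object built by the script can also be checked on its own, outside Optimus Mine. The following sketch wraps the same splitting logic in a standalone function; parseRecord is a hypothetical helper name, and the explicit integer conversion is an assumption made here rather than a documented requirement of the engine.

```javascript
// Hypothetical helper, not part of Optimus Mine: turns one "name:age:phone"
// record into an object whose keys match the properties of the schema.
function parseRecord(line) {
  const [name, age, phone] = line.split(':');
  return {
    name,
    // Assumption: convert explicitly instead of relying on the engine to
    // coerce the string "34" to the integer type declared in the schema.
    age: parseInt(age, 10),
    // Kept as a string so that the phone filter can process it.
    phone,
  };
}

const record = parseRecord('Bob:34:02071234567');
console.log(record); // { name: 'Bob', age: 34, phone: '02071234567' }
```

Inside the source field, the equivalent call would be $callback(parseRecord($input)), with the function defined in the same script.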