Quick Guide

Using bootpeg to create a parser is done in three steps: Define a Grammar to match input, provide some Actions to transform matches, and tell bootpeg to use both to create the parser.

This guide shows the parts needed to create a parser for basic scientific number notation. It can parse numbers as integer, such as 10 or -16, or scientific, such as 1e1 or -160E-1.

Grammar and Actions

The Grammar is a textual definition how to match input. [1] You can write a grammar in one of many meta-grammars, though we suggest the bootpeg meta-grammar by default. A grammar consists of rules that decide which input is valid and how it is to be interpreted:

# bootpeg uses # for line comments
# the first rule must match the entire input, usually via other rules
top:
    | scientific | integer

# a named rule – can be referred to in other rules by name
integer:
    | literal=([ "-" ] "1"-"9" "0"-"9"*) { Integer(literal) }
#     ^        ^       ^                  ^ interpret match as Integer
#     |        |       \ match a digit between 1 and 9 followed by arbitrary many digits
#     |        \ match an optional leading sign
#     \ capture (parts of) match to transform them

scientific:
    | base=integer ("E" | "e")  exponent=integer { base * (10 ** exponent) }
#     ^             ^ either E or e may be used for the exponent
#     \ capture part of the match to interpret it separately

The Actions are callables and constants needed to transform input. You can define actions freely using any code you like; bootpeg merely expects them in a mapping from each of the grammar’s names to the respective action:

>>> # map from names used in the grammar to callables
>>> example_actions = {
...     # use the builtin `int` to interpret any matched Integer
...     'Integer': int
... }

The Grammar uses captures like base=integer to select part of the input. The transformations like { Integer(literal) } define how captured values are passed to Actions.

Boot the PEG

There are two convenient ways to create a parser from the grammar and actions:

  • use bootpeg.create_parser to load the grammar from a string, or

  • use bootpeg.import_parser to load the grammar from a packaged file.

Generally, you should work with a string for interactive development, and a packaged file in any other case. In addition to the grammar, both functions also take the actions of your new parser, and the bootpeg dialect of the grammar.

Simply pass in the grammar, the actions defined, and the dialect – in this case bootpeg.grammars.bpeg for the bootpeg meta-grammar.

>>> from bootpeg import create_parser
>>> from bootpeg.grammars import bpeg
>>> example_grammar = """\
... top:
...     | scientific | integer
... integer:
...     | [ "-" ] "1" - "9" ("0" - "9")* { Integer(.*) }
... scientific:
...     | base=integer ("E" | "e")  exponent=integer { .base * (10 ** .exponent) }
... """
>>> example_actions = {'Integer': int}
>>> parse = create_parser(example_grammar, bpeg, example_actions)
>>> parse("12")
12
>>> parse("12E6")
12000000

Where to next?

As bootpeg uses the PEG formalism, grammars are order-dependent. In the example, swapping | scientific | integer for | integer | scientific would not work well: matching 12E6 would just match 12 as an integer and be done. See Choices and Precedence on how to use ordering to your advantage.

Separating the task of grammars and actions is integral to how bootpeg operates. Likewise, one should aim at creating parsers where each part handles its strong point. In the example, the grammar recognizes numbers but the actions interpret them. See Actions and Captures on how to best match the tasks of grammars and actions.