pt::peg::import::peg(n) | Parser Tools | pt::peg::import::peg(n) |
pt::peg::import::peg - PEG Import Plugin. Read PEG format
package require Tcl 8.5
package require pt::peg::import::peg ?1?
package require pt::peg::from::peg
import text
Are you lost? Do you have trouble understanding this document? In that case please read the overview provided by the Introduction to Parser Tools. That document is the entry point to the whole system of which the current package is a part.
This package implements the parsing expression grammar import plugin processing PEG markup.
It resides in the Import section of the Core Layer of Parser Tools and is intended to be used by pt::peg::import, the import manager, sitting between it and the corresponding core conversion functionality provided by pt::peg::from::peg.
IMAGE: arch_core_iplugins
While this package can be used directly with a regular interpreter, doing so is strongly discouraged and requires a number of contortions to provide the expected environment. The proper way to use this functionality is either through the import manager pt::peg::import or, for direct access to the core conversion functionality, through pt::peg::from::peg.
The API provided by this package satisfies the specification of the Plugin API found in the Parser Tools Import API specification.
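As an illustration of the two access routes named above (the import manager and the core conversion package), the following is a minimal sketch rather than a definitive recipe. It assumes a regular Tcllib installation in which pt::peg::from::peg and pt::peg::import can be loaded. The object name importer and the variable names are hypothetical, and the grammar text is the calculator grammar used as an example in this document.

```tcl
package require Tcl 8.5
package require pt::peg::from::peg   ;# core conversion functionality
package require pt::peg::import      ;# import manager

# A small grammar written in PEG markup.
set text {
PEG calculator (Expression)
    Sign       <- [-+] ;
    Number     <- Sign? <ddigit>+ ;
    Expression <- '(' Expression ')' / (Factor (MulOp Factor)*) ;
    MulOp      <- [*/] ;
    Factor     <- Term (AddOp Term)* ;
    AddOp      <- [-+] ;
    Term       <- Number ;
END;
}

# Route 1: call the core conversion command directly.
set serial [pt::peg::from::peg convert $text]

# Route 2: go through an import manager object, which locates and
# runs the plugin for the requested format ("peg").
# "importer" is a hypothetical object name chosen for this sketch.
pt::peg::import importer
set serial2 [importer import text $text peg]
importer destroy

# Both results are grammar serializations; show the start expression.
puts [dict get $serial  pt::grammar::peg start]
puts [dict get $serial2 pt::grammar::peg start]
```

The format name is passed explicitly in the second route so that the sketch does not rely on the manager's default format.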
peg, a language for the specification of parsing expression grammars, is meant to be human readable and writable, yet, like any computer language, strict enough to allow its processing by machine. It was defined to make writing the specification of a grammar easy, something the other formats found in the Parser Tools do not lend themselves to.
It is formally specified by the grammar shown below, written in itself. For a tutorial / introduction to the language please go and read the PEG Language Tutorial.
PEG pe-grammar-for-peg (Grammar)

        # --------------------------------------------------------------------
        # Syntactical constructs

        Grammar         <- WHITESPACE Header Definition* Final EOF ;

        Header          <- PEG Identifier StartExpr ;
        Definition      <- Attribute? Identifier IS Expression SEMICOLON ;
        Attribute       <- (VOID / LEAF) COLON ;
        Expression      <- Sequence (SLASH Sequence)* ;
        Sequence        <- Prefix+ ;
        Prefix          <- (AND / NOT)? Suffix ;
        Suffix          <- Primary (QUESTION / STAR / PLUS)? ;
        Primary         <- ALNUM / ALPHA / ASCII / CONTROL / DDIGIT / DIGIT
                        /  GRAPH / LOWER / PRINTABLE / PUNCT / SPACE / UPPER
                        /  WORDCHAR / XDIGIT
                        /  Identifier
                        /  OPEN Expression CLOSE
                        /  Literal
                        /  Class
                        /  DOT
                        ;
        Literal         <- APOSTROPH  (!APOSTROPH  Char)* APOSTROPH  WHITESPACE
                        /  DAPOSTROPH (!DAPOSTROPH Char)* DAPOSTROPH WHITESPACE ;
        Class           <- OPENB (!CLOSEB Range)* CLOSEB WHITESPACE ;
        Range           <- Char TO Char / Char ;

        StartExpr       <- OPEN Expression CLOSE ;
void:   Final           <- END SEMICOLON WHITESPACE ;

        # --------------------------------------------------------------------
        # Lexing constructs

        Identifier      <- Ident WHITESPACE ;
leaf:   Ident           <- ('_' / ':' / <alpha>) ('_' / ':' / <alnum>)* ;
        Char            <- CharSpecial / CharOctalFull / CharOctalPart
                        /  CharUnicode / CharUnescaped
                        ;

leaf:   CharSpecial     <- "\\" [nrt'"\[\]\\] ;
leaf:   CharOctalFull   <- "\\" [0-2][0-7][0-7] ;
leaf:   CharOctalPart   <- "\\" [0-7][0-7]? ;
leaf:   CharUnicode     <- "\\" 'u' HexDigit (HexDigit (HexDigit HexDigit?)?)? ;
leaf:   CharUnescaped   <- !"\\" . ;

void:   HexDigit        <- [0-9a-fA-F] ;

void:   TO              <- '-' ;
void:   OPENB           <- "[" ;
void:   CLOSEB          <- "]" ;
void:   APOSTROPH       <- "'" ;
void:   DAPOSTROPH      <- '"' ;
void:   PEG             <- "PEG"  WHITESPACE ;
void:   IS              <- "<-"   WHITESPACE ;
leaf:   VOID            <- "void" WHITESPACE ; # Implies that definition has no semantic value.
leaf:   LEAF            <- "leaf" WHITESPACE ; # Implies that definition has no terminals.
void:   END             <- "END"  WHITESPACE ;
void:   SEMICOLON       <- ";"    WHITESPACE ;
void:   COLON           <- ":"    WHITESPACE ;
void:   SLASH           <- "/"    WHITESPACE ;
leaf:   AND             <- "&"    WHITESPACE ;
leaf:   NOT             <- "!"    WHITESPACE ;
leaf:   QUESTION        <- "?"    WHITESPACE ;
leaf:   STAR            <- "*"    WHITESPACE ;
leaf:   PLUS            <- "+"    WHITESPACE ;
void:   OPEN            <- "("    WHITESPACE ;
void:   CLOSE           <- ")"    WHITESPACE ;
leaf:   DOT             <- "."    WHITESPACE ;

leaf:   ALNUM           <- "<alnum>"    WHITESPACE ;
leaf:   ALPHA           <- "<alpha>"    WHITESPACE ;
leaf:   ASCII           <- "<ascii>"    WHITESPACE ;
leaf:   CONTROL         <- "<control>"  WHITESPACE ;
leaf:   DDIGIT          <- "<ddigit>"   WHITESPACE ;
leaf:   DIGIT           <- "<digit>"    WHITESPACE ;
leaf:   GRAPH           <- "<graph>"    WHITESPACE ;
leaf:   LOWER           <- "<lower>"    WHITESPACE ;
leaf:   PRINTABLE       <- "<print>"    WHITESPACE ;
leaf:   PUNCT           <- "<punct>"    WHITESPACE ;
leaf:   SPACE           <- "<space>"    WHITESPACE ;
leaf:   UPPER           <- "<upper>"    WHITESPACE ;
leaf:   WORDCHAR        <- "<wordchar>" WHITESPACE ;
leaf:   XDIGIT          <- "<xdigit>"   WHITESPACE ;

void:   WHITESPACE      <- (" " / "\t" / EOL / COMMENT)* ;
void:   COMMENT         <- '#' (!EOL .)* EOL ;
void:   EOL             <- "\n\r" / "\n" / "\r" ;
void:   EOF             <- !. ;

        # --------------------------------------------------------------------
END;
Our example specifies the grammar for a basic 4-operation calculator.
PEG calculator (Expression)
Digit <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
Sign <- '-' / '+' ;
Number <- Sign? Digit+ ;
Expression <- '(' Expression ')' / (Factor (MulOp Factor)*) ;
MulOp <- '*' / '/' ;
Factor <- Term (AddOp Term)* ;
AddOp <- '+'/'-' ;
Term <- Number ;
END;
Using higher-level features of the notation, i.e. the character classes (predefined and custom), this example can be rewritten as
PEG calculator (Expression)
Sign <- [-+] ;
Number <- Sign? <ddigit>+ ;
Expression <- '(' Expression ')' / (Factor (MulOp Factor)*) ;
MulOp <- [*/] ;
Factor <- Term (AddOp Term)* ;
AddOp <- [-+] ;
Term <- Number ;
END;
Here we specify the format used by the Parser Tools to serialize Parsing Expression Grammars as immutable values for transport, comparison, etc.
We distinguish between regular and canonical serializations. While a PEG may have more than one regular serialization, exactly one of them is canonical.
Assuming the following PEG for simple mathematical expressions
PEG calculator (Expression)
Digit <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ;
Sign <- '-' / '+' ;
Number <- Sign? Digit+ ;
Expression <- '(' Expression ')' / (Factor (MulOp Factor)*) ;
MulOp <- '*' / '/' ;
Factor <- Term (AddOp Term)* ;
AddOp <- '+'/'-' ;
Term <- Number ;
END;
then its canonical serialization (except for whitespace) is
pt::grammar::peg {
    rules {
        AddOp      {is {/ {t -} {t +}} mode value}
        Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
        Expression {is {/ {x {t (} {n Expression} {t )}} {x {n Factor} {* {x {n MulOp} {n Factor}}}}} mode value}
        Factor     {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
        MulOp      {is {/ {t *} {t /}} mode value}
        Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
        Sign       {is {/ {t -} {t +}} mode value}
        Term       {is {n Number} mode value}
    }
    start {n Expression}
}
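Because the serialization is an ordinary nested Tcl dictionary, it can be inspected with the builtin dict commands alone. The following sketch uses only core Tcl and a literal copy of the serialization shown above; the variable names are hypothetical. The final check relies on the rule of the Parser Tools serialization format that a canonical serialization keeps its dictionary keys in ascending lsort -dict order, which is an assumption as far as this section is concerned, since the rule is not restated here.

```tcl
package require Tcl 8.5

# Literal copy of the canonical serialization of the calculator grammar.
set serial {pt::grammar::peg {
    rules {
        AddOp      {is {/ {t -} {t +}} mode value}
        Digit      {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value}
        Expression {is {/ {x {t (} {n Expression} {t )}} {x {n Factor} {* {x {n MulOp} {n Factor}}}}} mode value}
        Factor     {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value}
        MulOp      {is {/ {t *} {t /}} mode value}
        Number     {is {x {? {n Sign}} {+ {n Digit}}} mode value}
        Sign       {is {/ {t -} {t +}} mode value}
        Term       {is {n Number} mode value}
    }
    start {n Expression}
}}

# The outer dictionary has the single key pt::grammar::peg.
set grammar [dict get $serial pt::grammar::peg]
set start   [dict get $grammar start]
set rules   [dict get $grammar rules]
set names   [dict keys $rules]

puts "start expression: $start"
foreach name $names {
    set def [dict get $rules $name]
    puts [format "%-10s mode=%-5s is=%s" $name [dict get $def mode] [dict get $def is]]
}

# Assumed canonical-form property: nonterminal names appear in
# ascending "lsort -dict" order.
set sorted [expr {$names eq [lsort -dict $names]}]
puts "rule names sorted: $sorted"
```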
Here we specify the format used by the Parser Tools to serialize Parsing Expressions as immutable values for transport, comparison, etc.
We distinguish between regular and canonical serializations. While a parsing expression may have more than one regular serialization, exactly one of them is canonical.
Assuming the parsing expression shown on the right-hand side of the rule
Expression <- '(' Expression ')'
/ Factor (MulOp Factor)*
then its canonical serialization (except for whitespace) is
{/ {x {t (} {n Expression} {t )}} {x {n Factor} {* {x {n MulOp} {n Factor}}}}}
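A serialized parsing expression is a nested Tcl list whose first element names the operator, so it can be processed with a small recursive procedure. The sketch below, in core Tcl only, collects the nonterminal symbols referenced by an expression. The procedure name nonterminals is hypothetical, and the handling of the lookahead operators & and ! is an assumption based on the Parser Tools serialization format, since they do not occur in the example above. All other operators are treated as atomic.

```tcl
package require Tcl 8.5

# Return the list of nonterminal symbols referenced by the serialized
# parsing expression pe, in order of first occurrence.
proc nonterminals {pe} {
    # First element is the operator, the rest are its arguments.
    set rest [lassign $pe op]
    switch -exact -- $op {
        n { return [list [lindex $rest 0]] }
        t { return {} }
        / - x - * - + - ? - & - ! {
            # Operators whose arguments are all sub-expressions.
            set result {}
            foreach sub $rest {
                foreach sym [nonterminals $sub] {
                    if {$sym ni $result} { lappend result $sym }
                }
            }
            return $result
        }
        default {
            # Anything else is treated as atomic in this sketch.
            return {}
        }
    }
}

set pe {/ {x {t (} {n Expression} {t )}} {x {n Factor} {* {x {n MulOp} {n Factor}}}}}
puts [nonterminals $pe]   ;# prints: Expression Factor MulOp
```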
This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category pt of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for either package and/or documentation.
EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar, import, matching, parser, parsing expression, parsing expression grammar, plugin, push down automaton, recursive descent, serialization, state, top-down parsing languages, transducer
Parsing and Grammars
Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>
1 | pt |