MEPS stands for Matching Extensions for PositionableStream. You can use this package to translate a BNF grammar into an equivalent recursive descent parser or scanner.
MEPS is simple enough for one-liners yet capable enough for complete parsers or scanners. MEPS can be downloaded here or from SqueakMap.
(a) 'abc' readStream match: 'ab' action:[:r | r] "returns 'ab' "
(b) STMethod class
from: input startingWith: start
"method = messagePattern [temps][primitiveNumber][statements]['.']"
^input
match: STMessagePattern
andMaybe: STTemporaries
andMaybe: STPrimitiveNumber
andMaybe: STStatements
andMaybe: [input matchToken: '.']
action: [:list |
self new
messagePattern: list first
; temporaries: list second
; primitiveNumber: list third
; statements: list fourth]
(c) rcptCommand
match: '[Tt][Oo]:\s*<\s*' regex
and: EmailAddress
and: '\s*>\s*' regex
action: [:r | r second] Example 1: (a) One-liner. (b) Method from a Smalltalk parser. (c) Combining regular expressions with MEPS. (using additional MEPS regex package).
The receiver in a MEPS method is a PositionableStream. The stream may contain whatever objects are expected by the sub-parsers specified in the match arguments, such as characters in example 1(a), or tokens for a parser, as in example 1(b).
The selector consists of one or more matching keywords followed by an action keyword. There are over 60 different selectors which cover a wide variety of parsing situations.
The match arguments are sub-parsers. A sub-parser is any object which implements the method #from:startingWith:. A sub-parser can be as simple as a single character or string, or a block containing other MEPS statements, or a class such as STMessagePattern in example 1(b). The action object is a block or other evaluable.
MEPS double dispatches #from:startingWith: to each sub-parser, passing the stream and current element. If the result is not nil (ie the sub-parse was successful) then the method returns the evaluation of the action block with the parse result. Otherwise the stream is rewound, the action block is not called, and the return value is nil.
(a) 'abc' readStream match: $a and: $b action: [:r | r] (b) 'aaa' readstream match: $a and: $b action: [:r | r]
Example 2: (a) parse succeeds. (b) parse fails
2(a) is evaluated as follows:
2(b) is evaluated as as follows:
Using blocks as sub-parsers allows complex BNF rules to be represented literally. The following example is taken from the MEPS regex parser.
RxBracket class method
from: input startingWith: s
"bracketExp = '[' ['^'] (range | any | escape | specialChar)+ ']' "
^input
match: $[
and:[input
maybeMatch: $^
and:
[input
matchOneOrMore:
[input
match: RxRange
or: RxAny
or: RxEscape
or: RxSpecialCharacter
action: [:r|r]]
action:[:r|r]]
action: [:r|r]]
and: $]
action: [:list |
self new
negate: list second first notNil
; list: list second second]
"
'[a]' regex << 'a'
'[^a]' regex << 'a'
'[ab-c]' regex << 'c'
'[a-z0-9]' regex << '3'
'[a-z]+[A-Z]' regex << 'ababaB'
"Example 3: Nested Blocks
Using classes as sub-parsers allows a parse to be implemented as a set of parse nodes which know how to deserialize themselves from a stream, rather than having a centralized parser class or generator.
STBinaryExpression class
from: input startingWith: start
" binaryExp = unaryExp binaryMessage+ "
^input
match: STUnaryExpression
and: [input matchOneOrMore: STBinaryMessage action: [: r | r ]]
action: [:list |
self new
unaryExpression: list first
; binaryMessages: list second]Example 4: Parse nodes (STBinaryExpression,STUnaryExpression etc) as sub-parsers, showing the implementation of the STBinaryExpression sub-parser.
The next section provides an example of how to implement a complete grammar. Our example will be the parsing of expressions in the regex parser which is included with MEPS. MEPS regular expressions can integrated with MEPS as in example 1(c).
regex = branch [ '|' branch ]*
branch = piece+
piece = atom ['*' | '+' | '?']
atom = bracketExp | dot | any | parenExp | escape | specialChar
bracketExp = '[' (['^'] (range | any | escape | specialChar)+ ']'
range = anychar '-' anychar
dot = '.'
anychar = any character except . [ ] $ ^ ( ) | \
parenExp = '(' regex ')'
escape = '\' ascii-character
specialChar = '^' | '$'Example 5: Grammar of MEPS regular expressions.
Each production is represented by an object:
RxNode
Regex
RxAtom
RxBracket
RxCharacter
RxAny
RxDot
RxEscape
RxSpecialCharacter
RxParen
RxBranch
RxPiece
RxRangeExample 6: Class Hierarchy of the MEPS regular expression objects
Finally, here is the class-side parse for each node. (Note that these objects also implement #from:startingWith: on the instance side so that compiled expressions can be used as sub-parsers with MEPS.
Regex class from: input startingWith: s " regex = branch [ '|' branch ]* " ^input matchOneOrMore: RxBranch separatedBy: $| action: [:list | self new branches: list]
RxBranch class from: input startingWith: s " branch = piece+ " ^input matchOneOrMore: RxPiece action: [:list | self new pieces: list]
RxBracket class (see example 3)
RxRange class from: input startingWith: s " anychar = any character except . [ ] $ ^ ( ) | \ " ^input match: [(RxAny from: input) ifNotNilDo: [:f | f isCharacter ifTrue: [f]]] and: $- and: [(RxAny from: input) ifNotNilDo: [:t | t isCharacter ifTrue: [t]]] action: [:list | (self new) r1: list first ; r2: list last]
RxDot class from: input startingWith: c " dot = '.' " ^input match: $. action: [:x | RxDot new]
RxAny class from: input startingWith: c ^input match: [ input atEnd ifTrue:[^nil] . ('[]$^()|\' includes: c) ifFalse:[input next]] action:[ : r | self new characters: r asString]
RxParen class from: input startingWith: s " parenExp = '(' regex ')'" ^input match: $( and: Regex and: $) action: [:list | self new exp: list second]
RxEscape class from: input startingWith: s " escape = '\' ascii-character " ^input match: $\ and: [input atEnd ifFalse: [input next]] action: [:list | self new character: list second]
RxSpecialCharacter class from: input startingWith: s " specialChar = '^' | '$' " ^input match: $^ or: $$ action: [:c | self new character: c ]
Example 7: Class side parsing of regular expression nodes.