MEPS

MEPS stands for Matching Extensions for PositionableStream. You can use this package to translate a BNF grammar into an equivalent recursive descent parser or scanner.

MEPS is simple enough for one-liners yet capable enough for complete parsers or scanners. MEPS can be downloaded here or from SqueakMap.

Examples.

(a) 'abc' readStream match: 'ab' action:[:r | r] "returns 'ab' "

(b)  STMethod class
      from: input startingWith: start
      "method = messagePattern [temps][primitiveNumber][statements]['.']"
      ^input
           match: STMessagePattern
	andMaybe: STTemporaries
	andMaybe: STPrimitiveNumber
	andMaybe: STStatements
	andMaybe: [input matchToken: '.']
	action:   [:list |
		    self new
		       messagePattern: list first
		    ; temporaries: list second
		    ; primitiveNumber: list third
		    ; statements: list fourth]
		    
(c) rcptCommand
	match: '[Tt][Oo]:\s*<\s*' regex
	and: EmailAddress
	and: '\s*>\s*' regex
	action: [:r | r second]		    

Example 1: (a) One-liner. (b) Method from a Smalltalk parser. (c) Combining regular expressions with MEPS. (using additional MEPS regex package).

Syntax.

The receiver in a MEPS method is a PositionableStream. The stream may contain whatever objects are expected by the sub-parsers specified in the match arguments, such as characters in example 1(a), or tokens for a parser, as in example 1(b).

The selector consists of one or more matching keywords followed by an action keyword. There are over 60 different selectors which cover a wide variety of parsing situations.

The match arguments are sub-parsers. A sub-parser is any object which implements the method #from:startingWith:. A sub-parser can be as simple as a single character or string, or a block containing other MEPS statements, or a class such as STMessagePattern in example 1(b). The action object is a block or other evaluable.

MEPS double dispatches #from:startingWith: to each sub-parser, passing the stream and current element. If the result is not nil (ie the sub-parse was successful) then the method returns the evaluation of the action block with the parse result. Otherwise the stream is rewound, the action block is not called, and the return value is nil.

(a) 'abc' readStream match: $a and: $b action: [:r | r]

(b) 'aaa' readstream match: $a and: $b action: [:r | r]

Example 2: (a) parse succeeds. (b) parse fails

2(a) is evaluated as follows:

  1. The argument $a is sent the #from:startingWith: message.
  2. Character>>#from:startingWith: determines this is a match and returns the $a on the stream to the match:and:action method. The stream is now points to the $b.
  3. The argument $b is treated the same way and the stream now points to the $c
  4. The action block is evaluated with the orderedCollection ($a $b).
  5. The result of the call is the evaluation of the block, the array #($a $b).

2(b) is evaluated as as follows:

  1. The argument $a is sent the #from:startingWith: message.
  2. String>>#from:startingWith: matches and the stream is advanced.
  3. The argument $b is sent the #from:startingWith: message and returns nil
  4. The stream is rewound to position 0.
  5. The method returns nil without calling the action block.

Blocks as sub-parsers.

Using blocks as sub-parsers allows complex BNF rules to be represented literally. The following example is taken from the MEPS regex parser.

RxBracket class method
from: input startingWith: s 
"bracketExp = '[' ['^'] (range | any | escape | specialChar)+ ']' "
^input 
   match: $[
   and:[input 
          maybeMatch: $^
          and:
	    [input 
	       matchOneOrMore: 
                 [input
                    match: RxRange
                    or: RxAny
                    or: RxEscape
                    or: RxSpecialCharacter
                    action: [:r|r]]
               action:[:r|r]] 
	  action: [:r|r]]
   and: $]
   action: [:list | 
     self new
       negate: list second first notNil
     ; list: list second second]
"
'[a]'         regex << 'a'
'[^a]'        regex << 'a'
'[ab-c]'      regex << 'c'
'[a-z0-9]'    regex << '3'
'[a-z]+[A-Z]' regex << 'ababaB'
"

Example 3: Nested Blocks

Classes as sub-parsers.

Using classes as sub-parsers allows a parse to be implemented as a set of parse nodes which know how to deserialize themselves from a stream, rather than having a centralized parser class or generator.

STBinaryExpression class

from: input startingWith: start
" binaryExp = unaryExp binaryMessage+ "
^input 
   match: STUnaryExpression
   and:   [input matchOneOrMore: STBinaryMessage action: [: r | r ]]
   action: [:list | 
      self new
         unaryExpression: list first
      ;  binaryMessages: list second]

Example 4: Parse nodes (STBinaryExpression,STUnaryExpression etc) as sub-parsers, showing the implementation of the STBinaryExpression sub-parser.

MEPS Regular Expressions.

The next section provides an example of how to implement a complete grammar. Our example will be the parsing of expressions in the regex parser which is included with MEPS. MEPS regular expressions can integrated with MEPS as in example 1(c).

regex 		= branch [ '|' branch ]*
branch 		= piece+
piece 		= atom ['*' | '+' | '?']
atom 		= bracketExp | dot | any | parenExp | escape | specialChar
bracketExp 	= '[' (['^'] (range | any | escape | specialChar)+ ']'
range		= anychar '-' anychar
dot 		= '.'
anychar		= any character except . [ ] $ ^ ( ) | \
parenExp 	= '(' regex ')'
escape 		= '\' ascii-character
specialChar 	= '^' | '$'

Example 5: Grammar of MEPS regular expressions.

Each production is represented by an object:

RxNode
   Regex
   RxAtom
      RxBracket
      RxCharacter
         RxAny
	 RxDot
	 RxEscape
	 RxSpecialCharacter
      RxParen
   RxBranch
   RxPiece
   RxRange

Example 6: Class Hierarchy of the MEPS regular expression objects

Finally, here is the class-side parse for each node. (Note that these objects also implement #from:startingWith: on the instance side so that compiled expressions can be used as sub-parsers with MEPS.


Regex class 
from: input startingWith: s 

" regex = branch [ '|' branch ]* "
   ^input 
	matchOneOrMore: RxBranch
	separatedBy: $|
	action: [:list | self new branches: list]

RxBranch class from: input startingWith: s " branch = piece+ " ^input matchOneOrMore: RxPiece action: [:list | self new pieces: list]
RxBracket class (see example 3)
RxRange class from: input startingWith: s " anychar = any character except . [ ] $ ^ ( ) | \ " ^input match: [(RxAny from: input) ifNotNilDo: [:f | f isCharacter ifTrue: [f]]] and: $- and: [(RxAny from: input) ifNotNilDo: [:t | t isCharacter ifTrue: [t]]] action: [:list | (self new) r1: list first ; r2: list last]
RxDot class from: input startingWith: c " dot = '.' " ^input match: $. action: [:x | RxDot new]
RxAny class from: input startingWith: c ^input match: [ input atEnd ifTrue:[^nil] . ('[]$^()|\' includes: c) ifFalse:[input next]] action:[ : r | self new characters: r asString]
RxParen class from: input startingWith: s " parenExp = '(' regex ')'" ^input match: $( and: Regex and: $) action: [:list | self new exp: list second]
RxEscape class from: input startingWith: s " escape = '\' ascii-character " ^input match: $\ and: [input atEnd ifFalse: [input next]] action: [:list | self new character: list second]
RxSpecialCharacter class from: input startingWith: s " specialChar = '^' | '$' " ^input match: $^ or: $$ action: [:c | self new character: c ]

Example 7: Class side parsing of regular expression nodes.