Milestone: scanner and parser - note on lexer states

When implementing the WIG scanner, you may have to take into account that WIG has multiple so-called lexical scopes. You already encountered lexical scopes when implementing multi-line comments for JOOS: inside a comment, a different set of tokens and keywords applies than outside it. WIG has even more such scopes. Consider the following small WIG fragment:

service {
const html Compliment = <html> <body>
This is a <[fin]> great service, man!
</body> </html>;

const html Pledge = <html> <body>
What is your name?
<input name=name type="text" size=20>
</body> </html>;

string name; //name is an id here, although it is a keyword inside HTML tags
//inside HTML text, it's considered plain text

session Contribute() {

In this snippet I identified the following lexical scopes:
  • WIG syntax: here, words such as service, const and html are keywords; "name" is not a keyword.
  • HTML syntax:
    • is entered when <html> is scanned and left when </html> is scanned
    • unlike in WIG syntax, service, const, etc. are not keywords
    • > and < have a different meaning than in WIG syntax (although the scanner does not necessarily have to distinguish them)
  • HTML tags: here, input, name, etc. should be keywords so that the parser can recognize them specially.
  • Holes: only identifiers are allowed, but any identifier at all, including those that would be keywords in other scopes; e.g. <[html]> is valid.
  • HTML attribute values (the right-hand side of an attribute such as name=name): it may be useful to have another scope here so that, e.g., the value name is not recognized as a keyword.
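One way to picture the scopes above is that each one has its own keyword table. The Python sketch below is only an illustration: the `Scope` enum, the function `classify` and both keyword sets are hypothetical and cover just the words in the fragment, not the real WIG keyword list. It shows how the same lexeme, such as name, is classified differently depending on the current scope.

```python
from enum import Enum, auto


class Scope(Enum):
    """Hypothetical names for the lexical scopes listed above."""
    WIG = auto()
    HTML = auto()
    TAG = auto()
    HOLE = auto()
    ATTR_VALUE = auto()


# Illustrative keyword subsets taken from the fragment; not the full WIG grammar.
WIG_KEYWORDS = {"service", "const", "html", "string", "session"}
TAG_KEYWORDS = {"input", "name", "type", "size"}


def classify(scope: Scope, word: str) -> str:
    """Return 'keyword', 'identifier' or 'text' for a word seen in the given scope."""
    if scope is Scope.WIG:
        return "keyword" if word in WIG_KEYWORDS else "identifier"
    if scope is Scope.TAG:
        return "keyword" if word in TAG_KEYWORDS else "identifier"
    if scope in (Scope.HOLE, Scope.ATTR_VALUE):
        # Holes and attribute values accept any identifier, even words that
        # are keywords elsewhere (e.g. <[html]> or the value in name=name).
        return "identifier"
    # Plain HTML text: words such as "service" or "const" are just text.
    return "text"
```

For example, `classify(Scope.WIG, "name")` gives `"identifier"`, while `classify(Scope.TAG, "name")` gives `"keyword"`. In a real scanner the scope would be tracked by the scanner's own state rather than passed in by the caller.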

Can you think of other lexical scopes? What about HTML comments? Do those exist in the benchmarks? If so, can HTML comments be nested?

You can extend a Flex scanner with lexical scopes using so-called start conditions. Prefixing a regular expression with <c> means that it is only matched while the scanner is in state c. You switch to state c by calling BEGIN(c) in a rule's action. Start conditions are declared in the definitions section with %s (inclusive) or %x (exclusive), and the scanner starts in state INITIAL.
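To make the start-condition mechanism concrete without depending on Flex itself, the following Python sketch emulates it: every rule lists the states in which it is active, the longest match wins (ties go to the earlier rule, as in Flex), and an action can call `begin()`, the analogue of BEGIN(c). The class `Scanner`, the helper `make_wig_scanner`, the token kind names and the state names HTML and HOLE are all hypothetical, and the rules only cover the Compliment declaration from the fragment above, not full WIG.

```python
import re
from typing import Callable, List, Optional, Set, Tuple

Token = Tuple[str, str]                      # (kind, lexeme)
Action = Callable[[str], Optional[Token]]    # returns None to skip a lexeme

# Illustrative subset of WIG keywords (an assumption, not the full language).
WIG_KEYWORDS = {"service", "const", "html", "string", "session"}


class Scanner:
    """Emulates Flex start conditions: a rule is active only in the states it
    lists, and an action may call begin(state), the analogue of BEGIN(c)."""

    def __init__(self) -> None:
        self.state = "INITIAL"
        self.rules: List[Tuple[Set[str], re.Pattern, Action]] = []

    def rule(self, states: List[str], pattern: str, action: Action) -> None:
        self.rules.append((set(states), re.compile(pattern), action))

    def begin(self, state: str) -> None:
        self.state = state

    def scan(self, text: str) -> List[Token]:
        tokens: List[Token] = []
        pos = 0
        while pos < len(text):
            best = None  # (length, action, lexeme) of the best match so far
            for states, regex, action in self.rules:
                if self.state not in states:
                    continue
                m = regex.match(text, pos)
                # Longest match wins; on a tie the earlier rule is kept, as in Flex.
                if m and m.end() > pos and (best is None or m.end() - pos > best[0]):
                    best = (m.end() - pos, action, m.group())
            if best is None:
                raise ValueError(f"unexpected {text[pos]!r} in state {self.state}")
            length, action, lexeme = best
            token = action(lexeme)  # may switch state via begin()
            if token is not None:
                tokens.append(token)
            pos += length
        return tokens


def make_wig_scanner() -> Scanner:
    s = Scanner()
    ident = r"[A-Za-z_][A-Za-z0-9_]*"

    def enter(state: str, kind: str) -> Action:
        """Action that emits a token and then switches state, like BEGIN(state)."""
        def action(lexeme: str) -> Token:
            s.begin(state)
            return (kind, lexeme)
        return action

    # <INITIAL>: ordinary WIG syntax (whitespace is also skipped inside holes).
    s.rule(["INITIAL", "HOLE"], r"\s+", lambda lx: None)
    s.rule(["INITIAL"], r"<html>", enter("HTML", "HTML_START"))
    s.rule(["INITIAL"], ident,
           lambda lx: ("KEYWORD" if lx in WIG_KEYWORDS else "ID", lx))
    s.rule(["INITIAL"], r"[=;{}()]", lambda lx: ("PUNCT", lx))
    # <HTML>: tags, holes and plain text; WIG keywords mean nothing here.
    s.rule(["HTML"], r"</html>", enter("INITIAL", "HTML_END"))
    s.rule(["HTML"], r"<\[", enter("HOLE", "HOLE_START"))
    s.rule(["HTML"], r"</?[A-Za-z][^>]*>", lambda lx: ("TAG", lx))
    s.rule(["HTML"], r"[^<]+",
           lambda lx: ("TEXT", lx.strip()) if lx.strip() else None)
    # <HOLE>: any identifier is just an identifier, even "html".
    s.rule(["HOLE"], ident, lambda lx: ("ID", lx))
    s.rule(["HOLE"], r"\]>", enter("HTML", "HOLE_END"))
    return s
```

Scanning the Compliment declaration with `make_wig_scanner().scan(...)` makes service part of a TEXT token instead of a keyword, and fin an ID inside the hole. The lexeme </html> is matched with equal length by both the </html> rule and the generic tag rule; the earlier rule wins, and that is what switches the scanner back to INITIAL.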

SableCC supports a similar mechanism using so-called states (see pages 35 ff).

Maintained by Chris Pickett.