Week 2

Exercises on scanners and parsers

  1. Implement 3 different flex-based scanners that:

    1. Format text in 80-column blocks while removing multiple blanks, leading blanks, and trailing blanks.
    2. Remove all tags from an HTML document.
    3. Introduce plausible spelling errors.

    Provide a Makefile that includes a "check" target for testing the scanners that works with sample input and output files that you also provide.

  2. What do the following regular expressions match?

    1. \"([^\"])*\"
    2. http:[^?]*?
    3. [-+]?[0-9]*\.[0-9]+([eE][-+]?[0-9]+)?

  3. A question from a previous year said, "Read the flex documentation and discuss the full syntax for regular expressions." Since you could simply echo back the manual to answer that, instead explain what the following regular expressions do:

    1. (^|[ \t\r\n])((ftp|http|https|gopher|mailto|news|nntp|telnet|wais|file|prospero|aim|webcal):(([A-Za-z0-9$_.+!*(),;/?:@&~=-])|%[A-Fa-f0-9]{2}){2,}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*(),;/?:@&~=%-]*))?([A-Za-z0-9$_+!*();/?:~-]))
    2. (^|[ \t\r\n<;(])((([A-Za-z0-9$_.+%=-])|%[A-Fa-f0-9]{2})+@(([A-Za-z0-9$_.+!*,;/?:%&=-])|%[A-Fa-f0-9]{2})+\.[a-zA-Z0-9]{1,4})

    That is, identify what you think they are attempting to match, provide an English description of each component of the regular expression, and talk about any problems with respect to RFC specifications if you can spot or know about them.

  4. Invent a grammar for a small language that is amusing in some way. Amusing is defined as being able to make your instructor laugh. Provide the following:

    1. A BNF grammar
    2. An EBNF grammar for the same thing
    3. Railroad syntax diagrams
    4. An English description of the language and grammar.

    You should draw the syntax diagrams using rail.sty or some other automated tool. You won't fail the exercise if you draw them by hand, but since you'll need an automated tool later in the course you should really just learn how to do it now. If you want to see how they were typeset for your slides, look at the slides directory, in particular scanparse.tex and build_scanparse.sh. Cookies if you can write a working Makefile.

Maintained by Christopher J. F. Pickett Last modified Tue Sep 11 00:00:45 EDT 2007. [HOME]