This is a course page of David Casperson
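Nested comments cannot be handled by a lexical analyzer built from a finite state machine, because matching arbitrarily deep nesting requires counting the open comments, and a finite state machine has only a fixed number of states. This has consequences for tools. For instance, Emacs might have a harder time colouring comments.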
On the other hand, if comments nest, a programmer can easily comment out any section of a program that doesn't contain a fragment of a comment. When comments don't nest, it is quite easy to make an error when attempting to comment out a long section of code: the first closing delimiter inside the section ends the comment early.
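The difference between the two rules can be sketched in a few lines of Python. This is only an illustration, assuming C-style /* and */ delimiters; the function name strip_comments and its nesting flag are made up for this example. The depth counter is exactly the piece that a finite state machine cannot provide for unbounded nesting.

```python
def strip_comments(text: str, nesting: bool) -> str:
    """Remove /* ... */ comments from text (illustrative sketch only).

    With nesting=True a depth counter tracks how many comments are open.
    With nesting=False the first */ closes the comment, as in C.
    """
    out = []
    depth = 0          # number of comments currently open
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*" and (depth == 0 or nesting):
            depth += 1             # open a comment (or a nested one)
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1             # close the innermost open comment
            i += 2
        else:
            if depth == 0:
                out.append(text[i])   # ordinary program text
            i += 1
    return "".join(out)


source = "a /* x /* y */ z */ b"
print(repr(strip_comments(source, nesting=True)))
print(repr(strip_comments(source, nesting=False)))
```

With nesting, the whole commented-out region disappears. Without nesting, the inner */ ends the comment, so the text after it leaks back into the program.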
One of the tricky things here is first deciding on the notation in which we are going to write a regular expression. The notation used here is quite loose and in the spirit of Flex.
    printing = a | b | c | ...   /* all printing characters. */
    escape   = a | b | f | n | r | t | v | ' | " | ? | ooo | xhhh

where ooo is a sequence of octal digits and xhhh stands for a sequence of hexadecimal digits.
    slash          = \
    lf             = /* the line ending sequence */
    string_literal = " ( {printing} | {slash} {escape} | {slash} {lf} ) * "

In this solution {, }, |, *, and ( and ) are meta-characters, as are the C-style comments. The rest of the characters are intended literally.
An actual regular expression can be obtained by substituting the previous definitions for the {...} sequences in the final definition.
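As a rough illustration of that substitution, the sketch below builds a Python pattern, with f-string fields standing in for the {...} references. It tightens the loose notation under a few assumptions that are not part of the solution above: printing is taken to mean any character other than ", \ and a line feed; \ itself is added to the list of escapes; ooo is limited to one to three octal digits; and lf is a single "\n".

```python
import re

# Assumption: "printing" excludes the quote, the backslash and the line feed.
PRINTING = r'[^"\\\n]'
# Assumption: backslash added as an escape; ooo is 1-3 octal digits.
ESCAPE = r'''(?:[abfnrtv'"?\\]|[0-7]{1,3}|x[0-9A-Fa-f]+)'''
SLASH = r'\\'
LF = r'\n'          # assumption: the line ending sequence is one "\n"

# Substitute the earlier definitions into the final one.
STRING_LITERAL = re.compile(
    rf'"(?:{PRINTING}|{SLASH}{ESCAPE}|{SLASH}{LF})*"'
)


def is_string_literal(text: str) -> bool:
    """Return True if the whole of text is one string literal."""
    return STRING_LITERAL.fullmatch(text) is not None


print(is_string_literal(r'"hello\n"'))   # escape sequence \n
print(is_string_literal(r'"a\qb"'))      # \q is not a listed escape
print(is_string_literal('"unterminated'))
```

Printing STRING_LITERAL.pattern shows the fully substituted expression, which is considerably harder to read than the named definitions.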
Here is another solution that builds a Finite State Machine first.
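For readers who want a feel for the machine-first approach, here is a minimal sketch of a finite state machine for string literals. It is not the solution referred to above; the state names and the function accepts_string_literal are invented for this example. It is also deliberately loose: after a backslash it only checks the first character of the escape, so any octal or hexadecimal digits that follow are read as ordinary characters, and a backslash is assumed to be a valid escape.

```python
# Characters that may follow a backslash (assumption: includes \ itself,
# the first octal digit of ooo, and the x that starts xhhh).
ESCAPE_STARTS = set("abfnrtv'\"?\\01234567x")


def accepts_string_literal(text: str) -> bool:
    """Run a four-state machine over text; accept only in state 'done'."""
    state = "start"
    for ch in text:
        if state == "start":
            state = "body" if ch == '"' else "reject"
        elif state == "body":
            if ch == '"':
                state = "done"        # closing quote
            elif ch == "\\":
                state = "slash"       # start of an escape or continuation
            elif ch == "\n":
                state = "reject"      # bare line feed inside the literal
            # any other character leaves the machine in "body"
        elif state == "slash":
            ok = ch in ESCAPE_STARTS or ch == "\n"
            state = "body" if ok else "reject"
        elif state == "done":
            state = "reject"          # nothing may follow the closing quote
        if state == "reject":
            return False
    return state == "done"


print(accepts_string_literal(r'"tab\there"'))
print(accepts_string_literal(r'"bad\qescape"'))
```

Because the machine needs only a fixed set of states and no counter, it can be converted to a regular expression, which is not possible for nested comments.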
fall-2024