This is a course page of David Casperson
Nested comments cannot be handled by a lexical analyzer built from a finite state machine, because matching arbitrarily deep nesting requires counting, which a fixed, finite number of states cannot do. This has consequences for tools: for instance, Emacs might have a harder time colouring comments.
On the other hand, if comments nest, a programmer can easily comment out any section of a program that doesn't explicitly contain a fragment of a comment. When comments don't nest, it is quite easy to make an error when attempting to comment out a long section of code, because the first comment terminator inside the section ends the enclosing comment early.
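To make the difference concrete, here is a minimal Python sketch of a comment stripper. The function name `strip_comments` and its `nest` flag are hypothetical, and the sketch deliberately ignores string literals. With `nest=True` it keeps a depth counter, which is exactly the piece a finite state machine cannot provide for unbounded depth; with `nest=False` it behaves like C, where the first terminator ends the comment.

```python
def strip_comments(source: str, nest: bool = True) -> str:
    """Remove /* ... */ comments; if nest is True, comments may nest."""
    open_tok, close_tok = "/*", "*/"
    out = []
    depth = 0  # number of comments currently open
    i = 0
    while i < len(source):
        if source.startswith(open_tok, i) and (nest or depth == 0):
            depth += 1
            i += len(open_tok)
        elif depth > 0 and source.startswith(close_tok, i):
            depth -= 1
            i += len(close_tok)
        else:
            if depth == 0:
                out.append(source[i])  # ordinary program text
            i += 1
    if depth != 0:
        raise ValueError("unterminated comment")
    return "".join(out)


code = "a /* b /* c */ d */ e"
print(repr(strip_comments(code, nest=True)))   # whole region removed
print(repr(strip_comments(code, nest=False)))  # comment ends at the first */
```

In the non-nesting case the text ` d */ e` leaks back into the program, which is the kind of error the paragraph above describes.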
One of the tricky things here is first deciding on the notation in which we are going to write a regular expression. The answer given here is quite loose and in the spirit of Flex.
printing = a | b | c | ... /* all printing characters other than " and \ */
escape = a | b | f | n | r | t | v | ' | " | ? | \
| ooo | xhhh
where ooo is a sequence of one to three octal digits and
xhhh stands for x followed by a sequence of hexadecimal digits.
slash = \
lf = /* the line ending sequence */
string_literal = " ( {printing} | {slash} {escape} | {slash} {lf} ) * "
In this solution {, }, |, *, ( and ) are meta-characters, as are the C-style comments. The rest of the characters are intended literally.
An actual regular expression can be obtained by substituting the previous definitions for the {...} sequences in the final definition.
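As a rough illustration of that substitution, the following Python sketch builds one concrete pattern with the standard `re` module. The concrete choices are assumptions, not part of the solution above: `PRINTING` is taken to be any character other than a double quote, a backslash, or a newline; the escape list also includes the backslash itself; and `LF` accepts either `\n` or `\r\n`.

```python
import re

# Assumed concrete versions of the loose definitions.
PRINTING = r'[^"\\\n]'                                        # printing
ESCAPE = r'''(?:[abfnrtv'"?\\]|[0-7]{1,3}|x[0-9A-Fa-f]+)'''   # escape
SLASH = r'\\'                                                 # slash
LF = r'\r?\n'                                                 # lf

# Substitute the definitions into the final one, as {name} does in Flex.
STRING_LITERAL = f'"(?:{PRINTING}|{SLASH}{ESCAPE}|{SLASH}{LF})*"'
pattern = re.compile(STRING_LITERAL)

for candidate in ['"hello"', r'"a\"b"', r'"a\qb"', '"unterminated']:
    print(candidate, bool(pattern.fullmatch(candidate)))
```

Using `fullmatch` checks that the whole candidate is a single string literal; a real lexer would instead match at the current input position.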
Here is another solution that builds a Finite State Machine first.
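For comparison, one way such a machine might look is sketched below in Python. This is not necessarily the machine used in the course solution: the state names (`START`, `BODY`, `ESC`, `HEX`, `DONE`, `DEAD`) and the function `accepts_string_literal` are hypothetical, the line ending is assumed to be a single `\n`, and octal and hexadecimal escapes are treated loosely (only the first digit is checked, since any further digits are ordinary printing characters).

```python
SIMPLE_ESCAPES = set("abfnrtv'\"?\\")
OCTAL = set("01234567")
HEX = set("0123456789abcdefABCDEF")


def accepts_string_literal(text: str) -> bool:
    """Run a hand-written finite state machine over text."""
    state = "START"
    for ch in text:
        if state == "START":
            state = "BODY" if ch == '"' else "DEAD"
        elif state == "BODY":
            if ch == '"':
                state = "DONE"      # closing quote
            elif ch == "\\":
                state = "ESC"       # start of an escape sequence
            elif ch == "\n":
                state = "DEAD"      # raw newline inside a literal
            # any other character stays in BODY
        elif state == "ESC":
            if ch in SIMPLE_ESCAPES or ch in OCTAL or ch == "\n":
                state = "BODY"
            elif ch == "x":
                state = "HEX"       # need at least one hex digit
            else:
                state = "DEAD"
        elif state == "HEX":
            state = "BODY" if ch in HEX else "DEAD"
        else:
            state = "DEAD"          # input after DONE, or already DEAD
    return state == "DONE"


for candidate in ['"hello"', r'"a\"b"', r'"\x"', '"abc']:
    print(candidate, accepts_string_literal(candidate))
```

A regular expression can then be read off the machine by describing the paths from `START` to `DONE`, with the loop through `ESC` and `HEX` becoming the starred alternatives.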
fall-2024