In flex it is possible to provide initialising code in the definition section (see section 4.1) and as the first lines in the rules section.
Flexc++ does not support these code blocks. Since flexc++ generates a class with appropriate header files, there are other means to include code in your scanner. See also section 3.3 (generated files) below.
Flexc++ also does not support a last `user code' section, where additional code can be placed to be copied verbatim to the source file. A second section delimiter (%%) is therefore considered a syntax error.
There are two reasons for dropping support of these code blocks. First, the format of the lexer file becomes simpler. Second, the alternatives to the code blocks are actually preferred. With flex one would use code blocks before the rules to declare local variables that are used in some of the actions. With flexc++ it is possible to use data members of the scanner class for this. With flex the third section of the lexer file could be used to define helper functions. With flexc++ helper methods may be defined in the scanner class. Below we list the differences between flex and flexc++. We provide suggestions for flexc++ solutions to problems that were addressed by flex features that we no longer support.
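As a rough illustration of the idea above, the following sketch shows a data member taking the place of a variable declared in a code block before the rules, and a private member function taking the place of a helper defined in the user code section. The class below is a hand-written stand-in, not the class flexc++ generates; the names `d_wordCount`, `scan`, `wordCount` and `isWordChar` are hypothetical and chosen only for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a scanner class; NOT the class flexc++ generates.
class Scanner
{
    // Takes the place of a variable declared in a flex code block
    // before the rules: state shared by several actions.
    std::size_t d_wordCount = 0;

    public:
        // Stand-in for the scanning loop: counts words in input.
        void scan(std::string const &input);

        std::size_t wordCount() const
        {
            return d_wordCount;
        }

    private:
        // Takes the place of a helper function that flex users would
        // put in the third (user code) section.
        static bool isWordChar(char ch);
};

bool Scanner::isWordChar(char ch)
{
    return std::isalnum(static_cast<unsigned char>(ch)) != 0;
}

void Scanner::scan(std::string const &input)
{
    bool inWord = false;

    for (char ch : input)
    {
        bool wordChar = isWordChar(ch);

        if (wordChar && !inWord)    // a new word starts here
            ++d_wordCount;

        inWord = wordChar;
    }
}
```

Because the counter and the helper belong to the class, they are reachable from every member function without relying on code copied verbatim into a generated source file.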
Sections 3.1.1, 3.1.2 and 3.1.3 list the items that are no longer supported in flexc++, each with a suggested solution.
%top block, copies code to top of yylex.cc.
`%{ ... %}' blocks copied verbatim to yylex.cc
`/* ... */') is copied to yylex.cc
`%{ ... %}' blocks before first rule copied to the top of the function body of yylex.cc
`%{ ... %}' blocks are copied to the output, but meaning is ill-defined and compiler errors may result.
The most important difference in behaviour with regard to patterns is
the fact that flexc++ can actually match the empty string. For example,
the regular expression a* matches zero or more a's. Thus, it
should also match the empty string. In flex, it does not. Currently,
in flexc++, it does. This means that a lexer file with only the pattern
a* in it, and a 'b' on the input will cause flexc++ to loop forever.
Therefore, care must be taken with patterns that match the empty string.
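To see why an empty match makes the scanner loop, consider the following minimal simulation. It is a hand-written sketch, not flexc++'s matching algorithm: `matchAStar` and `scanSteps` are hypothetical helpers, and the `maxSteps` cap exists only to keep the example finite.

```cpp
#include <cstddef>
#include <string>

// Length of the longest prefix of input[pos..] matched by the pattern a*.
std::size_t matchAStar(std::string const &input, std::size_t pos)
{
    std::size_t length = 0;

    while (pos + length < input.size() && input[pos + length] == 'a')
        ++length;

    return length;      // may be 0: a* also matches the empty string
}

// Simulated scanner loop, capped at maxSteps so the sketch terminates.
// Returns the number of matches made before the input was consumed
// or the cap was reached.
std::size_t scanSteps(std::string const &input, std::size_t maxSteps)
{
    std::size_t pos = 0;
    std::size_t steps = 0;

    while (pos < input.size() && steps < maxSteps)
    {
        pos += matchAStar(input, pos);  // an empty match leaves pos unchanged
        ++steps;
    }

    return steps;
}
```

With the input "aaa" the loop finishes after a single match. With the input "b" every match has length zero, the position never advances, and only the cap stops the loop.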
Not all patterns that are supported by flex are yet supported by flexc++. Notably, flexc++ does not yet support certain flags in regular expressions, e.g. a flag that makes the regular expression case insensitive, or a flag that allows whitespace in a regular expression.
Another small difference in the patterns is that a named pattern, defined in the definition section, may not contain the lookahead operator (`/') or the begin anchor operator (`^'). That is because we treat the name expansion as a group when it appears in a pattern in the rules section. Since a group may occur any number of times in a regular expression, whereas a lookahead operator or a begin anchor operator may occur only once, we do not accept these operators in a name definition.
Flexc++ generates more files than flex does. While flex only generates a
lex.yy.cc, flexc++ generates a number of header files and a source file:
scanner.h, scanner.ih, scannerbase.h, and lex.cc. Both
scannerbase.h and lex.cc are overwritten when flexc++ is invoked.
The other two files (scanner.h and scanner.ih) are only generated the
first time flexc++ is called, so you can add your own code to them. For
instance, if your actions use names from a namespace, a using namespace
directive can be added to scanner.ih.
Since C++ supports the concept of namespaces, the yy-prefix for every
member and macro is no longer needed. Most functions can now be used without
the prefix. Also, because flexc++ generates a scanner class, member functions
of the scanner class can often be used where flex uses macros. See the
conversion table below.
| flex | flexc++ | flexc++ alternative |
| yylex() | lex() | |
| YYText() | text() | match() |
| YYLeng() | leng() | |
| ECHO | ECHO() | |
| yymore() | more() | |
| yyless() | less() | |
| BEGIN startcondition | begin(startcondition) | |
| YY_AT_BOL | atBol() | |
| yy_set_bol(at_bol) | setBol(bool atBol) | |
The member functions in the flexc++ column above are members of either
Scanner or one of its base classes. Also note that flexc++ no longer uses
macros, and that all member functions can be used from either actions or
other member functions.
Flex also offers macros that flexc++ no longer supports. We list them here, along with their purpose and suggestions for alternative solutions with flexc++.
YY_USER_ACTION and yy_act
YY_NUM_RULES
YY_USER_INIT
yy_set_interactive
YY_BREAK
break statement in the switch statement that
contains all actions. It is suggested that C++ programmers might
redefine it to nothing, and manually make sure that all actions in
the lexer have either a break or a return statement. The reason
being that otherwise the compiler might give a warning that the
break statement is unreachable because of the return statement
above it.
g++ -pedantic. We would be glad to hear if
you experience problems.
Flexc++ differs completely from flex in how it handles multiple input streams. The method for stream switching is described in full in section 9.