flexc++ - Generate a C++ scanner class and scanning function
SYNOPSIS
flexc++ [OPTIONS] [FILENAME]
DESCRIPTION
Flexc++ reads an input file containing regular expressions and their
associated actions, and generates a C++ scanner class whose lex() member
function performs pattern matching on text.
FORMAT OF THE INPUT FILE
The flexc++ input file consists of two (not three!) sections, ending at
lines containing %%.
options and definitions
%%
rules
%%
The final section delimiter (%%) is optional.
Note in particular that flexc++ no longer supports (nor needs)
a %header{ ... %} section.
GENERATED FILES
Flexc++ may generate the following files:
A file containing the implementation of the lex() function and its
support functions. By default, this file is called lex.cc.
A file containing the class definition of the scanner class. By
default, this file is called scanner.h.
A file containing the scanner class's base class. This class
defines, among other things, tokens for start conditions. By default,
this file is called scannerbase.h.
A file containing the implementation header. This file is used for
includes and declarations needed for implementing the members of the
scanner class. By default, this file is called scanner.ih.
OPTIONS
If available, single letter options are listed between parentheses
following their associated long-option variants. Single letter options require
arguments if their associated long options require arguments as well.
Options to set filenames
--baseclassheader=header (-b)
Use header as the pathname of the file containing the scanner
class's base class. Defaults to the name of the scanner class plus
the suffix base.h.
--classheader=header (-c)
Use header as the pathname of the file containing the scanner
class. Defaults to the name of the scanner class plus the suffix
.h.
--implementationheader=header (-i)
Use header as the pathname of the file containing the
implementation header. Defaults to the name of the generated
scanner class plus the suffix .ih. The implementation header
should contain all directives and declarations only used by
the implementations of the scanner's member functions. It is the
only header file that is included by the source file containing
lex()'s implementation. User-defined implementations of
other class members may follow the same convention, thus
concentrating in one header file all directives and declarations
that are required to compile the other source files belonging to
the scanner class.
--lexsource=source (-l)
Define source as the name of the source file containing the
scanner member function lex(). Defaults to lex.cc.
Skeleton options
--skeletondirectory=directory (-S)
Specifies the directory containing the skeleton files to use. This
option can be overridden by the specific skeleton-specifying
options (-B, -C, -I, and -L).
--baseclassskeleton=skeleton (-B)
Use skeleton as the pathname of the file containing the
skeleton of the scanner class's base class. Its filename defaults to
scannerbase.h.
--classskeleton=skeleton (-C)
Use skeleton as the pathname of the file containing the
skeleton of the scanner class. Its filename defaults to
scanner.h.
--implementationskeleton=skeleton (-I)
Use skeleton as the pathname of the file containing the
skeleton of the implementation header. Its filename defaults to
scanner.ih.
--lexskeleton=skeleton (-L)
Use skeleton as the pathname of the file containing the
lex() member function's skeleton. Its filename defaults to
lex.cc.
Options to force overwriting
--forceclassheader
By default the generated class header is not overwritten once it
has been created. This option can be used to force the
(re)writing of the file containing the scanner's class.
--forceimplementationheader
By default the generated implementation header is not overwritten
once it has been created. This option can be used to force the
(re)writing of the implementation header file.
Output options
--interactive
Generates an interactive scanner, suitable for interactive programs.
--debug (-d)
Provide lex() and its support functions with debugging code,
showing the actual scanning process on the standard output
stream. When included, the debugging output is active by default,
but its activity may be controlled using the setDebug(bool
on-off) member. Note that no #ifdef DEBUG macros are used
anymore. By rerunning flexc++ without the --debug option an
equivalent scanner is generated not containing the debugging
code.
--verbose (-V)
Output information on DFA-creation and code generation. Specify this
option multiple times for more output.
--dot
Forces flexc++ not to generate code but to produce graphviz-style
graphs on stdout. Produces an NFA graph by default.
--dfa
Together with --dot, this produces a DFA graph, instead of
an NFA.
Miscellaneous options
--version (-v)
Display flexc++'s version number and terminate.
--help (-h)
Write basic usage information to the standard output stream and
terminate.
Not (yet) implemented options
--showfilenames
Write the names of the files that are generated to the
standard error stream.
--namespace=namespace (-n)
Define the scanner base class, the scanner class and the scanner
implementations in the namespace namespace. By default
no namespace is defined. If this option is used the
implementation header will contain a commented-out using
namespace declaration for the requested namespace.
--lines (-l)
Put #line preprocessor directives in the file containing the
scanner's lex() function. By including this option the
compiler and debuggers will associate errors with lines in your
grammar specification file, rather than with the source file
containing the lex() function itself.
--nolines
Do not put #line preprocessor directives in the file containing
the scanner's lex() function. This option is primarily useful
in combination with the %lines directive, to suppress that
directive. It also overrides option --lines, though.
--nolexmember
Do not write the file containing the scanner's predefined scanner
member functions, even if that file doesn't yet exist. By default
the file containing the scanner's lex() member function is
(re)written each time flexc++ is called. Note that this option
should normally be avoided, as this file contains parsing
tables which are altered whenever the grammar definition is
modified.
DIRECTIVES
The following directives can be used in the initial section of the
grammar specification file. When command-line options for directives exist,
they overrule the corresponding directives given in the grammar
specification file.
Multiple options may be specified on the same line, like %option
classheader="scanner.h" classname="Scanner". Each line with options begins
with the %option directive. Examples show defaults unless indicating
otherwise.
%option classname="scanner-class-name"
Declares the name of this scanner. It defines
the name of the C++ class that will be generated.
Example: %option classname="Scanner"
%option lexfunctionname="lex-function-name"
Declares the name of the lex() function.
Example: %option lexfunctionname="lex"
%option baseclassheader="header"
Defines the pathname of the file containing the scanner
class's base class. This directive is overridden by the
--baseclassheader or -b command-line options.
Example: %option baseclassheader="scannerbase.h"
%option classheader="header"
Defines the pathname of the file containing the scanner
class. This directive is overridden by the
--classheader or -c command-line options.
%option implementationheader="header"
Defines the pathname of the file containing the implementation
header. This directive is overridden by the
--implementationheader or -i command-line options.
%option lexsource="source"
Defines the pathname of the file containing the scanner member
lex(). This directive is overridden by the
--lexsource or -l command-line options.
%option skeletondirectory="path"
Defines the path of the skeleton directory. This directive is
overridden by the --skeletondirectory or -S command-line
options.
%option streaminfoclassname="class-name"
Defines the class name of the StreamInfo class, used in stream
switching. See the user guide for more detail.
%option streaminfoinclude="header-file"
Defines the path of the header file containing the definition of
the StreamInfo class.
Not (yet) implemented
%debug
Provide lex() and its support functions with debugging code,
showing the actual scanning process on the standard output
stream. When included, the debugging output is active by default,
but its activity may be controlled using the setDebug(bool
on-off) member. Note that no #ifdef DEBUG macros are used
anymore. By rerunning flexc++ without the --debug option an
equivalent scanner is generated not containing the debugging
code.
%filenames header
Defines the generic name of all generated files, unless overridden
by specific names. This directive is overridden by the
--filenames or -f command-line options.
%lines
Put #line preprocessor directives in the file containing the
scanner's lex() function. It acts identically to the -l
command line option, and is suppressed by the --nolines
option.
%namespace namespace
Define the scanner class in the namespace namespace. By default
no namespace is defined. If this option is used the
implementation header will contain a commented-out using
namespace declaration for the requested namespace. This
directive is overridden by the --namespace command-line
option.
PUBLIC MEMBERS AND -TYPES
The following public members can be used by users of the scanner classes
generated by flexc++. The Scanner:: prefixes are silently implied:
Scanner():
The default constructor. Uses stdin by default, until another stream
is pushed on the stack.
Scanner(std::istream &in):
A constructor taking an istream to use as the default stream.
int lex():
The scanner's scanning member function. It returns either a
user-defined return-value or 0 if it reached EOF and no more streams
can be popped off the stack.
std::string const &match() const:
std::string const &text() const:
Both members return the last matched lexeme. They are analogous to the
YYText() function in flex.
size_t leng() const:
Returns the length of the last matched lexeme. It is analogous
to the value yyleng in flex.
size_t lineno() const:
Returns the number of lines matched so far. This is done per
stream, so after switching streams, the old number of lines is no
longer accessible with this member.
void pushStreamInfo(StreamInfo *si):
Pushes the current stream on the scanner's stream stack and
initializes the scanner's buffer with the new stream.
void popStreamInfo():
Pops a previously pushed stream off the scanner's stream stack and
resets the scanner's buffer to the popped info.
Called by default on EOF, unless the user provides an action
for the <<EOF>> pattern.
StreamBuffer const switchStream(StreamInfo *si):
Returns the current StreamBuffer and continues processing the
stream defined by the StreamInfo object. All bookkeeping with
regards to streams is now handled by the user.
void switchStream(StreamBuffer const &sb):
Reverts processing to a stream buffer saved previously by
switchStream's caller.
void setDebug(bool mode): (not implemented yet)
This member can be used to activate or deactivate the debug-code
compiled into the scanning function. It is available but has no
effect if no debug code has been compiled into the scanning
function. When debugging code has been compiled into the scanning
function, it is active by default, but debug-code is suppressed by
calling setDebug(false).
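The public members above are typically combined in a driver loop that
calls lex() until it returns 0. The sketch below illustrates that loop
under stated assumptions: StubScanner is a hypothetical stand-in, not code
generated by flexc++. It only mimics the documented lex(), match() and
leng() members by treating every whitespace-separated word as one token,
and the token value WORD is likewise invented for the illustration.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a generated scanner class. It is NOT flexc++
// output: it only mimics the documented interface by treating every
// whitespace-separated word as one token.
class StubScanner
{
    std::istream &d_in;
    std::string d_match;

    public:
        enum Token { WORD = 1 };        // assumed token value; 0 means EOF

        explicit StubScanner(std::istream &in)
        :
            d_in(in)
        {}

        int lex()                       // returns 0 once the input is exhausted
        {
            return (d_in >> d_match) ? WORD : 0;
        }

        std::string const &match() const
        {
            return d_match;
        }

        std::size_t leng() const
        {
            return d_match.length();
        }
};

// Driver loop: call lex() until it returns 0, collecting match() after
// every successful call.
std::vector<std::string> collectMatches(std::istream &in)
{
    StubScanner scanner(in);
    std::vector<std::string> matches;

    while (scanner.lex() != 0)
        matches.push_back(scanner.match());

    return matches;
}
```

With a real generated scanner the loop would have the same shape, but the
return values of lex() are whatever the rule actions return.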
The following public types are available:
StreamSwitching::StreamInfo:
Default class to be used when switching streams: opens files and
sets the line number to 1.
StreamSwitching::LineStreamInfo:
Alternative class for switching streams: opens files but makes
sure line numbering is continued from previous file. See the
flexc++ user guide for more information on switching streams.
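The stream stack described for pushStreamInfo() and popStreamInfo() can be
pictured with a small simulation. This is only a sketch of the idea: it
uses plain std::istringstream objects instead of StreamInfo, and the names
StreamStackSketch and expandIncludes as well as the `#include name' word
convention are assumptions made for the illustration, not part of
flexc++.

```cpp
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <stack>
#include <string>
#include <vector>

// Simulation only: a stack of input streams that is popped on end-of-file.
class StreamStackSketch
{
    std::stack<std::unique_ptr<std::istream>> d_streams;

    public:
        // Start reading from `text`; the previous stream stays on the stack.
        void push(std::string const &text)
        {
            d_streams.push(
                std::unique_ptr<std::istream>(new std::istringstream(text)));
        }

        // Read the next word. An exhausted stream is popped and reading
        // resumes in the stream below it; false means no stream is left.
        bool next(std::string &word)
        {
            while (!d_streams.empty())
            {
                if (*d_streams.top() >> word)
                    return true;
                d_streams.pop();
            }
            return false;
        }
};

// Collect all words of `mainText`, switching to the text registered under
// `name` in `files` whenever the words "#include name" are seen.
std::vector<std::string> expandIncludes(
    std::string const &mainText,
    std::map<std::string, std::string> const &files)
{
    StreamStackSketch streams;
    streams.push(mainText);

    std::vector<std::string> words;
    std::string word;

    while (streams.next(word))
    {
        if (word == "#include" && streams.next(word))
        {
            auto it = files.find(word);
            if (it != files.end())
                streams.push(it->second);   // continue in the included text
            continue;
        }
        words.push_back(word);
    }
    return words;
}
```

Once the included text is exhausted the simulation continues in the
interrupted stream, and it stops only when the last stream has been
popped, mirroring the description of lex() returning 0.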
PROTECTED MEMBER FUNCTIONS
The following members can be used in actions and other member functions:
begin(StartCondition startCondition):
Places the scanner in the corresponding start condition.
ECHO():
Inserts the current match into the scanner's output stream.
less(size_t n = 0):
Pushes all but the first n characters of the current token back to
the buffer. They will be rescanned when the scanner looks for the next
match. The current match will be adjusted appropriately (as such,
leng() will return n). For example, on the input `foobar' the
following will write out `foobarbar':
%%
foobar ECHO(); less(3);
[a-z]+ ECHO();
An argument of 0 to less() will cause the entire current input
string to be scanned again. Unless you've changed how the scanner will
subsequently process its input (using begin(), for example), this
will result in an endless loop.
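The effect of less(n) in the example above can be traced with a small
simulation. The code below is a sketch and not flexc++ code: ScanState,
simulateLess and replayLessExample are hypothetical names, and the two
rules of the example are hard-coded rather than compiled from a
specification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simulation only: the current match and the input still to be scanned.
struct ScanState
{
    std::string match;      // text of the current match
    std::string pending;    // input that has not been scanned yet
};

// Keep the first n characters of the match and push the remaining
// characters back in front of the pending input.
void simulateLess(ScanState &state, std::size_t n)
{
    assert(n <= state.match.length());
    state.pending.insert(0, state.match.substr(n));
    state.match.resize(n);
}

// Replays the two rules of the example on `input`:
//     foobar      ECHO(); less(3);
//     [a-z]+      ECHO();
std::string replayLessExample(std::string const &input)
{
    ScanState state{"", input};
    std::string output;

    while (!state.pending.empty())
    {
        // Length of the leading run of lowercase letters ([a-z]+).
        std::size_t run = 0;
        while (run < state.pending.length() &&
               state.pending[run] >= 'a' && state.pending[run] <= 'z')
            ++run;

        if (run == 0)               // no rule matches: stop the simulation
            break;

        // The longest match wins; on a tie the first rule is selected.
        bool firstRule =
            run == 6 && state.pending.compare(0, 6, "foobar") == 0;

        state.match = state.pending.substr(0, run);
        state.pending.erase(0, run);

        output += state.match;      // ECHO()
        if (firstRule)
            simulateLess(state, 3); // push "bar" back to the input
    }
    return output;
}
```

Calling replayLessExample("foobar") first echoes `foobar', pushes `bar'
back, and then echoes `bar' through the second rule.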
more():
Next time the scanner matches a rule, the corresponding token will be
appended to the current match. For example, when presented with the
input `foo-bar', the following will write `foo-foo-bar':
%%
foo- ECHO(); more();
bar ECHO();
The first `foo-' will be written to the output. Then `bar' is matched,
but appended to `foo-'. Therefore, the second ECHO() will output
`foo-bar', not `bar'.
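The bookkeeping behind more() can be sketched in the same way. The code
below is an illustration under assumptions, not flexc++ code: MatchBuffer
and replayMoreExample are hypothetical names, and the input is handed over
already split into the tokens matched by the two rules.

```cpp
#include <string>
#include <vector>

// Simulation only: mimics how more() makes the next token extend the
// current match instead of replacing it.
struct MatchBuffer
{
    std::string match;          // text that match() would return
    bool keep = false;          // set by more(): keep `match` for the next token

    void more()
    {
        keep = true;
    }

    // Called when a rule matches `token`: the token replaces the current
    // match or, after more(), is appended to it.
    void accept(std::string const &token)
    {
        if (keep)
            match += token;
        else
            match = token;
        keep = false;
    }
};

// Replays the example rules on input split into the tokens "foo-" and "bar":
//     foo-    ECHO(); more();
//     bar     ECHO();
std::string replayMoreExample(std::vector<std::string> const &tokens)
{
    MatchBuffer buffer;
    std::string output;

    for (std::string const &token : tokens)
    {
        buffer.accept(token);
        output += buffer.match;     // ECHO()
        if (token == "foo-")
            buffer.more();
    }
    return output;
}
```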
REGULAR EXPRESSIONS
The patterns in the input (see Rules Section of the flexc++ manual) are
written using an extended set of regular expressions. They are summarized
below. Characters x, y and z represent single characters, characters
s and r represent regular expressions.
x
match the character x;
.
any character (byte) except newline;
[xyz]
a character class; in this case, the pattern matches either an x, a
y, or a z;
[abj-oZ]
a character class containing a range; matches an a, a b,
any letter from j through o, or a Z;
[^A-Z]
a negated character class, i.e., any character but those in the
range following the caret. In this case, any character except an
uppercase letter;
[^A-Z\n]
any character except an uppercase letter or a newline;
r*
zero or more rs (note that r represents a regular expression);
r+
one or more rs;
r?
zero or one rs (i.e., an optional r);
{name}
the expansion of the name definition provided in the definition
section of the lexer specification file;
"[xyz]\"foo+"
the literal string: [xyz]"foo+;
\X
if X is a, b, f, n, r, t, or v, then the ANSI-C
interpretation of \X. Otherwise, a literal X (used to escape
regular expression operators like *, ? and |);
\0
a NUL character (ASCII code 0);
\123
the character with octal value 123;
\x2a
the character with hexadecimal value 2a;
(r)
matches r. Parentheses are used to override precedence (see
below);
rs
concatenation: the regular expression r followed by the regular
expression s;
r|s
either r or s;
r/s
trailing context: an r but only when followed by s. The text
matched by s is included when determining whether this rule is the
longest match, but is then returned to the input before executing its
associated action. So the action only sees the text matched by r.
Note that some combinations of r/s are incorrectly matched by flexc++
(cf. the flexc++ manual for details and examples).
^r
an r, but only at the beginning of a line (i.e., when just
starting to scan, or right after a newline has been scanned);
r$
an r, but only at the end of a line (i.e., just before a
newline). Equivalent to r/\n. Note that flexc++'s notion of
`newline' is equal to your C++ compiler's interpretation of a
\n character. E.g., on
some DOS systems you must either filter the carriage return
(\rs) from the input, or explicitly use r/\r\n instead of
r$.
<s>r
an r, but only in start condition (mini scanner) s;
<s1,s2,s3>r
an r, but only in start condition (mini scanner) s1, s2 or
s3;
<<EOF>>
an end-of-file condition;
<s1,s2><<EOF>>
an end-of-file when in start condition s1 or s2.
Note that within a character class specification all regular expression
operators lose their special meaning except for the escape character (\)
and the character class operators (-, ], and, at the beginning of the
character class specification, ^).
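Several of the patterns listed above have the same meaning in the
ECMAScript grammar of std::regex, which makes it possible to try them out
in plain C++. The sketch below does just that. It is an approximation
under assumptions: std::regex is a different engine than the matcher
generated by flexc++, the helper names matches and trailingContext are
hypothetical, and trailing context (r/s) is imitated with a lookahead,
which is not how flexc++ implements it.

```cpp
#include <regex>
#include <string>

// True when `text` as a whole matches `pattern`. std::regex (ECMAScript
// syntax) is used here as a stand-in for flexc++'s own matcher.
bool matches(std::string const &text, std::string const &pattern)
{
    return std::regex_match(text, std::regex(pattern));
}

// Imitates r/s at the start of `input`: returns the text matched by r when
// it is immediately followed by s, and an empty string otherwise. The text
// matched by s is not part of the returned match.
std::string trailingContext(std::string const &input,
                            std::string const &r, std::string const &s)
{
    std::smatch result;
    std::regex const re("^(?:" + r + ")(?=" + s + ")");

    return std::regex_search(input, result, re) ?
                result.str() : std::string();
}
```

For example, matches("k", "[abj-oZ]") holds because k lies in the range j
through o, and trailingContext("foobar", "foo", "bar") yields `foo' while
leaving `bar' unconsumed.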
PROTECTED ENUMS AND -TYPES
To do
PRIVATE MEMBER FUNCTIONS
To do
PROTECTED DATA MEMBERS
To do
TYPES AND VARIABLES IN THE ANONYMOUS NAMESPACE
To do
DIFFERENCES WITH FLEX(++)
To switch (mini)scanners, begin(miniscannername) is used rather
than BEGIN miniscannername.
The names of miniscanners remain defined as symbolic constants
(defined within the Scanner class); they are no longer handled through
#define directives.
To echo the matched text in a regular expression action, ECHO()
should be called. The ECHO macro is not supported anymore.
In general, YY-symbols are no longer used. Specific changes are
listed below:
User code can call match() or text() to obtain the last
matched text.
User code can call leng() to obtain the length of the last
matched text.
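The first two differences can be pictured with a short sketch in which the
start conditions are scoped constants of a class and are selected through
a protected begin() member. All names below (SketchBase, SketchScanner,
enterComment, leaveComment and the start condition comment) are
hypothetical and chosen for illustration only. The code is not what
flexc++ generates.

```cpp
// Hypothetical sketch, not flexc++ output: start conditions as scoped
// constants of a base class instead of global #define names.
class SketchBase
{
    public:
        enum class StartCondition { INITIAL, comment };

        StartCondition startCondition() const
        {
            return d_startCondition;
        }

    protected:
        // Places the scanner in the requested start condition.
        void begin(StartCondition startCondition)
        {
            d_startCondition = startCondition;
        }

    private:
        StartCondition d_startCondition = StartCondition::INITIAL;
};

class SketchScanner: public SketchBase
{
    public:
        // Stands in for a rule action that enters the comment miniscanner.
        void enterComment()
        {
            begin(StartCondition::comment);
        }

        // Stands in for a rule action that returns to the initial state.
        void leaveComment()
        {
            begin(StartCondition::INITIAL);
        }
};
```

Because the constants live inside the class, a name such as comment
cannot clash with identifiers elsewhere in the program, which is the
problem the #define approach had.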
OBSOLETE SYMBOLS
All DECLARATIONS and DEFINE symbols not listed above but defined
in flex++ are obsolete with flexc++. In particular, there is no
%header{ ... %} section anymore. Also, all DEFINE symbols related to
member functions are now obsolete. There is no need for these symbols anymore
as they can simply be declared in the class header file and defined elsewhere.
CODE BLOCKS
Flexc++ does not support code blocks, except for multi-line actions.
Code previously placed in code blocks can now be placed in methods.
USER CODE
Related to the CODE BLOCKS section, flexc++ does not support a last
section of the input file for user code.
This is an experimental version of Flexc++, very much under
development. There are still many open tickets and the reader is kindly
requested to consult the list of open tickets at
http://code.flexcpp.org/projects/flexcpp/ prior to filing a bug.
However, any bug filed against flexc++ is greatly appreciated and will
receive the authors' undivided attention.
ABOUT FLEXC++
Flexc++ was based on flex++(1), derived from flex(1).
Flexc++ is a complete rewrite of a lexical scanner generator, closely
following the theory of deterministic and non-deterministic automata as
described in Aho, Sethi and Ullman's (1986) book Compilers (i.e., the Dragon
book).
However, flex(1) variables, declarations that are obsolete and (of
course!) C-like macros were removed from flexc++ and replaced by
(member) functions. In particular, all primitive forms of name protection as
used by flex++ were replaced by state-of-the-art name protection
techniques, like class-embedding and using name spaces.
AUTHOR
Frank B. Brokken (f.b.brokken@rug.nl),
Jean-Paul van Oosten (jpoosten@ai.rug.nl),
Richard Berendsen (richardberendsen@xs4all.nl).