Multiple input buffers for Flex
Some scanners (such as those that support "include" files) require reading from several input streams. Because flex scanners do a large amount of buffering, one cannot control where the next input will be read from simply by writing a YY_INPUT that is sensitive to the scanning context. YY_INPUT is called only when the scanner reaches the end of its buffer, which may be long after it has scanned a statement, such as an "include", that requires switching the input source.
To negotiate these sorts of problems, flex provides a mechanism for creating and switching between multiple input buffers. An input buffer is created by using:
YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
which takes a FILE pointer and a size and creates a buffer associated with the given file and large enough to hold size characters (when in doubt, use YY_BUF_SIZE for the size). It returns a YY_BUFFER_STATE handle, which may then be passed to other routines (see below). The YY_BUFFER_STATE type is a pointer to an opaque struct yy_buffer_state. You may therefore safely initialize YY_BUFFER_STATE variables to ((YY_BUFFER_STATE) 0) if you wish. You may also refer to the opaque structure to correctly declare input buffers in source files other than that of your scanner. Note that the FILE pointer in the call to yy_create_buffer() is used only as the value of yyin seen by YY_INPUT. If you redefine YY_INPUT so that it no longer uses yyin, you can safely pass a null FILE pointer to yy_create_buffer(). You select a particular buffer to scan from using:
void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
switches the scanner's input buffer so that subsequent tokens will come from new_buffer. Note that yy_switch_to_buffer() may be used by yywrap() to set things up for continued scanning, instead of opening a new file and pointing yyin at it. Note also that switching input sources via either yy_switch_to_buffer() or yywrap() does not change the start condition.
void yy_delete_buffer( YY_BUFFER_STATE buffer )
is used to reclaim the storage associated with a buffer. You can also clear the current contents of a buffer using:
void yy_flush_buffer( YY_BUFFER_STATE buffer )
This function discards the buffer's contents, so the next time the scanner attempts to match a token from the buffer, it will first fill the buffer anew using YY_INPUT.
yy_new_buffer() is an alias for yy_create_buffer(), provided for compatibility with the C++ use of new and delete for creating and destroying dynamic objects.
Finally, the YY_CURRENT_BUFFER macro returns a YY_BUFFER_STATE handle to the current buffer.
Here is an example of using these features to write a scanner that expands include files (the <<EOF>> feature is discussed below):
    /* the "incl" state is used for picking up the name
     * of an include file
     */
    %x incl

    %{
    #define MAX_INCLUDE_DEPTH 10
    YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
    int include_stack_ptr = 0;
    %}

    %%
    include             BEGIN(incl);

    [a-z]+              ECHO;
    [^a-z\n]*\n?        ECHO;

    <incl>[ \t]*      /* eat the whitespace */
    <incl>[^ \t\n]+   { /* got the include file name */
            if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
                {
                fprintf( stderr, "Includes nested too deeply\n" );
                exit( 1 );
                }

            include_stack[include_stack_ptr++] =
                YY_CURRENT_BUFFER;

            yyin = fopen( yytext, "r" );

            if ( ! yyin )
                error( ... );

            yy_switch_to_buffer(
                yy_create_buffer( yyin, YY_BUF_SIZE ) );

            BEGIN(INITIAL);
            }

    <<EOF>> {
            if ( --include_stack_ptr < 0 )
                {
                yyterminate();
                }

            else
                {
                yy_delete_buffer( YY_CURRENT_BUFFER );
                yy_switch_to_buffer(
                     include_stack[include_stack_ptr] );
                }
            }
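You can check the include-stack bookkeeping in the two actions above on its own, without running flex. The sketch below is a plain C model, not flex code. The struct fake_buffer type, the buf_handle typedef, the current_buffer variable (standing in for YY_CURRENT_BUFFER), and the helpers push_include() and pop_include() are all hypothetical names. The comments mark where the real actions call yy_switch_to_buffer() and yy_delete_buffer(). One deliberate difference: pop_include() tests for an empty stack before decrementing, so the counter never goes negative. The observable outcome is the same as in the example.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_INCLUDE_DEPTH 10

/* Hypothetical stand-in for a flex input buffer. In a real scanner the
   handles are YY_BUFFER_STATE values returned by yy_create_buffer(). */
struct fake_buffer { const char *name; };
typedef struct fake_buffer *buf_handle;

static buf_handle include_stack[MAX_INCLUDE_DEPTH];
static int include_stack_ptr = 0;
static buf_handle current_buffer;   /* plays the role of YY_CURRENT_BUFFER */

/* Models the <incl>[^ \t\n]+ action: save the current buffer, then make
   the new one current. Returns 0 on success, -1 if nesting is too deep. */
static int push_include(buf_handle next)
{
    assert(next != NULL);
    if (include_stack_ptr >= MAX_INCLUDE_DEPTH) {
        fprintf(stderr, "Includes nested too deeply\n");
        return -1;
    }
    include_stack[include_stack_ptr++] = current_buffer;
    current_buffer = next;   /* real action: yy_switch_to_buffer(yy_create_buffer(...)) */
    return 0;
}

/* Models the <<EOF>> action: returns 1 if scanning resumes in the saved
   buffer, or 0 when the outermost input is exhausted (yyterminate()). */
static int pop_include(void)
{
    if (include_stack_ptr == 0)
        return 0;
    /* real action: yy_delete_buffer(YY_CURRENT_BUFFER) happens here first */
    current_buffer = include_stack[--include_stack_ptr];  /* then yy_switch_to_buffer() */
    return 1;
}
```

As in the flex example, the stack holds the buffer to return to, not the one currently being read. That is why the end of the outermost file finds the stack empty and terminates the scan.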
Three routines are available for setting up input buffers for scanning in-memory strings instead of files. All of them create a new input buffer for scanning the string and return a corresponding YY_BUFFER_STATE handle, which you should delete with yy_delete_buffer() when done with it. They also switch to the new buffer using yy_switch_to_buffer(), so the next call to yylex() will start scanning the string.
yy_scan_string(const char *str)
- scans a NUL-terminated string.
yy_scan_bytes(const char *bytes, int len)
- scans len bytes (possibly including NULs) starting at location bytes.
Note that both of these functions create and scan a copy of the string or bytes. (This may be desirable, since yylex() modifies the contents of the buffer it is scanning.) You can avoid the copy by using:
yy_scan_buffer(char *base, yy_size_t size)
- scans in place the buffer starting at base, consisting of size bytes, the last two bytes of which must be YY_END_OF_BUFFER_CHAR (ASCII NUL). These last two bytes are not scanned; thus, scanning consists of base[0] through base[size-3], inclusive. If you fail to set up base in this manner (i.e., you forget the final two YY_END_OF_BUFFER_CHAR bytes), then yy_scan_buffer() returns a null pointer instead of creating a new input buffer.

The type yy_size_t is an integral type to which you can cast an integer expression reflecting the size of the buffer.
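yy_scan_buffer() rejects a buffer whose last two bytes are not YY_END_OF_BUFFER_CHAR. It is therefore convenient to build the buffer with those bytes already in place. The sketch below is a minimal illustration under a few assumptions. make_scan_buffer() is a hypothetical helper, not part of flex. YY_END_OF_BUFFER_CHAR is taken to be 0 (ASCII NUL, as stated above) and is defined only if no scanner header has already defined it. The size is plain size_t, because yy_size_t exists only inside a generated scanner.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* flex's generated scanner defines this as 0 (ASCII NUL); the fallback
   lets the sketch compile on its own. */
#ifndef YY_END_OF_BUFFER_CHAR
#define YY_END_OF_BUFFER_CHAR 0
#endif

/* Hypothetical helper: copies len bytes from src into a new heap buffer
   followed by two YY_END_OF_BUFFER_CHAR bytes, the layout yy_scan_buffer()
   requires. On success, *size_out receives len + 2, the value to pass as
   yy_scan_buffer()'s size argument. Returns NULL if allocation fails. */
static char *make_scan_buffer(const char *src, size_t len, size_t *size_out)
{
    assert(src != NULL && size_out != NULL);
    char *base = malloc(len + 2);
    if (base == NULL)
        return NULL;
    memcpy(base, src, len);                 /* embedded NULs are copied too */
    base[len] = YY_END_OF_BUFFER_CHAR;      /* first end-of-buffer marker */
    base[len + 1] = YY_END_OF_BUFFER_CHAR;  /* second end-of-buffer marker */
    *size_out = len + 2;
    return base;
}
```

A caller would then pass base and size to yy_scan_buffer(), check the returned handle against null, scan, and finally call yy_delete_buffer() on the handle. Because the buffer is scanned in place rather than copied, it must stay allocated until then. Its contents may also be modified during scanning. Flex does not treat such a buffer as its own, so freeing base after yy_delete_buffer() remains the caller's job.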