[code notes] ecpg precompiler 1

This note walks through the workflow of parse.pl in the ecpg precompiler. To run the script from src/interfaces/ecpg/preproc:

perl parse.pl . ../../../backend/parser/gram.y

workflow

  • load ecpg.addons into an in-memory hash table. Each key is the concatenation, with no delimiter, of the symbols in a production rule (for example, a rule "ruleA: tokenA tokenB tokenC" yields the key ruleAtokenAtokenBtokenC). Each value is itself a hash table with two keys, type and lines. The type is one of block, rule, or addon. The lines value holds the code that follows the addon's definition line. For more detail, see src/interfaces/ecpg/preproc/README.parser.
    • For the block type, the attached code completely replaces the automatically generated semantic action.
    • For the rule type, the attached new rules are appended directly to the original rule.
    • For the addon type, the attached code is prepended to the automatically generated semantic action.
  • read the gram.y file line by line until end of file
    • split the line on whitespace into an array of words
    • if not yet loaded, load the contents of ecpg.tokens into the memory buffer tagged tokens
    • if not yet loaded, load the contents of ecpg.header into the memory buffer tagged header
    • if not yet loaded, load the contents of ecpg.type into the memory buffer tagged ecpgtype
    • for each token declaration line in gram.y (%token, %nonassoc, %left, and so on), leave out any token type, add each remaining word to the tokens set, and rejoin the words with single spaces. For the declaration %nonassoc IDENT, append one more declaration, %nonassoc CSTRING. Finally, add the rejoined line to the memory buffer tagged orig_tokens.
    • skip all other lines until the grammar rules section of the bison file is reached
    • read each rule up to the rule delimiter ';'. While doing so, skip the semantic actions and look only at the rule symbols.
    • if the rule symbol is in the replace_token hash table, replace it with the mapped symbol.
    • if the rule symbol is a non-terminal symbol,
      • and it is not defined in the replace_types hash table, set its type to str and mark this rule as 'copymode'.
      • and it is marked as ignored, move on to the next line.
      • append this non-terminal symbol to the memory buffer tagged 'rules'
      • if the non-terminal symbol is stmt, remember that we are in the stmt rule.
      • declare the type of the non-terminal symbol, for example %type <str> stmt, and append the declaration to the memory buffer tagged 'types'.
      • remember that we are now inside a rule and will process its remaining fields
    • if the rule symbol is '%prec', record that state
    • if we are in 'copymode', no '%prec' has been seen, and we are processing the remaining fields,
      • if either of the following conditions is met:
        • the symbol is not 'Op', and it is either in the tokens set or a single-quoted string
        • we are in the stmt rule
          then look up the symbol in the replace_string hash table and use the mapped string as the target, or the symbol itself if there is no mapping. Push the target string onto the fields array, lowercasing it first unless we are in the stmt rule.
      • otherwise, push $n onto the fields array, where n is one plus the current length of the fields array.
  • dump the memory buffers in the following order:
    • header
    • tokens
    • types
    • ecpgtype
    • orig_tokens
    • rules
    • trailer
      The trailer buffer is loaded with the contents of the ecpg.trailer file.
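
To make two of the steps above more concrete, here is a minimal Python sketch of how an ecpg.addons key could be formed and how the fields array could be filled in 'copymode'. This is not the code of parse.pl (which is written in Perl). The function names addon_key and build_fields, their parameters, and the sample contents of tokens and replace_string are all hypothetical and chosen only for illustration.

```python
import re


def addon_key(rule_name, symbols):
    """Concatenate the rule name and its symbols with no delimiter."""
    return rule_name + "".join(symbols)


def build_fields(symbols, tokens, replace_string, in_stmt_rule=False):
    """Fill the fields array for one rule alternative in 'copymode'.

    symbols        -- the symbols on the right-hand side of the rule
    tokens         -- set of token names collected from the declarations
    replace_string -- mapping from a symbol to the string that replaces it
    in_stmt_rule   -- True while the stmt rule is being processed
    """
    fields = []
    for sym in symbols:
        # A symbol is copied as text if it is a known token or a
        # single-quoted string, with 'Op' as the one exception.
        quoted = re.fullmatch(r"'.+'", sym) is not None
        is_text = sym != "Op" and (sym in tokens or quoted)
        if is_text or in_stmt_rule:
            target = replace_string.get(sym, sym)
            # Lowercase the target unless we are in the stmt rule.
            fields.append(target if in_stmt_rule else target.lower())
        else:
            # Otherwise refer to the symbol by position: $1, $2, ...
            fields.append("$" + str(len(fields) + 1))
    return fields


# Hypothetical sample data, not taken from gram.y.
sample_tokens = {"CREATE", "TABLE"}
sample_replace = {"';'": ";"}

print(addon_key("ruleA", ["tokenA", "tokenB", "tokenC"]))
print(build_fields(["CREATE", "opt_temp", "TABLE", "';'"],
                   sample_tokens, sample_replace))
```

With this sample input, the known tokens and the quoted string are copied as lowercased text, while the non-terminal opt_temp becomes a positional reference, so the second call returns ['create', '$2', 'table', ';'].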

posted on 2024-04-19 11:35  winter-loo
