Software Secret Weapons™
|
From LEXYACC To AST by Pavel Simakov on 2006-03-24 00:10:43 under Code Linguine, view comments |
||||
|
The grammars in Code Linguine package were built using modification to Version 4.1 of TPLY (Turbo Pascal Lex and Yacc), a compiler generator for Turbo Pascal and compatibles. Normally Lex/Yacc produces a list of tokens for an input file. However, we needed to produce an abstract syntax tree (AST) for the input file, not just a list. The key difference is that TPLY does not produce trees. Our modifications to TPLY convert list of tokens into trees with the help of frames. A "frame" is a container for a group of tokens. Frames can be directly converted into AST trees. A simple example follows to illustrate conversion of token list into frames followed by conversion of frames into AST tree. Here is the Pascal source file:
unit Test;
interface
uses A,B;
implementation
end.
Here is list of non-whitespace tokens normally produced by TPLY:
0=unit
1=Test
2=;
3=interface
4=uses
5=A
6=,
7=B
8=;
9=implementation
10=end
11=.
Here is list of rule completion frames. Each frame specifies start and end token for the corresponding YACC rule followed by the rule id:
[1, 1]53
[0, 2]52
[5, 5]20
[5, 5]19
[7, 7]20
[5, 7]18
[4, 8]17
[4, 4]66
[3, 4]54
[9, 9]16
[9, 9]68
[9, 9]55
[10, 10]56
[0, 11]51
[0, 0]2
Here is the token enclosure frames derived from the frames.
0----|
1]-53|-52-----------------------|
2----| |
3-------------------------| |
4----------------| |-54 |
5]-20]-19----| | | |
6 |-18|-17]-66-| |-51]-2
7]-20--------| | |
8----------------| |
9]-16]-68 |
10]-56 |
11-------------------------------|
Our modifications were quite complex algorithmically, but overall did not slow down parsing. Parsing is still being done in a single pass in LLR(0) and LLR(1) style with no backtracking. It turns out that for any completed YACC rule around the token (X, Y) the frame must be formed not around tokens X and Y. It must be formed around all the tokens that are enclosed by any YACC rule that encloses X and Y. The frame containment must be resolved while parsing is in progress as YACC enclosures are being found in the source file. The obtained AST trees can be saved as XML or in any other format as desired. XML format gives an advantage of XSLT query and ease of parsing. For the complete details and full source code please look inside Code Linguine. |
|
||||
|
Copyright © 2004-2010 by Pavel Simakov any conclusions, recommendations, ideas, thoughts or the source code presented on this site are my own and do not reflect a official opinion of my current or past employers, partners or clients |
|
No comments yet
Leave a comment