🌱TODO LIST:
- Finish Mx.g4
- Finish AST Architecture
- FinishSymbolCollector
- Finish SemanticCheck
- Finish IRBuilder
- Finish Codegen
- Finish Mem2Reg Optimization
- Finish graph coloring optimization
- Finish const optimization
- Finish redundant instruction removal optimization (but opt2 didn’t gain any points)
[TOC]
ASTNode: Abstract class serving as the base for all AST nodes.
Core Components
ASTBuilder
- Responsible for initial tree construction, linking nodes starting from the root node.
Utility
- Position Class: Tracks the position of each word in every statement.
- Scope Class: Records variables defined within a scope and those inherited from the previous level.
- GlobalScope: Inherits from
Scope. Tracks all defined types and global variables.
- GlobalScope: Inherits from
- Type Class: Records basic variable forms and newly defined
classNode.
ASTNode Hierarchy
1. ASTNode (Base Class)
- Virtual functions for all nodes.
- Each node contains an
acceptfunction connected to theASTVisitorinterface. Theacceptfunction calls differentvisitorfunctions inASTVisitorbased on the node type, enabling diverse methods.
2. RootNode
- Represents the root of the entire program.
3. StmtNode (Statement Node)
-
Base class for all statement nodes.
-
Includes a
positionfield to locate words.Subtypes:
- blockStmt: Introduces a new curly-brace scope, containing all statements within the scope.
- varDefStmt: Represents
type + name + expression. - classDefStmt: Represents
name + memberList + functionList. - funDefStmt: Represents
name + returnType + parameterList + funcBodyBlock. - ifStmt: Represents
condition + thenStmt + elseStmt. - forStmt: Represents
(initStmt + condition + updateExpr) + forBodyBlock. - whileStmt: Represents
condition + whileBodyBlock. - returnStmt: Represents
Type. - exprStmt: Represents an expression statement.
- ctrlStmt: Includes
breakandcontinue.
4. ExprNode (Expression Node)
-
Base class for all expression nodes.
-
Includes
Typefor the expression type andentityfor the instance value.Subtypes:
- atomExpr: Atomic values.
- funcExpr: Includes lambdas, global functions, and class methods.
- ConstExpr: Includes constants like
int,string,bool. - varExpr: Includes identifiers, classes, class members,
this, andthis.member.
- binaryExpr: Binary operations including
operator,leftOperand, andrightOperand. - cellExpr: Unary operations with a single operator.
- delayCellExpr: Postfix unary operations like
a++,a--. - assignExpr: Assignment operations with
leftOperandandrightOperand. - newExpr: Represents
newstatements for arrays and classes. - dotVarExpr: Dot notation for accessing class members.
- dotFuncExpr: Dot notation for calling class methods.
- atomExpr: Atomic values.
5. Atom Class
- More fundamental nodes.
Subtypes:
- NewArrDemNode: Records content inside
new[Expression?]brackets. - SingleVarDefNode: Records a single variable definition.
- TypeNode: Records variable types, including whether it is an array.
- NewArrDemNode: Records content inside
ASTVisitor Interface
-
Provides a way to traverse the AST, with functions named
visit. -
Calls specific methods based on the type of
ASTNode.Subtypes:
- SymbolCollector: Traverses the tree from the root, collecting all defined classes into
gScope.typesand gathering functions. - SemanticCheck: Traverses the tree from the root, performing semantic checks.
- SymbolCollector: Traverses the tree from the root, collecting all defined classes into
Error Class
- Error (Base Class): Virtual function to record error messages and locations.
- SemanticError: Represents semantic errors.
- SyntaxError: Represents syntax errors.
ClassDef and FunctionDef support forward references.
-
firstVisit
Traverse all ClassDef nodes first, define all types, check for duplicate names, and add them to gScope.types.
Next, traverse all global functions, update their return types and parameter lists, and store them in funDef. -
secondVisit
Traverse all ClassDef nodes again to collect class members and methods. For methods, only update their return types and parameter lists.
At this stage, aside from the already updated values (types in ClassDef and FunctionDef), the types of other vardef nodes have not yet been updated. Be mindful of this!
Reference materials:
(Provide relevant links or references here)
- Tutorial of Yx
- Antlr 介绍:(127条消息) ANTLR4_pourtheworld的博客-CSDN博客
Type Class Modifications:
- Assign an
IRTypecorresponding to eachType. - Features like
aligncan be omitted entirely (try to identify which ones are unnecessary and remove them to keep the implementation lightweight). Optimize wherever possible.
- Create a new block for each
||and&&operation to handle the control flow jumps. - Avoid using
phinodes directly. Instead, use a virtual register to store all assignment statements that occur during the||process. Finally, thebranchinstruction will use the value stored in the virtual register (acting as thephinode).
- Recursively expand each level of the
mallocprocess at the IR layer. - For each level, construct a
forloop in IR to performmalloc. - The allocated address must reserve an
i32space at the beginning to store thesize. Shift the pointer forward by the size ofi32to use it as a normal pointer. For subsequentarr.size()operations, usegetelementptr -1to retrieve
-
Convert corresponding IR instructions into ASM instructions, replacing stack space with virtual registers.
-
Standardize the size of all variables (
bool,int,ptr) toi32. -
Add
rd,rs1,rs2, andimmfields to each instruction to facilitate the mapping to virtual registers:rdrepresentsdef(the destination of the instruction).rs1andrs2representuse(source operands).
-
For global variables, the address is not located in stack space. Use
lato load the actual address. -
Place function arguments in registers
a0, ..., a7. If there are more than 8 arguments, store the overflow in the stack starting fromspupwards. -
Use
a0to store the function return value.
Push all registers onto the stack. Use lw to load values when needed, and use sw to store values if there is a result in rd after computation.
Perform liveness analysis, build an interference graph, and assign different colors to conflicting nodes.
Assume there are K colors:
- simplify: Remove nodes with degrees less than K that are unrelated to
move(does not affect coloring). - coalesce: Merge
moveinstructions that can be combined according to the rules. - freeze: Give up merging certain unmergeable nodes and try simplifying again.
- spill: Select high-degree nodes to spill, remove these nodes, and attempt simplification again.
assign color:
Perform graph coloring and check if spilled nodes can be colored (e.g., two conflict edges having the same color).
rewrite program:
If coloring fails, place spilled nodes onto the stack, use virtual registers instead, perform lw when needed, and store (sw) if there is an rd. Repeat the process until coloring succeeds.
**Tips for graph