🎓 MASTER NOTES: COMPILER CONSTRUCTION (Lectures 23–45)
PART 1: THE BRAIN OF THE COMPILER (Advanced Parsing)
(Covering Lectures 23–30)
1. The "Handle" Concept
What is it? When the parser looks at the code, it tries to find a specific pattern it can "fold up" (reduce) into a grammar rule. This pattern is called a Handle: the substring matching a rule's right-hand side whose reduction undoes one step of a rightmost derivation.
• The Goal: We want to recognize handles on top of the stack and reduce them.
• The Problem: Sometimes we don't know whether to "Shift" (push the next token) or "Reduce" (fold up the top of the stack). Deciding requires looking ahead at the next token, as in the sketch below.
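💻 Code Sketch: a minimal shift-reduce loop in C for the toy grammar E → E + n | n (my own example, not the lecture's code). It reduces whenever a handle sits on top of the stack, and shifts otherwise:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const char *input = "n+n+n";     /* the token stream */
        char stack[32]; int top = 0;     /* the parse stack  */
        size_t i = 0, len = strlen(input);

        for (;;) {
            if (top >= 3 && stack[top-3] == 'E' && stack[top-2] == '+'
                         && stack[top-1] == 'n') {
                top -= 2; stack[top-1] = 'E';      /* handle: E + n */
                printf("reduce E -> E+n   stack: %.*s\n", top, stack);
            } else if (top >= 1 && stack[top-1] == 'n') {
                stack[top-1] = 'E';                /* handle: n */
                printf("reduce E -> n     stack: %.*s\n", top, stack);
            } else if (i < len) {
                stack[top++] = input[i++];         /* shift one token */
                printf("shift '%c'        stack: %.*s\n", stack[top-1], top, stack);
            } else {
                break;                             /* nothing left to do */
            }
        }
        puts(top == 1 && stack[0] == 'E' ? "accept" : "error");
        return 0;
    }

In this toy grammar the handle checks never clash; a real LR parser consults its table (and the lookahead) instead of hard-coded ifs.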
2. LR(1) Parsing
What is it? A powerful parsing method. "L" stands for Left-to-right scanning, "R" for Rightmost derivation (produced in reverse), and "(1)" means it looks 1 token ahead to make decisions.
• The State: The parser builds a huge map (a DFA) where every "State" is a set of items: rules marked with a dot • showing how far along we are.
◦ Example: A → α • β means "I have seen Alpha, and I am expecting Beta".
• Closure: If the dot is before a Non-Terminal (like • E), we must expand E to see what it can start with, adding every E-rule with the dot at the front. This is called Closure (sketched below).
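💻 Code Sketch: the closure step for the toy grammar S → E, E → E + T | T, T → id, with items stored as plain strings (a hypothetical representation: uppercase letters are non-terminals, i stands for id):

    #include <stdio.h>
    #include <string.h>

    static const char *rules[] = { "S:E", "E:E+T", "E:T", "T:i" };
    #define NRULES 4

    int main(void) {
        char items[16][16]; int n = 0;
        strcpy(items[n++], "S:.E");                /* the kernel item */

        /* Worklist: if an item's dot sits before non-terminal X,
           add "X -> . gamma" for every rule of X (if not seen yet). */
        for (int i = 0; i < n; i++) {
            char *dot = strchr(items[i], '.');
            char X = dot[1];
            if (X < 'A' || X > 'Z') continue;      /* terminal or end */
            for (int r = 0; r < NRULES; r++) {
                if (rules[r][0] != X) continue;
                char item[16];
                snprintf(item, sizeof item, "%c:.%s", X, rules[r] + 2);
                int seen = 0;
                for (int j = 0; j < n; j++)
                    if (!strcmp(items[j], item)) seen = 1;
                if (!seen) strcpy(items[n++], item);
            }
        }
        for (int i = 0; i < n; i++) puts(items[i]); /* the closed set */
        return 0;
    }

Running it prints S:.E, E:.E+T, E:.T, T:.i, i.e. the full LR(0) state reachable from the kernel item.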
3. Conflicts (When the Parser gets Confused)
Sometimes the grammar is ambiguous, leading to two kinds of conflicts:
• Shift/Reduce Conflict: The parser doesn't know whether to stack the current item or reduce the existing stack.
◦ ⚡ Real World Example: The Dangling Else. In if A then if B then X else Y, does the else belong to the first if or the second?
◦ Solution: The parser prefers to SHIFT. It matches the else to the nearest if (see the C snippet after this list).
• Reduce/Reduce Conflict: The parser sees two different rules that both match at the same point. This usually means the grammar is badly designed.
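💻 Code Sketch: the dangling else in real C. Because generated parsers prefer to shift, the else binds to the inner if, which is exactly what C compilers do:

    #include <stdio.h>

    void demo(int a, int b) {
        if (a)
            if (b)
                printf("x\n");
            else                /* binds to the NEAREST if: if (b) */
                printf("y\n");
    }

    int main(void) {
        demo(1, 0);             /* prints "y": else belongs to if (b)   */
        demo(0, 0);             /* prints nothing: outer if has no else */
        return 0;
    }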
4. LALR(1) vs. LR(1)
• LR(1): Very precise but creates massive tables (thousands of states).
• LALR(1) (LookAhead LR): An efficiency hack. If two states contain the same items (the same core) and differ only in their lookaheads, we merge them.
◦ Benefit: Tables are roughly 10x smaller.
◦ Tools: YACC and Bison use this method.
--------------------------------------------------------------------------------
PART 2: GIVING MEANING TO CODE (Semantics)
(Covering Lectures 31–34)
1. Attribute Grammars
We don't just check whether the code is well-formed (Syntax); we also calculate values (Semantics). We attach Attributes (values) to the tree nodes.
• Synthesized Attributes: Information flows BOTTOM-UP (from children to parent).
◦ Example: Calculating 3 + 5. The parent gets the value 8 from the children 3 and 5 (see the C sketch after this list).
• Inherited Attributes: Information flows TOP-DOWN or SIDEWAYS (from parent/siblings to child).
◦ Example: Telling a variable "You are an Integer" flows down from the declaration int x.
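💻 Code Sketch: a synthesized attribute in miniature C (the Node layout is my own, not the lecture's). Each node's val is computed bottom-up from its children:

    #include <stdio.h>

    typedef struct Node { char op; int val; struct Node *l, *r; } Node;

    int eval(Node *n) {                      /* synthesizes n->val */
        if (n->op == 'n') return n->val;     /* leaf: already known */
        n->val = eval(n->l) + eval(n->r);    /* parent built from children */
        return n->val;
    }

    int main(void) {
        Node three = { 'n', 3, 0, 0 }, five = { 'n', 5, 0, 0 };
        Node plus  = { '+', 0, &three, &five };
        printf("%d\n", eval(&plus));         /* prints 8 */
        return 0;
    }

An inherited attribute would flow the other way: eval would take an extra parameter (say, the expected type) passed down from the parent.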
2. Ad-Hoc Translation (The "Action" Hack)
Instead of building a fancy tree first, we execute small snippets of C code during parsing.
• How it works: When the parser "Reduces" a rule, it immediately runs the code snippet attached to that rule.
• YACC Style:
◦ $1, $2, $3 = The values of the children.
◦ $$ = The result value of the parent.
◦ Code: expr : expr PLUS expr { $$ = $1 + $3; } ($2 would be the PLUS token itself, which carries no useful value here).
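💻 Code Sketch: roughly what the $-variables mean underneath, assuming a hand-written semantic-value stack (the real YACC machinery differs in detail):

    #include <stdio.h>

    int vstack[64], vtop = 0;           /* value stack, parallel to the parse stack */

    /* what { $$ = $1 + $3; } does during the reduction
       expr : expr PLUS expr            (hypothetical hand-coding) */
    void reduce_expr_plus_expr(void) {
        int d1 = vstack[vtop - 3];      /* $1: left expr          */
        int d3 = vstack[vtop - 1];      /* $3: right expr         */
        vtop -= 3;                      /* pop the 3 RHS symbols  */
        vstack[vtop++] = d1 + d3;       /* push $$ for the parent */
    }

    int main(void) {
        vstack[vtop++] = 3;             /* expr  (value 3)        */
        vstack[vtop++] = 0;             /* PLUS  (no useful value: that's $2) */
        vstack[vtop++] = 5;             /* expr  (value 5)        */
        reduce_expr_plus_expr();
        printf("$$ = %d\n", vstack[vtop - 1]);   /* prints $$ = 8 */
        return 0;
    }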
--------------------------------------------------------------------------------
PART 3: THE BLUEPRINT (Intermediate Representation)
(Covering Lectures 35–37)
Before creating machine code (which is hard), we create an "Intermediate Representation" (IR).
1. Visual IRs
• AST (Abstract Syntax Tree): A clean tree without punctuation (parentheses/semicolons removed). It keeps just the logical structure.
• DAG (Directed Acyclic Graph): Like an AST, but if x*2 appears twice, both uses point to the same node. It avoids duplication (see the C sketch below).
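💻 Code Sketch: the DAG idea for (x*2) + (x*2) in C, with a hypothetical node layout. Both children of + are literally the same pointer, so x*2 is stored (and later computed) once:

    #include <stdio.h>

    typedef struct N { char op; struct N *l, *r; } N;

    int main(void) {
        N x   = { 'x', 0, 0 }, two = { '2', 0, 0 };
        N x2  = { '*', &x, &two };     /* the single x*2 node          */
        N sum = { '+', &x2, &x2 };     /* both children: the SAME node */
        printf("shared: %s\n", sum.l == sum.r ? "yes" : "no");
        return 0;
    }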
2. Linear IR (Three-Address Code)
This looks like assembly language but for a fake, generic machine.
• Format: x = y op z (Result = Operand1 Operator Operand2).
• Why use it? It breaks complex math into tiny steps.
◦ Source: a = b + c * d
◦ 3-Address Code:
1. t1 = c * d
2. a = b + t1
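💻 Code Sketch: one common way to store 3-address code, as a table of "quads" (op, arg1, arg2, result); the struct layout here is illustrative, not prescribed:

    #include <stdio.h>

    typedef struct { char op; char arg1[4], arg2[4], res[4]; } Quad;

    int main(void) {
        Quad code[] = {                /* a = b + c * d */
            { '*', "c", "d",  "t1" },
            { '+', "b", "t1", "a"  },
        };
        for (int i = 0; i < 2; i++)
            printf("%s = %s %c %s\n", code[i].res, code[i].arg1,
                   code[i].op, code[i].arg2);
        return 0;
    }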
--------------------------------------------------------------------------------
PART 4: FLOW CONTROL & BACKPATCHING
(Covering Lectures 38–41)
1. Translating If and While
When translating if (x < y) goto L1, the compiler often doesn't know where L1 is yet because it hasn't read that far in the file.
2. The Solution: Backpatching
This is a "Time Travel" technique for compilers.
1. Emit Empty Jump: The compiler writes if x < y goto ___ (leaving a blank space).
2. Make a List: It adds this line number to a "Truelist" (Needs fixing later).
3. Fill it later: When the compiler finally reaches the target label, it goes back to the blank space and writes the correct address.
🌍 Real World Example: Writing a check but leaving the "Pay to the Order of" line blank until you know exactly who you are paying.
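💻 Code Sketch: the three steps in miniature C, assuming a made-up instruction record (real compilers keep separate truelists and falselists per boolean expression):

    #include <stdio.h>

    typedef struct { const char *text; int target; } Instr;

    int main(void) {
        Instr code[8]; int n = 0;
        int truelist[8]; int tl = 0;

        code[n] = (Instr){ "if x < y goto", -1 };  /* 1. blank target      */
        truelist[tl++] = n++;                      /* 2. note it for later */
        code[n++] = (Instr){ "a = 0", -1 };        /* ...keep translating  */

        int L1 = n;                                /* 3. target now known: */
        for (int i = 0; i < tl; i++)
            code[truelist[i]].target = L1;         /*    fill in the blank */
        code[n++] = (Instr){ "a = 1", -1 };        /* this line IS L1      */

        for (int i = 0; i < n; i++) {
            printf("%d: %s", i, code[i].text);
            if (code[i].target >= 0) printf(" %d", code[i].target);
            printf("\n");
        }
        return 0;
    }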
--------------------------------------------------------------------------------
PART 5: BUILDING THE FINAL CODE
(Covering Lectures 42–45)
1. Code Generation Issues
We need to map our "Blueprint" (3AC) to actual CPU machine code.
• Cost: We want the cheapest code. A register move like MOV R1, R2 (Cost 1) is cheaper than moving data to Memory (Cost 2).
2. Basic Blocks & Flow Graphs
• Basic Block: A chunk of code with one entry (the top) and one exit (the bottom). You cannot jump into the middle of it (see the leader-finding sketch after this list).
• Control Flow Graph (CFG): We connect these blocks with arrows to show how the program can run (loops, ifs, elses).
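💻 Code Sketch: the classic "leader" rule for splitting 3AC into basic blocks: a new block starts at the first instruction, at every jump target, and right after every jump (flags are pre-computed here to keep the C short; my own example):

    #include <stdio.h>

    typedef struct { const char *text; int is_jump, is_target; } Instr;

    int main(void) {
        Instr code[] = {
            { "t1 = c * d",       0, 0 },
            { "L1: a = b + t1",   0, 1 },   /* jump target  */
            { "if a < n goto L1", 1, 0 },   /* jump         */
            { "return a",         0, 0 },   /* after a jump */
        };
        int n = 4;
        for (int i = 0; i < n; i++) {
            int leader = (i == 0) || code[i].is_target || code[i-1].is_jump;
            if (leader) printf("---- new basic block ----\n");
            printf("  %s\n", code[i].text);
        }
        return 0;
    }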
3. Liveness Analysis
• Question: "Is variable X needed after this line?"
• Why ask? If X is Live (needed later), we must keep its value safe in a Register or Memory. If it's Dead (not needed), we can discard it and reuse that space for something else (computed in the sketch below).
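💻 Code Sketch: a backward liveness scan over one basic block, assuming single-letter variables (my own encoding: each line names what it defines and what it uses). Kill the definition, then add the uses:

    #include <stdio.h>
    #include <string.h>

    typedef struct { char def; const char *use; } Line;

    int main(void) {
        Line code[] = {              /* t = a*b;  u = a+1;  v = t+u */
            { 't', "ab" }, { 'u', "a" }, { 'v', "tu" },
        };
        int n = 3;
        char live[27] = "v";         /* v is still needed after the block */

        for (int i = n - 1; i >= 0; i--) {
            char *p = strchr(live, code[i].def);       /* kill the def */
            if (p) memmove(p, p + 1, strlen(p));
            for (const char *u = code[i].use; *u; u++) /* add the uses */
                if (!strchr(live, *u)) strncat(live, u, 1);
            printf("live before line %d: {%s}\n", i + 1, live);
        }
        return 0;
    }

It prints {tu}, {ta}, {ab} going backward: t is not live before line 1, because line 1 is the instruction that creates it.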
4. Register Allocation (The Hard Part)
CPUs have very few Registers (fast storage). We have many variables.
• The Strategy: Graph Coloring.
• How it works: We draw a graph where variables are dots. If two variables are alive at the same time, we draw a line between them. We try to color the dots so no connected dots have the same color.
◦ Colors = Registers.
◦ If we run out of colors, we must Spill (save a variable to slow RAM), as the sketch below shows.
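💻 Code Sketch: a greedy coloring pass in C over a hard-coded 4-variable interference graph with K = 2 registers (a toy example; real allocators pick smarter orders and may split live ranges):

    #include <stdio.h>

    #define V 4                        /* variables a, b, c, d */
    #define K 2                        /* registers R0, R1     */

    int main(void) {
        int edge[V][V] = {             /* 1 = live at the same time */
            { 0, 1, 1, 0 },            /* a: clashes with b, c      */
            { 1, 0, 1, 0 },            /* b: clashes with a, c      */
            { 1, 1, 0, 1 },            /* c: clashes with a, b, d   */
            { 0, 0, 1, 0 },            /* d: clashes with c         */
        };
        int color[V];

        for (int v = 0; v < V; v++) {
            int used[K] = { 0 };
            for (int u = 0; u < v; u++)      /* already-colored neighbors */
                if (edge[v][u] && color[u] >= 0)
                    used[color[u]] = 1;
            color[v] = -1;
            for (int c = 0; c < K; c++)
                if (!used[c]) { color[v] = c; break; }
            if (color[v] < 0) printf("%c: SPILL to memory\n", 'a' + v);
            else              printf("%c: R%d\n", 'a' + v, color[v]);
        }
        return 0;
    }

Here a and b grab R0 and R1, c clashes with both and spills, and d can reuse R0.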
🌍 Real World Example: Register Allocation is like Juggling. You have 2 hands (Registers). You have 3 balls (Variables). You can only hold 2 at once. To catch the 3rd one, you must throw one up (Spill) or put it in your pocket (Memory).