SHORT SUMMARY NOTES
A summary of Lectures 23 through 45, organized by chapter/lecture.
Handles: The critical insight behind LR parsing is that handles come from a finite set: a handle is a substring of the frontier that matches the right-hand side of a production and whose reduction is one step in the reverse rightmost derivation.
Right Context: The parser cannot simply match a handle; it must recognize the correct handle based on the context to its right (lookahead) to decide whether to shift or reduce.
LR(1) Item: This is a pair consisting of a production with a dot (indicating position) and a lookahead symbol, written as $[X \to \alpha \bullet \beta, a]$.
Canonical Collection (CC): This is the model used to represent the parser's states.
Augmentation: A new start symbol $S'$ with the single production $S' \to S$ is added to the grammar, giving the parser an unambiguous accept condition.
Closure Procedure: This algorithm completes a state. If an item $[X \to \alpha \bullet Y\beta, a]$ has the dot before a non-terminal $Y$, it adds an item $[Y \to \bullet \gamma, b]$ for every production $Y \to \gamma$ and every lookahead $b \in FIRST(\beta a)$.
Goto Procedure: This derives new parser states from existing ones. If a state contains $[X \to \alpha \bullet y\beta, b]$, a transition on symbol $y$ leads to a state containing $[X \to \alpha y \bullet \beta, b]$.
NFA to DFA: The LR(1) items form the states of an NFA that recognizes viable prefixes (tracking the stack and parse progress); the subset construction converts it to the DFA that drives the parser.
Algorithm: The construction starts with the closure of the initial item $[S' \to \bullet S, \$]$. It repeatedly computes goto transitions on each grammar symbol for every unprocessed set, adding new sets until the collection stops changing.
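As a sketch, the closure and goto procedures and the canonical-collection loop can be written in Python for the classic toy grammar $S \to CC$, $C \to cC \mid d$ (augmented with $S' \to S$). The item representation and helper names here are illustrative, and the FIRST helper assumes a grammar with no $\epsilon$-productions:

```python
# Toy grammar (no epsilon-productions): S' -> S ; S -> C C ; C -> c C | d
GRAMMAR = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}
NONTERMS = set(GRAMMAR)

def first(symbols):
    # FIRST of a non-empty symbol string; with no epsilon-productions,
    # the first symbol alone decides the set.
    head = symbols[0]
    if head not in NONTERMS:
        return {head}
    out = set()
    for rhs in GRAMMAR[head]:
        out |= first(rhs)
    return out

def closure(items):
    # An LR(1) item is (lhs, rhs, dot position, lookahead).
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot, la in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                y, beta = rhs[dot], rhs[dot + 1:]
                for prod in GRAMMAR[y]:
                    for b in first(beta + (la,)):   # FIRST(beta a)
                        if (y, prod, 0, b) not in items:
                            items.add((y, prod, 0, b))
                            changed = True
    return frozenset(items)

def goto(items, x):
    # Advance the dot over x in every item that allows it, then close.
    moved = {(lhs, rhs, dot + 1, la)
             for lhs, rhs, dot, la in items
             if dot < len(rhs) and rhs[dot] == x}
    return closure(moved)

def canonical_collection():
    start = closure({("S'", ("S",), 0, "$")})
    cc, worklist = {start}, [start]
    while worklist:                      # repeat until nothing new appears
        state = worklist.pop()
        for x in {rhs[dot] for _, rhs, dot, _ in state if dot < len(rhs)}:
            nxt = goto(state, x)
            if nxt and nxt not in cc:
                cc.add(nxt)
                worklist.append(nxt)
    return cc
```

For this grammar the construction yields the familiar ten LR(1) states.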
Action Table:
Shift: If $[A \to \alpha \bullet a \beta, b]$ is in the state and goto on terminal a leads to state $j$, set Action to "shift j".
Reduce: If $[A \to \alpha \bullet, a]$ is in the state, set Action on lookahead $a$ to "reduce $A \to \alpha$".
Accept: If $[S' \to S \bullet, \$]$ is present, set Action to "accept".
Goto Table: Filled based on transitions for non-terminals.
Skeleton Parser: A standard loop that checks the Action table to push tokens (shift) or pop items (reduce).
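A minimal sketch of that loop, driven by hand-built tables for the hypothetical grammar $S \to aS \mid b$ (the tables, state numbers, and token names are all illustrative):

```python
# Hand-built SLR(1) tables for the toy grammar  S -> a S | b  (illustrative).
PRODS = {0: ("S", 2), 1: ("S", 1)}          # production -> (LHS, |RHS|)
ACTION = {
    (0, "a"): ("shift", 2), (0, "b"): ("shift", 3),
    (1, "$"): ("accept",),
    (2, "a"): ("shift", 2), (2, "b"): ("shift", 3),
    (3, "$"): ("reduce", 1),                # S -> b
    (4, "$"): ("reduce", 0),                # S -> a S
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def parse(tokens):
    stack = [0]                             # stack of parser states
    tokens = list(tokens) + ["$"]           # append end-of-input marker
    pos = 0
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            return False                    # syntax error: no table entry
        if act[0] == "shift":
            stack.append(act[1])            # push the successor state
            pos += 1
        elif act[0] == "reduce":
            lhs, n = PRODS[act[1]]
            del stack[-n:]                  # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True                     # accept
```

Here `parse("aab")` accepts, while `parse("aa")` hits a missing table entry and reports an error.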
Shift/Reduce Conflict: Occurs when a state allows both shifting a token and reducing a production, typically due to ambiguity like the "dangling else" problem. The usual resolution is to shift.
Reduce/Reduce Conflict: Occurs when a state contains two different completed productions (dots at the end) with the same lookahead.
LALR(1): To shrink the tables, states with the same "core" (the items minus their lookaheads) are merged. The resulting tables are roughly one-tenth the size of canonical LR(1) tables.
Tools: Generators like YACC and Bison handle LALR(1) grammars, while ANTLR handles LL grammars (modern versions use LL(*)).
Structure: YACC files consist of definitions, rules, and functions, separated by %%. It often works alongside Lex for scanning.
Context-Sensitive Analysis: Answers questions about meaning (types, scopes, values) that context-free syntax alone cannot express.
Synthesized Attributes: Values computed from the attributes of a node's children; information flows bottom-up.
Inherited Attributes: Values defined by parents or siblings; information flows top-down or laterally.
Ad-Hoc Scheme: Instead of formal attribute grammars, code snippets (actions) are associated with productions and executed during parsing (e.g., upon reduction).
Triple Stack: The parser stack is modified to store triples of <value, symbol, state>.
Reductions: When reducing, the parser pops items, computes a new value using the popped values, and pushes the result.
Value References: YACC uses $$ for the left-hand side value and $1, $2, etc. for right-hand side values.
Precedence: Declarations like %left PLUS specify associativity and precedence to resolve conflicts.
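Putting these pieces together, a minimal YACC fragment might look like the following (a sketch, not taken from the lectures; the token names and actions are illustrative):

```yacc
%{
/* definitions section: C declarations */
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token NUMBER
%left PLUS              /* lower precedence, left-associative */
%left TIMES             /* higher precedence (declared later) */
%%
/* rules section */
expr : expr PLUS expr   { $$ = $1 + $3; }   /* $$ = value of the LHS  */
     | expr TIMES expr  { $$ = $1 * $3; }   /* $1, $3 = RHS values    */
     | NUMBER           { $$ = $1; }
     ;
%%
/* functions section: yylex() would typically come from Lex */
```

The `%left` declarations resolve the shift/reduce conflicts that these ambiguous expression rules would otherwise cause.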
Graphical IR: Includes Parse Trees and Abstract Syntax Trees (AST). ASTs are concise and remove extraneous derivation details.
Linear IR: Includes Stack-machine code (one-address) and Three-address code (models modern processors).
DAGs: Directed Acyclic Graphs avoid duplicating common subtrees found in ASTs.
Format: Instructions generally look like x = y op z.
Helpers: Functions like lookup(name) (symbol table), newtemp() (create temporary variable), and emit() (generate code) are used in translation.
Structure: Three-address code is commonly implemented as quadruples, records with four fields: Target, Op, Arg1, and Arg2.
Storage: Can be stored in an array (where the index is the label) or a linked list.
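A minimal sketch of the quadruple array, where the index doubles as the label; the emit and newtemp helpers follow the names above, and the translated statement is an illustrative example:

```python
code = []            # array of quadruples: the index doubles as the label
temp_count = 0

def newtemp():
    # Create a fresh temporary variable name.
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def emit(op, arg1, arg2, target):
    # Append one quadruple (Target, Op, Arg1, Arg2); return its label.
    code.append((target, op, arg1, arg2))
    return len(code) - 1

# Naive translation of  a = b * c + b * c  (no common-subexpression reuse):
t1 = newtemp(); emit("*", "b", "c", t1)
t2 = newtemp(); emit("*", "b", "c", t2)
t3 = newtemp(); emit("+", t1, t2, t3)
emit("=", t3, None, "a")
```

Note the duplicated `b * c`: a simple generator treating each statement locally emits it twice, which is exactly the redundancy DAGs later remove.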
Labels: Boolean expressions use attributes E.true and E.false to represent jump targets.
Code Generation: A statement like if E then S1 is translated by concatenating code blocks: the code for E (which jumps to E.true or E.false), the label E.true, and the code for S1.
Control Flow Method: Boolean values are represented by position (which code block is reached) rather than calculating a value.
Backpatching: Allows single-pass code generation. Jumps are generated with empty targets and added to lists (truelist, falselist). These are filled ("patched") later when the target label is known.
Mechanism: A production $M \to \epsilon$ is added. Its semantic action records the current instruction index (nextquad) so it can be used as a jump target later.
Trace: Examples show how backpatch merges lists and fills in specific quad numbers once they are generated.
Implementation: Complex constructs like if-then-else and while loops are translated using backpatch, merge, and marker non-terminals to manage jumps correctly in one pass.
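The backpatching machinery can be sketched as follows; the quad encoding and the one-pass trace of "if x then S1" are illustrative:

```python
quads = []          # each quad is [op, arg, jump_target]

def nextquad():
    # Index of the next instruction to be generated.
    return len(quads)

def emit(op, arg, target):
    quads.append([op, arg, target])
    return len(quads) - 1

def makelist(i):
    return [i]                     # a fresh list holding one jump's index

def merge(a, b):
    return a + b                   # combine two jump lists

def backpatch(lst, target):
    for i in lst:
        quads[i][2] = target       # fill in the previously unknown target

# One-pass translation of "if x then S1":
truelist = makelist(emit("if-goto", "x", None))   # taken when x is true
falselist = makelist(emit("goto", None, None))    # skips S1 when false
backpatch(truelist, nextquad())                   # S1 starts here
emit("S1-code", None, None)
backpatch(falselist, nextquad())                  # code after the if
```

Both jumps are emitted with empty targets and patched only once the corresponding labels become known, exactly as the single-pass scheme requires.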
Tasks: The generator handles memory management, instruction selection, instruction scheduling, and register allocation.
Cost: Instructions are assigned costs (e.g., +1 for memory access). Simple code generators often produce inefficient code by treating every IR statement locally.
Basic Blocks: A sequence of instructions with a single entry and single exit.
Partitioning: Code is split into blocks by identifying "leaders": the first statement, any target of a (conditional or unconditional) jump, and any statement immediately following a jump.
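A sketch of leader-based partitioning, assuming instructions are encoded as (op, jump_target) pairs (an illustrative encoding):

```python
def leaders(instrs):
    # instrs: list of (op, target); op is "goto", "if-goto", or "plain",
    # and target is an instruction index for jumps, else None.
    found = {0}                               # first instruction is a leader
    for i, (op, tgt) in enumerate(instrs):
        if op in ("goto", "if-goto"):
            found.add(tgt)                    # a jump target is a leader
            if i + 1 < len(instrs):
                found.add(i + 1)              # so is the statement after a jump
    return sorted(found)

def basic_blocks(instrs):
    # Each block runs from one leader up to (not including) the next.
    ls = leaders(instrs)
    return [instrs[a:b] for a, b in zip(ls, ls[1:] + [len(instrs)])]
```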
CFG: A graph where nodes are basic blocks and edges represent flow of control.
Liveness: A variable is "live" if it holds a value that will be used later.
Descriptors: The generator uses Register Descriptors (what is in a register) and Address Descriptors (where a variable is located) to optimize register usage within a basic block.
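Liveness inside a basic block can be computed with a backward scan using live-before = (live-after − definitions) ∪ uses. A sketch, with quads as (target, op, arg1, arg2) and an illustrative example block:

```python
def liveness(block, live_out):
    # Walk the block backwards, carrying the set of live variables.
    live = set(live_out)
    before = []
    for tgt, op, a1, a2 in reversed(block):
        # The definition kills liveness; the uses create it.
        live = (live - {tgt}) | {a for a in (a1, a2) if a is not None}
        before.append(set(live))
    before.reverse()
    return before        # live-before set for each instruction in order
```

For the block `t1 = b * c; a = t1 + b` with only `a` live on exit, `b` and `c` are live on entry while `t1` is live only between its definition and its use.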
DAGs: Used to optimize basic blocks by detecting common sub-expressions (nodes with multiple parents).
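A sketch of DAG construction over a block of quadruples: hashing each (op, child, child) key shares nodes, so a repeated subexpression maps to the same node (representation and names are illustrative):

```python
def build_dag(quads):
    nodes = []          # each node: ("leaf", name, None) or (op, left, right)
    memo = {}           # structural key -> node id (shares subexpressions)
    var = {}            # variable name -> node currently holding its value

    def node_for(name):
        # Look up the node for a variable, creating a leaf if needed.
        if name not in var:
            var[name] = len(nodes)
            nodes.append(("leaf", name, None))
        return var[name]

    for tgt, op, a1, a2 in quads:
        key = (op, node_for(a1), node_for(a2))
        if key not in memo:                  # new expression: make a node
            memo[key] = len(nodes)
            nodes.append(key)
        var[tgt] = memo[key]                 # tgt now names this node
    return nodes, var
```

Translating `t1 = b * c; t2 = b * c; t3 = t1 + t2` this way maps `t1` and `t2` to the same node, exposing the common subexpression.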
Register Allocation: A hard problem (NP-complete) often solved using heuristics like Graph Coloring. An interference graph is built where edges connect variables that are live simultaneously; the graph is then "colored" using $K$ registers.
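A Chaitin-style simplify/select sketch of that heuristic (no spill handling; names illustrative): repeatedly remove a node of degree less than $K$, then pop nodes back and give each the lowest color unused by its neighbors.

```python
def color(interference, k):
    # interference: var -> set of vars live at the same time.
    graph = {v: set(ns) for v, ns in interference.items()}  # working copy
    stack = []
    while graph:
        # Simplify: a node with < k neighbors can always be colored later.
        node = next((v for v in graph if len(graph[v]) < k), None)
        if node is None:
            return None                      # stuck: would need to spill
        stack.append((node, graph.pop(node)))
        for ns in graph.values():
            ns.discard(node)
    colors = {}
    while stack:
        # Select: restore nodes in reverse order, picking a free color.
        node, neighbors = stack.pop()
        used = {colors[n] for n in neighbors if n in colors}
        colors[node] = next(c for c in range(k) if c not in used)
    return colors
```

A triangle of three mutually interfering variables is colorable with $K = 3$ but forces a spill decision with $K = 2$.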