Post

Compilers - LR Parsing

Compilers - LR Parsing

$\text{LR}(k)$ parsing is a bottom-up grammar parsing method, for which:

  • $\text{LR}$ stands for “Read from Left to Right, Derive Rightmost Word”.
  • $k$ stands for the number of character to lookahead.

Terminology

There’re some errors in this section. Read with caution.

  • Canonical sentential form (规范句型): sentences derived by rightmost derivation.
  • Canonical reduction (规范规约): left most reduction.
  • Canonical prefix (规范前缀): given canonical sentential form $\alpha \eita$, where $\eita$ is non-terminant or $\varepsilon$, then $\alpha$ is canonical prefix.
  • Canonical viable prefix (规范活前缀): $\alpha$ has form $\alpha’ \pi$, where $\pi$ is a handle or $$.
    • 移入型规范活前缀: $\alpha$ 不含句柄
    • 规约型规范活前缀: $\alpha$ 含一个句柄

LR(0) parsing

In bottom-up approach, we have two actions - shift (移入), and reduce (规约).

The crux: when to perform which.

LR parsing provides two tables (Action and Goto) to clarify this.

Constructing LR automaton

Terminology

  • LR item: dotted production, to represent the process of reduction.
    • shift item: 形如 $S \to \cdot \alpha$
    • reducible item: 形如 $S \to \pi \cdot$,且不是拓广表达式。
  • Accepting state: 包含拓广表达式 $S’ \to S \cdot$ 的状态。

  • $closure(IS)$ is all LR items that can be inferred by LR item set $IS$.
    1. $IS \in closure(IS)$
    2. $\text{If } B \to \alpha \cdot A \beta \text{ and } A \to \pi, \text{ then add } A \to \cdot \pi \text{ to } closure(IS)$
  • $project(IS, X) = \lbrace A \to \alpha \cdot X \beta \mid A \to \alpha \cdot X \beta \in IS \rbrace$
    • 获取下一个符号为 $X$ 的子集。(X 可以为终结符)
  • $shift(IS, X) = \lbrace A \to \alpha X \cdot \beta \mid A \to \alpha \cdot X \beta \in IS \rbrace$
    • 将所有 $IS$ 中下一个符号为 $X$ 的 LR 项目里的 $\cdot$ 都向右移一步。
  • $goto(IS, X) = closure(shift(project(IS, X), X))$

    这里project实际上是重复的,因为shift已经有过滤X了;此处加上只是为了显示地表达需要过滤。

Building the automaton graph

  1. Initially augment the grammar: adding as start production $S’ \to S$
  2. Initial state is $closure(\lbrace S’ \to \cdot S \rbrace)$.
  3. For every availale $NextState = goto(CurrentState, X_i)$, add an edge $CurrentState \xrightarrow{X_i} NextState$.
  4. Repeat step 3 until no any new state, where “$\cdot$” always reaches the end.

$\varepsilon$ shouldn’t be regarded as a symbol, but be ignored.

若某状态包含以下某种冲突,则此文法不是 LR(0) 文法:

  • reduce - reduce conflict
  • shift - reduce conflict

Constructing LR parse tables

Before constructing LR parse tables, we should first construct LR automaton.

Action/Goto table example

Here is an example action/goto table:

State Action     Goto  
  id + # E T
0 s5     1 2
1   s6 acc    
2 r2 r2 r2    

, where action table only contains terminals while goto table only contains non-terminals.

Definition

\[\begin{align*} Action(S_i, X) &= \begin{cases} s_j, & \text{if there is an edge } S_i \xrightarrow{X} S_j, \\ r_j, & \text{if } S_i \text{ contains only reducible } Production_j, \\ Accept, & \text{if } S_i \text{ is an accepting state}. \end{cases} \\ Goto(S_i, X) &= j, \quad \text{ if there is an edge } S_i \xrightarrow{X} S_j \end{align*}\]

Using LR parsing tables

使用 LR(0) 驱动器算法:

LR(0) 驱动器算法 LR(0) 驱动器算法

SLR(1) parsing

LR(0) 分析限制过大,而 SLR(1) 分析减少了 Action 表中的规约列,进而减少冲突,更易满足条件。

相较 LR(0) 分析,对 LR 项目 $X \to \pi \cdot$ 来说,规约列只有 $FOLLOW(X)$。因此,只要所有规约项目 (reducible item) 的 FOLLOW 集及移入项目 (shift item) 两两之间无冲突,文法就是 SLR(1) 文法。

LR(1) parsing

SLR(1) 分析在 FOLLOW 集与 shift item 冲突时也不可用。但 LR(1) 分析只考虑产生式中 实际有的符号 来替代 FOLLOW 集,从而再次减少冲突。

具体地:

  • LR 项目后需另附展望符 (lookahead)。形式为:$X \to Y, a/b/\dots$。
  • $closure(IS) = IS \cup \lbrace A \to \cdot \pi, FIRST(\beta x_i)/\dots \mid X \to \alpha \cdot A \beta, x_i \in IS \rbrace$。

    即:

    • $IS \in closure(IS)$
    • 对于 $IS$ 中所有形如 $X \to \alpha \cdot A \beta, x_i$ 的 LR 项目,$A \to \cdot \pi, FIRST(\beta x_i)/\dots$ 也属于 $closure(IS)$。

[!WARNING] LR 项目的展望符也属于其签名,故展望符不同的 LR 项目算不同的项目。

LALR(1) parsing

Terminology

  • Core (项目核心/项目的心): LR item without lookahead (in a form of $X \to \alpha \cdot Y \beta$).

Merging LR(1) States

LALR(1) involves merging states in LR(1) automaton.

If two states have the same core, merge their lookaheads (for each item, merge individually).

For instance:

Before:

\[\begin{align*} S_1 &= \begin{cases} X \to Y, p\\ A \to B, q \end{cases} \\ S_2 &= \begin{cases} X \to Y, r\\ A \to B, s \end{cases} \end{align*}\]

After:

\[S' = \begin{cases} X \to Y, p/r\\ A \to B, q/s \end{cases}\]

Relationship among LR parsing methods

$LR(0) \subseteq SLR(1) \subseteq LALR(1) \subseteq LR(1) \subset$ 二义性文法

无法构造 LR 分析表的文法称为二义性文法。

This post is licensed under CC BY 4.0 by the author.