Compilers - LR Parsing

Posted Nov 24, 2025 Updated Jan 12, 2026

By ShelpAm

5 min read

$\text{LR}(k)$ parsing is a bottom-up grammar parsing method, for which:

$\text{LR}$ stands for “Read from Left to Right, Derive Rightmost Word”.
$k$ stands for the number of character to lookahead.

Terminology

There’re some errors in this section. Read with caution.

Canonical sentential form (规范句型): sentences derived by rightmost derivation.
Canonical reduction (规范规约): left most reduction.
Canonical prefix (规范前缀): given canonical sentential form $\alpha \eita$, where $\eita$ is non-terminant or $\varepsilon$, then $\alpha$ is canonical prefix.
Canonical viable prefix (规范活前缀): $\alpha$ has form $\alpha’ \pi$, where $\pi$ is a handle or $$.
- 移入型规范活前缀: $\alpha$ 不含句柄
- 规约型规范活前缀: $\alpha$ 含一个句柄

LR(0) parsing

In bottom-up approach, we have two actions - shift (移入), and reduce (规约).

The crux: when to perform which.

LR parsing provides two tables (Action and Goto) to clarify this.

Constructing LR automaton

Terminology

LR item: dotted production, to represent the process of reduction.
- shift item: 形如 $S \to \cdot \alpha$
- reducible item: 形如 $S \to \pi \cdot$，且不是拓广表达式。
Accepting state: 包含拓广表达式 $S’ \to S \cdot$ 的状态。
$closure(IS)$ is all LR items that can be inferred by LR item set $IS$.
1. $IS \in closure(IS)$
2. $\text{If } B \to \alpha \cdot A \beta \text{ and } A \to \pi, \text{ then add } A \to \cdot \pi \text{ to } closure(IS)$
$project(IS, X) = \lbrace A \to \alpha \cdot X \beta \mid A \to \alpha \cdot X \beta \in IS \rbrace$
- 获取下一个符号为 $X$ 的子集。(X 可以为终结符)
$shift(IS, X) = \lbrace A \to \alpha X \cdot \beta \mid A \to \alpha \cdot X \beta \in IS \rbrace$
- 将所有 $IS$ 中下一个符号为 $X$ 的 LR 项目里的 $\cdot$ 都向右移一步。
$goto(IS, X) = closure(shift(project(IS, X), X))$

这里project实际上是重复的，因为shift已经有过滤X了；此处加上只是为了显示地表达需要过滤。

Building the automaton graph

Initially augment the grammar: adding as start production $S’ \to S$
Initial state is $closure(\lbrace S’ \to \cdot S \rbrace)$.
For every availale $NextState = goto(CurrentState, X_i)$, add an edge $CurrentState \xrightarrow{X_i} NextState$.
Repeat step 3 until no any new state, where “$\cdot$” always reaches the end.

$\varepsilon$ shouldn’t be regarded as a symbol, but be ignored.

若某状态包含以下某种冲突，则此文法不是 LR(0) 文法：

reduce - reduce conflict
shift - reduce conflict

Constructing LR parse tables

Before constructing LR parse tables, we should first construct LR automaton.

Action/Goto table example

Here is an example action/goto table:

State	Action			Goto
	id	+	#	E	T
0	s5			1	2
1		s6	acc
2	r2	r2	r2

, where action table only contains terminals while goto table only contains non-terminals.

Definition

\[\begin{align*} Action(S_i, X) &= \begin{cases} s_j, & \text{if there is an edge } S_i \xrightarrow{X} S_j, \\ r_j, & \text{if } S_i \text{ contains only reducible } Production_j, \\ Accept, & \text{if } S_i \text{ is an accepting state}. \end{cases} \\ Goto(S_i, X) &= j, \quad \text{ if there is an edge } S_i \xrightarrow{X} S_j \end{align*}\]

Using LR parsing tables

使用 LR(0) 驱动器算法：

LR(0) 驱动器算法

SLR(1) parsing

LR(0) 分析限制过大，而 SLR(1) 分析减少了 Action 表中的规约列，进而减少冲突，更易满足条件。

相较 LR(0) 分析，对 LR 项目 $X \to \pi \cdot$ 来说，规约列只有 $FOLLOW(X)$。因此，只要所有规约项目 (reducible item) 的 FOLLOW 集及移入项目 (shift item) 两两之间无冲突，文法就是 SLR(1) 文法。

LR(1) parsing

SLR(1) 分析在 FOLLOW 集与 shift item 冲突时也不可用。但 LR(1) 分析只考虑产生式中 实际有的符号 来替代 FOLLOW 集，从而再次减少冲突。

具体地：

LR 项目后需另附展望符 (lookahead)。形式为：$X \to Y, a/b/\dots$。
$closure(IS) = IS \cup \lbrace A \to \cdot \pi, FIRST(\beta x_i)/\dots \mid X \to \alpha \cdot A \beta, x_i \in IS \rbrace$。

即：
- $IS \in closure(IS)$
- 对于 $IS$ 中所有形如 $X \to \alpha \cdot A \beta, x_i$ 的 LR 项目，$A \to \cdot \pi, FIRST(\beta x_i)/\dots$ 也属于 $closure(IS)$。

[!WARNING] LR 项目的展望符也属于其签名，故展望符不同的 LR 项目算不同的项目。

LALR(1) parsing

Terminology

Core (项目核心/项目的心): LR item without lookahead (in a form of $X \to \alpha \cdot Y \beta$).

Merging LR(1) States

LALR(1) involves merging states in LR(1) automaton.

If two states have the same core, merge their lookaheads (for each item, merge individually).

For instance:

Before:

\[\begin{align*} S_1 &= \begin{cases} X \to Y, p\\ A \to B, q \end{cases} \\ S_2 &= \begin{cases} X \to Y, r\\ A \to B, s \end{cases} \end{align*}\]

After:

\[S' = \begin{cases} X \to Y, p/r\\ A \to B, q/s \end{cases}\]

Relationship among LR parsing methods

$LR(0) \subseteq SLR(1) \subseteq LALR(1) \subseteq LR(1) \subset$ 二义性文法

无法构造 LR 分析表的文法称为二义性文法。

compilers

This post is licensed under CC BY 4.0 by the author.

Terminology

LR(0) parsing

Constructing LR automaton

Terminology

Building the automaton graph

Constructing LR parse tables

Action/Goto table example

Definition

Using LR parsing tables

SLR(1) parsing

LR(1) parsing

LALR(1) parsing

Terminology

Merging LR(1) States

Relationship among LR parsing methods

Trending Tags