Compilers - Semantic Analysis with Formal Grammar

At least the following three tables are involved in semantic analysis:

  • Symbol table: stores information about the identifiers the parser sees during semantic analysis.
  • Type table
  • Constants table

Internal representation of identifiers

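#include <array>
#include <cstddef>

// Handle to the identifier's type (presumably an entry in the type table).
using TypePointer = void *;
// Raw 8-byte storage for a constant's value.
using Value = std::array<std::byte, 8>;

// NOLINTBEGIN
// What an identifier names.
enum class Kind { type, constant, variable };

// @brief: AccessType records how a variable is accessed. With indirect
// access, changing the variable's value inside the function also changes
// the actual value it refers to; with direct access it does not.
enum class AccessType { direct, indirect };

// Fields shared by every identifier. These are structs so that the members
// are accessible; with `class` they would be private by default.
struct IdentifierBase {
    TypePointer typeptr;
    Kind kind;
};

struct Type : public IdentifierBase {};

struct Constant : public IdentifierBase {
    Value value;
};

struct Variable : public IdentifierBase {
    AccessType access;
    std::size_t level;  // nesting level of the scope that declares it
    std::size_t off;    // presumably the offset within that level's storage
};

struct Enum : public IdentifierBase {
    // ...
};

// ...
// NOLINTEND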

Symbol table

When the parser sees

  • a definition of a symbol, it registers the symbol in the symbol table.
  • a usage of a symbol, it looks the symbol up in the symbol table.

Finding symbols in the table is easy peasy, but be careful about the scope.
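As a minimal sketch of the register-on-definition and find-on-usage steps, here is a flat table that ignores scope entirely. The names `FlatSymbolTable`, `Symbol`, `SymbolKind`, `define` and `lookup` are made up for illustration and are not part of the representation above; a real table would store the `IdentifierBase` hierarchy instead of a type name string.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified entry used only for this sketch.
enum class SymbolKind { type, constant, variable };

struct Symbol {
    SymbolKind kind;
    std::string type_name;
};

class FlatSymbolTable {
public:
    // Called when the parser sees a definition.
    // Returns false if the name is already registered (a redefinition).
    bool define(const std::string &name, Symbol symbol) {
        assert(!name.empty() && "an identifier always has a name");
        return table_.emplace(name, std::move(symbol)).second;
    }

    // Called when the parser sees a usage.
    // An empty optional means the identifier was never declared.
    std::optional<Symbol> lookup(const std::string &name) const {
        auto it = table_.find(name);
        if (it == table_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::unordered_map<std::string, Symbol> table_;
};
```

With this sketch, a failed `define` would be reported as a redefinition error and an empty `lookup` result as a use of an undeclared identifier. It has a single namespace, so it cannot yet express nested scopes.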

Scope

No two identifiers in the same scope can have the same name; otherwise a use of that name would be ambiguous.

Embedding rule:

  • A scoped identifier can’t be accessed outside its scope, but it can be used in inner scopes.
  • If more than one identifier with the same name is visible in the current scope, the one declared at the deepest level is used.
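One common way to implement these rules is a stack of per-scope tables. The sketch below assumes that design; `ScopedSymbolTable`, `ScopedSymbol`, `enter_scope` and `exit_scope` are hypothetical names, and the entry stores only a type name and a nesting level to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical entry: just the declared type and the nesting level.
struct ScopedSymbol {
    std::string type_name;
    std::size_t level;  // 0 is the outermost (global) scope
};

class ScopedSymbolTable {
public:
    ScopedSymbolTable() { enter_scope(); }  // open the global scope

    void enter_scope() { scopes_.emplace_back(); }

    // Leaving a scope drops its identifiers, so they can no longer be found.
    void exit_scope() {
        assert(scopes_.size() > 1 && "the global scope is never closed");
        scopes_.pop_back();
    }

    // Only the current scope is checked for duplicates, so an inner scope
    // may shadow a name declared in an outer one.
    bool define(const std::string &name, const std::string &type_name) {
        const std::size_t level = scopes_.size() - 1;
        return scopes_.back().emplace(name, ScopedSymbol{type_name, level}).second;
    }

    // Search from the innermost scope outwards; the deepest match wins.
    std::optional<ScopedSymbol> lookup(const std::string &name) const {
        for (auto scope = scopes_.rbegin(); scope != scopes_.rend(); ++scope) {
            auto it = scope->find(name);
            if (it != scope->end()) {
                return it->second;
            }
        }
        return std::nullopt;
    }

private:
    std::vector<std::unordered_map<std::string, ScopedSymbol>> scopes_;
};
```

If `x` is defined as `int` globally and as `float` in an inner scope, `lookup("x")` yields the `float` entry while the inner scope is open and the `int` entry again after `exit_scope()`. A second `define("x", ...)` in the same scope returns `false`, which matches the no-duplicates rule.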

See also

  • c4 - a minimal C compiler in four functions (500 lines).
This post is licensed under CC BY 4.0 by the author.