Tuga — Compiler & Stack VM
Java
A complete compiler and stack-based virtual machine for the Tuga programming language (a small imperative language with Portuguese keywords), written in Java with ANTLR 4. Covers lexing, parsing, semantic analysis, code generation, and execution.
Built for the Compilers course (2024/2025), Tuga is a full compilation pipeline from source code to execution — five distinct phases, each a separate, well-bounded module.
Pipeline
| Phase | Component | What it does |
|---|---|---|
| Lexing | TugaLexer | ANTLR-generated, custom error listener |
| Parsing | TugaParser | ANTLR grammar (Tuga.g4), syntax errors with line numbers |
| Semantic | DefPhase + RefPhase | Scoped symbol table, reference resolution, type checking |
| Codegen | CodeGenerationVisitor | Emits S-VM bytecode + populates the constant pool |
| Execution | VirtualMachineS | Stack-based interpreter with IP/FP registers |
Language features
- Global and local variable declarations with nested scopes.
- Function declarations with typed parameters and optional return types (
inteiro,real,booleano,string, orvaziofor void). - Function calls as statements (void) or expressions (value-returning).
- Control flow:
se ... senao(if/else),enquanto(while),retorna(return). - Arithmetic, relational, logical, and string-concatenation operators.
- Type checking with automatic
inteiro→realpromotion. - Forward references: a function can be called before its declaration thanks to the two-phase semantic analysis.
Stack Virtual Machine
I designed a small stack VM (VirtualMachineS) with:
- Instruction Pointer (
IP) and Frame Pointer (FP) registers. - A constant pool for literals.
- Global memory + an execution stack for locals, arguments, and return addresses.
- Opcodes for arithmetic, comparisons, control flow (
jump,jumpf), function calls (call), and returns (ret,retval). - Frame-pointer-relative locals:
lalloc n,lload addr,lstore addr.
Sample run
Input (factorial_test.tuga):
funcao principal()
inicio
escreve fact(3);
fim
funcao fact(n: inteiro): inteiro
inicio
se (n igual 0) retorna 1;
retorna n * fact(n-1);
fim
Output (after lexing, parsing, type checking, codegen, and execution):
*** Constant pool ***
*** Instructions ***
0: call 2
1: halt
2: iconst 3
3: call 6
4: iprint
5: ret 0
6: lload -1
7: iconst 0
8: ieq
9: jumpf 12
10: iconst 1
11: retval 1
...
*** VM output ***
6