T81 Foundation

Guide: Adding a Feature to T81Lang

This guide provides a step-by-step walkthrough for adding a new feature to the T81Lang language. We will use the example of adding a new binary operator, the modulo operator (%), to illustrate the process.


1. Frontend Architecture Overview

The T81Lang compiler frontend is responsible for converting .t81 source code into TISC IR. It follows a classic pipeline:

  1. Lexer: Converts source text into a stream of tokens.
  2. Parser: Builds an Abstract Syntax Tree (AST) from the token stream.
  3. Semantic Analyzer: Traverses the AST to check for type errors and resolve symbols (including Option/Result context rules, structural types, and match exhaustiveness).
  4. IR Generator: Traverses the AST to produce a linear sequence of TISC instructions.

This guide walks through modifying the Lexer, Parser, and IR Generator, then covers end-to-end testing and the Semantic Analyzer.


2. Step 1: Update the Lexer

First, teach the lexer to recognize the new syntax.

2.1 Add the Token Type

In include/t81/frontend/lexer.hpp, add a new entry to the TokenType enum.

// in enum class TokenType
// ...
Plus, Minus, Star, Slash, Percent, // <-- Add Percent
// ...

2.2 Recognize the Lexeme

In lang/frontend/lexer.cpp, find the next_token() method and add a case for the % character in the switch statement.

// in Lexer::next_token()
switch (c) {
    // ...
    case '%': return make_token(TokenType::Percent);
    // ...
}
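
To see the lexer change in isolation, here is a minimal, self-contained sketch. It is not the T81Lang lexer: the trimmed TokenType enum, the Token struct, and the tokenize() function are hypothetical stand-ins, and only the '%' case mirrors the edit described above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, trimmed-down token set; the real TokenType enum has more entries.
enum class TokenType { Number, Plus, Minus, Star, Slash, Percent, Unknown, Eof };

struct Token {
    TokenType type;
    std::string lexeme;
};

// Scans unsigned decimal literals and one-character operators.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;
            continue;
        }
        if (std::isdigit(static_cast<unsigned char>(c))) {
            const std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenType::Number, src.substr(start, i - start)});
            continue;
        }
        TokenType type = TokenType::Unknown;
        switch (c) {
            case '+': type = TokenType::Plus; break;
            case '-': type = TokenType::Minus; break;
            case '*': type = TokenType::Star; break;
            case '/': type = TokenType::Slash; break;
            case '%': type = TokenType::Percent; break;  // the newly recognized lexeme
            default: break;                              // anything else stays Unknown
        }
        tokens.push_back({type, std::string(1, c)});
        ++i;
    }
    tokens.push_back({TokenType::Eof, ""});
    return tokens;
}
```

Calling tokenize("7 % 3") in this sketch yields Number, Percent, Number, Eof. Without the '%' case the middle token would come back as Unknown, which is the kind of failure the real lexer change prevents.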

3. Step 2: Update the Parser

Next, update the parser to understand the operator’s precedence. The modulo operator has the same precedence as multiplication and division.

3.1 Update the Grammar Rule

In lang/frontend/parser.cpp, find the factor() method. Update the while loop to include TokenType::Percent.

// in Parser::factor()
std::unique_ptr<Expr> Parser::factor() {
    std::unique_ptr<Expr> expr = unary();
    // Add TokenType::Percent to this list
    while (match({TokenType::Slash, TokenType::Star, TokenType::Percent})) {
        Token op = previous();
        std::unique_ptr<Expr> right = unary();
        expr = std::make_unique<BinaryExpr>(std::move(expr), op, std::move(right));
    }
    return expr;
}

The parser now produces a BinaryExpr node for %, at the same precedence as * and / and left-associative with them.
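
The grouping that results from this rule can be checked with a small stand-alone sketch. MiniParser and parse_to_sexpr() are hypothetical names, not part of the T81Lang frontend; the sketch works directly on characters and renders the tree as an S-expression rather than building BinaryExpr nodes, but its factor() loop has the same shape as the one above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical mini-parser over unsigned decimal literals and + - * / %.
class MiniParser {
public:
    explicit MiniParser(const std::string& src) : src_(src) {}

    // term -> factor (('+' | '-') factor)*
    std::string term() {
        std::string expr = factor();
        while (match("+-")) {
            const char op = previous();
            const std::string right = factor();
            expr = node(op, expr, right);
        }
        return expr;
    }

private:
    // factor -> primary (('*' | '/' | '%') primary)*
    std::string factor() {
        std::string expr = primary();
        while (match("*/%")) {  // '%' sits beside '*' and '/'
            const char op = previous();
            const std::string right = primary();
            expr = node(op, expr, right);
        }
        return expr;
    }

    // primary -> unsigned decimal literal
    std::string primary() {
        skip_spaces();
        const std::size_t start = pos_;
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        return src_.substr(start, pos_ - start);
    }

    // Consumes the next character if it is one of the given operators.
    bool match(const std::string& ops) {
        skip_spaces();
        if (pos_ < src_.size() && ops.find(src_[pos_]) != std::string::npos) {
            ++pos_;
            return true;
        }
        return false;
    }

    char previous() const { return src_[pos_ - 1]; }

    void skip_spaces() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    static std::string node(char op, const std::string& l, const std::string& r) {
        return "(" + std::string(1, op) + " " + l + " " + r + ")";
    }

    std::string src_;
    std::size_t pos_ = 0;
};

std::string parse_to_sexpr(const std::string& src) { return MiniParser(src).term(); }
```

In this sketch, parse_to_sexpr("1 + 7 % 3 * 2") returns "(+ 1 (* (% 7 3) 2))": the % binds tighter than + and groups left to right with *.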


4. Step 3: Update the IR Generator

Finally, teach the IR generator how to convert the new AST node into a TISC instruction.

4.1 Implement the Visitor Logic

In lang/frontend/ir_generator.cpp, find the visit(const BinaryExpr& expr) method. Add a case for TokenType::Percent.

// in IRGenerator::visit(const BinaryExpr& expr)
std::any IRGenerator::visit(const BinaryExpr& expr) {
    // ... (visit left and right operands)

    switch (expr.op.type) {
        // ...
        case TokenType::Star:
            emit({tisc::Opcode::Mul, {result, left, right}});
            break;
        case TokenType::Percent: // <-- Add this case
            emit({tisc::Opcode::Mod, {result, left, right}});
            break;
        // ...
    }
    return result;
}

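The compiler can now emit the Mod TISC instruction for % expressions. This assumes tisc::Opcode::Mod already exists in the instruction set and is implemented by the VM.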
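
The visitor pattern above can be illustrated with a self-contained sketch of three-address code generation. Everything here is an assumption made for illustration: the Opcode subset, the Instr layout, the register numbering, and the MiniIRGen class are hypothetical and do not reflect the real tisc types or the IRGenerator interface.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical opcode subset; not the real tisc::Opcode.
enum class Opcode { LoadImm, Mul, Mod };

struct Instr {
    Opcode op;
    int dst;  // destination register
    int a;    // first source register, or the immediate value for LoadImm
    int b;    // second source register (unused for LoadImm)
};

// Hypothetical AST node: op == 0 marks a literal leaf.
struct Expr {
    char op = 0;
    int value = 0;
    std::unique_ptr<Expr> left;
    std::unique_ptr<Expr> right;
};

std::unique_ptr<Expr> lit(int v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->left = std::move(l);
    e->right = std::move(r);
    return e;
}

class MiniIRGen {
public:
    // Emits code for the expression and returns the register holding its value.
    int visit(const Expr& e) {
        if (e.op == 0) {
            const int reg = next_reg_++;
            code.push_back({Opcode::LoadImm, reg, e.value, 0});
            return reg;
        }
        const int left = visit(*e.left);    // operands are emitted first,
        const int right = visit(*e.right);  // left before right
        const int result = next_reg_++;
        switch (e.op) {
            case '*': code.push_back({Opcode::Mul, result, left, right}); break;
            case '%': code.push_back({Opcode::Mod, result, left, right}); break;  // new case
            default: throw std::invalid_argument("unsupported operator");
        }
        return result;
    }

    std::vector<Instr> code;

private:
    int next_reg_ = 0;
};
```

For the tree bin('%', lit(7), lit(3)), this sketch emits two LoadImm instructions into registers 0 and 1, followed by a Mod that writes register 2. An operator with no case in the switch fails loudly instead of silently emitting nothing.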


5. Step 4: Write an End-to-End Test

No feature is complete without a test. An end-to-end test is the best way to validate this change.

  1. Create a Test File: Add a new file in tests/cpp/, such as e2e_mod_test.cpp.
  2. Write the Test: The test should compile a snippet of T81Lang code using % and then execute it on the VM, asserting the final result is correct. See tests/cpp/e2e_arithmetic_test.cpp for a complete example.
  3. Add to CMake: Add your new test file as an executable and test target in the root CMakeLists.txt.

This process (Lexer -> Parser -> IR Generator -> E2E Test) is the standard workflow for adding new language features.

6. Step 5: Reinforce the Semantic Analyzer

The semantic analyzer enforces the invariants described in spec/t81lang-spec.md (§2.1 on generic types and §6.2 on match semantics). When you evolve the grammar (see RFC-0011 for the modern generic syntax), you must also extend SemanticAnalyzer so that generic inference, Option/Result exhaustiveness, and match lowering remain correct.

Keeping the semantic analyzer in lockstep with the spec lets the IR generator emit deterministic TISC control flow without rechecking every invariant itself.
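
As a rough illustration of what an exhaustiveness check involves, the sketch below reports which variants a match over an Option or Result leaves uncovered. It is not the SemanticAnalyzer implementation: ScrutineeKind and missing_variants() are hypothetical names, and the variant names Some/None and Ok/Err are assumed here rather than taken from the spec.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical classification of the value being matched on.
enum class ScrutineeKind { Option, Result };

// Returns the variants that the match arms fail to cover.
// An empty result means the match is exhaustive.
std::vector<std::string> missing_variants(ScrutineeKind kind,
                                          const std::set<std::string>& arms) {
    // Assumed variant names; the spec is the authority on the real ones.
    const std::vector<std::string> required =
        kind == ScrutineeKind::Option ? std::vector<std::string>{"Some", "None"}
                                      : std::vector<std::string>{"Ok", "Err"};
    std::vector<std::string> missing;
    for (const auto& variant : required) {
        if (arms.count(variant) == 0) missing.push_back(variant);
    }
    return missing;
}
```

A match on an Option with only a Some arm would be reported as missing None. Rejecting such a match during semantic analysis is what lets later stages assume every match has a branch for every variant.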