Dynamic Governor Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Replace the static per-token mask governor with a dynamic generate-test-revise loop that checks completed words at word boundaries using G2P, with informed backtracking on violations.

Architecture: Four layers built bottom-up: (1) G2P wrapper + phonological checks, (2) word-level constraint checker, (3) custom generation loop with backtracking, (4) constraint compiler that bridges the user-facing API to the new system. Existing boost mechanisms (LogitBoost, CDD, Include, Thematic) are preserved and relocated. HardGate is removed.
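The generate-test-revise flow in layers 2 and 3 can be sketched independently of any model. Everything below is illustrative: propose, CANDIDATES, and the letter-based check are hypothetical stand-ins for token sampling, T5Gemma, and the G2P-backed checker, not the real API.

```python
def generate_checked(propose, check, max_attempts=5):
    """Word-level generate-test-revise loop (sketch).

    propose(pos, banned) returns the next candidate word for this position,
    skipping banned attempts, or None when generation is finished.
    check(word) returns True if the completed word passes all constraints.
    """
    words, pos = [], 0
    while True:
        banned = set()                      # failure history at this position
        for _ in range(max_attempts):
            word = propose(pos, banned)
            if word is None:
                return words                # end of generation
            if check(word):                 # word boundary: run the checker
                words.append(word)
                pos += 1
                break
            banned.add(word)                # backtrack: ban attempt, resample
        else:
            return words                    # attempts exhausted: stop here

# Scripted stand-in for the model: ranked candidates per position.
CANDIDATES = [["running", "walking"], ["rover", "spot"], [None]]

def propose(pos, banned):
    for w in CANDIDATES[pos]:
        if w is None or w not in banned:
            return w
    return None

def no_r(word):                             # toy check: orthographic, not G2P
    return "r" not in word

print(generate_checked(propose, no_r))      # ['walking', 'spot']
```

Task 6 implements the real version of this loop over token logits, using the checker from Task 3 and the backtrack state from Task 4.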

Tech Stack: Python 3.12, PyTorch, g2p_en, phonolex_data (syllabification, WCM, norms), T5Gemma on MPS

Spec: docs/superpowers/specs/2026-04-06-dynamic-governor-design.md


File Map

All paths relative to packages/governors/src/phonolex_governors/.

New files (the dynamic system)

| File | Responsibility |
| --- | --- |
| checking/__init__.py | Package exports |
| checking/g2p.py | g2p_en wrapper: word → phoneme list, caching, ARPAbet→IPA |
| checking/phonology.py | Phoneme exclusion, MSH stage, cluster detection from phoneme list |
| checking/checker.py | Orchestrate all constraint checks on a completed word |
| generation/__init__.py | Package exports |
| generation/loop.py | Custom autoregressive generation with word-boundary checkpoints |
| generation/backtrack.py | Backtrack state, failure history, intervention planning |
| generation/sampling.py | Top-k/p sampling, distribution-aware reweighting |
| constraints/__init__.py | Package exports |
| constraints/types.py | Constraint type definitions (dataclasses) |
| constraints/compiler.py | Compile constraint list → CheckerConfig + BoostConfig |
| boosts/__init__.py | Package exports |
| boosts/logit_boost.py | Relocated from boosts.py |
| boosts/include.py | Relocated from include.py |
| boosts/thematic.py | Relocated from thematic.py |
| boosts/cdd.py | Relocated from cdd.py |
| data.py | PhonoLex norm/vocab lookup for word-level checks |

Modified files

| File | Change |
| --- | --- |
| __init__.py | Update exports for new structure |
| pyproject.toml | Add g2p-en dependency |

Test files

| File | Tests |
| --- | --- |
| tests/test_g2p.py | G2P wrapper caching, normalization, edge cases |
| tests/test_phonology.py | Phoneme exclusion, MSH, clusters from phoneme lists |
| tests/test_checker.py | Word checker with multiple constraint types |
| tests/test_loop.py | Generation loop with mock model, backtracking |
| tests/test_backtrack.py | Failure history, intervention escalation |
| tests/test_sampling.py | Top-k/p sampling, banned sequence enforcement |
| tests/test_compiler.py | Constraint compilation to CheckerConfig + BoostConfig |

Files to remove (after migration)

| File | Reason |
| --- | --- |
| gates.py | HardGate replaced by word-level checking |
| core.py | Governor class replaced by generation loop |
| constraints.py | Replaced by constraints/types.py + constraints/compiler.py |
| boosts.py | Relocated to boosts/logit_boost.py |
| include.py | Relocated to boosts/include.py |
| thematic.py | Relocated to boosts/thematic.py |
| cdd.py | Relocated to boosts/cdd.py |

Old files are removed in the final task after all consumers are migrated.


Task 1: G2P Wrapper

The foundation — everything else depends on being able to get phonemes from a word.
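Before the wrapper, a standalone refresher on the ARPAbet conventions the rest of the plan relies on (the phoneme strings are CMUdict values, no g2p_en required): vowels carry a trailing stress digit (0/1/2), consonants do not, and stripping the digit recovers the base phoneme that stage and class lookups key on.

```python
CAT = ["K", "AE1", "T"]                    # CMUdict phonemes for "cat"
RUNNING = ["R", "AH1", "N", "IH0", "NG"]   # CMUdict phonemes for "running"

def base(p: str) -> str:
    """Strip the stress digit to get the base phoneme."""
    return p.rstrip("012")

def is_vowel(p: str) -> bool:
    """In stress-marked ARPAbet, only vowels end with a digit."""
    return p[-1].isdigit()

assert [base(p) for p in CAT] == ["K", "AE", "T"]
assert [p for p in RUNNING if is_vowel(p)] == ["AH1", "IH0"]
```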

Files:
- Create: packages/governors/src/phonolex_governors/checking/__init__.py
- Create: packages/governors/src/phonolex_governors/checking/g2p.py
- Create: packages/governors/tests/test_g2p.py
- Modify: packages/governors/pyproject.toml (add g2p-en dep)

  • [ ] Step 1: Add g2p-en dependency

In packages/governors/pyproject.toml, change:

dependencies = [
    "torch>=2.0",
]

to:

dependencies = [
    "torch>=2.0",
    "g2p-en>=2.1.0",
]

  • [ ] Step 2: Create checking package

Create packages/governors/src/phonolex_governors/checking/__init__.py:

"""Word-level constraint checking."""

  • [ ] Step 3: Write the failing tests

Create packages/governors/tests/test_g2p.py:

"""Tests for G2P wrapper."""

import pytest
from phonolex_governors.checking.g2p import word_to_phonemes, word_has_phoneme, G2PCache


def test_word_to_phonemes_returns_list():
    phonemes = word_to_phonemes("cat")
    assert isinstance(phonemes, list)
    assert len(phonemes) > 0


def test_word_to_phonemes_known_word():
    phonemes = word_to_phonemes("cat")
    assert "K" in phonemes
    assert "AE1" in phonemes
    assert "T" in phonemes


def test_word_to_phonemes_oov_word():
    """G2P should handle out-of-vocabulary words via neural fallback."""
    phonemes = word_to_phonemes("flibbertigibbet")
    assert isinstance(phonemes, list)
    assert len(phonemes) > 0


def test_word_has_phoneme_r():
    assert word_has_phoneme("running", "R") is True
    assert word_has_phoneme("cat", "R") is False


def test_word_has_phoneme_rhotacized_vowels():
    """ER0/ER1/ER2 are rhotacized vowels — should match R check."""
    assert word_has_phoneme("verdant", "R") is True  # contains ER


def test_word_has_phoneme_case_insensitive_input():
    """Input word should be case-insensitive."""
    assert word_to_phonemes("Cat") == word_to_phonemes("cat")


def test_word_to_phonemes_punctuation_returns_empty():
    assert word_to_phonemes(",") == []
    assert word_to_phonemes("") == []
    assert word_to_phonemes("123") == []


def test_cache_returns_same_result():
    cache = G2PCache()
    result1 = cache.get("cat")
    result2 = cache.get("cat")
    assert result1 == result2
    assert cache.hits == 1  # second call was a cache hit


def test_cache_different_words():
    cache = G2PCache()
    cat = cache.get("cat")
    dog = cache.get("dog")
    assert cat != dog
    assert cache.hits == 0  # no cache hits, both were misses
  • [ ] Step 4: Run tests to verify they fail

Run: cd packages/governors && uv run python -m pytest tests/test_g2p.py -v
Expected: FAIL — module phonolex_governors.checking.g2p not found

  • [ ] Step 5: Implement g2p.py

Create packages/governors/src/phonolex_governors/checking/g2p.py:

"""G2P wrapper — word to phonemes via g2p_en.

Provides cached phoneme resolution for any English word, including
out-of-vocabulary forms. Returns ARPAbet phonemes.

The R_PHONEMES set includes both /R/ and rhotacized vowels (ER0/ER1/ER2)
since clinically these are all "r-colored" sounds that SLPs treat as
part of the /r/ phoneme class.
"""

from __future__ import annotations

from g2p_en import G2p

# ARPAbet phonemes that count as "r-colored" for clinical purposes
R_PHONEMES = {"R", "ER0", "ER1", "ER2"}

# Singleton G2P instance (loads model on first use)
_g2p: G2p | None = None


def _get_g2p() -> G2p:
    global _g2p
    if _g2p is None:
        _g2p = G2p()
    return _g2p


def word_to_phonemes(word: str) -> list[str]:
    """Convert a word to ARPAbet phonemes.

    Returns an empty list for non-alphabetic input (punctuation, numbers, empty).
    """
    clean = word.strip().lower()
    if not clean or not clean.isalpha():
        return []
    g2p = _get_g2p()
    result = g2p(clean)
    # g2p_en can return non-phoneme tokens (spaces, punctuation) for some
    # inputs; keep only ARPAbet tokens (uppercase, optional stress digit)
    return [p for p in result if p and p[0].isupper()]


def word_has_phoneme(word: str, phoneme: str) -> bool:
    """Check if a word contains a specific ARPAbet phoneme.

    For 'R', also checks rhotacized vowels (ER0, ER1, ER2).
    """
    phonemes = word_to_phonemes(word)
    if phoneme == "R":
        return bool(R_PHONEMES & set(phonemes))
    return phoneme in phonemes


class G2PCache:
    """Per-generation phoneme cache. Create one per generation call."""

    def __init__(self):
        self._cache: dict[str, list[str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, word: str) -> list[str]:
        key = word.strip().lower()
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = word_to_phonemes(key)
        self._cache[key] = result
        return result
  • [ ] Step 6: Run tests to verify they pass

Run: cd packages/governors && uv run python -m pytest tests/test_g2p.py -v
Expected: All pass

  • [ ] Step 7: Commit
git add packages/governors/src/phonolex_governors/checking/ packages/governors/tests/test_g2p.py packages/governors/pyproject.toml
git commit -m "feat(governors): G2P wrapper with caching for word-level phoneme resolution"

Task 2: Phonological Check Functions

Pure functions that take a phoneme list and check specific phonological constraints. No G2P, no generation — just phoneme list in, pass/fail out.
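As a worked example of the MSH-stage rule the module below implements: a word passes at max_stage if its highest-stage consonant is at or below that stage. The helper and stage excerpt here are illustrative only; the excerpt treats anything it doesn't list as a vowel (stage 1), whereas the real module defaults unknown consonants to stage 5.

```python
# Excerpt of the MSH stage table defined later in this task.
STAGES = {"M": 3, "P": 3, "B": 3, "S": 5, "T": 5}

def max_stage(phonemes):
    """Highest MSH stage found; unlisted phonemes count as stage-1 vowels."""
    found = 1
    for p in phonemes:
        found = max(found, STAGES.get(p.rstrip("012"), 1))
    return found

assert max_stage(["M", "AE1", "P"]) == 3   # "map": passes max_stage=3
assert max_stage(["S", "AE1", "T"]) == 5   # "sat": fails max_stage=3
```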

Files:
- Create: packages/governors/src/phonolex_governors/checking/phonology.py
- Create: packages/governors/tests/test_phonology.py

  • [ ] Step 1: Write the failing tests

Create packages/governors/tests/test_phonology.py:

"""Tests for phonological check functions."""

import pytest
from phonolex_governors.checking.phonology import (
    check_exclude,
    check_exclude_clusters,
    check_msh_stage,
)


# --- Exclude ---

def test_exclude_catches_r():
    phonemes = ["R", "AH1", "N", "IH0", "NG"]  # "running"
    result = check_exclude(phonemes, excluded={"R"})
    assert not result.passed
    assert "R" in result.found_phonemes


def test_exclude_passes_clean():
    phonemes = ["K", "AE1", "T"]  # "cat"
    result = check_exclude(phonemes, excluded={"R"})
    assert result.passed


def test_exclude_catches_rhotacized_vowel():
    phonemes = ["V", "ER1", "D", "AH0", "N", "T"]  # "verdant"
    result = check_exclude(phonemes, excluded={"R"})
    assert not result.passed


def test_exclude_multiple_phonemes():
    phonemes = ["S", "T", "R", "IY1", "T"]  # "street"
    result = check_exclude(phonemes, excluded={"R", "S"})
    assert not result.passed
    assert {"R", "S"} <= result.found_phonemes


# --- MSH Stage ---

def test_msh_stage_pass():
    phonemes = ["M", "AE1", "P"]  # "map" — all stage ≤3
    result = check_msh_stage(phonemes, max_stage=3)
    assert result.passed


def test_msh_stage_fail():
    phonemes = ["S", "AE1", "T"]  # "sat" — /S/ is stage 5
    result = check_msh_stage(phonemes, max_stage=3)
    assert not result.passed
    assert result.max_found_stage == 5


def test_msh_stage_vowel_only():
    phonemes = ["AH0"]
    result = check_msh_stage(phonemes, max_stage=2)
    assert result.passed


# --- Exclude in clusters ---

def test_exclude_clusters_catches_s_cluster():
    phonemes = ["S", "T", "R", "IY1", "T"]  # "street"
    syllables = [{"onset": ["S", "T", "R"], "nucleus": ["IY1"], "coda": ["T"]}]
    result = check_exclude_clusters(phonemes, syllables, excluded={"S"})
    assert not result.passed


def test_exclude_clusters_passes_singleton_onset():
    phonemes = ["S", "AE1", "T"]  # "sat" — /S/ is singleton onset, not cluster
    syllables = [{"onset": ["S"], "nucleus": ["AE1"], "coda": ["T"]}]
    result = check_exclude_clusters(phonemes, syllables, excluded={"S"})
    assert result.passed
  • [ ] Step 2: Run tests to verify they fail

Run: cd packages/governors && uv run python -m pytest tests/test_phonology.py -v
Expected: FAIL — module not found

  • [ ] Step 3: Implement phonology.py

Create packages/governors/src/phonolex_governors/checking/phonology.py:

"""Phonological check functions — phoneme list in, pass/fail out.

These are pure functions with no G2P dependency. They operate on
ARPAbet phoneme lists and syllable structures.
"""

from __future__ import annotations

from dataclasses import dataclass, field

from phonolex_governors.checking.g2p import R_PHONEMES


# ARPAbet consonants → MSH stage mapping
# Namasivayam et al. (2021)
_MSH_STAGES: dict[str, int] = {
    "HH": 2,
    "P": 3, "B": 3, "M": 3,
    "F": 4, "W": 4, "R": 4,
    "T": 5, "D": 5, "K": 5, "G": 5, "N": 5,
    "S": 5, "Z": 5, "L": 5, "NG": 5,
    "SH": 5, "ZH": 5, "CH": 5, "JH": 5,
    "TH": 5, "DH": 5, "V": 5, "Y": 5,
}

# ARPAbet vowels (including stressed variants)
_ARPABET_VOWELS = {
    f"{v}{s}" for v in [
        "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
        "IH", "IY", "OW", "OY", "UH", "UW",
    ] for s in ["", "0", "1", "2"]
}


@dataclass
class ExcludeResult:
    passed: bool
    found_phonemes: set[str] = field(default_factory=set)


@dataclass
class MSHResult:
    passed: bool
    max_found_stage: int = 1


@dataclass
class ClusterExcludeResult:
    passed: bool
    found_in_clusters: set[str] = field(default_factory=set)


def check_exclude(phonemes: list[str], excluded: set[str]) -> ExcludeResult:
    """Check if a phoneme list contains any excluded phonemes.

    For 'R' exclusion, also catches rhotacized vowels (ER0/ER1/ER2).
    """
    phoneme_set = set(phonemes)
    found: set[str] = set()
    for ex in excluded:
        if ex == "R":
            overlap = phoneme_set & R_PHONEMES
            if overlap:
                found.add("R")
        elif ex in phoneme_set:
            found.add(ex)
    return ExcludeResult(passed=len(found) == 0, found_phonemes=found)


def check_msh_stage(phonemes: list[str], max_stage: int) -> MSHResult:
    """Check if all consonants are at or below the max MSH stage.

    Vowels are stage 1. Unknown consonants default to stage 5.
    """
    max_found = 1
    for p in phonemes:
        # Strip stress digits for lookup
        base = p.rstrip("012")
        stage = _MSH_STAGES.get(base)
        if stage is not None:
            max_found = max(max_found, stage)
        elif p not in _ARPABET_VOWELS:
            # Unknown consonant — conservative default
            max_found = 5
    return MSHResult(passed=max_found <= max_stage, max_found_stage=max_found)


def check_exclude_clusters(
    phonemes: list[str],
    syllables: list[dict],
    excluded: set[str],
) -> ClusterExcludeResult:
    """Check if excluded phonemes appear in consonant clusters.

    Only flags the phoneme when it's in an onset or coda with ≥2 consonants.
    """
    found: set[str] = set()
    for syl in syllables:
        onset = syl.get("onset", [])
        coda = syl.get("coda", [])
        if len(onset) >= 2:
            for p in onset:
                if p in excluded:
                    found.add(p)
        if len(coda) >= 2:
            for p in coda:
                if p in excluded:
                    found.add(p)
    return ClusterExcludeResult(passed=len(found) == 0, found_in_clusters=found)
  • [ ] Step 4: Run tests to verify they pass

Run: cd packages/governors && uv run python -m pytest tests/test_phonology.py -v
Expected: All pass

  • [ ] Step 5: Commit
git add packages/governors/src/phonolex_governors/checking/phonology.py packages/governors/tests/test_phonology.py
git commit -m "feat(governors): phonological check functions for exclude, MSH, clusters"

Task 3: Word Checker

Orchestrates all constraint checks on a single completed word. Takes a word string and a CheckerConfig, returns a CheckResult.

Files:
- Create: packages/governors/src/phonolex_governors/checking/checker.py
- Create: packages/governors/tests/test_checker.py

  • [ ] Step 1: Write the failing tests

Create packages/governors/tests/test_checker.py:

"""Tests for word-level constraint checker."""

import pytest
from phonolex_governors.checking.checker import (
    check_word, CheckerConfig, CheckResult, PhonemeExcludeCheck,
    MSHCheck, BoundCheck,
)
from phonolex_governors.checking.g2p import G2PCache


def test_exclude_r_catches_running():
    config = CheckerConfig(checks=[PhonemeExcludeCheck(excluded={"R"})])
    cache = G2PCache()
    result = check_word("running", config, cache)
    assert not result.passed
    assert len(result.violations) == 1
    assert "R" in result.violations[0].details


def test_exclude_r_passes_cat():
    config = CheckerConfig(checks=[PhonemeExcludeCheck(excluded={"R"})])
    cache = G2PCache()
    result = check_word("cat", config, cache)
    assert result.passed
    assert len(result.violations) == 0


def test_msh_stage_check():
    config = CheckerConfig(checks=[MSHCheck(max_stage=3)])
    cache = G2PCache()
    # "map" = /M AE P/ — all ≤ stage 3
    assert check_word("map", config, cache).passed
    # "sat" = /S AE T/ — S is stage 5
    assert not check_word("sat", config, cache).passed


def test_bound_check_with_norms():
    norms = {"cat": {"aoa_kuperman": 3.2}, "elephant": {"aoa_kuperman": 6.8}}
    config = CheckerConfig(
        checks=[BoundCheck(norm="aoa_kuperman", max_val=5.0)],
        norm_lookup=norms,
    )
    cache = G2PCache()
    assert check_word("cat", config, cache).passed
    assert not check_word("elephant", config, cache).passed


def test_bound_check_unknown_word_passes():
    """Words not in the norm lookup pass bound checks (no data = no violation)."""
    config = CheckerConfig(
        checks=[BoundCheck(norm="aoa_kuperman", max_val=5.0)],
        norm_lookup={},
    )
    cache = G2PCache()
    assert check_word("xyzzy", config, cache).passed


def test_multiple_checks_all_must_pass():
    norms = {"street": {"aoa_kuperman": 4.0}}
    config = CheckerConfig(checks=[
        PhonemeExcludeCheck(excluded={"R"}),
        BoundCheck(norm="aoa_kuperman", max_val=5.0),
    ], norm_lookup=norms)
    cache = G2PCache()
    # "street" passes AoA but fails R exclusion
    result = check_word("street", config, cache)
    assert not result.passed
    assert len(result.violations) == 1


def test_punctuation_always_passes():
    config = CheckerConfig(checks=[PhonemeExcludeCheck(excluded={"R"})])
    cache = G2PCache()
    assert check_word(",", config, cache).passed
    assert check_word(".", config, cache).passed
  • [ ] Step 2: Run tests to verify they fail

Run: cd packages/governors && uv run python -m pytest tests/test_checker.py -v
Expected: FAIL — module not found

  • [ ] Step 3: Implement checker.py

Create packages/governors/src/phonolex_governors/checking/checker.py:

"""Word-level constraint checker.

Pure function: (word, config, cache) → CheckResult.
Knows nothing about generation or backtracking.
"""

from __future__ import annotations

from dataclasses import dataclass, field

from phonolex_governors.checking.g2p import G2PCache
from phonolex_governors.checking.phonology import (
    check_exclude, check_msh_stage, check_exclude_clusters,
)


@dataclass
class Violation:
    check_type: str
    details: str


@dataclass
class CheckResult:
    passed: bool
    violations: list[Violation] = field(default_factory=list)


# --- Check types ---

@dataclass
class PhonemeExcludeCheck:
    excluded: set[str]


@dataclass
class PhonemeExcludeClustersCheck:
    excluded: set[str]


@dataclass
class MSHCheck:
    max_stage: int


@dataclass
class BoundCheck:
    norm: str
    min_val: float | None = None
    max_val: float | None = None


@dataclass
class VocabOnlyCheck:
    allowed_words: set[str] | None = None
    allowed_lists: set[str] | None = None


Check = PhonemeExcludeCheck | PhonemeExcludeClustersCheck | MSHCheck | BoundCheck | VocabOnlyCheck


@dataclass
class CheckerConfig:
    checks: list[Check] = field(default_factory=list)
    norm_lookup: dict[str, dict[str, float]] = field(default_factory=dict)
    vocab_lookup: dict[str, set[str]] = field(default_factory=dict)


def check_word(word: str, config: CheckerConfig, cache: G2PCache) -> CheckResult:
    """Check a completed word against all configured constraints.

    Non-alphabetic tokens (punctuation, numbers) always pass.
    """
    clean = word.strip().lower()
    if not clean or not clean.isalpha():
        return CheckResult(passed=True)

    violations: list[Violation] = []
    phonemes = cache.get(clean)

    for check in config.checks:
        if isinstance(check, PhonemeExcludeCheck):
            result = check_exclude(phonemes, check.excluded)
            if not result.passed:
                violations.append(Violation(
                    check_type="exclude",
                    details=f"contains {','.join(sorted(result.found_phonemes))}",
                ))

        elif isinstance(check, MSHCheck):
            result = check_msh_stage(phonemes, check.max_stage)
            if not result.passed:
                violations.append(Violation(
                    check_type="msh",
                    details=f"stage {result.max_found_stage} > {check.max_stage}",
                ))

        elif isinstance(check, BoundCheck):
            norms = config.norm_lookup.get(clean, {})
            val = norms.get(check.norm)
            if val is not None:
                if check.min_val is not None and val < check.min_val:
                    violations.append(Violation(
                        check_type="bound",
                        details=f"{check.norm}={val:.1f} < {check.min_val}",
                    ))
                if check.max_val is not None and val > check.max_val:
                    violations.append(Violation(
                        check_type="bound",
                        details=f"{check.norm}={val:.1f} > {check.max_val}",
                    ))

        elif isinstance(check, VocabOnlyCheck):
            if check.allowed_words is not None:
                if clean not in check.allowed_words:
                    violations.append(Violation(
                        check_type="vocab_only",
                        details=f"'{clean}' not in allowed words",
                    ))
            if check.allowed_lists is not None:
                memberships = config.vocab_lookup.get(clean, set())
                if not (check.allowed_lists & memberships):
                    violations.append(Violation(
                        check_type="vocab_only",
                        details=f"'{clean}' not in required lists",
                    ))

        elif isinstance(check, PhonemeExcludeClustersCheck):
            # Requires syllable structure: syllabify the G2P phonemes.
            # Imported lazily to keep the phonolex_data dependency optional.
            from phonolex_data.phonology.syllabification import syllabify, PhonemeWithStress
            from phonolex_governors.checking.phonology import _ARPABET_VOWELS
            pws = [PhonemeWithStress(p, 1 if p in _ARPABET_VOWELS else None) for p in phonemes]
            syls = syllabify(pws)
            if syls:
                syl_dicts = [{"onset": s.onset, "nucleus": s.nucleus, "coda": s.coda} for s in syls]
                result = check_exclude_clusters(phonemes, syl_dicts, check.excluded)
                if not result.passed:
                    violations.append(Violation(
                        check_type="exclude_clusters",
                        details=f"cluster contains {','.join(sorted(result.found_in_clusters))}",
                    ))

    return CheckResult(passed=len(violations) == 0, violations=violations)
  • [ ] Step 4: Run tests to verify they pass

Run: cd packages/governors && uv run python -m pytest tests/test_checker.py -v
Expected: All pass

  • [ ] Step 5: Commit
git add packages/governors/src/phonolex_governors/checking/checker.py packages/governors/tests/test_checker.py
git commit -m "feat(governors): word-level constraint checker with phoneme, MSH, and bound checks"

Task 4: Backtrack Engine

Manages backtrack state, failure history, and intervention planning. No model dependency — pure state machine.
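The escalation schedule the state machine implements can be shown in isolation: each failure widens the banned set, nudges temperature up (capped at +0.3), and flips the escalated flag from the third attempt on. The schedule helper below is a hypothetical name for illustration; the real logic lives in BacktrackState.plan_intervention.

```python
def schedule(attempts: int) -> dict:
    """Intervention parameters after a given number of failed attempts."""
    return {
        "temperature_delta": min(0.1 * attempts, 0.3),  # capped at +0.3
        "escalated": attempts >= 3,                     # widen the backtrack
    }

assert schedule(1) == {"temperature_delta": 0.1, "escalated": False}
assert schedule(3) == {"temperature_delta": 0.3, "escalated": True}
assert schedule(5)["temperature_delta"] == 0.3          # still capped
```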

Files:
- Create: packages/governors/src/phonolex_governors/generation/__init__.py
- Create: packages/governors/src/phonolex_governors/generation/backtrack.py
- Create: packages/governors/tests/test_backtrack.py

  • [ ] Step 1: Write the failing tests

Create packages/governors/tests/test_backtrack.py:

"""Tests for backtrack engine."""

import pytest
from phonolex_governors.generation.backtrack import BacktrackState, Intervention
from phonolex_governors.checking.checker import CheckResult, Violation


def test_initial_state_no_intervention():
    state = BacktrackState()
    intervention = state.plan_intervention()
    assert intervention.banned_sequences == []
    assert intervention.logit_bias == {}


def test_record_failure_tracks_history():
    state = BacktrackState(word_start_idx=5)
    result = CheckResult(passed=False, violations=[
        Violation(check_type="exclude", details="contains R"),
    ])
    state.record_failure(result, token_ids=[10, 20, 30])
    assert state.attempt_count == 1
    assert len(state.failure_history) == 1


def test_plan_intervention_bans_sequence():
    state = BacktrackState(word_start_idx=5)
    result = CheckResult(passed=False, violations=[
        Violation(check_type="exclude", details="contains R"),
    ])
    state.record_failure(result, token_ids=[10, 20, 30])
    intervention = state.plan_intervention()
    assert [10, 20, 30] in intervention.banned_sequences


def test_escalation_after_repeated_failures():
    state = BacktrackState(word_start_idx=5)
    for i in range(4):
        result = CheckResult(passed=False, violations=[
            Violation(check_type="exclude", details="contains R"),
        ])
        state.record_failure(result, token_ids=[10 + i])

    intervention = state.plan_intervention()
    # After 3+ failures, should escalate (wider backtrack)
    assert intervention.escalated


def test_max_attempts_reached():
    state = BacktrackState(word_start_idx=5, max_attempts=3)
    for i in range(3):
        state.record_failure(
            CheckResult(passed=False, violations=[Violation("exclude", "R")]),
            token_ids=[i],
        )
    assert state.exhausted
  • [ ] Step 2: Run tests to verify they fail

Run: cd packages/governors && uv run python -m pytest tests/test_backtrack.py -v
Expected: FAIL — module not found

  • [ ] Step 3: Implement backtrack.py

Create packages/governors/src/phonolex_governors/generation/__init__.py:

"""Generation loop with word-boundary checking and backtracking."""

Create packages/governors/src/phonolex_governors/generation/backtrack.py:

"""Backtrack state and intervention planning.

Tracks failure history at a word position and escalates interventions
based on repeated failures.
"""

from __future__ import annotations

from dataclasses import dataclass, field

from phonolex_governors.checking.checker import CheckResult


@dataclass
class Intervention:
    """What to do before retrying after a backtrack."""
    banned_sequences: list[list[int]] = field(default_factory=list)
    logit_bias: dict[int, float] = field(default_factory=dict)
    temperature_delta: float = 0.0
    escalated: bool = False


@dataclass
class BacktrackState:
    """Tracks backtrack attempts at a single word position."""
    word_start_idx: int = 0
    max_attempts: int = 5
    failure_history: list[tuple[CheckResult, list[int]]] = field(default_factory=list)

    @property
    def attempt_count(self) -> int:
        return len(self.failure_history)

    @property
    def exhausted(self) -> bool:
        return self.attempt_count >= self.max_attempts

    def record_failure(self, result: CheckResult, token_ids: list[int]) -> None:
        """Record a failed word attempt."""
        self.failure_history.append((result, list(token_ids)))

    def plan_intervention(self) -> Intervention:
        """Plan the intervention for the next retry based on failure history."""
        if not self.failure_history:
            return Intervention()

        banned = [ids for _, ids in self.failure_history]
        escalated = self.attempt_count >= 3

        # Progressive temperature increase to explore more
        temp_delta = min(0.1 * self.attempt_count, 0.3)

        return Intervention(
            banned_sequences=banned,
            temperature_delta=temp_delta,
            escalated=escalated,
        )

    def reset(self) -> None:
        """Reset state for a new word position."""
        self.failure_history.clear()
  • [ ] Step 4: Run tests to verify they pass

Run: cd packages/governors && uv run python -m pytest tests/test_backtrack.py -v
Expected: All pass

  • [ ] Step 5: Commit
git add packages/governors/src/phonolex_governors/generation/ packages/governors/tests/test_backtrack.py
git commit -m "feat(governors): backtrack engine with failure history and escalating interventions"

Task 5: Sampling Utilities

Top-k/p sampling and banned sequence enforcement. These are used by the generation loop.

Files:
- Create: packages/governors/src/phonolex_governors/generation/sampling.py
- Create: packages/governors/tests/test_sampling.py
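The nucleus (top-p) cutoff rule used in this task's implementation can be checked by hand on a tiny distribution, in plain Python with no torch: a token survives while the cumulative probability of the tokens ranked above it is still below top_p.

```python
import math

logits = [2.0, 1.0, 0.5, -1.0]       # already sorted descending
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]  # ≈ [0.609, 0.224, 0.136, 0.030]

kept, cum = [], 0.0
for i, p in enumerate(probs):
    if cum < 0.9:                    # mass above this token is under top_p
        kept.append(i)
    cum += p

assert kept == [0, 1, 2]             # the last token falls outside the nucleus
```

This is the same rule as the implementation's cumulative_probs minus current-prob comparison: here cum holds exactly that shifted cumulative sum.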

  • [ ] Step 1: Write the failing tests

Create packages/governors/tests/test_sampling.py:

"""Tests for sampling utilities."""

import torch
import pytest
from phonolex_governors.generation.sampling import (
    sample_token, apply_banned_sequences,
)


def test_sample_token_returns_valid_id():
    logits = torch.randn(1, 100)
    token_id = sample_token(logits, temperature=1.0, top_k=50, top_p=0.9)
    assert 0 <= token_id < 100


def test_sample_token_deterministic_at_low_temp():
    logits = torch.zeros(1, 100)
    logits[0, 42] = 100.0  # overwhelmingly likely
    token_id = sample_token(logits, temperature=0.01, top_k=50, top_p=0.9)
    assert token_id == 42


def test_apply_banned_sequences_blocks_matching():
    logits = torch.zeros(1, 100)
    # Generated so far: [10, 20] — if banned sequence is [10, 20, 30],
    # then token 30 should be blocked
    generated = [10, 20]
    banned = [[10, 20, 30]]
    result = apply_banned_sequences(logits, generated, banned)
    assert result[0, 30].item() < -1e8


def test_apply_banned_sequences_no_match():
    logits = torch.zeros(1, 100)
    generated = [10, 25]  # doesn't match prefix [10, 20]
    banned = [[10, 20, 30]]
    result = apply_banned_sequences(logits, generated, banned)
    assert result[0, 30].item() == 0.0  # not blocked


def test_apply_banned_sequences_multiple():
    logits = torch.zeros(1, 100)
    generated = [10, 20]
    banned = [[10, 20, 30], [10, 20, 40]]
    result = apply_banned_sequences(logits, generated, banned)
    assert result[0, 30].item() < -1e8
    assert result[0, 40].item() < -1e8
    assert result[0, 50].item() == 0.0  # not affected
  • [ ] Step 2: Run tests to verify they fail

Run: cd packages/governors && uv run python -m pytest tests/test_sampling.py -v
Expected: FAIL

  • [ ] Step 3: Implement sampling.py

Create packages/governors/src/phonolex_governors/generation/sampling.py:

"""Sampling utilities for the generation loop."""

from __future__ import annotations

import torch


def sample_token(
    logits: torch.Tensor,
    temperature: float = 0.8,
    top_k: int = 50,
    top_p: float = 0.9,
) -> int:
    """Sample a single token from logits with temperature, top-k, and top-p.

    Args:
        logits: (1, vocab_size) raw logits
        temperature: Sampling temperature
        top_k: Keep only top-k tokens
        top_p: Nucleus sampling threshold

    Returns:
        Token ID (int)
    """
    logits = logits[0] / max(temperature, 1e-8)

    # Top-k filtering
    if top_k > 0:
        topk_vals, _ = torch.topk(logits, min(top_k, logits.size(-1)))
        logits[logits < topk_vals[-1]] = -float("inf")

    # Top-p (nucleus) filtering
    if top_p < 1.0:
        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
        cumulative_probs = torch.cumsum(torch.softmax(sorted_logits, dim=-1), dim=-1)
        cutoff_mask = cumulative_probs - torch.softmax(sorted_logits, dim=-1) >= top_p
        sorted_logits[cutoff_mask] = -float("inf")
        logits = sorted_logits.scatter(0, sorted_indices, sorted_logits)

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()


def apply_banned_sequences(
    logits: torch.Tensor,
    generated: list[int],
    banned: list[list[int]],
) -> torch.Tensor:
    """Block tokens that would continue a banned sequence.

    For each banned sequence, if the generated tokens end with its prefix,
    the next token in the banned sequence gets -inf logits.
    """
    logits = logits.clone()
    for seq in banned:
        if not seq:
            continue
        if len(seq) == 1:
            # Empty prefix matches anywhere: a single-token sequence is an
            # unconditional ban on that token
            logits[0, seq[0]] = -float("inf")
            continue
        prefix = seq[:-1]
        next_token = seq[-1]
        # Check if generated ends with this prefix
        if len(generated) >= len(prefix) and generated[-len(prefix):] == prefix:
            logits[0, next_token] = -float("inf")
    return logits
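The exclusive-cumsum shift in the top-p filter above (subtracting each token's own probability before comparing against the threshold) is what keeps the first token that crosses `top_p` inside the nucleus. A standalone sketch of just that arithmetic, on toy logits:

```python
import torch

# Toy logits, already in descending order for readability
logits = torch.tensor([2.0, 1.0, 0.0, -1.0])
sorted_logits, sorted_indices = torch.sort(logits, descending=True)
sorted_probs = torch.softmax(sorted_logits, dim=-1)   # ≈ [0.64, 0.24, 0.09, 0.03]
cumulative = torch.cumsum(sorted_probs, dim=-1)
# Exclusive cumsum: mask compares the mass *before* each token to top_p,
# so the token that first crosses the threshold is still kept
cutoff = cumulative - sorted_probs >= 0.9
print(cutoff.tolist())  # [False, False, False, True]
```

Without the shift, the third token (cumulative ≈ 0.97 ≥ 0.9) would be masked and the nucleus would under-cover the requested probability mass.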
  • [ ] Step 4: Run tests to verify they pass

Run: cd packages/governors && uv run python -m pytest tests/test_sampling.py -v Expected: All pass

  • [ ] Step 5: Commit
git add packages/governors/src/phonolex_governors/generation/sampling.py packages/governors/tests/test_sampling.py
git commit -m "feat(governors): sampling utilities with banned sequence enforcement"

Task 6: Generation Loop

The custom autoregressive generation loop with word-boundary checking and backtracking. This is the core of the new system.

Files: - Create: packages/governors/src/phonolex_governors/generation/loop.py - Create: packages/governors/tests/test_loop.py
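The word-boundary rule the loop relies on can be sketched standalone before diving into the implementation. This is a hypothetical `segment` helper (not part of the package) using the same SentencePiece ▁ convention as Step 3:

```python
WORD_BOUNDARY_CHAR = "\u2581"  # SentencePiece word-start marker ▁

def segment(token_texts: list[str]) -> list[str]:
    """Group decoded token strings into words: a ▁-prefixed token starts a new word."""
    words: list[list[str]] = []
    for t in token_texts:
        if t.startswith(WORD_BOUNDARY_CHAR) or not words:
            words.append([])
        words[-1].append(t.lstrip(WORD_BOUNDARY_CHAR))
    return ["".join(parts) for parts in words]

print(segment(["\u2581str", "aw", "\u2581hat"]))  # ['straw', 'hat']
```

The key consequence for the loop: a word is only known to be complete when the *next* ▁-token arrives (or EOS), which is why the checkpoint fires one token after the word ends.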

  • [ ] Step 1: Write the failing tests

Create packages/governors/tests/test_loop.py:

"""Tests for generation loop with word-boundary checking.

Uses a mock model that produces a deterministic token sequence,
allowing us to test word-boundary detection and backtracking logic
without a real LLM.
"""

import torch
import pytest
from unittest.mock import MagicMock
from phonolex_governors.generation.loop import generate, GenerationResult
from phonolex_governors.checking.checker import CheckerConfig


class MockModel:
    """Deterministic model that returns tokens from a fixed sequence."""

    def __init__(self, token_sequence: list[int], vocab_size: int = 100):
        self.token_sequence = token_sequence
        self.vocab_size = vocab_size
        self._step = 0
        self.config = MagicMock()
        self.config.vocab_size = vocab_size

    def __call__(self, input_ids, past_key_values=None, use_cache=True):
        """Return logits that make the next token in sequence overwhelmingly likely."""
        logits = torch.full((1, 1, self.vocab_size), -100.0)
        if self._step < len(self.token_sequence):
            logits[0, 0, self.token_sequence[self._step]] = 100.0
            self._step += 1
        else:
            logits[0, 0, 1] = 100.0  # EOS fallback
        out = MagicMock()
        out.logits = logits
        out.past_key_values = past_key_values  # pass through
        return out

    def parameters(self):
        # The generation loop probes parameters() only to infer a device;
        # an empty iterator makes it fall back to CPU
        return iter([])

    def reset(self):
        self._step = 0


class MockTokenizer:
    """Tokenizer that maps token IDs to fixed strings."""

    def __init__(self, vocab: dict[int, str]):
        self._vocab = vocab
        self._reverse = {v: k for k, v in vocab.items()}
        self.eos_token_id = 1

    def decode(self, token_ids, skip_special_tokens=True):
        return "".join(self._vocab.get(t, "") for t in token_ids)

    def encode(self, text, return_tensors=None, add_special_tokens=True):
        # Simplified: return tensor of first matching tokens
        ids = [self._reverse.get(c, 0) for c in text.split()]
        if return_tensors == "pt":
            return {"input_ids": torch.tensor([ids])}
        return ids


def test_generate_returns_result():
    vocab = {0: "▁the", 2: "▁cat", 3: "▁sat", 1: ""}
    model = MockModel([0, 2, 3, 1])  # "the cat sat" + EOS
    tokenizer = MockTokenizer(vocab)
    config = CheckerConfig()

    result = generate(model, tokenizer, "test", config)
    assert isinstance(result, GenerationResult)
    assert len(result.token_ids) > 0


def test_generate_stops_at_eos():
    vocab = {0: "▁hello", 1: ""}
    model = MockModel([0, 1])
    tokenizer = MockTokenizer(vocab)
    config = CheckerConfig()

    result = generate(model, tokenizer, "test", config)
    # Should stop after EOS, not continue to max_tokens
    assert len(result.token_ids) <= 2


def test_generate_respects_max_tokens():
    vocab = {0: "▁word"}
    model = MockModel([0] * 100)
    tokenizer = MockTokenizer(vocab)
    config = CheckerConfig()

    result = generate(model, tokenizer, "test", config, max_tokens=10)
    assert len(result.token_ids) <= 10
  • [ ] Step 2: Run tests to verify they fail

Run: cd packages/governors && uv run python -m pytest tests/test_loop.py -v Expected: FAIL

  • [ ] Step 3: Implement loop.py

Create packages/governors/src/phonolex_governors/generation/loop.py:

"""Custom autoregressive generation loop with word-boundary checking.

Replaces model.generate() with a loop that:
1. Generates token by token
2. Detects word boundaries (SentencePiece ▁ prefix)
3. Checks completed words against constraints
4. Backtracks and retries on violations
"""

from __future__ import annotations

import time
from dataclasses import dataclass

import torch

from phonolex_governors.checking.checker import check_word, CheckerConfig
from phonolex_governors.checking.g2p import G2PCache
from phonolex_governors.generation.backtrack import BacktrackState
from phonolex_governors.generation.sampling import sample_token, apply_banned_sequences


WORD_BOUNDARY_CHAR = "\u2581"  # SentencePiece ▁


@dataclass
class GenerationResult:
    token_ids: list[int]
    text: str
    gen_time_ms: float
    backtracks: int = 0
    words_checked: int = 0


def _is_word_boundary(token_text: str) -> bool:
    """Check if a token starts a new word (SentencePiece ▁ prefix)."""
    return token_text.startswith(WORD_BOUNDARY_CHAR) or token_text.startswith(" ")


def _extract_word(tokenizer, token_ids: list[int]) -> str:
    """Decode a sequence of token IDs into a word, stripping boundaries.

    Replaces ▁ explicitly in case the tokenizer's decode leaves it in place.
    """
    text = tokenizer.decode(token_ids, skip_special_tokens=True)
    return text.replace(WORD_BOUNDARY_CHAR, " ").strip()


def generate(
    model,
    tokenizer,
    prompt: str,
    checker_config: CheckerConfig,
    boost_fn=None,
    max_tokens: int = 256,
    max_backtracks_per_word: int = 5,
    temperature: float = 0.8,
    top_k: int = 50,
    top_p: float = 0.9,
) -> GenerationResult:
    """Generate text with word-boundary constraint checking and backtracking.

    Args:
        model: Language model with __call__(input_ids, past_key_values, use_cache)
        tokenizer: Tokenizer with encode/decode
        prompt: Input prompt text
        checker_config: Word-level constraint configuration
        boost_fn: Optional callable(logits) → logits for soft boosts
        max_tokens: Maximum tokens to generate
        max_backtracks_per_word: Max retries per word position
        temperature, top_k, top_p: Sampling parameters
    """
    t0 = time.time()
    g2p_cache = G2PCache()
    total_backtracks = 0
    words_checked = 0

    # Encode prompt
    encoded = tokenizer.encode(prompt, return_tensors="pt", add_special_tokens=True)
    if isinstance(encoded, dict):
        input_ids = encoded["input_ids"]
    else:
        input_ids = torch.tensor([encoded])

    device = next(model.parameters(), torch.tensor(0)).device
    input_ids = input_ids.to(device)

    # Prime the KV cache on all but the last prompt token; the loop's first
    # step then feeds the last prompt token, so the cache always covers
    # exactly the tokens before the one being fed (no token is fed twice).
    kv_cache = None
    if input_ids.size(1) > 1:
        with torch.no_grad():
            outputs = model(input_ids[:, :-1], past_key_values=None, use_cache=True)
        kv_cache = outputs.past_key_values

    # Generation state
    generated_ids: list[int] = []
    current_word_ids: list[int] = []  # tokens in the word being built
    backtrack_state = BacktrackState(max_attempts=max_backtracks_per_word)

    # Snapshots for backtracking: KV cache, generated length, and the raw
    # logits at the word-start position, so a retry can resample without
    # re-running (and double-counting) the forward pass that produced them.
    # Assumes past_key_values is an immutable legacy tuple cache; a mutable
    # Cache object would need an explicit copy here.
    word_start_kv = kv_cache
    word_start_gen_len = 0
    word_start_logits: torch.Tensor | None = None
    restored_logits: torch.Tensor | None = None  # set on backtrack

    for step in range(max_tokens):
        if restored_logits is not None:
            # Just backtracked: resample from the saved word-start logits
            # instead of re-feeding a token the restored cache already holds
            raw_logits = restored_logits
            restored_logits = None
        else:
            last_token = torch.tensor(
                [[generated_ids[-1] if generated_ids else input_ids[0, -1].item()]],
                device=device,
            )
            with torch.no_grad():
                outputs = model(last_token, past_key_values=kv_cache, use_cache=True)
            raw_logits = outputs.logits[:, -1:, :]  # (1, 1, vocab)
            kv_cache = outputs.past_key_values

        if word_start_logits is None:
            # First forward pass: snapshot state for the first word position
            word_start_logits = raw_logits
            word_start_kv = kv_cache

        logits = raw_logits

        # Apply soft boosts
        if boost_fn is not None:
            logits = boost_fn(logits)

        # Apply banned sequences from backtrack state
        intervention = backtrack_state.plan_intervention()
        if intervention.banned_sequences:
            logits = apply_banned_sequences(
                logits.squeeze(0), current_word_ids, intervention.banned_sequences,
            ).unsqueeze(0)

        # Sample
        effective_temp = temperature + intervention.temperature_delta
        token_id = sample_token(logits.squeeze(0), effective_temp, top_k, top_p)

        # Check for EOS
        if token_id == tokenizer.eos_token_id:
            # Check the final word before ending
            if current_word_ids:
                word = _extract_word(tokenizer, current_word_ids)
                result = check_word(word, checker_config, g2p_cache)
                words_checked += 1
                if not result.passed and not backtrack_state.exhausted:
                    # Backtrack to the start of the failed word
                    backtrack_state.record_failure(result, current_word_ids)
                    generated_ids = generated_ids[:word_start_gen_len]
                    current_word_ids = []
                    kv_cache = word_start_kv
                    restored_logits = word_start_logits
                    total_backtracks += 1
                    continue
            break

        # Decode to check for word boundary
        token_text = tokenizer.decode([token_id], skip_special_tokens=False)

        if _is_word_boundary(token_text) and current_word_ids:
            # Previous word is complete — check it
            word = _extract_word(tokenizer, current_word_ids)
            result = check_word(word, checker_config, g2p_cache)
            words_checked += 1

            if not result.passed and not backtrack_state.exhausted:
                # Violation! Backtrack to word start
                backtrack_state.record_failure(result, current_word_ids)
                generated_ids = generated_ids[:word_start_gen_len]
                current_word_ids = []
                kv_cache = word_start_kv
                restored_logits = word_start_logits
                total_backtracks += 1
                continue

            # Word passed — snapshot state at the start of the new word
            backtrack_state = BacktrackState(
                word_start_idx=len(generated_ids),
                max_attempts=max_backtracks_per_word,
            )
            word_start_kv = kv_cache
            word_start_gen_len = len(generated_ids)
            word_start_logits = raw_logits
            current_word_ids = [token_id]
        else:
            current_word_ids.append(token_id)

        generated_ids.append(token_id)

    text = tokenizer.decode(generated_ids, skip_special_tokens=True)
    gen_time_ms = (time.time() - t0) * 1000

    return GenerationResult(
        token_ids=generated_ids,
        text=text,
        gen_time_ms=gen_time_ms,
        backtracks=total_backtracks,
        words_checked=words_checked,
    )
  • [ ] Step 4: Run tests to verify they pass

Run: cd packages/governors && uv run python -m pytest tests/test_loop.py -v Expected: All pass

  • [ ] Step 5: Run full test suite

Run: cd packages/governors && uv run python -m pytest tests/test_g2p.py tests/test_phonology.py tests/test_checker.py tests/test_backtrack.py tests/test_sampling.py tests/test_loop.py -v Expected: All pass

  • [ ] Step 6: Commit
git add packages/governors/src/phonolex_governors/generation/loop.py packages/governors/tests/test_loop.py
git commit -m "feat(governors): custom generation loop with word-boundary checking and backtracking"

Task 7: Constraint Types + Compiler

The user-facing constraint declarations and the compiler that translates them into CheckerConfig + boost configs.

Files: - Create: packages/governors/src/phonolex_governors/constraints/__init__.py - Create: packages/governors/src/phonolex_governors/constraints/types.py - Create: packages/governors/src/phonolex_governors/constraints/compiler.py - Create: packages/governors/tests/test_compiler.py

  • [ ] Step 1: Write the failing tests

Create packages/governors/tests/test_compiler.py:

"""Tests for constraint compiler."""

import pytest
from phonolex_governors.constraints.types import (
    Exclude, ExcludeInClusters, MSHStage, Bound, VocabOnly, Include, ThematicField,
)
from phonolex_governors.constraints.compiler import compile_constraints
from phonolex_governors.checking.checker import PhonemeExcludeCheck, MSHCheck, BoundCheck


def test_compile_exclude():
    constraints = [Exclude(phonemes={"R"})]
    checker_config, boost_config = compile_constraints(constraints)
    assert len(checker_config.checks) == 1
    assert isinstance(checker_config.checks[0], PhonemeExcludeCheck)
    assert "R" in checker_config.checks[0].excluded


def test_compile_msh():
    constraints = [MSHStage(max_stage=3)]
    checker_config, _ = compile_constraints(constraints)
    assert len(checker_config.checks) == 1
    assert isinstance(checker_config.checks[0], MSHCheck)
    assert checker_config.checks[0].max_stage == 3


def test_compile_bound():
    constraints = [Bound(norm="aoa_kuperman", max_val=5.0)]
    checker_config, _ = compile_constraints(constraints)
    assert len(checker_config.checks) == 1
    assert isinstance(checker_config.checks[0], BoundCheck)


def test_compile_include_goes_to_boosts():
    """Include constraints should produce boost config, not checker config."""
    constraints = [Include(phonemes={"TH"}, strength=2.0)]
    checker_config, boost_config = compile_constraints(constraints)
    assert len(checker_config.checks) == 0
    assert len(boost_config.includes) == 1


def test_compile_mixed():
    constraints = [
        Exclude(phonemes={"R"}),
        Bound(norm="aoa_kuperman", max_val=5.0),
        Include(phonemes={"TH"}, strength=2.0),
        MSHStage(max_stage=3),
    ]
    checker_config, boost_config = compile_constraints(constraints)
    assert len(checker_config.checks) == 3  # exclude + bound + msh
    assert len(boost_config.includes) == 1


def test_compile_normalizes_ipa_to_arpabet():
    """IPA phoneme ɹ should be normalized to ARPAbet R."""
    constraints = [Exclude(phonemes={"ɹ"})]
    checker_config, _ = compile_constraints(constraints)
    assert "R" in checker_config.checks[0].excluded
  • [ ] Step 2: Run tests to verify they fail

Run: cd packages/governors && uv run python -m pytest tests/test_compiler.py -v Expected: FAIL

  • [ ] Step 3: Implement types.py

Create packages/governors/src/phonolex_governors/constraints/__init__.py:

"""Declarative constraint types and compiler."""

Create packages/governors/src/phonolex_governors/constraints/types.py:

"""Constraint type definitions — the user-facing API.

Users declare what they want. The compiler translates to checker + boost configs.
"""

from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Exclude:
    phonemes: set[str]


@dataclass(frozen=True)
class ExcludeInClusters:
    phonemes: set[str]


@dataclass(frozen=True)
class MSHStage:
    max_stage: int


@dataclass(frozen=True)
class Bound:
    norm: str
    min_val: float | None = None
    max_val: float | None = None


@dataclass(frozen=True)
class VocabOnly:
    lists: set[str] | None = None
    words: set[str] | None = None


@dataclass(frozen=True)
class Include:
    phonemes: set[str]
    strength: float = 2.0
    target_rate: float | None = None


@dataclass(frozen=True)
class VocabBoost:
    lists: set[str] | None = None
    words: set[str] | None = None
    strength: float = 2.0
    target_rate: float | None = None


@dataclass(frozen=True)
class MinPairBoost:
    target: str
    contrast: str
    strength: float = 2.0


@dataclass(frozen=True)
class MaxOppositionBoost:
    target: str
    contrast: str
    strength: float = 2.0


@dataclass(frozen=True)
class ThematicField:
    seed_words: list[str]
    strength: float = 1.5
    threshold: float = 0.02


Constraint = (
    Exclude | ExcludeInClusters | MSHStage | Bound | VocabOnly
    | Include | VocabBoost | MinPairBoost | MaxOppositionBoost | ThematicField
)
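One property of these frozen declarations worth knowing: frozen=True gives value equality, but hashing a frozen dataclass hashes its fields, so the set-typed fields make instances unhashable. A minimal standalone check with a toy dataclass of the same shape as Exclude above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Exclude:
    phonemes: set[str]

a, b = Exclude({"R"}), Exclude({"R"})
print(a == b)  # True: field-wise equality still works (e.g. for dedup in a list)

try:
    hash(a)  # the set field raises TypeError when the instance is hashed
except TypeError:
    print("unhashable")
```

If constraints ever need to go in a set or dict key, switch the field types to frozenset.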
  • [ ] Step 4: Implement compiler.py

Create packages/governors/src/phonolex_governors/constraints/compiler.py:

"""Compile user-facing constraints into CheckerConfig + BoostConfig.

Translates declarative constraint objects into the internal configurations
used by the word checker and the boost mechanisms.
"""

from __future__ import annotations

from dataclasses import dataclass, field

from phonolex_governors.constraints.types import (
    Constraint, Exclude, ExcludeInClusters, MSHStage, Bound, VocabOnly,
    Include, VocabBoost, MinPairBoost, MaxOppositionBoost, ThematicField,
)
from phonolex_governors.checking.checker import (
    CheckerConfig, PhonemeExcludeCheck, PhonemeExcludeClustersCheck,
    MSHCheck, BoundCheck, VocabOnlyCheck,
)

# IPA → ARPAbet mapping for common phonemes users might pass as IPA
_IPA_TO_ARPABET: dict[str, str] = {
    "ɹ": "R", "r": "R",
    "ɡ": "G", "g": "G",
    "θ": "TH", "ð": "DH",
    "ʃ": "SH", "ʒ": "ZH",
    "tʃ": "CH", "dʒ": "JH",
    "ŋ": "NG",
    "j": "Y",
    "p": "P", "b": "B", "t": "T", "d": "D", "k": "K",
    "f": "F", "v": "V", "s": "S", "z": "Z",
    "h": "HH", "m": "M", "n": "N", "l": "L", "w": "W",
}


def _normalize_phoneme(p: str) -> str:
    """Normalize a phoneme to ARPAbet. Pass through if already ARPAbet."""
    return _IPA_TO_ARPABET.get(p, p)


@dataclass
class BoostConfig:
    includes: list[Include] = field(default_factory=list)
    vocab_boosts: list[VocabBoost] = field(default_factory=list)
    min_pair_boosts: list[MinPairBoost] = field(default_factory=list)
    max_opp_boosts: list[MaxOppositionBoost] = field(default_factory=list)
    thematic_fields: list[ThematicField] = field(default_factory=list)


def compile_constraints(
    constraints: list[Constraint],
    norm_lookup: dict[str, dict[str, float]] | None = None,
    vocab_lookup: dict[str, set[str]] | None = None,
) -> tuple[CheckerConfig, BoostConfig]:
    """Compile constraints into checker config + boost config."""
    checks = []
    boost_config = BoostConfig()

    for c in constraints:
        if isinstance(c, Exclude):
            checks.append(PhonemeExcludeCheck(
                excluded={_normalize_phoneme(p) for p in c.phonemes},
            ))
        elif isinstance(c, ExcludeInClusters):
            checks.append(PhonemeExcludeClustersCheck(
                excluded={_normalize_phoneme(p) for p in c.phonemes},
            ))
        elif isinstance(c, MSHStage):
            checks.append(MSHCheck(max_stage=c.max_stage))
        elif isinstance(c, Bound):
            checks.append(BoundCheck(
                norm=c.norm, min_val=c.min_val, max_val=c.max_val,
            ))
        elif isinstance(c, VocabOnly):
            checks.append(VocabOnlyCheck(
                allowed_words=c.words, allowed_lists=c.lists,
            ))
        elif isinstance(c, Include):
            boost_config.includes.append(c)
        elif isinstance(c, VocabBoost):
            boost_config.vocab_boosts.append(c)
        elif isinstance(c, MinPairBoost):
            boost_config.min_pair_boosts.append(c)
        elif isinstance(c, MaxOppositionBoost):
            boost_config.max_opp_boosts.append(c)
        elif isinstance(c, ThematicField):
            boost_config.thematic_fields.append(c)

    checker_config = CheckerConfig(
        checks=checks,
        norm_lookup=norm_lookup or {},
        vocab_lookup=vocab_lookup or {},
    )
    return checker_config, boost_config
  • [ ] Step 5: Run tests to verify they pass

Run: cd packages/governors && uv run python -m pytest tests/test_compiler.py -v Expected: All pass

  • [ ] Step 6: Commit
git add packages/governors/src/phonolex_governors/constraints/ packages/governors/tests/test_compiler.py
git commit -m "feat(governors): constraint types and compiler — user API to checker+boost configs"

Task 8: Relocate Boost Mechanisms

Copy existing boost code from the flat files into a new boosts/ package. No logic changes — just file copies and import updates; the flat originals stay in place for backward compatibility (see Execution Notes).

Files: - Create: packages/governors/src/phonolex_governors/boosts/__init__.py - Copy: boosts.py → boosts/logit_boost.py - Copy: include.py → boosts/include.py - Copy: thematic.py → boosts/thematic.py - Copy: cdd.py → boosts/cdd.py

  • [ ] Step 1: Create boosts package with relocated files
mkdir -p packages/governors/src/phonolex_governors/boosts

Create packages/governors/src/phonolex_governors/boosts/__init__.py:

"""Soft per-token boost mechanisms."""

from phonolex_governors.boosts.logit_boost import LogitBoost
from phonolex_governors.boosts.cdd import CDDConstraint, CDDProjection

Copy files preserving content. (Note: the new boosts/ package shadows the flat boosts.py module, so the __init__.py above must re-export every public name the flat module exposed — extend its imports if boosts.py exports more than LogitBoost.)

cp packages/governors/src/phonolex_governors/boosts.py packages/governors/src/phonolex_governors/boosts/logit_boost.py
cp packages/governors/src/phonolex_governors/include.py packages/governors/src/phonolex_governors/boosts/include.py
cp packages/governors/src/phonolex_governors/thematic.py packages/governors/src/phonolex_governors/boosts/thematic.py
cp packages/governors/src/phonolex_governors/cdd.py packages/governors/src/phonolex_governors/boosts/cdd.py

  • [ ] Step 2: Update imports in relocated files

Each relocated file has internal imports that need checking. One pitfall: Task 7 created a constraints/ package, which shadows the legacy flat constraints.py module, so an import like

from phonolex_governors.constraints import Constraint, parse_phono

in boosts/include.py no longer resolves to the old definitions. One option is to rename the legacy module out of the way:

git mv packages/governors/src/phonolex_governors/constraints.py packages/governors/src/phonolex_governors/legacy_constraints.py

then update the relocated files (and any other legacy consumers) to import from phonolex_governors.legacy_constraints. The legacy module is removed entirely in the follow-up cleanup once all consumers are migrated.

  • [ ] Step 3: Update package __init__.py

Update packages/governors/src/phonolex_governors/__init__.py to add exports from new locations while keeping old ones for backward compat:

"""PhonoLex Governors — constraint layer for language model generation."""

# New structure
from phonolex_governors.checking.g2p import word_to_phonemes, word_has_phoneme, G2PCache
from phonolex_governors.checking.checker import check_word, CheckerConfig, CheckResult
from phonolex_governors.generation.loop import generate, GenerationResult
from phonolex_governors.generation.backtrack import BacktrackState, Intervention
from phonolex_governors.constraints.types import (
    Exclude, ExcludeInClusters, MSHStage, Bound, VocabOnly,
    Include, VocabBoost, MinPairBoost, MaxOppositionBoost, ThematicField,
)
from phonolex_governors.constraints.compiler import compile_constraints, BoostConfig

# Legacy exports (still used by dashboard server, will migrate later)
from phonolex_governors.core import Governor, GovernorContext
from phonolex_governors.gates import HardGate
from phonolex_governors.boosts.logit_boost import LogitBoost
from phonolex_governors.boosts.include import IncludeConstraint, VocabBoostConstraint
from phonolex_governors.boosts.cdd import CDDConstraint, CDDProjection
from phonolex_governors.boosts.thematic import ThematicConstraint, AssocGraph, build_assoc_graph, assoc_strength
from phonolex_governors.lookups import LookupBuilder, PhonoFeatures, Syllable, TokenFeatures
  • [ ] Step 4: Run all tests to verify nothing broke

Run: cd packages/governors && uv run python -m pytest tests/ -v Expected: All tests pass (both old tests and new tests)

  • [ ] Step 5: Commit
git add packages/governors/src/phonolex_governors/boosts/ packages/governors/src/phonolex_governors/__init__.py
git commit -m "refactor(governors): relocate boost mechanisms to boosts/ package"

Task 9: Integration — Wire into Dashboard Server

Connect the new generation system to the FastAPI dashboard. The generate-single endpoint uses the new generate() function instead of model.generate().

Files: - Modify: packages/dashboard/server/model.py - Modify: packages/dashboard/server/routes/generate.py - Modify: packages/dashboard/server/governor.py

This task is intentionally left as a description rather than exact code because it requires reading the current state of these files (which may have changed from the bug fixes earlier in this session) and making surgical modifications. The key changes:

  • [ ] Step 1: Add a generate_with_checking() function to model.py

This wraps the new generate() loop from phonolex_governors.generation.loop with the dashboard's model, tokenizer, and boost setup. It replaces generate_single() for the constrained path.

  • [ ] Step 2: Update the /generate-single route to use the new path

When constraints are present, call generate_with_checking() instead of model.generate_single(). When no constraints, use the existing path (no overhead).
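The routing decision in this step can be sketched as follows. The names compile_constraints and generate_with_checking come from the plan; generate_unconstrained and the stub bodies are hypothetical placeholders standing in for the dashboard's real model calls:

```python
def compile_constraints(constraints):
    # Stub for the real compiler (Task 7): returns (CheckerConfig, BoostConfig)
    return ("checker_cfg", "boost_cfg")

def generate_with_checking(prompt, checker_cfg, boost_cfg):
    # Stub for the new word-boundary checking loop (Task 6)
    return f"constrained:{prompt}"

def generate_unconstrained(prompt):
    # Stub for the existing model.generate() fast path
    return f"fast:{prompt}"

def generate_single(prompt, constraints):
    # Constrained requests take the checking loop; unconstrained requests
    # keep the existing fast path, so they pay no per-word overhead
    if constraints:
        checker_cfg, boost_cfg = compile_constraints(constraints)
        return generate_with_checking(prompt, checker_cfg, boost_cfg)
    return generate_unconstrained(prompt)

print(generate_single("a cat", []))           # fast:a cat
print(generate_single("a cat", ["Exclude"]))  # constrained:a cat
```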

  • [ ] Step 3: Update governor.py to compile constraints

Replace build_governor() / build_logits_processor() with a function that calls compile_constraints() and returns (CheckerConfig, BoostConfig).

  • [ ] Step 4: Run dashboard tests

Run: cd packages/dashboard && uv run python -m pytest server/tests/ -v Expected: All pass

  • [ ] Step 5: Commit
git add packages/dashboard/server/
git commit -m "feat(dashboard): wire dynamic governor into generation pipeline"

Execution Notes

  • Tasks 1-5 build the new system bottom-up, each independently testable.
  • Task 6 is the big one — the generation loop that ties everything together.
  • Task 7 creates the new user-facing API.
  • Task 8 is a refactoring task — relocates existing code.
  • Task 9 is integration — wires the new system into the dashboard.
  • Old files (gates.py, core.py, the old constraints.py) are NOT deleted in this plan. They stay for backward compatibility with old tests and the dashboard's existing code paths. One caveat: the new constraints/ and boosts/ packages shadow the flat constraints.py and boosts.py modules of the same name, so legacy imports through those two paths resolve to the new packages; Task 8 covers the required import updates. A follow-up cleanup task can remove the old files once all consumers are migrated.