# Dynamic Governor Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Replace the static per-token mask governor with a dynamic generate-test-revise loop that checks completed words at word boundaries using G2P, with informed backtracking on violations.
Architecture: Four layers built bottom-up: (1) G2P wrapper + phonological checks, (2) word-level constraint checker, (3) custom generation loop with backtracking, (4) constraint compiler that bridges the user-facing API to the new system. Existing boost mechanisms (LogitBoost, CDD, Include, Thematic) are preserved and relocated. HardGate is removed.
Tech Stack: Python 3.12, PyTorch, g2p_en, phonolex_data (syllabification, WCM, norms), T5Gemma on MPS
Spec: docs/superpowers/specs/2026-04-06-dynamic-governor-design.md
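The intended control flow — generate, test each completed word, revise by backtracking — can be sketched in miniature. All names here are hypothetical; the real loop in Task 6 operates on tokens and logits, not whole words:

```python
# Sketch of the generate-test-revise control flow (hypothetical names;
# the real implementation in generation/loop.py works token-by-token).
def generate_with_checks(next_word, check, max_attempts=5):
    """next_word(banned) proposes a word (None = done); check(word) -> bool."""
    words, banned = [], set()
    for _ in range(20):  # cap on words generated
        for _ in range(max_attempts):
            word = next_word(banned)
            if word is None:
                return words  # model signalled completion
            if check(word):
                words.append(word)
                banned = set()  # fresh slate at the next word position
                break
            banned.add(word)  # informed backtrack: never retry this word here
        else:
            break  # position exhausted; give up gracefully
    return words


# Toy run: exclude any word spelled with "r".
queue = ["rat", "cat", "ran", "sat", None]
print(generate_with_checks(lambda banned: queue.pop(0), lambda w: "r" not in w))
# → ['cat', 'sat']
```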
## File Map
All paths relative to packages/governors/src/phonolex_governors/.
### New files (the dynamic system)

| File | Responsibility |
|---|---|
| `checking/__init__.py` | Package exports |
| `checking/g2p.py` | g2p_en wrapper: word → phoneme list, caching, ARPAbet→IPA |
| `checking/phonology.py` | Phoneme exclusion, MSH stage, cluster detection from phoneme list |
| `checking/checker.py` | Orchestrate all constraint checks on a completed word |
| `generation/__init__.py` | Package exports |
| `generation/loop.py` | Custom autoregressive generation with word-boundary checkpoints |
| `generation/backtrack.py` | Backtrack state, failure history, intervention planning |
| `generation/sampling.py` | Top-k/p sampling, distribution-aware reweighting |
| `constraints/__init__.py` | Package exports |
| `constraints/types.py` | Constraint type definitions (dataclasses) |
| `constraints/compiler.py` | Compile constraint list → CheckerConfig + BoostConfig |
| `boosts/__init__.py` | Package exports |
| `boosts/logit_boost.py` | Relocated from boosts.py |
| `boosts/include.py` | Relocated from include.py |
| `boosts/thematic.py` | Relocated from thematic.py |
| `boosts/cdd.py` | Relocated from cdd.py |
| `data.py` | PhonoLex norm/vocab lookup for word-level checks |
### Modified files

| File | Change |
|---|---|
| `__init__.py` | Update exports for new structure |
| `pyproject.toml` | Add g2p-en dependency |
### Test files

| File | Tests |
|---|---|
| `tests/test_g2p.py` | G2P wrapper caching, normalization, edge cases |
| `tests/test_phonology.py` | Phoneme exclusion, MSH, clusters from phoneme lists |
| `tests/test_checker.py` | Word checker with multiple constraint types |
| `tests/test_loop.py` | Generation loop with mock model, backtracking |
| `tests/test_backtrack.py` | Failure history, intervention escalation |
| `tests/test_compiler.py` | Constraint compilation to CheckerConfig + BoostConfig |
### Files to remove (after migration)

| File | Reason |
|---|---|
| `gates.py` | HardGate replaced by word-level checking |
| `core.py` | Governor class replaced by generation loop |
| `constraints.py` | Replaced by constraints/types.py + constraints/compiler.py |
| `boosts.py` | Relocated to boosts/logit_boost.py |
| `include.py` | Relocated to boosts/include.py |
| `thematic.py` | Relocated to boosts/thematic.py |
| `cdd.py` | Relocated to boosts/cdd.py |
Old files are removed in the final task after all consumers are migrated.
## Task 1: G2P Wrapper
The foundation — everything else depends on being able to get phonemes from a word.
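For orientation (not part of any task's deliverables): g2p_en emits ARPAbet, where vowels carry a trailing stress digit (AE1, ER0). Checks that compare base phonemes strip that digit first — a convention used throughout this plan:

```python
def base_phoneme(p: str) -> str:
    """Strip the ARPAbet stress digit, if any: 'ER1' -> 'ER', 'K' -> 'K'."""
    return p.rstrip("012")


# "verdant" /V ER1 D AH0 N T/: the rhotacized vowel ER1 reduces to ER,
# which is why an /r/ exclusion check must look beyond the bare R symbol.
print([base_phoneme(p) for p in ["V", "ER1", "D", "AH0", "N", "T"]])
# → ['V', 'ER', 'D', 'AH', 'N', 'T']
```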
Files:
- Create: packages/governors/src/phonolex_governors/checking/__init__.py
- Create: packages/governors/src/phonolex_governors/checking/g2p.py
- Create: packages/governors/tests/test_g2p.py
- Modify: packages/governors/pyproject.toml (add g2p-en dep)
- [ ] Step 1: Add g2p-en dependency
In packages/governors/pyproject.toml, change:

```toml
dependencies = [
    "torch>=2.0",
]
```

to:

```toml
dependencies = [
    "torch>=2.0",
    "g2p-en>=2.1.0",
]
```
- [ ] Step 2: Create checking package
Create packages/governors/src/phonolex_governors/checking/__init__.py:
"""Word-level constraint checking."""
- [ ] Step 3: Write the failing tests
Create packages/governors/tests/test_g2p.py:
"""Tests for G2P wrapper."""
import pytest
from phonolex_governors.checking.g2p import word_to_phonemes, word_has_phoneme, G2PCache
def test_word_to_phonemes_returns_list():
phonemes = word_to_phonemes("cat")
assert isinstance(phonemes, list)
assert len(phonemes) > 0
def test_word_to_phonemes_known_word():
phonemes = word_to_phonemes("cat")
assert "K" in phonemes
assert "AE1" in phonemes
assert "T" in phonemes
def test_word_to_phonemes_oov_word():
"""G2P should handle out-of-vocabulary words via neural fallback."""
phonemes = word_to_phonemes("flibbertigibbet")
assert isinstance(phonemes, list)
assert len(phonemes) > 0
def test_word_has_phoneme_r():
assert word_has_phoneme("running", "R") is True
assert word_has_phoneme("cat", "R") is False
def test_word_has_phoneme_rhotacized_vowels():
"""ER0/ER1/ER2 are rhotacized vowels — should match R check."""
assert word_has_phoneme("verdant", "R") is True # contains ER
def test_word_has_phoneme_case_insensitive_input():
"""Input word should be case-insensitive."""
assert word_to_phonemes("Cat") == word_to_phonemes("cat")
def test_word_to_phonemes_punctuation_returns_empty():
assert word_to_phonemes(",") == []
assert word_to_phonemes("") == []
assert word_to_phonemes("123") == []
def test_cache_returns_same_result():
cache = G2PCache()
result1 = cache.get("cat")
result2 = cache.get("cat")
assert result1 == result2
assert cache.hits == 1 # second call was a cache hit
def test_cache_different_words():
cache = G2PCache()
cat = cache.get("cat")
dog = cache.get("dog")
assert cat != dog
assert cache.hits == 0 # no cache hits, both were misses
- [ ] Step 4: Run tests to verify they fail
Run: cd packages/governors && uv run python -m pytest tests/test_g2p.py -v
Expected: FAIL — module phonolex_governors.checking.g2p not found
- [ ] Step 5: Implement g2p.py
Create packages/governors/src/phonolex_governors/checking/g2p.py:
"""G2P wrapper — word to phonemes via g2p_en.
Provides cached phoneme resolution for any English word, including
out-of-vocabulary forms. Returns ARPAbet phonemes.
The R_PHONEMES set includes both /R/ and rhotacized vowels (ER0/ER1/ER2)
since clinically these are all "r-colored" sounds that SLPs treat as
part of the /r/ phoneme class.
"""
from __future__ import annotations
from g2p_en import G2p
# ARPAbet phonemes that count as "r-colored" for clinical purposes
R_PHONEMES = {"R", "ER0", "ER1", "ER2"}
# Singleton G2P instance (loads model on first use)
_g2p: G2p | None = None
def _get_g2p() -> G2p:
global _g2p
if _g2p is None:
_g2p = G2p()
return _g2p
def word_to_phonemes(word: str) -> list[str]:
"""Convert a word to ARPAbet phonemes.
Returns an empty list for non-alphabetic input (punctuation, numbers, empty).
"""
clean = word.strip().lower()
if not clean or not clean.isalpha():
return []
g2p = _get_g2p()
result = g2p(clean)
# g2p_en returns a mix of phonemes and characters for some inputs;
# filter to only ARPAbet tokens (uppercase, optionally with digit suffix)
return [p for p in result if p[0].isupper()]
def word_has_phoneme(word: str, phoneme: str) -> bool:
"""Check if a word contains a specific ARPAbet phoneme.
For 'R', also checks rhotacized vowels (ER0, ER1, ER2).
"""
phonemes = word_to_phonemes(word)
if phoneme == "R":
return bool(R_PHONEMES & set(phonemes))
return phoneme in phonemes
class G2PCache:
"""Per-generation phoneme cache. Create one per generation call."""
def __init__(self):
self._cache: dict[str, list[str]] = {}
self.hits = 0
self.misses = 0
def get(self, word: str) -> list[str]:
key = word.strip().lower()
if key in self._cache:
self.hits += 1
return self._cache[key]
self.misses += 1
result = word_to_phonemes(key)
self._cache[key] = result
return result
- [ ] Step 6: Run tests to verify they pass
Run: cd packages/governors && uv run python -m pytest tests/test_g2p.py -v
Expected: All pass
- [ ] Step 7: Commit
git add packages/governors/src/phonolex_governors/checking/ packages/governors/tests/test_g2p.py packages/governors/pyproject.toml
git commit -m "feat(governors): G2P wrapper with caching for word-level phoneme resolution"
## Task 2: Phonological Check Functions
Pure functions that take a phoneme list and check specific phonological constraints. No G2P, no generation — just phoneme list in, pass/fail out.
Files:
- Create: packages/governors/src/phonolex_governors/checking/phonology.py
- Create: packages/governors/tests/test_phonology.py
- [ ] Step 1: Write the failing tests
Create packages/governors/tests/test_phonology.py:
"""Tests for phonological check functions."""
import pytest
from phonolex_governors.checking.phonology import (
check_exclude,
check_exclude_clusters,
check_msh_stage,
)
# --- Exclude ---
def test_exclude_catches_r():
phonemes = ["R", "AH1", "N", "IH0", "NG"] # "running"
result = check_exclude(phonemes, excluded={"R"})
assert not result.passed
assert "R" in result.found_phonemes
def test_exclude_passes_clean():
phonemes = ["K", "AE1", "T"] # "cat"
result = check_exclude(phonemes, excluded={"R"})
assert result.passed
def test_exclude_catches_rhotacized_vowel():
phonemes = ["V", "ER1", "D", "AH0", "N", "T"] # "verdant"
result = check_exclude(phonemes, excluded={"R"})
assert not result.passed
def test_exclude_multiple_phonemes():
phonemes = ["S", "T", "R", "IY1", "T"] # "street"
result = check_exclude(phonemes, excluded={"R", "S"})
assert not result.passed
assert {"R", "S"} <= result.found_phonemes
# --- MSH Stage ---
def test_msh_stage_pass():
phonemes = ["M", "AE1", "P"] # "map" — all stage ≤3
result = check_msh_stage(phonemes, max_stage=3)
assert result.passed
def test_msh_stage_fail():
phonemes = ["S", "AE1", "T"] # "sat" — /S/ is stage 5
result = check_msh_stage(phonemes, max_stage=3)
assert not result.passed
assert result.max_found_stage == 5
def test_msh_stage_vowel_only():
phonemes = ["AH0"]
result = check_msh_stage(phonemes, max_stage=2)
assert result.passed
# --- Exclude in clusters ---
def test_exclude_clusters_catches_s_cluster():
phonemes = ["S", "T", "R", "IY1", "T"] # "street"
syllables = [{"onset": ["S", "T", "R"], "nucleus": ["IY1"], "coda": ["T"]}]
result = check_exclude_clusters(phonemes, syllables, excluded={"S"})
assert not result.passed
def test_exclude_clusters_passes_singleton_onset():
phonemes = ["S", "AE1", "T"] # "sat" — /S/ is singleton onset, not cluster
syllables = [{"onset": ["S"], "nucleus": ["AE1"], "coda": ["T"]}]
result = check_exclude_clusters(phonemes, syllables, excluded={"S"})
assert result.passed
- [ ] Step 2: Run tests to verify they fail
Run: cd packages/governors && uv run python -m pytest tests/test_phonology.py -v
Expected: FAIL — module not found
- [ ] Step 3: Implement phonology.py
Create packages/governors/src/phonolex_governors/checking/phonology.py:
"""Phonological check functions — phoneme list in, pass/fail out.
These are pure functions with no G2P dependency. They operate on
ARPAbet phoneme lists and syllable structures.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from phonolex_governors.checking.g2p import R_PHONEMES
# ARPAbet consonants → MSH stage mapping
# Namasivayam et al. (2021)
_MSH_STAGES: dict[str, int] = {
"HH": 2,
"P": 3, "B": 3, "M": 3,
"F": 4, "W": 4, "R": 4,
"T": 5, "D": 5, "K": 5, "G": 5, "N": 5,
"S": 5, "Z": 5, "L": 5, "NG": 5,
"SH": 5, "ZH": 5, "CH": 5, "JH": 5,
"TH": 5, "DH": 5, "V": 5, "Y": 5,
}
# ARPAbet vowels (including stressed variants)
_ARPABET_VOWELS = {
f"{v}{s}" for v in [
"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
"IH", "IY", "OW", "OY", "UH", "UW",
] for s in ["", "0", "1", "2"]
}
@dataclass
class ExcludeResult:
passed: bool
found_phonemes: set[str] = field(default_factory=set)
@dataclass
class MSHResult:
passed: bool
max_found_stage: int = 1
@dataclass
class ClusterExcludeResult:
passed: bool
found_in_clusters: set[str] = field(default_factory=set)
def check_exclude(phonemes: list[str], excluded: set[str]) -> ExcludeResult:
"""Check if a phoneme list contains any excluded phonemes.
For 'R' exclusion, also catches rhotacized vowels (ER0/ER1/ER2).
"""
phoneme_set = set(phonemes)
found: set[str] = set()
for ex in excluded:
if ex == "R":
overlap = phoneme_set & R_PHONEMES
if overlap:
found.add("R")
elif ex in phoneme_set:
found.add(ex)
return ExcludeResult(passed=len(found) == 0, found_phonemes=found)
def check_msh_stage(phonemes: list[str], max_stage: int) -> MSHResult:
"""Check if all consonants are at or below the max MSH stage.
Vowels are stage 1. Unknown consonants default to stage 5.
"""
max_found = 1
for p in phonemes:
# Strip stress digits for lookup
base = p.rstrip("012")
stage = _MSH_STAGES.get(base)
if stage is not None:
max_found = max(max_found, stage)
elif p not in _ARPABET_VOWELS:
# Unknown consonant — conservative default
max_found = 5
return MSHResult(passed=max_found <= max_stage, max_found_stage=max_found)
def check_exclude_clusters(
phonemes: list[str],
syllables: list[dict],
excluded: set[str],
) -> ClusterExcludeResult:
"""Check if excluded phonemes appear in consonant clusters.
Only flags the phoneme when it's in an onset or coda with ≥2 consonants.
"""
found: set[str] = set()
for syl in syllables:
onset = syl.get("onset", [])
coda = syl.get("coda", [])
if len(onset) >= 2:
for p in onset:
if p in excluded:
found.add(p)
if len(coda) >= 2:
for p in coda:
if p in excluded:
found.add(p)
return ClusterExcludeResult(passed=len(found) == 0, found_in_clusters=found)
- [ ] Step 4: Run tests to verify they pass
Run: cd packages/governors && uv run python -m pytest tests/test_phonology.py -v
Expected: All pass
- [ ] Step 5: Commit
git add packages/governors/src/phonolex_governors/checking/phonology.py packages/governors/tests/test_phonology.py
git commit -m "feat(governors): phonological check functions for exclude, MSH, clusters"
## Task 3: Word Checker
Orchestrates all constraint checks on a single completed word. Takes a word string and a CheckerConfig, returns a CheckResult.
Files:
- Create: packages/governors/src/phonolex_governors/checking/checker.py
- Create: packages/governors/tests/test_checker.py
- [ ] Step 1: Write the failing tests
Create packages/governors/tests/test_checker.py:
"""Tests for word-level constraint checker."""
import pytest
from phonolex_governors.checking.checker import (
check_word, CheckerConfig, CheckResult, PhonemeExcludeCheck,
MSHCheck, BoundCheck,
)
from phonolex_governors.checking.g2p import G2PCache
def test_exclude_r_catches_running():
config = CheckerConfig(checks=[PhonemeExcludeCheck(excluded={"R"})])
cache = G2PCache()
result = check_word("running", config, cache)
assert not result.passed
assert len(result.violations) == 1
assert "R" in result.violations[0].details
def test_exclude_r_passes_cat():
config = CheckerConfig(checks=[PhonemeExcludeCheck(excluded={"R"})])
cache = G2PCache()
result = check_word("cat", config, cache)
assert result.passed
assert len(result.violations) == 0
def test_msh_stage_check():
config = CheckerConfig(checks=[MSHCheck(max_stage=3)])
cache = G2PCache()
# "map" = /M AE P/ — all ≤ stage 3
assert check_word("map", config, cache).passed
# "sat" = /S AE T/ — S is stage 5
assert not check_word("sat", config, cache).passed
def test_bound_check_with_norms():
norms = {"cat": {"aoa_kuperman": 3.2}, "elephant": {"aoa_kuperman": 6.8}}
config = CheckerConfig(
checks=[BoundCheck(norm="aoa_kuperman", max_val=5.0)],
norm_lookup=norms,
)
cache = G2PCache()
assert check_word("cat", config, cache).passed
assert not check_word("elephant", config, cache).passed
def test_bound_check_unknown_word_passes():
"""Words not in the norm lookup pass bound checks (no data = no violation)."""
config = CheckerConfig(
checks=[BoundCheck(norm="aoa_kuperman", max_val=5.0)],
norm_lookup={},
)
cache = G2PCache()
assert check_word("xyzzy", config, cache).passed
def test_multiple_checks_all_must_pass():
norms = {"street": {"aoa_kuperman": 4.0}}
config = CheckerConfig(checks=[
PhonemeExcludeCheck(excluded={"R"}),
BoundCheck(norm="aoa_kuperman", max_val=5.0),
], norm_lookup=norms)
cache = G2PCache()
# "street" passes AoA but fails R exclusion
result = check_word("street", config, cache)
assert not result.passed
assert len(result.violations) == 1
def test_punctuation_always_passes():
config = CheckerConfig(checks=[PhonemeExcludeCheck(excluded={"R"})])
cache = G2PCache()
assert check_word(",", config, cache).passed
assert check_word(".", config, cache).passed
- [ ] Step 2: Run tests to verify they fail
Run: cd packages/governors && uv run python -m pytest tests/test_checker.py -v
Expected: FAIL — module not found
- [ ] Step 3: Implement checker.py
Create packages/governors/src/phonolex_governors/checking/checker.py:
"""Word-level constraint checker.
Pure function: (word, config, cache) → CheckResult.
Knows nothing about generation or backtracking.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from phonolex_governors.checking.g2p import G2PCache, word_to_phonemes
from phonolex_governors.checking.phonology import (
check_exclude, check_msh_stage, check_exclude_clusters,
)
@dataclass
class Violation:
check_type: str
details: str
@dataclass
class CheckResult:
passed: bool
violations: list[Violation] = field(default_factory=list)
# --- Check types ---
@dataclass
class PhonemeExcludeCheck:
excluded: set[str]
@dataclass
class PhonemeExcludeClustersCheck:
excluded: set[str]
@dataclass
class MSHCheck:
max_stage: int
@dataclass
class BoundCheck:
norm: str
min_val: float | None = None
max_val: float | None = None
@dataclass
class VocabOnlyCheck:
allowed_words: set[str] | None = None
allowed_lists: set[str] | None = None
Check = PhonemeExcludeCheck | PhonemeExcludeClustersCheck | MSHCheck | BoundCheck | VocabOnlyCheck
@dataclass
class CheckerConfig:
checks: list[Check] = field(default_factory=list)
norm_lookup: dict[str, dict[str, float]] = field(default_factory=dict)
vocab_lookup: dict[str, set[str]] = field(default_factory=dict)
def check_word(word: str, config: CheckerConfig, cache: G2PCache) -> CheckResult:
"""Check a completed word against all configured constraints.
Non-alphabetic tokens (punctuation, numbers) always pass.
"""
clean = word.strip().lower()
if not clean or not clean.isalpha():
return CheckResult(passed=True)
violations: list[Violation] = []
phonemes = cache.get(clean)
for check in config.checks:
if isinstance(check, PhonemeExcludeCheck):
result = check_exclude(phonemes, check.excluded)
if not result.passed:
violations.append(Violation(
check_type="exclude",
details=f"contains {','.join(sorted(result.found_phonemes))}",
))
elif isinstance(check, MSHCheck):
result = check_msh_stage(phonemes, check.max_stage)
if not result.passed:
violations.append(Violation(
check_type="msh",
details=f"stage {result.max_found_stage} > {check.max_stage}",
))
elif isinstance(check, BoundCheck):
norms = config.norm_lookup.get(clean, {})
val = norms.get(check.norm)
if val is not None:
if check.min_val is not None and val < check.min_val:
violations.append(Violation(
check_type="bound",
details=f"{check.norm}={val:.1f} < {check.min_val}",
))
if check.max_val is not None and val > check.max_val:
violations.append(Violation(
check_type="bound",
details=f"{check.norm}={val:.1f} > {check.max_val}",
))
elif isinstance(check, VocabOnlyCheck):
if check.allowed_words is not None:
if clean not in check.allowed_words:
violations.append(Violation(
check_type="vocab_only",
details=f"'{clean}' not in allowed words",
))
if check.allowed_lists is not None:
memberships = config.vocab_lookup.get(clean, set())
if not (check.allowed_lists & memberships):
violations.append(Violation(
check_type="vocab_only",
details=f"'{clean}' not in required lists",
))
elif isinstance(check, PhonemeExcludeClustersCheck):
# Requires syllable structure — get from phonolex_data if available
# For now, basic implementation using G2P phonemes
from phonolex_data.phonology.syllabification import syllabify, PhonemeWithStress, is_vowel
from phonolex_governors.checking.g2p import R_PHONEMES
pws = [PhonemeWithStress(p, 1 if p in _ARPABET_VOWELS else None) for p in phonemes]
syls = syllabify(pws)
if syls:
syl_dicts = [{"onset": s.onset, "nucleus": s.nucleus, "coda": s.coda} for s in syls]
result = check_exclude_clusters(phonemes, syl_dicts, check.excluded)
if not result.passed:
violations.append(Violation(
check_type="exclude_clusters",
details=f"cluster contains {','.join(sorted(result.found_in_clusters))}",
))
return CheckResult(passed=len(violations) == 0, violations=violations)
# Re-export for convenience
from phonolex_governors.checking.phonology import _ARPABET_VOWELS # noqa: E402
- [ ] Step 4: Run tests to verify they pass
Run: cd packages/governors && uv run python -m pytest tests/test_checker.py -v
Expected: All pass
- [ ] Step 5: Commit
git add packages/governors/src/phonolex_governors/checking/checker.py packages/governors/tests/test_checker.py
git commit -m "feat(governors): word-level constraint checker with phoneme, MSH, and bound checks"
## Task 4: Backtrack Engine
Manages backtrack state, failure history, and intervention planning. No model dependency — pure state machine.
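The escalation policy this task implements can be previewed in isolation: each failure bans the failed token sequence, temperature rises 0.1 per failure (capped at +0.3), and from the third failure on the intervention is flagged as escalated:

```python
# Isolated preview of the escalation schedule used by plan_intervention
# in Step 3 below (helper name is illustrative only).
def schedule(failures: int) -> tuple[float, bool]:
    """Temperature delta and escalation flag after `failures` failed attempts."""
    return min(0.1 * failures, 0.3), failures >= 3


for n in range(1, 5):
    delta, escalated = schedule(n)
    print(n, round(delta, 1), escalated)
# → 1 0.1 False / 2 0.2 False / 3 0.3 True / 4 0.3 True
```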
Files:
- Create: packages/governors/src/phonolex_governors/generation/__init__.py
- Create: packages/governors/src/phonolex_governors/generation/backtrack.py
- Create: packages/governors/tests/test_backtrack.py
- [ ] Step 1: Write the failing tests
Create packages/governors/tests/test_backtrack.py:
"""Tests for backtrack engine."""
import pytest
from phonolex_governors.generation.backtrack import BacktrackState, Intervention
from phonolex_governors.checking.checker import CheckResult, Violation
def test_initial_state_no_intervention():
state = BacktrackState()
intervention = state.plan_intervention()
assert intervention.banned_sequences == []
assert intervention.logit_bias == {}
def test_record_failure_tracks_history():
state = BacktrackState(word_start_idx=5)
result = CheckResult(passed=False, violations=[
Violation(check_type="exclude", details="contains R"),
])
state.record_failure(result, token_ids=[10, 20, 30])
assert state.attempt_count == 1
assert len(state.failure_history) == 1
def test_plan_intervention_bans_sequence():
state = BacktrackState(word_start_idx=5)
result = CheckResult(passed=False, violations=[
Violation(check_type="exclude", details="contains R"),
])
state.record_failure(result, token_ids=[10, 20, 30])
intervention = state.plan_intervention()
assert [10, 20, 30] in intervention.banned_sequences
def test_escalation_after_repeated_failures():
state = BacktrackState(word_start_idx=5)
for i in range(4):
result = CheckResult(passed=False, violations=[
Violation(check_type="exclude", details="contains R"),
])
state.record_failure(result, token_ids=[10 + i])
intervention = state.plan_intervention()
# After 3+ failures, should escalate (wider backtrack)
assert intervention.escalated
def test_max_attempts_reached():
state = BacktrackState(word_start_idx=5, max_attempts=3)
for i in range(3):
state.record_failure(
CheckResult(passed=False, violations=[Violation("exclude", "R")]),
token_ids=[i],
)
assert state.exhausted
- [ ] Step 2: Run tests to verify they fail
Run: cd packages/governors && uv run python -m pytest tests/test_backtrack.py -v
Expected: FAIL — module not found
- [ ] Step 3: Implement backtrack.py
Create packages/governors/src/phonolex_governors/generation/__init__.py:
"""Generation loop with word-boundary checking and backtracking."""
Create packages/governors/src/phonolex_governors/generation/backtrack.py:
"""Backtrack state and intervention planning.
Tracks failure history at a word position and escalates interventions
based on repeated failures.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from phonolex_governors.checking.checker import CheckResult
@dataclass
class Intervention:
"""What to do before retrying after a backtrack."""
banned_sequences: list[list[int]] = field(default_factory=list)
logit_bias: dict[int, float] = field(default_factory=dict)
temperature_delta: float = 0.0
escalated: bool = False
@dataclass
class BacktrackState:
"""Tracks backtrack attempts at a single word position."""
word_start_idx: int = 0
max_attempts: int = 5
failure_history: list[tuple[CheckResult, list[int]]] = field(default_factory=list)
@property
def attempt_count(self) -> int:
return len(self.failure_history)
@property
def exhausted(self) -> bool:
return self.attempt_count >= self.max_attempts
def record_failure(self, result: CheckResult, token_ids: list[int]) -> None:
"""Record a failed word attempt."""
self.failure_history.append((result, list(token_ids)))
def plan_intervention(self) -> Intervention:
"""Plan the intervention for the next retry based on failure history."""
if not self.failure_history:
return Intervention()
banned = [ids for _, ids in self.failure_history]
escalated = self.attempt_count >= 3
# Progressive temperature increase to explore more
temp_delta = min(0.1 * self.attempt_count, 0.3)
return Intervention(
banned_sequences=banned,
temperature_delta=temp_delta,
escalated=escalated,
)
def reset(self) -> None:
"""Reset state for a new word position."""
self.failure_history.clear()
- [ ] Step 4: Run tests to verify they pass
Run: cd packages/governors && uv run python -m pytest tests/test_backtrack.py -v
Expected: All pass
- [ ] Step 5: Commit
git add packages/governors/src/phonolex_governors/generation/ packages/governors/tests/test_backtrack.py
git commit -m "feat(governors): backtrack engine with failure history and escalating interventions"
## Task 5: Sampling Utilities
Top-k/p sampling and banned sequence enforcement. These are used by the generation loop.
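Nucleus (top-p) filtering is the subtler of the two: sort probabilities descending and keep the smallest prefix whose cumulative mass reaches top_p, always retaining the token that crosses the threshold. A pure-Python illustration (not the tensor implementation in Step 3):

```python
def nucleus(probs: list[float], top_p: float) -> list[int]:
    """Indices kept by top-p filtering (pure-Python illustration)."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)  # the crossing token itself is kept
        cum += probs[i]
        if cum >= top_p:
            break
    return kept


# With probs [0.5, 0.3, 0.15, 0.05] and top_p=0.9, tokens 0, 1, 2 survive
# (0.5 + 0.3 = 0.8 < 0.9, so token 2 is the one that crosses the threshold).
print(nucleus([0.5, 0.3, 0.15, 0.05], top_p=0.9))
# → [0, 1, 2]
```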
Files:
- Create: packages/governors/src/phonolex_governors/generation/sampling.py
- Create: packages/governors/tests/test_sampling.py
- [ ] Step 1: Write the failing tests
Create packages/governors/tests/test_sampling.py:
"""Tests for sampling utilities."""
import torch
import pytest
from phonolex_governors.generation.sampling import (
sample_token, apply_banned_sequences,
)
def test_sample_token_returns_valid_id():
logits = torch.randn(1, 100)
token_id = sample_token(logits, temperature=1.0, top_k=50, top_p=0.9)
assert 0 <= token_id < 100
def test_sample_token_deterministic_at_low_temp():
logits = torch.zeros(1, 100)
logits[0, 42] = 100.0 # overwhelmingly likely
token_id = sample_token(logits, temperature=0.01, top_k=50, top_p=0.9)
assert token_id == 42
def test_apply_banned_sequences_blocks_matching():
logits = torch.zeros(1, 100)
# Generated so far: [10, 20] — if banned sequence is [10, 20, 30],
# then token 30 should be blocked
generated = [10, 20]
banned = [[10, 20, 30]]
result = apply_banned_sequences(logits, generated, banned)
assert result[0, 30].item() < -1e8
def test_apply_banned_sequences_no_match():
logits = torch.zeros(1, 100)
generated = [10, 25] # doesn't match prefix [10, 20]
banned = [[10, 20, 30]]
result = apply_banned_sequences(logits, generated, banned)
assert result[0, 30].item() == 0.0 # not blocked
def test_apply_banned_sequences_multiple():
logits = torch.zeros(1, 100)
generated = [10, 20]
banned = [[10, 20, 30], [10, 20, 40]]
result = apply_banned_sequences(logits, generated, banned)
assert result[0, 30].item() < -1e8
assert result[0, 40].item() < -1e8
assert result[0, 50].item() == 0.0 # not affected
- [ ] Step 2: Run tests to verify they fail
Run: cd packages/governors && uv run python -m pytest tests/test_sampling.py -v
Expected: FAIL
- [ ] Step 3: Implement sampling.py
Create packages/governors/src/phonolex_governors/generation/sampling.py:
"""Sampling utilities for the generation loop."""
from __future__ import annotations
import torch
def sample_token(
logits: torch.Tensor,
temperature: float = 0.8,
top_k: int = 50,
top_p: float = 0.9,
) -> int:
"""Sample a single token from logits with temperature, top-k, and top-p.
Args:
logits: (1, vocab_size) raw logits
temperature: Sampling temperature
top_k: Keep only top-k tokens
top_p: Nucleus sampling threshold
Returns:
Token ID (int)
"""
logits = logits[0] / max(temperature, 1e-8)
# Top-k filtering
if top_k > 0:
topk_vals, _ = torch.topk(logits, min(top_k, logits.size(-1)))
logits[logits < topk_vals[-1]] = -float("inf")
# Top-p (nucleus) filtering
if top_p < 1.0:
sorted_logits, sorted_indices = torch.sort(logits, descending=True)
cumulative_probs = torch.cumsum(torch.softmax(sorted_logits, dim=-1), dim=-1)
cutoff_mask = cumulative_probs - torch.softmax(sorted_logits, dim=-1) >= top_p
sorted_logits[cutoff_mask] = -float("inf")
logits = sorted_logits.scatter(0, sorted_indices, sorted_logits)
probs = torch.softmax(logits, dim=-1)
return torch.multinomial(probs, num_samples=1).item()
def apply_banned_sequences(
logits: torch.Tensor,
generated: list[int],
banned: list[list[int]],
) -> torch.Tensor:
"""Block tokens that would continue a banned sequence.
For each banned sequence, if the generated tokens end with its prefix,
the next token in the banned sequence gets -inf logits.
"""
logits = logits.clone()
for seq in banned:
if len(seq) < 2:
continue
prefix = seq[:-1]
next_token = seq[-1]
# Check if generated ends with this prefix
if len(generated) >= len(prefix):
if generated[-len(prefix):] == prefix:
logits[0, next_token] = -float("inf")
return logits
- [ ] Step 4: Run tests to verify they pass
Run: cd packages/governors && uv run python -m pytest tests/test_sampling.py -v
Expected: All pass
- [ ] Step 5: Commit
git add packages/governors/src/phonolex_governors/generation/sampling.py packages/governors/tests/test_sampling.py
git commit -m "feat(governors): sampling utilities with banned sequence enforcement"
## Task 6: Generation Loop
The custom autoregressive generation loop with word-boundary checking and backtracking. This is the core of the new system.
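The word-boundary convention the loop relies on: SentencePiece tokenizers mark word starts with ▁ (U+2581), so a token stream splits into words before each ▁-prefixed token. A toy illustration, independent of any real tokenizer:

```python
BOUNDARY = "\u2581"  # SentencePiece word-start marker


def group_words(tokens: list[str]) -> list[str]:
    """Group subword tokens into surface words at ▁ boundaries (toy sketch)."""
    words: list[str] = []
    for tok in tokens:
        if tok.startswith(BOUNDARY) or not words:
            words.append(tok.lstrip(BOUNDARY))  # new word starts here
        else:
            words[-1] += tok  # continuation of the current word
    return words


print(group_words(["\u2581run", "ning", "\u2581fast"]))
# → ['running', 'fast']
```

The real loop makes the same decision one token at a time: a ▁-prefixed token means the previous word is complete and ready to be checked.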
Files:
- Create: packages/governors/src/phonolex_governors/generation/loop.py
- Create: packages/governors/tests/test_loop.py
- [ ] Step 1: Write the failing tests
Create packages/governors/tests/test_loop.py:
"""Tests for generation loop with word-boundary checking.
Uses a mock model that produces a deterministic token sequence,
allowing us to test word-boundary detection and backtracking logic
without a real LLM.
"""
import torch
import pytest
from unittest.mock import MagicMock
from phonolex_governors.generation.loop import generate, GenerationResult
from phonolex_governors.checking.checker import CheckerConfig, PhonemeExcludeCheck
from phonolex_governors.checking.g2p import G2PCache
class MockModel:
"""Deterministic model that returns tokens from a fixed sequence."""
def __init__(self, token_sequence: list[int], vocab_size: int = 100):
self.token_sequence = token_sequence
self.vocab_size = vocab_size
self._step = 0
self.config = MagicMock()
self.config.vocab_size = vocab_size
def __call__(self, input_ids, past_key_values=None, use_cache=True):
"""Return logits that make the next token in sequence overwhelmingly likely."""
logits = torch.full((1, 1, self.vocab_size), -100.0)
if self._step < len(self.token_sequence):
logits[0, 0, self.token_sequence[self._step]] = 100.0
self._step += 1
else:
logits[0, 0, 1] = 100.0 # EOS fallback
out = MagicMock()
out.logits = logits
out.past_key_values = past_key_values # pass through
return out
def reset(self):
self._step = 0
class MockTokenizer:
"""Tokenizer that maps token IDs to fixed strings."""
def __init__(self, vocab: dict[int, str]):
self._vocab = vocab
self._reverse = {v: k for k, v in vocab.items()}
self.eos_token_id = 1
def decode(self, token_ids, skip_special_tokens=True):
return "".join(self._vocab.get(t, "") for t in token_ids)
def encode(self, text, return_tensors=None, add_special_tokens=True):
# Simplified: return tensor of first matching tokens
ids = [self._reverse.get(c, 0) for c in text.split()]
if return_tensors == "pt":
return {"input_ids": torch.tensor([ids])}
return ids
def test_generate_returns_result():
    vocab = {0: "▁the", 2: "▁cat", 3: "▁sat", 1: ""}
    model = MockModel([0, 2, 3, 1])  # "the cat sat" + EOS
    tokenizer = MockTokenizer(vocab)
    config = CheckerConfig()
    result = generate(model, tokenizer, "test", config)
    assert isinstance(result, GenerationResult)
    assert len(result.token_ids) > 0


def test_generate_stops_at_eos():
    vocab = {0: "▁hello", 1: ""}
    model = MockModel([0, 1])
    tokenizer = MockTokenizer(vocab)
    config = CheckerConfig()
    result = generate(model, tokenizer, "test", config)
    # Should stop after EOS, not continue to max_tokens
    assert len(result.token_ids) <= 2


def test_generate_respects_max_tokens():
    vocab = {0: "▁word"}
    model = MockModel([0] * 100)
    tokenizer = MockTokenizer(vocab)
    config = CheckerConfig()
    result = generate(model, tokenizer, "test", config, max_tokens=10)
    assert len(result.token_ids) <= 10
- [ ] Step 2: Run tests to verify they fail
Run: cd packages/governors && uv run python -m pytest tests/test_loop.py -v
Expected: FAIL
- [ ] Step 3: Implement loop.py
Create packages/governors/src/phonolex_governors/generation/loop.py:
"""Custom autoregressive generation loop with word-boundary checking.
Replaces model.generate() with a loop that:
1. Generates token by token
2. Detects word boundaries (SentencePiece ▁ prefix)
3. Checks completed words against constraints
4. Backtracks and retries on violations
"""
from __future__ import annotations
import time
from dataclasses import dataclass, field
import torch
from phonolex_governors.checking.checker import check_word, CheckerConfig
from phonolex_governors.checking.g2p import G2PCache
from phonolex_governors.generation.backtrack import BacktrackState, Intervention
from phonolex_governors.generation.sampling import sample_token, apply_banned_sequences
WORD_BOUNDARY_CHAR = "\u2581" # SentencePiece ▁
@dataclass
class GenerationResult:
    token_ids: list[int]
    text: str
    gen_time_ms: float
    backtracks: int = 0
    words_checked: int = 0


def _is_word_boundary(token_text: str) -> bool:
    """Check if a token starts a new word (SentencePiece ▁ prefix)."""
    return token_text.startswith(WORD_BOUNDARY_CHAR) or token_text.startswith(" ")


def _extract_word(tokenizer, token_ids: list[int]) -> str:
    """Decode a sequence of token IDs into a word, stripping boundary markers."""
    text = tokenizer.decode(token_ids, skip_special_tokens=True)
    # Some tokenizers emit the literal ▁ on decode; normalize it to a space.
    return text.replace(WORD_BOUNDARY_CHAR, " ").strip()


def generate(
    model,
    tokenizer,
    prompt: str,
    checker_config: CheckerConfig,
    boost_fn=None,
    max_tokens: int = 256,
    max_backtracks_per_word: int = 5,
    temperature: float = 0.8,
    top_k: int = 50,
    top_p: float = 0.9,
) -> GenerationResult:
    """Generate text with word-boundary constraint checking and backtracking.

    Args:
        model: Language model with __call__(input_ids, past_key_values, use_cache)
        tokenizer: Tokenizer with encode/decode
        prompt: Input prompt text
        checker_config: Word-level constraint configuration
        boost_fn: Optional callable(logits) → logits for soft boosts
        max_tokens: Maximum tokens to generate
        max_backtracks_per_word: Max retries per word position
        temperature, top_k, top_p: Sampling parameters
    """
    t0 = time.time()
    g2p_cache = G2PCache()
    total_backtracks = 0
    words_checked = 0

    # Encode prompt
    encoded = tokenizer.encode(prompt, return_tensors="pt", add_special_tokens=True)
    if isinstance(encoded, dict):
        input_ids = encoded["input_ids"]
    else:
        input_ids = torch.tensor([encoded])
    device = next(model.parameters(), torch.tensor(0)).device
    input_ids = input_ids.to(device)

    # Prime the KV cache on all prompt tokens EXCEPT the last one. The loop
    # below always feeds the most recent token, so the cache must not already
    # contain it; otherwise that token would be processed twice.
    kv_cache = None
    if input_ids.shape[1] > 1:
        with torch.no_grad():
            outputs = model(input_ids[:, :-1], past_key_values=None, use_cache=True)
        kv_cache = outputs.past_key_values

    # Generation state. Invariant: kv_cache covers every token except the one
    # about to be fed (the last generated token, or the last prompt token).
    generated_ids: list[int] = []
    current_word_ids: list[int] = []  # tokens in the word being built
    backtrack_state = BacktrackState(max_attempts=max_backtracks_per_word)

    # Snapshot for backtracking (cache state from just before the word began)
    word_start_kv = kv_cache
    word_start_gen_len = 0

    for _ in range(max_tokens):
        # Feed the most recent token to get logits for the next one
        last_id = generated_ids[-1] if generated_ids else input_ids[0, -1].item()
        last_token = torch.tensor([[last_id]], device=device)
        prev_kv = kv_cache  # cache state BEFORE feeding last_id
        with torch.no_grad():
            outputs = model(last_token, past_key_values=kv_cache, use_cache=True)
        logits = outputs.logits[:, -1:, :]  # (1, 1, vocab)
        kv_cache = outputs.past_key_values

        # Apply soft boosts
        if boost_fn is not None:
            logits = boost_fn(logits)

        # Apply banned sequences from backtrack state
        intervention = backtrack_state.plan_intervention()
        if intervention.banned_sequences:
            logits = apply_banned_sequences(
                logits.squeeze(0), current_word_ids, intervention.banned_sequences,
            ).unsqueeze(0)

        # Sample
        effective_temp = temperature + intervention.temperature_delta
        token_id = sample_token(logits.squeeze(0), effective_temp, top_k, top_p)

        # Check for EOS
        if token_id == tokenizer.eos_token_id:
            # Check the final word before ending
            if current_word_ids:
                word = _extract_word(tokenizer, current_word_ids)
                result = check_word(word, checker_config, g2p_cache)
                words_checked += 1
                if not result.passed and not backtrack_state.exhausted:
                    # Backtrack to the start of the final word
                    backtrack_state.record_failure(result, current_word_ids)
                    generated_ids = generated_ids[:word_start_gen_len]
                    current_word_ids = []
                    kv_cache = word_start_kv
                    total_backtracks += 1
                    continue
            break

        # Decode to check for word boundary
        token_text = tokenizer.decode([token_id], skip_special_tokens=False)
        if _is_word_boundary(token_text) and current_word_ids:
            # Previous word is complete — check it
            word = _extract_word(tokenizer, current_word_ids)
            result = check_word(word, checker_config, g2p_cache)
            words_checked += 1
            if not result.passed and not backtrack_state.exhausted:
                # Violation! Backtrack to word start
                backtrack_state.record_failure(result, current_word_ids)
                generated_ids = generated_ids[:word_start_gen_len]
                current_word_ids = []
                kv_cache = word_start_kv
                total_backtracks += 1
                continue
            # Word passed — snapshot for next word. Use prev_kv (the cache from
            # before this step's forward pass) so a later backtrack can re-feed
            # generated_ids[-1] without processing it twice.
            backtrack_state = BacktrackState(
                word_start_idx=len(generated_ids),
                max_attempts=max_backtracks_per_word,
            )
            word_start_kv = prev_kv
            word_start_gen_len = len(generated_ids)
            current_word_ids = [token_id]
        else:
            current_word_ids.append(token_id)
        generated_ids.append(token_id)

    text = tokenizer.decode(generated_ids, skip_special_tokens=True)
    gen_time_ms = (time.time() - t0) * 1000
    return GenerationResult(
        token_ids=generated_ids,
        text=text,
        gen_time_ms=gen_time_ms,
        backtracks=total_backtracks,
        words_checked=words_checked,
    )
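The control flow above (sample, detect boundary, check, roll back) can be exercised without a model or KV cache. A self-contained toy that mirrors the loop's backtracking contract: each failed word is recorded to a per-slot ban list, like BacktrackState's failure history, and retries stop when attempts are exhausted. All names here are illustrative, not part of the package:

```python
def generate_words(slots, check, max_retries=5):
    """Fill each slot with the first candidate that passes check, banning failures."""
    output = []
    for candidates in slots:
        banned = set()                          # failure history for this slot
        for _ in range(max_retries):
            remaining = [w for w in candidates if w not in banned]
            if not remaining:
                break                           # retries exhausted for this slot
            word = remaining[0]                 # stand-in for sampling a word
            if check(word):                     # the word-boundary constraint check
                output.append(word)
                break
            banned.add(word)                    # "backtrack": discard and retry

    return output

# Toy constraint: exclude words spelled with the letter r
assert generate_words([["red", "blue"], ["rose", "river", "tulip"]],
                      lambda w: "r" not in w) == ["blue", "tulip"]
```

The real loop's ban list operates on token sequences rather than whole words, and its interventions also raise temperature, but the retry-with-memory shape is the same.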
- [ ] Step 4: Run tests to verify they pass
Run: cd packages/governors && uv run python -m pytest tests/test_loop.py -v
Expected: All pass
- [ ] Step 5: Run full test suite
Run: cd packages/governors && uv run python -m pytest tests/test_g2p.py tests/test_phonology.py tests/test_checker.py tests/test_backtrack.py tests/test_sampling.py tests/test_loop.py -v
Expected: All pass
- [ ] Step 6: Commit
git add packages/governors/src/phonolex_governors/generation/loop.py packages/governors/tests/test_loop.py
git commit -m "feat(governors): custom generation loop with word-boundary checking and backtracking"
Task 7: Constraint Types + Compiler¶
The user-facing constraint declarations and the compiler that translates them into CheckerConfig + boost configs.
Files:
- Create: packages/governors/src/phonolex_governors/constraints/__init__.py
- Create: packages/governors/src/phonolex_governors/constraints/types.py
- Create: packages/governors/src/phonolex_governors/constraints/compiler.py
- Create: packages/governors/tests/test_compiler.py
- [ ] Step 1: Write the failing tests
Create packages/governors/tests/test_compiler.py:
"""Tests for constraint compiler."""
import pytest
from phonolex_governors.constraints.types import (
Exclude, ExcludeInClusters, MSHStage, Bound, VocabOnly, Include, ThematicField,
)
from phonolex_governors.constraints.compiler import compile_constraints
from phonolex_governors.checking.checker import PhonemeExcludeCheck, MSHCheck, BoundCheck
def test_compile_exclude():
    constraints = [Exclude(phonemes={"R"})]
    checker_config, boost_config = compile_constraints(constraints)
    assert len(checker_config.checks) == 1
    assert isinstance(checker_config.checks[0], PhonemeExcludeCheck)
    assert "R" in checker_config.checks[0].excluded


def test_compile_msh():
    constraints = [MSHStage(max_stage=3)]
    checker_config, _ = compile_constraints(constraints)
    assert len(checker_config.checks) == 1
    assert isinstance(checker_config.checks[0], MSHCheck)
    assert checker_config.checks[0].max_stage == 3


def test_compile_bound():
    constraints = [Bound(norm="aoa_kuperman", max_val=5.0)]
    checker_config, _ = compile_constraints(constraints)
    assert len(checker_config.checks) == 1
    assert isinstance(checker_config.checks[0], BoundCheck)


def test_compile_include_goes_to_boosts():
    """Include constraints should produce boost config, not checker config."""
    constraints = [Include(phonemes={"TH"}, strength=2.0)]
    checker_config, boost_config = compile_constraints(constraints)
    assert len(checker_config.checks) == 0
    assert len(boost_config.includes) == 1


def test_compile_mixed():
    constraints = [
        Exclude(phonemes={"R"}),
        Bound(norm="aoa_kuperman", max_val=5.0),
        Include(phonemes={"TH"}, strength=2.0),
        MSHStage(max_stage=3),
    ]
    checker_config, boost_config = compile_constraints(constraints)
    assert len(checker_config.checks) == 3  # exclude + bound + msh
    assert len(boost_config.includes) == 1


def test_compile_normalizes_ipa_to_arpabet():
    """IPA phoneme ɹ should be normalized to ARPAbet R."""
    constraints = [Exclude(phonemes={"ɹ"})]
    checker_config, _ = compile_constraints(constraints)
    assert "R" in checker_config.checks[0].excluded
- [ ] Step 2: Run tests to verify they fail
Run: cd packages/governors && uv run python -m pytest tests/test_compiler.py -v
Expected: FAIL
- [ ] Step 3: Implement types.py
Create packages/governors/src/phonolex_governors/constraints/__init__.py:
"""Declarative constraint types and compiler."""
Create packages/governors/src/phonolex_governors/constraints/types.py:
"""Constraint type definitions — the user-facing API.
Users declare what they want. The compiler translates to checker + boost configs.
"""
from __future__ import annotations
from dataclasses import dataclass, field
@dataclass(frozen=True)
class Exclude:
    phonemes: set[str]


@dataclass(frozen=True)
class ExcludeInClusters:
    phonemes: set[str]


@dataclass(frozen=True)
class MSHStage:
    max_stage: int


@dataclass(frozen=True)
class Bound:
    norm: str
    min_val: float | None = None
    max_val: float | None = None


@dataclass(frozen=True)
class VocabOnly:
    lists: set[str] | None = None
    words: set[str] | None = None


@dataclass(frozen=True)
class Include:
    phonemes: set[str]
    strength: float = 2.0
    target_rate: float | None = None


@dataclass(frozen=True)
class VocabBoost:
    lists: set[str] | None = None
    words: set[str] | None = None
    strength: float = 2.0
    target_rate: float | None = None


@dataclass(frozen=True)
class MinPairBoost:
    target: str
    contrast: str
    strength: float = 2.0


@dataclass(frozen=True)
class MaxOppositionBoost:
    target: str
    contrast: str
    strength: float = 2.0


@dataclass(frozen=True)
class ThematicField:
    seed_words: list[str]
    strength: float = 1.5
    threshold: float = 0.02


Constraint = (
    Exclude | ExcludeInClusters | MSHStage | Bound | VocabOnly
    | Include | VocabBoost | MinPairBoost | MaxOppositionBoost | ThematicField
)
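A note on frozen=True: besides signaling immutability, frozen dataclasses get a generated __hash__, so constraints can be deduplicated in sets or used as cache keys. That only works if every field is itself hashable; the plan's types use plain set fields, which keep value equality but break hashing. A standalone check with a trimmed-down stand-in (ExcludeDemo is hypothetical, not the plan's Exclude):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExcludeDemo:
    # frozenset rather than set: a set field makes hash() raise even when frozen=True
    phonemes: frozenset[str]

a = ExcludeDemo(frozenset({"R"}))
b = ExcludeDemo(frozenset({"R"}))
assert a == b            # value equality from the dataclass machinery
assert len({a, b}) == 1  # hashable, so sets deduplicate equal constraints
```

If constraint deduplication or caching ever matters, switching the set fields to frozenset would be the minimal change.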
- [ ] Step 4: Implement compiler.py
Create packages/governors/src/phonolex_governors/constraints/compiler.py:
"""Compile user-facing constraints into CheckerConfig + BoostConfig.
Translates declarative constraint objects into the internal configurations
used by the word checker and the boost mechanisms.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from phonolex_governors.constraints.types import (
Constraint, Exclude, ExcludeInClusters, MSHStage, Bound, VocabOnly,
Include, VocabBoost, MinPairBoost, MaxOppositionBoost, ThematicField,
)
from phonolex_governors.checking.checker import (
CheckerConfig, PhonemeExcludeCheck, PhonemeExcludeClustersCheck,
MSHCheck, BoundCheck, VocabOnlyCheck,
)
# IPA → ARPAbet mapping for common phonemes users might pass as IPA
_IPA_TO_ARPABET: dict[str, str] = {
    "ɹ": "R", "r": "R",
    "ɡ": "G", "g": "G",
    "θ": "TH", "ð": "DH",
    "ʃ": "SH", "ʒ": "ZH",
    "tʃ": "CH", "dʒ": "JH",
    "ŋ": "NG",
    "j": "Y",
    "p": "P", "b": "B", "t": "T", "d": "D", "k": "K",
    "f": "F", "v": "V", "s": "S", "z": "Z",
    "h": "HH", "m": "M", "n": "N", "l": "L", "w": "W",
}


def _normalize_phoneme(p: str) -> str:
    """Normalize a phoneme to ARPAbet. Pass through if already ARPAbet."""
    return _IPA_TO_ARPABET.get(p, p)


@dataclass
class BoostConfig:
    includes: list[Include] = field(default_factory=list)
    vocab_boosts: list[VocabBoost] = field(default_factory=list)
    min_pair_boosts: list[MinPairBoost] = field(default_factory=list)
    max_opp_boosts: list[MaxOppositionBoost] = field(default_factory=list)
    thematic_fields: list[ThematicField] = field(default_factory=list)


def compile_constraints(
    constraints: list[Constraint],
    norm_lookup: dict[str, dict[str, float]] | None = None,
    vocab_lookup: dict[str, set[str]] | None = None,
) -> tuple[CheckerConfig, BoostConfig]:
    """Compile constraints into checker config + boost config."""
    checks = []
    boost_config = BoostConfig()
    for c in constraints:
        if isinstance(c, Exclude):
            checks.append(PhonemeExcludeCheck(
                excluded={_normalize_phoneme(p) for p in c.phonemes},
            ))
        elif isinstance(c, ExcludeInClusters):
            checks.append(PhonemeExcludeClustersCheck(
                excluded={_normalize_phoneme(p) for p in c.phonemes},
            ))
        elif isinstance(c, MSHStage):
            checks.append(MSHCheck(max_stage=c.max_stage))
        elif isinstance(c, Bound):
            checks.append(BoundCheck(
                norm=c.norm, min_val=c.min_val, max_val=c.max_val,
            ))
        elif isinstance(c, VocabOnly):
            checks.append(VocabOnlyCheck(
                allowed_words=c.words, allowed_lists=c.lists,
            ))
        elif isinstance(c, Include):
            boost_config.includes.append(c)
        elif isinstance(c, VocabBoost):
            boost_config.vocab_boosts.append(c)
        elif isinstance(c, MinPairBoost):
            boost_config.min_pair_boosts.append(c)
        elif isinstance(c, MaxOppositionBoost):
            boost_config.max_opp_boosts.append(c)
        elif isinstance(c, ThematicField):
            boost_config.thematic_fields.append(c)
    checker_config = CheckerConfig(
        checks=checks,
        norm_lookup=norm_lookup or {},
        vocab_lookup=vocab_lookup or {},
    )
    return checker_config, boost_config
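One subtlety of the mapping above: _normalize_phoneme does a whole-string lookup, so multi-character IPA like tʃ (two codepoints) only normalizes when it arrives as a single string; iterating a phoneme string character by character would split the affricate. A standalone sketch using a subset of the table (the function name mirrors the compiler's helper but is redefined here for illustration):

```python
IPA_TO_ARPABET = {"ɹ": "R", "θ": "TH", "tʃ": "CH"}  # subset of the table above

def normalize(p: str) -> str:
    """Whole-string lookup; unknown symbols pass through unchanged."""
    return IPA_TO_ARPABET.get(p, p)

assert normalize("ɹ") == "R"
assert normalize("tʃ") == "CH"   # works: the affricate arrives as one string
assert normalize("R") == "R"     # ARPAbet input passes through untouched
# Char-by-char iteration splits "tʃ" into two symbols and misses the mapping:
assert [normalize(c) for c in "tʃ"] == ["t", "ʃ"]
```

This is why the constraint types take phonemes as sets of strings rather than a single concatenated string.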
- [ ] Step 5: Run tests to verify they pass
Run: cd packages/governors && uv run python -m pytest tests/test_compiler.py -v
Expected: All pass
- [ ] Step 6: Commit
git add packages/governors/src/phonolex_governors/constraints/ packages/governors/tests/test_compiler.py
git commit -m "feat(governors): constraint types and compiler — user API to checker+boost configs"
Task 8: Relocate Boost Mechanisms¶
Move existing boost code from flat files to boosts/ package. No logic changes — just file moves and import updates.
Files:
- Create: packages/governors/src/phonolex_governors/boosts/__init__.py
- Move: boosts.py → boosts/logit_boost.py
- Move: include.py → boosts/include.py
- Move: thematic.py → boosts/thematic.py
- Move: cdd.py → boosts/cdd.py
- [ ] Step 1: Create boosts package with relocated files
mkdir -p packages/governors/src/phonolex_governors/boosts
Create packages/governors/src/phonolex_governors/boosts/__init__.py:
"""Soft per-token boost mechanisms."""
from phonolex_governors.boosts.logit_boost import LogitBoost
from phonolex_governors.boosts.cdd import CDDConstraint, CDDProjection
Copy files preserving content:
cp packages/governors/src/phonolex_governors/boosts.py packages/governors/src/phonolex_governors/boosts/logit_boost.py
cp packages/governors/src/phonolex_governors/include.py packages/governors/src/phonolex_governors/boosts/include.py
cp packages/governors/src/phonolex_governors/thematic.py packages/governors/src/phonolex_governors/boosts/thematic.py
cp packages/governors/src/phonolex_governors/cdd.py packages/governors/src/phonolex_governors/boosts/cdd.py
- [ ] Step 2: Update imports in relocated files
Each relocated file has internal imports that need updating. For example, boosts/include.py imports:
from phonolex_governors.constraints import Constraint, parse_phono
That path now resolves to the new constraints/ package from Task 7, which shadows the old flat constraints.py on import. Either re-export the legacy symbols from constraints/__init__.py or point each import at the module where those symbols now live.
- [ ] Step 3: Update the package __init__.py
Update packages/governors/src/phonolex_governors/__init__.py to add exports from new locations while keeping old ones for backward compat:
"""PhonoLex Governors — constraint layer for language model generation."""
# New structure
from phonolex_governors.checking.g2p import word_to_phonemes, word_has_phoneme, G2PCache
from phonolex_governors.checking.checker import check_word, CheckerConfig, CheckResult
from phonolex_governors.generation.loop import generate, GenerationResult
from phonolex_governors.generation.backtrack import BacktrackState, Intervention
from phonolex_governors.constraints.types import (
Exclude, ExcludeInClusters, MSHStage, Bound, VocabOnly,
Include, VocabBoost, MinPairBoost, MaxOppositionBoost, ThematicField,
)
from phonolex_governors.constraints.compiler import compile_constraints, BoostConfig
# Legacy exports (still used by dashboard server, will migrate later)
from phonolex_governors.core import Governor, GovernorContext
from phonolex_governors.gates import HardGate
from phonolex_governors.boosts.logit_boost import LogitBoost
from phonolex_governors.boosts.include import IncludeConstraint, VocabBoostConstraint
from phonolex_governors.boosts.cdd import CDDConstraint, CDDProjection
from phonolex_governors.boosts.thematic import ThematicConstraint, AssocGraph, build_assoc_graph, assoc_strength
from phonolex_governors.lookups import LookupBuilder, PhonoFeatures, Syllable, TokenFeatures
- [ ] Step 4: Run all tests to verify nothing broke
Run: cd packages/governors && uv run python -m pytest tests/ -v
Expected: All tests pass (both old tests and new tests)
- [ ] Step 5: Commit
git add packages/governors/src/phonolex_governors/boosts/ packages/governors/src/phonolex_governors/__init__.py
git commit -m "refactor(governors): relocate boost mechanisms to boosts/ package"
Task 9: Integration — Wire into Dashboard Server¶
Connect the new generation system to the FastAPI dashboard. The generate-single endpoint uses the new generate() function instead of model.generate().
Files:
- Modify: packages/dashboard/server/model.py
- Modify: packages/dashboard/server/routes/generate.py
- Modify: packages/dashboard/server/governor.py
This task is intentionally left as a description rather than exact code because it requires reading the current state of these files (which may have changed from the bug fixes earlier in this session) and making surgical modifications. The key changes:
- [ ] Step 1: Add a generate_with_checking() function to model.py
This wraps the new generate() loop from phonolex_governors.generation.loop with the dashboard's model, tokenizer, and boost setup. It replaces generate_single() for the constrained path.
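A possible shape for the wrapper, with the real dependencies injected as callables so the sketch stays self-contained and testable. Every name here (generate_with_checking, compile_fn, generate_fn, build_boost_fn) is an assumption about how the dashboard might wire this up, not its actual API:

```python
def generate_with_checking(model, tokenizer, prompt, constraints,
                           compile_fn, generate_fn, build_boost_fn=None):
    """Compile constraints, build optional soft boosts, run the checking loop."""
    checker_config, boost_config = compile_fn(constraints)   # Task 7 compiler
    boost_fn = build_boost_fn(boost_config) if build_boost_fn else None
    return generate_fn(model, tokenizer, prompt, checker_config,  # Task 6 loop
                       boost_fn=boost_fn)

# Exercising the wiring with stubs in place of the real compiler and loop:
result = generate_with_checking(
    model=None, tokenizer=None, prompt="hi", constraints=["no-R"],
    compile_fn=lambda cs: ({"checks": cs}, {"boosts": []}),
    generate_fn=lambda m, t, p, cfg, boost_fn=None: (p, cfg, boost_fn),
)
assert result == ("hi", {"checks": ["no-R"]}, None)
```

In the dashboard itself, compile_fn and generate_fn would be the real compile_constraints and generate, and build_boost_fn would assemble the relocated boost mechanisms from the BoostConfig.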
- [ ] Step 2: Update the /generate-single route to use the new path
When constraints are present, call generate_with_checking() instead of model.generate_single(). When no constraints, use the existing path (no overhead).
- [ ] Step 3: Update governor.py to compile constraints
Replace build_governor() / build_logits_processor() with a function that calls compile_constraints() and returns (CheckerConfig, BoostConfig).
- [ ] Step 4: Run dashboard tests
Run: cd packages/dashboard && uv run python -m pytest server/tests/ -v
Expected: All pass
- [ ] Step 5: Commit
git add packages/dashboard/server/
git commit -m "feat(dashboard): wire dynamic governor into generation pipeline"
Execution Notes¶
- Tasks 1-5 build the new system bottom-up, each independently testable.
- Task 6 is the big one — the generation loop that ties everything together.
- Task 7 creates the new user-facing API.
- Task 8 is a refactoring task — relocates existing code.
- Task 9 is integration — wires the new system into the dashboard.
- Old files (gates.py, core.py, the old flat constraints.py) are NOT deleted in this plan. They stay for backward compatibility with old tests and the dashboard's existing code paths. Note, though, that the new constraints/ package shadows the flat constraints.py on import, which is why Task 8 Step 2 updates the legacy imports in the relocated boost files. A follow-up cleanup task can remove the old files once all consumers are migrated.