
Dynamic Governor with Informed Backtracking

Date: 2026-04-06
Status: Implemented — phoneme exclusion working end-to-end
Branch: feature/governed-generation-ui

Problem

The original governor used static per-token masks built before generation starts. Phonemes are properties of words, not tokens — enforcing word-level constraints at the token level is a category error. BPE tokens are arbitrary character chunks with no inherent phonological meaning.

This isn't a data coverage problem. It's a paradigm problem.

Solution

Two-layer enforcement using HuggingFace's native model.generate():

  1. Reranker (per-step) — a LogitsProcessor that checks top-k candidate tokens against word-level constraints using G2P, boosting compliant tokens and penalizing non-compliant ones. Steers the model proactively.

  2. GUARD-style retry (post-hoc) — after generation, check every word with G2P. If violations are found, add the violating words to bad_words_ids and regenerate. Catches anything the reranker missed.

This preserves native model quality (proper KV cache, sampling, stopping) while adding constraint enforcement.

Prior Art

  • GUARD (ICLR 2025) — rejection sampling with optimized proposals. Our reranker is the proposal optimizer; the post-hoc check is the rejection step.
  • Grammar-Aligned Decoding (NeurIPS 2024) — distribution preservation when masking. Informed our reranker's boost/penalize approach over hard masking.
  • IterGen (ICLR 2025) — backtracking with KV cache preservation. Informed our custom loop design (retained for future use).

To our knowledge, no prior work applies constrained decoding to clinical phonology.

Architecture

G2P Integration

g2p_en converts words to ARPAbet, then phonolex_data.mappings.load_arpa_to_ipa() converts to IPA. Everything speaks IPA: constraints arrive from the frontend as IPA, G2P output is IPA, and comparison is direct set intersection. No mapping tables; the approach works for any phoneme.

G2P conversion costs ~0.06 ms per word. A per-generation cache (G2PCache) avoids redundant lookups.
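A minimal sketch of what a per-generation cache like this could look like. The ARPAbet-to-IPA table here is a tiny illustrative subset, and the injected g2p_fn stands in for g2p_en; the real code would use load_arpa_to_ipa() for the full mapping.

```python
# Illustrative subset of the ARPAbet -> IPA mapping; the real table comes
# from phonolex_data.mappings.load_arpa_to_ipa().
_ARPA_TO_IPA = {"R": "ɹ", "ER0": "ɚ", "ER1": "ɝ", "K": "k", "AE1": "æ", "T": "t"}

class G2PCache:
    """Per-generation memo of word -> IPA phoneme set (sketch)."""

    def __init__(self, g2p_fn):
        self._g2p_fn = g2p_fn  # word -> list of ARPAbet symbols (e.g. g2p_en)
        self._cache = {}

    def phonemes(self, word):
        word = word.lower()
        if word not in self._cache:
            arpa = self._g2p_fn(word)
            # Convert to IPA once; later constraint checks are set operations.
            self._cache[word] = {_ARPA_TO_IPA.get(p, p) for p in arpa}
        return self._cache[word]
```

Caching at word level (rather than token level) is what keeps the 0.06 ms/word cost from being paid repeatedly when the same word recurs across decoding steps and retries.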

Reranker

Portable engine that takes any word-level criterion function and applies it to the model's top-k candidates at each decoding step. Plugs into HuggingFace's LogitsProcessor interface.

# Phoneme exclusion
criterion = lambda w: not any(p in word_to_phonemes(w) for p in excluded)

# AoA bound (future)
criterion = lambda w: norms.get(w, {}).get("aoa", 0) <= max_aoa

# Any word-level check
reranker = Reranker(criterion, tokenizer, g2p_cache)
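The core reranking step can be sketched without torch. This simplified version takes scores as a dict rather than a tensor, and decode_word is a hypothetical helper that maps a candidate token id to the word it would complete (None if the token doesn't finish a word yet); inside the real LogitsProcessor these would be a logits tensor and tokenizer-based decoding.

```python
def rerank(logits, decode_word, criterion, k=50, boost=2.0, penalty=4.0):
    """Boost/penalize the k highest-scoring candidate tokens (sketch).

    logits: token_id -> score (simplified stand-in for a logits tensor).
    decode_word: token_id -> completed word, or None mid-word (assumption).
    criterion: word -> bool, the pluggable word-level constraint.
    """
    top = sorted(logits, key=logits.get, reverse=True)[:k]
    out = dict(logits)
    for tok in top:
        word = decode_word(tok)
        if word is None:         # token doesn't complete a word yet
            continue
        if criterion(word):
            out[tok] += boost    # steer toward compliant continuations
        else:
            out[tok] -= penalty  # soft penalty, not a hard mask (GAD-style)
    return out
```

Using additive boosts/penalties rather than setting non-compliant logits to -inf follows the Grammar-Aligned Decoding insight above: it preserves more of the model's distribution and leaves the post-hoc check to catch hard violations.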

Word Checker

Orchestrates constraint checks on a completed word. Takes (word, CheckerConfig, G2PCache) and returns a CheckResult with pass/fail and violation details. Supports: phoneme exclusion, MSH stage, bounds (AoA, concreteness, etc.), vocab restriction.
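A sketch of the checker's shape, under assumed field names. Here phonemes_of stands in for the G2PCache lookup and norms for a word-level norm table; only the phoneme-exclusion and bounds checks are shown.

```python
from dataclasses import dataclass, field

@dataclass
class CheckerConfig:
    excluded_phonemes: set = field(default_factory=set)  # IPA symbols
    bounds: dict = field(default_factory=dict)           # e.g. {"aoa": 6.0}

@dataclass
class CheckResult:
    ok: bool
    violations: list  # (constraint_name, detail) pairs

def check_word(word, config, phonemes_of, norms=None):
    """Run configured checks on one completed word (sketch)."""
    violations = []
    hit = phonemes_of(word) & config.excluded_phonemes
    if hit:
        violations.append(("phoneme", sorted(hit)))
    for key, limit in config.bounds.items():
        value = (norms or {}).get(word, {}).get(key)
        if value is not None and value > limit:
            violations.append((key, value))
    return CheckResult(ok=not violations, violations=violations)
```

Because every check reduces to a predicate over one word, adding a new constraint type (MSH stage, concreteness, vocab restriction) is just another branch here, with no changes to the generation loop.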

Generation Flow

1. Encode prompt (chat template for -it models, raw for base)
2. model.generate() with:
   - RerankerProcessor (per-step constraint steering)
   - PunctuationBoostProcessor (sentence structure)
   - repetition_penalty=1.2
   - temperature=0.7, top_p=0.9, top_k=50
   - bad_words_ids (accumulated from retries)
3. Truncate to last complete sentence
4. G2P word-level compliance check
5. If violations: ban violating words, retry (up to max_retries)
6. Return best compliant text
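Steps 4-6 can be sketched as a small retry loop. generate_fn and check_words are stand-ins: the real generate_fn is model.generate() honoring the accumulated bad_words_ids, and check_words is the G2P compliance pass over the truncated output.

```python
def generate_with_retry(generate_fn, check_words, max_retries=3):
    """GUARD-style post-hoc loop: generate, check, ban violators, retry.

    generate_fn(banned) -> text, respecting the banned-word set
    (stands in for model.generate with bad_words_ids).
    check_words(text) -> list of violating words from the G2P check.
    """
    banned = set()
    best = None
    for _ in range(max_retries + 1):
        text = generate_fn(banned)
        violations = check_words(text)
        if not violations:
            return text                # fully compliant: done
        if best is None:
            best = text                # keep first draft as fallback
        banned |= set(violations)      # accumulate bad_words_ids for retry
    return best
```

Banning at the word level (then expanding to token ids for bad_words_ids) is what makes each retry strictly more informed than the last: a word that slipped past the reranker once cannot reappear.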

Package Structure

phonolex_governors/
├── checking/
│   ├── g2p.py           # G2P wrapper, returns IPA, cached
│   ├── phonology.py     # Exclude, MSH, clusters (IPA)
│   └── checker.py       # Word-level orchestrator
├── generation/
│   ├── loop.py          # Custom autoregressive loop (retained for future)
│   ├── backtrack.py     # Backtrack state + intervention planning
│   ├── sampling.py      # Top-k/p, repetition penalty, n-gram blocking
│   └── reranker.py      # Portable reranking engine
├── boosts/              # Relocated soft mechanisms
├── constraint_types.py  # Declarative constraint definitions
└── constraint_compiler.py # Compile to CheckerConfig + BoostConfig

Model

Current: google/t5gemma-9b-2b-ul2-it — Google's instruction-tuned T5Gemma (SFT+RLHF). 9B encoder, 2B decoder, bfloat16 on MPS (~24.6GB). Uses chat template via apply_chat_template().

Future: T5Gemma will track Gemma releases (Gemma 3, Gemma 4). Instruction-tuned variants will improve prose quality without any governor changes. Custom LoRA fine-tuning on clinical data remains an option for the base models.

Results

  • Zero /ɹ/ + /ɝ/ + /ɚ/ violations across multiple generations
  • ~10s per generation (128 tokens on MPS)
  • Natural, creative prose: "Clementine longed to be an outlaw in the Wild West town of Dusty Gulch"
  • Works for any IPA phoneme combination — no special cases
  • Reranker checks ~24K tokens/generation, boosts ~15K, penalizes ~9K

What's Next

  • Wire remaining constraint types (AoA bounds, MSH, complexity, vocab) into the word checker — the infrastructure supports them; they only need criterion functions
  • Soft boosts (Include, Thematic, VocabBoost) as additional LogitsProcessors
  • Norm/vocab data from PhonoLex database at the word level (replacing per-token governor lookup)
  • Multi-draft generation (infrastructure built, default 1, configurable)
  • Frontend analysis mode with G2P-based compliance highlighting