Unified Constrained Generation Architecture

Date: 2026-04-15
Status: Draft
Supersedes: Token-level governor architecture (HardGate, CDD, CoverageMechanism), session-based generation route

Summary

Everything is a word list. Two operations: BAN and BOOST.

Constraints are set in the frontend, sent as semantic types to the generation server, resolved to word lists via the PhonoLex Workers API, and applied through a four-layer enforcement pipeline: BAN/BOOST → GUARD → targeted rollout → best-of-N selection. The phonolex_governors package is stripped to its A-team (checker, reranker, lookahead). The static token-level governor (HardGate, LogitBoost, CDDProjection, CoverageMechanism) is removed entirely.

Research grounding: FUDGE (Yang & Klein, 2021) for prefix-level dead-end scoring. NeuroLogic A*esque Decoding (Lu et al., 2022) for targeted rollout escalation.

Motivation

The original governor architecture compiled declarative constraints into static per-token masks (HardGate) and logit biases (LogitBoost, CDDProjection) at startup. This was replaced in generate-single by a dynamic word-level system (Reranker + GUARD) that checks words via G2P at every decode step. The dynamic system works but has two problems:

  1. Performance: The Reranker runs G2P on ~200 candidate tokens per step (up to 25K G2P calls per draft). This is the hot path.
  2. Failure mode: When GUARD exhausts retries (4 attempts), it gives up and returns non-compliant output. There is no escalation — the same mechanisms that failed keep being applied harder.

The unified architecture solves both: word list pre-resolution eliminates per-step G2P, and the targeted rollout provides genuine escalation when the base mechanisms fail.

Unified Constraint Model

Every constraint resolves to a word list. The generation server calls the PhonoLex Workers API to generate the list, then classifies it as BAN or BOOST.

BAN

All enforcement flows through a vocabulary trie — a single trie built from the full PhonoLex vocabulary (~126K words) at server startup (172ms). Each constraint set tags nodes with banned/total word counts (42ms per re-tag). The Reranker walks the trie for O(prefix-length) lookups instead of O(list-size) scans.

BAN: The word list tags trie nodes as banned. The Reranker computes a dead-end ratio for each candidate's prefix: banned_below / total_below. A prefix like "dre" with 94/94 = 100% banned is a hard dead end; "cat" with 20/172 = 11.6% gets a proportional soft penalty. This replaces both bad_words_ids (which choked on 40K entries) and the bigram dead-end filter (which was a probabilistic approximation). The trie gives exact dead-end ratios in 0.4μs per check.

GUARD-caught violations are added to bad_words_ids for retry (small, incremental list — fast).

BOOST

The word list contains words to encourage. Passed to the Reranker with a coverage target. The Reranker tracks running coverage (what fraction of generated words are in the boost set) and modulates logit adjustments to converge on the target rate. Self-regulating: boost strength increases when below target, eases off when at or above.

Constraint Resolution Table

Constraint                        Mode               Workers API call                 Returns
──────────                        ────               ────────────────                 ───────
Phoneme exclude /ɹ/               BAN (direct)       { include_phonemes: ["ɹ"] }      Words containing /ɹ/
Bound (AoA max 5)                 BAN (complement)   { max_aoa_kuperman: 5 }          Words with AoA ≤ 5
Phoneme include /k/ 20%           BOOST              { include_phonemes: ["k"] }      Words containing /k/
Bound boost (concreteness ≥ 3)    BOOST              { min_concreteness: 3 }          Words with concreteness ≥ 3
Contrastive (s/z initial)         BOOST              /api/contrastive/minimal-pairs   Minimal pair word list

Enforcement Layers

Four layers: BAN/BOOST, GUARD, targeted rollout, and best-of-N selection. Each activates when the previous is insufficient.

Layer 1: BAN + BOOST (pre-generation)

Before model.generate() is called:

  • BAN word lists tag the vocabulary trie. The Reranker uses trie prefix walks to compute dead-end ratios per candidate.
  • BOOST word lists are passed to the Reranker with their coverage targets.
  • GUARD-caught violations from retries are added to bad_words_ids (small incremental list).

The Reranker runs as a LogitsProcessor during generation. For each top-k candidate, it reconstructs the partial word and walks the trie. Dead-end prefixes (high banned/total ratio) are penalized proportionally; candidates in a boost set get coverage-modulated boosts.

Multiple boost lists compose: each has its own coverage counter and target. The Reranker iterates the boost lists and applies the appropriate adjustment per candidate.

No G2P at this layer. All checks are trie prefix walks — O(prefix length) per candidate, 0.4μs each, 25K checks per draft in 9.6ms total.

Layer 2: GUARD (post-generation)

After a draft is complete, every word is checked via check_word() with G2P. This is the correctness guarantee. If violations are found:

  • Violating words are added to bad_words_ids (hard ban for next attempt).
  • Reranker penalty is escalated.
  • Draft is regenerated.
  • Up to N retries (configurable, default 3).

GUARD handles edges that Layer 1 cannot: multi-token words where only the sequence completion is banned (prefix tokens pass through), OOV words not in any word list, words absent from norms data (fail-closed).
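The retry loop above can be sketched as follows. This is a minimal sketch, not the server's implementation: `generate(banned, penalty)` and `check_word(word)` are stand-ins for the real draft generator and the G2P checker.

```python
def guard_loop(generate, check_word, max_retries=3):
    """Layer 2 sketch: regenerate until every word passes check_word.

    `generate(banned, penalty)` and `check_word(word)` are stand-ins for
    the real draft generator and the G2P-backed checker.
    """
    banned: set[str] = set()
    penalty = 1.0
    draft = generate(banned, penalty)
    for _ in range(max_retries):
        violations = [w for w in draft.split() if not check_word(w)]
        if not violations:
            return draft, True
        banned.update(violations)   # hard-ban violating words next attempt
        penalty *= 2.0              # escalate Reranker penalty
        draft = generate(banned, penalty)
    violations = [w for w in draft.split() if not check_word(w)]
    return draft, not violations    # may be non-compliant: Layer 3 escalates
```

On exhausted retries the loop returns a possibly non-compliant draft, which is what triggers Layer 3.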

Layer 3: Targeted Rollout (escalation)

Activated when GUARD has retried M times without producing a compliant draft. Based on NeuroLogic A*esque Decoding (Lu et al., 2022).

The trie-based Reranker (Layer 1) already handles static dead-end detection. The targeted rollout adds context-aware lookahead for cases where the trie alone isn't sufficient:

  1. Take the top-k candidates (post-Reranker).
  2. For each candidate, append it to the current sequence and run 2-3 greedy forward passes — a short rollout.
  3. Decode the rollout tokens, reconstruct the word(s) being formed.
  4. Check the reconstructed words against the trie (banned prefix/word check).
  5. Penalize candidates whose rollouts produce violations.

This is context-aware — it uses the model's actual predictions given the current sequence, not the static trie structure. The trie catches structural dead ends; the rollout catches contextual ones.

Best-of-N Selection

Wraps all layers. Generate N drafts through the full pipeline. Each draft is scored by compliance + quality (sentence count, length, word uniqueness). The best compliant draft is returned.

If all drafts fail: return the best draft with full compliance annotations. The clinician sees exactly what leaked and why — better than garbled output from over-constraining.
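The selection rule can be sketched as below, assuming each draft carries a violation count and a scalar quality score (field names are illustrative, not the server's schema):

```python
def select_best(drafts):
    """Best-of-N sketch: prefer compliant drafts, then higher quality.

    Each draft is a dict with `violations` (int) and `quality` (float,
    e.g. combining sentence count, length, word uniqueness).
    """
    compliant = [d for d in drafts if d["violations"] == 0]
    pool = compliant or drafts  # all failed: return best-annotated draft
    return max(pool, key=lambda d: (-d["violations"], d["quality"]))
```

When no draft is compliant, the draft with the fewest violations wins, matching the "best draft with full compliance annotations" fallback.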

The Reranker's New Role

The Reranker simplifies from a general-purpose constraint enforcer to two focused responsibilities, both powered by a vocabulary trie:

1. Trie-based dead-end detection. For each top-k candidate, the Reranker reconstructs the partial word and walks the vocabulary trie. The trie node's banned_below / total_below ratio gives an exact dead-end score. Candidates with high ratios (e.g., "dre" → 100% banned for /ɹ/ exclusion) are penalized proportionally. This replaces both bad_words_ids (which choked on 40K-entry ban lists) and the bigram dead-end filter (which was a probabilistic approximation). The trie gives exact answers in 0.4μs per check.

2. Boost coverage modulation. For BOOST word lists, the Reranker tracks running coverage (fraction of generated words in the boost set) against the target rate. Below target → boost candidates in the set. At or above target → ease off. Self-regulating.
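The modulation above can be sketched with an illustrative formula (not the shipped one): boost strength scales with the gap between target and running coverage, and drops to zero once the target is met.

```python
def boost_adjustment(words_generated, hits, target, max_boost=3.0):
    """Coverage-modulated boost sketch (illustrative formula).

    words_generated: words emitted so far in the draft
    hits: how many of those are in the boost set
    target: desired coverage rate (0.0-1.0)
    """
    coverage = hits / words_generated if words_generated else 0.0
    gap = target - coverage
    if gap <= 0:
        return 0.0  # at or above target: ease off entirely
    return max_boost * min(1.0, gap / target)  # proportional to the deficit
```

Each boost list keeps its own counter, so multiple lists compose by calling this per list.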

The Reranker no longer needs:

  • G2P (moved to GUARD only)
  • The criterion function (replaced by trie walks)
  • bad_words_ids for large ban lists (replaced by trie dead-end ratios)
  • Bigram transition matrix (replaced by trie — exact, not probabilistic)
  • Per-phoneme natural rate calibration (the word list already encodes phoneme content)
  • Separate code paths for include phonemes vs other boost types

Every boost list is handled identically regardless of origin (phoneme inclusion, norm bounds, contrastive pairs).

phonolex_governors Package

Stripped to the A-team.

Keeps

  • checking/checker.py — check_word(), CheckerConfig, all Check types (PhonemeExcludeCheck, VocabOnlyCheck, etc.). GUARD's engine.
  • checking/g2p.py — G2PCache, word_to_phonemes(). Used by GUARD.
  • checking/phonology.py — check_exclude(), check_msh_stage(), etc. Phonological primitives called by check_word().
  • generation/reranker.py — Rewritten: vocabulary trie dead-end detection + boost coverage modulation, no G2P.
  • generation/trie.py — New. VocabTrie — full vocabulary trie with per-constraint ban/total tagging.
  • generation/lookahead.py — Targeted rollout LogitsProcessor (Layer 3 escalation).

Removed

  • core.py — Governor, GovernorContext, Mechanism. Static governor orchestration.
  • gates.py — HardGate. Replaced by bad_words_ids.
  • boosts.py — LogitBoost. Replaced by Reranker boost lists.
  • cdd.py — CDDProjection, CDDConstraint. Dead.
  • constraints.py — All declarative constraint classes (Exclude, Bound, Complexity, VocabOnly, NormCovered, MSHStage, MinPairBoost, MaxOppositionBoost). Replaced by word list resolution via Workers API.
  • include.py — IncludeConstraint, VocabBoostConstraint, _CoverageMechanism. Coverage tracking moves into the Reranker.
  • lookups.py — PhonoFeatures, Lookup, LookupBuilder. Token-level lookup for the static governor.

Generation Server Changes

schemas.py

The 11 constraint types collapse to 5, matching the frontend's StoreEntry types:

from typing import Literal

from pydantic import BaseModel

class ExcludeConstraint(BaseModel):
    type: Literal["exclude"]
    phonemes: list[str]

class IncludeConstraint(BaseModel):
    type: Literal["include"]
    phonemes: list[str]
    target_rate: float  # 0.0–1.0

class BoundConstraint(BaseModel):
    type: Literal["bound"]
    norm: str
    min: float | None = None
    max: float | None = None

class BoundBoostConstraint(BaseModel):
    type: Literal["bound_boost"]
    norm: str
    min: float | None = None
    max: float | None = None
    coverage_target: float  # 0.0–1.0

class ContrastiveConstraint(BaseModel):
    type: Literal["contrastive"]
    pair_type: Literal["minpair", "maxopp"]
    phoneme1: str
    phoneme2: str
    position: Literal["initial", "medial", "final", "any"]

Constraint = ExcludeConstraint | IncludeConstraint | BoundConstraint | BoundBoostConstraint | ContrastiveConstraint

Internal resolved representation:

class ResolvedConstraint(BaseModel):
    mode: Literal["ban", "boost"]
    words: list[str]
    strategy: Literal["direct", "complement"]  # for ban mode
    coverage_target: float | None = None        # for boost mode
    label: str                                  # for compliance/status reporting
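Resolution for the two BAN strategies in the table can be sketched as follows. `fetch_words` stands in for the POST /api/words/word-list call, and the sketches return plain dicts in the ResolvedConstraint shape rather than the pydantic model:

```python
def resolve_exclude(phonemes, fetch_words):
    """Exclude: fetch words CONTAINING the phonemes, ban them directly."""
    words = fetch_words({"include_phonemes": phonemes})
    return {"mode": "ban", "strategy": "direct", "words": words,
            "coverage_target": None, "label": f"exclude {'/'.join(phonemes)}"}

def resolve_bound(norm, lo, hi, fetch_words, vocab):
    """Bound: fetch COMPLIANT words, ban the complement of the vocabulary."""
    payload = {"filters": {}}
    if lo is not None:
        payload["filters"][f"min_{norm}"] = lo
    if hi is not None:
        payload["filters"][f"max_{norm}"] = hi
    allowed = set(fetch_words(payload))
    return {"mode": "ban", "strategy": "complement",
            "words": sorted(set(vocab) - allowed),
            "coverage_target": None, "label": f"bound {norm}"}
```

The complement strategy is why the bound row in the resolution table returns compliant words: the ban list is everything else in the vocabulary.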

governor.py

Removed: build_checker_config, build_boost_processor, build_governor, GovernorCache, _to_dg_constraint, HFGovernorProcessor.

Replaced by:

  • resolve_constraints() — calls the Workers API, returns list[ResolvedConstraint].
  • prepare_generation() — takes resolved constraints, tags the vocabulary trie, produces Reranker config.

model.py

generate_with_checking() restructured around the four layers:

  1. Prepare BAN/BOOST from resolved constraints.
  2. Generate with bad_words_ids + Reranker.
  3. GUARD check.
  4. Escalate (targeted rollout) if retries are exhausted.
  5. Best-of-N selection.

SSE streaming added: emits status events at each stage.
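The wire framing follows the standard SSE format (an `event:` line, a `data:` line, and a blank-line terminator); a minimal emitter sketch, with illustrative event names:

```python
import json

def sse_event(event, data):
    """Frame one server-sent event (SSE spec field names)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# e.g. sse_event("status", {"message": "Generating draft 1..."})
```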

word_norms.py

No longer needs to load the full norms dictionary at startup for build_checker_config. The Workers API handles norm resolution. Still needed for GUARD's check_word (the checker verifies actual norm values post-hoc). Can become lazy-loaded.

governor_lookup.json

No longer needed. The token-level lookup was built for the static governor. Word lists from the Workers API replace it. The vocabulary trie is built from the full word list at startup.

Workers API: Word List Endpoint

New endpoint optimized for the generation server.

POST /api/words/word-list

Same filter engine as /api/words/search, lighter response shape, no pagination limit.

// Request
{
  include_phonemes?: string[],   // words containing ANY of these phonemes
  exclude_phonemes?: string[],   // words NOT containing these phonemes
  filters?: {                    // norm bounds (same keys as /search)
    min_aoa_kuperman?: number,
    max_aoa_kuperman?: number,
    min_concreteness?: number,
    // ...
  }
}

// Response
{
  words: string[],
  total: number
}

No new query logic — reuses the existing filter/pattern SQL engine with a lighter response projection.
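A small payload-builder sketch for assembling the request body above (the helper itself is hypothetical; keyword names mirror the filter keys):

```python
def word_list_payload(include=None, exclude=None, **bounds):
    """Assemble a /api/words/word-list request body from keyword args."""
    payload = {}
    if include:
        payload["include_phonemes"] = list(include)
    if exclude:
        payload["exclude_phonemes"] = list(exclude)
    if bounds:
        payload["filters"] = dict(bounds)  # e.g. min_concreteness=3
    return payload
```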

Contrastive pairs use existing endpoints (/api/contrastive/minimal-pairs, /api/contrastive/maximal-opposition/word-lists). The generation server extracts word strings from the pair response.

Frontend Changes

constraintCompiler.ts

Updated to handle all 5 StoreEntry types. Currently drops bound_boost and contrastive silently. The compiler's role is unchanged: merge per-entry StoreEntries into API constraint shapes.

governance.ts

The Constraint union type simplifies from 11 types to 5:

type Constraint =
  | { type: "exclude"; phonemes: string[] }
  | { type: "include"; phonemes: string[]; target_rate: number }
  | { type: "bound"; norm: string; min?: number; max?: number }
  | { type: "bound_boost"; norm: string; min?: number; max?: number; coverage_target: number }
  | { type: "contrastive"; pair_type: "minpair" | "maxopp"; phoneme1: string; phoneme2: string; position: string }

generationApi.ts

Switches from a single POST/response exchange to an SSE event stream. Emits status updates to a callback and delivers the final response as the terminal event.

GovernedGenerationTool/index.tsx

Renders a status line below the Generate button during generation. Each SSE status event updates the line. Status clears when the OutputCard appears.

OutputCard.tsx

Coverage display generalizes from include-phoneme-only to all boost constraint types. Same visual pattern: hit words highlighted in blue, coverage percentage reported. Bound boost and contrastive pair coverage stats displayed alongside phoneme inclusion coverage.

Vocabulary Trie

Architecture

A single trie built from the full PhonoLex vocabulary (~126K words) at server startup. Each node stores:

  • children: dict[str, TrieNode] — character-keyed child nodes
  • is_end: bool — whether this node terminates a word
  • banned_below: int — count of banned words in the subtree (re-tagged per constraint set)
  • total_below: int — count of all words in the subtree (static after build)

Lifecycle

  1. Startup (172ms): Build full trie from all PhonoLex words via /api/words/word-list. Structure is static.
  2. Per-request (42ms): Tag nodes with banned_below counts from the current constraint set's ban list.
  3. Per-step (0.4μs/check): Reranker walks the trie for each top-k candidate, computing banned_below / total_below dead-end ratios.

Dead-End Detection

The Reranker reconstructs the partial word for each candidate token and walks the trie:

dead_end_ratio("dre") = node.banned_below / node.total_below = 94/94 = 1.0
dead_end_ratio("cat") = 20/172 = 0.116
dead_end_ratio("str") = 443/445 = 0.996

Penalty is proportional: logits[tid] -= penalty * dead_end_ratio. 100% dead ends get full penalty; 11% gets light penalty.

This replaces both bad_words_ids (which choked at 40K entries) and the bigram transition matrix (which was a probabilistic approximation requiring a corpus build step). The trie is exact, fast, and derived directly from the constraint's word list.
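A minimal sketch of the structure (the real implementation is generation/trie.py; names and details here are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # char -> TrieNode
    is_end: bool = False
    banned_below: int = 0   # re-tagged per constraint set
    total_below: int = 0    # static after build

class VocabTrie:
    def __init__(self, words):
        self.root = TrieNode()
        for w in words:
            node = self.root
            node.total_below += 1
            for ch in w:
                node = node.children.setdefault(ch, TrieNode())
                node.total_below += 1
            node.is_end = True

    def tag_bans(self, banned_words):
        """Re-tag: increment banned_below along each banned word's path."""
        self._clear(self.root)
        for w in banned_words:
            path, node = [self.root], self.root
            for ch in w:
                node = node.children.get(ch)
                if node is None:
                    break
                path.append(node)
            else:
                if node.is_end:
                    for n in path:
                        n.banned_below += 1

    def _clear(self, node):
        node.banned_below = 0
        for child in node.children.values():
            self._clear(child)

    def dead_end_ratio(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return 1.0  # prefix not in vocabulary: treat as dead end
        return node.banned_below / node.total_below
```

Build is O(total characters); re-tagging is O(characters in the ban list); each dead-end check is O(prefix length), matching the lifecycle numbers above.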

Files

  • Trie implementation: packages/governors/src/phonolex_governors/generation/trie.py
  • Built at startup in packages/generation/server/governor.py

Targeted Rollout (Layer 3)

When the trie + GUARD retries are insufficient:

  1. Top-k candidates (post-Reranker).
  2. For each candidate, run 2-3 greedy forward passes through the model.
  3. Decode rollout tokens, reconstruct words.
  4. Check words against the trie (banned prefix/word check).
  5. Penalize candidates whose rollouts produce violations.

Context-aware — uses the model's actual predictions, not the static trie structure.
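The rollout step can be sketched as below. `greedy_next` stands in for one greedy forward pass through the model, and tokens are plain strings for simplicity; the real version operates on token IDs.

```python
def rollout_penalties(seq, candidates, greedy_next, is_banned_word,
                      steps=3, penalty=5.0):
    """Layer 3 sketch: short greedy rollout per candidate, penalize
    candidates whose rollouts complete banned words.

    greedy_next(tokens) -> next token (stand-in for a forward pass)
    is_banned_word(word) -> bool (the trie's banned-word check)
    """
    penalties = {}
    for cand in candidates:
        rollout = list(seq) + [cand]
        for _ in range(steps):  # 2-3 greedy continuation steps
            rollout.append(greedy_next(rollout))
        words = "".join(rollout).split()  # reconstruct words being formed
        bad = any(is_banned_word(w) for w in words[-(steps + 1):])
        penalties[cand] = -penalty if bad else 0.0
    return penalties
```

The returned per-candidate penalties would be added to the logits before sampling the next token.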

Progress Reporting

The generation endpoint emits SSE status events throughout the pipeline:

→ "Resolving constraints..."
→ "Fetching word lists... (3 constraints)"
→ "Generating draft 1..."
→ "Checking compliance..."
→ "Draft 1: 2 violations, retrying..."
→ "Generating draft 1 (attempt 2, stronger penalties)..."
→ "Checking compliance..."
→ "Draft 1: compliant"
→ "Generating draft 2..."
→ "Selecting best draft..."
→ [final response]

Escalation events:

→ "Retries exhausted, activating targeted rollout..."
→ "Generating with rollout lookahead..."

Frontend renders as a status line below the Generate button. Each message overwrites the previous. Clears on OutputCard render.

Architecture Diagram

Frontend                    Generation Server              Workers API
────────                    ─────────────────              ───────────
StoreEntry[] ──compile──→   Semantic constraints ──────→   POST /api/words/word-list
                            ←── word lists (BAN/BOOST) ──  POST /api/contrastive/...

                            ┌─────────────────────────────────────┐
                            │ Vocab Trie (126K words, built once)  │
                            │   tagged per constraint set (42ms)  │
                            ├─────────────────────────────────────┤
                            │ Layer 1: Trie Reranker + BOOST      │
                            │   dead-end ratio per prefix (0.4μs) │
                            │   boost coverage modulation         │
                            │   → generate draft                  │
                        SSE ├─────────────────────────────────────┤
                      status│ Layer 2: GUARD                      │
                     events │   check_word() via G2P              │
                            │   ban violations → bad_words_ids    │
                            ├─────────────────────────────────────┤
                            │ Layer 3: Targeted rollout           │
                            │   2-3 token greedy continuation     │
                            │   check rollout words vs trie       │
                            ├─────────────────────────────────────┤
                            │ Best-of-N: score + select           │
                            └─────────────────────────────────────┘
                                          │
                            SSE final ←───┘
                                          │
←── OutputCard ◄──────────────────────────┘
    compliance highlighting
    coverage stats (all boost types)
    status line during generation