
Next Session — v5 Constrained Generation

Where we are

On branch feature/governed-generation-ui. The dynamic governor with the word-aware reranker is working for exclusion and inclusion; the other constraint types are wired but untested.

Tested and working:

  • exclude — zero leaks, word-aware reranker + GUARD retry
  • include — per-phoneme calibrated boost, self-regulating coverage, coverage % UI

Wired but untested (11 remaining):

  1. exclude_clusters
  2. bound (35 filterable norms)
  3. complexity_wcm
  4. complexity_syllables
  5. complexity_shapes
  6. msh
  7. vocab_boost
  8. vocab_only
  9. boost_minpair
  10. boost_maxopp
  11. thematic

Clinical value assessment — what to prioritize

Tier 1: Essential for SLPs

  • exclude — "No /ɹ/ for this client." Core use case. Done.
  • include + coverage % — "Practice /b/ at 20%." Done.
  • msh — Motor Speech Hierarchy. "Only stage 2-3 sounds." Clinicians use this directly.
  • bound: aoa — "Words a 5-year-old would know." Very common clinical target.
  • complexity_syllables — "Only 1-2 syllable words." Standard clinical target.

Tier 2: Useful but secondary

  • exclude_clusters — "Allow /s/ in singletons but not clusters." Some clinicians want this.
  • complexity_wcm — More nuanced than syllable count. Researchers and advanced clinicians.
  • bound: concreteness — Concrete words are easier to visualize/teach.
  • vocab_only — Restrict to specific word lists (Ogden basic, GSL). ELL contexts.
  • thematic — "Words about animals." Themed therapy sessions.

Tier 3: Questionable — may not produce useful output

  • complexity_shapes — CV/CVC/CCVC. Very granular. May over-constrain and produce garbage.
  • vocab_boost — Soft targeting of word lists. Overlaps with thematic and include.
  • boost_minpair — Minimal pairs are a lookup/selection tool, not a generation constraint.
  • boost_maxopp — Same issue. Maximal opposition is a contrastive therapy approach.
  • bound: frequency/log_frequency — Raw frequency too opaque and restrictive.
  • bound: sensorimotor norms — Very niche, sparse data coverage.

Unified phoneme targeting: coverage 0% = exclude

Collapse exclude + include into a single "Phoneme Targeting" section. Each phoneme gets a coverage slider (0-50%). Coverage 0% routes to the hard exclude path (reranker penalty + GUARD, zero tolerance). Coverage >0% routes to the soft include boost. One mental model, one section, one slider per phoneme.
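
The routing rule can be sketched as follows. This is an illustrative sketch, not the actual reranker/checker API — the type and function names (`PhonemeConstraint`, `route_constraint`) are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class PhonemeConstraint:
    phoneme: str       # e.g. "ɹ"
    coverage: float    # slider value, 0.0-0.5

def route_constraint(c: PhonemeConstraint) -> dict:
    """Coverage 0% routes to the hard exclude path; >0% to the soft boost."""
    if c.coverage == 0.0:
        # Hard block: reranker penalty + GUARD retry, zero tolerance.
        return {"mode": "exclude", "phoneme": c.phoneme}
    # Soft include boost with self-regulating coverage toward the target %.
    return {"mode": "include", "phoneme": c.phoneme, "target": c.coverage}
```

One slider per phoneme means the UI never has to present exclude and include as separate concepts; the backend decides the path from the value alone.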

UI redesign: organized constraint categories

Reorganize the constraint UI into clear categories:

  1. Phoneme Targeting — unified section, two modes:
     • Exclude mode (coverage 0% = hard block)
     • Include mode (coverage 5-50% = soft boost with self-regulating coverage)
  2. Complexity — four controls in one section:
     • Max syllable count
     • Max WCM
     • Allowed syllable shapes
     • MSH stage
  3. Psycholinguistic Bounds — curated to norms that don't break function words:
     • Safe for generation: familiarity, frequency, imageability, semantic_diversity, socialness, prevalence
     • Use with care (some function word failures): concreteness, AoA (Glasgow not Kuperman), valence, arousal, dominance
     • REMOVE from generation UI: all phonotactic probs, elp_lexical_decision_rt, aoa_kuperman, BoI, sensorimotor norms, phoneme_count, wcm_score (handled by Complexity section)
     • All norms remain available in analysis/lookup tools
  4. Themed Vocabulary — semantic fields + word lists, composable:
     • Seed words define the semantic field (USF associations)
     • Word lists constrain the pool (Ogden, AVL, GSL, etc.)
     • VocabOnly mode (hard restrict to list + stop words + punctuation always)
     • VocabBoost mode (soft encourage)
     • AVL supported out of the box
     • e.g. /theme animals ogden_basic = animal words from Ogden list
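
The VocabOnly membership rule above ("hard restrict to list + stop words + punctuation always") can be sketched like this; the function name and signature are illustrative, not the checker's actual API:

```python
import string

def vocab_only_allows(word: str, word_list: set[str], stop_words: set[str]) -> bool:
    """Hard-restrict mode: word-list members, stop words, and punctuation
    always pass; everything else is rejected."""
    if word and all(ch in string.punctuation for ch in word):
        return True  # punctuation always allowed
    w = word.lower()
    return w in word_list or w in stop_words
```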

Adjustable output length / complexity level — a low AoA bound should produce shorter sentences and simpler structure. Auto-adjust max_new_tokens and punctuation-boost aggressiveness based on the active constraints, or expose user-settable "reading level" bins.
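
A minimal sketch of the "reading level" bin idea — bin edges and parameter values here are placeholders, not tuned values:

```python
from typing import Optional

def reading_level_params(max_aoa: Optional[float]) -> dict:
    """Map an active AoA bound onto generation-length and punctuation
    settings. All thresholds below are illustrative."""
    if max_aoa is not None and max_aoa <= 6.0:
        return {"max_new_tokens": 40, "punct_boost_scale": 1.5}
    if max_aoa is not None and max_aoa <= 9.0:
        return {"max_new_tokens": 60, "punct_boost_scale": 1.2}
    return {"max_new_tokens": 90, "punct_boost_scale": 1.0}
```

Whether this is derived automatically from constraints or exposed as explicit bins is the open design question.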

Constraint sliders should use a percentile scale for opaque metrics (e.g. 25th-75th percentile frequency) instead of raw values (e.g. 0.5-50). Raw values remain available in analysis/expert mode. All _percentile columns are already in D1.
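
Mapping a percentile slider back to raw bounds is a simple lookup once the column is sorted; a sketch (the real _percentile columns in D1 make this precomputed, so this function is only illustrative):

```python
def percentile_bounds(sorted_raw: list[float], lo_pct: float, hi_pct: float) -> tuple[float, float]:
    """Convert a percentile range (e.g. 25-75) into raw metric bounds
    by indexing into a sorted column of raw values."""
    n = len(sorted_raw)
    lo = sorted_raw[min(n - 1, int(lo_pct / 100 * n))]
    hi = sorted_raw[min(n - 1, int(hi_pct / 100 * n))]
    return lo, hi
```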

Power-user features via slash commands in the prompt field. The command parser/registry lives in the git history of the deleted dashboard frontend — port it to packages/web/frontend/src/.

Spec: docs/superpowers/specs/2026-03-16-governed-chat-command-language-design.md
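
The command-language shape (see the /theme example above) could look like the following registry sketch. This is a hypothetical illustration, not the parser from git history — names (`COMMANDS`, `parse_prompt`) and the args-end-at-next-slash rule are assumptions; the real grammar is in the spec:

```python
COMMANDS: dict = {}

def command(name: str):
    """Register a slash command handler under its name."""
    def register(fn):
        COMMANDS[name] = fn
        return fn
    return register

@command("theme")
def theme_cmd(args: list[str]) -> dict:
    # e.g. /theme animals ogden_basic -> thematic constraint + word list
    return {"constraint": "thematic", "args": args}

def parse_prompt(prompt: str) -> tuple[list[dict], str]:
    """Pull leading /commands off the prompt; the remainder is free text.
    Each command consumes tokens until the next /command."""
    constraints, words = [], prompt.split()
    while words and words[0].startswith("/"):
        name = words.pop(0)[1:]
        args = []
        while words and not words[0].startswith("/"):
            args.append(words.pop(0))
        if name in COMMANDS:
            constraints.append(COMMANDS[name](args))
    return constraints, " ".join(words)
```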

Key files

  • packages/governors/src/phonolex_governors/generation/reranker.py — word-aware reranker with partial word reconstruction, self-regulating include coverage
  • packages/governors/src/phonolex_governors/checking/checker.py — all check types (exclude, MSH, bounds, complexity, vocab-only, clusters)
  • packages/generation/server/model.py — generate_with_checking(), penalty escalation, punctuation boost
  • packages/generation/server/governor.py — build_checker_config(), build_boost_processor()
  • packages/generation/server/word_norms.py — word-level norms (105K), vocab memberships, frequency-weighted phoneme natural rates
  • packages/generation/server/routes/generate.py — /generate-single with include coverage stats
  • packages/web/frontend/src/components/tools/GovernedGenerationTool/OutputCard.tsx — compliance + include highlighting
  • packages/web/frontend/src/components/tools/GovernedGenerationTool/PhonemeConstraints.tsx — coverage % slider (no strength)

Key tuning values

  • Penalty schedule: [15, 30, 60, 100, 100]
  • Include boost: min(2.5 * sqrt(ln(target/natural)), 10.0) per-phoneme, SUBTLEX frequency-weighted natural rates
  • Punctuation boost: 2.0 baseline + 2.0/word over 12 words
  • Temperature: 0.6, repetition_penalty: 1.3, top_k: 50, top_p: 0.9
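
The two boost formulas above transcribe directly to code; the only addition is a guard when the target rate is at or below the natural rate (an assumption — the source formula is undefined there because ln goes negative):

```python
import math

def include_boost(target: float, natural: float) -> float:
    """Per-phoneme include boost: min(2.5 * sqrt(ln(target/natural)), 10.0),
    with SUBTLEX frequency-weighted natural rates supplied by the caller."""
    if target <= natural:
        return 0.0  # assumption: no boost needed at/below the natural rate
    return min(2.5 * math.sqrt(math.log(target / natural)), 10.0)

def punctuation_boost(words_emitted: int) -> float:
    """2.0 baseline + 2.0 per word beyond 12 words."""
    return 2.0 + 2.0 * max(0, words_emitted - 12)
```

For example, targeting 20% coverage of a phoneme whose natural rate is 2% gives a boost of about 3.79, comfortably under the 10.0 cap.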

How to start the servers

# Backend (FastAPI + T5Gemma, ~40s cold start)
cd packages/generation
nohup uv run uvicorn server.main:app --host 0.0.0.0 --port 8000 --log-level debug > /tmp/phonolex-backend.log 2>&1 &

# Frontend (React, instant)
cd packages/web/frontend
npm run dev

What NOT to do

READ THE EXISTING CODE BEFORE TOUCHING ANYTHING.

  • The UI is packages/web/frontend/. There is no dashboard frontend. Do not create one.
  • DO NOT reimplement algorithms that already exist. The reranker, checker, word norms, boost calibration, coverage tracking — all exist and work. Read reranker.py, checker.py, word_norms.py, model.py before writing anything.
  • DO NOT create new packages or directories unless explicitly asked. The code lives where it lives.
  • Don't boost compliant word-start tokens in the reranker — causes fragmentation by biasing toward new words over continuations
  • Don't use dictionary phoneme rates for boost calibration — use SUBTLEX frequency-weighted rates
  • Don't rebuild the governor lookup for phoneme enforcement — G2P handles that now
  • Don't use static per-token masks — the dynamic paradigm replaces them