# Next Session — v5 Constrained Generation

## Where we are

Branch `feature/governed-generation-ui`. Dynamic governor with word-aware reranker working for exclusion and inclusion. Other constraint types are wired but untested.

Tested and working:

- exclude — zero leaks, word-aware reranker + GUARD retry
- include — per-phoneme calibrated boost, self-regulating coverage, coverage % UI

Wired but untested (11 remaining):

1. exclude_clusters
2. bound (35 filterable norms)
3. complexity_wcm
4. complexity_syllables
5. complexity_shapes
6. msh
7. vocab_boost
8. vocab_only
9. boost_minpair
10. boost_maxopp
11. thematic
## Clinical value assessment — what to prioritize

### Tier 1: Essential for SLPs
- exclude — "No /ɹ/ for this client." Core use case. Done.
- include + coverage % — "Practice /b/ at 20%." Done.
- msh — Motor Speech Hierarchy. "Only stage 2-3 sounds." Clinicians use this directly.
- bound: aoa — "Words a 5-year-old would know." Very common clinical target.
- complexity_syllables — "Only 1-2 syllable words." Standard clinical target.
### Tier 2: Useful but secondary
- exclude_clusters — "Allow /s/ in singletons but not clusters." Some clinicians want this.
- complexity_wcm — More nuanced than syllable count. Researchers and advanced clinicians.
- bound: concreteness — Concrete words are easier to visualize/teach.
- vocab_only — Restrict to specific word lists (Ogden basic, GSL). ELL contexts.
- thematic — "Words about animals." Themed therapy sessions.
### Tier 3: Questionable — may not produce useful output
- complexity_shapes — CV/CVC/CCVC. Very granular. May over-constrain and produce garbage.
- vocab_boost — Soft targeting of word lists. Overlaps with thematic and include.
- boost_minpair — Minimal pairs are a lookup/selection tool, not a generation constraint.
- boost_maxopp — Same issue. Maximal opposition is a contrastive therapy approach.
- bound: frequency/log_frequency — Raw frequency too opaque and restrictive.
- bound: sensorimotor norms — Very niche, sparse data coverage.
## Unified phoneme targeting: coverage 0% = exclude
Collapse exclude + include into a single "Phoneme Targeting" section. Each phoneme gets a coverage slider (0-50%). Coverage 0% routes to the hard exclude path (reranker penalty + GUARD, zero tolerance). Coverage >0% routes to the soft include boost. One mental model, one section, one slider per phoneme.
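A minimal sketch of that routing, assuming one slider value per phoneme; the dataclass and function names are illustrative, not the actual governor API:

```python
from dataclasses import dataclass

@dataclass
class PhonemeConstraint:
    phoneme: str     # IPA symbol, e.g. "ɹ"
    coverage: float  # slider value in [0.0, 0.5]

def route_constraints(constraints):
    """Split slider values into the hard-exclude and soft-include paths."""
    exclude, include = [], {}
    for c in constraints:
        if c.coverage == 0.0:
            exclude.append(c.phoneme)        # reranker penalty + GUARD retry
        else:
            include[c.phoneme] = c.coverage  # per-phoneme calibrated boost
    return exclude, include

exclude, include = route_constraints([
    PhonemeConstraint("ɹ", 0.0),
    PhonemeConstraint("b", 0.2),
])
# exclude == ["ɹ"], include == {"b": 0.2}
```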
## UI redesign: organized constraint categories
Reorganize the constraint UI into clear categories:
- Phoneme Targeting — unified section, two modes:
  - Exclude mode (coverage 0% = hard block)
  - Include mode (coverage 5-50% = soft boost with self-regulating coverage)
- Complexity — four controls in one section:
  - Max syllable count
  - Max WCM
  - Allowed syllable shapes
  - MSH stage
- Psycholinguistic Bounds — curated to norms that don't break function words:
  - Safe for generation: familiarity, frequency, imageability, semantic_diversity, socialness, prevalence
  - Use with care (some function word failures): concreteness, AoA (Glasgow, not Kuperman), valence, arousal, dominance
  - REMOVE from generation UI: all phonotactic probs, elp_lexical_decision_rt, aoa_kuperman, BoI, sensorimotor norms, phoneme_count, wcm_score (handled by the Complexity section)
  - All norms remain available in analysis/lookup tools
- Themed Vocabulary — semantic fields + word lists, composable:
  - Seed words define the semantic field (USF associations)
  - Word lists constrain the pool (Ogden, AVL, GSL, etc.)
  - VocabOnly mode (hard restrict to list, plus stop words + punctuation always)
  - VocabBoost mode (soft encourage)
  - AVL supported out of the box
  - e.g. `/theme animals ogden_basic` = animal words from the Ogden list
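The composable pool logic could be sketched as follows, assuming set-valued lookups for USF associates and word-list membership; all names here are hypothetical stand-ins, not the real API:

```python
# Placeholder stop-word set; the real set lives with the governor config.
STOP_WORDS = {"the", "a", "is", "and", "of"}

def build_pool(theme_associates: set, word_list: set, hard: bool = True) -> set:
    """Compose a themed, list-constrained vocabulary pool.

    theme_associates: words in the semantic field (e.g. USF associates of seeds)
    word_list: membership set for a list such as Ogden, AVL, or GSL
    hard: True = VocabOnly (restrict), False = VocabBoost (soft encourage)
    """
    themed = theme_associates & word_list  # intersect field with the list
    if hard:
        # VocabOnly: the pool is the intersection plus always-allowed tokens
        # (punctuation handling omitted from this sketch).
        return themed | STOP_WORDS
    return themed  # VocabBoost: the set of words to softly encourage

pool = build_pool({"dog", "cat", "tree"}, {"dog", "cat", "run"})
# pool contains "dog" and "cat" plus the stop words
```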
Adjustable output length / complexity level — low-AoA constraints should yield shorter sentences with simpler structure. Either auto-adjust `max_new_tokens` and punctuation-boost aggressiveness based on the active constraints, or expose user-settable "reading level" bins.
Constraint sliders should use a percentile scale for opaque metrics only (e.g. 25th-75th percentile frequency) instead of raw values (0.5-50). Raw values stay available in analysis/expert mode. All `_percentile` columns are already in D1.
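A sketch of the percentile-to-raw mapping, assuming a sorted column of raw norm values; with the precomputed `_percentile` columns in D1 this becomes a direct lookup rather than an index computation:

```python
def percentile_to_raw(sorted_values: list, pct: float):
    """Map a slider percentile in [0, 100] to the raw value at that rank."""
    idx = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[idx]

# Toy frequency column (raw values); a 25th-75th percentile slider maps
# to raw bounds without the user ever seeing the opaque raw scale.
freqs = [0.5, 1.2, 3.0, 8.0, 50.0]
lo, hi = percentile_to_raw(freqs, 25), percentile_to_raw(freqs, 75)
# lo == 1.2, hi == 8.0
```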
Power-user features via slash commands in the prompt field. The command parser/registry is in the git history of the deleted dashboard frontend — port it to packages/web/frontend/src/.
Spec: docs/superpowers/specs/2026-03-16-governed-chat-command-language-design.md
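The real parser/registry is in git history and the spec above; as a placeholder, a minimal command registry with fixed arities might look like this (names and arities are assumptions, and the `/theme` example follows the usage shown earlier):

```python
REGISTRY = {}  # command name -> (arity, handler)

def command(name: str, arity: int):
    """Decorator registering a slash-command handler with a fixed arg count."""
    def register(fn):
        REGISTRY[name] = (arity, fn)
        return fn
    return register

@command("theme", 2)
def theme(field, word_list):
    return {"theme": field, "word_list": word_list}

def parse(prompt: str):
    """Strip leading /commands from the prompt; return (clean_prompt, opts)."""
    tokens = prompt.split()
    opts, i = {}, 0
    while i < len(tokens) and tokens[i].startswith("/"):
        name = tokens[i][1:]
        if name not in REGISTRY:
            break
        arity, fn = REGISTRY[name]
        opts.update(fn(*tokens[i + 1 : i + 1 + arity]))
        i += 1 + arity
    return " ".join(tokens[i:]), opts

clean, opts = parse("/theme animals ogden_basic tell a short story")
# clean == "tell a short story"
# opts == {"theme": "animals", "word_list": "ogden_basic"}
```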
## Key files

- `packages/governors/src/phonolex_governors/generation/reranker.py` — word-aware reranker with partial word reconstruction, self-regulating include coverage
- `packages/governors/src/phonolex_governors/checking/checker.py` — all check types (exclude, MSH, bounds, complexity, vocab-only, clusters)
- `packages/generation/server/model.py` — `generate_with_checking()`, penalty escalation, punctuation boost
- `packages/generation/server/governor.py` — `build_checker_config()`, `build_boost_processor()`
- `packages/generation/server/word_norms.py` — word-level norms (105K), vocab memberships, frequency-weighted phoneme natural rates
- `packages/generation/server/routes/generate.py` — `/generate-single` with include coverage stats
- `packages/web/frontend/src/components/tools/GovernedGenerationTool/OutputCard.tsx` — compliance + include highlighting
- `packages/web/frontend/src/components/tools/GovernedGenerationTool/PhonemeConstraints.tsx` — coverage % slider (no strength)
## Key tuning values

- Penalty schedule: `[15, 30, 60, 100, 100]`
- Include boost: `min(2.5 * sqrt(ln(target/natural)), 10.0)` per-phoneme, SUBTLEX frequency-weighted natural rates
- Punctuation boost: 2.0 baseline + 2.0/word over 12 words
- Temperature: 0.6, repetition_penalty: 1.3, top_k: 50, top_p: 0.9
## How to start the servers

```bash
# Backend (FastAPI + T5Gemma, ~40s cold start)
cd packages/generation
nohup uv run uvicorn server.main:app --host 0.0.0.0 --port 8000 --log-level debug > /tmp/phonolex-backend.log 2>&1 &

# Frontend (React, instant)
cd packages/web/frontend
npm run dev
```
## What NOT to do

READ THE EXISTING CODE BEFORE TOUCHING ANYTHING.

- The UI is `packages/web/frontend/`. There is no dashboard frontend. Do not create one.
- DO NOT reimplement algorithms that already exist. The reranker, checker, word norms, boost calibration, coverage tracking — all exist and work. Read `reranker.py`, `checker.py`, `word_norms.py`, `model.py` before writing anything.
- DO NOT create new packages or directories unless explicitly asked. The code lives where it lives.
- Don't boost compliant word-start tokens in the reranker — causes fragmentation by biasing toward new words over continuations
- Don't use dictionary phoneme rates for boost calibration — use SUBTLEX frequency-weighted rates
- Don't rebuild the governor lookup for phoneme enforcement — G2P handles that now
- Don't use static per-token masks — the dynamic paradigm replaces them