
Governed Generation — Product Plan

2026-03-12 — planning document for monorepo migration, architecture unification, and next phase


The Big Picture

Three repos currently do overlapping work with duplicated utilities and fragmented data handling:

| Repo | What it is | What it does |
|---|---|---|
| PhonoLex | Phonological analysis platform | 5 tools (word lists, text analysis, contrastive sets, similarity search, lookup), 15 datasets, 68K+ cognitive edges, Cloudflare Workers + D1, React + MUI frontend |
| phonolex-governors | Constraint library | Gates/Boosts/Projections, LookupBuilder, dataset loaders, HuggingFace adapter |
| constrained_chat | Governed generation dashboard | FastAPI + React, T5Gemma 9B-2B, chat + generate modes, compliance panel |

The duplication problem

Same data loaded 3+ ways:

  • CMU dict: PhonoLex Python (english_data_loader.py), phonolex-governors (datasets.cmudict_to_phono()), build_lookup_phonolex.py
  • Psycholinguistic norms: PhonoLex export-to-d1.py, phonolex-governors (datasets.load_kuperman() etc.), build_lookup_phonolex.py
  • IPA/ARPAbet conversion: PhonoLex data/mappings/, phonolex-governors internal
  • Syllabification: PhonoLex syllabification.py (full onset-nucleus-coda), phonolex-governors (none — hence WCM is all zeros)
  • Phoneme normalization: PhonoLex Workers normalize.ts, build_lookup_phonolex.py IPA_NORMALIZE

The vision: one thing with a few faces

One unified platform with three faces:

  1. PhonoLex (existing) — interactive word analysis, list building, contrastive sets, similarity search. The datahub.
  2. Governed Generation (existing) — live governed content generation with real-time compliance. The clinical tool.
  3. Content Catalog (new) — batch-generate 10K stories with specified properties, package them, and provide a browsable catalog of static compliant content. Same governor tech; no live model needed to consume it.

Shared backbone:

  • Unified data layer (PhonoLex is the source of truth for all phonological + psycholinguistic data)
  • Governor engine (constraint compilation, logit processing)
  • Lookup/enrichment pipeline (tokenizer-aware, subword-attributed)


1. Monorepo Architecture

Proposed structure

phonolex/                              # The product IS PhonoLex
├── packages/
│   ├── data/                          # THE data layer (single source of truth)
│   │   ├── sources/                   # Raw dataset files (CMU, PHOIBLE, norms)
│   │   ├── mappings/                  # IPA/ARPAbet conversion tables
│   │   ├── loaders/                   # Python data loading (ONE place)
│   │   │   ├── cmudict.py             # CMUdict → phonological features
│   │   │   ├── norms.py               # All 13 norm datasets
│   │   │   ├── associations.py        # 7 cognitive edge datasets
│   │   │   ├── phoible.py             # Phoneme feature vectors
│   │   │   ├── vocab_lists.py         # Curated vocabulary lists
│   │   │   └── g2p_alignment.py       # PhonoLex G2P alignment data
│   │   ├── phonology/                 # Phonological computation (ONE place)
│   │   │   ├── syllabification.py     # Onset-nucleus-coda extraction
│   │   │   ├── wcm.py                 # Word Complexity Measure
│   │   │   ├── similarity.py          # Soft Levenshtein (Python port)
│   │   │   └── normalize.py           # IPA normalization (ɡ→g etc.)
│   │   └── pyproject.toml             # pip install phonolex-data
│   │
│   ├── governors/                     # Constraint engine
│   │   ├── src/phonolex_governors/   # Gates, Boosts, Projections
│   │   │   ├── core.py                # Governor composition
│   │   │   ├── gates.py               # HardGate (Exclude, VocabOnly)
│   │   │   ├── boosts.py              # SoftBoost (Include — NEW)
│   │   │   ├── projections.py         # Dynamic (Coverage — NEW)
│   │   │   ├── constraints.py         # Constraint → layer compilation
│   │   │   └── adapters/
│   │   │       └── huggingface.py     # HFGovernorProcessor
│   │   ├── lookup/                    # LookupBuilder (uses packages/data)
│   │   └── pyproject.toml             # pip install phonolex-governors
│   │
│   ├── web/                           # PhonoLex web app (face #1)
│   │   ├── workers/                   # Cloudflare Workers API
│   │   │   ├── src/routes/            # words, similarity, contrastive, text, etc.
│   │   │   └── scripts/export-to-d1.py
│   │   └── frontend/                  # React + MUI
│   │
│   ├── dashboard/                     # Governed Generation (face #2)
│   │   ├── server/                    # FastAPI backend
│   │   │   ├── model.py               # T5Gemma loading + generation
│   │   │   ├── governor.py            # Governor cache + HF adapter
│   │   │   ├── profiles.py            # Constraint profiles
│   │   │   └── routes/
│   │   ├── frontend/                  # React + Tailwind
│   │   └── scripts/
│   │       └── build_lookup.py        # Uses packages/data + packages/governors
│   │
│   └── catalog/                       # Content Catalog (face #3 — NEW)
│       ├── generator/                 # Batch generation pipeline
│       │   ├── batch.py               # Run N generations with profile
│       │   ├── indexer.py             # Build searchable catalog
│       │   └── export.py              # Package for distribution
│       ├── server/                    # Browse/search API
│       └── frontend/                  # Catalog browser UI
│
├── docs/
├── pyproject.toml                     # Workspace root
└── README.md

Key architectural decisions

packages/data is the single source of truth. All data loading, phonological computation, and normalization lives here. No more datasets.cmudict_to_phono() in governors AND english_data_loader.py in PhonoLex AND inline loading in scripts.

packages/governors depends on packages/data. LookupBuilder calls data.loaders.cmudict, data.loaders.norms, data.phonology.syllabification, etc. No internal dataset copies.

PhonoLex web app keeps its Cloudflare architecture. The D1 database is a precomputed snapshot of packages/data output — the export-to-d1.py script is the bridge. No runtime dependency on Python data loaders.

Dashboard build_lookup.py uses the shared data layer. Instead of importing from phonolex_governors.datasets, it imports from packages/data. The governors package provides the constraint engine; the data package provides the data.


2. Immediate Changes (Tomorrow)

2a. Remove vocab_memberships from RichToken

Strip from wire format. Keep in lookup for governor use.

Files:

  • server/schemas.py — remove field from RichToken
  • server/model.py — stop passing in enrich_tokens()
  • frontend/src/types.ts — remove from interface
  • Tests: update fixtures

2b. Inline Turn Compliance Cards

Replace the separate CompliancePanel sparkline with per-turn compliance cards rendered inline with the chat.

Layout:

┌─────────────────────────────────┬──────────────────────────┐
│ Chat                            │ Turn Compliance          │
├─────────────────────────────────┼──────────────────────────┤
│ 🧑 Tell me about your day      │                          │
│                                 │                          │
│ 🤖 I went to the park and      │ ✓ 32/32 pass             │
│    played with my dog.          │ AoA: avg 3.2 (≤ 8.0)    │
│                                 │ /ɹ/: 0 tokens ✓         │
│                                 │ Conc: avg 4.1 (≥ 3.0)   │
├─────────────────────────────────┼──────────────────────────┤
│ 🧑 What did you see there?     │                          │
│                                 │                          │
│ 🤖 I saw birds and flowers.    │ ✓ 18/18 pass             │
│    The sun was warm.            │ AoA: avg 2.8 (≤ 8.0)    │
│                                 │ /ɹ/: 0 tokens ✓         │
│                                 │ ★ "flowers" AoA=5.8     │
└─────────────────────────────────┴──────────────────────────┘

Each card shows:

  • Pass/fail ratio
  • Per-constraint summary (Exclude: count of blocked phoneme tokens; Bound: avg/max vs threshold)
  • Flagged tokens: edge cases, highest-value tokens near thresholds
  • Future: Include hit rate, Coverage running %

Component: TurnComplianceCard — computed client-side from Turn.assistant_tokens[].compliance + .norms


3. New Constraint Types

3a. Include (Boost layer)

Purpose: Increase probability of tokens containing target phonemes.

class IncludeConstraint(BaseModel):
    type: Literal["include"]
    phonemes: list[str]
    strength: float = 2.0  # logit boost magnitude

Governor layer: SoftBoost — adds +strength to logits of tokens whose phonemes overlap with targets. Unlike gates (binary), boosts shift the distribution without eliminating options.

Clinical use: "Generate text rich in /b/ and /d/ for minimal pair practice"
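
The boost semantics can be sketched in pure Python, independent of the HF adapter. This is illustrative only: indexing phoneme_map by token id and the additive-boost shape are assumptions, not the phonolex-governors API.

```python
def soft_boost(logits, phoneme_map, targets, strength=2.0):
    """Add `strength` to the logit of every token whose phonemes
    overlap the target set; all other logits are untouched."""
    target_set = set(targets)
    return [
        score + strength if target_set & set(phoneme_map.get(tok, ())) else score
        for tok, score in enumerate(logits)
    ]

# Token 0 contains /b/, token 1 contains /s/, token 2 has no lookup entry.
phoneme_map = {0: ["b", "æ", "t"], 1: ["s", "ʌ", "n"]}
boosted = soft_boost([0.0, 0.0, 0.0], phoneme_map, ["b", "d"], strength=2.0)
```

Unlike a gate, no logit is set to -inf, so the model can still choose any token; the distribution just tilts toward /b/ and /d/ carriers.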

3b. Coverage Rate (Projection layer)

Purpose: Hit a target phoneme rate across the generation.

class CoverageConstraint(BaseModel):
    type: Literal["coverage"]
    phonemes: list[str]
    target_rate: float       # 0.0–1.0
    window: int | None = None

Governor layer: Stateful projection. Each generation step:

  1. Count content tokens generated so far
  2. Count how many contained target phonemes
  3. Compute current_rate = target_count / total_count
  4. If current_rate < target_rate: boost target tokens proportionally
  5. If current_rate ≥ target_rate: reduce or zero out the boost

Needs: GovernorContext must carry generation history (current input_ids provides this). Projection layer in phonolex-governors needs stateful callback support.
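
The steps above reduce to a small stateful controller. A minimal sketch, assuming a hypothetical class name and step() signature; the real projection layer would live in phonolex-governors and derive per-token hits from input_ids via the lookup:

```python
class CoverageProjection:
    """Stateful boost toward a target rate of target-phoneme tokens."""

    def __init__(self, target_rate, strength=2.0):
        self.target_rate = target_rate
        self.strength = strength
        self.total = 0
        self.hits = 0

    def step(self, token_had_target):
        # Update running counts, then decide the boost for the next step.
        self.total += 1
        self.hits += int(token_had_target)
        current_rate = self.hits / self.total
        if current_rate < self.target_rate:
            # Behind target: boost proportionally to the shortfall.
            return self.strength * (self.target_rate - current_rate) / self.target_rate
        return 0.0  # At or above target: no boost.

proj = CoverageProjection(target_rate=0.5)
boosts = [proj.step(hit) for hit in [False, True, False, True]]
```

The proportional shortfall term keeps the correction gentle near the target and strong when far behind, rather than oscillating between full boost and none.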

3c. Thematic Constraint

Purpose: Keep generation within a semantic field using PhonoLex's cognitive association graph.

class ThematicConstraint(BaseModel):
    type: Literal["thematic"]
    seed_words: list[str]
    strength: float = 1.5

Implementation: At build time, expand seed_words through association graph (USF data from PhonoLex) to get a set of related tokens. Apply as a soft boost to those tokens.

Data source: PhonoLex already has 1M+ cognitive edges across 7 relationship types. The export-to-d1.py pipeline or a direct Python loader can provide these.
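
The build-time seed expansion could be a breadth-first walk over the association graph. A sketch under assumptions: the adjacency-dict shape and depth parameter are illustrative, and the toy graph stands in for PhonoLex's USF edges.

```python
from collections import deque

def expand_seeds(graph, seeds, depth=1):
    """Breadth-first expansion of seed words through an association
    graph given as an adjacency dict: word -> list of related words."""
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        word, d = frontier.popleft()
        if d == depth:
            continue  # Don't expand past the requested depth.
        for neighbor in graph.get(word, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, d + 1))
    return seen

graph = {"dog": ["bone", "bark"], "cat": ["whisker"], "bone": ["skeleton"]}
related = expand_seeds(graph, ["dog"], depth=1)
```

The resulting set would then feed the same soft-boost machinery as Include, with edge weights (if used) scaling the per-token strength.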


4. Norms Expansion

Available in lookup, not yet in profiles

| Norm | Coverage | Clinical use | Priority |
|---|---|---|---|
| valence | 34K | Emotional tone (positive/negative) | High |
| arousal | 34K | Calm vs exciting language | High |
| imageability | 15K | Vivid, concrete descriptions | High |
| frequency / log_frequency | 79K | Common words only | Medium |
| socialness | 19K | Interpersonal vocabulary | Medium |
| dominance | 34K | Empowering language | Medium |
| familiarity | 15K | Familiar vs rare | Medium |
| iconicity | 34K | Sound-meaning correspondence | Low |
| boi | 22K | Body-object interaction | Low |
| semantic_diversity | 60K | Meaning flexibility | Low |
| contextual_diversity | 79K | Context breadth | Low |
| lexical_decision_rt | 59K | Processing speed | Low |
| Sensorimotor (11 dims) | 46K | Modality-specific content | Low |

Quick-win profiles to add

  • Calm therapeutic: arousal ≤ 3.0
  • Positive affect: valence ≥ 6.0
  • High imageability: imageability ≥ 5.0
  • High frequency only: log_frequency ≥ 3.0
  • Social language: socialness ≥ 5.0
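
These profiles compile to simple Bound-style checks. A minimal sketch, with assumed shapes: the (norm, op, threshold) tuples and the passes() helper are illustrative, not the server/profiles.py API.

```python
# Each quick-win profile as a list of (norm, op, threshold) bounds.
QUICK_WIN_PROFILES = {
    "calm_therapeutic":  [("arousal", "<=", 3.0)],
    "positive_affect":   [("valence", ">=", 6.0)],
    "high_imageability": [("imageability", ">=", 5.0)],
    "high_frequency":    [("log_frequency", ">=", 3.0)],
    "social_language":   [("socialness", ">=", 5.0)],
}

def passes(norms, profile):
    """True if a token's norm values satisfy every bound in the profile.
    Tokens missing a norm pass here; coverage is handled separately
    (cf. the NormCovered constraint)."""
    for norm, op, threshold in profile:
        value = norms.get(norm)
        if value is None:
            continue
        if op == "<=" and value > threshold:
            return False
        if op == ">=" and value < threshold:
            return False
    return True

calm_ok = passes({"arousal": 2.4}, QUICK_WIN_PROFILES["calm_therapeutic"])
```

Keeping profiles as data rather than code is what lets NormSliders enumerate them from a registry.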

UI change

NormSliders.tsx should dynamically list available norms from a registry, not hardcode AoA + concreteness.


5. Phonological Data Gaps

WCM, syllables, shapes — all zeros

Root cause: cmudict_to_phono() in phonolex-governors doesn't compute syllable structure or WCM. PhonoLex's syllabification.py DOES — full onset-nucleus-coda extraction, stress handling, maximal onset principle. And PhonoLex's D1 export computes WCM from 8 phonological parameters.

Fix: In the unified monorepo, build_lookup.py would call packages/data/phonology/syllabification.py to get syllable structure and compute WCM. No more depending on governors for phonological computation — that's the data layer's job.

Blocked on: Monorepo migration (or at minimum, making PhonoLex's syllabification importable from constrained_chat).

Interim fix: Import PhonoLex's syllabification.py directly into build_lookup_phonolex.py. Compute WCM using the same formula as PhonoLex's export-to-d1.py.
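
To make the interim shape concrete, here is a simplified WCM-style scorer over syllable structure. This is not the PhonoLex formula (the real 8-parameter computation lives in export-to-d1.py), and the (onset, nucleus, coda) tuple representation is an assumption about what syllabification.py yields.

```python
# Illustrative subset of WCM-style parameters: syllable count,
# closed syllables, clusters, velars, liquids.
VELARS = {"k", "g", "ŋ"}
LIQUIDS = {"l", "ɹ"}

def wcm_sketch(syllables):
    """Score a word given syllables as (onset, nucleus, coda) tuples,
    where onset and coda are tuples of consonant phonemes."""
    score = 0
    if len(syllables) > 2:                   # word pattern: 3+ syllables
        score += 1
    for onset, nucleus, coda in syllables:
        if coda:                             # closed syllable
            score += 1
        if len(onset) > 1 or len(coda) > 1:  # consonant cluster
            score += 1
        for ph in (*onset, *coda):
            if ph in VELARS:
                score += 1
            if ph in LIQUIDS:
                score += 1
    return score

# "candle" as a toy segmentation: kæn + dəl
score = wcm_sketch([(("k",), "æ", ("n",)), (("d",), "ə", ("l",))])
```

Whatever the exact parameter set, the key point stands: once onset/nucleus/coda data flows from the shared syllabifier, WCM stops being all zeros.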

Cluster phonemes

Currently empty for subword tokens. Computable from syllable onset/coda data once syllabification is wired in.


6. Content Catalog (Face #3)

Concept

Batch-generate thousands of compliant texts with various constraint profiles, index them, and provide a browsable catalog. Clinicians can search for "stories about animals with AoA ≤ 5.0 and no /ɹ/" and get pre-generated, pre-validated content.

Why

  • Live generation requires GPU hardware
  • Pre-generated content can be served statically
  • Quality can be human-reviewed before distribution
  • Enables offline use

Architecture

Generation pipeline:
  profiles × prompts × N repetitions
  → governed generation (T5Gemma)
  → enrichment + compliance verification
  → index with metadata

Catalog:
  {
    id: "cat-story-r-aoa5-001",
    prompt: "Tell me a story about a cat",
    profile: "Exclude /ɹ/ + AoA ≤ 5.0",
    text: "Clement, a tabby cat with a coat the color...",
    metadata: {
      token_count: 128,
      avg_aoa: 3.8,
      phoneme_distribution: {...},
      compliance: { passed: true, violations: 0 }
    }
  }

Relation to generation_sweep.py

The existing scripts/generation_sweep.py is a prototype of this. Extend it with:

  • Multiple prompts per profile
  • N repetitions per prompt × profile combination
  • JSON catalog output with full metadata
  • Search/filter API
  • Simple browse UI
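
The extended sweep can be sketched as a product over profiles, prompts, and repetitions. Assumptions are marked: fake_generate stands in for the governed T5Gemma call plus enrichment, and its (text, metadata) return shape is illustrative.

```python
import itertools

def build_catalog(profiles, prompts, n_reps, generate):
    """Sweep profiles x prompts x repetitions and emit catalog entries.
    `generate(prompt, profile)` is the governed generation + compliance
    step; here it is any callable returning (text, metadata)."""
    catalog = []
    for (pid, profile), (pi, prompt), rep in itertools.product(
            profiles.items(), enumerate(prompts), range(n_reps)):
        text, metadata = generate(prompt, profile)
        catalog.append({
            "id": f"cat-{pid}-{pi:03d}-{rep:03d}",
            "prompt": prompt,
            "profile": pid,
            "text": text,
            "metadata": metadata,
        })
    return catalog

# Stub generator so the sweep is runnable without a GPU.
def fake_generate(prompt, profile):
    return f"[{profile}] {prompt}", {"compliance": {"passed": True}}

catalog = build_catalog({"no-r": "Exclude /ɹ/"}, ["a cat story"], 2, fake_generate)
```

Because compliance is verified at build time and stored in metadata, the browse API can filter on it without re-running any model.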


7. PhonoLex Features → Governed Generation Integration

PhonoLex has tools that directly serve the governed generation use case but aren't connected:

| PhonoLex Tool | Governed Generation Use |
|---|---|
| Text Analysis | Analyze generated text for readability, highlight by property — same as compliance panel but richer |
| Contrastive Sets | Generate minimal pair targets, then use Include/Coverage constraints to elicit them |
| Phonological Similarity | Find alternative words when governor blocks a word — suggest rewording |
| Word Lists | Build targeted vocab lists for VocabOnly constraint palettes |
| Lookup + Associations | Thematic constraint seed expansion; show clinician why a word was flagged |

Future integration

  • Compliance panel could use PhonoLex's text analysis percentile engine
  • "Suggest alternatives" feature: when a word is flagged, query PhonoLex similarity for compliant alternatives
  • Contrastive set generation → automatic Include constraint targets
  • Association graph → thematic constraint seed expansion

8. Cleanup

Remove from constrained_chat

  • phase0_eval*.py, phase1_*.py, phase2_*.py — archive to docs/archive/
  • WORKING_IMPLEMENTATIONS.md — fold into docs
  • governor-t5-plan.md — superseded by this document
  • __pycache__/ directories — add to .gitignore
  • Unused _normalize_phoneme_list in build_lookup_phonolex.py

Frontend polish

  • "Violations" highlight mode is useless with airtight governor — replace with "target phonemes" mode for future Include constraint
  • Compliance panel: restructure around turn cards, not aggregate sparklines
  • Loading states during model warmup and generation

9. Priority Sequence

Phase A — Quick wins ✅

  1. ~~Remove vocab_memberships from RichToken wire format~~
  2. ~~Add quick-win norm profiles (arousal, valence, imageability, frequency)~~
  3. ~~Make NormSliders dynamic~~ → replaced by command language
  4. ~~Turn compliance cards in chat UI~~ → replaced by inline system messages

Phase B — Data unification ✅

  1. ~~Extract shared data layer from PhonoLex + phonolex-governors~~
  2. ~~Wire PhonoLex syllabification into lookup builder (fix WCM)~~
  3. ~~Wire PhonoLex association data into lookup builder~~
  4. ~~Monorepo scaffolding~~

Phase C — New constraints ✅

  1. ~~Include constraint (Boost layer in governors)~~ → density-weighted IncludeConstraint + /include command
  2. ~~Coverage rate imperative (unified into Include)~~ → IncludeConstraint(target_rate=0.2) + /include k 20%
  3. ~~VocabBoost (soft vocabulary targeting)~~ → VocabBoostConstraint + /vocab-boost command
  4. ~~Thematic constraint (association-backed boost)~~ → ThematicConstraint + /theme command

Phase B2 — Command language ✅

  • Slash command language with 15 verbs (/exclude, /include, /aoa, /msh, /boost, etc.)
  • ConstraintStore (Zustand) replaces hardcoded ConstraintPanel
  • Pinned constraint bar with dismissible color-coded chips + preset loader/save
  • CLI-style monospace system messages in chat feed
  • API migrated from profile_id to inline constraints with hash-keyed cache
  • 5 constraint schemas (Include, VocabBoost, MSH, MinPairBoost, MaxOppositionBoost)
  • NormCovered auto-inserted server-side for Bound constraints
  • Profiles dissolve into store on selection — no "active profile" state
  • Autocomplete dropdown + IPA keyboard toggle for command input
  • Compliance panel removed — violations always on, single-column layout
  • Syllabification + WCM wired into governor lookup builder (was all zeros)
  • Token card cleaned up: filtered norms, scrollable, no compliance/vocab sections
  • See docs/superpowers/specs/2026-03-16-governed-chat-command-language-design.md for full spec

Phase D — Content catalog (next)

  1. Batch generation pipeline
  2. Catalog indexer + search
  3. Browse UI

Phase E — Polish

  1. Loading states during model warmup and generation
  2. Token card styling/colors
  3. Clickable compliance indicator on messages (expandable detail)