
Governed Generation — Product Plan

2026-03-12 — planning document for monorepo migration, architecture unification, and next phase


The Big Picture

Three repos currently do overlapping work with duplicated utilities and fragmented data handling:

| Repo | What it is | What it does |
|---|---|---|
| PhonoLex | Phonological analysis platform | 5 tools (word lists, text analysis, contrastive sets, similarity search, lookup), 15 datasets, 68K+ cognitive edges, Cloudflare Workers + D1, React + MUI frontend |
| phonolex-governors | Constraint library | Gates/Boosts/Projections, LookupBuilder, dataset loaders, HuggingFace adapter |
| constrained_chat | Governed generation dashboard | FastAPI + React, T5Gemma 9B-2B, chat + generate modes, compliance panel |

The duplication problem

Same data loaded 3+ ways:

  • CMU dict: PhonoLex Python (english_data_loader.py), phonolex-governors (datasets.cmudict_to_phono()), build_lookup_phonolex.py
  • Psycholinguistic norms: PhonoLex export-to-d1.py, phonolex-governors (datasets.load_kuperman() etc.), build_lookup_phonolex.py
  • IPA/ARPAbet conversion: PhonoLex data/mappings/, phonolex-governors internal
  • Syllabification: PhonoLex syllabification.py (full onset-nucleus-coda), phonolex-governors (none — hence WCM is all zeros)
  • Phoneme normalization: PhonoLex Workers normalize.ts, build_lookup_phonolex.py IPA_NORMALIZE

The vision: one thing with a few faces

One unified platform with three faces:

  1. PhonoLex (existing) — interactive word analysis, list building, contrastive sets, similarity search. The datahub.
  2. Governed Generation (existing) — live governed content generation with real-time compliance. The clinical tool.
  3. Content Catalog (new) — batch-generate 10K stories with specified properties, package them, and provide a browsable catalog of static compliant content. Same governor tech; no live model needed to consume it.

Shared backbone:

  • Unified data layer (PhonoLex is the source of truth for all phonological + psycholinguistic data)
  • Governor engine (constraint compilation, logit processing)
  • Lookup/enrichment pipeline (tokenizer-aware, subword-attributed)


1. Monorepo Architecture

Proposed structure

phonolex/                              # The product IS PhonoLex
├── packages/
│   ├── data/                          # THE data layer (single source of truth)
│   │   ├── sources/                   # Raw dataset files (CMU, PHOIBLE, norms)
│   │   ├── mappings/                  # IPA/ARPAbet conversion tables
│   │   ├── loaders/                   # Python data loading (ONE place)
│   │   │   ├── cmudict.py             # CMUdict → phonological features
│   │   │   ├── norms.py               # All 13 norm datasets
│   │   │   ├── associations.py        # 7 cognitive edge datasets
│   │   │   ├── phoible.py             # Phoneme feature vectors
│   │   │   ├── vocab_lists.py         # Curated vocabulary lists
│   │   │   └── g2p_alignment.py       # PhonoLex G2P alignment data
│   │   ├── phonology/                 # Phonological computation (ONE place)
│   │   │   ├── syllabification.py     # Onset-nucleus-coda extraction
│   │   │   ├── wcm.py                 # Word Complexity Measure
│   │   │   ├── similarity.py          # Soft Levenshtein (Python port)
│   │   │   └── normalize.py           # IPA normalization (ɡ→g etc.)
│   │   └── pyproject.toml             # pip install phonolex-data
│   │
│   ├── governors/                     # Constraint engine
│   │   ├── src/phonolex_governors/   # Gates, Boosts, Projections
│   │   │   ├── core.py                # Governor composition
│   │   │   ├── gates.py               # HardGate (Exclude, VocabOnly)
│   │   │   ├── boosts.py              # SoftBoost (Include — NEW)
│   │   │   ├── projections.py         # Dynamic (Coverage — NEW)
│   │   │   ├── constraints.py         # Constraint → layer compilation
│   │   │   └── adapters/
│   │   │       └── huggingface.py     # HFGovernorProcessor
│   │   ├── lookup/                    # LookupBuilder (uses packages/data)
│   │   └── pyproject.toml             # pip install phonolex-governors
│   │
│   ├── web/                           # PhonoLex web app (face #1)
│   │   ├── workers/                   # Cloudflare Workers API
│   │   │   ├── src/routes/            # words, similarity, contrastive, text, etc.
│   │   │   └── scripts/export-to-d1.py
│   │   └── frontend/                  # React + MUI
│   │
│   ├── dashboard/                     # Governed Generation (face #2)
│   │   ├── server/                    # FastAPI backend
│   │   │   ├── model.py               # T5Gemma loading + generation
│   │   │   ├── governor.py            # Governor cache + HF adapter
│   │   │   ├── profiles.py            # Constraint profiles
│   │   │   └── routes/
│   │   ├── frontend/                  # React + Tailwind
│   │   └── scripts/
│   │       └── build_lookup.py        # Uses packages/data + packages/governors
│   │
│   └── catalog/                       # Content Catalog (face #3 — NEW)
│       ├── generator/                 # Batch generation pipeline
│       │   ├── batch.py               # Run N generations with profile
│       │   ├── indexer.py             # Build searchable catalog
│       │   └── export.py              # Package for distribution
│       ├── server/                    # Browse/search API
│       └── frontend/                  # Catalog browser UI
│
├── docs/
├── pyproject.toml                     # Workspace root
└── README.md

Key architectural decisions

packages/data is the single source of truth. All data loading, phonological computation, and normalization lives here. No more datasets.cmudict_to_phono() in governors AND english_data_loader.py in PhonoLex AND inline loading in scripts.

packages/governors depends on packages/data. LookupBuilder calls data.loaders.cmudict, data.loaders.norms, data.phonology.syllabification, etc. No internal dataset copies.

PhonoLex web app keeps its Cloudflare architecture. The D1 database is a precomputed snapshot of packages/data output — the export-to-d1.py script is the bridge. No runtime dependency on Python data loaders.

Dashboard build_lookup.py uses the shared data layer. Instead of importing from phonolex_governors.datasets, it imports from packages/data. The governors package provides the constraint engine; the data package provides the data.


2. Immediate Changes (Tomorrow)

2a. Remove vocab_memberships from RichToken

Strip from wire format. Keep in lookup for governor use.

Files:

  • server/schemas.py — remove field from RichToken
  • server/model.py — stop passing in enrich_tokens()
  • frontend/src/types.ts — remove from interface
  • Tests: update fixtures

2b. Inline Turn Compliance Cards

Replace the separate CompliancePanel sparkline with per-turn compliance cards rendered inline with the chat.

Layout:

┌─────────────────────────────────┬──────────────────────────┐
│ Chat                            │ Turn Compliance          │
├─────────────────────────────────┼──────────────────────────┤
│ 🧑 Tell me about your day      │                          │
│                                 │                          │
│ 🤖 I went to the park and      │ ✓ 32/32 pass             │
│    played with my dog.          │ AoA: avg 3.2 (≤ 8.0)    │
│                                 │ /ɹ/: 0 tokens ✓         │
│                                 │ Conc: avg 4.1 (≥ 3.0)   │
├─────────────────────────────────┼──────────────────────────┤
│ 🧑 What did you see there?     │                          │
│                                 │                          │
│ 🤖 I saw birds and flowers.    │ ✓ 18/18 pass             │
│    The sun was warm.            │ AoA: avg 2.8 (≤ 8.0)    │
│                                 │ /ɹ/: 0 tokens ✓         │
│                                 │ ★ "flowers" AoA=5.8     │
└─────────────────────────────────┴──────────────────────────┘

Each card shows:

  • Pass/fail ratio
  • Per-constraint summary (Exclude: count of blocked phoneme tokens; Bound: avg/max vs threshold)
  • Flagged tokens: edge cases, highest-value tokens near thresholds
  • Future: Include hit rate, Coverage running %

Component: TurnComplianceCard — computed client-side from Turn.assistant_tokens[].compliance + .norms


3. New Constraint Types

3a. Include (Boost layer)

Purpose: Increase probability of tokens containing target phonemes.

class IncludeConstraint(BaseModel):
    type: Literal["include"]
    phonemes: list[str]
    strength: float = 2.0  # logit boost magnitude

Governor layer: SoftBoost — adds +strength to logits of tokens whose phonemes overlap with targets. Unlike gates (binary), boosts shift the distribution without eliminating options.

Clinical use: "Generate text rich in /b/ and /d/ for minimal pair practice"
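
The boost semantics can be sketched in pure Python, independent of the HF adapter. This is illustrative only: indexing phoneme_map by token id and the additive-boost shape are assumptions, not the phonolex-governors API.

```python
def soft_boost(logits, phoneme_map, targets, strength=2.0):
    """Add `strength` to the logit of every token whose phonemes
    overlap the target set; all other logits are untouched."""
    target_set = set(targets)
    return [
        score + strength if target_set & set(phoneme_map.get(tok, ())) else score
        for tok, score in enumerate(logits)
    ]

# Token 0 contains /b/, token 1 contains /s/, token 2 has no lookup entry.
phoneme_map = {0: ["b", "æ", "t"], 1: ["s", "ʌ", "n"]}
boosted = soft_boost([0.0, 0.0, 0.0], phoneme_map, ["b", "d"], strength=2.0)
```

Unlike a gate, no logit is set to -inf, so the model can still choose any token; the distribution just tilts toward /b/ and /d/ carriers.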

3b. Coverage Rate (Projection layer)

Purpose: Hit a target phoneme rate across the generation.

class CoverageConstraint(BaseModel):
    type: Literal["coverage"]
    phonemes: list[str]
    target_rate: float       # 0.0–1.0
    window: int | None = None

Governor layer: Stateful projection. Each generation step:

  1. Count content tokens generated so far
  2. Count how many contained target phonemes
  3. Compute current_rate = target_count / total_count
  4. If current_rate < target_rate: boost target tokens proportionally
  5. If current_rate ≥ target_rate: reduce or zero out the boost

Needs: GovernorContext must carry generation history (current input_ids provides this). Projection layer in phonolex-governors needs stateful callback support.
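
The steps above reduce to a small stateful controller. A minimal sketch, assuming a hypothetical class name and step() signature; the real projection layer would live in phonolex-governors and derive per-token hits from input_ids via the lookup:

```python
class CoverageProjection:
    """Stateful boost toward a target rate of target-phoneme tokens."""

    def __init__(self, target_rate, strength=2.0):
        self.target_rate = target_rate
        self.strength = strength
        self.total = 0
        self.hits = 0

    def step(self, token_had_target):
        # Update running counts, then decide the boost for the next step.
        self.total += 1
        self.hits += int(token_had_target)
        current_rate = self.hits / self.total
        if current_rate < self.target_rate:
            # Behind target: boost proportionally to the shortfall.
            return self.strength * (self.target_rate - current_rate) / self.target_rate
        return 0.0  # At or above target: no boost.

proj = CoverageProjection(target_rate=0.5)
boosts = [proj.step(hit) for hit in [False, True, False, True]]
```

The proportional shortfall term keeps the correction gentle near the target and strong when far behind, rather than oscillating between full boost and none.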

3c. Thematic Constraint

Purpose: Keep generation within a semantic field using PhonoLex's cognitive association graph.

class ThematicConstraint(BaseModel):
    type: Literal["thematic"]
    seed_words: list[str]
    strength: float = 1.5

Implementation: At build time, expand seed_words through association graph (USF data from PhonoLex) to get a set of related tokens. Apply as a soft boost to those tokens.

Data source: PhonoLex already has 1M+ cognitive edges across 7 relationship types. The export-to-d1.py pipeline or a direct Python loader can provide these.
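
The build-time seed expansion could be a breadth-first walk over the association graph. A sketch under assumptions: the adjacency-dict shape and depth parameter are illustrative, and the toy graph stands in for PhonoLex's USF edges.

```python
from collections import deque

def expand_seeds(graph, seeds, depth=1):
    """Breadth-first expansion of seed words through an association
    graph given as an adjacency dict: word -> list of related words."""
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        word, d = frontier.popleft()
        if d == depth:
            continue  # Don't expand past the requested depth.
        for neighbor in graph.get(word, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, d + 1))
    return seen

graph = {"dog": ["bone", "bark"], "cat": ["whisker"], "bone": ["skeleton"]}
related = expand_seeds(graph, ["dog"], depth=1)
```

The resulting set would then feed the same soft-boost machinery as Include, with edge weights (if used) scaling the per-token strength.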


4. Norms Expansion

Available in lookup, not yet in profiles

| Norm | Coverage | Clinical use | Priority |
|---|---|---|---|
| valence | 34K | Emotional tone (positive/negative) | High |
| arousal | 34K | Calm vs exciting language | High |
| imageability | 15K | Vivid, concrete descriptions | High |
| frequency / log_frequency | 79K | Common words only | Medium |
| socialness | 19K | Interpersonal vocabulary | Medium |
| dominance | 34K | Empowering language | Medium |
| familiarity | 15K | Familiar vs rare | Medium |
| iconicity | 34K | Sound-meaning correspondence | Low |
| boi | 22K | Body-object interaction | Low |
| semantic_diversity | 60K | Meaning flexibility | Low |
| contextual_diversity | 79K | Context breadth | Low |
| lexical_decision_rt | 59K | Processing speed | Low |
| Sensorimotor (11 dims) | 46K | Modality-specific content | Low |

Quick-win profiles to add

  • Calm therapeutic: arousal ≤ 3.0
  • Positive affect: valence ≥ 6.0
  • High imageability: imageability ≥ 5.0
  • High frequency only: log_frequency ≥ 3.0
  • Social language: socialness ≥ 5.0
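
These profiles compile to simple Bound-style checks. A minimal sketch, with assumed shapes: the (norm, op, threshold) tuples and the passes() helper are illustrative, not the server/profiles.py API.

```python
# Each quick-win profile as a list of (norm, op, threshold) bounds.
QUICK_WIN_PROFILES = {
    "calm_therapeutic":  [("arousal", "<=", 3.0)],
    "positive_affect":   [("valence", ">=", 6.0)],
    "high_imageability": [("imageability", ">=", 5.0)],
    "high_frequency":    [("log_frequency", ">=", 3.0)],
    "social_language":   [("socialness", ">=", 5.0)],
}

def passes(norms, profile):
    """True if a token's norm values satisfy every bound in the profile.
    Tokens missing a norm pass here; coverage is handled separately
    (cf. the NormCovered constraint)."""
    for norm, op, threshold in profile:
        value = norms.get(norm)
        if value is None:
            continue
        if op == "<=" and value > threshold:
            return False
        if op == ">=" and value < threshold:
            return False
    return True

calm_ok = passes({"arousal": 2.4}, QUICK_WIN_PROFILES["calm_therapeutic"])
```

Keeping profiles as data rather than code is what lets NormSliders enumerate them from a registry.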

UI change

NormSliders.tsx should dynamically list available norms from a registry, not hardcode AoA + concreteness.


5. Phonological Data Gaps

WCM, syllables, shapes — all zeros

Root cause: cmudict_to_phono() in phonolex-governors doesn't compute syllable structure or WCM. PhonoLex's syllabification.py DOES — full onset-nucleus-coda extraction, stress handling, maximal onset principle. And PhonoLex's D1 export computes WCM from 8 phonological parameters.

Fix: In the unified monorepo, build_lookup.py would call packages/data/phonology/syllabification.py to get syllable structure and compute WCM. No more depending on governors for phonological computation — that's the data layer's job.

Blocked on: Monorepo migration (or at minimum, making PhonoLex's syllabification importable from constrained_chat).

Interim fix: Import PhonoLex's syllabification.py directly into build_lookup_phonolex.py. Compute WCM using the same formula as PhonoLex's export-to-d1.py.
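
To make the interim shape concrete, here is a simplified WCM-style scorer over syllable structure. This is not the PhonoLex formula (the real 8-parameter computation lives in export-to-d1.py), and the (onset, nucleus, coda) tuple representation is an assumption about what syllabification.py yields.

```python
# Illustrative subset of WCM-style parameters: syllable count,
# closed syllables, clusters, velars, liquids.
VELARS = {"k", "g", "ŋ"}
LIQUIDS = {"l", "ɹ"}

def wcm_sketch(syllables):
    """Score a word given syllables as (onset, nucleus, coda) tuples,
    where onset and coda are tuples of consonant phonemes."""
    score = 0
    if len(syllables) > 2:                   # word pattern: 3+ syllables
        score += 1
    for onset, nucleus, coda in syllables:
        if coda:                             # closed syllable
            score += 1
        if len(onset) > 1 or len(coda) > 1:  # consonant cluster
            score += 1
        for ph in (*onset, *coda):
            if ph in VELARS:
                score += 1
            if ph in LIQUIDS:
                score += 1
    return score

# "candle" as a toy segmentation: kæn + dəl
score = wcm_sketch([(("k",), "æ", ("n",)), (("d",), "ə", ("l",))])
```

Whatever the exact parameter set, the key point stands: once onset/nucleus/coda data flows from the shared syllabifier, WCM stops being all zeros.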

Cluster phonemes

Currently empty for subword tokens. Computable from syllable onset/coda data once syllabification is wired in.


6. Content Catalog (Face #3)

Concept

Batch-generate thousands of compliant texts with various constraint profiles, index them, and provide a browsable catalog. Clinicians can search for "stories about animals with AoA ≤ 5.0 and no /ɹ/" and get pre-generated, pre-validated content.

Why

  • Live generation requires GPU hardware
  • Pre-generated content can be served statically
  • Quality can be human-reviewed before distribution
  • Enables offline use

Architecture

Generation pipeline:
  profiles × prompts × N repetitions
  → governed generation (T5Gemma)
  → enrichment + compliance verification
  → index with metadata

Catalog:
  {
    id: "cat-story-r-aoa5-001",
    prompt: "Tell me a story about a cat",
    profile: "Exclude /ɹ/ + AoA ≤ 5.0",
    text: "Clement, a tabby cat with a coat the color...",
    metadata: {
      token_count: 128,
      avg_aoa: 3.8,
      phoneme_distribution: {...},
      compliance: { passed: true, violations: 0 }
    }
  }

Relation to generation_sweep.py

The existing scripts/generation_sweep.py is a prototype of this. Extend it with:

  • Multiple prompts per profile
  • N repetitions per prompt × profile combination
  • JSON catalog output with full metadata
  • Search/filter API
  • Simple browse UI
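
The extended sweep can be sketched as a product over profiles, prompts, and repetitions. Assumptions are marked: fake_generate stands in for the governed T5Gemma call plus enrichment, and its (text, metadata) return shape is illustrative.

```python
import itertools

def build_catalog(profiles, prompts, n_reps, generate):
    """Sweep profiles x prompts x repetitions and emit catalog entries.
    `generate(prompt, profile)` is the governed generation + compliance
    step; here it is any callable returning (text, metadata)."""
    catalog = []
    for (pid, profile), (pi, prompt), rep in itertools.product(
            profiles.items(), enumerate(prompts), range(n_reps)):
        text, metadata = generate(prompt, profile)
        catalog.append({
            "id": f"cat-{pid}-{pi:03d}-{rep:03d}",
            "prompt": prompt,
            "profile": pid,
            "text": text,
            "metadata": metadata,
        })
    return catalog

# Stub generator so the sweep is runnable without a GPU.
def fake_generate(prompt, profile):
    return f"[{profile}] {prompt}", {"compliance": {"passed": True}}

catalog = build_catalog({"no-r": "Exclude /ɹ/"}, ["a cat story"], 2, fake_generate)
```

Because compliance is verified at build time and stored in metadata, the browse API can filter on it without re-running any model.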


7. PhonoLex Features → Governed Generation Integration

PhonoLex has tools that directly serve the governed generation use case but aren't connected:

| PhonoLex Tool | Governed Generation Use |
|---|---|
| Text Analysis | Analyze generated text for readability, highlight by property — same as compliance panel but richer |
| Contrastive Sets | Generate minimal pair targets, then use Include/Coverage constraints to elicit them |
| Phonological Similarity | Find alternative words when governor blocks a word — suggest rewording |
| Word Lists | Build targeted vocab lists for VocabOnly constraint palettes |
| Lookup + Associations | Thematic constraint seed expansion; show clinician why a word was flagged |

Future integration

  • Compliance panel could use PhonoLex's text analysis percentile engine
  • "Suggest alternatives" feature: when a word is flagged, query PhonoLex similarity for compliant alternatives
  • Contrastive set generation → automatic Include constraint targets
  • Association graph → thematic constraint seed expansion

8. Cleanup

Remove from constrained_chat

  • phase0_eval*.py, phase1_*.py, phase2_*.py — archive to docs/archive/
  • WORKING_IMPLEMENTATIONS.md — fold into docs
  • governor-t5-plan.md — superseded by this document
  • __pycache__/ directories — add to .gitignore
  • Unused _normalize_phoneme_list in build_lookup_phonolex.py

Frontend polish

  • "Violations" highlight mode is useless with airtight governor — replace with "target phonemes" mode for future Include constraint
  • Compliance panel: restructure around turn cards, not aggregate sparklines
  • Loading states during model warmup and generation

9. Priority Sequence

Phase A — Quick wins ✅

  1. ~~Remove vocab_memberships from RichToken wire format~~
  2. ~~Add quick-win norm profiles (arousal, valence, imageability, frequency)~~
  3. ~~Make NormSliders dynamic~~ → replaced by command language
  4. ~~Turn compliance cards in chat UI~~ → replaced by inline system messages

Phase B — Data unification ✅

  1. ~~Extract shared data layer from PhonoLex + phonolex-governors~~
  2. ~~Wire PhonoLex syllabification into lookup builder (fix WCM)~~
  3. ~~Wire PhonoLex association data into lookup builder~~
  4. ~~Monorepo scaffolding~~

Phase C — New constraints ✅

  1. ~~Include constraint (Boost layer in governors)~~ → density-weighted IncludeConstraint + /include command
  2. ~~Coverage rate imperative (unified into Include)~~ → IncludeConstraint(target_rate=0.2) + /include k 20%
  3. ~~VocabBoost (soft vocabulary targeting)~~ → VocabBoostConstraint + /vocab-boost command
  4. ~~Thematic constraint (association-backed boost)~~ → ThematicConstraint + /theme command

Phase B2 — Command language ✅

  • Slash command language with 15 verbs (/exclude, /include, /aoa, /msh, /boost, etc.)
  • ConstraintStore (Zustand) replaces hardcoded ConstraintPanel
  • Pinned constraint bar with dismissible color-coded chips + preset loader/save
  • CLI-style monospace system messages in chat feed
  • API migrated from profile_id to inline constraints with hash-keyed cache
  • 5 constraint schemas (Include, VocabBoost, MSH, MinPairBoost, MaxOppositionBoost)
  • NormCovered auto-inserted server-side for Bound constraints
  • Profiles dissolve into store on selection — no "active profile" state
  • Autocomplete dropdown + IPA keyboard toggle for command input
  • Compliance panel removed — violations always on, single-column layout
  • Syllabification + WCM wired into governor lookup builder (was all zeros)
  • Token card cleaned up: filtered norms, scrollable, no compliance/vocab sections
  • See docs/superpowers/specs/2026-03-16-governed-chat-command-language-design.md for full spec

Phase D — Content catalog (next)

  1. Batch generation pipeline
  2. Catalog indexer + search
  3. Browse UI

Phase E — Polish

  1. Loading states during model warmup and generation
  2. Token card styling/colors
  3. Clickable compliance indicator on messages (expandable detail)