Governed Generation

Generate text that respects phonological and psycholinguistic constraints in real time. Designed for clinicians creating therapy materials, researchers studying constrained language production, and educators building controlled reading passages.

How It Works

Governed generation uses a large language model (T5Gemma 9B-2B) with real-time constraint enforcement. You compose constraints visually, write a prompt, and get compliant output with per-word analysis.

The pipeline:

1. Constraints are resolved to word lists via the PhonoLex API.
2. A vocabulary trie (126K words) is tagged with banned/boosted words.
3. The model generates 4 drafts in parallel with constraint steering.
4. A GUARD check catches any remaining violations via G2P.
5. The best compliant draft is selected and returned with compliance details.
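The last two steps (checking drafts and picking one) can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the names `Draft`, `score_draft`, and `select_best` are hypothetical, a plain word-set lookup stands in for the G2P-based GUARD check, and the ranking rule (fewest violations, then most boost hits) is an assumption, since the source only says the best compliant draft is selected.

```python
from dataclasses import dataclass, field

PUNCT = ".,!?;:\"'"

def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

@dataclass
class Draft:
    text: str
    violations: list = field(default_factory=list)  # words that hit a banned entry
    boost_hits: int = 0                             # words that hit a boosted entry

def score_draft(text, banned, boosted):
    """Stand-in for the GUARD check: look each word up in the tagged sets."""
    words = tokenize(text)
    return Draft(
        text=text,
        violations=[w for w in words if w in banned],
        boost_hits=sum(w in boosted for w in words),
    )

def select_best(drafts):
    """Assumed ranking: fewest violations first, then most boost hits."""
    return min(drafts, key=lambda d: (len(d.violations), -d.boost_hits))

# Toy tagged vocabulary (hypothetical word lists).
banned = {"rabbit", "red"}
boosted = {"cat", "kite"}

drafts = [
    score_draft(t, banned, boosted)
    for t in ["The red kite flew.", "The cat saw a kite.", "A rabbit met a cat."]
]
best = select_best(drafts)
print(best.text, best.violations, best.boost_hits)
```

In this toy run, only the second draft has no banned words, so it is the one selected.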

Constraint Types

Exclude (hard constraint)

Ban all words containing specified phonemes. The model's output will contain zero instances of these sounds.

Use case: A child substituting /ɹ/ — generate a story that avoids all R sounds.

| Parameter | Description |
|---|---|
| Phonemes | One or more IPA phonemes to exclude |

Note

Exclude constraints can also cover allophones: when specified, excluding /ɹ/ also excludes the rhotacized vowels /ɝ/ and /ɚ/.
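As a rough sketch of how an exclude constraint resolves to a banned word list, the snippet below scans a toy pronunciation dictionary. The dictionary `PRON` and the function `banned_words` are hypothetical stand-ins; the real word lists come from the PhonoLex API. The rhotacized vowels are passed explicitly here, which is one way to read "when specified".

```python
# Toy pronunciation dictionary: word -> list of IPA phonemes (illustrative only).
PRON = {
    "rabbit": ["ɹ", "æ", "b", "ɪ", "t"],
    "bird": ["b", "ɝ", "d"],
    "cat": ["k", "æ", "t"],
    "sun": ["s", "ʌ", "n"],
}

def banned_words(pron, excluded):
    """Return every word whose pronunciation contains any excluded phoneme."""
    excluded = set(excluded)
    return {word for word, phones in pron.items() if excluded & set(phones)}

# Excluding only the consonant /ɹ/.
only_r = banned_words(PRON, ["ɹ"])

# Excluding /ɹ/ together with the rhotacized vowels /ɝ/ and /ɚ/.
r_and_rhotic = banned_words(PRON, ["ɹ", "ɝ", "ɚ"])

print(sorted(only_r), sorted(r_and_rhotic))
```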

Include (soft constraint)

Boost words containing specified phonemes toward a target coverage rate. The output will contain approximately the target percentage of words with those sounds.

Use case: Eliciting /k/ — generate text where ~20% of words contain /k/.

| Parameter | Description |
|---|---|
| Phonemes | One or more IPA phonemes to target |
| Target rate | Desired percentage of words containing the phoneme (default: 20%) |
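The coverage rate for an include constraint is the share of words that contain at least one target phoneme. A minimal sketch of that calculation, assuming a toy pronunciation dictionary (`PRON` and `coverage` are hypothetical names, and skipping words with no known pronunciation is an assumption):

```python
# Toy pronunciation dictionary: word -> list of IPA phonemes (illustrative only).
PRON = {
    "the": ["ð", "ə"],
    "cat": ["k", "æ", "t"],
    "sat": ["s", "æ", "t"],
    "on": ["ɑ", "n"],
    "a": ["ə"],
    "cake": ["k", "eɪ", "k"],
}

def coverage(words, pron, targets):
    """Fraction of known words containing at least one target phoneme."""
    targets = set(targets)
    known = [w for w in words if w in pron]  # assumption: unknown words are skipped
    if not known:
        return 0.0
    hits = sum(1 for w in known if targets & set(pron[w]))
    return hits / len(known)

words = ["the", "cat", "sat", "on", "a", "cake"]
actual = coverage(words, PRON, ["k"])
target = 0.20
print(f"actual {actual:.0%} vs target {target:.0%}")
```

Here two of the six words contain /k/, so the actual coverage is above the 20% target.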

Bound (hard constraint)

Restrict vocabulary to words within specified psycholinguistic norm ranges. Words outside the bounds are banned from the output.

Use case: Limit to early-acquired vocabulary — set Age of Acquisition (Kuperman) max to 5.0.

| Parameter | Description |
|---|---|
| Norm | Any filterable PhonoLex property (e.g., `aoa_kuperman`, `concreteness`) |
| Min / Max | Lower and/or upper bound |

Warning

Tight bounds dramatically reduce available vocabulary. AoA ≤ 5 leaves only ~1% of words, which degrades output quality. The system displays a survival warning when vocabulary drops below 20%.
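To make the effect of a bound concrete, the sketch below filters a tiny vocabulary by a norm range and reports the surviving share. The norm values are made up for illustration and are not real Kuperman ratings; `apply_bound` is a hypothetical helper, and it assumes every word has a value for the chosen norm.

```python
# Made-up age-of-acquisition values, for illustration only.
AOA = {"dog": 2.8, "ball": 2.9, "school": 4.5, "justice": 9.1, "entropy": 13.2}

def apply_bound(norms, lo=None, hi=None):
    """Keep words whose norm value lies within [lo, hi]; either end may be open."""
    allowed = set()
    for word, value in norms.items():
        if lo is not None and value < lo:
            continue
        if hi is not None and value > hi:
            continue
        allowed.add(word)
    return allowed

allowed = apply_bound(AOA, hi=5.0)      # the "AoA max 5.0" use case
survival = len(allowed) / len(AOA)      # share of vocabulary still available
print(sorted(allowed), f"{survival:.0%}")
```

With a realistic vocabulary the surviving share under such a tight bound would be far smaller than in this five-word toy set, which is why the warning above matters.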

Bound Boost (soft constraint)

Boost words within specified norm ranges toward a target coverage rate, without banning words outside the range. This is gentler than a hard bound.

Use case: Encourage concrete words — boost concreteness ≥ 3.0 at 30% coverage.

| Parameter | Description |
|---|---|
| Norm | Any filterable PhonoLex property |
| Min / Max | Lower and/or upper bound for the target set |
| Coverage target | Desired percentage of words from the target set (default: 20%) |
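The difference from a hard bound is that out-of-range words stay available; in-range words are merely favored. One possible way to picture this is an additive score bonus that shrinks as coverage approaches the target. This scheme is purely an assumption for illustration (the source does not describe the steering mechanism), and `boost_bonus`, `strength`, and the concreteness values are hypothetical.

```python
# Made-up concreteness values, for illustration only.
CONCRETENESS = {"apple": 5.0, "table": 4.9, "idea": 1.6}

def boost_bonus(word, norms, lo, hi, current_cov, target_cov, strength=2.0):
    """Assumed soft steering: bonus for in-range words while below target."""
    value = norms.get(word)
    if value is None:
        return 0.0
    in_range = (lo is None or value >= lo) and (hi is None or value <= hi)
    if not in_range or current_cov >= target_cov:
        return 0.0  # no bonus, but the word is NOT banned
    return strength * (target_cov - current_cov)

# "Boost concreteness >= 3.0 at 30% coverage", with 10% coverage so far.
print(boost_bonus("apple", CONCRETENESS, 3.0, None, 0.10, 0.30))
print(boost_bonus("idea", CONCRETENESS, 3.0, None, 0.10, 0.30))
```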

Contrastive (soft constraint)

Boost words from minimal pair or maximal opposition sets. Draws from PhonoLex's contrastive intervention database.

Use case: Target the /s/↔/z/ voicing contrast — boost minimal pair words.

| Parameter | Description |
|---|---|
| Pair type | `minpair` (minimal pairs) or `maxopp` (maximal opposition) |
| Phoneme 1 / 2 | The two phonemes in the contrast |
| Position | `initial`, `medial`, `final`, or `any` |
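For readers unfamiliar with minimal pairs, the check below shows the idea for the `minpair` case: two words form a minimal pair when their phoneme sequences differ in exactly one slot. The function `is_minimal_pair` is a hypothetical illustration, not the database lookup PhonoLex performs, and the mapping of slot index to `initial`/`medial`/`final` is an assumption. Maximal opposition (`maxopp`) is not covered.

```python
def is_minimal_pair(a, b, p1, p2, position="any"):
    """True if phoneme lists a and b differ only by p1 vs p2 at one slot."""
    if len(a) != len(b):
        return False
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) != 1:
        return False
    i = diffs[0]
    if {a[i], b[i]} != {p1, p2}:
        return False
    if position == "any":
        return True
    # Assumed position rule: first slot, last slot, or anything in between.
    if i == 0:
        where = "initial"
    elif i == len(a) - 1:
        where = "final"
    else:
        where = "medial"
    return where == position

sip, zip_ = ["s", "ɪ", "p"], ["z", "ɪ", "p"]
bus, buzz = ["b", "ʌ", "s"], ["b", "ʌ", "z"]

print(is_minimal_pair(sip, zip_, "s", "z", "initial"))
print(is_minimal_pair(bus, buzz, "s", "z", "initial"))
```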

Composing Constraints

Constraints compose freely. You can combine multiple constraints in a single generation:

- Exclude /ɹ/ + Include /b/ 15%: avoid R sounds while encouraging B sounds
- Bound AoA ≤ 7 + Exclude /θ,ð/: simple vocabulary without TH sounds
- Include /k/ 20% + Contrastive /k/↔/ɡ/: target velars with minimal pair exposure

The constraint bar shows all active constraints as dismissible chips. Constraints accumulate until cleared.
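A composed set of constraints can be thought of as a simple list of records, split into hard and soft groups. The field names below (`type`, `phonemes`, `target_rate`, `norm`, `max`) are a hypothetical schema for illustration, not the documented request format; only the constraint type names and the hard/soft classification come from this page.

```python
# Hard constraint types, per the sections above; all others are soft.
HARD_TYPES = {"exclude", "bound"}

# Hypothetical records combining three constraints in one generation.
constraints = [
    {"type": "exclude", "phonemes": ["ɹ"]},
    {"type": "include", "phonemes": ["b"], "target_rate": 0.15},
    {"type": "bound", "norm": "aoa_kuperman", "max": 7.0},
]

def split_constraints(items):
    """Separate hard constraints (ban words) from soft ones (boost words)."""
    hard = [c for c in items if c["type"] in HARD_TYPES]
    soft = [c for c in items if c["type"] not in HARD_TYPES]
    return hard, soft

hard, soft = split_constraints(constraints)
print([c["type"] for c in hard], [c["type"] for c in soft])
```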

Understanding the Output

Each generated result includes:

- Compliance status: whether the output passes all hard constraints (exclude, bound)
- Violation details: which words violated which constraints (if any)
- Boost coverage: actual vs. target coverage for each soft constraint
- Warnings: alerts when vocabulary survival is low

Toggle Analysis mode on any output card to see per-word compliance highlighting:

- Red background: word violates a hard constraint
- Blue underline: word matches a boost target (include, contrastive)
- Click any word to open its full PhonoLex profile
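The per-word highlighting amounts to labeling each word against the banned and boosted sets. A minimal sketch, assuming plain word-set lookups (the function `annotate` and the label strings are hypothetical, and checking for violations before boosts is an assumption):

```python
PUNCT = ".,!?;:\"'"

def annotate(text, banned, boosted):
    """Label each word: 'violation' (red), 'boost' (blue), or 'ok'."""
    labels = []
    for raw in text.split():
        word = raw.strip(PUNCT).lower()
        if word in banned:
            label = "violation"   # would get the red background
        elif word in boosted:
            label = "boost"       # would get the blue underline
        else:
            label = "ok"
        labels.append((raw, label))
    return labels

result = annotate("The red cat ran.", banned={"red", "ran"}, boosted={"cat"})
for raw, label in result:
    print(f"{raw:>5}  {label}")
```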

Vocabulary Survival

When hard constraints (exclude, bound) reduce the available vocabulary, the system adapts:

| Survival | Max tokens | Quality |
|---|---|---|
| > 20% | 128 | Normal: multi-sentence paragraphs |
| 5–20% | 80 | Shortened output, may need retries |
| < 5% | 48 | Significantly degraded, warning displayed |

The survival ratio is reported in the SSE status stream during generation.
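The table above maps directly to a small lookup. The function name `max_tokens_for` is hypothetical, and treating exactly 20% and exactly 5% as part of the middle tier follows the table's "> 20%" and "< 5%" labels.

```python
def max_tokens_for(survival):
    """Map a survival ratio (0.0 to 1.0) to the token budget in the table."""
    if survival > 0.20:
        return 128   # normal, multi-sentence paragraphs
    if survival >= 0.05:
        return 80    # shortened output, may need retries
    return 48        # significantly degraded, warning displayed

for ratio in (0.50, 0.20, 0.05, 0.01):
    print(f"{ratio:.0%} -> {max_tokens_for(ratio)} tokens")
```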

Tips for Best Results

- Start with one constraint and verify the output before adding more
- Prefer soft constraints (include, bound_boost) over hard ones (bound) when possible, since they preserve text quality
- Exclude constraints work best for high-frequency phonemes (/ɹ/, /s/, /l/), where ~50% of vocabulary survives
- Tight AoA bounds (AoA ≤ 5) are extremely restrictive; consider using bound_boost instead
- Check the compliance panel: toggle Analysis mode to verify constraint satisfaction per word