Governed Generation

Generate text that satisfies phonological and psycholinguistic constraints in real time. Designed for clinicians creating therapy materials, researchers studying constrained language production, and educators building controlled reading passages.

How It Works

Governed generation uses a large language model (T5Gemma 9B-2B) with real-time constraint enforcement. You compose constraints visually, write a prompt, and get compliant output with per-word analysis.

The pipeline:

  1. Constraints are resolved to word lists via the PhonoLex API
  2. A vocabulary trie (126K words) is tagged with banned/boosted words
  3. The model generates 4 drafts in parallel with constraint steering
  4. A GUARD check runs grapheme-to-phoneme (G2P) conversion on the output to catch any remaining violations
  5. The best compliant draft is selected and returned with compliance details
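
Step 2 of the pipeline can be pictured with a small sketch. The code below is an illustration only, not the PhonoLex implementation: the `TrieNode` class, the `build_tagged_trie` and `lookup` helpers, and the tag names are all hypothetical. It shows one plausible way to store a vocabulary in a character trie and mark each word as banned, boosted, or neutral.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class TrieNode:
    """One character position in the vocabulary trie (hypothetical structure)."""
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    tag: Optional[str] = None  # set only on nodes that end a word


def build_tagged_trie(vocab: Iterable[str], banned: Set[str], boosted: Set[str]) -> TrieNode:
    """Insert every word and tag its final node; banned takes priority over boosted."""
    root = TrieNode()
    for word in vocab:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        if word in banned:
            node.tag = "banned"
        elif word in boosted:
            node.tag = "boosted"
        else:
            node.tag = "neutral"
    return root


def lookup(root: TrieNode, word: str) -> Optional[str]:
    """Return the tag for a word, or None if the word is not in the vocabulary."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.tag


trie = build_tagged_trie(["cat", "car", "bat"], banned={"car"}, boosted={"bat"})
print(lookup(trie, "car"))  # banned
```

A trie is convenient here because a decoder can walk it one character (or token piece) at a time and learn early whether a partial word can still end in an allowed word.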

Constraint Types

Exclude (hard constraint)

Ban all words containing specified phonemes. The model's output will contain zero instances of these sounds.

Use case: A child substituting /ɹ/ — generate a story that avoids all R sounds.

Parameter    Description
Phonemes     One or more IPA phonemes to exclude

Note

Exclude constraints can also cover related variants of a sound: excluding /ɹ/ also excludes the rhotacized vowels /ɝ/ and /ɚ/ when those are specified.
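
To make the exclude rule concrete, here is a minimal sketch of how a phoneme exclusion could be resolved to a banned word list. The `TOY_LEXICON` transcriptions and the `banned_words` helper are assumptions made for illustration; the real word lists come from the PhonoLex API.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical four-word lexicon: word -> phoneme sequence.
TOY_LEXICON: Dict[str, List[str]] = {
    "red": ["ɹ", "ɛ", "d"],
    "bird": ["b", "ɝ", "d"],
    "cat": ["k", "æ", "t"],
    "big": ["b", "ɪ", "ɡ"],
}


def banned_words(lexicon: Dict[str, List[str]], excluded: Iterable[str]) -> Set[str]:
    """Return every word whose transcription contains at least one excluded phoneme."""
    excluded_set = set(excluded)
    return {word for word, phones in lexicon.items() if excluded_set & set(phones)}


print(banned_words(TOY_LEXICON, ["ɹ"]))                    # {'red'}
print(sorted(banned_words(TOY_LEXICON, ["ɹ", "ɝ", "ɚ"])))  # ['bird', 'red']
```

In this sketch, "bird" is only banned once /ɝ/ is listed alongside /ɹ/.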

Include (soft constraint)

Boost words containing specified phonemes toward a target coverage rate. The output will contain approximately the target percentage of words with those sounds.

Use case: Eliciting /k/ — generate text where ~20% of words contain /k/.

Parameter      Description
Phonemes       One or more IPA phonemes to target
Target rate    Desired percentage of words containing the phoneme (default: 20%)
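
The target rate is a share of words, so it can be checked with simple arithmetic. The sketch below assumes a hypothetical `coverage` helper and a toy lexicon; it computes the fraction of words in a text that contain at least one target phoneme.

```python
from typing import Dict, Iterable, List

# Hypothetical transcriptions, for illustration only.
TOY_LEXICON: Dict[str, List[str]] = {
    "a": ["ə"],
    "cat": ["k", "æ", "t"],
    "sat": ["s", "æ", "t"],
    "on": ["ɑ", "n"],
    "mats": ["m", "æ", "t", "s"],
}


def coverage(words: List[str], lexicon: Dict[str, List[str]], targets: Iterable[str]) -> float:
    """Fraction of words containing any target phoneme; unknown words count as misses."""
    if not words:
        return 0.0
    target_set = set(targets)
    hits = sum(1 for w in words if target_set & set(lexicon.get(w.lower(), [])))
    return hits / len(words)


rate = coverage(["a", "cat", "sat", "on", "mats"], TOY_LEXICON, ["k"])
print(f"{rate:.0%}")  # 20%
```

One word in five contains /k/, which matches the default 20% target.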

Bound (hard constraint)

Restrict vocabulary to words within specified psycholinguistic norm ranges. Words outside the bounds are banned from the output.

Use case: Limit to early-acquired vocabulary — set Age of Acquisition (Kuperman) max to 5.0.

Parameter    Description
Norm         Any filterable PhonoLex property (e.g., aoa_kuperman, concreteness)
Min / Max    Lower and/or upper bound

Warning

Tight bounds dramatically reduce available vocabulary. AoA ≤ 5 leaves only ~1% of words, which degrades output quality. The system displays a survival warning when vocabulary drops below 20%.
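
The relationship between a bound and vocabulary survival can be sketched as follows. The ratings in `TOY_NORMS` are invented for illustration (they are not real Kuperman values), and `surviving_words` and `survival_ratio` are hypothetical helpers. The sketch also assumes that words with no rating for the chosen norm are treated as out of bounds.

```python
from typing import Dict, Optional, Set

# Invented ratings for illustration only; not real Kuperman AoA data.
TOY_NORMS: Dict[str, Dict[str, float]] = {
    "dog": {"aoa_kuperman": 3.0},
    "ball": {"aoa_kuperman": 3.5},
    "justice": {"aoa_kuperman": 9.0},
    "theory": {"aoa_kuperman": 11.0},
}


def surviving_words(
    norms: Dict[str, Dict[str, float]],
    norm: str,
    lo: Optional[float] = None,
    hi: Optional[float] = None,
) -> Set[str]:
    """Keep words whose value for `norm` lies within [lo, hi]; either bound may be omitted."""
    kept = set()
    for word, props in norms.items():
        value = props.get(norm)
        if value is None:
            continue  # assumption: unrated words do not survive a bound
        if lo is not None and value < lo:
            continue
        if hi is not None and value > hi:
            continue
        kept.add(word)
    return kept


def survival_ratio(norms: Dict[str, Dict[str, float]], kept: Set[str]) -> float:
    """Share of the vocabulary that remains after applying the bound."""
    return len(kept) / len(norms) if norms else 0.0


kept = surviving_words(TOY_NORMS, "aoa_kuperman", hi=5.0)
print(sorted(kept), survival_ratio(TOY_NORMS, kept))  # ['ball', 'dog'] 0.5
```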

Bound Boost (soft constraint)

Boost words within the specified norm ranges toward a target coverage rate, without banning words outside the range. Gentler than a hard bound.

Use case: Encourage concrete words — boost concreteness ≥ 3.0 at 30% coverage.

Parameter          Description
Norm               Any filterable PhonoLex property
Min / Max          Lower and/or upper bound for the target set
Coverage target    Desired percentage of words from the target set (default: 20%)
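
The difference between a hard bound and a bound boost can be illustrated with a score-adjustment sketch. This is an assumption about how steering might work, not a description of the actual decoder: the `steer_scores` function, the `bias` value, and the candidate scores are all hypothetical. Banned words are masked out entirely, while boosted words only receive a bonus and every other word stays available.

```python
import math
from typing import Dict, Iterable


def steer_scores(
    scores: Dict[str, float],
    banned: Iterable[str] = (),
    boosted: Iterable[str] = (),
    bias: float = 2.0,
) -> Dict[str, float]:
    """Hard constraints mask a candidate; soft constraints add a bonus to it."""
    banned_set, boosted_set = set(banned), set(boosted)
    steered = {}
    for word, score in scores.items():
        if word in banned_set:
            steered[word] = -math.inf   # can never be selected
        elif word in boosted_set:
            steered[word] = score + bias  # more likely, but not mandatory
        else:
            steered[word] = score
    return steered


candidates = {"stone": 1.0, "idea": 1.5, "truth": 0.5}
soft = steer_scores(candidates, boosted={"stone"})
print(max(soft, key=soft.get))  # stone
```

With the boost, "stone" overtakes "idea", yet "idea" keeps its original score and could still be chosen elsewhere in the text.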

Contrastive (soft constraint)

Boost words from minimal pair or maximal opposition sets. Draws from PhonoLex's contrastive intervention database.

Use case: Target the /s/↔/z/ voicing contrast — boost minimal pair words.

Parameter        Description
Pair type        minpair (minimal pairs) or maxopp (maximal opposition)
Phoneme 1 / 2    The two phonemes in the contrast
Position         initial, medial, final, or any
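
As a rough illustration of the minpair option, the sketch below finds minimal pairs in a toy lexicon: two words whose transcriptions have the same length and differ in exactly one position, where one word has the first phoneme and the other has the second. The lexicon and both helper functions are hypothetical; PhonoLex draws its pairs from its own database, and maximal opposition is not covered here.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical transcriptions, for illustration only.
TOY_LEXICON: Dict[str, List[str]] = {
    "sip": ["s", "ɪ", "p"],
    "zip": ["z", "ɪ", "p"],
    "bus": ["b", "ʌ", "s"],
    "buzz": ["b", "ʌ", "z"],
    "cat": ["k", "æ", "t"],
}


def contrast_position(a: List[str], b: List[str], p1: str, p2: str) -> Optional[str]:
    """Return where a and b contrast p1 with p2, or None if they are not a minimal pair."""
    if len(a) != len(b):
        return None
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    if {a[i], b[i]} != {p1, p2}:
        return None
    if i == 0:
        return "initial"
    if i == len(a) - 1:
        return "final"
    return "medial"


def minimal_pairs(
    lexicon: Dict[str, List[str]], p1: str, p2: str, position: str = "any"
) -> List[Tuple[str, str]]:
    """List word pairs that contrast p1 and p2, optionally filtered by position."""
    pairs = []
    for w1, w2 in combinations(sorted(lexicon), 2):
        where = contrast_position(lexicon[w1], lexicon[w2], p1, p2)
        if where is not None and position in ("any", where):
            pairs.append((w1, w2))
    return pairs


print(minimal_pairs(TOY_LEXICON, "s", "z"))             # [('bus', 'buzz'), ('sip', 'zip')]
print(minimal_pairs(TOY_LEXICON, "s", "z", "initial"))  # [('sip', 'zip')]
```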

Composing Constraints

Constraints compose freely. You can combine multiple constraints in a single generation:

  • Exclude /ɹ/ + Include /b/ 15% — avoid R sounds while encouraging B sounds
  • Bound AoA ≤ 7 + Exclude /θ,ð/ — simple vocabulary without TH sounds
  • Include /k/ 20% + Contrastive /k/↔/ɡ/ — target velars with minimal pair exposure

The constraint bar shows all active constraints as dismissible chips. Constraints accumulate until cleared.
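
A composed constraint set can be thought of as a plain list of constraint records. The sketch below is one hypothetical representation: the field names (`type`, `phonemes`, `target_rate`, `norm`, `max`) are assumptions for illustration and may not match what the interface or API actually uses. It expresses the first two combinations above and separates hard constraints from soft ones.

```python
from typing import Any, Dict, List, Tuple

# Exclude and bound are the hard constraint types; everything else is soft.
HARD_TYPES = {"exclude", "bound"}

Constraint = Dict[str, Any]

# Hypothetical field names; shown only to illustrate composition.
constraints: List[Constraint] = [
    {"type": "exclude", "phonemes": ["ɹ"]},
    {"type": "include", "phonemes": ["b"], "target_rate": 0.15},
    {"type": "bound", "norm": "aoa_kuperman", "max": 7.0},
    {"type": "exclude", "phonemes": ["θ", "ð"]},
]


def split_constraints(items: List[Constraint]) -> Tuple[List[Constraint], List[Constraint]]:
    """Separate hard constraints (which ban words) from soft ones (which boost words)."""
    hard = [c for c in items if c["type"] in HARD_TYPES]
    soft = [c for c in items if c["type"] not in HARD_TYPES]
    return hard, soft


hard, soft = split_constraints(constraints)
print(len(hard), len(soft))  # 3 1
```

Under this view, hard constraints combine by taking the union of their banned word sets, so each added hard constraint can only shrink the available vocabulary.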

Understanding the Output

Each generated result includes:

  • Compliance status — whether the output passes all hard constraints (exclude, bound)
  • Violation details — which words violated which constraints (if any)
  • Boost coverage — actual vs. target coverage for each soft constraint
  • Warnings — alerts when vocabulary survival is low

Toggle Analysis mode on any output card to see per-word compliance highlighting:

  • Red background — word violates a hard constraint
  • Blue underline — word matches a boost target (include, contrastive)
  • Click any word to open its full PhonoLex profile
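
The per-word analysis can be approximated with a short sketch. The `analyze` function and the shape of its result are hypothetical and only mirror the fields described above (compliance status, violations, boost coverage, and a label per word). The sketch assumes the banned and boosted word sets have already been resolved, and it uses naive whitespace tokenization.

```python
from typing import Any, Dict, Set


def analyze(text: str, banned: Set[str], boosted: Set[str]) -> Dict[str, Any]:
    """Label each word as a violation, a boost match, or ok, and summarize the result."""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    labels = []
    for w in words:
        if w in banned:
            labels.append((w, "violation"))  # would get a red background
        elif w in boosted:
            labels.append((w, "boost"))      # would get a blue underline
        else:
            labels.append((w, "ok"))
    violations = [w for w, label in labels if label == "violation"]
    boost_hits = sum(1 for _, label in labels if label == "boost")
    return {
        "compliant": not violations,
        "violations": violations,
        "boost_coverage": boost_hits / len(words) if words else 0.0,
        "labels": labels,
    }


report = analyze("The cat sat.", banned={"red"}, boosted={"cat"})
print(report["compliant"], report["violations"])  # True []
```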

Vocabulary Survival

When hard constraints (exclude, bound) reduce the available vocabulary, the system adapts:

Survival    Max tokens    Quality
> 20%       128           Normal: multi-sentence paragraphs
5–20%       80            Shortened output, may need retries
< 5%        48            Significantly degraded, warning displayed

The survival ratio is reported in the SSE status stream during generation.
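
The survival tiers above translate directly into a lookup. The function name `max_tokens_for` is hypothetical, and the handling of the exact boundary values (5% and 20% both fall in the middle tier) is an assumption, since the table does not say which tier they belong to.

```python
def max_tokens_for(survival: float) -> int:
    """Map a vocabulary survival ratio (0.0 to 1.0) to a token budget."""
    if not 0.0 <= survival <= 1.0:
        raise ValueError("survival must be a ratio between 0.0 and 1.0")
    if survival > 0.20:
        return 128  # normal, multi-sentence paragraphs
    if survival >= 0.05:
        return 80   # shortened output, may need retries
    return 48       # significantly degraded, warning displayed


print(max_tokens_for(0.50))  # 128
print(max_tokens_for(0.10))  # 80
print(max_tokens_for(0.01))  # 48
```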

Tips for Best Results

  1. Start with one constraint and verify the output before adding more
  2. Prefer soft constraints (include, bound_boost) over hard ones (bound) when possible — they preserve text quality
  3. Exclude constraints work best for high-frequency phonemes (/ɹ/, /s/, /l/), where roughly 50% of the vocabulary still survives
  4. Tight AoA bounds (AoA ≤ 5) are extremely restrictive — consider using bound_boost instead
  5. Check the compliance panel — toggle Analysis mode to verify constraint satisfaction per word