Include/Coverage Redesign + VocabBoost

2026-03-17 — Design spec for principled phoneme inclusion, vocabulary boosting, and session-level coverage tracking


Problem

The current IncludeConstraint and CoverageConstraint are unprincipled:

  1. Flat boost. Every token containing a target phoneme gets the same +strength logit bias regardless of how phoneme-rich the token is. A token with /k/ as one of six phonemes gets the same boost as a token that's mostly /k/.
  2. Three separate classes (IncludeConstraint + CoverageConstraint + _CoverageProjection) for what is conceptually one mechanism with two modes.
  3. Inconsistent phono access. include.py does raw dict access (feats.get("phono")) while every other constraint uses _parse_phono() from constraints.py.
  4. Per-generation coverage only. Each turn's coverage tracking starts from zero. The clinician has no way to target phoneme density across a therapy session.
  5. No vocabulary boosting. VocabOnly hard-gates to a word list. There is no soft alternative for encouraging target vocabulary without eliminating everything else.

Additionally, the audit identified several governor-wide gaps:

  • Density constraint is dead code (deprecated, no schema, unreachable from the API)
  • MSHStage is not checked in _check_compliance
  • MaxOppositionBoost validation errors propagate as a 500
  • Only 7 of 25+ norms are reachable from the command-language parser
  • HFGovernorProcessor.reset() is not called between turns

Solution

1. Unified Boost-with-Coverage Pattern

A shared two-mode pattern used by both IncludeConstraint and VocabBoostConstraint:

Static mode — user specifies strength, boost applied every step:

boost(token) = weight(token) * strength

Coverage mode — user specifies target_rate, boost scales with the gap:

boost(token) = weight(token) * max_boost * gap(ctx)
gap(ctx) = max(0, target_rate - current_rate)

When coverage is met or exceeded, boost drops to zero. The mechanism self-regulates.

Mode selection is implicit from parameters:

  • target_rate provided → coverage mode (stateful projection, mechanism_kind = "projection")
  • target_rate absent → static mode (LogitBoost, mechanism_kind = "boost")

strength and target_rate are mutually exclusive: when target_rate is provided, strength is ignored and max_boost controls the dynamic scaling. Pydantic validators enforce this — if both strength (non-default) and target_rate are set, a ValueError is raised.

One class, one build path per constraint type. The stateful tracking lives inside a shared _CoverageMechanism that wraps the weight tensor and handles gap computation.

Deliberate asymmetry: weighting vs. counting. The boost magnitude uses the weight function (density for Include, binary for VocabBoost), so phoneme-rich tokens get stronger encouragement. But coverage tracking counts tokens as binary hit/miss — a token either contains a target phoneme or it doesn't. This is intentional: coverage answers "how many tokens practiced the sound" (binary), while density controls "how strongly to prefer phoneme-rich tokens among those that do" (weighted).
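A toy illustration of the asymmetry, using the /k/ examples from this spec (the variable and helper names are illustrative, not from the codebase):

```python
# Boosting is weighted by phoneme density; coverage counting is binary.
tokens = {"cat": ["k", "æ", "t"], "string": ["s", "t", "ɹ", "ɪ", "ŋ"]}
targets = {"k"}
max_boost, gap = 3.0, 0.2  # assume coverage sits 20 points under target

def density(phonemes):
    return sum(p in targets for p in phonemes) / len(phonemes)

# Weighted boost: phoneme-rich tokens get proportionally more encouragement.
boosts = {w: density(ph) * max_boost * gap for w, ph in tokens.items()}

# Binary coverage: a token either practices the sound or it doesn't.
hits = {w: int(any(p in targets for p in ph)) for w, ph in tokens.items()}
```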

2. IncludeConstraint — Phoneme Density Weighting

Replace the flat +strength with normalized phoneme density:

weight(token) = count(p in token_phonemes where p ∈ target_phonemes) / len(token_phonemes)

Occurrences are counted with multiplicity — each repeated target phoneme counts once per occurrence, as the 2/4 example below shows.

Examples targeting /k/:

  Token phonemes       Density       Rationale
  [k, æ, t]            1/3 = 0.33    One target phoneme, three total
  [k, ɹ, æ, k]         2/4 = 0.50    Two /k/ occurrences, four total
  [k, ə]               1/2 = 0.50    Short token, high target density
  [s, t, ɹ, ɪ, ŋ]      0/5 = 0.00    No target phonemes, no boost

Multi-phoneme targets work naturally: /include k t targeting {k, t} against token [k, æ, t] → 2/3 = 0.67.

Tokens with no phonemes in the lookup get weight 0 — they pass through unboosted. Phono access uses _parse_phono() from constraints.py.
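A minimal sketch of the density computation (the function name is illustrative; the real implementation runs over the vocab-wide phoneme lookup at build time):

```python
def density_weight(token_phonemes: list[str], targets: set[str]) -> float:
    """Fraction of a token's phonemes that belong to the target set.

    Occurrences count with multiplicity; tokens with no phoneme data
    get weight 0 and pass through unboosted.
    """
    if not token_phonemes:
        return 0.0
    hits = sum(1 for p in token_phonemes if p in targets)
    return hits / len(token_phonemes)
```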

Parameters:

  • phonemes: set[str] — IPA phonemes to target
  • strength: float = 2.0 — static-mode boost (mutually exclusive with target_rate)
  • target_rate: float | None = None — coverage-mode target (0.0–1.0)
  • max_boost: float = 3.0 — maximum boost in coverage mode

Content tokens for coverage: All tokens with phonological data (non-empty phoneme lists in the lookup). Target tokens are those whose phonemes intersect the target set. This parallels VocabBoost where content = tokens with lookup data, target = tokens in the target vocabulary.

Clinical purpose: Phoneme elicitation — maximize practice opportunities for target sounds. The density weighting ensures tokens rich in the target phoneme are preferred over tokens where the target is incidental.

Future extension: Position-aware filtering (onset, coda, nucleus) can layer on top by filtering token_phonemes to only the relevant syllable positions before computing density. The syllable data is already in the lookup (PhonoFeatures.syllables with onset, nucleus, coda lists). The scoring function doesn't change — only the input. This is deferred from this spec to assess coherence impact separately.

3. VocabBoostConstraint — Soft Vocabulary Targeting

A new constraint type parallel to IncludeConstraint, using the same dual-mode pattern with binary membership weighting:

weight(token) = 1.0 if token_word ∈ target_vocab else 0.0

target_vocab is built from two optional sources (same as VocabOnly):

  • lists: list[str] | None — named vocabulary sets from vocab_memberships in the lookup
  • words: list[str] | None — explicit word strings, resolved to token IDs via the tokenizer

Single-token resolution only for words. If a word doesn't map to a single token, it is not boosted. This matches VocabOnly behavior and avoids fuzzy subword boosting.

At least one of lists or words is required. If words is provided, the tokenizer must be present in the build kwargs; build() raises ValueError if missing.
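The single-token resolution can be sketched as follows (the tokenizer is assumed to expose a HuggingFace-style encode with add_special_tokens; the function name is illustrative):

```python
def resolve_single_token_words(words: list[str], tokenizer) -> set[int]:
    """Map each word to a token id only when it encodes to exactly one token.

    Multi-token words are silently skipped, matching VocabOnly behavior
    and avoiding fuzzy subword boosting.
    """
    ids: set[int] = set()
    for w in words:
        toks = tokenizer.encode(w, add_special_tokens=False)
        if len(toks) == 1:
            ids.add(toks[0])
    return ids
```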

Parameters:

  • lists: list[str] | None — named vocab sets
  • words: list[str] | None — explicit target words
  • strength: float = 2.0 — static mode (mutually exclusive with target_rate)
  • target_rate: float | None = None — coverage-mode target (0.0–1.0)
  • max_boost: float = 3.0 — maximum boost in coverage mode
  • include_punctuation: bool = True — when True, punctuation tokens are exempt from content_ids in coverage mode

Content tokens for coverage: Any token in the lookup with data (a real word), excluding punctuation tokens when include_punctuation is True. Punctuation tokens are identified via the tokenizer (same set as VocabOnly). They pass through unboosted and do not affect the coverage rate. Target tokens are those in the target vocabulary.

Relationship to VocabOnly: VocabOnly (hard gate) remains for strict restriction. VocabBoost is the soft alternative. They compose — gate on a broad list, boost a narrow one within it.

Command syntax:

  • /vocab-boost <list_name> — boost a named list (static)
  • /vocab-boost <word> <word> ... — boost specific words (static)
  • /vocab-boost <list_or_words> N% — coverage mode
  • /remove vocab-boost — remove

4. Session-Level Coverage Tracking

Coverage mode tracks phoneme/vocabulary hit rates across the entire chat session, not just the current generation.

CoverageTracker — a lightweight counter that lives alongside the session:

CoverageTracker:
  key: str              # deterministic coverage key (see below)
  content_count: int    # total content tokens across completed turns
  target_count: int     # total target tokens across completed turns

Coverage key computation. The key is a deterministic string derived from the constraint's identity fields:

  • IncludeConstraint: "include:{sorted_phonemes}" — e.g., "include:k" or "include:k,t"
  • VocabBoostConstraint: "vocab_boost:{sorted_lists}:{sorted_words}" — e.g., "vocab_boost:ogden_basic:" or "vocab_boost::cat,dog,fish"

The same algorithm is used in two places: (1) the constraint's build() method, which passes the key to _CoverageMechanism; and (2) the route handler, which uses the key to look up/create trackers. Both derive the key from the schema constraint's fields, ensuring they always match. A shared utility function coverage_key_for(constraint) in governor.py computes this from the schema constraint.
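A sketch of that shared utility, following the key formats above (the constraint object is assumed to expose the fields from Section 6's schemas):

```python
def coverage_key_for(constraint) -> str:
    """Deterministic coverage key from a schema constraint's identity fields."""
    if constraint.type == "include":
        return "include:" + ",".join(sorted(constraint.phonemes))
    if constraint.type == "vocab_boost":
        lists = ",".join(sorted(constraint.lists or []))
        words = ",".join(sorted(constraint.words or []))
        return f"vocab_boost:{lists}:{words}"
    raise ValueError(f"no coverage key for constraint type {constraint.type!r}")
```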

Lifecycle:

  1. When a coverage constraint is in the active set at generation time, the route handler looks up or creates a tracker keyed by the constraint identity.
  2. Prior counts are passed to the governor mechanism via a new field on GovernorContext: prior_coverage: dict[str, tuple[int, int]] | None — key → (content_count, target_count).
  3. After generation completes, the route handler counts content/target tokens in the output and updates the tracker.
  4. When the constraint is removed from the store, the tracker is dropped.
  5. Re-adding starts fresh from zero.
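The tracker and its per-turn bookkeeping can be sketched as a dataclass; the update and priors helpers are illustrative additions, not part of the spec's struct:

```python
from dataclasses import dataclass

@dataclass
class CoverageTracker:
    key: str                # deterministic coverage key
    content_count: int = 0  # total content tokens across completed turns
    target_count: int = 0   # total target tokens across completed turns

    def update(self, content: int, target: int) -> None:
        """Fold one completed turn's counts into the session totals."""
        self.content_count += content
        self.target_count += target

    def priors(self) -> tuple[int, int]:
        """The (content_count, target_count) pair passed to GovernorContext."""
        return (self.content_count, self.target_count)
```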

GovernorContext addition:

@dataclass
class GovernorContext:
    step: int = 0
    total_steps: int = 1
    token_ids: torch.Tensor | None = None
    mask_positions: torch.Tensor | None = None
    device: str | torch.device = "cpu"
    prior_coverage: dict[str, tuple[int, int]] | None = None  # NEW

What counts toward coverage: Only assistant tokens from completed turns. User tokens are excluded — the clinician isn't the one practicing. The counting uses the same content_ids / target_ids sets that the mechanism builds at governor construction time. The mechanism exposes a count_tokens(token_ids: list[int]) -> tuple[int, int] method that the route handler calls post-generation to update the tracker.

Gap computation with priors:

def _compute_gap(self, ctx):
    prior_content, prior_target = 0, 0
    if ctx.prior_coverage and self.coverage_key in ctx.prior_coverage:
        prior_content, prior_target = ctx.prior_coverage[self.coverage_key]

    # Count current generation tokens
    current_content, current_target = self._count_current(ctx.token_ids)

    total_content = prior_content + current_content
    total_target = prior_target + current_target

    if total_content == 0:
        return self.target_rate  # full gap, no data yet

    current_rate = total_target / total_content
    return max(0.0, self.target_rate - current_rate)

Injecting priors into the HF adapter. HFGovernorProcessor.reset() gains an optional prior_coverage parameter:

def reset(self, total_steps: int | None = None,
          prior_coverage: dict[str, tuple[int, int]] | None = None):
    self._step = 0
    self._prior_coverage = prior_coverage
    if total_steps is not None:
        self.total_steps = total_steps

The processor stores prior_coverage and includes it in every GovernorContext it constructs during that generation run. The route handler calls reset(prior_coverage=priors) before each generation.

Accessing count_tokens post-generation. The Governor exposes a get_coverage_mechanisms() -> dict[str, _CoverageMechanism] method that returns coverage mechanisms keyed by their coverage_key. HFGovernorProcessor delegates this through as get_coverage_counters(). The route handler calls this post-generation to get the count_tokens() method for each active coverage constraint.

Cache impact: Only one chat session is active at a time, so the GovernorCache can continue to cache built governors by constraint hash. The per-turn prior counts are injected via reset(), not baked into the build.

Edge case — early tokens: When total content count is 0 or very small (start of session, start of first turn), current_rate is unstable. The mechanism returns target_rate as the gap (full boost), which is the correct behavior — at the start, full encouragement is appropriate.

Sessionless generation (/generate-single): The /generate-single endpoint operates without a session. Coverage mode falls back to per-generation tracking with zero priors — the same behavior as today's _CoverageProjection. Session-level accumulation requires the /generate endpoint with a session.

5. Mechanism Implementation

Shared _CoverageMechanism class implements the Mechanism protocol and handles the stateful coverage mode for both IncludeConstraint and VocabBoostConstraint:

class _CoverageMechanism:
    def __init__(self, weights, target_ids, content_ids, target_rate, max_boost, vocab_size, coverage_key):
        self.weights = weights          # (vocab_size,) — density or binary
        self.target_ids = target_ids
        self.content_ids = content_ids
        self.target_rate = target_rate
        self.max_boost = max_boost
        self.vocab_size = vocab_size
        self.coverage_key = coverage_key
    def apply(self, logits, ctx):
        gap = self._compute_gap(ctx)
        if gap <= 0.0:
            return logits
        return logits + self.weights.to(logits.device) * (self.max_boost * gap)

    def count_tokens(self, token_ids: list[int]) -> tuple[int, int]:
        """Count content and target tokens for post-generation tracker update."""
        content = sum(1 for t in token_ids if t in self.content_ids)
        target = sum(1 for t in token_ids if t in self.target_ids)
        return content, target

Static mode uses LogitBoost directly (no _CoverageMechanism). The weight tensor is precomputed at build time:

# Static: bias[tid] = density(tid) * strength (Include)
# Static: bias[tid] = 1.0 * strength if in vocab else 0.0 (VocabBoost)
return LogitBoost(bias)
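The build-time precomputation, sketched with a plain list standing in for the (vocab_size,) torch tensor (the function name is illustrative):

```python
def build_static_bias(weights: dict[int, float], strength: float,
                      vocab_size: int) -> list[float]:
    """Precompute the static-mode bias at build time.

    `weights` maps token id -> density (Include) or 1.0 (VocabBoost);
    every other token gets zero bias and passes through unboosted.
    """
    bias = [0.0] * vocab_size
    for tid, w in weights.items():
        bias[tid] = w * strength
    return bias
```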

6. Schema and API Changes

Remove CoverageConstraint as a separate schema type.

Modify IncludeConstraint schema to accept optional target_rate:

class IncludeConstraint(BaseModel):
    type: Literal["include"] = "include"
    phonemes: list[str]
    strength: float = 2.0
    target_rate: float | None = None   # NEW — triggers coverage mode
    max_boost: float = 3.0             # NEW — used in coverage mode

Add VocabBoostConstraint schema:

class VocabBoostConstraint(BaseModel):
    type: Literal["vocab_boost"] = "vocab_boost"
    lists: list[str] | None = None
    words: list[str] | None = None
    strength: float = 2.0
    target_rate: float | None = None
    max_boost: float = 3.0
    include_punctuation: bool = True

Update the Constraint discriminated union to include VocabBoostConstraint and remove the separate CoverageConstraint.

Update _to_dg_constraint in governor.py to handle the unified IncludeConstraint (with optional target_rate) and the new VocabBoostConstraint.

7. Frontend Command Changes

IncludeConstraint — no syntax change. /include k and /include k 20% work as before. The existing "coverage" StoreEntry variant is removed; the "include" variant gains an optional targetRate field:

// Before: two entry types
type IncludeEntry = { type: "include"; phoneme: string; strength: number };
type CoverageEntry = { type: "coverage"; phoneme: string; targetRate: number };

// After: one unified entry type
type IncludeEntry = { type: "include"; phoneme: string; strength: number; targetRate?: number };

The compiler emits a single IncludeConstraint with or without target_rate. Naming convention: StoreEntry uses camelCase (targetRate), API constraint uses snake_case (target_rate). The compiler handles the conversion.

target_rate conversion: The parser stores percentages as user-entered values (e.g., 20 from /include k 20%). The compiler divides by 100 before sending to the API (e.g., target_rate: 0.20). The API schema validates 0.0 <= target_rate <= 1.0 via a @field_validator.

VocabBoostConstraint — new command:

  • /vocab-boost <list_name> — boost a named list
  • /vocab-boost <word> <word> ... — boost specific words
  • /vocab-boost <list_or_words> N% — coverage mode

The parser needs a new parseVocabBoost function; the compiler needs a new VocabBoostEntry → VocabBoostConstraint path; the ConstraintBar needs a new chip color for vocab-boost entries.

StoreEntry union gains a vocab_boost variant:

type VocabBoostEntry = {
  type: "vocab_boost";
  lists?: string[];
  words?: string[];
  targetRate?: number;
};

8. Cleanup and Consistency Fixes

These are included in this redesign scope:

8a. Remove Density constraint. Delete from constraints.py and __init__.py. Dead code — deprecated, no schema, unreachable from API.

8b. Add MSH to _check_compliance. Check MSHStage in model.py post-hoc compliance. Tokens without phono data still pass through the governor (by design), but tokens with phono data that exceed the stage limit surface as violations.

8c. MaxOppositionBoost validation. Catch ValueError from the sonorant/obstruent class check in _to_dg_constraint and return HTTP 422 with a clear message instead of an unhandled 500.

8d. Expand norm allowlist. Add to the parser's NORM_COMMANDS: dominance, socialness, boi, iconicity, semantic_diversity, contextual_diversity, lexical_decision_rt, and sensorimotor dimensions. These are all present in the lookup already. Each needs a norm key, default direction, and help text entry.

8e. HFGovernorProcessor.reset() between turns. The route handler calls reset(prior_coverage=priors) before each generation to restart the step counter and inject session-level coverage priors (see Section 4).

8f. Unify phono access in include.py. Use _parse_phono() from constraints.py instead of raw dict access. This aligns Include with every other constraint's access pattern.


Files Affected

Engine (packages/governors/src/diffusion_governors/)

  • include.py — rewrite: unified IncludeConstraint with density weighting and dual mode; new VocabBoostConstraint; shared _CoverageMechanism; delete _CoverageProjection and old CoverageConstraint
  • core.py — add prior_coverage field to GovernorContext
  • constraints.py — delete Density; export _parse_phono for use by include.py (or move to shared util)
  • boosts.py — no changes
  • gates.py — no changes
  • cdd.py — no changes
  • lookups.py — no changes
  • __init__.py — update exports: remove Density, CoverageConstraint; add VocabBoostConstraint

Dashboard server (packages/dashboard/server/)

  • schemas.py — remove CoverageConstraint schema; add target_rate/max_boost to IncludeConstraint; add VocabBoostConstraint schema; update Constraint union
  • governor.py — update _to_dg_constraint for unified Include and new VocabBoost; catch MaxOppositionBoost ValueError as 422; call reset() on processor between turns
  • model.py — add MSH to _check_compliance
  • sessions.py — add CoverageTracker storage alongside sessions (or as a field on Session)
  • routes/generate.py — wire coverage tracker lifecycle: look up priors, pass to context, update after generation

Dashboard frontend (packages/dashboard/frontend/src/)

  • types.ts — remove CoverageEntry; add targetRate/maxBoost to IncludeEntry; add VocabBoostEntry to StoreEntry union; add VocabBoostConstraint to Constraint union
  • commands/parser.ts — update parseInclude to emit single entry type; add parseVocabBoost; expand NORM_COMMANDS with new norms
  • commands/registry.ts — add vocab-boost verb definition, help text; add new norm entries
  • commands/compiler.ts — update Include compilation (no more separate Coverage); add VocabBoost compilation
  • store/constraintStore.ts — minor: handle new entry types
  • components/ConstraintBar/ — new chip color for vocab-boost

Tests

  • Engine tests for density weighting, coverage gap computation with priors, VocabBoost static/coverage modes
  • Integration tests for session-level coverage tracking across multiple turns
  • Parser tests for /vocab-boost command variants
  • Compiler tests for unified Include output

What This Does NOT Change

  • Gate constraints (Exclude, ExcludeInClusters, Complexity, MSHStage, Bound, NormCovered, VocabOnly) — untouched.
  • Boost constraints (MinPairBoost, MaxOppositionBoost) — untouched except the 422 fix.
  • CDDProjection — untouched. Bound(mechanism="cdd") still works (unreachable from commands, but the engine path is preserved).
  • Lookup format — no changes to build_lookup.py or the lookup JSON structure.
  • Governor composition order — gates → boosts → projections, unchanged.
  • Position-aware Include — deferred. The syllable data is in the lookup; filtering by onset/coda/nucleus can be added later without changing the scoring function.