v5.2.1 handoff: Sentences as a first-class clinical surface
Branch: polish/v5.2.1-ui-copy → develop → main
Predecessor: the v5.1 handoff covered the CSP solver + Qwen3-Embedding reranker, both of which were retired in v5.2.0 (PR #113). See memory/project_csp_and_reranker_deprecated.md for the post-mortem; that work is snapshotted at tag archive/csp-generation-v5.2.
What v5.2.1 ships
Sentences becomes a first-class clinical surface — same constraint vocabulary as Word Lists / Contrast Sets / Lookup, plus a per-result word-highlight overlay that makes triage workable. The 50+ commits between develop and the polish branch cluster into four buckets:
1. Sentences route now actually filters
- Contrastive constraints wired: minpair / maxopp / multopp self-joins through the `pairs` table. Previously the route silently dropped these at `constraintsToBody`, meaning the UI's contrastive controls did nothing for as long as the D1-only refactor had been live. See `memory/feedback_silent_drops_are_regressions.md`.
- `CONTAINS_MEDIAL` enforces edge exclusion in SQL: no more word-edge matches under medial-only.
- NULL-fail for percentile bounds: content surfaces with NULL percentile (zero evidence in the relevant frequency band) fail the bound. Raw norm bounds keep NULL-pass.
- Tiered ranking: global by per-query `match_count` DESC (multi-hit sentences before single-hit, regardless of source), source-interleaved within tier by `rarity_score`.
- Per-result highlight payload: `{include_surfaces, pair_surfaces}` per match; CorpusRow renders via WordHighlighter compliance mode.
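
The tiered ranking described above can be sketched roughly as follows. This is an illustration in Python, not the worker's actual code: the `source` field name, the round-robin interleave, and the choice that a higher `rarity_score` ranks first within a source are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import zip_longest


def rank_sentences(rows):
    """Order rows by match_count tier, interleaving sources within each tier."""
    tiers = defaultdict(list)
    for row in rows:
        tiers[row["match_count"]].append(row)

    ranked = []
    for match_count in sorted(tiers, reverse=True):  # multi-hit tiers first
        by_source = defaultdict(list)
        for row in tiers[match_count]:
            by_source[row["source"]].append(row)
        # Assumption: within a source, higher rarity_score comes first.
        queues = [
            sorted(group, key=lambda r: r["rarity_score"], reverse=True)
            for _, group in sorted(by_source.items())
        ]
        # Round-robin: take the best remaining row from each source in turn.
        for layer in zip_longest(*queues):
            ranked.extend(row for row in layer if row is not None)
    return ranked


# Hypothetical rows; only match_count and rarity_score are named in the notes.
rows = [
    {"id": 1, "source": "a", "match_count": 1, "rarity_score": 0.9},
    {"id": 2, "source": "a", "match_count": 2, "rarity_score": 0.2},
    {"id": 3, "source": "b", "match_count": 2, "rarity_score": 0.5},
    {"id": 4, "source": "a", "match_count": 2, "rarity_score": 0.7},
]
print([r["id"] for r in rank_sentences(rows)])  # [4, 3, 2, 1]
```

Row 1 has the highest `rarity_score` but still lands last, because the `match_count` tier is decided before anything else.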
2. Corpus quality gates
- SLP content-suitability gate (in-house V/A norms + AFINN NEG_3/4/5 buckets): drops violence/conflict content
- PROPN cap of 2 per sentence + English-frequency threshold (5wpm in FineWeb-Edu) substituting for langdetect
- `uh` verbal filler + Spanish loanword denylist + letter-spelled-word (X-Y-Z) rejection
- Parataxis dependency rejection (catches single-ROOT run-on patterns)
- spaCy contraction handling: stem + suffix glued back to whole-word contraction surfaces (`don't`, `won't`, `it's`)
- CHILDES + PhonBank retired (CHAT-transcript artifacts unfit for SLP)
- Cross-source identical-text merge with aggressive normalization
- Leading-quote / leading-dash strip on the persisted `text` column
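
As an illustration of the contraction handling, here is a minimal sketch that works on plain token strings rather than spaCy objects. spaCy's English tokenizer splits `don't` into `do` + `n't` and `it's` into `it` + `'s`; the function name and the clitic test below are assumptions for the example, not the pipeline's actual rule.

```python
def glue_contractions(tokens):
    """Merge clitic tokens ("n't", "'s", "'ll", ...) into the preceding token."""
    merged = []
    for tok in tokens:
        # A clitic is "n't" or an apostrophe followed by at least one letter.
        # A bare quote mark (length 1) is left alone.
        is_clitic = tok.lower() == "n't" or (
            tok.startswith(("'", "\u2019")) and len(tok) > 1
        )
        if is_clitic and merged:
            merged[-1] += tok
        else:
            merged.append(tok)
    return merged


print(glue_contractions(["I", "do", "n't", "think", "it", "'s", "here"]))
# ['I', "don't", 'think', "it's", 'here']
```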
Corpus: 342,976 → 235,866 sentences (-31%), almost entirely from upstream quality gates rather than corpus-source removal.
3. Developmental frequency rework
- `freq_age_adult` → `freq_age_all` (renamed, repointed to FineWeb-Edu frequency)
- `freq_age_2y/5y/8y/12y` now aggregate child PRODUCTION (CHILDES + PhonBank prod channels), not caregiver INPUT
- `naturalness_score` retired end-to-end; `rarity_score` is the persisted ranking signal
- Frequency-class percentile properties treat value=0 as NULL (zero-occurrence words no longer cluster at the 57th percentile)
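
The zero-as-NULL rule can be sketched like this. The mid-rank percentile formula is an assumption chosen for the example; the notes do not say which formula the pipeline uses. Under any tie-aware formula, a large block of zero-valued words shares one percentile, which is the clustering the change removes.

```python
from bisect import bisect_left, bisect_right


def percentiles_zero_as_null(freqs):
    """Map word -> mid-rank percentile among non-zero values; zero -> None."""
    observed = sorted(v for v in freqs.values() if v > 0)
    n = len(observed)
    result = {}
    for word, value in freqs.items():
        if value <= 0 or n == 0:
            result[word] = None  # zero evidence: no percentile at all
            continue
        below = bisect_left(observed, value)
        ties = bisect_right(observed, value) - below
        result[word] = 100.0 * (below + 0.5 * ties) / n
    return result


# Hypothetical frequencies, for illustration only.
freqs = {"the": 900.0, "ball": 40.0, "dog": 40.0, "zyzzyva": 0.0}
for word, pct in percentiles_zero_as_null(freqs).items():
    print(word, pct)
```

Because the zero-occurrence word gets `None` rather than a rank, it then fails any percentile bound under the NULL-fail rule from section 1.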
4. UI polish
- Chip onDelete bug: store gained `removeAt(index)` so chip deletes target exactly the clicked entry (previously partial-match field comparison could collapse sibling constraints)
- WordHighlighter overlay normalised on the result cards (strips trailing punctuation so `"d."` in highlight set matches `"d"` in rendered text)
"d."in highlight set matches"d"in rendered text) - IPA keyboard icon spacing in ContrastiveSection
- Tool descriptions in AppHeader drawer updated for Sentences
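
The highlight normalisation amounts to comparing surfaces after stripping trailing punctuation. The sketch below shows the logic in Python for brevity; the real component lives in the frontend, and the function names and the lowercasing step are assumptions.

```python
import string


def normalize_surface(surface):
    """Lowercase and strip trailing punctuation from a surface form."""
    return surface.lower().rstrip(string.punctuation)


def highlight_flags(rendered_tokens, highlight_surfaces):
    """Return one bool per rendered token: True if it should be highlighted."""
    wanted = {normalize_surface(s) for s in highlight_surfaces}
    wanted.discard("")  # a pure-punctuation surface must not match anything
    return [normalize_surface(tok) in wanted for tok in rendered_tokens]


print(highlight_flags(["Plan", "d", "works", "."], ["d."]))
# [False, True, False, False]
```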
R&D workstreams (where the loop closes)
| Workstream | State | Endgame role |
|---|---|---|
| Audio Detection (PHON-44/53/55) | Active — transcriber model trained | Diagnostic input + progress feedback |
| Curriculum Recommender | Concept | Diagnostic profile → graded sequence of targets, delivered through the live tools. Successor framing for "Content Catalog." |
| Governed Generation | Paused (CSP retired in v5.2.0) | Returns as R&D when curricula need synthetic material the corpus can't supply |
| Adaptive Loop | Concept | Glue closing diagnostic → curriculum → feedback |
Migration / API breaking changes
Consumers of /api/sentences and /api/property-metadata will see:
- `naturalness_score` removed from response; `rarity_score` + `match_count` + `highlights` added
- `freq_age_adult` renamed to `freq_age_all`
- Percentile bounds now exclude ~5-15× more sentences each (NULL-fail semantics)
- `freq_age_*` numeric ranks have shifted (child PROD source vs caregiver INPUT source: different distributions)
- Contrastive constraints actually filter results (previously dropped silently)
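
To make the NULL-fail change concrete, here is a minimal sketch of the two bound semantics. The function name and signature are hypothetical; only the behaviour (percentile bounds reject NULL, raw norm bounds accept it) comes from the notes above.

```python
def passes_bound(value, lo=None, hi=None, null_passes=False):
    """Check lo <= value <= hi; a None value passes only if null_passes is set."""
    if value is None:
        return null_passes
    if lo is not None and value < lo:
        return False
    if hi is not None and value > hi:
        return False
    return True


# Percentile bound (NULL-fail): a word with no evidence is excluded.
print(passes_bound(None, lo=20, hi=80))                    # False
# Raw norm bound (NULL-pass): a missing norm does not exclude the word.
print(passes_bound(None, lo=20, hi=80, null_passes=True))  # True
# Non-NULL values behave identically under both modes.
print(passes_bound(50, lo=20, hi=80))                      # True
```

Consumers that relied on percentile-bounded queries returning words with missing percentiles should expect those results to disappear.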
Verification quick-commands
# Corpus + worker + frontend tests
uv run python -m pytest packages/data/tests/ -q
cd packages/web/workers && npm test
cd ../frontend && npm test && npx tsc --noEmit
# Smoke the route
curl -s -X POST http://localhost:8787/api/sentences \
-H 'Content-Type: application/json' \
-d '{"constraints":[{"type":"contrastive_minpair","phoneme1":"b","phoneme2":"d","position":"initial"}],"top_k":5}'
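
After the smoke call returns, each result can be checked against the migration notes. The sketch below assumes each result is a JSON object that carries the listed fields directly; the response envelope and the sample rows are hypothetical.

```python
REQUIRED_FIELDS = ("rarity_score", "match_count", "highlights")
REMOVED_FIELDS = ("naturalness_score",)


def contract_problems(result):
    """List the ways one result object deviates from the v5.2.1 field contract."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if name not in result]
    problems += [f"stale {name}" for name in REMOVED_FIELDS if name in result]
    return problems


# Hypothetical result rows, for illustration only.
current = {
    "text": "The big dog can dig.",
    "rarity_score": 0.42,
    "match_count": 2,
    "highlights": {"include_surfaces": [], "pair_surfaces": ["big", "dig"]},
}
legacy = {"text": "The big dog can dig.", "naturalness_score": 0.8}

print(contract_problems(current))  # []
print(contract_problems(legacy))
```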
Open items worth thinking about
- Curriculum recommender scope — what the API surface looks like, where it lives in the existing tool grid
- Audio detection integration — how diagnostic profiles flow from the audio workstream into curriculum recommendations
- Inverted index for constraint cardinality: large pre-computed coverage tables (~29M rows) staged in `research/2026-05-21-corpus-filter-audit/` could speed up cardinality preview ("4,000 sentences before AND-cascade"). Deferred until the CTE path shows latency issues at production scale.
- Coverage badges in UI: surface "X sentences match this constraint" cardinality so the AND-cascade math is visible to the clinician before they run the query.
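
One possible shape for the inverted-index idea, as an in-memory sketch: map each constraint to the set of sentence IDs it covers, then report per-constraint counts next to the size of their intersection. The constraint keys and row format are assumptions; the staged coverage tables may be laid out differently.

```python
from collections import defaultdict


def build_index(coverage_rows):
    """Build constraint_key -> set of sentence ids from (key, id) rows."""
    index = defaultdict(set)
    for constraint_key, sentence_id in coverage_rows:
        index[constraint_key].add(sentence_id)
    return index


def cardinality_preview(index, constraint_keys):
    """Per-constraint counts plus the size of the AND-cascade intersection."""
    sets = [index.get(key, set()) for key in constraint_keys]
    per_constraint = {key: len(s) for key, s in zip(constraint_keys, sets)}
    combined = set.intersection(*sets) if sets else set()
    return per_constraint, len(combined)


# Hypothetical coverage rows: (constraint key, sentence id).
coverage = [
    ("initial:b", 1), ("initial:b", 2), ("initial:b", 3),
    ("medial:d", 2), ("medial:d", 3), ("medial:d", 4),
]
index = build_index(coverage)
print(cardinality_preview(index, ["initial:b", "medial:d"]))
# ({'initial:b': 3, 'medial:d': 3}, 2)
```

The same per-constraint counts could feed the coverage badges, so the clinician sees how each added constraint narrows the result set.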