PHON-154 Variant-Aware Matching — Phase 2: Worker Matching
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax.
Goal: Make phoneme-pattern and CV-shape matching span ALL attested pronunciations by querying the variants_str / cv_shapes columns (added in Phase 1) instead of the primary-only phonemes_str / cv_shape.
Architecture: variants_str concatenates every pronunciation in pipe form, with || marking variant boundaries. An include constraint matches a word if ANY variant satisfies it; an exclude constraint removes a word if ANY variant satisfies the excluded pattern. This is the governing conservative rule. Changes are confined to patterns.ts (phoneme SQL) and wordFilter.ts (CV-shape SQL); both are pure SQL-string builders, so tests are unit-level and CI-safe (no dependence on a reseeded D1).
Tech Stack: TypeScript, Hono/Workers, Vitest (cloudflare:test). Files in packages/web/workers/.
Spec: docs/superpowers/specs/2026-06-15-phon-154-variant-aware-matching-design.md. Depends on: Phase 1 (columns variants_str, cv_shapes emitted).
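To make the any-variant rule from the Architecture note concrete, here is a minimal in-memory sketch. It is not the worker code: toVariantsStr, anyVariantContains, and keepWord are hypothetical helpers, the sample word is made up, and the serialized format is an assumption based on the description of variants_str (each variant in pipe form, concatenated so that || appears at every boundary).

```typescript
// Hypothetical in-memory model of the matching rule; the worker itself uses SQL LIKE.
type Variant = string[]; // one pronunciation as a list of phonemes

/** Serialize variants the way the plan describes variants_str (assumed format). */
function toVariantsStr(variants: Variant[]): string {
  // Each variant is written in pipe form (|k|æ|t|); joining them with no
  // separator produces `||` wherever one variant ends and the next begins.
  return variants.map((v) => '|' + v.join('|') + '|').join('');
}

/** True if the phoneme occurs in at least one variant. */
function anyVariantContains(variants: Variant[], phoneme: string): boolean {
  return variants.some((v) => v.includes(phoneme));
}

/** Include keeps the word if any variant matches; exclude drops it if any variant matches. */
function keepWord(variants: Variant[], phoneme: string, mode: 'include' | 'exclude'): boolean {
  const hit = anyVariantContains(variants, phoneme);
  return mode === 'include' ? hit : !hit;
}

// Made-up word with two pronunciations.
const sampleVariants: Variant[] = [
  ['ɛ', 'k', 'ə'],
  ['i', 'k', 'ə'],
];

console.log(toVariantsStr(sampleVariants)); // |ɛ|k|ə||i|k|ə|
console.log(keepWord(sampleVariants, 'i', 'include')); // true: the second variant has it
console.log(keepWord(sampleVariants, 'i', 'exclude')); // false: one matching variant drops the word
```

The exclude branch is the conservative part: a word survives an exclude only when no variant contains the phoneme.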
Out of scope for Phase 2 (explicit follow-on tasks):
- Count-range matching (min/max phoneme/syllable count matching any variant via phoneme_count_min/max, syllable_count_min/max) — flows through partitionFilterColumns in lib/queries.ts, not touched here.
- CONTAINS_MEDIAL variant post-filter — matchesMedialPattern checks the primary phonemes; making it check any variant's medial position needs the routes to pass variant phoneme lists. The SQL pre-filter already spans variants via variants_str; only the medial-position refinement remains primary-only (documented limitation until that follow-on).
- Integration tests asserting a word matched via a non-primary variant — require the reseeded D1 (deferred, folds with PHON-151). Phase 2 ships unit tests on the SQL generation.
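For orientation only, the deferred CONTAINS_MEDIAL follow-on could look roughly like the sketch below. Everything in it is an assumption: matchesMedialInVariant and matchesMedialInAnyVariant are hypothetical names, the plan does not show how matchesMedialPattern defines "medial", and the sketch assumes it means the sequence sits strictly inside a pronunciation (it does not start at the first phoneme and does not end at the last).

```typescript
// Hypothetical sketch of a variant-aware medial check; not part of Phase 2.
// Assumption: "medial" = the sequence neither starts at index 0 nor ends at the last index.
function matchesMedialInVariant(phonemes: string[], seq: string[]): boolean {
  if (seq.length === 0) return false;
  // Valid start indexes leave at least one phoneme on each side of the sequence.
  for (let start = 1; start + seq.length < phonemes.length; start++) {
    if (seq.every((p, i) => phonemes[start + i] === p)) return true;
  }
  return false;
}

/** True if the sequence is medial in at least one variant's phoneme list. */
function matchesMedialInAnyVariant(variantLists: string[][], seq: string[]): boolean {
  return variantLists.some((v) => matchesMedialInVariant(v, seq));
}

// Made-up data: "t" is medial in the first list, initial in the second.
console.log(matchesMedialInAnyVariant([['b', 'ʌ', 't', 'ɚ'], ['t', 'ɑ', 'p']], ['t'])); // true
console.log(matchesMedialInAnyVariant([['t', 'ɑ', 'p']], ['t'])); // false
```

Until something like this exists, the SQL pre-filter spans variants but the medial refinement still looks at the primary pronunciation only, as the bullet above states.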
Reference: current behavior (read before editing)
- packages/web/workers/src/lib/patterns.ts: buildMatchClause emits phonemes_str LIKE / initial_phoneme = / final_phoneme =; buildPatternClauses negates for exclude via .replace('LIKE','NOT LIKE') and IS-NULL branches.
- packages/web/workers/src/lib/wordFilter.ts: prefixWordsColumns aliases bare columns to w.*; the body.cv_shape block emits w.cv_shape IN (...) / NOT IN.
- || boundary facts: a variant starts at string-start (|seq|...) or right after || (...||seq|...); ends at string-end (...|seq|) or right before || (...|seq||...).
- %|shape|% on cv_shapes matches an exact shape segment because the trailing | anchors it (so CVC does not match within |CVCV|).
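The boundary facts can be checked without a database using a small LIKE emulation. This is a sketch under stated assumptions: like is a hypothetical helper that only understands %, and the sample strings are made-up illustrations of the variants_str and cv_shapes formats described above.

```typescript
/** Minimal LIKE emulation: `%` matches any run of characters, everything else is literal.
 *  `_` is treated as literal here because these patterns never use it; real SQL LIKE
 *  treats it as a single-character wildcard, and SQLite's LIKE is case-insensitive
 *  for ASCII letters by default. */
function like(value: string, pattern: string): boolean {
  const source = pattern
    .split('%')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')) // escape regex metacharacters
    .join('[\\s\\S]*');
  return new RegExp('^' + source + '$').test(value);
}

// Made-up two-variant word: one pronunciation with an initial h, one without.
const sampleVariantsStr = '|h|ɝ|b||ɝ|b|';

console.log(like(sampleVariantsStr, '|ɝ|%'));     // false: the first variant starts with h
console.log(like(sampleVariantsStr, '%||ɝ|%'));   // true: a later variant starts with ɝ
console.log(like(sampleVariantsStr, '%|b|'));     // true: the last variant ends with b
console.log(like(sampleVariantsStr, '%|b||%'));   // true: an earlier variant ends with b
console.log(like(sampleVariantsStr, '%|b|ɝ|%'));  // false: a sequence cannot straddle `||`

// The pipes on both sides of the shape keep CVC from matching inside CVCV.
console.log(like('|CVCV|', '%|CVC|%'));      // false
console.log(like('|CVC|CVCV|', '%|CVC|%'));  // true
```

The fifth check is the reason || works as a boundary: a multi-phoneme pattern uses single pipes between phonemes, so it can never match across the doubled pipe.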
Task 1: Phoneme patterns match across variants
Files:
- Modify: packages/web/workers/src/lib/patterns.ts (buildMatchClause ~lines 33-66, buildPatternClauses ~lines 68-103)
- Test: packages/web/workers/src/lib/patterns.test.ts (create if absent; else add to it)
- [ ] Step 1: Write the failing tests
Create/append packages/web/workers/src/lib/patterns.test.ts:
import { describe, it, expect } from 'vitest';
import { buildPatternClauses } from './patterns';

describe('buildPatternClauses — variant-aware matching', () => {
  it('CONTAINS matches any variant via variants_str', () => {
    const r = buildPatternClauses([{ type: 'CONTAINS', phoneme: 'ɛ' }]);
    expect(r.conditions).toEqual(['variants_str LIKE ?']);
    expect(r.params).toEqual(['%|ɛ|%']);
  });

  it('STARTS_WITH matches first OR any later variant', () => {
    const r = buildPatternClauses([{ type: 'STARTS_WITH', phoneme: 'h' }]);
    expect(r.conditions).toEqual(['(variants_str LIKE ? OR variants_str LIKE ?)']);
    expect(r.params).toEqual(['|h|%', '%||h|%']);
  });

  it('ENDS_WITH matches last OR any earlier variant', () => {
    const r = buildPatternClauses([{ type: 'ENDS_WITH', phoneme: 'oʊ' }]);
    expect(r.conditions).toEqual(['(variants_str LIKE ? OR variants_str LIKE ?)']);
    expect(r.params).toEqual(['%|oʊ|', '%|oʊ||%']);
  });

  it('multi-phoneme CONTAINS joins with pipes', () => {
    const r = buildPatternClauses([{ type: 'CONTAINS', phoneme: 'k æ t' }]);
    expect(r.params).toEqual(['%|k|æ|t|%']);
  });

  it('exclude CONTAINS keeps words where NO variant has the phoneme', () => {
    const r = buildPatternClauses([{ type: 'CONTAINS', phoneme: 'ŋ', mode: 'exclude' }]);
    expect(r.conditions).toEqual(['variants_str NOT LIKE ?']);
    expect(r.params).toEqual(['%|ŋ|%']);
  });

  it('exclude STARTS_WITH negates with AND (De Morgan), not OR', () => {
    const r = buildPatternClauses([{ type: 'STARTS_WITH', phoneme: 'h', mode: 'exclude' }]);
    expect(r.conditions).toEqual(['(variants_str NOT LIKE ? AND variants_str NOT LIKE ?)']);
    expect(r.params).toEqual(['|h|%', '%||h|%']);
  });

  it('flags CONTAINS_MEDIAL for post-filter while still emitting variant SQL', () => {
    const r = buildPatternClauses([{ type: 'CONTAINS_MEDIAL', phoneme: 't' }]);
    expect(r.conditions).toEqual(['variants_str LIKE ?']);
    expect(r.needsMedialPostFilter).toBe(true);
    expect(r.medialSequences).toEqual([{ seq: ['t'], mode: 'include' }]);
  });
});
- [ ] Step 2: Run tests to verify they fail
Run: cd packages/web/workers && npx vitest run src/lib/patterns.test.ts
Expected: FAIL (current code emits phonemes_str LIKE / initial_phoneme =, not variants_str).
- [ ] Step 3: Rewrite buildMatchClause and the exclude path
Replace buildMatchClause (patterns.ts ~lines 33-66) with:
/** Build a rule's include condition, its correct negation (exclude), and params.
 * Matches against `variants_str` — all attested pronunciations concatenated in
 * pipe form with `||` variant boundaries — so a constraint satisfied by ANY
 * variant matches the word, and exclude removes the word if ANY variant
 * satisfies it (conservative rule). */
function buildMatchClause(pat: PatternRule): {
  condition: string;
  excludeCondition: string;
  params: unknown[];
  seq: string[];
} {
  const phonemeStr = pat.phoneme;
  const seq = phonemeStr.includes(' ')
    ? phonemeStr.split(' ').map(normalizePhoneme)
    : [normalizePhoneme(phonemeStr)];
  const joined = seq.join('|');

  switch (pat.type) {
    case 'STARTS_WITH':
      // A variant starts at the string start (|seq|…) or right after a `||` boundary.
      return {
        condition: '(variants_str LIKE ? OR variants_str LIKE ?)',
        excludeCondition: '(variants_str NOT LIKE ? AND variants_str NOT LIKE ?)',
        params: ['|' + joined + '|%', '%||' + joined + '|%'],
        seq,
      };
    case 'ENDS_WITH':
      // A variant ends at the string end (…|seq|) or right before a `||` boundary.
      return {
        condition: '(variants_str LIKE ? OR variants_str LIKE ?)',
        excludeCondition: '(variants_str NOT LIKE ? AND variants_str NOT LIKE ?)',
        params: ['%|' + joined + '|', '%|' + joined + '||%'],
        seq,
      };
    case 'CONTAINS':
    case 'CONTAINS_MEDIAL':
      return {
        condition: 'variants_str LIKE ?',
        excludeCondition: 'variants_str NOT LIKE ?',
        params: ['%|' + joined + '|%'],
        seq,
      };
  }
}
Replace the per-rule loop body in buildPatternClauses (the block that computes condition from match, ~lines 74-99) with:
for (const pat of patterns) {
  const mode: 'include' | 'exclude' = pat.mode ?? 'include';
  const match = buildMatchClause(pat);
  // Include = any variant matches; Exclude = no variant matches (the
  // excludeCondition is the De Morgan negation, correct for the OR'd
  // STARTS_WITH/ENDS_WITH conditions as well as the single-LIKE CONTAINS).
  conditions.push(mode === 'include' ? match.condition : match.excludeCondition);
  params.push(...match.params);
  if (pat.type === 'CONTAINS_MEDIAL') {
    needsMedialPostFilter = true;
    medialSequences.push({ seq: match.seq, mode });
  }
}
(The IS-NULL branches for initial_phoneme/final_phoneme are gone — those columns are no longer used by patterns. variants_str is non-NULL for every phonology-bearing word, so no NULL guard is needed.)
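Why the exclude condition is carried explicitly instead of being derived by string replacement: the sketch below applies the old .replace('LIKE','NOT LIKE') approach to the new OR'd include condition and compares the result with the De Morgan form. The variable and function names are illustrative only and do not appear in patterns.ts.

```typescript
const includeSql = '(variants_str LIKE ? OR variants_str LIKE ?)';

// String.prototype.replace with a string pattern rewrites only the FIRST occurrence,
// so the second LIKE stays positive.
const naiveExclude = includeSql.replace('LIKE', 'NOT LIKE');
console.log(naiveExclude); // (variants_str NOT LIKE ? OR variants_str LIKE ?)

// Negating every LIKE still leaves OR in place: "not A or not B", not "not (A or B)".
const orNegated = includeSql.split('LIKE').join('NOT LIKE');
console.log(orNegated); // (variants_str NOT LIKE ? OR variants_str NOT LIKE ?)

// a = "the first variant starts with the phoneme", b = "a later variant starts with it".
const keptByOrNegation = (a: boolean, b: boolean): boolean => !a || !b;
const keptByDeMorgan = (a: boolean, b: boolean): boolean => !a && !b;

// The first variant matches and no later one does: the word must be excluded.
console.log(keptByOrNegation(true, false)); // true: the word is wrongly kept
console.log(keptByDeMorgan(true, false));   // false: the word is correctly dropped

/** Exhaustive check that NOT (a OR b) equals (NOT a AND NOT b). */
function deMorganHolds(): boolean {
  for (const a of [false, true]) {
    for (const b of [false, true]) {
      if (!(a || b) !== keptByDeMorgan(a, b)) return false;
    }
  }
  return true;
}
console.log(deMorganHolds()); // true
```

With the OR-negated form, a word is only dropped when both LIKE patterns match at once, which is not the conservative rule.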
- [ ] Step 4: Run tests to verify they pass
Run: cd packages/web/workers && npx vitest run src/lib/patterns.test.ts
Expected: PASS (all 7).
- [ ] Step 5: Commit
git add packages/web/workers/src/lib/patterns.ts packages/web/workers/src/lib/patterns.test.ts
git commit -m "feat(phon-154): phoneme patterns match across variants (variants_str)"
Task 2: Teach wordFilter to alias variants_str and match CV across variants
Files:
- Modify: packages/web/workers/src/lib/wordFilter.ts (prefixWordsColumns ~lines 73-80; the body.cv_shape block ~lines 156-168)
- Test: packages/web/workers/src/lib/wordFilter.test.ts (create if absent; else add to it)
- [ ] Step 1: Write the failing tests
Create/append packages/web/workers/src/lib/wordFilter.test.ts:
import { describe, it, expect } from 'vitest';
import { compileWordFilter } from './wordFilter';

describe('compileWordFilter — variant-aware matching', () => {
  it('prefixes variants_str with the w. alias in pattern clauses', () => {
    const c = compileWordFilter({ patterns: [{ type: 'CONTAINS', phoneme: 'ɛ' }] });
    expect(c.wordsWhere.some((w) => w === 'w.variants_str LIKE ?')).toBe(true);
    expect(c.wordsParams).toContain('%|ɛ|%');
  });

  it('CV-shape include matches any variant via cv_shapes', () => {
    const c = compileWordFilter({ cv_shape: ['CVC', 'CV'] });
    expect(c.wordsWhere).toContain('(w.cv_shapes LIKE ? OR w.cv_shapes LIKE ?)');
    expect(c.wordsParams).toEqual(expect.arrayContaining(['%|CVC|%', '%|CV|%']));
  });

  it('CV-shape exclude removes words where ANY variant has the shape', () => {
    const c = compileWordFilter({ cv_shape: ['CVC'], cv_shape_mode: 'exclude' });
    expect(c.wordsWhere).toContain('(w.cv_shapes IS NULL OR (w.cv_shapes NOT LIKE ?))');
    expect(c.wordsParams).toContain('%|CVC|%');
  });
});
- [ ] Step 2: Run tests to verify they fail
Run: cd packages/web/workers && npx vitest run src/lib/wordFilter.test.ts
Expected: FAIL (current code emits w.cv_shape IN (...) and doesn't alias variants_str).
- [ ] Step 3: Add the variants_str alias
In wordFilter.ts prefixWordsColumns, add a replace for variants_str (keep the existing lines):
function prefixWordsColumns(condition: string): string {
  return condition
    .replace(/\b(variants_str)\b/g, 'w.$1')
    .replace(/\b(phonemes_str)\b/g, 'w.$1')
    .replace(/\b(initial_phoneme)\b/g, 'w.$1')
    .replace(/\b(final_phoneme)\b/g, 'w.$1')
    .replace(/\b(phoneme_count)\b(?!\s*_percentile)/g, 'w.$1')
    .replace(/\b(syllable_count)\b(?!\s*_percentile)/g, 'w.$1');
}
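A quick way to see what the word-boundary regex does and does not touch. prefixColumns below is a hypothetical two-column reduction of prefixWordsColumns written for illustration; the inputs are made-up condition strings, not output captured from the worker.

```typescript
// Hypothetical, reduced copy of the aliasing idea (two columns only).
function prefixColumns(condition: string): string {
  return condition
    .replace(/\b(variants_str)\b/g, 'w.$1')
    .replace(/\b(phoneme_count)\b/g, 'w.$1');
}

// Every bare occurrence is aliased because of the /g flag.
console.log(prefixColumns('(variants_str LIKE ? OR variants_str LIKE ?)'));
// (w.variants_str LIKE ? OR w.variants_str LIKE ?)

// `_` is a word character, so \b does not match between "phoneme_count" and "_percentile".
console.log(prefixColumns('phoneme_count_percentile >= ?'));
// phoneme_count_percentile >= ?

// `.` is not a word character, so an already-aliased column would be prefixed again.
console.log(prefixColumns('w.variants_str LIKE ?'));
// w.w.variants_str LIKE ?
```

The last case is only a concern if some builder starts emitting pre-aliased columns; the pattern builders in this plan emit bare variants_str, which is what the alias step expects.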
- [ ] Step 4: Rewrite the CV-shape block to match cv_shapes
Replace the if (body.cv_shape?.length) { … } block (~lines 156-168) with:
// CV shape — match against cv_shapes (pipe-bounded set of ALL variants' shapes,
// e.g. |CVC|CVCV|). Include: any variant has any listed shape. Exclude: no
// variant has any listed shape (NULL cv_shapes passes the exclude). The trailing
// `|` in the LIKE pattern anchors the shape so "CVC" doesn't match within "CVCV".
if (body.cv_shape?.length) {
  const mode = body.cv_shape_mode ?? 'include';
  if (mode === 'include') {
    const likes = body.cv_shape.map(() => 'w.cv_shapes LIKE ?').join(' OR ');
    wordsWhere.push(`(${likes})`);
  } else {
    const notLikes = body.cv_shape.map(() => 'w.cv_shapes NOT LIKE ?').join(' AND ');
    wordsWhere.push(`(w.cv_shapes IS NULL OR (${notLikes}))`);
  }
  wordsParams.push(...body.cv_shape.map((s) => '%|' + s + '|%'));
}
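The unit tests in Step 1 cover a two-shape include and a one-shape exclude. For the remaining case, a two-shape exclude, the SQL can be worked out with a standalone helper. cvShapeClause is hypothetical: it mirrors the block above but returns its result instead of pushing into wordsWhere / wordsParams.

```typescript
// Hypothetical standalone helper mirroring the CV-shape block, for illustration only.
function cvShapeClause(
  shapes: string[],
  mode: 'include' | 'exclude' = 'include',
): { sql: string; params: string[] } {
  // One pipe-anchored LIKE pattern per requested shape, in input order.
  const params = shapes.map((s) => '%|' + s + '|%');
  if (mode === 'include') {
    // Any listed shape on any variant is enough.
    const likes = shapes.map(() => 'w.cv_shapes LIKE ?').join(' OR ');
    return { sql: '(' + likes + ')', params };
  }
  // Exclude: every listed shape must be absent from every variant.
  const notLikes = shapes.map(() => 'w.cv_shapes NOT LIKE ?').join(' AND ');
  return { sql: '(w.cv_shapes IS NULL OR (' + notLikes + '))', params };
}

const excludeTwo = cvShapeClause(['CVC', 'CV'], 'exclude');
console.log(excludeTwo.sql);
// (w.cv_shapes IS NULL OR (w.cv_shapes NOT LIKE ? AND w.cv_shapes NOT LIKE ?))
console.log(excludeTwo.params);
// [ '%|CVC|%', '%|CV|%' ]
```

The IS NULL branch matters because in SQL a NOT LIKE against NULL evaluates to NULL rather than true, so without it rows with no cv_shapes would silently fail the exclude filter.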
- [ ] Step 5: Run tests to verify they pass
Run: cd packages/web/workers && npx vitest run src/lib/wordFilter.test.ts
Expected: PASS.
- [ ] Step 6: Commit
git add packages/web/workers/src/lib/wordFilter.ts packages/web/workers/src/lib/wordFilter.test.ts
git commit -m "feat(phon-154): CV-shape + variants_str matching across variants in wordFilter"
Task 3: Full worker regression (type-check + existing tests)
Files: none (verification only)
- [ ] Step 1: Type-check
Run: cd packages/web/workers && npm run type-check
Expected: clean. (If PatternRule/types complain about the new return shape, fix the typing in patterns.ts — the function's return type is inline, so this should not surface.)
- [ ] Step 2: Full worker test suite
Run: cd packages/web/workers && npm test
Expected: all pass. The existing api.test.ts sentence/word-search tests use the LOCAL D1 seed, which still has the OLD schema WITHOUT variants_str until the deferred reseed. If an existing integration test now fails because a query references variants_str against an un-reseeded local D1, that is the EXPECTED consequence of the deferred reseed: note it, reseed locally (see Phase 1 Task 5 steps; local-only, do NOT commit the seed) to confirm green, then report. Do NOT weaken the new SQL to dodge this.
- [ ] Step 3: Commit (only if any test file needed adjustment)
git add -A packages/web/workers
git commit -m "test(phon-154): worker suite green with variant-aware matching"
Phase 2 done — exit criteria
- patterns.ts emits variants_str LIKE clauses (STARTS/ENDS/CONTAINS) with correct De-Morgan exclude; unit tests green.
- wordFilter.ts aliases variants_str and matches cv_shapes; unit tests green.
- Worker type-check + suite green (with a local reseed if integration tests touch variants_str).
- Phoneme + CV-shape matching now span all attested pronunciations.
Next
- Phase 2b (follow-on): count-range matching (phoneme_count_min/max, syllable_count_min/max) via lib/queries.ts partitionFilterColumns; CONTAINS_MEDIAL variant post-filter in the routes.
- Phase 3: minimal-pair generation across variants (build-time).
- Phase 4: audio per-variant scoring + frontend variant display + superscript flag.