PHON-154 Variant-Aware Matching — Phase 2: Worker Matching

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax.

Goal: Make phoneme-pattern and CV-shape matching span ALL attested pronunciations by querying the variants_str / cv_shapes columns (added in Phase 1) instead of the primary-only phonemes_str / cv_shape.

Architecture: variants_str concatenates every pronunciation in pipe form with || variant boundaries. In include mode a word matches if ANY variant satisfies the constraint; in exclude mode a word is dropped if ANY variant satisfies it. This is the governing conservative rule. Changes are confined to patterns.ts (phoneme SQL) and wordFilter.ts (CV-shape SQL); both are pure SQL-string builders, so tests are unit-level and CI-safe (no dependence on a reseeded D1).
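To make the any-variant rule concrete, here is a minimal in-memory sketch. The helpers `splitVariants` and `keepsWord` and the sample string are hypothetical illustrations that assume the variants_str layout described in this plan; the real matching is done in SQL, not in TypeScript.

```typescript
type Mode = 'include' | 'exclude';

// Hypothetical helper: split a variants_str value such as
// "|h|ɛ|l|oʊ||h|ə|l|oʊ|" into one phoneme list per variant.
function splitVariants(variantsStr: string): string[][] {
  return variantsStr
    .split('||')
    .map((variant) => variant.split('|').filter((p) => p.length > 0));
}

// Include keeps the word if ANY variant contains the phoneme;
// exclude drops the word if ANY variant contains it.
function keepsWord(variantsStr: string, phoneme: string, mode: Mode): boolean {
  const anyVariantHasIt = splitVariants(variantsStr).some((v) => v.includes(phoneme));
  return mode === 'include' ? anyVariantHasIt : !anyVariantHasIt;
}

// Assumed sample: two pronunciations that differ only in the first vowel.
const hello = '|h|ɛ|l|oʊ||h|ə|l|oʊ|';

const includeSchwa = keepsWord(hello, 'ə', 'include'); // matched via the second variant
const excludeSchwa = keepsWord(hello, 'ə', 'exclude'); // dropped because one variant has ə
const excludeEng = keepsWord(hello, 'ŋ', 'exclude');   // kept: no variant has ŋ
```

Under this sketch, a primary-only check would miss the include match on ə, which is the gap Phase 2 closes.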

Tech Stack: TypeScript, Hono/Workers, Vitest (cloudflare:test). Files in packages/web/workers/.

Spec: docs/superpowers/specs/2026-06-15-phon-154-variant-aware-matching-design.md. Depends on: Phase 1 (columns variants_str, cv_shapes emitted).

Out of scope for Phase 2 (explicit follow-on tasks):

  • Count-range matching (min/max phoneme/syllable count matching any variant via phoneme_count_min/max, syllable_count_min/max): flows through partitionFilterColumns in lib/queries.ts, not touched here.
  • CONTAINS_MEDIAL variant post-filter: matchesMedialPattern checks the primary phonemes; making it check any variant's medial position needs the routes to pass variant phoneme lists. The SQL pre-filter already spans variants via variants_str; only the medial-position refinement remains primary-only (documented limitation until that follow-on).
  • Integration tests asserting a word matched via a non-primary variant: require the reseeded D1 (deferred, folds with PHON-151). Phase 2 ships unit tests on the SQL generation.


Reference: current behavior (read before editing)

  • packages/web/workers/src/lib/patterns.ts: buildMatchClause emits phonemes_str LIKE / initial_phoneme = / final_phoneme =; buildPatternClauses negates for exclude via .replace('LIKE','NOT LIKE') and IS-NULL branches.
  • packages/web/workers/src/lib/wordFilter.ts: prefixWordsColumns aliases bare columns to w.*; the body.cv_shape block emits w.cv_shape IN (...) / NOT IN.
  • || boundary facts: a variant starts at string-start (|seq|...) or right after || (...||seq|...); it ends at string-end (...|seq|) or right before || (...|seq||...). On cv_shapes, %|shape|% matches only a whole shape segment because the pipes on both sides anchor it (so CVC does not match within |CVCV|).
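
The boundary facts above can be checked with a small LIKE emulation. This is only a sketch: the `like` helper is hypothetical (it approximates SQL LIKE with a regular expression and ignores SQLite's default ASCII case-insensitivity), and the sample strings assume the variants_str / cv_shapes layout described in this plan.

```typescript
// Rough LIKE emulation: `%` matches any run of characters, `_` exactly one,
// everything else (including `|`) is treated as a literal.
function like(value: string, pattern: string): boolean {
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const source = escaped.replace(/%/g, '.*').replace(/_/g, '.');
  return new RegExp(`^${source}$`, 'su').test(value);
}

// Assumed sample: two variants, /ɝb/ and /hɝb/, separated by `||`.
const herb = '|ɝ|b||h|ɝ|b|';

const startsAtStringStart = like(herb, '|h|%');   // does the first variant start with h?
const startsAfterBoundary = like(herb, '%||h|%'); // does a later variant start with h?
const endsAtStringEnd = like(herb, '%|b|');       // does the last variant end with b?
const endsBeforeBoundary = like(herb, '%|b||%');  // does an earlier variant end with b?

// Assumed sample cv_shapes value holding the shapes CVCV and CV.
const shapes = '|CVCV|CV|';
const cvcInsideCvcv = like(shapes, '%|CVC|%'); // CVC must not match inside CVCV
const cvExact = like(shapes, '%|CV|%');        // CV is present as a whole segment
```

Here STARTS_WITH h matches only through the `%||h|%` pattern, which is why the include condition needs both LIKE patterns joined with OR.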

Task 1: Phoneme patterns match across variants

Files:

  • Modify: packages/web/workers/src/lib/patterns.ts (buildMatchClause ~lines 33-66, buildPatternClauses ~lines 68-103)
  • Test: packages/web/workers/src/lib/patterns.test.ts (create if absent; else add to it)

  • [ ] Step 1: Write the failing tests

Create/append packages/web/workers/src/lib/patterns.test.ts:

import { describe, it, expect } from 'vitest';
import { buildPatternClauses } from './patterns';

describe('buildPatternClauses — variant-aware matching', () => {
  it('CONTAINS matches any variant via variants_str', () => {
    const r = buildPatternClauses([{ type: 'CONTAINS', phoneme: 'ɛ' }]);
    expect(r.conditions).toEqual(['variants_str LIKE ?']);
    expect(r.params).toEqual(['%|ɛ|%']);
  });

  it('STARTS_WITH matches first OR any later variant', () => {
    const r = buildPatternClauses([{ type: 'STARTS_WITH', phoneme: 'h' }]);
    expect(r.conditions).toEqual(['(variants_str LIKE ? OR variants_str LIKE ?)']);
    expect(r.params).toEqual(['|h|%', '%||h|%']);
  });

  it('ENDS_WITH matches last OR any earlier variant', () => {
    const r = buildPatternClauses([{ type: 'ENDS_WITH', phoneme: 'oʊ' }]);
    expect(r.conditions).toEqual(['(variants_str LIKE ? OR variants_str LIKE ?)']);
    expect(r.params).toEqual(['%|oʊ|', '%|oʊ||%']);
  });

  it('multi-phoneme CONTAINS joins with pipes', () => {
    const r = buildPatternClauses([{ type: 'CONTAINS', phoneme: 'k æ t' }]);
    expect(r.params).toEqual(['%|k|æ|t|%']);
  });

  it('exclude CONTAINS keeps words where NO variant has the phoneme', () => {
    const r = buildPatternClauses([{ type: 'CONTAINS', phoneme: 'ŋ', mode: 'exclude' }]);
    expect(r.conditions).toEqual(['variants_str NOT LIKE ?']);
    expect(r.params).toEqual(['%|ŋ|%']);
  });

  it('exclude STARTS_WITH negates with AND (De Morgan), not OR', () => {
    const r = buildPatternClauses([{ type: 'STARTS_WITH', phoneme: 'h', mode: 'exclude' }]);
    expect(r.conditions).toEqual(['(variants_str NOT LIKE ? AND variants_str NOT LIKE ?)']);
    expect(r.params).toEqual(['|h|%', '%||h|%']);
  });

  it('flags CONTAINS_MEDIAL for post-filter while still emitting variant SQL', () => {
    const r = buildPatternClauses([{ type: 'CONTAINS_MEDIAL', phoneme: 't' }]);
    expect(r.conditions).toEqual(['variants_str LIKE ?']);
    expect(r.needsMedialPostFilter).toBe(true);
    expect(r.medialSequences).toEqual([{ seq: ['t'], mode: 'include' }]);
  });
});
  • [ ] Step 2: Run tests to verify they fail

Run: cd packages/web/workers && npx vitest run src/lib/patterns.test.ts
Expected: FAIL (current code emits phonemes_str LIKE / initial_phoneme =, not variants_str).

  • [ ] Step 3: Rewrite buildMatchClause and the exclude path

Replace buildMatchClause (patterns.ts ~lines 33-66) with:

/** Build a rule's include condition, its correct negation (exclude), and params.
 *  Matches against `variants_str` — all attested pronunciations concatenated in
 *  pipe form with `||` variant boundaries — so a constraint satisfied by ANY
 *  variant matches the word, and exclude removes the word if ANY variant
 *  satisfies it (conservative rule). */
function buildMatchClause(pat: PatternRule): {
  condition: string;
  excludeCondition: string;
  params: unknown[];
  seq: string[];
} {
  const phonemeStr = pat.phoneme;
  const seq = phonemeStr.includes(' ')
    ? phonemeStr.split(' ').map(normalizePhoneme)
    : [normalizePhoneme(phonemeStr)];
  const joined = seq.join('|');

  switch (pat.type) {
    case 'STARTS_WITH':
      // A variant starts at the string start (|seq|…) or right after a `||` boundary.
      return {
        condition: '(variants_str LIKE ? OR variants_str LIKE ?)',
        excludeCondition: '(variants_str NOT LIKE ? AND variants_str NOT LIKE ?)',
        params: ['|' + joined + '|%', '%||' + joined + '|%'],
        seq,
      };
    case 'ENDS_WITH':
      // A variant ends at the string end (…|seq|) or right before a `||` boundary.
      return {
        condition: '(variants_str LIKE ? OR variants_str LIKE ?)',
        excludeCondition: '(variants_str NOT LIKE ? AND variants_str NOT LIKE ?)',
        params: ['%|' + joined + '|', '%|' + joined + '||%'],
        seq,
      };
    case 'CONTAINS':
    case 'CONTAINS_MEDIAL':
      return {
        condition: 'variants_str LIKE ?',
        excludeCondition: 'variants_str NOT LIKE ?',
        params: ['%|' + joined + '|%'],
        seq,
      };
  }
}

Replace the per-rule loop body in buildPatternClauses (the block that computes condition from match, ~lines 74-99) with:

  for (const pat of patterns) {
    const mode: 'include' | 'exclude' = pat.mode ?? 'include';
    const match = buildMatchClause(pat);
    // Include = any variant matches; Exclude = no variant matches (the
    // excludeCondition is the De Morgan negation, correct for the OR'd
    // STARTS_WITH/ENDS_WITH conditions as well as the single-LIKE CONTAINS).
    conditions.push(mode === 'include' ? match.condition : match.excludeCondition);
    params.push(...match.params);

    if (pat.type === 'CONTAINS_MEDIAL') {
      needsMedialPostFilter = true;
      medialSequences.push({ seq: match.seq, mode });
    }
  }

(The IS-NULL branches for initial_phoneme/final_phoneme are gone — those columns are no longer used by patterns. variants_str is non-NULL for every phonology-bearing word, so no NULL guard is needed.)
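
Why the explicit excludeCondition rather than the old .replace('LIKE','NOT LIKE') trick: the short sketch below shows what string replacement does to the OR'd condition. It uses only standard JavaScript string behavior; the variable names are illustrative.

```typescript
const includeCondition = '(variants_str LIKE ? OR variants_str LIKE ?)';

// String.prototype.replace with a string pattern swaps only the FIRST match,
// so the second LIKE is left un-negated.
const firstOnly = includeCondition.replace('LIKE', 'NOT LIKE');

// Replacing every LIKE still keeps the OR, which is not the logical negation.
const everyLike = includeCondition.replace(/\bLIKE\b/g, 'NOT LIKE');

// Truth-table check of both candidate negations of (a OR b).
const cases: Array<[boolean, boolean]> = [
  [false, false],
  [false, true],
  [true, false],
  [true, true],
];
const deMorganHolds = cases.every(([a, b]) => !(a || b) === (!a && !b));
const orOfNotsHolds = cases.every(([a, b]) => !(a || b) === (!a || !b));

console.log(firstOnly);     // (variants_str NOT LIKE ? OR variants_str LIKE ?)
console.log(everyLike);     // (variants_str NOT LIKE ? OR variants_str NOT LIKE ?)
console.log(deMorganHolds); // true
console.log(orOfNotsHolds); // false
```

Only the AND of the two NOT LIKE terms excludes a word whenever either start-boundary pattern matches, which is what the exclude test in Step 1 asserts.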

  • [ ] Step 4: Run tests to verify they pass

Run: cd packages/web/workers && npx vitest run src/lib/patterns.test.ts
Expected: PASS (all 7).

  • [ ] Step 5: Commit
git add packages/web/workers/src/lib/patterns.ts packages/web/workers/src/lib/patterns.test.ts
git commit -m "feat(phon-154): phoneme patterns match across variants (variants_str)"

Task 2: Teach wordFilter to alias variants_str and match CV across variants

Files:

  • Modify: packages/web/workers/src/lib/wordFilter.ts (prefixWordsColumns ~lines 73-80; the body.cv_shape block ~lines 156-168)
  • Test: packages/web/workers/src/lib/wordFilter.test.ts (create if absent; else add to it)

  • [ ] Step 1: Write the failing tests

Create/append packages/web/workers/src/lib/wordFilter.test.ts:

import { describe, it, expect } from 'vitest';
import { compileWordFilter } from './wordFilter';

describe('compileWordFilter — variant-aware matching', () => {
  it('prefixes variants_str with the w. alias in pattern clauses', () => {
    const c = compileWordFilter({ patterns: [{ type: 'CONTAINS', phoneme: 'ɛ' }] });
    expect(c.wordsWhere.some((w) => w === 'w.variants_str LIKE ?')).toBe(true);
    expect(c.wordsParams).toContain('%|ɛ|%');
  });

  it('CV-shape include matches any variant via cv_shapes', () => {
    const c = compileWordFilter({ cv_shape: ['CVC', 'CV'] });
    expect(c.wordsWhere).toContain('(w.cv_shapes LIKE ? OR w.cv_shapes LIKE ?)');
    expect(c.wordsParams).toEqual(expect.arrayContaining(['%|CVC|%', '%|CV|%']));
  });

  it('CV-shape exclude removes words where ANY variant has the shape', () => {
    const c = compileWordFilter({ cv_shape: ['CVC'], cv_shape_mode: 'exclude' });
    expect(c.wordsWhere).toContain('(w.cv_shapes IS NULL OR (w.cv_shapes NOT LIKE ?))');
    expect(c.wordsParams).toContain('%|CVC|%');
  });
});
  • [ ] Step 2: Run tests to verify they fail

Run: cd packages/web/workers && npx vitest run src/lib/wordFilter.test.ts
Expected: FAIL (current code emits w.cv_shape IN (...) and doesn't alias variants_str).

  • [ ] Step 3: Add the variants_str alias

In wordFilter.ts prefixWordsColumns, add a replace for variants_str (keep the existing lines):

function prefixWordsColumns(condition: string): string {
  return condition
    .replace(/\b(variants_str)\b/g, 'w.$1')
    .replace(/\b(phonemes_str)\b/g, 'w.$1')
    .replace(/\b(initial_phoneme)\b/g, 'w.$1')
    .replace(/\b(final_phoneme)\b/g, 'w.$1')
    .replace(/\b(phoneme_count)\b(?!\s*_percentile)/g, 'w.$1')
    .replace(/\b(syllable_count)\b(?!\s*_percentile)/g, 'w.$1');
}
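
As a quick illustration of what the alias step does to the new conditions, here is a trimmed-down, hypothetical copy of the function (`prefixVariantColumns` is not a name from the repo). It also shows one caveat of the word-boundary regex: the input is assumed to use bare column names.

```typescript
// Hypothetical reduced version of the aliasing step, for illustration only.
function prefixVariantColumns(condition: string): string {
  return condition
    .replace(/\b(variants_str)\b/g, 'w.$1')
    .replace(/\b(phonemes_str)\b/g, 'w.$1');
}

// Both occurrences in the OR'd STARTS_WITH condition are prefixed.
const aliased = prefixVariantColumns('(variants_str LIKE ? OR variants_str LIKE ?)');

// `_` is a word character, so a longer identifier is left untouched.
const untouched = prefixVariantColumns('alt_variants_str LIKE ?');

// Caveat: `.` is not a word character, so an already-prefixed column is
// prefixed again. The pattern builder should therefore emit bare names.
const doubled = prefixVariantColumns('w.variants_str LIKE ?');

console.log(aliased);   // (w.variants_str LIKE ? OR w.variants_str LIKE ?)
console.log(untouched); // alt_variants_str LIKE ?
console.log(doubled);   // w.w.variants_str LIKE ?
```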
  • [ ] Step 4: Rewrite the CV-shape block to match cv_shapes

Replace the if (body.cv_shape?.length) { … } block (~lines 156-168) with:

  // CV shape — match against cv_shapes (pipe-bounded set of ALL variants' shapes,
  // e.g. |CVC|CVCV|). Include: any variant has any listed shape. Exclude: no
  // variant has any listed shape (NULL cv_shapes passes the exclude). The trailing
  // `|` in the LIKE pattern anchors the shape so "CVC" doesn't match within "CVCV".
  if (body.cv_shape?.length) {
    const mode = body.cv_shape_mode ?? 'include';
    if (mode === 'include') {
      const likes = body.cv_shape.map(() => 'w.cv_shapes LIKE ?').join(' OR ');
      wordsWhere.push(`(${likes})`);
    } else {
      const notLikes = body.cv_shape.map(() => 'w.cv_shapes NOT LIKE ?').join(' AND ');
      wordsWhere.push(`(w.cv_shapes IS NULL OR (${notLikes}))`);
    }
    wordsParams.push(...body.cv_shape.map((s) => '%|' + s + '|%'));
  }
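
For reference, the same logic can be read as a standalone function. This is a sketch under assumptions: `buildCvShapeClause` is a hypothetical name, and the real block pushes into wordsWhere / wordsParams instead of returning a value. It shows the SQL produced for a multi-shape exclude, a case the unit tests above do not spell out.

```typescript
type CvMode = 'include' | 'exclude';

// Hypothetical standalone version of the CV-shape block.
function buildCvShapeClause(
  shapes: string[],
  mode: CvMode = 'include',
): { where: string; params: string[] } {
  // Pipes on both sides anchor each shape to a whole segment of cv_shapes.
  const params = shapes.map((s) => `%|${s}|%`);
  if (mode === 'include') {
    const likes = shapes.map(() => 'w.cv_shapes LIKE ?').join(' OR ');
    return { where: `(${likes})`, params };
  }
  // Exclude: no variant may have any listed shape, hence AND of NOT LIKEs.
  const notLikes = shapes.map(() => 'w.cv_shapes NOT LIKE ?').join(' AND ');
  return { where: `(w.cv_shapes IS NULL OR (${notLikes}))`, params };
}

const includeClause = buildCvShapeClause(['CVC', 'CV']);
const excludeClause = buildCvShapeClause(['CVC', 'CV'], 'exclude');

console.log(includeClause.where);
// (w.cv_shapes LIKE ? OR w.cv_shapes LIKE ?)
console.log(excludeClause.where);
// (w.cv_shapes IS NULL OR (w.cv_shapes NOT LIKE ? AND w.cv_shapes NOT LIKE ?))
```

The IS NULL guard matters because in SQL a NOT LIKE against NULL evaluates to NULL, which a WHERE clause treats as not true; without the guard, rows with no cv_shapes would silently fail the exclude.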
  • [ ] Step 5: Run tests to verify they pass

Run: cd packages/web/workers && npx vitest run src/lib/wordFilter.test.ts
Expected: PASS.

  • [ ] Step 6: Commit
git add packages/web/workers/src/lib/wordFilter.ts packages/web/workers/src/lib/wordFilter.test.ts
git commit -m "feat(phon-154): CV-shape + variants_str matching across variants in wordFilter"

Task 3: Full worker regression (type-check + existing tests)

Files: none (verification only)

  • [ ] Step 1: Type-check

Run: cd packages/web/workers && npm run type-check
Expected: clean. (If PatternRule/types complain about the new return shape, fix the typing in patterns.ts. The function's return type is inline, so this should not surface.)

  • [ ] Step 2: Full worker test suite

Run: cd packages/web/workers && npm test
Expected: all pass. Existing api.test.ts sentence/word-search tests use the LOCAL D1 seed, which still has the OLD schema WITHOUT variants_str until the deferred reseed. If an existing integration test now fails because a query references variants_str against an un-reseeded local D1, that is the EXPECTED consequence of the deferred reseed: note it, reseed locally (see Phase 1 Task 5 steps; local-only, do NOT commit the seed) to confirm green, then report. Do NOT weaken the new SQL to dodge this.

  • [ ] Step 3: Commit (only if any test file needed adjustment)
git add -A packages/web/workers
git commit -m "test(phon-154): worker suite green with variant-aware matching"

Phase 2 done — exit criteria

  • patterns.ts emits variants_str LIKE clauses (STARTS_WITH/ENDS_WITH/CONTAINS/CONTAINS_MEDIAL) with the correct De Morgan exclude; unit tests green.
  • wordFilter.ts aliases variants_str and matches cv_shapes; unit tests green.
  • Worker type-check + suite green (with a local reseed if integration tests touch variants_str).
  • Phoneme + CV-shape matching now span all attested pronunciations.

Next

  • Phase 2b (follow-on): count-range matching (phoneme_count_min/max, syllable_count_min/max) via lib/queries.ts partitionFilterColumns; CONTAINS_MEDIAL variant post-filter in the routes.
  • Phase 3: minimal-pair generation across variants (build-time).
  • Phase 4: audio per-variant scoring + frontend variant display + superscript flag.