PHON-142 Dev Integration: Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development. Steps use checkbox (`- [ ]`) syntax.
Goal: Wire the PHON-142 winners — the faithful ft-l2 transcriber and the L1 scoring-prior — into the /dev/pronounce dev flow against the local phonolex_audio. Dev-page validation only; NOT user-facing (synthesis into a product feature is a later, user-driven step).
Architecture: (1) phonolex_audio multi-model registry serves ft-l2; (2) the L1 prior P(produced|canonical,L1,position) ships to the Worker as a bundled JSON loaded into isolate memory; (3) pronunciationScore.ts applies the prior when l1 is set (position derived from the canonical index of the single target word); (4) /dev/pronounce gains an ft-l2 toggle option so the L1 dropdown now drives the variant/error call.
Tech Stack: Python (phonolex_audio serving, prior export), TypeScript (Workers scorer/route, React dev page), Vitest.
Branch: research/phon-142-ft-l2-l1-transcriber. Spec/RESULTS: research/2026-06-05-phon-142-ft-l2/ (RESULTS.md, scoring_prior.py).
File Structure
- Create `research/2026-06-05-phon-142-ft-l2/export_prior.py`: emit `l1Prior.json` from `scoring_prior.py`'s trained prior.
- Create `packages/web/workers/src/config/l1Prior.json`: the bundled prior (small derived statistic; committed).
- Modify `packages/web/workers/src/lib/pronunciationScore.ts`: L1-prior classification path.
- Modify `packages/web/workers/src/routes/audio.ts`: load the prior, pass `l1` + position into scoring.
- Modify `packages/audio/src/phonolex_audio/{server.py,__main__.py}`: multi-model registry (off-the-shelf/ft-l2/ft-child).
- Modify `packages/web/frontend/src/components/tools/PronunciationViewer.tsx`: add `ft-l2` toggle option.
- Tests alongside each.
Reuse: scoring_prior.py (the validated prior logic — port classify_with_prior to TS), VARIANT_ERROR_THRESHOLD (0.112), the existing phoneme_dots/phonemes cache pattern.
Task 1: Export the L1 prior → bundled JSON
Files: Create research/2026-06-05-phon-142-ft-l2/export_prior.py; Create packages/web/workers/src/config/l1Prior.json
- [ ] Step 1: Write `export_prior.py`. Build the prior from `scoring_prior.py` (use the same TRAIN-split source it was validated on; that's the prior we tuned `p_min=0.15` against) and emit a compact JSON the Worker can index:

  ```python
  """Emit l1Prior.json for the Worker. Structure:
  { "p_min": 0.15, "threshold": 0.112,
    "table": { "<canonical>|<l1>|<position>": { "<produced>": prob, ... }, ... } }
  Run: uv run python export_prior.py"""
  import json
  import sys
  from importlib import import_module
  from pathlib import Path

  sys.path.insert(0, str(Path(__file__).resolve().parent))
  sp = import_module("scoring_prior")

  prior = sp.build_prior(sp.load_train_rows())  # use the SAME builder the study used; adapt fn names
  OUT = Path(__file__).resolve().parents[2] / "packages/web/workers/src/config/l1Prior.json"
  table = {}
  for (canon, l1, pos), dist in sp.iter_prior(prior):  # adapt to the real prior structure
      table[f"{canon}|{l1}|{pos}"] = {prod: round(float(p), 6) for prod, p in dist.items()}
  OUT.write_text(json.dumps({"p_min": sp.P_MIN, "threshold": sp.THRESHOLD, "table": table}, indent=0))
  print(f"wrote {OUT}: {len(table)} contexts")
  ```

  Adapt the function/attribute names to `scoring_prior.py`'s actual API (read it first). If `scoring_prior` lacks a clean iterator, add a tiny `iter_prior`/`to_dict` helper there.

- [ ] Step 2: Run it. `cd research/2026-06-05-phon-142-ft-l2 && uv run python export_prior.py` → confirm `l1Prior.json` written, contexts ~hundreds, file < ~1 MB. Spot-check one Spanish onset entry (e.g. `v|Spanish|onset` should give `b` a non-trivial prob).

- [ ] Step 3: Commit (the JSON is a derived statistic, so it ships):

  ```bash
  git add research/2026-06-05-phon-142-ft-l2/export_prior.py packages/web/workers/src/config/l1Prior.json
  git commit -m "feat(phon-142): export L1 scoring-prior to bundled l1Prior.json"
  ```
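The Step 2 spot check can also be scripted. The following is a minimal sketch that assumes only the JSON layout documented in the `export_prior.py` docstring. The helper names `prior_key` and `check_prior_payload` are hypothetical, and the toy payload uses made-up probabilities rather than values from the trained prior.

```python
import json


def prior_key(canonical: str, l1: str, position: str) -> str:
    """Build the "<canonical>|<l1>|<position>" key used by the table."""
    return f"{canonical}|{l1}|{position}"


def check_prior_payload(payload: dict) -> list:
    """Return a list of problems found; an empty list means the shape looks right."""
    problems = []
    for field in ("p_min", "threshold", "table"):
        if field not in payload:
            problems.append(f"missing field: {field}")
    for key, dist in payload.get("table", {}).items():
        # Every context key should split into exactly three parts.
        if len(key.split("|")) != 3:
            problems.append(f"malformed key: {key!r}")
        for produced, prob in dist.items():
            if not 0.0 <= prob <= 1.0:
                problems.append(f"out-of-range probability for {key!r} -> {produced!r}: {prob}")
    return problems


# Toy payload (made-up numbers), round-tripped through JSON like the real file.
TOY_PAYLOAD = json.loads(json.dumps({
    "p_min": 0.15,
    "threshold": 0.112,
    "table": {"v|Spanish|onset": {"b": 0.4}, "v|Korean|onset": {}},
}))

problems = check_prior_payload(TOY_PAYLOAD)
spanish_v_onset = TOY_PAYLOAD["table"][prior_key("v", "Spanish", "onset")]
print(problems, spanish_v_onset.get("b", 0.0))
```

Pointing the same checks at the committed `l1Prior.json` instead of the toy payload gives a repeatable version of the manual spot check.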
Task 2: L1-prior classification in pronunciationScore.ts
Files: Modify packages/web/workers/src/lib/pronunciationScore.ts; Modify packages/web/workers/src/__tests__/pronunciationScore.test.ts
- [ ] Step 1: Write the failing test. Append:

  ```ts
  import { positionForIndex, classifyWithPrior, type L1Prior } from '../lib/pronunciationScore';

  const PRIOR: L1Prior = {
    p_min: 0.15, threshold: 0.112,
    table: { 'v|Spanish|onset': { b: 0.4 }, 'v|Korean|onset': {} },
  };

  describe('L1-prior classification', () => {
    it('position is onset at index 0, coda at last, medial otherwise', () => {
      expect(positionForIndex(0, 4)).toBe('onset');
      expect(positionForIndex(3, 4)).toBe('coda');
      expect(positionForIndex(1, 4)).toBe('medial');
    });
    it('an L1-typical sub above the prior floor is variant even above cos_dist threshold', () => {
      // Spanish onset v->b, prior 0.4 >= p_min 0.15, cos_dist 0.30 > 0.112
      expect(classifyWithPrior('v', 'b', 'onset', 0.30, 'Spanish', PRIOR)).toBe('variant');
    });
    it('the same sub for an L1 that does not do it stays error', () => {
      expect(classifyWithPrior('v', 'b', 'onset', 0.30, 'Korean', PRIOR)).toBe('error');
    });
    it('below the cos_dist threshold is always variant regardless of L1', () => {
      expect(classifyWithPrior('v', 'b', 'onset', 0.05, 'Korean', PRIOR)).toBe('variant');
    });
  });
  ```

- [ ] Step 2: Run, expect FAIL. `cd packages/web/workers && npx vitest run src/__tests__/pronunciationScore.test.ts`

- [ ] Step 3: Implement. Append to `pronunciationScore.ts`:

  ```ts
  export interface L1Prior {
    p_min: number;
    threshold: number;
    table: Record<string, Record<string, number>>; // "canon|l1|position" -> { produced: prob }
  }
  export type Position = 'onset' | 'medial' | 'coda';

  export function positionForIndex(i: number, n: number): Position {
    if (i === 0) return 'onset';
    if (i === n - 1) return 'coda';
    return 'medial';
  }

  /** L1-conditioned variant/error decision (ports scoring_prior.classify_with_prior). */
  export function classifyWithPrior(
    canonical: string, produced: string, position: Position,
    cosDistValue: number, l1: string, prior: L1Prior,
  ): 'variant' | 'error' {
    if (cosDistValue < prior.threshold) return 'variant';
    const dist = prior.table[`${canonical}|${l1}|${position}`];
    const p = dist?.[produced] ?? 0;
    return p >= prior.p_min ? 'variant' : 'error';
  }
  ```

  Then extend `scorePronunciation` to accept `opts?: { l1?: string; prior?: L1Prior }`: when `l1` + `prior` are present, classify each `sub` position with `classifyWithPrior(canonical, produced, positionForIndex(i, canonical.length), cos_dist, l1, prior)`; deletions → error; the per-word `variant_vs_error_class` = worst position as before. When absent, keep the existing L1-agnostic threshold path. Set `threshold_basis` to `'l1_conditioned'` when the prior is used, else `'l1_agnostic'`.

- [ ] Step 4: Run, expect PASS. Full file + `npx tsc --noEmit`.

- [ ] Step 5: Fixture pin (optional-but-recommended). Add a case to `export_prior.py`/a small fixture asserting the TS `classifyWithPrior` matches `scoring_prior.classify_with_prior` on a handful of (canonical, produced, position, l1, cos_dist) tuples. Skip only if time-constrained.

- [ ] Step 6: Commit.

  ```bash
  git add packages/web/workers/src/lib/pronunciationScore.ts packages/web/workers/src/__tests__/pronunciationScore.test.ts
  git commit -m "feat(phon-142): L1-prior variant/error classification in scorer"
  ```
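One way to approach the Step 5 fixture pin is to generate the expected labels on the Python side and have the TS test replay them. The sketch below is illustrative only: `classify_reference` is a stand-in that assumes `scoring_prior.classify_with_prior` follows the same rule as the TS port in Step 3 (swap in the real function when building the actual fixture), and the prior, the tuples, and the `build_fixture` helper are hypothetical.

```python
import json


def classify_reference(canonical, produced, position, cos_dist, l1, prior):
    """Stand-in for scoring_prior.classify_with_prior (ASSUMED to follow this rule)."""
    # Acoustically close substitutions are variants regardless of L1.
    if cos_dist < prior["threshold"]:
        return "variant"
    # Otherwise the substitution must be typical enough for this L1 and position.
    dist = prior["table"].get(f"{canonical}|{l1}|{position}", {})
    return "variant" if dist.get(produced, 0.0) >= prior["p_min"] else "error"


# Toy prior with made-up numbers, mirroring the one in the Vitest suite.
PRIOR = {
    "p_min": 0.15,
    "threshold": 0.112,
    "table": {"v|Spanish|onset": {"b": 0.4}, "v|Korean|onset": {}},
}

# (canonical, produced, position, cos_dist, l1)
CASES = [
    ("v", "b", "onset", 0.30, "Spanish"),
    ("v", "b", "onset", 0.30, "Korean"),
    ("v", "b", "onset", 0.05, "Korean"),
    ("v", "b", "coda", 0.30, "Spanish"),  # context absent from the toy table
]


def build_fixture(prior, cases):
    """Attach the reference label to each tuple so the TS test can replay it."""
    return [
        {
            "canonical": canonical,
            "produced": produced,
            "position": position,
            "cos_dist": cos_dist,
            "l1": l1,
            "expected": classify_reference(canonical, produced, position, cos_dist, l1, prior),
        }
        for canonical, produced, position, cos_dist, l1 in cases
    ]


fixture = build_fixture(PRIOR, CASES)
fixture_json = json.dumps({"prior": PRIOR, "cases": fixture}, indent=2)
print(fixture_json)
```

A TS test could then load the emitted JSON and assert that `classifyWithPrior` returns `expected` for every case, which keeps the two implementations from drifting apart silently.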
Task 3: Wire the prior into /api/audio/pronounce
Files: Modify packages/web/workers/src/routes/audio.ts; Modify packages/web/workers/src/__tests__/audio.test.ts
- [ ] Step 1: Implement. Import the bundled prior + pass it through:

  ```ts
  import l1Prior from '../config/l1Prior.json';
  import { scorePronunciation, type L1Prior } from '../lib/pronunciationScore';
  ```

  In the `/pronounce` handler, after the transcript + canonical are obtained, call:

  ```ts
  const score = scorePronunciation(canonical, produced, cache, l1 ? { l1, prior: l1Prior as L1Prior } : undefined);
  ```

  The response already echoes `l1` and now carries `threshold_basis: 'l1_conditioned'` when `l1` was supplied. Confirm `resolveJsonModule` is on (it is, per PHON-129 Task 4) and the JSON import type-checks.

- [ ] Step 2: Tests. Add a route test that posts with `l1=Spanish` and asserts the response `threshold_basis === 'l1_conditioned'` (the D1-dependent scoring path stays graceful in the unseeded test env per the house pattern, so assert on the seam, not a seeded-DB score). Run `npx vitest run` + `npx tsc --noEmit`.

- [ ] Step 3: Commit.

  ```bash
  git add packages/web/workers/src/routes/audio.ts packages/web/workers/src/__tests__/audio.test.ts
  git commit -m "feat(phon-142): /api/audio/pronounce applies the L1 prior when l1 is set"
  ```
Task 4: phonolex_audio multi-model registry (serve ft-l2)
Files: Modify packages/audio/src/phonolex_audio/{server.py,__main__.py}; Modify packages/audio/tests/test_server.py
- [ ] Step 1: Read `server.py` (`build_app(transcriber, ft_transcriber=None)`, `/transcribe`, `/compare`) + `__main__.py` (`--checkpoint`, `--ft-checkpoint`) + `transcribe_ft.py`.

- [ ] Step 2: Implement a registry. Generalize `build_app` to take a dict `models: {name: transcriber}` (e.g. `{"off-the-shelf": ..., "ft-l2": FTTranscriber(faithful_ckpt), "ft-child": FTTranscriber(phon139_ckpt)}`). `/transcribe` accepts an optional `model` form field (default `off-the-shelf`) selecting from the registry; `/compare` takes `model_a`/`model_b` (defaults preserve today's baseline-vs-ft behavior). Each model's response carries its own `coverage`/`limitations`. Add `__main__` flags `--ft-l2-checkpoint`/`--ft-child-checkpoint` (keep `--ft-checkpoint` as an alias for `--ft-child-checkpoint` for back-comp). `FTTranscriber` already loads a faithful-style broad-40 checkpoint, so `ft-l2` = the faithful `state.pt`.

- [ ] Step 3: Tests in `test_server.py` (these use a stub transcriber, no real model): `/transcribe` with `model=ft-l2` routes to the right registry entry; unknown `model` → 400; `/health` reports the loaded model names. Run `uv run python -m pytest packages/audio/tests/test_server.py -v`.

- [ ] Step 4: Worker side. The route's `transcriber` field already maps off-the-shelf→`/transcribe`, ft→`/compare`. Update `fetchTranscript` so `transcriber: 'ft-l2'` posts to `/transcribe` with `model=ft-l2` (and keep `ft`→ft-child or `/compare` as-is). Add the value to the route's accepted set + the frontend type.

- [ ] Step 5: Commit.

  ```bash
  git add packages/audio/src/phonolex_audio/ packages/audio/tests/test_server.py packages/web/workers/src/routes/audio.ts
  git commit -m "feat(phon-142): phonolex_audio multi-model registry + ft-l2 serving"
  ```
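The registry lookup and the back-compat flag alias from Step 2 can be sketched independently of the web framework. This is a rough illustration under stated assumptions, not the actual `server.py`/`__main__.py` code: `select_model`, `UnknownModelError`, `build_parser`, and the lambda stub transcribers are hypothetical names, and the real handler would translate the error into the 400 response that Step 3 tests for.

```python
import argparse
from typing import Callable, Dict, Optional

DEFAULT_MODEL = "off-the-shelf"


class UnknownModelError(ValueError):
    """Raised when a request names a model that is not loaded (maps to HTTP 400)."""


def select_model(models: Dict[str, Callable], name: Optional[str] = None) -> Callable:
    """Pick a transcriber from the registry, defaulting to the off-the-shelf model."""
    chosen = name or DEFAULT_MODEL
    if chosen not in models:
        raise UnknownModelError(f"unknown model {chosen!r}; loaded: {sorted(models)}")
    return models[chosen]


def build_parser() -> argparse.ArgumentParser:
    """CLI flags; --ft-checkpoint is kept as an alias for --ft-child-checkpoint."""
    parser = argparse.ArgumentParser(prog="phonolex_audio")
    parser.add_argument("--checkpoint")
    parser.add_argument("--ft-l2-checkpoint")
    # Two option strings sharing one dest: the old flag keeps working.
    parser.add_argument("--ft-child-checkpoint", "--ft-checkpoint", dest="ft_child_checkpoint")
    return parser


# Stub transcribers (hypothetical) standing in for real models, as in test_server.py.
models = {
    "off-the-shelf": lambda audio: "baseline-transcript",
    "ft-l2": lambda audio: "ft-l2-transcript",
}

args = build_parser().parse_args(["--ft-checkpoint", "child.pt"])
print(select_model(models, "ft-l2")(b""), args.ft_child_checkpoint)
```

Registering only the models whose checkpoint flag was actually passed keeps `/health` honest about what is loaded, and an unknown name fails loudly instead of silently falling back to the default.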
Task 5: Add ft-l2 to the /dev/pronounce toggle
Files: Modify packages/web/frontend/src/components/tools/PronunciationViewer.tsx; Modify its test; Modify packages/web/frontend/src/services/audioApi.ts (the transcriber union)
- [ ] Step 1: Implement. Add `'ft-l2'` to the `transcriber` union in `audioApi.ts` (`'off-the-shelf' | 'ft-l2' | 'ft'`) and to the `ToggleButtonGroup` in `PronunciationViewer` (three options). Default stays `off-the-shelf`. The L1 dropdown already sends `l1`, so no change is needed there; it now drives the prior server-side.

- [ ] Step 2: Test. Extend the component test: render, confirm the `ft-l2` toggle option is present and selectable.

- [ ] Step 3: Run the frontend matrix. `cd packages/web/frontend && npx vitest run && npx tsc --noEmit && npm run build`.

- [ ] Step 4: Commit.

  ```bash
  git add packages/web/frontend/src/components/tools/PronunciationViewer.tsx packages/web/frontend/src/components/tools/PronunciationViewer.test.tsx packages/web/frontend/src/services/audioApi.ts
  git commit -m "feat(phon-142): /dev/pronounce ft-l2 toggle option"
  ```
Task 6: Full matrix + manual dev verification
- [ ] Step 1: Run each of these from the repo root: `cd packages/web/workers && npm test && npx tsc --noEmit`; `cd packages/web/frontend && npx vitest run && npx tsc --noEmit && npm run build`; `uv run python -m pytest packages/audio/tests/`.
- [ ] Step 2 (manual, optional now): start `phonolex_audio` with `--ft-l2-checkpoint research/2026-06-05-phon-142-ft-l2/ckpt/full_s17/state.pt`, the worker, and the frontend; on `/dev/pronounce` select an L2 clip, pick `ft-l2` + the speaker's L1, and confirm the per-position class reflects the L1 prior (an L1-typical sub shows `variant` where the agnostic path showed `error`).
Self-Review
Spec coverage: serve ft-l2 → Task 4 ✓; ship prior → Task 1 ✓; scorer applies prior → Task 2 ✓; route wiring → Task 3 ✓; dev-page toggle → Task 5 ✓; dev-only/local (no prod/deploy) ✓.
Placeholder scan: Task 1's export_prior.py adapts to scoring_prior.py's real API (read-first noted) — the only "adapt" marker, inherent since the prior's internal structure lives in that file; everything else is literal code or pinned tests.
Type consistency: L1Prior/Position/classifyWithPrior/positionForIndex consistent across Tasks 1–3; threshold_basis extends to 'l1_conditioned'; transcriber union 'off-the-shelf'|'ft-l2'|'ft' consistent across route + service + component.
Dev-only guard: nothing here deploys or touches the production seed/Worker config beyond the bundled l1Prior.json (a small committed asset); the ft-l2 model stays local (the 3.5 GB checkpoint is gitignored; served by the local phonolex_audio).