PHON-142 Dev Integration — Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development. Steps use checkbox (- [ ]) syntax.

Goal: Wire the PHON-142 winners — the faithful ft-l2 transcriber and the L1 scoring-prior — into the /dev/pronounce dev flow against the local phonolex_audio. Dev-page validation only; NOT user-facing (synthesis into a product feature is a later, user-driven step).

Architecture: (1) phonolex_audio multi-model registry serves ft-l2; (2) the L1 prior P(produced|canonical,L1,position) ships to the Worker as a bundled JSON loaded into isolate memory; (3) pronunciationScore.ts applies the prior when l1 is set (position derived from the canonical index of the single target word); (4) /dev/pronounce gains an ft-l2 toggle option so the L1 dropdown now drives the variant/error call.

Tech Stack: Python (phonolex_audio serving, prior export), TypeScript (Workers scorer/route, React dev page), Vitest.

Branch: research/phon-142-ft-l2-l1-transcriber. Spec/RESULTS: research/2026-06-05-phon-142-ft-l2/ (RESULTS.md, scoring_prior.py).

File Structure

Create research/2026-06-05-phon-142-ft-l2/export_prior.py — emit l1Prior.json from scoring_prior.py's trained prior.
Create packages/web/workers/src/config/l1Prior.json — the bundled prior (small derived statistic; committed).
Modify packages/web/workers/src/lib/pronunciationScore.ts — L1-prior classification path.
Modify packages/web/workers/src/routes/audio.ts — load the prior, pass l1 + position into scoring.
Modify packages/audio/src/phonolex_audio/{server.py,__main__.py} — multi-model registry (off-the-shelf/ft-l2/ft-child).
Modify packages/web/frontend/src/components/tools/PronunciationViewer.tsx — add ft-l2 toggle option.
Tests alongside each.

Reuse: scoring_prior.py (the validated prior logic — port classify_with_prior to TS), VARIANT_ERROR_THRESHOLD (0.112), the existing phoneme_dots/phonemes cache pattern.

Task 1: Export the L1 prior → bundled JSON

Files: Create research/2026-06-05-phon-142-ft-l2/export_prior.py; Create packages/web/workers/src/config/l1Prior.json

[ ] Step 1: Write export_prior.py — build the prior from scoring_prior.py (use the same TRAIN-split source it was validated on; that's the prior we tuned p_min=0.15 against) and emit a compact JSON the Worker can index:

"""Emit l1Prior.json for the Worker. Structure:
{ "p_min": 0.15, "threshold": 0.112,
  "table": { "<canonical>|<l1>|<position>": { "<produced>": prob, ... }, ... } }
Run: uv run python export_prior.py"""
import json
from pathlib import Path
from importlib import import_module
import sys
sys.path.insert(0, str(Path(__file__).resolve().parent))
sp = import_module("scoring_prior")

prior = sp.build_prior(sp.load_train_rows())  # use the SAME builder the study used; adapt fn names
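OUT = Path(__file__).resolve().parents[2] / "packages/web/workers/src/config/l1Prior.json"
OUT.parent.mkdir(parents=True, exist_ok=True)  # config/ may not exist yet
table = {}
for (canon, l1, pos), dist in sp.iter_prior(prior):   # adapt to the real prior structure
    table[f"{canon}|{l1}|{pos}"] = {prod: round(float(p), 6) for prod, p in dist.items()}
# indent=0 inserts newlines without indentation; sort_keys keeps the committed file deterministic
payload = {"p_min": sp.P_MIN, "threshold": sp.THRESHOLD, "table": table}
OUT.write_text(json.dumps(payload, indent=0, sort_keys=True), encoding="utf-8")
print(f"wrote {OUT}: {len(table)} contexts")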

Adapt the function/attribute names to scoring_prior.py's actual API (read it first). If scoring_prior lacks a clean iterator, add a tiny iter_prior/to_dict helper there.
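If scoring_prior.py turns out to have no iterator, the helper could look roughly like the sketch below. It assumes the trained prior is a mapping from (canonical, l1, position) tuples to {produced: probability} dicts; that layout and the names iter_prior and flatten_prior are hypothetical and must be checked against the real file. The separator check is there because the bundled table joins the three key parts with "|".

```python
from typing import Dict, Iterator, Tuple

Context = Tuple[str, str, str]           # (canonical, l1, position)
Prior = Dict[Context, Dict[str, float]]  # ASSUMED layout, not confirmed by scoring_prior.py


def iter_prior(prior: Prior) -> Iterator[Tuple[Context, Dict[str, float]]]:
    """Yield (context, distribution) pairs in a stable, sorted order."""
    for context in sorted(prior):
        yield context, prior[context]


def flatten_prior(prior: Prior, sep: str = "|") -> Dict[str, Dict[str, float]]:
    """Build the "<canonical>|<l1>|<position>" table the Worker indexes."""
    table: Dict[str, Dict[str, float]] = {}
    for (canon, l1, pos), dist in iter_prior(prior):
        # A separator inside any key part would make the joined key ambiguous.
        if any(sep in part for part in (canon, l1, pos)):
            raise ValueError(f"separator {sep!r} found in context {(canon, l1, pos)!r}")
        table[sep.join((canon, l1, pos))] = {
            produced: round(float(p), 6) for produced, p in dist.items()
        }
    return table
```

With a helper like this, the loop in export_prior.py reduces to a single flatten_prior(prior) call.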

[ ] Step 2: Run it. cd research/2026-06-05-phon-142-ft-l2 && uv run python export_prior.py → confirm l1Prior.json was written, that it holds on the order of hundreds of contexts, and that the file is under ~1 MB. Spot-check one Spanish onset entry (e.g. v|Spanish|onset should assign b a non-trivial probability).
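The spot-check can be backed by a small structural check on the emitted payload. The sketch below is an assumption about what "looks sane" means here (three top-level fields, three-part keys, probabilities in [0, 1] that sum to at most 1 within a rounding tolerance); check_prior is a hypothetical helper, not part of scoring_prior.py.

```python
from typing import Any, Dict, List


def check_prior(payload: Dict[str, Any], tol: float = 1e-3) -> List[str]:
    """Return human-readable problems; an empty list means the payload looks sane."""
    problems: List[str] = []
    for field in ("p_min", "threshold", "table"):
        if field not in payload:
            problems.append(f"missing field: {field}")
    for key, dist in payload.get("table", {}).items():
        # Keys are expected to be "<canonical>|<l1>|<position>".
        if key.count("|") != 2:
            problems.append(f"malformed key: {key}")
        values = list(dist.values())
        total = sum(values)
        # Sums below 1 are allowed (a pruned table); sums above 1 are not.
        if any(p < 0 or p > 1 for p in values) or total > 1 + tol:
            problems.append(f"not a probability distribution: {key} (sum={total:.4f})")
    return problems
```

To use it, load the file with json.loads(Path("packages/web/workers/src/config/l1Prior.json").read_text(encoding="utf-8")) from the repo root and print whatever check_prior returns.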

[ ] Step 3: Commit (the JSON is a derived statistic — ships):

git add research/2026-06-05-phon-142-ft-l2/export_prior.py packages/web/workers/src/config/l1Prior.json
git commit -m "feat(phon-142): export L1 scoring-prior to bundled l1Prior.json"

Task 2: L1-prior classification in `pronunciationScore.ts`

Files: Modify packages/web/workers/src/lib/pronunciationScore.ts; Modify packages/web/workers/src/__tests__/pronunciationScore.test.ts

[ ] Step 1: Write the failing test. Append:

import { positionForIndex, classifyWithPrior, type L1Prior } from '../lib/pronunciationScore';

const PRIOR: L1Prior = {
  p_min: 0.15, threshold: 0.112,
  table: { 'v|Spanish|onset': { b: 0.4 }, 'v|Korean|onset': {} },
};

describe('L1-prior classification', () => {
  it('position is onset at index 0, coda at last, medial otherwise', () => {
    expect(positionForIndex(0, 4)).toBe('onset');
    expect(positionForIndex(3, 4)).toBe('coda');
    expect(positionForIndex(1, 4)).toBe('medial');
  });
  it('an L1-typical sub above the prior floor is variant even above cos_dist threshold', () => {
    // Spanish onset v->b, prior 0.4 >= p_min 0.15, cos_dist 0.30 > 0.112
    expect(classifyWithPrior('v','b','onset',0.30,'Spanish',PRIOR)).toBe('variant');
  });
  it('the same sub for an L1 that does not do it stays error', () => {
    expect(classifyWithPrior('v','b','onset',0.30,'Korean',PRIOR)).toBe('error');
  });
  it('below the cos_dist threshold is always variant regardless of L1', () => {
    expect(classifyWithPrior('v','b','onset',0.05,'Korean',PRIOR)).toBe('variant');
  });
});

[ ] Step 2: Run, expect FAIL. cd packages/web/workers && npx vitest run src/__tests__/pronunciationScore.test.ts

[ ] Step 3: Implement. Append to pronunciationScore.ts:

export interface L1Prior {
  p_min: number;
  threshold: number;
  table: Record<string, Record<string, number>>; // "canon|l1|position" -> { produced: prob }
}
export type Position = 'onset' | 'medial' | 'coda';

export function positionForIndex(i: number, n: number): Position {
  if (i === 0) return 'onset';
  if (i === n - 1) return 'coda';
  return 'medial';
}

/** L1-conditioned variant/error decision (ports scoring_prior.classify_with_prior). */
export function classifyWithPrior(
  canonical: string, produced: string, position: Position,
  cosDistValue: number, l1: string, prior: L1Prior,
): 'variant' | 'error' {
  if (cosDistValue < prior.threshold) return 'variant';
  const dist = prior.table[`${canonical}|${l1}|${position}`];
  const p = dist?.[produced] ?? 0;
  return p >= prior.p_min ? 'variant' : 'error';
}

Then extend scorePronunciation to accept opts?: { l1?: string; prior?: L1Prior }: when l1 + prior are present, classify each sub position with classifyWithPrior(canonical, produced, positionForIndex(i, canonical.length), cos_dist, l1, prior); deletions → error; the per-word variant_vs_error_class = worst position as before. When absent, keep the existing L1-agnostic threshold path. Set threshold_basis to 'l1_conditioned' when the prior is used, else 'l1_agnostic'.
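The per-word aggregation described above can be sketched in Python as a reference for the TS port. Several things here are assumptions: the aligned input shape (canonical, produced or None for a deletion, cos_dist), the "match" label for identical phonemes, the severity ordering, and the strict "<" on the L1-agnostic path. The existing scorePronunciation is the source of truth for all of them.

```python
from typing import Dict, List, Optional, Tuple

VARIANT_ERROR_THRESHOLD = 0.112  # L1-agnostic threshold named in this plan
SEVERITY = {"match": 0, "variant": 1, "error": 2}  # ASSUMED labels and ordering

Aligned = Tuple[str, Optional[str], float]  # (canonical, produced or None, cos_dist)


def position_for_index(i: int, n: int) -> str:
    if i == 0:
        return "onset"
    if i == n - 1:
        return "coda"
    return "medial"


def classify_sub(canonical: str, produced: str, position: str, cos_dist: float,
                 l1: Optional[str] = None, prior: Optional[dict] = None) -> str:
    if not l1 or prior is None:
        # L1-agnostic path: threshold only (strictness of "<" is an assumption).
        return "variant" if cos_dist < VARIANT_ERROR_THRESHOLD else "error"
    if cos_dist < prior["threshold"]:
        return "variant"
    p = prior["table"].get(f"{canonical}|{l1}|{position}", {}).get(produced, 0.0)
    return "variant" if p >= prior["p_min"] else "error"


def score_word(aligned: List[Aligned], l1: Optional[str] = None,
               prior: Optional[dict] = None) -> Dict[str, object]:
    n = len(aligned)
    classes: List[str] = []
    for i, (canonical, produced, cos_dist) in enumerate(aligned):
        if produced is None:
            classes.append("error")  # deletions are always errors
        elif produced == canonical:
            classes.append("match")
        else:
            classes.append(
                classify_sub(canonical, produced, position_for_index(i, n), cos_dist, l1, prior)
            )
    worst = max(classes, key=SEVERITY.__getitem__, default="match")
    basis = "l1_conditioned" if (l1 and prior is not None) else "l1_agnostic"
    return {"positions": classes, "variant_vs_error_class": worst, "threshold_basis": basis}
```

The same aligned word scored with and without l1 should differ only where the prior rescues an L1-typical substitution, which is the behavior Task 6's manual check looks for.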

[ ] Step 4: Run, expect PASS. Full file + npx tsc --noEmit.
[ ] Step 5: Fixture pin (optional but recommended). Have export_prior.py (or a small fixture file) record a handful of (canonical, produced, position, l1, cos_dist) tuples together with the class scoring_prior.classify_with_prior assigns to each, and assert in the TS test that classifyWithPrior returns the same class. Skip only if time-constrained.
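One way to generate such a fixture is sketched below. The classify_with_prior here is a stand-in that mirrors the TS classifyWithPrior from Step 3; in the real script it should be imported from scoring_prior.py instead, and its signature there may differ. The cases deliberately include the two boundaries (cos_dist equal to the threshold, probability equal to p_min), since those are where a port is most likely to drift.

```python
import json
from typing import Any, Dict, List, Tuple

Case = Tuple[str, str, str, str, float]  # (canonical, produced, position, l1, cos_dist)


def classify_with_prior(canonical: str, produced: str, position: str,
                        cos_dist: float, l1: str, prior: Dict[str, Any]) -> str:
    """Stand-in reference; replace with the import from scoring_prior.py."""
    if cos_dist < prior["threshold"]:
        return "variant"
    p = prior["table"].get(f"{canonical}|{l1}|{position}", {}).get(produced, 0.0)
    return "variant" if p >= prior["p_min"] else "error"


def build_fixture(prior: Dict[str, Any], cases: List[Case]) -> List[Dict[str, Any]]:
    """Pair each input tuple with the class the Python reference assigns."""
    return [
        {
            "canonical": c, "produced": pr, "position": pos, "l1": l1, "cos_dist": d,
            "expected": classify_with_prior(c, pr, pos, d, l1, prior),
        }
        for (c, pr, pos, l1, d) in cases
    ]


PRIOR = {
    "p_min": 0.15, "threshold": 0.112,
    "table": {"v|Spanish|onset": {"b": 0.4, "f": 0.15}, "v|Korean|onset": {}},
}
CASES: List[Case] = [
    ("v", "b", "onset", "Spanish", 0.30),   # L1-typical sub above the prior floor
    ("v", "b", "onset", "Korean", 0.30),    # same sub, L1 without that pattern
    ("v", "b", "onset", "Korean", 0.05),    # below the cos_dist threshold
    ("v", "b", "onset", "Korean", 0.112),   # boundary: not strictly below the threshold
    ("v", "f", "onset", "Spanish", 0.30),   # boundary: probability equal to p_min
]
FIXTURE_JSON = json.dumps(build_fixture(PRIOR, CASES), indent=2)
```

The TS test can then import the written JSON and loop over it, calling classifyWithPrior with the same PRIOR and comparing against each expected value.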

[ ] Step 6: Commit.

git add packages/web/workers/src/lib/pronunciationScore.ts packages/web/workers/src/__tests__/pronunciationScore.test.ts
git commit -m "feat(phon-142): L1-prior variant/error classification in scorer"

Task 3: Wire the prior into `/api/audio/pronounce`

Files: Modify packages/web/workers/src/routes/audio.ts; Modify packages/web/workers/src/__tests__/audio.test.ts

[ ] Step 1: Implement. Import the bundled prior + pass it through:
```
import l1Prior from '../config/l1Prior.json';
import { scorePronunciation, type L1Prior } from '../lib/pronunciationScore';
```
In the /pronounce handler, after the transcript + canonical are obtained, call:
```
const score = scorePronunciation(canonical, produced, cache,
  l1 ? { l1, prior: l1Prior as L1Prior } : undefined);
```
The response already echoes l1 and now carries threshold_basis: 'l1_conditioned' when l1 was supplied. Confirm resolveJsonModule is on (it is, per PHON-129 Task 4) and the JSON import type-checks.
[ ] Step 2: Tests. Add a route test that posts with l1=Spanish and asserts the response threshold_basis === 'l1_conditioned' (the D1-dependent scoring path stays graceful in the unseeded test env per the house pattern — assert on the seam, not a seeded-DB score). Run npx vitest run + npx tsc --noEmit.

[ ] Step 3: Commit.

git add packages/web/workers/src/routes/audio.ts packages/web/workers/src/__tests__/audio.test.ts
git commit -m "feat(phon-142): /api/audio/pronounce applies the L1 prior when l1 is set"

Task 4: `phonolex_audio` multi-model registry (serve `ft-l2`)

Files: Modify packages/audio/src/phonolex_audio/{server.py,__main__.py}; Modify packages/audio/tests/test_server.py

[ ] Step 1: Read server.py (build_app(transcriber, ft_transcriber=None), /transcribe, /compare) + __main__.py (--checkpoint, --ft-checkpoint) + transcribe_ft.py.
[ ] Step 2: Implement a registry. Generalize build_app to take a dict models: {name: transcriber} (e.g. {"off-the-shelf": ..., "ft-l2": FTTranscriber(faithful_ckpt), "ft-child": FTTranscriber(phon139_ckpt)}). /transcribe accepts an optional model form field (default off-the-shelf) that selects from the registry; /compare takes model_a/model_b (defaults preserve today's baseline-vs-ft behavior). Each model's response carries its own coverage/limitations. Add __main__ flags --ft-l2-checkpoint / --ft-child-checkpoint (keep --ft-checkpoint as an alias for --ft-child-checkpoint for backward compatibility). FTTranscriber already loads a faithful-style broad-40 checkpoint; ft-l2 is the faithful state.pt.
[ ] Step 3: Tests in test_server.py (these use a stub transcriber, no real model): /transcribe with model=ft-l2 routes to the right registry entry; unknown model → 400; /health reports the loaded model names. Run uv run python -m pytest packages/audio/tests/test_server.py -v.
[ ] Step 4: Worker side — the route's transcriber field already maps off-the-shelf→/transcribe, ft→/compare. Update fetchTranscript so transcriber: 'ft-l2' posts to /transcribe with model=ft-l2 (and keep ft→ft-child or /compare as-is). Add the value to the route's accepted set + the frontend type.
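The registry lookup and the flag aliasing from Step 2 can be sketched independently of the web framework. Everything named below (resolve_model, UnknownModelError, build_parser, planned_models) is hypothetical, and the transcriber objects are left untyped because their interface lives in server.py and transcribe_ft.py. The alias relies on argparse accepting several option strings for one argument with an explicit dest.

```python
import argparse
from typing import Any, Dict, List, Optional

DEFAULT_MODEL = "off-the-shelf"


class UnknownModelError(KeyError):
    """Raised for a model name that is not in the registry (map to HTTP 400)."""


def resolve_model(models: Dict[str, Any], name: Optional[str] = None) -> Any:
    """Pick a transcriber from the registry, defaulting to the off-the-shelf one."""
    key = name or DEFAULT_MODEL
    if key not in models:
        raise UnknownModelError(f"unknown model {key!r}; loaded: {sorted(models)}")
    return models[key]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="phonolex_audio")
    parser.add_argument("--checkpoint", default=None)
    parser.add_argument("--ft-l2-checkpoint", default=None)
    # --ft-checkpoint stays accepted as an alias for --ft-child-checkpoint.
    parser.add_argument("--ft-child-checkpoint", "--ft-checkpoint",
                        dest="ft_child_checkpoint", default=None)
    return parser


def planned_models(args: argparse.Namespace) -> List[str]:
    """Names that would be registered for a given set of flags."""
    names = [DEFAULT_MODEL]
    if args.ft_l2_checkpoint:
        names.append("ft-l2")
    if args.ft_child_checkpoint:
        names.append("ft-child")
    return names
```

The /transcribe handler would call resolve_model with the model form field and translate UnknownModelError into the 400 that Step 3 tests for; /health can report the registry's keys directly.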

[ ] Step 5: Commit.

git add packages/audio/src/phonolex_audio/ packages/audio/tests/test_server.py packages/web/workers/src/routes/audio.ts
git commit -m "feat(phon-142): phonolex_audio multi-model registry + ft-l2 serving"

Task 5: `/dev/pronounce` — add `ft-l2` to the toggle

Files: Modify packages/web/frontend/src/components/tools/PronunciationViewer.tsx; Modify its test; Modify packages/web/frontend/src/services/audioApi.ts (the transcriber union)

[ ] Step 1: Implement. Add 'ft-l2' to the transcriber union in audioApi.ts ('off-the-shelf' | 'ft-l2' | 'ft') and to the ToggleButtonGroup in PronunciationViewer (three options). Default stays off-the-shelf. The L1 dropdown already sends l1 — no change needed; it now drives the prior server-side.
[ ] Step 2: Test. Extend the component test: render, confirm the ft-l2 toggle option is present and selectable.
[ ] Step 3: Run the frontend matrix. cd packages/web/frontend && npx vitest run && npx tsc --noEmit && npm run build.

[ ] Step 4: Commit.

git add packages/web/frontend/src/components/tools/PronunciationViewer.tsx packages/web/frontend/src/components/tools/PronunciationViewer.test.tsx packages/web/frontend/src/services/audioApi.ts
git commit -m "feat(phon-142): /dev/pronounce ft-l2 toggle option"

Task 6: Full matrix + manual dev verification

[ ] Step 1: From the repo root, run each suite in its own subshell so the relative paths resolve: (cd packages/web/workers && npm test && npx tsc --noEmit) && (cd packages/web/frontend && npx vitest run && npx tsc --noEmit && npm run build) && uv run python -m pytest packages/audio/tests/.
[ ] Step 2 (manual, optional now): start phonolex_audio with --ft-l2-checkpoint research/2026-06-05-phon-142-ft-l2/ckpt/full_s17/state.pt, the worker, and the frontend; on /dev/pronounce select an L2 clip, pick ft-l2 + the speaker's L1, and confirm the per-position class reflects the L1 prior (an L1-typical sub shows variant where the agnostic path showed error).

Self-Review

Spec coverage: serve ft-l2 → Task 4 ✓; ship prior → Task 1 ✓; scorer applies prior → Task 2 ✓; route wiring → Task 3 ✓; dev-page toggle → Task 5 ✓; dev-only/local (no prod/deploy) ✓.

Placeholder scan: Task 1's export_prior.py adapts to scoring_prior.py's real API (read-first noted) — the only "adapt" marker, inherent since the prior's internal structure lives in that file; everything else is literal code or pinned tests.

Type consistency: L1Prior/Position/classifyWithPrior/positionForIndex consistent across Tasks 1–3; threshold_basis extends to 'l1_conditioned'; transcriber union 'off-the-shelf'|'ft-l2'|'ft' consistent across route + service + component.

Dev-only guard: nothing here deploys or touches the production seed/Worker config beyond the bundled l1Prior.json (a small committed asset); the ft-l2 model stays local (the 3.5 GB checkpoint is gitignored; served by the local phonolex_audio).
