Beta Audio Tab Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Ship the user-facing Beta Audio tab: record, upload, or batch word-level productions against a corpus-verified target, and get back the faithful transcript plus a per-position deviation overlay (the hero), along with a session-level source-attribution read (the bonus).
Architecture: Three slices, built back-to-front so each is testable on its own. (1) Serving — the host analyze() returns the produced transcript + canonical + the raw per-production feature vector, plus a new stateless /attribute endpoint that mean-pools feature vectors and classifies (the session read). (2) Worker — a /api/audio/analyze route (target→canonical lookup, forward to host /analyze) and /api/audio/attribute proxy. (3) Frontend — a new AudioAnalysisTool tool tab with a session model, target verification, capture/batch input, the deviation overlay, and the attribution panel.
Tech Stack: Python (FastAPI host, phonolex_audio), TypeScript (Hono Worker on Cloudflare + D1), React + MUI + Vitest (frontend). Spec: docs/superpowers/specs/2026-06-14-audio-beta-tab-design.md.
Branch: feature/phon-145-audio-beta-tab (off release/v6-audio).
Standing constraints: Local-only — no hosted endpoint, no deploy. "feature vectors", not "embeddings". Decision support, clinician-in-the-loop; "patterns like…", never "has X". The host runs on 127.0.0.1:8000 (matches wrangler.toml).
Shared contract (used by every slice)
// The /analyze response (host -> worker -> frontend), and AnalyzePosition/Attribution.
interface AnalyzePosition { phone: string; deviation: number | null; nearest: string | null; }
interface AnalyzeAttribution { source: string; distances: Record<string, number>; }
interface AnalyzeResult {
  canonical: string[];
  produced: string[];                       // the faithful transcript ("what we heard")
  positions: AnalyzePosition[];
  attribution: AnalyzeAttribution | null;   // per-clip (noisy on short input)
  features: number[] | null;                // raw per-production 6-vector for session pooling
}
// The /attribute response (session-level read over pooled features):
type SessionAttribution = AnalyzeAttribution;
Python analyze() returns the same keys as plain lists and dicts (every key is a single word, so no snake_case-to-camelCase mapping is needed). features is the raw [g,x,cg,cx,rate,a] vector that analyzer._attribution_features already computes (the per-production mean over slots); it is exposed, not recomputed.
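A runtime guard for this contract can be sketched as follows. It is an illustration, not part of the plan: isAnalyzeResult and its helper functions are hypothetical names, and the interfaces are copied from the contract above. Something like it may help wherever a slice receives the /analyze JSON as an untyped value.

```typescript
// Interfaces copied from the shared contract.
interface AnalyzePosition { phone: string; deviation: number | null; nearest: string | null; }
interface AnalyzeAttribution { source: string; distances: Record<string, number>; }
interface AnalyzeResult {
  canonical: string[];
  produced: string[];
  positions: AnalyzePosition[];
  attribution: AnalyzeAttribution | null;
  features: number[] | null;
}

// Hypothetical helpers (not in the plan) for narrowing untyped JSON.
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}
function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === 'string');
}
function isNumberArray(v: unknown): v is number[] {
  return Array.isArray(v) && v.every((x) => typeof x === 'number');
}
function isNullOr(v: unknown, kind: 'string' | 'number'): boolean {
  return v === null || typeof v === kind;
}
function isPosition(v: unknown): v is AnalyzePosition {
  return isRecord(v)
    && typeof v['phone'] === 'string'
    && isNullOr(v['deviation'], 'number')
    && isNullOr(v['nearest'], 'string');
}
function isAttribution(v: unknown): v is AnalyzeAttribution {
  if (!isRecord(v) || typeof v['source'] !== 'string') return false;
  const distances = v['distances'];
  if (!isRecord(distances)) return false;
  return Object.keys(distances).every((k) => typeof distances[k] === 'number');
}

// True only when every contract key is present with the expected shape.
function isAnalyzeResult(v: unknown): v is AnalyzeResult {
  if (!isRecord(v)) return false;
  const positions = v['positions'];
  const attribution = v['attribution'];
  const features = v['features'];
  return isStringArray(v['canonical'])
    && isStringArray(v['produced'])
    && Array.isArray(positions) && positions.every(isPosition)
    && (attribution === null || isAttribution(attribution))
    && (features === null || isNumberArray(features));
}
```

A payload in the pre-Task-1 shape ({positions, attribution} only) fails this guard, which makes a stale host easy to spot during local testing.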
Task 1: Serving — analyze() returns produced + canonical + features
Files:
- Modify: packages/audio/src/phonolex_audio/analyzer.py (the analyze method)
- Test: packages/audio/tests/test_analyze_smoke.py (extend the slow smoke assertions)
The hero needs the produced transcript; the session read needs the raw feature vector. analyze() currently returns only {positions, attribution} and drops both.
- [ ] Step 1: Update analyze() in analyzer.py to:
def analyze(self, audio_bytes: bytes, canon: list[str]) -> dict:
    """audio + canonical phones -> {canonical, produced, positions, attribution, features}."""
    e, centers, produced = self._emit_and_align(audio_bytes, canon)
    result = {
        "canonical": list(canon),
        "produced": produced,
        "positions": self.positions(e, centers, canon),
        "attribution": None,
        "features": None,
    }
    if self.attribution is not None:
        feats = self._attribution_features(e, centers, canon, produced)
        if feats is not None:
            result["features"] = [float(x) for x in feats]
            result["attribution"] = self.attribution.classify(feats)
    return result
- [ ] Step 2: Extend the slow smoke (test_analyze_smoke.py::test_analyze_runs_end_to_end). After the existing asserts, add:
assert isinstance(out["produced"], list)
assert out["canonical"] == (row.get("canonical") or [])
if attr is not None:
    assert isinstance(out["features"], list) and len(out["features"]) == 6
- [ ] Step 3: Run the slow smoke (needs the drive + keeper):
Run: PYTORCH_ENABLE_MPS_FALLBACK=1 uv run --package phonolex-audio --extra dev --extra inference pytest packages/audio/tests/test_analyze_smoke.py -v -m slow
Expected: PASS (produced/canonical/features present). Fast coverage of the full shape is at the worker layer (Task 3, mocked host).
- [ ] Step 4: Commit
git add packages/audio/src/phonolex_audio/analyzer.py packages/audio/tests/test_analyze_smoke.py
git commit -m "feat(audio): analyze() returns produced transcript + canonical + raw feature vector"
Task 2: Serving — /attribute endpoint (session-level pooled read)
Files:
- Modify: packages/audio/src/phonolex_audio/server.py (add /attribute)
- Test: packages/audio/tests/test_server.py (add /attribute tests)
A stateless endpoint: given a list of per-production raw feature vectors, mean-pool them and classify with the baked model — reproducing the validated subject-level aggregation. Keeps the baked model + math in Python.
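The pooling arithmetic can be sketched as below. This only illustrates the mean-pool step: the plan keeps the real math in Python (np.mean over axis 0), and meanPool is a hypothetical name written in TypeScript so that frontend and worker readers can check expected values by hand. The classification step is not shown because the baked model's internals are not described here.

```typescript
// Illustration only: mirrors np.mean(np.array(feats), axis=0) from the host.
// `meanPool` is a hypothetical helper, not something the plan asks you to ship.
function meanPool(vectors: number[][]): number[] {
  const first = vectors[0];
  if (first === undefined) {
    throw new Error('features must be a non-empty list of vectors');
  }
  const dim = first.length;
  // One running total per feature dimension.
  const sums: number[] = first.map(() => 0);
  for (const vec of vectors) {
    if (vec.length !== dim) {
      throw new Error('all feature vectors must have the same length');
    }
    vec.forEach((value, i) => {
      sums[i] = (sums[i] ?? 0) + value;
    });
  }
  // Divide each total by the number of productions in the session.
  return sums.map((total) => total / vectors.length);
}
```

With the vectors used in the Step 1 test, meanPool([[1, 1], [3, 3]]) gives [2, 2], whose sum is the 4.0 the fake classifier echoes back.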
- [ ] Step 1: Write the failing test in test_server.py (extend FakeAnalyzer with an attribution stub):
class FakeAttribution:
    def classify(self, feats):
        # return the sum of the pooled vector so the test can assert pooling happened
        return {"source": "typical", "distances": {"typical": float(sum(feats))}}


def attribute_client() -> TestClient:
    fa = FakeAnalyzer()
    fa.attribution = FakeAttribution()
    return TestClient(build_app(transcriber=FakeTranscriber(), analyzer=fa))


def test_attribute_pools_feature_vectors():
    r = attribute_client().post("/attribute", json={"features": [[1, 1], [3, 3]]})
    assert r.status_code == 200
    # mean of [1,1] and [3,3] = [2,2] -> sum 4
    assert r.json()["distances"]["typical"] == 4.0


def test_attribute_400_when_no_analyzer():
    r = client().post("/attribute", json={"features": [[1, 2]]})
    assert r.status_code == 400


def test_attribute_400_on_empty_list():
    r = attribute_client().post("/attribute", json={"features": []})
    assert r.status_code == 400
(Note: FakeAnalyzer in test_server.py has no attribution attribute today, so assigning fa.attribution in the helper is fine; the real analyzer always defines the attribute, though it can be None when no attribution model is loaded.)
- [ ] Step 2: Run to verify it fails
Run: uv run --package phonolex-audio --extra dev pytest packages/audio/tests/test_server.py -k attribute -v
Expected: FAIL (404, no /attribute route)
- [ ] Step 3: Add the endpoint in server.py, right after /analyze:
@app.post("/attribute")
async def attribute(payload: dict) -> dict:
    """Session-level attribution: mean-pool a list of per-production raw feature
    vectors and classify. Stateless: the session lives in the caller."""
    import numpy as np

    analyzer = app.state.analyzer
    if analyzer is None or getattr(analyzer, "attribution", None) is None:
        raise HTTPException(status_code=400, detail="Attribution model not loaded")
    feats = payload.get("features")
    if not isinstance(feats, list) or not feats:
        raise HTTPException(status_code=400, detail="features must be a non-empty list of vectors")
    pooled = np.mean(np.array(feats, dtype=float), axis=0)
    return analyzer.attribution.classify(pooled)
- [ ] Step 4: Run to verify it passes
Run: uv run --package phonolex-audio --extra dev pytest packages/audio/tests/test_server.py -k attribute -v
Expected: PASS (3 passed)
- [ ] Step 5: Run the full host suite (no regressions)
Run: uv run --package phonolex-audio --extra dev pytest packages/audio/tests/ -q -m "not slow"
Expected: PASS
- [ ] Step 6: Commit
git add packages/audio/src/phonolex_audio/server.py packages/audio/tests/test_server.py
git commit -m "feat(audio): /attribute endpoint — mean-pool session feature vectors + classify"
Task 3: Worker — POST /api/audio/analyze
Files:
- Modify: packages/web/workers/src/routes/audio.ts (add the route)
- Test: packages/web/workers/src/__tests__/audio.test.ts (the existing worker vitest file; Step 1 appends to it)
Mirrors /pronounce's structure (validate multipart, lexicon canonical lookup, warming-aware forward) but forwards to the host /analyze and returns its shape. Canonical lookup happens first (the host needs it to align).
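The warming-aware forward reduces to a small status mapping, sketched here as a pure function. UpstreamOutcome, WorkerReply, and mapUpstreamToReply are hypothetical names used only for this illustration; the mapping itself follows the route code in Step 3 (network failure or 503 becomes a warming 503, any other non-2xx becomes 502, unparsable JSON becomes 502).

```typescript
// Hypothetical types for illustration; the real route works on fetch Responses.
type UpstreamOutcome =
  | { kind: 'network-error' }                            // fetch() itself threw
  | { kind: 'response'; status: number; body: unknown }; // body === null means unparsable JSON

interface WorkerReply { status: number; body: unknown; }

const WARMING_BODY = { warming: true, detail: 'Inference host is warming up. Retry shortly.' };

// Maps what happened upstream to what the worker sends back to the frontend.
function mapUpstreamToReply(outcome: UpstreamOutcome): WorkerReply {
  if (outcome.kind === 'network-error') return { status: 503, body: WARMING_BODY };
  if (outcome.status === 503) return { status: 503, body: WARMING_BODY };
  // Same range check as Response.ok (200 through 299).
  if (outcome.status < 200 || outcome.status > 299) {
    return { status: 502, body: { detail: `Inference host error (${outcome.status})` } };
  }
  if (outcome.body === null) {
    return { status: 502, body: { detail: 'Inference host returned invalid JSON' } };
  }
  // Pass the host payload through unchanged.
  return { status: 200, body: outcome.body };
}
```

Treating a thrown fetch as "warming" rather than as a hard error matches the local-only constraint: with the host on 127.0.0.1, a refused connection most likely means the host has not finished starting.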
- [ ] Step 1: Write the failing worker test by APPENDING a describe to packages/web/workers/src/__tests__/audio.test.ts (it already imports { SELF, fetchMock } from cloudflare:test and sets up beforeAll/afterEach). CRITICAL harness facts: (a) the host is mocked with fetchMock.get('http://127.0.0.1:8000').intercept({ path, method }).reply(...), NOT vi.stubGlobal; (b) D1 is NOT seeded in CI, so the canonical lookup must be tested gracefully (the existing api/audio tests do if seeded {...} else {accept 4xx/5xx}); (c) because /analyze does the D1 lookup BEFORE the host call, the host interceptor is only consumed when D1 is seeded, so detect seeding first to avoid a pending-interceptor failure (afterEach asserts none pending).
function analyzeForm(target: string): FormData {
const fd = new FormData();
fd.append('audio', new Blob([new Uint8Array([1, 2, 3])], { type: 'audio/wav' }), 'clip.wav');
fd.append('target', target);
return fd;
}
describe('POST /api/audio/analyze', () => {
it('returns 400 when no audio part is present', async () => {
const fd = new FormData(); fd.append('target', 'cat');
const res = await SELF.fetch('http://localhost/api/audio/analyze', { method: 'POST', body: fd });
expect(res.status).toBe(400);
});
it('returns 400 when target is missing', async () => {
const fd = new FormData();
fd.append('audio', new Blob([new Uint8Array([1, 2, 3])], { type: 'audio/wav' }), 'clip.wav');
const res = await SELF.fetch('http://localhost/api/audio/analyze', { method: 'POST', body: fd });
expect(res.status).toBe(400);
});
it('forwards to host /analyze when D1 is seeded (else 4xx/5xx — D1 unseeded in CI)', async () => {
const seeded = (await SELF.fetch('http://localhost/api/words/cat')).status === 200;
if (seeded) {
fetchMock.get('http://127.0.0.1:8000').intercept({ path: '/analyze', method: 'POST' })
.reply(200, { canonical: ['k','æ','t'], produced: ['k','æ','t'], positions: [], attribution: null, features: null });
const res = await SELF.fetch('http://localhost/api/audio/analyze', { method: 'POST', body: analyzeForm('cat') });
expect(res.status).toBe(200);
expect((await res.json() as { produced: string[] }).produced).toEqual(['k','æ','t']);
} else {
const res = await SELF.fetch('http://localhost/api/audio/analyze', { method: 'POST', body: analyzeForm('cat') });
expect([404, 500]).toContain(res.status);
}
});
it('returns 404 (or 500 unseeded) for a word not in the lexicon', async () => {
const res = await SELF.fetch('http://localhost/api/audio/analyze', { method: 'POST', body: analyzeForm('zzzznotaword') });
expect([404, 500]).toContain(res.status);
});
});
- [ ] Step 2: Run to verify it fails
Run: cd packages/web/workers && npm test -- audio
Expected: FAIL (route 404s)
- [ ] Step 3: Add the route in audio.ts, after /pronounce:
// ============================================================================
// /analyze: v6 trajectory model (target -> canonical, forward to host /analyze)
// ============================================================================
audio.post('/analyze', async (c) => {
  const form = await c.req.formData().catch(() => null);
  if (!form) return c.json({ detail: 'Missing required multipart field: audio' }, 400);
  const fileEntry = form.get('audio');
  if (!fileEntry || typeof fileEntry === 'string') {
    return c.json({ detail: 'Missing required multipart field: audio' }, 400);
  }
  const file = fileEntry as File;
  if (file.size > MAX_BYTES) return c.json({ detail: 'Audio exceeds 10 MB limit' }, 400);
  if (file.type && !file.type.startsWith('audio/')) {
    return c.json({ detail: `Unsupported content type: ${file.type}` }, 400);
  }
  const target = form.get('target');
  if (typeof target !== 'string' || !target.trim()) {
    return c.json({ detail: 'Missing required field: target' }, 400);
  }
  const base = c.env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
  if (!base) return c.json({ detail: 'Audio inference host not configured' }, 500);

  // canonical phonemes from D1 (host needs them to align)
  const row = await c.env.DB.prepare('SELECT phonemes FROM words WHERE word = ? LIMIT 1')
    .bind(target.trim().toLowerCase())
    .first<{ phonemes: string | null }>();
  if (!row || !row.phonemes) {
    return c.json({ detail: `Word not in lexicon: ${target}` }, 404);
  }
  const canonical = JSON.parse(row.phonemes) as string[];

  const fwd = new FormData();
  fwd.append('audio', file, file.name || 'clip');
  fwd.append('canonical', JSON.stringify(canonical));

  let upstream: Response;
  try {
    upstream = await fetch(`${base}/analyze`, { method: 'POST', body: fwd });
  } catch {
    return c.json({ warming: true, detail: 'Inference host is warming up. Retry shortly.' }, 503);
  }
  if (upstream.status === 503) {
    return c.json({ warming: true, detail: 'Inference host is warming up. Retry shortly.' }, 503);
  }
  if (!upstream.ok) return c.json({ detail: `Inference host error (${upstream.status})` }, 502);
  const body = await upstream.json().catch(() => null);
  if (body === null) return c.json({ detail: 'Inference host returned invalid JSON' }, 502);
  return c.json(body);
});
- [ ] Step 4: Run to verify it passes
Run: cd packages/web/workers && npm test -- audio
Expected: PASS
- [ ] Step 5: Commit
git add packages/web/workers/src/routes/audio.ts packages/web/workers/src/__tests__/audio.test.ts
git commit -m "feat(audio-api): /api/audio/analyze — target->canonical lookup + forward to host /analyze"
Task 4: Worker — POST /api/audio/attribute
Files:
- Modify: packages/web/workers/src/routes/audio.ts
- Test: same audio worker test file
Thin JSON proxy (no D1): forward {features: number[][]} to host /attribute, warming-aware.
- [ ] Step 1: Write the failing test: append to audio.test.ts (uses fetchMock, no D1):
describe('POST /api/audio/attribute', () => {
it('forwards features and returns the host read', async () => {
fetchMock.get('http://127.0.0.1:8000').intercept({ path: '/attribute', method: 'POST' })
.reply(200, { source: 'accent', distances: { accent: 0.1 } });
const res = await SELF.fetch('http://localhost/api/audio/attribute', {
method: 'POST', headers: { 'content-type': 'application/json' },
body: JSON.stringify({ features: [[1, 2, 3, 4, 5, 6]] }),
});
expect(res.status).toBe(200);
expect((await res.json() as { source: string }).source).toBe('accent');
});
it('returns 400 on an empty feature list', async () => {
const res = await SELF.fetch('http://localhost/api/audio/attribute', {
method: 'POST', headers: { 'content-type': 'application/json' },
body: JSON.stringify({ features: [] }),
});
expect(res.status).toBe(400);
});
});
- [ ] Step 2: Run to verify it fails: cd packages/web/workers && npm test -- audio → FAIL.
- [ ] Step 3: Add the route in audio.ts:
audio.post('/attribute', async (c) => {
  const base = c.env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
  if (!base) return c.json({ detail: 'Audio inference host not configured' }, 500);
  const payload = await c.req.json().catch(() => null);
  if (!payload || !Array.isArray(payload.features) || payload.features.length === 0) {
    return c.json({ detail: 'features must be a non-empty list of vectors' }, 400);
  }
  let upstream: Response;
  try {
    upstream = await fetch(`${base}/attribute`, {
      method: 'POST', headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ features: payload.features }),
    });
  } catch {
    return c.json({ warming: true, detail: 'Inference host is warming up. Retry shortly.' }, 503);
  }
  if (upstream.status === 503) return c.json({ warming: true }, 503);
  if (!upstream.ok) return c.json({ detail: `Inference host error (${upstream.status})` }, 502);
  const body = await upstream.json().catch(() => null);
  if (body === null) return c.json({ detail: 'Inference host returned invalid JSON' }, 502);
  return c.json(body);
});
- [ ] Step 4: Run to verify it passes: PASS.
- [ ] Step 5: Commit
git add packages/web/workers/src/routes/audio.ts packages/web/workers/src/__tests__/audio.test.ts
git commit -m "feat(audio-api): /api/audio/attribute — proxy session feature pooling to host"
Task 5: Frontend — audioAnalysisApi service + types
Files:
- Create: packages/web/frontend/src/services/audioAnalysisApi.ts
- Test: packages/web/frontend/src/services/audioAnalysisApi.test.ts
The service owns the three calls: coverage check (existing words API), analyze, and attribute. Mirror an existing service for the VITE_API_URL base and fetch conventions. The old audioApi is gone, so use lib/generationApi.ts or any services/*.ts still present for the base-URL pattern.
- [ ] Step 1: Write the failing test
import { describe, it, expect, vi, afterEach } from 'vitest';
import { checkCoverage, analyzeProduction, attributeSession } from './audioAnalysisApi';
afterEach(() => vi.unstubAllGlobals());
describe('audioAnalysisApi', () => {
it('checkCoverage: supported when the word has phonemes', async () => {
vi.stubGlobal('fetch', vi.fn(async () =>
new Response(JSON.stringify({ word: 'cat', phonemes: ['k','æ','t'] }), { status: 200 })));
const r = await checkCoverage('cat');
expect(r.supported).toBe(true);
expect(r.canonical).toEqual(['k','æ','t']);
});
it('checkCoverage: unsupported on 404', async () => {
vi.stubGlobal('fetch', vi.fn(async () => new Response('{}', { status: 404 })));
expect((await checkCoverage('zzzz')).supported).toBe(false);
});
it('attributeSession: posts the feature list', async () => {
const spy = vi.fn(async () =>
new Response(JSON.stringify({ source: 'typical', distances: { typical: 0 } }), { status: 200 }));
vi.stubGlobal('fetch', spy);
const r = await attributeSession([[1,2,3,4,5,6]]);
expect(r.source).toBe('typical');
});
});
- [ ] Step 2: Run to verify it fails: cd packages/web/frontend && npm test -- audioAnalysisApi → FAIL.
- [ ] Step 3: Implement audioAnalysisApi.ts:
const API = import.meta.env.VITE_API_URL ?? '';
export interface AnalyzePosition { phone: string; deviation: number | null; nearest: string | null; }
export interface AnalyzeAttribution { source: string; distances: Record<string, number>; }
export interface AnalyzeResult {
canonical: string[]; produced: string[]; positions: AnalyzePosition[];
attribution: AnalyzeAttribution | null; features: number[] | null;
}
export interface Coverage { supported: boolean; canonical: string[]; }
export async function checkCoverage(word: string): Promise<Coverage> {
const res = await fetch(`${API}/api/words/${encodeURIComponent(word.trim().toLowerCase())}`);
if (!res.ok) return { supported: false, canonical: [] };
const body = await res.json();
const canonical = Array.isArray(body.phonemes) ? body.phonemes : [];
return { supported: canonical.length > 0, canonical };
}
export type AnalyzeOutcome =
| { kind: 'ok'; result: AnalyzeResult }
| { kind: 'warming' }
| { kind: 'error'; detail: string };
export async function analyzeProduction(target: string, audio: Blob): Promise<AnalyzeOutcome> {
const fd = new FormData();
fd.append('audio', audio, 'clip.wav');
fd.append('target', target);
let res: Response;
try { res = await fetch(`${API}/api/audio/analyze`, { method: 'POST', body: fd }); }
catch { return { kind: 'warming' }; }
if (res.status === 503) return { kind: 'warming' };
if (!res.ok) return { kind: 'error', detail: (await res.json().catch(() => ({}))).detail ?? `Error ${res.status}` };
return { kind: 'ok', result: await res.json() };
}
export async function attributeSession(features: number[][]): Promise<AnalyzeAttribution | null> {
if (features.length === 0) return null;
const res = await fetch(`${API}/api/audio/attribute`, {
method: 'POST', headers: { 'content-type': 'application/json' },
body: JSON.stringify({ features }),
});
if (!res.ok) return null;
return res.json();
}
(Confirm the /api/words/:word response key is phonemes by reading packages/web/workers/src/routes/words.ts; adjust the coverage parse if it differs — e.g. has_phonology + phonemes.)
- [ ] Step 4: Run to verify it passes: PASS.
- [ ] Step 5: Commit
git add packages/web/frontend/src/services/audioAnalysisApi.ts packages/web/frontend/src/services/audioAnalysisApi.test.ts
git commit -m "feat(audio-tab): audioAnalysisApi service — coverage, analyze, attribute"
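If the words route turns out to differ from what Step 3 assumes, the coverage parse could be made tolerant along these lines. This is a sketch under stated assumptions, not the confirmed response shape: parseCoverage is a hypothetical helper, has_phonology is only the example key mentioned in the Step 3 note, and the JSON-string branch is a guess based on D1 storing phonemes as a JSON string in the worker's lookup.

```typescript
interface Coverage { supported: boolean; canonical: string[]; }

// Hypothetical tolerant parser for the /api/words/:word body.
// ASSUMPTIONS: `phonemes` may arrive as an array or as a JSON-encoded string,
// and an explicit `has_phonology: false` means "not supported".
function parseCoverage(body: unknown): Coverage {
  const none: Coverage = { supported: false, canonical: [] };
  if (typeof body !== 'object' || body === null) return none;
  const record = body as Record<string, unknown>;
  if (record['has_phonology'] === false) return none;

  let raw: unknown = record['phonemes'];
  if (typeof raw === 'string') {
    // Stored form: a JSON string such as '["k","æ","t"]'.
    try { raw = JSON.parse(raw); } catch { return none; }
  }
  if (!Array.isArray(raw)) return none;

  // Keep only string phones; anything else is dropped.
  const canonical = raw.filter((p): p is string => typeof p === 'string');
  return { supported: canonical.length > 0, canonical };
}
```

Once words.ts has been read and the real key is known, the branches that do not apply should be deleted rather than kept as speculative handling.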
Task 6: Frontend — TargetField (corpus-verified target input)
Files:
- Create: packages/web/frontend/src/components/tools/AudioAnalysisTool/TargetField.tsx
- Test: .../TargetField.test.tsx
A controlled text field that debounces checkCoverage and reports { target, canonical, supported } upward. Shows the canonical preview when supported, a "not in our dictionary" hint when not. Mirror MUI usage from an existing tool (e.g. LookupTool.tsx search field).
- [ ] Step 1: Write the failing test: render, type a supported word (mock checkCoverage), assert the canonical preview appears and onResolved fires with supported: true; type an uncovered word, assert the unsupported hint.
import { render, screen, waitFor } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { vi } from 'vitest';
import TargetField from './TargetField';
import * as api from '../../../services/audioAnalysisApi';
it('reports coverage and previews canonical for a supported word', async () => {
vi.spyOn(api, 'checkCoverage').mockResolvedValue({ supported: true, canonical: ['k','æ','t'] });
const onResolved = vi.fn();
render(<TargetField onResolved={onResolved} />);
await userEvent.type(screen.getByRole('textbox'), 'cat');
await waitFor(() => expect(onResolved).toHaveBeenCalledWith(
expect.objectContaining({ target: 'cat', supported: true, canonical: ['k','æ','t'] })));
});
- [ ] Step 2: Run → FAIL. cd packages/web/frontend && npm test -- TargetField
- [ ] Step 3: Implement TargetField.tsx with props { value?: string; onResolved: (r: {target:string; canonical:string[]; supported:boolean}) => void }. A TextField (MUI) + debounced (setTimeout ~300ms) checkCoverage; render canonical as phoneme chips when supported, a helper-text warning when not. Keep it presentational + one effect; no session logic here.
- [ ] Step 4: Run → PASS.
- [ ] Step 5: Commit: git add the two files; git commit -m "feat(audio-tab): TargetField — debounced corpus coverage check + canonical preview".
Task 7: Frontend — CaptureControls + BatchUpload
Files:
- Create: .../AudioAnalysisTool/CaptureControls.tsx, .../BatchUpload.tsx
- Test: .../CaptureControls.test.tsx, .../BatchUpload.test.tsx
CaptureControls: record (MediaRecorder) or file-pick a single clip → emits a Blob. BatchUpload: multi-file <input> → emits draft rows { filename, seedTarget, audio }[] where seedTarget is the filename stem (cat.wav → cat).
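The filename-stem rule can be checked in isolation with a small sketch. seedTargetFor and toDraftSeeds are hypothetical helper names; the regular expression is the one Step 3 specifies, and the audio field is left out so the example has no DOM dependencies.

```typescript
// Draft row without the audio payload, for illustration only.
interface DraftSeed { filename: string; seedTarget: string; }

// Strips the final extension: everything from the last dot to the end,
// provided at least one character follows that dot.
function seedTargetFor(filename: string): string {
  return filename.replace(/\.[^.]+$/, '');
}

// One draft seed per selected filename, in selection order.
function toDraftSeeds(filenames: string[]): DraftSeed[] {
  return filenames.map((filename) => ({ filename, seedTarget: seedTargetFor(filename) }));
}
```

Two edge cases follow from the regex: take.2.wav seeds as take.2 (only the last extension is removed), and a dot-file such as .hidden seeds as an empty string. A seed like dog-1 is only a starting value; whether it resolves is still up to the TargetField coverage check.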
- [ ] Step 1: Write the failing tests. For CaptureControls, test the file-pick path (MediaRecorder is hard to unit-test, so cover upload and leave record to manual testing): supplying a file via the input fires onClip(blob). For BatchUpload: selecting two files fires onRows with two rows whose seedTarget values are the filename stems.
// BatchUpload.test.tsx
it('emits one row per file with filename-stem seed targets', async () => {
const onRows = vi.fn();
render(<BatchUpload onRows={onRows} />);
const input = screen.getByTestId('batch-input') as HTMLInputElement;
await userEvent.upload(input, [
new File([new Uint8Array([1])], 'cat.wav', { type: 'audio/wav' }),
new File([new Uint8Array([2])], 'dog-1.wav', { type: 'audio/wav' }),
]);
expect(onRows).toHaveBeenCalledWith([
expect.objectContaining({ seedTarget: 'cat' }),
expect.objectContaining({ seedTarget: 'dog-1' }),
]);
});
- [ ] Step 2: Run → FAIL.
- [ ] Step 3: Implement both. BatchUpload: <input type="file" multiple accept="audio/*" data-testid="batch-input">; on change, map files → { filename: f.name, seedTarget: f.name.replace(/\.[^.]+$/, ''), audio: f }. CaptureControls: an upload button + a record toggle (MediaRecorder → Blob on stop), onClip(blob).
- [ ] Step 4: Run → PASS.
- [ ] Step 5: Commit —
git commit -m "feat(audio-tab): CaptureControls (record/upload) + BatchUpload (filename-seeded rows)".
Task 8: Frontend — DeviationOverlay + ProductionCard (the hero)
Files:
- Create: .../AudioAnalysisTool/DeviationOverlay.tsx, .../ProductionCard.tsx
- Test: .../DeviationOverlay.test.tsx, .../ProductionCard.test.tsx
DeviationOverlay: given AnalyzeResult, render the canonical phones as chips heat-colored by deviation, each with a tooltip of deviation + nearest; mark positions where nearest !== phone as substitutions. ProductionCard: the per-production hero — target label, the faithful transcript line (produced), and the DeviationOverlay, plus warming/error/"couldn't score" states.
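The per-position logic behind the overlay can be sketched as two pure functions. isSubstitution follows the rule stated in Step 3 (nearest is non-null and differs from the target phone). heatLevel is hypothetical: the plan only asks for "a small heat scale" with null mapped to neutral, so the 0.5 and 1.0 cut-offs below are placeholder assumptions, not calibrated values.

```typescript
interface AnalyzePosition { phone: string; deviation: number | null; nearest: string | null; }

// A position counts as a substitution when the nearest phone is known
// and is not the canonical phone at that slot.
function isSubstitution(p: AnalyzePosition): boolean {
  return p.nearest !== null && p.nearest !== p.phone;
}

type HeatLevel = 'neutral' | 'low' | 'medium' | 'high';

// ASSUMPTION: the 0.5 / 1.0 thresholds are illustrative placeholders.
function heatLevel(deviation: number | null): HeatLevel {
  if (deviation === null) return 'neutral'; // unscored slot
  if (deviation < 0.5) return 'low';
  if (deviation < 1.0) return 'medium';
  return 'high';
}
```

Keeping both as pure functions means the component tests only need to assert attributes and styling, while any threshold tuning can be unit-tested without rendering.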
- [ ] Step 1: Write the failing tests from a fixture AnalyzeResult:
const fixture = {
canonical: ['k','æ','t'], produced: ['k','æ','p'],
positions: [
{ phone: 'k', deviation: 0.1, nearest: 'k' },
{ phone: 'æ', deviation: 0.2, nearest: 'æ' },
{ phone: 't', deviation: 1.4, nearest: 'p' }, // substitution
], attribution: null, features: null,
};
it('flags positions where nearest != target', () => {
render(<DeviationOverlay result={fixture as any} />);
// the /t/ chip is marked a substitution (e.g. data-substitution="true")
expect(screen.getByTestId('pos-2')).toHaveAttribute('data-substitution', 'true');
expect(screen.getByTestId('pos-0')).toHaveAttribute('data-substitution', 'false');
});
it('renders the produced transcript', () => {
render(<ProductionCard production={{ id:'cat-1', target:'cat', result: fixture } as any} />);
expect(screen.getByText('k æ p')).toBeInTheDocument();
});
- [ ] Step 2: Run → FAIL.
- [ ] Step 3: Implement. DeviationOverlay: map positions, color by deviation (a small heat scale; null → neutral), data-testid={`pos-${i}`}, data-substitution={String(p.nearest !== p.phone && p.nearest !== null)}, MUI Tooltip with deviation + "sounded like {nearest}". ProductionCard: header with the target label + produced line + overlay; render warming/error/no-positions states from a status prop or the result shape. Reuse the app's phoneme-chip styling (check components/shared/ for an existing chip).
- [ ] Step 4: Run → PASS.
- [ ] Step 5: Commit —
git commit -m "feat(audio-tab): DeviationOverlay + ProductionCard — the transcript+deviation hero".
Task 9: Frontend — AttributionPanel (the session bonus)
Files:
- Create: .../AudioAnalysisTool/AttributionPanel.tsx
- Test: .../AttributionPanel.test.tsx
Given a SessionAttribution | null and the production count, render the source read with "patterns like…" language, the per-source distances, an explicit confidence-by-quantity indicator, and an "add more productions to sharpen this" affordance. When count is low, lead with the low-confidence caveat. Never render "has X".
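The confidence-by-quantity rule can be sketched as a few pure functions. The tier boundaries follow the example in Step 3 (1 = low, 2–4 = moderate, 5+ = good), which the plan itself offers only as an example; the function names are hypothetical, and treating a count of 0 as low is an assumption.

```typescript
type ConfidenceTier = 'low' | 'moderate' | 'good';

// Tier from the number of productions in the session.
// ASSUMPTION: counts below 1 are treated as 'low'.
function confidenceTier(productionCount: number): ConfidenceTier {
  if (productionCount >= 5) return 'good';
  if (productionCount >= 2) return 'moderate';
  return 'low';
}

// The "add more productions" line shows whenever the tier is below 'good'.
function shouldPromptForMore(productionCount: number): boolean {
  return confidenceTier(productionCount) !== 'good';
}

// Guardrail copy: "patterns like", never "has X".
function attributionCopy(source: string): string {
  return `This production patterns like ${source} speech.`;
}
```

Building the sentence in one function keeps the guardrail wording in a single place, which may make the "never has X" rule easier to audit.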
- [ ] Step 1: Write the failing test
it('shows a low-confidence caveat for a single production', () => {
render(<AttributionPanel attribution={{ source:'accent', distances:{accent:0.1} }} productionCount={1} />);
expect(screen.getByText(/add more productions/i)).toBeInTheDocument();
expect(screen.getByText(/patterns like/i)).toBeInTheDocument();
expect(screen.queryByText(/\bhas\b/i)).toBeNull();
});
it('renders nothing when attribution is null', () => {
const { container } = render(<AttributionPanel attribution={null} productionCount={0} />);
expect(container).toBeEmptyDOMElement();
});
- [ ] Step 2: Run → FAIL.
- [ ] Step 3: Implement as a presentational component with props { attribution: SessionAttribution | null; productionCount: number }. Confidence tier from productionCount (e.g. 1 = low, 2–4 = moderate, 5+ = good). Copy: "This production patterns like {source} speech." + the "add more" line whenever count is below the good tier.
- [ ] Step 4: Run → PASS.
- [ ] Step 5: Commit —
git commit -m "feat(audio-tab): AttributionPanel — session bonus read with quantity-confidence + guardrail copy".
Task 10: Frontend — AudioAnalysisTool (session orchestration)
Files:
- Create: .../AudioAnalysisTool/AudioAnalysisTool.tsx, .../AudioAnalysisTool/index.ts (re-export)
- Test: .../AudioAnalysisTool.test.tsx
Owns the session: a list of productions, the input affordances (Task 6/7), runs analyzeProduction per production, renders a ProductionCard each, and recomputes session attribution via attributeSession over the collected features whenever the session changes. Generates production ids from the verified target (hyphenated, -N disambiguation).
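One possible reading of the id rule is sketched below. The plan says "-N if repeated", so this version leaves the first production for a target unsuffixed and numbers repeats from 2. That is an assumption: the Task 8 fixture uses the id cat-1, so always suffixing from 1 is an equally plausible reading. slugFor is a hypothetical helper, and the trim() call is an addition not stated in the plan.

```typescript
// Lowercase and hyphenate the verified target.
// ASSUMPTION: trim() is added here so stray spaces do not become hyphens.
function slugFor(target: string): string {
  return target.trim().toLowerCase().replace(/\s+/g, '-');
}

// Returns an id not present in `existing`.
// ASSUMPTION: first use is unsuffixed; repeats get -2, -3, and so on.
function idFor(target: string, existing: string[]): string {
  const base = slugFor(target);
  if (existing.indexOf(base) === -1) return base;
  let n = 2;
  while (existing.indexOf(`${base}-${n}`) !== -1) n += 1;
  return `${base}-${n}`;
}
```

For example, idFor('cat', ['cat', 'cat-2']) yields cat-3. Because the loop skips ids already in use, removing a production from the middle of a session cannot cause a later collision.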
- [ ] Step 1: Write the failing test: a happy path with mocked api. Add one production (target cat, a blob), assert a ProductionCard renders with the produced transcript, and that attributeSession was called with the production's features.
it('runs a production and recomputes session attribution', async () => {
vi.spyOn(api, 'analyzeProduction').mockResolvedValue({ kind:'ok', result: {
canonical:['k','æ','t'], produced:['k','æ','t'],
positions:[{phone:'k',deviation:0.1,nearest:'k'}], attribution:null, features:[1,2,3,4,5,6] } });
const attrSpy = vi.spyOn(api, 'attributeSession').mockResolvedValue({ source:'typical', distances:{typical:0} });
render(<AudioAnalysisTool />);
// drive the UI: set target + supply a clip + run (use testids exposed by the sub-components)
// ...
await waitFor(() => expect(attrSpy).toHaveBeenCalledWith([[1,2,3,4,5,6]]));
});
- [ ] Step 2: Run → FAIL.
- [ ] Step 3: Implement: useState<Production[]>, an addProduction(target, canonical, audio) that pushes a draft then calls analyzeProduction, an idFor(target, existing) helper (target.toLowerCase().replace(/\s+/g,'-') + -N if repeated), and an effect that calls attributeSession(features[]) over productions with features and stores the session read for AttributionPanel. Compose TargetField + CaptureControls + BatchUpload + the ProductionCard list + AttributionPanel. Include the cold-start warming copy and the Beta framing.
- [ ] Step 4: Run → PASS.
- [ ] Step 5: Commit —
git commit -m "feat(audio-tab): AudioAnalysisTool — session model, per-production runs, session attribution".
Task 11: Frontend — register the tool tab
Files:
- Modify: packages/web/frontend/src/App_new.tsx (TOOL_DEFS + the component factory + accent color)
- [ ] Step 1: Add the TOOL_DEFS entry (mirror an existing entry's fields: id: 'audio', an icon import, title: 'Speech Analysis', a description, and the Beta indication). Add an accent color in the toolAccentColors map.
- [ ] Step 2: Add the factory entry in the component map:
import AudioAnalysisTool from './components/tools/AudioAnalysisTool';
// ...in the factory object:
audio: () => <AudioAnalysisTool />,
- [ ] Step 3: Verify the app builds + the tab mounts
Run: cd packages/web/frontend && npm run build && npm test -- AudioAnalysisTool
Expected: build clean; component tests pass.
- [ ] Step 4: Commit —
git commit -m "feat(audio-tab): register Speech Analysis (Beta) as a tool tab".
Task 12: Full-stack manual smoke (local)
Files: none (verification only).
- [ ] Step 1: Start the host (it's likely already up):
PYTORCH_ENABLE_MPS_FALLBACK=1 uv run --package phonolex-audio --extra inference python -m phonolex_audio --trajectory-refs /Volumes/ExternalData1/audio-union/refs_fisher.json --attribution-model /Volumes/ExternalData1/audio-union/attribution_model.json
- [ ] Step 2: Start the worker (cd packages/web/workers && npm run dev) and frontend (cd packages/web/frontend && npm run dev); ensure local D1 is seeded (the tab's canonical lookup needs it).
- [ ] Step 3: Open the Speech Analysis tab, run a single production (target a common word, record or upload), confirm the transcript + deviation overlay render and the attribution panel shows the low-confidence caveat. Add a second production; confirm the session attribution recomputes.
- [ ] Step 4: Note any gaps as follow-up tasks; do NOT fold the branch back into release until the owner has exercised it (the "local until happy" rule).
Self-Review notes
- Spec coverage: session model (T10), per-production targets + corpus verify (T6), three input modes (T7), filename-seed batch (T7), transcript+deviation hero (T8), session attribution faithful to validation = mean-pool raw features (T2 host + T10 wiring), guardrail copy (T9), cold-start (T10), worker proxy + canonical lookup (T3/T4), tab registration (T11). Futures (sentences/slicing) are out of scope per the spec — no tasks, by design.
- Spec correction: the spec said _attribution_features "will be split to expose the raw per-production vector", but it already returns that per-production vector; T1 just exposes it in analyze(). No split needed.
- Type consistency: AnalyzeResult/AnalyzePosition/AnalyzeAttribution are identical across the shared contract, the service (T5), and the components (T8/T9). features: number[] flows host→worker→attributeSession→/attribute unchanged.
- Coverage-parse caveat: T5 assumes /api/words/:word returns phonemes. The implementer must confirm against words.ts and adjust the one parse line if the key differs (flagged in T5 Step 3).