
Beta Audio Tab Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Ship the user-facing Beta Audio tab — record/upload/batch a word-level production against a corpus-verified target and get the faithful transcript + per-position deviation overlay (hero), plus a session-level source-attribution read (bonus).

Architecture: Three slices, built back-to-front so each is testable on its own. (1) Serving — the host analyze() returns the produced transcript + canonical + the raw per-production feature vector, plus a new stateless /attribute endpoint that mean-pools feature vectors and classifies (the session read). (2) Worker — a /api/audio/analyze route (target→canonical lookup, forward to host /analyze) and /api/audio/attribute proxy. (3) Frontend — a new AudioAnalysisTool tool tab with a session model, target verification, capture/batch input, the deviation overlay, and the attribution panel.

Tech Stack: Python (FastAPI host, phonolex_audio), TypeScript (Hono Worker on Cloudflare + D1), React + MUI + Vitest (frontend). Spec: docs/superpowers/specs/2026-06-14-audio-beta-tab-design.md.

Branch: feature/phon-145-audio-beta-tab (off release/v6-audio).

Standing constraints: Local-only (no hosted endpoint, no deploy). Say "feature vectors", not "embeddings". This is decision support with the clinician in the loop, so copy says "patterns like…", never "has X". The host runs on 127.0.0.1:8000 (matching wrangler.toml).


Shared contract (used by every slice)

// The /analyze response (host -> worker -> frontend), and AnalyzePosition/Attribution.
interface AnalyzePosition { phone: string; deviation: number | null; nearest: string | null; }
interface AnalyzeAttribution { source: string; distances: Record<string, number>; }
interface AnalyzeResult {
  canonical: string[];
  produced: string[];                 // the faithful transcript ("what we heard")
  positions: AnalyzePosition[];
  attribution: AnalyzeAttribution | null; // per-clip (noisy on short input)
  features: number[] | null;          // raw per-production 6-vector for session pooling
}
// The /attribute response (session-level read over pooled features):
type SessionAttribution = AnalyzeAttribution;

Python analyze() returns the same keys as plain lists and dicts. Every key is a single word, so no snake_case/camelCase mapping is needed. features is the raw [g,x,cg,cx,rate,a] vector that analyzer._attribution_features already computes (the per-production mean over slots). analyze() exposes it and does not recompute it.
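This shape crosses three runtimes, so a runtime check at the frontend boundary can catch contract drift early. The sketch below assumes a hypothetical `isAnalyzeResult` guard, which is not one of the plan's files. It mirrors the interfaces above and enforces the 6-element `features` length stated in the contract.

```typescript
// Hypothetical runtime guard for the shared /analyze contract (not a planned file).
interface AnalyzePosition { phone: string; deviation: number | null; nearest: string | null; }
interface AnalyzeAttribution { source: string; distances: Record<string, number>; }
interface AnalyzeResult {
  canonical: string[];
  produced: string[];
  positions: AnalyzePosition[];
  attribution: AnalyzeAttribution | null;
  features: number[] | null;
}

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === 'object' && v !== null;

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === 'string');

function isPosition(v: unknown): v is AnalyzePosition {
  return isObject(v) && typeof v.phone === 'string'
    && (v.deviation === null || typeof v.deviation === 'number')
    && (v.nearest === null || typeof v.nearest === 'string');
}

function isAttribution(v: unknown): v is AnalyzeAttribution {
  if (!isObject(v)) return false;
  const { source, distances } = v;
  return typeof source === 'string' && isObject(distances)
    && Object.values(distances).every((d) => typeof d === 'number');
}

function isAnalyzeResult(v: unknown): v is AnalyzeResult {
  if (!isObject(v)) return false;
  const { canonical, produced, positions, attribution, features } = v;
  // features, when present, is the raw [g, x, cg, cx, rate, a] vector: exactly 6 numbers.
  const featuresOk = features === null
    || (Array.isArray(features) && features.length === 6
        && features.every((x) => typeof x === 'number'));
  return isStringArray(canonical) && isStringArray(produced)
    && Array.isArray(positions) && positions.every(isPosition)
    && (attribution === null || isAttribution(attribution))
    && featuresOk;
}
```

A service could call this on the parsed body and map a failed check to its `error` outcome.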


Task 1: Serving — analyze() returns produced + canonical + features

Files: - Modify: packages/audio/src/phonolex_audio/analyzer.py (the analyze method) - Test: packages/audio/tests/test_analyze_smoke.py (extend the slow smoke assertions)

The hero needs the produced transcript; the session read needs the raw feature vector. analyze() currently returns only {positions, attribution} and drops both.

  • [ ] Step 1: Update analyze() in analyzer.py to:
    def analyze(self, audio_bytes: bytes, canon: list[str]) -> dict:
        """audio + canonical phones -> {canonical, produced, positions, attribution, features}."""
        e, centers, produced = self._emit_and_align(audio_bytes, canon)
        result = {
            "canonical": list(canon),
            "produced": produced,
            "positions": self.positions(e, centers, canon),
            "attribution": None,
            "features": None,
        }
        if self.attribution is not None:
            feats = self._attribution_features(e, centers, canon, produced)
            if feats is not None:
                result["features"] = [float(x) for x in feats]
                result["attribution"] = self.attribution.classify(feats)
        return result
  • [ ] Step 2: Extend the slow smoke (test_analyze_smoke.py::test_analyze_runs_end_to_end) — after the existing asserts, add:
    assert isinstance(out["produced"], list)
    assert out["canonical"] == (row.get("canonical") or [])
    if attr is not None:
        assert isinstance(out["features"], list) and len(out["features"]) == 6
  • [ ] Step 3: Run the slow smoke (needs the drive + keeper):

Run: PYTORCH_ENABLE_MPS_FALLBACK=1 uv run --package phonolex-audio --extra dev --extra inference pytest packages/audio/tests/test_analyze_smoke.py -v -m slow Expected: PASS (produced/canonical/features present). Fast coverage of the full shape is at the worker layer (Task 3, mocked host).

  • [ ] Step 4: Commit
git add packages/audio/src/phonolex_audio/analyzer.py packages/audio/tests/test_analyze_smoke.py
git commit -m "feat(audio): analyze() returns produced transcript + canonical + raw feature vector"

Task 2: Serving — /attribute endpoint (session-level pooled read)

Files: - Modify: packages/audio/src/phonolex_audio/server.py (add /attribute) - Test: packages/audio/tests/test_server.py (add /attribute tests)

A stateless endpoint: given a list of per-production raw feature vectors, mean-pool them and classify with the baked model — reproducing the validated subject-level aggregation. Keeps the baked model + math in Python.
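The pooling is a plain element-wise mean. For N productions with feature vectors f_1 … f_N, the session vector is (1/N)·Σ f_i, and the baked model classifies that single vector. Below is a TypeScript sketch of the same arithmetic, useful for reasoning about it outside Python. `meanPool` is a hypothetical helper and is not part of the plan, which deliberately keeps the real math in Python.

```typescript
// Hypothetical illustration of the host's pooling step (the real one is np.mean in Python).
// Element-wise mean over a non-empty list of equal-length feature vectors.
function meanPool(vectors: number[][]): number[] {
  if (vectors.length === 0) throw new Error('features must be a non-empty list of vectors');
  const dim = vectors[0].length;
  if (vectors.some((v) => v.length !== dim)) {
    throw new Error('feature vectors must all have the same length');
  }
  const sum = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) sum[i] += v[i];
  }
  return sum.map((s) => s / vectors.length);
}
```

With the Step 1 test input, `meanPool([[1, 1], [3, 3]])` gives `[2, 2]`, and the fake classifier echoes that vector's sum, 4.0. Ragged input throws here. On the Python side, `np.array(..., dtype=float)` raises `ValueError` for ragged lists, so the endpoint should report that as a 400 rather than let it become a server error.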

  • [ ] Step 1: Write the failing test in test_server.py (extend FakeAnalyzer with an attribution stub):
class FakeAttribution:
    def classify(self, feats):
        # echo the sum of the pooled vector so the test can assert pooling happened
        return {"source": "typical", "distances": {"typical": float(sum(feats))}}


def attribute_client() -> TestClient:
    fa = FakeAnalyzer()
    fa.attribution = FakeAttribution()
    return TestClient(build_app(transcriber=FakeTranscriber(), analyzer=fa))


def test_attribute_pools_feature_vectors():
    r = attribute_client().post("/attribute", json={"features": [[1, 1], [3, 3]]})
    assert r.status_code == 200
    # mean of [1,1] and [3,3] = [2,2] -> sum 4
    assert r.json()["distances"]["typical"] == 4.0


def test_attribute_400_when_no_analyzer():
    r = client().post("/attribute", json={"features": [[1, 2]]})
    assert r.status_code == 400


def test_attribute_400_on_empty_list():
    r = attribute_client().post("/attribute", json={"features": []})
    assert r.status_code == 400

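(Note: FakeAnalyzer in test_server.py has no attribution attribute today, so assigning fa.attribution in the test is fine. The real analyzer always defines the attribute, but its value is None when no attribution model is loaded. That is why the endpoint checks for None as well as for a missing analyzer.)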

  • [ ] Step 2: Run to verify it fails

Run: uv run --package phonolex-audio --extra dev pytest packages/audio/tests/test_server.py -k attribute -v Expected: FAIL (404, no /attribute route)

  • [ ] Step 3: Add the endpoint in server.py, right after /analyze:
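    @app.post("/attribute")
    async def attribute(payload: dict) -> dict:
        """Session-level attribution: mean-pool a list of per-production raw feature
        vectors and classify. Stateless; the session lives in the caller."""
        import numpy as np

        analyzer = app.state.analyzer
        if analyzer is None or getattr(analyzer, "attribution", None) is None:
            raise HTTPException(status_code=400, detail="Attribution model not loaded")
        feats = payload.get("features")
        if not isinstance(feats, list) or not feats:
            raise HTTPException(status_code=400, detail="features must be a non-empty list of vectors")
        try:
            # Ragged or non-numeric input raises here; report it as a client error, not a 500.
            matrix = np.asarray(feats, dtype=float)
        except (TypeError, ValueError):
            raise HTTPException(status_code=400, detail="features must be equal-length numeric vectors") from None
        if matrix.ndim != 2:
            raise HTTPException(status_code=400, detail="features must be a list of vectors")
        pooled = matrix.mean(axis=0)  # element-wise mean over productions
        return analyzer.attribution.classify(pooled)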
  • [ ] Step 4: Run to verify it passes

Run: uv run --package phonolex-audio --extra dev pytest packages/audio/tests/test_server.py -k attribute -v Expected: PASS (3 passed)

  • [ ] Step 5: Run the full host suite (no regressions)

Run: uv run --package phonolex-audio --extra dev pytest packages/audio/tests/ -q -m "not slow" Expected: PASS

  • [ ] Step 6: Commit
git add packages/audio/src/phonolex_audio/server.py packages/audio/tests/test_server.py
git commit -m "feat(audio): /attribute endpoint — mean-pool session feature vectors + classify"

Task 3: Worker — POST /api/audio/analyze

Files: - Modify: packages/web/workers/src/routes/audio.ts (add the route) - Test: packages/web/workers/src/__tests__/audio.test.ts (the existing worker vitest file for the audio routes; append to it)

Mirrors /pronounce's structure (validate multipart, lexicon canonical lookup, warming-aware forward) but forwards to the host /analyze and returns its shape. Canonical lookup happens first (the host needs it to align).

  • [ ] Step 1: Write the failing worker test by APPENDING a describe to packages/web/workers/src/__tests__/audio.test.ts (it already imports { SELF, fetchMock } from cloudflare:test and sets up beforeAll/afterEach). CRITICAL harness facts: (a) the host is mocked with fetchMock.get('http://127.0.0.1:8000').intercept({ path, method }).reply(...), NOT vi.stubGlobal; (b) D1 is NOT seeded in CI — the canonical lookup must be tested gracefully (the existing api/audio tests do if seeded {...} else {accept 4xx/5xx}); (c) because /analyze does the D1 lookup BEFORE the host call, the host interceptor is only consumed when D1 is seeded — so detect seeding first to avoid a pending-interceptor failure (afterEach asserts none pending).
function analyzeForm(target: string): FormData {
  const fd = new FormData();
  fd.append('audio', new Blob([new Uint8Array([1, 2, 3])], { type: 'audio/wav' }), 'clip.wav');
  fd.append('target', target);
  return fd;
}

describe('POST /api/audio/analyze', () => {
  it('returns 400 when no audio part is present', async () => {
    const fd = new FormData(); fd.append('target', 'cat');
    const res = await SELF.fetch('http://localhost/api/audio/analyze', { method: 'POST', body: fd });
    expect(res.status).toBe(400);
  });

  it('returns 400 when target is missing', async () => {
    const fd = new FormData();
    fd.append('audio', new Blob([new Uint8Array([1, 2, 3])], { type: 'audio/wav' }), 'clip.wav');
    const res = await SELF.fetch('http://localhost/api/audio/analyze', { method: 'POST', body: fd });
    expect(res.status).toBe(400);
  });

  it('forwards to host /analyze when D1 is seeded (else 4xx/5xx — D1 unseeded in CI)', async () => {
    const seeded = (await SELF.fetch('http://localhost/api/words/cat')).status === 200;
    if (seeded) {
      fetchMock.get('http://127.0.0.1:8000').intercept({ path: '/analyze', method: 'POST' })
        .reply(200, { canonical: ['k','æ','t'], produced: ['k','æ','t'], positions: [], attribution: null, features: null });
      const res = await SELF.fetch('http://localhost/api/audio/analyze', { method: 'POST', body: analyzeForm('cat') });
      expect(res.status).toBe(200);
      expect((await res.json() as { produced: string[] }).produced).toEqual(['k','æ','t']);
    } else {
      const res = await SELF.fetch('http://localhost/api/audio/analyze', { method: 'POST', body: analyzeForm('cat') });
      expect([404, 500]).toContain(res.status);
    }
  });

  it('returns 404 (or 500 unseeded) for a word not in the lexicon', async () => {
    const res = await SELF.fetch('http://localhost/api/audio/analyze', { method: 'POST', body: analyzeForm('zzzznotaword') });
    expect([404, 500]).toContain(res.status);
  });
});
  • [ ] Step 2: Run to verify it fails

Run: cd packages/web/workers && npm test -- audio Expected: FAIL (route 404s)

  • [ ] Step 3: Add the route in audio.ts, after /pronounce:
// ============================================================================
// /analyze — v6 trajectory model: target -> canonical, forward to host /analyze
// ============================================================================
audio.post('/analyze', async (c) => {
  const form = await c.req.formData().catch(() => null);
  if (!form) return c.json({ detail: 'Missing required multipart field: audio' }, 400);
  const fileEntry = form.get('audio');
  if (!fileEntry || typeof fileEntry === 'string') {
    return c.json({ detail: 'Missing required multipart field: audio' }, 400);
  }
  const file = fileEntry as File;
  if (file.size > MAX_BYTES) return c.json({ detail: 'Audio exceeds 10 MB limit' }, 400);
  if (file.type && !file.type.startsWith('audio/')) {
    return c.json({ detail: `Unsupported content type: ${file.type}` }, 400);
  }
  const target = form.get('target');
  if (typeof target !== 'string' || !target.trim()) {
    return c.json({ detail: 'Missing required field: target' }, 400);
  }
  const base = c.env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
  if (!base) return c.json({ detail: 'Audio inference host not configured' }, 500);

  // canonical phonemes from D1 (host needs them to align)
  const row = await c.env.DB.prepare('SELECT phonemes FROM words WHERE word = ? LIMIT 1')
    .bind(target.trim().toLowerCase())
    .first<{ phonemes: string | null }>();
  if (!row || !row.phonemes) {
    return c.json({ detail: `Word not in lexicon: ${target}` }, 404);
  }
  const canonical = JSON.parse(row.phonemes) as string[];

  const fwd = new FormData();
  fwd.append('audio', file, file.name || 'clip');
  fwd.append('canonical', JSON.stringify(canonical));
  let upstream: Response;
  try {
    upstream = await fetch(`${base}/analyze`, { method: 'POST', body: fwd });
  } catch {
    return c.json({ warming: true, detail: 'Inference host is warming up. Retry shortly.' }, 503);
  }
  if (upstream.status === 503) {
    return c.json({ warming: true, detail: 'Inference host is warming up. Retry shortly.' }, 503);
  }
  if (!upstream.ok) return c.json({ detail: `Inference host error (${upstream.status})` }, 502);
  const body = await upstream.json().catch(() => null);
  if (body === null) return c.json({ detail: 'Inference host returned invalid JSON' }, 502);
  return c.json(body);
});
  • [ ] Step 4: Run to verify it passes

Run: cd packages/web/workers && npm test -- audio Expected: PASS

  • [ ] Step 5: Commit
git add packages/web/workers/src/routes/audio.ts packages/web/workers/src/__tests__/audio.test.ts
git commit -m "feat(audio-api): /api/audio/analyze — target->canonical lookup + forward to host /analyze"
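Both Worker routes apply the same warming-aware policy to the host's reply:

- A failed fetch or a host 503 becomes a 503 with `warming: true`.
- Any other non-2xx becomes a 502.
- A 2xx body is passed through after a JSON check.

Below is a minimal sketch of that policy as one pure function. It assumes a hypothetical `classifyUpstream` helper that `/analyze` and `/attribute` could share so their 503 bodies cannot drift apart.

```typescript
// Hypothetical shared helper (not in the plan): the warming-aware mapping that both
// /api/audio/analyze and /api/audio/attribute apply to the inference host's reply.
const WARMING_DETAIL = 'Inference host is warming up. Retry shortly.';

type HostOutcome =
  | { status: 503; body: { warming: true; detail: string } }
  | { status: 502; body: { detail: string } }
  | { status: 200; passthrough: true };

// `upstreamStatus` is null when fetch() itself threw (host down or still booting).
function classifyUpstream(upstreamStatus: number | null): HostOutcome {
  if (upstreamStatus === null || upstreamStatus === 503) {
    return { status: 503, body: { warming: true, detail: WARMING_DETAIL } };
  }
  if (upstreamStatus < 200 || upstreamStatus >= 300) {
    // Any other non-2xx from the host is reported as a bad gateway.
    return { status: 502, body: { detail: `Inference host error (${upstreamStatus})` } };
  }
  // 2xx: the caller still parses the JSON and returns 502 if it is invalid.
  return { status: 200, passthrough: true };
}
```

To use it in a route:

- Call `classifyUpstream(null)` from the `catch` around `fetch`, and `classifyUpstream(upstream.status)` otherwise.
- Return `c.json(outcome.body, outcome.status)` for the non-passthrough cases.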

Task 4: Worker — POST /api/audio/attribute

Files: - Modify: packages/web/workers/src/routes/audio.ts - Test: same audio worker test file

Thin JSON proxy (no D1): forward {features: number[][]} to host /attribute, warming-aware.

  • [ ] Step 1: Write the failing test — append to audio.test.ts (uses fetchMock, no D1):
describe('POST /api/audio/attribute', () => {
  it('forwards features and returns the host read', async () => {
    fetchMock.get('http://127.0.0.1:8000').intercept({ path: '/attribute', method: 'POST' })
      .reply(200, { source: 'accent', distances: { accent: 0.1 } });
    const res = await SELF.fetch('http://localhost/api/audio/attribute', {
      method: 'POST', headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ features: [[1, 2, 3, 4, 5, 6]] }),
    });
    expect(res.status).toBe(200);
    expect((await res.json() as { source: string }).source).toBe('accent');
  });

  it('returns 400 on an empty feature list', async () => {
    const res = await SELF.fetch('http://localhost/api/audio/attribute', {
      method: 'POST', headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ features: [] }),
    });
    expect(res.status).toBe(400);
  });
});
  • [ ] Step 2: Run to verify it fails: cd packages/web/workers && npm test -- audio → FAIL.

  • [ ] Step 3: Add the route in audio.ts:

audio.post('/attribute', async (c) => {
  const base = c.env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
  if (!base) return c.json({ detail: 'Audio inference host not configured' }, 500);
  const payload = await c.req.json().catch(() => null);
  if (!payload || !Array.isArray(payload.features) || payload.features.length === 0) {
    return c.json({ detail: 'features must be a non-empty list of vectors' }, 400);
  }
  let upstream: Response;
  try {
    upstream = await fetch(`${base}/attribute`, {
      method: 'POST', headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ features: payload.features }),
    });
  } catch {
    return c.json({ warming: true, detail: 'Inference host is warming up. Retry shortly.' }, 503);
  }
  if (upstream.status === 503) return c.json({ warming: true, detail: 'Inference host is warming up. Retry shortly.' }, 503);
  if (!upstream.ok) return c.json({ detail: `Inference host error (${upstream.status})` }, 502);
  const body = await upstream.json().catch(() => null);
  if (body === null) return c.json({ detail: 'Inference host returned invalid JSON' }, 502);
  return c.json(body);
});
  • [ ] Step 4: Run to verify it passes — PASS.

  • [ ] Step 5: Commit

git add packages/web/workers/src/routes/audio.ts packages/web/workers/src/__tests__/audio.test.ts
git commit -m "feat(audio-api): /api/audio/attribute — proxy session feature pooling to host"
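The `/attribute` route above only checks that `features` is a non-empty array. Ragged or non-numeric rows therefore travel to the host before they fail. A stricter guard is cheap. The sketch below assumes a hypothetical `isFeatureMatrix` helper with an optional expected width. The width is 6 for real productions, but the parameter stays optional because the host tests use 2-wide vectors.

```typescript
// Hypothetical stricter payload guard for /api/audio/attribute (not in the plan).
// Accepts a non-empty list of equal-length rows of finite numbers.
function isFeatureMatrix(v: unknown, width?: number): v is number[][] {
  if (!Array.isArray(v) || v.length === 0) return false;
  const first: unknown = v[0];
  if (!Array.isArray(first) || first.length === 0) return false;
  const expected = width ?? first.length;
  return v.every((row: unknown) =>
    Array.isArray(row)
    && row.length === expected
    && row.every((x: unknown) => typeof x === 'number' && Number.isFinite(x)));
}
```

In the route, `if (!payload || !isFeatureMatrix(payload.features))` could replace the array-length check and return the same 400.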

Task 5: Frontend — audioAnalysisApi service + types

Files: - Create: packages/web/frontend/src/services/audioAnalysisApi.ts - Test: packages/web/frontend/src/services/audioAnalysisApi.test.ts

The service owns the three calls: coverage check (existing words API), analyze, and attribute. Mirror an existing service for the VITE_API_URL base and fetch conventions. The old audioApi is gone, so take the base-URL pattern from lib/generationApi.ts or from any services/*.ts file that is still present.

  • [ ] Step 1: Write the failing test
import { describe, it, expect, vi, afterEach } from 'vitest';
import { checkCoverage, analyzeProduction, attributeSession } from './audioAnalysisApi';

afterEach(() => vi.unstubAllGlobals());

describe('audioAnalysisApi', () => {
  it('checkCoverage: supported when the word has phonemes', async () => {
    vi.stubGlobal('fetch', vi.fn(async () =>
      new Response(JSON.stringify({ word: 'cat', phonemes: ['k','æ','t'] }), { status: 200 })));
    const r = await checkCoverage('cat');
    expect(r.supported).toBe(true);
    expect(r.canonical).toEqual(['k','æ','t']);
  });

  it('checkCoverage: unsupported on 404', async () => {
    vi.stubGlobal('fetch', vi.fn(async () => new Response('{}', { status: 404 })));
    expect((await checkCoverage('zzzz')).supported).toBe(false);
  });

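  it('attributeSession: posts the feature list', async () => {
    const spy = vi.fn(async () =>
      new Response(JSON.stringify({ source: 'typical', distances: { typical: 0 } }), { status: 200 }));
    vi.stubGlobal('fetch', spy);
    const r = await attributeSession([[1,2,3,4,5,6]]);
    expect(r?.source).toBe('typical');
    // assert the request actually carried the feature list
    expect(spy).toHaveBeenCalledWith(
      expect.stringContaining('/api/audio/attribute'),
      expect.objectContaining({ method: 'POST', body: JSON.stringify({ features: [[1,2,3,4,5,6]] }) }));
  });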
});
  • [ ] Step 2: Run to verify it fails: cd packages/web/frontend && npm test -- audioAnalysisApi → FAIL.

  • [ ] Step 3: Implement audioAnalysisApi.ts:

const API = import.meta.env.VITE_API_URL ?? '';

export interface AnalyzePosition { phone: string; deviation: number | null; nearest: string | null; }
export interface AnalyzeAttribution { source: string; distances: Record<string, number>; }
export interface AnalyzeResult {
  canonical: string[]; produced: string[]; positions: AnalyzePosition[];
  attribution: AnalyzeAttribution | null; features: number[] | null;
}
export interface Coverage { supported: boolean; canonical: string[]; }

export async function checkCoverage(word: string): Promise<Coverage> {
  const res = await fetch(`${API}/api/words/${encodeURIComponent(word.trim().toLowerCase())}`);
  if (!res.ok) return { supported: false, canonical: [] };
  const body = await res.json();
  const canonical = Array.isArray(body.phonemes) ? body.phonemes : [];
  return { supported: canonical.length > 0, canonical };
}

export type AnalyzeOutcome =
  | { kind: 'ok'; result: AnalyzeResult }
  | { kind: 'warming' }
  | { kind: 'error'; detail: string };

export async function analyzeProduction(target: string, audio: Blob): Promise<AnalyzeOutcome> {
  const fd = new FormData();
  fd.append('audio', audio, 'clip.wav');
  fd.append('target', target);
  let res: Response;
  try { res = await fetch(`${API}/api/audio/analyze`, { method: 'POST', body: fd }); }
  catch { return { kind: 'warming' }; }
  if (res.status === 503) return { kind: 'warming' };
  if (!res.ok) return { kind: 'error', detail: (await res.json().catch(() => ({}))).detail ?? `Error ${res.status}` };
  return { kind: 'ok', result: await res.json() };
}

export async function attributeSession(features: number[][]): Promise<AnalyzeAttribution | null> {
  if (features.length === 0) return null;
  const res = await fetch(`${API}/api/audio/attribute`, {
    method: 'POST', headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ features }),
  });
  if (!res.ok) return null;
  return res.json();
}

(Confirm the /api/words/:word response key is phonemes by reading packages/web/workers/src/routes/words.ts; adjust the coverage parse if it differs — e.g. has_phonology + phonemes.)

  • [ ] Step 4: Run to verify it passes — PASS.

  • [ ] Step 5: Commit

git add packages/web/frontend/src/services/audioAnalysisApi.ts packages/web/frontend/src/services/audioAnalysisApi.test.ts
git commit -m "feat(audio-tab): audioAnalysisApi service — coverage, analyze, attribute"
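The coverage caveat above comes down to one parse line. D1 stores `phonemes` as a JSON-encoded string, which is why the Task 3 route calls `JSON.parse(row.phonemes)`. If the words route returns that string rather than an array, a tolerant parser covers both cases. The sketch below uses a hypothetical `parseCanonical` helper. The `phonemes` key is still the assumption to confirm against words.ts.

```typescript
// Hypothetical tolerant parser for the /api/words/:word body (not in the plan).
// Accepts `phonemes` as a string[] or as a JSON-encoded string of one.
function parseCanonical(body: unknown): string[] {
  if (typeof body !== 'object' || body === null) return [];
  const raw = (body as Record<string, unknown>).phonemes;
  let value: unknown = raw;
  if (typeof raw === 'string') {
    try {
      value = JSON.parse(raw);
    } catch {
      return []; // not JSON (e.g. a space-separated string): treat as unsupported
    }
  }
  if (!Array.isArray(value) || !value.every((p) => typeof p === 'string')) return [];
  return value as string[];
}
```

checkCoverage would then compute `const canonical = parseCanonical(await res.json())` and keep `supported: canonical.length > 0`.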

Task 6: Frontend — TargetField (corpus-verified target input)

Files: - Create: packages/web/frontend/src/components/tools/AudioAnalysisTool/TargetField.tsx - Test: .../TargetField.test.tsx

A controlled text field that debounces checkCoverage and reports { target, canonical, supported } upward. Shows the canonical preview when supported, a "not in our dictionary" hint when not. Mirror MUI usage from an existing tool (e.g. LookupTool.tsx search field).

  • [ ] Step 1: Write the failing test — render, type a supported word (mock checkCoverage), assert canonical preview appears and onResolved fires with supported: true; type an uncovered word, assert the unsupported hint.
import { render, screen, waitFor } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { vi } from 'vitest';
import TargetField from './TargetField';
import * as api from '../../../services/audioAnalysisApi';

it('reports coverage and previews canonical for a supported word', async () => {
  vi.spyOn(api, 'checkCoverage').mockResolvedValue({ supported: true, canonical: ['k','æ','t'] });
  const onResolved = vi.fn();
  render(<TargetField onResolved={onResolved} />);
  await userEvent.type(screen.getByRole('textbox'), 'cat');
  await waitFor(() => expect(onResolved).toHaveBeenCalledWith(
    expect.objectContaining({ target: 'cat', supported: true, canonical: ['k','æ','t'] })));
});
  • [ ] Step 2: Run → FAIL. cd packages/web/frontend && npm test -- TargetField

  • [ ] Step 3: Implement TargetField.tsx — props { value?: string; onResolved: (r: {target:string; canonical:string[]; supported:boolean}) => void }. A TextField (MUI) + debounced (setTimeout ~300ms) checkCoverage, render canonical as phoneme chips when supported, a helper-text warning when not. Keep it presentational + one effect; no session logic here.

  • [ ] Step 4: Run → PASS.

  • [ ] Step 5: Commit: git add the two files; git commit -m "feat(audio-tab): TargetField — debounced corpus coverage check + canonical preview".
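Debouncing limits how often `checkCoverage` fires, but it does not order the responses. A slow reply for "ca" can arrive after the reply for "cat" and report the wrong coverage upward. The sketch below shows one way to drop stale replies. It assumes a hypothetical `createLatestGuard` helper kept outside the component.

```typescript
// Hypothetical helper (not in the plan): tag each coverage request and only act on
// the most recent one, so out-of-order responses cannot overwrite newer results.
interface LatestGuard {
  begin(): number;                   // call before starting a request; returns its token
  isCurrent(token: number): boolean; // true only for the most recently begun request
}

function createLatestGuard(): LatestGuard {
  let current = 0;
  return {
    begin() {
      current += 1;
      return current;
    },
    isCurrent(token) {
      return token === current;
    },
  };
}
```

Inside the debounced callback:

1. Take `const token = guard.begin()`.
2. Await `checkCoverage(text)`.
3. Call `onResolved` only if `guard.isCurrent(token)` is true.

The common React alternative is a `cancelled` flag set in the effect's cleanup. That has the same effect when each keystroke re-runs the effect.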


Task 7: Frontend — CaptureControls + BatchUpload

Files: - Create: .../AudioAnalysisTool/CaptureControls.tsx, .../BatchUpload.tsx - Test: .../CaptureControls.test.tsx, .../BatchUpload.test.tsx

CaptureControls: record (MediaRecorder) or file-pick a single clip → emits a Blob. BatchUpload: multi-file <input> → emits draft rows { filename, seedTarget, audio }[] where seedTarget is the filename stem (cat.wav → cat).

  • [ ] Step 1: Write the failing tests. For CaptureControls, test the file-pick path (MediaRecorder is hard to unit-test — cover upload, leave record to manual): supplying a file via the input fires onClip(blob). For BatchUpload: selecting two files fires onRows with two rows whose seedTarget are the filename stems.
// BatchUpload.test.tsx
it('emits one row per file with filename-stem seed targets', async () => {
  const onRows = vi.fn();
  render(<BatchUpload onRows={onRows} />);
  const input = screen.getByTestId('batch-input') as HTMLInputElement;
  await userEvent.upload(input, [
    new File([new Uint8Array([1])], 'cat.wav', { type: 'audio/wav' }),
    new File([new Uint8Array([2])], 'dog-1.wav', { type: 'audio/wav' }),
  ]);
  expect(onRows).toHaveBeenCalledWith([
    expect.objectContaining({ seedTarget: 'cat' }),
    expect.objectContaining({ seedTarget: 'dog-1' }),
  ]);
});
  • [ ] Step 2: Run → FAIL.
  • [ ] Step 3: Implement both. BatchUpload: <input type="file" multiple accept="audio/*" data-testid="batch-input">; on change, map files → { filename: f.name, seedTarget: f.name.replace(/\.[^.]+$/, ''), audio: f }. CaptureControls: an upload button + a record toggle (MediaRecorder → Blob on stop), onClip(blob).
  • [ ] Step 4: Run → PASS.
  • [ ] Step 5: Commit: git commit -m "feat(audio-tab): CaptureControls (record/upload) + BatchUpload (filename-seeded rows)".
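BatchUpload's file-to-row mapping is easiest to test as a pure function. The sketch below shows the Step 3 mapping. It assumes a hypothetical `toDraftRows` helper and a minimal `FileLike` type that stands in for the browser `File`, so it runs without a DOM.

```typescript
// Hypothetical pure version of BatchUpload's Step 3 mapping (not a planned file).
// FileLike stands in for the browser File; only `name` is needed to seed a target.
interface FileLike { name: string; }
interface DraftRow<F extends FileLike> { filename: string; seedTarget: string; audio: F; }

function toDraftRows<F extends FileLike>(files: Iterable<F> | ArrayLike<F>): DraftRow<F>[] {
  return Array.from(files, (f) => ({
    filename: f.name,
    // Strip only the final extension: "cat.wav" -> "cat", "dog-1.wav" -> "dog-1".
    seedTarget: f.name.replace(/\.[^.]+$/, ''),
    audio: f,
  }));
}
```

`FileList` is array-like, so the component can pass `input.files` straight to `Array.from` through this helper. Only the last extension is stripped, so `take.two.m4a` seeds `take.two`. The seed is just a starting value for the corpus-verified target.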

Task 8: Frontend — DeviationOverlay + ProductionCard (the hero)

Files: - Create: .../AudioAnalysisTool/DeviationOverlay.tsx, .../ProductionCard.tsx - Test: .../DeviationOverlay.test.tsx, .../ProductionCard.test.tsx

DeviationOverlay: given AnalyzeResult, render the canonical phones as chips heat-colored by deviation, each with a tooltip of deviation + nearest; mark positions where nearest !== phone as substitutions. ProductionCard: the per-production hero — target label, the faithful transcript line (produced), and the DeviationOverlay, plus warming/error/"couldn't score" states.

  • [ ] Step 1: Write the failing tests from a fixture AnalyzeResult:
const fixture = {
  canonical: ['k','æ','t'], produced: ['k','æ','p'],
  positions: [
    { phone: 'k', deviation: 0.1, nearest: 'k' },
    { phone: 'æ', deviation: 0.2, nearest: 'æ' },
    { phone: 't', deviation: 1.4, nearest: 'p' },  // substitution
  ], attribution: null, features: null,
};

it('flags positions where nearest != target', () => {
  render(<DeviationOverlay result={fixture as any} />);
  // the /t/ chip is marked a substitution (e.g. data-substitution="true")
  expect(screen.getByTestId('pos-2')).toHaveAttribute('data-substitution', 'true');
  expect(screen.getByTestId('pos-0')).toHaveAttribute('data-substitution', 'false');
});

it('renders the produced transcript', () => {
  render(<ProductionCard production={{ id:'cat-1', target:'cat', result: fixture } as any} />);
  expect(screen.getByText('k æ p')).toBeInTheDocument();
});
  • [ ] Step 2: Run → FAIL.
  • [ ] Step 3: Implement. DeviationOverlay: map positions, color by deviation (a small heat scale; null → neutral), data-testid={`pos-${i}`}, data-substitution={String(p.nearest !== p.phone && p.nearest !== null)}, MUI Tooltip with deviation + "sounded like {nearest}". ProductionCard: header with the target label, the produced line rendered as produced.join(' ') (the test looks for 'k æ p'), and the overlay; render warming/error/no-positions states from a status prop or the result shape. Reuse the app's phoneme-chip styling (check components/shared/ for an existing chip).
  • [ ] Step 4: Run → PASS.
  • [ ] Step 5: Commit: git commit -m "feat(audio-tab): DeviationOverlay + ProductionCard — the transcript+deviation hero".
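The overlay needs two small rules: a heat color per position and the substitution flag. The sketch below implements both as pure functions. `deviationColor`, its HSL green-to-red ramp, and the `maxDev = 1.5` cap are assumptions; the plan only asks for "a small heat scale; null → neutral". Tune the cap against real deviation values.

```typescript
// Hypothetical heat scale for DeviationOverlay: deviation 0 -> green, at or above
// `maxDev` -> red, null -> neutral grey. maxDev = 1.5 is an assumed cap, not a model value.
interface AnalyzePosition { phone: string; deviation: number | null; nearest: string | null; }

const NEUTRAL = 'hsl(0, 0%, 70%)';

function deviationColor(deviation: number | null, maxDev = 1.5): string {
  if (deviation === null || !Number.isFinite(deviation)) return NEUTRAL;
  const t = Math.min(Math.max(deviation, 0), maxDev) / maxDev; // normalized to 0..1
  const hue = Math.round(120 * (1 - t)); // 120 = green, 0 = red
  return `hsl(${hue}, 70%, 45%)`;
}

// Same rule as the data-substitution attribute: a known nearest phone that differs.
function isSubstitution(p: AnalyzePosition): boolean {
  return p.nearest !== null && p.nearest !== p.phone;
}
```

The Step 3 expression `String(p.nearest !== p.phone && p.nearest !== null)` follows the same rule, so the chip can set `data-substitution={String(isSubstitution(p))}`.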

Task 9: Frontend — AttributionPanel (the session bonus)

Files: - Create: .../AudioAnalysisTool/AttributionPanel.tsx - Test: .../AttributionPanel.test.tsx

Given a SessionAttribution | null and the production count, render the source read with "patterns like…" language, the per-source distances, an explicit confidence-by-quantity indicator, and an "add more productions to sharpen this" affordance. When count is low, lead with the low-confidence caveat. Never render "has X".

  • [ ] Step 1: Write the failing test
it('shows a low-confidence caveat for a single production', () => {
  render(<AttributionPanel attribution={{ source:'accent', distances:{accent:0.1} }} productionCount={1} />);
  expect(screen.getByText(/add more productions/i)).toBeInTheDocument();
  expect(screen.getByText(/patterns like/i)).toBeInTheDocument();
  expect(screen.queryByText(/\bhas\b/i)).toBeNull();
});

it('renders nothing when attribution is null', () => {
  const { container } = render(<AttributionPanel attribution={null} productionCount={0} />);
  expect(container).toBeEmptyDOMElement();
});
  • [ ] Step 2: Run → FAIL.
  • [ ] Step 3: Implement — presentational; props { attribution: SessionAttribution | null; productionCount: number }. Confidence tier from productionCount (e.g. 1 = low, 2–4 = moderate, 5+ = good). Copy: "This production patterns like {source} speech." + the "add more" line whenever count is below the good tier.
  • [ ] Step 4: Run → PASS.
  • [ ] Step 5: Commit: git commit -m "feat(audio-tab): AttributionPanel — session bonus read with quantity-confidence + guardrail copy".
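The panel's copy rules are easiest to pin down as pure functions that the component renders line by line. The sketch below uses hypothetical `confidenceTier` and `attributionLines` names. Its thresholds come from the plan's example tiers. The "This session" wording for multi-production reads is an assumption.

```typescript
// Hypothetical copy helpers for AttributionPanel (not planned files).
type Tier = 'low' | 'moderate' | 'good';

// Thresholds follow the plan's example: 1 = low, 2-4 = moderate, 5+ = good.
function confidenceTier(productionCount: number): Tier {
  if (productionCount >= 5) return 'good';
  if (productionCount >= 2) return 'moderate';
  return 'low';
}

// Guardrail: the copy says "patterns like", never "has".
function attributionLines(source: string, productionCount: number): string[] {
  const tier = confidenceTier(productionCount);
  const subject = productionCount === 1 ? 'This production' : 'This session';
  const lines = [`${subject} patterns like ${source} speech.`];
  if (tier !== 'good') {
    const plural = productionCount === 1 ? '' : 's';
    const caveat = `Confidence is ${tier} with ${productionCount} production${plural}; add more productions to sharpen this.`;
    // At the lowest tier the caveat leads; otherwise it follows the read.
    if (tier === 'low') lines.unshift(caveat);
    else lines.push(caveat);
  }
  return lines;
}
```

The panel stays presentational: it renders these lines and never needs to produce a "has X" string.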

Task 10: Frontend — AudioAnalysisTool (session orchestration)

Files: - Create: .../AudioAnalysisTool/AudioAnalysisTool.tsx, .../AudioAnalysisTool/index.ts (re-export) - Test: .../AudioAnalysisTool.test.tsx

Owns the session: a list of productions, the input affordances (Task 6/7), runs analyzeProduction per production, renders a ProductionCard each, and recomputes session attribution via attributeSession over the collected features whenever the session changes. Generates production ids from the verified target (hyphenated, -N disambiguation).

  • [ ] Step 1: Write the failing test — a happy path with mocked api: add one production (target cat, a blob), assert a ProductionCard renders with the produced transcript, and that attributeSession was called with the production's features.
it('runs a production and recomputes session attribution', async () => {
  vi.spyOn(api, 'analyzeProduction').mockResolvedValue({ kind:'ok', result: {
    canonical:['k','æ','t'], produced:['k','æ','t'],
    positions:[{phone:'k',deviation:0.1,nearest:'k'}], attribution:null, features:[1,2,3,4,5,6] } });
  const attrSpy = vi.spyOn(api, 'attributeSession').mockResolvedValue({ source:'typical', distances:{typical:0} });
  render(<AudioAnalysisTool />);
  // drive the UI: set target + supply a clip + run (use testids exposed by the sub-components)
  // ...
  await waitFor(() => expect(attrSpy).toHaveBeenCalledWith([[1,2,3,4,5,6]]));
});
  • [ ] Step 2: Run → FAIL.
  • [ ] Step 3: Implement: useState<Production[]>, an addProduction(target, canonical, audio) that pushes a draft then calls analyzeProduction, an idFor(target, existing) helper (target.toLowerCase().replace(/\s+/g,'-') + -N if repeated), and an effect that calls attributeSession(features[]) over productions with features and stores the session read for AttributionPanel. Compose TargetField + CaptureControls + BatchUpload + the ProductionCard list + AttributionPanel. Include the cold-start warming copy and the Beta framing.
  • [ ] Step 4: Run → PASS.
  • [ ] Step 5: Commit: git commit -m "feat(audio-tab): AudioAnalysisTool — session model, per-production runs, session attribution".
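Step 3 names two helpers whose behavior the tests depend on. The sketch below rests on two assumptions:

- The first production of a target keeps the bare slug, and repeats get `-2`, `-3`, and so on. The plan only says "-N if repeated".
- `Production` is a reduced, hypothetical shape with just the fields these helpers read.

```typescript
// Hypothetical bodies for Task 10's helpers; only the idFor name comes from the plan.
interface Production {
  id: string;
  target: string;
  result?: { features: number[] | null }; // absent while the production is still running
}

// Slug the target; disambiguate repeats with -2, -3, ... (an assumed numbering).
function idFor(target: string, existing: readonly string[]): string {
  const base = target.trim().toLowerCase().replace(/\s+/g, '-');
  if (!existing.includes(base)) return base;
  let n = 2;
  while (existing.includes(`${base}-${n}`)) n += 1;
  return `${base}-${n}`;
}

// Feature vectors for the session read: skips productions still running and those
// scored without an attribution model (features === null).
function sessionFeatures(productions: readonly Production[]): number[][] {
  return productions.flatMap((p) => {
    const f = p.result?.features;
    return f ? [f] : [];
  });
}
```

The attribution effect can skip the call when `sessionFeatures(productions)` is empty. Otherwise it passes the list straight to `attributeSession`, which matches the `[[1,2,3,4,5,6]]` expectation in the Step 1 test.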

Task 11: Frontend — register the tool tab

Files: - Modify: packages/web/frontend/src/App_new.tsx (TOOL_DEFS + the component factory + accent color)

  • [ ] Step 1: Add the TOOL_DEFS entry (mirror an existing entry's fields: id: 'audio', an icon import, title: 'Speech Analysis', a description, and the Beta indication). Add an accent color in the toolAccentColors map.

  • [ ] Step 2: Add the factory entry in the component map:

import AudioAnalysisTool from './components/tools/AudioAnalysisTool';
// ...in the factory object:
audio: () => <AudioAnalysisTool />,
  • [ ] Step 3: Verify the app builds + the tab mounts

Run: cd packages/web/frontend && npm run build && npm test -- AudioAnalysisTool Expected: build clean; component tests pass.

  • [ ] Step 4: Commit: git commit -m "feat(audio-tab): register Speech Analysis (Beta) as a tool tab".

Task 12: Full-stack manual smoke (local)

Files: none (verification only).

  • [ ] Step 1: Start the host (it's likely already up): PYTORCH_ENABLE_MPS_FALLBACK=1 uv run --package phonolex-audio --extra inference python -m phonolex_audio --trajectory-refs /Volumes/ExternalData1/audio-union/refs_fisher.json --attribution-model /Volumes/ExternalData1/audio-union/attribution_model.json
  • [ ] Step 2: Start the worker (cd packages/web/workers && npm run dev) and frontend (cd packages/web/frontend && npm run dev); ensure local D1 is seeded (the tab's canonical lookup needs it).
  • [ ] Step 3: Open the Speech Analysis tab, run a single production (target a common word, record or upload), confirm the transcript + deviation overlay render and the attribution panel shows the low-confidence caveat. Add a second production; confirm the session attribution recomputes.
  • [ ] Step 4: Note any gaps as follow-up tasks; do NOT fold the branch back into release until the owner has exercised it (the "local until happy" rule).

Self-Review notes

  1. Spec coverage: session model (T10), per-production targets + corpus verify (T6), three input modes (T7), filename-seed batch (T7), transcript+deviation hero (T8), session attribution faithful to validation = mean-pool raw features (T2 host + T10 wiring), guardrail copy (T9), cold-start (T10), worker proxy + canonical lookup (T3/T4), tab registration (T11). Futures (sentences/slicing) are out of scope per the spec — no tasks, by design.
  2. Spec correction: the spec said _attribution_features "will be split to expose the raw per-production vector" — it already returns that per-production vector; T1 just exposes it in analyze(). No split needed.
  3. Type consistency: AnalyzeResult/AnalyzePosition/AnalyzeAttribution identical across the shared contract, the service (T5), and the components (T8/T9). features: number[] flows host→worker→attributeSession/attribute unchanged.
  4. Coverage-parse caveat: T5 assumes /api/words/:word returns phonemes. The implementer must confirm against words.ts and adjust the one parse line if the key differs — flagged in T5 Step 3.