v6 Audio Trajectory Serving Harness Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Make the "all phones are trajectories" model servable: one config-driven FastAPI host that loads the keeper emitter, the full-63 Fisher refs, and a baked attribution model, and answers audio + canonical phones → per-position trajectory deviation + 4-way source attribution. It must run locally now and be packageable for RunPod later.

Architecture: Lift the validated research scorer (score_trajectories.py) and attribution classifier (attribution_classifier.py) out of research/2026-06-06-audio-union/ into importable serving modules in packages/audio. The scorer is corpus-free at serve time. Attribution is split into a one-time offline bake (centroids + standardization + priors → attribution_model.json) and a cheap serve-time classify. Artifact paths + device come from env/flags so the identical harness runs on a laptop (MPS/CPU) or in a RunPod container (CUDA). No hosting is deployed in this plan — only made possible.

Tech Stack: Python 3.11, PyTorch (inference), transformers (wav2vec2-lv-60-espeak backbone), numpy, FastAPI/uvicorn, parselmouth (already a host dep). Frontend cleanup in React/TS.

Scope boundaries (explicit):
- IN: stripped serving artifact, scorer module, attribution bake + serve modules, /analyze endpoint, launch flags, local end-to-end smoke, RunPod packaging (buildable, not deployed), removal of the /dev/* pages from the shippable frontend.
- OUT (separate plans): the new product UI surface (a later reimagining; the /dev/* pages are scratch and are being removed, not productized); the worker /api/audio/analyze proxy route + canonical lookup (lands with the product UI); the PHON-151 app-wide vector reseed (gated on "happy").

Standing constraints (from research/2026-06-06-audio-union/README.md §7): dev/local only — no production, no remote reseed, no hosting deploy until the user says "happy." Never overwrite/rebuild the gitignored model weights on /Volumes/ExternalData1/audio-union/. The 63 inventory stays 63. Say "feature vectors," not "embeddings."

File Structure

| Path | Responsibility | Action |
| --- | --- | --- |
| `packages/audio/scripts/strip_checkpoint.py` | 3.5 GB training ckpt → ~1.26 GB serving weights | Created (Task 1, done) |
| `packages/audio/src/phonolex_audio/serving_config.py` | Resolve artifact paths + device from env/flags (local↔RunPod parity) | Create (Task 2) |
| `packages/audio/src/phonolex_audio/trajectory_scorer.py` | Pure trajectory scoring: align emission to canonical, per-position Fisher deviation + nearest-ref identity | Create (Task 3) |
| `packages/audio/src/phonolex_audio/analyzer.py` | `TrajectoryAnalyzer`: owns the emitter + refs + attribution model; `analyze(audio, canon) -> dict` | Create (Task 4) |
| `packages/audio/scripts/bake_attribution_model.py` | One-time offline build of `attribution_model.json` (means/stds, per-source centroids, L1/Dev priors) | Create (Task 5) |
| `packages/audio/src/phonolex_audio/attribution.py` | Serve-time: load baked model, compute 6 features for a clip, standardize, nearest-centroid → source | Create (Task 6) |
| `packages/audio/src/phonolex_audio/server.py:142` | Add `/analyze` endpoint mirroring `/feature-review` | Modify (Task 7) |
| `packages/audio/src/phonolex_audio/__main__.py` | Flags: `--trajectory-refs`, `--attribution-model`; register the analyzer | Modify (Task 7) |
| `packages/audio/tests/test_trajectory_scorer.py` | Pure-function unit tests (synthetic arrays, no model) | Create (Task 3) |
| `packages/audio/tests/test_attribution.py` | Serve-time classify unit tests (synthetic features + tiny baked model) | Create (Task 6) |
| `packages/audio/tests/test_analyze_smoke.py` | `@pytest.mark.slow` end-to-end: real keeper + one clip | Create (Task 8) |
| `packages/audio/deploy/Dockerfile` + `handler.py` + `README.md` | RunPod-buildable image (HF cache baked, artifacts mounted/copied, env-driven) | Create (Task 9) |
| `packages/web/frontend/src/main.tsx` + `components/tools/Viewer` | Remove `/dev/*` routes + viewer components + their tests/services | Modify/Delete (Task 10) |

Phoneme-set note: trajectory_scorer and attribution import the canonical CONSONANTS / phone handling from one place. Define CONSONANTS once in trajectory_scorer.py and import it into attribution.py (do not duplicate the literal from attribution_classifier.py:40).

Task 1: Stripped serving artifact (DONE)

Files: Create packages/audio/scripts/strip_checkpoint.py

Already implemented and run. Measured: state.pt 3,752 MB → state_serve.pt 1,262 MB fp32 (427 model tensors; optimizer/step/epoch/history dropped). The output keeps the {"model": ...} key so FeatureEmitter(checkpoint=...) loads it unchanged. fp16 (--half) yields ~630 MB and is available if image size becomes a constraint.

[x] Step 1: strip_checkpoint.py written.
[x] Step 2: Run produced state_serve.pt (1,262 MB) on the drive (gitignored).
[ ] Step 3: Commit the utility

git add packages/audio/scripts/strip_checkpoint.py
git commit -m "feat(audio): checkpoint-strip utility — 3.5GB training ckpt -> 1.26GB serving artifact"

Task 2: Serving config (local↔RunPod parity)

Files:
- Create: packages/audio/src/phonolex_audio/serving_config.py
- Test: packages/audio/tests/test_serving_config.py

Resolve every serving artifact path and the device from env with explicit fallbacks, so the same code runs locally (drive paths, MPS/CPU) and in a container (copied paths, CUDA). No torch import here — pure path/string logic, fast to test.

[ ] Step 1: Write the failing test

# packages/audio/tests/test_serving_config.py
import os
from phonolex_audio.serving_config import resolve_serving_config


def test_env_overrides_defaults(monkeypatch):
    monkeypatch.setenv("PHONOLEX_AUDIO_CHECKPOINT", "/tmp/state_serve.pt")
    monkeypatch.setenv("PHONOLEX_AUDIO_VECTORS", "/tmp/vectors.csv")
    monkeypatch.setenv("PHONOLEX_AUDIO_TRAJ_REFS", "/tmp/refs_fisher.json")
    monkeypatch.setenv("PHONOLEX_AUDIO_ATTRIBUTION", "/tmp/attribution_model.json")
    monkeypatch.setenv("PHONOLEX_AUDIO_DEVICE", "cuda")
    cfg = resolve_serving_config()
    assert cfg.checkpoint == "/tmp/state_serve.pt"
    assert cfg.vectors == "/tmp/vectors.csv"
    assert cfg.traj_refs == "/tmp/refs_fisher.json"
    assert cfg.attribution_model == "/tmp/attribution_model.json"
    assert cfg.device == "cuda"


def test_flag_overrides_env(monkeypatch):
    monkeypatch.setenv("PHONOLEX_AUDIO_DEVICE", "cpu")
    cfg = resolve_serving_config(device="mps")
    assert cfg.device == "mps"  # explicit arg wins over env

[ ] Step 2: Run to verify it fails

Run: uv run --extra dev pytest packages/audio/tests/test_serving_config.py -v
Expected: FAIL (module not found)

[ ] Step 3: Implement

# packages/audio/src/phonolex_audio/serving_config.py
"""Resolve serving artifact paths + device from env/flags.

The same harness runs locally (external-drive paths, MPS/CPU) and in a RunPod
container (image-local paths, CUDA). Precedence: explicit arg > env var > default.
No torch import — device is resolved lazily by the caller when None.
"""
from __future__ import annotations

import os
from dataclasses import dataclass

DRIVE = "/Volumes/ExternalData1/audio-union"
_DEFAULTS = {
    "checkpoint": f"{DRIVE}/model_feat_traj_target/state_serve.pt",
    "vectors": f"{DRIVE}/model_feat_traj_target/vectors.csv",
    "traj_refs": f"{DRIVE}/refs_fisher.json",
    "attribution_model": f"{DRIVE}/attribution_model.json",
}
_ENV = {
    "checkpoint": "PHONOLEX_AUDIO_CHECKPOINT",
    "vectors": "PHONOLEX_AUDIO_VECTORS",
    "traj_refs": "PHONOLEX_AUDIO_TRAJ_REFS",
    "attribution_model": "PHONOLEX_AUDIO_ATTRIBUTION",
}


@dataclass(frozen=True)
class ServingConfig:
    checkpoint: str
    vectors: str
    traj_refs: str
    attribution_model: str
    device: str | None  # None = auto-detect at load (cuda>mps>cpu)


def _pick(arg, env_key, default):
    if arg is not None:
        return arg
    return os.environ.get(env_key, default)


def resolve_serving_config(
    *, checkpoint=None, vectors=None, traj_refs=None, attribution_model=None, device=None
) -> ServingConfig:
    return ServingConfig(
        checkpoint=_pick(checkpoint, _ENV["checkpoint"], _DEFAULTS["checkpoint"]),
        vectors=_pick(vectors, _ENV["vectors"], _DEFAULTS["vectors"]),
        traj_refs=_pick(traj_refs, _ENV["traj_refs"], _DEFAULTS["traj_refs"]),
        attribution_model=_pick(
            attribution_model, _ENV["attribution_model"], _DEFAULTS["attribution_model"]
        ),
        device=_pick(device, "PHONOLEX_AUDIO_DEVICE", None),
    )

[ ] Step 4: Run to verify it passes

Run: uv run --extra dev pytest packages/audio/tests/test_serving_config.py -v
Expected: PASS (2 passed)
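
The precedence rule (explicit arg > env var > default) can be illustrated in isolation. This is a minimal sketch, not the module itself: `pick` is a hypothetical stand-in that mirrors `_pick`, and a plain dict stands in for `os.environ` so the example has no side effects.

```python
import os


def pick(arg, env_key, default, env=None):
    """Hypothetical stand-in for _pick: explicit arg > env var > default."""
    env = os.environ if env is None else env
    if arg is not None:
        return arg
    return env.get(env_key, default)


# A plain dict stands in for os.environ (assumption made for the example only).
fake_env = {"PHONOLEX_AUDIO_DEVICE": "cpu"}

from_arg = pick("mps", "PHONOLEX_AUDIO_DEVICE", None, env=fake_env)  # explicit arg wins
from_env = pick(None, "PHONOLEX_AUDIO_DEVICE", None, env=fake_env)   # env var is used
from_default = pick(None, "PHONOLEX_AUDIO_DEVICE", None, env={})     # falls through to None
```

One detail worth keeping in mind: a variable that is set but empty is returned as an empty string, not as the default, because `dict.get` only falls back when the key is absent.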

[ ] Step 5: Commit

git add packages/audio/src/phonolex_audio/serving_config.py packages/audio/tests/test_serving_config.py
git commit -m "feat(audio): serving config — env/flag artifact paths for local<->runpod parity"

Task 3: Trajectory scorer module

Files:
- Create: packages/audio/src/phonolex_audio/trajectory_scorer.py
- Test: packages/audio/tests/test_trajectory_scorer.py

Lift the pure scoring math from research/.../score_trajectories.py (lev_align, resample, span-by-neighbour-midpoints, Fisher tdist, nearest-ref). Keep it model-free: functions take the already-emitted e[T,26], the aligned slot centers, the canonical phones, and the loaded refs/fisher. The emitter wiring lives in Task 4.

[ ] Step 1: Write the failing tests (synthetic arrays — no torch, no model)

# packages/audio/tests/test_trajectory_scorer.py
import numpy as np
from phonolex_audio.trajectory_scorer import resample, lev_align, position_spans, tdist, K


def test_resample_to_K_timepoints():
    seg = np.linspace(0, 1, 5)[:, None] * np.ones((5, 26))
    out = resample(seg)
    assert out.shape == (K, 26)


def test_resample_too_short_returns_none():
    assert resample(np.zeros((1, 26))) is None


def test_lev_align_marks_substitution_and_match():
    pairs = lev_align(["k", "a", "t"], ["k", "a", "p"])
    assert pairs == [("k", "k"), ("a", "a"), ("t", "p")]


def test_position_spans_uses_neighbour_midpoints():
    # centers at frames 10, 20, 30 -> middle slot span = midpoints (15, 25)
    spans = position_spans([10.0, 20.0, 30.0], n_frames=40)
    assert spans[1] == (15, 26)  # hi is inclusive+1


def test_tdist_fisher_weighting():
    P = np.zeros((K, 26)); R = np.ones((K, 26))
    fisher_flat = np.ones((K, 26))
    assert abs(tdist(P, R, fisher_flat) - np.sqrt(26)) < 1e-9  # each timepoint: sqrt(26 * 1); mean over K

Note: the expected tdist value follows from the formula sqrt((fisher*(P-R)**2).sum(1)).mean(). For P=0, R=1, fisher=1, every timepoint contributes sqrt(26) ≈ 5.099, so the mean over K is sqrt(26), not 1.0.
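
The expected value for the Fisher-weighting test can be checked by hand. The sketch below evaluates the stated formula with plain Python lists (no numpy) for P=0, R=1, fisher=1; the variable names are illustrative only.

```python
import math

K, D = 8, 26  # timepoints and feature dimensions used in the plan

# P = 0, R = 1, fisher = 1 everywhere, written out as nested lists.
P = [[0.0] * D for _ in range(K)]
R = [[1.0] * D for _ in range(K)]
fisher = [[1.0] * D for _ in range(K)]

# Per-timepoint distance: sqrt(sum over the 26 dims of fisher * (P - R)^2).
per_timepoint = [
    math.sqrt(sum(f * (p - r) ** 2 for f, p, r in zip(frow, prow, rrow)))
    for frow, prow, rrow in zip(fisher, P, R)
]

# Mean over the K timepoints; every timepoint is identical here.
expected = sum(per_timepoint) / K
```

Each timepoint sums 26 ones before the square root, so the mean over the 8 timepoints is sqrt(26).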

[ ] Step 2: Run to verify it fails

Run: uv run --extra dev pytest packages/audio/tests/test_trajectory_scorer.py -v
Expected: FAIL (module not found)

[ ] Step 3: Implement (lifted from score_trajectories.py lines 29–55, 82–83, 119–124, then generalized)

# packages/audio/src/phonolex_audio/trajectory_scorer.py
"""Pure trajectory-to-trajectory scoring (no model, no corpus).

Given the emitter's per-frame features e[T,26], the force-aligned slot centers,
the canonical phone sequence, and the loaded full-63 Fisher refs, score each
canonical position: Fisher-weighted deviation to its reference trajectory + which
reference the produced sub-trajectory is actually NEAREST (error identity).

Lifted from research/2026-06-06-audio-union/score_trajectories.py (validated).
"""
from __future__ import annotations

import numpy as np

K = 8

# Single source of truth for the consonant set (imported by attribution.py).
CONSONANTS = set("p b t d k ɡ tʃ dʒ f v θ ð s z ʃ ʒ h m n ŋ l ɹ w j ʔ ɾ ɫ ʈ ɖ ɳ β ʋ ɬ c x ɥ".split())


def lev_align(ref, hyp):
    n, m = len(ref), len(hyp)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1): D[i][0] = i
    for j in range(m + 1): D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i-1][j]+1, D[i][j-1]+1, D[i-1][j-1]+(ref[i-1] != hyp[j-1]))
    i, j, out = n, m, []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i-1][j-1]+(ref[i-1] != hyp[j-1]):
            out.append((ref[i-1], hyp[j-1])); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i-1][j]+1:
            out.append((ref[i-1], None)); i -= 1
        else:
            out.append((None, hyp[j-1])); j -= 1
    return out[::-1]


def resample(seg, k=K):
    if len(seg) < 2:
        return None
    src = np.linspace(0, 1, len(seg)); dst = np.linspace(0, 1, k)
    return np.stack([np.interp(dst, src, seg[:, d]) for d in range(seg.shape[1])], 1)


def position_spans(centers, n_frames):
    """For each slot center, the [lo, hi) frame span = inter-neighbour midpoints
    (CTC is peaky). centers is a list of float|None. Returns list of (lo, hi)|None."""
    spans = []
    for i, c in enumerate(centers):
        if c is None:
            spans.append(None); continue
        prev = next((centers[k] for k in range(i-1, -1, -1) if centers[k] is not None), None)
        nxt = next((centers[k] for k in range(i+1, len(centers)) if centers[k] is not None), None)
        lo = int((prev + c) / 2) if prev is not None else int(c - 3)
        hi = int((c + nxt) / 2) if nxt is not None else int(c + 3)
        spans.append((max(0, lo), min(n_frames, hi + 1)))
    return spans


def tdist(P, R, fisher):
    """Fisher-weighted mean per-timepoint distance (fisher all-ones -> euclidean)."""
    return float(np.sqrt((fisher * (P - R) ** 2).sum(1)).mean())


def nearest_ref(P, ref_stack, ref_names, fisher):
    d = np.sqrt((fisher[None] * (P[None] - ref_stack) ** 2).sum(2)).mean(1)
    return ref_names[int(d.argmin())]

[ ] Step 4: Run to verify it passes

Run: uv run --extra dev pytest packages/audio/tests/test_trajectory_scorer.py -v
Expected: PASS (5 passed)
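
The span rule has two edge cases the unit test above does not touch: slots at the ends of the utterance, and slots whose neighbour failed to align. The sketch below restates the rule in plain Python to work through one such case; `spans_sketch` and its `edge` parameter are illustrative names, not part of the module.

```python
def spans_sketch(centers, n_frames, edge=3):
    """Restates the span rule: lo/hi are midpoints to the nearest aligned
    neighbours; a slot with no neighbour on one side extends `edge` frames."""
    spans = []
    for i, c in enumerate(centers):
        if c is None:
            spans.append(None)
            continue
        # Nearest aligned neighbour on each side, skipping unaligned (None) slots.
        prev = next((x for x in reversed(centers[:i]) if x is not None), None)
        nxt = next((x for x in centers[i + 1:] if x is not None), None)
        lo = int((prev + c) / 2) if prev is not None else int(c - edge)
        hi = int((c + nxt) / 2) if nxt is not None else int(c + edge)
        # hi is inclusive, so the half-open span ends at hi + 1, clipped to the clip length.
        spans.append((max(0, lo), min(n_frames, hi + 1)))
    return spans


# Middle slot failed to align (None); the outer slots look past it.
result = spans_sketch([10.0, None, 30.0], n_frames=32)
```

Here the first slot spans frames 7 to 20 and the last spans 20 to 31, so adjacent spans share their boundary frame. That follows from treating `hi` as inclusive.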

[ ] Step 5: Commit

git add packages/audio/src/phonolex_audio/trajectory_scorer.py packages/audio/tests/test_trajectory_scorer.py
git commit -m "feat(audio): trajectory scorer module (pure) — lifted from validated research scorer"

Task 4: TrajectoryAnalyzer (emitter + refs wiring)

Files:
- Create: packages/audio/src/phonolex_audio/analyzer.py

Owns the loaded FeatureEmitter and the parsed refs. Mirrors the existing emitter usage in score_trajectories.py (em._decode_audio → em._emit → em.ph2id → _forced_align_positions). Produces the per-position list. Attribution is attached in Task 6. This task has no fast unit test (it requires the real model); it is exercised by the Task 8 smoke. Keep it thin — all math is Task 3 functions.

[ ] Step 1: Implement

# packages/audio/src/phonolex_audio/analyzer.py
"""TrajectoryAnalyzer: load the keeper emitter + full-63 Fisher refs once, then
analyze (audio, canonical_phones) -> per-position trajectory deviation. The
attribution read is attached by attribution.AttributionModel (Task 6).

Mirrors research/.../score_trajectories.py and attribution_classifier.py feature
extraction, but corpus-free at serve time.
"""
from __future__ import annotations

import json
from pathlib import Path

import numpy as np

from phonolex_audio import trajectory_scorer as ts


class TrajectoryAnalyzer:
    def __init__(self, checkpoint: str, vectors: str, traj_refs: str, device: str | None = None):
        from phonolex_audio.feature_emitter import FeatureEmitter

        self.em = FeatureEmitter(checkpoint=checkpoint, vectors_csv=vectors, device=device)
        raw = json.loads(Path(traj_refs).read_text())
        if "_fisher" in raw:
            self.fisher = np.array(raw["_fisher"], float)
            src = raw["refs"]
        else:
            self.fisher = np.ones((ts.K, 26))
            src = raw
        self.refs = {p: np.array(v["traj"], float) for p, v in src.items() if v.get("traj")}
        self.ref_names = list(self.refs)
        self.ref_stack = np.stack([self.refs[n] for n in self.ref_names])  # [63,K,26]

    def _emit_and_align(self, audio_bytes: bytes, canon: list[str]):
        """-> (e[T,26], centers[list], produced[list])."""
        from phonolex_audio.feature_emitter import _forced_align_positions

        arr = self.em._decode_audio(audio_bytes)
        e, lp = self.em._emit(arr)
        produced = self.em._decode_transcript(lp) if hasattr(self.em, "_decode_transcript") else []
        tids, slot_tp = [], []
        for p in canon:
            aid = self.em.ph2id.get(p)
            if aid is not None:
                slot_tp.append(len(tids)); tids.append(aid)
            else:
                slot_tp.append(None)
        if not tids:
            return e, [None] * len(canon), produced
        pos = _forced_align_positions(lp, tids)
        centers = [
            float(np.median(np.where(pos == tp)[0])) if (tp is not None and np.any(pos == tp)) else None
            for tp in slot_tp
        ]
        return e, centers, produced

    def positions(self, e, centers, canon):
        """Per-canonical-position deviation + nearest reference."""
        spans = ts.position_spans(centers, e.shape[0])
        out = []
        for i, ph in enumerate(canon):
            span = spans[i]
            if span is None or ph not in self.refs:
                out.append({"phone": ph, "deviation": None, "nearest": None}); continue
            P = ts.resample(e[span[0]:span[1]])
            if P is None:
                out.append({"phone": ph, "deviation": None, "nearest": None}); continue
            out.append({
                "phone": ph,
                "deviation": ts.tdist(P, self.refs[ph], self.fisher),
                "nearest": ts.nearest_ref(P, self.ref_stack, self.ref_names, self.fisher),
            })
        return out

Note for the implementer: Confirm the emitter's produced-transcript accessor name by reading feature_emitter.py (the research scorer used lp + lev_align(canon, prod) where prod came from the data row, not the emitter). If FeatureEmitter exposes no produced decode, derive produced from the run-coherence decode already in review() / _decode_trajectory, or pass produced through from the caller. The attribution accent-score (Task 6) needs produced; wire whichever path feature_emitter.py actually provides and adjust this stub to match — do not invent a method.
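
The slot bookkeeping in `_emit_and_align` can be checked without the model. The sketch below isolates it under the assumption that `ph2id` behaves like a plain dict; `map_slots` and the toy vocabulary are hypothetical.

```python
def map_slots(canon, ph2id):
    """Return (target_ids, slot_to_target): one slot entry per canonical phone,
    None where the phone has no id in the emitter vocabulary."""
    target_ids, slot_to_target = [], []
    for phone in canon:
        pid = ph2id.get(phone)
        if pid is None:
            # Unknown phone: keep the slot so positions stay aligned with canon.
            slot_to_target.append(None)
        else:
            slot_to_target.append(len(target_ids))
            target_ids.append(pid)
    return target_ids, slot_to_target


# Hypothetical vocabulary; the real one comes from FeatureEmitter.ph2id.
toy_ph2id = {"k": 0, "æ": 12, "t": 3}
ids, slots = map_slots(["k", "æ", "?", "t"], toy_ph2id)
```

The `is None` check matters: a phone whose id is 0 is a valid target, and a truthiness check would drop it.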

[ ] Step 2: Commit (no test yet — covered by Task 8 smoke)

git add packages/audio/src/phonolex_audio/analyzer.py
git commit -m "feat(audio): TrajectoryAnalyzer — emitter + full-63 Fisher refs, per-position scoring"

Task 5: Bake the attribution model (offline)

Files:
- Create: packages/audio/scripts/bake_attribution_model.py

The research classifier (attribution_classifier.py) computes per-source centroids + feature standardization + categorical priors from the labeled corpora at runtime. Serving cannot do that. This script runs that computation once and writes attribution_model.json (small; committable or drive-stored): {feature_names, mean[6], std[6], centroids:{source:[6]}, priors:{l1:{...}, dev:{...}}}. It reuses attribution_classifier.py feature extraction and attribution_prototype.build_prior/loglik unchanged — this is a thin driver, not new logic.

[ ] Step 1: Implement (driver over existing research code; run on the machine with the drive + corpora)

# packages/audio/scripts/bake_attribution_model.py
"""Bake the attribution serving model ONCE from the labeled union.

Reuses research/2026-06-06-audio-union/attribution_classifier.py feature
extraction + attribution_prototype priors. Output attribution_model.json is the
ONLY artifact the serving attribution module needs — no corpora at serve time.

  cd research/2026-06-06-audio-union && PYTORCH_ENABLE_MPS_FALLBACK=1 uv run \
    --with torch --with transformers --with librosa --with soundfile --with numpy \
    python ../../packages/audio/scripts/bake_attribution_model.py \
    --out /Volumes/ExternalData1/audio-union/attribution_model.json
"""
from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path

import numpy as np

RESEARCH = Path(__file__).resolve().parents[3] / "research/2026-06-06-audio-union"
sys.path.insert(0, str(RESEARCH))


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--out", required=True)
    args = ap.parse_args()

    # Import the validated research harness and reuse its subject-feature builder.
    import attribution_classifier as AC  # noqa: E402

    # AC.main() prints a confusion matrix; we need its intermediate (F, lab, priors).
    # Refactor AC to expose build_subject_features() returning (F[N,6], labels[N],
    # feature_names, priors_dict) — a pure extraction of AC.main lines 49–148 with
    # the LOSO/print block removed. Call it here:
    F, lab, feature_names, priors = AC.build_subject_features()
    lab = np.asarray(lab)  # the boolean masks below (lab == pop) need an array, not a list

    POP2SRC = AC.POP2SRC
    mean = F.mean(0)
    std = F.std(0) + 1e-9
    Z = (F - mean) / std
    centroids = {}
    for pop in sorted(set(lab)):
        m = lab == pop
        centroids[POP2SRC[pop]] = Z[m].mean(0).tolist()

    out = {
        "feature_names": feature_names,         # ["g","x","cg","cx","rate","a"]
        "mean": mean.tolist(),
        "std": std.tolist(),
        "centroids": centroids,                 # source -> standardized 6-vector
        "priors": priors,                       # {"l1": {...}, "dev": {...}} for accent-score
    }
    Path(args.out).write_text(json.dumps(out))
    print(f"[bake] wrote {args.out}: sources={list(centroids)}, "
          f"subjects={ {AC.POP2SRC[p]: int((lab==p).sum()) for p in set(lab)} }")


if __name__ == "__main__":
    main()

[ ] Step 2: Refactor attribution_classifier.py to expose build_subject_features() (extract main lines 49–148 into a function returning (F, lab, ["g","x","cg","cx","rate","a"], {"l1":..., "dev":...}); keep main() calling it then doing the LOSO print). This keeps the research entrypoint working and gives the bake script a clean seam.
[ ] Step 3: Run the bake

Run: the command in the script docstring.
Expected: attribution_model.json written (a few KB); prints subject counts per source.
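
To sanity-check the shape of the baked file without the corpora, the standardize-then-centroid computation can be sketched on synthetic data. This is a stdlib-only illustration, not the bake script: `bake_sketch`, the two-feature rows, and the labels are all made up, and the priors are omitted.

```python
import json
from statistics import fmean, pstdev


def bake_sketch(features, labels):
    """Standardize per-subject feature rows, then average the standardized
    rows per source label. pstdev is the population std (numpy's default)."""
    columns = list(zip(*features))
    mean = [fmean(col) for col in columns]
    std = [pstdev(col) + 1e-9 for col in columns]  # same epsilon as the bake script
    z_rows = [[(v - m) / s for v, m, s in zip(row, mean, std)] for row in features]
    centroids = {}
    for source in sorted(set(labels)):
        rows = [z for z, lab in zip(z_rows, labels) if lab == source]
        centroids[source] = [fmean(col) for col in zip(*rows)]
    return {"mean": mean, "std": std, "centroids": centroids}


# Synthetic data: 2 features, 2 subjects per source (not real measurements).
toy_features = [[0.0, 0.0], [2.0, 0.0], [10.0, 4.0], [12.0, 4.0]]
toy_labels = ["typical", "typical", "motor", "motor"]
model = bake_sketch(toy_features, toy_labels)

# The artifact is plain JSON, so it should survive a dump/load round trip.
roundtrip = json.loads(json.dumps(model))
```

Because the centroids are stored in standardized space, the serve-time classifier has to apply the same mean and std before measuring distance.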

[ ] Step 4: Commit (the script + the research refactor; the JSON if small enough — else document it stays on the drive like the weights)

git add packages/audio/scripts/bake_attribution_model.py research/2026-06-06-audio-union/attribution_classifier.py
git commit -m "feat(audio): bake attribution serving model (centroids+std+priors) from validated harness"

Task 6: Attribution serve-time module

Files:
- Create: packages/audio/src/phonolex_audio/attribution.py
- Test: packages/audio/tests/test_attribution.py

Loads attribution_model.json; given the per-position arrays already computed by the analyzer (gradient g, excursion x, consonant-restricted cg/cx, rate, and the produced/canonical phone pair list for accent-score), produces {source, distances:{source:dist}} by standardize → nearest centroid. The 6-feature extraction logic is lifted from attribution_classifier.py:114–137; accent-score from :68–76 using the baked priors (no corpora).

[ ] Step 1: Write the failing test (synthetic features + a tiny in-memory baked model)

# packages/audio/tests/test_attribution.py
import numpy as np
from phonolex_audio.attribution import AttributionModel


def _tiny_model():
    # two sources, 6 features; 'motor' centroid far in feature space
    return {
        "feature_names": ["g", "x", "cg", "cx", "rate", "a"],
        "mean": [0.0] * 6, "std": [1.0] * 6,
        "centroids": {"typical": [0, 0, 0, 0, 0, 0], "motor": [3, -3, 3, -3, 3, 0]},
        "priors": {"l1": {}, "dev": {}},
    }


def test_classify_nearest_centroid(tmp_path):
    import json
    p = tmp_path / "m.json"; p.write_text(json.dumps(_tiny_model()))
    am = AttributionModel(str(p))
    feats = np.array([3, -3, 3, -3, 3, 0], float)  # right on 'motor'
    res = am.classify(feats)
    assert res["source"] == "motor"
    assert set(res["distances"]) == {"typical", "motor"}

[ ] Step 2: Run to verify it fails

Run: uv run --extra dev pytest packages/audio/tests/test_attribution.py -v
Expected: FAIL (module not found)

[ ] Step 3: Implement

# packages/audio/src/phonolex_audio/attribution.py
"""Serve-time 4-way source attribution (typical/accent/developmental/motor).

Loads the baked attribution_model.json (Task 5). classify() standardizes a clip's
6-feature vector with the baked mean/std and returns the nearest source centroid.
Feature *extraction* (g/x/cg/cx/rate/a) is done by the analyzer using these helpers
so the math matches attribution_classifier.py exactly.
"""
from __future__ import annotations

import json
from pathlib import Path

import numpy as np

from phonolex_audio.trajectory_scorer import CONSONANTS  # single source of truth


class AttributionModel:
    def __init__(self, path: str):
        m = json.loads(Path(path).read_text())
        self.feature_names = m["feature_names"]
        self.mean = np.array(m["mean"], float)
        self.std = np.array(m["std"], float)
        self.centroids = {k: np.array(v, float) for k, v in m["centroids"].items()}
        self.priors = m["priors"]

    def classify(self, feats: np.ndarray) -> dict:
        z = (feats - self.mean) / self.std
        dist = {s: float(np.linalg.norm(z - c)) for s, c in self.centroids.items()}
        return {"source": min(dist, key=dist.get), "distances": dist}

(The g/x/cg/cx/rate/a extraction + accent_score(priors, subs) helper are lifted into attribution.py as module functions and called by analyzer.analyze(); port them verbatim from attribution_classifier.py:114–137 and :68–76, swapping the runtime build_prior calls for loglik against self.priors. Add a unit test for accent_score with a tiny prior dict.)

[ ] Step 4: Run to verify it passes

Run: uv run --extra dev pytest packages/audio/tests/test_attribution.py -v
Expected: PASS
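
The unit test uses mean 0 and std 1, so it does not exercise the standardization step. A second case with unequal feature scales could look like the sketch below; `classify_sketch` is a stdlib-only restatement of the classify logic and the toy model values are invented.

```python
import math


def classify_sketch(feats, mean, std, centroids):
    """Standardize with the baked mean/std, then pick the closest centroid
    by Euclidean distance in standardized space."""
    z = [(f - m) / s for f, m, s in zip(feats, mean, std)]
    dist = {src: math.dist(z, c) for src, c in centroids.items()}
    return {"source": min(dist, key=dist.get), "distances": dist}


# Toy two-feature model: feature 0 has a much larger raw scale than feature 1.
toy = {
    "mean": [0.0, 0.0],
    "std": [100.0, 1.0],
    "centroids": {"typical": [0.0, 0.0], "motor": [1.0, 1.0]},
}

# Raw [100, 0.1] standardizes to [1.0, 0.1], which sits closer to 'motor'.
res = classify_sketch([100.0, 0.1], toy["mean"], toy["std"], toy["centroids"])
```

Without the division by std, the raw value 100 would dominate the distance to every centroid. Dividing by std puts the features on a comparable scale first.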

[ ] Step 5: Wire into TrajectoryAnalyzer.analyze() — add to analyzer.py:

    def analyze(self, audio_bytes: bytes, canon: list[str]) -> dict:
        e, centers, produced = self._emit_and_align(audio_bytes, canon)
        positions = self.positions(e, centers, canon)
        result = {"positions": positions}
        if self.attribution is not None:
            feats = self._attribution_features(e, centers, canon, produced)
            result["attribution"] = self.attribution.classify(feats)
        return result

(Implement _attribution_features using the ported g/x/cg/cx/rate/accent helpers; self.attribution is an optional AttributionModel passed to __init__.)
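
The optional-attribution wiring implies two response shapes. The sketch below shows both with stubs; `AnalyzerSketch` and `FixedAttribution` are hypothetical stand-ins, and the scoring and feature extraction are replaced by placeholder values.

```python
class AnalyzerSketch:
    """Stub showing only the optional-attribution wiring; scoring and feature
    extraction are replaced by fixed placeholder values."""

    def __init__(self, attribution=None):
        self.attribution = attribution  # None -> positions-only responses

    def analyze(self, audio_bytes, canon):
        result = {
            "positions": [{"phone": p, "deviation": None, "nearest": None} for p in canon]
        }
        if self.attribution is not None:
            feats = [0.0] * 6  # placeholder for _attribution_features(...)
            result["attribution"] = self.attribution.classify(feats)
        return result


class FixedAttribution:
    """Hypothetical stand-in for AttributionModel."""

    def classify(self, feats):
        return {"source": "typical", "distances": {"typical": 0.0}}


without = AnalyzerSketch().analyze(b"", ["k", "a", "t"])
with_attr = AnalyzerSketch(FixedAttribution()).analyze(b"", ["k", "a", "t"])
```

Clients should therefore treat the `attribution` key as optional: it is absent, not null, when no attribution model is loaded.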

[ ] Step 6: Commit

git add packages/audio/src/phonolex_audio/attribution.py packages/audio/src/phonolex_audio/analyzer.py packages/audio/tests/test_attribution.py
git commit -m "feat(audio): serve-time 4-way attribution (baked centroids) wired into analyzer"

Task 7: `/analyze` endpoint + launch flags

Files:
- Modify: packages/audio/src/phonolex_audio/server.py (add endpoint after /feature-review)
- Modify: packages/audio/src/phonolex_audio/__main__.py (flags + analyzer registration)

[ ] Step 1: Add the endpoint (mirrors /feature-review: canonical is a JSON phoneme array)

    @app.post("/analyze")
    async def analyze(
        audio: UploadFile = File(...),
        canonical: str = Form(...),
    ) -> dict:
        """Trajectory analysis: audio + canonical (JSON phoneme array) ->
        per-position Fisher trajectory deviation + 4-way source attribution.
        Requires the trajectory analyzer to be loaded (--trajectory-refs)."""
        import json as _json
        analyzer = getattr(app.state, "analyzer", None)
        if analyzer is None:
            raise HTTPException(status_code=400, detail="Trajectory analyzer not loaded")
        try:
            canon = _json.loads(canonical)
        except _json.JSONDecodeError:
            raise HTTPException(status_code=400, detail="canonical must be a JSON array of phonemes")
        if not isinstance(canon, list) or not all(isinstance(x, str) for x in canon):
            raise HTTPException(status_code=400, detail="canonical must be a JSON array of phoneme strings")
        raw = await audio.read()
        return analyzer.analyze(raw, canon)

Set app.state.analyzer in build_app from an optional analyzer=None kwarg, and add "analyze": app.state.analyzer is not None to /health.
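
The canonical validation in the endpoint can be exercised without FastAPI by pulling it into a pure function. This is one possible factoring, not something the plan requires: `parse_canonical` and `rejects` are hypothetical helpers, and the endpoint would translate the `ValueError` into its 400 response.

```python
import json


def parse_canonical(raw):
    """Return the canonical phones as list[str], or raise ValueError carrying
    the message the endpoint would put in its 400 response."""
    try:
        canon = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("canonical must be a JSON array of phonemes") from exc
    if not isinstance(canon, list) or not all(isinstance(x, str) for x in canon):
        raise ValueError("canonical must be a JSON array of phoneme strings")
    return canon


def rejects(raw):
    """True when parse_canonical refuses the input."""
    try:
        parse_canonical(raw)
    except ValueError:
        return True
    return False
```

Note that an empty array passes this validation, as it does in the endpoint, so the analyzer has to tolerate an empty canonical sequence.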

[ ] Step 2: Add launch flags in __main__.py:

    ap.add_argument("--trajectory-refs", default=None,
                    help="full-63 Fisher refs JSON (refs_fisher.json); enables /analyze")
    ap.add_argument("--attribution-model", default=None,
                    help="baked attribution_model.json; enables the attribution read in /analyze")

After building transcribers, when --feature-checkpoint and --trajectory-refs are both given, construct TrajectoryAnalyzer(checkpoint, vectors, traj_refs, device) (+ optional AttributionModel) and pass it into build_app(..., analyzer=...). Use resolve_serving_config() so env vars work when flags are omitted.
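
The gating described above (analyzer only when both flags are present, attribution optional) can be sketched with argparse alone. `analyzer_plan` is a hypothetical helper, and the sketch deliberately leaves out the env-var fallback through `resolve_serving_config()`.

```python
import argparse


def parse_flags(argv):
    """Parse only the three flags relevant to the analyzer."""
    ap = argparse.ArgumentParser()
    ap.add_argument("--feature-checkpoint", default=None)
    ap.add_argument("--trajectory-refs", default=None)
    ap.add_argument("--attribution-model", default=None)
    return ap.parse_args(argv)


def analyzer_plan(args):
    """Decide what to construct: None when either required flag is missing,
    otherwise the arguments for the analyzer (attribution may be None)."""
    if not (args.feature_checkpoint and args.trajectory_refs):
        return None
    return {
        "checkpoint": args.feature_checkpoint,
        "traj_refs": args.trajectory_refs,
        "attribution_model": args.attribution_model,
    }


plan = analyzer_plan(
    parse_flags(["--feature-checkpoint", "a.pt", "--trajectory-refs", "r.json"])
)
```

Returning None when either flag is missing keeps stub-built apps unchanged, which is what the existing host tests rely on.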

[ ] Step 3: Run the existing host tests (ensure no regression to /transcribe, /compare, /feature-review, /acoustic)

Run: uv run --extra dev --extra inference pytest packages/audio/tests/test_server.py -v
Expected: PASS (existing tests green; analyzer is optional so stub-built apps are unaffected)

[ ] Step 4: Commit

git add packages/audio/src/phonolex_audio/server.py packages/audio/src/phonolex_audio/__main__.py
git commit -m "feat(audio): /analyze endpoint + launch flags (trajectory refs + attribution)"

Task 8: Local end-to-end smoke (the "watch it run")

Files:
- Create: packages/audio/tests/test_analyze_smoke.py (@pytest.mark.slow, local-only)

Loads the real keeper via state_serve.pt and scores one short clip from the union test set, asserting the response structure plus a basic sanity signal (at least one position is scored, with a non-negative deviation and a nearest reference). This is the proof that the harness runs end-to-end.

[ ] Step 1: Implement

# packages/audio/tests/test_analyze_smoke.py
import json
import os
from pathlib import Path

import pytest

DRIVE = Path("/Volumes/ExternalData1/audio-union")
pytestmark = pytest.mark.slow


@pytest.mark.skipif(not (DRIVE / "model_feat_traj_target/state_serve.pt").exists(),
                    reason="keeper serving artifact not on this machine")
def test_analyze_runs_end_to_end():
    from phonolex_audio.analyzer import TrajectoryAnalyzer

    analyzer = TrajectoryAnalyzer(
        checkpoint=str(DRIVE / "model_feat_traj_target/state_serve.pt"),
        vectors=str(DRIVE / "model_feat_traj_target/vectors.csv"),
        traj_refs=str(DRIVE / "refs_fisher.json"),
    )
    row = next(json.loads(l) for l in open(DRIVE / "train_pkg/test.jsonl"))
    audio = (DRIVE / "train_pkg/wavroot" / row["wav"]).read_bytes()
    out = analyzer.analyze(audio, row.get("canonical") or [])
    assert "positions" in out and isinstance(out["positions"], list)
    scored = [p for p in out["positions"] if p["deviation"] is not None]
    assert scored, "expected at least one scored position"
    for p in scored:
        assert p["deviation"] >= 0 and p["nearest"] is not None

[ ] Step 2: Run the smoke

Run: PYTORCH_ENABLE_MPS_FALLBACK=1 uv run --extra dev --extra inference pytest packages/audio/tests/test_analyze_smoke.py -v -m slow
Expected: PASS (loads keeper, scores a clip). This is the milestone to show the user.

[ ] Step 3: Commit

git add packages/audio/tests/test_analyze_smoke.py
git commit -m "test(audio): end-to-end /analyze smoke on the real keeper (slow, local-only)"

Task 9: RunPod packaging (buildable, NOT deployed)

Files:
- Create: packages/audio/deploy/Dockerfile, packages/audio/deploy/handler.py, packages/audio/deploy/README.md

Adapt the archived pattern (git show archive/csp-generation-v5.2:packages/generation/server/Dockerfile and the 2026-04-16-runpod-serverless-deployment.md plan) to this host. The image bakes the HF base model cache (so Wav2Vec2Model.from_pretrained is offline), copies the serving artifacts (or mounts a network volume), and runs the FastAPI host (or a serverless handler.py wrapping analyzer.analyze). All paths via the Task 2 env vars. Do not deploy — this task ends at a buildable image + a README documenting docker build and the env contract for staging vs prod endpoints.

[ ] Step 1: Write Dockerfile (python:3.11-slim base, uv pip install the inference extra, HF_HOME pre-warmed with facebook/wav2vec2-lv-60-espeak-cv-ft, ENV PHONOLEX_AUDIO_DEVICE=cuda).
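[ ] Step 2: Write handler.py (RunPod serverless entry: cold-load TrajectoryAnalyzer once, then def handler(event): payload = event["input"]; return analyzer.analyze(b64decode(payload["audio"]), payload["canonical"]). RunPod serverless delivers the request body under the event's "input" key.)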
[ ] Step 3: Write README.md — docker build, the env contract (PHONOLEX_AUDIO_*), and the note that staging and prod are two separate endpoints selected by the Worker's AUDIO_INFERENCE_URL per environment. No deploy step.
[ ] Step 4: Commit

git add packages/audio/deploy/
git commit -m "build(audio): RunPod-buildable serving image + serverless handler (env-driven; not deployed)"
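
The handler shape can be sketched without the weights or the RunPod SDK. This is a minimal illustration under stated assumptions: `StubAnalyzer` and `get_analyzer` are hypothetical, the payload is assumed to sit under `event["input"]` (with a flat event accepted as a fallback), and the registration call is left as a comment.

```python
import base64


class StubAnalyzer:
    """Hypothetical stand-in for TrajectoryAnalyzer so the sketch runs without weights."""

    def analyze(self, audio_bytes, canon):
        return {
            "positions": [{"phone": p, "deviation": None, "nearest": None} for p in canon],
            "n_bytes": len(audio_bytes),  # illustration only, not part of the real response
        }


_analyzer = None  # loaded once per worker, reused across requests


def get_analyzer():
    """Cold-load on first use; later calls reuse the same instance."""
    global _analyzer
    if _analyzer is None:
        _analyzer = StubAnalyzer()  # real handler: TrajectoryAnalyzer(...) built from env config
    return _analyzer


def handler(event):
    # Assumption: the request payload sits under event["input"]; a flat event
    # is also accepted so the function is easy to call in local tests.
    payload = event.get("input", event)
    audio = base64.b64decode(payload["audio"])
    return get_analyzer().analyze(audio, payload["canonical"])


# In the real image the handler would be registered with
# runpod.serverless.start({"handler": handler}); omitted so this runs anywhere.
demo = handler(
    {"input": {"audio": base64.b64encode(b"RIFF").decode(), "canonical": ["k", "a", "t"]}}
)
```

Keeping the analyzer behind `get_analyzer()` means the load cost is paid once per worker and not once per request.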

Task 10: Remove the `/dev/*` pages from the shippable frontend

Files:
- Modify: packages/web/frontend/src/main.tsx (drop the three /dev/* routes + their imports)
- Delete: components/tools/{AudioTranscribeViewer,PronunciationViewer,AcousticViewer}.tsx + their .test.tsx
- Delete: now-unused frontend services (services/acousticApi.ts, and the /pronounce and /compare paths in services/audioApi.ts if unused elsewhere); verify with grep before deleting.

The dev pages are scratch and must not ship. Remove the routes and components so they cannot appear in a build. Worker-side audio routes are left for the (later) product surface; this task is frontend-only.

[ ] Step 1: Grep for residual references before deleting

Run: grep -rn "AudioTranscribeViewer\|PronunciationViewer\|AcousticViewer\|acousticApi\|/dev/" packages/web/frontend/src
Expected: references only in main.tsx + the components/tests themselves.

[ ] Step 2: Remove routes + imports from main.tsx (delete lines 9–11 imports and 30–32 routes).
[ ] Step 3: Delete the components + tests + dead services (only those grep proved unused elsewhere).
[ ] Step 4: Run the frontend test + build gates

Run: cd packages/web/frontend && npm run test && npm run build
Expected: PASS (no dangling imports; bundle builds without the dev pages).

[ ] Step 5: Commit

git add -A packages/web/frontend
git commit -m "chore(frontend): remove /dev/* scratch pages from the shippable build (not product)"

Self-Review Notes (gaps the implementer must close, not skip)

Emitter produced-transcript path (Task 4 note): the research scorer got produced from the data row, not the emitter. Serving must derive it from the emission. Read feature_emitter.py review()/_decode_trajectory and wire the actual decode; do not invent a method. Accent-score depends on it.
build_subject_features() refactor (Task 5): the bake script assumes attribution_classifier.py is refactored to expose this seam. That refactor is Step 2 of Task 5 — do it first.
Artifact commit policy: state_serve.pt (1.26 GB) is NEVER committed (gitignored, drive-only). attribution_model.json, full63_trajectories.json, refs_fisher.json, vectors.csv are small — decide per file whether to commit into packages/audio/ or keep on the drive referenced by serving_config. Default: keep on the drive for now (dev-only), commit later when a hosting target is chosen.
No hosting deploy in this plan. Task 9 produces a buildable image only. The local↔RunPod capability is satisfied; the decision of where to serve is the user's, post-smoke.