PHON-130 — Acoustic Analysis /dev/acoustic — Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development. Steps use checkbox (`- [ ]`) syntax.
Goal: A /dev/acoustic dev page (Model #4) that extracts F1–F3 + F0 + duration from a vowel production via Parselmouth and overlays Hillenbrand percentile bands by target-vowel × speaker-group. Dev/validation surface, NOT the product.
Architecture: phonolex_audio gains a Parselmouth /acoustic endpoint (local Python); a build step turns Hillenbrand vowdata.dat into a bundled hillenbrandNorms.json; the Worker proxies extraction + computes the percentile in-TS (pinned to Python); an AcousticViewer dev page mirrors PronunciationViewer.
Tech Stack: Python (Parselmouth/Praat, the new dep), TypeScript (Workers overlay + route, React dev page), Vitest, pytest.
Spec: docs/superpowers/specs/2026-06-05-phon-130-acoustic-analysis-dev-page-design.md. Branch: research/phon-130-acoustic-analysis.
File Structure
- `research/2026-06-05-phon-130-acoustic/build_hillenbrand_norms.py`: parse `vowdata.dat` → `hillenbrandNorms.json`.
- `packages/web/workers/src/config/hillenbrandNorms.json`: bundled norm distributions (committed; Tier A).
- `packages/web/workers/src/lib/acousticOverlay.ts`: percentile of a value vs a sorted norm array.
- `packages/web/workers/src/routes/audio.ts`: `POST /api/audio/acoustic` (proxy + overlay).
- `packages/audio/src/phonolex_audio/acoustic.py`: Parselmouth extraction.
- `packages/audio/src/phonolex_audio/server.py`: `POST /acoustic`.
- `packages/audio/pyproject.toml`: add `praat-parselmouth`.
- `packages/web/frontend/src/services/acousticApi.ts`: `analyzeAcoustic()`.
- `packages/web/frontend/src/components/tools/AcousticViewer.tsx` (+ test): the dev page.
- `packages/web/frontend/src/main.tsx`: `/dev/acoustic` route.
Reuse: the phonolex_audio server multipart pattern, audio.ts proxy + JSON-import patterns, PronunciationViewer dev-page scaffold, the cumulative-percentile formula.
Task 1: Build Hillenbrand norm tables
Files: Create research/2026-06-05-phon-130-acoustic/build_hillenbrand_norms.py; Create packages/web/workers/src/config/hillenbrandNorms.json
- [ ] Step 1: Inspect the data format first. Read `/Volumes/ExternalData2/audio-datasets/hillenbrand_et_al_1995/h95-alldata/{readme.txt,vowdata.dat}` (head). Confirm: the speaker-id prefix encodes group (`m`=men, `w`=women, `b`=boys, `g`=girls), the vowel code column (e.g. `ae ah aw eh ei er ih iy oa oo uh uw` → IPA `æ ɑ ɔ ɛ eɪ ɝ ɪ i oʊ ʊ ʌ u`), and the F0/F1/F2/F3 steady-state columns (Hz; `0` = not measured → drop). Note the exact column indices.
- [ ] Step 2: Implement the builder (`build_hillenbrand_norms.py`): parse each row → `(group, vowel_ipa, {f0,f1,f2,f3})`, drop unmeasured (`0`) values, and emit sorted ascending arrays per `(vowel, group)`:

  ```python
  """Hillenbrand 1995 -> hillenbrandNorms.json for the Worker percentile overlay.

  { "vowels": ["i","ɪ",...], "groups": ["men","women","boys","girls"],
    "table": { "<vowel>|<group>": { "f1": [sorted Hz...], "f2": [...], "f3": [...], "f0": [...] } } }

  Run: uv run python build_hillenbrand_norms.py
  """
  import json
  from pathlib import Path

  DAT = Path("/Volumes/ExternalData2/audio-datasets/hillenbrand_et_al_1995/h95-alldata/vowdata.dat")
  OUT = Path(__file__).resolve().parents[2] / "packages/web/workers/src/config/hillenbrandNorms.json"
  GROUP = {"m": "men", "w": "women", "b": "boys", "g": "girls"}
  VOWEL = {"ae": "æ", "ah": "ɑ", "aw": "ɔ", "eh": "ɛ", "ei": "eɪ", "er": "ɝ",
           "ih": "ɪ", "iy": "i", "oa": "oʊ", "oo": "ʊ", "uh": "ʌ", "uw": "u"}
  MEASURES = ("f1", "f2", "f3", "f0")


  # adapt column parsing to the REAL vowdata.dat layout confirmed in Step 1.
  def main():
      acc = {}  # (vowel_ipa, group) -> {"f0": [...], "f1": [...], "f2": [...], "f3": [...]}
      for line in DAT.read_text().splitlines():
          # parse: speaker-id (e.g. 'm01ae'), then F0,F1,F2,F3 steady-state cols
          # group = GROUP[id[0]]; vowel = VOWEL[id[3:5]]; values from the right columns
          ...  # implement against the confirmed layout
      table = {}
      for (vowel, group), d in acc.items():
          # Always emit all four arrays (possibly empty): the Worker's HillenbrandNorms
          # type and percentile() expect every key to be present.
          table[f"{vowel}|{group}"] = {k: sorted(d.get(k, [])) for k in MEASURES}
      OUT.write_text(json.dumps({"vowels": sorted(VOWEL.values()),
                                 "groups": list(GROUP.values()),
                                 "table": table}), encoding="utf-8")
      print(f"wrote {OUT}: {len(table)} (vowel,group) cells")


  if __name__ == "__main__":
      main()
  ```
- [ ] Step 3: Run + verify. `cd research/2026-06-05-phon-130-acoustic && uv run python build_hillenbrand_norms.py`. Confirm ~48 cells (12 vowels × 4 groups), and sanity-check the cell medians against published Hillenbrand means: e.g. men's /i/ (`iy`) median F1 ≈ 340 Hz, F2 ≈ 2240 Hz; women's /ɑ/ (`ah`) F1 ≈ 920 Hz. Print a couple of medians.
- [ ] Step 4: Commit (JSON is a Tier-A derived statistic, so it ships):

  ```bash
  git add research/2026-06-05-phon-130-acoustic/build_hillenbrand_norms.py packages/web/workers/src/config/hillenbrandNorms.json
  git commit -m "data(phon-130): Hillenbrand vowel-formant norm tables (hillenbrandNorms.json)"
  ```
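The Step 3 sanity check can be scripted rather than eyeballed. The following is a minimal sketch, assuming the JSON schema described in Step 2; the helper names `cell_medians` and `summarize` are hypothetical and not part of the plan.

```python
import json
from pathlib import Path
from statistics import median

MEASURES = ("f0", "f1", "f2", "f3")


def cell_medians(norms: dict, vowel: str, group: str) -> dict:
    """Median per measure for one (vowel, group) cell; None where no values exist."""
    cell = norms["table"].get(f"{vowel}|{group}", {})
    return {k: (median(cell[k]) if cell.get(k) else None) for k in MEASURES}


def summarize(path: Path) -> None:
    """Print the cell count and a couple of medians (hypothetical helper)."""
    norms = json.loads(path.read_text(encoding="utf-8"))
    print(f"{len(norms['table'])} cells")
    for vowel, group in [("i", "men"), ("ɑ", "women")]:
        print(vowel, group, cell_medians(norms, vowel, group))
```

Calling `summarize(OUT)` at the end of the builder would print the two cells named in Step 3, ready to compare against the published values by hand.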
Task 2: In-Worker percentile overlay (acousticOverlay.ts)
Files: Create packages/web/workers/src/lib/acousticOverlay.ts; Create packages/web/workers/src/__tests__/acousticOverlay.test.ts
- [ ] Step 1: Failing test

  ```ts
  import { describe, expect, it } from 'vitest';
  import { percentile, overlayFor, type HillenbrandNorms } from '../lib/acousticOverlay';

  const NORMS: HillenbrandNorms = {
    vowels: ['i'],
    groups: ['men'],
    table: { 'i|men': { f1: [300, 320, 340, 360, 380], f2: [2200, 2220, 2240, 2260, 2280], f3: [], f0: [] } },
  };

  describe('percentile (cumulative bisect_right/N*100)', () => {
    it('is 0 below all, 100 at/above max, ~mid in the middle', () => {
      expect(percentile(290, [300, 320, 340, 360, 380])).toBe(0);
      expect(percentile(340, [300, 320, 340, 360, 380])).toBe(60); // bisect_right=3 -> 3/5*100
      expect(percentile(999, [300, 320, 340, 360, 380])).toBe(100);
    });
    it('empty norm array -> null (unmeasured)', () => {
      expect(percentile(340, [])).toBeNull();
    });
  });

  describe('overlayFor', () => {
    it('returns per-measure percentiles for the (vowel, group) cell', () => {
      const o = overlayFor({ f1: 340, f2: 2240, f3: 3000, f0: 120 }, 'i', 'men', NORMS);
      expect(o.f1).toBe(60);
      expect(o.f2).toBe(60);
      expect(o.f3).toBeNull(); // empty norms
    });
    it('missing cell -> all null', () => {
      const o = overlayFor({ f1: 340, f2: 2240, f3: 3000, f0: 120 }, 'u', 'boys', NORMS);
      expect(o).toEqual({ f1: null, f2: null, f3: null, f0: null });
    });
  });
  ```
- [ ] Step 2: Run, expect FAIL. `cd packages/web/workers && npx vitest run src/__tests__/acousticOverlay.test.ts`
- [ ] Step 3: Implement

  ```ts
  export interface HillenbrandNorms {
    vowels: string[];
    groups: string[];
    table: Record<string, { f1: number[]; f2: number[]; f3: number[]; f0: number[] }>;
  }

  export interface Overlay {
    f1: number | null;
    f2: number | null;
    f3: number | null;
    f0: number | null;
  }

  /** cumulative percentile: bisect_right(sorted, v) / N * 100. null if no norms. */
  export function percentile(v: number, sorted: number[]): number | null {
    if (sorted.length === 0) return null;
    // Binary search for the first index whose value is > v (bisect_right).
    let lo = 0, hi = sorted.length;
    while (lo < hi) {
      const m = (lo + hi) >> 1;
      if (sorted[m] <= v) lo = m + 1;
      else hi = m;
    }
    return (lo / sorted.length) * 100;
  }

  export function overlayFor(
    vals: { f1: number; f2: number; f3: number; f0: number },
    vowel: string,
    group: string,
    norms: HillenbrandNorms,
  ): Overlay {
    const cell = norms.table[`${vowel}|${group}`];
    if (!cell) return { f1: null, f2: null, f3: null, f0: null };
    return {
      f1: percentile(vals.f1, cell.f1),
      f2: percentile(vals.f2, cell.f2),
      f3: percentile(vals.f3, cell.f3),
      f0: percentile(vals.f0, cell.f0),
    };
  }
  ```
- [ ] Step 4: Run, expect PASS + `npx tsc --noEmit`.
- [ ] Step 5: Fixture pin. Add a small `research/2026-06-05-phon-130-acoustic/percentile_fixture.py` that computes `bisect_right`-based percentiles for a few (value, group, vowel) cases from `hillenbrandNorms.json`, and a TS test asserting that `percentile`/`overlayFor` match (the PHON-126/142 pattern). This confirms the Worker matches Python.
- [ ] Step 6: Commit

  ```bash
  git add packages/web/workers/src/lib/acousticOverlay.ts packages/web/workers/src/__tests__/acousticOverlay.test.ts
  git commit -m "feat(phon-130): in-Worker Hillenbrand percentile overlay"
  ```
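For the Step 5 fixture, the Python side of the pin could look like the sketch below. It is an illustration, not the actual `percentile_fixture.py`: the function names `percentile` and `overlay_for` simply mirror the TS names, and the norms dict is assumed to follow the `hillenbrandNorms.json` schema from Task 1.

```python
from bisect import bisect_right

MEASURES = ("f1", "f2", "f3", "f0")


def percentile(value, sorted_norms):
    """Cumulative percentile: bisect_right(sorted, v) / N * 100; None if no norms."""
    if not sorted_norms:
        return None
    # Same operation order as the TS `(lo / sorted.length) * 100`, so both sides
    # perform identical IEEE-754 double arithmetic.
    return bisect_right(sorted_norms, value) / len(sorted_norms) * 100


def overlay_for(vals, vowel, group, norms):
    """Per-measure percentiles for one (vowel, group) cell; all None if the cell is missing."""
    cell = norms["table"].get(f"{vowel}|{group}")
    if cell is None:
        return {m: None for m in MEASURES}
    return {m: percentile(vals[m], cell.get(m, [])) for m in MEASURES}
```

Dumping a handful of `(value, vowel, group) -> overlay_for(...)` results to JSON gives the TS test concrete numbers to assert against.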
Task 3: Parselmouth extraction (acoustic.py)
Files: Modify packages/audio/pyproject.toml; Create packages/audio/src/phonolex_audio/acoustic.py; Create packages/audio/tests/test_acoustic.py
- [ ] Step 1: Add the dep. In `packages/audio/pyproject.toml` add `"praat-parselmouth"` to dependencies. Run `uv sync --package phonolex-audio` (or the workspace equivalent); confirm `uv run python -c "import parselmouth; print(parselmouth.__version__)"`.
- [ ] Step 2: Implement extraction (`acoustic.py`): audio bytes → features. Steady state = median over the central 40% of the voiced region.

  ```python
  """Parselmouth acoustic extraction for Model #4. F1-F3 track + steady-state, F0, duration."""
  from __future__ import annotations

  import math
  import tempfile

  import numpy as np
  import parselmouth
  from parselmouth.praat import call

  # formant ceiling by group (Praat convention): men 5000, women/children 5500.
  CEILING = {"men": 5000.0, "women": 5500.0, "boys": 5500.0, "girls": 5500.0}


  def _load_sound(audio_bytes: bytes) -> parselmouth.Sound:
      # parselmouth.Sound is constructed from a file path (or raw samples), not a
      # file-like object, so spool the wav bytes to a temp file first.
      with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
          tmp.write(audio_bytes)
          tmp.flush()
          return parselmouth.Sound(tmp.name)


  def _json_safe(track):
      # NaN is not valid JSON; emit null for undefined (e.g. unvoiced) frames.
      return [None if math.isnan(x) else float(x) for x in track]


  def extract(audio_bytes: bytes, group: str = "women") -> dict:
      snd = _load_sound(audio_bytes)
      dur_ms = round(snd.get_total_duration() * 1000)
      ceiling = CEILING.get(group, 5500.0)
      formant = snd.to_formant_burg(max_number_of_formants=5, maximum_formant=ceiling)
      pitch = snd.to_pitch()
      ts = formant.ts()  # frame times
      f1 = [call(formant, "Get value at time", 1, t, "Hertz", "Linear") for t in ts]
      f2 = [call(formant, "Get value at time", 2, t, "Hertz", "Linear") for t in ts]
      f3 = [call(formant, "Get value at time", 3, t, "Hertz", "Linear") for t in ts]
      f0_track = [pitch.get_value_at_time(t) or float("nan") for t in ts]

      def steady(track):
          a = np.array(track, float)
          a = a[~np.isnan(a)]
          if a.size == 0:
              return None
          lo, hi = int(a.size * 0.3), int(a.size * 0.7) or a.size
          return float(np.median(a[lo:hi] if hi > lo else a))

      return {
          "formants": {"f1": steady(f1), "f2": steady(f2), "f3": steady(f3),
                       "track": {"t": [float(t) for t in ts], "f1": _json_safe(f1),
                                 "f2": _json_safe(f2), "f3": _json_safe(f3)}},
          "f0": {"value": steady(f0_track), "track": _json_safe(f0_track)},
          "duration_ms": dur_ms,
          "group": group,
      }
  ```
- [ ] Step 3: Test (`test_acoustic.py`) against a real Hillenbrand stimulus wav (the dataset has wavs, or synthesize a steady tone): assert `duration_ms > 0`, `formants.f1` is a plausible Hz value (200–1200), `f0.value` plausible (80–400). Use a known Hillenbrand /ɑ/ clip and assert F1 in a sane window. Run `uv run python -m pytest packages/audio/tests/test_acoustic.py -v`.
- [ ] Step 4: Commit

  ```bash
  git add packages/audio/pyproject.toml packages/audio/src/phonolex_audio/acoustic.py packages/audio/tests/test_acoustic.py
  git commit -m "feat(phon-130): Parselmouth F1-F3/F0/duration extraction"
  ```
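Where the dataset wavs are not available (for example in CI), Step 3's "synthesize a steady tone" fallback can be done with the standard library alone. This is a rough sketch; `synth_tone_wav` and its defaults are hypothetical. A pure sine has no vowel-like formant structure, so it is only suited to the `duration_ms` and `f0.value` assertions, not the F1 window.

```python
import io
import math
import struct
import wave


def synth_tone_wav(freq_hz: float = 120.0, dur_s: float = 0.5,
                   sr: int = 16000, amp: float = 0.5) -> bytes:
    """Return a mono 16-bit PCM WAV (as bytes) containing a steady sine tone."""
    n = int(dur_s * sr)
    frames = b"".join(
        struct.pack("<h", int(amp * 32767 * math.sin(2 * math.pi * freq_hz * i / sr)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sr)
        w.writeframes(frames)
    return buf.getvalue()
```

A test could then pass `synth_tone_wav(120.0)` to `extract(...)` and check that the duration is about 500 ms and that the F0 estimate falls in the 80–400 Hz window.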
Task 4: /acoustic server endpoint
Files: Modify packages/audio/src/phonolex_audio/server.py; Modify packages/audio/tests/test_server.py
- [ ] Step 1: Implement. Add `POST /acoustic` to `build_app` (multipart `audio` + optional `group` form field; default `women`), calling `acoustic.extract(bytes, group)`. Reuse the existing multipart validation. `/health` may add `acoustic: true`.
- [ ] Step 2: Test (`test_server.py`, stub `acoustic.extract` via monkeypatch, so no real Praat in CI): `/acoustic` returns the feature JSON; missing audio → 400. Run `uv run python -m pytest packages/audio/tests/test_server.py -v`.
- [ ] Step 3: Commit

  ```bash
  git add packages/audio/src/phonolex_audio/server.py packages/audio/tests/test_server.py
  git commit -m "feat(phon-130): phonolex_audio /acoustic endpoint"
  ```
Task 5: /api/audio/acoustic Worker route + overlay
Files: Modify packages/web/workers/src/routes/audio.ts; Modify packages/web/workers/src/__tests__/audio.test.ts
- [ ] Step 1: Implement. Add `POST /api/audio/acoustic`: multipart (`audio`, `target_vowel`, `group`), proxy to the host `/acoustic` (reuse the `AUDIO_INFERENCE_URL` + warming pattern from `fetchTranscript`), then:

  ```ts
  import hillenbrandNorms from '../config/hillenbrandNorms.json';
  import { overlayFor, type HillenbrandNorms } from '../lib/acousticOverlay';

  // after extraction `ex`:
  const steady = { f1: ex.formants.f1, f2: ex.formants.f2, f3: ex.formants.f3, f0: ex.f0.value };
  const percentiles = overlayFor(steady, target_vowel, group, hillenbrandNorms as HillenbrandNorms);
  return c.json({ ...ex, target_vowel, group, percentiles });
  ```

  Handle null steady values (unvoiced/failed extraction) → percentiles null, 200 (descriptive, no throw).
- [ ] Step 2: Test (mirror the pronounce tests; fetchMock the host `/acoustic`): validation 400s; a mocked extraction is returned and the response carries `percentiles`. Run `npx vitest run` + `npx tsc --noEmit`.
- [ ] Step 3: Commit

  ```bash
  git add packages/web/workers/src/routes/audio.ts packages/web/workers/src/__tests__/audio.test.ts
  git commit -m "feat(phon-130): /api/audio/acoustic proxy + percentile overlay"
  ```
Task 6: analyzeAcoustic() frontend service
Files: Create packages/web/frontend/src/services/acousticApi.ts; Create packages/web/frontend/src/services/acousticApi.test.ts
- [ ] Step 1: Failing test. Mirror `audioApi.pronounce.test.ts`: `analyzeAcoustic(blob, 'i', 'men')` posts multipart, returns the result; 503 → `TranscriberWarmingError`.
- [ ] Step 2: Implement `acousticApi.ts` (mirror `pronounceAudio`): multipart POST `/api/audio/acoustic` with `audio`/`target_vowel`/`group`; `AcousticResult` type `{formants, f0, duration_ms, percentiles, target_vowel, group}`; reuse `TranscriberWarmingError`/`freshRequestId`/`baseUrl`.
- [ ] Step 3: Run test + tsc. Commit

  ```bash
  git add packages/web/frontend/src/services/acousticApi.ts packages/web/frontend/src/services/acousticApi.test.ts
  git commit -m "feat(phon-130): analyzeAcoustic frontend service"
  ```
Task 7: AcousticViewer dev page
Files: Read PronunciationViewer.tsx first; Create packages/web/frontend/src/components/tools/AcousticViewer.tsx (+ test); Modify main.tsx
- [ ] Step 1: Read `PronunciationViewer.tsx` and reuse its capture scaffold (record/upload/preloaded, warming-state discriminated union, the clip-injection test mechanism).
- [ ] Step 2: Failing component test. Render `AcousticViewer`, set target vowel + group, mock `analyzeAcoustic` to resolve a result with `percentiles.f1=18`, supply a clip via the file-upload mechanism, click Analyze, assert the F1 value + its percentile render. (Mirror `PronunciationViewer.test.tsx`'s `selectFile` approach.)
- [ ] Step 3: Implement `AcousticViewer.tsx`: capture controls (mirror) + target-vowel `<Select>` (12 vowels) + group `<Select>` (men/women/boys/girls) → Analyze → display F1–F3/F0/duration each with its percentile (a band: green if 10–90th pct, amber/red outside; null → "no norm"). Register `<Route path="/dev/acoustic" element={<AcousticViewer />} />` in `main.tsx`.
- [ ] Step 4: Frontend matrix: `npx vitest run && npx tsc --noEmit && npm run build`.
- [ ] Step 5: Commit

  ```bash
  git add packages/web/frontend/src/components/tools/AcousticViewer.tsx packages/web/frontend/src/components/tools/AcousticViewer.test.tsx packages/web/frontend/src/main.tsx
  git commit -m "feat(phon-130): /dev/acoustic AcousticViewer page"
  ```
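The band rule in Step 3 is small enough to pin down before writing the TSX. Below is a language-neutral reference sketched in Python; the name `band_for` is hypothetical, the inclusive 10 and 90 bounds are an assumption, and because the plan does not say where amber ends and red begins, both are collapsed into a single "outside" label here.

```python
def band_for(pct):
    """Map a percentile (or None) to a display band.

    Assumptions: bounds are inclusive, and the amber/red split is left
    unspecified by the plan, so both are reported as "outside".
    """
    if pct is None:
        return "no norm"
    return "green" if 10 <= pct <= 90 else "outside"
```

With the mocked `percentiles.f1=18` from Step 2, this rule yields the green band.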
Task 8: Praat-parity validation + RESULTS
Files: Create research/2026-06-05-phon-130-acoustic/{validate_parity.py,RESULTS.md}
- [ ] Step 1: Parity script. `validate_parity.py`: on a few Hillenbrand stimulus wavs, compare `acoustic.extract`'s steady-state F1–F3 against Praat-direct (parselmouth `call` on the same settings, OR the published `vowdata.dat` measured values for that exact stimulus); assert F1–F3 within ±10 Hz, F0 within ±2 Hz (the umbrella §6 gate). Report pass/fail per clip.
- [ ] Step 2: Run it (needs the local stimulus wavs + Parselmouth). Confirm the parity gate.
- [ ] Step 3: Write `RESULTS.md`: parity table + 2–3 example extractions with their Hillenbrand percentiles (a known /i/ from a man should land near its own group's 50th pct).
- [ ] Step 4: Commit

  ```bash
  git add research/2026-06-05-phon-130-acoustic/{validate_parity.py,RESULTS.md}
  git commit -m "research(phon-130): Praat-parity validation + RESULTS"
  ```
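The tolerance gate from Step 1 can be kept separate from the audio handling so it is testable without Praat. The sketch below is illustrative only: `parity_report` and `clip_passes` are hypothetical names, the inputs are flat `{f1, f2, f3, f0}` dicts of steady-state values in Hz, and treating the bounds as inclusive and a missing value as a failure are both assumptions.

```python
# Tolerances from the plan: F1-F3 within ±10 Hz, F0 within ±2 Hz.
TOL_HZ = {"f1": 10.0, "f2": 10.0, "f3": 10.0, "f0": 2.0}


def parity_report(extracted: dict, reference: dict) -> dict:
    """Per-measure delta and pass flag for one clip (missing values count as failures)."""
    rows = {}
    for measure, tol in TOL_HZ.items():
        got, want = extracted.get(measure), reference.get(measure)
        if got is None or want is None:
            rows[measure] = {"delta": None, "ok": False}
        else:
            delta = got - want
            rows[measure] = {"delta": delta, "ok": abs(delta) <= tol}
    return rows


def clip_passes(report: dict) -> bool:
    """A clip passes the gate only if every measure is within tolerance."""
    return all(row["ok"] for row in report.values())
```

The per-measure rows map directly onto the parity table in `RESULTS.md`, with `clip_passes` giving the pass/fail column.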
Task 9: Full matrix
- [ ] `cd packages/web/workers && npx vitest run && npx tsc --noEmit`; `cd packages/web/frontend && npx vitest run && npx tsc --noEmit && npm run build`; `uv run python -m pytest packages/audio/tests/`.
- [ ] Manual (optional): start `phonolex_audio` (now serving `/acoustic`) + worker + frontend; on `/dev/acoustic` upload a vowel, pick the vowel + group, confirm F1–F3/F0 + percentile bands render and a Hillenbrand-matched production lands near the 50th pct.
Self-Review
Spec coverage: §3.1 extraction → Task 3; §3.2 norms → Task 1; §3.3 Worker proxy + overlay → Tasks 2,5; §3.4 dev page → Tasks 6,7; §4 validation → Task 8; §2 scope (vowel core, target-vowel+group) → Tasks 5,7; out-of-scope (VOT/COG, judgment) honored. ✓
Placeholder scan: Task 1's parse + Task 3's stimulus assertion adapt to the real vowdata.dat layout / Parselmouth output (inspect-first noted) — inherent to data-dependent research code; the overlay (Task 2) and route/service/viewer (Tasks 5–7) are complete code or pinned tests.
Type/name consistency: HillenbrandNorms/Overlay/percentile/overlayFor consistent Tasks 1–5; AcousticResult {formants:{f1,f2,f3,track}, f0:{value,track}, duration_ms, percentiles, target_vowel, group} consistent across server/route/service/viewer; group strings men/women/boys/girls, the 12 vowel IPA symbols, consistent across the norm table, route, and selectors.