PHON-130 - Model #4: Acoustic Analysis (/dev/acoustic dev page)
Status: design — pending plan
Ticket: PHON-130
Parent: PHON-44 Audio
Umbrella spec: docs/superpowers/specs/2026-05-30-v6-audio-support-design.md §2.4
Date: 2026-06-05
1. Goal & framing
Deliverable = a dev page (/dev/acoustic), an internal validation surface for Model #4 — NOT the
user-facing product. Synthesis of the v6 audio dev pages into a shipped tool is a separate, later,
user-driven step. This page lets us validate that Parselmouth-driven acoustic extraction + the
Hillenbrand percentile overlay work, the same way /dev/pronounce validated Model #2.
Model #4 extracts acoustic measurements from a vowel production and wraps them in percentile-anchored norms — the differentiator vs. raw Praat. v1 is the vowel-normed core: the measurements that have real norms (formants + F0), so the overlay is front and center.
2. Scope (v1)
Extract: formant track F1–F3 (+ steady-state at the vowel midpoint), F0 track + steady-state,
and duration. Overlay: Hillenbrand (1995) percentile bands for a user-selected target vowel ×
speaker group (men / women / boys / girls). Input: an isolated-vowel or hVd-word production
(matching Hillenbrand's elicitation protocol → a clean steady state).
Out of scope (v1): VOT, COG, spectral moments (no Hillenbrand norms → descriptive-only — deferred to v1.x); any judgment/clinical-opinion layer (Model #4 is descriptive only); connected-speech vowel location; the production "Praat-on-Cloudflare" deployment question (§3.2 of the umbrella spec) — this runs Parselmouth locally, dev-only.
3. Architecture
Mirrors the transcribe/pronounce flow (extraction host → Worker proxy → dev page):
audio (vowel production) + target_vowel + group
→ [phonolex_audio /acoustic] Parselmouth: F1-F3 track + steady-state, F0, duration
→ [Worker /api/audio/acoustic] overlay Hillenbrand percentile for (vowel, group)
→ [/dev/acoustic AcousticViewer] steady-state values + percentile bands + raw tracks
3.1 Extraction service (packages/audio)
- New packages/audio/src/phonolex_audio/acoustic.py: Parselmouth extraction. One responsibility: audio bytes → {formants:{f1,f2,f3,track[]}, f0:{value,track[]}, duration_ms}. Steady state = median over the central ~40% of the voiced region (robust to onset/offset transitions). Standard Praat settings (formant ceiling 5500 Hz women/children, 5000 Hz men, selectable by group; pitch floor/ceiling by group).
- server.py gains POST /acoustic (multipart audio + optional group hint) → the extraction JSON. Reuses the existing multipart/validation pattern.
- New dependency: praat-parselmouth (bundles Praat; pip-installable) added to packages/audio/pyproject.toml.
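The steady-state rule above can be sketched in plain Python. This is a minimal illustration, not the acoustic.py implementation: the function name steady_state, the use of None for unvoiced frames, and the definition of the voiced region as the span from the first to the last defined frame are all assumptions of the sketch. In the real module the per-frame track would come from Parselmouth. Only the formant ceilings stated in this spec are included; the pitch floor/ceiling values are not given here, so they are left out.

```python
from statistics import median
from typing import Optional, Sequence

# Formant ceilings as stated in this spec (Hz). Pitch floor/ceiling per group
# are intentionally omitted: the spec does not give their values.
FORMANT_CEILING_HZ = {"men": 5000.0, "women": 5500.0, "boys": 5500.0, "girls": 5500.0}


def steady_state(track: Sequence[Optional[float]],
                 central_fraction: float = 0.4) -> Optional[float]:
    """Median over the central `central_fraction` of the voiced region.

    `track` holds one value per analysis frame; None marks an unvoiced or
    undefined frame. ASSUMPTION: the voiced region is the span from the
    first to the last defined frame.
    """
    voiced = [i for i, v in enumerate(track) if v is not None]
    if not voiced:
        return None
    first, last = voiced[0], voiced[-1]
    span = last - first + 1
    # Number of frames to keep, centered inside the voiced span.
    keep = max(1, round(span * central_fraction))
    start = first + (span - keep) // 2
    window = [v for v in track[start:start + keep] if v is not None]
    return median(window) if window else None
```

Taking the median rather than the mean keeps a single mistracked frame inside the window from shifting the reported value.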
3.2 Hillenbrand norm tables
- A build script (research/2026-06-05-phon-130-acoustic/build_hillenbrand_norms.py) parses /Volumes/ExternalData2/audio-datasets/hillenbrand_et_al_1995/h95-alldata/vowdata.dat → per (vowel, group) sorted distributions of F1/F2/F3/F0 → hillenbrandNorms.json, bundled into the Worker (packages/web/workers/src/config/).
- Hillenbrand is Tier A (ships freely).
- The 12 Hillenbrand vowels (ɑ æ ʌ ɔ ɛ ɝ eɪ ɪ i oʊ ʊ u) are mapped to PhonoLex broad IPA.
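A rough sketch of the grouping step in the build script follows. It is illustrative only: the function name build_norms, the vowel → group → measure nesting of the output, and the token layout of vowdata.dat (an identifier made of group letter, two-digit speaker number, and two-letter vowel code, followed by duration, F0, F1, F2, F3, with 0 marking an unmeasurable value) are assumptions that should be checked against the file's own header before relying on them.

```python
from typing import Dict, Iterable, List

# ASSUMED vowdata.dat conventions (verify against the file header):
#   tokens: <id> <duration_ms> <f0> <F1> <F2> <F3> ...
#   <id>:   group letter (m/w/b/g) + 2-digit speaker + 2-letter vowel code
#   0 marks a value that could not be measured.
GROUPS = {"m": "men", "w": "women", "b": "boys", "g": "girls"}
VOWELS = {"ae": "æ", "ah": "ɑ", "aw": "ɔ", "eh": "ɛ", "er": "ɝ", "ei": "eɪ",
          "ih": "ɪ", "iy": "i", "oa": "oʊ", "oo": "ʊ", "uh": "ʌ", "uw": "u"}
MEASURES = ("f0", "f1", "f2", "f3")

Norms = Dict[str, Dict[str, Dict[str, List[float]]]]


def build_norms(lines: Iterable[str]) -> Norms:
    """Group measurements into sorted per-(vowel, group) distributions."""
    norms: Norms = {}
    for line in lines:
        tokens = line.split()
        if len(tokens) < 6:
            continue
        ident = tokens[0]
        if (len(ident) != 5 or ident[0] not in GROUPS
                or not ident[1:3].isdigit() or ident[3:] not in VOWELS):
            continue  # header text or an unrelated line
        try:
            values = [float(t) for t in tokens[2:6]]  # f0, F1, F2, F3
        except ValueError:
            continue
        vowel, group = VOWELS[ident[3:]], GROUPS[ident[0]]
        cell = norms.setdefault(vowel, {}).setdefault(
            group, {name: [] for name in MEASURES})
        for name, value in zip(MEASURES, values):
            if value > 0:  # skip unmeasurable entries
                cell[name].append(value)
    # Sort once at build time so lookups can use binary search.
    for groups in norms.values():
        for cell in groups.values():
            for samples in cell.values():
                samples.sort()
    return norms
```

The result is a plain nested dict, so json.dump(build_norms(f), out, ensure_ascii=False) would be one way to write it out with the IPA keys intact.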
3.3 Worker
POST /api/audio/acoustic (packages/web/workers/src/routes/audio.ts or a sibling) proxies the extraction host (the AUDIO_INFERENCE_URL pattern), then computes the percentile of each steady-state value against hillenbrandNorms.json for (target_vowel, group) using the existing cumulative formula (bisect_right / N * 100). Overlay logic in-Worker TS, in a focused acousticOverlay.ts, pinned to the Python percentile via a frozen fixture (the PHON-126/142 pattern). Response: {formants, f0, duration_ms, percentiles:{f1,f2,f3,f0}, target_vowel, group}.
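For reference, the cumulative formula reads as follows on the Python side, which is the side the TS overlay would be pinned to. This is a sketch under assumptions: the function names percentile and overlay, and the shape of the per-(vowel, group) cell as a dict of sorted lists, are illustrative rather than taken from the existing code.

```python
from bisect import bisect_right
from typing import Dict, Optional, Sequence


def percentile(sorted_samples: Sequence[float], value: float) -> float:
    """Cumulative percentile: share of samples <= value, on a 0-100 scale."""
    if not sorted_samples:
        raise ValueError("empty norm distribution")
    return bisect_right(sorted_samples, value) / len(sorted_samples) * 100.0


def overlay(steady: Dict[str, Optional[float]],
            cell: Dict[str, Sequence[float]]) -> Dict[str, Optional[float]]:
    """Percentile per measure; None when the value or its norms are missing."""
    return {
        name: percentile(cell[name], value)
        if value is not None and cell.get(name) else None
        for name, value in steady.items()
    }
```

Because bisect_right counts ties as "at or below", a value equal to the largest sample maps to 100 and a value below every sample maps to 0. The TS port needs the same tie handling for the frozen fixture to match.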
3.4 Dev page
- packages/web/frontend/src/components/tools/AcousticViewer.tsx at /dev/acoustic (registered in main.tsx beside the other dev routes). Mirrors PronunciationViewer: record/upload/preloaded clip + target-vowel selector (12 vowels) + speaker-group selector → display steady-state F1–F3/F0/duration with percentile bands (a value's percentile rendered as a position within the group's typical range; outliers flagged) + the raw track values.
- acousticApi.ts service (multipart, mirrors pronounceAudio).
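The band display reduces to mapping a percentile to a position and an outlier flag. The sketch below shows that mapping only. The spec does not define what counts as an outlier, so the 5th/95th cutoffs and the function name band_position are placeholders chosen for illustration.

```python
from typing import Tuple


def band_position(pct: float, low: float = 5.0, high: float = 95.0) -> Tuple[float, bool]:
    """Map a 0-100 percentile to (position in 0..1 along the band, outlier flag).

    ASSUMPTION: `low` and `high` are placeholder cutoffs; the spec leaves the
    outlier threshold open.
    """
    position = min(max(pct / 100.0, 0.0), 1.0)
    return position, (pct < low or pct > high)
```

Keeping the cutoffs as parameters lets the viewer tune them during validation without touching the overlay computation.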
4. Validation
- Praat parity (the umbrella §6 gate): extracted F1–F3 within ±10 Hz and F0 within ±2 Hz of Praat-direct extraction on a reference clip (a Hillenbrand stimulus with a known measurement).
- Percentile-overlay correctness: the Worker percentile matches the Python hillenbrandNorms table (frozen fixture, tolerance 1e-6) and is consistent with Hillenbrand's published group means (a known vowel's median lands at ~50th pct for its own group).
- A small research/2026-06-05-phon-130-acoustic/RESULTS.md recording the parity + a few example extractions.
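The parity gate can be expressed as a small check that reports which measures fall outside tolerance. This is a sketch only: the name parity_failures and the flat {f1, f2, f3, f0} dict shape are assumptions, and the tolerances are the ±10 Hz / ±2 Hz figures from the list above.

```python
from typing import Dict

# Tolerances from the Praat-parity gate: ±10 Hz for formants, ±2 Hz for F0.
TOLERANCE_HZ = {"f1": 10.0, "f2": 10.0, "f3": 10.0, "f0": 2.0}


def parity_failures(extracted: Dict[str, float],
                    reference: Dict[str, float]) -> Dict[str, float]:
    """Return {measure: signed error in Hz} for each measure outside tolerance."""
    failures: Dict[str, float] = {}
    for name, tolerance in TOLERANCE_HZ.items():
        error = extracted[name] - reference[name]
        if abs(error) > tolerance:
            failures[name] = error
    return failures
```

An empty result means the clip passes. Returning the signed errors rather than a boolean gives RESULTS.md something concrete to record when a clip fails.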
5. Done when
- phonolex_audio /acoustic (Parselmouth) live locally; praat-parselmouth added.
- hillenbrandNorms.json built (Tier A) + bundled; acousticOverlay.ts pinned to the Python percentile.
- POST /api/audio/acoustic returns features + percentiles for (target_vowel, group).
- /dev/acoustic page mounted, mirroring the other dev viewers; record/upload/preloaded + vowel/group selectors + percentile-band display.
- Praat-parity validation passes (±10 Hz / ±2 Hz); percentile correctness confirmed.
6. References
- Umbrella docs/superpowers/specs/2026-05-30-v6-audio-support-design.md §2.4, §6.
- Hillenbrand et al. (1995) vowdata.dat (reservoir; Tier A): the norm source.
- packages/audio/src/phonolex_audio/{server.py} (host pattern), routes/audio.ts (proxy pattern), PronunciationViewer.tsx (dev-page pattern), the L1-prior/cos_dist fixture-pin pattern.
- [[project_audio_targeted_models]] (the five-model map; Model #4 unblocks Model #5).