
PHON-130 — Model #4: Acoustic Analysis (/dev/acoustic dev page)

Status: design (pending plan)
Ticket: PHON-130
Parent: PHON-44 Audio
Umbrella spec: docs/superpowers/specs/2026-05-30-v6-audio-support-design.md §2.4
Date: 2026-06-05

1. Goal & framing

Deliverable = a dev page (/dev/acoustic), an internal validation surface for Model #4 — NOT the user-facing product. Synthesis of the v6 audio dev pages into a shipped tool is a separate, later, user-driven step. This page lets us validate that Parselmouth-driven acoustic extraction + the Hillenbrand percentile overlay work, the same way /dev/pronounce validated Model #2.

Model #4 extracts acoustic measurements from a vowel production and wraps them in percentile-anchored norms — the differentiator vs. raw Praat. v1 is the vowel-normed core: the measurements that have real norms (formants + F0), so the overlay is front and center.

2. Scope (v1)

  • Extract: formant track F1–F3 (+ steady-state at the vowel midpoint), F0 track + steady-state, and duration.
  • Overlay: Hillenbrand (1995) percentile bands for a user-selected target vowel × speaker group (men / women / boys / girls).
  • Input: an isolated-vowel or hVd-word production (matching Hillenbrand's elicitation protocol → a clean steady state).

Out of scope (v1): VOT, COG, spectral moments (no Hillenbrand norms → descriptive-only — deferred to v1.x); any judgment/clinical-opinion layer (Model #4 is descriptive only); connected-speech vowel location; the production "Praat-on-Cloudflare" deployment question (§3.2 of the umbrella spec) — this runs Parselmouth locally, dev-only.

3. Architecture

Mirrors the transcribe/pronounce flow (extraction host → Worker proxy → dev page):

audio (vowel production) + target_vowel + group
  → [phonolex_audio /acoustic]  Parselmouth: F1-F3 track + steady-state, F0, duration
  → [Worker /api/audio/acoustic]  overlay Hillenbrand percentile for (vowel, group)
  → [/dev/acoustic AcousticViewer]  steady-state values + percentile bands + raw tracks

3.1 Extraction service (packages/audio)

  • New packages/audio/src/phonolex_audio/acoustic.py — Parselmouth extraction. One responsibility: audio bytes → {formants:{f1,f2,f3,track[]}, f0:{value,track[]}, duration_ms}. Steady state = median over the central ~40% of the voiced region (robust to onset/offset transitions). Standard Praat settings (formant ceiling 5500 Hz women/children, 5000 Hz men — selectable by group; pitch floor/ceiling by group).
  • server.py gains POST /acoustic (multipart audio + optional group hint) → the extraction JSON. Reuses the existing multipart/validation pattern. New dependency: praat-parselmouth (bundles Praat; pip-installable) added to packages/audio/pyproject.toml.
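The steady-state rule above (median over the central ~40% of the voiced region) can be sketched in plain Python. This is a minimal illustration, not the acoustic.py implementation: the function name `steady_state`, the convention that unvoiced or undefined frames are `None`, and the rounding of the window length are all assumptions made for the example.

```python
from statistics import median
from typing import Optional, Sequence


def steady_state(
    track: Sequence[Optional[float]],
    central_fraction: float = 0.4,
) -> Optional[float]:
    """Median over the central part of the voiced region of one track.

    `track` holds one value per analysis frame (e.g. F1 in Hz).
    None marks an unvoiced or undefined frame (assumed convention).
    """
    voiced = [i for i, v in enumerate(track) if v is not None]
    if not voiced:
        return None

    # Voiced region spans the first through the last defined frame.
    start, end = voiced[0], voiced[-1] + 1
    length = end - start

    # Keep roughly `central_fraction` of the region, centred on its midpoint.
    window_len = max(1, round(length * central_fraction))
    margin = (length - window_len) // 2
    window = [
        v for v in track[start + margin : start + margin + window_len]
        if v is not None
    ]
    return median(window) if window else None


# Onset and offset transitions fall outside the central window.
f1_track = [None, 300, 400, 500, 510, 505, 495, 500, 600, 700, 800, None]
print(steady_state(f1_track))
```

Using the median rather than the mean keeps a single mistracked frame inside the window from shifting the reported value.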

3.2 Hillenbrand norm tables

  • A build script (research/2026-06-05-phon-130-acoustic/build_hillenbrand_norms.py) parses /Volumes/ExternalData2/audio-datasets/hillenbrand_et_al_1995/h95-alldata/vowdata.dat → per (vowel, group) sorted distributions of F1/F2/F3/F0 → hillenbrandNorms.json, bundled into the Worker (packages/web/workers/src/config/). Hillenbrand is Tier A (ships freely). The 12 Hillenbrand vowels (ɑ æ ʌ ɔ ɛ ɝ eɪ ɪ i oʊ ʊ u) are mapped to PhonoLex broad IPA.
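The grouping step of the build script might look like the sketch below. It starts from already-parsed records and deliberately leaves out parsing vowdata.dat itself. The `"vowel|group"` key layout, the names `build_norms` and `dump_norms`, and the rule that `None` or non-positive values count as missing measurements are assumptions for illustration, not the confirmed hillenbrandNorms.json schema.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

MEASURES = ("f1", "f2", "f3", "f0")

# (vowel in broad IPA, speaker group, measurements in Hz)
Record = Tuple[str, str, Dict[str, Optional[float]]]


def build_norms(records: Iterable[Record]) -> Dict[str, Dict[str, List[float]]]:
    """Group records by (vowel, group) and sort each measure's values."""
    cells: Dict[str, Dict[str, List[float]]] = defaultdict(
        lambda: {m: [] for m in MEASURES}
    )
    for vowel, group, values in records:
        cell = cells[f"{vowel}|{group}"]  # hypothetical key layout
        for measure in MEASURES:
            value = values.get(measure)
            # Assumption: None or non-positive means "not measured".
            if value is not None and value > 0:
                cell[measure].append(float(value))
    return {
        key: {m: sorted(vs) for m, vs in cell.items()}
        for key, cell in cells.items()
    }


def dump_norms(norms: Dict[str, Dict[str, List[float]]]) -> str:
    """Serialize for bundling; ensure_ascii=False keeps IPA keys readable."""
    return json.dumps(norms, ensure_ascii=False, sort_keys=True)
```

Storing each distribution pre-sorted means the overlay can binary-search it directly at request time.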

3.3 Worker

  • POST /api/audio/acoustic (packages/web/workers/src/routes/audio.ts or a sibling) proxies the extraction host (the AUDIO_INFERENCE_URL pattern), then computes the percentile of each steady-state value against hillenbrandNorms.json for (target_vowel, group) using the existing cumulative formula (bisect_right / N * 100). Overlay logic in-Worker TS, in a focused acousticOverlay.ts, pinned to the Python percentile via a frozen fixture (the PHON-126/142 pattern). Response: {formants, f0, duration_ms, percentiles:{f1,f2,f3,f0}, target_vowel, group}.
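The cumulative formula named above (bisect_right / N * 100) is short enough to write out on the Python reference side. In this sketch the names `percentile` and `overlay` and the shape of the per-cell dictionary are illustrative assumptions; only the formula itself comes from the spec.

```python
from bisect import bisect_right
from typing import Dict, Sequence

MEASURES = ("f1", "f2", "f3", "f0")


def percentile(value: float, sorted_values: Sequence[float]) -> float:
    """Share of the norm distribution at or below `value`, as 0-100."""
    if not sorted_values:
        raise ValueError("empty norm distribution")
    return bisect_right(sorted_values, value) / len(sorted_values) * 100


def overlay(
    steady: Dict[str, float],
    cell: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Percentile of each steady-state value within one (vowel, group) cell."""
    return {m: percentile(steady[m], cell[m]) for m in MEASURES}


# Made-up numbers, purely to show the call shape.
cell = {
    "f1": [300, 340, 380, 420],
    "f2": [2000, 2200, 2400, 2600],
    "f3": [2800, 3000, 3200, 3400],
    "f0": [110, 130, 150, 170],
}
print(overlay({"f1": 340, "f2": 1900, "f3": 3500, "f0": 140}, cell))
```

Because bisect_right counts values equal to the input, a value tied with a norm entry is ranked at or above it. The TypeScript port needs the same tie handling for the frozen fixture to match.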

3.4 Dev page

  • packages/web/frontend/src/components/tools/AcousticViewer.tsx at /dev/acoustic (registered in main.tsx beside the other dev routes). Mirrors PronunciationViewer: record/upload/preloaded clip + target-vowel selector (12 vowels) + speaker-group selector → display steady-state F1–F3/F0/duration with percentile bands (a value's percentile rendered as a position within the group's typical range; outliers flagged) + the raw track values. acousticApi.ts service (multipart, mirrors pronounceAudio).

4. Validation

  • Praat parity (the umbrella §6 gate): extracted F1–F3 within ±10 Hz and F0 within ±2 Hz of Praat-direct extraction on a reference clip (a Hillenbrand stimulus with a known measurement).
  • Percentile-overlay correctness: the Worker percentile matches the Python percentile computed from hillenbrandNorms.json (frozen fixture, tolerance 1e-6) and is consistent with Hillenbrand's published group means (a known vowel's median lands at roughly the 50th percentile for its own group).
  • A small research/2026-06-05-phon-130-acoustic/RESULTS.md recording the parity + a few example extractions.
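The Praat-parity gate reduces to a per-measure tolerance check. The sketch below assumes both extractions arrive as flat dictionaries of steady-state values in Hz. The function name `parity_failures` and the sample numbers are hypothetical, and only the ±10 Hz and ±2 Hz tolerances come from the list above.

```python
from typing import Dict, List

FORMANT_TOL_HZ = 10.0  # F1-F3 tolerance from the parity gate
F0_TOL_HZ = 2.0        # F0 tolerance from the parity gate


def parity_failures(
    extracted: Dict[str, float],
    reference: Dict[str, float],
) -> List[str]:
    """Return the measures whose difference from Praat-direct exceeds tolerance."""
    failures = []
    for key in ("f1", "f2", "f3", "f0"):
        tolerance = F0_TOL_HZ if key == "f0" else FORMANT_TOL_HZ
        if abs(extracted[key] - reference[key]) > tolerance:
            failures.append(key)
    return failures


# Hypothetical values: F2 is 12 Hz off, so only F2 fails.
extracted = {"f1": 342.0, "f2": 2322.0, "f3": 3000.0, "f0": 138.0}
reference = {"f1": 350.0, "f2": 2310.0, "f3": 3005.0, "f0": 139.5}
print(parity_failures(extracted, reference))
```

Returning the failing measures, rather than a single boolean, gives RESULTS.md something concrete to record when a clip misses the gate.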

5. Done when

  • phonolex_audio /acoustic (Parselmouth) live locally; praat-parselmouth added.
  • hillenbrandNorms.json built (Tier A) + bundled; acousticOverlay.ts pinned to the Python percentile.
  • POST /api/audio/acoustic returns features + percentiles for (target_vowel, group).
  • /dev/acoustic page mounted, mirroring the other dev viewers; record/upload/preloaded + vowel/group selectors + percentile-band display.
  • Praat-parity validation passes (±10 Hz / ±2 Hz); percentile correctness confirmed.

6. References

  • Umbrella docs/superpowers/specs/2026-05-30-v6-audio-support-design.md §2.4, §6.
  • Hillenbrand et al. (1995) vowdata.dat (reservoir; Tier A) — the norm source.
  • packages/audio/src/phonolex_audio/{server.py} (host pattern), routes/audio.ts (proxy pattern), PronunciationViewer.tsx (dev-page pattern), the L1-prior/cos_dist fixture-pin pattern.
  • [[project_audio_targeted_models]] (the five-model map; Model #4 unblocks Model #5).