PHON-130 — Acoustic Analysis /dev/acoustic — Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development. Steps use checkbox (- [ ]) syntax.

Goal: A /dev/acoustic dev page (Model #4) that extracts F1–F3 + F0 + duration from a vowel production via Parselmouth and overlays Hillenbrand percentile bands by target-vowel × speaker-group. Dev/validation surface, NOT the product.

Architecture: phonolex_audio gains a Parselmouth /acoustic endpoint (local Python); a build step turns Hillenbrand vowdata.dat into a bundled hillenbrandNorms.json; the Worker proxies extraction + computes the percentile in-TS (pinned to Python); an AcousticViewer dev page mirrors PronunciationViewer.

Tech Stack: Python (Parselmouth/Praat, the new dep), TypeScript (Workers overlay + route, React dev page), Vitest, pytest.

Spec: docs/superpowers/specs/2026-06-05-phon-130-acoustic-analysis-dev-page-design.md. Branch: research/phon-130-acoustic-analysis.


File Structure

  • research/2026-06-05-phon-130-acoustic/build_hillenbrand_norms.py: parse vowdata.dat → hillenbrandNorms.json.
  • packages/web/workers/src/config/hillenbrandNorms.json: bundled norm distributions (committed; Tier A).
  • packages/web/workers/src/lib/acousticOverlay.ts: percentile of a value vs a sorted norm array.
  • packages/web/workers/src/routes/audio.ts: POST /api/audio/acoustic (proxy + overlay).
  • packages/audio/src/phonolex_audio/acoustic.py: Parselmouth extraction.
  • packages/audio/src/phonolex_audio/server.py: POST /acoustic.
  • packages/audio/pyproject.toml: add praat-parselmouth.
  • packages/web/frontend/src/services/acousticApi.ts: analyzeAcoustic().
  • packages/web/frontend/src/components/tools/AcousticViewer.tsx (+ test): the dev page.
  • packages/web/frontend/src/main.tsx: /dev/acoustic route.

Reuse: the phonolex_audio server multipart pattern, audio.ts proxy + JSON-import patterns, PronunciationViewer dev-page scaffold, the cumulative-percentile formula.


Task 1: Build Hillenbrand norm tables

Files: Create research/2026-06-05-phon-130-acoustic/build_hillenbrand_norms.py; Create packages/web/workers/src/config/hillenbrandNorms.json

  • [ ] Step 1: Inspect the data format first. Read /Volumes/ExternalData2/audio-datasets/hillenbrand_et_al_1995/h95-alldata/{readme.txt,vowdata.dat} (head). Confirm: the speaker-id prefix encodes group (m=men, w=women, b=boys, g=girls), the vowel code column (e.g. ae ah aw eh ei er ih iy oa oo uh uw → IPA æ ɑ ɔ ɛ eɪ ɝ ɪ i oʊ ʊ ʌ u), and the F0/F1/F2/F3 steady-state columns (Hz; 0 = not measured → drop). Note the exact column indices.

  • [ ] Step 2: Implement the builder (build_hillenbrand_norms.py): parse each row → (group, vowel_ipa, {f0,f1,f2,f3}), drop unmeasured (0) values, and emit sorted ascending arrays per (vowel, group):

    """Hillenbrand 1995 -> hillenbrandNorms.json for the Worker percentile overlay.
    { "vowels": ["i","ɪ",...], "groups": ["men","women","boys","girls"],
      "table": { "<vowel>|<group>": { "f1": [sorted Hz...], "f2": [...], "f3": [...], "f0": [...] } } }
    Run: uv run python build_hillenbrand_norms.py"""
    import json
    from collections import defaultdict
    from pathlib import Path
    DAT = Path("/Volumes/ExternalData2/audio-datasets/hillenbrand_et_al_1995/h95-alldata/vowdata.dat")
    OUT = Path(__file__).resolve().parents[2] / "packages/web/workers/src/config/hillenbrandNorms.json"
    GROUP = {"m": "men", "w": "women", "b": "boys", "g": "girls"}
    VOWEL = {"ae":"æ","ah":"ɑ","aw":"ɔ","eh":"ɛ","ei":"eɪ","er":"ɝ",
             "ih":"ɪ","iy":"i","oa":"oʊ","oo":"ʊ","uh":"ʌ","uw":"u"}
    # adapt column parsing to the REAL vowdata.dat layout confirmed in Step 1.
    def main():
        # acc[(vowel_ipa, group)][measure] -> list of Hz values, unmeasured (0) dropped
        acc = defaultdict(lambda: defaultdict(list))
        for line in DAT.read_text().splitlines():
            # parse: speaker-id (e.g. 'm01ae'), then F0,F1,F2,F3 steady-state cols
            # group = GROUP[id[0]]; vowel = VOWEL[id[3:5]]; values from the right columns
            ...  # implement against the confirmed layout
        table = {}
        for (vowel, group), d in acc.items():
            table[f"{vowel}|{group}"] = {k: sorted(v) for k, v in d.items() if v}
        # ensure_ascii=False keeps the IPA keys readable in the committed JSON
        OUT.write_text(json.dumps({"vowels": sorted(VOWEL.values()),
                                   "groups": list(GROUP.values()), "table": table},
                                  ensure_ascii=False), encoding="utf-8")
        print(f"wrote {OUT}: {len(table)} (vowel,group) cells")

    if __name__ == "__main__":
        main()
    

  • [ ] Step 3: Run + verify. cd research/2026-06-05-phon-130-acoustic && uv run python build_hillenbrand_norms.py. Confirm ~48 cells (12 vowels × 4 groups), then print a couple of cell medians and sanity-check them against the published Hillenbrand means: e.g. men's /i/ (iy) F1 ≈ 340 Hz, F2 ≈ 2320 Hz; women's /ɑ/ (ah) F1 ≈ 940 Hz. Medians need not equal the means, but they should land close to them.

  • [ ] Step 4: Commit (JSON is a Tier-A derived statistic — ships):

    git add research/2026-06-05-phon-130-acoustic/build_hillenbrand_norms.py packages/web/workers/src/config/hillenbrandNorms.json
    git commit -m "data(phon-130): Hillenbrand vowel-formant norm tables (hillenbrandNorms.json)"
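
For Steps 2 and 3 above, the accumulate-then-verify part can be sketched independently of the vowdata.dat column layout, which still has to be confirmed in Step 1. This is a minimal sketch, not the builder itself: add_row and cell_medians are hypothetical helper names, the caller is assumed to have already pulled the four measures out of the right columns, and the two sample rows are made-up numbers, not dataset values.

```python
from collections import defaultdict
from statistics import median

GROUP = {"m": "men", "w": "women", "b": "boys", "g": "girls"}
VOWEL = {"ae": "æ", "ah": "ɑ", "aw": "ɔ", "eh": "ɛ", "ei": "eɪ", "er": "ɝ",
         "ih": "ɪ", "iy": "i", "oa": "oʊ", "oo": "ʊ", "uh": "ʌ", "uw": "u"}
MEASURES = ("f0", "f1", "f2", "f3")


def add_row(acc, speaker_id, values):
    """Add one token to acc[(vowel_ipa, group)][measure].

    speaker_id looks like 'm01ae' (group letter, speaker number, vowel code).
    values maps measure name -> Hz; 0 means "not measured" and is dropped.
    """
    key = (VOWEL[speaker_id[3:5]], GROUP[speaker_id[0]])
    for name in MEASURES:
        hz = values.get(name, 0)
        if hz > 0:
            acc[key][name].append(float(hz))


def cell_medians(acc, vowel, group):
    """Median per measure for one (vowel, group) cell, skipping empty measures."""
    return {name: median(v) for name, v in acc[(vowel, group)].items() if v}


# Illustrative rows only (hypothetical values, not taken from vowdata.dat).
acc = defaultdict(lambda: defaultdict(list))
add_row(acc, "m01iy", {"f0": 130, "f1": 330, "f2": 2300, "f3": 0})
add_row(acc, "m02iy", {"f0": 140, "f1": 350, "f2": 2340, "f3": 3000})
print(cell_medians(acc, "i", "men"))
```

The acc shape matches what the builder's final loop expects (keys are (vowel, group) tuples, values are measure → list), so the same structure can be sorted and dumped unchanged.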
    


Task 2: In-Worker percentile overlay (acousticOverlay.ts)

Files: Create packages/web/workers/src/lib/acousticOverlay.ts; Create packages/web/workers/src/__tests__/acousticOverlay.test.ts

  • [ ] Step 1: Failing test

    import { describe, it, expect } from 'vitest';
    import { percentile, overlayFor, type HillenbrandNorms } from '../lib/acousticOverlay';
    
    const NORMS: HillenbrandNorms = {
      vowels: ['i'], groups: ['men'],
      table: { 'i|men': { f1: [300, 320, 340, 360, 380], f2: [2200, 2220, 2240, 2260, 2280], f3: [], f0: [] } },
    };
    
    describe('percentile (cumulative bisect_right/N*100)', () => {
      it('is 0 below all, 100 at/above max, ~mid in the middle', () => {
        expect(percentile(290, [300, 320, 340, 360, 380])).toBe(0);
        expect(percentile(340, [300, 320, 340, 360, 380])).toBe(60); // bisect_right=3 -> 3/5*100
        expect(percentile(999, [300, 320, 340, 360, 380])).toBe(100);
      });
      it('empty norm array -> null (unmeasured)', () => {
        expect(percentile(340, [])).toBeNull();
      });
    });
    
    describe('overlayFor', () => {
      it('returns per-measure percentiles for the (vowel, group) cell', () => {
        const o = overlayFor({ f1: 340, f2: 2240, f3: 3000, f0: 120 }, 'i', 'men', NORMS);
        expect(o.f1).toBe(60);
        expect(o.f2).toBe(60);
        expect(o.f3).toBeNull(); // empty norms
      });
      it('missing cell -> all null', () => {
        const o = overlayFor({ f1: 340, f2: 2240, f3: 3000, f0: 120 }, 'u', 'boys', NORMS);
        expect(o).toEqual({ f1: null, f2: null, f3: null, f0: null });
      });
    });
    

  • [ ] Step 2: Run, expect FAIL. cd packages/web/workers && npx vitest run src/__tests__/acousticOverlay.test.ts

  • [ ] Step 3: Implement

    export interface HillenbrandNorms {
      vowels: string[]; groups: string[];
      table: Record<string, { f1: number[]; f2: number[]; f3: number[]; f0: number[] }>;
    }
    export interface Overlay { f1: number | null; f2: number | null; f3: number | null; f0: number | null; }
    
    /** cumulative percentile: bisect_right(sorted, v) / N * 100. null if no norms or no measurement. */
    export function percentile(v: number | null, sorted: number[]): number | null {
      if (v === null || Number.isNaN(v) || sorted.length === 0) return null;
      let lo = 0, hi = sorted.length;
      while (lo < hi) { const m = (lo + hi) >> 1; if (sorted[m] <= v) lo = m + 1; else hi = m; }
      return (lo / sorted.length) * 100;
    }
    
    export function overlayFor(
      // steady-state values are null when extraction found nothing to measure
      vals: { f1: number | null; f2: number | null; f3: number | null; f0: number | null },
      vowel: string, group: string, norms: HillenbrandNorms,
    ): Overlay {
      const cell = norms.table[`${vowel}|${group}`];
      if (!cell) return { f1: null, f2: null, f3: null, f0: null };
      return {
        f1: percentile(vals.f1, cell.f1), f2: percentile(vals.f2, cell.f2),
        f3: percentile(vals.f3, cell.f3), f0: percentile(vals.f0, cell.f0),
      };
    }
    

  • [ ] Step 4: Run, expect PASS + npx tsc --noEmit.

  • [ ] Step 5: Fixture pin — add a small research/2026-06-05-phon-130-acoustic/percentile_fixture.py that computes bisect_right-based percentiles for a few (value, group, vowel) cases from hillenbrandNorms.json and a TS test asserting percentile/overlayFor match (the PHON-126/142 pattern). Confirms the Worker matches Python.

  • [ ] Step 6: Commit

    git add packages/web/workers/src/lib/acousticOverlay.ts packages/web/workers/src/__tests__/acousticOverlay.test.ts
    git commit -m "feat(phon-130): in-Worker Hillenbrand percentile overlay"
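
The Step 5 fixture pin could look roughly like the sketch below. It uses the same operation order as the TypeScript percentile (bisect_right, divide by N, multiply by 100) so both sides produce the same floating-point result. The names build_fixture and the case tuple layout are assumptions for illustration, and the inline norms dict stands in for the real hillenbrandNorms.json, which the actual script would load from disk.

```python
import json
from bisect import bisect_right


def percentile(value, sorted_norms):
    """Cumulative percentile: bisect_right(sorted, v) / N * 100; None if unmeasured."""
    if value is None or not sorted_norms:
        return None
    return bisect_right(sorted_norms, value) / len(sorted_norms) * 100


def build_fixture(norms, cases):
    """cases: iterable of (value, vowel, group, measure) -> list of expected rows."""
    rows = []
    for value, vowel, group, measure in cases:
        cell = norms["table"].get(f"{vowel}|{group}", {})
        rows.append({
            "value": value, "vowel": vowel, "group": group, "measure": measure,
            "expected": percentile(value, cell.get(measure, [])),
        })
    return rows


# Stand-in for hillenbrandNorms.json (same shape as the Task 2 unit-test NORMS).
norms = {
    "vowels": ["i"], "groups": ["men"],
    "table": {"i|men": {"f1": [300, 320, 340, 360, 380], "f2": [], "f3": [], "f0": []}},
}
cases = [
    (290, "i", "men", "f1"),   # below every norm value
    (340, "i", "men", "f1"),   # ties count as "at or below"
    (999, "i", "men", "f1"),   # above every norm value
    (340, "u", "boys", "f1"),  # missing cell
]
print(json.dumps(build_fixture(norms, cases), ensure_ascii=False, indent=2))
```

The printed JSON is what the TS test would read and compare against percentile/overlayFor, with None serialized as null for the missing-cell case.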
    


Task 3: Parselmouth extraction (acoustic.py)

Files: Modify packages/audio/pyproject.toml; Create packages/audio/src/phonolex_audio/acoustic.py; Create packages/audio/tests/test_acoustic.py

  • [ ] Step 1: Add the dep. In packages/audio/pyproject.toml add "praat-parselmouth" to dependencies. Run uv sync --package phonolex-audio (or the workspace equivalent); confirm uv run python -c "import parselmouth; print(parselmouth.__version__)".

  • [ ] Step 2: Implement extraction (acoustic.py): audio bytes → features. Steady state = median over the central 40% of the voiced region.

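    """Parselmouth acoustic extraction for Model #4. F1-F3 track + steady-state, F0, duration."""
    from __future__ import annotations
    import math
    import tempfile
    from pathlib import Path
    import numpy as np
    import parselmouth
    from parselmouth.praat import call
    
    # Formant ceiling by group. Praat's guidance is ~5000 Hz for adult men and ~5500 Hz for
    # adult women; children often need a higher ceiling, so 5500 for boys/girls is a starting point.
    CEILING = {"men": 5000.0, "women": 5500.0, "boys": 5500.0, "girls": 5500.0}
    
    def _load_sound(audio_bytes: bytes) -> parselmouth.Sound:
        # parselmouth.Sound reads from a file path (or a sample array), not a byte stream,
        # so spool the upload to a temporary file first.
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "clip.wav"
            path.write_bytes(audio_bytes)
            return parselmouth.Sound(str(path))
    
    def _steady(track, mask) -> float | None:
        # median over the central 40% (30%-70%) of the defined frames inside the mask
        a = np.asarray(track, float)
        a = a[mask & ~np.isnan(a)]
        if a.size == 0:
            return None
        lo, hi = int(a.size * 0.3), int(a.size * 0.7) or a.size
        return float(np.median(a[lo:hi] if hi > lo else a))
    
    def _jsonable(track) -> list:
        # NaN is not valid JSON; undefined frames become null
        return [None if math.isnan(v) else float(v) for v in track]
    
    def extract(audio_bytes: bytes, group: str = "women") -> dict:
        snd = _load_sound(audio_bytes)
        dur_ms = round(snd.get_total_duration() * 1000)
        ceiling = CEILING.get(group, 5500.0)
        formant = snd.to_formant_burg(max_number_of_formants=5, maximum_formant=ceiling)
        pitch = snd.to_pitch()
        ts = formant.ts()  # frame times
        f1, f2, f3 = ([call(formant, "Get value at time", n, float(t), "Hertz", "Linear") for t in ts]
                      for n in (1, 2, 3))
        f0_track = [pitch.get_value_at_time(float(t)) or float("nan") for t in ts]
        voiced = ~np.isnan(np.asarray(f0_track, float))  # frames with a defined F0
        return {
            "formants": {"f1": _steady(f1, voiced), "f2": _steady(f2, voiced), "f3": _steady(f3, voiced),
                         "track": {"t": [float(t) for t in ts], "f1": _jsonable(f1),
                                   "f2": _jsonable(f2), "f3": _jsonable(f3)}},
            "f0": {"value": _steady(f0_track, voiced), "track": _jsonable(f0_track)},
            "duration_ms": dur_ms,
            "group": group,
        }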
    

  • [ ] Step 3: Test (test_acoustic.py) against a real Hillenbrand stimulus wav (the dataset has wavs, or synthesize a steady tone): assert duration_ms > 0, formants.f1 is a plausible Hz value (200–1200), f0.value plausible (80–400). Use a known Hillenbrand /ɑ/ clip and assert F1 in a sane window. Run uv run python -m pytest packages/audio/tests/test_acoustic.py -v.

  • [ ] Step 4: Commit

    git add packages/audio/pyproject.toml packages/audio/src/phonolex_audio/acoustic.py packages/audio/tests/test_acoustic.py
    git commit -m "feat(phon-130): Parselmouth F1-F3/F0/duration extraction"
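
For the "synthesize a steady tone" fallback in Step 3, a clip can be generated with the standard library alone, which keeps the test independent of the external dataset volume. The sketch below is one way to do it; synth_tone_wav and its defaults (120 Hz, 0.5 s, 16 kHz mono 16-bit) are assumptions chosen for illustration, not values from the spec.

```python
import io
import math
import struct
import wave


def synth_tone_wav(freq_hz=120.0, dur_s=0.5, sample_rate=16000, amplitude=0.5):
    """Return the bytes of a mono 16-bit PCM wav holding a steady sine tone."""
    n_frames = int(dur_s * sample_rate)
    pcm = b"".join(
        struct.pack("<h", int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


clip = synth_tone_wav()
with wave.open(io.BytesIO(clip), "rb") as r:
    print(r.getnchannels(), r.getframerate(), r.getnframes())
```

A pure tone has no vowel-like resonances, so it is only suitable for the duration_ms and F0 assertions (F0 should come out near the synthesized frequency). The F1 window assertion still needs a real Hillenbrand clip.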
    


Task 4: /acoustic server endpoint

Files: Modify packages/audio/src/phonolex_audio/server.py; Modify packages/audio/tests/test_server.py

  • [ ] Step 1: Implement. Add POST /acoustic to build_app (multipart audio + optional group form field; default women), calling acoustic.extract(bytes, group). Reuse the existing multipart validation. /health may add acoustic: true.
  • [ ] Step 2: Test (test_server.py, stub acoustic.extract via monkeypatch — no real Praat in CI): /acoustic returns the feature JSON; missing audio → 400. Run uv run python -m pytest packages/audio/tests/test_server.py -v.
  • [ ] Step 3: Commit
    git add packages/audio/src/phonolex_audio/server.py packages/audio/tests/test_server.py
    git commit -m "feat(phon-130): phonolex_audio /acoustic endpoint"
    

Task 5: /api/audio/acoustic Worker route + overlay

Files: Modify packages/web/workers/src/routes/audio.ts; Modify packages/web/workers/src/__tests__/audio.test.ts

  • [ ] Step 1: Implement. Add POST /api/audio/acoustic: multipart (audio, target_vowel, group), proxy to the host /acoustic (reuse the AUDIO_INFERENCE_URL + warming pattern from fetchTranscript), then:
    import hillenbrandNorms from '../config/hillenbrandNorms.json';
    import { overlayFor, type HillenbrandNorms } from '../lib/acousticOverlay';
    // after extraction `ex`:
    const steady = { f1: ex.formants.f1, f2: ex.formants.f2, f3: ex.formants.f3, f0: ex.f0.value };
    const percentiles = overlayFor(steady, target_vowel, group, hillenbrandNorms as HillenbrandNorms);
    return c.json({ ...ex, target_vowel, group, percentiles });
    
    Handle null steady values (unvoiced/failed extraction) → percentiles null, 200 (descriptive, no throw).
  • [ ] Step 2: Test (mirror the pronounce tests; fetchMock the host /acoustic): validation 400s; a mocked extraction returns + the response carries percentiles. Run npx vitest run + npx tsc --noEmit.
  • [ ] Step 3: Commit
    git add packages/web/workers/src/routes/audio.ts packages/web/workers/src/__tests__/audio.test.ts
    git commit -m "feat(phon-130): /api/audio/acoustic proxy + percentile overlay"
    

Task 6: analyzeAcoustic() frontend service

Files: Create packages/web/frontend/src/services/acousticApi.ts; Create packages/web/frontend/src/services/acousticApi.test.ts

  • [ ] Step 1: Failing test — mirror audioApi.pronounce.test.ts: analyzeAcoustic(blob, 'i', 'men') posts multipart, returns the result; 503 → TranscriberWarmingError.
  • [ ] Step 2: Implement acousticApi.ts (mirror pronounceAudio): multipart POST /api/audio/acoustic with audio/target_vowel/group; AcousticResult type {formants, f0, duration_ms, percentiles, target_vowel, group}; reuse TranscriberWarmingError/freshRequestId/baseUrl.
  • [ ] Step 3: Run test + tsc. Commit
    git add packages/web/frontend/src/services/acousticApi.ts packages/web/frontend/src/services/acousticApi.test.ts
    git commit -m "feat(phon-130): analyzeAcoustic frontend service"
    

Task 7: AcousticViewer dev page

Files: Read PronunciationViewer.tsx first; Create packages/web/frontend/src/components/tools/AcousticViewer.tsx (+ test); Modify main.tsx

  • [ ] Step 1: Read PronunciationViewer.tsx — reuse its capture scaffold (record/upload/preloaded, warming-state discriminated union, the clip-injection test mechanism).
  • [ ] Step 2: Failing component test — render AcousticViewer, set target vowel + group, mock analyzeAcoustic to resolve a result with percentiles.f1=18, supply a clip via the file-upload mechanism, click Analyze, assert the F1 value + its percentile render. (Mirror PronunciationViewer.test.tsx's selectFile approach.)
  • [ ] Step 3: Implement AcousticViewer.tsx: capture controls (mirror) + target-vowel <Select> (12 vowels) + group <Select> (men/women/boys/girls) → Analyze → display F1–F3/F0/duration each with its percentile (a band: green if 10–90th pct, amber/red outside; null → "no norm"). Register <Route path="/dev/acoustic" element={<AcousticViewer />} /> in main.tsx.
  • [ ] Step 4: Frontend matrix: npx vitest run && npx tsc --noEmit && npm run build.
  • [ ] Step 5: Commit
    git add packages/web/frontend/src/components/tools/AcousticViewer.tsx packages/web/frontend/src/components/tools/AcousticViewer.test.tsx packages/web/frontend/src/main.tsx
    git commit -m "feat(phon-130): /dev/acoustic AcousticViewer page"
    

Task 8: Praat-parity validation + RESULTS

Files: Create research/2026-06-05-phon-130-acoustic/{validate_parity.py,RESULTS.md}

  • [ ] Step 1: Parity script (validate_parity.py): on a few Hillenbrand stimulus wavs, compare acoustic.extract's steady-state F1–F3 and F0 against a reference, either Praat-direct (a parselmouth call with the same settings) or the published vowdata.dat measured values for that exact stimulus. Assert F1–F3 within ±10 Hz and F0 within ±2 Hz (the umbrella §6 gate), and report pass/fail per clip.
  • [ ] Step 2: Run it (needs the local stimulus wavs + Parselmouth). Confirm the parity gate.
  • [ ] Step 3: Write RESULTS.md — parity table + 2–3 example extractions with their Hillenbrand percentiles (a known /i/ from a man should land near its own group's 50th pct).
  • [ ] Step 4: Commit
    git add research/2026-06-05-phon-130-acoustic/{validate_parity.py,RESULTS.md}
    git commit -m "research(phon-130): Praat-parity validation + RESULTS"
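
The per-clip gate in Step 1 reduces to a tolerance comparison, sketched below without any Parselmouth dependency so the logic can be unit-tested on its own. The names parity_check and gate_passes are hypothetical, the sample numbers are illustrative rather than measured, and treating a missing value on either side as a failure is an assumption (skipping unmeasured reference values would be an equally defensible choice).

```python
# Gate from Step 1: F1-F3 within +/-10 Hz, F0 within +/-2 Hz.
TOLERANCE_HZ = {"f1": 10.0, "f2": 10.0, "f3": 10.0, "f0": 2.0}


def parity_check(extracted, reference):
    """Compare steady-state values per measure; returns {measure: {"diff", "ok"}}."""
    report = {}
    for name, tol in TOLERANCE_HZ.items():
        got, want = extracted.get(name), reference.get(name)
        if got is None or want is None:
            # Assumption: a value missing on either side fails the gate.
            report[name] = {"diff": None, "ok": False}
        else:
            diff = got - want
            report[name] = {"diff": diff, "ok": abs(diff) <= tol}
    return report


def gate_passes(report):
    """A clip passes only if every measure is inside its tolerance."""
    return all(entry["ok"] for entry in report.values())


# Illustrative values only (not measured from any stimulus).
reference = {"f1": 340.0, "f2": 2300.0, "f3": 3000.0, "f0": 130.0}
extracted = {"f1": 346.0, "f2": 2293.0, "f3": 3012.0, "f0": 131.5}
report = parity_check(extracted, reference)
for name, entry in report.items():
    print(name, entry["diff"], "PASS" if entry["ok"] else "FAIL")
print("clip:", "PASS" if gate_passes(report) else "FAIL")
```

In validate_parity.py the extracted dict would come from acoustic.extract (formants.f1 to f3 plus f0.value) and the per-clip rows would feed the RESULTS.md parity table.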
    

Task 9: Full matrix

  • [ ] cd packages/web/workers && npx vitest run && npx tsc --noEmit; cd packages/web/frontend && npx vitest run && npx tsc --noEmit && npm run build; uv run python -m pytest packages/audio/tests/.
  • [ ] Manual (optional): start phonolex_audio (now serving /acoustic) + worker + frontend; on /dev/acoustic upload a vowel, pick the vowel + group, confirm F1–F3/F0 + percentile bands render and a Hillenbrand-matched production lands near the 50th pct.

Self-Review

Spec coverage: §3.1 extraction → Task 3; §3.2 norms → Task 1; §3.3 Worker proxy + overlay → Tasks 2,5; §3.4 dev page → Tasks 6,7; §4 validation → Task 8; §2 scope (vowel core, target-vowel+group) → Tasks 5,7; out-of-scope (VOT/COG, judgment) honored. ✓

Placeholder scan: Task 1's parse + Task 3's stimulus assertion adapt to the real vowdata.dat layout / Parselmouth output (inspect-first noted) — inherent to data-dependent research code; the overlay (Task 2) and route/service/viewer (Tasks 5–7) are complete code or pinned tests.

Type/name consistency: HillenbrandNorms/Overlay/percentile/overlayFor consistent Tasks 1–5; AcousticResult {formants:{f1,f2,f3,track}, f0:{value,track}, duration_ms, percentiles, target_vowel, group} consistent across server/route/service/viewer; group strings men/women/boys/girls, the 12 vowel IPA symbols, consistent across the norm table, route, and selectors.