Speech Analysis (Beta)

Record or upload a spoken production against a target word and get back what was actually said (a narrow phonetic transcript) and how it deviated from the target (a per-position deviation overlay).

Beta — decision support, not diagnosis

Speech Analysis is a clinician-in-the-loop decision-support tool, not an autonomous assessment. It surfaces structured phonetic evidence — a faithful transcript and graded per-position deviations — for you to interpret. It does not diagnose, and its outputs are not a clinical score. The audio model is under active development.

Speech Analysis is consent-gated: before recording or uploading, you must acknowledge a consent notice covering how audio is processed. Per-IP rate limits and usage quotas also apply during the beta.

What it does

For each production you supply (a target word + an audio clip), the tool returns two things:

The faithful transcript — the phonemes the model actually heard, in narrow form ("what we heard"), aligned against the canonical target.
The deviation overlay — each target phoneme, colored by how far the production drifted from it, with the nearest reference sound on hover. Where the nearest sound differs from the target, the position is flagged as a substitution (e.g. target /ɹ/, nearest /w/).

This is the core of the tool, and it works on a single clip.
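
As a rough illustration of the two outputs, the sketch below models one production's result as plain Python data. The class and field names (`PositionDeviation`, `ProductionResult`, `deviation`, `nearest`) are hypothetical and are not PhonoLex's actual response schema, and the numbers are made up. Only the substitution rule (the nearest sound differs from the target) comes from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PositionDeviation:
    """One target position in the deviation overlay (hypothetical shape)."""
    target: str       # canonical target phoneme, e.g. "ɹ"
    deviation: float  # how far the production drifted; larger is further
    nearest: str      # reference sound the production came closest to

    @property
    def is_substitution(self) -> bool:
        # Flagged when the nearest reference sound is not the target itself.
        return self.nearest != self.target


@dataclass(frozen=True)
class ProductionResult:
    """Result for one production: a target word plus an audio clip."""
    target_word: str
    transcript: list[str]             # faithful transcript ("what we heard")
    overlay: list[PositionDeviation]  # one entry per target phoneme

    def substitutions(self) -> list[PositionDeviation]:
        return [p for p in self.overlay if p.is_substitution]


# Illustrative values only, not real model output.
result = ProductionResult(
    target_word="red",
    transcript=["w", "ɛ", "d"],
    overlay=[
        PositionDeviation("ɹ", 0.82, "w"),
        PositionDeviation("ɛ", 0.10, "ɛ"),
        PositionDeviation("d", 0.15, "d"),
    ],
)
flagged = [(p.target, p.nearest) for p in result.substitutions()]
```

Here `flagged` holds the single pair `("ɹ", "w")`, matching the target /ɹ/, nearest /w/ example above.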

How it works

PhonoLex's audio model emits a 26-dimensional articulatory feature vector for every frame of audio, in the same learned feature space the rest of the platform uses for phonological similarity. Every phoneme is represented as a short trajectory (a path) through that space, and your production is scored against a reference trajectory for each target phoneme. The deviation is a discriminatively weighted distance: the larger the value, the further the production's path through articulatory space drifted from the target's. The nearest reference is the phoneme whose reference trajectory the production came closest to, so the tool can name the sound that was produced, not just flag that the target was missed.
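
The scoring idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model's actual method: the fixed-length linear resampling, the per-frame weighted Euclidean distance, the uniform weights, and all function names are stand-ins chosen for the example. The source only says that trajectories live in a 26-dimensional space and that the distance is discriminatively weighted.

```python
import math
import random

N_FEATURES = 26  # per-frame articulatory feature dimensionality (from the text)


def resample(traj, n_points=8):
    """Linearly resample a trajectory (list of frames) to n_points frames.

    Assumption: fixed-length resampling stands in for whatever alignment
    the real model uses.
    """
    if len(traj) == 1:
        return [list(traj[0]) for _ in range(n_points)]
    out = []
    for i in range(n_points):
        pos = i * (len(traj) - 1) / (n_points - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(traj) - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b for a, b in zip(traj[lo], traj[hi])])
    return out


def weighted_distance(prod, ref, weights, n_points=8):
    """Mean per-frame weighted Euclidean distance between two trajectories."""
    p, r = resample(prod, n_points), resample(ref, n_points)
    per_frame = [
        math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, fp, fr)))
        for fp, fr in zip(p, r)
    ]
    return sum(per_frame) / len(per_frame)


def score_position(prod, target, references, weights):
    """Return (deviation from the target, nearest reference phoneme)."""
    deviation = weighted_distance(prod, references[target], weights)
    nearest = min(
        references,
        key=lambda ph: weighted_distance(prod, references[ph], weights),
    )
    return deviation, nearest


# Synthetic data: two constant reference trajectories and a production
# that sits very close to the "w" reference.
rng = random.Random(0)
vec_r = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
vec_w = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
references = {"ɹ": [vec_r] * 5, "w": [vec_w] * 6}
weights = [1.0] * N_FEATURES  # placeholder; real weights would be learned
production = [[x + 0.01 for x in vec_w] for _ in range(4)]

deviation, nearest = score_position(production, "ɹ", references, weights)
```

With this synthetic input the production is scored against target /ɹ/ but lands nearest to /w/, which is the situation the overlay reports as a substitution.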

For the model internals, see Technical → Audio Model.

Using it

1. Set a target

Type the target word. The tool checks it against the PhonoLex lexicon in real time and shows whether it is supported, meaning the lexicon has a canonical pronunciation to score against. When the word is supported, its canonical phonemes are previewed. Words not in the lexicon cannot be scored.

Recording and upload stay disabled until a supported target is set, so a clip is always attached to a target.
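
The gating rule above can be pictured as a lookup followed by a boolean check. The sketch below is illustrative only: `TOY_LEXICON`, `TargetStatus`, `check_target`, and `capture_enabled` are hypothetical names, the two lexicon entries are toy data, and the whitespace and case normalization is an assumption rather than documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass

# Toy stand-in for the PhonoLex lexicon: word -> canonical phonemes.
TOY_LEXICON: dict[str, list[str]] = {
    "red": ["ɹ", "ɛ", "d"],
    "web": ["w", "ɛ", "b"],
}


@dataclass(frozen=True)
class TargetStatus:
    word: str
    supported: bool
    canonical: list[str] | None  # previewed only when supported


def check_target(raw: str) -> TargetStatus:
    """Look the typed word up in the lexicon (normalization is assumed)."""
    word = raw.strip().lower()
    phonemes = TOY_LEXICON.get(word)
    return TargetStatus(word, phonemes is not None, phonemes)


def capture_enabled(status: TargetStatus | None) -> bool:
    """Record and Upload stay disabled until a supported target is set."""
    return status is not None and status.supported


ok = check_target(" Red ")
missing = check_target("wed")
```

In this sketch `capture_enabled(ok)` is true, while `capture_enabled(missing)` and `capture_enabled(None)` are both false, so a clip can never be captured without a scoreable target.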

2. Add a production

Three ways to add a production to the session:

Record — capture from the microphone against the current target.
Upload — attach a single audio clip to the current target.
Batch upload — select several files at once. Each becomes an editable row, its target seeded from the filename and verified against the lexicon. Fix or remove any row, then run the verified ones on demand — nothing is analyzed silently.

A single production is just a session of one. Each production carries its own target, so a session can be the same word repeated or a whole word-list probe.
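
A batch upload can be thought of as a list of editable rows, each seeded and then verified. The sketch below assumes a filename convention in which the target is the leading run of letters in the file stem (for example `red_take2.wav` seeds `red`); the real seeding rule is not documented here, and `BatchRow`, `seed_target`, `make_rows`, `retarget`, and `runnable` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path

TOY_LEXICON = {"red", "rabbit", "ring"}  # toy stand-in for the lexicon


@dataclass
class BatchRow:
    filename: str
    target: str            # editable; seeded from the filename
    verified: bool = False


def seed_target(filename: str) -> str:
    """Guess a target from the leading letters of the file stem (assumed rule)."""
    stem = Path(filename).stem.lower()
    match = re.match(r"[a-z]+", stem)
    return match.group(0) if match else ""


def verify(row: BatchRow) -> None:
    row.verified = row.target in TOY_LEXICON


def make_rows(filenames: list[str]) -> list[BatchRow]:
    rows = [BatchRow(name, seed_target(name)) for name in filenames]
    for row in rows:
        verify(row)
    return rows


def retarget(row: BatchRow, new_target: str) -> None:
    """Manual fix-up of a row, re-verified against the lexicon."""
    row.target = new_target.strip().lower()
    verify(row)


def runnable(rows: list[BatchRow]) -> list[BatchRow]:
    """Only verified rows are eligible to run, and only on demand."""
    return [row for row in rows if row.verified]


rows = make_rows(["red_take2.wav", "Rabbit-01.mp3", "wabbit.wav"])
seeded = [row.target for row in rows]
runnable_before = len(runnable(rows))
retarget(rows[2], "rabbit")  # fix the row whose seeded target was not found
runnable_after = len(runnable(rows))
```

The third file seeds the unsupported target `wabbit`, so it stays out of the runnable set until its row is corrected by hand.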

3. Read the result

Each production renders as a card: the target, the faithful transcript beneath it, and the deviation overlay. Hover any position for its deviation value and the nearest sound. Substitutions are flagged.

Scope and limits

Word-level, broad-phoneme. Targets are single words drawn from the lexicon. Connected-sentence input and automatic slicing of longer utterances into word-level productions are planned, not yet available. Scoring is over a broad phonetic inventory; fine sub-phonemic distortions are not separately modeled.
Beta. The audio model is under active development; its coverage and outputs may change as it matures. Per-IP rate limits and usage quotas apply.
Source attribution not surfaced. A session-level speaker-pattern attribution (typical / accent / developmental / motor) is computed server-side but is not currently shown in the UI.
Supporting evidence only. Treat every output as structured input to your own clinical judgment.