API Reference¶

PhonoLex provides a public REST API for programmatic access to its full dataset. No API key required.

Base URL: https://phonolex.com/api

Interactive docs: Available at /docs (Swagger UI) and /redoc (ReDoc).

Deployment: Cloudflare Workers + D1 (edge-deployed).

Quick Examples¶

Python¶

import requests

BASE = "https://phonolex.com/api"

# Look up a word
word = requests.get(f"{BASE}/words/cat").json()
print(word["ipa"], word["frequency"], word["concreteness"])

# Search for high-frequency one-syllable words that start with /k/
results = requests.post(f"{BASE}/words/search", json={
    "patterns": [{"type": "STARTS_WITH", "phoneme": "k"}],
    "filters": {"min_frequency": 50, "max_syllable_count": 1},
    "sort_by": "frequency",
    "limit": 20
}).json()
for w in results["items"]:
    print(w["word"], w["frequency"])

curl¶

# Health check
curl https://phonolex.com/api/health

# Word lookup
curl https://phonolex.com/api/words/cat

# Search
curl -X POST https://phonolex.com/api/words/search \
 -H "Content-Type: application/json" \
 -d '{"filters": {"min_concreteness": 4.5, "max_syllable_count": 2}, "limit": 10}'

R¶

library(httr)
library(jsonlite)

base <- "https://phonolex.com/api"

# Word lookup
word <- fromJSON(content(GET(paste0(base, "/words/cat")), "text"))

# Batch lookup
batch <- fromJSON(content(POST(
 paste0(base, "/words/batch"),
 body = toJSON(list(words = c("cat", "dog", "fish")), auto_unbox = TRUE),
 content_type_json()
), "text"))

Endpoints¶

Meta¶

`GET /api/health`¶

Health check with vocabulary stats.

{
 "status": "healthy",
 "vocabulary_size": 44011,
 "total_edges": 1012327
}

`GET /api/stats`¶

Full statistics including edge type counts and property coverage.

`GET /api/property-metadata`¶

Property definitions with labels, categories, sources, and display configuration. Use this to dynamically build UIs or understand what each property means.

`GET /api/property-ranges`¶

Min/max values for all numeric properties. Useful for building filter sliders.

`GET /api/edge-types`¶

Edge type definitions with labels and descriptions for the 7 relationship types.

Words¶

`GET /api/words/{word}`¶

Get full word data with all properties and percentile ranks.

Example: GET /api/words/cat

{
 "word": "cat",
 "ipa": "kæt",
 "phonemes": ["k", "æ", "t"],
 "syllables": [{"onset": ["k"], "nucleus": "æ", "coda": ["t"], "stress": 1}],
 "phoneme_count": 3,
 "syllable_count": 1,
 "frequency": 57.39,
 "frequency_percentile": 95.8,
 "concreteness": 5.0,
 "concreteness_percentile": 91.2,
 "valence": 6.34,
 "aoa": 3.72,
 ...
}

All properties are returned (null if unavailable for that word), plus {property}_percentile fields (0–100, cumulative percentile rank). 35 properties are filterable via the API; additional structural/derived fields are also included.
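
As a minimal sketch of consuming this response shape, the helper below separates raw property values from their `{property}_percentile` companions and skips nulls. The name `split_percentiles` is hypothetical, and the sample record is trimmed from the example above rather than fetched live; the null `aoa_percentile` is an assumption made only to show null handling.

```python
def split_percentiles(record):
    """Return (values, percentiles) dicts from a word record.

    Keys ending in "_percentile" go to the second dict, keyed by the bare
    property name. Null (None) entries are dropped from both.
    """
    suffix = "_percentile"
    values, percentiles = {}, {}
    for key, value in record.items():
        if value is None:
            continue  # property unavailable for this word
        if key.endswith(suffix):
            percentiles[key[: -len(suffix)]] = value
        else:
            values[key] = value
    return values, percentiles


# Sample record trimmed from the GET /api/words/cat example.
sample = {
    "word": "cat",
    "frequency": 57.39,
    "frequency_percentile": 95.8,
    "concreteness": 5.0,
    "concreteness_percentile": 91.2,
    "aoa": 3.72,
    "aoa_percentile": None,  # assumed null here, for illustration only
}
values, percentiles = split_percentiles(sample)
print(percentiles)
```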

`GET /api/words`¶

Browse the vocabulary with pagination and sorting.

Parameter	Type	Default	Description
`sort_by`	string	null	Property to sort by (e.g. `frequency`, `aoa`)
`sort_order`	string	`desc`	`asc` or `desc`
`limit`	int	50	Max items (1–5000)
`offset`	int	0	Items to skip

Response: { items: [...], total: 44011, offset: 0, limit: 50 }
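
One way to walk the whole vocabulary with `limit` and `offset` is sketched below. `iter_pages` and `fetch_page` are hypothetical names: `fetch_page(offset, limit)` stands for whatever function performs the HTTP request and returns the parsed response. The sketch uses an offline stand-in so it runs without network access.

```python
def iter_pages(fetch_page, limit=50):
    """Yield every item from a paginated endpoint.

    fetch_page(offset, limit) must return a dict shaped like
    {"items": [...], "total": N, "offset": ..., "limit": ...}.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items = page["items"]
        yield from items
        offset += len(items)
        # Stop on an empty page or once every item has been seen.
        if not items or offset >= page["total"]:
            break


# Offline stand-in for the API (invented data, not real vocabulary).
FAKE_VOCAB = [{"word": f"w{i}"} for i in range(120)]


def fake_fetch(offset, limit):
    return {
        "items": FAKE_VOCAB[offset:offset + limit],
        "total": len(FAKE_VOCAB),
        "offset": offset,
        "limit": limit,
    }


words = [item["word"] for item in iter_pages(fake_fetch, limit=50)]
print(len(words))
```

Against the live API, `fetch_page` could wrap a `requests.get` call to `/words` that passes `offset` and `limit` as query parameters and returns `.json()`.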

`POST /api/words/search`¶

Unified search combining phoneme patterns, property filters, exclusion rules, sorting, and pagination. This is the primary search endpoint.

Request body:

{
 "patterns": [
 {"type": "STARTS_WITH", "phoneme": "k"},
 {"type": "ENDS_WITH", "phoneme": "t"}
 ],
 "filters": {
 "min_frequency": 10,
 "max_syllable_count": 2,
 "min_concreteness": 3.0
 },
 "exclude_phonemes": ["ʃ", "ʒ"],
 "sort_by": "frequency",
 "sort_order": "desc",
 "limit": 50,
 "offset": 0
}

Pattern types:

Type	Description	Example
`STARTS_WITH`	Word begins with phoneme(s)	`"k"` matches cat, keep, kind
`ENDS_WITH`	Word ends with phoneme(s)	`"t"` matches cat, sit, want
`CONTAINS`	Word contains phoneme(s) anywhere	`"æ"` matches cat, bat, happy
`CONTAINS_MEDIAL`	Contains phoneme(s) in medial position	`"æ"` matches happy (not cat)

Phonemes use IPA notation. Multiple phonemes in a sequence are space-separated: "s t" matches words containing the /st/ cluster.

Filter fields: min_{property} and max_{property} for any of the 35 filterable properties. Multiple filters use AND logic. See GET /api/property-metadata for the full list.

Response: Same paginated format as GET /api/words.
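
The request body can be assembled programmatically. The sketch below is one possible helper, not part of the API: `build_search_body` and its argument shapes are hypothetical, and it only applies the conventions described above (space-separated phoneme sequences, `min_`/`max_` filter names).

```python
def build_search_body(patterns=(), ranges=None, exclude_phonemes=(), **options):
    """Assemble a POST /api/words/search body.

    patterns: iterable of (pattern_type, phonemes); a phoneme list is
              joined with spaces, e.g. ["s", "t"] -> "s t".
    ranges:   {property: (minimum, maximum)}; None leaves that side open.
    options:  passed through unchanged (sort_by, sort_order, limit, offset).
    """
    body = {}
    if patterns:
        body["patterns"] = [
            {"type": ptype, "phoneme": " ".join(phonemes)}
            for ptype, phonemes in patterns
        ]
    filters = {}
    for prop, (low, high) in (ranges or {}).items():
        if low is not None:
            filters[f"min_{prop}"] = low
        if high is not None:
            filters[f"max_{prop}"] = high
    if filters:
        body["filters"] = filters
    if exclude_phonemes:
        body["exclude_phonemes"] = list(exclude_phonemes)
    body.update(options)
    return body


body = build_search_body(
    patterns=[("STARTS_WITH", ["s", "t"])],
    ranges={"frequency": (10, None), "syllable_count": (None, 2)},
    exclude_phonemes=["ʃ", "ʒ"],
    sort_by="frequency",
    limit=50,
)
print(body)
```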

`POST /api/words/batch`¶

Look up multiple words at once. Unknown words are silently omitted.

{"words": ["cat", "dog", "fish", "xyzzy"]}

Returns an array of word objects (max 1000 words per request).
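
Because unknown words are dropped without an error, a client may want to detect them itself and to split long lists to stay under the per-request maximum. The sketch below shows both steps offline. The helper names are hypothetical, and it assumes each returned object carries the same `word` field as the single-word lookup, matched by exact string.

```python
def chunked(words, size=1000):
    """Split a word list into batches no larger than the documented maximum."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def missing_words(requested, returned_records):
    """Return requested words absent from a batch response, in request order."""
    found = {record["word"] for record in returned_records}
    return [word for word in requested if word not in found]


requested = ["cat", "dog", "fish", "xyzzy"]
# Stand-in for the parsed response: "xyzzy" was silently omitted.
response = [{"word": "cat"}, {"word": "dog"}, {"word": "fish"}]
missing = missing_words(requested, response)
print(missing)
```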

`POST /api/words/word-list`¶

Resolve constraints to a flat word list. Lightweight alternative to search when the caller only needs words, not full records.

Request body:

{
 "include_phonemes": ["k"],
 "exclude_phonemes": ["ɹ"],
 "filters": {
 "min_concreteness": 3.0,
 "max_aoa": 7.0
 }
}

All fields are optional but at least one must be provided. include_phonemes uses OR logic (word contains ANY listed phoneme). exclude_phonemes uses AND logic (word contains NONE of the listed phonemes). Filters use the same min_/max_ convention as the search endpoint.

Response:

{
 "words": ["cat", "dog", "fish", ...],
 "total": 12847
}
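
The include/exclude semantics can be illustrated with a local predicate. This is only an emulation of the documented logic over hand-written transcriptions, not the server's implementation; `matches_word_list` is a hypothetical name.

```python
def matches_word_list(phonemes, include=(), exclude=()):
    """Apply the documented logic to one word's phoneme list.

    include: OR logic, the word must contain at least one listed phoneme
             (an empty include list imposes no constraint).
    exclude: AND logic, the word must contain none of the listed phonemes.
    """
    present = set(phonemes)
    if include and not present.intersection(include):
        return False
    return present.isdisjoint(exclude)


# Hand-written transcriptions for illustration.
lexicon = {
    "cat": ["k", "æ", "t"],
    "crab": ["k", "ɹ", "æ", "b"],
    "bat": ["b", "æ", "t"],
}
kept = [
    word for word, phonemes in lexicon.items()
    if matches_word_list(phonemes, include=["k"], exclude=["ɹ"])
]
print(kept)
```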

Similarity¶

`POST /api/similarity/search`¶

Find phonologically similar words using soft Levenshtein distance on learned feature vectors.

Request body:

{
 "word": "cat",
 "threshold": 0.7,
 "limit": 20,
 "onset_weight": 0.33,
 "nucleus_weight": 0.33,
 "coda_weight": 0.33
}

Parameter	Type	Default	Description
`word`	string	required	Target word
`threshold`	float	0.7	Minimum similarity (0–1)
`limit`	int	50	Max results (1–500)
`onset_weight`	float	0.33	Weight for onset similarity
`nucleus_weight`	float	0.33	Weight for nucleus similarity
`coda_weight`	float	0.33	Weight for coda similarity

Weight presets:

Preset	Onset	Nucleus	Coda	Use case
Balanced	0.33	0.33	0.33	Overall similarity
Rhymes	0.0	0.5	0.5	Rhyming words
Alliteration	1.0	0.0	0.0	Same initial sound
Assonance	0.0	1.0	0.0	Matching vowels
Consonance	0.5	0.0	0.5	Matching consonants

Response:

[
 {
 "word": { "word": "bat", "ipa": "bæt", ... },
 "similarity": 0.92
 },
 ...
]
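
The weight presets for this endpoint are not sent by name; the client supplies the three weights itself. A small lookup table is one way to do that. The sketch below covers a few of the presets; `PRESETS` and `similarity_body` are hypothetical client-side names.

```python
# (onset_weight, nucleus_weight, coda_weight) for some documented presets.
PRESETS = {
    "balanced": (0.33, 0.33, 0.33),
    "rhymes": (0.0, 0.5, 0.5),
    "assonance": (0.0, 1.0, 0.0),
    "consonance": (0.5, 0.0, 0.5),
}


def similarity_body(word, preset="balanced", threshold=0.7, limit=50):
    """Build a POST /api/similarity/search body from a named preset."""
    onset, nucleus, coda = PRESETS[preset]
    return {
        "word": word,
        "threshold": threshold,
        "limit": limit,
        "onset_weight": onset,
        "nucleus_weight": nucleus,
        "coda_weight": coda,
    }


rhyme_request = similarity_body("cat", preset="rhymes", limit=20)
print(rhyme_request)
```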

Associations¶

`GET /api/associations/{word}`¶

Get cognitive associations from the graph. Returns edges from up to 6 relationship types.

Parameter	Type	Default	Description
`edge_types`	string	all	Comma-separated types: `USF`, `MEN`, `ECCC`, `SPP`, `SimLex`, `WordSim`
`limit`	int	50	Max edges
`offset`	int	0	Pagination offset

Example: GET /api/associations/cat?edge_types=USF,ECCC&limit=10

Response:

{
 "word": "cat",
 "associations": [
 {
 "target": "dog",
 "edge_sources": ["USF"],
 "in_vocabulary": true,
 "usf_forward": 0.178
 },
 ...
 ],
 "total": 12,
 "edge_type_counts": {"USF": 5, "ECCC": 2}
}

`GET /api/associations/{word}/confusability`¶

Get ECCC perceptual confusability edges only (words confused in noise).

`GET /api/associations/compare`¶

Compare shared associations between two words.

Parameter	Type	Description
`word1`	string	First word
`word2`	string	Second word

Returns shared targets, Jaccard similarity, and degree for each word.
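
For reference, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below computes it over two invented target lists. How the server counts targets (for example, whether edge types are distinguished) is not documented here, so treat this as the textbook definition only.

```python
def jaccard(targets_a, targets_b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(targets_a), set(targets_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented association targets, for illustration only.
cat_targets = ["dog", "mouse", "kitten", "pet"]
dog_targets = ["cat", "pet", "bone", "mouse"]

shared = sorted(set(cat_targets) & set(dog_targets))
score = jaccard(cat_targets, dog_targets)
print(shared, round(score, 3))
```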

Phonemes¶

`GET /api/phonemes`¶

List all 39 English phonemes with their articulatory features.

`GET /api/phonemes/{ipa}`¶

Get features for a single phoneme (38 distinctive features). ASCII g is automatically normalized to IPA ɡ (U+0261).

Example: GET /api/phonemes/k

{
 "ipa": "k",
 "type": "consonant",
 "features": {
 "consonantal": "+",
 "sonorant": "-",
 "continuant": "-",
 "dorsal": "+",
 ...
 }
}

`POST /api/phonemes/compare`¶

Compare two phonemes feature by feature.

{"phoneme1": "k", "phoneme2": "ɡ"}

Returns shared features, differing features, and a similarity score.

`POST /api/phonemes/search`¶

Find phonemes matching specific feature values.

{"features": {"consonantal": "+", "dorsal": "+", "sonorant": "-"}}

Returns all phonemes matching the given feature constraints.
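
The matching rule can be sketched locally: a phoneme qualifies when every requested feature has the requested value. The small inventory below is hand-written. The `k` row follows the example response above, and the other rows use standard textbook values as an assumption, not API output.

```python
def matching_phonemes(inventory, constraints):
    """Return IPA symbols whose features satisfy every constraint."""
    return [
        entry["ipa"]
        for entry in inventory
        if all(entry["features"].get(name) == value
               for name, value in constraints.items())
    ]


# Hand-written subset of features; values other than /k/ are assumed.
inventory = [
    {"ipa": "k", "features": {"consonantal": "+", "sonorant": "-", "dorsal": "+"}},
    {"ipa": "ɡ", "features": {"consonantal": "+", "sonorant": "-", "dorsal": "+"}},
    {"ipa": "t", "features": {"consonantal": "+", "sonorant": "-", "dorsal": "-"}},
    {"ipa": "m", "features": {"consonantal": "+", "sonorant": "+", "dorsal": "-"}},
]
hits = matching_phonemes(
    inventory, {"consonantal": "+", "dorsal": "+", "sonorant": "-"}
)
print(hits)
```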

Contrastive Sets¶

`POST /api/contrastive/minimal-pairs`¶

Find minimal pairs for a phoneme contrast.

{
 "phoneme1": "k",
 "phoneme2": "ɡ",
 "position": "initial",
 "limit": 20
}

Parameter	Type	Default	Description
`phoneme1`	string	required	First phoneme (IPA)
`phoneme2`	string	required	Second phoneme (IPA)
`position`	string	null	`initial`, `medial`, `final`, or null for any
`limit`	int	50	Max pairs (1–500)

Response:

[
 {
 "word1": { "word": "cap", ... },
 "word2": { "word": "gap", ... },
 "position": 0,
 "phoneme1": "k",
 "phoneme2": "ɡ"
 },
 ...
]
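
A minimal pair is conventionally two words whose transcriptions have the same length and differ in exactly one phoneme. The sketch below checks that definition on hand-written transcriptions. It assumes the `position` value in the response is the zero-based index of the differing phoneme, which matches the cap/gap example but is not confirmed elsewhere on this page.

```python
def minimal_pair_position(phonemes1, phonemes2):
    """Return the index of the single differing phoneme, or None.

    None means the two transcriptions are not a minimal pair: they have
    different lengths, are identical, or differ in more than one position.
    """
    if len(phonemes1) != len(phonemes2):
        return None
    diffs = [
        index
        for index, (a, b) in enumerate(zip(phonemes1, phonemes2))
        if a != b
    ]
    return diffs[0] if len(diffs) == 1 else None


cap = ["k", "æ", "p"]
gap = ["ɡ", "æ", "p"]
cat = ["k", "æ", "t"]

print(minimal_pair_position(cap, gap))
print(minimal_pair_position(cap, cat))
```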

`POST /api/contrastive/maximal-opposition/pairs`¶

Generate maximally opposed phoneme pairs from a list of unknown phonemes (Gierut 1989–1992). Returns pairs ranked by feature distance.

{
 "unknown_phonemes": ["k", "ɡ", "t", "d"],
 "top_n": 5
}

`POST /api/contrastive/maximal-opposition/word-lists`¶

Find word pairs for a specific maximal opposition phoneme pair.

{
 "phoneme1": "k",
 "phoneme2": "m",
 "position": "initial",
 "max_pairs": 10
}

`POST /api/contrastive/multiple-opposition/targets`¶

Select representative target phonemes for multiple opposition therapy (Maximal Classification + Maximal Distinction).

{
 "substitute_phoneme": "t",
 "target_phonemes": ["k", "ɡ", "d", "s"],
 "count": 3
}

`POST /api/contrastive/multiple-opposition/sets`¶

Generate minimal sets (triplets/quadruplets) for multiple opposition therapy.

{
 "substitute_phoneme": "t",
 "target_phonemes": ["k", "d"],
 "position": "initial",
 "max_sets": 10
}

Text Analysis¶

`POST /api/text/analyze`¶

Analyze a passage for phonological and psycholinguistic properties.

{"text": "The quick brown fox jumps over the lazy dog."}

Response:

{
 "total_words": 9,
 "analyzed_words": 9,
 "unknown_words": [],
 "coverage_percent": 100.0,
 "aggregate_percentiles": {
 "frequency_percentile": 89.2,
 "concreteness_percentile": 54.1,
 "aoa_percentile": 31.7,
 ...
 },
 "word_details": [
 {
 "word": "quick",
 "percentiles": {
 "frequency_percentile": 82.1,
 "concreteness_percentile": 32.5,
 ...
 }
 },
 ...
 ]
}

aggregate_percentiles are weighted averages across all analyzed words. word_details gives per-word percentiles for highlighting and drill-down.
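
A typical drill-down is to flag words that fall below a percentile cutoff. The sketch below does this over a `word_details` list. The helper names are hypothetical, the `quick` entry is taken from the example response, the `dog` entry is invented, and the coverage formula (analyzed words over total words) is an assumption consistent with the 9-of-9 example.

```python
def coverage_percent(total_words, analyzed_words):
    """Share of words found in the vocabulary, as a percentage."""
    if total_words == 0:
        return 0.0
    return round(100.0 * analyzed_words / total_words, 1)


def words_below(word_details, prop, cutoff):
    """Return words whose {prop}_percentile is present and below cutoff."""
    key = f"{prop}_percentile"
    flagged = []
    for detail in word_details:
        value = detail["percentiles"].get(key)
        if value is not None and value < cutoff:
            flagged.append(detail["word"])
    return flagged


word_details = [
    {"word": "quick",
     "percentiles": {"frequency_percentile": 82.1,
                     "concreteness_percentile": 32.5}},
    # Invented values, for illustration only.
    {"word": "dog",
     "percentiles": {"frequency_percentile": 96.0,
                     "concreteness_percentile": 90.0}},
]
abstract_words = words_below(word_details, "concreteness", 50)
print(coverage_percent(9, 9), abstract_words)
```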

Sentences¶

`POST /api/sentences`¶

Retrieve naturalistic English sentences satisfying a constraint set. Sentences are drawn from the curated ~236K-sentence corpus (CoLA, UD English-EWT, GUM, Tatoeba, OpenSubtitles), gated at corpus-build time for SLP suitability.

Request body:

{
 "constraints": [
 {"type": "pattern", "pattern_type": "STARTS_WITH", "phonemes": ["k"], "mode": "include"},
 {"type": "pattern", "pattern_type": "CONTAINS", "phonemes": ["ɹ"], "mode": "exclude"},
 {"type": "bound", "norm": "freq_age_2y_percentile", "min_value": 40},
 {"type": "contrastive_minpair", "phoneme1": "b", "phoneme2": "d", "position": "initial"}
 ],
 "top_k": 50
}

Constraint types:

Type	Semantics	Parameters
`pattern`	Phoneme position match against words in the sentence (`STARTS_WITH` / `ENDS_WITH` / `CONTAINS` / `CONTAINS_MEDIAL`)	`pattern_type`, `phonemes`, `mode` (`include` / `exclude`)
`bound`	Psycholinguistic-norm threshold per content word. Raw norms use NULL-pass; `*_percentile` properties use NULL-fail.	`norm`, `min_value` / `max_value`
`contrastive_minpair`	Sentence must contain BOTH members of a minimal pair, as witnessed in the `pairs` table	`phoneme1`, `phoneme2`, optional `position`
`contrastive_maxopp`	Minpair + sonorant-class crossing	`phoneme1`, `phoneme2`, optional `position`, optional `min_sonorant_diff`
`contrastive_multopp`	Sentence must contain a substitute word + contrast partners covering ≥`n_targets` distinct target phonemes. Accepted for API completeness but not surfaced in the Sentences UI — rarely witnessable in a single sentence for `n_targets ≥ 2`.	`substitute`, `targets`, optional `n_targets`, optional `position`
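
Constraint objects can be built with small helpers so that optional parameters are only sent when set. The functions below are hypothetical client-side conveniences that cover three of the constraint types; they reproduce the request body shown above.

```python
def pattern(pattern_type, phonemes, mode="include"):
    """Build a `pattern` constraint."""
    return {"type": "pattern", "pattern_type": pattern_type,
            "phonemes": list(phonemes), "mode": mode}


def bound(norm, min_value=None, max_value=None):
    """Build a `bound` constraint with at least one limit."""
    if min_value is None and max_value is None:
        raise ValueError("bound needs min_value, max_value, or both")
    rule = {"type": "bound", "norm": norm}
    if min_value is not None:
        rule["min_value"] = min_value
    if max_value is not None:
        rule["max_value"] = max_value
    return rule


def minpair(phoneme1, phoneme2, position=None):
    """Build a `contrastive_minpair` constraint."""
    rule = {"type": "contrastive_minpair",
            "phoneme1": phoneme1, "phoneme2": phoneme2}
    if position is not None:
        rule["position"] = position
    return rule


sentence_request = {
    "constraints": [
        pattern("STARTS_WITH", ["k"]),
        pattern("CONTAINS", ["ɹ"], mode="exclude"),
        bound("freq_age_2y_percentile", min_value=40),
        minpair("b", "d", position="initial"),
    ],
    "top_k": 50,
}
print(len(sentence_request["constraints"]))
```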

Ranking: results are tiered globally by match_count (the per-query count of distinct words satisfying include / contrastive rules), then interleaved by source within each tier using the static rarity_score. Multi-hit sentences therefore come back ahead of single-hit sentences regardless of source.

Response:

{
 "corpus_matches": [
 {
 "text": "She drained the bath and the brain trust took notes.",
 "sources": ["opensubtitles"],
 "rarity_score": 0.0287,
 "match_count": 2,
 "n_content_in_vocab": 7,
 "highlights": {
 "include_surfaces": [],
 "pair_surfaces": ["brain", "drain"]
 }
 }
 ],
 "total": 1,
 "elapsed_ms": {"corpus": 184, "total": 184}
}

highlights.pair_surfaces carries both members of every witnessed minpair/maxopp/multopp pair, so a UI overlay can underline the contrast at a glance. highlights.include_surfaces carries the surfaces matching any surface-include rule (patterns, CV shape). Both arrays are present (empty when no rules are active).

Properties¶

Properties available on word objects, grouped by category. Around 150 columns total — most are filterable via /api/words/search and /api/sentences bound rules. Each numeric property also has a {property}_percentile field (0–100); frequency-class percentiles treat value=0 as NULL.

Category	Properties
Phonological Complexity	`syllable_count`, `phoneme_count`, `wcm_score`, `cv_shape`
Phonotactic Probability	`phono_prob_avg`, `positional_prob_avg`, `str_phono_prob_avg`, `str_positional_prob_avg`, `neighborhood_density`, `str_neighborhood_density`
Lexical Frequency	`frequency`, `log_frequency`, `contextual_diversity` (PhonoLex FineWeb-Edu derivation)
Developmental Frequency	`freq_age_2y`, `freq_age_5y`, `freq_age_8y`, `freq_age_12y` (child PRODUCTION from CHILDES + PhonBank), `freq_age_all` (alias for `frequency`)
Child-Corpus Frequency	`freq_cyplex_7_9`, `freq_cyplex_10_12`, `freq_cyplex_13` (CYP-LEX)
Lexical Timing	`aoa` (PhonoLex in-house gpt-4.1-mini cloze; 1-7 age-banded, Spearman 0.868 vs Glasgow)
Semantic	`imageability`, `familiarity`, `concreteness`, `boi`, `iconicity`, `socialness`, `semantic_diversity`, `semd_topic`, `semd_vn`, `semd_h13`, `n_topics_for_word` (PhonoLex in-house gpt-4.1-mini)
Affective	`valence`, `arousal` (PhonoLex in-house, Warriner-scale)
Morphological	`morpheme_count`, `is_monomorphemic`, `n_prefixes`, `n_suffixes` (algorithmic + MorphyNet)
POS	`pos_dominant_freq`

Retired columns: dominance (Warriner D axis was never re-derived), prevalence, aoa_kuperman, elp_lexical_decision_rt, Lancaster sensorimotor channels (auditory, visual, haptic, gustatory, olfactory, interoceptive, hand_arm, foot_leg, head, mouth, torso), size, freq_age_adult (renamed to freq_age_all).

Error Handling¶

Status	Meaning
`200`	Success
`404`	Word or phoneme not found
`422`	Validation error (bad request body)
`429`	Rate limit exceeded (check `Retry-After` header)
`500`	Server error

Error responses include a detail field with a human-readable message.
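
A client can honor `429` responses by waiting for the advertised delay before retrying. The sketch below is transport-agnostic. `call_with_retry` and `send` are hypothetical names, where `send()` performs one request and returns `(status_code, headers, body)`. It assumes `Retry-After` holds a number of seconds and falls back to a default wait otherwise. The `detail` message in the demo is invented.

```python
import time


def call_with_retry(send, max_attempts=3, default_wait=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    Returns (status_code, body) from the last attempt.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait  # header was not a plain number of seconds
        sleep(wait)


# Offline demo: one rate-limited response, then success.
responses = iter([
    (429, {"Retry-After": "2"}, {"detail": "Rate limit exceeded"}),
    (200, {}, {"status": "healthy"}),
])
waits = []
result = call_with_retry(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

With `requests`, `send` could be a small function that issues the call and returns `(r.status_code, r.headers, r.json())`.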
