API Reference¶
PhonoLex provides a public REST API for programmatic access to its full dataset. No API key required.
Base URL: https://phonolex.com/api
Interactive docs: Available at /docs (Swagger UI) and /redoc (ReDoc).
Deployment: Cloudflare Workers + D1 (edge-deployed).
Quick Examples¶
Python¶
import requests
BASE = "https://phonolex.com/api"
# Look up a word
word = requests.get(f"{BASE}/words/cat").json()
print(word["ipa"], word["frequency"], word["concreteness"])
# Search for high-frequency one-syllable words starting with /k/
results = requests.post(f"{BASE}/words/search", json={
    "patterns": [{"type": "STARTS_WITH", "phoneme": "k"}],
    "filters": {"min_frequency": 50, "max_syllable_count": 1},
    "sort_by": "frequency",
    "limit": 20
}).json()
for w in results["items"]:
    print(w["word"], w["frequency"])
curl¶
# Health check
curl https://phonolex.com/api/health
# Word lookup
curl https://phonolex.com/api/words/cat
# Search
curl -X POST https://phonolex.com/api/words/search \
-H "Content-Type: application/json" \
-d '{"filters": {"min_concreteness": 4.5, "max_syllable_count": 2}, "limit": 10}'
R¶
library(httr)
library(jsonlite)
base <- "https://phonolex.com/api"
# Word lookup
word <- fromJSON(content(GET(paste0(base, "/words/cat")), "text"))
# Batch lookup
batch <- fromJSON(content(POST(
  paste0(base, "/words/batch"),
  body = toJSON(list(words = c("cat", "dog", "fish")), auto_unbox = TRUE),
  content_type_json()
), "text"))
Endpoints¶
Meta¶
GET /api/health¶
Health check with vocabulary stats.
{
"status": "healthy",
"vocabulary_size": 44011,
"total_edges": 1012327
}
GET /api/stats¶
Full statistics including edge type counts and property coverage.
GET /api/property-metadata¶
Property definitions with labels, categories, sources, and display configuration. Use this to dynamically build UIs or understand what each property means.
GET /api/property-ranges¶
Min/max values for all numeric properties. Useful for building filter sliders.
GET /api/edge-types¶
Edge type definitions with labels and descriptions for the 7 relationship types.
Words¶
GET /api/words/{word}¶
Get full word data with all properties and percentile ranks.
Example: GET /api/words/cat
{
"word": "cat",
"ipa": "kæt",
"phonemes": ["k", "æ", "t"],
"syllables": [{"onset": ["k"], "nucleus": "æ", "coda": ["t"], "stress": 1}],
"phoneme_count": 3,
"syllable_count": 1,
"frequency": 57.39,
"frequency_percentile": 95.8,
"concreteness": 5.0,
"concreteness_percentile": 91.2,
"valence": 6.34,
"aoa": 3.72,
...
}
All properties are returned (null if unavailable for that word), plus {property}_percentile fields (0–100, cumulative percentile rank). 35 properties are filterable via the API; additional structural/derived fields are also included.
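Because each property can come with a sibling {property}_percentile field and unavailable values come back as null, it can help to split a decoded record into raw values and percentile ranks before using it. A minimal sketch operating on an already-decoded response dict; the helper name split_percentiles and the null iconicity value in the sample are illustrative assumptions, not part of the API:

```python
def split_percentiles(record):
    """Separate raw properties from their *_percentile ranks, dropping nulls."""
    suffix = "_percentile"
    values, percentiles = {}, {}
    for key, value in record.items():
        if value is None:
            continue  # property unavailable for this word
        if key.endswith(suffix):
            percentiles[key[: -len(suffix)]] = value
        else:
            values[key] = value
    return values, percentiles


# Trimmed record shaped like the GET /api/words/cat example
# (the null iconicity entry is made up for illustration)
cat = {"word": "cat", "frequency": 57.39, "frequency_percentile": 95.8,
       "aoa": 3.72, "iconicity": None}
values, ranks = split_percentiles(cat)
print(ranks)
```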
GET /api/words¶
Browse the vocabulary with pagination and sorting.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sort_by | string | null | Property to sort by (e.g. frequency, aoa) |
| sort_order | string | desc | asc or desc |
| limit | int | 50 | Max items (1–5000) |
| offset | int | 0 | Items to skip |
Response: { items: [...], total: 44011, offset: 0, limit: 50 }
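The limit/offset/total fields are enough to walk the whole vocabulary page by page. The sketch below keeps the HTTP call behind a caller-supplied function so it runs offline; iter_words, fetch_page, and the stand-in data are hypothetical names for illustration. With requests, fetch_page could be something like `lambda p: requests.get(f"{BASE}/words", params=p).json()`.

```python
def iter_words(fetch_page, page_size=50, **params):
    """Yield every item from a paginated endpoint, one page at a time.

    fetch_page is any callable that takes a dict of query parameters and
    returns the decoded response ({"items": [...], "total": N, ...}).
    """
    offset = 0
    while True:
        page = fetch_page({**params, "limit": page_size, "offset": offset})
        items = page["items"]
        yield from items
        offset += len(items)
        # Stop on an empty page or once every item has been seen
        if not items or offset >= page["total"]:
            break


# Stand-in for the network call so the sketch runs without a server
FAKE_VOCAB = [{"word": w} for w in ("cat", "dog", "fish", "kite", "sun")]

def fake_fetch(params):
    start, size = params["offset"], params["limit"]
    return {"items": FAKE_VOCAB[start:start + size], "total": len(FAKE_VOCAB),
            "offset": start, "limit": size}

print([w["word"] for w in iter_words(fake_fetch, page_size=2)])
```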
POST /api/words/search¶
Unified search combining phoneme patterns, property filters, exclusion rules, sorting, and pagination. This is the primary search endpoint.
Request body:
{
"patterns": [
{"type": "STARTS_WITH", "phoneme": "k"},
{"type": "ENDS_WITH", "phoneme": "t"}
],
"filters": {
"min_frequency": 10,
"max_syllable_count": 2,
"min_concreteness": 3.0
},
"exclude_phonemes": ["ʃ", "ʒ"],
"sort_by": "frequency",
"sort_order": "desc",
"limit": 50,
"offset": 0
}
Pattern types:
| Type | Description | Example |
|---|---|---|
| STARTS_WITH | Word begins with phoneme(s) | "k" matches cat, keep, kind |
| ENDS_WITH | Word ends with phoneme(s) | "t" matches cat, sit, want |
| CONTAINS | Word contains phoneme(s) anywhere | "æ" matches cat, bat, happy |
| CONTAINS_MEDIAL | Contains phoneme(s) in medial position | "æ" matches happy (not cat) |
Phonemes use IPA notation. Multiple phonemes in a sequence are space-separated: "s t" matches words containing the /st/ cluster.
Filter fields: min_{property} and max_{property} for any of the 35 filterable properties. Multiple filters use AND logic. See GET /api/property-metadata for the full list.
Response: Same paginated format as GET /api/words.
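The request body follows a regular shape: space-joined phoneme sequences in patterns and min_/max_ keys in filters. A small builder can keep that convention in one place. This is a sketch only; build_search and its argument layout are hypothetical, and leaving out empty keys is an assumption about what the endpoint accepts.

```python
def build_search(patterns=(), exclude_phonemes=(), sort_by=None,
                 sort_order="desc", limit=50, offset=0, **ranges):
    """Build a POST /api/words/search body.

    patterns: (type, phonemes) tuples; the phoneme list is joined with
              spaces, so ("CONTAINS", ["s", "t"]) becomes "s t".
    ranges:   property=(low, high); None leaves that end open.
    """
    body = {"limit": limit, "offset": offset}
    if patterns:
        body["patterns"] = [
            {"type": kind, "phoneme": " ".join(seq)} for kind, seq in patterns
        ]
    filters = {}
    for prop, (low, high) in ranges.items():
        if low is not None:
            filters[f"min_{prop}"] = low
        if high is not None:
            filters[f"max_{prop}"] = high
    if filters:
        body["filters"] = filters
    if exclude_phonemes:
        body["exclude_phonemes"] = list(exclude_phonemes)
    if sort_by:
        body["sort_by"] = sort_by
        body["sort_order"] = sort_order
    return body


body = build_search(
    patterns=[("CONTAINS", ["s", "t"])],
    exclude_phonemes=["ʃ", "ʒ"],
    sort_by="frequency",
    frequency=(10, None),
    syllable_count=(None, 2),
)
print(body)
```

The resulting dict can be passed as the json argument of requests.post, as in the Python quick example.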
POST /api/words/batch¶
Look up multiple words at once. Unknown words are silently omitted.
{"words": ["cat", "dog", "fish", "xyzzy"]}
Returns an array of word objects (max 1000 words per request).
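Two practical consequences follow: lists longer than 1000 words need to be split across requests, and omitted words have to be detected by the caller. A minimal sketch of both, assuming each returned object carries the word exactly as it was requested; chunked and missing_words are illustrative helper names:

```python
def chunked(words, size=1000):
    """Split a word list into batches no larger than the documented limit."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(words), size):
        yield words[start:start + size]


def missing_words(requested, returned):
    """List requested words that the batch response silently omitted."""
    found = {record["word"] for record in returned}
    return [w for w in requested if w not in found]


requested = ["cat", "dog", "fish", "xyzzy"]
returned = [{"word": "cat"}, {"word": "dog"}, {"word": "fish"}]  # mock response
print(missing_words(requested, returned))
```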
POST /api/words/word-list¶
Resolve constraints to a flat word list. Lightweight alternative to search when the caller only needs words, not full records.
Request body:
{
"include_phonemes": ["k"],
"exclude_phonemes": ["ɹ"],
"filters": {
"min_concreteness": 3.0,
"max_aoa": 7.0
}
}
All fields are optional but at least one must be provided. include_phonemes uses OR logic (word contains ANY listed phoneme). exclude_phonemes uses AND logic (word contains NONE of the listed phonemes). Filters use the same min_/max_ convention as the search endpoint.
Response:
{
"words": ["cat", "dog", "fish", ...],
"total": 12847
}
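The include/exclude semantics described for this endpoint (ANY of include_phonemes, NONE of exclude_phonemes) can be checked locally against a phoneme list. This is only an illustration of the documented logic, not the server's implementation; the function name matches is hypothetical.

```python
def matches(phonemes, include=(), exclude=()):
    """True if a word passes the include (ANY) and exclude (NONE) rules."""
    present = set(phonemes)
    if include and not present & set(include):
        return False  # none of the include phonemes occur in the word
    return not present & set(exclude)  # fail if any excluded phoneme occurs


print(matches(["k", "æ", "t"], include=["k"], exclude=["ɹ"]))   # cat
print(matches(["k", "ɹ", "aɪ"], include=["k"], exclude=["ɹ"]))  # cry
```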
Similarity¶
POST /api/similarity/search¶
Find phonologically similar words using soft Levenshtein distance on learned feature vectors.
Request body:
{
"word": "cat",
"threshold": 0.7,
"limit": 20,
"onset_weight": 0.33,
"nucleus_weight": 0.33,
"coda_weight": 0.33
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| word | string | required | Target word |
| threshold | float | 0.7 | Minimum similarity (0–1) |
| limit | int | 50 | Max results (1–500) |
| onset_weight | float | 0.33 | Weight for onset similarity |
| nucleus_weight | float | 0.33 | Weight for nucleus similarity |
| coda_weight | float | 0.33 | Weight for coda similarity |
Weight presets:
| Preset | Onset | Nucleus | Coda | Use case |
|---|---|---|---|---|
| Balanced | 0.33 | 0.33 | 0.33 | Overall similarity |
| Rhymes | 0.0 | 0.5 | 0.5 | Rhyming words |
| Alliteration | 1.0 | 0.0 | 0.0 | Same initial sound |
| Assonance | 0.0 | 1.0 | 0.0 | Matching vowels |
| Consonance | 0.5 | 0.0 | 0.5 | Matching consonants |
Response:
[
{
"word": { "word": "bat", "ipa": "bæt", ... },
"similarity": 0.92
},
...
]
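A preset is just a fixed triple of weights merged into the request body. The sketch below encodes two of the presets from the table and checks the documented ranges for threshold and limit before building the body; PRESETS, similarity_body, and top_matches are hypothetical helper names, and the mock results are invented to mirror the response shape.

```python
PRESETS = {
    "rhymes": {"onset_weight": 0.0, "nucleus_weight": 0.5, "coda_weight": 0.5},
    "consonance": {"onset_weight": 0.5, "nucleus_weight": 0.0, "coda_weight": 0.5},
}


def similarity_body(word, preset, threshold=0.7, limit=50):
    """Build a POST /api/similarity/search body from a named preset."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    return {"word": word, "threshold": threshold, "limit": limit, **PRESETS[preset]}


def top_matches(results, n=5):
    """Return (word, similarity) pairs for the n most similar results."""
    ranked = sorted(results, key=lambda r: r["similarity"], reverse=True)
    return [(r["word"]["word"], r["similarity"]) for r in ranked[:n]]


print(similarity_body("cat", "rhymes", limit=20))
# Mock results in the documented response shape (values are made up)
mock = [{"word": {"word": "hat"}, "similarity": 0.88},
        {"word": {"word": "bat"}, "similarity": 0.92}]
print(top_matches(mock, n=1))
```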
Associations¶
GET /api/associations/{word}¶
Get cognitive associations from the graph. Returns edges from up to 6 relationship types.
| Parameter | Type | Default | Description |
|---|---|---|---|
| edge_types | string | all | Comma-separated types: USF, MEN, ECCC, SPP, SimLex, WordSim |
| limit | int | 50 | Max edges |
| offset | int | 0 | Pagination offset |
Example: GET /api/associations/cat?edge_types=USF,ECCC&limit=10
Response:
{
"word": "cat",
"associations": [
{
"target": "dog",
"edge_sources": ["USF"],
"in_vocabulary": true,
"usf_forward": 0.178
},
...
],
"total": 12,
"edge_type_counts": {"USF": 5, "ECCC": 2}
}
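Since edge_types is a single comma-separated query parameter rather than a repeated one, it is easy to get the URL subtly wrong. A minimal sketch of a URL builder using only the standard library; associations_url is a hypothetical helper, and keeping the commas unescaped is a readability choice (the percent-encoded form should be equivalent).

```python
from urllib.parse import quote, urlencode

BASE = "https://phonolex.com/api"


def associations_url(word, edge_types=None, limit=50, offset=0):
    """Build a GET /api/associations/{word} URL with optional edge types."""
    query = {}
    if edge_types:
        query["edge_types"] = ",".join(edge_types)  # one comma-separated value
    query["limit"] = limit
    query["offset"] = offset
    # safe="," keeps the commas readable instead of encoding them as %2C
    return f"{BASE}/associations/{quote(word, safe='')}?{urlencode(query, safe=',')}"


print(associations_url("cat", ["USF", "ECCC"], limit=10))
```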
GET /api/associations/{word}/confusability¶
Get ECCC perceptual confusability edges only (words confused in noise).
GET /api/associations/compare¶
Compare shared associations between two words.
| Parameter | Type | Description |
|---|---|---|
| word1 | string | First word |
| word2 | string | Second word |
Returns shared targets, Jaccard similarity, and degree for each word.
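For readers unfamiliar with the metric, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The worked example below uses invented association targets; how the server defines each word's target set is not stated here, so treat this as an illustration of the formula only.

```python
def jaccard(targets1, targets2):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    a, b = set(targets1), set(targets2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical association targets for two words
cat_targets = {"dog", "mouse", "pet"}
dog_targets = {"cat", "pet", "bone", "mouse"}

shared = sorted(cat_targets & dog_targets)
print(shared, jaccard(cat_targets, dog_targets))  # 2 shared out of 5 distinct targets
```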
Phonemes¶
GET /api/phonemes¶
List all 39 English phonemes with their articulatory features.
GET /api/phonemes/{ipa}¶
Get features for a single phoneme (38 distinctive features). ASCII g is automatically normalized to IPA ɡ (U+0261).
Example: GET /api/phonemes/k
{
"ipa": "k",
"type": "consonant",
"features": {
"consonantal": "+",
"sonorant": "-",
"continuant": "-",
"dorsal": "+",
...
}
}
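IPA symbols in the path need percent-encoding when a URL is built by hand (many HTTP clients do this automatically). The sketch below also mirrors the documented ASCII g to ɡ normalization on the client side, which is optional given that the server applies it; phoneme_url is a hypothetical helper.

```python
from urllib.parse import quote


def phoneme_url(ipa, base="https://phonolex.com/api"):
    """Build a GET /api/phonemes/{ipa} URL with the symbol percent-encoded."""
    ipa = ipa.replace("g", "\u0261")  # ASCII g -> IPA ɡ (U+0261)
    return f"{base}/phonemes/{quote(ipa, safe='')}"


print(phoneme_url("k"))
print(phoneme_url("g"))
print(phoneme_url("ʃ"))
```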
POST /api/phonemes/compare¶
Compare two phonemes feature by feature.
{"phoneme1": "k", "phoneme2": "ɡ"}
Returns shared features, differing features, and a similarity score.
POST /api/phonemes/search¶
Find phonemes matching specific feature values.
{"features": {"consonantal": "+", "dorsal": "+", "sonorant": "-"}}
Returns all phonemes matching the given feature constraints.
Contrastive Sets¶
POST /api/contrastive/minimal-pairs¶
Find minimal pairs for a phoneme contrast.
{
"phoneme1": "k",
"phoneme2": "ɡ",
"position": "initial",
"limit": 20
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| phoneme1 | string | required | First phoneme (IPA) |
| phoneme2 | string | required | Second phoneme (IPA) |
| position | string | null | initial, medial, final, or null for any |
| limit | int | 50 | Max pairs (1–500) |
Response:
[
{
"word1": { "word": "cap", ... },
"word2": { "word": "gap", ... },
"position": 0,
"phoneme1": "k",
"phoneme2": "ɡ"
},
...
]
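A minimal pair is two words whose phoneme sequences have the same length and differ in exactly one slot. The sketch below checks that property locally and returns the index of the differing phoneme, reading the position field in the response as a zero-based index (an assumption based on the cap/gap example); minimal_pair_position is a hypothetical helper.

```python
def minimal_pair_position(phonemes1, phonemes2):
    """Return the index of the single differing phoneme, or None if the
    two sequences are not a minimal pair."""
    if len(phonemes1) != len(phonemes2):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(phonemes1, phonemes2)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


cap = ["k", "æ", "p"]
gap = ["ɡ", "æ", "p"]
print(minimal_pair_position(cap, gap))
```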
POST /api/contrastive/maximal-opposition/pairs¶
Generate maximally opposed phoneme pairs from a list of unknown phonemes (Gierut 1989–1992). Returns pairs ranked by feature distance.
{
"unknown_phonemes": ["k", "ɡ", "t", "d"],
"top_n": 5
}
POST /api/contrastive/maximal-opposition/word-lists¶
Find word pairs for a specific maximal opposition phoneme pair.
{
"phoneme1": "k",
"phoneme2": "m",
"position": "initial",
"max_pairs": 10
}
POST /api/contrastive/multiple-opposition/targets¶
Select representative target phonemes for multiple opposition therapy (Maximal Classification + Maximal Distinction).
{
"substitute_phoneme": "t",
"target_phonemes": ["k", "ɡ", "d", "s"],
"count": 3
}
POST /api/contrastive/multiple-opposition/sets¶
Generate minimal sets (triplets/quadruplets) for multiple opposition therapy.
{
"substitute_phoneme": "t",
"target_phonemes": ["k", "d"],
"position": "initial",
"max_sets": 10
}
Text Analysis¶
POST /api/text/analyze¶
Analyze a passage for phonological and psycholinguistic properties.
{"text": "The quick brown fox jumps over the lazy dog."}
Response:
{
"total_words": 9,
"analyzed_words": 9,
"unknown_words": [],
"coverage_percent": 100.0,
"aggregate_percentiles": {
"frequency_percentile": 89.2,
"concreteness_percentile": 54.1,
"aoa_percentile": 31.7,
...
},
"word_details": [
{
"word": "quick",
"percentiles": {
"frequency_percentile": 82.1,
"concreteness_percentile": 32.5,
...
}
},
...
]
}
aggregate_percentiles are weighted averages across all analyzed words. word_details gives per-word percentiles for highlighting and drill-down.
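A typical drill-down is to flag the words in a passage that fall below some percentile cutoff. The sketch below works on a decoded response; the helper names are hypothetical, the coverage formula (analyzed_words / total_words × 100) is an assumption consistent with the example response, and the entry for "fox" is made-up sample data.

```python
def coverage_percent(analysis):
    """Share of words found in the vocabulary, as a percentage."""
    total = analysis["total_words"]
    return 100.0 * analysis["analyzed_words"] / total if total else 0.0


def words_below(analysis, prop, cutoff):
    """Words whose {prop}_percentile is present and below cutoff."""
    key = f"{prop}_percentile"
    hits = []
    for detail in analysis["word_details"]:
        rank = detail["percentiles"].get(key)
        if rank is not None and rank < cutoff:
            hits.append(detail["word"])
    return hits


sample = {
    "total_words": 9,
    "analyzed_words": 9,
    "word_details": [
        {"word": "quick", "percentiles": {"frequency_percentile": 82.1,
                                          "concreteness_percentile": 32.5}},
        {"word": "fox", "percentiles": {"frequency_percentile": 61.0,
                                        "concreteness_percentile": 97.0}},
    ],
}
print(coverage_percent(sample), words_below(sample, "concreteness", 50))
```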
Sentences¶
POST /api/sentences¶
Retrieve naturalistic English sentences satisfying a constraint set. Sentences are drawn from the curated ~236K-sentence corpus (CoLA, UD English-EWT, GUM, Tatoeba, OpenSubtitles), gated at corpus-build time for SLP suitability.
Request body:
{
"constraints": [
{"type": "pattern", "pattern_type": "STARTS_WITH", "phonemes": ["k"], "mode": "include"},
{"type": "pattern", "pattern_type": "CONTAINS", "phonemes": ["ɹ"], "mode": "exclude"},
{"type": "bound", "norm": "freq_age_2y_percentile", "min_value": 40},
{"type": "contrastive_minpair", "phoneme1": "b", "phoneme2": "d", "position": "initial"}
],
"top_k": 50
}
Constraint types:
| Type | Semantics | Parameters |
|---|---|---|
| pattern | Phoneme position match against words in the sentence (STARTS_WITH / ENDS_WITH / CONTAINS / CONTAINS_MEDIAL) | pattern_type, phonemes, mode (include / exclude) |
| bound | Psycholinguistic-norm threshold per content word. Raw norms use NULL-pass; *_percentile properties use NULL-fail. | norm, min_value / max_value |
| contrastive_minpair | Sentence must contain BOTH members of a minimal pair witness through the pairs table | phoneme1, phoneme2, optional position |
| contrastive_maxopp | Minpair + sonorant-class crossing | phoneme1, phoneme2, optional position, optional min_sonorant_diff |
| contrastive_multopp | Sentence must contain a substitute word + contrast partners covering ≥n_targets distinct target phonemes. Accepted for API completeness but not surfaced in the Sentences UI (rarely witnessable in a single sentence for n_targets ≥ 2). | substitute, targets, optional n_targets, optional position |
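The NULL handling for bound rules can be read as follows: a word with no value for a raw norm still passes, while a word with no value for a *_percentile norm fails. A minimal sketch of that reading for a single word's value; the function name and the inclusive comparisons are assumptions, not the server's implementation.

```python
def passes_bound(value, norm, min_value=None, max_value=None):
    """Apply a bound rule to one content word's value for the given norm."""
    if value is None:
        # NULL-pass for raw norms, NULL-fail for *_percentile norms
        return not norm.endswith("_percentile")
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True


print(passes_bound(None, "aoa", max_value=7.0))                       # missing raw norm
print(passes_bound(None, "freq_age_2y_percentile", min_value=40))     # missing percentile
print(passes_bound(55.0, "freq_age_2y_percentile", min_value=40))     # value in range
```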
Ranking: tiered globally by match_count (per-query count of distinct words satisfying include / contrastive rules), then source-interleaved within tier by static rarity_score. Multi-hit sentences come back ahead of single-hit sentences regardless of source.
Response:
{
"corpus_matches": [
{
"text": "She drained the bath and the brain trust took notes.",
"sources": ["opensubtitles"],
"rarity_score": 0.0287,
"match_count": 2,
"n_content_in_vocab": 7,
"highlights": {
"include_surfaces": [],
"pair_surfaces": ["brain", "drain"]
}
}
],
"total": 1,
"elapsed_ms": {"corpus": 184, "total": 184}
}
highlights.pair_surfaces carries both members of every witnessed minpair/maxopp/multopp pair, so a UI overlay can underline the contrast at a glance. highlights.include_surfaces carries the surfaces matching any surface-include rule (patterns, CV shape). Both arrays are present (empty when no rules are active).
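One way a client might turn these arrays into underline ranges is to search the sentence text for each surface. The sketch below assumes surfaces occur verbatim in the text as whole words and matches case-insensitively; if a surface is reported in a different inflection than the one in the sentence, a looser match would be needed. highlight_spans and the sample sentence are hypothetical.

```python
import re


def highlight_spans(text, surfaces):
    """Return sorted (start, end, surface) tuples for whole-word matches."""
    spans = []
    for surface in surfaces:
        pattern = rf"\b{re.escape(surface)}\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), surface))
    return sorted(spans)


text = "The brain needs rest, so drain the tub."  # made-up sentence
for start, end, surface in highlight_spans(text, ["brain", "drain"]):
    print(start, end, text[start:end])
```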
Properties¶
Properties available on word objects, grouped by category. Around 150 columns total — most are filterable via /api/words/search and /api/sentences bound rules. Each numeric property also has a {property}_percentile field (0–100); frequency-class percentiles treat value=0 as NULL.
| Category | Properties |
|---|---|
| Phonological Complexity | syllable_count, phoneme_count, wcm_score, cv_shape |
| Phonotactic Probability | phono_prob_avg, positional_prob_avg, str_phono_prob_avg, str_positional_prob_avg, neighborhood_density, str_neighborhood_density |
| Lexical Frequency | frequency, log_frequency, contextual_diversity (PhonoLex FineWeb-Edu derivation) |
| Developmental Frequency | freq_age_2y, freq_age_5y, freq_age_8y, freq_age_12y (child PRODUCTION from CHILDES + PhonBank), freq_age_all (alias for frequency) |
| Child-Corpus Frequency | freq_cyplex_7_9, freq_cyplex_10_12, freq_cyplex_13 (CYP-LEX) |
| Lexical Timing | aoa (PhonoLex in-house gpt-4.1-mini cloze; 1-7 age-banded, Spearman 0.868 vs Glasgow) |
| Semantic | imageability, familiarity, concreteness, boi, iconicity, socialness, semantic_diversity, semd_topic, semd_vn, semd_h13, n_topics_for_word (PhonoLex in-house gpt-4.1-mini) |
| Affective | valence, arousal (PhonoLex in-house, Warriner-scale) |
| Morphological | morpheme_count, is_monomorphemic, n_prefixes, n_suffixes (algorithmic + MorphyNet) |
| POS | pos_dominant_freq |
Retired columns: dominance (Warriner D axis was never re-derived), prevalence, aoa_kuperman, elp_lexical_decision_rt, Lancaster sensorimotor channels (auditory, visual, haptic, gustatory, olfactory, interoceptive, hand_arm, foot_leg, head, mouth, torso), size, freq_age_adult (renamed to freq_age_all).
Error Handling¶
| Status | Meaning |
|---|---|
| 200 | Success |
| 404 | Word or phoneme not found |
| 422 | Validation error (bad request body) |
| 429 | Rate limit exceeded (check Retry-After header) |
| 500 | Server error |
Error responses include a detail field with a human-readable message.
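A client can honor 429 responses by waiting for the Retry-After interval and surface the detail message for other errors. The sketch below keeps the HTTP call behind a caller-supplied send function so it runs offline; call_with_retry is a hypothetical helper, and it assumes Retry-After is given in seconds (the header can also be an HTTP date, which this sketch does not handle).

```python
import time


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out.

    send() must return (status, headers, body) with body already decoded.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            break
        # Assumes Retry-After is a number of seconds; default to 1 if absent
        sleep(float(headers.get("Retry-After", 1)))
    if status >= 400:
        raise RuntimeError(f"HTTP {status}: {body.get('detail', 'no detail')}")
    return body


# Offline demo: first call is rate limited, second succeeds
replies = iter([
    (429, {"Retry-After": "2"}, {"detail": "rate limit exceeded"}),
    (200, {}, {"status": "healthy"}),
])
waits = []
print(call_with_retry(lambda: next(replies), sleep=waits.append), waits)
```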