API Reference¶
PhonoLex provides a public REST API for programmatic access to its full dataset. No API key required.
Base URL: https://phonolex.com/api
Interactive docs: Available at /docs (Swagger UI) and /redoc (ReDoc).
Deployment: Cloudflare Workers + D1 (edge-deployed).
Quick Examples¶
Python¶
import requests
BASE = "https://phonolex.com/api"
# Look up a word
word = requests.get(f"{BASE}/words/cat").json()
print(word["ipa"], word["frequency"], word["concreteness"])
# Search for frequent one-syllable words that start with /k/
results = requests.post(f"{BASE}/words/search", json={
    "patterns": [{"type": "STARTS_WITH", "phoneme": "k"}],
    "filters": {"min_frequency": 50, "max_syllable_count": 1},
    "sort_by": "frequency",
    "limit": 20
}).json()
for w in results["items"]:
    print(w["word"], w["frequency"])
curl¶
# Health check
curl https://phonolex.com/api/health
# Word lookup
curl https://phonolex.com/api/words/cat
# Search
curl -X POST https://phonolex.com/api/words/search \
  -H "Content-Type: application/json" \
  -d '{"filters": {"min_concreteness": 4.5, "max_syllable_count": 2}, "limit": 10}'
R¶
library(httr)
library(jsonlite)
base <- "https://phonolex.com/api"
# Word lookup
word <- fromJSON(content(GET(paste0(base, "/words/cat")), "text"))
# Batch lookup
batch <- fromJSON(content(POST(
  paste0(base, "/words/batch"),
  body = toJSON(list(words = c("cat", "dog", "fish")), auto_unbox = TRUE),
  content_type_json()
), "text"))
Endpoints¶
Meta¶
GET /api/health¶
Health check with vocabulary stats.
{
  "status": "healthy",
  "vocabulary_size": 44011,
  "total_edges": 1012327
}
GET /api/stats¶
Full statistics including edge type counts and property coverage.
GET /api/property-metadata¶
Property definitions with labels, categories, sources, and display configuration. Use this to dynamically build UIs or understand what each property means.
GET /api/property-ranges¶
Min/max values for all numeric properties. Useful for building filter sliders.
GET /api/edge-types¶
Edge type definitions with labels and descriptions for the 7 relationship types.
Words¶
GET /api/words/{word}¶
Get full word data with all properties and percentile ranks.
Example: GET /api/words/cat
{
  "word": "cat",
  "ipa": "kæt",
  "phonemes": ["k", "æ", "t"],
  "syllables": [{"onset": ["k"], "nucleus": "æ", "coda": ["t"], "stress": 1}],
  "phoneme_count": 3,
  "syllable_count": 1,
  "frequency": 57.39,
  "frequency_percentile": 95.8,
  "concreteness": 5.0,
  "concreteness_percentile": 91.2,
  "valence": 6.34,
  "aoa": 3.72,
  ...
}
All properties are returned (null if unavailable for that word), plus {property}_percentile fields (0–100, cumulative percentile rank). 35 properties are filterable via the API; additional structural/derived fields are also included.
GET /api/words¶
Browse the vocabulary with pagination and sorting.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sort_by | string | null | Property to sort by (e.g. frequency, aoa) |
| sort_order | string | desc | asc or desc |
| limit | int | 50 | Max items (1–5000) |
| offset | int | 0 | Items to skip |
Response: { items: [...], total: 44011, offset: 0, limit: 50 }
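The limit and offset parameters can be combined to walk the whole vocabulary page by page. The following is a minimal sketch, assuming the response shape shown above (items and total). The names iter_items and fetch_words_page are hypothetical helpers, not part of the API, and fetch_words_page assumes the requests library is installed.

```python
from typing import Callable, Dict, Iterator


def iter_items(fetch_page: Callable[[int, int], Dict], limit: int = 50) -> Iterator[dict]:
    """Yield every item from a paginated endpoint.

    fetch_page(offset, limit) must return a dict with "items" and "total",
    matching the paginated response format documented above.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items = page["items"]
        yield from items
        offset += len(items)
        # Stop on an empty page or once every reported row has been seen.
        if not items or offset >= page["total"]:
            break


def fetch_words_page(offset: int, limit: int) -> Dict:
    """Hypothetical fetcher for GET /api/words (assumes `requests` is installed)."""
    import requests  # imported lazily so iter_items has no hard dependency

    resp = requests.get(
        "https://phonolex.com/api/words",
        params={"sort_by": "frequency", "limit": limit, "offset": offset},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

A call such as `for w in iter_items(fetch_words_page, limit=5000): ...` would then stream the vocabulary using the largest documented page size. Passing the fetcher as an argument keeps the paging logic independent of the HTTP client.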
POST /api/words/search¶
Unified search combining phoneme patterns, property filters, exclusion rules, sorting, and pagination. This is the primary search endpoint.
Request body:
{
  "patterns": [
    {"type": "STARTS_WITH", "phoneme": "k"},
    {"type": "ENDS_WITH", "phoneme": "t"}
  ],
  "filters": {
    "min_frequency": 10,
    "max_syllable_count": 2,
    "min_concreteness": 3.0
  },
  "exclude_phonemes": ["ʃ", "ʒ"],
  "sort_by": "frequency",
  "sort_order": "desc",
  "limit": 50,
  "offset": 0
}
Pattern types:
| Type | Description | Example |
|---|---|---|
| STARTS_WITH | Word begins with phoneme(s) | "k" matches cat, keep, kind |
| ENDS_WITH | Word ends with phoneme(s) | "t" matches cat, sit, want |
| CONTAINS | Word contains phoneme(s) anywhere | "æ" matches cat, bat, happy |
| CONTAINS_MEDIAL | Contains phoneme(s) in medial position | "æ" matches happy (not cat) |
Phonemes use IPA notation. Multiple phonemes in a sequence are space-separated: "s t" matches words containing the /st/ cluster.
Filter fields: min_{property} and max_{property} for any of the 35 filterable properties. Multiple filters use AND logic. See GET /api/property-metadata for the full list.
Response: Same paginated format as GET /api/words.
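Because filters follow the min_{property} / max_{property} naming rule, a request body can be assembled from plain ranges. The sketch below is one way to do that. build_search_body is a hypothetical helper, and omitting empty keys from the body is an assumption about what the server accepts (the curl example above does send a body without patterns).

```python
from typing import Dict, List, Optional, Tuple

PATTERN_TYPES = {"STARTS_WITH", "ENDS_WITH", "CONTAINS", "CONTAINS_MEDIAL"}
Range = Tuple[Optional[float], Optional[float]]


def build_search_body(
    patterns: Optional[List[Tuple[str, str]]] = None,
    ranges: Optional[Dict[str, Range]] = None,
    exclude_phonemes: Optional[List[str]] = None,
    sort_by: Optional[str] = None,
    limit: int = 50,
) -> dict:
    """Build a JSON body for POST /api/words/search.

    patterns: (type, phoneme) pairs, e.g. ("CONTAINS", "s t") for the /st/ cluster.
    ranges:   property -> (min, max); None on either side leaves that bound open.
    """
    body: dict = {"limit": limit}
    if patterns:
        for kind, _ in patterns:
            if kind not in PATTERN_TYPES:
                raise ValueError(f"unknown pattern type: {kind}")
        body["patterns"] = [{"type": k, "phoneme": p} for k, p in patterns]
    filters = {}
    for prop, (low, high) in (ranges or {}).items():
        if low is not None:
            filters[f"min_{prop}"] = low
        if high is not None:
            filters[f"max_{prop}"] = high
    if filters:
        body["filters"] = filters
    if exclude_phonemes:
        body["exclude_phonemes"] = exclude_phonemes
    if sort_by:
        body["sort_by"] = sort_by
    return body
```

For example, `build_search_body([("STARTS_WITH", "k")], {"frequency": (10, None), "syllable_count": (None, 2)})` produces the filters min_frequency and max_syllable_count, which the endpoint combines with AND logic.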
POST /api/words/batch¶
Look up multiple words at once. Unknown words are silently omitted.
{"words": ["cat", "dog", "fish", "xyzzy"]}
Returns an array of word objects (max 1000 words per request).
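Since unknown words are dropped without an error and each request is capped at 1000 words, a client usually needs to split long lists and work out which words were not found. A minimal sketch of both steps follows. chunked and missing_words are hypothetical helpers, and the comparison assumes the API returns each word spelled exactly as requested.

```python
from typing import Iterable, Iterator, List

MAX_BATCH = 1000  # documented per-request cap for POST /api/words/batch


def chunked(words: List[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Split a word list into consecutive batches of at most `size` words."""
    for start in range(0, len(words), size):
        yield words[start:start + size]


def missing_words(requested: Iterable[str], returned: Iterable[dict]) -> List[str]:
    """Return requested words that are absent from the response, in request order.

    Assumes each returned word object carries its spelling under "word".
    """
    found = {entry["word"] for entry in returned}
    return [w for w in requested if w not in found]
```

With the request shown above, `missing_words(["cat", "dog", "fish", "xyzzy"], response)` would be expected to report only xyzzy if the other three are in the vocabulary.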
Similarity¶
POST /api/similarity/search¶
Find phonologically similar words using soft Levenshtein distance on learned feature vectors.
Request body:
{
  "word": "cat",
  "threshold": 0.7,
  "limit": 20,
  "onset_weight": 0.33,
  "nucleus_weight": 0.33,
  "coda_weight": 0.33
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| word | string | required | Target word |
| threshold | float | 0.7 | Minimum similarity (0–1) |
| limit | int | 50 | Max results (1–500) |
| onset_weight | float | 0.33 | Weight for onset similarity |
| nucleus_weight | float | 0.33 | Weight for nucleus similarity |
| coda_weight | float | 0.33 | Weight for coda similarity |
Weight presets:
| Preset | Onset | Nucleus | Coda | Use case |
|---|---|---|---|---|
| Balanced | 0.33 | 0.33 | 0.33 | Overall similarity |
| Rhymes | 0.0 | 0.5 | 0.5 | Rhyming words |
| Alliteration | 1.0 | 0.0 | 0.0 | Same initial sound |
| Assonance | 0.0 | 1.0 | 0.0 | Matching vowels |
| Consonance | 0.5 | 0.0 | 0.5 | Matching consonants |
Response:
[
  {
    "word": { "word": "bat", "ipa": "bæt", ... },
    "similarity": 0.92
  },
  ...
]
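The request parameters and response shape above can be wrapped in two small helpers. This is a sketch under stated assumptions: similarity_body and top_matches are hypothetical names, and the range checks simply mirror the documented bounds (threshold 0–1, limit 1–500) on the client side.

```python
from typing import List


def similarity_body(
    word: str,
    onset: float = 0.33,
    nucleus: float = 0.33,
    coda: float = 0.33,
    threshold: float = 0.7,
    limit: int = 50,
) -> dict:
    """Build a JSON body for POST /api/similarity/search."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    return {
        "word": word,
        "threshold": threshold,
        "limit": limit,
        "onset_weight": onset,
        "nucleus_weight": nucleus,
        "coda_weight": coda,
    }


def top_matches(results: List[dict], min_similarity: float) -> List[dict]:
    """Keep results at or above min_similarity, highest similarity first."""
    kept = [r for r in results if r["similarity"] >= min_similarity]
    return sorted(kept, key=lambda r: r["similarity"], reverse=True)
```

Using the Rhymes row from the preset table, `similarity_body("cat", onset=0.0, nucleus=0.5, coda=0.5)` builds a request that ignores the onset entirely.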
Associations¶
GET /api/associations/{word}¶
Get cognitive associations from the graph. Returns edges from up to 6 relationship types.
| Parameter | Type | Default | Description |
|---|---|---|---|
| edge_types | string | all | Comma-separated types: USF, MEN, ECCC, SPP, SimLex, WordSim |
| limit | int | 50 | Max edges |
| offset | int | 0 | Pagination offset |
Example: GET /api/associations/cat?edge_types=USF,ECCC&limit=10
Response:
{
  "word": "cat",
  "associations": [
    {
      "target": "dog",
      "edge_sources": ["USF"],
      "in_vocabulary": true,
      "usf_forward": 0.178
    },
    ...
  ],
  "total": 12,
  "edge_type_counts": {"USF": 5, "ECCC": 2}
}
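The query string and response above can be handled with the standard library alone. The sketch below builds the URL with a comma-separated edge_types value and groups targets by source. associations_url and targets_by_source are hypothetical helpers, and the list of edge type names is copied from the parameter table rather than fetched from /api/edge-types.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence
from urllib.parse import quote, urlencode

BASE = "https://phonolex.com/api"
EDGE_TYPES = ("USF", "MEN", "ECCC", "SPP", "SimLex", "WordSim")


def associations_url(
    word: str,
    edge_types: Optional[Sequence[str]] = None,
    limit: int = 50,
    offset: int = 0,
) -> str:
    """Build the GET /api/associations/{word} URL."""
    params = {"limit": limit, "offset": offset}
    if edge_types:
        unknown = set(edge_types) - set(EDGE_TYPES)
        if unknown:
            raise ValueError(f"unknown edge types: {sorted(unknown)}")
        params["edge_types"] = ",".join(edge_types)
    # safe="," keeps the separator readable instead of encoding it as %2C.
    return f"{BASE}/associations/{quote(word, safe='')}?{urlencode(params, safe=',')}"


def targets_by_source(response: dict) -> Dict[str, List[str]]:
    """Group association targets by each source listed in edge_sources."""
    grouped = defaultdict(list)
    for edge in response["associations"]:
        for source in edge["edge_sources"]:
            grouped[source].append(edge["target"])
    return dict(grouped)
```

An edge that lists several sources appears under each of them, so the grouped lists can overlap.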
GET /api/associations/{word}/confusability¶
Get ECCC perceptual confusability edges only (words confused in noise).
GET /api/associations/compare¶
Compare shared associations between two words.
| Parameter | Type | Description |
|---|---|---|
| word1 | string | First word |
| word2 | string | Second word |
Returns shared targets, Jaccard similarity, and degree for each word.
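For readers unfamiliar with the measure, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below computes it client-side over two lists of association targets. compare_targets and its output keys are hypothetical, the example target lists are illustrative rather than real API output, and the server's exact computation may differ.

```python
from typing import Dict, Iterable


def compare_targets(targets1: Iterable[str], targets2: Iterable[str]) -> Dict[str, object]:
    """Compare two sets of association targets.

    Jaccard similarity = |A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty.
    """
    a, b = set(targets1), set(targets2)
    union = a | b
    return {
        "shared": sorted(a & b),
        "jaccard": len(a & b) / len(union) if union else 0.0,
        "degree1": len(a),
        "degree2": len(b),
    }


# Illustrative target lists (not real API output): one shared target out of five distinct ones.
result = compare_targets(["dog", "mouse", "kitten"], ["cat", "bone", "mouse"])
print(result["shared"], result["jaccard"])  # ['mouse'] 0.2
```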
Phonemes¶
GET /api/phonemes¶
List all 39 English phonemes with their articulatory features.
GET /api/phonemes/{ipa}¶
Get features for a single phoneme (38 distinctive features). ASCII g is automatically normalized to IPA ɡ (U+0261).
Example: GET /api/phonemes/k
{
  "ipa": "k",
  "type": "consonant",
  "features": {
    "consonantal": "+",
    "sonorant": "-",
    "continuant": "-",
    "dorsal": "+",
    ...
  }
}
POST /api/phonemes/compare¶
Compare two phonemes feature by feature.
{"phoneme1": "k", "phoneme2": "ɡ"}
Returns shared features, differing features, and a similarity score.
POST /api/phonemes/search¶
Find phonemes matching specific feature values.
{"features": {"consonantal": "+", "dorsal": "+", "sonorant": "-"}}
Returns all phonemes matching the given feature constraints.
Contrastive Sets¶
POST /api/contrastive/minimal-pairs¶
Find minimal pairs for a phoneme contrast.
{
  "phoneme1": "k",
  "phoneme2": "ɡ",
  "position": "initial",
  "limit": 20
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| phoneme1 | string | required | First phoneme (IPA) |
| phoneme2 | string | required | Second phoneme (IPA) |
| position | string | null | initial, medial, final, or null for any |
| limit | int | 50 | Max pairs (1–500) |
Response:
[
  {
    "word1": { "word": "cap", ... },
    "word2": { "word": "gap", ... },
    "position": 0,
    "phoneme1": "k",
    "phoneme2": "ɡ"
  },
  ...
]
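A minimal pair is two words whose phoneme sequences have the same length and differ in exactly one slot. The sketch below checks that locally, which can be handy for validating or post-filtering results. minimal_pair_position and position_label are hypothetical helpers, and reading the response's position field as a 0-based index into the phoneme list is an assumption based on the example above.

```python
from typing import Optional, Sequence


def minimal_pair_position(p1: Sequence[str], p2: Sequence[str]) -> Optional[int]:
    """Return the 0-based index of the single differing phoneme, or None.

    None means the two sequences are not a minimal pair (different lengths,
    identical, or differing in more than one position).
    """
    if len(p1) != len(p2):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(p1, p2)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def position_label(index: int, length: int) -> str:
    """Map a 0-based index to initial, medial, or final."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


cap = ["k", "æ", "p"]
gap = ["ɡ", "æ", "p"]
idx = minimal_pair_position(cap, gap)
print(idx, position_label(idx, len(cap)))  # 0 initial
```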
POST /api/contrastive/maximal-opposition/pairs¶
Generate maximally opposed phoneme pairs from a list of unknown phonemes (Gierut 1989–1992). Returns pairs ranked by feature distance.
{
  "unknown_phonemes": ["k", "ɡ", "t", "d"],
  "top_n": 5
}
POST /api/contrastive/maximal-opposition/word-lists¶
Find word pairs for a specific maximal opposition phoneme pair.
{
  "phoneme1": "k",
  "phoneme2": "m",
  "position": "initial",
  "max_pairs": 10
}
POST /api/contrastive/multiple-opposition/targets¶
Select representative target phonemes for multiple opposition therapy (Maximal Classification + Maximal Distinction).
{
  "substitute_phoneme": "t",
  "target_phonemes": ["k", "ɡ", "d", "s"],
  "count": 3
}
POST /api/contrastive/multiple-opposition/sets¶
Generate minimal sets (triplets/quadruplets) for multiple opposition therapy.
{
  "substitute_phoneme": "t",
  "target_phonemes": ["k", "d"],
  "position": "initial",
  "max_sets": 10
}
Text Analysis¶
POST /api/text/analyze¶
Analyze a passage for phonological and psycholinguistic properties.
{"text": "The quick brown fox jumps over the lazy dog."}
Response:
{
  "total_words": 9,
  "analyzed_words": 9,
  "unknown_words": [],
  "coverage_percent": 100.0,
  "aggregate_percentiles": {
    "frequency_percentile": 89.2,
    "concreteness_percentile": 54.1,
    "aoa_percentile": 31.7,
    ...
  },
  "word_details": [
    {
      "word": "quick",
      "percentiles": {
        "frequency_percentile": 82.1,
        "concreteness_percentile": 32.5,
        ...
      }
    },
    ...
  ]
}
aggregate_percentiles are weighted averages across all analyzed words. word_details gives per-word percentiles for highlighting and drill-down.
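As a sketch of the highlighting use case, the per-word percentiles can be scanned for words that fall below a cutoff on a chosen property. words_below and coverage_percent are hypothetical helpers. The coverage formula (analyzed words over total words) is an assumption that agrees with the sample response but is not confirmed by it.

```python
from typing import List, Tuple


def coverage_percent(total_words: int, analyzed_words: int) -> float:
    """Share of words that were found in the vocabulary, as a percentage."""
    return 100.0 * analyzed_words / total_words if total_words else 0.0


def words_below(response: dict, field: str, cutoff: float) -> List[Tuple[str, float]]:
    """Return (word, percentile) pairs whose `field` percentile is below cutoff.

    Words with a missing or null value for the field are skipped.
    Results are ordered from lowest percentile upward.
    """
    hits = []
    for detail in response["word_details"]:
        value = detail["percentiles"].get(field)
        if value is not None and value < cutoff:
            hits.append((detail["word"], value))
    return sorted(hits, key=lambda pair: pair[1])
```

With the sample response above, `words_below(response, "concreteness_percentile", 50)` would include ("quick", 32.5), flagging it as a relatively abstract word in the passage.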
Properties¶
Properties available on word objects, grouped by category (35 are filterable via the API; additional derived fields are included in responses):
| Category | Properties |
|---|---|
| Phonological Complexity | syllable_count, phoneme_count, wcm_score |
| Phonotactic Probability | phono_prob_avg, positional_prob_avg |
| Lexical | frequency, log_frequency, contextual_diversity, prevalence, aoa, aoa_kuperman, elp_lexical_decision_rt |
| Semantic | imageability, familiarity, concreteness, size |
| Affective | valence, arousal, dominance |
| Cognitive / Embodied | iconicity, boi, socialness |
| Sensorimotor — Perceptual | auditory, visual, haptic, gustatory, olfactory, interoceptive |
| Sensorimotor — Action | hand_arm, foot_leg, head, mouth, torso |
| Morphological | morpheme_count, is_monomorphemic, n_prefixes, n_suffixes |
Each numeric property also has a {property}_percentile field (0–100) representing the cumulative percentile rank within the vocabulary.
Error Handling¶
| Status | Meaning |
|---|---|
| 200 | Success |
| 404 | Word or phoneme not found |
| 422 | Validation error (bad request body) |
| 429 | Rate limit exceeded (check Retry-After header) |
| 500 | Server error |
Error responses include a detail field with a human-readable message.
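A client can honor the 429 status by waiting for the Retry-After interval before trying again. The sketch below shows one way to structure that. call_with_retry is a hypothetical helper, send stands in for any function that performs the request and returns (status, headers, parsed JSON body), and the code assumes Retry-After carries a number of seconds rather than an HTTP date.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], dict]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() and retry on HTTP 429, waiting Retry-After seconds each time.

    Any other non-200 status raises RuntimeError with the response's detail message.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status == 200:
            return body
        if status == 429 and attempt < max_attempts:
            try:
                # Assumption: the header holds delay-seconds, e.g. "2".
                delay = float(headers.get("Retry-After", ""))
            except ValueError:
                # Header missing or not numeric: fall back to exponential backoff.
                delay = float(2 ** (attempt - 1))
            sleep(delay)
            continue
        raise RuntimeError(f"HTTP {status}: {body.get('detail', 'no detail')}")
    raise RuntimeError("no attempts were made")
```

Injecting sleep makes the retry logic easy to exercise without real delays. Note that a plain dict lookup is case-sensitive, so header names should be normalized if the HTTP client does not already do so.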