
API Reference

PhonoLex provides a public REST API for programmatic access to its full dataset. No API key required.

Base URL: https://phonolex.com/api

Interactive docs: Available at /docs (Swagger UI) and /redoc (ReDoc).

Deployment: Cloudflare Workers + D1 (edge-deployed).


Quick Examples

Python

import requests

BASE = "https://phonolex.com/api"

# Look up a word
word = requests.get(f"{BASE}/words/cat").json()
print(word["ipa"], word["frequency"], word["concreteness"])

# Search for frequent one-syllable words that start with /k/
results = requests.post(f"{BASE}/words/search", json={
    "patterns": [{"type": "STARTS_WITH", "phoneme": "k"}],
    "filters": {"min_frequency": 50, "max_syllable_count": 1},
    "sort_by": "frequency",
    "limit": 20
}).json()
for w in results["items"]:
    print(w["word"], w["frequency"])

curl

# Health check
curl https://phonolex.com/api/health

# Word lookup
curl https://phonolex.com/api/words/cat

# Search
curl -X POST https://phonolex.com/api/words/search \
 -H "Content-Type: application/json" \
 -d '{"filters": {"min_concreteness": 4.5, "max_syllable_count": 2}, "limit": 10}'

R

library(httr)
library(jsonlite)

base <- "https://phonolex.com/api"

# Word lookup
word <- fromJSON(content(GET(paste0(base, "/words/cat")), "text"))

# Batch lookup
batch <- fromJSON(content(POST(
 paste0(base, "/words/batch"),
 body = toJSON(list(words = c("cat", "dog", "fish")), auto_unbox = TRUE),
 content_type_json()
), "text"))

Endpoints

Meta

GET /api/health

Health check with vocabulary stats.

{
 "status": "healthy",
 "vocabulary_size": 44011,
 "total_edges": 1012327
}

GET /api/stats

Full statistics including edge type counts and property coverage.

GET /api/property-metadata

Property definitions with labels, categories, sources, and display configuration. Use this to dynamically build UIs or understand what each property means.

GET /api/property-ranges

Min/max values for all numeric properties. Useful for building filter sliders.

GET /api/edge-types

Edge type definitions with labels and descriptions for the 7 relationship types.


Words

GET /api/words/{word}

Get full word data with all properties and percentile ranks.

Example: GET /api/words/cat

{
 "word": "cat",
 "ipa": "kæt",
 "phonemes": ["k", "æ", "t"],
 "syllables": [{"onset": ["k"], "nucleus": "æ", "coda": ["t"], "stress": 1}],
 "phoneme_count": 3,
 "syllable_count": 1,
 "frequency": 57.39,
 "frequency_percentile": 95.8,
 "concreteness": 5.0,
 "concreteness_percentile": 91.2,
 "valence": 6.34,
 "aoa": 3.72,
 ...
}

All properties are returned (null if unavailable for that word), plus {property}_percentile fields (0–100, cumulative percentile rank). 35 properties are filterable via the API; additional structural/derived fields are also included.

GET /api/words

Browse the vocabulary with pagination and sorting.

Parameter | Type | Default | Description
sort_by | string | null | Property to sort by (e.g. frequency, aoa)
sort_order | string | desc | asc or desc
limit | int | 50 | Max items (1–5000)
offset | int | 0 | Items to skip

Response: { items: [...], total: 44011, offset: 0, limit: 50 }
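The limit, offset, and total fields above are enough to walk the whole vocabulary page by page. The following is a minimal sketch using only the Python standard library; the helper names (page_offsets, fetch_page, iter_words) are hypothetical and not part of the API.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://phonolex.com/api"

def page_offsets(total, limit):
    """Offsets needed to cover `total` items in pages of `limit` items."""
    return list(range(0, total, limit))

def fetch_page(offset, limit=50, sort_by=None, sort_order="desc"):
    """Fetch one page of GET /api/words and return the decoded JSON."""
    params = {"limit": limit, "offset": offset, "sort_order": sort_order}
    if sort_by is not None:
        params["sort_by"] = sort_by
    url = f"{BASE}/words?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def iter_words(limit=5000, sort_by=None):
    """Yield every word record, using `total` from the first page to plan the rest."""
    first = fetch_page(0, limit, sort_by)
    yield from first["items"]
    for offset in page_offsets(first["total"], limit)[1:]:
        yield from fetch_page(offset, limit, sort_by)["items"]
```

With the documented maximum of 5000 items per page, a vocabulary of 44011 words takes nine requests.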

POST /api/words/search

Unified search combining phoneme patterns, property filters, exclusion rules, sorting, and pagination. This is the primary search endpoint.

Request body:

{
 "patterns": [
 {"type": "STARTS_WITH", "phoneme": "k"},
 {"type": "ENDS_WITH", "phoneme": "t"}
 ],
 "filters": {
 "min_frequency": 10,
 "max_syllable_count": 2,
 "min_concreteness": 3.0
 },
 "exclude_phonemes": ["ʃ", "ʒ"],
 "sort_by": "frequency",
 "sort_order": "desc",
 "limit": 50,
 "offset": 0
}

Pattern types:

Type | Description | Example
STARTS_WITH | Word begins with phoneme(s) | "k" matches cat, keep, kind
ENDS_WITH | Word ends with phoneme(s) | "t" matches cat, sit, want
CONTAINS | Word contains phoneme(s) anywhere | "æ" matches cat, bat, happy
CONTAINS_MEDIAL | Contains phoneme(s) in medial position | "æ" matches happy (not cat)

Phonemes use IPA notation. Multiple phonemes in a sequence are space-separated: "s t" matches words containing the /st/ cluster.

Filter fields: min_{property} and max_{property} for any of the 35 filterable properties. Multiple filters use AND logic. See GET /api/property-metadata for the full list.

Response: Same paginated format as GET /api/words.
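A request body for this endpoint can be assembled from a few small pieces. The sketch below assumes nothing beyond the fields documented above; pattern, search_body, and post_json are hypothetical helper names, and the keyword-argument convention (property=(low, high)) is only one way to express min_/max_ filters.

```python
import json
import urllib.request

BASE = "https://phonolex.com/api"
PATTERN_TYPES = {"STARTS_WITH", "ENDS_WITH", "CONTAINS", "CONTAINS_MEDIAL"}

def pattern(kind, *phonemes):
    """Build one pattern; a phoneme sequence is joined with spaces."""
    if kind not in PATTERN_TYPES:
        raise ValueError(f"unknown pattern type: {kind}")
    return {"type": kind, "phoneme": " ".join(phonemes)}

def search_body(patterns=(), exclude_phonemes=(), sort_by=None,
                sort_order="desc", limit=50, offset=0, **ranges):
    """Turn property=(low, high) keyword pairs into min_/max_ filters."""
    filters = {}
    for prop, (low, high) in ranges.items():
        if low is not None:
            filters[f"min_{prop}"] = low
        if high is not None:
            filters[f"max_{prop}"] = high
    body = {"patterns": list(patterns), "filters": filters,
            "limit": limit, "offset": offset}
    if exclude_phonemes:
        body["exclude_phonemes"] = list(exclude_phonemes)
    if sort_by is not None:
        body["sort_by"] = sort_by
        body["sort_order"] = sort_order
    return body

def post_json(path, body):
    """POST a JSON body to the API and decode the JSON response."""
    req = urllib.request.Request(
        f"{BASE}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, post_json("/words/search", search_body([pattern("CONTAINS", "s", "t")], frequency=(10, None))) would request words containing the /st/ cluster with a frequency of at least 10.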

POST /api/words/batch

Look up multiple words at once. Unknown words are silently omitted.

{"words": ["cat", "dog", "fish", "xyzzy"]}

Returns an array of word objects (max 1000 words per request).
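Because unknown words are dropped silently and each request is capped at 1000 words, a client usually needs to split long lists and work out which words went missing. The sketch below shows both steps; chunked and missing_words are hypothetical helpers, and the comparison assumes the "word" field of each returned object matches the requested string exactly.

```python
def chunked(words, size=1000):
    """Split a word list into batches no larger than the per-request maximum."""
    for start in range(0, len(words), size):
        yield words[start:start + size]

def missing_words(requested, returned):
    """Requested words with no matching record in the batch response."""
    found = {record["word"] for record in returned}
    return [w for w in requested if w not in found]
```

Each batch from chunked(...) would be sent as {"words": batch}, and missing_words can then be applied to the combined results.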

POST /api/words/word-list

Resolve constraints to a flat word list. Lightweight alternative to search when the caller only needs words, not full records.

Request body:

{
 "include_phonemes": ["k"],
 "exclude_phonemes": ["ɹ"],
 "filters": {
 "min_concreteness": 3.0,
 "max_aoa": 7.0
 }
}

All fields are optional but at least one must be provided. include_phonemes uses OR logic (word contains ANY listed phoneme). exclude_phonemes uses AND logic (word contains NONE of the listed phonemes). Filters use the same min_/max_ convention as the search endpoint.

Response:

{
 "words": ["cat", "dog", "fish", ...],
 "total": 12847
}
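The include/exclude semantics can be mirrored locally, which is handy for checking expectations before calling the endpoint. This is a sketch of the logic as described above, not the server's implementation; matches is a hypothetical name.

```python
def matches(phonemes, include=(), exclude=()):
    """True if a word's phonemes satisfy include (ANY) and exclude (NONE)."""
    present = set(phonemes)
    if include and not present & set(include):
        return False  # none of the required phonemes occur
    return not present & set(exclude)  # reject if any excluded phoneme occurs
```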

Similarity

POST /api/similarity/search

Find phonologically similar words using soft Levenshtein distance on learned feature vectors.

Request body:

{
 "word": "cat",
 "threshold": 0.7,
 "limit": 20,
 "onset_weight": 0.33,
 "nucleus_weight": 0.33,
 "coda_weight": 0.33
}

Parameter | Type | Default | Description
word | string | required | Target word
threshold | float | 0.7 | Minimum similarity (0–1)
limit | int | 50 | Max results (1–500)
onset_weight | float | 0.33 | Weight for onset similarity
nucleus_weight | float | 0.33 | Weight for nucleus similarity
coda_weight | float | 0.33 | Weight for coda similarity

Weight presets:

Preset | Onset | Nucleus | Coda | Use case
Balanced | 0.33 | 0.33 | 0.33 | Overall similarity
Rhymes | 0.0 | 0.5 | 0.5 | Rhyming words
Alliteration | 1.0 | 0.5 | 0.0 | Same initial sound
Assonance | 0.0 | 1.0 | 0.0 | Matching vowels
Consonance | 0.5 | 0.0 | 0.5 | Matching consonants

Response:

[
 {
 "word": { "word": "bat", "ipa": "bæt", ... },
 "similarity": 0.92
 },
 ...
]
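The presets are just combinations of the three weight parameters, so a client can keep them in a lookup table and expand them into a request body. In the sketch below the lowercase preset keys and the similarity_body helper are hypothetical; the API itself only accepts the numeric weights.

```python
# (onset, nucleus, coda) weights copied from the preset table
PRESETS = {
    "balanced": (0.33, 0.33, 0.33),
    "rhymes": (0.0, 0.5, 0.5),
    "alliteration": (1.0, 0.5, 0.0),
    "assonance": (0.0, 1.0, 0.0),
    "consonance": (0.5, 0.0, 0.5),
}

def similarity_body(word, preset="balanced", threshold=0.7, limit=50):
    """Build a POST /api/similarity/search body from a named preset."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    onset, nucleus, coda = PRESETS[preset]
    return {
        "word": word,
        "threshold": threshold,
        "limit": limit,
        "onset_weight": onset,
        "nucleus_weight": nucleus,
        "coda_weight": coda,
    }
```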

Associations

GET /api/associations/{word}

Get cognitive associations from the graph. Returns edges from up to 6 relationship types.

Parameter | Type | Default | Description
edge_types | string | all | Comma-separated types: USF, MEN, ECCC, SPP, SimLex, WordSim
limit | int | 50 | Max edges
offset | int | 0 | Pagination offset

Example: GET /api/associations/cat?edge_types=USF,ECCC&limit=10

Response:

{
 "word": "cat",
 "associations": [
 {
 "target": "dog",
 "edge_sources": ["USF"],
 "in_vocabulary": true,
 "usf_forward": 0.178
 },
 ...
 ],
 "total": 12,
 "edge_type_counts": {"USF": 5, "ECCC": 2}
}

GET /api/associations/{word}/confusability

Get ECCC perceptual confusability edges only (words confused in noise).

GET /api/associations/compare

Compare shared associations between two words.

Parameter | Type | Description
word1 | string | First word
word2 | string | Second word
Returns shared targets, Jaccard similarity, and degree for each word.
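For orientation, Jaccard similarity is the size of the intersection of the two target sets divided by the size of their union. The sketch below computes it locally from two lists of association targets; the function name and the output keys are hypothetical, and the endpoint's actual response field names are not specified here.

```python
def compare_targets(targets1, targets2):
    """Shared targets, Jaccard similarity, and degree for two target lists."""
    a, b = set(targets1), set(targets2)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "degree1": len(a),
        "degree2": len(b),
    }
```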


Phonemes

GET /api/phonemes

List all 39 English phonemes with their articulatory features.

GET /api/phonemes/{ipa}

Get features for a single phoneme (38 distinctive features). ASCII g is automatically normalized to IPA ɡ (U+0261).

Example: GET /api/phonemes/k

{
 "ipa": "k",
 "type": "consonant",
 "features": {
 "consonantal": "+",
 "sonorant": "-",
 "continuant": "-",
 "dorsal": "+",
 ...
 }
}
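IPA symbols in the path should be percent-encoded before the request is sent, since Python's urllib does not accept raw non-ASCII characters in a URL. The sketch below also applies the g to ɡ normalization on the client side, which is optional because the server does it automatically; phoneme_url is a hypothetical helper.

```python
import urllib.parse

BASE = "https://phonolex.com/api"

def phoneme_url(ipa, base=BASE):
    """URL for GET /api/phonemes/{ipa}, with the IPA symbol percent-encoded."""
    normalized = ipa.replace("g", "\u0261")  # ASCII g -> IPA ɡ (U+0261)
    return f"{base}/phonemes/{urllib.parse.quote(normalized)}"
```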

POST /api/phonemes/compare

Compare two phonemes feature by feature.

{"phoneme1": "k", "phoneme2": "ɡ"}

Returns shared features, differing features, and a similarity score.
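A feature-by-feature comparison can be pictured as splitting two feature maps into the keys that agree and the keys that differ. The sketch below is illustrative only: the score (share of agreeing features) is an assumed formula, not necessarily the one the API uses, and the "voice" feature name and the five-feature subsets are toy values rather than the full 38-feature vectors.

```python
def compare_features(a, b):
    """Split two feature dicts into shared and differing features."""
    keys = sorted(set(a) & set(b))
    shared = {k: a[k] for k in keys if a[k] == b[k]}
    differing = {k: (a[k], b[k]) for k in keys if a[k] != b[k]}
    # Assumed scoring: proportion of compared features that agree
    score = len(shared) / len(keys) if keys else 0.0
    return shared, differing, score

# Toy feature subsets for /k/ and /ɡ/ (hypothetical, for illustration)
K = {"consonantal": "+", "sonorant": "-", "continuant": "-", "dorsal": "+", "voice": "-"}
G = {"consonantal": "+", "sonorant": "-", "continuant": "-", "dorsal": "+", "voice": "+"}

shared, differing, score = compare_features(K, G)
```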

POST /api/phonemes/search

Find phonemes matching specific feature values.

{"features": {"consonantal": "+", "dorsal": "+", "sonorant": "-"}}

Returns all phonemes matching the given feature constraints.


Contrastive Sets

POST /api/contrastive/minimal-pairs

Find minimal pairs for a phoneme contrast.

{
 "phoneme1": "k",
 "phoneme2": "ɡ",
 "position": "initial",
 "limit": 20
}

Parameter | Type | Default | Description
phoneme1 | string | required | First phoneme (IPA)
phoneme2 | string | required | Second phoneme (IPA)
position | string | null | initial, medial, final, or null for any
limit | int | 50 | Max pairs (1–500)

Response:

[
 {
 "word1": { "word": "cap", ... },
 "word2": { "word": "gap", ... },
 "position": 0,
 "phoneme1": "k",
 "phoneme2": "ɡ"
 },
 ...
]
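Two words form a minimal pair for a contrast when their phoneme sequences have the same length and differ in exactly one slot, with the two contrasting phonemes in that slot. The sketch below checks this locally; contrast_position is a hypothetical helper, and reading the response's numeric position as a zero-based phoneme index is an assumption.

```python
def contrast_position(phonemes1, phonemes2, a, b):
    """Index of the single differing slot if the words contrast a vs b, else None."""
    if len(phonemes1) != len(phonemes2):
        return None
    diffs = [i for i, (x, y) in enumerate(zip(phonemes1, phonemes2)) if x != y]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    if {phonemes1[i], phonemes2[i]} != {a, b}:
        return None
    return i
```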

POST /api/contrastive/maximal-opposition/pairs

Generate maximally opposed phoneme pairs from a list of unknown phonemes (Gierut 1989–1992). Returns pairs ranked by feature distance.

{
 "unknown_phonemes": ["k", "ɡ", "t", "d"],
 "top_n": 5
}

POST /api/contrastive/maximal-opposition/word-lists

Find word pairs for a specific maximal opposition phoneme pair.

{
 "phoneme1": "k",
 "phoneme2": "m",
 "position": "initial",
 "max_pairs": 10
}

POST /api/contrastive/multiple-opposition/targets

Select representative target phonemes for multiple opposition therapy (Maximal Classification + Maximal Distinction).

{
 "substitute_phoneme": "t",
 "target_phonemes": ["k", "ɡ", "d", "s"],
 "count": 3
}

POST /api/contrastive/multiple-opposition/sets

Generate minimal sets (triplets/quadruplets) for multiple opposition therapy.

{
 "substitute_phoneme": "t",
 "target_phonemes": ["k", "d"],
 "position": "initial",
 "max_sets": 10
}

Text Analysis

POST /api/text/analyze

Analyze a passage for phonological and psycholinguistic properties.

{"text": "The quick brown fox jumps over the lazy dog."}

Response:

{
 "total_words": 9,
 "analyzed_words": 9,
 "unknown_words": [],
 "coverage_percent": 100.0,
 "aggregate_percentiles": {
 "frequency_percentile": 89.2,
 "concreteness_percentile": 54.1,
 "aoa_percentile": 31.7,
 ...
 },
 "word_details": [
 {
 "word": "quick",
 "percentiles": {
 "frequency_percentile": 82.1,
 "concreteness_percentile": 32.5,
 ...
 }
 },
 ...
 ]
}

aggregate_percentiles are weighted averages across all analyzed words. word_details gives per-word percentiles for highlighting and drill-down.
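One use of word_details is to flag the words in a passage that fall below some percentile, for example low-frequency words. The sketch below works on the response shape shown above; words_below is a hypothetical helper, and the second entry in the sample is made up to show that null values are skipped.

```python
def words_below(analysis, prop, threshold):
    """Words whose percentile for `prop` is present and below `threshold`."""
    flagged = []
    for detail in analysis["word_details"]:
        value = detail["percentiles"].get(prop)
        if value is not None and value < threshold:
            flagged.append(detail["word"])
    return flagged

sample = {
    "word_details": [
        {"word": "quick",
         "percentiles": {"frequency_percentile": 82.1, "concreteness_percentile": 32.5}},
        # Hypothetical entry with a missing value
        {"word": "zzz", "percentiles": {"frequency_percentile": None}},
    ]
}
```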


Sentences

POST /api/sentences

Retrieve naturalistic English sentences satisfying a constraint set. Sentences are drawn from the curated ~236K-sentence corpus (CoLA, UD English-EWT, GUM, Tatoeba, OpenSubtitles), gated at corpus-build time for SLP suitability.

Request body:

{
 "constraints": [
 {"type": "pattern", "pattern_type": "STARTS_WITH", "phonemes": ["k"], "mode": "include"},
 {"type": "pattern", "pattern_type": "CONTAINS", "phonemes": ["ɹ"], "mode": "exclude"},
 {"type": "bound", "norm": "freq_age_2y_percentile", "min_value": 40},
 {"type": "contrastive_minpair", "phoneme1": "b", "phoneme2": "d", "position": "initial"}
 ],
 "top_k": 50
}

Constraint types:

Type | Semantics | Parameters
pattern | Phoneme position match against words in the sentence (STARTS_WITH / ENDS_WITH / CONTAINS / CONTAINS_MEDIAL) | pattern_type, phonemes, mode (include / exclude)
bound | Psycholinguistic-norm threshold per content word. Raw norms use NULL-pass (a word with no value passes); *_percentile properties use NULL-fail (a word with no value fails). | norm, min_value / max_value
contrastive_minpair | Sentence must contain both members of a minimal pair witnessed in the pairs table | phoneme1, phoneme2, optional position
contrastive_maxopp | Minpair + sonorant-class crossing | phoneme1, phoneme2, optional position, optional min_sonorant_diff
contrastive_multopp | Sentence must contain a substitute word + contrast partners covering ≥n_targets distinct target phonemes. Accepted for API completeness but not surfaced in the Sentences UI, because it is rarely witnessable in a single sentence for n_targets ≥ 2. | substitute, targets, optional n_targets, optional position
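The constraint objects are small enough to build with one function per type. The sketch below covers the three types used in the example request; the builder names are hypothetical, and optional parameters are simply left out of the dict when not given.

```python
def pattern_rule(pattern_type, phonemes, mode="include"):
    """A `pattern` constraint matching phonemes at a word position."""
    return {"type": "pattern", "pattern_type": pattern_type,
            "phonemes": list(phonemes), "mode": mode}

def bound_rule(norm, min_value=None, max_value=None):
    """A `bound` constraint on a psycholinguistic norm."""
    rule = {"type": "bound", "norm": norm}
    if min_value is not None:
        rule["min_value"] = min_value
    if max_value is not None:
        rule["max_value"] = max_value
    return rule

def minpair_rule(phoneme1, phoneme2, position=None):
    """A `contrastive_minpair` constraint for one phoneme contrast."""
    rule = {"type": "contrastive_minpair",
            "phoneme1": phoneme1, "phoneme2": phoneme2}
    if position is not None:
        rule["position"] = position
    return rule

def sentences_body(constraints, top_k=50):
    """Request body for POST /api/sentences."""
    return {"constraints": list(constraints), "top_k": top_k}

body = sentences_body([
    pattern_rule("STARTS_WITH", ["k"]),
    pattern_rule("CONTAINS", ["ɹ"], mode="exclude"),
    bound_rule("freq_age_2y_percentile", min_value=40),
    minpair_rule("b", "d", position="initial"),
])
```

The resulting body reproduces the example request shown earlier in this section.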

Ranking: tiered globally by match_count (per-query count of distinct words satisfying include / contrastive rules), then source-interleaved within tier by static rarity_score. Multi-hit sentences come back ahead of single-hit sentences regardless of source.

Response:

{
 "corpus_matches": [
 {
 "text": "She drained the bath and the brain trust took notes.",
 "sources": ["opensubtitles"],
 "rarity_score": 0.0287,
 "match_count": 2,
 "n_content_in_vocab": 7,
 "highlights": {
 "include_surfaces": [],
 "pair_surfaces": ["brain", "drain"]
 }
 }
 ],
 "total": 1,
 "elapsed_ms": {"corpus": 184, "total": 184}
}

highlights.pair_surfaces carries both members of every witnessed minpair/maxopp/multopp pair, so a UI overlay can underline the contrast at a glance. highlights.include_surfaces carries the surfaces matching any surface-include rule (patterns, CV shape). Both arrays are present (empty when no rules are active).


Properties

Properties available on word objects, grouped by category. Around 150 columns total — most are filterable via /api/words/search and /api/sentences bound rules. Each numeric property also has a {property}_percentile field (0–100); frequency-class percentiles treat value=0 as NULL.

Category | Properties
Phonological Complexity | syllable_count, phoneme_count, wcm_score, cv_shape
Phonotactic Probability | phono_prob_avg, positional_prob_avg, str_phono_prob_avg, str_positional_prob_avg, neighborhood_density, str_neighborhood_density
Lexical Frequency | frequency, log_frequency, contextual_diversity (PhonoLex FineWeb-Edu derivation)
Developmental Frequency | freq_age_2y, freq_age_5y, freq_age_8y, freq_age_12y (child PRODUCTION from CHILDES + PhonBank), freq_age_all (alias for frequency)
Child-Corpus Frequency | freq_cyplex_7_9, freq_cyplex_10_12, freq_cyplex_13 (CYP-LEX)
Lexical Timing | aoa (PhonoLex in-house gpt-4.1-mini cloze; 1-7 age-banded, Spearman 0.868 vs Glasgow)
Semantic | imageability, familiarity, concreteness, boi, iconicity, socialness, semantic_diversity, semd_topic, semd_vn, semd_h13, n_topics_for_word (PhonoLex in-house gpt-4.1-mini)
Affective | valence, arousal (PhonoLex in-house, Warriner-scale)
Morphological | morpheme_count, is_monomorphemic, n_prefixes, n_suffixes (algorithmic + MorphyNet)
POS | pos_dominant_freq

Retired columns: dominance (Warriner D axis was never re-derived), prevalence, aoa_kuperman, elp_lexical_decision_rt, Lancaster sensorimotor channels (auditory, visual, haptic, gustatory, olfactory, interoceptive, hand_arm, foot_leg, head, mouth, torso), size, freq_age_adult (renamed to freq_age_all).

Error Handling

Status | Meaning
200 | Success
404 | Word or phoneme not found
422 | Validation error (bad request body)
429 | Rate limit exceeded (check Retry-After header)
500 | Server error

Error responses include a detail field with a human-readable message.
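A client can honor the 429 status by waiting for the number of seconds given in Retry-After before trying again. The sketch below uses only the standard library; retry_delay and get_json are hypothetical helpers, the exponential fallback and the 60-second cap are arbitrary choices, and a Retry-After value that is not a plain number of seconds falls back to the exponential delay.

```python
import json
import time
import urllib.error
import urllib.request

def retry_delay(retry_after, attempt, base=1.0, cap=60.0):
    """Seconds to wait: Retry-After if numeric, else exponential backoff."""
    try:
        delay = float(retry_after)
    except (TypeError, ValueError):
        delay = base * 2 ** attempt
    return max(0.0, min(delay, cap))

def get_json(url, max_attempts=4):
    """GET a URL, retrying on 429 and re-raising any other HTTP error."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(retry_delay(err.headers.get("Retry-After"), attempt))
```

For non-429 errors, the caught HTTPError can be read like a response, so json.load(err) should yield the body with its detail field.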