API Reference

PhonoLex provides a public REST API for programmatic access to its full dataset. No API key required.

Base URL: https://phonolex.com/api

Interactive docs: Available at /docs (Swagger UI) and /redoc (ReDoc).

Deployment: Cloudflare Workers + D1 (edge-deployed).


Quick Examples

Python

import requests

BASE = "https://phonolex.com/api"

# Look up a word
word = requests.get(f"{BASE}/words/cat").json()
print(word["ipa"], word["frequency"], word["concreteness"])

# Search for high-frequency one-syllable words that start with /k/
results = requests.post(f"{BASE}/words/search", json={
    "patterns": [{"type": "STARTS_WITH", "phoneme": "k"}],
    "filters": {"min_frequency": 50, "max_syllable_count": 1},
    "sort_by": "frequency",
    "limit": 20
}).json()
for w in results["items"]:
    print(w["word"], w["frequency"])

curl

# Health check
curl https://phonolex.com/api/health

# Word lookup
curl https://phonolex.com/api/words/cat

# Search
curl -X POST https://phonolex.com/api/words/search \
  -H "Content-Type: application/json" \
  -d '{"filters": {"min_concreteness": 4.5, "max_syllable_count": 2}, "limit": 10}'

R

library(httr)
library(jsonlite)

base <- "https://phonolex.com/api"

# Word lookup
word <- fromJSON(content(GET(paste0(base, "/words/cat")), "text"))

# Batch lookup
batch <- fromJSON(content(POST(
  paste0(base, "/words/batch"),
  body = toJSON(list(words = c("cat", "dog", "fish")), auto_unbox = TRUE),
  content_type_json()
), "text"))

Endpoints

Meta

GET /api/health

Health check with vocabulary stats.

{
  "status": "healthy",
  "vocabulary_size": 44011,
  "total_edges": 1012327
}

GET /api/stats

Full statistics including edge type counts and property coverage.

GET /api/property-metadata

Property definitions with labels, categories, sources, and display configuration. Use this to dynamically build UIs or understand what each property means.

GET /api/property-ranges

Min/max values for all numeric properties. Useful for building filter sliders.

GET /api/edge-types

Edge type definitions with labels and descriptions for the 7 relationship types.


Words

GET /api/words/{word}

Get full word data with all properties and percentile ranks.

Example: GET /api/words/cat

{
  "word": "cat",
  "ipa": "kæt",
  "phonemes": ["k", "æ", "t"],
  "syllables": [{"onset": ["k"], "nucleus": "æ", "coda": ["t"], "stress": 1}],
  "phoneme_count": 3,
  "syllable_count": 1,
  "frequency": 57.39,
  "frequency_percentile": 95.8,
  "concreteness": 5.0,
  "concreteness_percentile": 91.2,
  "valence": 6.34,
  "aoa": 3.72,
  ...
}

All properties are returned (null if unavailable for that word), plus {property}_percentile fields (0–100, cumulative percentile rank). 35 properties are filterable via the API; additional structural/derived fields are also included.
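
As a rough illustration of the {property}_percentile naming convention, the sketch below separates raw values from their percentile companions. The helper name split_percentiles is hypothetical and not part of the API; the sample values are copied from the example response above.

```python
def split_percentiles(word_obj):
    """Separate raw property values from their *_percentile companions."""
    suffix = "_percentile"
    raw, pct = {}, {}
    for key, value in word_obj.items():
        if key.endswith(suffix):
            # Store the percentile under the bare property name.
            pct[key[: -len(suffix)]] = value
        else:
            raw[key] = value
    return raw, pct


# Subset of the documented /api/words/cat response.
sample = {
    "word": "cat",
    "frequency": 57.39,
    "frequency_percentile": 95.8,
    "concreteness": 5.0,
    "concreteness_percentile": 91.2,
    "valence": 6.34,
}
raw, pct = split_percentiles(sample)
print(pct)
```

Properties that are null for a word stay in the raw dictionary as None, so callers should check for None before doing arithmetic.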

GET /api/words

Browse the vocabulary with pagination and sorting.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sort_by | string | null | Property to sort by (e.g. frequency, aoa) |
| sort_order | string | desc | asc or desc |
| limit | int | 50 | Max items (1–5000) |
| offset | int | 0 | Items to skip |

Response: {"items": [...], "total": 44011, "offset": 0, "limit": 50}
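
One way to walk the whole vocabulary is to advance offset until total is reached. The sketch below assumes only the documented response fields (items, total); iter_pages and fetch_page are hypothetical names, and the stub stands in for a real HTTP call so the loop can be run offline.

```python
def iter_pages(fetch_page, limit=50):
    """Yield every item by advancing the offset until `total` is reached.

    `fetch_page(limit, offset)` is a hypothetical callable that returns a
    dict shaped like the documented response: items, total, offset, limit.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page["items"]
        yield from items
        offset += len(items)
        # Stop on an empty page as well, to avoid looping forever.
        if not items or offset >= page["total"]:
            break


def fake_fetch(limit, offset):
    """Offline stand-in for GET /api/words (illustrative data only)."""
    data = list(range(7))
    return {
        "items": data[offset:offset + limit],
        "total": len(data),
        "offset": offset,
        "limit": limit,
    }


print(list(iter_pages(fake_fetch, limit=3)))
```

With requests, fetch_page could wrap requests.get(f"{BASE}/words", params={"limit": limit, "offset": offset}).json().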

POST /api/words/search

Unified search combining phoneme patterns, property filters, exclusion rules, sorting, and pagination. This is the primary search endpoint.

Request body:

{
  "patterns": [
    {"type": "STARTS_WITH", "phoneme": "k"},
    {"type": "ENDS_WITH", "phoneme": "t"}
  ],
  "filters": {
    "min_frequency": 10,
    "max_syllable_count": 2,
    "min_concreteness": 3.0
  },
  "exclude_phonemes": ["ʃ", "ʒ"],
  "sort_by": "frequency",
  "sort_order": "desc",
  "limit": 50,
  "offset": 0
}

Pattern types:

| Type | Description | Example |
| --- | --- | --- |
| STARTS_WITH | Word begins with phoneme(s) | "k" matches cat, keep, kind |
| ENDS_WITH | Word ends with phoneme(s) | "t" matches cat, sit, want |
| CONTAINS | Word contains phoneme(s) anywhere | "æ" matches cat, bat, happy |
| CONTAINS_MEDIAL | Contains phoneme(s) in medial position | "æ" matches happy (not cat) |

Phonemes use IPA notation. Multiple phonemes in a sequence are space-separated: "s t" matches words containing the /st/ cluster.

Filter fields: min_{property} and max_{property} for any of the 35 filterable properties. Multiple filters use AND logic. See GET /api/property-metadata for the full list.

Response: Same paginated format as GET /api/words.
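
The min_{property} / max_{property} convention lends itself to a small request builder. This is a minimal sketch: build_search_body and its ranges argument are hypothetical conveniences, not part of the API, and the example reproduces the request body shown above.

```python
def build_search_body(patterns=(), ranges=None, exclude=(), **options):
    """Assemble a /api/words/search request body.

    `ranges` maps a property name to a (min, max) tuple; use None on either
    side to leave that bound open. Extra keyword arguments (sort_by, limit,
    offset, ...) are copied into the body unchanged.
    """
    filters = {}
    for prop, (low, high) in (ranges or {}).items():
        if low is not None:
            filters[f"min_{prop}"] = low
        if high is not None:
            filters[f"max_{prop}"] = high
    body = {"filters": filters, **options}
    if patterns:
        body["patterns"] = [{"type": t, "phoneme": p} for t, p in patterns]
    if exclude:
        body["exclude_phonemes"] = list(exclude)
    return body


body = build_search_body(
    patterns=[("STARTS_WITH", "k"), ("ENDS_WITH", "t")],
    ranges={
        "frequency": (10, None),
        "syllable_count": (None, 2),
        "concreteness": (3.0, None),
    },
    exclude=["ʃ", "ʒ"],
    sort_by="frequency",
    sort_order="desc",
    limit=50,
    offset=0,
)
print(body["filters"])
```

The resulting dictionary can be passed as the json argument of requests.post, as in the Quick Examples.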

POST /api/words/batch

Look up multiple words at once. Unknown words are silently omitted.

{"words": ["cat", "dog", "fish", "xyzzy"]}

Returns an array of word objects (max 1000 words per request).
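
Because of the 1000-word cap and the silent omission of unknown words, a client may want to split long lists and check what came back. The sketch below uses hypothetical helper names and assumes each returned object carries the word field in the same casing as the request.

```python
def chunked(words, size=1000):
    """Split a word list into slices no longer than the documented cap."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def missing_words(requested, returned):
    """List requested words that are absent from the batch response."""
    found = {obj["word"] for obj in returned}
    return [w for w in requested if w not in found]


# Illustrative response: three word objects trimmed to a single field.
requested = ["cat", "dog", "fish", "xyzzy"]
returned = [{"word": "cat"}, {"word": "dog"}, {"word": "fish"}]
print(missing_words(requested, returned))
print([len(c) for c in chunked(list(range(2500)))])
```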


Similarity

POST /api/similarity/search

Find phonologically similar words using soft Levenshtein distance on learned feature vectors.

Request body:

{
  "word": "cat",
  "threshold": 0.7,
  "limit": 20,
  "onset_weight": 0.33,
  "nucleus_weight": 0.33,
  "coda_weight": 0.33
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| word | string | required | Target word |
| threshold | float | 0.7 | Minimum similarity (0–1) |
| limit | int | 50 | Max results (1–500) |
| onset_weight | float | 0.33 | Weight for onset similarity |
| nucleus_weight | float | 0.33 | Weight for nucleus similarity |
| coda_weight | float | 0.33 | Weight for coda similarity |

Weight presets:

| Preset | Onset | Nucleus | Coda | Use case |
| --- | --- | --- | --- | --- |
| Balanced | 0.33 | 0.33 | 0.33 | Overall similarity |
| Rhymes | 0.0 | 0.5 | 0.5 | Rhyming words |
| Alliteration | 1.0 | 0.5 | 0.0 | Same initial sound |
| Assonance | 0.0 | 1.0 | 0.0 | Matching vowels |
| Consonance | 0.5 | 0.0 | 0.5 | Matching consonants |
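
The presets are plain weight triples, so a client can keep them in a lookup table. A minimal sketch follows; the lowercase preset keys and the similarity_body helper are assumptions made for illustration, while the weight values are copied from the table above.

```python
# (onset, nucleus, coda) weights, as listed in the presets table.
WEIGHT_PRESETS = {
    "balanced": (0.33, 0.33, 0.33),
    "rhymes": (0.0, 0.5, 0.5),
    "alliteration": (1.0, 0.5, 0.0),
    "assonance": (0.0, 1.0, 0.0),
    "consonance": (0.5, 0.0, 0.5),
}


def similarity_body(word, preset="balanced", threshold=0.7, limit=50):
    """Build a /api/similarity/search request body from a named preset."""
    onset, nucleus, coda = WEIGHT_PRESETS[preset]
    return {
        "word": word,
        "threshold": threshold,
        "limit": limit,
        "onset_weight": onset,
        "nucleus_weight": nucleus,
        "coda_weight": coda,
    }


print(similarity_body("cat", preset="rhymes", limit=20))
```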

Response:

[
  {
    "word": { "word": "bat", "ipa": "bæt", ... },
    "similarity": 0.92
  },
  ...
]

Associations

GET /api/associations/{word}

Get cognitive associations from the graph. Returns edges from up to 6 relationship types.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| edge_types | string | all | Comma-separated types: USF, MEN, ECCC, SPP, SimLex, WordSim |
| limit | int | 50 | Max edges |
| offset | int | 0 | Pagination offset |

Example: GET /api/associations/cat?edge_types=USF,ECCC&limit=10

Response:

{
  "word": "cat",
  "associations": [
    {
      "target": "dog",
      "edge_sources": ["USF"],
      "in_vocabulary": true,
      "usf_forward": 0.178
    },
    ...
  ],
  "total": 12,
  "edge_type_counts": {"USF": 5, "ECCC": 2}
}
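
Since edge_types is a single comma-separated string, building the query by hand is easy to get wrong. The sketch below uses only the standard library; associations_url is a hypothetical helper, and the URL it produces matches the example request above.

```python
from urllib.parse import quote, urlencode

BASE = "https://phonolex.com/api"


def associations_url(word, edge_types=None, limit=50, offset=0):
    """Build a GET /api/associations/{word} URL with optional filters."""
    params = {}
    if edge_types:
        # The API expects one comma-separated value, not repeated keys.
        params["edge_types"] = ",".join(edge_types)
    params["limit"] = limit
    if offset:
        params["offset"] = offset
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return f"{BASE}/associations/{quote(word)}?{urlencode(params, safe=',')}"


url = associations_url("cat", edge_types=["USF", "ECCC"], limit=10)
print(url)
```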

GET /api/associations/{word}/confusability

Get ECCC perceptual confusability edges only (words confused in noise).

GET /api/associations/compare

Compare shared associations between two words.

| Parameter | Type | Description |
| --- | --- | --- |
| word1 | string | First word |
| word2 | string | Second word |

Returns shared targets, Jaccard similarity, and degree for each word.
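
For readers unfamiliar with the metric, Jaccard similarity is the size of the intersection divided by the size of the union. The sketch below applies the standard definition to two made-up target sets; whether the endpoint computes it over exactly these sets is an assumption.

```python
def jaccard(targets_a, targets_b):
    """Standard Jaccard index: |A ∩ B| / |A ∪ B| (0.0 when both are empty)."""
    a, b = set(targets_a), set(targets_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up association targets, for illustration only.
targets_word1 = ["dog", "mouse", "kitten"]
targets_word2 = ["cat", "bone", "mouse"]
shared = sorted(set(targets_word1) & set(targets_word2))
print(shared, jaccard(targets_word1, targets_word2))
```

Here one target is shared out of five distinct targets, giving 1/5 = 0.2.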


Phonemes

GET /api/phonemes

List all 39 English phonemes with their articulatory features.

GET /api/phonemes/{ipa}

Get features for a single phoneme (38 distinctive features). ASCII g is automatically normalized to IPA ɡ (U+0261).
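
The normalization is applied by the API, so clients are not required to replicate it. The equivalent client-side step, sketched below with a hypothetical helper name, can still be useful when comparing user input against IPA strings or when building percent-encoded URLs.

```python
from urllib.parse import quote


def normalize_ipa(symbol):
    """Replace ASCII g (U+0067) with the IPA letter ɡ (U+0261)."""
    return symbol.replace("g", "\u0261")


ipa = normalize_ipa("g")
print(hex(ord(ipa)))
# Non-ASCII phonemes need percent-encoding when placed in a URL path.
print(quote(ipa))
```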

Example: GET /api/phonemes/k

{
  "ipa": "k",
  "type": "consonant",
  "features": {
    "consonantal": "+",
    "sonorant": "-",
    "continuant": "-",
    "dorsal": "+",
    ...
  }
}

POST /api/phonemes/compare

Compare two phonemes feature by feature.

{"phoneme1": "k", "phoneme2": "ɡ"}

Returns shared features, differing features, and a similarity score.
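
To make the idea of a feature-by-feature comparison concrete, the sketch below compares two feature dictionaries locally. The similarity formula (fraction of shared features) is an assumption and may differ from the score the API returns; the feature name voice and the abbreviated feature sets are illustrative only.

```python
def compare_features(features1, features2):
    """Return (shared, differing, score) for two feature dictionaries."""
    keys = features1.keys() & features2.keys()
    shared = {k: features1[k] for k in keys if features1[k] == features2[k]}
    differing = {
        k: (features1[k], features2[k])
        for k in keys
        if features1[k] != features2[k]
    }
    # Assumed scoring: proportion of compared features that agree.
    score = len(shared) / len(keys) if keys else 0.0
    return shared, differing, score


# Abbreviated, illustrative feature sets (the API reports 38 features).
k = {"consonantal": "+", "sonorant": "-", "continuant": "-", "dorsal": "+", "voice": "-"}
g = {"consonantal": "+", "sonorant": "-", "continuant": "-", "dorsal": "+", "voice": "+"}
shared, differing, score = compare_features(k, g)
print(differing, score)
```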

POST /api/phonemes/search

Find phonemes matching specific feature values.

{"features": {"consonantal": "+", "dorsal": "+", "sonorant": "-"}}

Returns all phonemes matching the given feature constraints.
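
Conceptually, a feature search keeps every phoneme whose features agree with all of the given constraints. The sketch below runs that filter over a tiny hand-written inventory; the inventory and the helper name are illustrative, not data from the API.

```python
def matching_phonemes(inventory, constraints):
    """Return phonemes whose features satisfy every constraint."""
    return [
        ipa
        for ipa, feats in inventory.items()
        if all(feats.get(name) == value for name, value in constraints.items())
    ]


# Tiny illustrative inventory with three features per phoneme.
inventory = {
    "k": {"consonantal": "+", "dorsal": "+", "sonorant": "-"},
    "ɡ": {"consonantal": "+", "dorsal": "+", "sonorant": "-"},
    "ŋ": {"consonantal": "+", "dorsal": "+", "sonorant": "+"},
    "t": {"consonantal": "+", "dorsal": "-", "sonorant": "-"},
}
constraints = {"consonantal": "+", "dorsal": "+", "sonorant": "-"}
print(matching_phonemes(inventory, constraints))
```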


Contrastive Sets

POST /api/contrastive/minimal-pairs

Find minimal pairs for a phoneme contrast.

{
  "phoneme1": "k",
  "phoneme2": "ɡ",
  "position": "initial",
  "limit": 20
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| phoneme1 | string | required | First phoneme (IPA) |
| phoneme2 | string | required | Second phoneme (IPA) |
| position | string | null | initial, medial, final, or null for any |
| limit | int | 50 | Max pairs (1–500) |

Response:

[
  {
    "word1": { "word": "cap", ... },
    "word2": { "word": "gap", ... },
    "position": 0,
    "phoneme1": "k",
    "phoneme2": "ɡ"
  },
  ...
]
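
A minimal pair is two words whose phoneme sequences differ in exactly one position. The sketch below checks that condition locally and returns the differing index; it assumes the numeric position in the response is a zero-based phoneme index, which the example (0 for an initial contrast) suggests but does not confirm.

```python
def minimal_pair_position(phonemes_a, phonemes_b):
    """Return the index of the single differing phoneme, or None."""
    if len(phonemes_a) != len(phonemes_b):
        return None
    diffs = [
        i for i, (a, b) in enumerate(zip(phonemes_a, phonemes_b)) if a != b
    ]
    return diffs[0] if len(diffs) == 1 else None


cap = ["k", "æ", "p"]
gap = ["ɡ", "æ", "p"]
print(minimal_pair_position(cap, gap))
```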

POST /api/contrastive/maximal-opposition/pairs

Generate maximally opposed phoneme pairs from a list of unknown phonemes (Gierut 1989–1992). Returns pairs ranked by feature distance.

{
  "unknown_phonemes": ["k", "ɡ", "t", "d"],
  "top_n": 5
}

POST /api/contrastive/maximal-opposition/word-lists

Find word pairs for a specific maximal opposition phoneme pair.

{
  "phoneme1": "k",
  "phoneme2": "m",
  "position": "initial",
  "max_pairs": 10
}

POST /api/contrastive/multiple-opposition/targets

Select representative target phonemes for multiple opposition therapy (Maximal Classification + Maximal Distinction).

{
  "substitute_phoneme": "t",
  "target_phonemes": ["k", "ɡ", "d", "s"],
  "count": 3
}

POST /api/contrastive/multiple-opposition/sets

Generate minimal sets (triplets/quadruplets) for multiple opposition therapy.

{
  "substitute_phoneme": "t",
  "target_phonemes": ["k", "d"],
  "position": "initial",
  "max_sets": 10
}

Text Analysis

POST /api/text/analyze

Analyze a passage for phonological and psycholinguistic properties.

{"text": "The quick brown fox jumps over the lazy dog."}

Response:

{
  "total_words": 9,
  "analyzed_words": 9,
  "unknown_words": [],
  "coverage_percent": 100.0,
  "aggregate_percentiles": {
    "frequency_percentile": 89.2,
    "concreteness_percentile": 54.1,
    "aoa_percentile": 31.7,
    ...
  },
  "word_details": [
    {
      "word": "quick",
      "percentiles": {
        "frequency_percentile": 82.1,
        "concreteness_percentile": 32.5,
        ...
      }
    },
    ...
  ]
}

aggregate_percentiles are weighted averages across all analyzed words. word_details gives per-word percentiles for highlighting and drill-down.
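
For drill-down views it can help to recompute simple summaries from word_details on the client. The sketch below takes an unweighted mean and derives coverage from word counts; the weighting scheme used for aggregate_percentiles is not specified here, so these numbers should not be expected to match the API's aggregates. Helper names and sample values are illustrative.

```python
def mean_percentile(word_details, key):
    """Unweighted mean of one percentile across words, skipping nulls."""
    values = [
        d["percentiles"].get(key)
        for d in word_details
        if d["percentiles"].get(key) is not None
    ]
    return sum(values) / len(values) if values else None


def coverage_percent(total_words, unknown_words):
    """Share of words that were found in the vocabulary, in percent."""
    if total_words == 0:
        return 0.0
    return (total_words - len(unknown_words)) / total_words * 100


# Made-up per-word details in the documented shape.
details = [
    {"word": "quick", "percentiles": {"frequency_percentile": 80.0}},
    {"word": "fox", "percentiles": {"frequency_percentile": 70.0}},
    {"word": "lazy", "percentiles": {"frequency_percentile": None}},
]
print(mean_percentile(details, "frequency_percentile"))
print(coverage_percent(9, []))
```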


Properties

Properties available on word objects, grouped by category (35 are filterable via the API; additional derived fields are included in responses):

| Category | Properties |
| --- | --- |
| Phonological Complexity | syllable_count, phoneme_count, wcm_score |
| Phonotactic Probability | phono_prob_avg, positional_prob_avg |
| Lexical | frequency, log_frequency, contextual_diversity, prevalence, aoa, aoa_kuperman, elp_lexical_decision_rt |
| Semantic | imageability, familiarity, concreteness, size |
| Affective | valence, arousal, dominance |
| Cognitive / Embodied | iconicity, boi, socialness |
| Sensorimotor — Perceptual | auditory, visual, haptic, gustatory, olfactory, interoceptive |
| Sensorimotor — Action | hand_arm, foot_leg, head, mouth, torso |
| Morphological | morpheme_count, is_monomorphemic, n_prefixes, n_suffixes |

Each numeric property also has a {property}_percentile field (0–100) representing the cumulative percentile rank within the vocabulary.

Error Handling

| Status | Meaning |
| --- | --- |
| 200 | Success |
| 404 | Word or phoneme not found |
| 422 | Validation error (bad request body) |
| 429 | Rate limit exceeded (check Retry-After header) |
| 500 | Server error |

Error responses include a detail field with a human-readable message.
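
A client can turn the status table into a small retry policy. The sketch below is one possible approach, not a prescribed one: the helper names and the one-second default are assumptions, and only the delta-seconds form of Retry-After is parsed (the HTTP-date form falls back to the default).

```python
def retry_delay(status_code, headers, default=1.0):
    """Return seconds to wait before retrying, or None if retrying won't help.

    `headers` is treated as a plain dict here; the headers object from the
    requests library is case-insensitive and works the same way with .get().
    """
    if status_code == 429:
        value = headers.get("Retry-After", "")
        return float(value) if value.isdigit() else default
    if status_code >= 500:
        return default
    # 2xx needs no retry; 404 and 422 will fail again if repeated unchanged.
    return None


def error_message(payload):
    """Extract the human-readable message from an error response body."""
    return payload.get("detail", "unknown error")


print(retry_delay(429, {"Retry-After": "30"}))
# Illustrative error body; only the detail field is documented.
print(error_message({"detail": "example message"}))
```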