Word Lists SLP Curation Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Curate the platform Word Lists UI from 30 properties down to 14 under SLP-language labels, fold Sound Similarity in as a composable rule, add cv_shape (CV-skeleton categorical filter) and freq_age_adult (adult-band aggregated headline) — while keeping the API researcher-grade.
Architecture:
- Data layer adds two derivations: cv_shape (string, derived from existing Syllable objects) and freq_age_adult (numeric headline, mean of wpm_b4 + wpm_b5).
- PropertyDef gains platform_visible: boolean and kind: 'numeric' | 'categorical'. /api/property-metadata gains ?surface=platform; 14 properties get tagged.
- /api/words/search gains an optional similar_to block (server-side intersection with similarity ranking) and a cv_shape filter list.
- Frontend restructures Builder.tsx into 5 accordions, lifts preset chips + labeled-threshold pattern from PhonologicalSimilarityTool.tsx into a new reusable SimilarToRule, adds reusable CategoricalRule for cv_shape, then deletes the orphan PhonologicalSimilarityTool.
Tech Stack: Python 3.12 (pipeline, pytest), Polars (parquet), TypeScript (Hono on Cloudflare Workers, vitest, React + MUI), D1 (SQLite).
Spec: docs/superpowers/specs/2026-05-14-word-lists-slp-curation-design.md (commit e38a62c0).
Branch: feature/phon-116-naturalness-scorer (general catch-all per user direction; pile commits here).
File map
Created:
- packages/web/frontend/src/components/shared/SimilarToRule.tsx
- packages/web/frontend/src/components/shared/CategoricalRule.tsx
- packages/data/tests/test_cv_shape.py
- packages/data/tests/test_freq_age_adult.py
- packages/web/workers/test/routes/meta.surface-platform.test.ts
- packages/web/workers/test/routes/words.search-cv-shape.test.ts
- packages/web/workers/test/routes/words.search-similar.test.ts
Modified:
- packages/data/src/phonolex_data/pipeline/words.py — add cv_shape derivation + freq_age_adult aggregation
- packages/data/src/phonolex_data/pipeline/schema.py — add cv_shape: str | None + freq_age_adult: float | None to WordRecord
- packages/data/src/phonolex_data/runtime/schema.py — add cv_shape to _CORE_WORDS_COLUMNS
- packages/data/src/phonolex_data/runtime/emit_d1_sql.py — ensure cv_shape ships on words table; freq_age_adult percentile flows like siblings
- packages/data/src/phonolex_data/runtime/store.py — freq_age_adult percentile mapping
- packages/data/src/phonolex_data/pipeline/derived.py — freq_age_adult percentile inclusion
- packages/web/workers/src/config/properties.ts — interface extension, new PropertyDefs, platform_visible flags, getPlatformCategories()
- packages/web/workers/src/lib/queries.ts — cv_shape recognised as words-table column; partitionFilterColumns extended for IN-list categorical filter
- packages/web/workers/src/routes/meta.ts — ?surface=platform query param handling
- packages/web/workers/src/routes/words.ts — cv_shape filter clause + similar_to intersection logic
- packages/web/workers/src/types.ts — add similar_to to WordSearchBody, cv_shape field on WordRow
- packages/web/workers/scripts/config.py — mirror property metadata changes
- packages/web/frontend/src/services/apiClient.ts — extend WordSearchRequest with similar_to + cv_shape; expose Word.cv_shape + Word.similarity
- packages/web/frontend/src/hooks/usePropertyMetadata.tsx — call ?surface=platform
- packages/web/frontend/src/components/Builder.tsx — restructure to 5 accordions; wire CategoricalRule + SimilarToRule
- packages/web/frontend/src/App_new.tsx — update Word Lists description; remove obsolete PHON-117 comment
Deleted:
- packages/web/frontend/src/components/tools/PhonologicalSimilarityTool.tsx
Task 1: Add cv_shape derivation in the data pipeline
Files:
- Create: packages/data/tests/test_cv_shape.py
- Modify: packages/data/src/phonolex_data/pipeline/schema.py:48 (add column to WordRecord)
- Modify: packages/data/src/phonolex_data/pipeline/words.py:199-210 (set cv_shape in returned WordRecord)
- Modify: packages/data/src/phonolex_data/runtime/schema.py:48 (add to _CORE_WORDS_COLUMNS)
- [ ] Step 1: Write the failing test
# packages/data/tests/test_cv_shape.py
from phonolex_data.phonology.syllabification import (
    syllabify, PhonemeWithStress,
)
from phonolex_data.pipeline.words import _build_phonology_record


def _cv_shape_for(phonemes_with_stress):
    """Helper: syllabify then derive CV shape via the same code path the pipeline uses."""
    syls = syllabify(phonemes_with_stress)
    parts = []
    for s in syls:
        parts.append("C" * len(s.onset) + "V" + "C" * len(s.coda))
    return "-".join(parts)


def test_cv_shape_monosyllabic_cvc():
    """cat /k.æ.t/ → CVC"""
    phs = [
        PhonemeWithStress("k", None),
        PhonemeWithStress("æ", 1),
        PhonemeWithStress("t", None),
    ]
    assert _cv_shape_for(phs) == "CVC"


def test_cv_shape_initial_cluster():
    """spring /s.p.ɹ.ɪ.ŋ/ → CCCVC"""
    phs = [
        PhonemeWithStress("s", None),
        PhonemeWithStress("p", None),
        PhonemeWithStress("ɹ", None),
        PhonemeWithStress("ɪ", 1),
        PhonemeWithStress("ŋ", None),
    ]
    assert _cv_shape_for(phs) == "CCCVC"


def test_cv_shape_disyllabic():
    """kitten /k.ɪ.t.ə.n/ → CVC-VC"""
    phs = [
        PhonemeWithStress("k", None),
        PhonemeWithStress("ɪ", 1),
        PhonemeWithStress("t", None),
        PhonemeWithStress("ə", 0),
        PhonemeWithStress("n", None),
    ]
    assert _cv_shape_for(phs) == "CVC-VC"


def test_cv_shape_diphthong_counts_as_single_v():
    """boat /b.oʊ.t/ → CVC (oʊ is one nucleus phoneme)"""
    phs = [
        PhonemeWithStress("b", None),
        PhonemeWithStress("oʊ", 1),
        PhonemeWithStress("t", None),
    ]
    assert _cv_shape_for(phs) == "CVC"


def test_cv_shape_pipeline_emits_field():
    """The pipeline's _build_phonology_record sets WordRecord.cv_shape."""
    phono_data = {
        "word": "cat",
        "phonemes": ["k", "æ", "t"],
        "stress_pattern": [None, 1, None],
        "ipa": "kæt",
    }
    rec = _build_phonology_record(phono_data)
    assert rec.cv_shape == "CVC"
- [ ] Step 2: Run test to verify it fails
cd /Users/jneumann/Repos/PhonoLex && uv run python -m pytest packages/data/tests/test_cv_shape.py -v
Expected: 1 of 5 tests fails. The 4 pure-helper tests pass if the helper logic is correct; test_cv_shape_pipeline_emits_field fails with AttributeError: 'WordRecord' object has no attribute 'cv_shape'.
- [ ] Step 3: Add cv_shape to WordRecord
Edit packages/data/src/phonolex_data/pipeline/schema.py. Find the structural-cols block in the WordRecord dataclass (around the wcm_score field, line ~48 — look for wcm_score: int | None = None and add the new line directly below):
wcm_score: int | None = None
cv_shape: str | None = None # CV skeleton from syllabification ("CVC", "CCVCC", "CV-CVC", ...)
- [ ] Step 4: Derive cv_shape in _build_phonology_record
Edit packages/data/src/phonolex_data/pipeline/words.py. Find the existing WordRecord(...) construction in _build_phonology_record (starts at line ~199). Just before the return statement, compute the shape from the already-built syllables_obj:
    # Existing code computes syllables_obj and wcm above; now derive CV shape.
    cv_shape_parts = []
    for s in syllables_obj:
        cv_shape_parts.append("C" * len(s.onset) + "V" + "C" * len(s.coda))
    cv_shape = "-".join(cv_shape_parts) if cv_shape_parts else None
    return WordRecord(
        word=phono_data.get("word", ""),
        has_phonology=True,
        ipa=ipa,
        phonemes=phonemes,
        phoneme_count=len(phonemes),
        syllables=syllables,
        syllable_count=len(syllables),
        initial_phoneme=phonemes[0] if phonemes else None,
        final_phoneme=phonemes[-1] if phonemes else None,
        wcm_score=wcm,
        cv_shape=cv_shape,
    )
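To sanity-check the derivation without the pipeline installed, the rule can be reproduced in isolation. The sketch below is only an illustration: the `Syllable` dataclass is a hypothetical stand-in for the pipeline's Syllable objects (assumed here to expose `onset` and `coda` as sequences), and the syllabifications are entered by hand rather than produced by `syllabify`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Syllable:
    """Hypothetical stand-in for the pipeline's Syllable object."""
    onset: list[str] = field(default_factory=list)
    nucleus: str = ""
    coda: list[str] = field(default_factory=list)


def cv_shape(syllables: list[Syllable]) -> str | None:
    """One C per onset/coda phoneme, one V per nucleus, dash between syllables."""
    parts = ["C" * len(s.onset) + "V" + "C" * len(s.coda) for s in syllables]
    return "-".join(parts) if parts else None


# Hand-entered syllabifications matching the examples in this task.
cat = [Syllable(onset=["k"], nucleus="æ", coda=["t"])]
spring = [Syllable(onset=["s", "p", "ɹ"], nucleus="ɪ", coda=["ŋ"])]
kitten = [
    Syllable(onset=["k"], nucleus="ɪ", coda=["t"]),
    Syllable(onset=[], nucleus="ə", coda=["n"]),
]

print(cv_shape(cat), cv_shape(spring), cv_shape(kitten), cv_shape([]))
```

A word with no syllables yields None rather than an empty string, which matches the `if cv_shape_parts else None` guard in the step above.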
- [ ] Step 5: Register cv_shape in the runtime schema
Edit packages/data/src/phonolex_data/runtime/schema.py. Find the _CORE_WORDS_COLUMNS dict (line 37). Add directly after "wcm_score": pl.Int32,:
"wcm_score": pl.Int32,
"cv_shape": pl.Utf8, # CV skeleton derived from syllabification; categorical filter for SLP word-shape queries
- [ ] Step 6: Run tests to verify they pass
uv run python -m pytest packages/data/tests/test_cv_shape.py -v
Expected: all 5 tests PASS.
- [ ] Step 7: Commit
git add packages/data/tests/test_cv_shape.py \
packages/data/src/phonolex_data/pipeline/schema.py \
packages/data/src/phonolex_data/pipeline/words.py \
packages/data/src/phonolex_data/runtime/schema.py
git commit -m "feat(data): derive cv_shape column from existing syllabification
CV skeleton string emitted per word from the existing Syllable objects.
Lives on the words table (first non-numeric platform property).
Examples: cat→CVC, spring→CCCVC, kitten→CVC-VC, boat→CVC.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
Task 2: Add freq_age_adult headline aggregation
Files:
- Create: packages/data/tests/test_freq_age_adult.py
- Modify: packages/data/src/phonolex_data/pipeline/schema.py:197-200 (add freq_age_adult field next to siblings)
- Modify: packages/data/src/phonolex_data/pipeline/words.py:399-415 (compute headline)
- Modify: packages/data/src/phonolex_data/pipeline/derived.py:32 (include in percentile set)
- Modify: packages/data/src/phonolex_data/runtime/store.py:67-70 (percentile mapping)
- Modify: packages/data/src/phonolex_data/runtime/emit_d1_sql.py:70 (treat as a freq_age headline for D1 placement)
- [ ] Step 1: Write the failing test
# packages/data/tests/test_freq_age_adult.py
from phonolex_data.pipeline.schema import WordRecord
from phonolex_data.pipeline.words import _agg_mean  # importable once promoted to module scope; see the note below


def test_freq_age_adult_mean_of_b4_b5():
    """freq_age_adult = mean(wpm_b4, wpm_b5), None treated as 0 unless ALL are None."""
    rec = WordRecord(word="example")
    rec.wpm_b4 = 100.0
    rec.wpm_b5 = 200.0
    # The aggregation step runs over records in _build_words; emulate it here.
    rec.freq_age_adult = _agg_mean([rec.wpm_b4, rec.wpm_b5])
    assert rec.freq_age_adult == 150.0


def test_freq_age_adult_none_when_all_none():
    rec = WordRecord(word="missing")
    rec.wpm_b4 = None
    rec.wpm_b5 = None
    rec.freq_age_adult = _agg_mean([rec.wpm_b4, rec.wpm_b5])
    assert rec.freq_age_adult is None


def test_freq_age_adult_partial_coverage():
    """One None counts as 0 per sibling-aggregation semantics."""
    rec = WordRecord(word="partial")
    rec.wpm_b4 = 80.0
    rec.wpm_b5 = None
    rec.freq_age_adult = _agg_mean([rec.wpm_b4, rec.wpm_b5])
    assert rec.freq_age_adult == 40.0
Note: _agg_mean is currently defined locally inside _build_words at packages/data/src/phonolex_data/pipeline/words.py:390. Promote it to module scope (move the helper out of the closure) as part of Step 4 below so it is importable in tests.
- [ ] Step 2: Run test to verify it fails
uv run python -m pytest packages/data/tests/test_freq_age_adult.py -v
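Expected: none of the 3 tests pass. Because _agg_mean is imported at module level, pytest reports a collection error (ImportError: cannot import name '_agg_mean') until Step 4 promotes the helper; the freq_age_adult field is also missing from WordRecord until Step 3.
- [ ] Step 3: Add freq_age_adult to WordRecord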
Edit packages/data/src/phonolex_data/pipeline/schema.py. Find the existing freq_age headline declarations (line ~197). Add freq_age_adult directly after freq_age_12y:
freq_age_2y: float | None = None
freq_age_5y: float | None = None
freq_age_8y: float | None = None
freq_age_12y: float | None = None
freq_age_adult: float | None = None # mean(wpm_b4, wpm_b5); high-school + college reading bands
Also update the comment block directly above (the # freq_age_12y = mean(...) lines around line 191) by adding:
# freq_age_adult = mean(wpm_b4, wpm_b5)
- [ ] Step 4: Promote _agg_mean and add the adult aggregation
Edit packages/data/src/phonolex_data/pipeline/words.py. Move _agg_mean from its closure inside _build_words to module scope (just above _build_words):
def _agg_mean(values):
    """Mean treating None as 0; returns None only if ALL inputs are None.

    Used for the PHON-88 freq_age_* headline aggregations.
    """
    if all(v is None for v in values):
        return None
    cleaned = [v if v is not None else 0.0 for v in values]
    return sum(cleaned) / len(cleaned)
Inside _build_words, remove the local nested _agg_mean definition (lines ~390-395 in the snapshot read during planning), and add the new aggregation line directly after the existing record.freq_age_12y = _agg_mean([...]) block (around line 412):
record.freq_age_12y = _agg_mean([
    record.wpm_childes_input_108_144mo, record.wpm_b3,
])
record.freq_age_adult = _agg_mean([
    record.wpm_b4, record.wpm_b5,
])
if record.freq_age_2y is not None:
    n_with_2y += 1
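The None-as-0 rule is easy to misread as "skip missing values", so a side-by-side comparison may help. The sketch below restates the plan's aggregation under a different name and contrasts it with a skip-None mean; `agg_mean_skip_none` is a hypothetical alternative shown only for contrast and is not part of the plan.

```python
from __future__ import annotations


def agg_mean_none_as_zero(values: list[float | None]) -> float | None:
    """Same semantics as the plan's _agg_mean: None counts as 0 unless all are None."""
    if all(v is None for v in values):
        return None
    return sum(v if v is not None else 0.0 for v in values) / len(values)


def agg_mean_skip_none(values: list[float | None]) -> float | None:
    """Contrast case only (NOT used by the plan): average just the present values."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


bands = [80.0, None]  # wpm_b4 present, wpm_b5 missing
print(agg_mean_none_as_zero(bands))  # 40.0
print(agg_mean_skip_none(bands))     # 80.0
```

Under the plan's semantics a word attested in only one adult band gets half the headline value, so partial coverage lowers the score instead of being ignored.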
- [ ] Step 5: Run tests to verify they pass
uv run python -m pytest packages/data/tests/test_freq_age_adult.py -v
Expected: all 3 tests PASS.
- [ ] Step 6: Include in percentile set and runtime mapping
Edit packages/data/src/phonolex_data/pipeline/derived.py. Find the FREQ_AGE list (line 32) and add "freq_age_adult":
"freq_age_2y", "freq_age_5y", "freq_age_8y", "freq_age_12y", "freq_age_adult",
Edit packages/data/src/phonolex_data/runtime/store.py. Find the percentile mapping tuples around line 67–70 and add the adult entry:
("freq_age_2y", "freq_age_2y_percentile"),
("freq_age_5y", "freq_age_5y_percentile"),
("freq_age_8y", "freq_age_8y_percentile"),
("freq_age_12y", "freq_age_12y_percentile"),
("freq_age_adult", "freq_age_adult_percentile"),
Edit packages/data/src/phonolex_data/runtime/emit_d1_sql.py. Find _FREQ_AGE_HEADLINES (line 70) and add freq_age_adult:
_FREQ_AGE_HEADLINES = {"freq_age_2y", "freq_age_5y", "freq_age_8y", "freq_age_12y", "freq_age_adult"}
- [ ] Step 7: Run the full data package test suite
uv run python -m pytest packages/data/tests/ -v
Expected: all existing tests still pass; the new freq_age_adult tests pass.
- [ ] Step 8: Commit
git add packages/data/tests/test_freq_age_adult.py \
packages/data/src/phonolex_data/pipeline/schema.py \
packages/data/src/phonolex_data/pipeline/words.py \
packages/data/src/phonolex_data/pipeline/derived.py \
packages/data/src/phonolex_data/runtime/store.py \
packages/data/src/phonolex_data/runtime/emit_d1_sql.py
git commit -m "feat(data): add freq_age_adult headline (mean of wpm_b4 + wpm_b5)
Adult-band development-frequency aggregation parallel to the existing
freq_age_2y/5y/8y/12y headlines. Promotes _agg_mean to module scope so
the aggregation helper is testable.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
Task 3: Extend PropertyDef in the worker config (TypeScript)
Files:
- Modify: packages/web/workers/src/config/properties.ts:7-29 (interface) + new entries + getPlatformCategories()
- [ ] Step 1: Extend the interface and add getPlatformCategories
Edit packages/web/workers/src/config/properties.ts. Update the interface block at the top:
export interface PropertyDef {
id: string;
label: string;
short_label: string;
source: string;
description: string;
scale: string;
interpretation: string;
display_format: string;
filterable: boolean;
slider_step: number;
use_log_scale: boolean;
is_integer: boolean;
/**
* When false, the prop is hidden from /api/property-metadata entirely
* (still ships to D1 for round-trip via /api/words/:word). Defaults true.
*/
surfaced?: boolean;
/**
* When true, the prop appears in the curated platform UI surface
* (returned by GET /api/property-metadata?surface=platform). Defaults
* undefined (= API-only). Only opt in explicit clinical workhorses.
*/
platform_visible?: boolean;
/**
* Renderer/filter kind. 'numeric' (default) → range slider, min/max.
* 'categorical' → chip-list multi-select, exact-or-IN match.
*/
kind?: 'numeric' | 'categorical';
}
- [ ] Step 2: Add the cv_shape PropertyDef inside the Phonological Complexity category
In the same file, find the phonological_complexity category (line 39) and append to its properties array — the WCM entry currently closes the array; insert cv_shape directly after it (keeping the existing trailing comma on wcm_score):
{
id: 'wcm_score', label: 'Word Complexity Measure', short_label: 'WCM',
// ... existing fields
},
{
id: 'cv_shape', label: 'CV Shape', short_label: 'Shape',
source: 'Derived from CMU syllabification',
description: 'Consonant–vowel skeleton; one CV-letter per phoneme, dash between syllables (e.g., CVC, CVC-VC, CCVCC).',
scale: 'string',
interpretation: 'Categorical match; supports multi-select OR within rule.',
display_format: 'string',
filterable: true,
slider_step: 0,
use_log_scale: false,
is_integer: false,
platform_visible: true,
kind: 'categorical',
},
- [ ] Step 3: Add freq_age_adult to DEV_FREQ_HEADLINES
In the same file, find DEV_FREQ_HEADLINES (line 368). Append a fifth entry after the freq_age_12y block (keep the trailing comma on the existing last entry):
{
id: 'freq_age_12y', label: 'Developmental Freq (~12 yrs)', short_label: 'DF12y',
// ... existing fields
},
{
id: 'freq_age_adult', label: 'Developmental Freq (~Adult)', short_label: 'DFAdult',
source: 'PhonoLex Developmental Frequency',
description:
'Aggregated words-per-million across FineWeb-Edu high-school + college reading bands ' +
'(mean of wpm_b4 + wpm_b5; missing treated as 0).',
scale: '0-50000',
interpretation: 'Higher = more frequent in adult-level reading material',
display_format: '.2f',
filterable: true,
slider_step: 10,
use_log_scale: true,
is_integer: false,
surfaced: true,
platform_visible: true,
},
- [ ] Step 4: Tag the remaining 12 platform-visible properties
In the same file, set platform_visible: true on each of these PropertyDef records (cv_shape + freq_age_adult already tagged in Steps 2–3):
| File location (approximate line) | id | Where to add |
|---|---|---|
| line 43 (syllable_count) | syllable_count | add platform_visible: true, to the record |
| line 51 (phoneme_count) | phoneme_count | same |
| line 58 (wcm_score) | wcm_score | same |
| line 151 (aoa) | aoa | same |
| line 174 (familiarity) | familiarity | same |
| line 182 (concreteness) | concreteness | same |
| line 196 (valence) | valence | same |
| line 204 (arousal) | arousal | same |
| line 370 (freq_age_2y) | freq_age_2y | same |
| line 381 (freq_age_5y) | freq_age_5y | same |
| line 393 (freq_age_8y) | freq_age_8y | same |
| line 404 (freq_age_12y) | freq_age_12y | same |
Example pattern for any one of them:
{
id: 'syllable_count', label: 'Syllable Count', short_label: 'Syl',
source: 'CMU Pronouncing Dictionary',
description: 'Number of syllables', scale: '1-8',
interpretation: 'More syllables = more complex',
display_format: '.0f', filterable: true, slider_step: 1,
use_log_scale: false, is_integer: true,
platform_visible: true,
},
- [ ] Step 5: Add getPlatformCategories()
Edit the same file. After getSurfacedCategories() (currently ends at line 500), add:
/**
* Property categories filtered to platform_visible=true. Used by
* /api/property-metadata?surface=platform to drive the curated SLP UI.
* Categories that become empty after filtering are dropped.
*/
export function getPlatformCategories(): PropertyCategory[] {
const out: PropertyCategory[] = [];
for (const cat of PROPERTY_CATEGORIES) {
const props = cat.properties.filter(
(p) => p.surfaced !== false && p.platform_visible === true,
);
if (props.length > 0) out.push({ id: cat.id, label: cat.label, properties: props });
}
return out;
}
- [ ] Step 6: Typecheck the worker package
cd /Users/jneumann/Repos/PhonoLex/packages/web/workers && npx tsc --noEmit
Expected: no type errors. (If FILTERABLE_PROPERTIES picks up cv_shape automatically because filterable: true, that's intended; downstream validation in partitionFilterColumns will be extended in Task 6, which adds the cv_shape filter.)
- [ ] Step 7: Commit
git add packages/web/workers/src/config/properties.ts
git commit -m "feat(api): extend PropertyDef with platform_visible + kind; add cv_shape and freq_age_adult
PropertyDef gains platform_visible (curated platform UI flag) and
kind (numeric|categorical, default numeric). 14 properties tagged
platform_visible across Word Shape / Age Appropriateness / Imagery &
Familiarity / Emotional Tone groups. getPlatformCategories helper
returns the curated subset.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
Task 4: Mirror PropertyDef changes in scripts/config.py
Files:
- Modify: packages/web/workers/scripts/config.py — mirror cv_shape PropertyDef, freq_age_adult PropertyDef, platform_visible flags
- [ ] Step 1: Read the existing pattern
sed -n '180,210p' packages/web/workers/scripts/config.py
sed -n '450,540p' packages/web/workers/scripts/config.py
Confirm the Python dataclass / dict pattern used; mirror style exactly.
- [ ] Step 2: Mirror the cv_shape PropertyDef
Edit packages/web/workers/scripts/config.py. Find the WCM PropertyDef entry (search id="wcm_score"). Add a cv_shape PropertyDef directly after it using the same dataclass/dict shape as the surrounding entries. Required fields (matching the TS):
PropertyDef(
id="cv_shape",
label="CV Shape",
short_label="Shape",
source="Derived from CMU syllabification",
description=(
"Consonant-vowel skeleton; one CV-letter per phoneme, dash between "
"syllables (e.g., CVC, CVC-VC, CCVCC)."
),
scale="string",
interpretation="Categorical match; supports multi-select OR within rule.",
display_format="string",
filterable=True,
slider_step=0,
use_log_scale=False,
is_integer=False,
platform_visible=True,
kind="categorical",
),
If the existing PropertyDef dataclass in this file doesn't declare platform_visible and kind, extend it (add platform_visible: bool | None = None and kind: str = "numeric" to the dataclass declaration) and ensure both new fields are emitted through whatever serialization path downstream consumers use.
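If the dataclass does need extending, the change might look roughly like the sketch below. This is an assumption-laden illustration: the field list is trimmed to a handful of names, and `to_payload` is a hypothetical serializer standing in for whatever emission path the real script uses.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class PropertyDef:
    # Trimmed for brevity; the real dataclass carries more fields.
    id: str
    label: str
    filterable: bool = True
    surfaced: bool | None = None
    platform_visible: bool | None = None  # new: curated platform UI flag
    kind: str = "numeric"                 # new: "numeric" or "categorical"


def to_payload(prop: PropertyDef) -> dict:
    """Hypothetical serializer: drop None values so unset flags stay absent."""
    return {k: v for k, v in asdict(prop).items() if v is not None}


cv_shape = PropertyDef(
    id="cv_shape", label="CV Shape", platform_visible=True, kind="categorical",
)
log_frequency = PropertyDef(id="log_frequency", label="Log Frequency")

print(to_payload(cv_shape))
print(to_payload(log_frequency))
```

Defaulting `kind` to "numeric" means existing entries need no edits, while leaving `platform_visible` as None keeps untagged properties API-only, in line with the TypeScript interface in Task 3.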
- [ ] Step 3: Mirror the freq_age_adult PropertyDef
Find the freq_age_12y PropertyDef (line ~524) and add a freq_age_adult entry directly below with the same shape:
PropertyDef(
id="freq_age_adult",
label="Developmental Freq (~Adult)",
short_label="DFAdult",
source="PhonoLex Developmental Frequency",
description=(
"Aggregated words-per-million across FineWeb-Edu high-school + "
"college reading bands (mean of wpm_b4 + wpm_b5; missing treated as 0)."
),
scale="0-50000",
interpretation="Higher = more frequent in adult-level reading material",
display_format=".2f",
filterable=True,
slider_step=10,
use_log_scale=True,
is_integer=False,
surfaced=True,
platform_visible=True,
),
Also extend the comment block above freq_age_2y to add # freq_age_adult = mean(wpm_b4, wpm_b5) for parity with words.py.
- [ ] Step 4: Tag the 12 remaining platform-visible properties
Add platform_visible=True, to the PropertyDef records in scripts/config.py for:
syllable_count, phoneme_count, wcm_score, aoa, familiarity, concreteness, valence, arousal, freq_age_2y, freq_age_5y, freq_age_8y, freq_age_12y.
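Because the same 14 ids must be tagged in both the TypeScript and Python configs, a small parity check can catch a missed flag. The sketch below is a hypothetical helper, not something the plan requires; it only assumes you can collect the ids tagged `platform_visible=True` into an iterable.

```python
# The 14 curated platform ids named in this plan (12 tagged here + cv_shape + freq_age_adult).
PLATFORM_IDS = {
    "syllable_count", "phoneme_count", "wcm_score", "cv_shape",
    "aoa", "freq_age_2y", "freq_age_5y", "freq_age_8y", "freq_age_12y",
    "freq_age_adult", "concreteness", "familiarity", "valence", "arousal",
}


def check_platform_ids(tagged_ids):
    """Report ids that are missing a flag or flagged unexpectedly."""
    tagged = set(tagged_ids)
    return {
        "missing": sorted(PLATFORM_IDS - tagged),
        "unexpected": sorted(tagged - PLATFORM_IDS),
    }


# Example: cv_shape was forgotten and iconicity was tagged by mistake.
tagged = (PLATFORM_IDS - {"cv_shape"}) | {"iconicity"}
print(check_platform_ids(tagged))
```

An empty "missing" and "unexpected" list on both configs is a quick signal that the mirror is complete.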
- [ ] Step 5: Run the existing data package tests to verify the config still parses
uv run python -m pytest packages/data/tests/ -v
Expected: all data tests still pass (no behavior change in the pipeline yet beyond Tasks 1–2).
- [ ] Step 6: Commit
git add packages/web/workers/scripts/config.py
git commit -m "feat(config): mirror PropertyDef extension in scripts/config.py
Python config picks up platform_visible + kind fields, cv_shape +
freq_age_adult PropertyDef records, and 12 platform_visible flags.
Mirrors the TS config in packages/web/workers/src/config/properties.ts.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
Task 5: /api/property-metadata?surface=platform
Files:
- Create: packages/web/workers/test/routes/meta.surface-platform.test.ts
- Modify: packages/web/workers/src/routes/meta.ts:47-52
- [ ] Step 1: Write the failing test
// packages/web/workers/test/routes/meta.surface-platform.test.ts
import { afterAll, beforeAll, describe, expect, it } from 'vitest';
import { unstable_dev } from 'wrangler';
import type { UnstableDevWorker } from 'wrangler';
describe('GET /api/property-metadata?surface=platform', () => {
let worker: UnstableDevWorker;
beforeAll(async () => {
worker = await unstable_dev('src/index.ts', {
experimental: { disableExperimentalWarning: true },
local: true,
});
});
afterAll(async () => {
await worker.stop();
});
it('returns only platform_visible categories', async () => {
const res = await worker.fetch('/api/property-metadata?surface=platform');
expect(res.status).toBe(200);
const cats = (await res.json()) as Array<{ id: string; properties: Array<{ id: string }> }>;
const ids = cats.flatMap((c) => c.properties.map((p) => p.id));
// Must include the 14 curated platform properties
expect(ids).toEqual(expect.arrayContaining([
'syllable_count', 'phoneme_count', 'wcm_score', 'cv_shape',
'aoa', 'freq_age_2y', 'freq_age_5y', 'freq_age_8y', 'freq_age_12y', 'freq_age_adult',
'concreteness', 'familiarity',
'valence', 'arousal',
]));
// Must NOT include researcher-only props
expect(ids).not.toContain('phono_prob_avg');
expect(ids).not.toContain('iconicity');
expect(ids).not.toContain('semd_topic');
expect(ids).not.toContain('log_frequency');
});
it('returns the full surfaced set when no surface param is given', async () => {
const res = await worker.fetch('/api/property-metadata');
expect(res.status).toBe(200);
const cats = (await res.json()) as Array<{ id: string; properties: Array<{ id: string }> }>;
const ids = cats.flatMap((c) => c.properties.map((p) => p.id));
// Full surfaced set includes researcher-grade props
expect(ids).toContain('phono_prob_avg');
expect(ids).toContain('iconicity');
});
});
- [ ] Step 2: Run test to verify it fails
cd /Users/jneumann/Repos/PhonoLex/packages/web/workers && npm test -- meta.surface-platform
Expected: FAIL (the route does not yet handle ?surface=platform; returns the full set).
- [ ] Step 3: Update the meta route
Edit packages/web/workers/src/routes/meta.ts. Update the import (line 7) and the property-metadata handler (lines 47-52):
import { getSurfacedCategories, getPlatformCategories } from '../config/properties';
meta.get('/property-metadata', (c) => {
const surface = c.req.query('surface');
if (surface === 'platform') {
return c.json(getPlatformCategories());
}
return c.json(getSurfacedCategories());
});
- [ ] Step 4: Run test to verify it passes
npm test -- meta.surface-platform
Expected: both tests PASS.
- [ ] Step 5: Commit
git add packages/web/workers/src/routes/meta.ts \
packages/web/workers/test/routes/meta.surface-platform.test.ts
git commit -m "feat(api): /api/property-metadata?surface=platform for curated UI
Researcher consumers continue to hit the unparam'd route and get the
full surfaced set. The frontend opts into ?surface=platform for the
curated 14-property platform surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
Task 6: cv_shape filter on /api/words/search
Files:
- Modify: packages/web/workers/src/types.ts — add cv_shape?: string[] to WordSearchBody and cv_shape?: string | null to WordRow
- Modify: packages/web/workers/src/lib/queries.ts:20-33 — add cv_shape to WORDS_TABLE_COLS
- Modify: packages/web/workers/src/routes/words.ts:93-145 — accept and apply the filter
- [ ] Step 1: Write the failing test
Add a new test case to (or create) packages/web/workers/test/routes/words.search-cv-shape.test.ts:
import { describe, expect, it, beforeAll, afterAll } from 'vitest';
import { unstable_dev } from 'wrangler';
import type { UnstableDevWorker } from 'wrangler';
describe('POST /api/words/search with cv_shape filter', () => {
let worker: UnstableDevWorker;
beforeAll(async () => {
worker = await unstable_dev('src/index.ts', {
experimental: { disableExperimentalWarning: true },
local: true,
});
});
afterAll(async () => { await worker.stop(); });
it('returns only words whose cv_shape matches', async () => {
const res = await worker.fetch('/api/words/search', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ cv_shape: ['CVC'], limit: 50 }),
});
expect(res.status).toBe(200);
const json = (await res.json()) as { items: Array<{ cv_shape: string }> };
expect(json.items.length).toBeGreaterThan(0);
for (const item of json.items) {
expect(item.cv_shape).toBe('CVC');
}
});
it('OR-matches across a multi-shape list', async () => {
const res = await worker.fetch('/api/words/search', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ cv_shape: ['CVC', 'CCVC'], limit: 50 }),
});
const json = (await res.json()) as { items: Array<{ cv_shape: string }> };
for (const item of json.items) {
expect(['CVC', 'CCVC']).toContain(item.cv_shape);
}
});
it('ignores empty cv_shape array', async () => {
const res = await worker.fetch('/api/words/search', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ cv_shape: [], limit: 5 }),
});
expect(res.status).toBe(200);
});
});
- [ ] Step 2: Run to verify it fails
npm test -- words.search-cv-shape
Expected: FAIL — the filter is unimplemented; items have no cv_shape field.
- [ ] Step 3: Update types
Edit packages/web/workers/src/types.ts. Find the WordSearchBody interface and add:
export interface WordSearchBody {
// ... existing fields
cv_shape?: string[]; // OR list of CV shapes; empty/undefined = no filter
}
Find the WordRow type and add (string column, nullable):
export interface WordRow {
// ... existing fields
cv_shape?: string | null;
}
- [ ] Step 4: Register cv_shape as a words-table column
Edit packages/web/workers/src/lib/queries.ts:20. Add 'cv_shape' to WORDS_TABLE_COLS:
const WORDS_TABLE_COLS = new Set([
'word', 'has_phonology', 'ipa', 'phonemes', 'phonemes_str',
'syllables', 'phoneme_count', 'syllable_count',
'initial_phoneme', 'final_phoneme', 'root', 'variants', 'cv_shape',
]);
- [ ] Step 5: Apply the filter in the search route
Edit packages/web/workers/src/routes/words.ts. In words.post('/search', ...) (line 93), after the existing exclude_phonemes block (around line 142) and before the WHERE assembly:
// CV-shape filter — OR within array, words table column
if (body.cv_shape?.length) {
const placeholders = body.cv_shape.map(() => '?').join(', ');
wordsWhere.push(`w.cv_shape IN (${placeholders})`);
params.push(...body.cv_shape);
}
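The placeholder-per-value pattern above can be tried locally against plain SQLite. The sketch below uses Python's built-in sqlite3 module as a stand-in for D1; the table layout, sample rows, and the `search_by_cv_shape` helper are all hypothetical and exist only to show the OR-within-list behaviour and the empty-list case.

```python
import sqlite3


def search_by_cv_shape(conn, shapes, limit=50):
    """Build the WHERE clause the same way the route does: one '?' per shape."""
    where, params = [], []
    if shapes:  # empty or None means no filter is applied
        placeholders = ", ".join("?" for _ in shapes)
        where.append(f"w.cv_shape IN ({placeholders})")
        params.extend(shapes)
    sql = "SELECT w.word, w.cv_shape FROM words w"
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += " ORDER BY w.word LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY, cv_shape TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [("cat", "CVC"), ("stop", "CCVC"), ("kitten", "CVC-VC"), ("spring", "CCCVC")],
)

print(search_by_cv_shape(conn, ["CVC", "CCVC"]))
print(search_by_cv_shape(conn, []))
```

Binding each shape as a parameter, instead of interpolating the strings into the SQL, keeps user-supplied values out of the query text.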
- [ ] Step 6: Ensure cv_shape round-trips on the response
Open packages/web/workers/src/lib/wordResponse.ts and confirm rowToWordResponse copies all WordRow fields (it typically does via spread). If it whitelists fields explicitly, add cv_shape to the whitelist:
grep -n "cv_shape\|wcm_score\|syllable_count" packages/web/workers/src/lib/wordResponse.ts
If cv_shape is missing where wcm_score appears, add it next to that field.
- [ ] Step 7: Run tests to verify they pass
npm test -- words.search-cv-shape
Expected: all 3 tests PASS.
- [ ] Step 8: Commit
git add packages/web/workers/src/types.ts \
packages/web/workers/src/lib/queries.ts \
packages/web/workers/src/routes/words.ts \
packages/web/workers/src/lib/wordResponse.ts \
packages/web/workers/test/routes/words.search-cv-shape.test.ts
git commit -m "feat(api): cv_shape filter on POST /api/words/search
Accepts cv_shape: string[] with OR semantics within the array;
ANDs with the rest of the query. Recognised as a words-table column
in queries.ts; round-trips through rowToWordResponse.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
Task 7: similar_to block on /api/words/search
Files:
- Create: packages/web/workers/test/routes/words.search-similar.test.ts
- Modify: packages/web/workers/src/types.ts — add similar_to block to WordSearchBody; add similarity?: number to WordRow/response
- Modify: packages/web/workers/src/routes/words.ts:93-214 — intersect with similarity scan when block is present
- [ ] Step 1: Write the failing test
// packages/web/workers/test/routes/words.search-similar.test.ts
import { describe, expect, it, beforeAll, afterAll } from 'vitest';
import { unstable_dev } from 'wrangler';
import type { UnstableDevWorker } from 'wrangler';
describe('POST /api/words/search with similar_to block', () => {
let worker: UnstableDevWorker;
beforeAll(async () => {
worker = await unstable_dev('src/index.ts', {
experimental: { disableExperimentalWarning: true },
local: true,
});
});
afterAll(async () => { await worker.stop(); });
it('returns words ranked by similarity to anchor when similar_to provided', async () => {
const res = await worker.fetch('/api/words/search', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
similar_to: {
word: 'cat',
weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
threshold: 0.7,
position: 'all',
syllable_count: 1,
},
limit: 20,
}),
});
expect(res.status).toBe(200);
const json = (await res.json()) as { items: Array<{ word: string; similarity: number }> };
expect(json.items.length).toBeGreaterThan(0);
// Each item carries a similarity score >= threshold
for (const item of json.items) {
expect(item.similarity).toBeGreaterThanOrEqual(0.7);
}
// Results are sorted by similarity descending
for (let i = 1; i < json.items.length; i++) {
expect(json.items[i].similarity).toBeLessThanOrEqual(json.items[i - 1].similarity);
}
});
it('intersects similarity with cv_shape filter', async () => {
const res = await worker.fetch('/api/words/search', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
similar_to: {
word: 'cat',
weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
threshold: 0.6,
position: 'all',
syllable_count: 1,
},
cv_shape: ['CVC'],
limit: 50,
}),
});
expect(res.status).toBe(200);
const json = (await res.json()) as { items: Array<{ word: string; cv_shape: string; similarity: number }> };
for (const item of json.items) {
expect(item.cv_shape).toBe('CVC');
expect(item.similarity).toBeGreaterThanOrEqual(0.6);
}
});
});
- [ ] Step 2: Run test to verify it fails
npm test -- words.search-similar
Expected: FAIL — the route ignores similar_to; items lack similarity field.
- [ ] Step 3: Extend types
Edit packages/web/workers/src/types.ts. Add to WordSearchBody:
export interface WordSearchBody {
// ... existing fields including cv_shape from Task 6
similar_to?: {
word: string;
weights: { onset: number; nucleus: number; coda: number };
threshold: number;
position: 'all' | 'initial' | 'final' | 'medial';
syllable_count: number;
};
}
If the response type (e.g., PaginatedWordResponse's item shape) needs similarity, add it there too (likely on the per-item Word shape):
export interface Word {
// ... existing fields
similarity?: number;
}
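TypeScript interfaces vanish at runtime, so a malformed similar_to block would reach the scan unchecked. The following is a minimal sketch of a runtime guard, not part of the plan's code: `parseSimilarTo` and `isUnit` are hypothetical names, and the 0 to 1 range for weights and threshold is an assumption based on the slider bounds and threshold buckets used elsewhere in this plan.

```typescript
type SimilarPosition = 'all' | 'initial' | 'final' | 'medial';

interface SimilarTo {
  word: string;
  weights: { onset: number; nucleus: number; coda: number };
  threshold: number;
  position: SimilarPosition;
  syllable_count: number;
}

const POSITIONS: readonly string[] = ['all', 'initial', 'final', 'medial'];

// Assumption: weights and threshold are finite numbers in [0, 1].
const isUnit = (n: unknown): n is number =>
  typeof n === 'number' && Number.isFinite(n) && n >= 0 && n <= 1;

// Hypothetical guard: returns the typed block, or null when it is malformed.
function parseSimilarTo(raw: unknown): SimilarTo | null {
  if (typeof raw !== 'object' || raw === null) return null;
  const { word, weights, threshold, position, syllable_count } =
    raw as Record<string, unknown>;
  if (typeof word !== 'string' || word.trim() === '') return null;
  if (typeof weights !== 'object' || weights === null) return null;
  const { onset, nucleus, coda } = weights as Record<string, unknown>;
  if (!isUnit(onset) || !isUnit(nucleus) || !isUnit(coda)) return null;
  if (!isUnit(threshold)) return null;
  if (typeof position !== 'string' || POSITIONS.indexOf(position) === -1) return null;
  if (typeof syllable_count !== 'number') return null;
  if (!Number.isInteger(syllable_count) || syllable_count < 1) return null;
  return {
    word: word.trim(),
    weights: { onset, nucleus, coda },
    threshold,
    position: position as SimilarPosition,
    syllable_count,
  };
}
```

A route could call such a guard before the scan and answer with a 400 when it returns null; whether the plan wants that behavior is a design choice left open here.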
- [ ] Step 4: Refactor similarity scan into a reusable helper
Open packages/web/workers/src/routes/similarity.ts. Find the scan loop (lines 107-127). Extract its core into an exported helper that the search route can call. Add (or move) it at the top of the file:
/**
* Score all words against a target anchor; returns { word → similarity } above threshold.
* Cold-start cache load shared with /similarity/search.
*/
export async function scoreSimilarityScan(
db: D1Database,
body: {
word: string;
threshold: number;
weights: { onset: number; nucleus: number; coda: number };
position: string;
syllable_count: number;
},
): Promise<Map<string, number>> {
await ensureCache(db);
const word = body.word.toLowerCase();
const targetSyls = allWordSyllables!.get(word);
if (!targetSyls) return new Map();
const targetExtracted = extractSyllables(targetSyls, body.position, body.syllable_count);
if (!targetExtracted.length) return new Map();
const out = new Map<string, number>();
for (const [candidate, candidateSyls] of allWordSyllables!) {
if (candidate === word) continue;
const candidateExtracted = extractSyllables(candidateSyls, body.position, body.syllable_count);
if (!candidateExtracted.length) continue;
const sim = softLevenshteinSimilarity(
targetExtracted, candidateExtracted, body.weights, componentMap!, phonemeCache!,
);
if (sim >= body.threshold) {
out.set(candidate, Math.round(sim * 10000) / 10000);
}
}
return out;
}
The existing similarity.post('/search', ...) handler reuses this helper internally (refactor: replace its inner scan loop with const scored = await scoreSimilarityScan(c.env.DB, { ... }); then iterate scored to build the response).
- [ ] Step 5: Wire the intersection in the search route
Edit packages/web/workers/src/routes/words.ts. At the top, import the helper:
import { scoreSimilarityScan } from './similarity';
In words.post('/search', ...), after the existing exclude_phonemes and cv_shape clauses are pushed onto wordsWhere (around line 142 after Task 6's edit), and BEFORE the SQL-assembly + execution block (the const wordsWhereSQL = ... line), insert this complete early-return branch. It does similarity scan → chunked SQL filter with IN (...) per chunk → JS intersection → sort by similarity desc → paginate → fetch full rows → stamp similarity on each:
// ---- Similarity intersection path (when similar_to is present) ----
// D1 caps bound parameters at 100 per query; an IN-list of 80 per call leaves room for up to 20 filter params.
if (body.similar_to) {
const scoreMap = await scoreSimilarityScan(c.env.DB, body.similar_to);
if (scoreMap.size === 0) {
return c.json({
items: [],
total: 0,
offset: Math.max(body.offset ?? 0, 0),
limit: Math.min(Math.max(body.limit ?? 50, 1), 5000),
});
}
const candidates = Array.from(scoreMap.keys());
// Build the WHERE without an IN clause; we append a fresh IN per chunk below.
const wordsWhereSQL = wordsWhere.length ? wordsWhere.join(' AND ') : '1=1';
const propsWhereSQL = propsWhere.length ? propsWhere.join(' AND ') : '1=1';
const fromClause = 'FROM words w INNER JOIN word_properties wp ON w.word = wp.word';
const selectCols = needsMedialPostFilter ? 'w.word, w.phonemes' : 'w.word';
const filterMatching = new Set<string>();
for (let i = 0; i < candidates.length; i += 80) {
const chunk = candidates.slice(i, i + 80);
const placeholders = chunk.map(() => '?').join(', ');
// The IN list goes last so placeholder order matches .bind(...params, ...chunk).
const sql = `SELECT ${selectCols} ${fromClause}
WHERE ${wordsWhereSQL} AND ${propsWhereSQL} AND w.word IN (${placeholders})`;
const { results } = await c.env.DB.prepare(sql)
.bind(...params, ...chunk)
.all<{ word: string; phonemes?: string }>();
for (const row of results) {
if (needsMedialPostFilter && row.phonemes) {
const phonemes = JSON.parse(row.phonemes) as string[];
if (!matchesMedialPattern(phonemes, medialSequences)) continue;
}
filterMatching.add(row.word);
}
}
// Intersection, sorted by similarity desc.
const intersected = candidates.filter((w) => filterMatching.has(w));
intersected.sort((a, b) => (scoreMap.get(b) ?? 0) - (scoreMap.get(a) ?? 0));
const offset = Math.max(body.offset ?? 0, 0);
const limit = Math.min(Math.max(body.limit ?? 50, 1), 5000);
const page = intersected.slice(offset, offset + limit);
if (!page.length) {
return c.json({ items: [], total: intersected.length, offset, limit });
}
const rowMap = await fetchMergedWordRows(c.env.DB, page);
return c.json({
items: page
.filter((w) => rowMap.has(w))
.map((w) => ({
...rowToWordResponse(rowMap.get(w)!),
similarity: scoreMap.get(w) ?? 0,
})),
total: intersected.length,
offset,
limit,
});
}
// ---- End similarity intersection path; falls through to existing non-similar code below ----
Note: this branch returns early. The existing non-similar code path (count + paginated word list + fetchMergedWordRows, lines ~192-213) is left unchanged and serves all queries where body.similar_to is undefined.
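The chunking, intersection, ranking, and pagination in this branch can be exercised without a database. The sketch below is illustrative only: `chunk` and `rankAndPaginate` are hypothetical helper names, and the set of words that passed the SQL filters is supplied directly instead of being read from D1.

```typescript
// Split a list into fixed-size chunks (the branch above uses 80 per query).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Keep scored words that also passed the SQL filters, rank them by
// similarity descending, then cut one page. The clamps mirror the route:
// offset >= 0 and 1 <= limit <= 5000.
function rankAndPaginate(
  scores: Map<string, number>,
  passedFilters: Set<string>,
  offset: number = 0,
  limit: number = 50,
): { page: string[]; total: number; offset: number; limit: number } {
  const ranked = Array.from(scores.keys())
    .filter((w) => passedFilters.has(w))
    .sort((a, b) => (scores.get(b) ?? 0) - (scores.get(a) ?? 0));
  const safeOffset = Math.max(offset, 0);
  const safeLimit = Math.min(Math.max(limit, 1), 5000);
  return {
    page: ranked.slice(safeOffset, safeOffset + safeLimit),
    total: ranked.length,
    offset: safeOffset,
    limit: safeLimit,
  };
}
```

Note that `total` counts the whole intersection, not just the returned page, which matches what the branch reports to the client.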
- [ ] Step 6: Refactor /api/similarity/search to use the same helper
Edit packages/web/workers/src/routes/similarity.ts. Find the existing handler (line 70). Replace the inner scan loop (lines 107-127, the for (const [candidate, candidateSyls] of allWordSyllables!) block) with a call to the new scoreSimilarityScan. The handler becomes:
similarity.post('/search', async (c) => {
const body = await c.req.json<{
word: string;
threshold?: number;
limit?: number;
onset_weight?: number;
nucleus_weight?: number;
coda_weight?: number;
position?: string;
syllable_count?: number;
}>();
const scoreMap = await scoreSimilarityScan(c.env.DB, {
word: body.word.toLowerCase(),
threshold: body.threshold ?? 0.7,
weights: {
onset: body.onset_weight ?? 0.33,
nucleus: body.nucleus_weight ?? 0.33,
coda: body.coda_weight ?? 0.33,
},
position: body.position ?? 'all',
syllable_count: body.syllable_count ?? 1,
});
const limit = Math.min(Math.max(body.limit ?? 50, 1), 500);
const ranked = Array.from(scoreMap.entries())
.sort((a, b) => b[1] - a[1])
.slice(0, limit);
if (!ranked.length) return c.json([]);
const wordNames = ranked.map(([w]) => w);
const rowMap = await fetchMergedWordRows(c.env.DB, wordNames, {
requirePhonology: true,
requireFrequency: true,
});
return c.json(
ranked
.filter(([w]) => rowMap.has(w))
.map(([w, sim]) => ({ word: rowToWordResponse(rowMap.get(w)!), similarity: sim })),
);
});
This preserves the existing endpoint's contract (callers still get [{ word: WordResponse, similarity: number }, ...]) while sharing the scan implementation.
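The two endpoints attach the score differently, which is easy to mix up on the client. A small sketch of both item shapes follows; the `WordResponse` stand-in is trimmed to two fields, and `toNestedItem` and `toFlatItem` are hypothetical names used only for illustration.

```typescript
// Trimmed stand-in for the real WordResponse type.
interface WordResponse {
  word: string;
  cv_shape?: string | null;
}

// /api/similarity/search: the word payload is nested under `word`.
function toNestedItem(row: WordResponse, similarity: number) {
  return { word: row, similarity };
}

// /api/words/search with similar_to: similarity is stamped onto the item.
function toFlatItem(row: WordResponse, similarity: number) {
  return { ...row, similarity };
}
```

A client reading the nested shape gets the spelling at `item.word.word`, while the flat shape exposes it at `item.word`.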
- [ ] Step 7: Run tests to verify they pass
npm test -- words.search-similar
Expected: both tests PASS.
- [ ] Step 8: Re-run the full worker test suite to confirm no regressions
npm test
Expected: all tests pass.
- [ ] Step 9: Commit
git add packages/web/workers/src/types.ts \
packages/web/workers/src/routes/words.ts \
packages/web/workers/src/routes/similarity.ts \
packages/web/workers/test/routes/words.search-similar.test.ts
git commit -m "feat(api): similar_to block on POST /api/words/search
Intersects existing filter+pattern query with a similarity scan; ranks
the intersection by similarity desc and stamps similarity onto each
item. Reuses the soft-Levenshtein helper from /api/similarity/search
(now extracted to scoreSimilarityScan). D1 bind chunking at 80/call.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
Task 8: Frontend CategoricalRule component¶
Files:
- Create: packages/web/frontend/src/components/shared/CategoricalRule.tsx
- Create: packages/web/frontend/src/components/shared/__tests__/CategoricalRule.test.tsx (if tests directory exists; check first)
- [ ] Step 1: Check the test convention
ls packages/web/frontend/src/test/ packages/web/frontend/src/**/*.test.tsx 2>/dev/null | head
Use whatever co-location pattern is already in place. Create the test file in the matching location.
- [ ] Step 2: Write the failing test
// packages/web/frontend/src/test/CategoricalRule.test.tsx (or co-located)
import { describe, expect, it, vi } from 'vitest';
import { render, screen, fireEvent } from '@testing-library/react';
import CategoricalRule from '../components/shared/CategoricalRule';
describe('CategoricalRule', () => {
it('renders preset chips and toggles selection', () => {
const onChange = vi.fn();
render(
<CategoricalRule
label="CV shape"
presets={['CV', 'CVC', 'CCV']}
value={[]}
onChange={onChange}
allowCustom={true}
/>,
);
expect(screen.getByText('CV')).toBeInTheDocument();
fireEvent.click(screen.getByText('CVC'));
expect(onChange).toHaveBeenCalledWith(['CVC']);
});
it('accepts custom entries via the Add button', () => {
const onChange = vi.fn();
render(
<CategoricalRule
label="CV shape"
presets={['CVC']}
value={[]}
onChange={onChange}
allowCustom={true}
customValidator={(s) => /^[CV]+(-[CV]+)*$/.test(s)}
/>,
);
fireEvent.change(screen.getByLabelText(/custom/i), { target: { value: 'CVCV-CVC' } });
fireEvent.click(screen.getByText(/add/i));
expect(onChange).toHaveBeenCalledWith(['CVCV-CVC']);
});
it('rejects invalid custom entries when validator provided', () => {
const onChange = vi.fn();
render(
<CategoricalRule
label="CV shape"
presets={['CVC']}
value={[]}
onChange={onChange}
allowCustom={true}
customValidator={(s) => /^[CV]+(-[CV]+)*$/.test(s)}
/>,
);
fireEvent.change(screen.getByLabelText(/custom/i), { target: { value: 'invalid123' } });
fireEvent.click(screen.getByText(/add/i));
expect(onChange).not.toHaveBeenCalled();
});
it('removes active entries when X is clicked', () => {
const onChange = vi.fn();
render(
<CategoricalRule
label="CV shape"
presets={['CV', 'CVC', 'CCV']}
value={['CV', 'CVC']}
onChange={onChange}
allowCustom={false}
/>,
);
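// MUI's Chip fires onDelete from its delete icon, not from a click on the chip root.
fireEvent.click(screen.getByLabelText(/remove CV$/i).querySelector('svg')!);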
expect(onChange).toHaveBeenCalledWith(['CVC']);
});
});
- [ ] Step 3: Run test to verify it fails
cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm test -- CategoricalRule
Expected: FAIL — component does not exist.
- [ ] Step 4: Implement the component
Create packages/web/frontend/src/components/shared/CategoricalRule.tsx:
import React, { useState } from 'react';
import { Box, Chip, Stack, TextField, Button, Typography } from '@mui/material';
export interface CategoricalRuleProps {
/** Section label displayed above the chip group. */
label: string;
/** Preset values to display as chips (always shown). */
presets: string[];
/** Currently active selection. */
value: string[];
/** Called with the new selection array. */
onChange: (next: string[]) => void;
/** Allow a free-text "custom" input + Add button. */
allowCustom?: boolean;
/** Optional validator for custom entries; rejected entries do not add. */
customValidator?: (input: string) => boolean;
}
const CategoricalRule: React.FC<CategoricalRuleProps> = ({
label,
presets,
value,
onChange,
allowCustom = false,
customValidator,
}) => {
const [customInput, setCustomInput] = useState('');
const [customError, setCustomError] = useState<string | null>(null);
const togglePreset = (preset: string) => {
if (value.includes(preset)) {
onChange(value.filter((v) => v !== preset));
} else {
onChange([...value, preset]);
}
};
const addCustom = () => {
const trimmed = customInput.trim();
if (!trimmed) return;
if (customValidator && !customValidator(trimmed)) {
setCustomError(`Invalid ${label.toLowerCase()} pattern`);
return;
}
if (value.includes(trimmed)) {
setCustomInput('');
return;
}
onChange([...value, trimmed]);
setCustomInput('');
setCustomError(null);
};
const removeActive = (item: string) => {
onChange(value.filter((v) => v !== item));
};
return (
<Box>
<Typography variant="subtitle2" gutterBottom>{label}</Typography>
<Typography variant="caption" color="text.secondary" sx={{ display: 'block', mb: 1 }}>
Common shapes:
</Typography>
<Stack direction="row" spacing={1} flexWrap="wrap" useFlexGap sx={{ mb: 1 }}>
{presets.map((p) => (
<Chip
key={p}
label={p}
onClick={() => togglePreset(p)}
color={value.includes(p) ? 'primary' : 'default'}
sx={{ mb: 1 }}
/>
))}
</Stack>
{allowCustom && (
<Stack direction="row" spacing={1} alignItems="center" sx={{ mb: 1 }}>
<TextField
label={`Custom ${label.toLowerCase()}`}
value={customInput}
onChange={(e) => { setCustomInput(e.target.value); setCustomError(null); }}
size="small"
error={!!customError}
helperText={customError ?? undefined}
onKeyDown={(e) => { if (e.key === 'Enter') addCustom(); }}
/>
<Button onClick={addCustom} variant="outlined" size="small">Add</Button>
</Stack>
)}
{value.length > 0 && (
<Box>
<Typography variant="caption" color="text.secondary" sx={{ display: 'block', mb: 0.5 }}>
Active:
</Typography>
<Stack direction="row" spacing={1} flexWrap="wrap" useFlexGap>
{value.map((v) => (
<Chip
key={v}
label={v}
onDelete={() => removeActive(v)}
aria-label={`remove ${v}`}
color="primary"
variant="outlined"
size="small"
/>
))}
</Stack>
</Box>
)}
</Box>
);
};
export default CategoricalRule;
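The selection logic in this component is pure and can be reasoned about apart from React. The sketch below restates it as standalone functions under assumed names (`toggleValue`, `addCustomValue`); it is not a proposal to restructure the component, only a way to check the intended behavior, including the CV-shape validator pattern used in the tests.

```typescript
// Validator pattern from the tests: syllables of C/V letters joined by hyphens.
const CV_SHAPE = /^[CV]+(-[CV]+)*$/;

// Toggle a preset: remove it when already selected, append it otherwise.
function toggleValue(current: string[], item: string): string[] {
  return current.indexOf(item) !== -1
    ? current.filter((v) => v !== item)
    : [...current, item];
}

// Add a custom entry. Returns the next selection, or null when the input
// is blank or rejected by the validator. Duplicates leave the selection as-is.
function addCustomValue(
  current: string[],
  input: string,
  validator?: (s: string) => boolean,
): string[] | null {
  const trimmed = input.trim();
  if (!trimmed) return null;
  if (validator && !validator(trimmed)) return null;
  if (current.indexOf(trimmed) !== -1) return current;
  return [...current, trimmed];
}
```

Both functions return new arrays instead of mutating their input, which is what a controlled component's onChange contract expects.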
- [ ] Step 5: Run test to verify it passes
npm test -- CategoricalRule
Expected: all 4 tests PASS.
- [ ] Step 6: Commit
git add packages/web/frontend/src/components/shared/CategoricalRule.tsx \
packages/web/frontend/src/test/CategoricalRule.test.tsx
git commit -m "feat(frontend): CategoricalRule — reusable chip + custom-input picker
Generic primitive for multi-select OR-semantics categorical filters.
Used for cv_shape; reusable for future categorical filters (e.g., POS).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
Task 9: Frontend SimilarToRule component¶
Files:
- Create: packages/web/frontend/src/components/shared/SimilarToRule.tsx
- Create: packages/web/frontend/src/test/SimilarToRule.test.tsx
- [ ] Step 1: Write the failing test
// packages/web/frontend/src/test/SimilarToRule.test.tsx
import { describe, expect, it, vi } from 'vitest';
import { render, screen, fireEvent } from '@testing-library/react';
import SimilarToRule from '../components/shared/SimilarToRule';
describe('SimilarToRule', () => {
it('renders preset chips and applies preset on click', () => {
const onChange = vi.fn();
render(
<SimilarToRule
value={{
word: '',
weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
threshold: 0.85,
position: 'all',
syllable_count: 1,
}}
onChange={onChange}
/>,
);
fireEvent.click(screen.getByText('Rhymes'));
expect(onChange).toHaveBeenCalled();
const arg = onChange.mock.calls.pop()[0];
expect(arg.weights.onset).toBe(0.0);
expect(arg.position).toBe('final');
});
it('updates anchor word on input', () => {
const onChange = vi.fn();
render(
<SimilarToRule
value={{
word: '',
weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
threshold: 0.85,
position: 'all',
syllable_count: 1,
}}
onChange={onChange}
/>,
);
fireEvent.change(screen.getByLabelText(/anchor word/i), { target: { value: 'snake' } });
expect(onChange).toHaveBeenLastCalledWith(expect.objectContaining({ word: 'snake' }));
});
it('exposes advanced sliders only when disclosure expanded', () => {
const onChange = vi.fn();
render(
<SimilarToRule
value={{
word: 'snake',
weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
threshold: 0.85,
position: 'all',
syllable_count: 1,
}}
onChange={onChange}
/>,
);
// Advanced is collapsed by default; sliders not visible
expect(screen.queryByLabelText(/onset weight/i)).not.toBeInTheDocument();
fireEvent.click(screen.getByText(/advanced/i));
expect(screen.getByLabelText(/onset weight/i)).toBeInTheDocument();
});
});
- [ ] Step 2: Run test to verify it fails
npm test -- SimilarToRule
Expected: FAIL — component does not exist.
- [ ] Step 3: Implement the component
Create packages/web/frontend/src/components/shared/SimilarToRule.tsx. Lift the preset definitions, component-weight slider block, position/count select pair, and labeled-bucket threshold select from PhonologicalSimilarityTool.tsx lines 47-305. Refactor into a controlled component:
import React, { useState } from 'react';
import {
Box, TextField, Stack, Chip, FormControl, InputLabel, Select, MenuItem,
Paper, Slider, Typography, Button, Accordion, AccordionSummary, AccordionDetails,
} from '@mui/material';
import { ExpandMore as ExpandMoreIcon, Refresh as ResetIcon } from '@mui/icons-material';
export interface SimilarToValue {
word: string;
weights: { onset: number; nucleus: number; coda: number };
threshold: number;
position: 'all' | 'initial' | 'final' | 'medial';
syllable_count: number;
}
interface PresetConfig {
name: string;
weights: SimilarToValue['weights'];
position: SimilarToValue['position'];
syllable_count: number;
}
const PRESETS: PresetConfig[] = [
{ name: 'Rhymes', weights: { onset: 0.0, nucleus: 0.5, coda: 0.5 }, position: 'final', syllable_count: 1 },
{ name: 'Alliteration', weights: { onset: 1.0, nucleus: 0.5, coda: 0.0 }, position: 'initial', syllable_count: 1 },
{ name: 'Assonance', weights: { onset: 0.0, nucleus: 1.0, coda: 0.0 }, position: 'all', syllable_count: 1 },
{ name: 'Consonance', weights: { onset: 0.5, nucleus: 0.0, coda: 0.5 }, position: 'all', syllable_count: 1 },
{ name: 'Balanced', weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 }, position: 'all', syllable_count: 1 },
];
const THRESHOLD_OPTIONS = [
{ value: 0.95, label: 'Very High (0.95)' },
{ value: 0.85, label: 'High (0.85)' },
{ value: 0.75, label: 'Medium (0.75)' },
{ value: 0.65, label: 'Low (0.65)' },
{ value: 0.50, label: 'Very Low (0.50)' },
];
export interface SimilarToRuleProps {
value: SimilarToValue;
onChange: (next: SimilarToValue) => void;
}
const SimilarToRule: React.FC<SimilarToRuleProps> = ({ value, onChange }) => {
const update = (patch: Partial<SimilarToValue>) => onChange({ ...value, ...patch });
const applyPreset = (preset: PresetConfig) => {
onChange({
...value,
weights: preset.weights,
position: preset.position,
syllable_count: preset.syllable_count,
});
};
const matchesPreset = (preset: PresetConfig) =>
value.weights.onset === preset.weights.onset &&
value.weights.nucleus === preset.weights.nucleus &&
value.weights.coda === preset.weights.coda &&
value.position === preset.position &&
value.syllable_count === preset.syllable_count;
return (
<Paper variant="outlined" sx={{ p: 2 }}>
<Stack spacing={2}>
<Typography variant="subtitle2">Similar to anchor word</Typography>
<Typography variant="caption" color="text.secondary">
Empty anchor = rule inactive.
</Typography>
<TextField
label="Anchor word"
value={value.word}
onChange={(e) => update({ word: e.target.value })}
size="small"
placeholder="e.g., cat, snake, computer"
fullWidth
/>
<Box>
<Typography variant="body2" gutterBottom>Preset</Typography>
<Stack direction="row" spacing={1} flexWrap="wrap" useFlexGap>
{PRESETS.map((preset) => (
<Chip
key={preset.name}
label={preset.name}
onClick={() => applyPreset(preset)}
color={matchesPreset(preset) ? 'primary' : 'default'}
sx={{ mb: 1 }}
/>
))}
</Stack>
</Box>
<FormControl size="small" fullWidth>
<InputLabel>Match strength</InputLabel>
<Select
value={value.threshold}
label="Match strength"
onChange={(e) => update({ threshold: e.target.value as number })}
>
{THRESHOLD_OPTIONS.map((opt) => (
<MenuItem key={opt.value} value={opt.value}>{opt.label}</MenuItem>
))}
</Select>
</FormControl>
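{/* Unmount collapsed content so the Advanced sliders are absent from the DOM until expanded; the disclosure test relies on this. On MUI versions without slotProps.transition, use TransitionProps={{ unmountOnExit: true }} instead. */}
<Accordion variant="outlined" disableGutters slotProps={{ transition: { unmountOnExit: true } }}>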
<AccordionSummary expandIcon={<ExpandMoreIcon />}>
<Typography variant="body2">Advanced (component weights + position)</Typography>
</AccordionSummary>
<AccordionDetails>
<Stack spacing={2}>
<Stack direction="row" spacing={2}>
<FormControl size="small" fullWidth>
<InputLabel>Position</InputLabel>
<Select
value={value.position}
label="Position"
onChange={(e) => update({ position: e.target.value as SimilarToValue['position'] })}
>
<MenuItem value="all">All syllables</MenuItem>
<MenuItem value="final">Final</MenuItem>
<MenuItem value="initial">Initial</MenuItem>
<MenuItem value="medial">Medial</MenuItem>
</Select>
</FormControl>
<FormControl size="small" sx={{ minWidth: 120 }}
disabled={value.position === 'all' || value.position === 'medial'}>
<InputLabel>Count</InputLabel>
<Select
value={value.syllable_count}
label="Count"
onChange={(e) => update({ syllable_count: e.target.value as number })}
>
<MenuItem value={1}>1 syllable</MenuItem>
<MenuItem value={2}>2 syllables</MenuItem>
<MenuItem value={3}>3 syllables</MenuItem>
</Select>
</FormControl>
</Stack>
{(['onset', 'nucleus', 'coda'] as const).map((axis) => (
<Box key={axis}>
<Typography variant="body2" gutterBottom id={`${axis}-weight-label`}>
{axis.charAt(0).toUpperCase() + axis.slice(1)}: {value.weights[axis].toFixed(2)}
</Typography>
<Slider
aria-label={`${axis} weight`}
aria-labelledby={`${axis}-weight-label`}
value={value.weights[axis]}
onChange={(_, v) => update({ weights: { ...value.weights, [axis]: v as number } })}
min={0}
max={1}
step={0.05}
marks={[{ value: 0, label: '0' }, { value: 0.5, label: '0.5' }, { value: 1, label: '1' }]}
valueLabelDisplay="auto"
/>
</Box>
))}
<Box>
<Button
size="small"
startIcon={<ResetIcon />}
onClick={() => applyPreset(PRESETS[0])}
>
Reset to Rhymes
</Button>
</Box>
</Stack>
</AccordionDetails>
</Accordion>
</Stack>
</Paper>
);
};
export default SimilarToRule;
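matchesPreset compares weights with strict equality, which holds when values come straight from a preset or a stepped slider. If weights ever arrive from arithmetic (for example, normalization), a tolerance-based comparison may be safer. The following is a sketch under that assumption; `activePresetName`, `closeTo`, and the 1e-6 tolerance are hypothetical, and only two of the plan's presets are repeated here.

```typescript
interface Weights { onset: number; nucleus: number; coda: number }
interface Preset {
  name: string;
  weights: Weights;
  position: string;
  syllable_count: number;
}

// Two presets copied from the plan; the rest are omitted for brevity.
const SAMPLE_PRESETS: Preset[] = [
  { name: 'Rhymes', weights: { onset: 0.0, nucleus: 0.5, coda: 0.5 }, position: 'final', syllable_count: 1 },
  { name: 'Balanced', weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 }, position: 'all', syllable_count: 1 },
];

// Compare floats within a small tolerance instead of with ===.
const closeTo = (a: number, b: number, eps: number = 1e-6): boolean =>
  Math.abs(a - b) < eps;

// Return the name of the preset the current value matches, or null for a custom mix.
function activePresetName(
  value: { weights: Weights; position: string; syllable_count: number },
): string | null {
  const hit = SAMPLE_PRESETS.filter((p) =>
    closeTo(value.weights.onset, p.weights.onset) &&
    closeTo(value.weights.nucleus, p.weights.nucleus) &&
    closeTo(value.weights.coda, p.weights.coda) &&
    value.position === p.position &&
    value.syllable_count === p.syllable_count,
  )[0];
  return hit ? hit.name : null;
}
```

With strict equality, an onset of `0.1 + 0.2 - 0.3` would not highlight the Rhymes chip even though it displays as 0.00; the tolerance avoids that mismatch.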
- [ ] Step 4: Run tests to verify they pass
npm test -- SimilarToRule
Expected: all 3 tests PASS.
- [ ] Step 5: Commit
git add packages/web/frontend/src/components/shared/SimilarToRule.tsx \
packages/web/frontend/src/test/SimilarToRule.test.tsx
git commit -m "feat(frontend): SimilarToRule — composable similarity rule
Lifts preset chips, labeled-threshold bucket select, and component-
weight sliders from PhonologicalSimilarityTool into a controlled
component. Advanced disclosure hides the per-axis sliders + position
controls; preset chips cover ~95% of clinical intent.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
Task 10: usePropertyMetadata → ?surface=platform¶
Files:
- Modify: packages/web/frontend/src/hooks/usePropertyMetadata.tsx:44-52
- Modify: packages/web/frontend/src/services/apiClient.ts — getPropertyMetadata accepts optional surface arg
- Optional: smoke test if there's an existing hook test
- [ ] Step 1: Find and update the API client method
grep -n "getPropertyMetadata" packages/web/frontend/src/services/apiClient.ts
Locate the existing method (typically named getPropertyMetadata). Update it to accept an optional surface parameter:
async getPropertyMetadata(opts?: { surface?: 'platform' }): Promise<PropertyCategory[]> {
const url = opts?.surface
? `${API_BASE}/api/property-metadata?surface=${opts.surface}`
: `${API_BASE}/api/property-metadata`;
const res = await fetch(url);
if (!res.ok) throw new Error(`property-metadata failed: ${res.status}`);
return res.json();
}
(Exact field names depend on the existing apiClient shape — preserve the method-call pattern already in use.)
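The URL selection is the only new logic in this method, so it can be isolated and checked on its own. A minimal sketch, assuming a hypothetical helper name `propertyMetadataUrl` and treating the base URL as a plain string:

```typescript
// Build the property-metadata URL; the surface query param is added only
// when a surface is requested, so the unparameterized route stays the default.
function propertyMetadataUrl(apiBase: string, surface?: 'platform'): string {
  const path = `${apiBase}/api/property-metadata`;
  return surface ? `${path}?surface=${encodeURIComponent(surface)}` : path;
}
```

Keeping the no-argument call unchanged means existing callers of the method continue to receive the full researcher-grade list.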
- [ ] Step 2: Update the hook
Edit packages/web/frontend/src/hooks/usePropertyMetadata.tsx:50:
const [categories, ranges] = await Promise.all([
api.getPropertyMetadata({ surface: 'platform' }),
api.getPropertyRanges(),
]);
- [ ] Step 3: Verify the build still typechecks
cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm run build
Expected: build succeeds.
- [ ] Step 4: Commit
git add packages/web/frontend/src/hooks/usePropertyMetadata.tsx \
packages/web/frontend/src/services/apiClient.ts
git commit -m "feat(frontend): usePropertyMetadata calls ?surface=platform
Curated platform property surface (14 props across 4 groups) is the
default for in-app consumers. Researcher consumers continue to hit
the unparam'd route via API directly.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
Task 11: Restructure Builder.tsx to 5 accordions + wire new rules¶
Files:
- Modify: packages/web/frontend/src/components/Builder.tsx (heavy rewrite of the state shape + accordion structure)
- Modify: packages/web/frontend/src/services/apiClient.ts — extend WordSearchRequest with cv_shape: string[] and similar_to: SimilarToValue; expose Word.cv_shape and Word.similarity
- [ ] Step 1: Extend WordSearchRequest and Word types
Edit packages/web/frontend/src/services/apiClient.ts:77-85:
export interface WordSearchRequest {
patterns?: Pattern[];
filters?: WordFilterRequest;
exclude_phonemes?: string[];
cv_shape?: string[];
similar_to?: {
word: string;
weights: { onset: number; nucleus: number; coda: number };
threshold: number;
position: 'all' | 'initial' | 'final' | 'medial';
syllable_count: number;
};
sort_by?: string;
sort_order?: 'asc' | 'desc';
limit?: number;
offset?: number;
}
Find the Word interface in the same file and add (preserving optionality):
export interface Word {
// ... existing fields
cv_shape?: string | null;
similarity?: number;
}
- [ ] Step 2: Restructure Builder.tsx state
Edit packages/web/frontend/src/components/Builder.tsx. The current state holds patterns, filters (object id→[min,max]), excludePhonemeInput. Extend it to hold:
const [cvShapes, setCvShapes] = useState<string[]>([]);
const [similarTo, setSimilarTo] = useState<SimilarToValue>({
word: '',
weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
threshold: 0.85,
position: 'all',
syllable_count: 1,
});
Add imports near the top:
import CategoricalRule from './shared/CategoricalRule';
import SimilarToRule, { type SimilarToValue } from './shared/SimilarToRule';
- [ ] Step 3: Update the search payload assembly
Find the existing search request build (the handleSearch / handleSubmit function). Extend the payload:
const request: WordSearchRequest = {
patterns: nonEmptyPatterns,
filters: filterPayload,
exclude_phonemes: parsedExclusions.length ? parsedExclusions : undefined,
cv_shape: cvShapes.length ? cvShapes : undefined,
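// Send the trimmed anchor; the server lowercases the word but does not trim it.
similar_to: similarTo.word.trim() ? { ...similarTo, word: similarTo.word.trim() } : undefined,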
limit: 200,
};
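The two new fields follow an "omit when inactive" convention: an empty chip selection or a blank anchor yields undefined, which JSON.stringify drops from the request body. The sketch below isolates that rule; `buildRuleFields` is a hypothetical name, and it trims the anchor word before sending on the assumption that the server does no trimming of its own.

```typescript
interface SimilarToValue {
  word: string;
  weights: { onset: number; nucleus: number; coda: number };
  threshold: number;
  position: 'all' | 'initial' | 'final' | 'medial';
  syllable_count: number;
}

// Produce only the new request fields; inactive rules become undefined.
function buildRuleFields(cvShapes: string[], similarTo: SimilarToValue) {
  const anchor = similarTo.word.trim();
  return {
    cv_shape: cvShapes.length > 0 ? cvShapes : undefined,
    similar_to: anchor ? { ...similarTo, word: anchor } : undefined,
  };
}
```

Spreading the result into the request object keeps the wire format identical to today's requests whenever neither rule is in use.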
- [ ] Step 4: Build the per-section property helpers
Inside the Builder component (after usePropertyMetadata is destructured at the top of the function), derive per-section property lists from the platform metadata. The platform metadata returns five categories (phonological_complexity, lexical, semantic, affective, developmental_frequency); the frontend regroups them into four sections by merging lexical and developmental_frequency into a single "Age Appropriateness" surface:
const propsByCategory = useMemo(() => {
const find = (id: string) => categories.find((c) => c.id === id)?.properties ?? [];
// Word Shape: numeric props (syllable_count, phoneme_count, wcm_score). cv_shape is categorical and rendered separately.
const wordShape = find('phonological_complexity').filter((p) => p.kind !== 'categorical');
// Age Appropriateness: aoa (lexical) + 5 freq_age_* headlines (developmental_frequency).
const ageAppropriateness = [...find('lexical'), ...find('developmental_frequency')];
// Imagery & Familiarity: concreteness + familiarity.
const imageryFamiliarity = find('semantic');
// Emotional Tone: valence + arousal.
const emotionalTone = find('affective');
return { wordShape, ageAppropriateness, imageryFamiliarity, emotionalTone };
}, [categories]);
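The regrouping can be checked outside React by lifting the memo body into a plain function. This is a sketch with assumed minimal types: `groupForBuilder` is a hypothetical name, and the sample data uses only property ids mentioned in this plan; the real metadata carries more fields and more properties.

```typescript
interface PropertyDef { id: string; kind: 'numeric' | 'categorical' }
interface PropertyCategory { id: string; properties: PropertyDef[] }

// Same grouping as the memo above, as a pure function of the categories.
function groupForBuilder(categories: PropertyCategory[]) {
  const find = (id: string): PropertyDef[] => {
    const match = categories.filter((c) => c.id === id)[0];
    return match ? match.properties : [];
  };
  return {
    // Categorical props (cv_shape) are rendered separately, so drop them here.
    wordShape: find('phonological_complexity').filter((p) => p.kind !== 'categorical'),
    ageAppropriateness: [...find('lexical'), ...find('developmental_frequency')],
    imageryFamiliarity: find('semantic'),
    emotionalTone: find('affective'),
  };
}

// Illustrative sample only; not the real metadata payload.
const sampleCategories: PropertyCategory[] = [
  { id: 'phonological_complexity', properties: [
    { id: 'syllable_count', kind: 'numeric' },
    { id: 'cv_shape', kind: 'categorical' },
  ] },
  { id: 'lexical', properties: [{ id: 'aoa', kind: 'numeric' }] },
  { id: 'developmental_frequency', properties: [{ id: 'freq_age_adult', kind: 'numeric' }] },
];
```

A category missing from the response simply yields an empty section, so the accordion renders without sliders instead of throwing.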
- [ ] Step 5: Render the five accordion sections
Replace the existing accordion list in the return statement with these five sections. Reuse <PropertySlider> for numeric props (existing component); <CategoricalRule> and <SimilarToRule> are the new primitives:
return (
<Box>
<Stack spacing={{ xs: 1.5, sm: 2 }}>
{/* 1. Phoneme rules (default open) */}
<Accordion defaultExpanded>
<AccordionSummary expandIcon={<ExpandMoreIcon />}>
<Typography variant="h6" sx={{ fontSize: { xs: '1rem', sm: '1.25rem' } }}>
Phoneme rules
</Typography>
</AccordionSummary>
<AccordionDetails sx={{ px: { xs: 1.5, sm: 2 }, py: { xs: 1, sm: 2 } }}>
<Stack spacing={2}>
{/* Patterns block — keep existing pattern-builder UI verbatim (the
Paper-wrapped pattern rows starting at the current Builder.tsx
line ~236, including the IPA keyboard picker buttons). */}
{patternsBlock}
{/* Exclude phonemes block — keep existing TextField + parse logic verbatim. */}
{excludePhonemesBlock}
{/* New: similarity rule. Empty anchor = inactive. */}
<SimilarToRule value={similarTo} onChange={setSimilarTo} />
</Stack>
</AccordionDetails>
</Accordion>
{/* 2. Word Shape (default open) */}
<Accordion defaultExpanded>
<AccordionSummary expandIcon={<ExpandMoreIcon />}>
<Typography variant="h6">Word Shape</Typography>
</AccordionSummary>
<AccordionDetails>
<Stack spacing={2}>
{propsByCategory.wordShape.map((prop) => (
<PropertySlider
key={prop.id}
prop={prop}
value={filters[prop.id]}
range={ranges[prop.id]}
onChange={(v) => handleFilterChange(prop.id, v)}
/>
))}
<CategoricalRule
label="CV shape"
presets={['V', 'CV', 'VC', 'CVC', 'CCV', 'CCVC', 'CVCC', 'CCVCC', 'CV-CV', 'CV-CVC', 'CCV-CV']}
value={cvShapes}
onChange={setCvShapes}
allowCustom
customValidator={(s) => /^[CV]+(-[CV]+)*$/.test(s)}
/>
</Stack>
</AccordionDetails>
</Accordion>
{/* 3. Age Appropriateness (collapsed) */}
<Accordion>
<AccordionSummary expandIcon={<ExpandMoreIcon />}>
<Typography variant="h6">Age Appropriateness</Typography>
</AccordionSummary>
<AccordionDetails>
<Stack spacing={2}>
{propsByCategory.ageAppropriateness.map((prop) => (
<PropertySlider
key={prop.id}
prop={prop}
value={filters[prop.id]}
range={ranges[prop.id]}
onChange={(v) => handleFilterChange(prop.id, v)}
/>
))}
</Stack>
</AccordionDetails>
</Accordion>
{/* 4. Imagery & Familiarity (collapsed) */}
<Accordion>
<AccordionSummary expandIcon={<ExpandMoreIcon />}>
<Typography variant="h6">Imagery & Familiarity</Typography>
</AccordionSummary>
<AccordionDetails>
<Stack spacing={2}>
{propsByCategory.imageryFamiliarity.map((prop) => (
<PropertySlider
key={prop.id}
prop={prop}
value={filters[prop.id]}
range={ranges[prop.id]}
onChange={(v) => handleFilterChange(prop.id, v)}
/>
))}
</Stack>
</AccordionDetails>
</Accordion>
{/* 5. Emotional Tone (collapsed) */}
<Accordion>
<AccordionSummary expandIcon={<ExpandMoreIcon />}>
<Typography variant="h6">Emotional Tone</Typography>
</AccordionSummary>
<AccordionDetails>
<Stack spacing={2}>
{propsByCategory.emotionalTone.map((prop) => (
<PropertySlider
key={prop.id}
prop={prop}
value={filters[prop.id]}
range={ranges[prop.id]}
onChange={(v) => handleFilterChange(prop.id, v)}
/>
))}
</Stack>
</AccordionDetails>
</Accordion>
{/* Existing Build button + results section stay below */}
{buildButtonAndResults}
</Stack>
</Box>
);
Notes:
- patternsBlock, excludePhonemesBlock, and buildButtonAndResults are intermediate variables: extract them from the existing render JSX into named const bindings just before the return. This keeps the diff scoped to "restructure the accordion list" rather than touching the inner pattern/exclude UI.
- useMemo import: add useMemo to the existing React, { useState } import line at the top of the file.
- PropertyDef exposes the kind field added in Task 3. PropertySlider needs no change even if it does not understand kind, because categorical props are filtered out at the helper layer (Step 4), so PropertySlider only sees numeric ones.
- [ ] Step 6: Handle clearAll for the new state
Update handleClear to also reset cvShapes to [] and similarTo to its default object.
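A minimal sketch of the reset, assuming the new state is held as plain objects. The similarTo field names mirror the similar_to payload used in the Task 13 smoke test; the default values, the BuilderRuleState shape, and the function names are hypothetical.

```typescript
// Field names mirror the similar_to payload from the Task 13 smoke test;
// the default values and the BuilderRuleState shape are assumptions.
interface SimilarToState {
  word: string;
  weights: { onset: number; nucleus: number; coda: number };
  threshold: number;
  position: string;
  syllable_count: number | null;
}

interface BuilderRuleState {
  cvShapes: string[];
  similarTo: SimilarToState;
}

// A factory rather than a shared constant: every reset gets fresh objects,
// so later edits to the state cannot mutate the defaults.
function defaultSimilarTo(): SimilarToState {
  return {
    word: '',
    weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
    threshold: 0.75,
    position: 'all',
    syllable_count: null,
  };
}

// What handleClear would assign to the two new pieces of state.
function clearedRuleState(): BuilderRuleState {
  return { cvShapes: [], similarTo: defaultSimilarTo() };
}
```

Inside handleClear this would translate to something like setCvShapes([]) and setSimilarTo(defaultSimilarTo()), assuming setters with those names exist.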
- [ ] Step 7: Smoke test in dev
cd /Users/jneumann/Repos/PhonoLex/packages/web/workers && npx wrangler dev &
WORKER_PID=$!
cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm run dev &
FRONTEND_PID=$!
Open the local frontend (default http://localhost:5173), navigate to Word Lists, and verify:
- Phoneme rules expanded by default; pattern + exclude + similar-to all visible
- Word Shape expanded by default; numeric sliders + CV shape chip picker visible
- Age Appropriateness collapsed; expanding shows 6 sliders (aoa + 5 freq_age)
- Imagery & Familiarity collapsed; expanding shows 2 sliders
- Emotional Tone collapsed; expanding shows 2 sliders
- Running a search returns results; clicking a similar-to preset rebuilds the query
Kill processes:
kill $WORKER_PID $FRONTEND_PID
- [ ] Step 8: Commit
git add packages/web/frontend/src/components/Builder.tsx \
packages/web/frontend/src/services/apiClient.ts
git commit -m "feat(frontend): Builder.tsx — 5-accordion SLP-curated surface
Phoneme rules (default open) holds patterns + exclude + new SimilarToRule.
Word Shape (default open) holds 3 sliders + new CategoricalRule for cv_shape.
Age Appropriateness / Imagery & Familiarity / Emotional Tone collapsed
secondary groups. Search request now carries cv_shape and similar_to.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
Task 12: Delete PhonologicalSimilarityTool.tsx; update App copy
Files:
- Delete: packages/web/frontend/src/components/tools/PhonologicalSimilarityTool.tsx
- Modify: packages/web/frontend/src/App_new.tsx:74,117 — update Word Lists description; remove stale PHON-117 comment
- [ ] Step 1: Verify nothing else imports the tool
grep -rn "PhonologicalSimilarityTool" packages/web/frontend/src
Expected: only the file itself and a single comment in App_new.tsx:117. If any active imports remain, fix them before deleting.
- [ ] Step 2: Update App_new.tsx
Edit packages/web/frontend/src/App_new.tsx:74:
{
id: 'wordLists',
icon: <BuildIcon />,
title: 'Word Lists',
description: 'Build word lists for therapy and research. Filter by word shape, age-appropriateness, imagery, and emotional tone; compose with phoneme patterns, exclusions, and sound similarity.',
color: TOOL_COLORS.wordLists,
section: 'build',
},
Find line ~117 with the // PHON-117: Sound Similarity is being consolidated into Word Lists. comment and remove it (the consolidation is now done).
- [ ] Step 3: Delete the orphaned tool file
git rm packages/web/frontend/src/components/tools/PhonologicalSimilarityTool.tsx
- [ ] Step 4: Verify the frontend still builds
cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm run build
Expected: build succeeds, no missing-import errors.
- [ ] Step 5: Run frontend tests
npm test
Expected: all tests pass.
- [ ] Step 6: Commit
# The git rm in Step 3 already staged the deletion; re-adding the removed path would fail with a pathspec error.
git add packages/web/frontend/src/App_new.tsx
git commit -m "chore(frontend): delete PhonologicalSimilarityTool; update Word Lists copy
Consolidated into the new SimilarToRule inside Builder.tsx. Tool was
already unregistered in TOOL_DEFS; this finishes the migration and
updates the Word Lists tool card description to reflect the unified
composable surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
Task 13: Regenerate parquet + D1 seed; verify end-to-end
Files: none modified; regenerate artifacts.
- [ ] Step 1: Regenerate data/runtime/words.parquet and siblings
cd /Users/jneumann/Repos/PhonoLex
uv run python packages/data/scripts/build_runtime_parquet.py
Expected: completes without errors; new cv_shape column appears in data/runtime/words.parquet; freq_age_adult is populated for words with wpm_b4 or wpm_b5 coverage.
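As an illustration of the expected freq_age_adult values, the aggregation can be sketched as below. The real derivation lives in the Python/Polars pipeline; this TypeScript version only illustrates the arithmetic. Averaging over whichever bands are present is an assumption inferred from the "wpm_b4 or wpm_b5 coverage" wording, and the function name is hypothetical.

```typescript
// Mean of the available adult-band values; null when neither band has data.
// Averaging over only the present bands is an assumption, not confirmed pipeline behavior.
function freqAgeAdult(wpmB4: number | null, wpmB5: number | null): number | null {
  const present = [wpmB4, wpmB5].filter((v): v is number => v !== null);
  if (present.length === 0) return null;
  return present.reduce((sum, v) => sum + v, 0) / present.length;
}

console.log(freqAgeAdult(10, 20));     // both bands present
console.log(freqAgeAdult(10, null));   // only wpm_b4 present
console.log(freqAgeAdult(null, null)); // no adult coverage
```

If the pipeline instead requires both bands, words covered by only one band would come out null, and the coverage count in Step 2 would be lower than this sketch suggests.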
- [ ] Step 2: Spot-check the parquet output
uv run python -c "
import polars as pl
df = pl.read_parquet('data/runtime/words.parquet')
print('Columns include cv_shape:', 'cv_shape' in df.columns)
print('Columns include freq_age_adult:', 'freq_age_adult' in df.columns)
print('Sample cv_shape rows:')
print(df.select(['word', 'cv_shape']).head(20))
print('cv_shape value counts (top 20):')
print(df.group_by('cv_shape').len().sort('len', descending=True).head(20))
print('freq_age_adult coverage:')
print(df.select(pl.col('freq_age_adult').is_not_null().sum().alias('with_adult')))
"
Expected: cv_shape populated for all has_phonology rows; common shapes like CVC, CV-CVC, CCV-CVC appear with reasonable counts; freq_age_adult populated for the bulk of words.
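To make "reasonable" cv_shape values concrete, here is a sketch of how a CV skeleton such as CV-CVC could be derived and validated. The actual derivation is in the Python pipeline; the SyllableLike shape, the one-V-per-nucleus rule, and both names below are assumptions made for illustration.

```typescript
// Hypothetical syllable shape: onset and coda are lists of consonant symbols,
// and each syllable has exactly one nucleus (assumed to map to a single V).
interface SyllableLike {
  onset: string[];
  nucleus: string;
  coda: string[];
}

// Build the skeleton per syllable, then join syllables with hyphens,
// matching the CVC / CV-CVC / CCV-CVC examples above.
function cvShape(syllables: SyllableLike[]): string {
  return syllables
    .map((s) => 'C'.repeat(s.onset.length) + 'V' + 'C'.repeat(s.coda.length))
    .join('-');
}

// Every well-formed value should consist of C/V runs separated by single hyphens.
const CV_SHAPE_PATTERN = /^[CV]+(-[CV]+)*$/;

const oneSyllable = cvShape([{ onset: ['k'], nucleus: 'a', coda: ['t'] }]);
const twoSyllables = cvShape([
  { onset: ['p'], nucleus: 'a', coda: [] },
  { onset: ['p'], nucleus: 'e', coda: ['r'] },
]);
console.log(oneSyllable, twoSyllables); // prints "CVC CV-CVC"
```

A pattern check like CV_SHAPE_PATTERN is a cheap way to catch malformed values in the spot-check, for example a stray empty syllable producing a double hyphen.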
- [ ] Step 3: Regenerate d1-seed.sql
cd packages/web/workers
uv run python scripts/export-to-d1.py
Expected: regenerates scripts/d1-seed.sql with cv_shape column added to the words table CREATE statement and freq_age_adult + percentile in word_properties / word_percentiles.
- [ ] Step 4: Apply migration to local D1
npx wrangler d1 execute phonolex --local --file scripts/d1-seed.sql
Expected: completes; tables are dropped and recreated with the new columns.
- [ ] Step 5: End-to-end smoke
npx wrangler dev &
WORKER_PID=$!
sleep 3
Then:
curl -s 'http://localhost:8787/api/property-metadata?surface=platform' | jq '.[].properties[].id' | sort
Expected output (sorted ids): the 14 curated platform property ids.
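The same check can be scripted rather than eyeballed. The sketch below assumes the response is an array of groups, each with a properties array of objects carrying an id, which is all the jq path above implies; the type and function names are hypothetical.

```typescript
// Response shape inferred from the jq path `.[].properties[].id`;
// any other fields on the real response are ignored here.
interface PropertyGroup {
  properties: { id: string }[];
}

// Flatten all groups into one sorted list of property ids.
function platformPropertyIds(groups: PropertyGroup[]): string[] {
  const ids: string[] = [];
  for (const group of groups) {
    for (const prop of group.properties) ids.push(prop.id);
  }
  return ids.sort();
}

// Return a list of problems; an empty list means the surface looks right.
function checkPlatformSurface(groups: PropertyGroup[], expectedCount = 14): string[] {
  const ids = platformPropertyIds(groups);
  const problems: string[] = [];
  if (ids.length !== expectedCount) {
    problems.push(`expected ${expectedCount} ids, got ${ids.length}`);
  }
  if (new Set(ids).size !== ids.length) problems.push('duplicate ids');
  return problems;
}
```

Calling checkPlatformSurface on the parsed JSON body should return an empty array once all 14 properties are tagged platform_visible.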
curl -s -X POST http://localhost:8787/api/words/search \
-H 'Content-Type: application/json' \
-d '{"cv_shape": ["CVC"], "limit": 10}' | jq '.items | map(.cv_shape) | unique'
Expected: ["CVC"].
curl -s -X POST http://localhost:8787/api/words/search \
-H 'Content-Type: application/json' \
-d '{"similar_to": {"word":"cat","weights":{"onset":0.33,"nucleus":0.33,"coda":0.33},"threshold":0.75,"position":"all","syllable_count":1}, "cv_shape":["CVC"], "limit": 10}' \
| jq '.items | map({word, cv_shape, similarity})'
Expected: items are sorted by similarity in descending order, and every item has cv_shape == "CVC".
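Both expectations for the combined query can be checked mechanically. This is a sketch under the assumption that each item exposes word, cv_shape, and similarity, as the jq projection above suggests; the function name is hypothetical.

```typescript
// Item fields taken from the jq projection `{word, cv_shape, similarity}`.
interface SearchItem {
  word: string;
  cv_shape: string;
  similarity: number;
}

// Verify the two smoke-test expectations: every item matches an allowed
// cv_shape, and similarity never increases from one item to the next.
function checkSimilarResults(items: SearchItem[], allowedShapes: string[]): string[] {
  const problems: string[] = [];
  let previous: SearchItem | undefined;
  for (const item of items) {
    if (allowedShapes.indexOf(item.cv_shape) === -1) {
      problems.push(`${item.word}: unexpected cv_shape ${item.cv_shape}`);
    }
    if (previous !== undefined && previous.similarity < item.similarity) {
      problems.push(`${item.word}: similarity out of order`);
    }
    previous = item;
  }
  return problems;
}
```

Ties in similarity are allowed here; only a strict increase counts as a failure, since the plan does not specify a tie-break order.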
Kill the worker:
kill $WORKER_PID
- [ ] Step 6: Commit the regenerated artifacts
git add data/runtime/words.parquet \
packages/web/workers/scripts/d1-seed.sql
git commit -m "data: regenerate parquet + d1-seed with cv_shape and freq_age_adult
Re-emit data/runtime/words.parquet via build_runtime_parquet.py and
packages/web/workers/scripts/d1-seed.sql via export-to-d1.py to pick
up the new derived columns. End-to-end smoke (cv_shape filter +
similar_to intersection) passes locally.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"
- [ ] Step 7: Push to origin
git push origin feature/phon-116-naturalness-scorer
Expected: push succeeds; CI runs.
Closing checklist
- [ ] All 13 tasks committed
- [ ] Worker test suite passes (cd packages/web/workers && npm test)
- [ ] Data test suite passes (uv run python -m pytest packages/data/tests/ -v)
- [ ] Frontend builds (cd packages/web/frontend && npm run build)
- [ ] Frontend tests pass (cd packages/web/frontend && npm test)
- [ ] Local end-to-end smoke (Task 13 Step 5) returns the expected curated property set + filters apply
- [ ] CI green on the pushed branch