Word Lists SLP Curation Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Curate the platform Word Lists UI from 30 properties down to 14 under SLP-language labels, fold Sound Similarity in as a composable rule, add cv_shape (CV-skeleton categorical filter) and freq_age_adult (adult-band aggregated headline) — while keeping the API researcher-grade.

Architecture:
- Data layer adds two derivations: cv_shape (string, derived from existing Syllable objects) and freq_age_adult (numeric headline, mean of wpm_b4 and wpm_b5).
- PropertyDef gains platform_visible: boolean and kind: 'numeric' | 'categorical'. /api/property-metadata gains ?surface=platform; 14 properties get tagged.
- /api/words/search gains an optional similar_to block (server-side intersection with similarity ranking) and a cv_shape filter list.
- Frontend restructures Builder.tsx into 5 accordions, lifts preset chips + labeled-threshold pattern from PhonologicalSimilarityTool.tsx into a new reusable SimilarToRule, adds reusable CategoricalRule for cv_shape, then deletes the orphan PhonologicalSimilarityTool.

Tech Stack: Python 3.12 (pipeline, pytest), Polars (parquet), TypeScript (Hono on Cloudflare Workers, vitest, React + MUI), D1 (SQLite).

Spec: docs/superpowers/specs/2026-05-14-word-lists-slp-curation-design.md (commit e38a62c0).

Branch: feature/phon-116-naturalness-scorer (general catch-all per user direction; pile commits here).

File map

Created:
- packages/web/frontend/src/components/shared/SimilarToRule.tsx
- packages/web/frontend/src/components/shared/CategoricalRule.tsx
- packages/data/tests/test_cv_shape.py
- packages/data/tests/test_freq_age_adult.py
- packages/web/workers/test/routes/meta.surface-platform.test.ts
- packages/web/workers/test/routes/words.search-cv-shape.test.ts
- packages/web/workers/test/routes/words.search-similar.test.ts

Modified:
- packages/data/src/phonolex_data/pipeline/words.py: add cv_shape derivation + freq_age_adult aggregation
- packages/data/src/phonolex_data/pipeline/schema.py: add cv_shape: str | None + freq_age_adult: float | None to WordRecord
- packages/data/src/phonolex_data/runtime/schema.py: add cv_shape to _CORE_WORDS_COLUMNS
- packages/data/src/phonolex_data/runtime/emit_d1_sql.py: ensure cv_shape ships on words table; freq_age_adult percentile flows like siblings
- packages/data/src/phonolex_data/runtime/store.py: freq_age_adult percentile mapping
- packages/data/src/phonolex_data/pipeline/derived.py: freq_age_adult percentile inclusion
- packages/web/workers/src/config/properties.ts: interface extension, new PropertyDefs, platform_visible flags, getPlatformCategories()
- packages/web/workers/src/lib/queries.ts: cv_shape recognised as words-table column; partitionFilterColumns extended for IN-list categorical filter
- packages/web/workers/src/lib/wordResponse.ts: add cv_shape to the response mapping, only if fields are whitelisted explicitly (Task 6, Step 6)
- packages/web/workers/src/routes/meta.ts: ?surface=platform query param handling
- packages/web/workers/src/routes/words.ts: cv_shape filter clause + similar_to intersection logic
- packages/web/workers/src/types.ts: add similar_to to WordSearchBody, cv_shape field on WordRow
- packages/web/workers/scripts/config.py: mirror property metadata changes
- packages/web/frontend/src/services/apiClient.ts: extend WordSearchRequest with similar_to + cv_shape; expose Word.cv_shape + Word.similarity
- packages/web/frontend/src/hooks/usePropertyMetadata.tsx: call ?surface=platform
- packages/web/frontend/src/components/Builder.tsx: restructure to 5 accordions; wire CategoricalRule + SimilarToRule
- packages/web/frontend/src/App_new.tsx: update Word Lists description; remove obsolete PHON-117 comment

Deleted:
- packages/web/frontend/src/components/tools/PhonologicalSimilarityTool.tsx

Task 1: Add `cv_shape` derivation in the data pipeline

Files:
- Create: packages/data/tests/test_cv_shape.py
- Modify: packages/data/src/phonolex_data/pipeline/schema.py:48 (add column to WordRecord)
- Modify: packages/data/src/phonolex_data/pipeline/words.py:199-210 (set cv_shape in returned WordRecord)
- Modify: packages/data/src/phonolex_data/runtime/schema.py:48 (add to _CORE_WORDS_COLUMNS)

[ ] Step 1: Write the failing test

# packages/data/tests/test_cv_shape.py
from phonolex_data.phonology.syllabification import (
    syllabify, PhonemeWithStress,
)
from phonolex_data.pipeline.words import _build_phonology_record


def _cv_shape_for(phonemes_with_stress):
    """Helper: syllabify then derive CV shape via the same code path the pipeline uses."""
    syls = syllabify(phonemes_with_stress)
    parts = []
    for s in syls:
        parts.append("C" * len(s.onset) + "V" + "C" * len(s.coda))
    return "-".join(parts)


def test_cv_shape_monosyllabic_cvc():
    """cat /k.æ.t/ → CVC"""
    phs = [
        PhonemeWithStress("k", None),
        PhonemeWithStress("æ", 1),
        PhonemeWithStress("t", None),
    ]
    assert _cv_shape_for(phs) == "CVC"


def test_cv_shape_initial_cluster():
    """spring /s.p.ɹ.ɪ.ŋ/ → CCCVC"""
    phs = [
        PhonemeWithStress("s", None),
        PhonemeWithStress("p", None),
        PhonemeWithStress("ɹ", None),
        PhonemeWithStress("ɪ", 1),
        PhonemeWithStress("ŋ", None),
    ]
    assert _cv_shape_for(phs) == "CCCVC"


def test_cv_shape_disyllabic():
    """kitten /k.ɪ.t.ə.n/ → CVC-VC"""
    phs = [
        PhonemeWithStress("k", None),
        PhonemeWithStress("ɪ", 1),
        PhonemeWithStress("t", None),
        PhonemeWithStress("ə", 0),
        PhonemeWithStress("n", None),
    ]
    assert _cv_shape_for(phs) == "CVC-VC"


def test_cv_shape_diphthong_counts_as_single_v():
    """boat /b.oʊ.t/ → CVC (oʊ is one nucleus phoneme)"""
    phs = [
        PhonemeWithStress("b", None),
        PhonemeWithStress("oʊ", 1),
        PhonemeWithStress("t", None),
    ]
    assert _cv_shape_for(phs) == "CVC"


def test_cv_shape_pipeline_emits_field():
    """The pipeline's _build_phonology_record sets WordRecord.cv_shape."""
    phono_data = {
        "word": "cat",
        "phonemes": ["k", "æ", "t"],
        "stress_pattern": [None, 1, None],
        "ipa": "kæt",
    }
    rec = _build_phonology_record(phono_data)
    assert rec.cv_shape == "CVC"

[ ] Step 2: Run test to verify it fails

cd /Users/jneumann/Repos/PhonoLex && uv run python -m pytest packages/data/tests/test_cv_shape.py -v

Expected: 1 of 5 tests fails. The 4 pure-helper tests pass if the helper logic is correct; test_cv_shape_pipeline_emits_field fails with AttributeError: 'WordRecord' object has no attribute 'cv_shape'.

[ ] Step 3: Add cv_shape to WordRecord

Edit packages/data/src/phonolex_data/pipeline/schema.py. Find the structural-cols block in the WordRecord dataclass (around the wcm_score field, line ~48 — look for wcm_score: int | None = None and add the new line directly below):

    wcm_score: int | None = None
    cv_shape: str | None = None  # CV skeleton from syllabification ("CVC", "CCVCC", "CV-CVC", ...)

[ ] Step 4: Derive cv_shape in _build_phonology_record

Edit packages/data/src/phonolex_data/pipeline/words.py. Find the existing WordRecord(...) construction in _build_phonology_record (starts at line ~199). Just before the return statement, compute the shape from the already-built syllables_obj:

    # Existing code computes syllables_obj and wcm above; now derive CV shape.
    cv_shape_parts = []
    for s in syllables_obj:
        cv_shape_parts.append("C" * len(s.onset) + "V" + "C" * len(s.coda))
    cv_shape = "-".join(cv_shape_parts) if cv_shape_parts else None

    return WordRecord(
        word=phono_data.get("word", ""),
        has_phonology=True,
        ipa=ipa,
        phonemes=phonemes,
        phoneme_count=len(phonemes),
        syllables=syllables,
        syllable_count=len(syllables),
        initial_phoneme=phonemes[0] if phonemes else None,
        final_phoneme=phonemes[-1] if phonemes else None,
        wcm_score=wcm,
        cv_shape=cv_shape,
    )
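The derivation above can be exercised in isolation. The following is a minimal sketch, assuming a Syllable object that exposes onset and coda as phoneme lists and a single-phoneme nucleus; the Syllable dataclass and the cv_shape function here are hypothetical stand-ins, not the pipeline's own definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Syllable:
    """Hypothetical stand-in for the pipeline's Syllable object."""
    onset: list[str] = field(default_factory=list)
    nucleus: str = ""
    coda: list[str] = field(default_factory=list)


def cv_shape(syllables: list[Syllable]) -> str | None:
    """Return the CV skeleton, or None when there are no syllables."""
    # One "C" per onset/coda consonant, exactly one "V" per nucleus.
    parts = ["C" * len(s.onset) + "V" + "C" * len(s.coda) for s in syllables]
    return "-".join(parts) if parts else None


kitten = [Syllable(["k"], "ɪ", ["t"]), Syllable([], "ə", ["n"])]
spring = [Syllable(["s", "p", "ɹ"], "ɪ", ["ŋ"])]

print(cv_shape(kitten))  # CVC-VC
print(cv_shape(spring))  # CCCVC
print(cv_shape([]))      # None
```

Because the nucleus always contributes exactly one "V", a diphthong stored as a single nucleus phoneme (such as oʊ in boat) still yields CVC.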

[ ] Step 5: Register cv_shape in the runtime schema

Edit packages/data/src/phonolex_data/runtime/schema.py. Find the _CORE_WORDS_COLUMNS dict (line 37). Add directly after "wcm_score": pl.Int32,:

    "wcm_score": pl.Int32,
    "cv_shape": pl.Utf8,  # CV skeleton derived from syllabification; categorical filter for SLP word-shape queries

[ ] Step 6: Run tests to verify they pass

uv run python -m pytest packages/data/tests/test_cv_shape.py -v

Expected: all 5 tests PASS.

[ ] Step 7: Commit

git add packages/data/tests/test_cv_shape.py \
        packages/data/src/phonolex_data/pipeline/schema.py \
        packages/data/src/phonolex_data/pipeline/words.py \
        packages/data/src/phonolex_data/runtime/schema.py
git commit -m "feat(data): derive cv_shape column from existing syllabification

CV skeleton string emitted per word from the existing Syllable objects.
Lives on the words table (first non-numeric platform property).
Examples: cat→CVC, spring→CCCVC, kitten→CVC-VC, boat→CVC.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

Task 2: Add `freq_age_adult` headline aggregation

Files:
- Create: packages/data/tests/test_freq_age_adult.py
- Modify: packages/data/src/phonolex_data/pipeline/schema.py:197-200 (add freq_age_adult field next to siblings)
- Modify: packages/data/src/phonolex_data/pipeline/words.py:399-415 (compute headline)
- Modify: packages/data/src/phonolex_data/pipeline/derived.py:32 (include in percentile set)
- Modify: packages/data/src/phonolex_data/runtime/store.py:67-70 (percentile mapping)
- Modify: packages/data/src/phonolex_data/runtime/emit_d1_sql.py:70 (treat as a freq_age headline for D1 placement)

[ ] Step 1: Write the failing test

# packages/data/tests/test_freq_age_adult.py
from phonolex_data.pipeline.schema import WordRecord
from phonolex_data.pipeline.words import _agg_mean  # not importable until Step 4 promotes it to module scope (see note below)


def test_freq_age_adult_mean_of_b4_b5():
    """freq_age_adult = mean(wpm_b4, wpm_b5), None treated as 0 unless ALL are None."""
    rec = WordRecord(word="example")
    rec.wpm_b4 = 100.0
    rec.wpm_b5 = 200.0
    # The aggregation step runs over records in _build_words; emulate it here.
    rec.freq_age_adult = _agg_mean([rec.wpm_b4, rec.wpm_b5])
    assert rec.freq_age_adult == 150.0


def test_freq_age_adult_none_when_all_none():
    rec = WordRecord(word="missing")
    rec.wpm_b4 = None
    rec.wpm_b5 = None
    rec.freq_age_adult = _agg_mean([rec.wpm_b4, rec.wpm_b5])
    assert rec.freq_age_adult is None


def test_freq_age_adult_partial_coverage():
    """One None counts as 0 per sibling-aggregation semantics."""
    rec = WordRecord(word="partial")
    rec.wpm_b4 = 80.0
    rec.wpm_b5 = None
    rec.freq_age_adult = _agg_mean([rec.wpm_b4, rec.wpm_b5])
    assert rec.freq_age_adult == 40.0

Note: _agg_mean is currently defined locally inside _build_words at packages/data/src/phonolex_data/pipeline/words.py:390. Promote it to module scope (move the helper out of the closure, keeping its name) as part of Step 4 below so it's importable in tests.

[ ] Step 2: Run test to verify it fails

uv run python -m pytest packages/data/tests/test_freq_age_adult.py -v

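Expected: the test module fails at import with ImportError: cannot import name '_agg_mean', so none of the 3 tests pass. An AttributeError: 'WordRecord' object has no attribute 'freq_age_adult' can only surface once the import succeeds, and only if WordRecord rejects undeclared attributes.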

[ ] Step 3: Add freq_age_adult to WordRecord

Edit packages/data/src/phonolex_data/pipeline/schema.py. Find the existing freq_age headline declarations (line ~197). Add freq_age_adult directly after freq_age_12y:

    freq_age_2y: float | None = None
    freq_age_5y: float | None = None
    freq_age_8y: float | None = None
    freq_age_12y: float | None = None
    freq_age_adult: float | None = None  # mean(wpm_b4, wpm_b5); high-school + college reading bands

Also update the comment block directly above (the # freq_age_12y = mean(...) lines around line 191) by adding:

    #   freq_age_adult = mean(wpm_b4, wpm_b5)

[ ] Step 4: Promote _agg_mean and add the adult aggregation

Edit packages/data/src/phonolex_data/pipeline/words.py. Move _agg_mean from its closure inside _build_words to module scope (just above _build_words):

def _agg_mean(values):
    """Mean treating None as 0; returns None only if ALL inputs are None.

    Used for the PHON-88 freq_age_* headline aggregations.
    """
    if all(v is None for v in values):
        return None
    cleaned = [v if v is not None else 0.0 for v in values]
    return sum(cleaned) / len(cleaned)

Inside _build_words, remove the local nested _agg_mean definition (lines ~390-395 in the snapshot read during planning), and add the new aggregation line directly after the existing record.freq_age_12y = _agg_mean([...]) block (around line 412):

        record.freq_age_12y = _agg_mean([
            record.wpm_childes_input_108_144mo, record.wpm_b3,
        ])
        record.freq_age_adult = _agg_mean([
            record.wpm_b4, record.wpm_b5,
        ])
        if record.freq_age_2y is not None:
            n_with_2y += 1

[ ] Step 5: Run tests to verify they pass

uv run python -m pytest packages/data/tests/test_freq_age_adult.py -v

Expected: all 3 tests PASS.

[ ] Step 6: Include in percentile set and runtime mapping

Edit packages/data/src/phonolex_data/pipeline/derived.py. Find the FREQ_AGE list (line 32) and add "freq_age_adult":

    "freq_age_2y", "freq_age_5y", "freq_age_8y", "freq_age_12y", "freq_age_adult",

Edit packages/data/src/phonolex_data/runtime/store.py. Find the percentile mapping tuples around line 67–70 and add the adult entry:

    ("freq_age_2y", "freq_age_2y_percentile"),
    ("freq_age_5y", "freq_age_5y_percentile"),
    ("freq_age_8y", "freq_age_8y_percentile"),
    ("freq_age_12y", "freq_age_12y_percentile"),
    ("freq_age_adult", "freq_age_adult_percentile"),

Edit packages/data/src/phonolex_data/runtime/emit_d1_sql.py. Find _FREQ_AGE_HEADLINES (line 70) and add freq_age_adult:

_FREQ_AGE_HEADLINES = {"freq_age_2y", "freq_age_5y", "freq_age_8y", "freq_age_12y", "freq_age_adult"}

[ ] Step 7: Run the full data package test suite

uv run python -m pytest packages/data/tests/ -v

Expected: all existing tests still pass; the new freq_age_adult tests pass.

[ ] Step 8: Commit

git add packages/data/tests/test_freq_age_adult.py \
        packages/data/src/phonolex_data/pipeline/schema.py \
        packages/data/src/phonolex_data/pipeline/words.py \
        packages/data/src/phonolex_data/pipeline/derived.py \
        packages/data/src/phonolex_data/runtime/store.py \
        packages/data/src/phonolex_data/runtime/emit_d1_sql.py
git commit -m "feat(data): add freq_age_adult headline (mean of wpm_b4 + wpm_b5)

Adult-band development-frequency aggregation parallel to the existing
freq_age_2y/5y/8y/12y headlines. Promotes _agg_mean to module scope so
the aggregation helper is testable.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

Task 3: Extend `PropertyDef` in the worker config (TypeScript)

Files:
- Modify: packages/web/workers/src/config/properties.ts:7-29 (interface) + new entries + getPlatformCategories()

[ ] Step 1: Extend the interface and add getPlatformCategories

Edit packages/web/workers/src/config/properties.ts. Update the interface block at the top:

export interface PropertyDef {
  id: string;
  label: string;
  short_label: string;
  source: string;
  description: string;
  scale: string;
  interpretation: string;
  display_format: string;
  filterable: boolean;
  slider_step: number;
  use_log_scale: boolean;
  is_integer: boolean;
  /**
   * When false, the prop is hidden from /api/property-metadata entirely
   * (still ships to D1 for round-trip via /api/words/:word). Defaults true.
   */
  surfaced?: boolean;
  /**
   * When true, the prop appears in the curated platform UI surface
   * (returned by GET /api/property-metadata?surface=platform). Defaults
   * undefined (= API-only). Only opt in explicit clinical workhorses.
   */
  platform_visible?: boolean;
  /**
   * Renderer/filter kind. 'numeric' (default) → range slider, min/max.
   * 'categorical' → chip-list multi-select, exact-or-IN match.
   */
  kind?: 'numeric' | 'categorical';
}

[ ] Step 2: Add the cv_shape PropertyDef inside the Phonological Complexity category

In the same file, find the phonological_complexity category (line 39) and append to its properties array — the WCM entry currently closes the array; insert cv_shape directly after it (keeping the existing trailing comma on wcm_score):

      {
        id: 'wcm_score', label: 'Word Complexity Measure', short_label: 'WCM',
        // ... existing fields
      },
      {
        id: 'cv_shape', label: 'CV Shape', short_label: 'Shape',
        source: 'Derived from CMU syllabification',
        description: 'Consonant–vowel skeleton; one CV-letter per phoneme, dash between syllables (e.g., CVC, CVC-VC, CCVCC).',
        scale: 'string',
        interpretation: 'Categorical match; supports multi-select OR within rule.',
        display_format: 'string',
        filterable: true,
        slider_step: 0,
        use_log_scale: false,
        is_integer: false,
        platform_visible: true,
        kind: 'categorical',
      },

[ ] Step 3: Add freq_age_adult to DEV_FREQ_HEADLINES

In the same file, find DEV_FREQ_HEADLINES (line 368). Append a fifth entry after the freq_age_12y block (keep the trailing comma on the existing last entry):

  {
    id: 'freq_age_12y', label: 'Developmental Freq (~12 yrs)', short_label: 'DF12y',
    // ... existing fields
  },
  {
    id: 'freq_age_adult', label: 'Developmental Freq (~Adult)', short_label: 'DFAdult',
    source: 'PhonoLex Developmental Frequency',
    description:
      'Aggregated words-per-million across FineWeb-Edu high-school + college reading bands ' +
      '(mean of wpm_b4 + wpm_b5; missing treated as 0).',
    scale: '0-50000',
    interpretation: 'Higher = more frequent in adult-level reading material',
    display_format: '.2f',
    filterable: true,
    slider_step: 10,
    use_log_scale: true,
    is_integer: false,
    surfaced: true,
    platform_visible: true,
  },

[ ] Step 4: Tag the remaining 12 platform-visible properties

In the same file, set platform_visible: true on each of these PropertyDef records (cv_shape + freq_age_adult already tagged in Steps 2–3):

File location (approximate line)	id	Where to add
line 43 (`syllable_count`)	`syllable_count`	add `platform_visible: true,` to the record
line 51 (`phoneme_count`)	`phoneme_count`	same
line 58 (`wcm_score`)	`wcm_score`	same
line 151 (`aoa`)	`aoa`	same
line 174 (`familiarity`)	`familiarity`	same
line 182 (`concreteness`)	`concreteness`	same
line 196 (`valence`)	`valence`	same
line 204 (`arousal`)	`arousal`	same
line 370 (`freq_age_2y`)	`freq_age_2y`	same
line 381 (`freq_age_5y`)	`freq_age_5y`	same
line 393 (`freq_age_8y`)	`freq_age_8y`	same
line 404 (`freq_age_12y`)	`freq_age_12y`	same

Example pattern for any one of them:

      {
        id: 'syllable_count', label: 'Syllable Count', short_label: 'Syl',
        source: 'CMU Pronouncing Dictionary',
        description: 'Number of syllables', scale: '1-8',
        interpretation: 'More syllables = more complex',
        display_format: '.0f', filterable: true, slider_step: 1,
        use_log_scale: false, is_integer: true,
        platform_visible: true,
      },

[ ] Step 5: Add getPlatformCategories()

Edit the same file. After getSurfacedCategories() (currently ends at line 500), add:

/**
 * Property categories filtered to platform_visible=true. Used by
 * /api/property-metadata?surface=platform to drive the curated SLP UI.
 * Categories that become empty after filtering are dropped.
 */
export function getPlatformCategories(): PropertyCategory[] {
  const out: PropertyCategory[] = [];
  for (const cat of PROPERTY_CATEGORIES) {
    const props = cat.properties.filter(
      (p) => p.surfaced !== false && p.platform_visible === true,
    );
    if (props.length > 0) out.push({ id: cat.id, label: cat.label, properties: props });
  }
  return out;
}
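The filtering rule in getPlatformCategories() can also be sketched in Python, which may be useful when checking the mirrored config in Task 4. The function name platform_categories, the plain-dict representation, and the category ids and labels in the sample data are assumptions made for illustration.

```python
def platform_categories(categories: list[dict]) -> list[dict]:
    """Keep only properties with platform_visible=True; drop empty categories."""
    out = []
    for cat in categories:
        props = [
            p for p in cat["properties"]
            # surfaced defaults to true, so only an explicit False hides a prop.
            if p.get("surfaced") is not False and p.get("platform_visible") is True
        ]
        if props:
            out.append({"id": cat["id"], "label": cat["label"], "properties": props})
    return out


sample = [
    {"id": "shape", "label": "Word Shape", "properties": [
        {"id": "cv_shape", "platform_visible": True},
        {"id": "phono_prob_avg"},  # API-only: flag undefined
    ]},
    {"id": "research", "label": "Research", "properties": [
        {"id": "iconicity"},
    ]},
]

result = platform_categories(sample)
print([c["id"] for c in result])                 # ['shape']
print([p["id"] for p in result[0]["properties"]])  # ['cv_shape']
```

Note that a property with platform_visible set but surfaced explicitly false is still excluded, mirroring the two-part predicate in the TypeScript version.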

[ ] Step 6: Typecheck the worker package

cd /Users/jneumann/Repos/PhonoLex/packages/web/workers && npx tsc --noEmit

Expected: no type errors. (FILTERABLE_PROPERTIES may pick up cv_shape automatically because it sets filterable: true. That is intended; downstream validation in partitionFilterColumns is extended in Task 6.)

[ ] Step 7: Commit

git add packages/web/workers/src/config/properties.ts
git commit -m "feat(api): extend PropertyDef with platform_visible + kind; add cv_shape and freq_age_adult

PropertyDef gains platform_visible (curated platform UI flag) and
kind (numeric|categorical, default numeric). 14 properties tagged
platform_visible across Word Shape / Age Appropriateness / Imagery &
Familiarity / Emotional Tone groups. getPlatformCategories helper
returns the curated subset.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

Task 4: Mirror PropertyDef changes in `scripts/config.py`

Files:
- Modify: packages/web/workers/scripts/config.py (mirror cv_shape PropertyDef, freq_age_adult PropertyDef, platform_visible flags)

[ ] Step 1: Read the existing pattern

sed -n '180,210p' packages/web/workers/scripts/config.py
sed -n '450,540p' packages/web/workers/scripts/config.py

Confirm the Python dataclass / dict pattern used; mirror style exactly.

[ ] Step 2: Mirror the cv_shape PropertyDef

Edit packages/web/workers/scripts/config.py. Find the WCM PropertyDef entry (search id="wcm_score"). Add a cv_shape PropertyDef directly after it using the same dataclass/dict shape as the surrounding entries. Required fields (matching the TS):

PropertyDef(
    id="cv_shape",
    label="CV Shape",
    short_label="Shape",
    source="Derived from CMU syllabification",
    description=(
        "Consonant-vowel skeleton; one CV-letter per phoneme, dash between "
        "syllables (e.g., CVC, CVC-VC, CCVCC)."
    ),
    scale="string",
    interpretation="Categorical match; supports multi-select OR within rule.",
    display_format="string",
    filterable=True,
    slider_step=0,
    use_log_scale=False,
    is_integer=False,
    platform_visible=True,
    kind="categorical",
),

If the existing PropertyDef dataclass in this file doesn't declare platform_visible and kind, extend it (add platform_visible: bool | None = None and kind: str = "numeric" to the dataclass declaration) and ensure both new fields are emitted through whatever serialization path downstream consumers use.
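If the file does use a dataclass, the extension could look roughly like the sketch below. The field list is trimmed for brevity and the use of dataclasses.asdict as the serialization path is an assumption; the real config.py may declare more fields and serialize differently.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class PropertyDef:
    """Trimmed, hypothetical mirror of the TS PropertyDef interface."""
    id: str
    label: str
    filterable: bool = True
    surfaced: bool | None = None
    platform_visible: bool | None = None  # None = API-only
    kind: str = "numeric"                 # "numeric" or "categorical"


cv = PropertyDef(id="cv_shape", label="CV Shape",
                 platform_visible=True, kind="categorical")
wcm = PropertyDef(id="wcm_score", label="Word Complexity Measure")

# asdict includes the new fields, so they reach any dict/JSON consumer.
print(asdict(cv)["kind"])               # categorical
print(asdict(wcm)["kind"])              # numeric
print(asdict(wcm)["platform_visible"])  # None
```

Giving both fields defaults keeps every existing PropertyDef(...) call valid without edits, which is why they are placed after the required fields.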

[ ] Step 3: Mirror the freq_age_adult PropertyDef

Find the freq_age_12y PropertyDef (line ~524) and add a freq_age_adult entry directly below with the same shape:

PropertyDef(
    id="freq_age_adult",
    label="Developmental Freq (~Adult)",
    short_label="DFAdult",
    source="PhonoLex Developmental Frequency",
    description=(
        "Aggregated words-per-million across FineWeb-Edu high-school + "
        "college reading bands (mean of wpm_b4 + wpm_b5; missing treated as 0)."
    ),
    scale="0-50000",
    interpretation="Higher = more frequent in adult-level reading material",
    display_format=".2f",
    filterable=True,
    slider_step=10,
    use_log_scale=True,
    is_integer=False,
    surfaced=True,
    platform_visible=True,
),

Also extend the comment block above freq_age_2y to add # freq_age_adult = mean(wpm_b4, wpm_b5) for parity with words.py.

[ ] Step 4: Tag the 12 remaining platform-visible properties

Add platform_visible=True, to the PropertyDef records in scripts/config.py for: syllable_count, phoneme_count, wcm_score, aoa, familiarity, concreteness, valence, arousal, freq_age_2y, freq_age_5y, freq_age_8y, freq_age_12y.
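Since the tagged set must land on exactly 14 ids in both the TS and Python configs, a small drift check can catch a missed flag. This is a sketch only: the helper name platform_id_drift and the list-of-dicts input shape are hypothetical, while the 14 ids come from this plan.

```python
EXPECTED_PLATFORM_IDS = frozenset({
    "syllable_count", "phoneme_count", "wcm_score", "cv_shape",
    "aoa", "freq_age_2y", "freq_age_5y", "freq_age_8y", "freq_age_12y",
    "freq_age_adult", "concreteness", "familiarity", "valence", "arousal",
})


def platform_id_drift(props: list[dict]) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) ids relative to the curated 14."""
    tagged = {p["id"] for p in props if p.get("platform_visible") is True}
    return set(EXPECTED_PLATFORM_IDS - tagged), set(tagged - EXPECTED_PLATFORM_IDS)


# Sample input: one expected flag forgotten, one stray flag added.
props = [{"id": i, "platform_visible": True}
         for i in EXPECTED_PLATFORM_IDS if i != "freq_age_adult"]
props.append({"id": "iconicity", "platform_visible": True})

missing, unexpected = platform_id_drift(props)
print(len(EXPECTED_PLATFORM_IDS))  # 14
print(sorted(missing))             # ['freq_age_adult']
print(sorted(unexpected))          # ['iconicity']
```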

[ ] Step 5: Run the existing data package tests to verify the config still parses

uv run python -m pytest packages/data/tests/ -v

Expected: all data tests still pass (no behavior change in the pipeline yet beyond Tasks 1–2).

[ ] Step 6: Commit

git add packages/web/workers/scripts/config.py
git commit -m "feat(config): mirror PropertyDef extension in scripts/config.py

Python config picks up platform_visible + kind fields, cv_shape +
freq_age_adult PropertyDef records, and 12 platform_visible flags.
Mirrors the TS config in packages/web/workers/src/config/properties.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

Task 5: `/api/property-metadata?surface=platform`

Files:
- Create: packages/web/workers/test/routes/meta.surface-platform.test.ts
- Modify: packages/web/workers/src/routes/meta.ts:47-52

[ ] Step 1: Write the failing test

// packages/web/workers/test/routes/meta.surface-platform.test.ts
import { afterAll, beforeAll, describe, expect, it } from 'vitest';
import { unstable_dev } from 'wrangler';
import type { UnstableDevWorker } from 'wrangler';

describe('GET /api/property-metadata?surface=platform', () => {
  let worker: UnstableDevWorker;

  beforeAll(async () => {
    worker = await unstable_dev('src/index.ts', {
      experimental: { disableExperimentalWarning: true },
      local: true,
    });
  });

  afterAll(async () => {
    await worker.stop();
  });

  it('returns only platform_visible categories', async () => {
    const res = await worker.fetch('/api/property-metadata?surface=platform');
    expect(res.status).toBe(200);
    const cats = (await res.json()) as Array<{ id: string; properties: Array<{ id: string }> }>;

    const ids = cats.flatMap((c) => c.properties.map((p) => p.id));
    // Must include the 14 curated platform properties
    expect(ids).toEqual(expect.arrayContaining([
      'syllable_count', 'phoneme_count', 'wcm_score', 'cv_shape',
      'aoa', 'freq_age_2y', 'freq_age_5y', 'freq_age_8y', 'freq_age_12y', 'freq_age_adult',
      'concreteness', 'familiarity',
      'valence', 'arousal',
    ]));
    // Must NOT include researcher-only props
    expect(ids).not.toContain('phono_prob_avg');
    expect(ids).not.toContain('iconicity');
    expect(ids).not.toContain('semd_topic');
    expect(ids).not.toContain('log_frequency');
  });

  it('returns the full surfaced set when no surface param is given', async () => {
    const res = await worker.fetch('/api/property-metadata');
    expect(res.status).toBe(200);
    const cats = (await res.json()) as Array<{ id: string; properties: Array<{ id: string }> }>;
    const ids = cats.flatMap((c) => c.properties.map((p) => p.id));
    // Full surfaced set includes researcher-grade props
    expect(ids).toContain('phono_prob_avg');
    expect(ids).toContain('iconicity');
  });
});

[ ] Step 2: Run test to verify it fails

cd /Users/jneumann/Repos/PhonoLex/packages/web/workers && npm test -- meta.surface-platform

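Expected: the first test FAILS (the route does not yet handle ?surface=platform, so it returns the full set, including researcher-only properties). The second test exercises existing behavior and should already pass.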

[ ] Step 3: Update the meta route

Edit packages/web/workers/src/routes/meta.ts. Update the import (line 7) and the property-metadata handler (lines 47-52):

import { getSurfacedCategories, getPlatformCategories } from '../config/properties';

meta.get('/property-metadata', (c) => {
  const surface = c.req.query('surface');
  if (surface === 'platform') {
    return c.json(getPlatformCategories());
  }
  return c.json(getSurfacedCategories());
});

[ ] Step 4: Run test to verify it passes

npm test -- meta.surface-platform

Expected: both tests PASS.

[ ] Step 5: Commit

git add packages/web/workers/src/routes/meta.ts \
        packages/web/workers/test/routes/meta.surface-platform.test.ts
git commit -m "feat(api): /api/property-metadata?surface=platform for curated UI

Researcher consumers continue to hit the unparam'd route and get the
full surfaced set. The frontend opts into ?surface=platform for the
curated 14-property platform surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

Task 6: `cv_shape` filter on `/api/words/search`

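Files:
- Create: packages/web/workers/test/routes/words.search-cv-shape.test.ts
- Modify: packages/web/workers/src/types.ts (add cv_shape?: string[] to WordSearchBody and cv_shape?: string | null to WordRow)
- Modify: packages/web/workers/src/lib/queries.ts:20-33 (add cv_shape to WORDS_TABLE_COLS)
- Modify: packages/web/workers/src/routes/words.ts:93-145 (accept and apply the filter)
- Modify (if needed): packages/web/workers/src/lib/wordResponse.ts (add cv_shape only if fields are whitelisted explicitly; see Step 6)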

[ ] Step 1: Write the failing test

Add a new test case to (or create) packages/web/workers/test/routes/words.search-cv-shape.test.ts:

import { describe, expect, it, beforeAll, afterAll } from 'vitest';
import { unstable_dev } from 'wrangler';
import type { UnstableDevWorker } from 'wrangler';

describe('POST /api/words/search with cv_shape filter', () => {
  let worker: UnstableDevWorker;

  beforeAll(async () => {
    worker = await unstable_dev('src/index.ts', {
      experimental: { disableExperimentalWarning: true },
      local: true,
    });
  });
  afterAll(async () => { await worker.stop(); });

  it('returns only words whose cv_shape matches', async () => {
    const res = await worker.fetch('/api/words/search', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ cv_shape: ['CVC'], limit: 50 }),
    });
    expect(res.status).toBe(200);
    const json = (await res.json()) as { items: Array<{ cv_shape: string }> };
    expect(json.items.length).toBeGreaterThan(0);
    for (const item of json.items) {
      expect(item.cv_shape).toBe('CVC');
    }
  });

  it('OR-matches across a multi-shape list', async () => {
    const res = await worker.fetch('/api/words/search', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ cv_shape: ['CVC', 'CCVC'], limit: 50 }),
    });
    const json = (await res.json()) as { items: Array<{ cv_shape: string }> };
    for (const item of json.items) {
      expect(['CVC', 'CCVC']).toContain(item.cv_shape);
    }
  });

  it('ignores empty cv_shape array', async () => {
    const res = await worker.fetch('/api/words/search', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ cv_shape: [], limit: 5 }),
    });
    expect(res.status).toBe(200);
  });
});

[ ] Step 2: Run to verify it fails

npm test -- words.search-cv-shape

Expected: FAIL — the filter is unimplemented; items have no cv_shape field.

[ ] Step 3: Update types

Edit packages/web/workers/src/types.ts. Find the WordSearchBody interface and add:

export interface WordSearchBody {
  // ... existing fields
  cv_shape?: string[];          // OR list of CV shapes; empty/undefined = no filter
}

Find the WordRow type and add (string column, nullable):

export interface WordRow {
  // ... existing fields
  cv_shape?: string | null;
}

[ ] Step 4: Register cv_shape as a words-table column

Edit packages/web/workers/src/lib/queries.ts:20. Add 'cv_shape' to WORDS_TABLE_COLS:

const WORDS_TABLE_COLS = new Set([
  'word', 'has_phonology', 'ipa', 'phonemes', 'phonemes_str',
  'syllables', 'phoneme_count', 'syllable_count',
  'initial_phoneme', 'final_phoneme', 'root', 'variants', 'cv_shape',
]);

[ ] Step 5: Apply the filter in the search route

Edit packages/web/workers/src/routes/words.ts. In words.post('/search', ...) (line 93), after the existing exclude_phonemes block (around line 142) and before the WHERE assembly:

  // CV-shape filter — OR within array, words table column
  if (body.cv_shape?.length) {
    const placeholders = body.cv_shape.map(() => '?').join(', ');
    wordsWhere.push(`w.cv_shape IN (${placeholders})`);
    params.push(...body.cv_shape);
  }
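Because D1 is SQLite, the placeholder-per-value IN clause can be tried locally with Python's built-in sqlite3 module. The sketch below assumes a two-column words table and a hypothetical search_by_cv_shape helper; the real route builds a larger query with more clauses.

```python
import sqlite3


def search_by_cv_shape(conn: sqlite3.Connection, shapes: list[str], limit: int = 50):
    """OR-match cv_shape against a list; an empty list applies no filter."""
    where: list[str] = []
    params: list = []
    if shapes:
        # One "?" per value keeps user input out of the SQL text.
        placeholders = ", ".join("?" for _ in shapes)
        where.append(f"w.cv_shape IN ({placeholders})")
        params.extend(shapes)
    sql = "SELECT w.word, w.cv_shape FROM words w"
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += " ORDER BY w.word LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY, cv_shape TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [("cat", "CVC"), ("stop", "CCVC"), ("kitten", "CVC-VC"), ("spring", "CCCVC")],
)

print(search_by_cv_shape(conn, ["CVC", "CCVC"]))  # [('cat', 'CVC'), ('stop', 'CCVC')]
print(len(search_by_cv_shape(conn, [])))          # 4
```

Guarding on a non-empty list matters: an empty IN () is a syntax error in some SQL dialects and matches nothing in SQLite, so skipping the clause is what gives the "empty array means no filter" behavior the third test expects.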

[ ] Step 6: Ensure cv_shape round-trips on the response

Open packages/web/workers/src/lib/wordResponse.ts and confirm rowToWordResponse copies all WordRow fields (it typically does via spread). If it whitelists fields explicitly, add cv_shape to the whitelist:

grep -n "cv_shape\|wcm_score\|syllable_count" packages/web/workers/src/lib/wordResponse.ts

If cv_shape is missing where wcm_score appears, add it next to that field.

[ ] Step 7: Run tests to verify they pass

npm test -- words.search-cv-shape

Expected: all 3 tests PASS.

[ ] Step 8: Commit

git add packages/web/workers/src/types.ts \
        packages/web/workers/src/lib/queries.ts \
        packages/web/workers/src/routes/words.ts \
        packages/web/workers/src/lib/wordResponse.ts \
        packages/web/workers/test/routes/words.search-cv-shape.test.ts
git commit -m "feat(api): cv_shape filter on POST /api/words/search

Accepts cv_shape: string[] with OR semantics within the array;
ANDs with the rest of the query. Recognised as a words-table column
in queries.ts; round-trips through rowToWordResponse.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

Task 7: `similar_to` block on `/api/words/search`¶

Files: - Create: packages/web/workers/test/routes/words.search-similar.test.ts - Modify: packages/web/workers/src/types.ts — add similar_to block to WordSearchBody; add similarity?: number to WordRow/response - Modify: packages/web/workers/src/routes/words.ts:93-214 — intersect with similarity scan when block is present

[ ] Step 1: Write the failing test

// packages/web/workers/test/routes/words.search-similar.test.ts
import { describe, expect, it, beforeAll, afterAll } from 'vitest';
import { unstable_dev } from 'wrangler';
import type { UnstableDevWorker } from 'wrangler';

describe('POST /api/words/search with similar_to block', () => {
  let worker: UnstableDevWorker;

  beforeAll(async () => {
    worker = await unstable_dev('src/index.ts', {
      experimental: { disableExperimentalWarning: true },
      local: true,
    });
  });
  afterAll(async () => { await worker.stop(); });

  it('returns words ranked by similarity to anchor when similar_to provided', async () => {
    const res = await worker.fetch('/api/words/search', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        similar_to: {
          word: 'cat',
          weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
          threshold: 0.7,
          position: 'all',
          syllable_count: 1,
        },
        limit: 20,
      }),
    });
    expect(res.status).toBe(200);
    const json = (await res.json()) as { items: Array<{ word: string; similarity: number }> };
    expect(json.items.length).toBeGreaterThan(0);
    // Each item carries a similarity score >= threshold
    for (const item of json.items) {
      expect(item.similarity).toBeGreaterThanOrEqual(0.7);
    }
    // Results are sorted by similarity descending
    for (let i = 1; i < json.items.length; i++) {
      expect(json.items[i].similarity).toBeLessThanOrEqual(json.items[i - 1].similarity);
    }
  });

  it('intersects similarity with cv_shape filter', async () => {
    const res = await worker.fetch('/api/words/search', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        similar_to: {
          word: 'cat',
          weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
          threshold: 0.6,
          position: 'all',
          syllable_count: 1,
        },
        cv_shape: ['CVC'],
        limit: 50,
      }),
    });
    expect(res.status).toBe(200);
    const json = (await res.json()) as { items: Array<{ word: string; cv_shape: string; similarity: number }> };
    for (const item of json.items) {
      expect(item.cv_shape).toBe('CVC');
      expect(item.similarity).toBeGreaterThanOrEqual(0.6);
    }
  });
});

[ ] Step 2: Run test to verify it fails

npm test -- words.search-similar

Expected: FAIL — the route ignores similar_to; items lack similarity field.

[ ] Step 3: Extend types

Edit packages/web/workers/src/types.ts. Add to WordSearchBody:

export interface WordSearchBody {
  // ... existing fields including cv_shape from Task 6
  similar_to?: {
    word: string;
    weights: { onset: number; nucleus: number; coda: number };
    threshold: number;
    position: 'all' | 'initial' | 'final' | 'medial';
    syllable_count: number;
  };
}

If the response type (e.g., PaginatedWordResponse's item shape) needs similarity, add it there too (likely on the per-item Word shape):

export interface Word {
  // ... existing fields
  similarity?: number;
}

[ ] Step 4: Refactor similarity scan into a reusable helper

Open packages/web/workers/src/routes/similarity.ts. Find the scan loop (lines 107-127). Extract its core into an exported helper that the search route can call. Add (or move) it near the top of the file, below the imports:

/**
 * Score all words against a target anchor; returns { word → similarity } above threshold.
 * Cold-start cache load shared with /similarity/search.
 */
export async function scoreSimilarityScan(
  db: D1Database,
  body: {
    word: string;
    threshold: number;
    weights: { onset: number; nucleus: number; coda: number };
    position: string;
    syllable_count: number;
  },
): Promise<Map<string, number>> {
  await ensureCache(db);
  const word = body.word.toLowerCase();
  const targetSyls = allWordSyllables!.get(word);
  if (!targetSyls) return new Map();
  const targetExtracted = extractSyllables(targetSyls, body.position, body.syllable_count);
  if (!targetExtracted.length) return new Map();
  const out = new Map<string, number>();
  for (const [candidate, candidateSyls] of allWordSyllables!) {
    if (candidate === word) continue;
    const candidateExtracted = extractSyllables(candidateSyls, body.position, body.syllable_count);
    if (!candidateExtracted.length) continue;
    const sim = softLevenshteinSimilarity(
      targetExtracted, candidateExtracted, body.weights, componentMap!, phonemeCache!,
    );
    if (sim >= body.threshold) {
      out.set(candidate, Math.round(sim * 10000) / 10000);
    }
  }
  return out;
}

The existing similarity.post('/search', ...) handler should call this helper instead of running its own scan: replace its inner loop with const scored = await scoreSimilarityScan(c.env.DB, { ... }); and iterate scored to build the response. Step 6 shows the full handler.
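
The shape of the scan (skip the anchor, score each candidate, keep scores at or above the threshold, round to 4 decimals) can be exercised without the cache or D1. The sketch below assumes a pluggable scorer; thresholdScan, positionalOverlap, and toyLexicon are illustrative names, and the toy scorer is a stand-in that is not softLevenshteinSimilarity.

```typescript
// Stand-in scorer type; the real route scores syllables with softLevenshteinSimilarity.
type Scorer<T> = (target: T, candidate: T) => number;

// Scans every entry except the anchor and keeps scores at or above the threshold,
// rounded to 4 decimal places as in the helper above.
function thresholdScan<T>(
  anchor: string,
  lexicon: ReadonlyMap<string, T>,
  score: Scorer<T>,
  threshold: number,
): Map<string, number> {
  const out = new Map<string, number>();
  const key = anchor.toLowerCase();
  const target = lexicon.get(key);
  if (target === undefined) return out; // unknown anchor: empty result, not an error
  lexicon.forEach((value, candidate) => {
    if (candidate === key) return;
    const sim = score(target, value);
    if (sim >= threshold) out.set(candidate, Math.round(sim * 10000) / 10000);
  });
  return out;
}

// Toy scorer for illustration only: share of positions where two phoneme lists agree.
const positionalOverlap: Scorer<string[]> = (a, b) => {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 0;
  let same = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] === b[i]) same++;
  }
  return same / longest;
};

const toyLexicon = new Map<string, string[]>([
  ['cat', ['k', 'æ', 't']],
  ['bat', ['b', 'æ', 't']],
  ['dog', ['d', 'ɔ', 'g']],
]);
const toyScores = thresholdScan('Cat', toyLexicon, positionalOverlap, 0.6);
// toyScores holds a single entry: 'bat' at 0.6667
```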

[ ] Step 5: Wire the intersection in the search route

Edit packages/web/workers/src/routes/words.ts. At the top, import the helper:

import { scoreSimilarityScan } from './similarity';

In words.post('/search', ...), after the existing exclude_phonemes and cv_shape clauses are pushed onto wordsWhere (around line 142 after Task 6's edit), and BEFORE the SQL-assembly + execution block (the const wordsWhereSQL = ... line), insert this complete early-return branch. It does similarity scan → chunked SQL filter with IN (...) per chunk → JS intersection → sort by similarity desc → paginate → fetch full rows → stamp similarity on each:

  // ---- Similarity intersection path (when similar_to is present) ----
  // D1 caps bound parameters at 100 per query; chunking the IN-list at 80 leaves
  // headroom for up to 20 filter binds already collected in params.
  if (body.similar_to) {
    const scoreMap = await scoreSimilarityScan(c.env.DB, body.similar_to);
    if (scoreMap.size === 0) {
      return c.json({
        items: [],
        total: 0,
        offset: Math.max(body.offset ?? 0, 0),
        limit: Math.min(Math.max(body.limit ?? 50, 1), 5000),
      });
    }
    const candidates = Array.from(scoreMap.keys());

    // Build the WHERE without an IN clause; we append a fresh IN per chunk below.
    const wordsWhereSQL = wordsWhere.length ? wordsWhere.join(' AND ') : '1=1';
    const propsWhereSQL = propsWhere.length ? propsWhere.join(' AND ') : '1=1';
    const fromClause = 'FROM words w INNER JOIN word_properties wp ON w.word = wp.word';
    const selectCols = needsMedialPostFilter ? 'w.word, w.phonemes' : 'w.word';

    const filterMatching = new Set<string>();
    for (let i = 0; i < candidates.length; i += 80) {
      const chunk = candidates.slice(i, i + 80);
      const placeholders = chunk.map(() => '?').join(', ');
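      // Keep the IN list last so the positional binds (...params, then ...chunk) line up
      // with the `?` order; assumes params holds the wordsWhere binds, then the propsWhere binds.
      const sql = `SELECT ${selectCols} ${fromClause}
        WHERE ${wordsWhereSQL} AND ${propsWhereSQL} AND w.word IN (${placeholders})`;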
      const { results } = await c.env.DB.prepare(sql)
        .bind(...params, ...chunk)
        .all<{ word: string; phonemes?: string }>();
      for (const row of results) {
        if (needsMedialPostFilter && row.phonemes) {
          const phonemes = JSON.parse(row.phonemes) as string[];
          if (!matchesMedialPattern(phonemes, medialSequences)) continue;
        }
        filterMatching.add(row.word);
      }
    }

    // Intersection, sorted by similarity desc.
    const intersected = candidates.filter((w) => filterMatching.has(w));
    intersected.sort((a, b) => (scoreMap.get(b) ?? 0) - (scoreMap.get(a) ?? 0));

    const offset = Math.max(body.offset ?? 0, 0);
    const limit = Math.min(Math.max(body.limit ?? 50, 1), 5000);
    const page = intersected.slice(offset, offset + limit);

    if (!page.length) {
      return c.json({ items: [], total: intersected.length, offset, limit });
    }
    const rowMap = await fetchMergedWordRows(c.env.DB, page);
    return c.json({
      items: page
        .filter((w) => rowMap.has(w))
        .map((w) => ({
          ...rowToWordResponse(rowMap.get(w)!),
          similarity: scoreMap.get(w) ?? 0,
        })),
      total: intersected.length,
      offset,
      limit,
    });
  }
  // ---- End similarity intersection path; falls through to existing non-similar code below ----

Note: this branch returns early. The existing non-similar code path (count + paginated word list + fetchMergedWordRows, lines ~192-213) is left unchanged and serves all queries where body.similar_to is undefined.
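
The branch above combines chunking the candidate list, intersecting it with the SQL-filtered set, and ranking one page. A minimal sketch of the two pure pieces follows, with no D1 involved; chunkList and rankAndPaginate are illustrative names rather than existing helpers.

```typescript
// Splits a list into fixed-size chunks so each query stays under the bind-parameter cap.
function chunkList<T>(items: readonly T[], size: number): T[][] {
  if (size < 1) throw new Error('chunk size must be >= 1');
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Keeps only candidates that survived the SQL filters, orders them by score
// (highest first), and slices out one page. total counts the whole intersection.
function rankAndPaginate(
  scoreMap: ReadonlyMap<string, number>,
  filterMatching: ReadonlySet<string>,
  offset: number,
  limit: number,
): { page: string[]; total: number } {
  const intersected: string[] = [];
  scoreMap.forEach((_score, word) => {
    if (filterMatching.has(word)) intersected.push(word);
  });
  intersected.sort((a, b) => (scoreMap.get(b) ?? 0) - (scoreMap.get(a) ?? 0));
  return { page: intersected.slice(offset, offset + limit), total: intersected.length };
}

const demoScores = new Map<string, number>([['bat', 0.9], ['hat', 0.95], ['cot', 0.7]]);
const demoMatching = new Set<string>(['bat', 'hat']); // 'cot' failed the SQL filters
const demoPage = rankAndPaginate(demoScores, demoMatching, 0, 1);
// demoPage.page is ['hat'] and demoPage.total is 2
```

Because total is taken from the intersection rather than from the page, clients can paginate through the ranked set with the same offset/limit contract as the non-similar path.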

[ ] Step 6: Refactor /api/similarity/search to use the same helper

Edit packages/web/workers/src/routes/similarity.ts. Find the existing handler (line 70). Replace the inner scan loop (lines 107-127, the for (const [candidate, candidateSyls] of allWordSyllables!) block) with a call to the new scoreSimilarityScan. The handler becomes:

similarity.post('/search', async (c) => {
  const body = await c.req.json<{
    word: string;
    threshold?: number;
    limit?: number;
    onset_weight?: number;
    nucleus_weight?: number;
    coda_weight?: number;
    position?: string;
    syllable_count?: number;
  }>();

  const scoreMap = await scoreSimilarityScan(c.env.DB, {
    word: body.word.toLowerCase(),
    threshold: body.threshold ?? 0.7,
    weights: {
      onset: body.onset_weight ?? 0.33,
      nucleus: body.nucleus_weight ?? 0.33,
      coda: body.coda_weight ?? 0.33,
    },
    position: body.position ?? 'all',
    syllable_count: body.syllable_count ?? 1,
  });

  const limit = Math.min(Math.max(body.limit ?? 50, 1), 500);
  const ranked = Array.from(scoreMap.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);

  if (!ranked.length) return c.json([]);

  const wordNames = ranked.map(([w]) => w);
  const rowMap = await fetchMergedWordRows(c.env.DB, wordNames, {
    requirePhonology: true,
    requireFrequency: true,
  });

  return c.json(
    ranked
      .filter(([w]) => rowMap.has(w))
      .map(([w, sim]) => ({ word: rowToWordResponse(rowMap.get(w)!), similarity: sim })),
  );
});

This preserves the existing endpoint's contract (callers still get [{ word: WordResponse, similarity: number }, ...]) while sharing the scan implementation.

[ ] Step 7: Run tests to verify they pass

npm test -- words.search-similar

Expected: both tests PASS.

[ ] Step 8: Re-run the full worker test suite to confirm no regressions

npm test

Expected: all tests pass.

[ ] Step 9: Commit

git add packages/web/workers/src/types.ts \
        packages/web/workers/src/routes/words.ts \
        packages/web/workers/src/routes/similarity.ts \
        packages/web/workers/test/routes/words.search-similar.test.ts
git commit -m "feat(api): similar_to block on POST /api/words/search

Intersects existing filter+pattern query with a similarity scan; ranks
the intersection by similarity desc and stamps similarity onto each
item. Reuses the soft-Levenshtein helper from /api/similarity/search
(now extracted to scoreSimilarityScan). D1 bind chunking at 80/call.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

Task 8: Frontend `CategoricalRule` component¶

Files: - Create: packages/web/frontend/src/components/shared/CategoricalRule.tsx - Create: packages/web/frontend/src/test/CategoricalRule.test.tsx (or co-located, matching whatever convention Step 1 finds)

[ ] Step 1: Check the test convention

ls packages/web/frontend/src/test/ 2>/dev/null; find packages/web/frontend/src -name '*.test.tsx' | head

Use whatever co-location pattern is already in place. Create the test file in the matching location.

[ ] Step 2: Write the failing test

// packages/web/frontend/src/test/CategoricalRule.test.tsx (or co-located)
import { describe, expect, it, vi } from 'vitest';
import { render, screen, fireEvent } from '@testing-library/react';
import CategoricalRule from '../components/shared/CategoricalRule';

describe('CategoricalRule', () => {
  it('renders preset chips and toggles selection', () => {
    const onChange = vi.fn();
    render(
      <CategoricalRule
        label="CV shape"
        presets={['CV', 'CVC', 'CCV']}
        value={[]}
        onChange={onChange}
        allowCustom={true}
      />,
    );
    expect(screen.getByText('CV')).toBeInTheDocument();
    fireEvent.click(screen.getByText('CVC'));
    expect(onChange).toHaveBeenCalledWith(['CVC']);
  });

  it('accepts custom entries via the Add button', () => {
    const onChange = vi.fn();
    render(
      <CategoricalRule
        label="CV shape"
        presets={['CVC']}
        value={[]}
        onChange={onChange}
        allowCustom={true}
        customValidator={(s) => /^[CV]+(-[CV]+)*$/.test(s)}
      />,
    );
    fireEvent.change(screen.getByLabelText(/custom/i), { target: { value: 'CVCV-CVC' } });
    fireEvent.click(screen.getByText(/add/i));
    expect(onChange).toHaveBeenCalledWith(['CVCV-CVC']);
  });

  it('rejects invalid custom entries when validator provided', () => {
    const onChange = vi.fn();
    render(
      <CategoricalRule
        label="CV shape"
        presets={['CVC']}
        value={[]}
        onChange={onChange}
        allowCustom={true}
        customValidator={(s) => /^[CV]+(-[CV]+)*$/.test(s)}
      />,
    );
    fireEvent.change(screen.getByLabelText(/custom/i), { target: { value: 'invalid123' } });
    fireEvent.click(screen.getByText(/add/i));
    expect(onChange).not.toHaveBeenCalled();
  });

  it('removes active entries when X is clicked', () => {
    const onChange = vi.fn();
    render(
      <CategoricalRule
        label="CV shape"
        presets={['CV', 'CVC', 'CCV']}
        value={['CV', 'CVC']}
        onChange={onChange}
        allowCustom={false}
      />,
    );
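    // MUI Chip calls onDelete from its delete icon, not from a click on the chip root.
    const chip = screen.getByLabelText(/remove CV$/i);
    fireEvent.click(chip.querySelector('svg')!);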
    expect(onChange).toHaveBeenCalledWith(['CVC']);
  });
});

[ ] Step 3: Run test to verify it fails

cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm test -- CategoricalRule

Expected: FAIL — component does not exist.

[ ] Step 4: Implement the component

Create packages/web/frontend/src/components/shared/CategoricalRule.tsx:

import React, { useState } from 'react';
import { Box, Chip, Stack, TextField, Button, Typography } from '@mui/material';

export interface CategoricalRuleProps {
  /** Section label displayed above the chip group. */
  label: string;
  /** Preset values to display as chips (always shown). */
  presets: string[];
  /** Currently active selection. */
  value: string[];
  /** Called with the new selection array. */
  onChange: (next: string[]) => void;
  /** Allow a free-text "custom" input + Add button. */
  allowCustom?: boolean;
  /** Optional validator for custom entries; rejected entries do not add. */
  customValidator?: (input: string) => boolean;
}

const CategoricalRule: React.FC<CategoricalRuleProps> = ({
  label,
  presets,
  value,
  onChange,
  allowCustom = false,
  customValidator,
}) => {
  const [customInput, setCustomInput] = useState('');
  const [customError, setCustomError] = useState<string | null>(null);

  const togglePreset = (preset: string) => {
    if (value.includes(preset)) {
      onChange(value.filter((v) => v !== preset));
    } else {
      onChange([...value, preset]);
    }
  };

  const addCustom = () => {
    const trimmed = customInput.trim();
    if (!trimmed) return;
    if (customValidator && !customValidator(trimmed)) {
      setCustomError(`Invalid ${label.toLowerCase()} pattern`);
      return;
    }
    if (value.includes(trimmed)) {
      setCustomInput('');
      return;
    }
    onChange([...value, trimmed]);
    setCustomInput('');
    setCustomError(null);
  };

  const removeActive = (item: string) => {
    onChange(value.filter((v) => v !== item));
  };

  return (
    <Box>
      <Typography variant="subtitle2" gutterBottom>{label}</Typography>
      <Typography variant="caption" color="text.secondary" sx={{ display: 'block', mb: 1 }}>
        Common shapes:
      </Typography>
      <Stack direction="row" spacing={1} flexWrap="wrap" useFlexGap sx={{ mb: 1 }}>
        {presets.map((p) => (
          <Chip
            key={p}
            label={p}
            onClick={() => togglePreset(p)}
            color={value.includes(p) ? 'primary' : 'default'}
            sx={{ mb: 1 }}
          />
        ))}
      </Stack>
      {allowCustom && (
        <Stack direction="row" spacing={1} alignItems="center" sx={{ mb: 1 }}>
          <TextField
            label={`Custom ${label.toLowerCase()}`}
            value={customInput}
            onChange={(e) => { setCustomInput(e.target.value); setCustomError(null); }}
            size="small"
            error={!!customError}
            helperText={customError ?? undefined}
            onKeyDown={(e) => { if (e.key === 'Enter') addCustom(); }}
          />
          <Button onClick={addCustom} variant="outlined" size="small">Add</Button>
        </Stack>
      )}
      {value.length > 0 && (
        <Box>
          <Typography variant="caption" color="text.secondary" sx={{ display: 'block', mb: 0.5 }}>
            Active:
          </Typography>
          <Stack direction="row" spacing={1} flexWrap="wrap" useFlexGap>
            {value.map((v) => (
              <Chip
                key={v}
                label={v}
                onDelete={() => removeActive(v)}
                aria-label={`remove ${v}`}
                color="primary"
                variant="outlined"
                size="small"
              />
            ))}
          </Stack>
        </Box>
      )}
    </Box>
  );
};

export default CategoricalRule;
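
The selection logic inside the component is independent of React and can be checked on its own. The sketch below restates it as pure functions under that assumption; toggleValue, addCustomValue, and CV_SHAPE_PATTERN are illustrative names, and the pattern is the one passed as customValidator in the tests above.

```typescript
// Validator pattern from the tests: runs of C/V letters, optionally joined by hyphens.
const CV_SHAPE_PATTERN = /^[CV]+(-[CV]+)*$/;

// Toggling adds the item when absent and removes it when present.
function toggleValue(selection: readonly string[], item: string): string[] {
  return selection.indexOf(item) >= 0
    ? selection.filter((v) => v !== item)
    : [...selection, item];
}

// Returns the next selection, or null when the input is blank, invalid, or a duplicate
// (the cases in which the component does not call onChange).
function addCustomValue(
  selection: readonly string[],
  rawInput: string,
  validator?: (input: string) => boolean,
): string[] | null {
  const trimmed = rawInput.trim();
  if (!trimmed) return null;
  if (validator && !validator(trimmed)) return null;
  if (selection.indexOf(trimmed) >= 0) return null;
  return [...selection, trimmed];
}

const isCvShape = (s: string): boolean => CV_SHAPE_PATTERN.test(s);
const afterToggle = toggleValue(['CV'], 'CVC');
const afterAdd = addCustomValue(afterToggle, ' CVCV-CVC ', isCvShape);
const rejectedAdd = addCustomValue(afterToggle, 'invalid123', isCvShape);
// afterToggle is ['CV', 'CVC']; afterAdd is ['CV', 'CVC', 'CVCV-CVC']; rejectedAdd is null
```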

[ ] Step 5: Run test to verify it passes

npm test -- CategoricalRule

Expected: all 4 tests PASS.

[ ] Step 6: Commit

git add packages/web/frontend/src/components/shared/CategoricalRule.tsx \
        packages/web/frontend/src/test/CategoricalRule.test.tsx
git commit -m "feat(frontend): CategoricalRule — reusable chip + custom-input picker

Generic primitive for multi-select OR-semantics categorical filters.
Used for cv_shape; reusable for future categorical filters (e.g., POS).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

Task 9: Frontend `SimilarToRule` component¶

Files: - Create: packages/web/frontend/src/components/shared/SimilarToRule.tsx - Create: packages/web/frontend/src/test/SimilarToRule.test.tsx

[ ] Step 1: Write the failing test

// packages/web/frontend/src/test/SimilarToRule.test.tsx
import { describe, expect, it, vi } from 'vitest';
import { render, screen, fireEvent } from '@testing-library/react';
import SimilarToRule from '../components/shared/SimilarToRule';

describe('SimilarToRule', () => {
  it('renders preset chips and applies preset on click', () => {
    const onChange = vi.fn();
    render(
      <SimilarToRule
        value={{
          word: '',
          weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
          threshold: 0.85,
          position: 'all',
          syllable_count: 1,
        }}
        onChange={onChange}
      />,
    );
    fireEvent.click(screen.getByText('Rhymes'));
    expect(onChange).toHaveBeenCalled();
    const arg = onChange.mock.calls[onChange.mock.calls.length - 1][0];
    expect(arg.weights.onset).toBe(0.0);
    expect(arg.position).toBe('final');
  });

  it('updates anchor word on input', () => {
    const onChange = vi.fn();
    render(
      <SimilarToRule
        value={{
          word: '',
          weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
          threshold: 0.85,
          position: 'all',
          syllable_count: 1,
        }}
        onChange={onChange}
      />,
    );
    fireEvent.change(screen.getByLabelText(/anchor word/i), { target: { value: 'snake' } });
    expect(onChange).toHaveBeenLastCalledWith(expect.objectContaining({ word: 'snake' }));
  });

  it('exposes advanced sliders only when disclosure expanded', () => {
    const onChange = vi.fn();
    render(
      <SimilarToRule
        value={{
          word: 'snake',
          weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
          threshold: 0.85,
          position: 'all',
          syllable_count: 1,
        }}
        onChange={onChange}
      />,
    );
    // Advanced is collapsed by default; sliders not visible
    expect(screen.queryByLabelText(/onset weight/i)).not.toBeInTheDocument();
    fireEvent.click(screen.getByText(/advanced/i));
    expect(screen.getByLabelText(/onset weight/i)).toBeInTheDocument();
  });
});

[ ] Step 2: Run test to verify it fails

npm test -- SimilarToRule

Expected: FAIL — component does not exist.

[ ] Step 3: Implement the component

Create packages/web/frontend/src/components/shared/SimilarToRule.tsx. Lift the preset definitions, component-weight slider block, position/count select pair, and labeled-bucket threshold select from PhonologicalSimilarityTool.tsx lines 47-305. Refactor into a controlled component:

import React from 'react';
import {
  Box, TextField, Stack, Chip, FormControl, InputLabel, Select, MenuItem,
  Paper, Slider, Typography, Button, Accordion, AccordionSummary, AccordionDetails,
} from '@mui/material';
import { ExpandMore as ExpandMoreIcon, Refresh as ResetIcon } from '@mui/icons-material';

export interface SimilarToValue {
  word: string;
  weights: { onset: number; nucleus: number; coda: number };
  threshold: number;
  position: 'all' | 'initial' | 'final' | 'medial';
  syllable_count: number;
}

interface PresetConfig {
  name: string;
  weights: SimilarToValue['weights'];
  position: SimilarToValue['position'];
  syllable_count: number;
}

const PRESETS: PresetConfig[] = [
  { name: 'Rhymes',       weights: { onset: 0.0,  nucleus: 0.5,  coda: 0.5 }, position: 'final',   syllable_count: 1 },
  { name: 'Alliteration', weights: { onset: 1.0,  nucleus: 0.5,  coda: 0.0 }, position: 'initial', syllable_count: 1 },
  { name: 'Assonance',    weights: { onset: 0.0,  nucleus: 1.0,  coda: 0.0 }, position: 'all',     syllable_count: 1 },
  { name: 'Consonance',   weights: { onset: 0.5,  nucleus: 0.0,  coda: 0.5 }, position: 'all',     syllable_count: 1 },
  { name: 'Balanced',     weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 }, position: 'all',     syllable_count: 1 },
];

const THRESHOLD_OPTIONS = [
  { value: 0.95, label: 'Very High (0.95)' },
  { value: 0.85, label: 'High (0.85)' },
  { value: 0.75, label: 'Medium (0.75)' },
  { value: 0.65, label: 'Low (0.65)' },
  { value: 0.50, label: 'Very Low (0.50)' },
];

export interface SimilarToRuleProps {
  value: SimilarToValue;
  onChange: (next: SimilarToValue) => void;
}

const SimilarToRule: React.FC<SimilarToRuleProps> = ({ value, onChange }) => {
  const update = (patch: Partial<SimilarToValue>) => onChange({ ...value, ...patch });

  const applyPreset = (preset: PresetConfig) => {
    onChange({
      ...value,
      weights: preset.weights,
      position: preset.position,
      syllable_count: preset.syllable_count,
    });
  };

  const matchesPreset = (preset: PresetConfig) =>
    value.weights.onset === preset.weights.onset &&
    value.weights.nucleus === preset.weights.nucleus &&
    value.weights.coda === preset.weights.coda &&
    value.position === preset.position &&
    value.syllable_count === preset.syllable_count;

  return (
    <Paper variant="outlined" sx={{ p: 2 }}>
      <Stack spacing={2}>
        <Typography variant="subtitle2">Similar to anchor word</Typography>
        <Typography variant="caption" color="text.secondary">
          Empty anchor = rule inactive.
        </Typography>
        <TextField
          label="Anchor word"
          value={value.word}
          onChange={(e) => update({ word: e.target.value })}
          size="small"
          placeholder="e.g., cat, snake, computer"
          fullWidth
        />
        <Box>
          <Typography variant="body2" gutterBottom>Preset</Typography>
          <Stack direction="row" spacing={1} flexWrap="wrap" useFlexGap>
            {PRESETS.map((preset) => (
              <Chip
                key={preset.name}
                label={preset.name}
                onClick={() => applyPreset(preset)}
                color={matchesPreset(preset) ? 'primary' : 'default'}
                sx={{ mb: 1 }}
              />
            ))}
          </Stack>
        </Box>
        <FormControl size="small" fullWidth>
          <InputLabel>Match strength</InputLabel>
          <Select
            value={value.threshold}
            label="Match strength"
            onChange={(e) => update({ threshold: e.target.value as number })}
          >
            {THRESHOLD_OPTIONS.map((opt) => (
              <MenuItem key={opt.value} value={opt.value}>{opt.label}</MenuItem>
            ))}
          </Select>
        </FormControl>
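        {/* MUI keeps collapsed Accordion content mounted by default; unmount it so the advanced
            controls are absent until expanded. On MUI versions without slotProps.transition,
            use TransitionProps={{ unmountOnExit: true }} instead. */}
        <Accordion variant="outlined" disableGutters slotProps={{ transition: { unmountOnExit: true } }}>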
          <AccordionSummary expandIcon={<ExpandMoreIcon />}>
            <Typography variant="body2">Advanced (component weights + position)</Typography>
          </AccordionSummary>
          <AccordionDetails>
            <Stack spacing={2}>
              <Stack direction="row" spacing={2}>
                <FormControl size="small" fullWidth>
                  <InputLabel>Position</InputLabel>
                  <Select
                    value={value.position}
                    label="Position"
                    onChange={(e) => update({ position: e.target.value as SimilarToValue['position'] })}
                  >
                    <MenuItem value="all">All syllables</MenuItem>
                    <MenuItem value="final">Final</MenuItem>
                    <MenuItem value="initial">Initial</MenuItem>
                    <MenuItem value="medial">Medial</MenuItem>
                  </Select>
                </FormControl>
                <FormControl size="small" sx={{ minWidth: 120 }}
                  disabled={value.position === 'all' || value.position === 'medial'}>
                  <InputLabel>Count</InputLabel>
                  <Select
                    value={value.syllable_count}
                    label="Count"
                    onChange={(e) => update({ syllable_count: e.target.value as number })}
                  >
                    <MenuItem value={1}>1 syllable</MenuItem>
                    <MenuItem value={2}>2 syllables</MenuItem>
                    <MenuItem value={3}>3 syllables</MenuItem>
                  </Select>
                </FormControl>
              </Stack>
              {(['onset', 'nucleus', 'coda'] as const).map((axis) => (
                <Box key={axis}>
                  <Typography variant="body2" gutterBottom id={`${axis}-weight-label`}>
                    {axis.charAt(0).toUpperCase() + axis.slice(1)}: {value.weights[axis].toFixed(2)}
                  </Typography>
                  <Slider
                    aria-label={`${axis} weight`}
                    aria-labelledby={`${axis}-weight-label`}
                    value={value.weights[axis]}
                    onChange={(_, v) => update({ weights: { ...value.weights, [axis]: v as number } })}
                    min={0}
                    max={1}
                    step={0.05}
                    marks={[{ value: 0, label: '0' }, { value: 0.5, label: '0.5' }, { value: 1, label: '1' }]}
                    valueLabelDisplay="auto"
                  />
                </Box>
              ))}
              <Box>
                <Button
                  size="small"
                  startIcon={<ResetIcon />}
                  onClick={() => applyPreset(PRESETS[0])}
                >
                  Reset to Rhymes
                </Button>
              </Box>
            </Stack>
          </AccordionDetails>
        </Accordion>
      </Stack>
    </Paper>
  );
};

export default SimilarToRule;
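
The chip highlighting relies on matchesPreset comparing weights, position, and syllable count exactly. The sketch below restates that check as a standalone function; activePresetName and the optional eps tolerance are assumptions for illustration (the component itself uses ===), and only two of the presets are copied.

```typescript
type Weights = { onset: number; nucleus: number; coda: number };
type SyllablePosition = 'all' | 'initial' | 'final' | 'medial';
interface RuleState { weights: Weights; position: SyllablePosition; syllable_count: number }
interface Preset extends RuleState { name: string }

// Two presets copied from the component above, for illustration.
const SAMPLE_PRESETS: Preset[] = [
  { name: 'Rhymes', weights: { onset: 0.0, nucleus: 0.5, coda: 0.5 }, position: 'final', syllable_count: 1 },
  { name: 'Balanced', weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 }, position: 'all', syllable_count: 1 },
];

// Returns the name of the first preset matching the state, or null for a custom configuration.
// eps = 0 reproduces strict equality; a small eps tolerates floating-point drift (an assumption).
function activePresetName(
  state: RuleState,
  presets: readonly Preset[],
  eps = 0,
): string | null {
  const same = (a: number, b: number): boolean => Math.abs(a - b) <= eps;
  for (const p of presets) {
    if (
      same(state.weights.onset, p.weights.onset) &&
      same(state.weights.nucleus, p.weights.nucleus) &&
      same(state.weights.coda, p.weights.coda) &&
      state.position === p.position &&
      state.syllable_count === p.syllable_count
    ) {
      return p.name;
    }
  }
  return null;
}

const rhymesState: RuleState = {
  weights: { onset: 0, nucleus: 0.5, coda: 0.5 }, position: 'final', syllable_count: 1,
};
const activeName = activePresetName(rhymesState, SAMPLE_PRESETS);
// activeName is 'Rhymes'
```

One consequence of exact matching: once a slider moves a weight off a preset value, no chip is highlighted, which signals a custom configuration.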

[ ] Step 4: Run tests to verify they pass

npm test -- SimilarToRule

Expected: all 3 tests PASS.

[ ] Step 5: Commit

git add packages/web/frontend/src/components/shared/SimilarToRule.tsx \
        packages/web/frontend/src/test/SimilarToRule.test.tsx
git commit -m "feat(frontend): SimilarToRule — composable similarity rule

Lifts preset chips, labeled-threshold bucket select, and component-
weight sliders from PhonologicalSimilarityTool into a controlled
component. Advanced disclosure hides the per-axis sliders + position
controls; preset chips cover ~95% of clinical intent.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

Task 10: `usePropertyMetadata` → `?surface=platform`¶

Files: - Modify: packages/web/frontend/src/hooks/usePropertyMetadata.tsx:44-52 - Modify: packages/web/frontend/src/services/apiClient.ts — getPropertyMetadata accepts optional surface arg - Optional: smoke test if there's an existing hook test

[ ] Step 1: Find and update the API client method

grep -n "getPropertyMetadata" packages/web/frontend/src/services/apiClient.ts

Locate the existing method (typically named getPropertyMetadata). Update it to accept an optional surface parameter:

async getPropertyMetadata(opts?: { surface?: 'platform' }): Promise<PropertyCategory[]> {
  const url = opts?.surface
    ? `${API_BASE}/api/property-metadata?surface=${opts.surface}`
    : `${API_BASE}/api/property-metadata`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`property-metadata failed: ${res.status}`);
  return res.json();
}

(Exact field names depend on the existing apiClient shape — preserve the method-call pattern already in use.)
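
The URL branching can be kept in a small pure function so it is testable without fetch. A minimal sketch, assuming the base URL is a plain string; propertyMetadataUrl is a hypothetical name and is not part of the existing apiClient.

```typescript
// Hypothetical helper: builds the property-metadata URL, appending ?surface=...
// only when a surface is requested. Trailing slashes on the base are tolerated.
function propertyMetadataUrl(apiBase: string, surface?: 'platform'): string {
  const path = `${apiBase.replace(/\/+$/, '')}/api/property-metadata`;
  return surface ? `${path}?surface=${encodeURIComponent(surface)}` : path;
}

const fullUrl = propertyMetadataUrl('https://example.test');
const platformUrl = propertyMetadataUrl('https://example.test/', 'platform');
// fullUrl:     https://example.test/api/property-metadata
// platformUrl: https://example.test/api/property-metadata?surface=platform
```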

[ ] Step 2: Update the hook

Edit packages/web/frontend/src/hooks/usePropertyMetadata.tsx:50:

const [categories, ranges] = await Promise.all([
  api.getPropertyMetadata({ surface: 'platform' }),
  api.getPropertyRanges(),
]);

[ ] Step 3: Verify the build still typechecks

cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm run build

Expected: build succeeds.

[ ] Step 4: Commit

git add packages/web/frontend/src/hooks/usePropertyMetadata.tsx \
        packages/web/frontend/src/services/apiClient.ts
git commit -m "feat(frontend): usePropertyMetadata calls ?surface=platform

Curated platform property surface (14 props across 4 groups) is the
default for in-app consumers. Researcher consumers continue to hit
the unparam'd route via API directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

Task 11: Restructure `Builder.tsx` to 5 accordions + wire new rules¶

Files: - Modify: packages/web/frontend/src/components/Builder.tsx (heavy rewrite of the state shape + accordion structure) - Modify: packages/web/frontend/src/services/apiClient.ts — extend WordSearchRequest with cv_shape: string[] and similar_to: SimilarToValue; expose Word.cv_shape and Word.similarity

[ ] Step 1: Extend WordSearchRequest and Word types

Edit packages/web/frontend/src/services/apiClient.ts:77-85:

export interface WordSearchRequest {
  patterns?: Pattern[];
  filters?: WordFilterRequest;
  exclude_phonemes?: string[];
  cv_shape?: string[];
  similar_to?: {
    word: string;
    weights: { onset: number; nucleus: number; coda: number };
    threshold: number;
    position: 'all' | 'initial' | 'final' | 'medial';
    syllable_count: number;
  };
  sort_by?: string;
  sort_order?: 'asc' | 'desc';
  limit?: number;
  offset?: number;
}

Find the Word interface in the same file and add (preserving optionality):

export interface Word {
  // ... existing fields
  cv_shape?: string | null;
  similarity?: number;
}

[ ] Step 2: Restructure Builder.tsx state

Edit packages/web/frontend/src/components/Builder.tsx. The current state holds patterns, filters (object id→[min,max]), excludePhonemeInput. Extend it to hold:

const [cvShapes, setCvShapes] = useState<string[]>([]);
const [similarTo, setSimilarTo] = useState<SimilarToValue>({
  word: '',
  weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
  threshold: 0.85,
  position: 'all',
  syllable_count: 1,
});

Add imports near the top:

import CategoricalRule from './shared/CategoricalRule';
import SimilarToRule, { type SimilarToValue } from './shared/SimilarToRule';

[ ] Step 3: Update the search payload assembly

Find the existing search request build (the handleSearch / handleSubmit function). Extend the payload:

const request: WordSearchRequest = {
  patterns: nonEmptyPatterns,
  filters: filterPayload,
  exclude_phonemes: parsedExclusions.length ? parsedExclusions : undefined,
  cv_shape: cvShapes.length ? cvShapes : undefined,
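  // Send the trimmed anchor so stray whitespace never reaches the API.
  similar_to: similarTo.word.trim() ? { ...similarTo, word: similarTo.word.trim() } : undefined,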
  limit: 200,
};
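
The "inactive rule is omitted" behaviour in the payload above can be isolated into a small pure function, which makes it easy to unit test outside React. The sketch below is one way to do that; `activeRules` and `SearchRulePayload` are hypothetical names, and trimming the anchor word before sending is an assumption rather than something the spec requires.

```typescript
interface SimilarToValue {
  word: string;
  weights: { onset: number; nucleus: number; coda: number };
  threshold: number;
  position: 'all' | 'initial' | 'final' | 'medial';
  syllable_count: number;
}

interface SearchRulePayload {
  cv_shape?: string[];
  similar_to?: SimilarToValue;
}

// Hypothetical helper: returns only the rules the user has activated.
// An empty chip selection or a blank anchor word yields undefined, and
// JSON.stringify drops undefined keys from the request body.
function activeRules(cvShapes: string[], similarTo: SimilarToValue): SearchRulePayload {
  const anchor = similarTo.word.trim(); // assumption: whitespace is never meaningful
  return {
    cv_shape: cvShapes.length > 0 ? cvShapes : undefined,
    similar_to: anchor ? { ...similarTo, word: anchor } : undefined,
  };
}
```

Spreading the result into the request object (`{ patterns, filters, ...activeRules(cvShapes, similarTo), limit: 200 }`) would keep handleSearch short.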

[ ] Step 4: Build the per-section property helpers

Inside the Builder component (after usePropertyMetadata is destructured at the top of the function), derive per-section property lists from the platform metadata. The platform metadata returns 5 categories (phonological_complexity, lexical, semantic, affective, developmental_frequency); the frontend re-groups lexical + developmental_frequency into a single "Age Appropriateness" surface, which yields the 4 property accordions:

const propsByCategory = useMemo(() => {
  const find = (id: string) => categories.find((c) => c.id === id)?.properties ?? [];
  // Word Shape: numeric props (syllable_count, phoneme_count, wcm_score). cv_shape is categorical and rendered separately.
  const wordShape = find('phonological_complexity').filter((p) => p.kind !== 'categorical');
  // Age Appropriateness: aoa (lexical) + 5 freq_age_* headlines (developmental_frequency).
  const ageAppropriateness = [...find('lexical'), ...find('developmental_frequency')];
  // Imagery & Familiarity: concreteness + familiarity.
  const imageryFamiliarity = find('semantic');
  // Emotional Tone: valence + arousal.
  const emotionalTone = find('affective');
  return { wordShape, ageAppropriateness, imageryFamiliarity, emotionalTone };
}, [categories]);
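
To make the regrouping concrete, here is the same logic as a standalone function run against a tiny fixture. The `PropertyDef` and `PropertyCategory` shapes are trimmed-down assumptions (only `id` and `kind`), `groupForBuilder` is a hypothetical name, and the fixture lists just a few of the 14 properties.

```typescript
// Assumed minimal shapes; the real PropertyDef carries more fields.
interface PropertyDef {
  id: string;
  kind: 'numeric' | 'categorical';
}

interface PropertyCategory {
  id: string;
  properties: PropertyDef[];
}

// Same regrouping as the useMemo body, without the React dependency.
function groupForBuilder(categories: PropertyCategory[]) {
  const find = (id: string): PropertyDef[] =>
    categories.find((c) => c.id === id)?.properties ?? [];
  return {
    // cv_shape is categorical, so it is excluded from the slider list.
    wordShape: find('phonological_complexity').filter((p) => p.kind !== 'categorical'),
    ageAppropriateness: [...find('lexical'), ...find('developmental_frequency')],
    imageryFamiliarity: find('semantic'),
    emotionalTone: find('affective'),
  };
}

// Illustrative fixture only; not the full platform metadata.
const sampleCategories: PropertyCategory[] = [
  {
    id: 'phonological_complexity',
    properties: [
      { id: 'syllable_count', kind: 'numeric' },
      { id: 'cv_shape', kind: 'categorical' },
    ],
  },
  { id: 'lexical', properties: [{ id: 'aoa', kind: 'numeric' }] },
  { id: 'developmental_frequency', properties: [{ id: 'freq_age_adult', kind: 'numeric' }] },
];

const grouped = groupForBuilder(sampleCategories);
```

A category missing from the response (here `semantic` and `affective`) falls back to an empty list, so the corresponding accordion renders with no sliders instead of throwing.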

[ ] Step 5: Render the five accordion sections

Replace the existing accordion list in the return statement with these five sections. Reuse <PropertySlider> for numeric props (existing component); <CategoricalRule> and <SimilarToRule> are the new primitives:

return (
  <Box>
    <Stack spacing={{ xs: 1.5, sm: 2 }}>

      {/* 1. Phoneme rules (default open) */}
      <Accordion defaultExpanded>
        <AccordionSummary expandIcon={<ExpandMoreIcon />}>
          <Typography variant="h6" sx={{ fontSize: { xs: '1rem', sm: '1.25rem' } }}>
            Phoneme rules
          </Typography>
        </AccordionSummary>
        <AccordionDetails sx={{ px: { xs: 1.5, sm: 2 }, py: { xs: 1, sm: 2 } }}>
          <Stack spacing={2}>
            {/* Patterns block — keep existing pattern-builder UI verbatim (the
                Paper-wrapped pattern rows starting at the current Builder.tsx
                line ~236, including the IPA keyboard picker buttons). */}
            {patternsBlock}
            {/* Exclude phonemes block — keep existing TextField + parse logic verbatim. */}
            {excludePhonemesBlock}
            {/* New: similarity rule. Empty anchor = inactive. */}
            <SimilarToRule value={similarTo} onChange={setSimilarTo} />
          </Stack>
        </AccordionDetails>
      </Accordion>

      {/* 2. Word Shape (default open) */}
      <Accordion defaultExpanded>
        <AccordionSummary expandIcon={<ExpandMoreIcon />}>
          <Typography variant="h6">Word Shape</Typography>
        </AccordionSummary>
        <AccordionDetails>
          <Stack spacing={2}>
            {propsByCategory.wordShape.map((prop) => (
              <PropertySlider
                key={prop.id}
                prop={prop}
                value={filters[prop.id]}
                range={ranges[prop.id]}
                onChange={(v) => handleFilterChange(prop.id, v)}
              />
            ))}
            <CategoricalRule
              label="CV shape"
              presets={['V', 'CV', 'VC', 'CVC', 'CCV', 'CCVC', 'CVCC', 'CCVCC', 'CV-CV', 'CV-CVC', 'CCV-CV']}
              value={cvShapes}
              onChange={setCvShapes}
              allowCustom
              customValidator={(s) => /^[CV]+(-[CV]+)*$/.test(s)}
            />
          </Stack>
        </AccordionDetails>
      </Accordion>

      {/* 3. Age Appropriateness (collapsed) */}
      <Accordion>
        <AccordionSummary expandIcon={<ExpandMoreIcon />}>
          <Typography variant="h6">Age Appropriateness</Typography>
        </AccordionSummary>
        <AccordionDetails>
          <Stack spacing={2}>
            {propsByCategory.ageAppropriateness.map((prop) => (
              <PropertySlider
                key={prop.id}
                prop={prop}
                value={filters[prop.id]}
                range={ranges[prop.id]}
                onChange={(v) => handleFilterChange(prop.id, v)}
              />
            ))}
          </Stack>
        </AccordionDetails>
      </Accordion>

      {/* 4. Imagery & Familiarity (collapsed) */}
      <Accordion>
        <AccordionSummary expandIcon={<ExpandMoreIcon />}>
          <Typography variant="h6">Imagery &amp; Familiarity</Typography>
        </AccordionSummary>
        <AccordionDetails>
          <Stack spacing={2}>
            {propsByCategory.imageryFamiliarity.map((prop) => (
              <PropertySlider
                key={prop.id}
                prop={prop}
                value={filters[prop.id]}
                range={ranges[prop.id]}
                onChange={(v) => handleFilterChange(prop.id, v)}
              />
            ))}
          </Stack>
        </AccordionDetails>
      </Accordion>

      {/* 5. Emotional Tone (collapsed) */}
      <Accordion>
        <AccordionSummary expandIcon={<ExpandMoreIcon />}>
          <Typography variant="h6">Emotional Tone</Typography>
        </AccordionSummary>
        <AccordionDetails>
          <Stack spacing={2}>
            {propsByCategory.emotionalTone.map((prop) => (
              <PropertySlider
                key={prop.id}
                prop={prop}
                value={filters[prop.id]}
                range={ranges[prop.id]}
                onChange={(v) => handleFilterChange(prop.id, v)}
              />
            ))}
          </Stack>
        </AccordionDetails>
      </Accordion>

      {/* Existing Build button + results section stay below */}
      {buildButtonAndResults}
    </Stack>
  </Box>
);

Notes:
- patternsBlock, excludePhonemesBlock, and buildButtonAndResults are intermediate variables: extract them from the existing render JSX into named const bindings just before the return. This keeps the diff scoped to "restructure the accordion list" rather than touching the inner pattern/exclude UI.
- useMemo import: add useMemo to the existing React, { useState } import line at the top of the file.
- PropertyDef exposes the kind field added in Task 3. PropertySlider does not need to understand kind: categorical props are filtered out at the helper layer (Step 4), so PropertySlider only ever receives numeric ones.
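
The customValidator passed to CategoricalRule in Step 5 is a purely syntactic check. The sketch below wraps the same regular expression in a named function (`isValidCvShape` is a hypothetical name) to show what it accepts and rejects.

```typescript
// Same pattern as the customValidator in Step 5: one or more runs of
// C/V characters, separated by single hyphens (syllable boundaries).
const CV_SHAPE_PATTERN = /^[CV]+(-[CV]+)*$/;

function isValidCvShape(input: string): boolean {
  return CV_SHAPE_PATTERN.test(input);
}

// Accepted: well-formed skeletons, including multi-syllable ones.
const accepted = ['V', 'CVC', 'CV-CVC'].every(isValidCvShape);

// Rejected: lowercase, trailing hyphen, doubled hyphen, empty string.
const rejected = ['cvc', 'CV-', 'CV--CV', ''].every((s) => !isValidCvShape(s));
```

Note that the pattern does not require a vowel, so a string like `CCC` passes. If custom entries should always contain a nucleus, that would need an additional check.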

[ ] Step 6: Reset the new state in the clear handler

Update the existing clear-all handler (handleClear) to also reset cvShapes to [] and similarTo to the same default object used in its useState initializer (Step 2).
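
One way to keep the initializer and the reset in sync is a small factory that returns a fresh default each time. This is a sketch under the assumption that SimilarToValue has the shape shown in Step 2; `defaultSimilarTo` is a hypothetical name.

```typescript
interface SimilarToValue {
  word: string;
  weights: { onset: number; nucleus: number; coda: number };
  threshold: number;
  position: 'all' | 'initial' | 'final' | 'medial';
  syllable_count: number;
}

// Returns a new object on every call so that a reset never shares the
// nested weights object with a previously mutated value.
function defaultSimilarTo(): SimilarToValue {
  return {
    word: '',
    weights: { onset: 0.33, nucleus: 0.33, coda: 0.33 },
    threshold: 0.85,
    position: 'all',
    syllable_count: 1,
  };
}
```

With this in place, the state could be initialised as `useState<SimilarToValue>(defaultSimilarTo)` and reset with `setSimilarTo(defaultSimilarTo())`.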

[ ] Step 7: Smoke test in dev

cd /Users/jneumann/Repos/PhonoLex/packages/web/workers && npx wrangler dev &
WORKER_PID=$!
cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm run dev &
FRONTEND_PID=$!

Open the local frontend (default http://localhost:5173), navigate to Word Lists, and verify:
- Phoneme rules is expanded by default; pattern, exclude, and similar-to controls are all visible
- Word Shape is expanded by default; numeric sliders and the CV shape chip picker are visible
- Age Appropriateness is collapsed; expanding it shows 6 sliders (aoa + 5 freq_age)
- Imagery & Familiarity is collapsed; expanding it shows 2 sliders
- Emotional Tone is collapsed; expanding it shows 2 sliders
- Running a search returns results; clicking a similar-to preset rebuilds the query

Kill processes:

kill $WORKER_PID $FRONTEND_PID

[ ] Step 8: Commit

git add packages/web/frontend/src/components/Builder.tsx \
        packages/web/frontend/src/services/apiClient.ts
git commit -m "feat(frontend): Builder.tsx — 5-accordion SLP-curated surface

Phoneme rules (default open) holds patterns + exclude + new SimilarToRule.
Word Shape (default open) holds 3 sliders + new CategoricalRule for cv_shape.
Age Appropriateness / Imagery & Familiarity / Emotional Tone collapsed
secondary groups. Search request now carries cv_shape and similar_to.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

Task 12: Delete `PhonologicalSimilarityTool.tsx`; update App copy¶

Files: - Delete: packages/web/frontend/src/components/tools/PhonologicalSimilarityTool.tsx - Modify: packages/web/frontend/src/App_new.tsx:74,117 — update Word Lists description; remove stale PHON-117 comment

[ ] Step 1: Verify nothing else imports the tool

grep -rn "PhonologicalSimilarityTool" packages/web/frontend/src

Expected: only the file itself and a single comment in App_new.tsx:117. If any active imports remain, fix them before deleting.

[ ] Step 2: Update App_new.tsx

Edit packages/web/frontend/src/App_new.tsx:74:

{
  id: 'wordLists',
  icon: <BuildIcon />,
  title: 'Word Lists',
  description: 'Build word lists for therapy and research. Filter by word shape, age-appropriateness, imagery, and emotional tone; compose with phoneme patterns, exclusions, and sound similarity.',
  color: TOOL_COLORS.wordLists,
  section: 'build',
},

Find line ~117 with the // PHON-117: Sound Similarity is being consolidated into Word Lists. comment and remove it (the consolidation is now done).

[ ] Step 3: Delete the orphaned tool file

git rm packages/web/frontend/src/components/tools/PhonologicalSimilarityTool.tsx

[ ] Step 4: Verify the frontend still builds

cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm run build

Expected: build succeeds, no missing-import errors.

[ ] Step 5: Run frontend tests

cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm test

Expected: all tests pass.

[ ] Step 6: Commit

# The deletion is already staged by `git rm` in Step 3; only App_new.tsx needs adding.
git add packages/web/frontend/src/App_new.tsx
git commit -m "chore(frontend): delete PhonologicalSimilarityTool; update Word Lists copy

Consolidated into the new SimilarToRule inside Builder.tsx. Tool was
already unregistered in TOOL_DEFS; this finishes the migration and
updates the Word Lists tool card description to reflect the unified
composable surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

Task 13: Regenerate parquet + D1 seed; verify end-to-end¶

Files: none modified; regenerate artifacts.

[ ] Step 1: Regenerate data/runtime/words.parquet and siblings

cd /Users/jneumann/Repos/PhonoLex
uv run python packages/data/scripts/build_runtime_parquet.py

Expected: completes without errors; new cv_shape column appears in data/runtime/words.parquet; freq_age_adult is populated for words with wpm_b4 or wpm_b5 coverage.

[ ] Step 2: Spot-check the parquet output

uv run python -c "
import polars as pl
df = pl.read_parquet('data/runtime/words.parquet')
print('Columns include cv_shape:', 'cv_shape' in df.columns)
print('Columns include freq_age_adult:', 'freq_age_adult' in df.columns)
print('Sample cv_shape rows:')
print(df.select(['word', 'cv_shape']).head(20))
print('cv_shape value counts (top 20):')
print(df.group_by('cv_shape').len().sort('len', descending=True).head(20))
print('freq_age_adult coverage:')
print(df.select(pl.col('freq_age_adult').is_not_null().sum().alias('with_adult')))
"

Expected: cv_shape populated for all has_phonology rows; common shapes like CVC, CV-CVC, CCV-CVC appear with reasonable counts; freq_age_adult populated for the bulk of words.

[ ] Step 3: Regenerate d1-seed.sql

cd packages/web/workers
uv run python scripts/export-to-d1.py

Expected: regenerates scripts/d1-seed.sql with cv_shape column added to the words table CREATE statement and freq_age_adult + percentile in word_properties / word_percentiles.

[ ] Step 4: Apply migration to local D1

npx wrangler d1 execute phonolex --local --file scripts/d1-seed.sql

Expected: completes; tables are dropped and recreated with the new columns.

[ ] Step 5: End-to-end smoke

npx wrangler dev &
WORKER_PID=$!
sleep 3

Then:

curl -s 'http://localhost:8787/api/property-metadata?surface=platform' | jq '.[].properties[].id' | sort

Expected output (sorted ids): the 14 curated platform property ids.

curl -s -X POST http://localhost:8787/api/words/search \
  -H 'Content-Type: application/json' \
  -d '{"cv_shape": ["CVC"], "limit": 10}' | jq '.items | map(.cv_shape) | unique'

Expected: ["CVC"].

curl -s -X POST http://localhost:8787/api/words/search \
  -H 'Content-Type: application/json' \
  -d '{"similar_to": {"word":"cat","weights":{"onset":0.33,"nucleus":0.33,"coda":0.33},"threshold":0.75,"position":"all","syllable_count":1}, "cv_shape":["CVC"], "limit": 10}' \
  | jq '.items | map({word, cv_shape, similarity})'

Expected: items have similarity desc-sorted, all cv_shape == "CVC".
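
The two expectations above (single shape, descending similarity) can also be checked programmatically, for example in a worker test. The sketch below assumes the response has an `items` array whose entries carry `word`, `cv_shape`, and `similarity`, as the jq filters suggest; the function names are hypothetical and the fixture values are invented for illustration, not real API output.

```typescript
// Assumed item shape, inferred from the jq projections above.
interface SearchItem {
  word: string;
  cv_shape?: string | null;
  similarity?: number;
}

// True when every item carries exactly the requested CV shape.
function allHaveShape(items: SearchItem[], shape: string): boolean {
  return items.every((item) => item.cv_shape === shape);
}

// True when similarity never increases from one item to the next.
// Items without a similarity score are treated as lowest-ranked.
function isSortedBySimilarityDesc(items: SearchItem[]): boolean {
  for (let i = 1; i < items.length; i++) {
    const prev = items[i - 1]?.similarity ?? -Infinity;
    const curr = items[i]?.similarity ?? -Infinity;
    if (prev < curr) return false;
  }
  return true;
}

// Invented fixture for illustration only.
const fixture: SearchItem[] = [
  { word: 'cat', cv_shape: 'CVC', similarity: 1.0 },
  { word: 'bat', cv_shape: 'CVC', similarity: 0.9 },
  { word: 'cab', cv_shape: 'CVC', similarity: 0.8 },
];
```

Both helpers return true for an empty result set, so a test should also assert that at least one item came back.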

Kill the worker:

kill $WORKER_PID

[ ] Step 6: Commit the regenerated artifacts

cd /Users/jneumann/Repos/PhonoLex
git add data/runtime/words.parquet \
        packages/web/workers/scripts/d1-seed.sql
git commit -m "data: regenerate parquet + d1-seed with cv_shape and freq_age_adult

Re-emit data/runtime/words.parquet via build_runtime_parquet.py and
packages/web/workers/scripts/d1-seed.sql via export-to-d1.py to pick
up the new derived columns. End-to-end smoke (cv_shape filter +
similar_to intersection) passes locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>"

[ ] Step 7: Push to origin

git push origin feature/phon-116-naturalness-scorer

Expected: push succeeds; CI runs.

Closing checklist¶

[ ] All 13 tasks committed
[ ] Worker test suite passes (cd packages/web/workers && npm test)
[ ] Data test suite passes (uv run python -m pytest packages/data/tests/ -v)
[ ] Frontend builds (cd packages/web/frontend && npm run build)
[ ] Frontend tests pass (cd packages/web/frontend && npm test)
[ ] Local end-to-end smoke (Task 13 Step 5) returns the expected curated property set + filters apply
[ ] CI green on the pushed branch
