PHON-115 — AoA Replacement Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Replace aoa_kuperman with an in-house aoa column via the PHON-73 LLM-rating pattern (gpt-4.1-mini with logprob expected-value), drop the orphaned imageability + size columns, and relocate Glasgow + Kuperman xlsx files to data/norms/_oracles/.

Architecture: New PhonoLex loader follows the existing phonolex_concreteness.py shape; the build script is a clone of build_concreteness.py with the validated Glasgow-style AoA prompt from the pilot. Pipeline rewire drops three columns from WordRecord, the _NORM_FIELD_MAP source map, and the PERCENTILE_PROPERTIES target list. Property config + frontend slider + types get updated in lockstep.

Tech Stack: Python 3.11+, openai SDK (gpt-4.1-mini), openpyxl (Glasgow + Kuperman xlsx), polars (parquet), TypeScript (frontend + Workers config), Cloudflare Wrangler (D1 reseed).

Spec: docs/superpowers/specs/2026-05-11-phon-115-aoa-replacement-design.md

Pilot evidence: research/2026-05-11-phon-115-aoa-pilot/ (Glasgow ρ=0.868, N=5,551; Kuperman ρ=0.832, N=500 Glasgow-unseen).
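
For orientation, the "logprob expected-value" idea named in the Goal can be sketched as follows. This is a minimal illustration, not the harness code: the function name, the `{token: logprob}` input shape, and the reading of the `cov_*` column as "probability mass that landed on valid scale tokens" are all assumptions.

```python
import math


def expected_rating(
    top_logprobs: dict[str, float],
    scale_min: int = 1,
    scale_max: int = 7,
) -> tuple[float, float]:
    """Return (expected_value, coverage) from a {token: logprob} mapping.

    Tokens that are not an integer on the scale are ignored. Coverage is the
    total probability mass on valid scale tokens (assumed meaning of cov_*).
    """
    valid = {str(k): k for k in range(scale_min, scale_max + 1)}
    mass: dict[int, float] = {}
    for token, logprob in top_logprobs.items():
        rating = valid.get(token.strip())
        if rating is None:
            continue  # e.g. whitespace, words, out-of-range digits
        mass[rating] = mass.get(rating, 0.0) + math.exp(logprob)
    coverage = sum(mass.values())
    if coverage == 0.0:
        return float("nan"), 0.0
    # Renormalise over the valid tokens, then take the probability-weighted mean.
    expected = sum(rating * p for rating, p in mass.items()) / coverage
    return expected, coverage


# Hypothetical top-logprobs for one word: most mass on "2" and "3".
example = {"2": math.log(0.6), "3": math.log(0.3), " The": math.log(0.1)}
ev, cov = expected_rating(example)
print(f"aoa={ev:.3f} cov={cov:.3f}")
```

The expected value gives a continuous rating between the integer anchors, which is why the build's `aoa` column holds floats in [1.0, 7.0] rather than integers.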


Task 1: Add aoa feature spec to the LLM-rating harness

Files: - Modify: research/2026-04-30-llm-word-features/harness.py:42-174 (FEATURES dict)

  • [ ] Step 1: Open the FEATURES dict and locate the BOI entry

The FEATURES dict in harness.py contains seven entries (concreteness, valence, arousal, familiarity, boi, dominance, iconicity). Add a new "aoa" entry that mirrors the prompt validated in research/2026-05-11-phon-115-aoa-pilot/run_pilot.py.

  • [ ] Step 2: Insert the AoA FeatureSpec entry

Add this entry to FEATURES near "arousal" and "boi" (strictly alphabetical order would put "aoa" before "arousal"; anywhere readable is fine):

# Age of Acquisition — Glasgow Norms (Scott et al. 2019) 1-7 scale, age-band
# anchors from the published instructions. PHON-115 replacement of Kuperman 2012
# (no posted license). Validated in research/2026-05-11-phon-115-aoa-pilot/:
# Spearman 0.868 vs Glasgow (full N=5,551), Pearson 0.816 vs Kuperman on
# N=500 Glasgow-unseen rows.
"aoa": FeatureSpec(
    name="aoa",
    scale_min=1, scale_max=7,
    prompt_template=(
        "Could you rate the age at which you learned the following word? "
        "Use a 1 to 7 scale where: 1 = 0-2 years old, 2 = 3-4 years, "
        "3 = 5-6 years, 4 = 7-8 years, 5 = 9-10 years, 6 = 11-12 years, "
        "7 = 13 years or older. Examples of words that would receive a "
        "rating of 1 are mum, daddy and ball. Examples of words that "
        "would receive a rating of 7 are subpoena, oligarchy and tariff. "
        "The word is: {word}. Reply with only a number from 1 to 7. "
        "Limit your response to numbers."
    ),
),
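
To make the shape of that entry concrete, here is a hedged stand-in for FeatureSpec. The four field names come from the entry above; the dataclass itself and the `render` helper are hypothetical and may not match what harness.py actually defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical stand-in mirroring the fields used in the FEATURES entry."""

    name: str
    scale_min: int
    scale_max: int
    prompt_template: str

    def render(self, word: str) -> str:
        # Illustrative helper (not a confirmed harness method): fills {word}.
        # str.format requires that the template has no other stray braces.
        return self.prompt_template.format(word=word)


spec = FeatureSpec(
    name="aoa",
    scale_min=1,
    scale_max=7,
    # Shortened template for the demo; the real one is the full prompt above.
    prompt_template="The word is: {word}. Reply with only a number from 1 to 7.",
)
print(spec.render("ball"))
```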
  • [ ] Step 3: Sanity-test that the new spec loads

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -c "import sys; sys.path.insert(0, 'research/2026-04-30-llm-word-features'); from harness import FEATURES; print(FEATURES['aoa'].name, FEATURES['aoa'].scale_min, FEATURES['aoa'].scale_max)" Expected: aoa 1 7

  • [ ] Step 4: Commit
git add research/2026-04-30-llm-word-features/harness.py
git commit -m "PHON-115: add aoa FeatureSpec to LLM-rating harness"

Task 2: Write build_aoa.py (cloned from build_concreteness.py)

Files: - Create: research/2026-04-30-llm-word-features/build_aoa.py

  • [ ] Step 1: Copy build_concreteness.py as the starting point

Run: cp research/2026-04-30-llm-word-features/build_concreteness.py research/2026-04-30-llm-word-features/build_aoa.py

  • [ ] Step 2: Edit build_aoa.py — swap the feature key

In the copied file, change every occurrence of "concreteness" to "aoa". Specifically:

  1. Module docstring header — replace the concreteness narrative with the AoA narrative. Use this block:

    """Production build: AI-estimated age-of-acquisition ratings over the
    non-PROPN PhonoLex content vocabulary via gpt-4.1-mini.
    
    Replaces the Kuperman et al. 2012 AoA ratings (`data/norms/kuperman_aoa.xlsx`,
    🟡 under PHON-71 license audit) and the PHON-71 spike's `data/norms/phonolex_aoa.tsv`
    (LightGBM regression trained on Glasgow). Glasgow Norms (Scott et al. 2019,
    CC BY 4.0) is the oracle used for validation, kept locally at
    `data/norms/_oracles/GlasgowNorms.xlsx` post-PHON-115.
    
    Vocabulary scope: CMU dict ∩ FineWeb-Edu frequency table, FILTERED to
    words whose PHON-72 dominant POS is NOT 'PROPN' (drops surnames + foreign
    loans that survive in CMU). About 48K content words.
    
    Validation (validate_aoa.py vs Glasgow, see PHON-115 pilot):
    Spearman 0.868 on N=5,551 (full Glasgow AoA-labeled vocabulary);
    Pearson 0.816 on N=500 Glasgow-unseen Kuperman rows.
    
    Output: data/norms/phonolex_aoa.tsv with columns:
      word, aoa, cov_aoa
    
    Resumable via append-mode TSV write.
    
    Usage:
        uv run python build_aoa.py [--model gpt-4.1-mini] [--concurrency 6] [--resume]
    """
    

  2. Replace DEFAULT_OUT = REPO / "data" / "norms" / "phonolex_concreteness.tsv" with DEFAULT_OUT = REPO / "data" / "norms" / "phonolex_aoa.tsv".

  3. In rate_one, replace spec = FEATURES["concreteness"] with spec = FEATURES["aoa"].

  4. In amain, replace the fieldnames = ["word", "concreteness", "cov_concreteness"] with fieldnames = ["word", "aoa", "cov_aoa"].

  5. In the writer.writerow({...}) line inside amain, replace {"word": w, "concreteness": ev, "cov_concreteness": cov} with {"word": w, "aoa": ev, "cov_aoa": cov}.
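
The "resumable via append-mode TSV write" behaviour mentioned in the docstring can be pictured with the sketch below. It is an assumption about how such a resume step typically works, not a copy of build_concreteness.py; the helper names are hypothetical, and only the fieldnames come from this plan.

```python
import csv
import tempfile
from pathlib import Path

FIELDNAMES = ["word", "aoa", "cov_aoa"]


def already_rated(out_path: Path) -> set[str]:
    """Words that already have a row in the output TSV (empty if no file yet)."""
    if not out_path.exists():
        return set()
    with open(out_path, encoding="utf-8", newline="") as f:
        return {row["word"] for row in csv.DictReader(f, delimiter="\t")}


def pending_words(vocab: list[str], out_path: Path) -> list[str]:
    """Vocabulary still to be rated, preserving the original order."""
    done = already_rated(out_path)
    return [w for w in vocab if w not in done]


def append_rows(out_path: Path, rows: list[dict[str, object]]) -> None:
    """Append rows, writing the header only when the file is new or empty."""
    write_header = not out_path.exists() or out_path.stat().st_size == 0
    with open(out_path, "a", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, delimiter="\t")
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


# Demo with made-up ratings: a second run skips words rated in the first.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "phonolex_aoa.tsv"
    append_rows(out, [{"word": "cat", "aoa": 1.8, "cov_aoa": 1.0}])
    remaining = pending_words(["cat", "dog"], out)
    append_rows(out, [{"word": "dog", "aoa": 1.9, "cov_aoa": 1.0}])
    final_lines = out.read_text(encoding="utf-8").splitlines()
    print(remaining, len(final_lines))
```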

  • [ ] Step 3: Sanity-test the script's --help

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python research/2026-04-30-llm-word-features/build_aoa.py --help Expected: argparse help output mentioning --model, --concurrency, --out, --resume, --limit, no errors.

  • [ ] Step 4: Smoke-test with --limit 20

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python research/2026-04-30-llm-word-features/build_aoa.py --limit 20 --out /tmp/aoa_smoke.tsv Expected: [done] 20 words; 0 failed; total ~5s. Then head /tmp/aoa_smoke.tsv shows a word\taoa\tcov_aoa header and 20 data rows with aoa values in [1.0, 7.0].

  • [ ] Step 5: Commit
git add research/2026-04-30-llm-word-features/build_aoa.py
git commit -m "PHON-115: add build_aoa.py (cloned from build_concreteness.py)"

Task 3: Run the full AoA build

Files: - Create: data/norms/phonolex_aoa.tsv (47K+ rows)

  • [ ] Step 1: Verify the .env has OPENAI_API_KEY

Run: grep -q "^OPENAI_API_KEY=" .env && echo OK Expected: OK

  • [ ] Step 2: Run the full build

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python research/2026-04-30-llm-word-features/build_aoa.py --concurrency 6 2>&1 | tee /tmp/build_aoa.log Expected runtime: ~1.6h. Expected output: [done] 47724 words; 0 failed; total ~5800s (within ±10%). Cost ~$5.
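
As a quick plausibility check on those figures, the arithmetic below derives throughput and the ±10% window from the numbers quoted in this step. Nothing here is measured; it only restates the plan's own estimates.

```python
WORDS = 47_724         # vocabulary size quoted in this plan
TOTAL_SECONDS = 5_800  # expected wall-clock total quoted in this plan
COST_USD = 5.0         # approximate spend quoted in this plan

hours = TOTAL_SECONDS / 3600
words_per_second = WORDS / TOTAL_SECONDS
low, high = TOTAL_SECONDS * 0.9, TOTAL_SECONDS * 1.1
usd_per_1k_words = COST_USD / WORDS * 1000

print(f"runtime ~ {hours:.2f} h, throughput ~ {words_per_second:.1f} words/s")
print(f"+/-10% window: {low:.0f}s to {high:.0f}s")
print(f"cost ~ ${usd_per_1k_words:.3f} per 1,000 words")
```

A total far outside the window would suggest rate limiting or a stalled run rather than normal variance.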

  • [ ] Step 3: Verify the TSV

Run: wc -l data/norms/phonolex_aoa.tsv && head -3 data/norms/phonolex_aoa.tsv Expected: 47,725 lines (47,724 data + 1 header), header reads word\taoa\tcov_aoa, and the first two data rows show floats in [1.0, 7.0] for aoa and ~1.0 for cov_aoa.
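
If a scripted check is preferred over eyeballing `head`, something like the following would cover the header, row count, and value range. It is a sketch under the assumption that the TSV has exactly the three columns named above; `summarize_aoa_tsv` is a hypothetical helper, and the demo ratings are made up.

```python
import csv
import math
import tempfile
from pathlib import Path

EXPECTED_HEADER = ["word", "aoa", "cov_aoa"]


def summarize_aoa_tsv(path: Path) -> tuple[int, float, float]:
    """Return (data_rows, min_aoa, max_aoa); NaN rows count but are excluded from min/max."""
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        if reader.fieldnames != EXPECTED_HEADER:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        values = [float(row["aoa"]) for row in reader]
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        raise ValueError("no finite aoa values found")
    return len(values), min(finite), max(finite)


# Demo on a tiny hypothetical file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "phonolex_aoa.tsv"
    demo.write_text(
        "word\taoa\tcov_aoa\ncat\t1.8\t1.0\ntariff\t6.4\t0.99\n",
        encoding="utf-8",
    )
    rows, lowest, highest = summarize_aoa_tsv(demo)
    print(rows, lowest, highest)
```

On the real build, the expectation from this step would be `rows == 47_724` with both bounds inside [1.0, 7.0].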

  • [ ] Step 4: Commit the TSV
git add data/norms/phonolex_aoa.tsv
git commit -m "PHON-115: data — phonolex_aoa.tsv (gpt-4.1-mini full build, 47,724 words)"

Task 4: Write validate_aoa.py and produce the validation report

Files: - Create: research/2026-04-30-llm-word-features/validate_aoa.py - Create: research/2026-05-11-phon-115-aoa-pilot/report.md

  • [ ] Step 1: Copy validate_concreteness.py as the starting point

Run: cp research/2026-04-30-llm-word-features/validate_concreteness.py research/2026-04-30-llm-word-features/validate_aoa.py

  • [ ] Step 2: Edit validate_aoa.py — change the oracle and field names

The validator needs two oracles: Glasgow (primary) and Kuperman (cross-construct sanity). Replace the body of the script with:

"""Validate the full PhonoLex AoA build against Glasgow + Kuperman oracles.

Glasgow Norms (Scott et al. 2019) is the primary oracle: the 1-7 scale the
LLM was prompted on. Kuperman 2012 is a cross-construct sanity check on
Glasgow-unseen words. It uses a different scale (years); Spearman is
rank-based and so unaffected by the change of scale, while Pearson is
unaffected only to the extent the two scales are linearly related.

Usage:
    uv run python research/2026-04-30-llm-word-features/validate_aoa.py
"""
from __future__ import annotations

import csv
import math
from pathlib import Path

REPO = Path(__file__).resolve().parents[2]
BUILD_PATH = REPO / "data" / "norms" / "phonolex_aoa.tsv"
GLASGOW_PATH = REPO / "data" / "norms" / "GlasgowNorms.xlsx"
KUPERMAN_PATH = REPO / "data" / "norms" / "kuperman_aoa.xlsx"


def load_glasgow_aoa() -> dict[str, float]:
    import openpyxl
    wb = openpyxl.load_workbook(GLASGOW_PATH, read_only=True, data_only=True)
    ws = wb.active
    out: dict[str, float] = {}
    for i, row in enumerate(ws.iter_rows(values_only=True)):
        if i < 2:
            continue
        w, a = row[0], row[20]
        if not w or not isinstance(w, str) or a is None:
            continue
        try:
            out[w.strip().lower()] = float(a)
        except (ValueError, TypeError):
            continue
    wb.close()
    return out


def load_kuperman_aoa() -> dict[str, float]:
    import openpyxl
    wb = openpyxl.load_workbook(KUPERMAN_PATH, read_only=True, data_only=True)
    ws = wb.active
    header = list(next(ws.iter_rows(max_row=1, values_only=True)))
    w_i, a_i = header.index("Word"), header.index("AoA_Kup")
    out: dict[str, float] = {}
    for i, row in enumerate(ws.iter_rows(values_only=True)):
        if i == 0:
            continue
        w, a = row[w_i], row[a_i]
        if not w or a is None:
            continue
        try:
            out[str(w).strip().lower()] = float(a)
        except (ValueError, TypeError):
            continue
    wb.close()
    return out


def load_build() -> dict[str, float]:
    out: dict[str, float] = {}
    with open(BUILD_PATH, encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            try:
                v = float(row["aoa"])
            except (ValueError, KeyError):
                continue
            if v == v:  # skip NaN (NaN is the only float not equal to itself)
                out[row["word"]] = v
    return out


def spearman(xs: list[float], ys: list[float]) -> float:
    def ranks(vals: list[float]) -> list[float]:
        n = len(vals)
        order = sorted(range(n), key=lambda i: vals[i])
        out = [0.0] * n
        i = 0
        while i < n:
            j = i
            while j + 1 < n and vals[order[j + 1]] == vals[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1
            for k in range(i, j + 1):
                out[order[k]] = avg
            i = j + 1
        return out
    # Spearman's rho is Pearson's r computed on the tie-averaged ranks.
    return pearson(ranks(xs), ranks(ys))


def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    if n == 0:  # empty overlap: avoid ZeroDivisionError on the means
        return 0.0
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((xs[i] - mx) * (ys[i] - my) for i in range(n))
    dx = math.sqrt(sum((xs[i] - mx) ** 2 for i in range(n)))
    dy = math.sqrt(sum((ys[i] - my) ** 2 for i in range(n)))
    return num / (dx * dy) if dx > 0 and dy > 0 else 0.0


def main() -> int:
    build = load_build()
    glasgow = load_glasgow_aoa()
    kuperman = load_kuperman_aoa()
    print(f"[load] build={len(build):,}  glasgow={len(glasgow):,}  kuperman={len(kuperman):,}")

    # Full-vocab Glasgow overlap
    common_g = sorted(set(build) & set(glasgow))
    xs = [glasgow[w] for w in common_g]
    ys = [build[w] for w in common_g]
    print()
    print(f"[Glasgow overlap, N={len(common_g):,}]")
    print(f"  Spearman: {spearman(xs, ys):.3f}")
    print(f"  Pearson:  {pearson(xs, ys):.3f}")

    # Glasgow-unseen Kuperman overlap
    common_k = sorted((set(build) & set(kuperman)) - set(glasgow))
    xs = [kuperman[w] for w in common_k]
    ys = [build[w] for w in common_k]
    print()
    print(f"[Kuperman \\ Glasgow overlap, N={len(common_k):,}]")
    print(f"  Spearman: {spearman(xs, ys):.3f}")
    print(f"  Pearson:  {pearson(xs, ys):.3f}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
  • [ ] Step 3: Run the validator

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python research/2026-04-30-llm-word-features/validate_aoa.py 2>&1 | tee /tmp/validate_aoa.log Expected (within sampling noise of the pilot): Glasgow Spearman ≥ 0.85, Pearson ≥ 0.84. Kuperman\Glasgow Spearman ≥ 0.80, Pearson ≥ 0.78. Decision rule: both gates pass → proceed. If either fails by >0.05, stop and investigate.
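
The decision rule can be written down explicitly, as in the sketch below. The gate values and the 0.05 tolerance are taken from this step; the handling of a shortfall of 0.05 or less is not specified in the plan, so the "near miss" outcome is an assumption added here for completeness.

```python
GATES = {
    "glasgow_spearman": 0.85,
    "glasgow_pearson": 0.84,
    "kuperman_spearman": 0.80,
    "kuperman_pearson": 0.78,
}
TOLERANCE = 0.05


def decide(measured: dict[str, float]) -> str:
    """Apply the gates: all pass -> proceed; any miss by > TOLERANCE -> stop."""
    shortfalls = {name: gate - measured[name] for name, gate in GATES.items()}
    if all(s <= 0 for s in shortfalls.values()):
        return "proceed"
    if any(s > TOLERANCE for s in shortfalls.values()):
        return "stop and investigate"
    # Assumption: the plan does not say what to do with a small miss.
    return "near miss: review manually"


# Made-up measurements, for illustration only.
print(decide({
    "glasgow_spearman": 0.87,
    "glasgow_pearson": 0.85,
    "kuperman_spearman": 0.83,
    "kuperman_pearson": 0.80,
}))
```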

  • [ ] Step 4: Write the validation report

Create research/2026-05-11-phon-115-aoa-pilot/report.md with this content (replace <…> placeholders with values from /tmp/validate_aoa.log):

# PHON-115 AoA Replacement — Validation Report

**Date:** 2026-05-11
**Build:** `data/norms/phonolex_aoa.tsv` (gpt-4.1-mini, full PhonoLex content vocabulary)
**Model:** gpt-4.1-mini
**Pattern:** PHON-73 LLM-rating with logprob expected-value (`top_logprobs=20`, scale 1-7)
**Prompt:** Glasgow-style 1-7 age-band cloze (anchors: mum/daddy/ball → 1; subpoena/oligarchy/tariff → 7)

## Full-build results

| Metric | Value | Gate | Status |
|---|---|---|---|
| Glasgow overlap N | <N_g> | — | — |
| Glasgow Spearman | <ρ_g> | ≥ 0.85 | ✓ / ✗ |
| Glasgow Pearson | <r_g> | ≥ 0.84 | ✓ / ✗ |
| Glasgow R² (Pearson²) | <r_g²> | ≥ 0.74 (spec) | ✓ / ✗ |
| Kuperman\Glasgow N | <N_k> | — | — |
| Kuperman\Glasgow Spearman | <ρ_k> | ≥ 0.80 | ✓ / ✗ |
| Kuperman\Glasgow Pearson | <r_k> | ≥ 0.78 (spec floor 0.50) | ✓ / ✗ |

## Pilot reference (N=200 + N=5,551 + N=500)

See `run_pilot.py` (Glasgow regression) and `run_kuperman_sanity.py` (Kuperman cross-construct).
Pilot Glasgow Spearman: 0.881 (N=200) → 0.868 (N=5,551, full vocab).
Pilot Kuperman Spearman: 0.832 (N=500 Glasgow-unseen).

## Cost + runtime

- Build: ~5,800s at concurrency=6, ~$5 OpenAI spend (gpt-4.1-mini)
- Validation: ~30s, $0

## Decision

Both gates pass. Proceeding to integration in this same ticket — **no follow-up tickets**.
(See spec `docs/superpowers/specs/2026-05-11-phon-115-aoa-replacement-design.md` for the integration checklist.)
  • [ ] Step 5: Commit
git add research/2026-04-30-llm-word-features/validate_aoa.py research/2026-05-11-phon-115-aoa-pilot/report.md
git commit -m "PHON-115: validation report — full build vs Glasgow + Kuperman oracles"

Task 5: Write load_phonolex_aoa loader and its test

Files: - Create: packages/data/src/phonolex_data/loaders/phonolex_aoa.py - Modify: packages/data/tests/test_new_loaders.py (add new test)

  • [ ] Step 1: Write the failing test first

Append this test to packages/data/tests/test_new_loaders.py:

def test_load_phonolex_aoa():
    from phonolex_data.loaders import load_phonolex_aoa

    result = load_phonolex_aoa()
    assert isinstance(result, dict)
    assert len(result) > 40_000  # ~47,724 non-PROPN content words

    # Probe early-learned (rating should be low) and late-learned (rating should be high)
    for w in ("cat", "ball"):
        assert w in result, f"{w} missing"
        assert "aoa" in result[w]
        assert 1.0 <= result[w]["aoa"] <= 4.0, f"{w} aoa={result[w]['aoa']}"
    for w in ("subpoena", "oligarchy"):
        assert w in result, f"{w} missing"
        assert 5.0 <= result[w]["aoa"] <= 7.0, f"{w} aoa={result[w]['aoa']}"


def test_load_phonolex_aoa_filter_words():
    from phonolex_data.loaders import load_phonolex_aoa

    result = load_phonolex_aoa(filter_words={"cat", "subpoena"})
    assert set(result.keys()) == {"cat", "subpoena"}
  • [ ] Step 2: Run test to verify it fails (loader doesn't exist yet)

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -m pytest packages/data/tests/test_new_loaders.py::test_load_phonolex_aoa -v Expected: ImportError: cannot import name 'load_phonolex_aoa' (or AttributeError).

  • [ ] Step 3: Write the loader

Create packages/data/src/phonolex_data/loaders/phonolex_aoa.py:

"""Loader for the PhonoLex in-house age-of-acquisition ratings.

Replaces Kuperman et al. 2012 AoA (`data/norms/kuperman_aoa.xlsx`, 🟡 under
PHON-71 license audit) AND the PHON-71 spike's earlier `phonolex_aoa.tsv`
(LightGBM regression trained on Glasgow).

Source: cloze-prompt LLM rating extraction via gpt-4.1-mini, same methodology
as PHON-73's 5-feature build. Prompt anchors low-AoA words (mum, daddy, ball)
and high-AoA words (subpoena, oligarchy, tariff), 1-7 scale with Glasgow's
published age-band anchors (1 = 0-2y, 7 = 13+y).

Vocabulary scope: CMU∩freq filtered to non-PROPN content words (~47,724
words). Same scope as PHON-73 family.

Validation (held-out vs Glasgow CC BY 4.0 + Kuperman, see
research/2026-05-11-phon-115-aoa-pilot/report.md):
  Glasgow Spearman 0.868 (full N=5,551 overlap);
  Kuperman Pearson 0.816 (N=500 Glasgow-unseen rows).

Output field: `aoa` (replaces the Glasgow-sourced `aoa` field; the
Kuperman-sourced `aoa_kuperman` field is removed from the schema entirely
in PHON-115).
"""
from __future__ import annotations

import csv
from pathlib import Path
from typing import Iterable

from phonolex_data.loaders._helpers import get_data_dir


def load_phonolex_aoa(
    path: str | Path | None = None,
    filter_words: Iterable[str] | None = None,
) -> dict[str, dict[str, float]]:
    """Load PhonoLex's in-house AI-derived age-of-acquisition ratings.

    Args:
        path: Path to the TSV. Defaults to ``data/norms/phonolex_aoa.tsv``.
        filter_words: Optional iterable of word strings (lowercase). When
            provided, only entries with ``word`` in this set are returned.

    Returns:
        {word: {"aoa": float}}. Values are LLM expected-value ratings on
        the Glasgow 1-7 scale. Words with NaN (rare retry failures) are
        skipped.
    """
    path = Path(path) if path else get_data_dir() / "norms" / "phonolex_aoa.tsv"
    allowed: set[str] | None = (
        {w.lower() for w in filter_words} if filter_words is not None else None
    )
    out: dict[str, dict[str, float]] = {}
    with open(path, encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            w = row["word"].strip().lower()
            if not w:
                continue
            if allowed is not None and w not in allowed:
                continue
            try:
                v = float(row["aoa"])
            except (ValueError, KeyError):
                continue
            if v != v:  # NaN check
                continue
            out[w] = {"aoa": v}
    return out
  • [ ] Step 4: Wire the loader into the package exports

Edit packages/data/src/phonolex_data/loaders/__init__.py:

  1. Add the import from phonolex_data.loaders.phonolex_aoa import load_phonolex_aoa near the other phonolex_* imports.

  2. Add "load_phonolex_aoa", to the __all__ list.

  • [ ] Step 5: Run the test to verify it passes

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -m pytest packages/data/tests/test_new_loaders.py::test_load_phonolex_aoa packages/data/tests/test_new_loaders.py::test_load_phonolex_aoa_filter_words -v Expected: 2 passed.

  • [ ] Step 6: Commit
git add packages/data/src/phonolex_data/loaders/phonolex_aoa.py packages/data/src/phonolex_data/loaders/__init__.py packages/data/tests/test_new_loaders.py
git commit -m "PHON-115: load_phonolex_aoa loader + tests"

Task 6: Remove load_kuperman from loader exports (load_glasgow stays)

Files: - Modify: packages/data/src/phonolex_data/loaders/__init__.py - Modify: packages/data/src/phonolex_data/loaders/norms.py (delete the load_kuperman function; load_glasgow is retained as-is since it's still callable for ad-hoc eval scripts) - Modify: packages/data/tests/test_datasets.py (drop the Kuperman test)

  • [ ] Step 1: Find the existing exports

Run: grep -n "load_kuperman\|load_glasgow" packages/data/src/phonolex_data/loaders/__init__.py Expected: import statement(s) and __all__ entries.

  • [ ] Step 2: Remove the exports

Edit packages/data/src/phonolex_data/loaders/__init__.py:

  1. Remove load_kuperman from the from phonolex_data.loaders.norms import (...) block.

  2. Remove "load_kuperman", from the __all__ list.

  3. Leave the load_glasgow import and export in place: it is no longer wired into the pipeline but stays callable for ad-hoc eval (e.g. one-off researcher scripts).

  • [ ] Step 3: Delete the load_kuperman function body

In packages/data/src/phonolex_data/loaders/norms.py, delete the entire def load_kuperman(...) function (the one starting around line 125 and ending around line 153). The load_glasgow function above it stays.

  • [ ] Step 4: Update test_datasets.py to drop the Kuperman test

In packages/data/tests/test_datasets.py, find the test referencing aoa_kuperman (around line 85) and remove it. If the test is an entire function (def test_load_kuperman():), remove it whole. If the assertion is one line in a multi-loader test, remove that line.

Run: cd /Users/jneumann/Repos/PhonoLex && grep -n "aoa_kuperman\|load_kuperman" packages/data/tests/test_datasets.py Expected: no matches.

  • [ ] Step 5: Run the loader test suite to verify nothing else broke

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -m pytest packages/data/tests/test_new_loaders.py packages/data/tests/test_datasets.py -v 2>&1 | tail -20 Expected: all tests pass, with no ImportError mentioning load_kuperman.

  • [ ] Step 6: Commit
git add packages/data/src/phonolex_data/loaders/__init__.py packages/data/src/phonolex_data/loaders/norms.py packages/data/tests/test_datasets.py
git commit -m "PHON-115: remove load_kuperman + Kuperman test"

Task 7: Update pipeline source map and schema (drop aoa_kuperman, imageability, size)

Files: - Modify: packages/data/src/phonolex_data/pipeline/words.py (imports, norm_loaders, _NORM_FIELD_MAP) - Modify: packages/data/src/phonolex_data/pipeline/schema.py (WordRecord fields) - Modify: packages/data/src/phonolex_data/pipeline/derived.py (PERCENTILE_PROPERTIES)

  • [ ] Step 1: Update pipeline/words.py imports

In packages/data/src/phonolex_data/pipeline/words.py:

  1. Remove load_kuperman, from the imports block (around line 7).

  2. Remove load_glasgow, from the imports block (around line 8).

  3. Add load_phonolex_aoa, to the imports block alongside the other load_phonolex_* entries.

  • [ ] Step 2: Update the _NORM_FIELD_MAP source map

In packages/data/src/phonolex_data/pipeline/words.py, remove these three entries from _NORM_FIELD_MAP:

- "aoa_kuperman": "aoa_kuperman",
- "imageability": "imageability",
- "size": "size",

Keep "aoa": "aoa", — this slot is now populated by load_phonolex_aoa instead of load_glasgow.

  • [ ] Step 3: Update the norm_loaders list

In the norm_loaders list (around line 350):

  1. Remove the ("Kuperman", load_kuperman), line.

  2. Remove the ("Glasgow", load_glasgow), line.

  3. Add a new entry alongside the other load_phonolex_* loaders:

        ("PhonoLex AoA",
         lambda s=cmu_word_set: load_phonolex_aoa(filter_words=s)),

Place it adjacent to the other PHON-73 family lines (concreteness/valence/arousal/familiarity) for visual coherence.
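
A note on the `lambda s=cmu_word_set: ...` idiom in that entry: a default argument is evaluated when the lambda is created, so the loader keeps the word set it was given even if the outer name is later rebound. The sketch below demonstrates the difference with a hypothetical `load_demo` loader; it does not reproduce the pipeline's actual code.

```python
def load_demo(filter_words: set[str]) -> dict[str, dict[str, float]]:
    """Hypothetical loader: returns a fake rating for each requested word."""
    return {w: {"aoa": 1.0} for w in sorted(filter_words)}


word_sets = [{"cat"}, {"dog"}]

# Late binding: each closure looks up `s` when called, after the loop finished,
# so every lambda sees the last word set.
late = [lambda: load_demo(filter_words=s) for s in word_sets]

# Default-argument binding: each lambda stores its own `s` at definition time.
early = [lambda s=s: load_demo(filter_words=s) for s in word_sets]

print(list(late[0]()), list(early[0]()))
```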

  • [ ] Step 4: Remove the dropped fields from pipeline/schema.py::WordRecord

In packages/data/src/phonolex_data/pipeline/schema.py, delete these three lines from the WordRecord dataclass:

- aoa_kuperman: float | None = None
- imageability: float | None = None
- size: float | None = None

The aoa: float | None = None line stays.

  • [ ] Step 5: Remove the dropped fields from pipeline/derived.py::PERCENTILE_PROPERTIES

In packages/data/src/phonolex_data/pipeline/derived.py::PERCENTILE_PROPERTIES, remove these strings from the tuple:

- "aoa_kuperman"
- "imageability"
- "size"

(They appear on lines 22-23 currently — adjacent to "aoa". Keep "aoa".)
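
For context on why this tuple matters: each listed property gets a companion `<property>_percentile` column (the names checked in Task 10), so removing a property here also removes its percentile column. The sketch below shows one plausible way such a column could be computed; the actual formula in derived.py is not shown in this plan, so the percentile-rank definition used here is an assumption.

```python
from bisect import bisect_right
from typing import Optional


def percentile_column(prop: str) -> str:
    """Column name for a property's percentile, e.g. 'aoa' -> 'aoa_percentile'."""
    return f"{prop}_percentile"


def percentile_ranks(values: list[Optional[float]]) -> list[Optional[float]]:
    """Percent of non-missing values that are <= each value; None stays None."""
    present = sorted(v for v in values if v is not None)
    if not present:
        return [None] * len(values)
    n = len(present)
    return [
        None if v is None else 100.0 * bisect_right(present, v) / n
        for v in values
    ]


ranks = percentile_ranks([1.0, None, 3.0, 2.0])
print(percentile_column("aoa"), ranks)
```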

  • [ ] Step 6: Run the pipeline test

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -m pytest packages/data/tests/test_pipeline.py -v 2>&1 | tail -30 Expected: all tests pass. If any test asserts the presence of aoa_kuperman, imageability, or size columns, those tests need updates in Task 11.

  • [ ] Step 7: Commit
git add packages/data/src/phonolex_data/pipeline/words.py packages/data/src/phonolex_data/pipeline/schema.py packages/data/src/phonolex_data/pipeline/derived.py
git commit -m "PHON-115: pipeline rewire — drop aoa_kuperman/imageability/size, add PhonoLex AoA"

Task 8: Update property config (Python + TypeScript)

Files: - Modify: packages/web/workers/scripts/config.py (PropertyDef for aoa_kuperman, imageability, size) - Modify: packages/web/workers/src/config/properties.ts (same shape in TS)

  • [ ] Step 1: Update config.py — remove three PropertyDefs, update aoa PropertyDef

In packages/web/workers/scripts/config.py:

  1. Find and delete the PropertyDef(id="aoa_kuperman", ...) block (around line 224).

  2. Find and delete the PropertyDef(id="imageability", ...) block (around line 378).

  3. Find and delete the PropertyDef(id="size", ...) block (around line 411).

  4. Find the PropertyDef(id="aoa", ...) block and update it:
     - source= should read "PhonoLex AoA (gpt-4.1-mini, PHON-115)".
     - description= should read "Age of acquisition (AI-derived via gpt-4.1-mini cloze-prompt, validated against Glasgow Norms, CC BY 4.0, as oracle; replaces Kuperman et al. 2012, 🟡 license-encumbered). Spearman 0.87 vs Glasgow (full N=5,551 overlap), Pearson 0.82 vs Kuperman on N=500 Glasgow-unseen rows.".
     - scale: if the existing PropertyDef has scale="1-7", leave it as is; if it has different scale text reflecting Glasgow's old framing, update it to "1-7".

  • [ ] Step 2: Update properties.ts — same shape in TypeScript

In packages/web/workers/src/config/properties.ts:

  1. Find and delete the PropertyDef entry with id: 'aoa_kuperman' (around line 159).

  2. Find and delete the entry with id: 'imageability' (around line 182).

  3. Find and delete the entry with id: 'size' (around line 206).

  4. Find the entry with id: 'aoa'. Update source and description to match what you wrote in config.py Step 1. Leave scale as '1-7'.

  • [ ] Step 3: Run the workers test suite

Run: cd /Users/jneumann/Repos/PhonoLex/packages/web/workers && npm test 2>&1 | tail -30 Expected: most tests pass. Any failures referencing aoa_kuperman/imageability/size will be addressed in Task 11.

  • [ ] Step 4: Commit
git add packages/web/workers/scripts/config.py packages/web/workers/src/config/properties.ts
git commit -m "PHON-115: property config — remove 3 PropertyDefs, update aoa source attribution"

Task 9: Move xlsx oracle files to _oracles/

Files: - Move: data/norms/kuperman_aoa.xlsx → data/norms/_oracles/kuperman_aoa.xlsx - Move: data/norms/GlasgowNorms.xlsx → data/norms/_oracles/GlasgowNorms.xlsx

  • [ ] Step 1: Verify _oracles/ exists, then move both files

Run: cd /Users/jneumann/Repos/PhonoLex && ls data/norms/_oracles/ | head -5 && git mv data/norms/kuperman_aoa.xlsx data/norms/_oracles/kuperman_aoa.xlsx && git mv data/norms/GlasgowNorms.xlsx data/norms/_oracles/GlasgowNorms.xlsx Expected: _oracles/ exists (it already houses Brysbaert/Warriner per PHON-73); both git mv succeed silently.

  • [ ] Step 2: Update load_glasgow's default path

In packages/data/src/phonolex_data/loaders/norms.py::load_glasgow (around line 42), change:

path = Path(path) if path else get_data_dir() / "norms" / "GlasgowNorms.xlsx"
to:
path = Path(path) if path else get_data_dir() / "norms" / "_oracles" / "GlasgowNorms.xlsx"

(The Kuperman loader was deleted in Task 6; no path update needed there.)

  • [ ] Step 3: Update the validate_aoa.py oracle paths

In research/2026-04-30-llm-word-features/validate_aoa.py, update:

- GLASGOW_PATH = REPO / "data" / "norms" / "GlasgowNorms.xlsx" → GLASGOW_PATH = REPO / "data" / "norms" / "_oracles" / "GlasgowNorms.xlsx"
- KUPERMAN_PATH = REPO / "data" / "norms" / "kuperman_aoa.xlsx" → KUPERMAN_PATH = REPO / "data" / "norms" / "_oracles" / "kuperman_aoa.xlsx"

  • [ ] Step 4: Verify load_glasgow still works

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -c "from phonolex_data.loaders import load_glasgow; r = load_glasgow(); print(f'loaded {len(r):,} entries')" Expected: loaded 5,551 entries (or similar — Glasgow has 5,551 AoA-labeled words).

  • [ ] Step 5: Commit
git add data/norms/_oracles/ packages/data/src/phonolex_data/loaders/norms.py research/2026-04-30-llm-word-features/validate_aoa.py
git status  # verify the git mv recorded the rename
git commit -m "PHON-115: relocate Glasgow + Kuperman xlsx to data/norms/_oracles/"

Task 10: Regenerate parquet artifacts and D1 seed

Files: - Regenerate: data/runtime/words.parquet, data/runtime/edges.parquet, data/runtime/selectional.parquet, data/runtime/pairs.parquet, data/runtime/skeletons.parquet - Regenerate: packages/web/workers/scripts/d1-seed.sql

  • [ ] Step 1: Run the parquet build

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python packages/data/scripts/build_runtime_parquet.py 2>&1 | tee /tmp/build_parquet.log | tail -30 Expected: success log with [words] N=47,XXX rows × ~165 cols (down ~3 cols from the previous count due to aoa_kuperman/imageability/size drops).

  • [ ] Step 2: Verify aoa_kuperman, imageability, size columns are absent from words.parquet

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -c "import polars as pl; df = pl.read_parquet('data/runtime/words.parquet'); cols = set(df.columns); print('aoa:', 'aoa' in cols); print('aoa_kuperman:', 'aoa_kuperman' in cols); print('imageability:', 'imageability' in cols); print('size:', 'size' in cols); print('aoa_kuperman_percentile:', 'aoa_kuperman_percentile' in cols); print('imageability_percentile:', 'imageability_percentile' in cols); print('size_percentile:', 'size_percentile' in cols)" Expected:

aoa: True
aoa_kuperman: False
imageability: False
size: False
aoa_kuperman_percentile: False
imageability_percentile: False
size_percentile: False
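
The same check can be expressed as a small reusable function that takes a list of column names, which is easier to read than the one-liner and works with any source of column names (for example `pl.read_parquet(...).columns`). The helper name is hypothetical; the dropped names come from this plan.

```python
DROPPED = ("aoa_kuperman", "imageability", "size")


def leftover_columns(columns: list[str]) -> list[str]:
    """Return any dropped columns (or their _percentile twins) still present."""
    banned = set(DROPPED) | {f"{name}_percentile" for name in DROPPED}
    return sorted(c for c in columns if c in banned)


# Hypothetical column lists, for illustration only.
clean = ["word", "aoa", "aoa_percentile", "concreteness"]
stale = ["word", "aoa", "size", "imageability_percentile"]
print(leftover_columns(clean), leftover_columns(stale))
```

An empty list corresponds to the all-False output expected above for the dropped columns; the `aoa: True` check still needs a separate `"aoa" in columns` test.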

  • [ ] Step 3: Regenerate D1 seed SQL

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python packages/web/workers/scripts/export-to-d1.py 2>&1 | tee /tmp/export_d1.log | tail -20 Expected: success log with row counts.

  • [ ] Step 4: Verify the seed SQL has no aoa_kuperman or imageability references

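Run: cd /Users/jneumann/Repos/PhonoLex && grep -c "aoa_kuperman" packages/web/workers/scripts/d1-seed.sql; echo "---"; grep -c "imageability" packages/web/workers/scripts/d1-seed.sql Expected: both counts are 0. (The commands are joined with ; rather than &&, because grep -c exits non-zero when the count is 0, which would short-circuit an && chain before the second count runs.)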

For size — it's a common substring (e.g. in COMMENT lines about file size), so a bare grep is noisy. Use a column-context grep:

Run: cd /Users/jneumann/Repos/PhonoLex && grep -E '\bsize\b' packages/web/workers/scripts/d1-seed.sql | grep -vE "file size|window size|sample size" | head -10 Expected: no matches (or only matches that are clearly not the dropped norm column).
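
The word-boundary logic behind that grep can be sketched in Python as well, which makes it easier to see what counts as a hit. The sample SQL below is invented for illustration; the real d1-seed.sql layout is not shown in this plan.

```python
import re

SIZE_WORD = re.compile(r"\bsize\b")  # whole-word "size" only
IGNORED = re.compile(r"file size|window size|sample size")


def suspicious_size_lines(sql_text: str) -> list[str]:
    """Lines that mention a standalone `size` and are not a known false positive."""
    return [
        line
        for line in sql_text.splitlines()
        if SIZE_WORD.search(line) and not IGNORED.search(line)
    ]


sample_sql = "\n".join([
    "-- file size: 10 MB",
    "CREATE TABLE words (word TEXT, size REAL, aoa REAL);",
    "CREATE TABLE ui (font_size INTEGER);",
])
print(suspicious_size_lines(sample_sql))
```

Because `_` is a word character, identifiers such as `font_size` do not match `\bsize\b`, so only a bare `size` column would be flagged.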

  • [ ] Step 5: Commit
git add data/runtime/ packages/web/workers/scripts/d1-seed.sql
git commit -m "PHON-115: regenerate parquet + d1-seed.sql post-column-purge"

Task 11: Update Workers API tests + frontend types

Files: - Modify: packages/web/workers/src/__tests__/api.test.ts:232,252 (filter tests) - Modify: packages/web/frontend/src/types/phonology.ts (3 dropped columns + their min_/max_ variants) - Modify: any other frontend files referencing dropped columns (audit below)

  • [ ] Step 1: Update the api.test.ts filter tests

In packages/web/workers/src/__tests__/api.test.ts:

- Line 232: change { filters: { min_aoa_kuperman: 1.0, max_aoa_kuperman: 5.0 } } to { filters: { min_aoa: 1.0, max_aoa: 5.0 } }.
- Line 252: change filters: { max_aoa_kuperman: 6.0 } to filters: { max_aoa: 6.0 }.
- Update any assertions that check for aoa_kuperman in the response to check for aoa instead.

  • [ ] Step 2: Update phonology.ts types

In packages/web/frontend/src/types/phonology.ts:

  1. Delete the line aoa_kuperman: number | null; (around line 65; exact lines may shift).

  2. Delete the line imageability: number | null;.

  3. Delete the line size: number | null;.

  4. Delete the lines min_aoa_kuperman?: number; and max_aoa_kuperman?: number;.

  5. Delete the lines min_imageability?: number; and max_imageability?: number;.

  6. Delete the lines min_size?: number; and max_size?: number;.

Keep aoa: number | null;, min_aoa?: number;, max_aoa?: number;.

  • [ ] Step 3: Audit other frontend files for dropped column references

Run: cd /Users/jneumann/Repos/PhonoLex && grep -rn --exclude-dir=node_modules --exclude='*.d.ts' "aoa_kuperman\|imageability\|min_size\|max_size" packages/web/frontend/src/ Expected: 0 matches after the types update, apart from PsycholinguisticsSection.tsx, whose aoa_kuperman and imageability BOUNDS entries are handled in Task 12. If other matches exist (e.g. in WordProfileContext, WordListTable, ContrastiveGroupsTable, ExportMenu), update each one in this step; this is typically a one-line rename or removal per file.
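
The same audit can be sketched in Python when a structured result (file, line number, text) is easier to act on than grep output. This is a hedged sketch: `find_dropped_references` is a hypothetical helper, and the pattern deliberately skips a bare `size`, which is too common in frontend code to be a useful signal.

```python
import re
from pathlib import Path

# Matches the dropped columns and their min_/max_ filter variants.
# A bare "size" is intentionally not matched (assumption: too noisy in UI code).
DROPPED_KEYS = re.compile(
    r"\b(?:min_|max_)?(?:aoa_kuperman|imageability)\b|\b(?:min_|max_)size\b"
)


def find_dropped_references(root: Path, suffixes=(".ts", ".tsx")):
    """Return (path, line_number, line_text) for each reference to a dropped column."""
    hits = []
    for path in sorted(root.rglob("*")):
        if "node_modules" in path.parts or not path.is_file():
            continue
        if path.suffix not in suffixes or path.name.endswith(".d.ts"):
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if DROPPED_KEYS.search(line):
                hits.append((path, lineno, line.strip()))
    return hits
```

Calling `find_dropped_references(Path("packages/web/frontend/src"))` from the repo root would list every remaining reference, including the PsycholinguisticsSection.tsx entries that Task 12 rewrites.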

  • [ ] Step 4: Run TypeScript typecheck

Run: cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm run build 2>&1 | tail -30 Expected: build succeeds. If it complains about missing aoa_kuperman/imageability/size properties anywhere, follow the error trail and remove those references.

  • [ ] Step 5: Run the workers test suite

Run: cd /Users/jneumann/Repos/PhonoLex/packages/web/workers && npm test 2>&1 | tail -20 Expected: all tests pass.

  • [ ] Step 6: Commit
git add packages/web/workers/src/__tests__/api.test.ts packages/web/frontend/src/types/phonology.ts packages/web/frontend/src/
git commit -m "PHON-115: update frontend types + workers API tests for dropped columns"

Task 12: Update frontend slider (PsycholinguisticsSection)

Files: - Modify: packages/web/frontend/src/components/tools/GovernedGenerationTool/PsycholinguisticsSection.tsx (BOUNDS list lines 41-84)

  • [ ] Step 1: Replace the AoA slider entry and remove the Imageability entry

In packages/web/frontend/src/components/tools/GovernedGenerationTool/PsycholinguisticsSection.tsx, find the BOUNDS: NormDef[] = [...] array (around line 41).

Replace the existing aoa_kuperman entry (lines 42-47):

{
  norm: 'aoa_kuperman', label: 'Age of Acquisition',
  description: 'Exclude words acquired after this age (Kuperman, years)',
  min: 2, max: 21, step: 0.5, direction: 'max',
  format: (v) => `≤ ${v} yrs`,
},
with:
{
  norm: 'aoa', label: 'Age of Acquisition',
  description: 'Exclude words acquired after this rating (1 = 0-2y, 7 = 13+y, PhonoLex-derived)',
  min: 1, max: 7, step: 0.5, direction: 'max',
  format: (v) => `≤ ${v}`,
},

Delete the imageability entry (lines 60-65) entirely:

{
  norm: 'imageability', label: 'Imageability',
  description: 'Exclude words below this imageability (1 = hard to picture, 7 = easy)',
  min: 1, max: 7, step: 0.5, direction: 'min',
  format: (v) => `≥ ${v}`,
},
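
To make the new slider's 1-7 scale concrete, the sketch below enumerates the slider stops and maps a rating to an age band. Only the endpoints (1 = 0-2y, 7 = 13+y) come from this plan; the intermediate two-year bands are an assumption based on the Glasgow-style anchors and should be checked against the validated pilot prompt. The helper names are illustrative and written in Python for quick checking, not as frontend code.

```python
import math

# Assumed Glasgow-style age bands; only keys 1 and 7 are stated in this plan.
AOA_BANDS = {1: "0-2", 2: "3-4", 3: "5-6", 4: "7-8", 5: "9-10", 6: "11-12", 7: "13+"}


def slider_values(lo=1.0, hi=7.0, step=0.5):
    """Enumerate the stops of a slider with the given min, max and step."""
    count = int(round((hi - lo) / step))
    return [lo + i * step for i in range(count + 1)]


def band_label(rating: float) -> str:
    """Describe which age band (or pair of bands) a 1-7 rating falls in."""
    if not 1 <= rating <= 7:
        raise ValueError("rating must be within the 1-7 scale")
    lower, upper = math.floor(rating), math.ceil(rating)
    if lower == upper:
        return f"{AOA_BANDS[lower]}y"
    # Fractional ratings sit between two adjacent bands.
    return f"between {AOA_BANDS[lower]}y and {AOA_BANDS[upper]}y"
```

With min 1, max 7 and step 0.5 the slider has 13 stops, so a bound such as ≤ 4.5 falls between two adjacent bands rather than on a single one.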

  • [ ] Step 2: Run TypeScript typecheck

Run: cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm run build 2>&1 | tail -10 Expected: build succeeds.

  • [ ] Step 3: Commit
git add packages/web/frontend/src/components/tools/GovernedGenerationTool/PsycholinguisticsSection.tsx
git commit -m "PHON-115: frontend AoA slider — swap to aoa (1-7), drop Imageability slider"

Task 13: Update audit checklist + NOTICE

Files: - Modify: docs/data-license-remediation-checklist.md (Kuperman row) - Modify: NOTICE (Kuperman attribution + Glasgow attribution updates)

  • [ ] Step 1: Find the latest commit hash for cross-reference

Run: git log --oneline -1 Expected: a single line starting with a short hash (typically 7 characters, e.g. abc1234) followed by the commit subject. Note the hash for Step 2.

  • [ ] Step 2: Update the audit checklist

In docs/data-license-remediation-checklist.md, find the row beginning | Kuperman AoA 2012 | (around line 46). Replace the row, updating both the replacement cell and the status cell, from:

| Kuperman AoA 2012 | TBD — separate AoA-replacement workstream | 🟡 still in v1 (Kuperman XLSX kept under `data/norms/`, not in `_oracles/`) |
to:
| Kuperman AoA 2012 | PhonoLex AoA (PHON-115) — gpt-4.1-mini cloze-prompt | 🟢 **Done 2026-05-11** (commit `<hash-from-step-1>`). Production build 47,724 words, Glasgow Spearman 0.87 (full N=5,551 overlap), Kuperman Spearman 0.83 (N=500 Glasgow-unseen). Kuperman xlsx moved to `data/norms/_oracles/`. Pipeline ships `aoa` repointed to PhonoLex value via `load_phonolex_aoa`. `imageability` + `size` columns also retired (orphaned post-Glasgow-relocation, no consumer). |
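
The correlations quoted in this row compare ratings on different scales: Kuperman values are in years, while the PhonoLex aoa column uses the 1-7 scale. A rank correlation only depends on ordering, so it is unaffected by that difference. The sketch below is a dependency-free illustration of Spearman's coefficient with average ranks for ties; it is not the build script's code, which may well use a library routine instead.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotonic rescaling of one rating set (years to a 1-7 band, for instance) leaves the coefficient unchanged, which is why the two oracles can be reported side by side.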

  • [ ] Step 3: Update NOTICE

In NOTICE, locate the section listing validation oracles (the part where Brysbaert/Warriner are framed as "validation oracles only" post-PHON-73). Add Kuperman alongside, e.g.:

- Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition
  ratings for 30,000 English words. Behavior Research Methods, 44(4), 978-990.
  Kept locally at data/norms/_oracles/kuperman_aoa.xlsx as a cross-construct
  validation oracle for PhonoLex's gpt-4.1-mini AoA build (PHON-115). Not
  redistributed; per-row data not exposed in shipped artifacts.

If Glasgow already has an entry under "validation oracles," update it to additionally note the AoA role:

- Scott, G. G., Keitel, A., Becirspahic, M., Yao, B., & Sereno, S. C. (2019).
  The Glasgow Norms: Ratings of 5,500 words on nine scales. Behavior Research
  Methods, 51(3), 1258-1270. CC BY 4.0. Kept locally at
  data/norms/_oracles/GlasgowNorms.xlsx as the primary validation oracle for
  PhonoLex's gpt-4.1-mini word feature builds (PHON-73 family + PHON-115 AoA).
  Not redistributed; per-row data not exposed in shipped artifacts.

If the existing entry framed Glasgow as a pipeline source (not oracle), replace that framing — Glasgow is no longer a pipeline source after PHON-115.

  • [ ] Step 4: Commit
git add docs/data-license-remediation-checklist.md NOTICE
git commit -m "PHON-115: audit checklist + NOTICE updated — Kuperman 🟡→🟢, Glasgow oracle-only"

Task 14: Final verification + browser smoke

Files: (no edits — verification only)

  • [ ] Step 1: Verify no lingering norm-column references

Run:

cd /Users/jneumann/Repos/PhonoLex
echo "=== aoa_kuperman ===" && grep -rE 'aoa_kuperman' packages/ data/runtime/ | grep -v node_modules
echo "=== imageability ===" && grep -rE 'imageability' packages/ data/runtime/ | grep -v node_modules
Expected: zero matches for both. (The size grep is unreliable due to MUI prop names — visual inspection of frontend already covered this.)

  • [ ] Step 2: Run the full Python test suite for packages/data

Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -m pytest packages/data/tests/ -v 2>&1 | tail -30 Expected: all tests pass.

  • [ ] Step 3: Run the workers test suite

Run: cd /Users/jneumann/Repos/PhonoLex/packages/web/workers && npm test 2>&1 | tail -20 Expected: all tests pass.

  • [ ] Step 4: Run the frontend build

Run: cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm run build 2>&1 | tail -10 Expected: build succeeds with no type errors.

  • [ ] Step 5: Reseed local D1 + start dev servers + browser smoke
cd /Users/jneumann/Repos/PhonoLex/packages/web/workers
npx wrangler d1 execute phonolex --local --file scripts/d1-seed.sql
npx wrangler dev &
WORKER_PID=$!

cd ../frontend
npm run dev &
FRONTEND_PID=$!

In the browser at the dev frontend URL: open the Governed Generation tool, add an AoA bound (e.g. ≤ 5), submit a sentence request, confirm sentences come back. Verify the AoA slider's tooltip says "1 = 0-2y, 7 = 13+y, PhonoLex-derived" (or your wording from Task 12). Confirm there is NO Imageability slider in the BOUNDS UI.

Then: kill $WORKER_PID $FRONTEND_PID.

  • [ ] Step 6: Final commit (if any drift from the smoke session)

If the browser smoke surfaced any small fixes (typo, layout nudge), commit them with a PHON-115: browser smoke fixes message. Otherwise, no commit needed.

  • [ ] Step 7: Final verification grep

Run: cd /Users/jneumann/Repos/PhonoLex && git log --oneline | grep "PHON-115" | head -20 Expected: roughly 13 commits with the PHON-115: prefix, about one per task, tracing the full work. Fewer than ~10 means several tasks were batched into one commit; that is fine if intentional, but spot-check that each phase landed.
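
If the commit trail needs to be checked programmatically, the output of git log --oneline can be parsed as in this small sketch. `phon115_commits` is a hypothetical helper; unlike the grep, it matches only subjects that start with the PHON-115: prefix.

```python
def phon115_commits(git_log_oneline: str):
    """Return (short_hash, subject) pairs whose subject starts with 'PHON-115:'."""
    commits = []
    for line in git_log_oneline.splitlines():
        # Each --oneline entry is "<short hash> <subject>".
        short_hash, _, subject = line.strip().partition(" ")
        if subject.startswith("PHON-115:"):
            commits.append((short_hash, subject))
    return commits
```

Feeding it the text captured from `git log --oneline` and checking `len(...)` against the expected commit count gives the same signal as the grep, with the hashes available for the checklist cross-reference.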


Done

All spec gates have been satisfied:
- ✓ Build script + validation report committed (Task 4)
- ✓ phonolex_aoa.tsv ≥47K rows (Task 3)
- ✓ Loader rewired (Tasks 5, 6)
- ✓ Pipeline source map, schema, percentile target list (Task 7)
- ✓ Property config Python + TS (Task 8)
- ✓ Parquet + D1 regenerated, columns absent (Task 10)
- ✓ Frontend slider + types (Tasks 11, 12)
- ✓ Source data moved to _oracles/ (Task 9)
- ✓ Audit checklist 🟡→🟢 (Task 13)
- ✓ NOTICE updated (Task 13)
- ✓ Tests updated (Tasks 5, 6, 11)
- ✓ Browser smoke (Task 14)

PR-ready. Branch off develop (per feedback_branch_management.md); PR targets develop not main.