PHON-115: AoA Replacement Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Replace aoa_kuperman with an in-house aoa column via the PHON-73 LLM-rating pattern (gpt-4.1-mini with logprob expected-value), drop the orphaned imageability + size columns, and relocate Glasgow + Kuperman xlsx files to data/norms/_oracles/.
Architecture: New PhonoLex loader follows the existing phonolex_concreteness.py shape; the build script is a clone of build_concreteness.py with the validated Glasgow-style AoA prompt from the pilot. Pipeline rewire drops three columns from WordRecord, the _NORM_FIELD_MAP source map, and the PERCENTILE_PROPERTIES target list. Property config + frontend slider + types get updated in lockstep.
Tech Stack: Python 3.11+, openai SDK (gpt-4.1-mini), openpyxl (Glasgow + Kuperman xlsx), polars (parquet), TypeScript (frontend + Workers config), Cloudflare Wrangler (D1 reseed).
Spec: docs/superpowers/specs/2026-05-11-phon-115-aoa-replacement-design.md
Pilot evidence: research/2026-05-11-phon-115-aoa-pilot/ — Glasgow ρ=0.868 (N=5,551), Kuperman ρ=0.832 (N=500 Glasgow-unseen).
Task 1: Add aoa feature spec to the LLM-rating harness
Files:
- Modify: research/2026-04-30-llm-word-features/harness.py:42-174 (FEATURES dict)
- [ ] Step 1: Open the FEATURES dict and locate the BOI entry
The FEATURES dict in harness.py contains seven entries (concreteness, valence, arousal, familiarity, boi, dominance, iconicity). Add a new "aoa" entry that mirrors the prompt validated in research/2026-05-11-phon-115-aoa-pilot/run_pilot.py.
- [ ] Step 2: Insert the AoA FeatureSpec entry
Add this entry to FEATURES next to "arousal" and "boi". Strict alphabetical order would put "aoa" just before "arousal", but any readable position works:
```python
# Age of Acquisition — Glasgow Norms (Scott et al. 2019) 1-7 scale, age-band
# anchors from the published instructions. PHON-115 replacement of Kuperman 2012
# (no posted license). Validated in research/2026-05-11-phon-115-aoa-pilot/:
# Spearman 0.868 vs Glasgow (full N=5,551), Pearson 0.816 vs Kuperman on
# N=500 Glasgow-unseen rows.
"aoa": FeatureSpec(
    name="aoa",
    scale_min=1, scale_max=7,
    prompt_template=(
        "Could you rate the age at which you learned the following word? "
        "Use a 1 to 7 scale where: 1 = 0-2 years old, 2 = 3-4 years, "
        "3 = 5-6 years, 4 = 7-8 years, 5 = 9-10 years, 6 = 11-12 years, "
        "7 = 13 years or older. Examples of words that would receive a "
        "rating of 1 are mum, daddy and ball. Examples of words that "
        "would receive a rating of 7 are subpoena, oligarchy and tariff. "
        "The word is: {word}. Reply with only a number from 1 to 7. "
        "Limit your response to numbers."
    ),
),
```
- [ ] Step 3: Sanity-test that the new spec loads
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -c "import sys; sys.path.insert(0, 'research/2026-04-30-llm-word-features'); from harness import FEATURES; print(FEATURES['aoa'].name, FEATURES['aoa'].scale_min, FEATURES['aoa'].scale_max)"
(The harness directory name contains hyphens and starts with a digit, so it cannot be imported as a package. Put it on sys.path directly.)
Expected: aoa 1 7
- [ ] Step 4: Commit
```bash
git add research/2026-04-30-llm-word-features/harness.py
git commit -m "PHON-115: add aoa FeatureSpec to LLM-rating harness"
```
Task 2: Write build_aoa.py (cloned from build_concreteness.py)
Files:
- Create: research/2026-04-30-llm-word-features/build_aoa.py
- [ ] Step 1: Copy build_concreteness.py as the starting point
Run: cp research/2026-04-30-llm-word-features/build_concreteness.py research/2026-04-30-llm-word-features/build_aoa.py
- [ ] Step 2: In `build_aoa.py`, swap the feature key
In the copied file, change every occurrence of "concreteness" to "aoa". Specifically:
- Module docstring header: replace the concreteness narrative with the AoA narrative. Use this block:
```python
"""Production build: AI-estimated age-of-acquisition ratings over the
non-PROPN PhonoLex content vocabulary via gpt-4.1-mini.

Replaces the Kuperman et al. 2012 AoA ratings (`data/norms/kuperman_aoa.xlsx`,
🟡 under PHON-71 license audit) and the PHON-71 spike's
`data/norms/phonolex_aoa.tsv` (LightGBM regression trained on Glasgow).
Glasgow Norms (Scott et al. 2019, CC BY 4.0) is the oracle used for
validation, kept locally at `data/norms/_oracles/GlasgowNorms.xlsx`
post-PHON-115.

Vocabulary scope: CMU dict ∩ FineWeb-Edu frequency table, FILTERED to words
whose PHON-72 dominant POS is NOT 'PROPN' (drops surnames + foreign loans
that survive in CMU). About 48K content words.

Validation (validate_aoa.py vs Glasgow, see PHON-115 pilot):
    Spearman 0.868 on N=5,551 (full Glasgow AoA-labeled vocabulary);
    Pearson 0.816 on N=500 Glasgow-unseen Kuperman rows.

Output: data/norms/phonolex_aoa.tsv with columns: word, aoa, cov_aoa
Resumable via append-mode TSV write.

Usage:
    uv run python build_aoa.py [--model gpt-4.1-mini] [--concurrency 6]
        [--out PATH] [--limit N] [--resume]
"""
```
- Replace `DEFAULT_OUT = REPO / "data" / "norms" / "phonolex_concreteness.tsv"` with `DEFAULT_OUT = REPO / "data" / "norms" / "phonolex_aoa.tsv"`.
- In `rate_one`, replace `spec = FEATURES["concreteness"]` with `spec = FEATURES["aoa"]`.
- In `amain`, replace `fieldnames = ["word", "concreteness", "cov_concreteness"]` with `fieldnames = ["word", "aoa", "cov_aoa"]`.
- In the `writer.writerow({...})` line inside `amain`, replace `{"word": w, "concreteness": ev, "cov_concreteness": cov}` with `{"word": w, "aoa": ev, "cov_aoa": cov}`.
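The docstring's "Resumable via append-mode TSV write" can be pictured roughly as below. This is a minimal sketch under assumptions, not the code in build_concreteness.py. It assumes `--resume` means "skip words already present in the output TSV and append the rest". The names `words_already_done`, `append_rows`, and `pending_words` are hypothetical helpers.

```python
import csv
from pathlib import Path

FIELDNAMES = ["word", "aoa", "cov_aoa"]


def words_already_done(out_path: Path) -> set[str]:
    """Hypothetical helper: words already present in an existing output TSV."""
    if not out_path.exists():
        return set()
    with open(out_path, encoding="utf-8", newline="") as f:
        return {row["word"] for row in csv.DictReader(f, delimiter="\t")}


def append_rows(out_path: Path, rows: list[dict[str, object]]) -> None:
    """Append rating rows; write the header only when the file is new or empty."""
    write_header = not out_path.exists() or out_path.stat().st_size == 0
    with open(out_path, "a", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, delimiter="\t", lineterminator="\n")
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


def pending_words(vocab: list[str], out_path: Path, resume: bool) -> list[str]:
    """Words still to rate: all of them, or only those missing from the TSV when resuming."""
    done = words_already_done(out_path) if resume else set()
    return [w for w in vocab if w not in done]
```

Without `--resume`, a build along these lines would need to start from an empty (or truncated) file before appending. Otherwise it would write duplicate rows.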
- [ ] Step 3: Sanity-test the script's --help
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python research/2026-04-30-llm-word-features/build_aoa.py --help
Expected: argparse help output mentioning --model, --concurrency, --out, --resume, --limit, no errors.
- [ ] Step 4: Smoke-test with --limit 20
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python research/2026-04-30-llm-word-features/build_aoa.py --limit 20 --out /tmp/aoa_smoke.tsv
Expected: [done] 20 words; 0 failed; total ~5s. Then `head -21 /tmp/aoa_smoke.tsv` (plain `head` stops after 10 lines) shows a word\taoa\tcov_aoa header and 20 data rows with aoa values in [1.0, 7.0].
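The smoke output can also be checked programmatically rather than by eye. This is a minimal sketch that assumes the TSV layout above: a `word\taoa\tcov_aoa` header, then one row per word. `check_rating_tsv` is a hypothetical helper, not part of the repo.

```python
import csv
import math
from pathlib import Path


def check_rating_tsv(path: Path, expected_rows: int, lo: float = 1.0, hi: float = 7.0) -> list[str]:
    """Return the problems found in a build TSV; an empty list means it looks fine."""
    problems: list[str] = []
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        if reader.fieldnames != ["word", "aoa", "cov_aoa"]:
            return [f"unexpected header: {reader.fieldnames}"]
        n = 0
        for row in reader:
            n += 1
            try:
                v = float(row["aoa"])
            except (TypeError, ValueError):
                problems.append(f"non-numeric aoa for {row['word']!r}")
                continue
            # NaN fails every comparison, so test for it explicitly.
            if math.isnan(v) or not lo <= v <= hi:
                problems.append(f"aoa out of range for {row['word']!r}: {v}")
    if n != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {n}")
    return problems
```

For example, `check_rating_tsv(Path("/tmp/aoa_smoke.tsv"), expected_rows=20)` should return an empty list. The same call with `expected_rows=47_724` works for the full build in Task 3.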
- [ ] Step 5: Commit
```bash
git add research/2026-04-30-llm-word-features/build_aoa.py
git commit -m "PHON-115: add build_aoa.py (cloned from build_concreteness.py)"
```
Task 3: Run the full AoA build
Files:
- Create: data/norms/phonolex_aoa.tsv (47K+ rows)
- [ ] Step 1: Verify the .env has OPENAI_API_KEY
Run: grep -q "^OPENAI_API_KEY=" .env && echo OK
Expected: OK
- [ ] Step 2: Run the full build
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python research/2026-04-30-llm-word-features/build_aoa.py --concurrency 6 2>&1 | tee /tmp/build_aoa.log
Expected runtime: ~1.6h. Expected output: [done] 47724 words; 0 failed; total ~5800s (within ±10%). Cost ~$5.
- [ ] Step 3: Verify the TSV
Run: wc -l data/norms/phonolex_aoa.tsv && head -3 data/norms/phonolex_aoa.tsv
Expected: 47,725 lines (47,724 data + 1 header), header reads word\taoa\tcov_aoa, and the first two data rows show floats in [1.0, 7.0] for aoa and ~1.0 for cov_aoa.
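Why aoa stays in [1.0, 7.0] and cov_aoa sits near 1.0: under the PHON-73 logprob expected-value pattern, the top-20 token log-probabilities are turned into probabilities. The mass on the valid scale tokens "1" to "7" is then renormalized, and the rating is the expected value over those tokens. An expected value of the integers 1-7 can never leave [1, 7]. The sketch below is minimal and rests on two assumptions: `cov` is the total probability mass on valid scale tokens (the harness's exact definition may differ), and `top_logprobs` arrives as a token-to-logprob mapping. `expected_rating` is a hypothetical name.

```python
import math


def expected_rating(top_logprobs: dict[str, float], scale_min: int = 1,
                    scale_max: int = 7) -> tuple[float, float]:
    """Return (expected value, coverage) over the integer scale tokens.

    coverage = probability mass that landed on valid scale tokens; the
    expected value is renormalized over that mass only.
    """
    valid = {str(k) for k in range(scale_min, scale_max + 1)}
    probs: dict[int, float] = {}
    for token, lp in top_logprobs.items():
        t = token.strip()  # " 2" and "2" count as the same rating
        if t in valid:
            probs[int(t)] = probs.get(int(t), 0.0) + math.exp(lp)
    cov = sum(probs.values())
    if cov == 0.0:
        return float("nan"), 0.0  # no valid rating token in the top-k
    ev = sum(k * p for k, p in probs.items()) / cov
    return ev, cov
```

A cov_aoa well below 1.0 for a row would mean the model spent probability on non-numeric tokens. Such rows are worth spot-checking.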
- [ ] Step 4: Commit the TSV
```bash
git add data/norms/phonolex_aoa.tsv
git commit -m "PHON-115: data — phonolex_aoa.tsv (gpt-4.1-mini full build, 47,724 words)"
```
Task 4: Write validate_aoa.py and produce the validation report
Files:
- Create: research/2026-04-30-llm-word-features/validate_aoa.py
- Create: research/2026-05-11-phon-115-aoa-pilot/report.md
- [ ] Step 1: Copy validate_concreteness.py as the starting point
Run: cp research/2026-04-30-llm-word-features/validate_concreteness.py research/2026-04-30-llm-word-features/validate_aoa.py
- [ ] Step 2: In `validate_aoa.py`, change the oracle and field names
The validator needs two oracles: Glasgow (primary) and Kuperman (cross-construct sanity check). Replace the body of the script with:
```python
"""Validate the full PhonoLex AoA build against Glasgow + Kuperman oracles.

Glasgow Norms (Scott et al. 2019) is the primary oracle (the 1-7 scale the
LLM was prompted on). Kuperman 2012 is a cross-construct sanity check on
Glasgow-unseen words. Its scale differs (years): Spearman is invariant to any
monotone rescaling, while Pearson is only invariant to linear rescaling, so
read the Kuperman Pearson as approximate.

Usage:
    uv run python research/2026-04-30-llm-word-features/validate_aoa.py
"""
from __future__ import annotations

import csv
import math
from pathlib import Path

REPO = Path(__file__).resolve().parents[2]
BUILD_PATH = REPO / "data" / "norms" / "phonolex_aoa.tsv"
GLASGOW_PATH = REPO / "data" / "norms" / "GlasgowNorms.xlsx"
KUPERMAN_PATH = REPO / "data" / "norms" / "kuperman_aoa.xlsx"


def load_glasgow_aoa() -> dict[str, float]:
    import openpyxl

    wb = openpyxl.load_workbook(GLASGOW_PATH, read_only=True, data_only=True)
    ws = wb.active
    out: dict[str, float] = {}
    for i, row in enumerate(ws.iter_rows(values_only=True)):
        if i < 2:  # skip the two header rows
            continue
        w, a = row[0], row[20]  # word, AoA mean
        if not w or not isinstance(w, str) or a is None:
            continue
        try:
            out[w.strip().lower()] = float(a)
        except (ValueError, TypeError):
            continue
    wb.close()
    return out


def load_kuperman_aoa() -> dict[str, float]:
    import openpyxl

    wb = openpyxl.load_workbook(KUPERMAN_PATH, read_only=True, data_only=True)
    ws = wb.active
    header = list(next(ws.iter_rows(max_row=1, values_only=True)))
    w_i, a_i = header.index("Word"), header.index("AoA_Kup")
    out: dict[str, float] = {}
    for i, row in enumerate(ws.iter_rows(values_only=True)):
        if i == 0:  # skip the header row
            continue
        w, a = row[w_i], row[a_i]
        if not w or a is None:
            continue
        try:
            out[str(w).strip().lower()] = float(a)
        except (ValueError, TypeError):
            continue
    wb.close()
    return out


def load_build() -> dict[str, float]:
    out: dict[str, float] = {}
    with open(BUILD_PATH, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            try:
                v = float(row["aoa"])
            except (ValueError, KeyError):
                continue
            if v == v:  # skip NaN
                out[row["word"]] = v
    return out


def spearman(xs: list[float], ys: list[float]) -> float:
    def ranks(vals: list[float]) -> list[float]:
        # 1-based ranks; tied values share the average of their positions.
        n = len(vals)
        order = sorted(range(n), key=lambda i: vals[i])
        out = [0.0] * n
        i = 0
        while i < n:
            j = i
            while j + 1 < n and vals[order[j + 1]] == vals[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1
            for k in range(i, j + 1):
                out[order[k]] = avg
            i = j + 1
        return out

    n = len(xs)
    if n < 2:
        return float("nan")
    rx, ry = ranks(xs), ranks(ys)
    mx = sum(rx) / n
    my = sum(ry) / n
    num = sum((rx[i] - mx) * (ry[i] - my) for i in range(n))
    dx = math.sqrt(sum((rx[i] - mx) ** 2 for i in range(n)))
    dy = math.sqrt(sum((ry[i] - my) ** 2 for i in range(n)))
    return num / (dx * dy) if dx > 0 and dy > 0 else 0.0


def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    if n < 2:
        return float("nan")
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((xs[i] - mx) * (ys[i] - my) for i in range(n))
    dx = math.sqrt(sum((xs[i] - mx) ** 2 for i in range(n)))
    dy = math.sqrt(sum((ys[i] - my) ** 2 for i in range(n)))
    return num / (dx * dy) if dx > 0 and dy > 0 else 0.0


def main() -> int:
    build = load_build()
    glasgow = load_glasgow_aoa()
    kuperman = load_kuperman_aoa()
    print(f"[load] build={len(build):,} glasgow={len(glasgow):,} kuperman={len(kuperman):,}")

    # Full-vocab Glasgow overlap
    common_g = sorted(set(build) & set(glasgow))
    xs = [glasgow[w] for w in common_g]
    ys = [build[w] for w in common_g]
    print()
    print(f"[Glasgow overlap, N={len(common_g):,}]")
    print(f" Spearman: {spearman(xs, ys):.3f}")
    print(f" Pearson: {pearson(xs, ys):.3f}")

    # Glasgow-unseen Kuperman overlap
    common_k = sorted((set(build) & set(kuperman)) - set(glasgow))
    xs = [kuperman[w] for w in common_k]
    ys = [build[w] for w in common_k]
    print()
    print(f"[Kuperman \\ Glasgow overlap, N={len(common_k):,}]")
    print(f" Spearman: {spearman(xs, ys):.3f}")
    print(f" Pearson: {pearson(xs, ys):.3f}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```
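The docstring's scale caveat is easy to check. Spearman depends only on ranks, so a monotone rescaling (for example, mapping Glasgow's 1-7 bands onto approximate years) leaves it unchanged, while Pearson shifts. The self-contained check below re-implements both coefficients compactly rather than importing validate_aoa.py. The `bands`, `years`, and `scores` values are made-up toy data.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    dx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    dy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return num / (dx * dy) if dx > 0 and dy > 0 else 0.0


def average_ranks(vals: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    ranks = [0.0] * len(vals)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs: list[float], ys: list[float]) -> float:
    # Spearman is Pearson computed on (average) ranks.
    return pearson(average_ranks(xs), average_ranks(ys))


bands = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
years = [b ** 2 for b in bands]  # toy monotone, non-linear rescaling
scores = [1.2, 2.5, 2.1, 3.9, 5.1, 6.2, 6.8]  # toy ratings, one swapped pair
```

Here `spearman(bands, scores)` equals `spearman(years, scores)` exactly, but the two Pearson values differ. That is why the Kuperman\Glasgow Spearman is the more scale-robust of the two Kuperman numbers.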
- [ ] Step 3: Run the validator
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python research/2026-04-30-llm-word-features/validate_aoa.py 2>&1 | tee /tmp/validate_aoa.log
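Expected (within sampling noise of the pilot): Glasgow Spearman ≥ 0.85 and Pearson ≥ 0.84; Kuperman\Glasgow Spearman ≥ 0.80 and Pearson ≥ 0.78.

Decision rule: if both gates (Glasgow and Kuperman\Glasgow) pass, proceed. If either misses a threshold by more than 0.05, stop and investigate.

The spec's R² ≥ 0.74 gate used in the report corresponds to Pearson ≥ √0.74 ≈ 0.860. That is stricter than the 0.84 Pearson gate, so check it separately.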
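The decision rule can be written as a small check that also makes the R² gate explicit. This is a minimal sketch. The thresholds are the ones listed above plus the spec's R² ≥ 0.74 from the report template, and "hard fail" encodes the 0.05 stop-and-investigate margin. `evaluate_gates` is a hypothetical helper, not part of validate_aoa.py.

```python
import math

# Thresholds from this plan; the R² floor is the spec gate from the report template.
GATES = {
    "glasgow_spearman": 0.85,
    "glasgow_pearson": 0.84,
    "glasgow_r2": 0.74,
    "kuperman_spearman": 0.80,
    "kuperman_pearson": 0.78,
}
MIN_PEARSON_FOR_R2 = math.sqrt(GATES["glasgow_r2"])  # the Pearson the R² gate really demands


def evaluate_gates(metrics: dict[str, float], tolerance: float = 0.05) -> tuple[list[str], list[str]]:
    """Return (failed, hard_failed) metric names.

    `metrics` holds the four correlations; glasgow_r2 is derived as Pearson².
    A metric hard-fails when it misses its threshold by more than `tolerance`.
    """
    values = dict(metrics)
    values["glasgow_r2"] = values["glasgow_pearson"] ** 2
    failed: list[str] = []
    hard_failed: list[str] = []
    for name, floor in GATES.items():
        if values[name] < floor:
            failed.append(name)
            if floor - values[name] > tolerance:
                hard_failed.append(name)
    return failed, hard_failed
```

For example, a Glasgow Pearson of 0.85 clears the 0.84 gate but fails the R² gate (0.85² ≈ 0.72), and only by a soft margin.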
- [ ] Step 4: Write the validation report
Create research/2026-05-11-phon-115-aoa-pilot/report.md with this content (replace <…> placeholders with values from /tmp/validate_aoa.log):
```markdown
# PHON-115 AoA Replacement — Validation Report

**Date:** 2026-05-11
**Build:** `data/norms/phonolex_aoa.tsv` (gpt-4.1-mini, full PhonoLex content vocabulary)
**Model:** gpt-4.1-mini
**Pattern:** PHON-73 LLM-rating with logprob expected-value (`top_logprobs=20`, scale 1-7)
**Prompt:** Glasgow-style 1-7 age-band cloze (anchors: mum/daddy/ball → 1; subpoena/oligarchy/tariff → 7)

## Full-build results

| Metric | Value | Gate | Status |
|---|---|---|---|
| Glasgow overlap N | <N_g> | — | — |
| Glasgow Spearman | <ρ_g> | ≥ 0.85 | ✓ / ✗ |
| Glasgow Pearson | <r_g> | ≥ 0.84 | ✓ / ✗ |
| Glasgow R² (Pearson²) | <r_g²> | ≥ 0.74 (spec) | ✓ / ✗ |
| Kuperman\Glasgow N | <N_k> | — | — |
| Kuperman\Glasgow Spearman | <ρ_k> | ≥ 0.80 | ✓ / ✗ |
| Kuperman\Glasgow Pearson | <r_k> | ≥ 0.78 (spec floor 0.50) | ✓ / ✗ |

## Pilot reference (N=200 + N=5,551 + N=500)

See `run_pilot.py` (Glasgow regression) and `run_kuperman_sanity.py` (Kuperman cross-construct).
Pilot Glasgow Spearman: 0.881 (N=200) → 0.868 (N=5,551, full vocab).
Pilot Kuperman Spearman: 0.832 (N=500 Glasgow-unseen).

## Cost + runtime

- Build: ~5,800s at concurrency=6, ~$5 OpenAI spend (gpt-4.1-mini)
- Validation: ~30s, $0

## Decision

Both gates pass. Proceeding to integration in this same ticket — **no follow-up tickets**.
(See spec `docs/superpowers/specs/2026-05-11-phon-115-aoa-replacement-design.md` for the integration checklist.)
```
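The placeholders can be pulled from /tmp/validate_aoa.log instead of being copied by hand. This minimal sketch assumes the exact print format of validate_aoa.py's `main()`: `[Glasgow overlap, N=…]` and `[Kuperman \ Glasgow overlap, N=…]` headers, each followed by `Spearman:` and `Pearson:` lines. `parse_validation_log` is hypothetical. Its keys map onto the template as `N_g`→<N_g>, `rho_g`→<ρ_g>, `r_g`→<r_g>, `r2_g`→<r_g²>, and likewise for `_k`.

```python
import re

# "[Glasgow overlap, N=5,551]" or "[Kuperman \ Glasgow overlap, N=...]"; "[load]" does not match.
SECTION_RE = re.compile(r"^\[(Glasgow|Kuperman)\b[^\]]*N=([\d,]+)\]")
METRIC_RE = re.compile(r"^\s*(Spearman|Pearson):\s*(\S+)")


def parse_validation_log(text: str) -> dict[str, float]:
    """Map validator log lines to report placeholder keys."""
    suffix: str | None = None
    out: dict[str, float] = {}
    for line in text.splitlines():
        m = SECTION_RE.match(line)
        if m:
            suffix = "g" if m.group(1) == "Glasgow" else "k"
            out[f"N_{suffix}"] = float(m.group(2).replace(",", ""))
            continue
        m = METRIC_RE.match(line)
        if m and suffix is not None:
            key = "rho" if m.group(1) == "Spearman" else "r"
            out[f"{key}_{suffix}"] = float(m.group(2))  # "nan" parses too
    if "r_g" in out:
        out["r2_g"] = out["r_g"] ** 2
    return out
```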
- [ ] Step 5: Commit
```bash
git add research/2026-04-30-llm-word-features/validate_aoa.py research/2026-05-11-phon-115-aoa-pilot/report.md
git commit -m "PHON-115: validation report — full build vs Glasgow + Kuperman oracles"
```
Task 5: Write load_phonolex_aoa loader and its test
Files:
- Create: packages/data/src/phonolex_data/loaders/phonolex_aoa.py
- Modify: packages/data/tests/test_new_loaders.py (add new test)
- [ ] Step 1: Write the failing test first
Append this test to packages/data/tests/test_new_loaders.py:
```python
def test_load_phonolex_aoa():
    from phonolex_data.loaders import load_phonolex_aoa

    result = load_phonolex_aoa()
    assert isinstance(result, dict)
    assert len(result) > 40_000  # ~47,724 non-PROPN content words
    # Probe early-learned (rating should be low) and late-learned (rating should be high)
    for w in ("cat", "ball"):
        assert w in result, f"{w} missing"
        assert "aoa" in result[w]
        assert 1.0 <= result[w]["aoa"] <= 4.0, f"{w} aoa={result[w]['aoa']}"
    for w in ("subpoena", "oligarchy"):
        assert w in result, f"{w} missing"
        assert 5.0 <= result[w]["aoa"] <= 7.0, f"{w} aoa={result[w]['aoa']}"


def test_load_phonolex_aoa_filter_words():
    from phonolex_data.loaders import load_phonolex_aoa

    result = load_phonolex_aoa(filter_words={"cat", "subpoena"})
    assert set(result.keys()) == {"cat", "subpoena"}
```
- [ ] Step 2: Run test to verify it fails (loader doesn't exist yet)
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -m pytest packages/data/tests/test_new_loaders.py::test_load_phonolex_aoa -v
Expected: ImportError: cannot import name 'load_phonolex_aoa' (or AttributeError).
- [ ] Step 3: Write the loader
Create packages/data/src/phonolex_data/loaders/phonolex_aoa.py:
```python
"""Loader for the PhonoLex in-house age-of-acquisition ratings.

Replaces Kuperman et al. 2012 AoA (`data/norms/kuperman_aoa.xlsx`, 🟡 under
PHON-71 license audit) AND the PHON-71 spike's earlier `phonolex_aoa.tsv`
(LightGBM regression trained on Glasgow).

Source: cloze-prompt LLM rating extraction via gpt-4.1-mini, same methodology
as PHON-73's 5-feature build. Prompt anchors low-AoA words (mum, daddy, ball)
and high-AoA words (subpoena, oligarchy, tariff), 1-7 scale with Glasgow's
published age-band anchors (1 = 0-2y, 7 = 13+y).

Vocabulary scope: CMU∩freq filtered to non-PROPN content words (~47,724
words). Same scope as PHON-73 family.

Validation (held-out vs Glasgow CC BY 4.0 + Kuperman, see
research/2026-05-11-phon-115-aoa-pilot/report.md):
    Glasgow Spearman 0.868 (full N=5,551 overlap);
    Kuperman Pearson 0.816 (N=500 Glasgow-unseen rows).

Output field: `aoa` (replaces the Glasgow-sourced `aoa` field; the
Kuperman-sourced `aoa_kuperman` field is removed from the schema entirely
in PHON-115).
"""
from __future__ import annotations

import csv
from pathlib import Path
from typing import Iterable

from phonolex_data.loaders._helpers import get_data_dir


def load_phonolex_aoa(
    path: str | Path | None = None,
    filter_words: Iterable[str] | None = None,
) -> dict[str, dict[str, float]]:
    """Load PhonoLex's in-house AI-derived age-of-acquisition ratings.

    Args:
        path: Path to the TSV. Defaults to ``data/norms/phonolex_aoa.tsv``.
        filter_words: Optional iterable of word strings (lowercase). When
            provided, only entries with ``word`` in this set are returned.

    Returns:
        {word: {"aoa": float}}. Values are LLM expected-value ratings on
        the Glasgow 1-7 scale. Words with NaN (rare retry failures) are
        skipped.
    """
    path = Path(path) if path else get_data_dir() / "norms" / "phonolex_aoa.tsv"
    allowed: set[str] | None = (
        {w.lower() for w in filter_words} if filter_words is not None else None
    )
    out: dict[str, dict[str, float]] = {}
    with open(path, encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            w = row["word"].strip().lower()
            if not w:
                continue
            if allowed is not None and w not in allowed:
                continue
            try:
                v = float(row["aoa"])
            except (ValueError, KeyError):
                continue
            if v != v:  # NaN check
                continue
            out[w] = {"aoa": v}
    return out
```
- [ ] Step 4: Wire the loader into the package exports
Edit packages/data/src/phonolex_data/loaders/__init__.py:
1. Add the import: from phonolex_data.loaders.phonolex_aoa import load_phonolex_aoa near the other phonolex_* imports.
2. Add "load_phonolex_aoa", to the __all__ list.
- [ ] Step 5: Run the test to verify it passes
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -m pytest packages/data/tests/test_new_loaders.py::test_load_phonolex_aoa packages/data/tests/test_new_loaders.py::test_load_phonolex_aoa_filter_words -v
Expected: 2 passed.
- [ ] Step 6: Commit
```bash
git add packages/data/src/phonolex_data/loaders/phonolex_aoa.py packages/data/src/phonolex_data/loaders/__init__.py packages/data/tests/test_new_loaders.py
git commit -m "PHON-115: load_phonolex_aoa loader + tests"
```
Task 6: Remove load_kuperman from loader exports (load_glasgow stays exported)
Files:
- Modify: packages/data/src/phonolex_data/loaders/__init__.py
- Modify: packages/data/src/phonolex_data/loaders/norms.py (deletion of load_kuperman body; load_glasgow retained-as-is since it's still callable for ad-hoc eval scripts)
- [ ] Step 1: Find the existing exports
Run: grep -n "load_kuperman\|load_glasgow" packages/data/src/phonolex_data/loaders/__init__.py
Expected: import statement(s) and __all__ entries.
- [ ] Step 2: Remove the exports
Edit packages/data/src/phonolex_data/loaders/__init__.py:
1. Remove load_kuperman from the from phonolex_data.loaders.norms import (...) block.
2. Remove "load_kuperman", from the __all__ list.
3. Leave load_glasgow import and export in place — it's no longer wired into the pipeline but stays callable for ad-hoc eval (e.g. one-off researcher scripts).
- [ ] Step 3: Delete the `load_kuperman` function body
In packages/data/src/phonolex_data/loaders/norms.py, delete the entire def load_kuperman(...) function (it starts around line 125 and ends around line 153). The load_glasgow function above it stays.
- [ ] Step 4: Update test_datasets.py to drop the Kuperman test
In packages/data/tests/test_datasets.py, find the test referencing aoa_kuperman (around line 85) and remove it. If the test is an entire function (def test_load_kuperman():), remove it whole. If the assertion is one line in a multi-loader test, remove that line.
Run: cd /Users/jneumann/Repos/PhonoLex && grep -n "aoa_kuperman\|load_kuperman" packages/data/tests/test_datasets.py
Expected: no matches.
- [ ] Step 5: Run the loader test suite to verify nothing else broke
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -m pytest packages/data/tests/test_new_loaders.py packages/data/tests/test_datasets.py -v 2>&1 | tail -20
Expected: all tests pass, with no ImportError referencing load_kuperman.
- [ ] Step 6: Commit
```bash
git add packages/data/src/phonolex_data/loaders/__init__.py packages/data/src/phonolex_data/loaders/norms.py packages/data/tests/test_datasets.py
git commit -m "PHON-115: remove load_kuperman + Kuperman test"
```
Task 7: Update pipeline source map and schema (drop aoa_kuperman, imageability, size)
Files:
- Modify: packages/data/src/phonolex_data/pipeline/words.py (imports, norm_loaders, _NORM_FIELD_MAP)
- Modify: packages/data/src/phonolex_data/pipeline/schema.py (WordRecord fields)
- Modify: packages/data/src/phonolex_data/pipeline/derived.py (PERCENTILE_PROPERTIES)
- [ ] Step 1: Update `pipeline/words.py` imports
In packages/data/src/phonolex_data/pipeline/words.py:
1. Remove load_kuperman, from the imports block (around line 7).
2. Remove load_glasgow, from the imports block (around line 8).
3. Add load_phonolex_aoa, to the imports block alongside the other load_phonolex_* entries.
- [ ] Step 2: Update the `_NORM_FIELD_MAP` source map
In packages/data/src/phonolex_data/pipeline/words.py, remove these three entries from _NORM_FIELD_MAP:
- "aoa_kuperman": "aoa_kuperman",
- "imageability": "imageability",
- "size": "size",
Keep "aoa": "aoa", — this slot is now populated by load_phonolex_aoa instead of load_glasgow.
- [ ] Step 3: Update the `norm_loaders` list
In the norm_loaders list (around line 350):
1. Remove the ("Kuperman", load_kuperman), line.
2. Remove the ("Glasgow", load_glasgow), line.
3. Add a new entry alongside the other load_phonolex_* loaders:
```python
("PhonoLex AoA",
 lambda s=cmu_word_set: load_phonolex_aoa(filter_words=s)),
```
Place it adjacent to the other PHON-73 family lines (concreteness/valence/arousal/familiarity) for visual coherence.
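For orientation, `norm_loaders` and `_NORM_FIELD_MAP` plausibly work together as below. This is a minimal sketch under several assumptions, not the pipeline's actual code:
- each loader returns `{word: {field: value}}`, as load_phonolex_aoa does;
- `_NORM_FIELD_MAP` maps loader field names to WordRecord attribute names;
- records are plain dicts here.

`merge_norms` and `NORM_FIELD_MAP` are hypothetical stand-ins. The `s=cmu_word_set` default argument in the entry above binds the word set when the lambda is defined.

```python
from __future__ import annotations

from typing import Callable

NormLoader = Callable[[], dict[str, dict[str, float]]]

# Hypothetical subset of the source map after this task: loader field -> record attribute.
NORM_FIELD_MAP = {"aoa": "aoa", "concreteness": "concreteness"}


def merge_norms(
    records: dict[str, dict[str, float | None]],
    norm_loaders: list[tuple[str, NormLoader]],
) -> dict[str, int]:
    """Copy mapped fields from each loader into per-word records; return per-loader hit counts."""
    hits: dict[str, int] = {}
    for name, loader in norm_loaders:
        hits[name] = 0
        for word, fields in loader().items():
            rec = records.get(word)
            if rec is None:
                continue  # word is outside the pipeline vocabulary
            for src, dest in NORM_FIELD_MAP.items():
                if src in fields:  # unmapped fields (e.g. aoa_kuperman) are ignored
                    rec[dest] = fields[src]
            hits[name] += 1
    return hits
```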
- [ ] Step 4: Remove the dropped fields from `pipeline/schema.py::WordRecord`
In packages/data/src/phonolex_data/pipeline/schema.py, delete these three lines from the WordRecord dataclass:
- aoa_kuperman: float | None = None
- imageability: float | None = None
- size: float | None = None
The aoa: float | None = None line stays.
- [ ] Step 5: Remove the dropped fields from `pipeline/derived.py::PERCENTILE_PROPERTIES`
In packages/data/src/phonolex_data/pipeline/derived.py::PERCENTILE_PROPERTIES, remove these strings from the tuple:
- "aoa_kuperman"
- "imageability"
- "size"
(They currently appear on lines 22-23, adjacent to "aoa". Keep "aoa".)
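Removing a name from PERCENTILE_PROPERTIES matters because the derived step appears to emit one `<name>_percentile` column per listed property (Task 10 checks that exactly these columns are gone). This is a minimal sketch. It assumes a mid-rank percentile in [0, 100] computed over non-null values, with nulls passed through; the real derived.py formula may differ. `add_percentiles` is hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right

PERCENTILE_PROPERTIES = ("aoa", "concreteness")  # illustrative subset after this task


def add_percentiles(rows: list[dict[str, float | None]]) -> None:
    """Add `<prop>_percentile` (mid-rank percentile, 0-100) for each listed property, in place."""
    for prop in PERCENTILE_PROPERTIES:
        values = sorted(r[prop] for r in rows if r.get(prop) is not None)
        n = len(values)
        for r in rows:
            v = r.get(prop)
            if v is None or n == 0:
                r[f"{prop}_percentile"] = None
                continue
            below = bisect_left(values, v)
            equal = bisect_right(values, v) - below
            # Ties share the midpoint of the span they occupy.
            r[f"{prop}_percentile"] = 100.0 * (below + 0.5 * equal) / n
```

A property that is no longer listed (imageability, say) simply never gets a `_percentile` column.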
- [ ] Step 6: Run the pipeline test
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -m pytest packages/data/tests/test_pipeline.py -v 2>&1 | tail -30
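Expected: all tests pass. If any test asserts the presence of the aoa_kuperman, imageability, or size columns, update it now. Task 11 only covers the Workers API tests and frontend types. If you change packages/data/tests/test_pipeline.py, add it to the Step 7 commit.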
- [ ] Step 7: Commit
```bash
git add packages/data/src/phonolex_data/pipeline/words.py packages/data/src/phonolex_data/pipeline/schema.py packages/data/src/phonolex_data/pipeline/derived.py
git commit -m "PHON-115: pipeline rewire — drop aoa_kuperman/imageability/size, add PhonoLex AoA"
```
Task 8: Update property config (Python + TypeScript)
Files:
- Modify: packages/web/workers/scripts/config.py (PropertyDef for aoa_kuperman, imageability, size)
- Modify: packages/web/workers/src/config/properties.ts (same shape in TS)
- [ ] Step 1: Update `config.py`: remove three PropertyDefs and update the `aoa` PropertyDef
In packages/web/workers/scripts/config.py:
1. Find and delete the PropertyDef(id="aoa_kuperman", ...) block (around line 224).
2. Find and delete the PropertyDef(id="imageability", ...) block (around line 378).
3. Find and delete the PropertyDef(id="size", ...) block (around line 411).
4. Find the PropertyDef(id="aoa", ...) block. Update source= to read "PhonoLex AoA (gpt-4.1-mini, PHON-115)". Update description= to read "Age of acquisition (AI-derived via gpt-4.1-mini cloze prompt on the Glasgow 1-7 scale, validated against Glasgow Norms CC BY 4.0; replaces Kuperman et al. 2012, 🟡 license-encumbered). Spearman 0.87 vs Glasgow (full N=5,551 overlap), Pearson 0.82 vs Kuperman on N=500 Glasgow-unseen rows.". If the existing PropertyDef already has scale="1-7", leave it as is. If its scale text reflects Glasgow's old framing, update it to "1-7".
- [ ] Step 2: Update `properties.ts` (same shape in TypeScript)
In packages/web/workers/src/config/properties.ts:
1. Find and delete the PropertyDef entry with id: 'aoa_kuperman' (around line 159).
2. Find and delete the entry with id: 'imageability' (around line 182).
3. Find and delete the entry with id: 'size' (around line 206).
4. Find the entry with id: 'aoa'. Update source and description to match what you wrote in config.py Step 1. Leave scale as '1-7'.
- [ ] Step 3: Run the workers test suite
Run: cd /Users/jneumann/Repos/PhonoLex/packages/web/workers && npm test 2>&1 | tail -30
Expected: most tests pass. Any failures referencing aoa_kuperman/imageability/size will be addressed in Task 11.
- [ ] Step 4: Commit
```bash
git add packages/web/workers/scripts/config.py packages/web/workers/src/config/properties.ts
git commit -m "PHON-115: property config — remove 3 PropertyDefs, update aoa source attribution"
```
Task 9: Move xlsx oracle files to _oracles/
Files:
- Move: data/norms/kuperman_aoa.xlsx → data/norms/_oracles/kuperman_aoa.xlsx
- Move: data/norms/GlasgowNorms.xlsx → data/norms/_oracles/GlasgowNorms.xlsx
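- [ ] Step 1: Verify `_oracles/` exists, then move both files
Run: cd /Users/jneumann/Repos/PhonoLex && test -d data/norms/_oracles && ls data/norms/_oracles/ | head -5 && git mv data/norms/kuperman_aoa.xlsx data/norms/_oracles/kuperman_aoa.xlsx && git mv data/norms/GlasgowNorms.xlsx data/norms/_oracles/GlasgowNorms.xlsx
(The `test -d` guard matters: a pipeline's exit status is that of `head`, so `ls … | head` alone would let the `git mv` calls run even if the directory were missing.)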
Expected: _oracles/ exists (it already houses Brysbaert/Warriner per PHON-73); both git mv succeed silently.
- [ ] Step 2: Update `load_glasgow`'s default path
In packages/data/src/phonolex_data/loaders/norms.py::load_glasgow (around line 42), change:
```python
path = Path(path) if path else get_data_dir() / "norms" / "GlasgowNorms.xlsx"
```
to:
```python
path = Path(path) if path else get_data_dir() / "norms" / "_oracles" / "GlasgowNorms.xlsx"
```
(The Kuperman loader was deleted in Task 6; no path update needed there.)
- [ ] Step 3: Update the validate_aoa.py oracle paths
In research/2026-04-30-llm-word-features/validate_aoa.py, update:
- GLASGOW_PATH = REPO / "data" / "norms" / "GlasgowNorms.xlsx" → GLASGOW_PATH = REPO / "data" / "norms" / "_oracles" / "GlasgowNorms.xlsx"
- KUPERMAN_PATH = REPO / "data" / "norms" / "kuperman_aoa.xlsx" → KUPERMAN_PATH = REPO / "data" / "norms" / "_oracles" / "kuperman_aoa.xlsx"
- [ ] Step 4: Verify load_glasgow still works
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -c "from phonolex_data.loaders import load_glasgow; r = load_glasgow(); print(f'loaded {len(r):,} entries')"
Expected: loaded 5,551 entries (or similar — Glasgow has 5,551 AoA-labeled words).
- [ ] Step 5: Commit
```bash
git add data/norms/_oracles/ packages/data/src/phonolex_data/loaders/norms.py research/2026-04-30-llm-word-features/validate_aoa.py
git status  # verify the git mv recorded the rename
git commit -m "PHON-115: relocate Glasgow + Kuperman xlsx to data/norms/_oracles/"
```
Task 10: Regenerate parquet artifacts and D1 seed
Files:
- Regenerate: data/runtime/words.parquet, data/runtime/edges.parquet, data/runtime/selectional.parquet, data/runtime/pairs.parquet, data/runtime/skeletons.parquet
- Regenerate: packages/web/workers/scripts/d1-seed.sql
- [ ] Step 1: Run the parquet build
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python packages/data/scripts/build_runtime_parquet.py 2>&1 | tee /tmp/build_parquet.log | tail -30
Expected: success log with [words] N=47,XXX rows × ~165 cols. That should be about 6 columns fewer than before: the three dropped norms (aoa_kuperman, imageability, size) plus their three `_percentile` companions.
- [ ] Step 2: Verify that the `aoa_kuperman`, `imageability`, and `size` columns are absent from words.parquet
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -c "import polars as pl; df = pl.read_parquet('data/runtime/words.parquet'); cols = set(df.columns); print('aoa:', 'aoa' in cols); print('aoa_kuperman:', 'aoa_kuperman' in cols); print('imageability:', 'imageability' in cols); print('size:', 'size' in cols); print('aoa_kuperman_percentile:', 'aoa_kuperman_percentile' in cols); print('imageability_percentile:', 'imageability_percentile' in cols); print('size_percentile:', 'size_percentile' in cols)"
Expected:
```
aoa: True
aoa_kuperman: False
imageability: False
size: False
aoa_kuperman_percentile: False
imageability_percentile: False
size_percentile: False
```
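The same check can live as a reusable function instead of a one-liner. This is a minimal sketch. The dropped-column list comes from this task, and `check_dropped_columns` is a hypothetical helper. Reading the schema stays with polars as in the command above, e.g. `check_dropped_columns(pl.read_parquet("data/runtime/words.parquet").columns)`.

```python
from __future__ import annotations

DROPPED = ("aoa_kuperman", "imageability", "size")


def check_dropped_columns(columns: list[str]) -> list[str]:
    """Return problems: `aoa` must be present; dropped norms and their percentiles must be absent."""
    cols = set(columns)
    problems = [] if "aoa" in cols else ["missing column: aoa"]
    for name in DROPPED:
        for col in (name, f"{name}_percentile"):
            if col in cols:
                problems.append(f"unexpected column: {col}")
    return problems
```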
- [ ] Step 3: Regenerate D1 seed SQL
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python packages/web/workers/scripts/export-to-d1.py 2>&1 | tee /tmp/export_d1.log | tail -20
Expected: success log with row counts.
- [ ] Step 4: Verify the seed SQL has no aoa_kuperman or imageability references
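Run: cd /Users/jneumann/Repos/PhonoLex && grep -c "aoa_kuperman" packages/web/workers/scripts/d1-seed.sql; echo "---"; grep -c "imageability" packages/web/workers/scripts/d1-seed.sql
Expected: both counts are 0. The commands are joined with `;` rather than `&&` because grep -c exits non-zero when it finds no matches. An `&&` chain would stop after the first 0 and never run the second grep.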
For size, a bare grep is noisy because it is a common substring (e.g. in comment lines about file size). Use a whole-word grep, filter out the obvious prose matches, and truncate the long INSERT lines:
Run: cd /Users/jneumann/Repos/PhonoLex && grep -nw 'size' packages/web/workers/scripts/d1-seed.sql | grep -vE "file size|window size|sample size" | cut -c1-200 | head -10
Expected: no matches, or only matches that are clearly not the dropped norm column (for example, the word "size" itself appearing as a data value).
- [ ] Step 5: Commit
```bash
git add data/runtime/ packages/web/workers/scripts/d1-seed.sql
git commit -m "PHON-115: regenerate parquet + d1-seed.sql post-column-purge"
```
Task 11: Update Workers API tests + frontend types
Files:
- Modify: packages/web/workers/src/__tests__/api.test.ts:232,252 (filter tests)
- Modify: packages/web/frontend/src/types/phonology.ts (3 dropped columns + their min_/max_ variants)
- Modify: any other frontend files referencing dropped columns (audit below)
- [ ] Step 1: Update the api.test.ts filter tests
In packages/web/workers/src/__tests__/api.test.ts:
- Line 232: change { filters: { min_aoa_kuperman: 1.0, max_aoa_kuperman: 5.0 } } to { filters: { min_aoa: 1.0, max_aoa: 5.0 } }.
- Line 252: change filters: { max_aoa_kuperman: 6.0 } to filters: { max_aoa: 6.0 }.
- Update any assertions that check for aoa_kuperman in the response to check for aoa instead.
- [ ] Step 2: Update `phonology.ts` types
In packages/web/frontend/src/types/phonology.ts:
1. Delete the line aoa_kuperman: number | null; (around line 65 — exact lines may shift).
2. Delete the line imageability: number | null;.
3. Delete the line size: number | null;.
4. Delete the lines min_aoa_kuperman?: number; and max_aoa_kuperman?: number;.
5. Delete the lines min_imageability?: number; and max_imageability?: number;.
6. Delete the lines min_size?: number; and max_size?: number;.
Keep aoa: number | null;, min_aoa?: number;, max_aoa?: number;.
- [ ] Step 3: Audit other frontend files for dropped column references
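Run: cd /Users/jneumann/Repos/PhonoLex && grep -rn "aoa_kuperman\|imageability\|min_size\|max_size" packages/web/frontend/src/ | grep -v "node_modules\|\.d\.ts:"
(grep -n output starts with the file name, so the .d.ts filter must match `.d.ts:` rather than an end-of-line `.d.ts$`. Bare "imageability" is included because field accesses won't carry the min_/max_ prefix.)
Expected: 0 matches after the types update, apart from PsycholinguisticsSection.tsx, whose AoA and Imageability slider entries Task 12 rewrites. If other matches exist (e.g. in WordProfileContext, WordListTable, ContrastiveGroupsTable, ExportMenu), update each one in this step. This is typically a one-line rename per file.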
- [ ] Step 4: Run TypeScript typecheck
Run: cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm run build 2>&1 | tail -30
Expected: build succeeds. If it complains about missing aoa_kuperman/imageability/size properties anywhere, follow the error trail and remove those references.
- [ ] Step 5: Run the workers test suite
Run: cd /Users/jneumann/Repos/PhonoLex/packages/web/workers && npm test 2>&1 | tail -20
Expected: all tests pass.
- [ ] Step 6: Commit
```bash
git add packages/web/workers/src/__tests__/api.test.ts packages/web/frontend/src/types/phonology.ts packages/web/frontend/src/
git commit -m "PHON-115: update frontend types + workers API tests for dropped columns"
```
Task 12: Update frontend slider (PsycholinguisticsSection)
Files:
- Modify: packages/web/frontend/src/components/tools/GovernedGenerationTool/PsycholinguisticsSection.tsx (BOUNDS list lines 41-84)
- [ ] Step 1: Replace the AoA slider entry and remove the Imageability entry
In packages/web/frontend/src/components/tools/GovernedGenerationTool/PsycholinguisticsSection.tsx, find the BOUNDS: NormDef[] = [...] array (around line 41).
Replace the existing aoa_kuperman entry (lines 42-47):
```tsx
{
  norm: 'aoa_kuperman', label: 'Age of Acquisition',
  description: 'Exclude words acquired after this age (Kuperman, years)',
  min: 2, max: 21, step: 0.5, direction: 'max',
  format: (v) => `≤ ${v} yrs`,
},
```
with:
```tsx
{
  norm: 'aoa', label: 'Age of Acquisition',
  description: 'Exclude words acquired after this rating (1 = 0-2y, 7 = 13+y, PhonoLex-derived)',
  min: 1, max: 7, step: 0.5, direction: 'max',
  format: (v) => `≤ ${v}`,
},
```
Delete the imageability entry (lines 60-65) entirely:
```tsx
{
  norm: 'imageability', label: 'Imageability',
  description: 'Exclude words below this imageability (1 = hard to picture, 7 = easy)',
  min: 1, max: 7, step: 0.5, direction: 'min',
  format: (v) => `≥ ${v}`,
},
```
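When wording the tooltip (and when picking a bound for the Task 14 smoke test), it can help to read a rating bound back as an approximate age. The sketch below assumes the intermediate Glasgow anchors are two-year bands between the endpoints stated in the slider description (0-2y for 1, 13+y for 7). Reading half-steps linearly is a rough heuristic, not something the PhonoLex build or the Glasgow instructions define; `approx_upper_age` is a hypothetical helper.

```python
from __future__ import annotations

# Assumed Glasgow-style AoA anchors: rating -> age band in years.
AOA_BANDS = {
    1: "0-2y",
    2: "2-4y",
    3: "4-6y",
    4: "6-8y",
    5: "8-10y",
    6: "10-12y",
    7: "13+y",
}


def approx_upper_age(bound: float) -> float | None:
    """Rough upper age (years) implied by a 'max' AoA bound on the 1-7 scale.

    Integer ratings map to the top of their band (rating r -> 2r years);
    half-steps land halfway between. Bounds above 6 return None because
    the top band (13+y) is open-ended.
    """
    if not 1 <= bound <= 7:
        raise ValueError(f"AoA bound {bound} is outside the 1-7 scale")
    if bound > 6:
        return None
    return 2 * bound
```

Under these assumptions, a bound of ≤ 5 reads as roughly "acquired by about age 10".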
- [ ] Step 2: Run TypeScript typecheck
Run: cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm run build 2>&1 | tail -10
Expected: build succeeds.
- [ ] Step 3: Commit
```bash
git add packages/web/frontend/src/components/tools/GovernedGenerationTool/PsycholinguisticsSection.tsx
git commit -m "PHON-115: frontend AoA slider — swap to aoa (1-7), drop Imageability slider"
```
Task 13: Update audit checklist + NOTICE¶
Files:
- Modify: docs/data-license-remediation-checklist.md (Kuperman row)
- Modify: NOTICE (Kuperman attribution + Glasgow attribution updates)
- [ ] Step 1: Find the latest commit hash for cross-reference
Run: git log --oneline -1
Expected: a 7-char short hash like abc1234. Note this for Step 2.
- [ ] Step 2: Update the audit checklist
In docs/data-license-remediation-checklist.md, find the row beginning | Kuperman AoA 2012 | (around line 46). Replace the whole row (both the replacement cell and the status cell change) from:
```markdown
| Kuperman AoA 2012 | TBD — separate AoA-replacement workstream | 🟡 still in v1 (Kuperman XLSX kept under `data/norms/`, not in `_oracles/`) |
```
to:
```markdown
| Kuperman AoA 2012 | PhonoLex AoA (PHON-115) — gpt-4.1-mini cloze-prompt | 🟢 **Done 2026-05-11** (commit `<hash-from-step-1>`). Production build 47,724 words, Glasgow Spearman 0.87 (full N=5,551 overlap), Kuperman Pearson 0.82 (N=500 Glasgow-unseen). Kuperman xlsx moved to `data/norms/_oracles/`. Pipeline ships `aoa` repointed to PhonoLex value via `load_phonolex_aoa`. `imageability` + `size` columns also retired (orphaned post-Glasgow-relocation, no consumer). |
```
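Once the new row (still containing `<hash-from-step-1>`) is in the checklist, the placeholder can be filled mechanically instead of by hand. This sketch uses `git rev-parse --short HEAD`, which prints the same abbreviated HEAD hash that Step 1's `git log --oneline -1` shows. `fill_commit_hash`, `current_short_hash` and `update_checklist` are hypothetical names.

```python
import subprocess
from pathlib import Path

PLACEHOLDER = "<hash-from-step-1>"


def fill_commit_hash(text: str, short_hash: str) -> str:
    """Replace the placeholder; refuse to guess if it is missing or repeated."""
    count = text.count(PLACEHOLDER)
    if count != 1:
        raise ValueError(f"expected exactly one {PLACEHOLDER!r}, found {count}")
    return text.replace(PLACEHOLDER, short_hash)


def current_short_hash() -> str:
    """Abbreviated hash of HEAD (same value as `git log --oneline -1`)."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def update_checklist(path: Path) -> None:
    """Rewrite the checklist file in place with the current HEAD hash."""
    text = path.read_text(encoding="utf-8")
    path.write_text(fill_commit_hash(text, current_short_hash()), encoding="utf-8")
```

Run from the repo root, `update_checklist(Path("docs/data-license-remediation-checklist.md"))` fails loudly if the placeholder is missing or duplicated rather than writing a half-updated row.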
- [ ] Step 3: Update NOTICE
In NOTICE, locate the section listing validation oracles (the part where Brysbaert/Warriner are framed as "validation oracles only" post-PHON-73). Add Kuperman alongside, e.g.:
```text
- Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition
  ratings for 30,000 English words. Behavior Research Methods, 44(4), 978-990.
  Kept locally at data/norms/_oracles/kuperman_aoa.xlsx as a cross-construct
  validation oracle for PhonoLex's gpt-4.1-mini AoA build (PHON-115). Not
  redistributed; per-row data not exposed in shipped artifacts.
```
If Glasgow already has an entry under "validation oracles," update it to additionally note the AoA role:
```text
- Scott, G. G., Keitel, A., Becirspahic, M., Yao, B., & Sereno, S. C. (2019).
  The Glasgow Norms: Ratings of 5,500 English words on nine scales. Behavior
  Research Methods, 51(3), 1258-1270. CC BY 4.0. Kept locally at
  data/norms/_oracles/GlasgowNorms.xlsx as the primary validation oracle for
  PhonoLex's gpt-4.1-mini word feature builds (PHON-73 family + PHON-115 AoA).
  Not redistributed; per-row data not exposed in shipped artifacts.
```
If the existing entry framed Glasgow as a pipeline source (not oracle), replace that framing — Glasgow is no longer a pipeline source after PHON-115.
- [ ] Step 4: Commit
```bash
git add docs/data-license-remediation-checklist.md NOTICE
git commit -m "PHON-115: audit checklist + NOTICE updated — Kuperman 🟡→🟢, Glasgow oracle-only"
```
Task 14: Final verification + browser smoke¶
Files: (no edits — verification only)
- [ ] Step 1: Verify no lingering norm-column references
Run:
```bash
cd /Users/jneumann/Repos/PhonoLex
echo "=== aoa_kuperman ===" && grep -rE 'aoa_kuperman' packages/ data/runtime/ | grep -v node_modules
echo "=== imageability ===" && grep -rE 'imageability' packages/ data/runtime/ | grep -v node_modules
# (No bare "size" grep: it is unreliable because of MUI size props, and
#  visual inspection of the frontend already covered this.)
```
Expected: no matches under either header.
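If you want the retired `size` column covered by an automated check as well, one option is to match only norm-shaped usages instead of the bare word. The three shapes below (a `size: number` type field, `min_size`/`max_size` filters, and `norm: 'size'` in a BOUNDS entry) are assumptions modeled on the aoa and imageability patterns earlier in this plan; MUI-style `size="small"` props do not match them. `is_size_norm_ref` is a hypothetical helper meant to be applied one line at a time.

```python
import re

# Norm-shaped references to the retired `size` column (assumed shapes).
SIZE_NORM_REF = re.compile(
    r"""
    \b(?:min|max)_size\b             # filter fields, e.g. max_size?: number
    | ^\s*size\??\s*:\s*number\b     # type field, e.g. size: number | null;
    | \bnorm\s*:\s*['"]size['"]      # BOUNDS entry, e.g. norm: 'size'
    """,
    re.VERBOSE,
)


def is_size_norm_ref(line: str) -> bool:
    """True if a single source line looks like a reference to the size norm."""
    return SIZE_NORM_REF.search(line) is not None
```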
- [ ] Step 2: Run the full Python test suite for packages/data
Run: cd /Users/jneumann/Repos/PhonoLex && uv run python -m pytest packages/data/tests/ -v 2>&1 | tail -30
Expected: all tests pass.
- [ ] Step 3: Run the workers test suite
Run: cd /Users/jneumann/Repos/PhonoLex/packages/web/workers && npm test 2>&1 | tail -20
Expected: all tests pass.
- [ ] Step 4: Run the frontend build
Run: cd /Users/jneumann/Repos/PhonoLex/packages/web/frontend && npm run build 2>&1 | tail -10
Expected: build succeeds with no type errors.
- [ ] Step 5: Reseed local D1 + start dev servers + browser smoke
```bash
cd /Users/jneumann/Repos/PhonoLex/packages/web/workers
npx wrangler d1 execute phonolex --local --file scripts/d1-seed.sql
npx wrangler dev &
WORKER_PID=$!
cd ../frontend
npm run dev &
FRONTEND_PID=$!
```
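Because both servers start in the background, opening the browser immediately can race their startup. A small poller can wait until each URL responds first. The ports in the usage line are assumptions (wrangler dev commonly listens on 8787 and a Vite dev server on 5173; use whatever URLs the two processes print), and both helpers are hypothetical.

```python
import time
import urllib.error
import urllib.request
from collections.abc import Callable


def http_probe(url: str, timeout_s: float = 2.0) -> bool:
    """True if the URL answers with any HTTP response (even a 404)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s):
            return True
    except urllib.error.HTTPError:
        return True  # the server is up; it just returned an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, DNS failure, timeout, ...


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds (True) or `timeout_s` elapses (False)."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, `wait_until_ready(lambda: http_probe("http://localhost:8787/"))` returns True once the worker answers any request, or False after 60 seconds. The injectable `clock` and `sleep` exist only to make the loop testable without real waiting.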
In the browser at the dev frontend URL: open the Governed Generation tool, add an AoA bound (e.g. ≤ 5), submit a sentence request, confirm sentences come back. Verify the AoA slider's tooltip says "1 = 0-2y, 7 = 13+y, PhonoLex-derived" (or your wording from Task 12). Confirm there is NO Imageability slider in the BOUNDS UI.
Then, from the same shell that started the servers (the PID variables only exist there): kill $WORKER_PID $FRONTEND_PID.
- [ ] Step 6: Final commit (if any drift from the smoke session)
If the browser smoke surfaced any small fixes (typo, layout nudge), commit them with a PHON-115: browser smoke fixes message. Otherwise, no commit needed.
- [ ] Step 7: Final verification grep
Run: cd /Users/jneumann/Repos/PhonoLex && git log --oneline | grep "PHON-115" | head -20
Expected: ~13 commits with the PHON-115: prefix, tracing the full work. Fewer than ~10 means some tasks were batched into shared commits; that is fine if intentional, but spot-check that each phase landed.
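The grep above matches PHON-115 anywhere in a line, so a revert (`Revert "PHON-115: ..."`) or a commit that merely mentions the ticket also counts. A stricter sketch counts only subjects that begin with the `PHON-115:` prefix. It takes `git log --oneline --no-decorate` output as a string (how you capture it is up to you); `phon115_subjects` is a hypothetical helper.

```python
PREFIX = "PHON-115:"


def phon115_subjects(oneline_log: str) -> list[str]:
    """Commit subjects from `git log --oneline --no-decorate` that start with PREFIX."""
    subjects: list[str] = []
    for line in oneline_log.splitlines():
        # Each line is "<short-hash> <subject>"; the subject follows the first space.
        _, _, subject = line.partition(" ")
        if subject.startswith(PREFIX):
            subjects.append(subject)
    return subjects
```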
Done¶
All spec gates have been satisfied:
- ✓ Build script + validation report committed (Task 4)
- ✓ phonolex_aoa.tsv ≥47K rows (Task 3)
- ✓ Loader rewired (Tasks 5, 6)
- ✓ Pipeline source map, schema, percentile target list (Task 7)
- ✓ Property config Python + TS (Task 8)
- ✓ Parquet + D1 regenerated, columns absent (Task 10)
- ✓ Frontend slider + types (Tasks 11, 12)
- ✓ Source data moved to _oracles/ (Task 9)
- ✓ Audit checklist 🟡→🟢 (Task 13)
- ✓ NOTICE updated (Task 13)
- ✓ Tests updated (Tasks 5, 6, 11)
- ✓ Browser smoke (Task 14)
PR-ready. The work branches off develop (per feedback_branch_management.md), so the PR targets develop, not main.