PHON-104 — CSP Enumeration Vectorization Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Vectorize CSP enumeration with Polars cross-joins and column expressions, eliminating the Python-loop overhead of solve_shape's Cartesian enumeration without dropping any candidates. Migrate solve() to delegate to solve_shape so that every caller gets the speedup.
Architecture: Two-part work. Part A: extract current enumerate_assignments logic into _enumerate_python_fallback, then add _enumerate_vectorized using Polars cross-joins, score-as-column expressions, and unique()-based content-pair dedup. Routing decision in solve_shape picks vectorized vs fallback based on whether ContrastiveConstraint scorers are registered. Part B: rewrite paradigm_3_csp.solve() as a thin wrapper that constructs a SkeletonShape and delegates to solve_shape, repackaging the result into the legacy (top, stats) tuple.
Tech Stack: Python 3.12, Polars 1.0+, pytest.
Spec: docs/superpowers/specs/2026-05-08-phon-104-csp-vectorize-enumeration-design.md
File map
| File | Action |
|---|---|
| packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py | Modify: extract Python fallback, add vectorized path, add routing |
| packages/generation/research/2026-05-07-sentence-generation-paradigms/paradigm_3_csp.py | Modify: solve() delegates to solve_shape; add stats helpers |
| packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py | Create: equivalence + routing + stats-parity tests |
| packages/generation/research/2026-05-07-sentence-generation-paradigms/bench_enumeration.py | Create: vectorized vs forced-python timing on largest probe |
All paths in this plan are relative to repo root /Users/jneumann/Repos/PhonoLex/. The spike directory is referenced as <spike>/ for brevity:
<spike>/ = packages/generation/research/2026-05-07-sentence-generation-paradigms/.
Test command throughout:
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Task 1: Extract _enumerate_python_fallback
Files:
- Modify: <spike>/skeleton_csp.py
Pull the existing enumerate_assignments generator and the surrounding scoring loop from solve_shape into a separate function. No behavior change — pure refactor. After this task, the cache tests (PHON-103) continue to pass and solve_shape works identically.
- [ ] Step 1.1: Read the current solve_shape body
sed -n '589,740p' packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py
The current solve_shape defines enumerate_assignments as a nested generator, then runs a scoring loop over its yields. Extract the scoring portion (everything from best_by_content: dict[...] = {} through the loop that builds best_by_content[key]) into a new module-level helper _enumerate_python_fallback.
- [ ] Step 1.2: Add _enumerate_python_fallback
Add a new function near solve_shape (just above it):
def _enumerate_python_fallback(
    shape: SkeletonShape,
    slot_fillers: list[tuple[str, list[str], dict[str, float]]],
    word_axes: dict[str, dict[str, float]],
    cross_axes: dict,
    word_df: pl.DataFrame | None,
    weights: dict[str, float] | None,
    locked_slots: dict[str, str],
) -> dict[tuple[str, ...], tuple[float, dict[str, str], dict[str, float]]]:
    """Python-loop enumeration. Returns best_by_content dict keyed on
    content-slot tuple, valued (total_score, fillers, components)."""
    best_by_content: dict[tuple[str, ...], tuple[float, dict[str, str], dict[str, float]]] = {}

    def enumerate_assignments(
        idx: int,
        partial: dict[str, str],
        running_components: dict[str, float],
    ) -> Iterable[tuple[dict[str, str], dict[str, float]]]:
        if idx == len(slot_fillers):
            yield dict(partial), dict(running_components)
            return
        slot, fillers, scores = slot_fillers[idx]
        if slot in partial:
            locked_word = partial[slot]
            locked_score = scores.get(locked_word, 0.0)
            comp_key = f"pmi_{slot}"
            if locked_score > 0:
                running_components[comp_key] = running_components.get(comp_key, 0.0) + locked_score
            yield from enumerate_assignments(idx + 1, partial, running_components)
            if locked_score > 0:
                running_components[comp_key] -= locked_score
                if abs(running_components.get(comp_key, 0.0)) < 1e-12:
                    running_components.pop(comp_key, None)
            return
        for f in fillers:
            partial[slot] = f
            comp_key = f"pmi_{slot}"
            score = scores.get(f, 0.0)
            running_components[comp_key] = score if comp_key not in running_components else running_components[comp_key] + score
            yield from enumerate_assignments(idx + 1, partial, running_components)
            del partial[slot]
            if comp_key in running_components:
                if score == 0.0:
                    del running_components[comp_key]
                else:
                    running_components[comp_key] -= score
                    if abs(running_components[comp_key]) < 1e-12:
                        del running_components[comp_key]

    initial: dict[str, str] = dict(locked_slots)
    for fillers_dict, components in enumerate_assignments(0, initial, {}):
        if "nsubj" in fillers_dict and "dobj" in fillers_dict and fillers_dict["nsubj"] == fillers_dict["dobj"]:
            continue
        for axis_name, axis_lookup in word_axes.items():
            total_axis = 0.0
            for slot in shape.content_slots:
                total_axis += axis_lookup.get(fillers_dict[slot], 0.0)
            if total_axis != 0.0:
                components[axis_name] = float(total_axis)
        if cross_axes and word_df is not None:
            slot_assignment = {s: fillers_dict[s] for s in shape.content_slots}
            for axis_name, scorer in cross_axes.items():
                components[axis_name] = float(scorer(slot_assignment, word_df))
        if "advmod" in fillers_dict:
            components["adv_sentinel"] = 0.001
        total = _weighted_total(components, weights)
        key = _content_pair_key(shape, fillers_dict)
        cur = best_by_content.get(key)
        if cur is None or total > cur[0]:
            best_by_content[key] = (total, dict(fillers_dict), dict(components))
    return best_by_content
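For orientation, the recursive generator above can be read as a flat Cartesian product with a best-per-key dictionary. The sketch below is a simplified model of that idea using itertools.product on invented data; it ignores locked slots, word axes, cross-slot scorers, and weights, and the content_slots tuple is an assumed stand-in for shape.content_slots.

```python
from itertools import product

# Hypothetical toy inputs, same (slot, fillers, scores) shape as slot_fillers.
slot_fillers = [
    ("nsubj", ["cat", "kid"], {"cat": 1.0, "kid": 0.5}),
    ("V", ["cut"], {}),
    ("dobj", ["cat", "cake"], {"cat": 0.2, "cake": 2.0}),
]
content_slots = ("nsubj", "dobj")  # assumption: stand-in for shape.content_slots

slots = [slot for slot, _, _ in slot_fillers]
best_by_content = {}
for combo in product(*(fillers for _, fillers, _ in slot_fillers)):
    assignment = dict(zip(slots, combo))
    if assignment["nsubj"] == assignment["dobj"]:
        continue  # same nsubj != dobj invariant as the fallback
    total = sum(scores.get(assignment[s], 0.0) for s, _, scores in slot_fillers)
    key = tuple(assignment[s] for s in content_slots)
    # Strict ">" means the first-enumerated assignment wins ties.
    if key not in best_by_content or total > best_by_content[key][0]:
        best_by_content[key] = (total, assignment)
```

Seen this way, the vectorized path only has to reproduce three things: the product, the invariant filter, and the keep-best-per-key reduction.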
- [ ] Step 1.3: Replace the inline body inside solve_shape with a call to _enumerate_python_fallback
Find this block in solve_shape (the body after slot_fillers is built; between slot_fillers.append(...) and deduped = sorted(best_by_content.values()...)):
    # Cartesian over slot fillers, dedup by content-slot key.
    best_by_content: dict[tuple[str, ...], tuple[float, dict[str, str], dict[str, float]]] = {}

    def enumerate_assignments(
        idx: int,
        partial: dict[str, str],
        running_components: dict[str, float],
    ) -> Iterable[tuple[dict[str, str], dict[str, float]]]:
        ...

    initial: dict[str, str] = {"V": verb}
    if locked_slots:
        initial.update(locked_slots)
    for fillers_dict, components in enumerate_assignments(0, initial, {}):
        ...
        best_by_content[key] = (total, dict(fillers_dict), dict(components))
Replace the entire block (from best_by_content: dict[...] = {} through the end of the for-loop building best_by_content[key]) with:
    initial_locks: dict[str, str] = {"V": verb}
    if locked_slots:
        initial_locks.update(locked_slots)
    best_by_content = _enumerate_python_fallback(
        shape=shape,
        slot_fillers=slot_fillers,
        word_axes=word_axes,
        cross_axes=cross_axes,
        word_df=word_df,
        weights=weights,
        locked_slots=initial_locks,
    )
    deduped = sorted(best_by_content.values(), key=lambda t: t[0], reverse=True)
- [ ] Step 1.4: Run cache tests + manual smoke
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v
Expected: 25 passed (no regression — pure refactor).
cd packages/generation/research/2026-05-07-sentence-generation-paradigms && \
uv run python -c "
import paradigm_3_csp
from phonolex_data.runtime.store import WordStore
from pathlib import Path
import polars as pl
repo = Path('../../../..').resolve()
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
spec_words = paradigm_3_csp.spec_lexicon(store, 'spec1')
top, stats = paradigm_3_csp.solve('cut', 'spec1', spec_words, sel_df, word_df=store.df)
print(f'top-1: {top[0][\"sentence\"]} total_score={top[0][\"total_score\"]:.3f}')
"
Expected: a sentence printed.
- [ ] Step 1.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py
git commit -m "$(cat <<'EOF'
PHON-104: extract _enumerate_python_fallback (pure refactor)
Pull the existing enumerate_assignments generator + scoring loop out
of solve_shape into a module-level helper. No behavior change — sets
up the routing point for the upcoming vectorized path.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 2: Add _should_use_vectorized routing + _FORCE_PYTHON_PATH flag
Files:
- Modify: <spike>/skeleton_csp.py
- Create: <spike>/test_vectorized_enumeration.py
Adds the routing decision only. Nothing calls _should_use_vectorized yet, so solve_shape keeps using the Python fallback until Task 8 wires the vectorized path in.
- [ ] Step 2.1: Write the failing routing tests
Create <spike>/test_vectorized_enumeration.py:
"""Tests for vectorized enumeration — PHON-104."""
from __future__ import annotations

import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).parent))

from constraint_surface import (
    ContrastiveConstraint,
    IncludeConstraint,
    cross_slot_axes,
)
import skeleton_csp


def test_no_contrastive_takes_vectorized():
    cross = cross_slot_axes([IncludeConstraint(phonemes=("k",))])
    assert skeleton_csp._should_use_vectorized(cross_axes=cross) is True

def test_contrastive_takes_python_fallback():
    cross = cross_slot_axes([
        ContrastiveConstraint(pair_type="minpair", phoneme1="k", phoneme2="g")
    ])
    assert skeleton_csp._should_use_vectorized(cross_axes=cross) is False

def test_force_python_path_overrides_routing():
    """The _FORCE_PYTHON_PATH flag forces fallback regardless of constraints."""
    cross = cross_slot_axes([IncludeConstraint(phonemes=("k",))])
    with skeleton_csp._force_python_path():
        assert skeleton_csp._should_use_vectorized(cross_axes=cross) is False
    # Outside the context, normal routing
    assert skeleton_csp._should_use_vectorized(cross_axes=cross) is True
- [ ] Step 2.2: Run tests, verify they fail
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 3 fail with AttributeError: module 'skeleton_csp' has no attribute '_should_use_vectorized'.
- [ ] Step 2.3: Add the routing helper + flag + context manager
Append to <spike>/skeleton_csp.py:
import contextlib

_FORCE_PYTHON_PATH = False

@contextlib.contextmanager
def _force_python_path():
    """Test-only context manager that forces the python fallback regardless
    of constraint shape. Used to compare vectorized vs python output on
    inputs where vectorized would normally be selected."""
    global _FORCE_PYTHON_PATH
    prev = _FORCE_PYTHON_PATH
    _FORCE_PYTHON_PATH = True
    try:
        yield
    finally:
        _FORCE_PYTHON_PATH = prev

def _should_use_vectorized(*, cross_axes: dict) -> bool:
    """Route between vectorized and python fallback paths.

    Vectorized path runs when no cross-slot scorers are registered (i.e.,
    no ContrastiveConstraint in the request). When _FORCE_PYTHON_PATH is
    set (test only), always returns False.
    """
    if _FORCE_PYTHON_PATH:
        return False
    return not cross_axes
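The try/finally in the context manager matters because equivalence tests may fail inside the with block. The following standalone sketch, using a hypothetical flag named _FLAG in place of _FORCE_PYTHON_PATH, shows that the previous value is restored even when the body raises.

```python
import contextlib

_FLAG = False  # hypothetical stand-in for _FORCE_PYTHON_PATH


@contextlib.contextmanager
def force_flag():
    """Set the flag for the duration of the with block, then restore it."""
    global _FLAG
    prev = _FLAG
    _FLAG = True
    try:
        yield
    finally:
        _FLAG = prev  # runs even if the body raises


seen_inside = None
try:
    with force_flag():
        seen_inside = _FLAG
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass
seen_after = _FLAG
```

Because the flag is a module-level global, this kind of hook is only safe when tests that use it do not run concurrently in the same process.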
- [ ] Step 2.4: Run tests, verify all pass
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 3 passed.
- [ ] Step 2.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: add _should_use_vectorized routing + _force_python_path test hook
Routing decision: vectorized path runs when no cross-slot scorers
(no ContrastiveConstraint). _force_python_path() context manager
forces the fallback for equivalence testing.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 3: Add _build_slot_filler_tables
Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/test_vectorized_enumeration.py
Builds per-slot Polars frames from the slot_fillers tuples.
- [ ] Step 3.1: Write failing tests
Append to <spike>/test_vectorized_enumeration.py:
import polars as pl

def test_build_slot_filler_tables_basic():
    slot_fillers = [
        ("V", ["cut"], {}),
        ("nsubj", ["cat", "kid"], {"cat": 1.5, "kid": 0.8}),
        ("dobj", ["cake", "rope"], {"cake": 2.0, "rope": 1.2}),
    ]
    tables = skeleton_csp._build_slot_filler_tables(slot_fillers, locked_slots={"V": "cut"})
    assert set(tables.keys()) == {"V", "nsubj", "dobj"}
    # V is locked → 1 row
    assert tables["V"].height == 1
    assert tables["V"]["V"].to_list() == ["cut"]
    assert tables["V"]["pmi_V"].to_list() == [0.0]
    # nsubj has 2 fillers
    assert tables["nsubj"].height == 2
    assert sorted(tables["nsubj"]["nsubj"].to_list()) == ["cat", "kid"]
    # PMI scores aligned with filler order
    nsubj_rows = dict(zip(tables["nsubj"]["nsubj"].to_list(), tables["nsubj"]["pmi_nsubj"].to_list()))
    assert nsubj_rows == {"cat": 1.5, "kid": 0.8}

def test_build_slot_filler_tables_locked_filler_not_in_scores():
    """A locked filler whose word isn't in scores → 0.0 PMI column."""
    slot_fillers = [
        ("nsubj", ["a", "b", "c"], {"a": 1.0}),
    ]
    tables = skeleton_csp._build_slot_filler_tables(slot_fillers, locked_slots={"nsubj": "z"})
    assert tables["nsubj"].height == 1
    assert tables["nsubj"]["nsubj"].to_list() == ["z"]
    assert tables["nsubj"]["pmi_nsubj"].to_list() == [0.0]
- [ ] Step 3.2: Run tests, verify they fail
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 2 fail with AttributeError: module 'skeleton_csp' has no attribute '_build_slot_filler_tables'.
- [ ] Step 3.3: Add _build_slot_filler_tables
Append to <spike>/skeleton_csp.py:
def _build_slot_filler_tables(
    slot_fillers: list[tuple[str, list[str], dict[str, float]]],
    locked_slots: dict[str, str],
) -> dict[str, pl.DataFrame]:
    """Build per-slot polars frames with `<slot>` (filler) + `pmi_<slot>` columns.

    Locked slots produce a 1-row frame with the locked filler. Non-locked
    slots produce a |fillers|-row frame.
    """
    tables: dict[str, pl.DataFrame] = {}
    for slot, fillers, scores in slot_fillers:
        if slot in locked_slots:
            w = locked_slots[slot]
            tables[slot] = pl.DataFrame({
                slot: [w],
                f"pmi_{slot}": [scores.get(w, 0.0)],
            })
        else:
            tables[slot] = pl.DataFrame({
                slot: fillers,
                f"pmi_{slot}": [scores.get(f, 0.0) for f in fillers],
            })
    return tables
- [ ] Step 3.4: Run tests, verify all pass
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 5 passed.
- [ ] Step 3.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: add _build_slot_filler_tables helper
Converts slot_fillers tuples into per-slot polars frames keyed on
slot name. Locked slots produce 1-row frames; missing PMI scores
default to 0.0.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 4: Add _enumerate_vectorized skeleton (cross-join + nsubj!=dobj filter)
Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/test_vectorized_enumeration.py
Cross-joins per-slot frames into a Cartesian, applies the nsubj-dobj distinct invariant. No scoring yet.
- [ ] Step 4.1: Write failing test
Append to <spike>/test_vectorized_enumeration.py:
def test_enumerate_vectorized_cardinality():
    """Cartesian cardinality: 2 nsubj × 2 dobj × 1 V = 4 rows; minus
    the nsubj==dobj diagonal (none here, words distinct) = 4."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj",
        slots=("nsubj", "V", "dobj"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat", "kid"], {"cat": 1.0, "kid": 0.5}),
        ("V", ["cut"], {}),
        ("dobj", ["cake", "rope"], {"cake": 2.0, "rope": 1.0}),
    ]
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape,
        slot_fillers=slot_fillers,
        word_axes={},
        weights=None,
        locked_slots={"V": "cut"},
    )
    assert cart.height == 4
    assert set(cart.columns) >= {"nsubj", "V", "dobj", "pmi_nsubj", "pmi_dobj", "pmi_V"}

def test_enumerate_vectorized_nsubj_dobj_distinct():
    """nsubj != dobj invariant filters out the diagonal."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj",
        slots=("nsubj", "V", "dobj"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat", "kid"], {"cat": 1.0, "kid": 0.5}),
        ("V", ["cut"], {}),
        ("dobj", ["cat", "kid"], {"cat": 2.0, "kid": 1.0}),
    ]
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes={},
        weights=None, locked_slots={"V": "cut"},
    )
    # 2×2 = 4, minus 2 (cat,cat / kid,kid) = 2
    assert cart.height == 2
    for n, d in zip(cart["nsubj"].to_list(), cart["dobj"].to_list()):
        assert n != d
- [ ] Step 4.2: Run tests, verify they fail
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 2 fail with AttributeError: module 'skeleton_csp' has no attribute '_enumerate_vectorized'.
- [ ] Step 4.3: Add _enumerate_vectorized (cardinality + filter only)
Append to <spike>/skeleton_csp.py:
def _enumerate_vectorized(
    shape: SkeletonShape,
    slot_fillers: list[tuple[str, list[str], dict[str, float]]],
    word_axes: dict[str, dict[str, float]],
    weights: dict[str, float] | None,
    locked_slots: dict[str, str],
) -> pl.DataFrame:
    """Vectorized Cartesian via Polars cross-joins. Returns a DataFrame
    with one row per assignment, columns for each slot's filler and PMI
    score, plus per-axis score columns (added in subsequent tasks)."""
    tables = _build_slot_filler_tables(slot_fillers, locked_slots)
    # Cartesian via successive cross joins
    cart = tables[shape.slots[0]]
    for s in shape.slots[1:]:
        cart = cart.join(tables[s], how="cross")
    # nsubj != dobj invariant (only when both present)
    if "nsubj" in shape.slots and "dobj" in shape.slots:
        cart = cart.filter(pl.col("nsubj") != pl.col("dobj"))
    return cart
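Because the eager cross join materializes the full product before the filter runs, it can be useful to estimate row counts up front. The helper below is a hypothetical sketch, not part of the plan's code; it assumes fillers within a slot are unique and reproduces the cardinalities asserted in the two tests above.

```python
from math import prod


def expected_rows(filler_lists: dict[str, list[str]]) -> int:
    """Rows left after the cross join and the nsubj != dobj filter.

    Assumes each slot's filler list contains no duplicates.
    """
    total = prod(len(fillers) for fillers in filler_lists.values())
    if "nsubj" in filler_lists and "dobj" in filler_lists:
        # Each word shared by nsubj and dobj removes one diagonal cell,
        # multiplied by every combination of the remaining slots.
        shared = set(filler_lists["nsubj"]) & set(filler_lists["dobj"])
        others = prod(
            len(fillers)
            for slot, fillers in filler_lists.items()
            if slot not in ("nsubj", "dobj")
        )
        total -= len(shared) * others
    return total


distinct = expected_rows(
    {"nsubj": ["cat", "kid"], "V": ["cut"], "dobj": ["cake", "rope"]}
)
overlapping = expected_rows(
    {"nsubj": ["cat", "kid"], "V": ["cut"], "dobj": ["cat", "kid"]}
)
```

The peak frame size is the unfiltered product, so that number, not the filtered count, is the one to watch for memory.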
- [ ] Step 4.4: Run tests, verify all pass
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 7 passed.
- [ ] Step 4.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: add _enumerate_vectorized skeleton (cross-join + invariant)
Cross-joins per-slot frames into a Polars Cartesian. Applies the
nsubj != dobj invariant when both slots are present. No scoring yet
— scoring columns added in subsequent tasks.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 5: Add per-word axis scoring
Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/test_vectorized_enumeration.py
For each soft-axis lookup (e.g., include_/k/), add a column summing per-content-slot lookups.
- [ ] Step 5.1: Write failing test
Append to <spike>/test_vectorized_enumeration.py:
def test_enumerate_vectorized_per_word_axes():
    """Per-word axis sums across content slots."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj",
        slots=("nsubj", "V", "dobj"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat", "kid"], {"cat": 1.0, "kid": 0.5}),
        ("V", ["cut"], {}),
        ("dobj", ["cake", "rope"], {"cake": 2.0, "rope": 1.0}),
    ]
    word_axes = {
        "include_/k/": {"cat": 1.0, "kid": 1.0, "cake": 1.0},  # contains /k/
        # rope contains no /k/ → 0
    }
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes=word_axes,
        weights=None, locked_slots={"V": "cut"},
    )
    assert "include_/k/" in cart.columns
    # cat + cake = 2.0; cat + rope = 1.0; kid + cake = 2.0; kid + rope = 1.0
    rows = sorted(zip(
        cart["nsubj"].to_list(),
        cart["dobj"].to_list(),
        cart["include_/k/"].to_list(),
    ))
    assert rows == [
        ("cat", "cake", 2.0),
        ("cat", "rope", 1.0),
        ("kid", "cake", 2.0),
        ("kid", "rope", 1.0),
    ]
- [ ] Step 5.2: Run test, verify it fails
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py::test_enumerate_vectorized_per_word_axes -v
Expected: AssertionError — "include_/k/" not in columns.
- [ ] Step 5.3: Add per-word axis scoring to _enumerate_vectorized
In _enumerate_vectorized, after the nsubj != dobj filter and before the return cart line, add:
# Per-word soft axes — sum contributions across content slots
for axis_name, lookup in word_axes.items():
contributions = [
pl.col(content_slot).replace_strict(lookup, default=0.0).cast(pl.Float64)
for content_slot in shape.content_slots
]
cart = cart.with_columns(pl.sum_horizontal(contributions).alias(axis_name))
- [ ] Step 5.4: Run test, verify it passes
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 8 passed.
- [ ] Step 5.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: add per-word axis scoring to vectorized enumeration
For each soft-axis lookup, sum per-content-slot contributions via
replace_strict + sum_horizontal. Matches python path's sum-across-
content-slots semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 6: Add adverb sentinel + total_score
Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/test_vectorized_enumeration.py
Adds the adv_sentinel constant (when shape has advmod) and the weighted total_score column.
- [ ] Step 6.1: Write failing tests
Append to <spike>/test_vectorized_enumeration.py:
def test_enumerate_vectorized_adv_sentinel_when_advmod_present():
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj,advmod",
        slots=("nsubj", "V", "dobj", "advmod"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat"], {"cat": 1.0}),
        ("V", ["cut"], {}),
        ("dobj", ["cake"], {"cake": 2.0}),
        ("advmod", ["quickly"], {}),
    ]
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes={},
        weights=None, locked_slots={"V": "cut"},
    )
    assert "adv_sentinel" in cart.columns
    assert cart["adv_sentinel"].to_list() == [0.001]

def test_enumerate_vectorized_total_score_unweighted():
    """Default weights=None → all weights 1.0 → total_score is sum of components."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj",
        slots=("nsubj", "V", "dobj"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat"], {"cat": 1.5}),
        ("V", ["cut"], {}),
        ("dobj", ["cake"], {"cake": 2.5}),
    ]
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes={},
        weights=None, locked_slots={"V": "cut"},
    )
    assert "total_score" in cart.columns
    # pmi_nsubj 1.5 + pmi_V 0.0 + pmi_dobj 2.5 = 4.0
    assert cart["total_score"].to_list() == [4.0]

def test_enumerate_vectorized_total_score_weighted():
    """Custom weights apply per-axis."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj",
        slots=("nsubj", "V", "dobj"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat"], {"cat": 1.0}),
        ("V", ["cut"], {}),
        ("dobj", ["cake"], {"cake": 2.0}),
    ]
    weights = {"pmi_nsubj": 2.0, "pmi_dobj": 0.5}
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes={},
        weights=weights, locked_slots={"V": "cut"},
    )
    # 2.0 * 1.0 + 1.0 * 0.0 + 0.5 * 2.0 = 3.0
    assert cart["total_score"].to_list() == [3.0]
- [ ] Step 6.2: Run tests, verify they fail
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 3 fail (no adv_sentinel, no total_score columns).
- [ ] Step 6.3: Add adv_sentinel + total_score to _enumerate_vectorized
In _enumerate_vectorized, after the per-word axis loop (before return cart), add:
# Adverb sentinel (constant; only when shape has advmod)
if "advmod" in shape.slots:
cart = cart.with_columns(pl.lit(0.001).alias("adv_sentinel"))
# Total score = weighted sum of all score columns
score_cols = [
c for c in cart.columns
if c.startswith("pmi_") or c in word_axes or c == "adv_sentinel"
]
weighted = [
pl.col(c) * (weights.get(c, 1.0) if weights else 1.0)
for c in score_cols
]
cart = cart.with_columns(pl.sum_horizontal(weighted).alias("total_score"))
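The column expression above computes, per row, a weighted sum in which any component without an explicit weight counts with weight 1.0. The scalar sketch below spells out that arithmetic for a single assignment. The function weighted_total is a hypothetical stand-in written for this example; the body of the real _weighted_total helper is not shown in this plan and may differ.

```python
from typing import Optional


def weighted_total(
    components: dict[str, float],
    weights: Optional[dict[str, float]],
) -> float:
    """Sum of component * weight, with missing weights defaulting to 1.0."""
    weights = weights or {}
    return sum(value * weights.get(name, 1.0) for name, value in components.items())


# Same numbers as the weighted and unweighted tests in Step 6.1.
weighted = weighted_total(
    {"pmi_nsubj": 1.0, "pmi_V": 0.0, "pmi_dobj": 2.0},
    {"pmi_nsubj": 2.0, "pmi_dobj": 0.5},
)
unweighted = weighted_total({"pmi_nsubj": 1.5, "pmi_V": 0.0, "pmi_dobj": 2.5}, None)
```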
- [ ] Step 6.4: Run tests, verify all pass
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 11 passed.
- [ ] Step 6.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: add adv_sentinel + total_score to vectorized enumeration
adv_sentinel adds a 0.001 constant when shape has advmod (matches the
python path tiebreaker for advmod-PMI-absent verbs). total_score is a
weighted sum across all score columns.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 7: Add _dedup_and_assemble (vectorized → list[tuple])
Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/test_vectorized_enumeration.py
Sorts by total_score, deduplicates by content-slot tuple, truncates, and assembles the legacy (total, fillers, components) shape that solve_shape's ccomp-resolution + sentence-realization step expects.
- [ ] Step 7.1: Write failing test
Append to <spike>/test_vectorized_enumeration.py:
def test_dedup_and_assemble_drops_zero_components():
    """The python path's running-components logic drops 0-score pmi keys.
    Vectorized assembly must match this — no pmi_<slot> entry in the
    components dict for fillers with score 0."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj",
        slots=("nsubj", "V", "dobj"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat"], {"cat": 1.0}),
        ("V", ["cut"], {}),  # locked, score not in dict
        ("dobj", ["cake"], {"cake": 2.0}),
    ]
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes={},
        weights=None, locked_slots={"V": "cut"},
    )
    assembled = skeleton_csp._dedup_and_assemble(cart, shape, top_k=1, over_fetch=1)
    assert len(assembled) == 1
    total, fillers, components = assembled[0]
    assert fillers == {"nsubj": "cat", "V": "cut", "dobj": "cake"}
    assert "pmi_V" not in components, "0-score pmi_V should be dropped from components"
    assert components["pmi_nsubj"] == 1.0
    assert components["pmi_dobj"] == 2.0
    assert total == 3.0

def test_dedup_and_assemble_dedup_by_content_keys():
    """Two rows with same content-slot fillers but different advmod
    collapse to one (highest total_score)."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj,advmod",
        slots=("nsubj", "V", "dobj", "advmod"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat"], {"cat": 1.0}),
        ("V", ["cut"], {}),
        ("dobj", ["cake"], {"cake": 2.0}),
        ("advmod", ["quickly", "slowly"], {}),
    ]
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes={},
        weights=None, locked_slots={"V": "cut"},
    )
    # Both rows have identical content-slot fillers → dedup to 1
    assembled = skeleton_csp._dedup_and_assemble(cart, shape, top_k=2, over_fetch=1)
    assert len(assembled) == 1
- [ ] Step 7.2: Run tests, verify they fail
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 2 fail with AttributeError: module 'skeleton_csp' has no attribute '_dedup_and_assemble'.
- [ ] Step 7.3: Add _dedup_and_assemble
Append to <spike>/skeleton_csp.py:
def _dedup_and_assemble(
    cart: pl.DataFrame,
    shape: SkeletonShape,
    *,
    top_k: int,
    over_fetch: int,
) -> list[tuple[float, dict[str, str], dict[str, float]]]:
    """Sort by total_score desc, dedup by content-slot tuple, truncate,
    and assemble the (total, fillers, components) tuple shape that
    solve_shape's ccomp-resolution loop expects.

    Drops 0-score pmi_<slot> entries from the components dict to match
    the python path's running-components-cleanup behavior.
    """
    if cart.height == 0:
        return []
    content_keys = list(shape.content_slots)
    deduped = (
        cart
        .sort("total_score", descending=True)
        .unique(subset=content_keys, keep="first", maintain_order=True)
        .head(top_k * over_fetch)
    )
    # Identify score columns to copy into components
    score_cols = [
        c for c in cart.columns
        if (c.startswith("pmi_") or c == "adv_sentinel" or c.startswith("include_")
            or c.startswith("bound_boost_") or c.startswith("contrastive_"))
        and c != "total_score"
    ]
    out: list[tuple[float, dict[str, str], dict[str, float]]] = []
    for row in deduped.iter_rows(named=True):
        fillers = {s: row[s] for s in shape.slots}
        components: dict[str, float] = {}
        for c in score_cols:
            v = float(row[c])
            # Match python path: 0-score pmi entries are dropped
            if c.startswith("pmi_") and v == 0.0:
                continue
            components[c] = v
        out.append((float(row["total_score"]), fillers, components))
    return out
- [ ] Step 7.4: Run tests, verify all pass
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 13 passed.
- [ ] Step 7.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: add _dedup_and_assemble (vectorized → list[tuple])
Sorts by total_score desc, deduplicates by content-slot tuple via
polars unique(maintain_order=True), and assembles the legacy
(total, fillers, components) tuple shape. Drops 0-score pmi_<slot>
entries from components to match the python path.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 8: Wire vectorized path into solve_shape
Files:
- Modify: <spike>/skeleton_csp.py
Adds the routing in solve_shape: vectorized when no contrastive, fallback otherwise.
- [ ] Step 8.1: Update solve_shape to route
In solve_shape, find the block from Task 1.3:
    initial_locks: dict[str, str] = {"V": verb}
    if locked_slots:
        initial_locks.update(locked_slots)
    best_by_content = _enumerate_python_fallback(
        shape=shape,
        slot_fillers=slot_fillers,
        word_axes=word_axes,
        cross_axes=cross_axes,
        word_df=word_df,
        weights=weights,
        locked_slots=initial_locks,
    )
    deduped = sorted(best_by_content.values(), key=lambda t: t[0], reverse=True)
Replace with:
    initial_locks: dict[str, str] = {"V": verb}
    if locked_slots:
        initial_locks.update(locked_slots)
    if _should_use_vectorized(cross_axes=cross_axes):
        cart = _enumerate_vectorized(
            shape=shape,
            slot_fillers=slot_fillers,
            word_axes=word_axes,
            weights=weights,
            locked_slots=initial_locks,
        )
        # _dedup_and_assemble already sorts + dedups + truncates.
        # The downstream ccomp-resolution loop expects an iterable of
        # (total, fillers, components) tuples, like deduped. We over-
        # fetch here only when ccomp resolution may filter; otherwise
        # over_fetch=1 is fine because we already truncated to top_k.
        over_fetch = 4 if "ccomp" in shape.slots else 1
        deduped = _dedup_and_assemble(cart, shape, top_k=top_k, over_fetch=over_fetch)
    else:
        best_by_content = _enumerate_python_fallback(
            shape=shape,
            slot_fillers=slot_fillers,
            word_axes=word_axes,
            cross_axes=cross_axes,
            word_df=word_df,
            weights=weights,
            locked_slots=initial_locks,
        )
        deduped = sorted(best_by_content.values(), key=lambda t: t[0], reverse=True)
The existing ccomp-resolution loop further down already slices deduped[: top_k * over_fetch]. On the vectorized path, _dedup_and_assemble has already truncated to at most top_k * over_fetch rows, so that later slice is a no-op: slicing a list to a length at or above its own size returns it unchanged. The Python path is untouched, so downstream behavior is the same on both paths, provided the over_fetch value chosen here is not smaller than the one the downstream slice uses.
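The no-op claim follows directly from Python's slice semantics, as this small sketch with invented numbers shows. The variable names are illustrative and do not correspond to code in solve_shape.

```python
top_k, over_fetch = 3, 4
limit = top_k * over_fetch

candidates = list(range(20))
assembled = candidates[:limit]   # truncation inside the assemble step
downstream = assembled[:limit]   # the later slice in the ccomp loop

# A list shorter than the limit also passes through unchanged.
short = [1, 2]
short_downstream = short[:limit]
```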
- [ ] Step 8.2: Smoke-test
cd packages/generation/research/2026-05-07-sentence-generation-paradigms && \
uv run python -c "
import paradigm_3_csp
from phonolex_data.runtime.store import WordStore
from pathlib import Path
import polars as pl
repo = Path('../../../..').resolve()
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
spec_words = paradigm_3_csp.spec_lexicon(store, 'spec1')
top, stats = paradigm_3_csp.solve('cut', 'spec1', spec_words, sel_df, word_df=store.df)
print(f'top-1: {top[0][\"sentence\"]} total={top[0][\"total_score\"]:.3f}')
print(f'components: {top[0][\"score_components\"]}')
"
Expected: a sentence printed, components dict has pmi_nsubj, pmi_dobj, pmi_advmod (or adv_sentinel).
- [ ] Step 8.3: Run cache tests + new tests
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 25 + 13 = 38 passed.
- [ ] Step 8.4: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py
git commit -m "$(cat <<'EOF'
PHON-104: wire vectorized path into solve_shape

solve_shape now routes between _enumerate_vectorized (no contrastive)
and _enumerate_python_fallback (contrastive present, or test override).
Cache tests + new vectorized unit tests both pass; smoke-test produces
a sensible top-1 sentence end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 9: Equivalence tests over the PHON-95 acceptance probes¶
Files:
- Modify: <spike>/test_vectorized_enumeration.py
The strongest tests in this plan: parameterized over the canonical probe matrix, verifying identical sentences, fillers, and score components between the two paths, with total_score equal to within 1e-9.
- [ ] Step 9.1: Add session-scoped fixtures + the equivalence test
Add the session-scoped fixtures to <spike>/test_vectorized_enumeration.py at the top of the file, after the existing imports:
@pytest.fixture(scope="session")
def store():
    from phonolex_data.runtime.store import WordStore
    repo_root = Path(__file__).resolve().parents[4]
    return WordStore.from_parquet(repo_root / "data" / "runtime" / "words.parquet")


@pytest.fixture(scope="session")
def sel_df():
    repo_root = Path(__file__).resolve().parents[4]
    return pl.read_parquet(repo_root / "data" / "runtime" / "selectional.parquet")
Then append the equivalence test at the end:
@pytest.mark.parametrize("verb,spec_id,arg_structure", [
    ("cut", "spec1", "nsubj,V,dobj"),
    ("cut", "spec1", "nsubj,V,dobj,advmod"),
    ("chase", "spec1", "nsubj,V,dobj,advmod"),
    ("melt", "spec6", "nsubj,V,dobj,advmod"),
    ("eat", "spec1", "nsubj,V,dobj,advmod"),
    ("fill", "spec1", "nsubj,V,dobj,advmod"),
])
def test_vectorized_matches_python(store, sel_df, verb, spec_id, arg_structure):
    """Bit-identical top-K output between vectorized and python paths."""
    import paradigm_3_csp
    from skeleton_csp import (
        SkeletonShape,
        parse_arg_structure,
        solve_shape,
        _force_python_path,
    )

    spec_words = paradigm_3_csp.spec_lexicon(store, spec_id)
    shape = SkeletonShape(arg_structure, parse_arg_structure(arg_structure), 0)
    common = dict(
        verb=verb,
        domain_words=spec_words,
        sel_df=sel_df,
        band="fineweb_adult",
        word_axes={},
        cross_axes={},
        word_df=store.df,
        top_k=8,
    )
    vec_out = solve_shape(shape, **common)
    with _force_python_path():
        py_out = solve_shape(shape, **common)

    assert len(vec_out) == len(py_out), (
        f"length mismatch: vec={len(vec_out)} py={len(py_out)}"
    )
    for v, p in zip(vec_out, py_out):
        assert v["sentence"] == p["sentence"], f"sentence mismatch: {v['sentence']!r} vs {p['sentence']!r}"
        assert abs(v["total_score"] - p["total_score"]) < 1e-9, (
            f"total_score mismatch: {v['total_score']} vs {p['total_score']}"
        )
        assert v["score_components"] == p["score_components"], (
            f"components mismatch: {v['score_components']} vs {p['score_components']}"
        )
        assert v["fillers"] == p["fillers"]
- [ ] Step 9.2: Run tests, verify all pass
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 13 + 6 = 19 passed.
If any equivalence case fails, this is the moment to debug. Likely culprits:
- Float-ordering differences (use a tie-breaking sort key in both paths if needed)
- Component dict key handling (make sure 0-score drops are consistent)
- Adverb sentinel logic (only fires when no real PMI table for the verb)
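The first culprit can be illustrated with a small sketch. Python's sorted() is stable, so candidates with equal scores keep their input order, and the two paths may enumerate in different orders. Adding a secondary key makes the ranking independent of enumeration order. The function name rank_candidates is hypothetical; the dict keys mirror those the equivalence test reads, and breaking ties on the sentence string is an assumption rather than the plan's chosen key.

```python
def rank_candidates(candidates):
    """Rank by total_score descending, breaking ties by sentence ascending.

    Hypothetical helper: the secondary key removes any dependence on the
    order in which candidates were enumerated.
    """
    return sorted(candidates, key=lambda c: (-c["total_score"], c["sentence"]))


tied = [
    {"sentence": "the chef cut bread", "total_score": 1.5},
    {"sentence": "the baker cut bread", "total_score": 1.5},
    {"sentence": "the saw cut wood", "total_score": 2.0},
]

# Two different enumeration orders of the same candidates.
from_vectorized = rank_candidates(tied)
from_python = rank_candidates(list(reversed(tied)))
```

Both rankings come out the same regardless of input order, which is the property the equivalence test needs when scores tie.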
- [ ] Step 9.3: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: equivalence tests over PHON-95 acceptance probe matrix

Parameterized over 6 canonical (verb, spec, arg_structure) probes.
Asserts bit-identical sentences, total_score (within 1e-9), and
score_components dicts between vectorized and python paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 10: Add _peek_domain_sizes + _build_solve_stats helpers¶
Files:
- Modify: <spike>/paradigm_3_csp.py
Stats helpers for the upcoming solve() delegation. No functional change yet.
- [ ] Step 10.1: Add the helpers
Insert into <spike>/paradigm_3_csp.py, just before def solve():
def _peek_domain_sizes(
    verb: str,
    band: str,
    filtered_spec: frozenset[str],
    sel_df: pl.DataFrame,
    include_adverb: bool,
) -> dict[str, int]:
    """Pre-cartesian domain sizes per slot for stats parity with the
    legacy solve() output. Cheap: PMI dict lookup + set intersection."""
    nsubj_pmi = pmi_lookup(sel_df, verb, "nsubj", band)
    dobj_pmi = pmi_lookup(sel_df, verb, "dobj", band)
    sizes = {
        "nsubj": len(set(nsubj_pmi.keys()) & filtered_spec),
        "dobj": len(set(dobj_pmi.keys()) & filtered_spec),
    }
    if include_adverb:
        adv_pmi = _advmod_pmi_for_verb(verb, band)
        if adv_pmi:
            sizes["advmod"] = len(_filter_advmod_by_position(sorted(adv_pmi.keys()), "final"))
        else:
            fallback = _advmod_band_fallback(band)
            sizes["advmod"] = len(_filter_advmod_by_position(list(fallback), "final")) if fallback else 0
    else:
        sizes["advmod"] = 0
    return sizes


def _build_solve_stats(
    *,
    verb: str,
    spec_id: str,
    band: str,
    candidates: list[dict],
    trace: list[dict],
    word_axes: dict,
    cross_axes: dict,
    domain_sizes: dict[str, int],
) -> dict:
    """Synthesize the legacy stats dict shape from solve_shape's output."""
    return {
        "verb": verb,
        "spec_id": spec_id,
        "band": band,
        "nsubj_domain_size": domain_sizes.get("nsubj", 0),
        "dobj_domain_size": domain_sizes.get("dobj", 0),
        "adv_domain_size": domain_sizes.get("advmod", 0),
        "candidate_count": len(candidates),
        "unique_pairs": len({(c.get("nsubj"), c.get("dobj")) for c in candidates}),
        "domain_trace": trace,
        "active_axes": list(word_axes.keys()) + list(cross_axes.keys()),
    }
- [ ] Step 10.2: Smoke-test imports
cd packages/generation/research/2026-05-07-sentence-generation-paradigms && \
uv run python -c "import paradigm_3_csp; print('OK')"
Expected: OK.
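For reference, the "PMI dict lookup + set intersection" that _peek_domain_sizes relies on can be sketched in isolation. The function name domain_size and the toy PMI values are hypothetical; the sketch assumes, as the helper above does, that the PMI table is a plain dict keyed by word.

```python
def domain_size(pmi: dict[str, float], spec: frozenset[str]) -> int:
    """Count words that have a PMI entry and also belong to the spec lexicon.

    dict.keys() returns a set-like view, so it can be intersected with a
    frozenset directly without first copying it into a set.
    """
    return len(pmi.keys() & spec)


# Toy data: three words with PMI scores, two of which are in the spec.
toy_pmi = {"knife": 2.1, "chef": 1.3, "saw": 0.4}
toy_spec = frozenset({"chef", "saw", "bread"})

size = domain_size(toy_pmi, toy_spec)
empty = domain_size({}, toy_spec)
```

The explicit set(...) wrapper in the helper is equivalent; the view-based form only skips one temporary set per slot.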
- [ ] Step 10.3: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/paradigm_3_csp.py
git commit -m "$(cat <<'EOF'
PHON-104: add _peek_domain_sizes + _build_solve_stats helpers

Stats helpers for the upcoming solve() delegation. _peek_domain_sizes
computes pre-cartesian per-slot domain sizes via PMI dict ∩ filtered
spec; _build_solve_stats synthesizes the legacy stats dict shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 11: Migrate solve() to delegate to solve_shape¶
Files:
- Modify: <spike>/paradigm_3_csp.py
- Modify: <spike>/test_vectorized_enumeration.py
Replace solve()'s body with delegation. Public signature unchanged.
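One way to guard the "public signature unchanged" requirement is a signature snapshot check, sketched below with inspect. The solve defined here is a stand-in with the parameter list from this plan, not the real function, and its default values are placeholders (the real defaults come from the BAND and TOP_K constants). In practice the check would run against paradigm_3_csp.solve.

```python
import inspect


def solve(verb, spec_id, spec_words, sel_df, *, constraints=None, word_df=None,
          band="placeholder-band", top_k=8, include_adverb=True, weights=None):
    """Stand-in for paradigm_3_csp.solve; defaults are placeholders."""
    return [], {}


# Parameter names in declaration order, as the plan's signature lists them.
EXPECTED_PARAMS = [
    "verb", "spec_id", "spec_words", "sel_df",
    "constraints", "word_df", "band", "top_k", "include_adverb", "weights",
]

params = inspect.signature(solve).parameters
names = list(params)

# Everything after the bare * must stay keyword-only for existing callers.
keyword_only = [
    name for name, p in params.items()
    if p.kind is inspect.Parameter.KEYWORD_ONLY
]
```

Comparing names and keyword_only against the expected lists before and after the migration would catch an accidental rename or a parameter that became positional.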
- [ ] Step 11.1: Write failing test
Append to <spike>/test_vectorized_enumeration.py:
def test_solve_delegation_stats_match(store, sel_df):
    """Delegated solve() returns the legacy stats dict shape."""
    import paradigm_3_csp

    spec_words = paradigm_3_csp.spec_lexicon(store, "spec1")
    top, stats = paradigm_3_csp.solve(
        "cut", "spec1", spec_words, sel_df, word_df=store.df,
    )
    expected_keys = {
        "verb", "spec_id", "band",
        "nsubj_domain_size", "dobj_domain_size", "adv_domain_size",
        "candidate_count", "unique_pairs", "domain_trace", "active_axes",
    }
    assert set(stats.keys()) == expected_keys
    assert stats["verb"] == "cut"
    assert stats["spec_id"] == "spec1"
    assert stats["candidate_count"] == len(top)
    assert stats["unique_pairs"] >= 1
    assert stats["nsubj_domain_size"] > 0
    assert stats["dobj_domain_size"] > 0
    assert stats["adv_domain_size"] > 0


def test_solve_delegation_top_k_matches_solve_shape(store, sel_df):
    """solve() and solve_shape produce equivalent top-K candidates."""
    import paradigm_3_csp
    from skeleton_csp import SkeletonShape, parse_arg_structure, solve_shape

    spec_words = paradigm_3_csp.spec_lexicon(store, "spec1")
    top_solve, _ = paradigm_3_csp.solve(
        "cut", "spec1", spec_words, sel_df, word_df=store.df, top_k=5,
    )
    arg = "nsubj,V,dobj,advmod"
    shape = SkeletonShape(arg, parse_arg_structure(arg), 0)
    top_shape = solve_shape(
        shape, verb="cut", domain_words=spec_words, sel_df=sel_df,
        band="fineweb_adult", word_axes={}, cross_axes={},
        word_df=store.df, top_k=5,
    )
    assert [c["sentence"] for c in top_solve] == [c["sentence"] for c in top_shape]
- [ ] Step 11.2: Run tests, record the baseline
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py::test_solve_delegation_stats_match -v
Expected: PASS. The legacy solve() already returns every key in expected_keys, so this test passes against the existing implementation. The migration's job is to preserve that contract, so the test serves as a regression guard rather than a failing test.
test_solve_delegation_top_k_matches_solve_shape should currently fail, because the old solve() has slightly different scoring nuances than solve_shape's path. After migration, both go through solve_shape, so they'll match.
Run the full file now to capture the baseline:
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Note which of the two new tests pass and which fail:
- Both pass → you can skip Steps 11.3–11.4 and just commit the tests.
- One fails → proceed with the migration.
- [ ] Step 11.3: Migrate solve() body to delegate
Find the existing solve() in paradigm_3_csp.py (around line 113):
def solve(
    verb: str,
    spec_id: str,
    spec_words: frozenset[str],
    sel_df: pl.DataFrame,
    *,
    constraints: list[Constraint] | None = None,
    word_df: pl.DataFrame | None = None,
    band: str = BAND,
    top_k: int = TOP_K,
    include_adverb: bool = True,
    weights: dict[str, float] | None = None,
) -> tuple[list[dict], dict]:
    """Return (top_K candidates as dicts, stats dict).
    ...
    """
    constraints = list(constraints or [])
    # ... existing 70+ line body ...
Replace the entire body (everything after the docstring) with:
    constraints = list(constraints or [])
    arg = "nsubj,V,dobj,advmod" if include_adverb else "nsubj,V,dobj"
    shape = SkeletonShape(arg, parse_arg_structure(arg), band_freq=0)

    filtered_spec, trace = _resolve_domain_words(spec_words, constraints, word_df)
    word_axes = get_per_word_axes(constraints, word_df)
    cross_axes = cross_slot_axes(constraints)
    domain_sizes = _peek_domain_sizes(verb, band, filtered_spec, sel_df, include_adverb)

    candidates = solve_shape(
        shape,
        verb=verb,
        domain_words=filtered_spec,
        sel_df=sel_df,
        band=band,
        word_axes=word_axes,
        cross_axes=cross_axes,
        word_df=word_df,
        weights=weights,
        top_k=top_k,
    )
    stats = _build_solve_stats(
        verb=verb, spec_id=spec_id, band=band,
        candidates=candidates, trace=trace,
        word_axes=word_axes, cross_axes=cross_axes,
        domain_sizes=domain_sizes,
    )
    return candidates, stats
Add these imports at the top of paradigm_3_csp.py if they are not already present (some likely are):
from skeleton_csp import (
    SkeletonShape,
    parse_arg_structure,
    solve_shape,
    # ... other existing imports
)
- [ ] Step 11.4: Run all tests
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v
Expected: 25 + 21 = 46 passed.
- [ ] Step 11.5: Smoke-test paradigm_3_csp demos still work
cd packages/generation/research/2026-05-07-sentence-generation-paradigms && \
uv run python -c "
import paradigm_3_csp
from phonolex_data.runtime.store import WordStore
from pathlib import Path
import polars as pl
repo = Path('../../../..').resolve()
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
spec_words = paradigm_3_csp.spec_lexicon(store, 'spec1')
top, stats = paradigm_3_csp.solve('cut', 'spec1', spec_words, sel_df, word_df=store.df)
print(f'top-1: {top[0][\"sentence\"]}')
print(f'stats nsubj_domain_size={stats[\"nsubj_domain_size\"]}, candidate_count={stats[\"candidate_count\"]}')
"
Expected: a sentence printed, stats fields present.
- [ ] Step 11.6: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/paradigm_3_csp.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: migrate solve() to delegate to solve_shape

solve()'s body shrinks from 70+ lines of manual loops to a thin
wrapper that constructs a SkeletonShape, calls solve_shape, and
repackages the result into the legacy (top, stats) shape via the
new _build_solve_stats helper. Public signature unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 12: Bench script + record baseline¶
Files:
- Create: <spike>/bench_enumeration.py
- Modify: docs/superpowers/specs/2026-05-08-phon-104-csp-vectorize-enumeration-design.md
Bench the largest probe under both paths, record the speedup.
- [ ] Step 12.1: Create the bench script
Create <spike>/bench_enumeration.py:
"""Bench vectorized vs python enumeration on the largest acceptance probe (PHON-104).

Run: uv run python research/2026-05-07-sentence-generation-paradigms/bench_enumeration.py
"""
from __future__ import annotations

import sys
import time
from pathlib import Path

import polars as pl

# Make the sibling spike modules importable when run as a script.
sys.path.insert(0, str(Path(__file__).parent))

import paradigm_3_csp
from phonolex_data.runtime.store import WordStore
from skeleton_csp import (
    SkeletonShape,
    _force_python_path,
    parse_arg_structure,
    solve_shape,
)


def _load_data() -> tuple[WordStore, pl.DataFrame]:
    repo_root = Path(__file__).resolve().parents[4]
    store = WordStore.from_parquet(repo_root / "data" / "runtime" / "words.parquet")
    sel_df = pl.read_parquet(repo_root / "data" / "runtime" / "selectional.parquet")
    return store, sel_df


def _run_probe(verb: str, spec_id: str, store: WordStore, sel_df: pl.DataFrame, force_python: bool) -> tuple[float, int]:
    """Return (wall_clock_seconds, num_candidates)."""
    spec_words = paradigm_3_csp.spec_lexicon(store, spec_id)
    arg = "nsubj,V,dobj,advmod"
    shape = SkeletonShape(arg, parse_arg_structure(arg), 0)
    common = dict(
        verb=verb, domain_words=spec_words, sel_df=sel_df,
        band="fineweb_adult", word_axes={}, cross_axes={},
        word_df=store.df, top_k=8,
    )
    if force_python:
        with _force_python_path():
            t0 = time.perf_counter()
            top = solve_shape(shape, **common)
            elapsed = time.perf_counter() - t0
    else:
        t0 = time.perf_counter()
        top = solve_shape(shape, **common)
        elapsed = time.perf_counter() - t0
    return elapsed, len(top)


def main() -> None:
    print("Loading WordStore + selectional.parquet…")
    store, sel_df = _load_data()
    probes = [
        ("melt", "spec6"),
        ("cut", "spec1"),
        ("chase", "spec1"),
        ("eat", "spec1"),
        ("fill", "spec1"),
    ]
    print(f"\n{'Probe':<20}{'Vec (s)':>10}{'Py (s)':>10}{'Speedup':>10}")
    print("-" * 50)
    total_vec = 0.0
    total_py = 0.0
    for verb, spec_id in probes:
        # Warm both paths once to factor out import / compile overhead
        _run_probe(verb, spec_id, store, sel_df, force_python=False)
        _run_probe(verb, spec_id, store, sel_df, force_python=True)
        # Real timing
        vec_t, vec_n = _run_probe(verb, spec_id, store, sel_df, force_python=False)
        py_t, py_n = _run_probe(verb, spec_id, store, sel_df, force_python=True)
        speedup = py_t / vec_t if vec_t > 0 else float("inf")
        # Pad the whole label to 20 chars so rows line up with the header.
        label = f"{verb} × {spec_id}"
        print(f"{label:<20}{vec_t:>10.3f}{py_t:>10.3f}{speedup:>9.2f}x")
        total_vec += vec_t
        total_py += py_t
    overall = total_py / total_vec if total_vec > 0 else float("inf")
    print("-" * 50)
    print(f"{'TOTAL':<20}{total_vec:>10.3f}{total_py:>10.3f}{overall:>9.2f}x")


if __name__ == "__main__":
    main()
- [ ] Step 12.2: Run the bench
cd packages/generation && uv run python research/2026-05-07-sentence-generation-paradigms/bench_enumeration.py
Capture the output. Expected: vectorized path significantly faster on melt × spec6 (the 273K-cartesian probe). Other probes have smaller cartesians so their speedup may be smaller.
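Two small helpers can make the captured numbers easier to interpret; both are illustrative sketches, not part of bench_enumeration.py. The first computes a Cartesian size from per-slot domain sizes (the sizes below are made up, not the real melt × spec6 domains). The second takes the best of several timed runs, a common way to damp single-run noise; the names cartesian_size and best_of are hypothetical.

```python
import math
import time
from typing import Callable


def cartesian_size(domain_sizes: dict[str, int]) -> int:
    """Number of slot assignments: the product of the per-slot domain sizes."""
    return math.prod(domain_sizes.values())


def best_of(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - t0)
    return min(timings)


# Illustrative sizes only; real values come from the stats dict.
toy_sizes = {"nsubj": 3, "dobj": 4, "advmod": 5}
toy_cartesian = cartesian_size(toy_sizes)

# Time a cheap stand-in workload instead of solve_shape.
fastest = best_of(lambda: sum(range(1000)), repeats=3)
```

If the single-shot timings in the bench table look noisy, wrapping each _run_probe call in something like best_of is one option; whether the extra runtime is worth it depends on how long the python path takes on the largest probe.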
- [ ] Step 12.3: Append baseline numbers to the spec
Open docs/superpowers/specs/2026-05-08-phon-104-csp-vectorize-enumeration-design.md and append at the END:
## Empirical baseline (recorded 2026-05-08)
From `bench_enumeration.py`:
| Probe | Vec (s) | Py (s) | Speedup |
|---|---|---|---|
| melt × spec6 | <FILL> | <FILL> | <FILL>x |
| cut × spec1 | <FILL> | <FILL> | <FILL>x |
| chase × spec1 | <FILL> | <FILL> | <FILL>x |
| eat × spec1 | <FILL> | <FILL> | <FILL>x |
| fill × spec1 | <FILL> | <FILL> | <FILL>x |
| **Total** | **<FILL>** | **<FILL>** | **<FILL>x** |
Replace each <FILL> with actual numbers from the bench output.
- [ ] Step 12.4: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/bench_enumeration.py \
docs/superpowers/specs/2026-05-08-phon-104-csp-vectorize-enumeration-design.md
git commit -m "$(cat <<'EOF'
PHON-104: bench enumeration + record baseline speedup

bench_enumeration.py compares vectorized vs forced-python paths on the
PHON-95 acceptance probe matrix. Numbers folded into the design spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Done¶
After Task 12 commits, PHON-104 closes. The CSP enumeration is vectorized; solve() delegates to solve_shape; ContrastiveConstraint requests still use the Python fallback (PHON-106 reworks contrastive scoring). PHON-105 (hybrid PPMI + raw frequency for verbal slots) is unblocked.