
PHON-104 — CSP Enumeration Vectorization Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Vectorize CSP enumeration with Polars cross-joins and column expressions, eliminating the Python-loop overhead in solve_shape's Cartesian-product enumeration without dropping any candidates. Migrate solve() to delegate to solve_shape so the speedup reaches all callers.

Architecture: Two-part work. Part A: extract current enumerate_assignments logic into _enumerate_python_fallback, then add _enumerate_vectorized using Polars cross-joins, score-as-column expressions, and unique()-based content-pair dedup. Routing decision in solve_shape picks vectorized vs fallback based on whether ContrastiveConstraint scorers are registered. Part B: rewrite paradigm_3_csp.solve() as a thin wrapper that constructs a SkeletonShape and delegates to solve_shape, repackaging the result into the legacy (top, stats) tuple.
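Both paths are meant to produce the same ranked, deduplicated candidates. Below is a minimal plain-Python reference model of that contract. It assumes PMI-only scoring (word axes and adv_sentinel are left out) and uses itertools.product where the vectorized path uses Polars cross-joins. reference_enumerate and its argument layout are hypothetical, not spike code:

```python
from itertools import product


def reference_enumerate(slots, fillers, pmi, content_slots, locked):
    """Hypothetical reference model: Cartesian product, nsubj != dobj filter,
    PMI-sum scoring, and dedup keeping the best row per content-slot tuple."""
    # Locked slots contribute exactly one choice; free slots contribute all fillers.
    choices = [[locked[s]] if s in locked else fillers[s] for s in slots]
    best = {}
    for combo in product(*choices):
        assignment = dict(zip(slots, combo))
        if "nsubj" in assignment and "dobj" in assignment and assignment["nsubj"] == assignment["dobj"]:
            continue
        total = sum(pmi.get(s, {}).get(assignment[s], 0.0) for s in slots)
        key = tuple(assignment[s] for s in content_slots)
        # Strict ">" keeps the first-seen row on ties, like the python path.
        if key not in best or total > best[key][0]:
            best[key] = (total, assignment)
    return sorted(best.values(), key=lambda t: t[0], reverse=True)


result = reference_enumerate(
    slots=("nsubj", "V", "dobj", "advmod"),
    fillers={"nsubj": ["cat", "kid"], "dobj": ["cat", "cake"], "advmod": ["quickly", "slowly"]},
    pmi={"nsubj": {"cat": 1.0, "kid": 0.5}, "dobj": {"cat": 2.0, "cake": 1.25}},
    content_slots=("nsubj", "dobj"),
    locked={"V": "cut"},
)
for total, assignment in result:
    print(total, assignment)
```

This ranks (kid, cat) at 2.5, (cat, cake) at 2.25, then (kid, cake) at 1.75. Each keeps advmod "quickly", because "slowly" ties and the strict comparison keeps the first-seen row.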

Tech Stack: Python 3.12, Polars 1.0+, pytest.

Spec: docs/superpowers/specs/2026-05-08-phon-104-csp-vectorize-enumeration-design.md


File map

File: action
- packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py: Modify (extract the Python fallback, add the vectorized path, add routing)
- packages/generation/research/2026-05-07-sentence-generation-paradigms/paradigm_3_csp.py: Modify (solve() delegates to solve_shape; add stats helpers)
- packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py: Create (equivalence, routing, and stats-parity tests)
- packages/generation/research/2026-05-07-sentence-generation-paradigms/bench_enumeration.py: Create (vectorized vs forced-python timing on the largest probe)

All paths in this plan are relative to repo root /Users/jneumann/Repos/PhonoLex/. The spike directory is referenced as <spike>/ for brevity: <spike>/ = packages/generation/research/2026-05-07-sentence-generation-paradigms/.

Test command throughout:

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v


Task 1: Extract _enumerate_python_fallback

Files:
- Modify: <spike>/skeleton_csp.py

Pull the existing enumerate_assignments generator and the surrounding scoring loop from solve_shape into a separate function. No behavior change — pure refactor. After this task, the cache tests (PHON-103) continue to pass and solve_shape works identically.

  • [ ] Step 1.1: Read the current solve_shape body
sed -n '589,740p' packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py

The current solve_shape defines enumerate_assignments as a nested generator, then runs a scoring loop over its yields. Move both the generator and the scoring loop (everything from best_by_content: dict[...] = {} through the loop that builds best_by_content[key]) into a new module-level helper, _enumerate_python_fallback.

  • [ ] Step 1.2: Add _enumerate_python_fallback

Add a new function near solve_shape (just above it):

def _enumerate_python_fallback(
    shape: SkeletonShape,
    slot_fillers: list[tuple[str, list[str], dict[str, float]]],
    word_axes: dict[str, dict[str, float]],
    cross_axes: dict,
    word_df: pl.DataFrame | None,
    weights: dict[str, float] | None,
    locked_slots: dict[str, str],
) -> dict[tuple[str, ...], tuple[float, dict[str, str], dict[str, float]]]:
    """Python-loop enumeration. Returns best_by_content dict keyed on
    content-slot tuple, valued (total_score, fillers, components)."""
    best_by_content: dict[tuple[str, ...], tuple[float, dict[str, str], dict[str, float]]] = {}

    def enumerate_assignments(
        idx: int,
        partial: dict[str, str],
        running_components: dict[str, float],
    ) -> Iterable[tuple[dict[str, str], dict[str, float]]]:
        if idx == len(slot_fillers):
            yield dict(partial), dict(running_components)
            return
        slot, fillers, scores = slot_fillers[idx]
        if slot in partial:
            locked_word = partial[slot]
            locked_score = scores.get(locked_word, 0.0)
            comp_key = f"pmi_{slot}"
            if locked_score > 0:
                running_components[comp_key] = running_components.get(comp_key, 0.0) + locked_score
            yield from enumerate_assignments(idx + 1, partial, running_components)
            if locked_score > 0:
                running_components[comp_key] -= locked_score
                if abs(running_components.get(comp_key, 0.0)) < 1e-12:
                    running_components.pop(comp_key, None)
            return
        for f in fillers:
            partial[slot] = f
            comp_key = f"pmi_{slot}"
            score = scores.get(f, 0.0)
            running_components[comp_key] = score if comp_key not in running_components else running_components[comp_key] + score
            yield from enumerate_assignments(idx + 1, partial, running_components)
            del partial[slot]
            if comp_key in running_components:
                if score == 0.0:
                    del running_components[comp_key]
                else:
                    running_components[comp_key] -= score
                    if abs(running_components[comp_key]) < 1e-12:
                        del running_components[comp_key]

    initial: dict[str, str] = dict(locked_slots)
    for fillers_dict, components in enumerate_assignments(0, initial, {}):
        if "nsubj" in fillers_dict and "dobj" in fillers_dict and fillers_dict["nsubj"] == fillers_dict["dobj"]:
            continue
        for axis_name, axis_lookup in word_axes.items():
            total_axis = 0.0
            for slot in shape.content_slots:
                total_axis += axis_lookup.get(fillers_dict[slot], 0.0)
            if total_axis != 0.0:
                components[axis_name] = float(total_axis)
        if cross_axes and word_df is not None:
            slot_assignment = {s: fillers_dict[s] for s in shape.content_slots}
            for axis_name, scorer in cross_axes.items():
                components[axis_name] = float(scorer(slot_assignment, word_df))
        if "advmod" in fillers_dict:
            components["adv_sentinel"] = 0.001
        total = _weighted_total(components, weights)
        key = _content_pair_key(shape, fillers_dict)
        cur = best_by_content.get(key)
        if cur is None or total > cur[0]:
            best_by_content[key] = (total, dict(fillers_dict), dict(components))
    return best_by_content
  • [ ] Step 1.3: Replace the inline body inside solve_shape with a call to _enumerate_python_fallback

Find this block in solve_shape (the body after slot_fillers is built; between slot_fillers.append(...) and deduped = sorted(best_by_content.values()...)):

    # Cartesian over slot fillers, dedup by content-slot key.
    best_by_content: dict[tuple[str, ...], tuple[float, dict[str, str], dict[str, float]]] = {}

    def enumerate_assignments(
        idx: int,
        partial: dict[str, str],
        running_components: dict[str, float],
    ) -> Iterable[tuple[dict[str, str], dict[str, float]]]:
        ...

    initial: dict[str, str] = {"V": verb}
    if locked_slots:
        initial.update(locked_slots)
    for fillers_dict, components in enumerate_assignments(0, initial, {}):
        ...
        best_by_content[key] = (total, dict(fillers_dict), dict(components))

Replace the entire block (from best_by_content: dict[...] = {} through the end of the for-loop building best_by_content[key]) with:

    initial_locks: dict[str, str] = {"V": verb}
    if locked_slots:
        initial_locks.update(locked_slots)
    best_by_content = _enumerate_python_fallback(
        shape=shape,
        slot_fillers=slot_fillers,
        word_axes=word_axes,
        cross_axes=cross_axes,
        word_df=word_df,
        weights=weights,
        locked_slots=initial_locks,
    )

    deduped = sorted(best_by_content.values(), key=lambda t: t[0], reverse=True)
  • [ ] Step 1.4: Run cache tests + manual smoke
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 25 passed (no regression — pure refactor).

cd packages/generation/research/2026-05-07-sentence-generation-paradigms && \
  uv run python -c "
import paradigm_3_csp
from phonolex_data.runtime.store import WordStore
from pathlib import Path
import polars as pl

repo = Path('../../../..').resolve()
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
spec_words = paradigm_3_csp.spec_lexicon(store, 'spec1')
top, stats = paradigm_3_csp.solve('cut', 'spec1', spec_words, sel_df, word_df=store.df)
print(f'top-1: {top[0][\"sentence\"]}  total_score={top[0][\"total_score\"]:.3f}')
"

Expected: the top-1 sentence and its total_score are printed, with no traceback.

  • [ ] Step 1.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py
git commit -m "$(cat <<'EOF'
PHON-104: extract _enumerate_python_fallback (pure refactor)

Pull the existing enumerate_assignments generator + scoring loop out
of solve_shape into a module-level helper. No behavior change — sets
up the routing point for the upcoming vectorized path.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 2: Add _should_use_vectorized routing + _FORCE_PYTHON_PATH flag

Files:
- Modify: <spike>/skeleton_csp.py
- Create: <spike>/test_vectorized_enumeration.py

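Adds only the routing decision, with no vectorized path yet. _should_use_vectorized returns True when no cross-slot scorers are registered, but solve_shape does not consult it until Task 8, so behavior is unchanged after this task.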

  • [ ] Step 2.1: Write the failing routing tests

Create <spike>/test_vectorized_enumeration.py:

"""Tests for vectorized enumeration — PHON-104."""
from __future__ import annotations

import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).parent))

from constraint_surface import (
    ContrastiveConstraint,
    IncludeConstraint,
    cross_slot_axes,
)
import skeleton_csp


def test_no_contrastive_takes_vectorized():
    cross = cross_slot_axes([IncludeConstraint(phonemes=("k",))])
    assert skeleton_csp._should_use_vectorized(cross_axes=cross) is True


def test_contrastive_takes_python_fallback():
    cross = cross_slot_axes([
        ContrastiveConstraint(pair_type="minpair", phoneme1="k", phoneme2="g")
    ])
    assert skeleton_csp._should_use_vectorized(cross_axes=cross) is False


def test_force_python_path_overrides_routing():
    """The _FORCE_PYTHON_PATH flag forces fallback regardless of constraints."""
    cross = cross_slot_axes([IncludeConstraint(phonemes=("k",))])
    with skeleton_csp._force_python_path():
        assert skeleton_csp._should_use_vectorized(cross_axes=cross) is False
    # Outside the context, normal routing
    assert skeleton_csp._should_use_vectorized(cross_axes=cross) is True
  • [ ] Step 2.2: Run tests, verify they fail
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 3 fail with AttributeError: module 'skeleton_csp' has no attribute '_should_use_vectorized'.

  • [ ] Step 2.3: Add the routing helper + flag + context manager

Add import contextlib to the existing imports at the top of <spike>/skeleton_csp.py, then append the following:

_FORCE_PYTHON_PATH = False


@contextlib.contextmanager
def _force_python_path():
    """Test-only context manager that forces the python fallback regardless
    of constraint shape. Used to compare vectorized vs python output on
    inputs where vectorized would normally be selected."""
    global _FORCE_PYTHON_PATH
    prev = _FORCE_PYTHON_PATH
    _FORCE_PYTHON_PATH = True
    try:
        yield
    finally:
        _FORCE_PYTHON_PATH = prev


def _should_use_vectorized(*, cross_axes: dict) -> bool:
    """Route between vectorized and python fallback paths.

    Vectorized path runs when no cross-slot scorers are registered (i.e.,
    no ContrastiveConstraint in the request). When _FORCE_PYTHON_PATH is
    set (test only), always returns False.
    """
    if _FORCE_PYTHON_PATH:
        return False
    return not cross_axes
  • [ ] Step 2.4: Run tests, verify all pass
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 3 passed.

  • [ ] Step 2.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: add _should_use_vectorized routing + _force_python_path test hook

Routing decision: vectorized path runs when no cross-slot scorers
(no ContrastiveConstraint). _force_python_path() context manager
forces the fallback for equivalence testing.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 3: Add _build_slot_filler_tables

Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/test_vectorized_enumeration.py

Builds per-slot Polars frames from the slot_fillers tuples.
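One detail carries over from _enumerate_python_fallback. A locked slot's PMI is only added to running_components when it is positive (if locked_score > 0), while a free slot's score is added whatever its sign. Below is a sketch of the effective per-slot contribution, which is what a locked slot's pmi_<slot> column has to hold for totals to agree with the fallback. effective_pmi is a hypothetical helper name:

```python
def effective_pmi(word: str, scores: dict[str, float], locked: bool) -> float:
    """Contribution of one slot's filler to the python path's total."""
    score = scores.get(word, 0.0)
    if locked:
        # Locked branch of the fallback: only positive scores are recorded.
        return score if score > 0 else 0.0
    # Free branch: any score (zero or negative) is recorded.
    return score


print(effective_pmi("cut", {"cut": -0.7}, locked=True))
print(effective_pmi("cut", {"cut": -0.7}, locked=False))
```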

  • [ ] Step 3.1: Write failing tests

Append to <spike>/test_vectorized_enumeration.py:

import polars as pl


def test_build_slot_filler_tables_basic():
    slot_fillers = [
        ("V", ["cut"], {}),
        ("nsubj", ["cat", "kid"], {"cat": 1.5, "kid": 0.8}),
        ("dobj", ["cake", "rope"], {"cake": 2.0, "rope": 1.2}),
    ]
    tables = skeleton_csp._build_slot_filler_tables(slot_fillers, locked_slots={"V": "cut"})

    assert set(tables.keys()) == {"V", "nsubj", "dobj"}
    # V is locked → 1 row
    assert tables["V"].height == 1
    assert tables["V"]["V"].to_list() == ["cut"]
    assert tables["V"]["pmi_V"].to_list() == [0.0]
    # nsubj has 2 fillers
    assert tables["nsubj"].height == 2
    assert sorted(tables["nsubj"]["nsubj"].to_list()) == ["cat", "kid"]
    # PMI scores aligned with filler order
    nsubj_rows = dict(zip(tables["nsubj"]["nsubj"].to_list(), tables["nsubj"]["pmi_nsubj"].to_list()))
    assert nsubj_rows == {"cat": 1.5, "kid": 0.8}


def test_build_slot_filler_tables_locked_filler_not_in_scores():
    """A locked filler whose word isn't in scores → 0.0 PMI column."""
    slot_fillers = [
        ("nsubj", ["a", "b", "c"], {"a": 1.0}),
    ]
    tables = skeleton_csp._build_slot_filler_tables(slot_fillers, locked_slots={"nsubj": "z"})
    assert tables["nsubj"].height == 1
    assert tables["nsubj"]["nsubj"].to_list() == ["z"]
    assert tables["nsubj"]["pmi_nsubj"].to_list() == [0.0]
  • [ ] Step 3.2: Run tests, verify they fail
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 2 fail with AttributeError: module 'skeleton_csp' has no attribute '_build_slot_filler_tables'.

  • [ ] Step 3.3: Add _build_slot_filler_tables

Append to <spike>/skeleton_csp.py:

def _build_slot_filler_tables(
    slot_fillers: list[tuple[str, list[str], dict[str, float]]],
    locked_slots: dict[str, str],
) -> dict[str, pl.DataFrame]:
    """Build per-slot polars frames with `<slot>` (filler) + `pmi_<slot>` columns.

    Locked slots produce a 1-row frame with the locked filler. Non-locked
    slots produce a |fillers|-row frame.
    """
    tables: dict[str, pl.DataFrame] = {}
    for slot, fillers, scores in slot_fillers:
        # Explicit schema keeps dtypes stable even when a slot has no fillers.
        schema = {slot: pl.String, f"pmi_{slot}": pl.Float64}
        if slot in locked_slots:
            w = locked_slots[slot]
            # Mirror the python fallback: a locked slot's PMI only counts when > 0.
            locked_score = float(scores.get(w, 0.0))
            tables[slot] = pl.DataFrame(
                {slot: [w], f"pmi_{slot}": [locked_score if locked_score > 0 else 0.0]},
                schema=schema,
            )
        else:
            tables[slot] = pl.DataFrame(
                {slot: fillers, f"pmi_{slot}": [float(scores.get(f, 0.0)) for f in fillers]},
                schema=schema,
            )
    return tables
  • [ ] Step 3.4: Run tests, verify all pass
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 5 passed.

  • [ ] Step 3.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: add _build_slot_filler_tables helper

Converts slot_fillers tuples into per-slot polars frames keyed on
slot name. Locked slots produce 1-row frames; missing PMI scores
default to 0.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 4: Add _enumerate_vectorized skeleton (cross-join + nsubj!=dobj filter)

Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/test_vectorized_enumeration.py

Cross-joins per-slot frames into a Cartesian, applies the nsubj-dobj distinct invariant. No scoring yet.
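The expected row count follows directly from the product of the per-slot sizes (1 for a locked slot), minus the nsubj == dobj diagonal multiplied by the sizes of every other slot. Below is a small helper for predicting test cardinalities. It assumes filler lists contain no duplicate words, and expected_cartesian_rows is a hypothetical name:

```python
from math import prod


def expected_cartesian_rows(slot_fillers, locked_slots):
    """Predict the cross-join row count after the nsubj != dobj filter."""
    effective = {
        slot: [locked_slots[slot]] if slot in locked_slots else list(fillers)
        for slot, fillers, _scores in slot_fillers
    }
    total = prod(len(words) for words in effective.values())
    if "nsubj" in effective and "dobj" in effective:
        # Same word in both slots, times the size of every other slot.
        diagonal = sum(1 for n in effective["nsubj"] for d in effective["dobj"] if n == d)
        others = prod(len(w) for s, w in effective.items() if s not in ("nsubj", "dobj"))
        total -= diagonal * others
    return total


overlap = [("nsubj", ["cat", "kid"], {}), ("V", ["cut"], {}), ("dobj", ["cat", "kid"], {})]
print(expected_cartesian_rows(overlap, {"V": "cut"}))
```

For the two tests in Step 4.1 this gives 2 × 2 = 4 and 4 − 2 = 2. Adding a two-word advmod slot to the overlapping case gives 8 − 4 = 4.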

  • [ ] Step 4.1: Write failing test

Append to <spike>/test_vectorized_enumeration.py:

def test_enumerate_vectorized_cardinality():
    """Cartesian cardinality: 2 nsubj × 2 dobj × 1 V = 4 rows; minus
    the nsubj==dobj diagonal (none here, words distinct) = 4."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj",
        slots=("nsubj", "V", "dobj"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat", "kid"], {"cat": 1.0, "kid": 0.5}),
        ("V", ["cut"], {}),
        ("dobj", ["cake", "rope"], {"cake": 2.0, "rope": 1.0}),
    ]
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape,
        slot_fillers=slot_fillers,
        word_axes={},
        weights=None,
        locked_slots={"V": "cut"},
    )
    assert cart.height == 4
    assert set(cart.columns) >= {"nsubj", "V", "dobj", "pmi_nsubj", "pmi_dobj", "pmi_V"}


def test_enumerate_vectorized_nsubj_dobj_distinct():
    """nsubj != dobj invariant filters out the diagonal."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj",
        slots=("nsubj", "V", "dobj"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat", "kid"], {"cat": 1.0, "kid": 0.5}),
        ("V", ["cut"], {}),
        ("dobj", ["cat", "kid"], {"cat": 2.0, "kid": 1.0}),
    ]
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes={},
        weights=None, locked_slots={"V": "cut"},
    )
    # 2×2 = 4, minus 2 (cat,cat / kid,kid) = 2
    assert cart.height == 2
    for n, d in zip(cart["nsubj"].to_list(), cart["dobj"].to_list()):
        assert n != d
  • [ ] Step 4.2: Run tests, verify they fail
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 2 fail with AttributeError: module 'skeleton_csp' has no attribute '_enumerate_vectorized'.

  • [ ] Step 4.3: Add _enumerate_vectorized (cardinality + filter only)

Append to <spike>/skeleton_csp.py:

def _enumerate_vectorized(
    shape: SkeletonShape,
    slot_fillers: list[tuple[str, list[str], dict[str, float]]],
    word_axes: dict[str, dict[str, float]],
    weights: dict[str, float] | None,
    locked_slots: dict[str, str],
) -> pl.DataFrame:
    """Vectorized Cartesian via Polars cross-joins. Returns a DataFrame
    with one row per assignment, columns for each slot's filler and PMI
    score, plus per-axis score columns (added in subsequent tasks)."""
    tables = _build_slot_filler_tables(slot_fillers, locked_slots)
    # Cartesian via successive cross joins
    cart = tables[shape.slots[0]]
    for s in shape.slots[1:]:
        cart = cart.join(tables[s], how="cross")
    # nsubj != dobj invariant (only when both present)
    if "nsubj" in shape.slots and "dobj" in shape.slots:
        cart = cart.filter(pl.col("nsubj") != pl.col("dobj"))
    return cart
  • [ ] Step 4.4: Run tests, verify all pass
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 7 passed.

  • [ ] Step 4.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: add _enumerate_vectorized skeleton (cross-join + invariant)

Cross-joins per-slot frames into a Polars Cartesian. Applies the
nsubj != dobj invariant when both slots are present. No scoring yet
— scoring columns added in subsequent tasks.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 5: Add per-word axis scoring

Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/test_vectorized_enumeration.py

For each soft-axis lookup (e.g., include_/k/), add a column summing per-content-slot lookups.
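For a single row, the column expression is meant to equal the python fallback's inner loop: look up each content-slot filler in the axis table, treat a missing word as 0.0, and sum. Below is a plain-Python sketch of that per-row semantics, assuming content_slots is ("nsubj", "dobj") for this shape; axis_sum is a hypothetical name. It is checked against the expected values in the Step 5.1 test:

```python
def axis_sum(row: dict[str, str], lookup: dict[str, float], content_slots: tuple[str, ...]) -> float:
    """Per-row equivalent of summing lookup values (default 0.0) over content slots."""
    return sum(lookup.get(row[slot], 0.0) for slot in content_slots)


include_k = {"cat": 1.0, "kid": 1.0, "cake": 1.0}  # "rope" has no /k/
content_slots = ("nsubj", "dobj")
rows = [
    {"nsubj": n, "V": "cut", "dobj": d}
    for n in ("cat", "kid")
    for d in ("cake", "rope")
]
scores = {(r["nsubj"], r["dobj"]): axis_sum(r, include_k, content_slots) for r in rows}
print(scores)
```

One difference to keep in mind for assembly: the fallback only writes an axis into components when its sum is nonzero (if total_axis != 0.0), whereas the vectorized column exists for every row.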

  • [ ] Step 5.1: Write failing test

Append to <spike>/test_vectorized_enumeration.py:

def test_enumerate_vectorized_per_word_axes():
    """Per-word axis sums across content slots."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj",
        slots=("nsubj", "V", "dobj"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat", "kid"], {"cat": 1.0, "kid": 0.5}),
        ("V", ["cut"], {}),
        ("dobj", ["cake", "rope"], {"cake": 2.0, "rope": 1.0}),
    ]
    word_axes = {
        "include_/k/": {"cat": 1.0, "kid": 1.0, "cake": 1.0},  # contains /k/
        # rope contains no /k/ → 0
    }
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes=word_axes,
        weights=None, locked_slots={"V": "cut"},
    )
    assert "include_/k/" in cart.columns
    # cat + cake = 2.0; cat + rope = 1.0; kid + cake = 2.0; kid + rope = 1.0
    rows = sorted(zip(
        cart["nsubj"].to_list(),
        cart["dobj"].to_list(),
        cart["include_/k/"].to_list(),
    ))
    assert rows == [
        ("cat", "cake", 2.0),
        ("cat", "rope", 1.0),
        ("kid", "cake", 2.0),
        ("kid", "rope", 1.0),
    ]
  • [ ] Step 5.2: Run test, verify it fails
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py::test_enumerate_vectorized_per_word_axes -v

Expected: AssertionError — "include_/k/" not in columns.

  • [ ] Step 5.3: Add per-word axis scoring to _enumerate_vectorized

In _enumerate_vectorized, after the nsubj != dobj filter and before the return cart line, add:

    # Per-word soft axes — sum contributions across content slots
    for axis_name, lookup in word_axes.items():
        contributions = [
            pl.col(content_slot).replace_strict(lookup, default=0.0).cast(pl.Float64)
            for content_slot in shape.content_slots
        ]
        cart = cart.with_columns(pl.sum_horizontal(contributions).alias(axis_name))
  • [ ] Step 5.4: Run test, verify it passes
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 8 passed.

  • [ ] Step 5.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: add per-word axis scoring to vectorized enumeration

For each soft-axis lookup, sum per-content-slot contributions via
replace_strict + sum_horizontal. Matches python path's sum-across-
content-slots semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 6: Add adverb sentinel + total_score

Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/test_vectorized_enumeration.py

Adds the adv_sentinel constant (when shape has advmod) and the weighted total_score column.

  • [ ] Step 6.1: Write failing tests

Append to <spike>/test_vectorized_enumeration.py:

def test_enumerate_vectorized_adv_sentinel_when_advmod_present():
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj,advmod",
        slots=("nsubj", "V", "dobj", "advmod"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat"], {"cat": 1.0}),
        ("V", ["cut"], {}),
        ("dobj", ["cake"], {"cake": 2.0}),
        ("advmod", ["quickly"], {}),
    ]
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes={},
        weights=None, locked_slots={"V": "cut"},
    )
    assert "adv_sentinel" in cart.columns
    assert cart["adv_sentinel"].to_list() == [0.001]


def test_enumerate_vectorized_total_score_unweighted():
    """Default weights=None → all weights 1.0 → total_score is sum of components."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj",
        slots=("nsubj", "V", "dobj"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat"], {"cat": 1.5}),
        ("V", ["cut"], {}),
        ("dobj", ["cake"], {"cake": 2.5}),
    ]
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes={},
        weights=None, locked_slots={"V": "cut"},
    )
    assert "total_score" in cart.columns
    # pmi_nsubj 1.5 + pmi_V 0.0 + pmi_dobj 2.5 = 4.0
    assert cart["total_score"].to_list() == [4.0]


def test_enumerate_vectorized_total_score_weighted():
    """Custom weights apply per-axis."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj",
        slots=("nsubj", "V", "dobj"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat"], {"cat": 1.0}),
        ("V", ["cut"], {}),
        ("dobj", ["cake"], {"cake": 2.0}),
    ]
    weights = {"pmi_nsubj": 2.0, "pmi_dobj": 0.5}
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes={},
        weights=weights, locked_slots={"V": "cut"},
    )
    # 2.0 * 1.0 + 1.0 * 0.0 + 0.5 * 2.0 = 3.0
    assert cart["total_score"].to_list() == [3.0]
  • [ ] Step 6.2: Run tests, verify they fail
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 3 fail (no adv_sentinel, no total_score columns).

  • [ ] Step 6.3: Add adv_sentinel + total_score to _enumerate_vectorized

In _enumerate_vectorized, after the per-word axis loop (before return cart), add:

    # Adverb sentinel (constant; only when shape has advmod)
    if "advmod" in shape.slots:
        cart = cart.with_columns(pl.lit(0.001).alias("adv_sentinel"))

    # Total score = weighted sum of all score columns
    score_cols = [
        c for c in cart.columns
        if c.startswith("pmi_") or c in word_axes or c == "adv_sentinel"
    ]
    weighted = [
        pl.col(c) * (weights.get(c, 1.0) if weights else 1.0)
        for c in score_cols
    ]
    cart = cart.with_columns(pl.sum_horizontal(weighted).alias("total_score"))
  • [ ] Step 6.4: Run tests, verify all pass
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 11 passed.

  • [ ] Step 6.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: add adv_sentinel + total_score to vectorized enumeration

adv_sentinel adds a 0.001 constant when shape has advmod (matches the
python path tiebreaker for advmod-PMI-absent verbs). total_score is a
weighted sum across all score columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 7: Add _dedup_and_assemble (vectorized → list[tuple])

Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/test_vectorized_enumeration.py

Sorts by total_score, deduplicates by content-slot tuple, truncates, and assembles the legacy (total, fillers, components) shape that solve_shape's ccomp-resolution + sentence-realization step expects.
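Sorting, then calling unique(keep="first"), selects the same row per content key as the python path's per-key max, provided the sort is stable (Polars' sort takes a maintain_order flag for this). The two can still order different keys differently when their totals tie. The python path orders keys by first appearance, not by the winning row's position. Below is a plain-Python sketch of both strategies showing exactly that case; the function names are hypothetical:

```python
def dedup_like_python_path(rows, key_slots):
    """Per-key max with a strict '>' (first-seen row wins ties), then sort desc."""
    best = {}
    for total, fillers in rows:
        key = tuple(fillers[s] for s in key_slots)
        if key not in best or total > best[key][0]:
            best[key] = (total, fillers)
    # Python's sort is stable, so equal totals keep key-insertion order.
    return sorted(best.values(), key=lambda t: t[0], reverse=True)


def dedup_like_vectorized(rows, key_slots):
    """Stable sort desc by total, then keep the first row seen per key."""
    seen, out = set(), []
    for total, fillers in sorted(rows, key=lambda t: t[0], reverse=True):
        key = tuple(fillers[s] for s in key_slots)
        if key not in seen:
            seen.add(key)
            out.append((total, fillers))
    return out


def as_set(out):
    """Order-insensitive view of a result list, for equivalence checks."""
    return sorted((total, tuple(sorted(fillers.items()))) for total, fillers in out)


rows = [
    (1.0, {"nsubj": "cat", "dobj": "cake", "advmod": "quickly"}),
    (5.0, {"nsubj": "kid", "dobj": "rope", "advmod": "quickly"}),
    (5.0, {"nsubj": "cat", "dobj": "cake", "advmod": "slowly"}),
]
py = dedup_like_python_path(rows, ("nsubj", "dobj"))
vec = dedup_like_vectorized(rows, ("nsubj", "dobj"))
```

Both keep the same winners, but py lists (cat, cake) first while vec lists (kid, rope) first. An equivalence test between the two paths should therefore compare candidates as a set, or break ties deterministically, rather than assert identical list order.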

  • [ ] Step 7.1: Write failing test

Append to <spike>/test_vectorized_enumeration.py:

def test_dedup_and_assemble_drops_zero_components():
    """The python path's running-components logic drops 0-score pmi keys.
    Vectorized assembly must match this — no pmi_<slot> entry in the
    components dict for fillers with score 0."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj",
        slots=("nsubj", "V", "dobj"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat"], {"cat": 1.0}),
        ("V", ["cut"], {}),  # locked, score not in dict
        ("dobj", ["cake"], {"cake": 2.0}),
    ]
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes={},
        weights=None, locked_slots={"V": "cut"},
    )
    assembled = skeleton_csp._dedup_and_assemble(cart, shape, top_k=1, over_fetch=1)
    assert len(assembled) == 1
    total, fillers, components = assembled[0]
    assert fillers == {"nsubj": "cat", "V": "cut", "dobj": "cake"}
    assert "pmi_V" not in components, "0-score pmi_V should be dropped from components"
    assert components["pmi_nsubj"] == 1.0
    assert components["pmi_dobj"] == 2.0
    assert total == 3.0


def test_dedup_and_assemble_dedup_by_content_keys():
    """Two rows with same content-slot fillers but different advmod
    collapse to one (highest total_score)."""
    shape = skeleton_csp.SkeletonShape(
        arg_structure="nsubj,V,dobj,advmod",
        slots=("nsubj", "V", "dobj", "advmod"),
        band_freq=0,
    )
    slot_fillers = [
        ("nsubj", ["cat"], {"cat": 1.0}),
        ("V", ["cut"], {}),
        ("dobj", ["cake"], {"cake": 2.0}),
        ("advmod", ["quickly", "slowly"], {}),
    ]
    cart = skeleton_csp._enumerate_vectorized(
        shape=shape, slot_fillers=slot_fillers, word_axes={},
        weights=None, locked_slots={"V": "cut"},
    )
    # Both rows have identical content-slot fillers → dedup to 1
    assembled = skeleton_csp._dedup_and_assemble(cart, shape, top_k=2, over_fetch=1)
    assert len(assembled) == 1
  • [ ] Step 7.2: Run tests, verify they fail
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 2 fail with AttributeError: module 'skeleton_csp' has no attribute '_dedup_and_assemble'.

  • [ ] Step 7.3: Add _dedup_and_assemble

Append to <spike>/skeleton_csp.py:

def _dedup_and_assemble(
    cart: pl.DataFrame,
    shape: SkeletonShape,
    *,
    top_k: int,
    over_fetch: int,
) -> list[tuple[float, dict[str, str], dict[str, float]]]:
    """Sort by total_score desc, dedup by content-slot tuple, truncate,
    and assemble the (total, fillers, components) tuple shape that
    solve_shape's ccomp-resolution loop expects.

    Drops 0-score pmi_<slot> entries from the components dict to match
    the python path's running-components-cleanup behavior.
    """
    if cart.height == 0:
        return []
    content_keys = list(shape.content_slots)
    deduped = (
        cart
        .sort("total_score", descending=True)
        .unique(subset=content_keys, keep="first", maintain_order=True)
        .head(top_k * over_fetch)
    )

    # Identify score columns to copy into components
    score_cols = [
        c for c in cart.columns
        if (c.startswith("pmi_") or c == "adv_sentinel" or c.startswith("include_")
            or c.startswith("bound_boost_") or c.startswith("contrastive_"))
        and c != "total_score"
    ]
    out: list[tuple[float, dict[str, str], dict[str, float]]] = []
    for row in deduped.iter_rows(named=True):
        fillers = {s: row[s] for s in shape.slots}
        components: dict[str, float] = {}
        for c in score_cols:
            v = float(row[c])
            # Match python path: 0-score pmi entries are dropped
            if c.startswith("pmi_") and v == 0.0:
                continue
            components[c] = v
        out.append((float(row["total_score"]), fillers, components))
    return out
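When an equivalence case fails, a pure-Python reference for the same sort, dedup and truncate semantics can show which step diverges. The sketch below assumes each row is a dict with a total_score key plus one key per slot. dedup_top_by_content is a hypothetical debugging helper and is not part of skeleton_csp.

```python
from typing import Iterable, Sequence

Row = dict[str, object]


def dedup_top_by_content(
    rows: Iterable[Row],
    content_keys: Sequence[str],
    limit: int,
) -> list[Row]:
    """Pure-Python mirror of sort(desc) -> unique(keep="first") -> head(limit).

    Hypothetical debugging helper; not part of skeleton_csp.
    """
    if limit <= 0:
        return []  # matches head(0) returning an empty frame
    # sorted() is stable even with reverse=True, so rows with equal
    # total_score keep their input order, like a Polars sort with
    # maintain_order=True.
    ordered = sorted(rows, key=lambda r: r["total_score"], reverse=True)
    seen: set[tuple[object, ...]] = set()
    out: list[Row] = []
    for row in ordered:
        key = tuple(row[k] for k in content_keys)
        if key in seen:
            continue  # a higher-scoring row with this content tuple already won
        seen.add(key)
        out.append(row)
        if len(out) == limit:
            break
    return out
```

Keeping only the first row per content tuple after a descending sort means dedup retains the best-scoring filler combination. Here `limit` plays the role of top_k * over_fetch.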
  • [ ] Step 7.4: Run tests, verify all pass
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 13 passed.

  • [ ] Step 7.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: add _dedup_and_assemble (vectorized → list[tuple])

Sorts by total_score desc, deduplicates by content-slot tuple via
polars unique(maintain_order=True), and assembles the legacy
(total, fillers, components) tuple shape. Drops 0-score pmi_<slot>
entries from components to match the python path.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 8: Wire vectorized path into solve_shape

Files: - Modify: <spike>/skeleton_csp.py

Adds the routing in solve_shape. It uses the vectorized path when no ContrastiveConstraint scorer is registered, and the Python fallback otherwise.
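The real _should_use_vectorized and _force_python_path helpers come from earlier tasks and are not reproduced here. For orientation only, the sketch below shows one way to write such a predicate together with a test override. It assumes cross_axes maps axis names to scorer objects and that contrastive scorers can be recognized by type. Every name in it is hypothetical.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Mapping


class ContrastiveScorer:
    """Hypothetical stand-in for a ContrastiveConstraint scorer."""


# Test-only override flag; a ContextVar keeps the override scoped and
# is reset even if the body of the `with` block raises.
_FORCE_PYTHON: ContextVar[bool] = ContextVar("_FORCE_PYTHON", default=False)


@contextmanager
def force_python_path() -> Iterator[None]:
    """Route every solve inside the `with` block to the Python fallback."""
    token = _FORCE_PYTHON.set(True)
    try:
        yield
    finally:
        _FORCE_PYTHON.reset(token)


def should_use_vectorized(cross_axes: Mapping[str, object]) -> bool:
    """Vectorized unless forced off or any contrastive scorer is registered."""
    if _FORCE_PYTHON.get():
        return False
    return not any(isinstance(s, ContrastiveScorer) for s in cross_axes.values())
```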

  • [ ] Step 8.1: Update solve_shape to route

In solve_shape, find the block from Task 1.3:

    initial_locks: dict[str, str] = {"V": verb}
    if locked_slots:
        initial_locks.update(locked_slots)
    best_by_content = _enumerate_python_fallback(
        shape=shape,
        slot_fillers=slot_fillers,
        word_axes=word_axes,
        cross_axes=cross_axes,
        word_df=word_df,
        weights=weights,
        locked_slots=initial_locks,
    )

    deduped = sorted(best_by_content.values(), key=lambda t: t[0], reverse=True)

Replace with:

    initial_locks: dict[str, str] = {"V": verb}
    if locked_slots:
        initial_locks.update(locked_slots)

    if _should_use_vectorized(cross_axes=cross_axes):
        cart = _enumerate_vectorized(
            shape=shape,
            slot_fillers=slot_fillers,
            word_axes=word_axes,
            weights=weights,
            locked_slots=initial_locks,
        )
        # _dedup_and_assemble already sorts, dedups, and truncates, and it
        # returns (total, fillers, components) tuples, the same shape the
        # downstream ccomp-resolution loop consumes from `deduped`.
        # Over-fetch only when ccomp resolution may filter candidates out;
        # otherwise truncating to exactly top_k (over_fetch=1) is enough.
        over_fetch = 4 if "ccomp" in shape.slots else 1
        deduped = _dedup_and_assemble(cart, shape, top_k=top_k, over_fetch=over_fetch)
    else:
        best_by_content = _enumerate_python_fallback(
            shape=shape,
            slot_fillers=slot_fillers,
            word_axes=word_axes,
            cross_axes=cross_axes,
            word_df=word_df,
            weights=weights,
            locked_slots=initial_locks,
        )
        deduped = sorted(best_by_content.values(), key=lambda t: t[0], reverse=True)

Further down, the existing ccomp-resolution loop already slices deduped[: top_k * over_fetch]. On the vectorized path, _dedup_and_assemble has already truncated to top_k * over_fetch, so check that the downstream loop computes over_fetch the same way (4 when the shape has a ccomp slot, otherwise 1). If it does, that slice is a no-op on the vectorized path, and the candidates reaching ccomp resolution match the Python path.

  • [ ] Step 8.2: Smoke-test
cd packages/generation/research/2026-05-07-sentence-generation-paradigms && \
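  uv run python -c "
import paradigm_3_csp
from phonolex_data.runtime.store import WordStore
from pathlib import Path
import polars as pl
from skeleton_csp import SkeletonShape, parse_arg_structure, solve_shape

repo = Path('../../../..').resolve()
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
spec_words = paradigm_3_csp.spec_lexicon(store, 'spec1')
# Call solve_shape directly: solve() is not migrated until Task 11, so
# calling it here would not exercise the new vectorized path.
arg = 'nsubj,V,dobj,advmod'
shape = SkeletonShape(arg, parse_arg_structure(arg), 0)
top = solve_shape(
    shape, verb='cut', domain_words=spec_words, sel_df=sel_df,
    band='fineweb_adult', word_axes={}, cross_axes={},
    word_df=store.df, top_k=8,
)
print(f'top-1: {top[0][\"sentence\"]}  total={top[0][\"total_score\"]:.3f}')
print(f'components: {top[0][\"score_components\"]}')
"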

Expected: a sentence is printed, and the components dict contains pmi_nsubj, pmi_dobj, and pmi_advmod (or adv_sentinel). Any pmi_<slot> entry with a 0.0 score is dropped.

  • [ ] Step 8.3: Run cache tests + new tests
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 25 + 13 = 38 passed.

  • [ ] Step 8.4: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py
git commit -m "$(cat <<'EOF'
PHON-104: wire vectorized path into solve_shape

solve_shape now routes between _enumerate_vectorized (no contrastive)
and _enumerate_python_fallback (contrastive present, or test override).
Cache tests + new vectorized unit tests both pass; smoke-test produces
a sensible top-1 sentence end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 9: Equivalence tests over the PHON-95 acceptance probes

Files: - Modify: <spike>/test_vectorized_enumeration.py

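These are the strongest tests in this plan. They are parameterized over the canonical probe matrix and verify that both paths produce identical sentences, fillers, and score_components, with total_score equal to within 1e-9.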

  • [ ] Step 9.1: Add session-scoped fixtures + the equivalence test

In <spike>/test_vectorized_enumeration.py, add these session-scoped fixtures near the top of the file, after the existing imports. Make sure pytest, Path (from pathlib), and polars as pl are imported.

@pytest.fixture(scope="session")
def store():
    from phonolex_data.runtime.store import WordStore
    repo_root = Path(__file__).resolve().parents[4]
    return WordStore.from_parquet(repo_root / "data" / "runtime" / "words.parquet")


@pytest.fixture(scope="session")
def sel_df():
    repo_root = Path(__file__).resolve().parents[4]
    return pl.read_parquet(repo_root / "data" / "runtime" / "selectional.parquet")

Then append the equivalence test at the end:

@pytest.mark.parametrize("verb,spec_id,arg_structure", [
    ("cut",   "spec1", "nsubj,V,dobj"),
    ("cut",   "spec1", "nsubj,V,dobj,advmod"),
    ("chase", "spec1", "nsubj,V,dobj,advmod"),
    ("melt",  "spec6", "nsubj,V,dobj,advmod"),
    ("eat",   "spec1", "nsubj,V,dobj,advmod"),
    ("fill",  "spec1", "nsubj,V,dobj,advmod"),
])
def test_vectorized_matches_python(store, sel_df, verb, spec_id, arg_structure):
    """Bit-identical top-K output between vectorized and python paths."""
    import paradigm_3_csp
    from skeleton_csp import (
        SkeletonShape,
        parse_arg_structure,
        solve_shape,
        _force_python_path,
    )

    spec_words = paradigm_3_csp.spec_lexicon(store, spec_id)
    shape = SkeletonShape(arg_structure, parse_arg_structure(arg_structure), 0)

    common = dict(
        verb=verb,
        domain_words=spec_words,
        sel_df=sel_df,
        band="fineweb_adult",
        word_axes={},
        cross_axes={},
        word_df=store.df,
        top_k=8,
    )

    vec_out = solve_shape(shape, **common)
    with _force_python_path():
        py_out = solve_shape(shape, **common)

    assert len(vec_out) == len(py_out), (
        f"length mismatch: vec={len(vec_out)} py={len(py_out)}"
    )
    for v, p in zip(vec_out, py_out):
        assert v["sentence"] == p["sentence"], f"sentence mismatch: {v['sentence']!r} vs {p['sentence']!r}"
        assert abs(v["total_score"] - p["total_score"]) < 1e-9, (
            f"total_score mismatch: {v['total_score']} vs {p['total_score']}"
        )
        assert v["score_components"] == p["score_components"], (
            f"components mismatch: {v['score_components']} vs {p['score_components']}"
        )
        assert v["fillers"] == p["fillers"]
  • [ ] Step 9.2: Run tests, verify all pass
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 13 + 6 = 19 passed.

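If any equivalence case fails, this is the moment to debug. Likely culprits:
- Tie ordering and float summation order. Candidates with equal (or last-bit-different) total_score can come out in a different order; use a tie-breaking sort key in both paths if needed.
- Component dict key handling. Make sure 0-score pmi_<slot> drops are consistent between the two paths.
- Adverb sentinel logic. adv_sentinel should only fire when the verb has no real advmod PMI table.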
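The sketch below illustrates both ordering culprits. It uses the (total, fillers) part of the tuple shape above and assumes content tuples are strings. deterministic_order is a hypothetical helper. On the Polars side, one possible equivalent is cart.sort(["total_score", *content_keys], descending=[True] + [False] * len(content_keys)). An exact tie-break only helps when the tied totals are bit-equal in both paths. If the ties come from last-bit summation differences, compare on rounded totals instead.

```python
from typing import Sequence

Candidate = tuple[float, dict[str, str]]

# Float addition is not associative: summing the same components in a
# different order can change total_score in the last bit, which is enough
# to reorder near-tied candidates between the two paths.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)


def deterministic_order(
    candidates: Sequence[Candidate],
    content_keys: Sequence[str],
) -> list[Candidate]:
    """Sort by total desc, breaking exact ties by the content tuple asc."""
    return sorted(
        candidates,
        key=lambda t: (-t[0], tuple(t[1][k] for k in content_keys)),
    )
```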

  • [ ] Step 9.3: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: equivalence tests over PHON-95 acceptance probe matrix

Parameterized over 6 canonical (verb, spec, arg_structure) probes.
Asserts bit-identical sentences, total_score (within 1e-9), and
score_components dicts between vectorized and python paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 10: Add _peek_domain_sizes + _build_solve_stats helpers

Files: - Modify: <spike>/paradigm_3_csp.py

Stats helpers for the upcoming solve() delegation. No functional change yet.

  • [ ] Step 10.1: Add the helpers

Insert into <spike>/paradigm_3_csp.py, directly above the solve() definition:

def _peek_domain_sizes(
    verb: str,
    band: str,
    filtered_spec: frozenset[str],
    sel_df: pl.DataFrame,
    include_adverb: bool,
) -> dict[str, int]:
    """Pre-cartesian domain sizes per slot for stats parity with the
    legacy solve() output. Cheap: PMI dict lookup + set intersection."""
    nsubj_pmi = pmi_lookup(sel_df, verb, "nsubj", band)
    dobj_pmi = pmi_lookup(sel_df, verb, "dobj", band)
    sizes = {
        "nsubj": len(set(nsubj_pmi.keys()) & filtered_spec),
        "dobj":  len(set(dobj_pmi.keys()) & filtered_spec),
    }
    if include_adverb:
        adv_pmi = _advmod_pmi_for_verb(verb, band)
        if adv_pmi:
            sizes["advmod"] = len(_filter_advmod_by_position(sorted(adv_pmi.keys()), "final"))
        else:
            fallback = _advmod_band_fallback(band)
            sizes["advmod"] = len(_filter_advmod_by_position(list(fallback), "final")) if fallback else 0
    else:
        sizes["advmod"] = 0
    return sizes


def _build_solve_stats(
    *,
    verb: str,
    spec_id: str,
    band: str,
    candidates: list[dict],
    trace: list[dict],
    word_axes: dict,
    cross_axes: dict,
    domain_sizes: dict[str, int],
) -> dict:
    """Synthesize the legacy stats dict shape from solve_shape's output."""
    return {
        "verb": verb,
        "spec_id": spec_id,
        "band": band,
        "nsubj_domain_size": domain_sizes.get("nsubj", 0),
        "dobj_domain_size":  domain_sizes.get("dobj", 0),
        "adv_domain_size":   domain_sizes.get("advmod", 0),
        "candidate_count":   len(candidates),
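        # solve_shape candidates carry slot fillers under "fillers"; fall back
        # to top-level keys so legacy-shaped candidate dicts still count.
        "unique_pairs":      len({
            (c.get("fillers", c).get("nsubj"), c.get("fillers", c).get("dobj"))
            for c in candidates
        }),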
        "domain_trace":      trace,
        "active_axes":       list(word_axes.keys()) + list(cross_axes.keys()),
    }
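The per-slot sizes from _peek_domain_sizes also give a quick estimate of how many rows the enumeration has to score, which is the cost the vectorized path targets. The sketch below rests on the assumption stated in its docstring. cartesian_size is hypothetical and not part of either module.

```python
from math import prod
from typing import Mapping, Sequence


def cartesian_size(domain_sizes: Mapping[str, int], slots: Sequence[str]) -> int:
    """Rows to score before dedup: the product of the per-slot domain sizes.

    Assumes each open slot is enumerated over its full peeked domain with no
    further pruning; locked slots (e.g. V) contribute a factor of 1 and are
    simply left out of `slots`.
    """
    return prod(domain_sizes.get(s, 0) for s in slots)
```

For example, toy sizes of 3, 4, and 5 for nsubj, dobj, and advmod give 3 × 4 × 5 = 60 rows. Dropping advmod gives 12.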
  • [ ] Step 10.2: Smoke-test imports
cd packages/generation/research/2026-05-07-sentence-generation-paradigms && \
  uv run python -c "import paradigm_3_csp; print('OK')"

Expected: OK.

  • [ ] Step 10.3: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/paradigm_3_csp.py
git commit -m "$(cat <<'EOF'
PHON-104: add _peek_domain_sizes + _build_solve_stats helpers

Stats helpers for the upcoming solve() delegation. _peek_domain_sizes
computes pre-cartesian per-slot domain sizes via PMI dict ∩ filtered
spec; _build_solve_stats synthesizes the legacy stats dict shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 11: Migrate solve() to delegate to solve_shape

Files:
- Modify: <spike>/paradigm_3_csp.py
- Modify: <spike>/test_vectorized_enumeration.py

Replace solve()'s body with delegation. Public signature unchanged.
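The migration must keep solve()'s public signature unchanged. A cheap regression guard is to pin the parameter names and kinds with inspect.signature, as sketched below. EXPECTED_SOLVE_PARAMS is transcribed from the signature shown in Step 11.3, and param_spec is a hypothetical test helper.

```python
import inspect
from typing import Callable

# (name, kind) pairs transcribed from the solve() signature in Step 11.3.
EXPECTED_SOLVE_PARAMS = [
    ("verb", "POSITIONAL_OR_KEYWORD"),
    ("spec_id", "POSITIONAL_OR_KEYWORD"),
    ("spec_words", "POSITIONAL_OR_KEYWORD"),
    ("sel_df", "POSITIONAL_OR_KEYWORD"),
    ("constraints", "KEYWORD_ONLY"),
    ("word_df", "KEYWORD_ONLY"),
    ("band", "KEYWORD_ONLY"),
    ("top_k", "KEYWORD_ONLY"),
    ("include_adverb", "KEYWORD_ONLY"),
    ("weights", "KEYWORD_ONLY"),
]


def param_spec(fn: Callable[..., object]) -> list[tuple[str, str]]:
    """(name, kind) for each parameter of fn, in declaration order."""
    return [(p.name, p.kind.name) for p in inspect.signature(fn).parameters.values()]
```

In the test file, this would be used as assert param_spec(paradigm_3_csp.solve) == EXPECTED_SOLVE_PARAMS. Defaults such as BAND and TOP_K are deliberately left out, so the check does not depend on those constants' values.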

  • [ ] Step 11.1: Write failing test

Append to <spike>/test_vectorized_enumeration.py:

def test_solve_delegation_stats_match(store, sel_df):
    """Delegated solve() returns the legacy stats dict shape."""
    import paradigm_3_csp

    spec_words = paradigm_3_csp.spec_lexicon(store, "spec1")
    top, stats = paradigm_3_csp.solve(
        "cut", "spec1", spec_words, sel_df, word_df=store.df,
    )
    expected_keys = {
        "verb", "spec_id", "band",
        "nsubj_domain_size", "dobj_domain_size", "adv_domain_size",
        "candidate_count", "unique_pairs", "domain_trace", "active_axes",
    }
    assert set(stats.keys()) == expected_keys
    assert stats["verb"] == "cut"
    assert stats["spec_id"] == "spec1"
    assert stats["candidate_count"] == len(top)
    assert stats["unique_pairs"] >= 1
    assert stats["nsubj_domain_size"] > 0
    assert stats["dobj_domain_size"] > 0
    assert stats["adv_domain_size"] > 0


def test_solve_delegation_top_k_matches_solve_shape(store, sel_df):
    """solve() and solve_shape produce equivalent top-K candidates."""
    import paradigm_3_csp
    from skeleton_csp import SkeletonShape, parse_arg_structure, solve_shape

    spec_words = paradigm_3_csp.spec_lexicon(store, "spec1")
    top_solve, _ = paradigm_3_csp.solve(
        "cut", "spec1", spec_words, sel_df, word_df=store.df, top_k=5,
    )
    arg = "nsubj,V,dobj,advmod"
    shape = SkeletonShape(arg, parse_arg_structure(arg), 0)
    top_shape = solve_shape(
        shape, verb="cut", domain_words=spec_words, sel_df=sel_df,
        band="fineweb_adult", word_axes={}, cross_axes={},
        word_df=store.df, top_k=5,
    )
    assert [c["sentence"] for c in top_solve] == [c["sentence"] for c in top_shape]
  • [ ] Step 11.2: Run the new tests and record the baseline
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

test_solve_delegation_stats_match is expected to PASS already, because per the spec the legacy solve() returns every key in expected_keys. It serves as a regression guard: the migration's job is to preserve this contract.

test_solve_delegation_top_k_matches_solve_shape is expected to FAIL. The legacy solve() has scoring nuances that differ slightly from solve_shape's path. After the migration both go through solve_shape, so they should match.

Record which tests pass and which fail:
- One fails (the expected case) → proceed with Steps 11.3–11.4.
- Both pass → the legacy scoring already agrees with solve_shape. Still do Steps 11.3–11.4, because the migration is what routes solve() through the vectorized path.

  • [ ] Step 11.3: Migrate solve() body to delegate

Find the existing solve() in paradigm_3_csp.py (around line 113):

def solve(
    verb: str,
    spec_id: str,
    spec_words: frozenset[str],
    sel_df: pl.DataFrame,
    *,
    constraints: list[Constraint] | None = None,
    word_df: pl.DataFrame | None = None,
    band: str = BAND,
    top_k: int = TOP_K,
    include_adverb: bool = True,
    weights: dict[str, float] | None = None,
) -> tuple[list[dict], dict]:
    """Return (top_K candidates as dicts, stats dict).
    ...
    """
    constraints = list(constraints or [])
    # ... existing 70+ line body ...

Replace the entire body (everything after the docstring) with:

    constraints = list(constraints or [])
    arg = "nsubj,V,dobj,advmod" if include_adverb else "nsubj,V,dobj"
    shape = SkeletonShape(arg, parse_arg_structure(arg), band_freq=0)

    filtered_spec, trace = _resolve_domain_words(spec_words, constraints, word_df)
    word_axes = get_per_word_axes(constraints, word_df)
    cross_axes = cross_slot_axes(constraints)
    domain_sizes = _peek_domain_sizes(verb, band, filtered_spec, sel_df, include_adverb)

    candidates = solve_shape(
        shape,
        verb=verb,
        domain_words=filtered_spec,
        sel_df=sel_df,
        band=band,
        word_axes=word_axes,
        cross_axes=cross_axes,
        word_df=word_df,
        weights=weights,
        top_k=top_k,
    )

    stats = _build_solve_stats(
        verb=verb, spec_id=spec_id, band=band,
        candidates=candidates, trace=trace,
        word_axes=word_axes, cross_axes=cross_axes,
        domain_sizes=domain_sizes,
    )
    return candidates, stats

You'll need to add imports at the top of paradigm_3_csp.py if not already present:

from skeleton_csp import (
    SkeletonShape,
    parse_arg_structure,
    solve_shape,
    # ... other existing imports
)

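Check which of these paradigm_3_csp.py already imports, and add only the missing ones. The new body also calls _resolve_domain_words, get_per_word_axes, and cross_slot_axes, so confirm that each is defined in, or imported into, paradigm_3_csp.py.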

  • [ ] Step 11.4: Run all tests
cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py -v

Expected: 25 + 21 = 46 passed.

  • [ ] Step 11.5: Smoke-test paradigm_3_csp demos still work
cd packages/generation/research/2026-05-07-sentence-generation-paradigms && \
  uv run python -c "
import paradigm_3_csp
from phonolex_data.runtime.store import WordStore
from pathlib import Path
import polars as pl

repo = Path('../../../..').resolve()
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
spec_words = paradigm_3_csp.spec_lexicon(store, 'spec1')
top, stats = paradigm_3_csp.solve('cut', 'spec1', spec_words, sel_df, word_df=store.df)
print(f'top-1: {top[0][\"sentence\"]}')
print(f'stats nsubj_domain_size={stats[\"nsubj_domain_size\"]}, candidate_count={stats[\"candidate_count\"]}')
"

Expected: a sentence printed, stats fields present.

  • [ ] Step 11.6: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/paradigm_3_csp.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_vectorized_enumeration.py
git commit -m "$(cat <<'EOF'
PHON-104: migrate solve() to delegate to solve_shape

solve()'s body shrinks from 70+ lines of manual loops to a thin
wrapper that constructs a SkeletonShape, calls solve_shape, and
repackages the result into the legacy (top, stats) shape via the
new _build_solve_stats helper. Public signature unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 12: Bench script + record baseline

Files:
- Create: <spike>/bench_enumeration.py
- Modify: docs/superpowers/specs/2026-05-08-phon-104-csp-vectorize-enumeration-design.md

Bench the five nsubj,V,dobj,advmod acceptance probes (including the largest, melt × spec6) under both paths, and record the speedup.

  • [ ] Step 12.1: Create the bench script

Create <spike>/bench_enumeration.py:

"""Bench vectorized vs python enumeration on the PHON-95 acceptance probes (PHON-104).

Run from packages/generation: uv run python research/2026-05-07-sentence-generation-paradigms/bench_enumeration.py
"""
from __future__ import annotations

import sys
import time
from pathlib import Path

import polars as pl

sys.path.insert(0, str(Path(__file__).parent))

import paradigm_3_csp
from phonolex_data.runtime.store import WordStore
from skeleton_csp import (
    SkeletonShape,
    _force_python_path,
    parse_arg_structure,
    solve_shape,
)


def _load_data() -> tuple[WordStore, pl.DataFrame]:
    repo_root = Path(__file__).resolve().parents[4]
    store = WordStore.from_parquet(repo_root / "data" / "runtime" / "words.parquet")
    sel_df = pl.read_parquet(repo_root / "data" / "runtime" / "selectional.parquet")
    return store, sel_df


def _run_probe(verb: str, spec_id: str, store: WordStore, sel_df: pl.DataFrame, force_python: bool) -> tuple[float, int]:
    """Return (wall_clock_seconds, num_candidates)."""
    spec_words = paradigm_3_csp.spec_lexicon(store, spec_id)
    arg = "nsubj,V,dobj,advmod"
    shape = SkeletonShape(arg, parse_arg_structure(arg), 0)
    common = dict(
        verb=verb, domain_words=spec_words, sel_df=sel_df,
        band="fineweb_adult", word_axes={}, cross_axes={},
        word_df=store.df, top_k=8,
    )
    if force_python:
        with _force_python_path():
            t0 = time.perf_counter()
            top = solve_shape(shape, **common)
            elapsed = time.perf_counter() - t0
    else:
        t0 = time.perf_counter()
        top = solve_shape(shape, **common)
        elapsed = time.perf_counter() - t0
    return elapsed, len(top)


def main() -> None:
    print("Loading WordStore + selectional.parquet…")
    store, sel_df = _load_data()

    probes = [
        ("melt",  "spec6"),
        ("cut",   "spec1"),
        ("chase", "spec1"),
        ("eat",   "spec1"),
        ("fill",  "spec1"),
    ]

    print(f"\n{'Probe':<20}{'Vec (s)':>10}{'Py (s)':>10}{'Speedup':>10}")
    print("-" * 50)
    total_vec = 0.0
    total_py = 0.0
    for verb, spec_id in probes:
        # Warm both paths once to factor out import / compile overhead
        _run_probe(verb, spec_id, store, sel_df, force_python=False)
        _run_probe(verb, spec_id, store, sel_df, force_python=True)
        # Real timing
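        vec_t, vec_n = _run_probe(verb, spec_id, store, sel_df, force_python=False)
        py_t, py_n = _run_probe(verb, spec_id, store, sel_df, force_python=True)
        if vec_n != py_n:
            print(f"  warning: {verb} × {spec_id}: vec returned {vec_n}, py returned {py_n}")
        speedup = py_t / vec_t if vec_t > 0 else float("inf")
        # Pad the whole "verb × spec" label so rows align with the header.
        label = f"{verb} × {spec_id}"
        print(f"{label:<20}{vec_t:>10.3f}{py_t:>10.3f}{speedup:>10.2f}x")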
        total_vec += vec_t
        total_py += py_t
    overall = total_py / total_vec if total_vec > 0 else float("inf")
    print("-" * 50)
    print(f"{'TOTAL':<20}{total_vec:>10.3f}{total_py:>10.3f}{overall:>10.2f}x")


if __name__ == "__main__":
    main()
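A single timed run after one warm-up can be noisy, especially for the smaller probes. Taking the minimum over several repeats is a common way to reduce that noise. best_of in the sketch below is a hypothetical helper and is not part of the bench script.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], *, repeats: int = 5, warmup: int = 1) -> float:
    """Minimum wall-clock seconds over `repeats` calls, after `warmup` untimed calls."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()  # untimed: absorbs import, cache, and first-call overhead
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best
```

In bench_enumeration.py this could replace the single timed call, for example vec_t = best_of(lambda: solve_shape(shape, **common)). The forced-Python timing would be wrapped in with _force_python_path():. The standard library's timeit.repeat(fn, number=1, repeat=5) behaves similarly, but by default it disables garbage collection while timing, which can make allocation-heavy code such as the Python path look faster.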
  • [ ] Step 12.2: Run the bench
cd packages/generation && uv run python research/2026-05-07-sentence-generation-paradigms/bench_enumeration.py

Capture the output. Expected: the vectorized path is substantially faster on melt × spec6, the largest probe (a ~273K-row Cartesian product). The other probes have smaller Cartesian products, so their speedups may be smaller.

  • [ ] Step 12.3: Append baseline numbers to the spec

Open docs/superpowers/specs/2026-05-08-phon-104-csp-vectorize-enumeration-design.md and append at the END:

## Empirical baseline (recorded 2026-05-08)

From `bench_enumeration.py`:

| Probe | Vec (s) | Py (s) | Speedup |
|---|---|---|---|
| melt × spec6 | <FILL> | <FILL> | <FILL>x |
| cut × spec1 | <FILL> | <FILL> | <FILL>x |
| chase × spec1 | <FILL> | <FILL> | <FILL>x |
| eat × spec1 | <FILL> | <FILL> | <FILL>x |
| fill × spec1 | <FILL> | <FILL> | <FILL>x |
| **Total** | **<FILL>** | **<FILL>** | **<FILL>x** |

Replace each <FILL> with actual numbers from the bench output.

  • [ ] Step 12.4: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/bench_enumeration.py \
        docs/superpowers/specs/2026-05-08-phon-104-csp-vectorize-enumeration-design.md
git commit -m "$(cat <<'EOF'
PHON-104: bench enumeration + record baseline speedup

bench_enumeration.py compares vectorized vs forced-python paths on the
PHON-95 acceptance probe matrix. Numbers folded into the design spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Done

After Task 12 commits, PHON-104 closes. The CSP enumeration is vectorized; solve() delegates to solve_shape; ContrastiveConstraint requests still use the Python fallback (PHON-106 reworks contrastive scoring). PHON-105 (hybrid PPMI + raw frequency for verbal slots) is unblocked.