PHON-112 — Pair-driven CSP Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Replace the verb-fixed, skeleton-driven sentence resolver with a constraint-driven resolver in which the verb is a constrained slot and contrastive constraints drive resolution via a selectional self-join.

Architecture: Constraint-filtered lexicon → (optional) pair filler set → selectional self-join → skeleton host filter → render → reranker. The verb is just another slot. Scope is a single sentence; paragraphs follow in PHON-113.

Tech Stack: Polars eager-mode joins, pytest-driven TDD, frozen dataclasses for constraint API.

File map

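Files created (new):
- <spike>/verb_candidates.py: verb candidate set helpers (POS filter + selectional mass index)
- <spike>/test_verb_candidates.py: tests for the verb candidate set (Task 3)
- <spike>/pair_driven.py: pair-driven CSP core (selectional self-join, skeleton host filter, per-slot enumeration, constraint dispatch; Tasks 4 to 7)
- <spike>/test_pair_driven_solve.py: new test suite for the rewritten path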

Files modified:
- <spike>/skeleton_csp.py: solve_shape rewrite, _load_pairs_for_request column rename, retire linked-slot mode
- <spike>/paradigm_3_csp.py: solve() signature rewrite (drop verb positional, accept WordStore)
- <spike>/constraint_surface.py: add slots param to MinpairConstraint/MaxoppConstraint
- <spike>/test_contrastive_scorers.py: rewrite to assert against join-driven path
- <spike>/build_judging_set.py: drop hardcoded VERBS list

Where <spike> = /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/.

Run the tests with cd /Users/jneumann/Repos/PhonoLex/packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/ -v. Because the cd target is an absolute path, the command works from any directory.

Task 1: Add `slots` parameter to MinpairConstraint and MaxoppConstraint

Files:
- Modify: <spike>/constraint_surface.py
- Modify: <spike>/test_contrastive_scorers.py

The new slots parameter is optional and defaults to None (= "let the join decide role pair"). Explicit pairs like slots=("V", "dobj") filter the join to that role pair.
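
To make the two modes concrete, here is a minimal sketch of how a slots value might be interpreted when deciding whether a role pair survives. The helper name role_pair_admitted is hypothetical and not part of the plan; matching in either orientation is an assumption that mirrors the symmetric filter used by the join in Task 4.

```python
from typing import Optional, Tuple


def role_pair_admitted(
    slots: Optional[Tuple[str, str]],
    role_a: str,
    role_b: str,
) -> bool:
    """Hypothetical helper: does (role_a, role_b) survive the slots filter?"""
    if slots is None:
        # No restriction: the join decides, so every role pair is admitted.
        return True
    # Explicit pair: admit it in either orientation.
    return (role_a, role_b) in {slots, slots[::-1]}


print(role_pair_admitted(None, "nsubj", "iobj"))
print(role_pair_admitted(("nsubj", "dobj"), "dobj", "nsubj"))
print(role_pair_admitted(("nsubj", "dobj"), "nsubj", "iobj"))
```

With slots=None every role pair passes; with an explicit pair only that pair passes, regardless of which filler lands in which role.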

[ ] Step 1.1: Write failing test

Append to <spike>/test_contrastive_scorers.py (this becomes part of the rewrite in later tasks):

def test_minpair_constraint_accepts_slots_kwarg():
    from constraint_surface import MinpairConstraint
    c = MinpairConstraint(phoneme1="k", phoneme2="b", position="initial", slots=("V", "dobj"))
    assert c.slots == ("V", "dobj")

def test_minpair_constraint_default_slots_is_none():
    from constraint_surface import MinpairConstraint
    c = MinpairConstraint(phoneme1="k", phoneme2="b")
    assert c.slots is None

def test_maxopp_constraint_accepts_slots_kwarg():
    from constraint_surface import MaxoppConstraint
    c = MaxoppConstraint(phoneme1="k", phoneme2="m", position="initial", slots=("V", "nsubj"))
    assert c.slots == ("V", "nsubj")

[ ] Step 1.2: Run, verify fail

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py::test_minpair_constraint_accepts_slots_kwarg -v

Expected: TypeError on unexpected keyword arg slots.

[ ] Step 1.3: Add slots field

In <spike>/constraint_surface.py, locate MinpairConstraint and MaxoppConstraint. Add slots: tuple[str, str] | None = None to each (default None = let join decide):

@dataclass(frozen=True)
class MinpairConstraint:
    phoneme1: str
    phoneme2: str
    position: Literal["initial", "medial", "final", "any"] = "any"
    slots: tuple[str, str] | None = None
    type: Literal["contrastive_minpair"] = "contrastive_minpair"


@dataclass(frozen=True)
class MaxoppConstraint:
    phoneme1: str
    phoneme2: str
    position: Literal["initial", "medial", "final", "any"] = "any"
    min_sonorant_diff: float = 0.5
    slots: tuple[str, str] | None = None
    type: Literal["contrastive_maxopp"] = "contrastive_maxopp"
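
A frozen dataclass rejects attribute assignment and, with the default eq=True, is hashable, so constraint objects can be used as set members or dict keys. The sketch below illustrates this with a stand-in class; DemoConstraint is hypothetical and only mirrors the shape of the fields above.

```python
import dataclasses
from dataclasses import dataclass
from typing import Literal, Optional, Tuple


@dataclass(frozen=True)
class DemoConstraint:
    """Stand-in with the same field shape as MinpairConstraint (illustrative only)."""
    phoneme1: str
    phoneme2: str
    position: Literal["initial", "medial", "final", "any"] = "any"
    slots: Optional[Tuple[str, str]] = None


c = DemoConstraint("k", "b", slots=("nsubj", "dobj"))

# Frozen: attribute assignment raises FrozenInstanceError.
try:
    c.slots = None  # type: ignore[misc]
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False

# Hashable and value-equal, so duplicates collapse in a set.
unique = {c, DemoConstraint("k", "b", slots=("nsubj", "dobj"))}

# replace() builds a modified copy instead of mutating in place.
relaxed = dataclasses.replace(c, slots=None)
```

Callers that need a variant of a constraint, for example the same phoneme pair without a slots restriction, would build a copy with dataclasses.replace rather than mutate the original.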

[ ] Step 1.4: Run, verify pass

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py -k "constraint_accepts_slots or default_slots" -v

Expected: 3 passed.

[ ] Step 1.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/constraint_surface.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py
git commit -m "$(cat <<'EOF'
PHON-112: add slots kwarg to Minpair/MaxoppConstraint

Default slots=None means "let the join decide role pair". Explicit
slots=("V", "dobj") restricts the join to that pair. Default-None
preserves PHON-106 v1 behavior for callers that don't migrate.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 2: Rename `_load_pairs_for_request` columns to filler_a/filler_b

Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/test_contrastive_scorers.py

The helper currently emits (nsubj, dobj, feature_distance, sonorant_diff). The new join doesn't know which roles will be assigned yet — column names must be neutral.

[ ] Step 2.1: Update existing helper tests for new column names

In <spike>/test_contrastive_scorers.py, change the three Task 7 helper tests (test_load_pairs_for_minpair_basic, test_load_pairs_for_maxopp_filters_sonorant_diff, test_load_pairs_emits_both_orientations) to use filler_a/filler_b:

# In test_load_pairs_for_minpair_basic:
assert set(pairs_df.columns) >= {"filler_a", "filler_b", "feature_distance", "sonorant_diff"}
for a, b in zip(pairs_df["filler_a"].to_list(), pairs_df["filler_b"].to_list()):
    assert a in {"cat", "bat", "kid", "bid", "key", "bee"}
    assert b in {"cat", "bat", "kid", "bid", "key", "bee"}

# In test_load_pairs_emits_both_orientations:
a_set = set(pairs_df["filler_a"].to_list())
b_set = set(pairs_df["filler_b"].to_list())
assert a_set & b_set, "Expected overlap between filler_a and filler_b sets"

[ ] Step 2.2: Run, verify fail

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py::test_load_pairs_for_minpair_basic -v

Expected: AssertionError — columns are nsubj/dobj, not filler_a/filler_b.

[ ] Step 2.3: Update _load_pairs_for_request

In <spike>/skeleton_csp.py, locate _load_pairs_for_request. Replace the forward/backward projections:

    forward = base.select([
        pl.col("word1").alias("filler_a"),
        pl.col("word2").alias("filler_b"),
        pl.col("feature_distance"),
        pl.col("sonorant_diff"),
    ])
    backward = base.select([
        pl.col("word2").alias("filler_a"),
        pl.col("word1").alias("filler_b"),
        pl.col("feature_distance"),
        pl.col("sonorant_diff"),
    ])
    return pl.concat([forward, backward])

Also update the empty-frame fallback at the top of the function:

    if pairs_df is None:
        return pl.DataFrame({
            "filler_a": pl.Series(dtype=pl.Utf8),
            "filler_b": pl.Series(dtype=pl.Utf8),
            "feature_distance": pl.Series(dtype=pl.Float32),
            "sonorant_diff": pl.Series(dtype=pl.Float32),
        })
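
As a dependency-free illustration of what the forward/backward projections above produce, the sketch below models pair rows as tuples. emit_both_orientations is a hypothetical name and the sample values are made up; the real helper operates on a Polars frame.

```python
from typing import Dict, List, Tuple

# (word1, word2, feature_distance, sonorant_diff); values are illustrative only.
PairRow = Tuple[str, str, float, float]


def emit_both_orientations(base: List[PairRow]) -> List[Dict[str, object]]:
    """Return each pair twice, once per orientation, under role-neutral names."""
    forward = [
        {"filler_a": w1, "filler_b": w2, "feature_distance": fd, "sonorant_diff": sd}
        for w1, w2, fd, sd in base
    ]
    backward = [
        {"filler_a": w2, "filler_b": w1, "feature_distance": fd, "sonorant_diff": sd}
        for w1, w2, fd, sd in base
    ]
    return forward + backward


rows = emit_both_orientations([("cat", "bat", 0.5, 0.0)])
for row in rows:
    print(row["filler_a"], row["filler_b"])
```

Because every pair appears in both orientations, the filler_a and filler_b sets overlap, which is what test_load_pairs_emits_both_orientations checks.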

[ ] Step 2.4: Run, verify pass

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py -k "load_pairs" -v

Expected: 3 passed.

(Note: test_minpair_linked_slot_realization and friends still expect nsubj/dobj columns — those tests get rewritten in Task 9. They will fail in this intermediate state.)

[ ] Step 2.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py
git commit -m "$(cat <<'EOF'
PHON-112: rename _load_pairs_for_request output to filler_a/filler_b

Role assignment is determined by the selectional join, not by the
pair loader, so the column names should not pre-commit to nsubj/dobj.
3 helper tests updated; the realization/error tests still expect
nsubj/dobj and will be rewritten in Task 9.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 3: Add verb_candidates helper module

Files:
- Create: <spike>/verb_candidates.py
- Create: <spike>/test_verb_candidates.py

Computes the verb candidate set: lexicon ∩ POS=VERB ∩ has_selectional_mass.
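
The three-way intersection can be modelled without Polars. The sketch below is a plain-Python approximation for intuition only: verb_candidates_model, the tuple layout, and the toy rows (including the band name other_band) are assumptions, not the plan's API or data.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Tuple

SelRow = Tuple[str, str, str, str]  # (verb, role, filler, band)


def verb_candidates_model(
    spec_words: FrozenSet[str],
    verb_pos_words: FrozenSet[str],
    sel_rows: Iterable[SelRow],
    band: str,
    min_rows: int,
) -> FrozenSet[str]:
    """Plain-Python model of lexicon ∩ POS=VERB ∩ has_selectional_mass."""
    # Count selectional rows per verb within the target band.
    mass = Counter(verb for verb, _role, _filler, b in sel_rows if b == band)
    with_mass = {verb for verb, n in mass.items() if n >= min_rows}
    candidates = set(verb_pos_words) & with_mass
    # An empty spec_words means no lexicon restriction (as in Step 3.3).
    if spec_words:
        candidates &= spec_words
    return frozenset(candidates)


toy_sel = [
    ("cut", "dobj", "cord", "fineweb_adult"),
    ("cut", "nsubj", "kid", "fineweb_adult"),
    ("see", "dobj", "cat", "fineweb_adult"),
    ("see", "dobj", "cat", "other_band"),
]
result = verb_candidates_model(
    spec_words=frozenset(),
    verb_pos_words=frozenset({"cut", "see", "run"}),
    sel_rows=toy_sel,
    band="fineweb_adult",
    min_rows=2,
)
print(sorted(result))
```

In the toy data only "cut" reaches two rows in the target band, so "see" (one row) and "run" (no rows) are dropped even though all three are tagged as verbs.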

[ ] Step 3.1: Write failing test

Create <spike>/test_verb_candidates.py:

"""Tests for PHON-112 verb candidate set."""
from __future__ import annotations

import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).parent))

import polars as pl
from phonolex_data.runtime.store import WordStore

import verb_candidates


@pytest.fixture(scope="session")
def store():
    repo_root = Path(__file__).resolve().parents[4]
    return WordStore.from_parquet(repo_root / "data" / "runtime" / "words.parquet")


@pytest.fixture(scope="session")
def sel_df():
    repo_root = Path(__file__).resolve().parents[4]
    return pl.read_parquet(repo_root / "data" / "runtime" / "selectional.parquet")


def test_verb_candidates_returns_known_verbs(store, sel_df):
    """Common verbs like 'cut' and 'run' appear in the candidate set
    for a generic spec."""
    candidates = verb_candidates.compute_verb_candidates(
        spec_words=frozenset(),  # no spec restriction → all verbs
        word_df=store.df,
        sel_df=sel_df,
        band="fineweb_adult",
        min_selectional_rows=10,
    )
    assert "cut" in candidates
    assert "run" in candidates
    assert "see" in candidates


def test_verb_candidates_filters_by_spec(store, sel_df):
    """When spec_words is restrictive, only verbs in spec survive."""
    candidates = verb_candidates.compute_verb_candidates(
        spec_words=frozenset({"cut", "run", "make"}),  # restricted
        word_df=store.df,
        sel_df=sel_df,
        band="fineweb_adult",
        min_selectional_rows=10,
    )
    assert candidates <= frozenset({"cut", "run", "make"})
    assert "cut" in candidates


def test_verb_candidates_filters_by_selectional_mass(store, sel_df):
    """A narrower band with a higher row threshold admits fewer verbs."""
    candidates = verb_candidates.compute_verb_candidates(
        spec_words=frozenset(),
        word_df=store.df,
        sel_df=sel_df,
        band="phonbank_age_0_2",  # narrower band
        min_selectional_rows=1000,  # high bar
    )
    # The narrow band with a high bar should admit strictly fewer verbs than fineweb_adult
    n_fineweb = len(verb_candidates.compute_verb_candidates(
        spec_words=frozenset(),
        word_df=store.df,
        sel_df=sel_df,
        band="fineweb_adult",
        min_selectional_rows=10,
    ))
    n_narrow = len(candidates)
    assert n_narrow < n_fineweb

[ ] Step 3.2: Run, verify fail

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_verb_candidates.py -v

Expected: ModuleNotFoundError on import verb_candidates.

[ ] Step 3.3: Implement the helper

Create <spike>/verb_candidates.py:

"""PHON-112: verb candidate set computation.

A verb candidate is any word in the lexicon that:
1. Has POS=VERB (from words.parquet pos / all_pos columns)
2. Has at least min_selectional_rows entries in selectional.parquet for the band
3. (Optionally) is in spec_words

The candidate set is the verb-slot allow-set for `solve()`. Constraint
filtering (Exclude, Bound, etc.) is applied UPSTREAM via filtered_lexicon
before this helper runs, so the spec_words argument here represents the
already-constraint-filtered lexicon, not the raw spec.
"""
from __future__ import annotations

import polars as pl


def compute_verb_candidates(
    *,
    spec_words: frozenset[str],
    word_df: pl.DataFrame,
    sel_df: pl.DataFrame,
    band: str,
    min_selectional_rows: int = 10,
) -> frozenset[str]:
    """Return the set of verb candidates satisfying the lexicon, POS, and
    selectional-mass requirements.

    spec_words: lexicon to restrict to. Empty frozenset means no restriction.
    min_selectional_rows: minimum number of (role, filler) entries in the
        target band before a verb is admissible. Cuts off long-tail verbs.
    """
    # POS filter: keep words tagged VERB
    pos_filter = (
        (pl.col("pos") == "VERB")
        | (pl.col("all_pos").list.contains("VERB"))
    )
    verb_words = word_df.filter(pos_filter).select("word")
    pos_set = set(verb_words["word"].to_list())

    # Selectional-mass filter: count (role, filler) rows per verb in the band
    mass = (
        sel_df
        .filter(pl.col("band") == band)
        .group_by("verb")
        .agg(pl.len().alias("n_rows"))
        .filter(pl.col("n_rows") >= min_selectional_rows)
    )
    mass_set = set(mass["verb"].to_list())

    candidates = pos_set & mass_set
    if spec_words:
        candidates &= spec_words

    return frozenset(candidates)

[ ] Step 3.4: Run, verify pass

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_verb_candidates.py -v

Expected: 3 passed.

[ ] Step 3.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/verb_candidates.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_verb_candidates.py
git commit -m "$(cat <<'EOF'
PHON-112: add verb_candidates helper

Computes the verb candidate set for the constraint-driven solver:
lexicon ∩ POS=VERB ∩ has_selectional_mass(band, ≥ min_rows). Spec
words restriction is applied if non-empty (already constraint-filtered
by the caller).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 4: Implement contrastive selectional self-join

Files:
- Create: <spike>/pair_driven.py
- Create: <spike>/test_pair_driven_solve.py

Core algorithm: produce (verb, role_a, filler_a, role_b, filler_b, band, ppmi_a, ppmi_b, feature_distance, sonorant_diff) rows from the constraint-filtered pair frame and the selectional table.
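
Before the Polars version, it may help to see the join semantics in plain Python. The sketch below is a nested-loop model under stated assumptions: self_join_model is a hypothetical name, the rows are already restricted to one band, and the PPMI values are invented for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

SelRow = Tuple[str, str, str, float]  # (verb, role, filler, ppmi), single band


def self_join_model(
    sel_rows: Iterable[SelRow],
    pairs: Set[Tuple[str, str]],
    verb_candidates: Set[str],
) -> List[Dict[str, object]]:
    """Pair up selectional rows that share a verb, differ in role, and whose
    fillers form an admitted (filler_a, filler_b) pair."""
    pair_words = {w for pair in pairs for w in pair}
    # Pre-filter: candidate verbs only, fillers that occur in some pair.
    window = [r for r in sel_rows if r[0] in verb_candidates and r[2] in pair_words]

    out: List[Dict[str, object]] = []
    for verb_a, role_a, filler_a, ppmi_a in window:
        for verb_b, role_b, filler_b, ppmi_b in window:
            if verb_a != verb_b or role_a == role_b:
                continue  # same verb, distinct roles
            if (filler_a, filler_b) not in pairs:
                continue  # keep only admitted filler combinations
            out.append({
                "verb": verb_a,
                "role_a": role_a, "filler_a": filler_a, "ppmi_a": ppmi_a,
                "role_b": role_b, "filler_b": filler_b, "ppmi_b": ppmi_b,
            })
    return out


toy_sel = [
    ("cut", "nsubj", "cores", 1.2),
    ("cut", "dobj", "cord", 2.0),
    ("see", "dobj", "cord", 0.7),
    ("cut", "dobj", "rope", 3.0),
]
joined = self_join_model(
    toy_sel,
    pairs={("cores", "cord"), ("cord", "cores")},
    verb_candidates={"cut", "see"},
)
for row in joined:
    print(row["verb"], row["role_a"], row["filler_a"], row["role_b"], row["filler_b"])
```

The verb is never supplied by the caller: "cut" survives because it has both fillers in distinct roles, while "see" drops out because it only hosts one of them.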

[ ] Step 4.1: Write failing test

Create <spike>/test_pair_driven_solve.py:

"""Tests for PHON-112 pair-driven CSP."""
from __future__ import annotations

import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).parent))

import polars as pl
from phonolex_data.runtime.store import WordStore

import pair_driven


@pytest.fixture(scope="session")
def store():
    repo_root = Path(__file__).resolve().parents[4]
    return WordStore.from_parquet(repo_root / "data" / "runtime" / "words.parquet")


@pytest.fixture(scope="session")
def sel_df():
    repo_root = Path(__file__).resolve().parents[4]
    return pl.read_parquet(repo_root / "data" / "runtime" / "selectional.parquet")


def test_resolve_contrastive_join_returns_complete_specs(store, sel_df):
    """Every join row contains a verb, two distinct roles, two fillers,
    and a band — a complete sentence spec (sans skeleton)."""
    pair_frame = pl.DataFrame({
        "filler_a": ["cores", "kids"],     # /k.../z final and /k.../z final
        "filler_b": ["cord", "kid"],       # /k.../d final and /k.../d final
        "feature_distance": [0.8, 0.8],
        "sonorant_diff": [0.0, 0.0],
    })
    joined = pair_driven.resolve_contrastive_join(
        pair_frame=pair_frame,
        sel_df=sel_df,
        verb_candidates=frozenset({"cut", "see", "make", "find"}),
        band="fineweb_adult",
    )
    assert joined.height > 0
    expected_cols = {"verb", "role_a", "filler_a", "role_b", "filler_b",
                     "ppmi_a", "ppmi_b", "feature_distance", "sonorant_diff"}
    assert set(joined.columns) >= expected_cols
    # Roles must differ
    assert (joined["role_a"] != joined["role_b"]).all()
    # All verbs must be in the candidate set
    assert set(joined["verb"].to_list()) <= {"cut", "see", "make", "find"}


def test_resolve_contrastive_join_filters_by_band(store, sel_df):
    """Different bands produce different join sizes."""
    pair_frame = pl.DataFrame({
        "filler_a": ["cores"],
        "filler_b": ["cord"],
        "feature_distance": [0.8],
        "sonorant_diff": [0.0],
    })
    fineweb = pair_driven.resolve_contrastive_join(
        pair_frame=pair_frame,
        sel_df=sel_df,
        verb_candidates=frozenset({"cut", "see", "make"}),
        band="fineweb_adult",
    )
    childes = pair_driven.resolve_contrastive_join(
        pair_frame=pair_frame,
        sel_df=sel_df,
        verb_candidates=frozenset({"cut", "see", "make"}),
        band="childes_age_2_5",
    )
    # Different bands → different row counts (one might be 0)
    assert fineweb.height != childes.height or fineweb.height == 0


def test_resolve_contrastive_join_empty_pair_frame_returns_empty(sel_df):
    """An empty pair frame produces an empty join."""
    pair_frame = pl.DataFrame({
        "filler_a": [],
        "filler_b": [],
        "feature_distance": [],
        "sonorant_diff": [],
    }, schema={"filler_a": pl.Utf8, "filler_b": pl.Utf8,
               "feature_distance": pl.Float32, "sonorant_diff": pl.Float32})
    joined = pair_driven.resolve_contrastive_join(
        pair_frame=pair_frame,
        sel_df=sel_df,
        verb_candidates=frozenset({"cut"}),
        band="fineweb_adult",
    )
    assert joined.height == 0


def test_resolve_contrastive_join_filters_by_slots_kwarg(store, sel_df):
    """Placeholder for the slots-aware join.

    A slots pair that names the verb, such as ("V", "dobj"), cannot be matched
    against selectional rows, because the verb is never a role there; that
    case requires a different code path tested in Task 5/8."""
    # For now, only check that slots=None admits role pairs drawn from the
    # standard role inventory.
    pair_frame = pl.DataFrame({
        "filler_a": ["cores"],
        "filler_b": ["cord"],
        "feature_distance": [0.8],
        "sonorant_diff": [0.0],
    })
    joined = pair_driven.resolve_contrastive_join(
        pair_frame=pair_frame,
        sel_df=sel_df,
        verb_candidates=frozenset({"cut"}),
        band="fineweb_adult",
        slots=None,
    )
    # Both roles should be in the standard role inventory
    if joined.height > 0:
        roles = set(joined["role_a"].to_list()) | set(joined["role_b"].to_list())
        for r in roles:
            assert r in {"nsubj", "dobj", "iobj", "xcomp", "ccomp", "advmod"} or r.startswith("pobj_")

[ ] Step 4.2: Run, verify fail

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v

Expected: ModuleNotFoundError on import pair_driven.

[ ] Step 4.3: Implement resolve_contrastive_join

Create <spike>/pair_driven.py:

"""PHON-112: pair-driven CSP core.

Selectional self-join: given a constraint-filtered pair frame and a
verb candidate set, produce (verb, role_a, filler_a, role_b, filler_b,
band, ppmi_a, ppmi_b, feature_distance, sonorant_diff) rows by joining
selectional × selectional × pair_frame.
"""
from __future__ import annotations

import polars as pl


def resolve_contrastive_join(
    *,
    pair_frame: pl.DataFrame,
    sel_df: pl.DataFrame,
    verb_candidates: frozenset[str],
    band: str,
    slots: tuple[str, str] | None = None,
) -> pl.DataFrame:
    """Self-join sel × sel filtered to pair_frame's filler combinations.

    Returns rows with columns:
        verb, role_a, filler_a, role_b, filler_b, band,
        ppmi_a, ppmi_b, feature_distance, sonorant_diff

    Each row is a complete sentence spec sans skeleton: the verb, the
    two role-positioned fillers, and the corresponding PPMI values.

    slots: if given, restrict role_a/role_b to the named pair (e.g.,
        ("nsubj", "dobj")). When None, all role pairs are admitted.
    """
    if pair_frame.height == 0:
        return pl.DataFrame(schema={
            "verb": pl.Utf8,
            "role_a": pl.Utf8,
            "filler_a": pl.Utf8,
            "role_b": pl.Utf8,
            "filler_b": pl.Utf8,
            "band": pl.Utf8,
            "ppmi_a": pl.Float32,
            "ppmi_b": pl.Float32,
            "feature_distance": pl.Float32,
            "sonorant_diff": pl.Float32,
        })

    # Pre-filter selectional: only rows for this band, only fillers in pair_frame,
    # only verbs in candidate set
    pair_words = (
        pl.concat([pair_frame["filler_a"], pair_frame["filler_b"]]).unique()
    )
    sel_window = (
        sel_df
        .filter(pl.col("band") == band)
        .filter(pl.col("filler").is_in(pair_words.to_list()))
        .filter(pl.col("verb").is_in(list(verb_candidates)))
    )

    # Self-join on verb (role differs)
    side_a = sel_window.rename({
        "role": "role_a", "filler": "filler_a", "ppmi": "ppmi_a",
    }).select(["verb", "role_a", "filler_a", "ppmi_a"])
    side_b = sel_window.rename({
        "role": "role_b", "filler": "filler_b", "ppmi": "ppmi_b",
    }).select(["verb", "role_b", "filler_b", "ppmi_b"])
    cross = side_a.join(side_b, on="verb").filter(pl.col("role_a") != pl.col("role_b"))

    # Join with pair_frame to keep only valid (filler_a, filler_b) combinations
    joined = cross.join(pair_frame, on=["filler_a", "filler_b"], how="inner")

    # Slots restriction (if given)
    if slots is not None:
        slot_a, slot_b = slots
        joined = joined.filter(
            ((pl.col("role_a") == slot_a) & (pl.col("role_b") == slot_b))
            | ((pl.col("role_a") == slot_b) & (pl.col("role_b") == slot_a))
        )

    # Add band column for downstream consumers
    joined = joined.with_columns(pl.lit(band).alias("band"))

    return joined.select([
        "verb", "role_a", "filler_a", "role_b", "filler_b", "band",
        "ppmi_a", "ppmi_b", "feature_distance", "sonorant_diff",
    ])

[ ] Step 4.4: Run, verify pass

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v

Expected: 4 passed.

[ ] Step 4.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py
git commit -m "$(cat <<'EOF'
PHON-112: contrastive selectional self-join

resolve_contrastive_join produces (verb, role_a, w1, role_b, w2, band,
ppmis, feature_distance, sonorant_diff) rows by joining selectional ×
selectional × pair_frame. The verb falls out of the join, role pair
falls out of the join, no caller iteration. slots kwarg restricts to
a specific role pair when needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 5: Skeleton host filter

Files:
- Modify: <spike>/pair_driven.py
- Modify: <spike>/test_pair_driven_solve.py

Given joined rows + skeletons.parquet, produce candidate skeletons that can host the produced role pair.
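
The filter amounts to a band match, a role-coverage check, and a frequency sort. The plain-Python sketch below models that logic, assuming arg_structure is a comma-joined string as in Step 5.3; host_skeletons_model, the Skeleton tuple, and the toy frequencies are illustrative and not taken from skeletons.parquet.

```python
from typing import List, NamedTuple, Sequence


class Skeleton(NamedTuple):
    band: str
    arg_structure: str  # comma-joined role names, e.g. "nsubj,dobj"
    freq: int


def host_skeletons_model(
    skeletons: Sequence[Skeleton],
    role_a: str,
    role_b: str,
    band: str,
    top_k: int = 5,
) -> List[Skeleton]:
    """Keep skeletons in the band whose roles cover both role_a and role_b."""
    hosts = [
        s for s in skeletons
        if s.band == band and {role_a, role_b} <= set(s.arg_structure.split(","))
    ]
    # freq descending, then arg_structure ascending as a deterministic tie-break.
    hosts.sort(key=lambda s: (-s.freq, s.arg_structure))
    return hosts[:top_k]


toy = [
    Skeleton("fineweb_adult", "nsubj,dobj", 100),
    Skeleton("fineweb_adult", "nsubj", 500),
    Skeleton("fineweb_adult", "nsubj,dobj,iobj", 40),
    Skeleton("other_band", "nsubj,dobj", 900),
]
hosts = host_skeletons_model(toy, "nsubj", "dobj", band="fineweb_adult")
for h in hosts:
    print(h.arg_structure, h.freq)
```

The intransitive skeleton is dropped despite its higher frequency because it cannot host a dobj filler, and the skeleton from the other band never enters the ranking.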

[ ] Step 5.1: Write failing test

Append to <spike>/test_pair_driven_solve.py:

@pytest.fixture(scope="session")
def skeletons_df():
    skeletons_path = (
        Path(__file__).parent / "outputs" / "skeletons.parquet"
    )
    if not skeletons_path.exists():
        pytest.skip("skeletons.parquet not present")
    return pl.read_parquet(skeletons_path)


def test_select_host_skeletons_filters_by_role_coverage(skeletons_df):
    """Returned skeletons must contain both role_a and role_b."""
    hosts = pair_driven.select_host_skeletons(
        skeletons_df=skeletons_df,
        role_a="nsubj",
        role_b="dobj",
        band="fineweb_adult",
        top_k=5,
    )
    assert hosts.height > 0
    for arg_struct in hosts["arg_structure"].to_list():
        slots = arg_struct.split(",")
        assert "nsubj" in slots
        assert "dobj" in slots


def test_select_host_skeletons_ranks_by_freq(skeletons_df):
    """Top-K ordering is by freq desc."""
    hosts = pair_driven.select_host_skeletons(
        skeletons_df=skeletons_df,
        role_a="nsubj",
        role_b="dobj",
        band="fineweb_adult",
        top_k=5,
    )
    freqs = hosts["freq"].to_list()
    assert freqs == sorted(freqs, reverse=True)


def test_select_host_skeletons_filters_by_band(skeletons_df):
    """A skeleton from a different band is not returned."""
    hosts = pair_driven.select_host_skeletons(
        skeletons_df=skeletons_df,
        role_a="nsubj",
        role_b="dobj",
        band="fineweb_adult",
        top_k=10,
    )
    for b in hosts["band"].to_list():
        assert b == "fineweb_adult"

[ ] Step 5.2: Run, verify fail

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py::test_select_host_skeletons_filters_by_role_coverage -v

Expected: AttributeError on pair_driven.select_host_skeletons.

[ ] Step 5.3: Implement select_host_skeletons

Append to <spike>/pair_driven.py:

def select_host_skeletons(
    *,
    skeletons_df: pl.DataFrame,
    role_a: str,
    role_b: str,
    band: str,
    top_k: int = 5,
) -> pl.DataFrame:
    """Return top-K skeletons whose arg_structure covers {role_a, role_b}
    in the given band, ranked by freq desc.

    skeletons_df schema (from skeletons.parquet):
        band, arg_structure, pos_template, freq, verb_lemma_count, example
    """
    # arg_structure is a comma-joined string; split and check coverage
    return (
        skeletons_df
        .filter(pl.col("band") == band)
        .filter(pl.col("arg_structure").str.split(",").list.contains(role_a))
        .filter(pl.col("arg_structure").str.split(",").list.contains(role_b))
        .sort(["freq", "arg_structure"], descending=[True, False])
        .head(top_k)
    )

[ ] Step 5.4: Run, verify pass

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v

Expected: 7 passed (4 from Task 4 + 3 new).

[ ] Step 5.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py
git commit -m "$(cat <<'EOF'
PHON-112: skeleton host filter

select_host_skeletons returns top-K skeletons from skeletons.parquet
whose arg_structure covers the produced role pair, restricted to the
target band and ranked by freq desc. Verb compatibility is implicit
from the upstream selectional join.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 6: Per-slot non-contrastive enumeration

Files:
- Modify: <spike>/pair_driven.py
- Modify: <spike>/test_pair_driven_solve.py

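For requests without contrastive constraints, enumerate per-slot fillers across all verb candidates. Unlike the wide rows from resolve_contrastive_join (one row per verb and role pair), the output is long format, one (verb, role, filler, band, ppmi) row per admissible filler; downstream consumers cross-join across roles to assemble candidates.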

[ ] Step 6.1: Write failing test

Append to <spike>/test_pair_driven_solve.py:

def test_resolve_non_contrastive_returns_complete_specs(store, sel_df):
    """Non-contrastive enumeration produces (verb, role, filler, ppmi) rows
    across the verb candidate set."""
    rows = pair_driven.resolve_non_contrastive(
        sel_df=sel_df,
        verb_candidates=frozenset({"cut", "see", "make"}),
        per_slot_allow_set={
            "nsubj": frozenset({"cat", "dog", "kid"}),
            "dobj": frozenset({"bat", "ball", "bid"}),
        },
        band="fineweb_adult",
    )
    assert rows.height > 0
    # All verbs in candidate set
    assert set(rows["verb"].to_list()) <= {"cut", "see", "make"}
    # Per-slot allow sets respected
    for role, allow_set in [("nsubj", {"cat", "dog", "kid"}),
                             ("dobj", {"bat", "ball", "bid"})]:
        role_rows = rows.filter(pl.col("role") == role)
        if role_rows.height > 0:
            assert set(role_rows["filler"].to_list()) <= allow_set


def test_resolve_non_contrastive_filters_zero_ppmi(store, sel_df):
    """Rows with ppmi == 0 are excluded."""
    rows = pair_driven.resolve_non_contrastive(
        sel_df=sel_df,
        verb_candidates=frozenset({"cut"}),
        per_slot_allow_set={
            "nsubj": frozenset({"cat"}),
            "dobj": frozenset({"bat"}),
        },
        band="fineweb_adult",
    )
    if rows.height > 0:
        assert (rows["ppmi"] > 0).all()

[ ] Step 6.2: Run, verify fail

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py::test_resolve_non_contrastive_returns_complete_specs -v

Expected: AttributeError on pair_driven.resolve_non_contrastive.

[ ] Step 6.3: Implement resolve_non_contrastive

Append to <spike>/pair_driven.py:

def resolve_non_contrastive(
    *,
    sel_df: pl.DataFrame,
    verb_candidates: frozenset[str],
    per_slot_allow_set: dict[str, frozenset[str]],
    band: str,
) -> pl.DataFrame:
    """Enumerate (verb, role, filler, ppmi) rows where:
    - verb ∈ verb_candidates
    - filler ∈ per_slot_allow_set[role] for the row's role
    - ppmi > 0
    - the row's band equals the band argument

    Output shape: long format (verb, role, filler, band, ppmi).
    Downstream consumers cross-join across roles to assemble candidates.
    """
    # Pre-filter sel by band + verbs
    sel_window = (
        sel_df
        .filter(pl.col("band") == band)
        .filter(pl.col("verb").is_in(list(verb_candidates)))
        .filter(pl.col("ppmi") > 0.0)
    )
    # Per-role filler allow-set filter
    role_filters = []
    for role, allow in per_slot_allow_set.items():
        role_filters.append(
            (pl.col("role") == role) & pl.col("filler").is_in(list(allow))
        )
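    if not role_filters:
        # No per-slot allow sets: return an empty frame with the output schema
        return sel_window.head(0).select(["verb", "role", "filler", "band", "ppmi"])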
    combined = role_filters[0]
    for f in role_filters[1:]:
        combined = combined | f
    return sel_window.filter(combined).select(["verb", "role", "filler", "band", "ppmi"])
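
The docstring above leaves the cross-role assembly to downstream consumers. One way that step could look is sketched below in plain Python; assemble_candidates is a hypothetical name, the output dict layout is an assumption, and the plan does not specify how the assembled candidates are scored.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Tuple

LongRow = Tuple[str, str, str, float]  # (verb, role, filler, ppmi)


def assemble_candidates(
    rows: Iterable[LongRow],
    roles: Tuple[str, ...],
) -> List[Dict[str, object]]:
    """Cross-join long-format rows across roles, one candidate per combination."""
    # Group (filler, ppmi) options by verb and then by role.
    by_verb: Dict[str, Dict[str, List[Tuple[str, float]]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for verb, role, filler, ppmi in rows:
        by_verb[verb][role].append((filler, ppmi))

    candidates: List[Dict[str, object]] = []
    for verb, by_role in by_verb.items():
        # A verb with no filler for some requested role cannot fill the frame.
        if any(role not in by_role for role in roles):
            continue
        for combo in product(*(by_role[role] for role in roles)):
            candidates.append({
                "verb": verb,
                "fillers": {role: filler for role, (filler, _) in zip(roles, combo)},
                "ppmi": {role: ppmi for role, (_, ppmi) in zip(roles, combo)},
            })
    return candidates


toy_rows = [
    ("cut", "nsubj", "kid", 1.0),
    ("cut", "dobj", "ball", 2.0),
    ("cut", "dobj", "bat", 0.5),
    ("see", "nsubj", "cat", 1.5),
]
candidates = assemble_candidates(toy_rows, roles=("nsubj", "dobj"))
for cand in candidates:
    print(cand["verb"], cand["fillers"])
```

The number of candidates per verb is the product of its per-role option counts, so a real assembler would likely need a cap or a ranking step before rendering.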

[ ] Step 6.4: Run, verify pass

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v

Expected: 9 passed.

[ ] Step 6.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py
git commit -m "$(cat <<'EOF'
PHON-112: non-contrastive per-slot enumeration

resolve_non_contrastive emits (verb, role, filler, band, ppmi) rows
respecting per-slot allow sets and the verb candidate set, with
ppmi > 0 filter. Output is long format; the downstream assembler
cross-joins across roles to build candidates.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 7: Constraint dispatch — per-slot allow sets

Files:
- Modify: <spike>/pair_driven.py
- Modify: <spike>/test_pair_driven_solve.py

Resolve constraints to per-slot allow sets. ExcludeConstraint and BoundConstraint produce hard filters; IncludeConstraint and BoundBoostConstraint produce per-word axes (deferred to Task 9 where they wire into scoring).
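
The hard-filter half of the dispatch can be modelled in plain Python. In the sketch below, ToyExclude and ToyBound are stand-ins for the real constraint classes (their fields follow the tests in Step 7.1), the lexicon dict and AoA values are invented, and two behaviours are assumptions rather than confirmed design: words with no value for a norm are dropped, and every slot receives the same allow set.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class ToyExclude:
    phonemes: Tuple[str, ...]


@dataclass(frozen=True)
class ToyBound:
    norm: str
    max_value: float


def allow_sets_model(
    lexicon: Dict[str, dict],
    constraints: List[object],
    slot_types: Tuple[str, ...],
) -> Dict[str, FrozenSet[str]]:
    """lexicon maps word -> {"phonemes": [...], "<norm>": value}."""
    allowed = set(lexicon)
    for c in constraints:
        if isinstance(c, ToyExclude):
            banned = set(c.phonemes)
            # Drop any word containing a banned phoneme.
            allowed = {w for w in allowed if not banned & set(lexicon[w]["phonemes"])}
        elif isinstance(c, ToyBound):
            # Assumption: a word with no value for the norm is dropped.
            allowed = {
                w for w in allowed
                if lexicon[w].get(c.norm) is not None
                and lexicon[w][c.norm] <= c.max_value
            }
        # Soft constraints are ignored here; they feed scoring instead.
    # Assumption: hard filters are slot-agnostic, so each slot gets the same set.
    return {slot: frozenset(allowed) for slot in slot_types}


toy_lexicon = {
    "cat": {"phonemes": ["k", "æ", "t"], "aoa": 3.0},
    "rat": {"phonemes": ["ɹ", "æ", "t"], "aoa": 4.0},
    "convoluted": {"phonemes": ["k", "ɑ", "n", "v", "ə", "l", "u", "t", "ɪ", "d"], "aoa": 12.0},
}
allow = allow_sets_model(
    toy_lexicon,
    [ToyExclude(phonemes=("ɹ",)), ToyBound(norm="aoa", max_value=6.0)],
    slot_types=("nsubj", "dobj", "V"),
)
print({slot: sorted(words) for slot, words in allow.items()})
```

Each hard constraint narrows the same running set, so the order in which constraints are applied does not change the result.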

[ ] Step 7.1: Write failing test

Append to <spike>/test_pair_driven_solve.py:

def test_constraint_dispatch_exclude_filters_lexicon(store):
    """ExcludeConstraint(/ɹ/) removes /ɹ/-containing words from all slots."""
    from constraint_surface import ExcludeConstraint

    spec_words = frozenset({"cat", "run", "rat", "bat", "kid"})
    allow_sets = pair_driven.resolve_per_slot_allow_sets(
        spec_words=spec_words,
        word_df=store.df,
        constraints=[ExcludeConstraint(phonemes=("ɹ",))],
        slot_types=("nsubj", "dobj", "V"),
    )
    # /ɹ/-containing words excluded from all slots
    for slot in ("nsubj", "dobj", "V"):
        assert "run" not in allow_sets[slot]
        assert "rat" not in allow_sets[slot]
        assert "cat" in allow_sets[slot]


def test_constraint_dispatch_bound_filters_lexicon(store):
    """BoundConstraint(aoa, max=6) excludes high-AoA words."""
    from constraint_surface import BoundConstraint

    spec_words = frozenset({"cat", "convoluted", "kid", "abstruse"})
    allow_sets = pair_driven.resolve_per_slot_allow_sets(
        spec_words=spec_words,
        word_df=store.df,
        constraints=[BoundConstraint(norm="aoa", max_value=6.0)],
        slot_types=("nsubj", "dobj"),
    )
    # cat (low AoA) should be in; convoluted (high AoA) shouldn't
    assert "cat" in allow_sets["nsubj"]
    # The exact AoA values are data-driven, so only assert the easy cases

[ ] Step 7.2: Run, verify fail

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py::test_constraint_dispatch_exclude_filters_lexicon -v

Expected: AttributeError on pair_driven.resolve_per_slot_allow_sets.

[ ] Step 7.3: Implement resolve_per_slot_allow_sets

Append to <spike>/pair_driven.py:

def resolve_per_slot_allow_sets(
    *,
    spec_words: frozenset[str],
    word_df: pl.DataFrame,
    constraints: list,
    slot_types: tuple[str, ...],
) -> dict[str, frozenset[str]]:
    """Apply hard constraints to spec_words, returning per-slot allow sets.

    Soft constraints (IncludeConstraint, BoundBoostConstraint) are not
    handled here — they produce per-word axes consumed at scoring time.
    """
    from constraint_surface import (
        ExcludeConstraint, BoundConstraint,
    )

    base_filtered = set(spec_words) if spec_words else set(word_df["word"].to_list())

    # Apply hard constraints
    for c in constraints:
        if isinstance(c, ExcludeConstraint):
            # Exclude words containing any banned phoneme
            banned = set(c.phonemes)
            filtered = (
                word_df
                .filter(pl.col("word").is_in(list(base_filtered)))
                .filter(
                    ~pl.col("phonemes").list.eval(
                        pl.element().is_in(list(banned))
                    ).list.any()
                )
            )
            base_filtered &= set(filtered["word"].to_list())
        elif isinstance(c, BoundConstraint):
            # Range filter on a norm column
            col = pl.col(c.norm)
            cond = pl.lit(True)
            if c.min_value is not None:
                cond = cond & (col >= c.min_value)
            if c.max_value is not None:
                cond = cond & (col <= c.max_value)
            cond = cond & col.is_not_null()
            filtered = (
                word_df
                .filter(pl.col("word").is_in(list(base_filtered)))
                .filter(cond)
            )
            base_filtered &= set(filtered["word"].to_list())

    base_set = frozenset(base_filtered)
    return {slot: base_set for slot in slot_types}
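
To make the filter semantics concrete, here is a plain-Python sketch of the same two hard filters on a toy lexicon, without Polars. The phoneme transcriptions and AoA values are invented for illustration, and allow_set is a hypothetical helper, not part of pair_driven.py.

```python
# Toy lexicon: word -> (phonemes, aoa). Values are illustrative, not real norms.
LEXICON = {
    "cat": (("k", "æ", "t"), 3.0),
    "run": (("ɹ", "ʌ", "n"), 4.0),
    "kid": (("k", "ɪ", "d"), 4.5),
    "abstruse": (("æ", "b", "s", "t", "ɹ", "u", "s"), 14.0),
    "zib": (("z", "ɪ", "b"), None),  # made-up word with no AoA norm
}


def allow_set(words, banned=(), max_aoa=None):
    """Keep words with no banned phoneme and, if bounded, a known AoA <= max_aoa."""
    banned = set(banned)
    kept = set()
    for word in words:
        phonemes, aoa = LEXICON[word]
        if banned & set(phonemes):
            continue  # exclude: contains a banned phoneme
        if max_aoa is not None and (aoa is None or aoa > max_aoa):
            continue  # bound: missing norm or above the ceiling
        kept.add(word)
    return frozenset(kept)


base = allow_set(LEXICON, banned=("ɹ",), max_aoa=6.0)
# Every slot type, including V, receives the same allow set.
per_slot = {slot: base for slot in ("V", "nsubj", "dobj")}
print(sorted(base))
```

As in the Polars version, a word whose norm value is missing is dropped once a bound applies, and the verb slot shares the filtered set with the argument slots.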

[ ] Step 7.4: Run, verify pass

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v

Expected: 11 passed.

[ ] Step 7.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py
git commit -m "$(cat <<'EOF'
PHON-112: per-slot allow set resolution from hard constraints

resolve_per_slot_allow_sets applies ExcludeConstraint and
BoundConstraint to spec_words, returning a per-slot-type allow set.
Same allow set goes to all slot types (incl. V), so verb gets the
same phonological/norm filtering as nsubj/dobj.

Soft constraints (Include, BoundBoost) are deferred to scoring.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 8: New `solve()` orchestrator

Files: - Modify: <spike>/pair_driven.py - Modify: <spike>/test_pair_driven_solve.py

The top-level solve() ties resolve_per_slot_allow_sets → verb_candidates → (contrastive vs non-contrastive branch) → skeleton host filter together.

This task adds the orchestrator. The render + score wiring stays out for now; tests assert on the structured intermediate result.

[ ] Step 8.1: Write failing test

Append to <spike>/test_pair_driven_solve.py:

def test_solve_minpair_end_to_end(store, sel_df, skeletons_df):
    """End-to-end solve with a minpair constraint produces structured
    candidates whose (filler_a, filler_b) is a real minimal pair."""
    from constraint_surface import MinpairConstraint
    import paradigm_3_csp

    spec_words = paradigm_3_csp.spec_lexicon(store, "spec1")
    candidates = pair_driven.solve(
        spec_words=spec_words,
        word_df=store.df,
        sel_df=sel_df,
        pairs_df=store.pairs_df,
        skeletons_df=skeletons_df,
        band="fineweb_adult",
        constraints=[MinpairConstraint(phoneme1="d", phoneme2="z", position="final")],
        top_k=8,
    )
    assert candidates, "expected candidates"
    # Each candidate should have the structured intermediate fields
    for c in candidates:
        assert "verb" in c
        assert "role_a" in c
        assert "filler_a" in c
        assert "role_b" in c
        assert "filler_b" in c
        assert "skeleton" in c
        assert "ppmi_total" in c


def test_solve_exclude_filters_verb(store, sel_df, skeletons_df):
    """ExcludeConstraint(/ɹ/) excludes /ɹ/-containing words including verbs."""
    from constraint_surface import ExcludeConstraint, MinpairConstraint
    import paradigm_3_csp

    spec_words = paradigm_3_csp.spec_lexicon(store, "spec1")
    candidates = pair_driven.solve(
        spec_words=spec_words,
        word_df=store.df,
        sel_df=sel_df,
        pairs_df=store.pairs_df,
        skeletons_df=skeletons_df,
        band="fineweb_adult",
        constraints=[
            ExcludeConstraint(phonemes=("ɹ",)),
            MinpairConstraint(phoneme1="d", phoneme2="z", position="final"),
        ],
        top_k=8,
    )
    if not candidates:
        pytest.skip("no surviving candidates after exclude — adjust phoneme")
    # No verb in any candidate should contain /ɹ/
    verbs = {c["verb"] for c in candidates}
    word_phonemes = {
        row["word"]: list(row["phonemes"])
        for row in store.df.filter(pl.col("word").is_in(list(verbs))).iter_rows(named=True)
    }
    for v in verbs:
        assert "ɹ" not in word_phonemes.get(v, []), (
            f"verb {v} contains /ɹ/, exclude constraint violated"
        )

[ ] Step 8.2: Run, verify fail

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py::test_solve_minpair_end_to_end -v

Expected: AttributeError on pair_driven.solve.

[ ] Step 8.3: Implement solve orchestrator

Append to <spike>/pair_driven.py:

def solve(
    *,
    spec_words: frozenset[str],
    word_df: pl.DataFrame,
    sel_df: pl.DataFrame,
    pairs_df: pl.DataFrame | None,
    skeletons_df: pl.DataFrame,
    band: str,
    constraints: list | tuple = (),
    locked_slots: dict[str, str] | None = None,
    top_k: int = 8,
) -> list[dict]:
    """Constraint-driven sentence resolver. See PHON-112 design doc."""
    from constraint_surface import (
        MinpairConstraint, MaxoppConstraint, MultoppConstraint,
    )

    locked_slots = dict(locked_slots or {})

    # Identify contrastive constraint (at most one)
    contrast_constraints = [
        c for c in constraints
        if isinstance(c, (MinpairConstraint, MaxoppConstraint))
    ]
    if len(contrast_constraints) > 1:
        raise ValueError("at most one contrastive constraint per request")
    multopp = [c for c in constraints if isinstance(c, MultoppConstraint)]
    if multopp:
        raise ValueError(
            "MultoppConstraint requires multi-sentence paragraph composition; "
            "deferred to PHON-113"
        )

    # 1. Per-slot allow sets (constraint-filtered lexicon)
    SLOT_TYPES = ("V", "nsubj", "dobj", "iobj")  # extend if pobj_X used
    allow_sets = resolve_per_slot_allow_sets(
        spec_words=spec_words,
        word_df=word_df,
        constraints=constraints,
        slot_types=SLOT_TYPES,
    )

    # 2. Verb candidates
    from verb_candidates import compute_verb_candidates
    verb_set = compute_verb_candidates(
        spec_words=allow_sets["V"],
        word_df=word_df,
        sel_df=sel_df,
        band=band,
    )
    if "V" in locked_slots:
        verb_set = frozenset({locked_slots["V"]}) & verb_set
        if not verb_set:
            return []

    # 3. Contrastive vs non-contrastive branch
    if contrast_constraints:
        cc = contrast_constraints[0]
        if pairs_df is None:
            raise ValueError("contrastive constraint requires pairs_df")
        # Reuse PHON-106 helper for filler set
        from skeleton_csp import _load_pairs_for_request
        # All allow_sets share the same set in v1; pass union for spec filter
        pair_spec = allow_sets["nsubj"] | allow_sets["dobj"] | allow_sets["V"]
        pair_frame = _load_pairs_for_request(
            constraint=cc,
            pairs_df=pairs_df,
            filtered_spec=pair_spec,
        )
        if pair_frame.height == 0:
            return []
        joined = resolve_contrastive_join(
            pair_frame=pair_frame,
            sel_df=sel_df,
            verb_candidates=verb_set,
            band=band,
            slots=cc.slots,
        )
    else:
        # Non-contrastive: long-format → assemble pairs externally
        # For v1 single-sentence we still produce a "joined-shape" output by
        # cross-joining nsubj × dobj rows under the same verb.
        long = resolve_non_contrastive(
            sel_df=sel_df,
            verb_candidates=verb_set,
            per_slot_allow_set=allow_sets,
            band=band,
        )
        side_a = long.filter(pl.col("role") == "nsubj").rename({
            "role": "role_a", "filler": "filler_a", "ppmi": "ppmi_a",
        }).select(["verb", "role_a", "filler_a", "ppmi_a"])
        side_b = long.filter(pl.col("role") == "dobj").rename({
            "role": "role_b", "filler": "filler_b", "ppmi": "ppmi_b",
        }).select(["verb", "role_b", "filler_b", "ppmi_b"])
        joined = (
            side_a.join(side_b, on="verb")
            .with_columns([
                pl.lit(band).alias("band"),
                pl.lit(0.0).alias("feature_distance"),
                pl.lit(0.0).alias("sonorant_diff"),
            ])
        )

    if joined.height == 0:
        return []

    # 4. Skeleton host filter — top skeleton per (role_a, role_b)
    role_pairs = (
        joined.select(["role_a", "role_b"]).unique().iter_rows()
    )
    skeleton_lookup: dict[tuple[str, str], str | None] = {}
    for role_a, role_b in role_pairs:
        hosts = select_host_skeletons(
            skeletons_df=skeletons_df,
            role_a=role_a, role_b=role_b,
            band=band, top_k=1,
        )
        skeleton_lookup[(role_a, role_b)] = (
            hosts["arg_structure"][0] if hosts.height > 0 else None
        )

    # 5. Score: ppmi_a + ppmi_b (feature_distance is carried through, not yet added)
    joined = joined.with_columns(
        (pl.col("ppmi_a") + pl.col("ppmi_b")).alias("ppmi_total")
    ).sort("ppmi_total", descending=True)

    # 6. Assemble candidates with skeletons
    candidates: list[dict] = []
    for row in joined.head(top_k * 2).iter_rows(named=True):
        skel = skeleton_lookup.get((row["role_a"], row["role_b"]))
        if skel is None:
            continue
        candidates.append({
            "verb": row["verb"],
            "role_a": row["role_a"],
            "filler_a": row["filler_a"],
            "role_b": row["role_b"],
            "filler_b": row["filler_b"],
            "skeleton": skel,
            "ppmi_total": row["ppmi_total"],
            "feature_distance": row.get("feature_distance", 0.0),
            "sonorant_diff": row.get("sonorant_diff", 0.0),
        })
        if len(candidates) >= top_k:
            break
    return candidates
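
The non-contrastive branch and the ranking step can be pictured without Polars: group long-format rows by verb, cross the nsubj fillers with the dobj fillers, sum the two PPMI values, and keep the top k. The rows and the pair_and_rank helper below are illustrative assumptions, not code from pair_driven.py.

```python
from collections import defaultdict
from itertools import product

# Long-format rows as (verb, role, filler, ppmi); all values are made up.
LONG_ROWS = [
    ("eat", "nsubj", "kid", 1.25),
    ("eat", "nsubj", "cat", 0.75),
    ("eat", "dobj", "bread", 2.5),
    ("eat", "dobj", "seed", 1.0),
    ("nod", "nsubj", "kid", 0.5),  # no dobj row, so "nod" yields no pair
]


def pair_and_rank(rows, top_k):
    """Cross nsubj x dobj fillers under each verb and rank by summed PPMI."""
    by_verb = defaultdict(lambda: {"nsubj": [], "dobj": []})
    for verb, role, filler, ppmi in rows:
        if role in ("nsubj", "dobj"):
            by_verb[verb][role].append((filler, ppmi))
    candidates = []
    for verb, sides in by_verb.items():
        for (fa, pa), (fb, pb) in product(sides["nsubj"], sides["dobj"]):
            candidates.append(
                {"verb": verb, "filler_a": fa, "filler_b": fb, "ppmi_total": pa + pb}
            )
    candidates.sort(key=lambda c: c["ppmi_total"], reverse=True)
    return candidates[:top_k]


top = pair_and_rank(LONG_ROWS, top_k=2)
for c in top:
    print(c["verb"], c["filler_a"], c["filler_b"], c["ppmi_total"])
```

A verb that lacks rows for either role drops out of the result, which matches the inner join on verb in the Polars code.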

[ ] Step 8.4: Run, verify pass

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v

Expected: 13 passed (or 12 + 1 skipped if exclude probe finds no candidates).

[ ] Step 8.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py
git commit -m "$(cat <<'EOF'
PHON-112: solve() orchestrator end-to-end

Top-level pair-driven resolver. Produces structured candidates with
verb, role-positioned fillers, skeleton, and ppmi_total. Verb is in
locked_slots not positional. Exclude/Bound apply to verb candidate
set the same way they apply to fillers.

Render integration is the next step (Task 9); reranker integration
comes later. For now candidates carry intermediate fields so tests
can verify structure.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 9: Wire surface realization

Files: - Modify: <spike>/pair_driven.py - Modify: <spike>/test_pair_driven_solve.py

Take a structured candidate (verb, fillers, skeleton) and produce a sentence string using the existing renderer in skeleton_csp.py. This re-uses surface realization rather than re-implementing it.

[ ] Step 9.1: Inspect existing render function

grep -n "def.*render\|def.*surface\|_assemble_surface" /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py

The existing renderer in skeleton_csp.py (likely _render_sentence or similar) takes (verb, fillers_dict, skeleton_arg_structure). Use it directly.

[ ] Step 9.2: Write failing test

Append to <spike>/test_pair_driven_solve.py:

def test_solve_produces_renderable_sentences(store, sel_df, skeletons_df):
    """Each returned candidate has a 'sentence' field with a rendered string."""
    from constraint_surface import MinpairConstraint
    import paradigm_3_csp

    spec_words = paradigm_3_csp.spec_lexicon(store, "spec1")
    candidates = pair_driven.solve(
        spec_words=spec_words,
        word_df=store.df,
        sel_df=sel_df,
        pairs_df=store.pairs_df,
        skeletons_df=skeletons_df,
        band="fineweb_adult",
        constraints=[MinpairConstraint(phoneme1="d", phoneme2="z", position="final")],
        top_k=4,
    )
    assert candidates
    for c in candidates:
        assert "sentence" in c
        assert isinstance(c["sentence"], str)
        assert len(c["sentence"]) > 0
        # filler words should appear in the sentence
        assert c["filler_a"] in c["sentence"] or c["filler_b"] in c["sentence"]

[ ] Step 9.3: Run, verify fail

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py::test_solve_produces_renderable_sentences -v

Expected: AssertionError on missing sentence key.

[ ] Step 9.4: Wire renderer in solve()

In <spike>/pair_driven.py, modify the candidate assembly loop in solve() to call the existing renderer. The exact import depends on what's exported by skeleton_csp.py; if there isn't a public render function, factor one out or reuse the existing path's surface formatting.

    from skeleton_csp import _render_candidate  # or whatever the existing helper is

    candidates: list[dict] = []
    for row in joined.head(top_k * 2).iter_rows(named=True):
        skel = skeleton_lookup.get((row["role_a"], row["role_b"]))
        if skel is None:
            continue
        fillers = {
            row["role_a"]: row["filler_a"],
            row["role_b"]: row["filler_b"],
            "V": row["verb"],
        }
        sentence = _render_candidate(
            arg_structure=skel,
            verb=row["verb"],
            fillers=fillers,
        )
        candidates.append({
            "verb": row["verb"],
            "role_a": row["role_a"],
            "filler_a": row["filler_a"],
            "role_b": row["role_b"],
            "filler_b": row["filler_b"],
            "skeleton": skel,
            "ppmi_total": row["ppmi_total"],
            "feature_distance": row.get("feature_distance", 0.0),
            "sonorant_diff": row.get("sonorant_diff", 0.0),
            "sentence": sentence,
        })
        if len(candidates) >= top_k:
            break
    return candidates

If skeleton_csp.py doesn't expose a render function, factor one out from the existing solve_shape body and rename it _render_candidate. Use the existing rendering logic verbatim — don't re-derive determiner placement / conjugation / advmod fill.
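
One way to keep the assembly loop testable before the real renderer is located is to pass the renderer in as a callable. This is a sketch under assumptions: attach_sentences and stub_render are hypothetical names, the "nsubj+V+dobj" skeleton encoding is invented for the example, and the stub deliberately does no determiner placement or conjugation.

```python
def attach_sentences(rows, skeleton_lookup, render, top_k):
    """Walk ranked rows, render those with a host skeleton, stop at top_k."""
    out = []
    for row in rows:
        skel = skeleton_lookup.get((row["role_a"], row["role_b"]))
        if skel is None:
            continue  # no host skeleton for this role pair
        fillers = {
            row["role_a"]: row["filler_a"],
            row["role_b"]: row["filler_b"],
            "V": row["verb"],
        }
        sentence = render(arg_structure=skel, verb=row["verb"], fillers=fillers)
        out.append({**row, "skeleton": skel, "sentence": sentence})
        if len(out) >= top_k:
            break
    return out


def stub_render(*, arg_structure, verb, fillers):
    # Placeholder renderer: joins slot fillers in skeleton order.
    # `verb` is unused because the "V" slot already carries it.
    return " ".join(fillers[slot] for slot in arg_structure.split("+"))


rows = [
    {"verb": "eat", "role_a": "nsubj", "filler_a": "kid", "role_b": "iobj", "filler_b": "cat"},
    {"verb": "eat", "role_a": "nsubj", "filler_a": "kid", "role_b": "dobj", "filler_b": "bread"},
]
lookup = {("nsubj", "dobj"): "nsubj+V+dobj"}
result = attach_sentences(rows, lookup, stub_render, top_k=4)
print(result[0]["sentence"])
```

Because this loop walks the whole ranked list, role pairs without a host skeleton cannot by themselves cause an under-filled result. Whether that matters for the real data has not been checked here.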

[ ] Step 9.5: Run, verify pass

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v

Expected: 14 passed (or 13 + 1 skipped if exclude probe finds no candidates).

[ ] Step 9.6: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py
git commit -m "$(cat <<'EOF'
PHON-112: wire surface realization into solve()

Each candidate now carries a 'sentence' field rendered by the existing
skeleton_csp.py surface formatter. No re-derivation of determiner /
conjugation / advmod logic.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 10: Retire PHON-106 v1 linked-slot mode

Files: - Modify: <spike>/skeleton_csp.py - Modify: <spike>/paradigm_3_csp.py - Modify: <spike>/test_contrastive_scorers.py

The new path in pair_driven.py is now end-to-end functional. Retire the v1 linked-slot mode that lives inside _enumerate_vectorized and the solve_shape/paradigm_3_csp.solve contrastive detection block.

[ ] Step 10.1: Inspect retire candidates

grep -n "contrast_pair_frame\|contrast_axis_name\|MinpairConstraint\|MaxoppConstraint\|contrast_constraints" \
    /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
    | head -30

Identify:
- The contrastive detection block in solve_shape (~50 lines)
- The linked-slot block in _enumerate_vectorized (~30 lines)
- The score_cols contrast_* filter in _enumerate_vectorized and _dedup_and_assemble
- The python-fallback assert not contrast_constraints guard

[ ] Step 10.2: Remove contrastive code from skeleton_csp.py

In solve_shape, remove the entire contrastive detection block (validation guards + pair loading + contrast_pair_frame/contrast_axis_name plumbing). Keep the MultoppConstraint check: it still raises in v1 because multopp support belongs to PHON-113.

In _enumerate_vectorized, remove the if contrast_pair_frame is not None: branch and the else: wrapper around the standard cartesian. Restore the simpler unconditional cartesian.

Remove the c.startswith("contrast_") filter from score_cols in both _enumerate_vectorized and _dedup_and_assemble.

Remove the python-fallback assert not contrast_constraints guard.

Keep _load_pairs_for_request — pair_driven.py uses it.

[ ] Step 10.3: Remove pairs_df + constraints kwargs from solve_shape signature

solve_shape's signature shrinks back toward its pre-PHON-106 form. The PHON-106-specific contrastive routing (pairs_df, plus constraints where it is used only for that routing) moves to pair_driven.solve().

Before dropping constraints, confirm whether solve_shape still consumes it for non-contrastive constraint types (Exclude, Include, Bound, BoundBoost). Those types are also moving to pair_driven, so check the existing usage:

grep -n "constraints" /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py | head -10

If solve_shape consumes constraints for non-contrastive paths internally and those are NOT yet rerouted through pair_driven, retain that parameter. The retirement scope is only the contrastive (linked-slot) parts.
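
As a complement to grep, a small guard can report which of the PHON-106 kwargs a function still accepts. The solve_shape below is a hypothetical stand-in with an assumed signature. In the spike you would import the real one from skeleton_csp.py instead.

```python
import inspect


def solve_shape(verb, spec_words, sel_df, *, constraints=None, pairs_df=None):
    """Hypothetical stand-in; the real signature lives in skeleton_csp.py."""
    return []


def remaining_kwargs(func, names):
    """Return, sorted, which of `names` are still parameters of `func`."""
    params = inspect.signature(func).parameters
    return sorted(name for name in names if name in params)


still_there = remaining_kwargs(solve_shape, ["pairs_df", "constraints"])
print(still_there)
```

After the retirement, a test asserting that "pairs_df" is absent from this list would catch an accidental reintroduction of the kwarg.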

[ ] Step 10.4: Update paradigm_3_csp.solve()

paradigm_3_csp.solve(verb, ...) now becomes a thin compat shim that calls pair_driven.solve(...) with locked_slots={"V": verb}. Or, if it's only used internally, delete it and update callers to use pair_driven.solve directly.

def solve(
    verb: str,
    spec_name: str,
    spec_words: frozenset[str],
    sel_df: pl.DataFrame,
    *,
    constraints: list | None = None,
    word_df: pl.DataFrame | None = None,
    pairs_df: pl.DataFrame | None = None,
    skeletons_df: pl.DataFrame | None = None,
    band: str = "fineweb_adult",
    top_k: int = 8,
):
    """Compat shim around pair_driven.solve(). Will be retired in PHON-109
    productionization."""
    import pair_driven
    if word_df is None or skeletons_df is None:
        raise ValueError("word_df and skeletons_df required")
    candidates = pair_driven.solve(
        spec_words=spec_words,
        word_df=word_df,
        sel_df=sel_df,
        pairs_df=pairs_df,
        skeletons_df=skeletons_df,
        band=band,
        constraints=list(constraints or []),
        locked_slots={"V": verb},
        top_k=top_k,
    )
    return candidates, {}
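
The shim relies on the verb lock in solve() being an intersection with the candidate set, not an override. A minimal sketch of that behaviour follows. apply_verb_lock is a hypothetical helper name and the verbs are made up.

```python
def apply_verb_lock(verb_set, locked_slots):
    """Narrow verb_set to the locked verb, if one is given under key "V"."""
    locked = (locked_slots or {}).get("V")
    if locked is None:
        return verb_set
    # Intersection: a locked verb outside the candidate set yields nothing.
    return frozenset({locked}) & verb_set


verb_candidates = frozenset({"eat", "feed", "need"})
print(sorted(apply_verb_lock(verb_candidates, {"V": "feed"})))
print(sorted(apply_verb_lock(verb_candidates, {"V": "run"})))
```

A legacy caller that pins a verb removed by an Exclude or Bound constraint therefore gets an empty candidate list back through the shim, not an error.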

[ ] Step 10.5: Update / retire PHON-106 contrastive tests

In <spike>/test_contrastive_scorers.py, the realization, error, and routing tests test the v1 path. Since v1 is retired, delete those tests.

The remaining tests:
- test_load_pairs_for_minpair_basic: still tests the helper, retained
- test_load_pairs_for_maxopp_filters_sonorant_diff: retained
- test_load_pairs_emits_both_orientations: retained
- test_minpair_constraint_accepts_slots_kwarg (Task 1): retained
- test_minpair_constraint_default_slots_is_none (Task 1): retained
- test_maxopp_constraint_accepts_slots_kwarg (Task 1): retained

Delete:
- test_minpair_linked_slot_realization
- test_minpair_without_pairs_df_errors
- test_minpair_single_content_slot_errors
- test_multopp_in_single_sentence_errors
- test_both_minpair_and_maxopp_errors
- test_maxopp_feature_distance_in_components
- test_pairs_empty_after_filter_returns_no_candidates
- test_minpair_uses_vectorized_path
- test_multopp_forces_python_fallback_in_routing

The replacement tests live in test_pair_driven_solve.py.

[ ] Step 10.6: Run full spike suite

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/ -v

Expected: All retained tests pass. The deleted tests are no longer collected. Total count drops by 9 from PHON-106's 12 contrastive tests, but ~14 new tests in test_pair_driven_solve.py more than compensate.

[ ] Step 10.7: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/paradigm_3_csp.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py
git commit -m "$(cat <<'EOF'
PHON-112: retire v1 PHON-106 linked-slot mode

Removes:
- _enumerate_vectorized's contrast_pair_frame branch
- solve_shape's contrastive detection block + pairs_df kwarg
- score_cols contrast_* filter
- 9 test_contrastive_scorers tests that asserted v1 behavior

Replaced by pair_driven.solve() which uses a selectional self-join
(verb falls out of the join, role assignment falls out of the join).

paradigm_3_csp.solve becomes a thin compat shim around
pair_driven.solve with locked_slots={"V": verb}.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 11: Update eval harness

Files: - Modify: <spike>/build_judging_set.py

Drop the hardcoded VERBS = [...] list. Rebuild requests so the verb candidate set is derived per-spec from the new compute_verb_candidates.

[ ] Step 11.1: Inspect existing harness

grep -n "VERBS\|for verb in\|verb=" /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/build_judging_set.py | head -15

Find the request-builder loops over VERBS. They construct judging requests that pin verb= per request.

[ ] Step 11.2: Rewrite request-building

Replace the hardcoded VERBS loop with: for each spec × band × constraints, request a top-K from pair_driven.solve() (no verb fixing). The verb is now a property of the candidate, not the request.

# In the request-building loop:
for spec_name in SPECS:
    spec_words = paradigm_3_csp.spec_lexicon(store, spec_name)
    for band in BANDS:
        for c_label, constraints in CONSTRAINT_CONFIGS.items():
            candidates = pair_driven.solve(
                spec_words=spec_words,
                word_df=store.df,
                sel_df=sel_df,
                pairs_df=store.pairs_df,
                skeletons_df=skeletons_df,
                band=band,
                constraints=constraints,
                top_k=SINGLE_TOP_K,
            )
            requests.append({
                "request_id": f"single_{spec_name}_{band}_{c_label}",
                "request_type": "single",
                "spec": spec_name,
                "band": band,
                "constraints": _serialize_constraints(constraints),
                "candidates": [_candidate_payload(c) for c in candidates],
            })

The VERBS constant is deleted. The PARAGRAPH_CHAINS constant stays (paragraphs are PHON-113 scope).
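
With the verb gone from the request key, request ids depend only on spec, band, and constraint label. The sketch below checks that such ids stay unique. The values of SPECS, BANDS, and CONSTRAINT_CONFIGS are placeholders (the real ones live in build_judging_set.py), and single_request_ids is a hypothetical helper.

```python
from itertools import product

# Placeholder values; the real constants are defined in build_judging_set.py.
SPECS = ["spec1", "spec2"]
BANDS = ["fineweb_adult"]
CONSTRAINT_CONFIGS = {"none": [], "minpair_dz": ["<minpair d/z final>"]}


def single_request_ids(specs, bands, constraint_configs):
    """Build one id per (spec, band, constraint label); no verb component."""
    return [
        f"single_{spec}_{band}_{label}"
        for spec, band, label in product(specs, bands, constraint_configs)
    ]


ids = single_request_ids(SPECS, BANDS, CONSTRAINT_CONFIGS)
print(ids)
```

Uniqueness now depends on the constraint labels being distinct, since nothing else separates two requests for the same spec and band.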

[ ] Step 11.3: Smoke-test the harness

cd /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms && \
  uv run python build_judging_set.py --dry-run 2>&1 | tail -10

Expected: Produces a judging_set.jsonl with structured candidates; no VERBS-list iteration.

[ ] Step 11.4: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/build_judging_set.py
git commit -m "$(cat <<'EOF'
PHON-112: eval harness uses constraint-driven verb selection

Drops the hardcoded VERBS list; verb candidates fall out of
pair_driven.solve()'s selectional join. Requests are now keyed on
(spec, band, constraints) rather than (verb, spec, band, constraints).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 12: Final verification

[ ] Step 12.1: Full spike suite

cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/ -v

Expected: all tests pass; net new test count > 0 vs PHON-106 v1.

[ ] Step 12.2: data-layer suite still passes

cd /Users/jneumann/Repos/PhonoLex && \
  uv run python -m pytest packages/data/tests/ -q

Expected: 209 passed (or whatever the current Task-4-era count is).

[ ] Step 12.3: Smoke test — minpair + exclude

cd /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms && \
  uv run python -c "
import polars as pl
from pathlib import Path
from phonolex_data.runtime.store import WordStore
import pair_driven
from constraint_surface import MinpairConstraint, ExcludeConstraint
import paradigm_3_csp

repo = Path('/Users/jneumann/Repos/PhonoLex')
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
skeletons_df = pl.read_parquet(Path('outputs') / 'skeletons.parquet')
spec_words = paradigm_3_csp.spec_lexicon(store, 'spec1')
candidates = pair_driven.solve(
    spec_words=spec_words,
    word_df=store.df,
    sel_df=sel_df,
    pairs_df=store.pairs_df,
    skeletons_df=skeletons_df,
    band='fineweb_adult',
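    constraints=[
        ExcludeConstraint(phonemes=('ɹ',)),
        MinpairConstraint(phoneme1='d', phoneme2='z', position='final'),
    ],
    top_k=3,
)
for c in candidates:
    print(c['sentence'], '|', c['verb'], (c['filler_a'], c['filler_b']))
"

Expected: up to 3 sentences printed, each with a (d, z)-final filler pair from spec1 and no /ɹ/ in the verb or fillers. If nothing prints, the exclude left no candidates (the same case the Task 8 exclude test skips on).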
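
To eyeball whether a printed filler pair really is a (d, z)-final pair, a small check on the two phoneme sequences is enough. This sketch assumes a minimal pair means equal-length sequences that differ only in the final segment. The definition used to build pairs_df is not shown in this plan and may differ, and is_final_minpair is a hypothetical helper.

```python
def is_final_minpair(a, b, phoneme1, phoneme2):
    """True if a and b differ only in their last segment, which is {phoneme1, phoneme2}."""
    a, b = tuple(a), tuple(b)
    if not a or len(a) != len(b):
        return False
    if a[:-1] != b[:-1]:
        return False  # they differ somewhere before the final segment
    return a[-1] != b[-1] and {a[-1], b[-1]} == {phoneme1, phoneme2}


# "bead" vs "bees" contrast in final position; "dip" vs "zip" contrast initially.
print(is_final_minpair(("b", "i", "d"), ("b", "i", "z"), "d", "z"))
print(is_final_minpair(("d", "ɪ", "p"), ("z", "ɪ", "p"), "d", "z"))
```

The check is orientation-free, so it accepts the pair in either order.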

[ ] Step 12.4: No further commit needed

If all three steps are green, PHON-112 is complete. The commit history shows the migration from v1 linked-slot to v2 pair-driven.

Done

After Task 12 verification, PHON-112 v1 is complete on feature/csp-iteration. Stack continues toward PHON-109 productionization.

Follow-ups not in this plan:
- PHON-113: paragraph composition (multopp, shared discourse subject, pronoun coref)
- PHON-107: reranker v2 (trains on pair-driven output, not v1 linked-slot output)
- PHON-109: productionize (replace /api/generate-single internals)
- PHON-110: frontend reframe (top-K candidates with per-axis breakdown)
