PHON-112: Pair-driven CSP Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Replace the verb-fixed, skeleton-driven sentence resolver with a constraint-driven resolver in which the verb is a constrained slot and contrastive constraints drive resolution via a selectional self-join.
Architecture: Constraint-filtered lexicon → (optional) pair filler set → selectional self-join → skeleton host filter → render → reranker. Verb is just-another-slot. Single-sentence scope; paragraphs follow in PHON-113.
Tech Stack: Polars eager-mode joins, pytest-driven TDD, frozen dataclasses for constraint API.
File map
Files created (new):
- <spike>/verb_candidates.py: verb candidate set helpers (POS filter + selectional mass index)
- <spike>/test_verb_candidates.py: tests for the verb candidate set (Task 3)
- <spike>/pair_driven.py: pair-driven CSP core (Tasks 4-8)
- <spike>/test_pair_driven_solve.py: new test suite for the rewritten path
Files modified:
- <spike>/skeleton_csp.py: solve_shape rewrite, _load_pairs_for_request column rename, retire linked-slot mode
- <spike>/paradigm_3_csp.py: solve() signature rewrite (drop verb positional, accept WordStore)
- <spike>/constraint_surface.py: add slots param to MinpairConstraint/MaxoppConstraint
- <spike>/test_contrastive_scorers.py: rewrite to assert against join-driven path
- <spike>/build_judging_set.py: drop hardcoded VERBS list
Where <spike> = /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/.
Run the tests from packages/generation: cd /Users/jneumann/Repos/PhonoLex/packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/ -v. The pytest path is relative to that directory, not to the repo root.
Task 1: Add slots parameter to MinpairConstraint and MaxoppConstraint
Files:
- Modify: <spike>/constraint_surface.py
The new slots parameter is optional and defaults to None (= "let the join decide role pair"). Explicit pairs like slots=("V", "dobj") filter the join to that role pair.
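A minimal sketch of the slots semantics, assuming the either-orientation matching used by the join in Task 4. SlotsDemo and role_pair_admitted are hypothetical names for illustration only; they are not part of constraint_surface.py.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SlotsDemo:
    """Hypothetical stand-in for a constraint carrying an optional role pair."""
    phoneme1: str
    phoneme2: str
    slots: tuple[str, str] | None = None  # None = let the join decide


def role_pair_admitted(slots: tuple[str, str] | None, role_a: str, role_b: str) -> bool:
    """Return True when (role_a, role_b) passes the slots restriction."""
    if slots is None:
        return True  # unrestricted: every role pair is admitted
    # An explicit pair matches in either orientation.
    return (role_a, role_b) == slots or (role_b, role_a) == slots


unrestricted = SlotsDemo("k", "b")
restricted = SlotsDemo("k", "b", slots=("nsubj", "dobj"))
print(role_pair_admitted(unrestricted.slots, "nsubj", "iobj"))
print(role_pair_admitted(restricted.slots, "dobj", "nsubj"))
print(role_pair_admitted(restricted.slots, "nsubj", "iobj"))
```

Keeping the default at None means existing callers that never pass slots see no change in behavior.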
- [ ] Step 1.1: Write failing test
Append to <spike>/test_contrastive_scorers.py (this becomes part of the rewrite in later tasks):
def test_minpair_constraint_accepts_slots_kwarg():
    from constraint_surface import MinpairConstraint
    c = MinpairConstraint(phoneme1="k", phoneme2="b", position="initial", slots=("V", "dobj"))
    assert c.slots == ("V", "dobj")


def test_minpair_constraint_default_slots_is_none():
    from constraint_surface import MinpairConstraint
    c = MinpairConstraint(phoneme1="k", phoneme2="b")
    assert c.slots is None


def test_maxopp_constraint_accepts_slots_kwarg():
    from constraint_surface import MaxoppConstraint
    c = MaxoppConstraint(phoneme1="k", phoneme2="m", position="initial", slots=("V", "nsubj"))
    assert c.slots == ("V", "nsubj")
- [ ] Step 1.2: Run, verify fail
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py::test_minpair_constraint_accepts_slots_kwarg -v
Expected: TypeError on unexpected keyword arg slots.
- [ ] Step 1.3: Add slots field
In <spike>/constraint_surface.py, locate MinpairConstraint and MaxoppConstraint. Add slots: tuple[str, str] | None = None to each (default None = let join decide):
@dataclass(frozen=True)
class MinpairConstraint:
    phoneme1: str
    phoneme2: str
    position: Literal["initial", "medial", "final", "any"] = "any"
    slots: tuple[str, str] | None = None
    type: Literal["contrastive_minpair"] = "contrastive_minpair"


@dataclass(frozen=True)
class MaxoppConstraint:
    phoneme1: str
    phoneme2: str
    position: Literal["initial", "medial", "final", "any"] = "any"
    min_sonorant_diff: float = 0.5
    slots: tuple[str, str] | None = None
    type: Literal["contrastive_maxopp"] = "contrastive_maxopp"
- [ ] Step 1.4: Run, verify pass
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py -k "constraint_accepts_slots or default_slots" -v
Expected: 3 passed.
- [ ] Step 1.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/constraint_surface.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py
git commit -m "$(cat <<'EOF'
PHON-112: add slots kwarg to Minpair/MaxoppConstraint
Default slots=None means "let the join decide role pair". Explicit
slots=("V", "dobj") restricts the join to that pair. Default-None
preserves PHON-106 v1 behavior for callers that don't migrate.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 2: Rename _load_pairs_for_request columns to filler_a/filler_b
Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/test_contrastive_scorers.py
The helper currently emits (nsubj, dobj, feature_distance, sonorant_diff). The new join doesn't know which roles will be assigned yet — column names must be neutral.
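The role-neutral output can be sketched in plain Python: every stored (word1, word2) pair is emitted twice, once per orientation, under the neutral names filler_a and filler_b. This is an illustration only; both_orientations is a hypothetical helper and the pair values are toy data, not rows from the real pairs table.

```python
def both_orientations(pairs):
    """Emit each (word1, word2) pair twice, once per orientation.

    pairs: iterable of (word1, word2, feature_distance, sonorant_diff) tuples.
    """
    pairs = list(pairs)  # allow generators; we iterate twice
    forward = [
        {"filler_a": w1, "filler_b": w2, "feature_distance": fd, "sonorant_diff": sd}
        for w1, w2, fd, sd in pairs
    ]
    backward = [
        {"filler_a": w2, "filler_b": w1, "feature_distance": fd, "sonorant_diff": sd}
        for w1, w2, fd, sd in pairs
    ]
    return forward + backward


toy_pairs = [("cat", "bat", 0.5, 0.0)]  # toy values for illustration
rows = both_orientations(toy_pairs)
for row in rows:
    print(row["filler_a"], row["filler_b"])
```

Because both orientations are present, a later join can place either word in either role without the loader having to know which roles exist.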
- [ ] Step 2.1: Update existing helper tests for new column names
In <spike>/test_contrastive_scorers.py, change the three Task 7 helper tests (test_load_pairs_for_minpair_basic, test_load_pairs_for_maxopp_filters_sonorant_diff, test_load_pairs_emits_both_orientations) to use filler_a/filler_b:
# In test_load_pairs_for_minpair_basic:
assert set(pairs_df.columns) >= {"filler_a", "filler_b", "feature_distance", "sonorant_diff"}
for a, b in zip(pairs_df["filler_a"].to_list(), pairs_df["filler_b"].to_list()):
    assert a in {"cat", "bat", "kid", "bid", "key", "bee"}
    assert b in {"cat", "bat", "kid", "bid", "key", "bee"}

# In test_load_pairs_emits_both_orientations:
a_set = set(pairs_df["filler_a"].to_list())
b_set = set(pairs_df["filler_b"].to_list())
assert a_set & b_set, "Expected overlap between filler_a and filler_b sets"
- [ ] Step 2.2: Run, verify fail
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py::test_load_pairs_for_minpair_basic -v
Expected: AssertionError — columns are nsubj/dobj, not filler_a/filler_b.
- [ ] Step 2.3: Update _load_pairs_for_request
In <spike>/skeleton_csp.py, locate _load_pairs_for_request. Replace the forward/backward projections:
forward = base.select([
    pl.col("word1").alias("filler_a"),
    pl.col("word2").alias("filler_b"),
    pl.col("feature_distance"),
    pl.col("sonorant_diff"),
])
backward = base.select([
    pl.col("word2").alias("filler_a"),
    pl.col("word1").alias("filler_b"),
    pl.col("feature_distance"),
    pl.col("sonorant_diff"),
])
return pl.concat([forward, backward])
Also update the empty-frame fallback at the top of the function:
if pairs_df is None:
    return pl.DataFrame({
        "filler_a": pl.Series(dtype=pl.Utf8),
        "filler_b": pl.Series(dtype=pl.Utf8),
        "feature_distance": pl.Series(dtype=pl.Float32),
        "sonorant_diff": pl.Series(dtype=pl.Float32),
    })
- [ ] Step 2.4: Run, verify pass
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py -k "load_pairs" -v
Expected: 3 passed.
(Note: test_minpair_linked_slot_realization and friends still expect nsubj/dobj columns — those tests get rewritten in Task 9. They will fail in this intermediate state.)
- [ ] Step 2.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py
git commit -m "$(cat <<'EOF'
PHON-112: rename _load_pairs_for_request output to filler_a/filler_b
Role assignment is determined by the selectional join, not by the
pair loader, so the column names should not pre-commit to nsubj/dobj.
3 helper tests updated; the realization/error tests still expect
nsubj/dobj and will be rewritten in Task 9.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 3: Add verb_candidates helper module
Files:
- Create: <spike>/verb_candidates.py
- Create: <spike>/test_verb_candidates.py
Computes the verb candidate set: lexicon ∩ POS=verb ∩ has_selectional_mass.
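The three-way intersection can be sketched with plain Python sets before reaching for Polars. verb_candidates_sketch is a hypothetical helper, and the selectional rows are toy data; only the shape of the computation is taken from this plan.

```python
from collections import Counter


def verb_candidates_sketch(lexicon, verb_tagged, sel_rows, band, min_rows):
    """lexicon ∩ POS=verb ∩ has_selectional_mass on plain collections.

    sel_rows: iterable of (verb, band) tuples, one per selectional row.
    An empty lexicon means "no restriction".
    """
    # Count selectional rows per verb inside the requested band.
    mass = Counter(verb for verb, row_band in sel_rows if row_band == band)
    heavy = {verb for verb, n in mass.items() if n >= min_rows}
    candidates = set(verb_tagged) & heavy
    if lexicon:
        candidates &= set(lexicon)
    return frozenset(candidates)


# Toy rows: "cut" has three adult rows, "run" has one.
toy_sel = (
    [("cut", "fineweb_adult")] * 3
    + [("run", "fineweb_adult")]
    + [("cut", "childes_age_2_5")]
)
toy_verbs = {"cut", "run", "see"}
strict = verb_candidates_sketch(frozenset(), toy_verbs, toy_sel, "fineweb_adult", min_rows=2)
loose = verb_candidates_sketch(frozenset(), toy_verbs, toy_sel, "fineweb_adult", min_rows=1)
print(sorted(strict), sorted(loose))
```

Raising min_rows trims long-tail verbs, which is the role the selectional-mass threshold plays in the helper below.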
- [ ] Step 3.1: Write failing test
Create <spike>/test_verb_candidates.py:
"""Tests for PHON-112 verb candidate set."""
from __future__ import annotations

import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).parent))

import polars as pl
from phonolex_data.runtime.store import WordStore

import verb_candidates


@pytest.fixture(scope="session")
def store():
    repo_root = Path(__file__).resolve().parents[4]
    return WordStore.from_parquet(repo_root / "data" / "runtime" / "words.parquet")


@pytest.fixture(scope="session")
def sel_df():
    repo_root = Path(__file__).resolve().parents[4]
    return pl.read_parquet(repo_root / "data" / "runtime" / "selectional.parquet")


def test_verb_candidates_returns_known_verbs(store, sel_df):
    """Common verbs like 'cut' and 'run' appear in the candidate set
    for a generic spec."""
    candidates = verb_candidates.compute_verb_candidates(
        spec_words=frozenset(),  # no spec restriction → all verbs
        word_df=store.df,
        sel_df=sel_df,
        band="fineweb_adult",
        min_selectional_rows=10,
    )
    assert "cut" in candidates
    assert "run" in candidates
    assert "see" in candidates


def test_verb_candidates_filters_by_spec(store, sel_df):
    """When spec_words is restrictive, only verbs in spec survive."""
    candidates = verb_candidates.compute_verb_candidates(
        spec_words=frozenset({"cut", "run", "make"}),  # restricted
        word_df=store.df,
        sel_df=sel_df,
        band="fineweb_adult",
        min_selectional_rows=10,
    )
    assert candidates <= frozenset({"cut", "run", "make"})
    assert "cut" in candidates


def test_verb_candidates_filters_by_selectional_mass(store, sel_df):
    """A verb with no rows for the band gets excluded."""
    candidates = verb_candidates.compute_verb_candidates(
        spec_words=frozenset(),
        word_df=store.df,
        sel_df=sel_df,
        band="phonbank_age_0_2",  # narrower band
        min_selectional_rows=1000,  # high bar
    )
    # A reasonable number of verbs should still survive, but not all of fineweb
    n_fineweb = len(verb_candidates.compute_verb_candidates(
        spec_words=frozenset(),
        word_df=store.df,
        sel_df=sel_df,
        band="fineweb_adult",
        min_selectional_rows=10,
    ))
    n_narrow = len(candidates)
    assert n_narrow < n_fineweb
- [ ] Step 3.2: Run, verify fail
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_verb_candidates.py -v
Expected: ModuleNotFoundError on import verb_candidates.
- [ ] Step 3.3: Implement the helper
Create <spike>/verb_candidates.py:
"""PHON-112: verb candidate set computation.
A verb candidate is any word in the lexicon that:
1. Has POS=VERB (from words.parquet pos / all_pos columns)
2. Has at least min_selectional_rows entries in selectional.parquet for the band
3. (Optionally) is in spec_words
The candidate set is the verb-slot allow-set for `solve()`. Constraint
filtering (Exclude, Bound, etc.) is applied UPSTREAM via filtered_lexicon
before this helper runs, so the spec_words argument here represents the
already-constraint-filtered lexicon, not the raw spec.
"""
from __future__ import annotations

import polars as pl


def compute_verb_candidates(
    *,
    spec_words: frozenset[str],
    word_df: pl.DataFrame,
    sel_df: pl.DataFrame,
    band: str,
    min_selectional_rows: int = 10,
) -> frozenset[str]:
    """Return the set of verb candidates satisfying the lexicon, POS, and
    selectional-mass requirements.

    spec_words: lexicon to restrict to. Empty frozenset means no restriction.
    min_selectional_rows: minimum number of (role, filler) entries in the
        target band before a verb is admissible. Cuts off long-tail verbs.
    """
    # POS filter: keep words tagged VERB
    pos_filter = (
        (pl.col("pos") == "VERB")
        | (pl.col("all_pos").list.contains("VERB"))
    )
    verb_words = word_df.filter(pos_filter).select("word")
    pos_set = set(verb_words["word"].to_list())

    # Selectional-mass filter: count (role, filler) rows per verb in the band
    mass = (
        sel_df
        .filter(pl.col("band") == band)
        .group_by("verb")
        .agg(pl.len().alias("n_rows"))
        .filter(pl.col("n_rows") >= min_selectional_rows)
    )
    mass_set = set(mass["verb"].to_list())

    candidates = pos_set & mass_set
    if spec_words:
        candidates &= spec_words
    return frozenset(candidates)
- [ ] Step 3.4: Run, verify pass
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_verb_candidates.py -v
Expected: 3 passed.
- [ ] Step 3.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/verb_candidates.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_verb_candidates.py
git commit -m "$(cat <<'EOF'
PHON-112: add verb_candidates helper
Computes the verb candidate set for the constraint-driven solver:
lexicon ∩ POS=VERB ∩ has_selectional_mass(band, ≥ min_rows). Spec
words restriction is applied if non-empty (already constraint-filtered
by the caller).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 4: Implement contrastive selectional self-join
Files:
- Create: <spike>/pair_driven.py
- Create: <spike>/test_pair_driven_solve.py
Core algorithm: produce (verb, role_a, filler_a, role_b, filler_b, band, ppmi_a, ppmi_b, feature_distance, sonorant_diff) rows from the constraint-filtered pair frame and the selectional table.
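The self-join can be pictured as a nested loop over selectional rows that share a verb, keeping only role-distinct combinations whose fillers form a listed pair. The sketch below is plain Python on toy data; contrastive_join_sketch is a hypothetical name and the PPMI values are made up for illustration.

```python
def contrastive_join_sketch(sel_rows, pairs, verbs):
    """Self-join selectional rows on verb, keeping only listed filler pairs.

    sel_rows: (verb, role, filler, ppmi) tuples, already restricted to one band.
    pairs: set of (filler_a, filler_b) tuples with both orientations present.
    """
    out = []
    for verb_a, role_a, filler_a, ppmi_a in sel_rows:
        if verb_a not in verbs:
            continue
        for verb_b, role_b, filler_b, ppmi_b in sel_rows:
            # Same verb, different roles, and the fillers must be a listed pair.
            if verb_a == verb_b and role_a != role_b and (filler_a, filler_b) in pairs:
                out.append({
                    "verb": verb_a,
                    "role_a": role_a, "filler_a": filler_a, "ppmi_a": ppmi_a,
                    "role_b": role_b, "filler_b": filler_b, "ppmi_b": ppmi_b,
                })
    return out


toy_sel = [
    ("cut", "dobj", "cord", 1.2),
    ("cut", "nsubj", "cores", 0.7),
    ("see", "nsubj", "cord", 0.4),  # "see" has no second role row, so no match
]
toy_pairs = {("cores", "cord"), ("cord", "cores")}
rows = contrastive_join_sketch(toy_sel, toy_pairs, verbs={"cut", "see"})
for row in rows:
    print(row["verb"], row["role_a"], row["filler_a"], row["role_b"], row["filler_b"])
```

The verb and the role pair are never chosen by the caller; they fall out of which selectional rows happen to line up with the pair list.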
- [ ] Step 4.1: Write failing test
Create <spike>/test_pair_driven_solve.py:
"""Tests for PHON-112 pair-driven CSP."""
from __future__ import annotations

import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).parent))

import polars as pl
from phonolex_data.runtime.store import WordStore

import pair_driven


@pytest.fixture(scope="session")
def store():
    repo_root = Path(__file__).resolve().parents[4]
    return WordStore.from_parquet(repo_root / "data" / "runtime" / "words.parquet")


@pytest.fixture(scope="session")
def sel_df():
    repo_root = Path(__file__).resolve().parents[4]
    return pl.read_parquet(repo_root / "data" / "runtime" / "selectional.parquet")


def test_resolve_contrastive_join_returns_complete_specs(store, sel_df):
    """Every join row contains a verb, two distinct roles, two fillers,
    and a band — a complete sentence spec (sans skeleton)."""
    pair_frame = pl.DataFrame({
        "filler_a": ["cores", "kids"],  # /k.../z final and /k.../z final
        "filler_b": ["cord", "kid"],    # /k.../d final and /k.../d final
        "feature_distance": [0.8, 0.8],
        "sonorant_diff": [0.0, 0.0],
    })
    joined = pair_driven.resolve_contrastive_join(
        pair_frame=pair_frame,
        sel_df=sel_df,
        verb_candidates=frozenset({"cut", "see", "make", "find"}),
        band="fineweb_adult",
    )
    assert joined.height > 0
    expected_cols = {"verb", "role_a", "filler_a", "role_b", "filler_b",
                     "ppmi_a", "ppmi_b", "feature_distance", "sonorant_diff"}
    assert set(joined.columns) >= expected_cols
    # Roles must differ
    assert (joined["role_a"] != joined["role_b"]).all()
    # All verbs must be in the candidate set
    assert set(joined["verb"].to_list()) <= {"cut", "see", "make", "find"}


def test_resolve_contrastive_join_filters_by_band(store, sel_df):
    """Different bands produce different join sizes."""
    pair_frame = pl.DataFrame({
        "filler_a": ["cores"],
        "filler_b": ["cord"],
        "feature_distance": [0.8],
        "sonorant_diff": [0.0],
    })
    fineweb = pair_driven.resolve_contrastive_join(
        pair_frame=pair_frame,
        sel_df=sel_df,
        verb_candidates=frozenset({"cut", "see", "make"}),
        band="fineweb_adult",
    )
    childes = pair_driven.resolve_contrastive_join(
        pair_frame=pair_frame,
        sel_df=sel_df,
        verb_candidates=frozenset({"cut", "see", "make"}),
        band="childes_age_2_5",
    )
    # Different bands → different row counts (one might be 0)
    assert fineweb.height != childes.height or fineweb.height == 0


def test_resolve_contrastive_join_empty_pair_frame_returns_empty(sel_df):
    """An empty pair frame produces an empty join."""
    pair_frame = pl.DataFrame({
        "filler_a": [],
        "filler_b": [],
        "feature_distance": [],
        "sonorant_diff": [],
    }, schema={"filler_a": pl.Utf8, "filler_b": pl.Utf8,
               "feature_distance": pl.Float32, "sonorant_diff": pl.Float32})
    joined = pair_driven.resolve_contrastive_join(
        pair_frame=pair_frame,
        sel_df=sel_df,
        verb_candidates=frozenset({"cut"}),
        band="fineweb_adult",
    )
    assert joined.height == 0


def test_resolve_contrastive_join_filters_by_slots_kwarg(store, sel_df):
    """When slots=("V", "dobj") is passed, joined rows have role_a=V (or its
    equivalent) and role_b=dobj — but verb-as-role isn't a row in selectional;
    this case requires a different code path tested in Task 5/8."""
    # This is a placeholder test for Task 8's slots-aware join.
    # For now, slots=None should produce all role pairs.
    pair_frame = pl.DataFrame({
        "filler_a": ["cores"],
        "filler_b": ["cord"],
        "feature_distance": [0.8],
        "sonorant_diff": [0.0],
    })
    joined = pair_driven.resolve_contrastive_join(
        pair_frame=pair_frame,
        sel_df=sel_df,
        verb_candidates=frozenset({"cut"}),
        band="fineweb_adult",
        slots=None,
    )
    # Both roles should be in the standard role inventory
    if joined.height > 0:
        roles = set(joined["role_a"].to_list()) | set(joined["role_b"].to_list())
        for r in roles:
            assert r in {"nsubj", "dobj", "iobj", "xcomp", "ccomp", "advmod"} or r.startswith("pobj_")
- [ ] Step 4.2: Run, verify fail
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v
Expected: ModuleNotFoundError on import pair_driven.
- [ ] Step 4.3: Implement resolve_contrastive_join
Create <spike>/pair_driven.py:
"""PHON-112: pair-driven CSP core.
Selectional self-join: given a constraint-filtered pair frame and a
verb candidate set, produce (verb, role_a, w1, role_b, w2, band, ppmi)
rows by joining selectional × selectional × pair_frame.
"""
from __future__ import annotations

import polars as pl


def resolve_contrastive_join(
    *,
    pair_frame: pl.DataFrame,
    sel_df: pl.DataFrame,
    verb_candidates: frozenset[str],
    band: str,
    slots: tuple[str, str] | None = None,
) -> pl.DataFrame:
    """Self-join sel × sel filtered to pair_frame's filler combinations.

    Returns rows with columns:
        verb, role_a, filler_a, role_b, filler_b, band,
        ppmi_a, ppmi_b, feature_distance, sonorant_diff

    Each row is a complete sentence spec sans skeleton: the verb, the
    two role-positioned fillers, and the corresponding PMIs.

    slots: if given, restrict role_a/role_b to the named pair (e.g.,
        ("nsubj", "dobj")). When None, all role pairs are admitted.
    """
    if pair_frame.height == 0:
        return pl.DataFrame(schema={
            "verb": pl.Utf8,
            "role_a": pl.Utf8,
            "filler_a": pl.Utf8,
            "role_b": pl.Utf8,
            "filler_b": pl.Utf8,
            "band": pl.Utf8,
            "ppmi_a": pl.Float32,
            "ppmi_b": pl.Float32,
            "feature_distance": pl.Float32,
            "sonorant_diff": pl.Float32,
        })

    # Pre-filter selectional: only rows for this band, only fillers in pair_frame,
    # only verbs in candidate set
    pair_words = (
        pl.concat([pair_frame["filler_a"], pair_frame["filler_b"]]).unique()
    )
    sel_window = (
        sel_df
        .filter(pl.col("band") == band)
        .filter(pl.col("filler").is_in(pair_words.to_list()))
        .filter(pl.col("verb").is_in(list(verb_candidates)))
    )

    # Self-join on verb (role differs)
    side_a = sel_window.rename({
        "role": "role_a", "filler": "filler_a", "ppmi": "ppmi_a",
    }).select(["verb", "role_a", "filler_a", "ppmi_a"])
    side_b = sel_window.rename({
        "role": "role_b", "filler": "filler_b", "ppmi": "ppmi_b",
    }).select(["verb", "role_b", "filler_b", "ppmi_b"])
    cross = side_a.join(side_b, on="verb").filter(pl.col("role_a") != pl.col("role_b"))

    # Join with pair_frame to keep only valid (filler_a, filler_b) combinations
    joined = cross.join(pair_frame, on=["filler_a", "filler_b"], how="inner")

    # Slots restriction (if given)
    if slots is not None:
        slot_a, slot_b = slots
        joined = joined.filter(
            ((pl.col("role_a") == slot_a) & (pl.col("role_b") == slot_b))
            | ((pl.col("role_a") == slot_b) & (pl.col("role_b") == slot_a))
        )

    # Add band column for downstream consumers
    joined = joined.with_columns(pl.lit(band).alias("band"))
    return joined.select([
        "verb", "role_a", "filler_a", "role_b", "filler_b", "band",
        "ppmi_a", "ppmi_b", "feature_distance", "sonorant_diff",
    ])
- [ ] Step 4.4: Run, verify pass
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v
Expected: 4 passed.
- [ ] Step 4.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py
git commit -m "$(cat <<'EOF'
PHON-112: contrastive selectional self-join
resolve_contrastive_join produces (verb, role_a, w1, role_b, w2, band,
ppmis, feature_distance, sonorant_diff) rows by joining selectional ×
selectional × pair_frame. The verb falls out of the join, role pair
falls out of the join, no caller iteration. slots kwarg restricts to
a specific role pair when needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 5: Skeleton host filter
Files:
- Modify: <spike>/pair_driven.py
- Modify: <spike>/test_pair_driven_solve.py
Given joined rows + skeletons.parquet, produce candidate skeletons that can host the produced role pair.
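As a rough illustration of role coverage and ranking, the sketch below filters toy skeleton records whose comma-joined arg_structure contains both roles, then sorts by freq descending with arg_structure as a tie-breaker. host_skeletons_sketch is a hypothetical helper, and the records and frequencies are invented toy data, not rows from skeletons.parquet.

```python
def host_skeletons_sketch(skeletons, role_a, role_b, band, top_k=5):
    """Return up to top_k skeleton dicts covering both roles in the band."""
    hosts = [
        s for s in skeletons
        if s["band"] == band
        and {role_a, role_b} <= set(s["arg_structure"].split(","))
    ]
    # Highest freq first; ties broken alphabetically by arg_structure.
    hosts.sort(key=lambda s: (-s["freq"], s["arg_structure"]))
    return hosts[:top_k]


toy_skeletons = [
    {"band": "fineweb_adult", "arg_structure": "nsubj,dobj", "freq": 120},
    {"band": "fineweb_adult", "arg_structure": "nsubj", "freq": 900},
    {"band": "fineweb_adult", "arg_structure": "nsubj,dobj,iobj", "freq": 300},
    {"band": "childes_age_2_5", "arg_structure": "nsubj,dobj", "freq": 999},
]
hosts = host_skeletons_sketch(toy_skeletons, "nsubj", "dobj", "fineweb_adult")
print([h["arg_structure"] for h in hosts])
```

A skeleton with extra roles (here iobj) still qualifies as a host; coverage is a superset check, not an exact match.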
- [ ] Step 5.1: Write failing test
Append to <spike>/test_pair_driven_solve.py:
@pytest.fixture(scope="session")
def skeletons_df():
    skeletons_path = (
        Path(__file__).parent / "outputs" / "skeletons.parquet"
    )
    if not skeletons_path.exists():
        pytest.skip("skeletons.parquet not present")
    return pl.read_parquet(skeletons_path)


def test_select_host_skeletons_filters_by_role_coverage(skeletons_df):
    """Returned skeletons must contain both role_a and role_b."""
    hosts = pair_driven.select_host_skeletons(
        skeletons_df=skeletons_df,
        role_a="nsubj",
        role_b="dobj",
        band="fineweb_adult",
        top_k=5,
    )
    assert hosts.height > 0
    for arg_struct in hosts["arg_structure"].to_list():
        slots = arg_struct.split(",")
        assert "nsubj" in slots
        assert "dobj" in slots


def test_select_host_skeletons_ranks_by_freq(skeletons_df):
    """Top-K ordering is by freq desc."""
    hosts = pair_driven.select_host_skeletons(
        skeletons_df=skeletons_df,
        role_a="nsubj",
        role_b="dobj",
        band="fineweb_adult",
        top_k=5,
    )
    freqs = hosts["freq"].to_list()
    assert freqs == sorted(freqs, reverse=True)


def test_select_host_skeletons_filters_by_band(skeletons_df):
    """A skeleton from a different band is not returned."""
    hosts = pair_driven.select_host_skeletons(
        skeletons_df=skeletons_df,
        role_a="nsubj",
        role_b="dobj",
        band="fineweb_adult",
        top_k=10,
    )
    for b in hosts["band"].to_list():
        assert b == "fineweb_adult"
- [ ] Step 5.2: Run, verify fail
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py::test_select_host_skeletons_filters_by_role_coverage -v
Expected: AttributeError on pair_driven.select_host_skeletons.
- [ ] Step 5.3: Implement select_host_skeletons
Append to <spike>/pair_driven.py:
def select_host_skeletons(
    *,
    skeletons_df: pl.DataFrame,
    role_a: str,
    role_b: str,
    band: str,
    top_k: int = 5,
) -> pl.DataFrame:
    """Return top-K skeletons whose arg_structure covers {role_a, role_b}
    in the given band, ranked by freq desc.

    skeletons_df schema (from skeletons.parquet):
        band, arg_structure, pos_template, freq, verb_lemma_count, example
    """
    # arg_structure is a comma-joined string; split and check coverage
    return (
        skeletons_df
        .filter(pl.col("band") == band)
        .filter(pl.col("arg_structure").str.split(",").list.contains(role_a))
        .filter(pl.col("arg_structure").str.split(",").list.contains(role_b))
        .sort(["freq", "arg_structure"], descending=[True, False])
        .head(top_k)
    )
- [ ] Step 5.4: Run, verify pass
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v
Expected: 7 passed (4 from Task 4 + 3 new).
- [ ] Step 5.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py
git commit -m "$(cat <<'EOF'
PHON-112: skeleton host filter
select_host_skeletons returns top-K skeletons from skeletons.parquet
whose arg_structure covers the produced role pair, restricted to the
target band and ranked by freq desc. Verb compatibility is implicit
from the upstream selectional join.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 6: Per-slot non-contrastive enumeration
Files:
- Modify: <spike>/pair_driven.py
- Modify: <spike>/test_pair_driven_solve.py
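For requests without contrastive constraints, enumerate per-slot fillers across all verb candidates. The output is long-format, one (verb, role, filler, band, ppmi) row per admissible filler; downstream consumers cross-join across roles to assemble candidates comparable to the rows produced by resolve_contrastive_join.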
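A plain-Python sketch of the enumeration, assuming each role has its own allow set and that rows with ppmi of zero are dropped. non_contrastive_sketch is a hypothetical name and the rows are toy data.

```python
def non_contrastive_sketch(sel_rows, verbs, per_slot_allow_set):
    """Keep (verb, role, filler, ppmi) rows that pass every filter.

    sel_rows: tuples already restricted to one band.
    per_slot_allow_set: role -> frozenset of admissible fillers.
    """
    return [
        (verb, role, filler, ppmi)
        for verb, role, filler, ppmi in sel_rows
        if verb in verbs
        and ppmi > 0
        # Roles missing from the mapping admit nothing.
        and filler in per_slot_allow_set.get(role, frozenset())
    ]


toy_sel = [
    ("cut", "nsubj", "cat", 0.9),
    ("cut", "dobj", "ball", 1.4),
    ("cut", "dobj", "cat", 0.6),   # "cat" is not allowed in dobj
    ("see", "nsubj", "dog", 0.0),  # ppmi == 0 is dropped
    ("eat", "nsubj", "dog", 2.0),  # verb outside the candidate set
]
allow = {"nsubj": frozenset({"cat", "dog"}), "dobj": frozenset({"ball"})}
rows = non_contrastive_sketch(toy_sel, {"cut", "see"}, allow)
print(rows)
```

Each surviving row stands alone; pairing an nsubj row with a dobj row for the same verb is left to the consumer.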
- [ ] Step 6.1: Write failing test
Append to <spike>/test_pair_driven_solve.py:
def test_resolve_non_contrastive_returns_complete_specs(store, sel_df):
    """Non-contrastive enumeration produces (verb, role, filler, ppmi) rows
    across the verb candidate set."""
    rows = pair_driven.resolve_non_contrastive(
        sel_df=sel_df,
        verb_candidates=frozenset({"cut", "see", "make"}),
        per_slot_allow_set={
            "nsubj": frozenset({"cat", "dog", "kid"}),
            "dobj": frozenset({"bat", "ball", "bid"}),
        },
        band="fineweb_adult",
    )
    assert rows.height > 0
    # All verbs in candidate set
    assert set(rows["verb"].to_list()) <= {"cut", "see", "make"}
    # Per-slot allow sets respected
    for role, allow_set in [("nsubj", {"cat", "dog", "kid"}),
                            ("dobj", {"bat", "ball", "bid"})]:
        role_rows = rows.filter(pl.col("role") == role)
        if role_rows.height > 0:
            assert set(role_rows["filler"].to_list()) <= allow_set


def test_resolve_non_contrastive_filters_zero_ppmi(store, sel_df):
    """Rows with ppmi == 0 are excluded."""
    rows = pair_driven.resolve_non_contrastive(
        sel_df=sel_df,
        verb_candidates=frozenset({"cut"}),
        per_slot_allow_set={
            "nsubj": frozenset({"cat"}),
            "dobj": frozenset({"bat"}),
        },
        band="fineweb_adult",
    )
    if rows.height > 0:
        assert (rows["ppmi"] > 0).all()
- [ ] Step 6.2: Run, verify fail
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py::test_resolve_non_contrastive_returns_complete_specs -v
Expected: AttributeError on pair_driven.resolve_non_contrastive.
- [ ] Step 6.3: Implement resolve_non_contrastive
Append to <spike>/pair_driven.py:
def resolve_non_contrastive(
    *,
    sel_df: pl.DataFrame,
    verb_candidates: frozenset[str],
    per_slot_allow_set: dict[str, frozenset[str]],
    band: str,
) -> pl.DataFrame:
    """Enumerate (verb, role, filler, ppmi) rows where:
    - verb ∈ verb_candidates
    - filler ∈ per_slot_allow_set[role] for the row's role
    - ppmi > 0
    - band == band

    Output shape: long format (verb, role, filler, band, ppmi).
    Downstream consumers cross-join across roles to assemble candidates.
    """
    # Pre-filter sel by band + verbs
    sel_window = (
        sel_df
        .filter(pl.col("band") == band)
        .filter(pl.col("verb").is_in(list(verb_candidates)))
        .filter(pl.col("ppmi") > 0.0)
    )

    # Per-role filler allow-set filter
    role_filters = []
    for role, allow in per_slot_allow_set.items():
        role_filters.append(
            (pl.col("role") == role) & pl.col("filler").is_in(list(allow))
        )
    if not role_filters:
        return sel_window.head(0)

    combined = role_filters[0]
    for f in role_filters[1:]:
        combined = combined | f
    return sel_window.filter(combined).select(["verb", "role", "filler", "band", "ppmi"])
- [ ] Step 6.4: Run, verify pass
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v
Expected: 9 passed.
- [ ] Step 6.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py
git commit -m "$(cat <<'EOF'
PHON-112: non-contrastive per-slot enumeration
resolve_non_contrastive emits (verb, role, filler, band, ppmi) rows
respecting per-slot allow sets and the verb candidate set, with a
ppmi > 0 filter. The output is long-format; the downstream assembler
cross-joins across roles to build candidates.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 7: Constraint dispatch (per-slot allow sets)
Files:
- Modify: <spike>/pair_driven.py
- Modify: <spike>/test_pair_driven_solve.py
Resolve constraints to per-slot allow sets. ExcludeConstraint and BoundConstraint produce hard filters; IncludeConstraint and BoundBoostConstraint produce per-word axes (deferred to Task 9 where they wire into scoring).
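The two hard filters can be sketched on a toy lexicon: an exclude filter drops words containing a banned phoneme, and a bound filter drops words whose norm is missing or above the maximum. allow_sets_sketch is a hypothetical helper, and the phoneme lists and AoA values are invented toy data.

```python
def allow_sets_sketch(lexicon, banned_phonemes, norm, max_value, slot_types):
    """Apply an exclude filter and an upper bound, then share the result per slot.

    lexicon: word -> {"phonemes": [...], <norm>: float or None}
    """
    banned = set(banned_phonemes)
    allowed = set()
    for word, entry in lexicon.items():
        if banned & set(entry["phonemes"]):
            continue  # exclude: word contains a banned phoneme
        value = entry.get(norm)
        if value is None or value > max_value:
            continue  # bound: norm missing or above the maximum
        allowed.add(word)
    base = frozenset(allowed)
    # Every slot type, including the verb slot, gets the same allow set.
    return {slot: base for slot in slot_types}


toy_lexicon = {
    "cat": {"phonemes": ["k", "æ", "t"], "aoa": 3.0},
    "run": {"phonemes": ["ɹ", "ʌ", "n"], "aoa": 3.5},
    "abstruse": {"phonemes": ["æ", "b", "s", "t", "ɹ", "u", "s"], "aoa": 14.0},
    "kid": {"phonemes": ["k", "ɪ", "d"], "aoa": None},
}
allow = allow_sets_sketch(toy_lexicon, ("ɹ",), "aoa", 6.0, ("nsubj", "dobj", "V"))
print({slot: sorted(words) for slot, words in allow.items()})
```

Treating a missing norm as a failure mirrors a not-null check on the norm column, so words without data never slip through a bound.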
- [ ] Step 7.1: Write failing test
Append to <spike>/test_pair_driven_solve.py:
def test_constraint_dispatch_exclude_filters_lexicon(store):
    """ExcludeConstraint(/ɹ/) removes /ɹ/-containing words from all slots."""
    from constraint_surface import ExcludeConstraint
    spec_words = frozenset({"cat", "run", "rat", "bat", "kid"})
    allow_sets = pair_driven.resolve_per_slot_allow_sets(
        spec_words=spec_words,
        word_df=store.df,
        constraints=[ExcludeConstraint(phonemes=("ɹ",))],
        slot_types=("nsubj", "dobj", "V"),
    )
    # /ɹ/-containing words excluded from all slots
    for slot in ("nsubj", "dobj", "V"):
        assert "run" not in allow_sets[slot]
        assert "rat" not in allow_sets[slot]
        assert "cat" in allow_sets[slot]


def test_constraint_dispatch_bound_filters_lexicon(store):
    """BoundConstraint(aoa, max=6) excludes high-AoA words."""
    from constraint_surface import BoundConstraint
    spec_words = frozenset({"cat", "convoluted", "kid", "abstruse"})
    allow_sets = pair_driven.resolve_per_slot_allow_sets(
        spec_words=spec_words,
        word_df=store.df,
        constraints=[BoundConstraint(norm="aoa", max_value=6.0)],
        slot_types=("nsubj", "dobj"),
    )
    # cat (low AoA) should be in; convoluted (high AoA) shouldn't
    assert "cat" in allow_sets["nsubj"]
    # The exact AoA values are data-driven, so only assert the easy cases
- [ ] Step 7.2: Run, verify fail
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py::test_constraint_dispatch_exclude_filters_lexicon -v
Expected: AttributeError on pair_driven.resolve_per_slot_allow_sets.
- [ ] Step 7.3: Implement resolve_per_slot_allow_sets
Append to <spike>/pair_driven.py:
def resolve_per_slot_allow_sets(
    *,
    spec_words: frozenset[str],
    word_df: pl.DataFrame,
    constraints: list,
    slot_types: tuple[str, ...],
) -> dict[str, frozenset[str]]:
    """Apply hard constraints to spec_words, returning per-slot allow sets.

    Soft constraints (IncludeConstraint, BoundBoostConstraint) are not
    handled here — they produce per-word axes consumed at scoring time.
    """
    from constraint_surface import (
        ExcludeConstraint, BoundConstraint,
    )

    base_filtered = set(spec_words) if spec_words else set(word_df["word"].to_list())

    # Apply hard constraints
    for c in constraints:
        if isinstance(c, ExcludeConstraint):
            # Exclude words containing any banned phoneme
            banned = set(c.phonemes)
            filtered = (
                word_df
                .filter(pl.col("word").is_in(list(base_filtered)))
                .filter(
                    ~pl.col("phonemes").list.eval(
                        pl.element().is_in(list(banned))
                    ).list.any()
                )
            )
            base_filtered &= set(filtered["word"].to_list())
        elif isinstance(c, BoundConstraint):
            # Range filter on a norm column
            col = pl.col(c.norm)
            cond = pl.lit(True)
            if c.min_value is not None:
                cond = cond & (col >= c.min_value)
            if c.max_value is not None:
                cond = cond & (col <= c.max_value)
            cond = cond & col.is_not_null()
            filtered = (
                word_df
                .filter(pl.col("word").is_in(list(base_filtered)))
                .filter(cond)
            )
            base_filtered &= set(filtered["word"].to_list())

    base_set = frozenset(base_filtered)
    return {slot: base_set for slot in slot_types}
- [ ] Step 7.4: Run, verify pass
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v
Expected: 11 passed.
- [ ] Step 7.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py
git commit -m "$(cat <<'EOF'
PHON-112: per-slot allow set resolution from hard constraints

resolve_per_slot_allow_sets applies ExcludeConstraint and
BoundConstraint to spec_words, returning a per-slot-type allow set.
Same allow set goes to all slot types (incl. V), so verb gets the
same phonological/norm filtering as nsubj/dobj.

Soft constraints (Include, BoundBoost) defer to scoring (Task 9).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 8: New solve() orchestrator
Files:
- Modify: <spike>/pair_driven.py
- Modify: <spike>/test_pair_driven_solve.py
The top-level solve() ties resolve_per_slot_allow_sets → verb_candidates → (contrastive vs non-contrastive branch) → skeleton host filter together.
This task adds the orchestrator. The render + score wiring stays out for now; tests assert on the structured intermediate result.
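The choice between the contrastive and non-contrastive branch can be sketched as a small guard. The dataclasses below are hypothetical stand-ins for the `constraint_surface` classes (only `MinpairConstraint`'s `phoneme1`/`phoneme2`/`position` fields appear in this plan; the other fields are invented), and `pick_contrastive` / `branch_for` are illustrative names, not functions in the spike.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the constraint_surface classes.
@dataclass(frozen=True)
class MinpairConstraint:
    phoneme1: str
    phoneme2: str
    position: str = "any"


@dataclass(frozen=True)
class MaxoppConstraint:
    phoneme1: str
    phoneme2: str


@dataclass(frozen=True)
class MultoppConstraint:
    target: str


def pick_contrastive(constraints):
    """Return the single contrastive constraint, or None if there is none."""
    if any(isinstance(c, MultoppConstraint) for c in constraints):
        raise ValueError("MultoppConstraint is deferred to PHON-113")
    contrastive = [
        c for c in constraints
        if isinstance(c, (MinpairConstraint, MaxoppConstraint))
    ]
    if len(contrastive) > 1:
        raise ValueError("at most one contrastive constraint per request")
    return contrastive[0] if contrastive else None


def branch_for(constraints):
    """Name the branch a request would take."""
    if pick_contrastive(constraints) is not None:
        return "contrastive"
    return "non-contrastive"


print(branch_for([]))
print(branch_for([MinpairConstraint("d", "z", "final")]))
```

Validating up front keeps the later stages free of constraint-type checks: by the time the join runs, there is either exactly one contrastive constraint or none.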
- [ ] Step 8.1: Write failing test
Append to <spike>/test_pair_driven_solve.py:
def test_solve_minpair_end_to_end(store, sel_df, skeletons_df):
    """End-to-end solve with a minpair constraint produces structured
    candidates whose (filler_a, filler_b) is a real minimal pair."""
    from constraint_surface import MinpairConstraint
    import paradigm_3_csp

    spec_words = paradigm_3_csp.spec_lexicon(store, "spec1")
    candidates = pair_driven.solve(
        spec_words=spec_words,
        word_df=store.df,
        sel_df=sel_df,
        pairs_df=store.pairs_df,
        skeletons_df=skeletons_df,
        band="fineweb_adult",
        constraints=[MinpairConstraint(phoneme1="d", phoneme2="z", position="final")],
        top_k=8,
    )
    assert candidates, "expected candidates"
    # Each candidate should have the structured intermediate fields
    for c in candidates:
        assert "verb" in c
        assert "role_a" in c
        assert "filler_a" in c
        assert "role_b" in c
        assert "filler_b" in c
        assert "skeleton" in c
        assert "ppmi_total" in c


def test_solve_exclude_filters_verb(store, sel_df, skeletons_df):
    """ExcludeConstraint(/ɹ/) excludes /ɹ/-containing words including verbs."""
    from constraint_surface import ExcludeConstraint, MinpairConstraint
    import paradigm_3_csp

    spec_words = paradigm_3_csp.spec_lexicon(store, "spec1")
    candidates = pair_driven.solve(
        spec_words=spec_words,
        word_df=store.df,
        sel_df=sel_df,
        pairs_df=store.pairs_df,
        skeletons_df=skeletons_df,
        band="fineweb_adult",
        constraints=[
            ExcludeConstraint(phonemes=("ɹ",)),
            MinpairConstraint(phoneme1="d", phoneme2="z", position="final"),
        ],
        top_k=8,
    )
    if not candidates:
        pytest.skip("no surviving candidates after exclude; adjust phoneme")
    # No verb in any candidate should contain /ɹ/
    verbs = {c["verb"] for c in candidates}
    word_phonemes = {
        row["word"]: list(row["phonemes"])
        for row in store.df.filter(pl.col("word").is_in(list(verbs))).iter_rows(named=True)
    }
    for v in verbs:
        assert "ɹ" not in word_phonemes.get(v, []), (
            f"verb {v} contains /ɹ/, exclude constraint violated"
        )
- [ ] Step 8.2: Run, verify fail
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py::test_solve_minpair_end_to_end -v
Expected: AttributeError on pair_driven.solve.
- [ ] Step 8.3: Implement solve orchestrator
Append to <spike>/pair_driven.py:
def solve(
    *,
    spec_words: frozenset[str],
    word_df: pl.DataFrame,
    sel_df: pl.DataFrame,
    pairs_df: pl.DataFrame | None,
    skeletons_df: pl.DataFrame,
    band: str,
    constraints: list = (),
    locked_slots: dict[str, str] | None = None,
    top_k: int = 8,
) -> list[dict]:
    """Constraint-driven sentence resolver. See PHON-112 design doc."""
    from constraint_surface import (
        MinpairConstraint, MaxoppConstraint, MultoppConstraint,
    )

    locked_slots = dict(locked_slots or {})

    # Identify contrastive constraint (at most one)
    contrast_constraints = [
        c for c in constraints
        if isinstance(c, (MinpairConstraint, MaxoppConstraint))
    ]
    if len(contrast_constraints) > 1:
        raise ValueError("at most one contrastive constraint per request")
    multopp = [c for c in constraints if isinstance(c, MultoppConstraint)]
    if multopp:
        raise ValueError(
            "MultoppConstraint requires multi-sentence paragraph composition; "
            "deferred to PHON-113"
        )

    # 1. Per-slot allow sets (constraint-filtered lexicon)
    SLOT_TYPES = ("V", "nsubj", "dobj", "iobj")  # extend if pobj_X used
    allow_sets = resolve_per_slot_allow_sets(
        spec_words=spec_words,
        word_df=word_df,
        constraints=constraints,
        slot_types=SLOT_TYPES,
    )

    # 2. Verb candidates
    from verb_candidates import compute_verb_candidates
    verb_set = compute_verb_candidates(
        spec_words=allow_sets["V"],
        word_df=word_df,
        sel_df=sel_df,
        band=band,
    )
    if "V" in locked_slots:
        verb_set = frozenset({locked_slots["V"]}) & verb_set
    if not verb_set:
        return []

    # 3. Contrastive vs non-contrastive branch
    if contrast_constraints:
        cc = contrast_constraints[0]
        if pairs_df is None:
            raise ValueError("contrastive constraint requires pairs_df")
        # Reuse PHON-106 helper for filler set
        from skeleton_csp import _load_pairs_for_request
        # All allow_sets share the same set in v1; pass union for spec filter
        pair_spec = allow_sets["nsubj"] | allow_sets["dobj"] | allow_sets["V"]
        pair_frame = _load_pairs_for_request(
            constraint=cc,
            pairs_df=pairs_df,
            filtered_spec=pair_spec,
        )
        if pair_frame.height == 0:
            return []
        joined = resolve_contrastive_join(
            pair_frame=pair_frame,
            sel_df=sel_df,
            verb_candidates=verb_set,
            band=band,
            slots=cc.slots,
        )
    else:
        # Non-contrastive: long-format → assemble pairs externally
        # For v1 single-sentence we still produce a "joined-shape" output by
        # cross-joining nsubj × dobj rows under the same verb.
        long = resolve_non_contrastive(
            sel_df=sel_df,
            verb_candidates=verb_set,
            per_slot_allow_set=allow_sets,
            band=band,
        )
        side_a = long.filter(pl.col("role") == "nsubj").rename({
            "role": "role_a", "filler": "filler_a", "ppmi": "ppmi_a",
        }).select(["verb", "role_a", "filler_a", "ppmi_a"])
        side_b = long.filter(pl.col("role") == "dobj").rename({
            "role": "role_b", "filler": "filler_b", "ppmi": "ppmi_b",
        }).select(["verb", "role_b", "filler_b", "ppmi_b"])
        joined = (
            side_a.join(side_b, on="verb")
            .with_columns([
                pl.lit(band).alias("band"),
                pl.lit(0.0).alias("feature_distance"),
                pl.lit(0.0).alias("sonorant_diff"),
            ])
        )
    if joined.height == 0:
        return []

    # 4. Skeleton host filter: top skeleton per (role_a, role_b)
    role_pairs = (
        joined.select(["role_a", "role_b"]).unique().iter_rows()
    )
    skeleton_lookup: dict[tuple[str, str], str | None] = {}
    for role_a, role_b in role_pairs:
        hosts = select_host_skeletons(
            skeletons_df=skeletons_df,
            role_a=role_a, role_b=role_b,
            band=band, top_k=1,
        )
        skeleton_lookup[(role_a, role_b)] = (
            hosts["arg_structure"][0] if hosts.height > 0 else None
        )

    # 5. Score: ppmi_a + ppmi_b. feature_distance is carried through on each
    # candidate (for maxopp) but is not added to ppmi_total here.
    joined = joined.with_columns(
        (pl.col("ppmi_a") + pl.col("ppmi_b")).alias("ppmi_total")
    ).sort("ppmi_total", descending=True)

    # 6. Assemble candidates with skeletons. Only the top 2 * top_k rows are
    # scanned, so fewer than top_k candidates can come back if many of those
    # rows have no host skeleton.
    candidates: list[dict] = []
    for row in joined.head(top_k * 2).iter_rows(named=True):
        skel = skeleton_lookup.get((row["role_a"], row["role_b"]))
        if skel is None:
            continue
        candidates.append({
            "verb": row["verb"],
            "role_a": row["role_a"],
            "filler_a": row["filler_a"],
            "role_b": row["role_b"],
            "filler_b": row["filler_b"],
            "skeleton": skel,
            "ppmi_total": row["ppmi_total"],
            "feature_distance": row.get("feature_distance", 0.0),
            "sonorant_diff": row.get("sonorant_diff", 0.0),
        })
        if len(candidates) >= top_k:
            break
    return candidates
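The non-contrastive cross-join and the final assembly loop can be sketched in plain Python. This is an illustration under stated assumptions, not the Polars code: the rows, PPMI values, and the skeleton string "nsubj V dobj" are invented, and `cross_join_by_verb` / `assemble` are hypothetical names. Unlike the `head(top_k * 2)` window, `assemble` walks the whole ranked list, which is one possible way to avoid returning fewer than `top_k` candidates when highly ranked rows have no host skeleton.

```python
from itertools import product

# Toy long-format selectional rows: (verb, role, filler, ppmi). Values are invented.
long_rows = [
    ("eat", "nsubj", "kid", 2.0),
    ("eat", "nsubj", "dog", 1.5),
    ("eat", "dobj", "bread", 3.0),
    ("eat", "dobj", "seed", 1.0),
    ("read", "nsubj", "kid", 2.5),
    ("read", "dobj", "book", 4.0),
]


def cross_join_by_verb(rows, role_a="nsubj", role_b="dobj"):
    """Pair every role_a filler with every role_b filler under the same verb."""
    by_verb = {}
    for verb, role, filler, ppmi in rows:
        by_verb.setdefault(verb, {}).setdefault(role, []).append((filler, ppmi))
    joined = []
    for verb, roles in by_verb.items():
        pairs = product(roles.get(role_a, []), roles.get(role_b, []))
        for (fa, pa), (fb, pb) in pairs:
            joined.append({
                "verb": verb, "role_a": role_a, "filler_a": fa,
                "role_b": role_b, "filler_b": fb, "ppmi_total": pa + pb,
            })
    # Highest combined PPMI first.
    return sorted(joined, key=lambda r: r["ppmi_total"], reverse=True)


def assemble(joined, skeleton_lookup, top_k):
    """Walk ranked rows, skipping role pairs with no host skeleton."""
    out = []
    for row in joined:
        skel = skeleton_lookup.get((row["role_a"], row["role_b"]))
        if skel is None:
            continue
        out.append({**row, "skeleton": skel})
        if len(out) >= top_k:
            break
    return out


ranked = cross_join_by_verb(long_rows)
top = assemble(ranked, {("nsubj", "dobj"): "nsubj V dobj"}, top_k=2)
for cand in top:
    print(cand["verb"], cand["filler_a"], cand["filler_b"], cand["ppmi_total"])
```

Walking the full list costs more on large joins; a bounded window with a larger multiplier is a reasonable middle ground if the join is big.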
- [ ] Step 8.4: Run, verify pass
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v
Expected: 13 passed (or 12 + 1 skipped if exclude probe finds no candidates).
- [ ] Step 8.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py
git commit -m "$(cat <<'EOF'
PHON-112: solve() orchestrator end-to-end

Top-level pair-driven resolver. Produces structured candidates with
verb, role-positioned fillers, skeleton, and ppmi_total. Verb is in
locked_slots not positional. Exclude/Bound apply to verb candidate
set the same way they apply to fillers.

Render integration is the next step (Task 9); reranker work follows
in PHON-107. For now candidates carry intermediate fields so tests
can verify structure.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 9: Wire surface realization
Files:
- Modify: <spike>/pair_driven.py
- Modify: <spike>/test_pair_driven_solve.py
Take a structured candidate (verb, fillers, skeleton) and produce a sentence string using the existing renderer in skeleton_csp.py. This re-uses surface realization rather than re-implementing it.
- [ ] Step 9.1: Inspect existing render function
grep -n "def.*render\|def.*surface\|_assemble_surface" /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py
The existing renderer in skeleton_csp.py (likely _render_sentence or similar) probably takes (verb, fillers_dict, skeleton_arg_structure). Confirm the actual name and signature from the grep output, then use it directly.
- [ ] Step 9.2: Write failing test
Append to <spike>/test_pair_driven_solve.py:
def test_solve_produces_renderable_sentences(store, sel_df, skeletons_df):
    """Each returned candidate has a 'sentence' field with a rendered string."""
    from constraint_surface import MinpairConstraint
    import paradigm_3_csp

    spec_words = paradigm_3_csp.spec_lexicon(store, "spec1")
    candidates = pair_driven.solve(
        spec_words=spec_words,
        word_df=store.df,
        sel_df=sel_df,
        pairs_df=store.pairs_df,
        skeletons_df=skeletons_df,
        band="fineweb_adult",
        constraints=[MinpairConstraint(phoneme1="d", phoneme2="z", position="final")],
        top_k=4,
    )
    assert candidates
    for c in candidates:
        assert "sentence" in c
        assert isinstance(c["sentence"], str)
        assert len(c["sentence"]) > 0
        # at least one filler word should appear in the sentence
        # (the renderer may inflect the other, so this is deliberately loose)
        assert c["filler_a"] in c["sentence"] or c["filler_b"] in c["sentence"]
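The final assertion uses substring containment, which can pass spuriously when a filler is embedded in a longer word ("bed" inside "bedroom"). One possible tightening, sketched here with a hypothetical helper `contains_token`, compares against whole word tokens instead. It assumes the renderer emits the filler uninflected, which may not hold for pluralized or conjugated forms, so treat it as optional.

```python
import re

# Letters only (Unicode-aware), with an optional internal apostrophe part.
_TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def contains_token(sentence: str, word: str) -> bool:
    """True if `word` occurs in `sentence` as a whole, case-insensitive token."""
    tokens = _TOKEN_RE.findall(sentence.lower())
    return word.lower() in tokens


print(contains_token("The kid made the bed.", "bed"))
print(contains_token("The kid cleaned the bedroom.", "bed"))
```

In the test, `contains_token(c["sentence"], c["filler_a"])` would replace the `in` check on the corresponding side.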
- [ ] Step 9.3: Run, verify fail
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py::test_solve_produces_renderable_sentences -v
Expected: AssertionError on missing sentence key.
- [ ] Step 9.4: Wire renderer in solve()
In <spike>/pair_driven.py, modify the candidate assembly loop in solve() to call the existing renderer. The exact import depends on what's exported by skeleton_csp.py; if there isn't a public render function, factor one out or reuse the existing path's surface formatting.
    from skeleton_csp import _render_candidate  # or whatever the existing helper is

    candidates: list[dict] = []
    for row in joined.head(top_k * 2).iter_rows(named=True):
        skel = skeleton_lookup.get((row["role_a"], row["role_b"]))
        if skel is None:
            continue
        fillers = {
            row["role_a"]: row["filler_a"],
            row["role_b"]: row["filler_b"],
            "V": row["verb"],
        }
        sentence = _render_candidate(
            arg_structure=skel,
            verb=row["verb"],
            fillers=fillers,
        )
        candidates.append({
            "verb": row["verb"],
            "role_a": row["role_a"],
            "filler_a": row["filler_a"],
            "role_b": row["role_b"],
            "filler_b": row["filler_b"],
            "skeleton": skel,
            "ppmi_total": row["ppmi_total"],
            "feature_distance": row.get("feature_distance", 0.0),
            "sonorant_diff": row.get("sonorant_diff", 0.0),
            "sentence": sentence,
        })
        if len(candidates) >= top_k:
            break
    return candidates
If skeleton_csp.py doesn't expose a render function, factor one out from the existing solve_shape body and rename it _render_candidate. Use the existing rendering logic verbatim — don't re-derive determiner placement / conjugation / advmod fill.
- [ ] Step 9.5: Run, verify pass
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py -v
Expected: 14 passed.
- [ ] Step 9.6: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py
git commit -m "$(cat <<'EOF'
PHON-112: wire surface realization into solve()

Each candidate now carries a 'sentence' field rendered by the existing
skeleton_csp.py surface formatter. No re-derivation of determiner /
conjugation / advmod logic.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 10: Retire PHON-106 v1 linked-slot mode
Files:
- Modify: <spike>/skeleton_csp.py
- Modify: <spike>/paradigm_3_csp.py
- Modify: <spike>/test_contrastive_scorers.py
The new path in pair_driven.py is now end-to-end functional. Retire the v1 linked-slot mode that lives inside _enumerate_vectorized and the solve_shape/paradigm_3_csp.solve contrastive detection block.
- [ ] Step 10.1: Inspect retire candidates
grep -n "contrast_pair_frame\|contrast_axis_name\|MinpairConstraint\|MaxoppConstraint\|contrast_constraints" \
/Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
| head -30
Identify:
- The contrastive detection block in solve_shape (~50 lines)
- The linked-slot block in _enumerate_vectorized (~30 lines)
- The score_cols contrast_* filter in _enumerate_vectorized and _dedup_and_assemble
- The python-fallback assert not contrast_constraints guard
- [ ] Step 10.2: Remove contrastive code from skeleton_csp.py
In solve_shape, remove the entire contrastive detection block (validation guards + pair loading + contrast_pair_frame/contrast_axis_name plumbing). Keep the MultoppConstraint check: it should keep raising in v1, because multopp support belongs to PHON-113.
In _enumerate_vectorized, remove the if contrast_pair_frame is not None: branch and the else: wrapper around the standard cartesian. Restore the simpler unconditional cartesian.
Remove the c.startswith("contrast_") filter from score_cols in both _enumerate_vectorized and _dedup_and_assemble.
Remove the python-fallback assert not contrast_constraints guard.
Keep _load_pairs_for_request — pair_driven.py uses it.
- [ ] Step 10.3: Remove pairs_df + constraints kwargs from solve_shape signature
solve_shape's signature shrinks back toward its pre-PHON-106 form. The PHON-106-specific kwargs (pairs_df, and constraints as used for contrastive routing) move to pair_driven.solve().
Whether constraints can be dropped entirely depends on whether solve_shape still consumes it for the non-contrastive constraint types (Exclude, Include, Bound, BoundBoost), which are also moving to pair_driven. Check the existing usage:
grep -n "constraints" /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py | head -10
If solve_shape consumes constraints for non-contrastive paths internally and those are NOT yet rerouted through pair_driven, retain that parameter. The retire scope is only the contrastive (linked-slot) parts.
- [ ] Step 10.4: Update paradigm_3_csp.solve()
paradigm_3_csp.solve(verb, ...) now becomes a thin compat shim that calls pair_driven.solve(...) with locked_slots={"V": verb}. Or, if it's only used internally, delete it and update callers to use pair_driven.solve directly.
def solve(
    verb: str,
    spec_name: str,
    spec_words: frozenset[str],
    sel_df: pl.DataFrame,
    *,
    constraints: list | None = None,
    word_df: pl.DataFrame | None = None,
    pairs_df: pl.DataFrame | None = None,
    skeletons_df: pl.DataFrame | None = None,
    band: str = "fineweb_adult",
    top_k: int = 8,
):
    """Compat shim around pair_driven.solve(). Will be retired in PHON-109
    productionization."""
    import pair_driven

    if word_df is None or skeletons_df is None:
        raise ValueError("word_df and skeletons_df required")
    candidates = pair_driven.solve(
        spec_words=spec_words,
        word_df=word_df,
        sel_df=sel_df,
        pairs_df=pairs_df,
        skeletons_df=skeletons_df,
        band=band,
        constraints=list(constraints or []),
        locked_slots={"V": verb},
        top_k=top_k,
    )
    return candidates, {}
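One consequence of routing the positional verb through locked_slots is worth making explicit: the locked verb is intersected with the constraint-filtered candidate set, so a verb that the hard constraints removed yields an empty result rather than an error. A minimal sketch of that intersection, with `apply_locked_verb` as a hypothetical helper name and invented candidate verbs:

```python
def apply_locked_verb(verb_candidates, locked_slots):
    """Narrow the verb candidate set to the locked verb, if one is given."""
    locked = (locked_slots or {}).get("V")
    if locked is None:
        return frozenset(verb_candidates)
    # Intersection: the locked verb survives only if it was already a candidate.
    return frozenset({locked}) & frozenset(verb_candidates)


candidates = {"eat", "read"}  # invented candidate set after hard constraints
print(apply_locked_verb(candidates, {"V": "eat"}))   # the locked verb alone
print(apply_locked_verb(candidates, {"V": "run"}))   # empty: solve() would return []
print(apply_locked_verb(candidates, None) == frozenset(candidates))
```

Callers of the shim that previously relied on an exception for an unusable verb would now see `([], {})`, so it may be worth checking call sites for that assumption.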
- [ ] Step 10.5: Update / retire PHON-106 contrastive tests
In <spike>/test_contrastive_scorers.py, the realization, error, and routing tests test the v1 path. Since v1 is retired, delete those tests. The remaining tests:
- test_load_pairs_for_minpair_basic — still tests the helper, retained
- test_load_pairs_for_maxopp_filters_sonorant_diff — retained
- test_load_pairs_emits_both_orientations — retained
- test_minpair_constraint_accepts_slots_kwarg (Task 1) — retained
- test_minpair_constraint_default_slots_is_none (Task 1) — retained
- test_maxopp_constraint_accepts_slots_kwarg (Task 1) — retained
Delete:
- test_minpair_linked_slot_realization
- test_minpair_without_pairs_df_errors
- test_minpair_single_content_slot_errors
- test_multopp_in_single_sentence_errors
- test_both_minpair_and_maxopp_errors
- test_maxopp_feature_distance_in_components
- test_pairs_empty_after_filter_returns_no_candidates
- test_minpair_uses_vectorized_path
- test_multopp_forces_python_fallback_in_routing
The replacement tests live in test_pair_driven_solve.py.
- [ ] Step 10.6: Run full spike suite
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/ -v
Expected: All retained tests pass. The deleted tests are no longer collected. Total count drops by 9 from PHON-106's 12 contrastive tests, but ~14 new tests in test_pair_driven_solve.py more than compensate.
- [ ] Step 10.7: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/skeleton_csp.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/paradigm_3_csp.py \
packages/generation/research/2026-05-07-sentence-generation-paradigms/test_contrastive_scorers.py
git commit -m "$(cat <<'EOF'
PHON-112: retire v1 PHON-106 linked-slot mode

Removes:
- _enumerate_vectorized's contrast_pair_frame branch
- solve_shape's contrastive detection block + pairs_df kwarg
- score_cols contrast_* filter
- 9 test_contrastive_scorers tests that asserted v1 behavior

Replaced by pair_driven.solve() which uses a selectional self-join
(verb falls out of the join, role assignment falls out of the join).

paradigm_3_csp.solve becomes a thin compat shim around
pair_driven.solve with locked_slots={"V": verb}.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 11: Update eval harness
Files:
- Modify: <spike>/build_judging_set.py
Drop the hardcoded VERBS = [...] list. Rebuild requests so the verb candidate set is derived per-spec from the new compute_verb_candidates.
- [ ] Step 11.1: Inspect existing harness
grep -n "VERBS\|for verb in\|verb=" /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/build_judging_set.py | head -15
Find the request-builder loops over VERBS. They construct judging requests that pin verb= per request.
- [ ] Step 11.2: Rewrite request-building
Replace the hardcoded VERBS loop with: for each spec × band × constraints, request a top-K from pair_driven.solve() (no verb fixing). The verb is now a property of the candidate, not the request.
# In the request-building loop:
for spec_name in SPECS:
    spec_words = paradigm_3_csp.spec_lexicon(store, spec_name)
    for band in BANDS:
        for c_label, constraints in CONSTRAINT_CONFIGS.items():
            candidates = pair_driven.solve(
                spec_words=spec_words,
                word_df=store.df,
                sel_df=sel_df,
                pairs_df=store.pairs_df,
                skeletons_df=skeletons_df,
                band=band,
                constraints=constraints,
                top_k=SINGLE_TOP_K,
            )
            requests.append({
                "request_id": f"single_{spec_name}_{band}_{c_label}",
                "request_type": "single",
                "spec": spec_name,
                "band": band,
                "constraints": _serialize_constraints(constraints),
                "candidates": [_candidate_payload(c) for c in candidates],
            })
The VERBS constant is deleted. The PARAGRAPH_CHAINS constant stays (paragraphs are PHON-113 scope).
- [ ] Step 11.3: Smoke-test the harness
cd /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms && \
uv run python build_judging_set.py --dry-run 2>&1 | tail -10
Expected: Produces a judging_set.jsonl with structured candidates; no VERBS-list iteration.
- [ ] Step 11.4: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/build_judging_set.py
git commit -m "$(cat <<'EOF'
PHON-112: eval harness uses constraint-driven verb selection

Drops the hardcoded VERBS list; verb candidates fall out of
pair_driven.solve()'s selectional join. Requests are now keyed on
(spec, band, constraints) rather than (verb, spec, band, constraints).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 12: Final verification
- [ ] Step 12.1: Full spike suite
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/ -v
Expected: all tests pass; net new test count > 0 vs PHON-106 v1.
- [ ] Step 12.2: data-layer suite still passes
cd /Users/jneumann/Repos/PhonoLex && \
uv run python -m pytest packages/data/tests/ -q
Expected: 209 passed (or whatever the current Task-4-era count is).
- [ ] Step 12.3: Smoke test — minpair + exclude
cd /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms && \
uv run python -c "
import polars as pl
from pathlib import Path
from phonolex_data.runtime.store import WordStore
import pair_driven
from constraint_surface import MinpairConstraint, ExcludeConstraint
import paradigm_3_csp

repo = Path('/Users/jneumann/Repos/PhonoLex')
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
skeletons_df = pl.read_parquet(Path('outputs') / 'skeletons.parquet')
spec_words = paradigm_3_csp.spec_lexicon(store, 'spec1')
candidates = pair_driven.solve(
    spec_words=spec_words,
    word_df=store.df,
    sel_df=sel_df,
    pairs_df=store.pairs_df,
    skeletons_df=skeletons_df,
    band='fineweb_adult',
    constraints=[MinpairConstraint(phoneme1='d', phoneme2='z', position='final')],
    top_k=3,
)
for c in candidates:
    print(c['sentence'], '|', c['verb'], (c['filler_a'], c['filler_b']))
"
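Expected: 3 sentences printed, each with a (d, z)-final filler pair from spec1. The command exercises only the minpair constraint; to cover the exclude half of this smoke test as well, append ExcludeConstraint(phonemes=('ɹ',)) to constraints and check that no printed verb contains /ɹ/ (fewer than 3 sentences may survive in that case).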
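To check the "(d, z)-final filler pair" expectation mechanically rather than by eye, a small predicate over phoneme sequences is enough. The sketch below assumes that position='final' means the two words are identical except for their last segment, which is d in one and z in the other; `is_final_minimal_pair` is a hypothetical helper, and the transcriptions are illustrative.

```python
def is_final_minimal_pair(phon_a, phon_b, p1, p2):
    """True if the sequences differ only in the last segment, contrasting p1 with p2."""
    if not phon_a or len(phon_a) != len(phon_b):
        return False
    if list(phon_a[:-1]) != list(phon_b[:-1]):
        return False
    # Order-insensitive: either word may carry either phoneme.
    return {phon_a[-1], phon_b[-1]} == {p1, p2}


bead = ("b", "i", "d")
bees = ("b", "i", "z")
buzz = ("b", "ʌ", "z")

print(is_final_minimal_pair(bead, bees, "d", "z"))
print(is_final_minimal_pair(bead, buzz, "d", "z"))
```

In the smoke test, the phoneme sequences would come from store.df rows for c['filler_a'] and c['filler_b'].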
- [ ] Step 12.4: No further commit needed
If all three steps are green, PHON-112 is complete. The commit history shows the migration from v1 linked-slot to v2 pair-driven.
Done
After Task 12 verification, PHON-112 v1 is complete on feature/csp-iteration. Stack continues toward PHON-109 productionization.
Follow-ups not in this plan:
- PHON-113 — paragraph composition (multopp, shared discourse subject, pronoun coref)
- PHON-107 — reranker v2 (trains on pair-driven output, not v1 linked-slot output)
- PHON-109 — productionize: replace /api/generate-single internals
- PHON-110 — frontend reframe (top-K candidates with per-axis breakdown)