PHON-113: Paragraph CSP Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Rewrite paragraph composition on top of pair_driven.solve() so contrastive constraints fire per-sentence; add MultoppConstraint as a paragraph-native constraint via (N+1)-way selectional self-join.
Architecture: Multopp produces a join row (verb, role, sub_word, target_words[N]); non-multopp paragraphs run independent per-sentence solves with a shared discourse subject. Verbs fall out of the joins. Cheap coherence (subject + coref + agreement + markers + variety) is preserved; semantic coherence is the reranker's job.
Tech Stack: Polars eager joins, pytest TDD, frozen dataclasses, reuses PHON-112's pair_driven helpers.
File map
Files modified:
- <spike>/pair_driven.py — add _words_with_phoneme_at_position, _resolve_multopp_buckets, resolve_multopp_join
- <spike>/paragraph_csp.py — replace solve_paragraph body, drop ParagraphSpec.verbs, migrate helpers
- <spike>/test_paragraph_csp.py (create or reuse existing) — paragraph behavior tests
- <spike>/test_pair_driven_solve.py — multopp join tests
Where <spike> = /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/.
Tests run via cd /Users/jneumann/Repos/PhonoLex/packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/ -v.
Task 1: phoneme-position helper
Files:
- Modify: <spike>/pair_driven.py
- Modify: <spike>/test_pair_driven_solve.py
Helper that returns the set of words containing a given phoneme at a given position, computed from words.parquet's phonemes column.
- [ ] Step 1.1: Append failing tests
def test_words_with_phoneme_at_position_initial(store):
    # /k/ at initial: cat, kid, key, etc.
    words = pair_driven._words_with_phoneme_at_position(
        word_df=store.df, phoneme="k", position="initial",
        allow_set=frozenset({"cat", "kid", "key", "bat", "bid"}),
    )
    assert "cat" in words
    assert "kid" in words
    assert "key" in words
    assert "bat" not in words
    assert "bid" not in words


def test_words_with_phoneme_at_position_final(store):
    # /d/ at final
    words = pair_driven._words_with_phoneme_at_position(
        word_df=store.df, phoneme="d", position="final",
        allow_set=frozenset({"bed", "bad", "bat", "kid"}),
    )
    assert "bed" in words
    assert "bad" in words
    assert "kid" in words
    assert "bat" not in words


def test_words_with_phoneme_at_position_any(store):
    words = pair_driven._words_with_phoneme_at_position(
        word_df=store.df, phoneme="t", position="any",
        allow_set=frozenset({"cat", "tap", "kid", "test"}),
    )
    assert "cat" in words
    assert "tap" in words
    assert "test" in words
    assert "kid" not in words
- [ ] Step 1.2: Run, verify fail (AttributeError)
- [ ] Step 1.3: Implement helper
Append to pair_driven.py:
def _words_with_phoneme_at_position(
    *,
    word_df: pl.DataFrame,
    phoneme: str,
    position: str,  # "initial" | "medial" | "final" | "any"
    allow_set: frozenset[str],
) -> frozenset[str]:
    """Return words in `allow_set` whose `phonemes` list contains `phoneme`
    at `position` (initial=first, final=last, medial=anywhere except first/last,
    any=anywhere)."""
    df = word_df.filter(pl.col("word").is_in(list(allow_set)))
    if position == "initial":
        df = df.filter(pl.col("phonemes").list.first() == phoneme)
    elif position == "final":
        df = df.filter(pl.col("phonemes").list.last() == phoneme)
    elif position == "medial":
        # phonemes[1:-1] contains phoneme
        df = df.filter(
            pl.col("phonemes").list.slice(1, pl.col("phonemes").list.len() - 2).list.contains(phoneme)
        )
    elif position == "any":
        df = df.filter(pl.col("phonemes").list.contains(phoneme))
    else:
        raise ValueError(f"unknown position: {position}")
    return frozenset(df["word"].to_list())
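The four position modes can be cross-checked against a plain-Python reference. The sketch below is not part of the plan's code: `has_phoneme_at` and the toy transcriptions are hypothetical, and it only illustrates the intended semantics, including the short-word case where `medial` has nothing to match.

```python
def has_phoneme_at(phonemes: list[str], phoneme: str, position: str) -> bool:
    """Pure-Python reference for the four position modes (illustrative only)."""
    if position == "initial":
        return bool(phonemes) and phonemes[0] == phoneme
    if position == "final":
        return bool(phonemes) and phonemes[-1] == phoneme
    if position == "medial":
        # Everything except the first and last phoneme; empty for short words.
        return phoneme in phonemes[1:-1]
    if position == "any":
        return phoneme in phonemes
    raise ValueError(f"unknown position: {position}")


# Hypothetical transcriptions, not taken from words.parquet.
toy = {"cat": ["k", "æ", "t"], "at": ["æ", "t"], "attic": ["æ", "t", "ɪ", "k"]}
medial_t = {w for w, ph in toy.items() if has_phoneme_at(ph, "t", "medial")}
```

A two-phoneme word such as "at" has no medial positions, so it should never land in a medial bucket; a test along these lines may be worth adding next to the three above.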
- [ ] Step 1.4: Run, verify 3 pass
- [ ] Step 1.5: Commit
git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
  packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py
git commit -m "$(cat <<'EOF'
PHON-113: add _words_with_phoneme_at_position helper

Returns words from allow_set whose phonemes list contains a target
phoneme at a target position. Supports initial/medial/final/any.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 2: multopp filler bucket resolver
Files:
- Modify: <spike>/pair_driven.py
- Modify: <spike>/test_pair_driven_solve.py
Given a MultoppConstraint and the spec lexicon, produce a dict mapping each filler-phoneme (substitute + N targets) to its word bucket.
- [ ] Step 2.1: Append failing tests
def test_resolve_multopp_buckets_returns_one_bucket_per_phoneme(store):
    from constraint_surface import MultoppConstraint
    c = MultoppConstraint(
        substitute="t", targets=("s", "ʃ", "tʃ"), n_targets=3,
        position="initial",
    )
    spec = frozenset({"top", "sop", "shop", "chop", "cop", "kid"})  # cop, kid have neither
    buckets = pair_driven._resolve_multopp_buckets(
        constraint=c, word_df=store.df, allow_set=spec,
    )
    # 4 phonemes (1 sub + 3 targets); each maps to a (possibly empty) bucket
    assert set(buckets.keys()) == {"t", "s", "ʃ", "tʃ"}
    assert "top" in buckets["t"]
    assert "sop" in buckets["s"]
    # cop, kid don't have any of the 4 phonemes at initial → in no bucket
    for ws in buckets.values():
        assert "cop" not in ws
        assert "kid" not in ws


def test_resolve_multopp_buckets_respects_n_targets(store):
    from constraint_surface import MultoppConstraint
    c = MultoppConstraint(
        substitute="t", targets=("s", "ʃ", "tʃ", "z"), n_targets=2,  # only first 2 targets
        position="initial",
    )
    spec = frozenset({"top", "sop", "shop", "chop", "zoo"})
    buckets = pair_driven._resolve_multopp_buckets(
        constraint=c, word_df=store.df, allow_set=spec,
    )
    # n_targets=2 → 1 sub + 2 targets = 3 buckets
    assert set(buckets.keys()) == {"t", "s", "ʃ"}
    assert "tʃ" not in buckets
    assert "z" not in buckets
- [ ] Step 2.2: Run, verify fail
- [ ] Step 2.3: Implement helper
def _resolve_multopp_buckets(
    *,
    constraint: "MultoppConstraint",
    word_df: pl.DataFrame,
    allow_set: frozenset[str],
) -> dict[str, frozenset[str]]:
    """Return phoneme → words mapping for substitute + targets[:n_targets]
    at constraint.position."""
    fillers = (constraint.substitute,) + constraint.targets[:constraint.n_targets]
    return {
        phoneme: _words_with_phoneme_at_position(
            word_df=word_df,
            phoneme=phoneme,
            position=constraint.position,
            allow_set=allow_set,
        )
        for phoneme in fillers
    }
Add from constraint_surface import MultoppConstraint at top of pair_driven.py if not already there.
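The filler selection is just tuple concatenation plus a slice. A minimal sketch, using a hypothetical `ToyMultopp` dataclass as a stand-in for the real `MultoppConstraint` (whose full field list is not shown in this plan):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMultopp:
    """Hypothetical stand-in carrying only the fields the resolver reads."""
    substitute: str
    targets: tuple[str, ...]
    n_targets: int


c = ToyMultopp(substitute="t", targets=("s", "ʃ", "tʃ", "z"), n_targets=2)
# Substitute first, then only the first n_targets targets.
fillers = (c.substitute,) + c.targets[: c.n_targets]
```

Because the buckets are built with a dict comprehension keyed by phoneme, a substitute that also appears among the targets would collapse into a single bucket; whether that case should be rejected earlier is left open here.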
- [ ] Step 2.4: Run, verify pass
- [ ] Step 2.5: Commit
Task 3: multopp N+1-way join
Files:
- Modify: <spike>/pair_driven.py
- Modify: <spike>/test_pair_driven_solve.py
Given the filler buckets and a verb candidate set, find (verb, role) groups that have at least one representative from EVERY bucket.
- [ ] Step 3.1: Append failing tests
def test_resolve_multopp_join_returns_full_coverage_groups(store, sel_df):
    """A (verb, role) group with at least one filler from each bucket is returned."""
    buckets = {
        "t": frozenset({"top", "tap"}),
        "s": frozenset({"sop", "soup"}),
        "ʃ": frozenset({"ship", "shop"}),
    }
    rows = pair_driven.resolve_multopp_join(
        buckets=buckets,
        sel_df=sel_df,
        verb_candidates=frozenset({"see", "make", "drink"}),
        band="fineweb_adult",
    )
    # Contract: every bucket has >=1 representative in each returned (verb, role).
    # The loop is vacuous if the data yields no full-coverage group.
    for row in rows.iter_rows(named=True):
        for phoneme, bucket_words in buckets.items():
            assert row[f"bucket_{phoneme}_filler"] in bucket_words
            assert row[f"bucket_{phoneme}_ppmi"] > 0.0


def test_resolve_multopp_join_empty_when_no_full_coverage(store, sel_df):
    """If no (verb, role) has all buckets covered, return empty."""
    buckets = {
        "zzz": frozenset({"phantom_word_zzz"}),  # nothing has rows
    }
    rows = pair_driven.resolve_multopp_join(
        buckets=buckets,
        sel_df=sel_df,
        verb_candidates=frozenset({"see"}),
        band="fineweb_adult",
    )
    assert rows.height == 0
The first test asserts against the wide output schema defined in Step 3.3 (one bucket_<phoneme>_filler and bucket_<phoneme>_ppmi column per bucket). If that schema changes, adjust the assertions to match.
- [ ] Step 3.2: Run, verify fail
- [ ] Step 3.3: Implement helper
The output schema: one row per (verb, role) group with all buckets covered, plus per-bucket selected representative (highest-ppmi from that bucket within the group).
def resolve_multopp_join(
    *,
    buckets: dict[str, frozenset[str]],
    sel_df: pl.DataFrame,
    verb_candidates: frozenset[str],
    band: str,
    slots: tuple[str, ...] | None = None,
) -> pl.DataFrame:
    """N+1-way self-join on (verb, role) requiring at least one filler from each bucket.

    Returns rows with columns:
        verb, role, band,
        bucket_<phoneme>_filler (one column per bucket: the chosen filler word)
        bucket_<phoneme>_ppmi (its ppmi)
        total_ppmi (sum across buckets)
    If no group has full coverage, returns an empty frame.
    """
    if not buckets or any(len(v) == 0 for v in buckets.values()):
        return pl.DataFrame(schema={"verb": pl.Utf8, "role": pl.Utf8, "band": pl.Utf8, "total_ppmi": pl.Float32})
    all_filler_words: set[str] = set()
    for ws in buckets.values():
        all_filler_words.update(ws)
    sel_window = (
        sel_df
        .filter(pl.col("band") == band)
        .filter(pl.col("verb").is_in(list(verb_candidates)))
        .filter(pl.col("filler").is_in(list(all_filler_words)))
        .filter(pl.col("ppmi") > 0.0)
    )
    if slots is not None:
        sel_window = sel_window.filter(pl.col("role").is_in(list(slots)))
    # For each (verb, role) group, find the best representative per bucket.
    # One aggregated frame per bucket, then N inner joins across the N+1 frames.
    rows_per_bucket = []
    for phoneme, bucket_words in buckets.items():
        b = sel_window.filter(pl.col("filler").is_in(list(bucket_words)))
        # Best ppmi per (verb, role) within this bucket
        best = (
            b.sort("ppmi", descending=True)
            .group_by(["verb", "role"])
            .agg([
                pl.col("filler").first().alias(f"bucket_{phoneme}_filler"),
                pl.col("ppmi").first().alias(f"bucket_{phoneme}_ppmi"),
            ])
        )
        rows_per_bucket.append(best)
    # Inner-join all per-bucket frames on (verb, role): only groups with all buckets survive
    joined = rows_per_bucket[0]
    for next_df in rows_per_bucket[1:]:
        joined = joined.join(next_df, on=["verb", "role"], how="inner")
    # Compute total ppmi
    ppmi_cols = [f"bucket_{p}_ppmi" for p in buckets.keys()]
    joined = joined.with_columns(
        pl.sum_horizontal([pl.col(c) for c in ppmi_cols]).alias("total_ppmi")
    ).with_columns(pl.lit(band).alias("band"))
    return joined
The output is wide-format: one column per bucket. Downstream consumers iterate buckets to assemble per-sentence specs.
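The full-coverage rule is easier to see without the frame machinery. The sketch below restates it in plain Python under stated assumptions: `full_coverage_groups` is a hypothetical helper, and the rows and ppmi values are made up for illustration rather than read from selectional.parquet.

```python
from collections import defaultdict


def full_coverage_groups(rows, buckets):
    """rows: iterable of (verb, role, filler, ppmi); buckets: phoneme -> set of words.

    Returns {(verb, role): {phoneme: (best_filler, ppmi)}} for groups in which
    every bucket has at least one positive-ppmi filler.
    """
    best = defaultdict(dict)
    for verb, role, filler, ppmi in rows:
        if ppmi <= 0.0:
            continue
        for phoneme, words in buckets.items():
            if filler in words:
                current = best[(verb, role)].get(phoneme)
                if current is None or ppmi > current[1]:
                    best[(verb, role)][phoneme] = (filler, ppmi)
    # The inner join in the Polars version corresponds to this coverage filter.
    return {g: picks for g, picks in best.items() if set(picks) == set(buckets)}


rows = [
    ("make", "dobj", "top", 1.2),
    ("make", "dobj", "soup", 2.5),
    ("make", "dobj", "sop", 0.4),
    ("see", "dobj", "top", 0.9),  # no /s/ filler for this group, so it is dropped
]
buckets = {"t": {"top", "tap"}, "s": {"sop", "soup"}}
groups = full_coverage_groups(rows, buckets)
```

Each surviving group maps directly onto one wide row: the per-phoneme picks become the bucket_<phoneme>_filler and bucket_<phoneme>_ppmi columns.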
- [ ] Step 3.4: Run, verify pass + smoke
cd /Users/jneumann/Repos/PhonoLex && uv run python -c "
import polars as pl
from pathlib import Path
from phonolex_data.runtime.store import WordStore
import sys; sys.path.insert(0, 'packages/generation/research/2026-05-07-sentence-generation-paradigms')
import pair_driven
store = WordStore.from_parquet(Path('data/runtime/words.parquet'))
sel_df = pl.read_parquet('data/runtime/selectional.parquet')
buckets = {
't': pair_driven._words_with_phoneme_at_position(word_df=store.df, phoneme='t', position='initial', allow_set=frozenset({'top','tap'})),
's': pair_driven._words_with_phoneme_at_position(word_df=store.df, phoneme='s', position='initial', allow_set=frozenset({'sop','soup'})),
}
print('buckets:', buckets)
rows = pair_driven.resolve_multopp_join(buckets=buckets, sel_df=sel_df, verb_candidates=frozenset({'eat','drink','make'}), band='fineweb_adult')
print('shape:', rows.shape)
print(rows.head(5))
"
- [ ] Step 3.5: Commit
Task 4: migrate _pick_discourse_subjects to pair_driven
Files:
- Modify: <spike>/paragraph_csp.py
Replace the existing helper that probes via solve_shape (which no longer fires contrastive constraints) with one that probes via pair_driven.solve().
- [ ] Step 4.1: Read current implementation
sed -n '105,145p' /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/paragraph_csp.py
Note the function signature, return shape, and what fixtures call it.
- [ ] Step 4.2: Write failing test (or convert existing if any)
A simple smoke test:
def test_pick_discourse_subjects_returns_top_n(store, sel_df, skeletons_df):
    import paragraph_csp
    spec_words = paragraph_csp.spec_lexicon(store, "spec1") if hasattr(paragraph_csp, "spec_lexicon") else None
    subjects = paragraph_csp._pick_discourse_subjects(
        verb_candidates=frozenset({"cut", "see", "make"}),
        band="fineweb_adult",
        domain_words=spec_words or frozenset({"cat", "dog", "kid"}),
        word_df=store.df,
        sel_df=sel_df,
        pairs_df=store.pairs_df,
        skeletons_df=skeletons_df,
        top_n=3,
    )
    assert isinstance(subjects, list)
    assert len(subjects) <= 3
- [ ] Step 4.3: Rewrite _pick_discourse_subjects
def _pick_discourse_subjects(
    *,
    verb_candidates: frozenset[str],
    band: str,
    domain_words: frozenset[str],
    word_df: pl.DataFrame,
    sel_df: pl.DataFrame,
    pairs_df: pl.DataFrame | None,
    skeletons_df: pl.DataFrame,
    top_n: int = 3,
) -> list[str]:
    """Top-N candidate discourse subjects via unconstrained pair_driven.solve.

    Picks unique nsubj (or filler in the nsubj-position role) values across
    multiple skeletons. No phonological constraints: discourse subjects are
    discovered from the lexicon's PMI signal.
    """
    import pair_driven
    # NOTE: verb_candidates is kept in the signature but not forwarded;
    # pair_driven.solve computes its own verb candidates.
    candidates = pair_driven.solve(
        spec_words=domain_words,
        word_df=word_df,
        sel_df=sel_df,
        pairs_df=pairs_df,
        skeletons_df=skeletons_df,
        band=band,
        constraints=[],
        top_k=top_n * 4,  # over-fetch, dedup by subject
    )
    seen: dict[str, float] = {}
    for c in candidates:
        # The "discourse subject" is whichever filler ends up in nsubj
        subj = (
            c["filler_a"] if c.get("role_a") == "nsubj"
            else c["filler_b"] if c.get("role_b") == "nsubj"
            else None
        )
        if subj is None:
            continue
        if subj not in seen or c["ppmi_total"] > seen[subj]:
            seen[subj] = c["ppmi_total"]
    return [s for s, _ in sorted(seen.items(), key=lambda kv: -kv[1])[:top_n]]
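The dedup-and-rank step can be exercised without a solver. This sketch assumes candidates are dicts with the role_a/filler_a/role_b/filler_b/ppmi_total keys used above; `top_subjects` and the toy candidates are hypothetical.

```python
def top_subjects(candidates: list[dict], top_n: int) -> list[str]:
    """Keep the best ppmi_total per nsubj filler, then return the top_n subjects."""
    seen: dict[str, float] = {}
    for c in candidates:
        if c.get("role_a") == "nsubj":
            subj = c["filler_a"]
        elif c.get("role_b") == "nsubj":
            subj = c["filler_b"]
        else:
            continue  # no nsubj slot in this candidate
        seen[subj] = max(seen.get(subj, float("-inf")), c["ppmi_total"])
    return [s for s, _ in sorted(seen.items(), key=lambda kv: -kv[1])[:top_n]]


toy = [
    {"role_a": "nsubj", "filler_a": "cat", "role_b": "dobj", "filler_b": "ball", "ppmi_total": 3.0},
    {"role_a": "dobj", "filler_a": "bat", "role_b": "nsubj", "filler_b": "dog", "ppmi_total": 4.0},
    {"role_a": "nsubj", "filler_a": "cat", "role_b": "dobj", "filler_b": "bat", "ppmi_total": 5.0},
    {"role_a": "dobj", "filler_a": "ball", "role_b": "iobj", "filler_b": "kid", "ppmi_total": 9.0},
]
subjects = top_subjects(toy, top_n=2)
```

The last toy candidate has the highest score but no nsubj slot, so it contributes no subject; this is why the probe over-fetches with top_k=top_n * 4.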
- [ ] Step 4.4: Run, verify pass
- [ ] Step 4.5: Commit
Task 5: rewrite solve_paragraph non-multopp branch
Files:
- Modify: <spike>/paragraph_csp.py
The non-multopp branch: pick discourse subject → run independent per-sentence pair_driven.solve(locked_slots={"nsubj": subject}) for N sentences → assemble paragraph candidates by bounded cartesian.
- [ ] Step 5.1: Update ParagraphSpec shape
Drop verbs: tuple[str, ...]. Add n_sentences: int = 3.
@dataclass(frozen=True)
class ParagraphSpec:
    band: str
    constraints: tuple[Constraint, ...] = ()
    n_sentences: int = 3
    discourse_subject: str | None = None
    use_pronoun_coref: bool = True
    n_paragraphs: int = 5
    per_sentence_top_k: int = 4
    n_subject_seeds: int = 3
- [ ] Step 5.2: Write failing test
def test_solve_paragraph_no_constraint_returns_n_paragraphs(store, sel_df, skeletons_df):
    import paragraph_csp
    spec = paragraph_csp.ParagraphSpec(
        band="fineweb_adult",
        n_sentences=3,
        n_paragraphs=2,
    )
    spec_words = paragraph_csp.spec_lexicon(store, "spec1") if hasattr(paragraph_csp, "spec_lexicon") else frozenset({"cat", "dog", "kid", "bat", "ball"})
    paragraphs = paragraph_csp.solve_paragraph(
        spec=spec,
        spec_words=spec_words,
        word_df=store.df,
        sel_df=sel_df,
        pairs_df=store.pairs_df,
        skeletons_df=skeletons_df,
    )
    assert isinstance(paragraphs, list)
    assert len(paragraphs) <= 2
    for p in paragraphs:
        assert "sentences" in p
        assert len(p["sentences"]) == 3
- [ ] Step 5.3: Rewrite solve_paragraph (non-multopp branch only)
def solve_paragraph(
    *,
    spec: ParagraphSpec,
    spec_words: frozenset[str],
    word_df: pl.DataFrame,
    sel_df: pl.DataFrame,
    pairs_df: pl.DataFrame | None,
    skeletons_df: pl.DataFrame,
) -> list[dict]:
    """Constraint-driven paragraph resolver."""
    from itertools import product

    import pair_driven
    from constraint_surface import MultoppConstraint

    constraints = list(spec.constraints)
    # Multopp branch: handled in Task 6
    multopp = [c for c in constraints if isinstance(c, MultoppConstraint)]
    if multopp:
        # Placeholder until Task 6 lands
        raise NotImplementedError("multopp branch: Task 6")
    # Non-multopp branch
    # 1. Pick discourse subjects
    subjects: list[str]
    if spec.discourse_subject:
        subjects = [spec.discourse_subject]
    else:
        # Probe via `_pick_discourse_subjects` (Task 4). No verb filter is
        # passed: pair_driven.solve computes verb candidates itself.
        subjects = _pick_discourse_subjects(
            verb_candidates=frozenset(),
            band=spec.band,
            domain_words=spec_words,
            word_df=word_df,
            sel_df=sel_df,
            pairs_df=pairs_df,
            skeletons_df=skeletons_df,
            top_n=spec.n_subject_seeds,
        )
    # 2. For each subject, run N independent per-sentence solves
    paragraphs: list[dict] = []
    for subject in subjects:
        per_sentence_candidates: list[list[dict]] = []
        for sent_idx in range(spec.n_sentences):
            sent_constraints = list(constraints)
            # First sentence carries the contrast; others run unconstrained-by-contrast
            if sent_idx > 0:
                sent_constraints = [
                    c for c in sent_constraints
                    if c.type not in ("contrastive_minpair", "contrastive_maxopp")
                ]
            cands = pair_driven.solve(
                spec_words=spec_words,
                word_df=word_df,
                sel_df=sel_df,
                pairs_df=pairs_df,
                skeletons_df=skeletons_df,
                band=spec.band,
                constraints=sent_constraints,
                locked_slots={"nsubj": subject},
                top_k=spec.per_sentence_top_k,
            )
            if not cands:
                break
            per_sentence_candidates.append(cands)
        if len(per_sentence_candidates) < spec.n_sentences:
            continue
        # 3. Bounded cartesian: at most per_sentence_top_k ** n_sentences compositions
        for combo in product(*per_sentence_candidates):
            sentences = list(combo)
            paragraphs.append({
                "discourse_subject": subject,
                "sentences": sentences,
                "score": sum(s["ppmi_total"] for s in sentences),
            })
    paragraphs.sort(key=lambda p: -p["score"])
    return paragraphs[:spec.n_paragraphs]
The discourse-subject probe does not need its own verb_candidates set: verb candidates are computed inside pair_driven.solve() for each per-sentence call, so the sketch passes frozenset() to _pick_discourse_subjects, matching Task 4's signature.
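The cartesian step is bounded by per_sentence_top_k ** n_sentences per subject (4 ** 3 = 64 with the defaults above). A minimal sketch of the composition step, assuming each candidate carries a ppmi_total; `top_compositions` is a hypothetical helper that uses heapq.nlargest so only the best n combinations are retained instead of sorting the full product:

```python
import heapq
from itertools import product


def top_compositions(per_sentence: list[list[dict]], n: int) -> list[tuple[dict, ...]]:
    """Return the n highest-scoring sentence combinations, best first."""
    return heapq.nlargest(
        n,
        product(*per_sentence),
        key=lambda combo: sum(s["ppmi_total"] for s in combo),
    )


# Made-up candidates: two options for each of two sentence positions.
per_sentence = [
    [{"sentence": "A1", "ppmi_total": 2.0}, {"sentence": "A2", "ppmi_total": 1.0}],
    [{"sentence": "B1", "ppmi_total": 3.0}, {"sentence": "B2", "ppmi_total": 0.5}],
]
best = top_compositions(per_sentence, n=2)
labels = [[s["sentence"] for s in combo] for combo in best]
```

Sorting the whole list, as the plan's sketch does, is equally valid at these sizes; the heap variant only matters if per_sentence_top_k or n_sentences grows.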
- [ ] Step 5.4: Run, verify the no-constraint test passes
- [ ] Step 5.5: Commit
Task 6: multopp branch in solve_paragraph
Files:
- Modify: <spike>/paragraph_csp.py
- Modify: <spike>/test_paragraph_csp.py
Wire the multopp join into solve_paragraph. Each multopp join row produces ONE paragraph: N+1 sentences sharing (verb, role, discourse_subject), each using one of the N+1 fillers in the locked role.
- [ ] Step 6.1: Write failing test
def test_solve_paragraph_multopp_returns_n_plus_1_sentences(store, sel_df, skeletons_df):
    from constraint_surface import MultoppConstraint
    import paragraph_csp
    spec = paragraph_csp.ParagraphSpec(
        band="fineweb_adult",
        constraints=(MultoppConstraint(
            substitute="t", targets=("s", "ʃ"), n_targets=2,
            position="initial",
        ),),
        n_paragraphs=2,
    )
    spec_words = frozenset({"top", "tap", "sop", "soup", "ship", "shop", "kid"})
    paragraphs = paragraph_csp.solve_paragraph(
        spec=spec,
        spec_words=spec_words,
        word_df=store.df,
        sel_df=sel_df,
        pairs_df=store.pairs_df,
        skeletons_df=skeletons_df,
    )
    if not paragraphs:
        pytest.skip("no multopp paragraphs survived join; adjust filler set")
    for p in paragraphs:
        # Multopp: 1 sub + 2 targets = 3 sentences
        assert len(p["sentences"]) == 3
        # All sentences share the same verb (lock invariant)
        verbs = {s["verb"] for s in p["sentences"]}
        assert len(verbs) == 1
- [ ] Step 6.2: Run, verify fail (NotImplementedError or AssertionError)
- [ ] Step 6.3: Implement multopp branch in solve_paragraph
Replace the raise NotImplementedError(...) with:
    if multopp:
        if len(multopp) > 1:
            raise ValueError("at most one multopp constraint per request")
        cc = multopp[0]
        # Reject Minpair/Maxopp + Multopp combination
        if any(c.type in ("contrastive_minpair", "contrastive_maxopp") for c in constraints):
            raise ValueError("Multopp + Minpair/Maxopp simultaneously not supported")
        # Everything except the multopp constraint is forwarded to the per-slot solves
        other_constraints = [c for c in constraints if not isinstance(c, MultoppConstraint)]
        # Resolve filler buckets
        allow_sets = pair_driven.resolve_per_slot_allow_sets(
            spec_words=spec_words, word_df=word_df,
            constraints=other_constraints,
            slot_types=("nsubj", "dobj", "iobj"),
        )
        # Multopp fillers come from full lexicon ∩ phono filter ∩ spec
        # (same allow set used for nsubj/dobj suffices)
        filler_allow = allow_sets["dobj"]  # default; cc.slots can override
        buckets = pair_driven._resolve_multopp_buckets(
            constraint=cc, word_df=word_df, allow_set=filler_allow,
        )
        # Verb candidates (same logic as PHON-112)
        verb_full = pair_driven.resolve_per_slot_allow_sets(
            spec_words=frozenset(), word_df=word_df,
            constraints=other_constraints,
            slot_types=("V",),
        )["V"]
        verb_set = pair_driven.compute_verb_candidates(
            spec_words=verb_full, word_df=word_df, sel_df=sel_df, band=spec.band,
        )
        # N+1-way join
        joined = pair_driven.resolve_multopp_join(
            buckets=buckets,
            sel_df=sel_df,
            verb_candidates=verb_set,
            band=spec.band,
            slots=cc.slots,
        )
        if joined.height == 0:
            return []
        # Top join rows by total_ppmi
        joined = joined.sort("total_ppmi", descending=True).head(spec.n_paragraphs * 2)
        # For each join row, pick a discourse subject and realize N+1 sentences
        paragraphs: list[dict] = []
        for row in joined.iter_rows(named=True):
            verb = row["verb"]
            role = row["role"]
            # NOTE: if role == "nsubj", the "nsubj" lock below overrides the
            # filler lock; restrict cc.slots to non-subject roles to avoid this.
            # Discourse subject: probe with verb locked using one filler
            sample_filler = row[f"bucket_{cc.substitute}_filler"]
            subject_probe = pair_driven.solve(
                spec_words=spec_words, word_df=word_df, sel_df=sel_df,
                pairs_df=pairs_df, skeletons_df=skeletons_df,
                band=spec.band,
                constraints=other_constraints,
                locked_slots={"V": verb, role: sample_filler},
                top_k=1,
            )
            if not subject_probe:
                continue
            probe = subject_probe[0]
            # Same nsubj extraction as _pick_discourse_subjects (Task 4)
            subject = (
                probe["filler_a"] if probe.get("role_a") == "nsubj"
                else probe["filler_b"] if probe.get("role_b") == "nsubj"
                else None
            )
            if subject is None:
                continue
            # Realize N+1 sentences, one per filler
            sentences = []
            for phoneme in buckets:
                filler = row[f"bucket_{phoneme}_filler"]
                sent_cands = pair_driven.solve(
                    spec_words=spec_words, word_df=word_df, sel_df=sel_df,
                    pairs_df=pairs_df, skeletons_df=skeletons_df,
                    band=spec.band,
                    constraints=other_constraints,
                    locked_slots={"V": verb, role: filler, "nsubj": subject},
                    top_k=1,
                )
                if not sent_cands:
                    sentences = []
                    break
                sentences.append(sent_cands[0])
            if not sentences:
                continue
            paragraphs.append({
                "discourse_subject": subject,
                "sentences": sentences,
                "score": row["total_ppmi"],
            })
        return paragraphs[:spec.n_paragraphs]
This is the most complex single step in the plan. Read the spec carefully before implementing.
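The core of the branch is turning one wide join row into N+1 lock dictionaries. A small sketch, assuming the bucket_<phoneme>_filler column naming from Task 3; `locked_slots_per_sentence` is a hypothetical helper and the row values are made up. It also makes the subject-role collision explicit, since a Python dict keeps only the last value for a repeated key.

```python
def locked_slots_per_sentence(row: dict, phonemes: list[str], subject: str) -> list[dict]:
    """Build one locked_slots dict per bucket phoneme from a wide join row."""
    if row["role"] == "nsubj":
        # The filler lock and the subject lock would share the "nsubj" key.
        raise ValueError("multopp role must not be nsubj when a discourse subject is locked")
    return [
        {"V": row["verb"], row["role"]: row[f"bucket_{p}_filler"], "nsubj": subject}
        for p in phonemes
    ]


row = {
    "verb": "make", "role": "dobj",
    "bucket_t_filler": "top", "bucket_s_filler": "soup", "bucket_ʃ_filler": "ship",
    "total_ppmi": 5.1,
}
locks = locked_slots_per_sentence(row, ["t", "s", "ʃ"], subject="kid")
```

Every lock shares the verb and the subject and differs only in the contrast filler, which is the invariant the Step 6.1 test checks through the shared verb.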
- [ ] Step 6.4: Run, verify pass (or skip-on-data)
- [ ] Step 6.5: Commit
Task 7: pronoun coref + discourse markers
Files:
- Modify: <spike>/paragraph_csp.py
- Modify: <spike>/test_paragraph_csp.py
Apply pronoun coref to sentences 2..N+1 when use_pronoun_coref=True. Apply discourse markers (e.g., "Then,", "After that,") at random to sentences 2..N+1.
- [ ] Step 7.1: Append failing test
def test_solve_paragraph_pronoun_coref(store, sel_df, skeletons_df):
    import paragraph_csp
    spec = paragraph_csp.ParagraphSpec(
        band="fineweb_adult", n_sentences=3, n_paragraphs=1,
        discourse_subject="cat",  # locked to test coref
        use_pronoun_coref=True,
    )
    spec_words = frozenset({"cat", "dog", "kid", "bat", "ball", "chase"})
    paragraphs = paragraph_csp.solve_paragraph(
        spec=spec, spec_words=spec_words, word_df=store.df, sel_df=sel_df,
        pairs_df=store.pairs_df, skeletons_df=skeletons_df,
    )
    if not paragraphs:
        pytest.skip("no paragraphs survived")
    p = paragraphs[0]
    sentences = [s["sentence"] for s in p["sentences"]]
    # First sentence has the explicit subject
    assert "cat" in sentences[0].lower()
    # Sentences 2-3 use a pronoun in place of "the cat".
    # Note: the existing _pronoun_for helper returns "it" for inanimate singular,
    # "they" for plural.
    for s in sentences[1:]:
        assert "the cat" not in s.lower()
- [ ] Step 7.2: Implement coref
Reuse the existing _pronoun_for(noun) -> str helper. After realizing sentences 2..N+1, post-process the rendered sentence to substitute the pronoun for the discourse subject.
def _apply_coref(sentences: list[dict], discourse_subject: str) -> list[dict]:
    """Substitute pronouns for the discourse subject in sentences 2..N+1."""
    if len(sentences) <= 1:
        return sentences
    pronoun = _pronoun_for(discourse_subject)
    out = [sentences[0]]
    for s in sentences[1:]:
        # Replace "The <subject>" with the capitalized pronoun and
        # "the <subject>" with the lower-case pronoun
        sentence_text = s["sentence"]
        sentence_text = sentence_text.replace(
            f"The {discourse_subject}", pronoun.capitalize()
        )
        sentence_text = sentence_text.replace(
            f"the {discourse_subject}", pronoun
        )
        out.append({**s, "sentence": sentence_text})
    return out
Apply this after the per-sentence solves in both branches (multopp and non-multopp).
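One caveat with plain str.replace is that it also matches inside longer words. The sketch below shows the effect on a made-up sentence and one possible word-boundary variant; `replace_subject` is a hypothetical helper, not part of paragraph_csp.py, and the pronoun is passed in directly instead of coming from _pronoun_for.

```python
import re


def replace_subject(text: str, subject: str, pronoun: str) -> str:
    """Replace 'the <subject>' / 'The <subject>' only at word boundaries."""
    pattern = re.compile(rf"\b(the)\s+{re.escape(subject)}\b", flags=re.IGNORECASE)

    def _sub(match: re.Match) -> str:
        # Preserve sentence-initial capitalization of the replaced article.
        return pronoun.capitalize() if match.group(1)[0].isupper() else pronoun

    return pattern.sub(_sub, text)


text = "The cat saw the caterpillar."
naive = text.replace("The cat", "It").replace("the cat", "it")
bounded = replace_subject(text, "cat", "it")
```

Whether this matters in practice depends on how often a spec lexicon contains a word that starts with the discourse subject; the simpler replace may be fine for the spike.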
- [ ] Step 7.3: Discourse markers
Add "Then,", "After that,", "Finally," to sentences 2..N+1 in random order. Keep simple:
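DISCOURSE_MARKERS = ["Then,", "After that,", "Finally,"]


def _apply_discourse_markers(sentences: list[dict], rng_seed: int = 0) -> list[dict]:
    if len(sentences) <= 1:
        return sentences
    import random
    rng = random.Random(rng_seed)
    markers = rng.sample(DISCOURSE_MARKERS, k=min(len(sentences) - 1, len(DISCOURSE_MARKERS)))
    out = [sentences[0]]
    for i, s in enumerate(sentences[1:]):
        if i >= len(markers):
            # More follow-on sentences than markers: keep the sentence unmarked.
            # (A bare zip() over sentences and markers would silently drop it.)
            out.append(s)
            continue
        text = s["sentence"]
        # Lower-case the old sentence-initial letter: "Then, it ..." not "Then, It ..."
        out.append({**s, "sentence": f"{markers[i]} {text[:1].lower()}{text[1:]}"})
    return out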
Apply after coref. The seed could be derived from the paragraph index for determinism.
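Deriving the seed from the paragraph index could look like the following sketch. `markers_for` and `base_seed` are hypothetical names; the only property relied on is that random.Random with a fixed seed yields the same sample on every run.

```python
import random

DISCOURSE_MARKERS = ["Then,", "After that,", "Finally,"]


def markers_for(paragraph_index: int, n_sentences: int, base_seed: int = 0) -> list[str]:
    """Pick markers for sentences 2..N of one paragraph, reproducibly."""
    rng = random.Random(base_seed + paragraph_index)
    # At most one marker per follow-on sentence, and never more than exist.
    k = min(max(n_sentences - 1, 0), len(DISCOURSE_MARKERS))
    return rng.sample(DISCOURSE_MARKERS, k=k)


first = markers_for(paragraph_index=0, n_sentences=3)
again = markers_for(paragraph_index=0, n_sentences=3)
```

Different paragraphs get different seeds and so may get different marker orders, while re-running the same request reproduces the same text.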
- [ ] Step 7.4: Run, verify pass
- [ ] Step 7.5: Commit
Task 8: subject diversification
Files:
- Modify: <spike>/paragraph_csp.py
- Modify: <spike>/test_paragraph_csp.py
Top-K paragraphs by score, but with distinct discourse subjects.
- [ ] Step 8.1: Existing _diversify_by_subject helper
The original paragraph_csp.py has _diversify_by_subject — reuse if its signature is compatible. If not, write a new one:
def _diversify_by_subject(paragraphs: list[dict], n: int) -> list[dict]:
    """Top-N paragraphs by score, with distinct discourse subjects when possible."""
    seen_subjects: set[str] = set()
    diversified: list[dict] = []
    for p in paragraphs:
        if p["discourse_subject"] in seen_subjects:
            continue
        diversified.append(p)
        seen_subjects.add(p["discourse_subject"])
        if len(diversified) >= n:
            break
    if len(diversified) < n:
        # Backfill with non-diversified
        for p in paragraphs:
            if p not in diversified:
                diversified.append(p)
                if len(diversified) >= n:
                    break
    return diversified
- [ ] Step 8.2: Wire into solve_paragraph
In both branches (multopp and non-multopp), apply _diversify_by_subject(all_paragraphs, n=spec.n_paragraphs) before returning.
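A toy run shows what the two passes do. The sketch assumes the input is already sorted by score, best first; `diversify` is a compact restatement of the helper above and the paragraphs are made up.

```python
def diversify(paragraphs: list[dict], n: int) -> list[dict]:
    """First pass: one paragraph per subject. Second pass: backfill to n."""
    picked: list[dict] = []
    seen: set[str] = set()
    for p in paragraphs:
        if p["discourse_subject"] not in seen:
            picked.append(p)
            seen.add(p["discourse_subject"])
        if len(picked) >= n:
            return picked
    for p in paragraphs:
        if p not in picked:
            picked.append(p)
        if len(picked) >= n:
            break
    return picked


ranked = [
    {"discourse_subject": "cat", "score": 9},
    {"discourse_subject": "cat", "score": 8},
    {"discourse_subject": "dog", "score": 7},
    {"discourse_subject": "cat", "score": 6},
]
scores = [p["score"] for p in diversify(ranked, n=3)]
```

Note that after backfilling, the result is no longer strictly sorted by score: distinct subjects come first, repeats follow. Callers that need score order may want to re-sort.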
- [ ] Step 8.3: Test diversification
def test_solve_paragraph_diversifies_subjects(store, sel_df, skeletons_df):
    import paragraph_csp
    spec = paragraph_csp.ParagraphSpec(
        band="fineweb_adult", n_sentences=2, n_paragraphs=3, n_subject_seeds=5,
    )
    spec_words = frozenset({"cat", "dog", "kid", "bat", "ball"})
    paragraphs = paragraph_csp.solve_paragraph(
        spec=spec, spec_words=spec_words, word_df=store.df, sel_df=sel_df,
        pairs_df=store.pairs_df, skeletons_df=skeletons_df,
    )
    if len(paragraphs) >= 2:
        subjects = [p["discourse_subject"] for p in paragraphs]
        # Assumes at least 2 distinct subjects were available for this lexicon;
        # if so, the returned paragraphs should include them
        assert len(set(subjects)) >= 2
- [ ] Step 8.4: Commit
Task 9: minpair/maxopp paragraph integration test
Files:
- Modify: <spike>/test_paragraph_csp.py
The non-multopp branch already has the routing (Task 5: first sentence carries contrast). This task verifies it works end-to-end.
- [ ] Step 9.1: Append test
def test_solve_paragraph_minpair_carried_by_first_sentence(store, sel_df, skeletons_df):
    from constraint_surface import MinpairConstraint
    import paragraph_csp
    spec = paragraph_csp.ParagraphSpec(
        band="fineweb_adult",
        n_sentences=3,
        n_paragraphs=1,
        constraints=(MinpairConstraint(phoneme1="d", phoneme2="z", position="final"),),
    )
    spec_words = paragraph_csp.spec_lexicon(store, "spec1") if hasattr(paragraph_csp, "spec_lexicon") else frozenset()
    paragraphs = paragraph_csp.solve_paragraph(
        spec=spec, spec_words=spec_words, word_df=store.df, sel_df=sel_df,
        pairs_df=store.pairs_df, skeletons_df=skeletons_df,
    )
    if not paragraphs:
        pytest.skip("no paragraphs survived")
    p = paragraphs[0]
    # First sentence: the contrast pair landed somewhere in the (filler_a, filler_b)
    s1 = p["sentences"][0]
    # This is a smoke check; deeper validation would verify (filler_a, filler_b) is a real (d, z) final pair
    assert "filler_a" in s1
    assert "filler_b" in s1
- [ ] Step 9.2: Run, verify pass
- [ ] Step 9.3: Commit
Task 10: drop ParagraphSpec.verbs, retire v1 helpers
Files:
- Modify: <spike>/paragraph_csp.py
- Modify: <spike>/test_paragraph_csp.py and any other tests that reference ParagraphSpec.verbs
- [ ] Step 10.1: Search for usages
grep -rn "ParagraphSpec(verbs=\|spec\.verbs\|verbs=verbs\|\.verbs" packages/generation/research/2026-05-07-sentence-generation-paradigms/
Identify all callers/tests that use verbs=... argument or read spec.verbs.
- [ ] Step 10.2: Update each caller
For each caller, replace verbs=... with n_sentences=.... The call-signature change is one line per caller.
- [ ] Step 10.3: Delete v1-only helpers
_solve_sentence (replaced by inline pair_driven.solve calls in solve_paragraph) — delete.
Other helpers that depended on solve_shape's pre-PHON-112 contrastive path — delete or rewrite.
- [ ] Step 10.4: Run full spike suite
cd /Users/jneumann/Repos/PhonoLex/packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/ -v
Expected: all tests pass; no verbs= references remain.
- [ ] Step 10.5: Commit
Task 11: paragraph eval harness update
Files:
- Modify: <spike>/build_judging_set.py
The eval harness's paragraph branch still uses ParagraphSpec(verbs=verbs, ...). Update to the new signature.
- [ ] Step 11.1: Inspect current paragraph branch
grep -n "ParagraphSpec\|PARAGRAPH_CHAINS\|build_paragraph_requests" packages/generation/research/2026-05-07-sentence-generation-paradigms/build_judging_set.py
- [ ] Step 11.2: Rewrite
Replace the for chain_label, verbs in PARAGRAPH_CHAINS: loop with one keyed on (label, n_sentences) instead of (label, verbs):
PARAGRAPH_CONFIGS = [
    ("para_3_sent", 3),
    ("para_5_sent", 5),
]
for label, n_sent in PARAGRAPH_CONFIGS:
    for spec_name in SPECS:
        # spec_words: the lexicon for spec_name, resolved as the harness already does
        for band in BANDS:
            for c_label, constraints in CONSTRAINT_CONFIGS.items():
                spec = ParagraphSpec(
                    band=band,
                    constraints=tuple(constraints),
                    n_sentences=n_sent,
                    n_paragraphs=PARAGRAPH_TOP_K,
                )
                paragraphs = solve_paragraph(
                    spec=spec, spec_words=spec_words,
                    word_df=store.df, sel_df=sel_df,
                    pairs_df=store.pairs_df,
                    skeletons_df=skeletons_df,
                )
                # ...
- [ ] Step 11.3: Smoke test
cd /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms && \
uv run python build_judging_set.py --dry-run 2>&1 | tail -20
- [ ] Step 11.4: Commit
Task 12: final verification
- [ ] Step 12.1: Full spike suite
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/ -v
Expected: all passing.
- [ ] Step 12.2: Data layer suite
cd /Users/jneumann/Repos/PhonoLex && \
uv run python -m pytest packages/data/tests/ -q
Expected: 209 passed (no regressions).
- [ ] Step 12.3: Smoke — multopp paragraph
cd /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms && \
uv run python -c "
import polars as pl
from pathlib import Path
from phonolex_data.runtime.store import WordStore
import paragraph_csp
from constraint_surface import MultoppConstraint
repo = Path('/Users/jneumann/Repos/PhonoLex')
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
skeletons_df = pl.read_parquet(Path('outputs') / 'skeletons.parquet')
spec_words = paragraph_csp.spec_lexicon(store, 'spec1') if hasattr(paragraph_csp, 'spec_lexicon') else frozenset()
spec = paragraph_csp.ParagraphSpec(
band='fineweb_adult',
constraints=(MultoppConstraint(substitute='t', targets=('s', 'ʃ'), n_targets=2, position='initial'),),
n_paragraphs=2,
)
paragraphs = paragraph_csp.solve_paragraph(
spec=spec, spec_words=spec_words, word_df=store.df, sel_df=sel_df,
pairs_df=store.pairs_df, skeletons_df=skeletons_df,
)
print(f'{len(paragraphs)} multopp paragraphs')
for p in paragraphs[:2]:
    print(f'subject={p[\"discourse_subject\"]} score={p[\"score\"]:.2f}')
    for s in p['sentences']:
        print(f' {s[\"sentence\"]}')
"
Expected: 1-2 paragraphs printed, each with 3 sentences (1 substitute + 2 targets) sharing the same verb.
- [ ] Step 12.4: Smoke — minpair paragraph
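cd /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms && \
uv run python -c "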
import polars as pl
from pathlib import Path
from phonolex_data.runtime.store import WordStore
import paragraph_csp
from constraint_surface import MinpairConstraint
repo = Path('/Users/jneumann/Repos/PhonoLex')
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
skeletons_df = pl.read_parquet(Path('outputs') / 'skeletons.parquet')
spec_words = paragraph_csp.spec_lexicon(store, 'spec1') if hasattr(paragraph_csp, 'spec_lexicon') else frozenset()
spec = paragraph_csp.ParagraphSpec(
band='fineweb_adult',
constraints=(MinpairConstraint(phoneme1='d', phoneme2='z', position='final'),),
n_sentences=3, n_paragraphs=1,
)
paragraphs = paragraph_csp.solve_paragraph(
spec=spec, spec_words=spec_words, word_df=store.df, sel_df=sel_df,
pairs_df=store.pairs_df, skeletons_df=skeletons_df,
)
print(f'{len(paragraphs)} minpair paragraphs')
for p in paragraphs:
    for s in p['sentences']:
        print(f' {s[\"sentence\"]}')
"
Expected: 1 paragraph, 3 sentences. First sentence's (filler_a, filler_b) is a real (d, z) final pair.
- [ ] Step 12.5: No further commit needed
Done
After Task 12 verification, PHON-113 v1 is complete. Stack continues toward PHON-107 (reranker v2) and PHON-109 productionization.
Follow-ups not in this plan:
- PHON-107 — reranker v2 trains on both single-sentence and paragraph output from the constraint-driven path
- PHON-109 — productionize: replace /api/generate-single and /api/generate-paragraph internals
- PHON-110 — frontend reframe (top-K candidates UI)