
PHON-113 — Paragraph CSP Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Rewrite paragraph composition on top of pair_driven.solve() so contrastive constraints fire per-sentence; add MultoppConstraint as a paragraph-native constraint via (N+1)-way selectional self-join.

Architecture: Multopp produces a join row (verb, role, sub_word, target_words[N]); non-multopp paragraphs run independent per-sentence solves with a shared discourse subject. Verbs fall out of the joins. Cheap coherence (subject + coref + agreement + markers + variety) is preserved; semantic coherence is the reranker's job.

Tech Stack: Polars eager joins, pytest TDD, frozen dataclasses, reuses PHON-112's pair_driven helpers.


File map

Files modified:
- <spike>/pair_driven.py: add _words_with_phoneme_at_position, _resolve_multopp_buckets, resolve_multopp_join
- <spike>/paragraph_csp.py: replace solve_paragraph body, drop ParagraphSpec.verbs, migrate helpers
- <spike>/test_paragraph_csp.py (create or reuse existing): paragraph behavior tests
- <spike>/test_pair_driven_solve.py: multopp join tests

Where <spike> = /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/.

Tests run via cd /Users/jneumann/Repos/PhonoLex/packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/ -v.


Task 1: phoneme-position helper

Files: - Modify: <spike>/pair_driven.py - Modify: <spike>/test_pair_driven_solve.py

Helper that returns the set of words containing a given phoneme at a given position, computed from words.parquet's phonemes column.
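
The four position values can be pinned down on plain Python lists before touching Polars. This is a minimal sketch of the intended semantics only: `phoneme_at` and the toy transcriptions are hypothetical and are not part of pair_driven.py or words.parquet.

```python
def phoneme_at(phonemes: list[str], phoneme: str, position: str) -> bool:
    """Pure-Python reference for the position semantics (illustrative only)."""
    if not phonemes:
        return False
    if position == "initial":
        return phonemes[0] == phoneme
    if position == "final":
        return phonemes[-1] == phoneme
    if position == "medial":
        # Strictly between first and last; words shorter than 3 phonemes have none.
        return phoneme in phonemes[1:-1]
    if position == "any":
        return phoneme in phonemes
    raise ValueError(f"unknown position: {position}")


# Toy transcriptions, assumed for illustration (not read from words.parquet).
toy = {
    "cat": ["k", "æ", "t"],
    "kid": ["k", "ɪ", "d"],
    "test": ["t", "ɛ", "s", "t"],
}
initial_k = {w for w, ph in toy.items() if phoneme_at(ph, "k", "initial")}
medial_s = {w for w, ph in toy.items() if phoneme_at(ph, "s", "medial")}
```

Note that "medial" excludes both edges, so a one- or two-phoneme word can never match it. The tests in Step 1.1 do not cover that case, and it may be worth adding one.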

  • [ ] Step 1.1: Append failing tests
def test_words_with_phoneme_at_position_initial(store):
    # /k/ at initial: cat, kid, key, etc.
    words = pair_driven._words_with_phoneme_at_position(
        word_df=store.df, phoneme="k", position="initial",
        allow_set=frozenset({"cat", "kid", "key", "bat", "bid"}),
    )
    assert "cat" in words
    assert "kid" in words
    assert "key" in words
    assert "bat" not in words
    assert "bid" not in words


def test_words_with_phoneme_at_position_final(store):
    # /d/ at final
    words = pair_driven._words_with_phoneme_at_position(
        word_df=store.df, phoneme="d", position="final",
        allow_set=frozenset({"bed", "bad", "bat", "kid"}),
    )
    assert "bed" in words
    assert "bad" in words
    assert "kid" in words
    assert "bat" not in words


def test_words_with_phoneme_at_position_any(store):
    words = pair_driven._words_with_phoneme_at_position(
        word_df=store.df, phoneme="t", position="any",
        allow_set=frozenset({"cat", "tap", "kid", "test"}),
    )
    assert "cat" in words
    assert "tap" in words
    assert "test" in words
    assert "kid" not in words
  • [ ] Step 1.2: Run, verify fail (AttributeError)

  • [ ] Step 1.3: Implement helper

Append to pair_driven.py:

def _words_with_phoneme_at_position(
    *,
    word_df: pl.DataFrame,
    phoneme: str,
    position: str,  # "initial" | "medial" | "final" | "any"
    allow_set: frozenset[str],
) -> frozenset[str]:
    """Return words in `allow_set` whose `phonemes` list contains `phoneme`
    at `position` (initial=first, final=last, medial=anywhere except first/last,
    any=anywhere)."""
    df = word_df.filter(pl.col("word").is_in(list(allow_set)))
    if position == "initial":
        df = df.filter(pl.col("phonemes").list.first() == phoneme)
    elif position == "final":
        df = df.filter(pl.col("phonemes").list.last() == phoneme)
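    elif position == "medial":
        # phonemes[1:-1] contains phoneme. Cast before subtracting so that
        # words with fewer than 2 phonemes cannot underflow the unsigned length.
        inner_len = (pl.col("phonemes").list.len().cast(pl.Int64) - 2).clip(lower_bound=0)
        df = df.filter(
            pl.col("phonemes").list.slice(1, inner_len).list.contains(phoneme)
        )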
    elif position == "any":
        df = df.filter(pl.col("phonemes").list.contains(phoneme))
    else:
        raise ValueError(f"unknown position: {position}")
    return frozenset(df["word"].to_list())
  • [ ] Step 1.4: Run, verify 3 pass

  • [ ] Step 1.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/pair_driven.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_pair_driven_solve.py
git commit -m "$(cat <<'EOF'
PHON-113: add _words_with_phoneme_at_position helper

Returns words from allow_set whose phonemes list contains a target
phoneme at a target position. Supports initial/medial/final/any.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 2: multopp filler bucket resolver

Files: - Modify: <spike>/pair_driven.py - Modify: <spike>/test_pair_driven_solve.py

Given a MultoppConstraint and the spec lexicon, produce a dict mapping each filler-phoneme (substitute + N targets) to its word bucket.
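
The filler selection rule (substitute first, then only the first n_targets targets) can be sketched without the real lexicon. `ToyMultopp` is a hypothetical stand-in whose fields mirror the ones used in the tests below, and `toy_initial` is an assumed word-to-initial-phoneme lookup, not data from words.parquet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMultopp:
    """Hypothetical stand-in for MultoppConstraint (fields taken from the tests)."""
    substitute: str
    targets: tuple[str, ...]
    n_targets: int
    position: str


def filler_phonemes(c: ToyMultopp) -> tuple[str, ...]:
    # Substitute first, then only the first n_targets targets.
    return (c.substitute,) + c.targets[: c.n_targets]


c = ToyMultopp(substitute="t", targets=("s", "ʃ", "tʃ", "z"), n_targets=2, position="initial")
fillers = filler_phonemes(c)

# Assumed initial phonemes for a toy lexicon.
toy_initial = {"top": "t", "sop": "s", "shop": "ʃ", "chop": "tʃ", "zoo": "z"}
buckets = {
    p: frozenset(w for w, first in toy_initial.items() if first == p)
    for p in fillers
}
```

Because the dict is built by iterating `fillers`, a phoneme with no matching words still gets an (empty) bucket rather than being dropped, which is the behavior the first test's comment describes.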

  • [ ] Step 2.1: Append failing tests
def test_resolve_multopp_buckets_returns_one_bucket_per_phoneme(store):
    from constraint_surface import MultoppConstraint

    c = MultoppConstraint(
        substitute="t", targets=("s", "ʃ", "tʃ"), n_targets=3,
        position="initial",
    )
    spec = frozenset({"top", "sop", "shop", "chop", "cop", "kid"})  # cop, kid have neither
    buckets = pair_driven._resolve_multopp_buckets(
        constraint=c, word_df=store.df, allow_set=spec,
    )
    # 4 phonemes (1 sub + 3 targets); each maps to a (possibly empty) bucket
    assert set(buckets.keys()) == {"t", "s", "ʃ", "tʃ"}
    assert "top" in buckets["t"]
    assert "sop" in buckets["s"]
    # cop, kid don't have any of the 4 phonemes at initial → in no bucket
    for ws in buckets.values():
        assert "cop" not in ws
        assert "kid" not in ws


def test_resolve_multopp_buckets_respects_n_targets(store):
    from constraint_surface import MultoppConstraint

    c = MultoppConstraint(
        substitute="t", targets=("s", "ʃ", "tʃ", "z"), n_targets=2,  # only first 2 targets
        position="initial",
    )
    spec = frozenset({"top", "sop", "shop", "chop", "zoo"})
    buckets = pair_driven._resolve_multopp_buckets(
        constraint=c, word_df=store.df, allow_set=spec,
    )
    # n_targets=2 → 1 sub + 2 targets = 3 buckets
    assert set(buckets.keys()) == {"t", "s", "ʃ"}
    assert "tʃ" not in buckets
    assert "z" not in buckets
  • [ ] Step 2.2: Run, verify fail

  • [ ] Step 2.3: Implement helper

def _resolve_multopp_buckets(
    *,
    constraint: "MultoppConstraint",
    word_df: pl.DataFrame,
    allow_set: frozenset[str],
) -> dict[str, frozenset[str]]:
    """Return phoneme → words mapping for substitute + targets[:n_targets]
    at constraint.position."""
    fillers = (constraint.substitute,) + constraint.targets[:constraint.n_targets]
    return {
        phoneme: _words_with_phoneme_at_position(
            word_df=word_df,
            phoneme=phoneme,
            position=constraint.position,
            allow_set=allow_set,
        )
        for phoneme in fillers
    }

Add from constraint_surface import MultoppConstraint at top of pair_driven.py if not already there.

  • [ ] Step 2.4: Run, verify pass

  • [ ] Step 2.5: Commit


Task 3: multopp N+1-way join

Files: - Modify: <spike>/pair_driven.py - Modify: <spike>/test_pair_driven_solve.py

Given the filler buckets and a verb candidate set, find (verb, role) groups that have at least one representative from EVERY bucket.
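
The full-coverage requirement can be checked on toy rows in plain Python before writing the Polars version. This is a sketch under stated assumptions: `full_coverage_groups` is hypothetical, and the ppmi values are invented for illustration.

```python
from collections import defaultdict


def full_coverage_groups(rows, buckets):
    """Keep (verb, role) groups that have a filler in every bucket.

    rows: iterable of (verb, role, filler, ppmi) tuples.
    Returns {(verb, role): {phoneme: (best_filler, best_ppmi)}}.
    """
    best = defaultdict(dict)
    for verb, role, filler, ppmi in rows:
        for phoneme, words in buckets.items():
            if filler in words:
                current = best[(verb, role)].get(phoneme)
                # Keep the highest-ppmi representative per bucket.
                if current is None or ppmi > current[1]:
                    best[(verb, role)][phoneme] = (filler, ppmi)
    # A group survives only if every bucket has a representative.
    return {g: reps for g, reps in best.items() if len(reps) == len(buckets)}


buckets = {"t": frozenset({"top", "tap"}), "s": frozenset({"sop", "soup"})}
rows = [
    ("make", "dobj", "soup", 2.0),
    ("make", "dobj", "top", 1.5),
    ("make", "dobj", "tap", 0.5),
    ("drink", "dobj", "soup", 3.0),  # no /t/ filler for this group, so it is dropped
]
groups = full_coverage_groups(rows, buckets)
```

The Polars implementation gets the same effect from inner joins: a group missing from any per-bucket frame cannot survive the chain of joins.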

  • [ ] Step 3.1: Append failing tests
def test_resolve_multopp_join_returns_full_coverage_groups(store, sel_df):
    """A (verb, role) group with at least one filler from each bucket is returned."""
    buckets = {
        "t": frozenset({"top", "tap"}),
        "s": frozenset({"sop", "soup"}),
        "ʃ": frozenset({"ship", "shop"}),
    }
    rows = pair_driven.resolve_multopp_join(
        buckets=buckets,
        sel_df=sel_df,
        verb_candidates=frozenset({"see", "make", "drink"}),
        band="fineweb_adult",
    )
    # Contract: every surviving (verb, role) row carries one representative per
    # bucket, drawn from that bucket (wide schema from Step 3.3). The loop is
    # vacuous if the join is empty, which real data may legitimately produce.
    for row in rows.iter_rows(named=True):
        for phoneme, bucket_words in buckets.items():
            assert row[f"bucket_{phoneme}_filler"] in bucket_words


def test_resolve_multopp_join_empty_when_no_full_coverage(store, sel_df):
    """If no (verb, role) has all buckets covered, return empty."""
    buckets = {
        "zzz": frozenset({"phantom_word_zzz"}),  # nothing has rows
    }
    rows = pair_driven.resolve_multopp_join(
        buckets=buckets,
        sel_df=sel_df,
        verb_candidates=frozenset({"see"}),
        band="fineweb_adult",
    )
    assert rows.height == 0

The first test asserts against the wide output schema defined in Step 3.3 (one bucket_<phoneme>_filler column per bucket). If that schema changes, adjust the assertions to match.

  • [ ] Step 3.2: Run, verify fail

  • [ ] Step 3.3: Implement helper

The output schema: one row per (verb, role) group with all buckets covered, plus per-bucket selected representative (highest-ppmi from that bucket within the group).

def resolve_multopp_join(
    *,
    buckets: dict[str, frozenset[str]],
    sel_df: pl.DataFrame,
    verb_candidates: frozenset[str],
    band: str,
    slots: tuple[str, ...] | None = None,
) -> pl.DataFrame:
    """N+1-way self-join on (verb, role) requiring at least one filler from each bucket.

    Returns rows with columns:
        verb, role, band,
        bucket_<phoneme>_filler  (one column per bucket — the chosen filler word)
        bucket_<phoneme>_ppmi    (its ppmi)
        total_ppmi               (sum across buckets)

    If no group has full coverage, returns an empty frame.
    """
    if not buckets or any(len(v) == 0 for v in buckets.values()):
        return pl.DataFrame(schema={"verb": pl.Utf8, "role": pl.Utf8, "band": pl.Utf8, "total_ppmi": pl.Float32})

    all_filler_words: set[str] = set()
    for ws in buckets.values():
        all_filler_words.update(ws)

    sel_window = (
        sel_df
        .filter(pl.col("band") == band)
        .filter(pl.col("verb").is_in(list(verb_candidates)))
        .filter(pl.col("filler").is_in(list(all_filler_words)))
        .filter(pl.col("ppmi") > 0.0)
    )
    if slots is not None:
        sel_window = sel_window.filter(pl.col("role").is_in(list(slots)))

    # For each (verb, role) group, find the best representative per bucket
    # We do this by joining sel_window to itself once per bucket — N+1 self-joins
    rows_per_bucket = []
    for phoneme, bucket_words in buckets.items():
        b = sel_window.filter(pl.col("filler").is_in(list(bucket_words)))
        # Best ppmi per (verb, role) within this bucket
        best = (
            b.sort("ppmi", descending=True)
            .group_by(["verb", "role"])
            .agg([
                pl.col("filler").first().alias(f"bucket_{phoneme}_filler"),
                pl.col("ppmi").first().alias(f"bucket_{phoneme}_ppmi"),
            ])
        )
        rows_per_bucket.append(best)

    # Inner-join all per-bucket frames on (verb, role) — only groups with all buckets survive
    joined = rows_per_bucket[0]
    for next_df in rows_per_bucket[1:]:
        joined = joined.join(next_df, on=["verb", "role"], how="inner")

    # Compute total ppmi
    ppmi_cols = [f"bucket_{p}_ppmi" for p in buckets.keys()]
    joined = joined.with_columns(
        pl.sum_horizontal([pl.col(c) for c in ppmi_cols]).alias("total_ppmi")
    ).with_columns(pl.lit(band).alias("band"))

    return joined

The output is wide-format: one column per bucket. Downstream consumers iterate buckets to assemble per-sentence specs.

  • [ ] Step 3.4: Run, verify pass + smoke
cd /Users/jneumann/Repos/PhonoLex && uv run python -c "
import polars as pl
from pathlib import Path
from phonolex_data.runtime.store import WordStore
import sys; sys.path.insert(0, 'packages/generation/research/2026-05-07-sentence-generation-paradigms')
import pair_driven

store = WordStore.from_parquet(Path('data/runtime/words.parquet'))
sel_df = pl.read_parquet('data/runtime/selectional.parquet')
buckets = {
    't': pair_driven._words_with_phoneme_at_position(word_df=store.df, phoneme='t', position='initial', allow_set=frozenset({'top','tap'})),
    's': pair_driven._words_with_phoneme_at_position(word_df=store.df, phoneme='s', position='initial', allow_set=frozenset({'sop','soup'})),
}
print('buckets:', buckets)
rows = pair_driven.resolve_multopp_join(buckets=buckets, sel_df=sel_df, verb_candidates=frozenset({'eat','drink','make'}), band='fineweb_adult')
print('shape:', rows.shape)
print(rows.head(5))
"
  • [ ] Step 3.5: Commit
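
To make the wide-format output from Step 3.3 concrete, here is a sketch of how a downstream consumer might turn one join row into per-sentence slot locks. `sentence_locks` and the toy row are hypothetical. The `"V"` key follows the locked_slots usage in Task 6, and the column names follow the docstring in Step 3.3.

```python
def sentence_locks(row: dict, phonemes: list[str]) -> list[dict[str, str]]:
    """One locked-slot mapping per bucket, all sharing the row's verb and role."""
    return [
        {"V": row["verb"], row["role"]: row[f"bucket_{p}_filler"]}
        for p in phonemes
    ]


# Toy join row; values are invented for illustration.
row = {
    "verb": "make", "role": "dobj", "band": "fineweb_adult",
    "bucket_t_filler": "top", "bucket_t_ppmi": 1.5,
    "bucket_s_filler": "soup", "bucket_s_ppmi": 2.0,
    "total_ppmi": 3.5,
}
locks = sentence_locks(row, ["t", "s"])
```

Each mapping differs only in the filler for the shared role, which is what lets the paragraph present the contrast with everything else held constant.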

Task 4: migrate _pick_discourse_subjects to pair_driven

Files: - Modify: <spike>/paragraph_csp.py

Replace the existing helper that probes via solve_shape (which no longer fires contrastive constraints) with one that probes via pair_driven.solve().

  • [ ] Step 4.1: Read current implementation
sed -n '105,145p' /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms/paragraph_csp.py

Note the function signature, return shape, and what fixtures call it.

  • [ ] Step 4.2: Write failing test (or convert existing if any)

A simple smoke test:

def test_pick_discourse_subjects_returns_top_n(store, sel_df, skeletons_df):
    import paragraph_csp
    spec_words = paragraph_csp.spec_lexicon(store, "spec1") if hasattr(paragraph_csp, "spec_lexicon") else None
    subjects = paragraph_csp._pick_discourse_subjects(
        verb_candidates=frozenset({"cut", "see", "make"}),
        band="fineweb_adult",
        domain_words=spec_words or frozenset({"cat", "dog", "kid"}),
        word_df=store.df,
        sel_df=sel_df,
        pairs_df=store.pairs_df,
        skeletons_df=skeletons_df,
        top_n=3,
    )
    assert isinstance(subjects, list)
    assert len(subjects) <= 3
  • [ ] Step 4.3: Rewrite _pick_discourse_subjects
def _pick_discourse_subjects(
    *,
    verb_candidates: frozenset[str],  # not used below: pair_driven.solve derives its own verb set
    band: str,
    domain_words: frozenset[str],
    word_df: pl.DataFrame,
    sel_df: pl.DataFrame,
    pairs_df: pl.DataFrame | None,
    skeletons_df: pl.DataFrame,
    top_n: int = 3,
) -> list[str]:
    """Top-N candidate discourse subjects via unconstrained pair_driven.solve.

    Picks unique nsubj (or filler in the nsubj-position role) values across
    multiple skeletons. No phonological constraints — discourse subjects are
    discovered from the lexicon's PMI signal.
    """
    import pair_driven
    candidates = pair_driven.solve(
        spec_words=domain_words,
        word_df=word_df,
        sel_df=sel_df,
        pairs_df=pairs_df,
        skeletons_df=skeletons_df,
        band=band,
        constraints=[],
        top_k=top_n * 4,  # over-fetch, dedup by subject
    )
    seen: dict[str, float] = {}
    for c in candidates:
        # The "discourse subject" is whichever filler ends up in nsubj
        subj = (
            c["filler_a"] if c.get("role_a") == "nsubj"
            else c["filler_b"] if c.get("role_b") == "nsubj"
            else None
        )
        if subj is None:
            continue
        if subj not in seen or c["ppmi_total"] > seen[subj]:
            seen[subj] = c["ppmi_total"]
    return [s for s, _ in sorted(seen.items(), key=lambda kv: -kv[1])[:top_n]]
  • [ ] Step 4.4: Run, verify pass

  • [ ] Step 4.5: Commit
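
The over-fetch-then-dedup step in Step 4.3 can be exercised on toy candidates without a solver. This is a minimal sketch: `top_subjects` is hypothetical, the dict keys (role_a, filler_a, role_b, filler_b, ppmi_total) are the ones read in Step 4.3, and the scores are invented.

```python
def top_subjects(candidates: list[dict], top_n: int) -> list[str]:
    """Best-scoring distinct nsubj fillers, highest score first."""
    best: dict[str, float] = {}
    for c in candidates:
        if c.get("role_a") == "nsubj":
            subj = c["filler_a"]
        elif c.get("role_b") == "nsubj":
            subj = c["filler_b"]
        else:
            continue  # candidate has no subject slot
        # Keep each subject's best score only.
        if c["ppmi_total"] > best.get(subj, float("-inf")):
            best[subj] = c["ppmi_total"]
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [s for s, _ in ranked[:top_n]]


toy_candidates = [
    {"role_a": "nsubj", "filler_a": "cat", "role_b": "dobj", "filler_b": "ball", "ppmi_total": 3.0},
    {"role_a": "nsubj", "filler_a": "cat", "role_b": "dobj", "filler_b": "bat", "ppmi_total": 4.0},
    {"role_a": "dobj", "filler_a": "ball", "role_b": "nsubj", "filler_b": "dog", "ppmi_total": 3.5},
    {"role_a": "dobj", "filler_a": "ball", "role_b": "iobj", "filler_b": "kid", "ppmi_total": 9.0},
]
subjects = top_subjects(toy_candidates, top_n=2)
```

The last candidate scores highest but has no nsubj role, so it contributes nothing. This is why the probe over-fetches (top_k = top_n * 4) before deduplicating.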


Task 5: rewrite solve_paragraph non-multopp branch

Files: - Modify: <spike>/paragraph_csp.py

The non-multopp branch: pick discourse subject → run independent per-sentence pair_driven.solve(locked_slots={"nsubj": subject}) for N sentences → assemble paragraph candidates by bounded cartesian.
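
One way to keep the cartesian step bounded is to stream combinations through `heapq.nlargest` instead of materializing and sorting all of them. A minimal sketch: `compose_paragraphs` is hypothetical, and only the `ppmi_total` key and the sum-of-scores rule come from this plan.

```python
import heapq
from itertools import product


def compose_paragraphs(per_sentence: list[list[dict]], n_paragraphs: int) -> list[dict]:
    """Score every one-candidate-per-sentence combination; keep the best n."""
    combos = (
        {"sentences": list(combo), "score": sum(s["ppmi_total"] for s in combo)}
        for combo in product(*per_sentence)
    )
    # nlargest consumes the generator lazily, so only n_paragraphs items are retained.
    return heapq.nlargest(n_paragraphs, combos, key=lambda p: p["score"])


# Toy per-sentence candidates; scores are invented.
per_sentence = [
    [{"sentence": "s1a", "ppmi_total": 2.0}, {"sentence": "s1b", "ppmi_total": 1.0}],
    [{"sentence": "s2a", "ppmi_total": 3.0}, {"sentence": "s2b", "ppmi_total": 0.5}],
]
best = compose_paragraphs(per_sentence, n_paragraphs=2)
```

The number of combinations is per_sentence_top_k ** n_sentences per subject (4 ** 3 = 64 with the defaults in Step 5.1), so the bound comes from keeping per_sentence_top_k small.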

  • [ ] Step 5.1: Update ParagraphSpec shape

Drop verbs: tuple[str, ...]. Add n_sentences: int = 3.

@dataclass(frozen=True)
class ParagraphSpec:
    band: str
    constraints: tuple[Constraint, ...] = ()
    n_sentences: int = 3
    discourse_subject: str | None = None
    use_pronoun_coref: bool = True
    n_paragraphs: int = 5
    per_sentence_top_k: int = 4
    n_subject_seeds: int = 3
  • [ ] Step 5.2: Write failing test
def test_solve_paragraph_no_constraint_returns_n_paragraphs(store, sel_df, skeletons_df):
    import paragraph_csp
    spec = paragraph_csp.ParagraphSpec(
        band="fineweb_adult",
        n_sentences=3,
        n_paragraphs=2,
    )
    spec_words = paragraph_csp.spec_lexicon(store, "spec1") if hasattr(paragraph_csp, "spec_lexicon") else frozenset({"cat", "dog", "kid", "bat", "ball"})
    paragraphs = paragraph_csp.solve_paragraph(
        spec=spec,
        spec_words=spec_words,
        word_df=store.df,
        sel_df=sel_df,
        pairs_df=store.pairs_df,
        skeletons_df=skeletons_df,
    )
    assert isinstance(paragraphs, list)
    assert len(paragraphs) <= 2
    for p in paragraphs:
        assert "sentences" in p
        assert len(p["sentences"]) == 3
  • [ ] Step 5.3: Rewrite solve_paragraph (non-multopp branch only)
def solve_paragraph(
    *,
    spec: ParagraphSpec,
    spec_words: frozenset[str],
    word_df: pl.DataFrame,
    sel_df: pl.DataFrame,
    pairs_df: pl.DataFrame | None,
    skeletons_df: pl.DataFrame,
) -> list[dict]:
    """Constraint-driven paragraph resolver."""
    import pair_driven
    from constraint_surface import MultoppConstraint

    constraints = list(spec.constraints)

    # Multopp branch: handled in Task 6
    multopp = [c for c in constraints if isinstance(c, MultoppConstraint)]
    if multopp:
        # Placeholder until Task 6 lands
        raise NotImplementedError("multopp branch — Task 6")

    # Non-multopp branch
    # 1. Pick discourse subjects
    subjects: list[str]
    if spec.discourse_subject:
        subjects = [spec.discourse_subject]
    else:
        # No separate verb_candidates set is needed for the probe:
        # pair_driven.solve computes its own, so pass frozenset() to
        # _pick_discourse_subjects (Task 4).
        # (An earlier draft called pair_driven.compute_verb_candidates_or_default()
        # here; no such helper exists.)
        subjects = _pick_discourse_subjects(
            verb_candidates=frozenset(),  # no filter; let probe pick widely
            band=spec.band,
            domain_words=spec_words,
            word_df=word_df,
            sel_df=sel_df,
            pairs_df=pairs_df,
            skeletons_df=skeletons_df,
            top_n=spec.n_subject_seeds,
        )

    # 2. For each subject, run N independent per-sentence solves
    paragraphs: list[dict] = []
    for subject in subjects:
        per_sentence_candidates: list[list[dict]] = []
        for sent_idx in range(spec.n_sentences):
            sent_constraints = list(constraints)
            # First sentence carries the contrast; others run unconstrained-by-contrast
            if sent_idx > 0:
                sent_constraints = [
                    c for c in sent_constraints
                    if c.type not in ("contrastive_minpair", "contrastive_maxopp")
                ]
            cands = pair_driven.solve(
                spec_words=spec_words,
                word_df=word_df,
                sel_df=sel_df,
                pairs_df=pairs_df,
                skeletons_df=skeletons_df,
                band=spec.band,
                constraints=sent_constraints,
                locked_slots={"nsubj": subject},
                top_k=spec.per_sentence_top_k,
            )
            if not cands:
                break
            per_sentence_candidates.append(cands)
        if len(per_sentence_candidates) < spec.n_sentences:
            continue

        # 3. Bounded cartesian: top-K compositions
        from itertools import product
        for combo in product(*per_sentence_candidates):
            sentences = list(combo)
            paragraphs.append({
                "discourse_subject": subject,
                "sentences": sentences,
                "score": sum(s["ppmi_total"] for s in sentences),
            })

    paragraphs.sort(key=lambda p: -p["score"])
    return paragraphs[:spec.n_paragraphs]

Note on the probe: there is no compute_verb_candidates_or_default helper, and none should be added; any call to it in the sketch is a slip. Verb candidates are computed inside pair_driven.solve() for each per-sentence call, so the discourse-subject probe does not need its own verb_candidates set. Pass frozenset() to _pick_discourse_subjects and let pair_driven.solve compute its own.

Adjust the call to _pick_discourse_subjects to match Task 4's signature.

  • [ ] Step 5.4: Run, verify the no-constraint test passes

  • [ ] Step 5.5: Commit


Task 6: multopp branch in solve_paragraph

Files: - Modify: <spike>/paragraph_csp.py - Modify: <spike>/test_paragraph_csp.py

Wire the multopp join into solve_paragraph. Each multopp join row produces ONE paragraph: N+1 sentences sharing (verb, role, discourse_subject), each using one of the N+1 fillers in the locked role.

  • [ ] Step 6.1: Write failing test
def test_solve_paragraph_multopp_returns_n_plus_1_sentences(store, sel_df, skeletons_df):
    from constraint_surface import MultoppConstraint
    import paragraph_csp

    spec = paragraph_csp.ParagraphSpec(
        band="fineweb_adult",
        constraints=(MultoppConstraint(
            substitute="t", targets=("s", "ʃ"), n_targets=2,
            position="initial",
        ),),
        n_paragraphs=2,
    )
    spec_words = frozenset({"top", "tap", "sop", "soup", "ship", "shop", "kid"})
    paragraphs = paragraph_csp.solve_paragraph(
        spec=spec,
        spec_words=spec_words,
        word_df=store.df,
        sel_df=sel_df,
        pairs_df=store.pairs_df,
        skeletons_df=skeletons_df,
    )
    if not paragraphs:
        pytest.skip("no multopp paragraphs survived join — adjust filler set")
    for p in paragraphs:
        # Multopp: 1 sub + 2 targets = 3 sentences
        assert len(p["sentences"]) == 3
        # All sentences share the same verb (lock invariant)
        verbs = {s["verb"] for s in p["sentences"]}
        assert len(verbs) == 1
  • [ ] Step 6.2: Run, verify fail (NotImplementedError or AssertionError)

  • [ ] Step 6.3: Implement multopp branch in solve_paragraph

Replace the raise NotImplementedError(...) with:

    if multopp:
        if len(multopp) > 1:
            raise ValueError("at most one multopp constraint per request")
        cc = multopp[0]
        # Reject Minpair/Maxopp + Multopp combination
        if any(c.type in ("contrastive_minpair", "contrastive_maxopp") for c in constraints):
            raise ValueError("Multopp + Minpair/Maxopp simultaneously not supported")

        # Resolve filler buckets
        allow_sets = pair_driven.resolve_per_slot_allow_sets(
            spec_words=spec_words, word_df=word_df,
            constraints=[c for c in constraints if not isinstance(c, MultoppConstraint)],
            slot_types=("nsubj", "dobj", "iobj"),
        )
        # Multopp fillers come from full lexicon ∩ phono filter ∩ spec
        # (same allow set used for nsubj/dobj suffices)
        filler_allow = allow_sets["dobj"]  # default; cc.slots can override
        buckets = pair_driven._resolve_multopp_buckets(
            constraint=cc, word_df=word_df, allow_set=filler_allow,
        )

        # Verb candidates (same logic as PHON-112)
        verb_full = pair_driven.resolve_per_slot_allow_sets(
            spec_words=frozenset(), word_df=word_df,
            constraints=[c for c in constraints if not isinstance(c, MultoppConstraint)],
            slot_types=("V",),
        )["V"]
        verb_set = pair_driven.compute_verb_candidates(
            spec_words=verb_full, word_df=word_df, sel_df=sel_df, band=spec.band,
        )

        # N+1-way join
        joined = pair_driven.resolve_multopp_join(
            buckets=buckets,
            sel_df=sel_df,
            verb_candidates=verb_set,
            band=spec.band,
            slots=cc.slots,
        )
        if joined.height == 0:
            return []

        # Top join rows by total_ppmi
        joined = joined.sort("total_ppmi", descending=True).head(spec.n_paragraphs * 2)

        # For each join row, pick a discourse subject and realize N+1 sentences
        paragraphs: list[dict] = []
        for row in joined.iter_rows(named=True):
            verb = row["verb"]
            role = row["role"]
            # Discourse subject: probe with verb locked using one filler
            sample_filler = row[f"bucket_{cc.substitute}_filler"]
            subject_probe = pair_driven.solve(
                spec_words=spec_words, word_df=word_df, sel_df=sel_df,
                pairs_df=pairs_df, skeletons_df=skeletons_df,
                band=spec.band,
                constraints=[c for c in constraints if not isinstance(c, MultoppConstraint)],
                locked_slots={"V": verb, role: sample_filler},
                top_k=1,
            )
            if not subject_probe:
                continue
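            probe = subject_probe[0]
            # Same nsubj lookup as _pick_discourse_subjects (Task 4): only accept
            # a filler whose role is actually nsubj.
            subject = (
                probe["filler_a"] if probe.get("role_a") == "nsubj"
                else probe["filler_b"] if probe.get("role_b") == "nsubj"
                else None
            )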
            if subject is None:
                continue

            # Realize N+1 sentences — one per filler
            phonemes = list(buckets.keys())
            sentences = []
            for phoneme in phonemes:
                filler = row[f"bucket_{phoneme}_filler"]
                sent_cands = pair_driven.solve(
                    spec_words=spec_words, word_df=word_df, sel_df=sel_df,
                    pairs_df=pairs_df, skeletons_df=skeletons_df,
                    band=spec.band,
                    constraints=[c for c in constraints if not isinstance(c, MultoppConstraint)],
                    locked_slots={"V": verb, role: filler, "nsubj": subject},
                    top_k=1,
                )
                if not sent_cands:
                    sentences = []
                    break
                sentences.append(sent_cands[0])
            if not sentences:
                continue

            paragraphs.append({
                "discourse_subject": subject,
                "sentences": sentences,
                "score": row["total_ppmi"],
            })

        return paragraphs[:spec.n_paragraphs]

This is the most complex single step in the plan. Read the spec carefully before implementing.

  • [ ] Step 6.4: Run, verify pass (or skip-on-data)

  • [ ] Step 6.5: Commit


Task 7: pronoun coref + discourse markers

Files: - Modify: <spike>/paragraph_csp.py - Modify: <spike>/test_paragraph_csp.py

Apply pronoun coref to sentences 2..N+1 when use_pronoun_coref=True. Apply discourse markers (e.g., "Then,", "After that,") at random to sentences 2..N+1.

  • [ ] Step 7.1: Append failing test
def test_solve_paragraph_pronoun_coref(store, sel_df, skeletons_df):
    import paragraph_csp
    spec = paragraph_csp.ParagraphSpec(
        band="fineweb_adult", n_sentences=3, n_paragraphs=1,
        discourse_subject="cat",  # locked to test coref
        use_pronoun_coref=True,
    )
    spec_words = frozenset({"cat", "dog", "kid", "bat", "ball", "chase"})
    paragraphs = paragraph_csp.solve_paragraph(
        spec=spec, spec_words=spec_words, word_df=store.df, sel_df=sel_df,
        pairs_df=store.pairs_df, skeletons_df=skeletons_df,
    )
    if not paragraphs:
        pytest.skip("no paragraphs survived")
    p = paragraphs[0]
    sentences = [s["sentence"] for s in p["sentences"]]
    # First sentence has the explicit subject
    assert "cat" in sentences[0].lower()
    # Sentences 2-3 must not repeat "the cat": coref swaps in the pronoun chosen
    # by the existing _pronoun_for helper ("it" for inanimate singular, "they"
    # for plural).
    for later in sentences[1:]:
        assert "the cat" not in later.lower()
  • [ ] Step 7.2: Implement coref

Reuse the existing _pronoun_for(noun) -> str helper. After realizing sentences 2..N+1, post-process the rendered sentence to substitute the pronoun for the discourse subject.

import re


def _apply_coref(sentences: list[dict], discourse_subject: str) -> list[dict]:
    """Substitute pronouns for the discourse subject in sentences 2..N+1."""
    if len(sentences) <= 1:
        return sentences
    pronoun = _pronoun_for(discourse_subject)
    # Match the whole phrase only, so subject "cat" does not rewrite
    # "the caterpillar". Group 1 records whether "The" was capitalized.
    pattern = re.compile(rf"\b([Tt])he {re.escape(discourse_subject)}\b")

    def _swap(match: re.Match) -> str:
        return pronoun.capitalize() if match.group(1) == "T" else pronoun

    out = [sentences[0]]
    for s in sentences[1:]:
        out.append({**s, "sentence": pattern.sub(_swap, s["sentence"])})
    return out

Apply this after the per-sentence solves in both branches (multopp and non-multopp).

  • [ ] Step 7.3: Discourse markers

Add "Then,", "After that,", "Finally," to sentences 2..N+1 in random order. Keep it simple:

DISCOURSE_MARKERS = ["Then,", "After that,", "Finally,"]

def _apply_discourse_markers(sentences: list[dict], rng_seed: int = 0) -> list[dict]:
    if len(sentences) <= 1:
        return sentences
    import random
    rng = random.Random(rng_seed)
    markers = rng.sample(DISCOURSE_MARKERS, k=min(len(sentences) - 1, len(DISCOURSE_MARKERS)))
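    out = [sentences[0]]
    for i, s in enumerate(sentences[1:]):
        if i < len(markers):
            out.append({**s, "sentence": f"{markers[i]} {s['sentence']}"})
        else:
            # More follow-up sentences than markers: keep the sentence unmarked
            # instead of silently dropping it (zip would truncate here).
            out.append(s)
    return out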

Apply after coref. The seed could be derived from the paragraph index for determinism.

  • [ ] Step 7.4: Run, verify pass

  • [ ] Step 7.5: Commit
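
Deriving the marker seed from the paragraph index, as suggested in Step 7.3, could look like the sketch below. `markers_for` and `base_seed` are hypothetical names; the marker list is the one defined in Step 7.3.

```python
import random

DISCOURSE_MARKERS = ["Then,", "After that,", "Finally,"]


def markers_for(paragraph_index: int, n_followups: int, base_seed: int = 0) -> list[str]:
    """Pick distinct markers for a paragraph's follow-up sentences."""
    # A fresh Random per paragraph keeps each draw independent of call order.
    rng = random.Random(base_seed + paragraph_index)
    k = min(n_followups, len(DISCOURSE_MARKERS))
    return rng.sample(DISCOURSE_MARKERS, k=k)


first = markers_for(paragraph_index=2, n_followups=2)
again = markers_for(paragraph_index=2, n_followups=2)
```

Repeated calls with the same index return the same markers, so a rerun of the solver reproduces the same paragraphs while different paragraphs can still receive different marker orders.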


Task 8: subject diversification

Files: - Modify: <spike>/paragraph_csp.py - Modify: <spike>/test_paragraph_csp.py

Top-K paragraphs by score, but with distinct discourse subjects.
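
The intended selection behavior, including the backfill case, can be seen on toy data. This is a compact sketch that assumes the input is already sorted by score, best first. `diversify` is a hypothetical condensed variant of the helper in Step 8.1, and the scores are invented.

```python
def diversify(paragraphs: list[dict], n: int) -> list[dict]:
    """Prefer distinct discourse subjects; backfill with duplicates if needed."""
    picked: list[dict] = []
    seen: set[str] = set()
    for p in paragraphs:
        if p["discourse_subject"] not in seen:
            picked.append(p)
            seen.add(p["discourse_subject"])
    # Remaining paragraphs (repeat subjects), still in score order.
    rest = [p for p in paragraphs if not any(p is q for q in picked)]
    return (picked + rest)[:n]


ranked = [
    {"discourse_subject": "cat", "score": 9.0},
    {"discourse_subject": "cat", "score": 8.0},
    {"discourse_subject": "dog", "score": 7.0},
    {"discourse_subject": "kid", "score": 6.0},
]
top3 = diversify(ranked, n=3)
top4 = diversify(ranked, n=4)
```

With n=3 the second "cat" paragraph is skipped in favor of lower-scoring but distinct subjects. With n=4 it is backfilled at the end, so the result is no longer strictly ordered by score.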

  • [ ] Step 8.1: Existing _diversify_by_subject helper

The original paragraph_csp.py has _diversify_by_subject — reuse if its signature is compatible. If not, write a new one:

def _diversify_by_subject(paragraphs: list[dict], n: int) -> list[dict]:
    """Top-N paragraphs by score, with distinct discourse subjects when possible."""
    seen_subjects: set[str] = set()
    diversified: list[dict] = []
    for p in paragraphs:
        if p["discourse_subject"] in seen_subjects:
            continue
        diversified.append(p)
        seen_subjects.add(p["discourse_subject"])
        if len(diversified) >= n:
            break
    if len(diversified) < n:
        # Backfill with non-diversified
        for p in paragraphs:
            if p not in diversified:
                diversified.append(p)
                if len(diversified) >= n:
                    break
    return diversified
  • [ ] Step 8.2: Wire into solve_paragraph

In both branches (multopp and non-multopp), apply _diversify_by_subject(all_paragraphs, n=spec.n_paragraphs) before returning.

  • [ ] Step 8.3: Test diversification
def test_solve_paragraph_diversifies_subjects(store, sel_df, skeletons_df):
    import paragraph_csp
    spec = paragraph_csp.ParagraphSpec(
        band="fineweb_adult", n_sentences=2, n_paragraphs=3, n_subject_seeds=5,
    )
    spec_words = frozenset({"cat", "dog", "kid", "bat", "ball"})
    paragraphs = paragraph_csp.solve_paragraph(
        spec=spec, spec_words=spec_words, word_df=store.df, sel_df=sel_df,
        pairs_df=store.pairs_df, skeletons_df=skeletons_df,
    )
    if len(paragraphs) >= 2:
        subjects = [p["discourse_subject"] for p in paragraphs]
        # Assumes the probe can find at least 2 distinct subjects for this
        # lexicon; if it cannot, widen spec_words rather than weaken the check.
        assert len(set(subjects)) >= 2
  • [ ] Step 8.4: Commit

Task 9: minpair/maxopp paragraph integration test

Files: - Modify: <spike>/test_paragraph_csp.py

The non-multopp branch already has the routing (Task 5: first sentence carries contrast). This task verifies it works end-to-end.

  • [ ] Step 9.1: Append test
def test_solve_paragraph_minpair_carried_by_first_sentence(store, sel_df, skeletons_df):
    from constraint_surface import MinpairConstraint
    import paragraph_csp

    spec = paragraph_csp.ParagraphSpec(
        band="fineweb_adult",
        n_sentences=3,
        n_paragraphs=1,
        constraints=(MinpairConstraint(phoneme1="d", phoneme2="z", position="final"),),
    )
    spec_words = paragraph_csp.spec_lexicon(store, "spec1") if hasattr(paragraph_csp, "spec_lexicon") else frozenset()
    paragraphs = paragraph_csp.solve_paragraph(
        spec=spec, spec_words=spec_words, word_df=store.df, sel_df=sel_df,
        pairs_df=store.pairs_df, skeletons_df=skeletons_df,
    )
    if not paragraphs:
        pytest.skip("no paragraphs survived")
    p = paragraphs[0]
    # First sentence: the contrast pair should occupy the (filler_a, filler_b) slots.
    s1 = p["sentences"][0]
    # Smoke check only; a deeper check would verify that (filler_a, filler_b) is a real (d, z) final pair.
    assert "filler_a" in s1
    assert "filler_b" in s1
  • [ ] Step 9.2: Run, verify pass

  • [ ] Step 9.3: Commit
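
The smoke check in Step 9.1 only confirms that the slots exist. The deeper validation its comment mentions could be sketched roughly as follows. This assumes each filler's pronunciation is available as a list of phoneme strings (for example from the phonemes column of words.parquet). The helper name `is_contrast_pair` and the toy transcriptions are hypothetical, not part of the spike.

```python
def is_contrast_pair(a: list[str], b: list[str], p1: str, p2: str, position: str) -> bool:
    """True if a and b differ only at `position`, with p1 and p2 as the differing phonemes."""
    if not a or len(a) != len(b):
        return False
    if position == "initial":
        idx = 0
    elif position == "final":
        idx = len(a) - 1
    else:
        return False  # other positions are out of scope for this sketch
    # The contrasting slot must hold p1 and p2, in either order.
    if a[idx] == b[idx] or {a[idx], b[idx]} != {p1, p2}:
        return False
    # Everything outside the contrasting slot must match exactly.
    return a[:idx] + a[idx + 1:] == b[:idx] + b[idx + 1:]


# Toy transcriptions (illustrative, not read from the store).
bead = ["b", "i", "d"]
bees = ["b", "i", "z"]
beat = ["b", "i", "t"]

ok = is_contrast_pair(bead, bees, "d", "z", "final")
wrong_phoneme = is_contrast_pair(bead, beat, "d", "z", "final")
wrong_position = is_contrast_pair(bead, bees, "d", "z", "initial")
```

In the test, the two phoneme lists would come from looking up `s1["filler_a"]` and `s1["filler_b"]` in the word table; how that lookup is spelled depends on the store API, so it is left out here.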


Task 10: drop ParagraphSpec.verbs, retire v1 helpers

Files: - Modify: <spike>/paragraph_csp.py - Modify: <spike>/test_paragraph_csp.py and any other tests that reference ParagraphSpec.verbs

  • [ ] Step 10.1: Search for usages
grep -rn "ParagraphSpec(verbs=\|spec\.verbs\|verbs=verbs\|\.verbs" packages/generation/research/2026-05-07-sentence-generation-paradigms/

Run this from the repo root, since the path is repo-relative. Identify every caller or test that passes the verbs=... argument or reads spec.verbs.

  • [ ] Step 10.2: Update each caller

For each caller, rewrite it to pass n_sentences=... instead of verbs=.... The call-signature change is one line per caller.
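
As an illustration of the per-caller change, the sketch below uses a stand-in frozen dataclass rather than the real ParagraphSpec. The field defaults are assumptions, and mapping an old verb chain to `n_sentences=len(verbs)` is a guess at the intended migration (one sentence per former verb), so check it against each caller.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ParagraphSpecSketch:
    """Stand-in for ParagraphSpec after the verbs field is dropped (defaults are assumed)."""
    band: str
    constraints: tuple = ()
    n_sentences: int = 3
    n_paragraphs: int = 1
    n_subject_seeds: int = 5


# A v1-style caller held a verb chain; the example verbs are made up.
old_verbs = ("chase", "catch", "drop")

# After migration: only the sentence count is passed.
new_spec = ParagraphSpecSketch(band="fineweb_adult", n_sentences=len(old_verbs))

# Any caller still passing verbs= now fails loudly at construction time.
try:
    ParagraphSpecSketch(band="fineweb_adult", verbs=old_verbs)
    rejected = False
except TypeError:
    rejected = True

field_names = [f.name for f in fields(ParagraphSpecSketch)]
```

Because the dataclass rejects unknown keyword arguments with a `TypeError`, any caller missed by the grep in Step 10.1 should surface as a hard failure in Step 10.4 rather than pass silently.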

  • [ ] Step 10.3: Delete v1-only helpers

_solve_sentence (replaced by inline pair_driven.solve calls in solve_paragraph) — delete.

Other helpers that depended on solve_shape's pre-PHON-112 contrastive path — delete or rewrite.

  • [ ] Step 10.4: Run full spike suite
cd /Users/jneumann/Repos/PhonoLex/packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/ -v

Expected: all tests pass; no verbs= references remain.

  • [ ] Step 10.5: Commit
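
The "no verbs= references remain" expectation from Step 10.4 could also be pinned down with a small guard, sketched here in Python against in-memory strings. The pattern loosely mirrors the grep from Step 10.1. The function name and sample lines are hypothetical, and a real guard would read the spike's .py files instead.

```python
import re

# Loosely mirrors the grep alternatives from Step 10.1.
VERBS_REF = re.compile(r"ParagraphSpec\(verbs=|verbs=verbs|\.verbs\b")


def find_verbs_references(source: str) -> list[int]:
    """Return 1-based line numbers that still reference the removed field."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if VERBS_REF.search(line)
    ]


# Sample source text (made up for this sketch).
sample = "\n".join([
    "spec = ParagraphSpec(verbs=verbs, band=band)",
    "spec = ParagraphSpec(band=band, n_sentences=3)",
    "for v in spec.verbs:",
])

hits = find_verbs_references(sample)
clean = find_verbs_references("spec = ParagraphSpec(band=band, n_sentences=3)")
```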

Task 11: paragraph eval harness update

Files: - Modify: <spike>/build_judging_set.py

The eval harness's paragraph branch still uses ParagraphSpec(verbs=verbs, ...). Update to the new signature.

  • [ ] Step 11.1: Inspect current paragraph branch
grep -n "ParagraphSpec\|PARAGRAPH_CHAINS\|build_paragraph_requests" packages/generation/research/2026-05-07-sentence-generation-paradigms/build_judging_set.py
  • [ ] Step 11.2: Rewrite

Replace the for chain_label, verbs in PARAGRAPH_CHAINS: loop with one keyed on (label, n_sentences) instead of (label, verbs):

PARAGRAPH_CONFIGS = [
    ("para_3_sent", 3),
    ("para_5_sent", 5),
]

for label, n_sent in PARAGRAPH_CONFIGS:
    for spec_name in SPECS:
        # spec_words is assumed to be resolved per spec_name by the harness's existing lookup (not shown).
        for band in BANDS:
            for c_label, constraints in CONSTRAINT_CONFIGS.items():
                spec = ParagraphSpec(
                    band=band,
                    constraints=tuple(constraints),
                    n_sentences=n_sent,
                    n_paragraphs=PARAGRAPH_TOP_K,
                )
                paragraphs = solve_paragraph(
                    spec=spec, spec_words=spec_words,
                    word_df=store.df, sel_df=sel_df,
                    pairs_df=store.pairs_df,
                    skeletons_df=skeletons_df,
                )
                # ...
  • [ ] Step 11.3: Smoke test
cd /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms && \
  uv run python build_judging_set.py --dry-run 2>&1 | tail -20
  • [ ] Step 11.4: Commit

Task 12: final verification

  • [ ] Step 12.1: Full spike suite
cd /Users/jneumann/Repos/PhonoLex/packages/generation && \
  uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/ -v

Expected: all passing.

  • [ ] Step 12.2: Data layer suite
cd /Users/jneumann/Repos/PhonoLex && \
  uv run python -m pytest packages/data/tests/ -q

Expected: 209 passed (no regressions).

  • [ ] Step 12.3: Smoke — multopp paragraph
cd /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms && \
  uv run python -c "
import polars as pl
from pathlib import Path
from phonolex_data.runtime.store import WordStore
import paragraph_csp
from constraint_surface import MultoppConstraint

repo = Path('/Users/jneumann/Repos/PhonoLex')
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
skeletons_df = pl.read_parquet(Path('outputs') / 'skeletons.parquet')
spec_words = paragraph_csp.spec_lexicon(store, 'spec1') if hasattr(paragraph_csp, 'spec_lexicon') else frozenset()

spec = paragraph_csp.ParagraphSpec(
    band='fineweb_adult',
    constraints=(MultoppConstraint(substitute='t', targets=('s', 'ʃ'), n_targets=2, position='initial'),),
    n_paragraphs=2,
)
paragraphs = paragraph_csp.solve_paragraph(
    spec=spec, spec_words=spec_words, word_df=store.df, sel_df=sel_df,
    pairs_df=store.pairs_df, skeletons_df=skeletons_df,
)
print(f'{len(paragraphs)} multopp paragraphs')
for p in paragraphs[:2]:
    print(f'subject={p[\"discourse_subject\"]} score={p[\"score\"]:.2f}')
    for s in p['sentences']:
        print(f'  {s[\"sentence\"]}')
"

Expected: 1-2 paragraphs printed, each with 3 sentences (1 substitute + 2 targets) sharing the same verb.
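
The expected shape can also be checked mechanically rather than by eye. The sketch below assumes each sentence dict exposes its verb under a `verb` key, which the plan does not confirm (only `sentence` is shown), so treat that key and the helper name as hypothetical.

```python
def multopp_shape_ok(paragraph: dict, n_targets: int) -> bool:
    """Check for 1 substitute + n_targets sentences, all sharing one verb."""
    sentences = paragraph["sentences"]
    if len(sentences) != n_targets + 1:
        return False
    # "verb" is an assumed key; a missing verb counts as a failure.
    verbs = {s.get("verb") for s in sentences}
    return len(verbs) == 1 and None not in verbs


# Toy paragraph for substitute /t/ vs targets /s/, /ʃ/ in initial position.
toy = {
    "discourse_subject": "kid",
    "sentences": [
        {"sentence": "The kid saw the tip.", "verb": "see"},
        {"sentence": "The kid saw the sip.", "verb": "see"},
        {"sentence": "The kid saw the ship.", "verb": "see"},
    ],
}

# Same paragraph with one verb changed, which should fail the check.
mixed = {
    "discourse_subject": "kid",
    "sentences": toy["sentences"][:2] + [{"sentence": "The kid took the ship.", "verb": "take"}],
}

shape_good = multopp_shape_ok(toy, n_targets=2)
shape_mixed = multopp_shape_ok(mixed, n_targets=2)
shape_short = multopp_shape_ok(toy, n_targets=3)
```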

  • [ ] Step 12.4: Smoke — minpair paragraph
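cd /Users/jneumann/Repos/PhonoLex/packages/generation/research/2026-05-07-sentence-generation-paradigms && \
  uv run python -c "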
import polars as pl
from pathlib import Path
from phonolex_data.runtime.store import WordStore
import paragraph_csp
from constraint_surface import MinpairConstraint

repo = Path('/Users/jneumann/Repos/PhonoLex')
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
skeletons_df = pl.read_parquet(Path('outputs') / 'skeletons.parquet')
spec_words = paragraph_csp.spec_lexicon(store, 'spec1') if hasattr(paragraph_csp, 'spec_lexicon') else frozenset()

spec = paragraph_csp.ParagraphSpec(
    band='fineweb_adult',
    constraints=(MinpairConstraint(phoneme1='d', phoneme2='z', position='final'),),
    n_sentences=3, n_paragraphs=1,
)
paragraphs = paragraph_csp.solve_paragraph(
    spec=spec, spec_words=spec_words, word_df=store.df, sel_df=sel_df,
    pairs_df=store.pairs_df, skeletons_df=skeletons_df,
)
print(f'{len(paragraphs)} minpair paragraphs')
for p in paragraphs:
    for s in p['sentences']:
        print(f'  {s[\"sentence\"]}')
"

Expected: 1 paragraph, 3 sentences. First sentence's (filler_a, filler_b) is a real (d, z) final pair.

  • [ ] Step 12.5: No further commit needed

Done

After Task 12 verification, PHON-113 v1 is complete. The stack continues toward PHON-107 (reranker v2) and PHON-109 (productionization).

Follow-ups not in this plan:

- PHON-107: reranker v2 trains on both single-sentence and paragraph output from the constraint-driven path
- PHON-109: productionize by replacing the internals of /api/generate-single and /api/generate-paragraph
- PHON-110: frontend reframe (top-K candidates UI)