
PHON-109 — Productionization design

Goal

Move the CSP + reranker_v2 stack from packages/generation/research/2026-05-07-sentence-generation-paradigms/ (spike) into the phonolex_generators package as a csp submodule. Replace the v6/T5Gemma generation path in the FastAPI server with two new endpoints (/api/generate-sentences, /api/generate-paragraphs) that consume the new path. Bundle the follow-ups accumulated across PHON-106/112/113/107. Retire RunPod-related code.

This ticket is code-ready, not deploy-ready. The Cloudflare Containers cutover is a sibling ticket; PHON-109 makes the new server runnable locally and replaces the source-side paths the Worker will eventually call.

Motivation

The CSP + reranker_v2 stack has been functionally complete since PHON-107 but lives in a research spike directory. The FastAPI server still hosts the v6/T5Gemma path (retired per user direction 2026-05-08). The Worker still proxies /api/generate-single to RunPod, which runs the retired v6 server. None of the productionization work is done.

Three follow-ups have accumulated:

  1. pair_driven.solve locked_slots only enforces V (not filler roles); paragraph_csp has 2 local workarounds (Task 7 post-filter + lock-after-pick _solve_with_locked_nsubj).
  2. Top-K dedup in pair_driven.solve: same-surface candidates from different pair orientations.
  3. Render-layer quality (low priority; reranker handles).

PHON-109 bundles the first two into the migration (the third is deferred), so the productionized code starts clean.

Architecture

data/runtime/
  ├─ words.parquet           (existing, LFS)
  ├─ edges.parquet           (existing, LFS)
  ├─ selectional.parquet     (existing, LFS)
  ├─ pairs.parquet           (existing, LFS — PHON-106)
  ├─ skeletons.parquet       (NEW — moved from spike/outputs/, LFS)
  └─ reranker_v2.pkl         (NEW — moved from spike/outputs/, LFS)

packages/generators/src/phonolex_generators/
  ├─ __init__.py
  ├─ cfg_seed/               (existing — PHON-95)
  ├─ editor/                 (existing — PHON-95)
  ├─ scorer/                 (existing — PHON-95)
  ├─ shared/                 (existing — PHON-95)
  └─ csp/                    (NEW — PHON-109)
      ├─ __init__.py
      ├─ pair_driven.py        ← moved from spike (locked_slots fix + dedup)
      ├─ verb_candidates.py    ← moved from spike
      ├─ paragraph.py          ← moved from spike paragraph_csp.py
      ├─ skeleton.py           ← moved from spike skeleton_csp.py (the helpers we kept after Task 10 retirement)
      ├─ constraint_surface.py ← moved from spike
      ├─ realize.py            ← moved from spike (the `realize` function + helpers)
      ├─ skeletons.py          ← skeletons-parquet loader
      ├─ reranker/
      │   ├─ __init__.py
      │   ├─ train.py            ← moved from spike train_reranker_v2.py
      │   ├─ predict.py          ← moved from spike quality_axis_v2.py
      │   ├─ rerank.py           ← moved from spike reranker_v2.py
      │   ├─ active_learning.py  ← moved from spike active_learning_select.py
      │   ├─ embedding_cache.py  ← moved from spike
      │   └─ judge.py            ← moved from spike llm_judge.py (teacher-labeling tool)
      └─ tests/
          └─ ... (moved from spike test_*.py)

packages/generation/server/
  ├─ main.py            ← server entry; CSP cold-start instead of T5Gemma
  ├─ schemas.py         ← new request/response models
  └─ routes/
      └─ generate.py    ← /api/generate-sentences, /api/generate-paragraphs, /api/generate-single (alias)

packages/web/workers/src/routes/generation.ts
  ← rewrite: route /generate-sentences and /generate-paragraphs to the new
    FastAPI server (Cloudflare Containers eventually, env-configurable host)
    Drop /generate-single's RunPod path; alias to /generate-sentences.

The packages/generation/research/2026-05-07-sentence-generation-paradigms/ directory becomes archive — kept in git history but no longer the canonical path.

API

POST /api/generate-sentences

Request:

{
  "spec": "spec1",
  "band": "fineweb_adult",
  "constraints": [
    {"type": "exclude", "phonemes": ["ɹ"]},
    {"type": "contrastive_minpair", "phoneme1": "d", "phoneme2": "z", "position": "final"}
  ],
  "locked_slots": {"V": "cut"},
  "axis_weights": {"age_appropriate": 0.4, "coherence": 0.3, "naturalness": 0.2, "grammaticality": 0.1},
  "top_k": 8
}

Response:

{
  "candidates": [
    {
      "sentence": "The seed drills in the seas.",
      "verb": "drill",
      "fillers": {"V": "drill", "nsubj": "seed", "pobj_in": "seas"},
      "skeleton": "nsubj,V,pobj_in",
      "axis_scores": {"naturalness": 1.99, "grammaticality": 2.60, "age_appropriate": 2.93, "coherence": 3.08},
      "composite_score": 2.65,
      "feature_distance": 0.8,
      "sonorant_diff": 0.0,
      "ppmi_total": 5.0
    },
    ...
  ],
  "n_total_candidates": 142,
  "diagnostics": {
    "verb_candidates_count": 87,
    "pair_frame_height": 105,
    "join_rows": 142
  }
}

n_total_candidates is the count BEFORE top_k truncation, so the frontend can show "Top 8 of 142".
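
The design does not spell out how composite_score is derived from axis_scores and axis_weights. As a minimal sketch, assuming a weight-normalized average (the helper names `composite` and `truncate` are hypothetical), the scoring and the "Top 8 of 142" bookkeeping might look like the following. With equal weights, the example axis scores above average to 2.65, which matches the example composite_score; with the example request weights the same formula would give roughly 2.75.

```python
from typing import Any


def composite(axis_scores: dict[str, float], axis_weights: dict[str, float]) -> float:
    """Weight-normalized average of axis scores (assumed formula, not confirmed by the design)."""
    total_weight = sum(axis_weights.get(axis, 0.0) for axis in axis_scores)
    if total_weight == 0:
        raise ValueError("axis_weights must give weight to at least one scored axis")
    weighted = sum(score * axis_weights.get(axis, 0.0) for axis, score in axis_scores.items())
    return weighted / total_weight


def truncate(candidates: list[dict[str, Any]], top_k: int) -> dict[str, Any]:
    """Sort by composite_score, keep top_k, and report the count before truncation."""
    ranked = sorted(candidates, key=lambda c: c["composite_score"], reverse=True)
    return {"candidates": ranked[:top_k], "n_total_candidates": len(ranked)}


axis_scores = {"naturalness": 1.99, "grammaticality": 2.60, "age_appropriate": 2.93, "coherence": 3.08}
equal_weights = {axis: 1.0 for axis in axis_scores}
request_weights = {"age_appropriate": 0.4, "coherence": 0.3, "naturalness": 0.2, "grammaticality": 0.1}

equal_score = composite(axis_scores, equal_weights)       # plain mean of the four axes
weighted_score = composite(axis_scores, request_weights)  # uses the example request weights
```

Normalizing by the total weight keeps the composite on the same scale as the axis scores even if a client sends weights that do not sum to 1.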

POST /api/generate-paragraphs

Request: same shape but adds n_sentences (default 3); paragraph-level constraints (multopp) accepted.

Response:

{
  "paragraphs": [
    {
      "discourse_subject": "seed",
      "sentences": [
        {/* sentence-level candidate, same shape as above */},
        ...
      ],
      "composite_score": 2.65,
      "axis_scores": {...},
      "score": 44.91
    },
    ...
  ],
  "n_total_paragraphs": 12,
  "diagnostics": {...}
}

The paragraph-level axis_scores come from running reranker_v2 on the joined paragraph text. The per-sentence composite_score and axis_scores are also retained for inspection. score is the sum of per-sentence ppmi_total (legacy).
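
As a small illustration of the legacy field, score is simply the per-sentence ppmi_total values added up. The helper name `legacy_paragraph_score` is hypothetical and the sample values are invented for the sketch.

```python
def legacy_paragraph_score(sentences: list[dict]) -> float:
    """Sum per-sentence ppmi_total, as described for the legacy `score` field."""
    return sum(s["ppmi_total"] for s in sentences)


# Illustrative values only; not taken from a real run.
paragraph = {"sentences": [{"ppmi_total": 5.0}, {"ppmi_total": 4.5}, {"ppmi_total": 6.25}]}
paragraph["score"] = legacy_paragraph_score(paragraph["sentences"])
```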

POST /api/generate-single (alias)

Equivalent to /api/generate-sentences with top_k=1. Returns just one candidate (not a list). Kept for backward compat; soft-deprecated.
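
The alias semantics can be sketched as a thin wrapper over the sentences handler. This is a sketch under assumptions: `generate_sentences` stands in for whatever callable serves /api/generate-sentences, and returning None on an empty result is a guess, since the design does not say what the alias returns when nothing is generated.

```python
from typing import Any, Callable, Optional

Request = dict[str, Any]
Response = dict[str, Any]


def generate_single(
    request: Request,
    generate_sentences: Callable[[Request], Response],
) -> Optional[dict[str, Any]]:
    """Alias: force top_k=1 and unwrap the single candidate from the list response."""
    forced = {**request, "top_k": 1}  # copy so the caller's request is not mutated
    response = generate_sentences(forced)
    candidates = response["candidates"]
    # Assumption: an empty result maps to None.
    return candidates[0] if candidates else None
```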

Constraint serialization

The frontend sends constraints as JSON. The server parses into the dataclasses from phonolex_generators.csp.constraint_surface.

Mapping table:

| Frontend type | Dataclass |
|---|---|
| exclude | ExcludeConstraint(phonemes) |
| include | IncludeConstraint(phonemes) |
| bound | BoundConstraint(norm, min_value, max_value) |
| bound_boost | BoundBoostConstraint(norm, min_value, max_value) |
| contrastive_minpair | MinpairConstraint(phoneme1, phoneme2, position, slots?) |
| contrastive_maxopp | MaxoppConstraint(phoneme1, phoneme2, position, min_sonorant_diff, slots?) |
| contrastive_multopp | MultoppConstraint(substitute, targets, n_targets, position) |

Schema validation in pydantic; reject unknown types.
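
The dispatch from frontend type to dataclass, with unknown types rejected, could be sketched as below. The design calls for pydantic; this sketch uses only the standard library to show the logic. The two dataclasses are stand-ins for the real ones in phonolex_generators.csp.constraint_surface (their field types are assumptions), and the set of allowed position values is a guess, since only "final" appears in this document.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Stand-ins for two of the real dataclasses; field types are assumptions.
@dataclass(frozen=True)
class ExcludeConstraint:
    phonemes: tuple[str, ...]


@dataclass(frozen=True)
class MinpairConstraint:
    phoneme1: str
    phoneme2: str
    position: str
    slots: Optional[tuple[str, ...]] = None


# Assumption: only "final" is shown in the design; the other literals are hypothetical.
ALLOWED_POSITIONS = {"initial", "medial", "final"}


def parse_constraint(raw: dict[str, Any]) -> Any:
    """Map one frontend constraint dict to a dataclass, rejecting unknown types."""
    kind = raw.get("type")
    if kind == "exclude":
        return ExcludeConstraint(phonemes=tuple(raw["phonemes"]))
    if kind == "contrastive_minpair":
        if raw["position"] not in ALLOWED_POSITIONS:
            raise ValueError(f"unknown position: {raw['position']!r}")
        slots = raw.get("slots")
        return MinpairConstraint(
            phoneme1=raw["phoneme1"],
            phoneme2=raw["phoneme2"],
            position=raw["position"],
            slots=tuple(slots) if slots is not None else None,
        )
    raise ValueError(f"unknown constraint type: {kind!r}")
```

The remaining five types would follow the same pattern, one branch (or one table entry) per row of the mapping.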

Server cold-start

# packages/generation/server/main.py (pseudocode)
from pathlib import Path

import polars as pl
from fastapi import FastAPI

from phonolex_generators.csp import (
    pair_driven, paragraph, reranker, skeletons,
)
from phonolex_data.runtime.store import WordStore

# Assumption: data/runtime/ is resolved relative to the working directory.
DATA_RUNTIME = Path("data/runtime")
app = FastAPI()

@app.on_event("startup")
async def load_state():
    # Eagerly load the tabular data so the first request does not pay for it.
    app.state.store = WordStore.from_parquet(DATA_RUNTIME / "words.parquet")
    app.state.sel_df = pl.read_parquet(DATA_RUNTIME / "selectional.parquet")
    app.state.skeletons_df = pl.read_parquet(DATA_RUNTIME / "skeletons.parquet")
    app.state.reranker_path = DATA_RUNTIME / "reranker_v2.pkl"
    # Reranker model loads lazily on first request via _cached_model

Cold start expectations:

  • WordStore: ~600MB parquet → ~2s
  • selectional: ~5.4M rows → ~5s
  • skeletons: ~219K rows → ~0.5s
  • reranker_v2.pkl: ~80MB pickle → ~1s on first request
  • MiniLM-L6-v2: ~80MB on first request → ~5s

Total cold-start: <15s (vs T5Gemma's ~60s). Warm requests: ~1-2s.

Bundled follow-ups

Follow-up 1: pair_driven.solve locked_slots for filler roles

Currently pair_driven.solve only honors locked_slots["V"]. Filler-slot locks (locked_slots["nsubj"], etc.) are silently ignored. paragraph_csp has 2 local workarounds (Task 7 post-filter + lock-after-pick _solve_with_locked_nsubj).

Fix in pair_driven.solve:

  • After resolve_contrastive_join (or non-contrastive enumeration) produces the candidate frame, keep only rows where role_a==slot AND filler_a==locked_value (or symmetric for role_b) for each entry in locked_slots.
  • Remove the 2 paragraph_csp workarounds.
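
The filler-lock filter described above can be sketched over plain dict rows. This is a sketch under assumptions: the real candidate frame is presumably a dataframe rather than a list of dicts, the column name filler_b is inferred by symmetry with filler_a, and the helper name `apply_filler_locks` is hypothetical.

```python
Row = dict[str, str]


def apply_filler_locks(rows: list[Row], locked_slots: dict[str, str]) -> list[Row]:
    """Keep rows where every locked filler slot appears with its locked value."""
    kept = rows
    for slot, value in locked_slots.items():
        if slot == "V":
            continue  # the verb lock is already honored by the existing code path
        kept = [
            r for r in kept
            if (r["role_a"] == slot and r["filler_a"] == value)
            or (r["role_b"] == slot and r["filler_b"] == value)
        ]
    return kept
```

Checking both orientations matters because the same pair can appear with the locked role on either side of the frame.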

Follow-up 2: Top-K dedup

pair_driven.solve produces candidates from both orientations of pair frames. Sometimes (verb, fillers, skeleton) tuples appear twice. Dedup pass before top_k truncation:

seen: set[tuple] = set()
deduped = []
for c in scored:
    key = (c["verb"], frozenset(c["fillers"].items()), c["skeleton"])
    if key in seen:
        continue
    seen.add(key)
    deduped.append(c)
return deduped[:top_k]

Optional refinement: dedup by sentence text after realize() if structural dedup misses cases.
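
Combining the structural key with the optional sentence-text fallback might look like the sketch below. It assumes `scored` is already sorted best-first and that each candidate carries the realized "sentence" string; the function name `dedup_top_k` and the lowercase/strip normalization are illustrative choices, not part of the spike code.

```python
from typing import Any, Callable

Candidate = dict[str, Any]


def dedup_top_k(
    scored: list[Candidate],
    top_k: int,
    surface: Callable[[Candidate], str] = lambda c: c["sentence"].strip().lower(),
) -> list[Candidate]:
    """Drop structural and same-surface duplicates, keeping the first copy seen."""
    seen_keys: set[tuple] = set()
    seen_text: set[str] = set()
    deduped: list[Candidate] = []
    for c in scored:
        key = (c["verb"], frozenset(c["fillers"].items()), c["skeleton"])
        text = surface(c)
        if key in seen_keys or text in seen_text:
            continue
        seen_keys.add(key)
        seen_text.add(text)
        deduped.append(c)
    return deduped[:top_k]
```

Because truncation happens after the dedup pass, duplicates never consume one of the top_k slots.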

Follow-up 3: Render-quality polish (deferred)

Duplications of the "The seed the seed drills" type arise from realize() when the same word fills multiple slots. This is lower priority because the reranker scores such sentences down. Defer to a separate ticket if someone wants to clean up the realizer.

Scope

In scope:

  • Move spike modules → phonolex_generators.csp.*
  • Move v2 reranker modules → phonolex_generators.csp.reranker.*
  • Move skeletons.parquet + reranker_v2.pkl → data/runtime/, LFS-track
  • Update import paths everywhere
  • Bundle follow-up 1 (locked_slots) and follow-up 2 (dedup) into the move
  • New FastAPI routes: /api/generate-sentences, /api/generate-paragraphs
  • /api/generate-single alias
  • Update server tests
  • Update Worker proxy routes (point at new endpoint paths; still configurable backend host)
  • Retire RunPod env vars, runpod_handler.py, T5Gemma model loading
  • Spike directory becomes archive (kept in git, no deletion)

Out of scope:

  • Cloudflare Containers actual deployment (PHON-109b or sibling)
  • Frontend reframe (PHON-110)
  • Render-quality polish (deferred)
  • v3 reranker

Migration plan

  1. Survives unchanged: packages/data/ (data layer), packages/governors/ (still used? to be verified), the existing phonolex_generators package's PHON-95 modules (cfg_seed, editor, scorer, shared).

  2. Gets retired:
     • packages/generation/server/governor.py (v6 governor, replaced by CSP)
     • packages/generation/server/model.py (T5Gemma loader)
     • packages/generation/rp_handler.py (RunPod handler)
     • packages/generation/Dockerfile (RunPod image)
     • Worker RUNPOD_* env vars, runpodUrl(), runpodHeaders() helpers

  3. Gets moved + lightly modified:
     • All CSP/reranker spike modules → packages/generators/src/phonolex_generators/csp/
     • Tests → packages/generators/tests/csp/
     • Two follow-up fixes applied during the move (locked_slots, dedup)

  4. Gets rewritten:
     • packages/generation/server/main.py: CSP cold-start instead of T5Gemma
     • packages/generation/server/schemas.py: new request/response shapes
     • packages/generation/server/routes/generate.py: three new routes
     • packages/web/workers/src/routes/generation.ts: route to new server, drop RunPod logic

  5. Branch: continues feature/csp-iteration after PHON-107. No PR until PHON-109b deployment lands and staging is green.

Risks

  • WordStore + selectional load time — cold starts may push past 15s if data grows. Mitigation: persistent process (Cloudflare Containers keeps warm), not serverless cold-start each request.
  • Reranker model artifact size — reranker_v2.pkl is ~80MB. LFS-tracked; CI clones with LFS. Manageable.
  • Constraint serialization edge cases — frontend may send malformed JSON; pydantic catches most, but enum validation for position / type literals must be explicit.
  • Test coverage during the move — moving 8 modules at once is a big step. Mitigate by doing the move per-module with tests in between (Task 1, Task 2, ...).
  • Spike vs package import drift — spike tests use sys.path.insert(0, str(Path(__file__).parent)); package tests use real package imports. Test files need rewriting, not just moving.

Open questions

  • paragraph_csp.spec_lexicon migration — the spec_lexicon helper currently lives in paradigm_3_csp.py (a v1 entry point that became a thin shim). Move spec_lexicon to phonolex_generators.csp.specs or keep in a compat module? Probably move. Defer to plan task design.
  • Backward-compat for the Worker route: /api/generate-single was the only generation endpoint; existing frontend code paths use it. Keep the alias in BOTH server AND worker for at least the PHON-110 frontend transition.
  • Server test database fixture — current server tests use a small fixture WordStore. Need to either keep or load real data/runtime/*.parquet for some tests. Probably load real data + skip if LFS not pulled (similar to how phonolex_data tests handle it).
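
For the "skip if LFS not pulled" option, one way to detect an un-pulled artifact is to check for the Git LFS pointer header. This is a sketch, not a description of how the phonolex_data tests actually do it; the helper names are hypothetical.

```python
from pathlib import Path

# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if `path` holds an un-pulled Git LFS pointer instead of the real artifact."""
    with path.open("rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def real_data_available(path: Path) -> bool:
    """True only when the file exists and is actual data, not a pointer stub."""
    return path.is_file() and not is_lfs_pointer(path)
```

A server test that needs real data could then be guarded with something like `pytest.mark.skipif(not real_data_available(path), reason="LFS data not pulled")`, while the small fixture WordStore continues to cover the fast tests.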

Self-review

  • [x] Concrete decisions: package layout, endpoint signatures, response shape, follow-up bundling, retirement scope.
  • [x] No "TBD" placeholder language.
  • [x] Internal consistency: data flow goes spike → package → server → worker. RunPod is fully retired in scope.
  • [x] Scope decomposed: code-ready vs deploy-ready split. PHON-109b handles deployment.
  • [x] Ambiguity check: /api/generate-single alias semantics explicit; constraint type-mapping enumerated; cold-start budget stated.