PHON-109: Productionization design
Goal
Move the CSP + reranker_v2 stack from packages/generation/research/2026-05-07-sentence-generation-paradigms/ (spike) into the phonolex_generators package as a csp submodule. Replace the v6/T5Gemma generation path in the FastAPI server with two new endpoints (/api/generate-sentences, /api/generate-paragraphs) that consume the new path. Bundle the follow-ups accumulated across PHON-106/112/113/107. Retire RunPod-related code.
This ticket is code-ready, not deploy-ready. The Cloudflare Containers cutover is a sibling ticket; PHON-109 makes the new server runnable locally and replaces the source-side paths the Worker will eventually call.
Motivation
The CSP + reranker_v2 stack has been functionally complete since PHON-107 but lives in a research spike directory. The FastAPI server still hosts the v6/T5Gemma path (retired per user direction 2026-05-08). The Worker still proxies /api/generate-single to RunPod, which runs the retired v6 server. None of the productionization work is done.
Three follow-ups have accumulated:
1. pair_driven.solve locked_slots only enforces V (not filler roles); paragraph_csp has 2 local workarounds (Task 7 post-filter + lock-after-pick _solve_with_locked_nsubj)
2. Top-K dedup in pair_driven.solve — same-surface candidates from different pair orientations
3. Render-layer quality (low priority; reranker handles)
PHON-109 bundles these as we migrate, so the productionized code starts clean.
Architecture
data/runtime/
├─ words.parquet (existing, LFS)
├─ edges.parquet (existing, LFS)
├─ selectional.parquet (existing, LFS)
├─ pairs.parquet (existing, LFS — PHON-106)
├─ skeletons.parquet (NEW — moved from spike/outputs/, LFS)
└─ reranker_v2.pkl (NEW — moved from spike/outputs/, LFS)
packages/generators/src/phonolex_generators/
├─ __init__.py
├─ cfg_seed/ (existing, PHON-95)
├─ editor/ (existing, PHON-95)
├─ scorer/ (existing, PHON-95)
├─ shared/ (existing, PHON-95)
└─ csp/ (NEW, PHON-109)
   ├─ __init__.py
   ├─ pair_driven.py ← moved from spike (locked_slots fix + dedup)
   ├─ verb_candidates.py ← moved from spike
   ├─ paragraph.py ← moved from spike paragraph_csp.py
   ├─ skeleton.py ← moved from spike skeleton_csp.py (the helpers we kept after Task 10 retirement)
   ├─ constraint_surface.py ← moved from spike
   ├─ realize.py ← moved from spike (the `realize` function + helpers)
   ├─ skeletons.py ← skeletons-parquet loader
   ├─ reranker/
   │  ├─ __init__.py
   │  ├─ train.py ← moved from spike train_reranker_v2.py
   │  ├─ predict.py ← moved from spike quality_axis_v2.py
   │  ├─ rerank.py ← moved from spike reranker_v2.py
   │  ├─ active_learning.py ← moved from spike active_learning_select.py
   │  ├─ embedding_cache.py ← moved from spike
   │  └─ judge.py ← moved from spike llm_judge.py (teacher-labeling tool)
   └─ tests/
      └─ ... (moved from spike test_*.py)
packages/generation/server/
├─ main.py ← server entry; CSP cold-start instead of T5Gemma
├─ schemas.py ← new request/response models
└─ routes/
   └─ generate.py ← /api/generate-sentences, /api/generate-paragraphs, /api/generate-single (alias)
packages/web/workers/src/routes/generation.ts
← rewrite: route /generate-sentences and /generate-paragraphs to the new
FastAPI server (Cloudflare Containers eventually, env-configurable host)
Drop /generate-single's RunPod path; alias to /generate-sentences.
The packages/generation/research/2026-05-07-sentence-generation-paradigms/ directory becomes an archive: it is kept in git (not deleted) but is no longer the canonical path.
API
POST /api/generate-sentences
Request:
{
  "spec": "spec1",
  "band": "fineweb_adult",
  "constraints": [
    {"type": "exclude", "phonemes": ["ɹ"]},
    {"type": "contrastive_minpair", "phoneme1": "d", "phoneme2": "z", "position": "final"}
  ],
  "locked_slots": {"V": "cut"},
  "axis_weights": {"age_appropriate": 0.4, "coherence": 0.3, "naturalness": 0.2, "grammaticality": 0.1},
  "top_k": 8
}
Response:
{
  "candidates": [
    {
      "sentence": "The seed drills in the seas.",
      "verb": "drill",
      "fillers": {"V": "drill", "nsubj": "seed", "pobj_in": "seas"},
      "skeleton": "nsubj,V,pobj_in",
      "axis_scores": {"naturalness": 1.99, "grammaticality": 2.60, "age_appropriate": 2.93, "coherence": 3.08},
      "composite_score": 2.65,
      "feature_distance": 0.8,
      "sonorant_diff": 0.0,
      "ppmi_total": 5.0
    },
    ...
  ],
  "n_total_candidates": 142,
  "diagnostics": {
    "verb_candidates_count": 87,
    "pair_frame_height": 105,
    "join_rows": 142
  }
}
n_total_candidates is the count BEFORE top_k truncation, so the frontend can show "Top 8 of 142".
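The truncation rule above can be sketched as follows. This is a minimal illustration, not the server's actual handler: `build_sentence_response` is a hypothetical helper name, and the candidate list is assumed to be sorted best-first before it arrives.

```python
from typing import Any


def build_sentence_response(scored: list[dict[str, Any]], top_k: int) -> dict[str, Any]:
    """Truncate to top_k but report the pre-truncation count.

    Hypothetical helper: `scored` is assumed to be sorted best-first already.
    """
    return {
        "candidates": scored[:top_k],
        "n_total_candidates": len(scored),  # counted BEFORE truncation
    }


# Usage: 142 fake candidates, top_k=8, as in the example response.
fake = [{"sentence": f"s{i}"} for i in range(142)]
resp = build_sentence_response(fake, top_k=8)
print(f"Top {len(resp['candidates'])} of {resp['n_total_candidates']}")
```

Keeping the count separate from the list length lets the frontend render the "Top 8 of 142" label without a second request.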
POST /api/generate-paragraphs
Request: same shape as /api/generate-sentences, plus n_sentences (default 3). Paragraph-level constraints (contrastive_multopp) are accepted.
Response:
{
  "paragraphs": [
    {
      "discourse_subject": "seed",
      "sentences": [
        {/* sentence-level candidate, same shape as above */},
        ...
      ],
      "composite_score": 2.65,
      "axis_scores": {...},
      "score": 44.91
    },
    ...
  ],
  "n_total_paragraphs": 12,
  "diagnostics": {...}
}
The paragraph-level axis_scores come from running reranker_v2 on the joined paragraph text. The per-sentence composite_score and axis_scores are also retained for inspection. score is the sum of per-sentence ppmi_total (legacy).
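As a small illustration of the legacy `score` field, the sum over per-sentence `ppmi_total` could look like the sketch below. The function name `paragraph_legacy_score` is hypothetical and the input values are made up for the example.

```python
def paragraph_legacy_score(sentences: list[dict]) -> float:
    """Sum per-sentence ppmi_total, as described for the legacy `score` field.

    Hypothetical helper name; each sentence dict is assumed to carry `ppmi_total`.
    """
    return sum(s["ppmi_total"] for s in sentences)


# Illustrative values only, not taken from a real run.
example = [{"ppmi_total": 5.0}, {"ppmi_total": 4.5}, {"ppmi_total": 3.0}]
print(paragraph_legacy_score(example))
```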
POST /api/generate-single (alias)
Equivalent to /api/generate-sentences with top_k=1. Returns just one candidate (not a list). Kept for backward compat; soft-deprecated.
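The alias semantics can be sketched as a thin wrapper around the sentence handler. This is a hedged illustration: `generate_single` and the `SentenceHandler` type are hypothetical names, and returning `None` for an empty result is an assumption, since the design does not say what the alias returns when no candidate is found.

```python
from typing import Any, Callable, Optional

# Hypothetical type: any callable that maps a request dict to a
# /api/generate-sentences-shaped response dict.
SentenceHandler = Callable[[dict[str, Any]], dict[str, Any]]


def generate_single(
    request: dict[str, Any], generate_sentences: SentenceHandler
) -> Optional[dict[str, Any]]:
    """Alias: force top_k=1 and unwrap the single candidate."""
    forwarded = {**request, "top_k": 1}  # copy so the caller's request is not mutated
    response = generate_sentences(forwarded)
    candidates = response["candidates"]
    # Assumption: an empty result is surfaced as None (not specified in the design).
    return candidates[0] if candidates else None
```

Routing the alias through the same handler keeps both endpoints on one code path, so the alias can be dropped later without touching the generation logic.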
Constraint serialization
The frontend sends constraints as JSON. The server parses them into the dataclasses defined in phonolex_generators.csp.constraint_surface.
Mapping table:
| Frontend type | Dataclass |
|---|---|
| exclude | ExcludeConstraint(phonemes) |
| include | IncludeConstraint(phonemes) |
| bound | BoundConstraint(norm, min_value, max_value) |
| bound_boost | BoundBoostConstraint(norm, min_value, max_value) |
| contrastive_minpair | MinpairConstraint(phoneme1, phoneme2, position, slots?) |
| contrastive_maxopp | MaxoppConstraint(phoneme1, phoneme2, position, min_sonorant_diff, slots?) |
| contrastive_multopp | MultoppConstraint(substitute, targets, n_targets, position) |
Schema validation is done in pydantic; unknown constraint types are rejected.
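The type-to-dataclass dispatch can be illustrated with a standard-library sketch. The design calls for pydantic; plain dataclasses are used here only to keep the example self-contained. The two dataclasses below are simplified stand-ins for the real ones in constraint_surface, `parse_constraint` is a hypothetical name, and the `VALID_POSITIONS` set is an assumption (only "final" appears in the examples above).

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass(frozen=True)
class ExcludeConstraint:  # stand-in for the real dataclass
    phonemes: tuple[str, ...]


@dataclass(frozen=True)
class MinpairConstraint:  # stand-in for the real dataclass
    phoneme1: str
    phoneme2: str
    position: str
    slots: Optional[tuple[str, ...]] = None


# Assumption: the accepted position literals; the design only shows "final".
VALID_POSITIONS = {"initial", "medial", "final"}


def parse_constraint(raw: dict[str, Any]) -> Union[ExcludeConstraint, MinpairConstraint]:
    """Map one frontend constraint dict to its dataclass; reject unknown types."""
    kind = raw.get("type")
    if kind == "exclude":
        return ExcludeConstraint(phonemes=tuple(raw["phonemes"]))
    if kind == "contrastive_minpair":
        if raw["position"] not in VALID_POSITIONS:
            raise ValueError(f"invalid position: {raw['position']!r}")
        slots = raw.get("slots")
        return MinpairConstraint(
            phoneme1=raw["phoneme1"],
            phoneme2=raw["phoneme2"],
            position=raw["position"],
            slots=tuple(slots) if slots is not None else None,
        )
    raise ValueError(f"unknown constraint type: {kind!r}")
```

The remaining rows of the mapping table would follow the same pattern, one branch per frontend type.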
Server cold-start
# packages/generation/server/main.py (pseudocode)
from pathlib import Path

import polars as pl
from fastapi import FastAPI

from phonolex_data.runtime.store import WordStore
from phonolex_generators.csp import (  # used by the route handlers
    pair_driven, paragraph, reranker, skeletons,
)

DATA_RUNTIME = Path("data/runtime")  # assumption: resolved relative to the repo root
app = FastAPI()


@app.on_event("startup")
async def load_state():
    # Load the tabular artifacts once per process and keep them on app.state.
    app.state.store = WordStore.from_parquet(DATA_RUNTIME / "words.parquet")
    app.state.sel_df = pl.read_parquet(DATA_RUNTIME / "selectional.parquet")
    app.state.skeletons_df = pl.read_parquet(DATA_RUNTIME / "skeletons.parquet")
    app.state.reranker_path = DATA_RUNTIME / "reranker_v2.pkl"
    # Reranker model loads lazily on first request via _cached_model
Cold start expectations:
- WordStore: ~600MB parquet → ~2s
- selectional: ~5.4M rows → ~5s
- skeletons: ~219K rows → ~0.5s
- reranker_v2.pkl: ~80MB pickle → ~1s on first request
- MiniLM-L6-v2: ~80MB on first request → ~5s
Total cold-start: <15s (vs T5Gemma's ~60s). Warm requests: ~1-2s.
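As a quick sanity check on the budget, the per-artifact estimates can be summed directly. The numbers are the estimates listed in this section; note that the last two are paid on the first request rather than at process startup.

```python
# Estimated load times in seconds, taken from the list in this section.
cold_start_seconds = {
    "WordStore": 2.0,
    "selectional": 5.0,
    "skeletons": 0.5,
    "reranker_v2.pkl (first request)": 1.0,
    "MiniLM-L6-v2 (first request)": 5.0,
}

total = sum(cold_start_seconds.values())
print(f"estimated total: {total:.1f}s")  # 13.5s, inside the <15s budget
assert total < 15
```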
Bundled follow-ups
Follow-up 1: pair_driven.solve locked_slots for filler roles
Currently pair_driven.solve only honors locked_slots["V"]. Filler-slot locks (locked_slots["nsubj"], etc.) are silently ignored. paragraph_csp has 2 local workarounds (Task 7 post-filter + lock-after-pick _solve_with_locked_nsubj).
Fix in pair_driven.solve:
- After resolve_contrastive_join (or non-contrastive enumeration) produces the candidate frame, keep only the rows where role_a == slot AND filler_a == locked_value (or, symmetrically, the role_b side matches) for each entry in locked_slots.
- Remove the 2 paragraph_csp workarounds.
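The filter described above can be sketched over plain dict rows. This is an illustration under assumptions, not the planned implementation: a list of dicts stands in for the candidate frame, `filter_locked_fillers` is a hypothetical name, and the `filler_b` column is inferred from the "symmetric for role_b" wording.

```python
from typing import Any

Row = dict[str, Any]


def filter_locked_fillers(rows: list[Row], locked_slots: dict[str, str]) -> list[Row]:
    """Keep rows that satisfy every filler-role lock.

    A lock (slot, value) is satisfied when either side of the pair matches:
    role_a == slot and filler_a == value, or role_b == slot and filler_b == value.
    """
    # V is skipped here because it is already honored by the existing code path.
    filler_locks = {s: v for s, v in locked_slots.items() if s != "V"}

    def satisfies(row: Row, slot: str, value: str) -> bool:
        return (
            (row["role_a"] == slot and row["filler_a"] == value)
            or (row["role_b"] == slot and row["filler_b"] == value)
        )

    return [r for r in rows if all(satisfies(r, s, v) for s, v in filler_locks.items())]
```

Because the check is symmetric, a row matches regardless of which pair orientation put the locked slot on the a-side or the b-side.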
Follow-up 2: Top-K dedup
pair_driven.solve produces candidates from both orientations of pair frames, so the same (verb, fillers, skeleton) tuple sometimes appears twice. A dedup pass runs before top_k truncation:
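def dedup_top_k(scored: list[dict], top_k: int) -> list[dict]:
    # Helper name is illustrative; in the design this pass runs inside pair_driven.solve.
    seen: set[tuple] = set()
    deduped = []
    for c in scored:  # scored is assumed to be sorted best-first
        # frozenset makes the fillers dict hashable and order-insensitive
        key = (c["verb"], frozenset(c["fillers"].items()), c["skeleton"])
        if key in seen:
            continue
        seen.add(key)
        deduped.append(c)
    return deduped[:top_k]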
Optional refinement: dedup by sentence text after realize() if structural dedup misses cases.
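A sketch of that optional text-level pass, assuming each candidate already carries its realized `sentence`. The helper name `dedup_by_sentence` is hypothetical, and the normalization (lowercasing plus whitespace collapsing) is an assumption, since the design does not define what counts as the same surface text.

```python
def dedup_by_sentence(candidates: list[dict]) -> list[dict]:
    """Drop candidates whose realized sentence text was already seen.

    Keeps the first occurrence, so best-first input order is preserved.
    """
    seen: set[str] = set()
    kept = []
    for c in candidates:
        # Assumed normalization: case-insensitive, whitespace-insensitive.
        key = " ".join(c["sentence"].lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(c)
    return kept
```

Running this after the structural dedup catches cases where two different filler assignments happen to render to identical text.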
Follow-up 3: Render-quality polish (deferred)
Duplications such as "The seed the seed drills" arise in realize() when the same word fills multiple slots. This is lower priority because the reranker scores such candidates down. Defer to a separate ticket if someone wants to clean up the realizer.
Scope
In scope:
- Move spike modules → phonolex_generators.csp.*
- Move v2 reranker modules → phonolex_generators.csp.reranker.*
- Move skeletons.parquet + reranker_v2.pkl → data/runtime/, LFS-track
- Update import paths everywhere
- Bundle follow-up 1 (locked_slots) and follow-up 2 (dedup) into the move
- New FastAPI routes: /api/generate-sentences, /api/generate-paragraphs
- /api/generate-single alias
- Update server tests
- Update Worker proxy routes (point at new endpoint paths; still configurable backend host)
- Retire RunPod env vars, runpod_handler.py, T5Gemma model loading
- Spike directory becomes archive (kept in git, no deletion)
Out of scope:
- Cloudflare Containers actual deployment (PHON-109b or sibling)
- Frontend reframe (PHON-110)
- Render-quality polish (deferred)
- v3 reranker
Migration plan
- Survives unchanged: packages/data/ (data layer), packages/governors/ (still used? verify), the existing phonolex_generators package's PHON-95 modules (cfg_seed, editor, scorer, shared).
- Gets retired:
  - packages/generation/server/governor.py (v6 governor, replaced by CSP)
  - packages/generation/server/model.py (T5Gemma loader)
  - packages/generation/rp_handler.py (RunPod handler)
  - packages/generation/Dockerfile (RunPod image)
  - Worker RUNPOD_* env vars, runpodUrl(), runpodHeaders() helpers
- Gets moved + lightly modified:
  - All CSP/reranker spike modules → packages/generators/src/phonolex_generators/csp/
  - Tests → packages/generators/tests/csp/
  - Two follow-up fixes applied during the move (locked_slots, dedup)
- Gets rewritten:
  - packages/generation/server/main.py: CSP cold-start instead of T5Gemma
  - packages/generation/server/schemas.py: new request/response shapes
  - packages/generation/server/routes/generate.py: three new routes
  - packages/web/workers/src/routes/generation.ts: route to new server, drop RunPod logic
- Branch: continues feature/csp-iteration after PHON-107. No PR until PHON-109b deployment lands and staging is green.
Risks
- WordStore + selectional load time: cold starts may push past 15s if data grows. Mitigation: persistent process (Cloudflare Containers keeps warm), not serverless cold-start each request.
- Reranker model artifact size: reranker_v2.pkl is ~80MB. LFS-tracked; CI clones with LFS. Manageable.
- Constraint serialization edge cases: frontend may send malformed JSON; pydantic catches most, but enum validation for position/type literals must be explicit.
- Test coverage during the move: moving 8 modules at once is a big step. Mitigate by doing the move per-module with tests in between (Task 1, Task 2, ...).
- Spike vs package import drift: spike tests use sys.path.insert(0, str(Path(__file__).parent)); package tests use real package imports. Test files need rewriting, not just moving.
Open questions
- paragraph_csp.spec_lexicon migration: the spec_lexicon helper currently lives in paradigm_3_csp.py (a v1 entry point that became a thin shim). Move spec_lexicon to phonolex_generators.csp.specs or keep in a compat module? Probably move. Defer to plan task design.
- Backward-compat for the Worker route: /api/generate-single was the only generation endpoint; existing frontend code paths use it. Keep alias in BOTH server AND worker for at least the PHON-110 frontend transition.
- Server test database fixture: current server tests use a small fixture WordStore. Need to either keep or load real data/runtime/*.parquet for some tests. Probably load real data + skip if LFS not pulled (similar to how phonolex_data tests handle it).
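For the "skip if LFS not pulled" idea, one possible check is to detect Git LFS pointer stubs, which are small text files beginning with a fixed version line. The sketch below is an assumption about how such a guard could look, not a description of how the phonolex_data tests do it; `is_lfs_pointer` and `runtime_data_available` are hypothetical names.

```python
from pathlib import Path

# Git LFS pointer files start with this version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True when `path` is a Git LFS pointer stub rather than the real artifact."""
    if not path.is_file():
        return False
    with path.open("rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def runtime_data_available(path: Path) -> bool:
    """True when the artifact exists and has actually been pulled from LFS."""
    return path.is_file() and not is_lfs_pointer(path)
```

In a pytest suite this could back a `pytest.mark.skipif(not runtime_data_available(...), reason="LFS data not pulled")` marker on the tests that need the real parquet files.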
Self-review
- [x] Concrete decisions: package layout, endpoint signatures, response shape, follow-up bundling, retirement scope.
- [x] No "TBD" placeholder language.
- [x] Internal consistency: data flow goes spike → package → server → worker. RunPod is fully retired in scope.
- [x] Scope decomposed: code-ready vs deploy-ready split. PHON-109b handles deployment.
- [x] Ambiguity check: /api/generate-single alias semantics explicit; constraint type-mapping enumerated; cold-start budget stated.