Governed Generation Pipeline Transparency: Design
Date: 2026-04-25
Branch: feature/phon-20-gov-gen-transparency
Tickets: PHON-20 (UI disclaimer for restrictive constraint combinations) + scope expansion
Goal
Make the generation pipeline auditable in real time. Replace the flickering single-line statusMessage with a skeleton OutputCard that shows pipeline progress and per-draft outcomes. Build the restrictive-constraints disclaimer into the same surface. Stop the server-status chip from claiming "ready" when a cold start is needed. Reduce drafts from 4 to 3.
Compliance, audit trail, and transparency are the product story; the UI should reflect it.
Scope
- Frontend: GovernedGenerationTool refactor + new SkeletonOutputCard component, server-status chip rewrite, educational hint in ActiveConstraints.
- Workers proxy: refined server-status state machine over RunPod's worker counts.
- Generation server: drop draft count 4 → 3, emit structured pipeline/draft/selection events alongside existing free-text status.
Out of scope
- Pre-flight live vocabulary-survival readout (separate ticket — needs a new lightweight endpoint).
- Compliance panel redesign from older feedback (already de facto done; the layout is single-column with inline summaries).
- Architectural changes to RunPod state interpretation beyond which fields we read.
Server-status state machine
The Workers proxy currently buckets RunPod's idle + ready + running workers into a single totalWorkers count and reports ready whenever that count is positive. RunPod's idle field counts cold reservation slots, not warm workers — so the proxy reports ready while a cold start is still required, and the chip lies.
Replace the boolean bucket with a 5-state derivation:
| Condition | State | Chip label | Chip color | Cold-start hint |
|---|---|---|---|---|
| running > 0 \|\| ready > 0 | warm | "GPU warm" | green | no |
| initializing > 0 | starting | "GPU starting…" | yellow | yes |
| running == 0 && ready == 0 && idle > 0 | idle | "GPU idle" | yellow | yes |
| all zero | cold | "Scale-to-zero" | yellow | yes |
| RunPod unreachable / API error | error | "Server error" | red | n/a |
canGenerate = status !== 'error'. The cold-start hint shows for every state except warm; it does not apply to error, where generation is blocked anyway. Single source of truth: chip color tracks whether the next request will pay cold-start cost.
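The derivation in the table could be sketched roughly as follows. This is a minimal illustration, not the proxy's actual code: the WorkerCounts and DerivedStatus shapes and the deriveServerStatus name are hypothetical, and it assumes the table rows are evaluated top to bottom so the first matching condition wins.

```typescript
// Hypothetical shape of the worker counts read from RunPod's health response.
// Field names follow the table; the real payload may carry more fields.
interface WorkerCounts {
  running: number;
  ready: number;
  initializing: number;
  idle: number;
}

type ServerState = "warm" | "starting" | "idle" | "cold" | "error";

// Hypothetical result shape for illustration only.
interface DerivedStatus {
  status: ServerState;
  canGenerate: boolean;
  showColdStartHint: boolean;
}

// Pass null when RunPod is unreachable or the API call failed.
function deriveServerStatus(workers: WorkerCounts | null): DerivedStatus {
  let status: ServerState;
  if (workers === null) {
    status = "error";
  } else if (workers.running > 0 || workers.ready > 0) {
    status = "warm";
  } else if (workers.initializing > 0) {
    status = "starting";
  } else if (workers.idle > 0) {
    // running and ready are already known to be zero at this point.
    status = "idle";
  } else {
    status = "cold";
  }
  return {
    status,
    canGenerate: status !== "error",
    // The hint is "n/a" for error in the table, so it is hidden there as well.
    showColdStartHint: status !== "warm" && status !== "error",
  };
}
```

Under these assumptions, a response with idle: 1 and every other count at zero derives idle rather than warm, which is the case the current bucket gets wrong.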
Educational hint
Always-visible single line under ActiveConstraints.tsx, secondary text color, sits with the chip count:
Each constraint trims the vocabulary the model can draw from. Stack enough of them and coherence starts to suffer.
The hint lives with constraint composition — the act it describes — rather than near the Generate button, where it would compete with the cold-start hint and status messages.
Skeleton OutputCard
A new component SkeletonOutputCard.tsx renders pipeline progress and draft outcomes during generation. Drops into the top of the OutputFeed list at request start. Hydrates in place into the normal OutputCard once the result arrives.
┌─────────────────────────────────────────────┐
│ ⏳ Generating… [delete] │
│ Pipeline: │
│ ✓ Constraints resolved (5) │
│ ✓ Vocabulary tagged │
│ ! 12% survival — output will be │ ← banner if <20%
│ shorter than usual │
│ Drafts: │
│ Draft 1 ✓ compliant score 8.2 │
│ Draft 2 ⏳ generating… │
│ Draft 3 · queued │
│ [Rollout slot appears if escalation fires] │
└─────────────────────────────────────────────┘
Pipeline section
Tick lines for every pipeline step: Constraints resolved, Vocabulary tagged, Survival computed. Pre-draft phases are shown explicitly rather than collapsed — transparency of the audit trail is the point. Each step renders as ✓ <description> once complete.
Survival banner
Appears the moment the pipeline event with survival field lands. Threshold-driven copy:
- <5% (red banner): "Only X% of vocabulary survives these constraints. Output may be incoherent; consider relaxing or removing one."
- <20% (yellow banner): "X% vocabulary survival. Output will be shorter than usual."
- ≥20%: no banner.
Banner persists onto the resulting OutputCard's pipeline trace footer after hydration.
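The threshold logic could look something like the sketch below. The survivalBanner function and SurvivalBanner type are hypothetical names; the sketch assumes survival arrives as a ratio between 0 and 1, as in the vocab_tagged event, and leaves the banner copy to the rendering component.

```typescript
type BannerSeverity = "red" | "yellow";

// Hypothetical banner descriptor; the component would turn it into copy.
interface SurvivalBanner {
  severity: BannerSeverity;
  percent: number; // whole-number percentage for the "X%" placeholder
}

// survival is a ratio in [0, 1], e.g. 0.12 for 12%.
function survivalBanner(survival: number): SurvivalBanner | null {
  // Compare the raw ratio rather than the rounded display value, so that
  // 0.199 still counts as below the 20% threshold.
  if (survival >= 0.2) return null;
  return {
    severity: survival < 0.05 ? "red" : "yellow",
    percent: Math.round(survival * 100),
  };
}
```

One consequence of rounding only for display is that a ratio such as 0.199 would show a yellow banner reading 20%; flooring the percentage instead is an equally reasonable choice.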
Drafts section
3 draft slots. Each slot shows the index and a state indicator that transitions:
- · queued (gray dot, not yet started)
- ⏳ generating… (spinner, in flight)
- ✓ compliant score X.X (green check, score from _score_draft)
- ✗ N violations (red X, violation count)
If targeted-rollout escalation fires (no compliant drafts after retries), an extra Rollout slot appears below Draft 3 and goes through the same state transitions.
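One way to model a slot is a discriminated union, so a score can only exist on a compliant slot and a violation count only on a failed one. The DraftSlot type and draftSlotLabel function below are hypothetical names, sketched only to show how the four indicators might be rendered; the singular "violation" form is an addition not specified above.

```typescript
// Each variant carries only the data that its indicator needs.
type DraftSlot =
  | { state: "queued" }
  | { state: "generating" }
  | { state: "compliant"; score: number }
  | { state: "violations"; count: number };

function draftSlotLabel(slot: DraftSlot): string {
  switch (slot.state) {
    case "queued":
      return "· queued";
    case "generating":
      return "⏳ generating…";
    case "compliant":
      // One decimal place, matching the "score X.X" format.
      return `✓ compliant score ${slot.score.toFixed(1)}`;
    case "violations":
      // Singular form for a count of 1 is an assumption of this sketch.
      return `✗ ${slot.count} violation${slot.count === 1 ? "" : "s"}`;
  }
}
```

The same function would serve the Rollout slot, since it goes through the same states.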
Hydration
When the result event arrives, the skeleton transforms in place into the normal OutputCard:
- Winning draft text fills the body.
- Pipeline + draft list collapse into a <details> "Pipeline trace" footer panel.
- Alternate (top-2) accessible via the existing onSwapAlternate swap menu.
In-place hydration avoids a layout jump.
Data flow
The generation server already yields free-text status events via SSE. We add structured events alongside them so the frontend can update the skeleton precisely. Free-text events stay for backward compatibility and log streaming.
Event payload shape:
{"event": "pipeline", "step": "constraints_resolved", "count": 5}
{"event": "pipeline", "step": "vocab_tagged", "survival": 0.12, "max_tokens": 80}
{"event": "draft", "index": 0, "state": "generating"}
{"event": "draft", "index": 0, "state": "compliant", "score": 8.2}
{"event": "draft", "index": 1, "state": "violations", "count": 2}
{"event": "rollout", "state": "generating"}
{"event": "rollout", "state": "compliant"}
{"event": "selection", "winner_index": 0, "compliant_count": 2}
{"result": {...}}
{"error": "..."}
Frontend reducer dispatches on event key:
- pipeline: append tick line for step; surface survival banner on vocab_tagged if survival < 0.20.
- draft: update slot at index to the new state.
- rollout: append rollout slot or update its state.
- selection: mark winner index; prepare for hydration.
- result: hydrate skeleton into OutputCard, collapse pipeline trace into footer.
- error: skeleton becomes a failed-card with message + preserved trace.
Unknown events fall back to free-text rendering in a generic log line.
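The dispatch described above could be sketched as a pure reducer. Everything here is illustrative: SkeletonState, StreamMessage, initialState, and reduce are hypothetical names, the field names mirror the payload examples, and unknown events are assumed to be filtered out before they reach the reducer.

```typescript
type SlotState = "queued" | "generating" | "compliant" | "violations";

interface Slot {
  state: SlotState;
  score: number | null; // set when state is "compliant"
  count: number | null; // set when state is "violations"
}

// Messages mirror the payload examples; result and error have no event key.
type StreamMessage =
  | { event: "pipeline"; step: string; count?: number; survival?: number; max_tokens?: number }
  | { event: "draft"; index: number; state: SlotState; score?: number; count?: number }
  | { event: "rollout"; state: SlotState; score?: number; count?: number }
  | { event: "selection"; winner_index: number; compliant_count: number }
  | { result: unknown }
  | { error: string };

interface SkeletonState {
  phase: "running" | "hydrated" | "failed";
  steps: string[];
  survival: number | null;
  drafts: Slot[];
  rollout: Slot | null;
  winnerIndex: number | null;
  error: string | null;
}

const queued = (): Slot => ({ state: "queued", score: null, count: null });

function initialState(draftCount = 3): SkeletonState {
  return {
    phase: "running",
    steps: [],
    survival: null,
    drafts: Array.from({ length: draftCount }, queued),
    rollout: null,
    winnerIndex: null,
    error: null,
  };
}

function toSlot(m: { state: SlotState; score?: number; count?: number }): Slot {
  return { state: m.state, score: m.score ?? null, count: m.count ?? null };
}

function reduce(state: SkeletonState, msg: StreamMessage): SkeletonState {
  // The trace (steps, drafts) is kept in both terminal phases.
  if ("result" in msg) return { ...state, phase: "hydrated" };
  if ("error" in msg) return { ...state, phase: "failed", error: msg.error };
  switch (msg.event) {
    case "pipeline":
      return {
        ...state,
        steps: [...state.steps, msg.step],
        survival: msg.survival ?? state.survival,
      };
    case "draft": {
      const index = msg.index;
      const updated = toSlot(msg);
      return { ...state, drafts: state.drafts.map((s, i) => (i === index ? updated : s)) };
    }
    case "rollout":
      // Covers both appending the rollout slot and updating it.
      return { ...state, rollout: toSlot(msg) };
    case "selection":
      return { ...state, winnerIndex: msg.winner_index };
  }
}
```

Keeping the reducer pure means the component tests can replay a recorded event sequence and assert on the resulting state without rendering anything.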
Server changes
packages/generation/server/routes/generate.py:
- n_batch = 3 (was 4).
- draft_diversity reduced to 3 entries: drop {"repetition_penalty": 2.0, "temperature": 0.9}. The defaults already use repetition_penalty=1.2, making the dedicated anti-repetition draft redundant.
- New _emit_event(payload: dict) -> str helper alongside _emit(): yields a JSON line under data: {"event": ...}.
- Each pipeline phase yields paired events (free-text + structured) so backward-compatible parsers keep working:
yield {"status": f"Vocabulary survival: {surviving_ratio:.0%} → max {max_tokens} tokens"}
yield {"event": "pipeline", "step": "vocab_tagged", "survival": surviving_ratio, "max_tokens": max_tokens}
Top-2 alternates: existing code returns compliant_drafts[1:2] (one alternate). No change beyond reducing n_batch.
Error handling
- SSE stream interrupted mid-pipeline: skeleton displays a "Connection lost" badge, retains progress so far, Regenerate button enabled.
- Server returns error event: skeleton becomes a failed-card with the error message; the partial pipeline trace is preserved.
- Cold-start timeout (existing path): surfaced via the error event; skeleton handles like any other error.
- Unknown structured event: logged to client console, free-text fallback rendered.
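The unknown-event fallback implies a classification step before any state update. The sketch below shows one possible shape for it, assuming each payload arrives as a single SSE line of the form data: followed by a JSON object; parseDataLine, ParsedLine, and KNOWN_EVENTS are hypothetical names, not part of the existing generationApi.ts.

```typescript
type ParsedLine =
  | { kind: "structured"; event: string; payload: Record<string, unknown> }
  | { kind: "status"; text: string }
  | { kind: "result"; payload: unknown }
  | { kind: "error"; message: string }
  | { kind: "unknown"; raw: string };

const KNOWN_EVENTS = new Set(["pipeline", "draft", "rollout", "selection"]);

// Returns null for lines that carry no data (blank lines, SSE comments).
function parseDataLine(line: string): ParsedLine | null {
  if (!line.startsWith("data:")) return null;
  const raw = line.slice("data:".length).trim();

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Not JSON at all: hand the raw text to the generic log line.
    return { kind: "unknown", raw };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { kind: "unknown", raw };
  }

  const payload = parsed as Record<string, unknown>;
  const event = payload["event"];
  if (typeof event === "string" && KNOWN_EVENTS.has(event)) {
    return { kind: "structured", event, payload };
  }
  const status = payload["status"];
  if (typeof status === "string") return { kind: "status", text: status };
  if ("result" in payload) return { kind: "result", payload: payload["result"] };
  const error = payload["error"];
  if (typeof error === "string") return { kind: "error", message: error };

  // Well-formed JSON with an unrecognized shape, e.g. a newer event type.
  return { kind: "unknown", raw };
}
```

A caller would log unknown lines to the console and render their raw text in the generic log line, so a newer server talking to an older client degrades to free text rather than failing.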
Files to touch
Frontend:
- packages/web/frontend/src/types/governance.ts — extend ServerStatus.status union to 'warm' | 'starting' | 'idle' | 'cold' | 'error'; add structured event types.
- packages/web/frontend/src/lib/generationApi.ts — parse structured event lines alongside free-text status; add onPipeline, onDraft, onRollout, onSelection callbacks (or a single onEvent with discriminated union).
- packages/web/frontend/src/components/tools/GovernedGenerationTool/index.tsx — server-status chip rewrite (5 states), reducer for in-progress generation, render SkeletonOutputCard at top of feed.
- packages/web/frontend/src/components/tools/GovernedGenerationTool/ActiveConstraints.tsx — add educational hint line.
- packages/web/frontend/src/components/tools/GovernedGenerationTool/SkeletonOutputCard.tsx — new.
- packages/web/frontend/src/components/tools/GovernedGenerationTool/OutputFeed.tsx — accept "in-progress" state, render SkeletonOutputCard above results.
- packages/web/frontend/src/components/tools/GovernedGenerationTool/OutputCard.tsx — accept optional pipelineTrace prop, render as collapsible footer.
Workers:
- packages/web/workers/src/routes/generation.ts — replace totalWorkers bucket with 5-state derivation.
Generation server:
- packages/generation/server/routes/generate.py — n_batch = 3, drop diversity entry 4, add _emit_event helper, pair structured events with each free-text status.
Testing
- Workers proxy (packages/web/workers/test/): vitest cases for each of the 5 derived states with mock RunPod health responses. Cover the field combinations that mattered for the bug (idle: 1, running: 0 should be idle, not warm).
- Frontend (component tests): SkeletonOutputCard transitions through queued → generating → compliant; banner thresholds at 4%, 19%, 20%, 50%; in-place hydration replaces skeleton with OutputCard.
- Generation server (packages/generation/server/tests/): pytest cases for _emit_event payload shape; assert n_batch == 3 in the integration test for /api/generate-single; verify structured events fire in order with expected fields.
Risks
- Structured-event parsing on the client adds another protocol surface. Mitigated by keeping free-text status as the fallback; the skeleton degrades gracefully if structured events are missing.
- In-place hydration is more complex than fade-out/fade-in. Mitigated by sharing the outer container styling between SkeletonOutputCard and OutputCard so the swap is visually seamless. "In place" is a layout claim, not a React instance-preservation claim: the user sees no jump even though the underlying component swaps.
- Reducing drafts from 4 to 3 may marginally hurt best-of-N diversity. Mitigated by keeping the three most distinct sampling strategies (defaults, conservative, exploratory); the dropped anti-repetition entry overlapped heavily with defaults.