Governed Generation Pipeline Transparency Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Replace flickering single-line statusMessage with a SkeletonOutputCard that shows pipeline + per-draft progress; surface restrictive-constraints disclaimer; fix the server-status chip mislabel; reduce drafts 4→3.
Architecture: Workers proxy gains a pure 5-state status derivation (testable). Generation server emits structured event payloads alongside existing free-text status events. Frontend reducer maps events to a skeleton card that hydrates in place.
Tech Stack: Hono on Cloudflare Workers (vitest), FastAPI + T5Gemma (pytest), React + TypeScript + MUI (vitest where useful, manual verification on dev servers).
Spec: docs/superpowers/specs/2026-04-25-governed-generation-pipeline-transparency-design.md
File Structure
Create:
- packages/web/workers/src/lib/serverStatus.ts — pure 5-state derivation function
- packages/web/workers/src/__tests__/serverStatus.test.ts — vitest unit tests
- packages/web/frontend/src/components/tools/GovernedGenerationTool/SkeletonOutputCard.tsx — pipeline + drafts skeleton
- packages/web/frontend/src/components/tools/GovernedGenerationTool/skeletonReducer.ts — pure reducer (events → skeleton state)
- packages/web/frontend/src/components/tools/GovernedGenerationTool/skeletonReducer.test.ts — vitest unit tests
Modify:
- packages/web/workers/src/routes/generation.ts — call new derivation, return new fields
- packages/web/frontend/src/types/governance.ts — extend ServerStatus union, add structured event types
- packages/web/frontend/src/lib/generationApi.ts — parse structured events
- packages/web/frontend/src/components/tools/GovernedGenerationTool/index.tsx — chip rewrite, skeleton state, render
- packages/web/frontend/src/components/tools/GovernedGenerationTool/ActiveConstraints.tsx — educational hint
- packages/web/frontend/src/components/tools/GovernedGenerationTool/OutputFeed.tsx — render in-progress skeleton above results
- packages/web/frontend/src/components/tools/GovernedGenerationTool/OutputCard.tsx — optional pipelineTrace footer
- packages/generation/server/routes/generate.py — n_batch 4→3, _emit_event helper, paired events
- packages/generation/server/tests/test_api.py — assertions for n_batch + event shape
Task 1: Workers proxy — extract pure server-status derivation
Files:
- Create: packages/web/workers/src/lib/serverStatus.ts
- Create: packages/web/workers/src/__tests__/serverStatus.test.ts
- [ ] Step 1.1: Write failing test for the 5 states
Create packages/web/workers/src/__tests__/serverStatus.test.ts:
import { describe, it, expect } from 'vitest';
import { deriveServerStatus } from '../lib/serverStatus';
type Workers = { idle: number; ready: number; running: number; initializing: number; throttled: number };
const w = (overrides: Partial<Workers> = {}): Workers => ({
idle: 0, ready: 0, running: 0, initializing: 0, throttled: 0, ...overrides,
});
describe('deriveServerStatus', () => {
it('returns "warm" when running > 0', () => {
expect(deriveServerStatus(w({ running: 1 }))).toBe('warm');
});
it('returns "warm" when ready > 0 (RunPod ready = warm and waiting)', () => {
expect(deriveServerStatus(w({ ready: 1 }))).toBe('warm');
});
it('returns "starting" when initializing > 0', () => {
expect(deriveServerStatus(w({ initializing: 1 }))).toBe('starting');
});
it('returns "starting" even when initializing co-occurs with idle (cold start dominates)', () => {
expect(deriveServerStatus(w({ initializing: 1, idle: 1 }))).toBe('starting');
});
it('returns "idle" when only idle workers (RunPod cold reservation, not warm)', () => {
expect(deriveServerStatus(w({ idle: 1 }))).toBe('idle');
});
it('returns "cold" when all worker counts are zero', () => {
expect(deriveServerStatus(w())).toBe('cold');
});
});
- [ ] Step 1.2: Run the test to verify it fails
Run: cd packages/web/workers && npm test -- serverStatus
Expected: FAIL with Cannot find module '../lib/serverStatus'.
- [ ] Step 1.3: Implement the pure function
Create packages/web/workers/src/lib/serverStatus.ts:
/**
* Derive a 5-state generation-server status from RunPod worker counts.
*
* RunPod's `idle` field counts cold reservation slots, NOT warm workers —
* so an `idle` count alone means a cold start is needed. Only `running` or
* `ready` workers are truly warm.
*/
export type ServerStatusState = 'warm' | 'starting' | 'idle' | 'cold' | 'error';
export interface RunpodWorkers {
idle: number;
ready: number;
running: number;
initializing: number;
throttled: number;
}
export function deriveServerStatus(workers: RunpodWorkers): ServerStatusState {
if (workers.initializing > 0) return 'starting';
if (workers.running > 0 || workers.ready > 0) return 'warm';
if (workers.idle > 0) return 'idle';
return 'cold';
}
- [ ] Step 1.4: Run the test to verify it passes
Run: cd packages/web/workers && npm test -- serverStatus
Expected: PASS — 6 tests.
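When more than one worker count is non-zero, the result depends on the order of the checks in deriveServerStatus. The sketch below inlines the function from Step 1.3 so it runs standalone and tabulates a few mixed cases. The `cases` table and the `zero` baseline are illustrative names, not part of the plan's test file.

```typescript
// Standalone copy of the Step 1.3 derivation so this sketch runs without the repo.
type ServerStatusState = 'warm' | 'starting' | 'idle' | 'cold' | 'error';

interface RunpodWorkers {
  idle: number;
  ready: number;
  running: number;
  initializing: number;
  throttled: number;
}

function deriveServerStatus(workers: RunpodWorkers): ServerStatusState {
  if (workers.initializing > 0) return 'starting';
  if (workers.running > 0 || workers.ready > 0) return 'warm';
  if (workers.idle > 0) return 'idle';
  return 'cold';
}

// Illustrative baseline with every count at zero.
const zero: RunpodWorkers = { idle: 0, ready: 0, running: 0, initializing: 0, throttled: 0 };

// Illustrative mixed-count cases (not taken from the plan's test file).
const cases: Array<[Partial<RunpodWorkers>, ServerStatusState]> = [
  [{ running: 1, initializing: 1 }, 'starting'], // initializing is checked first
  [{ ready: 1, idle: 3 }, 'warm'],               // a warm worker outranks idle slots
  [{ throttled: 2 }, 'cold'],                    // throttled is never consulted
];

const results = cases.map(([overrides, expected]) => ({
  expected,
  actual: deriveServerStatus({ ...zero, ...overrides }),
}));
```

If a running worker should outrank an initializing one, the first two checks would need to swap, and a test for that combination would be worth adding to Step 1.1.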
- [ ] Step 1.5: Wire into the route handler
Modify packages/web/workers/src/routes/generation.ts lines 211-218:
Replace:
const totalWorkers = health.workers.idle + health.workers.ready + health.workers.running;
const anyInitializing = health.workers.initializing > 0;
return c.json({
model: 'google/t5gemma-9b-2b-ul2-it',
vocab_size: 256000,
memory_gb: totalWorkers > 0 ? 24.6 : 0,
status: anyInitializing ? 'loading' : (totalWorkers > 0 ? 'ready' : 'serverless'),
error: null,
lookup_entries: totalWorkers > 0 ? 256000 : 0,
workers: health.workers,
});
With:
const derived = deriveServerStatus(health.workers);
const isWarm = derived === 'warm';
return c.json({
model: 'google/t5gemma-9b-2b-ul2-it',
vocab_size: 256000,
memory_gb: isWarm ? 24.6 : 0,
status: derived,
error: null,
lookup_entries: isWarm ? 256000 : 0,
workers: health.workers,
});
Add the import at the top of the file:
import { deriveServerStatus } from '../lib/serverStatus';
- [ ] Step 1.6: Run all workers tests
Run: cd packages/web/workers && npm test
Expected: all tests pass; type-check clean.
- [ ] Step 1.7: Commit
git add packages/web/workers/src/lib/serverStatus.ts \
packages/web/workers/src/__tests__/serverStatus.test.ts \
packages/web/workers/src/routes/generation.ts
git commit -m "$(cat <<'EOF'
feat(workers): PHON-20 — 5-state server-status derivation
Replaces the boolean idle+ready+running bucket (which mislabeled cold
RunPod reservation slots as 'ready') with a 5-state machine:
* warm — running > 0 || ready > 0
* starting — initializing > 0
* idle — only idle workers present (cold reservation, cold-start needed)
* cold — all worker counts zero (scale-to-zero)
* error — handled at error path
Pure derivation extracted to lib/serverStatus.ts for unit testing.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 2: Frontend ServerStatus type — extend status union
Files:
- Modify: packages/web/frontend/src/types/governance.ts:138
- [ ] Step 2.1: Update the type union
Replace line 138 in packages/web/frontend/src/types/governance.ts:
status: "loading" | "ready" | "error" | "serverless";
With:
status: "warm" | "starting" | "idle" | "cold" | "error";
- [ ] Step 2.2: Run type-check (will fail in chip code — Task 3 fixes it)
Run: cd packages/web/frontend && npm run type-check
Expected: errors in index.tsx referencing the old states. That's expected — Task 3 fixes them.
- [ ] Step 2.3: Don't commit yet
The type change is breaking until Task 3 lands. We'll commit Task 2 + Task 3 together.
Task 3: Frontend chip rewrite — render 5 states + cold-start hint
Files:
- Modify: packages/web/frontend/src/components/tools/GovernedGenerationTool/index.tsx:140-159, 43-48
- [ ] Step 3.1: Replace canGenerate / showColdStartHint logic
In packages/web/frontend/src/components/tools/GovernedGenerationTool/index.tsx lines 43-48:
Replace:
const canGenerate =
serverStatus?.status === 'ready' ||
serverStatus?.status === 'loading' ||
serverStatus?.status === 'serverless';
const showColdStartHint =
serverStatus?.status === 'loading' || serverStatus?.status === 'serverless';
With:
const canGenerate =
serverStatus != null && serverStatus.status !== 'error';
const showColdStartHint =
serverStatus != null && serverStatus.status !== 'warm' && serverStatus.status !== 'error';
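The two flags above can be checked as a truth table if they are pulled into a pure function. This is a sketch only: `chipFlags` is a hypothetical helper (the plan computes both flags inline in index.tsx), and it takes the status string directly rather than the whole ServerStatus object.

```typescript
type Status = 'warm' | 'starting' | 'idle' | 'cold' | 'error';

interface ChipFlags {
  canGenerate: boolean;
  showColdStartHint: boolean;
}

// Hypothetical helper mirroring the inline expressions from Step 3.1.
// `null`/`undefined` models "no status fetched or server unreachable".
function chipFlags(status: Status | null | undefined): ChipFlags {
  const known = status != null;
  return {
    // Any known, non-error state may accept a request (possibly after a cold start).
    canGenerate: known && status !== 'error',
    // Only 'warm' avoids the cold-start cost; 'error' shows no hint at all.
    showColdStartHint: known && status !== 'warm' && status !== 'error',
  };
}

const flagsByStatus = {
  warm: chipFlags('warm'),
  starting: chipFlags('starting'),
  idle: chipFlags('idle'),
  cold: chipFlags('cold'),
  error: chipFlags('error'),
  unreachable: chipFlags(null),
};
```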
- [ ] Step 3.2: Replace the chip block
In the same file, lines 140-159, replace the chip block:
<Box sx={{ mb: 2 }}>
{!serverStatusHasFetched && (
<Chip size="small" label="Checking server…" variant="outlined" />
)}
{serverStatusHasFetched && serverStatus?.status === 'ready' && (
<Chip size="small" label="Server ready" color="success" variant="outlined" />
)}
{serverStatusHasFetched && serverStatus?.status === 'loading' && (
<Chip size="small" label="Worker starting…" color="warning" variant="outlined" />
)}
{serverStatusHasFetched && serverStatus?.status === 'serverless' && (
<Chip size="small" label="Server idle" color="warning" variant="outlined" />
)}
{serverStatusHasFetched && serverStatus?.status === 'error' && (
<Chip size="small" label="Server error" color="error" variant="outlined" />
)}
{serverStatusHasFetched && serverStatus == null && (
<Chip size="small" label="Server unreachable" color="error" variant="outlined" />
)}
</Box>
With:
<Box sx={{ mb: 2 }}>
{!serverStatusHasFetched && (
<Chip size="small" label="Checking server…" variant="outlined" />
)}
{serverStatusHasFetched && serverStatus?.status === 'warm' && (
<Chip size="small" label="GPU warm" color="success" variant="outlined" />
)}
{serverStatusHasFetched && serverStatus?.status === 'starting' && (
<Chip size="small" label="GPU starting…" color="warning" variant="outlined" />
)}
{serverStatusHasFetched && serverStatus?.status === 'idle' && (
<Chip size="small" label="GPU idle" color="warning" variant="outlined" />
)}
{serverStatusHasFetched && serverStatus?.status === 'cold' && (
<Chip size="small" label="Scale-to-zero" color="warning" variant="outlined" />
)}
{serverStatusHasFetched && serverStatus?.status === 'error' && (
<Chip size="small" label="Server error" color="error" variant="outlined" />
)}
{serverStatusHasFetched && serverStatus == null && (
<Chip size="small" label="Server unreachable" color="error" variant="outlined" />
)}
</Box>
- [ ] Step 3.3: Run type-check + lint
Run: cd packages/web/frontend && npm run type-check && npm run lint
Expected: clean.
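The chip block from Step 3.2 repeats the same conditional once per state. One alternative, sketched here under the assumption that a lookup table is acceptable in index.tsx, keeps labels and colors in a single record so a new union member fails type-check until it gets an entry. `ChipSpec`, `CHIP_BY_STATUS` and `chipSpec` are hypothetical names.

```typescript
type Status = 'warm' | 'starting' | 'idle' | 'cold' | 'error';
type ChipColor = 'success' | 'warning' | 'error';

interface ChipSpec {
  label: string;
  color?: ChipColor; // omitted for the neutral "checking" chip
}

// Record<Status, ...> forces one entry per union member at compile time.
const CHIP_BY_STATUS: Record<Status, ChipSpec> = {
  warm: { label: 'GPU warm', color: 'success' },
  starting: { label: 'GPU starting…', color: 'warning' },
  idle: { label: 'GPU idle', color: 'warning' },
  cold: { label: 'Scale-to-zero', color: 'warning' },
  error: { label: 'Server error', color: 'error' },
};

// Hypothetical selector covering the two non-status cases from the JSX block.
function chipSpec(hasFetched: boolean, status: Status | null): ChipSpec {
  if (!hasFetched) return { label: 'Checking server…' };
  if (status == null) return { label: 'Server unreachable', color: 'error' };
  return CHIP_BY_STATUS[status];
}
```

The component would then render a single Chip from `chipSpec(serverStatusHasFetched, serverStatus?.status ?? null)`; whether that is clearer than the explicit JSX is a judgment call.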
- [ ] Step 3.4: Manually verify in dev server
Open the running frontend (http://localhost:5173), navigate to Governed Generation. Observe the chip:
- If RunPod is warm: "GPU warm" green
- If RunPod has only idle: "GPU idle" yellow + cold-start hint visible
- If scaled to zero: "Scale-to-zero" yellow + cold-start hint visible
- [ ] Step 3.5: Commit Task 2 + Task 3 together
git add packages/web/frontend/src/types/governance.ts \
packages/web/frontend/src/components/tools/GovernedGenerationTool/index.tsx
git commit -m "$(cat <<'EOF'
feat(web): PHON-20 — 5-state server-status chip + accurate cold-start hint
Frontend now renders the new 5-state union from the Workers proxy:
warm/starting/idle/cold/error. The cold-start hint shows for everything
except 'warm' — single source of truth for whether the next request
will pay cold-start cost.
Replaces the misleading 'Server ready' that was firing whenever RunPod
reported any idle reservation slot.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 4: Generation server — drop draft 4
Files:
- Modify: packages/generation/server/routes/generate.py:183, 196-203
- [ ] Step 4.1: Reduce n_batch and draft_diversity
In packages/generation/server/routes/generate.py:
At line 183, replace:
n_batch = 4
With:
n_batch = 3
At lines 196-203, replace:
# Per-draft diversity: 4 drafts with different sampling strategies
# All share one encoder pass; each gets its own decode loop + KV cache
draft_diversity = [
{}, # draft 1: defaults
{"temperature": 0.7, "top_k": 30}, # draft 2: conservative
{"temperature": 1.0, "top_p": 0.95}, # draft 3: exploratory
{"repetition_penalty": 2.0, "temperature": 0.9}, # draft 4: anti-repetition
]
With:
# Per-draft diversity: 3 drafts with different sampling strategies
# All share one encoder pass; each gets its own decode loop + KV cache
# (Anti-repetition draft dropped — defaults already use rep_penalty=1.2.)
draft_diversity = [
{}, # draft 1: defaults
{"temperature": 0.7, "top_k": 30}, # draft 2: conservative
{"temperature": 1.0, "top_p": 0.95}, # draft 3: exploratory
]
- [ ] Step 4.2: Run server tests
Run: cd packages/generation && uv run python -m pytest server/tests/ -v
Expected: all tests pass (the change is a tuning constant; no test should depend on n_batch == 4).
- [ ] Step 4.3: Manually verify on the running generation server
The dev server should auto-reload. Make a test generation request (any prompt, no constraints) via the dev frontend. The server log (/tmp/claude-501/.../bt1wwlzfa.output) should show 3 drafts being generated, not 4.
- [ ] Step 4.4: Commit
git add packages/generation/server/routes/generate.py
git commit -m "$(cat <<'EOF'
feat(generation): PHON-20 — reduce drafts 4 → 3
Drops the dedicated anti-repetition draft (rep_penalty=2.0) — the default
draft already uses rep_penalty=1.2, so the fourth slot was largely
redundant. Keeps the three most distinct sampling strategies:
defaults, conservative (T=0.7, k=30), exploratory (T=1.0, p=0.95).
Best-of selection still returns top primary + 1 alternate.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 5: Generation server — structured events alongside free-text
Files:
- Modify: packages/generation/server/routes/generate.py (multiple yield points + new helper)
- Modify: packages/generation/server/tests/test_api.py (event shape assertion)
- [ ] Step 5.1: Write failing test for _emit_event shape
The existing client fixture mocks the model, so a live-streaming integration test would require loading T5Gemma — out of scope for unit tests. Instead, test the helper's wire format directly.
Append to packages/generation/server/tests/test_api.py:
import json
def test_emit_event_shape():
"""_emit_event wraps a payload dict in SSE 'data: ...\\n\\n' framing."""
from server.routes.generate import _emit_event
out = _emit_event({"event": "draft", "index": 0, "state": "compliant", "score": 8.2})
assert out.startswith("data: ")
assert out.endswith("\n\n")
payload = json.loads(out[len("data: "):].strip())
assert payload == {"event": "draft", "index": 0, "state": "compliant", "score": 8.2}
def test_emit_event_handles_pipeline_payload():
"""Pipeline events serialize survival as a float."""
from server.routes.generate import _emit_event
out = _emit_event({"event": "pipeline", "step": "vocab_tagged", "survival": 0.12, "max_tokens": 80})
payload = json.loads(out[len("data: "):].strip())
assert payload["event"] == "pipeline"
assert payload["step"] == "vocab_tagged"
assert payload["survival"] == 0.12
- [ ] Step 5.2: Run the test to verify it fails
Run: cd packages/generation && uv run python -m pytest server/tests/test_api.py::test_emit_event_shape server/tests/test_api.py::test_emit_event_handles_pipeline_payload -v
Expected: FAIL — _emit_event does not exist yet (ImportError).
(End-to-end verification of structured events firing in order is left to the manual dev-server smoke test in Task 13.)
- [ ] Step 5.3: Add the _emit_event helper
In packages/generation/server/routes/generate.py, near the other _emit* helpers (around line 37-47), add:
def _emit_event(payload: dict) -> str:
"""Emit a structured pipeline/draft/selection event over SSE."""
return f"data: {json.dumps(payload)}\n\n"
(Add import json at the top of the file if not already imported.)
- [ ] Step 5.4: Pair structured events with each free-text status
In packages/generation/server/routes/generate.py, update the generate_pipeline async generator:
After the line yield {"status": f"Fetched word lists ({len(resolved)} constraints)"} (around line 131), add:
yield {"event": "pipeline", "step": "constraints_resolved", "count": len(resolved)}
After the line yield {"status": "Tagging vocabulary trie..."} (around line 135), add:
yield {"event": "pipeline", "step": "vocab_tagging"}
After the line yield {"status": f"Vocabulary survival: {surviving_ratio:.0%} → max {max_tokens} tokens"} (around line 179), add:
yield {"event": "pipeline", "step": "vocab_tagged", "survival": surviving_ratio, "max_tokens": max_tokens}
For the case where there are no bans (the else branch at line 181), add after max_tokens = 128:
yield {"event": "pipeline", "step": "vocab_tagged", "survival": 1.0, "max_tokens": max_tokens}
After yield {"status": f"Generating draft {idx + 1}/{n_batch}..."} (around line 225), add:
yield {"event": "draft", "index": idx, "state": "generating"}
Replace the per-draft outcome blocks (lines 234-238):
if not violations:
compliant_drafts.append((score, gen_ids, text))
yield {"status": f" Draft {idx + 1}/{n_batch}: compliant (score={score:.1f})"}
else:
yield {"status": f" Draft {idx + 1}/{n_batch}: {len(violations)} violations"}
With:
if not violations:
compliant_drafts.append((score, gen_ids, text))
yield {"status": f" Draft {idx + 1}/{n_batch}: compliant (score={score:.1f})"}
yield {"event": "draft", "index": idx, "state": "compliant", "score": round(score, 1)}
else:
yield {"status": f" Draft {idx + 1}/{n_batch}: {len(violations)} violations"}
yield {"event": "draft", "index": idx, "state": "violations", "count": len(violations)}
For the rollout block (lines 264, 293, 305-307), add structured rollout events:
After yield {"status": "Activating targeted rollout..."} (line 264), add:
yield {"event": "rollout", "state": "activating"}
After yield {"status": "Generating with targeted rollout..."} (line 293), add:
yield {"event": "rollout", "state": "generating"}
Replace lines 305-307:
if not violations:
score = model._score_draft(text, target_pairs)
compliant_drafts.append((score, gen_ids, text))
yield {"status": "Rollout draft: compliant"}
else:
yield {"status": f"Rollout draft: {len(violations)} violations remaining"}
With:
if not violations:
score = model._score_draft(text, target_pairs)
compliant_drafts.append((score, gen_ids, text))
yield {"status": "Rollout draft: compliant"}
yield {"event": "rollout", "state": "compliant", "score": round(score, 1)}
else:
yield {"status": f"Rollout draft: {len(violations)} violations remaining"}
yield {"event": "rollout", "state": "violations", "count": len(violations)}
Replace the selection block (lines 315-325):
if compliant_drafts:
compliant_drafts.sort(key=lambda x: x[0], reverse=True)
best_score, best_ids, best_text = compliant_drafts[0]
yield {"status": f"Selected best of {len(compliant_drafts)} drafts (score={best_score:.1f})"}
elif best_fallback is not None:
_, best_ids, best_text, gen_time = best_fallback
yield {"status": f"No compliant draft — using best with {best_fallback[0]} violations"}
else:
best_ids, best_text = [], ""
yield {"status": "No output produced"}
With:
if compliant_drafts:
compliant_drafts.sort(key=lambda x: x[0], reverse=True)
best_score, best_ids, best_text = compliant_drafts[0]
yield {"status": f"Selected best of {len(compliant_drafts)} drafts (score={best_score:.1f})"}
yield {"event": "selection", "winner_score": round(best_score, 1), "compliant_count": len(compliant_drafts)}
elif best_fallback is not None:
_, best_ids, best_text, gen_time = best_fallback
yield {"status": f"No compliant draft — using best with {best_fallback[0]} violations"}
yield {"event": "selection", "winner_score": None, "compliant_count": 0, "fallback_violations": best_fallback[0]}
else:
best_ids, best_text = [], ""
yield {"status": "No output produced"}
yield {"event": "selection", "winner_score": None, "compliant_count": 0}
Now extend the SSE emitter at lines 444-452 to handle the new "event" key:
Find:
async def _generate_sse(req: GenerateSingleRequest):
async for event in generate_pipeline(req):
if "status" in event:
yield _emit(event["status"])
elif "result" in event:
yield _emit_result(event["result"])
elif "error" in event:
yield _emit_error(event["error"])
Replace with:
async def _generate_sse(req: GenerateSingleRequest):
async for event in generate_pipeline(req):
if "status" in event:
yield _emit(event["status"])
elif "event" in event:
yield _emit_event(event)
elif "result" in event:
yield _emit_result(event["result"])
elif "error" in event:
yield _emit_error(event["error"])
- [ ] Step 5.5: Run the test to verify it passes
Run: cd packages/generation && uv run python -m pytest server/tests/test_api.py::test_emit_event_shape server/tests/test_api.py::test_emit_event_handles_pipeline_payload -v
Expected: PASS.
- [ ] Step 5.6: Run all server tests
Run: cd packages/generation && uv run python -m pytest server/tests/ -v
Expected: all tests pass.
- [ ] Step 5.7: Commit
git add packages/generation/server/routes/generate.py \
packages/generation/server/tests/test_api.py
git commit -m "$(cat <<'EOF'
feat(generation): PHON-20 — emit structured pipeline/draft/selection events
Adds JSON event payloads alongside the existing free-text status events
so the frontend can update a skeleton card precisely:
* {event: "pipeline", step: "constraints_resolved" | "vocab_tagging" | "vocab_tagged", ...}
* {event: "draft", index, state: "generating" | "compliant" | "violations", ...}
* {event: "rollout", state: "activating" | "generating" | "compliant" | "violations"}
* {event: "selection", winner_score, compliant_count}
Free-text status events stay for backward compatibility / log streaming.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 6: Frontend — structured event types
Files:
- Modify: packages/web/frontend/src/types/governance.ts (append at end)
- [ ] Step 6.1: Append event union types
Add to the bottom of packages/web/frontend/src/types/governance.ts:
// ---------------------------------------------------------------------------
// Generation streaming events — discriminated union (matches server emit)
// ---------------------------------------------------------------------------
export type PipelineStep = 'constraints_resolved' | 'vocab_tagging' | 'vocab_tagged';
export interface PipelineEvent {
event: 'pipeline';
step: PipelineStep;
count?: number; // constraints_resolved
survival?: number; // vocab_tagged (0..1)
max_tokens?: number; // vocab_tagged
}
export type DraftState = 'generating' | 'compliant' | 'violations';
export interface DraftEvent {
event: 'draft';
index: number;
state: DraftState;
score?: number; // compliant
count?: number; // violations
}
export type RolloutState = 'activating' | 'generating' | 'compliant' | 'violations';
export interface RolloutEvent {
event: 'rollout';
state: RolloutState;
score?: number;
count?: number;
}
export interface SelectionEvent {
event: 'selection';
winner_score: number | null;
compliant_count: number;
fallback_violations?: number;
}
export type GenerationEvent = PipelineEvent | DraftEvent | RolloutEvent | SelectionEvent;
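These interfaces describe the wire format but do nothing at runtime, so a payload with an unrecognized `event` value still passes a plain type cast. A small guard can reject those before they reach a reducer. The sketch below is an assumption, not part of the plan: `hasKnownEventName` is a hypothetical name, and it checks only the discriminant, not the per-event fields.

```typescript
type GenerationEventName = 'pipeline' | 'draft' | 'rollout' | 'selection';

// The discriminants of the GenerationEvent union, listed once for runtime use.
const EVENT_NAMES: readonly GenerationEventName[] = ['pipeline', 'draft', 'rollout', 'selection'];

// Hypothetical guard: narrows an unknown JSON value to "has a known event name".
function hasKnownEventName(value: unknown): value is { event: GenerationEventName } {
  if (typeof value !== 'object' || value === null) return false;
  const name = (value as { event?: unknown }).event;
  return typeof name === 'string' && (EVENT_NAMES as readonly string[]).indexOf(name) !== -1;
}

const accepted = hasKnownEventName({ event: 'draft', index: 0, state: 'generating' });
const rejectedUnknown = hasKnownEventName({ event: 'bogus' });
const rejectedStatus = hasKnownEventName({ status: 'Tagging vocabulary trie...' });
```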
- [ ] Step 6.2: Run type-check
Run: cd packages/web/frontend && npm run type-check
Expected: clean (additions only).
- [ ] Step 6.3: Don't commit yet — bundle with Task 7
Task 7: Frontend API client — parse structured events
Files:
- Modify: packages/web/frontend/src/lib/generationApi.ts:16-77
- [ ] Step 7.1: Add onEvent callback to GenerationCallbacks
In packages/web/frontend/src/lib/generationApi.ts, replace the GenerationCallbacks interface (lines 16-20):
export interface GenerationCallbacks {
onStatus: (message: string) => void;
onResult: (response: SingleGenerationResponse) => void;
onError: (error: string) => void;
}
With:
import type { GenerationEvent } from '../types/governance';
export interface GenerationCallbacks {
onStatus: (message: string) => void;
onEvent?: (event: GenerationEvent) => void;
onResult: (response: SingleGenerationResponse) => void;
onError: (error: string) => void;
}
(The import type line goes at the top of the file with the other imports.)
- [ ] Step 7.2: Dispatch structured events in the SSE parser
Replace the inner parsing block (lines 70-75):
const event = JSON.parse(payload);
if (event.status) callbacks.onStatus(event.status);
else if (event.result) callbacks.onResult(event.result);
else if (event.error) callbacks.onError(event.error);
With:
const parsed = JSON.parse(payload);
if (parsed.status) callbacks.onStatus(parsed.status);
else if (parsed.event) callbacks.onEvent?.(parsed as GenerationEvent);
else if (parsed.result) callbacks.onResult(parsed.result);
else if (parsed.error) callbacks.onError(parsed.error);
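The dispatch above assumes `payload` already holds one complete JSON string. For context, here is a minimal sketch of how a text buffer of `data: ...` frames separated by blank lines could be split before dispatch. `drainSseBuffer` is a hypothetical helper and may differ from what generationApi.ts actually does around lines 70-75.

```typescript
// Hypothetical helper: splits an SSE text buffer into parsed JSON payloads.
// Returns the payloads plus any trailing partial frame to prepend to the next chunk.
function drainSseBuffer(buffer: string): { payloads: unknown[]; rest: string } {
  const frames = buffer.split('\n\n');
  // The last piece is either empty (buffer ended on a frame boundary) or incomplete.
  const rest = frames.pop() ?? '';
  const payloads: unknown[] = [];
  for (const frame of frames) {
    for (const line of frame.split('\n')) {
      if (!line.startsWith('data: ')) continue;
      try {
        payloads.push(JSON.parse(line.slice('data: '.length)));
      } catch (_err) {
        // Skip a malformed line rather than aborting the whole stream.
      }
    }
  }
  return { payloads, rest };
}

const drained = drainSseBuffer(
  'data: {"status":"a"}\n\ndata: {"event":"draft","index":0,"state":"generating"}\n\ndata: {"eve',
);
```

Each entry of `drained.payloads` would then go through the status/event/result/error dispatch from Step 7.2, while `drained.rest` waits for the next network chunk.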
- [ ] Step 7.3: Run type-check
Run: cd packages/web/frontend && npm run type-check
Expected: clean.
- [ ] Step 7.4: Commit Task 6 + Task 7 together
git add packages/web/frontend/src/types/governance.ts \
packages/web/frontend/src/lib/generationApi.ts
git commit -m "$(cat <<'EOF'
feat(web): PHON-20 — parse structured generation events
Adds a discriminated union of pipeline/draft/rollout/selection events
to GenerationCallbacks.onEvent. Free-text onStatus stays as a fallback
so unknown events still log usefully.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 8: Skeleton reducer — pure function with tests
Files:
- Create: packages/web/frontend/src/components/tools/GovernedGenerationTool/skeletonReducer.ts
- Create: packages/web/frontend/src/components/tools/GovernedGenerationTool/skeletonReducer.test.ts
- [ ] Step 8.1: Write failing tests
Create packages/web/frontend/src/components/tools/GovernedGenerationTool/skeletonReducer.test.ts:
import { describe, it, expect } from 'vitest';
import { initialSkeletonState, skeletonReducer } from './skeletonReducer';
import type { GenerationEvent } from '../../../types/governance';
describe('skeletonReducer', () => {
it('starts with empty pipeline + 3 queued draft slots', () => {
const s = initialSkeletonState(3);
expect(s.pipeline).toEqual([]);
expect(s.drafts).toHaveLength(3);
expect(s.drafts.every((d) => d.state === 'queued')).toBe(true);
expect(s.survival).toBeNull();
expect(s.winner).toBeNull();
});
it('appends pipeline tick on constraints_resolved', () => {
const e: GenerationEvent = { event: 'pipeline', step: 'constraints_resolved', count: 5 };
const s = skeletonReducer(initialSkeletonState(3), e);
expect(s.pipeline).toEqual([{ step: 'constraints_resolved', count: 5 }]);
});
it('records survival on vocab_tagged', () => {
const e: GenerationEvent = { event: 'pipeline', step: 'vocab_tagged', survival: 0.12, max_tokens: 80 };
const s = skeletonReducer(initialSkeletonState(3), e);
expect(s.survival).toBeCloseTo(0.12);
expect(s.maxTokens).toBe(80);
});
it('transitions a draft slot generating → compliant', () => {
let s = initialSkeletonState(3);
s = skeletonReducer(s, { event: 'draft', index: 0, state: 'generating' });
expect(s.drafts[0].state).toBe('generating');
s = skeletonReducer(s, { event: 'draft', index: 0, state: 'compliant', score: 8.2 });
expect(s.drafts[0]).toEqual({ state: 'compliant', score: 8.2 });
});
it('records winner on selection', () => {
const s = skeletonReducer(
initialSkeletonState(3),
{ event: 'selection', winner_score: 8.2, compliant_count: 2 },
);
expect(s.winner).toEqual({ score: 8.2, compliantCount: 2, fallback: false });
});
it('records fallback selection (no compliant drafts)', () => {
const s = skeletonReducer(
initialSkeletonState(3),
{ event: 'selection', winner_score: null, compliant_count: 0, fallback_violations: 3 },
);
expect(s.winner).toEqual({ score: null, compliantCount: 0, fallback: true, fallbackViolations: 3 });
});
it('appends rollout slot on rollout activation', () => {
let s = initialSkeletonState(3);
s = skeletonReducer(s, { event: 'rollout', state: 'activating' });
expect(s.rollout).toEqual({ state: 'activating' });
s = skeletonReducer(s, { event: 'rollout', state: 'compliant', score: 7.1 });
expect(s.rollout).toEqual({ state: 'compliant', score: 7.1 });
});
});
- [ ] Step 8.2: Run the tests to verify they fail
Run: cd packages/web/frontend && npm test -- skeletonReducer
Expected: FAIL — Cannot find module './skeletonReducer'.
- [ ] Step 8.3: Implement the reducer
Create packages/web/frontend/src/components/tools/GovernedGenerationTool/skeletonReducer.ts:
/**
* Pure reducer that maps generation streaming events into a skeleton-card state.
*
* Used by the SkeletonOutputCard to render pipeline progress, per-draft
* outcomes, and the selection result before the final OutputCard hydrates.
*/
import type { GenerationEvent, PipelineStep, RolloutState } from '../../../types/governance';
export interface PipelineTick {
step: PipelineStep;
count?: number;
}
export type DraftSlot =
| { state: 'queued' }
| { state: 'generating' }
| { state: 'compliant'; score: number }
| { state: 'violations'; count: number };
export interface RolloutSlot {
state: RolloutState;
score?: number;
count?: number;
}
export interface Winner {
score: number | null;
compliantCount: number;
fallback: boolean;
fallbackViolations?: number;
}
export interface SkeletonState {
pipeline: PipelineTick[];
survival: number | null;
maxTokens: number | null;
drafts: DraftSlot[];
rollout: RolloutSlot | null;
winner: Winner | null;
}
export function initialSkeletonState(nDrafts: number): SkeletonState {
return {
pipeline: [],
survival: null,
maxTokens: null,
drafts: Array.from({ length: nDrafts }, () => ({ state: 'queued' as const })),
rollout: null,
winner: null,
};
}
export function skeletonReducer(state: SkeletonState, event: GenerationEvent): SkeletonState {
switch (event.event) {
case 'pipeline':
if (event.step === 'vocab_tagged') {
return {
...state,
pipeline: [...state.pipeline, { step: event.step, count: event.count }],
survival: event.survival ?? state.survival,
maxTokens: event.max_tokens ?? state.maxTokens,
};
}
return {
...state,
pipeline: [...state.pipeline, { step: event.step, count: event.count }],
};
case 'draft': {
const next = [...state.drafts];
if (event.state === 'compliant' && event.score != null) {
next[event.index] = { state: 'compliant', score: event.score };
} else if (event.state === 'violations' && event.count != null) {
next[event.index] = { state: 'violations', count: event.count };
} else {
next[event.index] = { state: 'generating' };
}
return { ...state, drafts: next };
}
case 'rollout':
return {
...state,
rollout: { state: event.state, score: event.score, count: event.count },
};
case 'selection':
return {
...state,
winner: {
score: event.winner_score,
compliantCount: event.compliant_count,
fallback: event.compliant_count === 0,
fallbackViolations: event.fallback_violations,
},
};
default:
return state;
}
}
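Because the reducer is pure, a recorded event stream can be folded through it with Array.prototype.reduce, which is also how it can be exercised outside React. The sketch below covers only the draft branch and adds a bounds guard that the plan's reducer does not have; treating out-of-range indexes as no-ops is an assumption about desired behavior, and `reduceDrafts` is a hypothetical name.

```typescript
type DraftSlot =
  | { state: 'queued' }
  | { state: 'generating' }
  | { state: 'compliant'; score: number }
  | { state: 'violations'; count: number };

interface DraftEvent {
  event: 'draft';
  index: number;
  state: 'generating' | 'compliant' | 'violations';
  score?: number;
  count?: number;
}

// Draft-only slice of the reducer, with a hypothetical bounds guard added.
function reduceDrafts(drafts: DraftSlot[], e: DraftEvent): DraftSlot[] {
  // Assumption: an index outside the pre-allocated slots is ignored, not appended.
  if (e.index < 0 || e.index >= drafts.length) return drafts;
  const next = drafts.slice();
  if (e.state === 'compliant' && e.score != null) {
    next[e.index] = { state: 'compliant', score: e.score };
  } else if (e.state === 'violations' && e.count != null) {
    next[e.index] = { state: 'violations', count: e.count };
  } else {
    next[e.index] = { state: 'generating' };
  }
  return next;
}

const queued: DraftSlot[] = [{ state: 'queued' }, { state: 'queued' }, { state: 'queued' }];

// Illustrative stream: two drafts finish, one stray event targets a missing slot.
const stream: DraftEvent[] = [
  { event: 'draft', index: 0, state: 'generating' },
  { event: 'draft', index: 0, state: 'compliant', score: 8.2 },
  { event: 'draft', index: 1, state: 'generating' },
  { event: 'draft', index: 1, state: 'violations', count: 2 },
  { event: 'draft', index: 3, state: 'generating' }, // out of range: ignored
];

const finalDrafts = stream.reduce(reduceDrafts, queued);
```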
- [ ] Step 8.4: Run the tests to verify they pass
Run: cd packages/web/frontend && npm test -- skeletonReducer
Expected: PASS — 7 tests.
- [ ] Step 8.5: Commit
git add packages/web/frontend/src/components/tools/GovernedGenerationTool/skeletonReducer.ts \
packages/web/frontend/src/components/tools/GovernedGenerationTool/skeletonReducer.test.ts
git commit -m "$(cat <<'EOF'
feat(web): PHON-20 — pure skeleton reducer for pipeline events
Maps GenerationEvent stream into SkeletonState (pipeline ticks, per-draft
slots, rollout slot, winner). Pure function, fully unit-tested.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 9: SkeletonOutputCard component
Files:
- Create: packages/web/frontend/src/components/tools/GovernedGenerationTool/SkeletonOutputCard.tsx
- [ ] Step 9.1: Create the component
Create packages/web/frontend/src/components/tools/GovernedGenerationTool/SkeletonOutputCard.tsx:
/**
* Skeleton card shown during generation — renders pipeline progress,
* per-draft outcomes, and the survival-based disclaimer banner.
*
* Hydrates in place into the normal OutputCard once the result arrives
* (the parent swaps which component is rendered for this id).
*/
import { Box, Card, CardContent, Typography, Stack, CircularProgress, Alert } from '@mui/material';
import {
CheckCircle as CheckIcon,
Cancel as XIcon,
HourglassEmpty as QueuedIcon,
} from '@mui/icons-material';
import type { SkeletonState, DraftSlot, PipelineTick } from './skeletonReducer';
interface SkeletonOutputCardProps {
state: SkeletonState;
}
const PIPELINE_LABELS: Record<string, string> = {
constraints_resolved: 'Constraints resolved',
vocab_tagging: 'Tagging vocabulary…',
vocab_tagged: 'Vocabulary tagged',
};
function pipelineLine(tick: PipelineTick): string {
const label = PIPELINE_LABELS[tick.step] ?? tick.step;
if (tick.step === 'constraints_resolved' && tick.count != null) {
return `${label} (${tick.count})`;
}
return label;
}
function survivalBanner(survival: number | null) {
if (survival == null) return null;
const pct = Math.round(survival * 100);
if (survival < 0.05) {
return (
<Alert severity="error" sx={{ mt: 1 }}>
Only <strong>{pct}%</strong> of vocabulary survives these constraints. Output may be incoherent — consider relaxing or removing one.
</Alert>
);
}
if (survival < 0.20) {
return (
<Alert severity="warning" sx={{ mt: 1 }}>
<strong>{pct}%</strong> vocabulary survival. Output will be shorter than usual.
</Alert>
);
}
return null;
}
function draftLine(slot: DraftSlot, index: number) {
const label = `Draft ${index + 1}`;
if (slot.state === 'queued') {
return (
<Stack direction="row" alignItems="center" spacing={1}>
<QueuedIcon fontSize="small" sx={{ color: 'text.disabled' }} />
<Typography variant="body2" color="text.secondary">{label}</Typography>
<Typography variant="caption" color="text.disabled">queued</Typography>
</Stack>
);
}
if (slot.state === 'generating') {
return (
<Stack direction="row" alignItems="center" spacing={1}>
<CircularProgress size={14} />
<Typography variant="body2">{label}</Typography>
<Typography variant="caption" color="text.secondary" fontStyle="italic">generating…</Typography>
</Stack>
);
}
if (slot.state === 'compliant') {
return (
<Stack direction="row" alignItems="center" spacing={1}>
<CheckIcon fontSize="small" color="success" />
<Typography variant="body2">{label}</Typography>
<Typography variant="caption" color="text.secondary">compliant · score {slot.score.toFixed(1)}</Typography>
</Stack>
);
}
// violations
return (
<Stack direction="row" alignItems="center" spacing={1}>
<XIcon fontSize="small" color="error" />
<Typography variant="body2">{label}</Typography>
<Typography variant="caption" color="text.secondary">{slot.count} violation{slot.count === 1 ? '' : 's'}</Typography>
</Stack>
);
}
const SkeletonOutputCard = ({ state }: SkeletonOutputCardProps) => {
return (
<Card variant="outlined" sx={{ mb: 2 }}>
<CardContent>
<Stack direction="row" alignItems="center" spacing={1} sx={{ mb: 1 }}>
<CircularProgress size={16} />
<Typography variant="subtitle2">Generating…</Typography>
</Stack>
<Typography variant="overline" color="text.secondary">Pipeline</Typography>
<Box sx={{ pl: 1, mb: 1 }}>
{state.pipeline.map((tick, i) => (
<Stack key={i} direction="row" alignItems="center" spacing={1}>
<CheckIcon fontSize="small" color="success" />
<Typography variant="body2">{pipelineLine(tick)}</Typography>
</Stack>
))}
{survivalBanner(state.survival)}
</Box>
<Typography variant="overline" color="text.secondary">Drafts</Typography>
<Box sx={{ pl: 1 }}>
{state.drafts.map((slot, i) => (
<Box key={i}>{draftLine(slot, i)}</Box>
))}
{state.rollout && (
<Stack direction="row" alignItems="center" spacing={1} sx={{ mt: 0.5 }}>
{state.rollout.state === 'activating' || state.rollout.state === 'generating' ? (
<CircularProgress size={14} />
) : state.rollout.state === 'compliant' ? (
<CheckIcon fontSize="small" color="success" />
) : (
<XIcon fontSize="small" color="error" />
)}
<Typography variant="body2">Rollout</Typography>
<Typography variant="caption" color="text.secondary">
{state.rollout.state}
{state.rollout.score != null && ` · score ${state.rollout.score.toFixed(1)}`}
{state.rollout.count != null && ` · ${state.rollout.count} violations`}
</Typography>
</Stack>
)}
</Box>
</CardContent>
</Card>
);
};
export default SkeletonOutputCard;
- [ ] Step 9.2: Type-check
Run: cd packages/web/frontend && npm run type-check
Expected: clean.
- [ ] Step 9.3: Don't commit yet — wire-up in Task 12.
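The banner thresholds in survivalBanner are easy to get subtly wrong at the boundaries, so it may help to picture them as a pure function that can be asserted without rendering anything. The sketch below is illustrative only: `survivalSeverity` and `survivalPercent` are hypothetical names, not part of the plan, and they assume the same 5% and 20% cut-offs used in the component above.

```typescript
// Hypothetical helpers (not in the plan): mirror the thresholds used by
// survivalBanner so the boundaries can be checked in isolation.
type BannerSeverity = 'error' | 'warning' | null;

function survivalSeverity(survival: number | null): BannerSeverity {
  if (survival == null) return null;       // no survival figure received yet
  if (survival < 0.05) return 'error';     // red banner: under 5%
  if (survival < 0.20) return 'warning';   // yellow banner: under 20%
  return null;                             // permissive enough, no banner
}

// The displayed percentage is rounded, independently of the band check.
function survivalPercent(survival: number): number {
  return Math.round(survival * 100);
}
```

One consequence worth knowing when verifying manually: because the band is chosen from the raw fraction and the label from the rounded percentage, a survival of 0.046 shows the red banner with the text "5%".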
Task 10: ActiveConstraints — educational hint¶
Files:
- Modify: packages/web/frontend/src/components/tools/GovernedGenerationTool/ActiveConstraints.tsx
- [ ] Step 10.1: Read the current component to find the insertion point
Run: head -70 packages/web/frontend/src/components/tools/GovernedGenerationTool/ActiveConstraints.tsx
Locate the closing of the chip-list block (where the Box/Stack containing the constraint chips ends).
- [ ] Step 10.2: Add the hint Typography
In ActiveConstraints.tsx, add a <Typography> directly after the chip stack so the hint sits with the chip count. Use this exact copy:
<Typography variant="caption" color="text.secondary" sx={{ display: 'block', mt: 1 }}>
Each constraint trims the vocabulary the model can draw from. Stack enough of them and coherence starts to suffer.
</Typography>
(If Typography is not yet imported in that file, add it to the existing @mui/material import line.)
- [ ] Step 10.3: Type-check
Run: cd packages/web/frontend && npm run type-check
Expected: clean.
- [ ] Step 10.4: Manually verify on dev server
Reload the dev frontend, navigate to Governed Generation, and check the ActiveConstraints area. The hint should appear below the chip count as low-emphasis caption text.
- [ ] Step 10.5: Commit
git add packages/web/frontend/src/components/tools/GovernedGenerationTool/ActiveConstraints.tsx
git commit -m "$(cat <<'EOF'
feat(web): PHON-20 — educational hint under ActiveConstraints
Always-visible secondary text under the constraint chip count:
'Each constraint trims the vocabulary the model can draw from.
Stack enough of them and coherence starts to suffer.'
Sets expectations about the constraint-vocabulary tradeoff before
the user composes the prompt.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 11: OutputCard — pipeline trace footer¶
Files:
- Modify: packages/web/frontend/src/components/tools/GovernedGenerationTool/OutputCard.tsx
- [ ] Step 11.1: Add optional pipelineTrace prop
In OutputCard.tsx, find the OutputCardProps interface (near the top). Add:
pipelineTrace?: import('./skeletonReducer').SkeletonState;
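(Alternatively, import the type at the top with import type { SkeletonState } from './skeletonReducer'; and declare the prop as pipelineTrace?: SkeletonState.) Either way, add pipelineTrace to the props the component destructures so the footer in Step 11.2 can read it.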
- [ ] Step 11.2: Render the trace as a collapsible footer
Near the bottom of the card body in OutputCard.tsx (just before the card closes), add:
{pipelineTrace && (
<Box component="details" sx={{ mt: 1, fontSize: '0.85em', color: 'text.secondary' }}>
<Box component="summary" sx={{ cursor: 'pointer' }}>Pipeline trace</Box>
<Box sx={{ pl: 2, pt: 1 }}>
{pipelineTrace.pipeline.map((t, i) => (
<div key={i}>✓ {t.step}{t.count != null ? ` (${t.count})` : ''}</div>
))}
{pipelineTrace.survival != null && (
<div>vocab survival: {Math.round(pipelineTrace.survival * 100)}% (max {pipelineTrace.maxTokens} tokens)</div>
)}
{pipelineTrace.drafts.map((d, i) => (
<div key={`d${i}`}>
Draft {i + 1}: {d.state}
{d.state === 'compliant' && ` · score ${d.score.toFixed(1)}`}
{d.state === 'violations' && ` · ${d.count} violations`}
</div>
))}
{pipelineTrace.rollout && <div>Rollout: {pipelineTrace.rollout.state}</div>}
{pipelineTrace.winner && (
<div>
Selected: {pipelineTrace.winner.fallback
? `fallback (${pipelineTrace.winner.fallbackViolations ?? 0} violations)`
: `score ${pipelineTrace.winner.score?.toFixed(1)} (${pipelineTrace.winner.compliantCount} compliant)`}
</div>
)}
</Box>
</Box>
)}
(Place this before the closing </CardContent> or equivalent — adjust to match the file's structure.)
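If the footer's text is worth unit-testing, one option is to compute the lines in a pure function and keep the JSX as a thin map over the result. The sketch below is a hedged illustration, not the plan's implementation: `traceLines`, `TraceInput`, and `TraceDraft` are hypothetical names, and the field shapes are inferred from the JSX above rather than taken from skeletonReducer.ts.

```typescript
// Assumed shapes, inferred from the footer JSX; check them against
// the real SkeletonState before reusing any of this.
type TraceDraft =
  | { state: 'queued' }
  | { state: 'generating' }
  | { state: 'compliant'; score: number }
  | { state: 'violations'; count: number };

interface TraceInput {
  pipeline: { step: string; count?: number }[];
  survival: number | null;
  maxTokens: number | null;
  drafts: TraceDraft[];
}

// Hypothetical helper: builds the pipeline, survival, and draft lines
// as plain strings, in the same order the footer renders them.
function traceLines(trace: TraceInput): string[] {
  const lines = trace.pipeline.map(
    (t) => `✓ ${t.step}${t.count != null ? ` (${t.count})` : ''}`,
  );
  if (trace.survival != null) {
    const pct = Math.round(trace.survival * 100);
    lines.push(`vocab survival: ${pct}% (max ${trace.maxTokens} tokens)`);
  }
  trace.drafts.forEach((d, i) => {
    let line = `Draft ${i + 1}: ${d.state}`;
    if (d.state === 'compliant') line += ` · score ${d.score.toFixed(1)}`;
    if (d.state === 'violations') line += ` · ${d.count} violations`;
    lines.push(line);
  });
  return lines;
}
```

The rollout and winner lines are left out to keep the sketch short; they would follow the same pattern.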
- [ ] Step 11.3: Type-check
Run: cd packages/web/frontend && npm run type-check
Expected: clean.
- [ ] Step 11.4: Don't commit yet — wire-up in Task 12.
Task 12: GovernedGenerationTool integration¶
Files:
- Modify: packages/web/frontend/src/components/tools/GovernedGenerationTool/index.tsx
- Modify: packages/web/frontend/src/components/tools/GovernedGenerationTool/OutputFeed.tsx
- [ ] Step 12.1: Add skeleton state to index.tsx
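In index.tsx, add imports near the top (if the file already imports from 'react', add useReducer to that import instead of creating a second one):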
import { useReducer } from 'react';
import { initialSkeletonState, skeletonReducer, type SkeletonState } from './skeletonReducer';
import type { GenerationEvent } from '../../../types/governance';
Inside the component (alongside the other useState calls), add:
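const [skeleton, dispatchSkeleton] = useReducer(
(state: SkeletonState, event: GenerationEvent | { type: 'reset' }) =>
// Compare the value: a bare `'type' in event` check would also match real events that carry a `type` field.
'type' in event && event.type === 'reset' ? initialSkeletonState(3) : skeletonReducer(state, event),
initialSkeletonState(3),
);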
const [pipelineTraces, setPipelineTraces] = useState<Record<string, SkeletonState>>({});
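If GenerationEvent is a union discriminated by a `type` field (an assumption here, suggested by the reducer's `case 'rollout'` and `case 'selection'` branches), the wrapper reducer has to compare the discriminant's value rather than only test for its presence. The following sketch shows that pattern with deliberately trimmed-down stand-in types; every `Demo*` name is hypothetical and only loosely modelled on the real skeleton state.

```typescript
// Stand-in types for illustration; the real ones live in
// skeletonReducer.ts and types/governance.ts.
type DemoSlot = { state: 'queued' } | { state: 'generating' };
interface DemoSkeleton { drafts: DemoSlot[] }
type DemoEvent = { type: 'draft'; index: number; state: 'generating' };
type DemoAction = DemoEvent | { type: 'reset' };

// Fresh state with n queued draft slots.
const demoInitial = (n: number): DemoSkeleton => ({
  drafts: Array.from({ length: n }, (): DemoSlot => ({ state: 'queued' })),
});

// Pure reducer for real events: copies the array, updates one slot.
function demoReducer(state: DemoSkeleton, ev: DemoEvent): DemoSkeleton {
  const next = [...state.drafts];
  next[ev.index] = { state: ev.state };
  return { ...state, drafts: next };
}

// Wrapper: only the literal 'reset' action clears the state. After the
// comparison, TypeScript narrows `action` to DemoEvent in the else branch.
function withReset(state: DemoSkeleton, action: DemoAction): DemoSkeleton {
  return action.type === 'reset' ? demoInitial(3) : demoReducer(state, action);
}
```

Keeping the reset action outside the pure reducer means the unit tests from Task 8 stay unchanged; only the thin wrapper knows about resets.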
- [ ] Step 12.2: Wire onEvent into generateContent + persist trace
In the existing handleGenerate callback, modify the generateContent call:
After setStatusMessage('Starting...');, add:
dispatchSkeleton({ type: 'reset' });
In the generateContent({ ... }) call, add an onEvent handler:
onEvent: (event) => dispatchSkeleton(event),
Inside the existing onResult handler, persist the trace alongside the result:
onResult: (response) => {
setStatusMessage(null);
const result: GenerationResult = {
id: String(++nextId),
// ...existing fields...
};
setResults((prev) => [result, ...prev]);
setPipelineTraces((prev) => ({ ...prev, [result.id]: skeleton }));
if (!overridePrompt) setPrompt('');
},
Note: onResult fires long after the render that created it, so the skeleton value it closes over is the state from before generation started, not the final one. Read the latest state through a ref instead (add useRef and useEffect to the react import if they are not already there):
const skeletonRef = useRef(skeleton);
useEffect(() => { skeletonRef.current = skeleton; }, [skeleton]);
Then, inside onResult, replace the setPipelineTraces line shown above with:
setPipelineTraces((prev) => ({ ...prev, [result.id]: skeletonRef.current }));
Because the effect only runs after a commit, the ref can still trail the last dispatched event if that event and the result arrive in the same tick; confirm during Step 12.6 that the trace includes the "Selected" line.
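The difference between reading a captured value and reading through a ref can be reproduced without React. The sketch below is a plain-TypeScript model, not React itself: `makeComponent`, `render`, and `dispatch` are hypothetical stand-ins for a component, one render pass, and a reducer dispatch followed by a re-render.

```typescript
// A mutable box shared across renders, playing the role of useRef.
interface MutableRef<T> { current: T }

function makeComponent() {
  let committed = 0;                               // stands in for reducer state
  const ref: MutableRef<number> = { current: 0 };  // stands in for skeletonRef

  // One "render": callbacks created here close over this render's snapshot.
  function render() {
    const snapshot = committed;
    ref.current = snapshot;                        // what the useEffect does after commit
    return {
      readClosure: () => snapshot,                 // like reading `skeleton` directly
      readRef: () => ref.current,                  // like reading `skeletonRef.current`
    };
  }

  // A state update followed by a fresh render.
  function dispatch() {
    committed += 1;
    return render();
  }

  return { render, dispatch };
}

// Callbacks from the first render, as handed to a long-running request.
const component = makeComponent();
const firstRender = component.render();
component.dispatch();
component.dispatch();
```

After the two dispatches, `firstRender.readClosure()` still returns 0 while `firstRender.readRef()` returns 2, which is the same gap onResult would see between the captured skeleton and the ref.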
- [ ] Step 12.3: Pass skeleton + traces to OutputFeed
Replace the <OutputFeed ... /> element with:
<OutputFeed
results={results}
pipelineTraces={pipelineTraces}
inProgress={loading ? skeleton : null}
onRegenerate={handleRegenerate}
onDelete={handleDelete}
onSwapAlternate={handleSwapAlternate}
/>
- [ ] Step 12.4: Update OutputFeed to render skeleton + traces
In OutputFeed.tsx, extend the props interface:
import SkeletonOutputCard from './SkeletonOutputCard';
import type { SkeletonState } from './skeletonReducer';
interface OutputFeedProps {
results: GenerationResult[];
pipelineTraces?: Record<string, SkeletonState>;
inProgress?: SkeletonState | null;
onRegenerate: (result: GenerationResult) => void;
onDelete: (id: string) => void;
onSwapAlternate: (id: string, altIndex: number) => void;
}
In the destructured props, add pipelineTraces, inProgress. In the JSX (the area where results are mapped), render the in-progress skeleton above the results map and pass each result's trace:
{inProgress && <SkeletonOutputCard state={inProgress} />}
{results.map((result) => (
<OutputCard
key={result.id}
result={result}
pipelineTrace={pipelineTraces?.[result.id]}
selected={selectedIds.has(result.id)}
onSelect={handleSelect}
onRegenerate={onRegenerate}
onDelete={onDelete}
onSwapAlternate={onSwapAlternate}
/>
))}
Also change the early return if (results.length === 0) return null; to if (results.length === 0 && !inProgress) return null; so the skeleton can render on the first generation, when there are no results yet.
- [ ] Step 12.5: Type-check + lint
Run: cd packages/web/frontend && npm run type-check && npm run lint
Expected: clean.
- [ ] Step 12.6: Manually verify end-to-end on dev servers
In the running dev frontend, compose a few constraints and click Generate. Watch for:
- Skeleton card appears at top of feed immediately.
- Pipeline ticks fill in (constraints_resolved → vocab_tagged).
- Survival banner shows if constraints are restrictive enough.
- Draft slots transition queued → generating → compliant/violations.
- When result lands, the skeleton disappears and a real OutputCard takes its place with a "Pipeline trace" <details> footer.
- Server status chip shows the new label set; cold-start hint visible whenever chip is non-warm.
Try a restrictive combo (e.g. AoA ≤ 5 + exclude /ɹ/) to see the red banner; try a permissive prompt to see no banner.
- [ ] Step 12.7: Commit
git add packages/web/frontend/src/components/tools/GovernedGenerationTool/index.tsx \
packages/web/frontend/src/components/tools/GovernedGenerationTool/OutputFeed.tsx \
packages/web/frontend/src/components/tools/GovernedGenerationTool/OutputCard.tsx \
packages/web/frontend/src/components/tools/GovernedGenerationTool/SkeletonOutputCard.tsx
git commit -m "$(cat <<'EOF'
feat(web): PHON-20 — SkeletonOutputCard with live pipeline trace
Replaces the flickering single-line statusMessage with a skeleton card
that shows pipeline progress and per-draft outcomes. Hydrates in place
into the normal OutputCard once the result arrives, with the trace
preserved as a collapsible footer.
Includes the survival-based disclaimer banner: red <5%, yellow <20%.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"
Task 13: Final verification + push¶
- [ ] Step 13.1: Run full test suites
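Run from the repo root (each line uses a subshell so the relative paths stay valid when the commands are run back to back):
(cd packages/web/workers && npm test && npm run type-check)
(cd packages/web/frontend && npm test && npm run type-check && npm run lint)
(cd packages/generation && uv run python -m pytest server/tests/ -v)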
- [ ] Step 13.2: Live smoke test on dev servers
End-to-end test in the browser:
- Server status chip transitions correctly when generation server is hot vs cold.
- Cold-start hint appears for any non-warm chip state.
- Educational hint visible under ActiveConstraints.
- Skeleton card streams pipeline + drafts during generation.
- Restrictive-constraints banner fires at the right thresholds.
- Final OutputCard has the Pipeline trace <details> footer.
- 3 drafts (not 4) — verifiable by counting draft slots in skeleton, and by checking generation server logs.
- [ ] Step 13.3: Push branch and open PR into develop
git push -u origin feature/phon-20-gov-gen-transparency
gh pr create --base develop --title "feat: PHON-20 — governed generation pipeline transparency" --body "$(cat <<'EOF'
## Summary
- **PHON-20**: Restrictive-constraints disclaimer (red <5%, yellow <20%) appears in the skeleton card's Pipeline section; the survival percentage persists in the OutputCard's pipeline trace footer.
- **Skeleton OutputCard**: replaces the flickering single-line status with a card that streams pipeline + per-draft progress, then hydrates in place into the normal OutputCard.
- **Server-status chip fix**: `idle` RunPod workers no longer mislabel as 'ready'. New 5-state machine: warm/starting/idle/cold/error. Cold-start hint shows for everything non-warm.
- **Educational hint** under ActiveConstraints sets expectations about the constraint-vocabulary tradeoff.
- **Drafts 4 → 3**: drops the redundant anti-repetition draft.
- Closes PHON-20. Creates PHON-48 as a backlog follow-up for pre-flight vocab survival readout.
## Test plan
- [x] `npm test` (workers) — all pass including new serverStatus tests
- [x] `npm test` (frontend) — all pass including new skeletonReducer tests
- [x] `pytest server/tests/` (generation) — all pass including new structured-event test
- [x] Live smoke test on dev servers
- [ ] CI green on develop after merge
- [ ] Verify on staging (`develop.phonolex.pages.dev`) — server-status chip + skeleton card
🤖 Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
Notes for the implementer¶
- Commit cadence is intentional: each commit is independently revertible. Don't squash.
- The dev frontend hot-reloads, but the generation server may need a manual restart on Python file changes if uvicorn --reload isn't on. Restart with cd packages/generation && uv run uvicorn server.main:app --host 0.0.0.0 --port 8000 --reload.
- Backwards compat: free-text status events still fire on the server. Clients that don't implement onEvent see no behavior change.
- The pipelineTraces Record holds skeleton state per result id: purely client-side, not persisted. Resets on page reload. That's intentional for v1.
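The backwards-compatibility note can be pictured as an optional callback on the client side. The sketch below is an assumption-laden illustration: `WireMessage`, `StreamHandlers`, and `routeMessage` are hypothetical names, and the real message format parsed in generationApi.ts may look quite different.

```typescript
// Hypothetical wire shapes: free-text status plus structured events.
type WireMessage =
  | { kind: 'status'; message: string }
  | { kind: 'event'; payload: { type: string } };

interface StreamHandlers {
  onStatus: (message: string) => void;
  // Optional: older callers simply omit it.
  onEvent?: (payload: { type: string }) => void;
}

// Routes one parsed message. Structured events are dropped silently
// when no onEvent handler was supplied.
function routeMessage(msg: WireMessage, handlers: StreamHandlers): void {
  if (msg.kind === 'status') {
    handlers.onStatus(msg.message);
  } else {
    handlers.onEvent?.(msg.payload);
  }
}
```

With this shape, a caller that passes only onStatus keeps receiving the free-text messages exactly as before, and a caller that adds onEvent receives both streams.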