Audio Serving on Cloudflare Containers (PHON-152) Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Date: 2026-06-15
Status: ✅ IMPLEMENTED + LIVE ON STAGING (2026-06-15). Merged to release/v6-audio via PR #127. Production cutover still gated on the user's "happy."
What actually shipped vs this plan (read before trusting the task bodies below):
- Weight delivery = baked into the image, NOT R2. R2 was dropped; Dockerfile.cf COPYs the artifacts (staged off the drive by build-and-push.sh). The plan's R2 mentions are obsolete.
- --analyze-only serving mode (new). The default entry loads three wav2vec2 stacks (off-the-shelf + feature + analyzer) = ~3.96 GiB, which OOMs standard-1. The prod host now loads ONLY the trajectory analyzer (__main__.py --analyze-only; build_app accepts an analyzer with an empty registry). ~2.16 GiB peak → fits standard-1. Transcription = the emitter's CTC head, not the off-the-shelf model.
- instance_type = standard-1 (4 GiB), confirmed by measurement (not standard-2).
- CPU-only torch pinned in Dockerfile.cf (the [inference] extra otherwise pulls the CUDA build: 6.1 GB → 3.5 GB). Added a repo-root .dockerignore (context was 145 GB).
- Deploy = deploy/deploy.sh <env>, which digest-pins wrangler.toml (image@sha256:... via cf-image: markers). This fixes the CF mutable-tag trap: same-tag redeploys don't re-resolve the digest, so a fixed image keeps 500ing on CF. The committed digest pin is the deploy lock; CI wrangler deploy rolls it.
- Deleted an orphaned CSP-era generationserver container (5 instances since 2026-05-13).
- PHON-152 filed under PHON-44, linked to 150/151.

See [[audio-serving-cpu-benchmark]] for the full deploy record + gotchas.
Ticket: PHON-152, parent PHON-44.
Goal: Serve the already-built v6 trajectory audio host (packages/audio/) on Cloudflare Containers — CPU, scale-to-zero — reachable only through the Worker via a Durable Object binding, with a locally-built-and-pushed image (weights baked in, no R2), wired into staging + production through the existing wrangler deploy flow.
Architecture: The audio FastAPI host (packages/audio/server.py / __main__.py) runs unchanged inside a Cloudflare Container fronted by a @cloudflare/containers Durable Object (AudioHost). The Worker's routes/audio.ts stops fetching a public AUDIO_INFERENCE_URL and instead calls env.AUDIO_SERVICE.getByName('default').fetch(...). An audioFetch() seam prefers the container binding (staging/prod) and falls back to AUDIO_INFERENCE_URL (local uvicorn dev + the existing hermetic tests), so the fast local loop and current tests are preserved. The ~1.26 GB keeper checkpoint is baked into the image (built locally where the external drive is, pushed to Cloudflare's registry by tag); CI references the tag and never builds the heavy image, the exact parallel to the d1-seed.sql "developer builds locally, CI consumes" paradigm.
Tech Stack: Cloudflare Workers + Containers (@cloudflare/containers@^0.3.3, already in package.json), Durable Objects, Hono, Docker, wrangler, vitest-pool-workers (cloudflare:test), FastAPI/uvicorn (existing host).
Why this shape (settled in conversation 2026-06-14/15):
- CPU benchmark on the keeper: warm /analyze 41–90 ms, cold model load ~1.9 s → no GPU needed. See [[audio-serving-cpu-benchmark]].
- Cloudflare Containers limits (verified 2026-06): up to 12 GiB / 4 vCPU, image size up to 20 GB (= instance disk). Measured peak RSS of the loaded analyzer (model + refs + attribution + worst-case long-clip inference) = ~3.0 GiB on CPU (macOS, 2026-06-15). So standard-1 (4 GiB / 0.5 vCPU / 8 GB disk) is plausible (~1 GiB headroom): start there and validate it starts cleanly on staging; fall back to standard-2 (6 GiB / 1 vCPU) only if it fails. The archived GenerationServer ran standard-2 because the (larger) CSP host spiked over 4 GiB; our footprint does not. Note standard-1's 0.5 vCPU makes warm inference ~1.5–3 s vs sub-second on standard-2, a latency/cost trade that is acceptable for Beta.
- The archived archive/csp-generation-v5.2 tag is a production-proven CF Containers blueprint we lift: src/containers/generation.ts (Container subclass), src/routes/generation.ts (binding proxy), and the wrangler.toml [[containers]] + [[durable_objects.bindings]] + [[migrations]] blocks.
- No R2. The old container baked artifacts into the image; R2 was only ever a rejected option in planning docs. Baking gives the fastest cold start and avoids R2-creds-from-container plumbing.
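The headroom reasoning in the limits bullet above can be written down as a small check. This is a minimal sketch, not part of wrangler or @cloudflare/containers: the memory limits are the two figures quoted in this plan, and the helper names and the 0.5 GiB safety margin used in the usage note are assumptions for illustration.

```typescript
// Memory limits in GiB, copied from the figures quoted in this plan.
// Hypothetical helper table; it is not read from any Cloudflare API.
const INSTANCE_MEMORY_GIB = {
  'standard-1': 4,
  'standard-2': 6,
} as const;

type InstanceType = keyof typeof INSTANCE_MEMORY_GIB;

/** Memory left on an instance type for a given measured peak RSS. */
function headroomGiB(instance: InstanceType, peakRssGiB: number): number {
  return INSTANCE_MEMORY_GIB[instance] - peakRssGiB;
}

/**
 * Smallest listed instance type whose limit covers the peak RSS plus a
 * safety margin, or null if neither listed type fits.
 */
function pickInstanceType(peakRssGiB: number, marginGiB: number): InstanceType | null {
  const ordered: InstanceType[] = ['standard-1', 'standard-2'];
  for (const instance of ordered) {
    if (headroomGiB(instance, peakRssGiB) >= marginGiB) return instance;
  }
  return null;
}
```

With an assumed 0.5 GiB margin, the ~3.0 GiB measurement selects standard-1, while the ~3.96 GiB three-stack footprint mentioned in the shipped notes would select standard-2. The staging validation in Task 7 remains the real test; this only makes the arithmetic explicit.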
Prerequisites (verify before Task 1)
- [ ] P1: Docker is installed and running locally (docker info succeeds). Required to build/push the image.
- [ ] P2: wrangler is authenticated for the target Cloudflare account (cd packages/web/workers && npx wrangler whoami shows the account; CLOUDFLARE_API_TOKEN is set for non-interactive use). Note the account ID: the registry path is registry.cloudflare.com/<ACCOUNT_ID>/....
- [ ] P3: Keeper artifacts exist on the external drive (gitignored, drive-only): /Volumes/ExternalData1/audio-union/model_feat_traj_target/state_serve.pt (~1.26 GB), .../vectors.csv, /Volumes/ExternalData1/audio-union/refs_fisher.json, /Volumes/ExternalData1/audio-union/attribution_model.json. Confirm with ls -la.
- [ ] P4: The PHON-150 serving harness is on the working branch (packages/audio/server.py, __main__.py, serving_config.py, deploy/Dockerfile). This plan builds on it.
- [ ] P5: Branch off the current release line: git checkout release/v6-audio && git pull && git checkout -b feature/phon-152-audio-cf-containers.
File Structure
| File | Responsibility | Action |
|---|---|---|
| packages/web/workers/src/lib/audioHostFetch.ts | The audioFetch(c, path, init) seam: container binding preferred, AUDIO_INFERENCE_URL fallback | Create (Task 1) |
| packages/web/workers/src/types.ts | Env: add AUDIO_SERVICE? binding; keep AUDIO_INFERENCE_URL? | Modify (Task 1) |
| packages/web/workers/src/routes/audio.ts | Replace all 6 host-fetch sites with audioFetch | Modify (Task 2) |
| packages/web/workers/src/containers/audioHost.ts | AudioHost extends Container (port 8000, sleepAfter) | Create (Task 3) |
| packages/web/workers/src/index.ts | Re-export AudioHost for the DO migration | Modify (Task 3) |
| packages/web/workers/wrangler.toml | [[containers]] + [[durable_objects.bindings]] + [[migrations]] for prod + staging | Modify (Task 4) |
| packages/web/workers/src/__tests__/audio.test.ts | Add binding-path tests (mock AUDIO_SERVICE); keep URL-path tests | Modify (Task 5) |
| packages/audio/deploy/Dockerfile.cf | CF image: bakes weights via COPY from a staged build context (no volume mount) | Create (Task 6) |
| packages/audio/deploy/build-and-push.sh | Stage drive artifacts → docker build → push to registry.cloudflare.com/<acct>/phonolex-audio:<tag> | Create (Task 6) |
| .github/workflows/deploy.yml, deploy-staging.yml | Trigger the deploy job on audio/worker changes; wrangler deploy references the pre-pushed tag | Modify (Task 8) |
Task 1: The audioFetch seam + Env binding
Files:
- Create: packages/web/workers/src/lib/audioHostFetch.ts
- Test: packages/web/workers/src/lib/__tests__/audioHostFetch.test.ts
- Modify: packages/web/workers/src/types.ts:9-13
Rationale: every host call in audio.ts currently resolves c.env.AUDIO_INFERENCE_URL then fetch(${base}${path}). We replace that with one seam that prefers the container binding and falls back to the URL. The fallback keeps the local uvicorn dev loop and the existing hermetic tests (which intercept http://127.0.0.1:8000) working unchanged.
- [ ] Step 1: Add the binding to Env. In packages/web/workers/src/types.ts, change the Env interface (currently lines 9-13):
export interface Env {
  DB: D1Database;
  // Local dev + tests: a plain HTTP host (uvicorn). Staging/prod use AUDIO_SERVICE.
  AUDIO_INFERENCE_URL?: string;
  // Staging/prod: the audio host runs in a Cloudflare Container reached via this
  // Durable Object binding. Absent in local `wrangler dev` (default env).
  AUDIO_SERVICE?: DurableObjectNamespace;
}
- [ ] Step 2: Write the failing test. Create packages/web/workers/src/lib/__tests__/audioHostFetch.test.ts:
import { describe, it, expect, vi } from 'vitest';
import { audioFetch } from '../audioHostFetch';
function ctx(env: Record<string, unknown>) {
  return { env } as unknown as Parameters<typeof audioFetch>[0];
}
describe('audioFetch', () => {
  it('prefers the container binding when AUDIO_SERVICE is present', async () => {
    const stubFetch = vi.fn(async () => new Response('ok', { status: 200 }));
    const getByName = vi.fn(() => ({ fetch: stubFetch }));
    const res = await audioFetch(ctx({ AUDIO_SERVICE: { getByName } }), '/analyze', { method: 'POST' });
    expect(getByName).toHaveBeenCalledWith('default');
    expect(stubFetch).toHaveBeenCalledWith('http://audio-host/analyze', { method: 'POST' });
    expect(res.status).toBe(200);
  });
  it('falls back to AUDIO_INFERENCE_URL when no binding', async () => {
    const spy = vi.spyOn(globalThis, 'fetch').mockResolvedValue(new Response('ok', { status: 200 }));
    await audioFetch(ctx({ AUDIO_INFERENCE_URL: 'http://127.0.0.1:8000/' }), '/analyze', { method: 'POST' });
    expect(spy).toHaveBeenCalledWith('http://127.0.0.1:8000/analyze', { method: 'POST' });
    spy.mockRestore();
  });
  // The seam throws a plain Error whose message is the code below (no custom error class).
  it('throws AUDIO_HOST_UNCONFIGURED when neither is configured', async () => {
    await expect(audioFetch(ctx({}), '/analyze', {})).rejects.toThrow('AUDIO_HOST_UNCONFIGURED');
  });
});
- [ ] Step 3: Run it to confirm it fails.
Run: cd packages/web/workers && npx vitest run src/lib/__tests__/audioHostFetch.test.ts
Expected: FAIL — Cannot find module '../audioHostFetch'.
- [ ] Step 4: Implement the seam. Create packages/web/workers/src/lib/audioHostFetch.ts:
/**
 * audioFetch: single seam for reaching the audio inference host.
 *
 * Staging/prod: the host runs in a Cloudflare Container; we reach it through the
 * AUDIO_SERVICE Durable Object binding (not publicly exposed). Local dev + tests:
 * a plain uvicorn host at AUDIO_INFERENCE_URL. The container hostname is arbitrary
 * (the binding ignores it); we use a stable 'http://audio-host'.
 */
import type { Context } from 'hono';
import type { Env } from '../types';
export async function audioFetch(
  c: Context<{ Bindings: Env }>,
  path: string,
  init: RequestInit,
): Promise<Response> {
  const svc = c.env.AUDIO_SERVICE;
  if (svc) {
    return svc.getByName('default').fetch(`http://audio-host${path}`, init);
  }
  const base = c.env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
  if (base) {
    return fetch(`${base}${path}`, init);
  }
  throw new Error('AUDIO_HOST_UNCONFIGURED');
}
Note: getByName is the @cloudflare/containers accessor used in the archived routes/generation.ts. If the installed @cloudflare/containers version exposes a different accessor (e.g. getContainer), adjust here only; every caller goes through this seam. Confirm against node_modules/@cloudflare/containers before implementing.
- [ ] Step 5: Run the test to confirm it passes.
Run: cd packages/web/workers && npx vitest run src/lib/__tests__/audioHostFetch.test.ts
Expected: PASS (3 tests).
- [ ] Step 6: Type check.
Run: cd packages/web/workers && npm run type-check
Expected: PASS.
- [ ] Step 7: Commit.
git add packages/web/workers/src/lib/audioHostFetch.ts packages/web/workers/src/lib/__tests__/audioHostFetch.test.ts packages/web/workers/src/types.ts
git commit -m "feat(audio): audioFetch seam — container binding preferred, AUDIO_INFERENCE_URL fallback (PHON-152)"
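The precedence the seam applies (binding first, URL second, error otherwise) can also be expressed as a pure function with no I/O, which makes the one case the tests above do not cover easy to check: both settings present at once. This is a sketch under assumptions: AudioEnvLike and resolveAudioTarget are hypothetical names, and the env type is a stand-in for the real Env in src/types.ts.

```typescript
// Minimal stand-in for the two Env fields the seam reads. Hypothetical type:
// the real binding is a DurableObjectNamespace, modelled here as any object.
interface AudioEnvLike {
  AUDIO_SERVICE?: object;
  AUDIO_INFERENCE_URL?: string;
}

type AudioTarget =
  | { kind: 'binding'; url: string }
  | { kind: 'url'; url: string };

/** Decide where a host call would go, without performing the call. */
function resolveAudioTarget(env: AudioEnvLike, path: string): AudioTarget {
  if (env.AUDIO_SERVICE) {
    // Placeholder hostname, same as in the seam; only the path matters.
    return { kind: 'binding', url: `http://audio-host${path}` };
  }
  // Strip a single trailing slash so base + path never yields '//'.
  const base = env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
  if (base) {
    return { kind: 'url', url: `${base}${path}` };
  }
  throw new Error('AUDIO_HOST_UNCONFIGURED');
}
```

When both AUDIO_SERVICE and AUDIO_INFERENCE_URL are set, the binding wins, so a leftover URL var in staging or prod would be ignored rather than used.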
Task 2: Route every host call through audioFetch
Files:
- Modify: packages/web/workers/src/routes/audio.ts (6 fetch sites: proxy L46-59, fetchTranscript L107-145, /feature-review L256-286, /acoustic L350-360, /analyze L414-431, /attribute L449-457; plus /pronounce L180-184, which per Step 4 only resolves base and passes it to fetchTranscript)
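This is a mechanical swap. Each site drops its own const base = c.env.AUDIO_INFERENCE_URL... guard and calls audioFetch(c, path, init). The existing try/catch → { warming: true } and status handling stay byte-for-byte. One side effect follows from removing the guard: at the sites shown in Steps 2 and 3, audioFetch is awaited inside the existing try, so a host that is configured neither way no longer returns 500 "Audio inference host not configured"; the thrown AUDIO_HOST_UNCONFIGURED is caught and reported as the 503 warming response.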
- [ ] Step 1: Import the seam. At the top of routes/audio.ts, add:
import { audioFetch } from '../lib/audioHostFetch';
- [ ] Step 2: proxy() helper (currently L46-62). Replace:
const base = c.env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
if (!base) {
return c.json({ detail: 'Audio inference host not configured' }, 500);
}
// Re-pack into a fresh multipart body to forward.
const fwd = new FormData();
fwd.append('audio', file, file.name || 'clip');
const language = form.get('language');
if (typeof language === 'string') fwd.append('language', language);
let upstream: Response;
try {
upstream = await fetch(`${base}${path}`, { method: 'POST', body: fwd });
} catch {
return c.json({ warming: true, detail: 'Inference host is warming up. Retry shortly.' }, 503);
}
with:
// Re-pack into a fresh multipart body to forward.
const fwd = new FormData();
fwd.append('audio', file, file.name || 'clip');
const language = form.get('language');
if (typeof language === 'string') fwd.append('language', language);
let upstream: Response;
try {
upstream = await audioFetch(c, path, { method: 'POST', body: fwd });
} catch {
return c.json({ warming: true, detail: 'Inference host is warming up. Retry shortly.' }, 503);
}
- [ ] Step 3: fetchTranscript() (currently L107-145). Change the signature from base: string to the Hono context and swap the call. Replace the signature line:
async function fetchTranscript(
base: string,
file: File,
with:
async function fetchTranscript(
c: Context<{ Bindings: Env }>,
file: File,
and replace its fetch:
try {
upstream = await fetch(`${base}${path}`, { method: 'POST', body: fwd });
} catch {
return { warming: true };
}
with:
try {
upstream = await audioFetch(c, path, { method: 'POST', body: fwd });
} catch {
return { warming: true };
}
Add the imports if not already present at the top of the file: import type { Context } from 'hono'; and import type { Env } from '../types'; (the file already imports Hono; confirm Context/Env are imported).
- [ ] Step 4: /pronounce (currently L180-184). Replace:
const base = c.env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
if (!base) return c.json({ detail: 'Audio inference host not configured' }, 500);
// 2. Transcribe (produced phonemes) — BEFORE D1 so the warming path doesn't need a seeded DB
const transcript = await fetchTranscript(base, file, transcriber, language);
with:
// 2. Transcribe (produced phonemes) — BEFORE D1 so the warming path doesn't need a seeded DB
const transcript = await fetchTranscript(c, file, transcriber, language);
- [ ] Step 5: /feature-review (L256-286), /acoustic (L350-360), /analyze (L414-431), /attribute (L449-457). In each, delete the two lines:
const base = c.env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
if (!base) return c.json({ detail: 'Audio inference host not configured' }, 500);
and change the corresponding await fetch(${base}/feature-review, …) / /acoustic / /analyze / /attribute to await audioFetch(c, '/feature-review', …) etc. The init object (method, body, headers) is unchanged. For /attribute the init keeps headers: { 'content-type': 'application/json' } and the JSON body.
- [ ] Step 6: Type check.
Run: cd packages/web/workers && npm run type-check
Expected: PASS (no remaining references to base in audio.ts; grep to confirm: grep -nE 'AUDIO_INFERENCE_URL|\$\{base\}' src/routes/audio.ts returns nothing).
- [ ] Step 7: Run the existing audio tests (URL-fallback path still active in test env).
Run: cd packages/web/workers && npx vitest run src/__tests__/audio.test.ts
Expected: PASS — the test miniflare binds AUDIO_INFERENCE_URL (no AUDIO_SERVICE), so audioFetch takes the fallback and fetchMock interceptors still match.
- [ ] Step 8: Commit.
git add packages/web/workers/src/routes/audio.ts
git commit -m "refactor(audio): route all host calls through audioFetch (binding-ready) (PHON-152)"
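The error handling that Task 2 leaves in place at each call site follows one pattern: any thrown error becomes the warming response, and any response that does arrive keeps its own status. A minimal sketch of that pattern, with hypothetical names (forwardOrWarming, HostCall) and deliberately small types so it runs without Hono or Workers typings:

```typescript
// Hypothetical minimal shapes; the real code works with Response and c.json().
interface UpstreamLike { status: number }
type HostCall = (path: string) => Promise<UpstreamLike>;

type ForwardResult =
  | { status: 503; warming: true }
  | { status: number; warming: false };

/**
 * Forward one call to the host. A rejected call (network failure, cold
 * container, or an error thrown by the seam itself) is reported as the
 * warming state; a resolved call passes its status through unchanged.
 */
async function forwardOrWarming(call: HostCall, path: string): Promise<ForwardResult> {
  let upstream: UpstreamLike;
  try {
    upstream = await call(path);
  } catch {
    return { status: 503, warming: true };
  }
  return { status: upstream.status, warming: false };
}
```

Because the catch is unconditional, it cannot tell a cold container from a misconfigured environment. If that distinction matters, one option (an assumption, not something this plan specifies) is to inspect the error message for AUDIO_HOST_UNCONFIGURED before mapping to warming.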
Task 3: The AudioHost Container Durable Object
Files:
- Create: packages/web/workers/src/containers/audioHost.ts
- Modify: packages/web/workers/src/index.ts:118 (add the re-export near the existing export default app)
- [ ] Step 1: Create the Container subclass. This mirrors the archived src/containers/generation.ts. Create packages/web/workers/src/containers/audioHost.ts:
/**
 * AudioHost: Cloudflare Container hosting the v6 trajectory audio inference host.
 *
 * The container runs the FastAPI host from packages/audio (phonolex_audio.__main__,
 * uvicorn on :8000). Worker routes call env.AUDIO_SERVICE.getByName('default').fetch(req)
 * via the audioFetch seam; Cloudflare manages the lifecycle (scale-to-zero, sticky
 * routing by DO name). Not publicly exposed; reachable only through the Worker.
 */
import { Container } from '@cloudflare/containers';
export class AudioHost extends Container {
  defaultPort = 8000;
  // Scale-to-zero: sleep the container after 5 min idle. Cold start = container
  // boot + ~2s model load (see audio-serving-cpu-benchmark). The Worker's
  // { warming: true } path renders the warm-up state on the 503/network failure.
  sleepAfter = '5m';
}
Verify sleepAfter is the current @cloudflare/containers field name/format for the installed version (node_modules/@cloudflare/containers); some versions use sleepAfter as a string duration, others a method. Adjust to the installed API. The defaultPort = 8000 matches the host's EXPOSE 8000 and the PHONOLEX_AUDIO_PORT=8000 default.
- [ ] Step 2: Re-export the class from the Worker entry (DO classes must be exported from the entry module for migrations). In packages/web/workers/src/index.ts, immediately above export default app; (L118), add:
export { AudioHost } from './containers/audioHost';
- [ ] Step 3: Type check.
Run: cd packages/web/workers && npm run type-check
Expected: PASS. (@cloudflare/containers@^0.3.3 is already a dependency — confirm with grep containers package.json.)
- [ ] Step 4: Commit.
git add packages/web/workers/src/containers/audioHost.ts packages/web/workers/src/index.ts
git commit -m "feat(audio): AudioHost Container DO (port 8000, scale-to-zero) (PHON-152)"
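To reason about when a caller should expect the warm-up path, the sleepAfter = '5m' setting can be turned into a simple timing check. This is an illustration only: parseDurationMs and expectsColdStart are hypothetical helpers, the parser is not the one used by @cloudflare/containers, and the model assumes the container sleeps exactly when the idle gap reaches sleepAfter.

```typescript
/** Parse a duration such as '30s', '5m' or '1h' into milliseconds. */
function parseDurationMs(text: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(text);
  if (!match) throw new Error(`unsupported duration: ${text}`);
  const unitMs = { s: 1000, m: 60000, h: 3600000 }[match[2] as 's' | 'm' | 'h'];
  return Number(match[1]) * unitMs;
}

/**
 * Whether a request arriving idleMs after the previous one should expect a
 * cold start, under the simplifying assumption stated above.
 */
function expectsColdStart(idleMs: number, sleepAfter: string): boolean {
  return idleMs >= parseDurationMs(sleepAfter);
}
```

Under this model a request six minutes after the last one hits the warm-up path, and one a minute later does not, which is the behaviour the scale-to-zero check in Task 7 looks for.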
Task 4: wrangler.toml: container + binding + migration (both envs)
Files:
- Modify: packages/web/workers/wrangler.toml
The current file has a v1/v2 GenerationServer migration history (kept for Cloudflare's bookkeeping). We append a new migration tag — never edit existing tags — and add the container + DO binding to production (top-level) and env.staging. image points at a registry tag (built+pushed in Task 6), NOT a Dockerfile path, so CI's wrangler deploy references the pre-built image instead of building the heavy torch image.
- [ ] Step 1: Production container block. After the existing top-level [[d1_databases]] block (before the # GenerationServer ... comment), add:
# Audio inference host (v6 trajectory model) — Cloudflare Container, CPU,
# scale-to-zero. Image is built+pushed locally by deploy/build-and-push.sh
# (weights baked in; ~5 GB). standard-1 (4 GiB): measured peak RSS ~3.0 GiB
# leaves ~1 GiB headroom (unlike the larger CSP host that needed standard-2).
# Validate it starts cleanly on staging; bump to standard-2 only if it fails.
[[containers]]
class_name = "AudioHost"
image = "registry.cloudflare.com/${CLOUDFLARE_ACCOUNT_ID}/phonolex-audio:latest"
max_instances = 3
instance_type = "standard-1"
[[durable_objects.bindings]]
name = "AUDIO_SERVICE"
class_name = "AudioHost"
${CLOUDFLARE_ACCOUNT_ID}: if wrangler.toml does not interpolate env vars in your wrangler version, hardcode the account ID here (it is not secret) or use the :latest tag with the account inferred. Confirm interpolation support; otherwise replace with the literal registry.cloudflare.com/<ACCOUNT_ID>/phonolex-audio:latest.
- [ ] Step 2: Append the migration (do not touch the v1/v2 entries). After the existing top-level [[migrations]] tag = "v2" block, add:
[[migrations]]
tag = "v3"
new_sqlite_classes = ["AudioHost"]
- [ ] Step 3: Staging container block. Under [env.staging], after [[env.staging.d1_databases]], add:
[[env.staging.containers]]
class_name = "AudioHost"
image = "registry.cloudflare.com/${CLOUDFLARE_ACCOUNT_ID}/phonolex-audio:staging"
max_instances = 2
instance_type = "standard-1"
[[env.staging.durable_objects.bindings]]
name = "AUDIO_SERVICE"
class_name = "AudioHost"
- [ ] Step 4: Append the staging migration. After the existing [[env.staging.migrations]] tag = "v2" block, add:
[[env.staging.migrations]]
tag = "v3"
new_sqlite_classes = ["AudioHost"]
- [ ] Step 5: Local default env keeps AUDIO_INFERENCE_URL. Leave the top-level [vars] AUDIO_INFERENCE_URL = "http://127.0.0.1:8000" as-is: the default env (local wrangler dev) has no AUDIO_SERVICE, so audioFetch falls back to the uvicorn host. Do NOT add a container to the default env (avoids requiring Docker for the inner dev loop).
- [ ] Step 6: Validate config parses.
Run: cd packages/web/workers && npx wrangler deploy --dry-run --outdir /tmp/wrangler-dryrun
Expected: dry-run succeeds and reports the AudioHost container + AUDIO_SERVICE DO binding. (Dry-run does not build the image or deploy.)
Run staging too: npx wrangler deploy --env staging --dry-run --outdir /tmp/wrangler-dryrun-staging
Expected: same, with the staging image tag.
- [ ] Step 7: Commit.
git add packages/web/workers/wrangler.toml
git commit -m "config(audio): AudioHost container + AUDIO_SERVICE binding + v3 migration (prod+staging) (PHON-152)"
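The rule this task relies on, append a new migration tag and never edit existing ones, can be checked mechanically before committing. The sketch below is a hypothetical pre-commit style check, not a wrangler feature: the Migration shape and checkAppendOnly are illustrative names, and the caller is assumed to have already parsed the old and new [[migrations]] lists out of wrangler.toml.

```typescript
// Hypothetical in-memory shape of one [[migrations]] entry.
interface Migration { tag: string; newSqliteClasses?: string[] }

/** Returns a list of problems; an empty list means next only appends to prev. */
function checkAppendOnly(prev: Migration[], next: Migration[]): string[] {
  const problems: string[] = [];
  if (next.length < prev.length) problems.push('existing migrations were removed');
  prev.forEach((old, i) => {
    const current = next[i];
    if (current && current.tag !== old.tag) {
      problems.push(`tag ${old.tag} was edited to ${current.tag}`);
    }
  });
  // A tag that appears twice anywhere in the new list is also an error.
  const tags = next.map((m) => m.tag);
  if (tags.some((tag, i) => tags.indexOf(tag) !== i)) problems.push('duplicate migration tag');
  return problems;
}
```

The check compares tags only; it does not compare the class lists inside existing entries, so an edit to an old entry's classes would need an additional comparison.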
Task 5: Tests for the binding path
Files:
- Modify: packages/web/workers/src/__tests__/audio.test.ts
The existing tests cover the URL-fallback path (test env has AUDIO_INFERENCE_URL, no binding). Add a focused unit test for the binding path by calling the audio Hono app directly with a mock AUDIO_SERVICE env, mirroring the archived generation.test.ts style (env.GENERATION_SERVICE: { getByName: vi.fn(() => stub) }).
- [ ] Step 1: Add a binding-path describe block to audio.test.ts. Append:
import audioApp from '../routes/audio';
describe('audio route: AUDIO_SERVICE container binding', () => {
  it('/analyze forwards multipart to the container binding when AUDIO_SERVICE is set', async () => {
    const stubFetch = vi.fn(async () =>
      new Response(JSON.stringify({ positions: [], attribution: null }), {
        status: 200, headers: { 'content-type': 'application/json' },
      }),
    );
    const getByName = vi.fn(() => ({ fetch: stubFetch }));
    // Seeded test D1 is provided by cloudflare:test env; reuse it for the canonical lookup.
    const { env } = await import('cloudflare:test');
    const mockEnv = { ...env, AUDIO_SERVICE: { getByName }, AUDIO_INFERENCE_URL: undefined };
    const fd = new FormData();
    fd.append('audio', new File([new Uint8Array([1, 2, 3])], 'clip.wav', { type: 'audio/wav' }));
    fd.append('target', 'cat');
    const res = await audioApp.fetch(
      new Request('http://localhost/analyze', { method: 'POST', body: fd }),
      mockEnv as unknown as typeof env,
    );
    // Only assert the binding was used; D1 seeding state may make this 200 or 404.
    expect(getByName).toHaveBeenCalledWith('default');
    expect(stubFetch).toHaveBeenCalled();
    expect(stubFetch.mock.calls[0][0]).toBe('http://audio-host/analyze');
  });
});
Note on vi/imports: audio.test.ts already imports from vitest and cloudflare:test. Add vi to the vitest import if not present. If audioApp.fetch(req, env) requires a third ExecutionContext arg in this Hono/Workers version, pass createExecutionContext() from cloudflare:test as the third argument.
- [ ] Step 2: Run the audio tests.
Run: cd packages/web/workers && npx vitest run src/__tests__/audio.test.ts
Expected: PASS (existing URL-path tests + the new binding test).
- [ ] Step 3: Run the full worker test suite (catch ripple).
Run: cd packages/web/workers && npm test
Expected: PASS.
- [ ] Step 4: Commit.
git add packages/web/workers/src/__tests__/audio.test.ts
git commit -m "test(audio): cover the AUDIO_SERVICE container-binding path (PHON-152)"
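If the vi.fn stubs become repetitive across binding-path tests, the same recording behaviour can be packaged as a small typed fake. This is a sketch with hypothetical names and shapes (makeFakeAudioService, FakeStub, FakeResponse); the real binding is a DurableObjectNamespace and the real reply is a Response.

```typescript
// Hypothetical minimal shapes standing in for Response and the DO stub.
interface FakeResponse { status: number; body: string }
interface FakeStub {
  fetch(url: string, init?: { method?: string }): Promise<FakeResponse>;
}

/** Build a fake AUDIO_SERVICE that records which names and URLs were used. */
function makeFakeAudioService(reply: FakeResponse) {
  const names: string[] = [];
  const urls: string[] = [];
  const namespace = {
    getByName(name: string): FakeStub {
      names.push(name);
      return {
        fetch(url: string): Promise<FakeResponse> {
          urls.push(url); // recorded synchronously, before the reply resolves
          return Promise.resolve(reply);
        },
      };
    },
  };
  return { namespace, names, urls };
}
```

A test would pass namespace in place of AUDIO_SERVICE and then assert on names and urls, for example that names equals ['default'] and the first URL is 'http://audio-host/analyze'.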
Task 6: CF image: bake weights, build, push by tag (local)
Files:
- Create: packages/audio/deploy/Dockerfile.cf
- Create: packages/audio/deploy/build-and-push.sh
The existing deploy/Dockerfile mounts weights from a RunPod network volume. CF Containers have no volume mounts, so the CF image bakes the four artifacts in via COPY from a staged build context (the build script stages them off the drive). This runs locally (where the drive is); the resulting image is pushed to Cloudflare's registry by tag and referenced from wrangler.toml (Task 4). CI never builds it.
- [ ] Step 1: Create packages/audio/deploy/Dockerfile.cf. It reuses the existing Dockerfile's offline-HF-prewarm pattern but COPYs the artifacts into image-local paths and sets PHONOLEX_AUDIO_* to those paths:
# syntax=docker/dockerfile:1.7
# PhonoLex audio inference host — Cloudflare Containers image.
# Weights are BAKED IN (no volume mounts on CF). Build locally via
# deploy/build-and-push.sh (stages the gitignored artifacts off the drive).
FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
libsndfile1 ffmpeg build-essential \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir uv==0.5.11
WORKDIR /app
COPY pyproject.toml uv.lock /app/
COPY packages/audio/pyproject.toml /app/packages/audio/pyproject.toml
COPY packages/data/pyproject.toml /app/packages/data/pyproject.toml
COPY packages/audio /app/packages/audio
COPY packages/data /app/packages/data
RUN uv pip install --system -e packages/data
RUN uv pip install --system -e "packages/audio[inference]"
# Offline HF backbone (~315 MB), baked so startup is fully offline.
ENV HF_HOME=/opt/hf
RUN python -c "from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor; \
Wav2Vec2Model.from_pretrained('facebook/wav2vec2-lv-60-espeak-cv-ft'); \
Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-lv-60-espeak-cv-ft')"
# Baked model artifacts (staged into ./_artifacts by build-and-push.sh).
COPY _artifacts/state_serve.pt /app/artifacts/state_serve.pt
COPY _artifacts/vectors.csv /app/artifacts/vectors.csv
COPY _artifacts/refs_fisher.json /app/artifacts/refs_fisher.json
COPY _artifacts/attribution_model.json /app/artifacts/attribution_model.json
ENV PHONOLEX_AUDIO_DEVICE=cpu
ENV PHONOLEX_AUDIO_CHECKPOINT=/app/artifacts/state_serve.pt
ENV PHONOLEX_AUDIO_VECTORS=/app/artifacts/vectors.csv
ENV PHONOLEX_AUDIO_TRAJ_REFS=/app/artifacts/refs_fisher.json
ENV PHONOLEX_AUDIO_ATTRIBUTION=/app/artifacts/attribution_model.json
ENV PHONOLEX_AUDIO_HOST=0.0.0.0
ENV PHONOLEX_AUDIO_PORT=8000
ENV PYTHONUNBUFFERED=1
EXPOSE 8000
CMD ["python", "-m", "phonolex_audio", \
"--feature-checkpoint", "/app/artifacts/state_serve.pt", \
"--feature-vectors", "/app/artifacts/vectors.csv", \
"--trajectory-refs", "/app/artifacts/refs_fisher.json", \
"--attribution-model", "/app/artifacts/attribution_model.json", \
"--host", "0.0.0.0", "--port", "8000"]
Device is cpu (benchmark proved CPU sufficient; CF has no GPU). Build context is the repo root (so packages/ resolves); _artifacts/ is created under the repo root by the build script and is gitignored.
- [ ] Step 2: Create packages/audio/deploy/build-and-push.sh. Stages artifacts off the drive, builds, pushes:
#!/usr/bin/env bash
# Build + push the CF audio image (weights baked). Run LOCALLY (needs the drive + Docker).
# Usage: ACCOUNT_ID=<id> TAG=latest|staging bash packages/audio/deploy/build-and-push.sh
set -euo pipefail
DRIVE="${DRIVE:-/Volumes/ExternalData1/audio-union}"
ACCOUNT_ID="${ACCOUNT_ID:?set ACCOUNT_ID to your Cloudflare account id}"
TAG="${TAG:-latest}"
IMAGE="registry.cloudflare.com/${ACCOUNT_ID}/phonolex-audio:${TAG}"
REPO_ROOT="$(git rev-parse --show-toplevel)"
STAGE="${REPO_ROOT}/_artifacts"
echo "Staging artifacts from ${DRIVE} -> ${STAGE}"
mkdir -p "${STAGE}"
cp "${DRIVE}/model_feat_traj_target/state_serve.pt" "${STAGE}/"
cp "${DRIVE}/model_feat_traj_target/vectors.csv" "${STAGE}/"
cp "${DRIVE}/refs_fisher.json" "${STAGE}/"
cp "${DRIVE}/attribution_model.json" "${STAGE}/"
echo "Building ${IMAGE}"
docker build --platform linux/amd64 \
-f "${REPO_ROOT}/packages/audio/deploy/Dockerfile.cf" \
-t "${IMAGE}" "${REPO_ROOT}"
echo "Pushing ${IMAGE}"
# Auth to Cloudflare's registry via wrangler, then docker push. Confirm the current
# wrangler containers push/login command for your version; as of writing:
# npx wrangler containers push "${IMAGE}" # builds+pushes, OR
# docker push "${IMAGE}" # after `wrangler login` registry auth
( cd "${REPO_ROOT}/packages/web/workers" && npx wrangler containers push "${IMAGE}" ) \
|| docker push "${IMAGE}"
echo "Cleaning staged artifacts"
rm -rf "${STAGE}"
echo "Done: ${IMAGE}"
Verify the registry-push command for the installed wrangler. wrangler containers push vs docker push after a registry login has changed across wrangler versions. The plan's fallback (docker push after wrangler registry auth) covers the common case. --platform linux/amd64 is required if building on Apple Silicon (CF runs amd64).
- [ ] Step 3: Add _artifacts/ to gitignore. Append _artifacts/ to the repo-root .gitignore (never commit the staged 1.26 GB weights).
echo "_artifacts/" >> .gitignore
- [ ] Step 4: chmod +x and commit the scripts (NOT the artifacts/image).
chmod +x packages/audio/deploy/build-and-push.sh
git add packages/audio/deploy/Dockerfile.cf packages/audio/deploy/build-and-push.sh .gitignore
git commit -m "build(audio): CF Containers image (weights baked) + local build-and-push script (PHON-152)"
- [ ] Step 5: Build locally and smoke-test the image before any deploy.
ACCOUNT_ID=<your-account-id> TAG=staging bash packages/audio/deploy/build-and-push.sh
# After build (before/independent of push), run it locally to confirm it serves:
docker run --rm -p 8099:8000 registry.cloudflare.com/<acct>/phonolex-audio:staging &
sleep 20
curl -s localhost:8099/health # expect {"status":"ok", ..., "analyze": true}
Expected: /health returns analyze: true and the keeper models load (~2 s after boot).
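The image reference appears in three places that must agree: the build script, the production [[containers]] block (:latest) and the staging block (:staging). A small sketch of building that reference in one place and rejecting an account id that was never filled in. imageRefFor, DeployEnv and the placeholder check are assumptions for illustration, not part of wrangler.

```typescript
type DeployEnv = 'production' | 'staging';

// Tag per environment, as used in the wrangler.toml blocks of this plan.
const IMAGE_TAG: Record<DeployEnv, string> = { production: 'latest', staging: 'staging' };

/**
 * Build the registry reference for an environment. Rejects an empty account
 * id or one that still looks like a template ('<ACCOUNT_ID>' or an
 * uninterpolated '${...}').
 */
function imageRefFor(env: DeployEnv, accountId: string): string {
  if (accountId === '' || /[<>${}]/.test(accountId)) {
    throw new Error(`account id looks like an unfilled placeholder: ${accountId}`);
  }
  return `registry.cloudflare.com/${accountId}/phonolex-audio:${IMAGE_TAG[env]}`;
}
```

The placeholder check targets the failure mode flagged in Task 4: a literal ${CLOUDFLARE_ACCOUNT_ID} ending up in the image path because the variable was never interpolated.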
Task 7: First deploy to STAGING (manual, gated)
Files: none (operational). This is the first time anything goes live; it is staging only and still subject to the "happy" gate for production.
- [ ] Step 1: Confirm the staging image tag is pushed (Task 6 with TAG=staging).
- [ ] Step 2: Deploy the Worker + container to staging:
cd packages/web/workers && npx wrangler deploy --env staging
Expected: applies migration v3 (creates AudioHost), provisions the container from the pushed :staging image, deploys the Worker.
- [ ] Step 3: Smoke-test through the staging Worker (the container is NOT directly reachable — only via the Worker):
curl -s -X POST https://staging-api.phonolex.com/api/audio/analyze \
-F audio=@/path/to/a/test.wav -F target=cat
Expected: { "warming": true } (503) during cold start, then a { positions: [...], attribution: {...} } payload on retry within a few seconds.
- [ ] Step 4: Confirm standard-1 started cleanly. Check the container did not OOM/fail on boot (npx wrangler containers list / the Cloudflare dashboard container logs). Measured local peak was ~3.0 GiB vs the 4 GiB limit; if the linux/amd64 build runs hotter and the container fails to start or restarts under load, bump both wrangler.toml blocks to instance_type = "standard-2", re-deploy, and note it. Do not pre-emptively bump; confirm empirically.
- [ ] Step 5: Verify scale-to-zero: leave it idle >5 min, re-call, confirm a single warm-up then service. Note cold-start wall-time for the user.
- [ ] Step 6: Report staging results to the user (including the standard-1 start verdict + cold-start time). Do not proceed to production until the user says "happy."
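The smoke test in Step 3 expects a 503 warming response first and a real payload on retry. That retry loop can be scripted so the attempt count is recorded along with the result. A minimal sketch under assumptions: retryWhileWarming is a hypothetical helper, the attempt and wait functions are injected by the caller (for example a fetch against the staging URL and a timer-based sleep), and only the status code is inspected.

```typescript
// Hypothetical minimal reply shape; a real caller would wrap Response.
interface AnalyzeReply { status: number }

/**
 * Call attempt() until it returns a non-503 status or maxAttempts is reached.
 * wait() is injected so the loop can be exercised without real timers.
 */
async function retryWhileWarming(
  attempt: () => Promise<AnalyzeReply>,
  maxAttempts: number,
  wait: () => Promise<void>,
): Promise<{ reply: AnalyzeReply; attempts: number }> {
  let reply = await attempt();
  let attempts = 1;
  while (reply.status === 503 && attempts < maxAttempts) {
    await wait();
    reply = await attempt();
    attempts += 1;
  }
  return { reply, attempts };
}
```

The returned attempts value, multiplied by the wait interval, gives a rough cold-start figure to include in the staging report; it is coarser than a wall-clock measurement.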
Task 8: Fold into CI (deploy on audio/worker changes)
Files:
- Modify: .github/workflows/deploy.yml, .github/workflows/deploy-staging.yml
Today deploy.yml runs on push to main and seeds D1 only when d1-seed.sql changed (paths-filter), then always runs npx wrangler deploy. Because the image is pre-built+pushed by tag and wrangler.toml references the tag, wrangler deploy wires the container without building it in CI. The only change needed: ensure the deploy job triggers on audio/worker/wrangler changes too, and document that the image must be pushed (manually, locally) before a deploy that bumps it.
- [ ] Step 1: Read the current deploy job trigger. Confirm whether deploy.yml's deploy job is gated by the paths-filter or runs on every push to main. (From inspection it runs on push to main; the paths-filter only gates the D1 seed step.) If the deploy job already runs on every push to main/develop, no trigger change is needed: wrangler deploy will pick up the new wrangler.toml container config on the next deploy. In that case, skip to Step 3.
- [ ] Step 2 (only if the deploy job is paths-filtered): add packages/web/workers/** and packages/audio/deploy/** to the filter so worker/container config changes trigger a deploy. Mirror the existing dorny/paths-filter@v3 block.
- [ ] Step 3: Add a guard note + no image build in CI. Add a comment above the Deploy Workers step in both workflows:
# Audio container: the image (registry.cloudflare.com/<acct>/phonolex-audio:<tag>)
# is built + pushed LOCALLY by packages/audio/deploy/build-and-push.sh — NOT here.
# `wrangler deploy` references the pushed tag (parallels the d1-seed "dev builds,
# CI consumes" paradigm). A model change requires a fresh local push BEFORE merge.
- [ ] Step 4: Confirm `CLOUDFLARE_ACCOUNT_ID`/token available to `wrangler deploy`. The container `image` registry path needs the account id; verify the workflow env exposes `CLOUDFLARE_ACCOUNT_ID` (add to the job `env:` if `wrangler.toml` interpolates it). The deploy already authenticates wrangler via `CLOUDFLARE_API_TOKEN`; confirm that token has Containers + Workers + DO permissions.
- [ ] Step 5: Commit.
git add .github/workflows/deploy.yml .github/workflows/deploy-staging.yml
git commit -m "ci(audio): deploy AudioHost container via wrangler (pre-pushed image, no CI build) (PHON-152)"
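To sanity-check the Step 2 filter before relying on it, the two globs can be tried against a list of changed paths. The sketch below is only an approximation under stated assumptions: it uses Python's `fnmatch` rather than the matcher inside `dorny/paths-filter`, and `triggers_deploy` and `DEPLOY_GLOBS` are hypothetical names, not part of the workflow.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Sequence

# Globs from Task 8 Step 2. In fnmatch, "*" also matches "/", so "**"
# behaves here like "anything below this directory". This approximates,
# but is not, the glob matcher used by dorny/paths-filter.
DEPLOY_GLOBS = ("packages/web/workers/**", "packages/audio/deploy/**")


def triggers_deploy(
    changed_files: Iterable[str], globs: Sequence[str] = DEPLOY_GLOBS
) -> bool:
    """Return True if any changed path matches one of the deploy globs."""
    return any(
        fnmatchcase(path, glob) for path in changed_files for glob in globs
    )
```

Note that a change under `packages/audio/` outside `deploy/` does not match; that is consistent with the guard comment in Step 3, since a model change reaches CI only through a fresh local image push.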
Task 9: Production cutover (BLOCKED on "happy")
Files: none (operational).
- [ ] Step 1: User has confirmed staging is good and signals "happy."
- [ ] Step 2: Push the production image tag: `ACCOUNT_ID=<id> TAG=latest bash packages/audio/deploy/build-and-push.sh`.
- [ ] Step 3: Merge `feature/phon-152-audio-cf-containers` → `release/v6-audio` → the production line per the project's git-flow. CI runs `wrangler deploy` (prod), applies migration `v3`, provisions the prod container from `:latest`.
- [ ] Step 4: Smoke-test `https://api.phonolex.com/api/audio/analyze` (same as Task 7 Step 3). Verify the frontend Speech Analysis (Beta) tab end-to-end.
- [ ] Step 5: Remove the now-unused `AUDIO_INFERENCE_URL` from prod/staging `[vars]` if present (keep it only in the local default env). Optional cleanup.
- [ ] Step 6: Update memory ([[audio-serving-cpu-benchmark]]) + the `deploy/README.md` to record CF Containers as the live hosting (supersede the RunPod-framed README), and note the cold-start wall-time observed.
Self-Review Notes (gaps the implementer must close, not skip)
- `@cloudflare/containers` API drift. Three call sites depend on the installed version (`^0.3.3`): `getByName('default')` (Task 1), `defaultPort`/`sleepAfter` (Task 3). Read `node_modules/@cloudflare/containers` before implementing and adjust the seam + class to the actual API. Every binding call goes through `audioFetch`, so a rename is a one-line fix.
- Registry push command. `wrangler containers push` vs `docker push` after registry auth has changed across wrangler versions (Task 6 Step 2). Confirm the current command for the installed wrangler; the script has a `||` fallback but verify before relying on it.
- `wrangler.toml` env-var interpolation. Task 4 uses `${CLOUDFLARE_ACCOUNT_ID}` in the `image` field. If the installed wrangler does not interpolate, hardcode the (non-secret) account id. Verify with the Task 4 Step 6 dry-run.
- Instance size: start at `standard-1`, validate, fall back. Measured peak RSS ~3.0 GiB (CPU, macOS, 2026-06-15) → `standard-1` (4 GiB) has ~1 GiB headroom and is the starting pick. The old CSP host needed `standard-2` because it spiked over 4 GiB; ours does not. Validate `standard-1` starts cleanly on staging (Task 7 Step 4); bump to `standard-2` only on an empirical start/OOM failure. The `linux/amd64` torch build may differ from the macOS measurement, so measure on staging rather than assuming.
- Cold start wall-time is the UX number to report. The benchmark showed ~2 s model load, but CF container boot (image pull + start) adds to it. Measure the real first-call latency on staging (Task 7 Step 4) and tell the user; it drives whether the `{ warming: true }` copy needs tuning (see [[feedback_user_facing_copy]]).
- No image in git / no weights in git. `_artifacts/` is gitignored (Task 6 Step 3); the 1.26 GB `state_serve.pt` is never committed (drive-only, per the standing constraint). The image lives only in Cloudflare's registry.
- Dev loop unchanged. Local `wrangler dev` (default env, no `AUDIO_SERVICE`) still falls back to `AUDIO_INFERENCE_URL` → the directly-run `uvicorn` host. No Docker needed for the inner loop. Confirm the local audio host launch in `.env.development` still works after these changes.
- Auth gap is closed structurally, not by a token. CF Containers are not publicly exposed (binding-only), so the previously-missing bearer header on the Worker proxy is moot. Do not add a token scheme; verify the container has no public route.
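The instance-size note reduces to one subtraction and one rule: compute headroom from the measured peak, and change the instance type only after an observed failure. A minimal sketch of that policy follows; both function names are hypothetical, and only the 4 GiB limit and ~3.0 GiB peak come from the measurements above.

```python
def headroom_gib(limit_gib: float, peak_gib: float) -> float:
    """Memory left between the measured peak and the instance limit."""
    return limit_gib - peak_gib


def choose_instance_type(
    current: str, started_cleanly: bool, oom_seen: bool
) -> str:
    """Bump standard-1 to standard-2 only on an observed failure.

    Headroom alone never triggers a bump; the plan asks for empirical
    evidence (a failed start or an OOM) from staging first.
    """
    if current == "standard-1" and (not started_cleanly or oom_seen):
        return "standard-2"
    return current


# Figures from the note: standard-1 limit 4 GiB, measured peak ~3.0 GiB.
print(headroom_gib(4.0, 3.0))
print(choose_instance_type("standard-1", started_cleanly=True, oom_seen=False))
```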
Execution note
This plan is dev-only through Task 6 (build + local smoke). Task 7 is staging only; Task 9 (production) is blocked on the user signaling "happy" per the standing v6 gate. File PHON-152 under PHON-44 before starting, linking PHON-150 (the harness this deploys) and PHON-151 (the gated reseed — same vector geometry; ideally lands in the same "happy" window).