Audio Serving on Cloudflare Containers (PHON-152) Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Date: 2026-06-15
Status: IMPLEMENTED + LIVE ON STAGING (2026-06-15). Merged to release/v6-audio via PR #127. Production cutover still gated on the user's "happy."

What actually shipped vs this plan (read before trusting the task bodies below):
- Weight delivery = baked into the image, NOT R2. R2 was dropped; Dockerfile.cf COPYs the artifacts (staged off the drive by build-and-push.sh). The plan's R2 mentions are obsolete.
- --analyze-only serving mode (new). The default entry loads three wav2vec2 stacks (off-the-shelf + feature + analyzer) = ~3.96 GiB, which OOMs standard-1. The prod host now loads ONLY the trajectory analyzer (__main__.py --analyze-only; build_app accepts an analyzer with an empty registry). ~2.16 GiB peak → fits standard-1. Transcription = the emitter's CTC head, not the off-the-shelf model.
- instance_type = standard-1 (4 GiB), confirmed by measurement (not standard-2).
- CPU-only torch pinned in Dockerfile.cf (the [inference] extra otherwise pulls the CUDA build: 6.1 GB → 3.5 GB). Added a repo-root .dockerignore (context was 145 GB).
- Deploy = deploy/deploy.sh <env>, which digest-pins wrangler.toml (image@sha256:... via cf-image: markers). This fixes the CF mutable-tag trap: same-tag redeploys don't re-resolve the digest, so a fixed image keeps 500ing on CF. The committed digest pin is the deploy lock; CI wrangler deploy rolls it.
- Deleted an orphaned CSP-era generationserver container (5 instances since 2026-05-13).
- PHON-152 filed under PHON-44, linked to 150/151.

See [[audio-serving-cpu-benchmark]] for the full deploy record + gotchas.

Ticket: PHON-152, parent PHON-44.

Goal: Serve the already-built v6 trajectory audio host (packages/audio/) on Cloudflare Containers — CPU, scale-to-zero — reachable only through the Worker via a Durable Object binding, with a locally-built-and-pushed image (weights baked in, no R2), wired into staging + production through the existing wrangler deploy flow.

Architecture: The audio FastAPI host (packages/audio/server.py / __main__.py) runs unchanged inside a Cloudflare Container fronted by a @cloudflare/containers Durable Object (AudioHost). The Worker's routes/audio.ts stops fetching a public AUDIO_INFERENCE_URL and instead calls env.AUDIO_SERVICE.getByName('default').fetch(...). An audioFetch() seam prefers the container binding (staging/prod) and falls back to AUDIO_INFERENCE_URL (local uvicorn dev + the existing hermetic tests), so the fast local loop and current tests are preserved. The ~1.26 GB keeper checkpoint is baked into the image (built locally where the external drive is, pushed to Cloudflare's registry by tag); CI references the tag and never builds the heavy image, the exact parallel to the d1-seed.sql "developer builds locally, CI consumes" paradigm.

Tech Stack: Cloudflare Workers + Containers (@cloudflare/containers@^0.3.3, already in package.json), Durable Objects, Hono, Docker, wrangler, vitest-pool-workers (cloudflare:test), FastAPI/uvicorn (existing host).

Why this shape (settled in conversation 2026-06-14/15):
- CPU benchmark on the keeper: warm /analyze 41–90 ms, cold model load ~1.9 s; no GPU needed. See [[audio-serving-cpu-benchmark]].
- Cloudflare Containers limits (verified 2026-06): up to 12 GiB / 4 vCPU, image size up to 20 GB (= instance disk). Measured peak RSS of the loaded analyzer (model + refs + attribution + worst-case long-clip inference) = ~3.0 GiB on CPU (macOS, 2026-06-15). So standard-1 (4 GiB / 0.5 vCPU / 8 GB disk) is plausible (~1 GiB headroom): start there and validate it starts cleanly on staging; fall back to standard-2 (6 GiB / 1 vCPU) only if it fails. The archived GenerationServer ran standard-2 because the (larger) CSP host spiked over 4 GiB; our footprint does not. Note that standard-1's 0.5 vCPU makes warm inference ~1.5–3 s vs sub-second on standard-2, a latency/cost trade that is acceptable for Beta.
- The archived archive/csp-generation-v5.2 tag is a production-proven CF Containers blueprint we lift: src/containers/generation.ts (Container subclass), src/routes/generation.ts (binding proxy), and the wrangler.toml [[containers]] + [[durable_objects.bindings]] + [[migrations]] blocks.
- No R2. The old container baked artifacts into the image; R2 was only ever a rejected option in planning docs. Baking gives the fastest cold start and avoids R2-creds-from-container plumbing.
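The headroom arithmetic behind the instance-type choice can be sketched as a small check. This is only an illustration: the limits and peak figures are the ones quoted in this plan and in the status note above, while the helper name `checkFit` and the 0.5 GiB minimum-headroom threshold are assumptions introduced here, not project settings.

```typescript
// Memory headroom check for a Cloudflare Containers instance type.
// Limits and peaks are the figures quoted in this plan; checkFit and the
// default 0.5 GiB safety margin are hypothetical, for illustration only.
type InstanceType = 'standard-1' | 'standard-2';

const MEMORY_LIMIT_GIB: Record<InstanceType, number> = {
  'standard-1': 4,
  'standard-2': 6,
};

interface FitResult {
  fits: boolean;
  headroomGiB: number;
}

// Returns the headroom (rounded to 2 decimals) and whether it meets the margin.
function checkFit(peakGiB: number, instance: InstanceType, minHeadroomGiB = 0.5): FitResult {
  const headroomGiB = Math.round((MEMORY_LIMIT_GIB[instance] - peakGiB) * 100) / 100;
  return { fits: headroomGiB >= minHeadroomGiB, headroomGiB };
}

const planEstimate = checkFit(3.0, 'standard-1');  // headroom 1 GiB, fits
const threeStacks = checkFit(3.96, 'standard-1');  // headroom 0.04 GiB, does not fit
const analyzeOnly = checkFit(2.16, 'standard-1');  // headroom 1.84 GiB, fits
```

With these inputs, the three-stack default leaves almost no margin on standard-1, which is consistent with the OOM reported in the status note, while the analyze-only footprint leaves the most room.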


Prerequisites (verify before Task 1)

  • [ ] P1: Docker is installed and running locally (docker info succeeds). Required to build/push the image.
  • [ ] P2: wrangler is authenticated for the target Cloudflare account (cd packages/web/workers && npx wrangler whoami shows the account; CLOUDFLARE_API_TOKEN is set for non-interactive use). Note the account ID — the registry path is registry.cloudflare.com/<ACCOUNT_ID>/....
  • [ ] P3: Keeper artifacts exist on the external drive (gitignored, drive-only): /Volumes/ExternalData1/audio-union/model_feat_traj_target/state_serve.pt (~1.26 GB), .../vectors.csv, /Volumes/ExternalData1/audio-union/refs_fisher.json, /Volumes/ExternalData1/audio-union/attribution_model.json. Confirm with ls -la.
  • [ ] P4: The PHON-150 serving harness is on the working branch (packages/audio/server.py, __main__.py, serving_config.py, deploy/Dockerfile). This plan builds on it.
  • [ ] P5: Branch off the current release line: git checkout release/v6-audio && git pull && git checkout -b feature/phon-152-audio-cf-containers.

File Structure

| File | Responsibility | Action |
|---|---|---|
| packages/web/workers/src/lib/audioHostFetch.ts | The audioFetch(c, path, init) seam: container binding preferred, AUDIO_INFERENCE_URL fallback | Create (Task 1) |
| packages/web/workers/src/types.ts | Env: add AUDIO_SERVICE? binding; keep AUDIO_INFERENCE_URL? | Modify (Task 1) |
| packages/web/workers/src/routes/audio.ts | Replace all 6 host-fetch sites with audioFetch | Modify (Task 2) |
| packages/web/workers/src/containers/audioHost.ts | AudioHost extends Container (port 8000, sleepAfter) | Create (Task 3) |
| packages/web/workers/src/index.ts | Re-export AudioHost for the DO migration | Modify (Task 3) |
| packages/web/workers/wrangler.toml | [[containers]] + [[durable_objects.bindings]] + [[migrations]] for prod + staging | Modify (Task 4) |
| packages/web/workers/src/__tests__/audio.test.ts | Add binding-path tests (mock AUDIO_SERVICE); keep URL-path tests | Modify (Task 5) |
| packages/audio/deploy/Dockerfile.cf | CF image: bakes weights via COPY from a staged build context (no volume mount) | Create (Task 6) |
| packages/audio/deploy/build-and-push.sh | Stage drive artifacts → docker build → push to registry.cloudflare.com/<acct>/phonolex-audio:<tag> | Create (Task 6) |
| .github/workflows/deploy.yml, deploy-staging.yml | Trigger the deploy job on audio/worker changes; wrangler deploy references the pre-pushed tag | Modify (Task 8) |

Task 1: The audioFetch seam + Env binding

Files:
- Create: packages/web/workers/src/lib/audioHostFetch.ts
- Test: packages/web/workers/src/lib/__tests__/audioHostFetch.test.ts
- Modify: packages/web/workers/src/types.ts:9-13

Rationale: every host call in audio.ts currently resolves c.env.AUDIO_INFERENCE_URL then fetch(${base}${path}). We replace that with one seam that prefers the container binding and falls back to the URL. The fallback keeps the local uvicorn dev loop and the existing hermetic tests (which intercept http://127.0.0.1:8000) working unchanged.

  • [ ] Step 1: Add the binding to Env. In packages/web/workers/src/types.ts, change the Env interface (currently lines 9-13):
export interface Env {
  DB: D1Database;
  // Local dev + tests: a plain HTTP host (uvicorn). Staging/prod use AUDIO_SERVICE.
  AUDIO_INFERENCE_URL?: string;
  // Staging/prod: the audio host runs in a Cloudflare Container reached via this
  // Durable Object binding. Absent in local `wrangler dev` (default env).
  AUDIO_SERVICE?: DurableObjectNamespace;
}
  • [ ] Step 2: Write the failing test. Create packages/web/workers/src/lib/__tests__/audioHostFetch.test.ts:
import { describe, it, expect, vi } from 'vitest';
import { audioFetch } from '../audioHostFetch';

function ctx(env: Record<string, unknown>) {
  return { env } as unknown as Parameters<typeof audioFetch>[0];
}

describe('audioFetch', () => {
  it('prefers the container binding when AUDIO_SERVICE is present', async () => {
    const stubFetch = vi.fn(async () => new Response('ok', { status: 200 }));
    const getByName = vi.fn(() => ({ fetch: stubFetch }));
    const res = await audioFetch(ctx({ AUDIO_SERVICE: { getByName } }), '/analyze', { method: 'POST' });
    expect(getByName).toHaveBeenCalledWith('default');
    expect(stubFetch).toHaveBeenCalledWith('http://audio-host/analyze', { method: 'POST' });
    expect(res.status).toBe(200);
  });

  it('falls back to AUDIO_INFERENCE_URL when no binding', async () => {
    const spy = vi.spyOn(globalThis, 'fetch').mockResolvedValue(new Response('ok', { status: 200 }));
    await audioFetch(ctx({ AUDIO_INFERENCE_URL: 'http://127.0.0.1:8000/' }), '/analyze', { method: 'POST' });
    expect(spy).toHaveBeenCalledWith('http://127.0.0.1:8000/analyze', { method: 'POST' });
    spy.mockRestore();
  });

  it('throws AUDIO_HOST_UNCONFIGURED when neither is configured', async () => {
    await expect(audioFetch(ctx({}), '/analyze', {})).rejects.toThrow('AUDIO_HOST_UNCONFIGURED');
  });
});
  • [ ] Step 3: Run it to confirm it fails.

Run: cd packages/web/workers && npx vitest run src/lib/__tests__/audioHostFetch.test.ts Expected: FAIL — Cannot find module '../audioHostFetch'.

  • [ ] Step 4: Implement the seam. Create packages/web/workers/src/lib/audioHostFetch.ts:
/**
 * audioFetch — single seam for reaching the audio inference host.
 *
 * Staging/prod: the host runs in a Cloudflare Container; we reach it through the
 * AUDIO_SERVICE Durable Object binding (not publicly exposed). Local dev + tests:
 * a plain uvicorn host at AUDIO_INFERENCE_URL. The container hostname is arbitrary
 * (the binding ignores it); we use a stable 'http://audio-host'.
 */
import type { Context } from 'hono';
import type { Env } from '../types';

export async function audioFetch(
  c: Context<{ Bindings: Env }>,
  path: string,
  init: RequestInit,
): Promise<Response> {
  const svc = c.env.AUDIO_SERVICE;
  if (svc) {
    return svc.getByName('default').fetch(`http://audio-host${path}`, init);
  }
  const base = c.env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
  if (base) {
    return fetch(`${base}${path}`, init);
  }
  throw new Error('AUDIO_HOST_UNCONFIGURED');
}

Note: getByName is the @cloudflare/containers accessor used in the archived routes/generation.ts. If the installed @cloudflare/containers version exposes a different accessor (e.g. getContainer), adjust here only — every caller goes through this seam. Confirm against node_modules/@cloudflare/containers before implementing.

  • [ ] Step 5: Run the test to confirm it passes.

Run: cd packages/web/workers && npx vitest run src/lib/__tests__/audioHostFetch.test.ts Expected: PASS (3 tests).

  • [ ] Step 6: Type check.

Run: cd packages/web/workers && npm run type-check Expected: PASS.

  • [ ] Step 7: Commit.
git add packages/web/workers/src/lib/audioHostFetch.ts packages/web/workers/src/lib/__tests__/audioHostFetch.test.ts packages/web/workers/src/types.ts
git commit -m "feat(audio): audioFetch seam — container binding preferred, AUDIO_INFERENCE_URL fallback (PHON-152)"
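
The precedence rule in the seam (binding first, then URL, else an error) can also be written as a pure function, which makes the case where both are configured easy to reason about. This is a sketch only: `resolveAudioTarget`, `AudioEnvShape`, and `AudioTarget` are hypothetical names that are not part of the plan's code, and the function performs no network call.

```typescript
// Hypothetical pure helper mirroring audioFetch's precedence; illustration only.
interface AudioEnvShape {
  AUDIO_SERVICE?: unknown;        // stands in for the Durable Object binding
  AUDIO_INFERENCE_URL?: string;   // plain HTTP host for local dev and tests
}

type AudioTarget =
  | { via: 'binding'; url: string }
  | { via: 'url'; url: string };

function resolveAudioTarget(env: AudioEnvShape, path: string): AudioTarget {
  // The binding wins whenever it is present, even if a URL is also set.
  if (env.AUDIO_SERVICE) {
    return { via: 'binding', url: `http://audio-host${path}` };
  }
  // Strip one trailing slash so `${base}${path}` never yields a double slash.
  const base = env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
  if (base) {
    return { via: 'url', url: `${base}${path}` };
  }
  throw new Error('AUDIO_HOST_UNCONFIGURED');
}

// Both configured: the binding is chosen.
const staging = resolveAudioTarget(
  { AUDIO_SERVICE: {}, AUDIO_INFERENCE_URL: 'http://127.0.0.1:8000' },
  '/analyze',
);
// URL only, with a trailing slash: normalized to http://127.0.0.1:8000/analyze.
const local = resolveAudioTarget({ AUDIO_INFERENCE_URL: 'http://127.0.0.1:8000/' }, '/analyze');
```

The three unit tests in Step 2 cover each branch separately; the `staging` case here adds the "both set" combination, which those tests do not exercise.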

Task 2: Route every host call through audioFetch

Files: - Modify: packages/web/workers/src/routes/audio.ts (6 fetch sites plus the /pronounce guard, which only passes base to fetchTranscript: proxy L46-59, fetchTranscript L107-145, /pronounce L180-184, /feature-review L256-286, /acoustic L350-360, /analyze L414-431, /attribute L449-457)

There is no new behavior here — it is a mechanical swap. Each site drops its own const base = c.env.AUDIO_INFERENCE_URL... guard and calls audioFetch(c, path, init). The existing try/catch → { warming: true } and status handling stay byte-for-byte.

  • [ ] Step 1: Import the seam. At the top of routes/audio.ts, add:
import { audioFetch } from '../lib/audioHostFetch';
  • [ ] Step 2: proxy() helper (currently L46-62). Replace:
  const base = c.env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
  if (!base) {
    return c.json({ detail: 'Audio inference host not configured' }, 500);
  }

  // Re-pack into a fresh multipart body to forward.
  const fwd = new FormData();
  fwd.append('audio', file, file.name || 'clip');
  const language = form.get('language');
  if (typeof language === 'string') fwd.append('language', language);

  let upstream: Response;
  try {
    upstream = await fetch(`${base}${path}`, { method: 'POST', body: fwd });
  } catch {
    return c.json({ warming: true, detail: 'Inference host is warming up. Retry shortly.' }, 503);
  }

with:

  // Re-pack into a fresh multipart body to forward.
  const fwd = new FormData();
  fwd.append('audio', file, file.name || 'clip');
  const language = form.get('language');
  if (typeof language === 'string') fwd.append('language', language);

  let upstream: Response;
  try {
    upstream = await audioFetch(c, path, { method: 'POST', body: fwd });
  } catch {
    return c.json({ warming: true, detail: 'Inference host is warming up. Retry shortly.' }, 503);
  }
  • [ ] Step 3: fetchTranscript() (currently L107-145). Change the signature from base: string to the Hono context and swap the call. Replace the signature line:
async function fetchTranscript(
  base: string,
  file: File,

with:

async function fetchTranscript(
  c: Context<{ Bindings: Env }>,
  file: File,

and replace its fetch:

  try {
    upstream = await fetch(`${base}${path}`, { method: 'POST', body: fwd });
  } catch {
    return { warming: true };
  }

with:

  try {
    upstream = await audioFetch(c, path, { method: 'POST', body: fwd });
  } catch {
    return { warming: true };
  }

Add the imports if not already present at the top of the file: import type { Context } from 'hono'; and import type { Env } from '../types'; (the file already imports Hono; confirm Context/Env are imported).

  • [ ] Step 4: /pronounce (currently L180-184). Replace:
  const base = c.env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
  if (!base) return c.json({ detail: 'Audio inference host not configured' }, 500);

  // 2. Transcribe (produced phonemes) — BEFORE D1 so the warming path doesn't need a seeded DB
  const transcript = await fetchTranscript(base, file, transcriber, language);

with:

  // 2. Transcribe (produced phonemes) — BEFORE D1 so the warming path doesn't need a seeded DB
  const transcript = await fetchTranscript(c, file, transcriber, language);
  • [ ] Step 5: /feature-review (L256-286), /acoustic (L350-360), /analyze (L414-431), /attribute (L449-457). In each, delete the two lines:
  const base = c.env.AUDIO_INFERENCE_URL?.replace(/\/$/, '');
  if (!base) return c.json({ detail: 'Audio inference host not configured' }, 500);

and change the corresponding await fetch(${base}/feature-review, …) / /acoustic / /analyze / /attribute to await audioFetch(c, '/feature-review', …) etc. The init object (method, body, headers) is unchanged. For /attribute the init keeps headers: { 'content-type': 'application/json' } and the JSON body.

  • [ ] Step 6: Type check.

Run: cd packages/web/workers && npm run type-check Expected: PASS. No references to base or AUDIO_INFERENCE_URL should remain in audio.ts; confirm with grep -nE 'AUDIO_INFERENCE_URL|\$\{base\}' src/routes/audio.ts, which should return nothing. (Single quotes keep the shell from expanding ${base}.)

  • [ ] Step 7: Run the existing audio tests (URL-fallback path still active in test env).

Run: cd packages/web/workers && npx vitest run src/__tests__/audio.test.ts Expected: PASS — the test miniflare binds AUDIO_INFERENCE_URL (no AUDIO_SERVICE), so audioFetch takes the fallback and fetchMock interceptors still match.

  • [ ] Step 8: Commit.
git add packages/web/workers/src/routes/audio.ts
git commit -m "refactor(audio): route all host calls through audioFetch (binding-ready) (PHON-152)"

Task 3: The AudioHost Container Durable Object

Files:
- Create: packages/web/workers/src/containers/audioHost.ts
- Modify: packages/web/workers/src/index.ts:118 (add the re-export near the existing export default app)

  • [ ] Step 1: Create the Container subclass. This mirrors the archived src/containers/generation.ts. Create packages/web/workers/src/containers/audioHost.ts:
/**
 * AudioHost — Cloudflare Container hosting the v6 trajectory audio inference host.
 *
 * The container runs the FastAPI host from packages/audio (phonolex_audio.__main__,
 * uvicorn on :8000). Worker routes call env.AUDIO_SERVICE.getByName('default').fetch(req)
 * via the audioFetch seam; Cloudflare manages the lifecycle (scale-to-zero, sticky
 * routing by DO name). Not publicly exposed — reachable only through the Worker.
 */
import { Container } from '@cloudflare/containers';

export class AudioHost extends Container {
  defaultPort = 8000;
  // Scale-to-zero: sleep the container after 5 min idle. Cold start = container
  // boot + ~2s model load (see audio-serving-cpu-benchmark). The Worker's
  // { warming: true } path renders the warm-up state on the 503/network failure.
  sleepAfter = '5m';
}

Verify sleepAfter is the current @cloudflare/containers field name/format for the installed version (node_modules/@cloudflare/containers); some versions use sleepAfter as a string duration, others a method. Adjust to the installed API. The defaultPort = 8000 matches the host's EXPOSE 8000 and the PHONOLEX_AUDIO_PORT=8000 default.

  • [ ] Step 2: Re-export the class from the Worker entry (DO classes must be exported from the entry module for migrations). In packages/web/workers/src/index.ts, immediately above export default app; (L118), add:
export { AudioHost } from './containers/audioHost';
  • [ ] Step 3: Type check.

Run: cd packages/web/workers && npm run type-check Expected: PASS. (@cloudflare/containers@^0.3.3 is already a dependency — confirm with grep containers package.json.)

  • [ ] Step 4: Commit.
git add packages/web/workers/src/containers/audioHost.ts packages/web/workers/src/index.ts
git commit -m "feat(audio): AudioHost Container DO (port 8000, scale-to-zero) (PHON-152)"

Task 4: wrangler.toml — container + binding + migration (both envs)

Files: - Modify: packages/web/workers/wrangler.toml

The current file has a v1/v2 GenerationServer migration history (kept for Cloudflare's bookkeeping). We append a new migration tag — never edit existing tags — and add the container + DO binding to production (top-level) and env.staging. image points at a registry tag (built+pushed in Task 6), NOT a Dockerfile path, so CI's wrangler deploy references the pre-built image instead of building the heavy torch image.

  • [ ] Step 1: Production container block. After the existing top-level [[d1_databases]] block (before the # GenerationServer ... comment), add:
# Audio inference host (v6 trajectory model) — Cloudflare Container, CPU,
# scale-to-zero. Image is built+pushed locally by deploy/build-and-push.sh
# (weights baked in; ~5 GB). standard-1 (4 GiB): measured peak RSS ~3.0 GiB
# leaves ~1 GiB headroom (unlike the larger CSP host that needed standard-2).
# Validate it starts cleanly on staging; bump to standard-2 only if it fails.
[[containers]]
class_name = "AudioHost"
image = "registry.cloudflare.com/${CLOUDFLARE_ACCOUNT_ID}/phonolex-audio:latest"
max_instances = 3
instance_type = "standard-1"

[[durable_objects.bindings]]
name = "AUDIO_SERVICE"
class_name = "AudioHost"

${CLOUDFLARE_ACCOUNT_ID} — if wrangler.toml does not interpolate env vars in your wrangler version, hardcode the account ID here (it is not secret) or use the :latest tag with the account inferred. Confirm interpolation support; otherwise replace with the literal registry.cloudflare.com/<ACCOUNT_ID>/phonolex-audio:latest.

  • [ ] Step 2: Append the migration (do not touch the v1/v2 entries). After the existing top-level [[migrations]] tag = "v2" block, add:
[[migrations]]
tag = "v3"
new_sqlite_classes = ["AudioHost"]
  • [ ] Step 3: Staging container block. Under [env.staging], after [[env.staging.d1_databases]], add:
[[env.staging.containers]]
class_name = "AudioHost"
image = "registry.cloudflare.com/${CLOUDFLARE_ACCOUNT_ID}/phonolex-audio:staging"
max_instances = 2
instance_type = "standard-1"

[[env.staging.durable_objects.bindings]]
name = "AUDIO_SERVICE"
class_name = "AudioHost"
  • [ ] Step 4: Append the staging migration. After the existing [[env.staging.migrations]] tag = "v2" block, add:
[[env.staging.migrations]]
tag = "v3"
new_sqlite_classes = ["AudioHost"]
  • [ ] Step 5: Local default env keeps AUDIO_INFERENCE_URL. Leave the top-level [vars] AUDIO_INFERENCE_URL = "http://127.0.0.1:8000" as-is — the default env (local wrangler dev) has no AUDIO_SERVICE, so audioFetch falls back to the uvicorn host. Do NOT add a container to the default env (avoids requiring Docker for the inner dev loop).

  • [ ] Step 6: Validate config parses.

Run: cd packages/web/workers && npx wrangler deploy --dry-run --outdir /tmp/wrangler-dryrun Expected: dry-run succeeds and reports the AudioHost container + AUDIO_SERVICE DO binding. (Dry-run does not build the image or deploy.)

Run staging too: npx wrangler deploy --env staging --dry-run --outdir /tmp/wrangler-dryrun-staging Expected: same, with the staging image tag.

  • [ ] Step 7: Commit.
git add packages/web/workers/wrangler.toml
git commit -m "config(audio): AudioHost container + AUDIO_SERVICE binding + v3 migration (prod+staging) (PHON-152)"
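
If wrangler.toml turns out not to interpolate ${CLOUDFLARE_ACCOUNT_ID}, the literal image reference has to be produced some other way. The sketch below shows one possible shape for that, covering both a mutable tag and the image@sha256 digest form mentioned in the status note. The helpers `imageRef` and `fillAccountId` are hypothetical and are not the project's deploy.sh; the account ID used in the example is a placeholder.

```typescript
// Illustrative helpers for building the `image` value used in wrangler.toml.
// Names and behavior are assumptions for this sketch, not the project's tooling.
interface ImageRefInput {
  accountId: string;  // Cloudflare account ID (not secret)
  tag?: string;       // mutable tag such as "staging" or "latest"
  digest?: string;    // immutable "sha256:<64 hex chars>"; wins over tag when set
}

function imageRef({ accountId, tag = 'latest', digest }: ImageRefInput): string {
  const repo = `registry.cloudflare.com/${accountId}/phonolex-audio`;
  if (digest !== undefined) {
    if (!/^sha256:[0-9a-f]{64}$/.test(digest)) {
      throw new Error(`not a sha256 digest: ${digest}`);
    }
    return `${repo}@${digest}`;
  }
  return `${repo}:${tag}`;
}

// Replace every literal ${CLOUDFLARE_ACCOUNT_ID} placeholder in TOML text.
// Single quotes keep the placeholder a plain string rather than a template.
function fillAccountId(toml: string, accountId: string): string {
  return toml.split('${CLOUDFLARE_ACCOUNT_ID}').join(accountId);
}

const exampleAccount = 'ACCOUNT_ID_PLACEHOLDER'; // hypothetical value
const stagingImage = imageRef({ accountId: exampleAccount, tag: 'staging' });
const filledLine = fillAccountId(
  'image = "registry.cloudflare.com/${CLOUDFLARE_ACCOUNT_ID}/phonolex-audio:latest"',
  exampleAccount,
);
```

A digest reference identifies exactly one image, so it avoids relying on a tag being re-resolved at deploy time.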

Task 5: Tests for the binding path

Files: - Modify: packages/web/workers/src/__tests__/audio.test.ts

The existing tests cover the URL-fallback path (test env has AUDIO_INFERENCE_URL, no binding). Add a focused unit test for the binding path by calling the audio Hono app directly with a mock AUDIO_SERVICE env, mirroring the archived generation.test.ts style (env.GENERATION_SERVICE: { getByName: vi.fn(() => stub) }).

  • [ ] Step 1: Add a binding-path describe block to audio.test.ts. Append:
import audioApp from '../routes/audio';

describe('audio route — AUDIO_SERVICE container binding', () => {
  it('/analyze forwards multipart to the container binding when AUDIO_SERVICE is set', async () => {
    const stubFetch = vi.fn(async () =>
      new Response(JSON.stringify({ positions: [], attribution: null }), {
        status: 200, headers: { 'content-type': 'application/json' },
      }),
    );
    const getByName = vi.fn(() => ({ fetch: stubFetch }));
    // Seeded test D1 is provided by cloudflare:test env; reuse it for the canonical lookup.
    const { env } = await import('cloudflare:test');
    const mockEnv = { ...env, AUDIO_SERVICE: { getByName }, AUDIO_INFERENCE_URL: undefined };

    const fd = new FormData();
    fd.append('audio', new File([new Uint8Array([1, 2, 3])], 'clip.wav', { type: 'audio/wav' }));
    fd.append('target', 'cat');

    const res = await audioApp.fetch(
      new Request('http://localhost/analyze', { method: 'POST', body: fd }),
      mockEnv as unknown as typeof env,
    );

    // Only assert the binding was used; D1 seeding state may make this 200 or 404.
    expect(getByName).toHaveBeenCalledWith('default');
    expect(stubFetch).toHaveBeenCalled();
    expect(stubFetch.mock.calls[0][0]).toBe('http://audio-host/analyze');
  });
});

Note on vi/imports: audio.test.ts already imports from vitest and cloudflare:test. Add vi to the vitest import if not present. If audioApp.fetch(req, env) requires a third ExecutionContext arg in this Hono/Workers version, pass createExecutionContext() from cloudflare:test as the third argument.

  • [ ] Step 2: Run the audio tests.

Run: cd packages/web/workers && npx vitest run src/__tests__/audio.test.ts Expected: PASS (existing URL-path tests + the new binding test).

  • [ ] Step 3: Run the full worker test suite (catch ripple).

Run: cd packages/web/workers && npm test Expected: PASS.

  • [ ] Step 4: Commit.
git add packages/web/workers/src/__tests__/audio.test.ts
git commit -m "test(audio): cover the AUDIO_SERVICE container-binding path (PHON-152)"

Task 6: CF image — bake weights, build, push by tag (local)

Files:
- Create: packages/audio/deploy/Dockerfile.cf
- Create: packages/audio/deploy/build-and-push.sh

The existing deploy/Dockerfile mounts weights from a RunPod network volume. CF Containers have no volume mounts, so the CF image bakes the four artifacts in via COPY from a staged build context (the build script stages them off the drive). This runs locally (where the drive is); the resulting image is pushed to Cloudflare's registry by tag and referenced from wrangler.toml (Task 4). CI never builds it.

  • [ ] Step 1: Create packages/audio/deploy/Dockerfile.cf. It reuses the existing Dockerfile's offline-HF-prewarm pattern but COPYs the artifacts into image-local paths and sets PHONOLEX_AUDIO_* to those paths:
# syntax=docker/dockerfile:1.7
# PhonoLex audio inference host — Cloudflare Containers image.
# Weights are BAKED IN (no volume mounts on CF). Build locally via
# deploy/build-and-push.sh (stages the gitignored artifacts off the drive).
FROM python:3.11-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
        libsndfile1 ffmpeg build-essential \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir uv==0.5.11
WORKDIR /app

COPY pyproject.toml uv.lock /app/
COPY packages/audio/pyproject.toml /app/packages/audio/pyproject.toml
COPY packages/data/pyproject.toml /app/packages/data/pyproject.toml
COPY packages/audio /app/packages/audio
COPY packages/data  /app/packages/data
RUN uv pip install --system -e packages/data
RUN uv pip install --system -e "packages/audio[inference]"

# Offline HF backbone (~315 MB), baked so startup is fully offline.
ENV HF_HOME=/opt/hf
RUN python -c "from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor; \
Wav2Vec2Model.from_pretrained('facebook/wav2vec2-lv-60-espeak-cv-ft'); \
Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-lv-60-espeak-cv-ft')"

# Baked model artifacts (staged into ./_artifacts by build-and-push.sh).
COPY _artifacts/state_serve.pt        /app/artifacts/state_serve.pt
COPY _artifacts/vectors.csv           /app/artifacts/vectors.csv
COPY _artifacts/refs_fisher.json      /app/artifacts/refs_fisher.json
COPY _artifacts/attribution_model.json /app/artifacts/attribution_model.json

ENV PHONOLEX_AUDIO_DEVICE=cpu
ENV PHONOLEX_AUDIO_CHECKPOINT=/app/artifacts/state_serve.pt
ENV PHONOLEX_AUDIO_VECTORS=/app/artifacts/vectors.csv
ENV PHONOLEX_AUDIO_TRAJ_REFS=/app/artifacts/refs_fisher.json
ENV PHONOLEX_AUDIO_ATTRIBUTION=/app/artifacts/attribution_model.json
ENV PHONOLEX_AUDIO_HOST=0.0.0.0
ENV PHONOLEX_AUDIO_PORT=8000
ENV PYTHONUNBUFFERED=1
EXPOSE 8000

CMD ["python", "-m", "phonolex_audio", \
     "--feature-checkpoint", "/app/artifacts/state_serve.pt", \
     "--feature-vectors",    "/app/artifacts/vectors.csv", \
     "--trajectory-refs",    "/app/artifacts/refs_fisher.json", \
     "--attribution-model",  "/app/artifacts/attribution_model.json", \
     "--host", "0.0.0.0", "--port", "8000"]

Device is cpu (benchmark proved CPU sufficient; CF has no GPU). Build context is the repo root (so packages/ resolves); _artifacts/ is created under the repo root by the build script and is gitignored.

  • [ ] Step 2: Create packages/audio/deploy/build-and-push.sh. Stages artifacts off the drive, builds, pushes:
#!/usr/bin/env bash
# Build + push the CF audio image (weights baked). Run LOCALLY (needs the drive + Docker).
# Usage: ACCOUNT_ID=<id> TAG=latest|staging bash packages/audio/deploy/build-and-push.sh
set -euo pipefail

DRIVE="${DRIVE:-/Volumes/ExternalData1/audio-union}"
ACCOUNT_ID="${ACCOUNT_ID:?set ACCOUNT_ID to your Cloudflare account id}"
TAG="${TAG:-latest}"
IMAGE="registry.cloudflare.com/${ACCOUNT_ID}/phonolex-audio:${TAG}"
REPO_ROOT="$(git rev-parse --show-toplevel)"
STAGE="${REPO_ROOT}/_artifacts"

echo "Staging artifacts from ${DRIVE} -> ${STAGE}"
mkdir -p "${STAGE}"
cp "${DRIVE}/model_feat_traj_target/state_serve.pt" "${STAGE}/"
cp "${DRIVE}/model_feat_traj_target/vectors.csv"     "${STAGE}/"
cp "${DRIVE}/refs_fisher.json"                       "${STAGE}/"
cp "${DRIVE}/attribution_model.json"                 "${STAGE}/"

echo "Building ${IMAGE}"
docker build --platform linux/amd64 \
  -f "${REPO_ROOT}/packages/audio/deploy/Dockerfile.cf" \
  -t "${IMAGE}" "${REPO_ROOT}"

echo "Pushing ${IMAGE}"
# Auth to Cloudflare's registry via wrangler, then docker push. Confirm the current
# wrangler containers push/login command for your version; as of writing:
#   npx wrangler containers push "${IMAGE}"   # pushes the locally built image, OR
#   docker push "${IMAGE}"                     # after `wrangler login` registry auth
( cd "${REPO_ROOT}/packages/web/workers" && npx wrangler containers push "${IMAGE}" ) \
  || docker push "${IMAGE}"

echo "Cleaning staged artifacts"
rm -rf "${STAGE}"
echo "Done: ${IMAGE}"

Verify the registry-push command for the installed wrangler: whether to use wrangler containers push or docker push after a registry login has varied across wrangler versions. The plan's fallback (docker push after wrangler registry auth) covers the common case. --platform linux/amd64 is required if building on Apple Silicon (CF runs amd64).

  • [ ] Step 3: Add _artifacts/ to gitignore. Append _artifacts/ to the repo-root .gitignore (never commit the staged 1.26 GB weights).
echo "_artifacts/" >> .gitignore
  • [ ] Step 4: chmod +x and commit the scripts (NOT the artifacts/image).
chmod +x packages/audio/deploy/build-and-push.sh
git add packages/audio/deploy/Dockerfile.cf packages/audio/deploy/build-and-push.sh .gitignore
git commit -m "build(audio): CF Containers image (weights baked) + local build-and-push script (PHON-152)"
  • [ ] Step 5: Build locally and smoke-test the image before any deploy.

ACCOUNT_ID=<your-account-id> TAG=staging bash packages/audio/deploy/build-and-push.sh
# After build (before/independent of push), run it locally to confirm it serves:
docker run --rm -p 8099:8000 registry.cloudflare.com/<acct>/phonolex-audio:staging &
sleep 20
curl -s localhost:8099/health    # expect {"status":"ok", ..., "analyze": true}
Expected: /health returns analyze: true and the keeper models load (~2 s after boot).


Task 7: First deploy to STAGING (manual, gated)

Files: none (operational). This is the first time anything goes live; it is staging only and still subject to the "happy" gate for production.

  • [ ] Step 1: Confirm the staging image tag is pushed (Task 6 with TAG=staging).
  • [ ] Step 2: Deploy the Worker + container to staging:

cd packages/web/workers && npx wrangler deploy --env staging
Expected: wrangler applies migration v3 (creates AudioHost), provisions the container from the pushed :staging image, deploys the Worker.

  • [ ] Step 3: Smoke-test through the staging Worker (the container is NOT directly reachable — only via the Worker):

curl -s -X POST https://staging-api.phonolex.com/api/audio/analyze \
  -F audio=@/path/to/a/test.wav -F target=cat
Expected: first call may return { "warming": true } (503) during cold start, then a { positions: [...], attribution: {...} } payload on retry within a few seconds.

  • [ ] Step 4: Confirm standard-1 started cleanly. Check the container did not OOM/fail on boot (npx wrangler containers list / the Cloudflare dashboard container logs). Measured local peak was ~3.0 GiB vs the 4 GiB limit; if the linux/amd64 build runs hotter and the container fails to start or restarts under load, bump both wrangler.toml blocks to instance_type = "standard-2", re-deploy, and note it. Do not pre-emptively bump — confirm empirically.
  • [ ] Step 5: Verify scale-to-zero: leave it idle >5 min, re-call, confirm a single warm-up then service. Note cold-start wall-time for the user.
  • [ ] Step 6: Report staging results to the user (including the standard-1 start verdict + cold-start time). Do not proceed to production until the user says "happy."
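
The cold-start behavior in Step 3 (a 503 with { warming: true } first, then a real payload) suggests that a smoke test or client should retry rather than fail on the first response. A minimal sketch of such a loop follows. `retryWhileWarming`, the attempt count, and the delay are assumptions for illustration; the request itself is injected as a function so the helper does not depend on any particular HTTP client.

```typescript
// Hypothetical retry helper for the { warming: true } cold-start response.
interface AnalyzeReply {
  status: number;
  body: { warming?: boolean; [key: string]: unknown };
}

type Attempt = () => Promise<AnalyzeReply>;
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `attempt` until the reply is no longer a 503 warming response,
// or until `maxAttempts` calls have been made. Returns the last reply.
async function retryWhileWarming(
  attempt: Attempt,
  maxAttempts = 5,     // assumed default, tune to the observed cold-start time
  delayMs = 2000,      // assumed default
  sleep: Sleep = realSleep,
): Promise<AnalyzeReply> {
  let reply = await attempt();
  for (let i = 1; i < maxAttempts && reply.status === 503 && reply.body.warming === true; i++) {
    await sleep(delayMs);
    reply = await attempt();
  }
  return reply;
}
```

In a smoke test, `attempt` would wrap the POST to /api/audio/analyze and parse the JSON body; recording how many attempts were needed gives a rough cold-start wall-time for the Step 5 and Step 6 report.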

Task 8: Fold into CI (deploy on audio/worker changes)

Files: - Modify: .github/workflows/deploy.yml, .github/workflows/deploy-staging.yml

Today deploy.yml runs on push to main and seeds D1 only when d1-seed.sql changed (paths-filter), then always runs npx wrangler deploy. Because the image is pre-built+pushed by tag and wrangler.toml references the tag, wrangler deploy wires the container without building it in CI. The only change needed: ensure the deploy job triggers on audio/worker/wrangler changes too, and document that the image must be pushed (manually, locally) before a deploy that bumps it.

  • [ ] Step 1: Read the current deploy job trigger. Confirm whether deploy.yml's deploy job is gated by the paths-filter or runs on every push to main. (From inspection it runs on push to main; the paths-filter only gates the D1 seed step.) If the deploy job already runs on every push to main/develop, no trigger change is needed — wrangler deploy will pick up the new wrangler.toml container config on the next deploy. In that case, skip to Step 3.

  • [ ] Step 2 (only if the deploy job is paths-filtered): add packages/web/workers/** and packages/audio/deploy/** to the filter so worker/container config changes trigger a deploy. Mirror the existing dorny/paths-filter@v3 block.

  • [ ] Step 3: Add a guard note + no image build in CI. Add a comment above the Deploy Workers step in both workflows:

      # Audio container: the image (registry.cloudflare.com/<acct>/phonolex-audio:<tag>)
      # is built + pushed LOCALLY by packages/audio/deploy/build-and-push.sh — NOT here.
      # `wrangler deploy` references the pushed tag (parallels the d1-seed "dev builds,
      # CI consumes" paradigm). A model change requires a fresh local push BEFORE merge.
  • [ ] Step 4: Confirm the CLOUDFLARE_ACCOUNT_ID and API token are available to wrangler deploy. The container image registry path needs the account id; verify the workflow env exposes CLOUDFLARE_ACCOUNT_ID (add it to the job's env: block if wrangler.toml interpolates it). The deploy already authenticates wrangler via CLOUDFLARE_API_TOKEN; confirm that token has Containers + Workers + DO permissions.

  • [ ] Step 5: Commit.

git add .github/workflows/deploy.yml .github/workflows/deploy-staging.yml
git commit -m "ci(audio): deploy AudioHost container via wrangler (pre-pushed image, no CI build) (PHON-152)"
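
The Step 4 credential check can be made explicit with a small preflight that runs before wrangler deploy. This is a sketch only: `require_env` is a hypothetical helper, and it checks that the variables are set, not that the token actually carries the Containers + Workers + DO permissions.

```shell
# Hypothetical preflight; prints every missing variable, then fails.
# Usage: require_env VAR_NAME [VAR_NAME...]
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup via eval keeps this compatible with plain POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

In a workflow step this could read `require_env CLOUDFLARE_ACCOUNT_ID CLOUDFLARE_API_TOKEN && npx wrangler deploy`, so a missing secret fails fast with a clear message instead of surfacing as a wrangler auth error.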

Task 9: Production cutover (BLOCKED on "happy")

Files: none (operational).

  • [ ] Step 1: User has confirmed staging is good and signals "happy."
  • [ ] Step 2: Push the production image tag: ACCOUNT_ID=<id> TAG=latest bash packages/audio/deploy/build-and-push.sh.
  • [ ] Step 3: Merge feature/phon-152-audio-cf-containers → release/v6-audio → the production line per the project's git-flow. CI runs wrangler deploy (prod), applies migration v3, provisions the prod container from :latest.
  • [ ] Step 4: Smoke-test https://api.phonolex.com/api/audio/analyze (same as Task 7 Step 3). Verify the frontend Speech Analysis (Beta) tab end-to-end.
  • [ ] Step 5: Remove the now-unused AUDIO_INFERENCE_URL from prod/staging [vars] if present (keep it only in the local default env). Optional cleanup.
  • [ ] Step 6: Update memory ([[audio-serving-cpu-benchmark]]) + the deploy/README.md to record CF Containers as the live hosting (supersede the RunPod-framed README), and note the cold-start wall-time observed.
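
For the Step 5 cleanup, it helps to see which wrangler.toml tables still define AUDIO_INFERENCE_URL before deleting anything. The sketch below is a line-based scan, not a real TOML parser: it assumes one `key = value` per line and treats any line starting with `[` as a table header. `toml_sections_with_key` is a hypothetical name.

```shell
# Hypothetical helper: prints each TOML table header under which KEY is assigned.
# Usage: toml_sections_with_key <key> <path-to-toml>
toml_sections_with_key() {
  awk -v key="$1" '
    # Remember the most recent [table] or [[array-of-tables]] header.
    /^[[:space:]]*\[/ { section = $0; next }
    {
      line = $0
      sub(/^[[:space:]]+/, "", line)
      # Match "KEY =" exactly, so KEY_SUFFIX or a commented-out line is ignored.
      if (index(line, key) == 1 && substr(line, length(key) + 1) ~ /^[[:space:]]*=/)
        print (section == "" ? "(top level)" : section)
    }
  ' "$2"
}
```

Run from the directory that holds wrangler.toml (assumed here to be packages/web/workers), `toml_sections_with_key AUDIO_INFERENCE_URL wrangler.toml` lists the tables to review. Only entries under the staging and production env tables are candidates for removal; the local default stays.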

Self-Review Notes (gaps the implementer must close, not skip)

  1. @cloudflare/containers API drift. Three call sites depend on the installed version (^0.3.3): getByName('default') (Task 1), defaultPort/sleepAfter (Task 3). Read node_modules/@cloudflare/containers before implementing and adjust the seam + class to the actual API. Every binding call goes through audioFetch, so a rename is a one-line fix.
  2. Registry push command. Whether to push with wrangler containers push or with docker push (after registry auth) has changed across wrangler versions (Task 6 Step 2). Confirm the current command for the installed wrangler; the script has a || fallback, but verify it before relying on it.
  3. wrangler.toml env-var interpolation. Task 4 uses ${CLOUDFLARE_ACCOUNT_ID} in the image field. If the installed wrangler does not interpolate, hardcode the (non-secret) account id. Verify with the Task 4 Step 6 dry-run.
  4. Instance size — start at standard-1, validate, fall back. Measured peak RSS ~3.0 GiB (CPU, macOS, 2026-06-15) → standard-1 (4 GiB) has ~1 GiB headroom and is the starting pick. The old CSP host needed standard-2 because it spiked over 4 GiB; ours does not. Validate standard-1 starts cleanly on staging (Task 7 Step 4); bump to standard-2 only on an empirical start/OOM failure. The linux/amd64 torch build may differ from the macOS measurement — measure on staging, don't assume.
  5. Cold start wall-time is the UX number to report. The benchmark measured ~2 s model load, but CF container boot (image pull + start) adds to it. Measure the real first-call latency on staging (Task 7 Step 5) and tell the user; it determines whether the { warming: true } copy needs tuning (see [[feedback_user_facing_copy]]).
  6. No image in git / no weights in git. _artifacts/ is gitignored (Task 6 Step 3); the 1.26 GB state_serve.pt is never committed (drive-only, per the standing constraint). The image lives only in Cloudflare's registry.
  7. Dev loop unchanged. Local wrangler dev (default env, no AUDIO_SERVICE) still falls back to AUDIO_INFERENCE_URL → the directly-run uvicorn host. No Docker needed for the inner loop. Confirm the local audio host launch in .env.development still works after these changes.
  8. Auth gap is closed structurally, not by a token. CF Containers are not publicly exposed (binding-only), so the previously-missing bearer header on the Worker proxy is moot. Do not add a token scheme — verify the container has no public route.
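
The headroom arithmetic behind item 4 can be written down so it is easy to rerun once a staging measurement exists. This is a sketch in integer MiB (1 GiB = 1024 MiB); the helper names and the 512 MiB minimum in the usage note are illustrative assumptions, and the result does not replace the empirical start check on staging.

```shell
# Hypothetical helpers for the instance-size check in item 4.

# Prints the memory left under the instance limit, in MiB.
# Usage: headroom_mib <limit-mib> <peak-mib>
headroom_mib() {
  echo "$(( $1 - $2 ))"
}

# Succeeds when the peak leaves at least <min-headroom-mib> under the limit.
# Usage: fits_with_headroom <limit-mib> <peak-mib> <min-headroom-mib>
fits_with_headroom() {
  [ "$(headroom_mib "$1" "$2")" -ge "$3" ]
}
```

With the figures above, `headroom_mib 4096 3072` gives 1024 MiB for standard-1, and `fits_with_headroom 4096 3072 512` succeeds. If the linux/amd64 peak measured on staging is higher, substituting it for 3072 shows how much margin remains.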

Execution note

This plan is dev-only through Task 6 (build + local smoke). Task 7 is staging only; Task 9 (production) is blocked on the user signaling "happy" per the standing v6 gate. File PHON-152 under PHON-44 before starting, linking PHON-150 (the harness this deploys) and PHON-151 (the gated reseed — same vector geometry; ideally lands in the same "happy" window).