PHON-103 — CSP Domain Caching Implementation Plan¶

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add three module-level LRU caches to the CSP spike so verb-independent setup work (spec_lexicon, filtered_spec, per_word_axes) is computed once per constraint set and reused across multi-verb paragraph composition.

Architecture: New domain_cache.py module with OrderedDict-backed LRU caches. spec_lexicon keyed on (spec_id, id(word_df)). filtered_spec keyed on (id(spec_words_frozenset), hard_constraints, id(word_df)). per_word_axes keyed on (soft_constraints, id(word_df)). Public wrappers replace direct calls in paradigm_3_csp.py and paragraph_csp.py.

Tech Stack: Python 3.12, Polars, collections.OrderedDict, pytest.

Spec: docs/superpowers/specs/2026-05-08-phon-103-csp-domain-caching-design.md

File map¶

File	Action
`packages/generation/research/2026-05-07-sentence-generation-paradigms/domain_cache.py`	Create
`packages/generation/research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py`	Create
`packages/generation/research/2026-05-07-sentence-generation-paradigms/bench_domain_cache.py`	Create
`packages/generation/research/2026-05-07-sentence-generation-paradigms/paradigm_3_csp.py`	Modify (`spec_lexicon` returns `frozenset`; `_resolve_domain_words` body wrapped; `solve()`'s `per_word_axes` call wrapped)
`packages/generation/research/2026-05-07-sentence-generation-paradigms/paragraph_csp.py`	Modify (`_filtered_domain` body wrapped; `solve_paragraph()`'s `per_word_axes` call wrapped)

All paths in this plan are relative to the repo root /Users/jneumann/Repos/PhonoLex/. The spike directory is referenced as <spike>/ for brevity: <spike>/ = packages/generation/research/2026-05-07-sentence-generation-paradigms/.

Test command throughout:

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Task 1: Bootstrap `domain_cache.py` with empty cache infrastructure¶

Files: - Create: <spike>/domain_cache.py - Create: <spike>/test_domain_cache.py

[ ] Step 1.1: Write the failing test for clear_caches and get_cache_stats

Create <spike>/test_domain_cache.py:

"""Tests for domain_cache.py — PHON-103."""
from __future__ import annotations

import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).parent))

import domain_cache


def test_clear_caches_resets_stats():
    domain_cache.clear_caches()
    stats = domain_cache.get_cache_stats()
    expected_keys = {"spec_lexicon", "filtered_spec", "per_word_axes"}
    assert set(stats.keys()) == expected_keys
    for cache_name, counts in stats.items():
        assert counts == {"hits": 0, "misses": 0, "evictions": 0}, (
            f"{cache_name} stats not zeroed: {counts}"
        )


def test_get_cache_stats_returns_snapshot_not_reference():
    """Mutating the returned dict must not affect internal state."""
    domain_cache.clear_caches()
    stats = domain_cache.get_cache_stats()
    stats["spec_lexicon"]["hits"] = 999
    fresh = domain_cache.get_cache_stats()
    assert fresh["spec_lexicon"]["hits"] == 0

[ ] Step 1.2: Run test, verify it fails

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: FAIL with ModuleNotFoundError: No module named 'domain_cache'.

[ ] Step 1.3: Create minimal domain_cache.py

Create <spike>/domain_cache.py:

"""Domain caching for CSP — PHON-103.

Three module-level OrderedDict-backed LRU caches keyed on the constraint set,
so verb-independent setup work (spec_lexicon, filtered_spec, per_word_axes)
is computed once per (spec marker × constraints) tuple and reused across
calls within a process lifetime.

Mirrors the existing `_ADVMOD_PMI_CACHE` / `_PHONEME_CACHE` patterns from
`skeleton_csp.py` and `constraint_surface.py`.
"""

from __future__ import annotations

from collections import OrderedDict
from typing import Callable, Iterable, TypeVar

import polars as pl

from phonolex_data.runtime.store import WordStore
from phonolex_generators.cfg_seed.spec_filters import SPEC_FILTERS

from constraint_surface import (
    BoundBoostConstraint,
    BoundConstraint,
    Constraint,
    ExcludeConstraint,
    IncludeConstraint,
    domain_trace,
    hard_filter_expr,
    per_word_axes as _per_word_axes_uncached,
)

_HARD_TYPES = (ExcludeConstraint, BoundConstraint)
_SOFT_TYPES = (IncludeConstraint, BoundBoostConstraint)

_MAX_SPEC_LEXICON = 8
_MAX_FILTERED_SPEC = 64
_MAX_PER_WORD_AXES = 64

_SPEC_LEXICON_CACHE: OrderedDict = OrderedDict()
_FILTERED_SPEC_CACHE: OrderedDict = OrderedDict()
_PER_WORD_AXES_CACHE: OrderedDict = OrderedDict()

_CACHE_STATS: dict[str, dict[str, int]] = {
    "spec_lexicon":  {"hits": 0, "misses": 0, "evictions": 0},
    "filtered_spec": {"hits": 0, "misses": 0, "evictions": 0},
    "per_word_axes": {"hits": 0, "misses": 0, "evictions": 0},
}

T = TypeVar("T")


def _lru_get_or_compute(
    cache: OrderedDict,
    max_size: int,
    stat_key: str,
    key,
    compute: Callable[[], T],
) -> T:
    """Generic LRU get-or-compute. move_to_end on hit; popitem(last=False) on overflow."""
    if key in cache:
        cache.move_to_end(key)
        _CACHE_STATS[stat_key]["hits"] += 1
        return cache[key]
    _CACHE_STATS[stat_key]["misses"] += 1
    value = compute()
    cache[key] = value
    if len(cache) > max_size:
        cache.popitem(last=False)
        _CACHE_STATS[stat_key]["evictions"] += 1
    return value


def clear_caches() -> None:
    """Drop all entries and reset stats. Used by tests."""
    _SPEC_LEXICON_CACHE.clear()
    _FILTERED_SPEC_CACHE.clear()
    _PER_WORD_AXES_CACHE.clear()
    for stats in _CACHE_STATS.values():
        for k in stats:
            stats[k] = 0


def get_cache_stats() -> dict[str, dict[str, int]]:
    """Snapshot of hits/misses/evictions per cache. Defensive copy."""
    return {k: dict(v) for k, v in _CACHE_STATS.items()}

[ ] Step 1.4: Run test, verify both pass

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 2 passed.

[ ] Step 1.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/domain_cache.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py
git commit -m "$(cat <<'EOF'
PHON-103: bootstrap domain_cache module

Empty cache infrastructure: OrderedDict-backed LRU primitives,
_CACHE_STATS, clear_caches(), get_cache_stats(). Tests verify
stat snapshots are defensive copies.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 2: Implement `_hashable_constraints` helper¶

Files: - Modify: <spike>/domain_cache.py - Modify: <spike>/test_domain_cache.py

[ ] Step 2.1: Write failing tests for _hashable_constraints

Append to <spike>/test_domain_cache.py:

from constraint_surface import (
    BoundBoostConstraint,
    BoundConstraint,
    ContrastiveConstraint,
    ExcludeConstraint,
    IncludeConstraint,
)


def test_hashable_constraints_filters_by_type():
    excl = ExcludeConstraint(phonemes=("ɹ",))
    bnd = BoundConstraint(norm="aoa", max_value=6.0)
    incl = IncludeConstraint(phonemes=("k",))
    contrastive = ContrastiveConstraint(pair_type="minpair", phoneme1="k", phoneme2="g")

    hard_types = (ExcludeConstraint, BoundConstraint)
    result = domain_cache._hashable_constraints([excl, bnd, incl, contrastive], hard_types)
    assert set(result) == {excl, bnd}
    assert isinstance(result, tuple)


def test_hashable_constraints_order_invariant():
    excl = ExcludeConstraint(phonemes=("ɹ",))
    bnd = BoundConstraint(norm="aoa", max_value=6.0)
    hard_types = (ExcludeConstraint, BoundConstraint)
    forward = domain_cache._hashable_constraints([excl, bnd], hard_types)
    reverse = domain_cache._hashable_constraints([bnd, excl], hard_types)
    assert forward == reverse


def test_hashable_constraints_empty_returns_empty_tuple():
    incl = IncludeConstraint(phonemes=("k",))
    hard_types = (ExcludeConstraint, BoundConstraint)
    assert domain_cache._hashable_constraints([incl], hard_types) == ()
    assert domain_cache._hashable_constraints([], hard_types) == ()

[ ] Step 2.2: Run test, verify it fails

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 3 new tests fail with AttributeError: module 'domain_cache' has no attribute '_hashable_constraints'.

[ ] Step 2.3: Implement _hashable_constraints

Append to <spike>/domain_cache.py (above _lru_get_or_compute):

def _hashable_constraints(
    constraints: Iterable[Constraint],
    types: tuple[type, ...],
) -> tuple[Constraint, ...]:
    """Filter to relevant Constraint types and sort for stable hashing.

    Constraints are frozen dataclasses (hashable). Sort key uses (type, repr)
    so two semantically-equivalent lists in different orders produce the same
    cache key.
    """
    relevant = [c for c in constraints if isinstance(c, types)]
    return tuple(sorted(relevant, key=lambda c: (c.type, repr(c))))

[ ] Step 2.4: Run tests, verify all pass

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 5 passed.

[ ] Step 2.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/domain_cache.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py
git commit -m "$(cat <<'EOF'
PHON-103: add _hashable_constraints helper

Filters by Constraint subtype and sorts by (type, repr) so reordered
input lists yield the same cache key.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 3: Implement `get_spec_lexicon` with WordStore fixture¶

Files: - Modify: <spike>/domain_cache.py - Modify: <spike>/test_domain_cache.py

[ ] Step 3.1: Add a session-scoped WordStore fixture and clear_caches autouse fixture

Append to top of <spike>/test_domain_cache.py (after existing imports):

@pytest.fixture(scope="session")
def store():
    """Session-scoped WordStore. ~1s to load."""
    from phonolex_data.runtime.store import WordStore
    repo_root = Path(__file__).resolve().parents[4]
    return WordStore.from_parquet(repo_root / "data" / "runtime" / "words.parquet")


@pytest.fixture(autouse=True)
def _reset_caches():
    """Clear caches between tests for isolation."""
    domain_cache.clear_caches()
    yield

[ ] Step 3.2: Write failing tests for get_spec_lexicon

Append to <spike>/test_domain_cache.py:

def test_get_spec_lexicon_correctness_vs_uncached(store):
    """Cached result must equal direct spec_lexicon call."""
    from phonolex_generators.cfg_seed.spec_filters import SPEC_FILTERS

    expected = frozenset(
        store.subset(SPEC_FILTERS["spec1"])
        .get_column("word")
        .str.to_lowercase()
        .to_list()
    )
    cached = domain_cache.get_spec_lexicon("spec1", store)
    assert cached == expected
    assert isinstance(cached, frozenset)


def test_get_spec_lexicon_returns_same_object_on_hit(store):
    """Repeated calls return the SAME frozenset object (id stable). This is
    what allows downstream get_filtered_spec to use id() as a cache key."""
    a = domain_cache.get_spec_lexicon("spec1", store)
    b = domain_cache.get_spec_lexicon("spec1", store)
    assert a is b


def test_get_spec_lexicon_second_call_hits_cache(store):
    domain_cache.get_spec_lexicon("spec1", store)
    domain_cache.get_spec_lexicon("spec1", store)
    stats = domain_cache.get_cache_stats()
    assert stats["spec_lexicon"]["misses"] == 1
    assert stats["spec_lexicon"]["hits"] == 1


def test_get_spec_lexicon_different_specs_separate_entries(store):
    domain_cache.get_spec_lexicon("spec1", store)
    domain_cache.get_spec_lexicon("spec6", store)
    stats = domain_cache.get_cache_stats()
    assert stats["spec_lexicon"]["misses"] == 2
    assert stats["spec_lexicon"]["hits"] == 0

[ ] Step 3.3: Run tests, verify they fail

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 4 new tests fail with AttributeError: module 'domain_cache' has no attribute 'get_spec_lexicon'.

[ ] Step 3.4: Implement get_spec_lexicon

Append to <spike>/domain_cache.py:

def get_spec_lexicon(spec_id: str, store: WordStore) -> frozenset[str]:
    """Cached spec_lexicon. Returns the SAME frozenset on repeated calls,
    so downstream caches keyed on id() of the result remain stable.

    Keyed on (spec_id, id(store.df)). Cache lifetime is process lifetime;
    underlying SPEC_FILTERS and store.df are immutable singletons.
    """
    key = (spec_id, id(store.df))

    def compute() -> frozenset[str]:
        return frozenset(
            store.subset(SPEC_FILTERS[spec_id])
            .get_column("word")
            .str.to_lowercase()
            .to_list()
        )

    return _lru_get_or_compute(
        _SPEC_LEXICON_CACHE, _MAX_SPEC_LEXICON, "spec_lexicon", key, compute,
    )

[ ] Step 3.5: Run tests, verify all pass

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 9 passed.

[ ] Step 3.6: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/domain_cache.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py
git commit -m "$(cat <<'EOF'
PHON-103: implement get_spec_lexicon cache

Keyed on (spec_id, id(store.df)). Returns the SAME frozenset on repeated
calls — important because downstream get_filtered_spec keys on
id(spec_words). Tests verify object identity stability.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 4: Implement `get_filtered_spec`¶

Files: - Modify: <spike>/domain_cache.py - Modify: <spike>/test_domain_cache.py

[ ] Step 4.1: Write failing tests for get_filtered_spec

Append to <spike>/test_domain_cache.py:

def test_get_filtered_spec_correctness_no_constraints(store):
    """Empty constraints → returns spec_words unchanged, empty trace."""
    spec_words = domain_cache.get_spec_lexicon("spec1", store)
    filtered, trace = domain_cache.get_filtered_spec(spec_words, [], store.df)
    assert filtered == spec_words
    assert trace == []


def test_get_filtered_spec_correctness_with_hard_constraint(store):
    spec_words = domain_cache.get_spec_lexicon("spec1", store)
    excl = ExcludeConstraint(phonemes=("ɹ",))
    filtered, trace = domain_cache.get_filtered_spec(spec_words, [excl], store.df)
    # All filtered words must lack /ɹ/
    spec_df = store.df.filter(pl.col("word").is_in(list(filtered)))
    has_r = spec_df.filter(pl.col("phonemes_str").str.contains("|ɹ|", literal=True))
    assert has_r.height == 0, "filter leaked words containing /ɹ/"
    assert filtered < spec_words  # strict subset (some words contain ɹ)
    assert len(trace) == 1
    assert trace[0]["constraint_label"] == "exclude /ɹ/"


def test_get_filtered_spec_constraint_order_invariance(store):
    spec_words = domain_cache.get_spec_lexicon("spec1", store)
    excl = ExcludeConstraint(phonemes=("ɹ",))
    bnd = BoundConstraint(norm="aoa", max_value=6.0)

    a, _ = domain_cache.get_filtered_spec(spec_words, [excl, bnd], store.df)
    b, _ = domain_cache.get_filtered_spec(spec_words, [bnd, excl], store.df)
    assert a == b
    stats = domain_cache.get_cache_stats()
    assert stats["filtered_spec"]["misses"] == 1
    assert stats["filtered_spec"]["hits"] == 1


def test_get_filtered_spec_word_df_none_passthrough(store):
    """word_df=None → returns (spec_words, []), no caching."""
    spec_words = domain_cache.get_spec_lexicon("spec1", store)
    excl = ExcludeConstraint(phonemes=("ɹ",))
    filtered, trace = domain_cache.get_filtered_spec(spec_words, [excl], None)
    assert filtered == spec_words
    assert trace == []
    stats = domain_cache.get_cache_stats()
    assert stats["filtered_spec"] == {"hits": 0, "misses": 0, "evictions": 0}


def test_get_filtered_spec_trace_mutation_isolated(store):
    """Mutating returned trace must not corrupt the cached version."""
    spec_words = domain_cache.get_spec_lexicon("spec1", store)
    excl = ExcludeConstraint(phonemes=("ɹ",))
    a, trace_a = domain_cache.get_filtered_spec(spec_words, [excl], store.df)
    trace_a[0]["constraint_label"] = "MUTATED"

    b, trace_b = domain_cache.get_filtered_spec(spec_words, [excl], store.df)
    assert trace_b[0]["constraint_label"] == "exclude /ɹ/"


def test_get_filtered_spec_same_frozenset_hits(store):
    """Same spec_words frozenset reference → cache hit on second call."""
    spec_words = domain_cache.get_spec_lexicon("spec1", store)
    excl = ExcludeConstraint(phonemes=("ɹ",))
    domain_cache.get_filtered_spec(spec_words, [excl], store.df)
    domain_cache.get_filtered_spec(spec_words, [excl], store.df)
    stats = domain_cache.get_cache_stats()
    assert stats["filtered_spec"]["hits"] == 1

[ ] Step 4.2: Run tests, verify they fail

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 6 new tests fail with AttributeError: module 'domain_cache' has no attribute 'get_filtered_spec'.

[ ] Step 4.3: Implement get_filtered_spec

Append to <spike>/domain_cache.py:

def get_filtered_spec(
    spec_words: frozenset[str],
    constraints: Iterable[Constraint],
    word_df: pl.DataFrame | None,
) -> tuple[frozenset[str], list[dict]]:
    """Cached spec ∩ hard-constraint filter.

    Caller must pass `spec_words` as a frozenset and hold the same reference
    across calls that should hit. `get_spec_lexicon` returns a stable cached
    frozenset; paragraph_csp callers compose `spec1 | spec6` once per request
    and reuse the union across the verb loop.

    Returns (filtered_word_set, domain_trace). Domain trace is a fresh list of
    fresh dicts on every call so caller mutation doesn't corrupt the cache.

    word_df=None → pass-through (no caching).
    """
    if word_df is None:
        return spec_words, []

    hard = _hashable_constraints(constraints, _HARD_TYPES)
    key = (id(spec_words), hard, id(word_df))

    def compute() -> tuple[frozenset[str], tuple[dict, ...]]:
        if not hard:
            return spec_words, ()
        spec_df = word_df.filter(pl.col("word").is_in(list(spec_words)))
        expr = hard_filter_expr(list(hard))
        if expr is None:
            return spec_words, ()
        trace = tuple(domain_trace(list(hard), spec_df))
        filtered = frozenset(spec_df.filter(expr).get_column("word").to_list())
        return filtered, trace

    filtered, trace = _lru_get_or_compute(
        _FILTERED_SPEC_CACHE, _MAX_FILTERED_SPEC, "filtered_spec", key, compute,
    )
    return filtered, [dict(t) for t in trace]

[ ] Step 4.4: Run tests, verify all pass

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 15 passed.

[ ] Step 4.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/domain_cache.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py
git commit -m "$(cat <<'EOF'
PHON-103: implement get_filtered_spec cache

Keyed on (id(spec_words), hard_constraints, id(word_df)). The id()-on-
frozenset key shape lets paragraph_csp's spec1|spec6 unions cache
correctly within a request. word_df=None bypasses caching to preserve
the existing fallback behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 5: Implement `get_per_word_axes`¶

Files: - Modify: <spike>/domain_cache.py - Modify: <spike>/test_domain_cache.py

[ ] Step 5.1: Write failing tests for get_per_word_axes

Append to <spike>/test_domain_cache.py:

def test_get_per_word_axes_correctness_vs_uncached(store):
    from constraint_surface import per_word_axes as uncached_per_word_axes
    incl = IncludeConstraint(phonemes=("k",))
    expected = uncached_per_word_axes([incl], store.df)
    cached = domain_cache.get_per_word_axes([incl], store.df)
    assert cached == expected


def test_get_per_word_axes_second_call_hits_cache(store):
    incl = IncludeConstraint(phonemes=("k",))
    domain_cache.get_per_word_axes([incl], store.df)
    domain_cache.get_per_word_axes([incl], store.df)
    stats = domain_cache.get_cache_stats()
    assert stats["per_word_axes"]["misses"] == 1
    assert stats["per_word_axes"]["hits"] == 1


def test_get_per_word_axes_filters_out_hard_constraints(store):
    """Hard constraints must NOT be part of the per_word_axes key —
    different hard constraints with same soft constraints should hit."""
    incl = IncludeConstraint(phonemes=("k",))
    excl = ExcludeConstraint(phonemes=("ɹ",))
    bnd = BoundConstraint(norm="aoa", max_value=6.0)

    domain_cache.get_per_word_axes([incl, excl], store.df)
    domain_cache.get_per_word_axes([incl, bnd], store.df)
    stats = domain_cache.get_cache_stats()
    assert stats["per_word_axes"]["misses"] == 1
    assert stats["per_word_axes"]["hits"] == 1


def test_get_per_word_axes_word_df_none_returns_empty(store):
    incl = IncludeConstraint(phonemes=("k",))
    result = domain_cache.get_per_word_axes([incl], None)
    assert result == {}
    stats = domain_cache.get_cache_stats()
    assert stats["per_word_axes"] == {"hits": 0, "misses": 0, "evictions": 0}

[ ] Step 5.2: Run tests, verify they fail

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 4 new tests fail with AttributeError: module 'domain_cache' has no attribute 'get_per_word_axes'.

[ ] Step 5.3: Implement get_per_word_axes

Append to <spike>/domain_cache.py:

def get_per_word_axes(
    constraints: Iterable[Constraint],
    word_df: pl.DataFrame | None,
) -> dict[str, dict[str, float]]:
    """Cached per_word_axes lookup tables.

    Keyed on (sorted soft constraints, id(word_df)). Hard constraints are
    filtered out — they don't affect axis values, so changing only hard
    constraints should hit the cache.

    word_df=None → returns empty dict (no axes), no caching.
    """
    if word_df is None:
        return {}
    soft = _hashable_constraints(constraints, _SOFT_TYPES)
    key = (soft, id(word_df))
    return _lru_get_or_compute(
        _PER_WORD_AXES_CACHE, _MAX_PER_WORD_AXES, "per_word_axes", key,
        lambda: _per_word_axes_uncached(list(soft), word_df),
    )

[ ] Step 5.4: Run tests, verify all pass

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 19 passed.

[ ] Step 5.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/domain_cache.py \
        packages/generation/research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py
git commit -m "$(cat <<'EOF'
PHON-103: implement get_per_word_axes cache

Keyed on (soft_constraints, id(word_df)). Hard constraints excluded
from key — toggling them must not bust the per_word_axes cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 6: Add LRU eviction tests¶

Files: - Modify: <spike>/test_domain_cache.py

[ ] Step 6.1: Write LRU eviction tests

Append to <spike>/test_domain_cache.py:

def _make_unique_excludes(n: int) -> list[list[ExcludeConstraint]]:
    """Generate n distinct constraint lists for forcing cache fills."""
    return [[ExcludeConstraint(phonemes=(f"x{i}",))] for i in range(n)]


def test_filtered_spec_lru_evicts_oldest_when_full(store):
    spec_words = domain_cache.get_spec_lexicon("spec1", store)
    # _MAX_FILTERED_SPEC = 64. Insert 65 distinct keys → 1 eviction.
    constraint_lists = _make_unique_excludes(65)
    for cs in constraint_lists:
        domain_cache.get_filtered_spec(spec_words, cs, store.df)
    stats = domain_cache.get_cache_stats()
    assert stats["filtered_spec"]["misses"] == 65
    assert stats["filtered_spec"]["evictions"] == 1
    assert stats["filtered_spec"]["hits"] == 0


def test_filtered_spec_lru_move_to_end_on_hit(store):
    """Re-accessing the oldest entry should promote it; next eviction targets the new oldest."""
    spec_words = domain_cache.get_spec_lexicon("spec1", store)
    constraint_lists = _make_unique_excludes(64)
    for cs in constraint_lists:
        domain_cache.get_filtered_spec(spec_words, cs, store.df)
    # Cache is full. Touch the oldest entry (index 0) — this promotes it.
    domain_cache.get_filtered_spec(spec_words, constraint_lists[0], store.df)
    stats = domain_cache.get_cache_stats()
    assert stats["filtered_spec"]["hits"] == 1
    # Insert a new entry — should evict index-1 (the new oldest), not index-0.
    domain_cache.get_filtered_spec(
        spec_words, [ExcludeConstraint(phonemes=("zNew",))], store.df,
    )
    stats = domain_cache.get_cache_stats()
    assert stats["filtered_spec"]["evictions"] == 1
    # Verify index-0 is still in cache by re-accessing it (should hit).
    domain_cache.get_filtered_spec(spec_words, constraint_lists[0], store.df)
    stats = domain_cache.get_cache_stats()
    assert stats["filtered_spec"]["hits"] == 2


def test_per_word_axes_lru_evicts_at_max(store):
    """_MAX_PER_WORD_AXES = 64. Insert 65 → 1 eviction."""
    constraint_lists = [
        [IncludeConstraint(phonemes=(f"y{i}",))] for i in range(65)
    ]
    for cs in constraint_lists:
        domain_cache.get_per_word_axes(cs, store.df)
    stats = domain_cache.get_cache_stats()
    assert stats["per_word_axes"]["evictions"] == 1


def test_spec_lexicon_cap_above_real_spec_count(store):
    """Touch every real spec_id; _MAX_SPEC_LEXICON = 8 ≥ len(SPEC_FILTERS)
    so no eviction expected."""
    from phonolex_generators.cfg_seed.spec_filters import SPEC_FILTERS
    for spec_id in SPEC_FILTERS:
        domain_cache.get_spec_lexicon(spec_id, store)
    stats = domain_cache.get_cache_stats()
    assert stats["spec_lexicon"]["evictions"] == 0
    assert stats["spec_lexicon"]["misses"] == len(SPEC_FILTERS)

[ ] Step 6.2: Run tests, verify all pass

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 23 passed.

[ ] Step 6.3: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py
git commit -m "$(cat <<'EOF'
PHON-103: LRU eviction + move-to-end tests

Verifies oldest entry evicted at capacity, hit promotes entry past
next eviction, and spec_lexicon cap is comfortably above the
SPEC_FILTERS count.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 7: Wire `spec_lexicon` callsites in `paradigm_3_csp.py`¶

Files: - Modify: <spike>/paradigm_3_csp.py

The function spec_lexicon(store, spec_id) is defined at line 77 and called at lines 264, 320, 385–386, 451–452, 560, 625 (8 callsites total). We replace the local definition with a wrapper that returns the cached frozenset directly.

[ ] Step 7.1: Add import for the cache wrappers

Find the existing block (around line 35–47):

from constraint_surface import (  # noqa: E402
    Constraint,
    BoundConstraint,
    ExcludeConstraint,
    IncludeConstraint,
    BoundBoostConstraint,
    ContrastiveConstraint,
    cross_slot_axes,
    domain_trace,
    hard_filter_expr,
    per_word_axes,
)
from reranker import rerank  # noqa: E402

After this block, add:

from domain_cache import (  # noqa: E402
    get_filtered_spec,
    get_per_word_axes,
    get_spec_lexicon,
)

[ ] Step 7.2: Replace the local spec_lexicon function with a thin wrapper

Find:

def spec_lexicon(store: WordStore, spec_id: str) -> set[str]:
    return set(
        store.subset(SPEC_FILTERS[spec_id])
        .get_column("word")
        .str.to_lowercase()
        .to_list()
    )

Replace with:

def spec_lexicon(store: WordStore, spec_id: str) -> frozenset[str]:
    """Backwards-compat wrapper around domain_cache.get_spec_lexicon.

    Returns a `frozenset[str]`. Downstream callers do `set & spec_words`
    intersections which work identically on frozenset. The frozenset return
    type is REQUIRED — callers must pass the same frozenset object to
    get_filtered_spec to hit the cache (key is id(spec_words)).
    """
    return get_spec_lexicon(spec_id, store)

[ ] Step 7.3: Check whether SPEC_FILTERS is referenced elsewhere; remove import if unused

grep -n "SPEC_FILTERS" packages/generation/research/2026-05-07-sentence-generation-paradigms/paradigm_3_csp.py

If SPEC_FILTERS is no longer referenced anywhere in paradigm_3_csp.py, remove the line from phonolex_generators.cfg_seed.spec_filters import SPEC_FILTERS. Otherwise leave it.

[ ] Step 7.4: Run cache tests to verify no regression

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 23 passed.

[ ] Step 7.5: Smoke-test paradigm_3_csp imports + a baseline solve

cd packages/generation/research/2026-05-07-sentence-generation-paradigms && \
  uv run python -c "
import paradigm_3_csp
from phonolex_data.runtime.store import WordStore
from pathlib import Path
import polars as pl

repo = Path('../../../..').resolve()
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
spec_words = paradigm_3_csp.spec_lexicon(store, 'spec1')
top, stats = paradigm_3_csp.solve('cut', 'spec1', spec_words, sel_df, word_df=store.df)
print(f'top-1: {top[0][\"sentence\"]}  domain_size={stats[\"nsubj_domain_size\"]}')
print(f'spec_words type: {type(spec_words).__name__}')
"

Expected: a sentence printed and spec_words type: frozenset.

[ ] Step 7.6: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/paradigm_3_csp.py
git commit -m "$(cat <<'EOF'
PHON-103: route spec_lexicon through domain_cache

Local definition becomes a wrapper around get_spec_lexicon, returning
the cached frozenset directly. The frozenset return type is required
for downstream get_filtered_spec keying on id(spec_words). Set ops
downstream (set & frozenset) are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 8: Wire `_resolve_domain_words` and `per_word_axes` callsites in `paradigm_3_csp.py`¶

Files: - Modify: <spike>/paradigm_3_csp.py

[ ] Step 8.1: Replace the body of _resolve_domain_words

Find (around line 93):

def _resolve_domain_words(
    spec_words: set[str],
    constraints: list[Constraint],
    word_df: pl.DataFrame | None,
) -> tuple[set[str], list[dict]]:
    """Apply hard constraints to the spec lexicon. Returns (filtered, trace)."""
    hard = [c for c in constraints if isinstance(c, (ExcludeConstraint, BoundConstraint))]
    if not hard or word_df is None:
        return spec_words, []
    spec_df = word_df.filter(pl.col("word").is_in(list(spec_words)))
    expr = hard_filter_expr(hard)
    if expr is None:
        return spec_words, []
    trace = domain_trace(hard, spec_df)
    filtered = set(spec_df.filter(expr).get_column("word").to_list())
    return filtered, trace

Replace with:

def _resolve_domain_words(
    spec_words: frozenset[str],
    constraints: list[Constraint],
    word_df: pl.DataFrame | None,
) -> tuple[frozenset[str], list[dict]]:
    """Apply hard constraints to the spec lexicon via the domain cache.

    Returns (filtered_words, trace). spec_words must be a frozenset (the
    caller pattern: spec_words = spec_lexicon(store, spec_id) which now
    returns frozenset). Callers that need a mutable set can wrap with set().
    """
    return get_filtered_spec(spec_words, constraints, word_df)

[ ] Step 8.2: Replace the per_word_axes callsite in solve()

Find (around line 170):

    word_axes = per_word_axes(constraints, word_df) if word_df is not None else {}

Replace with:

    word_axes = get_per_word_axes(constraints, word_df)

(The if word_df is not None guard moves into get_per_word_axes itself.)

[ ] Step 8.3: Run cache tests to verify no regression

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 23 passed.

[ ] Step 8.4: Smoke-test paradigm_3_csp end-to-end

cd packages/generation/research/2026-05-07-sentence-generation-paradigms && \
  uv run python -c "
import paradigm_3_csp, domain_cache
from phonolex_data.runtime.store import WordStore
from pathlib import Path
import polars as pl

repo = Path('../../../..').resolve()
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
spec_words = paradigm_3_csp.spec_lexicon(store, 'spec1')
domain_cache.clear_caches()
for _ in range(3):
    paradigm_3_csp.solve('cut', 'spec1', spec_words, sel_df, word_df=store.df)
print(domain_cache.get_cache_stats())
"

Expected: filtered_spec shows 1 miss + 2 hits; per_word_axes shows 1 miss + 2 hits.

[ ] Step 8.5: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/paradigm_3_csp.py
git commit -m "$(cat <<'EOF'
PHON-103: route _resolve_domain_words and solve()'s per_word_axes through cache

_resolve_domain_words becomes a thin pass-through to get_filtered_spec.
solve()'s per_word_axes call routes through get_per_word_axes, which
absorbs the word_df=None guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 9: Wire `_filtered_domain` and `per_word_axes` callsites in `paragraph_csp.py`¶

Files: - Modify: <spike>/paragraph_csp.py

paragraph_csp.py has its own _filtered_domain function (line 94) that duplicates _resolve_domain_words logic, plus a per_word_axes call in solve_paragraph (line 231). Both route through the cache.

[ ] Step 9.1: Add the cache import

Find the import block (around lines 50–58):

from constraint_surface import (
    ...
    per_word_axes,
)
from skeleton_csp import (
    ...
)

After the skeleton_csp import block, add:

from domain_cache import get_filtered_spec, get_per_word_axes

Also remove per_word_axes from the from constraint_surface import (...) block — it's no longer used directly.

[ ] Step 9.2: Replace the body of _filtered_domain

Find (around line 94):

def _filtered_domain(
    spec_words: set[str],
    constraints: tuple[Constraint, ...],
    word_df: pl.DataFrame,
) -> set[str]:
    """Apply hard constraints to spec_words → narrowed domain."""
    hard = [c for c in constraints if isinstance(c, (ExcludeConstraint, BoundConstraint))]
    if not hard:
        return spec_words
    spec_df = word_df.filter(pl.col("word").is_in(list(spec_words)))
    expr = hard_filter_expr(hard)
    if expr is None:
        return spec_words
    return set(spec_df.filter(expr).get_column("word").to_list())

Replace with:

def _filtered_domain(
    spec_words: frozenset[str],
    constraints: tuple[Constraint, ...],
    word_df: pl.DataFrame,
) -> frozenset[str]:
    """Apply hard constraints to spec_words via the domain cache.

    Routes through get_filtered_spec; keys on id(spec_words) so the same
    frozenset reference reused across the verb loop hits the cache.
    """
    filtered, _ = get_filtered_spec(spec_words, list(constraints), word_df)
    return filtered

[ ] Step 9.3: Replace the per_word_axes callsite in solve_paragraph()

Find (around line 231):

    word_axes = per_word_axes(list(spec.constraints), store_df)

Replace with:

    word_axes = get_per_word_axes(list(spec.constraints), store_df)

[ ] Step 9.4: Verify hard_filter_expr and domain_trace are no longer referenced; remove if unused

grep -n "hard_filter_expr\|^from constraint_surface" packages/generation/research/2026-05-07-sentence-generation-paradigms/paragraph_csp.py

If hard_filter_expr is no longer referenced (the only caller was inside the now-replaced _filtered_domain body), remove it from the from constraint_surface import (...) block.

[ ] Step 9.5: Smoke-test paragraph_csp end-to-end

cd packages/generation/research/2026-05-07-sentence-generation-paradigms && \
  uv run python -c "
import paragraph_csp, paradigm_3_csp, domain_cache
from paragraph_csp import ParagraphSpec, solve_paragraph
from phonolex_data.runtime.store import WordStore
from pathlib import Path
import polars as pl

repo = Path('../../../..').resolve()
store = WordStore.from_parquet(repo / 'data' / 'runtime' / 'words.parquet')
sel_df = pl.read_parquet(repo / 'data' / 'runtime' / 'selectional.parquet')
spec_words = paradigm_3_csp.spec_lexicon(store, 'spec1') | paradigm_3_csp.spec_lexicon(store, 'spec6')
domain_cache.clear_caches()
spec = ParagraphSpec(verbs=('chase','sit','eat'), band='fineweb_adult', constraints=(),
                    n_paragraphs=1, per_sentence_top_k=2)
solve_paragraph(spec, store_df=store.df, sel_df=sel_df, spec_words=spec_words)
print(domain_cache.get_cache_stats())
"

Expected: filtered_spec shows ≥1 miss + ≥2 hits (1 miss in _filtered_domain + verb-loop hits); per_word_axes shows 1 miss; spec_lexicon shows 2 misses (from the | setup).

[ ] Step 9.6: Run cache tests

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 23 passed.

[ ] Step 9.7: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/paragraph_csp.py
git commit -m "$(cat <<'EOF'
PHON-103: route paragraph_csp _filtered_domain and per_word_axes through cache

_filtered_domain becomes a pass-through to get_filtered_spec.
solve_paragraph's per_word_axes call routes through the cache. With
spec_words held as a stable frozenset for the request duration,
multi-verb paragraphs share one filtered_spec entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 10: Behavior test — paragraph composition reuses cached domain¶

Files: - Modify: <spike>/test_domain_cache.py

[ ] Step 10.1: Write the multi-verb paragraph hit/miss test

Append to <spike>/test_domain_cache.py:

def test_solve_paragraph_reuses_cached_domain(store):
    """3-verb paragraph with shared constraints → solve_paragraph calls
    get_filtered_spec once (in _filtered_domain), and again per verb in
    each candidate-subject solve_shape path. With one stable spec_words
    frozenset, all per-verb calls hit. Per_word_axes called once → 1 miss."""
    from paragraph_csp import ParagraphSpec, solve_paragraph

    repo_root = Path(__file__).resolve().parents[4]
    sel_df = pl.read_parquet(repo_root / "data" / "runtime" / "selectional.parquet")

    spec_words = (
        domain_cache.get_spec_lexicon("spec1", store)
        | domain_cache.get_spec_lexicon("spec6", store)
    )
    # Reset to discount the spec_lexicon prep above
    domain_cache.clear_caches()

    spec = ParagraphSpec(
        verbs=("chase", "sit", "eat"),
        band="fineweb_adult",
        constraints=(IncludeConstraint(phonemes=("k",)),),
        n_paragraphs=1,
        per_sentence_top_k=2,
    )
    solve_paragraph(spec, store_df=store.df, sel_df=sel_df, spec_words=spec_words)

    stats = domain_cache.get_cache_stats()
    # filtered_spec: paragraph_csp._filtered_domain calls it once with
    # constraints=(IncludeConstraint,) → since IncludeConstraint isn't a
    # _HARD_TYPE it produces an empty `hard` tuple → cache MISS first call.
    # Subsequent _filtered_domain calls don't happen in solve_paragraph
    # (only one call). But verb-loop solve_shape calls do NOT route through
    # filtered_spec (solve_shape doesn't use it). Therefore: 1 miss, 0 hits.
    assert stats["filtered_spec"]["misses"] == 1
    # per_word_axes: solve_paragraph calls it once outside the verb loop.
    # solve_shape does NOT call per_word_axes — that's solve()'s job, and
    # paragraph_csp uses solve_shape directly. So: 1 miss, 0 hits.
    assert stats["per_word_axes"]["misses"] == 1


def test_solve_loop_reuses_cached_domain(store):
    """3-verb loop calling paradigm_3_csp.solve directly → 1 miss + 2 hits
    on filtered_spec and per_word_axes (solve() goes through both caches
    per call)."""
    import paradigm_3_csp

    repo_root = Path(__file__).resolve().parents[4]
    sel_df = pl.read_parquet(repo_root / "data" / "runtime" / "selectional.parquet")

    spec_words = paradigm_3_csp.spec_lexicon(store, "spec1")
    domain_cache.clear_caches()

    constraints = [IncludeConstraint(phonemes=("k",)), ExcludeConstraint(phonemes=("ɹ",))]
    for verb in ("cut", "chase", "eat"):
        paradigm_3_csp.solve(
            verb, "spec1", spec_words, sel_df,
            constraints=constraints, word_df=store.df,
        )

    stats = domain_cache.get_cache_stats()
    assert stats["filtered_spec"]["misses"] == 1
    assert stats["filtered_spec"]["hits"] == 2
    assert stats["per_word_axes"]["misses"] == 1
    assert stats["per_word_axes"]["hits"] == 2

The two tests cover the two distinct call-shapes: solve_paragraph (single call to each cache, no verb-loop hits because solve_shape doesn't use the caches) and paradigm_3_csp.solve looped per verb (verb-loop hits).

[ ] Step 10.2: Run the tests, verify they pass

cd packages/generation && uv run python -m pytest research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py -v

Expected: 25 passed. If either of the two new tests fails, the actual hit/miss counts in the failure message tell you the real call pattern; reconcile by adjusting expectations to match (the design only requires that some reuse occurs, not exact numbers).

[ ] Step 10.3: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/test_domain_cache.py
git commit -m "$(cat <<'EOF'
PHON-103: behavior tests — verb-loop reuses cached domain

Two tests cover the two call-shapes: solve_paragraph (one call to each
cache, no verb-loop hits because solve_shape bypasses the caches) and
paradigm_3_csp.solve looped per verb (verb-loop hits 1 miss + 2 hits).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Task 11: Bench script + record baseline¶

Files: - Create: <spike>/bench_domain_cache.py

[ ] Step 11.1: Create the bench script

Create <spike>/bench_domain_cache.py:

"""Bench domain_cache speedup across realistic scenarios — PHON-103.

Run: uv run python research/2026-05-07-sentence-generation-paradigms/bench_domain_cache.py

Reports wall-clock for 4 conditions and the cache stats per condition.
"""
from __future__ import annotations

import sys
import time
from pathlib import Path

import polars as pl

sys.path.insert(0, str(Path(__file__).parent))

import domain_cache
import paradigm_3_csp
from constraint_surface import (
    BoundConstraint,
    ExcludeConstraint,
    IncludeConstraint,
)
from paragraph_csp import ParagraphSpec, solve_paragraph
from phonolex_data.runtime.store import WordStore


def _load_data() -> tuple[WordStore, pl.DataFrame]:
    repo_root = Path(__file__).resolve().parents[4]
    store = WordStore.from_parquet(repo_root / "data" / "runtime" / "words.parquet")
    sel_df = pl.read_parquet(repo_root / "data" / "runtime" / "selectional.parquet")
    return store, sel_df


def _run_paragraph(store: WordStore, sel_df: pl.DataFrame, constraints: tuple) -> float:
    spec_words = (
        paradigm_3_csp.spec_lexicon(store, "spec1")
        | paradigm_3_csp.spec_lexicon(store, "spec6")
    )
    spec = ParagraphSpec(
        verbs=("chase", "sit", "eat"),
        band="fineweb_adult",
        constraints=constraints,
        n_paragraphs=2,
        per_sentence_top_k=4,
    )
    t0 = time.perf_counter()
    solve_paragraph(spec, store_df=store.df, sel_df=sel_df, spec_words=spec_words)
    return time.perf_counter() - t0


def main() -> None:
    print("Loading WordStore + selectional.parquet…")
    store, sel_df = _load_data()

    constraints_a = (IncludeConstraint(phonemes=("k",)),)
    constraints_b = (
        ExcludeConstraint(phonemes=("ɹ",)),
        BoundConstraint(norm="aoa", max_value=6.0),
    )

    print("\n=== Condition 1: cache COLD, 1 paragraph ===")
    domain_cache.clear_caches()
    t = _run_paragraph(store, sel_df, constraints_a)
    print(f"wall: {t:.3f}s")
    print(f"stats: {domain_cache.get_cache_stats()}")

    print("\n=== Condition 2: cache WARM, repeat 1 paragraph ===")
    t = _run_paragraph(store, sel_df, constraints_a)
    print(f"wall: {t:.3f}s")
    print(f"stats: {domain_cache.get_cache_stats()}")

    print("\n=== Condition 3: 5 paragraphs, same constraints ===")
    domain_cache.clear_caches()
    t0 = time.perf_counter()
    for _ in range(5):
        _run_paragraph(store, sel_df, constraints_a)
    total = time.perf_counter() - t0
    print(f"wall: {total:.3f}s ({total/5:.3f}s avg)")
    print(f"stats: {domain_cache.get_cache_stats()}")

    print("\n=== Condition 4: 5 paragraphs, alternating constraint sets ===")
    domain_cache.clear_caches()
    t0 = time.perf_counter()
    for i in range(5):
        cs = constraints_a if i % 2 == 0 else constraints_b
        _run_paragraph(store, sel_df, cs)
    total = time.perf_counter() - t0
    print(f"wall: {total:.3f}s ({total/5:.3f}s avg)")
    print(f"stats: {domain_cache.get_cache_stats()}")


if __name__ == "__main__":
    main()

[ ] Step 11.2: Run the bench

cd packages/generation && uv run python research/2026-05-07-sentence-generation-paradigms/bench_domain_cache.py

Expected: Four condition reports printed. Capture the numbers — they go in the spec.

[ ] Step 11.3: Append baseline numbers to the spec

Open docs/superpowers/specs/2026-05-08-phon-103-csp-domain-caching-design.md and append at the end:

## Empirical baseline (recorded 2026-05-08)

From `bench_domain_cache.py`:

| Condition | Wall-clock | spec_lexicon h/m/e | filtered_spec h/m/e | per_word_axes h/m/e |
|---|---|---|---|---|
| Cold, 1 paragraph | <FILL>s | <FILL> | <FILL> | <FILL> |
| Warm, 1 paragraph (repeat) | <FILL>s | <FILL> | <FILL> | <FILL> |
| 5 paragraphs, same constraints | <FILL>s total (<FILL>s avg) | <FILL> | <FILL> | <FILL> |
| 5 paragraphs, alternating | <FILL>s total (<FILL>s avg) | <FILL> | <FILL> | <FILL> |

Speedup: warm vs cold = <FILL>×; 5 same vs 5 alternating = <FILL>×.

Replace each <FILL> with the actual number from the bench output.

[ ] Step 11.4: Commit

git add packages/generation/research/2026-05-07-sentence-generation-paradigms/bench_domain_cache.py \
        docs/superpowers/specs/2026-05-08-phon-103-csp-domain-caching-design.md
git commit -m "$(cat <<'EOF'
PHON-103: bench script + recorded baseline

bench_domain_cache.py reports wall-clock + cache stats for 4
conditions. Baseline numbers folded into the design spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
EOF
)"

Done¶

After Task 11 commits, the cache implementation is feature-complete with tests + a bench. PHON-103 closes; PHON-104 (per-slot top-N pruning) is unblocked.

PHON-103 — CSP Domain Caching Implementation Plan¶

File map¶

Task 1: Bootstrap domain_cache.py with empty cache infrastructure¶

Task 2: Implement _hashable_constraints helper¶

Task 3: Implement get_spec_lexicon with WordStore fixture¶

Task 4: Implement get_filtered_spec¶

Task 5: Implement get_per_word_axes¶

Task 6: Add LRU eviction tests¶

Task 7: Wire spec_lexicon callsites in paradigm_3_csp.py¶

Task 8: Wire _resolve_domain_words and per_word_axes callsites in paradigm_3_csp.py¶

Task 9: Wire _filtered_domain and per_word_axes callsites in paragraph_csp.py¶

Task 10: Behavior test — paragraph composition reuses cached domain¶

Task 11: Bench script + record baseline¶

Done¶

Task 1: Bootstrap `domain_cache.py` with empty cache infrastructure¶

Task 2: Implement `_hashable_constraints` helper¶

Task 3: Implement `get_spec_lexicon` with WordStore fixture¶

Task 4: Implement `get_filtered_spec`¶

Task 5: Implement `get_per_word_axes`¶

Task 7: Wire `spec_lexicon` callsites in `paradigm_3_csp.py`¶

Task 8: Wire `_resolve_domain_words` and `per_word_axes` callsites in `paradigm_3_csp.py`¶

Task 9: Wire `_filtered_domain` and `per_word_axes` callsites in `paragraph_csp.py`¶