PHON-103 — CSP perf: domain caching by constraint-set hash¶
Date: 2026-05-08
Branch: feature/csp-iteration
Scope: spike-internal perf optimization (productionization to packages/generators/csp/ is PHON-109's scope)
Frame¶
The CSP spike's per-solve hot path repeats two costly Polars operations on every call: applying hard constraints to the spec lexicon (_resolve_domain_words) and building per-word soft-axis lookup tables (per_word_axes). Both are verb-independent — for a multi-verb paragraph composing 3 sentences with the same constraint set, the work is done 3× redundantly.
This ticket adds three module-level LRU caches keyed on the constraint set so this verb-independent work is computed once per (spec_id × constraints) tuple and reused across calls within a process lifetime.
Goal¶
For a 3-verb paragraph with shared constraints, expect 1 cache miss + 2 hits on each of spec_lexicon, filtered_spec, and per_word_axes. Wall-clock improvement reported via bench script; specific target deferred to empirical measurement (the existing wall-clock baseline is in notebook.md: 2.7s for 5 single-verb probes; paragraph composition demos are slower per paragraph_csp.py).
Cache must not change result semantics — top-K outputs and domain_trace lists must be identical to the uncached path.
Non-goals¶
- PMI lookup caching for nominal slots (separate optimization; current
_ADVMOD_PMI_CACHEalready handles adverb PMI). - Thread-safety locks (CPython GIL serializes; revisit if async scheduling enters).
- Cross-process / KV / Durable Object caching (deployment is single-process Cloudflare Container).
- Productionization move to
packages/generators/csp/(PHON-109 scope).
Architecture¶
A new module research/2026-05-07-sentence-generation-paradigms/domain_cache.py next to the existing spike files. Three module-level OrderedDict-backed LRU caches plus public wrapper functions.
research/2026-05-07-sentence-generation-paradigms/
├── constraint_surface.py (unchanged)
├── skeleton_csp.py (unchanged)
├── paradigm_3_csp.py (callsite swaps)
├── paragraph_csp.py (callsite swaps)
└── domain_cache.py (NEW)
Public functions exposed by domain_cache.py:
| Function | Signature | Wraps |
|---|---|---|
get_spec_lexicon |
(spec_id: str, store: WordStore) -> frozenset[str] |
spec_lexicon() |
get_filtered_spec |
(spec_words: frozenset[str], constraints, word_df) -> tuple[frozenset[str], list[dict]] |
_resolve_domain_words() and _filtered_domain() |
get_per_word_axes |
(soft_constraints, word_df) -> dict[str, dict[str, float]] |
per_word_axes() |
clear_caches |
() -> None |
reset all caches and stats |
get_cache_stats |
() -> dict |
snapshot of hits/misses/evictions per cache |
Cache keys & values¶
# spec_id → frozenset of admitted words (verb- and constraint-independent)
_SPEC_LEXICON_CACHE: OrderedDict[
tuple[str, int], # (spec_id, id(word_df))
frozenset[str],
]
# (spec_words, hard_constraints) → spec ∩ filter result
_FILTERED_SPEC_CACHE: OrderedDict[
tuple[frozenset[str], tuple[Constraint, ...], int], # (spec_words_frozenset, hard_constraints, id(word_df))
tuple[frozenset[str], tuple[dict, ...]], # (filtered_words, domain_trace)
]
# soft_constraints → per-word axis lookup tables
_PER_WORD_AXES_CACHE: OrderedDict[
tuple[tuple[Constraint, ...], int], # (soft_constraints, id(word_df))
dict[str, dict[str, float]], # axis_name → {word: score}
]
Constraint normalization ensures order-independent keys:
def _hashable_constraints(
constraints: Iterable[Constraint],
types: tuple[type, ...],
) -> tuple[Constraint, ...]:
"""Filter to relevant types and sort for stable hashing."""
relevant = [c for c in constraints if isinstance(c, types)]
return tuple(sorted(relevant, key=lambda c: (c.type, repr(c))))
filtered_spec keys filter to (ExcludeConstraint, BoundConstraint). per_word_axes keys filter to (IncludeConstraint, BoundBoostConstraint). Constraint dataclasses are already frozen=True → tuple-of-constraints is hashable as-is.
Verb is NOT a key. This is the property that makes paragraph composition cheap — multiple verbs with the same constraint set hit the same cache entry.
Why spec_words (content-keyed) instead of spec_id for filtered_spec: paragraph_csp callers compose multi-spec domains (e.g., spec1 | spec6) — there's no single spec_id to key on. Keying on the frozenset content directly (rather than id(spec_words)) is correct because:
- frozenset is hashable; Python caches the hash on the object after first computation, so repeated lookups are O(1).
- Any caller that constructs the same content (e.g., same spec1 | spec6 union) gets a cache hit regardless of whether the object is the same reference or a fresh set.union() result.
- This fixes a correctness hazard from the original id(spec_words) key: Python can recycle a GC'd id for an unrelated frozenset, causing the cache to return wrong data.
Cross-request paragraph hits now also work: the bench's condition 3 (5 same-constraint calls) shows 4 filtered_spec hits because the spec1 | spec6 content is identical across calls even though each call creates a new frozenset object.
Eviction & sizing¶
LRU via OrderedDict.move_to_end(key) on hit, popitem(last=False) on miss-and-full.
_MAX_SPEC_LEXICON = 8 # only ~6 spec_ids exist per PHON-95
_MAX_FILTERED_SPEC = 64
_MAX_PER_WORD_AXES = 64
Memory rationale:
- spec_lexicon value: frozenset[str] of ~5K–30K words per spec → ~250KB–1.5MB. 8 entries × ~1MB ≈ 8MB.
- filtered_spec value: similar size, post-hard-filter sets are usually 1K–30K words → 50KB–1.5MB. 64 entries × ~1MB ≈ 60MB worst realistic.
- per_word_axes value: per-axis lookup tables; a single include axis touches words containing target phoneme (~10K words → ~500KB). Multi-axis cases scale linearly. 64 entries × ~1MB ≈ 60MB worst realistic.
64 entries handles ~10–20 rounds of constraint iteration with branch-and-undo patterns (toggle a constraint off and back on → hits cache).
Why no TTL: Underlying data (word_df, per-spec_id word sets) is immutable singletons loaded once at process startup. Cache lifetime is process lifetime. Cloudflare Containers cold-start clears the cache, but rebuild cost is bounded by the first solve in a session and amortizes thereafter.
Observability¶
_CACHE_STATS: dict[str, dict[str, int]] = {
"spec_lexicon": {"hits": 0, "misses": 0, "evictions": 0},
"filtered_spec": {"hits": 0, "misses": 0, "evictions": 0},
"per_word_axes": {"hits": 0, "misses": 0, "evictions": 0},
}
def get_cache_stats() -> dict:
"""Snapshot of hits/misses/evictions per cache."""
return {k: dict(v) for k, v in _CACHE_STATS.items()}
def clear_caches() -> None:
"""Drop all entries and reset stats. Used by tests."""
Callsite changes¶
| File | Function/callsite | Change |
|---|---|---|
paradigm_3_csp.py |
spec_lexicon() definition |
Becomes thin wrapper returning frozenset from get_spec_lexicon(). Callers downstream that did set & spec_words continue to work (set/frozenset interop). |
paradigm_3_csp.py |
_resolve_domain_words() body |
Replaced with call to get_filtered_spec(). Signature unchanged. |
paradigm_3_csp.py |
solve() per_word_axes call |
Routes through get_per_word_axes(). |
paragraph_csp.py |
_filtered_domain() body |
Replaced with call to get_filtered_spec(). Signature unchanged. |
paragraph_csp.py |
solve_paragraph() per_word_axes call |
Routes through get_per_word_axes(). |
Public solve() / solve_paragraph() signatures unchanged.
Edge cases¶
| Case | Behavior |
|---|---|
word_df is None |
Pass-through to existing fallback (returns spec_words, []). No cache lookup, no insertion. |
| Empty constraints list | Cache key is (); cached normally. Subsequent calls hit. |
Caller passes list[Constraint] instead of tuple |
Wrapper normalizes via _hashable_constraints(). Caller signature unchanged. |
| Hypothetical non-frozen Constraint subclass added later | Runtime guard assert hash(c) is not None in normalizer surfaces the bug at first call. |
spec_words mutated by caller after caching |
Cache stores a frozenset (immutable). Wrapper internally converts to frozenset before insert. Caller's set unchanged. |
Caller mutates a dict in returned domain_trace |
Wrapper returns a new list of new dicts on each retrieval. Cached version stays clean. |
id(word_df) collision (df A GC'd, B allocated at same address) |
Theoretical only. Tests use clear_caches() in setup. Production has one immutable df → not a real risk. |
| Concurrent reads/writes (unlikely under GIL) | Last write wins, identical results, no corruption. Lock added if/when async scheduling enters. |
Testing¶
New test_domain_cache.py next to other spike tests. Three buckets:
Correctness:
- test_filtered_spec_returns_same_set_as_uncached — hit/miss must produce identical (frozenset, trace) tuples.
- test_constraint_order_invariance — reordered constraint list yields the same cache entry (stats: 1 miss + 1 hit).
- test_spec_lexicon_returns_same_set — cached vs uncached spec_lexicon identical.
- test_per_word_axes_returns_same_dicts — cached vs uncached per_word_axes identical.
- test_solve_results_unchanged_with_cache — full solve() top-K is identical with cache cold vs warm.
Behavior:
- test_paragraph_composition_reuses_filtered_spec — 3-verb paragraph with shared constraints → 1 miss, 2 hits on filtered_spec; same on per_word_axes.
- test_lru_eviction_at_max — 65th distinct entry evicts the LRU one (stats: 1 eviction).
- test_lru_move_to_end_on_hit — re-accessing oldest entry promotes it; subsequent eviction targets a different entry.
- test_clear_caches_resets_stats_and_entries — after clear, all stats are 0 and lookups miss.
Perf:
- bench_domain_cache.py (script, not pytest) — wall-clock for 4 conditions: cold cache, warm cache, 5 paragraphs same constraints, 5 paragraphs varying constraints. Reports stats. Baseline number committed in this spec after first run.
Implementation sketch¶
# domain_cache.py
from collections import OrderedDict
from typing import Iterable
import polars as pl
from phonolex_data.runtime.store import WordStore
from phonolex_generators.cfg_seed.spec_filters import SPEC_FILTERS
from constraint_surface import (
Constraint, ExcludeConstraint, BoundConstraint,
IncludeConstraint, BoundBoostConstraint,
hard_filter_expr, per_word_axes as _per_word_axes_uncached, domain_trace,
)
_HARD_TYPES = (ExcludeConstraint, BoundConstraint)
_SOFT_TYPES = (IncludeConstraint, BoundBoostConstraint)
_MAX_SPEC_LEXICON = 8
_MAX_FILTERED_SPEC = 64
_MAX_PER_WORD_AXES = 64
_SPEC_LEXICON_CACHE: OrderedDict = OrderedDict()
_FILTERED_SPEC_CACHE: OrderedDict = OrderedDict()
_PER_WORD_AXES_CACHE: OrderedDict = OrderedDict()
_CACHE_STATS = {
"spec_lexicon": {"hits": 0, "misses": 0, "evictions": 0},
"filtered_spec": {"hits": 0, "misses": 0, "evictions": 0},
"per_word_axes": {"hits": 0, "misses": 0, "evictions": 0},
}
def _hashable_constraints(
constraints: Iterable[Constraint], types: tuple[type, ...],
) -> tuple[Constraint, ...]:
relevant = [c for c in constraints if isinstance(c, types)]
return tuple(sorted(relevant, key=lambda c: (c.type, repr(c))))
def _lru_get_or_compute(cache, max_size, stat_key, key, compute):
if key in cache:
cache.move_to_end(key)
_CACHE_STATS[stat_key]["hits"] += 1
return cache[key]
_CACHE_STATS[stat_key]["misses"] += 1
value = compute()
cache[key] = value
if len(cache) > max_size:
cache.popitem(last=False)
_CACHE_STATS[stat_key]["evictions"] += 1
return value
def get_spec_lexicon(spec_id: str, store: WordStore) -> frozenset[str]:
"""Cached spec_lexicon. Replaces the local definition in paradigm_3_csp.py."""
key = (spec_id, id(store.df))
def compute() -> frozenset[str]:
return frozenset(
store.subset(SPEC_FILTERS[spec_id])
.get_column("word")
.str.to_lowercase()
.to_list()
)
return _lru_get_or_compute(
_SPEC_LEXICON_CACHE, _MAX_SPEC_LEXICON, "spec_lexicon", key, compute,
)
def get_filtered_spec(
spec_words: frozenset[str],
constraints: Iterable[Constraint],
word_df: pl.DataFrame | None,
) -> tuple[frozenset[str], list[dict]]:
"""Apply hard constraints to spec_words.
Caller must pass `spec_words` as a frozenset and hold the same reference
across calls that should hit. `get_spec_lexicon` returns a stable cached
frozenset; paragraph_csp callers compose `spec1 | spec6` once per request
and reuse the union across the verb loop.
"""
if word_df is None:
return spec_words, []
hard = _hashable_constraints(constraints, _HARD_TYPES)
key = (spec_words, hard, id(word_df))
def compute() -> tuple[frozenset[str], tuple[dict, ...]]:
if not hard:
return spec_words, ()
spec_df = word_df.filter(pl.col("word").is_in(list(spec_words)))
expr = hard_filter_expr(list(hard))
if expr is None:
return spec_words, ()
trace = tuple(domain_trace(list(hard), spec_df))
filtered = frozenset(spec_df.filter(expr).get_column("word").to_list())
return filtered, trace
filtered, trace = _lru_get_or_compute(
_FILTERED_SPEC_CACHE, _MAX_FILTERED_SPEC, "filtered_spec", key, compute,
)
return filtered, [dict(t) for t in trace]
def get_per_word_axes(
constraints: Iterable[Constraint],
word_df: pl.DataFrame | None,
) -> dict[str, dict[str, float]]:
if word_df is None:
return {}
soft = _hashable_constraints(constraints, _SOFT_TYPES)
key = (soft, id(word_df))
return _lru_get_or_compute(
_PER_WORD_AXES_CACHE, _MAX_PER_WORD_AXES, "per_word_axes", key,
lambda: _per_word_axes_uncached(list(soft), word_df),
)
def clear_caches() -> None:
_SPEC_LEXICON_CACHE.clear()
_FILTERED_SPEC_CACHE.clear()
_PER_WORD_AXES_CACHE.clear()
for stats in _CACHE_STATS.values():
for k in stats:
stats[k] = 0
def get_cache_stats() -> dict:
return {k: dict(v) for k, v in _CACHE_STATS.items()}
Open questions¶
None.
References¶
- Spike code:
packages/generation/research/2026-05-07-sentence-generation-paradigms/ - Existing cache patterns the design mirrors:
_ADVMOD_PMI_CACHEand_ADVMOD_BAND_FALLBACKinskeleton_csp.py;_PHONEME_CACHEinconstraint_surface.py. - Productionization: PHON-109 (blocked on PHON-103–108).
Empirical baseline (recorded 2026-05-08, updated post-Issue-1-fix)¶
From bench_domain_cache.py:
| Condition | Wall-clock | spec_lexicon h/m/e | filtered_spec h/m/e | per_word_axes h/m/e |
|---|---|---|---|---|
| Cold, 1 paragraph | 0.867s | 0/2/0 | 0/1/0 | 0/1/0 |
| Warm, 1 paragraph (repeat) | 0.515s | 2/2/0 | 1/1/0 | 1/1/0 |
| 5 paragraphs, same constraints | 2.430s total (0.486s avg) | 8/2/0 | 4/1/0 | 4/1/0 |
| 5 paragraphs, alternating | 2.418s total (0.484s avg) | 8/2/0 | 3/2/0 | 3/2/0 |
Speedup: warm vs cold = 1.68×; 5 same vs 5 alternating = 1.005× (still roughly flat — see note).
After Issue 1 fix (content-keyed filtered_spec cache):
- Condition 2 (warm repeat): filtered_spec now shows 1 hit (was 0) — the same spec1 | spec6 content matches even across fresh frozenset objects created by set.union() calls in _run_paragraph.
- Condition 3 (5 same): filtered_spec now shows 4 hits (was 0) — cross-call hits for identical spec content. per_word_axes hits unchanged at 4.
- The 5-same vs 5-alternating delta remains small (~0.5%) because both conditions exercise the same number of total solves and the bench paragraph objects are small (3 verbs each); the marginal time saved per filtered_spec hit is under 2ms at this scale.
Note on same-vs-alternating wash: the bench calls _run_paragraph 5× in an outer loop rather than one long multi-verb paragraph, so per-call overhead (Polars planning) dominates. The filtered_spec cache benefit is real and accumulates inside solve_paragraph across the verb loop; it just doesn't show dramatically at N=5 paragraph-calls scale.