Skip to content

PHON-103 — CSP perf: domain caching by constraint-set hash

Date: 2026-05-08 Branch: feature/csp-iteration Scope: spike-internal perf optimization (productionization to packages/generators/csp/ is PHON-109's scope)

Frame

The CSP spike's per-solve hot path repeats two costly Polars operations on every call: applying hard constraints to the spec lexicon (_resolve_domain_words) and building per-word soft-axis lookup tables (per_word_axes). Both are verb-independent — for a multi-verb paragraph composing 3 sentences with the same constraint set, the work is done 3× redundantly.

This ticket adds three module-level LRU caches keyed on the constraint set so this verb-independent work is computed once per (spec_id × constraints) tuple and reused across calls within a process lifetime.

Goal

For a 3-verb paragraph with shared constraints, expect 1 cache miss + 2 hits on each of spec_lexicon, filtered_spec, and per_word_axes. Wall-clock improvement reported via bench script; specific target deferred to empirical measurement (the existing wall-clock baseline is in notebook.md: 2.7s for 5 single-verb probes; paragraph composition demos are slower per paragraph_csp.py).

Cache must not change result semantics — top-K outputs and domain_trace lists must be identical to the uncached path.

Non-goals

  • PMI lookup caching for nominal slots (separate optimization; current _ADVMOD_PMI_CACHE already handles adverb PMI).
  • Thread-safety locks (CPython GIL serializes; revisit if async scheduling enters).
  • Cross-process / KV / Durable Object caching (deployment is single-process Cloudflare Container).
  • Productionization move to packages/generators/csp/ (PHON-109 scope).

Architecture

A new module research/2026-05-07-sentence-generation-paradigms/domain_cache.py next to the existing spike files. Three module-level OrderedDict-backed LRU caches plus public wrapper functions.

research/2026-05-07-sentence-generation-paradigms/
├── constraint_surface.py      (unchanged)
├── skeleton_csp.py            (unchanged)
├── paradigm_3_csp.py          (callsite swaps)
├── paragraph_csp.py           (callsite swaps)
└── domain_cache.py            (NEW)

Public functions exposed by domain_cache.py:

Function Signature Wraps
get_spec_lexicon (spec_id: str, store: WordStore) -> frozenset[str] spec_lexicon()
get_filtered_spec (spec_words: frozenset[str], constraints, word_df) -> tuple[frozenset[str], list[dict]] _resolve_domain_words() and _filtered_domain()
get_per_word_axes (soft_constraints, word_df) -> dict[str, dict[str, float]] per_word_axes()
clear_caches () -> None reset all caches and stats
get_cache_stats () -> dict snapshot of hits/misses/evictions per cache

Cache keys & values

# spec_id → frozenset of admitted words (verb- and constraint-independent)
_SPEC_LEXICON_CACHE: OrderedDict[
    tuple[str, int],          # (spec_id, id(word_df))
    frozenset[str],
]

# (spec_words, hard_constraints) → spec ∩ filter result
_FILTERED_SPEC_CACHE: OrderedDict[
    tuple[frozenset[str], tuple[Constraint, ...], int],   # (spec_words_frozenset, hard_constraints, id(word_df))
    tuple[frozenset[str], tuple[dict, ...]],              # (filtered_words, domain_trace)
]

# soft_constraints → per-word axis lookup tables
_PER_WORD_AXES_CACHE: OrderedDict[
    tuple[tuple[Constraint, ...], int],        # (soft_constraints, id(word_df))
    dict[str, dict[str, float]],               # axis_name → {word: score}
]

Constraint normalization ensures order-independent keys:

def _hashable_constraints(
    constraints: Iterable[Constraint],
    types: tuple[type, ...],
) -> tuple[Constraint, ...]:
    """Filter to relevant types and sort for stable hashing."""
    relevant = [c for c in constraints if isinstance(c, types)]
    return tuple(sorted(relevant, key=lambda c: (c.type, repr(c))))

filtered_spec keys filter to (ExcludeConstraint, BoundConstraint). per_word_axes keys filter to (IncludeConstraint, BoundBoostConstraint). Constraint dataclasses are already frozen=True → tuple-of-constraints is hashable as-is.

Verb is NOT a key. This is the property that makes paragraph composition cheap — multiple verbs with the same constraint set hit the same cache entry.

Why spec_words (content-keyed) instead of spec_id for filtered_spec: paragraph_csp callers compose multi-spec domains (e.g., spec1 | spec6) — there's no single spec_id to key on. Keying on the frozenset content directly (rather than id(spec_words)) is correct because: - frozenset is hashable; Python caches the hash on the object after first computation, so repeated lookups are O(1). - Any caller that constructs the same content (e.g., same spec1 | spec6 union) gets a cache hit regardless of whether the object is the same reference or a fresh set.union() result. - This fixes a correctness hazard from the original id(spec_words) key: Python can recycle a GC'd id for an unrelated frozenset, causing the cache to return wrong data.

Cross-request paragraph hits now also work: the bench's condition 3 (5 same-constraint calls) shows 4 filtered_spec hits because the spec1 | spec6 content is identical across calls even though each call creates a new frozenset object.

Eviction & sizing

LRU via OrderedDict.move_to_end(key) on hit, popitem(last=False) on miss-and-full.

_MAX_SPEC_LEXICON = 8       # only ~6 spec_ids exist per PHON-95
_MAX_FILTERED_SPEC = 64
_MAX_PER_WORD_AXES = 64

Memory rationale: - spec_lexicon value: frozenset[str] of ~5K–30K words per spec → ~250KB–1.5MB. 8 entries × ~1MB ≈ 8MB. - filtered_spec value: similar size, post-hard-filter sets are usually 1K–30K words → 50KB–1.5MB. 64 entries × ~1MB ≈ 60MB worst realistic. - per_word_axes value: per-axis lookup tables; a single include axis touches words containing target phoneme (~10K words → ~500KB). Multi-axis cases scale linearly. 64 entries × ~1MB ≈ 60MB worst realistic.

64 entries handles ~10–20 rounds of constraint iteration with branch-and-undo patterns (toggle a constraint off and back on → hits cache).

Why no TTL: Underlying data (word_df, per-spec_id word sets) is immutable singletons loaded once at process startup. Cache lifetime is process lifetime. Cloudflare Containers cold-start clears the cache, but rebuild cost is bounded by the first solve in a session and amortizes thereafter.

Observability

_CACHE_STATS: dict[str, dict[str, int]] = {
    "spec_lexicon":  {"hits": 0, "misses": 0, "evictions": 0},
    "filtered_spec": {"hits": 0, "misses": 0, "evictions": 0},
    "per_word_axes": {"hits": 0, "misses": 0, "evictions": 0},
}

def get_cache_stats() -> dict:
    """Snapshot of hits/misses/evictions per cache."""
    return {k: dict(v) for k, v in _CACHE_STATS.items()}

def clear_caches() -> None:
    """Drop all entries and reset stats. Used by tests."""

Callsite changes

File Function/callsite Change
paradigm_3_csp.py spec_lexicon() definition Becomes thin wrapper returning frozenset from get_spec_lexicon(). Callers downstream that did set & spec_words continue to work (set/frozenset interop).
paradigm_3_csp.py _resolve_domain_words() body Replaced with call to get_filtered_spec(). Signature unchanged.
paradigm_3_csp.py solve() per_word_axes call Routes through get_per_word_axes().
paragraph_csp.py _filtered_domain() body Replaced with call to get_filtered_spec(). Signature unchanged.
paragraph_csp.py solve_paragraph() per_word_axes call Routes through get_per_word_axes().

Public solve() / solve_paragraph() signatures unchanged.

Edge cases

Case Behavior
word_df is None Pass-through to existing fallback (returns spec_words, []). No cache lookup, no insertion.
Empty constraints list Cache key is (); cached normally. Subsequent calls hit.
Caller passes list[Constraint] instead of tuple Wrapper normalizes via _hashable_constraints(). Caller signature unchanged.
Hypothetical non-frozen Constraint subclass added later Runtime guard assert hash(c) is not None in normalizer surfaces the bug at first call.
spec_words mutated by caller after caching Cache stores a frozenset (immutable). Wrapper internally converts to frozenset before insert. Caller's set unchanged.
Caller mutates a dict in returned domain_trace Wrapper returns a new list of new dicts on each retrieval. Cached version stays clean.
id(word_df) collision (df A GC'd, B allocated at same address) Theoretical only. Tests use clear_caches() in setup. Production has one immutable df → not a real risk.
Concurrent reads/writes (unlikely under GIL) Last write wins, identical results, no corruption. Lock added if/when async scheduling enters.

Testing

New test_domain_cache.py next to other spike tests. Three buckets:

Correctness: - test_filtered_spec_returns_same_set_as_uncached — hit/miss must produce identical (frozenset, trace) tuples. - test_constraint_order_invariance — reordered constraint list yields the same cache entry (stats: 1 miss + 1 hit). - test_spec_lexicon_returns_same_set — cached vs uncached spec_lexicon identical. - test_per_word_axes_returns_same_dicts — cached vs uncached per_word_axes identical. - test_solve_results_unchanged_with_cache — full solve() top-K is identical with cache cold vs warm.

Behavior: - test_paragraph_composition_reuses_filtered_spec — 3-verb paragraph with shared constraints → 1 miss, 2 hits on filtered_spec; same on per_word_axes. - test_lru_eviction_at_max — 65th distinct entry evicts the LRU one (stats: 1 eviction). - test_lru_move_to_end_on_hit — re-accessing oldest entry promotes it; subsequent eviction targets a different entry. - test_clear_caches_resets_stats_and_entries — after clear, all stats are 0 and lookups miss.

Perf: - bench_domain_cache.py (script, not pytest) — wall-clock for 4 conditions: cold cache, warm cache, 5 paragraphs same constraints, 5 paragraphs varying constraints. Reports stats. Baseline number committed in this spec after first run.

Implementation sketch

# domain_cache.py
from collections import OrderedDict
from typing import Iterable
import polars as pl

from phonolex_data.runtime.store import WordStore
from phonolex_generators.cfg_seed.spec_filters import SPEC_FILTERS

from constraint_surface import (
    Constraint, ExcludeConstraint, BoundConstraint,
    IncludeConstraint, BoundBoostConstraint,
    hard_filter_expr, per_word_axes as _per_word_axes_uncached, domain_trace,
)

_HARD_TYPES = (ExcludeConstraint, BoundConstraint)
_SOFT_TYPES = (IncludeConstraint, BoundBoostConstraint)

_MAX_SPEC_LEXICON = 8
_MAX_FILTERED_SPEC = 64
_MAX_PER_WORD_AXES = 64

_SPEC_LEXICON_CACHE: OrderedDict = OrderedDict()
_FILTERED_SPEC_CACHE: OrderedDict = OrderedDict()
_PER_WORD_AXES_CACHE: OrderedDict = OrderedDict()

_CACHE_STATS = {
    "spec_lexicon":  {"hits": 0, "misses": 0, "evictions": 0},
    "filtered_spec": {"hits": 0, "misses": 0, "evictions": 0},
    "per_word_axes": {"hits": 0, "misses": 0, "evictions": 0},
}


def _hashable_constraints(
    constraints: Iterable[Constraint], types: tuple[type, ...],
) -> tuple[Constraint, ...]:
    relevant = [c for c in constraints if isinstance(c, types)]
    return tuple(sorted(relevant, key=lambda c: (c.type, repr(c))))


def _lru_get_or_compute(cache, max_size, stat_key, key, compute):
    if key in cache:
        cache.move_to_end(key)
        _CACHE_STATS[stat_key]["hits"] += 1
        return cache[key]
    _CACHE_STATS[stat_key]["misses"] += 1
    value = compute()
    cache[key] = value
    if len(cache) > max_size:
        cache.popitem(last=False)
        _CACHE_STATS[stat_key]["evictions"] += 1
    return value


def get_spec_lexicon(spec_id: str, store: WordStore) -> frozenset[str]:
    """Cached spec_lexicon. Replaces the local definition in paradigm_3_csp.py."""
    key = (spec_id, id(store.df))

    def compute() -> frozenset[str]:
        return frozenset(
            store.subset(SPEC_FILTERS[spec_id])
            .get_column("word")
            .str.to_lowercase()
            .to_list()
        )

    return _lru_get_or_compute(
        _SPEC_LEXICON_CACHE, _MAX_SPEC_LEXICON, "spec_lexicon", key, compute,
    )


def get_filtered_spec(
    spec_words: frozenset[str],
    constraints: Iterable[Constraint],
    word_df: pl.DataFrame | None,
) -> tuple[frozenset[str], list[dict]]:
    """Apply hard constraints to spec_words.

    Caller must pass `spec_words` as a frozenset and hold the same reference
    across calls that should hit. `get_spec_lexicon` returns a stable cached
    frozenset; paragraph_csp callers compose `spec1 | spec6` once per request
    and reuse the union across the verb loop.
    """
    if word_df is None:
        return spec_words, []
    hard = _hashable_constraints(constraints, _HARD_TYPES)
    key = (spec_words, hard, id(word_df))

    def compute() -> tuple[frozenset[str], tuple[dict, ...]]:
        if not hard:
            return spec_words, ()
        spec_df = word_df.filter(pl.col("word").is_in(list(spec_words)))
        expr = hard_filter_expr(list(hard))
        if expr is None:
            return spec_words, ()
        trace = tuple(domain_trace(list(hard), spec_df))
        filtered = frozenset(spec_df.filter(expr).get_column("word").to_list())
        return filtered, trace

    filtered, trace = _lru_get_or_compute(
        _FILTERED_SPEC_CACHE, _MAX_FILTERED_SPEC, "filtered_spec", key, compute,
    )
    return filtered, [dict(t) for t in trace]


def get_per_word_axes(
    constraints: Iterable[Constraint],
    word_df: pl.DataFrame | None,
) -> dict[str, dict[str, float]]:
    if word_df is None:
        return {}
    soft = _hashable_constraints(constraints, _SOFT_TYPES)
    key = (soft, id(word_df))
    return _lru_get_or_compute(
        _PER_WORD_AXES_CACHE, _MAX_PER_WORD_AXES, "per_word_axes", key,
        lambda: _per_word_axes_uncached(list(soft), word_df),
    )


def clear_caches() -> None:
    _SPEC_LEXICON_CACHE.clear()
    _FILTERED_SPEC_CACHE.clear()
    _PER_WORD_AXES_CACHE.clear()
    for stats in _CACHE_STATS.values():
        for k in stats:
            stats[k] = 0


def get_cache_stats() -> dict:
    return {k: dict(v) for k, v in _CACHE_STATS.items()}

Open questions

None.

References

  • Spike code: packages/generation/research/2026-05-07-sentence-generation-paradigms/
  • Existing cache patterns the design mirrors: _ADVMOD_PMI_CACHE and _ADVMOD_BAND_FALLBACK in skeleton_csp.py; _PHONEME_CACHE in constraint_surface.py.
  • Productionization: PHON-109 (blocked on PHON-103–108).

Empirical baseline (recorded 2026-05-08, updated post-Issue-1-fix)

From bench_domain_cache.py:

Condition Wall-clock spec_lexicon h/m/e filtered_spec h/m/e per_word_axes h/m/e
Cold, 1 paragraph 0.867s 0/2/0 0/1/0 0/1/0
Warm, 1 paragraph (repeat) 0.515s 2/2/0 1/1/0 1/1/0
5 paragraphs, same constraints 2.430s total (0.486s avg) 8/2/0 4/1/0 4/1/0
5 paragraphs, alternating 2.418s total (0.484s avg) 8/2/0 3/2/0 3/2/0

Speedup: warm vs cold = 1.68×; 5 same vs 5 alternating = 1.005× (still roughly flat — see note).

After Issue 1 fix (content-keyed filtered_spec cache): - Condition 2 (warm repeat): filtered_spec now shows 1 hit (was 0) — the same spec1 | spec6 content matches even across fresh frozenset objects created by set.union() calls in _run_paragraph. - Condition 3 (5 same): filtered_spec now shows 4 hits (was 0) — cross-call hits for identical spec content. per_word_axes hits unchanged at 4. - The 5-same vs 5-alternating delta remains small (~0.5%) because both conditions exercise the same number of total solves and the bench paragraph objects are small (3 verbs each); the marginal time saved per filtered_spec hit is under 2ms at this scale.

Note on same-vs-alternating wash: the bench calls _run_paragraph 5× in an outer loop rather than one long multi-verb paragraph, so per-call overhead (Polars planning) dominates. The filtered_spec cache benefit is real and accumulates inside solve_paragraph across the verb loop; it just doesn't show dramatically at N=5 paragraph-calls scale.