vllm.v1.kv_offload.reuse_manager ¶
Reuse-frequency gating for CPU KV-cache offload stores.
FilterReusedOffloadingManager — OffloadingManager decorator that skips storing blocks that have not yet been seen enough times.
FilterReusedOffloadingManager ¶
Bases: OffloadingManager
An OffloadingManager decorator that skips storing blocks whose reuse frequency is below store_threshold.
All methods are delegated to the backing manager. Two methods are intercepted:
- lookup: records each visited key in an internal LRU counter.
- prepare_store: filters out keys that have not yet crossed the threshold before calling the backing prepare_store.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| backing | OffloadingManager | The underlying | required |
| store_threshold | int | A block must be seen at least this many times in | 2 |
| max_tracker_size | int | Maximum entries in the internal tracker's LRU table. | 64000 |
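To illustrate how store_threshold and max_tracker_size might interact, here is a minimal sketch of a bounded LRU counter. The class `ReuseTracker` and its methods `record` and `is_eligible` are hypothetical names chosen for this example, and plain strings stand in for offload keys; the actual tracker in reuse_manager.py may be structured differently.

```python
from collections import OrderedDict


class ReuseTracker:
    """Hypothetical bounded LRU counter; not vLLM's actual tracker."""

    def __init__(self, store_threshold: int = 2, max_tracker_size: int = 64_000):
        self.store_threshold = store_threshold
        self.max_tracker_size = max_tracker_size
        self._counts = OrderedDict()  # key -> number of times seen

    def record(self, key: str) -> None:
        # Bump the count and mark the key as most recently used.
        self._counts[key] = self._counts.get(key, 0) + 1
        self._counts.move_to_end(key)
        # Evict least recently used entries once the table is over its limit.
        while len(self._counts) > self.max_tracker_size:
            self._counts.popitem(last=False)

    def is_eligible(self, key: str) -> bool:
        # A key qualifies once it has been seen store_threshold times.
        return self._counts.get(key, 0) >= self.store_threshold


tracker = ReuseTracker(store_threshold=2, max_tracker_size=2)
tracker.record("a")
first_seen = tracker.is_eligible("a")   # seen once, below the threshold
tracker.record("a")
second_seen = tracker.is_eligible("a")  # seen twice, meets the threshold
tracker.record("b")
tracker.record("c")                     # table is full, so "a" is evicted
after_evict = tracker.is_eligible("a")  # count was lost with the eviction
```

In this sketch, an evicted key starts counting from zero again, so a small max_tracker_size trades memory for the risk of forgetting blocks that are reused only occasionally.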
Source code in vllm/v1/kv_offload/reuse_manager.py
lookup ¶
Record each key, then delegate the lookup to the backing manager.
prepare_store ¶
prepare_store(
keys: Iterable[OffloadKey],
) -> PrepareStoreOutput | None
Filter out blocks below the threshold, then delegate to the backing manager.
Filtering is evaluated before calling the backing manager's prepare_store so that blocks that would be skipped do not consume any CPU offload capacity.
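The effect of filtering before delegation can be sketched with a toy backing manager of limited capacity. Everything here is an assumption made for illustration: `ToyBacking`, `FilterReusedSketch`, the string keys, and the return values are stand-ins and do not reflect the real OffloadingManager interface or PrepareStoreOutput.

```python
from collections import Counter
from typing import Iterable, List, Optional


class ToyBacking:
    """Hypothetical stand-in for a backing manager with limited capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.stored: List[str] = []

    def lookup(self, keys: Iterable[str]) -> int:
        # Report how many of the requested keys are already stored.
        return sum(1 for k in keys if k in self.stored)

    def prepare_store(self, keys: Iterable[str]) -> Optional[List[str]]:
        new_keys = [k for k in keys if k not in self.stored]
        if len(self.stored) + len(new_keys) > self.capacity:
            return None  # not enough room for this request
        self.stored.extend(new_keys)
        return new_keys


class FilterReusedSketch:
    """Hypothetical decorator: counts keys in lookup, filters in prepare_store."""

    def __init__(self, backing: ToyBacking, store_threshold: int = 2):
        self.backing = backing
        self.store_threshold = store_threshold
        self.counts = Counter()  # unbounded here for brevity, unlike an LRU table

    def lookup(self, keys: Iterable[str]) -> int:
        keys = list(keys)
        self.counts.update(keys)  # record every visited key
        return self.backing.lookup(keys)

    def prepare_store(self, keys: Iterable[str]) -> Optional[List[str]]:
        # Drop keys below the threshold before the backing manager sees them.
        eligible = [k for k in keys if self.counts[k] >= self.store_threshold]
        return self.backing.prepare_store(eligible)


backing = ToyBacking(capacity=2)
manager = FilterReusedSketch(backing, store_threshold=2)
manager.lookup(["a", "b", "c"])  # every key seen once
manager.lookup(["a"])            # "a" seen twice
accepted = manager.prepare_store(["a", "b", "c"])  # only "a" is forwarded
```

Because only "a" reaches the toy backing manager, the request fits within its capacity of two; passing all three keys straight through would have exceeded it.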