DocsFeature — Cache Intelligence

Feature — Cache Intelligence

Anthropic supports prompt caching for prefixes over 1,024 tokens. Cached tokens cost 90% less than uncached. Tokenistt identifies which prompts are cache-eligible and tracks hit rates.

How cache fingerprinting works

Tokenistt hashes the first 1,024 tokens of every prompt. When the same hash appears across requests, it flags the prefix as cache-eligible. If the prefix changes between requests, it warns that caching is being broken.

Cache hit rate dashboard

–Per-workspace cache hit % over time
–List of prompts breaking their own cache (prefix instability)
–Estimated savings if hit rate improves to 80%
–Prefix length recommendations per prompt pattern

# Cache report output
tokenistt cache-report --workspace agents-platform

workspace:     agents-platform
hit_rate:      34%           ← target: 80%+
top_miss:      summarize()   prefix changes every call (dynamic timestamp)
potential_savings: $48/mo    if hit rate reaches 80%

suggestion: move dynamic content to user turn, keep system prompt static

TIPThe single highest-ROI change for most teams: move all dynamic content (dates, user input) out of the system prompt so the static prefix can be cached consistently.