Your AI infrastructure is leaking money.

Tokenistt is an AI cost management and LLM spend observability platform for engineering teams: monitor token usage, reduce Claude API costs, and optimize prompts before production.

Tokenistt gives engineering teams real-time visibility into token usage, prompt inefficiencies, caching opportunities and LLM spend — before requests hit production.

DEPLOYS WITH
Claude Desktop · Cursor · Windsurf · VS Code · MCP
tokenistt · ops · acme-corp
live · region us-east-1
REQUESTS / MIN
912
↑ 4.2%
TOKENS / SEC
2,530
p99 318ms
SPEND TODAY
$48.57
projected $58/d
SAVED TODAY
$35.36
−40% vs base
TOKEN THROUGHPUT · last 18m · 1m bins
MODEL ROUTING · LIVE
haiku-4.5 · 62%
sonnet-4.5 · 28%
opus-4.5 · 10%
EVENT STREAM
14:02:11 · spend anomaly · billing-svc · +38%
14:01:47 · cache hit · agent-router · −$0.08
14:01:22 · rewrite applied · summarize() · −68 tk
WORKSPACES · 24h
agents-platform · $124 · +12%
support-ai · $68 · −8%
doc-intelligence · $42 · +3%
sql-copilot · $24 · −24%
all systems nominal · uptime 99.99% · SOC 2 Type II
14 active workspaces
Powering AI infrastructure at engineering teams
VANTAGE · STRATA · NORTHWIND · OBLIQUE · PARALLEL/8 · KINESIS · AXON
§ A · Spend Observability

Every dollar. Every team. Every minute.

One pane of glass for AI spend across your organization. Real-time attribution by workspace, model, route, and request — with anomaly detection on every metric.
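
The page does not say how anomalies are detected, but the governance panel later lists a `> 3σ · 5min window` auto-pause rule. A minimal sketch of that kind of check, assuming a simple z-score over a trailing window of spend samples (the function name, the sample values, and the flat-history fallback are illustrative, not Tokenistt's implementation):

```typescript
// Hypothetical helper: flags a sample that sits more than `sigmas` sample
// standard deviations above the mean of a trailing window.
function isSpendAnomaly(window: number[], latest: number, sigmas = 3): boolean {
  if (window.length < 2) return false; // not enough history to judge
  const mean = window.reduce((a, b) => a + b, 0) / window.length;
  const variance =
    window.reduce((acc, x) => acc + (x - mean) * (x - mean), 0) /
    (window.length - 1);
  const std = Math.sqrt(variance);
  if (std === 0) return latest > mean; // flat history: any increase stands out
  return (latest - mean) / std > sigmas;
}

// Illustrative spend samples (USD per minute) for one workspace.
const lastFiveMinutes = [1.0, 1.1, 0.9, 1.0, 1.0];
const calm = isSpendAnomaly(lastFiveMinutes, 1.05); // within normal variation
const spike = isSpendAnomaly(lastFiveMinutes, 2.0); // far above the window
```

A fixed sigma threshold is easy to reason about but assumes roughly stable traffic. Workloads with strong daily seasonality would likely need a baseline per hour of day.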

SPEND HEATMAP · last 7 days
Token cost by hour of day
peak: Tue 14:00 · $14.20/h
[Heatmap: rows Mon–Sun, columns hour of day (ticks at 0, 6, 12, 18), color scale low to high]
SPEND BY MODEL · 30d
$1,284 routed
sonnet-4.5 · $578 · 45%
opus-4.5 · $411 · 32%
haiku-4.5 · $205 · 16%
embeddings · $90 · 7%
recommended re-route · −$420 / mo
WORKSPACE ATTRIBUTION · MTD
6 workspaces · $1,284 total
Workspace         Spend (MTD)  7d trend  Δ wow  Status
agents-platform   $539                   +18%   ANOMALY
support-ai        $270                   −12%   ● ok
doc-intelligence  $180                   +4%    ● ok
sql-copilot       $141                   −24%   ● ok
internal-tools    $90                    +1%    ● ok
experiments       $64                    +92%   ANOMALY
§ B · Infrastructure Monitoring

Live token streams. Forensic-grade traces.

Every request your apps send to Claude is captured, classified, costed, and indexed. Slice by route, by user, by model — answer cost questions in seconds, not days.

TOKENS / SEC
2,706
REQ / MIN
184
↑ 4.2%
P50 LAT
142ms
P99 LAT
318ms
SLO 500ms
ERROR RATE
0.04%
↓ 12%
TOKEN THROUGHPUT · last 60s · 1s bins
in out cached
REQUEST LOG · live · acme/agents-platform
streaming
id         route           model  tk    cost     lat    cache
req_4f81a  agents/run      op     887   $0.0038  312ms
req_4f8a3  support/triage  sn     332   $0.0187  386ms
req_4f92c  docs/answer     sn     1840  $0.0131  216ms
req_4f9b5  sql/translate   op     312   $0.0044  266ms
req_4fa3e  agents/run      hk     1594  $0.0189  554ms
req_4fac7  support/triage  hk     1120  $0.0160  345ms
req_4fb50  docs/answer     hk     815   $0.0031  602ms
req_4fbd9  sql/translate   hk     516   $0.0048  230ms
req_4fc62  agents/run      hk     1476  $0.0144  670ms
req_4fceb  support/triage  sn     434   $0.0056  422ms
req_4fd74  docs/answer     sn     417   $0.0037  246ms
req_4fdfd  sql/translate   op     1091  $0.0087  220ms
req_4fe86  agents/run      hk     641   $0.0102  337ms
req_4ff0f  support/triage  op     280   $0.0047  629ms
req_4ff98  docs/answer     op     286   $0.0108  220ms
req_50021  sql/translate   hk     446   $0.0153  575ms
req_500aa  agents/run      hk     1255  $0.0035  119ms
req_50133  support/triage  sn     1359  $0.0119  342ms
req_501bc  docs/answer     hk     1225  $0.0110  383ms
req_50245  sql/translate   op     1942  $0.0118  662ms
30D COST FORECAST
$1,842
baseline · no optimization
$691
with tokenistt · −62.5%
SLO COMPLIANCE · 90d
p99 < 500ms · 99.2%
cost / req SLA · 96.8%
cache hit ≥ 80% · 94.2%
error rate < 0.1% · 99.96%
§ 02 · Prompt Intelligence

Inspect every token across every workload.

Tokenistt parses every prompt routed through your infrastructure — input cost, output prediction, cache eligibility, model fit — in under 2ms, at the edge of every request.

analyzed · 1.2ms
You are a highly intelligent and helpful AI assistant. Please carefully read the following customer support ticket and analyze it thoroughly. I would really appreciate it if you could extract the priority level, category, sentiment, and provide a suggested response.

Customer message:
{{message}}

Please respond in valid JSON.
328 chars · 6 lines · ● waste  ● optimize  ● cache
Input tokens
91
3.6 chars / tk
Output (est.)
122
p50 · sonnet-4.5
Total cost
$0.00210
per request
Best model
haiku-4.5
−83% cost · ≥97% quality
Context
0.05%
91 / 200k
Cache
● eligible
≥1024 tk threshold
Monthly spend
$429
@ 6,800 req/d
Optimization
41 / 100
high waste detected
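
The figures in this panel can be reproduced with a few lines of arithmetic. The sketch below assumes list prices of $3 per million input tokens and $15 per million output tokens for sonnet-4.5. Those prices are not stated on the page; they are chosen because they reproduce the $0.00210 per request and $429 per month shown above. The function names are illustrative.

```typescript
// Assumed prices in USD per million tokens (not stated in the panel).
const INPUT_USD_PER_MTOK = 3;
const OUTPUT_USD_PER_MTOK = 15;

// Rough token estimate from character count, using the panel's 3.6 chars / tk.
function estimateTokens(chars: number, charsPerToken = 3.6): number {
  return Math.round(chars / charsPerToken);
}

// Cost of one request in USD under the assumed prices.
function costPerRequest(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * INPUT_USD_PER_MTOK + outputTokens * OUTPUT_USD_PER_MTOK) / 1e6
  );
}

// Projected spend over `days` at a steady request rate.
function monthlySpend(perRequestUsd: number, requestsPerDay: number, days = 30): number {
  return perRequestUsd * requestsPerDay * days;
}

const inputTokens = estimateTokens(328);           // 328 chars from the prompt above
const perRequest = costPerRequest(inputTokens, 122); // 122 = estimated output tokens
const perMonth = monthlySpend(perRequest, 6800);   // 6,800 req/d from the panel
```

Under these assumptions the output tokens dominate the bill (122 output tokens cost more than six times the 91 input tokens), which is why constraining output length tends to matter more than trimming the prompt alone.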
Token distribution
Where your tokens go
91 tk
system
38%
context
22%
instructions
28%
output
12%
Cost by model
Same prompt, all models
haiku-4.5 ● BEST · $0.0008
sonnet-4.5 $0.0034
opus-4.5 $0.0180
30-day savings
If optimized today
$266
vs $429 unoptimized
§ 03 · Context Bloat Detection

Production prompts leak 60–80% of every dollar.

Tokenistt classifies waste at the byte level across your entire fleet — repeated headers, ungrounded context, verbose framing, unconstrained output.

YOUR PROMPT · highlighted
support-ticket.md · 2,144 tokens · $0.0064/req · $86/mo
waste 1,418 tk · keep 726 tk
system: You are a highly intelligent and exceptionally capable AI assistant specializing in customer support analysis. Please take your time and carefully read through the entire ticket below before responding. Extract: priority, category, sentiment, response. I would really appreciate it if you could format your response as JSON. Thank you so much for your help with this.
context: <full company handbook · 11,200 tokens · 2.3% referenced>
ticket: {{message}}
REPEATED SYSTEM PROMPT
System block re-sent on every call
You're shipping 1,840 tokens of identical instructions on every request. The cache is right there.
−71% · 1,840 tk/req
OVERSIZED CONTEXT
Stuffing the entire document
Only 2.3% of injected context is referenced. Targeted retrieval cuts this by 38×.
−96% · 12,480 tk/req
UNNECESSARY VERBOSITY
Politeness padding & filler
"Please carefully", "I would really appreciate", "thank you so much" → 38 wasted tokens per call.
−22% · 38 tk/req
INEFFICIENT OUTPUT
No max_tokens, no schema
The model rambles 480 tokens when 60 would do. JSON schema mode would constrain it.
−87% · 420 tk/req
AGGREGATE WASTE · THIS PROMPT
This single template costs you $842 in avoidable spend per month.
Apply all fixes →
§ 04 · Optimization Engine

Same intent. 73% fewer tokens. Zero regressions.

Our LLM-grade rewriter preserves meaning and constrains output, then verifies behavioral equivalence on your eval set before promoting to production.

before · prompt.original
142 tk · $0.00043
after · prompt.optimized
38 tk · $0.00011
- You are a highly intelligent AI assistant.
- Please carefully analyze the following customer support
  ticket and extract: priority level, category, sentiment,
  and a suggested response.

- I would really appreciate it if you could respond in
- valid JSON format. Thank you very much.

  Customer ticket:
  {{message}}
+ Extract from ticket:
+   priority(P0-P3), category, sentiment(-1..1), response.
+ Output JSON. Schema: {p,c,s,r}.

  {{message}}
Token reduction
−73%
142 → 38
Per-request cost
−74%
$0.00043 → $0.00011
Monthly savings
$184
@ 2k req/d
Output control
JSON
schema-locked
Eval parity
99.4%
on 240 tests
01
Parse

Tokenize, classify each region by purpose: framing, instruction, context, output.

02
Rewrite

Strip filler, compress instructions, enforce schema, propose minimal viable form.

03
Verify

Run candidate against your test set. Measure semantic + behavioral equivalence.

04
Ship

Promote to production with one command. Roll back in one click if metrics drift.
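
The four steps above can be sketched in miniature. This is not Tokenistt's rewriter: it assumes a small regex list of filler phrases (taken from the examples on this page), a characters-per-token heuristic, and a caller-supplied set of checks standing in for an eval set. All names are hypothetical.

```typescript
// Filler patterns drawn from the examples on this page. A real rewriter
// would need a much richer notion of what is safe to remove.
const FILLER: RegExp[] = [
  /\bplease\s+(carefully\s+)?/gi,
  /\bI would really appreciate it if you could\s+/gi,
  /\bthank you (so|very) much[^.]*\.[ \t]*/gi,
];

// Parse (very roughly): approximate token count at 3.6 chars per token.
const approxTokens = (s: string): number => Math.ceil(s.length / 3.6);

// Rewrite: strip filler and collapse runs of spaces left behind.
function rewrite(prompt: string): string {
  let out = prompt;
  for (const re of FILLER) out = out.replace(re, "");
  return out.replace(/[ \t]{2,}/g, " ").trim();
}

// Verify: each check stands in for one test from an eval set.
type EvalCheck = (prompt: string) => boolean;

// Ship gate: promote only if the candidate is shorter and passes every check.
function shouldShip(original: string, candidate: string, evals: EvalCheck[]): boolean {
  const shorter = approxTokens(candidate) < approxTokens(original);
  return shorter && evals.every((check) => check(candidate));
}

const original =
  "Please carefully read the ticket. I would really appreciate it if you could respond in JSON.\n{{message}}";
const candidate = rewrite(original);
const keepsPlaceholder: EvalCheck = (p) => p.includes("{{message}}");
const promote = shouldShip(original, candidate, [keepsPlaceholder]);
```

The gate is deliberately conservative: a rewrite that saves tokens but fails a single check is rejected, which mirrors the "verify before promoting" order described above.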

§ 05 · Cache Intelligence

Pay 10× less for what you say twice.

Tokenistt detects every cache-eligible region in your prompt, models the write/read economics against Anthropic's thresholds, and tells you exactly which blocks to mark.

cache-map · agent-pipeline.yaml · 3 cacheable regions detected
A · static · system instructions · identical across requests · 1,840 tk · ↳ cache_control: ephemeral
B · static · tool definitions · reused by agent · 1,240 tk · ↳ cache_control: ephemeral
C · reusable · company handbook · rotates weekly · 8,200 tk · ↳ cache_control: ephemeral
D · dynamic · user query · unique per request · 124 tk
E · dynamic · session context · unique per session · 380 tk
Anthropic cache threshold · 1,024 tk min
11,280 of 11,784 tk cacheable · 96%
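
As a rough illustration of what marking a block looks like, the sketch below builds the `system` portion of a Messages API request and attaches `cache_control: { type: "ephemeral" }` to stable regions that clear the 1,024-token threshold quoted above. The `Region` shape and `toSystemBlocks` helper are hypothetical, token counts are supplied by the caller rather than measured, and the sketch ignores provider limits on the number of cache breakpoints.

```typescript
// Shape of a text block in the request's `system` array.
type TextBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// Hypothetical description of one prompt region from the cache map.
interface Region {
  name: string;
  text: string;
  tokens: number;  // caller-supplied token count
  stable: boolean; // true if identical across requests within the TTL
}

const MIN_CACHEABLE_TOKENS = 1024; // threshold quoted on this page

// Mark stable, large-enough regions as cacheable; leave the rest untouched.
// Stable regions should come first so the cached prefix stays identical.
function toSystemBlocks(regions: Region[]): TextBlock[] {
  return regions.map((r) => {
    const block: TextBlock = { type: "text", text: r.text };
    if (r.stable && r.tokens >= MIN_CACHEABLE_TOKENS) {
      block.cache_control = { type: "ephemeral" };
    }
    return block;
  });
}

const blocks = toSystemBlocks([
  { name: "system instructions", text: "<instructions>", tokens: 1840, stable: true },
  { name: "company handbook", text: "<handbook>", tokens: 8200, stable: true },
  { name: "session context", text: "<session>", tokens: 380, stable: false },
]);
```

Ordering matters here: caching works on a prefix, so anything that changes per request or per session belongs after the marked blocks.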
Cache economics · sonnet-4.5
WRITE COST
1.25×
once per 5 min
READ COST
0.10×
per cached request
Break-even
after 2 reads · you have ~14k/day
EST. MONTHLY REDUCTION
$284
vs $1,120 uncached · 75% reduction on cacheable surface
cache hit rate (proj.) · 94.2%
writes / day · ~288 (5 min ttl)
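
The write and read multipliers above are enough to work out the economics by hand. A minimal sketch, assuming one cache write followed by reads inside a single TTL window, with costs expressed in units of one uncached pass over the cacheable prefix (function names are illustrative):

```typescript
const WRITE_MULTIPLIER = 1.25; // first request in a window pays a premium
const READ_MULTIPLIER = 0.1;   // later requests in the window pay a fraction

// Cost of `requests` calls in one TTL window with caching enabled.
function cachedCost(requests: number): number {
  if (requests <= 0) return 0;
  return WRITE_MULTIPLIER + READ_MULTIPLIER * (requests - 1);
}

// Cost of the same calls with no caching: full price every time.
function uncachedCost(requests: number): number {
  return Math.max(requests, 0);
}

// Upper bound on cache writes per day if the entry is rewritten every TTL.
function writesPerDay(ttlMinutes: number): number {
  return (24 * 60) / ttlMinutes;
}

const single = cachedCost(1); // 1.25: caching loses if the prefix is never reused
const pair = cachedCost(2);   // 1.35 versus 2 uncached
const writes = writesPerDay(5);
```

With these multipliers a single isolated request is more expensive cached than uncached, and the cached path is cheaper from the second request in a window onward. At high request volumes the cost per request approaches the 0.10× read price.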
§ C · Team Governance

Centrally govern AI infrastructure across every workspace.

Cost caps, model whitelists, prompt policies, key rotation, and full audit trails — set once, enforced everywhere.

POLICY · acme-corp / production
Daily spend cap
$80 / workspace
ENFORCED
Model whitelist
haiku-4.5, sonnet-4.5
ENFORCED
Force prompt cache
≥ 1,024 tk · ttl 5m
ENFORCED
Block opus-4.5 in prod
except agents-platform
ENFORCED
Require BYOK on tier-1
org keys disabled
ENFORCED
PII redaction
pre-flight scrubber
ENFORCED
Anomaly auto-pause
> 3σ · 5min window
ENFORCED
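
To make the panel concrete, here is a sketch of how three of these rules (daily spend cap, model whitelist, and the opus-4.5 exception for agents-platform) might be evaluated before a request is forwarded. The `Policy` shape and `checkRequest` function are assumptions for illustration; the page does not describe Tokenistt's actual policy format.

```typescript
// Hypothetical policy shape mirroring three rules from the panel above.
interface Policy {
  dailySpendCapUsd: number;
  allowedModels: string[];
  // model -> workspaces that may use it despite the whitelist
  modelExceptions: Record<string, string[]>;
}

const policy: Policy = {
  dailySpendCapUsd: 80,
  allowedModels: ["haiku-4.5", "sonnet-4.5"],
  modelExceptions: { "opus-4.5": ["agents-platform"] },
};

type Decision = { allowed: true } | { allowed: false; reason: string };

// Pre-flight check: the spend cap is evaluated first, then model access.
function checkRequest(
  p: Policy,
  workspace: string,
  model: string,
  spentTodayUsd: number,
): Decision {
  if (spentTodayUsd >= p.dailySpendCapUsd) {
    return { allowed: false, reason: "daily spend cap reached" };
  }
  const exempt = (p.modelExceptions[model] ?? []).includes(workspace);
  if (!p.allowedModels.includes(model) && !exempt) {
    return { allowed: false, reason: `model ${model} not allowed for ${workspace}` };
  }
  return { allowed: true };
}

const blocked = checkRequest(policy, "support-ai", "opus-4.5", 10);
const exempted = checkRequest(policy, "agents-platform", "opus-4.5", 10);
const capped = checkRequest(policy, "support-ai", "sonnet-4.5", 80);
```

Returning a reason alongside the decision keeps the check easy to surface in an audit log entry.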
ROLES & ACCESS · 142 members
Owners
4
full org control
Admins
12
workspace + policy
Engineers
98
analyzer + optimize
Viewers
28
read-only dashboards
AUDIT LOG · last 24h
14:02:11 · POLICY · aryan@acme.co · updated cost cap · agents-platform
13:48:02 · ROUTE · sys.tokenistt · auto-rerouted 1,284 req → haiku-4.5
13:22:47 · OPTIM · aditya@acme.co · approved rewrite · summarize() · −68 tk
13:04:11 · ALERT · sys.tokenistt · anomaly detected · experiments · +92%
12:51:30 · AUTH · akshay@acme.co · rotated workspace key · sql-copilot
12:44:08 · CACHE · sys.tokenistt · cache promoted · system_v4 · 1840 tk
§ D · Built for production

Infrastructure-grade. By default.

Tokenistt runs in your VPC, with your keys, on your terms. SOC 2 Type II, ISO 27001, region pinning, no prompt logging by default.

UPTIME · 90D
99.99%
12s downtime
COMPLIANCE
SOC 2 II
ISO 27001 · GDPR
DEPLOYMENT
VPC + SaaS
us · eu · apac
KEY HANDLING
BYOK
hashicorp vault
DATA RESIDENCY
pinned
no cross-region
PROMPT LOGGING
opt-in
PII scrubber on
§ 06 · MCP Ecosystem

One install. Every Claude surface.

Tokenistt ships as a Model Context Protocol server. Drop it into any MCP-compatible host and every prompt routed through it gets analyzed, optimized, and cached automatically.

~/projects/api · Claude Desktop
zsh · bash compatible
$ npm install -g tokenistt-mcp
added 1 package · 240ms
$ tokenistt connect claude-desktop
→ resolving host...
Claude Desktop · @anthropic · connected
✓ MCP handshake · ✓ tokenizer loaded · ✓ analytics streaming
→ Claude Desktop now routing through tokenistt
$
Claude Desktop
@anthropic
12.4k connections live
Cursor
editor
8.9k connections live
Windsurf
editor
4.2k connections live
VS Code
extension
21.7k installs live
request pipeline · live
host
Claude Desktop · MCP client
send
tokenistt
parse · classify · score
1.1ms
optimize
rewrite · cache · model swap
−68% tk
anthropic
POST /v1/messages
$0.0003
response
metrics · ledger · webhook
logged
§ 07 · Editor-Native

Inline cost intelligence in your editor.

The Tokenistt extension lights up every prompt-shaped string in your codebase — token counts in the gutter, model recommendations on hover, optimization at a keystroke.

summarize.ts · api · tokenistt v0.4.2
summarize.ts
embed.ts
agent.ts
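import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment by default.
const client = new Anthropic();

export async function summarize(text: string) {
  // Verbose framing left as-is: this is the prompt the extension flags.
  const prompt = `You are a helpful summarization assistant.
Please carefully read the following text and provide
a concise summary, no more than 3 sentences.

Text: ${text}`;

  return await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 200,
    messages: [{ role: "user", content: prompt }],
  });
}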
◆ 47 tk · −68% available
TOKENISTT · summarize()
TOKENS / CALL
47 −68%
30D SPEND
$28.40
BEST MODEL
haiku-4.5
SUGGESTIONS
Trim filler (−18 tk)
Add max_tokens=60
Cache system block
● tokenistt · main · 0 ⚠ 0 ✗
◆ 47 tk · $0.00009/call · 30d $28 · save $19 · TypeScript
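
For comparison, here is one way the request could look after applying two of the suggestions in the panel (trim filler, lower `max_tokens` to 60) along with the recommended model. This is a hand-written sketch, not output from the Tokenistt rewriter. The model id `claude-haiku-4-5` is an assumption for the panel's "haiku-4.5", and the function only builds request parameters so it can be inspected without an API call.

```typescript
// Minimal parameter shape for this sketch (a subset of a messages request).
interface SummarizeParams {
  model: string;
  max_tokens: number;
  system: string;
  messages: { role: "user"; content: string }[];
}

// Builds a trimmed request: short instruction in `system`, text as the user turn.
function buildSummarizeParams(text: string): SummarizeParams {
  return {
    model: "claude-haiku-4-5", // assumed id for the panel's "haiku-4.5"
    max_tokens: 60,            // from the "Add max_tokens=60" suggestion
    system: "Summarize the text in at most 3 sentences.",
    messages: [{ role: "user", content: text }],
  };
}

const params = buildSummarizeParams("Quarterly revenue rose while costs fell.");
```

Whether 60 output tokens is enough for three sentences depends on the inputs, so a cap like this is worth checking against real summaries before shipping.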
Gutter token count

See tokens beside every prompt-shaped literal.

Hover diagnostics

Cost, best model, cache eligibility on mouseover.

Quick-fix optimize

⌘. to apply Tokenistt rewrites in place.

Spend status bar

30-day rolling cost and projected savings.

§ 08 · The Team

Engineers building infrastructure for the next generation of AI systems.

Aryan Singh
Co-founder · CEO

Building infrastructure for the next generation of AI systems. Previously led billing platform engineering at 9-figure scale.

Aditya Tiwari
Co-founder · CPO & CMO

LLM inference economics and large-scale ML serving. Architecting the runtime that governs production AI workloads.

Akshay Khanna
Co-founder · CTO

Developer infrastructure used by 40k+ engineers. Owns the surface where AI engineering teams live every day.

§ 09 · Pricing

Pricing that scales with your AI infrastructure.

From individual engineers to organization-wide AI platform teams. Tokenistt typically pays for itself in the first week of production traffic.

Free
$0 · individual

For engineers exploring prompt economics and token visibility.

MCP server access
Up to 50k analyses / mo
VS Code extension
Personal observability dashboard
Spend analytics (single user)
Optimization engine · 5 / day
Cache intelligence · read-only
Workspace governance
Audit log + RBAC
VPC deployment + SOC 2
RECOMMENDED
Team
$7 per engineer / month

For AI engineering teams running Claude across multiple workloads.

MCP server access
Unlimited analyses + traces
VS Code extension
Multi-workspace observability
Real-time spend attribution
Optimization engine · unlimited
Cache intelligence · full + automation
Workspace governance · cost caps + policies
Audit log + RBAC
VPC deployment + SOC 2
Enterprise
Custom · production AI infrastructure

For platform teams operating Claude at scale across the organization.

MCP server access
Unlimited analyses + traces
VS Code extension
Org-wide observability
Real-time spend attribution
Optimization engine · unlimited + custom rules
Cache intelligence · full + policy + SLOs
Workspace governance · central + anomaly auto-pause
Audit log + RBAC · SCIM + SSO + SAML
VPC deployment + SOC 2 · on-prem + ISO 27001
ROI guarantee
Pays for itself in week 1.
SOC 2 Type II
Available on Team plan.
No prompt logging
BYOK + region pinning.
Annual billing
2 months free.
§ 10 · Production AI infrastructure

Stop guessing your AI infrastructure costs.

Deploy in an afternoon. Visibility within the hour. Optimization that compounds across every workload, every team, every workspace.

$180
saved YTD
41%
avg reduction
1.2ms
analysis latency
13+
active developers