TL;DR
AI coding assistants burn tokens on structure, not meaning. A 200-row JSON API response sends thousands of tokens of brackets, quotes, and repeated key names that the model does not need for reasoning. In measured runs across JSON, CSV, YAML, and debug output:
| Content Type | Token Savings |
|---|---|
| Medium JSON | 65.0% |
| API-shaped responses | 50.3% |
| Nested objects | 48.2% |
| Multilingual content | 46.3% |
| Large JSON | 42.6% |
| CSV-like data | 24.5% |
| Average (48 cases) | 48.1% |
Source: Toonify benchmark run, April 30, 2026.
The mechanism: intercept tool output via a PostToolUse hook before it reaches the context window. No changes to how you use Claude Code.
The Problem: JSON Is the Wrong Format for LLM Input
JSON was designed for machine-to-machine serialization. Every key and every string value is wrapped in quotes, every key is repeated for every object in an array, and every nesting level adds brackets and commas. For a program parsing JSON, that structure is the contract. For an LLM reading it, most of that structure is noise.
Consider a 10-row API response:
```json
[
  {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5, "elevationGain": 320},
  {"id": 2, "name": "Royal Arch", "distanceKm": 5.2, "elevationGain": 490},
  ...8 more rows
]
```
The keys id, name, distanceKm, and elevationGain appear once per row. For 10 rows, that's 40 key occurrences where 4 would do. The model doesn't need the repetition to understand the schema; it needs the schema once, then the values.
TOON (Token-Oriented Object Notation) addresses this directly. For uniform arrays of objects, it uses a CSV-style tabular layout: declare the headers once, then list values row by row:
```
id | name | distanceKm | elevationGain
1 | Blue Lake Trail | 7.5 | 320
2 | Royal Arch | 5.2 | 490
```
The encoding is lossless: TOON represents the same data model as JSON, so it is a drop-in representation, not a summary. What changes is structural overhead: TOON strips the syntax that exists for parsers, not for models.
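To make the layout concrete, here is a minimal Python sketch that turns a uniform list of flat objects into the header-plus-rows form shown above. `to_table` is a hypothetical helper, not Toonify's or TOON's API, and it skips the quoting or escaping a real encoder needs when a value contains the delimiter.

```python
import json

def to_table(rows: list[dict], delim: str = " | ") -> str:
    """Encode a uniform list of flat dicts as one header line plus one line per row.

    Hypothetical illustration of the tabular layout; raises ValueError when the
    rows don't share the same keys in the same order (a real encoder would fall
    back to another representation instead).
    """
    if not rows:
        return ""
    headers = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != headers:
            raise ValueError("rows are not uniform")
    lines = [delim.join(headers)]                       # schema written once
    for row in rows:
        lines.append(delim.join(str(row[h]) for h in headers))  # values only
    return "\n".join(lines)

trails = [
    {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5, "elevationGain": 320},
    {"id": 2, "name": "Royal Arch", "distanceKm": 5.2, "elevationGain": 490},
]

table = to_table(trails)
print(table)
print(len(json.dumps(trails)), "chars as JSON vs", len(table), "chars as table")
```

Character counts are only a rough proxy for tokens, but the gap comes from the same place: each key is written once instead of once per row.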
Why Hook-Based Compression Works
The first version of Toonify was an MCP server: you called a tool explicitly to compress content. No one used it consistently.
The insight that unlocked real usage was the PostToolUse hook. Claude Code fires this event after every tool call, before the result enters the context window. Intercepting there means:
- Every `Bash` output gets compressed automatically
- Every `Read` result gets compressed automatically
- No extra commands, no workflow changes, no discipline required
The hook reads the tool output, detects the content type (JSON, CSV, YAML, or debug output), compresses if savings exceed a 30% threshold, and passes the result through unchanged if it doesn’t. Under 50 tokens: pass through. Compression would break the content: pass through. Any error during processing: pass through. The implementation never breaks the workflow.
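The pass-through rules above reduce to a few guard clauses. The sketch below assumes a rough four-characters-per-token estimate instead of a real tokenizer and takes the compressor as a parameter; `process_output` and `compact_json` are hypothetical names, and the hook's actual input and output handling is omitted.

```python
import json
from typing import Callable

MIN_TOKENS = 50      # from the article: shorter output passes through
MIN_SAVINGS = 0.30   # from the article: compress only above 30% savings

def estimate_tokens(text: str) -> int:
    # Assumption: ~4 characters per token; a real hook would use a tokenizer.
    return max(1, len(text) // 4)

def process_output(text: str, compress: Callable[[str], str]) -> str:
    """Return the compressed text, or the original text whenever a guard fails."""
    try:
        before = estimate_tokens(text)
        if before < MIN_TOKENS:
            return text                  # too short to be worth compressing
        candidate = compress(text)       # may raise if the content can't be encoded
        savings = 1 - estimate_tokens(candidate) / before
        if savings <= MIN_SAVINGS:
            return text                  # gain too small to justify the change
        return candidate
    except Exception:
        return text                      # any error: pass the original through

def compact_json(text: str) -> str:
    # Hypothetical stand-in compressor: re-serialize JSON without whitespace.
    return json.dumps(json.loads(text), separators=(",", ":"))
```

With `compact_json`, pretty-printed JSON comes back compacted, while prose passes through untouched because `json.loads` raises and the `except` branch returns the original.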
This design principle (always fail silently, never interrupt the session) is what made it safe to run the hook automatically on every tool call, with no per-tool opt-in.
The v0.7.0 Expansion: Debug Output Has the Same Problem
Six months after launch, usage patterns revealed a second culprit: debug output.
Long test-failure logs, TypeScript compiler diagnostics, and lint output share JSON's token-waste pattern: repetition that adds no meaning. A TypeScript error reported 40 times, repeating the file path and error code on each of its 40 lines, is structurally identical to a JSON array repeating the same keys across 40 objects.
The v0.7.0 compressor handles this by collapsing repeated diagnostic patterns:
- Identical consecutive lines collapse to one with a `(×N)` count
- Similar TypeScript/lint diagnostics with different file paths collapse to a representative sample with a count
- Stack traces retain the top frames (where the failure is) and collapse the library internals below
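The first two collapsing rules can be sketched with the standard library. This is an illustration, not the v0.7.0 compressor: the function names are hypothetical, `collapse_diagnostics` assumes tsc's plain (non-pretty) `file(line,col): error TSxxxx: message` line format and groups by error code plus message, and stack-trace trimming is left out.

```python
import re
from itertools import groupby

def collapse_consecutive(lines: list[str]) -> list[str]:
    """Replace each run of identical consecutive lines with one line plus a (×N) count."""
    out: list[str] = []
    for line, run in groupby(lines):
        n = sum(1 for _ in run)
        out.append(f"{line} (×{n})" if n > 1 else line)
    return out

# Assumed input: tsc's plain (non-pretty) diagnostics, e.g.
#   src/a.ts(12,5): error TS2322: Type 'string' is not assignable to type 'number'.
TSC_LINE = re.compile(r"^(?P<path>.+?)\((?P<pos>\d+,\d+)\): error (?P<code>TS\d+): (?P<msg>.*)$")

def collapse_diagnostics(lines: list[str]) -> list[str]:
    """Keep the first line for each (error code, message) pair as the representative
    sample and append a total count; lines that don't match pass through unchanged."""
    first_index: dict[tuple[str, str], int] = {}
    counts: dict[tuple[str, str], int] = {}
    out: list[str] = []
    for line in lines:
        m = TSC_LINE.match(line)
        if m is None:
            out.append(line)
            continue
        key = (m["code"], m["msg"])
        counts[key] = counts.get(key, 0) + 1
        if key not in first_index:
            first_index[key] = len(out)   # remember where the sample landed
            out.append(line)
    for key, idx in first_index.items():
        if counts[key] > 1:
            out[idx] += f" (×{counts[key]} total)"
    return out
```

Grouping on the code and message rather than the full line is what lets diagnostics from different files fold into one sample; the first file and position stay visible.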
In practice this makes test-failure context significantly shorter without losing the information that matters for debugging: which file, which line, which error.
When Compression Helps Most (and Least)
High compression (50–65%): Uniform arrays of objects, such as API responses, database query results, and log entries with shared schemas. The key-repetition problem is largest here.
Medium compression (42–48%): Nested JSON, multilingual content (where TOON handles Unicode correctly), and large JSON files.
Low compression (24%): CSV-like data that’s already column-structured. TOON still removes quoting overhead and normalizes whitespace, but the gains are smaller because CSV is already relatively compact.
Skip entirely: Short content under 50 tokens, free-form prose, content where exact original formatting must be preserved (binary data, certain configs). The evaluator skips these rather than risking corruption.
The Caching Layer
In real-world AI coding sessions, the same content appears repeatedly: you read the same config file, run the same test suite, get the same API response. Rather than re-compressing identical content on every hook firing, Toonify caches compression results in a local LRU cache with a one-hour TTL.
The cache is keyed by content hash. A cache hit means near-zero overhead per call: compression work happens once per unique piece of content within the one-hour TTL, not on every call.
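A content-hash LRU cache with a TTL fits in a few dozen lines. In this sketch, `CompressionCache` and its `max_entries` capacity are assumptions (the article specifies only the content-hash key, LRU eviction, and the one-hour TTL), and the clock is injectable so expiry can be tested without waiting an hour.

```python
import hashlib
import time
from collections import OrderedDict
from typing import Callable

class CompressionCache:
    """LRU cache with a per-entry TTL, keyed by a SHA-256 hash of the content.

    Hypothetical sketch: max_entries is an assumed parameter; the one-hour TTL
    default comes from the article.
    """

    def __init__(self, max_entries: int = 256, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: "OrderedDict[str, tuple[float, str]]" = OrderedDict()

    @staticmethod
    def key(content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def get_or_compute(self, content: str, compress: Callable[[str], str]) -> str:
        k = self.key(content)
        now = self.clock()
        hit = self._entries.get(k)
        if hit is not None and now - hit[0] < self.ttl:
            self._entries.move_to_end(k)         # mark as most recently used
            return hit[1]
        result = compress(content)               # miss or expired: do the work once
        self._entries[k] = (now, result)
        self._entries.move_to_end(k)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)    # evict the least recently used entry
        return result
```

Because an in-memory dictionary disappears when its process exits, a hook that starts a fresh process on every tool call would need to persist entries (for example, to a local file) for hits to happen across calls.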
Field Note
The 30–65% range in the original project description came from early benchmark averages. Broader testing across edge cases (very small payloads, already-compact CSVs, multilingual content) produced the current measured average of 48.1%, which is more representative of real-world sessions.
The design constraint that shaped every tradeoff: token savings matter only if the compression is transparent. A developer who has to think about when to compress pays a cognitive cost that cancels out most of the savings.
FURTHER QUESTIONS
- At what context window size does compression overhead (parsing + encoding time) become the bottleneck vs. the savings it produces?
- Does TOON encoding degrade model accuracy on structured-data reasoning tasks, or does the model handle positional notation as well as JSON?