Solo Builders 2026-05-05

The Context Window Tax: How 48% of AI Tool Output Is Wasted Tokens

Every AI coding session has a hidden token drain: large tool output from JSON responses, file reads, and test failures fills the context window with structure the model doesn't need. Measured across real-world sessions, automatic compression via TOON notation and debug-output collapsing recovers 24–66% of that spend.

TL;DR

AI coding assistants burn tokens on structure, not meaning. A 200-row JSON API response sends thousands of tokens of brackets, quotes, and repeated key names that add nothing to the model's reasoning. In measured runs across JSON, CSV, YAML, and debug output:

Content Type | Token Savings
Medium JSON | 65.0%
API-shaped responses | 50.3%
Nested objects | 48.2%
Multilingual content | 46.3%
Large JSON | 42.6%
CSV-like data | 24.5%
Average (48 cases) | 48.1%

Source: Toonify benchmark run, April 30, 2026.

The mechanism: intercept tool output via a PostToolUse hook before it reaches the context window. No changes to how you use Claude Code.


The Problem: JSON Is the Wrong Format for LLM Input

JSON was designed for machine-to-machine serialization. Every value is wrapped in quotes, every key is repeated for every object in an array, every nesting level adds brackets and commas. For a program parsing JSON, that structure is the contract. For an LLM reading it, most of that structure is noise.

Consider a 10-row API response:

[
  {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5, "elevationGain": 320},
  {"id": 2, "name": "Royal Arch", "distanceKm": 5.2, "elevationGain": 490},
  ...8 more rows
]

The keys id, name, distanceKm, and elevationGain appear once per row. For 10 rows, that’s 40 key occurrences, of which 36 are pure repetition. The model doesn’t need that repetition to understand the schema: it needs the schema once, then the values.

TOON (Token-Oriented Object Notation) addresses this directly. For uniform arrays of objects, it uses a CSV-style tabular layout: declare the headers once, then list values row by row:

id | name | distanceKm | elevationGain
1 | Blue Lake Trail | 7.5 | 320
2 | Royal Arch | 5.2 | 490

The data model is lossless — TOON is a drop-in representation of JSON, not a lossy summary. The difference is structural overhead: TOON strips what exists for parsers, not for models.
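To make the "headers once, then values" idea concrete, here is a minimal Python sketch of a round trip between a uniform array of objects and the pipe-separated layout shown above. It is an illustration only, not Toonify's encoder and not the full TOON syntax: the function names `to_table` and `from_table`, the quoting rules, and the `\u007c` escape for pipes inside quoted cells are all assumptions made for this example, and it assumes key names contain no pipes or newlines.

```python
import json

def _needs_quotes(s):
    # Quote strings that would be ambiguous as bare cells (hypothetical rule).
    if s == "" or s != s.strip() or "|" in s or "\n" in s:
        return True
    try:
        json.loads(s)  # would be misread as a number, bool, null, or quoted text
        return True
    except ValueError:
        return False

def _encode_cell(value):
    if isinstance(value, str) and not _needs_quotes(value):
        return value
    # JSON-encode everything else; escape pipes so cells never contain the delimiter.
    return json.dumps(value, ensure_ascii=False).replace("|", "\\u007c")

def _decode_cell(cell):
    try:
        return json.loads(cell)
    except ValueError:
        return cell  # a bare, unquoted string

def to_table(rows):
    """Encode a non-empty, uniform list of dicts: one header line, then one line per row."""
    headers = list(rows[0].keys())
    if any(list(r.keys()) != headers for r in rows):
        raise ValueError("rows are not uniform; the tabular layout does not apply")
    lines = [" | ".join(headers)]
    lines += [" | ".join(_encode_cell(r[h]) for h in headers) for r in rows]
    return "\n".join(lines)

def from_table(text):
    """Decode the layout produced by to_table back into a list of dicts."""
    header_line, *value_lines = text.split("\n")
    headers = header_line.split(" | ")
    return [
        dict(zip(headers, (_decode_cell(c) for c in line.split(" | "))))
        for line in value_lines
    ]
```

Calling `from_table(to_table(rows))` on the two trail rows above returns the original list, which is the property the "lossless" claim depends on: values that could be misread, such as the string "7" or a name containing a pipe, are quoted rather than dropped or coerced.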


Why Hook-Based Compression Works

The first version of Toonify was an MCP server: you called a tool explicitly to compress content. No one used it consistently.

The insight that unlocked real usage was the PostToolUse hook. Claude Code fires this event after every tool call, before the result enters the context window. Intercepting there means:

  • Every Bash output gets compressed automatically
  • Every Read result gets compressed automatically
  • No extra commands, no workflow changes, no discipline required

The hook reads the tool output, detects the content type (JSON, CSV, YAML, or debug output), and compresses only when the savings exceed a 30% threshold; otherwise it passes the result through unchanged. Under 50 tokens: pass through. Compression would break the content: pass through. Any error during processing: pass through. The implementation never breaks the workflow.

This design principle — always fail silently, never interrupt the session — is what made automatic hook execution safe to ship without per-tool opt-in.
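The pass-through rules above fit in a few lines. The following Python sketch shows only the decision logic, under stated assumptions: `maybe_compress` and `estimate_tokens` are hypothetical names, the four-characters-per-token estimate stands in for a real tokenizer, and the wiring to Claude Code's PostToolUse event (reading the hook input, writing the result) is omitted.

```python
MIN_TOKENS = 50      # from the article: shorter outputs pass through
MIN_SAVINGS = 0.30   # from the article: compress only above 30% savings

def estimate_tokens(text):
    # Assumption: roughly 4 characters per token; a real hook would use a tokenizer.
    return max(1, len(text) // 4)

def maybe_compress(output, compress):
    """Return the compressed output only when it is clearly worth it.

    `compress` is any callable that takes the tool output and returns a
    compressed string. Every other path returns the original unchanged.
    """
    try:
        before = estimate_tokens(output)
        if before < MIN_TOKENS:
            return output                      # too small to bother
        candidate = compress(output)
        savings = 1 - estimate_tokens(candidate) / before
        return candidate if savings > MIN_SAVINGS else output
    except Exception:
        return output                          # fail silently, never interrupt
```

The broad `except Exception` is deliberate here: a compressor bug should cost some tokens, not the session.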


The v0.7.0 Expansion: Debug Output Has the Same Problem

Six months after launch, usage patterns revealed a second culprit: debug output.

Long test failure logs, TypeScript compiler diagnostics, and lint output share the same token-waste pattern as JSON — repetition that isn’t meaning. A TypeScript error that repeats the file path and error code on 40 lines for 40 instances of the same problem is structurally identical to a JSON array repeating 40 object keys.

The v0.7.0 compressor handles this by collapsing repeated diagnostic patterns:

  • Identical consecutive lines collapse to one with a (×N) count
  • Similar TypeScript/lint diagnostics with different file paths collapse to a representative sample with a count
  • Stack traces retain the top frames (where the failure is) and collapse the library internals below

In practice this makes test-failure context significantly shorter without losing the information that matters for debugging — which file, which line, which error.
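Two of the collapsing rules above can be sketched in Python as follows. This is an illustration under assumptions, not the v0.7.0 implementation: the function names are hypothetical, the regular expression covers only the plain `path(line,col): error TSxxxx: message` form of TypeScript compiler output, and the wording of the count suffix is invented for the example.

```python
import re
from itertools import groupby

def collapse_repeats(text):
    """Collapse runs of identical consecutive lines into one line with a (×N) count."""
    out = []
    for line, group in groupby(text.split("\n")):
        count = sum(1 for _ in group)
        if count == 1 or not line.strip():
            out.extend([line] * count)   # leave single lines and blank runs alone
        else:
            out.append(f"{line} (×{count})")
    return "\n".join(out)

# Assumed diagnostic shape: src/app.ts(10,5): error TS2322: Some message.
TS_DIAG = re.compile(r"^(?P<path>.+?)\((\d+),(\d+)\): error (?P<code>TS\d+): (?P<msg>.*)$")

def collapse_diagnostics(text):
    """Keep the first diagnostic for each (code, message) pair and count the rest."""
    counts, first, slots = {}, {}, []
    for line in text.split("\n"):
        m = TS_DIAG.match(line)
        if m is None:
            slots.append(line)           # not a diagnostic: keep verbatim
            continue
        key = (m["code"], m["msg"])
        if key not in counts:
            counts[key] = 0
            first[key] = line
            slots.append(key)            # placeholder, rendered below
        counts[key] += 1
    out = []
    for slot in slots:
        if isinstance(slot, str):
            out.append(slot)
        elif counts[slot] == 1:
            out.append(first[slot])
        else:
            out.append(f"{first[slot]} (×{counts[slot]}, other locations omitted)")
    return "\n".join(out)
```

The second function trades completeness for length: only the first location survives for each distinct error, so a real compressor might choose to keep a short list of affected files alongside the count.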


When Compression Helps Most (and Least)

High compression (50–66%): Uniform arrays of objects — API responses, database query results, log entries with shared schemas. The key-repetition problem is largest here.

Medium compression (42–48%): Nested JSON, multilingual content (where TOON handles Unicode correctly), large file reads.

Low compression (24%): CSV-like data that’s already column-structured. TOON still removes quoting overhead and normalizes whitespace, but the gains are smaller because CSV is already relatively compact.

Skip entirely: Short content under 50 tokens, free-form prose, content where exact original formatting must be preserved (binary data, certain configs). The evaluator skips these rather than risking corruption.


The Caching Layer

In real-world AI coding sessions, the same content appears repeatedly: you read the same config file, run the same test suite, and get the same API response. Rather than re-compressing identical content on every hook firing, Toonify caches compression results in a local LRU cache with a one-hour TTL.

The cache is keyed by content hash. A cache hit means near-zero overhead per call: compression work happens once per unique piece of content until its entry expires or is evicted.
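A content-hash cache with LRU eviction and a TTL can be sketched in Python as below. The class name `CompressionCache`, the SHA-256 key, and the 256-entry limit are assumptions for illustration; only the one-hour TTL comes from the description above. The sketch is in-memory, whereas a hook that runs as a fresh process on each call would need to persist entries somewhere, which is not shown.

```python
import hashlib
import time
from collections import OrderedDict

class CompressionCache:
    """LRU cache with a TTL, keyed by a hash of the uncompressed content."""

    def __init__(self, max_entries=256, ttl_seconds=3600, clock=time.monotonic):
        self._entries = OrderedDict()   # key -> (expires_at, compressed)
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock             # injectable so expiry can be tested

    @staticmethod
    def _key(content):
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def get_or_compute(self, content, compress):
        key = self._key(content)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            self._entries.move_to_end(key)      # mark as most recently used
            return hit[1]
        result = compress(content)              # miss or expired: do the work once
        self._entries[key] = (now + self._ttl, result)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)   # evict least recently used
        return result
```

Hashing the content rather than the file path or command means that an edited file or a changed test result misses the cache automatically, so stale compressed output is never served for new content.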


Field Note

The 30–65% range in the original project description came from early benchmark averages. After broader testing across edge cases — very small payloads, already-compact CSVs, multilingual content — the current measured average of 48.1% is more representative of real-world sessions.

The design constraint that shaped every tradeoff: token savings matter only if the compression is transparent. A developer who has to think about when to compress has already paid most of the cognitive cost back.

FURTHER QUESTIONS

  • At what context window size does compression overhead (parsing + encoding time) become the bottleneck vs. the savings it produces?
  • Does TOON encoding degrade model accuracy on structured-data reasoning tasks, or does the model handle positional notation as well as JSON?