3 hello messages and your Claude Code quota is gone? A cache bug spanning 28 days, along with an official response telling you to "use it sparingly."
Original title: 3 hellos and you’re capped—where did your Claude Code credits go? A cache bug spanning 28 days, and an official response telling you to “use it more sparingly”
Original author: LvDong BlockBeats
Original source:
Reprint: Mars Finance
4-17%. This is the prompt cache read rate of Claude Code over the past month. Normal levels are 97-99%.
This means that when you resume a prior session, Claude Code doesn't reuse the context that was already processed; instead it reprocesses the entire conversation from scratch, consuming 10 to 20 times more credits than normal. You think you're continuing a conversation, when in reality every resume is effectively a brand-new, full-price conversation.
The figure comes from real-world proxy monitoring by independent developer ArkNill. By setting up a transparent proxy, he recorded every request between Claude Code and the Anthropic API, and found that at least two client-side cache bugs prevented the API server from matching cached conversation prefixes, forcing it to reprocess, and bill, the full prompt on every turn.
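The metric ArkNill's proxy reports can be sketched from the token counts the API returns with each response. The field names below follow the Anthropic Messages API `usage` object; treat them as an assumption here and verify against the responses you are actually proxying. The numbers are illustrative, not ArkNill's data.

```python
# Sketch: cache read rate = cached-prefix tokens / all prompt tokens processed.

def cache_read_rate(usage: dict) -> float:
    """Fraction of the prompt served from the prompt cache for one request."""
    read = usage.get("cache_read_input_tokens", 0)       # cheap: prefix reused
    written = usage.get("cache_creation_input_tokens", 0)  # full price + cache write
    uncached = usage.get("input_tokens", 0)              # full price, not cached
    total = read + written + uncached
    return read / total if total else 0.0

# Healthy resumed session: almost the whole conversation hits the cache.
healthy = {"input_tokens": 900,
           "cache_read_input_tokens": 480_000,
           "cache_creation_input_tokens": 2_000}

# Bug behaviour: only the ~14.5K-token system prompt is read from cache;
# the entire conversation history is rebuilt at full price.
buggy = {"input_tokens": 900,
         "cache_read_input_tokens": 14_500,
         "cache_creation_input_tokens": 470_000}

print(f"healthy: {cache_read_rate(healthy):.1%}")  # ~99%
print(f"buggy:   {cache_read_rate(buggy):.1%}")    # ~3%, in the 4-17% ballpark
```

With these made-up but realistic counts, the healthy session lands in the 97-99% band the article cites as normal, while the buggy session falls to single digits.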
The chart above shows a comparison of cache read rates across three phases. Between v2.1.69 and v2.1.89 (the period when the bug existed), the cache read rate of the standalone version was only 4-17%. After v2.1.90 fixed one of the key bugs, the cold-start cache read rate returned to 47-99.7%. By v2.1.91, under stable operation, the cache read rate recovered to 97-99%.
Notably, one detail in the chart stands out: the range for v2.1.90 is very wide (47% to 99.7%). This is because when a session is first restored, the cache still needs to "warm up": the hit rate in the first few turns is low, then quickly returns to normal. In the bug versions, that warm-up never completes; only the roughly 14,500 tokens of the system prompt ever hit the cache, and the entire conversation history is billed at full price every time.
28 days, 20 versions
This bug wasn’t introduced in one update and fixed in the next. According to the release records on the npm registry, the bug-introducing version v2.1.69 was published on March 4, and the bug-fixing version v2.1.90 was published on April 1. That’s a span of 28 days, crossing 20 versions in between.
The timeline reveals an intriguing detail. After the bug was introduced on March 4, users didn't complain en masse right away; complaints only erupted on March 23, nearly three weeks later. The reason, according to a breakdown in GitHub issue #41930, is that from March 13 to 28 Anthropic was running a 2x credits promotion (double quota during off-peak periods), which in practice masked the bug's impact. Once the promotion ended and consumption was billed at the normal baseline again, users' credits "evaporated" almost instantly.
Anthropic’s response didn’t come quickly. On March 26—three days after user complaints erupted—engineer Thariq Shihipar announced on his personal X account that the quota during peak hours (weekdays 5am-11am PT) had been tightened. On March 30, Anthropic acknowledged on Reddit that “the rate at which users hit the limit is far beyond expectations,” saying it had been listed as the team’s highest priority. It wasn’t until April 1 that team member Lydia Hallie published the formal investigation conclusion.
Throughout the entire process, Anthropic published no blog posts, sent no email notifications, and didn’t update a status page. All official communications were completed only through engineers’ personal social media posts and a handful of Reddit comments.
How much did you pay, and how long can it last?
GitHub issue #41930 gathers hundreds of user reports. The most extreme case involves a Max 20x subscriber ($200/month) whose 5-hour rolling window was completely exhausted within 19 minutes. Max 5x users ($100/month) reported the 5-hour window gone within 90 minutes. According to The Letter Two, some users claimed that even a simple "hello" consumed 13% of their session quota. A Pro user ($20/month) said on Discord that his credits were "used up every Monday, and reset on Saturday," leaving him only 12 usable days out of 30.
According to ArkNill's benchmarks, on the bug version v2.1.89 the Max 20x plan's full allocation would be exhausted in about 70 minutes. He also calculated that a single --resume of a 500K-token session costs about $0.15 of quota, because the system replays the entire context.
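The back-of-the-envelope math behind a figure like $0.15 per resume is straightforward, and it also shows where the article's 10-20x multiplier comes from. The per-million-token prices below are assumed Sonnet-class list prices (cache reads typically cost about a tenth of uncached input); substitute the real rates for your model and plan.

```python
# Cost of one --resume of a 500K-token session, with vs. without a cache hit.
FULL_PRICE_PER_MTOK = 3.00   # assumed: uncached input tokens, $/million
CACHE_READ_PER_MTOK = 0.30   # assumed: cached prefix reads, ~10% of full price

context_tokens = 500_000

cost_cache_hit = context_tokens / 1_000_000 * CACHE_READ_PER_MTOK
cost_cache_miss = context_tokens / 1_000_000 * FULL_PRICE_PER_MTOK

print(f"resume with cache hit:  ${cost_cache_hit:.2f}")   # $0.15
print(f"resume with cache miss: ${cost_cache_miss:.2f}")  # $1.50
print(f"overcharge factor: {cost_cache_miss / cost_cache_hit:.0f}x")
```

Under these assumed rates, a cached resume of 500K tokens comes to exactly $0.15, and a full cache miss costs ten times as much, consistent with the 10-20x credit burn the monitoring data showed.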
“You’re using it the wrong way”
Lydia Hallie’s investigation conclusion confirmed two points: first, the quota during peak hours really has been tightened; second, the session consumption for 1 million token contexts increased. She said the team fixed some bugs, but emphasized that “no single bug caused overcharging.”
Then she offered four tips for spending less:
1. Use Sonnet 4.6 instead of Opus (Opus burns quota at roughly twice the rate);
2. When deep reasoning isn't needed, lower the reasoning effort or turn off extended thinking;
3. Don't resume sessions that have been idle for more than an hour; start a new one;
4. Set the environment variable CLAUDE_CODE_AUTO_COMPACT_WINDOW=200000 to limit the context window size.
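The fourth tip is a one-line shell setting; add it to your shell profile to make it persistent. The variable name and value are the ones given in the official response; how aggressively it compacts beyond capping the window at 200K tokens is not documented in the article.

```shell
# Cap Claude Code's auto-compact window at 200K tokens before launching it.
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=200000
```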
No mention was made of any form of quota reset or compensation.
AI podcast host Alex Volkov summarized this response as "You're holding it wrong," pointing out that Anthropic itself made the 1-million-token context the default, promoted Opus as the flagship model, and positioned extended thinking as a selling point, yet now advises paying users not to use any of them.
The claim of “no overcharging” also conflicts with Claude Code’s own update history. The day before Lydia published her response, v2.1.90 fixed a cache regression bug that had existed since v2.1.69: when using --resume to restore a session, a request that should have hit the cache would trigger a full prompt cache miss, billed at full price. Lydia’s response didn’t mention this confirmed billing anomaly.
By comparison, OpenAI’s Codex also previously had similar issues with abnormal quota consumption. OpenAI’s approach was to reset user quotas, resend credits, and announce in March that it would remove Codex’s usage cap. Anthropic’s approach is to advise users to downgrade models, turn off features, limit context—and attribute the responsibility to users’ usage patterns.
Anthropic sells a subscription that offers “the strongest model + the largest context + the highest reasoning capability,” charging $20 to $200 per month. A cache bug spanning 28 days makes paid users’ credits “evaporate” at 10–20 times the normal speed, and the official response is to tell you to use it more sparingly.