Ant Engineer reverse-engineers Claude code source, revealing the four-layer decision pipeline mechanism of Auto Mode

Gate News, March 25 — Ant Group engineer and Umi.js front-end framework author Chen Cheng reverse-engineered the source code of Claude Code 2.1.81, fully restoring the decision mechanism of Auto Mode. The key finding: each tool invocation passes through four layers of decision-making, and only when the first three layers cannot determine the outcome will an independent AI classifier be called for safety review.

The four layers of the pipeline are: first, checking existing permission rules; if matched, allow directly; second, simulating acceptEdits mode (permission level allowing file edits). If it passes in this mode, it indicates low risk, and the classifier is skipped; third, checking the read-only tool whitelist (Read, Grep, Glob, LSP, WebSearch, etc.), which do not modify any state and are unconditionally allowed; only if none of these conditions are met does it proceed to the fourth layer, sending an API request to Claude Sonnet for safety classification.

Key design details of the classifier include: always using Sonnet instead of Opus, balancing cost and latency; setting temperature to 0 to ensure deterministic output; the classifier is defined as a “safety monitor for autonomous AI programming agents,” protecting against three types of risks (prompt injection, scope creep, unintended harm); the user’s CLAUDE.md configuration file is injected into the classifier context as the basis for judging user intent.

The interception rules cover over 22 categories, including force push, direct push to main branch, downloading and executing external code, production deployment, data leaks, self-modification permissions, creating remote code execution surfaces, credential leaks, etc. Exceptions for allowing include seven types: hardcoded test keys, local file operations within the working directory, read-only GET requests, installing declared dependencies, official toolchain installations, reading configuration credentials sent to target providers, and pushing to the current working branch.

The system also implements a circuit breaker mechanism: after three consecutive rejections or a total of 20 rejections, the system downgrades to manual confirmation; in headless mode, it directly aborts the agent. When the classifier is unavailable, a feature flag controls whether it “fail-closed” (immediately reject) or “fail-open” (downgrade to manual confirmation).

In Auto Mode, prompt injection behavior is finely controlled: injected once every five dialogue rounds, with the first injection in each five-round cycle being the full version (about 800 words, including six instructions such as “execute immediately,” “reduce interruptions,” “action over plan”), and the remaining four being a concise one-line version, balancing context window usage and behavioral stability.

Disclaimer: The information on this page may come from third parties and does not represent the views or opinions of Gate. The content displayed on this page is for reference only and does not constitute any financial, investment, or legal advice. Gate does not guarantee the accuracy or completeness of the information and shall not be liable for any losses arising from the use of this information. Virtual asset investments carry high risks and are subject to significant price volatility. You may lose all of your invested principal. Please fully understand the relevant risks and make prudent decisions based on your own financial situation and risk tolerance. For details, please refer to Disclaimer.

Related Articles

a16z latest report: Why blockchain is the missing infrastructure piece that AI agents need?

a16z crypto’s latest report says that AI agents are evolving from support tools into economic actors, yet there are still major gaps in core infrastructure such as identity, payments, and cross-platform collaboration. The report emphasizes that as AI becomes involved in governance and transactions, verification mechanisms become the key to trust, and blockchain technology can provide verifiable infrastructure to address these challenges. The future will require cryptographic mechanisms to ensure that AI agents truly represent users’ intent and to change traditional payment systems.

ChainNewsAbmedia7h ago

Three Major Platforms Control 75% of Stock Perpetual Futures Market in Q1 2026

TokenInsight’s Q1 2026 report reveals that the stock perpetual futures market is dominated by a few top platforms, which collectively hold about 75% market share. Major exchanges are increasingly offering U.S. stock and finance products to enhance cross-asset trading.

GateNews14h ago

Cross-Asset Hedging Emerges as Mainstream Strategy, Q1 Report Shows

A report by Block Scholes reveals rising correlations between crypto assets and traditional markets, noting increased demand for unified trading platforms as traders manage diverse assets. Trading volumes have surged, reflecting a shift towards multi-asset strategies.

GateNews14h ago

Digital Asset Investment Products Record $1.4B Net Inflows Last Week, Highest Since January

CoinShares reported $1.4 billion in net inflows for digital asset investment products last week, marking the largest increase since January. Bitcoin led with $1.116 billion, while Ethereum saw $328 million inflow. The U.S. contributed significantly, though Switzerland experienced outflows.

GateNews14h ago

DeFi hackers stole $600 million in April; Kelp DAO and Drift accounted for 95% of the monthly losses

In April 2026, within just 20 days, cryptocurrency protocols suffered losses of more than $606 million due to hacker attacks, becoming the worst single-month loss record since the February 2025 exchange incident in which $1.4 billion in data was leaked. The two attacks by KelpDAO and Drift Protocol accounted for 95% of April’s losses, and 75% of the total $771.8 million losses as of now in 2026.

MarketWhisper18h ago

Moody's: Stablecoin Market Exceeds $315.8B, but Near-Term Bank Threat Remains Limited

Moody's report shows stablecoins have reached a market value of $315.8 billion, primarily dominated by USDT. While near-term risks to banks are limited due to narrow adoption and regulation, long-term growth may challenge traditional banking.

GateNews19h ago
Comment
0/400
No comments