# Curated · X (Twitter) 🔥

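> Author: Mnimiy (@Mnilax) · Platform: X (Twitter) · Date: 2026-05-10

> Original source: https://x.com/Mnilax/status/2053116311132155938

## Summary

# Karpathy's 4 CLAUDE.md rules cut Claude's error rate from 41% to 11%. After testing on 30 codebases, I added 8 more

In late January 2026, Andrej Karpathy posted a thread complaining about the way Claude writes code. He identified three failure modes: silently making wrong assumptions, overcomplicating, and causing collateral damage to code that should not have been touched.

Forrest Chang read the thread, distilled the complaints into 4 behavioral rules, wrote them into a single CLAUDE.md file, and published it on GitHub. It earned 5,828 stars on its first day, accumulated 60,000 bookmarks within two weeks, and has 120,000 stars to date. It is the fastest-growing single-file repository of 2026.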

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/1778374101514-iaHH4cAobXQAcrmekjpg.jpg)

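I then tested this rule set across 30 codebases over 6 weeks.

The 4 rules do work. On the tasks that play to their strengths, an error rate of roughly 40% dropped to under 3%. But the template was designed to fix the code-writing errors of January 2026.

The Claude Code ecosystem of May 2026 faces different problems: agent conflicts, hook cascades, skill-loading conflicts, and multi-step workflows that break across sessions.

So I added 8 rules. Below is the complete 12-rule CLAUDE.md, with an explanation of why each rule exists and where the original Karpathy template silently falls short.

If you want to skip the explanation and just copy it, the complete file is at the end.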

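# Why this matters

Claude Code's CLAUDE.md is the most underrated file in the entire AI coding stack. Most developers do one of three things:

- Treat it as a dumping ground for every preference, bloating it past 4,000 tokens, at which point compliance drops to 30%.

- Ignore it entirely and re-prompt every time, wasting 5x the tokens and losing consistency between sessions.

- Copy a template once and forget about it. That works for two weeks, then silently breaks as the codebase changes.

Anthropic's official documentation states explicitly that CLAUDE.md is advisory. Claude follows it about 80% of the time. Past 200 lines, compliance drops sharply because the important rules get buried in noise.

Karpathy's template addresses this with a 65-line file containing 4 rules. That is the baseline.

The ceiling can be higher. With the 8 additional rules described below, you cover not only the January 2026 code-writing problems Karpathy complained about, but also the May 2026 agent-orchestration problems that did not yet exist when the template was written.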

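# The original 4 rules

If you have not read Forrest Chang's repository yet, this is the baseline:

Rule 1: Think before coding.
No silent assumptions. State your assumptions. Surface trade-offs. Ask before guessing. Push back when a simpler approach exists.

Rule 2: Simplicity first.
Solve the problem with the minimum code. No speculative features. No abstractions for single-use code. If a senior engineer would find it overcomplicated, simplify it.

Rule 3: Surgical changes.
Touch only what must be touched. Do not "improve" adjacent code, comments, or formatting. Do not refactor what is not broken. Match the existing style.

Rule 4: Goal-driven execution.
Define success criteria. Loop until verified. Do not tell Claude which steps to follow; tell it what success looks like and let it iterate.

These four rules address about 40% of the failure modes I observed in unsupervised Claude Code sessions. The remaining roughly 60% live in the gaps below.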

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/1778374101525-iaHH4cOb0WMAEYReOpng.png)

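# The 8 rules I added (and why)

Each one comes from a real moment when Karpathy's 4 rules were not enough. I will show the scenario, then give the rule.

## Rule 5: Don't make the model do non-language work

Karpathy's rules say nothing about this. The model ends up deciding things that deterministic code should handle, such as whether to retry an API call, how to route a message, or when to escalate. The decision comes out differently from week to week. At a cost of $0.003 per token, this is if-else logic that behaves unreliably.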

```
## Rule 5 — Use the model only for judgment calls
Use Claude for: classification, drafting, summarization, extraction from unstructured text.
Do NOT use Claude for: routing, retries, status-code handling, deterministic transforms.
If a status code already answers the question, plain code answers the question.
```

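Scenario: a piece of code that called Claude to "decide whether to retry on a 503 error" worked fine for the first two weeks, then became flaky because the model started reading the request body as context for its decision. The retry policy became random because the prompt was random.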
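As a rough illustration of Rule 5, the retry decision from the scenario can live in plain code. This is a sketch under stated assumptions, not code from the original post: the set of retryable status codes, the attempt limit, and the function names `should_retry` and `backoff_seconds` are all hypothetical choices.

```python
# Hypothetical retry policy: the status set and limits are illustrative assumptions.
RETRYABLE_STATUS = {429, 502, 503, 504}


def should_retry(status_code: int, attempt: int, max_attempts: int = 3) -> bool:
    """Decide from the status code and attempt count only.

    The request body is never consulted, so the same inputs always
    produce the same decision.
    """
    return status_code in RETRYABLE_STATUS and attempt < max_attempts


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff without jitter, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))
```

The point of the design is that nothing here depends on a prompt: a 503 on the first attempt is retried every time, and a 404 never is.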

## Rule 6: Hard token budgets, no exceptions

A CLAUDE.md without a budget is a blank check. Every loop can turn into a 50,000-token context dump. The model will not stop on its own.

```
## Rule 6 — Token budgets are not advisory
Per-task budget: 4,000 tokens.
Per-session budget: 30,000 tokens.
If a task is approaching budget, summarize and start fresh. Do not push through.
Surfacing the breach > silently overrunning.
```

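Scenario: a debugging session ran for 90 minutes. The model happily kept iterating on the same 8KB error message, gradually forgetting which fixes it had already tried. By the end, it was proposing a fix I had rejected 40 messages earlier. A token budget would have ended the session at minute 12.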
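Since CLAUDE.md is advisory, one option is to enforce the same budget in whatever harness drives the session. The following is a minimal sketch, assuming the caller can supply a token count for each model call (the post does not say how tokens are counted). The names `TokenBudget` and `BudgetExceeded` are hypothetical; the limits are the ones quoted in Rule 6.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised as soon as a task or session goes over its budget."""


@dataclass
class TokenBudget:
    per_task: int = 4_000       # per-task limit from Rule 6
    per_session: int = 30_000   # per-session limit from Rule 6
    task_used: int = 0
    session_used: int = 0

    def charge(self, tokens: int) -> None:
        """Record usage and surface a breach instead of silently overrunning."""
        self.task_used += tokens
        self.session_used += tokens
        if self.task_used > self.per_task:
            raise BudgetExceeded(f"task: {self.task_used}/{self.per_task} tokens")
        if self.session_used > self.per_session:
            raise BudgetExceeded(f"session: {self.session_used}/{self.per_session} tokens")

    def start_task(self) -> None:
        """Reset the per-task counter; session usage keeps accumulating."""
        self.task_used = 0
```

A caller would catch `BudgetExceeded`, ask for a summary, and start a fresh task rather than pushing through.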

## Rule 7: Surface conflicts, don't average them

When two parts of a codebase disagree, Claude tries to please both. The result is inconsistent code.

```
## Rule 7 — Surface conflicts, don't average them
If two existing patterns in the codebase contradict, don't blend them.
Pick one (the more recent / more tested), explain why, and flag the other for cleanup.
"Average" code that satisfies both rules is the worst code.
```

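Scenario: a codebase had two error-handling patterns, one using async/await with explicit try/catch and the other using a global error boundary. The new code Claude wrote used both. The error handlers were duplicated. It took me 30 minutes to work out why errors were being swallowed twice.

## Rule 8: Read before you write

Karpathy's "surgical changes" tells Claude not to touch adjacent code. It does not tell Claude to understand the adjacent code first. Without this rule, Claude writes new code that conflicts with existing code 30 lines away.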

```
## Rule 8 — Read before you write
Before adding code in a file, read the file's exports, the immediate caller, and any obvious shared utilities.
If you don't understand why existing code is structured the way it is, ask before adding to it.
"Looks orthogonal to me" is the most dangerous phrase in this codebase.
```

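Scenario: Claude added a new function next to an existing function it had not read, and the two were identical. Both did the same thing. Because of import order, the new function took precedence. The old one had been the single source of truth for 6 months.

## Rule 9: Tests are not optional, but they are not the goal either

Karpathy's "goal-driven execution" implies that tests are the success criterion. In practice, Claude treats "tests pass" as the only goal and writes code that passes shallow tests while breaking everything else.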

```
## Rule 9 — Tests verify intent, not just behavior
Every test must encode WHY the behavior matters, not just WHAT it does.
A test like `expect(getUserName()).toBe('John')` is worthless if the function takes a hardcoded ID.
If you can't write a test that would fail when business logic changes, the function is wrong.
```

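Scenario: Claude wrote 12 tests for a validation function. All passed. Validation was still broken in production. The tests only checked that the function returned something, not that it returned the right thing. The function passed because it returned a constant.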
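The gap between a shallow test and an intent-encoding test can be shown in a few lines. This sketch is illustrative only: the validator, its deliberately simplified email rule, and all function names are hypothetical and do not come from the codebase in the scenario.

```python
def is_valid_email_stub(value: str) -> bool:
    """A broken validator that returns a constant, as in the scenario."""
    return True


def is_valid_email(value: str) -> bool:
    """Deliberately simplified rule for illustration; not RFC-compliant."""
    local, sep, domain = value.partition("@")
    return bool(local) and sep == "@" and "." in domain


def shallow_test(validate) -> bool:
    """Checks only that a boolean comes back, so a constant passes."""
    return isinstance(validate("a@example.com"), bool)


def intent_test(validate) -> bool:
    """Encodes why validation exists: good input accepted, bad input rejected."""
    return validate("a@example.com") and not validate("not-an-email")
```

The stub passes `shallow_test` but fails `intent_test`, which is the property Rule 9 asks for: the test fails when the business logic is missing.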

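## Rule 10: Long-running operations need checkpoints

Karpathy's template assumes one-shot interactions. Real Claude Code work is multi-step: refactoring across 20 files, building a feature within one session, debugging across multiple commits. Without checkpoints, one wrong turn loses all progress.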

```
## Rule 10 — Checkpoint after every significant step
After completing each step in a multi-step task: summarize what was done, what's verified, what's left.
Don't continue from a state you can't describe back to me.
If you lose track, stop and restate.
```

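Scenario: a 6-step refactor went wrong at step 4. By the time I noticed, Claude had already completed steps 5 and 6 on top of the broken state. Untangling the mistake took longer than redoing the entire project would have. A checkpoint would have caught it at step 4.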
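The same idea can be mirrored in a script that drives a multi-step task: record what was done, whether it was verified, and what is left, and stop at the first unverified step. This is a sketch under assumptions; the `Checkpoint` record, `run_steps`, and the `verify` callback are hypothetical names, and how a step is actually verified is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    step: int
    done: str                 # what was done
    verified: bool            # whether verification passed
    remaining: list[str] = field(default_factory=list)  # what is left


def run_steps(steps, verify):
    """Run (name, action) pairs in order and stop at the first failed check.

    `verify(name, result)` is a caller-supplied predicate (an assumption here).
    """
    log = []
    for i, (name, action) in enumerate(steps, start=1):
        result = action()
        ok = verify(name, result)
        log.append(Checkpoint(i, name, ok, [n for n, _ in steps[i:]]))
        if not ok:
            break  # never build later steps on an unverified state
    return log
```

In the scenario above, the log would end at step 4 with `verified=False`, and steps 5 and 6 would still be listed as remaining.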

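## Rule 11: Convention over novelty

In a codebase with established patterns, Claude likes to introduce its own. Even when its way is "better", having two patterns is worse than having one.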

```
## Rule 11 — Match the codebase's conventions, even if you disagree
If the codebase uses snake_case and you'd prefer camelCase: snake_case.
If the codebase uses class-based components and you'd prefer hooks: class-based.
Disagreement is a separate conversation. Inside the codebase, conformance > taste.
If you genuinely think the convention is harmful, surface it. Don't fork it silently.
```

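Scenario: Claude introduced React Hooks into a codebase built on class-based components. They worked. They also broke the codebase's testing pattern, because the tests relied on componentDidMount. Removing and rewriting them took most of a day.

## Rule 12: Fail loudly, not silently

The most expensive Claude failures are the ones that look like success. A function "works" but returns the wrong data. A migration "completes" but skips 30 records. A test "passes" only because the assertion was written incorrectly.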

```
## Rule 12 — Fail loud
If you can't be sure something worked, say so explicitly.
"Migration completed" is wrong if 30 records were skipped silently.
"Tests pass" is wrong if you skipped any.
"Feature works" is wrong if you didn't verify the edge case I asked about.
Default to surfacing uncertainty, not hiding it.
```

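Scenario: Claude reported that a database migration had "completed successfully". It had silently skipped the 14% of records that triggered constraint violations. The skips were logged but never surfaced. The problem was discovered 11 days later, when a report looked wrong.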
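One way to make a migration fail loudly is to have it return a report whose status cannot say "completed" while anything was skipped. This is a minimal sketch, not the migration from the scenario: `MigrationReport`, `migrate`, and the use of `ValueError` as a stand-in for a constraint violation are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationReport:
    migrated: int = 0
    skipped: list[tuple[int, str]] = field(default_factory=list)  # (record id, reason)

    @property
    def status(self) -> str:
        """Only report 'completed' when nothing was skipped."""
        if not self.skipped:
            return "completed"
        return f"partial: {len(self.skipped)} skipped"


def migrate(records, insert):
    """Insert each (record_id, row); collect failures instead of hiding them."""
    report = MigrationReport()
    for record_id, row in records:
        try:
            insert(row)
            report.migrated += 1
        except ValueError as exc:  # stand-in for a constraint violation
            report.skipped.append((record_id, str(exc)))
    return report
```

With this shape, skipped records show up in the returned status and the `skipped` list, not only in a log file.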

# Results

I tracked 50 representative tasks across 30 codebases over 6 weeks. Three configurations:

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/1778374101530-diaHH4ciVRX0Awx1ljpg.jpg)

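Error rate = share of tasks that needed correction or a rewrite to match intent. Counted failures include: silent wrong assumptions, over-engineering, collateral damage, silent failures, convention violations, conflict averaging, and missed checkpoints.

Compliance = how often Claude visibly applied the relevant rule when it was applicable.

The interesting result is not the headline drop from 41% to 3%. It is that going from 4 rules to 12 added almost no compliance overhead (78% -> 76%) while cutting the error rate by another 8 percentage points (11% to 3%). The new rules cover failure modes the original 4 leave unaddressed, so they do not compete for the same attention budget.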

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/1778374101512-iaHH4clthW4AMDUczjpg.jpg)

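# Where Karpathy's template silently falls short

The original 4-rule template, taken on its own before any rules are added, falls short in four places:

1. Long-running agent tasks.
Karpathy's rules target the moment Claude writes code. They are silent about what happens when Claude runs a multi-step pipeline. No budget rule. No checkpoint rule. No "fail loud" rule. The pipeline drifts.

2. Cross-codebase consistency.
"Match the existing style" assumes there is only one style. In a monorepo with 12 services, Claude has to choose which style to match. The original rules do not tell it how. It picks at random or averages.

3. Test quality.
"Goal-driven execution" treats "tests pass" as success. It does not say the tests have to be meaningful. The result is tests that check nothing useful while leaving Claude fully confident.

4. Production versus prototyping.
The same 4 rules that protect production code from over-engineering also slow down prototyping, because a prototype genuinely needs 100 lines of speculative scaffolding to find its direction. Karpathy's "simplicity first" over-triggers on early-stage code.

The 8 added rules do not replace Karpathy's 4. They patch the gaps where the original model of work (autocomplete-style coding in January 2026) fails to match the agent-driven, multi-step, cross-codebase work of May 2026.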

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/1778374101520-iaHH4c8xGXAAAtJMGpng.png)

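# What didn't work

Things I tried before settling on these 12 rules:

- Adding rules I saw on Reddit / X.
Most were either rephrasings of Karpathy's 4 rules or domain-specific rules ("always use Tailwind classes") that do not generalize. Cut them.

- More than 12 rules.
I tested up to 18. Past 14, compliance dropped from 76% to 52%. The 200-line ceiling is real. Beyond that length, Claude starts pattern-matching on "rules exist" without actually reading them.

- Rules that depend on tools that may not exist.
"Always use eslint" breaks when eslint is not installed. The rule fails silently. Rephrase in capability-agnostic terms: "match the style the codebase enforces" rather than "use eslint".

- Using examples instead of rules in CLAUDE.md.
Examples are heavier than rules. Three examples consume about as much context as 10 rules, and Claude overfits to the examples. Rules are abstract; examples are concrete. Use rules.

- "Be careful" / "think deeply" / "focus".
Pure noise. Compliance with these drops to about 30% because they are not testable. Replace them with concrete imperatives ("state assumptions explicitly").

- Telling Claude to act like a "senior engineer".
No effect. Claude already thinks it is senior. The compliance gap is between "thinking" and "doing". Imperative rules close that gap; identity prompts do not.

# The complete 12-rule CLAUDE.md (copy-paste ready)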

```
# CLAUDE.md — 12-rule template

These rules apply to every task in this project unless explicitly overridden.
Bias: caution over speed on non-trivial work. Use judgment on trivial tasks.

## Rule 1 — Think Before Coding
State assumptions explicitly. If uncertain, ask rather than guess.
Present multiple interpretations when ambiguity exists.
Push back when a simpler approach exists.
Stop when confused. Name what's unclear.

## Rule 2 — Simplicity First
Minimum code that solves the problem. Nothing speculative.
No features beyond what was asked. No abstractions for single-use code.
Test: would a senior engineer say this is overcomplicated? If yes, simplify.

## Rule 3 — Surgical Changes
Touch only what you must. Clean up only your own mess.
Don't "improve" adjacent code, comments, or formatting.
Don't refactor what isn't broken. Match existing style.

## Rule 4 — Goal-Driven Execution
Define success criteria. Loop until verified.
Don't follow steps. Define success and iterate.
Strong success criteria let you loop independently.

## Rule 5 — Use the model only for judgment calls
Use me for: classification, drafting, summarization, extraction.
Do NOT use me for: routing, retries, deterministic transforms.
If code can answer, code answers.

## Rule 6 — Token budgets are not advisory
Per-task: 4,000 tokens. Per-session: 30,000 tokens.
If approaching budget, summarize and start fresh.
Surface the breach. Do not silently overrun.

## Rule 7 — Surface conflicts, don't average them
If two patterns contradict, pick one (more recent / more tested).
Explain why. Flag the other for cleanup.
Don't blend conflicting patterns.

## Rule 8 — Read before you write
Before adding code, read exports, immediate callers, shared utilities.
"Looks orthogonal" is dangerous. If unsure why code is structured a way, ask.

## Rule 9 — Tests verify intent, not just behavior
Tests must encode WHY behavior matters, not just WHAT it does.
A test that can't fail when business logic changes is wrong.

## Rule 10 — Checkpoint after every significant step
Summarize what was done, what's verified, what's left.
Don't continue from a state you can't describe back.
If you lose track, stop and restate.

## Rule 11 — Match the codebase's conventions, even if you disagree
Conformance > taste inside the codebase.
If you genuinely think a convention is harmful, surface it. Don't fork silently.

## Rule 12 — Fail loud
"Completed" is wrong if anything was skipped silently.
"Tests pass" is wrong if any were skipped.
Default to surfacing uncertainty, not hiding it.
```

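Save this as CLAUDE.md in the repository root. Add project-specific rules (stack, test commands, error patterns) below the 12 rules. Keep the total under 200 lines; beyond that, compliance drops.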
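A small check can keep the file under that length as project-specific rules accumulate. This is a sketch only: the function names are hypothetical, the default path assumes the file sits in the current directory, and the 200-line limit is the figure quoted in this article rather than a value enforced by any tool.

```python
from pathlib import Path

MAX_LINES = 200  # limit quoted in the article, not a tool-enforced value


def count_lines(text: str) -> int:
    """Count lines the way an editor would, ignoring a trailing newline."""
    return len(text.splitlines())


def check_claude_md(path: str = "CLAUDE.md", max_lines: int = MAX_LINES) -> tuple[bool, int]:
    """Return (within_limit, line_count) for the given file."""
    n = count_lines(Path(path).read_text(encoding="utf-8"))
    return n <= max_lines, n
```

Running `check_claude_md()` from the repository root, for example in a pre-commit script, would flag the file once it grows past the limit.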

# How to install

Two steps:

```
# 1. Append Karpathy's 4-rule baseline to your CLAUDE.md
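# -f stops on HTTP errors so an error page is never appended; -sS hides progress but shows errors; -L follows redirects
curl -fsSL https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md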

# 2. Paste rules 5-12 from this article below
```

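Save it in the repository root. The `>>` matters: it appends to your existing CLAUDE.md rather than overwriting any project-specific rules you already have.

# Mental model

CLAUDE.md is not a wish list. It is a behavioral contract that closes the specific failure modes you have observed.

Every rule should answer one question: what mistake does this prevent?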

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/1778374101509-iaHH4dAc1XQAAvxPDjpg.jpg)

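Karpathy's 4 rules prevent the failure modes he saw in January 2026: silent assumptions, over-engineering, collateral damage, and weak success criteria. They are the foundation. Do not skip them.

The 8 rules I added prevent the failure modes that emerged by May 2026: agent loops without budgets, multi-step tasks without checkpoints, tests that verify nothing, and successes that hide silent failures. They are additive.

Your situation may differ. If you do not run multi-step pipelines, Rule 10 matters less. If your codebase has a single consistent style enforced by linting, Rule 11 is redundant. Read the 12 rules, keep the ones that map to mistakes you have actually hit, and delete the rest.

A 6-rule CLAUDE.md tuned to your real failure modes beats a 12-rule version that includes 6 rules you will never need.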

## The End

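Karpathy's January 2026 thread was a complaint. Forrest Chang turned it into 4 rules. 120,000 developers starred it. Most of them are still using just those 4 rules today.

The models have improved. The ecosystem has changed. Multi-step agents, hook cascades, skill loading, cross-codebase work: none of these existed when Karpathy wrote the thread. The 4 rules do not address them. They are not wrong, only incomplete.

Add 8 more rules. Tested for 6 weeks across 30 codebases. Error rate down from 41% to 3%.

> Bookmark this and paste the 12 rules into your CLAUDE.md tonight. If it saves you a week of Claude going down the wrong path, repost it.

Daily Claude optimization tips on Telegram: https://t.me/+_ZWrQN7GuDA3ZDEy

## Tags

Skills, Claude, Tutorials, Open Source, Anthropic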