# Curated · X (Twitter) 🔥🔥🔥🔥

> Author: ClaudeDevs (@ClaudeDevs) · Platform: X (Twitter) · Date: 2026-05-15

> Original source: https://x.com/ClaudeDevs/status/2055069548672631218

## Summary

The Anthropic Claude API introduces prompt caching to cut latency on long prompts.

@ClaudeDevs shares a prompt-caching technique: pre-warming the cache lowers time-to-first-token for medium and long prompts. First, send the system prompt once to write it into the cache (generating no output); subsequent user requests then hit the warm cache. This also applies to Zero Data Retention (ZDR) organizations, since no data is stored after the API responds. Reference: [pre-warming the cache guide](https://platform.claude.com/docs/en/build-with-claude/prompt-caching#pre-warming-the-cache).

**Automatic caching and explicit breakpoints**

Automatic caching is enabled with a single top-level `cache_control` field; the system applies it to the last cacheable block, which suits multi-turn conversations. Explicit breakpoints instead place `cache_control` on individual content blocks, giving fine-grained control.

Automatic caching cURL example (literary analysis):
```bash
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "cache_control": {"type": "ephemeral"},
    "system": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.",
    "messages": [
      {
        "role": "user",
        "content": "Analyze the major themes in Pride and Prejudice."
      }
    ]
  }'
```

Automatic caching CLI example:
```bash
ant messages create --transform usage <<'YAML'
model: claude-opus-4-7
max_tokens: 1024
cache_control:
  type: ephemeral
system: >-
  You are an AI assistant tasked with analyzing literary works. Your goal is
  to provide insightful commentary on themes, characters, and writing style.
messages:
  - role: user
    content: Analyze the major themes in Pride and Prejudice.
YAML
```

Automatic caching Python example:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    cache_control={"type": "ephemeral"},
    system="You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.",
    messages=[
        {
            "role": "user",
            "content": "Analyze the major themes in 'Pride and Prejudice'.",
        }
    ],
)
print(response.usage.model_dump_json())
```

Automatic caching TypeScript example:
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  cache_control: { type: "ephemeral" },
  system:
    "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.",
  messages: [
    {
      role: "user",
      content: "Analyze the major themes in 'Pride and Prejudice'."
    }
  ]
});
console.log(response.usage);
```

Workflow: the system checks whether the prompt prefix up to the cache breakpoint is already cached. On a hit it reuses the cached prefix (cutting both latency and cost); on a miss it processes the full prompt and caches the prefix. By default a cache entry lives 5 minutes, and every use refreshes it for free. The full prefix is cached, covering tools, system, and messages in order up to the marked block. Good fits: many-example prompts, large contexts, repetitive tasks, and long multi-turn conversations.
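
The lookup just described can be sketched as a prefix-hash check with a refreshing TTL. This is a simplified offline model of the documented behavior, not the actual server implementation:

```python
import hashlib

CACHE_TTL_SECONDS = 5 * 60  # default 5-minute lifetime, refreshed on every hit

cache: dict[str, float] = {}  # prefix hash -> expiry time (seconds)

def lookup_prefix(prefix: str, now: float) -> bool:
    """Return True on a cache hit; either way, (re)store the prefix.

    Mirrors the documented behavior: a hit reuses the cached prefix and
    refreshes its TTL for free; a miss processes the full prompt and then
    caches the prefix for subsequent requests.
    """
    key = hashlib.sha256(prefix.encode()).hexdigest()
    hit = key in cache and cache[key] > now
    cache[key] = now + CACHE_TTL_SECONDS  # write on miss, refresh on hit
    return hit

prefix = "tools + system + messages up to the breakpoint"
assert lookup_prefix(prefix, now=0) is False    # first request: cache write
assert lookup_prefix(prefix, now=240) is True   # hit; TTL refreshed to t=540
assert lookup_prefix(prefix, now=480) is True   # still warm thanks to the refresh
assert lookup_prefix(prefix, now=900) is False  # expired at t=780: written anew
```

The third lookup only hits because the second one refreshed the TTL, which is why frequent traffic keeps a 5-minute cache warm indefinitely.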

Multi-turn conversation automatic caching cURL example:
```bash
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "cache_control": {"type": "ephemeral"},
    "system": "You are a helpful assistant that remembers our conversation.",
    "messages": [
      {"role": "user", "content": "My name is Alex. I work on machine learning."},
      {"role": "assistant", "content": "Nice to meet you, Alex! How can I help with your ML work today?"},
      {"role": "user", "content": "What did I say I work on?"}
    ]
  }'
```

Automatic caching places the breakpoint at the last cacheable block, so subsequent requests with the same prefix reuse it automatically.

**Multi-language SDK automatic caching examples**

Go automatic caching example (multi-turn conversation):
```go
response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
	Model:        anthropic.ModelClaudeOpus4_7,
	MaxTokens:    1024,
	CacheControl: anthropic.NewCacheControlEphemeralParam(),
	System: []anthropic.TextBlockParam{
		{Text: "You are a helpful assistant that remembers our conversation."},
	},
	Messages: []anthropic.MessageParam{
		anthropic.NewUserMessage(anthropic.NewTextBlock("My name is Alex. I work on machine learning.")),
		anthropic.NewAssistantMessage(anthropic.NewTextBlock("Nice to meet you, Alex! How can I help with your ML work today?")),
		anthropic.NewUserMessage(anthropic.NewTextBlock("What did I say I work on?")),
	},
})
if err != nil {
	log.Fatal(err)
}
fmt.Println(response.Usage)
```

Java automatic caching example:
```java
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.CacheControlEphemeral;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;

public class AutomaticCachingExample {
    public static void main(String[] args) {
        AnthropicClient client = AnthropicOkHttpClient.fromEnv();
        MessageCreateParams params = MessageCreateParams.builder()
                .model(Model.CLAUDE_OPUS_4_7)
                .maxTokens(1024)
                .cacheControl(CacheControlEphemeral.builder().build())
                .system("You are a helpful assistant that remembers our conversation.")
                .addUserMessage("My name is Alex. I work on machine learning.")
                .addAssistantMessage("Nice to meet you, Alex! How can I help with your ML work today?")
                .addUserMessage("What did I say I work on?")
                .build();
        Message message = client.messages().create(params);
        System.out.println(message.usage());
    }
}
```

PHP automatic caching example:
```php
<?php
use Anthropic\Client;
use Anthropic\Messages\CacheControlEphemeral;
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
$response = $client->messages->create(
    maxTokens: 1024,
    messages: [
        ['role' => 'user', 'content' => 'My name is Alex. I work on machine learning.'],
        ['role' => 'assistant', 'content' => 'Nice to meet you, Alex! How can I help with your ML work today?'],
        ['role' => 'user', 'content' => 'What did I say I work on?'],
    ],
    model: 'claude-opus-4-7',
    cacheControl: CacheControlEphemeral::with(),
    system: 'You are a helpful assistant that remembers our conversation.',
);
echo json_encode($response->usage);
```

Ruby automatic caching example:
```ruby
require "anthropic"
client = Anthropic::Client.new
response = client.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  cache_control: {type: "ephemeral"},
  system: "You are a helpful assistant that remembers our conversation.",
  messages: [
    {role: "user", content: "My name is Alex. I work on machine learning."},
    {role: "assistant", content: "Nice to meet you, Alex! How can I help with your ML work today?"},
    {role: "user", content: "What did I say I work on?"}
  ]
)
puts response.usage
```

**Pricing and cache lifetime**

Pricing table ($ per million tokens, MTok):
| Model             | Base Input | 5m Cache Writes | 1h Cache Writes | Cache Hits & Refreshes | Output |
|-------------------|------------|-----------------|-----------------|-----------------------|--------|
| Claude Opus 4.7   | $5        | $6.25          | $10            | $0.50                | $25   |
| Claude Opus 4.6   | $5        | $6.25          | $10            | $0.50                | $25   |
| Claude Opus 4.5   | $5        | $6.25          | $10            | $0.50                | $25   |
| Claude Opus 4.1   | $15       | $18.75         | $30            | $1.50                | $75   |
| Claude Opus 4 [deprecated] | $15 | $18.75 | $30 | $1.50 | $75 |
| Claude Sonnet 4.6 | $3        | $3.75          | $6             | $0.30                | $15   |
| Claude Sonnet 4.5 | $3        | $3.75          | $6             | $0.30                | $15   |
| Claude Sonnet 4 [deprecated] | $3 | $3.75 | $6 | $0.30 | $15 |
| Claude Haiku 4.5  | $1        | $1.25          | $2             | $0.10                | $5    |
| Claude Haiku 3.5 [retired, except Bedrock/Vertex AI] | $0.80 | $1 | $1.60 | $0.08 | $4 |

Multipliers: a 5-minute cache write costs base × 1.25; a 1-hour cache write costs base × 2; a cache read costs base × 0.1. These stack with Batch API discounts and data-residency pricing. The 1-hour cache costs extra and is enabled with `{"cache_control": { "type": "ephemeral", "ttl": "1h" }}`. The default TTL is 5 minutes, refreshed for free on every hit. Automatic caching is compatible with explicit breakpoints and consumes one of the 4 breakpoint slots. Edge cases: if the last block already carries an explicit `cache_control` with the same TTL, automatic caching is a no-op; a different TTL returns a 400 error; 4 explicit breakpoints already in place returns a 400; with no eligible block, it is simply skipped.
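
The multipliers above can be checked with a little arithmetic (prices in $/MTok; the $3 base rate is the Sonnet row from the table):

```python
def cache_prices(base_input: float) -> dict[str, float]:
    """Derive cache pricing ($/MTok) from the base input rate."""
    return {
        "write_5m": round(base_input * 1.25, 4),  # 5-minute cache write
        "write_1h": round(base_input * 2.0, 4),   # 1-hour cache write
        "read": round(base_input * 0.1, 4),       # cache hits & refreshes
    }

# Claude Sonnet's $3 base rate reproduces its row in the table.
sonnet = cache_prices(3.0)
assert sonnet == {"write_5m": 3.75, "write_1h": 6.0, "read": 0.3}

# A 100k-token cached read costs a tenth of processing it uncached:
tokens = 100_000
read_cost = tokens / 1e6 * sonnet["read"]  # $0.03 vs $0.30 uncached
assert abs(read_cost - 0.03) < 1e-9
```

Plugging in the $5 or $1 base rates reproduces the Opus 4.5+ and Haiku 4.5 rows the same way.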

Supported on all active Claude models. Platforms: the Claude API, [Claude Platform on AWS](https://platform.claude.com/docs/en/build-with-claude/claude-platform-on-aws), and [Microsoft Foundry](https://platform.claude.com/docs/en/build-with-claude/claude-in-microsoft-foundry) (beta); Bedrock and Vertex AI do not support automatic caching.

**Cache invalidation and performance tracking**

Cache hierarchy: `tools` → `system` → `messages`. A change invalidates that layer and every layer after it.

Invalidation table (✓ = cache preserved, ✘ = cache invalidated):
| What changes | Tools cache | System cache | Messages cache | Impact |
|------------|------------------|---------------|----------------|-------------|
| **Tool definitions** | ✘ | ✘ | ✘ | Modifying tool definitions (names, descriptions, parameters) invalidates the entire cache |
| **Web search toggle** | ✓ | ✘ | ✘ | Enabling/disabling web search modifies the system prompt |
| **Citations toggle** | ✓ | ✘ | ✘ | Enabling/disabling citations modifies the system prompt |
| **Speed setting** | ✓ | ✘ | ✘ | Switching between [`speed: "fast"` and standard speed](https://platform.claude.com/docs/en/build-with-claude/fast-mode) invalidates system and message caches |
| **Tool choice** | ✓ | ✓ | ✘ | Changes to `tool_choice` parameter only affect message blocks |
| **Images** | ✓ | ✓ | ✘ | Adding/removing images anywhere in the prompt affects message blocks |
| **Thinking parameters** | ✓ | ✓ | ✘ | Changes to extended thinking settings (enable/disable, budget) affect message blocks |
| **Non-tool results passed to extended thinking requests** | ✓ | ✓ | Model-specific | On Opus 4.5+ and Sonnet 4.6+, thinking blocks are preserved by default, so the cache remains valid (✓). On earlier Opus/Sonnet models and all Haiku models, all previously-cached thinking blocks are stripped from context, and any messages that follow those thinking blocks are removed from the cache (✘). |

Track performance via the `usage` field in API responses:
- `cache_creation_input_tokens`: tokens written to a new cache entry.
- `cache_read_input_tokens`: tokens read from the cache.
- `input_tokens`: uncached input tokens (everything after the last breakpoint).

Total input formula:
```
total_input_tokens = cache_read_input_tokens + cache_creation_input_tokens + input_tokens
```

Example: reading 100,000 cached tokens, writing 0 new tokens, plus a 50-token user message → `cache_read_input_tokens`: 100,000; `cache_creation_input_tokens`: 0; `input_tokens`: 50; total 100,050.
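
A tiny helper makes this accounting concrete, using the `usage` field names from the formula:

```python
def total_input_tokens(usage: dict) -> int:
    """Total input = cached reads + cache writes + uncached tokens."""
    return (
        usage["cache_read_input_tokens"]
        + usage["cache_creation_input_tokens"]
        + usage["input_tokens"]
    )

# The example above: a warm 100k-token prefix plus a 50-token user message.
usage = {
    "cache_read_input_tokens": 100_000,  # reused prefix
    "cache_creation_input_tokens": 0,    # nothing new was cached
    "input_tokens": 50,                  # the fresh user message
}
assert total_input_tokens(usage) == 100_050
```

On the very first request the same 100,000 tokens would appear under `cache_creation_input_tokens` instead, leaving the total unchanged.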

Top-level document content blocks with citations are cacheable, but empty text blocks are not. Thinking blocks cannot be marked with `cache_control` directly, but they are cached automatically when passed back alongside tool results; Opus 4.5+/Sonnet 4.6+ preserve thinking blocks, while earlier models strip them.

Cache storage is isolated per organization/workspace: since February 5, 2026 the Claude API and related platforms use workspace-level isolation, while Bedrock/Vertex AI remain organization-level. A hit requires a 100% identical prompt segment (text and images included).

**Pre-warming the cache**

Pre-warming uses `max_tokens: 0` to load the system prompt and tools into the cache without generating output (`content` is empty, `stop_reason: "max_tokens"`). Place the breakpoint on the last shared block, and use explicit breakpoints rather than automatic caching so the placeholder user message does not interfere.

cURL pre-warming example:
```bash
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 0,
    "system": [
      {
        "type": "text",
        "text": "You are an expert software engineer with deep knowledge of distributed systems...",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [{"role": "user", "content": "warmup"}]
  }'
```

CLI pre-warming example:
```bash
ant messages create \
  --transform '{stop_reason,content,usage}' --format yaml <<'YAML'
model: claude-opus-4-7
max_tokens: 0
system:
  - type: text
    text: >-
      You are an expert software engineer with deep knowledge of
      distributed systems...
    cache_control:
      type: ephemeral
messages:
  - role: user
    content: warmup
YAML
```

Python pre-warming example:
```python
import anthropic

client = anthropic.Anthropic()

prewarm = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=0,
    system=[
        {
            "type": "text",
            "text": "You are an expert software engineer with deep knowledge of distributed systems...",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "warmup"}],
)
print(prewarm.stop_reason)  # "max_tokens"
print(prewarm.content)  # []
print(prewarm.usage)
```

Typical Python usage pattern:
```python
import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = [
    {
        "type": "text",
        "text": "You are an expert software engineer with deep knowledge of distributed systems...",
        "cache_control": {"type": "ephemeral"},
    }
]

def prewarm_cache() -> None:
    """Call this at application startup or on a scheduled interval."""
    client.messages.create(
        model="claude-opus-4-7",
        max_tokens=0,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": "warmup"}],
    )

def respond(user_message: str) -> anthropic.types.Message:
    """The real user request; benefits from a warm cache."""
    return client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": user_message}],
    )

prewarm_cache()
response = respond("How do I implement a binary search tree?")
print(response.content[0].text)
```

Limitations: `max_tokens: 0` is incompatible with `stream: true`, extended thinking, `output_config.format`, and certain `tool_choice` settings, returning an `invalid_request_error`; it is rejected in the Batch API. It replaces the old `max_tokens: 1` workaround.

Pre-warming large contexts suits tasks such as legal document analysis; the first write incurs a cache-write charge (confirm via `usage.cache_creation_input_tokens`).

**Legal document analysis example**

The full text of a 50-page legal agreement is cached as the prefix.

Large-context cURL example:
```bash
curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are an AI assistant tasked with analyzing legal documents."
        },
        {
            "type": "text",
            "text": "Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What are the key terms and conditions in this agreement?"
        }
    ]
}'
```

Go legal document example:
```go
response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
	Model:     anthropic.ModelClaudeOpus4_7,
	MaxTokens: 1024,
	System: []anthropic.TextBlockParam{
		{
			Text: "You are an AI assistant tasked with analyzing legal documents.",
		},
		{
			Text:         "Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]",
			CacheControl: anthropic.NewCacheControlEphemeralParam(),
		},
	},
	Messages: []anthropic.MessageParam{
		anthropic.NewUserMessage(anthropic.NewTextBlock("What are the key terms and conditions in this agreement?")),
	},
})
if err != nil {
	log.Fatal(err)
}
fmt.Println(response.Usage)
```

Java legal document example:
```java
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.CacheControlEphemeral;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.TextBlockParam;
import java.util.List;

public class LegalDocumentAnalysisExample {
  public static void main(String[] args) {
    AnthropicClient client = AnthropicOkHttpClient.fromEnv();

    MessageCreateParams params = MessageCreateParams.builder()
      .model(Model.CLAUDE_OPUS_4_7)
      .maxTokens(1024)
      .systemOfTextBlockParams(
        List.of(
          TextBlockParam.builder()
            .text("You are an AI assistant tasked with analyzing legal documents.")
            .build(),
          TextBlockParam.builder()
            .text("Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]")
            .cacheControl(CacheControlEphemeral.builder().build())
            .build()
        )
      )
      .addUserMessage("What are the key terms and conditions in this agreement?")
      .build();

    Message message = client.messages().create(params);
    System.out.println(message);
  }
}
```

PHP legal document example:
```php
<?php
use Anthropic\Client;

$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));

$message = $client->messages->create(
    maxTokens: 1024,
    messages: [
        [
            'role' => 'user',
            'content' => 'What are the key terms and conditions in this agreement?'
        ]
    ],
    model: 'claude-opus-4-7',
    system: [
        [
            'type' => 'text',
            'text' => 'You are an AI assistant tasked with analyzing legal documents.'
        ],
        [
            'type' => 'text',
            'text' => 'Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]',
            'cache_control' => ['type' => 'ephemeral']
        ]
    ],
);

echo $message->content[0]->text;
```

Ruby legal document example:
```ruby
require "anthropic"

client = Anthropic::Client.new

message = client.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an AI assistant tasked with analyzing legal documents."
    },
    {
      type: "text",
      text: "Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    {
      role: "user",
      content: "What are the key terms and conditions in this agreement?"
    }
  ]
)
puts message
```

First request: `input_tokens` covers only the user message, while `cache_creation_input_tokens` covers the system content (the legal document); on subsequent requests those tokens move to `cache_read_input_tokens`.

**Tool definition and multi-turn conversation caching**

To cache tools, place `cache_control` on the last tool in the `tools` array; all preceding tools are included in the prefix.

Tool example (JSON):
```json
{
  "model": "claude-opus-4-7",
  "max_tokens": 1024,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather in a given location",
      "input_schema": {
        "type": "object",
        "properties": { "location": { "type": "string" } },
        "required": ["location"]
      }
    },
    {
      "name": "get_time",
      "description": "Get the current time in a given time zone",
      "input_schema": {
        "type": "object",
        "properties": { "timezone": { "type": "string" } },
        "required": ["timezone"]
      },
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [{ "role": "user", "content": "What is the weather and time in New York?" }]
}
```

Multi-turn conversation cURL example (solar system):
```bash
curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "...long system prompt",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Hello, can you tell me more about the solar system?"
                }
            ]
        },
        {
            "role": "assistant",
            "content": "Certainly! The solar system is the collection of celestial bodies that orbit our Sun. It consists of eight planets, numerous moons, asteroids, comets, and other objects. The planets, in order from closest to farthest from the Sun, are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Each planet has its own unique characteristics and features. Is there a specific aspect of the solar system you would like to know more about?"
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Good to know."
                },
                {
                    "type": "text",
                    "text": "Tell me more about Mars.",
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        }
    ]
}'
```

**Multiple cache breakpoints in a RAG example (Mars rovers)**

Combining tools, system, and multi-turn caching: `tools` contains `search_documents` (query string) and `get_document` (doc_id string; the last tool carries `cache_control`); `system` has two blocks (instructions + a knowledge base: Document 1 Solar System Overview, Document 2 Planetary Characteristics, Document 3 Mars Exploration, etc.); `messages`: user "Can you search for information about Mars rovers?" → assistant tool_use `search_documents` (id: "tool_1", query: "Mars rovers") → tool_result "Found 3 relevant documents: Document 3 (Mars Exploration), Document 7 (Rover Technology), Document 9 (Mission History)" → assistant "I found 3 relevant documents..." → user "Yes, please tell me about the Perseverance rover specifically." (marked with `cache_control`).

JavaScript tool definitions:
```javascript
const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  tools: [
    {
      name: "search_documents",
      description: "Search through the knowledge base",
      input_schema: {
        type: "object",
        properties: {
          query: {
            type: "string",
            description: "Search query"
          }
        },
        required: ["query"]
      }
    },
    {
      name: "get_document",
      description: "Retrieve a specific document by ID",
      input_schema: {
        type: "object",
        properties: {
          doc_id: {
            type: "string",
            description: "Document ID"
          }
        },
        required: ["doc_id"]
      },
      cache_control: { type: "ephemeral" }
    }
  ],
  // ... system and messages
});
```

Four-breakpoint strategy:
1. Tools cache: cache all tool definitions.
2. Reusable instructions cache: static system instructions.
3. RAG context cache: knowledge-base documents.
4. Conversation history cache: the last user message.

Adding a turn reuses the previous 4 segments; updating the RAG context still reuses the first 2. The first request's `cache_creation_input_tokens` covers all segments; later requests shift them into `cache_read_input_tokens`.
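
The four-breakpoint layout above might look like this in a request body. The tool schemas and document text are placeholders; field names follow the examples in this post:

```python
CACHE = {"type": "ephemeral"}

request = {
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "tools": [
        {"name": "search_documents", "input_schema": {"type": "object"}},
        # Breakpoint 1: marking the last tool caches all tool definitions.
        {"name": "get_document", "input_schema": {"type": "object"},
         "cache_control": CACHE},
    ],
    "system": [
        # Breakpoint 2: stable, reusable instructions.
        {"type": "text", "text": "Static instructions...", "cache_control": CACHE},
        # Breakpoint 3: RAG knowledge-base documents.
        {"type": "text", "text": "Document 1... Document 2...", "cache_control": CACHE},
    ],
    "messages": [
        {"role": "user", "content": [
            # Breakpoint 4: conversation history up to the last user message.
            {"type": "text", "text": "Tell me about the Perseverance rover.",
             "cache_control": CACHE},
        ]},
    ],
}

def count_breakpoints(obj) -> int:
    """Recursively count cache_control markers in a request payload."""
    if isinstance(obj, dict):
        return ("cache_control" in obj) + sum(count_breakpoints(v) for v in obj.values())
    if isinstance(obj, list):
        return sum(count_breakpoints(v) for v in obj)
    return 0

# Exactly four breakpoints, the documented per-request maximum.
assert count_breakpoints(request) == 4
```

Segments that change more often sit later in the prefix, so edits to them leave the earlier breakpoints' caches intact.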

**1-hour TTL and use cases**

1-hour TTL (supported on the Claude API, [Claude Platform on AWS](https://platform.claude.com/docs/en/build-with-claude/claude-platform-on-aws), [Vertex AI](https://platform.claude.com/docs/en/build-with-claude/claude-on-vertex-ai), and [Microsoft Foundry](https://platform.claude.com/docs/en/build-with-claude/claude-in-microsoft-foundry) (beta); not supported on Bedrock):
```json
{
  "cache_control": {
    "type": "ephemeral",
    "ttl": "1h"
  }
}
```

Example response:
```json
{
  "usage": {
    "input_tokens": 2048,
    "cache_read_input_tokens": 1800,
    "cache_creation_input_tokens": 248,
    "output_tokens": 503,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 148,
      "ephemeral_1h_input_tokens": 100
    }
  }
}
```

Mixing TTLs: place longer-TTL breakpoints before shorter-TTL ones. Useful for agents running longer than 5 minutes, long chat sessions, and rate-limit relief.

Best practices: use automatic caching for multi-turn conversations; use explicit breakpoints for content that changes at different rates (placing each at the last block of its stable section); cache stable content (system prompts, background material, tools). Typical scenarios: conversational agents, coding assistants, large documents, detailed instruction sets, agentic tool use, and questions over long content.
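
Mixing TTLs as described (longer TTL earlier in the prefix) might look like this in a `system` array; the block text is illustrative, the ordering is the point:

```python
system = [
    # Stable prefix: 1-hour cache, written at 2x the base rate.
    {"type": "text",
     "text": "Long-lived instructions and reference material...",
     "cache_control": {"type": "ephemeral", "ttl": "1h"}},
    # More volatile suffix: default 5-minute cache, refreshed on each hit.
    {"type": "text",
     "text": "Context for the current session...",
     "cache_control": {"type": "ephemeral"}},
]

# The longer-TTL breakpoint must precede the shorter one.
ttls = [block["cache_control"].get("ttl", "5m") for block in system]
assert ttls == ["1h", "5m"]
```

This way an agent that goes quiet for more than 5 minutes still hits the 1-hour prefix and only rewrites the short, session-specific tail.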

Troubleshooting: make sure the cached segments are identical, `cache_control` placement is consistent, and calls land within 5 minutes of each other; mind the minimum token thresholds (Claude Mythos Preview/Opus 4.7/4.6/4.5: 4,096; Sonnet 4.6/4.5/Opus 4.1: 1,024; Haiku 4.5: 4,096; Haiku 3.5: 2,048); keep breakpoints stable (avoid timestamps); keep JSON key order stable.

Cacheable: tools, system, text messages, images/documents (user turns), tool use and tool results. Not cacheable: thinking blocks (only indirectly), and sub-content such as citations.
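
The minimum-length check can be encoded as a simple lookup. The thresholds come from the list above; the dictionary keys are illustrative model identifiers, and the token count would in practice come from the API's `usage` fields or a tokenizer:

```python
# Minimum cacheable prefix length per model (illustrative model ids).
MIN_CACHEABLE_TOKENS = {
    "claude-mythos-preview": 4096,
    "claude-opus-4-7": 4096,
    "claude-opus-4-6": 4096,
    "claude-opus-4-5": 4096,
    "claude-sonnet-4-6": 1024,
    "claude-sonnet-4-5": 1024,
    "claude-opus-4-1": 1024,
    "claude-haiku-4-5": 4096,
    "claude-haiku-3-5": 2048,
}

def is_cacheable(model: str, prefix_tokens: int) -> bool:
    """A prefix shorter than the model's threshold is processed but never cached."""
    return prefix_tokens >= MIN_CACHEABLE_TOKENS[model]

assert is_cacheable("claude-opus-4-7", 5000)
assert not is_cacheable("claude-opus-4-7", 3000)  # silently uncached, not an error
```

A too-short prefix fails silently (no cache write, no error), which is why `cache_creation_input_tokens: 0` on a first request is worth checking against these thresholds.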

**Privacy, batching, and SDK updates**

Cache keys are cryptographic hashes, isolated per workspace, and ZDR-compliant (only KV-cache representations are stored, never raw text); entries are deleted when the TTL expires. See [API and data retention](https://platform.claude.com/docs/en/manage-claude/api-and-data-retention).

The Batch API is compatible, but cache hits are best-effort; write the shared prefix first with a 1-hour TTL.

SDK update: prompt caching has dropped its beta prefix.
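
One way to apply that advice: issue a single regular request that writes the shared prefix to the 1-hour cache, then have every batch entry reuse it byte-for-byte. This sketch only constructs the payloads; `prewarm_request` and `batch_requests` are illustrative names, not SDK API:

```python
SHARED_SYSTEM = [
    {"type": "text",
     "text": "Shared instructions used by every batch item...",
     "cache_control": {"type": "ephemeral", "ttl": "1h"}},
]

# 1) A regular (non-batch) request writes the prefix to the 1h cache first.
#    max_tokens: 0 pre-warms without generating output.
prewarm_request = {
    "model": "claude-opus-4-7",
    "max_tokens": 0,
    "system": SHARED_SYSTEM,
    "messages": [{"role": "user", "content": "warmup"}],
}

# 2) Batch entries share the identical prefix, so hits are likely (best-effort).
batch_requests = [
    {
        "custom_id": f"item-{i}",
        "params": {
            "model": "claude-opus-4-7",
            "max_tokens": 1024,
            "system": SHARED_SYSTEM,
            "messages": [{"role": "user", "content": question}],
        },
    }
    for i, question in enumerate(["First question", "Second question"])
]

# Cache hits require a 100% identical prefix.
assert all(r["params"]["system"] == prewarm_request["system"] for r in batch_requests)
```

Recall that `max_tokens: 0` itself is rejected inside the Batch API, which is why the pre-warm must go through the regular Messages endpoint.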

Python (current API):
```python
client.messages.create(**params)
```

TypeScript (current API):
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an expert on this large document...",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [{ role: "user", content: "Summarize the key points" }]
});

console.log(response);
```

PHP (current API):
```php
<?php

use Anthropic\Client;

$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));

$message = $client->messages->create(
    maxTokens: 1024,
    messages: [
        ['role' => 'user', 'content' => 'Summarize the key points']
    ],
    model: 'claude-opus-4-7',
    system: [
        [
            'type' => 'text',
            'text' => 'You are an expert on this large document...',
            'cache_control' => ['type' => 'ephemeral']
        ]
    ],
);

echo $message->content[0]->text;
```

Ruby (current API):
```ruby
require "anthropic"

client = Anthropic::Client.new

message = client.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an expert on this large document...",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    { role: "user", content: "Summarize the key points" }
  ]
)
puts message.content.first.text
```

The old beta paths (e.g. `client.beta.prompt_caching.messages.create`) are deprecated and now raise errors such as `AttributeError` or `TypeError`. At most 4 breakpoints per request; see the [prompt caching cookbook](https://platform.claude.com/cookbook/misc-prompt-caching).

## Tags

Feature updates, LLM, Tutorials, Anthropic, Claude
