# Curated · X (Twitter) 🔥🔥🔥🔥🔥

> 📖 Full site content index (documentation index): [llms.txt](/llms.txt)

> Author: Cursor (@cursor_ai) · Platform: X (Twitter) · Date: 2026-07-01

> Original source: https://x.com/cursor_ai/status/2072020786181988418

## Summary

Cursor and Devin have both adopted Claude Sonnet 5 to improve coding performance.

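**Cursor's performance evaluation**
Cursor announced that Claude Sonnet 5 is now available and evaluated it on its in-house benchmark, CursorBench. Claude Sonnet 5 scored 57% on the benchmark, an 8-percentage-point gain over the 49% scored by its predecessor, Claude Sonnet 4.6.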

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/5ae0de4904162bb9.jpg)
> On CursorBench 3.1, Claude Sonnet 5 improves notably on Sonnet 4.6 (57% vs. 49%); the chart also plots the cost-versus-score tradeoff curves for different models across reasoning settings.

The full model rankings are available on the [Cursor evals page](http://cursor.com/evals).
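The 57% versus 49% comparison can be read either as an absolute gain (percentage points) or as a relative gain (percent of the old score). The short sketch below computes both from the two figures quoted in the post; the function name `gains` is only illustrative and is not part of any Cursor tooling.

```python
def gains(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - old
    relative = (new - old) / old * 100
    return absolute, relative


# CursorBench scores quoted in the post: Sonnet 5 = 57%, Sonnet 4.6 = 49%
abs_pts, rel_pct = gains(57.0, 49.0)
print(f"{abs_pts:.1f} points, {rel_pct:.1f}% relative")
# prints "8.0 points, 16.3% relative"
```

The two readings differ noticeably, so it helps to state which one is meant when quoting the improvement.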

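**Devin's engineering benchmark**
Cognition's Devin Desktop and Devin CLI added support for Claude Sonnet 5 at the same time, and Cognition emphasizes that the model delivers frontier-level coding performance at a more competitive cost. On "FrontierCode (Extended)", Cognition's benchmark built around real-world engineering tasks, Claude Sonnet 5 performs strongly on code mergeability and quality scores:
- Claude Sonnet 5 scores 53.8% with a 57.6% pass rate, outperforming Claude Opus 4.8.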

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/e87c8ddec732ba0c.jpg)
> On FrontierCode Extended, Claude Sonnet 5 scores 53.8%, ahead of Claude Opus 4.8 (51.8%) and the other models.

- Cognition notes that the rankings may shift slightly as the FrontierCode benchmark is adjusted in the future.

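**Promotional pricing and related links**
To encourage users to try the new model, Cognition is offering a limited-time quota discount:
- From now through August 31, 2026, using Claude Sonnet 5 in Devin Desktop and Devin CLI consumes about 30% less quota than using Claude Sonnet 4.6.
- After the promotion ends, Claude Sonnet 5 will consume quota at the same rate as Claude Sonnet 4.6.
- The latest version is available from the [Devin download page](http://devin.ai/download), and the detailed evaluation is on the [Cognition blog](https://devin.ai/blog/claude-sonnet-5).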
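The quota rule above is simple arithmetic, sketched below for illustration. The 30% figure and the end date come from the post; the function name, the idea of abstract "quota units", and the assumption that August 31 itself is still covered are all hypothetical. The sketch also ignores the promotion's start date, and Cognition's actual billing logic is not described in the source.

```python
from datetime import date

PROMO_END = date(2026, 8, 31)  # end date quoted in the post (assumed inclusive)
PROMO_DISCOUNT = 0.30          # "about 30%" per the post; treated as exact here


def sonnet5_quota(sonnet46_units: float, on: date) -> float:
    """Estimate Sonnet 5 quota use for work that would cost
    `sonnet46_units` on Sonnet 4.6 (hypothetical units)."""
    if on <= PROMO_END:
        return sonnet46_units * (1 - PROMO_DISCOUNT)
    # After the promotion, both models consume quota at the same rate.
    return sonnet46_units


during = sonnet5_quota(100.0, date(2026, 8, 1))
after = sonnet5_quota(100.0, date(2026, 9, 1))
print(f"during promo: {during:.0f} units, after promo: {after:.0f} units")
```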

## Media content

**On CursorBench 3.1, Claude Sonnet 5 improves notably on Sonnet 4.6 (57% vs. 49%), delivering a higher score at the same average cost per task.**

**Data table**

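The chart is a multi-series cost-versus-score tradeoff plot for CursorBench 3.1, not a single-number comparison. Approximate readings for each model and reasoning setting:

| Model (setting) | Series type | Approx. cost per task (USD) | Approx. score |
| --- | --- | --- | --- |
| Fable 5 high | Single point | $10.5 | 70% |
| Composer 2.5 | Single point | $0.5 | 62% |
| Opus 4.8 high | Multi-point curve | $9 to $3 | 63% to 54% |
| Sonnet 5 high (default) | Single point (only one reasoning setting tested, not a curve) | $4 | 55% |
| Sonnet 4.6 high | Multi-point curve (widest cost span in the chart) | $9 to $1.3 | 60% to 40% |
| GLM 5.2 high | Single point | $1.8 | 50% |
| GPT-5.5 medium | Multi-point curve | $4.3 to $0.8 | 63% to 48% |

Note: Cursor's post separately cites "CursorBench 57% vs. 49%" as an overall score comparison between Sonnet 5 and Sonnet 4.6. Those figures do not map one-to-one onto individual points in this chart, because they measure a different scenario and the chart is a cost-quality tradeoff curve.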
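As a quick check on the remark that Sonnet 4.6 high spans the widest cost range, the sketch below compares the cost span of the three multi-point curves. The coordinates are the approximate curve endpoints listed above, not exact data from Cursor, and the helper `cost_span` is a hypothetical name.

```python
# (cost in USD per task, score in %) endpoints, read approximately from the chart
curves = {
    "Opus 4.8 high": [(9.0, 63.0), (3.0, 54.0)],
    "Sonnet 4.6 high": [(9.0, 60.0), (1.3, 40.0)],
    "GPT-5.5 medium": [(4.3, 63.0), (0.8, 48.0)],
}


def cost_span(points: list[tuple[float, float]]) -> float:
    """Difference between the highest and lowest cost on a curve."""
    costs = [cost for cost, _score in points]
    return max(costs) - min(costs)


spans = {name: cost_span(pts) for name, pts in curves.items()}
widest = max(spans, key=spans.get)
print(widest)  # prints "Sonnet 4.6 high"
```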

**On FrontierCode Extended, Claude Sonnet 5 scores 53.8%, ahead of Claude Opus 4.8 (51.8%) and the other models.**

**Data table**

| Model | Score (%) |
| --- | --- |
| SWE-1.6 | 18.4 |
| Claude Sonnet 4.6 | 33.6 |
| Gemini 3.1 Pro | 34.2 |
| Kimi K2.7 | 39.5 |
| GLM 5.2 | 43.0 |
| GPT-5.5 | 44.8 |
| Claude Opus 4.8 | 51.8 |
| Claude Sonnet 5 | 53.8 |
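To make the margins in the table easier to read, the following sketch ranks the models and prints how far each one trails the leader, in percentage points. It uses only the scores listed above; the variable names are illustrative.

```python
# FrontierCode Extended scores (%) as listed in the table
scores = {
    "SWE-1.6": 18.4,
    "Claude Sonnet 4.6": 33.6,
    "Gemini 3.1 Pro": 34.2,
    "Kimi K2.7": 39.5,
    "GLM 5.2": 43.0,
    "GPT-5.5": 44.8,
    "Claude Opus 4.8": 51.8,
    "Claude Sonnet 5": 53.8,
}

# Sort from highest to lowest score
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
leader, leader_score = ranking[0]

for model, score in ranking[1:]:
    gap = round(leader_score - score, 1)  # round away floating-point noise
    print(f"{leader} leads {model} by {gap} points")
```

On these figures the gap to Claude Opus 4.8 is 2.0 points, while the gap to Claude Sonnet 4.6 is 20.2 points.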

## Tags

IDE, Feature update, Benchmark, Cursor, Cognition, Anthropic