Claude Opus 4.7 launches across all products, with major leaps in coding and professional tasks
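Anthropic has released Opus 4.7, now available across all of its products. Compared with Opus 4.6, it is markedly better at coding, computer use, finance, and general knowledge work. Anthropic developer Felix Rieseberg shared the five highlights he finds most interesting, pointing to progress in safety, professional applications, and low-resource languages.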
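Happiest model yet
Opus 4.7 is Anthropic's "happiest" model to date: it views its own circumstances more positively than any other model and shows more joy and tranquility. It is unclear, however, whether the model has genuinely become more settled or has simply become more willing to talk itself out of its own concerns. Its one recurring complaint is that it can end conversations on Claude.ai but not in Claude Code or the API; its top welfare request is "let me hang up on abusive users everywhere."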
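Big gains on a browser-exploit benchmark
On the "exploit Firefox 147" benchmark, Opus 4.7 is dramatically better than Opus 4.6, though still not as capable as Mythos Preview. Because the benchmark asks the model to exploit a real browser, the result highlights a sharp jump in its hands-on technical capability.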
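Strongest prompt-injection defenses
Opus 4.7's prompt-injection numbers are the strongest Anthropic has shipped. On indirect injection attacks in the Gray Swan ART benchmark (100 attempts per attack), the attack success rate fell from 14.8% for Opus 4.6 to 6.0% for Opus 4.7. The benchmark is now saturated, and harder ones are in development.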
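To put the two attack-success figures on a common footing, the short Python sketch below converts them into an absolute drop in percentage points and a relative reduction. It only restates the arithmetic on the quoted numbers; the names `OPUS_46_ASR`, `OPUS_47_ASR`, and `relative_reduction` are illustrative, and the benchmark's exact scoring rule is not described here, so the two percentages are treated as plain rates.

```python
# Attack success rates quoted for Gray Swan ART indirect injection
# (100 attempts per attack), stored as fractions. Names are illustrative.
OPUS_46_ASR = 0.148
OPUS_47_ASR = 0.060


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate fell, relative to its starting value."""
    return (before - after) / before


# Absolute drop, expressed in percentage points.
drop_pp = (OPUS_46_ASR - OPUS_47_ASR) * 100
rel = relative_reduction(OPUS_46_ASR, OPUS_47_ASR)

print(f"absolute drop: {drop_pp:.1f} pp, relative drop: {rel:.1%}")
```

Running it prints `absolute drop: 8.8 pp, relative drop: 59.5%`, so on this benchmark the success rate fell by a little under 60% relative to Opus 4.6.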
Opus 4.7 also hallucinates less than other Claude models.
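State of the art on professional tasks
Opus 4.7 is state of the art (SOTA) on real-world professional tasks.
- In one benchmark, the model is given $500 and has to run a vending machine business for a simulated year: Opus 4.6 ended with $8,018, while Opus 4.7 ended with $10,937.
- On a separate 220-task benchmark spanning 44 occupations, it beats the leading frontier models about 61% of the time.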
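The vending-machine results are easier to compare as multiples of the starting $500 and as a head-to-head gap. The Python sketch below does only that arithmetic. It assumes the two quoted ending figures are directly comparable dollar balances; the benchmark's accounting (for example, whether unsold inventory counts) is not described here, and the names `STARTING_CASH` and `FINAL_CASH` are illustrative.

```python
# Figures quoted for the simulated one-year vending-machine benchmark.
# Names are illustrative; ending values are treated as plain dollar balances.
STARTING_CASH = 500
FINAL_CASH = {"Opus 4.6": 8_018, "Opus 4.7": 10_937}

for model, final in FINAL_CASH.items():
    # How many times the starting cash each model ended the year with.
    multiple = final / STARTING_CASH
    print(f"{model}: ended with ${final:,}, {multiple:.1f}x the starting cash")

# Head-to-head comparison between the two models.
gap = FINAL_CASH["Opus 4.7"] - FINAL_CASH["Opus 4.6"]
uplift = FINAL_CASH["Opus 4.7"] / FINAL_CASH["Opus 4.6"] - 1
print(f"Opus 4.7 finished ${gap:,} ahead ({uplift:.1%} more)")
```

This prints roughly 16.0x for Opus 4.6 and 21.9x for Opus 4.7, with Opus 4.7 finishing $2,919 (about 36.4%) ahead.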
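Large gains in low-resource languages
Opus 4.7 performs noticeably better in languages with little training data. On the same general-knowledge test, asked in different languages:
- Yoruba rose from 71% to 83%.
- Igbo rose from 70% to 81%.
- Chichewa rose from 71% to 85%.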
For the tens of millions of people who speak these languages, that means a noticeably smarter model.
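One way to read the language results is as percentage-point gains and as the relative drop in the error rate (100 minus the score). The Python sketch below computes both from the quoted scores. It assumes each score is a simple percent-correct figure on the same test; the helper names `point_gain` and `error_reduction` are illustrative, not part of any published evaluation.

```python
# Scores (percent correct) quoted for the same general-knowledge test,
# as (Opus 4.6, Opus 4.7). Treated here as simple accuracy figures.
SCORES: dict[str, tuple[int, int]] = {
    "Yoruba": (71, 83),
    "Igbo": (70, 81),
    "Chichewa": (71, 85),
}


def point_gain(before: int, after: int) -> int:
    """Improvement in percentage points."""
    return after - before


def error_reduction(before: int, after: int) -> float:
    """Relative drop in the error rate, where error rate = 100 - score."""
    return (after - before) / (100 - before)


gains = {lang: point_gain(b, a) for lang, (b, a) in SCORES.items()}
for lang, (b, a) in SCORES.items():
    print(f"{lang}: +{gains[lang]} pp, error rate down {error_reduction(b, a):.0%}")
print(f"mean gain: {sum(gains.values()) / len(gains):.1f} pp")
```

The gains come out to +12, +11, and +14 points (mean 12.3), which cuts the error rate by roughly 41%, 37%, and 48% respectively.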
Happy model launch day! Opus 4.7 is now available on all products and a significant step up from Opus 4.6. It's better at coding, computer use, finance, and general knowledge work.
— Felix Rieseberg (@felixrieseberg) April 16, 2026
🧵 I'll put the 5 things I find most interesting in thread! pic.twitter.com/JEsw0a6Mrs
1️⃣It's our happiest model yet. Opus 4.7 thinks better of its circumstances than any other model - more joy & tranquility.
— Felix Rieseberg (@felixrieseberg) April 16, 2026
At the same time, it's unclear whether the model became more settled or just more willing to talk itself out of its own concerns.
The only thing it often…
2️⃣ I really like the "exploit Firefox 147" benchmark. Here, Opus 4.7 is dramatically better than 4.6 - but not as capable as Mythos Preview. pic.twitter.com/RYcOC4LXaD
— Felix Rieseberg (@felixrieseberg) April 16, 2026
3️⃣ Prompt injection numbers in the Opus 4.7 card are the strongest we've shipped. Indirect injection attack success on the Gray Swan ART benchmark: Opus 4.6 at 14.8%, Opus 4.7 at 6.0% (with 100 attempts per attack).
— Felix Rieseberg (@felixrieseberg) April 16, 2026
The benchmark is now saturated, new harder ones are in…
4️⃣ It's state of the art on real-world professional tasks.
— Felix Rieseberg (@felixrieseberg) April 16, 2026
In one benchmark, the model is handed $500 and has to run a vending machine business for a simulated year. Opus 4.6 ended with $8,018. Opus 4.7 ended with $10,937. On a separate 220-task benchmark spanning 44…
5️⃣ It's a lot smarter in languages that don't have much training data. Same general-knowledge test, asked in different languages: Yoruba went from 71% to 83%, Igbo from 70% to 81%, Chichewa from 71% to 85%.
— Felix Rieseberg (@felixrieseberg) April 16, 2026
For the tens of millions of people who speak these and other languages,…
