# Curated · X (Twitter) 🔥🔥🔥🔥


> Author: Cognition (@cognition) · Platform: X (Twitter) · Date: 2026-07-02

> Original source: https://x.com/cognition/status/2072368168182432109

## Summary

Cognition has launched Devin Security Swarm, a tool that automatically detects vulnerabilities in code.

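**Core architecture and how it works**
To address the difficulty of reasoning over large codebases, Cognition developed an architecture it calls "Agentic MapReduce". Conventional agents working on a large codebase often burn too many tokens on searching and reading, and they lack a clear coverage boundary. Agentic MapReduce instead splits the task into deterministic filtering followed by parallel agent execution:

1. **Plan**: The agent writes "selectors" based on the threat model (rules for routes, authentication boundaries, deserialization sinks, and so on). These rules scan the entire codebase deterministically and discard irrelevant files.
2. **Shard**: Matching files are grouped into bounded batches, so each agent has a clearly defined scope.
3. **Map**: Multiple sub-agents process their batches in parallel, analyze them in depth within a limited context, and produce structured findings.
4. **Reduce**: Findings from all shards are merged, deduplicated, and chained across shards (cross-shard chaining) to identify complex attack paths that no single agent could see.
5. **Verify**: Severe vulnerabilities are validated at runtime in an isolated sandbox to confirm that they are exploitable, and a fix PR can then be generated automatically.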
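
The five stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cognition's implementation: the regex `SELECTORS`, the `Finding` type, and every function name are hypothetical, the "sub-agent" is a stand-in that simply re-applies the selectors, and the last stage only selects findings for verification rather than running anything in a sandbox.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical selectors (rule name -> regex). Real selectors would be
# written by an agent from a threat model; these two are placeholders.
SELECTORS = {
    "route": re.compile(r"@app\.route"),
    "deserialization": re.compile(r"pickle\.loads"),
}


@dataclass(frozen=True)
class Finding:
    path: str
    rule: str
    severity: str


def plan(repo: dict[str, str]) -> list[str]:
    """Plan: deterministically keep files that match at least one selector."""
    return sorted(
        path for path, source in repo.items()
        if any(rx.search(source) for rx in SELECTORS.values())
    )


def shard(paths: list[str], batch_size: int) -> list[list[str]]:
    """Shard: group matched files into bounded batches."""
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


def investigate(batch: list[str], repo: dict[str, str]) -> list[Finding]:
    """Map: stand-in for one sub-agent analysing a single batch."""
    findings = []
    for path in batch:
        for rule, rx in SELECTORS.items():
            if rx.search(repo[path]):
                severity = "high" if rule == "deserialization" else "low"
                findings.append(Finding(path, rule, severity))
    return findings


def reduce_findings(per_shard: list[list[Finding]]) -> list[Finding]:
    """Reduce: merge shard outputs and drop duplicates."""
    merged = {finding for findings in per_shard for finding in findings}
    return sorted(merged, key=lambda f: (f.path, f.rule))


def select_for_verification(findings: list[Finding]) -> list[Finding]:
    """Verify (placeholder): keep only findings severe enough to test."""
    return [f for f in findings if f.severity in {"medium", "high", "critical"}]


def run_pipeline(repo: dict[str, str], batch_size: int = 2) -> list[Finding]:
    batches = shard(plan(repo), batch_size)
    with ThreadPoolExecutor() as pool:
        per_shard = list(pool.map(lambda batch: investigate(batch, repo), batches))
    return select_for_verification(reduce_findings(per_shard))
```

Calling `run_pipeline` with a dictionary that maps file paths to source text returns only the findings that would go on to verification. The point of the sketch is the shape of the flow: a cheap deterministic filter bounds the work before any parallel analysis starts.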

<video src="https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/1782970631849-c2ro5kkp.mp4" poster="https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/ade2bb8c840b6a9c.jpg" autoplay loop muted playsinline preload="metadata" style="max-width:100%;height:auto;display:block;margin:1rem 0"></video>
> A set of technical demo visuals showing the security scanning flow of the Agentic MapReduce pipeline.

**Benchmark results and real-world performance**
Cognition evaluated the tool on a benchmark of 50 real vulnerabilities drawn from GitHub Security Advisories (GHSA), spanning 14 languages including Go, Rust, Python, Ruby, Java, C#, JavaScript, C, Swift, Dart, and Elixir. The results:

- **Devin Security Swarm**: 72% recall at $90.23 per run.
- **Claude Security**: 68% recall at $131.87 per run.
- **Codex Security**: 48% recall at $118.20 per run.
- **Cursor Security**: 26% recall at $4.60 per run.
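
To make the figures above easier to compare, the short script below converts each recall percentage into a count out of the 50 benchmark vulnerabilities and computes a cost ratio between two tools. The numbers are copied from the list above; the helper names are illustrative, and the count assumes recall is measured over exactly those 50 cases.

```python
# (recall %, USD per run), copied from the reported benchmark figures.
RESULTS = {
    "Devin Security Swarm": (72, 90.23),
    "Claude Security": (68, 131.87),
    "Codex Security": (48, 118.20),
    "Cursor Security": (26, 4.60),
}
BENCHMARK_SIZE = 50  # number of GHSA-linked vulnerabilities in the benchmark


def vulnerabilities_found(recall_pct: int, total: int = BENCHMARK_SIZE) -> int:
    """Convert a recall percentage into a count of vulnerabilities found."""
    return recall_pct * total // 100


def cost_ratio(tool: str, baseline: str) -> float:
    """Per-run cost of `tool` divided by the per-run cost of `baseline`."""
    return RESULTS[tool][1] / RESULTS[baseline][1]


if __name__ == "__main__":
    for name, (recall, cost) in RESULTS.items():
        found = vulnerabilities_found(recall)
        print(f"{name}: {found}/{BENCHMARK_SIZE} found at ${cost:.2f} per run")
    ratio = cost_ratio("Devin Security Swarm", "Claude Security")
    print(f"Devin vs. Claude cost ratio: {ratio:.2f}")
```

Under that assumption, 72% recall corresponds to 36 of 50 vulnerabilities and 68% to 34, while the per-run cost of Devin Security Swarm is roughly 0.68 times that of Claude Security.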

![](https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/c866c6939f70ce9e.jpg)
> The results show Devin Security achieving the highest recall (72%) at a lower cost per run ($90.23) than the runner-up, Claude Security (68% recall, $131.87).

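Beyond leading on recall, Devin Security Swarm also found critical vulnerabilities that the other tools missed, including a PHP sandbox bypass (via template injection), argument injection through metadata value parsing, and an overly broad deserialization surface.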

<video src="https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/1782970608285-3npfpe9e.mp4" poster="https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/00928882e95b47ab.jpg" controls playsinline preload="metadata" style="max-width:100%;height:auto;display:block;margin:1rem 0"></video>
> The Cognition team explains how Devin Security Swarm works and how it performed in evaluation.

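**Practical use and outlook**
Devin Security Swarm is now officially released as part of the "Devin for Security" suite, aimed at helping security teams work through growing vulnerability backlogs. It can generate scan profiles from existing threat model documents and can be scheduled to scan daily or weekly. Follow-up scans run only on code that has changed, which lowers the cost further.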
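
One simple way to restrict a follow-up scan to changed code is to compare content hashes between runs. The sketch below is an assumption about how such a step could work, not a description of Devin's mechanism; `fingerprint` and `changed_paths` are hypothetical helpers.

```python
import hashlib


def fingerprint(repo: dict[str, str]) -> dict[str, str]:
    """Map each file path to a SHA-256 digest of its contents."""
    return {
        path: hashlib.sha256(source.encode("utf-8")).hexdigest()
        for path, source in repo.items()
    }


def changed_paths(previous: dict[str, str], current_repo: dict[str, str]) -> list[str]:
    """Return files that are new or modified since the previous scan."""
    current = fingerprint(current_repo)
    return sorted(
        path for path, digest in current.items()
        if previous.get(path) != digest
    )
```

A scheduler would store the result of `fingerprint` after each scan and pass it as `previous` on the next run, so that only the returned paths re-enter the pipeline.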

<video src="https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/1782970644708-57yakto0.mp4" poster="https://pub-75d4fe1e4e80421b9ecb1245a7ae0d1a.r2.dev/curated/777e2f504129bb77.jpg" autoplay loop muted playsinline preload="metadata" style="max-width:100%;height:auto;display:block;margin:1rem 0"></video>
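> A list presenting a vulnerability evaluation dataset across multiple open-source projects, detailing each project's vulnerability type and severity.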

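For enterprises that need deeper integration, Cognition also offers a six-week "Devin Security Vulnerability Remediation Program", in which its engineering team helps the enterprise clear its CVE backlog and set up an ongoing remediation process. The detailed architecture write-up, evaluation methodology, and related documentation are available at the links below: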

- [Introducing Devin Security Swarm](https://cognition.com/blog/introducing-devin-security-swarm)
- [Agentic MapReduce technical deep dive](https://devin.ai/blog/agentic-map-reduce)
- [Security Swarm evaluation report](https://devin.ai/blog/security-swarm-eval)

## Media

**The Cognition team explains how Devin Security Swarm works and how it performed in evaluation.**

**Prompts and actions in the video**

Prompt (00:02):

```
Ask Devin to fix features, fix bugs, or add to your code.
```

Steps:

1. (02:43) The user types a command into the input box and submits it.

**Transcript**

- `00:00` Hi, I'm Ido.
- `00:01` And I'm Angela, and we're engineers at Cognition.
- `00:04` Today, we're excited to introduce Security Swarm,
- `00:07` a new approach to vulnerability scanning
- `00:09` powered by a new architecture we're calling agentic map reduce.
- `00:13` Devin Security Swarm outperforms existing tools
- `00:16` in finding a verified set of vulnerabilities in open source repos
- `00:19` at only about half of the cost of leading alternatives.
- `00:24` Now, modern security exploits often rely upon multiple vulnerabilities
- `00:27` that are chained together across a code base.
- `00:30` Today's models are able to accurately identify these exploits,
- `00:33` but to do so in depth, at scale,
- `00:36` while still reasoning across the entire business logic,
- `00:39` you need more than just an intelligent model.
- `00:42` You need the right harness and the right system design.
- `00:45` Especially for large and mature code bases,
- `00:48` you can't just stuff an entire repo into a context window
- `00:50` and ask the agent to find the vulnerabilities.
- `00:53` That approach won't scale.
- `00:55` That's why we built agentic map reduce.
- `00:58` Agentic map reduce has five steps.
- `01:01` First, Devin studies your code base
- `01:03` to identify the best way to divide it into related shards.
- `01:07` It then deterministically batches the repo based on that decomposition.
- `01:11` Devin will spin up parallel investigation sessions
- `01:14` to surface potential vulnerabilities
- `01:16` and produce audit-ready findings.
- `01:18` Then, Devin aggregates,
- `01:20` deduplicates, and reasons across results.
- `01:23` Devin also chains together low-severity exploits
- `01:26` to uncover higher-impact critical findings.
- `01:29` Finally, Devin reproduces findings in its own secure sandbox
- `01:33` and provides artifacts that verify the exploit.
- `01:35` The unique insight is that security scanning
- `01:38` can be treated as a decomposition and reduction problem.
- `01:42` Break a code base into the right investigative shards,
- `01:45` run isolated agents in parallel,
- `01:47` then aggregate their findings into verified exploit paths.
- `01:51` Hence the name, agentic map reduce.
- `01:54` Security Swarm employs another step
- `01:56` to ensure that its findings represent real vulnerabilities.
- `02:00` Devin runs a secure sandboxed virtual machine
- `02:03` to test each exploit.
- `02:05` That means it can run the actual program
- `02:07` to validate the hypothesized finding
- `02:09` and eliminate false positives.
- `02:11` Because of agentic map reduce,
- `02:13` the architecture is inherently more efficient.
- `02:16` Devin never investigates more than it needs to.
- `02:18` Only files with security relevance enter the pipeline,
- `02:21` and after investigators present hypotheses,
- `02:24` the reasoning reducer will eliminate false positives
- `02:27` and only focus on shards with medium,
- `02:30` high, or critical severity findings for validation.
- `02:33` On top of that, users get a high degree of control
- `02:36` via scanning profiles to focus scans
- `02:38` on the parts that matter.
- `02:40` The biggest lever is batch size,
- `02:42` which is effectively a multiplier
- `02:44` on how many investigator sessions are getting fanned out.
- `02:47` Finally, Security Swarm also gets more efficient over time
- `02:51` because subsequent runs only process the diffs
- `02:54` since the previous scan.
- `02:56` As a result,
- `02:57` Devin Security Swarm is the most cost-effective way
- `03:00` to find verified vulnerabilities,
- `03:02` and it finds them more consistently
- `03:04` than any other AI-powered security scanner
- `03:07` currently available.
- `03:08` We evaluated Devin Security Swarm
- `03:11` on a benchmark of 50 real-world vulnerabilities,
- `03:14` each tied to a published GitHub security advisory.
- `03:17` Devin Security Swarm is already used by major companies
- `03:21` for regular security scanning,
- `03:22` and it's available starting today in Devin.
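
The transcript's remark that batch size acts as a multiplier on fanned-out investigator sessions can be illustrated with simple arithmetic. The helper below assumes one investigator session per batch, which the source does not state explicitly; `investigator_sessions` is a hypothetical name.

```python
import math


def investigator_sessions(matched_files: int, batch_size: int) -> int:
    """Number of batches, assuming one investigator session per batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(matched_files / batch_size)
```

Under this assumption, 120 matched files with a batch size of 10 would fan out to 12 sessions, while a batch size of 40 would need only 3, so smaller batches trade more parallel sessions for a narrower scope per session.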

**The results show Devin Security achieving the highest recall (72%) at a lower cost per run ($90.23) than the runner-up, Claude Security (68% recall, $131.87).**

**Data table**

| Tool | Recall | $/Run |
| --- | --- | --- |
| Devin Security | 72% | $90.23 |
| Claude Security | 68% | $131.87 |
| Codex Security | 48% | $118.20 |
| Cursor Security | 26% | $4.60 |

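**A set of technical demo visuals showing the security scanning flow of the Agentic MapReduce pipeline.**

**Prompts and actions in the video**

Steps:

1. (00:00) Click to enter the Plan stage
2. (00:02) Click to enter the Shard stage
3. (00:03) Click to enter the Map stage
4. (00:04) Click to enter the Reduce stage
5. (00:05) Click to enter the Verify stage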

## Tags

Agent, New product, Security, Automation, Cognition