AC/DC coevolves a population of small expert LLMs that outperforms single large models such as GPT-4o
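AC/DC core concept
Assessment Coevolving w/ Diverse Capabilities (AC/DC) is a continual coevolution method. It grows a population of synthetic tasks alongside a population of small LLMs, pursuing an open-ended process that discovers divergent expertise within the LLM population and uses increasingly novel and challenging tasks to push those LLMs toward beating GPT-4o. The author asks why a single huge LLM should have to know and solve everything: no single human does, yet civilization innovates endlessly. AC/DC models this kind of collective intelligence by cultivating diverse small expert LLMs that collectively outperform GPT-4o. (ICLR 2026, with @SakanaAILabs)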
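Outperforming large models
AC/DC discovers multiple task forces of small 7B/14B LLMs whose test-time knowledge coverage surpasses that of their large LLM family counterparts, GPT-4o, and other multi-response baselines. Notably, these task forces use far fewer combined parameters than the large models, which suggests that a combination of small experts can reach broad coverage efficiently without relying on a single monolithic model.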
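To make the coverage comparison concrete, here is a minimal Python sketch. It assumes that "test-time knowledge coverage" means the fraction of questions that at least one model in the group answers correctly; that definition, the model names, and the toy correctness table are all illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List


def coverage(correct: Dict[str, List[bool]]) -> float:
    """Fraction of questions answered correctly by at least one model.

    Assumed definition of coverage; the source does not spell it out.
    """
    per_model = list(correct.values())
    n_questions = len(per_model[0])
    covered = [any(model[q] for model in per_model) for q in range(n_questions)]
    return sum(covered) / n_questions


# Hypothetical correctness table: 3 expert models x 5 questions.
results = {
    "expert_a": [True, False, False, True, False],
    "expert_b": [False, True, False, True, False],
    "expert_c": [False, False, True, False, False],
}

team_coverage = coverage(results)
best_single = max(coverage({name: row}) for name, row in results.items())

# Combined size of a task force of three 14B models, in billions of parameters.
combined_params_b = 3 * 14
```

In this toy table the group covers more questions than any single member, because each expert is right on different questions. That complementarity is the effect the task forces are described as exploiting.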
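Open-ended task generation and evolution
AC/DC uses an unbounded process of synthetic task creation (no benchmaxxing): it does not optimize for any particular benchmark, yet it yields models that outperform their seed lineages and can keep improving the evolved LLMs over time. Task forces are extracted based on the uniqueness of their OOD skills. The tasks themselves become more interesting over time, push LLMs beyond the edges of their capabilities, and judge observable skills with nuance through LLM-as-a-judge reasoning. For example, the green box task asks for complex analogies, and the light blue task requires the model to avoid mentioning its AI nature.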
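The source does not describe how task forces are extracted beyond "uniqueness of OOD skills". As a stand-in, the sketch below uses a simple greedy rule: repeatedly pick the model that solves the most held-out tasks not yet solved by the models already chosen. The function name `select_task_force`, the task ids, and the greedy criterion itself are assumptions made for illustration, not the authors' procedure.

```python
from typing import Dict, List, Set


def select_task_force(solved: Dict[str, Set[int]], size: int) -> List[str]:
    """Greedily pick up to `size` models with the most unique solved tasks.

    `solved` maps a model name to the ids of held-out tasks it solves.
    This greedy rule is an illustrative assumption, not the paper's method.
    """
    chosen: List[str] = []
    covered: Set[int] = set()
    candidates = dict(solved)
    while candidates and len(chosen) < size:
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(sorted(candidates), key=lambda name: len(candidates[name] - covered))
        if not candidates[best] - covered:
            break  # no remaining model contributes a new skill
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen


# Hypothetical per-model sets of solved out-of-distribution tasks.
solved_tasks = {
    "m1": {0, 1, 2},
    "m2": {1, 2},
    "m3": {3, 4},
    "m4": {2, 5},
}
task_force = select_task_force(solved_tasks, size=3)
```

Here `m2` is never selected: everything it solves is already covered by `m1`, so it adds no unique skill to the group.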
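Complementary strengths of task forces
AC/DC task coevolution yields complementary expert LLMs that are convincingly broader in expertise than their off-the-shelf counterparts. The spider plot shows task force LLMs that are distinctly better at certain subjects and that cover more skills overall, illustrating how collective evolution fills the blind spots of a single model and avoids the brittleness and high cost of large LLMs.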
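Single-answer performance
Many use cases need a single (best-of-N) final answer rather than multiple responses. Applying BoN techniques to an AC/DC task force of only 3 14B models comes within 3.17% of GPT-4o's performance; scaling up to a task force of 8 models narrows the gap to 1.02%. This points to the scaling potential of combining complementary experts with BoN strategies, and suggests that future task forces could come closer to, or surpass, frontier LLMs.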
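A minimal sketch of best-of-N selection over a task force is shown below. Each expert produces one candidate answer and a scoring function picks the one to return. The stub experts and the keyword-overlap scorer are hypothetical placeholders; the source does not say which BoN technique or scorer the authors use, and a real setup would call actual models and a learned or LLM-based scorer.

```python
from typing import Callable, Dict, Tuple


def best_of_n(
    query: str,
    experts: Dict[str, Callable[[str], str]],
    score: Callable[[str, str], float],
) -> Tuple[str, str]:
    """Return (expert_name, answer) for the highest-scoring candidate."""
    candidates = [(name, generate(query)) for name, generate in experts.items()]
    return max(candidates, key=lambda c: score(query, c[1]))


def overlap_score(query: str, answer: str) -> float:
    """Toy scorer (assumption): count query words that appear in the answer."""
    answer_words = set(answer.lower().split())
    return float(sum(word in answer_words for word in query.lower().split()))


# Stub experts standing in for small LLMs; outputs are fixed strings.
stub_experts = {
    "expert_a": lambda q: "Paris is the capital of France",
    "expert_b": lambda q: "I do not know",
    "expert_c": lambda q: "France",
}

winner, final_answer = best_of_n("capital of France", stub_experts, overlap_score)
```

The design point is that selection quality depends on the scorer: complementary experts only help a single-answer setting if the scorer can recognise which expert is right for a given query.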
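Inspiration from human intelligence, and a critique
Frontier LLMs are expensive and can still be brittle. Human intelligence did not emerge from the creation of a single genius; it emerged from the collective, open-ended coevolution of our world and civilization. AC/DC implements such coevolution and produces many emergent expert LLMs. The author is pointedly critical of the limits of a single giant LLM and argues that collective evolution is closer to how innovation arises in nature.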
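Foundational prior work
AC/DC stands on the shoulders of giants. Jonathan Brant and @kenneth0stanley's "Benchmarking open-endedness in minimal criterion coevolution" (2019, https://dl.acm.org/doi/10.1145/3321707.3321756) demonstrated open-endedness with the simplest form of minimal criterion coevolution (MCC), introducing a new maze encoding that allows complexity to grow without bound and establishing a benchmark.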
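"Paired Open-Ended Trailblazer (POET)" (Rui Wang et al., 2019, https://arxiv.org/abs/1901.01753) pairs the generation of environmental challenges with the optimization of agents that solve them, exploring the space of problems and solutions together. It allows solutions to transfer between environments, which catalyzes innovation, and it argues that open-endedness is essential for solving ambitious challenges.
"OMNI-EPIC" (Maxence Faldor et al., 2024, https://arxiv.org/abs/2405.15568) extends OMNI by using foundation models to generate code that defines environments and rewards. It autonomously produces tasks that are interesting and suitably difficult, a step toward self-improving AI.
"LLM-POET" (Fuma Aki et al., 2024, https://arxiv.org/abs/2406.04663) modifies POET so that an LLM generates and mutates the environments. Compared with the CPPNs used in Enhanced-POET, it reports a 34% increase in the performance gain from coevolution, and its agents learn a more diverse set of skills.
"Dominated Novelty Search (DNS)" (Ryan Bahlous-Boldi et al., 2025, https://arxiv.org/abs/2502.00593) reformulates local competition as a dynamic fitness transformation that needs no predefined bounds, and it significantly outperforms existing QD methods in high-dimensional and unsupervised spaces.
"Automated Capability Discovery (ACD)" (Cong Lu et al., 2025, https://arxiv.org/abs/2502.07577) designates a foundation model as a scientist that systematically proposes open-ended tasks to probe the capabilities of a subject model. It automatically surfaces thousands of tasks across dozens of capability areas, and its model-generated scores agree closely with human evaluation.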
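The "dynamic fitness transformation" idea behind Dominated Novelty Search can be sketched as follows. This is one reading of the approach, not a reference implementation: each solution is rescored by its mean descriptor-space distance to the k nearest solutions that have strictly higher fitness, and solutions with no fitter competitor receive an infinite score. The function name, the choice of Euclidean distance, and the toy data are assumptions.

```python
import math
from typing import List, Sequence


def dominated_novelty_scores(
    fitness: Sequence[float],
    descriptors: Sequence[Sequence[float]],
    k: int = 3,
) -> List[float]:
    """Rescore each solution by its distance to nearby fitter solutions.

    A sketch under stated assumptions; consult the paper for the exact rule.
    """
    scores: List[float] = []
    for f_i, d_i in zip(fitness, descriptors):
        # Distances to every solution with strictly higher fitness.
        dists = sorted(
            math.dist(d_i, d_j)
            for f_j, d_j in zip(fitness, descriptors)
            if f_j > f_i
        )
        if not dists:
            scores.append(math.inf)  # nothing fitter: always kept
        else:
            nearest = dists[:k]
            scores.append(sum(nearest) / len(nearest))
    return scores


# Toy population: three solutions with 2-D behaviour descriptors.
toy_fitness = [1.0, 2.0, 3.0]
toy_descriptors = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
toy_scores = dominated_novelty_scores(toy_fitness, toy_descriptors, k=1)
```

Because the score depends only on distances between solutions in the current population, no grid or bounds on the descriptor space have to be fixed in advance, which matches the "no predefined bounds" property described above.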
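Emerging trends and outlook
A trend is emerging in the field: more evidence points to expert LLMs being discoverable through serendipity alone in the space of LLM parameters, as in @yule_gan's post (https://x.com/yule_gan/status/2032482266773926281). Looking forward, what would happen if an abstraction of recursive self-improvement were applied in AC/DC: fierce competition among LLM candidates proposing endless challenges, or tribal cooperation among groups of experts? (See @jennyzhangzt's post, https://x.com/jennyzhangzt/status/2036099935083618487.) Either outcome, open-ended competition or cooperation, could reshape how AI evolves.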
🚨Why should one huge LLM know and solve everything? - No single human does, yet our civilization does endless innovation.
— Boris (@BorisMeinardus) April 19, 2026
Introducing AC/DC - it continually coevolves a population of small expert LLMs that collectively outperform GPT-4o.
(ICLR 2026 w/ @SakanaAILabs) 👇🧵 pic.twitter.com/mwAEx741k5
Assessment Coevolving w/ Diverse Capabilities (AC/DC) grows populations of both synthetic tasks and small LLMs, pursuing an open-ended process that discovers divergent expertise in LLM populations with increasingly novel & challenging tasks to push LLMs further towards beating… pic.twitter.com/xrNyWRMsLb
— Boris (@BorisMeinardus) April 20, 2026
AC/DC discovers multiple small 7B/14B LLM task forces that can surpass test-time knowledge coverage of large LLM family counterparts, GPT-4o, and other multi-response baselines.
— Boris (@BorisMeinardus) April 20, 2026
Notably, our task forces use far fewer combined parameters than the large models! pic.twitter.com/TuB2ycIXA1
AC/DC discovers models that outperform their seed lineages, using an unbounded process of synthetic task creation (no benchmaxxing!) that can further improve evolved LLMs over time, extracting task forces based on the uniqueness of their OOD skills.
— Boris (@BorisMeinardus) April 20, 2026
We never optimize for a… pic.twitter.com/LkjParPPJt
AC/DC tasks become more interesting, push LLMs beyond the edges of their capabilities, and judge observable skills with nuance by leveraging LLM-as-a-judge reasoning.
— Boris (@BorisMeinardus) April 20, 2026
For example, making complex analogies (green box task) or evading mention of its AI nature (light blue) ▶️🎆 pic.twitter.com/WwsTXJQ1NU
The result of AC/DC task coevolution yields complementary expert LLMs that are convincingly broader in expertise than their off-the-shelf counterparts.
— Boris (@BorisMeinardus) April 20, 2026
See how the spider plot reveals task force LLMs that are distinctly better at certain subjects and cover more skills overall. pic.twitter.com/ImblJNmh3k
In many use cases, people want a single (best-of-N) final answer to a query, not multiple.
— Boris (@BorisMeinardus) April 20, 2026
Using an AC/DC task force of only 3 14B models, we can apply BoN techniques to extract a final answer, bringing us within 3.17% of GPT-4o’s performance!
Scaling up to a task force of 8… pic.twitter.com/BbR2eTGP09
Frontier LLMs are expensive and can still be brittle.
— Boris (@BorisMeinardus) April 20, 2026
Human intelligence didn't emerge from the grand creation of a single genius; it emerged from the collective, open-ended coevolution of our world and civilization.
AC/DC implements such coevolution for many emergent expert… pic.twitter.com/fVhalQrdTT
AC/DC stands on the shoulders of giants: Jonathan Brant and @kenneth0stanley demonstrated signs of open-endedness through the simplest form of minimal criterion coevolution (https://t.co/mbNNPCmCCB). POET by @ruiwang2uiuc @joelbot3000 @jeffclune @kenneth0stanley made another big…
— Boris (@BorisMeinardus) April 20, 2026
A growing trend in the field is emerging: more evidence points to expert LLMs being discoverable through serendipity alone, in the space of LLM parameters https://t.co/lNFwuvcaww
— Boris (@BorisMeinardus) April 20, 2026
Looking forward: what would happen when we apply an abstraction of recursive self-improvement in AC/DC? - a fierce competition amongst LLM candidates proposing endless challenges, or tribal co-operation amongst groups of experts? https://t.co/PussBs2dV8
— Boris (@BorisMeinardus) April 20, 2026
Before discussing the evolution of intelligence in nature with @karpathy, @dwarkesh_sp hinted at a hypothetical vision for ASI emerging from “billions of very smart human-like minds” - AC/DC takes a step forward in exploring the nature of this vision.
— Boris (@BorisMeinardus) April 20, 2026
At 1:29:08:…
