DISCO uses joint diffusion to co-design protein sequence and structure, moving beyond the chemistry found in nature
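DISCO (Diffusion for Sequence-structure CO-design) is a multimodal generative model that designs a protein's sequence and 3D structure at the same time. Without pre-specified catalytic residues, it can create new enzymes that have not been seen in nature. The model outperforms existing methods on design benchmarks, and wet-lab experiments confirmed that its designs catalyze reactions that do not occur in nature.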
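Technical bottlenecks and challenges
Traditional enzyme engineering, such as directed evolution, is powerful but extremely slow, and it is largely limited to chemistry that already exists in nature. For example, engineering an enzyme for selective C(sp³)–H insertion previously took 14 rounds of directed evolution and more than a year of wet-lab work. Existing AI design methods face two core bottlenecks:
- Reliance on a "theozyme" (a pre-specified arrangement of catalytic residues): this requires a deep understanding of the reaction mechanism, which is often unavailable for novel chemistry.
- Decoupled sequence and structure generation: existing methods usually generate a backbone first and then fill in the amino-acid sequence, and this split pipeline loses key information at the hand-off.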
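Core technical architecture
DISCO uses joint diffusion to generate protein sequence and 3D structure as a single object, which removes the hand-off in the traditional two-stage pipeline. Its key technical innovations include:
- Cross-modal recycling: structure and sequence information are aligned bidirectionally during generation, so that amino-acid choices and geometry adapt to each other.
- Self-correcting sequence: the model can revisit and revise earlier amino-acid choices during generation, which avoids locking in mistakes too early.
- Noisy guidance: conditioning on a noised version of the other modality strengthens the alignment between structure and sequence.
- Feynman-Kac Correctors: multimodal steering at inference time, for example forcing a higher density of disulfide bonds, or binding a target while actively avoiding a decoy molecule.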
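The inference-time steering idea in the last bullet can be illustrated with a generic particle-resampling loop. The sketch below is not DISCO's Feynman-Kac Corrector derivation; it is a minimal sequential Monte Carlo toy, assuming a hypothetical `propose` step (a random single-residue mutation standing in for a denoising step) and a hypothetical `cysteine_reward` (cysteine fraction as a crude proxy for disulfide density). All names and parameter values are illustrative assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cysteine_reward(seq: str) -> float:
    """Toy reward (assumption): fraction of residues that are cysteine."""
    return seq.count("C") / len(seq)


def propose(seq: str, rng: random.Random) -> str:
    """Stand-in for one generative step: resample one random position uniformly."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(AMINO_ACIDS) + seq[i + 1:]


def steer(n_particles=64, length=20, n_steps=200, beta=60.0, seed=0):
    """Run a population of sequences, reweighting and resampling at every step."""
    rng = random.Random(seed)
    particles = [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(n_particles)
    ]
    for _ in range(n_steps):
        proposals = [propose(p, rng) for p in particles]
        # Incremental weight: exp(beta * change in reward) for each particle.
        weights = [
            math.exp(beta * (cysteine_reward(q) - cysteine_reward(p)))
            for p, q in zip(particles, proposals)
        ]
        # Multinomial resampling: high-weight proposals are duplicated,
        # low-weight proposals tend to be dropped.
        particles = rng.choices(proposals, weights=weights, k=n_particles)
    return particles


def mean_reward(particles):
    return sum(cysteine_reward(p) for p in particles) / len(particles)


steered = steer(beta=60.0)
unsteered = steer(beta=0.0)
print(f"steered mean cysteine fraction:   {mean_reward(steered):.2f}")
print(f"unsteered mean cysteine fraction: {mean_reward(unsteered):.2f}")
```

With `beta=0.0` every weight is 1 and resampling is uniform, so the population stays near the roughly 5% cysteine background of uniform sampling; raising `beta` shifts it toward cysteine-rich sequences. With this few particles and a large `beta`, the loop behaves more like a greedy search than an unbiased sampler.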
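Performance and validation
DISCO shows broad generality on the Studio-179 benchmark, a library of 179 natural and non-natural ligands. It outperforms the baseline models on 178 of the 179 targets. The wet-lab results are more striking still:
- Carbene-transfer reactions: a single computational design reached 2,360 TTN (total turnover number), far exceeding the result of 14 rounds of directed evolution.
- B–H insertion: a single design reached 5,170 TTN, more than twice the result of three rounds of laboratory directed evolution.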
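For readers unfamiliar with the metric, TTN is conventionally the amount of product formed per unit of catalyst over the catalyst's lifetime. The short sketch below works through the reported numbers; the function name and the product and catalyst amounts are hypothetical, chosen only so that the result reproduces the reported 2,360 TTN.

```python
def total_turnover_number(product_nmol: float, catalyst_nmol: float) -> float:
    """TTN = moles of product formed per mole of catalyst."""
    if catalyst_nmol <= 0:
        raise ValueError("catalyst amount must be positive")
    return product_nmol / catalyst_nmol


# Hypothetical amounts (assumption), picked to match the reported 2,360 TTN.
ttn = total_turnover_number(product_nmol=2360.0, catalyst_nmol=1.0)

# Studio-179 win rate from the reported 178 of 179 targets.
win_rate = 178 / 179

# "More than twice" the evolved enzyme implies that its TTN was below this bound.
baseline_upper_bound = 5170 / 2

print(f"TTN: {ttn:.0f}")
print(f"Win rate: {win_rate:.1%}")
print(f"Evolved B-H insertion baseline: below {baseline_upper_bound:.0f} TTN")
```

The bound on the directed-evolution baseline is only an inference from the "more than twice" wording, not a reported measurement.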
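Novelty and evolvability
The active sites in DISCO's designed enzymes do not exist in nature:
- Structural novelty: more than 90% of the designed motifs have no close homolog in the AlphaFold Database, which suggests that the model is not merely recombining known fragments but finding new solutions.
- Evolvable starting points: the designs are not only functional but also evolvable. For dCT-H11, a single round of mutagenesis increased spirocyclopropanation activity 4-fold and inverted stereoselectivity, which makes DISCO designs good starting points for protein engineering.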
What if AI could invent enzymes that nature hasn’t seen? 👩‍🔬🧑‍🔬
Introducing 🪩 DISCO: Diffusion for Sequence-structure CO-design
14 rounds of directed evolution and over a year of wet lab work. That's what it took to engineer an enzyme for selective C(sp³)–H insertion, one of the… pic.twitter.com/VvRqGXmZOw
— Jarrid Rector-Brooks (@jarridrb) April 8, 2026
Evolution is an amazing chemist, but the reactions it has explored represent a remarkably narrow slice of what is possible. Existing AI models require predefined "theozymes" (exact catalytic residue arrangements) and generate backbones before sequences. DISCO generates both… pic.twitter.com/B0hiDezXFi
— Jarrid Rector-Brooks (@jarridrb) April 8, 2026
Introducing the 💃🕺Studio-179 benchmark🕺 💃, a library of 179 natural and non-natural ligands spanning catalysis, pharmaceuticals, luminescence, and sensing: DISCO outperforms baselines on 178/179 targets, along with sequence-specific DNA and RNA binders. It also shines in… pic.twitter.com/nNgzbtlpGi
— Jarrid Rector-Brooks (@jarridrb) April 8, 2026
How does it work? DISCO aligns sequence & structure bidirectionally via cross-modal recycling, self-correction, and noisy guidance. It also introduces an entropy-adaptive sequence temperature to properly balance information across modalities during generation. ⚖️ pic.twitter.com/pVpz3zFz3J
— Jarrid Rector-Brooks (@jarridrb) April 8, 2026
Because DISCO generates sequence and structure together, it unlocks true multimodal inference-time steering. Deriving multimodal Feynman-Kac Correctors, DISCO steers generation on the fly—like forcing the creation of dense disulfide bonds (FKC-MM) or binding a target while… pic.twitter.com/hVNKexBBuK
— Jarrid Rector-Brooks (@jarridrb) April 8, 2026
The ultimate test is the wet lab. 🧪
DISCO was challenged to design enzymes for carbene-transfer reactions—chemistry alien to the natural world. It mastered selective C(sp³)–H insertion, one of the most challenging transformations in organic chemistry. A single computational… pic.twitter.com/V1TcSDELW0
— Jarrid Rector-Brooks (@jarridrb) April 8, 2026
The success continued with B–H insertion. A single DISCO design achieved 5,170 TTN, outperforming three rounds of laboratory directed evolution by over 2x! 🤯 pic.twitter.com/b6FaYdLgSJ
— Jarrid Rector-Brooks (@jarridrb) April 8, 2026
Perhaps the most interesting property? These active sites don't exist in nature.
When searched against 200M+ structures in the AlphaFold Database, the majority of DISCO's generated binding motifs have no close natural homologs. pic.twitter.com/0NfJCz8OCJ
— Jarrid Rector-Brooks (@jarridrb) April 8, 2026
Take the top design, dCT-H11. Its closest structural match (PDB 3CRJ) is a non-enzymatic transcription factor from a Dead Sea extremophile! DISCO completely repurposed this fold for carbene chemistry with a completely novel active-site geometry and very low sequence identity.… pic.twitter.com/GEjVMkiSps
— Jarrid Rector-Brooks (@jarridrb) April 8, 2026
These enzymes aren't just functional; they're evolvable starting points. One round of mutagenesis on dCT-H11 yielded a 4x activity increase for spirocyclopropanation and even inverted stereoselectivity. The chemistry nature never explored is now within reach! 🌍 pic.twitter.com/vNZof3Gy6m
— Jarrid Rector-Brooks (@jarridrb) April 8, 2026
Incredible collaboration with Théophile Lambert @martoskreto Daniel Roth @Yueming_Long @ZiqiLi0513 @NZhang211 @MirunaCretu2 @francescazfl @tanviganapathy Emily Jin @bose_joey @jsunn_y @k_neklyudov @Yoshua_Bengio @AlexanderTong7 @francesarnold @ChengHaoLiu1 at @Caltech…
— Jarrid Rector-Brooks (@jarridrb) April 8, 2026
