deep Manifold (@BetaTomorrow)

Why Is Continual Learning Even Possible Mathematically?

To answer this question, we need to step back a bit. Mathematical solutions come in two broad types: analytical and numerical.

Analytical mathematics seeks an exact solution: a true fixed point. Classical numerical computation begins where that exactness is no longer available. It proceeds through discretization, approximation, and iteration, often carried by derivative and integral processes, to obtain a converged solution: not the true answer in the analytical sense, but an average solution, or what we may call an average fixed point. Both the analytical true fixed point and the numerical average fixed point are static, because the equation and coordinates are given and fixed.
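
As a toy illustration of the numerical route (a minimal sketch, not part of the Deep Manifold formulation), consider x = cos(x), which has no closed-form analytical solution; fixed-point iteration settles on a converged answer only up to a chosen tolerance:

```python
import math

# Fixed-point iteration for x = cos(x), an equation with no
# closed-form (analytical) solution. We iterate until successive
# estimates agree to a tolerance: a converged "average" fixed point,
# not an exact one.
x = 1.0
for i in range(100):
    x_next = math.cos(x)
    if abs(x_next - x) < 1e-12:
        break
    x = x_next
print(f"numerical fixed point ~= {x:.12f} after {i} iterations")
```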

From this point of view, is a neural network a numerical computation? Yes, definitively. A neural network does not derive solutions analytically. It discretizes data into layers and weights, approximates through nonlinear activations, and iterates through derivative and integral processes. Deep Manifold makes this precise. It gives a global equation of the neural network as a Lagrangian formulation of a fixed point, but the fixed point is dynamic and stochastic.
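
One standard way to make the fixed-point reading concrete (a generic observation about gradient descent, not Deep Manifold's own global equation) is to note that the training update is itself a map on weight space whose fixed points are exactly the stationary points of the loss:

```latex
w_{t+1} = T(w_t) := w_t - \eta \,\nabla_w L(w_t),
\qquad
T(w^{*}) = w^{*} \iff \nabla_w L(w^{*}) = 0 .
```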

Further, Deep Manifold states that coordinates change at each iteration. This coordinate change is part of the learning dynamics of an inverse problem. Learning is an inverse problem. In theory, a neural network can be trained indefinitely, and its corresponding fixed point moves with it. Deep Manifold views neural networks as stacked piecewise manifolds, where each manifold’s orientation changes per iteration; in algebraic terms, this is a coordinate change.

The coordinate change during training is mathematically the reason why continual learning is possible. Continual learning is not a special case or an engineering patch. It is the natural behavior of a numerical system whose average fixed point shifts as the underlying data distribution evolves.
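
A minimal sketch of that behavior (a hypothetical toy setup, not an experiment from the paper): apply the same iterative update while the data-generating distribution drifts, and the converged weights, the average fixed point, track the drift.

```python
import numpy as np

rng = np.random.default_rng(0)
w, eta = np.zeros(2), 0.1

# Three phases of an evolving data distribution. The same gradient
# update keeps converging, but to a different average fixed point in
# each phase: the fixed point moves with the data.
for phase, target in enumerate([(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]):
    w_true = np.array(target)
    for _ in range(300):
        X = rng.normal(size=(32, 2))   # fresh batch from the current phase
        y = X @ w_true
        w -= eta * X.T @ (X @ w - y) / 32
    print(f"phase {phase}: fixed point moved to {np.round(w, 3)}")
```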

The next questions follow naturally: how do we make continual learning more efficient, and is there a fundamental limit?

A pre-trained base model already contains abundant fixed points and many shortcut pathways. These shortcuts are much like human intuition: fast, reliable, and largely opaque. We are often uncomfortable with intuition, whether in humans or machines, especially when those pathways are hidden from us. That is why we build reasoning models with RL. In this sense, continual learning is not simply the accumulation of new knowledge. It is the production of reasoning, largely through the formation of chain-of-reasoning pathways toward fixed points. Such reasoning pathways also reduce hallucinations, which are often observed along shortcut pathways in base models.

From this angle, RL methods such as GRPO can be viewed as curvature perturbations. Such perturbations are powerful because they can reshape the geometry of existing pathways and incubate new reasoning routes. But perturbation cannot be arbitrary. It must have direction; otherwise it is wasted effort, since many pathways are already formed through pretraining and prior RL rounds. More importantly, the same perturbation that strengthens one pathway can weaken or break another. That is where learning and catastrophic forgetting begin, not as a memory failure, but as a geometric consequence of operating on a shared manifold.
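
A toy picture of that geometric conflict (an illustrative sketch with made-up quadratic losses, not GRPO itself): when the gradients of two tasks on shared parameters point in opposing directions, the update that strengthens one pathway necessarily weakens the other.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=5)

# Two "pathways" sharing the same parameters, modeled as quadratic
# losses 0.5 * ||w - opt||^2. opt_b is placed so the two optima pull
# w in exactly opposite directions.
opt_a = rng.normal(size=5)
opt_b = 2 * w - opt_a
g_a, g_b = w - opt_a, w - opt_b    # gradients of the two losses at w

cos = g_a @ g_b / (np.linalg.norm(g_a) * np.linalg.norm(g_b))
print(f"gradient alignment: {cos:+.3f}")   # -1.000: maximal conflict
# Negative alignment: descending on one loss ascends the other, i.e.
# the same perturbation strengthens one pathway and weakens the other.
```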

This reframes both questions. Efficiency is not about training faster. It is about directing perturbations so they compound rather than conflict. And the limit of continual learning may not be a hardware constraint or a data constraint — it may be a geometric one: how many reasoning pathways a manifold of given capacity can support before mutual destruction becomes inevitable.
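
One concrete instance of "directing perturbations so they compound" from the multi-task learning literature is gradient projection in the style of PCGrad (Yu et al., 2020): when two task gradients conflict, drop the component of one along the other. A minimal sketch:

```python
import numpy as np

def deconflict(g_task: np.ndarray, g_other: np.ndarray) -> np.ndarray:
    """If g_task conflicts with g_other (negative inner product),
    remove its component along g_other so the combined update no
    longer pushes directly against the other pathway."""
    dot = g_task @ g_other
    if dot < 0:
        g_task = g_task - (dot / (g_other @ g_other)) * g_other
    return g_task

# Combined two-task update: each gradient is projected away from the
# direction it conflicts with before the step is taken.
# step = deconflict(g_a, g_b) + deconflict(g_b, g_a)
```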

Overall, continual learning should be studied under training progression, including fixed point progression, weight space geometry deformation, and neural plasticity.
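
In code terms, studying training progression suggests logging geometric quantities per task rather than only the loss; a crude, hypothetical proxy for fixed point progression:

```python
import numpy as np

rng = np.random.default_rng(2)
w, eta = rng.normal(size=8), 0.05

for task, opt in enumerate(rng.normal(size=(3, 8))):
    w_start = w.copy()
    for _ in range(200):
        w -= eta * (w - opt)             # toy per-task loss 0.5*||w - opt||^2
    drift = np.linalg.norm(w - w_start)  # how far this task moved the fixed point
    print(f"task {task}: fixed point drift {drift:.3f}")
```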


deepmanifold.ai

Neural network mathematics, as we have discovered, is highly counterintuitive and extremely primitive, often defying our expectations. Yet once we see through its first layer, we find beauty, elegance, and richness.

We do not know how deep it goes; it may be another 30 to 50 years before it is fully understood. Newton's calculus waited more than 200 years for its mathematical rigor; that foundation was eventually supplied by my co-author, Gen-Hua Shi, through the theory of fixed point classes, developed in his twenties in the late 1960s.


https://open.substack.com/pub/deepmanifold/p/why-is-continual-learning-even-possible