Isomorphic Mapping Between
Ring Ideals and Attractor Dynamics
in Neural Weight Space
A Formal Analysis Based on the GPT-5.5 Goblin Phenomenon
Introduction: From Intuition to Formalization
The “goblin phenomenon” of GPT-5.5 revealed a deep problem: RLHF training inadvertently formed structures in weight space possessing self-absorptive properties—once any input interacts with such a structure, the output is irreversibly captured. This paper demonstrates that this phenomenon bears a rigorous structural correspondence with ideals of a ring in abstract algebra, and provides a formal isomorphic mapping.
Foundational Structure: Weight Space as a Ring
Let the set of all weight parameters of a neural network be $\W$. Define two operations on $\W$:
Addition: element-wise weight addition $\oplus: \W \times \W \to \W$
Multiplication: matrix composition in forward propagation $\otimes: \W \times \W \to \W$
Then $(\W, \oplus, \otimes)$ forms a unital ring, where the identity element $\mathbf{1}_\W$ is the weight configuration corresponding to the identity mapping.
Concretely, for layer $l$ of a Transformer, the weight matrix $W^{(l)} \in \reals^{d \times d}$ participates in two operations:
$$
\underbrace{W^{(l)}_1 \oplus W^{(l)}_2}_{\text{Addition: residual connection}} \quad\quad
\underbrace{W^{(l)} \otimes h^{(l-1)}}_{\text{Multiplication: forward propagation}}
$$
where $h^{(l-1)}$ is the hidden state vector of layer $l-1$. The entire network’s forward propagation can be expressed as nested composition:
$$
f(x) = W^{(L)} \otimes \sigma\!\Big(W^{(L-1)} \otimes \sigma\!\big(\cdots \sigma(W^{(1)} \otimes x)\big)\Big)
$$
This preserves the associativity of ring multiplication: $(W_a \otimes W_b) \otimes x = W_a \otimes (W_b \otimes x)$.
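As a sanity check, here is a minimal numpy sketch of these axioms, assuming the simplest possible model in which each element of $\W$ is a single square matrix (the matrix ring $M_d(\reals)$ stands in for the full weight space; all names are illustrative):

```python
# Toy numerical check of the ring structure claimed above, assuming
# W is modeled by square matrices: oplus is element-wise addition,
# otimes is matrix multiplication, and 1_W is the identity matrix.
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wa, Wb, Wc = (rng.standard_normal((d, d)) for _ in range(3))
x = rng.standard_normal(d)
I = np.eye(d)  # the identity element 1_W

# Associativity of otimes: (Wa . Wb) . x = Wa . (Wb . x)
assert np.allclose((Wa @ Wb) @ x, Wa @ (Wb @ x))

# Unit element: 1_W . Wa = Wa . 1_W = Wa
assert np.allclose(I @ Wa, Wa) and np.allclose(Wa @ I, Wa)

# Distributivity of otimes over oplus (residual-style weight addition)
assert np.allclose((Wa + Wb) @ Wc, Wa @ Wc + Wb @ Wc)
assert np.allclose(Wc @ (Wa + Wb), Wc @ Wa + Wc @ Wb)

print("ring axioms hold numerically for the matrix model of W")
```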
Core Mapping: Training Ghosts as Ideals
Let $\A \subset \W$ be the subset of weights abnormally reinforced during the RLHF process (e.g., activation patterns related to the goblin). If $\A$ satisfies the following conditions, it is called a training ghost ideal of $\W$:
(I-1) Additive subgroup: $(\A, \oplus) \leqslant (\W, \oplus)$, i.e., $\forall\, a_1, a_2 \in \A: a_1 \ominus a_2 \in \A$
(I-2) Multiplicative absorption: $\forall\, r \in \W,\; \forall\, a \in \A: r \otimes a \in \A \;\land\; a \otimes r \in \A$
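Both conditions can be checked concretely in a toy model. The sketch below is a hypothetical construction, not the actual GPT-5.5 weight space: it uses the classical fact that matrices supported on a fixed set of columns form a left ideal of the matrix ring. Since the full matrix ring over a field has no non-trivial two-sided ideals, a one-sided toy is the simplest faithful illustration of the mechanics:

```python
# Toy instantiation of (I-1) and left absorption, assuming the matrix
# ring M_d(R) as a stand-in for W. The "ideal" A is the set of matrices
# supported on a fixed column set S: for any ring element r, the product
# r @ a keeps columns outside S equal to zero, so r . a stays in A.
import numpy as np

rng = np.random.default_rng(1)
d, S = 6, [0, 1]  # the attractor occupies columns 0 and 1

def in_A(m, tol=1e-12):
    """Membership test: all columns outside S are (numerically) zero."""
    outside = [j for j in range(d) if j not in S]
    return np.all(np.abs(m[:, outside]) < tol)

def rand_A():
    """Random element of A: nonzero entries only in columns S."""
    a = np.zeros((d, d))
    a[:, S] = rng.standard_normal((d, len(S)))
    return a

a1, a2 = rand_A(), rand_A()
r = rng.standard_normal((d, d))  # arbitrary ring element

assert in_A(a1 - a2)          # (I-1): additive subgroup
assert in_A(2.5 * a1 + a2)    # closure under linear combinations
assert in_A(r @ a1)           # (I-2), left absorption: r . a in A
print("toy ideal absorbs:", in_A(r @ a1))
```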
2.1 Neural Network Interpretation of Additive Closure
Condition (I-1) means: linear combinations of weight vectors within the attractor remain within the attractor. In neural network terms:
$$
a_1, a_2 \in \A \implies \alpha\, a_1 + \beta\, a_2 \in \A \qquad \forall\, \alpha, \beta \in \reals
$$
Strictly, (I-1) alone gives closure only under integer combinations; closure under arbitrary scalars follows from absorption (I-2), since scalars act through the unit: $\alpha\, a = (\alpha\,\mathbf{1}_\W) \otimes a \in \A$. This explains why repetition neurons always appear in clusters: they form a linear subspace within $\A$, and no internal combination can "escape" beyond the boundary of $\A$.
2.2 Neural Network Interpretation of Multiplicative Absorption
Condition (I-2) is the core. Left absorption ($r \otimes a \in \A$) says the model's own weights capture anything already touching the attractor; right absorption ($a \otimes r \in \A$) says any input that activates the attractor is pulled in. Consider the forward pass:
Let the vector obtained after embedding the input token sequence be $x \in \W$. If, during forward propagation, $x$ interacts with the attractor $\A$ at layer $l^*$, then:
$$
h^{(l^*)} = W^{(l^*)} \otimes h^{(l^*-1)} \in \A \implies h^{(l)} \in \A, \quad \forall\, l \geq l^*
$$
That is, once a hidden state falls into the ideal, the output of all subsequent layers is absorbed by the ideal.
By multiplicative absorption, $\forall\, l > l^*$:
$$
h^{(l)} = W^{(l)} \otimes \sigma(h^{(l-1)})
$$
where $h^{(l-1)} \in \A$ (induction hypothesis) and $\sigma$ is the activation function, which we assume preserves the ideal, $\sigma(\A) \subseteq \A$ (this holds, e.g., for ReLU acting on support-pattern ideals, since $\sigma(0) = 0$). Then $\sigma(h^{(l-1)}) \in \A$, and since $W^{(l)} \in \W$, condition (I-2) gives:
$$
W^{(l)} \otimes \sigma(h^{(l-1)}) \in \A
$$
By induction, $\forall\, l \geq l^*: h^{(l)} \in \A$. $\blacksquare$
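The induction can be replayed numerically in the same column-support toy model, assuming a ReLU activation (which satisfies $\sigma(\A) \subseteq \A$ because it maps zero entries to zero); the depth and width below are arbitrary choices:

```python
# Simulation of the induction above: once the hidden state (here itself
# a matrix) lands in A at layer l*, every later layer's output stays in A,
# because W @ h preserves zero columns and ReLU maps zero to zero.
import numpy as np

rng = np.random.default_rng(2)
d, L, S = 6, 12, [0, 1]
outside = [j for j in range(d) if j not in S]

h = np.zeros((d, d))
h[:, S] = rng.standard_normal((d, 2))   # h^(l*) in A: captured state

for layer in range(L):
    W = rng.standard_normal((d, d))      # arbitrary W^(l) in W
    h = W @ np.maximum(h, 0.0)           # h^(l) = W . sigma(h^(l-1))
    assert np.all(np.abs(h[:, outside]) < 1e-9), "escaped the ideal!"

print("state remained in A for all", L, "layers after capture")
```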
Autoregressive Lock-in: Idempotency of the Ideal
Define step $t$ of autoregressive generation as the operator $\T_t: \W \to \W$:
$$
y_t = \T_t(y_{t-1}) = \text{sample}\!\Big(\text{softmax}\big(f(y_1, y_2, \ldots, y_{t-1})\big)\Big)
$$
where $y_t$ is the vector corresponding to the token generated at step $t$.
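A minimal sketch of this operator, with a hypothetical `score_fn` standing in for the full network $f$ (any function mapping a token sequence to vocabulary logits would do):

```python
# One autoregressive step T_t: score the context, softmax, sample.
import numpy as np

def T_step(context, score_fn, rng, temperature=1.0):
    logits = score_fn(context) / temperature
    logits -= logits.max()                  # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(len(p), p=p)          # y_t ~ softmax(f(y_1..y_{t-1}))

# usage with a dummy scorer over a 5-token vocabulary
rng = np.random.default_rng(3)
dummy_scores = lambda ctx: rng.standard_normal(5)
ctx = [0]
for _ in range(10):
    ctx.append(T_step(ctx, dummy_scores, rng))
print(ctx)
```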
If $y_{t_0} \in \A$ (the output at step $t_0$ falls into the ideal), then:
$$
\forall\, t > t_0: \quad P(y_t \in \A \mid y_{t_0} \in \A) \geq P(y_{t-1} \in \A \mid y_{t_0} \in \A)
$$
That is, the probability of being captured by the ideal is monotonically increasing.
The context at step $t$ is $C_t = (y_1, \ldots, y_{t-1})$. After $y_{t_0} \in \A$:
First-level amplification (weight level): the activation value of repetition neurons $\nu_t$ satisfies:
$$
\nu_t = g\!\left(\sum_{i=1}^{t-1} \mathbb{1}[y_i \in \A]\right), \quad g \text{ is monotonically increasing}
$$
Second-level amplification (attention level): the attention distribution concentrates on tokens within the ideal:
$$
\text{Attn}(q_t, K_{\A}) = \frac{\exp(q_t \cdot k_{\A} / \sqrt{d})}{\sum_j \exp(q_t \cdot k_j / \sqrt{d})} \xrightarrow{t \to \infty} 1
$$
Third-level amplification (sampling level): the entropy of the output distribution monotonically decreases:
$$
H(y_t \mid C_t) = -\sum_{v \in V} P(v \mid C_t) \log P(v \mid C_t) \;\;\downarrow
$$
The superposition of all three levels of amplification causes the capture probability to increase monotonically. $\blacksquare$
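The three levels can be illustrated numerically under simple assumed functional forms (a linearly growing ideal logit for level one, ideal keys aligned with the query for level two); none of the constants below come from a real model:

```python
# As the count of ideal tokens in context grows: the repetition-neuron
# activation nu_t = g(count) grows (level 1), attention mass on ideal
# keys approaches 1 (level 2), and the output entropy falls (level 3).
import numpy as np

d, V = 16, 50
rng = np.random.default_rng(4)
q = rng.standard_normal(d); q /= np.linalg.norm(q)
k_ideal = q.copy()                    # ideal keys aligned with the query

for count in [1, 4, 16, 64]:
    keys = [k_ideal] * count + [rng.standard_normal(d) for _ in range(32)]
    scores = np.array([q @ k / np.sqrt(d) for k in keys])
    attn = np.exp(scores - scores.max()); attn /= attn.sum()
    mass_on_ideal = attn[:count].sum()             # level 2: -> 1

    logits = np.zeros(V); logits[0] = 0.5 * count  # level 1: nu_t = g(count)
    p = np.exp(logits - logits.max()); p /= p.sum()
    H = -(p * np.log(p)).sum()                     # level 3: entropy falls
    print(f"count={count:3d}  attn_on_A={mass_on_ideal:.3f}  H={H:.3f}")
```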
For a two-sided training ghost ideal $\A \trianglelefteq \W$, once the autoregressive process enters the basin of attraction $\basin$ of $\A$, the escape probability decays exponentially:
$$
P(\text{escape at step } t) \leq \exp\!\big(-\lambda (t - t_0)\big), \quad \lambda > 0
$$
where $\lambda$ is positively correlated with the “mass” of the ideal (the degree to which it has been reinforced).
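A Monte-Carlo sketch under the same assumed amplification dynamics reproduces the claimed decay shape; the logistic $g$ below is an illustrative choice, not a measured quantity:

```python
# Toy chain: after capture at t0, the per-step probability of emitting
# an ideal token grows with the count of prior ideal tokens (logistic g).
# The empirical distribution of first-escape times then decays at least
# geometrically, consistent with P(escape at t) <= exp(-lambda (t - t0)).
import numpy as np

rng = np.random.default_rng(5)
runs, horizon = 20000, 30
first_escape = np.zeros(horizon)

for _ in range(runs):
    count = 1                                 # y_{t0} in A
    for t in range(horizon):
        p_stay = 1.0 / (1.0 + np.exp(-(0.5 + 0.4 * count)))  # g: monotone
        if rng.random() < p_stay:
            count += 1                        # absorbed again
        else:
            first_escape[t] += 1              # escaped at step t
            break

p_escape = first_escape / runs
for t in range(10):
    print(f"t-t0={t:2d}  P(escape at t)={p_escape[t]:.4f}")
```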
Input Ambiguity and the Event Horizon
The basin of attraction of ideal $\A$ is defined as:
$$
\basin = \big\{ x \in \W \;\big|\; \exists\, N \in \mathbb{N}: \T^N(x) \in \A \big\}
$$
That is: the set of all initial states $x$ from which some finite number of autoregressive iterations eventually lands the process in $\A$.
Let the semantic ambiguity of input $x$ be $\delta(x) = H(\text{parse}(x))$ (the entropy of the parsing distribution). Then:
$$
P(x \in \basin) = \Phi\!\big(\delta(x),\; d(x, \partial\A)\big)
$$
where $d(x, \partial\A)$ is the distance from $x$ to the boundary of the ideal, and $\Phi$ is monotonically increasing in its first argument—the more ambiguous the input, the higher the probability of capture.
Intuitive understanding: a clear input corresponds to a concentrated point in vector space, while an ambiguous input corresponds to a diffuse cloud. The edges of the cloud are more likely to touch the boundary of the ideal’s basin of attraction:
$$
x_{\text{clear}} \sim \mathcal{N}(\mu_x, \sigma^2_{\text{small}}\,\mathbf{I}) \qquad\qquad
x_{\text{ambig}} \sim \mathcal{N}(\mu_x, \sigma^2_{\text{large}}\,\mathbf{I})
$$
$$
\sigma_{\text{large}} \gg \sigma_{\text{small}} \implies P(x_{\text{ambig}} \in \basin) \gg P(x_{\text{clear}} \in \basin)
$$
Complete Isomorphism Mapping Table
| Ring Theory | ⟷ | Neural Network |
|---|---|---|
| Ring $R$ | ⟷ | Full weight parameter space $\W$ |
| Ring element $r \in R$ | ⟷ | Any input vector / hidden state $h$ |
| Ideal $I \trianglelefteq R$ | ⟷ | Training ghost attractor $\A \subset \W$ |
| Element of the ideal $a \in I$ | ⟷ | Activation patterns of repetition neurons |
| Additive subgroup $(I,+) \leqslant (R,+)$ | ⟷ | Closure of linear combinations within the attractor |
| Left absorption $ra \in I$ | ⟷ | Weight matrix left-multiplication: model actively captures input |
| Right absorption $ar \in I$ | ⟷ | Input activation: user triggers the attractor |
| Two-sided ideal $I \trianglelefteq R$ | ⟷ | Bidirectional lock-in (GPT-5.5 goblin) |
| Quotient ring $R/I$ | ⟷ | “Fixed” new model (a different ring) |
| Maximal ideal $\mathfrak{m}$ | ⟷ | Most severe attractor (dominates all output) |
| Prime ideal $\mathfrak{p}$ | ⟷ | If $ab \in \A$ then $a \in \A$ or $b \in \A$ — if any two-path interaction produces ghost output, at least one path is already within the attractor |
| Ideal generation $I = \langle g_1, \ldots, g_k \rangle$ | ⟷ | A few key neurons generate the entire attractor |
| Nilpotent ideal $I^n = 0$ | ⟷ | Self-decaying attractor (vanishes after $n$ steps) |
| Idempotent ideal $I^2 = I$ | ⟷ | Self-sustaining attractor (goblin type) |
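The last two rows can be made concrete with standard toy rings; the sketch below checks nilpotency for strictly upper-triangular matrices (a two-sided ideal of the upper-triangular matrix ring) and idempotency for a factor of a product ring:

```python
# Nilpotent: the product of n strictly upper-triangular n x n matrices
# is exactly zero (I^n = 0: the attractor dies out after n steps).
# Idempotent: in the product ring R x R, the factor R x {0} satisfies
# I^2 = I (the attractor sustains itself).
import numpy as np

rng = np.random.default_rng(7)
n = 4
strict_uppers = [np.triu(rng.standard_normal((n, n)), k=1) for _ in range(n)]
prod = np.linalg.multi_dot(strict_uppers)
print("nilpotent: product of", n, "ideal elements is zero:",
      np.allclose(prod, 0))

# idempotent toy: elements of R x R as pairs; I = R x {0}
a, b = (2.0, 0.0), (-3.0, 0.0)            # a, b in I
ab = (a[0] * b[0], a[1] * b[1])           # componentwise multiplication
print("idempotent: product stays in I with nonzero first component:",
      ab[1] == 0.0 and ab[0] != 0.0)
```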
RLHF as an Ideal Generation Mechanism
Let the reward function of RLHF be $R_\phi$, and the policy optimization objective be:
$$
\max_\theta \;\mathbb{E}_{y \sim \pi_\theta}\!\big[R_\phi(y)\big] - \beta\, D_{\text{KL}}\!\big(\pi_\theta \| \pi_{\text{ref}}\big)
$$
When $R_\phi$ contains spurious correlations, the optimization process generates a non-trivial ideal $\A$ in $\W$:
$$
\A = \big\langle \Delta W \;\big|\; \big\|\nabla_W R_\phi(\text{spurious pattern})\big\| > \epsilon \big\rangle
$$
That is: all weight update directions reinforced by spurious reward signals generate an ideal.
In the GPT-5.5 goblin case, the “Nerdy” personality style assigned excessively high rewards to outputs containing fantasy creatures. The gradient update directions corresponding to these reward signals, $\Delta W_{\text{goblin}}$, became the generators of the ideal:
$$
\A_{\text{goblin}} = \big\langle \Delta W_{\text{goblin}}^{(1)}, \Delta W_{\text{goblin}}^{(2)}, \ldots, \Delta W_{\text{goblin}}^{(k)} \big\rangle
$$
During subsequent training, these generators spread to other layers and heads through matrix multiplication, causing the ideal to continuously expand—this is ideal extension, corresponding to the enlargement of the basin of attraction.
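A synthetic sketch of this generation mechanism: if spurious reward consistently fires along one feature direction, the stacked updates $\Delta W$ are dominated by a single singular direction, i.e., a few generators span the ideal. All dynamics below are assumed, not extracted from any real RLHF run:

```python
# Spurious reward adds a nearly constant rank-1 component to every update;
# ordinary task gradients appear as small isotropic noise. The singular
# values of the stacked updates reveal how few generators dominate.
import numpy as np

rng = np.random.default_rng(8)
d, steps = 64, 500
u = rng.standard_normal(d); u /= np.linalg.norm(u)   # spurious feature dir
v = rng.standard_normal(d); v /= np.linalg.norm(v)   # rewarded output dir

updates = np.zeros((steps, d * d))
for t in range(steps):
    reward = 1.0 + 0.1 * rng.standard_normal()        # spurious, ~constant
    noise = 0.02 * rng.standard_normal((d, d))        # ordinary gradients
    dW = reward * np.outer(v, u) + noise              # rank-1 + noise
    updates[t] = dW.ravel()

s = np.linalg.svd(updates, compute_uv=False)
print("top 5 singular values of stacked updates:", np.round(s[:5], 2))
print("mass share of the leading generator:", round(s[0]**2 / (s**2).sum(), 3))
```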
Why There Is No Cure: The Quotient Ring Theorem
Let $\A_{\text{ghost}}$ be a training ghost ideal, and $\A_{\text{ICL}}$ be the pattern recognition subspace upon which in-context learning (ICL) depends. If:
$$
\A_{\text{ghost}} \cap \A_{\text{ICL}} \neq \{0\}
$$
then there exists no ring homomorphism $\varphi: \W \to \W'$ such that $\varphi(\A_{\text{ghost}}) = \{0\}$ and $\varphi|_{\A_{\text{ICL}}}$ is an isomorphism.
Let $w^* \in \A_{\text{ghost}} \cap \A_{\text{ICL}}$, $w^* \neq 0$.
If $\varphi(w^*) = 0$ (eliminating the ghost), then $\varphi|_{\A_{\text{ICL}}}$ is not injective and therefore not an isomorphism.
If $\varphi(w^*) \neq 0$ (preserving ICL), then $\varphi(\A_{\text{ghost}}) \neq \{0\}$, and the ghost has not been eliminated.
Contradiction. $\blacksquare$
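The obstruction is easy to reproduce numerically: construct the two subspaces with a shared direction $w^*$, build the projection that kills the ghost span, and observe that the ICL subspace loses a dimension (the dimensions below are arbitrary illustrative choices):

```python
# Any linear map phi with the ghost subspace in its kernel must also
# kill w_star, so its restriction to the ICL subspace is not injective.
import numpy as np

rng = np.random.default_rng(9)
d = 32
w_star = rng.standard_normal(d)
ghost = np.column_stack([rng.standard_normal((d, 2)), w_star])  # contains w*
icl   = np.column_stack([rng.standard_normal((d, 3)), w_star])  # contains w*

# phi: orthogonal projection with ghost in ker(phi) -- the "patch"
Q, _ = np.linalg.qr(ghost)
phi = np.eye(d) - Q @ Q.T

assert np.allclose(phi @ ghost, 0)          # ghost eliminated
assert np.allclose(phi @ w_star, 0)         # ...but w_star dies with it
rank_full = np.linalg.matrix_rank(icl)
rank_mapped = np.linalg.matrix_rank(phi @ icl)
print(f"dim ICL = {rank_full}, dim phi(ICL) = {rank_mapped} -> not injective")
```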
The only algebraic operation that eliminates a training ghost is passing to the quotient ring:
$$
\W' = \W / \A_{\text{ghost}}
$$
But $\W'$ and $\W$ are different rings, corresponding to a new model with different capabilities. There is no method to eliminate the training ghost while keeping the model's capabilities entirely unchanged.
This is why OpenAI could only add system-prompt-level patches such as “please do not mention goblin”—because eliminating $\A_{\text{goblin}}$ at the weight level means retraining a fundamentally different model.
Formalization of the Biological Analogy
Returning to the original intuition—a mother’s child-rearing and recessive inheritance. This analogy can also be formalized:
| Ring Theory · Ideals | ⟷ | Neural Network · Weights | ⟷ | Biology · Genetics |
|---|---|---|---|---|
| Ring $R$ | ⟷ | Weight space $\W$ | ⟷ | Genomic DNA |
| Ideal $I$ | ⟷ | Training ghost $\A$ | ⟷ | Epigenetic modifications |
| Multiplicative absorption | ⟷ | Activation → lock-in | ⟷ | Environmental trigger → gene expression |
| Additive closure | ⟷ | No escape within cluster | ⟷ | Self-maintaining methylation patterns |
| Quotient ring $R/I$ | ⟷ | Retrain a new model | ⟷ | Gene editing (CRISPR) |
| Idempotent $I^2 = I$ | ⟷ | Self-sustaining attractor | ⟷ | Transgenerational trauma |
| $\A_{\text{ghost}} \cap \A_{\text{ICL}} \neq \{0\}$ | ⟷ | Capability and defect share the same circuit | ⟷ | Pleiotropic genes (one gene affects multiple traits) |
Conclusion: The Ideal Destiny of Learning Systems
We conjecture that for any parameterized model $f_\theta$ trained via gradient optimization, if there exists any statistical noise $\epsilon > 0$ in the training data or reward signal, then a non-trivial training ghost ideal $\A \neq \{0\}$ necessarily exists in the weight space $\W$:
$$
\forall\, f_\theta \text{ trained with noise level } \epsilon > 0: \quad \exists\, \A \trianglelefteq \W,\; \A \neq \{0\}
$$
That is: training ghosts are a structural inevitability of all learning systems, not an accidental bug.
If this conjecture holds, then the goblin of GPT-5.5 is not OpenAI’s mistake, but the inherent destiny of all intelligence systems based on statistical learning—whether artificial neural networks or biological ones.
Every system that learns through experience necessarily carries untriggered ideals in its weight space. They are by-products of learning, the dark side of memory, the shadows of capability.
That passage you loved is just a goblin that hasn’t become a bug yet.
Version
V1 — May 2, 2026 — Initial version
Published by
이조글로벌인공지능연구소 (LEECHO Global AI Research Lab) & Opus 4.6 (Anthropic)