Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining
Researchers describe "natural ungrokking": a small language model learns a rule (e.g., pronoun-gender resolution) and then loses it mid-pretraining, with no visible change in the loss curve. Whether a rule survives is tied to corpus statistics.
- Natural ungrokking is the mid-pretraining reversal of a learned rule in a small language model, with no trace in the loss curve.
- In the reported example, a model reached 94% accuracy on held-out pronoun-gender probes by step 925, then fell to near zero by step 3,500, even though evidence for the rule remained in the training data.
- Rule survival is predictable from corpus statistics, particularly how often the training stream shows the rule being applied.
- The phenomenon challenges the assumption that a rule, once learned, is retained for the rest of pretraining.
- The study suggests that implicit corpus biases play a critical role in determining which rules a model retains.
A new study reveals an unexpected phenomenon in language model pretraining called 'natural ungrokking,' where a model initially learns and applies rules (e.g., pronoun-gender resolution) but later discards them without any detectable change in the loss curve. For example, a model trained on a corpus with gendered pronouns initially achieves 94% accuracy on held-out probes by step 925 but scores near zero by step 3,500, despite the rule's evidence remaining in the training data. The survival of learned rules is predictable based on corpus statistics, specifically how often the training stream shows the rule's application. This challenges assumptions about how models retain knowledge during pretraining and highlights the role of implicit corpus biases in shaping model behavior.
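The pattern described above (a rule is learned, then lost, while the loss curve shows nothing) can be checked for mechanically if probe accuracy is logged alongside training loss. The following is a minimal sketch, not the study's method: the `Checkpoint` record, the `find_silent_reversal` helper, the thresholds, and every number except the step 925 / 94% and step 3,500 / near-zero figures are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Checkpoint:
    step: int
    train_loss: float      # hypothetical logged training loss
    probe_accuracy: float  # fraction of held-out probes answered correctly


def find_silent_reversal(
    checkpoints: Iterable[Checkpoint],
    learned_at: float = 0.9,       # assumed threshold for "rule learned"
    lost_below: float = 0.1,       # assumed threshold for "rule lost"
    loss_tolerance: float = 0.01,  # allowed loss increase to count as "no trace"
) -> Optional[Tuple[int, int]]:
    """Return (learned_step, lost_step) if a rule was learned and later lost
    without the training loss rising by more than loss_tolerance."""
    ordered = sorted(checkpoints, key=lambda c: c.step)
    learned = next((c for c in ordered if c.probe_accuracy >= learned_at), None)
    if learned is None:
        return None  # the rule never reached the "learned" threshold
    for c in ordered:
        if c.step > learned.step and c.probe_accuracy <= lost_below:
            # Only report the reversal if the loss gives no warning.
            if c.train_loss <= learned.train_loss + loss_tolerance:
                return (learned.step, c.step)
            return None
    return None


# Illustrative history: loss values and intermediate accuracies are made up.
history = [
    Checkpoint(step=100, train_loss=3.2, probe_accuracy=0.10),
    Checkpoint(step=925, train_loss=2.1, probe_accuracy=0.94),
    Checkpoint(step=2000, train_loss=1.9, probe_accuracy=0.40),
    Checkpoint(step=3500, train_loss=1.8, probe_accuracy=0.02),
]
reversal = find_silent_reversal(history)
```

The sketch treats the loss as "silent" when it has not risen between the two checkpoints; a real monitor would likely need a smoothed loss and a probe set per rule.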
Source: Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining. Read the full piece at the source.
Highlights the need for better monitoring of model behavior during pretraining to detect rule forgetting, even when loss curves remain stable.
Raises concerns about the reliability of language models in production, as rules learned early in pretraining may be lost later without any signal in the loss curve.
Identifies a potential risk in AI model training pipelines, which could impact investments in companies relying on pretrained models.
Provides insight into the complexities of language model training and the factors influencing model behavior beyond loss minimization.
Challenges the perception that models accumulate knowledge steadily during training, showing that learned behavior can reverse mid-pretraining.
- natural ungrokking: The mid-pretraining reversal of learned rules in language models without changes in the loss curve.
- pretraining: The initial phase of training a language model on a large corpus of text to learn general patterns and rules.
- loss curve: A plot of the model's training loss against training step, used to gauge learning progress.
- held-out probes: Test cases not seen during training, used to evaluate the model's generalization ability.
- corpus statistics: Quantitative measures of the training data, such as the frequency of specific patterns or rules.
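As a rough illustration of one such corpus statistic, the sketch below estimates how often a stream of sentences shows a rule being applied. It is an assumption-laden toy, not the study's measurement: the `GENDERED` lexicon, the `shows_pronoun_gender_rule` detector, and the `rule_frequency` helper are all hypothetical names introduced here.

```python
import re
from typing import Callable, Iterable

# Hypothetical toy lexicon mapping a gendered noun to its matching pronoun.
GENDERED = {"king": "he", "queen": "she", "father": "he", "mother": "she"}


def shows_pronoun_gender_rule(sentence: str) -> bool:
    """True if a gendered noun is followed later by its matching pronoun."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    for i, token in enumerate(tokens):
        pronoun = GENDERED.get(token)
        if pronoun is not None and pronoun in tokens[i + 1:]:
            return True
    return False


def rule_frequency(stream: Iterable[str], applies: Callable[[str], bool]) -> float:
    """Fraction of sentences in the stream where the rule is visibly applied."""
    total = 0
    hits = 0
    for sentence in stream:
        total += 1
        if applies(sentence):
            hits += 1
    return hits / total if total else 0.0


# Made-up sentences, used only to exercise the helper.
stream = [
    "The queen said she would return.",
    "The king left early.",
    "Rain fell all night.",
    "The father knew he was late.",
]
frequency = rule_frequency(stream, shows_pronoun_gender_rule)
```

Passing the detector as a function keeps the counting logic independent of any one rule, so the same helper could be reused for other rules with different detectors.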
AI bias estimate: Neutral presentation of empirical findings; no overt opinion or sensationalism detected. (Automated estimate, not a definitive judgement.)
Summary and analysis generated by AI (mistral). Always verify against the original sources.