AdamW Optimizer Under Heavy-Tailed Noise
Researchers question the effectiveness of the AdamW optimizer under heavy-tailed noise, a common scenario in large language model pretraining. A rigorous convergence theory for AdamW in this regime is still lacking.
- Announcement, Jun 22, 2026, 05:58 PM, 83%
Researchers question the effectiveness of the AdamW optimizer under heavy-tailed noise in LLM pretraining.