New work from the MIT 🧠@ZimingLiu11 team: many aspects of large model training can be understood analytically through basic thermodynamic concepts. Large neural networks are strikingly similar to thermodynamic systems: both have a very large number of degrees of freedom and evolve under stochastic dynamics. During training, the learning rate plays three main roles: it controls the temperature, the entropic force, and the time scale.

https://arxiv.org/abs/2505.10559
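
To make the "learning rate controls temperature" idea concrete, here is a minimal toy sketch. It is not taken from the paper: it assumes a one-dimensional quadratic loss `0.5 * k * x**2` with Gaussian gradient noise of scale `sigma`, and the names `stationary_variance` and `predicted_variance` are hypothetical. In this toy setting the spread of the parameter around the minimum grows roughly in proportion to the learning rate, much as thermal fluctuations grow with temperature.

```python
import random


def stationary_variance(lr, k=1.0, sigma=1.0, steps=200_000, burn_in=10_000, seed=0):
    """Run noisy gradient descent on 0.5*k*x^2 and return the empirical variance of x.

    Toy assumption: the gradient noise is additive Gaussian with scale sigma.
    """
    rng = random.Random(seed)
    x = 0.0
    total = 0.0
    total_sq = 0.0
    n = 0
    for t in range(steps):
        grad = k * x + sigma * rng.gauss(0.0, 1.0)  # true gradient plus noise
        x -= lr * grad
        if t >= burn_in:  # discard the initial transient
            total += x
            total_sq += x * x
            n += 1
    mean = total / n
    return total_sq / n - mean * mean


def predicted_variance(lr, k=1.0, sigma=1.0):
    """Exact stationary variance of the update x <- (1 - lr*k)*x - lr*sigma*xi."""
    return lr * sigma**2 / (k * (2.0 - lr * k))


if __name__ == "__main__":
    for lr in (0.05, 0.1):
        print(lr, stationary_variance(lr), predicted_variance(lr))
```

For small `lr`, `predicted_variance` reduces to about `lr * sigma**2 / (2 * k)`, so doubling the learning rate roughly doubles the fluctuations in this toy model. The same factor `lr * k` also sets how quickly `x` relaxes toward the minimum, which is one simple way to read the "time scale" role.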