Seminars & Colloquia Calendar
Stochastic learning dynamics and generalization in neural networks: Can statistical physicists help understand AI?
Yuhai Tu - IBM T. J. Watson Research Center
Location: Zoom
Date & time: Wednesday, 01 March 2023 at 10:45AM - 11:45AM
Abstract: Despite the great success of deep learning, it remains largely a black box. For example, the main search engine for training deep neural networks is the Stochastic Gradient Descent (SGD) algorithm; however, little is known about how SGD finds "good" solutions (low generalization error) in the high-dimensional weight space. In this talk, we will first give a general overview of SGD, followed by a more detailed description of our recent work [1,2] on the SGD learning dynamics, the loss function landscape, and their relationship.
More specifically, our study shows that SGD dynamics follow a low-dimensional drift-diffusion motion in the weight space, and that the loss function is flat in most directions, with large values of flatness (small curvatures). Furthermore, our study reveals a robust inverse relation between the weight variance in SGD and the landscape flatness, which is opposite to the fluctuation-response relation in equilibrium systems. We develop a statistical theory of SGD based on properties of the ensemble of minibatch loss functions and show that the noise strength in SGD depends inversely on the landscape flatness, which explains the inverse variance-flatness relation. Our study suggests that SGD serves as an "intelligent" annealing strategy in which the effective anisotropic "temperature" self-adjusts according to the loss landscape in order to find flat minima, which are found to generalize better. Finally, we discuss an application of these insights to reducing catastrophic forgetting when learning multiple tasks sequentially.
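To make the terms above a little more concrete, here is a minimal sketch of plain minibatch SGD on a hypothetical toy problem: 1-D linear regression on synthetic data. The functions `make_data`, `minibatch_grad`, `sgd`, `full_loss`, and `tail_variance`, and all parameter values, are illustrative assumptions, not the speaker's code. Each step follows the gradient of a randomly drawn minibatch loss rather than the full loss, so the weights first drift toward the minimum and then keep fluctuating around it. The spread of the late trajectory is a simple stand-in for the weight variance discussed in the talk; this toy model does not by itself reproduce the inverse variance-flatness relation.

```python
import random


def make_data(n, true_w, true_b, noise, seed=0):
    """Generate a hypothetical 1-D linear regression dataset y = w*x + b + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_w * x + true_b + rng.gauss(0.0, noise) for x in xs]
    return list(zip(xs, ys))


def minibatch_grad(w, b, batch):
    """Gradient of the minibatch loss 0.5 * mean((w*x + b - y)^2)."""
    gw = gb = 0.0
    for x, y in batch:
        err = w * x + b - y
        gw += err * x
        gb += err
    m = len(batch)
    return gw / m, gb / m


def sgd(data, lr=0.1, batch_size=8, epochs=50, seed=1):
    """Plain minibatch SGD; records (w, b) after every step."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    trajectory = []
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)  # a fresh random partition into minibatches each epoch
        for start in range(0, len(order), batch_size):
            gw, gb = minibatch_grad(w, b, order[start:start + batch_size])
            w -= lr * gw
            b -= lr * gb
            trajectory.append((w, b))
    return w, b, trajectory


def full_loss(w, b, data):
    """Loss over the whole dataset (the landscape SGD is implicitly exploring)."""
    return 0.5 * sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def tail_variance(trajectory, frac=0.5):
    """Per-parameter variance over the last `frac` of the trajectory."""
    tail = trajectory[int(len(trajectory) * (1 - frac)):]
    n = len(tail)
    means = [sum(p[i] for p in tail) / n for i in range(2)]
    return [sum((p[i] - means[i]) ** 2 for p in tail) / n for i in range(2)]


if __name__ == "__main__":
    data = make_data(256, true_w=2.0, true_b=-0.5, noise=0.1)
    w, b, traj = sgd(data)
    print(f"w={w:.3f}, b={b:.3f}, loss={full_loss(w, b, data):.4f}")
    print("late-trajectory variance (w, b):", tail_variance(traj))
```

In this sketch the fluctuations come only from the differences between minibatch losses; with full-batch gradients the late-trajectory variance would shrink toward zero.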
Time permitting, we will discuss more recent work on understanding why flat solutions generalize better and whether there are other measures of generalization, based on an exact duality relation we found between neuron activity and network weights [3].
[1] “The inverse variance-flatness relation in Stochastic-Gradient-Descent is critical for finding flat minima”, Y. Feng and Y. Tu, PNAS, 118 (9), 2021.
[2] “Phases of learning dynamics in artificial neural networks: in the absence and presence of mislabeled data”, Y. Feng and Y. Tu, Machine Learning: Science and Technology (MLST), July 19, 2021. https://iopscience.iop.org/article/10.1088/2632-2153/abf5b9/pdf
[3] “The activity-weight duality in feed forward neural networks: The geometric determinants of generalization”, Y. Feng and Y. Tu, https://arxiv.org/abs/2203.10736
Special Note to All Travelers
Directions: map and driving directions. If you need information on public transportation, you may want to check the New Jersey Transit page.
Unfortunately, cancellations do occur from time to time. Feel free to call our department at 848-445-6969 before embarking on your journey. Thank you.