A day on Statistical Physics for Machine Learning
The workshop is organized by the Rome Center on Mathematics for Modeling and Data ScienceS (RoMaDS), University of Rome Tor Vergata, and funded by the excellence program MatMod@TOV. The aim of the workshop is to bring together experts with interests at the intersection of Statistical Physics and Machine Learning.
Speakers
Andrea Agazzi (University of Pisa)
Wide neural networks for learning dynamical systems: a mean-field theory approach
In this talk, I will build on recent groundbreaking results on wide, feedforward neural networks in the supervised learning setting to discuss the performance of analogous models when learning dynamical systems. More specifically, I will discuss how, under an appropriate scaling of parameters at initialization, the training dynamics of these models converge towards a hydrodynamic, so-called “mean-field”, limit. This will be done first for feedforward neural networks in the reinforcement learning framework and then, coming back to the "original" supervised learning setting, for recurrent neural network architectures trained with gradient descent.
Elena Agliari (University of Rome La Sapienza)
The early bird generalises better
In the first part of the seminar I will introduce shallow neural networks from a statistical-mechanics perspective, focusing on simple cases and on a naive scenario where the information to be learnt is structureless. Then, inspired by biological information processing, I will enrich the framework and make the network able to handle structured datasets successfully and cheaply. In particular, I will recast the reinforcement and remotion mechanisms occurring in the mammalian brain during sleep as suitable machine-learning hyperparameters. The results presented are both analytical and numerical.
Giulio Biroli (École normale supérieure, Paris)
Generative AI and Diffusion Models: a Statistical Physics Analysis
Generative models based on diffusion have become the state of the art in the last few years, notably for image generation. After a discussion of the state of the art, I will present an analysis of generative diffusion in the high-dimensional limit, where data are formed by a very large number of variables. By using methods from statistical physics, I will show that concepts like free energy and symmetry breaking are useful to understand how generative diffusion processes work. I will also characterise the scaling laws in the number of data and in the number of dimensions needed for efficient generation.
Matthieu Wyart (École Polytechnique Fédérale de Lausanne)
What in the structure of data makes them learnable?
Deep learning algorithms have achieved remarkable successes, yet why they work is unclear. Notably, they can learn many high-dimensional tasks, a feat generically infeasible due to the so-called curse of dimensionality. What structure of data makes them learnable, and how deep neural networks exploit this structure, is a central question of the field. In the absence of an answer, relevant quantities such as the number of training data needed to learn a given task (the sample complexity) cannot be determined. I will show how deep neural networks trained with gradient descent can beat the curse of dimensionality when the task is hierarchically compositional, by building a good representation of the data that effectively lowers the dimension of the problem. This analysis also reveals how the sample complexity is affected by the hierarchical nature of the task. If time permits, I will also discuss how sample complexity is affected by the fact that the regions of the data containing information on the task can be sparse.
Riccardo Zecchina (Università Bocconi)
Exploring the Role of Liquid States in Neural Networks: From Feedforward Networks to Attractor Models
This seminar will delve into the importance of liquid states in neural network architectures, specifically focusing on feedforward and attractor networks. For feedforward networks, we show that regions with 'liquid flatness' in the loss landscape are associated with minimizers that exhibit strong generalization in overparameterized, non-convex models. In the context of asymmetric attractor random networks, the discussion will highlight the coexistence of an exponential number of liquid attractors alongside chaotic fixed points. This interplay results in the existence of an exponentially large set of internal representations endowed with error-correcting capabilities. These findings draw upon a large deviation technique and are supported by rigorous results (when possible) and numerical simulations.
| Time | Speaker / event |
|---|---|
| 09h30 - 10h30 | Riccardo Zecchina |
| 10h30 - 11h00 | Coffee break |
| 11h00 - 12h00 | Matthieu Wyart |
| 12h00 - 13h00 | Elena Agliari |
| 13h00 - 14h30 | Lunch |
| 14h30 - 15h30 | Giulio Biroli |
| 15h30 - 16h00 | Coffee break |
| 16h00 - 17h00 | Andrea Agazzi |
The conference will take place in
Aula Gismondi (aka Aula Magna) at University of Rome Tor Vergata,
Via della Ricerca Scientifica 1, 00133, Roma.
When you arrive, facing the main building, head to your left. At the left end of the building, before entering, take the stairs on your left. Walk a few meters under the 'bridges' and you are there.
Should you have further questions, please write to [last name of second organizer] [at] mat [dot] uniroma2 [dot] it .