- LCA: Loss Change Allocation for Neural Network Training
- Asymptotics of Wide Networks from Feynman Diagrams
- Neural networks and physical systems with emergent collective computational abilities
- Provable Certificates for Adversarial Examples: Fitting a Ball in the Union of Polytopes
- Adversarial Robustness Through Local Lipschitzness
- Lagrangian Neural Networks
- Deep Information Propagation
- Exponential expressivity in deep neural networks through transient chaos
- Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
- Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function
- Mean Field Residual Networks: On the Edge of Chaos
- Mean Field Theory of Activation Functions in Deep Neural Networks
- Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
- On the Impact of the Activation Function on Deep Neural Networks Training
- Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks
- Convergence guarantees for RMSProp and ADAM in non-convex optimization and an empirical comparison to Nesterov acceleration
- Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
- Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
- Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
- On the distance between two neural networks and the stability of learning
- The large learning rate phase of deep learning: the catapult mechanism
- A Fine-Grained Spectral Perspective on Neural Networks
- Regularizing activations in neural networks via distribution matching with the Wasserstein metric
- Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem
- Effect of Activation Functions on the Training of Overparametrized Neural Nets
- Implicit Neural Representations with Periodic Activation Functions
- Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem
- Making Convolutional Networks Shift-Invariant Again
- GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing
- Butterfly Transform: An Efficient FFT Based Neural Architecture Design
- ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
- Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
- Learning One Convolutional Layer with Overlapping Patches
- Batch-Shaping for Learning Conditional Channel Gated Networks
- Convolutional Networks with Adaptive Inference Graphs
- The Singular Values of Convolutional Layers