Principles of Deep Learning

This blog post seeks to challenge that notion by exploring the foundational Principles of Deep Learning. The insights presented here are heavily inspired by the comprehensive scholarly survey, "A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models," authored by Namjoon Suh and Guang Cheng and published in the Annual Review of Statistics and Its Application.

1. Introduction

“… The history of deep learning is a story of progress in three fundamental principles: the existence of a neural network that can solve the problem, the design of diverse architectures, and the training dynamics to find the solution. …”

2. The Existence of a Neural Network: Can it solve the problem?

Coming soon ...

2.1 Fully Connected Neural Networks

Coming soon ...

2.2 The Space Triplet

Coming soon ...

2.3 Fixed Approximation Error

Coming soon ...

2.4 The Universal Approximation Theorem

The Universal Approximation Theorem states that a feedforward neural network with a single hidden layer and a finite number of neurons can approximate any continuous function on a compact subset of the real numbers $\mathbb{R}^n$, given an appropriate activation function. (GeeksforGeeks)

2.5 The Advantage of Depth

Coming soon ...

2.6 The Limits of Width

Coming soon ...

2.7 Breaking the Curse of Dimensionality

Coming soon ...

3. Diverse Architectures: Beyond Simple Connections

“… Neural networks, the mathematical abstractions of our brain, lie at the core of this progression. Nevertheless, amid the ongoing renaissance of artificial intelligence (AI), neural networks have acquired an almost mythical status, spreading the misconception that they are more art than science. …”

3.1 Convolutional Neural Networks

Coming soon ...

3.2 Recurrent Neural Networks

Coming soon ...

3.3 Recurrent Convolutional Neural Networks

Coming soon ...

3.4 Graph Neural Networks

Coming soon ...

3.5 Transformer Architectures

Coming soon ...

3.6 Hybrid Transformer Architectures

Coming soon ...

3.7 Large Language Models

Coming soon ...

4. Training Dynamics: How do we find the solution?

Coming soon ...

4.1 Loss Function

Coming soon ...

4.2 Gradient Descent

Coming soon ...

4.3 Training Regimes

Coming soon ...

4.4 Half precision training

Tensor Cores

Tensor Cores enable mixed-precision computing, dynamically adapting calculations to accelerate throughput while preserving accuracy and providing enhanced security. The latest generation of Tensor Cores are faster than ever on a broad array of AI and high-performance computing (HPC) tasks. From 4X speedups in training trillion-parameter generative AI models to a 30X increase in inference performance, NVIDIA Tensor Cores accelerate all workloads for modern AI factories. (NVIDIA)

Principles of Deep Learning

1. Introduction

2. The Existence of a Neural Network: Can it solve the problem?

2.1 Fully Connected Neural Networks

2.2 The Space Triplet

2.3 Fixed Approximation Error

2.4 The Universal Approximation Theorem

2.5 The Advantage of Depth

2.6 The Limits of Width

2.7 Breaking the Curse of Dimensionality

3. Diverse Architectures: Beyond Simple Connections

3.1 Convolutional Neural Networks

3.2 Recurrent Neural Networks

3.3 Recurrent Convolutional Neural Networks

3.4 Graph Neural Networks

3.5 Transformer Architectures

3.6 Hybrid Transformer Architectures

3.7 Large Language Models

4. Training Dynamics: How do we find the solution?

4.1 Loss Function

4.2 Gradient Descent

4.3 Training Regimes

4.4 Half precision training

Tensor Cores

4.5 SOTA scaling techniques

Accumulate Gradients

Gradient Clipping

Stochastic Weight Averaging (SWA)

Batch Size Finder

Learning Rate Finder

5. Conclusion