Neural Networks and WebGPU Compute..

Learning from Data.....

 




Neural Networks



Perceptron (Building Blocks of Neural Networks)


A perceptron is a single-layer neural network; a multi-layer perceptron is called a neural network. Think of a perceptron as a brick: a wall (a collection of bricks) is a neural network.

The perceptron is a very simple component - if you understand how a perceptron works, then you should be able to master how a neural network works! Both are built on the same concept.


Single Perceptron



Connect More Perceptrons (Layers)


As you connect more and more perceptrons together in layers, the magic starts to happen. A single perceptron is limited in what it can accomplish (simple binary classification). Lots of perceptrons (a neural network) is akin to a brain - and is able to model complex relationships.

Before jumping into neural networks (and deep neural networks - networks with lots and lots of layers), let's look at a perceptron in detail.

Neural networks and perceptrons are built on the same principles (they work the same way). So, if you want to know how a neural network works, learn how a perceptron works.


A perceptron consists of 4 parts:
1. Input values
2. Weights and Bias
3. Net sum
4. Activation Function


The equation for the perceptron is given by:

\[ y = \phi(\mathbf{w} \cdot \mathbf{x} + b) \]

where:

- \(\mathbf{w} = [w_1, w_2, \ldots, w_n]\) is the weight vector.
- \(\mathbf{x} = [x_1, x_2, \ldots, x_n]\) is the input vector.
- \(b\) is the bias term.
- \(\cdot\) represents the dot product.
- \(\phi\) is the activation function (the choice depends on the type of logic you're aiming to model).

The perceptron outputs a decision based on the weighted sum of its inputs plus the bias, which is passed through an activation function.
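
To make this concrete, here is a minimal sketch of a perceptron in TypeScript (a hedged illustration - all names are made up for this example, and the AND-gate weights are just one possible choice, not from the text above):

```typescript
// Minimal perceptron sketch: y = phi(w . x + b), with a step activation.

function stepActivation(z: number): number {
  return z >= 0 ? 1 : 0;
}

function perceptron(weights: number[], input: number[], bias: number): number {
  // Weighted sum (dot product) plus bias.
  let z = bias;
  for (let i = 0; i < weights.length; i++) {
    z += weights[i] * input[i];
  }
  return stepActivation(z);
}

// Example: weights [1, 1] and bias -1.5 give a logical AND gate -
// the perceptron only fires when both inputs are 1.
console.log(perceptron([1, 1], [0, 0], -1.5)); // 0
console.log(perceptron([1, 1], [1, 0], -1.5)); // 0
console.log(perceptron([1, 1], [1, 1], -1.5)); // 1
```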

The activation function is a simple mathematical function that determines the output of a neuron given its inputs and weights.

It is crucial for introducing non-linearity into the model, enabling the network to learn complex patterns and decision boundaries.

Without the activation function, the neuron is simply a 'linear equation': it sums up the inputs, each multiplied by a weight (a scale). Which is fine, but very limited.

An activation function can be something as simple as a threshold step (if the value is greater than some value it's 1 otherwise it's 0).

There are a variety of activation functions, such as the sigmoid function, hyperbolic tangent (tanh), and rectified linear unit (ReLU).

Each type of activation function has unique properties that make it suitable for different tasks: sigmoid and tanh functions are smooth and have outputs ranging between specific bounds, making them useful for probabilistic interpretations, while ReLU introduces sparsity and efficient gradient propagation, making it popular for deep learning.

As you'll learn, the activation function you choose can significantly impact the learning capabilities and performance of your neural network.


Activation Functions


At this point it's worth providing a table so you can see the equations for the most popular activation functions.


| **Activation Function** | **Equation** | **Details** | **When to Use / Limits** |
|-------------------------|--------------|-------------|--------------------------|
| **Step Function** | \(\phi(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}\) | Simple threshold function that outputs binary values. | Use in simple binary classifiers (e.g., perceptrons). Limited in complex tasks due to lack of smooth gradients for learning. |
| **Sigmoid** | \(\sigma(z) = \frac{1}{1 + e^{-z}}\) | Smooth, differentiable function outputting values between 0 and 1. | Good for binary classification and probabilistic interpretations. Can suffer from vanishing gradients. |
| **Tanh** | \(\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\) | Smooth, differentiable function outputting values between -1 and 1. | Useful in hidden layers for zero-centered outputs. Also can suffer from vanishing gradients, though less than sigmoid. |
| **ReLU** | \(\text{ReLU}(z) = \max(0, z)\) | Outputs zero for negative inputs and linear for positive inputs, introducing sparsity and efficient computation. | Common in hidden layers of deep networks. Can suffer from the dying ReLU problem, where neurons get stuck during training. |
| **Leaky ReLU** | \(\text{Leaky ReLU}(z) = \begin{cases} z & \text{if } z \ge 0 \\ \alpha z & \text{if } z < 0 \end{cases}\) | Modified ReLU allowing small negative values for negative inputs, with \(\alpha\) being a small constant. | Mitigates the dying ReLU problem, making it suitable for deep networks. \(\alpha\) typically set to 0.01. |
| **Parametric ReLU (PReLU)** | \(\text{PReLU}(z) = \begin{cases} z & \text{if } z \ge 0 \\ \alpha z & \text{if } z < 0 \end{cases}\) | Similar to Leaky ReLU, but \(\alpha\) is learned during training. | Offers flexibility over Leaky ReLU. Can potentially adapt better during training. |
| **Softmax** | \(\text{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}\) | Converts a vector of values into a probability distribution. | Ideal for the output layer of multi-class classification problems. Ensures output probabilities sum to 1. |
| **ELU (Exponential Linear Unit)** | \(\text{ELU}(z) = \begin{cases} z & \text{if } z \ge 0 \\ \alpha (e^{z} - 1) & \text{if } z < 0 \end{cases}\) | Like ReLU but smooth for negative inputs, reducing bias shift and encouraging more effective learning. | Suitable for deep networks. Requires more computation compared to ReLU. Parameter \(\alpha\) typically set to 1. |


The table is a lot to take in at this point, but it summarizes the key aspects of popular activation functions (understanding their equations, characteristics, and practical applications).
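
As a quick reference, here is how a few of these functions might look in TypeScript (a hedged sketch using only the standard Math library - these are ordinary scalar implementations, not taken from any framework):

```typescript
// Common activation functions as plain scalar functions.

const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

const relu = (z: number): number => Math.max(0, z);

// alpha is a small constant (commonly 0.01).
const leakyRelu = (z: number, alpha = 0.01): number => (z >= 0 ? z : alpha * z);

// alpha is typically set to 1.
const elu = (z: number, alpha = 1): number => (z >= 0 ? z : alpha * (Math.exp(z) - 1));

// Softmax works on a whole vector; subtracting the max improves numerical stability.
function softmax(zs: number[]): number[] {
  const m = Math.max(...zs);
  const exps = zs.map((z) => Math.exp(z - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

console.log(sigmoid(0));         // 0.5
console.log(softmax([1, 2, 3])); // probabilities summing to 1
```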



Visualizing the shape of activation functions.



Important Points
• Weights show the strength (influence) of a particular input/node.
• A bias value allows you to shift the activation function curve up or down.
• An activation function is used to map the output into a required range, such as (0, 1) or (-1, 1).


Network configurations


Connecting perceptrons together allows us to construct neural networks. As you can imagine, there are lots of different ways of connecting them (i.e., neural network topology shapes).


Fully connected feed-forward neural network (4:4:4:2).



Feed Forward Networks


A Feed Forward Network is a type of artificial neural network where connections between the nodes do not form a cycle, and data flows strictly from input to output layers through one or more hidden layers. A Deep Feed Forward Network extends this concept by having multiple hidden layers, allowing it to model more complex functions and patterns in the data.


Visual illustration of some neural network shapes (topologies).


Common feedforward topology shapes include:

1. Fully Connected (Dense) Networks:
- Each neuron in one layer is connected to every neuron in the subsequent layer. This is the standard topology for most neural networks.

2. Convolutional Neural Networks (CNNs):
- Designed for processing structured grid data like images, CNNs use convolutional layers, pooling layers, and fully connected layers. They are typically arranged in a hierarchical structure with convolutional and pooling layers followed by one or more dense layers.

3. Recurrent Neural Networks (RNNs):
- Though primarily used for sequential data, they can be part of a feedforward network when unrolled. Each time step can be viewed as a layer in a deep feedforward network.

4. Residual Networks (ResNets):
- Use skip connections (or shortcuts) to jump over some layers. This helps in mitigating the vanishing gradient problem and allows the training of very deep networks.

5. Inception Networks:
- Utilize parallel convolutional layers with different kernel sizes at each layer level. The outputs of these convolutions are concatenated and passed to the next layer.

6. U-Net:
- Typically used for image segmentation, it has a symmetric architecture with an encoder (contracting path) and a decoder (expanding path), with skip connections between layers of the same level in the encoder and decoder.

7. Autoencoders:
- Consist of an encoder and a decoder. The encoder compresses the input into a latent space representation, and the decoder reconstructs the input from this representation.

8. Multilayer Perceptron (MLP):
- The simplest form of feedforward neural network, where each layer is fully connected, and all nodes in one layer connect to all nodes in the next layer.

9. Sparse Networks:
- Similar to fully connected networks but with some connections removed according to a predefined sparsity pattern, which can reduce the number of parameters and computation.

10. Capsule Networks:
- Use groups of neurons (capsules) to represent different properties of an object part and their spatial relationships. Capsule layers can be seen as more complex feedforward layers.

Remember, you're not limited to these topologies. Once you understand how to implement a neural network, you can connect perceptrons together however you want to suit specific applications and to improve performance, efficiency, and accuracy.




Be careful with non-conventional (bespoke) neural network configurations/shapes/topologies - it can turn into a mess to manage and train.



Network Shape


The shape and structure of a neural network can impact its suitability for massively parallel architectures (like the GPU). As you can see, the perceptrons are connected together by links - which introduces 'data dependencies'. Ideally, for the GPU to be as efficient as possible (so we can get the massive speedup we are looking for), we want to make the neural network as 'parallel' as possible. That is, we want to be able to perform as many computations as possible at the same time.

For example, feed-forward networks can be represented as layers and matrix blocks that store the weight and bias coefficients - and the GPU likes matrices.
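
Since a fully connected layer is essentially a matrix-vector product, each output neuron can be computed by its own GPU invocation. Below is a hedged sketch of this idea as a WGSL compute shader embedded in a TypeScript string - the buffer names, bindings, and memory layout are assumptions for illustration, and the WebGPU pipeline/bind-group setup is omitted:

```typescript
// Hedged sketch: one fully connected layer as a WGSL compute shader.
// Each invocation computes a single output neuron (dot product + bias +
// sigmoid), so every neuron in the layer is evaluated in parallel.
const layerForwardWGSL = /* wgsl */ `
struct Dims {
  inputs  : u32,
  outputs : u32,
}

@group(0) @binding(0) var<uniform>             dims    : Dims;
@group(0) @binding(1) var<storage, read>       weights : array<f32>; // outputs x inputs, row-major
@group(0) @binding(2) var<storage, read>       biases  : array<f32>;
@group(0) @binding(3) var<storage, read>       inVals  : array<f32>;
@group(0) @binding(4) var<storage, read_write> outVals : array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
  let row = gid.x;
  if (row >= dims.outputs) {
    return;
  }
  var z = biases[row];
  for (var i = 0u; i < dims.inputs; i = i + 1u) {
    z = z + weights[row * dims.inputs + i] * inVals[i];
  }
  outVals[row] = 1.0 / (1.0 + exp(-z)); // sigmoid activation
}
`;
```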


Training & Data


Neural networks use supervised learning to configure the coefficients (weights and biases) to perform a specific task. This means they need data! Data to train the neural network.

This data usually comes in the form of example input and output values - which the neural network tries to connect (modelling the relationship), so that for any input it is able to predict the output (even for inputs it has never seen before).

Train the neural network to learn the patterns of the data (it reproduces the correct result for the training data, but is also able to generate results that correlate with the dataset's characteristics when the input is not one of the inputs from the dataset). A simple example of this would be a sine wave - if you give the network enough sample points, it should learn that the output is a sine wave, so for any input the output will lie on the sine curve.

Order of the data matters - make it random. If it's sequential and always in the same order, the neural network will expect the real-world data to always be in the same order! It expects and learns the patterns you give it (including the order of the data).
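
For instance, here is a small sketch of how you might generate sine-wave training pairs and shuffle their order (illustrative code, not tied to any framework):

```typescript
// Generate (x, sin x) training pairs over one period, then shuffle them
// so the network never sees the samples in a fixed sequential order.
function makeSineDataset(count: number): { input: number; target: number }[] {
  const samples: { input: number; target: number }[] = [];
  for (let i = 0; i < count; i++) {
    const x = (i / count) * 2 * Math.PI;
    samples.push({ input: x, target: Math.sin(x) });
  }
  // Fisher-Yates shuffle: randomise the presentation order.
  for (let i = samples.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [samples[i], samples[j]] = [samples[j], samples[i]];
  }
  return samples;
}

const trainingSet = makeSineDataset(256);
```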

Training (Backpropagation)


There are lots of different ways of training neural networks. However, one of the most popular is backpropagation, which uses error gradients to correct the weights and biases. Backpropagation has the advantage of being efficient and fast, and it can be ported to parallel architectures (like the GPU).


Backpropagation training is an iterative process: it performs a forward pass to get the output and compares this with the ideal value. From this it calculates the gradients for each layer - making small corrections to the weights and biases.

Remember, for the feed-forward pass we calculate the activations of each layer sequentially, starting from the input layer. For an input vector \(X\), the activation \(Z_1\) of the first hidden layer is calculated as \(Z_1 = W_1 \cdot X + b_1\), where \(W_1\) and \(b_1\) are the weights and biases for the hidden layer, respectively. These activations are then passed through a sigmoid activation function to produce the output of the hidden layer: \(A_1 = \sigma(Z_1)\), where \(\sigma(x) = \frac{1}{1 + e^{-x}}\). This process is repeated for the output layer, where the activations \(Z_2\) are calculated as \(Z_2 = W_2 \cdot A_1 + b_2\), and the final output is obtained as \(A_2 = \sigma(Z_2)\).


Backpropagation then involves computing the gradients of the loss function with respect to the network's weights and biases, starting from the output layer and moving backward. For the output layer, the error \(\delta_2\) is calculated as \(\delta_2 = (A_2 - Y) \cdot \sigma'(Z_2)\), where \(Y\) is the target output, and \(\sigma'(Z_2)\) is the derivative of the sigmoid function. For the hidden layer, the error \(\delta_1\) is computed as \(\delta_1 = (W_2^T \cdot \delta_2) \cdot \sigma'(Z_1)\). The weights and biases are then updated using the calculated gradients and the learning rate \(\eta\): \(W_2 = W_2 - \eta \cdot \delta_2 \cdot A_1^T\) and \(b_2 = b_2 - \eta \cdot \delta_2\) for the output layer, and similarly \(W_1 = W_1 - \eta \cdot \delta_1 \cdot X^T\) and \(b_1 = b_1 - \eta \cdot \delta_1\) for the hidden layer.

Backpropagation Steps


1. Initialize weights (\(W_1, W_2\)) and biases (\(b_1, b_2\)).
2. Perform forward pass to calculate activations:
- Compute \(Z_1 = W_1 \cdot X + b_1\).
- Apply activation function \(A_1 = \sigma(Z_1)\).
- Compute \(Z_2 = W_2 \cdot A_1 + b_2\).
- Apply activation function \(A_2 = \sigma(Z_2)\).

3. Compute the loss using the predicted output \(A_2\) and actual target \(Y\).

4. Compute gradients for the output layer:
- Calculate \(\delta_2 = (A_2 - Y) \cdot \sigma'(Z_2)\).
- \(\sigma'(Z_2) = \sigma(Z_2) \cdot (1 - \sigma(Z_2))\).

5. Compute gradients for the hidden layer:
- Calculate \(\delta_1 = (W_2^T \cdot \delta_2) \cdot \sigma'(Z_1)\).
- \(\sigma'(Z_1) = \sigma(Z_1) \cdot (1 - \sigma(Z_1))\).

6. Update the weights and biases:
- Update \(W_2 = W_2 - \eta \cdot \delta_2 \cdot A_1^T\).
- Update \(b_2 = b_2 - \eta \cdot \delta_2\).
- Update \(W_1 = W_1 - \eta \cdot \delta_1 \cdot X^T\).
- Update \(b_1 = b_1 - \eta \cdot \delta_1\).

7. Repeat steps 2-6 for a number of epochs or until convergence.

These pseudo steps outline the backpropagation process, detailing the calculation of gradients and the subsequent update of weights and biases. The process iteratively adjusts the model parameters to minimize the loss, thereby training the neural network.
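
Putting these steps together, here is a hedged sketch of the training loop in TypeScript for a tiny 2-2-1 network (plain arrays, no libraries; all function and variable names are illustrative):

```typescript
// Tiny 2-2-1 network trained with the backpropagation steps above.

const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));
const dSigmoid = (z: number): number => sigmoid(z) * (1 - sigmoid(z)); // sigma'(z)

// Step 1: initialise weights and biases with small random values.
const rand = (): number => Math.random() * 2 - 1;
const W1 = [[rand(), rand()], [rand(), rand()]]; // hidden layer (2 neurons x 2 inputs)
const b1 = [rand(), rand()];
const W2 = [[rand(), rand()]];                   // output layer (1 neuron x 2 hidden)
const b2 = [rand()];
const eta = 0.5;                                 // learning rate

function trainSample(X: number[], Y: number[]): void {
  // Step 2: forward pass.
  const Z1 = W1.map((row, i) => row[0] * X[0] + row[1] * X[1] + b1[i]);
  const A1 = Z1.map(sigmoid);
  const Z2 = W2.map((row, k) => row[0] * A1[0] + row[1] * A1[1] + b2[k]);
  const A2 = Z2.map(sigmoid);

  // Steps 3-4: the (A2 - Y) term is the gradient of a squared-error loss;
  // delta2 = (A2 - Y) * sigma'(Z2).
  const d2 = A2.map((a, k) => (a - Y[k]) * dSigmoid(Z2[k]));

  // Step 5: hidden-layer error, delta1 = (W2^T . delta2) * sigma'(Z1).
  const d1 = Z1.map((z, i) => {
    let sum = 0;
    for (let k = 0; k < d2.length; k++) sum += W2[k][i] * d2[k];
    return sum * dSigmoid(z);
  });

  // Step 6: update weights and biases (gradient descent).
  for (let k = 0; k < W2.length; k++) {
    for (let i = 0; i < A1.length; i++) W2[k][i] -= eta * d2[k] * A1[i];
    b2[k] -= eta * d2[k];
  }
  for (let i = 0; i < W1.length; i++) {
    for (let j = 0; j < X.length; j++) W1[i][j] -= eta * d1[i] * X[j];
    b1[i] -= eta * d1[i];
  }
}
```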


XOR Test Case


Instead of just picking random values for the input and output (which you can), a popular test case for neural networks is the XOR logic gate (a classic benchmark in the field of neural networks).

The XOR problem, despite its simplicity, encapsulates the essence of what makes neural networks powerful: their ability to model complex, non-linear relationships. Its role in the history of neural network research underscores the importance of multi-layer architectures and continues to be a fundamental teaching and testing tool in the field.

Why an XOR?
• Simple binary classification problem
• Easy to understand (mathematics are easy) - binary inputs (0 or 1) and produces a binary output (0 or 1)
• Not linearly separable (i.e., cannot draw a single straight line to separate the classes in the input space)

In fact, early neural networks like the single-layer perceptron failed to solve the XOR problem - highlighting the limitations of simple linear models (and why we need hidden layers and non-linear activation functions).

The XOR problem needs one or more hidden layers, which lets you test the model's non-linear decision boundaries.

XOR is one of the 'hello world' test cases - its historical significance has made it one of the first problems to try (especially in educational settings). It serves as a minimal example that requires a multi-layer network to be solved, thereby helping students and newcomers understand the necessity and functionality of deeper networks.

Just in case you've never stumbled across XOR logic, the XOR truth table for 2 inputs and 1 output is:

| Input A | Input B | Output (A XOR B) |
|---------|---------|------------------|
|    0    |    0    |         0        |
|    0    |    1    |         1        |
|    1    |    0    |         1        |
|    1    |    1    |         0        |


This table shows that the XOR function outputs true (1) when the inputs are different and false (0) when they are the same.
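
Tying this back to the backpropagation sketch above, the truth table is the entire training set (this reuses the illustrative `trainSample` helper defined earlier):

```typescript
// XOR training data taken straight from the truth table.
const xorData = [
  { X: [0, 0], Y: [0] },
  { X: [0, 1], Y: [1] },
  { X: [1, 0], Y: [1] },
  { X: [1, 1], Y: [0] },
];

// Step 7: repeat the forward/backward passes for many epochs.
for (let epoch = 0; epoch < 10000; epoch++) {
  for (const { X, Y } of xorData) trainSample(X, Y);
}
```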


Better Backpropagation


Improving backpropagation to ensure faster and more reliable convergence involves various techniques that can be applied to the neural network training process.

As the WebGPU model evolves you can expand the backpropagation approach by combining techniques to get better convergence during neural network training.

These engineering fixes are not one-size-fits-all - the right combination and tuning depend on the particular problem and dataset you are working with.

• 1. Learning Rate Adjustment
- Adaptive Learning Rates: Use algorithms like AdaGrad, RMSprop, or Adam, which adjust the learning rate during training.
- Learning Rate Scheduling: Reduce the learning rate as training progresses, which helps to fine-tune the weights as the algorithm approaches the minimum.

• 2. Weight Initialization
- Xavier Initialization: Suitable for sigmoid and tanh activation functions.
- He Initialization: Suitable for ReLU and its variants.
Proper weight initialization can prevent gradients from vanishing or exploding, facilitating smoother training.

• 3. Gradient Clipping
- Clip Gradients: This technique involves limiting the gradients to a maximum value to prevent exploding gradients, especially in recurrent neural networks.

• 4. Regularization Techniques
- L2 Regularization: Also known as weight decay, it helps in reducing overfitting by penalizing large weights.
- Dropout: Randomly turning off a subset of neurons during training to prevent overfitting.
- Batch Normalization: Normalizes the inputs of each layer, which helps in stabilizing and accelerating the training process.

• 5. Optimizers
- Momentum: Helps accelerate gradient vectors in the right direction, leading to faster convergence (see the sketch after this list).
- Nesterov Accelerated Gradient (NAG): A variant of momentum that looks ahead to improve the update.

• 6. Mini-Batch Gradient Descent
Using mini-batches rather than the entire dataset or a single example helps to balance the convergence speed and stability.

• 7. Activation Functions
- ReLU and its Variants: Use activation functions like Leaky ReLU, Parametric ReLU, or Exponential Linear Units (ELU) that mitigate the vanishing gradient problem.
- Swish or Mish: More recent activation functions that sometimes outperform traditional functions like ReLU.

• 8. Loss Function Engineering
Choosing an appropriate loss function or engineering it to suit the problem can also impact the convergence. For example, using cross-entropy loss for classification problems rather than mean squared error.

• 9. Data Preprocessing
- Normalization: Normalize the input data to have zero mean and unit variance.
- Data Augmentation: Use techniques like rotation, flipping, scaling, etc., to artificially increase the size and variability of the training set.

• 10. Early Stopping
Monitor the model's performance on a validation set and stop training when the performance stops improving to avoid overfitting.

• 11. Learning Rate Warm-up
Start with a lower learning rate at the beginning of the training and gradually increase it to the desired value.

• 12. Cyclical Learning Rates
Alternate between higher and lower learning rates in a cyclical fashion, which can help escape local minima and saddle points.
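
As a small taste of these ideas, here is a hedged sketch of a weight update combining momentum with a simple learning-rate decay schedule (illustrative only - real optimizers such as Adam combine several of these tricks internally):

```typescript
// Gradient-descent update with momentum and learning-rate decay
// (illustrative sketch of ideas 1 and 5 from the list above).

const weights = [0.5, -0.3, 0.8];      // parameters being trained
const velocity = weights.map(() => 0); // one momentum term per weight
const beta = 0.9;                      // momentum coefficient
const initialLr = 0.1;
const decay = 0.001;                   // learning-rate decay factor

function applyUpdate(gradients: number[], step: number): void {
  // Learning-rate scheduling: shrink the step size as training progresses.
  const lr = initialLr / (1 + decay * step);
  for (let i = 0; i < weights.length; i++) {
    // Momentum: blend the previous update direction with the new gradient.
    velocity[i] = beta * velocity[i] - lr * gradients[i];
    weights[i] += velocity[i];
  }
}

// Example usage with made-up gradients for one training step:
applyUpdate([0.02, -0.01, 0.05], 0);
```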

