Neural Networks and WebGPU Compute..

Learning from Data.....

 



Numerical Examples (Step-by-Step)


Instead of jumping straight into code, we work through examples of the forward and backpropagation algorithms. These step-by-step examples build on the theoretical equations by plugging in numerical values for a range of network configurations.

If you're not sure how to go from the theory to an implementation, this provides an additional springboard.

Furthermore, when you implement your own version, you can compare it against the calculated results here to confirm that the results match and to track down any bugs.


Example 1 (2-2-2 Network)


Let's go through a step-by-step example of a simple neural network with 2 inputs, 2 hidden neurons, and 2 outputs using the sigmoid activation function.

We'll include biases in our calculations.

1. Define the Network Structure and Initialize Parameters


Network Architecture:

- Input layer: 2 neurons
- Hidden layer: 2 neurons
- Output layer: 2 neurons

Sigmoid Activation Function:

\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]
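As a quick aside, the sigmoid and its derivative are easy to sanity-check in code; a minimal TypeScript sketch (the function names here are just illustrative, not from any particular library) is:

// Sigmoid activation: squashes any real input into the range (0, 1).
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// Derivative of the sigmoid written in terms of its output a = sigmoid(x):
// sigma'(x) = a * (1 - a). This is the form used in the backward passes below.
const sigmoidDeriv = (a: number): number => a * (1 - a);

console.log(sigmoid(0.3775)); // ~0.5933 (this value appears in the forward pass below)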

Example Initialization:

Let's initialize the weights and biases with some example values.

- Input to hidden weights (\( W_1 \)):
\[ W_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \end{bmatrix} \]
- Hidden to output weights (\( W_2 \)):
\[ W_2 = \begin{bmatrix} 0.40 & 0.45 \\ 0.50 & 0.55 \end{bmatrix} \]
- Biases for hidden layer (\( b_1 \)):
\[ b_1 = \begin{bmatrix} 0.35 \\ 0.35 \end{bmatrix} \]
- Biases for output layer (\( b_2 \)):
\[ b_2 = \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} \]
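If you want to mirror this setup in code before working through the numbers, one possible layout is plain nested arrays (a sketch only; the variable names simply match the symbols above):

// Row-major storage: W1[i][j] is the weight from input j to hidden neuron i.
const W1: number[][] = [[0.15, 0.20], [0.25, 0.30]];
const b1: number[] = [0.35, 0.35];

// W2[i][j] is the weight from hidden neuron j to output neuron i.
const W2: number[][] = [[0.40, 0.45], [0.50, 0.55]];
const b2: number[] = [0.60, 0.60];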

2. Forward Pass


Inputs:

\[ X = \begin{bmatrix} 0.05 \\ 0.10 \end{bmatrix} \]

Hidden Layer Calculations:

\[ Z_1 = W_1 \cdot X + b_1 \]
\[ Z_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \end{bmatrix} \cdot \begin{bmatrix} 0.05 \\ 0.10 \end{bmatrix} + \begin{bmatrix} 0.35 \\ 0.35 \end{bmatrix} \]
\[ Z_1 = \begin{bmatrix} (0.15 \cdot 0.05 + 0.20 \cdot 0.10) + 0.35 \\ (0.25 \cdot 0.05 + 0.30 \cdot 0.10) + 0.35 \end{bmatrix} \]
\[ Z_1 = \begin{bmatrix} 0.3775 \\ 0.3925 \end{bmatrix} \]

Applying the sigmoid function:
\[ A_1 = \sigma(Z_1) = \begin{bmatrix} \sigma(0.3775) \\ \sigma(0.3925) \end{bmatrix} \]
\[ A_1 = \begin{bmatrix} 0.5933 \\ 0.5968 \end{bmatrix} \]

Output Layer Calculations:

\[ Z_2 = W_2 \cdot A_1 + b_2 \]
\[ Z_2 = \begin{bmatrix} 0.40 & 0.45 \\ 0.50 & 0.55 \end{bmatrix} \cdot \begin{bmatrix} 0.5933 \\ 0.5968 \end{bmatrix} + \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} \]
\[ Z_2 = \begin{bmatrix} (0.40 \cdot 0.5933 + 0.45 \cdot 0.5968) + 0.60 \\ (0.50 \cdot 0.5933 + 0.55 \cdot 0.5968) + 0.60 \end{bmatrix} \]
\[ Z_2 = \begin{bmatrix} 1.2291 \\ 1.3613 \end{bmatrix} \]

Applying the sigmoid function:
\[ A_2 = \sigma(Z_2) = \begin{bmatrix} \sigma(1.2291) \\ \sigma(1.3613) \end{bmatrix} \]
\[ A_2 = \begin{bmatrix} 0.7737 \\ 0.7960 \end{bmatrix} \]
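A small sketch of this forward pass in TypeScript (one possible way to structure it; the layer routine and variable names are assumptions, not a fixed API) might look like:

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// One dense layer: z = W * x + b, a = sigmoid(z). W is row-major (one row per neuron).
function forwardLayer(W: number[][], b: number[], x: number[]): { z: number[]; a: number[] } {
  const z = W.map((row, i) => row.reduce((sum, w, j) => sum + w * x[j], b[i]));
  return { z, a: z.map(sigmoid) };
}

const X = [0.05, 0.10];
const hidden = forwardLayer([[0.15, 0.20], [0.25, 0.30]], [0.35, 0.35], X);
const output = forwardLayer([[0.40, 0.45], [0.50, 0.55]], [0.60, 0.60], hidden.a);

console.log(hidden.z, hidden.a); // z ~[0.3775, 0.3925], a ~[0.5933, 0.5969]
console.log(output.z, output.a); // z ~[1.1059, 1.2249], a ~[0.7514, 0.7729]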

3. Compute the Loss


For simplicity, we use Mean Squared Error (MSE):
\[ L = \frac{1}{2} \sum_{i} (y_i - \hat{y}_i)^2 \]

Let's assume the target output \( Y \) is:
\[ Y = \begin{bmatrix} 0.01 \\ 0.99 \end{bmatrix} \]

\[ L = \frac{1}{2} ((0.01 - 0.7514)^2 + (0.99 - 0.7729)^2) \]
\[ L = \frac{1}{2} ((-0.7414)^2 + (0.2171)^2) \]
\[ L = \frac{1}{2} (0.5497 + 0.0471) \]
\[ L = \frac{1}{2} (0.5968) \]
\[ L = 0.2984 \]
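The same loss can be checked with a couple of lines (a sketch; mse is just a local helper name):

// Half mean-squared-error over the output vector, matching the formula above.
const mse = (y: number[], yHat: number[]): number =>
  0.5 * y.reduce((sum, yi, i) => sum + (yi - yHat[i]) ** 2, 0);

console.log(mse([0.01, 0.99], [0.7514, 0.7729])); // ~0.2984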

4. Backward Pass


Calculate Output Layer Gradients

\[ \delta_2 = (A_2 - Y) \cdot \sigma'(Z_2) \]
(the product with \( \sigma'(Z_2) \) is element-wise)
\[ \sigma'(Z_2) = \sigma(Z_2) \cdot (1 - \sigma(Z_2)) \]
\[ \sigma'(Z_2) = \begin{bmatrix} 0.7514 \cdot (1 - 0.7514) \\ 0.7729 \cdot (1 - 0.7729) \end{bmatrix} \]
\[ \sigma'(Z_2) = \begin{bmatrix} 0.1868 \\ 0.1755 \end{bmatrix} \]
\[ \delta_2 = \begin{bmatrix} (0.7514 - 0.01) \cdot 0.1868 \\ (0.7729 - 0.99) \cdot 0.1755 \end{bmatrix} \]
\[ \delta_2 = \begin{bmatrix} 0.1385 \\ -0.0381 \end{bmatrix} \]
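In code, the output-layer delta is a simple element-wise expression over the output activations; a minimal sketch (names assumed, not from a library) is:

// delta2[i] = (a2[i] - y[i]) * a2[i] * (1 - a2[i])
function outputDelta(a2: number[], y: number[]): number[] {
  return a2.map((a, i) => (a - y[i]) * a * (1 - a));
}

console.log(outputDelta([0.7514, 0.7729], [0.01, 0.99])); // ~[0.1385, -0.0381]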

Calculate Hidden Layer Gradients

\[ \delta_1 = (W_2^T \cdot \delta_2) \cdot \sigma'(Z_1) \]
\[ W_2^T = \begin{bmatrix} 0.40 & 0.50 \\ 0.45 & 0.55 \end{bmatrix} \]
\[ W_2^T \cdot \delta_2 = \begin{bmatrix} 0.40 & 0.50 \\ 0.45 & 0.55 \end{bmatrix} \cdot \begin{bmatrix} 0.1385 \\ -0.0381 \end{bmatrix} = \begin{bmatrix} 0.40 \cdot 0.1385 + 0.50 \cdot (-0.0381) \\ 0.45 \cdot 0.1385 + 0.55 \cdot (-0.0381) \end{bmatrix} = \begin{bmatrix} 0.0364 \\ 0.0414 \end{bmatrix} \]
\[ \delta_1 = \begin{bmatrix} 0.0364 \\ 0.0414 \end{bmatrix} \cdot \sigma'(Z_1) = \begin{bmatrix} 0.0364 \\ 0.0414 \end{bmatrix} \cdot \begin{bmatrix} 0.2413 \\ 0.2406 \end{bmatrix} \]
\[ \delta_1 = \begin{bmatrix} 0.0088 \\ 0.0100 \end{bmatrix} \]
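The hidden-layer delta can be computed without explicitly forming \( W_2^T \) by summing down the columns of \( W_2 \); a sketch under the same assumptions as the earlier snippets:

// delta1[j] = (sum_i W2[i][j] * delta2[i]) * a1[j] * (1 - a1[j])
function hiddenDelta(W2: number[][], delta2: number[], a1: number[]): number[] {
  return a1.map((a, j) => {
    const back = delta2.reduce((sum, d, i) => sum + W2[i][j] * d, 0); // column j of W2
    return back * a * (1 - a);
  });
}

console.log(hiddenDelta([[0.40, 0.45], [0.50, 0.55]], [0.1385, -0.0381], [0.5933, 0.5968]));
// ~[0.0088, 0.0100]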

5. Update Weights and Biases


Using a learning rate \( \eta \) of 0.5 for this example:

Update Hidden to Output Weights (\( W_2 \)) and Biases (\( b_2 \)):

\[ W_2 = W_2 - \eta \cdot \delta_2 \cdot A_1^T \]
\[ W_2 = \begin{bmatrix} 0.40 & 0.45 \\ 0.50 & 0.55 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.1385 \\ -0.0381 \end{bmatrix} \cdot \begin{bmatrix} 0.5933 & 0.5968 \end{bmatrix} \]
\[ W_2 = \begin{bmatrix} 0.40 & 0.45 \\ 0.50 & 0.55 \end{bmatrix} - \begin{bmatrix} 0.0411 & 0.0413 \\ -0.0113 & -0.0114 \end{bmatrix} \]
\[ W_2 = \begin{bmatrix} 0.3589 & 0.4087 \\ 0.5113 & 0.5614 \end{bmatrix} \]


Update Biases for the Output Layer (\( b_2 \)):


\[ b_2 = b_2 - \eta \cdot \delta_2 \]
\[ b_2 = \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.1385 \\ -0.0381 \end{bmatrix} \]
\[ b_2 = \begin{bmatrix} 0.60 - 0.06925 \\ 0.60 + 0.01905 \end{bmatrix} \]
\[ b_2 = \begin{bmatrix} 0.53075 \\ 0.61905 \end{bmatrix} \]

Update Weights and Biases for the Hidden Layer (\( W_1 \) and \( b_1 \)):


Update Input to Hidden Weights (\( W_1 \)):

\[ W_1 = W_1 - \eta \cdot \delta_1 \cdot X^T \]
\[ W_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0088 \\ 0.0100 \end{bmatrix} \cdot \begin{bmatrix} 0.05 & 0.10 \end{bmatrix} \]

Calculating the gradients:
\[ \delta_1 \cdot X^T = \begin{bmatrix} 0.0088 \\ 0.0100 \end{bmatrix} \cdot \begin{bmatrix} 0.05 & 0.10 \end{bmatrix} = \begin{bmatrix} 0.0088 \cdot 0.05 & 0.0088 \cdot 0.10 \\ 0.0100 \cdot 0.05 & 0.0100 \cdot 0.10 \end{bmatrix} \]
\[ \delta_1 \cdot X^T = \begin{bmatrix} 0.00044 & 0.00088 \\ 0.00050 & 0.00100 \end{bmatrix} \]

Multiplying by the learning rate:
\[ 0.5 \cdot \delta_1 \cdot X^T = \begin{bmatrix} 0.00022 & 0.00044 \\ 0.00025 & 0.00050 \end{bmatrix} \]

Updating the weights:
\[ W_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \end{bmatrix} - \begin{bmatrix} 0.00022 & 0.00044 \\ 0.00025 & 0.00050 \end{bmatrix} \]
\[ W_1 = \begin{bmatrix} 0.14978 & 0.19956 \\ 0.24975 & 0.29950 \end{bmatrix} \]

Update Biases for the Hidden Layer (\( b_1 \)):

\[ b_1 = b_1 - \eta \cdot \delta_1 \]
\[ b_1 = \begin{bmatrix} 0.35 \\ 0.35 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0088 \\ 0.0100 \end{bmatrix} \]
\[ b_1 = \begin{bmatrix} 0.35 - 0.0044 \\ 0.35 - 0.0050 \end{bmatrix} \]
\[ b_1 = \begin{bmatrix} 0.3456 \\ 0.3450 \end{bmatrix} \]

Summary of Updated Parameters:


- Updated weights and biases for the output layer:
\[ W_2 = \begin{bmatrix} 0.3589 & 0.4087 \\ 0.5113 & 0.5614 \end{bmatrix} \]
\[ b_2 = \begin{bmatrix} 0.53075 \\ 0.61905 \end{bmatrix} \]

- Updated weights and biases for the hidden layer:
\[ W_1 = \begin{bmatrix} 0.14978 & 0.19956 \\ 0.24975 & 0.29950 \end{bmatrix} \]
\[ b_1 = \begin{bmatrix} 0.3456 \\ 0.3450 \end{bmatrix} \]

This completes the backpropagation process for the given neural network. The weights and biases have been updated based on the error gradients calculated during the backward pass.
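To cross-check all of the numbers in this example in one go, the pieces above can be combined into a single self-contained training step. This is only a reference sketch in plain TypeScript (no WebGPU yet); the helper names are illustrative:

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

function forwardLayer(W: number[][], b: number[], x: number[]) {
  const z = W.map((row, i) => row.reduce((s, w, j) => s + w * x[j], b[i]));
  return { z, a: z.map(sigmoid) };
}

// SGD update for one layer: W -= eta * delta * input^T, b -= eta * delta (in place).
function updateLayer(W: number[][], b: number[], delta: number[], input: number[], eta: number) {
  delta.forEach((d, i) => {
    W[i] = W[i].map((w, j) => w - eta * d * input[j]);
    b[i] -= eta * d;
  });
}

const X = [0.05, 0.10], Y = [0.01, 0.99], eta = 0.5;
const W1 = [[0.15, 0.20], [0.25, 0.30]], b1 = [0.35, 0.35];
const W2 = [[0.40, 0.45], [0.50, 0.55]], b2 = [0.60, 0.60];

const hidden = forwardLayer(W1, b1, X);
const output = forwardLayer(W2, b2, hidden.a);

// Deltas are computed before any weights are modified.
const delta2 = output.a.map((a, i) => (a - Y[i]) * a * (1 - a));
const delta1 = hidden.a.map((a, j) =>
  delta2.reduce((s, d, i) => s + W2[i][j] * d, 0) * a * (1 - a));

updateLayer(W2, b2, delta2, hidden.a, eta);
updateLayer(W1, b1, delta1, X, eta);

console.log(W2, b2); // ~[[0.3589, 0.4087], [0.5113, 0.5614]], ~[0.5308, 0.6190]
console.log(W1, b1); // ~[[0.1498, 0.1996], [0.2498, 0.2995]], ~[0.3456, 0.3450]

Small differences in the last decimal place compared with the hand-rounded values above are expected.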



Example 2 (2-3-2-2 Network)


1. Define the Network Structure and Initialize Parameters


We will use the sigmoid activation function and include biases in our calculations.

Network Architecture:

- Input layer: 2 neurons
- First hidden layer: 3 neurons
- Second hidden layer: 2 neurons
- Output layer: 2 neurons

Sigmoid Activation Function:

\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]

Example Initialization:

Let's initialize the weights and biases with some example values.

- Input to first hidden layer weights (\( W_1 \)):
\[ W_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \\ 0.35 & 0.40 \end{bmatrix} \]
- First hidden layer to second hidden layer weights (\( W_2 \)):
\[ W_2 = \begin{bmatrix} 0.45 & 0.50 & 0.55 \\ 0.60 & 0.65 & 0.70 \end{bmatrix} \]
- Second hidden layer to output layer weights (\( W_3 \)):
\[ W_3 = \begin{bmatrix} 0.75 & 0.80 \\ 0.85 & 0.90 \end{bmatrix} \]
- Biases for first hidden layer (\( b_1 \)):
\[ b_1 = \begin{bmatrix} 0.35 \\ 0.35 \\ 0.35 \end{bmatrix} \]
- Biases for second hidden layer (\( b_2 \)):
\[ b_2 = \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} \]
- Biases for output layer (\( b_3 \)):
\[ b_3 = \begin{bmatrix} 0.75 \\ 0.75 \end{bmatrix} \]

2. Forward Pass


Inputs:

\[ X = \begin{bmatrix} 0.05 \\ 0.10 \end{bmatrix} \]

First Hidden Layer Calculations:

\[ Z_1 = W_1 \cdot X + b_1 \]
\[ Z_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \\ 0.35 & 0.40 \end{bmatrix} \cdot \begin{bmatrix} 0.05 \\ 0.10 \end{bmatrix} + \begin{bmatrix} 0.35 \\ 0.35 \\ 0.35 \end{bmatrix} \]
\[ Z_1 = \begin{bmatrix} (0.15 \cdot 0.05 + 0.20 \cdot 0.10) + 0.35 \\ (0.25 \cdot 0.05 + 0.30 \cdot 0.10) + 0.35 \\ (0.35 \cdot 0.05 + 0.40 \cdot 0.10) + 0.35 \end{bmatrix} \]
\[ Z_1 = \begin{bmatrix} 0.3775 \\ 0.3925 \\ 0.4075 \end{bmatrix} \]

Applying the sigmoid function:
\[ A_1 = \sigma(Z_1) = \begin{bmatrix} \sigma(0.3775) \\ \sigma(0.3925) \\ \sigma(0.4075) \end{bmatrix} \]
\[ A_1 = \begin{bmatrix} 0.5933 \\ 0.5968 \\ 0.6005 \end{bmatrix} \]

Second Hidden Layer Calculations:

\[ Z_2 = W_2 \cdot A_1 + b_2 \]
\[ Z_2 = \begin{bmatrix} 0.45 & 0.50 & 0.55 \\ 0.60 & 0.65 & 0.70 \end{bmatrix} \cdot \begin{bmatrix} 0.5933 \\ 0.5968 \\ 0.6005 \end{bmatrix} + \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} \]
\[ Z_2 = \begin{bmatrix} (0.45 \cdot 0.5933 + 0.50 \cdot 0.5968 + 0.55 \cdot 0.6005) + 0.60 \\ (0.60 \cdot 0.5933 + 0.65 \cdot 0.5968 + 0.70 \cdot 0.6005) + 0.60 \end{bmatrix} \]
\[ Z_2 = \begin{bmatrix} 1.4957 \\ 1.7643 \end{bmatrix} \]

Applying the sigmoid function:
\[ A_2 = \sigma(Z_2) = \begin{bmatrix} \sigma(1.4957) \\ \sigma(1.7643) \end{bmatrix} \]
\[ A_2 = \begin{bmatrix} 0.8169 \\ 0.8537 \end{bmatrix} \]

Output Layer Calculations:

\[ Z_3 = W_3 \cdot A_2 + b_3 \]
\[ Z_3 = \begin{bmatrix} 0.75 & 0.80 \\ 0.85 & 0.90 \end{bmatrix} \cdot \begin{bmatrix} 0.8169 \\ 0.8537 \end{bmatrix} + \begin{bmatrix} 0.75 \\ 0.75 \end{bmatrix} \]
\[ Z_3 = \begin{bmatrix} (0.75 \cdot 0.8169 + 0.80 \cdot 0.8537) + 0.75 \\ (0.85 \cdot 0.8169 + 0.90 \cdot 0.8537) + 0.75 \end{bmatrix} \]
\[ Z_3 = \begin{bmatrix} 2.0456 \\ 2.2127 \end{bmatrix} \]

Applying the sigmoid function:
\[ A_3 = \sigma(Z_3) = \begin{bmatrix} \sigma(2.0456) \\ \sigma(2.2127) \end{bmatrix} \]
\[ A_3 = \begin{bmatrix} 0.8855 \\ 0.9014 \end{bmatrix} \]
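For a deeper network like this 2-3-2-2 configuration, it is convenient to drive the same per-layer routine in a loop; here is a minimal TypeScript sketch (names illustrative) that reproduces the forward pass above:

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// Runs x through each (W, b) pair in turn and keeps every activation for backprop.
function forwardAll(layers: { W: number[][]; b: number[] }[], x: number[]): number[][] {
  const activations = [x];
  for (const { W, b } of layers) {
    const prev = activations[activations.length - 1];
    const z = W.map((row, i) => row.reduce((s, w, j) => s + w * prev[j], b[i]));
    activations.push(z.map(sigmoid));
  }
  return activations; // [X, A1, A2, A3]
}

const layers = [
  { W: [[0.15, 0.20], [0.25, 0.30], [0.35, 0.40]], b: [0.35, 0.35, 0.35] },
  { W: [[0.45, 0.50, 0.55], [0.60, 0.65, 0.70]], b: [0.60, 0.60] },
  { W: [[0.75, 0.80], [0.85, 0.90]], b: [0.75, 0.75] },
];

console.log(forwardAll(layers, [0.05, 0.10]));
// A1 ~[0.5933, 0.5969, 0.6005], A2 ~[0.8169, 0.8537], A3 ~[0.8855, 0.9014]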

3. Compute the Loss


For simplicity, we use Mean Squared Error (MSE):
\[ L = \frac{1}{2} \sum_{i} (y_i - \hat{y}_i)^2 \]

Let's assume the target output \( Y \) is:
\[ Y = \begin{bmatrix} 0.01 \\ 0.99 \end{bmatrix} \]

\[ L = \frac{1}{2} ((0.01 - 0.8855)^2 + (0.99 - 0.9014)^2) \]
\[ L = \frac{1}{2} ((-0.8755)^2 + (0.0886)^2) \]
\[ L = \frac{1}{2} (0.7665 + 0.0079) \]
\[ L = \frac{1}{2} (0.7744) \]
\[ L = 0.3872 \]

4. Backward Pass


Calculate Output Layer Gradients

\[ \delta_3 = (A_3 - Y) \cdot \sigma'(Z_3) \]
\[ \sigma'(Z_3) = \sigma(Z_3) \cdot (1 - \sigma(Z_3)) \]
\[ \sigma'(Z_3) = \begin{bmatrix} 0.8855 \cdot (1 - 0.8855) \\ 0.9014 \cdot (1 - 0.9014) \end{bmatrix} \]
\[ \sigma'(Z_3) = \begin{bmatrix} 0.1014 \\ 0.0889 \end{bmatrix} \]
\[ \delta_3 = \begin{bmatrix} (0.8855 - 0.01) \cdot 0.1014 \\ (0.9014 - 0.99) \cdot 0.0889 \end{bmatrix} \]
\[ \delta_3 = \begin{bmatrix} 0.0888 \\ -0.0079 \end{bmatrix} \]


Calculate Second Hidden Layer Gradients

\[ \delta_2 = (W_3^T \cdot \delta_3) \cdot \sigma'(Z_2) \]
\[ W_3^T = \begin{bmatrix} 0.75 & 0.85 \\ 0.80 & 0.90 \end{bmatrix} \]
\[ W_3^T \cdot \delta_3 = \begin{bmatrix} 0.75 & 0.85 \\ 0.80 & 0.90 \end{bmatrix} \cdot \begin{bmatrix} 0.0888 \\ -0.0079 \end{bmatrix} \]
\[ W_3^T \cdot \delta_3 = \begin{bmatrix} (0.75 \cdot 0.0888 + 0.85 \cdot (-0.0079)) \\ (0.80 \cdot 0.0888 + 0.90 \cdot (-0.0079)) \end{bmatrix} \]
\[ W_3^T \cdot \delta_3 = \begin{bmatrix} 0.0599 \\ 0.0639 \end{bmatrix} \]

Applying the sigmoid derivative to \( Z_2 \):
\[ \sigma'(Z_2) = \sigma(Z_2) \cdot (1 - \sigma(Z_2)) \]
\[ \sigma'(Z_2) = \begin{bmatrix} 0.8169 \cdot (1 - 0.8169) \\ 0.8537 \cdot (1 - 0.8537) \end{bmatrix} \]
\[ \sigma'(Z_2) = \begin{bmatrix} 0.1496 \\ 0.1249 \end{bmatrix} \]

Now calculate \( \delta_2 \):
\[ \delta_2 = \begin{bmatrix} 0.0599 \\ 0.0639 \end{bmatrix} \cdot \begin{bmatrix} 0.1496 \\ 0.1249 \end{bmatrix} \]
\[ \delta_2 = \begin{bmatrix} 0.0090 \\ 0.0080 \end{bmatrix} \]

Calculate First Hidden Layer Gradients

\[ \delta_1 = (W_2^T \cdot \delta_2) \cdot \sigma'(Z_1) \]
\[ W_2^T = \begin{bmatrix} 0.45 & 0.60 \\ 0.50 & 0.65 \\ 0.55 & 0.70 \end{bmatrix} \]
\[ W_2^T \cdot \delta_2 = \begin{bmatrix} 0.45 & 0.60 \\ 0.50 & 0.65 \\ 0.55 & 0.70 \end{bmatrix} \cdot \begin{bmatrix} 0.0090 \\ 0.0080 \end{bmatrix} \]
\[ W_2^T \cdot \delta_2 = \begin{bmatrix} 0.45 \cdot 0.0090 + 0.60 \cdot 0.0080 \\ 0.50 \cdot 0.0090 + 0.65 \cdot 0.0080 \\ 0.55 \cdot 0.0090 + 0.70 \cdot 0.0080 \end{bmatrix} \]
\[ W_2^T \cdot \delta_2 = \begin{bmatrix} 0.0089 \\ 0.0097 \\ 0.0106 \end{bmatrix} \]

Applying the sigmoid derivative to \( Z_1 \):
\[ \sigma'(Z_1) = \sigma(Z_1) \cdot (1 - \sigma(Z_1)) \]
\[ \sigma'(Z_1) = \begin{bmatrix} 0.5933 \cdot (1 - 0.5933) \\ 0.5968 \cdot (1 - 0.5968) \\ 0.6005 \cdot (1 - 0.6005) \end{bmatrix} \]
\[ \sigma'(Z_1) = \begin{bmatrix} 0.2413 \\ 0.2406 \\ 0.2399 \end{bmatrix} \]

Now calculate \( \delta_1 \):
\[ \delta_1 = \begin{bmatrix} 0.0089 \\ 0.0097 \\ 0.0106 \end{bmatrix} \cdot \begin{bmatrix} 0.2413 \\ 0.2406 \\ 0.2399 \end{bmatrix} \]
\[ \delta_1 = \begin{bmatrix} 0.0021 \\ 0.0023 \\ 0.0025 \end{bmatrix} \]
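Because each layer's delta follows the same recurrence, the backward pass can also be written as a loop over the layers; a sketch that assumes the activations list returned by the forwardAll sketch earlier:

// deltas[k] is the delta for layer k (0 = first hidden layer, last = output layer).
function backwardDeltas(
  layers: { W: number[][]; b: number[] }[],
  activations: number[][], // [X, A1, ..., AL] from the forward pass
  y: number[]
): number[][] {
  const L = layers.length;
  const deltas: number[][] = new Array(L);
  const aOut = activations[L];
  deltas[L - 1] = aOut.map((a, i) => (a - y[i]) * a * (1 - a));
  for (let k = L - 2; k >= 0; k--) {
    const Wnext = layers[k + 1].W;
    const a = activations[k + 1];
    deltas[k] = a.map((ak, j) => {
      const back = deltas[k + 1].reduce((s, d, i) => s + Wnext[i][j] * d, 0);
      return back * ak * (1 - ak);
    });
  }
  return deltas;
}

// With the 2-3-2-2 values above this gives delta3 ~[0.0888, -0.0079],
// delta2 ~[0.0090, 0.0080] and delta1 ~[0.0021, 0.0023, 0.0025].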

5. Update Weights and Biases


Using a learning rate \( \eta \) of 0.5 for this example:

Update Weights and Biases for Output Layer (\( W_3 \) and \( b_3 \)):

\[ W_3 = W_3 - \eta \cdot \delta_3 \cdot A_2^T \]
\[ W_3 = \begin{bmatrix} 0.75 & 0.80 \\ 0.85 & 0.90 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0888 \\ -0.0079 \end{bmatrix} \cdot \begin{bmatrix} 0.8169 & 0.8537 \end{bmatrix} \]
\[ W_3 = \begin{bmatrix} 0.75 & 0.80 \\ 0.85 & 0.90 \end{bmatrix} - \begin{bmatrix} 0.0363 & 0.0379 \\ -0.0032 & -0.0034 \end{bmatrix} \]
\[ W_3 = \begin{bmatrix} 0.7137 & 0.7621 \\ 0.8532 & 0.9034 \end{bmatrix} \]

\[ b_3 = b_3 - \eta \cdot \delta_3 \]
\[ b_3 = \begin{bmatrix} 0.75 \\ 0.75 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0888 \\ -0.0079 \end{bmatrix} \]
\[ b_3 = \begin{bmatrix} 0.75 \\ 0.75 \end{bmatrix} - \begin{bmatrix} 0.0444 \\ -0.0040 \end{bmatrix} \]
\[ b_3 = \begin{bmatrix} 0.7056 \\ 0.7540 \end{bmatrix} \]

Update Weights and Biases for Second Hidden Layer (\( W_2 \) and \( b_2 \)):

\[ W_2 = W_2 - \eta \cdot \delta_2 \cdot A_1^T \]
\[ W_2 = \begin{bmatrix} 0.45 & 0.50 & 0.55 \\ 0.60 & 0.65 & 0.70 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0090 \\ 0.0080 \end{bmatrix} \cdot \begin{bmatrix} 0.5933 & 0.5968 & 0.6005 \end{bmatrix} \]
\[ W_2 = \begin{bmatrix} 0.45 & 0.50 & 0.55 \\ 0.60 & 0.65 & 0.70 \end{bmatrix} - \begin{bmatrix} 0.0027 & 0.0027 & 0.0027 \\ 0.0024 & 0.0024 & 0.0024 \end{bmatrix} \]
\[ W_2 = \begin{bmatrix} 0.4473 & 0.4973 & 0.5473 \\ 0.5976 & 0.6476 & 0.6976 \end{bmatrix} \]

\[ b_2 = b_2 - \eta \cdot \delta_2 \]
\[ b_2 = \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0090 \\ 0.0080 \end{bmatrix} \]
\[ b_2 = \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} - \begin{bmatrix} 0.0045 \\ 0.0040 \end{bmatrix} \]
\[ b_2 = \begin{bmatrix} 0.5955 \\ 0.5960 \end{bmatrix} \]



Update Weights and Biases for First Hidden Layer (\( W_1 \) and \( b_1 \)):


\[ W_1 = W_1 - \eta \cdot \delta_1 \cdot X^T \]

Using the learning rate \( \eta = 0.5 \):

\[ W_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \\ 0.35 & 0.40 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0021 \\ 0.0023 \\ 0.0025 \end{bmatrix} \cdot \begin{bmatrix} 0.05 & 0.10 \end{bmatrix} \]

Calculating the gradients:

\[ \delta_1 \cdot X^T = \begin{bmatrix} 0.0021 \\ 0.0023 \\ 0.0025 \end{bmatrix} \cdot \begin{bmatrix} 0.05 & 0.10 \end{bmatrix} = \begin{bmatrix} 0.0021 \cdot 0.05 & 0.0021 \cdot 0.10 \\ 0.0023 \cdot 0.05 & 0.0023 \cdot 0.10 \\ 0.0025 \cdot 0.05 & 0.0025 \cdot 0.10 \end{bmatrix} \]

\[ \delta_1 \cdot X^T = \begin{bmatrix} 0.000105 & 0.000210 \\ 0.000115 & 0.000230 \\ 0.000125 & 0.000250 \end{bmatrix} \]

Multiplying by the learning rate:

\[ 0.5 \cdot \delta_1 \cdot X^T = \begin{bmatrix} 0.0000525 & 0.000105 \\ 0.0000575 & 0.000115 \\ 0.0000625 & 0.000125 \end{bmatrix} \]

Updating the weights:

\[ W_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \\ 0.35 & 0.40 \end{bmatrix} - \begin{bmatrix} 0.0000525 & 0.000105 \\ 0.0000575 & 0.000115 \\ 0.0000625 & 0.000125 \end{bmatrix} \]

\[ W_1 = \begin{bmatrix} 0.1499475 & 0.199895 \\ 0.2499425 & 0.299885 \\ 0.3499375 & 0.399875 \end{bmatrix} \]

Updating the biases:

\[ b_1 = b_1 - \eta \cdot \delta_1 \]

\[ b_1 = \begin{bmatrix} 0.35 \\ 0.35 \\ 0.35 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0021 \\ 0.0023 \\ 0.0025 \end{bmatrix} \]

\[ b_1 = \begin{bmatrix} 0.35 \\ 0.35 \\ 0.35 \end{bmatrix} - \begin{bmatrix} 0.00105 \\ 0.00115 \\ 0.00125 \end{bmatrix} \]

\[ b_1 = \begin{bmatrix} 0.34895 \\ 0.34885 \\ 0.34875 \end{bmatrix} \]

Summary of Updated Parameters


- Updated weights and biases for the output layer:
\[ W_3 = \begin{bmatrix} 0.7137 & 0.7621 \\ 0.8532 & 0.9034 \end{bmatrix} \]
\[ b_3 = \begin{bmatrix} 0.7056 \\ 0.7540 \end{bmatrix} \]

- Updated weights and biases for the second hidden layer:
\[ W_2 = \begin{bmatrix} 0.4473 & 0.4973 & 0.5473 \\ 0.5976 & 0.6476 & 0.6976 \end{bmatrix} \]
\[ b_2 = \begin{bmatrix} 0.5955 \\ 0.5960 \end{bmatrix} \]

- Updated weights and biases for the first hidden layer:
\[ W_1 = \begin{bmatrix} 0.1499475 & 0.199895 \\ 0.2499425 & 0.299885 \\ 0.3499375 & 0.399875 \end{bmatrix} \]
\[ b_1 = \begin{bmatrix} 0.34895 \\ 0.34885 \\ 0.34875 \end{bmatrix} \]

This completes one full forward and backward pass (backpropagation) with the given example numbers through a neural network with 2 input neurons, 2 hidden layers (with 3 and 2 neurons respectively), and 2 output neurons using the sigmoid activation function.
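As with the first example, the whole update can be cross-checked in code. The sketch below assumes the forwardAll and backwardDeltas helpers from the earlier sketches (again, illustrative names only) and applies one gradient step to every layer; once you move the same arithmetic into a WGSL compute shader, you can compare its output buffers against these reference numbers.

// One SGD step for every layer: W_k -= eta * delta_k * A_{k-1}^T, b_k -= eta * delta_k.
function applyUpdates(
  layers: { W: number[][]; b: number[] }[],
  activations: number[][], // [X, A1, ..., AL] from the forward pass
  deltas: number[][],      // from backwardDeltas
  eta: number
): void {
  layers.forEach((layer, k) => {
    const input = activations[k]; // activation feeding layer k (X for the first layer)
    deltas[k].forEach((d, i) => {
      layer.W[i] = layer.W[i].map((w, j) => w - eta * d * input[j]);
      layer.b[i] -= eta * d;
    });
  });
}

// Expected after one step with eta = 0.5: W3 ~[[0.7137, 0.7621], [0.8532, 0.9034]],
// b3 ~[0.7056, 0.7540], W2 ~[[0.4473, 0.4973, 0.5473], [0.5976, 0.6476, 0.6976]],
// matching the summary above (last-digit differences come from rounding).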






















