Neural Networks and WebGPU Compute..

Learning from Data.....

 



Numerical Examples (Step-by-Step)


Instead of jumping straight into code, we work through examples of the forward and backpropagation algorithms. These step-by-step examples build on the theoretical equations by plugging in numerical values for a range of network configurations.

If you're not sure how to go from the theory to an implementation, this provides an additional springboard.

Furthermore, when you implement your own version, you can compare it against the calculated results here to confirm that the results match and to track down any bugs.


Example 1 (2-2-2 Network)


Let's go through a step-by-step example of a simple neural network with 2 inputs, 2 hidden neurons, and 2 outputs using the sigmoid activation function.

We'll include biases in our calculations.

1. Define the Network Structure and Initialize Parameters


Network Architecture:

- Input layer: 2 neurons
- Hidden layer: 2 neurons
- Output layer: 2 neurons

Sigmoid Activation Function:

\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]
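As a quick aside, the sigmoid and its derivative are easy to sanity-check in code; a minimal TypeScript sketch (the function names here are just illustrative, not from any particular library) is:

// Sigmoid activation: squashes any real input into the range (0, 1).
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// Derivative of the sigmoid written in terms of its output a = sigmoid(x):
// sigma'(x) = a * (1 - a). This is the form used in the backward passes below.
const sigmoidDeriv = (a: number): number => a * (1 - a);

console.log(sigmoid(0.3775)); // ~0.5933 (this value appears in the forward pass below)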

Example Initialization:

Let's initialize the weights and biases with some example values.

- Input to hidden weights (\( W_1 \)):
\[ W_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \end{bmatrix} \]
- Hidden to output weights (\( W_2 \)):
\[ W_2 = \begin{bmatrix} 0.40 & 0.45 \\ 0.50 & 0.55 \end{bmatrix} \]
- Biases for hidden layer (\( b_1 \)):
\[ b_1 = \begin{bmatrix} 0.35 \\ 0.35 \end{bmatrix} \]
- Biases for output layer (\( b_2 \)):
\[ b_2 = \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} \]
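If you want to mirror this setup in code before working through the numbers, one possible layout is plain nested arrays (a sketch only; the variable names simply match the symbols above):

// Row-major storage: W1[i][j] is the weight from input j to hidden neuron i.
const W1: number[][] = [[0.15, 0.20], [0.25, 0.30]];
const b1: number[] = [0.35, 0.35];

// W2[i][j] is the weight from hidden neuron j to output neuron i.
const W2: number[][] = [[0.40, 0.45], [0.50, 0.55]];
const b2: number[] = [0.60, 0.60];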

2. Forward Pass


Inputs:

\[ X = \begin{bmatrix} 0.05 \\ 0.10 \end{bmatrix} \]

Hidden Layer Calculations:

\[ Z_1 = W_1 \cdot X + b_1 \]
\[ Z_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \end{bmatrix} \cdot \begin{bmatrix} 0.05 \\ 0.10 \end{bmatrix} + \begin{bmatrix} 0.35 \\ 0.35 \end{bmatrix} \]
\[ Z_1 = \begin{bmatrix} (0.15 \cdot 0.05 + 0.20 \cdot 0.10) + 0.35 \\ (0.25 \cdot 0.05 + 0.30 \cdot 0.10) + 0.35 \end{bmatrix} \]
\[ Z_1 = \begin{bmatrix} 0.3775 \\ 0.3925 \end{bmatrix} \]

Applying the sigmoid function:
\[ A_1 = \sigma(Z_1) = \begin{bmatrix} \sigma(0.3775) \\ \sigma(0.3925) \end{bmatrix} \]
\[ A_1 = \begin{bmatrix} 0.5933 \\ 0.5968 \end{bmatrix} \]

Output Layer Calculations:

\[ Z_2 = W_2 \cdot A_1 + b_2 \]
\[ Z_2 = \begin{bmatrix} 0.40 & 0.45 \\ 0.50 & 0.55 \end{bmatrix} \cdot \begin{bmatrix} 0.5933 \\ 0.5968 \end{bmatrix} + \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} \]
\[ Z_2 = \begin{bmatrix} (0.40 \cdot 0.5933 + 0.45 \cdot 0.5968) + 0.60 \\ (0.50 \cdot 0.5933 + 0.55 \cdot 0.5968) + 0.60 \end{bmatrix} \]
\[ Z_2 = \begin{bmatrix} 1.2291 \\ 1.3613 \end{bmatrix} \]

Applying the sigmoid function:
\[ A_2 = \sigma(Z_2) = \begin{bmatrix} \sigma(1.2291) \\ \sigma(1.3613) \end{bmatrix} \]
\[ A_2 = \begin{bmatrix} 0.7737 \\ 0.7960 \end{bmatrix} \]
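A small sketch of this forward pass in TypeScript (one possible way to structure it; the layer routine and variable names are assumptions, not a fixed API) might look like:

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// One dense layer: z = W * x + b, a = sigmoid(z). W is row-major (one row per neuron).
function forwardLayer(W: number[][], b: number[], x: number[]): { z: number[]; a: number[] } {
  const z = W.map((row, i) => row.reduce((sum, w, j) => sum + w * x[j], b[i]));
  return { z, a: z.map(sigmoid) };
}

const X = [0.05, 0.10];
const hidden = forwardLayer([[0.15, 0.20], [0.25, 0.30]], [0.35, 0.35], X);
const output = forwardLayer([[0.40, 0.45], [0.50, 0.55]], [0.60, 0.60], hidden.a);

console.log(hidden.z, hidden.a); // z ~[0.3775, 0.3925], a ~[0.5933, 0.5969]
console.log(output.z, output.a); // z ~[1.1059, 1.2249], a ~[0.7514, 0.7729]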

3. Compute the Loss


For simplicity, we use Mean Squared Error (MSE):
\[ L = \frac{1}{2} \sum_{i} (y_i - \hat{y}_i)^2 \]

Let's assume the target output \( Y \) is:
\[ Y = \begin{bmatrix} 0.01 \\ 0.99 \end{bmatrix} \]

\[ L = \frac{1}{2} ((0.01 - 0.7514)^2 + (0.99 - 0.7729)^2) \]
\[ L = \frac{1}{2} ((-0.7414)^2 + (0.2171)^2) \]
\[ L = \frac{1}{2} (0.5497 + 0.0471) \]
\[ L = \frac{1}{2} (0.5968) \]
\[ L = 0.2984 \]
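The same loss can be checked with a couple of lines (a sketch; mse is just a local helper name):

// Half mean-squared-error over the output vector, matching the formula above.
const mse = (y: number[], yHat: number[]): number =>
  0.5 * y.reduce((sum, yi, i) => sum + (yi - yHat[i]) ** 2, 0);

console.log(mse([0.01, 0.99], [0.7514, 0.7729])); // ~0.2984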

4. Backward Pass


Calculate Output Layer Gradients

\[ \delta_2 = (A_2 - Y) \cdot \sigma'(Z_2) \]
(the product with \( \sigma'(Z_2) \) is element-wise)
\[ \sigma'(Z_2) = \sigma(Z_2) \cdot (1 - \sigma(Z_2)) \]
\[ \sigma'(Z_2) = \begin{bmatrix} 0.7514 \cdot (1 - 0.7514) \\ 0.7729 \cdot (1 - 0.7729) \end{bmatrix} \]
\[ \sigma'(Z_2) = \begin{bmatrix} 0.1868 \\ 0.1755 \end{bmatrix} \]
\[ \delta_2 = \begin{bmatrix} (0.7514 - 0.01) \cdot 0.1868 \\ (0.7729 - 0.99) \cdot 0.1755 \end{bmatrix} \]
\[ \delta_2 = \begin{bmatrix} 0.1385 \\ -0.0381 \end{bmatrix} \]
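In code, the output-layer delta is a simple element-wise expression over the output activations; a minimal sketch (names assumed, not from a library) is:

// delta2[i] = (a2[i] - y[i]) * a2[i] * (1 - a2[i])
function outputDelta(a2: number[], y: number[]): number[] {
  return a2.map((a, i) => (a - y[i]) * a * (1 - a));
}

console.log(outputDelta([0.7514, 0.7729], [0.01, 0.99])); // ~[0.1385, -0.0381]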

Calculate Hidden Layer Gradients

\[ \delta_1 = (W_2^T \cdot \delta_2) \cdot \sigma'(Z_1) \]
\[ W_2^T = \begin{bmatrix} 0.40 & 0.50 \\ 0.45 & 0.55 \end{bmatrix} \]
\[ W_2^T \cdot \delta_2 = \begin{bmatrix} 0.40 & 0.50 \\ 0.45 & 0.55 \end{bmatrix} \cdot \begin{bmatrix} 0.1385 \\ -0.0381 \end{bmatrix} = \begin{bmatrix} 0.40 \cdot 0.1385 + 0.50 \cdot (-0.0381) \\ 0.45 \cdot 0.1385 + 0.55 \cdot (-0.0381) \end{bmatrix} = \begin{bmatrix} 0.0364 \\ 0.0414 \end{bmatrix} \]
\[ \delta_1 = \begin{bmatrix} 0.0364 \\ 0.0414 \end{bmatrix} \cdot \sigma'(Z_1) = \begin{bmatrix} 0.0364 \\ 0.0414 \end{bmatrix} \cdot \begin{bmatrix} 0.2413 \\ 0.2406 \end{bmatrix} \]
\[ \delta_1 = \begin{bmatrix} 0.0088 \\ 0.0100 \end{bmatrix} \]
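The hidden-layer delta can be computed without explicitly forming \( W_2^T \) by summing down the columns of \( W_2 \); a sketch under the same assumptions as the earlier snippets:

// delta1[j] = (sum_i W2[i][j] * delta2[i]) * a1[j] * (1 - a1[j])
function hiddenDelta(W2: number[][], delta2: number[], a1: number[]): number[] {
  return a1.map((a, j) => {
    const back = delta2.reduce((sum, d, i) => sum + W2[i][j] * d, 0); // column j of W2
    return back * a * (1 - a);
  });
}

console.log(hiddenDelta([[0.40, 0.45], [0.50, 0.55]], [0.1385, -0.0381], [0.5933, 0.5968]));
// ~[0.0088, 0.0100]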

5. Update Weights and Biases


Using a learning rate \( \eta \) of 0.5 for this example:

Update Hidden to Output Weights (\( W_2 \)) and Biases (\( b_2 \)):

\[ W_2 = W_2 - \eta \cdot \delta_2 \cdot A_1^T \]
\[ W_2 = \begin{bmatrix} 0.40 & 0.45 \\ 0.50 & 0.55 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.1385 \\ -0.0381 \end{bmatrix} \cdot \begin{bmatrix} 0.5933 & 0.5968 \end{bmatrix} \]
\[ W_2 = \begin{bmatrix} 0.40 & 0.45 \\ 0.50 & 0.55 \end{bmatrix} - \begin{bmatrix} 0.0411 & 0.0413 \\ -0.0113 & -0.0114 \end{bmatrix} \]
\[ W_2 = \begin{bmatrix} 0.3589 & 0.4087 \\ 0.5113 & 0.5614 \end{bmatrix} \]


Update Biases for the Output Layer (\( b_2 \)):


\[ b_2 = b_2 - \eta \cdot \delta_2 \]
\[ b_2 = \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.1385 \\ -0.0381 \end{bmatrix} \]
\[ b_2 = \begin{bmatrix} 0.60 - 0.06925 \\ 0.60 + 0.01905 \end{bmatrix} \]
\[ b_2 = \begin{bmatrix} 0.53075 \\ 0.61905 \end{bmatrix} \]

Update Weights and Biases for the Hidden Layer (\( W_1 \) and \( b_1 \)):


Update Input to Hidden Weights (\( W_1 \)):

\[ W_1 = W_1 - \eta \cdot \delta_1 \cdot X^T \]
\[ W_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0088 \\ 0.0100 \end{bmatrix} \cdot \begin{bmatrix} 0.05 & 0.10 \end{bmatrix} \]

Calculating the gradients:
\[ \delta_1 \cdot X^T = \begin{bmatrix} 0.0088 \\ 0.0100 \end{bmatrix} \cdot \begin{bmatrix} 0.05 & 0.10 \end{bmatrix} = \begin{bmatrix} 0.0088 \cdot 0.05 & 0.0088 \cdot 0.10 \\ 0.0100 \cdot 0.05 & 0.0100 \cdot 0.10 \end{bmatrix} \]
\[ \delta_1 \cdot X^T = \begin{bmatrix} 0.00044 & 0.00088 \\ 0.00050 & 0.00100 \end{bmatrix} \]

Multiplying by the learning rate:
\[ 0.5 \cdot \delta_1 \cdot X^T = \begin{bmatrix} 0.00022 & 0.00044 \\ 0.00025 & 0.00050 \end{bmatrix} \]

Updating the weights:
\[ W_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \end{bmatrix} - \begin{bmatrix} 0.00022 & 0.00044 \\ 0.00025 & 0.00050 \end{bmatrix} \]
\[ W_1 = \begin{bmatrix} 0.14978 & 0.19956 \\ 0.24975 & 0.29950 \end{bmatrix} \]

Update Biases for the Hidden Layer (\( b_1 \)):

\[ b_1 = b_1 - \eta \cdot \delta_1 \]
\[ b_1 = \begin{bmatrix} 0.35 \\ 0.35 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0088 \\ 0.0100 \end{bmatrix} \]
\[ b_1 = \begin{bmatrix} 0.35 - 0.0044 \\ 0.35 - 0.0050 \end{bmatrix} \]
\[ b_1 = \begin{bmatrix} 0.3456 \\ 0.3450 \end{bmatrix} \]

Summary of Updated Parameters:


- Updated weights and biases for the output layer:
\[ W_2 = \begin{bmatrix} 0.3589 & 0.4087 \\ 0.5113 & 0.5614 \end{bmatrix} \]
\[ b_2 = \begin{bmatrix} 0.53075 \\ 0.61905 \end{bmatrix} \]

- Updated weights and biases for the hidden layer:
\[ W_1 = \begin{bmatrix} 0.14978 & 0.19956 \\ 0.24975 & 0.29950 \end{bmatrix} \]
\[ b_1 = \begin{bmatrix} 0.3456 \\ 0.3450 \end{bmatrix} \]

This completes the backpropagation process for the given neural network. The weights and biases have been updated based on the error gradients calculated during the backward pass.
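To cross-check all of the numbers in this example in one go, the pieces above can be combined into a single self-contained training step. This is only a reference sketch in plain TypeScript (no WebGPU yet); the helper names are illustrative:

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

function forwardLayer(W: number[][], b: number[], x: number[]) {
  const z = W.map((row, i) => row.reduce((s, w, j) => s + w * x[j], b[i]));
  return { z, a: z.map(sigmoid) };
}

// SGD update for one layer: W -= eta * delta * input^T, b -= eta * delta (in place).
function updateLayer(W: number[][], b: number[], delta: number[], input: number[], eta: number) {
  delta.forEach((d, i) => {
    W[i] = W[i].map((w, j) => w - eta * d * input[j]);
    b[i] -= eta * d;
  });
}

const X = [0.05, 0.10], Y = [0.01, 0.99], eta = 0.5;
const W1 = [[0.15, 0.20], [0.25, 0.30]], b1 = [0.35, 0.35];
const W2 = [[0.40, 0.45], [0.50, 0.55]], b2 = [0.60, 0.60];

const hidden = forwardLayer(W1, b1, X);
const output = forwardLayer(W2, b2, hidden.a);

// Deltas are computed before any weights are modified.
const delta2 = output.a.map((a, i) => (a - Y[i]) * a * (1 - a));
const delta1 = hidden.a.map((a, j) =>
  delta2.reduce((s, d, i) => s + W2[i][j] * d, 0) * a * (1 - a));

updateLayer(W2, b2, delta2, hidden.a, eta);
updateLayer(W1, b1, delta1, X, eta);

console.log(W2, b2); // ~[[0.3589, 0.4087], [0.5113, 0.5614]], ~[0.5308, 0.6190]
console.log(W1, b1); // ~[[0.1498, 0.1996], [0.2498, 0.2995]], ~[0.3456, 0.3450]

Small differences in the last decimal place compared with the hand-rounded values above are expected.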



Example 2 (2-3-2-2 Network)


1. Define the Network Structure and Initialize Parameters


We will use the sigmoid activation function and include biases in our calculations.

Network Architecture:

- Input layer: 2 neurons
- First hidden layer: 3 neurons
- Second hidden layer: 2 neurons
- Output layer: 2 neurons

Sigmoid Activation Function:

\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]

Example Initialization:

Let's initialize the weights and biases with some example values.

- Input to first hidden layer weights (\( W_1 \)):
\[ W_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \\ 0.35 & 0.40 \end{bmatrix} \]
- First hidden layer to second hidden layer weights (\( W_2 \)):
\[ W_2 = \begin{bmatrix} 0.45 & 0.50 & 0.55 \\ 0.60 & 0.65 & 0.70 \end{bmatrix} \]
- Second hidden layer to output layer weights (\( W_3 \)):
\[ W_3 = \begin{bmatrix} 0.75 & 0.80 \\ 0.85 & 0.90 \end{bmatrix} \]
- Biases for first hidden layer (\( b_1 \)):
\[ b_1 = \begin{bmatrix} 0.35 \\ 0.35 \\ 0.35 \end{bmatrix} \]
- Biases for second hidden layer (\( b_2 \)):
\[ b_2 = \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} \]
- Biases for output layer (\( b_3 \)):
\[ b_3 = \begin{bmatrix} 0.75 \\ 0.75 \end{bmatrix} \]

2. Forward Pass


Inputs:

\[ X = \begin{bmatrix} 0.05 \\ 0.10 \end{bmatrix} \]

First Hidden Layer Calculations:

\[ Z_1 = W_1 \cdot X + b_1 \]
\[ Z_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \\ 0.35 & 0.40 \end{bmatrix} \cdot \begin{bmatrix} 0.05 \\ 0.10 \end{bmatrix} + \begin{bmatrix} 0.35 \\ 0.35 \\ 0.35 \end{bmatrix} \]
\[ Z_1 = \begin{bmatrix} (0.15 \cdot 0.05 + 0.20 \cdot 0.10) + 0.35 \\ (0.25 \cdot 0.05 + 0.30 \cdot 0.10) + 0.35 \\ (0.35 \cdot 0.05 + 0.40 \cdot 0.10) + 0.35 \end{bmatrix} \]
\[ Z_1 = \begin{bmatrix} 0.3775 \\ 0.3925 \\ 0.4075 \end{bmatrix} \]

Applying the sigmoid function:
\[ A_1 = \sigma(Z_1) = \begin{bmatrix} \sigma(0.3775) \\ \sigma(0.3925) \\ \sigma(0.4075) \end{bmatrix} \]
\[ A_1 = \begin{bmatrix} 0.5933 \\ 0.5968 \\ 0.6005 \end{bmatrix} \]

Second Hidden Layer Calculations:

\[ Z_2 = W_2 \cdot A_1 + b_2 \]
\[ Z_2 = \begin{bmatrix} 0.45 & 0.50 & 0.55 \\ 0.60 & 0.65 & 0.70 \end{bmatrix} \cdot \begin{bmatrix} 0.5933 \\ 0.5968 \\ 0.6005 \end{bmatrix} + \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} \]
\[ Z_2 = \begin{bmatrix} (0.45 \cdot 0.5933 + 0.50 \cdot 0.5968 + 0.55 \cdot 0.6005) + 0.60 \\ (0.60 \cdot 0.5933 + 0.65 \cdot 0.5968 + 0.70 \cdot 0.6005) + 0.60 \end{bmatrix} \]
\[ Z_2 = \begin{bmatrix} 1.4957 \\ 1.7643 \end{bmatrix} \]

Applying the sigmoid function:
\[ A_2 = \sigma(Z_2) = \begin{bmatrix} \sigma(1.4957) \\ \sigma(1.7643) \end{bmatrix} \]
\[ A_2 = \begin{bmatrix} 0.8169 \\ 0.8537 \end{bmatrix} \]

Output Layer Calculations:

\[ Z_3 = W_3 \cdot A_2 + b_3 \]
\[ Z_3 = \begin{bmatrix} 0.75 & 0.80 \\ 0.85 & 0.90 \end{bmatrix} \cdot \begin{bmatrix} 0.8169 \\ 0.8537 \end{bmatrix} + \begin{bmatrix} 0.75 \\ 0.75 \end{bmatrix} \]
\[ Z_3 = \begin{bmatrix} (0.75 \cdot 0.8169 + 0.80 \cdot 0.8537) + 0.75 \\ (0.85 \cdot 0.8169 + 0.90 \cdot 0.8537) + 0.75 \end{bmatrix} \]
\[ Z_3 = \begin{bmatrix} 2.0456 \\ 2.2127 \end{bmatrix} \]

Applying the sigmoid function:
\[ A_3 = \sigma(Z_3) = \begin{bmatrix} \sigma(2.0456) \\ \sigma(2.2127) \end{bmatrix} \]
\[ A_3 = \begin{bmatrix} 0.8855 \\ 0.9014 \end{bmatrix} \]
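For a deeper network like this 2-3-2-2 configuration, it is convenient to drive the same per-layer routine in a loop; here is a minimal TypeScript sketch (names illustrative) that reproduces the forward pass above:

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// Runs x through each (W, b) pair in turn and keeps every activation for backprop.
function forwardAll(layers: { W: number[][]; b: number[] }[], x: number[]): number[][] {
  const activations = [x];
  for (const { W, b } of layers) {
    const prev = activations[activations.length - 1];
    const z = W.map((row, i) => row.reduce((s, w, j) => s + w * prev[j], b[i]));
    activations.push(z.map(sigmoid));
  }
  return activations; // [X, A1, A2, A3]
}

const layers = [
  { W: [[0.15, 0.20], [0.25, 0.30], [0.35, 0.40]], b: [0.35, 0.35, 0.35] },
  { W: [[0.45, 0.50, 0.55], [0.60, 0.65, 0.70]], b: [0.60, 0.60] },
  { W: [[0.75, 0.80], [0.85, 0.90]], b: [0.75, 0.75] },
];

console.log(forwardAll(layers, [0.05, 0.10]));
// A1 ~[0.5933, 0.5969, 0.6005], A2 ~[0.8169, 0.8537], A3 ~[0.8855, 0.9014]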

3. Compute the Loss


For simplicity, we use Mean Squared Error (MSE):
\[ L = \frac{1}{2} \sum_{i} (y_i - \hat{y}_i)^2 \]

Let's assume the target output \( Y \) is:
\[ Y = \begin{bmatrix} 0.01 \\ 0.99 \end{bmatrix} \]

\[ L = \frac{1}{2} ((0.01 - 0.8855)^2 + (0.99 - 0.9014)^2) \]
\[ L = \frac{1}{2} ((-0.8755)^2 + (0.0886)^2) \]
\[ L = \frac{1}{2} (0.7665 + 0.0079) \]
\[ L = \frac{1}{2} (0.7744) \]
\[ L = 0.3872 \]

4. Backward Pass


Calculate Output Layer Gradients

\[ \delta_3 = (A_3 - Y) \cdot \sigma'(Z_3) \]
\[ \sigma'(Z_3) = \sigma(Z_3) \cdot (1 - \sigma(Z_3)) \]
\[ \sigma'(Z_3) = \begin{bmatrix} 0.8855 \cdot (1 - 0.8855) \\ 0.9014 \cdot (1 - 0.9014) \end{bmatrix} \]
\[ \sigma'(Z_3) = \begin{bmatrix} 0.1014 \\ 0.0889 \end{bmatrix} \]
\[ \delta_3 = \begin{bmatrix} (0.8855 - 0.01) \cdot 0.1014 \\ (0.9014 - 0.99) \cdot 0.0889 \end{bmatrix} \]
\[ \delta_3 = \begin{bmatrix} 0.0888 \\ -0.0079 \end{bmatrix} \]


Calculate Second Hidden Layer Gradients

\[ \delta_2 = (W_3^T \cdot \delta_3) \cdot \sigma'(Z_2) \]
\[ W_3^T = \begin{bmatrix} 0.75 & 0.85 \\ 0.80 & 0.90 \end{bmatrix} \]
\[ W_3^T \cdot \delta_3 = \begin{bmatrix} 0.75 & 0.85 \\ 0.80 & 0.90 \end{bmatrix} \cdot \begin{bmatrix} 0.0888 \\ -0.0079 \end{bmatrix} \]
\[ W_3^T \cdot \delta_3 = \begin{bmatrix} (0.75 \cdot 0.0888 + 0.85 \cdot (-0.0079)) \\ (0.80 \cdot 0.0888 + 0.90 \cdot (-0.0079)) \end{bmatrix} \]
\[ W_3^T \cdot \delta_3 = \begin{bmatrix} 0.0599 \\ 0.0639 \end{bmatrix} \]

Applying the sigmoid derivative to \( Z_2 \):
\[ \sigma'(Z_2) = \sigma(Z_2) \cdot (1 - \sigma(Z_2)) \]
\[ \sigma'(Z_2) = \begin{bmatrix} 0.8169 \cdot (1 - 0.8169) \\ 0.8537 \cdot (1 - 0.8537) \end{bmatrix} \]
\[ \sigma'(Z_2) = \begin{bmatrix} 0.1496 \\ 0.1249 \end{bmatrix} \]

Now calculate \( \delta_2 \):
\[ \delta_2 = \begin{bmatrix} 0.0599 \\ 0.0639 \end{bmatrix} \cdot \begin{bmatrix} 0.1496 \\ 0.1249 \end{bmatrix} \]
\[ \delta_2 = \begin{bmatrix} 0.0090 \\ 0.0080 \end{bmatrix} \]

Calculate First Hidden Layer Gradients

\[ \delta_1 = (W_2^T \cdot \delta_2) \cdot \sigma'(Z_1) \]
\[ W_2^T = \begin{bmatrix} 0.45 & 0.60 \\ 0.50 & 0.65 \\ 0.55 & 0.70 \end{bmatrix} \]
\[ W_2^T \cdot \delta_2 = \begin{bmatrix} 0.45 & 0.60 \\ 0.50 & 0.65 \\ 0.55 & 0.70 \end{bmatrix} \cdot \begin{bmatrix} 0.0090 \\ 0.0080 \end{bmatrix} \]
\[ W_2^T \cdot \delta_2 = \begin{bmatrix} 0.45 \cdot 0.0090 + 0.60 \cdot 0.0080 \\ 0.50 \cdot 0.0090 + 0.65 \cdot 0.0080 \\ 0.55 \cdot 0.0090 + 0.70 \cdot 0.0080 \end{bmatrix} \]
\[ W_2^T \cdot \delta_2 = \begin{bmatrix} 0.0089 \\ 0.0097 \\ 0.0106 \end{bmatrix} \]

Applying the sigmoid derivative to \( Z_1 \):
\[ \sigma'(Z_1) = \sigma(Z_1) \cdot (1 - \sigma(Z_1)) \]
\[ \sigma'(Z_1) = \begin{bmatrix} 0.5933 \cdot (1 - 0.5933) \\ 0.5968 \cdot (1 - 0.5968) \\ 0.6005 \cdot (1 - 0.6005) \end{bmatrix} \]
\[ \sigma'(Z_1) = \begin{bmatrix} 0.2413 \\ 0.2406 \\ 0.2399 \end{bmatrix} \]

Now calculate \( \delta_1 \):
\[ \delta_1 = \begin{bmatrix} 0.0089 \\ 0.0097 \\ 0.0106 \end{bmatrix} \cdot \begin{bmatrix} 0.2413 \\ 0.2406 \\ 0.2399 \end{bmatrix} \]
\[ \delta_1 = \begin{bmatrix} 0.0021 \\ 0.0023 \\ 0.0025 \end{bmatrix} \]
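Because each layer's delta follows the same recurrence, the backward pass can also be written as a loop over the layers; a sketch that assumes the activations list returned by the forwardAll sketch earlier:

// deltas[k] is the delta for layer k (0 = first hidden layer, last = output layer).
function backwardDeltas(
  layers: { W: number[][]; b: number[] }[],
  activations: number[][], // [X, A1, ..., AL] from the forward pass
  y: number[]
): number[][] {
  const L = layers.length;
  const deltas: number[][] = new Array(L);
  const aOut = activations[L];
  deltas[L - 1] = aOut.map((a, i) => (a - y[i]) * a * (1 - a));
  for (let k = L - 2; k >= 0; k--) {
    const Wnext = layers[k + 1].W;
    const a = activations[k + 1];
    deltas[k] = a.map((ak, j) => {
      const back = deltas[k + 1].reduce((s, d, i) => s + Wnext[i][j] * d, 0);
      return back * ak * (1 - ak);
    });
  }
  return deltas;
}

// With the 2-3-2-2 values above this gives delta3 ~[0.0888, -0.0079],
// delta2 ~[0.0090, 0.0080] and delta1 ~[0.0021, 0.0023, 0.0025].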

5. Update Weights and Biases


Using a learning rate \( \eta \) of 0.5 for this example:

Update Weights and Biases for Output Layer (\( W_3 \) and \( b_3 \)):

\[ W_3 = W_3 - \eta \cdot \delta_3 \cdot A_2^T \]
\[ W_3 = \begin{bmatrix} 0.75 & 0.80 \\ 0.85 & 0.90 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0888 \\ -0.0079 \end{bmatrix} \cdot \begin{bmatrix} 0.8169 & 0.8537 \end{bmatrix} \]
\[ W_3 = \begin{bmatrix} 0.75 & 0.80 \\ 0.85 & 0.90 \end{bmatrix} - \begin{bmatrix} 0.0363 & 0.0379 \\ -0.0032 & -0.0034 \end{bmatrix} \]
\[ W_3 = \begin{bmatrix} 0.7137 & 0.7621 \\ 0.8532 & 0.9034 \end{bmatrix} \]

\[ b_3 = b_3 - \eta \cdot \delta_3 \]
\[ b_3 = \begin{bmatrix} 0.75 \\ 0.75 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0888 \\ -0.0079 \end{bmatrix} \]
\[ b_3 = \begin{bmatrix} 0.75 \\ 0.75 \end{bmatrix} - \begin{bmatrix} 0.0444 \\ -0.0040 \end{bmatrix} \]
\[ b_3 = \begin{bmatrix} 0.7056 \\ 0.7540 \end{bmatrix} \]

Update Weights and Biases for Second Hidden Layer (\( W_2 \) and \( b_2 \)):

\[ W_2 = W_2 - \eta \cdot \delta_2 \cdot A_1^T \]
\[ W_2 = \begin{bmatrix} 0.45 & 0.50 & 0.55 \\ 0.60 & 0.65 & 0.70 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0090 \\ 0.0080 \end{bmatrix} \cdot \begin{bmatrix} 0.5933 & 0.5968 & 0.6005 \end{bmatrix} \]
\[ W_2 = \begin{bmatrix} 0.45 & 0.50 & 0.55 \\ 0.60 & 0.65 & 0.70 \end{bmatrix} - \begin{bmatrix} 0.0027 & 0.0027 & 0.0027 \\ 0.0024 & 0.0024 & 0.0024 \end{bmatrix} \]
\[ W_2 = \begin{bmatrix} 0.4473 & 0.4973 & 0.5473 \\ 0.5976 & 0.6476 & 0.6976 \end{bmatrix} \]

\[ b_2 = b_2 - \eta \cdot \delta_2 \]
\[ b_2 = \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0090 \\ 0.0080 \end{bmatrix} \]
\[ b_2 = \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} - \begin{bmatrix} 0.0045 \\ 0.0040 \end{bmatrix} \]
\[ b_2 = \begin{bmatrix} 0.5955 \\ 0.5960 \end{bmatrix} \]



Update Weights and Biases for First Hidden Layer (\( W_1 \) and \( b_1 \)):


\[ W_1 = W_1 - \eta \cdot \delta_1 \cdot X^T \]

Using the learning rate \( \eta = 0.5 \):

\[ W_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \\ 0.35 & 0.40 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0021 \\ 0.0023 \\ 0.0025 \end{bmatrix} \cdot \begin{bmatrix} 0.05 & 0.10 \end{bmatrix} \]

Calculating the gradients:

\[ \delta_1 \cdot X^T = \begin{bmatrix} 0.0021 \\ 0.0023 \\ 0.0025 \end{bmatrix} \cdot \begin{bmatrix} 0.05 & 0.10 \end{bmatrix} = \begin{bmatrix} 0.0021 \cdot 0.05 & 0.0021 \cdot 0.10 \\ 0.0023 \cdot 0.05 & 0.0023 \cdot 0.10 \\ 0.0025 \cdot 0.05 & 0.0025 \cdot 0.10 \end{bmatrix} \]

\[ \delta_1 \cdot X^T = \begin{bmatrix} 0.000105 & 0.000210 \\ 0.000115 & 0.000230 \\ 0.000125 & 0.000250 \end{bmatrix} \]

Multiplying by the learning rate:

\[ 0.5 \cdot \delta_1 \cdot X^T = \begin{bmatrix} 0.0000525 & 0.000105 \\ 0.0000575 & 0.000115 \\ 0.0000625 & 0.000125 \end{bmatrix} \]

Updating the weights:

\[ W_1 = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \\ 0.35 & 0.40 \end{bmatrix} - \begin{bmatrix} 0.0000525 & 0.000105 \\ 0.0000575 & 0.000115 \\ 0.0000625 & 0.000125 \end{bmatrix} \]

\[ W_1 = \begin{bmatrix} 0.1499475 & 0.199895 \\ 0.2499425 & 0.299885 \\ 0.3499375 & 0.399875 \end{bmatrix} \]

Updating the biases:

\[ b_1 = b_1 - \eta \cdot \delta_1 \]

\[ b_1 = \begin{bmatrix} 0.35 \\ 0.35 \\ 0.35 \end{bmatrix} - 0.5 \cdot \begin{bmatrix} 0.0021 \\ 0.0023 \\ 0.0025 \end{bmatrix} \]

\[ b_1 = \begin{bmatrix} 0.35 \\ 0.35 \\ 0.35 \end{bmatrix} - \begin{bmatrix} 0.00105 \\ 0.00115 \\ 0.00125 \end{bmatrix} \]

\[ b_1 = \begin{bmatrix} 0.34895 \\ 0.34885 \\ 0.34875 \end{bmatrix} \]

Summary of Updated Parameters


- Updated weights and biases for the output layer:
\[ W_3 = \begin{bmatrix} 0.7137 & 0.7621 \\ 0.8532 & 0.9034 \end{bmatrix} \]
\[ b_3 = \begin{bmatrix} 0.7056 \\ 0.7540 \end{bmatrix} \]

- Updated weights and biases for the second hidden layer:
\[ W_2 = \begin{bmatrix} 0.4473 & 0.4973 & 0.5473 \\ 0.5976 & 0.6476 & 0.6976 \end{bmatrix} \]
\[ b_2 = \begin{bmatrix} 0.5955 \\ 0.5960 \end{bmatrix} \]

- Updated weights and biases for the first hidden layer:
\[ W_1 = \begin{bmatrix} 0.1499475 & 0.199895 \\ 0.2499425 & 0.299885 \\ 0.3499375 & 0.399875 \end{bmatrix} \]
\[ b_1 = \begin{bmatrix} 0.34895 \\ 0.34885 \\ 0.34875 \end{bmatrix} \]

This completes one full forward and backward pass (backpropagation) with the given example numbers through a neural network with 2 input neurons, 2 hidden layers (with 3 and 2 neurons respectively), and 2 output neurons using the sigmoid activation function.
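As with the first example, the whole update can be cross-checked in code. The sketch below assumes the forwardAll and backwardDeltas helpers from the earlier sketches (again, illustrative names only) and applies one gradient step to every layer; once you move the same arithmetic into a WGSL compute shader, you can compare its output buffers against these reference numbers.

// One SGD step for every layer: W_k -= eta * delta_k * A_{k-1}^T, b_k -= eta * delta_k.
function applyUpdates(
  layers: { W: number[][]; b: number[] }[],
  activations: number[][], // [X, A1, ..., AL] from the forward pass
  deltas: number[][],      // from backwardDeltas
  eta: number
): void {
  layers.forEach((layer, k) => {
    const input = activations[k]; // activation feeding layer k (X for the first layer)
    deltas[k].forEach((d, i) => {
      layer.W[i] = layer.W[i].map((w, j) => w - eta * d * input[j]);
      layer.b[i] -= eta * d;
    });
  });
}

// Expected after one step with eta = 0.5: W3 ~[[0.7137, 0.7621], [0.8532, 0.9034]],
// b3 ~[0.7056, 0.7540], W2 ~[[0.4473, 0.4973, 0.5473], [0.5976, 0.6476, 0.6976]],
// matching the summary above (last-digit differences come from rounding).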






















