 | Neural Networks and Deep Learning with WebGPU Compute |  |
A practical introduction to neural networks & deep learning from the ground up using the WebGPU API (and compute shaders).
- the history of neural networks (what they are and where they came from)
- learn about neural networks (maths and theory)
- write your own from the ground up (both JavaScript and WGSL Compute Shaders)
- learn about the webgpu api (and compute shaders)
- learn to combine webgpu and machine learning
- learn about developing parallel neural networks (suitable for the gpu/massively parallel architectures)
- parallel compute challenges and how to overcome them (with hands on examples that focus on neural networks and the gpu)
- testing and debugging neural networks
The following gives chapter insights, code and examples for some of the chapters.
 | Background (History) |  |
In the half-century since the invention of neural networks, web technologies have continued to push the boundaries of what can be delivered in the browser. The latest of these technologies unlock the potential of neural networks in new ways, making them accessible to anyone through the web.
• Neural Network Landscape (Overview).
• Different types of neural networks.
• Digital Brain! Hype (LLMs, DALL-E, ChatGPT, ...)
• Computational power (transistor numbers)
 | Neural Networks (Concepts, Maths and Hacks) |  |
The human brain is more complex than any other known structure in the universe, and neuroscientists study its inner workings. The brain is made up of neurons, also known as nerve cells, which send and receive signals through `connections` (or wires). In fact, the brain contains a striking number of these 'brain wires' - more than 100 trillion connections.
Think of neurons as the transistors that make up the brain's CPU. Individually they are simple; their vast numbers and connections are what give them their power - like DNA, which is built from a small set of repeating units yet encodes the building blocks of life. Digital neural networks are a concept inspired by the structure and function of biological neurons in the brain: mathematical models that emulate the behaviour of simple neurons. These simple models can be connected together to construct solutions to complex problems.
• Basic 'neuron' (or perceptron) and related components (inputs, outputs, weights, biases and data).
• Activation functions (sigmoid/ReLU/... - table)
• Fully connected neural networks
• Forward propagation (and backward propagation) - finding the coefficients.
• Step-by-step examples (visual run through using a minimal example/lots of diagrams).
• Other training methods (non-linear - genetic algorithms, differential evolution, ...)
• Special neural networks - memory, feedback, 'not-fully' connected
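The basic 'neuron' from the list above can be sketched in a few lines. This is a minimal illustrative example (the function names and the choice of a sigmoid activation are assumptions, not code from the book): a weighted sum of the inputs plus a bias, squashed by an activation function.

```javascript
// A sigmoid activation: squashes any input into the range (0, 1).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// One artificial neuron (perceptron): weighted sum of inputs plus a bias,
// passed through the activation function.
function neuron(inputs, weights, bias) {
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sigmoid(sum);
}

// Example: two inputs with hand-picked weights and bias.
const out = neuron([1, 0], [0.5, -0.5], 0.1);
console.log(out); // a value between 0 and 1
```

Everything that follows - layers, training, GPU versions - is built from many copies of this one small computation.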
 | Step-by-Step Numerical Examples |  |
Before you start coding a neural network or tinkering under the hood, it's crucial to understand the fundamentals of the model (the maths) - not just the equations, but how they all fit together. First and foremost, understanding what the basic perceptron does and how it gets the result it does should be your top priority. Then there is the training - how the gradients and errors are able to iteratively improve the weights. Instead of just copying out the equations and looking at them again and again, a better approach is to actually plug in some numbers and work through some simple neural network configurations.
- Step by step calculating forward/backward
- Show/link equation to numbers
- Outputs, errors, gradients
- Different configurations
- Visual diagram of the steps (not just text)
- Checks or bugs (how to debug and check it works)
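The step-by-step idea can be shown with one concrete calculation. This sketch assumes a single sigmoid neuron with one input, squared-error loss, and plain gradient descent; all the numbers are illustrative, chosen so you can check each value by hand.

```javascript
const sigmoid = x => 1 / (1 + Math.exp(-x));

let w = 0.5, b = 0.0;          // initial weight and bias
const x = 1.0, target = 1.0;   // one training sample
const lr = 0.1;                // learning rate

// Forward pass: z = w*x + b, then a = sigmoid(z)
const z = w * x + b;           // 0.5
const a = sigmoid(z);          // ~0.6225
const error = a - target;      // ~-0.3775 (output too low)

// Backward pass: dL/dw = error * sigmoid'(z) * x, where sigmoid'(z) = a*(1-a)
const gradW = error * a * (1 - a) * x;
const gradB = error * a * (1 - a);

// Gradient descent update: step opposite the gradient, so w and b both
// increase slightly, nudging the output toward the target of 1.0.
w -= lr * gradW;
b -= lr * gradB;
console.log({ a: a.toFixed(4), error: error.toFixed(4), w: w.toFixed(4), b: b.toFixed(4) });
```

Working through the same arithmetic on paper - forward value, error, gradient, update - is exactly the exercise this chapter walks through for larger configurations.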
 | Neural Network JavaScript Side (or Python?) |  |
One of the primary reasons to build a client-side implementation before jumping into writing a GPU one is to aid the development process. Crafting a JavaScript (CPU) model offers a tangible platform for you to explore and develop your ideas. By testing and manipulating the algorithms, you can experiment with the implementation's various aspects, such as its data organisation, tests, layout, and structure. This also provides a hands-on opportunity to try different designs, test cases and algorithms, which can significantly impact the final WebGPU compute version.
• fully connected neural networks
• hardcoded 2x3x1 (input-hidden-output) example for XOR
• flexible constants
• saving/loading data (weights/biases) - important for debugging or training over long periods
• training neural networks (backpropagation)
• hacks/tricks (engineering workarounds and numerical limits)
- No such thing as 'perfect' data - dirty data (a dash of added noise) can be good for the neural network
- Types of noise (fractal, pure, ...)
- Noise - oscillating/vibrations/fine tuning/...
• Dynamic learning rate
• Randomness on GPU (seeds)
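The bullet points above come together in a small, self-contained sketch of the hardcoded 2x3x1 XOR network. This is an illustrative version, not the book's actual code: it assumes sigmoid activations throughout and plain stochastic gradient descent, and the seed, learning rate and epoch count are arbitrary choices. It also uses a seeded pseudo-random generator so runs are repeatable, which becomes essential later when generating random numbers on the GPU.

```javascript
const sigmoid = x => 1 / (1 + Math.exp(-x));

// Seeded 32-bit LCG so every run produces identical results.
let seed = 42;
const rand = () => {
  seed = (Math.imul(seed, 1664525) + 1013904223) >>> 0;
  return seed / 4294967296 - 0.5; // roughly [-0.5, 0.5)
};

// Hardcoded shape: hidden layer (3 neurons x 2 inputs), output layer (1 x 3).
const w1 = [[rand(), rand()], [rand(), rand()], [rand(), rand()]];
const b1 = [rand(), rand(), rand()];
const w2 = [rand(), rand(), rand()];
let b2 = rand();
const lr = 0.5;

function forward(x) {
  const h = w1.map((w, i) => sigmoid(w[0] * x[0] + w[1] * x[1] + b1[i]));
  const o = sigmoid(h[0] * w2[0] + h[1] * w2[1] + h[2] * w2[2] + b2);
  return { h, o };
}

function trainStep(x, target) {
  const { h, o } = forward(x);
  const dOut = (o - target) * o * (1 - o);           // output-layer delta
  for (let i = 0; i < 3; i++) {
    const dHid = dOut * w2[i] * h[i] * (1 - h[i]);   // hidden delta (old w2)
    w2[i] -= lr * dOut * h[i];
    w1[i][0] -= lr * dHid * x[0];
    w1[i][1] -= lr * dHid * x[1];
    b1[i] -= lr * dHid;
  }
  b2 -= lr * dOut;
}

const data = [[[0, 0], 0], [[0, 1], 1], [[1, 0], 1], [[1, 1], 0]];
const loss = () => data.reduce((s, [x, t]) => s + (forward(x).o - t) ** 2, 0);

const lossBefore = loss();
for (let epoch = 0; epoch < 20000; epoch++) {
  for (const [x, t] of data) trainStep(x, t);
}
const lossAfter = loss();
console.log({ lossBefore, lossAfter });
for (const [x, t] of data) console.log(x, '->', forward(x).o.toFixed(3), 'target', t);
```

This CPU version is the reference implementation the later GPU chapters compare against: its saved weights and deterministic seed make "does the GPU match?" a simple diff.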
 | Visualizing Neural Networks |  |
Visualizing neural network data is vital due to the complexity of the networks and the vast amount of information they store. A picture is worth a thousand words - that's how the method works, using visuals to simplify complexity. Visualization puts information into a visual context and lets us see problems at a glance. People grasp trends and patterns far more readily when seeing them in visual form than when looking at raw numbers or tables.
• Why and how to visualize neural networks
• Topology view (connected neurons, weights, biases)
• Visualizing data path from input to output
• Input vs output plot
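Even without a canvas, the topology-view idea can be sketched in text: render each weight as a signed bar so the pattern of positive and negative connections stands out at a glance. The function name and the example weight values below are illustrative; a real page would draw this on a `<canvas>` instead.

```javascript
// Render a weight in [-1, 1] as a signed text bar, e.g. 0.8 -> "++++++++..".
function weightBar(w, width = 10) {
  const n = Math.min(width, Math.round(Math.abs(w) * width));
  return (w < 0 ? '-' : '+').repeat(n).padEnd(width, '.');
}

// Illustrative hidden-layer weights: 3 neurons x 2 inputs.
const hiddenWeights = [[0.8, -0.3], [-0.6, 0.9], [0.2, 0.1]];
hiddenWeights.forEach((neuron, i) => {
  console.log(`h${i}: ` + neuron.map(w => weightBar(w)).join(' '));
});
```

A glance at the bars shows which inputs each neuron amplifies or suppresses - the same information a graphical topology view conveys with line thickness and colour.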
 | WebGPU Intro (Compute Shaders/WGSL) |  |
While the web technologies and tools available to client-side browsers have historically been limited, the latest addition offers a new API aimed at solving old pains of working with the GPU. One aspect is unlocking the power of the GPU for parallel compute processing - creating new benefits and possibilities for web-based neural network development. Before delving into developing a web-based compute neural network using WebGPU, we'll look at the API and the compute shader.
• What is WebGPU and WGSL?
• Setting up WebGPU/WGSL Language/Compute Shader
• Workgroups/Threads
• Minimal working example (basic compute shader)
• Structure of Arrays vs Array of Structures (data layout)
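The workgroup/thread arithmetic is worth pinning down before touching the API. The sketch below mirrors, in plain JavaScript, how a 1D WGSL dispatch maps items to invocations, assuming `@workgroup_size(64)`: covering N items needs ceil(N / 64) workgroups, and each invocation's `global_invocation_id.x` is its workgroup index times 64 plus its local index. (The constants are illustrative; real shaders pick sizes to suit the hardware.)

```javascript
const WORKGROUP_SIZE = 64; // matches @workgroup_size(64) in the shader

// How many workgroups must be dispatched to cover n items.
function workgroupCount(n) {
  return Math.ceil(n / WORKGROUP_SIZE);
}

// global_invocation_id.x for a 1D dispatch.
function globalId(workgroupId, localId) {
  return workgroupId * WORKGROUP_SIZE + localId;
}

// 1000 items -> 16 workgroups -> 1024 invocations. The 24 spare invocations
// are why shaders guard with: if (gid.x >= n) { return; }
console.log(workgroupCount(1000)); // 16
console.log(globalId(15, 63));     // 1023
```

Getting this mapping (and the out-of-range guard) right is most of the battle in the minimal working compute shader example.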
 | Neural Network GPU (Dirty Slow - Single Thread) |  |
People want to jump into developing large parallel implementations as soon as possible. Start coding! Measurable progress. Push, push, push. Get it done fast! Unfortunately, this approach is as bad as it is common, because it means testing and planning will not be nearly as detailed and rigorous as they need to be. Problems aren't recognized. Solutions aren't found. And overlooked problems don't vanish - sooner or later they surface, and because the implementation is already coded, you risk the painful task of having to delete everything and start again, particularly when problems bump up against multi-threaded issues like non-deterministic results or poor performance. This is how coding something that should be simple and fun turns into an agonizing crawl that finishes desperately late and leaves horrible mental scars. At this stage, it's just about getting the neural network on the GPU - having it run and match the results from the client-side implementation.
• setting up buffers/shaders
- blocks of data
• hard coded constants (proof of concept)
• shifting from JavaScript to compute shaders
• dumping and comparing results (data to/from GPU)
• performance (why so slow?) - understanding slowdowns/bottlenecks
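"Dumping and comparing results" deserves a concrete shape. The helper below is an illustrative sketch (name and tolerance are assumptions): after reading a result buffer back from the GPU, compare it element-by-element against the JavaScript reference with a small epsilon, since f32 arithmetic on the GPU will never exactly match f64 arithmetic in JavaScript.

```javascript
// Compare a CPU reference array against a GPU readback, within a tolerance.
// Returns the first mismatching index so the bug can be located quickly.
function compareResults(cpu, gpu, epsilon = 1e-4) {
  if (cpu.length !== gpu.length) return { ok: false, index: -1 };
  for (let i = 0; i < cpu.length; i++) {
    if (Math.abs(cpu[i] - gpu[i]) > epsilon) {
      return { ok: false, index: i, cpu: cpu[i], gpu: gpu[i] };
    }
  }
  return { ok: true };
}

// Simulated readback: a Float32Array stands in for the mapped GPU buffer.
const reference = [0.1, 0.2, 0.30000001];
const readback = new Float32Array([0.1, 0.2, 0.3]);
const result = compareResults(reference, readback);
console.log(result);
```

The tolerance is the judgment call: too tight and f32 rounding produces false alarms; too loose and real indexing bugs slip through. Tuning it per layer is a reasonable compromise.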
 | Sharing the Workload (More threads without 'blocking') |  |
Shifting the algorithm to multiple threads is of utmost importance in creating a successful compute neural network. Through an understanding of the algorithm, the data and the implementation, we can explore and develop solutions so that parts of the calculation can operate independently - avoiding delays or stalls, distributing the workload across many threads, and bringing the compute algorithm to life - demonstrating the power of compute algorithms for neural network problems.
• Add more threads (that get used)
• Faster as you add more nodes (with only a few nodes it may be slower)
• Split into three parts - activate, errors, weight updates
• Split work over 'layers' (forward/backward)
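The key insight behind the split is that within one layer, each neuron's output depends only on shared read-only data (inputs, weights) and is written to its own slot - so no two threads ever write the same memory. The JavaScript sketch below stands in for the GPU dispatch (the loop body is what each invocation would run); the layer sizes are illustrative.

```javascript
const sigmoid = x => 1 / (1 + Math.exp(-x));

// Forward pass for one layer. On the GPU, each loop iteration would be one
// invocation: iteration n reads the shared inputs/weights but writes ONLY
// out[n] - no data race, so no locking or 'blocking' is needed.
function forwardLayer(inputs, weights, biases) {
  const out = new Float32Array(biases.length);
  for (let n = 0; n < biases.length; n++) {
    let sum = biases[n];
    for (let i = 0; i < inputs.length; i++) {
      sum += weights[n * inputs.length + i] * inputs[i]; // row-major weights
    }
    out[n] = sigmoid(sum);
  }
  return out;
}

// Three neurons, two inputs: each output is computed independently.
const out = forwardLayer([1, 0], new Float32Array([0.5, -0.5, 1, 1, 0, 0]), [0, 0, 0]);
console.log(out);
```

Layers, however, cannot run in parallel with each other - layer k needs layer k-1's outputs - which is why the forward/backward passes are split into one dispatch per layer.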
 | Generic Config (Multiple Thread) |  |
Develop a set of modular functions to contain the implementation, so you can switch between different versions of the algorithm (e.g., cpu, gpu, cpu1, gpu1, ...). Each build could include additional features or aspects of the algorithm that are limited to specific hardware or are in early experimental/testing phases. For instance, if you're going to test your neural network on different computers, you might want to be able to drop down to the cpu version if the WebGPU API isn't supported or available in the browser on that machine. Shifting to a more customizable and modular design removes some of the hardcoded defines and allows the prototype/test code to be taken out for a test drive on some interesting projects (e.g., include the header and initialise/use the neural network).
• modular design (activate/propagate)
• loading/saving weights/biases (checking)
• initialization and configuration (layers, weights, learning rate, dimensions)
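The backend switch can be sketched as a small factory. Everything here is illustrative (the `createNetwork` name, the backend objects, the stub `activate` implementations); the point is the shape: every backend exposes the same interface, and the factory falls back to the CPU build when `navigator.gpu` is missing.

```javascript
// Pick a backend by name, falling back to 'cpu' when WebGPU is unavailable.
// Both backends expose the same interface, so calling code never changes.
function createNetwork(preferred = 'gpu') {
  const backends = {
    cpu: { name: 'cpu', activate: xs => xs.map(x => 1 / (1 + Math.exp(-x))) },
    gpu: { name: 'gpu', activate: xs => xs /* would dispatch a compute shader */ },
  };
  const hasWebGPU = typeof navigator !== 'undefined' && 'gpu' in navigator;
  if (preferred === 'gpu' && !hasWebGPU) {
    console.warn('WebGPU not available - falling back to the cpu backend');
    return backends.cpu;
  }
  return backends[preferred];
}

const net = createNetwork('gpu');
console.log(net.name); // 'gpu' in a WebGPU browser, 'cpu' elsewhere
```

Because both backends load and save the same weight/bias format, the cpu build doubles as the checking tool: run both, compare outputs, trust neither until they agree.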
 | Test Cases/Projects Using Neural Networks |  |
Take the neural network out for a test drive - a range of projects shows the flexibility and power of neural networks. The projects have been chosen so that they are small enough to be trained and tested on a general computer with minimal processing power: they don't require days or months to train, just a few minutes. They are building blocks for larger, more elaborate concepts - testing the water and giving you a taste. Examples include text classification, image data extraction, image filtering, sound generation, drawing programs and so on.
 | Sine Wave (Signal) |  |
A simple oscillating signal made up of a sinusoid (or sine wave) provides an excellent test case for your neural network. A sine wave is a continuous signal whose mathematical curve describes a smooth periodic oscillation. It occurs often in both pure and applied mathematics, as well as physics, engineering, signal processing and many other fields. We give the neural network a set of samples from an oscillating signal, from which it is able to learn the signal's characteristics and reproduce the signal (given any input value). It is actually quite amazing that it can learn the mathematics of a signal from just a set of input/output points.
• Generate a training dataset
• Train network to emulate trigonometric signals
• Sine wave function to generate test data
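Generating the training set is a one-off sampling step. The sketch below assumes a sigmoid output (which cannot produce negative values), so both the input and the sine target are mapped into [0, 1]; the sample count and normalisation choices are illustrative.

```javascript
// Build (input, target) pairs sampling sin(x) over one full period [0, 2π],
// with both values normalised into [0, 1] for a sigmoid-output network.
function makeSineDataset(samples = 100) {
  const data = [];
  for (let i = 0; i < samples; i++) {
    const x = (i / (samples - 1)) * 2 * Math.PI;
    data.push({
      input: x / (2 * Math.PI),      // x in [0, 2π] -> [0, 1]
      target: (Math.sin(x) + 1) / 2, // sin in [-1, 1] -> [0, 1]
    });
  }
  return data;
}

const data = makeSineDataset(100);
console.log(data[0], data[data.length - 1]);
```

After training, un-normalising the output (`2 * o - 1`) recovers the predicted sine value for any input, including inputs that were never in the training set.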
 | Name to Gender |  |
Take a person's name, such as John or Susan, and try to associate it with a gender. For example, what is the probability of the name Susan belonging to a woman or a man? Give the neural network a database of names and their associated gender. Keep it simple for this example (either M or F).
• Dataset of names and their associated gender
• Train network to associate name with gender
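A network cannot eat strings directly, so each name must become a fixed-length numeric vector. The encoding below is one illustrative scheme (the 8-character length and the a-z mapping are arbitrary assumptions): letters map to values in (0, 1], shorter names are zero-padded, and the gender target is simply 0 for M and 1 for F.

```javascript
// Encode a name as a fixed-length Float32Array: 'a' -> 1/26 ... 'z' -> 26/26,
// non-letters and padding -> 0.
function encodeName(name, length = 8) {
  const vec = new Float32Array(length);
  const lower = name.toLowerCase();
  for (let i = 0; i < Math.min(lower.length, length); i++) {
    const code = lower.charCodeAt(i) - 96; // 'a' is char code 97
    vec[i] = code >= 1 && code <= 26 ? code / 26 : 0;
  }
  return vec;
}

console.log(encodeName('John'));  // 4 letter values, then zero padding
console.log(encodeName('Susan'));
```

More elaborate encodings (one-hot per letter, letter bigrams, name endings) usually perform better; this flat scheme is just enough to get the pipeline working end to end.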
 | Image to Grayscale |  |
Image filters, such as a grayscale filter, take a color image and convert it to grayscale by looking at the RGB color values and calculating the result. The grayscale filter can take into account neighbouring pixel values - it can also be used to provide context-aware filtering (e.g., stylistic image filters). The concept discussed here can be taken further to include filters like blurring, sharpening, and removing pixels.
• Providing two images to the neural network (color and grayscale)
• Learn to convert a color image to a grayscale one
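The second 'answer' image the network trains against has to come from somewhere. A common way to generate it (an assumption here, not the book's prescribed method) is the standard luminance weighting 0.299 R + 0.587 G + 0.114 B, applied per pixel over RGBA data such as that returned by a canvas `getImageData` call.

```javascript
// Luminance of one pixel using the common Rec. 601 weights.
function toGrayscale(r, g, b) {
  return 0.299 * r + 0.587 * g + 0.114 * b;
}

// Convert an RGBA pixel buffer (4 bytes per pixel) to grayscale,
// keeping the alpha channel untouched.
function grayscaleImage(pixels) {
  const out = new Uint8ClampedArray(pixels.length);
  for (let i = 0; i < pixels.length; i += 4) {
    const y = toGrayscale(pixels[i], pixels[i + 1], pixels[i + 2]);
    out[i] = out[i + 1] = out[i + 2] = y; // R = G = B = luminance
    out[i + 3] = pixels[i + 3];           // preserve alpha
  }
  return out;
}

// Pure red (255, 0, 0) -> luminance ~76.
console.log(grayscaleImage(new Uint8ClampedArray([255, 0, 0, 255])));
```

Feeding the network the colour pixels as input and these luminance values as targets turns filter design into a learning problem - and swapping the target image swaps the filter.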
 | Image to Text (Numbers) |  |
Given lots of images of single-digit numbers, train the neural network to look at the pixels and extract the number (e.g., a picture of the number 2) - even if the image is blurred or the number is written in a stylistic font.
• Training a neural network to convert an image of a number to an actual number
• Dataset images of numbers
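The usual way to frame the digit labels (an assumption here, though it is the standard approach for classification) is one output per digit: the target is a one-hot vector of length 10, and the predicted digit is simply the index of the largest output.

```javascript
// Target for digit d: a length-10 vector with a 1 in position d.
function oneHot(digit) {
  const v = new Float32Array(10);
  v[digit] = 1;
  return v;
}

// Prediction: index of the largest of the 10 network outputs (argmax).
function predictedDigit(outputs) {
  let best = 0;
  for (let i = 1; i < outputs.length; i++) {
    if (outputs[i] > outputs[best]) best = i;
  }
  return best;
}

console.log(oneHot(2));
console.log(predictedDigit([0.01, 0.05, 0.9, 0.02, 0, 0, 0, 0, 0, 0])); // 2
```

The pixel inputs are normalised to [0, 1] before training; the argmax decoding is what lets a blurry or stylised '2' still win, as long as its output edges out the other nine.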
 | Inverse Kinematics |  |
Your fingers are connected to your hand, your hand is connected to your arm, your arm is connected to your body, your body is connected to your hip, your hip is connected to your thigh. Moving your hand over to pick up a glass of water or push a door open requires the coordinated interaction of all these different body parts. If you had to do this in a computer simulation (emulating human motion), this would be the process of inverse kinematics: you want the hand to be at a certain location - what do all the joint angles of the different body parts need to be? The process can be very complex, even for very simple articulated systems, due to the search space and ambiguity.
• Mapping inverse kinematic - position to angles
• Interactive demo (2 links)
• Ambiguities, singularities, end-effectors and joint angles
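Training data for the two-link demo comes from the easy direction: forward kinematics. Pick joint angles, compute where the end-effector lands, then train the network on the reverse mapping (position in, angles out). The link lengths below are illustrative.

```javascript
// End-effector position of a planar two-link arm with joint angles a1, a2
// (radians) and link lengths l1, l2.
function forwardKinematics(a1, a2, l1 = 1, l2 = 1) {
  const x = l1 * Math.cos(a1) + l2 * Math.cos(a1 + a2);
  const y = l1 * Math.sin(a1) + l2 * Math.sin(a1 + a2);
  return { x, y };
}

// One training sample: input = end-effector position, target = the angles.
const a1 = Math.PI / 4, a2 = Math.PI / 4;
const pos = forwardKinematics(a1, a2);
console.log(pos);
```

The ambiguity mentioned above shows up immediately in this data: elbow-up and elbow-down configurations reach the same (x, y), so identical inputs can carry different angle targets - one reason IK is a genuinely hard function to learn.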
 | Collision Detection |  |
Two bodies cannot occupy the same space at the same time. Detect whether objects are touching (or overlapping), taking into account the position and geometric information of the shapes.
• Neural network to determine if two objects are intersecting
• 2d objects (circles)
• Interactive demo (move objects around in real-time)
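For circles, the ground truth the network has to learn fits in one line: two circles intersect exactly when the distance between their centres is at most the sum of their radii. Comparing squared distances avoids the square root, as sketched below (this analytic test is what generates the labelled training data).

```javascript
// True when two circles (centre x, y and radius r each) touch or overlap.
function circlesIntersect(x1, y1, r1, x2, y2, r2) {
  const dx = x2 - x1, dy = y2 - y1;
  const rSum = r1 + r2;
  return dx * dx + dy * dy <= rSum * rSum; // squared distances: no sqrt needed
}

console.log(circlesIntersect(0, 0, 1, 1.5, 0, 1)); // true  (overlapping)
console.log(circlesIntersect(0, 0, 1, 3, 0, 1));   // false (apart)
```

The interactive demo then becomes a visual comparison: as you drag the circles, the exact test and the network's prediction are shown side by side, making the network's decision boundary visible.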
 | Learning to Paint |  |
Take some simple pictures (e.g., a tree or a colorful toy) and have the neural network learn to replicate or paint an image of that object. The more the neural network practices, the closer it comes to reproducing the original image.
• Teach Neural Network to Paint an Image
• Watch network evolve as it extracts information from an image and draws its own version.
 | Happy or Sad Smiley Face |  |
Smiley images are a common visual on the web - however, they also come in a vast assortment of styles and emotions. You'll explore learning whether an emoji is positive or negative (happy or sad).
• Smiley image dataset (happy and sad images)
• Identifying if the smiley image is smiling or sad
 | Learning Sounds |  |
Learning to play music is just as complex as learning to generate images - if not more so. Sounds can encapsulate a vast amount of hidden information (which we as humans can detect, but which is difficult for a computer to detect and replicate), such as melody, harmony, and rhythm. Take a simple audio sample (e.g., a human saying the word `hello`) and have a neural network try to replicate the sound.
• Sound file and having the neural network learn and reproduce new sounds
• Modify and adjust the audio properties of the sound using the trained neural network