Neural Networks and WebGPU Compute..

Learning from Data.....

 


Setting up WebGPU/WGSL Language/Compute Shader






Structure of Arrays or Array of Structures


When developing algorithms that operate on data, it's important to understand how the data will be organized - and which storage layout will best maximize the algorithm's performance. The two main data layouts are Structure of Arrays (SoA) and Array of Structures (AoS), each with its own benefits and disadvantages.

For compute shaders on the GPU, SoA is typically preferred due to its alignment with GPU architecture and memory access patterns, leading to better performance for attribute-specific operations across large datasets. However, the choice between SoA and AoS may still depend on the specific use case and access patterns of the application.

Structure of Arrays (SoA):
In a Structure of Arrays (SoA) configuration, data is organized such that each attribute of a structure is stored in a separate array. For instance, if you have a structure with three attributes (e.g., position, velocity, and mass), you will have three separate arrays: one for positions, one for velocities, and one for masses. This approach enables highly efficient memory access patterns for operations that need to process a single attribute across many elements, as it allows for contiguous memory access and better cache utilization. Vectorized operations and SIMD (Single Instruction, Multiple Data) instructions benefit significantly from this arrangement, leading to improved performance in scenarios where the same attribute needs to be processed simultaneously across multiple elements.
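As a minimal WGSL sketch of this idea (position, velocity and mass are purely illustrative attribute names, not part of the working example further down), an SoA layout declares one storage array per attribute:

// SoA: one storage buffer/array per attribute (illustrative sketch)
@group(0) @binding(0) var<storage, read_write> positions  : array<vec3<f32>>;  // note: vec3 elements are padded to a 16-byte stride
@group(0) @binding(1) var<storage, read_write> velocities : array<vec3<f32>>;
@group(0) @binding(2) var<storage, read_write> masses     : array<f32>;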

Array of Structures (AoS):
In an Array of Structures (AoS) setup, data is organized such that each array element is a structure containing all attributes. For example, an array of structures would have each element storing position, velocity, and mass together. This organization is straightforward and intuitive, making it easier to understand and manipulate at the code level. It is especially useful when operations require access to multiple attributes of a single element since all the relevant data is located close together in memory, reducing the need for multiple memory accesses. However, this can lead to inefficient memory access patterns and cache utilization when dealing with large datasets and operations that focus on individual attributes across multiple elements.
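The equivalent AoS sketch keeps every attribute of an element together in a struct and stores a single array of those structs (again using the illustrative particle attributes):

// AoS: a single storage array whose elements are whole structs (illustrative sketch)
struct Particle {
    position : vec3<f32>,   // 12 bytes of data, padded to 16 for alignment
    velocity : vec3<f32>,
    mass     : f32,
}

@group(0) @binding(0) var<storage, read_write> particles : array<Particle>;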


For compute shaders on GPUs, Structure of Arrays (SoA) is generally better than Array of Structures (AoS) for several reasons:

Pros of SoA for GPUs:
1. Memory Coalescing: GPUs perform better when memory accesses are coalesced, meaning threads access contiguous memory locations. SoA ensures that each attribute is stored in a contiguous block of memory, leading to efficient memory access patterns (a kernel sketch contrasting the two layouts follows these lists).
2. Vectorization: Many GPU operations are vectorized, and SoA aligns well with this by enabling parallel processing of a single attribute across multiple data elements, improving throughput and performance.
3. Cache Efficiency: SoA can improve cache efficiency as accessing the same attribute for multiple elements benefits from spatial locality, reducing cache misses and latency.

Cons of SoA for GPUs:
1. Complexity: Managing multiple arrays for different attributes can add complexity to the code, making it harder to maintain and debug.
2. Data Interleaving: When operations require simultaneous access to multiple attributes, SoA might lead to scattered memory access, slightly offsetting the benefits of cache efficiency in such scenarios.

Pros of AoS for GPUs:
1. Simplicity: Easier to understand and manage, especially when dealing with structures that have many attributes. This reduces coding overhead and potential for errors.
2. Grouped Access: When operations require access to multiple attributes of a single structure, AoS ensures that all the required data is close together in memory, reducing the overhead of multiple memory fetches.

Cons of AoS for GPUs:
1. Non-coalesced Access: Accessing individual attributes can lead to non-coalesced memory access patterns, resulting in inefficient use of memory bandwidth and reduced performance.
2. Cache Inefficiency: With AoS, accessing a single attribute across multiple elements can lead to poor cache utilization as attributes of interest are interleaved with other data, leading to increased cache misses.
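To make the access-pattern difference concrete, here is a hedged kernel sketch written against each of the illustrative layouts declared earlier (each version would live in its own shader module; they are shown together only for comparison). The kernel scales just the mass attribute of every element: in the SoA version consecutive threads read consecutive f32 values from the masses array, while in the AoS version the masses sit one whole Particle apart, so every fetch drags in position and velocity data the kernel never uses.

// SoA kernel: neighbouring threads touch neighbouring elements of masses,
// so the loads and stores coalesce into a small number of wide memory transactions.
@compute @workgroup_size(64)
fn update_soa( @builtin(global_invocation_id) id : vec3<u32> ) {
    let i = id.x;
    if (i >= arrayLength(&masses)) { return; }
    masses[i] = masses[i] * 2.0;
}

// AoS kernel: the same update, but the mass values are strided by the size of
// Particle (32 bytes here), so each thread's accesses are scattered rather than contiguous.
@compute @workgroup_size(64)
fn update_aos( @builtin(global_invocation_id) id : vec3<u32> ) {
    let i = id.x;
    if (i >= arrayLength(&particles)) { return; }
    particles[i].mass = particles[i].mass * 2.0;
}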






Simple Compute Example (Minimal Working Code)


A compute shader example in JavaScript can be implemented in less than 100 lines; those 100 lines include all of the essential components - setting up buffers, pipelines and the WGSL shader code, and performing a computation that can be distributed across multiple GPU threads.

const adapter = await navigator.gpu.requestAdapter();
const device  = await adapter.requestDevice();

async function runComputePipeline( shaderCode, bindings, workGroupCount ) {
    const shaderModule = device.createShaderModule({ code: shaderCode });

    let entries = [];
    for (let n=0; n<bindings.length; n++) {
        entries.push( { binding: bindings[n].binding, visibility: GPUShaderStage.COMPUTE, buffer: { type: "storage" } } );
    }
    const bindGroupLayout = device.createBindGroupLayout( { entries: entries } );
    /*
    const bindGroupLayout = device.createBindGroupLayout({
        entries: [ {binding: 0, visibility: GPUShaderStage.COMPUTE, buffer: {type: "storage"}  },
                   {binding: 1, visibility: GPUShaderStage.COMPUTE, buffer: {type: "storage"}  },
                   {binding: 2, visibility: GPUShaderStage.COMPUTE, buffer: {type: "storage"}  },
                   {binding: 3, visibility: GPUShaderStage.COMPUTE, buffer: {type: "storage"}  }
                 ]
      });
    */

    const pipeline = device.createComputePipeline({
        layout:  device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] }),
        compute: { module: shaderModule, entryPoint: 'main' },
    });

    const bindGroup = device.createBindGroup({ layout: bindGroupLayout, entries: bindings });

    const commandEncoder = device.createCommandEncoder();
    const passEncoder    = commandEncoder.beginComputePass();
    passEncoder.setPipeline(pipeline);
    passEncoder.setBindGroup(0, bindGroup);
    passEncoder.dispatchWorkgroups(workGroupCount);
    passEncoder.end();

    const commandBuffer = commandEncoder.finish();
    device.queue.submit([commandBuffer]);
    await device.queue.onSubmittedWorkDone();
}

function createBuffer( data, usage ) {
    const buffer = device.createBuffer({
        size:  data.byteLength,
        usage: usage | GPUBufferUsage.COPY_DST,
        mappedAtCreation: true,
    });
    new data.constructor(buffer.getMappedRange()).set(data);
    buffer.unmap();
    return buffer;
}

const inputsBuffer  = createBuffer(new Float32Array( [1, 2, 3] ), GPUBufferUsage.STORAGE );
// The output buffer also needs COPY_SRC so its contents can be copied into a read-back buffer later
const outputBuffer  = createBuffer(new Float32Array( [3, 4, 5] ), GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC );
const weightsBuffer = createBuffer(new Float32Array( [9, 9, 9] ), GPUBufferUsage.STORAGE );
const biasesBuffer  = createBuffer(new Float32Array( [8, 7, 6] ), GPUBufferUsage.STORAGE );

const shaderCode = `
@group(0) @binding(0) var<storage, read_write> weights  : array<f32>;
@group(0) @binding(1) var<storage, read_write> biases   : array<f32>;
@group(0) @binding(2) var<storage, read_write> inputs   : array<f32>;
@group(0) @binding(3) var<storage, read_write> outputs  : array<f32>;


@compute @workgroup_size(1)
fn main( @builtin(global_invocation_id) global_id: vec3<u32>) {
   outputs[0] = 1;
}
`

await runComputePipeline( shaderCode, [
      { binding: 0, resource: { buffer: weightsBuffer } },
      { binding: 1, resource: { buffer: biasesBuffer  } },
      { binding: 2, resource: { buffer: inputsBuffer  } },
      { binding: 3, resource: { buffer: outputBuffer  } },
], 1 );   // dispatch a single workgroup for this minimal example

// Retrieve results - copy the output into a mappable buffer, then map and read it back
const readBuffer  = device.createBuffer({ size: outputBuffer.size, usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ });
const copyEncoder = device.createCommandEncoder();
copyEncoder.copyBufferToBuffer( outputBuffer, 0, readBuffer, 0, outputBuffer.size );
device.queue.submit( [copyEncoder.finish()] );

await readBuffer.mapAsync(GPUMapMode.READ);
const result = Array.from( new Float32Array(readBuffer.getMappedRange()) );
readBuffer.unmap();
console.log( result );




