Neural Networks and WebGPU Compute..

Learning from Data.....

 


Setting up WebGPU/WGSL Language/Compute Shader






Structure of Arrays or Array of Structures


When developing algorithms that operate on data, it's important to understand how the data will be organized - and which storage layout will best maximize the algorithm's performance. The two main data layouts are Structure of Arrays (SoA) and Array of Structures (AoS), each with its own benefits and disadvantages.

For compute shaders on the GPU, SoA is typically preferred due to its alignment with GPU architecture and memory access patterns, leading to better performance for attribute-specific operations across large datasets. However, the choice between SoA and AoS may still depend on the specific use case and access patterns of the application.

Structure of Arrays (SoA):
In a Structure of Arrays (SoA) configuration, data is organized such that each attribute of a structure is stored in a separate array. For instance, if you have a structure with three attributes (e.g., position, velocity, and mass), you will have three separate arrays: one for positions, one for velocities, and one for masses. This approach enables highly efficient memory access patterns for operations that need to process a single attribute across many elements, as it allows for contiguous memory access and better cache utilization. Vectorized operations and SIMD (Single Instruction, Multiple Data) instructions benefit significantly from this arrangement, leading to improved performance in scenarios where the same attribute needs to be processed simultaneously across multiple elements.
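As a minimal WGSL sketch of this idea (position, velocity and mass are purely illustrative attribute names, not part of the working example further down), an SoA layout declares one storage array per attribute:

// SoA: one storage buffer/array per attribute (illustrative sketch)
@group(0) @binding(0) var<storage, read_write> positions  : array<vec3<f32>>;  // note: vec3 elements are padded to a 16-byte stride
@group(0) @binding(1) var<storage, read_write> velocities : array<vec3<f32>>;
@group(0) @binding(2) var<storage, read_write> masses     : array<f32>;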

Array of Structures (AoS):
In an Array of Structures (AoS) setup, data is organized such that each array element is a structure containing all attributes. For example, an array of structures would have each element storing position, velocity, and mass together. This organization is straightforward and intuitive, making it easier to understand and manipulate at the code level. It is especially useful when operations require access to multiple attributes of a single element since all the relevant data is located close together in memory, reducing the need for multiple memory accesses. However, this can lead to inefficient memory access patterns and cache utilization when dealing with large datasets and operations that focus on individual attributes across multiple elements.
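The equivalent AoS sketch keeps every attribute of an element together in a struct and stores a single array of those structs (again using the illustrative particle attributes):

// AoS: a single storage array whose elements are whole structs (illustrative sketch)
struct Particle {
    position : vec3<f32>,   // 12 bytes of data, padded to 16 for alignment
    velocity : vec3<f32>,
    mass     : f32,
}

@group(0) @binding(0) var<storage, read_write> particles : array<Particle>;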


For compute shaders on GPUs, Structure of Arrays (SoA) is generally better than Array of Structures (AoS) for several reasons:

Pros of SoA for GPUs:
1. Memory Coalescing: GPUs perform better when memory accesses are coalesced, meaning threads access contiguous memory locations. SoA ensures that each attribute is stored in a contiguous block of memory, leading to efficient memory access patterns (a kernel sketch contrasting the two layouts follows these lists).
2. Vectorization: Many GPU operations are vectorized, and SoA aligns well with this by enabling parallel processing of a single attribute across multiple data elements, improving throughput and performance.
3. Cache Efficiency: SoA can improve cache efficiency as accessing the same attribute for multiple elements benefits from spatial locality, reducing cache misses and latency.

Cons of SoA for GPUs:
1. Complexity: Managing multiple arrays for different attributes can add complexity to the code, making it harder to maintain and debug.
2. Data Interleaving: When operations require simultaneous access to multiple attributes, SoA might lead to scattered memory access, slightly offsetting the benefits of cache efficiency in such scenarios.

Pros of AoS for GPUs:
1. Simplicity: Easier to understand and manage, especially when dealing with structures that have many attributes. This reduces coding overhead and potential for errors.
2. Grouped Access: When operations require access to multiple attributes of a single structure, AoS ensures that all the required data is close together in memory, reducing the overhead of multiple memory fetches.

Cons of AoS for GPUs:
1. Non-coalesced Access: Accessing individual attributes can lead to non-coalesced memory access patterns, resulting in inefficient use of memory bandwidth and reduced performance.
2. Cache Inefficiency: With AoS, accessing a single attribute across multiple elements can lead to poor cache utilization as attributes of interest are interleaved with other data, leading to increased cache misses.
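To make the access-pattern difference concrete, here is a hedged kernel sketch written against each of the illustrative layouts declared earlier (each version would live in its own shader module; they are shown together only for comparison). The kernel scales just the mass attribute of every element: in the SoA version consecutive threads read consecutive f32 values from the masses array, while in the AoS version the masses sit one whole Particle apart, so every fetch drags in position and velocity data the kernel never uses.

// SoA kernel: neighbouring threads touch neighbouring elements of masses,
// so the loads and stores coalesce into a small number of wide memory transactions.
@compute @workgroup_size(64)
fn update_soa( @builtin(global_invocation_id) id : vec3<u32> ) {
    let i = id.x;
    if (i >= arrayLength(&masses)) { return; }
    masses[i] = masses[i] * 2.0;
}

// AoS kernel: the same update, but the mass values are strided by the size of
// Particle (32 bytes here), so each thread's accesses are scattered rather than contiguous.
@compute @workgroup_size(64)
fn update_aos( @builtin(global_invocation_id) id : vec3<u32> ) {
    let i = id.x;
    if (i >= arrayLength(&particles)) { return; }
    particles[i].mass = particles[i].mass * 2.0;
}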






Simple Compute Example (Minimal Working Code)


A compute shader example in JavaScript can be implemented in less than 100 lines; those 100 lines include all of the essential components - setting up buffers, pipelines and the WGSL shader code, and performing a computation that can be distributed across multiple GPU threads.

const adapter = await navigator.gpu.requestAdapter();
const device  = await adapter.requestDevice();

async function runComputePipeline( shaderCode, bindings, workGroupCount ) {
    const shaderModule = device.createShaderModule({ code: shaderCode });

    let entries = [];
    for (let n=0; n<bindings.length; n++) {
        entries.push( { binding: bindings[n].binding, visibility: GPUShaderStage.COMPUTE, buffer: { type: "storage" } } );
    }
    const bindGroupLayout = device.createBindGroupLayout( { entries: entries } );
    /*
    const bindGroupLayout = device.createBindGroupLayout({
        entries: [ {binding: 0, visibility: GPUShaderStage.COMPUTE, buffer: {type: "storage"}  },
                   {binding: 1, visibility: GPUShaderStage.COMPUTE, buffer: {type: "storage"}  },
                   {binding: 2, visibility: GPUShaderStage.COMPUTE, buffer: {type: "storage"}  },
                   {binding: 3, visibility: GPUShaderStage.COMPUTE, buffer: {type: "storage"}  }
                 ]
      });
    */

    const pipeline = device.createComputePipeline({
        layout:  device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] }),
        compute: { module: shaderModule, entryPoint: 'main' },
    });

    const bindGroup = device.createBindGroup({ layout: bindGroupLayout, entries: bindings });

    const commandEncoder = device.createCommandEncoder();
    const passEncoder    = commandEncoder.beginComputePass();
    passEncoder.setPipeline(pipeline);
    passEncoder.setBindGroup(0, bindGroup);
    passEncoder.dispatchWorkgroups(workGroupCount);
    passEncoder.end();

    const commandBuffer = commandEncoder.finish();
    device.queue.submit([commandBuffer]);
    await device.queue.onSubmittedWorkDone();
}

function createBuffer( data, usage ) {
    const buffer = device.createBuffer({
        size:  data.byteLength,
        usage: usage | GPUBufferUsage.COPY_DST,
        mappedAtCreation: true,
    });
    new data.constructor(buffer.getMappedRange()).set(data);
    buffer.unmap();
    return buffer;
}

const inputsBuffer  = createBuffer(new Float32Array( [1, 2, 3] ), GPUBufferUsage.STORAGE );
// The output buffer also needs COPY_SRC so its contents can be copied into a read-back buffer later
const outputBuffer  = createBuffer(new Float32Array( [3, 4, 5] ), GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC );
const weightsBuffer = createBuffer(new Float32Array( [9, 9, 9] ), GPUBufferUsage.STORAGE );
const biasesBuffer  = createBuffer(new Float32Array( [8, 7, 6] ), GPUBufferUsage.STORAGE );

const shaderCode = `
@group(0) @binding(0) var<storage, read_write> weights  : array<f32>;
@group(0) @binding(1) var<storage, read_write> biases   : array<f32>;
@group(0) @binding(2) var<storage, read_write> inputs   : array<f32>;
@group(0) @binding(3) var<storage, read_write> outputs  : array<f32>;


@compute @workgroup_size(1)
fn main( @builtin(global_invocation_id) global_id: vec3<u32>) {
   outputs[0] = 1;
}
`

await runComputePipeline( shaderCode, [
      { binding: 0, resource: { buffer: weightsBuffer } },
      { binding: 1, resource: { buffer: biasesBuffer  } },
      { binding: 2, resource: { buffer: inputsBuffer  } },
      { binding: 3, resource: { buffer: outputBuffer  } },
], 1 );   // dispatch a single workgroup for this minimal example

// Retrieve results - copy the output into a mappable buffer, then map and read it back
const readBuffer  = device.createBuffer({ size: outputBuffer.size, usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ });
const copyEncoder = device.createCommandEncoder();
copyEncoder.copyBufferToBuffer( outputBuffer, 0, readBuffer, 0, outputBuffer.size );
device.queue.submit( [copyEncoder.finish()] );

await readBuffer.mapAsync(GPUMapMode.READ);
const result = Array.from( new Float32Array(readBuffer.getMappedRange()) );
readBuffer.unmap();
console.log( result );




