www.xbdev.net
xbdev - software development
Friday May 8, 2026
IT Crowd TV series: the famous and funny scene where Jen says not to talk about memory or RAM, and Moss laughs and jokes that RAM is memory. Jen is head of IT but has no idea how computers work.


Memory Array - Alignment Buffers and Pains


This article talks about memory and how it isn't always laid out the way you think it is, especially for long sequential blocks of data!

For example, the implementation below copies from array0 to array1. What could be simpler? Both are just arrays of floats. However, in WGSL array0 is declared as array< vec3<f32> >, which is common when you want to group every 3 values as positions or vectors (x,y,z). What you might not be aware of is how that type is stored in memory on the GPU!

A vec3<f32> holds 12 bytes of data but is aligned to a 16 byte boundary, so each element of an array< vec3<f32> > occupies 16 bytes: three floats followed by 4 bytes of padding.

If the memory definitions are like this:

@group(0) @binding(0) var<storage, read>       array0 : array< vec3<f32> >; 
@group(0) @binding(1) var<storage, read_write> array1 : array< f32 >; 


And the WGSL code to copy array0 to array1 is:

    // vec1
    array1[ global_id.x+0 ] = array0[ global_id.x ].x;
    array1[ global_id.x+1 ] = array0[ global_id.x ].y;
    array1[ global_id.x+2 ] = array0[ global_id.x ].z;
    
    // vec2
    array1[ global_id.x+3 ] = array0[ global_id.x+1 ].x;
    array1[ global_id.x+4 ] = array0[ global_id.x+1 ].y;
    array1[ global_id.x+5 ] = array0[ global_id.x+1 ].z;


We get this:

log:["res:",[1,2,3,5,6,0]]


Instead of this:

log:["res:",[1,2,3,4,5,6]]


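All because of the internal alignment/padding, which can be a real pain if you're not careful: array0[1] starts at the fifth float rather than the fourth, so the 4 is skipped, and its .z component lies past the end of the 6-float buffer, which here reads back as 0.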
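
The skipped value can be reproduced on the CPU. The following is only a sketch of the indexing rule, not the GPU's actual implementation: `readVec3` and `simulateCopy` are hypothetical helper names, and treating out-of-range reads as 0 is an assumption chosen to match the logged output above.

```javascript
// CPU-side model of how WGSL indexes array< vec3<f32> >: each element
// starts on a 16-byte boundary, i.e. every 4 floats, not every 3.
const FLOATS_PER_VEC3_SLOT = 4; // 16-byte stride / 4 bytes per f32

// Assumption: reads past the end of the data yield 0 (as seen in the log).
function readVec3(flat, index) {
  const base = index * FLOATS_PER_VEC3_SLOT;
  const at = (i) => (i < flat.length ? flat[i] : 0);
  return [at(base), at(base + 1), at(base + 2)];
}

// Copy vecCount vec3 elements into a flat output, like the shader does.
function simulateCopy(flat, vecCount) {
  const out = [];
  for (let i = 0; i < vecCount; i++) out.push(...readVec3(flat, i));
  return out;
}

const simulated = simulateCopy(new Float32Array([1, 2, 3, 4, 5, 6]), 2);
console.log(simulated); // [1, 2, 3, 5, 6, 0]
```

The second element is read from floats 4, 5 and 6 of the input, which is why the value 4 (at float index 3) never appears.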


The complete implementation is given further down, with a link to the WebGPU Lab at the bottom so you can test out the code.

How to Fix?


You can pad the data by adding an extra 0 after every 3 floating point values in the array, so that each vector fills a full 16 byte slot. The WGSL code can then use vec3 arrays directly.
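
As a rough sketch of the padding approach, the helper below expands tightly packed x,y,z triples into 4-float slots on the JavaScript side before upload. The name `padVec3Data` is hypothetical and not part of the original code.

```javascript
// Expand tightly packed xyz triples into 16-byte slots (x, y, z, 0).
function padVec3Data(packed) {
  if (packed.length % 3 !== 0) {
    throw new Error('expected a multiple of 3 floats');
  }
  const count = packed.length / 3;
  const padded = new Float32Array(count * 4); // zero-initialised
  for (let i = 0; i < count; i++) {
    // Copy one triple; the 4th float of the slot stays 0.
    padded.set(packed.subarray(i * 3, i * 3 + 3), i * 4);
  }
  return padded;
}

const padded = padVec3Data(new Float32Array([1, 2, 3, 4, 5, 6]));
console.log(Array.from(padded)); // [1, 2, 3, 0, 4, 5, 6, 0]
```

The GPU buffer would then need to be created with `padded.byteLength` (32 bytes here) rather than the size of the packed array.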

However, a more compact and simpler way is to declare both arrays in WGSL as array< f32 >, so the input and output layouts are the same, and then convert the array values to a vec3 manually inside the WGSL code.

// Convert the flat array of f32 into vec3s (array0 is now declared as array< f32 >)
let v0 = vec3<f32>(array0[ global_id.x + 0 ],
                   array0[ global_id.x + 1 ],
                   array0[ global_id.x + 2 ] );
 
let v1 = vec3<f32>(array0[ global_id.x + 3 ],
                   array0[ global_id.x + 4 ],
                   array0[ global_id.x + 5 ] );
                   
// Then use them as normal
// vec1
array1[ global_id.x+0 ] = v0.x;
array1[ global_id.x+1 ] = v0.y;
array1[ global_id.x+2 ] = v0.z;

// vec2
array1[ global_id.x+3 ] = v1.x;
array1[ global_id.x+4 ] = v1.y;
array1[ global_id.x+5 ] = v1.z;


The above gives the following output:

log:["res:",[1,2,3,4,5,6]]



Implementation


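The full code below can be run from an `index.js` file loaded as a module (it uses top-level await). It includes all the initialization, the copying to/from the GPU, and the compute shader. Note that it uses the original array< vec3<f32> > declaration, so it reproduces the misaligned output.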

const adapter = await navigator.gpu.requestAdapter();
const device  = await adapter.requestDevice();

const array0 = new Float32Array( [1, 2, 3, 4, 5, 6] );
const array1 = new Float32Array(  array0.byteLength/4    );


var array0Buffer = device.createBuffer({ size: array0.byteLength, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST | GPUBufferUsage.COPY_SRC } );
var array1Buffer = device.createBuffer({ size: array1.byteLength, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST | GPUBufferUsage.COPY_SRC } );

device.queue.writeBuffer(array0Buffer, 0, array0);

const shaderCode = `

@group(0) @binding(0) var<storage, read>       array0 : array< vec3<f32> >; 
@group(0) @binding(1) var<storage, read_write> array1 : array< f32 >;  

@compute @workgroup_size(1)
fn main(@builtin(global_invocation_id) global_id : vec3<u32>) {

    // vec1
    array1[ global_id.x+0 ] = array0[ global_id.x ].x;
    array1[ global_id.x+1 ] = array0[ global_id.x ].y;
    array1[ global_id.x+2 ] = array0[ global_id.x ].z;
    
    // vec2
    array1[ global_id.x+3 ] = array0[ global_id.x+1 ].x;
    array1[ global_id.x+4 ] = array0[ global_id.x+1 ].y;
    array1[ global_id.x+5 ] = array0[ global_id.x+1 ].z;
}
`;

const pipeline = device.createComputePipeline({
    layout: 'auto',
    compute: {
        module: device.createShaderModule({
            code: shaderCode
        }),
        entryPoint: 'main'
    }
});


const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
        { binding: 0, resource: { buffer: array0Buffer } },   
        { binding: 1, resource: { buffer: array1Buffer } },  
    ]
});

const commandEncoder = device.createCommandEncoder();
const passEncoder = commandEncoder.beginComputePass();

passEncoder.setPipeline(pipeline);
passEncoder.setBindGroup(0, bindGroup);
passEncoder.dispatchWorkgroups( 1 , 1, 1);
passEncoder.end();
device.queue.submit([commandEncoder.finish()]);

await device.queue.onSubmittedWorkDone();

// -------------------------------

// All the compute is done - just a matter of copying the data back from the array for analysis.
// -------------------------------

// Write a small helper function
async function getGPUBuffer( buf, siz, msg )
{
    // Note this buffer is not linked to the 'STORAGE' compute (used to bring the data back to the CPU)
    const gbufferTmp = device.createBuffer({ size: siz, usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ });

    const commandEncoder = device.createCommandEncoder();
    // Encode commands for copying buffer to buffer.
    commandEncoder.copyBufferToBuffer(
        buf,         // source buffer
        0,           // source offset
        gbufferTmp,  // destination buffer
        0,           // destination offset
        siz          // size in bytes
    );

    // Submit GPU commands (submit() returns nothing, so no await is needed).
    const gpuCommands = commandEncoder.finish();
    device.queue.submit([gpuCommands]);

    // Read buffer: mapAsync resolves once the copy has finished.
    await gbufferTmp.mapAsync(GPUMapMode.READ);
    const arrayBuffer = gbufferTmp.getMappedRange();
    const arr = Array.from( new Float32Array(arrayBuffer) ); // copy out before unmapping
    gbufferTmp.unmap();
    //log(msg + 'array contents:', arr);
    return arr;
}

// Copy array1 back and print it to the output
let res = await getGPUBuffer( array1Buffer, array1.byteLength, 'array1 ' );

console.log('res:', res );



Alignment with `align` Expression in Structures


There is the @align attribute, which you can add to structure members to force alignment. However, be aware that the alignment must be a power of 2.

We can do a simple example: create a structure, bind an input array of that structure, and copy 2 array items to the output. The first values copied (data1 of the first item) are fine; the problem only shows up in the values that follow, and is most obvious once the second array item is copied.

We use an input array with a sequence of numbers from 1 to 12, and the output is given below:

log:["res:",[1,2,3,5,6,7,9,10,11,0,0,0]]


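As you can see, the output isn't a sequential count: 4 is missing, 8 is missing, and so on. The fields are aligned, so our sequential data in memory doesn't line up with the WGSL structure. The vec3<f32> is followed by 4 bytes of padding, and the structure as a whole is rounded up to 32 bytes, so part of the second item falls past the end of the 12 input values and reads back as 0.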


struct AlignedData {
    // This field is aligned to 8 bytes
    @align(8)
    data1: vec3<f32>,

    // This field is aligned to 8 bytes
    @align(8)
    data2: vec2<f32>,

    // This field has the default alignment
    data3: f32,
};

@group(0) @binding(0) var<storage, read>       array0 : array< AlignedData >; 
@group(0) @binding(1) var<storage, read_write> array1 : array< f32 >;  

@compute @workgroup_size(1)
fn main(@builtin(global_invocation_id) global_id : vec3<u32>) {

var c = global_id.x;

// copy 2 array items
for (var i=0; i<2; i++)
{
    array1[ c ] = array0[i].data1.x;  c++;
    array1[ c ] = array0[i].data1.y;  c++;
    array1[ c ] = array0[i].data1.z;  c++;
    
    array1[ c ] = array0[i].data2.x;  c++;
    array1[ c ] = array0[i].data2.y;  c++;
    
    array1[ c ] = array0[i].data3;  c++;
}

}// end main
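
The gaps can be predicted by working out the member offsets by hand. The sketch below follows the usual WGSL layout rules (each member starts at the next multiple of its alignment, and the structure size is rounded up to its largest member alignment). `structLayout` and `roundUp` are hypothetical helpers, and the sizes and alignments are passed in manually rather than derived from WGSL source. The vec3<f32> is entered with its natural alignment of 16; entering 8, as written in the attribute, gives the same offsets here.

```javascript
// Round n up to the next multiple of k.
const roundUp = (k, n) => Math.ceil(n / k) * k;

// members: [{ name, size, align }] in declaration order (all in bytes).
function structLayout(members) {
  let offset = 0;
  let structAlign = 1;
  const offsets = {};
  for (const m of members) {
    offset = roundUp(m.align, offset); // member starts on its alignment
    offsets[m.name] = offset;
    offset += m.size;
    structAlign = Math.max(structAlign, m.align);
  }
  // Structure size (and array stride) is padded to the largest alignment.
  return { offsets, size: roundUp(structAlign, offset) };
}

const layout = structLayout([
  { name: 'data1', size: 12, align: 16 }, // vec3<f32>
  { name: 'data2', size: 8,  align: 8 },  // vec2<f32>
  { name: 'data3', size: 4,  align: 4 },  // f32
]);
console.log(layout); // { offsets: { data1: 0, data2: 16, data3: 24 }, size: 32 }
```

With a 32 byte stride, the second array item starts at float index 8, so the input values 4 and 8 (float indices 3 and 7) sit in padding and are never read.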


Larger offsets


For example, if we increase the test data to 32 values (the output below corresponds to the values 0 to 31) and then make the alignments larger, you'll get:

struct AlignedData {
    // This field is aligned to 16 bytes
    @align(16)
    data1: vec3<f32>,

    // This field is aligned to 32 bytes
    @align(32)
    data2: vec2<f32>,

    // This field has the default alignment
    data3: f32,
};


Output

log:["res:",[0,1,2,8,9,10,16,17,18,24,25,26,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]]


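You can't just force alignment down to 1 byte boundaries: the value given to the align attribute must be a power of 2, and for structures stored in buffers it is also expected to be a multiple of the alignment the member's type already has. Hence, you'll always get gaps, but you have a bit more control over the gaps with the align attribute.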
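
To see where the values in the larger-offset output come from, here is a small CPU-side sketch. It assumes the byte layout implied by @align(16) and @align(32) (data1 at 0, data2 at 32, data3 at 40, with a 64 byte stride), an input holding the values 0 to 31, and out-of-range reads returning 0. `LAYOUT`, `readF32` and `copyElements` are hypothetical names used only for illustration.

```javascript
// Byte layout implied by @align(16) on data1 and @align(32) on data2.
const LAYOUT = { data1: 0, data2: 32, data3: 40, stride: 64 };

// Assumption: the input buffer holds the values 0..31.
const input = Float32Array.from({ length: 32 }, (_, i) => i);

// Read one f32 at a byte offset; assumption: out-of-range reads give 0.
const readF32 = (byteOffset) => {
  const i = byteOffset / 4;
  return i < input.length ? input[i] : 0;
};

// Mirror the shader loop: vec3, then vec2, then f32 for each array item.
function copyElements(count) {
  const out = [];
  for (let e = 0; e < count; e++) {
    const base = e * LAYOUT.stride;
    for (let c = 0; c < 3; c++) out.push(readF32(base + LAYOUT.data1 + 4 * c));
    for (let c = 0; c < 2; c++) out.push(readF32(base + LAYOUT.data2 + 4 * c));
    out.push(readF32(base + LAYOUT.data3));
  }
  return out;
}

const result = copyElements(2);
console.log(result); // [0, 1, 2, 8, 9, 10, 16, 17, 18, 24, 25, 26]
```

These 12 values match the start of the logged output; the rest of the output array is never written and stays at 0.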






Resources & Links


Memory Arrays and Vec3 (WebGPU Lab Live Demo)

















 
Copyright (c) 2002-2026 xbdev.net - All rights reserved.
Designated articles, tutorials and software are the property of their respective owners.