www.xbdev.net
xbdev - software development
Wednesday January 15, 2025
     
 

WebGPU/WGSL Tutorials and Articles

Graphics and Compute ...

 


Memory Array - Alignment Buffers and Pains


This article looks at memory on the GPU and how it isn't always laid out the way you think it is! Especially for long sequential blocks of data!

For example, the implementation below copies from `array0` to `array1`. What could be simpler? Both are just arrays of floats. However, in WGSL `array0` is declared as `array< vec3<f32> >`, which is common as you might want to group every 3 values as positions or vectors (x,y,z). What you might not be aware of is how that type is stored in memory on the GPU!

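A `vec3<f32>` only holds 12 bytes of data, but it is aligned to a 16 byte boundary. Each element of an `array< vec3<f32> >` therefore takes up 16 bytes: three floats followed by 4 bytes of padding.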

If the memory definitions are like this:

@group(0) @binding(0) var<storage, read>       array0 : array< vec3<f32> >;
@group(0) @binding(1) var<storage, read_write> array1 : array< f32 >;


And the WGSL code to copy `array0` to `array1` is:

    // vec1
    array1[ global_id.x+0 ] = array0[ global_id.x ].x;
    array1[ global_id.x+1 ] = array0[ global_id.x ].y;
    array1[ global_id.x+2 ] = array0[ global_id.x ].z;

    // vec2
    array1[ global_id.x+3 ] = array0[ global_id.x+1 ].x;
    array1[ global_id.x+4 ] = array0[ global_id.x+1 ].y;
    array1[ global_id.x+5 ] = array0[ global_id.x+1 ].z;


We get this:

log:["res:",[1,2,3,5,6,0]]


Instead of this:

log:["res:",[1,2,3,4,5,6]]


All because of the internal alignment/padding - which can be a real pain if you're not careful.
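To see where the missing 4 and the trailing 0 come from, the indexing can be emulated on the CPU. This is only a rough sketch: `readVec3Component` and `copyAsVec3` are made-up helper names (not WebGPU calls), and returning 0 for a read past the end of the data is an assumption chosen to match the log above, since real out-of-bounds behaviour may differ between implementations.

```javascript
// Sketch: emulate how array<vec3<f32>> indexes a tightly packed float buffer.
// A vec3<f32> is 12 bytes, but array elements sit 16 bytes apart (4 floats).
const FLOATS_PER_VEC3_SLOT = 4; // 16 byte stride / 4 bytes per f32

// Hypothetical helper: read component c (0=x, 1=y, 2=z) of element i.
// Assumption: reads past the end of the data return 0.
function readVec3Component(flat, i, c) {
  const index = i * FLOATS_PER_VEC3_SLOT + c;
  return index < flat.length ? flat[index] : 0;
}

// Same copy pattern as the shader: `count` vec3 elements into a flat list.
function copyAsVec3(flat, count) {
  const out = [];
  for (let i = 0; i < count; i++) {
    for (let c = 0; c < 3; c++) {
      out.push(readVec3Component(flat, i, c));
    }
  }
  return out;
}

const packed = new Float32Array([1, 2, 3, 4, 5, 6]);
console.log(copyAsVec3(packed, 2)); // [1, 2, 3, 5, 6, 0]
```

The second element starts at float index 4, so the value 4 (index 3) falls into the padding slot and the z component lands one float past the end of the data.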


The complete implementation is given further down, with a link to the WebGPU Lab at the bottom so you can test out the code.

How to Fix?


You can pad the data by adding an extra 0 after every 3 floating point values in the input array, so that each group of 4 floats fills one 16 byte `vec3` slot. The WGSL code can then keep reading the two `vec3` elements as before.

However, a more compact and simpler way is to declare both arrays in WGSL as `array< f32 >`. That way the input and output have the same layout, and inside the WGSL code you just convert the array values to a `vec3` manually.
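The padding option could look something like the following on the JavaScript side. It is a minimal sketch, assuming the input is a `Float32Array` of tightly packed x,y,z triples; `padVec3Array` is a hypothetical helper name, not part of the WebGPU API.

```javascript
// Hypothetical helper: expand tightly packed xyz triples to the 16 byte
// stride that array<vec3<f32>> expects (one padding float per vec3).
function padVec3Array(flat) {
  if (flat.length % 3 !== 0) {
    throw new Error('expected a multiple of 3 floats');
  }
  const count = flat.length / 3;
  const padded = new Float32Array(count * 4); // zero-filled by default
  for (let i = 0; i < count; i++) {
    // Copy x,y,z into the first 3 floats of each 4-float slot.
    padded.set(flat.subarray(i * 3, i * 3 + 3), i * 4);
  }
  return padded;
}

const paddedData = padVec3Array(new Float32Array([1, 2, 3, 4, 5, 6]));
console.log(Array.from(paddedData)); // [1, 2, 3, 0, 4, 5, 6, 0]
```

Note that the padded data is larger (32 bytes instead of 24 here), so the input buffer would have to be created with the padded `byteLength`.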

// Convert the flat array of f32 into vec3s
let v0 = vec3<f32>( array0[ global_id.x+0 ],
                    array0[ global_id.x+1 ],
                    array0[ global_id.x+2 ] );

let v1 = vec3<f32>( array0[ global_id.x+3 ],
                    array0[ global_id.x+4 ],
                    array0[ global_id.x+5 ] );

// Then use them as normal
// vec1
array1[ global_id.x+0 ] = v0.x;
array1[ global_id.x+1 ] = v0.y;
array1[ global_id.x+2 ] = v0.z;

// vec2
array1[ global_id.x+3 ] = v1.x;
array1[ global_id.x+4 ] = v1.y;
array1[ global_id.x+5 ] = v1.z;


The above gives the following output:

log:["res:",[1,2,3,4,5,6]]



Implementation


The full code below can be run from an `index.js` file. It includes all the initialization, the copying to/from the GPU and the compute shader.

const adapter = await navigator.gpu.requestAdapter();
const device  = await adapter.requestDevice();

const array0 = new Float32Array( [1, 2, 3, 4, 5, 6] );
const array1 = new Float32Array( array0.byteLength/4 );

var array0Buffer = device.createBuffer({ size: array0.byteLength, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST | GPUBufferUsage.COPY_SRC });
var array1Buffer = device.createBuffer({ size: array1.byteLength, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST | GPUBufferUsage.COPY_SRC });

device.queue.writeBuffer(array0Buffer, 0, array0);

const shaderCode = `

@group(0) @binding(0) var<storage, read>       array0 : array< vec3<f32> >; 
@group(0) @binding(1) var<storage, read_write> array1 : array< f32 >;  

@compute @workgroup_size(1)
fn main(@builtin(global_invocation_id) global_id : vec3<u32>) {

    // vec1
    array1[ global_id.x+0 ] = array0[ global_id.x ].x;
    array1[ global_id.x+1 ] = array0[ global_id.x ].y;
    array1[ global_id.x+2 ] = array0[ global_id.x ].z;
    
    // vec2
    array1[ global_id.x+3 ] = array0[ global_id.x+1 ].x;
    array1[ global_id.x+4 ] = array0[ global_id.x+1 ].y;
    array1[ global_id.x+5 ] = array0[ global_id.x+1 ].z;
}
`;

const pipeline = device.createComputePipeline({
    layout: 'auto',
    compute: {
        module: device.createShaderModule({
            code: shaderCode
        }),
        entryPoint: 'main'
    }
});

const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
        { binding: 0, resource: { buffer: array0Buffer } },
        { binding: 1, resource: { buffer: array1Buffer } },
    ]
});

const commandEncoder = device.createCommandEncoder();
const passEncoder = commandEncoder.beginComputePass();

passEncoder.setPipeline(pipeline);
passEncoder.setBindGroup(0, bindGroup);
passEncoder.dispatchWorkgroups(1, 1);
passEncoder.end(); // end() returns nothing, so there is no need to await it
device.queue.submit([commandEncoder.finish()]);

await device.queue.onSubmittedWorkDone();

// -------------------------------

// All the compute is done - just a matter of copying the data back from the array for analysis.
// -------------------------------

// Write a small helper function
async function getGPUBuffer( buf, siz, msg )
{
    // Note this buffer is not linked to the 'STORAGE' compute (used to bring the data back to the CPU)
    const gbufferTmp = device.createBuffer({ size: siz, usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ });

    const commandEncoder = device.createCommandEncoder();
    // Encode commands for copying buffer to buffer.
    commandEncoder.copyBufferToBuffer(
        buf,         // source buffer
        0,           // source offset
        gbufferTmp,  // destination buffer
        0,           // destination offset
        siz          // size
    );

    // Submit GPU commands (submit() returns nothing, so no await is needed).
    const gpuCommands = commandEncoder.finish();
    device.queue.submit([gpuCommands]);

    // Read buffer.
    await gbufferTmp.mapAsync(GPUMapMode.READ);
    const arrayBuffer = gbufferTmp.getMappedRange();
    const arr = Array.from( new Float32Array(arrayBuffer) );
    gbufferTmp.unmap();
    //log(msg + 'array contents:', arr);
    return arr;
}

// Copy array1 back and print it to the output
let res = await getGPUBuffer( array1Buffer, array1.byteLength, 'array1 ' );

console.log('res:', res );



Alignment with `align` Expression in Structures


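There is also the `align` attribute (written `@align(n)`), which you can add to structure members to force their alignment. However, be aware that the alignment must be a power of 2. The WGSL specification also requires it to be a multiple of the member type's own alignment, so a stricter compiler may reject an `@align(8)` on a `vec3<f32>`, whose natural alignment is 16.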

We can do a simple example: create a structure, feed it some input data and copy 2 array items. The first field of the first array item (`data1`) is fine; it's only when the later fields and the second array item are copied that you notice the problem.

We use an input array with a sequence of numbers from 1 to 12, and the output is given below:

log:["res:",[1,2,3,5,6,7,9,10,11,0,0,0]]


As you can see, the output isn't a sequential count: 4 is missing, 8 is missing, and so on. Because the structure fields are aligned, our sequential data in memory doesn't line up with the WGSL structure.


struct AlignedData {
    // This field is aligned to 8 bytes
    @align(8)
    data1 : vec3<f32>,

    // This field is aligned to 8 bytes
    @align(8)
    data2 : vec2<f32>,

    // This field has the default alignment
    data3 : f32,
};

@group(0) @binding(0) var<storage, read>       array0 : array< AlignedData >;
@group(0) @binding(1) var<storage, read_write> array1 : array< f32 >;

@compute @workgroup_size(1)
fn main(@builtin(global_invocation_id) global_id : vec3<u32>) {

    var c = global_id.x;

    // copy 2 array items
    for (var i=0; i<2; i++)
    {
        array1[c] = array0[i].data1.x;  c++;
        array1[c] = array0[i].data1.y;  c++;
        array1[c] = array0[i].data1.z;  c++;

        array1[c] = array0[i].data2.x;  c++;
        array1[c] = array0[i].data2.y;  c++;

        array1[c] = array0[i].data3;  c++;
    }

} // end main
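The gaps can be worked out by hand from the structure layout rules. The sketch below is my own reading of those rules and not an official tool: each member is placed at the next multiple of its alignment, and the structure size is rounded up to the largest member alignment. The helper names `roundUp` and `structLayout` are made up for this example, and the sizes and alignments are typed in by hand to match the shader above.

```javascript
// Round `value` up to the next multiple of `multiple`.
function roundUp(multiple, value) {
  return Math.ceil(value / multiple) * multiple;
}

// Hypothetical helper: `members` is a list of { name, size, align } in bytes.
function structLayout(members) {
  let end = 0;          // first free byte after the previous member
  let structAlign = 1;  // largest member alignment seen so far
  const offsets = {};
  for (const m of members) {
    offsets[m.name] = roundUp(m.align, end);
    end = offsets[m.name] + m.size;
    structAlign = Math.max(structAlign, m.align);
  }
  return { offsets, align: structAlign, size: roundUp(structAlign, end) };
}

// AlignedData from above, using the alignments written in the shader.
const layout = structLayout([
  { name: 'data1', size: 12, align: 8 }, // vec3<f32> with @align(8)
  { name: 'data2', size: 8,  align: 8 }, // vec2<f32> with @align(8)
  { name: 'data3', size: 4,  align: 4 }, // f32, default alignment
]);
console.log(layout);
// offsets: data1 = 0, data2 = 16, data3 = 24; size = 32 bytes (8 floats)
```

With these offsets the first item skips float index 3 (the value 4) and float index 7 (the value 8). The second item starts at float index 8, so its `data2` and `data3` fields lie beyond the 12 input floats, which would explain the zeros at the end of the log.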


Larger offsets


For example, if we increase the test data to 32 values (0 to 31) and make the alignments larger, you'll get:

struct AlignedData {
    // This field is aligned to 16 bytes
    @align(16)
    data1 : vec3<f32>,

    // This field is aligned to 32 bytes
    @align(32)
    data2 : vec2<f32>,

    // This field has the default alignment
    data3 : f32,
};


Output

log:["res:",[0,1,2,8,9,10,16,17,18,24,25,26,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]]


You can't just force the alignment down to 1 byte boundaries to pack the fields tightly: the `align` attribute can only add padding, it can never remove the padding a type already needs. Hence, you'll always get gaps, but you have a bit more control over where the gaps go.
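As a sanity check, the log for the larger alignments can be rebuilt on the CPU. This is a rough sketch under a few assumptions: the input is taken to be the 32 values 0..31 (which is what the log implies), and the offsets are worked out by hand using offset = roundUp(align, end of previous member), giving `data1` at byte 0, `data2` at byte 32, `data3` at byte 40 and a structure size of 64 bytes.

```javascript
// One array element every 64 bytes, i.e. every 16 floats.
const STRIDE_FLOATS = 64 / 4;

// Float index of each copied field inside one structure (byte offset / 4).
const FIELD_FLOAT_OFFSETS = [
  0, 1, 2, // data1.x, data1.y, data1.z (bytes 0..11)
  8, 9,    // data2.x, data2.y          (bytes 32..39)
  10,      // data3                     (bytes 40..43)
];

// Assumed input: 32 sequential floats starting at 0.
const input = Float32Array.from({ length: 32 }, (_, i) => i);
const output = new Float32Array(32);

let c = 0;
for (let i = 0; i < 2; i++) { // copy 2 array items, as in the shader
  for (const fieldOffset of FIELD_FLOAT_OFFSETS) {
    output[c++] = input[i * STRIDE_FLOATS + fieldOffset];
  }
}

console.log(Array.from(output.subarray(0, 12)));
// [0, 1, 2, 8, 9, 10, 16, 17, 18, 24, 25, 26]
```

Only 12 of the 32 output floats are written, so the rest of the output stays at 0, the same as in the log.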






Resources & Links


Memory Arrays and Vec3 (WebGPU Lab Live Demo)














Copyright (c) 2002-2024 xbdev.net - All rights reserved.
Designated articles, tutorials and software are the property of their respective owners.