Often when you're generating meshes you'll have lots of loops and repetitive components - that makes them ideal for parallel architectures!
Say hello to `WebGPU`!
Your friendly tool for generating meshes quickly and easily on the GPU.
The WebGPU API has a `compute` pipeline - so you can generate meshes without all the issues of pushing the work through a graphics pipeline (vertex and fragment stages) and the bodges and hacky tricks needed to make that approach work.
Instead, the compute pipeline is a clean and compact way for you to perform parallel computations with no mess.
A few key points about why the GPU suits mesh generation:
• On Vulkan (native API) you have the geometry shader, so you can generate/modify mesh data in the graphics pipeline (you don't have this with the WebGPU API - instead you have the compute shader)
• The challenge is making sure the mesh generation/update workload is distributed across the GPU threads (take advantage of atomics and buffer indexes)
• Atomics let us manage the parallel complexity - a thread picks a single triangle. The atomic add returns an index into the vertex buffer array - this index is unique to the thread - so there aren't any problems with different threads trying to read and write to the same location.
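The allocation pattern can be sketched in plain JavaScript (a hypothetical, single-threaded simulation of the counter - on the GPU this is done with WGSL's `atomicAdd`, and the function name here is made up for illustration):

```javascript
// Simulate the atomic slot-allocation pattern used on the GPU.
// Each 'thread' reserves a unique triangle index from a shared counter,
// so no two threads ever write to the same region of the vertex buffer.
function allocateTriangleSlots(threadCount) {
  let counter = 0;   // the atomic<u32> counter (buf0 on the GPU)
  const slots = [];
  for (let thread = 0; thread < threadCount; thread++) {
    const triIndex = counter++; // atomicAdd(&buf0, 1u) returns the old value
    slots.push(triIndex);       // thread writes vertices triIndex*3 .. triIndex*3+2
  }
  return slots;
}

console.log(allocateTriangleSlots(4)); // each thread gets a unique index: [0, 1, 2, 3]
```

On the GPU the increments can happen in any order across threads, but the atomic guarantees each thread still receives a distinct index.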
Compute Mesh Generation
Let's start simple - just get the things running and generate a mesh with 2 triangles.
For each vertex, we'll have a position, normal and color - each is 4 floats (a vec4) to avoid any alignment issues. Also, just to get started, we'll generate a triangle-list (only a vertex buffer) - we won't generate a separate index buffer. So every 3 vertices in the buffer make a triangle; if there are 9 vertices, there are 3 triangles.
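As a quick sanity check, the layout arithmetic works out like this (a small self-contained sketch - the constant and function names are made up for illustration):

```javascript
// Vertex layout arithmetic for the triangle-list (no index buffer).
const FLOATS_PER_ATTRIB  = 4; // position, normal and color are each a vec4
const ATTRIBS_PER_VERTEX = 3;
const FLOATS_PER_VERTEX  = FLOATS_PER_ATTRIB * ATTRIBS_PER_VERTEX; // 12 floats
const BYTES_PER_VERTEX   = FLOATS_PER_VERTEX * 4;                  // 48 bytes (f32 = 4 bytes)

// Every 3 consecutive vertices form one triangle.
function triangleCount(vertexCount) {
  return Math.floor(vertexCount / 3);
}

console.log(FLOATS_PER_VERTEX, BYTES_PER_VERTEX, triangleCount(9)); // 12 48 3
```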
console.log('WebGPU Compute Mesh Generation Example (2 Triangles)');

// (assumes this code runs inside an async function)
if (!navigator.gpu) {
  console.log('WebGPU is not supported (or is it disabled? flags/settings)');
  return;
}

const adapter = await navigator.gpu.requestAdapter();
const device  = await adapter.requestDevice();
// GPU Data Buffers (Data on GPU)

// Counter for how many triangles we have
const buffer0  = new Uint32Array([0]);
const gbuffer0 = device.createBuffer({
  size : buffer0.byteLength,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST | GPUBufferUsage.COPY_SRC
});
device.queue.writeBuffer(gbuffer0, 0, buffer0);

const MAX_VERTEX_BUFFER_SIZE = 50; // large buffer (max number of vertices)
const EACH_VERTEX_SIZE       = 12; // each vertex is 12 floats (pos, col, normal)

// Array of triangles (holds our generated mesh)
const buffer1  = new Float32Array(MAX_VERTEX_BUFFER_SIZE * EACH_VERTEX_SIZE); // zero-filled
const gbuffer1 = device.createBuffer({
  size : buffer1.byteLength,
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST | GPUBufferUsage.COPY_SRC
});
device.queue.writeBuffer(gbuffer1, 0, buffer1);
// Layout for the pipeline and the shader

// Bind group layout and bind group
const bindGroupLayout = device.createBindGroupLayout({
  entries: [
    { binding: 0, visibility: GPUShaderStage.COMPUTE, buffer: { type: "storage" } },
    { binding: 1, visibility: GPUShaderStage.COMPUTE, buffer: { type: "storage" } }
  ]
});

const bindGroup = device.createBindGroup({
  layout : bindGroupLayout,
  entries: [
    { binding: 0, resource: { buffer: gbuffer0 } },
    { binding: 1, resource: { buffer: gbuffer1 } }
  ]
});
// Compute shader code
const computeShader = `
struct Vertex {
  position : vec4<f32>, // xyzw
  color    : vec4<f32>, // rgba
  normal   : vec4<f32>  // xyzw
}

// Counter (number of triangles written so far)
@group(0) @binding(0) var<storage, read_write> buf0 : atomic<u32>;
// Vertex array (triangle-list) - maximum number of vertices baked in from JavaScript
@group(0) @binding(1) var<storage, read_write> buf1 : array<Vertex, ${MAX_VERTEX_BUFFER_SIZE}>;

@compute @workgroup_size(256, 1)
fn main(@builtin(global_invocation_id) globalId      : vec3<u32>,
        @builtin(local_invocation_id)  localId       : vec3<u32>,
        @builtin(workgroup_id)         workgroupId   : vec3<u32>,
        @builtin(num_workgroups)       workgroupSize : vec3<u32>) {
  // Total number of triangles we want to generate (just 2 for this test)
  let totalTriangles : u32 = 2u;

  if (globalId.x >= totalTriangles) {
    return;
  }

  // Reserve a unique slot for the current triangle (each triangle is 3 vertices).
  // Each thread manages the calculations for '1 triangle', using 'globalId' to
  // identify where the triangle fits in the larger shape/model.
  let triIndex : u32 = atomicAdd(&buf0, 1u);

  buf1[triIndex * 3u + 0u].position = vec4(0, 1, 2, 0) * f32(globalId.x);
  buf1[triIndex * 3u + 1u].position = vec4(3, 4, 5, 0) * f32(globalId.x);
  buf1[triIndex * 3u + 2u].position = vec4(6, 7, 8, 0);

  buf1[triIndex * 3u + 0u].normal = vec4(0, 1, 0, 0);
  buf1[triIndex * 3u + 1u].normal = vec4(0, 1, 0, 0);
  buf1[triIndex * 3u + 2u].normal = vec4(0, 1, 0, 0);

  buf1[triIndex * 3u + 0u].color = vec4(1, 0, 0, 1);
  buf1[triIndex * 3u + 1u].color = vec4(0, 1, 0, 1);
  buf1[triIndex * 3u + 2u].color = vec4(0, 0, 1, 1);
}
`;
// Pipeline setup
const computePipeline = device.createComputePipeline({
  layout : device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] }),
  compute: {
    module    : device.createShaderModule({ code: computeShader }),
    entryPoint: "main"
  }
});
{ // Commands submission
  const commandEncoder = device.createCommandEncoder();
  const passEncoder    = commandEncoder.beginComputePass();
  passEncoder.setPipeline(computePipeline);
  passEncoder.setBindGroup(0, bindGroup);
  passEncoder.dispatchWorkgroups(16, 1, 1);
  passEncoder.end();

  // Submit GPU commands.
  const gpuCommands = commandEncoder.finish();
  device.queue.submit([gpuCommands]);
}
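A note on the dispatch size: with `@workgroup_size(256)` each workgroup runs 256 threads, so dispatching 16 workgroups launches 4096 threads - far more than the 2 we need (the early-out test in the shader discards the rest). A more economical sketch would derive the workgroup count from the triangle count (helper name made up for illustration):

```javascript
// How many workgroups are needed to cover one thread per triangle,
// when each workgroup runs WORKGROUP_SIZE threads?
const WORKGROUP_SIZE = 256; // must match @workgroup_size in the shader

function workgroupCount(totalTriangles) {
  return Math.ceil(totalTriangles / WORKGROUP_SIZE);
}

console.log(workgroupCount(2));   // 1
console.log(workgroupCount(257)); // 2
```

You'd then call `passEncoder.dispatchWorkgroups(workgroupCount(totalTriangles), 1, 1)` instead of a hard-coded 16.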
The skeleton code adds 2 triangles to the vertex buffer (`buffer1`) and updates the triangle counter (`buffer0`).
We've just used hard-coded values for the size of the buffer (one large block) - and the defines are constants injected into the shader source via a template string literal (i.e., `MAX_VERTEX_BUFFER_SIZE`). This keeps the code more compact.
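For example, the template literal bakes the JavaScript constant straight into the WGSL source before the shader module is created (a self-contained illustration of the same pattern used above):

```javascript
// Constants defined once in JavaScript...
const MAX_VERTEX_BUFFER_SIZE = 50;

// ...are substituted into the WGSL source string before compilation,
// so the shader sees a fixed-size array type.
const wgslSnippet = `
@group(0) @binding(1) var<storage, read_write> buf1 : array<Vertex, ${MAX_VERTEX_BUFFER_SIZE}>;
`;

console.log(wgslSnippet.includes('array<Vertex, 50>')); // true
```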
For debugging, we'll bring the data back to the CPU and print out the values.
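A minimal readback sketch (assuming the `device`, `gbuffer0`, `gbuffer1` and `buffer1` from above - the helper name `readBackBuffer` is made up): create a staging buffer with `MAP_READ` usage, copy the storage buffer into it, map it, and print the contents.

```javascript
// Copy a GPU storage buffer into a mappable staging buffer and read it on the CPU.
async function readBackBuffer(device, srcBuffer, byteLength) {
  const staging = device.createBuffer({
    size : byteLength,
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ
  });

  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(srcBuffer, 0, staging, 0, byteLength);
  device.queue.submit([encoder.finish()]);

  await staging.mapAsync(GPUMapMode.READ);
  const copy = staging.getMappedRange().slice(0); // copy before unmapping detaches it
  staging.unmap();
  return copy; // an ArrayBuffer
}

// Usage (after the compute pass has run):
// const counts = new Uint32Array(await readBackBuffer(device, gbuffer0, 4));
// console.log('triangles written:', counts[0]);
// const verts = new Float32Array(await readBackBuffer(device, gbuffer1, buffer1.byteLength));
// console.log('first vertex position:', verts.slice(0, 4));
```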