1.7 - Uniforms and GPU Memory Layout

What We're Learning
You've been using uniforms to send data to your shaders, but now it's time to uncover a hidden and critical layer of complexity: GPU memory layout. Think of the GPU as a highly-optimized machine that demands data be arranged in a very specific way. If the memory layout of our Rust structs doesn't perfectly match the GPU's expectations, our shaders can receive corrupted data, leading to visual glitches or silent, hard-to-debug failures.
This article is your guide to mastering these rules. Get it right, and you unlock a seamless, efficient flow of data from your CPU to your GPU, giving you precise control over your shader's behavior.
By the end, you'll understand:
The purpose of uniforms and the best scenarios for using them.
Why GPUs enforce strict memory alignment rules.
How to use Bevy's powerful ShaderType trait to manage memory layout automatically.
How to take manual control with attributes like #[align(16)] to solve common issues.
How to spot and fix the most common alignment bugs.
Best practices for designing efficient and error-free uniform structs from the start.
What Are Uniforms?
Uniforms are the primary way we send small amounts of read-only data from our Rust code on the CPU to our shaders on the GPU. They are called "uniform" because their value remains constant - or uniform - across every vertex and fragment processed within a single draw call.
Think of them as a settings panel for your shader. Before you draw a mesh, you configure its material's uniforms (e.g., color: red, intensity: 0.8). Every single vertex and fragment the GPU renders for that mesh will then use those exact same settings.
Here’s the data flow in practice:
// In your Rust application (CPU)
// You define the data in a struct and add it to your materials.
let material = materials.add(MyMaterial {
    uniforms: MyUniforms {
        color: LinearRgba::RED,
        intensity: 0.8,
    },
});
This Rust data is then sent to the GPU, where the shader can access it as a global, read-only variable.
// In your WGSL shader (GPU)
// You declare a variable with the same structure to receive the data.
@group(2) @binding(0)
var<uniform> material: MyUniforms;
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
    // Now you can use the uniform values to influence the output.
    let final_color = material.color * material.intensity;
    return vec4<f32>(final_color.rgb, 1.0);
}
When to use uniforms:
Material Properties: Base color, roughness, metallic, emissive values.
Animation Parameters: A global time value, animation speed, or wave amplitude.
Configuration Flags: Toggles or settings that control shader behavior, like is_dissolving.
Scene-wide Data: Camera position, projection matrices, ambient light color. (Bevy provides these for you in @group(0).)
When NOT to use uniforms:
Per-Vertex Data: Information that is unique to each vertex, like position, normals, or UV coordinates. This data must be passed as vertex attributes (@location).
Large or Writable Data: Big arrays of data (e.g., bone matrices for skinned animation) or data that the GPU needs to write back to. For these cases, storage buffers are the correct tool.
The Hidden Problem: Memory Alignment
Here is where many shader beginners get stuck. You might define a struct in Rust and an identical-looking struct in WGSL and assume they will just work. This is incorrect, and this single assumption is the source of countless shader bugs.
Think of your Rust struct and your WGSL struct as a contract. Both sides must agree on the exact size and layout of the data, byte for byte. If they don't match, the GPU will read from the wrong memory addresses, interpreting your data as garbage.
The problem is that the GPU has very strict rules about how data must be arranged in memory, known as alignment requirements. These rules exist for a single reason: performance. GPUs are massively parallel processors that can read data much, much faster when it's located at predictable, evenly-spaced memory addresses (like a multiple of 16 bytes). They sacrifice flexibility for raw speed.
The 16-Byte Boundary Problem
Let's look at a simple Rust struct that seems perfectly innocent but is fundamentally broken from the GPU's perspective.
// This struct looks logical, but its plain Rust layout does not match the GPU's.
#[derive(ShaderType)] // We'll explain this later
pub struct BrokenMaterial {
    pub color: LinearRgba, // 16 bytes (a vec4 of f32s)
    pub intensity: f32,    // 4 bytes
    pub time: f32,         // 4 bytes
}
What Rust arranges in memory (your expectation)
Rust's default layout is free to reorder fields, but every field here needs only 4-byte alignment, so the struct occupies a tightly packed 24 bytes. Shown in declaration order:
// Total size: 24 bytes
|-- color (16 bytes) --|-- intensity (4) --|-- time (4) --|
^                      ^                   ^              ^
Offset 0               Offset 16           Offset 20      Offset 24
What the WGSL shader requires (the GPU's reality)
The GPU's layout rules are different. In WGSL, a uniform struct's total size must be a multiple of its largest member's alignment. Here, the largest member is LinearRgba (vec4<f32>), which has a 16-byte alignment. Therefore, the total size of the struct on the GPU side must be padded up to the next multiple of 16.
// Required GPU size: 32 bytes (the next multiple of 16 after 24)
|-- color (16 bytes) --|-- intensity (4) --|-- time (4) --|---- padding (8) ----|
^                      ^                   ^              ^                     ^
Offset 0               Offset 16           Offset 20      Offset 24             Offset 32
If those 24 bytes were copied verbatim into a buffer that the shader binds as a 32-byte struct, the buffer would be 8 bytes too short, and anything packed directly after the struct would be read from the wrong place. The ShaderType derive, explained below, avoids this by writing the padded 32-byte layout for you.
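To make the size gap concrete, here is a minimal sketch that uses only the standard library. BrokenMaterialRaw is a hypothetical #[repr(C)] stand-in for the struct above (an array replaces LinearRgba so the block compiles without Bevy), and round_up is an illustrative helper, not a Bevy API.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `BrokenMaterial`: `[f32; 4]` replaces `LinearRgba`
/// so the example compiles without Bevy. `#[repr(C)]` fixes the field order.
#[repr(C)]
#[allow(dead_code)]
struct BrokenMaterialRaw {
    color: [f32; 4], // 16 bytes
    intensity: f32,  // 4 bytes
    time: f32,       // 4 bytes
}

/// Rounds `value` up to the next multiple of `align` (illustrative helper).
fn round_up(value: usize, align: usize) -> usize {
    value.div_ceil(align) * align
}

fn main() {
    let rust_size = size_of::<BrokenMaterialRaw>();
    // The WGSL struct takes the alignment of its vec4 member: 16 bytes.
    let gpu_size = round_up(rust_size, 16);
    println!("Rust in-memory size: {rust_size} bytes");
    println!("WGSL struct size:    {gpu_size} bytes");
}
```

The 8-byte difference between the two numbers is the trailing padding shown in the diagram.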
Why the GPU Cares So Much: The Hardware Contract
To understand why these strict rules exist, you need to think about how a GPU achieves its incredible speed. It's not one processor working very fast; it's thousands of simple processors (cores) working in synchronized groups (warps or wavefronts).
Imagine a team of 32 workers (a warp) lined up in a warehouse. They all execute the same instruction at the same time: "Everyone, grab your assigned intensity value from the shelf." The warehouse's memory system is optimized for this kind of teamwork. It doesn't fetch individual items; a memory controller (a forklift) fetches an entire, large, pre-defined block of memory at once (e.g., a 32- or 64-byte cache line).
The Efficient, Aligned Case
If all 32 intensity values the workers need are located within a single, aligned block of memory, the forklift can make one trip, grab the entire block, and distribute the data to the team almost instantly. This is called a coalesced memory access, and it is the key to the GPU's high memory bandwidth.
The Inefficient, Unaligned Case
Now, imagine a single vec4 value crosses a 16-byte boundary. It's like a large tool that was stored carelessly, with its handle in one box and its head in the next. When a worker needs that tool, the forklift driver has to:
Make a trip to get the first box.
Extract the part of the tool that's in it.
Make a second trip to get the adjacent box.
Extract the rest of the tool.
Combine the pieces for the worker.
This "two-trip penalty" is a real hardware phenomenon. An unaligned memory access forces the memory controller to perform two memory transactions instead of one, consuming twice the bandwidth and adding significant latency. Since all workers in the group are synchronized, if one is delayed by an unaligned access, the entire group has to wait.
The alignment rules are therefore a performance contract between you and the GPU hardware. By following the rules, you are promising to lay out your data in a way that allows the GPU to always use its most efficient, single-trip memory access patterns. In return, the GPU gives you maximum performance.
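The two-trip picture can be put into numbers. This is a minimal sketch, assuming memory is fetched in fixed 16-byte blocks (a simplification for illustration; real block and cache-line sizes vary by hardware). blocks_touched is a hypothetical helper.

```rust
/// Number of `block`-byte memory blocks touched by a value of `size` bytes
/// stored at byte `offset`. `block` must be non-zero and `size` at least 1.
fn blocks_touched(offset: usize, size: usize, block: usize) -> usize {
    let first = offset / block;
    let last = (offset + size - 1) / block;
    last - first + 1
}

fn main() {
    // A 16-byte vec4 stored on a 16-byte boundary sits in one block.
    println!("aligned:   {}", blocks_touched(32, 16, 16));
    // The same vec4 stored at offset 24 straddles two blocks.
    println!("unaligned: {}", blocks_touched(24, 16, 16));
}
```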
Alignment Rules Quick Reference
The specific layout rules for WGSL are as follows. The key is that a value must be stored at a memory address that is a multiple of its alignment size.
| WGSL Type | Size in Bytes | Alignment in Bytes | Rule |
| f32, i32, u32 | 4 | 4 | Must start at an address divisible by 4. |
| vec2<T> | 8 | 8 | Must start at an address divisible by 8. |
| vec3<T> | 12 | 16 | ⚠️ The Trap! 12 bytes of data, but requires 16-byte alignment. |
| vec4<T> | 16 | 16 | Must start at an address divisible by 16. |
| Array | Varies | 16 | In a uniform buffer, the stride between elements must be a multiple of 16. |
| Struct | Varies | Largest Member | Alignment is the strictest alignment of any of its fields. |
The vec3 rule is the most common source of errors. A vec3 holds only 12 bytes of data, but it must start on a 16-byte boundary, so padding (up to 12 bytes) is inserted before it whenever the preceding data doesn't end on one. Its size stays 12, so under WGSL's rules a following 4-byte scalar may occupy the last 4 bytes of that 16-byte slot.
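How much padding a vec3 needs depends on where the previous field ended. A minimal sketch of that arithmetic, with padding_before as a hypothetical helper:

```rust
/// Padding needed so that a value with alignment `align` can start at or
/// after byte `offset`. `align` must be non-zero.
fn padding_before(offset: usize, align: usize) -> usize {
    (align - offset % align) % align
}

fn main() {
    const VEC3_ALIGN: usize = 16;
    // vec3 placed after a single f32 (next free byte: 4).
    println!("after f32:  {} bytes", padding_before(4, VEC3_ALIGN));
    // vec3 placed after another vec3 (next free byte: 12).
    println!("after vec3: {} bytes", padding_before(12, VEC3_ALIGN));
    // vec3 placed after a vec4 (next free byte: 16).
    println!("after vec4: {} bytes", padding_before(16, VEC3_ALIGN));
}
```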
Understanding Struct Alignment
The final rule is the most important: "A struct's alignment is equal to the strictest alignment of any of its fields."
Let's build a better mental model. Imagine your uniform buffer is a giant, specialized parking lot, where each byte is one parking space.
A motorcycle (f32, u32) is 4 spaces wide and can park in any spot numbered with a multiple of 4 (0, 4, 8, 12...).
A car (vec2) is 8 spaces wide and can park in any spot numbered with a multiple of 8 (0, 8, 16...).
A large truck (vec4) is 16 spaces wide and can only park in special spots marked with a multiple of 16 (0, 16, 32...). A mat4 is a convoy of four such trucks (64 spaces) that follows the same 16-space rule.
Now, a struct is like a car carrier trailer: a pre-packed container holding several vehicles. The rule for parking the entire trailer in the main lot is determined by its most demanding vehicle.
Example: A Car Carrier with a Truck
#[derive(ShaderType)]
pub struct MixedStruct {
    pub passenger_count: u32, // A motorcycle (4-space requirement)
    pub transport: Vec4,      // A large truck (16-space requirement) ← STRICTEST
}
Most Demanding Vehicle: The Vec4 truck, which needs a 16-space spot.
Trailer's Parking Rule: The entire MixedStruct trailer must be parked starting at a 16-space marker (address 0, 16, 32...).
Why this matters: Nested Trailers and Padding
This rule guarantees that no matter where the trailer is parked (as long as it follows its own rule), none of the vehicles inside will ever violate their parking rules. This often requires leaving empty space (padding) inside the trailer.
#[derive(ShaderType)]
pub struct InnerTrailer {
    pub vehicle: Vec4, // Contains a truck, so this trailer has a 16-space rule
}
#[derive(ShaderType)]
pub struct OuterTrailer {
    pub driver: u32,         // A motorcycle at the front of the trailer
    pub cargo: InnerTrailer, // The inner trailer, containing a truck
}
Here's how the OuterTrailer is laid out:
// Layout of OuterTrailer:
|-- driver (4) --|---------- EMPTY SPACE (12) ----------|---- cargo (InnerTrailer, 16 spaces) ----|
^                ^                                      ^                                         ^
Offset 0         Offset 4                               Offset 16                                 Offset 32
The driver motorcycle is placed at the front of the trailer (offset 0). It takes up 4 spaces.
The next available spot inside the trailer is offset 4.
However, the cargo (the InnerTrailer) contains a truck and therefore has a 16-space parking requirement. It cannot be placed starting at offset 4.
The next spot inside the trailer that is a multiple of 16 is offset 16.
Therefore, the system must leave 12 empty spaces of padding between the motorcycle and the inner trailer to ensure the cargo is correctly positioned.
This ensures that if the OuterTrailer parks at a valid 16-space spot (like address 32), the truck inside the cargo will end up at 32 (trailer start) + 16 (cargo offset) = 48, which is also a valid 16-space spot.
The Rule in Plain English:
A struct must be placed at a memory address that satisfies its most demanding field. This guarantees all its fields will be correctly aligned.
The ShaderType Trait: Your Automatic Layout Manager
Now that you understand the complexity of GPU memory rules, you might be worried about calculating offsets and padding by hand. Fortunately, you rarely have to. Bevy leverages a powerful trait called ShaderType (from the encase crate) to manage this for you automatically.
ShaderType is the bridge between your Rust struct's memory layout and the strict layout the GPU requires. When you derive it, it analyzes your struct's fields and generates the necessary logic to write the data to the GPU buffer with the correct padding and alignment.
What ShaderType Does for You
Think of ShaderType as an expert logistics manager for your "car carrier trailer" struct. It knows the exact parking rules for every vehicle (Vec4, f32, etc.) and automatically arranges the cargo and adds empty space (padding) to ensure the entire trailer is compliant.
use bevy::prelude::*; // brings Vec4 into scope
use bevy::render::render_resource::ShaderType;
#[derive(ShaderType)]
pub struct MyUniforms {
    pub color: Vec4,    // ShaderType knows this is a "truck" with 16-byte alignment.
    pub intensity: f32, // ShaderType knows this is a "motorcycle" with 4-byte alignment.
    // ...and it knows how to pad the whole struct correctly to a multiple of 16.
}
By deriving ShaderType, you get:
Correct Sizing: It calculates the total size of the struct, including all necessary padding to satisfy the struct's overall alignment.
Automatic Padding: It inserts padding between fields when writing to the GPU buffer, ensuring every field meets its alignment requirement.
Layout Contract: It guarantees the final layout in the GPU buffer matches what your WGSL shader expects, fulfilling the memory contract.
Many of Bevy's built-in math and color types already implement ShaderType, so you can use them in your uniform structs immediately:
f32, u32, i32
Vec2, Vec3, Vec4
Mat2, Mat3, Mat4
LinearRgba, Color (when converted)
Arrays and tuples of the above types.
The Two-Struct Pattern Revisited
Using ShaderType is straightforward: you derive it on the struct you intend to use for your uniform data. This is the second half of the two-struct pattern we've been using. Bevy's AsBindGroup derive macro requires the type of any field marked #[uniform(...)] to implement ShaderType.
Let's look at a nested example to see it in action.
// First, define a struct for lighting parameters.
#[derive(ShaderType)]
pub struct LightingParams {
    pub ambient: Vec3,
    pub diffuse: Vec3,
    pub specular: Vec3,
}
// Now, use it in our main material struct.
#[derive(Asset, TypePath, AsBindGroup, Debug, Clone)]
pub struct MyMaterial {
    #[uniform(0)]
    pub lighting: LightingParams, // This works because LightingParams derives ShaderType
}
When you create an instance of LightingParams, you don't see any padding.
// You write this simple, clean Rust code:
let params = LightingParams {
    ambient: Vec3::new(0.1, 0.1, 0.1),  // 12 bytes of data
    diffuse: Vec3::new(0.8, 0.8, 0.8),  // 12 bytes of data
    specular: Vec3::new(1.0, 1.0, 1.0), // 12 bytes of data
};
However, ShaderType does the hard work behind the scenes. Because Vec3 has a 16-byte alignment, it inserts 4 bytes of padding after ambient and after diffuse so that the next field starts on a 16-byte boundary, and 4 more after specular so that the struct's total size is a multiple of 16.
What ShaderType writes to the GPU buffer:
// Total size: 48 bytes (16 * 3)
|-- ambient (12) --|-- pad (4) --|-- diffuse (12) --|-- pad (4) --|-- specular (12) --|-- pad (4) --|
^                  ^             ^                  ^             ^                   ^             ^
Offset 0           Offset 12     Offset 16          Offset 28     Offset 32           Offset 44     Offset 48
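To see what such a padded write amounts to at the byte level, here is a minimal hand-rolled sketch. It is not encase's actual implementation; write_lighting is a hypothetical function, arrays stand in for Vec3, and little-endian byte order is assumed for illustration.

```rust
/// Writes three RGB triples into a 48-byte buffer using the padded layout
/// shown above: each 12-byte vec3 occupies its own 16-byte slot.
fn write_lighting(ambient: [f32; 3], diffuse: [f32; 3], specular: [f32; 3]) -> [u8; 48] {
    let mut buffer = [0u8; 48];
    for (slot, vec3) in [ambient, diffuse, specular].iter().enumerate() {
        let base = slot * 16; // offsets 0, 16, 32
        for (i, component) in vec3.iter().enumerate() {
            let start = base + i * 4;
            buffer[start..start + 4].copy_from_slice(&component.to_le_bytes());
        }
        // Bytes base+12 .. base+16 stay zero: that is the padding.
    }
    buffer
}

fn main() {
    let bytes = write_lighting([0.1; 3], [0.8; 3], [1.0; 3]);
    // diffuse.x lives at offset 16, not 12.
    let diffuse_x = f32::from_le_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    println!("buffer length = {}, diffuse.x = {diffuse_x}", bytes.len());
}
```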
The "unused function" Warning
When you derive ShaderType, the Rust compiler will often show a warning like: warning: function 'check' is never used. This is expected and harmless.
The ShaderType derive macro generates a validation function called check. Its only job is to run at compile time to verify that all fields in your struct implement the necessary traits. If you used a type that can't be sent to a shader, this check function would cause a compile error, which is exactly what you want. Since it's never called at runtime, the compiler flags it as "dead code."
How to suppress it cleanly:
The best practice is to place your ShaderType definitions in a dedicated submodule and apply #![allow(dead_code)] to that module. This keeps your main codebase clean and isolates the warning suppression to only where it's needed.
// src/materials/my_material.rs
use bevy::prelude::*;
use bevy::render::render_resource::{AsBindGroup, ShaderRef};
// Define a private module for our shader-facing types.
mod uniforms {
    #![allow(dead_code)] // This attribute applies only to the `uniforms` module.
    use bevy::prelude::*;
    use bevy::render::render_resource::ShaderType;
    #[derive(ShaderType, Debug, Clone)]
    pub struct MyUniforms {
        pub color: Vec4,
        pub intensity: f32,
    }
}
// Re-export the type so it can be used elsewhere.
pub use uniforms::MyUniforms;
#[derive(Asset, TypePath, AsBindGroup, Debug, Clone)]
pub struct MyMaterial {
    #[uniform(0)]
    pub uniforms: MyUniforms,
}
// ... rest of your material impl
ShaderType is fantastic for automating the layout of structs. However, as we saw with the LightingParams example, it can sometimes produce layouts that are inefficient or don't solve every problem. For those cases, we need to move from automatic management to manual control.
The #[align(16)] Attribute: Your Most Common Fix
While ShaderType is powerful, it can't read your mind. It applies WGSL's default layout rules, and sometimes the layout you need differs from the default. We need a way to give it explicit instructions.
Now, you might be thinking, "Wait a minute. In the LightingParams example, ShaderType seemed to automatically add padding between Vec3 fields. When does it not?"
ShaderType adds padding between fields only when it's forced to do so by the alignment requirements of the next field.
Let's compare the two cases:
Vec3 followed by Vec3 (like in LightingParams):
ambient: Vec3 is placed at offset 0.
The next field, diffuse: Vec3, requires 16-byte alignment.
The next available spot (offset 12) is invalid for diffuse.
Therefore, ShaderType must insert 4 bytes of padding to start diffuse at offset 16.
Vec3 followed by f32 (our new case):
position: Vec3 is placed at offset 0.
The next field, intensity: f32, only requires 4-byte alignment.
The next available spot (offset 12) is a valid 4-byte aligned address.
ShaderType sees no reason to add padding and places intensity at offset 12. WGSL applies the same rule to a vec3<f32> followed by an f32, so the two sides agree as long as the shader declares the same fields in the same order. They disagree when the shader reserves a full 16 bytes for the vector, for example by declaring intensity with @align(16) or position with @size(16).
This is where we must intervene. When a field has to start on a 16-byte boundary regardless of what precedes it, we need to say so explicitly. The #[align(16)] attribute, which mirrors WGSL's @align(16), is that instruction.
Example: Fixing the Misaligned Vec3
The #[align(16)] attribute is the tool to reach for when a field must start on a 16-byte boundary.
use bevy::prelude::*;
use bevy::render::render_resource::ShaderType;
// ✗ BROKEN if the shader expects `intensity` at offset 16:
// by default ShaderType packs it at offset 12, right after `position`.
#[derive(ShaderType)]
pub struct BrokenUniforms {
    pub position: Vec3,
    pub intensity: f32,
}
// ✓ FIXED: We add the attribute to the field that must start on a 16-byte boundary.
// The WGSL struct declares `@align(16) intensity: f32` to match.
#[derive(ShaderType)]
pub struct FixedUniforms {
    pub position: Vec3,
    // This attribute tells ShaderType: "Start this field at the next
    // multiple of 16, leaving 4 bytes of padding after `position`."
    #[align(16)]
    pub intensity: f32,
}
By attaching #[align(16)] to intensity, you are making the padding rule explicit, removing any ambiguity for the layout system.
Visualizing the Memory Layout
Without the fix, the Rust data and a shader that expects intensity at offset 16 are out of sync.
// What your Rust code sends (16 bytes total):
|-- position (12 bytes) --|-- intensity (4) --|
^                         ^
Offset 0                  Offset 12
// What the GPU tries to read:
// The shader looks for `intensity` at offset 16, past the end of the data.
|-- position (12 bytes) --|-- ??? (4) --|-- ??? (4) --|
^                         ^             ^
Offset 0                  Offset 12     Offset 16 (Reads garbage from here)
The #[align(16)] attribute forces 4 bytes of padding to be inserted after position, making the layout in the GPU buffer match perfectly.
// What your Rust code now sends, correctly padded:
// The struct's data ends at byte 20, so it's padded to a total size of 32.
|-- position (12) --|-- pad (4) --|-- intensity (4) --|-- struct padding (12) --|
^                   ^             ^                   ^                         ^
Offset 0            Offset 12     Offset 16           Offset 20                 Offset 32
Now, when the GPU goes to read intensity from offset 16, it finds the correct data.
When to Use #[align(16)]
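You should reach for #[align(16)] when a field has to start on a 16-byte boundary that the default rules would not give it, typically to mirror an @align(16) in the shader's struct declaration. A Vec3 followed by a scalar is the case you will meet most often, because by default the scalar is packed into the last 4 bytes of the Vec3's 16-byte slot.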
Controlling Offsets with size and align: Full Manual Control
For the vast majority of cases, ShaderType combined with #[align(16)] is all you need. However, in some advanced scenarios, you might need absolute control over the exact byte offset of every field. The ShaderType derive (from encase) has no attribute that sets an offset directly. Instead, its #[size(N)] and #[align(N)] field attributes, which mirror WGSL's @size and @align, let you pin each field where you want it.
When you use these attributes, you are taking full responsibility for the memory layout. #[size(N)] reserves N bytes for a field, which moves every field after it, and the WGSL struct must carry the matching attributes.
#[derive(ShaderType)]
pub struct PreciseLayout {
    pub color: Vec4, // Starts at byte 0, ends at 15.
    #[size(16)]
    pub intensity: f32, // Starts at byte 16; 16 bytes are reserved for it.
    // NOTE: There is a 12-byte gap here (from 20 to 31) that we've
    // created manually.
    pub time: f32, // Starts at byte 32, ends at 35.
}
Warning: With great power comes great responsibility. Overriding the default layout is risky. If the attributes on the Rust struct and the WGSL struct ever drift apart, the code still compiles but the shader silently reads the wrong bytes.
When to use manual layout
Manual layout is rarely needed, but can be useful for:
Matching an External Layout: When interfacing with a graphics API or compute shader written by another team or in another language that has a very specific, non-standard memory layout you must match exactly.
Performance Tuning: In performance-critical compute shaders, manual layout can sometimes be used to optimize data access patterns, though this is an advanced optimization.
Debugging: Spelling out sizes and alignments can be a useful tool for debugging a complex alignment issue, as it makes your intended layout crystal clear.
For general shader development in Bevy, you should always prefer the automatic layout provided by ShaderType. Only override it when you have a specific, documented reason to do so.
Common Scenarios and Pitfalls
Theory is important, but seeing how these rules apply in real-world code is where the learning sticks. Let's explore the most common layout problems Bevy developers encounter and the idiomatic ways to solve them.
Scenario 1: The Vec3 Trap
This is, by far, the most common source of alignment bugs. The problem occurs when a Vec3 is followed by a type with a less-strict alignment requirement (like an f32 or u32) and the two sides of the contract disagree about where that next field lives.
The Problem
// ✗ PROBLEM: A Vec3 followed by a type with a less-strict alignment.
#[derive(ShaderType)]
pub struct LightData {
    pub position: Vec3, // Occupies offsets 0-11
    pub intensity: f32, // Placed at offset 12 by ShaderType
}
This is the classic bug. position ends at offset 11. The next field, intensity, only needs 4-byte alignment. Since offset 12 is a valid 4-byte address, ShaderType places intensity there, exactly as WGSL does for a vec3<f32> followed by an f32. The bug appears when the shader side was written on the assumption that every vec3 fills a whole 16-byte slot, for example by declaring position as a vec4<f32> or by adding explicit padding after it. The shader will then try to read intensity from offset 16, resulting in garbage data.
Note on Vec3 followed by Vec3: If a Vec3 is followed by another Vec3, ShaderType does add padding automatically, because the second Vec3's 16-byte requirement forces it to. This works, but it means the same Vec3 field is padded in one struct and not in another. It is far better to make your memory layout obvious and robust using the solutions below.
Solution A (Best): Use Vec4
The simplest and most robust solution is to use a Vec4 instead of a Vec3 on the Rust side and a vec4<f32> in the shader. It's naturally 16-byte aligned and 16 bytes in size, so nothing can be packed into its slot by accident. You simply ignore the .w component in your shader code. The tiny memory cost is almost always worth the correctness and clarity.
// ✓ BEST: Use Vec4 and forget about alignment issues.
#[derive(ShaderType)]
pub struct LightData {
    pub position: Vec4, // In WGSL, use light.position.xyz
    pub intensity: f32,
}
Solution B (Alternative): Explicit Alignment
If you cannot use a Vec4, the #[align(16)] attribute is the explicit fix. Placed on the field that follows the Vec3, it forces that field onto the next 16-byte boundary. The shader's struct needs the matching @align(16).
// ✓ ALTERNATIVE: Manually align the field after the Vec3.
#[derive(ShaderType)]
pub struct LightData {
    pub position: Vec3,
    #[align(16)]
    pub intensity: f32, // Now at offset 16
}
Scenario 2: Inefficient Field Ordering
The Problem
Placing small types before large types can force the layout system to insert large, wasteful chunks of padding.
// ✗ INEFFICIENT: Creates 12 bytes of padding after `is_active` and 12 more after `intensity`.
#[derive(ShaderType)]
pub struct InefficientLayout {
    pub is_active: u32,  // 4 bytes, then needs to pad to 16 for `transform`
    pub transform: Mat4, // 64 bytes (16-byte alignment)
    pub intensity: f32,  // 4 bytes, then needs to pad to 16 for `color`
    pub color: Vec4,     // 16 bytes
}
// Total size: 112 bytes for 88 bytes of data.
The Solution: Order Fields from Largest to Smallest
By reordering the fields to place the largest, most-aligned types first, you allow the smaller types to be packed together more tightly afterward, minimizing wasted space.
// ✓ EFFICIENT: Largest types first minimizes padding.
#[derive(ShaderType)]
pub struct EfficientLayout {
    // Largest alignment first
    pub transform: Mat4, // 64 bytes
    pub color: Vec4,     // 16 bytes
    // Smaller types can be packed together after
    pub is_active: u32,  // 4 bytes
    pub intensity: f32,  // 4 bytes
}
Scenario 3: Arrays of Structs
The Problem
When you create an array of custom structs in a uniform buffer, there's a critical rule: each element in the array must start on a 16-byte boundary. This means the space each element occupies (the stride) must be a multiple of 16. ShaderType rounds the size up for you when it writes the array, but if that padding is not part of your mental model, or you upload the data as raw bytes, the second element lands at the wrong offset and all subsequent data is misaligned.
Let's look at a struct whose fields don't add up to a multiple of 16.
// ✗ PROBLEM: This struct holds 20 bytes of data (16 for `color` + 4 for `intensity`).
// 20 is not a multiple of 16.
#[derive(ShaderType, Clone)]
pub struct InstanceData {
    pub color: Vec4,
    pub intensity: f32,
}
In an array like instances: [InstanceData; 2]:
instances[0] starts at offset 0.
Tightly packed data would place instances[1] at offset 20.
The GPU expects instances[1] to be at offset 32 (the next multiple of 16). This is a major misalignment.
The Solution: Pad the Struct to a Multiple of 16
To make the layout explicit, you can add padding fields to your struct that round its total size up to the next multiple of 16.
Calculate the current size: size_of::<Vec4>() + size_of::<f32>() = 16 + 4 = 20 bytes.
Find the next multiple of 16: The next multiple of 16 after 20 is 32.
Calculate the required padding: 32 - 20 = 12 bytes.
Add padding fields: Add unused fields to fill that 12-byte gap. Three u32 or f32 fields are perfect for this: 3 * 4 = 12 bytes. Avoid a [u32; 3] array in a uniform buffer, because an array of scalars has a 4-byte stride, which the uniform layout rules reject.
Here is the corrected struct:
// ✓ SOLUTION: Add explicit padding to make the total size a multiple of 16.
#[derive(ShaderType, Clone)]
pub struct InstanceData {
    pub color: Vec4,    // 16 bytes
    pub intensity: f32, // 4 bytes
    // --- 20 bytes so far ---
    // Add 12 bytes of padding to reach the next multiple of 16 (32).
    // The leading underscore marks these fields as intentionally unused.
    pub _padding0: u32,
    pub _padding1: u32,
    pub _padding2: u32, // 3 * 4 = 12 bytes
}
// Total size is now 32 bytes, which is safe for use in an array.
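The element offsets follow directly from the stride. A minimal sketch, with array_stride and element_offset as hypothetical helpers:

```rust
/// Stride between consecutive array elements: the element's size rounded up
/// to its alignment. In a uniform buffer, WGSL additionally requires this
/// stride to be a multiple of 16.
fn array_stride(element_size: usize, element_align: usize) -> usize {
    element_size.next_multiple_of(element_align)
}

/// Byte offset of element `index` within the array.
fn element_offset(index: usize, element_size: usize, element_align: usize) -> usize {
    index * array_stride(element_size, element_align)
}

fn main() {
    // InstanceData holds 20 bytes of data (Vec4 + f32) and has 16-byte alignment.
    for index in 0..3 {
        println!(
            "instances[{index}] starts at offset {}",
            element_offset(index, 20, 16)
        );
    }
}
```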
Debugging Alignment Issues
Sooner or later, you will get a memory layout wrong. Your shader won't crash; it will just give you bizarre, nonsensical results. When your visuals look corrupted, it's almost always an alignment problem. Here’s how to diagnose the symptoms.
Symptom 1: The Last Field is Garbage
You send a struct with a color and an intensity. The color appears correctly in the shader, but the intensity value is a seemingly random, often huge number.
Likely Cause
The two sides disagree about the struct's size or about where its last field sits. The GPU rounds the struct up to a multiple of its alignment (usually 16), so if the bytes you upload are shorter than that, or the WGSL struct declares padding that the Rust struct lacks, the shader reads the last field from the wrong address.
Debug Steps
Print the GPU-side size that ShaderType computes: println!("GPU size is {}", MyUniforms::min_size());. Note that std::mem::size_of reports the Rust in-memory size, which can differ.
Compare it with the WGSL struct. The GPU requires the total size to be a multiple of the largest member's alignment (16 for a Vec4), so the GPU-side struct is padded to the next multiple of 16.
The Fix
Add explicit padding to your Rust struct, and the same fields to the WGSL struct, so both sides spell out the full size.
// In this case, the data is 20 bytes. GPU pads to 32. Make the 12 bytes explicit.
#[derive(ShaderType)]
struct MyUniforms {
    pub color: Vec4,
    pub intensity: f32,
    // Three scalars rather than [f32; 3]: scalar arrays are not valid in uniform buffers.
    pub _padding0: f32,
    pub _padding1: f32,
    pub _padding2: f32, // 12 bytes in total, making the size 32
}
Symptom 2: A Field in the Middle is Garbage
The first one or two fields in your struct work correctly, but a field after them is wrong, and every subsequent field is also wrong.
// Rust Struct
#[derive(ShaderType)]
struct MyUniforms {
    pub time: f32,       // Works fine
    pub direction: Vec3, // THIS is the problem
    pub speed: f32,      // This and all later fields are garbage
}
Likely Cause
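You have a Vec3 that isn't the last field, and the Rust and WGSL sides place it, or the field after it, at different offsets. This happens when the shader declares a vec4<f32> (or extra padding) where Rust has a Vec3, or when the struct is uploaded as raw bytes instead of through ShaderType, since a plain Rust Vec3 needs only 4-byte alignment.
Debug Steps
Scan your struct for any Vec3 fields that are not at the very end, and compare each one against the matching line of the WGSL struct.
The Fix
Change the Vec3 to a Vec4 on both sides and reorder your fields for efficiency, or add #[align(16)] where a field must start on a 16-byte boundary.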
// Fix (Preferred): Use Vec4 and reorder
#[derive(ShaderType)]
struct MyUniforms {
    pub direction: Vec4, // Moved to top for better packing
    pub time: f32,
    pub speed: f32,
    // ... plus end-of-struct padding
}
Symptom 3: Everything Looks "Shifted"
Each field in your shader seems to be reading the value of the next field in the struct. uniform.time has the value of uniform.speed, uniform.speed has the value of uniform.intensity, and so on.
Likely Cause
This is another classic Vec3 problem, often happening near the start of the struct. One side reserves 4 bytes of padding that the other does not, which shifts all subsequent data by 4 bytes.
Debug Steps
This has the same cause as Symptom 2. Go through your struct and ensure every Vec3 occupies the same space on both sides, either by converting it to a Vec4 or by using #[align(16)] with a matching @align(16) in the shader.
Check your field ordering. Make sure large-alignment types (Mat4, Vec4, Vec3) come before small-alignment types (f32, u32).
Debug Helper: Verifying Your Layout in Code
Don't guess! ShaderType can report the size it will write to the GPU, and Rust's std::mem module reports the in-memory size and alignment of the Rust struct. The two can differ, so print both. Add this to your setup code to verify your assumptions.
// In your setup() system, or any other temporary code
// (min_size needs `use bevy::render::render_resource::ShaderType;` in scope)
println!("--- MyUniforms Layout ---");
println!("GPU size: {} bytes", MyUniforms::min_size());
println!("Rust size: {} bytes", std::mem::size_of::<MyUniforms>());
println!("Rust alignment: {} bytes", std::mem::align_of::<MyUniforms>());
println!("------------------------");
// Expected GPU size for a well-formed struct (e.g., one Vec4 and one f32):
// GPU size: 32 bytes (or a multiple of 16)
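When inspecting the Rust side, per-field offsets are often more telling than the total size. The sketch below uses std::mem::offset_of! on a hypothetical #[repr(C)] struct, RawUniforms, in which a [f32; 3] stands in for a 3-component vector so the block compiles without Bevy. It shows Rust's in-memory layout only, which is what would reach the GPU if the bytes were uploaded without ShaderType.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical `#[repr(C)]` mirror of a uniform struct.
#[repr(C)]
#[allow(dead_code)]
struct RawUniforms {
    time: f32,
    direction: [f32; 3],
    speed: f32,
}

fn main() {
    println!("size      = {}", size_of::<RawUniforms>());
    println!("alignment = {}", align_of::<RawUniforms>());
    println!("time      @ {}", offset_of!(RawUniforms, time));
    println!("direction @ {}", offset_of!(RawUniforms, direction));
    println!("speed     @ {}", offset_of!(RawUniforms, speed));
    // WGSL places the matching `vec3<f32>` at offset 16 and `speed` at 28,
    // so these raw bytes could not be uploaded as-is.
}
```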
Best Practices for Robust Uniform Layout
Follow these guidelines to avoid alignment issues entirely and design memory-efficient, future-proof uniform structs.
1. Prefer Vec4 Over Vec3
This is the golden rule. It eliminates the most common alignment bug entirely and makes your layout trivial to reason about. The small amount of "wasted" memory is a tiny price to pay for correctness and peace of mind.
// ✗ Requires attention: 12 bytes of data in a 16-byte-aligned slot
pub struct Material {
    pub color: Vec3,
    // ...
}
// ✓ BEST: Simple, safe, and always 16-byte aligned
pub struct Material {
    pub color: Vec4, // Use .rgb in your WGSL shader
    // ...
}
2. Order Fields by Size (Largest Alignment First)
Arrange your fields with the highest alignment requirements first. This allows smaller types to pack efficiently at the end, minimizing overall padding.
| Alignment Group | Rust Types |
| 16-byte | Mat4, Mat3, Vec4, Vec3 |
| 8-byte | Vec2 |
| 4-byte | f32, u32, i32 |
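To make the cost of poor ordering concrete, here is a small standalone sketch that compares two orderings of the same four fields. It assumes the sizes and alignments from the table above; struct_size and the F32, VEC2, and VEC4 constants are hypothetical helpers written for this illustration.

```rust
/// Round `offset` up to the next multiple of `align` (a power of two).
fn round_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

/// Total size of a struct whose fields are (size, align) pairs,
/// placed in declaration order.
fn struct_size(fields: &[(usize, usize)]) -> usize {
    let mut cursor = 0;
    let mut max_align = 1;
    for &(size, align) in fields {
        cursor = round_up(cursor, align) + size;
        max_align = max_align.max(align);
    }
    round_up(cursor, max_align)
}

// (size, align) in bytes for the types used below.
const F32: (usize, usize) = (4, 4);
const VEC2: (usize, usize) = (8, 8);
const VEC4: (usize, usize) = (16, 16);

fn main() {
    let interleaved = struct_size(&[F32, VEC4, F32, VEC2]);
    let sorted = struct_size(&[VEC4, VEC2, F32, F32]);
    println!("interleaved: {interleaved} bytes, sorted: {sorted} bytes");
}
```

Both orderings carry the same 32 bytes of data, but the interleaved version grows to 48 bytes because of padding, while the sorted version needs none.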
3. Explicitly Pad Structs Used in Arrays
If a struct is destined to be an array element (e.g., lights: [LightData; 10]), its total size must be a multiple of 16. Use std::mem::size_of to check, and add explicit _padding fields if it's not.
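The size check can be turned into a compile-time guard so it cannot be forgotten. This is a minimal sketch using a plain #[repr(C)] mirror; LightRecord is a hypothetical stand-in that uses [f32; 4] in place of Vec4, so it validates the intended element size rather than anything Bevy generates.

```rust
use std::mem::size_of;

/// Plain-Rust mirror of one array element (hypothetical).
#[repr(C)]
#[derive(Clone, Copy, Default)]
struct LightRecord {
    position: [f32; 4], // 16 bytes
    intensity: f32,     // 4 bytes
    _pad0: f32,         // 12 bytes of explicit padding
    _pad1: f32,
    _pad2: f32,
}

// Fails the build if the element size stops being a multiple of 16.
const _: () = assert!(size_of::<LightRecord>() % 16 == 0);

fn main() {
    let lights = [LightRecord::default(); 10];
    println!(
        "element: {} bytes, array: {} bytes",
        size_of::<LightRecord>(),
        std::mem::size_of_val(&lights)
    );
}
```

If someone later adds a field without adjusting the padding, the const assertion stops compilation instead of letting a corrupted array reach the shader.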
4. Document Your Layout
If you use manual alignment or explicit padding, explain why in the struct's doc comments. This makes your code maintainable for your future self and for others.
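/// GPU memory layout for an array of lights.
/// Total size must be a multiple of 16.
/// Current size: 16 (pos) + 4 (intensity) = 20.
/// Padding: 12 bytes needed to reach 32.
/// The padding uses scalar fields rather than `[u32; 3]`, because an array
/// inside a uniform buffer needs a 16-byte element stride.
#[derive(ShaderType)]
pub struct LightData {
    pub position: Vec4,
    pub intensity: f32,
    pub _padding0: u32,
    pub _padding1: u32,
    pub _padding2: u32,
}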
By following these four principles, you can design uniform structures with confidence, knowing the data being sent to your GPU is exactly what your shader expects.
Complete Example: Interactive Wave Animation
This mini-project demonstrates a perfectly aligned uniform struct used to drive a vertex and fragment shader, creating an interactive, animated material.
Our Goal
We will create a material that displaces the vertices of a sphere to create a dynamic wave effect. All the parameters controlling the wave's shape, speed, and color will be defined in a single, properly laid out uniform struct, and we will be able to adjust these parameters in real-time.
What This Project Demonstrates
Best Practices in Action: The AnimationParams struct is a perfect example of ordering fields by size, using Vec4 for Vec3 data, and adding explicit padding to ensure a robust layout.
Complex Uniforms: Shows how to manage a struct with multiple different data types (Vec4, f32, u32) without alignment errors.
Vertex and Fragment Animation: The uniform data is used in the vertex shader to displace vertices and in the fragment shader to shift colors.
Interactivity: We will update the uniform values from our Rust application every frame in response to user input, demonstrating the power of uniforms for dynamic effects.
The Shader (assets/shaders/d01_07_animated_material.wgsl)
This single shader file contains both our vertex and fragment logic. Note how the AnimationParams struct in WGSL mirrors the Rust struct's layout perfectly, fulfilling our memory contract.
#import bevy_pbr::mesh_functions
#import bevy_pbr::view_transformations::position_world_to_clip
#import bevy_pbr::forward_io::VertexOutput
// Properly aligned uniform struct
struct AnimationParams {
// 16-byte aligned types first
base_color: vec4<f32>, // Offset 0-15
wave_direction: vec4<f32>, // Offset 16-31 (using vec4 for vec3 + padding)
// Small types grouped together
time: f32, // Offset 32-35
frequency: f32, // Offset 36-39
amplitude: f32, // Offset 40-43
speed: f32, // Offset 44-47
// More parameters
wave_count: u32, // Offset 48-51
color_shift: f32, // Offset 52-55
_padding: vec2<f32>, // Offset 56-63 (explicit padding)
}
@group(2) @binding(0)
var<uniform> params: AnimationParams;
@vertex
fn vertex(
@builtin(instance_index) instance_index: u32,
@location(0) position: vec3<f32>,
@location(1) normal: vec3<f32>,
) -> VertexOutput {
var out: VertexOutput;
let world_from_local = mesh_functions::get_world_from_local(instance_index);
var world_position = mesh_functions::mesh_position_local_to_world(
world_from_local,
vec4<f32>(position, 1.0)
);
// Apply wave animation using properly aligned uniforms
let wave_dir = params.wave_direction.xyz;
let offset = dot(position, wave_dir) * params.frequency + params.time * params.speed;
// Create layered waves
var displacement = 0.0;
for (var i = 0u; i < params.wave_count; i = i + 1u) {
let freq = f32(i + 1u) * params.frequency;
let amp = params.amplitude / f32(i + 1u);
displacement = displacement + sin(offset * freq) * amp;
}
// Apply displacement along normal
world_position = world_position + vec4<f32>(normal * displacement, 0.0);
out.position = position_world_to_clip(world_position.xyz);
out.world_normal = mesh_functions::mesh_normal_local_to_world(normal, instance_index);
out.world_position = world_position;
return out;
}
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
// Color animation using uniforms
let normal = normalize(in.world_normal);
let wave_influence = dot(normal, params.wave_direction.xyz);
// Shift color based on wave
let hue_shift = wave_influence * params.color_shift;
var color = params.base_color.rgb;
// Simple hue shift
color = color + vec3<f32>(hue_shift, -hue_shift * 0.5, hue_shift * 0.3);
color = clamp(color, vec3<f32>(0.0), vec3<f32>(1.0));
return vec4<f32>(color, params.base_color.a);
}
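When tuning the parameters, it can help to evaluate the layered-wave loop on the CPU. The following is a minimal sketch that mirrors the vertex shader's arithmetic in plain Rust; layered_displacement and max_displacement are hypothetical helpers, not part of the demo code.

```rust
/// CPU-side mirror of the vertex shader's layered-wave loop.
fn layered_displacement(offset: f32, frequency: f32, amplitude: f32, wave_count: u32) -> f32 {
    let mut displacement = 0.0;
    for i in 0..wave_count {
        let layer = (i + 1) as f32;
        let freq = layer * frequency; // each layer oscillates faster...
        let amp = amplitude / layer; // ...and contributes less height
        displacement += (offset * freq).sin() * amp;
    }
    displacement
}

/// Upper bound on |displacement|: amplitude * (1 + 1/2 + ... + 1/n).
fn max_displacement(amplitude: f32, wave_count: u32) -> f32 {
    (1..=wave_count).map(|i| amplitude / i as f32).sum()
}

fn main() {
    let d = layered_displacement(0.75, 2.0, 0.3, 3);
    println!("displacement = {d:.4}, bound = {:.4}", max_displacement(0.3, 3));
}
```

The bound is useful when picking limits for amplitude and wave_count: with the demo defaults (amplitude 0.3, three layers) no vertex can move more than 0.55 units along its normal.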
The Rust Material (src/materials/d01_07_animated_material.rs)
This file defines our two-struct material. AnimationParams is our shader-facing data struct, carefully laid out for the GPU according to the best practices we've discussed. AnimatedMaterial is the Bevy Asset that holds it.
use bevy::prelude::*;
use bevy::render::render_resource::{AsBindGroup, ShaderRef};
// Uniform types in a separate module to isolate the dead_code warnings
mod uniforms {
#![allow(dead_code)] // Suppresses ShaderType's generated check functions
use bevy::prelude::*;
use bevy::render::render_resource::ShaderType;
// Properly aligned uniform struct matching WGSL layout
#[derive(ShaderType, Debug, Clone)]
pub struct AnimationParams {
// 16-byte aligned types first
pub base_color: Vec4, // Offset 0-15
pub wave_direction: Vec4, // Offset 16-31 (vec3 + padding as vec4)
// Small types grouped together
pub time: f32, // Offset 32-35
pub frequency: f32, // Offset 36-39
pub amplitude: f32, // Offset 40-43
pub speed: f32, // Offset 44-47
pub wave_count: u32, // Offset 48-51
pub color_shift: f32, // Offset 52-55
pub _padding: Vec2, // Offset 56-63 (explicit padding)
}
}
// Re-export the uniform type
pub use uniforms::AnimationParams;
// Helper for creating animation params
impl AnimationParams {
pub fn new(base_color: Color) -> Self {
Self {
base_color: base_color.to_linear().to_vec4(),
wave_direction: Vec3::new(1.0, 0.5, 0.0).extend(0.0), // vec3 as vec4
time: 0.0,
frequency: 2.0,
amplitude: 0.3,
speed: 1.0,
wave_count: 3,
color_shift: 0.2,
_padding: Vec2::ZERO,
}
}
}
#[derive(Asset, TypePath, AsBindGroup, Debug, Clone)]
pub struct AnimatedMaterial {
#[uniform(0)]
pub params: AnimationParams,
}
impl Material for AnimatedMaterial {
fn vertex_shader() -> ShaderRef {
"shaders/d01_07_animated_material.wgsl".into()
}
fn fragment_shader() -> ShaderRef {
"shaders/d01_07_animated_material.wgsl".into()
}
}
Don't forget to add it to src/materials/mod.rs:
// ... other materials
pub mod d01_07_animated_material;
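One way to double-check the offset comments on AnimationParams is to mirror the struct in plain Rust and let the compiler report the offsets. This sketch assumes Rust 1.77 or newer for std::mem::offset_of!, and uses a hypothetical AnimationParamsMirror with [f32; N] arrays standing in for Vec4 and Vec2. It checks the intended layout, not what ShaderType actually writes.

```rust
use std::mem::{offset_of, size_of};

/// Plain `#[repr(C)]` stand-in for `AnimationParams` (hypothetical).
/// Arrays replace glam's vector types so the sketch needs no crates.
#[repr(C)]
struct AnimationParamsMirror {
    base_color: [f32; 4],
    wave_direction: [f32; 4],
    time: f32,
    frequency: f32,
    amplitude: f32,
    speed: f32,
    wave_count: u32,
    color_shift: f32,
    _padding: [f32; 2],
}

fn main() {
    println!("wave_direction @ {}", offset_of!(AnimationParamsMirror, wave_direction));
    println!("time           @ {}", offset_of!(AnimationParamsMirror, time));
    println!("wave_count     @ {}", offset_of!(AnimationParamsMirror, wave_count));
    println!("_padding       @ {}", offset_of!(AnimationParamsMirror, _padding));
    println!("total size     = {}", size_of::<AnimationParamsMirror>());
}
```

The reported offsets (16, 32, 48, 56) and the 64-byte total match the comments in both the Rust and WGSL structs.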
The Demo Module (src/demos/d01_07_animated_material.rs)
This file sets up our Bevy demo. It registers the material, spawns the sphere, and contains the systems that update the time uniform, handle user input for parameter adjustments, and update the UI to display the current values in real-time.
use crate::materials::d01_07_animated_material::{AnimatedMaterial, AnimationParams};
use bevy::prelude::*;
pub fn run() {
App::new()
.add_plugins(DefaultPlugins.set(AssetPlugin {
watch_for_changes_override: Some(true),
..default()
}))
.add_plugins(MaterialPlugin::<AnimatedMaterial>::default())
.add_systems(Startup, setup)
.add_systems(
Update,
(rotate_camera, update_animation, handle_input, update_ui),
)
.run();
}
fn setup(
mut commands: Commands,
mut meshes: ResMut<Assets<Mesh>>,
mut materials: ResMut<Assets<AnimatedMaterial>>,
) {
// Create animated sphere
commands.spawn((
Mesh3d(meshes.add(Sphere::new(1.0).mesh().uv(64, 32))),
MeshMaterial3d(materials.add(AnimatedMaterial {
params: AnimationParams::new(Color::srgb(0.2, 0.6, 1.0)),
})),
));
// Light
commands.spawn((
PointLight {
shadows_enabled: true,
intensity: 2000.0,
..default()
},
Transform::from_xyz(4.0, 8.0, 4.0),
));
// Camera
commands.spawn((
Camera3d::default(),
Transform::from_xyz(-3.0, 2.5, 6.0).looking_at(Vec3::ZERO, Vec3::Y),
));
// UI
commands.spawn((
Text::new(""),
Node {
position_type: PositionType::Absolute,
top: Val::Px(10.0),
left: Val::Px(10.0),
..default()
},
));
}
fn rotate_camera(time: Res<Time>, mut camera_query: Query<&mut Transform, With<Camera3d>>) {
for mut transform in camera_query.iter_mut() {
let radius = 6.0;
let angle = time.elapsed_secs() * 0.3;
transform.translation.x = angle.cos() * radius;
transform.translation.z = angle.sin() * radius;
transform.look_at(Vec3::ZERO, Vec3::Y);
}
}
fn handle_input(
keyboard: Res<ButtonInput<KeyCode>>,
time: Res<Time>,
mut materials: ResMut<Assets<AnimatedMaterial>>,
) {
let delta = time.delta_secs();
for (_, material) in materials.iter_mut() {
// Frequency
if keyboard.pressed(KeyCode::ArrowUp) {
material.params.frequency = (material.params.frequency + delta * 2.0).min(10.0);
}
if keyboard.pressed(KeyCode::ArrowDown) {
material.params.frequency = (material.params.frequency - delta * 2.0).max(0.1);
}
// Amplitude
if keyboard.pressed(KeyCode::ArrowRight) {
material.params.amplitude = (material.params.amplitude + delta / 2.0).min(1.0);
}
if keyboard.pressed(KeyCode::ArrowLeft) {
material.params.amplitude = (material.params.amplitude - delta / 2.0).max(0.0);
}
// Wave count
if keyboard.just_pressed(KeyCode::KeyQ) {
material.params.wave_count = material.params.wave_count.saturating_sub(1).max(1);
}
if keyboard.just_pressed(KeyCode::KeyE) {
material.params.wave_count = (material.params.wave_count + 1).min(10);
}
// Speed
if keyboard.pressed(KeyCode::KeyW) {
material.params.speed = (material.params.speed + delta * 2.0).min(5.0);
}
if keyboard.pressed(KeyCode::KeyS) {
material.params.speed = (material.params.speed - delta * 2.0).max(-5.0);
}
}
}
fn update_ui(materials: Res<Assets<AnimatedMaterial>>, mut text_query: Query<&mut Text>) {
if !materials.is_changed() {
return;
}
if let Some((_, material)) = materials.iter().next() {
for mut text in text_query.iter_mut() {
**text = format!(
"Arrow Keys: Adjust wave parameters\n\
[UP/DOWN]: Frequency | [LEFT/RIGHT]: Amplitude\n\
[Q/E]: Wave Count | [W/S]: Speed\n\
Frequency: {:.2}\n\
Amplitude: {:.2}\n\
Wave Count: {}\n\
Speed: {:.2}",
material.params.frequency,
material.params.amplitude,
material.params.wave_count,
material.params.speed,
);
}
}
}
fn update_animation(time: Res<Time>, mut materials: ResMut<Assets<AnimatedMaterial>>) {
for (_, material) in materials.iter_mut() {
material.params.time = time.elapsed_secs();
}
}
Don't forget to add it to src/demos/mod.rs:
// ... other demos
pub mod d01_07_animated_material;
And register it in src/main.rs:
Demo {
number: "1.7",
title: "Uniforms and GPU Memory Layout",
run: demos::d01_07_animated_material::run,
},
Running the Demo
When you run the project, you will see an animated, wavy sphere. The fact that it renders correctly without visual glitches is proof that our Rust and WGSL memory layouts are in perfect sync. You can use the controls to change the wave parameters in real time.
Controls
| Key | Action | Uniform Field |
| Up/Down Arrow | Increase/Decrease wave frequency | params.frequency |
| Left/Right Arrow | Decrease/Increase wave amplitude | params.amplitude |
| W / S | Increase/Decrease wave speed | params.speed |
| Q / E | Decrease/Increase number of waves | params.wave_count |
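Each control nudges a value and then clamps it to a safe range. The repeated pattern in handle_input could be factored out roughly like this; step_clamped and step_wave_count are hypothetical helper names that do not appear in the demo.

```rust
/// Step `value` by `rate * delta_secs`, keeping it within [min, max].
/// A negative `rate` steps downward.
fn step_clamped(value: f32, rate: f32, delta_secs: f32, min: f32, max: f32) -> f32 {
    (value + rate * delta_secs).clamp(min, max)
}

/// Step the wave count by one in either direction, keeping it within 1..=10.
fn step_wave_count(count: u32, increase: bool) -> u32 {
    if increase {
        (count + 1).min(10)
    } else {
        // saturating_sub avoids underflow if the count is ever 0.
        count.saturating_sub(1).max(1)
    }
}

fn main() {
    let delta_secs = 0.016; // roughly one frame at 60 FPS
    let frequency = step_clamped(2.0, 2.0, delta_secs, 0.1, 10.0);
    let wave_count = step_wave_count(3, true);
    println!("frequency = {frequency:.3}, wave_count = {wave_count}");
}
```

Scaling the step by the frame's delta time keeps held-key adjustments independent of frame rate, while the wave count changes once per key press.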
What You're Seeing

| Visual Effect | Controlled By | How It Works |
| The speed of the waves | params.speed | The Rust update_animation system continuously updates time. The vertex shader uses this, multiplied by speed, to scroll the wave pattern across the mesh. |
| The height of the waves | params.amplitude | This value directly scales the vertex displacement along the normal. Higher amplitude means taller waves. |
| The number of waves | params.frequency, params.wave_count | frequency controls the base density of the waves. wave_count sets how many sine layers are summed, giving a more complex, organic look. |
| The shifting colors | params.color_shift | The fragment shader uses the dot product of the surface normal and the wave direction to shift the base color's hue, making peaks and troughs different colors. |
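The color shift in the last row can be reproduced on the CPU. This is a minimal sketch mirroring the fragment shader's arithmetic, with a hypothetical shifted_color function and plain arrays in place of vec3.

```rust
/// CPU-side mirror of the fragment shader's color shift.
/// `wave_influence` is the dot product of the surface normal and wave direction.
fn shifted_color(base: [f32; 3], wave_influence: f32, color_shift: f32) -> [f32; 3] {
    let hue_shift = wave_influence * color_shift;
    // Red and blue move with the shift; green moves against it at half strength.
    let deltas = [hue_shift, -hue_shift * 0.5, hue_shift * 0.3];
    std::array::from_fn(|i| (base[i] + deltas[i]).clamp(0.0, 1.0))
}

fn main() {
    let base = [0.2, 0.6, 1.0];
    println!("facing the wave: {:?}", shifted_color(base, 1.0, 0.2));
    println!("facing away:     {:?}", shifted_color(base, -1.0, 0.2));
}
```

Surfaces facing along the wave direction gain red and lose green, and surfaces facing away do the opposite. The clamp keeps every channel in the 0 to 1 range.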
Key Takeaways
This has been a dense and technical topic, but mastering it is a huge step towards becoming a proficient shader developer. Before moving on, make sure these key concepts are clear.
Uniforms are a Memory Contract. Your Rust struct and your WGSL struct must agree on the exact memory layout, byte for byte. If they don't, your shader will read garbage data.
GPUs Demand Alignment for Speed. GPUs require data to be placed at specific memory address boundaries (multiples of 4, 8, or 16 bytes) to enable their fastest, parallel memory access patterns. This is not optional.
ShaderType Does the Heavy Lifting. Bevy's ShaderType trait automatically handles most layout calculations and padding for you. Always derive it on your uniform structs.
Handle Vec3 With Care. A Vec3 is 12 bytes but requires 16-byte alignment. This mismatch is the #1 source of bugs. Either use #[align(16)] on it or, preferably, use a Vec4 instead.
Order Matters. Arrange fields in your struct from largest alignment to smallest (e.g., Mat4, then Vec4, then f32). This minimizes wasted space from padding.
Arrays Have a Special Rule. Any struct used in an array must have a total size that is a multiple of 16. Add explicit padding to meet this requirement.
When in Doubt, Verify. Use std::mem::size_of and std::mem::align_of in your Rust code to check your assumptions about a struct's layout.
What's Next?
Congratulations! You now understand the critical and often-feared topic of how data is laid out in GPU memory. You have the tools to design, implement, and debug complex, efficient uniform structures for any shader you can imagine.
In the next article, we'll shift our focus from data layout to data manipulation by diving into Essential Shader Math Concepts. We'll explore vector operations, matrix transformations, and the different coordinate spaces that work together to create a 3D scene.
Next up: 1.8 - Essential Shader Math Concepts
Quick Reference
A cheat sheet for GPU memory layout rules.
Alignment & Size Rules (WGSL)
| Type | Size | Alignment | Notes |
| f32, u32, i32 | 4 bytes | 4 bytes | |
| vec2 | 8 bytes | 8 bytes | |
| vec3 | 12 bytes | 16 bytes | ⚠️ The most common trap! |
| vec4 | 16 bytes | 16 bytes | |
| mat2 | 16 bytes | 8 bytes | (Two 8-byte columns) |
| mat3 | 48 bytes | 16 bytes | (Three 16-byte columns) |
| mat4 | 64 bytes | 16 bytes | (Four 16-byte columns) |
| Array of T | N * stride(T) | 16 bytes | Stride of each element is rounded up to 16. |
| Struct | Varies | Strictest member | A struct's alignment is the largest alignment of any of its fields. |
Rust Attributes
// Enable automatic GPU layout management.
#[derive(ShaderType)]
pub struct MyUniforms {
// Force this field to be padded as if it were 16 bytes.
// Use this to fix the Vec3 trap.
#[align(16)]
pub field: Vec3,
// Give this field an explicit size in bytes, padding after it (advanced).
// ShaderType's derive accepts #[align(..)] and #[size(..)] on fields.
#[size(16)]
pub other: f32,
}
Best Practices Checklist
Did I use Vec4 instead of Vec3 where possible?
Are fields ordered from largest to smallest alignment?
If this struct is used in an array, is its size_of::<T>() a multiple of 16?
Have I documented any manual padding or complex layout choices?






