Bevy & WGSL: Fragment Shader Fundamentals

What We're Learning

Welcome to Phase 3! In Phase 2, we took complete control of our geometry. We mastered the vertex shader, learning to manipulate the shape, position, and animation of every vertex on the GPU. We've answered the questions of "where" and "how" our geometry exists in the world. Now, we shift our focus to the most visually impactful question of all: what does it look like?

This brings us to the second major programmable stage of the graphics pipeline and the heart of this entire phase: the fragment shader.

If the vertex shader is the sculptor, shaping the form of our models, the fragment shader is the painter, giving them color, texture, light, and life. Every single pixel you see on your screen - every smooth gradient, every intricate texture, every realistic lighting effect - is the final product of a fragment shader's calculation.

This power comes with a critical shift in scale. Where vertex shaders run once per vertex (perhaps thousands of times per model), fragment shaders run once for every pixel a triangle covers (potentially millions of times per frame, and more than once per pixel where objects overlap). Mastering this stage is not just about creating beautiful visuals; it's also about learning to think efficiently at a massive scale.

This first article lays the groundwork for everything to come. We will demystify what a fragment is, how it gets its data, and how you can write your first fragment shaders to control the color of your creations.

By the end of this article, you'll understand:

What a fragment is and how it differs from a pixel
The fragment shader entry point and signature
How interpolation brings vertex data to fragments
The fragment shader's output: color values
Screen-space coordinates and built-in variables
The fragment execution model and performance implications
How overdraw impacts performance
Basic fragment shader patterns in Bevy
Building a complete color visualization material

The Rendering Pipeline Recap

Let's quickly revisit where the fragment shader fits into the grand journey of rendering a single frame. Understanding this context is crucial, as the fragment shader's inputs are the direct outputs of the stages that come before it.

Vertex Shader
- Input: Vertex data from a mesh (position, normal, UV, etc.).
- Process: Transforms each vertex's position into clip space. Prepares and passes other data (like normals and UVs) downstream.
- Runs: Once per vertex.
Rasterization (A non-programmable, hardware-driven stage)
- Input: The transformed triangles from the vertex shader, clipped to the view volume and mapped to screen pixels.
- Process: The GPU determines exactly which pixels on the screen each triangle covers. For each covered pixel, it generates a "fragment" and interpolates the vertex data (color, UVs, etc.) across the triangle's surface, with perspective correction, to that fragment's specific location.
- Output: A stream of fragments, ready for shading.
Fragment Shader ← WE ARE HERE
- Input: An individual fragment with its interpolated data.
- Process: Executes your custom WGSL code to calculate a final color for that fragment. This is where texturing and most lighting calculations happen.
- Runs: Once per fragment, which is roughly once for every pixel the mesh covers (fragments rejected early by the depth test are skipped).
Output Merger (Another non-programmable stage)
- Process: The final colored fragment is subjected to tests like the depth test (is it behind something else?). If it passes, it may be blended with the pixel already on the screen before finally being written to the framebuffer.

The key insight: Notice the dramatic shift in workload between the stages. The vertex shader might run a few thousand times for a detailed model, but the fragment shader can run for each of the roughly two million pixels on a standard 1080p screen. A simple quad has only 4 vertices, but if it fills a 500x500 pixel area on screen, the fragment shader will run 250,000 times! This incredible multiplier is why fragment shader performance is one of the most critical aspects of real-time graphics.

Understanding Fragments vs. Pixels

Before we dive deeper, we need to clarify a crucial piece of terminology. While we often use "fragment" and "pixel" interchangeably, they represent two distinct concepts.

A Pixel (short for "picture element") is the final, colored dot you see on your monitor. It has one job: display a single color.

A Fragment is a potential pixel. Think of it as a data packet generated by the rasterizer for a single pixel location that a triangle covers. This data packet contains everything needed to calculate a final pixel color, including:

Its screen position (which pixel it corresponds to).
A set of smoothly interpolated attributes (like UV coordinates, world position, and normals) that it inherited from the triangle's vertices.

A fragment is the input to the fragment shader. The shader runs its code, calculates a color, and then the fragment must pass a series of final hardware tests.

Fragment → Fragment Shader → Depth Test → Blend → Pixel

Not all fragments become pixels! A fragment is discarded and its color is never written to the screen if:

It fails the depth test (meaning it's hidden behind an object that has already been drawn closer to the camera).
Your shader code explicitly uses the discard keyword (for effects like cut-out transparency).
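It fails another per-fragment test, such as the stencil test. (Geometry outside the viewport never becomes fragments in the first place; it is clipped away before rasterization.)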

For the rest of this article, we'll follow the common convention of using the terms loosely, but it's vital to remember this distinction: fragments are the candidates, and pixels are the winners.

The Fragment Shader Entry Point

Just as vertex shaders are identified by the @vertex attribute, fragment shaders are marked with @fragment. This attribute tells the WGSL compiler that this function is the main entry point for the fragment processing stage.

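// Bevy's standard mesh vertex output (the bevy_pbr forward_io module, Bevy 0.12+).
#import bevy_pbr::forward_io::VertexOutput

@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
    // This is where all the color calculation magic happens.
    // For now, we'll just return a solid color and ignore `in`.
    return vec4<f32>(1.0, 0.0, 0.0, 1.0); // A solid, opaque red.
}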

Let's break down this essential signature:

@fragment: The attribute that declares this function as the fragment shader's entry point.
in: VertexOutput: This is the most common input parameter. It's a struct containing all the data that was output by the vertex shader and then interpolated by the rasterizer. The fields of this struct are our raw materials for determining color.
-> @location(0) vec4<f32>: This defines the function's return type.
- vec4<f32>: The four-component vector represents the final RGBA (Red, Green, Blue, Alpha) color, with each component typically ranging from 0.0 to 1.0.
- @location(0): This specifies that the output color should be written to the first "render target." For now, you can think of this as the main image buffer that will eventually be displayed on the screen.

The Minimal Fragment Shader

The absolute simplest fragment shader requires no inputs at all. It just returns a constant color. Every single fragment processed by this shader will receive the exact same color value.

@fragment
fn fragment() -> @location(0) vec4<f32> {
    return vec4<f32>(1.0, 0.5, 0.0, 1.0); // A solid, opaque orange.
}

If you were to apply a material using this shader to a sphere, the entire sphere would appear as a flat, uniformly orange circle on your screen.

Function Signature Variations

The fragment shader's signature is flexible, allowing you to request only the data you need.

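// Each pattern below is an alternative; a real shader file would define only one
// `fragment` function. Patterns 2, 4 and 5 use Bevy's mesh VertexOutput:
#import bevy_pbr::forward_io::VertexOutput

// Pattern 1: No inputs needed for a solid color.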
@fragment
fn fragment() -> @location(0) vec4<f32> {
    return vec4<f32>(1.0, 1.0, 1.0, 1.0);
}

// Pattern 2: Using interpolated data from the vertex shader. (Most common)
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
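    // e.g., using interpolated vertex colors. In Bevy, `in.color` only exists
    // when the mesh has vertex colors (the VERTEX_COLORS shader def).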
    return vec4<f32>(in.color.rgb, 1.0);
}

// Pattern 3: Using built-in hardware variables.
@fragment
fn fragment(
    @builtin(position) frag_coord: vec4<f32>
) -> @location(0) vec4<f32> {
    // Color based on pixel's screen coordinates
    return vec4<f32>(frag_coord.xy / 1000.0, 0.0, 1.0);
}

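// Pattern 4: Combining inputs.
@fragment
fn fragment(
    in: VertexOutput,
    @builtin(front_facing) is_front: bool
) -> @location(0) vec4<f32> {
    // You can request both interpolated and built-in data. Bevy's VertexOutput
    // already carries @builtin(position) (read it as `in.position`), so a second
    // @builtin(position) parameter here would be a duplicate built-in error.
    let screen = vec3<f32>(in.position.xy / 1000.0, 0.0);
    let color = mix(in.color.rgb, screen, 0.5);
    // Darken back faces (only visible when culling is disabled).
    return vec4<f32>(select(color * 0.5, color, is_front), 1.0);
}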

// Pattern 5: Multiple Render Targets (Advanced).
// You can output different data to different image buffers simultaneously.
// This is the foundation of techniques like Deferred Rendering.
struct FragmentOutput {
    @location(0) color: vec4<f32>,
    @location(1) normal_data: vec4<f32>,
}

@fragment
fn fragment(in: VertexOutput) -> FragmentOutput {
    var out: FragmentOutput;
    out.color = vec4<f32>(1.0);
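    // Bevy's VertexOutput names this field `world_normal`. Remap it from
    // [-1, 1] to [0, 1] so it fits a standard (unorm) render target.
    out.normal_data = vec4<f32>(normalize(in.world_normal) * 0.5 + 0.5, 1.0);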
    return out;
}

For the majority of this phase, we will be focusing on Pattern 2 and Pattern 4, as they form the basis of most standard texturing and lighting work.

Understanding Interpolation

Interpolation is the automatic process that bridges the gap between per-vertex outputs and per-fragment inputs. The GPU's rasterizer calculates smooth, intermediate values for every fragment across a triangle's surface based on the data at its three corner vertices.

Imagine a triangle with red, green, and blue assigned to its vertices. The fragment shader will receive a smoothly blended color for every fragment inside that triangle, creating a colorful gradient. This process applies to any numerical data passed from the vertex shader, such as UV coordinates, normals, and world positions.
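What the rasterizer does here can be sketched on the CPU. The Rust below is an illustration only, not Bevy or GPU code: it computes barycentric weights (the three sub-triangle areas, normalized) for a point inside a 2D screen-space triangle and blends the three vertex colors with them. Real hardware additionally applies perspective correction, which this sketch ignores.

```rust
/// Barycentric weights of point `p` in the 2D triangle (a, b, c), from signed areas.
/// Inside the triangle the weights are non-negative and sum to 1.
fn barycentric(p: [f32; 2], a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> [f32; 3] {
    // Twice the signed area of the triangle (p0, p1, p2).
    let area = |p0: [f32; 2], p1: [f32; 2], p2: [f32; 2]| {
        (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])
    };
    let total = area(a, b, c);
    // Each weight is the area of the sub-triangle opposite that vertex.
    [area(p, b, c) / total, area(a, p, c) / total, area(a, b, p) / total]
}

/// Blends three per-vertex RGB values with the given barycentric weights.
fn interpolate3(v: [[f32; 3]; 3], weights: [f32; 3]) -> [f32; 3] {
    let mut out = [0.0; 3];
    for (vertex, w) in v.iter().zip(weights.iter()) {
        for c in 0..3 {
            out[c] += vertex[c] * w;
        }
    }
    out
}
```

At the triangle's centroid all three weights are 1/3, so red, green, and blue vertices blend to an even (1/3, 1/3, 1/3); exactly at a vertex, that vertex's color comes through unchanged.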

The Data Flow

The flow is straightforward: you define a struct to pass data, and the hardware handles the interpolation.

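// This struct is output by the vertex shader.
struct VertexOutput {
    @builtin(position) clip_position: vec4<f32>,
    @location(0) color: vec3<f32>, // Per-vertex value
    @location(1) uv: vec2<f32>,    // Per-vertex value
}

// Material texture bindings. The material bind group index depends on your
// Bevy version (@group(2) in recent releases; newer ones write #{MATERIAL_BIND_GROUP}).
@group(2) @binding(0) var my_texture: texture_2d<f32>;
@group(2) @binding(1) var my_sampler: sampler;

// RASTERIZER (Hardware): Automatically creates interpolated
// versions of VertexOutput for every fragment.

// The fragment shader takes the same struct as its input.
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
    // `in.color` and `in.uv` are now smoothly interpolated per-fragment values.
    let texture_color = textureSample(my_texture, my_sampler, in.uv);
    // Tint the texture by the interpolated vertex color.
    return texture_color * vec4<f32>(in.color, 1.0);
}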

The key takeaway: Any field in your VertexOutput decorated with @location is automatically interpolated (unless you mark it @interpolate(flat), which integer fields require). You don't write the interpolation code; it's a fundamental feature of the GPU.

Interpolation Modes

You can control how the GPU interpolates each @location field with the @interpolate attribute, for example @location(2) @interpolate(flat) material_id: u32.

perspective (Default): The 3D-correct mode. Use this for colors, UVs, normals - basically everything on a 3D object. You don't need to write it explicitly.
linear: Simple 2D screen-space interpolation. Can be useful for UI, but looks wrong in 3D.
flat: No interpolation; every fragment receives the value from the triangle's first ("provoking") vertex. Creates a faceted, low-poly look. Required for passing integer values.
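To see why perspective is the default, here's a CPU sketch in Rust of the two formulas along a single edge (a 1D simplification, not how a GPU is literally wired). Perspective-correct interpolation blends attribute/w, then divides by the blended 1/w, where w is each vertex's clip-space w.

```rust
/// Screen-space linear interpolation between two vertex values (`@interpolate(linear)`).
fn interpolate_linear(a0: f32, a1: f32, t: f32) -> f32 {
    (1.0 - t) * a0 + t * a1
}

/// Perspective-correct interpolation (the default `perspective` mode).
/// `w0`/`w1` are the clip-space w of the two vertices; `t` is the screen-space
/// position between them (0 = first vertex, 1 = second vertex).
fn interpolate_perspective(a0: f32, a1: f32, w0: f32, w1: f32, t: f32) -> f32 {
    let num = (1.0 - t) * a0 / w0 + t * a1 / w1;
    let den = (1.0 - t) / w0 + t / w1;
    num / den
}
```

With w0 = 1.0 and w1 = 3.0 (the second vertex is farther away), the screen-space midpoint gets 0.25 instead of 0.5: the nearer half of the edge occupies more of the screen. When both w values are equal, as with an orthographic or UI view, the two modes agree, which is why linear is acceptable there.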

The Golden Rule of Normal Interpolation

Interpolation has one critical pitfall: interpolating normalized vectors does not produce a normalized vector. Wherever the vertex normals being blended differ, the result comes out shorter than unit length, which will break your lighting calculations.

The solution must become a reflex:

Always re-normalize your normal vector at the beginning of your fragment shader.

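// Hypothetical light direction for this example: straight up, already unit length.
const LIGHT_DIR: vec3<f32> = vec3<f32>(0.0, 1.0, 0.0);

@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
    // ✓ CORRECT: The first thing we do is re-normalize.
    let normal = normalize(in.world_normal);

    // Now `normal` is safe to use for lighting.
    let diffuse = max(dot(normal, LIGHT_DIR), 0.0);
    return vec4<f32>(vec3<f32>(diffuse), 1.0);
}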
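A quick numeric check of the shrinking-normal problem, as a Rust sketch (plain vector math, not Bevy API): two unit normals 90 degrees apart, interpolated halfway, give a vector of length about 0.707. Any dot-product lighting computed from it would be roughly 29% too dark until it is re-normalized.

```rust
/// Linear blend of two 3D vectors (what interpolation does between two vertices).
fn lerp3(a: [f32; 3], b: [f32; 3], t: f32) -> [f32; 3] {
    [
        a[0] + (b[0] - a[0]) * t,
        a[1] + (b[1] - a[1]) * t,
        a[2] + (b[2] - a[2]) * t,
    ]
}

fn length(v: [f32; 3]) -> f32 {
    (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt()
}

/// Rescales `v` to unit length (assumes `v` is not the zero vector).
fn normalize(v: [f32; 3]) -> [f32; 3] {
    let len = length(v);
    [v[0] / len, v[1] / len, v[2] / len]
}
```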

Fragment Output: Color Values

The primary job of a fragment shader is to calculate and output a single color. This color determines what you see on the screen for that specific pixel. Understanding the format and properties of this output value is key to achieving the look you want.

Color Format: RGBA

Fragment shaders in WGSL output color as a four-component floating-point vector, or vec4<f32>. Each component corresponds to a channel in the RGBA color model:

// The standard output signature
-> @location(0) vec4<f32>

// R: Red   (0.0 to 1.0+)
// G: Green (0.0 to 1.0+)
// B: Blue  (0.0 to 1.0+)
// A: Alpha (0.0 = transparent, 1.0 = opaque)

Here are some common color values:

Black:       vec4<f32>(0.0, 0.0, 0.0, 1.0)
White:       vec4<f32>(1.0, 1.0, 1.0, 1.0)
Red:         vec4<f32>(1.0, 0.0, 0.0, 1.0)
Gray (50%):  vec4<f32>(0.5, 0.5, 0.5, 1.0)
Transparent: vec4<f32>(0.0, 0.0, 0.0, 0.0) // Alpha is 0
Semi-Trans:  vec4<f32>(1.0, 0.0, 0.0, 0.5) // 50% transparent red

High Dynamic Range (HDR) and Color Values

You may have noticed the + in the component range (0.0 to 1.0+). This is critically important. Bevy supports a High Dynamic Range (HDR) rendering pipeline, which you enable per camera (HDR is off by default on a Bevy camera). With HDR enabled, your shader is not limited to outputting colors within the [0, 1] range. It can, and for realistic lighting often should, output much brighter values.

A standard white surface might be (1.0, 1.0, 1.0).
A bright light source or a reflection of the sun could be (50.0, 50.0, 50.0).

These HDR values represent physically-based brightness and are essential for effects like realistic bloom.

// A standard white surface color.
return vec4<f32>(1.0, 1.0, 1.0, 1.0);

// An HDR color for a very bright emissive surface.
// With an HDR camera, this value is preserved for post-processing.
return vec4<f32>(2.0, 2.0, 1.5, 1.0);

// Negative values are physically meaningless for color and should be avoided.
// In an 8-bit target they are clamped to 0; in an HDR (floating-point) target
// they can be stored as-is and cause artifacts in tonemapping or bloom.
return vec4<f32>(-0.5, 0.5, 0.5, 1.0); // Don't do this.

For now, it's fine to work within the [0, 1] range for basic colors. But when we get to lighting, remember that you can return values greater than 1.0 to indicate intense brightness.

Color Spaces, sRGB, and Tonemapping

Color on computers is a surprisingly complex topic. The most important concept to grasp is that your shader performs its calculations in a linear color space, but your monitor displays color in a non-linear space called sRGB.

Fortunately, Bevy's rendering pipeline manages the conversion for you. Here's the journey your color takes after leaving the fragment shader:

Fragment Shader (You are here): You output a color in linear HDR format. This is where all lighting and blending math should happen, as it is physically accurate.
Post-Processing (Bevy): With an HDR camera, Bevy runs a series of effects on the HDR image. The most important one is Tonemapping. This is a smart process that takes the wide range of HDR brightness values (from 0 to 50 or more) and artistically maps them back into the standard [0, 1] range that a display can handle. It does this in a way that preserves detail in both dark shadows and bright highlights, preventing harsh clipping and creating a much more pleasing image.
Final Write (GPU Hardware): The tonemapped linear color is sent to the framebuffer. Because Bevy has configured the screen's texture format as sRGB, the GPU hardware automatically applies the correct gamma correction during the final write, converting the linear [0, 1] values into the non-linear sRGB values your monitor expects.

Your only responsibility is to do your math correctly in linear space. Bevy and the GPU will handle the rest.
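The last two steps of that journey are easy to sketch numerically. The Rust below uses the simple Reinhard operator x / (1 + x) as a stand-in tonemapper (an assumption for illustration: Bevy's default tonemapper uses a different, more sophisticated curve), followed by the standard sRGB encoding function that the hardware applies when writing to an sRGB texture.

```rust
/// Reinhard tonemapping: maps [0, infinity) into [0, 1). A stand-in, not Bevy's tonemapper.
fn reinhard(x: f32) -> f32 {
    x / (1.0 + x)
}

/// Standard sRGB encoding (linear -> non-linear), as applied on write to an
/// sRGB-format texture. Input is expected in [0, 1].
fn linear_to_srgb(x: f32) -> f32 {
    if x <= 0.0031308 {
        12.92 * x // short linear segment near black
    } else {
        1.055 * x.powf(1.0 / 2.4) - 0.055
    }
}
```

Note the order: tonemapping runs on linear values, and sRGB encoding comes last. A linear mid-gray of 0.5 is stored as roughly 0.735, which is why linear values look too dark if they are displayed without this encoding.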

Alpha and Transparency

The alpha channel (A in RGBA) controls a fragment's opacity.

// Fully opaque (most common)
return vec4<f32>(1.0, 0.0, 0.0, 1.0);

// 50% transparent
return vec4<f32>(1.0, 0.0, 0.0, 0.5);

Note: Enabling transparency is not as simple as just returning an alpha value less than 1.0. It requires additional configuration on your Material in Bevy (its alpha mode) and has significant performance implications: blended surfaces typically don't write depth, so overlapping transparent layers can't hide one another and every layer gets shaded. We will dedicate a future article to mastering transparency; for now, we will stick to fully opaque materials where alpha is always 1.0.

Common Color Operations

Here are some common operations you'll perform on colors. All of these correctly operate in the linear color space of the shader.

// Brighten or Darken (Adjust Exposure)
// In linear space, multiplication is the correct way to adjust brightness.
let brightened = color.rgb * 1.5; // 50% brighter (alpha left untouched)
let darkened = color.rgb * 0.5;   // 50% darker

// Blend (Linear Interpolation)
// The `mix()` function is the physically accurate way to blend two linear colors.
let blended = mix(color_a, color_b, 0.5); // A 50/50 blend

// Grayscale (Luminance)
// This calculates the perceived brightness of a color using weights
// that are correct for converting linear RGB to luminance.
let luminance = dot(color.rgb, vec3<f32>(0.2126, 0.7152, 0.0722));
let grayscale = vec3<f32>(luminance);

// Invert Color
// Note: This operation assumes the input is in the [0, 1] range.
// Using it on HDR colors greater than 1.0 will produce negative values.
let inverted = vec3<f32>(1.0) - color.rgb;

// Clamp to LDR (Low Dynamic Range)
// The `saturate()` function clamps a value to the `[0, 1]` range
// (shorthand for `clamp(x, 0.0, 1.0)`).
// WARNING: When rendering with an HDR camera in Bevy, you should AVOID clamping your
// final output color. This is a destructive operation that throws away all HDR
// information. The Tonemapping pass is designed to handle this conversion gracefully.
let ldr_color = saturate(color.rgb);

A Note on Grayscale and the "Magic Numbers"

The vector vec3<f32>(0.2126, 0.7152, 0.0722) used for the grayscale calculation might seem arbitrary, but it's a standardized model of how bright each primary color appears to the human eye.

The reason for these specific weights is that the human eye is significantly more sensitive to green light than it is to red or blue light. If you had three lights of pure red, green, and blue all emitting the same physical amount of energy, the green light would appear much brighter to you.

The numbers are standardized weights from the Rec. 709 specification (used for HDTVs, and sharing its primaries with the sRGB standard that modern monitors follow) that reflect this sensitivity in linear space:

Green: 0.7152 (~71.5%)
Red: 0.2126 (~21.3%)
Blue: 0.0722 (~7.2%)

By using these weights in a dot() product, we calculate the color's perceived brightness (luminance) in a way that is consistent with how our eyes work and how our screens are calibrated. Applying the older Rec. 601 weights (0.299, 0.587, 0.114), which were designed for gamma-encoded signals with different primaries, to linear sRGB data would give slightly wrong relative brightness between hues when converting to grayscale.
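The same dot product as a small Rust sketch, mirroring the WGSL line above (CPU-side illustration, not Bevy API). Because the weights sum to 1.0, white keeps a luminance of 1.0, and pure green comes out almost ten times brighter than pure blue.

```rust
/// Rec. 709 luminance weights for *linear* RGB.
const LUMA_709: [f32; 3] = [0.2126, 0.7152, 0.0722];

/// Relative luminance of a linear RGB color, i.e. dot(rgb, LUMA_709).
fn luminance(rgb: [f32; 3]) -> f32 {
    rgb[0] * LUMA_709[0] + rgb[1] * LUMA_709[1] + rgb[2] * LUMA_709[2]
}
```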

Fragment Shader Built-in Variables

In addition to the interpolated data you pass from the vertex shader, the fragment shader has access to a special set of read-only input variables and writable output variables called built-ins. These are not user-defined data; they are provided directly by the GPU hardware and give you intrinsic information and control over the fragment currently being processed.

Their availability is specific to each shader stage. For example, @builtin(vertex_index) is only available in a vertex shader. Here are the built-ins most relevant to the fragment shader stage.

Input Built-ins

These are read-only values that provide information about the fragment.

`@builtin(position): vec4<f32>` - Fragment Coordinates

This is the most fundamental and frequently used built-in. It provides the fragment's coordinates within the context of the screen.

@fragment
fn fragment(
    @builtin(position) frag_coord: vec4<f32>
) -> @location(0) vec4<f32> {
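    // frag_coord.x: The fragment's X position in pixels (pixel centers sit at .5).
    // frag_coord.y: The fragment's Y position in pixels, measured from the top.
    // frag_coord.z: The fragment's depth value (conventionally 0.0 near to 1.0 far;
    //               Bevy uses reversed Z, so there 1.0 is near).
    // frag_coord.w: 1.0 / clip_space_w (used for perspective calculations).

    // e.g. visualize the screen position as a color gradient:
    return vec4<f32>(frag_coord.xy / 1000.0, 0.0, 1.0);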
}

Screen-Space Coordinates Explained

The frag_coord variable lives in screen space.

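X and Y: These are the pixel coordinates. The origin (0, 0) is at the top-left corner of the render target. X increases to the right, and Y increases downwards. Pixel centers sit at half-integer coordinates, so the top-left pixel's fragment reports (0.5, 0.5).
Z (Depth): This is the fragment's depth, a value between 0.0 and 1.0 derived (non-linearly) from its distance to the camera relative to the near and far clipping planes. In the conventional setup, 0.0 is as close as possible and 1.0 is as far as possible; Bevy reverses this (1.0 at the near plane, approaching 0.0 with distance) because reversed Z gives better depth precision. This is the value used by the GPU for depth testing.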
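Because frag_coord is in pixels, a common first step is dividing by the render target's resolution to get resolution-independent [0, 1] coordinates. A Rust sketch of that math, including the half-pixel offset of pixel centers (in a real shader the resolution would come from a uniform; here it is simply a parameter, and both function names are ours):

```rust
/// The frag_coord.xy the GPU reports for the pixel at integer (column, row).
fn pixel_center(column: u32, row: u32) -> [f32; 2] {
    [column as f32 + 0.5, row as f32 + 0.5]
}

/// Converts frag_coord.xy (pixels, origin top-left) into normalized [0, 1]
/// screen coordinates for a render target of the given resolution.
fn frag_coord_to_screen_uv(frag_xy: [f32; 2], resolution: [f32; 2]) -> [f32; 2] {
    [frag_xy[0] / resolution[0], frag_xy[1] / resolution[1]]
}
```

On a 1920x1080 target, the center pixels map to roughly (0.5, 0.5), and even the bottom-right pixel stays just under 1.0 because its center is at (1919.5, 1079.5).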

The Dual Meaning of `@builtin(position)`

It is critical to remember that @builtin(position) has a completely different meaning and data format depending on where you use it:

In a vertex shader (output): It represents the vertex's position in homogeneous clip space. This is a 4D coordinate that you must calculate and provide to the rasterizer.
In a fragment shader (input): It represents the fragment's position in screen space. This is a read-only value provided by the hardware, measured in pixels.

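Because of this dual meaning, a common trap appears when a struct that carries @builtin(position) (such as Bevy's VertexOutput) is also used as the fragment shader's input: you cannot additionally declare a separate @builtin(position) parameter, because a built-in may appear only once among an entry point's inputs. This is a common source of confusion and compiler errors.

There are two ways to resolve it. The simplest is to read the value from the struct: in the fragment shader, that field (in.position, in Bevy's VertexOutput) already holds the screen-space coordinate. Alternatively, use two separate structs: one for the vertex shader's output and a different one, without the built-in, for the fragment shader's input.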

// VERTEX shader returns this struct
struct VertexOutput {
    @builtin(position) clip_position: vec4<f32>,
    @location(0) color: vec3<f32>,
}

// FRAGMENT shader receives this struct
struct FragmentInput {
    @location(0) color: vec3<f32>,
}

@fragment
fn fragment(
    in: FragmentInput,                       // Interpolated data
    @builtin(position) frag_coord: vec4<f32> // Screen-space built-in
) -> @location(0) vec4<f32> {
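    // Now you have access to both! For example, fade the interpolated
    // color from black at the left edge to full strength 1000 pixels in.
    let fade = clamp(frag_coord.x / 1000.0, 0.0, 1.0);
    return vec4<f32>(in.color * fade, 1.0);
}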

`@builtin(front_facing): bool` - Face Orientation

This boolean tells you if the current fragment belongs to a triangle that is facing the camera.

true: The triangle is "front-facing" (with Bevy's default winding, its vertices appear in counter-clockwise order on screen).
false: The triangle is "back-facing" (its vertices appear in clockwise order).

By default, Bevy enables back-face culling, an optimization that automatically discards all back-facing triangles. In this default state, @builtin(front_facing) will always be true. However, if you disable culling (a material setting), you can render both sides of a mesh. This is where front_facing becomes essential.

Common Use Cases:

Two-Sided Materials: Imagine a sheet of paper, a flag, or a playing card. You need to render both sides, but perhaps with different colors or textures. You can use front_facing to decide which texture to apply.
Refraction (Glass/Water): To properly simulate light bending as it enters and exits a material, you need to process both the front surfaces (where light enters) and the back surfaces (where light exits).
Debugging: Visualizing front and back faces with different colors is a great way to find geometry issues, like "flipped normals."

@fragment
fn fragment(
    @builtin(front_facing) is_front: bool,
) -> @location(0) vec4<f32> {
    // `select` is a branchless way to choose between two values.
    // select(value_if_false, value_if_true, condition)
    let color = select(
        vec3<f32>(1.0, 0.0, 0.0), // Red if false (back)
        vec3<f32>(0.0, 1.0, 0.0), // Green if true (front)
        is_front,
    );

    return vec4<f32>(color, 1.0);
}

`@builtin(primitive_index): u32` - Triangle Index

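This is a debugging tool that provides the index of the current primitive (triangle or line) within the current draw call. It's not typically used for visual effects but is invaluable for visualizing meshes. Note that it isn't part of core WGSL: it requires an optional GPU feature (wgpu calls it SHADER_PRIMITIVE_INDEX) that not every platform supports.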

Use Case: Creating a "clown vomit" or "mesh ID" visualization where every triangle gets a unique color, making it easy to see the mesh topology.
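The coloring itself is just a hash from index to color. The sketch below is Rust rather than WGSL so it can run on the CPU, and the hash (Chris Wellons' "lowbias32") is our choice for illustration, not something Bevy or WGSL provides. WGSL's u32 multiplication also wraps, so the same integer operations carry over to a shader.

```rust
/// A small, well-mixed integer hash ("lowbias32"); any decent hash works here.
/// Every step is invertible, so distinct inputs give distinct hashes.
fn hash_u32(mut x: u32) -> u32 {
    x ^= x >> 16;
    x = x.wrapping_mul(0x7feb_352d);
    x ^= x >> 15;
    x = x.wrapping_mul(0x846c_a68b);
    x ^= x >> 16;
    x
}

/// Maps a primitive index to a stable pseudo-random RGB color in [0, 1],
/// using three bytes of the hash as the red, green, and blue channels.
fn primitive_color(primitive_index: u32) -> [f32; 3] {
    let h = hash_u32(primitive_index);
    [
        (h & 0xFF) as f32 / 255.0,
        ((h >> 8) & 0xFF) as f32 / 255.0,
        ((h >> 16) & 0xFF) as f32 / 255.0,
    ]
}
```

Hashing, rather than using the index directly, is what makes neighboring triangles (indices 7 and 8, say) land on visibly different colors.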

`@builtin(sample_index): u32` - MSAA Sample Index

This input relates to Multisample Anti-Aliasing (MSAA). When MSAA is enabled, the GPU tests coverage at multiple "sample" locations within each pixel. This u32 tells you which specific sub-pixel sample (e.g., 0, 1, 2, or 3 for 4x MSAA) the shader is currently processing.

Use Case: This is very rarely used. Declaring this input switches the shader into a special, performance-intensive mode called "sample-rate shading," where the fragment shader runs once for every sub-pixel sample instead of once per pixel. This is reserved for extremely high-end, custom rendering techniques.

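`@builtin(clip_distances): array<f32, N>` - Custom Clipping Plane Distances

Strictly speaking, this one is not a fragment input at all: it is a vertex shader output (an array of up to 8 distances) that requires the optional clip_distances WGSL extension, enabled with `enable clip_distances;` plus a matching device feature. It's listed here because its effect appears at the fragment level. In the vertex shader, you calculate the signed distance of a vertex from a custom plane. The rasterizer then interpolates these distances, and the GPU hardware clips away any part of the triangle where the interpolated distance is negative, so those fragments never reach your fragment shader.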

Use Case: Slicing through objects, creating cross-section views, or rendering reflections on a perfectly flat plane like water.
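The per-vertex math is a signed plane distance. Because the distance is interpolated linearly along each edge, the cut lands exactly where it crosses zero. A Rust sketch with a hypothetical water plane at y = 0 that keeps everything above it (the function names are ours):

```rust
/// Signed distance from point `p` to the plane dot(n, p) + d = 0 (n unit length).
/// Positive = kept side, negative = clipped side.
fn plane_distance(p: [f32; 3], n: [f32; 3], d: f32) -> f32 {
    n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d
}

/// Given the distances at an edge's two vertices, returns the fraction of the
/// edge (measured from the first vertex) where the interpolated distance hits zero.
fn clip_crossing(d0: f32, d1: f32) -> Option<f32> {
    if (d0 < 0.0) == (d1 < 0.0) {
        None // both vertices on the same side: nothing to cut
    } else {
        Some(d0 / (d0 - d1))
    }
}
```

For an edge from y = -1 to y = 3, the distances are -1 and 3, so the first quarter of the edge lies below the water and is clipped.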

Output Built-ins

These are variables your fragment shader can write to, which will affect later stages of the rendering pipeline. To use them, you declare them as fields of a struct that your fragment shader returns, alongside its @location color outputs.

`@builtin(frag_depth): f32` - Custom Depth Output

By default, the depth (frag_coord.z) of a fragment is interpolated by the hardware. This output allows you to override that value with a custom one.

Use Case: This is essential for techniques that don't rely on standard rasterized geometry, such as raymarching SDFs (Signed Distance Fields) within a shader, or for advanced decal rendering systems.

// WGSL has no tuple return types, so extra outputs go in a struct.
struct DepthOutput {
    @location(0) color: vec4<f32>,
    @builtin(frag_depth) depth: f32,
}

@fragment
fn fragment(in: FragmentInput) -> DepthOutput {
    var out: DepthOutput;
    out.color = vec4<f32>(1.0, 0.0, 0.0, 1.0);
    out.depth = 0.25; // A custom depth value
    return out;
}

CRITICAL PERFORMANCE WARNING: Writing to @builtin(frag_depth) disables the Early-Z optimization. The GPU cannot know a fragment's final depth until after the shader has finished executing, so it cannot discard hidden fragments early. This can have a severe negative impact on performance. Use this feature only when absolutely necessary.

`@builtin(sample_mask): u32` - MSAA Sample Mask

This u32 acts as a bitmask where each bit corresponds to a sub-pixel sample (e.g., 4 bits for 4x MSAA). While it can be an input, its power comes from being an output. You can programmatically decide which of the sub-pixel samples should be written to.

Use Case: Implementing a manual form of "Alpha to Coverage" (WebGPU also offers this as a fixed-function pipeline option). This is a technique for creating dithered, non-binary transparency for things like foliage. Instead of blending, you use the alpha value to enable a proportional number of samples, creating a stippled transparency effect that interacts correctly with the depth buffer.
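The core of a manual version is turning alpha into a bitmask. A Rust sketch of that mapping (the function is hypothetical; in WGSL the result would be written to the @builtin(sample_mask) output):

```rust
/// Builds an MSAA sample mask that enables a number of samples proportional
/// to `alpha`. With 4x MSAA, alpha 0.5 keeps 2 of the 4 samples.
fn coverage_mask(alpha: f32, sample_count: u32) -> u32 {
    let covered = (alpha.clamp(0.0, 1.0) * sample_count as f32).round() as u32;
    if covered >= 32 {
        u32::MAX // every bit set; avoids an overflowing shift
    } else {
        (1u32 << covered) - 1 // the lowest `covered` bits set
    }
}
```

Always enabling the lowest bits is the crudest choice: two overlapping layers with the same alpha would cover the same samples. Real implementations usually vary which bits are set from pixel to pixel (dithering).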

The Fragment Execution Model

Understanding how and when your fragment shader code runs is the single most important factor in managing rendering performance. A small inefficiency in your shader is magnified millions of times, and can easily be the difference between a smooth 60 FPS and a stuttering slideshow.

The Multiplier Effect: From Vertices to Fragments

The core concept is simple: your fragment shader is executed for (almost) every single pixel that your mesh covers on screen.

Consider a moderately detailed 3D model with 10,000 vertices.

The vertex shader will run 10,000 times.

Now, imagine that model is close to the camera on a 1920x1080 screen and covers a 500x500 pixel area.

The fragment shader will run approximately 250,000 times.

That's a 25x multiplier. If the object fills the whole screen, the fragment shader could run over 2,000,000 times, a 200x multiplier over the vertex shader. This massive execution count is why fragment shaders are frequently the main performance bottleneck in a frame.
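The arithmetic is worth having at your fingertips. The Rust helpers below are a back-of-the-envelope sketch that ignores MSAA, 2x2 helper invocations, and Early-Z rejection. The overdraw factor is our addition, showing how layered geometry multiplies the count further.

```rust
/// Approximate fragment-shader invocations for a `width` x `height` pixel region,
/// where `overdraw` is the average number of times each pixel is shaded (1.0 = none).
fn fragment_invocations(width: u64, height: u64, overdraw: f64) -> f64 {
    (width * height) as f64 * overdraw
}

/// Fragment invocations per vertex invocation.
fn fragment_to_vertex_ratio(fragments: f64, vertices: u64) -> f64 {
    fragments / vertices as f64
}
```

With an average overdraw of 2.5, a full 1080p frame already needs over five million fragment invocations.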

Early Depth Testing (Early-Z)

Fragment shaders are often the most expensive part of the rendering pipeline because they can run millions of times per frame. To combat this, GPUs have a critical optimization called Early Depth Testing (or Early-Z). The entire goal of Early-Z is to avoid running the fragment shader for pixels that are hidden behind other objects.

To understand this, you first need a clear picture of the Depth Buffer.

What is the Depth Buffer?

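Think of the depth buffer (or z-buffer) as a grayscale image that sits alongside the final color image. Instead of storing color, each of its pixels stores a single number (usually from 0.0 to 1.0) representing the depth of the closest object rendered so far at that pixel location. In the conventional layout:

0.0 = At the camera's near plane (very close)
1.0 = At the camera's far plane (very far)

Bevy actually uses reversed Z, which flips this: 1.0 is the near plane and values approach 0.0 with distance, spreading floating-point precision more evenly. The idea is identical; only the comparison flips. The player-and-wall walkthrough in the next section uses the conventional layout.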
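Bevy's reversed depth values follow a simple formula. Assuming its default perspective camera (an infinite far plane with reversed Z, built with glam's Mat4::perspective_infinite_reverse_rh), the stored depth is just near / distance. A Rust sketch of that relationship:

```rust
/// Depth stored for a point `distance` units in front of the camera, for an
/// infinite, reversed-Z perspective projection with near plane `near`.
/// 1.0 at the near plane, falling toward 0.0 as distance grows.
fn reverse_z_depth(distance: f32, near: f32) -> f32 {
    near / distance
}
```

Depth shrinks quickly: at 100 times the near distance it is already 0.01. Reversed Z pairs this with floating-point depth, whose precision is densest near 0.0, which is why distant surfaces still sort correctly. Under this convention, "closer" means a larger depth value.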

How Early-Z Works: A Step-by-Step Example

Imagine a scene where your player character is standing in front of a brick wall. For optimal Early-Z efficiency, opaque objects are ideally drawn from front to back, so here the player is drawn first.

Draw the Player First: The GPU starts rasterizing the player model's triangles. For each potential pixel:
- It calculates the fragment's depth (e.g., 0.2).
- It looks at the depth buffer. The buffer is currently empty (or has the default "infinitely far" value).
- The depth test passes (0.2 is closer).
- The GPU runs the player's fragment shader to get a color.
- It writes the player's color to the color buffer and 0.2 to the depth buffer.
Draw the Wall Second: Now the GPU starts rasterizing the wall's triangles. For a fragment on the wall that is behind the player:
- It calculates the fragment's depth (e.g., 0.5).
- It looks at the depth buffer. The value at this pixel is now 0.2 (from the player).
- The depth test fails (0.5 is farther away than 0.2).
- Because the test failed early, the GPU immediately discards the wall fragment.
- The expensive wall fragment shader is never executed for this pixel.

This is a massive performance save. By drawing the scene front-to-back, you avoid running costly fragment shaders for millions of pixels that would just be painted over anyway.
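The payoff of draw order can be simulated in a few lines. This Rust sketch is a toy model (one constant depth per surface, conventional smaller-is-closer depth, no MSAA): it replays the player-and-wall example above and counts how many times the fragment shader would run.

```rust
/// One opaque surface covering a set of pixels at a constant depth.
/// Conventional depth: smaller = closer.
#[derive(Clone)]
struct Surface {
    pixels: Vec<usize>,
    depth: f32,
}

/// Draws surfaces in order with an early depth test and returns the number
/// of fragment shader invocations.
fn shaded_fragments(surfaces: &[Surface], pixel_count: usize) -> usize {
    let mut depth_buffer = vec![f32::INFINITY; pixel_count]; // cleared to "infinitely far"
    let mut invocations = 0;
    for surface in surfaces {
        for &p in &surface.pixels {
            if surface.depth < depth_buffer[p] {
                // Early-Z passed: run the shader and record the new closest depth.
                invocations += 1;
                depth_buffer[p] = surface.depth;
            }
            // Otherwise the fragment is rejected before the shader runs.
        }
    }
    invocations
}
```

With a 4-pixel player in front of a 10-pixel wall, drawing the player first shades 10 fragments. Drawing the wall first shades 14, because the 4 wall fragments behind the player are shaded and then overwritten.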

When Early-Z is Disabled: The GPU's Dilemma

This "early" test can only work if the GPU knows a fragment's final depth, and knows whether the fragment will survive, before running the shader. Certain shader operations make this impossible, forcing the GPU to fall back to the slower path: run the shader first, then do a "late" depth test afterward.

Here are the main reasons Early-Z gets disabled or loses its benefit:

Manually Writing to Depth (@builtin(frag_depth)): If your fragment shader contains code that modifies the fragment's depth, the GPU can't know the final depth value until after the shader has finished executing. The "early" test is impossible because the necessary information isn't available early enough.
Using discard: The discard keyword tells the GPU to throw the fragment away completely. This is often used for "cutout" transparency, like a chain-link fence where you throw away the fragments in the holes. The GPU cannot know whether a fragment will be discarded until it runs the shader and reaches that if statement. The problem is that Early-Z doesn't only test depth; it also writes the new depth value before the shader runs. If a fence fragment that later turns out to be a hole had already written its depth, a character drawn behind the fence afterward would be wrongly hidden. So the GPU must delay the depth write until after the shader, which on many GPUs weakens or disables Early-Z for that draw.
Alpha Blending (Transparency): Strictly, blending doesn't switch Early-Z off: a transparent fragment behind an opaque wall can still be rejected early. The cost comes from what blending requires. To mix each fragment's color with whatever is already in the framebuffer, transparent surfaces are drawn after all opaque ones, sorted back to front, and usually without writing depth. Since they never hide one another, every overlapping transparent layer must be shaded and blended, making overdraw unavoidable. This is why transparency is so expensive.

Practical Advice for Keeping Early-Z Active:

Draw Opaque Geometry First: Render opaque objects before transparent ones, ideally sorted roughly front to back, so the depth buffer fills in early and hides as much as possible.
Avoid Transparency When Possible: An opaque material will always be faster than a transparent one. If something doesn't need to be see-through, don't use alpha.
Prefer Alpha Cutouts (discard) over Alpha Blending for Foliage: For things like leaves, using discard is often better than semi-transparent blending. Cutout fragments that survive still write depth, so they hide whatever is drawn behind them later, and they don't need back-to-front sorting.
Use a Depth Pre-Pass: This is an advanced technique where you render all your opaque geometry once in a very fast pass that only writes to the depth buffer. Then, you render the geometry a second time with the full, expensive fragment shaders. In this second pass, Early-Z will be extremely effective at culling hidden pixels.
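The effect of draw order, and of a pre-pass, can be seen in a toy CPU model. This is a simplified sketch, not how a GPU or Bevy implements depth testing: it shades a single row of 10 pixels containing three hypothetical objects (a sky, mountains, and a tree) and counts how many times the "fragment shader" would run. The `Span` type and both functions are illustrative names, and in this model a smaller depth means closer to the camera.

```rust
/// One object in a toy 1D scene: a half-open range of pixels at a single depth.
/// In this model a smaller depth means closer to the camera.
#[derive(Clone, Copy)]
struct Span {
    start: usize,
    end: usize,
    depth: f32,
}

/// Counts fragment-shader invocations when objects are drawn in `draw_order`
/// with an early depth test: a fragment is shaded only if it is closer than
/// what the depth buffer already holds, and it then writes its own depth.
fn shaded_with_early_z(width: usize, draw_order: &[Span]) -> usize {
    let mut depth_buffer = vec![f32::INFINITY; width];
    let mut shaded = 0;
    for span in draw_order {
        for px in span.start..span.end {
            if span.depth < depth_buffer[px] {
                shaded += 1; // early test passed, so the shader runs
                depth_buffer[px] = span.depth;
            }
        }
    }
    shaded
}

/// Depth pre-pass: pass 1 records only the nearest depth per pixel (no shading);
/// pass 2 shades only the fragments whose depth matches the stored value.
fn shaded_with_prepass(width: usize, objects: &[Span]) -> usize {
    let mut depth_buffer = vec![f32::INFINITY; width];
    for span in objects {
        for px in span.start..span.end {
            depth_buffer[px] = depth_buffer[px].min(span.depth);
        }
    }
    let mut shaded = 0;
    for span in objects {
        for px in span.start..span.end {
            if span.depth == depth_buffer[px] {
                shaded += 1; // only the visible surface is shaded
            }
        }
    }
    shaded
}
```

Drawn back-to-front (sky, mountains, tree), the row needs 18 invocations for 10 pixels; drawn front-to-back, Early-Z rejects every hidden fragment and only 10 run. The pre-pass version also shades exactly 10 fragments whatever the draw order, at the cost of a second geometry pass that this model doesn't count.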

Fragment Shader Execution Groups: The 2x2 Quad

Just like vertex shaders, fragment shaders are executed in large parallel groups. However, there's a special, smaller unit of work that is fundamental to how they operate: the 2x2 pixel quad.

A GPU processes fragments in blocks of four, arranged in a 2x2 square on the screen. Even if a triangle only covers a single pixel, the GPU will likely activate a full 2x2 quad to process it.

Why a 2x2 block? The answer is Mipmapping and Derivatives.

This might seem inefficient, but it enables one of the most important features of texturing: automatic mipmap level selection.

The Problem: When you sample a texture, how does the GPU know whether to use the high-resolution original (mip 0), a blurry medium-resolution version (say, mip 4), or a tiny 1x1 pixel version (mip 10 for a 1024x1024 texture)? Using a high-res texture on a faraway object causes shimmering and aliasing, while using a low-res texture up close looks blurry and pixelated. The GPU needs to pick the right mip level for each fragment.
The Solution: To pick the right mip level, the GPU needs to know how "stretched" or "squished" the texture is on the screen for a given fragment. It answers the question: "How much are the texture's UV coordinates changing as I move from this pixel to its immediate neighbors?"
This "rate of change" is called a derivative. The GPU calculates it by comparing the UV coordinates of the fragments within the 2x2 quad.
- By comparing the UVs of the left and right fragments in the quad, it can calculate the rate of change in the X direction (ddx in HLSL, dpdx in WGSL).
- By comparing the UVs of the top and bottom fragments, it can calculate the rate of change in the Y direction (ddy in HLSL, dpdy in WGSL).

Because all four fragments in the quad are processed together, they can share their data, allowing the GPU to compute these derivatives cheaply for every fragment. This information tells the GPU how large the texture's "footprint" is on the screen, which allows it to select an appropriate mipmap level.
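The derivative-to-mip-level rule can be sketched on the CPU. This is a simplified version of the level-of-detail formula from the OpenGL specification (take the longer of the two screen-space footprint vectors, measured in texels, and use its base-2 logarithm); real GPUs approximate it and add LOD bias and anisotropic filtering. The function name and parameters are illustrative assumptions.

```rust
/// Approximates hardware mip selection from screen-space UV derivatives.
/// `duv_dx` / `duv_dy`: change in UV between horizontally / vertically adjacent
/// pixels (what dpdx / dpdy would return). `tex_size`: texture size in texels.
fn mip_level(duv_dx: [f32; 2], duv_dy: [f32; 2], tex_size: [f32; 2], mip_count: u32) -> f32 {
    // Length of a UV delta once converted into texels.
    let texel_len = |d: [f32; 2]| {
        ((d[0] * tex_size[0]).powi(2) + (d[1] * tex_size[1]).powi(2)).sqrt()
    };
    // Texels covered per screen pixel along the more stretched axis.
    let rho = texel_len(duv_dx).max(texel_len(duv_dy));
    // Each mip level halves the resolution, so the level is log2 of the footprint,
    // clamped to the levels that actually exist.
    let max_level = (mip_count - 1) as f32;
    rho.log2().clamp(0.0, max_level)
}
```

For a 1024x1024 texture with 11 mip levels, a UV step of 1/64 per pixel means each pixel spans 16 texels, so level 4 is chosen; magnified textures clamp to level 0, and very distant ones clamp to the smallest level.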

Practical Implications of Quad Execution

Understanding this has a few important consequences:

Automatic Mipmapping Just Works: This is the big win. When you use textureSample(), the GPU uses the quad to calculate derivatives implicitly, giving you high-quality, largely alias-free texturing for free. This is why you don't have to specify a mip level manually most of the time.
Inefficiency with Tiny Triangles: If a mesh is so far away that its triangles cover only one or two pixels on screen, the GPU still has to launch a full 2x2 quad for each one. The extra invocations in the quad (two or three of the four) are "helper" work: they run only to supply derivative data, and their results are thrown away. This is known as quad overshading and can be a performance issue when rendering complex geometry from a great distance.
Divergence is Still a Problem: All fragments in a quad (and usually a larger group called a "wave" or "warp") execute the same code path in lockstep. If you have an if statement that depends on frag_coord (the pixel's position), it's very likely that some fragments in a quad will take one path and others will take the other. This divergence forces the hardware to execute both code paths, which is slower than if all fragments had taken the same path.
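Quad overshading can be estimated with a toy model. This sketch assumes the GPU launches every 2x2 quad (aligned to even pixel coordinates) that contains at least one covered pixel, and it ignores how real hardware handles quads shared by adjacent triangles; `quad_invocations` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Given the distinct pixels a triangle covers, returns
/// (shader invocations launched, invocations whose output is kept),
/// assuming every 2x2 quad touching a covered pixel is fully shaded.
fn quad_invocations(covered: &[(u32, u32)]) -> (usize, usize) {
    // Identify each quad by integer-dividing pixel coordinates by 2.
    let quads: HashSet<(u32, u32)> = covered.iter().map(|&(x, y)| (x / 2, y / 2)).collect();
    // Four invocations per quad; only the covered pixels produce output.
    (quads.len() * 4, covered.len())
}
```

A single-pixel triangle launches 4 invocations to produce 1 pixel. A 4x4 block aligned to the quad grid wastes nothing, but the same block shifted by one pixel touches 9 quads: 36 invocations for 16 pixels.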

The Cost of Fragment Shaders

We've established that fragment shaders run for almost every pixel on the screen, potentially millions of times per frame. This massive execution count means they are usually the most performance-critical part of the rendering pipeline. As a rough rule of thumb (the split varies widely by scene and hardware), vertex shaders might consume 5-15% of a frame's GPU time, while fragment shaders are often responsible for 60-80% or more.

Even a tiny, seemingly insignificant operation in a fragment shader is magnified millions of times: at 1920x1080, one extra instruction per fragment means roughly two million extra instructions every frame, before counting any overdraw. Therefore, understanding what makes a fragment shader expensive is crucial.

GPU performance is typically limited by one of two factors:

ALU / Compute Bound: The shader is bottlenecked by the raw number of mathematical calculations it has to perform. The GPU's Arithmetic Logic Units (ALUs) are running at 100%, and the shader is limited by its "thinking time."
Memory / Bandwidth Bound: The shader is bottlenecked by how quickly it can fetch data (primarily textures) from VRAM. The GPU is spending most of its time "waiting for data" rather than calculating.

A slow shader is often suffering from one or both of these problems.

What Makes a Shader Compute-Bound? (Too Much Math)

Some math operations are much more expensive than others. A simple addition or multiplication might take a single cycle, but more complex functions can take many.

Procedural Calculations: Generating noise (like Perlin or Simplex), fractals, or complex patterns in the shader involves a lot of math and is a classic example of a compute-heavy task.
Complex Lighting Models: Physically-based lighting often involves many complex calculations, including pow, sqrt, and normalize, for every single light source.
Heavy Use of Expensive Functions: Trigonometry (sin, cos), transcendentals (pow, exp, log), and square roots (sqrt) are significantly more costly than basic arithmetic. A shader with many of these can easily become compute-bound.
Long Loops: A for loop that runs many times per fragment multiplies the cost of all the instructions inside it.

What Makes a Shader Memory-Bound? (Too Many Texture Reads)

Every time your shader calls textureSample(), the GPU has to fetch data from memory. While GPUs have very fast caches to help with this, it's still one of the slowest things a shader can do.

Multiple Texture Samples: A material that samples from a color texture, a normal map, a roughness map, and a metalness map is performing four separate memory fetches for every single pixel. This can quickly saturate the GPU's memory bandwidth.
Dependent Texture Reads: This is a worst-case scenario where the result of one texture sample is used to calculate the UV coordinates for a second texture sample. This prevents the GPU from fetching data in parallel and can cause a significant stall as it waits for the first read to complete before it can even begin the second.
Large, Uncompressed Textures: Using large, unoptimized textures can strain the memory bus and lead to cache misses, slowing down fetches.

An Intuitive Cost Hierarchy

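While exact costs vary by GPU, a good mental model for the relative cost of operations, from cheapest to most expensive, is:

Basic arithmetic (addition, multiplication, multiply-add): often a single cycle each.
Expensive math functions (sqrt, sin, cos, pow, exp, log): can take many cycles each.
Texture samples: a memory fetch, fast when the data is already in cache and slow when it isn't.
Dependent texture reads: a fetch that cannot even begin until a previous fetch has completed.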

The Final Multiplier: Overdraw

Remember that overdraw multiplies the cost of everything. If you have a complex, memory-bound shader that samples 5 textures, and it's running on an object with an average of 3x overdraw, you are effectively performing 15 texture samples for that screen pixel. This is why controlling draw order and transparency is just as important as optimizing the shader code itself.

Understanding Overdraw and Fragment Load

Overdraw is one of the most significant performance killers in fragment-heavy scenes. It occurs whenever the GPU is forced to run a fragment shader for a pixel that will ultimately be covered up by another object drawn later in the same frame. It is, quite simply, wasted work.

What is Overdraw?

Imagine a scene with overlapping objects drawn in a "back-to-front" order: a skybox, a mountain range, and a tree.

Draw Skybox: The GPU shades every single pixel on the screen blue.
Draw Mountains: The GPU shades all the pixels for the mountains, painting over the skybox pixels that are behind them. The work done on the sky pixels now covered by the mountains was completely wasted.
Draw Tree: The GPU shades all the pixels for the tree, painting over the mountain pixels behind it. Again, the work on those mountain pixels was wasted.

In the areas where the tree overlaps the mountain, which overlaps the sky, the fragment shader has been run three times for the same final pixel. This is called 3x overdraw. An average overdraw of 2x means that, on average, every pixel on your screen is being shaded twice. High overdraw (5x or more) can bring even powerful GPUs to their knees.
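The per-pixel counts in this example can be computed directly. The following sketch models one row of 10 pixels with three hypothetical objects (sky over pixels 0-9, mountains over 3-8, tree over 5-6), drawn with no depth rejection at all, and reports the resulting overdraw. The function names are illustrative.

```rust
/// Per-pixel shading counts for a 1D row when every object is painted with no
/// depth rejection (pure back-to-front drawing). Each object is a half-open
/// pixel range `start..end`.
fn overdraw_counts(width: usize, objects: &[std::ops::Range<usize>]) -> Vec<u32> {
    let mut counts = vec![0u32; width];
    for object in objects {
        for px in object.clone() {
            counts[px] += 1; // the fragment shader runs once more for this pixel
        }
    }
    counts
}

/// Average overdraw = total fragment-shader runs / pixels on screen.
fn average_overdraw(counts: &[u32]) -> f32 {
    counts.iter().sum::<u32>() as f32 / counts.len() as f32
}
```

The tree pixels reach the 3x described above, while the row as a whole averages 1.8x (18 shader runs for 10 pixels).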

Measuring Overdraw

Identifying where overdraw is happening is the first step to fixing it.

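DIY Visualization: A simple trick is to create a special debug material that outputs a dim, constant color with additive blending. When applied to your whole scene, every surface covering a pixel adds its color, so areas where more surfaces overlap accumulate more light and appear brighter, creating a "heat map" of your scene's fragment cost. Because blended materials don't write depth, this shows every overlapping layer, which is the worst case with no Early-Z rejection at all.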

// In a temporary debug material:
@fragment
fn fragment() -> @location(0) vec4<f32> {
    // Return a dim, constant color.
    return vec4<f32>(0.1, 0.1, 0.1, 1.0);
}

You would then configure this material in Bevy to use AlphaMode::Add by returning it from the alpha_mode() method of its Material implementation. Brighter areas in the resulting image indicate higher overdraw.
Professional Tools (The Right Way): The industry-standard approach is to use a graphics debugger like RenderDoc. You can launch your Bevy application through RenderDoc, capture a single frame, and then use the texture viewer's overlays. Its "Quad Overdraw" overlays show how many times each area was shaded (counted per 2x2 quad), letting you pinpoint problematic areas with ease.
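One caveat of the DIY view is that it saturates. This sketch models the additive accumulation in linear values, assuming a standard 8-bit render target that clamps at 1.0; it ignores the sRGB conversion and any tonemapping applied before display, so on-screen brightness will not rise in perfectly equal steps. Both functions are hypothetical helpers.

```rust
/// Brightness of one pixel in the DIY overdraw view: every overlapping layer adds
/// `per_layer` (the debug material's constant color), and a standard 8-bit
/// render target clamps the result at 1.0.
fn heatmap_brightness(layers: u32, per_layer: f32) -> f32 {
    (layers as f32 * per_layer).min(1.0)
}

/// Smallest layer count at which the view saturates to full white.
fn saturation_layers(per_layer: f32) -> u32 {
    (1.0 / per_layer).ceil() as u32
}
```

With 0.1 per layer, any pixel with 10 or more overlapping layers looks equally white, so a smaller value (such as 0.05) keeps very dense areas distinguishable.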

Common Causes of Overdraw

Inefficient Draw Order: Drawing opaque objects from back-to-front is the worst-case scenario. It completely defeats the Early-Z optimization. (Fortunately, Bevy's default renderer sorts opaque objects front-to-back to help prevent this).
Transparent Surfaces: Transparency is the biggest source of overdraw. Because transparent objects are blended with what's behind them and usually don't write depth, the GPU must run the fragment shader for the transparent surface and for every layer behind it; Early-Z cannot cull those hidden layers. A scene with many large, overlapping transparent surfaces (windows, particle effects, foliage) will inherently have high overdraw.
Dense Geometry and Small Triangles: Particle effects with large, overlapping billboards, or dense foliage where many leaves cover the same few pixels, are major sources of overdraw. Likewise, rendering a complex mesh from a great distance results in many tiny triangles covering the same pixel, forcing many quads to be processed for a small area.
UI Elements: A complex user interface with many overlapping panels and elements is, by definition, a high-overdraw situation.

Reducing Overdraw

Strategy 1: Rely on Front-to-Back Sorting for Opaque Objects: This is the most critical strategy, and Bevy's PBR renderer handles it for you by default for opaque meshes. By drawing the closest objects first, you fill the depth buffer with "near" values, allowing Early-Z to be maximally effective.
Strategy 2: Use Alpha Cutouts Instead of Blending: If an effect can be achieved with a "cutout" shader (using the discard keyword) instead of semi-transparent blending, it's often a performance win. Nothing needs to be blended, and the fragments that survive write depth like opaque ones, so objects hidden behind them can still be rejected by the depth test.
Strategy 3: Use a Depth Pre-Pass: This is a powerful technique for complex scenes. It works in two stages:
- Pass 1 (Depth Only): Render all opaque geometry with an extremely simple, fast vertex shader and no fragment shader at all, only writing depth values to the depth buffer.
- Pass 2 (Shading): Render all the opaque geometry again, but this time with the full, expensive fragment shaders. Because the depth buffer is already filled from the first pass, Early-Z can now discard every hidden fragment with near-100% efficiency. This trades a second round of draw calls and vertex work for a massive reduction in fragment shader work. In Bevy, you can request a depth pre-pass by adding the DepthPrepass component to your camera.
Strategy 4: Occlusion Culling: Bevy automatically performs Frustum Culling (it doesn't draw objects outside the camera's view). Occlusion Culling goes a step further and skips objects that are inside the view but completely hidden behind other objects (e.g., a character in another room behind a wall). Bevy does not do this by default (recent versions include it only as an experimental feature), but plugins are also available for this advanced technique.

The Vicious Multiplier: How Overdraw Amplifies Shader Cost

Finally, it's crucial to understand that overdraw and shader complexity multiply each other's performance impact.

Consider a moderately expensive shader that samples 4 textures.

With 1x overdraw (the ideal), you perform 4 texture samples per final pixel.
With 5x overdraw, you are now performing 20 texture samples for that same final pixel.

This is why a shader that runs at a smooth 60 FPS in isolation can drag a complex scene down to 20 FPS. You must optimize both the shader's intrinsic cost (reducing math and texture fetches) and the scene's overdraw.
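The arithmetic is easy to sketch. The example below assumes, hypothetically, that the 4-sample material covers an entire 1920x1080 screen; real scenes mix materials and coverage, so treat the result as an upper-bound illustration.

```rust
/// Texture samples issued per frame when a material covers the whole screen:
/// pixels x average overdraw x samples per fragment-shader invocation.
fn texture_samples_per_frame(
    width: u32,
    height: u32,
    avg_overdraw: f64,
    samples_per_fragment: u32,
) -> f64 {
    f64::from(width) * f64::from(height) * avg_overdraw * f64::from(samples_per_fragment)
}
```

At 1x overdraw that is about 8.3 million samples per frame; at 5x it is about 41.5 million, or roughly 2.5 billion per second at 60 FPS.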

Basic Fragment Shader Structure in Bevy

Now that we understand the theory, let's look at how a fragment shader is practically integrated into a custom Bevy Material. The structure is a direct mirror of the vertex shader integration we've already seen.

The Minimal Bevy Material

This is the simplest possible custom material. It defines a single color in Rust, passes it to the GPU as a uniform, and the fragment shader simply reads that uniform and outputs it.

// src/materials/simple_color.rs
use bevy::prelude::*;
use bevy::render::render_resource::{AsBindGroup, ShaderRef};

#[derive(Asset, TypePath, AsBindGroup, Debug, Clone)]
pub struct SimpleColorMaterial {
    // Bevy will upload the value of this field to the GPU.
    #[uniform(0)]
    pub color: LinearRgba,
}

impl Material for SimpleColorMaterial {
    // This material only needs a fragment shader. Bevy will use
    // its default vertex shader to handle transformations.
    fn fragment_shader() -> ShaderRef {
        "shaders/simple_color.wgsl".into()
    }
}

// assets/shaders/simple_color.wgsl

// The WGSL struct must match the layout of the Rust struct.
struct SimpleColorMaterial {
    color: vec4<f32>,
}

// Access the uniform data at the binding specified in Rust.
// Bevy 0.15 places material bindings in bind group 2; newer versions expose
// the index through the #{MATERIAL_BIND_GROUP} shader def instead.
@group(2) @binding(0)
var<uniform> material: SimpleColorMaterial;

@fragment
fn fragment() -> @location(0) vec4<f32> {
    // Read the color from the uniform and return it.
    return material.color;
}

Key components of this pattern:

The Rust Material struct: This holds the data you want to control from the CPU (in this case, color). The AsBindGroup derive macro handles the work of preparing this data for the GPU.
The #[uniform(0)] attribute: This tells Bevy that this field should be part of a uniform buffer at binding 0 within the material's bind group.
The Material trait: The implementation of this trait tells Bevy's renderer which shader file(s) to use. If vertex_shader() is not specified, a default one is used.
The WGSL uniform block: The struct in WGSL and the @group(2) @binding(0) declaration provide the shader with access to the data uploaded from Rust.

Accepting Interpolated Input

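Most fragment shaders need data from the vertex stage. Bevy's default vertex shader already provides common attributes (world position, world normal, UVs) through the VertexOutput struct in bevy_pbr::forward_io, which a fragment-only material can import. When you need to pass your own custom data, you must provide both a vertex and a fragment shader in your Material implementation.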

// in the material's .rs file
impl Material for InterpolatedMaterial {
    fn vertex_shader() -> ShaderRef {
        "shaders/interpolated.wgsl".into()
    }

    fn fragment_shader() -> ShaderRef {
        "shaders/interpolated.wgsl".into()
    }
}

The shader file then contains both entry points. The VertexOutput struct acts as the bridge, packaging the data that the rasterizer will interpolate.

// assets/shaders/interpolated.wgsl
#import bevy_pbr::mesh_functions
#import bevy_pbr::view_transformations::position_world_to_clip

// ... Uniforms and VertexInput ...

struct VertexOutput {
    @builtin(position) clip_position: vec4<f32>,
    @location(0) world_normal: vec3<f32>,
}

@vertex
fn vertex(in: VertexInput) -> VertexOutput {
    var out: VertexOutput;
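    let world_from_local = mesh_functions::get_world_from_local(in.instance_index);
    let world_position = mesh_functions::mesh_position_local_to_world(
        world_from_local,
        vec4<f32>(in.position, 1.0)
    );
    out.clip_position = position_world_to_clip(world_position.xyz);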
    out.world_normal = mesh_functions::mesh_normal_local_to_world(
        in.normal,
        in.instance_index
    );
    return out;
}

@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
    // `in.world_normal` is the interpolated value from the vertex stage.
    let normal = normalize(in.world_normal);

    // Use the interpolated normal to calculate a simple lighting value.
    let light_dir = normalize(vec3<f32>(1.0, 1.0, 1.0));
    let diffuse = max(0.0, dot(normal, light_dir));

    return vec4<f32>(vec3<f32>(diffuse), 1.0);
}
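Note the normalize() call in the fragment shader above. The rasterizer interpolates vector attributes component by component, and a blend of two unit-length normals is generally shorter than unit length. This CPU sketch uses plain linear interpolation (real rasterizers also apply perspective correction, which doesn't change the point); the helper names are illustrative.

```rust
/// Linearly interpolates between two vectors, as the rasterizer does for a
/// fragment lying between two vertices (perspective correction ignored).
fn lerp3(a: [f32; 3], b: [f32; 3], t: f32) -> [f32; 3] {
    [
        a[0] + (b[0] - a[0]) * t,
        a[1] + (b[1] - a[1]) * t,
        a[2] + (b[2] - a[2]) * t,
    ]
}

fn length3(v: [f32; 3]) -> f32 {
    (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt()
}

/// Restores unit length, like WGSL's normalize().
fn normalize3(v: [f32; 3]) -> [f32; 3] {
    let len = length3(v);
    v.map(|c| c / len)
}
```

Halfway between two perpendicular unit normals, the interpolated vector has a length of about 0.707, so a dot-product lighting term computed without renormalizing would come out too dark in the middle of each triangle.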

The Standard Material Pattern

For organization and to ensure correct memory layout, it's a common best practice to separate your uniform data into its own dedicated struct.

// In a `uniforms` module within your material's .rs file
use bevy::prelude::*;
use bevy::render::render_resource::ShaderType;

#[derive(ShaderType, Debug, Clone, Copy, Default)]
pub struct MyMaterialUniforms {
    pub color: LinearRgba,
    pub intensity: f32,
    pub time: f32,
    // ... other fields
}

// In the main material .rs file

#[derive(Asset, TypePath, AsBindGroup, Debug, Clone)]
pub struct MyMaterial {
    #[uniform(0)]
    pub uniforms: MyMaterialUniforms, // A single field holding the struct
}

impl Material for MyMaterial {
    // ...
}

This pattern, which uses the ShaderType derive to guarantee alignment, keeps your code clean, organized, and robust against common GPU memory layout issues.
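To see why the ShaderType derive matters, here is a hand-rolled calculator for WGSL struct layout. It is an illustration, not the API of encase or Bevy, and it models only the basic rules from the WGSL specification (f32: size 4, align 4; vec3<f32>: size 12, align 16; vec4<f32>: size 16, align 16; each member starts at the next multiple of its alignment, and the struct size is rounded up to its largest alignment). Extra uniform-buffer rules, such as 16-byte array strides, are not modeled.

```rust
/// Size and alignment of a WGSL type.
#[derive(Clone, Copy)]
struct Layout {
    size: u32,
    align: u32,
}

const F32: Layout = Layout { size: 4, align: 4 };
const VEC3_F32: Layout = Layout { size: 12, align: 16 };
const VEC4_F32: Layout = Layout { size: 16, align: 16 };

/// Rounds `value` up to the next multiple of `align`.
fn round_up(value: u32, align: u32) -> u32 {
    (value + align - 1) / align * align
}

/// Returns each member's byte offset and the struct's total size.
fn struct_layout(members: &[Layout]) -> (Vec<u32>, u32) {
    let mut offsets = Vec::new();
    let mut end: u32 = 0;
    let mut struct_align: u32 = 1;
    for m in members {
        let offset = round_up(end, m.align); // padding inserted here if needed
        offsets.push(offset);
        end = offset + m.size;
        struct_align = struct_align.max(m.align);
    }
    (offsets, round_up(end, struct_align))
}
```

For MyMaterialUniforms (color, intensity, time), the offsets are 0, 16, and 20, with a total size of 32 bytes. The riskier case is a vec3 after an f32: WGSL places it at offset 16, while a naive `#[repr(C)]` Rust struct using `[f32; 3]` would put it at offset 4. ShaderType inserts that padding for you.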

Complete Example: Color Visualization Material

It's time to put all this theory into practice. We will build a complete, interactive Bevy application with a custom "debug" material. This material will allow us to visualize all the different kinds of data a fragment shader has access to, from interpolated vertex attributes like normals and UVs to built-in hardware variables like screen position and depth.

Our Goal

We will create a scene with several 3D primitives (a sphere, a cube, and a torus). A single, versatile ColorVisualizationMaterial will be applied to all of them. By pressing keys, we will be able to cycle through different "display modes" in the fragment shader, changing what data is being used to generate the final color.

What This Project Demonstrates

Passing Data: How to correctly pass various attributes (world_position, world_normal, local_position, uv) from the vertex shader.
Interpolation in Action: We'll see how smoothly interpolated values create gradients across curved and flat surfaces.
Visualizing Built-ins: We'll render colors based on screen position (@builtin(position)), depth (frag_coord.z), and face orientation (@builtin(front_facing)).
Uniform-Based Branching: A safe and efficient if/else if chain in the fragment shader, controlled by a uniform, will be used to switch between visualization modes.
The Two-Struct Pattern: We will correctly use separate VertexOutput and FragmentInput structs to access both interpolated data and @builtin(position).

The Shader (`assets/shaders/d03_01_color_visualization.wgsl`)

This single WGSL file contains both our vertex and fragment shaders. The vertex shader's job is straightforward: it performs standard transformations and passes four key attributes (world_position, world_normal, local_position, and uv) to the next stage.

The fragment shader is where the real logic lives. It receives the interpolated data in a FragmentInput struct and also takes frag_coord and is_front as built-in parameters. The core of the shader is a large if/else if block that checks the material.display_mode uniform to decide which visualization to render. Each block takes a different piece of data and maps its values into a visible RGB color.

#import bevy_pbr::{
    mesh_functions,
    view_transformations::position_world_to_clip,
}

struct ColorVisualizationMaterial {
    display_mode: u32,  // Which property to visualize
    time: f32,
}

@group(2) @binding(0)
var<uniform> material: ColorVisualizationMaterial;

struct VertexInput {
    @builtin(instance_index) instance_index: u32,
    @location(0) position: vec3<f32>,
    @location(1) normal: vec3<f32>,
    @location(2) uv: vec2<f32>,
}

struct VertexOutput {
    @builtin(position) clip_position: vec4<f32>,
    @location(0) world_position: vec3<f32>,
    @location(1) world_normal: vec3<f32>,
    @location(2) local_position: vec3<f32>,
    @location(3) uv: vec2<f32>,
}

struct FragmentInput {
    @location(0) world_position: vec3<f32>,
    @location(1) world_normal: vec3<f32>,
    @location(2) local_position: vec3<f32>,
    @location(3) uv: vec2<f32>,
}

// Note: @builtin(position) means different things in each stage:
// - Written by the vertex shader: the clip-space position (for rasterization)
// - Read by the fragment shader: the fragment's framebuffer coordinates
//   (x and y in pixels, z = depth)
// An entry point may declare each builtin only once among its inputs, so to take
// it as a separate `frag_coord` parameter we use FragmentInput, which omits it.

@vertex
fn vertex(in: VertexInput) -> VertexOutput {
    var out: VertexOutput;

    let model = mesh_functions::get_world_from_local(in.instance_index);
    let world_position = mesh_functions::mesh_position_local_to_world(
        model,
        vec4<f32>(in.position, 1.0)
    );

    out.clip_position = position_world_to_clip(world_position.xyz);
    out.world_position = world_position.xyz;
    out.world_normal = mesh_functions::mesh_normal_local_to_world(
        in.normal,
        in.instance_index
    );
    out.local_position = in.position;
    out.uv = in.uv;

    return out;
}

// Convert a direction vector to RGB color (mapping [-1,1] to [0,1])
fn direction_to_color(dir: vec3<f32>) -> vec3<f32> {
    return dir * 0.5 + 0.5;
}

// Create a checkerboard pattern
fn checkerboard(uv: vec2<f32>, frequency: f32) -> f32 {
    let checker = (floor(uv.x * frequency) + floor(uv.y * frequency)) % 2.0;
    return checker;
}

@fragment
fn fragment(
    in: FragmentInput,
    @builtin(position) frag_coord: vec4<f32>,
    @builtin(front_facing) is_front: bool,
) -> @location(0) vec4<f32> {
    var color: vec3<f32>;

    if material.display_mode == 0u {
        // Mode 0: World-space position
        // fract() keeps only the fractional part, wrapping each axis into [0, 1)
        color = fract(in.world_position);

    } else if material.display_mode == 1u {
        // Mode 1: Local-space position
        color = direction_to_color(normalize(in.local_position));

    } else if material.display_mode == 2u {
        // Mode 2: World-space normal
        color = direction_to_color(normalize(in.world_normal));

    } else if material.display_mode == 3u {
        // Mode 3: UV coordinates
        color = vec3<f32>(in.uv, 0.0);

    } else if material.display_mode == 4u {
        // Mode 4: UV checkerboard
        let checker = checkerboard(in.uv, 10.0);
        color = vec3<f32>(checker);

    } else if material.display_mode == 5u {
        // Mode 5: Screen-space position (fragment coordinates)
        // frag_coord.xy is in pixels; fract() repeats the gradient every 100 pixels
        color = vec3<f32>(fract(frag_coord.xy / 100.0), 0.0);

    } else if material.display_mode == 6u {
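        // Mode 6: Depth visualization
        // Bevy's default camera uses reverse-Z with an infinite far plane: depth is
        // 1.0 at the near plane and falls toward 0.0 with distance, so values are tiny.
        // These bounds are hand-tuned for this scene; nearer surfaces appear brighter.
        let min_depth = 0.01115;
        let max_depth = 0.015;
        let depth = saturate((frag_coord.z - min_depth) / (max_depth - min_depth));

        color = vec3<f32>(depth);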
    } else if material.display_mode == 7u {
        // Mode 7: Front/back face
        if is_front {
            // Create a grid pattern in UV space to punch holes in the mesh.
            let check = checkerboard(in.uv, 10.0);

            // If this fragment is in a "hole" of our grid, discard it entirely.
            // Because `discard` appears in this shader, many GPUs may disable
            // Early-Z for this whole pipeline, in every mode, not just this one.
            if check < 0.1 { // Use < 0.1 for float comparison safety
                discard;
            }

            // If not discarded, color it green.
            color = vec3<f32>(0.0, 1.0, 0.0);
        } else {
            // Back faces are a red checkerboard.
            let checker = checkerboard(in.uv, 20.0);
            color = vec3<f32>(checker, 0.0, 0.0);
        }
    } else if material.display_mode == 8u {
        // Mode 8: Animated gradient
        let dist = length(in.local_position.xy);
        let wave = sin(dist * 10.0 - material.time * 3.0) * 0.5 + 0.5;
        color = vec3<f32>(wave, 1.0 - wave, 0.5);

    } else if material.display_mode == 9u {
        // Mode 9: Simple lighting demo
        let normal = normalize(in.world_normal);
        let light_dir = normalize(vec3<f32>(1.0, 1.0, 1.0));
        let diffuse = max(0.0, dot(normal, light_dir));

        let base_color = vec3<f32>(0.8, 0.6, 0.4);
        color = base_color * (0.3 + diffuse * 0.7);

    } else {
        // Fallback: Magenta for invalid mode
        color = vec3<f32>(1.0, 0.0, 1.0);
    }

    return vec4<f32>(color, 1.0);
}
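Helper functions like these are easy to check on the CPU. The sketch below ports direction_to_color, checkerboard, and the Mode 6 depth remap to Rust (with a clamp added), plus a hypothetical reverse_z_depth helper. That helper assumes Bevy's default perspective camera, which uses an infinite reverse-Z projection with a near plane of 0.1, so the stored depth equals near / distance along the view axis. Under that assumption, Mode 6's bounds of 0.01115 and 0.015 correspond to view distances of roughly 9.0 and 6.7 units, which bracket the shapes in this scene.

```rust
/// CPU port of direction_to_color: maps each component from [-1, 1] to [0, 1].
fn direction_to_color(dir: [f32; 3]) -> [f32; 3] {
    dir.map(|c| c * 0.5 + 0.5)
}

/// CPU port of checkerboard(): 0.0 or 1.0 depending on the cell.
/// Rust's float `%`, like WGSL's, keeps the sign of the left operand, so
/// negative inputs would yield -1.0 (UVs are non-negative here).
fn checkerboard(uv: [f32; 2], frequency: f32) -> f32 {
    ((uv[0] * frequency).floor() + (uv[1] * frequency).floor()) % 2.0
}

/// Hypothetical helper: depth stored by an infinite reverse-Z projection.
fn reverse_z_depth(near: f32, view_distance: f32) -> f32 {
    near / view_distance
}

/// Mode 6's remap into [0, 1], clamped like the shader's saturate().
fn remap_depth(depth: f32, min_depth: f32, max_depth: f32) -> f32 {
    ((depth - min_depth) / (max_depth - min_depth)).clamp(0.0, 1.0)
}
```

For example, a surface 8 units from the camera stores a depth of 0.0125, which Mode 6 maps to about 0.35 gray.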

The Rust Material (`src/materials/d03_01_color_visualization.rs`)

The Rust code for our material is a standard implementation. It uses the best practice of a nested uniforms module and derives ShaderType to ensure correct memory layout. It also includes a helper function get_mode_name that we'll use to display the current mode in the UI.

use bevy::prelude::*;
use bevy::render::render_resource::{AsBindGroup, ShaderRef};
use bevy::pbr::{Material, MaterialPipelineKey};
use bevy::render::mesh::MeshVertexBufferLayoutRef;
use bevy::render::render_resource::{RenderPipelineDescriptor, SpecializedMeshPipelineError};


mod uniforms {
    #![allow(dead_code)]

    use bevy::render::render_resource::ShaderType;

    #[derive(ShaderType, Debug, Clone, Copy)]
    pub struct ColorVisualizationUniforms {
        pub display_mode: u32,
        pub time: f32,
    }

    impl Default for ColorVisualizationUniforms {
        fn default() -> Self {
            Self {
                display_mode: 0,
                time: 0.0,
            }
        }
    }
}

pub use uniforms::ColorVisualizationUniforms;

#[derive(Asset, TypePath, AsBindGroup, Debug, Clone)]
pub struct ColorVisualizationMaterial {
    #[uniform(0)]
    pub uniforms: ColorVisualizationUniforms,
}

impl Material for ColorVisualizationMaterial {
    fn vertex_shader() -> ShaderRef {
        "shaders/d03_01_color_visualization.wgsl".into()
    }

    fn fragment_shader() -> ShaderRef {
        "shaders/d03_01_color_visualization.wgsl".into()
    }

    // Disable backface culling so we can see both front and back faces
    fn specialize(
        _pipeline: &bevy::pbr::MaterialPipeline<Self>,
        descriptor: &mut RenderPipelineDescriptor,
        _layout: &MeshVertexBufferLayoutRef,
        _key: MaterialPipelineKey<Self>,
    ) -> Result<(), SpecializedMeshPipelineError> {
        descriptor.primitive.cull_mode = None;
        Ok(())
    }
}

// Helper to get mode name for UI
pub fn get_mode_name(mode: u32) -> &'static str {
    match mode {
        0 => "World Position",
        1 => "Local Position",
        2 => "World Normal",
        3 => "UV Coordinates",
        4 => "UV Checkerboard",
        5 => "Screen Position",
        6 => "Depth",
        7 => "Front/Back Face",
        8 => "Animated Gradient",
        9 => "Simple Lighting",
        _ => "Invalid Mode",
    }
}

Don't forget to add it to src/materials/mod.rs:

// ... other materials
pub mod d03_01_color_visualization;

The Demo Module (`src/demos/d03_01_color_visualization.rs`)

The demo module sets up our scene, spawning the three shapes and applying the custom material. It includes four systems: one to update the time uniform every frame, one to handle keyboard input for changing the display_mode and pausing, one to rotate the shapes, and one to update the UI text.

use crate::materials::d03_01_color_visualization::{
    ColorVisualizationMaterial, ColorVisualizationUniforms, get_mode_name,
};
use bevy::prelude::*;
use std::f32::consts::PI;

pub fn run() {
    App::new()
        .add_plugins(DefaultPlugins)
        .add_plugins(MaterialPlugin::<ColorVisualizationMaterial>::default())
        .init_resource::<RotationPaused>()
        .add_systems(Startup, setup)
        .add_systems(
            Update,
            (update_time, handle_input, rotate_objects, update_ui),
        )
        .run();
}

#[derive(Component)]
struct RotatingObject {
    speed: f32,
}

fn setup(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<ColorVisualizationMaterial>>,
) {
    let material = materials.add(ColorVisualizationMaterial {
        uniforms: ColorVisualizationUniforms::default(),
    });

    // Sphere - smooth surface, good for normals and lighting
    commands.spawn((
        Mesh3d(meshes.add(Sphere::new(1.0).mesh().uv(32, 16))),
        MeshMaterial3d(material.clone()),
        Transform::from_xyz(-3.0, 0.0, 0.0),
        RotatingObject { speed: 0.5 },
    ));

    // Cube - flat faces, good for seeing face differences
    commands.spawn((
        Mesh3d(meshes.add(Cuboid::new(1.5, 1.5, 1.5))),
        MeshMaterial3d(material.clone()),
        Transform::from_xyz(0.0, 0.0, 0.0),
        RotatingObject { speed: 0.3 },
    ));

    // Torus - complex geometry with interesting UVs
    commands.spawn((
        Mesh3d(meshes.add(Torus::new(0.6, 0.3).mesh().build())),
        MeshMaterial3d(material),
        Transform::from_xyz(3.0, 0.0, 0.0),
        RotatingObject { speed: 0.7 },
    ));

    // Lighting
    commands.spawn((
        DirectionalLight {
            illuminance: 10000.0,
            shadows_enabled: false,
            ..default()
        },
        Transform::from_rotation(Quat::from_euler(EulerRot::XYZ, -PI / 4.0, PI / 4.0, 0.0)),
    ));

    // Camera
    commands.spawn((
        Camera3d::default(),
        Transform::from_xyz(0.0, 2.0, 8.0).looking_at(Vec3::ZERO, Vec3::Y),
    ));

    // UI
    commands.spawn((
        Text::new(
            "[1-0] Select visualization mode | [Space] Pause rotation\n\
             \n\
             Mode: World Position",
        ),
        Node {
            position_type: PositionType::Absolute,
            top: Val::Px(10.0),
            left: Val::Px(10.0),
            padding: UiRect::all(Val::Px(10.0)),
            ..default()
        },
        TextFont {
            font_size: 16.0,
            ..default()
        },
        TextColor(Color::WHITE),
        BackgroundColor(Color::srgba(0.0, 0.0, 0.0, 0.7)),
    ));
}

fn update_time(time: Res<Time>, mut materials: ResMut<Assets<ColorVisualizationMaterial>>) {
    for (_, material) in materials.iter_mut() {
        material.uniforms.time = time.elapsed_secs();
    }
}

#[derive(Resource, Default)]
struct RotationPaused(bool);

fn handle_input(
    keyboard: Res<ButtonInput<KeyCode>>,
    mut materials: ResMut<Assets<ColorVisualizationMaterial>>,
    mut paused: ResMut<RotationPaused>,
) {
    // Toggle pause
    if keyboard.just_pressed(KeyCode::Space) {
        paused.0 = !paused.0;
    }

    // Mode selection: keys 1-9 select modes 0-8, and key 0 selects mode 9.
    const MODE_KEYS: [KeyCode; 10] = [
        KeyCode::Digit1,
        KeyCode::Digit2,
        KeyCode::Digit3,
        KeyCode::Digit4,
        KeyCode::Digit5,
        KeyCode::Digit6,
        KeyCode::Digit7,
        KeyCode::Digit8,
        KeyCode::Digit9,
        KeyCode::Digit0,
    ];

    // Find the pressed mode key (if any) before touching the assets, so the
    // materials are only mutated when a mode key is actually pressed.
    let Some(new_mode) = MODE_KEYS.iter().position(|key| keyboard.just_pressed(*key)) else {
        return;
    };

    for (_, material) in materials.iter_mut() {
        material.uniforms.display_mode = new_mode as u32;
    }
}

fn rotate_objects(
    time: Res<Time>,
    paused: Res<RotationPaused>,
    mut query: Query<(&mut Transform, &RotatingObject)>,
) {
    if paused.0 {
        return;
    }

    for (mut transform, rotating) in query.iter_mut() {
        transform.rotate_y(time.delta_secs() * rotating.speed);
        transform.rotate_x(time.delta_secs() * rotating.speed);
    }
}

fn update_ui(
    materials: Res<Assets<ColorVisualizationMaterial>>,
    paused: Res<RotationPaused>,
    mut text_query: Query<&mut Text>,
) {
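    // Skip rebuilding the text when nothing relevant changed. Note that
    // `update_time` mutates the material assets every frame, so in this demo
    // the materials check passes on every frame and the text is rebuilt anyway.
    if !materials.is_changed() && !paused.is_changed() {
        return;
    }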

    if let Some((_, material)) = materials.iter().next() {
        let mode_name = get_mode_name(material.uniforms.display_mode);
        let pause_status = if paused.0 { "PAUSED" } else { "Playing" };

        for mut text in text_query.iter_mut() {
            **text = format!(
                "[1-0] Select visualization mode | [Space] Pause rotation\n\
                 \n\
                 Mode: {} | Rotation: {}",
                mode_name, pause_status
            );
        }
    }
}

Don't forget to add it to src/demos/mod.rs:

// ... other demos
pub mod d03_01_color_visualization;

And register it in src/main.rs:

Demo {
    number: "3.1",
    title: "Fragment Shader Fundamentals",
    run: demos::d03_01_color_visualization::run,
},

Running the Demo

When you run the application, you will see the three shapes rotating in the center of the screen. Use the number keys to switch between the visualization modes and observe how the colors on the objects change.

Controls

Key	Action
1 - 9, 0	Select the visualization mode (0 selects the tenth mode, Simple Lighting).
Spacebar	Pause / Resume the rotation of the objects.

What You're Seeing

Mode	Description
1 - World Position	Visualizes the object's absolute position in the 3D scene using a repeating color grid. The `fract()` function wraps the world coordinates, causing the colors to tile every 1 unit in world space. As objects rotate, their surfaces move through this fixed 3D color grid.
2 - Local Position	Shows the position relative to each object's own center. The `normalize()` function turns the position into a direction, which is then mapped to color. Because only the direction from the center is kept, the color varies around the object rather than with distance from its core. The pattern stays "painted on" and unchanged as the object rotates.
3 - World Normal	Displays the direction each part of the surface is facing in world space. The direction vector is mapped to color (X-axis to Red, Y to Green, Z to Blue). Notice the smooth color transitions on the sphere, while each flat face of the cube has a single, solid color representing its orientation.
4 - UV Coordinates	Renders the model's 2D texture coordinates directly as color, mapping the U coordinate to the Red channel and V to the Green. This reveals the "unwrapped" layout of the mesh, which is fundamental for applying textures. Note the visible "seam" where the UVs meet.
5 - UV Checkerboard	Uses the UV coordinates to generate a procedural checkerboard pattern. This is a classic diagnostic tool used by 3D artists to visually inspect a model's UV map for unwanted stretching, compression, or distortion.
6 - Screen Position	The color is determined by the fragment's pixel coordinate on the screen. The colors appear to be projected onto the objects from your monitor, remaining static in screen space. As the objects rotate, they move through this fixed field of color.
7 - Depth	Visualizes the fragment's distance from the camera using repeating contour lines. To make the tiny differences in the non-linear depth buffer visible, we amplify the `frag_coord.z` value with a large multiplier and use `fract()` to create distinct bands, similar to a topographic map.
8 - Front/Back Face	This mode visualizes the difference between a mesh's outer (front) and inner (back) surfaces. To make the inner surfaces visible, the shader programmatically punches holes in the solid green front faces using the `discard` keyword. Through these cutouts, you can see the denser, red-and-black checkerboard pattern being rendered on the back faces. This entire effect is only possible because we disabled back-face culling in the Rust material file, allowing the GPU to render both sides of the triangles.
9 - Animated Gradient	A simple procedural pattern using `sin()` and `time` to demonstrate that fragment shaders can create dynamic, animated colors.
0 - Simple Lighting	A basic diffuse lighting model that uses the interpolated world normal to calculate how much light a surface receives from a fixed directional source. The brightness is calculated using the dot product between the normal and the light's direction.
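The formulas behind several of these modes are small enough to mirror on the CPU, which can make them easier to reason about or unit-test. The Rust sketch below is not the demo's shader. It re-implements WGSL's `fract`, `normalize`, and `dot` by hand. The checker cell count, depth multiplier, and light direction are illustrative assumptions, not values taken from the WGSL file.

```rust
// CPU-side mirror of several per-fragment formulas from the mode table.
// Constants (checker cells, depth multiplier, light direction) are assumptions.

type Vec3 = [f32; 3];

/// WGSL-style fract: x - floor(x).
fn fract(x: f32) -> f32 {
    x - x.floor()
}

fn dot(a: Vec3, b: Vec3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn normalize(v: Vec3) -> Vec3 {
    let len = dot(v, v).sqrt();
    [v[0] / len, v[1] / len, v[2] / len]
}

/// Mode 1: wrap world coordinates so the color grid repeats every 1 unit.
fn world_position_color(p: Vec3) -> Vec3 {
    [fract(p[0]), fract(p[1]), fract(p[2])]
}

/// Mode 3: map each component of the unit normal from [-1, 1] to [0, 1].
fn normal_to_color(n: Vec3) -> Vec3 {
    let n = normalize(n);
    [n[0] * 0.5 + 0.5, n[1] * 0.5 + 0.5, n[2] * 0.5 + 0.5]
}

/// Mode 5: `cells` checker squares along each UV axis; true for "even" cells.
fn uv_checker(u: f32, v: f32, cells: f32) -> bool {
    let sum = (u * cells).floor() + (v * cells).floor();
    (sum as i32).rem_euclid(2) == 0
}

/// Mode 7: amplify depth and keep the fractional part to get contour bands.
fn depth_band(depth: f32, multiplier: f32) -> f32 {
    fract(depth * multiplier)
}

/// Mode 0: Lambert diffuse term, clamped so surfaces facing away get 0.
fn lambert(normal: Vec3, to_light: Vec3) -> f32 {
    dot(normalize(normal), normalize(to_light)).max(0.0)
}

fn main() {
    // x = 2.25 wraps to 0.25 in the red channel.
    println!("{:?}", world_position_color([2.25, -0.5, 1.0]));
    // An upward normal maps to (0.5, 1.0, 0.5), i.e. mostly green.
    println!("{:?}", normal_to_color([0.0, 1.0, 0.0]));
    println!("{}", uv_checker(0.05, 0.05, 10.0));
    println!("{}", depth_band(0.9934, 100.0));
    // Light arriving straight along the normal gives full brightness.
    println!("{}", lambert([0.0, 1.0, 0.0], [0.0, 1.0, 0.0]));
}
```

In the real shader these run once per fragment on the GPU. The arithmetic is the same.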

Key Takeaways

You have now covered the foundational theory and practice of the fragment shader. Before moving on, ensure these core concepts are clear:

The Fragment Shader's Core Job: Its primary purpose is to calculate and return the color for a given fragment (a potential pixel). It can also reject the fragment with discard or write its own depth, but producing a color is its main task.
Massive Parallelism: Fragment shaders run for nearly every visible pixel of a mesh, potentially millions of times per frame. This makes their performance absolutely critical.
Interpolation is Automatic: Data passed from the vertex shader (like normals, UVs, and world positions) is automatically and smoothly interpolated across a triangle's surface by the GPU hardware.
Always Re-Normalize Normals: Interpolating between unit-length vertex normals generally produces vectors shorter than unit length. The very first step in a fragment shader that uses normals for lighting should be to normalize() them.
Work in Linear HDR Space: Your shader performs its calculations in a linear color space and can output High Dynamic Range (HDR) values greater than 1.0. Bevy's post-processing pipeline is responsible for tonemapping these values back to a displayable range.
Built-ins Provide Context: Hardware variables like @builtin(position) and @builtin(front_facing) give your shader crucial information about its location on the screen and the orientation of the geometry it belongs to.
Performance is Everything: Critical optimizations like Early-Z (which culls hidden fragments before they are shaded) and avoiding Overdraw (shading the same pixel multiple times) are fundamental to achieving good performance.
Engine and Shader Work Together: Shader effects can depend on render states set in the engine. Disabling back-face culling in a Bevy Material is a perfect example of how Rust code is sometimes required to enable a WGSL feature.
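To see why re-normalization matters, here is a minimal Rust sketch of the interpolation step. It assumes simple barycentric weights that sum to 1. Real GPUs also apply perspective correction, which this sketch ignores. `interpolate`, `length`, and `normalize` are helper names chosen for the sketch.

```rust
// Sketch of what happens to per-vertex normals between the vertex and
// fragment stages, using plain barycentric weights (no perspective correction).

type Vec3 = [f32; 3];

fn length(v: Vec3) -> f32 {
    (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt()
}

fn normalize(v: Vec3) -> Vec3 {
    let len = length(v);
    [v[0] / len, v[1] / len, v[2] / len]
}

/// Blend three per-vertex values with weights w (w[0] + w[1] + w[2] == 1).
fn interpolate(a: Vec3, b: Vec3, c: Vec3, w: [f32; 3]) -> Vec3 {
    std::array::from_fn(|i| a[i] * w[0] + b[i] * w[1] + c[i] * w[2])
}

fn main() {
    // Three unit normals pointing along different axes.
    let n0 = [1.0, 0.0, 0.0];
    let n1 = [0.0, 1.0, 0.0];
    let n2 = [0.0, 0.0, 1.0];

    // Halfway along the edge between n0 and n1.
    let mid_edge = interpolate(n0, n1, n2, [0.5, 0.5, 0.0]);
    // Length is sqrt(0.5), about 0.707: no longer a unit vector.
    println!("interpolated length: {}", length(mid_edge));

    // Re-normalizing restores unit length before any lighting math.
    println!("renormalized length: {}", length(normalize(mid_edge)));
}
```

A shortened normal shrinks the result of dot(normal, light_dir). Skipping normalize() would therefore make lighting come out too dark in the middle of triangles.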

What's Next?

You are now able to create custom materials that color objects based on their geometry and position. We've treated color as a simple set of numbers, but to create truly rich and believable visuals, we need to understand how to manipulate it with more nuance.

In the next article, we will take a deeper dive into the art and science of digital color. We'll properly explore the difference between linear and sRGB color spaces, master a library of color operations like blending and contrast, and learn how to convert between different color models like HSV to achieve more intuitive artistic effects.

Next up: 3.2 - Color Spaces and Operations

Quick Reference

Basic Fragment Shader Syntax:

@fragment
fn fragment(in: FragmentInput) -> @location(0) vec4<f32> {
    // Return an opaque red color
    return vec4<f32>(1.0, 0.0, 0.0, 1.0);
}

Two-Struct Pattern for Inputs:

// Output by the Vertex Shader
struct VertexOutput {
    @builtin(position) clip_position: vec4<f32>,
    @location(0) world_normal: vec3<f32>,
}

// Received by the Fragment Shader
struct FragmentInput {
    @location(0) world_normal: vec3<f32>,
}

@fragment
fn fragment(
    in: FragmentInput,
    @builtin(position) frag_coord: vec4<f32>, // Separate built-in
) -> @location(0) vec4<f32> {
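    let normal = normalize(in.world_normal); // Don't forget to normalize!
    // Example use: remap the normal from [-1, 1] to [0, 1] and show it as color
    return vec4<f32>(normal * 0.5 + 0.5, 1.0);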
}

Common Fragment Built-ins:

Built-in	Type	Description
@builtin(position)	`vec4<f32>`	Fragment's framebuffer coordinates: x and y in pixels (origin at the top-left, sampled at pixel centers), z is the depth value.
@builtin(front_facing)	`bool`	true if the triangle is facing the camera.
@builtin(frag_depth)	`f32` (out)	Allows you to manually write the depth value.
@builtin(primitive_index)	`u32`	Index of the current primitive in the draw call (requires optional feature support, such as wgpu's SHADER_PRIMITIVE_INDEX).
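Effects like the Screen Position mode need @builtin(position) turned into something resolution-independent. Here is a minimal sketch of that arithmetic in Rust. It assumes the render target's size in pixels is available to the shader; in practice it has to be provided, for example through a uniform. `screen_uv` and `target_size` are hypothetical names.

```rust
// CPU sketch of converting @builtin(position).xy (pixels, top-left origin,
// pixel centers at .5) into [0, 1] screen coordinates.

/// Divide the fragment's pixel position by the target size.
fn screen_uv(frag_xy: [f32; 2], target_size: [f32; 2]) -> [f32; 2] {
    [frag_xy[0] / target_size[0], frag_xy[1] / target_size[1]]
}

fn main() {
    let size = [1920.0, 1080.0];
    // Center of the top-left pixel: a value just above (0, 0).
    println!("{:?}", screen_uv([0.5, 0.5], size));
    // The exact middle of the target maps to (0.5, 0.5).
    println!("{:?}", screen_uv([960.0, 540.0], size));
}
```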

Common Color Operations (in Linear Space):

// Blend two colors
let blended = mix(color_a, color_b, 0.5);

// Calculate luminance (perceived brightness)
let luminance = dot(color.rgb, vec3<f32>(0.2126, 0.7152, 0.0722));
let grayscale = vec3<f32>(luminance);

// Adjust exposure (scale RGB only, leaving alpha untouched)
let brighter = vec4<f32>(color.rgb * 1.5, color.a);
let darker = vec4<f32>(color.rgb * 0.5, color.a);
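Mirroring these operations on the CPU can help when checking a shader's color math by hand. Below is a sketch in Rust, assuming colors are stored as linear RGBA `[f32; 4]` arrays. `mix` follows WGSL's definition, `luminance` uses the same Rec. 709 weights as above, and `exposure` scales only RGB so alpha is unaffected.

```rust
// CPU mirror of the quick-reference color operations (linear RGBA).

type Rgba = [f32; 4];

/// WGSL mix(a, b, t) = a * (1 - t) + b * t, per component.
fn mix(a: Rgba, b: Rgba, t: f32) -> Rgba {
    std::array::from_fn(|i| a[i] * (1.0 - t) + b[i] * t)
}

/// Rec. 709 luminance weights applied to the linear RGB channels.
fn luminance(c: Rgba) -> f32 {
    c[0] * 0.2126 + c[1] * 0.7152 + c[2] * 0.0722
}

/// Scale RGB only, so exposure changes don't alter alpha.
fn exposure(c: Rgba, factor: f32) -> Rgba {
    [c[0] * factor, c[1] * factor, c[2] * factor, c[3]]
}

fn main() {
    let red = [1.0, 0.0, 0.0, 1.0];
    let blue = [0.0, 0.0, 1.0, 1.0];
    println!("{:?}", mix(red, blue, 0.5)); // [0.5, 0.0, 0.5, 1.0]
    println!("{}", luminance([1.0, 1.0, 1.0, 1.0])); // approximately 1.0
    println!("{:?}", exposure(red, 1.5)); // [1.5, 0.0, 0.0, 1.0]
}
```

The luminance weights sum to 1, so white keeps a luminance of about 1.0. Green contributes far more to perceived brightness than blue.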

Performance Reminders:

Overdraw is the enemy: Shading the same pixel multiple times multiplies your shader's cost.
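Keep Early-Z active: On opaque objects, avoid discard and writing to frag_depth unless necessary, as they can prevent the GPU from rejecting hidden fragments before shading them. Transparent, alpha-blended surfaces cannot hide what is behind them, so every layer they cover adds overdraw.
Fragment shaders are expensive: When a value is constant for the whole draw or can be linearly interpolated across a triangle, compute it on the CPU or in the vertex shader instead of once per fragment.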
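A back-of-the-envelope estimate shows why overdraw matters. This Rust sketch assumes every pixel of a 1920x1080 target is covered and shaded an average of three times. These numbers are chosen purely for illustration, and `fragments_per_frame` is a hypothetical helper.

```rust
// Rough fragment-shader workload: covered pixels times average overdraw.

fn fragments_per_frame(width: u32, height: u32, avg_overdraw: f64) -> f64 {
    f64::from(width) * f64::from(height) * avg_overdraw
}

fn main() {
    let per_frame = fragments_per_frame(1920, 1080, 3.0);
    let per_second = per_frame * 60.0;
    // 1920 * 1080 = 2,073,600 pixels; at 3x overdraw that is 6,220,800
    // fragment shader invocations per frame, or 373,248,000 per second at 60 FPS.
    println!("{per_frame} per frame, {per_second} per second");
}
```

With effective Early-Z and roughly front-to-back drawing of opaque geometry, hidden fragments can be rejected before shading. This pulls the effective overdraw factor toward 1.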
