2.7 - Instanced Rendering

What We're Learning
Instanced rendering is a fundamental optimization technique for rendering massive numbers of similar objects. It's the key to creating vast forests, dense asteroid fields, and sprawling crowds without bringing your CPU to its knees.
By the end of this article, you will understand:
The performance problem caused by excessive draw calls and how instancing solves it.
How to use the @builtin(instance_index) WGSL attribute to create per-instance variations.
Techniques for procedural variation using world position to make instances look natural and unique.
The difference between procedural variation and using storage buffers for explicit per-instance data.
How Bevy's renderer automatically instances entities that share the same mesh and material.
The practical performance benefits and limitations of instancing, such as culling and transparency.
How to build a complete, animated field of grass with thousands of unique, swaying blades.
The Problem: Why Render a Thousand Things Slowly?
Imagine you need to render a field with 40,000 blades of grass. A straightforward approach might be to create 40,000 entities, each with its own mesh, material, and transform, and then ask the GPU to draw them one by one.
This seems logical, but it runs headfirst into one of the biggest bottlenecks in real-time graphics: draw call overhead.
The Cost of a Conversation
Think of the CPU as a manager and the GPU as a highly specialized, incredibly fast worker. Every time the CPU tells the GPU to draw something, it's not just a simple command. It's a whole conversation:
CPU Prepares: The CPU gathers all the data for one blade of grass - its mesh, material, and world position.
CPU Submits: It packages this into a "draw call" and sends it to the graphics driver.
Driver Translates: The driver (special software from NVIDIA, AMD, etc.) translates this into a language the GPU hardware understands.
GPU Executes: The GPU finally receives the command and draws the single blade of grass.
This conversation has a fixed cost. It takes time, regardless of how simple the object being drawn is. Now, repeat that entire process 40,000 times for every single frame.
// ❌ THE NAIVE APPROACH - Do Not Do This!
// Spawning 40,000 individual entities.
for i in 0..40_000 {
    commands.spawn((
        Mesh3d(grass_blade_mesh.clone()),
        MeshMaterial3d(material.clone()),
        Transform::from_xyz(
            positions[i].x,
            positions[i].y,
            positions[i].z,
        ),
    ));
}
// Result: 40,000 entities, potentially 40,000 draw calls.
// Frame Time: Unplayable.
Your manager (the CPU) spends all its time talking to the worker instead of preparing the next frame's logic, and your game grinds to a halt. You become CPU-bound, limited not by the GPU's drawing speed but by the CPU's ability to issue commands.
The Solution: Instanced Rendering
Instanced rendering flips this conversation on its head. Instead of 40,000 separate, costly conversations, the CPU has one, much smarter conversation with the GPU.
It says: "Here is the mesh for a single blade of grass. I want you to draw it 40,000 times. Also, here is a list of 40,000 different positions and rotations. Use the first transform for the first blade, the second for the second, and so on. Let me know when you're done."
The GPU, which is designed for exactly this kind of massive parallel work, can now execute the entire batch with incredible efficiency.
Without Instancing:
CPU: "Draw blade 1"      ──→ GPU
CPU: "Draw blade 2"      ──→ GPU
CPU: "Draw blade 3"      ──→ GPU
... (repeats 40,000x)
CPU: "Draw blade 40,000" ──→ GPU
Result: SLOW! (CPU-bound)

With Instancing:
CPU: "Draw the blade mesh x40,000.
      Use this list of transforms." ──→ GPU
GPU processes all instances in parallel.
Result: FAST! (GPU-bound)
This is the core concept of instancing: one draw call to render one mesh many times with per-instance variations. In the shader, we gain access to a special built-in variable that lets us know which instance we're currently drawing, allowing us to apply the correct unique properties.
The Magic Variable: @builtin(instance_index)
Now that we understand the "what" and "why" of instancing, let's explore the "how." How does a single shader, running on the GPU, know which of the 40,000 grass blades it's currently processing?
The answer is a special built-in variable provided by WGSL: @builtin(instance_index).
You can think of instance_index as a simple counter. When the GPU begins processing our single instanced draw call for 40,000 grass blades, this is what happens:
For the very first instance, the vertex shader runs with instance_index set to 0.
For the second instance, it runs with instance_index set to 1.
...and so on, all the way up to 39999 for the last instance.
This index is the crucial link that allows us to fetch unique data for each instance. It's our key to unlocking per-instance variation.
Accessing the Index in WGSL
To get access to this value, you simply add it to your vertex shader's input arguments with the @builtin decorator.
// In your vertex shader signature:
struct VertexInput {
    // Other attributes like position, normal, etc.
    @location(0) position: vec3<f32>,
    @location(1) normal: vec3<f32>,
    // Add this line to get the instance index
    @builtin(instance_index) instance_index: u32,
}

@vertex
fn vertex(in: VertexInput) -> VertexOutput {
    // Now you can use `in.instance_index` inside your shader.
    // It will be 0 for the first instance, 1 for the second, etc.
    // ... shader logic ...
}
On its own, the index is just a number. Its power comes from what we do with it. In a Bevy context, its most important job is to look up the correct Transform (model matrix) for the current instance from a list that Bevy automatically provides to the GPU. From there, we can derive all sorts of other interesting variations.
From Index to Positional Variation
While you could theoretically use the instance_index directly to create variations (e.g., if instance_index % 2 == 0 { color = green; }), this approach is brittle. If you add or remove an object, the indices of all subsequent objects shift, changing their appearance.
A far more powerful and stable technique is to use the instance_index for one primary purpose: to retrieve the instance's unique world transform. From that transform, we can extract the object's world position and use that as a stable source for generating consistent, procedural variations. An object's appearance will be tied to its location in the world, not the arbitrary order in which it was created.
Bevy's PBR shader imports make this incredibly simple. The mesh_functions::get_world_from_local() function takes the instance_index and returns the correct mat4x4<f32> model matrix for that specific instance.
The Core Pattern
The workflow inside the vertex shader looks like this:
Get the Index: Receive the instance_index as a vertex attribute.
Get the Transform: Use the index to fetch the instance's model matrix.
Extract World Position: The world position of the instance is stored in the fourth column of its model matrix.
Generate Variation: Feed this world position into simple "hash" functions to generate consistent, pseudo-random numbers.
Apply Variation: Use these numbers to modify attributes like height, color, or animation offsets.
Let's see this in action.
#import bevy_pbr::mesh_functions
// A simple function to generate a consistent "random" float (0.0 to 1.0)
// from a 2D world position. The same input position will always produce
// the same output number.
fn hash_from_position(pos: vec2<f32>) -> f32 {
    // Arbitrary numbers to create chaotic but deterministic results
    let p = pos * 0.1031;
    let n = dot(p, vec2<f32>(12.9898, 78.233));
    return fract(sin(n) * 43758.5453);
}

@vertex
fn vertex(
    @builtin(instance_index) instance_index: u32,
    @location(0) position: vec3<f32>,
    // ... other inputs
) -> VertexOutput {
    // 1. Get the transform matrix for this specific instance
    let model_matrix = mesh_functions::get_world_from_local(instance_index);
    // 2. Extract the instance's world position from the matrix's 4th column
    let instance_world_pos = vec2<f32>(model_matrix[3].x, model_matrix[3].z);
    // 3. Use the position to generate a consistent "random" value
    let random_val = hash_from_position(instance_world_pos);
    // 4. Use that value to create variations
    let height_scale = 0.8 + random_val * 0.4; // Scale from 0.8 to 1.2
    // 5. Apply the variation to the vertex
    var scaled_position = position;
    scaled_position.y *= height_scale;
    // Finally, transform the modified vertex into world space using the instance's matrix
    let world_position = model_matrix * vec4<f32>(scaled_position, 1.0);
    // ... continue with the rest of the MVP transformation ...
    var out: VertexOutput;
    // ...
    return out;
}
This pattern is the foundation of our grass field demo. It ensures that each blade of grass has a unique height, lean, and color based on where it is in the world, creating a natural, organic look without needing to store any extra data on the CPU.
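The stability of this pattern is easy to verify off-GPU. Below is a CPU-side Rust port of the hash above, an illustrative sketch only (the real work happens in the shader); note that Rust's `fract()` keeps the sign of its input, while WGSL's `fract()` always lands in [0, 1), so negative results are normalized to match:

```rust
// CPU-side port of the WGSL hash_from_position() above (illustrative only).
// The same position in always produces the same "random" value out.
fn hash_from_position(x: f32, z: f32) -> f32 {
    // Same constants as the WGSL version: p = pos * 0.1031
    let (px, pz) = (x * 0.1031, z * 0.1031);
    // dot(p, vec2(12.9898, 78.233))
    let n = px * 12.9898 + pz * 78.233;
    // fract(sin(n) * 43758.5453); Rust's fract() keeps the sign, so
    // normalize negatives into [0, 1) to match WGSL semantics.
    let v = (n.sin() * 43758.5453_f32).fract();
    if v < 0.0 { v + 1.0 } else { v }
}

fn main() {
    let a = hash_from_position(3.0, 7.0);
    let b = hash_from_position(3.0, 7.0);
    let c = hash_from_position(3.1, 7.0);
    // Deterministic: same position, same value, every frame.
    assert_eq!(a, b);
    // Always in the [0, 1) range.
    assert!((0.0..1.0).contains(&a));
    println!("a = {a}, c = {c}");
}
```

This is also why the technique survives adding or removing entities: the value depends only on world position, never on spawn order. (One caveat carried over from the shader: sine-based hashes at f32 precision are coarse, which is fine for visual variation but not for anything needing real randomness.)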
For Full Control: Per-Instance Data Buffers
The positional variation technique we just covered is elegant and highly efficient for creating natural, organic-looking scenes. However, it has its limits. The variations are generated entirely on the GPU based on a mathematical recipe. What if you need to control each instance with specific data from your game's logic on the CPU?
What if you want to set the color of each grass blade based on team ownership?
What if each instance needs to display a specific frame from a sprite sheet animation?
What if an instance's properties are determined by complex game state, not just its position?
For these scenarios, where you need explicit, artist- or game-driven control, you need a way to send a large batch of custom data from the CPU to the GPU. The solution is a storage buffer.
What is a Storage Buffer?
A storage buffer is a versatile block of GPU memory that you can fill with a large array of custom data structures. In the context of instancing, you can create an array where each element corresponds to an instance, and then access that data in your shader using instance_index.
CPU Side (Your Rust Code):

let instance_data = vec![
    Instance { color: RED, ... },
    Instance { color: BLUE, ... },
    Instance { color: RED, ... },
    // ... 39,997 more ...
];
// Upload this data to the GPU.

GPU Side (Your WGSL Shader):

┌──────────────────────────────┐
│ Storage Buffer on GPU        │
│ [ Data for instance 0 ]      │
│ [ Data for instance 1 ]      │
│ [ Data for instance 2 ]      │
│ ...                          │
│ [ Data for instance 39999 ]  │
└──────────────────────────────┘
          │ access via index
          ▼
In shader: let data = my_buffer[instance_index];
A Glimpse into the Code
While the full implementation is an advanced topic that involves interacting with Bevy's render world, the shader-side code is quite intuitive. You define a struct that matches your CPU-side data, declare a storage buffer, and access it like an array.
// WGSL Shader (Conceptual)
// Define a struct for our custom per-instance data
struct InstanceProperties {
    color: vec4<f32>,
    animation_frame: u32,
    // ... any other data you need
}

// Declare the storage buffer. It's an array of our structs.
// Note the `storage, read`, which specifies its address space and access mode.
@group(2) @binding(1)
var<storage, read> instance_properties: array<InstanceProperties>;

@vertex
fn vertex(@builtin(instance_index) idx: u32, /* ... other inputs ... */) {
    // Access the unique data for this instance
    let data = instance_properties[idx];
    // Now you can use this data to control the shader's output
    let final_color = data.color;
    // ... use data.animation_frame to calculate UV offsets, etc.
}
This method offers maximum flexibility at the cost of increased memory usage (you have to store the data for every instance) and more complex setup code in Rust.
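One practical pitfall with that setup code is memory layout: WGSL aligns `vec4<f32>` to 16 bytes and pads each array element in a storage buffer out to the struct's alignment, so CPU and GPU must agree on the padded size. A minimal Rust sketch, assuming the conceptual `InstanceProperties` struct above (in real Bevy code the `ShaderType` derive computes this layout for you):

```rust
// CPU-side mirror of the conceptual WGSL `InstanceProperties` struct.
// In a storage buffer, `vec4<f32>` is 16-byte aligned, and each array
// element is padded out to the struct's alignment, so one element
// occupies 32 bytes, not the 20 bytes of raw field data.
#[repr(C)]
#[derive(Clone, Copy)]
struct InstanceProperties {
    color: [f32; 4],      // 16 bytes
    animation_frame: u32, // 4 bytes
    _pad: [u32; 3],       // explicit padding to reach the 32-byte stride
}

fn main() {
    // A mismatch here means every instance after the first reads garbage,
    // so asserting the stride at startup is cheap insurance.
    assert_eq!(std::mem::size_of::<InstanceProperties>(), 32);
    println!("InstanceProperties stride: {} bytes", std::mem::size_of::<InstanceProperties>());
}
```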
For our grass field, where the goal is to create a natural, chaotic, and organic look, the procedural approach based on world position is not only simpler but also more memory-efficient and achieves the desired artistic result perfectly.
Instancing the Bevy Way: Automatic Batching
One of the best features of Bevy's renderer is that, in many cases, you get the performance benefits of instancing for free, without any special setup.
Bevy's renderer is designed to be efficient by default. During the "prepare" phase of rendering, it analyzes all the visible entities in your scene that you've told it to draw. It then automatically groups, or "batches," entities that can be rendered together in a single instanced draw call.
The Rules of Automatic Batching
For Bevy to automatically batch a group of entities, they must share two things:
The exact same Handle<Mesh>: They must all refer to the same mesh asset.
The exact same Handle<Material>: They must all use the same instance of a material asset.
If these two conditions are met, Bevy takes care of the rest behind the scenes:
Grouping: The renderer identifies the group of entities that share the mesh and material.
Data Collection: It efficiently gathers up the GlobalTransform (which contains the model matrix) from each entity in the group.
GPU Upload: It uploads this list of transforms to a buffer on the GPU.
Single Draw Call: It issues a single, instanced draw call, telling the GPU to render the mesh N times, where N is the number of entities in the group.
Shader Access: Inside your WGSL shader, the mesh_functions::get_world_from_local(instance_index) function we saw earlier is what Bevy uses to look up the correct transform from that GPU buffer for each instance.
This is precisely how our grass demo works. We spawn 40,000 entities, but we give every single one the same mesh handle and the same material handle.
// This code triggers Bevy's automatic instancing.
// Even though we spawn 40,000 separate entities, the renderer
// is smart enough to combine them into one draw call.
let blade_mesh_handle = meshes.add(/* ... */);
let grass_material_handle = materials.add(/* ... */);
for i in 0..40_000 {
    commands.spawn((
        // All entities share this handle
        Mesh3d(blade_mesh_handle.clone()),
        // All entities share this handle
        MeshMaterial3d(grass_material_handle.clone()),
        // Each entity gets its own unique transform
        Transform::from_xyz( /* ... unique position ... */ ),
    ));
}
Performance Reality Check
The performance difference is not subtle; it's dramatic.
| Scenario: 40,000 Grass Blades | Without Instancing (40k unique materials) | With Instancing (1 shared material) |
| --- | --- | --- |
| Draw Calls | ~40,000 | 1 |
| CPU Frame Time | 100+ ms (Completely CPU-bound) | < 1 ms (for draw prep) |
| Resulting FPS | < 10 FPS | 60+ FPS (GPU-bound) |
This automatic behavior is a huge advantage. It means you can structure your game logic around individual entities, but still get the performance of a highly optimized, low-level rendering technique.
Limitations to Keep in Mind
While powerful, this automatic system has some limitations you should be aware of:
Culling: The entire group of instances is culled as a single unit. If the bounding box of the entire group is visible, the GPU will process all instances in that group, even those that are individually off-screen. For a dense field of grass, this is usually acceptable.
Transparency: Correctly rendering transparent objects requires sorting them from back-to-front. Instanced rendering draws objects in an arbitrary order, which means it is generally not suitable for transparent objects that need to overlap correctly.
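To make the transparency point concrete, here is a CPU-side sketch of the back-to-front sort a renderer normally performs for blended objects each frame (a hypothetical helper, with depth simplified to a single axis); an instanced draw skips exactly this step, which is why overlapping transparent instances blend incorrectly:

```rust
// Alpha blending is order-dependent: far objects must be drawn before
// near ones so each layer blends over what's behind it. An instanced
// draw processes instances in buffer order and cannot do this per frame.
fn sort_back_to_front(camera_z: f32, mut object_z: Vec<f32>) -> Vec<f32> {
    // Farthest from the camera first, nearest last.
    object_z.sort_by(|a, b| {
        let da = (a - camera_z).abs();
        let db = (b - camera_z).abs();
        db.partial_cmp(&da).unwrap()
    });
    object_z
}

fn main() {
    // Camera at z = 0.0; objects at z = 2, 10, and 5.
    let sorted = sort_back_to_front(0.0, vec![2.0, 10.0, 5.0]);
    assert_eq!(sorted, vec![10.0, 5.0, 2.0]); // far-to-near draw order
    println!("{sorted:?}");
}
```

Opaque grass blades sidestep the problem entirely: the depth buffer resolves their overlap regardless of draw order.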
Creating Natural Variation
The key to a convincing field of grass is variation that appears organic, not robotic. If every blade is just "random," it can look like static noise. To achieve a natural look, we use three simple strategies in our shader:
Distribution Curves: Instead of a linear random distribution (where every height is equally likely), we can modify the random value, for example by squaring it (random * random). This biases the result towards smaller numbers, creating a field where most grass is short to medium height, with only the occasional tall blade sticking out.
Layered Noise: We use multiple, different "hash" functions for different attributes. One random value controls height, while a completely different one controls the "lean" or "bend" of the blade. This prevents obvious patterns where, for example, the tallest blades are always the most bent.
Color Tinting: Real grass isn't one solid color. We can use the world position to subtly shift the hue (slightly more yellow or blue) and brightness. This breaks up the visual repetition and adds depth to the scene.
We will apply all of these techniques in the final project.
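The first strategy, distribution curves, is easy to sanity-check on the CPU. A small Rust sketch (illustrative only, using fixed sample values rather than the shader's hash) showing how squaring biases uniform values toward zero:

```rust
// The "distribution curve" trick: squaring a uniform value in [0, 1]
// biases it toward 0, so most blades end up short and tall blades rare.
fn biased(r: f32) -> f32 {
    r * r
}

fn main() {
    // Evenly spread "random" samples, standing in for hash outputs.
    let samples = [0.1_f32, 0.3, 0.5, 0.7, 0.9];
    let mean_linear: f32 = samples.iter().sum::<f32>() / samples.len() as f32;
    let mean_biased: f32 =
        samples.iter().map(|r| biased(*r)).sum::<f32>() / samples.len() as f32;
    // Squaring never increases a value in [0, 1]...
    assert!(samples.iter().all(|r| biased(*r) <= *r));
    // ...so the biased average is lower: shorter grass overall.
    assert!(mean_biased < mean_linear);
    println!("linear mean = {mean_linear}, biased mean = {mean_biased}");
}
```

The same idea generalizes: any monotonic curve applied to the hash output reshapes the distribution without sacrificing determinism.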
Complete Example: Procedural Field of Grass
Let's put everything we've learned together to build a lush, animated grass field renderer. This system relies entirely on Bevy's automatic instancing. All variations - height, color, lean, and wind animation - are generated procedurally in the vertex shader based on each blade's world position.
Our Goal
We will create a scene with 40,000 individual blades of grass. Instead of melting our CPU with 40,000 draw calls, we will render them all in a single batch. We will add a custom material that handles the wind simulation and color variation on the GPU, and we'll implement a simple interactive camera to fly around and inspect our work.
What This Project Demonstrates
Automatic Instancing: How to structure your Bevy entities to trigger the renderer's automatic batching.
Procedural Vertex Animation: Modifying vertex positions in the shader to simulate wind without complex physics.
Position-Based Hashing: Generating stable random numbers in WGSL using world coordinates.
Instance-Dependent Logic: Using instance_index indirectly (via the model matrix) to make every object unique.
The Shader (assets/shaders/d02_07_grass_instancing.wgsl)
This shader is the heart of the project, handling both the procedural generation in the vertex stage and the lighting model in the fragment stage.
The @vertex shader is responsible for all the procedural variation and animation. It uses the instance_index to fetch each blade's transform, calculates a unique height_scale, lean, and color_tint from its world position, and applies the animated wind_sway. It then passes all the necessary data (world position, normal, UVs) to the fragment shader.
The @fragment shader performs a rich lighting calculation to give the scene life and depth. It uses a combination of techniques: standard diffuse lighting for basic color, fake ambient occlusion to darken the base of the blades, specular highlights to give the grass a healthy sheen, and soft rim lighting to make the blades pop from the background.
Note: The fragment shader uses several common lighting concepts like specular highlights and rim lighting. Don't worry about understanding every line of the lighting code just yet! Each of these techniques will be covered in great detail during the "Lighting Models" phase of the curriculum. For now, focus on how the vertex shader enables this advanced lighting by providing it with the necessary data.
#import bevy_pbr::{
mesh_functions,
mesh_bindings::mesh,
mesh_view_bindings::view,
view_transformations::position_world_to_clip,
}
struct GrassMaterial {
time: f32,
wind_strength: f32,
wind_direction: vec2<f32>,
wind_speed: f32,
}
@group(2) @binding(0)
var<uniform> material: GrassMaterial;
struct VertexInput {
@builtin(instance_index) instance_index: u32,
@location(0) position: vec3<f32>,
@location(1) normal: vec3<f32>,
@location(2) uv: vec2<f32>,
}
struct VertexOutput {
@builtin(position) clip_position: vec4<f32>,
@location(0) color_tint: vec3<f32>,
@location(1) world_normal: vec3<f32>,
@location(2) world_position: vec3<f32>,
@location(3) uv: vec2<f32>,
}
// Generates pseudo-random values based on world position.
fn hash_positional(pos: vec2<f32>) -> f32 {
let p = pos * 0.05;
let n = dot(p, vec2<f32>(12.9898, 78.233));
return fract(sin(n) * 43758.5453);
}
fn hash_positional_2(pos: vec2<f32>) -> f32 {
let p = pos * 0.07;
let n = dot(p, vec2<f32>(34.376, 63.934));
return fract(sin(n) * 39748.8473);
}
// 2D noise for organic, non-repetitive patterns. Used here for wind waves.
fn value_noise_2d(p: vec2<f32>) -> f32 {
let p_int = floor(p);
let p_frac = fract(p);
let uv = p_frac * p_frac * (3.0 - 2.0 * p_frac); // smoothstep
let n00 = hash_positional(p_int + vec2(0.0, 0.0));
let n01 = hash_positional(p_int + vec2(0.0, 1.0));
let n10 = hash_positional(p_int + vec2(1.0, 0.0));
let n11 = hash_positional(p_int + vec2(1.0, 1.0));
let n0 = mix(n00, n01, uv.y);
let n1 = mix(n10, n11, uv.y);
return mix(n0, n1, uv.x);
}
// Get per-instance variations based on world position.
fn get_height_scale(pos: vec2<f32>) -> f32 {
// Square the random value to favor shorter grass, making taller blades rarer.
let rand = hash_positional(pos);
return 0.6 + (rand * rand) * 0.8;
}
fn get_lean(pos: vec2<f32>) -> f32 {
return (hash_positional_2(pos) - 0.5) * 0.6; // -0.3 to 0.3
}
fn get_color_tint(pos: vec2<f32>) -> vec3<f32> {
let hue_rand = hash_positional(pos);
let bright_rand = hash_positional_2(pos);
// Base color for lush grass
var color = vec3(0.2, 0.45, 0.1);
// Add some yellow/brown for variation
color.r += (hue_rand - 0.5) * 0.2;
// Vary brightness
color *= 0.8 + bright_rand * 0.4;
return color;
}
// Calculates a dreamy, wave-like wind sway.
fn calculate_wind_sway(
height_influence: f32,
instance_world_pos: vec2<f32>,
) -> vec3<f32> {
// Large-scale waves using noise - made slightly faster
let wave_coord_1 = instance_world_pos * 0.2 + material.time * material.wind_speed * 0.2 * material.wind_direction;
let wave_1 = value_noise_2d(wave_coord_1) - 0.5;
// Smaller, faster ripples on top - also made faster
let wave_coord_2 = instance_world_pos * 0.6 + material.time * material.wind_speed * 0.6 * material.wind_direction.yx;
let wave_2 = value_noise_2d(wave_coord_2) - 0.5;
// Combine waves. The influence of height means the top of the blade moves most.
let total_sway = (wave_1 * 0.6 + wave_2 * 0.4) * material.wind_strength * height_influence;
// Apply sway in the specified wind direction.
var sway_offset = vec3(0.0);
sway_offset.x = material.wind_direction.x * total_sway;
sway_offset.z = material.wind_direction.y * total_sway;
return sway_offset;
}
@vertex
fn vertex(in: VertexInput) -> VertexOutput {
var out: VertexOutput;
let model = mesh_functions::get_world_from_local(in.instance_index);
let instance_world_pos = vec2<f32>(model[3].x, model[3].z);
// Get procedural variations for this blade of grass
let height_scale = get_height_scale(instance_world_pos);
let lean = get_lean(instance_world_pos);
out.color_tint = get_color_tint(instance_world_pos);
// Apply scaling and lean
var local_pos = in.position;
local_pos.y *= height_scale;
local_pos.x += lean * local_pos.y * 0.2; // Lean increases with height
// Wind animation
// The influence of wind is stronger at the top of the blade.
let height_influence = pow(in.position.y, 2.0);
let wind_sway = calculate_wind_sway(height_influence, instance_world_pos);
local_pos += wind_sway;
// Transform to world and clip space
let world_position = (model * vec4<f32>(local_pos, 1.0)).xyz;
out.clip_position = position_world_to_clip(world_position);
// Apply lean and sway to normals for correct lighting
var normal_offset = vec3(0.0, 0.0, 0.0);
normal_offset.x = lean * 0.2;
normal_offset += wind_sway * 2.0;
out.world_normal = mesh_functions::mesh_normal_local_to_world(
normalize(in.normal - normal_offset),
in.instance_index
);
out.world_position = world_position;
out.uv = in.uv;
return out;
}
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
let normal = normalize(in.world_normal);
// Get essential vectors
let light_dir = normalize(vec3(0.8, 1.0, 0.5));
// The view direction is from the fragment's world position to the camera
let view_dir = normalize(view.world_position.xyz - in.world_position);
// 1. Fake Ambient Occlusion
// Darken the base of the blade. `in.uv.y` goes from 1.0 (bottom) to 0.0 (top).
// `smoothstep` creates a nice gradient.
let ambient_occlusion = mix(0.5, 1.0, smoothstep(0.0, 0.4, 1.0 - in.uv.y));
let ambient = 0.4 * ambient_occlusion;
// Standard diffuse lighting
let diffuse = max(0.0, dot(normal, light_dir)) * 0.6;
// 2. Specular Highlights (Blinn-Phong)
// `half_vec` is halfway between the light and view directions.
let half_vec = normalize(light_dir + view_dir);
// The dot product with the normal tells us how aligned the surface is to reflect light.
// `pow(..., 64.0)` creates a small, tight highlight.
let specular_power = 64.0;
let specular = pow(max(0.0, dot(normal, half_vec)), specular_power) * 0.3;
// 3. Rim Lighting
// `rim_dot` is close to 0 when we look straight at a surface, and close to 1 at a grazing edge.
let rim_dot = 1.0 - max(0.0, dot(normal, view_dir));
// `pow(..., 2.0)` enhances the effect at the very edge.
let rim_intensity = pow(rim_dot, 2.0) * 0.5;
// Combine all lighting components
let total_lighting = ambient + diffuse + specular + rim_intensity;
let lit_color = in.color_tint * total_lighting;
return vec4<f32>(lit_color, 1.0);
}
The Rust Material (src/materials/d02_07_grass_instancing.rs)
This is our standard material setup. It defines the GrassMaterial asset type and the GrassMaterialUniforms struct that will be sent to the GPU. The time uniform is updated every frame from a Rust system, which drives the wind animation.
use bevy::prelude::*;
use bevy::render::render_resource::{AsBindGroup, ShaderRef};
mod uniforms {
#![allow(dead_code)]
use bevy::prelude::*;
use bevy::render::render_resource::ShaderType;
// The uniform struct for our grass material
#[derive(ShaderType, Debug, Clone, Copy, Default)]
pub struct GrassMaterial {
pub time: f32,
pub wind_strength: f32,
pub wind_direction: Vec2,
pub wind_speed: f32,
}
}
pub use uniforms::GrassMaterial as GrassMaterialUniforms;
// The Bevy Asset and BindGroup for our grass material
#[derive(Asset, TypePath, AsBindGroup, Debug, Clone)]
pub struct GrassMaterial {
#[uniform(0)]
pub uniforms: GrassMaterialUniforms,
}
impl Material for GrassMaterial {
fn vertex_shader() -> ShaderRef {
"shaders/d02_07_grass_instancing.wgsl".into()
}
fn fragment_shader() -> ShaderRef {
"shaders/d02_07_grass_instancing.wgsl".into()
}
}
Don't forget to add it to src/materials/mod.rs:
// ... other materials
pub mod d02_07_grass_instancing;
The Demo Module (src/demos/d02_07_grass_instancing.rs)
Dependency Note: This demo uses the rand crate to give each blade of grass a unique, random rotation, which makes the field look much more natural. Before adding the code, you must add this dependency to your project.
Open your Cargo.toml file and add the following line under [dependencies]:
[dependencies]
bevy = "0.16" # Ensure this matches your Bevy version
rand = "0.8.5"
The application logic is straightforward. The setup function:
Creates a single, simple mesh for one blade of grass.
Creates one instance of our GrassMaterial.
Spawns 40,000 entities in a grid. Crucially, every entity gets a clone() of the same mesh and material handles, which is what triggers Bevy's automatic instancing.
Each entity is given a unique position and a random Y-axis rotation.
The update_time system continuously updates the time uniform in our material, driving the wind animation on the GPU. Other systems handle input for controlling the wind and camera.
use crate::materials::d02_07_grass_instancing::{GrassMaterial, GrassMaterialUniforms};
use bevy::{
pbr::MeshMaterial3d,
prelude::*,
render::{
mesh::{Indices, PrimitiveTopology},
render_asset::RenderAssetUsages,
},
};
use std::f32::consts::PI;
const GRID_SIZE: i32 = 200;
pub fn run() {
App::new()
.add_plugins(DefaultPlugins)
.add_plugins(MaterialPlugin::<GrassMaterial>::default())
.add_systems(Startup, setup)
.add_systems(
Update,
(update_time, handle_input, update_camera, update_ui),
)
.run();
}
#[derive(Component)]
struct OrbitCamera {
radius: f32,
angle: f32,
height: f32,
target: Vec3,
}
#[derive(Resource)]
struct GrassMaterialHandle(Handle<GrassMaterial>);
fn setup(
mut commands: Commands,
mut meshes: ResMut<Assets<Mesh>>,
mut materials: ResMut<Assets<GrassMaterial>>,
mut standard_materials: ResMut<Assets<StandardMaterial>>,
) {
let blade_mesh_handle = meshes.add(create_grass_blade_mesh());
let material_handle = materials.add(GrassMaterial {
uniforms: GrassMaterialUniforms {
time: 0.0,
wind_strength: 1.5,
wind_speed: 2.0,
wind_direction: Vec2::new(1.0, 0.5).normalize(),
},
});
// Store the handle as a resource so we can access it reliably
commands.insert_resource(GrassMaterialHandle(material_handle.clone()));
let spacing = 0.2;
for x in 0..GRID_SIZE {
for z in 0..GRID_SIZE {
commands.spawn((
Mesh3d(blade_mesh_handle.clone()),
MeshMaterial3d(material_handle.clone()),
Transform::from_xyz(
(x as f32 - GRID_SIZE as f32 / 2.0) * spacing,
0.0,
(z as f32 - GRID_SIZE as f32 / 2.0) * spacing,
)
.with_rotation(Quat::from_rotation_y(rand::random::<f32>() * PI * 2.0)),
));
}
}
println!(
"Spawned {} grass blades with wind_strength: 1.5",
GRID_SIZE * GRID_SIZE
);
commands.spawn((
Mesh3d(meshes.add(Plane3d::default().mesh().size(50.0, 50.0))),
MeshMaterial3d(standard_materials.add(StandardMaterial {
base_color: Color::srgb(0.2, 0.3, 0.1),
..default()
})),
));
commands.spawn((
DirectionalLight {
illuminance: 15000.0,
shadows_enabled: false,
..default()
},
Transform::from_rotation(Quat::from_euler(EulerRot::XYZ, -PI / 3.0, PI / 4.0, 0.0)),
));
commands.spawn((
Camera3d::default(),
Transform::from_xyz(-10.0, 5.0, 10.0).looking_at(Vec3::new(0.0, 2.0, 0.0), Vec3::Y),
OrbitCamera {
radius: 15.0,
angle: -PI / 4.0,
height: 5.0,
target: Vec3::new(0.0, 2.0, 0.0),
},
));
commands.spawn((
Text::new(""),
Node {
position_type: PositionType::Absolute,
top: Val::Px(10.0),
left: Val::Px(10.0),
padding: UiRect::all(Val::Px(10.0)),
..default()
},
TextFont {
font_size: 16.0,
..default()
},
TextColor(Color::WHITE),
BackgroundColor(Color::srgba(0.0, 0.0, 0.0, 0.7)),
));
}
fn create_grass_blade_mesh() -> Mesh {
let mut mesh = Mesh::new(
PrimitiveTopology::TriangleList,
RenderAssetUsages::default(),
);
let width = 0.1;
let height = 1.0;
let positions: Vec<[f32; 3]> = vec![
[-width / 2.0, 0.0, 0.0],
[width / 2.0, 0.0, 0.0],
[width / 4.0, height, -width / 4.0],
[-width / 4.0, height, width / 4.0],
];
let normals: Vec<[f32; 3]> = vec![
[0.0, 0.0, 1.0],
[0.0, 0.0, 1.0],
[0.0, 0.0, 1.0],
[0.0, 0.0, 1.0],
];
let uvs: Vec<[f32; 2]> = vec![[0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]];
let indices: Vec<u32> = vec![0, 1, 2, 0, 2, 3];
mesh.insert_attribute(Mesh::ATTRIBUTE_POSITION, positions);
mesh.insert_attribute(Mesh::ATTRIBUTE_NORMAL, normals);
mesh.insert_attribute(Mesh::ATTRIBUTE_UV_0, uvs);
mesh.insert_indices(Indices::U32(indices));
mesh
}
fn update_time(
time: Res<Time>,
material_handle: Res<GrassMaterialHandle>,
mut materials: ResMut<Assets<GrassMaterial>>,
) {
if let Some(material) = materials.get_mut(&material_handle.0) {
material.uniforms.time = time.elapsed_secs();
// get_mut automatically marks the asset as changed in Bevy's asset system
}
}
fn handle_input(
keyboard: Res<ButtonInput<KeyCode>>,
time: Res<Time>,
material_handle: Res<GrassMaterialHandle>,
mut materials: ResMut<Assets<GrassMaterial>>,
) {
let delta = time.delta_secs();
if let Some(material) = materials.get_mut(&material_handle.0) {
if keyboard.pressed(KeyCode::KeyW) {
material.uniforms.wind_strength =
(material.uniforms.wind_strength + delta * 0.5).min(3.0);
}
if keyboard.pressed(KeyCode::KeyS) {
material.uniforms.wind_strength =
(material.uniforms.wind_strength - delta * 0.5).max(0.0);
}
if keyboard.pressed(KeyCode::KeyQ) {
material.uniforms.wind_speed = (material.uniforms.wind_speed - delta).max(0.1);
}
if keyboard.pressed(KeyCode::KeyE) {
material.uniforms.wind_speed = (material.uniforms.wind_speed + delta).min(5.0);
}
if keyboard.pressed(KeyCode::KeyA) || keyboard.pressed(KeyCode::KeyD) {
let rotation_amount = if keyboard.pressed(KeyCode::KeyA) {
delta
} else {
-delta
};
let angle = material
.uniforms
.wind_direction
.y
.atan2(material.uniforms.wind_direction.x)
+ rotation_amount;
material.uniforms.wind_direction = Vec2::new(angle.cos(), angle.sin());
}
}
}
fn update_camera(
keyboard: Res<ButtonInput<KeyCode>>,
time: Res<Time>,
mut query: Query<(&mut Transform, &mut OrbitCamera), With<Camera3d>>,
) {
if let Ok((mut transform, mut orbit)) = query.single_mut() {
let delta = time.delta_secs();
if keyboard.pressed(KeyCode::ArrowLeft) {
orbit.angle += delta;
}
if keyboard.pressed(KeyCode::ArrowRight) {
orbit.angle -= delta;
}
if keyboard.pressed(KeyCode::KeyZ) {
orbit.height = (orbit.height - delta * 5.0).max(1.0);
}
if keyboard.pressed(KeyCode::KeyX) {
orbit.height = (orbit.height + delta * 5.0).min(20.0);
}
transform.translation = orbit.target
+ Vec3::new(
orbit.angle.cos() * orbit.radius,
orbit.height,
orbit.angle.sin() * orbit.radius,
);
*transform = transform.looking_at(orbit.target, Vec3::Y);
}
}
fn update_ui(
material_handle: Res<GrassMaterialHandle>,
materials: Res<Assets<GrassMaterial>>,
mut text_query: Query<&mut Text>,
) {
if !materials.is_changed() {
return;
}
if let Some(material) = materials.get(&material_handle.0) {
for mut text in text_query.iter_mut() {
**text = format!(
"[W/S] Wind Strength: {:.2}\n\
[A/D] Wind Direction\n\
[Q/E] Wind Speed: {:.2}\n\
[Z/X] Camera Height\n\
[Arrows] Orbit Camera\n\n\
Blades: {}\n\
Time: {:.1}s",
material.uniforms.wind_strength,
material.uniforms.wind_speed,
GRID_SIZE * GRID_SIZE,
material.uniforms.time
);
}
}
}
Don't forget to add it to src/demos/mod.rs:
// ... other demos
pub mod d02_07_grass_instancing;
And register it in src/main.rs:
Demo {
number: "2.7",
title: "Instanced Rendering",
run: demos::d02_07_grass_instancing::run,
},
Running the Demo
When you run the demo, you'll be greeted by a field of 40,000 blades of grass swaying gently in a simulated wind. The initial performance should be excellent, demonstrating the power of a single instanced draw call. Use the controls to manipulate the wind and camera to see how the procedural animation responds in real-time.
Controls
| Key | Action |
| --- | --- |
| W / S | Increase / Decrease wind strength. |
| A / D | Change the wind direction. |
| Q / E | Decrease / Increase wind speed. |
| Arrow Keys | Orbit the camera around the field. |
| Z / X | Lower / Raise the camera height. |
What You're Seeing

High Performance: Notice your high and stable frame rate, even with 40,000 unique objects. This is the direct result of instancing.
Natural Variation: Look closely at the field. You'll see subtle differences in the height, color, and static lean of each blade. This is the hash_positional function at work.
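To build intuition for how that variation stays stable from frame to frame, here is a CPU-side sketch of a positional hash in plain Rust. It mirrors the idea behind the demo's hash_positional (the exact constants in the WGSL version may differ): the same world position always maps to the same pseudo-random value in [0, 1), so every blade keeps its unique look across frames.

```rust
// Positional hash sketch: deterministic pseudo-random value per position.
// The sin-dot constants are the classic arbitrary "shader hash" values.
fn hash_positional(x: f32, z: f32) -> f32 {
    let h = (x * 12.9898 + z * 78.233).sin() * 43758.5453;
    h - h.floor() // fract(): keep only the fractional part, in [0, 1)
}

fn main() {
    let a = hash_positional(1.5, -3.25);
    let b = hash_positional(1.5, -3.25);
    let c = hash_positional(1.6, -3.25);
    assert_eq!(a, b); // deterministic: same position, same value
    assert!((0.0..1.0).contains(&a)); // always normalized to [0, 1)
    println!("a = {a}, c = {c}");
}
```

Because the hash is pure, the shader needs no per-instance random data uploaded from the CPU; the world position itself is the seed.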
Dynamic Wind: The grass isn't just moving back and forth. The value_noise_2d function creates wave-like patterns that travel across the field, creating a more believable and organic effect.
No Repetition: Because all variations are derived from the unique world position of each blade, you won't see any tiling or repeating patterns, no matter how large the field is.
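The wave-like motion comes from value noise rather than a raw hash. Here is a CPU-side sketch of 2D value noise in plain Rust, approximating what a value_noise_2d shader function does (the actual WGSL implementation may differ): hash pseudo-random values at integer lattice corners, then smoothly interpolate between them so neighbouring positions get similar values — which is exactly what makes the wind read as traveling waves instead of static.

```rust
// Corner hash: deterministic pseudo-random value per lattice point.
fn hash2(x: f32, z: f32) -> f32 {
    let h = (x * 127.1 + z * 311.7).sin() * 43758.5453;
    h - h.floor()
}

// 2D value noise: smooth interpolation of the four surrounding corner hashes.
fn value_noise_2d(x: f32, z: f32) -> f32 {
    let (ix, iz) = (x.floor(), z.floor());
    let (fx, fz) = (x - ix, z - iz);
    // Smoothstep fade so the gradient is continuous across cell borders.
    let (ux, uz) = (fx * fx * (3.0 - 2.0 * fx), fz * fz * (3.0 - 2.0 * fz));
    let a = hash2(ix, iz);
    let b = hash2(ix + 1.0, iz);
    let c = hash2(ix, iz + 1.0);
    let d = hash2(ix + 1.0, iz + 1.0);
    // Bilinear interpolation of the four corner values.
    let top = a + (b - a) * ux;
    let bottom = c + (d - c) * ux;
    top + (bottom - top) * uz
}

fn main() {
    // At a lattice point the noise equals the corner hash exactly.
    assert!((value_noise_2d(3.0, 4.0) - hash2(3.0, 4.0)).abs() < 1e-6);
    // Interpolating corner hashes keeps the result in [0, 1).
    let n = value_noise_2d(1.37, 2.81);
    assert!((0.0..1.0).contains(&n));
    println!("noise(1.37, 2.81) = {n}");
}
```

Feeding `world_pos.xz` (offset by scrolling time) into such a function yields the coherent gusts you see moving across the field.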
Key Takeaways
Instancing Solves the Draw Call Problem: Instancing is the go-to solution when you need to render hundreds or thousands of similar objects, drastically reducing CPU overhead by combining many draws into one.
Bevy Automates Instancing: If you spawn multiple entities with the same mesh and material handles, Bevy's renderer will automatically batch them into a single, high-performance instanced draw call.
@builtin(instance_index) is the Key: This WGSL built-in variable is the unique identifier for each instance within a shader, allowing you to apply custom logic.
Positional Variation is Powerful: Using the instance_index to get the object's model matrix and then deriving variations from its world position is a robust, efficient, and flexible pattern for creating natural-looking scenes.
Procedural Animation on the GPU: Complex animations like wind can be simulated directly in the vertex shader, offloading the work from the CPU and enabling massive-scale effects.
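The GPU-side wind idea from the takeaways above can be sketched on the CPU in a few lines of plain Rust. The names `wind_strength` and `wind_speed` match the demo's uniforms, but this formula is illustrative, not the exact shader code: each vertex sways along the wind direction, scaled by how far up the blade it sits, so the root stays planted while the tip moves the most.

```rust
// Illustrative wind displacement: the vertex shader evaluates something
// like this per vertex, with no per-frame CPU work beyond updating `time`.
fn wind_offset(
    height_frac: f32, // 0.0 at the blade's root, 1.0 at its tip
    world_x: f32,     // per-blade phase derived from world position
    time: f32,
    wind_strength: f32,
    wind_speed: f32,
) -> f32 {
    // Phase offset by position keeps neighbouring blades out of sync.
    let sway = (time * wind_speed + world_x * 0.5).sin();
    // Quadratic falloff: roots barely move, tips sway fully.
    sway * wind_strength * height_frac * height_frac
}

fn main() {
    // The root never moves, regardless of wind.
    assert_eq!(wind_offset(0.0, 2.0, 1.0, 3.0, 1.0), 0.0);
    // Displacement at the tip is bounded by wind_strength.
    let tip = wind_offset(1.0, 2.0, 1.0, 3.0, 1.0);
    assert!(tip.abs() <= 3.0);
    println!("tip offset = {tip}");
}
```

Because only `time` changes per frame, animating all 40,000 blades costs the CPU a single uniform update.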
What's Next?
You've now learned how to optimize rendering for a massive number of objects by moving variation logic to the GPU. This completes our deep dive into the vertex shader. We have transformed vertices, handled normals correctly, and now rendered thousands of instances efficiently.
In the next phase, we'll shift our focus from the shape and position of our objects to what gives them color and life. We will dive headfirst into the Fragment Shader, learning how to control the color of every single pixel on our screen.
Next up: 2.8 - Vertex Shader Optimization
Quick Reference
Instancing: A rendering technique to draw one mesh many times in a single draw call, providing unique per-instance data (like transforms) for variation.
@builtin(instance_index): A WGSL vertex shader input that provides the zero-based index of the current instance being processed.
Bevy's Automatic Instancing Trigger: Spawning entities that share the exact same Handle<Mesh> and Handle<Material>.
Positional Variation Pattern:
In WGSL, get the instance's model matrix: let model = mesh_functions::get_world_from_local(in.instance_index);
Extract the world position: let pos = model[3].xyz;
Use pos as input to a hash/noise function to generate consistent random values.
Apply these values to modify vertex attributes like position, color, etc.
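The steps of the positional variation pattern can be sketched together in one vertex shader fragment. This assumes Bevy's bevy_pbr shader imports and a user-defined hash_positional function; exact module paths and struct names can vary between Bevy versions:

```wgsl
#import bevy_pbr::mesh_functions

@vertex
fn vertex(in: Vertex) -> VertexOutput {
    // 1. Get this instance's model (world-from-local) matrix.
    let model = mesh_functions::get_world_from_local(in.instance_index);
    // 2. The matrix's fourth column holds the instance's world position.
    let pos = model[3].xyz;
    // 3. Hash the world position into a stable per-instance random value.
    let rand = hash_positional(pos.xz);
    // 4. Use it to vary height, lean, color, and so on.
    var out: VertexOutput;
    // ... apply rand to the vertex attributes here ...
    return out;
}
```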
When to Use Instancing: For any scene with large quantities (>100) of similar objects: grass, trees, rocks, bullets, particles, asteroids, etc.
Limitations: Generally not suitable for objects requiring back-to-front sorting for transparency. Culling is performed on the entire group, not individual instances.