
2.2 - Camera and Projection Matrices


What We're Learning

In the previous article, we seized control of our geometry. We learned how to manipulate vertex positions in local space to create dynamic waves, twists, and other deformations. To get our transformed geometry onto the screen, we relied on a convenient Bevy helper function: position_world_to_clip. This function acted as a "black box," handling the complex math of camera positioning and lens projection for us. Now, it's time to open that box.

This article is about mastering the final, crucial steps of the rendering pipeline: the journey from a shared 3D world into the 2D plane of your monitor. The camera is your window into the scene; it dictates what the player sees, from what angle, and with what sense of perspective. By understanding and building the matrices that power it - the View Matrix and the Projection Matrix - you move from simply using a camera to creating bespoke visual experiences. This knowledge is the foundation for custom camera effects, non-standard rendering styles, and debugging tricky visual artifacts.

By the end of this article, you will be able to:

  • Explain the role of the View Matrix and manually construct one using the "look-at" method to position and orient the camera.

  • Differentiate between Perspective and Orthographic projection, understanding when and why to use each.

  • Build a Perspective Projection Matrix from scratch, controlling key lens parameters like Field of View (FOV), aspect ratio, and clipping planes.

  • Understand the "Perspective Divide" and how the w coordinate creates the illusion of depth.

  • Implement the full Model-View-Projection (MVP) pipeline in a shader to gain complete control over vertex transformation.

  • Recognize how this theory maps directly to Bevy's Camera and Projection components.

The Transformation Pipeline Revisited

To understand the camera's role, we must complete the map of the journey each vertex takes from a 3D model file to a pixel on your screen. This journey is a series of coordinate space transformations, each handled by a specific matrix.

In the last article, we focused on the first step: using a Model Matrix to move vertices from their private Local Space into the shared scene, or World Space. We then handed off the result to Bevy's position_world_to_clip function. Let's now deconstruct that function and complete the picture.

Local Space (A model's private coordinates)
    │
    └─[Model Matrix]───────> Places the model in the scene.
    │
World Space (The shared scene's coordinates)
    │
    └─[View Matrix]────────> Moves the entire world so the camera is at the origin.
    │
View Space (The world from the camera's perspective)
    │
    └─[Projection Matrix] -> Flattens the 3D view into a 2D image with perspective.
    │
Clip Space (A standardized cube, ready for the GPU)

The Bevy helper function we used, position_world_to_clip, encapsulates the last two, most crucial steps of this process. It is simply a convenient shortcut for two sequential matrix multiplications:

// This single Bevy function...
let clip_position = position_world_to_clip(world_position.xyz);

// ...is a shortcut for this:
let view_position = view_matrix * vec4<f32>(world_position.xyz, 1.0);
let clip_position = projection_matrix * view_position;

The goal of this article is to build view_matrix and projection_matrix from first principles. Once you master these, you will have complete, end-to-end control over the rendering pipeline. Let's start with the View Matrix.

Part 1: The View Matrix - Positioning the Camera

The View Matrix has a single, crucial job: to transform the entire world from its shared World Space coordinates into View Space, a new coordinate system defined from the camera's unique perspective. In essence, it repositions every vertex in the scene so that the camera becomes the new center of the universe, with everything else arranged around it.

The Inverse Relationship

Here is the most critical concept to understand: the view matrix is the mathematical inverse of the camera's own transformation matrix in the world.

Think about it intuitively:

  • If you move your camera 10 units to the right (+X), the entire world appears to shift 10 units to the left (-X) from your perspective.

  • If you rotate your camera 30 degrees clockwise, the world appears to rotate 30 degrees counter-clockwise.

The view matrix applies this opposite, or inverse, transformation to every vertex in the world. This is what creates the illusion of a moving camera.

Mathematically, the relationship is simple and elegant: view_matrix = inverse(camera_world_matrix). This camera_world_matrix is the standard model matrix that would place and orient the camera object itself in world space.
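This inverse relationship is easiest to see with a translation-only camera. The sketch below (std-only Rust, with a hypothetical helper name) shows that subtracting the camera's position from every vertex is exactly the inverse of the camera's own world translation:

```rust
// Hypothetical helper: apply the view transform of a camera that is only
// translated (no rotation). The inverse of a translation is its negation.
fn apply_view_translation(camera_pos: [f32; 3], p: [f32; 3]) -> [f32; 3] {
    [p[0] - camera_pos[0], p[1] - camera_pos[1], p[2] - camera_pos[2]]
}

fn main() {
    let camera = [10.0_f32, 0.0, 0.0]; // camera moved 10 units right (+X)

    // A vertex at the world origin appears 10 units to the LEFT (-X)...
    assert_eq!(apply_view_translation(camera, [0.0, 0.0, 0.0]), [-10.0, 0.0, 0.0]);
    // ...and a vertex at the camera's own position lands at the origin.
    assert_eq!(apply_view_translation(camera, camera), [0.0, 0.0, 0.0]);
}
```

The same logic extends to rotation: the full view matrix inverts both the camera's rotation and its translation.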

Understanding View Space

To grasp what the view matrix does, you must first understand its destination: View Space. This is a standardized coordinate system where the camera is always at the origin, looking in a fixed direction.

  • Origin (0, 0, 0): The camera's exact position.

  • -Z Axis: The direction the camera is looking (forward).

  • +Y Axis: The camera's "up" direction.

  • +X Axis: The direction to the camera's right.

This convention of "looking down the negative Z-axis" is a long-standing practice in graphics, stemming from the math of right-handed coordinate systems.

Constructing a "Look-At" View Matrix

While inverse(camera_world_matrix) is conceptually correct, calculating a full matrix inverse is computationally expensive and unnecessary. A more direct and efficient method exists. Most of the time, it's far more intuitive to define a camera's orientation by stating:

  • Where the camera is (eye).

  • What it's looking at (target).

  • Which general direction is "up" (usually the world's up vector, vec3(0.0, 1.0, 0.0)).

From these three pieces of information, we can derive the necessary forward, right, and up vectors for the camera's local coordinate system and construct our view matrix directly. This is universally known as a "look-at" function.

fn look_at(
    eye: vec3<f32>,      // The camera's world position
    target: vec3<f32>,   // The point the camera is looking at
    world_up: vec3<f32>  // The world's up direction (e.g., vec3(0.0, 1.0, 0.0))
) -> mat4x4<f32> {
    // 1. Calculate the forward vector (z-axis of the camera's space).
    // This is the direction from the target TO the eye.
    // It points OUT of the screen, aligning with our desired +Z view space axis.
    let z_axis = normalize(eye - target);

    // 2. Calculate the right vector (x-axis).
    // The cross product gives a vector perpendicular to two others.
    let x_axis = normalize(cross(world_up, z_axis));

    // 3. Recalculate the true camera up vector (y-axis).
    // This ensures all three axes are mutually perpendicular (an orthonormal basis).
    let y_axis = cross(z_axis, x_axis);

    // 4. Construct the matrix columns.
    // The inverse rotation is the TRANSPOSE of the camera's rotation, so the
    // camera's axes must land in the matrix ROWS. Because the WGSL mat4x4
    // constructor takes columns, each column below holds one component
    // (x, y, or z) of all three axes.
    let col0 = vec4(x_axis.x, y_axis.x, z_axis.x, 0.0);
    let col1 = vec4(x_axis.y, y_axis.y, z_axis.y, 0.0);
    let col2 = vec4(x_axis.z, y_axis.z, z_axis.z, 0.0);

    // The fourth column defines the inverse translation. It moves the world
    // in the opposite direction of the camera's position.
    let col3 = vec4(
        -dot(x_axis, eye),
        -dot(y_axis, eye),
        -dot(z_axis, eye),
        1.0
    );

    return mat4x4<f32>(col0, col1, col2, col3);
}

Why the negative dot products? The dot product measures how far one vector extends along another. -dot(x_axis, eye) asks "how much of the camera's position lies along its own 'right' direction?" and negates the answer. Doing this for all three axes yields exactly the opposite translation needed to move the camera back to the origin (0, 0, 0).

Why recalculate up? The initial world_up vector is a guide. If the camera is looking straight up or down, the initial x_axis calculation could fail (the cross product of two parallel vectors is zero). By recalculating the y_axis from the new z_axis and x_axis, we guarantee the camera's local axes form a perfect, stable, 90-degree coordinate system.

Testing Your View Matrix

A correctly constructed view matrix will transform world-space coordinates into view-space. You can verify this with a few key checks:

  • A vertex at the camera's world position should be transformed to the origin (0, 0, 0).

  • A vertex located directly in front of the camera should be transformed to a position with a negative Z value.

  • A vertex to the right of the camera should have a positive X value.

  • A vertex above the camera should have a positive Y value.
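The checks above can be run directly. This std-only Rust sketch applies the look-at basis to a point the same way the view matrix would (rotate by the transposed camera axes, then translate); the function names are my own, not Bevy's:

```rust
fn dot(a: [f32; 3], b: [f32; 3]) -> f32 { a[0]*b[0] + a[1]*b[1] + a[2]*b[2] }
fn cross(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]
}
fn normalize(v: [f32; 3]) -> [f32; 3] {
    let len = dot(v, v).sqrt();
    [v[0] / len, v[1] / len, v[2] / len]
}

/// View-space position of world-space point `p`, using the look-at basis:
/// translate by -eye, then project onto the camera's right/up/back axes.
fn view_transform(eye: [f32; 3], target: [f32; 3], up: [f32; 3], p: [f32; 3]) -> [f32; 3] {
    let z = normalize([eye[0]-target[0], eye[1]-target[1], eye[2]-target[2]]);
    let x = normalize(cross(up, z));
    let y = cross(z, x);
    let rel = [p[0]-eye[0], p[1]-eye[1], p[2]-eye[2]];
    [dot(x, rel), dot(y, rel), dot(z, rel)]
}

fn main() {
    let (eye, target, up) = ([3.0, 2.0, 5.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]);

    // The camera's own position maps to the view-space origin.
    assert_eq!(view_transform(eye, target, up, eye), [0.0, 0.0, 0.0]);

    // The look-at target sits directly in front: negative Z, near-zero X and Y.
    let t = view_transform(eye, target, up, target);
    assert!(t[0].abs() < 1e-5 && t[1].abs() < 1e-5 && t[2] < 0.0);
}
```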

Part 2: Projection Matrices - From 3D to 2D

We have successfully transformed our world into a camera-centric view. Now, we face the final challenge: how do we represent this 3D view on a 2D screen? This is the job of the Projection Matrix. It takes our 3D view-space coordinates and squashes them into a standardized 2D space that the GPU can map to pixels.

This process is analogous to how a real camera lens works, focusing light from a three-dimensional world onto a flat two-dimensional sensor. In computer graphics, we primarily use two types of "lenses" or projections.

(Note: The following matrix functions are the classic implementations, perfect for learning the fundamental concepts. In the next section, we'll see how Bevy uses a slightly modified "Reverse-Z" version for improved precision.)

Orthographic Projection: Parallel Lines Stay Parallel

An orthographic projection is the simplest type. It maps 3D coordinates directly to 2D coordinates without any perspective. This means an object's size on screen does not change with its distance from the camera. Parallel lines in the 3D world remain parallel on the 2D screen.

When to use orthographic projection:

  • 2D games, user interfaces (UI), and sprite-based rendering.

  • Architectural blueprints and CAD (Computer-Aided Design) applications.

  • Strategy games with top-down or isometric views.

The orthographic projection matrix transforms a rectangular box of view space (defined by left, right, top, bottom, near, and far planes) into the GPU's normalized clip space cube.

fn orthographic_projection(
    left: f32, right: f32,
    bottom: f32, top: f32,
    near: f32, far: f32
) -> mat4x4<f32> {
    let width = right - left;
    let height = top - bottom;
    let depth = far - near;

    // Column-major construction
    let col0 = vec4(2.0 / width, 0.0, 0.0, 0.0);
    let col1 = vec4(0.0, 2.0 / height, 0.0, 0.0);
    let col2 = vec4(0.0, 0.0, -2.0 / depth, 0.0);
    let col3 = vec4(
        -(right + left) / width,
        -(top + bottom) / height,
        -(far + near) / depth,
        1.0
    );

    return mat4x4<f32>(col0, col1, col2, col3);
}

This matrix effectively scales and shifts the view volume. Crucially, the fourth component of a transformed position vector (w) remains 1.0. This is the key reason there is no perspective.
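You can verify both claims numerically. This std-only Rust port of the orthographic matrix (column-major nested arrays, helper names my own) shows that x and y are independent of depth and w stays 1.0:

```rust
// Columns stored first, like the WGSL mat4x4 constructor: m[col][row].
fn ortho(l: f32, r: f32, b: f32, t: f32, n: f32, f: f32) -> [[f32; 4]; 4] {
    let (w, h, d) = (r - l, t - b, f - n);
    [
        [2.0 / w, 0.0, 0.0, 0.0],
        [0.0, 2.0 / h, 0.0, 0.0],
        [0.0, 0.0, -2.0 / d, 0.0],
        [-(r + l) / w, -(t + b) / h, -(f + n) / d, 1.0],
    ]
}

fn mul(m: [[f32; 4]; 4], v: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for row in 0..4 {
        for col in 0..4 {
            out[row] += m[col][row] * v[col];
        }
    }
    out
}

fn main() {
    let m = ortho(-10.0, 10.0, -10.0, 10.0, 0.1, 100.0);
    let near_pt = mul(m, [5.0, 5.0, -1.0, 1.0]);
    let far_pt = mul(m, [5.0, 5.0, -90.0, 1.0]);

    // Same x/y regardless of depth, and w stays 1.0: no perspective.
    assert_eq!(near_pt[0], far_pt[0]);
    assert_eq!(near_pt[1], far_pt[1]);
    assert_eq!(near_pt[3], 1.0);
}
```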

Perspective Projection: Realistic Depth

A perspective projection mimics how the human eye and real-world cameras work: objects that are farther away appear smaller. This is the standard projection for virtually all 3D games and simulations.

When to use perspective projection:

  • First-person and third-person 3D games.

  • Realistic simulations and visualizations.

  • Any application where depth perception is important.

Instead of defining a box, we define a "frustum" using more intuitive parameters like the camera's field of view.

fn perspective_projection(
    fov_y_radians: f32, // Vertical field of view
    aspect_ratio: f32,  // Viewport width / height
    near: f32,          // Near clipping plane distance
    far: f32            // Far clipping plane distance
) -> mat4x4<f32> {
    let f = 1.0 / tan(fov_y_radians / 2.0);
    let range = 1.0 / (near - far);

    // Column-major construction
    let col0 = vec4(f / aspect_ratio, 0.0, 0.0, 0.0);
    let col1 = vec4(0.0, f, 0.0, 0.0);
    let col2 = vec4(0.0, 0.0, (near + far) * range, -1.0);
    let col3 = vec4(0.0, 0.0, 2.0 * near * far * range, 0.0);

    return mat4x4<f32>(col0, col1, col2, col3);
}

Look closely at col2: its fourth component, the entry that feeds the output's w (it multiplies the input's z), is set to -1.0. This means that after multiplication, the final w value of our output position will equal the negative of its view-space z value. This is the secret ingredient for perspective.

The Magic of the Perspective Divide

The real "magic" of perspective projection happens after our vertex shader is finished. The GPU's fixed-function hardware takes the vec4 position we output and automatically performs an operation called the perspective divide.

// Our vertex shader outputs a clip-space position:
// out.position = clip_pos; (a vec4<f32>)

// The GPU automatically does this for every vertex:
let final_ndc_pos = clip_pos.xyz / clip_pos.w;

It divides the x, y, and z components by the w component. Now, let's connect this to our projection matrix. We saw that the matrix was engineered to produce this result:

clip_position.w = -view_position.z

The z value in view space represents the distance from the camera into the scene. By setting w to this distance, the perspective divide scales our vertex positions accordingly.

A point close to the camera:
  view_position.z = -2.0
  clip_position.w = 2.0
  final_x = clip_x / 2.0  (larger on screen)

A point far from the camera:
  view_position.z = -50.0
  clip_position.w = 50.0
  final_x = clip_x / 50.0 (smaller on screen)

This simple division is how perspective is achieved in modern graphics.
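The numbers above can be reproduced end to end. This std-only Rust sketch builds the classic perspective matrix from the previous section and performs the divide by hand (helper names my own):

```rust
// Columns stored first, like the WGSL mat4x4 constructor: m[col][row].
fn perspective(fov_y: f32, aspect: f32, near: f32, far: f32) -> [[f32; 4]; 4] {
    let f = 1.0 / (fov_y / 2.0).tan();
    let range = 1.0 / (near - far);
    [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (near + far) * range, -1.0],
        [0.0, 0.0, 2.0 * near * far * range, 0.0],
    ]
}

/// Multiply, then apply the perspective divide the GPU performs automatically.
fn project(m: [[f32; 4]; 4], v: [f32; 4]) -> [f32; 3] {
    let mut clip = [0.0f32; 4];
    for row in 0..4 {
        for col in 0..4 {
            clip[row] += m[col][row] * v[col];
        }
    }
    [clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]]
}

fn main() {
    // 90-degree FOV, square viewport: f = 1, so NDC x is simply x / -z.
    let m = perspective(std::f32::consts::FRAC_PI_2, 1.0, 0.1, 100.0);

    // Two points at the same view-space x = 1.0, different depths.
    let near_pt = project(m, [1.0, 0.0, -2.0, 1.0]);
    let far_pt = project(m, [1.0, 0.0, -50.0, 1.0]);

    // The nearer point lands farther from the screen center: it looks bigger.
    assert!(near_pt[0] > far_pt[0]);
    assert!((near_pt[0] - 0.5).abs() < 1e-6);  // 1.0 / 2.0
    assert!((far_pt[0] - 0.02).abs() < 1e-6);  // 1.0 / 50.0
}
```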

Understanding Field of View (FOV)

Field of View, or FOV, is the extent of the observable world seen at any given moment. In our projection matrix, it's the vertical angle of the camera's frustum. It's analogous to the zoom lens on a camera.

  • Low FOV (30-50°): Creates a "telephoto" or zoomed-in effect.

  • Medium FOV (60-90°): A standard view that feels natural for most games.

  • High FOV (90-120°): A wide-angle view. Can cause "fisheye" distortion at the edges of the screen.

Understanding Aspect Ratio

Aspect ratio is the ratio of the viewport's width to its height (width / height). A 1920x1080 screen has an aspect ratio of 16/9 or ~1.777. Our projection matrix needs this value to prevent the image from being stretched. The fov_y_radians parameter defines the vertical opening of our view. We use the aspect ratio to calculate the correct horizontal opening to match the screen's shape. The matrix corrects for this by scaling the X-coordinate: f / aspect_ratio. This makes the view wider than it is tall, matching the viewport's dimensions.

Understanding Near and Far Planes

The near and far parameters define the boundaries of the camera's view frustum. They create two clipping planes.

  • Anything closer to the camera than the near plane is discarded ("clipped").

  • Anything farther from the camera than the far plane is also discarded.

These planes are not just for culling geometry; they are essential for the depth buffer. The depth buffer is a texture that stores a depth value (from 0.0 to 1.0) for every pixel. Before drawing a new pixel, the GPU checks the depth buffer. If the new pixel is farther away than the one already there, it's discarded. This is how the GPU correctly sorts overlapping objects.

The projection matrix maps the view-space depth range [-near, -far] into a normalized clip-space depth range (the classic matrix above targets OpenGL's [-1, 1]; wgpu, and therefore Bevy, uses [0, 1]). Either way, the mapping is non-linear: it gives far more precision to objects closer to the camera. This leads to a critical trade-off:

Depth buffer precision is not distributed evenly.

Imagine the depth buffer as a ruler. In a classic perspective projection, the tick marks on the ruler are densely packed near the camera and spread out farther away.

  • Setting the near plane too close (e.g., 0.01): You are cramming an enormous amount of the depth buffer's precision into the tiny space right in front of the camera. This leaves very little precision for the rest of the scene, causing distant objects with similar depths to flicker back and forth. This artifact is called Z-fighting.

  • Setting the far plane too far: You are stretching a finite amount of precision over a vast distance, which also reduces accuracy and can cause Z-fighting.

Best Practice: Keep the far / near ratio as small as possible for your scene's needs (ideally under 1000). Push the near plane out as far as you can without clipping into objects the player should see.
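The unevenness is easy to quantify. This std-only Rust sketch evaluates the classic matrix's depth mapping (derived from the col2/col3 terms shown earlier; the function name is my own) and shows how much of the range the first metre consumes:

```rust
/// NDC depth in [-1, 1] for a point at distance `d` in front of the camera,
/// under the classic OpenGL-style perspective matrix:
/// ndc_z = ((near + far) - 2 * near * far / d) / (far - near)
fn ndc_depth(d: f32, near: f32, far: f32) -> f32 {
    ((near + far) - 2.0 * near * far / d) / (far - near)
}

fn main() {
    let (near, far) = (0.1_f32, 100.0);

    // Sanity checks: the near plane maps to -1, the far plane to +1.
    assert!((ndc_depth(near, near, far) + 1.0).abs() < 1e-4);
    assert!((ndc_depth(far, near, far) - 1.0).abs() < 1e-4);

    // Only 1% of the way into the scene by distance (1 unit of 100)...
    let z_at_one_unit = ndc_depth(1.0, near, far);
    // ...yet already past 0.8 on the [-1, 1] scale: roughly 90% of the
    // depth range is spent on the space closest to the camera.
    assert!(z_at_one_unit > 0.8);
}
```

Shrinking the far/near ratio (say, near = 1.0 instead of 0.1) spreads the same range far more evenly, which is exactly the best practice above.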

Part 3: Reverse-Z Projection

In the last section, we discussed how the classic projection matrix maps the view-space depth range [-near, -far] to the depth buffer's [0, 1] range. This traditional method has a significant drawback related to how computers store numbers.

Traditional Z-Mapping

  • The near plane is mapped to a depth of 0.0.

  • The far plane is mapped to a depth of 1.0.

  • Standard floating-point numbers (like f32) have the most precision near zero.

  • Result: Almost all of your depth precision is clustered right in front of the camera, leaving very little for the distant parts of your scene. This is the primary cause of the Z-fighting artifact.

To solve this, modern rendering pipelines, including Bevy's, use a clever technique called "Reverse-Z". The idea is simple but highly effective: we just flip the mapping.

Reverse-Z Mapping

  • The near plane is mapped to a depth of 1.0.

  • The far plane is mapped to a depth of 0.0.

  • Result: The high precision of floating-point numbers (near zero) is now distributed across the far end of the view frustum. This results in a much more even and usable distribution of depth precision across the entire visible range, significantly reducing Z-fighting artifacts.

The implementation is a small tweak to the perspective projection matrix. Bevy also commonly uses an "infinite" far plane, meaning geometry is never clipped for being too far away, which simplifies the matrix further.

// A common form for an infinite far plane with Reverse-Z, which Bevy uses.
fn reverse_z_perspective(
    fov_y_radians: f32,
    aspect_ratio: f32,
    near: f32
) -> mat4x4<f32> {
    let f = 1.0 / tan(fov_y_radians / 2.0);

    // Column-major construction
    let col0 = vec4(f / aspect_ratio, 0.0, 0.0, 0.0);
    let col1 = vec4(0.0, f, 0.0, 0.0);
    // The Z-mapping components are different from the classic matrix
    let col2 = vec4(0.0, 0.0, 0.0, -1.0);
    let col3 = vec4(0.0, 0.0, near, 0.0);

    return mat4x4<f32>(col0, col1, col2, col3);
}

You don't need to implement this yourself when using Bevy's built-in camera, but it's crucial to know that this is happening under the hood. It explains why Bevy's rendering is robust against depth artifacts by default and is a key piece of context for anyone diving deep into the engine's rendering code. From this point forward, when we discuss "the projection matrix," you can assume it's this more robust, modern version.
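To see why this works, note what the infinite reverse-Z matrix above produces: clip.z = near and clip.w = distance, so after the perspective divide the stored depth is simply near / distance. A std-only sketch (function name my own):

```rust
/// Depth written to the buffer under the infinite reverse-Z matrix above:
/// clip.z = near, clip.w = d, so depth = near / d.
fn reverse_z_depth(d: f32, near: f32) -> f32 {
    near / d
}

fn main() {
    let near = 0.1_f32;

    // The near plane maps to 1.0; depth falls toward 0.0 with distance.
    assert_eq!(reverse_z_depth(near, near), 1.0);
    assert!(reverse_z_depth(1_000_000.0, near) < 1e-6);

    // Distant geometry now lands where f32 spacing is densest (near zero),
    // so two surfaces 1 unit apart at ~1000 units still get distinct depths.
    let a = reverse_z_depth(1000.0, near);
    let b = reverse_z_depth(1001.0, near);
    assert!(a != b);
}
```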

Part 4: Custom Projection Effects

Understanding how projection matrices are constructed gives you the power to break the rules. By manipulating the transformation pipeline, you can create non-standard camera effects that would be impossible with a standard projection matrix alone.

Fish-Eye Effect

A fisheye lens captures an extremely wide field of view, causing straight lines to appear curved. This "barrel distortion" is the lens's signature characteristic. A standard perspective projection matrix is fundamentally incapable of creating this effect because it is a linear transformation, meaning it is designed to preserve straight lines.

To create a true fisheye effect, we must introduce a non-linear step into our vertex shader.

  • Perspective Projection's Logic: The distance of a point from the center of the screen is proportional to tan(theta), where theta is the angle of that point from the camera's forward axis. This preserves lines.

  • Fisheye Projection's Logic: The distance from the center is proportional directly to the angle theta itself. This bends lines.

Implementation in the Vertex Shader:

The most accurate way to implement this is to interrupt the standard transformation pipeline. We transform our vertex into view space, apply our custom non-linear distortion, and then apply the final projection matrix.

// --- In the vertex shader ---

// 1. Transform vertex from world space to view space as usual.
let view_pos = view_matrix * world_position;

// 2. Apply the non-linear fisheye distortion.
// Calculate the distance from the center of the view and the angle from the forward axis.
let xy_distance = length(view_pos.xy);
// -view_pos.z is the distance "into" the screen
let theta = atan2(xy_distance, -view_pos.z); 

// 3. Determine the new, distorted distance from the center.
// Instead of tan(theta), we just use theta.
// The focal_length is derived from the camera's FOV.
let focal_length = 1.0 / tan(fov_y_radians * 0.5);
let fisheye_radius = theta * focal_length;

// 4. Calculate a scaling factor and apply it.
var distorted_view_pos = view_pos;
// Avoid division by zero at the very center of the view.
if (xy_distance > 0.001) {
    let scale = fisheye_radius / xy_distance;
    distorted_view_pos.xy *= scale;
}

// 5. Now that the view-space position is distorted, apply the standard projection.
let clip_pos = projection_matrix * distorted_view_pos;

Dolly Zoom (The "Vertigo" Effect)

Popularized by Alfred Hitchcock's film Vertigo, the dolly zoom is a dramatic cinematic technique. It's achieved by moving the camera towards or away from a subject while simultaneously adjusting the lens's zoom (or FOV) to keep the subject the same size in the frame. The result is that the subject appears stationary while the background seems to either compress or expand dramatically.

This effect isn't a custom shader trick, but rather a manipulation of the camera and projection data you send to the shader each frame from your Rust code.

Implementation (in your Rust code):

  1. Move the camera: In your update system, change the camera's Transform to move it closer to or farther from your target.

  2. Adjust the FOV: In the same system, change the fov property of the PerspectiveProjection component.

  • As the camera moves closer, you must increase the FOV (zoom out) to keep the subject the same size.

  • As the camera moves away, you must decrease the FOV (zoom in).

The shader simply receives a different projection_matrix each frame and renders the scene accordingly, creating the iconic effect. Here is what a simple Bevy system to control a dolly zoom might look like.

// A resource to control the dolly zoom effect
#[derive(Resource)]
struct DollyZoom {
    target_entity: Entity,
    // The value that must remain constant: distance_to_target * tan(fov / 2)
    initial_product: f32,
    // A timer to drive the animation, 0.0 to 1.0
    progress: f32,
    start_distance: f32,
    end_distance: f32,
}

fn dolly_zoom_system(
    time: Res<Time>,
    mut dolly: ResMut<DollyZoom>,
    mut camera_query: Query<(&mut Transform, &mut Projection), With<Camera3d>>,
    target_query: Query<&GlobalTransform>,
) {
    let Ok((mut camera_transform, mut projection)) = camera_query.get_single_mut() else { return };
    let Ok(target_transform) = target_query.get(dolly.target_entity) else { return };

    // Animate the effect over a few seconds
    dolly.progress = (dolly.progress + time.delta_secs() * 0.2).fract();
    let current_distance = dolly.start_distance.lerp(dolly.end_distance, dolly.progress);

    // 1. Move the camera
    let direction_to_target = (target_transform.translation() - camera_transform.translation).normalize();
    camera_transform.translation = target_transform.translation() - direction_to_target * current_distance;

    // 2. Adjust the FOV to compensate
    if let Projection::Perspective(ref mut pers) = *projection {
        // Solve for the new FOV using the core relationship
        let new_half_fov_tan = dolly.initial_product / current_distance;
        let new_fov_rad = 2.0 * new_half_fov_tan.atan();
        pers.fov = new_fov_rad;
    }
}
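The invariant the system relies on can be checked numerically. An object of height h at distance d spans roughly h / (2 * d * tan(fov / 2)) of the vertical frame, so holding d * tan(fov / 2) constant holds the subject's on-screen size constant. A std-only sketch (function name my own):

```rust
/// Approximate fraction of the vertical frame an object of `object_height`
/// fills at `distance` under a vertical field of view `fov` (radians).
fn screen_fraction(object_height: f32, distance: f32, fov: f32) -> f32 {
    object_height / (2.0 * distance * (fov / 2.0).tan())
}

fn main() {
    let h = 2.0_f32;
    let start_dist = 10.0_f32;
    let start_fov = std::f32::consts::FRAC_PI_3; // 60 degrees
    let product = start_dist * (start_fov / 2.0).tan();

    // Dolly in to half the distance; solve for the compensating FOV.
    let new_dist = 5.0_f32;
    let new_fov = 2.0 * (product / new_dist).atan();
    assert!(new_fov > start_fov); // closer camera => wider FOV

    // The subject's on-screen size is unchanged...
    let before = screen_fraction(h, start_dist, start_fov);
    let after = screen_fraction(h, new_dist, new_fov);
    assert!((before - after).abs() < 1e-6);
    // ...while everything at other distances shifts, producing the effect.
}
```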

Part 5: Accessing Bevy's View and Projection

While building matrices from scratch in WGSL is a fantastic learning exercise, it's not something you'll do every day. Bevy's renderer, of course, already calculates the view and projection matrices for every active camera. Our job is to get that data from Bevy into our shader.

There are two primary ways to do this, each with its own use case: accessing Bevy's global view uniform directly, and passing the data through our own custom material.

The Global View Uniform (With a Big Caveat)

Bevy prepares a large uniform buffer containing all the data for the current view and binds it for many of its internal rendering passes. This View uniform is available at a well-known location: bind group 0, binding 0.

// Bevy's built-in View uniform struct (simplified)
struct View {
    view_proj: mat4x4<f32>,       // The final combined view * projection matrix
    inverse_view_proj: mat4x4<f32>,
    view: mat4x4<f32>,             // The view matrix only
    inverse_view: mat4x4<f32>,     // The camera's world matrix
    projection: mat4x4<f32>,       // The projection matrix only
    inverse_projection: mat4x4<f32>,
    world_position: vec3<f32>,     // The camera's world position
    // ... and many more fields for time, viewport size, etc.
};

@group(0) @binding(0)
var<uniform> view: View;

You could, in theory, add this to your shader and use Bevy's data directly:

// This calculates the final position just like Bevy's internal shaders.
let clip_pos = view.view_proj * world_position;

WARNING: Do NOT do this in a standard Material!

Bind group 0 is reserved for view-level data managed by Bevy's PBR pipeline; in recent Bevy versions, bind group 1 carries mesh-level data and bind group 2 carries material-specific data. If you try to define @group(0) @binding(0) in your custom material's shader, it will cause a binding conflict with the data Bevy is already providing, leading to crashes or unpredictable behavior.

When is it safe to use the global View uniform?

  • In compute shaders.

  • In full-screen post-processing effects.

  • In custom render pipelines where you are not using Bevy's Material trait.

For our purposes in this curriculum, we will avoid this method and use the safer, more flexible approach.

The Safe Approach: Material Uniforms

The correct and most robust way to get camera data into a custom material is to pass it in yourself. We treat the camera's matrices just like any other data we want to control, like a color or a time value.

This involves three steps:

1. Define a uniform struct in your material:

// In your material's Rust code
// ...

#[derive(ShaderType, Clone)] // ShaderType is crucial
pub struct CameraData {
    pub view_proj: Mat4,
    pub position: Vec3,
}

#[derive(Asset, TypePath, AsBindGroup, Clone)]
pub struct MyCustomMaterial {
    // Bound to the material bind group: @group(2) @binding(0) in recent Bevy versions
    #[uniform(0)]
    pub camera: CameraData,
    #[uniform(1)]
    pub color: Color,
}

2. Create a Bevy system to update this data every frame:

This system queries for the active camera, gets its transform and projection data, and iterates through all assets of your material type, updating them with the latest values.

// In your app's systems
fn update_material_camera_data(
    camera_query: Query<(&GlobalTransform, &Projection), With<Camera3d>>,
    mut materials: ResMut<Assets<MyCustomMaterial>>,
) {
    let Ok((camera_transform, projection)) = camera_query.get_single() else { return };

    let view_matrix = camera_transform.compute_matrix().inverse();
    let view_proj = projection.get_projection_matrix() * view_matrix;

    for (_, material) in materials.iter_mut() {
        material.camera.view_proj = view_proj;
        material.camera.position = camera_transform.translation();
    }
}

3. Use the data in your shader:

Now your shader can access this data from its own bind group (@group(2) @binding(0) for a Material in recent Bevy versions), completely avoiding any conflicts with Bevy's internal bindings.

// In your shader.wgsl
struct CameraData {
    view_proj: mat4x4<f32>,
    position: vec3<f32>,
};

struct MyMaterial {
    camera: CameraData,
    color: vec4<f32>,
};

// Assuming this material is used with Mesh3d/MeshMaterial3d
@group(2) @binding(0)
var<uniform> material: MyMaterial;

// ... in your vertex function
let clip_pos = material.camera.view_proj * world_position;

This pattern is more work to set up initially, but it is the correct, conflict-free way to work with the Material trait. It also gives you the flexibility to send different camera data to different materials if you ever needed to.


---

## Complete Example: Interactive Camera Explorer

Now, let's put all this theory into practice. We will build an interactive demo that allows you to switch between perspective and orthographic projection on the fly. You will be able to orbit a scene of simple cubes, adjust the field of view, and see exactly how these changes affect the final rendering.

This project will solidify your understanding of how camera matrices are not just theoretical constructs but are the primary tools for defining the look and feel of a 3D scene.

### Our Goal

We will create a custom material and shader that visualizes our camera logic. A Rust system will build the View and Projection matrices from scratch based on interactive controls. We will use Bevy's standard transformation for the geometry to ensure stability, but we will pass our custom camera parameters to the fragment shader to drive distance-based fog and color coding, helping us visualize the difference between projection modes.

### What This Project Demonstrates

* **Manual Matrix Construction:** Building `look_at` (view) and `perspective`/`orthographic` (projection) matrices in Rust.

* **Uniform Data Flow:** Passing complex camera data from a Rust system into a custom `Material`'s uniform buffer.

* **Complete Vertex Transformation:** Implementing the full `projection * view * model * position` pipeline in a WGSL vertex shader for all modes.

* **Shader-Based Branching:** Using a `u32` uniform to switch between different rendering modes (perspective, ortho, fisheye) inside the shader.

* **Interactive Feedback:** Connecting keyboard inputs to camera parameters (FOV, distance, projection type) to provide a tangible feel for each concept.


### The Shader (`assets/shaders/d02_02_multi_projection.wgsl`)

The vertex shader uses Bevy's built-in `position_world_to_clip` for the geometry, ensuring our mesh is placed correctly on screen. However, we pass our custom camera data to the fragment shader to visualize the different modes: Perspective mode gets distance-based fog (which relies on camera position), while Orthographic mode gets a distinct flat coloring style.

```wgsl
#import bevy_pbr::mesh_functions
#import bevy_pbr::view_transformations
#import bevy_pbr::forward_io::VertexOutput

struct CameraUniforms {
    view_matrix: mat4x4<f32>,
    projection_matrix: mat4x4<f32>,
    camera_position: vec3<f32>,
    projection_type: u32,  // 0=perspective, 1=orthographic
    fov: f32,  // Field of view in radians
    ortho_size: f32,
    time: f32,
}

@group(2) @binding(0)
var<uniform> camera: CameraUniforms;

@vertex
fn vertex(
    @builtin(instance_index) instance_index: u32,
    @location(0) position: vec3<f32>,
    @location(1) normal: vec3<f32>,
) -> VertexOutput {
    var out: VertexOutput;

    let world_from_local = mesh_functions::get_world_from_local(instance_index);
    let world_position = mesh_functions::mesh_position_local_to_world(
        world_from_local,
        vec4<f32>(position, 1.0)
    );

    out.position = bevy_pbr::view_transformations::position_world_to_clip(world_position.xyz);

    // Pass data to fragment shader
    out.world_position = world_position;
    out.world_normal = mesh_functions::mesh_normal_local_to_world(normal, instance_index);

    return out;
}

@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
    let normal = normalize(in.world_normal);

    // Calculate distance from camera
    let to_camera = in.world_position.xyz - camera.camera_position;
    let distance = length(to_camera);

    // Color based on projection type
    var base_color = vec3<f32>(0.0);

    if camera.projection_type == 0u {
        // Perspective - blue
        base_color = vec3<f32>(0.3, 0.5, 1.0);
    } else {
        // Orthographic - green
        base_color = vec3<f32>(0.3, 1.0, 0.5);
    }

    // Simple lighting
    let light_dir = normalize(vec3<f32>(
        cos(camera.time),
        0.5,
        sin(camera.time)
    ));
    let diffuse = max(0.3, dot(normal, light_dir));

    // Distance-based fog for perspective
    if camera.projection_type == 0u {
        let fog_start = 10.0;
        let fog_end = 40.0;
        let fog_factor = clamp((distance - fog_start) / (fog_end - fog_start), 0.0, 1.0);
        base_color = mix(base_color, vec3<f32>(0.5, 0.5, 0.6), fog_factor * 0.5);
    }

    return vec4<f32>(base_color * diffuse, 1.0);
}
```

### The Rust Material (`src/materials/d02_02_multi_projection.rs`)

This file defines the data structure that will be passed from the CPU to the GPU. It contains our manually constructed matrices, the camera's position for lighting calculations, and several parameters that control the projection modes. The `ShaderType` derive handles WGSL's alignment and padding rules for us, but the field order must match the `CameraUniforms` struct in the shader exactly.
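
The memory layout is easy to check by hand. This sketch computes each field's byte offset, assuming WGSL's uniform alignment rules (`mat4x4<f32>`: size 64, align 16; `vec3<f32>`: size 12, align 16; scalars: size and align 4). Note how `projection_type` slots into the 4 bytes left over after the `vec3<f32>`:

```rust
// Round `offset` up to the next multiple of `align`.
fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) / align * align
}

fn main() {
    // (name, size, align) for each field of the Rust CameraUniforms struct.
    let fields = [
        ("view_matrix", 64, 16),
        ("projection_matrix", 64, 16),
        ("camera_position", 12, 16),
        ("projection_type", 4, 4),
        ("fov", 4, 4),
        ("fisheye_strength", 4, 4),
        ("ortho_size", 4, 4),
        ("time", 4, 4),
    ];

    let mut offset = 0;
    for (name, size, align) in fields {
        offset = align_up(offset, align);
        println!("{name:<18} offset {offset}");
        offset += size;
    }
    // view_matrix 0, projection_matrix 64, camera_position 128,
    // projection_type 140, fov 144, fisheye_strength 148, ortho_size 152, time 156
}
```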

```rust
use bevy::prelude::*;
use bevy::render::render_resource::{AsBindGroup, ShaderRef};

mod uniforms {
    #![allow(dead_code)]

    use bevy::prelude::*;
    use bevy::render::render_resource::ShaderType;

    #[derive(ShaderType, Debug, Clone)]
    pub struct CameraUniforms {
        pub view_matrix: Mat4,
        pub projection_matrix: Mat4,
        pub camera_position: Vec3,
        pub projection_type: u32,
        pub fov: f32,
        pub fisheye_strength: f32,
        pub ortho_size: f32,
        pub time: f32,
    }

    impl Default for CameraUniforms {
        fn default() -> Self {
            Self {
                view_matrix: Mat4::IDENTITY,
                projection_matrix: Mat4::IDENTITY,
                camera_position: Vec3::ZERO,
                projection_type: 0,
                fov: 60.0,
                fisheye_strength: 0.5,
                ortho_size: 10.0,
                time: 0.0,
            }
        }
    }
}

pub use uniforms::CameraUniforms;

#[derive(Asset, TypePath, AsBindGroup, Debug, Clone)]
pub struct MultiProjectionMaterial {
    #[uniform(0)]
    pub camera: CameraUniforms,
}

impl Material for MultiProjectionMaterial {
    fn vertex_shader() -> ShaderRef {
        "shaders/d02_02_multi_projection.wgsl".into()
    }

    fn fragment_shader() -> ShaderRef {
        "shaders/d02_02_multi_projection.wgsl".into()
    }
}
```

Don't forget to add it to `src/materials/mod.rs`:

```rust
// ... other materials
pub mod d02_02_multi_projection;
```

### The Demo Module (`src/demos/d02_02_multi_projection.rs`)

The Rust code sets up our scene and contains the logic for interactivity. The key system is `update_materials`: it takes the user-controlled parameters, builds the final view and projection matrices from scratch using our own helper functions, and then updates the uniform data on every instance of our custom material.

```rust
use crate::materials::d02_02_multi_projection::{CameraUniforms, MultiProjectionMaterial};
use bevy::prelude::*;
use std::f32::consts::PI;

#[derive(Resource)]
struct CameraParams {
    distance: f32, // Distance from target
    angle: f32,    // Horizontal rotation angle
    height: f32,   // Vertical height
    target: Vec3,  // Look-at target
    fov_degrees: f32,
    projection_type: u32, // 0=perspective, 1=orthographic
    ortho_size: f32,
}

impl Default for CameraParams {
    fn default() -> Self {
        Self {
            distance: 15.0,
            angle: 0.0,
            height: 5.0,
            target: Vec3::ZERO,
            fov_degrees: 60.0,
            projection_type: 0,
            ortho_size: 10.0,
        }
    }
}

pub fn run() {
    App::new()
        .add_plugins(DefaultPlugins)
        .add_plugins(MaterialPlugin::<MultiProjectionMaterial>::default())
        .init_resource::<CameraParams>()
        .add_systems(Startup, setup)
        .add_systems(
            Update,
            (
                handle_input,
                update_camera_transform,
                update_materials,
                update_ui,
            ),
        )
        .run();
}

fn setup(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<MultiProjectionMaterial>>,
    mut standard_materials: ResMut<Assets<StandardMaterial>>,
) {
    // Create a large grid of cubes to show projection effects
    for x in -10..=10 {
        for z in -10..=10 {
            let distance = ((x * x + z * z) as f32).sqrt();
            let height = (distance * 0.3).sin() * 0.5 + 0.5;

            commands.spawn((
                Mesh3d(meshes.add(Cuboid::new(0.8, height + 0.3, 0.8))),
                MeshMaterial3d(materials.add(MultiProjectionMaterial {
                    camera: CameraUniforms::default(),
                })),
                Transform::from_xyz(x as f32 * 1.5, height * 0.5, z as f32 * 1.5),
            ));
        }
    }

    // Add reference spheres at different distances
    for i in -4..=4 {
        let distance = i as f32 * 3.0;
        commands.spawn((
            Mesh3d(meshes.add(Sphere::new(0.5))),
            MeshMaterial3d(standard_materials.add(StandardMaterial {
                base_color: Color::srgb(1.0, 0.5, 0.2),
                ..default()
            })),
            Transform::from_xyz(0.0, 2.0, -distance),
        ));
    }

    // Light
    commands.spawn((
        DirectionalLight {
            illuminance: 10000.0,
            shadows_enabled: true,
            ..default()
        },
        Transform::from_rotation(Quat::from_euler(EulerRot::XYZ, -PI / 4.0, PI / 4.0, 0.0)),
    ));

    // Camera
    let params = CameraParams::default();
    let position = Vec3::new(
        params.distance * params.angle.cos(),
        params.height,
        params.distance * params.angle.sin(),
    );
    commands.spawn((
        Camera3d::default(),
        Transform::from_translation(position).looking_at(params.target, Vec3::Y),
    ));

    // UI
    commands.spawn((
        Text::new(""),
        Node {
            position_type: PositionType::Absolute,
            top: Val::Px(10.0),
            left: Val::Px(10.0),
            ..default()
        },
        TextFont {
            font_size: 16.0,
            ..default()
        },
    ));
}

fn update_camera_transform(
    params: Res<CameraParams>,
    mut camera_query: Query<(&mut Transform, &mut Projection), With<Camera3d>>,
) {
    let Ok((mut transform, mut projection)) = camera_query.single_mut() else {
        return;
    };

    // Calculate camera position from polar coordinates
    let position = Vec3::new(
        params.distance * params.angle.cos(),
        params.height,
        params.distance * params.angle.sin(),
    );

    // Update camera position and orientation
    *transform = Transform::from_translation(position).looking_at(params.target, Vec3::Y);

    // Update camera projection based on type
    match params.projection_type {
        0 => {
            // Perspective projection
            *projection = Projection::Perspective(PerspectiveProjection {
                fov: params.fov_degrees.to_radians(),
                near: 0.1,
                far: 1000.0,
                aspect_ratio: 1.0, // Will be updated by Bevy
            });
        }
        1 => {
            // Orthographic projection
            let scale = params.ortho_size;
            *projection = Projection::Orthographic(OrthographicProjection {
                near: -1000.0,
                far: 1000.0,
                viewport_origin: Vec2::new(0.5, 0.5),
                scaling_mode: bevy::render::camera::ScalingMode::FixedVertical {
                    viewport_height: scale * 2.0,
                },
                scale: 1.0,
                area: bevy::math::Rect {
                    min: Vec2::new(-scale, -scale),
                    max: Vec2::new(scale, scale),
                },
            });
        }
        _ => {}
    }
}

fn update_materials(
    time: Res<Time>,
    params: Res<CameraParams>,
    windows: Query<&Window>,
    camera_query: Query<&Transform, With<Camera3d>>,
    mut materials: ResMut<Assets<MultiProjectionMaterial>>,
) {
    let Ok(window) = windows.single() else {
        return;
    };
    let Ok(camera_transform) = camera_query.single() else {
        return;
    };

    let aspect = window.width() / window.height();
    let position = camera_transform.translation;

    // Build view matrix
    let view_matrix = build_view_matrix(position, params.target, Vec3::Y);

    // Build projection matrix based on type
    let projection_matrix = match params.projection_type {
        0 => build_perspective_matrix(params.fov_degrees, aspect, 0.1, 1000.0),
        1 => build_orthographic_matrix(params.ortho_size, aspect, -1000.0, 1000.0),
        _ => build_perspective_matrix(params.fov_degrees, aspect, 0.1, 1000.0),
    };

    // Update all materials
    for (_, material) in materials.iter_mut() {
        material.camera.view_matrix = view_matrix;
        material.camera.projection_matrix = projection_matrix;
        material.camera.camera_position = position;
        material.camera.projection_type = params.projection_type;
        material.camera.fov = params.fov_degrees.to_radians(); // Convert to radians for shader
        material.camera.ortho_size = params.ortho_size;
        material.camera.time = time.elapsed_secs();
    }
}

fn build_view_matrix(eye: Vec3, target: Vec3, up: Vec3) -> Mat4 {
    // Camera basis vectors: `forward` points from the target to the eye,
    // because a right-handed camera looks down its local -Z axis.
    let forward = (eye - target).normalize();
    let right = up.cross(forward).normalize();
    let camera_up = forward.cross(right);

    // The view matrix is the inverse of the camera's world transform, so the
    // rotation part stores the basis vectors as ROWS: each column below takes
    // one component from each basis vector.
    Mat4::from_cols(
        Vec4::new(right.x, camera_up.x, forward.x, 0.0),
        Vec4::new(right.y, camera_up.y, forward.y, 0.0),
        Vec4::new(right.z, camera_up.z, forward.z, 0.0),
        Vec4::new(-right.dot(eye), -camera_up.dot(eye), -forward.dot(eye), 1.0),
    )
}

fn build_perspective_matrix(fov_degrees: f32, aspect: f32, near: f32, far: f32) -> Mat4 {
    // Classic OpenGL-style perspective (NDC z in [-1, 1]). Bevy itself uses
    // reverse-Z with z in [0, 1], but for visualization the shape is the same.
    let f = 1.0 / (fov_degrees.to_radians() / 2.0).tan();
    let range = 1.0 / (near - far);

    Mat4::from_cols(
        Vec4::new(f / aspect, 0.0, 0.0, 0.0),
        Vec4::new(0.0, f, 0.0, 0.0),
        Vec4::new(0.0, 0.0, (near + far) * range, -1.0), // the -1 copies distance into w
        Vec4::new(0.0, 0.0, 2.0 * near * far * range, 0.0),
    )
}

fn build_orthographic_matrix(size: f32, aspect: f32, near: f32, far: f32) -> Mat4 {
    // `size` is the half-height of the visible volume, matching the
    // FixedVertical { viewport_height: size * 2.0 } used for Bevy's camera.
    let half_width = size * aspect;
    let half_height = size;
    let depth = far - near;

    Mat4::from_cols(
        Vec4::new(1.0 / half_width, 0.0, 0.0, 0.0),
        Vec4::new(0.0, 1.0 / half_height, 0.0, 0.0),
        Vec4::new(0.0, 0.0, -2.0 / depth, 0.0),
        Vec4::new(0.0, 0.0, -(far + near) / depth, 1.0), // w stays 1.0: no divide effect
    )
}

fn handle_input(
    keyboard: Res<ButtonInput<KeyCode>>,
    mut params: ResMut<CameraParams>,
    time: Res<Time>,
) {
    let delta = time.delta_secs();

    // Switch projection type
    if keyboard.just_pressed(KeyCode::Space) {
        params.projection_type = (params.projection_type + 1) % 2;
    }

    // Camera rotation (around target)
    let rotation_speed = 2.0 * delta;
    if keyboard.pressed(KeyCode::ArrowLeft) {
        params.angle -= rotation_speed;
    }
    if keyboard.pressed(KeyCode::ArrowRight) {
        params.angle += rotation_speed;
    }

    // Camera height
    let height_speed = 5.0 * delta;
    if keyboard.pressed(KeyCode::ArrowUp) {
        params.height = (params.height + height_speed).min(20.0);
    }
    if keyboard.pressed(KeyCode::ArrowDown) {
        params.height = (params.height - height_speed).max(1.0);
    }

    // Camera distance
    let distance_speed = 5.0 * delta;
    if keyboard.pressed(KeyCode::Equal) {
        params.distance = (params.distance - distance_speed).max(3.0);
    }
    if keyboard.pressed(KeyCode::Minus) {
        params.distance = (params.distance + distance_speed).min(50.0);
    }

    // FOV/ortho size adjustment
    if keyboard.pressed(KeyCode::KeyQ) {
        params.fov_degrees = (params.fov_degrees - 30.0 * delta).max(10.0);
        params.ortho_size = (params.ortho_size - 5.0 * delta).max(1.0);
    }
    if keyboard.pressed(KeyCode::KeyE) {
        params.fov_degrees = (params.fov_degrees + 30.0 * delta).min(120.0);
        params.ortho_size = (params.ortho_size + 5.0 * delta).min(50.0);
    }
}

fn update_ui(params: Res<CameraParams>, mut text_query: Query<&mut Text>) {
    if !params.is_changed() {
        return;
    }

    for mut text in text_query.iter_mut() {
        let proj_name = match params.projection_type {
            0 => "Perspective".to_string(),
            1 => "Orthographic".to_string(),
            _ => "Unknown".to_string(),
        };

        let zoom_info = match params.projection_type {
            0 => format!("FOV: {:.0}deg", params.fov_degrees),
            1 => format!("Size: {:.1}", params.ortho_size),
            _ => String::new(),
        };

        **text = format!(
            "[SPACE]: Perspective (blue) / Orthographic (green)\n\
             [Arrow Keys] Rotate Camera | [=/-] Camera Distance\n\
             [Q/E] FOV/Zoom\n\
             Projection: {}\n\
             {} | Distance: {:.1}\n\
             Angle: {:.0}deg | Height: {:.1}",
            proj_name,
            zoom_info,
            params.distance,
            params.angle.to_degrees(),
            params.height
        );
    }
}
```

Don't forget to add it to `src/demos/mod.rs`:

```rust
// ... other demos
pub mod d02_02_multi_projection;
```

And register it in `src/main.rs`:

```rust
Demo {
    number: "2.2",
    title: "Camera and Projection Matrices",
    run: demos::d02_02_multi_projection::run,
},
```

### Running the Demo

When you run the application, you will be greeted by a grid of cubes. You can use the keyboard to manipulate the camera and projection in real-time.

#### Controls

| Key(s) | Action |
| --- | --- |
| `Space` | Toggle between Perspective and Orthographic modes |
| `Left` / `Right` arrows | Orbit the camera around the center of the scene |
| `Up` / `Down` arrows | Raise / lower the camera |
| `=` / `-` | Move the camera closer or farther away |
| `Q` / `E` | Decrease / increase the FOV or Orthographic size |

#### What You're Seeing

| Mode | Description |
| --- | --- |
| Perspective (blue) | Standard perspective. The parallel lines of the grid appear to converge toward a vanishing point in the distance, and objects farther away look smaller. |
| Orthographic (green) | All cubes appear the same size, no matter their distance. Parallel lines remain perfectly parallel; the scene looks flat, like a technical diagram. |

**FOV/Size:** Experiment with `Q` and `E` in each mode. In perspective, you are changing the "zoom" of the lens; in orthographic, you are changing the size of the visible rectangular area.
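
The perspective "zoom" you feel with `Q`/`E` comes straight from the projection matrix: its vertical scale is `f = 1 / tan(fov / 2)`, so a narrower field of view magnifies the image like a telephoto lens. A standalone sketch (`zoom_factor` is our own illustrative name):

```rust
// The perspective matrix's focal scale: f = 1 / tan(fov / 2).
fn zoom_factor(fov_degrees: f32) -> f32 {
    1.0 / (fov_degrees.to_radians() / 2.0).tan()
}

fn main() {
    for fov in [30.0_f32, 60.0, 120.0] {
        println!("fov {fov:>5} deg -> zoom {:.3}", zoom_factor(fov));
    }
    // 30 deg ~ 3.732, 60 deg ~ 1.732, 120 deg ~ 0.577: narrower FOV = stronger zoom
}
```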

## Key Takeaways

This article demystified the "black box" of camera transformations. Before moving on, ensure you have a solid grasp of these core concepts:

  1. **The Full MVP Pipeline:** You now understand the complete vertex transformation journey: Model → World → View → Clip. You can implement the full `projection * view * model * position` multiplication chain in a shader to gain complete control over where a vertex appears on screen.

  2. **View Matrix:** Its purpose is to transform the entire world into a camera-centric coordinate system (View Space), where the camera is at the origin looking down the -Z axis. It is the mathematical inverse of the camera's world transform.

  3. **Look-At Matrix:** This is the most common way to construct a view matrix, using an eye position, a target point, and an up vector to define the camera's orientation.

  4. **Orthographic vs. Perspective:** Orthographic projection preserves size and parallel lines, ideal for 2D or technical views. Perspective projection simulates depth by making distant objects appear smaller.

  5. **The Perspective Divide:** The "magic" of perspective comes from the GPU automatically dividing the final `xyz` coordinates by the `w` coordinate. The projection matrix is engineered to store the vertex's distance in this `w` component.

  6. **Frustum Parameters:** The shape of the camera's view is defined by its Field of View (FOV), aspect ratio, and the near/far clipping planes.

  7. **Depth Buffer Precision:** The distribution of depth buffer accuracy is non-linear. Setting the near plane too close is a common cause of Z-fighting artifacts. Bevy uses reverse-Z mapping by default to improve this distribution.

  8. **Bevy Integration:** The safest and most robust way to get camera data into a custom `Material` is to pass it in via your own uniform, updated each frame by a Rust system, avoiding binding conflicts.
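
Takeaway 7 can be demonstrated numerically. Using the same terms as the classic (non-reversed) perspective matrix built in this article, this sketch maps view-space depths to NDC depth; with `near = 0.1` and `far = 1000.0`, the first unit of distance already consumes roughly 90% of the [-1, 1] depth range:

```rust
// NDC depth after the perspective divide, using the same terms as
// build_perspective_matrix: clip_z = (n + f) * range * z + 2nf * range, w = -z.
fn ndc_depth(view_z: f32, near: f32, far: f32) -> f32 {
    let range = 1.0 / (near - far);
    let clip_z = (near + far) * range * view_z + 2.0 * near * far * range;
    let clip_w = -view_z;
    clip_z / clip_w
}

fn main() {
    let (near, far) = (0.1_f32, 1000.0);
    for z in [-0.1_f32, -1.0, -10.0, -100.0, -1000.0] {
        println!("view z = {z:>8}: ndc depth = {:+.4}", ndc_depth(z, near, far));
    }
    // approx: -1.0000, +0.8002, +0.9802, +0.9982, +1.0000
    // Nearly everything beyond 10 units squeezes into the top 2% of the range,
    // which is exactly why distant surfaces z-fight when the near plane is tiny.
}
```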

## What's Next?

You now have end-to-end control over the vertex transformation pipeline, from a model's local space all the way to the screen's clip space. You understand how to place an object in the world and how to define the camera that views it.

In the next article, we will shift our focus. Instead of just transforming the position of a vertex, we will learn how to read, interpret, and pass along other crucial pieces of data embedded in our meshes - like normals for lighting, UVs for texturing, and vertex colors for unique styling. This will unlock a whole new dimension of visual effects and prepare us to add color and texture to our custom geometry.

Next up: 2.3 - Working with Vertex Attributes


## Quick Reference

### Core Transformations

| Matrix | Input Space | Output Space | Primary Role | Key Insight |
| --- | --- | --- | --- | --- |
| Model | Local Space | World Space | Places and orients the object in the scene. | World = Model * Local |
| View | World Space | View Space | Moves the entire world so the camera is at the origin. | Inverse of the camera's world transform. |
| Projection | View Space | Clip Space | Flattens the scene, applying perspective or ortho rules. | Sets the vertex's distance in the W component. |
| MVP | Local Space | Clip Space | The combined transformation: Proj * View * Model. | Final position sent to the GPU. |

### Projection Types

| Type | Visual Effect | W Component | Precision | Ideal Use Case |
| --- | --- | --- | --- | --- |
| Perspective | Distant objects appear smaller; parallel lines converge. | Calculated as -Z (distance) | Non-linear, front-loaded (unless reverse-Z). | 3D games, realistic rendering. |
| Orthographic | All objects appear the same size; parallel lines remain parallel. | Fixed at 1.0 | Linear (precision is evenly distributed). | 2D/UI, technical drawings. |
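
The W Component column is the crux of the difference between the two projections. This sketch (mirroring the matrix terms from `build_perspective_matrix` and `build_orthographic_matrix` above) projects the same view-space x with both conventions and applies the divide:

```rust
// Perspective: x scales by f / aspect and w receives the distance (-z),
// so the post-divide position shrinks with depth.
fn perspective_ndc_x(x: f32, view_z: f32, fov_degrees: f32, aspect: f32) -> f32 {
    let f = 1.0 / (fov_degrees.to_radians() / 2.0).tan();
    let clip_x = (f / aspect) * x;
    let clip_w = -view_z;
    clip_x / clip_w
}

// Orthographic: x scales by 1 / half_width and w stays 1.0,
// so the divide changes nothing and depth is irrelevant.
fn orthographic_ndc_x(x: f32, half_width: f32) -> f32 {
    (x / half_width) / 1.0
}

fn main() {
    for z in [-5.0_f32, -20.0] {
        println!(
            "view z = {z:>5}: perspective x = {:.3}, ortho x = {:.3}",
            perspective_ndc_x(2.0, z, 60.0, 1.0),
            orthographic_ndc_x(2.0, 10.0)
        );
    }
    // Perspective x shrinks (about 0.693 -> 0.173) while orthographic x stays 0.200.
}
```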