2.2 - Camera and Projection Matrices

What We're Learning
In the previous article, we seized control of our geometry. We learned how to manipulate vertex positions in local space to create dynamic waves, twists, and other deformations. To get our transformed geometry onto the screen, we relied on a convenient Bevy helper function: position_world_to_clip. This function acted as a "black box," handling the complex math of camera positioning and lens projection for us. Now, it's time to open that box.
This article is about mastering the final, crucial steps of the rendering pipeline: the journey from a shared 3D world into the 2D plane of your monitor. The camera is your window into the scene; it dictates what the player sees, from what angle, and with what sense of perspective. By understanding and building the matrices that power it - the View Matrix and the Projection Matrix - you move from simply using a camera to creating bespoke visual experiences. This knowledge is the foundation for custom camera effects, non-standard rendering styles, and debugging tricky visual artifacts.
By the end of this article, you will be able to:
Explain the role of the View Matrix and manually construct one using the "look-at" method to position and orient the camera.
Differentiate between Perspective and Orthographic projection, understanding when and why to use each.
Build a Perspective Projection Matrix from scratch, controlling key lens parameters like Field of View (FOV), aspect ratio, and clipping planes.
Understand the "Perspective Divide" and how the
wcoordinate creates the illusion of depth.Implement the full Model-View-Projection (MVP) pipeline in a shader to gain complete control over vertex transformation.
Recognize how this theory maps directly to Bevy's
CameraandProjectioncomponents.
The Transformation Pipeline Revisited
To understand the camera's role, we must complete the map of the journey each vertex takes from a 3D model file to a pixel on your screen. This journey is a series of coordinate space transformations, each handled by a specific matrix.
In the last article, we focused on the first step: using a Model Matrix to move vertices from their private Local Space into the shared scene, or World Space. We then handed off the result to Bevy's position_world_to_clip function. Let's now deconstruct that function and complete the picture.
Local Space (A model's private coordinates)
│
└─[Model Matrix]───────> Places the model in the scene.
│
World Space (The shared scene's coordinates)
│
└─[View Matrix]────────> Moves the entire world so the camera is at the origin.
│
View Space (The world from the camera's perspective)
│
└─[Projection Matrix] -> Flattens the 3D view into a 2D image with perspective.
│
Clip Space (A standardized cube, ready for the GPU)
The Bevy helper function we used, position_world_to_clip, encapsulates the last two, most crucial steps of this process. It is simply a convenient shortcut for two sequential matrix multiplications:
// This single Bevy function...
let clip_position = position_world_to_clip(world_position.xyz);
// ...is a shortcut for this:
let view_position = view_matrix * vec4<f32>(world_position.xyz, 1.0);
let clip_position = projection_matrix * view_position;
The goal of this article is to build view_matrix and projection_matrix from first principles. Once you master these, you will have complete, end-to-end control over the rendering pipeline. Let's start with the View Matrix.
Part 1: The View Matrix - Positioning the Camera
The View Matrix has a single, crucial job: to transform the entire world from its shared World Space coordinates into View Space, a new coordinate system defined from the camera's unique perspective. In essence, it repositions every vertex in the scene so that the camera becomes the new center of the universe, with everything else arranged around it.
The Inverse Relationship
Here is the most critical concept to understand: the view matrix is the mathematical inverse of the camera's own transformation matrix in the world.
Think about it intuitively:
If you move your camera 10 units to the right (+X), the entire world appears to shift 10 units to the left (-X) from your perspective.
If you rotate your camera 30 degrees clockwise, the world appears to rotate 30 degrees counter-clockwise.
The view matrix applies this opposite, or inverse, transformation to every vertex in the world. This is what creates the illusion of a moving camera.

Mathematically, the relationship is simple and elegant: view_matrix = inverse(camera_world_matrix). This camera_world_matrix is the standard model matrix that would place and orient the camera object itself in world space.
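To make this concrete, here is a minimal standalone Rust sketch (plain f32 arrays and a hand-rolled mul helper rather than Bevy's Mat4; the names are my own). For a camera that is only translated, the inverse is just the opposite translation, so a point sitting at the camera's world position lands on the view-space origin:

```rust
// Column-major 4x4 matrix stored as [column][row], times a vec4.
fn mul(m: [[f32; 4]; 4], v: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for row in 0..4 {
        for col in 0..4 {
            out[row] += m[col][row] * v[col];
        }
    }
    out
}

fn main() {
    // View matrix for a camera translated 10 units along +X (no rotation):
    // the inverse of that movement, i.e. a translation by -10.
    let view = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [-10.0, 0.0, 0.0, 1.0], // inverse translation column
    ];
    // A point sitting exactly at the camera's world position...
    let p = mul(view, [10.0, 0.0, 0.0, 1.0]);
    // ...ends up at the view-space origin.
    assert_eq!(p, [0.0, 0.0, 0.0, 1.0]);
    // A point 3 units to the camera's right stays 3 units to its right.
    assert_eq!(mul(view, [13.0, 0.0, 0.0, 1.0]), [3.0, 0.0, 0.0, 1.0]);
}
```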
Understanding View Space
To grasp what the view matrix does, you must first understand its destination: View Space. This is a standardized coordinate system where the camera is always at the origin, looking in a fixed direction.
Origin (0, 0, 0): The camera's exact position.
-Z Axis: The direction the camera is looking (forward).
+Y Axis: The camera's "up" direction.
+X Axis: The direction to the camera's right.

This convention of "looking down the negative Z-axis" is a long-standing practice in graphics, stemming from the math of right-handed coordinate systems.
Constructing a "Look-At" View Matrix
While inverse(camera_world_matrix) is conceptually correct, calculating a full matrix inverse is computationally expensive and unnecessary. A more direct and efficient method exists. Most of the time, it's far more intuitive to define a camera's orientation by stating:
Where the camera is (eye).
What it's looking at (target).
Which general direction is "up" (usually the world's up vector, vec3(0.0, 1.0, 0.0)).
From these three pieces of information, we can derive the necessary forward, right, and up vectors for the camera's local coordinate system and construct our view matrix directly. This is universally known as a "look-at" function.
fn look_at(
    eye: vec3<f32>,      // The camera's world position
    target: vec3<f32>,   // The point the camera is looking at
    world_up: vec3<f32>  // The world's up direction (e.g., vec3(0.0, 1.0, 0.0))
) -> mat4x4<f32> {
    // 1. Calculate the forward vector (z-axis of the camera's space).
    // This is the direction from the target TO the eye.
    // It points OUT of the screen, aligning with our desired +Z view space axis.
    let z_axis = normalize(eye - target);

    // 2. Calculate the right vector (x-axis).
    // The cross product gives a vector perpendicular to two others.
    let x_axis = normalize(cross(world_up, z_axis));

    // 3. Recalculate the true camera up vector (y-axis).
    // This ensures all three axes are mutually perpendicular (an orthonormal basis).
    let y_axis = cross(z_axis, x_axis);

    // 4. Construct the matrix columns. The rotation part is the TRANSPOSE of the
    // camera's axes: spreading each axis across the columns (one component per
    // column) places the axes in the matrix's ROWS, which applies the inverse
    // (world-to-camera) rotation.
    let col0 = vec4(x_axis.x, y_axis.x, z_axis.x, 0.0);
    let col1 = vec4(x_axis.y, y_axis.y, z_axis.y, 0.0);
    let col2 = vec4(x_axis.z, y_axis.z, z_axis.z, 0.0);

    // The fourth column defines the inverse translation. It moves the world
    // in the opposite direction of the camera's position.
    let col3 = vec4(
        -dot(x_axis, eye),
        -dot(y_axis, eye),
        -dot(z_axis, eye),
        1.0
    );

    // The WGSL mat4x4 constructor takes columns, not rows.
    return mat4x4<f32>(col0, col1, col2, col3);
}
Why the negative dot products? The dot() product calculates how far along one vector another vector lies. -dot(x_axis, eye) tells us "how much of the camera's position is in its own 'right' direction?" and then negates it. By doing this for all three axes, we find the exact opposite translation required to move the camera back to the origin (0,0,0).
Why recalculate up? The initial world_up vector is a guide. If the camera is looking straight up or down, the initial x_axis calculation could fail (the cross product of two parallel vectors is zero). By recalculating the y_axis from the new z_axis and x_axis, we guarantee the camera's local axes form a perfect, stable, 90-degree coordinate system.
Testing Your View Matrix
A correctly constructed view matrix will transform world-space coordinates into view-space. You can verify this with a few key checks:
A vertex at the camera's world position should be transformed to the origin (0, 0, 0).
A vertex located directly in front of the camera should be transformed to a position with a negative Z value.
A vertex to the right of the camera should have a positive X value.
A vertex above the camera should have a positive Y value.
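These checks are easy to automate. Below is a standalone Rust port of the look_at construction (plain arrays instead of Bevy's Mat4; the vector helpers are my own), verifying all four properties for a camera at (0, 0, 5) looking at the origin:

```rust
type Vec3 = [f32; 3];

fn sub(a: Vec3, b: Vec3) -> Vec3 { [a[0] - b[0], a[1] - b[1], a[2] - b[2]] }
fn dot(a: Vec3, b: Vec3) -> f32 { a[0] * b[0] + a[1] * b[1] + a[2] * b[2] }
fn cross(a: Vec3, b: Vec3) -> Vec3 {
    [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]
}
fn normalize(v: Vec3) -> Vec3 {
    let len = dot(v, v).sqrt();
    [v[0] / len, v[1] / len, v[2] / len]
}

// Same construction as the WGSL look_at, stored column-major as [column][row].
fn look_at(eye: Vec3, target: Vec3, world_up: Vec3) -> [[f32; 4]; 4] {
    let z = normalize(sub(eye, target));
    let x = normalize(cross(world_up, z));
    let y = cross(z, x);
    [
        [x[0], y[0], z[0], 0.0], // axes spread across columns = transposed rotation
        [x[1], y[1], z[1], 0.0],
        [x[2], y[2], z[2], 0.0],
        [-dot(x, eye), -dot(y, eye), -dot(z, eye), 1.0], // inverse translation
    ]
}

fn mul(m: [[f32; 4]; 4], v: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for row in 0..4 {
        for col in 0..4 {
            out[row] += m[col][row] * v[col];
        }
    }
    out
}

fn main() {
    let view = look_at([0.0, 0.0, 5.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]);
    // 1. The eye itself maps to the origin.
    assert_eq!(mul(view, [0.0, 0.0, 5.0, 1.0]), [0.0, 0.0, 0.0, 1.0]);
    // 2. The target, directly in front of the camera, gets a negative Z.
    assert!(mul(view, [0.0, 0.0, 0.0, 1.0])[2] < 0.0);
    // 3. A point to the camera's right gets a positive X.
    assert!(mul(view, [1.0, 0.0, 5.0, 1.0])[0] > 0.0);
    // 4. A point above the camera gets a positive Y.
    assert!(mul(view, [0.0, 1.0, 5.0, 1.0])[1] > 0.0);
}
```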
Part 2: Projection Matrices - From 3D to 2D
We have successfully transformed our world into a camera-centric view. Now, we face the final challenge: how do we represent this 3D view on a 2D screen? This is the job of the Projection Matrix. It takes our 3D view-space coordinates and squashes them into a standardized 2D space that the GPU can map to pixels.
This process is analogous to how a real camera lens works, focusing light from a three-dimensional world onto a flat two-dimensional sensor. In computer graphics, we primarily use two types of "lenses" or projections.
(Note: The following matrix functions are the classic implementations, perfect for learning the fundamental concepts. In the next section, we'll see how Bevy uses a slightly modified "Reverse-Z" version for improved precision.)
Orthographic Projection: Parallel Lines Stay Parallel
An orthographic projection is the simplest type. It maps 3D coordinates directly to 2D coordinates without any perspective. This means an object's size on screen does not change with its distance from the camera. Parallel lines in the 3D world remain parallel on the 2D screen.

When to use orthographic projection:
2D games, user interfaces (UI), and sprite-based rendering.
Architectural blueprints and CAD (Computer-Aided Design) applications.
Strategy games with top-down or isometric views.
The orthographic projection matrix transforms a rectangular box of view space (defined by left, right, top, bottom, near, and far planes) into the GPU's normalized clip space cube.
fn orthographic_projection(
    left: f32, right: f32,
    bottom: f32, top: f32,
    near: f32, far: f32
) -> mat4x4<f32> {
    let width = right - left;
    let height = top - bottom;
    let depth = far - near;

    // Column-major construction
    let col0 = vec4(2.0 / width, 0.0, 0.0, 0.0);
    let col1 = vec4(0.0, 2.0 / height, 0.0, 0.0);
    let col2 = vec4(0.0, 0.0, -2.0 / depth, 0.0);
    let col3 = vec4(
        -(right + left) / width,
        -(top + bottom) / height,
        -(far + near) / depth,
        1.0
    );
    return mat4x4<f32>(col0, col1, col2, col3);
}
This matrix effectively scales and shifts the view volume. Crucially, the fourth component of a transformed position vector (w) remains 1.0. This is the key reason there is no perspective.
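You can verify that behavior numerically. This standalone Rust sketch (plain arrays and a hand-rolled mul helper, not Bevy's Mat4) rebuilds the same classic matrix, which maps depth to [-1, 1] in the OpenGL convention, and checks that a near-plane corner of the view box lands on the corner of the clip cube with w still equal to 1.0:

```rust
// The classic orthographic matrix above, stored column-major as [column][row].
fn orthographic(l: f32, r: f32, b: f32, t: f32, n: f32, f: f32) -> [[f32; 4]; 4] {
    let (w, h, d) = (r - l, t - b, f - n);
    [
        [2.0 / w, 0.0, 0.0, 0.0],
        [0.0, 2.0 / h, 0.0, 0.0],
        [0.0, 0.0, -2.0 / d, 0.0],
        [-(r + l) / w, -(t + b) / h, -(f + n) / d, 1.0],
    ]
}

fn mul(m: [[f32; 4]; 4], v: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for row in 0..4 {
        for col in 0..4 {
            out[row] += m[col][row] * v[col];
        }
    }
    out
}

fn main() {
    let m = orthographic(-2.0, 2.0, -1.0, 1.0, 0.1, 100.0);
    // Top-right corner of the box, sitting on the near plane (view z = -near).
    let p = mul(m, [2.0, 1.0, -0.1, 1.0]);
    assert!((p[0] - 1.0).abs() < 1e-5); // right edge  -> x = +1
    assert!((p[1] - 1.0).abs() < 1e-5); // top edge    -> y = +1
    assert!((p[2] + 1.0).abs() < 1e-5); // near plane  -> z = -1 (GL convention)
    assert_eq!(p[3], 1.0);              // w untouched -> no perspective
}
```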
Perspective Projection: Realistic Depth
A perspective projection mimics how the human eye and real-world cameras work: objects that are farther away appear smaller. This is the standard projection for virtually all 3D games and simulations.

When to use perspective projection:
First-person and third-person 3D games.
Realistic simulations and visualizations.
Any application where depth perception is important.
Instead of defining a box, we define a "frustum" using more intuitive parameters like the camera's field of view.
fn perspective_projection(
    fov_y_radians: f32, // Vertical field of view
    aspect_ratio: f32,  // Viewport width / height
    near: f32,          // Near clipping plane distance
    far: f32            // Far clipping plane distance
) -> mat4x4<f32> {
    let f = 1.0 / tan(fov_y_radians / 2.0);
    let range = 1.0 / (near - far);

    // Column-major construction
    let col0 = vec4(f / aspect_ratio, 0.0, 0.0, 0.0);
    let col1 = vec4(0.0, f, 0.0, 0.0);
    let col2 = vec4(0.0, 0.0, (near + far) * range, -1.0);
    let col3 = vec4(0.0, 0.0, 2.0 * near * far * range, 0.0);
    return mat4x4<f32>(col0, col1, col2, col3);
}
Look closely at col2: its fourth component is -1.0. Since col2 is the column multiplied by the input's z, this writes the negated view-space z into the output's fourth component. In other words, the final w value of our output position will equal the negative z value from view space. This is the secret ingredient for perspective.
The Magic of the Perspective Divide
The real "magic" of perspective projection happens after our vertex shader is finished. The GPU's fixed-function hardware takes the vec4 position we output and automatically performs an operation called the perspective divide.
// Our vertex shader outputs a clip-space position:
// out.position = clip_pos; (a vec4<f32>)
// The GPU automatically does this for every vertex:
let final_ndc_pos = clip_pos.xyz / clip_pos.w;
It divides the x, y, and z components by the w component. Now, let's connect this to our projection matrix. We saw that the matrix was engineered to produce this result:
clip_position.w = -view_position.z
The z value in view space represents the distance from the camera into the scene. By setting w to this distance, the perspective divide scales our vertex positions accordingly.
A point close to the camera:
view_position.z = -2.0
clip_position.w = 2.0
final_x = clip_x / 2.0 (larger on screen)
A point far from the camera:
view_position.z = -50.0
clip_position.w = 50.0
final_x = clip_x / 50.0 (smaller on screen)
This simple division is how perspective is achieved in modern graphics.
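Here is the whole chain in a standalone Rust sketch (plain arrays and a hand-rolled mul helper, not Bevy's Mat4): the classic perspective matrix from above, applied to the same off-axis point at two depths, followed by the divide the GPU would perform:

```rust
// The classic perspective matrix above, stored column-major as [column][row].
fn perspective(fov_y: f32, aspect: f32, near: f32, far: f32) -> [[f32; 4]; 4] {
    let f = 1.0 / (fov_y / 2.0).tan();
    let range = 1.0 / (near - far);
    [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (near + far) * range, -1.0],
        [0.0, 0.0, 2.0 * near * far * range, 0.0],
    ]
}

fn mul(m: [[f32; 4]; 4], v: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for row in 0..4 {
        for col in 0..4 {
            out[row] += m[col][row] * v[col];
        }
    }
    out
}

fn main() {
    // 90-degree vertical FOV, square viewport.
    let m = perspective(std::f32::consts::FRAC_PI_2, 1.0, 0.1, 100.0);
    // The same off-axis offset (x = 1) at two depths:
    let close = mul(m, [1.0, 0.0, -2.0, 1.0]);
    let far = mul(m, [1.0, 0.0, -50.0, 1.0]);
    // w picked up the view-space distance...
    assert!((close[3] - 2.0).abs() < 1e-4);
    assert!((far[3] - 50.0).abs() < 1e-4);
    // ...so after the perspective divide, the distant point sits much
    // closer to the center of the screen.
    assert!((close[0] / close[3] - 0.5).abs() < 1e-3);
    assert!((far[0] / far[3] - 0.02).abs() < 1e-3);
}
```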
Understanding Field of View (FOV)
Field of View, or FOV, is the extent of the observable world seen at any given moment. In our projection matrix, it's the vertical angle of the camera's frustum. It's analogous to the zoom lens on a camera.
Low FOV (30-50°): Creates a "telephoto" or zoomed-in effect.
Medium FOV (60-90°): A standard view that feels natural for most games.
High FOV (90-120°): A wide-angle view. Can cause "fisheye" distortion at the edges of the screen.
Understanding Aspect Ratio
Aspect ratio is the ratio of the viewport's width to its height (width / height). A 1920x1080 screen has an aspect ratio of 16/9 or ~1.777. Our projection matrix needs this value to prevent the image from being stretched. The fov_y_radians parameter defines the vertical opening of our view. We use the aspect ratio to calculate the correct horizontal opening to match the screen's shape. The matrix corrects for this by scaling the X-coordinate: f / aspect_ratio. This makes the view wider than it is tall, matching the viewport's dimensions.
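The relationship between FOV, the zoom factor f, and aspect ratio can be checked with a couple of one-liners. This is a standalone Rust sketch; the helper names are my own:

```rust
// f is the matrix's zoom factor: 1 / tan(fov_y / 2).
fn zoom_factor(fov_y_radians: f32) -> f32 {
    1.0 / (fov_y_radians / 2.0).tan()
}

// The horizontal FOV implied by a vertical FOV and an aspect ratio.
fn horizontal_fov(fov_y_radians: f32, aspect: f32) -> f32 {
    2.0 * ((fov_y_radians / 2.0).tan() * aspect).atan()
}

fn main() {
    // Low FOV -> large f -> geometry magnified ("telephoto").
    // High FOV -> small f -> geometry shrunk ("wide angle").
    assert!(zoom_factor(30.0_f32.to_radians()) > zoom_factor(100.0_f32.to_radians()));

    // A 60-degree vertical FOV on a 16:9 viewport opens to roughly 91.5 degrees
    // horizontally: that is the f / aspect_ratio term doing its job.
    let fov_x = horizontal_fov(60.0_f32.to_radians(), 16.0 / 9.0);
    assert!((fov_x.to_degrees() - 91.5).abs() < 0.5);
}
```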
Understanding Near and Far Planes
The near and far parameters define the boundaries of the camera's view frustum. They create two clipping planes.
Anything closer to the camera than the near plane is discarded ("clipped").
Anything farther from the camera than the far plane is also discarded.
These planes are not just for culling geometry; they are essential for the depth buffer. The depth buffer is a texture that stores a depth value (from 0.0 to 1.0) for every pixel. Before drawing a new pixel, the GPU checks the depth buffer. If the new pixel is farther away than the one already there, it's discarded. This is how the GPU correctly sorts overlapping objects.
The classic projection matrix above maps the view-space depth range [-near, -far] to a fixed clip-space depth range ([-1, 1] in the OpenGL convention used here; wgpu and Bevy map to [0, 1]). However, this mapping is non-linear. It's designed to give more precision to objects closer to the camera. This leads to a critical trade-off:
Depth buffer precision is not distributed evenly.
Imagine the depth buffer as a ruler. In a classic perspective projection, the tick marks on the ruler are densely packed near the camera and spread out farther away.
Setting the near plane too close (e.g., 0.01): You are cramming an enormous amount of the depth buffer's precision into the tiny space right in front of the camera. This leaves very little precision for the rest of the scene, causing distant objects with similar depths to flicker back and forth. This artifact is called Z-fighting.
Setting the far plane too far: You are stretching a finite amount of precision over a vast distance, which also reduces accuracy and can cause Z-fighting.
Best Practice: Keep the far / near ratio as small as possible for your scene's needs (ideally under 1000). Push the near plane out as far as you can without clipping into objects the player should see.
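Working from the classic matrix above (OpenGL-style [-1, 1] depth), the NDC depth of a point at view-space distance d simplifies to ((near + far) - 2 * near * far / d) / (far - near). A standalone Rust sketch shows how brutally front-loaded the precision is:

```rust
// NDC depth produced by the classic matrix for a point at view-space distance d.
fn ndc_depth(d: f32, near: f32, far: f32) -> f32 {
    ((near + far) - 2.0 * near * far / d) / (far - near)
}

fn main() {
    let (near, far) = (0.1, 100.0);
    let at_near = ndc_depth(near, near, far);        // exactly -1.0
    let at_2x_near = ndc_depth(2.0 * near, near, far);
    let at_far = ndc_depth(far, near, far);          // exactly 1.0
    assert!((at_near + 1.0).abs() < 1e-3);
    assert!((at_far - 1.0).abs() < 1e-3);
    // Merely doubling the distance from the near plane (0.1 -> 0.2 units) has
    // already consumed half of the entire [-1, 1] depth range.
    assert!(at_2x_near > -0.01);
    println!("ndc at 0.1: {at_near}, at 0.2: {at_2x_near}, at 100: {at_far}");
}
```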
Part 3: Reverse-Z Projection
In the last section, we discussed how the classic projection matrix maps the view-space depth range [-near, -far] to the depth buffer's [0, 1] range. This traditional method has a significant drawback related to how computers store numbers.
Traditional Z-Mapping
The near plane is mapped to a depth of 0.0.
The far plane is mapped to a depth of 1.0.
Standard floating-point numbers (like f32) have the most precision near zero.
Result: Almost all of your depth precision is clustered right in front of the camera, leaving very little for the distant parts of your scene. This is the primary cause of the Z-fighting artifact.
To solve this, modern rendering pipelines, including Bevy's, use a clever technique called "Reverse-Z". The idea is simple but highly effective: we just flip the mapping.
Reverse-Z Mapping
The near plane is mapped to a depth of 1.0.
The far plane is mapped to a depth of 0.0.
Result: The high precision of floating-point numbers (near zero) is now distributed across the far end of the view frustum. This results in a much more even and usable distribution of depth precision across the entire visible range, significantly reducing Z-fighting artifacts.
The implementation is a small tweak to the perspective projection matrix. Bevy also commonly uses an "infinite" far plane, meaning geometry is never clipped for being too far away, which simplifies the matrix further.
// A common form for an infinite far plane with Reverse-Z, which Bevy uses.
fn reverse_z_perspective(
    fov_y_radians: f32,
    aspect_ratio: f32,
    near: f32
) -> mat4x4<f32> {
    let f = 1.0 / tan(fov_y_radians / 2.0);

    // Column-major construction
    let col0 = vec4(f / aspect_ratio, 0.0, 0.0, 0.0);
    let col1 = vec4(0.0, f, 0.0, 0.0);
    // The Z-mapping components are different from the classic matrix
    let col2 = vec4(0.0, 0.0, 0.0, -1.0);
    let col3 = vec4(0.0, 0.0, near, 0.0);
    return mat4x4<f32>(col0, col1, col2, col3);
}
You don't need to implement this yourself when using Bevy's built-in camera, but it's crucial to know that this is happening under the hood. It explains why Bevy's rendering is robust against depth artifacts by default and is a key piece of context for anyone diving deep into the engine's rendering code. From this point forward, when we discuss "the projection matrix," you can assume it's this more robust, modern version.
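You can see the payoff with a little arithmetic. For the infinite-far matrix above, clip.z = near and clip.w = -view_z, so after the divide the stored depth is simply near / distance. A standalone Rust sketch:

```rust
// Depth written by the infinite-far Reverse-Z matrix: clip.z = near and
// clip.w = -view_z, so depth = near / distance after the divide.
fn reverse_z_depth(distance: f32, near: f32) -> f32 {
    near / distance
}

fn main() {
    let near = 0.1;
    // The near plane lands on 1.0...
    assert_eq!(reverse_z_depth(near, near), 1.0);
    // ...and distant surfaces land on tiny values, exactly where f32 has the
    // most representable numbers, so they still get distinguishable depths.
    let d1 = reverse_z_depth(100.0, near);
    let d2 = reverse_z_depth(1000.0, near);
    assert!(d1 > d2 && d2 > 0.0);
    println!("depth at 100 units: {d1}, at 1000 units: {d2}");
}
```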
Part 4: Custom Projection Effects
Understanding how projection matrices are constructed gives you the power to break the rules. By manipulating the transformation pipeline, you can create non-standard camera effects that would be impossible with a standard projection matrix alone.
Fish-Eye Effect
A fisheye lens captures an extremely wide field of view, causing straight lines to appear curved. This "barrel distortion" is the lens's signature characteristic. A standard perspective projection matrix is fundamentally incapable of creating this effect because it is a linear transformation, meaning it is designed to preserve straight lines.
To create a true fisheye effect, we must introduce a non-linear step into our vertex shader.
Perspective Projection's Logic: The distance of a point from the center of the screen is proportional to tan(theta), where theta is the angle of that point from the camera's forward axis. This preserves lines.
Fisheye Projection's Logic: The distance from the center is proportional directly to the angle theta itself. This bends lines.
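A quick numeric comparison (standalone Rust) makes the difference tangible: the two mappings agree near the view axis but diverge wildly at wide angles, which is exactly the barrel distortion:

```rust
fn main() {
    // Rectilinear (standard) projection: screen radius ~ tan(theta).
    // Equidistant fisheye:               screen radius ~ theta.
    let small = 5.0_f32.to_radians();
    let wide = 80.0_f32.to_radians();

    // Near the axis the two are almost identical (tan(x) ~ x for small x)...
    assert!((small.tan() - small).abs() < 0.001);

    // ...but at 80 degrees off-axis, tan(theta) has blown up to ~5.7 while the
    // fisheye radius is still just theta ~ 1.4. Straight lines can't survive that.
    assert!(wide.tan() > 4.0 * wide);
}
```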
Implementation in the Vertex Shader:
The most accurate way to implement this is to interrupt the standard transformation pipeline. We transform our vertex into view space, apply our custom non-linear distortion, and then apply the final projection matrix.
// --- In the vertex shader ---

// 1. Transform vertex from world space to view space as usual.
let view_pos = view_matrix * world_position;

// 2. Apply the non-linear fisheye distortion. Calculate the distance from
// the center of the view and the angle from the forward axis.
let xy_distance = length(view_pos.xy);
// -view_pos.z is the distance "into" the screen
let theta = atan2(xy_distance, -view_pos.z);

// 3. Determine the new, distorted distance from the center.
// Instead of tan(theta), we just use theta.
// The focal_length is derived from the camera's FOV.
let focal_length = 1.0 / tan(fov_y_radians * 0.5);
let fisheye_radius = theta * focal_length;

// 4. Calculate a scaling factor and apply it.
var distorted_view_pos = view_pos;
// Avoid division by zero at the very center of the view.
if (xy_distance > 0.001) {
    let scale = fisheye_radius / xy_distance;
    distorted_view_pos.xy *= scale;
}

// 5. Now that the view-space position is distorted, apply the standard projection.
let clip_pos = projection_matrix * distorted_view_pos;
Dolly Zoom (The "Vertigo" Effect)
Popularized by Alfred Hitchcock's film Vertigo, the dolly zoom is a dramatic cinematic technique. It's achieved by moving the camera towards or away from a subject while simultaneously adjusting the lens's zoom (or FOV) to keep the subject the same size in the frame. The result is that the subject appears stationary while the background seems to either compress or expand dramatically.
This effect isn't a custom shader trick, but rather a manipulation of the camera and projection data you send to the shader each frame from your Rust code.
Implementation (in your Rust code):
Move the camera: In your update system, change the camera's Transform to move it closer to or farther from your target.
Adjust the FOV: In the same system, change the fov property of the PerspectiveProjection component.
As the camera moves closer, you must increase the FOV (zoom out) to keep the subject the same size.
As the camera moves away, you must decrease the FOV (zoom in).
The shader simply receives a different projection_matrix each frame and renders the scene accordingly, creating the iconic effect. Here is what a simple Bevy system to control a dolly zoom might look like.
// A resource to control the dolly zoom effect
#[derive(Resource)]
struct DollyZoom {
    target_entity: Entity,
    // The value that must remain constant: distance_to_target * tan(fov / 2)
    initial_product: f32,
    // A timer to drive the animation, 0.0 to 1.0
    progress: f32,
    start_distance: f32,
    end_distance: f32,
}

fn dolly_zoom_system(
    time: Res<Time>,
    mut dolly: ResMut<DollyZoom>,
    mut camera_query: Query<(&mut Transform, &mut Projection), With<Camera3d>>,
    target_query: Query<&GlobalTransform>,
) {
    let Ok((mut camera_transform, mut projection)) = camera_query.get_single_mut() else { return };
    let Ok(target_transform) = target_query.get(dolly.target_entity) else { return };

    // Animate the effect over a few seconds
    dolly.progress = (dolly.progress + time.delta_secs() * 0.2).fract();
    let current_distance = dolly.start_distance.lerp(dolly.end_distance, dolly.progress);

    // 1. Move the camera
    let direction_to_target = (target_transform.translation() - camera_transform.translation).normalize();
    camera_transform.translation = target_transform.translation() - direction_to_target * current_distance;

    // 2. Adjust the FOV to compensate
    if let Projection::Perspective(ref mut pers) = *projection {
        // Solve for the new FOV using the core relationship
        let new_half_fov_tan = dolly.initial_product / current_distance;
        let new_fov_rad = 2.0 * new_half_fov_tan.atan();
        pers.fov = new_fov_rad;
    }
}
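The system above hinges on one invariant: distance_to_target * tan(fov / 2) stays constant, which means the frustum's height at the target's distance, and therefore the subject's on-screen size, never changes. A standalone Rust check of that invariant (no Bevy needed):

```rust
fn main() {
    let initial_distance = 10.0_f32;
    let initial_fov = 60.0_f32.to_radians();
    // The constant the DollyZoom resource stores as `initial_product`.
    let product = initial_distance * (initial_fov / 2.0).tan();

    for distance in [5.0_f32, 10.0, 20.0, 40.0] {
        // Solve for the FOV exactly as dolly_zoom_system does.
        let fov = 2.0 * (product / distance).atan();
        // Frustum height at the target's distance = the subject's apparent size.
        let height_at_target = 2.0 * distance * (fov / 2.0).tan();
        assert!((height_at_target - 2.0 * product).abs() < 1e-3);
    }
}
```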
Part 5: Accessing Bevy's View and Projection
While building matrices from scratch in WGSL is a fantastic learning exercise, it's not something you'll do every day. Bevy's renderer, of course, already calculates the view and projection matrices for every active camera. Our job is to get that data from Bevy into our shader.
There are two primary ways to do this, each with its own use case: accessing Bevy's global view uniform directly, and passing the data through our own custom material.
The Global View Uniform (With a Big Caveat)
Bevy prepares a large uniform buffer containing all the data for the current view and binds it for many of its internal rendering passes. This View uniform is available at a well-known location: bind group 0, binding 0.
// Bevy's built-in View uniform struct (simplified)
struct View {
    view_proj: mat4x4<f32>,          // The final combined view * projection matrix
    inverse_view_proj: mat4x4<f32>,
    view: mat4x4<f32>,               // The view matrix only
    inverse_view: mat4x4<f32>,       // The camera's world matrix
    projection: mat4x4<f32>,         // The projection matrix only
    inverse_projection: mat4x4<f32>,
    world_position: vec3<f32>,       // The camera's world position
    // ... and many more fields for time, viewport size, etc.
};

@group(0) @binding(0)
var<uniform> view: View;
You could, in theory, add this to your shader and use Bevy's data directly:
// This calculates the final position just like Bevy's internal shaders.
let clip_pos = view.view_proj * world_position;
WARNING: Do NOT do this in a standard Material!
Bevy's material system uses bind group 1 for material-specific data and bind group 2 for mesh-level data. Bind group 0 is reserved for view-level data managed by Bevy's PBR pipeline. If you try to define @group(0) @binding(0) in your custom material's shader, it will cause a binding conflict with the data Bevy is already providing, leading to crashes or unpredictable behavior.
When is it safe to use the global View uniform?
In compute shaders.
In full-screen post-processing effects.
In custom render pipelines where you are not using Bevy's Material trait.
For our purposes in this curriculum, we will avoid this method and use the safer, more flexible approach.
The Safe Approach: Material Uniforms
The correct and most robust way to get camera data into a custom material is to pass it in yourself. We treat the camera's matrices just like any other data we want to control, like a color or a time value.
This involves three steps:
1. Define a uniform struct in your material:
// In your material's Rust code
// ...
#[derive(ShaderType, Clone)] // ShaderType is crucial
pub struct CameraData {
    pub view_proj: Mat4,
    pub position: Vec3,
}

#[derive(Asset, TypePath, AsBindGroup, Clone)]
pub struct MyCustomMaterial {
    // This will be bound to @group(1) @binding(0) by default
    #[uniform(0)]
    pub camera: CameraData,
    #[uniform(1)]
    pub color: Color,
}
2. Create a Bevy system to update this data every frame:
This system queries for the active camera, gets its transform and projection data, and iterates through all assets of your material type, updating them with the latest values.
// In your app's systems
fn update_material_camera_data(
    camera_query: Query<(&GlobalTransform, &Projection), With<Camera3d>>,
    mut materials: ResMut<Assets<MyCustomMaterial>>,
) {
    let Ok((camera_transform, projection)) = camera_query.get_single() else { return };

    let view_matrix = camera_transform.compute_matrix().inverse();
    let view_proj = projection.get_projection_matrix() * view_matrix;

    for (_, material) in materials.iter_mut() {
        material.camera.view_proj = view_proj;
        material.camera.position = camera_transform.translation();
    }
}
3. Use the data in your shader:
Now your shader can access this data from its own bind group (@group(1) for a Material that also uses mesh data, or @group(2) if it's a Material on a Mesh3d without a StandardMaterial handle), completely avoiding any conflicts with Bevy's internal bindings.
// In your shader.wgsl
struct CameraData {
    view_proj: mat4x4<f32>,
    position: vec3<f32>,
};

struct MyMaterial {
    camera: CameraData,
    color: vec4<f32>,
};

// Assuming this material is used with Mesh3d/MeshMaterial3d
@group(2) @binding(0)
var<uniform> material: MyMaterial;

// ... in your vertex function
let clip_pos = material.camera.view_proj * world_position;
This pattern is more work to set up initially, but it is the correct, conflict-free way to work with the Material trait. It also gives you the flexibility to send different camera data to different materials if you ever needed to.
---
## Complete Example: Interactive Camera Explorer
Now, let's put all this theory into practice. We will build an interactive demo that allows you to switch between perspective and orthographic projection on the fly. You will be able to orbit a scene of simple cubes, adjust the field of view, and see exactly how these changes affect the final rendering.
This project will solidify your understanding of how camera matrices are not just theoretical constructs but are the primary tools for defining the look and feel of a 3D scene.
### Our Goal
We will create a custom material and shader that visualizes our camera logic. A Rust system will build the View and Projection matrices from scratch based on interactive controls. We will use Bevy's standard transformation for the geometry to ensure stability, but we will pass our custom camera parameters to the fragment shader to drive distance-based fog and color coding, helping us visualize the difference between projection modes.
### What This Project Demonstrates
* **Manual Matrix Construction:** Building `look_at` (view) and `perspective`/`orthographic` (projection) matrices in Rust.
* **Uniform Data Flow:** Passing complex camera data from a Rust system into a custom `Material`'s uniform buffer.
* **Complete Vertex Transformation:** Implementing the full `projection * view * model * position` pipeline in a WGSL vertex shader for all modes.
* **Shader-Based Branching:** Using a `u32` uniform to switch between different rendering modes (perspective, ortho, fisheye) inside the shader.
* **Interactive Feedback:** Connecting keyboard inputs to camera parameters (FOV, distance, projection type) to provide a tangible feel for each concept.
### The Shader (`assets/shaders/d02_02_multi_projection.wgsl`)
The vertex shader uses Bevy's built-in `position_world_to_clip` for the geometry, ensuring our mesh is placed correctly on screen. However, we pass our custom camera data to the fragment shader to visualize the different modes: Perspective mode gets distance-based fog (which relies on camera position), while Orthographic mode gets a distinct flat coloring style.
```wgsl
#import bevy_pbr::mesh_functions
#import bevy_pbr::view_transformations
#import bevy_pbr::forward_io::VertexOutput
struct CameraUniforms {
    view_matrix: mat4x4<f32>,
    projection_matrix: mat4x4<f32>,
    camera_position: vec3<f32>,
    projection_type: u32,  // 0=perspective, 1=orthographic
    fov: f32,              // Field of view in radians
    fisheye_strength: f32, // Must be present to match the Rust struct's layout
    ortho_size: f32,
    time: f32,
}

@group(2) @binding(0)
var<uniform> camera: CameraUniforms;

@vertex
fn vertex(
    @builtin(instance_index) instance_index: u32,
    @location(0) position: vec3<f32>,
    @location(1) normal: vec3<f32>,
) -> VertexOutput {
    var out: VertexOutput;
    let world_from_local = mesh_functions::get_world_from_local(instance_index);
    let world_position = mesh_functions::mesh_position_local_to_world(
        world_from_local,
        vec4<f32>(position, 1.0)
    );
    out.position = bevy_pbr::view_transformations::position_world_to_clip(world_position.xyz);

    // Pass data to fragment shader
    out.world_position = world_position;
    out.world_normal = mesh_functions::mesh_normal_local_to_world(normal, instance_index);
    return out;
}

@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
    let normal = normalize(in.world_normal);

    // Calculate distance from camera
    let to_camera = in.world_position.xyz - camera.camera_position;
    let distance = length(to_camera);

    // Color based on projection type
    var base_color = vec3<f32>(0.0);
    if camera.projection_type == 0u {
        // Perspective - blue
        base_color = vec3<f32>(0.3, 0.5, 1.0);
    } else {
        // Orthographic - green
        base_color = vec3<f32>(0.3, 1.0, 0.5);
    }

    // Simple lighting
    let light_dir = normalize(vec3<f32>(
        cos(camera.time),
        0.5,
        sin(camera.time)
    ));
    let diffuse = max(0.3, dot(normal, light_dir));

    // Distance-based fog for perspective
    if camera.projection_type == 0u {
        let fog_start = 10.0;
        let fog_end = 40.0;
        let fog_factor = clamp((distance - fog_start) / (fog_end - fog_start), 0.0, 1.0);
        base_color = mix(base_color, vec3<f32>(0.5, 0.5, 0.6), fog_factor * 0.5);
    }

    return vec4<f32>(base_color * diffuse, 1.0);
}
The Rust Material (src/materials/d02_02_multi_projection.rs)
This file defines the data structure that will be passed from the CPU to the GPU. It contains our manually constructed matrices, the camera's position for lighting calculations, and several parameters that control the projection modes. Note that the field order and types must exactly match the WGSL struct: both sides lay the uniform out using WGSL's alignment rules, so any mismatch silently shifts every field that follows.
```rust
use bevy::prelude::*;
use bevy::render::render_resource::{AsBindGroup, ShaderRef};

mod uniforms {
    #![allow(dead_code)]
    use bevy::prelude::*;
    use bevy::render::render_resource::ShaderType;

    #[derive(ShaderType, Debug, Clone)]
    pub struct CameraUniforms {
        pub view_matrix: Mat4,
        pub projection_matrix: Mat4,
        pub camera_position: Vec3,
        pub projection_type: u32,
        pub fov: f32,
        pub fisheye_strength: f32,
        pub ortho_size: f32,
        pub time: f32,
    }

    impl Default for CameraUniforms {
        fn default() -> Self {
            Self {
                view_matrix: Mat4::IDENTITY,
                projection_matrix: Mat4::IDENTITY,
                camera_position: Vec3::ZERO,
                projection_type: 0,
                fov: 60.0, // Degrees here; converted to radians before upload
                fisheye_strength: 0.5,
                ortho_size: 10.0,
                time: 0.0,
            }
        }
    }
}
pub use uniforms::CameraUniforms;

#[derive(Asset, TypePath, AsBindGroup, Debug, Clone)]
pub struct MultiProjectionMaterial {
    #[uniform(0)]
    pub camera: CameraUniforms,
}

impl Material for MultiProjectionMaterial {
    fn vertex_shader() -> ShaderRef {
        "shaders/d02_02_multi_projection.wgsl".into()
    }

    fn fragment_shader() -> ShaderRef {
        "shaders/d02_02_multi_projection.wgsl".into()
    }
}
```
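Why does field order matter so much? Both the `ShaderType` derive and WGSL compute byte offsets with the same rules: a `mat4x4<f32>` is 64 bytes, a `vec3<f32>` occupies 12 bytes but aligns to 16, and scalars align to 4. The sketch below (an illustration, not part of the demo) replays that calculation by hand to show where each field of `CameraUniforms` lands:

```rust
// Replaying WGSL's uniform layout rules by hand: round each field up to its
// alignment, record the offset, then advance the cursor by the field's size.

/// Returns the byte offset of each field given (align, size) pairs.
fn field_offsets(fields: &[(u32, u32)]) -> Vec<u32> {
    let mut offset = 0;
    fields
        .iter()
        .map(|&(align, size)| {
            offset = (offset + align - 1) / align * align; // round up to alignment
            let at = offset;
            offset += size;
            at
        })
        .collect()
}

fn main() {
    // (align, size) per the WGSL spec: mat4x4<f32> = 16/64,
    // vec3<f32> = 16/12, u32 and f32 = 4/4.
    let camera_uniforms = [
        (16, 64), // view_matrix
        (16, 64), // projection_matrix
        (16, 12), // camera_position (vec3: size 12, align 16)
        (4, 4),   // projection_type (packs into the 4 bytes after the vec3)
        (4, 4),   // fov
        (4, 4),   // fisheye_strength
        (4, 4),   // ortho_size
        (4, 4),   // time
    ];
    let offsets = field_offsets(&camera_uniforms);
    assert_eq!(offsets, vec![0, 64, 128, 140, 144, 148, 152, 156]);
    // Dropping fisheye_strength on only one side would shift ortho_size and
    // time by 4 bytes, and the shader would silently read the wrong values.
    println!("{offsets:?}");
}
```

This is also why a field like `fisheye_strength` must exist on both sides even if the shader never reads it: removing it from only one struct shifts every later field.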
Don't forget to add it to src/materials/mod.rs:
```rust
// ... other materials
pub mod d02_02_multi_projection;
```
The Demo Module (src/demos/d02_02_multi_projection.rs)
The Rust code sets up our scene and contains the logic for interactivity. The key system is `update_materials`. It takes the user-controlled parameters, builds the final view and projection matrices from scratch using our own helper functions, and then iterates through every instance of our custom material to update its uniform data.
use crate::materials::d02_02_multi_projection::{CameraUniforms, MultiProjectionMaterial};
use bevy::prelude::*;
use std::f32::consts::PI;
#[derive(Resource)]
struct CameraParams {
distance: f32, // Distance from target
angle: f32, // Horizontal rotation angle
height: f32, // Vertical height
target: Vec3, // Look-at target
fov_degrees: f32,
projection_type: u32, // 0=perspective, 1=orthographic
ortho_size: f32,
}
impl Default for CameraParams {
fn default() -> Self {
Self {
distance: 15.0,
angle: 0.0,
height: 5.0,
target: Vec3::ZERO,
fov_degrees: 60.0,
projection_type: 0,
ortho_size: 10.0,
}
}
}
pub fn run() {
App::new()
.add_plugins(DefaultPlugins)
.add_plugins(MaterialPlugin::<MultiProjectionMaterial>::default())
.init_resource::<CameraParams>()
.add_systems(Startup, setup)
.add_systems(
Update,
(
handle_input,
update_camera_transform,
update_materials,
update_ui,
),
)
.run();
}
fn setup(
mut commands: Commands,
mut meshes: ResMut<Assets<Mesh>>,
mut materials: ResMut<Assets<MultiProjectionMaterial>>,
mut standard_materials: ResMut<Assets<StandardMaterial>>,
) {
// Create a large grid of cubes to show projection effects
for x in -10..=10 {
for z in -10..=10 {
let distance = ((x * x + z * z) as f32).sqrt();
let height = (distance * 0.3).sin() * 0.5 + 0.5;
commands.spawn((
Mesh3d(meshes.add(Cuboid::new(0.8, height + 0.3, 0.8))),
MeshMaterial3d(materials.add(MultiProjectionMaterial {
camera: CameraUniforms::default(),
})),
Transform::from_xyz(x as f32 * 1.5, height * 0.5, z as f32 * 1.5),
));
}
}
// Add reference spheres at different distances
for i in -4..=4 {
let distance = i as f32 * 3.0;
commands.spawn((
Mesh3d(meshes.add(Sphere::new(0.5))),
MeshMaterial3d(standard_materials.add(StandardMaterial {
base_color: Color::srgb(1.0, 0.5, 0.2),
..default()
})),
Transform::from_xyz(0.0, 2.0, -distance),
));
}
// Light
commands.spawn((
DirectionalLight {
illuminance: 10000.0,
shadows_enabled: true,
..default()
},
Transform::from_rotation(Quat::from_euler(EulerRot::XYZ, -PI / 4.0, PI / 4.0, 0.0)),
));
// Camera
let params = CameraParams::default();
let position = Vec3::new(
params.distance * params.angle.cos(),
params.height,
params.distance * params.angle.sin(),
);
commands.spawn((
Camera3d::default(),
Transform::from_translation(position).looking_at(params.target, Vec3::Y),
));
// UI
commands.spawn((
Text::new(""),
Node {
position_type: PositionType::Absolute,
top: Val::Px(10.0),
left: Val::Px(10.0),
..default()
},
TextFont {
font_size: 16.0,
..default()
},
));
}
fn update_camera_transform(
params: Res<CameraParams>,
mut camera_query: Query<(&mut Transform, &mut Projection), With<Camera3d>>,
) {
let Ok((mut transform, mut projection)) = camera_query.single_mut() else {
return;
};
// Calculate camera position from polar coordinates
let position = Vec3::new(
params.distance * params.angle.cos(),
params.height,
params.distance * params.angle.sin(),
);
// Update camera position and orientation
*transform = Transform::from_translation(position).looking_at(params.target, Vec3::Y);
// Update camera projection based on type
match params.projection_type {
0 => {
// Perspective projection
*projection = Projection::Perspective(PerspectiveProjection {
fov: params.fov_degrees.to_radians(),
near: 0.1,
far: 1000.0,
aspect_ratio: 1.0, // Will be updated by Bevy
});
}
1 => {
// Orthographic projection
let scale = params.ortho_size;
*projection = Projection::Orthographic(OrthographicProjection {
near: -1000.0,
far: 1000.0,
viewport_origin: Vec2::new(0.5, 0.5),
scaling_mode: bevy::render::camera::ScalingMode::FixedVertical {
viewport_height: scale * 2.0,
},
scale: 1.0,
area: bevy::math::Rect {
min: Vec2::new(-scale, -scale),
max: Vec2::new(scale, scale),
},
});
}
_ => {}
}
}
fn update_materials(
time: Res<Time>,
params: Res<CameraParams>,
windows: Query<&Window>,
camera_query: Query<&Transform, With<Camera3d>>,
mut materials: ResMut<Assets<MultiProjectionMaterial>>,
) {
let Ok(window) = windows.single() else {
return;
};
let Ok(camera_transform) = camera_query.single() else {
return;
};
let aspect = window.width() / window.height();
let position = camera_transform.translation;
// Build view matrix
let view_matrix = build_view_matrix(position, params.target, Vec3::Y);
// Build projection matrix based on type
let projection_matrix = match params.projection_type {
0 => build_perspective_matrix(params.fov_degrees, aspect, 0.1, 1000.0),
1 => build_orthographic_matrix(params.ortho_size, aspect, -1000.0, 1000.0),
_ => build_perspective_matrix(params.fov_degrees, aspect, 0.1, 1000.0),
};
// Update all materials
for (_, material) in materials.iter_mut() {
material.camera.view_matrix = view_matrix;
material.camera.projection_matrix = projection_matrix;
material.camera.camera_position = position;
material.camera.projection_type = params.projection_type;
material.camera.fov = params.fov_degrees.to_radians(); // Convert to radians for shader
material.camera.ortho_size = params.ortho_size;
material.camera.time = time.elapsed_secs();
}
}
fn build_view_matrix(eye: Vec3, target: Vec3, up: Vec3) -> Mat4 {
// "Forward" is the camera's local +Z axis: it points from the target back toward the eye.
let forward = (eye - target).normalize();
let right = up.cross(forward).normalize();
let camera_up = forward.cross(right);
// The camera's basis vectors form the rows of the rotation part (the inverse of
// the camera's orientation), and the last column projects -eye onto each axis,
// moving the camera to the origin.
Mat4::from_cols(
Vec4::new(right.x, camera_up.x, forward.x, 0.0),
Vec4::new(right.y, camera_up.y, forward.y, 0.0),
Vec4::new(right.z, camera_up.z, forward.z, 0.0),
Vec4::new(-right.dot(eye), -camera_up.dot(eye), -forward.dot(eye), 1.0),
)
}
fn build_perspective_matrix(fov_degrees: f32, aspect: f32, near: f32, far: f32) -> Mat4 {
let fov_rad = fov_degrees.to_radians();
let f = 1.0 / (fov_rad / 2.0).tan();
let range = 1.0 / (near - far);
// Classic OpenGL-style matrix, mapping depth to [-1, 1]. wgpu (and Bevy) expect
// depth in [0, 1] and Bevy uses reverse-Z, but since this matrix is only
// visualized by the shader, the textbook form keeps the math recognizable.
Mat4::from_cols(
Vec4::new(f / aspect, 0.0, 0.0, 0.0),
Vec4::new(0.0, f, 0.0, 0.0),
Vec4::new(0.0, 0.0, (near + far) * range, -1.0),
Vec4::new(0.0, 0.0, 2.0 * near * far * range, 0.0),
)
}
fn build_orthographic_matrix(size: f32, aspect: f32, near: f32, far: f32) -> Mat4 {
// Match the Bevy projection above: `size` is the half-height of the view volume,
// so the visible height is 2 * size (as in ScalingMode::FixedVertical).
let height = size * 2.0;
let width = height * aspect;
let depth = far - near;
// OpenGL-style depth mapping to [-1, 1]; note that w stays 1.0, so the
// perspective divide is a no-op for orthographic projection.
Mat4::from_cols(
Vec4::new(2.0 / width, 0.0, 0.0, 0.0),
Vec4::new(0.0, 2.0 / height, 0.0, 0.0),
Vec4::new(0.0, 0.0, -2.0 / depth, 0.0),
Vec4::new(0.0, 0.0, -(far + near) / depth, 1.0),
)
}
fn handle_input(
keyboard: Res<ButtonInput<KeyCode>>,
mut params: ResMut<CameraParams>,
time: Res<Time>,
) {
let delta = time.delta_secs();
// Switch projection type
if keyboard.just_pressed(KeyCode::Space) {
params.projection_type = (params.projection_type + 1) % 2;
}
// Camera rotation (around target)
let rotation_speed = 2.0 * delta;
if keyboard.pressed(KeyCode::ArrowLeft) {
params.angle -= rotation_speed;
}
if keyboard.pressed(KeyCode::ArrowRight) {
params.angle += rotation_speed;
}
// Camera height
let height_speed = 5.0 * delta;
if keyboard.pressed(KeyCode::ArrowUp) {
params.height = (params.height + height_speed).min(20.0);
}
if keyboard.pressed(KeyCode::ArrowDown) {
params.height = (params.height - height_speed).max(1.0);
}
// Camera distance
let distance_speed = 5.0 * delta;
if keyboard.pressed(KeyCode::Equal) {
params.distance = (params.distance - distance_speed).max(3.0);
}
if keyboard.pressed(KeyCode::Minus) {
params.distance = (params.distance + distance_speed).min(50.0);
}
// FOV/ortho size adjustment
if keyboard.pressed(KeyCode::KeyQ) {
params.fov_degrees = (params.fov_degrees - 30.0 * delta).max(10.0);
params.ortho_size = (params.ortho_size - 5.0 * delta).max(1.0);
}
if keyboard.pressed(KeyCode::KeyE) {
params.fov_degrees = (params.fov_degrees + 30.0 * delta).min(120.0);
params.ortho_size = (params.ortho_size + 5.0 * delta).min(50.0);
}
}
fn update_ui(params: Res<CameraParams>, mut text_query: Query<&mut Text>) {
if !params.is_changed() {
return;
}
for mut text in text_query.iter_mut() {
let proj_name = match params.projection_type {
0 => "Perspective".to_string(),
1 => "Orthographic".to_string(),
_ => "Unknown".to_string(),
};
let zoom_info = match params.projection_type {
0 => format!("FOV: {:.0}deg", params.fov_degrees),
1 => format!("Size: {:.1}", params.ortho_size),
_ => String::new(),
};
**text = format!(
"[SPACE] Toggle Perspective (blue) / Orthographic (green)\n\
[Arrow L/R] Orbit | [Arrow U/D] Height | [=/-] Distance\n\
[Q/E] FOV / Ortho Size\n\
Projection: {}\n\
{} | Distance: {:.1}\n\
Angle: {:.0}deg | Height: {:.1}",
proj_name,
zoom_info,
params.distance,
params.angle.to_degrees(),
params.height
);
}
}
Don't forget to add it to src/demos/mod.rs:
```rust
// ... other demos
pub mod d02_02_multi_projection;
```
And register it in src/main.rs:
```rust
Demo {
    number: "2.2",
    title: "Camera and Projection Matrices",
    run: demos::d02_02_multi_projection::run,
},
```
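Before running the full demo, the look-at math in `build_view_matrix` can be sanity-checked in isolation. The sketch below is a standalone illustration (plain `[f32; 3]` arrays, no Bevy or glam, using the same `eye - target` forward convention) that confirms two defining properties of any view matrix: the eye maps to the view-space origin, and the target lands straight ahead on the -Z axis:

```rust
// Sanity checks for the look-at construction, using plain arrays so the
// sketch runs without external crates.

fn sub(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}
fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}
fn cross(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}
fn normalize(a: [f32; 3]) -> [f32; 3] {
    let len = dot(a, a).sqrt();
    [a[0] / len, a[1] / len, a[2] / len]
}

/// Applies the view transform implied by the look-at basis:
/// project (p - eye) onto the camera's right / up / forward axes.
fn world_to_view(eye: [f32; 3], target: [f32; 3], up: [f32; 3], p: [f32; 3]) -> [f32; 3] {
    let forward = normalize(sub(eye, target)); // camera +Z, points back toward the eye
    let right = normalize(cross(up, forward));
    let camera_up = cross(forward, right);
    let rel = sub(p, eye); // dot(axis, p) - dot(axis, eye), folded into one subtraction
    [dot(right, rel), dot(camera_up, rel), dot(forward, rel)]
}

fn main() {
    let (eye, target, up) = ([0.0, 5.0, 10.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]);

    // Property 1: the eye ends up at the view-space origin.
    let e = world_to_view(eye, target, up, eye);
    assert!(e.iter().all(|c| c.abs() < 1e-5));

    // Property 2: the target sits straight ahead on the -Z axis,
    // at distance |eye - target| = sqrt(5^2 + 10^2).
    let t = world_to_view(eye, target, up, target);
    assert!(t[0].abs() < 1e-4 && t[1].abs() < 1e-4);
    assert!((t[2] + 125.0f32.sqrt()).abs() < 1e-3);
}
```

If either assertion fails after an edit to the basis-vector math, the rotation rows or the translation column are no longer the inverse of the camera's world transform.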
Running the Demo
When you run the application, you will be greeted by a grid of cubes. You can use the keyboard to manipulate the camera and projection in real-time.
Controls
| Key(s) | Action |
| --- | --- |
| Space | Toggle between Perspective and Orthographic modes |
| Arrow Left / Right | Orbit the camera around the center of the scene |
| Arrow Up / Down | Raise / Lower the camera |
| = / - | Move the camera closer or farther away |
| Q / E | Decrease / Increase FOV or Orthographic Size |
What You're Seeing


| Mode | Description |
| --- | --- |
| Perspective | (Blue objects) Standard perspective. Notice how the parallel lines of the grid appear to converge at a vanishing point in the distance. Objects farther away are smaller. |
| Orthographic | (Green objects) All cubes appear the same size, no matter their distance. Parallel lines remain perfectly parallel. The scene looks flat, like a technical diagram. |
FOV/Size: Experiment with Q and E in each mode. In perspective mode, you are changing the "zoom" of the lens. In orthographic mode, you are changing the size of the visible rectangular area.
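The "zoom" behavior falls straight out of the projection math: the matrix's focal scale is `1 / tan(fov / 2)` (the `f` term in the perspective matrix), so narrowing the FOV magnifies everything on screen. A tiny sketch of that relationship:

```rust
// Focal scale of the perspective projection as a function of vertical FOV.
fn focal_scale(fov_degrees: f32) -> f32 {
    1.0 / (fov_degrees.to_radians() / 2.0).tan()
}

fn main() {
    // At 90 degrees, tan(45°) = 1: objects appear at "natural" scale.
    assert!((focal_scale(90.0) - 1.0).abs() < 1e-5);
    // Halving tan(fov/2) doubles on-screen size: ~53.13° gives tan ≈ 0.5.
    assert!((focal_scale(53.13) - 2.0).abs() < 1e-2);
    // A wide 120° lens shrinks everything instead.
    assert!(focal_scale(120.0) < 0.6);
}
```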
Key Takeaways
This article demystified the "black box" of camera transformations. Before moving on, ensure you have a solid grasp of these core concepts:
The Full MVP Pipeline: You now understand the complete vertex transformation journey: Model → World → View → Clip. You can implement the full `projection * view * model * position` multiplication chain in a shader to gain complete control over where a vertex appears on screen.
View Matrix: Its purpose is to transform the entire world into a camera-centric coordinate system (View Space), where the camera is at the origin looking down the -Z axis. It is the mathematical inverse of the camera's world transformation.
Look-At Matrix: This is the most common way to construct a view matrix, using an `eye` position, a `target` point, and an `up` vector to define the camera's orientation.
Orthographic vs. Perspective: Orthographic projection preserves size and parallel lines, ideal for 2D or technical views. Perspective projection simulates depth by making distant objects appear smaller.
The Perspective Divide: The "magic" of perspective comes from the GPU automatically dividing the final `x`, `y`, and `z` coordinates by the `w` coordinate. The projection matrix is engineered to store the vertex's distance in this `w` component.
Frustum Parameters: The shape of the camera's view is defined by its Field of View (FOV), Aspect Ratio, and the Near/Far Clipping Planes.
Depth Buffer Precision: The distribution of depth buffer accuracy is non-linear. Setting the `near` plane too close is a common cause of Z-fighting artifacts. Bevy uses reverse-Z mapping by default to improve this distribution.
Bevy Integration: The safest and most robust way to get camera data into a custom `Material` is to pass it in via your own uniform, updated each frame by a Rust system, avoiding binding conflicts.
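The perspective divide can be made concrete with a few lines of arithmetic. This standalone sketch builds the same column-major matrix shape as `build_perspective_matrix` (columns stored as in `Mat4::from_cols`), multiplies a view-space point through it, then performs the divide by `w` that the GPU does automatically:

```rust
// The perspective divide, worked by hand.

fn perspective_cols(fov_degrees: f32, aspect: f32, near: f32, far: f32) -> [[f32; 4]; 4] {
    let f = 1.0 / (fov_degrees.to_radians() / 2.0).tan();
    let range = 1.0 / (near - far);
    [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (near + far) * range, -1.0],
        [0.0, 0.0, 2.0 * near * far * range, 0.0],
    ]
}

/// Multiplies M * p (M stored column-major), then divides xyz by w.
fn project(m: [[f32; 4]; 4], p: [f32; 4]) -> ([f32; 3], f32) {
    let mut clip = [0.0f32; 4];
    for col in 0..4 {
        for row in 0..4 {
            clip[row] += m[col][row] * p[col];
        }
    }
    let w = clip[3];
    ([clip[0] / w, clip[1] / w, clip[2] / w], w) // the perspective divide
}

fn main() {
    let m = perspective_cols(90.0, 1.0, 0.1, 100.0);

    // A point 10 units ahead of the camera (view space looks down -Z),
    // 5 units to the right: w ends up holding the view-space distance.
    let (near_ndc, w) = project(m, [5.0, 0.0, -10.0, 1.0]);
    assert!((w - 10.0).abs() < 1e-4);
    assert!((near_ndc[0] - 0.5).abs() < 1e-5);

    // The same lateral offset twice as far away lands at half the NDC offset:
    // that shrinking with distance is the entire illusion of perspective.
    let (far_ndc, _) = project(m, [5.0, 0.0, -20.0, 1.0]);
    assert!((far_ndc[0] - 0.25).abs() < 1e-5);
}
```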
What's Next?
You now have end-to-end control over the vertex transformation pipeline, from a model's local space all the way to the screen's clip space. You understand how to place an object in the world and how to define the camera that views it.
In the next article, we will shift our focus. Instead of just transforming the position of a vertex, we will learn how to read, interpret, and pass along other crucial pieces of data embedded in our meshes - like normals for lighting, UVs for texturing, and vertex colors for unique styling. This will unlock a whole new dimension of visual effects and prepare us to add color and texture to our custom geometry.
Next up: 2.3 - Working with Vertex Attributes
Quick Reference
Core Transformations
| Matrix | Input Space | Output Space | Primary Role | Key Insight |
| --- | --- | --- | --- | --- |
| Model | Local Space | World Space | Places and orients the object in the scene. | World = Model * Local |
| View | World Space | View Space | Moves the entire world so the camera is at the origin. | Inverse of the camera's world transform. |
| Projection | View Space | Clip Space | Flattens the scene, applying perspective or ortho rules. | Sets the vertex's distance in the W component. |
| MVP | Local Space | Clip Space | The combined transformation: Proj * View * Model. | Final position sent to the GPU. |
Projection Types
| Type | Visual Effect | W Component | Precision | Ideal Use Case |
| --- | --- | --- | --- | --- |
| Perspective | Distant objects appear smaller. Parallel lines converge. | Calculated as -Z (distance) | Non-linear, front-loaded (unless Reverse-Z). | 3D games, realistic rendering. |
| Orthographic | All objects appear same size. Parallel lines remain parallel. | Fixed at 1.0 | Linear (precision is evenly distributed). | 2D/UI, technical drawings. |
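The "Fixed at 1.0" entry is easy to verify. This sketch applies an orthographic matrix of the shape built in the demo (taking `size` as the half-height of the view volume, matching the `FixedVertical` setup): `w` stays 1, the divide becomes a no-op, and depth never shrinks anything:

```rust
// Orthographic projection leaves w = 1, so the perspective divide changes nothing.

fn ortho_project(size: f32, aspect: f32, near: f32, far: f32, p: [f32; 3]) -> ([f32; 3], f32) {
    let height = size * 2.0; // `size` as half-height of the view volume
    let width = height * aspect;
    let depth = far - near;
    // Diagonal scale plus a depth translation, matching the matrix in the demo.
    let clip = [
        2.0 / width * p[0],
        2.0 / height * p[1],
        -2.0 / depth * p[2] - (far + near) / depth,
        1.0, // w is constant: no foreshortening
    ];
    let w = clip[3];
    ([clip[0] / w, clip[1] / w, clip[2] / w], w)
}

fn main() {
    // The same lateral offset at two very different depths...
    let (near_ndc, w1) = ortho_project(10.0, 1.0, -1000.0, 1000.0, [5.0, 0.0, -2.0]);
    let (far_ndc, w2) = ortho_project(10.0, 1.0, -1000.0, 1000.0, [5.0, 0.0, -200.0]);
    // ...lands at exactly the same NDC x: parallel lines stay parallel.
    assert_eq!(near_ndc[0], far_ndc[0]);
    assert_eq!(w1, 1.0);
    assert_eq!(w2, 1.0);
}
```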