Complete WGSL Built-in Function Reference

Think of the main tutorial series as a guided tour of the most important landmarks in shader programming. We focus on the essential, powerful tools you'll use every day, like mix(), dot(), and normalize().

But what happens when you're off the guided path and have a specific question? "What's the best way to manipulate individual bits?" "Is there a faster way to calculate a square root?" "How do I read a single texel from a specific mipmap level?"

This appendix is your comprehensive encyclopedia for those moments. It's a categorized, practical reference to the entire WGSL built-in function library, designed to give you the right answer quickly without forcing you to parse the dense official WGSL specification.

Each function is documented with a consistent format for easy scanning:

  • Signature: function_name(parameter: type) -> return_type

  • Description: A concise explanation of what the function does.

  • Use Case: A short code snippet demonstrating a practical application.

Alphabetical Function Index

For quick lookups, here is a complete list of all built-in WGSL functions in alphabetical order. Click on any function name to jump directly to its detailed entry.

abs acos acosh all any arrayLength asin asinh atan atan2 atanh atomicAdd atomicCompareExchangeWeak atomicExchange atomicLoad atomicStore

bitcast

ceil clamp cos cosh countLeadingZeros countOneBits countTrailingZeros cross

degrees determinant distance dot dpdx dpdxCoarse dpdxFine dpdy dpdyCoarse dpdyFine

exp exp2 extractBits

faceForward firstLeadingBit firstTrailingBit floor fma fract frexp fwidth fwidthCoarse fwidthFine

insertBits inverseSqrt

ldexp length log log2

max min mix modf

normalize

pack2x16float pack2x16snorm pack2x16unorm pack4x8snorm pack4x8unorm pow

quantizeToF16

radians reflect refract reverseBits round

saturate select sign sin sinh smoothstep sqrt step storageBarrier

tan tanh textureDimensions textureGather textureGatherCompare textureLoad textureNumLayers textureNumLevels textureNumSamples textureSample textureSampleBaseClampToEdge textureSampleBias textureSampleCompare textureSampleCompareLevel textureSampleGrad textureSampleLevel textureStore transpose trunc

unpack2x16float unpack2x16snorm unpack2x16unorm unpack4x8snorm unpack4x8unorm

workgroupBarrier workgroupUniformLoad

Index by Category

To discover functions based on their purpose, browse the categories below.

Scalar & Common Mathematical Functions

This is the largest and most fundamental category of built-in functions in WGSL. These are the mathematical workhorses that form the building blocks of almost every shader effect.

While primarily designed to operate on single scalar values (like f32 or i32), these functions have the powerful property of automatically working component-wise when given a vector input. This means you can apply the same operation to all elements of a vector in a single, concise call.

The toolkit provided by this category is vast. It gives you the power to create smooth, repeating motion for animation through trigonometric operations. You can control and shape numerical values by manipulating their sign or fractional parts, which is the key to generating procedural, tiled patterns. Furthermore, you can implement non-linear curves and falloffs for realistic lighting and effects using functions for powers, roots, and exponents. This is the essential mathematical library you will reach for in nearly every shader you write.
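
As a minimal sketch of the component-wise behavior described above, the following compares one vector call with three scalar calls. The function name componentwise_demo and the input values are made up for illustration.

```wgsl
// Hypothetical helper, for illustration only.
fn componentwise_demo() -> vec3<f32> {
    let v = vec3<f32>(-1.5, 0.25, -3.0);

    // One call on the whole vector...
    let all_at_once = abs(v);

    // ...matches calling the scalar overload on each component.
    let one_by_one = vec3<f32>(abs(v.x), abs(v.y), abs(v.z));

    // Both are vec3<f32>(1.5, 0.25, 3.0), so the difference is all zeros.
    return all_at_once - one_by_one;
}
```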

abs

Signature

abs(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, i32, u32, or f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the absolute value of e, making negative values positive. When e is a vector, the function is applied to each component individually.

  • For floating-point types (f32): This effectively removes the negative sign. abs(-5.0) is 5.0.

  • For unsigned integer types (u32): The function has no effect and simply returns e.

  • For signed integer types (i32): There is a specific edge case. The largest negative number (-2,147,483,648) does not have a positive equivalent within the i32 range. In this specific case, abs() will return the original negative number. This is a rare overflow condition to be aware of.

W3C Specification: abs

Use Case

A common use is to create a "bounce" or "mirror" effect from an oscillating function like sin().

// Create a "bounce" animation instead of a standard wave.
let time: f32 = ...;
let wave = sin(time);   // Oscillates between -1.0 and 1.0
let bounce = abs(wave); // Now oscillates between 0.0 and 1.0

// This is useful for effects that should grow and shrink but never go negative.
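
The i32 edge case described above matters when a guaranteed non-negative magnitude is needed. One possible workaround, sketched here with a hypothetical helper named magnitude, is to reinterpret the result as a u32.

```wgsl
// Hypothetical helper: the magnitude of any i32, returned as a u32.
fn magnitude(x: i32) -> u32 {
    // For every input except the most negative i32, abs(x) is already
    // non-negative, so the bitcast leaves the value unchanged.
    // For the most negative i32, abs(x) returns the input unchanged, and
    // reinterpreting those bits as u32 yields 2147483648u.
    return bitcast<u32>(abs(x));
}
```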

acos

Signature

acos(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the arc cosine of e. This is the inverse of the cos function; it takes a value and returns the angle in radians whose cosine is that value.

The input e must be in the range [-1.0, 1.0]. Providing a value outside this range will result in implementation-defined behavior, which is often NaN (Not a Number). To avoid this, it is highly recommended to clamp() the input value before passing it to acos().

The result is returned in radians, within the range [0, π].

W3C Specification: acos

Use Case

The primary use case for acos is to find the angle between two normalized (unit-length) vectors.

let vector_a = normalize(vec3<f32>(1.0, 0.0, 0.0));
let vector_b = normalize(vec3<f32>(0.5, 0.866, 0.0)); // Rotated 60 degrees

// The dot product of two unit vectors gives the cosine of the angle between them.
let dot_product = dot(vector_a, vector_b); // Result is ~0.5

// Due to floating-point inaccuracies, the dot product might be slightly
// outside [-1, 1]. Clamping it is a robust best practice.
let clamped_dot = clamp(dot_product, -1.0, 1.0);

// `acos` gives us the actual angle in radians.
let angle_radians = acos(clamped_dot); // Result is ~1.047, which is PI / 3 (60 degrees)

acosh

Signature

acosh(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the arc hyperbolic cosine of e, which is the inverse of the cosh function.

The input e must be greater than or equal to 1.0. Providing a value less than 1.0 will result in implementation-defined behavior, which is often NaN (Not a Number).

The result is returned in the range [0, +infinity).

W3C Specification: acosh

Use Case

This is a specialized mathematical function and is less common in general visual effects than its trigonometric counterpart, acos. It is used in scenarios involving hyperbolic geometry or solving equations that use the cosh function, such as calculations related to catenary curves (the shape of a hanging chain).

// `acosh` is the inverse of `cosh`.
let original_param = 2.0;
let cosh_result = cosh(original_param); // cosh(2.0) is approx 3.76

// To reverse the operation, we use acosh.
// The input is >= 1.0, so this is a valid operation.
let recovered_param = acosh(cosh_result); // Result is ~2.0

asin

Signature

asin(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the arc sine of e, which is the inverse of the sin function. It takes a value e and returns the angle in radians whose sine is e.

The input e must be in the range [-1.0, 1.0]. Providing a value outside this range will result in implementation-defined behavior (often NaN). To ensure correctness, it is best practice to clamp() the input before passing it to asin().

The result is returned in radians, within the range [-π/2, π/2].

W3C Specification: asin

Use Case

While less common than acos for 3D vector math, asin is useful when you have a ratio that represents the sine of an angle and need to recover the angle itself.

// Imagine you have the y-coordinate of a point on a unit circle and need its angle.
let y_coord = 0.707; // sin(45 degrees) or sin(PI / 4)

// Clamp the input for safety, even though we know it's valid here.
let clamped_y = clamp(y_coord, -1.0, 1.0);

// `asin` gives us the angle in radians.
let angle_radians = asin(clamped_y); // Result is ~0.785, which is PI / 4 (45 degrees)

asinh

Signature

asinh(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the arc hyperbolic sine of e, which is the inverse of the sinh function. Unlike its trigonometric counterpart asin, asinh is defined for all real numbers, so no input clamping is necessary.

The result is returned in the range (-infinity, +infinity).

W3C Specification: asinh

Use Case

This is a specialized mathematical function used in scenarios involving hyperbolic geometry or for reversing a sinh calculation. It is not commonly used in general visual effects programming.

// `asinh` is the inverse of `sinh`.
let original_param = 1.5;
let sinh_result = sinh(original_param); // sinh(1.5) is approx 2.129

// The `asinh` function can recover the original parameter.
let recovered_param = asinh(sinh_result); // Result is ~1.5

atan

Signature

atan(e: T) -> T

  • e: The input value or vector, typically representing a slope (y/x).

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the arc tangent of e, which is the inverse of the tan function. It takes a value e (representing a slope) and returns the angle in radians that produces it.

The result is returned in radians, within the range [-π/2, π/2] (from -90° to +90°).

Because the input represents a single ratio, atan cannot distinguish between angles in opposite quadrants. For example, the slope for a vector of (1, 1) is 1.0, and the slope for (-1, -1) is also 1.0. atan(1.0) will return the same angle for both. For most 2D angle calculations where you have the separate x and y components, the atan2 function is strongly preferred as it correctly handles all four quadrants.

W3C Specification: atan

Use Case

Used when you only have a slope value and need to find the corresponding angle within a limited 180-degree range.

// A slope of 1.0 corresponds to a 45-degree angle.
let slope = 1.0;
let angle_radians = atan(slope); // Result is ~0.785, which is PI / 4 (45 degrees)

// A negative slope.
let negative_slope = -1.0;
let neg_angle_rads = atan(negative_slope); // Result is ~-0.785 (-45 degrees)

atan2

Signature

atan2(y: T, x: T) -> T

  • y: The y-component or numerator.

  • x: The x-component or denominator.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the arc tangent of the quotient y / x, returning an angle in radians.

Unlike atan, which only takes a single ratio, atan2 is a much more powerful function that uses the signs of both y and x to correctly determine the angle in all four quadrants of a 2D coordinate system. It avoids the division-by-zero errors that would occur with atan(y/x) when x is zero.

The result is returned in radians, within the full circular range [-π, π] (from -180° to +180°). The function is ill-defined at the origin, where both x and y are zero.

W3C Specification: atan2

Use Case

atan2 is the primary and correct tool for converting 2D Cartesian coordinates (x, y) into a polar angle. This is fundamental for creating circular or radial patterns.

// Center the UV coordinates so (0,0) is at the middle of the screen.
let centered_uv = uv - 0.5;

// `atan2` calculates the angle of the current pixel relative to the center.
// The result is a full 360-degree angle.
let angle = atan2(centered_uv.y, centered_uv.x);

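// To visualize, we can map the angle from [-PI, PI] to a [0, 1] range
// and use it as a color hue, creating a color wheel.
// (PI is assumed to be a user-defined constant, e.g.
// const PI: f32 = 3.14159265; WGSL does not provide one.)
let hue = angle / (2.0 * PI) + 0.5;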
let color = hsv_to_rgb(hue, 1.0, 1.0); // (using a helper function)

// Example of quadrant awareness:
let angle_top_right    = atan2( 1.0,  1.0); //  PI / 4   ( 45 degrees)
let angle_bottom_left  = atan2(-1.0, -1.0); // -3*PI / 4  (-135 degrees)
// `atan()` would give the same result for both, as the slope is 1.0 in both cases.

atanh

Signature

atanh(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the arc hyperbolic tangent of e, which is the inverse of the tanh function.

The input e must be in the range (-1.0, 1.0). Providing a value where abs(e) >= 1.0 will result in implementation-defined behavior (often NaN or infinity). To ensure correctness, it is best practice to clamp() the input to be strictly inside this range.

The result is returned in the range (-infinity, +infinity).

W3C Specification: atanh

Use Case

This is a specialized mathematical function not commonly used in general visual effects. It is primarily used in scenarios involving hyperbolic geometry or for reversing a tanh calculation.

// `atanh` is the inverse of `tanh`.
let original_param = 0.5;
let tanh_result = tanh(original_param); // tanh(0.5) is approx 0.462

// To reverse the operation, we use atanh.
// The input is between -1.0 and 1.0, so this is a valid operation.
let recovered_param = atanh(tanh_result); // Result is ~0.5
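
The clamping advice in the description could look like the sketch below. The helper name safe_atanh and the limit 0.999999 are assumptions chosen for illustration; pick a limit that suits your precision needs.

```wgsl
// Hypothetical helper: keeps the input strictly inside (-1.0, 1.0).
fn safe_atanh(x: f32) -> f32 {
    // Illustrative limit just inside the valid domain (an assumption).
    let limit = 0.999999;
    return atanh(clamp(x, -limit, limit));
}
```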

ceil

Signature

ceil(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the "ceiling" of e, which is the smallest whole number that is greater than or equal to e. In simpler terms, it always rounds up to the next integer.

  • ceil(3.2) returns 4.0

  • ceil(3.0) returns 3.0

  • ceil(-3.2) returns -3.0 (since -3 is greater than -3.2)

W3C Specification: ceil

Use Case

ceil is useful for any operation where you need to snap a value up to the next discrete step or grid line.

// Imagine you need to calculate how many inventory slots an item takes up.
// An item might take up a fractional number of slots (e.g., 2.3).
let item_size = 2.3;

// You can't have a fraction of a slot, so you must round up
// to ensure enough space is allocated.
let slots_needed = ceil(item_size); // Result is 3.0

// This guarantees that even a size of 2.001 will correctly allocate 3 slots.

cos

Signature

cos(e: T) -> T

  • e: The input angle or vector of angles, in radians.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the cosine of e, where e is an angle specified in radians. The result is a smooth, continuous wave that oscillates between -1.0 and 1.0.

cos is very similar to the sin function, but is "phase-shifted" by 90 degrees (π/2 radians). This means that cos(0.0) returns 1.0, while sin(0.0) returns 0.0.

W3C Specification: cos

Use Case

cos is fundamental to creating oscillations, waves, and any form of circular motion. It is most famously paired with sin to describe a point on a circle.

let time = ...; // Animate over time
let angle_in_radians = time * 2.0;
let radius = 5.0;

// `cos` defines the x-coordinate of a point on a circle.
let x = cos(angle_in_radians) * radius;

// `sin` defines the y-coordinate.
let y = sin(angle_in_radians) * radius;

// The resulting point (x, y) will orbit the origin in a perfect circle.
let circular_position = vec2<f32>(x, y);

cosh

Signature

cosh(e: T) -> T

  • e: The input value or vector (the hyperbolic angle).

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the hyperbolic cosine of e. This function describes the shape of a catenary curve (the shape of a hanging chain or rope). Mathematically, it is defined as (exp(e) + exp(-e)) / 2.

The function accepts any real number as input. The output is always greater than or equal to 1.0, with cosh(0.0) returning 1.0. The function is symmetrical around the y-axis and grows exponentially as e moves away from zero.

W3C Specification: cosh

Use Case

cosh is a specialized mathematical function primarily used to model physical phenomena like hanging cables or in advanced geometric calculations. It is rarely used in typical visual effects.

// The cosh function forms a catenary curve.
// At the lowest point (the center of the "chain"), the value is 1.0.
let lowest_point = cosh(0.0); // Returns 1.0

// The value increases exponentially as the input moves away from zero.
let point_on_curve = cosh(2.0); // Result is approx 3.76

// The function is symmetrical.
let other_point = cosh(-2.0); // Also approx 3.76
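
The definition given in the description can be written out by hand, which also makes the catenary connection concrete. The helper names below are hypothetical, and the catenary form y = a * cosh(x / a) is the standard textbook equation rather than something taken from this reference.

```wgsl
// Hypothetical helper: cosh written out from its definition.
// Prefer the built-in cosh() in real code.
fn cosh_manual(x: f32) -> f32 {
    return (exp(x) + exp(-x)) / 2.0;
}

// Hypothetical helper: height of a hanging chain at horizontal offset x.
// `a` is a positive scale parameter; the lowest point (x = 0.0) has height a.
fn catenary_height(x: f32, a: f32) -> f32 {
    return a * cosh(x / a);
}
```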

degrees

Signature

degrees(e: T) -> T

  • e: The input angle or vector of angles, in radians.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Converts a value from radians to degrees. It is the inverse of the radians function.

The conversion is performed by approximating the formula e * 180 / π. Since all trigonometric functions in WGSL (sin, cos, etc.) operate in radians, this function is typically used only for converting values for debugging or for interfacing with systems that expect degrees.

W3C Specification: degrees

Use Case

Useful for converting an angle to a more human-readable format for display or debugging purposes.

// WGSL has no built-in PI constant, so define one yourself.
const PI: f32 = 3.14159265;
let angle_rad = PI;

// Convert PI radians to degrees.
let angle_deg = degrees(angle_rad); // Result is 180.0

// The function also works component-wise on vectors.
let angles_rad_vec = vec2<f32>(PI, PI / 2.0);
let angles_deg_vec = degrees(angles_rad_vec); // Result is vec2<f32>(180.0, 90.0)
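
The conversion formula from the description can also be written out by hand. This is only a sketch to show what degrees() computes; the helper name degrees_manual is hypothetical, and the PI constant is declared locally so the snippet is self-contained.

```wgsl
// Declared here so the sketch is self-contained.
const PI: f32 = 3.14159265;

// Hypothetical helper: the formula behind degrees(), written out by hand.
// Prefer the built-in degrees() in real code.
fn degrees_manual(angle_rad: f32) -> f32 {
    return angle_rad * 180.0 / PI;
}
```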

exp

Signature

exp(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the natural exponentiation of e. This is equivalent to raising Euler's number (the mathematical constant ≈ 2.71828, not to be confused with the parameter name e) to the power of the input e.

This function models exponential growth and is the inverse of the log() (natural logarithm) function. As the input e increases linearly, the output grows at an ever-increasing rate.

W3C Specification: exp

Use Case

exp is useful for modeling physical phenomena that follow an exponential curve, such as atmospheric density (fog) or certain types of light attenuation.

// Create an exponential falloff based on distance.
let dist = length(some_position);

// A negative input to `exp` creates exponential decay.
// As `dist` increases, `visibility` rapidly approaches 0 (fully fogged).
let visibility = exp(-dist * 0.5);

// This gives a more natural-looking fog falloff than a linear ramp:
// visibility drops quickly close to the viewer, then levels off and
// approaches zero gradually instead of cutting off abruptly.
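
One way the exponential factor might be applied is to blend the shaded color toward a fog color. The sketch below assumes hypothetical names (apply_fog, scene_color, fog_color, density); none of them come from the tutorial series.

```wgsl
// Hypothetical helper: blends a surface color toward a fog color.
fn apply_fog(
    scene_color: vec3<f32>,
    fog_color: vec3<f32>,
    dist: f32,
    density: f32,
) -> vec3<f32> {
    // 1.0 at the camera, decaying toward 0.0 as distance grows.
    let visibility = exp(-dist * density);

    // visibility = 1.0 keeps the scene color, 0.0 gives pure fog.
    return mix(fog_color, scene_color, visibility);
}
```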

exp2

Signature

exp2(e: T) -> T

  • e: The input value or vector, which acts as the exponent.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates 2 raised to the power of e. This is equivalent to pow(2.0, e).

This function models base-2 exponential growth and is the inverse of the log2() function. It is often more convenient or slightly more performant than the general pow() function when working with powers of two, which is common in computer graphics.

W3C Specification: exp2

Use Case

exp2 is very useful when working with systems that are based on powers of two, such as octaves in procedural noise or mipmap levels.

// A common pattern in procedural noise generation (like Perlin or Simplex noise)
// is to layer multiple "octaves" of noise, where each successive octave
// has double the frequency of the last.

var final_noise = 0.0;
for (var i = 0; i < 4; i = i + 1) { // 4 octaves of noise
    let i_f32 = f32(i);

    // exp2(i_f32) calculates the frequency for this octave:
    // i=0 -> 2^0 = 1.0
    // i=1 -> 2^1 = 2.0
    // i=2 -> 2^2 = 4.0
    // i=3 -> 2^3 = 8.0
    let frequency = exp2(i_f32);

    // ... calculate noise with this frequency ...
}

floor

Signature

floor(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the "floor" of e, which is the greatest whole number that is less than or equal to e. In simpler terms, it always rounds down to the nearest integer.

  • floor(3.7) returns 3.0

  • floor(3.0) returns 3.0

  • floor(-3.7) returns -4.0 (since -4 is less than -3.7)

W3C Specification: floor

Use Case

floor is a cornerstone of procedural pattern generation. It is used to find the integer coordinate or ID of a grid cell, which is essential for creating tiled or repeating patterns.

// Use UV coordinates that range from 0.0 to 1.0 across a surface.
let uv = in.uv;

// Scale the UVs to create a 5x5 grid.
let scaled_uv = uv * 5.0;

// `floor` gives us the integer coordinate of the grid cell we are in.
// `cell_id` will be a vec2 like (0,0), (1,0), (2,4), etc.
let cell_id = floor(scaled_uv);

// We can now use this stable integer ID to give each grid cell a unique
// property, like a pseudo-random color.
let random_value = pseudo_random(cell_id); // (using a helper hash function)
let cell_color = vec3<f32>(random_value);
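
The pseudo_random helper used above is not defined in this entry. One widely used choice is the sine-based hash sketched below. This is an assumption for illustration, not necessarily the helper the tutorial series uses, and its output quality can vary between GPUs because it depends on the precision of sin().

```wgsl
// Hypothetical implementation of the `pseudo_random` helper.
// Maps a 2D cell ID to a repeatable value in the range [0.0, 1.0].
fn pseudo_random(cell: vec2<f32>) -> f32 {
    // Fold the two coordinates into one number, scramble it with sin(),
    // then keep only the fractional part of a large multiple.
    return fract(sin(dot(cell, vec2<f32>(12.9898, 78.233))) * 43758.5453);
}
```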

fma

Signature

fma(e1: T, e2: T, e3: T) -> T

  • e1, e2, e3: The input values or vectors.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates and returns (e1 * e2) + e3. The name fma is short for "Fused Multiply-Add."

Ideally, this is a "fused" operation, meaning the multiplication e1 * e2 is performed with a higher internal precision, and only the final result of the addition is rounded to the target type T. This can produce more accurate results and avoid rounding errors compared to a separate multiply followed by an add.

However, the WGSL specification allows implementations to treat this as a standard, non-fused operation (a multiply followed by an add). Therefore, you should not rely on the higher precision of a true fused operation unless you are certain of the capabilities of your target hardware and drivers.

W3C Specification: fma

Use Case

fma is a fundamental operation in linear algebra and can be used to efficiently implement many common formulas. For example, the linear interpolation function mix(a, b, t) can be expressed using fma.

// The `mix` function performs linear interpolation:
// mix(a, b, t) is equivalent to a * (1.0 - t) + b * t

// This can be algebraically rearranged into the `fma` pattern:
// t * (b - a) + a
fn manual_mix(a: vec3<f32>, b: vec3<f32>, t: f32) -> vec3<f32> {
    // This is a direct application of the fma pattern.
    return fma(b - a, vec3<f32>(t), a);
}

let red = vec3(1.0, 0.0, 0.0);
let blue = vec3(0.0, 0.0, 1.0);

// These two lines produce the same result (a purple color).
let result_mix = mix(red, blue, 0.5);
let result_fma = manual_mix(red, blue, 0.5);

fract

Signature

fract(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the fractional part of e, which is equivalent to e - floor(e). The result is a value representing the part of the number after the decimal point.

The output is always in the range [0.0, 1.0]. For positive numbers, this is straightforward: fract(3.7) is 0.7. For negative numbers, it is 1.0 minus the fractional part: fract(-3.7) is 0.3. Due to floating-point precision, a very small negative number might result in 1.0.

Visually, fract(x) creates a repeating "sawtooth wave" that climbs linearly from 0.0 towards 1.0 and then instantly drops back to 0.0 to repeat. This behavior makes it one of the most fundamental functions for procedural pattern generation.

W3C Specification: fract

Use Case

fract is essential for creating repeating or tiled patterns. It is almost always used in combination with floor to establish a repeating local coordinate system.

// Use UV coordinates that range from 0.0 to 1.0 across a surface.
let uv = in.uv;

// Scale the UVs to create a 10x10 grid.
let scaled_uv = uv * 10.0;

// `floor` gives us the integer ID of the grid cell.
let cell_id = floor(scaled_uv);

// `fract` gives us the coordinate *inside* the current cell.
// `local_uv` will always be in the range [0,1] for x and y,
// resetting for each new cell, creating a repeating coordinate system.
let local_uv = fract(scaled_uv);

// We can now draw something in the center of every cell.
// `length(local_uv - 0.5)` calculates the distance from the cell's center.
let circle_mask = 1.0 - step(0.4, length(local_uv - 0.5));
// The result is a grid of white circles on a black background.
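
The behavior of fract on negative inputs, described earlier in this entry, is easiest to see next to a "signed" fractional part built from trunc. Both helper names below are hypothetical.

```wgsl
// Hypothetical helper: the "signed" fractional part, for comparison.
fn signed_fraction(x: f32) -> f32 {
    return x - trunc(x);
}

// Hypothetical demo: returns both results for the same negative input.
fn fract_vs_signed_fraction() -> vec2<f32> {
    let x = -3.7;

    // fract(x) is x - floor(x) = -3.7 - (-4.0), which is about 0.3.
    let wrapped = fract(x);

    // signed_fraction(x) is x - trunc(x) = -3.7 - (-3.0), which is about -0.7.
    let signed = signed_fraction(x);

    return vec2<f32>(wrapped, signed);
}
```

Because fract never goes negative, it keeps tiled patterns seamless across the origin, whereas the signed variant would mirror them.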

frexp

Decomposes a floating-point number into its two fundamental components: a significand (fractional part) and an integer exponent of 2.

Overload: Scalar Input

Applies the decomposition to a single floating-point number.

Signature

frexp(e: T) -> FrexpResult

  • e: The input scalar value. T can be f32 or f16.

  • FrexpResult: A special, built-in struct that cannot be declared by name. It must be inferred with let.

Description

The frexp function splits the input e into a significand and an exponent such that e = significand * 2^exponent. The result is returned in a struct with two members:

  • .fract: A floating-point value of the same type as e. This is the significand. For non-zero e, its absolute value is in the range [0.5, 1.0); when e is zero, it is zero.

  • .exp: A signed 32-bit integer (i32) representing the exponent.

You cannot explicitly declare a variable of the return type. Instead, you must use let to infer the type and then access its members.

W3C Specification: frexp

Overload: Vector Input

Applies the decomposition component-wise to a vector of floating-point numbers.

Signature

frexp(e: vecN<T>) -> FrexpResultVec

  • e: The input vector. T can be f32 or f16. N can be 2, 3, or 4.

  • FrexpResultVec: A special, built-in struct containing vectors.

Description

Performs the frexp operation on each component of the input vector e. The result is returned in a struct with two members:

  • .fract: A vector of the same type as e (e.g., vecN<f32>).

  • .exp: A vector of signed 32-bit integers (e.g., vecN<i32>).

W3C Specification: frexp

Use Case

frexp is a low-level function used for manipulating the binary representation of floating-point numbers. It is not common in general visual effects but is useful for certain numerical algorithms.

let my_float = 12.0;

// Decompose the float. The type of `result` is inferred.
let result = frexp(my_float);

// `result.fract` will be 0.75
// `result.exp` will be 4
// Because: 0.75 * 2^4 = 0.75 * 16 = 12.0
let significand = result.fract;
let exponent = result.exp;

let my_vec = vec2<f32>(12.0, -0.25);

// Decompose the vector.
let result_vec = frexp(my_vec);

// `result_vec.fract` will be vec2<f32>(0.75, -0.5)
// `result_vec.exp` will be vec2<i32>(4, -1)
// Because:
//   0.75 * 2^4  = 0.75 * 16  = 12.0
//   -0.5 * 2^-1 = -0.5 * 0.5 = -0.25
let significands = result_vec.fract;
let exponents = result_vec.exp;

inverseSqrt

Signature

inverseSqrt(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the inverse square root of e, which is equivalent to 1.0 / sqrt(e).

The input e must be greater than 0. Providing a non-positive value (<= 0) will result in implementation-defined behavior.

This function is often significantly faster than performing a separate sqrt and division, as many GPUs have a dedicated, hardware-accelerated instruction for this specific operation (rsqrt). It is a key function for high-performance graphics code, especially in vector normalization.

W3C Specification: inverseSqrt

Use Case

The most common use of inverseSqrt is to perform a fast vector normalization. The normalize(v) operation is mathematically v / length(v), which can be rewritten as v * (1.0 / sqrt(dot(v, v))). This is a perfect fit for inverseSqrt.

fn fast_normalize(v: vec3<f32>) -> vec3<f32> {
    // The squared length of the vector, which is v.x*v.x + v.y*v.y + v.z*v.z
    let length_sq = dot(v, v);

    // Calculate the inverse square root of the squared length.
    let inv_sqrt = inverseSqrt(length_sq);

    // Multiply the original vector by the result.
    return v * inv_sqrt;
}

// This manual implementation is often how the built-in `normalize`
// function is implemented under the hood for maximum performance.
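
Since inverseSqrt requires an input greater than 0, a zero-length vector needs special handling. The sketch below shows one possible guard; the helper name safe_normalize, the fallback parameter, and the threshold 1e-12 are all assumptions chosen for illustration.

```wgsl
// Hypothetical helper: normalizes v, or returns `fallback` when v is
// too short to normalize reliably.
fn safe_normalize(v: vec3<f32>, fallback: vec3<f32>) -> vec3<f32> {
    let length_sq = dot(v, v);

    // Illustrative threshold (an assumption); avoids calling
    // inverseSqrt with zero or a vanishingly small value.
    if (length_sq < 1e-12) {
        return fallback;
    }

    return v * inverseSqrt(length_sq);
}
```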

ldexp

Signature

ldexp(e1: T, e2: I) -> T

  • e1: The significand (fractional part) or base value.

  • e2: The integer exponent for the power of 2.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

  • I: Can be i32 or a vector of i32 (e.g., vec3<i32>).

The second parameter e2 must be a vector if and only if the first parameter e1 is a vector, and they must have the same number of components.

Description

Constructs a floating-point number by calculating e1 multiplied by 2 raised to the power of e2 (i.e., e1 * 2^e2). The name ldexp is a mnemonic for "load exponent."

This function is the inverse of frexp; it takes a significand and an exponent and reconstructs the original number. For any finite number x, the following is true: x == ldexp(frexp(x).fract, frexp(x).exp).

W3C Specification: ldexp

Exponent Constraints

The maximum value for the exponent e2 is tied to the internal representation of the floating-point type T, which is defined by an "exponent bias".

  • For f32, the bias is 127.

  • For f16, the bias is 15.

  • For AbstractFloat, the bias is 1023. AbstractFloat is the high-precision type used for floating-point literals (e.g., 1.0, 3.14159) during shader compilation, before they are assigned to a concrete type like f32. This high bias reflects the greater precision used for compile-time constant evaluation.

The operation has the following constraints:

  • If e2 + bias is less than or equal to 0, the result may be zero (underflow).

  • If e2 is greater than bias + 1, the behavior depends on when the value is known:

    • If e2 is a compile-time constant, it is a shader-creation error.

    • Otherwise, the result is an indeterminate value at runtime (e.g., infinity or the maximum float value).

Use Case

ldexp is primarily used to reconstruct a floating-point number after its components have been manipulated using frexp. This allows for efficient multiplication or division by powers of two by directly modifying the exponent.

let original_value = 6.0;

// Decompose the value into its parts.
let parts = frexp(original_value); // -> .fract is 0.75, .exp is 3

// To multiply the original value by 4 (which is 2^2),
// we can simply add 2 to the exponent instead of performing a full multiplication.
let new_exponent = parts.exp + 2; // 3 + 2 = 5

// Reconstruct the number using the original significand and the new exponent.
// ldexp(0.75, 5) calculates 0.75 * 2^5 = 0.75 * 32 = 24.0
let final_value = ldexp(parts.fract, new_exponent); // Result is 24.0, which is 6.0 * 4

log

Signature

log(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the natural logarithm of the input e, that is, the logarithm to the base of Euler's number (the mathematical constant ≈ 2.71828, not to be confused with the parameter name e). This function is the inverse of the exp() function.

The input e must be greater than 0. Providing a non-positive value (<= 0) will result in implementation-defined behavior.

Logarithms are useful for compressing a wide range of values into a much smaller one. As the input e grows exponentially, the output of log(e) grows only linearly.

W3C Specification: log

Use Case

log is often used to handle values that span several orders of magnitude, like light intensity in HDR (High Dynamic Range) rendering, by converting them to a more manageable scale for processing or display.

// Imagine `hdr_brightness` can range from 1.0 (normal) to 10000.0 (very bright).
let hdr_brightness = 10000.0;

// `log(1.0)` is 0.0
// `log(10000.0)` is approx 9.21
// The `log` function has compressed the huge [1.0, 10000.0] range
// into a much smaller [0.0, 9.21] range.
let compressed_brightness = log(hdr_brightness);

// This smaller range is now much easier to map to a standard [0.0, 1.0]
// displayable brightness for tone mapping.
let display_brightness = compressed_brightness / 10.0; // A simple tone mapping

log2

Signature

log2(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the base-2 logarithm of e. This function finds the power to which the number 2 must be raised to produce the value e. It is the inverse of the exp2() function.

The input e must be greater than 0. Providing a non-positive value (<= 0) will result in implementation-defined behavior.

W3C Specification: log2

Use Case

log2 is particularly useful in computer graphics and programming for problems that involve powers of two, such as calculating mipmap levels or determining the number of bits required for a value.

// log2 answers the question: "2 to the power of what equals this number?"

let val_a = 16.0;
// 2 to the power of 4 is 16, so log2(16.0) is 4.0.
let power_a = log2(val_a); // Result is 4.0

let val_b = 0.25;
// 2 to the power of -2 is 1/(2^2) = 1/4 = 0.25, so log2(0.25) is -2.0.
let power_b = log2(val_b); // Result is -2.0
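
The mipmap case mentioned above can be sketched as follows. This is an illustrative helper, not part of WGSL: mip_level_count is a hypothetical name, and it assumes both dimensions are at least 1.

```wgsl
// Hypothetical helper: number of levels in a full mip chain for a texture
// of the given size in texels. Assumes width >= 1 and height >= 1.
fn mip_level_count(width: u32, height: u32) -> u32 {
    let largest = f32(max(width, height));
    // Each mip level halves the size, so the chain has floor(log2(size)) + 1 levels.
    // Mathematically, log2(256.0) is 8.0, which gives 9 levels for a 256x128 texture.
    return u32(floor(log2(largest))) + 1u;
}
```

Since log2 is evaluated in floating point, an integer formulation such as 32u - countLeadingZeros(max(width, height)) may be the safer choice when the result must be exact.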

modf

Decomposes a floating-point number into its whole and fractional parts.

W3C Specification: modf

Overload: Scalar Input

Applies the decomposition to a single floating-point number.

Signature

modf(e: T) -> ModfResult

  • e: The input scalar value.

  • T: Can be AbstractFloat, f32, or f16.

  • ModfResult: A special, built-in struct that cannot be declared by name. It must be inferred with let.

Description

The modf function splits the input e into its integer and fractional components. The result is returned in a struct with two members, both of which have the same type as the input e:

  • .fract: The fractional part of e.

  • .whole: The whole number (integer) part of e.

Both the fractional and whole parts will have the same sign as the original input e. Note that the whole number part is returned as a floating-point type, not an integer type.

Overload: Vector Input

Applies the decomposition component-wise to a vector of floating-point numbers.

Signature

modf(e: vecN<T>) -> ModfResultVec

  • e: The input vector.

  • T: Can be f32, or f16. N can be 2, 3, or 4.

  • ModfResultVec: A special, built-in struct containing vectors that cannot be declared by name. It must be inferred with let.

Description

Performs the modf operation on each component of the input vector e. The result is returned in a struct with two members:

  • .fract: A vector of the same type as e (e.g., vecN<f32>).

  • .whole: A vector of the same type as e (e.g., vecN<f32>).

Use Case

modf is useful when you need to operate on both the integer and fractional parts of a number separately. This is distinct from using floor and fract, especially for negative numbers.

let my_float = 3.7;
let result_pos = modf(my_float);
// result_pos.fract is 0.7
// result_pos.whole is 3.0

let my_neg_float = -3.7;
let result_neg = modf(my_neg_float);
// result_neg.fract is -0.7
// result_neg.whole is -3.0

// Compare this to the standard `fract` function:
// fract(-3.7) would return 0.3

let my_vec = vec2<f32>(3.7, -2.5);
let result_vec = modf(my_vec);

// result_vec.fract is vec2<f32>(0.7, -0.5)
// result_vec.whole is vec2<f32>(3.0, -2.0)

pow

Signature

pow(e1: T, e2: T) -> T

  • e1: The base value or vector.

  • e2: The exponent value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates e1 raised to the power of e2.

This is a fundamental function for creating non-linear curves and falloffs. The result can be sensitive to the input values:

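  • If the base e1 is negative, the result is not reliable, even when the exponent e2 is a whole number. Implementations are allowed to compute pow as exp2(e2 * log2(e1)), and the logarithm of a negative number is not mathematically meaningful, so the result is often NaN. To raise a possibly negative value to an integer power, multiply it out explicitly (e.g., x * x * x) instead.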

  • pow(0.0, 0.0) is implementation-defined.

  • Raising 0.0 to a negative exponent will result in infinity.

W3C Specification: pow
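
When a power curve has to be applied to a value that may be negative, one common workaround is to apply pow to the magnitude and restore the sign afterwards. The sketch below assumes that a sign-preserving curve is what you want; signed_pow is a hypothetical helper name, not a built-in.

```wgsl
// Hypothetical helper: applies a power curve to the magnitude of x and then
// restores the sign, so pow never receives a negative base.
// Note: this is NOT the same as the mathematical x^exponent for even integer
// exponents, where the true result would be positive.
fn signed_pow(x: f32, exponent: f32) -> f32 {
    return sign(x) * pow(abs(x), exponent);
}

// signed_pow(-0.25, 0.5) is approximately -0.5
// signed_pow( 0.25, 0.5) is approximately  0.5
```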

Use Case

pow is essential for controlling the sharpness of specular highlights in lighting models and for gamma correction.

// Use Case 1: Specular Highlight
// `dot_product` is the alignment between view and reflection, typically in [0, 1].
let dot_product = max(0.0, dot(view_dir, reflect_dir));
let shininess = 32.0;

// Raising the value to a high power makes the highlight much sharper.
// Values close to 1.0 stay high, while values less than 1.0 fall off very quickly.
let specular_highlight = pow(dot_product, shininess);

// Use Case 2: Gamma Correction
let linear_color = vec3<f32>(0.2, 0.5, 0.9);
let gamma = 2.2;

// Convert from linear color space to sRGB color space for display.
let srgb_color = pow(linear_color, vec3<f32>(1.0 / gamma));

quantizeToF16

Signature

quantizeToF16(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32 or a vector of f32 (e.g., vec3<f32>).

Description

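Converts a 32-bit float (f32) to a neighbouring value that can be exactly represented by a 16-bit float (f16), typically the nearest one, and then returns that value as an f32. The result is not always smaller than the input; it can be rounded up as well as down.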

This is a "quantization" function, which means it reduces the precision of a value. The operation is conceptually equivalent to the process of packing an f32 into the bits of an f16 and then immediately unpacking it back to an f32. This round-trip is lossy, and quantizeToF16 gives you the result of that precision loss.

W3C Specification: quantizeToF16

Use Case

This function is used to simulate the precision loss of f16 floats while still performing calculations in f32. This is important for maintaining visual consistency and avoiding artifacts in rendering pipelines that might use a mix of f16 and f32 precision for storing data (e.g., in textures or buffers), especially on mobile or low-power hardware.

// A 32-bit float. The literal 0.1 is stored in f32 as approximately 0.100000001490116.
let high_precision_val: f32 = 0.1;

// Simulate storing this value in an f16 buffer and reading it back.
// The result is a value representable in f16, promoted back to f32.
// The f16 value closest to 0.1 is 0.0999755859375.
let quantized_val: f32 = quantizeToF16(high_precision_val);

// By quantizing, you ensure that any subsequent calculations using this value
// will behave as if the value had been stored with lower precision,
// preventing subtle artifacts that can arise from mixing precisions.

radians

Signature

radians(e: T) -> T

  • e: The input angle or vector of angles, in degrees.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Converts a value from degrees to radians. It is the inverse of the degrees function.

The conversion is performed by approximating the formula e * π / 180. This is a crucial utility function because all of WGSL's built-in trigonometric functions (sin, cos, tan, etc.) expect their angle inputs to be in radians.

W3C Specification: radians

Use Case

Used to convert angles from a more human-readable format (degrees) into the required format for trigonometric calculations. This is common when taking user input or parameters defined in degrees.

// A rotation angle specified in degrees, perhaps from a uniform.
let angle_deg = 45.0;

// Convert the angle to radians before using it with sin() or cos().
let angle_rad = radians(angle_deg); // Result is ~0.785, which is PI / 4

// Now `angle_rad` can be correctly used in trigonometric functions.
let val = sin(angle_rad); // Correctly calculates sin(45 degrees)

// The function also works component-wise on vectors.
let angles_deg_vec = vec2<f32>(90.0, 180.0);
let angles_rad_vec = radians(angles_deg_vec); // Result is vec2<f32>(PI / 2.0, PI)

round

Signature

round(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Rounds the input e to the nearest whole number, returning the result as a floating-point value.

This function uses the "round half to even" strategy, which is the default rounding mode in the IEEE 754 standard. This means:

  • Values are rounded to the closest integer. For example, round(2.3) is 2.0 and round(2.8) is 3.0.

  • Values that are exactly halfway between two integers (like 2.5) are rounded to the nearest even integer.

    • round(2.5) returns 2.0

    • round(3.5) returns 4.0

    • round(-2.5) returns -2.0

This behavior is different from the common "always round .5 up" rule and is designed to reduce statistical bias in long calculations.

W3C Specification: round

Use Case

round is useful for snapping values to the nearest integer grid, for example, to create a pixelated or mosaic effect.

// Use UV coordinates that range from 0.0 to 1.0.
let uv = in.uv;

// Scale the UVs to define the size of our "pixels".
let pixel_grid_size = 10.0;
let scaled_uv = uv * pixel_grid_size;

// Round the scaled UVs to the nearest integer coordinate.
let rounded_uv = round(scaled_uv);

// Scale back down to the original 0-1 range. This creates large blocks
// where every fragment inside a grid cell gets the same final UV coordinate.
let pixelated_uv = rounded_uv / pixel_grid_size;

// Sampling a texture with these new UVs will produce a mosaic effect.
let final_color = textureSample(my_texture, my_sampler, pixelated_uv);

sign

Signature

sign(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, i32, or a vector of these types (e.g., vec3<f32>).

Description

Extracts the sign of e and returns it as a value of the same type. The result is:

  • 1 if e is positive.

  • 0 if e is zero.

  • -1 if e is negative.

This function is applied component-wise when the input is a vector.

W3C Specification: sign

Use Case

sign is useful for determining a direction without its magnitude. For example, it can be used to apply a force (like friction) that always opposes the direction of velocity.

let velocity = vec2<f32>(3.5, -2.1);

// Get the direction of velocity. `sign` will return vec2<f32>(1.0, -1.0).
let direction = sign(velocity);

let friction_magnitude = 0.5;

// The friction force should oppose the velocity.
let friction_force = -direction * friction_magnitude;
// `friction_force` is now vec2<f32>(-0.5, 0.5), correctly opposing the movement.

sin

Signature

sin(e: T) -> T

  • e: The input angle or vector of angles, in radians.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the sine of e, where e is an angle specified in radians. The result is a smooth, continuous wave that oscillates between -1.0 and 1.0.

sin is one of the most fundamental functions in graphics programming for creating any kind of periodic motion. sin(0.0) returns 0.0, and the wave repeats every 2π radians.

W3C Specification: sin

Use Case

sin is the primary tool for creating waves, oscillations, and cyclical animations.

// Use a time uniform to drive an animation.
let time = material.time;

// Use Case 1: Simple wave for vertex displacement
// This creates a physical wave along the x-axis of a mesh.
let frequency = 5.0;
let amplitude = 0.1;
let wave_offset = sin(position.x * frequency + time) * amplitude;
let new_y_position = position.y + wave_offset;

// Use Case 2: Pulsating color
// `sin` returns a value in [-1, 1]. We can map this to the [0, 1] range
// to control brightness or a color channel.
let pulse = sin(time) * 0.5 + 0.5; // Maps [-1, 1] -> [0, 1]
let pulsating_color = vec3<f32>(pulse, 0.0, 0.0); // Fades from black to red

sinh

Signature

sinh(e: T) -> T

  • e: The input hyperbolic angle in radians.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the hyperbolic sine of e. This function is related to the exponential function and is mathematically defined as (exp(e) - exp(-e)) / 2.

Unlike the trigonometric sin function which oscillates, the sinh function grows exponentially. It passes through the origin (sinh(0.0) is 0.0) and is an odd function, meaning sinh(-e) is equal to -sinh(e). It is the inverse of the asinh function.

W3C Specification: sinh

Use Case

sinh is a specialized mathematical function primarily used in advanced geometry or physics calculations. It is not commonly used in general visual effects programming.

// The sinh function grows exponentially away from zero.
let val_zero = sinh(0.0); // Result is 0.0

let val_pos = sinh(2.0); // Result is approx 3.62
let val_neg = sinh(-2.0); // Result is approx -3.62

// It is the inverse of asinh.
let recovered_value = asinh(val_pos); // Result is approx 2.0

sqrt

Signature

sqrt(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the square root of e. This is the inverse of squaring a number (x * x).

The input e must be non-negative (>= 0). Providing a negative input will result in implementation-defined behavior, which is often NaN (Not a Number).

W3C Specification: sqrt

Use Case

sqrt is fundamental for calculations based on the Pythagorean theorem, such as finding the length of a vector or the distance between two points. While WGSL provides the length() and distance() functions for this, sqrt is the core mathematical operation they are built on.

// Manually calculate the distance between two 2D points to demonstrate sqrt.
let point_a = vec2<f32>(2.0, 3.0);
let point_b = vec2<f32>(5.0, 7.0);

let delta = point_b - point_a; // delta is vec2<f32>(3.0, 4.0)

// The squared distance is dx*dx + dy*dy
let distance_sq = dot(delta, delta); // 3*3 + 4*4 = 9 + 16 = 25.0

// The actual distance is the square root of the squared distance.
let distance = sqrt(distance_sq); // sqrt(25.0) is 5.0

// This is mathematically equivalent to the built-in `distance()` function.

tan

Signature

tan(e: T) -> T

  • e: The input angle or vector of angles, in radians.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the tangent of e, where e is an angle specified in radians. Mathematically, tan(e) is equivalent to sin(e) / cos(e).

The function's output represents the slope of a line at a given angle. Unlike sin and cos, the output range of tan is unbounded (-infinity to +infinity). The function has vertical asymptotes (approaches infinity) at odd multiples of π/2 (90°, 270°, etc.), where cos(e) is zero. This makes it generally unsuitable for creating smooth, bounded animations.

W3C Specification: tan

Use Case

The primary role of tan in computer graphics is in geometric calculations, most notably in the creation of the perspective projection matrix, where it relates the camera's field of view (FOV) to the dimensions of the viewing frustum.

// A simplified example of the math inside a perspective projection matrix.
let fov_degrees = 90.0;
let fov_radians = radians(fov_degrees);

// The tangent of half the FOV is used to calculate a scaling factor
// that determines how much of the world is visible.
let scale_factor = 1.0 / tan(fov_radians * 0.5);

// This scale_factor would then be used to construct the matrix that
// transforms 3D world coordinates into 2D screen coordinates.

tanh

Signature

tanh(e: T) -> T

  • e: The input hyperbolic angle in radians.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the hyperbolic tangent of e. Mathematically, it is equivalent to sinh(e) / cosh(e).

The tanh function is a "squashing" function, meaning it takes any real number as input and maps it to a value within the range (-1.0, 1.0). It produces an "S"-shaped curve (a sigmoid) that is nearly linear around 0 and flattens out as the input moves towards positive or negative infinity. It is the inverse of the atanh function.

W3C Specification: tanh

Use Case

tanh is useful for creating smooth falloff curves or mapping an unbounded value to a predictable range.

// Imagine you have a distance value that can be very large.
let dist_from_center: f32 = ...; // Could be 0.0, 5.0, 100.0, etc.

// A linear falloff might not look good. We can use `tanh` to create a
// smooth curve that rapidly transitions and then levels off.
let scale = 0.5;
let tanh_falloff = tanh(dist_from_center * scale);

// The `tanh_falloff` value will start at 0.0, rise quickly, and then
// smoothly approach 1.0 without ever exceeding it, no matter how large
// `dist_from_center` gets. This creates a pleasing, controlled transition.
let effect_strength = 1.0 - tanh_falloff;

trunc

Signature

trunc(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the integer part of e by discarding its fractional component. This is commonly described as rounding towards zero. The result is returned as a floating-point value.

  • trunc(3.7) returns 3.0

  • trunc(-3.7) returns -3.0

Note that for negative numbers, this behavior is different from the floor() function, which would round -3.7 down to -4.0.

W3C Specification: trunc
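
Use Case

One possible use of trunc is to split a value into whole and fractional parts that both keep the sign of the input, similar to what modf returns. The sketch below is illustrative only; split_toward_zero is a hypothetical helper name.

```wgsl
// Hypothetical helper: returns (whole, fractional) parts of x, where the
// whole part is rounded toward zero and both parts share the sign of x.
fn split_toward_zero(x: f32) -> vec2<f32> {
    let whole = trunc(x);
    return vec2<f32>(whole, x - whole);
}

// split_toward_zero( 3.7) is approximately vec2<f32>( 3.0,  0.7)
// split_toward_zero(-3.7) is approximately vec2<f32>(-3.0, -0.7)
// By contrast, floor(-3.7) is -4.0.
```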

Vector Functions

While the common mathematical functions can be applied component-wise to vectors, this family of functions performs holistic geometric operations. They treat vectors not just as a collection of numbers, but as representations of positions, directions, and surfaces in 2D or 3D space.

These are the indispensable power tools for solving the fundamental questions of computer graphics. They allow you to calculate the distance between points, extract a pure direction from a vector, measure the alignment and angle between directions, find vectors that are perpendicular to a surface, and even simulate the physics of light as it bounces off or bends through materials.

Mastering these functions is the key to implementing almost any lighting model, geometric effect, or physics-based interaction in a shader. They are the language of 3D space.

cross

Signature

cross(e1: vec3<T>, e2: vec3<T>) -> vec3<T>

  • e1, e2: The two input vectors.

  • T: Can be f32, or f16.

Description

Calculates the cross product of two 3-component vectors, e1 and e2.

The result is a new vec3 vector that is perpendicular (orthogonal) to both of the input vectors. The direction of the resulting vector is determined by the "right-hand rule": if you curl the fingers of your right hand in the direction from e1 to e2, your thumb will point in the direction of the result.

This function is defined only for vec3. It cannot be used with vectors of other dimensions.

W3C Specification: cross
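
Use Case

A typical application of cross is computing the normal of a triangle from two of its edges. The following is a minimal sketch that assumes counter-clockwise vertex order and a non-degenerate triangle; triangle_normal is a hypothetical helper name.

```wgsl
// Hypothetical helper: unit normal of the triangle (a, b, c).
// Assumes counter-clockwise winding and that the three points are not collinear.
fn triangle_normal(a: vec3<f32>, b: vec3<f32>, c: vec3<f32>) -> vec3<f32> {
    let edge1 = b - a;
    let edge2 = c - a;
    // The cross product is perpendicular to both edges, i.e. to the triangle's plane.
    return normalize(cross(edge1, edge2));
}

// For a = (0, 0, 0), b = (1, 0, 0), c = (0, 1, 0):
// cross((1, 0, 0), (0, 1, 0)) is (0, 0, 1), so the normal points along +Z.
```

Swapping the two arguments of cross negates the result, so the winding order of the vertices decides which side the normal points to.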

distance

Signature

distance(e1: T, e2: T) -> S

  • e1, e2: The two input points.

  • T: The type of the input points. Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

  • S: The scalar component type of T (f32 if T is vecN<f32>, f16 if T is vecN<f16>, etc.).

Description

Calculates the Euclidean distance (the straight-line distance) between two points, e1 and e2. This is mathematically equivalent to length(e1 - e2).

The function always returns a single scalar floating-point value, even when the inputs are vectors.

W3C Specification: distance

Use Case

distance is one of the most common functions in shader programming, used for any calculation involving proximity, such as range checks, radial gradients, or circular masks.

// Use Case 1: AI Attack Range
let player_pos = vec3<f32>(10.0, 0.0, 5.0);
let enemy_pos = vec3<f32>(12.0, 0.0, 6.0);
let attack_range = 3.0;

if distance(player_pos, enemy_pos) < attack_range {
    // The player is in range, so the enemy can attack.
}

// Use Case 2: Circular Mask
// `uv` is a 2D coordinate from 0.0 to 1.0.
let center = vec2<f32>(0.5);
let radius = 0.4;
let dist_from_center = distance(uv, center);

// Create a hard-edged circle.
let circle = step(dist_from_center, radius);

dot

Signature

dot(e1: vecN<T>, e2: vecN<T>) -> T

  • e1, e2: The two input vectors of the same type and dimension.

  • T: The component type of the vectors. Can be f32, f16, i32, or u32.

  • The return value is a single scalar of type T.

Description

Calculates the dot product of e1 and e2. This is done by multiplying the corresponding components of the two vectors and then summing the results. For example, for two vec3 vectors a and b, the dot product is (a.x * b.x) + (a.y * b.y) + (a.z * b.z).

The geometric meaning of the dot product is what makes it one of the most powerful tools in shader programming. For two unit-length (normalized) floating-point vectors, the result is the cosine of the angle between them. This tells you how much the two vectors are aligned:

  • 1.0: The vectors point in the exact same direction.

  • 0.0: The vectors are perfectly perpendicular (90 degrees apart).

  • -1.0: The vectors point in exact opposite directions.

It can be thought of as a projection: dot(A, B) measures how much of vector A points along the direction of vector B.

W3C Specification: dot
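
Use Case

The alignment interpretation above is the basis of simple diffuse (Lambertian) lighting. The sketch below is one way to write it; lambert is a hypothetical helper name, and both inputs are normalized inside the function so that the dot product is a cosine.

```wgsl
// Hypothetical helper: diffuse lighting term in the range [0, 1].
// `normal` is the surface normal, `to_light` points from the surface toward the light.
fn lambert(normal: vec3<f32>, to_light: vec3<f32>) -> f32 {
    let n = normalize(normal);
    let l = normalize(to_light);
    // dot(n, l) is the cosine of the angle between the two unit vectors.
    // Negative values mean the light is behind the surface, so clamp them to 0.
    return max(dot(n, l), 0.0);
}
```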

faceForward

Signature

faceForward(e1: T, e2: T, e3: T) -> T

  • e1: The vector to orient.

  • e2: The incident vector.

  • e3: The reference vector.

  • T: Can be vecN<f32>, or vecN<f16>, where N is 2, 3, or 4.

Description

Orients a vector to point in the same general direction as another. Specifically, it returns e1 if dot(e2, e3) is negative, and -e1 otherwise.

This function's purpose is to ensure a normal vector is always pointing "towards" a direction of interest (like a camera), which is essential for effects like two-sided lighting. The parameters are typically used as follows:

  • e1: The normal vector you want to orient, N.

  • e2: The incident vector, I, which is a direction pointing towards the surface (e.g., the negated view direction).

  • e3: A reference normal, N_ref, used to determine the surface's orientation. Usually, this is the same vector as N.

The logic is: if the incident direction (I) opposes the reference normal (N_ref), meaning dot(I, N_ref) is negative and the surface faces the viewer, return the normal (N) as is. Otherwise, return the flipped normal (-N).

W3C Specification: faceForward

Use Case

The canonical use case is creating two-sided materials where both the front and back faces of a polygon are lit correctly.

// The geometric normal from the mesh.
let geometric_normal = normalize(in.normal);

// The direction from the surface point TO the camera.
let view_dir = normalize(camera.world_position - in.world_position);

// The incident vector must point TOWARDS the surface, so we negate the view direction.
let incident_dir = -view_dir;

// `faceForward` ensures the normal used for lighting always points toward the camera,
// even if we are viewing the back-face of a polygon.
// We use `geometric_normal` for the first and third arguments in this common pattern.
let final_normal = faceForward(geometric_normal, incident_dir, geometric_normal);

// Now, `final_normal` can be used for lighting, and it will be correct
// regardless of which side of the polygon is visible.
let light_dir = ...;
let brightness = max(0.0, dot(final_normal, light_dir));

length

Signature

length(e: T) -> S

  • e: The input value or vector.

  • T: The type of the input. Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

  • S: The scalar component type of T (f32 if T is vecN<f32>, etc.).

Description

Calculates the length (also known as the magnitude or norm) of the input e. The behavior depends on whether the input is a scalar or a vector:

  • For a scalar e: The result is the absolute value of e.

  • For a vector e: The result is the Euclidean length, calculated as sqrt(dot(e, e)). For a vec3, this is sqrt(e.x*e.x + e.y*e.y + e.z*e.z).

Precision Note: The specification allows the scalar case to be implemented as sqrt(e * e). While mathematically equivalent to abs(e), this implementation detail can matter in extreme cases. If e is a very large number, the intermediate e * e calculation could overflow to infinity before the square root is taken, resulting in an incorrect value. For standard graphics work, this is rarely a concern, but it's a known limitation of this possible implementation.

The function always returns a single, non-negative scalar floating-point value.

W3C Specification: length

Use Case

length is fundamental for any operation that depends on the magnitude of a vector, such as measuring the distance of a point from the origin or calculating the speed from a velocity vector.

// Use Case 1: Distance from Origin
let point_in_space = vec3<f32>(3.0, 4.0, 0.0);

// `length` calculates the straight-line distance from (0,0,0) to the point.
let dist_from_origin = length(point_in_space); // Result is 5.0

// Use Case 2: Calculating Speed
let velocity_vector = vec3<f32>(-10.0, 20.0, 0.0);

// The length of the velocity vector is the object's speed.
let speed = length(velocity_vector); // Result is approx 22.36

// The built-in `distance(a, b)` function is a convenient shorthand
// for `length(a - b)`.

normalize

Signature

normalize(e: vecN<T>) -> vecN<T>

  • e: The input vector.

  • T: Can be f32, or f16.

  • N: The dimension of the vector, can be 2, 3, or 4.

Description

Calculates and returns a unit vector that points in the same direction as the input vector e. A unit vector is a vector with a length of exactly 1.0.

The operation is mathematically equivalent to e / length(e). This function is essential for any calculation that relies on pure direction rather than magnitude.

If the input vector e is the zero vector (i.e., its length is 0.0), the operation amounts to a division by zero and the result should not be relied upon; in practice the components are often NaN. If the input can legitimately be zero, check its length before normalizing.

W3C Specification: normalize
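
If you would rather not depend on how normalize treats a zero-length input, a guarded variant is easy to write. The following is a minimal sketch; safe_normalize is a hypothetical helper name, and the threshold is an arbitrary assumption you may want to tune.

```wgsl
// Hypothetical helper: normalizes v, or returns `fallback` when v is too
// short to have a meaningful direction.
fn safe_normalize(v: vec3<f32>, fallback: vec3<f32>) -> vec3<f32> {
    let len_sq = dot(v, v);
    // Assumed threshold: treat vectors shorter than 1e-6 as zero-length.
    if (len_sq < 1e-12) {
        return fallback;
    }
    // v / length(v), written with inverseSqrt to reuse the squared length.
    return v * inverseSqrt(len_sq);
}
```

A typical call might pass an "up" vector such as vec3<f32>(0.0, 1.0, 0.0) as the fallback.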

Use Case

Normalization is one of the most frequent operations in shader programming, especially for lighting calculations where directions must be represented by unit vectors.

// Use Case 1: Calculating a Direction Vector
let start_pos = vec3<f32>(1.0, 1.0, 1.0);
let end_pos = vec3<f32>(4.0, 5.0, 1.0);

// The vector from start to end has a magnitude (length).
let vector = end_pos - start_pos; // vec3<f32>(3.0, 4.0, 0.0), length is 5.0

// To get only the pure direction, we normalize it.
let direction = normalize(vector); // vec3<f32>(0.6, 0.8, 0.0), length is 1.0

// Use Case 2: Preparing Vectors for Lighting
// Surface normals passed from the vertex shader get interpolated, which changes
// their length. They MUST be re-normalized in the fragment shader.
let surface_normal = normalize(in.interpolated_normal);

let light_dir = normalize(light.position - in.world_position);

// Now the dot product will correctly give the cosine of the angle.
let brightness = dot(surface_normal, light_dir);

reflect

Signature

reflect(e1: T, e2: T) -> T

  • e1: The incident vector, I.

  • e2: The surface normal vector, N.

  • T: Can be vecN<f32>, or vecN<f16>, where N is 2, 3, or 4.

Description

Calculates the direction of a perfect, mirror-like reflection. It takes an incident vector e1 (representing the incoming ray) and a surface normal e2, and returns the reflected direction vector.

For this function to work correctly, two conditions are critical:

  1. The surface normal e2 must be a unit vector (normalized).

  2. The incident vector e1 must point towards the surface.

This second point is the most common source of errors. In shader programming, the view direction vector typically points from the surface to the camera. The reflect function expects the opposite. Therefore, you must negate your view direction vector before passing it to this function.

      I \  |N / R
         \ | /
          \|/
   ════════════════════ (Surface)

W3C Specification: reflect
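
The specification defines the result as e1 - 2 * dot(e2, e1) * e2. Writing it out by hand can help when debugging sign errors; reflect_manual below is a hypothetical name for illustration, and in real code the built-in is preferable.

```wgsl
// Hypothetical helper: the same formula the built-in `reflect` is defined by.
// `incident` points toward the surface; `normal` must be unit length.
fn reflect_manual(incident: vec3<f32>, normal: vec3<f32>) -> vec3<f32> {
    return incident - 2.0 * dot(normal, incident) * normal;
}

// Worked example: incident = (1, -1, 0), normal = (0, 1, 0)
// dot(normal, incident) = -1
// result = (1, -1, 0) - 2 * (-1) * (0, 1, 0) = (1, 1, 0)
```

The component along the normal is flipped while the component parallel to the surface is unchanged.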

Use Case

The primary use of reflect is to implement environment mapping, where the reflection vector is used to sample a cubemap (skybox) to simulate reflections on a surface.

// The surface normal, which must be normalized.
let surface_normal = normalize(in.normal);

// The direction pointing FROM the surface point TO the camera.
let view_dir = normalize(camera.position - in.world_position);

// The incident vector for `reflect` must point TOWARDS the surface,
// so we negate the view direction.
let incident_dir = -view_dir;

// Calculate the reflection direction.
let reflect_dir = reflect(incident_dir, surface_normal);

// `reflect_dir` can now be used to sample a skybox texture.
let reflection_color = textureSample(skybox_texture, skybox_sampler, reflect_dir);

refract

Signature

refract(e1: vecN<T>, e2: vecN<T>, e3: T) -> vecN<T>

  • e1: The incident vector, I.

  • e2: The surface normal vector, N.

  • e3: The ratio of indices of refraction, eta.

  • T: Can be f32, or f16. N can be 2, 3, or 4.

Description

Calculates the direction of a refracted ray of light as it passes through a surface from one medium to another (e.g., from air into water).

For this function to work correctly, several conditions are critical:

  1. The incident vector e1 must point towards the surface. As with reflect, this usually requires negating your view direction vector.

  2. The surface normal e2 must be a unit vector (normalized).

  3. The ratio e3 (eta) must be calculated as (Index of Refraction of the starting medium) / (Index of Refraction of the destination medium).

    • Air to Glass (IOR ≈ 1.5): eta = 1.0 / 1.5

    • Glass to Air: eta = 1.5 / 1.0

W3C Specification: refract

Total Internal Reflection (TIR)

Under certain conditions (when light travels from a denser to a less dense medium at a shallow angle), refraction is impossible, and all light reflects. This is known as Total Internal Reflection. The refract function signals this by returning a zero vector (vecN(0.0)). Your shader code should check for this case and typically perform a reflect operation instead.

Use Case

The primary use of refract is to render transparent materials like glass and water, where it provides the distorted view of the scene behind the object.

// The surface normal, which must be normalized.
let surface_normal = normalize(in.normal);

// The direction pointing FROM the surface point TO the camera.
let view_dir = normalize(camera.position - in.world_position);

// The incident vector for `refract` must point TOWARDS the surface.
let incident_dir = -view_dir;

// The ratio of IORs for a ray traveling from air (1.0) into glass (1.5).
let eta = 1.0 / 1.5;

// Calculate the refraction direction.
let refract_dir = refract(incident_dir, surface_normal, eta);

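// Total Internal Reflection: `refract` returns a zero vector in that case.
let is_tir = all(refract_dir == vec3<f32>(0.0));

// Fall back to the reflection direction when refraction is impossible.
let reflect_dir = reflect(incident_dir, surface_normal);
let sample_dir = select(refract_dir, reflect_dir, is_tir);

// Sample once, outside of any branch: `textureSample` must be called from
// uniform control flow in a fragment shader. `background_sampler` is a placeholder name.
let refraction_color = textureSample(background_texture, background_sampler, sample_dir);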

Matrix Functions

In computer graphics, matrices are the fundamental tool for performing transformations in 2D and 3D space, such as translation, rotation, and scaling. While most of the time you will be using matrices to transform vectors, this small family of functions allows you to analyze and manipulate the matrices themselves.

These functions provide insight into the properties of a transformation. They allow you to calculate a matrix's determinant to understand if it flips or collapses space, or to compute its transpose, which is a critical step in correctly transforming normal vectors for lighting calculations. While not used as frequently as vector or mathematical functions, they are essential for solving specific, advanced problems in geometric and lighting computations.

determinant

Signature

determinant(e: matCxC<T>) -> T

  • e: The input square matrix.

  • C: The dimension of the matrix, can be 2, 3, or 4.

  • T: Can be f32, or f16.

  • The return value is a single scalar of type T.

Description

Calculates the determinant of a square matrix e. The determinant is a single scalar value that reveals important properties about the linear transformation the matrix represents.

  • If the determinant is zero, the matrix is "singular." This means the transformation collapses space into a lower dimension (e.g., a 3D volume becomes a 2D plane), and the matrix cannot be inverted.

  • The sign of the determinant indicates if the transformation flips orientation. A negative determinant means the object has been mirrored or turned "inside-out" (e.g., by a negative scale on an odd number of axes).

  • The absolute value of the determinant represents the factor by which the transformation scales volume (for mat3x3, mat4x4) or area (for mat2x2).

W3C Specification: determinant

Use Case

The determinant is useful for checking if a matrix is invertible or for detecting mirrored transformations, which is important for correct lighting and culling.

let model_matrix: mat3x3<f32> = ...; // The upper 3x3 of a model matrix

let det = determinant(model_matrix);

// Check if the model has been mirrored (e.g., scale.x = -1.0).
if (det < 0.0) {
    // This object's coordinate system has been flipped.
    // This is important information for lighting calculations, as it may
    // require flipping the bitangent vector to ensure normal maps
    // are applied correctly.
    // It also flips the winding order of triangles, which affects culling.
}

// It can also be used to guard a hand-written matrix inverse. WGSL has no
// built-in `inverse()` function, and a singular matrix (determinant of zero)
// cannot be inverted.
if (det != 0.0) {
    // The matrix is invertible.
}
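
The mirroring check can be wrapped in small helpers, sketched below. The names is_mirrored and handedness are hypothetical, and the code assumes the model matrix is affine, so that its upper 3x3 carries all of the rotation and scale.

```wgsl
// Hypothetical helper: true when the upper 3x3 of `model` mirrors space.
fn is_mirrored(model: mat4x4<f32>) -> bool {
    // Matrices are indexed by column; take the xyz of the first three columns.
    let linear = mat3x3<f32>(model[0].xyz, model[1].xyz, model[2].xyz);
    return determinant(linear) < 0.0;
}

// Hypothetical helper: a sign that can be multiplied into a bitangent.
// Returns -1.0 for mirrored models and 1.0 otherwise.
fn handedness(model: mat4x4<f32>) -> f32 {
    return select(1.0, -1.0, is_mirrored(model));
}
```

For a model scaled by (-1.0, 2.0, 3.0) with no rotation, the upper 3x3 has a determinant of -6.0, so handedness returns -1.0.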

transpose

Signature

transpose(e: matRxC<T>) -> matCxR<T>

  • e: The input matrix.

  • R, C: The number of rows and columns (2, 3, or 4).

  • T: Can be f32 or f16.

  • The return type is a matrix with the row and column counts swapped.

Description

Calculates the transpose of the input matrix e. This operation flips the matrix over its main diagonal, effectively turning the original matrix's rows into the new matrix's columns, and vice versa.

For example, a matrix with 2 rows and 3 columns (written mat3x2 in WGSL, because the type name lists the column count first) becomes a matrix with 3 rows and 2 columns (mat2x3):

[ a, b, c ]      becomes      [ a, d ]
[ d, e, f ]                   [ b, e ]
                              [ c, f ]

W3C Specification: transpose

Use Case

The most important use case for transpose in shader programming is in the calculation of the normal matrix.

When a mesh has a non-uniform scale (e.g., scaled by (2.0, 1.0, 1.0)), simply transforming its normal vectors by the model matrix will incorrectly skew them. The correct way to transform normals is to use the transpose of the inverse of the model matrix.

// WGSL has no built-in `inverse()` function. This example assumes the inverse
// of the model matrix was computed on the CPU and uploaded as a uniform.
let inverse_model_matrix: mat4x4<f32> = ...;

// For normals, we only care about the rotation and scale part (the upper 3x3).
// Matrices are indexed by column, so take the xyz of the first three columns.
let inverse_upper3x3 = mat3x3<f32>(
    inverse_model_matrix[0].xyz,
    inverse_model_matrix[1].xyz,
    inverse_model_matrix[2].xyz
);

// The normal matrix is the transpose of the inverse of the model's upper 3x3.
let normal_matrix = transpose(inverse_upper3x3);

let mesh_normal: vec3<f32> = ...;

// Now, transforming the normal with this matrix yields the correct result.
let world_normal = normalize(normal_matrix * mesh_normal);

// Note: For a pure rotation (an orthonormal matrix), the transpose is equal to
// the inverse, so the normal matrix is simply the model's upper 3x3. The same
// shortcut works for rotation plus uniform scale, because `normalize()` removes
// the leftover scale factor. Only non-uniform scale or shear needs the full
// inverse-transpose.
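
WGSL's built-in library does not include a matrix inverse, so when the inverse-transpose has to be computed in the shader it is usually built by hand. The sketch below uses the cofactor (cross product) form for a 3x3 matrix. The name normal_matrix_from is hypothetical, and the input is assumed to be invertible.

```wgsl
// Hypothetical helper: inverse-transpose of a 3x3 matrix.
// Assumes `m` is invertible (its determinant is not zero).
fn normal_matrix_from(m: mat3x3<f32>) -> mat3x3<f32> {
    let c0 = m[0];
    let c1 = m[1];
    let c2 = m[2];

    // The columns of the cofactor matrix are cross products of the
    // other two columns of `m`, taken in cyclic order.
    let cofactors = mat3x3<f32>(cross(c1, c2), cross(c2, c0), cross(c0, c1));

    // dot(c0, cross(c1, c2)) is the determinant of `m`.
    let det = dot(c0, cofactors[0]);

    // inverse(m) = transpose(cofactors) / det, so its transpose is cofactors / det.
    return cofactors * (1.0 / det);
}
```

If the result is only used to transform a normal that is normalized afterwards, the division matters only for its sign: a negative determinant (a mirrored model) flips the direction of the result.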

Interpolation & Clamping Functions

This family of functions provides the essential tools for controlling numerical values and creating transitions between them. They are fundamental to almost every visual effect, from basic lighting to complex procedural patterns. The functions in this group can be divided into two main categories:

  1. Clamping Functions: These functions (clamp, min, max, saturate) are used for constraining values. They ensure that the results of your calculations remain within a valid and predictable range. This is crucial for preventing visual artifacts, such as colors becoming negative or overly bright, or for providing safe inputs to other mathematical functions that have a limited domain (like acos).

  2. Interpolation Functions: These are the artistic tools for creating blends, fades, and gradients. They define how a value transitions from one state to another. This includes the linear blend of mix, the hard, binary switch of step, and the versatile, eased-in "S-curve" of smoothstep, which is a cornerstone for creating polished, natural-looking effects.

Many of these functions are also "branchless," offering a more performant way to achieve conditional logic than a traditional if statement. Mastering this group is essential for creating high-quality, robust, and visually pleasing shaders.

clamp

Signature

clamp(e: T, low: T, high: T) -> T

  • e: The input value or vector to be clamped.

  • low: The lower bound of the range.

  • high: The upper bound of the range.

  • T: Can be f32, f16, i32, u32, or a vector of these types (e.g., vec3<f32>).

Description

Restricts a value e to be within the inclusive range [low, high].

  • If e is less than low, the result is low.

  • If e is greater than high, the result is high.

  • Otherwise, the result is e.

The function is applied component-wise when the inputs are vectors. The specification notes that for floating-point types, this may be implemented as min(max(e, low), high) or as the median of the three values, which provides a numerically stable way to clamp a value.

It is an error if low is greater than high. This will be caught at shader compilation time if the values are compile-time constants.
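
The two formulas mentioned above can be written out as follows. This is a minimal sketch for f32 only, and the function names are hypothetical.

```wgsl
// Hypothetical reference version of the min/max formula.
fn clamp_min_max(e: f32, low: f32, high: f32) -> f32 {
    return min(max(e, low), high);
}

// Hypothetical reference version of the median-of-three formula.
fn clamp_median(e: f32, low: f32, high: f32) -> f32 {
    // The median of (a, b, c) is max(min(a, b), min(max(a, b), c)).
    return max(min(e, low), min(max(e, low), high));
}
```

Both return 1.0 for (5.0, 0.0, 1.0). They only disagree when low is greater than high: for (5.0, 1.0, 0.0) the first returns 0.0 and the second returns 1.0, which is one reason such inputs should be avoided.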

W3C Specification: clamp

Use Case

clamp is an essential utility for ensuring values remain within a valid or expected range. It is frequently used as a "safety" function.

// Use Case 1: Keeping color values in the valid [0, 1] range.
var bright_color = vec3<f32>(1.2, -0.1, 0.8);
// Writing this color to the screen could cause visual artifacts.
let safe_color = clamp(bright_color, vec3<f32>(0.0), vec3<f32>(1.0));
// `safe_color` is now vec3<f32>(1.0, 0.0, 0.8)

// Use Case 2: Ensuring valid input for other functions.
// The `acos` function requires an input between -1.0 and 1.0.
// A dot product can sometimes have tiny floating-point errors that push it
// slightly outside this range (e.g., 1.000001 or -1.000001).
let dot_product = dot(vec_a, vec_b);
let clamped_dot = clamp(dot_product, -1.0, 1.0);

// Now it's safe to pass this value to `acos`.
let angle = acos(clamped_dot);

max

Signature

max(e1: T, e2: T) -> T

  • e1, e2: The two input values or vectors to be compared.

  • T: Can be f32, f16, i32, u32, or a vector of these types (e.g., vec3<f32>).

Description

Compares e1 and e2 and returns the larger of the two.

The function is applied component-wise when the inputs are vectors. This means for max(vec_a, vec_b), the resulting vector's x-component will be max(vec_a.x, vec_b.x), its y-component will be max(vec_a.y, vec_b.y), and so on.

Floating-Point Edge Cases:

  • If one operand is NaN (Not a Number), the other operand is returned.

  • If both operands are NaN, a NaN is returned.

  • If both operands are "denormalized" (very small numbers near zero that have lost precision), the result may be either of the two input values.

W3C Specification: max

Use Case

max is a fundamental building block for many shader effects. Its most common use is to establish a "floor" or minimum value for a calculation, preventing it from going below a certain threshold.

// Use Case 1: Diffuse Lighting "NdotL"
// The dot product of a normal and a light direction can be negative,
// which would incorrectly subtract light.
let dot_product = dot(surface_normal, light_dir);

// `max` is used to clamp the result at zero, ensuring that surfaces
// facing away from the light contribute zero brightness, not negative.
let diffuse_brightness = max(0.0, dot_product);

// Use Case 2: Component-wise comparison
let a = vec3<f32>(1.0, 5.0, 3.0);
let b = vec3<f32>(4.0, 2.0, 6.0);

// `max` will be applied to each pair of components.
// max(1.0, 4.0) -> 4.0
// max(5.0, 2.0) -> 5.0
// max(3.0, 6.0) -> 6.0
let result = max(a, b); // Result is vec3<f32>(4.0, 5.0, 6.0)

min

Signature

min(e1: T, e2: T) -> T

  • e1, e2: The two input values or vectors to be compared.

  • T: Can be f32, f16, i32, u32, or a vector of these types (e.g., vec3<f32>).

Description

Compares e1 and e2 and returns the smaller of the two.

The function is applied component-wise when the inputs are vectors. This means for min(vec_a, vec_b), the resulting vector's x-component will be min(vec_a.x, vec_b.x), its y-component will be min(vec_a.y, vec_b.y), and so on.

Floating-Point Edge Cases:

  • If one operand is NaN (Not a Number), the other operand is returned.

  • If both operands are NaN, a NaN is returned.

  • If both operands are "denormalized" (very small numbers near zero that have lost precision), the result may be either of the two input values.

W3C Specification: min

Use Case

min is a fundamental building block, often used to establish a "ceiling" or maximum value for a calculation, preventing it from exceeding a certain threshold. It is also used in creating Signed Distance Functions (SDFs).

// Use Case 1: Limiting an Effect
// An effect strength that should not exceed 1.0.
var effect_strength = calculate_effect(); // Might be > 1.0

// `min` is used to cap the value at 1.0.
let limited_strength = min(1.0, effect_strength);

// Use Case 2: Combining Signed Distance Functions (SDFs)
// `dist_a` and `dist_b` are the signed distances to two different shapes.
let dist_a = distance_to_circle(uv);
let dist_b = distance_to_square(uv);

// The minimum of the two distances gives the union of the two shapes.
let union_dist = min(dist_a, dist_b);
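
A self-contained version of the SDF idea might look like the sketch below. All function names are hypothetical. The intersection helper uses max, the counterpart of the min-based union.

```wgsl
// Hypothetical helper: signed distance from `p` to a circle.
// Negative inside the circle, positive outside.
fn sd_circle(p: vec2<f32>, center: vec2<f32>, radius: f32) -> f32 {
    return distance(p, center) - radius;
}

// Union: a point is inside the result if it is inside either shape.
fn sd_union(a: f32, b: f32) -> f32 {
    return min(a, b);
}

// Intersection: a point is inside the result only if it is inside both shapes.
fn sd_intersection(a: f32, b: f32) -> f32 {
    return max(a, b);
}

// Hypothetical usage: two overlapping circles combined into one shape.
fn two_circles(uv: vec2<f32>) -> f32 {
    let a = sd_circle(uv, vec2<f32>(0.4, 0.5), 0.2);
    let b = sd_circle(uv, vec2<f32>(0.6, 0.5), 0.2);
    return sd_union(a, b);
}
```

Both operations produce a hard crease where the shapes meet.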

mix

Performs a linear interpolation (or "blend") between two values. This is one of the most fundamental operations in computer graphics, often referred to as "lerp".

W3C Specification: mix

Overload: Uniform Blending

This overload blends two values or vectors using a corresponding blend factor or vector of blend factors.

Signature

mix(e1: T, e2: T, e3: T) -> T

  • e1: The starting value or vector (when e3 is 0).

  • e2: The ending value or vector (when e3 is 1).

  • e3: The blend factor or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Calculates the linear blend of e1 and e2 using the blend factor e3. The calculation is e1 * (1.0 - e3) + e2 * e3. The blend factor e3 is typically in the range [0.0, 1.0].

  • If e3 is 0.0, the result is e1.

  • If e3 is 1.0, the result is e2.

  • If e3 is 0.5, the result is a perfect 50/50 mix.

When the inputs are vectors, the operation can be performed component-wise if e3 is also a vector. For example, you can blend the red, green, and blue channels of a color by different amounts simultaneously.

Overload: Scalar Blending Factor

This overload is a convenient alternative that blends two vectors using a single, uniform scalar blend factor.

Signature

mix(e1: T, e2: T, e3: S) -> T

  • e1, e2: The start and end vectors to blend between.

  • e3: The single, scalar blend factor.

  • T: The vector type, vecN<S>.

  • S: The scalar component type, f32 or f16.

Description

Performs a component-wise linear blend of the vectors e1 and e2, using the single scalar e3 as the blending factor for every component. This is the most common way to perform a uniform blend between two vectors (like colors or positions).
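
Written out by hand, the blend formula looks like the sketch below. The name mix_by_hand is hypothetical and only the vec3<f32> case with a scalar factor is shown. The built-in may round slightly differently.

```wgsl
// Hypothetical reference version of mix() for vec3<f32> with a scalar factor.
fn mix_by_hand(e1: vec3<f32>, e2: vec3<f32>, e3: f32) -> vec3<f32> {
    // Weight e1 by (1 - e3) and e2 by e3, then add the two contributions.
    return e1 * (1.0 - e3) + e2 * e3;
}
```

The factor is not clamped, so values outside [0.0, 1.0] extrapolate past the endpoints. Blending 0.0 and 10.0 with a factor of 1.5 gives 15.0. Wrap the factor in saturate() if that is not wanted.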

Use Case

let red = vec3<f32>(1.0, 0.0, 0.0);
let blue = vec3<f32>(0.0, 0.0, 1.0);

// Use Case 1: Simple blend with a scalar factor
let purple = mix(red, blue, 0.5); // Result is vec3<f32>(0.5, 0.0, 0.5)

// Use Case 2: Component-wise blend with a vector factor
// Blend red by 10%, green by 50%, and blue by 90%.
let blend_factors = vec3<f32>(0.1, 0.5, 0.9);
let custom_blend = mix(red, blue, blend_factors);
// result is red*(1-factors) + blue*factors
// (1,0,0)*(0.9,0.5,0.1) + (0,0,1)*(0.1,0.5,0.9)
// (0.9,0,0) + (0,0,0.9) -> (0.9, 0.0, 0.9)

// Use Case 3: Animating a position with a scalar factor
let start_pos = vec3<f32>(0.0, 0.0, 0.0);
let end_pos = vec3<f32>(10.0, 20.0, 0.0);
let animation_progress = 0.25; // 25% of the way through the animation

// Interpolate the position using a single scalar factor.
let current_pos = mix(start_pos, end_pos, animation_progress);
// Result is vec3<f32>(2.5, 5.0, 0.0)

saturate

Signature

saturate(e: T) -> T

  • e: The input value or vector.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

A convenient shorthand function that is exactly equivalent to clamp(e, 0.0, 1.0). It restricts the input value e to be within the inclusive range [0.0, 1.0].

This is a very common operation in graphics, particularly for colors or blending factors, which often need to be constrained to this specific range. Using saturate can make shader code more concise and readable than writing out the full clamp.

W3C Specification: saturate

Use Case

saturate is the ideal function for ensuring that any value intended to be a standard color component or a percentage (like a lighting term or alpha value) is in the valid [0.0, 1.0] range.

// A lighting calculation that might produce values outside the [0, 1] range
// due to high-intensity lights or unusual material properties.
let dot_product = dot(surface_normal, light_dir);
let brightness = dot_product * light_intensity; // Could be 1.5, -0.2, etc.

// `saturate` cleanly clamps the result to a valid brightness range.
let final_brightness = saturate(brightness);

// The following two lines are identical in function:
let clamped_brightness = clamp(brightness, 0.0, 1.0);
let saturated_brightness = saturate(brightness);

smoothstep

Performs a smooth, Hermite interpolation between 0.0 and 1.0. It is a cornerstone of shader effects for creating natural-looking transitions, fades, and soft edges. The resulting curve has an "ease-in" and "ease-out" profile.

W3C Specification: smoothstep

Overload: Scalar Inputs

This is the most common overload, used to smoothly transition a single value.

Signature

smoothstep(low: S, high: S, x: S) -> S

  • low: The lower bound of the transition range.

  • high: The upper bound of the transition range.

  • x: The input value to be mapped.

  • S: Can be f32 or f16.

Description

Calculates a smooth transition from 0.0 to 1.0 as the input x moves from low to high.

  • If x is less than or equal to low, the result is 0.0.

  • If x is greater than or equal to high, the result is 1.0.

  • If x is between low and high, the result is a smoothly interpolated value between 0.0 and 1.0, calculated using the formula t * t * (3.0 - 2.0 * t), where t is the normalized position of x within the range.
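
The three cases above collapse into two lines once t is clamped. The sketch below is a scalar f32 version under the assumption that low is less than high. The name smoothstep_by_hand is hypothetical.

```wgsl
// Hypothetical reference version of smoothstep() for f32.
// Assumes low < high.
fn smoothstep_by_hand(low: f32, high: f32, x: f32) -> f32 {
    // Normalized position of x within [low, high], clamped to [0, 1].
    let t = clamp((x - low) / (high - low), 0.0, 1.0);
    // Cubic Hermite curve with zero slope at both ends.
    return t * t * (3.0 - 2.0 * t);
}
```

At the midpoint of the range t is 0.5, and the result is 0.5 * 0.5 * (3.0 - 1.0) = 0.5. The curve passes through the same midpoint as a linear ramp but flattens out toward both edges.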

Overload: Vector Inputs

This overload applies the smoothstep function component-wise.

Signature

smoothstep(low: T, high: T, x: T) -> T

  • low, high, x: The input vectors.

  • T: Can be vecN<f32> or vecN<f16>.

Unlike GLSL, WGSL does not provide a mixed form that takes scalar low and high values with a vector x. All three arguments must have the same type. To apply the same scalar range to every component of x, construct vectors from the scalars, for example vec3<f32>(0.3) and vec3<f32>(0.8).

Description

Performs the smoothstep operation on each component of the input vector x independently. The corresponding components of low and high are used as the range for each channel.

Use Case

The canonical use case is creating a soft, anti-aliased edge for a shape, or a smooth fade.

// Create a circle with a soft edge.
let center = vec2<f32>(0.5);
let dist = distance(uv, center);

let radius = 0.3;
let softness = 0.1;

// The transition will start at the inner edge (radius - softness)
// and end at the outer edge (radius + softness).
let inner_edge = radius - softness;
let outer_edge = radius + softness;

// `smoothstep` creates a smooth gradient in the softness region.
// We subtract from 1.0 to make the inside of the circle white (1.0).
let soft_circle = 1.0 - smoothstep(inner_edge, outer_edge, dist);

Useful for fading color channels at different rates or applying a smooth threshold to a vector.

let my_color = vec3<f32>(0.2, 0.6, 0.9);

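// Use a single range for all color channels. All three arguments must have
// the same type, so the scalar bounds are expanded into vectors.
// Each channel of `my_color` will be smoothly mapped from 0 to 1
// as it crosses the [0.3, 0.8] range.
let faded_color_uniform = smoothstep(vec3<f32>(0.3), vec3<f32>(0.8), my_color);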

// Use a different range for each color channel.
let low_thresholds = vec3<f32>(0.1, 0.5, 0.8);
let high_thresholds = vec3<f32>(0.3, 0.7, 0.9);
let faded_color_per_channel = smoothstep(low_thresholds, high_thresholds, my_color);
// Red channel fades between 0.1 and 0.3.
// Green channel fades between 0.5 and 0.7.
// Blue channel fades between 0.8 and 0.9.

step

Performs a binary threshold operation, returning 0.0 or 1.0 based on a comparison. It is a fundamental function for creating hard-edged shapes and conditional masks without using branches (if statements).

Overload: Scalar and Vector Inputs

This overload compares two values or vectors of the same type.

W3C Specification: step

Signature

step(edge: T, x: T) -> T

  • edge: The threshold value or vector.

  • x: The input value or vector to be tested.

  • T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).

Description

Compares the input x to the edge.

  • If x is greater than or equal to edge, the result is 1.0.

  • If x is less than edge, the result is 0.0.

When the inputs are vectors, the comparison is performed component-wise, and the result is a vector of 0.0s and 1.0s.
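
The comparison rule can be expressed with select, and the resulting 0.0/1.0 value is typically fed into mix as a mask. The sketch below is illustrative and both function names are hypothetical.

```wgsl
// Hypothetical reference version of step() for f32.
fn step_by_hand(edge: f32, x: f32) -> f32 {
    // 1.0 when x has reached the edge, 0.0 otherwise.
    return select(0.0, 1.0, x >= edge);
}

// Hypothetical usage: pick between two colors with a hard cut at `edge`.
fn hard_cut(below: vec3<f32>, above: vec3<f32>, edge: f32, x: f32) -> vec3<f32> {
    return mix(below, above, step(edge, x));
}
```

Here hard_cut returns below while x is less than edge and above once x is greater than or equal to edge.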

Scalar Edge (No Dedicated Overload)

Unlike GLSL, WGSL does not provide an overload that takes a scalar edge with a vector x. Both arguments must have the same type. To compare each component of a vector to a single scalar threshold, construct a vector from the scalar first.

Signature

step(edge: T, x: T) -> T

  • edge: The threshold, expanded into a vector (e.g., vec3<f32>(0.5)).

  • x: The input vector to be tested.

  • T: The vector type, vecN<f32> or vecN<f16>.

Description

Compares each component of the input vector x against the same threshold value. The result is a vector of 0.0s and 1.0s.

Use Case

// Use Case 1: Creating a hard-edged circle
let center = vec2<f32>(0.5);
let radius = 0.4;
let dist = distance(uv, center);

// This creates a mask. `step` returns 0.0 inside the circle and 1.0 outside.
let mask = step(radius, dist);

// We subtract from 1.0 to invert it, making the circle white.
let circle = 1.0 - mask;

// Use Case 2: Component-wise threshold
let color = vec3<f32>(0.2, 0.6, 0.8);
let threshold = vec3<f32>(0.5, 0.5, 0.5);
let result = step(threshold, color);
// `result` is vec3<f32>(0.0, 1.0, 1.0), because only the green and blue
// components were greater than or equal to 0.5.

// Use Case 3: A single scalar threshold for every component
let brightness_threshold = 0.5;

// `step` needs both arguments to have the same type, so the scalar
// threshold is expanded into a vector before the comparison.
let scalar_edge_result = step(vec3<f32>(brightness_threshold), color);
// `scalar_edge_result` is vec3<f32>(0.0, 1.0, 1.0).

Logical & Selection Functions

This family of functions provides the core tools for performing logical operations and conditional selection, often serving as high-performance, "branchless" alternatives to if statements.

GPUs achieve their speed by having many threads execute the same instruction in lock-step. An if statement can break this parallelism if some threads in a group take a different path than others (a situation called "thread divergence"). The functions in this category are designed to perform decision-making tasks without introducing this branching, which can lead to more efficient and predictable code.

They serve two main purposes:

  • Vector Reduction: Functions that take a vector of booleans (often the result of a component-wise comparison) and collapse it into a single true or false result.

  • Conditional Selection: Functions that choose between two values based on a boolean condition, acting as WGSL's equivalent of a ternary operator.

Mastering these functions is key to writing concise, readable, and highly efficient shader logic.

all

Signature

  • all(e: vecN<bool>) -> bool

  • all(e: bool) -> bool

  • e: The input boolean or vector of booleans.

  • N: For the vector overload, must be 2, 3, or 4.

Description

Performs a logical AND operation on the input. The behavior depends on the input type:

  • For a vecN<bool>: Returns true if and only if every component of the vector e is true. This is a "vector reduction," collapsing a boolean vector into a single result.

  • For a single bool: Acts as an identity function, simply returning the input boolean e without modification. This overload exists mainly for language completeness.

W3C Specification: all

Use Case

The primary use case is for the vector overload, which allows you to make a single decision based on a component-wise comparison of two vectors.

let reference_point = vec3<f32>(1.0, 2.0, 3.0);
let point_a = vec3<f32>(1.5, 2.5, 3.5); // All components are greater
let point_b = vec3<f32>(0.5, 2.5, 3.5); // One component is not greater

// The component-wise comparison results in a boolean vector.
let a_is_greater = point_a > reference_point; // -> vec3<bool>(true, true, true)
let b_is_greater = point_b > reference_point; // -> vec3<bool>(false, true, true)

// `all` reduces the boolean vector to a single bool.
let result_a = all(a_is_greater); // -> true
let result_b = all(b_is_greater); // -> false
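
A common practical form of this reduction is a bounding-box test. The sketch below uses a hypothetical helper name and assumes box_min is less than or equal to box_max on every axis.

```wgsl
// Hypothetical helper: true when `p` lies inside the axis-aligned box.
// Assumes box_min <= box_max on every axis.
fn inside_box(p: vec3<f32>, box_min: vec3<f32>, box_max: vec3<f32>) -> bool {
    // Each comparison yields a vec3<bool>; `all` requires every axis to pass.
    return all(p >= box_min) && all(p <= box_max);
}
```

If a single axis is out of range, one of the two all calls returns false and the point is rejected.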

any

Signature

  • any(e: vecN<bool>) -> bool

  • any(e: bool) -> bool

  • e: The input boolean or vector of booleans.

  • N: For the vector overload, must be 2, 3, or 4.

Description

Performs a logical OR operation on the input. The behavior depends on the input type:

  • For a vecN<bool>: Returns true if one or more components of the vector e are true. It only returns false if all components are false. This is a "vector reduction," collapsing a boolean vector into a single result.

  • For a single bool: Acts as an identity function, simply returning the input boolean e without modification. This overload exists mainly for language completeness.

W3C Specification: any

Use Case

The primary use case is for the vector overload, which allows you to check if at least one component of a vector satisfies a condition.

let thresholds = vec3<f32>(0.5, 0.8, 0.3);
let values = vec3<f32>(0.6, 0.7, 0.9);

// The component-wise comparison results in a boolean vector.
// 0.6 > 0.5 -> true
// 0.7 > 0.8 -> false
// 0.9 > 0.3 -> true
let is_over_threshold = values > thresholds; // -> vec3<bool>(true, false, true)

// `any` reduces the boolean vector to a single bool.
// Since at least one component was true, the result is true.
let result = any(is_over_threshold); // -> true

if (result) {
    // This code will execute because the x and z components passed the check.
}
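
The same pattern is handy for rejecting out-of-range coordinates. The sketch below checks whether a UV coordinate has left the [0, 1] square. The helper name is hypothetical.

```wgsl
// Hypothetical helper: true when any component of `uv` is outside [0, 1].
fn uv_out_of_range(uv: vec2<f32>) -> bool {
    // One failing component on either side is enough.
    return any(uv < vec2<f32>(0.0)) || any(uv > vec2<f32>(1.0));
}
```

This is the logical complement of an all-based containment test: the point is outside as soon as one axis fails.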

select

The select function is WGSL's branchless equivalent of a ternary operator (like condition ? value_if_true : value_if_false in other languages). It chooses between two values based on a boolean condition without using an if statement, which can be more performant on the GPU.

W3C Specification: select

Overload: Scalar Condition

This is the most common overload. It selects one of two complete values (scalars or vectors) based on a single boolean condition.

Signature

select(f: T, t: T, cond: bool) -> T

  • f: The value to be returned if cond is false.

  • t: The value to be returned if cond is true.

  • cond: The single boolean condition.

  • T: Can be any scalar (i32, f32, etc.) or vector (vec3<f32>, etc.) type. f and t must be of the same type.

Description

Returns the value t if the boolean cond is true, and returns the value f otherwise.

Crucially, this is a branchless operation. The GPU evaluates both the t and f expressions, and then simply selects the correct result. This avoids the performance cost of "thread divergence" that can happen with an if statement. However, it also means you should never use expensive, mutually-exclusive functions for both the t and f arguments, as both will always be executed.

Overload: Vector Condition

This overload performs a component-wise selection, building a new vector by choosing components from two source vectors based on a boolean vector.

Signature

select(f: vecN<T>, t: vecN<T>, cond: vecN<bool>) -> vecN<T>

  • f, t: The source vectors.

  • cond: The vector of boolean conditions.

  • T: The scalar component type of the vectors.

  • N: The dimension of the vectors.

Description

Constructs a result vector where each component i is chosen from t[i] if cond[i] is true, or from f[i] if cond[i] is false. All three input vectors must have the same number of components.

Use Case

let brightness = 0.8;
let is_bright = brightness > 0.5; // is_bright is true

let black = vec3<f32>(0.0);
let white = vec3<f32>(1.0);

// Because is_bright is true, the function will select the `t` value (white).
let final_color = select(black, white, is_bright); // Result is vec3<f32>(1.0)

// Vector condition: component-wise selection
let a = vec3<f32>(1.0, 10.0, 100.0);
let b = vec3<f32>(5.0, 5.0, 5.0);

// The component-wise comparison results in a boolean vector.
// 1.0 > 5.0 -> false
// 10.0 > 5.0 -> true
// 100.0 > 5.0 -> true
let condition_vec = a > b; // Result is vec3<bool>(false, true, true)

// Build a new vector based on the condition vector:
// Result.x is from a.x because condition_vec.x is false.
// Result.y is from b.y because condition_vec.y is true.
// Result.z is from b.z because condition_vec.z is true.
let result = select(a, b, condition_vec); // Result is vec3<f32>(1.0, 5.0, 5.0)
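
Because both value arguments are evaluated before the choice is made, select cannot be used to skip an unsafe computation. One common workaround is to make the computation itself safe first, as in this sketch of a guarded reciprocal. The name safe_reciprocal and the choice of 0.0 as the fallback are assumptions made for illustration.

```wgsl
// Hypothetical helper: 1/x, or 0.0 when x is exactly zero.
fn safe_reciprocal(x: f32) -> f32 {
    let is_zero = x == 0.0;
    // Swap a zero divisor for 1.0 so the division below is always well defined,
    // even though its result is discarded in that case.
    let divisor = select(x, 1.0, is_zero);
    return select(1.0 / divisor, 0.0, is_zero);
}
```

The first select sanitizes the input and the second picks the final result, so no branch is required.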

Integer & Bitwise Functions

This family of functions operates on the binary representation of integers. Instead of treating them as numerical values for arithmetic, these functions treat i32 and u32 types as a sequence of 32 individual bits (0s and 1s).

They provide the tools to perform low-level bit manipulation, such as:

  • Counting the number of set (1) or unset (0) bits.

  • Finding the position of the first or last set bit.

  • Reversing the order of all bits within an integer.

  • Directly reading (extracting) or writing (inserting) specific segments of bits.

While less common in high-level visual effects than mathematical or texture functions, these operations are indispensable for performance-critical algorithms, custom data packing and unpacking, creating bitmasks and flags, and implementing hash functions or pseudo-random number generators. They offer precise control over data at its most fundamental level.

countLeadingZeros

Signature

countLeadingZeros(e: T) -> T

  • e: The input integer or vector of integers.

  • T: Can be i32, u32, or a vector of these types (e.g., vec3<u32>).

Description

Counts the number of consecutive 0 bits, starting from the most significant bit (the "leftmost" bit in a standard binary representation) of a 32-bit integer. This function is often abbreviated as "clz".

  • countLeadingZeros(0u) returns 32.

  • countLeadingZeros(1u) returns 31.

  • countLeadingZeros(2147483648u) (which is 1 followed by 31 zeros in binary) returns 0.

The function is applied component-wise when the input is a vector.

W3C Specification: countLeadingZeros

Use Case

This is a low-level bit manipulation function often used for performance-critical tasks like calculating an integer logarithm base 2 or finding the position of the most significant bit (MSB).

// Find the integer base-2 logarithm of a number, which is equivalent
// to finding the position of its most significant bit.
let value = 1000u; // Binary: ...01111101000, MSB is at position 9 (0-indexed)

// The number of leading zeros tells us how far the MSB is from the "left".
let leading_zeros = countLeadingZeros(value); // For 1000u, this is 22.

// For a non-zero u32, the position of the MSB is 31 minus the number of leading zeros.
let msb_position = 31u - leading_zeros; // 31 - 22 = 9

// The result, 9, is the integer log2 of 1000 (since 2^9 = 512, which is the
// largest power of 2 less than or equal to 1000).
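
Packaged as a function, the calculation needs one extra decision: what to do with an input of zero, for which the logarithm is undefined and 31u - 32u would wrap around. The sketch below treats zero like one and returns 0. That convention and the name ilog2 are assumptions made for illustration.

```wgsl
// Hypothetical helper: floor(log2(v)) for v >= 1.
// By convention in this sketch, an input of 0 also returns 0.
fn ilog2(v: u32) -> u32 {
    // max() keeps the input non-zero so the subtraction cannot wrap around.
    return 31u - countLeadingZeros(max(v, 1u));
}
```

With this helper, an input of 1000u gives 9u and an input of 1u gives 0u. For non-zero inputs the built-in firstLeadingBit returns the same position directly.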

countOneBits

Signature

countOneBits(e: T) -> T

  • e: The input integer or vector of integers.

  • T: Can be i32, u32, or a vector of these types (e.g., vec3<u32>).

Description

Counts the number of bits that are set to 1 in the binary representation of an integer. This operation is also known as "population count" or "popcount".

  • countOneBits(0u) returns 0.

  • countOneBits(7u) (binary ...0111) returns 3.

  • countOneBits(4294967295u) (all 1s for a u32) returns 32.

  • For negative numbers, the count is based on the two's complement representation. countOneBits(-1i) is 32.

The function is applied component-wise when the input is a vector.

W3C Specification: countOneBits

Use Case

This is a low-level bit manipulation function used in a variety of algorithms, including cryptography, data compression, and implementing certain types of hash functions or pseudo-random number generators.

// A simple example of using popcount to measure the "bit density" of a number.
let val_a = 7u; // ...00000111
let val_b = 8u; // ...00001000

let popcount_a = countOneBits(val_a); // Result is 3
let popcount_b = countOneBits(val_b); // Result is 1

// It can be a building block for a pseudo-random generator (hash function).
// This is a toy example, not a high-quality hash.
fn simple_hash(n: u32) -> u32 {
    let a = n ^ (n << 13u);
    let b = a ^ (a >> 7u);
    return countOneBits(b);
}
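
Another compact use of popcount is the Hamming distance, the number of bit positions in which two values differ. The helper name below is hypothetical.

```wgsl
// Hypothetical helper: number of bit positions where `a` and `b` differ.
fn hamming_distance(a: u32, b: u32) -> u32 {
    // XOR leaves a 1 exactly where the two inputs disagree.
    return countOneBits(a ^ b);
}
```

For 0xBu (binary 1011) and 0x6u (binary 0110), the XOR is 1101, so the distance is 3.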

countTrailingZeros

Signature

countTrailingZeros(e: T) -> T

  • e: The input integer or vector of integers.

  • T: Can be i32, u32, or a vector of these types (e.g., vec3<u32>).

Description

Counts the number of consecutive 0 bits, starting from the least significant bit (the "rightmost" bit in a standard binary representation) of a 32-bit integer. This function is often abbreviated as "ctz".

  • countTrailingZeros(0u) returns 32.

  • countTrailingZeros(8u) (binary ...1000) returns 3.

  • countTrailingZeros(7u) (binary ...0111) returns 0.

The function is applied component-wise when the input is a vector.

W3C Specification: countTrailingZeros

Use Case

This is a low-level bit manipulation function. Its primary use is to find the position of the least significant set bit (LSB). It can also be used to quickly determine the largest power of 2 that a number is divisible by.

// A number whose binary representation is ...0101100.
let value = 44u;

// The number of trailing zeros is equal to the 0-indexed position
// of the least significant '1' bit.
// For ...0101100, there are two trailing zeros, so the LSB is at position 2.
let lsb_position = countTrailingZeros(value); // Result is 2.

// This is useful for algorithms that need to isolate and process the
// lowest set bit in a bitmask.
// 2 to the power of `lsb_position` gives us a mask for that bit.
let lsb_mask = 1u << lsb_position; // 1u << 2u is 4u (binary ...100)
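
Isolating the lowest set bit becomes most useful inside a loop that visits every set bit of a mask. The sketch below sums the visited positions only so that there is something to return. The function name is hypothetical.

```wgsl
// Hypothetical helper: visits each set bit of `mask`, lowest first,
// and returns the sum of the visited bit positions.
fn sum_set_bit_positions(mask: u32) -> u32 {
    var remaining = mask;
    var total = 0u;
    while (remaining != 0u) {
        // Position of the lowest bit that is still set.
        let position = countTrailingZeros(remaining);
        total += position;
        // Clear that bit; subtracting 1 flips it and every bit below it.
        remaining &= remaining - 1u;
    }
    return total;
}
```

For 44u (binary 101100) the loop visits positions 2, 3, and 5 and returns 10. The loop runs once per set bit, not once per bit of the integer.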

extractBits

Extracts a sequence of bits from a specific location within an integer. The behavior of the function differs significantly depending on whether the integer is signed or unsigned.

W3C Specification: extractBits

Overload: Signed Integers (i32)

This overload reads bits from a signed integer and performs sign extension on the result.

Signature

extractBits(e: T, offset: u32, count: u32) -> T

  • e: The source integer or vector of integers.

  • offset: The 0-indexed starting bit position to begin extracting from.

  • count: The number of bits to extract.

  • T: Can be i32 or vecN<i32>.

Description

Extracts a count-bit sequence from e starting at bit offset. The key feature is sign extension: the most significant bit of the extracted sequence is copied to all the higher bits of the 32-bit result. This preserves the numerical value of the extracted bits as if they were a smaller signed integer type.

The offset and count are clamped to safe values to prevent out-of-bounds reads. It is a compile-time error if offset + count is greater than 32 and both parameters are compile-time constants.

Overload: Unsigned Integers (u32)

This overload reads bits from an unsigned integer and performs zero extension on the result.

Signature

extractBits(e: T, offset: u32, count: u32) -> T

  • e: The source integer or vector of integers.

  • offset: The 0-indexed starting bit position.

  • count: The number of bits to extract.

  • T: Can be u32 or vecN<u32>.

Description

Extracts a count-bit sequence from e starting at bit offset. The key difference from the signed version is zero extension: all higher bits of the 32-bit result that were not part of the extraction are filled with 0s.

The offset and count are clamped to safe values. It is a compile-time error if offset + count is greater than 32 and both parameters are compile-time constants.

Use Case

For the signed overload: used for unpacking multiple smaller signed values that have been packed into a single i32. Sign extension is critical for correctly restoring negative values.

// Imagine two 4-bit signed numbers are packed into the lowest 8 bits of an i32.
// High bits: 1011 (which is -5 in a 4-bit signed integer)
// Low bits:  0101 (which is +5 in a 4-bit signed integer)
// The combined bit pattern is ...000010110101
let packed_data = 181i; // 10110101 in binary

// Extract the high 4 bits (the negative number)
// Offset is 4, count is 4.
// Extracted bits are '1011'. The most significant bit of this sequence is '1'.
// Sign extension will fill the upper 28 bits of the result with '1's.
// Result: ...11111011, which is the correct 32-bit representation of -5.
let high_value = extractBits(packed_data, 4u, 4u); // Result is -5

// Extract the low 4 bits (the positive number)
// Offset is 0, count is 4.
// Extracted bits are '0101'. The most significant bit of this sequence is '0'.
// The upper 28 bits of the result are filled with '0's.
// Result: ...00000101, which is the correct 32-bit representation of 5.
let low_value = extractBits(packed_data, 0u, 4u); // Result is 5

For the unsigned overload: used for unpacking multiple smaller unsigned values, such as color channels or other non-negative data packed into a single u32.

// Imagine an RGBA color where R and G are 4 bits each, packed into a u32.
// High bits (G): 1011 (which is the unsigned value 11)
// Low bits  (R): 0101 (which is the unsigned value 5)
// The combined bit pattern is ...000010110101
let packed_color = 181u;

// Extract the high 4 bits (the Green channel)
// Offset is 4, count is 4.
// Extracted bits are '1011'.
// Zero extension fills the upper bits with '0's.
// Result: ...00001011, which is the correct 32-bit representation of 11.
let green_channel = extractBits(packed_color, 4u, 4u); // Result is 11

// Extract the low 4 bits (the Red channel)
// Offset is 0, count is 4.
// Extracted bits are '0101'.
// Zero extension fills the upper bits with '0's.
// Result: ...00000101, which is the correct 32-bit representation of 5.
let red_channel = extractBits(packed_color, 0u, 4u); // Result is 5
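
To make the difference between zero extension and sign extension concrete, here is a rough sketch of both overloads written with plain shifts and masks. The helper names are hypothetical, and unlike the built-in these versions do no clamping, so they assume 1 <= count <= 31 and offset + count <= 32:

```wgsl
// Hypothetical reference helpers, not WGSL built-ins.
// Both assume 1 <= count <= 31 and offset + count <= 32.

// Zero extension: move the field down to bit 0, then mask off everything above it.
fn extract_unsigned_manual(e: u32, offset: u32, count: u32) -> u32 {
    let mask = (1u << count) - 1u;
    return (e >> offset) & mask;
}

// Sign extension: move the field up so its top bit lands in bit 31,
// then shift back down. `>>` on an i32 is an arithmetic shift, so the
// field's top bit is copied into all the vacated upper bits.
fn extract_signed_manual(e: i32, offset: u32, count: u32) -> i32 {
    let up = 32u - offset - count;
    return (e << up) >> (32u - count);
}

// extract_unsigned_manual(181u, 4u, 4u) is 11u
// extract_signed_manual(181i, 4u, 4u) is -5i
```

In real shader code the built-in is the better choice, since it handles the edge cases that this sketch excludes.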

firstLeadingBit

Finds the position of the most significant 1 bit in an integer's binary representation. This corresponds to findMSB in GLSL and firstbithigh in HLSL.

W3C Specification: firstLeadingBit

Overload: Unsigned Integers (u32)

This is the most common and intuitive version of the function.

Signature

firstLeadingBit(e: T) -> T

  • e: The source unsigned integer or vector.

  • T: Can be u32 or vecN<u32>.

Description

Finds the index of the most significant (leftmost) 1 bit in the binary representation of e.

  • If e is not zero, the result is the 0-indexed position of the highest set bit.

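  • If e is zero, the result is T(-1). Because T is unsigned here, that is the value with every bit set: 0xFFFFFFFFu (4294967295), not a negative number.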

Use Case

This function is a highly efficient way to calculate the integer base-2 logarithm of a number, which is equivalent to finding the position of its most significant bit.

// A number whose binary representation is ...00101000.
// The most significant '1' bit is at position 5 (reading from the right, 0-indexed).
let value_a = 40u;
let msb_pos_a = firstLeadingBit(value_a); // Result is 5

// Another example. Binary is ...01000000
let value_b = 64u;
let msb_pos_b = firstLeadingBit(value_b); // Result is 6

// Case where the input is zero.
let value_c = 0u;
let msb_pos_c = firstLeadingBit(value_c); // Result is 0xFFFFFFFFu (all bits set, the u32 form of -1)
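
As one illustration of the integer log2 use, the length of a full mip chain is floor(log2(largest dimension)) + 1. The helper below is a hypothetical sketch, not part of WGSL:

```wgsl
// Hypothetical helper, not a WGSL built-in.
// Number of mip levels in a full chain for a width x height texture:
// floor(log2(max(width, height))) + 1.
fn full_mip_chain_length(width: u32, height: u32) -> u32 {
    // Guard against zero so firstLeadingBit never sees 0u here.
    let largest = max(max(width, height), 1u);
    return firstLeadingBit(largest) + 1u;
}

// full_mip_chain_length(1024u, 512u) is 11u
// full_mip_chain_length(1u, 1u) is 1u
```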

Overload: Signed Integers (i32)

This overload operates on the two's complement representation of a signed integer, which can lead to different results than the unsigned version, especially for negative numbers.

Signature

firstLeadingBit(e: T) -> T

  • e: The source signed integer or vector.

  • T: Can be i32 or vecN<i32>.

Description

Finds the index of the most significant bit that differs from the sign bit in the two's complement binary representation of e.

  • For positive numbers, the sign bit is 0, so the behavior is the same as the u32 version. firstLeadingBit(40i) is 5.

  • For negative numbers, the sign bit is 1, so the result is the position of the leftmost 0 bit. For example, -8i is ...11111000, and firstLeadingBit(-8i) is 2.

  • If e is -1 (all bits are 1), the result is -1.

  • If e is zero, the result is -1.

Use Case

This is a specialized function. It is rarely used directly for visual effects. Its primary use is in low-level numerical algorithms where operating on the two's complement representation of numbers is necessary. For most "find my highest bit" tasks, the u32 version is what you want.

// Positive number, same as u32 version.
let val_pos = 40i;
let msb_pos_pos = firstLeadingBit(val_pos); // Result is 5

// For negative numbers, the two's complement representation is used.
// -1 in i32 is 32 ones (...11111111). There are no '0' bits to be the "leading bit".
// The spec defines this as a special case.
let val_neg_one = -1i;
let msb_pos_neg_one = firstLeadingBit(val_neg_one); // Result is -1
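
The example above only covers -1. For other negative inputs, one way to reason about the result is that it matches the highest set bit of the bitwise complement. The helper name below is hypothetical:

```wgsl
// Hypothetical helper, not a WGSL built-in.
// Assumes e < 0. ~e flips every bit, turning the leading 1s into 0s,
// so the highest 0 bit of e becomes the highest 1 bit of ~e.
fn leading_bit_of_negative(e: i32) -> i32 {
    return firstLeadingBit(~e);
}

// -8i is ...11111000. The highest bit that differs from the sign bit is bit 2.
// firstLeadingBit(-8i) is 2
// leading_bit_of_negative(-8i) is firstLeadingBit(7i), which is also 2
```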

firstTrailingBit

Signature

firstTrailingBit(e: T) -> T

  • e: The input integer or vector of integers.

  • T: Can be i32, u32, or a vector of these types (e.g., vec3<u32>).

Description

Finds the position of the least significant 1 bit in an integer's binary representation. This is also known as "find first set" from the right.

  • If e is not zero, the result is the 0-indexed position of the rightmost 1 bit.

  • If e is zero, the result is T(-1): -1 for i32, or 0xFFFFFFFFu (all bits set) for u32.

Unlike firstLeadingBit, the behavior of this function is consistent for both positive and negative signed integers because the least significant bits are not affected by the two's complement sign representation in the same way.

W3C Specification: firstTrailingBit

Use Case

This function is a highly efficient way to find the position of the lowest set bit in a bitmask. For non-zero inputs it returns the same value as countTrailingZeros, so the choice between the two can follow whichever name better matches the algorithm's intent. The two differ only for a zero input: countTrailingZeros returns 32, while firstTrailingBit returns T(-1).

// A number whose binary representation is ...0101100.
// The least significant '1' bit is at position 2 (reading from the right, 0-indexed).
let value = 44u;
let lsb_position = firstTrailingBit(value); // Result is 2.

// This result is identical to `countTrailingZeros(44u)`.

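// Case where the input is zero.
// (countTrailingZeros(0u) would return 32u here instead.)
let value_zero = 0u;
let lsb_pos_zero = firstTrailingBit(value_zero); // Result is 0xFFFFFFFFu (all bits set, the u32 form of -1)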
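
A common pattern built on this function is visiting every set bit of a mask, lowest first. The sketch below uses a hypothetical helper name and simply sums the bit positions it visits. A real shader would do its per-bit work where the sum is updated:

```wgsl
// Hypothetical helper, not a WGSL built-in.
fn sum_of_set_bit_positions(mask_in: u32) -> u32 {
    var mask = mask_in;
    var total = 0u;
    while (mask != 0u) {
        total += firstTrailingBit(mask); // position of the lowest set bit
        mask &= mask - 1u;               // clear that bit
    }
    return total;
}

// sum_of_set_bit_positions(44u) visits bits 2, 3 and 5, so the result is 10u
```

Checking for zero before calling firstTrailingBit keeps the all-bits-set sentinel out of the arithmetic.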

insertBits

Signature

insertBits(e: T, newbits: T, offset: u32, count: u32) -> T

  • e: The base integer or vector to insert bits into.

  • newbits: The integer or vector containing the bits to insert.

  • offset: The 0-indexed starting bit position in e where insertion begins.

  • count: The number of bits to insert.

  • T: Can be i32, u32, or a vector of these types (e.g., vec3<u32>).

Description

Constructs a new integer by replacing a sequence of bits in e with bits from newbits.

Specifically, it takes the count least significant bits from newbits and copies them into the bit range [offset, offset + count - 1] of the base value e. All other bits from e are preserved.

The offset and count parameters are clamped to safe values to prevent out-of-bounds writes. It is a compile-time error if offset + count is greater than 32 and both parameters are compile-time constants.

W3C Specification: insertBits

Use Case

This is a low-level bit manipulation function used for packing multiple smaller values into a single larger integer. It is the inverse operation of extractBits.

// Let's pack a 4-bit Red and a 4-bit Green channel into a single u32.
var packed_color = 0u; // Start with an empty integer

let red_channel = 5u;  // Binary ...0101
let green_channel = 11u; // Binary ...1011

// Insert the 4 bits of the red channel at the beginning (offset 0).
// packed_color is now ...00000101
packed_color = insertBits(packed_color, red_channel, 0u, 4u);

// Insert the 4 bits of the green channel starting at bit 4.
// packed_color is now ...000010110101
packed_color = insertBits(packed_color, green_channel, 4u, 4u);

// The final packed_color is 181u, which contains both pieces of data.
// This can now be unpacked using `extractBits`.
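
Because all other bits are preserved, insertBits is also convenient for updating one field of an already packed value. The sketch below, with hypothetical helper names, shows the built-in next to an equivalent mask-based version for the 4-bit green field used above:

```wgsl
// Hypothetical helpers, not WGSL built-ins.
fn set_green(packed: u32, green: u32) -> u32 {
    return insertBits(packed, green, 4u, 4u);
}

// Same effect with explicit masking, for comparison.
fn set_green_manual(packed: u32, green: u32) -> u32 {
    let field_mask = 0xFu << 4u; // bits 4..7
    return (packed & ~field_mask) | ((green << 4u) & field_mask);
}

// set_green(181u, 3u) is 53u (binary 00110101): green is now 3, red is still 5
```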

reverseBits

Signature

reverseBits(e: T) -> T

  • e: The input integer or vector of integers.

  • T: Can be i32, u32, or a vector of these types (e.g., vec3<u32>).

Description

Reverses the order of the 32 bits in the binary representation of e. The most significant bit (bit 31) becomes the least significant bit (bit 0), and so on.

For example, for a u32, the bit at position k in the result is taken from the bit at position 31 - k in the input.

W3C Specification: reverseBits

Use Case

This is a low-level bit manipulation function used in specific algorithms that require reordering of data at the bit level, such as certain Fast Fourier Transform (FFT) implementations, Cyclic Redundancy Checks (CRCs), or as a component in some hash functions.

// Let's take the number 13u.
// Its 32-bit binary representation is:
// 00000000 00000000 00000000 00001101
let value = 13u;

// `reverseBits` will flip this entire sequence.
let reversed_value = reverseBits(value);

// The result's binary representation will be:
// 10110000 00000000 00000000 00000000
// which is the decimal value 2952790016u.

Buffer and Array Functions

This category contains functions that query metadata about buffer resources, specifically those containing arrays whose size is not known at shader compilation time.

In WGSL, storage buffers can be declared with runtime-sized arrays, meaning their final length is determined by the buffer that is bound on the CPU side. This is a powerful feature for handling dynamic data, but it requires a special mechanism within the shader to determine the array's bounds. The functions in this group provide that mechanism, allowing you to safely query the size of a runtime array to prevent out-of-bounds memory access.

arrayLength

Signature

arrayLength(p: ptr<storage, array<E>, AM>) -> u32

  • p: A pointer to a runtime-sized array.

  • E: The element type of the array (e.g., f32 for array<f32>).

  • AM: The access mode of the storage buffer, either read or read_write.

Description

Returns the number of elements in a runtime-sized array that is stored in a storage buffer.

This function has a critical structural requirement: the runtime-sized array must live in a storage buffer, either as the store type of the storage buffer variable itself or as the last member of a struct that is the variable's store type. The size is not known at shader compilation time; it is determined by the size of the actual buffer that is bound on the CPU side at runtime.

This function is the primary mechanism for safely working with dynamic-sized data in shaders, as it allows you to query the buffer's bounds before accessing it.

W3C Specification: arrayLength

Use Case

The most common use case is to get the size of an input buffer in a compute shader to use as a loop bound or for safety checks to prevent out-of-bounds memory access.

// This struct defines the layout of our storage buffer.
// The runtime-sized array MUST be the last member.
struct MyBuffer {
    some_config_value: f32,
    data: array<f32>,
}

@group(0) @binding(0) var<storage, read> my_buffer: MyBuffer;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    // Get the total number of elements in the 'data' array at runtime.
    // Note the '&' to pass a pointer to the array.
    let num_elements = arrayLength(&my_buffer.data);

    // Use the length to prevent reading past the end of the buffer.
    if (global_id.x >= num_elements) {
        return;
    }

    // Now it's safe to access the element.
    let value = my_buffer.data[global_id.x];
    // ... do work with the value ...
}
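
When no extra header fields are needed, the storage variable can be the runtime-sized array itself. The sketch below assumes a hypothetical binding named `values` and uses arrayLength as the bound of a grid-stride loop, so the shader stays correct even when fewer invocations are dispatched than there are elements:

```wgsl
// Hypothetical binding: the buffer is just a runtime-sized array.
@group(0) @binding(0) var<storage, read_write> values: array<f32>;

@compute @workgroup_size(64)
fn scale_all(
    @builtin(global_invocation_id) gid: vec3<u32>,
    @builtin(num_workgroups) num_groups: vec3<u32>,
) {
    let count = arrayLength(&values);
    // Total number of invocations along x (64 per workgroup).
    let stride = num_groups.x * 64u;

    // Each invocation handles elements gid.x, gid.x + stride, ...
    for (var i = gid.x; i < count; i += stride) {
        values[i] = values[i] * 2.0;
    }
}
```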

Type Conversion & Reinterpretation

This category contains functions that perform low-level reinterpretation of a value's underlying binary representation.

Unlike standard type conversions (e.g., f32(my_int)), which change a value from one numerical representation to another while preserving its meaning, these functions change the type label of a block of memory without altering the raw bits.

This is an advanced operation used for specialized algorithms, data packing, or when you need to operate directly on the bit pattern of a number. The core principle for these operations is that the total bit-width of the source and destination types must be identical.

bitcast

A low-level operation that reinterprets the raw binary bits of an input value as if they were of a different type, without changing the bit pattern itself. This is not a traditional type conversion that preserves numerical value (e.g., bitcast<i32>(1.0) will not result in 1).

The fundamental rule of bitcast is that the total bit-width of the input and output types must be identical. The target type T is specified using a unique angle-bracket syntax: bitcast<T>(...).

W3C Specification: bitcast

Overloads: Same-Size Reinterpretation (f32, i32, u32)

This group of overloads reinterprets the bits between the fundamental 32-bit types.

Signature

  • bitcast<T>(e: S) -> T (Scalar)

  • bitcast<vecN<T>>(e: vecN<S>) -> vecN<T> (Vector)

  • S: The source type, can be f32, i32, or u32.

  • T: The target type, a different type from S, can be f32, i32, or u32.

Description

Reinterprets the 32 bits of a scalar or each 32-bit component of a vector. For example, bitcast<u32>(1.0) takes the 32 bits that represent the f32 value 1.0 (which is 0x3F800000 in hex) and interprets them as a u32, resulting in the integer 1065353216.

Use Case

Used for data packing or implementing highly specialized algorithms that rely on the binary representation of numbers.

// Use case: Packing a floating-point value into an integer for storage.
let my_float: f32 = -2.0;

// Reinterpret the 32 bits of the float as a 32-bit unsigned integer.
// The result is not 2, but the integer value of the float's bit pattern.
let my_packed_int: u32 = bitcast<u32>(my_float);

// You could now store this integer in a buffer that only accepts integers.
// To retrieve it, you would bitcast back:
let my_unpacked_float = bitcast<f32>(my_packed_int); // Restores -2.0
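
Another use of the same-size overloads is inspecting the fields of a float directly. The following sketch uses hypothetical helper names and assumes the input is a normal, finite f32 (1 sign bit, 8 exponent bits with a bias of 127, 23 mantissa bits):

```wgsl
// Hypothetical helpers, not WGSL built-ins. Assume x is a normal, finite f32.
fn sign_bit(x: f32) -> u32 {
    return bitcast<u32>(x) >> 31u;
}

fn unbiased_exponent(x: f32) -> i32 {
    // Bits 23..30 hold the exponent, stored with a bias of 127.
    let biased = extractBits(bitcast<u32>(x), 23u, 8u);
    return i32(biased) - 127;
}

// -2.0 has the bit pattern 0xC0000000:
// sign_bit(-2.0) is 1u, unbiased_exponent(-2.0) is 1
```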

Overloads: Packing and Unpacking (f16)

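This group of overloads reinterprets bits between types of different sizes but the same total bit-width. This is used for packing smaller f16 values into larger 32-bit types and vice versa. The f16 type is only available when the shader module starts with the enable f16; directive and the device supports the shader-f16 feature.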

Signature

  • bitcast<T>(e: vec2<f16>) -> T (Pack two f16 into one 32-bit value)

  • bitcast<vec2<f16>>(e: T) -> vec2<f16> (Unpack one 32-bit value into two f16s)

  • bitcast<vec2<T>>(e: vec4<f16>) -> vec2<T> (Pack four f16 into two 32-bit values)

  • bitcast<vec4<f16>>(e: vec2<T>) -> vec4<f16> (Unpack two 32-bit values into four f16s)

  • T: Can be f32, i32, or u32.

Description

These overloads allow you to treat a block of memory as different types. For example, since an f16 is 16 bits, two of them (vec2<f16>) occupy the same 32 bits of memory as a single f32. bitcast allows you to convert between these representations.

Use Case

Used for memory optimization, allowing you to store data with lower precision (f16) and unpack it to higher precision (f32) for calculations.

// Imagine you have two f16 values representing UV coordinates.
let uvs_f16 = vec2<f16>(0.5, 0.25);

// You can pack them into a single u32 to save space in a storage buffer.
// The 32 bits of the u32 will now contain the 16 bits of the first f16
// and the 16 bits of the second f16.
let packed_uvs = bitcast<u32>(uvs_f16);

// Later, in another shader, you can unpack them to do math.
let unpacked_uvs = bitcast<vec2<f16>>(packed_uvs);

// And convert to f32 for high-precision calculations.
let uvs_f32 = vec2<f32>(unpacked_uvs);
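
Put together as a module-level sketch, the round trip could look like this. The helper names are hypothetical, and the code assumes the device was created with the shader-f16 feature:

```wgsl
// The f16 extension must be enabled before any other declarations.
enable f16;

// Hypothetical helpers, not WGSL built-ins.
fn pack_uv_f16(uv: vec2<f32>) -> u32 {
    // Convert each component to f16, then reinterpret the 32 bits as a u32.
    return bitcast<u32>(vec2<f16>(uv));
}

fn unpack_uv_f16(packed: u32) -> vec2<f32> {
    // Reinterpret the 32 bits as two f16 values, then widen to f32.
    return vec2<f32>(bitcast<vec2<f16>>(packed));
}
```

If the extension is not available, pack2x16float and unpack2x16float offer a similar half-precision packing using only f32 and u32 types.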

Overload: Identity Transform

This overload handles the case where the source and target types are the same.

Signature

bitcast<T>(e: T) -> T

  • T: A concrete numeric scalar or vector type (e.g., f32, vec3<i32>).

Description

When the source type e and the target type T are identical, bitcast performs an identity transform. It simply returns the input value e without any modification.

While this has no effect on the value, it exists for language completeness and can be useful in generic, programmatic shader generation where the source and target types might sometimes be the same.

Use Case

let my_val: f32 = 123.45;

// This is a valid operation, but it does nothing.
let same_val = bitcast<f32>(my_val); // `same_val` is also 123.45

Packing & Unpacking Functions

This family of functions provides a low-level way to optimize memory usage by converting multiple floating-point values into a single 32-bit unsigned integer (u32), and vice versa. This is a common and critical technique for efficiently storing vertex attributes or data in storage buffers.

The core idea is to reduce the precision of the data to a more compact format before storing it:

  • Packing functions take a vector of f32 values and convert each component to a smaller representation (like a 16-bit float or an 8/16-bit normalized integer). The bits of these smaller components are then packed together into a single u32. This is a lossy conversion, as precision is intentionally discarded.

  • Unpacking functions perform the reverse operation. They take a u32, split it into its smaller bit-chunks, and convert each chunk back into a f32 value according to the specified format.

These functions support several standard formats, including half-precision floats (float), unsigned normalized integers (unorm for values in the [0.0, 1.0] range like colors and UVs), and signed normalized integers (snorm for values in the [-1.0, 1.0] range like normals and tangents).


pack2x16float

Signature

pack2x16float(e: vec2<f32>) -> u32

  • e: The input vector containing two 32-bit floats.

Description

Packs two 32-bit floating-point values (f32) into a single 32-bit unsigned integer (u32).

The function performs this by converting each component of the input vec2<f32> to its nearest 16-bit half-precision representation (an f16). This is a lossy conversion, as the f16 type has less precision and a smaller range than f32.

The 16 bits of the first component (e.x) are placed into the lower half (bits 0-15) of the resulting u32, and the 16 bits of the second component (e.y) are placed into the upper half (bits 16-31).

Each input value must lie within the finite range of an f16, roughly -65504.0 to 65504.0. Providing a value outside that range, or NaN, will result in an error if the value is known at compile time, or an indeterminate u32 value at runtime. Values that are merely very small in magnitude are not an error; they lose precision or round to zero.

W3C Specification: pack2x16float

Use Case

This function is essential for memory optimization. It allows you to store data that does not require full 32-bit precision, such as texture coordinates, more compactly.

// A pair of texture coordinates that don't need high precision.
let uvs_f32 = vec2<f32>(0.5, 0.25);

// Pack the two 32-bit floats into a single 32-bit integer.
let packed_uvs: u32 = pack2x16float(uvs_f32);

// The `packed_uvs` variable now holds both UV coordinates in just 32 bits,
// whereas the original `vec2<f32>` required 64 bits. This is highly useful
// for storing data compactly in storage buffers. The corresponding
// `unpack2x16float` function is used to reverse the operation.

pack2x16snorm

Signature

pack2x16snorm(e: vec2<f32>) -> u32

  • e: The input vector containing two f32 values, typically in the range [-1.0, 1.0].

Description

Packs two 32-bit floating-point values into a single 32-bit unsigned integer, treating the inputs as "signed normalized" (snorm) numbers. This is the inverse operation of unpack2x16snorm.

The packing process works as follows for each component of the input vec2<f32>:

  1. The f32 value is first clamped to the [-1.0, 1.0] range.

  2. It is then scaled by 32767.0.

  3. The result is rounded to the nearest whole number.

  4. This number is stored as a 16-bit two's-complement signed integer.

The 16 bits of the first component (e.x) are placed into the lower half (bits 0-15) of the resulting u32, and the 16 bits of the second component (e.y) are placed into the upper half (bits 16-31).

W3C Specification: pack2x16snorm

Use Case

This function is used to compactly store data that naturally fits the [-1.0, 1.0] range, such as normal vectors or other signed directional data, saving significant memory.

// A normal vector's x and y components. The z component is often omitted and
// reconstructed later to save even more space.
let normal_xy = vec2<f32>(0.707, -0.707);

// Pack the two 32-bit floats into a single 32-bit integer.
let packed_normal_xy: u32 = pack2x16snorm(normal_xy);

// The `packed_normal_xy` variable now holds both vector components in just 32 bits.
// This can be stored efficiently in a storage buffer or texture.
// The `unpack2x16snorm` function is used to reverse the operation.
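
The four packing steps from the description can be written out by hand, which may help when debugging packed data on the CPU side. The helper name is hypothetical, and the built-in should be preferred in real shaders:

```wgsl
// Hypothetical reference helper, not a WGSL built-in.
fn pack2x16snorm_manual(e: vec2<f32>) -> u32 {
    // Steps 1-3: clamp, scale by 32767, round to the nearest whole number.
    let clamped = clamp(e, vec2<f32>(-1.0), vec2<f32>(1.0));
    let rounded = floor(0.5 + 32767.0 * clamped);

    // Step 4: keep the low 16 bits of each two's-complement integer.
    let x_bits = bitcast<u32>(i32(rounded.x)) & 0xFFFFu;
    let y_bits = bitcast<u32>(i32(rounded.y)) & 0xFFFFu;

    // x goes in bits 0-15, y in bits 16-31.
    return x_bits | (y_bits << 16u);
}

// For vec2<f32>(0.707, -0.707) the components become 23166 (0x5A7E)
// and -23166 (0xA582 as 16 bits), so the result works out to 0xA5825A7Eu.
```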

pack2x16unorm

Signature

pack2x16unorm(e: vec2<f32>) -> u32

  • e: The input vector containing two f32 values, typically in the range [0.0, 1.0].

Description

Packs two 32-bit floating-point values into a single 32-bit unsigned integer, treating the inputs as "unsigned normalized" (unorm) numbers. This is the inverse operation of unpack2x16unorm.

The packing process works as follows for each component of the input vec2<f32>:

  1. The f32 value is first clamped to the [0.0, 1.0] range.

  2. It is then scaled by 65535.0.

  3. The result is rounded to the nearest whole number.

  4. This number is stored as a 16-bit unsigned integer.

The 16 bits of the first component (e.x) are placed into the lower half (bits 0-15) of the resulting u32, and the 16 bits of the second component (e.y) are placed into the upper half (bits 16-31).

W3C Specification: pack2x16unorm

Use Case

This is the most common packing function for data that is naturally in the [0.0, 1.0] range, such as texture coordinates or HDR color components that have been tone-mapped.

// A pair of texture coordinates.
let uvs = vec2<f32>(0.25, 0.75);

// Pack the two 32-bit floats into a single 32-bit integer.
let packed_uvs: u32 = pack2x16unorm(uvs);

// The `packed_uvs` variable now holds both UV coordinates in just 32 bits,
// saving half the memory compared to a vec2<f32>.
// The `unpack2x16unorm` function is used to reverse the operation.

pack4x8snorm

Signature

pack4x8snorm(e: vec4<f32>) -> u32

  • e: The input vector containing four f32 values, typically in the range [-1.0, 1.0].

Description

Packs four 32-bit floating-point values into a single 32-bit unsigned integer, treating the inputs as "signed normalized" (snorm) numbers. This is the inverse operation of unpack4x8snorm.

This function offers greater compression than the pack2x16 variants, but with lower precision: each component gets 8 bits, and packing produces integers from -127 to 127, so 255 distinct values. The packing process works as follows for each component of the input vec4<f32>:

  1. The f32 value is first clamped to the [-1.0, 1.0] range.

  2. It is then scaled by 127.0.

  3. The result is rounded to the nearest whole number.

  4. This number is stored as an 8-bit two's-complement signed integer.

The four resulting 8-bit integers are then packed into the u32 result:

  • e.x is placed in bits 0-7.

  • e.y is placed in bits 8-15.

  • e.z is placed in bits 16-23.

  • e.w is placed in bits 24-31.

W3C Specification: pack4x8snorm

Use Case

This function is ideal for storing four-component data where each component is in the [-1.0, 1.0] range and high precision is not required. A common example is packing a tangent vector, which has an xyz direction and a w component for handedness.

// A tangent vector for normal mapping.
// The .xyz is the direction, and .w is the handedness (-1.0 or 1.0).
let tangent = vec4<f32>(0.707, 0.0, 0.707, 1.0);

// Pack the four 32-bit floats into a single 32-bit integer.
let packed_tangent: u32 = pack4x8snorm(tangent);

// The `packed_tangent` variable now holds the entire tangent vector in
// just 32 bits, a 4x memory saving compared to the original vec4<f32>.
// This is very efficient for storing vertex attributes.
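
On the reading side, a packed tangent is typically unpacked and combined with the normal to rebuild the tangent frame. The sketch below uses a hypothetical helper name and assumes the common convention that the bitangent is cross(normal, tangent) scaled by the handedness, with the packed w being either -1.0 or 1.0:

```wgsl
// Hypothetical helper, not a WGSL built-in.
fn decode_tangent_frame(packed_tangent: u32, normal: vec3<f32>) -> mat3x3<f32> {
    let t4 = unpack4x8snorm(packed_tangent);

    // 8-bit quantization shortens the vector slightly, so re-normalize.
    let t = normalize(t4.xyz);
    let n = normalize(normal);

    // Assumed convention: handedness flips the bitangent direction.
    let b = cross(n, t) * sign(t4.w);

    // Columns are tangent, bitangent, normal.
    return mat3x3<f32>(t, b, n);
}
```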

pack4x8unorm

Signature

pack4x8unorm(e: vec4<f32>) -> u32

  • e: The input vector containing four f32 values, typically in the range [0.0, 1.0].

Description

Packs four 32-bit floating-point values into a single 32-bit unsigned integer, treating the inputs as "unsigned normalized" (unorm) numbers. This is the inverse operation of unpack4x8unorm.

This function is one of the most common packing operations, as its format directly corresponds to the standard 8-bit RGBA color representation. The packing process works as follows for each component of the input vec4<f32>:

  1. The f32 value is first clamped to the [0.0, 1.0] range.

  2. It is then scaled by 255.0.

  3. The result is rounded to the nearest whole number.

  4. This number is stored as an 8-bit unsigned integer.

The four resulting 8-bit integers are then packed into the u32 result:

  • e.x (Red) is placed in bits 0-7.

  • e.y (Green) is placed in bits 8-15.

  • e.z (Blue) is placed in bits 16-23.

  • e.w (Alpha) is placed in bits 24-31.

W3C Specification: pack4x8unorm

Use Case

This function is the standard method for packing a floating-point RGBA color into a single 32-bit integer, which is how colors are often stored in textures and buffers to save memory.

// A standard RGBA color with floating-point components.
let my_color = vec4<f32>(1.0, 0.5, 0.25, 1.0); // An orange color

// Pack the four f32 components into a single u32.
let packed_color: u32 = pack4x8unorm(my_color);

// The `packed_color` now holds the entire color in 32 bits, a 4x memory saving.
// This integer value could be written directly to a texture format like `Rgba8Unorm`.
// For this input, the resulting integer is 4282417407u, which is
// 0xFF4080FF in hexadecimal (bytes AABBGGRR, from most to least significant).
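
The steps listed in the description can also be written out by hand. This is a sketch with a hypothetical helper name, meant only to show where each byte comes from:

```wgsl
// Hypothetical reference helper, not a WGSL built-in.
fn pack4x8unorm_manual(e: vec4<f32>) -> u32 {
    // Clamp to [0, 1], scale by 255, round to the nearest whole number.
    let clamped = clamp(e, vec4<f32>(0.0), vec4<f32>(1.0));
    let q = vec4<u32>(floor(0.5 + 255.0 * clamped));

    // x -> bits 0-7, y -> bits 8-15, z -> bits 16-23, w -> bits 24-31.
    return q.x | (q.y << 8u) | (q.z << 16u) | (q.w << 24u);
}

// For vec4<f32>(1.0, 0.5, 0.25, 1.0) the components become
// 255 (0xFF), 128 (0x80), 64 (0x40), 255 (0xFF), giving 0xFF4080FFu.
```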

unpack2x16float

Signature

unpack2x16float(e: u32) -> vec2<f32>

  • e: The input 32-bit unsigned integer containing the packed data.

Description

Unpacks a single u32 value into two 32-bit floating-point values (f32). This is the inverse operation of pack2x16float.

The unpacking process works as follows:

  1. The u32 input e is split into two 16-bit chunks.

  2. Each 16-bit chunk is interpreted as an IEEE-754 binary16 (half-precision float) value.

  3. This 16-bit float value is then converted (promoted) to a full 32-bit f32.

The two resulting f32 values are placed into a vec2<f32>:

  • Bits 0-15 of e become the first component of the result (.x).

  • Bits 16-31 of e become the second component (.y).

W3C Specification: unpack2x16float

Use Case

This function is used to read and decode data that was previously packed with pack2x16float to save memory. It allows you to restore lower-precision data into a high-precision format suitable for calculations.

// A u32 value received from a storage buffer, which contains packed UV coordinates.
let packed_uvs: u32 = ...;

// Unpack the u32 back into two f32s.
let unpacked_uvs: vec2<f32> = unpack2x16float(packed_uvs);

// Now `unpacked_uvs` is a standard vec2<f32> that can be used for
// texture sampling or other calculations. The precision of the unpacked
// values will be limited to that of the original f16 representation.
let color = textureSample(my_texture, my_sampler, unpacked_uvs);
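
As a sketch of where such a packed value might come from, the following assumes a hypothetical storage buffer that holds one packed UV pair per element:

```wgsl
// Hypothetical binding: one packed UV pair per u32 element.
@group(0) @binding(0) var<storage, read> packed_uv_buffer: array<u32>;

// Hypothetical helper, not a WGSL built-in.
fn load_uv(index: u32) -> vec2<f32> {
    return unpack2x16float(packed_uv_buffer[index]);
}
```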

unpack2x16snorm

Signature

unpack2x16snorm(e: u32) -> vec2<f32>

  • e: The input 32-bit unsigned integer containing the packed data.

Description

Unpacks a single u32 value into two 32-bit floating-point values (f32), interpreting the packed data as "signed normalized" (snorm) numbers. This is the inverse operation of pack2x16snorm.

The process works as follows:

  1. The u32 input e is split into two 16-bit chunks.

  2. Each 16-bit chunk is interpreted as a two's-complement signed integer, giving a value v in the range [-32768, 32767].

  3. This integer v is converted to a floating-point number in the range [-1.0, 1.0] by dividing it by 32767.0 and clamping the result at -1.0.

The two resulting f32 values are placed into a vec2<f32>:

  • Bits 0-15 of e become the first component of the result (.x).

  • Bits 16-31 of e become the second component (.y).

"Signed Normalized" (snorm) is a standard format for storing values that have a direction (like components of a normal vector) efficiently in an integer.

W3C Specification: unpack2x16snorm

Use Case

This function is used to read and decode data that was previously packed with pack2x16snorm to save memory. It's ideal for data that naturally fits the [-1.0, 1.0] range.

// A u32 value received from a storage buffer, which contains a packed normal vector's x and y components.
let packed_normal_xy: u32 = ...;

// Unpack the u32 back into two f32s in the [-1.0, 1.0] range.
let unpacked_normal_xy: vec2<f32> = unpack2x16snorm(packed_normal_xy);

// Now the unpacked_normal_xy can be used in lighting calculations.
// For example, reconstructing the full 3D normal. This assumes a unit-length
// normal whose z component is non-negative (as in tangent space):
let normal_xy = unpacked_normal_xy;
let normal_z = sqrt(1.0 - saturate(dot(normal_xy, normal_xy)));
let full_normal = vec3<f32>(normal_xy, normal_z);
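
The unpacking steps from the description can be sketched with extractBits, whose signed overload performs the required sign extension. The helper name is hypothetical:

```wgsl
// Hypothetical reference helper, not a WGSL built-in.
fn unpack2x16snorm_manual(e: u32) -> vec2<f32> {
    // Read each 16-bit half as a two's-complement integer in [-32768, 32767].
    let bits = bitcast<i32>(e);
    let x = extractBits(bits, 0u, 16u);
    let y = extractBits(bits, 16u, 16u);

    // Divide by 32767 and clamp at -1.0, since -32768 / 32767 is slightly below -1.
    return max(vec2<f32>(f32(x), f32(y)) / 32767.0, vec2<f32>(-1.0));
}
```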

unpack2x16unorm

Signature

unpack2x16unorm(e: u32) -> vec2<f32>

  • e: The input 32-bit unsigned integer containing the packed data.

Description

Unpacks a single u32 value into two 32-bit floating-point values (f32), interpreting the packed data as "unsigned normalized" (unorm) numbers. This is the inverse operation of pack2x16unorm.

The process works as follows:

  1. The u32 input e is split into two 16-bit chunks.

  2. Each 16-bit chunk is interpreted as an unsigned integer, giving a value v in the range [0, 65535].

  3. This integer v is converted to a floating-point number in the range [0.0, 1.0] by dividing it by 65535.0.

The two resulting f32 values are placed into a vec2<f32>:

  • Bits 0-15 of e become the first component of the result (.x).

  • Bits 16-31 of e become the second component (.y).

"Unsigned Normalized" (unorm) is a standard format for storing values that have a natural [0.0, 1.0] range, such as texture coordinates or color channels.

W3C Specification: unpack2x16unorm

Use Case

This function is used to read and decode data that was previously packed with pack2x16unorm to save memory. It's the standard way to reconstruct 16-bit unorm data, like UV coordinates, into a format suitable for high-precision calculations.

// A u32 value received from a storage buffer, which contains packed UV coordinates.
let packed_uvs: u32 = ...;

// Unpack the u32 back into two f32s in the [0.0, 1.0] range.
let unpacked_uvs: vec2<f32> = unpack2x16unorm(packed_uvs);

// Now `unpacked_uvs` is a standard vec2<f32> that can be used for
// texture sampling or other calculations.
let color = textureSample(my_texture, my_sampler, unpacked_uvs);
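
For comparison, the same decoding written by hand is a mask, a shift, and a divide. The helper name is hypothetical:

```wgsl
// Hypothetical reference helper, not a WGSL built-in.
fn unpack2x16unorm_manual(e: u32) -> vec2<f32> {
    let x = f32(e & 0xFFFFu); // bits 0-15
    let y = f32(e >> 16u);    // bits 16-31
    return vec2<f32>(x, y) / 65535.0;
}
```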


### `unpack4x8snorm`

#### Signature

`unpack4x8snorm(e: u32) -> vec4<f32>`

* `e`: The input 32-bit unsigned integer containing the packed data.


#### Description

Unpacks a single `u32` value into four 32-bit floating-point values (`f32`), interpreting the packed data as "signed normalized" (`snorm`) numbers. This is the inverse operation of [`pack4x8snorm`](#heading-pack4x8snorm).

The process works as follows:

1. The `u32` input e is split into four 8-bit chunks.

2. Each 8-bit chunk is interpreted as a two's-complement signed integer, giving a value `v` in the range `[-128, 127]`.

3. This integer `v` is converted to a floating-point number in the range `[-1.0, 1.0]` by dividing it by `127.0` and clamping the result at `-1.0`.


The four resulting `f32` values are placed into a `vec4<f32>`:

* Bits 0-7 of `e` become the first component (`.x`).

* Bits 8-15 of `e` become the second component (`.y`).

* Bits 16-23 of `e` become the third component (`.z`).

* Bits 24-31 of `e` become the fourth component (`.w`).
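The process can be sketched on the CPU in plain Rust to make the sign handling and the clamp concrete. `unpack4x8snorm_ref` is a hypothetical helper written for this illustration, not part of any library.

```rust
/// Hypothetical CPU reference for the snorm unpacking steps.
fn unpack4x8snorm_ref(e: u32) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        // Take byte i (bits 8*i through 8*i + 7) and reinterpret it as a signed i8.
        let byte = ((e >> (8 * i)) & 0xFF) as u8 as i8;
        // Divide by 127 and clamp, so that -128 maps to -1.0 rather than below it.
        out[i] = (byte as f32 / 127.0).max(-1.0);
    }
    out
}
```

The clamp matters for exactly one input per byte: both `0x81` (-127) and `0x80` (-128) unpack to -1.0.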


*W3C Specification:* [`unpack4x8snorm`](https://www.w3.org/TR/WGSL/#unpack4x8snorm-builtin)

#### Use Case

This function is used to read and decode four-component data, like tangent vectors, that were previously packed with [`pack4x8snorm`](#heading-pack4x8snorm) to save memory.

```rust
// A u32 value received as a vertex attribute, containing a packed tangent vector.
let packed_tangent: u32 = ...;

// Unpack the u32 back into four f32s in the [-1.0, 1.0] range.
let unpacked_tangent: vec4<f32> = unpack4x8snorm(packed_tangent);

// Now `unpacked_tangent` is a standard vec4<f32> that can be used
// to build the TBN matrix for normal mapping.
let tangent_direction = unpacked_tangent.xyz;
let tangent_handedness = unpacked_tangent.w;
```

unpack4x8unorm

Signature

unpack4x8unorm(e: u32) -> vec4<f32>

  • e: The input 32-bit unsigned integer containing the packed data.

Description

Unpacks a single u32 value into four 32-bit floating-point values (f32), interpreting the packed data as "unsigned normalized" (unorm) numbers. This is the inverse operation of pack4x8unorm.

This is a very common operation, as it directly corresponds to decoding a standard 32-bit RGBA color. The process works as follows:

  1. The u32 input e is split into four 8-bit chunks.

  2. Each 8-bit chunk is interpreted as an unsigned integer, giving a value v in the range [0, 255].

  3. This integer v is converted to a floating-point number in the range [0.0, 1.0] by dividing it by 255.0.

The four resulting f32 values are placed into a vec4<f32>:

  • Bits 0-7 of e become the first component (.x, Red).

  • Bits 8-15 of e become the second component (.y, Green).

  • Bits 16-23 of e become the third component (.z, Blue).

  • Bits 24-31 of e become the fourth component (.w, Alpha).
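The byte order is easy to get backwards, so here is a small CPU-side check in plain Rust. `unpack4x8unorm_ref` is a hypothetical name used only for this sketch.

```rust
/// Hypothetical CPU reference for the unorm unpacking steps.
fn unpack4x8unorm_ref(e: u32) -> [f32; 4] {
    // to_le_bytes yields the bytes from least to most significant,
    // so bits 0-7 (Red) come first and bits 24-31 (Alpha) come last.
    e.to_le_bytes().map(|b| b as f32 / 255.0)
}
```

With the input `0xFF4080FF`, the lowest byte `0xFF` becomes red (1.0), `0x80` becomes green (about 0.502), `0x40` becomes blue (about 0.251), and the highest byte `0xFF` becomes alpha (1.0).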

W3C Specification: unpack4x8unorm

Use Case

This is the standard function for converting a color from its compact 32-bit integer representation (as found in Rgba8Unorm textures or buffers) into the vec4<f32> format required for high-precision color calculations.

// A u32 value representing an orange color (R=255, G=128, B=64, A=255).
// Red occupies the lowest byte, so in hex this is 0xFF4080FF (AABBGGRR).
// The equivalent decimal literal is 4282417407u.
let packed_color: u32 = 0xFF4080FFu;

// Unpack the u32 back into a vec4<f32> color in the [0.0, 1.0] range.
let unpacked_color: vec4<f32> = unpack4x8unorm(packed_color);

// The result is approximately vec4<f32>(1.0, 0.5019, 0.2509, 1.0).
// This color can now be used in lighting calculations, blended, etc.

Texture Functions

This family of functions is the primary interface for all interactions with texture objects in WGSL. These powerful functions can be broadly divided into three main purposes: reading texel data, writing to special storage textures, and querying a texture's properties.

A fundamental concept within this group is the distinction between sampling and loading.

  • Sampling (e.g., using textureSample and its variants) is a complex operation. It uses a sampler object to apply filtering (like linear interpolation), wrapping (like repeating), and automatic mipmap selection, all based on floating-point coordinates (typically UVs).

  • Loading (using textureLoad) is a direct, unfiltered read. It uses integer texel coordinates to fetch the exact data from a specific location in the texture's memory, bypassing the sampler entirely.
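To connect the two coordinate systems, the following plain-Rust sketch maps a normalized UV coordinate to the integer texel that contains it. The helper name `uv_to_texel` and the clamping choice are assumptions made for illustration, and the dimensions are assumed to be non-zero.

```rust
/// Hypothetical helper: maps a normalized UV coordinate to the texel that contains it.
fn uv_to_texel(uv: [f32; 2], dims: [u32; 2]) -> [u32; 2] {
    let to_index = |coord: f32, size: u32| -> u32 {
        // Scale into texel space, then clamp so that uv == 1.0 still lands on the last texel.
        let scaled = (coord * size as f32).floor();
        scaled.clamp(0.0, (size - 1) as f32) as u32
    };
    [to_index(uv[0], dims[0]), to_index(uv[1], dims[1])]
}
```

A value produced this way is the kind of coordinate textureLoad expects, while the original UV is what textureSample expects.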

Additionally, this category includes functions to query a texture's metadata (like its dimensions and layer count), as well as highly specialized functions for advanced techniques like custom shadow filtering and writing output from compute shaders.

textureDimensions

Returns the dimensions (width, height, depth) of a texture in texels. This function has several overloads depending on the texture's type and whether you are querying the base size or the size of a specific mipmap level.

W3C Specification: textureDimensions

Overloads: 1D Textures

These overloads return a single u32 representing the width of a 1D texture.

Signature
  • textureDimensions(t: texture_1d<ST>) -> u32

  • textureDimensions(t: texture_storage_1d<F,A>) -> u32

  • textureDimensions(t: texture_1d<ST>, level: u32) -> u32

  • t: The 1D texture to query, either a texture_1d<ST> or a texture_storage_1d<F,A>.

  • ST: The stored texel type for a sampled texture (f32, i32, or u32).

  • F: The texel format for a storage texture (e.g., rgba8unorm).

  • A: The access mode for a storage texture (read, write, or read_write).

  • level: An optional mipmap level to query, shown as u32 above (i32 is also accepted).

Description

Returns the width, in texels, of the 1D texture. If the optional level parameter is provided, it returns the width of that specific mipmap level; otherwise, it returns the width of the base level (level 0).

Use Case
@group(0) @binding(0) var my_texture: texture_1d<f32>;

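let width_level_0 = textureDimensions(my_texture);
// The explicit-level form. WebGPU currently limits 1D textures to a single
// mip level, so level 0 is the only valid level here.
let width_explicit = textureDimensions(my_texture, 0u);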

Overloads: 2D Textures

These overloads return a vec2<u32> representing the width and height of a 2D texture, texture array, or cube map face.

Signature
  • textureDimensions(t: T) -> vec2<u32>

  • textureDimensions(t: T, level: u32) -> vec2<u32>

  • T: A 2D-based texture type (see list below).

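  • level: An optional mipmap level, shown as u32 above (i32 is also accepted). Multisampled, storage, and external textures have no mip level to select, so they only support the form without level.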

T can be one of many 2D-based texture types:

  • texture_2d<ST>, texture_2d_array<ST>

  • texture_cube<ST>, texture_cube_array<ST>

  • texture_multisampled_2d<ST>

  • texture_depth_2d, texture_depth_2d_array

  • texture_depth_cube, texture_depth_cube_array

  • texture_depth_multisampled_2d

  • texture_storage_2d<F,A>, texture_storage_2d_array<F,A>

  • texture_external

Description

Returns the width and height, in texels, of the 2D texture. If the optional level parameter is provided, it returns the dimensions of that specific mipmap level; otherwise, it returns the dimensions of the base level (level 0). For cube maps, this returns the dimensions of a single face (which are always square).

Use Case

The most common use is to convert pixel coordinates into UV coordinates ([0.0, 1.0]), which is essential for screen-space effects.

@group(0) @binding(0) var screen_texture: texture_2d<f32>;

// `in.position` is the fragment's pixel coordinate (e.g., from 0 to 1920).
let screen_coords = vec2<f32>(in.position.xy);

// Get the full size of the texture.
let tex_dims = vec2<f32>(textureDimensions(screen_texture));

// Divide the pixel coordinate by the total size to get the UV coordinate.
let screen_uv = screen_coords / tex_dims;

Overloads: 3D Textures

These overloads return a vec3<u32> representing the width, height, and depth of a 3D texture.

Signature
  • textureDimensions(t: texture_3d<ST>) -> vec3<u32>

  • textureDimensions(t: texture_storage_3d<F,A>) -> vec3<u32>

  • textureDimensions(t: texture_3d<ST>, level: u32) -> vec3<u32>

  • t: The 3D texture to query, either a texture_3d<ST> or a texture_storage_3d<F,A>.

  • ST: The stored texel type for a sampled texture (f32, i32, or u32).

  • F: The texel format for a storage texture (e.g., rgba8unorm).

  • A: The access mode for a storage texture (read, write, or read_write).

  • level: An optional mipmap level to query, shown as u32 above (i32 is also accepted).

Description

Returns the width, height, and depth, in texels, of the 3D texture. If the optional level parameter is provided, it returns the dimensions of that specific mipmap level; otherwise, it returns the dimensions of the base level (level 0).

Use Case
@group(0) @binding(0) var my_3d_texture: texture_3d<f32>;

// Get the volume dimensions of the base mip level.
let volume_dims = textureDimensions(my_3d_texture);

Parameter Details & Behavior

  • level Parameter: When you provide the level parameter, you are querying the size of a specific mipmap. If the provided level is outside the valid range of mipmap levels for the texture, the function may return an indeterminate value.

  • What is Returned: The function returns the logical dimensions of a single image in the texture. It does not include the number of layers in a texture array or the number of samples in a multisampled texture. For those, you must use textureNumLayers and textureNumSamples respectively.
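The size of a given mip level follows the usual halving rule: each level is half the previous one, rounded down, and never smaller than one texel. The plain-Rust sketch below shows that rule for a single axis; `mip_level_size` is a hypothetical helper, and it assumes the requested level actually exists in the texture.

```rust
/// Hypothetical helper: size of one axis at a given mip level.
fn mip_level_size(base: u32, level: u32) -> u32 {
    // Halve once per level (rounding down). checked_shr avoids a panic for
    // shift amounts of 32 or more, and max(1) keeps the size at least one texel.
    base.checked_shr(level).unwrap_or(0).max(1)
}
```

Applied per axis, a 1920x1080 base level has a 960x540 level 1 and a 240x135 level 3.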

textureGather

Samples the four texels that would be used for bilinear filtering and returns their values directly in a vec4. This is a low-level operation primarily used for implementing custom texture filtering, most notably Percentage-Closer Filtering (PCF) for high-quality shadow mapping.

Instead of blending the four texels into a single color like textureSample, textureGather gives you the raw data from a specific channel of those four texels, allowing you to perform your own manual interpolation or comparison.

W3C Specification: textureGather

Overloads: Color/Data Textures (component required)

This family of overloads is used for standard color or data textures (texture_2d, texture_cube, etc.).

Signature

textureGather(component: C, t: TextureType, s: sampler, coords: VecType, ...)

  • component: A compile-time constant i32 or u32 (0, 1, 2, or 3) that specifies which channel to gather from the four source texels (0=R, 1=G, 2=B, 3=A).

  • t: A color/data texture type, such as texture_2d<f32>, texture_2d_array<u32>, or texture_cube<i32>.

  • s: A sampler.

  • coords: The coordinates for sampling (vec2<f32> for 2D, vec3<f32> for cubes).

  • ...: Optional or required parameters depending on the texture type, such as array_index or offset.

Description

Finds the 2x2 block of texels that surround the given coords. It then extracts the value of the specified component from each of these four texels and returns them in a vec4.

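For example, if component is 1 (the Green channel), the result holds the Green values of the four texels in counter-clockwise order, starting from the texel with the smallest u and the largest v. With the texture origin at the top-left:

  • The result's .x component will be the Green value of the bottom-left texel.

  • The result's .y component will be the Green value of the bottom-right texel.

  • The result's .z component will be the Green value of the top-right texel.

  • The result's .w component will be the Green value of the top-left texel.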
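The footprint lookup can be sketched on the CPU in plain Rust. The component order in this sketch follows the table in the WGSL specification: (u_min, v_max), (u_max, v_max), (u_max, v_min), (u_min, v_min), which is counter-clockwise starting from the bottom-left texel when row 0 is the top of the image. `gather_ref` is a hypothetical helper that works on one channel stored as rows, and it assumes clamp-to-edge addressing; a real sampler may be configured differently.

```rust
/// Hypothetical CPU reference for a single-channel gather.
/// `texels[row][col]` holds one channel; row 0 is the top of the image.
fn gather_ref(texels: &[Vec<f32>], uv: [f32; 2]) -> [f32; 4] {
    let h = texels.len() as i64;
    let w = texels[0].len() as i64;
    // Texel centers sit at half-integer positions, so shift by 0.5 to find
    // the top-left texel of the 2x2 bilinear footprint.
    let x0 = (uv[0] * w as f32 - 0.5).floor() as i64;
    let y0 = (uv[1] * h as f32 - 0.5).floor() as i64;
    // Clamp-to-edge addressing (an assumption made for this sketch).
    let at = |x: i64, y: i64| texels[y.clamp(0, h - 1) as usize][x.clamp(0, w - 1) as usize];
    [
        at(x0, y0 + 1),     // .x: (u_min, v_max), bottom-left
        at(x0 + 1, y0 + 1), // .y: (u_max, v_max), bottom-right
        at(x0 + 1, y0),     // .z: (u_max, v_min), top-right
        at(x0, y0),         // .w: (u_min, v_min), top-left
    ]
}
```

For a 2x2 image with rows `[1, 2]` and `[3, 4]`, gathering at the center returns the texels in the order 3, 4, 2, 1.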

Use Case

Used for custom filtering algorithms that need to operate on a single channel of neighboring texels.

@group(0) @binding(0) var my_texture: texture_2d<f32>;
@group(0) @binding(1) var my_sampler: sampler;

// Gather the values from the RED channel (component 0) of the four
// texels surrounding the given UV coordinate.
let red_values: vec4<f32> = textureGather(0, my_texture, my_sampler, in.uv);

// Now we have the raw red values and can perform custom logic,
// for example, finding the average red value in the 2x2 block.
let avg_red = (red_values.x + red_values.y + red_values.z + red_values.w) * 0.25;

Overloads: Depth Textures

This family of overloads is used specifically for depth textures.

Signature

textureGather(t: DepthTextureType, s: sampler, coords: VecType, ...)

  • t: A depth texture type, such as texture_depth_2d or texture_depth_cube_array.

  • s: A sampler.

  • coords: The coordinates for sampling (vec2<f32> for 2D, vec3<f32> for cubes).

  • ...: Optional or required parameters depending on the texture type, such as array_index or offset.

Description

Finds the 2x2 block of texels that surround the given coords in a depth texture. It returns a vec4<f32> where each component contains the raw depth value from one of the four texels.

Unlike the color version, there is no component parameter, as depth textures only have a single channel.

Use Case

This is the cornerstone of custom shadow mapping algorithms like PCF.

@group(0) @binding(0) var shadow_map: texture_depth_2d;
@group(0) @binding(1) var shadow_sampler: sampler;

let fragment_depth: f32 = ...; // The current fragment's depth from the light's perspective

// Gather the four closest depth values from the shadow map.
let shadow_depths: vec4<f32> = textureGather(shadow_map, shadow_sampler, in.shadow_coords);

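// Manually compare the current fragment's depth to the four stored depths.
// Comparison operators need operands of the same type, so splat the scalar first.
// The result of `>` is a vec4<bool>; converting it to f32 gives 1.0 or 0.0.
let comparison = vec4<f32>(vec4<f32>(fragment_depth) > shadow_depths);

// Average the results to get a soft shadow amount (0.0 to 1.0),
// where 1.0 means all four texels occlude the fragment.
let shadow_factor = dot(comparison, vec4<f32>(0.25));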

textureGatherCompare

Performs a hardware-accelerated depth comparison against a 2x2 block of texels from a depth texture. This is a highly specialized and optimized function designed almost exclusively for implementing high-quality, soft shadows with Percentage-Closer Filtering (PCF).

It combines the "gather" step of textureGather and a subsequent comparison step into a single, efficient hardware operation.

Signatures

The function is overloaded for various depth texture types.

  • textureGatherCompare(t: texture_depth_2d, s: sampler_comparison, coords: vec2<f32>, depth_ref: f32, ...) -> vec4<f32>

  • textureGatherCompare(t: texture_depth_2d_array, s: sampler_comparison, coords: vec2<f32>, array_index: u32, depth_ref: f32, ...) -> vec4<f32>

  • textureGatherCompare(t: texture_depth_cube, s: sampler_comparison, coords: vec3<f32>, depth_ref: f32, ...) -> vec4<f32>

  • textureGatherCompare(t: texture_depth_cube_array, s: sampler_comparison, coords: vec3<f32>, array_index: u32, depth_ref: f32, ...) -> vec4<f32>

  • t: The depth texture to sample.

  • s: A sampler_comparison. This special sampler type is configured on the CPU side with a comparison function (e.g., LessEqual, Greater).

  • coords: The coordinates for sampling (vec2<f32> for 2D, vec3<f32> for cubes).

  • depth_ref: The depth value to compare against the values in the texture.

  • ...: An optional offset can be provided for the 2D texture types (texture_depth_2d and texture_depth_2d_array), but not for cube textures.

Description

This function executes a complete, optimized shadow-PCF step in hardware:

  1. It identifies the 2x2 quad of texels in the depth texture that surround the given coords.

  2. For each of those four texels, it performs a comparison between the provided depth_ref and the depth value stored in the texel.

  3. The specific comparison operation (e.g., depth_ref <= texel_depth) is determined by the compare property of the sampler_comparison object, which is set in your Rust code.

  4. It returns a vec4<f32> where each component is either 1.0 (if the comparison passed) or 0.0 (if it failed).

The result is a vector containing four binary pass/fail results, ready to be averaged for a soft shadow effect.
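Steps 2 through 4 can be mimicked on the CPU in plain Rust. This is only a sketch: the `CompareFn` enum is a hypothetical stand-in for the comparison function configured on the sampler, the four depths are assumed to have been gathered already, and `gather_compare_ref` and `pcf_average` are made-up names.

```rust
/// Hypothetical stand-in for the sampler's comparison function (a subset).
#[derive(Clone, Copy)]
enum CompareFn {
    Less,
    LessEqual,
    Greater,
    GreaterEqual,
}

/// Compares depth_ref against four gathered depths, yielding 1.0 (pass) or 0.0 (fail).
fn gather_compare_ref(depths: [f32; 4], depth_ref: f32, compare: CompareFn) -> [f32; 4] {
    depths.map(|texel_depth| {
        // depth_ref is always the left-hand operand of the comparison.
        let passed = match compare {
            CompareFn::Less => depth_ref < texel_depth,
            CompareFn::LessEqual => depth_ref <= texel_depth,
            CompareFn::Greater => depth_ref > texel_depth,
            CompareFn::GreaterEqual => depth_ref >= texel_depth,
        };
        if passed { 1.0 } else { 0.0 }
    })
}

/// Averages the four pass/fail results into a single soft-shadow factor.
fn pcf_average(results: [f32; 4]) -> f32 {
    results.iter().sum::<f32>() * 0.25
}
```

With stored depths of 0.3, 0.6, 0.5, and 0.9, a reference depth of 0.5, and `LessEqual`, three of the four tests pass and the averaged factor is 0.75.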

W3C Specification: textureGatherCompare

Use Case

This function exists to implement PCF for soft shadows efficiently. It is typically faster than a manual textureGather followed by a comparison in shader code, because the gather and the four comparisons are issued as a single texture operation.

@group(0) @binding(0) var shadow_map: texture_depth_2d;
@group(0) @binding(1) var shadow_sampler: sampler_comparison;

// The current fragment's depth from the light's perspective.
let fragment_depth_ref: f32 = ...;

// The hardware performs the gather and four comparisons in one go.
// The result is already a vector of 0.0s and 1.0s.
let comparison_results: vec4<f32> = textureGatherCompare(
    shadow_map,
    shadow_sampler,
    in.shadow_coords,
    fragment_depth_ref
);

// Average the four pass/fail results to get a smooth shadow value.
// If all four tests passed (fragment is not in shadow), the result is 1.0.
// If two passed, the result is 0.5 (a penumbra).
// If none passed, the result is 0.0 (fully shadowed).
let shadow_factor = dot(comparison_results, vec4<f32>(0.25));

textureLoad

Fetches the unfiltered data for a single texel from a texture using integer texel coordinates.

This function is fundamentally different from textureSample:

  • It uses integer coordinates (e.g., (10, 20)) to specify an exact texel, not floating-point UVs ([0.0, 1.0]).

  • It does not use a sampler. There is no filtering (bilinear, trilinear) or wrapping.

  • For mipmapped textures, you must explicitly provide the mipmap level to read from.

It is a direct, unfiltered read from a specific location in the texture's memory.

W3C Specification: textureLoad

Overloads: Standard (Mipmapped) Textures

This is the most common family of overloads for reading from standard 1D, 2D, 3D, and array textures.

Signatures
  • textureLoad(t: texture_1d<ST>, coords: u32, level: u32) -> vec4<ST>

  • textureLoad(t: texture_2d<ST>, coords: vec2<u32>, level: u32) -> vec4<ST>

  • textureLoad(t: texture_2d_array<ST>, coords: vec2<u32>, array_index: u32, level: u32) -> vec4<ST>

  • textureLoad(t: texture_3d<ST>, coords: vec3<u32>, level: u32) -> vec4<ST>

  • ST: The stored texel type (f32, i32, or u32).

  • coords: The integer texel coordinates, shown as u32 above. i32 (or a vector of i32) is also accepted.

  • level: The integer mipmap level, shown as u32 above. i32 is also accepted.

  • array_index: The integer array layer, shown as u32 above. i32 is also accepted.

Description

Reads the texel data from the specified integer coords within a specific array_index (for array textures) and mipmap level. The coords are 0-indexed from the top-left corner of the specified mip level.

Use Case

Ideal for situations where you need precise, unfiltered data, such as reading from a data texture or manually implementing a custom filtering algorithm.

@group(0) @binding(0) var data_texture: texture_2d<f32>;

// Read the exact texel value from coordinate (5, 10) on mip level 1 (the second level).
// There is no blending or interpolation with neighboring texels.
let texel_data = textureLoad(data_texture, vec2<u32>(5u, 10u), 1u);

Overloads: Multisampled Textures

This family of overloads reads from multisampled textures, which store multiple data samples per texel.

Signatures
  • textureLoad(t: texture_multisampled_2d<ST>, coords: vec2<u32>, sample_index: u32) -> vec4<ST>

  • textureLoad(t: texture_depth_multisampled_2d, coords: vec2<u32>, sample_index: u32) -> f32

  • ST: The stored texel type (f32, i32, or u32).

  • coords: The integer texel coordinates, shown as vec2<u32> above. vec2<i32> is also accepted.

  • sample_index: The integer sample index, shown as u32 above. i32 is also accepted.

Description

Reads the data from a single sample_index within the texel at the specified integer coords. Multisampled textures do not have mipmaps, so there is no level parameter.

Use Case

Used to manually "resolve" a multisampled texture, for example, by averaging all the samples for a given pixel to produce a final anti-aliased color.

@group(0) @binding(0) var msaa_texture: texture_multisampled_2d<f32>;

let texel_coords = vec2<u32>(in.position.xy);
let num_samples = textureNumSamples(msaa_texture);
var final_color = vec4<f32>(0.0);

// Loop through all samples for the current pixel and average them.
for (var i: u32 = 0u; i < num_samples; i = i + 1u) {
    final_color = final_color + textureLoad(msaa_texture, texel_coords, i);
}
final_color = final_color / f32(num_samples);

Overloads: Depth Textures

This family of overloads reads a single depth value from a depth texture.

Signatures
  • textureLoad(t: texture_depth_2d, coords: vec2<u32>, level: u32) -> f32

  • textureLoad(t: texture_depth_2d_array, coords: vec2<u32>, array_index: u32, level: u32) -> f32

  • coords: The integer texel coordinates, shown as vec2<u32> above. vec2<i32> is also accepted.

  • level: The integer mipmap level, shown as u32 above. i32 is also accepted.

  • array_index: The integer array layer, shown as u32 above. i32 is also accepted.

Description

Reads the single depth value from the specified integer coords and mipmap level. The key difference is that the return type is a single f32, not a vec4.

Use Case

Used to fetch a precise, unfiltered depth value, which is useful in some custom shadow or post-processing techniques.

@group(0) @binding(0) var depth_texture: texture_depth_2d;

// Get the exact depth value stored at texel (100, 200) on mip level 0.
let precise_depth = textureLoad(depth_texture, vec2<u32>(100u, 200u), 0u);

Behavior with Invalid Coordinates

If the provided coords, array_index, level, or sample_index are outside the valid bounds of the texture, the address is considered invalid. In this case, the function's return value is defined by the GPU implementation and may be one of the following:

  • The data for a different texel that is within the texture's bounds.

  • A vector of zeros (0,0,0,0) or (0,0,0,1) for color/data textures.

  • 0.0 for depth textures.
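A CPU-side model in plain Rust makes the bounds rule concrete. The `MipLevel` struct and `texture_load_ref` function are hypothetical and exist only for this sketch. Returning zeros for an invalid address is just one of the permitted outcomes listed above, so real shaders should not rely on it.

```rust
/// Hypothetical CPU model of one mip level, stored row-major: texels[y * width + x].
struct MipLevel {
    width: u32,
    height: u32,
    texels: Vec<[f32; 4]>,
}

/// Unfiltered read of one texel from a chain of mip levels.
fn texture_load_ref(levels: &[MipLevel], coords: [u32; 2], level: u32) -> [f32; 4] {
    match levels.get(level as usize) {
        // Valid level and in-bounds coordinates: return exactly that texel.
        Some(mip) if coords[0] < mip.width && coords[1] < mip.height => {
            mip.texels[(coords[1] * mip.width + coords[0]) as usize]
        }
        // Invalid address: zeros are one permitted result, chosen here for illustration.
        _ => [0.0; 4],
    }
}
```

In shader code, the portable way to avoid the invalid case is to clamp or test the coordinates against textureDimensions before calling textureLoad.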

textureNumLayers

Signature

textureNumLayers(t: T) -> u32

  • t: The input arrayed texture.

T must be an arrayed texture type:

  • texture_2d_array<ST>

  • texture_cube_array<ST>

  • texture_depth_2d_array

  • texture_depth_cube_array

  • texture_storage_2d_array<F,A>

Description

Returns the number of layers in an arrayed texture.

The meaning of a "layer" depends on the texture type:

  • For a texture_2d_array, it returns the number of 2D texture layers in the array.

  • For a texture_cube_array, it returns the number of cubes in the array (where each cube itself consists of 6 faces/layers).

This function is distinct from textureDimensions, which returns the width and height of a single layer.

W3C Specification: textureNumLayers

Use Case

Used to query the size of a texture array, which is often necessary for looping through all the layers or for clamping an array index to be within a valid range.

@group(0) @binding(0) var my_texture_array: texture_2d_array<f32>;

// Get the total number of layers available in the texture array.
let total_layers = textureNumLayers(my_texture_array);

// This can be used as the upper bound for a loop.
for (var i: u32 = 0u; i < total_layers; i = i + 1u) {
    // ... sample from layer `i` of the texture array ...
    let color = textureSample(my_texture_array, my_sampler, in.uv, i);
}

// Or it can be used to safely clamp a runtime index.
let some_runtime_index: u32 = ...;
let safe_index = min(some_runtime_index, total_layers - 1u);

textureNumLevels

Signature

textureNumLevels(t: T) -> u32

  • t: The input mipmapped texture.

T must be a texture type that supports mipmaps:

  • texture_1d<ST>

  • texture_2d<ST>, texture_2d_array<ST>

  • texture_3d<ST>

  • texture_cube<ST>, texture_cube_array<ST>

  • texture_depth_2d, texture_depth_2d_array

  • texture_depth_cube, texture_depth_cube_array

Description

Returns the total number of mipmap levels available in the texture. Level 0 is the original, full-resolution texture, and subsequent levels are progressively smaller.

This function does not work with texture types that do not have mipmaps, such as texture_storage_* or texture_multisampled_*.
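For a sense of scale, the length of a complete mip chain can be computed from the base size. The plain-Rust sketch below assumes a full chain down to 1x1; `full_mip_chain_len` is a hypothetical helper, and the value textureNumLevels actually returns depends on how the texture and its view were created on the CPU side.

```rust
/// Hypothetical helper: number of levels in a complete 2D mip chain.
fn full_mip_chain_len(width: u32, height: u32) -> u32 {
    // floor(log2(largest side)) + 1. The max(1) guards a degenerate zero-size input.
    let largest = width.max(height).max(1);
    32 - largest.leading_zeros()
}
```

A 1024x1024 texture has 11 levels in a full chain (1024, 512, and so on down to 1), and so does a 1920x1080 one.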

W3C Specification: textureNumLevels

Use Case

This function is useful for algorithms that need to manually iterate through mipmap levels, such as in some advanced post-processing or image analysis techniques.

@group(0) @binding(0) var my_texture: texture_2d<f32>;

// Get the total number of mip levels.
let mip_count = textureNumLevels(my_texture);

// This can be used as the upper bound for a loop that processes
// each mipmap level of a texture.
var total_red_value = 0.0;
for (var i: u32 = 0u; i < mip_count; i = i + 1u) {
    // Load a texel from the center of the current mip level.
    let mip_dims = textureDimensions(my_texture, i);
    let center_coord = mip_dims / 2u;
    total_red_value = total_red_value + textureLoad(my_texture, center_coord, i).r;
}

textureNumSamples

Signature

textureNumSamples(t: T) -> u32

  • t: The input multisampled texture.

  • T must be a multisampled texture type:

    • texture_multisampled_2d<ST>

    • texture_depth_multisampled_2d

Description

Returns the number of samples stored for each texel in a multisampled texture. The value matches the MSAA (Multi-Sample Anti-Aliasing) sample count configured when the texture was created. WebGPU currently allows only a sample count of 4 for multisampled textures, so that is the value to expect in practice.

This function only works with multisampled textures. Attempting to use it on a non-multisampled texture will result in a shader compilation error.

W3C Specification: textureNumSamples

Use Case

This function is essential for manually resolving a multisampled texture. "Resolving" is the process of combining the multiple samples within each texel into a single, final anti-aliased value.

@group(0) @binding(0) var msaa_texture: texture_multisampled_2d<f32>;

// The integer coordinates of the texel to resolve.
let texel_coords = vec2<u32>(in.position.xy);

// Get the number of samples per texel (e.g., 4).
let sample_count = textureNumSamples(msaa_texture);

// A variable to accumulate the color from all samples.
var resolved_color = vec4<f32>(0.0);

// Loop through each sample, load its color, and add it to the total.
for (var i: u32 = 0u; i < sample_count; i = i + 1u) {
    resolved_color = resolved_color + textureLoad(msaa_texture, texel_coords, i);
}

// Divide by the number of samples to get the final averaged color.
resolved_color = resolved_color / f32(sample_count);

textureSample

The primary and most common function for reading from a texture. It uses a sampler to apply filtering, wrapping, and automatic mipmap selection based on floating-point coordinates.

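This function can only be used in the fragment shader stage, and it must be called in uniform control flow (i.e., not inside if statements or loops that depend on per-fragment data). If the compiler cannot prove that a call is in uniform control flow, it reports a derivative_uniformity diagnostic, which is an error by default. If that diagnostic is downgraded, a call made in non-uniform control flow returns an indeterminate value.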

W3C Specification: textureSample

Overloads: Color/Data Textures

Signature
  • textureSample(t: texture_1d<f32>, s: sampler, coords: f32) -> vec4<f32>

  • textureSample(t: texture_2d<f32>, s: sampler, coords: vec2<f32>) -> vec4<f32>

  • textureSample(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, offset: vec2<i32>) -> vec4<f32>

  • textureSample(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A) -> vec4<f32>

  • textureSample(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, offset: vec2<i32>) -> vec4<f32>

  • textureSample(t: T, s: sampler, coords: vec3<f32>) -> vec4<f32> where T is texture_3d<f32> or texture_cube<f32>

  • textureSample(t: texture_3d<f32>, s: sampler, coords: vec3<f32>, offset: vec3<i32>) -> vec4<f32>

  • textureSample(t: texture_cube_array<f32>, s: sampler, coords: vec3<f32>, array_index: A) -> vec4<f32>

  • A is i32 or u32.

  • t: The texture to be sampled.

  • s: A sampler object that defines filtering and wrapping rules.

  • coords: The floating-point coordinates for sampling (e.g., UVs for a texture_2d, or a direction vector for a texture_cube).

  • array_index: The 0-indexed layer of the texture array to sample from.

  • offset: An optional, compile-time constant integer vector that applies a texel offset before sampling.

Description

Samples the texture t using the filtering and wrapping rules defined in the sampler s. The GPU automatically calculates the appropriate mipmap level based on the screen-space derivatives of the texture coords. The return value is a filtered vec4<f32> representing the color or data at that location.
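To make "filtered" concrete, here is a plain-Rust sketch of bilinear filtering on a single mip level. It is a simplification under stated assumptions: one channel, clamp-to-edge addressing, and no mipmap selection. `sample_bilinear_ref` is a hypothetical name, and real hardware may differ in precision.

```rust
/// Hypothetical CPU reference for bilinear filtering of a single-channel image.
/// `texels[row][col]`; row 0 is the top of the image.
fn sample_bilinear_ref(texels: &[Vec<f32>], uv: [f32; 2]) -> f32 {
    let h = texels.len() as i64;
    let w = texels[0].len() as i64;
    // Move into texel space; texel centers sit at half-integer positions.
    let x = uv[0] * w as f32 - 0.5;
    let y = uv[1] * h as f32 - 0.5;
    let (x0, y0) = (x.floor(), y.floor());
    // Fractional position inside the 2x2 footprint, used as blend weights.
    let (fx, fy) = (x - x0, y - y0);
    let (xi, yi) = (x0 as i64, y0 as i64);
    // Clamp-to-edge addressing (an assumption made for this sketch).
    let at = |cx: i64, cy: i64| texels[cy.clamp(0, h - 1) as usize][cx.clamp(0, w - 1) as usize];
    // Blend horizontally on both rows, then vertically between the rows.
    let top = at(xi, yi) * (1.0 - fx) + at(xi + 1, yi) * fx;
    let bottom = at(xi, yi + 1) * (1.0 - fx) + at(xi + 1, yi + 1) * fx;
    top * (1.0 - fy) + bottom * fy
}
```

Sampling halfway between a column of 0.0 texels and a column of 1.0 texels returns 0.5, while sampling exactly at a texel center returns that texel's value unchanged.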

Use Case

This is the standard function for applying a texture to a 3D model.

@group(1) @binding(0) var my_texture: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;

// The `in.uv` coordinate is a floating-point value from 0.0 to 1.0.
// The sampler handles blending between texels if coords fall between them.
let albedo_color = textureSample(my_texture, my_sampler, in.uv);

Overloads: Depth Textures

These overloads sample from a depth texture and return a single f32 depth value.

Signature
  • textureSample(t: texture_depth_2d, s: sampler, coords: vec2<f32>) -> f32

  • textureSample(t: texture_depth_2d, s: sampler, coords: vec2<f32>, offset: vec2<i32>) -> f32

  • textureSample(t: texture_depth_2d_array, s: sampler, coords: vec2<f32>, array_index: A) -> f32

  • textureSample(t: texture_depth_2d_array, s: sampler, coords: vec2<f32>, array_index: A, offset: vec2<i32>) -> f32

  • textureSample(t: texture_depth_cube, s: sampler, coords: vec3<f32>) -> f32

  • textureSample(t: texture_depth_cube_array, s: sampler, coords: vec3<f32>, array_index: A) -> f32

  • A is i32 or u32.

  • t: The depth texture to be sampled.

  • s: A sampler object.

  • coords: The floating-point coordinates for sampling.

  • array_index: The 0-indexed layer of the texture array to sample from.

  • offset: An optional, compile-time constant integer vector that applies a texel offset before sampling.

Description

Samples the depth texture t using the sampler s. The return value is a single filtered f32 representing the depth at that location.

Use Case

Used to read a filtered depth value, for example, to visualize a shadow map or for post-processing effects. For actual shadow comparisons, textureSampleCompare is usually preferred.

@group(1) @binding(0) var shadow_map: texture_depth_2d;
@group(1) @binding(1) var my_sampler: sampler;

// Read the filtered depth value from the shadow map.
let sampled_depth = textureSample(shadow_map, my_sampler, in.shadow_coords);

// Visualize the depth map by converting the depth value to a grayscale color.
let shadow_map_color = vec4<f32>(vec3(sampled_depth), 1.0);

Parameters and Behavior

  • sampler: An object defined in Rust code that controls how the texture is read. It specifies the filtering mode (e.g., nearest, linear), the wrapping mode (e.g., repeat, clamp-to-edge), and mipmapping behavior.

  • coords: The floating-point coordinates used for sampling. For 2D textures, this is typically a UV coordinate in the [0.0, 1.0] range. For cube maps, this is a 3D direction vector.

  • offset: An optional vec2<i32> or vec3<i32> parameter that must be a compile-time constant. It applies an integer texel offset to the coordinates before sampling.

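  • Uniform Control Flow: This function (and its variants like textureSampleBias) must be called in uniform control flow. If it is used inside a loop with a variable number of iterations or an if statement with a per-fragment condition, the compiler reports a derivative_uniformity diagnostic, which is an error by default. If that diagnostic is downgraded, the result is an indeterminate value. For sampling in such situations, use textureSampleLevel, which takes an explicit mipmap level instead of relying on derivatives.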

textureSampleBaseClampToEdge

Signature

textureSampleBaseClampToEdge(t: T, s: sampler, coords: vec2<f32>) -> vec4<f32>

  • t: The texture to sample.

  • T: Must be texture_2d<f32> or texture_external.

  • s: A sampler.

  • coords: The vec2<f32> coordinates for sampling.

Description

Samples a 2D texture, but with two specific and non-overridable behaviors:

  1. It always samples from the base mipmap level (level 0).

  2. It always uses a clamp-to-edge wrapping mode for the coordinates, regardless of the wrapping mode configured in the sampler.

The primary reason this function exists is to be the only sampling function that can operate on the special texture_external type. An external texture is a handle to an image produced by an external system, such as a video decoder, and has special restrictions. It cannot be sampled with the standard textureSample function.

While it can also be used on a regular texture_2d<f32>, its behavior in that case is equivalent to using textureSampleLevel(..., 0.0) with a sampler configured for clamp-to-edge.
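The clamping step can be sketched in plain Rust: coordinates are pulled in by half a texel on each side, so that filtering never reaches past the edge. `clamp_to_edge_coords` is a hypothetical helper for illustration, and it assumes non-zero texture dimensions.

```rust
/// Hypothetical helper: clamps normalized coordinates to [half_texel, 1 - half_texel].
fn clamp_to_edge_coords(coords: [f32; 2], dims: [u32; 2]) -> [f32; 2] {
    let clamp_axis = |c: f32, size: u32| {
        // Half a texel, expressed in normalized coordinates.
        let half_texel = 0.5 / size as f32;
        c.clamp(half_texel, 1.0 - half_texel)
    };
    [clamp_axis(coords[0], dims[0]), clamp_axis(coords[1], dims[1])]
}
```

For a 4x2 texture, the out-of-range coordinate (-0.2, 1.5) is clamped to (0.125, 0.75), the centers of the nearest edge texels.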

W3C Specification: textureSampleBaseClampToEdge

Use Case

The main use case for this function is sampling from a texture_external, which is typically used for displaying video frames in a WebGPU context.

// This binding would be provided by the browser when working with a video element.
@group(0) @binding(0) var video_frame: texture_external;
@group(0) @binding(1) var video_sampler: sampler;

// Use this special function to sample the video frame texture.
// Using `textureSample` here would result in a shader compilation error.
let video_color = textureSampleBaseClampToEdge(video_frame, video_sampler, in.uv);

textureSampleBias

Signatures

  • textureSampleBias(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, bias: f32) -> vec4<f32>

  • textureSampleBias(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, bias: f32, offset: vec2<i32>) -> vec4<f32>

  • textureSampleBias(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, bias: f32) -> vec4<f32>

  • textureSampleBias(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, bias: f32, offset: vec2<i32>) -> vec4<f32>

  • textureSampleBias(t: T, s: sampler, coords: vec3<f32>, bias: f32) -> vec4<f32> where T is texture_3d<f32> or texture_cube<f32>

  • textureSampleBias(t: texture_3d<f32>, s: sampler, coords: vec3<f32>, bias: f32, offset: vec3<i32>) -> vec4<f32>

  • textureSampleBias(t: texture_cube_array<f32>, s: sampler, coords: vec3<f32>, array_index: A, bias: f32) -> vec4<f32>

  • A is i32 or u32.

  • t: The texture to be sampled.

  • s: A sampler object that defines filtering and wrapping rules.

  • coords: The floating-point coordinates for sampling (e.g., UVs for a texture_2d, or a direction vector for a texture_cube).

  • bias: A floating-point value that provides a manual adjustment to the calculated mipmap level. The value is clamped to the range [-16.0, 15.99].

  • array_index: The 0-indexed layer of the texture array to sample from.

  • offset: An optional, compile-time constant integer vector (vec2<i32> or vec3<i32>) that applies a texel offset before sampling.

Description

Samples a texture with a manual adjustment to the mipmap level selection. This function is identical to textureSample but adds a bias parameter that allows you to make the texture appear sharper or blurrier than it normally would. The function first calculates the ideal mipmap level based on the texture coordinate derivatives, then adds the floating-point bias value to it before sampling.

  • A positive bias (> 0.0) forces the GPU to sample from a smaller, more distant mipmap level, resulting in a blurrier appearance.

  • A negative bias (< 0.0) forces the GPU to sample from a larger, closer mipmap level, resulting in a sharper appearance (which can also lead to aliasing).

This function must be called in uniform control flow (i.e., not inside if statements or loops that depend on per-fragment data). Violating this rule triggers the derivative_uniformity diagnostic, which is an error by default; if the diagnostic is downgraded, the result is an indeterminate value.

W3C Specification: textureSampleBias

Use Case

textureSampleBias is used for effects that require programmatic control over texture sharpness. A classic example is faking a depth-of-field effect or creating a "blur in" transition.

@group(1) @binding(0) var scene_texture: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;

let distance_from_camera: f32 = ...;
let focus_point = 10.0;

// Calculate a blur amount based on distance from the camera's focus point.
// Objects far from the focus point will get a larger positive bias.
let blur_bias = abs(distance_from_camera - focus_point) * 0.5;

// Sample the scene texture with the calculated bias.
// Distant or very close objects will appear blurry, while objects near
// the focus point will be sharp (bias ≈ 0.0).
let final_color = textureSampleBias(scene_texture, my_sampler, in.uv, blur_bias);
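
The "blur in" transition mentioned above can be sketched in the same way. The Transition struct, its progress field, the binding slot, and the max_bias value are assumptions made for illustration, and the texture is assumed to have a full mip chain.

```wgsl
// Hypothetical uniform: 0.0 at the start of the transition, 1.0 at the end.
struct Transition {
    progress: f32,
}

@group(1) @binding(0) var scene_texture: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;
@group(1) @binding(2) var<uniform> transition: Transition;

@fragment
fn fs_main(@location(0) uv: vec2<f32>) -> @location(0) vec4<f32> {
    // Assumed starting blur strength, in mip levels.
    let max_bias = 6.0;

    // Fade the bias from max_bias (very blurry) down to 0.0 (normal sharpness).
    let t = clamp(transition.progress, 0.0, 1.0);
    let bias = mix(max_bias, 0.0, t);

    // Called at the top level of the entry point, so control flow is uniform.
    return textureSampleBias(scene_texture, my_sampler, uv, bias);
}
```

Because the bias depends only on a uniform value, every fragment uses the same offset and the image sharpens evenly as progress increases.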

textureSampleCompare

Signature

  • textureSampleCompare(t: texture_depth_2d, s: sampler_comparison, coords: vec2<f32>, depth_ref: f32) -> f32

  • textureSampleCompare(t: texture_depth_2d, s: sampler_comparison, coords: vec2<f32>, depth_ref: f32, offset: vec2<i32>) -> f32

  • textureSampleCompare(t: texture_depth_2d_array, s: sampler_comparison, coords: vec2<f32>, array_index: A, depth_ref: f32) -> f32

  • textureSampleCompare(t: texture_depth_2d_array, s: sampler_comparison, coords: vec2<f32>, array_index: A, depth_ref: f32, offset: vec2<i32>) -> f32

  • textureSampleCompare(t: texture_depth_cube, s: sampler_comparison, coords: vec3<f32>, depth_ref: f32) -> f32

  • textureSampleCompare(t: texture_depth_cube_array, s: sampler_comparison, coords: vec3<f32>, array_index: A, depth_ref: f32) -> f32

  • A is i32 or u32.

  • t: A depth texture to sample from.

  • s: A sampler_comparison object. This special sampler type is configured on the CPU side with a specific comparison function (e.g., LessEqual, Greater).

  • coords: The floating-point coordinates for sampling.

  • depth_ref: The depth value to compare against the values in the texture.

  • array_index: The 0-indexed layer of the texture array to sample from.

  • offset: An optional, compile-time constant integer vector (vec2<i32>) that applies a texel offset before sampling.

Description

Performs a hardware-accelerated depth comparison against a sampled value from a depth texture. This is a highly specialized function designed almost exclusively for efficient shadow mapping.

The function samples a depth texture and compares the sampled depth value(s) against the provided depth_ref. The comparison operation (e.g., depth_ref <= texel_depth) is determined by the compare property of the sampler_comparison object.

The return value is a float in the range [0.0, 1.0]. If the sampler's filter mode is linear, the hardware typically samples the four nearest texels, performs the comparison for each, and returns a bilinearly filtered average of the four 0.0 or 1.0 results; the exact filtering is implementation-dependent. If the filter mode is nearest, it compares against a single texel and returns either 0.0 or 1.0.
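
To make the filtered comparison concrete, here is a rough manual emulation of the linear case. It is only a sketch: the function name is hypothetical, the comparison is hard-coded to less-equal, and real hardware may weight or select texels differently.

```wgsl
// Hypothetical helper: compares depth_ref against the four nearest texels
// of mip level 0 and bilinearly blends the pass/fail results.
fn manual_compare_linear(
    shadow_map: texture_depth_2d,
    coords: vec2<f32>,
    depth_ref: f32
) -> f32 {
    let size = vec2<f32>(textureDimensions(shadow_map));

    // Texel-space position, shifted so integer values land on texel centers.
    let pos = coords * size - vec2<f32>(0.5);
    let base = vec2<i32>(floor(pos));
    let w = fract(pos);

    // Clamp the 2x2 footprint to the texture bounds (clamp-to-edge behavior).
    let max_coord = vec2<i32>(size) - vec2<i32>(1);
    let p0 = clamp(base, vec2<i32>(0), max_coord);
    let p1 = clamp(base + vec2<i32>(1), vec2<i32>(0), max_coord);

    // Each comparison yields 1.0 (pass) or 0.0 (fail).
    let c00 = select(0.0, 1.0, depth_ref <= textureLoad(shadow_map, p0, 0));
    let c10 = select(0.0, 1.0, depth_ref <= textureLoad(shadow_map, vec2<i32>(p1.x, p0.y), 0));
    let c01 = select(0.0, 1.0, depth_ref <= textureLoad(shadow_map, vec2<i32>(p0.x, p1.y), 0));
    let c11 = select(0.0, 1.0, depth_ref <= textureLoad(shadow_map, p1, 0));

    // Bilinear blend of the four comparison results.
    return mix(mix(c00, c10, w.x), mix(c01, c11, w.x), w.y);
}
```

The built-in performs all of this in one instruction, which is why it is preferred over a manual version like this one.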

This function must be called in uniform control flow (i.e., not inside if statements or loops that depend on per-fragment data). Violating this rule triggers the derivative_uniformity diagnostic, which is an error by default; if the diagnostic is downgraded, the result is an indeterminate value.

W3C Specification: textureSampleCompare

Use Case

This is the standard, high-performance way to implement Percentage-Closer Filtering (PCF) for soft shadows, as the hardware handles the sampling, comparison, and filtering in a single, highly optimized step.

@group(1) @binding(0) var shadow_map: texture_depth_2d;
@group(1) @binding(1) var shadow_sampler: sampler_comparison;

// The current fragment's depth from the light's perspective.
let fragment_depth = ...;

// The hardware performs the sampling, comparison, and (if configured)
// the filtering of the comparison results all in one operation.
// The result is a shadow factor (0.0 = shadowed, 1.0 = lit), assuming the
// sampler's compare function is LessEqual.
let shadow_factor = textureSampleCompare(
    shadow_map,
    shadow_sampler,
    in.shadow_coords,
    fragment_depth
);

textureSampleCompareLevel

Signature

  • textureSampleCompareLevel(t: texture_depth_2d, s: sampler_comparison, coords: vec2<f32>, depth_ref: f32) -> f32

  • textureSampleCompareLevel(t: texture_depth_2d, s: sampler_comparison, coords: vec2<f32>, depth_ref: f32, offset: vec2<i32>) -> f32

  • textureSampleCompareLevel(t: texture_depth_2d_array, s: sampler_comparison, coords: vec2<f32>, array_index: A, depth_ref: f32) -> f32

  • textureSampleCompareLevel(t: texture_depth_2d_array, s: sampler_comparison, coords: vec2<f32>, array_index: A, depth_ref: f32, offset: vec2<i32>) -> f32

  • textureSampleCompareLevel(t: texture_depth_cube, s: sampler_comparison, coords: vec3<f32>, depth_ref: f32) -> f32

  • textureSampleCompareLevel(t: texture_depth_cube_array, s: sampler_comparison, coords: vec3<f32>, array_index: A, depth_ref: f32) -> f32

  • A is i32 or u32.

  • t: A depth texture to sample from.

  • s: A sampler_comparison object.

  • coords: The floating-point coordinates for sampling.

  • depth_ref: The depth value to compare against the values in the texture.

  • array_index: The 0-indexed layer of the texture array to sample from.

  • offset: An optional, compile-time constant integer vector (vec2<i32>) that applies a texel offset before sampling.

Description

Performs a hardware-accelerated depth comparison against a depth texture, similar to textureSampleCompare, but with several key differences that make it safe to use in more contexts.

The main distinctions are:

  1. It always samples texels from the base mipmap level (level 0).

  2. It does not compute derivatives for mipmap selection.

  3. Because of this, it is not required to be invoked in uniform control flow.

  4. It can be used in any shader stage, including vertex and compute shaders.

The function samples the nearest texel(s) on mip level 0, compares them against depth_ref using the comparison function from the sampler_comparison, and returns a filtered average of the pass/fail results in the range [0.0, 1.0].

W3C Specification: textureSampleCompareLevel

Use Case

This function is required for any shadow mapping algorithm that performs depth comparisons inside loops with variable iteration counts or if statements with per-fragment conditions, as textureSampleCompare would be invalid in those scenarios.

@group(1) @binding(0) var shadow_map: texture_depth_2d;
@group(1) @binding(1) var shadow_sampler: sampler_comparison;

let fragment_depth = ...;
let num_samples = get_sample_count_for_this_pixel(); // A non-uniform value

var total_shadow = 0.0;

// This loop has a variable number of iterations, so it is non-uniform control flow.
// We MUST use textureSampleCompareLevel inside it.
for (var i = 0; i < num_samples; i = i + 1) {
    let offset = get_sample_offset(i); // Get a pre-defined offset
    total_shadow = total_shadow + textureSampleCompareLevel(
        shadow_map,
        shadow_sampler,
        in.shadow_coords + offset,
        fragment_depth
    );
}
let shadow_factor = total_shadow / f32(num_samples);
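
For comparison, a fixed 3x3 kernel needs no helper functions and can be written out in full. This is a minimal sketch: the function name pcf_3x3 and the one-texel spacing between taps are assumptions, not part of the original example.

```wgsl
@group(1) @binding(0) var shadow_map: texture_depth_2d;
@group(1) @binding(1) var shadow_sampler: sampler_comparison;

// Hypothetical helper: averages nine comparison results around shadow_coords.
fn pcf_3x3(shadow_coords: vec2<f32>, fragment_depth: f32) -> f32 {
    // Size of one texel in normalized coordinates.
    let texel_size = vec2<f32>(1.0) / vec2<f32>(textureDimensions(shadow_map));

    var total = 0.0;
    for (var y = -1; y <= 1; y++) {
        for (var x = -1; x <= 1; x++) {
            let offset = vec2<f32>(f32(x), f32(y)) * texel_size;
            total += textureSampleCompareLevel(
                shadow_map,
                shadow_sampler,
                shadow_coords + offset,
                fragment_depth
            );
        }
    }

    // Nine taps in total.
    return total / 9.0;
}
```

Since textureSampleCompareLevel has no uniformity requirement, this helper can be called from inside a per-fragment branch, for example only for fragments that face the light.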

textureSampleGrad

Signature

  • textureSampleGrad(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, ddx: vec2<f32>, ddy: vec2<f32>) -> vec4<f32>

  • textureSampleGrad(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, ddx: vec2<f32>, ddy: vec2<f32>, offset: vec2<i32>) -> vec4<f32>

  • textureSampleGrad(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, ddx: vec2<f32>, ddy: vec2<f32>) -> vec4<f32>

  • textureSampleGrad(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, ddx: vec2<f32>, ddy: vec2<f32>, offset: vec2<i32>) -> vec4<f32>

  • textureSampleGrad(t: T, s: sampler, coords: vec3<f32>, ddx: vec3<f32>, ddy: vec3<f32>) -> vec4<f32> where T is texture_3d<f32> or texture_cube<f32>

  • textureSampleGrad(t: texture_3d<f32>, s: sampler, coords: vec3<f32>, ddx: vec3<f32>, ddy: vec3<f32>, offset: vec3<i32>) -> vec4<f32>

  • textureSampleGrad(t: texture_cube_array<f32>, s: sampler, coords: vec3<f32>, array_index: A, ddx: vec3<f32>, ddy: vec3<f32>) -> vec4<f32>

  • A is i32 or u32.

  • t: The texture to be sampled.

  • s: A sampler object.

  • coords: The floating-point coordinates for sampling.

  • ddx, ddy: The explicit gradients (derivatives) of the texture coordinates with respect to the horizontal (x) and vertical (y) screen axes.

  • array_index: The 0-indexed layer of the texture array to sample from.

  • offset: An optional, compile-time constant integer vector that applies a texel offset before sampling.

Description

Samples a texture using explicitly provided gradients, giving you manual control over mipmap level selection.

Normally, textureSample automatically calculates the rate of change of the coords to determine the correct mip level. However, those implicit derivatives are unavailable in non-uniform control flow, and they can be wrong when the coords are modified per fragment (e.g., inside a loop or a complex if statement), because neighboring fragments may end up with unrelated coordinates. textureSampleGrad solves this by letting you provide the gradients (ddx and ddy) yourself. The GPU then uses these manual gradients to perform mipmap selection, and the function does not need to be called in uniform control flow.
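
To give a feel for how gradients turn into a mip level, the following sketch uses the common textbook approximation. The function name is hypothetical, and actual hardware selection is implementation-defined (and differs with anisotropic filtering), so treat the result as an estimate.

```wgsl
// Hypothetical helper: estimates the mip level implied by a pair of UV gradients.
fn approx_mip_level(
    ddx: vec2<f32>,
    ddy: vec2<f32>,
    texture_size: vec2<f32>
) -> f32 {
    // Convert the gradients from UV units to texels per screen pixel.
    let dx = ddx * texture_size;
    let dy = ddy * texture_size;

    // Squared length of the longer footprint axis. The tiny lower bound
    // avoids taking log2 of zero when the gradients vanish.
    let rho_sq = max(max(dot(dx, dx), dot(dy, dy)), 1e-12);

    // log2(sqrt(rho_sq)) == 0.5 * log2(rho_sq); level 0 is the largest mip.
    return max(0.0, 0.5 * log2(rho_sq));
}
```

Doubling both gradients raises the estimate by one level, which matches the intuition that a surface shown at half the size should read from the next smaller mip.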

This is an advanced function required for specific rendering techniques.

W3C Specification: textureSampleGrad

Use Case

The canonical use case is in advanced materials like Parallax Occlusion Mapping (POM), where the texture coordinates are modified inside a raymarching loop. To get correct mipmapping, you must calculate the derivatives before the loop and then pass them into textureSampleGrad wherever you sample with the modified coordinates, whether inside the loop or after it.

@group(1) @binding(0) var my_texture: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;

// 1. Calculate the gradients on the original, unmodified UVs.
let uv_ddx = dpdx(in.uv);
let uv_ddy = dpdy(in.uv);

var final_uv = in.uv;

// 2. Perform some procedural UV manipulation (e.g., a raymarching loop for POM).
// After this, final_uv can differ abruptly between neighboring fragments, so
// implicit derivatives of final_uv would pick the wrong mip level.
for (var i = 0; i < 16; i = i + 1) {
    // ... complex logic that modifies final_uv ...
}

// 3. Sample the texture using the modified UVs, but provide the original,
// correct gradients. This ensures the texture is sampled with the correct mip level.
let final_color = textureSampleGrad(my_texture, my_sampler, final_uv, uv_ddx, uv_ddy);

textureSampleLevel

Samples a texture from a single, explicitly specified mipmap level, bypassing the GPU's automatic level-of-detail selection.

This function has one critical advantage over textureSample: because it does not implicitly calculate derivatives, it is not required to be invoked in uniform control flow. This makes it the correct and safe choice for sampling textures inside loops with variable iteration counts or if statements with per-fragment conditions.

W3C Specification: textureSampleLevel

Overloads: Color/Data Textures

These overloads sample from standard color or data textures.

Signature

  • textureSampleLevel(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, level: f32) -> vec4<f32>

  • textureSampleLevel(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, level: f32, offset: vec2<i32>) -> vec4<f32>

  • textureSampleLevel(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, level: f32) -> vec4<f32>

  • textureSampleLevel(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, level: f32, offset: vec2<i32>) -> vec4<f32>

  • textureSampleLevel(t: T, s: sampler, coords: vec3<f32>, level: f32) -> vec4<f32> where T is texture_3d<f32> or texture_cube<f32>

  • textureSampleLevel(t: texture_3d<f32>, s: sampler, coords: vec3<f32>, level: f32, offset: vec3<i32>) -> vec4<f32>

  • textureSampleLevel(t: texture_cube_array<f32>, s: sampler, coords: vec3<f32>, array_index: A, level: f32) -> vec4<f32>

  • A is i32 or u32.

  • t: The texture to be sampled.

  • s: A sampler object.

  • coords: The floating-point coordinates for sampling.

  • level: The f32 mipmap level to sample from.

  • array_index: The 0-indexed layer of the texture array.

  • offset: An optional, compile-time constant integer offset.

Description

Samples the texture using the specified sampler rules, but only from the mipmap level indicated by level. Because the level is a floating-point value, the GPU can perform trilinear filtering by blending the results from the two nearest integer mip levels if the sampler is configured to do so.
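
The blending behavior for a fractional level can be approximated by hand. This is a sketch only: the helper name is hypothetical, and the built-in already does this internally when the sampler's mipmap filter is linear.

```wgsl
// Hypothetical helper: emulates blending between two mip levels for a
// fractional `level` by sampling both neighbors explicitly.
fn sample_level_blend_sketch(
    t: texture_2d<f32>,
    s: sampler,
    coords: vec2<f32>,
    level: f32
) -> vec4<f32> {
    // The two integer mip levels that bracket the requested level.
    let lower = floor(level);
    let upper = lower + 1.0;

    let a = textureSampleLevel(t, s, coords, lower);
    let b = textureSampleLevel(t, s, coords, upper);

    // Weight by the fractional part, e.g. level 2.25 is 75% mip 2 and 25% mip 3.
    return mix(a, b, fract(level));
}
```

Levels beyond the last mip are clamped by the hardware, so the upper sample simply repeats the smallest mip in that case.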

Use Case

Ideal for effects that need data from a specific level of detail or for sampling inside non-uniform control flow.

@group(1) @binding(0) var scene_texture: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;

// Use Case 1: Getting a blurred version of a texture for a bloom effect.
// Sample from a high mip level (e.g., 4.0) to get a small, blurry image.
let bloom_source = textureSampleLevel(scene_texture, my_sampler, in.uv, 4.0);

// Use Case 2: Sampling inside a non-uniform branch.
var final_color = vec4(0.0);
if (in.world_position.x > 0.0) { // Non-uniform condition
    // We MUST use textureSampleLevel here.
    final_color = textureSampleLevel(scene_texture, my_sampler, in.uv, 0.0);
}

Overloads: Depth Textures

These overloads sample from a single mip level of a depth texture.

Signature

  • textureSampleLevel(t: texture_depth_2d, s: sampler, coords: vec2<f32>, level: L) -> f32

  • textureSampleLevel(t: texture_depth_2d, s: sampler, coords: vec2<f32>, level: L, offset: vec2<i32>) -> f32

  • textureSampleLevel(t: texture_depth_2d_array, s: sampler, coords: vec2<f32>, array_index: A, level: L) -> f32

  • textureSampleLevel(t: texture_depth_2d_array, s: sampler, coords: vec2<f32>, array_index: A, level: L, offset: vec2<i32>) -> f32

  • textureSampleLevel(t: texture_depth_cube, s: sampler, coords: vec3<f32>, level: L) -> f32

  • textureSampleLevel(t: texture_depth_cube_array, s: sampler, coords: vec3<f32>, array_index: A, level: L) -> f32

  • A and L are i32 or u32.

  • t: The depth texture to be sampled.

  • s: A sampler object.

  • coords: The floating-point coordinates for sampling.

  • level: The integer i32 or u32 mipmap level to sample from.

  • array_index: The 0-indexed layer of the texture array.

  • offset: An optional, compile-time constant integer offset.

Description

Samples the depth texture, returning a single f32 depth value from the specified integer mip level. Unlike the color texture overloads, the level parameter is an integer, so no blending between mip levels is performed.

Use Case

Used to fetch a precise depth value from a specific mip level, often in custom post-processing effects.

@group(1) @binding(0) var depth_texture: texture_depth_2d;
@group(1) @binding(1) var my_sampler: sampler;

// Get the unfiltered depth value from the center of the highest-resolution mip level.
let center_depth = textureSampleLevel(depth_texture, my_sampler, vec2(0.5), 0);

textureStore

Signature

  • textureStore(t: texture_storage_1d<F,A>, coords: C, value: vec4<CF>)

  • textureStore(t: texture_storage_2d<F,A>, coords: vec2<C>, value: vec4<CF>)

  • textureStore(t: texture_storage_2d_array<F,A>, coords: vec2<C>, array_index: A_idx, value: vec4<CF>)

  • textureStore(t: texture_storage_3d<F,A>, coords: vec3<C>, value: vec4<CF>)

  • t: The storage texture to write to.

  • F: The texel format of the texture (e.g., rgba8unorm, r32uint). It is written in the WGSL type itself and must match the format declared for the binding in your Rust/CPU code.

  • A: The access mode, which must be write or read_write.

  • coords / C: The integer texel coordinates (i32 or u32).

  • array_index / A_idx: The integer array layer (i32 or u32).

  • value: A vec4 containing the data to write.

  • CF: The required "channel format" for the value parameter. This type depends on the texture's texel format F. For example:

    • If F is rgba8unorm, CF must be f32.

    • If F is r32uint, CF must be u32.

    • If F is rg32sint, CF must be i32.

Description

Writes a vec4 value to a specific texel in a storage texture. This is an output-only function that does not return a value. It is fundamentally different from sampling functions: it only works with texture_storage_* types, uses integer coordinates instead of UVs, does not use a sampler, and requires that the texture's access mode be write or read_write.

The value is written to the texel specified by the integer coords. If the coordinates are outside the bounds of the texture, the write is not expected to take effect, but it is safer to guard against out-of-bounds coordinates explicitly than to rely on this. For single-channel formats (like r32float), only the first component of the value (.x) is used.
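
The channel-format rule and the bounds guard can be sketched together for an integer format. The binding name id_texture and the choice to store a linear pixel index are illustrative assumptions.

```wgsl
// An r32uint storage texture requires vec4<u32> values.
@group(0) @binding(0) var id_texture: texture_storage_2d<r32uint, write>;

@compute @workgroup_size(8, 8, 1)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    // Skip invocations that fall outside the texture.
    let size = textureDimensions(id_texture);
    if (id.x >= size.x || id.y >= size.y) {
        return;
    }

    // Example payload: the linear index of this texel.
    let linear_index = id.y * size.x + id.x;

    // Only .x is stored for a single-channel format; the other
    // components are still required to build the vec4<u32>.
    textureStore(id_texture, id.xy, vec4<u32>(linear_index, 0u, 0u, 0u));
}
```

Passing a vec4<f32> here would be rejected by the shader compiler, because the value type must match the channel format of r32uint.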

W3C Specification: textureStore

Use Case

textureStore is the primary way to write image output from a compute shader (storage buffers cover non-image data). It is commonly used for image processing, simulations, or any general-purpose GPU computation where the result is an image or data grid.

// A simple compute shader that inverts the colors of an image.

// Input texture (read-only)
@group(0) @binding(0) var input_texture: texture_2d<f32>;
// Output texture (write-only)
@group(0) @binding(1) var output_texture: texture_storage_2d<rgba8unorm, write>;

@compute @workgroup_size(8, 8, 1)
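fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    // `id.xy` is the integer coordinate of the pixel this shader instance is processing.

    // Skip invocations that fall outside the image, which happens when its
    // size is not a multiple of the 8x8 workgroup size.
    let size = textureDimensions(output_texture);
    if (id.x >= size.x || id.y >= size.y) {
        return;
    }

    // 1. Read the original color using `textureLoad`.
    let original_color = textureLoad(input_texture, id.xy, 0);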

    // 2. Perform the image processing (in this case, a simple inversion).
    let inverted_color = vec4<f32>(1.0 - original_color.rgb, original_color.a);

    // 3. Write the new color to the output texture at the same coordinate.
    // The value must be a vec4<f32> because the output format is `rgba8unorm`.
    textureStore(output_texture, id.xy, inverted_color);
}

Derivative Functions

This family of functions calculates the rate of change of a value with respect to the screen axes. They are powerful tools available only in fragment shaders.

These functions operate by implicitly comparing the value in the current fragment with the value in an adjacent fragment. They provide derivatives for both the horizontal (X) and vertical (Y) screen axes, as well as a combined measure of the total change.

A critical requirement for these functions is that they must be called in uniform control flow. The GPU executes fragment shaders in small 2x2 blocks of pixels, and derivatives are calculated by differencing values within this block. If an if statement causes one fragment in the block to take a different code path than its neighbor, the values are no longer comparable, making the derivative result meaningless.

While their most common use is implicit (the GPU uses them behind the scenes for textureSample to select mipmap levels), they are powerful explicit tools for advanced effects like procedural anti-aliasing and generating normal maps from height maps.
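
A practical consequence of the uniformity rule is that derivatives should be computed before any per-fragment branch and only consumed inside it. The sketch below shows that pattern; the entry point name, the input location, and the color output are illustrative assumptions.

```wgsl
@fragment
fn fs_main(@location(0) uv: vec2<f32>) -> @location(0) vec4<f32> {
    // Compute the derivatives first, while every fragment in the 2x2 quad
    // is still executing the same code path.
    let duv_dx = dpdx(uv);
    let duv_dy = dpdy(uv);

    var color = vec4<f32>(0.0, 0.0, 0.0, 1.0);

    // This condition differs per fragment, so the branch is non-uniform.
    // Using the precomputed values inside it is fine; calling dpdx or dpdy
    // here would be rejected.
    if (uv.x > 0.5) {
        color = vec4<f32>(length(duv_dx), length(duv_dy), 0.0, 1.0);
    }

    return color;
}
```

The same pattern applies to textureSample: either sample before the branch, or switch to textureSampleLevel or textureSampleGrad inside it.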

dpdx

Signature

dpdx(e: T) -> T

  • e: The input value or vector to find the derivative of.

  • T: Can be f32 or a vector of f32 (e.g., vec3<f32>).

Description

Calculates the partial derivative of the input e with respect to the horizontal screen axis (X). This function is only available in fragment shaders.

In practice, this means it computes the rate of change of e by finding the difference between its value in the current fragment and its value in the horizontally adjacent fragment of the same 2x2 block. The result can be positive or negative. dpdx is a general version of this function; the GPU is free to use either a high-precision (dpdxFine) or a faster, less-precise (dpdxCoarse) calculation. For explicit control, those functions can be called directly.

This function must be called in uniform control flow. This is because the GPU computes derivatives by running the same code on a 2x2 block of fragments and comparing their results. If fragments in the block take different code paths (due to an if statement), the comparison is meaningless. The compiler therefore reports a derivative_uniformity diagnostic, which is an error by default; if the diagnostic is downgraded, the function returns an indeterminate value.

W3C Specification: dpdx

Use Case

Derivative functions are fundamental for many advanced effects. While they are used implicitly by textureSample to select mipmap levels, an explicit use is to procedurally generate normal maps from a height map.

// Assume we have a function that generates a procedural height value (e.g., noise).
let height = procedural_height(in.uv);

// Calculate how the height changes along the screen's X and Y axes.
let height_dx = dpdx(height);
let height_dy = dpdy(height); // (dpdy is the derivative on the Y axis)

// Create two vectors that lie on the surface of our procedural height map.
// These represent the tangents in the X and Y directions.
let tangent_x = vec3<f32>(1.0, 0.0, height_dx);
let tangent_y = vec3<f32>(0.0, 1.0, height_dy);

// The cross product of the two tangents gives us the surface normal.
let procedural_normal = normalize(cross(tangent_x, tangent_y));

// This normal can now be used for realistic lighting on a procedural surface.

dpdxCoarse

Signature

dpdxCoarse(e: T) -> T

  • e: The input value or vector to find the derivative of.

  • T: Can be f32 or a vector of f32 (e.g., vec3<f32>).

Description

Calculates a low-precision ("coarse") partial derivative of the input e with respect to the horizontal screen axis (X). This function is only available in fragment shaders.

Like dpdx, it computes the rate of change of e between adjacent fragments. However, dpdxCoarse allows the GPU to perform the calculation using "local differences," which may mean it computes one derivative and reuses it for the whole 2x2 group of fragments rather than computing a unique value for each one. This can be faster but may result in a less precise derivative, potentially leading to blocky artifacts in some algorithms.

This function must be called in uniform control flow.

W3C Specification: dpdxCoarse

Use Case

Used in situations where performance is more critical than perfect derivative accuracy. It can be a performance optimization over dpdx or dpdxFine if the visual impact of the lower precision is negligible.

// Calculating a procedural normal where maximum precision is not required.
let height = procedural_height(in.uv);

// Using the coarse derivatives may be slightly faster. For many procedural
// patterns, the visual difference will be unnoticeable.
let height_dx = dpdxCoarse(height);
let height_dy = dpdyCoarse(height);

let tangent_x = vec3<f32>(1.0, 0.0, height_dx);
let tangent_y = vec3<f32>(0.0, 1.0, height_dy);

let procedural_normal = normalize(cross(tangent_x, tangent_y));

dpdxFine

Signature

dpdxFine(e: T) -> T

  • e: The input value or vector to find the derivative of.

  • T: Can be f32 or a vector of f32 (e.g., vec3<f32>).

Description

Calculates a high-precision ("fine") partial derivative of the input e with respect to the horizontal screen axis (X). This function is only available in fragment shaders.

Like dpdx, it computes the rate of change of e by finding the difference between its value in the current fragment and its value in an adjacent fragment to the right. The "fine" qualifier suggests that the GPU will use a more precise, per-fragment calculation, potentially providing a more accurate local derivative than dpdxCoarse. This can be important for avoiding artifacts when working with high-frequency data.
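
One way to see whether the two variants differ on a given GPU is to visualize their difference. This is a diagnostic sketch under assumptions: the test pattern, the frequency of 200.0, and the gain of 50.0 are arbitrary choices, and on hardware where both variants share an implementation the output will simply be black.

```wgsl
@fragment
fn fs_main(@location(0) uv: vec2<f32>) -> @location(0) vec4<f32> {
    // A high-frequency test pattern, so that neighboring fragments differ noticeably.
    let v = sin(uv.x * 200.0) * sin(uv.y * 200.0);

    // Both derivatives are taken at the top level of the entry point,
    // which keeps the calls in uniform control flow.
    let fine = dpdxFine(v);
    let coarse = dpdxCoarse(v);

    // Amplify the absolute difference so that it becomes visible.
    let diff = abs(fine - coarse) * 50.0;
    return vec4<f32>(vec3<f32>(diff), 1.0);
}
```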

This function must be called in uniform control flow.

W3C Specification: dpdxFine

Use Case

Used in situations where derivative accuracy is critical, and you want to ensure the highest-quality result, even at a potential minor performance cost compared to dpdxCoarse. This is important for procedural normals based on detailed height maps or for texture sampling with explicit gradients where precision is key.

// A height map texture that contains fine, sharp details.
@group(1) @binding(0) var height_map: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;

let height = textureSample(height_map, my_sampler, in.uv).r;

// Using the fine derivatives ensures we capture the sharp changes
// in the height map as accurately as possible, preventing aliasing
// or blocky artifacts in the resulting normal.
let height_dx = dpdxFine(height);
let height_dy = dpdyFine(height);

let tangent_x = vec3<f32>(1.0, 0.0, height_dx);
let tangent_y = vec3<f32>(0.0, 1.0, height_dy);

let procedural_normal = normalize(cross(tangent_x, tangent_y));

dpdy

Signature

dpdy(e: T) -> T

  • e: The input value or vector to find the derivative of.

  • T: Can be f32 or a vector of f32 (e.g., vec3<f32>).

Description

Calculates the partial derivative of the input e with respect to the vertical screen axis (Y). This function is the vertical counterpart to dpdx and is only available in fragment shaders.

It computes the rate of change of e by finding the difference between its value in the current fragment and its value in the vertically adjacent fragment of the same 2x2 block. The result can be positive or negative. dpdy is a general version of this function; the GPU is free to use either a high-precision (dpdyFine) or a faster, less-precise (dpdyCoarse) calculation. For explicit control, those functions can be called directly.

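This function must be called in uniform control flow. Using it inside a loop with a variable number of iterations or an if statement with a per-fragment condition triggers the derivative_uniformity diagnostic, which is an error by default; if the diagnostic is downgraded, the result is an indeterminate value.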

W3C Specification: dpdy

Use Case

Derivative functions are fundamental for many advanced effects. dpdy is almost always used in conjunction with dpdx to understand how a value is changing across a 2D surface. A primary use is generating procedural normal maps from a height map.

// Assume we have a function that generates a procedural height value (e.g., noise).
let height = procedural_height(in.uv);

// Calculate how the height changes along the screen's X and Y axes.
let height_dx = dpdx(height);
let height_dy = dpdy(height);

// Create two vectors that lie on the surface of our procedural height map.
// These represent the tangents in the X and Y directions.
let tangent_x = vec3<f32>(1.0, 0.0, height_dx);
let tangent_y = vec3<f32>(0.0, 1.0, height_dy);

// The cross product of the two tangents gives us the surface normal.
let procedural_normal = normalize(cross(tangent_x, tangent_y));

// This normal can now be used for realistic lighting on a procedural surface.

dpdyCoarse

Signature

dpdyCoarse(e: T) -> T

  • e: The input value or vector to find the derivative of.

  • T: Can be f32 or a vector of f32 (e.g., vec3<f32>).

Description

Calculates a low-precision ("coarse") partial derivative of the input e with respect to the vertical screen axis (Y). This function is the vertical counterpart to dpdxCoarse and is only available in fragment shaders.

Like dpdy, it computes the rate of change of e between adjacent vertical fragments. However, dpdyCoarse allows the GPU to perform the calculation using "local differences," which may mean it computes one derivative and reuses it for the whole 2x2 group of fragments rather than computing a unique value for each one. This can be faster but may result in a less precise derivative.

This function must be called in uniform control flow.

W3C Specification: dpdyCoarse

Use Case

Used in situations where performance is more critical than perfect derivative accuracy. It can be a performance optimization over dpdy or dpdyFine if the visual impact of the lower precision is negligible. It is typically used alongside dpdxCoarse.

// Calculating a procedural normal where maximum precision is not required.
let height = procedural_height(in.uv);

// Using the coarse derivatives may be slightly faster. For many procedural
// patterns, the visual difference will be unnoticeable.
let height_dx = dpdxCoarse(height);
let height_dy = dpdyCoarse(height);

let tangent_x = vec3<f32>(1.0, 0.0, height_dx);
let tangent_y = vec3<f32>(0.0, 1.0, height_dy);

let procedural_normal = normalize(cross(tangent_x, tangent_y));

dpdyFine

Signature

dpdyFine(e: T) -> T

  • e: The input value or vector to find the derivative of.

  • T: Can be f32 or a vector of f32 (e.g., vec3<f32>).

Description

Calculates a high-precision ("fine") partial derivative of the input e with respect to the vertical screen axis (Y). This function is the vertical counterpart to dpdxFine and is only available in fragment shaders.

Like dpdy, it computes the rate of change of e by finding the difference between its value in the current fragment and its value in an adjacent vertical fragment. The "fine" qualifier suggests that the GPU will use a more precise, per-fragment calculation, potentially providing a more accurate local derivative than dpdyCoarse.

This function must be called in uniform control flow.

W3C Specification: dpdyFine

Use Case

Used in situations where derivative accuracy is critical, particularly when creating procedural normals from high-frequency data. It is almost always used in conjunction with dpdxFine to ensure consistent precision.

// A height map texture that contains fine, sharp details.
@group(1) @binding(0) var height_map: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;

let height = textureSample(height_map, my_sampler, in.uv).r;

// Using the fine derivatives ensures we capture the sharp changes
// in the height map as accurately as possible, preventing aliasing
// or blocky artifacts in the resulting normal.
let height_dx = dpdxFine(height);
let height_dy = dpdyFine(height);

let tangent_x = vec3<f32>(1.0, 0.0, height_dx);
let tangent_y = vec3<f32>(0.0, 1.0, height_dy);

let procedural_normal = normalize(cross(tangent_x, tangent_y));

fwidth

Signature

fwidth(e: T) -> T

  • e: The input value or vector to find the derivative of.

  • T: Can be f32 or a vector of f32 (e.g., vec3<f32>).

Description

Calculates the sum of the absolute values of the partial derivatives with respect to the screen axes. It is a convenient shorthand for abs(dpdx(e)) + abs(dpdy(e)). This function is only available in fragment shaders.

The result is a non-negative value (computed per component when e is a vector) that represents the total rate of change of the input e at the current fragment, combining both horizontal and vertical changes. It provides a measure of how "busy" or how steep the gradient of e is across the pixel.

This function must be called in uniform control flow.
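To make the uniformity requirement concrete, here is a minimal sketch of a fragment entry point that evaluates fwidth once at the top of the function and only uses the result inside a branch. The FragmentInput struct, the fs_main name, and the uv location are assumptions made for illustration, not part of any particular engine's API.

```wgsl
// Hypothetical fragment input; the location index is an assumption.
struct FragmentInput {
    @location(0) uv: vec2<f32>,
}

@fragment
fn fs_main(in: FragmentInput) -> @location(0) vec4<f32> {
    // Derivatives are evaluated here, before any per-fragment branching,
    // so every invocation reaches this call together.
    let pixel_width = fwidth(in.uv.x);

    // The same quantity written out by hand, following the definition above.
    let manual_width = abs(dpdx(in.uv.x)) + abs(dpdy(in.uv.x));

    var coverage = 0.0;
    // This condition differs from fragment to fragment, so no derivative
    // built-ins are called inside the branch; only their results are used.
    if (in.uv.y > 0.5) {
        coverage = smoothstep(0.5 - pixel_width, 0.5 + pixel_width, in.uv.x);
    }

    // `manual_width` goes to the green channel purely so it can be inspected.
    return vec4<f32>(coverage, manual_width, 0.0, 1.0);
}
```

Moving the fwidth call inside the if block would place it in non-uniform control flow, which is the situation the rule above is meant to prevent.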

W3C Specification: fwidth

Use Case

The primary use of fwidth is for procedural anti-aliasing. It tells you how much a value changes across one pixel in screen space, allowing you to create a smooth transition that is about one pixel wide, eliminating aliasing (jagged edges).

// Use Case: Drawing an anti-aliased procedural line.

let line_pos = 0.5;
let uv = in.uv;

// An aliased (jagged) vertical line using `step`.
let aliased_line = step(line_pos, uv.x);

// To anti-alias it, we need to know how much `uv.x` changes from
// one pixel to the next. This is exactly what `fwidth` provides.
let pixel_width = fwidth(uv.x);

// Now, instead of a hard step, we create a smooth transition centered on
// our line position that extends one pixel-width to either side of it.
let anti_aliased_line = smoothstep(
    line_pos - pixel_width,
    line_pos + pixel_width,
    uv.x
);

// The `anti_aliased_line` will be a soft, non-jagged line, while
// `aliased_line` will be pixelated.

fwidthCoarse

Signature

fwidthCoarse(e: T) -> T

  • e: The input value or vector to find the derivative of.

  • T: Can be f32 or a vector of f32 (e.g., vec3<f32>).

Description

Calculates the sum of the absolute values of the low-precision ("coarse") partial derivatives. It is a convenient shorthand for abs(dpdxCoarse(e)) + abs(dpdyCoarse(e)). This function is only available in fragment shaders.

This is the less precise but potentially faster version of fwidth. It provides an estimate of the total rate of change of the input e at the current fragment.

This function must be called in uniform control flow.

W3C Specification: fwidthCoarse

Use Case

Used for procedural anti-aliasing in situations where maximum performance is desired and a slightly less accurate gradient calculation is acceptable. For many effects, the visual difference between fwidth and fwidthCoarse is negligible.

// Use Case: Procedural anti-aliasing where performance is a priority.

let circle_radius = 0.4;
let dist = length(in.uv - 0.5);

// Get a fast, approximate measure of how much `dist` changes across one pixel.
let pixel_width = fwidthCoarse(dist);

// Create a smooth transition (anti-aliasing) that is approximately one pixel wide.
let anti_aliased_circle = 1.0 - smoothstep(
    circle_radius - pixel_width,
    circle_radius + pixel_width,
    dist
);


### `fwidthFine`

#### Signature

`fwidthFine(e: T) -> T`

* `e`: The input value or vector to find the derivative of.

* `T`: Can be `f32` or a vector of `f32` (e.g., `vec3<f32>`).


#### Description

Calculates the sum of the absolute values of the high-precision ("fine") partial derivatives. It is a convenient shorthand for `abs(dpdxFine(e)) + abs(dpdyFine(e))`. This function is only available in fragment shaders.

This is the high-precision version of [`fwidth`](#heading-fwidth). It provides a more accurate measure of the total rate of change of the input `e` at the current fragment, which is important for effects that rely on precise gradient information.

This function must be called in **uniform control flow**.

*W3C Specification:* [`fwidthFine`](https://www.w3.org/TR/WGSL/#fwidthFine-builtin)

#### Use Case

Used for high-quality procedural anti-aliasing, especially for patterns with sharp, high-frequency details where a less precise gradient from [`fwidthCoarse`](#heading-fwidthcoarse) might lead to visual artifacts like moiré patterns or inconsistent line thickness.

```rust
// Use Case: Drawing a high-quality, anti-aliased procedural grid.
let uv = in.uv * 10.0; // Create a 10x10 grid.

// `fract()` gives us sawtooth waves; `0.5 - abs(v - 0.5)` folds them into triangles.
// Each component of `dist_to_line` is 0 at the grid lines and 0.5 at the center of cells.
let dist_to_line = 0.5 - abs(fract(uv) - 0.5);

// Get a precise measure of how much `dist_to_line` changes across one pixel.
let pixel_width = fwidthFine(dist_to_line);

// Fade each axis from 1.0 on the line to 0.0 one pixel-width away from it.
// All three `smoothstep` arguments must share one type (here `vec2<f32>`).
let line_xy = 1.0 - smoothstep(vec2<f32>(0.0), pixel_width, dist_to_line);

// A fragment belongs to the grid if it lies on a vertical or a horizontal line.
let grid_lines = max(line_xy.x, line_xy.y);

// The result is a crisp, anti-aliased grid.
```

Atomic Read-Modify-Write Functions

This family of functions performs atomic read-modify-write operations on a memory location shared between multiple shader invocations (threads). They are primarily used in compute shaders to safely modify data without causing race conditions.

An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs one of these functions, it executes a complete sequence:

  1. It reads the original value from a memory location.

  2. It performs a calculation using that original value and a new input value (e.g., addition, comparison, or a bitwise operation).

  3. It writes the result of the calculation back to the memory location.

This entire sequence happens without any other thread being able to interfere. A key feature of all functions in this group is that they return the original value that was in memory before the modification took place.
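Because the returned original value is different for every call, it can double as a unique index. The following is a minimal sketch of that idea: threads that find an even number append it to an output array, using the value returned by atomicAdd as their private slot. The Results struct, the binding numbers, and the buffer names are assumptions for illustration, and the host is assumed to zero count before the dispatch.

```wgsl
// Hypothetical buffer layout: a counter followed by a runtime-sized array.
struct Results {
    count: atomic<u32>,
    values: array<u32>,
}

@group(0) @binding(0) var<storage, read> input_values: array<u32>;
@group(0) @binding(1) var<storage, read_write> results: Results;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let index = global_id.x;
    if (index >= arrayLength(&input_values)) {
        return;
    }

    let value = input_values[index];
    if (value % 2u == 0u) {
        // No two calls receive the same original value, so it can serve
        // as this thread's own slot in the output array.
        let slot = atomicAdd(&results.count, 1u);
        if (slot < arrayLength(&results.values)) {
            results.values[slot] = value;
        }
    }
}
```

The order in which values land in the output depends on thread scheduling, so it can differ from run to run.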

atomicAdd

Signature

atomicAdd(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T

  • atomic_ptr: A pointer to the atomic integer to be modified.

    • AS: The address space of the variable, typically workgroup or storage.

    • atomic<T>: The special type indicating this memory is for atomic access.

    • read_write: The access mode for the pointer.

  • v: The integer value to add.

  • T: The data type, which must be either i32 or u32.

Description

Atomically adds the value v to the integer stored at the memory location atomic_ptr. The function returns the original value that was stored in memory before the addition occurred.

An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicAdd, it reads the original value, calculates the sum, and writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for preventing race conditions in compute shaders where multiple threads might try to modify the same shared variable at the same time.

W3C Specification: atomicAdd

Use Case

The most common use case is to safely increment a shared counter from multiple threads in a compute shader.

// A shared counter in `workgroup` memory, accessible by all threads.
// Note the `atomic<u32>` type declaration.
var<workgroup> shared_counter: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the counter.
    if (local_id == 0u) {
        // `atomicStore` is used for initialization.
        atomicStore(&shared_counter, 0u);
    }
    // Ensure all threads wait until initialization is done.
    workgroupBarrier();

    // Imagine each thread does some work and needs to increment the counter.
    // Without `atomicAdd`, if two threads read the value '5' at the same time,
    // they would both write back '6', and one increment would be lost.
    // `atomicAdd` prevents this.
    atomicAdd(&shared_counter, 1u);
}

atomicSub

Signature

atomicSub(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T

  • atomic_ptr: A pointer to the atomic integer to be modified.

    • AS: The address space of the variable, typically workgroup or storage.

    • atomic<T>: The special type indicating this memory is for atomic access.

    • read_write: The access mode for the pointer.

  • v: The integer value to subtract.

  • T: The data type, which must be either i32 or u32.

Description

Atomically subtracts the value v from the integer stored at the memory location atomic_ptr. The function returns the original value that was stored in memory before the subtraction occurred.

An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicSub, it reads the original value, calculates the difference, and writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for preventing race conditions in compute shaders where multiple threads might try to modify the same shared variable simultaneously.

W3C Specification: atomicSub

Use Case

Used to safely decrement a shared counter or resource pool from multiple threads in a compute shader. For example, implementing a work queue where threads "claim" items to process.

// A shared counter for the number of items left in a work pool.
var<workgroup> items_remaining: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the counter.
    if (local_id == 0u) {
        atomicStore(&items_remaining, 100u); // Start with 100 items.
    }
    workgroupBarrier();

    // Each thread attempts to claim an item by decrementing the counter.
    // `atomicSub` returns the value *before* the subtraction.
    let previous_count = atomicSub(&items_remaining, 1u);

    // If the count before we subtracted was greater than 0, then our claim was successful.
    if (previous_count > 0u) {
        // ... process the item ...
    }
}
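One caveat with the pattern above: it relies on there being at least as many items as threads. A u32 counter that is decremented at zero wraps around to 4294967295, so a later thread would see a large "previous" value and wrongly believe its claim succeeded. A minimal sketch of one way around this, assuming a signed counter is acceptable, lets the count go negative instead:

```wgsl
// A signed counter: late decrements produce negative values instead of wrapping.
var<workgroup> items_remaining: atomic<i32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    if (local_id == 0u) {
        atomicStore(&items_remaining, 10); // Fewer items than threads.
    }
    workgroupBarrier();

    // `atomicSub` still returns the value from before the subtraction.
    let previous_count = atomicSub(&items_remaining, 1);

    if (previous_count > 0) {
        // This thread claimed item number `previous_count - 1`.
        // ... process the item ...
    }
    // Threads that arrive after the pool is empty see 0, -1, -2, ...
    // and skip the work.
}
```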

atomicMax

Signature

atomicMax(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T

  • atomic_ptr: A pointer to the atomic integer to be modified.

    • AS: The address space of the variable, typically workgroup or storage.

    • atomic<T>: The special type indicating this memory is for atomic access.

    • read_write: The access mode for the pointer.

  • v: The integer value to compare against.

  • T: The data type, which must be either i32 or u32.

Description

Atomically compares the value v with the integer stored at the memory location atomic_ptr. If v is larger, the value at atomic_ptr is replaced with v. The function returns the original value that was stored in memory before the comparison and potential replacement occurred.

An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicMax, it reads the original value, performs the comparison, and (if necessary) writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for preventing race conditions when finding a maximum value in parallel.

W3C Specification: atomicMax

Use Case

Used to safely find the maximum value in a dataset in parallel. Each thread in a compute shader can process a subset of the data and use atomicMax to update a single shared variable with its local maximum.

// A shared variable to hold the global maximum value found by the workgroup.
var<workgroup> global_maximum: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the shared maximum to zero.
    if (local_id == 0u) {
        atomicStore(&global_maximum, 0u);
    }
    workgroupBarrier();

    // Each thread calculates its own local maximum from some data.
    let local_maximum = calculate_local_maximum(local_id); // e.g., 10u, 50u, 20u...

    // Each thread attempts to update the shared global_maximum.
    // `atomicMax` ensures that even if two threads try to write a new
    // maximum at the same time, the correct final maximum is preserved.
    atomicMax(&global_maximum, local_maximum);
}
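The example above stops once every thread has contributed. As a sketch of the remaining step, the version below reads the data from a storage buffer, waits at a second barrier, and then has one thread publish the workgroup's maximum. The buffer names and binding numbers are assumptions; workgroup_maxima is assumed to have one element per dispatched workgroup.

```wgsl
@group(0) @binding(0) var<storage, read> input_values: array<u32>;
@group(0) @binding(1) var<storage, read_write> workgroup_maxima: array<u32>;

var<workgroup> global_maximum: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(
    @builtin(global_invocation_id) global_id: vec3<u32>,
    @builtin(local_invocation_index) local_id: u32,
    @builtin(workgroup_id) group_id: vec3<u32>,
) {
    if (local_id == 0u) {
        atomicStore(&global_maximum, 0u);
    }
    workgroupBarrier();

    // Threads past the end of the input contribute nothing.
    if (global_id.x < arrayLength(&input_values)) {
        atomicMax(&global_maximum, input_values[global_id.x]);
    }

    // Wait until every thread has contributed before reading the result.
    workgroupBarrier();

    if (local_id == 0u) {
        workgroup_maxima[group_id.x] = atomicLoad(&global_maximum);
    }
}
```

Both barriers sit outside the if blocks, so every thread in the workgroup reaches them.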

atomicMin

Signature

atomicMin(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T

  • atomic_ptr: A pointer to the atomic integer to be modified.

    • AS: The address space of the variable, typically workgroup or storage.

    • atomic<T>: The special type indicating this memory is for atomic access.

    • read_write: The access mode for the pointer.

  • v: The integer value to compare against.

  • T: The data type, which must be either i32 or u32.

Description

Atomically compares the value v with the integer stored at the memory location atomic_ptr. If v is smaller, the value at atomic_ptr is replaced with v. The function returns the original value that was stored in memory before the comparison and potential replacement occurred.

An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicMin, it reads the original value, performs the comparison, and (if necessary) writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for preventing race conditions when finding a minimum value in parallel.

W3C Specification: atomicMin

Use Case

Used to safely find the minimum value in a dataset in parallel. Each thread in a compute shader can process a subset of the data and use atomicMin to update a single shared variable with its local minimum.

// A shared variable to hold the global minimum value found by the workgroup.
var<workgroup> global_minimum: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the shared minimum to a very large value.
    if (local_id == 0u) {
        atomicStore(&global_minimum, 4294967295u); // Max u32 value
    }
    workgroupBarrier();

    // Each thread calculates its own local minimum from some data.
    let local_minimum = calculate_local_minimum(local_id); // e.g., 100u, 50u, 200u...

    // Each thread attempts to update the shared global_minimum.
    // `atomicMin` ensures that even if two threads try to write a new
    // minimum at the same time, the correct final minimum is preserved.
    atomicMin(&global_minimum, local_minimum);
}

atomicAnd

Signature

atomicAnd(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T

  • atomic_ptr: A pointer to the atomic integer to be modified.

    • AS: The address space of the variable, typically workgroup or storage.

    • atomic<T>: The special type indicating this memory is for atomic access.

    • read_write: The access mode for the pointer.

  • v: The integer value to use for the bitwise AND operation.

  • T: The data type, which must be either i32 or u32.

Description

Atomically performs a bitwise AND operation between the value v and the integer stored at the memory location atomic_ptr, storing the result back at atomic_ptr. The function returns the original value that was stored in memory before the AND operation occurred.

An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicAnd, it reads the original value, performs the bitwise AND, and writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for safely manipulating shared bitmasks from multiple threads without race conditions.

W3C Specification: atomicAnd

Use Case

Used to safely clear specific bits in a shared integer flag or bitmask. Each thread can use atomicAnd with a custom mask to clear its own bit without affecting the bits being modified by other threads.

// A shared bitmask representing the completion status of 32 tasks
// (one bit per thread; a `u32` has room for exactly 32 flags).
// A '1' means the task is NOT complete.
var<workgroup> completion_mask: atomic<u32>;

@compute @workgroup_size(32, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the mask to all '1's, indicating no tasks are complete.
    if (local_id == 0u) {
        atomicStore(&completion_mask, 0xFFFFFFFFu);
    }
    workgroupBarrier();

    // ... each thread performs its task ...

    // When a thread finishes its task, it clears its corresponding bit in the mask.
    // Create a mask to clear the bit at `local_id`.
    // `~` is the bitwise NOT operator. `1u << local_id` creates a '1' at the thread's bit.
    // `~ (1u << local_id)` creates a mask with a '0' at the thread's bit and '1's everywhere else.
    let clear_mask = ~ (1u << local_id);

    // `atomicAnd` safely applies this mask. For example, if the shared mask was ...1111
    // and our clear_mask is ...1011, the result will be ...1011.
    atomicAnd(&completion_mask, clear_mask);
}
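A single atomic<u32> has room for 32 flags. To track more tasks than that, one option is an array of atomic words, with each thread selecting its word and its bit within that word. The following is a minimal sketch for 64 threads; the two-word layout is an assumption chosen to match the workgroup size.

```wgsl
// Two 32-bit words give 64 flags, one per thread.
var<workgroup> completion_mask: array<atomic<u32>, 2>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // The first two threads initialize one word each to all '1's.
    if (local_id < 2u) {
        atomicStore(&completion_mask[local_id], 0xFFFFFFFFu);
    }
    workgroupBarrier();

    // ... each thread performs its task ...

    // Threads 0..31 use word 0, threads 32..63 use word 1.
    let word = local_id / 32u;
    let bit = local_id % 32u;

    // Clear this thread's bit and leave every other bit untouched.
    atomicAnd(&completion_mask[word], ~(1u << bit));
}
```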

atomicOr

Signature

atomicOr(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T

  • atomic_ptr: A pointer to the atomic integer to be modified.

    • AS: The address space of the variable, typically workgroup or storage.

    • atomic<T>: The special type indicating this memory is for atomic access.

    • read_write: The access mode for the pointer.

  • v: The integer value to use for the bitwise OR operation.

  • T: The data type, which must be either i32 or u32.

Description

Atomically performs a bitwise OR operation between the value v and the integer stored at the memory location atomic_ptr, storing the result back at atomic_ptr. The function returns the original value that was stored in memory before the OR operation occurred.

An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicOr, it reads the original value, performs the bitwise OR, and writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for safely manipulating shared bitmasks from multiple threads without race conditions.

W3C Specification: atomicOr

Use Case

Used to safely set specific bits in a shared integer flag or bitmask. Each thread can use atomicOr with a custom mask to set its own bit without affecting the bits being modified by other threads.

// A shared bitmask representing which of 32 tasks have completed
// (one bit per thread; a `u32` has room for exactly 32 flags).
// A '1' means the task IS complete.
var<workgroup> completion_mask: atomic<u32>;

@compute @workgroup_size(32, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the mask to all '0's.
    if (local_id == 0u) {
        atomicStore(&completion_mask, 0u);
    }
    workgroupBarrier();

    // ... each thread performs its task ...
    var task_is_complete = true; // Based on some per-thread calculation.

    if (task_is_complete) {
        // Create a mask to set the bit corresponding to this thread's ID.
        let set_mask = 1u << local_id;

        // `atomicOr` safely applies this mask. If thread 2 (mask ...0100) and
        // thread 3 (mask ...1000) both finish at the same time, the `atomicOr`
        // guarantees the final result will correctly have both bits set (...1100).
        atomicOr(&completion_mask, set_mask);
    }
}
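The value returned by atomicOr is also useful on its own: it tells a thread whether the bit was already set before its call, which gives a "test-and-set" operation. In the sketch below, several threads compete for each bucket and only the first one to set the bit does the work. The mapping of 64 threads onto 8 buckets is a made-up example.

```wgsl
var<workgroup> claimed_buckets: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    if (local_id == 0u) {
        atomicStore(&claimed_buckets, 0u);
    }
    workgroupBarrier();

    // Hypothetical mapping: 64 threads share 8 buckets,
    // so 8 threads compete for each bucket.
    let bucket = local_id % 8u;
    let bucket_bit = 1u << bucket;

    // Set the bit and inspect the value from *before* the OR.
    let previous_mask = atomicOr(&claimed_buckets, bucket_bit);
    let was_already_claimed = (previous_mask & bucket_bit) != 0u;

    if (!was_already_claimed) {
        // Exactly one thread per bucket reaches this point.
        // ... process the bucket ...
    }
}
```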

atomicXor

Signature

atomicXor(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T

  • atomic_ptr: A pointer to the atomic integer to be modified.

    • AS: The address space of the variable, typically workgroup or storage.

    • atomic<T>: The special type indicating this memory is for atomic access.

    • read_write: The access mode for the pointer.

  • v: The integer value to use for the bitwise XOR operation.

  • T: The data type, which must be either i32 or u32.

Description

Atomically performs a bitwise XOR operation between the value v and the integer stored at the memory location atomic_ptr, storing the result back at atomic_ptr. The function returns the original value that was stored in memory before the XOR operation occurred.

An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicXor, it reads the original value, performs the bitwise XOR, and writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for safely manipulating shared bitmasks from multiple threads without race conditions.

W3C Specification: atomicXor

Use Case

Used to safely toggle specific bits in a shared integer flag or bitmask. The XOR operation flips a bit if the corresponding bit in the mask is 1, and leaves it unchanged if the mask bit is 0.

// A shared bitmask representing a set of toggleable states.
var<workgroup> toggle_states: atomic<u32>;

@compute @workgroup_size(32, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the states.
    if (local_id == 0u) {
        atomicStore(&toggle_states, 0u);
    }
    workgroupBarrier();

    // Imagine each thread decides it needs to flip the state of its corresponding bit.
    let should_toggle = some_per_thread_condition();

    if (should_toggle) {
        // Create a mask to toggle the bit for this thread's ID.
        let toggle_mask = 1u << local_id;

        // `atomicXor` safely applies this mask. If a thread applies the mask, its bit
        // will flip (0->1 or 1->0). If multiple threads do this at the same time,
        // each toggle is guaranteed to be applied correctly without being lost.
        atomicXor(&toggle_states, toggle_mask);
    }
}

Atomic Access and Exchange Functions

This family of functions provides the fundamental primitives for reading, writing, and swapping values in shared memory atomically.

Unlike the "Read-Modify-Write" group, these operations do not perform arithmetic calculations. Instead, their focus is on direct memory access and replacement. This category includes the most basic atomic operations: safely reading a complete value from memory and safely writing a complete value to memory. It also contains more powerful primitives for unconditionally swapping a value with a new one, and for performing a conditional swap that only succeeds if the memory currently holds an expected value.

These functions are the essential building blocks for creating any complex, lock-free algorithm or synchronization mechanism in a compute shader, enabling multiple threads to coordinate their work on shared data without corruption.

atomicLoad

Signature

atomicLoad(atomic_ptr: ptr<AS, atomic<T>, read_write>) -> T

  • atomic_ptr: A pointer to the atomic integer to be read.

    • AS: The address space of the variable, typically workgroup or storage.

    • atomic<T>: The special type indicating this memory is for atomic access.

    • read_write: The access mode for the pointer.

  • T: The data type, which must be either i32 or u32.

Description

Atomically loads (reads) the integer value from the memory location atomic_ptr and returns it.

An atomic load guarantees that the entire value is read without being partially modified ("torn") by another thread writing to the same location at the same time. It ensures you always get a coherent, complete value that was present at some point in time, even during concurrent write operations.

W3C Specification: atomicLoad

Use Case

Used in compute shaders to safely read a shared value that may have been modified by other threads. It is often used after a workgroupBarrier to ensure all previous modifications are visible before the read.

var<workgroup> shared_result: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread calculates a result and stores it atomically.
    if (local_id == 0u) {
        let result = do_some_complex_calculation();
        atomicStore(&shared_result, result);
    }

    // A workgroup barrier ensures that all threads wait until the `atomicStore`
    // from thread 0 is complete and its result is visible to all other threads.
    workgroupBarrier();

    // Now, all other threads can safely read the calculated result.
    // `atomicLoad` guarantees they get the complete, final value written by thread 0.
    let final_result = atomicLoad(&shared_result);

    // ... use final_result in subsequent calculations ...
}

atomicStore

Signature

atomicStore(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T)

  • atomic_ptr: A pointer to the atomic integer to be modified.

    • AS: The address space of the variable, typically workgroup or storage.

    • atomic<T>: The special type indicating this memory is for atomic access.

    • read_write: The access mode for the pointer.

  • v: The integer value to store.

  • T: The data type, which must be either i32 or u32.

Description

Atomically stores (writes) the integer value v to the memory location atomic_ptr. This function does not return a value.

An atomic store guarantees that the entire value is written without being partially overwritten ("torn") by another thread writing to the same location at the same time. It ensures that any other thread reading this location will see either the value from before the store or the complete value from after the store, but never a corrupt, half-written value.

W3C Specification: atomicStore

Use Case

Used in compute shaders to safely write to a shared memory location. It is the primary way to initialize an atomic variable or to publish a final result from one thread for other threads to read.

var<workgroup> shared_result: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread is designated to calculate and store an initial value.
    if (local_id == 0u) {
        let initial_value = 123u;

        // The atomic store ensures the value 123 is written completely
        // before any other thread can read it.
        atomicStore(&shared_result, initial_value);
    }

    // A barrier is crucial here. It makes all other threads wait until the
    // store operation from thread 0 is complete and visible.
    workgroupBarrier();

    // Now other threads can safely load the initialized value.
    let value = atomicLoad(&shared_result); // will be 123 for all threads
}

atomicExchange

Signature

atomicExchange(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T

  • atomic_ptr: A pointer to the atomic integer to be modified.

    • AS: The address space of the variable, typically workgroup or storage.

    • atomic<T>: The special type indicating this memory is for atomic access.

    • read_write: The access mode for the pointer.

  • v: The new integer value to store.

  • T: The data type, which must be either i32 or u32.

Description

Atomically replaces the integer value at the memory location atomic_ptr with v. The function returns the original value that was stored in memory before the replacement occurred.

An atomic exchange is an indivisible "read-and-then-write" operation. It guarantees that the old value is read and the new value is written without any other thread being able to interfere in the middle of the sequence.

W3C Specification: atomicExchange

Use Case

Used in compute shaders for algorithms that require a "claim and replace" pattern on a shared resource. A common example is implementing a simple spinlock or mutex.

// A shared variable representing a lock. 0 means unlocked, 1 means locked.
var<workgroup> lock: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // Initialize the lock to 0 (unlocked).
    if (local_id == 0u) {
        atomicStore(&lock, 0u);
    }
    workgroupBarrier();

    // Try to acquire the lock.
    // `atomicExchange` attempts to write '1' (locked) and returns the previous value.
    // This entire operation is atomic.
    if (atomicExchange(&lock, 1u) == 0u) {
        // If the previous value was '0' (unlocked), we have successfully
        // acquired the lock because we were the one to write the '1'.

        // ... perform critical section work here ...

        // Release the lock by storing '0' back.
        atomicStore(&lock, 0u);
    }
}
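Another common use of the returned value is to "drain" an accumulator: take whatever has built up and leave zero behind in a single step. This is a minimal sketch; the pending_events and processed_total buffers and the drain entry point are hypothetical names, and other shader invocations are assumed to increment pending_events with atomicAdd.

```wgsl
@group(0) @binding(0) var<storage, read_write> pending_events: atomic<u32>;
@group(0) @binding(1) var<storage, read_write> processed_total: atomic<u32>;

@compute @workgroup_size(1, 1, 1)
fn drain() {
    // Read the accumulated count and reset it to zero as one indivisible step.
    let taken = atomicExchange(&pending_events, 0u);

    // Increments that arrive after the exchange stay in `pending_events`
    // and are picked up by the next drain.
    atomicAdd(&processed_total, taken);
}
```

Doing the same with an atomicLoad followed by atomicStore(&pending_events, 0u) would leave a gap between the two calls in which an increment could be overwritten and lost.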

atomicCompareExchangeWeak

Signature

atomicCompareExchangeWeak(atomic_ptr: ptr<AS, atomic<T>, read_write>, cmp: T, v: T) -> R

  • atomic_ptr: A pointer to the atomic integer to be modified.

    • AS: The address space of the variable, typically workgroup or storage.

    • atomic<T>: The special type indicating this memory is for atomic access.

    • read_write: The access mode for the pointer.

  • cmp: The value to compare against the value stored at atomic_ptr.

  • v: The new value to store if the comparison is successful.

  • T: The data type, which must be either i32 or u32.

  • R: A special, built-in struct __atomic_compare_exchange_result<T> that cannot be declared by name. It must be inferred with let. This struct contains two members:

    • old_value: T: The value that was in memory at atomic_ptr before this operation began.

    • exchanged: bool: true if the exchange was successful, false otherwise.

Description

Performs an atomic "compare-and-swap" (CAS) operation. This is a conditional write that is the most powerful of the atomic primitives, forming the basis for many complex, lock-free algorithms.

The operation follows these indivisible steps:

  1. Atomically reads the value at atomic_ptr (let's call it original_value).

  2. Compares original_value with the provided comparison value, cmp.

  3. If original_value equals cmp, it atomically attempts to write the new value v to atomic_ptr.

  4. If original_value does not equal cmp, it does nothing.

The "Weak" Guarantee and Spurious Failures:
The "weak" in the function's name is critical. It means the operation can spuriously fail. This is a rare event where the exchanged member of the result will be false even though original_value was equal to cmp. This is a performance trade-off allowed by some hardware.

Because of this possibility, the standard and correct way to use this function is to always place it inside a loop. The loop should repeat the attempt until exchanged is true.

W3C Specification: atomicCompareExchangeWeak

Use Case

The canonical use case is implementing robust, lock-free data structures or synchronization primitives like a spinlock. The loop pattern is mandatory for correctness.

// A shared variable representing a lock. 0 means unlocked, 1 means locked.
var<workgroup> lock: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // Initialize the lock to 0 (unlocked).
    if (local_id == 0u) {
        atomicStore(&lock, 0u);
    }
    workgroupBarrier();

    // Loop until we successfully acquire the lock.
    loop {
        // We expect the lock to be 0 (unlocked). This is our comparison value.
        let expected_value = 0u;

        // Attempt to swap the 0 with a 1 (locked).
        let result = atomicCompareExchangeWeak(&lock, expected_value, 1u);

        // The operation succeeded if:
        // 1. The old value was indeed 0.
        // 2. The exchange was successful (no spurious failure).
        if (result.old_value == expected_value && result.exchanged) {
            // We have successfully acquired the lock.
            break; // Exit the loop.
        }

        // If we failed (either because another thread got the lock first, or
        // a spurious failure), the loop will repeat the attempt.
    }

    // ... perform critical section work here ...

    // Release the lock. A simple store is sufficient.
    atomicStore(&lock, 0u);
}
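The same retry loop can also build atomic operations that WGSL does not provide directly. As a sketch, the code below accumulates f32 values by storing the float's bit pattern in an atomic<u32> and retrying until the compare-exchange succeeds. The atomic_add_f32 helper, the buffer names, and the bindings are assumptions; the host is assumed to initialize total_bits to 0, which is the bit pattern of 0.0.

```wgsl
// Hypothetical accumulator: the bits of an f32 kept in an atomic u32.
@group(0) @binding(0) var<storage, read_write> total_bits: atomic<u32>;
@group(0) @binding(1) var<storage, read> samples: array<f32>;

fn atomic_add_f32(amount: f32) {
    // Start from the value currently in memory.
    var expected = atomicLoad(&total_bits);
    loop {
        // Compute the new sum from the value we believe is stored.
        let desired = bitcast<u32>(bitcast<f32>(expected) + amount);
        let result = atomicCompareExchangeWeak(&total_bits, expected, desired);
        if (result.exchanged) {
            break;
        }
        // Another thread changed the value (or the exchange failed
        // spuriously): retry with what was actually found in memory.
        expected = result.old_value;
    }
}

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    if (global_id.x < arrayLength(&samples)) {
        atomic_add_f32(samples[global_id.x]);
    }
}
```

Floating-point addition is not associative, so the final sum can vary slightly with the order in which threads succeed.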

Synchronization Functions

This family of functions provides the essential tools for controlling the order of execution and memory visibility between different shader invocations (threads). They are the fundamental primitives for coordinating work in a parallel environment and are primarily used in compute shaders.

By default, shader threads execute independently, and the hardware makes no guarantees about their relative speed or the order in which their memory operations become visible to each other. This can lead to "race conditions," where one thread reads a value before another thread has had a chance to write its result, leading to incorrect or unpredictable behavior.

Synchronization functions solve this problem by creating a barrier, which is a point in the shader code that all threads in a given scope (like a compute shader workgroup) must reach before any of them are allowed to proceed further. This enforces a strict order of operations. Crucially, barriers also enforce memory visibility: writes made before the barrier, in the address space that the barrier covers, are completed and visible to the workgroup's threads after the barrier. These functions are the indispensable traffic control for any algorithm that requires threads to collaborate on shared data.

storageBarrier

Signature

storageBarrier()

This function takes no parameters and does not return a value.

Description

Executes a control barrier and synchronizes memory access for variables in the storage address space. This function is available only in compute shaders and must be called from uniform control flow.

When storageBarrier is called, it has two effects:

  1. Control Barrier: All shader invocations (threads) within the same workgroup must reach this point in the code before any of them are allowed to proceed further.

  2. Memory Barrier: It guarantees that all writes and atomic operations on variables in the storage address space performed by any thread in the workgroup before the barrier are completed and visible to all threads in the same workgroup after the barrier.

This ensures that threads reading from a storage buffer after the barrier will not see stale data written by other threads in the same workgroup before the barrier. Note that workgroupBarrier is not a superset of storageBarrier: it synchronizes only the workgroup address space, while storageBarrier synchronizes only the storage address space. An algorithm that shares data through both kinds of memory needs both calls. Neither barrier synchronizes threads that belong to different workgroups.
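The split between the two address spaces can be sketched as follows. This is a minimal illustration, not a definitive pattern: the names `scratch` and `tile` and the binding layout are hypothetical, and it assumes a single workgroup is dispatched and that `scratch` holds at least 128 elements.

```wgsl
// Hypothetical bindings and names, for illustration only.
// Assumption: `scratch` holds at least 128 elements and one workgroup is dispatched.
@group(0) @binding(0) var<storage, read_write> scratch: array<f32>;

var<workgroup> tile: array<f32, 64>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_id) local_id: vec3<u32>) {
    let idx = local_id.x;

    // Each thread writes to BOTH address spaces.
    tile[idx] = f32(idx);
    scratch[idx] = f32(idx) * 2.0;

    // Each barrier covers only its own address space, so both are issued.
    workgroupBarrier(); // makes the `tile` writes visible
    storageBarrier();   // makes the `scratch` writes visible

    // Reading another thread's slot in either array is now safe.
    let other = 63u - idx;
    let combined = tile[other] + scratch[other];

    // Each thread writes only its own slot in a region nobody reads,
    // so no further barrier is needed here.
    scratch[64u + idx] = combined;
}
```

If only `tile` were shared between threads, the `storageBarrier()` call could be dropped, and vice versa.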

W3C Specification: storageBarrier

Use Case

Used to coordinate multi-stage algorithms within a compute shader workgroup that use a storage buffer for communication between threads.

// A buffer in the storage address space, accessible for reading and writing.
@group(0) @binding(0) var<storage, read_write> data_buffer: array<f32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_id) local_id: vec3<u32>) {
    let idx = local_id.x;

    // --- Stage 1: Initial Calculation ---
    // Each thread performs a calculation and writes its result to the buffer.
    let initial_value = f32(idx) * 2.0;
    data_buffer[idx] = initial_value;

    // --- Synchronization ---
    // This barrier ensures that all 64 writes from Stage 1 are complete
    // and visible to all threads in this workgroup before anyone proceeds.
    storageBarrier();

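    // --- Stage 2: Neighbor-dependent Calculation ---
    // Each thread now reads the results written by its neighbors in Stage 1
    // to perform a second calculation (e.g., a simple blur/average).
    // Indices are clamped to this workgroup's 0..63 range: `idx - 1u` would
    // wrap around for thread 0, and thread 63 has no right neighbor.
    let left_neighbor_val = data_buffer[max(idx, 1u) - 1u];
    let right_neighbor_val = data_buffer[min(idx + 1u, 63u)];

    // Without the first barrier, a thread might read the old, incorrect value
    // from its neighbor's location before the neighbor had a chance to write its new value.
    // A second barrier guards against the mirror-image hazard: every thread must
    // finish reading its neighbors before any thread overwrites its own slot.
    storageBarrier();
    data_buffer[idx] = (left_neighbor_val + initial_value + right_neighbor_val) / 3.0;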
}

workgroupBarrier

Signature

workgroupBarrier()

This function takes no parameters and does not return a value.

Description

Executes a control barrier and synchronizes memory access for variables in the workgroup address space. It does not synchronize storage buffers; use storageBarrier for those. This function is available only in compute shaders and must be called from uniform control flow.

When workgroupBarrier is called, it has two effects:

  1. Control Barrier: All shader invocations (threads) within the same workgroup must reach this point in the code before any of them are allowed to proceed further. This synchronizes the execution flow of all threads in the workgroup.

  2. Memory Barrier: It guarantees that all writes and atomic operations on variables in the workgroup address space performed by any thread in the workgroup before the barrier are completed and visible to all other threads in the workgroup after the barrier.

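This is the most common barrier function, because workgroup memory is the usual channel for sharing intermediate results between threads. It guarantees that one stage of a parallel algorithm has finished writing to workgroup memory before the next stage begins reading from it.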
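Because every thread in the workgroup has to reach the barrier, WGSL rejects a barrier call placed in control flow that depends on a per-thread value. The sketch below, using hypothetical names (`partial`, `result`), shows the rejected form commented out next to the usual fix: keep the barrier outside the branch and make only the work conditional.

```wgsl
// Hypothetical names, for illustration only.
var<workgroup> partial: array<u32, 64>;

@group(0) @binding(0) var<storage, read_write> result: array<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(
    @builtin(local_invocation_id) local_id: vec3<u32>,
    @builtin(workgroup_id) group_id: vec3<u32>,
) {
    let idx = local_id.x;
    partial[idx] = idx;

    // INVALID (left commented out): the condition depends on `local_id`,
    // so only some threads would reach the barrier.
    // if (idx < 32u) {
    //     workgroupBarrier();
    // }

    // VALID: every thread reaches the barrier; only the work is conditional.
    workgroupBarrier();
    if (idx < 32u) {
        partial[idx] = partial[idx] + partial[idx + 32u];
    }
    workgroupBarrier();

    // One thread publishes a value for the whole workgroup.
    if (idx == 0u) {
        result[group_id.x] = partial[0];
    }
}
```

The same restructuring applies to storageBarrier.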

W3C Specification: workgroupBarrier

Use Case

Used to coordinate multi-stage algorithms within a compute shader workgroup, especially when using the fast workgroup memory for communication between threads. A classic example is a parallel reduction (e.g., finding the sum of a large array).

// A shared array in fast `workgroup` memory to hold data for one workgroup.
var<workgroup> local_data: array<f32, 64>;

@group(0) @binding(0) var<storage, read> input_data: array<f32>;
@group(0) @binding(1) var<storage, read_write> output_data: array<f32>;

@compute @workgroup_size(64, 1, 1)
fn main(
    @builtin(global_invocation_id) global_id: vec3<u32>,
    @builtin(local_invocation_id) local_id: vec3<u32>,
) {
    let idx = local_id.x;

    // --- Stage 1: Load data from slow global storage to fast workgroup memory ---
    local_data[idx] = input_data[global_id.x];

    // --- Synchronization ---
    // This is CRITICAL. It ensures every thread has finished loading its data
    // into `local_data` before anyone starts reading from it.
    workgroupBarrier();

    // --- Stage 2: Parallel reduction in workgroup memory ---
    // (This is a simplified reduction loop)
    var stride = 32u;
    loop {
        if (stride == 0u) { break; }
        if (idx < stride) {
            local_data[idx] = local_data[idx] + local_data[idx + stride];
        }
        // Barrier inside the loop ensures each level of the reduction is complete
        // before the next level begins.
        workgroupBarrier();
        stride = stride / 2u;
    }

    // --- Stage 3: Write final result ---
    // After the reduction, thread 0 holds the sum for the entire workgroup.
    if (idx == 0u) {
        output_data[global_id.x / 64u] = local_data[0];
    }
}

workgroupUniformLoad

Signature

workgroupUniformLoad(p : ptr<workgroup, T>) -> T

  • p: A pointer to a variable in the workgroup address space. The pointer p itself must be a uniform value (i.e., it must be the same for all invocations in the workgroup).

  • T: A concrete constructible type, meaning a non-atomic type with a fixed size, such as f32, vec4<i32>, or a user-defined struct that meets these criteria.

Description

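Reads a value from workgroup memory and returns that single value to every invocation (thread) in the workgroup. Despite the synchronization it performs, this is an ordinary load, not an atomic operation. This function is available only in compute shaders and must be called from uniform control flow.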

workgroupUniformLoad has two critical, combined effects:

  1. Uniform Return Value: It guarantees that every single thread in the workgroup receives the exact same return value, and the compiler's uniformity analysis treats the result as uniform. This is powerful because an if statement or loop condition based on this result counts as uniform control flow, so barriers may be called inside it and the branch will not cause thread divergence within the workgroup.

  2. Implicit Barrier: It executes a control barrier and synchronizes memory for the workgroup address space. This means all threads must reach this function call before any can proceed, and all writes to workgroup memory before the call are guaranteed to be visible to the load operation. It effectively combines a workgroupBarrier with a subsequent load.

It provides a single, synchronized value that can be safely used to direct the flow of the entire workgroup.
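One place where the uniform return value matters is a loop whose trip count lives in workgroup memory. The following is a rough sketch under stated assumptions: `rounds_remaining`, `accum`, and `totals` are hypothetical names, the starting count of 3 is arbitrary, and `totals` is assumed large enough for every thread in the dispatch.

```wgsl
// Hypothetical names, for illustration only.
var<workgroup> rounds_remaining: u32;
var<workgroup> accum: array<u32, 64>;

@group(0) @binding(0) var<storage, read_write> totals: array<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(
    @builtin(local_invocation_id) local_id: vec3<u32>,
    @builtin(workgroup_id) group_id: vec3<u32>,
) {
    let idx = local_id.x;

    // Thread 0 decides how many rounds the whole workgroup will run.
    if (idx == 0u) {
        rounds_remaining = 3u; // arbitrary value for the sketch
    }
    accum[idx] = 0u;

    loop {
        // Barrier plus load: every thread receives the same count, and the
        // compiler treats `remaining` as uniform.
        let remaining = workgroupUniformLoad(&rounds_remaining);
        if (remaining == 0u) { break; }

        accum[idx] = accum[idx] + remaining;

        // Allowed inside the loop because the loop condition is uniform.
        // It also ensures every thread has read the count before it changes.
        workgroupBarrier();

        if (idx == 0u) {
            rounds_remaining = remaining - 1u;
        }
    }

    // Each thread writes its own slot, so no barrier is needed here.
    totals[group_id.x * 64u + idx] = accum[idx];
}
```

Replacing the `workgroupUniformLoad` call with a plain read of `rounds_remaining` would make the loop condition non-uniform from the compiler's point of view, and the `workgroupBarrier()` inside the loop would then be rejected.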

W3C Specification: workgroupUniformLoad

Use Case

Used to efficiently distribute a single, workgroup-wide value that is calculated by one thread to all other threads in that workgroup, while also providing the necessary synchronization.

// A shared variable in workgroup memory.
var<workgroup> workgroup_id: u32;

@compute @workgroup_size(64, 1, 1)
fn main(
    @builtin(local_invocation_id) local_id: vec3<u32>,
    @builtin(workgroup_id) group_id: vec3<u32>
) {
    // ---- The "Manual" Way ----
    // A single thread calculates a group-wide value.
    if (local_id.x == 0u) {
        workgroup_id = group_id.x;
    }
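    // A barrier is required to make the write visible to all other threads.
    workgroupBarrier();
    // Now every thread loads the value. It is the same in every thread at
    // runtime, but the compiler's uniformity analysis cannot prove that.
    let id_manual = workgroup_id;
    // Wait until every thread has read the value before thread 0 writes it again below.
    workgroupBarrier();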


    // ---- The `workgroupUniformLoad` Way ----
    // A single thread calculates a group-wide value.
    if (local_id.x == 0u) {
        workgroup_id = group_id.x;
    }
    // This single function both waits for all threads (acting as a barrier)
    // and loads the value written by thread 0 into a uniform variable for all threads.
    let id_uniform = workgroupUniformLoad(&workgroup_id);

    // Because `id_uniform` is guaranteed to be the same for all threads,
    // this 'if' statement is uniform control flow: it will not cause thread
    // divergence, and a barrier could legally be called inside it.
    if (id_uniform % 2u == 0u) {
        // ... all threads in this workgroup will either enter this block, or none will ...
    }
}