Complete WGSL Built-in Function Reference

Think of the main tutorial series as a guided tour of the most important landmarks in shader programming. We focus on the essential, powerful tools you'll use every day, like mix(), dot(), and normalize().
But what happens when you're off the guided path and have a specific question? "What's the best way to manipulate individual bits?" "Is there a faster way to calculate a square root?" "How do I read a single texel from a specific mipmap level?"
This appendix is your comprehensive encyclopedia for those moments. It's a categorized, practical reference to the entire WGSL built-in function library, designed to give you the right answer quickly without forcing you to parse the dense official WGSL specification.
Each function is documented with a consistent format for easy scanning:
Signature: function_name(parameter: type) -> return_type
Description: A concise explanation of what the function does.
Use Case: A short code snippet demonstrating a practical application.
Alphabetical Function Index
For quick lookups, here is a complete list of all built-in WGSL functions in alphabetical order. Click on any function name to jump directly to its detailed entry.
abs acos acosh all any arrayLength asin asinh atan atan2 atanh atomicAdd atomicCompareExchangeWeak atomicExchange atomicLoad atomicStore
ceil clamp cos cosh countLeadingZeros countOneBits countTrailingZeros cross
degrees determinant distance dot dpdx dpdxCoarse dpdxFine dpdy dpdyCoarse dpdyFine
faceForward firstLeadingBit firstTrailingBit floor fma fract frexp fwidth fwidthCoarse fwidthFine
pack2x16float pack2x16snorm pack2x16unorm pack4x8snorm pack4x8unorm pow
radians reflect refract reverseBits round
saturate select sign sin sinh smoothstep sqrt step storageBarrier
tan tanh textureDimensions textureGather textureGatherCompare textureLoad textureNumLayers textureNumLevels textureNumSamples textureSample textureSampleBaseClampToEdge textureSampleBias textureSampleCompare textureSampleCompareLevel textureSampleGrad textureSampleLevel textureStore transpose trunc
unpack2x16float unpack2x16snorm unpack2x16unorm unpack4x8snorm unpack4x8unorm
workgroupBarrier workgroupUniformLoad
Index by Category
To discover functions based on their purpose, browse the categories below.
Scalar & Common Mathematical Functions
abs acos acosh asin asinh atan atan2 atanh ceil cos cosh degrees exp exp2 floor fma fract frexp inverseSqrt ldexp log log2 modf pow quantizeToF16 radians round sign sin sinh sqrt tan tanh trunc
cross distance dot faceForward length normalize reflect refract
countLeadingZeros countOneBits countTrailingZeros extractBits firstLeadingBit firstTrailingBit insertBits reverseBits
pack2x16float pack2x16snorm pack2x16unorm pack4x8snorm pack4x8unorm unpack2x16float unpack2x16snorm unpack2x16unorm unpack4x8snorm unpack4x8unorm
textureDimensions textureGather textureGatherCompare textureLoad textureNumLayers textureNumLevels textureNumSamples textureSample textureSampleBaseClampToEdge textureSampleBias textureSampleCompare textureSampleCompareLevel textureSampleGrad textureSampleLevel textureStore
dpdx dpdxCoarse dpdxFine dpdy dpdyCoarse dpdyFine fwidth fwidthCoarse fwidthFine
Atomic Read-Modify-Write Functions
atomicAdd atomicSub atomicMax atomicMin atomicAnd atomicOr atomicXor
Atomic Access and Exchange Functions
atomicLoad atomicStore atomicExchange atomicCompareExchangeWeak
Scalar & Common Mathematical Functions
This is the largest and most fundamental category of built-in functions in WGSL. These are the mathematical workhorses that form the building blocks of almost every shader effect.
While primarily designed to operate on single scalar values (like f32 or i32), these functions have the powerful property of automatically working component-wise when given a vector input. This means you can apply the same operation to all elements of a vector in a single, concise call.
The toolkit provided by this category is vast. It gives you the power to create smooth, repeating motion for animation through trigonometric operations. You can control and shape numerical values by manipulating their sign or fractional parts, which is the key to generating procedural, tiled patterns. Furthermore, you can implement non-linear curves and falloffs for realistic lighting and effects using functions for powers, roots, and exponents. This is the essential mathematical library you will reach for in nearly every shader you write.
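As a minimal sketch of the component-wise behavior described above, the function below compares one call on a whole vector against three scalar calls made by hand. The name componentwise_difference is hypothetical and exists only for this illustration.

```wgsl
// Hypothetical helper, for illustration only.
fn componentwise_difference(v: vec3<f32>) -> vec3<f32> {
    // One call on the whole vector...
    let whole_vector = abs(v);
    // ...should match three scalar calls, one per component.
    let per_component = vec3<f32>(abs(v.x), abs(v.y), abs(v.z));
    // For any finite input the difference is expected to be vec3(0.0).
    return whole_vector - per_component;
}
```

The same pattern holds for the other functions in this category, which is why most examples in this appendix can be switched between scalars and vectors without changing the call.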
abs
Signature
abs(e: T) -> T
e: The input value or vector.
T: Can be f32, i32, u32, or f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the absolute value of e, making negative values positive. When e is a vector, the function is applied to each component individually.
For floating-point types (f32): This effectively removes the negative sign. abs(-5.0) is 5.0.
For unsigned integer types (u32): The function has no effect and simply returns e.
For signed integer types (i32): There is a specific edge case. The largest negative number (-2,147,483,648) does not have a positive equivalent within the i32 range. In this specific case, abs() will return the original negative number. This is a rare overflow condition to be aware of.
W3C Specification: abs
Use Case
A common use is to create a "bounce" or "mirror" effect from an oscillating function like sin().
// Create a "bounce" animation instead of a standard wave.
let time: f32 = ...;
let wave = sin(time); // Oscillates between -1.0 and 1.0
let bounce = abs(wave); // Now oscillates between 0.0 and 1.0
// This is useful for effects that should grow and shrink but never go negative.
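The i32 edge case noted in the description can be sidestepped when only the magnitude is needed. The sketch below is one possible approach, not part of the built-in library: the hypothetical helper abs_magnitude reinterprets the result of abs() as a u32, so the most negative input maps to 2147483648u instead of staying negative.

```wgsl
// Hypothetical helper: magnitude of an i32, returned as u32.
fn abs_magnitude(v: i32) -> u32 {
    // For every input except the most negative i32, abs(v) is non-negative
    // and the bitcast leaves the value unchanged.
    // For the most negative i32, abs(v) returns the input itself, whose bit
    // pattern (0x80000000) reads as 2147483648 when reinterpreted as u32.
    return bitcast<u32>(abs(v));
}
```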
acos
Signature
acos(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the arc cosine of e. This is the inverse of the cos function; it takes a value and returns the angle in radians whose cosine is that value.
The input e must be in the range [-1.0, 1.0]. Providing a value outside this range will result in implementation-defined behavior, which is often NaN (Not a Number). To avoid this, it is highly recommended to clamp() the input value before passing it to acos().
The result is returned in radians, within the range [0, π].
W3C Specification: acos
Use Case
The primary use case for acos is to find the angle between two normalized (unit-length) vectors.
let vector_a = normalize(vec3<f32>(1.0, 0.0, 0.0));
let vector_b = normalize(vec3<f32>(0.5, 0.866, 0.0)); // Rotated 60 degrees
// The dot product of two unit vectors gives the cosine of the angle between them.
let dot_product = dot(vector_a, vector_b); // Result is ~0.5
// Due to floating-point inaccuracies, the dot product might be slightly
// outside [-1, 1]. Clamping it is a robust best practice.
let clamped_dot = clamp(dot_product, -1.0, 1.0);
// `acos` gives us the actual angle in radians.
let angle_radians = acos(clamped_dot); // Result is ~1.047, which is PI / 3 (60 degrees)
acosh
Signature
acosh(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the arc hyperbolic cosine of e, which is the inverse of the cosh function.
The input e must be greater than or equal to 1.0. Providing a value less than 1.0 will result in implementation-defined behavior, which is often NaN (Not a Number).
The result is returned in the range [0, +infinity).
W3C Specification: acosh
Use Case
This is a specialized mathematical function and is less common in general visual effects than its trigonometric counterpart, acos. It is used in scenarios involving hyperbolic geometry or solving equations that use the cosh function, such as calculations related to catenary curves (the shape of a hanging chain).
// `acosh` is the inverse of `cosh`.
let original_param = 2.0;
let cosh_result = cosh(original_param); // cosh(2.0) is approx 3.76
// To reverse the operation, we use acosh.
// The input is >= 1.0, so this is a valid operation.
let recovered_param = acosh(cosh_result); // Result is ~2.0
asin
Signature
asin(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the arc sine of e, which is the inverse of the sin function. It takes a value e and returns the angle in radians whose sine is e.
The input e must be in the range [-1.0, 1.0]. Providing a value outside this range will result in implementation-defined behavior (often NaN). To ensure correctness, it is best practice to clamp() the input before passing it to asin().
The result is returned in radians, within the range [-π/2, π/2].
W3C Specification: asin
Use Case
While less common than acos for 3D vector math, asin is useful when you have a ratio that represents the sine of an angle and need to recover the angle itself.
// Imagine you have the y-coordinate of a point on a unit circle and need its angle.
let y_coord = 0.707; // sin(45 degrees) or sin(PI / 4)
// Clamp the input for safety, even though we know it's valid here.
let clamped_y = clamp(y_coord, -1.0, 1.0);
// `asin` gives us the angle in radians.
let angle_radians = asin(clamped_y); // Result is ~0.785, which is PI / 4 (45 degrees)
asinh
Signature
asinh(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the arc hyperbolic sine of e, which is the inverse of the sinh function. Unlike its trigonometric counterpart asin, asinh is defined for all real numbers, so no input clamping is necessary.
The result is returned in the range (-infinity, +infinity).
W3C Specification: asinh
Use Case
This is a specialized mathematical function used in scenarios involving hyperbolic geometry or for reversing a sinh calculation. It is not commonly used in general visual effects programming.
// `asinh` is the inverse of `sinh`.
let original_param = 1.5;
let sinh_result = sinh(original_param); // sinh(1.5) is approx 2.129
// The `asinh` function can recover the original parameter.
let recovered_param = asinh(sinh_result); // Result is ~1.5
atan
Signature
atan(e: T) -> T
e: The input value or vector, typically representing a slope (y/x).
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the arc tangent of e, which is the inverse of the tan function. It takes a value e (representing a slope) and returns the angle in radians that produces it.
The result is returned in radians, within the range [-π/2, π/2] (from -90° to +90°).
Because the input represents a single ratio, atan cannot distinguish between angles in opposite quadrants. For example, the slope for a vector of (1, 1) is 1.0, and the slope for (-1, -1) is also 1.0. atan(1.0) will return the same angle for both. For most 2D angle calculations where you have the separate x and y components, the atan2 function is strongly preferred as it correctly handles all four quadrants.
W3C Specification: atan
Use Case
Used when you only have a slope value and need to find the corresponding angle within a limited 180-degree range.
// A slope of 1.0 corresponds to a 45-degree angle.
let slope = 1.0;
let angle_radians = atan(slope); // Result is ~0.785, which is PI / 4 (45 degrees)
// A negative slope.
let negative_slope = -1.0;
let neg_angle_rads = atan(negative_slope); // Result is ~-0.785 (-45 degrees)
atan2
Signature
atan2(y: T, x: T) -> T
y: The y-component or numerator.
x: The x-component or denominator.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the arc tangent of the quotient y / x, returning an angle in radians.
Unlike atan, which only takes a single ratio, atan2 is a much more powerful function that uses the signs of both y and x to correctly determine the angle in all four quadrants of a 2D coordinate system. It avoids the division-by-zero errors that would occur with atan(y/x) when x is zero.
The result is returned in radians, within the full circular range [-π, π] (from -180° to +180°). The function is ill-defined at the origin, where both x and y are zero.
W3C Specification: atan2
Use Case
atan2 is the primary and correct tool for converting 2D Cartesian coordinates (x, y) into a polar angle. This is fundamental for creating circular or radial patterns.
// Center the UV coordinates so (0,0) is at the middle of the screen.
let centered_uv = uv - 0.5;
// `atan2` calculates the angle of the current pixel relative to the center.
// The result is a full 360-degree angle.
let angle = atan2(centered_uv.y, centered_uv.x);
// To visualize, we can map the angle from [-PI, PI] to a [0, 1] range
// and use it as a color hue, creating a color wheel.
// (PI must be a constant you define yourself; WGSL does not provide one.)
let hue = angle / (2.0 * PI) + 0.5;
let color = hsv_to_rgb(hue, 1.0, 1.0); // (using a helper function)
// Example of quadrant awareness:
let angle_top_right = atan2( 1.0, 1.0); // PI / 4 ( 45 degrees)
let angle_bottom_left = atan2(-1.0, -1.0); // -3*PI / 4 (-135 degrees)
// `atan()` would give the same result for both, as the slope is 1.0 in both cases.
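Because atan2 is ill-defined when both inputs are zero, code that may evaluate the exact center pixel can guard that case explicitly. The sketch below is one way to do so; the helper name angle_or_default and the choice of 0.0 as the fallback angle are assumptions made for illustration.

```wgsl
// Hypothetical helper: polar angle of p, with an explicit value at the origin.
fn angle_or_default(p: vec2<f32>) -> f32 {
    // Both components are exactly zero: return a fixed, documented fallback
    // instead of relying on whatever the implementation produces.
    if (all(p == vec2<f32>(0.0, 0.0))) {
        return 0.0;
    }
    // Everywhere else, atan2 resolves the correct quadrant.
    return atan2(p.y, p.x);
}
```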
atanh
Signature
atanh(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the arc hyperbolic tangent of e, which is the inverse of the tanh function.
The input e must be in the range (-1.0, 1.0). Providing a value where abs(e) >= 1.0 will result in implementation-defined behavior (often NaN or infinity). To ensure correctness, it is best practice to clamp() the input to be strictly inside this range.
The result is returned in the range (-infinity, +infinity).
W3C Specification: atanh
Use Case
This is a specialized mathematical function not commonly used in general visual effects. It is primarily used in scenarios involving hyperbolic geometry or for reversing a tanh calculation.
// `atanh` is the inverse of `tanh`.
let original_param = 0.5;
let tanh_result = tanh(original_param); // tanh(0.5) is approx 0.462
// To reverse the operation, we use atanh.
// The input is between -1.0 and 1.0, so this is a valid operation.
let recovered_param = atanh(tanh_result); // Result is ~0.5
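The clamping advice in the description can be sketched as follows. The helper name safe_atanh and the limit of 0.999999 are assumptions chosen for illustration; pick a limit that suits the precision your effect needs.

```wgsl
// Assumed limit, strictly inside the open interval (-1.0, 1.0).
const ATANH_LIMIT: f32 = 0.999999;

// Hypothetical helper: atanh with its input forced into the valid domain.
fn safe_atanh(x: f32) -> f32 {
    // Inputs at or beyond +/-1.0 are pulled back just inside the domain,
    // so the result stays finite (large, but not NaN or infinity).
    return atanh(clamp(x, -ATANH_LIMIT, ATANH_LIMIT));
}
```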
ceil
Signature
ceil(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the "ceiling" of e, which is the smallest whole number that is greater than or equal to e. In simpler terms, it always rounds up to the next integer.
ceil(3.2) returns 4.0
ceil(3.0) returns 3.0
ceil(-3.2) returns -3.0 (since -3 is greater than -3.2)
W3C Specification: ceil
Use Case
ceil is useful for any operation where you need to snap a value up to the next discrete step or grid line.
// Imagine you need to calculate how many inventory slots an item takes up.
// An item might take up a fractional number of slots (e.g., 2.3).
let item_size = 2.3;
// You can't have a fraction of a slot, so you must round up
// to ensure enough space is allocated.
let slots_needed = ceil(item_size); // Result is 3.0
// This guarantees that even a size of 2.001 will correctly allocate 3 slots.
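Snapping up to a grid line whose spacing is not 1.0 follows the same idea: divide by the spacing, round up, then multiply back. This is a minimal sketch; the helper name snap_up is hypothetical, and it assumes step_size is greater than zero.

```wgsl
// Hypothetical helper: round `value` up to the next multiple of `step_size`.
// Assumes step_size > 0.0.
fn snap_up(value: f32, step_size: f32) -> f32 {
    // Express the value in units of the step, round up, convert back.
    return ceil(value / step_size) * step_size;
}
```

For example, with a step_size of 0.5, an input of 2.3 is 4.6 steps, which rounds up to 5 steps, or 2.5.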
cos
Signature
cos(e: T) -> T
e: The input angle or vector of angles, in radians.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the cosine of e, where e is an angle specified in radians. The result is a smooth, continuous wave that oscillates between -1.0 and 1.0.
cos is very similar to the sin function, but is "phase-shifted" by 90 degrees (π/2 radians). This means that cos(0.0) returns 1.0, while sin(0.0) returns 0.0.
W3C Specification: cos
Use Case
cos is fundamental to creating oscillations, waves, and any form of circular motion. It is most famously paired with sin to describe a point on a circle.
let time = ...; // Animate over time
let angle_in_radians = time * 2.0;
let radius = 5.0;
// `cos` defines the x-coordinate of a point on a circle.
let x = cos(angle_in_radians) * radius;
// `sin` defines the y-coordinate.
let y = sin(angle_in_radians) * radius;
// The resulting point (x, y) will orbit the origin in a perfect circle.
let circular_position = vec2<f32>(x, y);
cosh
Signature
cosh(e: T) -> T
e: The input hyperbolic angle in radians.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the hyperbolic cosine of e. This function describes the shape of a catenary curve (the shape of a hanging chain or rope). Mathematically, it is defined as (exp(e) + exp(-e)) / 2.
The function accepts any real number as input. The output is always greater than or equal to 1.0, with cosh(0.0) returning 1.0. The function is symmetrical around the y-axis and grows exponentially as e moves away from zero.
W3C Specification: cosh
Use Case
cosh is a specialized mathematical function primarily used to model physical phenomena like hanging cables or in advanced geometric calculations. It is rarely used in typical visual effects.
// The cosh function forms a catenary curve.
// At the lowest point (the center of the "chain"), the value is 1.0.
let lowest_point = cosh(0.0); // Returns 1.0
// The value increases exponentially as the input moves away from zero.
let point_on_curve = cosh(2.0); // Result is approx 3.76
// The function is symmetrical.
let other_point = cosh(-2.0); // Also approx 3.76
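The mathematical definition given in the description can be written out directly. The sketch below is for understanding only; the helper name cosh_from_exp is hypothetical, and the built-in cosh should be preferred in real shaders.

```wgsl
// Hypothetical helper: cosh written from its definition, (exp(x) + exp(-x)) / 2.
fn cosh_from_exp(x: f32) -> f32 {
    // Both exponentials are positive, so the average is never below 1.0
    // (up to floating-point rounding), and the two terms swap roles when
    // x changes sign, which is why the curve is symmetrical.
    return (exp(x) + exp(-x)) * 0.5;
}
```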
degrees
Signature
degrees(e: T) -> T
e: The input angle or vector of angles, in radians.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Converts a value from radians to degrees. It is the inverse of the radians function.
The conversion is performed by approximating the formula e * 180 / π. Since all trigonometric functions in WGSL (sin, cos, etc.) operate in radians, this function is typically used only for converting values for debugging or for interfacing with systems that expect degrees.
W3C Specification: degrees
Use Case
Useful for converting an angle to a more human-readable format for display or debugging purposes.
// WGSL has no built-in PI constant, so define one yourself.
const PI: f32 = 3.14159265;
let angle_rad = PI;
// Convert PI radians to degrees.
let angle_deg = degrees(angle_rad); // Result is 180.0
// The function also works component-wise on vectors.
let angles_rad_vec = vec2<f32>(PI, PI / 2.0);
let angles_deg_vec = degrees(angles_rad_vec); // Result is vec2<f32>(180.0, 90.0)
exp
Signature
exp(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the natural exponential of the input: Euler's number (the mathematical constant ≈ 2.71828) raised to the power of the parameter e. Note that the parameter name e here is just the input value, not Euler's number itself.
This function models exponential growth and is the inverse of the log() (natural logarithm) function. As the input e increases linearly, the output grows at an ever-increasing rate.
W3C Specification: exp
Use Case
exp is useful for modeling physical phenomena that follow an exponential curve, such as atmospheric density (fog) or certain types of light attenuation.
// Create an exponential falloff based on distance.
let dist = length(some_position);
// A negative input to `exp` creates exponential decay.
// As `dist` increases, `visibility` falls from 1.0 toward 0.0,
// so distant objects are increasingly hidden by the fog.
let visibility = exp(-dist * 0.5);
// The falloff is steep near the camera and flattens out with distance,
// which usually looks more natural than a linear fade that reaches
// zero abruptly at a fixed range.
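To show where the exponential factor ends up, here is a minimal sketch of blending a surface color toward a fog color. The function name apply_fog and all of its parameters are assumptions for illustration, not names from the tutorial series.

```wgsl
// Hypothetical helper: blend a surface color toward a fog color by distance.
fn apply_fog(
    scene_color: vec3<f32>,
    fog_color: vec3<f32>,
    dist: f32,
    falloff: f32,
) -> vec3<f32> {
    // Fraction of the scene color that survives the fog:
    // 1.0 at distance zero, approaching 0.0 far away.
    let visibility = exp(-dist * falloff);
    // visibility = 1.0 keeps the scene color, 0.0 shows only fog.
    return mix(fog_color, scene_color, visibility);
}
```

A larger falloff value makes the fog close in faster; 0.5 matches the constant used in the snippet above.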
exp2
Signature
exp2(e: T) -> T
e: The input value or vector, which acts as the exponent.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates 2 raised to the power of e. This is equivalent to pow(2.0, e).
This function models base-2 exponential growth and is the inverse of the log2() function. It is often more convenient or slightly more performant than the general pow() function when working with powers of two, which is common in computer graphics.
W3C Specification: exp2
Use Case
exp2 is very useful when working with systems that are based on powers of two, such as octaves in procedural noise or mipmap levels.
// A common pattern in procedural noise generation (like Perlin or Simplex noise)
// is to layer multiple "octaves" of noise, where each successive octave
// has double the frequency of the last.
var final_noise = 0.0;
for (var i = 0; i < 4; i = i + 1) { // 4 octaves of noise
    let i_f32 = f32(i);
    // exp2(i_f32) calculates the frequency for this octave:
    // i=0 -> 2^0 = 1.0
    // i=1 -> 2^1 = 2.0
    // i=2 -> 2^2 = 4.0
    // i=3 -> 2^3 = 8.0
    let frequency = exp2(i_f32);
    // ... calculate noise with this frequency ...
}
floor
Signature
floor(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the "floor" of e, which is the greatest whole number that is less than or equal to e. In simpler terms, it always rounds down to the nearest integer.
floor(3.7) returns 3.0
floor(3.0) returns 3.0
floor(-3.7) returns -4.0 (since -4 is less than -3.7)
W3C Specification: floor
Use Case
floor is a cornerstone of procedural pattern generation. It is used to find the integer coordinate or ID of a grid cell, which is essential for creating tiled or repeating patterns.
// Use UV coordinates that range from 0.0 to 1.0 across a surface.
let uv = in.uv;
// Scale the UVs to create a 5x5 grid.
let scaled_uv = uv * 5.0;
// `floor` gives us the integer coordinate of the grid cell we are in.
// `cell_id` will be a vec2 like (0,0), (1,0), (2,4), etc.
let cell_id = floor(scaled_uv);
// We can now use this stable integer ID to give each grid cell a unique
// property, like a pseudo-random color.
let random_value = pseudo_random(cell_id); // (using a helper hash function)
let cell_color = vec3<f32>(random_value);
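The snippet above relies on a pseudo_random helper that this appendix does not define. As a rough sketch of what such a helper might look like, here is a widely circulated sine-based hash. It is an assumption, not the tutorial's own implementation, and its output quality can vary between GPUs because it depends on the precision of sin for large inputs.

```wgsl
// Hypothetical stand-in for the `pseudo_random` helper: maps a 2D cell ID
// to a repeatable value in [0.0, 1.0]. The constants are conventional
// "magic numbers" for this style of hash, not derived values.
fn pseudo_random(cell_id: vec2<f32>) -> f32 {
    // Collapse the 2D ID into a single scalar seed.
    let seed = dot(cell_id, vec2<f32>(12.9898, 78.233));
    // Scale the sine up so that its fractional part looks noisy, then keep
    // only that fractional part.
    return fract(sin(seed) * 43758.5453);
}
```

Because the input comes from floor, every pixel inside one cell produces the same seed and therefore the same value.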
fma
Signature
fma(e1: T, e2: T, e3: T) -> T
e1, e2, e3: The input values or vectors.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates and returns (e1 * e2) + e3. The name fma is short for "Fused Multiply-Add."
Ideally, this is a "fused" operation, meaning the multiplication e1 * e2 is performed with a higher internal precision, and only the final result of the addition is rounded to the target type T. This can produce more accurate results and avoid rounding errors compared to a separate multiply followed by an add.
However, the WGSL specification allows implementations to treat this as a standard, non-fused operation (a multiply followed by an add). Therefore, you should not rely on the higher precision of a true fused operation unless you are certain of the capabilities of your target hardware and drivers.
W3C Specification: fma
Use Case
fma is a fundamental operation in linear algebra and can be used to efficiently implement many common formulas. For example, the linear interpolation function mix(a, b, t) can be expressed using fma.
// The `mix` function performs linear interpolation:
// mix(a, b, t) is equivalent to a * (1.0 - t) + b * t
// This can be algebraically rearranged into the `fma` pattern:
// t * (b - a) + a
fn manual_mix(a: vec3<f32>, b: vec3<f32>, t: f32) -> vec3<f32> {
    // This is a direct application of the fma pattern.
    return fma(b - a, vec3<f32>(t), a);
}
let red = vec3(1.0, 0.0, 0.0);
let blue = vec3(0.0, 0.0, 1.0);
// These two lines produce the same result (a purple color).
let result_mix = mix(red, blue, 0.5);
let result_fma = manual_mix(red, blue, 0.5);
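Another formula that maps naturally onto fma is polynomial evaluation with Horner's method, where each step is one multiply followed by one add. The sketch below evaluates a cubic; the function name cubic and the convention of packing coefficients into a vec4 are assumptions for illustration.

```wgsl
// Hypothetical helper: evaluates c.x + c.y*x + c.z*x^2 + c.w*x^3
// using Horner's method, one fma per polynomial degree.
fn cubic(x: f32, c: vec4<f32>) -> f32 {
    // Innermost: c.w * x + c.z
    // Middle:    (that) * x + c.y
    // Outermost: (that) * x + c.x
    return fma(fma(fma(c.w, x, c.z), x, c.y), x, c.x);
}
```

Whether each step is actually fused depends on the implementation, as noted in the description, so treat any accuracy benefit as a possibility and not a guarantee.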
fract
Signature
fract(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the fractional part of e, which is equivalent to e - floor(e). The result is a value representing the part of the number after the decimal point.
The output is always in the range [0.0, 1.0]. For positive numbers, this is straightforward: fract(3.7) is 0.7. For negative numbers, it is 1.0 minus the fractional part: fract(-3.7) is 0.3. Due to floating-point precision, a very small negative number might result in 1.0.
Visually, fract(x) creates a repeating "sawtooth wave" that climbs linearly from 0.0 towards 1.0 and then instantly drops back to 0.0 to repeat. This behavior makes it one of the most fundamental functions for procedural pattern generation.
W3C Specification: fract
Use Case
fract is essential for creating repeating or tiled patterns. It is almost always used in combination with floor to establish a repeating local coordinate system.
// Use UV coordinates that range from 0.0 to 1.0 across a surface.
let uv = in.uv;
// Scale the UVs to create a 10x10 grid.
let scaled_uv = uv * 10.0;
// `floor` gives us the integer ID of the grid cell.
let cell_id = floor(scaled_uv);
// `fract` gives us the coordinate *inside* the current cell.
// `local_uv` will always be in the range [0,1] for x and y,
// resetting for each new cell, creating a repeating coordinate system.
let local_uv = fract(scaled_uv);
// We can now draw something in the center of every cell.
// `length(local_uv - 0.5)` calculates the distance from the cell's center.
let circle_mask = 1.0 - step(0.4, length(local_uv - 0.5));
// The result is a grid of white circles on a black background.
frexp
Decomposes a floating-point number into its two fundamental components: a significand (fractional part) and an integer exponent of 2.
Overload: Scalar Input
Applies the decomposition to a single floating-point number.
Signature
frexp(e: T) -> FrexpResult
e: The input scalar value. T can be f32 or f16.
FrexpResult: A special, built-in struct that cannot be declared by name. It must be inferred with let.
Description
The frexp function splits the input e into a significand and an exponent such that e = significand * 2^exponent. The result is returned in a struct with two members:
.fract: A floating-point value of the same type as e. This is the significand, and its absolute value will be in the range [0.5, 1.0).
.exp: A signed 32-bit integer (i32) representing the exponent.
You cannot explicitly declare a variable of the return type. Instead, you must use let to infer the type and then access its members.
W3C Specification: frexp
Overload: Vector Input
Applies the decomposition component-wise to a vector of floating-point numbers.
Signature
frexp(e: vecN<T>) -> FrexpResultVec
e: The input vector. T can be f32 or f16. N can be 2, 3, or 4.
FrexpResultVec: A special, built-in struct containing vectors.
Description
Performs the frexp operation on each component of the input vector e. The result is returned in a struct with two members:
.fract: A vector of the same type as e (e.g., vecN<f32>).
.exp: A vector of signed 32-bit integers (e.g., vecN<i32>).
W3C Specification: frexp
Use Case
frexp is a low-level function used for manipulating the binary representation of floating-point numbers. It is not common in general visual effects but is useful for certain numerical algorithms.
let my_float = 12.0;
// Decompose the float. The type of `result` is inferred.
let result = frexp(my_float);
// `result.fract` will be 0.75
// `result.exp` will be 4
// Because: 0.75 * 2^4 = 0.75 * 16 = 12.0
let significand = result.fract;
let exponent = result.exp;
let my_vec = vec2<f32>(12.0, -0.25);
// Decompose the vector.
let result_vec = frexp(my_vec);
// `result_vec.fract` will be vec2<f32>(0.75, -0.5)
// `result_vec.exp` will be vec2<i32>(4, -1)
// Because:
// -0.5 * 2^-1 = -0.5 * 0.5 = -0.25
let significands = result_vec.fract;
let exponents = result_vec.exp;
inverseSqrt
Signature
inverseSqrt(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the inverse square root of e, which is equivalent to 1.0 / sqrt(e).
The input e must be greater than 0. Providing a non-positive value (<= 0) will result in implementation-defined behavior.
This function is often significantly faster than performing a separate sqrt and division, as many GPUs have a dedicated, hardware-accelerated instruction for this specific operation (rsqrt). It is a key function for high-performance graphics code, especially in vector normalization.
W3C Specification: inverseSqrt
Use Case
The most common use of inverseSqrt is to perform a fast vector normalization. The normalize(v) operation is mathematically v / length(v), which can be rewritten as v * (1.0 / sqrt(dot(v, v))). This is a perfect fit for inverseSqrt.
fn fast_normalize(v: vec3<f32>) -> vec3<f32> {
    // The squared length of the vector, which is v.x*v.x + v.y*v.y + v.z*v.z
    let length_sq = dot(v, v);
    // Calculate the inverse square root of the squared length.
    let inv_sqrt = inverseSqrt(length_sq);
    // Multiply the original vector by the result.
    return v * inv_sqrt;
}
// This manual implementation is often how the built-in `normalize`
// function is implemented under the hood for maximum performance.
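Since inverseSqrt requires a positive input, a zero-length vector needs special handling before it reaches the normalization step. The sketch below shows one possible guard; the helper name safe_normalize, the fallback parameter, and the threshold value are all assumptions for illustration.

```wgsl
// Assumed threshold below which a vector is treated as having no direction.
const MIN_LENGTH_SQ: f32 = 1e-12;

// Hypothetical helper: normalize v, or return `fallback` when v is too short
// to have a meaningful direction.
fn safe_normalize(v: vec3<f32>, fallback: vec3<f32>) -> vec3<f32> {
    let length_sq = dot(v, v);
    // Avoid calling inverseSqrt on zero or near-zero input.
    if (length_sq < MIN_LENGTH_SQ) {
        return fallback;
    }
    return v * inverseSqrt(length_sq);
}
```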
ldexp
Signature
ldexp(e1: T, e2: I) -> T
e1: The significand (fractional part) or base value.
e2: The integer exponent for the power of 2.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
I: Can be i32 or a vector of i32 (e.g., vec3<i32>).
The second parameter e2 must be a vector if and only if the first parameter e1 is a vector, and they must have the same number of components.
Description
Constructs a floating-point number by calculating e1 multiplied by 2 raised to the power of e2 (i.e., e1 * 2^e2). The name ldexp is a mnemonic for "load exponent."
This function is the inverse of frexp; it takes a significand and an exponent and reconstructs the original number. For any finite number x, the following is true: x == ldexp(frexp(x).fract, frexp(x).exp).
W3C Specification: ldexp
Exponent Constraints
The maximum value for the exponent e2 is tied to the internal representation of the floating-point type T, which is defined by an "exponent bias".
For f32, the bias is 127.
For f16, the bias is 15.
For AbstractFloat, the bias is 1023. AbstractFloat is the high-precision type used for floating-point literals (e.g., 1.0, 3.14159) during shader compilation, before they are assigned to a concrete type like f32. The larger bias reflects the wider exponent range of the 64-bit format used for compile-time constant evaluation.
The operation has the following constraints:
If e2 + bias is less than or equal to 0, the result may be zero (underflow).
If e2 is greater than bias + 1, the behavior depends on when the value is known:
If e2 is a compile-time constant, it is a shader-creation error.
Otherwise, the result is an indeterminate value at runtime (e.g., infinity or the maximum float value).
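When the exponent is computed at runtime, one way to stay clear of the indeterminate case above is to cap it at bias + 1 before calling ldexp. This is a sketch for f32 only; the constant and helper names are hypothetical, and the result can still overflow if the significand itself is large.

```wgsl
// Exponent bias for f32, as listed above.
const F32_EXPONENT_BIAS: i32 = 127;

// Hypothetical helper: ldexp with the exponent capped at bias + 1.
fn ldexp_capped(significand: f32, exponent: i32) -> f32 {
    // Exponents above bias + 1 (128 for f32) fall under the
    // "indeterminate value" rule, so limit the exponent first.
    let capped = min(exponent, F32_EXPONENT_BIAS + 1);
    return ldexp(significand, capped);
}
```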
Use Case
ldexp is primarily used to reconstruct a floating-point number after its components have been manipulated using frexp. This allows for efficient multiplication or division by powers of two by directly modifying the exponent.
let original_value = 6.0;
// Decompose the value into its parts.
let parts = frexp(original_value); // -> .fract is 0.75, .exp is 3
// To multiply the original value by 4 (which is 2^2),
// we can simply add 2 to the exponent instead of performing a full multiplication.
let new_exponent = parts.exp + 2; // 3 + 2 = 5
// Reconstruct the number using the original significand and the new exponent.
// ldexp(0.75, 5) calculates 0.75 * 2^5 = 0.75 * 32 = 24.0
let final_value = ldexp(parts.fract, new_exponent); // Result is 24.0, which is 6.0 * 4
log
Signature
log(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the natural logarithm of the parameter e. The "natural logarithm" is the logarithm to the base e (Euler's number, the mathematical constant ≈ 2.71828). This function is the inverse of the exp() function.
The input e must be greater than 0. Providing a non-positive value (<= 0) will result in implementation-defined behavior.
Logarithms are useful for compressing a wide range of values into a much smaller one. As the input e grows exponentially, the output of log(e) grows only linearly.
W3C Specification: log
Use Case
log is often used to handle values that span several orders of magnitude, like light intensity in HDR (High Dynamic Range) rendering, by converting them to a more manageable scale for processing or display.
// Imagine `hdr_brightness` can range from 1.0 (normal) to 10000.0 (very bright).
let hdr_brightness = 10000.0;
// `log(1.0)` is 0.0
// `log(10000.0)` is approx 9.21
// The `log` function has compressed the huge [1.0, 10000.0] range
// into a much smaller [0.0, 9.21] range.
let compressed_brightness = log(hdr_brightness);
// This smaller range is now much easier to map to a standard [0.0, 1.0]
// displayable brightness for tone mapping.
let display_brightness = compressed_brightness / 10.0; // A simple tone mapping
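WGSL only provides log (base e) and log2, so other bases have to be derived with the change-of-base identity log_b(x) = log(x) / log(b). The following is a minimal sketch; `log_base10` and `brightness_in_decades` are hypothetical helper names, not built-ins.

```wgsl
// Hypothetical helper: base-10 logarithm via the change-of-base identity.
// As with log() itself, the input must be greater than 0.
fn log_base10(x: f32) -> f32 {
    return log(x) / log(10.0);
}

// Expresses an HDR brightness as a number of "decades" (powers of ten).
// 1.0 -> 0.0, 100.0 -> approx 2.0, 10000.0 -> approx 4.0
fn brightness_in_decades(hdr_brightness: f32) -> f32 {
    return log_base10(hdr_brightness);
}
```

Counting in decades can make the divisor in a simple tone-mapping step easier to reason about than the natural-log scale.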
log2
Signature
log2(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the base-2 logarithm of e. This function finds the power to which the number 2 must be raised to produce the value e. It is the inverse of the exp2() function.
The input e must be greater than 0. Providing a non-positive value (<= 0) will result in implementation-defined behavior.
W3C Specification: log2
Use Case
log2 is particularly useful in computer graphics and programming for problems that involve powers of two, such as calculating mipmap levels or determining the number of bits required for a value.
// log2 answers the question: "2 to the power of what equals this number?"
let val_a = 16.0;
// 2 to the power of 4 is 16, so log2(16.0) is 4.0.
let power_a = log2(val_a); // Result is 4.0
let val_b = 0.25;
// 2 to the power of -2 is 1/(2^2) = 1/4 = 0.25, so log2(0.25) is -2.0.
let power_b = log2(val_b); // Result is -2.0
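The mipmap calculation mentioned above can be sketched as follows. Both function names are hypothetical, and the sketch assumes the largest texture dimension is at least 1. An integer-only variant using countLeadingZeros is included because log2 is only approximate on GPUs.

```wgsl
// Hypothetical helper: mip levels in a full chain for a texture whose largest
// dimension is `max_dimension` texels. Assumes max_dimension >= 1.
fn mip_level_count_log2(max_dimension: u32) -> u32 {
    // Mathematically floor(log2(n)) + 1. For example, 256 gives 8 + 1 = 9.
    // Caveat: log2 is approximate on GPUs, so a result that lands slightly
    // below a whole number would floor one level too low.
    return u32(floor(log2(f32(max_dimension)))) + 1u;
}

// Integer-only alternative that avoids the rounding caveat.
// For 256 (binary 1_0000_0000), countLeadingZeros is 23, so 32 - 23 = 9.
fn mip_level_count_bits(max_dimension: u32) -> u32 {
    return 32u - countLeadingZeros(max_dimension);
}
```

When the input is already an integer, the bit-counting form is usually the safer choice; the log2 form is more natural when the size is already a float.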
modf
Decomposes a floating-point number into its whole and fractional parts.
W3C Specification: modf
Overload: Scalar Input
Applies the decomposition to a single floating-point number.
Signature
modf(e: T) -> ModfResult
e: The input scalar value.
T: Can be AbstractFloat, f32, or f16.
ModfResult: A special, built-in struct that cannot be declared by name. It must be inferred with let.
Description
The modf function splits the input e into its integer and fractional components. The result is returned in a struct with two members, both of which have the same type as the input e:
.fract: The fractional part of e.
.whole: The whole number (integer) part of e.
Both the fractional and whole parts will have the same sign as the original input e. Note that the whole number part is returned as a floating-point type, not an integer type.
Overload: Vector Input
Applies the decomposition component-wise to a vector of floating-point numbers.
Signature
modf(e: vecN<T>) -> ModfResultVec
e: The input vector.
T: Can be f32 or f16.
N: Can be 2, 3, or 4.
ModfResultVec: A special, built-in struct containing vectors that cannot be declared by name. It must be inferred with let.
Description
Performs the modf operation on each component of the input vector e. The result is returned in a struct with two members:
.fract: A vector of the same type as e (e.g., vecN<f32>).
.whole: A vector of the same type as e (e.g., vecN<f32>).
Use Case
modf is useful when you need to operate on both the integer and fractional parts of a number separately. This is distinct from using floor and fract, especially for negative numbers.
let my_float = 3.7;
let result_pos = modf(my_float);
// result_pos.fract is 0.7
// result_pos.whole is 3.0
let my_neg_float = -3.7;
let result_neg = modf(my_neg_float);
// result_neg.fract is -0.7
// result_neg.whole is -3.0
// Compare this to the standard `fract` function:
// fract(-3.7) would return 0.3
let my_vec = vec2<f32>(3.7, -2.5);
let result_vec = modf(my_vec);
// result_vec.fract is vec2<f32>(0.7, -0.5)
// result_vec.whole is vec2<f32>(3.0, -2.0)
pow
Signature
pow(e1: T, e2: T) -> T
e1: The base value or vector.
e2: The exponent value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates e1 raised to the power of e2.
This is a fundamental function for creating non-linear curves and falloffs. The result can be sensitive to the input values:
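If the base e1 is negative, the exponent e2 must be a whole number for the result to be mathematically meaningful in the domain of real numbers. A fractional exponent with a negative base will result in implementation-defined behavior (often NaN). Because GPUs commonly evaluate pow as exp2(e2 * log2(e1)), a negative base may produce NaN even with a whole-number exponent, so it is safest to keep e1 non-negative.
pow(0.0, 0.0) is implementation-defined.
Raising 0.0 to a negative exponent will result in infinity.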
W3C Specification: pow
Use Case
pow is essential for controlling the sharpness of specular highlights in lighting models and for gamma correction.
// Use Case 1: Specular Highlight
// `dot_product` is the alignment between view and reflection, typically in [0, 1].
let dot_product = max(0.0, dot(view_dir, reflect_dir));
let shininess = 32.0;
// Raising the value to a high power makes the highlight much sharper.
// Values close to 1.0 stay high, while values less than 1.0 fall off very quickly.
let specular_highlight = pow(dot_product, shininess);
// Use Case 2: Gamma Correction
let linear_color = vec3<f32>(0.2, 0.5, 0.9);
let gamma = 2.2;
// Convert from linear color space to sRGB color space for display.
let srgb_color = pow(linear_color, vec3<f32>(1.0 / gamma));
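Since a negative base is outside the reliable domain of pow, it can help to clamp the input first. The sketch below wraps the gamma step from the example above; `linear_to_gamma` is a hypothetical helper name, and it uses the same simple power-curve approximation rather than the exact piecewise sRGB transfer function.

```wgsl
// Hypothetical helper: gamma-encode a linear color with a guarded pow().
fn linear_to_gamma(linear_color: vec3<f32>, gamma: f32) -> vec3<f32> {
    // Clamp to zero first: negative components (which can appear after
    // filtering or color-space math) would put pow() outside its domain.
    let safe_color = max(linear_color, vec3<f32>(0.0));
    // Component-wise power; the exponent must be a vector of the same size.
    return pow(safe_color, vec3<f32>(1.0 / gamma));
}
```

Calling `linear_to_gamma(color, 2.2)` reproduces the gamma-correction example while avoiding NaN for slightly negative inputs.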
quantizeToF16
Signature
quantizeToF16(e: T) -> T
e: The input value or vector.
T: Can be f32 or a vector of f32 (e.g., vec3<f32>).
Description
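Converts a 32-bit float (f32) to a nearby value that can be exactly represented by a 16-bit float (f16), and then returns that value as an f32. The value is not necessarily rounded down; it simply becomes one of the f16-representable values adjacent to the input.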
This is a "quantization" function, which means it reduces the precision of a value. The operation is conceptually equivalent to the process of packing an f32 into the bits of an f16 and then immediately unpacking it back to an f32. This round-trip is lossy, and quantizeToF16 gives you the result of that precision loss.
W3C Specification: quantizeToF16
Use Case
This function is used to simulate the precision loss of f16 floats while still performing calculations in f32. This is important for maintaining visual consistency and avoiding artifacts in rendering pipelines that might use a mix of f16 and f32 precision for storing data (e.g., in textures or buffers), especially on mobile or low-power hardware.
// A 32-bit float. The literal 0.1 is stored in f32 as approximately 0.100000001490116.
let high_precision_val: f32 = 0.1;
// Simulate storing this value in an f16 buffer and reading it back.
// The result is an f16-representable value, promoted back to f32.
// The closest f16 value to 0.1 is 0.0999755859375, so that is the typical result.
let quantized_val: f32 = quantizeToF16(high_precision_val);
// By quantizing, you ensure that any subsequent calculations using this value
// will behave as if the value had been stored with lower precision,
// preventing subtle artifacts that can arise from mixing precisions.
radians
Signature
radians(e: T) -> T
e: The input angle or vector of angles, in degrees.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Converts a value from degrees to radians. It is the inverse of the degrees function.
The conversion is performed by approximating the formula e * π / 180. This is a crucial utility function because all of WGSL's built-in trigonometric functions (sin, cos, tan, etc.) expect their angle inputs to be in radians.
W3C Specification: radians
Use Case
Used to convert angles from a more human-readable format (degrees) into the required format for trigonometric calculations. This is common when taking user input or parameters defined in degrees.
// A rotation angle specified in degrees, perhaps from a uniform.
let angle_deg = 45.0;
// Convert the angle to radians before using it with sin() or cos().
let angle_rad = radians(angle_deg); // Result is ~0.785, which is PI / 4
// Now `angle_rad` can be correctly used in trigonometric functions.
let val = sin(angle_rad); // Correctly calculates sin(45 degrees)
// The function also works component-wise on vectors.
let angles_deg_vec = vec2<f32>(90.0, 180.0);
let angles_rad_vec = radians(angles_deg_vec); // Result is vec2<f32>(PI / 2.0, PI)
round
Signature
round(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Rounds the input e to the nearest whole number, returning the result as a floating-point value.
This function uses the "round half to even" strategy, which is the default rounding mode in the IEEE 754 standard. This means:
Values are rounded to the closest integer. For example, round(2.3) is 2.0 and round(2.8) is 3.0.
Values that are exactly halfway between two integers (like 2.5) are rounded to the nearest even integer.
round(2.5) returns 2.0
round(3.5) returns 4.0
round(-2.5) returns -2.0
This behavior is different from the common "always round .5 up" rule and is designed to reduce statistical bias in long calculations.
W3C Specification: round
Use Case
round is useful for snapping values to the nearest integer grid, for example, to create a pixelated or mosaic effect.
// Use UV coordinates that range from 0.0 to 1.0.
let uv = in.uv;
// Scale the UVs to define the size of our "pixels".
let pixel_grid_size = 10.0;
let scaled_uv = uv * pixel_grid_size;
// Round the scaled UVs to the nearest integer coordinate.
let rounded_uv = round(scaled_uv);
// Scale back down to the original 0-1 range. This creates large blocks
// where every fragment inside a grid cell gets the same final UV coordinate.
let pixelated_uv = rounded_uv / pixel_grid_size;
// Sampling a texture with these new UVs will produce a mosaic effect.
let final_color = textureSample(my_texture, my_sampler, pixelated_uv);
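If the "always round .5 up" rule is what an effect actually needs, it has to be written by hand. The following sketch contrasts the two behaviors; `round_half_up` is a hypothetical helper name, not a built-in.

```wgsl
// Hypothetical helper: "round half up", for comparison with the built-in
// round(), which rounds exact halves to the nearest even integer.
fn round_half_up(x: f32) -> f32 {
    return floor(x + 0.5);
}

// Input    round()    round_half_up()
//  2.5       2.0         3.0
//  3.5       4.0         4.0
// -2.5      -2.0        -2.0
```

The two functions differ only at exact halves whose lower neighbor is even, which is why the difference is easy to miss in casual testing.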
sign
Signature
sign(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, i32, or a vector of these types (e.g., vec3<f32>).
Description
Extracts the sign of e and returns it as a value of the same type. The result is:
1 if e is positive.
0 if e is zero.
-1 if e is negative.
This function is applied component-wise when the input is a vector.
W3C Specification: sign
Use Case
sign is useful for determining a direction without its magnitude. For example, it can be used to apply a force (like friction) that always opposes the direction of velocity.
let velocity = vec2<f32>(3.5, -2.1);
// Get the direction of velocity. `sign` will return vec2<f32>(1.0, -1.0).
let direction = sign(velocity);
let friction_magnitude = 0.5;
// The friction force should oppose the velocity.
let friction_force = -direction * friction_magnitude;
// `friction_force` is now vec2<f32>(-0.5, 0.5), correctly opposing the movement.
sin
Signature
sin(e: T) -> T
e: The input angle or vector of angles, in radians.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the sine of e, where e is an angle specified in radians. The result is a smooth, continuous wave that oscillates between -1.0 and 1.0.
sin is one of the most fundamental functions in graphics programming for creating any kind of periodic motion. sin(0.0) returns 0.0, and the wave repeats every 2π radians.
W3C Specification: sin
Use Case
sin is the primary tool for creating waves, oscillations, and cyclical animations.
// Use a time uniform to drive an animation.
let time = material.time;
// Use Case 1: Simple wave for vertex displacement
// This creates a physical wave along the x-axis of a mesh.
let frequency = 5.0;
let amplitude = 0.1;
let wave_offset = sin(position.x * frequency + time) * amplitude;
let new_y_position = position.y + wave_offset;
// Use Case 2: Pulsating color
// `sin` returns a value in [-1, 1]. We can map this to the [0, 1] range
// to control brightness or a color channel.
let pulse = sin(time) * 0.5 + 0.5; // Maps [-1, 1] -> [0, 1]
let pulsating_color = vec3<f32>(pulse, 0.0, 0.0); // Fades from black to red
sinh
Signature
sinh(e: T) -> T
e: The input hyperbolic angle in radians.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the hyperbolic sine of e. This function is related to the exponential function and is mathematically defined as (exp(e) - exp(-e)) / 2.
Unlike the trigonometric sin function which oscillates, the sinh function grows exponentially. It passes through the origin (sinh(0.0) is 0.0) and is an odd function, meaning sinh(-e) is equal to -sinh(e). It is the inverse of the asinh function.
W3C Specification: sinh
Use Case
sinh is a specialized mathematical function primarily used in advanced geometry or physics calculations. It is not commonly used in general visual effects programming.
// The sinh function grows exponentially away from zero.
let val_zero = sinh(0.0); // Result is 0.0
let val_pos = sinh(2.0); // Result is approx 3.63
let val_neg = sinh(-2.0); // Result is approx -3.63
// It is the inverse of asinh.
let recovered_value = asinh(val_pos); // Result is approx 2.0
sqrt
Signature
sqrt(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the square root of e. This is the inverse of squaring a number (x * x).
The input e must be non-negative (>= 0). Providing a negative input will result in implementation-defined behavior, which is often NaN (Not a Number).
W3C Specification: sqrt
Use Case
sqrt is fundamental for calculations based on the Pythagorean theorem, such as finding the length of a vector or the distance between two points. While WGSL provides the length() and distance() functions for this, sqrt is the core mathematical operation they are built on.
// Manually calculate the distance between two 2D points to demonstrate sqrt.
let point_a = vec2<f32>(2.0, 3.0);
let point_b = vec2<f32>(5.0, 7.0);
let delta = point_b - point_a; // delta is vec2<f32>(3.0, 4.0)
// The squared distance is dx*dx + dy*dy
let distance_sq = dot(delta, delta); // 3*3 + 4*4 = 9 + 16 = 25.0
// The actual distance is the square root of the squared distance.
let distance = sqrt(distance_sq); // sqrt(25.0) is 5.0
// This is mathematically equivalent to the built-in `distance()` function.
tan
Signature
tan(e: T) -> T
e: The input angle or vector of angles, in radians.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the tangent of e, where e is an angle specified in radians. Mathematically, tan(e) is equivalent to sin(e) / cos(e).
The function's output represents the slope of a line at a given angle. Unlike sin and cos, the output range of tan is unbounded (-infinity to +infinity). The function has vertical asymptotes (approaches infinity) at odd multiples of π/2 (90°, 270°, etc.), where cos(e) is zero. This makes it generally unsuitable for creating smooth, bounded animations.
W3C Specification: tan
Use Case
The primary role of tan in computer graphics is in geometric calculations, most notably in the creation of the perspective projection matrix, where it relates the camera's field of view (FOV) to the dimensions of the viewing frustum.
// A simplified example of the math inside a perspective projection matrix.
let fov_degrees = 90.0;
let fov_radians = radians(fov_degrees);
// The tangent of half the FOV is used to calculate a scaling factor
// that determines how much of the world is visible.
let scale_factor = 1.0 / tan(fov_radians * 0.5);
// This scale_factor would then be used to construct the matrix that
// transforms 3D world coordinates into 2D screen coordinates.
tanh
Signature
tanh(e: T) -> T
e: The input hyperbolic angle in radians.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the hyperbolic tangent of e. Mathematically, it is equivalent to sinh(e) / cosh(e).
The tanh function is a "squashing" function, meaning it takes any real number as input and maps it to a value within the range (-1.0, 1.0). It produces an "S"-shaped curve (a sigmoid) that is nearly linear around 0 and flattens out as the input moves towards positive or negative infinity. It is the inverse of the atanh function.
W3C Specification: tanh
Use Case
tanh is useful for creating smooth falloff curves or mapping an unbounded value to a predictable range.
// Imagine you have a distance value that can be very large.
let dist_from_center: f32 = ...; // Could be 0.0, 5.0, 100.0, etc.
// A linear falloff might not look good. We can use `tanh` to create a
// smooth curve that rapidly transitions and then levels off.
let scale = 0.5;
let tanh_falloff = tanh(dist_from_center * scale);
// The `tanh_falloff` value will start at 0.0, rise quickly, and then
// smoothly approach 1.0 without ever exceeding it, no matter how large
// `dist_from_center` gets. This creates a pleasing, controlled transition.
let effect_strength = 1.0 - tanh_falloff;
trunc
Signature
trunc(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the integer part of e by discarding its fractional component. This is commonly described as rounding towards zero. The result is returned as a floating-point value.
trunc(3.7) returns 3.0
trunc(-3.7) returns -3.0
Note that for negative numbers, this behavior is different from the floor() function, which would round -3.7 down to -4.0.
W3C Specification: trunc
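As a small illustration of rounding toward zero, the sketch below splits a value into a whole part and a remainder that both keep the sign of the input, which is the same split that modf performs. `split_toward_zero` is a hypothetical helper name.

```wgsl
// Hypothetical helper: returns (whole part, remainder), both carrying the
// sign of the input because trunc() rounds toward zero.
fn split_toward_zero(x: f32) -> vec2<f32> {
    let whole = trunc(x);      //  3.7 -> 3.0,        -3.7 -> -3.0
    let remainder = x - whole; //  3.7 -> approx 0.7, -3.7 -> approx -0.7
    return vec2<f32>(whole, remainder);
}
```

Replacing trunc with floor in this helper would instead give -4.0 and approx 0.3 for an input of -3.7.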
Vector Functions
While the common mathematical functions can be applied component-wise to vectors, this family of functions performs holistic geometric operations. They treat vectors not just as a collection of numbers, but as representations of positions, directions, and surfaces in 2D or 3D space.
These are the indispensable power tools for solving the fundamental questions of computer graphics. They allow you to calculate the distance between points, extract a pure direction from a vector, measure the alignment and angle between directions, find vectors that are perpendicular to a surface, and even simulate the physics of light as it bounces off or bends through materials.
Mastering these functions is the key to implementing almost any lighting model, geometric effect, or physics-based interaction in a shader. They are the language of 3D space.
cross
Signature
cross(e1: vec3<T>, e2: vec3<T>) -> vec3<T>
e1, e2: The two input vectors.
T: Can be f32 or f16.
Description
Calculates the cross product of two 3-component vectors, e1 and e2.
The result is a new vec3 vector that is perpendicular (orthogonal) to both of the input vectors. The direction of the resulting vector is determined by the "right-hand rule": if you curl the fingers of your right hand in the direction from e1 to e2, your thumb will point in the direction of the result.
This function is defined only for vec3. It cannot be used with vectors of other dimensions.
W3C Specification: cross
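A common application is deriving a face normal from the three corners of a triangle. The sketch below assumes counter-clockwise winding in a right-handed coordinate system; `triangle_normal` is a hypothetical helper name.

```wgsl
// Hypothetical helper: unit normal of the triangle (a, b, c).
// Assumes counter-clockwise winding in a right-handed coordinate system,
// in which case the normal points toward the viewer.
fn triangle_normal(a: vec3<f32>, b: vec3<f32>, c: vec3<f32>) -> vec3<f32> {
    let edge_ab = b - a;
    let edge_ac = c - a;
    // The cross product is perpendicular to both edges. Its length equals
    // twice the triangle's area, so it is normalized to get a direction.
    return normalize(cross(edge_ab, edge_ac));
}

// Example: a = (0, 0, 0), b = (1, 0, 0), c = (0, 1, 0)
// cross((1, 0, 0), (0, 1, 0)) = (0, 0, 1), so the normal is +Z.
```

A degenerate triangle (zero area) produces a zero-length cross product, so real code would want to guard that case before normalizing.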
distance
Signature
distance(e1: T, e2: T) -> S
e1, e2: The two input points.
T: The type of the input points. Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
S: The scalar component type of T (f32 if T is vecN<f32>, f16 if T is vecN<f16>, etc.).
Description
Calculates the Euclidean distance (the straight-line distance) between two points, e1 and e2. This is mathematically equivalent to length(e1 - e2).
The function always returns a single scalar floating-point value, even when the inputs are vectors.
W3C Specification: distance
Use Case
distance is one of the most common functions in shader programming, used for any calculation involving proximity, such as range checks, radial gradients, or circular masks.
// Use Case 1: AI Attack Range
let player_pos = vec3<f32>(10.0, 0.0, 5.0);
let enemy_pos = vec3<f32>(12.0, 0.0, 6.0);
let attack_range = 3.0;
if distance(player_pos, enemy_pos) < attack_range {
    // The player is in range, so the enemy can attack.
}
// Use Case 2: Circular Mask
// `uv` is a 2D coordinate from 0.0 to 1.0.
let center = vec2<f32>(0.5);
let radius = 0.4;
let dist_from_center = distance(uv, center);
// Create a hard-edged circle.
let circle = step(dist_from_center, radius);
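The hard-edged mask above aliases at the boundary. One possible refinement, sketched here under the assumption that a `feather` width greater than zero is supplied by the caller, swaps step for smoothstep. `soft_circle` and its parameter names are hypothetical.

```wgsl
// Hypothetical helper: circular mask with a soft edge.
// Returns 1.0 well inside the circle and 0.0 outside it, with a smooth
// ramp that is `feather` units wide just inside the radius.
// Assumes feather > 0.0 so that the smoothstep edges are distinct.
fn soft_circle(uv: vec2<f32>, center: vec2<f32>, radius: f32, feather: f32) -> f32 {
    let d = distance(uv, center);
    // smoothstep is 0.0 at (radius - feather) and 1.0 at radius,
    // so subtracting from 1.0 makes the inside of the circle bright.
    return 1.0 - smoothstep(radius - feather, radius, d);
}
```

For instance, `soft_circle(uv, vec2<f32>(0.5), 0.4, 0.02)` gives the same circle as the example above with a thin anti-aliased rim.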
dot
Signature
dot(e1: vecN<T>, e2: vecN<T>) -> T
e1, e2: The two input vectors of the same type and dimension.
T: The component type of the vectors. Can be f32, f16, i32, or u32.
The return value is a single scalar of type T.
Description
Calculates the dot product of e1 and e2. This is done by multiplying the corresponding components of the two vectors and then summing the results. For example, for two vec3 vectors a and b, the dot product is (a.x * b.x) + (a.y * b.y) + (a.z * b.z).
The geometric meaning of the dot product is what makes it one of the most powerful tools in shader programming. For two unit-length (normalized) floating-point vectors, the result is the cosine of the angle between them. This tells you how much the two vectors are aligned:
1.0: The vectors point in the exact same direction.
0.0: The vectors are perfectly perpendicular (90 degrees apart).
-1.0: The vectors point in exact opposite directions.
It can be thought of as a projection: dot(A, B) measures how much of vector A points along the direction of vector B.
W3C Specification: dot
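Both interpretations, alignment and projection, can be sketched in a few lines. The function names below are hypothetical; the first is the classic Lambertian diffuse term, the second projects one vector onto another.

```wgsl
// Hypothetical helper: Lambertian diffuse term.
// Both inputs are normalized so the dot product is the cosine of the angle.
fn lambert(normal: vec3<f32>, to_light: vec3<f32>) -> f32 {
    let n = normalize(normal);
    let l = normalize(to_light);
    // Surfaces facing away from the light give a negative cosine; clamp to 0.
    return max(dot(n, l), 0.0);
}

// Hypothetical helper: projection of `a` onto the direction of `b`.
// Assumes `b` is not the zero vector.
fn project_onto(a: vec3<f32>, b: vec3<f32>) -> vec3<f32> {
    // dot(a, b) / dot(b, b) is how many "b lengths" of `a` lie along `b`.
    return b * (dot(a, b) / dot(b, b));
}

// Example: projecting (3, 4, 0) onto (1, 0, 0) yields (3, 0, 0).
```

Dividing by dot(b, b) rather than normalizing `b` first avoids a square root and gives the same projected vector.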
faceForward
Signature
faceForward(e1: T, e2: T, e3: T) -> T
e1: The vector to orient.
e2: The incident vector.
e3: The reference vector.
T: Can be vecN<f32> or vecN<f16>, where N is 2, 3, or 4.
Description
Flips a vector, if necessary, so that it faces against an incident direction. Specifically, it returns e1 if dot(e2, e3) is negative, and -e1 otherwise.
This function's purpose is to ensure a normal vector is always pointing "towards" a direction of interest (like a camera), which is essential for effects like two-sided lighting. The parameters are typically used as follows:
e1: The normal vector you want to orient, N.
e2: The incident vector, I, which is a direction pointing towards the surface (e.g., the negated view direction).
e3: A reference normal, N_ref, used to determine the surface's orientation. Usually, this is the same vector as N.
The logic is: if the reference normal (N_ref) faces against the incident direction (I), so that dot(I, N_ref) is negative, return the normal (N) as is. Otherwise, return the flipped normal (-N).
W3C Specification: faceForward
Use Case
The canonical use case is creating two-sided materials where both the front and back faces of a polygon are lit correctly.
// The geometric normal from the mesh.
let geometric_normal = normalize(in.normal);
// The direction from the surface point TO the camera.
let view_dir = normalize(camera.world_position - in.world_position);
// The incident vector must point TOWARDS the surface, so we negate the view direction.
let incident_dir = -view_dir;
// `faceForward` ensures the normal used for lighting always points toward the camera,
// even if we are viewing the back-face of a polygon.
// We use `geometric_normal` for both the first and third arguments in this common pattern.
let final_normal = faceForward(geometric_normal, incident_dir, geometric_normal);
// Now, `final_normal` can be used for lighting, and it will be correct
// regardless of which side of the polygon is visible.
let light_dir = ...;
let brightness = max(0.0, dot(final_normal, light_dir));
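To make the rule concrete, the built-in can be reproduced with a dot product and select. This is only an illustrative equivalent; `face_forward_manual` is a hypothetical name, and the built-in should be preferred in real shaders.

```wgsl
// Hypothetical helper: hand-written equivalent of faceForward(n, i, n_ref).
fn face_forward_manual(n: vec3<f32>, i: vec3<f32>, n_ref: vec3<f32>) -> vec3<f32> {
    // select(f, t, cond) returns t when cond is true and f otherwise,
    // so this yields n when dot(i, n_ref) is negative and -n otherwise.
    return select(-n, n, dot(i, n_ref) < 0.0);
}
```

Note that a dot product of exactly zero (an edge-on surface) also takes the "otherwise" path and returns the flipped normal.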
length
Signature
length(e: T) -> S
e: The input value or vector.
T: The type of the input. Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
S: The scalar component type of T (f32 if T is vecN<f32>, etc.).
Description
Calculates the length (also known as the magnitude or norm) of the input e. The behavior depends on whether the input is a scalar or a vector:
For a scalar e: The result is the absolute value of e.
For a vector e: The result is the Euclidean length, calculated as sqrt(dot(e, e)). For a vec3, this is sqrt(e.x*e.x + e.y*e.y + e.z*e.z).
Precision Note: The specification allows the scalar case to be implemented as sqrt(e * e). While mathematically equivalent to abs(e), this implementation detail can matter in extreme cases. If e is a very large number, the intermediate e * e calculation could overflow to infinity before the square root is taken, resulting in an incorrect value. For standard graphics work, this is rarely a concern, but it's a known limitation of this possible implementation.
The function always returns a single, non-negative scalar floating-point value.
W3C Specification: length
Use Case
length is fundamental for any operation that depends on the magnitude of a vector, such as measuring the distance of a point from the origin or calculating the speed from a velocity vector.
// Use Case 1: Distance from Origin
let point_in_space = vec3<f32>(3.0, 4.0, 0.0);
// `length` calculates the straight-line distance from (0,0,0) to the point.
let dist_from_origin = length(point_in_space); // Result is 5.0
// Use Case 2: Calculating Speed
let velocity_vector = vec3<f32>(-10.0, 20.0, 0.0);
// The length of the velocity vector is the object's speed.
let speed = length(velocity_vector); // Result is approx 22.36
// The built-in `distance(a, b)` function is a convenient shorthand
// for `length(a - b)`.
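When a length is only being compared against a threshold, the square root can be skipped by comparing squared values instead. A minimal sketch, assuming `range` is non-negative; `is_within_range` is a hypothetical helper name.

```wgsl
// Hypothetical helper: true when `a` and `b` are closer than `range`.
// Assumes range >= 0.0, so squaring both sides preserves the comparison.
fn is_within_range(a: vec3<f32>, b: vec3<f32>, range: f32) -> bool {
    let delta = a - b;
    // dot(delta, delta) is the squared length of delta, so this gives the
    // same answer as length(delta) < range without computing a square root.
    return dot(delta, delta) < range * range;
}

// Example: a = (10, 0, 5), b = (12, 0, 6), range = 3.0
// delta = (-2, 0, -1), squared length = 5.0, and 5.0 < 9.0, so the result is true.
```

Whether this is measurably faster depends on the hardware, but it also sidesteps the precision loss of the square root for very small distances.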
normalize
Signature
normalize(e: vecN<T>) -> vecN<T>
e: The input vector.
T: Can be f32 or f16.
N: The dimension of the vector, can be 2, 3, or 4.
Description
Calculates and returns a unit vector that points in the same direction as the input vector e. A unit vector is a vector with a length of exactly 1.0.
The operation is mathematically equivalent to e / length(e). This function is essential for any calculation that relies on pure direction rather than magnitude.
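If the input vector e is the zero vector (i.e., its length is 0.0), the operation divides by zero and the result is not a meaningful direction. At runtime it is an indeterminate value (often NaN), and in a constant expression it is a shader-creation error. Guard against zero-length input wherever it can occur.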
W3C Specification: normalize
Use Case
Normalization is one of the most frequent operations in shader programming, especially for lighting calculations where directions must be represented by unit vectors.
// Use Case 1: Calculating a Direction Vector
let start_pos = vec3<f32>(1.0, 1.0, 1.0);
let end_pos = vec3<f32>(4.0, 5.0, 1.0);
// The vector from start to end has a magnitude (length).
let vector = end_pos - start_pos; // vec3<f32>(3.0, 4.0, 0.0), length is 5.0
// To get only the pure direction, we normalize it.
let direction = normalize(vector); // vec3<f32>(0.6, 0.8, 0.0), length is 1.0
// Use Case 2: Preparing Vectors for Lighting
// Surface normals passed from the vertex shader get interpolated, which changes
// their length. They MUST be re-normalized in the fragment shader.
let surface_normal = normalize(in.interpolated_normal);
let light_dir = normalize(light.position - in.world_position);
// Now the dot product will correctly give the cosine of the angle.
let brightness = dot(surface_normal, light_dir);
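Because normalize divides by the vector's length, a zero-length input is worth guarding against explicitly. The sketch below is one possible guard; `safe_normalize`, the `fallback` parameter, and the threshold value are all assumptions chosen for illustration.

```wgsl
// Hypothetical helper: normalize with a caller-supplied fallback for
// vectors that are too short to have a meaningful direction.
fn safe_normalize(v: vec3<f32>, fallback: vec3<f32>) -> vec3<f32> {
    let len_sq = dot(v, v);
    // The threshold is an arbitrary small value, not a specified constant.
    if (len_sq < 1e-12) {
        return fallback;
    }
    // inverseSqrt(len_sq) is 1.0 / length(v), so this scales v to unit length.
    return v * inverseSqrt(len_sq);
}
```

A typical call might pass an "up" vector such as `vec3<f32>(0.0, 1.0, 0.0)` as the fallback for degenerate normals.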
reflect
Signature
reflect(e1: T, e2: T) -> T
e1: The incident vector, I.
e2: The surface normal vector, N.
T: Can be vecN<f32> or vecN<f16>, where N is 2, 3, or 4.
Description
Calculates the direction of a perfect, mirror-like reflection. It takes an incident vector e1 (representing the incoming ray) and a surface normal e2, and returns the reflected direction vector.
For this function to work correctly, two conditions are critical:
The surface normal e2 must be a unit vector (normalized).
The incident vector e1 must point towards the surface.
This second point is the most common source of errors. In shader programming, the view direction vector typically points from the surface to the camera. The reflect function expects the opposite. Therefore, you must negate your view direction vector before passing it to this function.
    I     N     R
     \    |    /
      \   |   /
       \  |  /
════════════════════ (Surface)
W3C Specification: reflect
Use Case
The primary use of reflect is to implement environment mapping, where the reflection vector is used to sample a cubemap (skybox) to simulate reflections on a surface.
// The surface normal, which must be normalized.
let surface_normal = normalize(in.normal);
// The direction pointing FROM the surface point TO the camera.
let view_dir = normalize(camera.position - in.world_position);
// The incident vector for `reflect` must point TOWARDS the surface,
// so we negate the view direction.
let incident_dir = -view_dir;
// Calculate the reflection direction.
let reflect_dir = reflect(incident_dir, surface_normal);
// `reflect_dir` can now be used to sample a skybox texture.
let reflection_color = textureSample(skybox_texture, skybox_sampler, reflect_dir);
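The reflection itself follows the standard formula R = I - 2 * dot(N, I) * N. The hand-written version below is only meant to show why the normal must be unit length and why the incident vector must point towards the surface; `reflect_manual` is a hypothetical name.

```wgsl
// Hypothetical helper: hand-written equivalent of reflect(i, n).
// Assumes `n` is normalized and `i` points towards the surface.
fn reflect_manual(i: vec3<f32>, n: vec3<f32>) -> vec3<f32> {
    // dot(n, i) is the (negative) amount of `i` heading into the surface.
    // Subtracting it twice along `n` mirrors that component.
    return i - 2.0 * dot(n, i) * n;
}

// Example: i = (1, -1, 0) travelling down onto a floor with n = (0, 1, 0).
// dot(n, i) = -1, so the result is (1, -1, 0) + 2 * (0, 1, 0) = (1, 1, 0).
```

If `n` is not unit length, the mirrored component is scaled by the square of its length and the result no longer represents a true reflection.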
refract
Signature
refract(e1: vecN<T>, e2: vecN<T>, e3: T) -> vecN<T>
e1: The incident vector, I.
e2: The surface normal vector, N.
e3: The ratio of indices of refraction, eta.
T: Can be f32 or f16.
N (in vecN) can be 2, 3, or 4.
Description
Calculates the direction of a refracted ray of light as it passes through a surface from one medium to another (e.g., from air into water).
For this function to work correctly, several conditions are critical:
The incident vector e1 must point towards the surface. As with reflect, this usually requires negating your view direction vector.
The surface normal e2 must be a unit vector (normalized).
The ratio e3 (eta) must be calculated as (Index of Refraction of the starting medium) / (Index of Refraction of the destination medium).
Air to Glass (IOR ≈ 1.5): eta = 1.0 / 1.5
Glass to Air: eta = 1.5 / 1.0
W3C Specification: refract
Total Internal Reflection (TIR)
Under certain conditions (when light travels from a denser to a less dense medium at a shallow angle), refraction is impossible, and all light reflects. This is known as Total Internal Reflection. The refract function signals this by returning a zero vector (vecN(0.0)). Your shader code should check for this case and typically perform a reflect operation instead.
Use Case
The primary use of refract is to render transparent materials like glass and water, where it provides the distorted view of the scene behind the object.
// The surface normal, which must be normalized.
let surface_normal = normalize(in.normal);
// The direction pointing FROM the surface point TO the camera.
let view_dir = normalize(camera.position - in.world_position);
// The incident vector for `refract` must point TOWARDS the surface.
let incident_dir = -view_dir;
// The ratio of IORs for a ray traveling from air (1.0) into glass (1.5).
let eta = 1.0 / 1.5;
// Calculate the refraction direction.
let refract_dir = refract(incident_dir, surface_normal, eta);
// Check for Total Internal Reflection: `refract` returns a zero vector in that case.
let is_tir = all(refract_dir == vec3<f32>(0.0));
// Fall back to the reflection direction when refraction is impossible.
let reflect_dir = reflect(incident_dir, surface_normal);
let sample_dir = select(refract_dir, reflect_dir, is_tir);
// Sample once, outside any branch: `textureSample` must be called from
// uniform control flow, and this condition varies per fragment.
let environment_color = textureSample(background_texture, background_sampler, sample_dir);
Matrix Functions
In computer graphics, matrices are the fundamental tool for performing transformations in 2D and 3D space, such as translation, rotation, and scaling. While most of the time you will be using matrices to transform vectors, this small family of functions allows you to analyze and manipulate the matrices themselves.
These functions provide insight into the properties of a transformation. They allow you to calculate a matrix's determinant to understand if it flips or collapses space, or to compute its transpose, which is a critical step in correctly transforming normal vectors for lighting calculations. While not used as frequently as vector or mathematical functions, they are essential for solving specific, advanced problems in geometric and lighting computations.
determinant
Signature
determinant(e: matCxC<T>) -> T
e: The input square matrix.
C: The dimension of the matrix, can be 2, 3, or 4.
T: Can be f32 or f16.
The return value is a single scalar of type T.
Description
Calculates the determinant of a square matrix e. The determinant is a single scalar value that reveals important properties about the linear transformation the matrix represents.
If the determinant is zero, the matrix is "singular." This means the transformation collapses space into a lower dimension (e.g., a 3D volume becomes a 2D plane), and the matrix cannot be inverted.
The sign of the determinant indicates if the transformation flips orientation. A negative determinant means the object has been mirrored or turned "inside-out" (e.g., by a negative scale on an odd number of axes).
The absolute value of the determinant represents the factor by which the transformation scales volume (for mat3x3, mat4x4) or area (for mat2x2).
W3C Specification: determinant
Use Case
The determinant is useful for checking if a matrix is invertible or for detecting mirrored transformations, which is important for correct lighting and culling.
let model_matrix: mat3x3<f32> = ...; // The upper 3x3 of a model matrix
let det = determinant(model_matrix);
// Check if the model has been mirrored (e.g., scale.x = -1.0).
if (det < 0.0) {
// This object's coordinate system has been flipped.
// This is important information for lighting calculations, as it may
// require flipping the bitangent vector to ensure normal maps
// are applied correctly.
// It also flips the winding order of triangles, which affects culling.
}
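// The determinant can also guard a matrix inversion. WGSL has no built-in
// `inverse()` function, so the inverse is either computed on the CPU or by a
// hand-written helper; in both cases a zero determinant means no inverse exists.
// Compare against a small tolerance rather than exactly 0.0, because rounding
// rarely produces an exact zero (1e-6 is only an illustrative tolerance).
if (abs(det) > 1e-6) {
// The matrix is invertible.
}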
transpose
Signature
transpose(e: matRxC<T>) -> matCxR<T>
e: The input matrix.
R, C: The matrix dimensions; each can be 2, 3, or 4. WGSL matrix type names list the column count first, so matRxC has R columns and C rows.
T: Can be f32 or f16.
The return type is a matrix with the row and column counts swapped.
Description
Calculates the transpose of the input matrix e. This operation flips the matrix over its main diagonal, effectively turning the original matrix's rows into the new matrix's columns, and vice versa.
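For example, a matrix with 2 rows and 3 columns becomes a matrix with 3 rows and 2 columns. Because WGSL type names list the column count first, this is a mat3x2 becoming a mat2x3:
[ a, b, c ]           [ a, d ]
[ d, e, f ]  becomes  [ b, e ]
                      [ c, f ]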
W3C Specification: transpose
Use Case
The most important use case for transpose in shader programming is in the calculation of the normal matrix.
When a mesh has a non-uniform scale (e.g., scaled by (2.0, 1.0, 1.0)), simply transforming its normal vectors by the model matrix will incorrectly skew them. The correct way to transform normals is to use the transpose of the inverse of the model matrix.
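// WGSL has no built-in `inverse()` function. This example assumes the inverse of
// the model matrix is computed on the CPU and passed in (e.g., as a uniform).
let inverse_model_matrix: mat4x4<f32> = ...;
// For normals, we only care about the rotation and scale part (the upper 3x3).
// Matrix columns are accessed by index. For an affine matrix, the upper 3x3 of
// the inverse is the inverse of the upper 3x3.
let inverse_upper3x3 = mat3x3<f32>(
inverse_model_matrix[0].xyz,
inverse_model_matrix[1].xyz,
inverse_model_matrix[2].xyz
);
// The normal matrix is the transpose of the inverse of the model's upper 3x3.
let normal_matrix = transpose(inverse_upper3x3);
let mesh_normal: vec3<f32> = ...;
// Now, transforming the normal with this matrix yields the correct result.
let world_normal = normalize(normal_matrix * mesh_normal);
// Note: For a pure rotation (an orthonormal matrix), the transpose is equal to the
// inverse, so the normal matrix is simply the upper 3x3 of the model matrix. With
// rotation plus uniform scale the two differ only by a scalar factor, which
// `normalize()` removes, so the inversion can be skipped in that case as well.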
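WGSL itself does not provide a built-in inverse() function, so when the normal matrix has to be derived inside the shader, one common approach is to build it from cross products of the matrix columns. The sketch below relies on the identity that the cofactor matrix of a 3x3 matrix equals its determinant times its inverse transpose. The helper name normal_matrix_from is hypothetical, and the function assumes its input is invertible (a zero determinant yields a zero matrix).

```wgsl
// Hypothetical helper: returns a matrix that transforms normals in the same
// direction as the inverse transpose of `m` would. Assumes `m` is invertible.
fn normal_matrix_from(m: mat3x3<f32>) -> mat3x3<f32> {
    // Each column of the cofactor matrix is the cross product of the other
    // two columns of `m`, taken in cyclic order.
    let cofactor = mat3x3<f32>(
        cross(m[1], m[2]),
        cross(m[2], m[0]),
        cross(m[0], m[1])
    );
    // Mathematically, cofactor = det(m) * inverse-transpose(m). The result is
    // normalized later, so only the sign of the determinant has to be kept;
    // it matters for mirrored (negative-determinant) transforms.
    return cofactor * sign(determinant(m));
}
```

You would then write normalize(normal_matrix_from(upper3x3) * mesh_normal), where upper3x3 is the upper 3x3 part of the model matrix. Dropping the division by the determinant is a deliberate shortcut that is only valid because the result is normalized afterwards.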
Interpolation & Clamping Functions
This family of functions provides the essential tools for controlling numerical values and creating transitions between them. They are fundamental to almost every visual effect, from basic lighting to complex procedural patterns. The functions in this group can be divided into two main categories:
Clamping Functions: These functions (clamp, min, max, saturate) are used for constraining values. They ensure that the results of your calculations remain within a valid and predictable range. This is crucial for preventing visual artifacts, such as colors becoming negative or overly bright, or for providing safe inputs to other mathematical functions that have a limited domain (like acos).
Interpolation Functions: These are the artistic tools for creating blends, fades, and gradients. They define how a value transitions from one state to another. This includes the linear blend of mix, the hard, binary switch of step, and the versatile, eased-in "S-curve" of smoothstep, which is a cornerstone for creating polished, natural-looking effects.
Many of these functions are also "branchless," offering a more performant way to achieve conditional logic than a traditional if statement. Mastering this group is essential for creating high-quality, robust, and visually pleasing shaders.
clamp
Signature
clamp(e: T, low: T, high: T) -> T
e: The input value or vector to be clamped.
low: The lower bound of the range.
high: The upper bound of the range.
T: Can be f32, f16, i32, u32, or a vector of these types (e.g., vec3<f32>).
Description
Restricts a value e to be within the inclusive range [low, high].
If e is less than low, the result is low.
If e is greater than high, the result is high.
Otherwise, the result is e.
The function is applied component-wise when the inputs are vectors. For floating-point types, the specification allows an implementation to compute either min(max(e, low), high) or the median of the three values; the two agree whenever low is less than or equal to high.
low must not be greater than high. If both are compile-time constants, this is reported as an error at shader compilation time. For runtime values the two permitted implementations can disagree, so the result should not be relied on.
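A minimal sketch of the three cases above, using integer constants so the results are exact. The constant names are illustrative. The last line spells out the min(max(e, low), high) formulation for comparison.

```wgsl
const below  = clamp(-3, 0, 10);  // 0: below the range, snaps to `low`
const inside = clamp(7, 0, 10);   // 7: already inside the range, unchanged
const above  = clamp(42, 0, 10);  // 10: above the range, snaps to `high`

// The same result as `above`, written with min and max.
const above_via_min_max = min(max(42, 0), 10); // 10
```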
W3C Specification: clamp
Use Case
clamp is an essential utility for ensuring values remain within a valid or expected range. It is frequently used as a "safety" function.
// Use Case 1: Keeping color values in the valid [0, 1] range.
var bright_color = vec3<f32>(1.2, -0.1, 0.8);
// Writing this color to the screen could cause visual artifacts.
let safe_color = clamp(bright_color, vec3<f32>(0.0), vec3<f32>(1.0));
// `safe_color` is now vec3<f32>(1.0, 0.0, 0.8)
// Use Case 2: Ensuring valid input for other functions.
// The `acos` function requires an input between -1.0 and 1.0.
// A dot product can sometimes have tiny floating-point errors that push it
// slightly outside this range (e.g., 1.000001 or -1.000001).
let dot_product = dot(vec_a, vec_b);
let clamped_dot = clamp(dot_product, -1.0, 1.0);
// Now it's safe to pass this value to `acos`.
let angle = acos(clamped_dot);
max
Signature
max(e1: T, e2: T) -> T
e1, e2: The two input values or vectors to be compared.
T: Can be f32, f16, i32, u32, or a vector of these types (e.g., vec3<f32>).
Description
Compares e1 and e2 and returns the larger of the two.
The function is applied component-wise when the inputs are vectors. This means for max(vec_a, vec_b), the resulting vector's x-component will be max(vec_a.x, vec_b.x), its y-component will be max(vec_a.y, vec_b.y), and so on.
Floating-Point Edge Cases:
If one operand is NaN (Not a Number), the other operand is returned.
If both operands are NaN, a NaN is returned.
If both operands are "denormalized" (very small numbers near zero that have lost precision), the result may be either of the two input values.
W3C Specification: max
Use Case
max is a fundamental building block for many shader effects. Its most common use is to establish a "floor" or minimum value for a calculation, preventing it from going below a certain threshold.
// Use Case 1: Diffuse Lighting "NdotL"
// The dot product of a normal and a light direction can be negative,
// which would incorrectly subtract light.
let dot_product = dot(surface_normal, light_dir);
// `max` is used to clamp the result at zero, ensuring that surfaces
// facing away from the light contribute zero brightness, not negative.
let diffuse_brightness = max(0.0, dot_product);
// Use Case 2: Component-wise comparison
let a = vec3<f32>(1.0, 5.0, 3.0);
let b = vec3<f32>(4.0, 2.0, 6.0);
// `max` will be applied to each pair of components.
// max(1.0, 4.0) -> 4.0
// max(5.0, 2.0) -> 5.0
// max(3.0, 6.0) -> 6.0
let result = max(a, b); // Result is vec3<f32>(4.0, 5.0, 6.0)
min
Signature
min(e1: T, e2: T) -> T
e1, e2: The two input values or vectors to be compared.
T: Can be f32, f16, i32, u32, or a vector of these types (e.g., vec3<f32>).
Description
Compares e1 and e2 and returns the smaller of the two.
The function is applied component-wise when the inputs are vectors. This means for min(vec_a, vec_b), the resulting vector's x-component will be min(vec_a.x, vec_b.x), its y-component will be min(vec_a.y, vec_b.y), and so on.
Floating-Point Edge Cases:
If one operand is NaN (Not a Number), the other operand is returned.
If both operands are NaN, a NaN is returned.
If both operands are "denormalized" (very small numbers near zero that have lost precision), the result may be either of the two input values.
W3C Specification: min
Use Case
min is a fundamental building block, often used to establish a "ceiling" or maximum value for a calculation, preventing it from exceeding a certain threshold. It is also used in creating Signed Distance Functions (SDFs).
// Use Case 1: Limiting an Effect
// An effect strength that should not exceed 1.0.
var effect_strength = calculate_effect(); // Might be > 1.0
// `min` is used to cap the value at 1.0.
let limited_strength = min(1.0, effect_strength);
// Use Case 2: Combining Signed Distance Functions (SDFs) with a hard union
// `dist_a` and `dist_b` are the signed distances to two different shapes.
let dist_a = distance_to_circle(uv);
let dist_b = distance_to_square(uv);
// The minimum of the two distances gives the union of the two shapes.
let union_dist = min(dist_a, dist_b);
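Union is one of three standard ways to combine signed distances; intersection and subtraction follow the same pattern using max. A minimal sketch, with hypothetical helper names, assuming the usual convention that distances are negative inside a shape:

```wgsl
// Inside either shape.
fn sdf_union(dist_a: f32, dist_b: f32) -> f32 {
    return min(dist_a, dist_b);
}

// Inside both shapes at once.
fn sdf_intersection(dist_a: f32, dist_b: f32) -> f32 {
    return max(dist_a, dist_b);
}

// Inside shape A but outside shape B. Negating a signed distance swaps
// the inside and outside of that shape.
fn sdf_subtraction(dist_a: f32, dist_b: f32) -> f32 {
    return max(dist_a, -dist_b);
}
```

All three produce a hard crease where the shapes meet, because min and max switch abruptly from one input to the other.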
mix
Performs a linear interpolation (or "blend") between two values. This is one of the most fundamental operations in computer graphics, often referred to as "lerp".
W3C Specification: mix
Overload: Uniform Blending
This overload blends two values or vectors using a corresponding blend factor or vector of blend factors.
Signature
mix(e1: T, e2: T, e3: T) -> T
e1: The starting value or vector (when e3 is 0).
e2: The ending value or vector (when e3 is 1).
e3: The blend factor or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Calculates the linear blend of e1 and e2 using the blend factor e3. The calculation is e1 * (1.0 - e3) + e2 * e3. The blend factor e3 is typically in the range [0.0, 1.0].
If e3 is 0.0, the result is e1.
If e3 is 1.0, the result is e2.
If e3 is 0.5, the result is a perfect 50/50 mix.
When the inputs are vectors, the operation can be performed component-wise if e3 is also a vector. For example, you can blend the red, green, and blue channels of a color by different amounts simultaneously.
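To make the formula concrete, here is a small worked sketch with illustrative constant names. The values are chosen to be exactly representable in floating point. The last line shows that the blend factor is not clamped: a factor outside [0.0, 1.0] extrapolates past the endpoints.

```wgsl
// 2.0 * (1.0 - 0.25) + 10.0 * 0.25 = 1.5 + 2.5
const quarter_way = mix(2.0, 10.0, 0.25); // 4.0

// 2.0 * (1.0 - 0.5) + 10.0 * 0.5 = 1.0 + 5.0
const half_way = mix(2.0, 10.0, 0.5);     // 6.0

// 2.0 * (1.0 - 1.5) + 10.0 * 1.5 = -1.0 + 15.0
const overshoot = mix(2.0, 10.0, 1.5);    // 14.0
```

If extrapolation is not wanted, pass the factor through saturate or clamp first.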
Overload: Scalar Blending Factor
This overload is a convenient alternative that blends two vectors using a single, uniform scalar blend factor.
Signature
mix(e1: T, e2: T, e3: S) -> T
e1, e2: The start and end vectors to blend between.
e3: The single, scalar blend factor.
T: The vector type, vecN<S>.
S: The scalar component type, f32 or f16.
Description
Performs a component-wise linear blend of the vectors e1 and e2, using the single scalar e3 as the blending factor for every component. This is the most common way to perform a uniform blend between two vectors (like colors or positions).
Use Case
let red = vec3<f32>(1.0, 0.0, 0.0);
let blue = vec3<f32>(0.0, 0.0, 1.0);
// Use Case 1: Simple blend with a scalar factor
let purple = mix(red, blue, 0.5); // Result is vec3<f32>(0.5, 0.0, 0.5)
// Use Case 2: Component-wise blend with a vector factor
// Blend red by 10%, green by 50%, and blue by 90%.
let blend_factors = vec3<f32>(0.1, 0.5, 0.9);
let custom_blend = mix(red, blue, blend_factors);
// result is red*(1-factors) + blue*factors
// (1,0,0)*(0.9,0.5,0.1) + (0,0,1)*(0.1,0.5,0.9)
// (0.9,0,0) + (0,0,0.9) -> (0.9, 0.0, 0.9)
let start_pos = vec3<f32>(0.0, 0.0, 0.0);
let end_pos = vec3<f32>(10.0, 20.0, 0.0);
let animation_progress = 0.25; // 25% of the way through the animation
// Interpolate the position using a single scalar factor.
let current_pos = mix(start_pos, end_pos, animation_progress);
// Result is vec3<f32>(2.5, 5.0, 0.0)
saturate
Signature
saturate(e: T) -> T
e: The input value or vector.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
A convenient shorthand function that is exactly equivalent to clamp(e, 0.0, 1.0). It restricts the input value e to be within the inclusive range [0.0, 1.0].
This is a very common operation in graphics, particularly for colors or blending factors, which often need to be constrained to this specific range. Using saturate can make shader code more concise and readable than writing out the full clamp.
W3C Specification: saturate
Use Case
saturate is the ideal function for ensuring that any value intended to be a standard color component or a percentage (like a lighting term or alpha value) is in the valid [0.0, 1.0] range.
// A lighting calculation that might produce values outside the [0, 1] range
// due to high-intensity lights or unusual material properties.
let dot_product = dot(surface_normal, light_dir);
let brightness = dot_product * light_intensity; // Could be 1.5, -0.2, etc.
// `saturate` cleanly clamps the result to a valid brightness range.
let final_brightness = saturate(brightness);
// The following two lines are identical in function:
let clamped_brightness = clamp(brightness, 0.0, 1.0);
let saturated_brightness = saturate(brightness);
smoothstep
Performs a smooth, Hermite interpolation between 0.0 and 1.0. It is a cornerstone of shader effects for creating natural-looking transitions, fades, and soft edges. The resulting curve has an "ease-in" and "ease-out" profile.
W3C Specification: smoothstep
Overload: Scalar Inputs
This is the most common overload, used to smoothly transition a single value.
Signature
smoothstep(low: S, high: S, x: S) -> S
low: The lower bound of the transition range.
high: The upper bound of the transition range.
x: The input value to be mapped.
S: Can be f32 or f16.
Description
Calculates a smooth transition from 0.0 to 1.0 as the input x moves from low to high.
If x is less than or equal to low, the result is 0.0.
If x is greater than or equal to high, the result is 1.0.
If x is between low and high, the result is a smoothly interpolated value between 0.0 and 1.0, calculated using the formula t * t * (3.0 - 2.0 * t), where t = (x - low) / (high - low) is the normalized position of x within the range.
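Written out by hand, the scalar calculation looks like the sketch below. The helper name smoothstep_manual is hypothetical, and it assumes low is less than high. The two constants sample the built-in on the range [0.0, 1.0] with values that can be checked by hand.

```wgsl
fn smoothstep_manual(low: f32, high: f32, x: f32) -> f32 {
    // Normalized position of x within [low, high], clamped to [0, 1].
    let t = clamp((x - low) / (high - low), 0.0, 1.0);
    // Cubic Hermite curve with zero slope at both ends.
    return t * t * (3.0 - 2.0 * t);
}

// t = 0.25: 0.25 * 0.25 * (3.0 - 0.5) = 0.0625 * 2.5
const at_quarter = smoothstep(0.0, 1.0, 0.25); // 0.15625
// t = 0.5: 0.5 * 0.5 * (3.0 - 1.0) = 0.25 * 2.0
const at_half = smoothstep(0.0, 1.0, 0.5);     // 0.5
```

At the quarter point the curve is still well below the straight line (0.15625 versus 0.25), which is the "ease-in" part of the profile.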
Overload: Vector Inputs
This overload applies the smoothstep function component-wise.
Signature
smoothstep(low: T, high: T, x: T) -> T
low, high, x: The input vectors.
T: Can be vecN<f32> or vecN<f16>.
Unlike GLSL, WGSL does not provide a mixed signature in which low and high are scalars and x is a vector; all three arguments must have the same type. To apply one scalar range to every component of x, convert the bounds to vectors first, for example vec3<f32>(low) and vec3<f32>(high).
Description
Performs the smoothstep operation on each component of the input vector x independently, using the corresponding components of low and high as the range for that channel.
Use Case
The canonical use case is creating a soft, anti-aliased edge for a shape, or a smooth fade.
// Create a circle with a soft edge.
let center = vec2<f32>(0.5);
let dist = distance(uv, center);
let radius = 0.3;
let softness = 0.1;
// The transition will start at the inner edge (radius - softness)
// and end at the outer edge (radius + softness).
let inner_edge = radius - softness;
let outer_edge = radius + softness;
// `smoothstep` creates a smooth gradient in the softness region.
// We subtract from 1.0 to make the inside of the circle white (1.0).
let soft_circle = 1.0 - smoothstep(inner_edge, outer_edge, dist);
Useful for fading color channels at different rates or applying a smooth threshold to a vector.
let my_color = vec3<f32>(0.2, 0.6, 0.9);
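// Use a single range for all color channels. WGSL requires all three arguments
// to have the same type, so the scalar bounds are splatted into vectors.
// Each channel of `my_color` will be smoothly mapped from 0 to 1
// as it crosses the [0.3, 0.8] range.
let faded_color_uniform = smoothstep(vec3<f32>(0.3), vec3<f32>(0.8), my_color);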
// Use a different range for each color channel.
let low_thresholds = vec3<f32>(0.1, 0.5, 0.8);
let high_thresholds = vec3<f32>(0.3, 0.7, 0.9);
let faded_color_per_channel = smoothstep(low_thresholds, high_thresholds, my_color);
// Red channel fades between 0.1 and 0.3.
// Green channel fades between 0.5 and 0.7.
// Blue channel fades between 0.8 and 0.9.
step
Performs a binary threshold operation, returning 0.0 or 1.0 based on a comparison. It is a fundamental function for creating hard-edged shapes and conditional masks without using branches (if statements).
Overload: Scalar and Vector Inputs
This overload compares two values or vectors of the same type.
W3C Specification: step
Signature
step(edge: T, x: T) -> T
edge: The threshold value or vector.
x: The input value or vector to be tested.
T: Can be f32, f16, or a vector of these types (e.g., vec3<f32>).
Description
Compares the input x to the edge.
If x is greater than or equal to edge, the result is 1.0.
If x is less than edge, the result is 0.0.
When the inputs are vectors, the comparison is performed component-wise, and the result is a vector of 0.0s and 1.0s.
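Because the result is exactly 0.0 or 1.0, step outputs can be combined arithmetically to build masks. Below is a minimal sketch of a band (or pulse) mask. The helper name band_mask is hypothetical, and it assumes band_start is less than band_end.

```wgsl
// Returns 1.0 when band_start <= x < band_end, and 0.0 otherwise.
fn band_mask(band_start: f32, band_end: f32, x: f32) -> f32 {
    return step(band_start, x) - step(band_end, x);
}

// The same expression evaluated for a band from 0.25 to 0.75:
const inside_band  = step(0.25, 0.5) - step(0.75, 0.5); // 1.0 - 0.0 = 1.0
const outside_band = step(0.25, 0.9) - step(0.75, 0.9); // 1.0 - 1.0 = 0.0
```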
Scalar Edge
Unlike GLSL, WGSL does not provide a mixed overload that takes a scalar edge and a vector x; both arguments must have the same type. To compare each component of a vector against a single threshold, convert the scalar to a vector first, for example vec3<f32>(edge).
Signature
step(edge: T, x: T) -> T
edge: The threshold, splatted into a vector (e.g., vec3<f32>(0.5)).
x: The input vector to be tested.
T: The vector type, vecN<f32> or vecN<f16>.
Description
Compares each component of the input vector x against the same threshold value. The result is a vector of 0.0s and 1.0s.
Use Case
// Use Case 1: Creating a hard-edged circle
let center = vec2<f32>(0.5);
let radius = 0.4;
let dist = distance(uv, center);
// This creates a mask. `step` returns 0.0 inside the circle and 1.0 outside.
let mask = step(radius, dist);
// We subtract from 1.0 to invert it, making the circle white.
let circle = 1.0 - mask;
// Use Case 2: Component-wise threshold
let color = vec3<f32>(0.2, 0.6, 0.8);
let threshold = vec3<f32>(0.5, 0.5, 0.5);
let result = step(threshold, color);
// `result` is vec3<f32>(0.0, 1.0, 1.0), because only the green and blue
// components were greater than or equal to 0.5.
// Use Case 3: Comparing every component to a single threshold
let color_b = vec3<f32>(0.2, 0.6, 0.8);
let brightness_threshold = 0.5;
// Splat the scalar threshold into a vector, then compare component-wise.
let result_b = step(vec3<f32>(brightness_threshold), color_b);
// `result_b` is vec3<f32>(0.0, 1.0, 1.0).
Logical & Selection Functions
This family of functions provides the core tools for performing logical operations and conditional selection, often serving as high-performance, "branchless" alternatives to if statements.
GPUs achieve their speed by having many threads execute the same instruction in lock-step. An if statement can break this parallelism if some threads in a group take a different path than others (a situation called "thread divergence"). The functions in this category are designed to perform decision-making tasks without introducing this branching, which can lead to more efficient and predictable code.
They serve two main purposes:
Vector Reduction: Functions that take a vector of booleans (often the result of a component-wise comparison) and collapse it into a single true or false result.
Conditional Selection: Functions that choose between two values based on a boolean condition, acting as WGSL's equivalent of a ternary operator.
Mastering these functions is key to writing concise, readable, and highly efficient shader logic.
all
Signature
all(e: vecN<bool>) -> bool
all(e: bool) -> bool
e: The input boolean or vector of booleans.
N: For the vector overload, must be 2, 3, or 4.
Description
Performs a logical AND operation on the input. The behavior depends on the input type:
For a vecN<bool>: Returns true if and only if every component of the vector e is true. This is a "vector reduction," collapsing a boolean vector into a single result.
For a single bool: Acts as an identity function, simply returning the input boolean e without modification. This overload exists mainly for language completeness.
W3C Specification: all
Use Case
The primary use case is for the vector overload, which allows you to make a single decision based on a component-wise comparison of two vectors.
let reference_point = vec3<f32>(1.0, 2.0, 3.0);
let point_a = vec3<f32>(1.5, 2.5, 3.5); // All components are greater
let point_b = vec3<f32>(0.5, 2.5, 3.5); // One component is not greater
// The component-wise comparison results in a boolean vector.
let a_is_greater = point_a > reference_point; // -> vec3<bool>(true, true, true)
let b_is_greater = point_b > reference_point; // -> vec3<bool>(false, true, true)
// `all` reduces the boolean vector to a single bool.
let result_a = all(a_is_greater); // -> true
let result_b = all(b_is_greater); // -> false
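A typical application of this reduction is a containment test against an axis-aligned bounding box: the point is inside only if every coordinate passes both bounds. A minimal sketch, with a hypothetical helper name and parameter names:

```wgsl
// True when `p` lies inside the box, boundaries included.
fn point_in_box(p: vec3<f32>, box_min: vec3<f32>, box_max: vec3<f32>) -> bool {
    // Each comparison yields a vec3<bool>; `all` requires every axis to pass.
    return all(p >= box_min) && all(p <= box_max);
}
```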
any
Signature
any(e: vecN<bool>) -> bool
any(e: bool) -> bool
e: The input boolean or vector of booleans.
N: For the vector overload, must be 2, 3, or 4.
Description
Performs a logical OR operation on the input. The behavior depends on the input type:
For a vecN<bool>: Returns true if one or more components of the vector e are true. It only returns false if all components are false. This is a "vector reduction," collapsing a boolean vector into a single result.
For a single bool: Acts as an identity function, simply returning the input boolean e without modification. This overload exists mainly for language completeness.
W3C Specification: any
Use Case
The primary use case is for the vector overload, which allows you to check if at least one component of a vector satisfies a condition.
let thresholds = vec3<f32>(0.5, 0.8, 0.3);
let values = vec3<f32>(0.6, 0.7, 0.9);
// The component-wise comparison results in a boolean vector.
// 0.6 > 0.5 -> true
// 0.7 > 0.8 -> false
// 0.9 > 0.3 -> true
let is_over_threshold = values > thresholds; // -> vec3<bool>(true, false, true)
// `any` reduces the boolean vector to a single bool.
// Since at least one component was true, the result is true.
let result = any(is_over_threshold); // -> true
if (result) {
// This code will execute because the first and third components passed the check.
}
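A common place for this reduction is an out-of-range test, where a single failing coordinate is enough to reject a value. A minimal sketch with a hypothetical helper name, checking whether a texture coordinate has left the [0, 1] square:

```wgsl
// True when at least one coordinate of `uv` is below 0.0 or above 1.0.
fn is_outside_unit_square(uv: vec2<f32>) -> bool {
    return any(uv < vec2<f32>(0.0)) || any(uv > vec2<f32>(1.0));
}
```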
select
The select function is WGSL's branchless equivalent of a ternary operator (like condition ? value_if_true : value_if_false in other languages). It chooses between two values based on a boolean condition without using an if statement, which can be more performant on the GPU.
W3C Specification: select
Overload: Scalar Condition
This is the most common overload. It selects one of two complete values (scalars or vectors) based on a single boolean condition.
Signature
select(f: T, t: T, cond: bool) -> T
f: The value to be returned if cond is false.
t: The value to be returned if cond is true.
cond: The single boolean condition.
T: Can be any scalar (i32, f32, etc.) or vector (vec3<f32>, etc.) type. f and t must be of the same type.
Description
Returns the value t if the boolean cond is true, and returns the value f otherwise.
Crucially, select is an ordinary function call, not a short-circuiting operator: both the t and f expressions are evaluated before the result is chosen. Compilers typically lower the selection itself to a conditional-move style instruction rather than a branch, which avoids the cost of the "thread divergence" that an if statement can incur. The flip side is that you should avoid passing two expensive, mutually exclusive computations as t and f, because both are always executed.
Overload: Vector Condition
This overload performs a component-wise selection, building a new vector by choosing components from two source vectors based on a boolean vector.
Signature
select(f: vecN<T>, t: vecN<T>, cond: vecN<bool>) -> vecN<T>
f, t: The source vectors.
cond: The vector of boolean conditions.
T: The scalar component type of the vectors.
N: The dimension of the vectors.
Description
Constructs a result vector where each component i is chosen from t[i] if cond[i] is true, or from f[i] if cond[i] is false. All three input vectors must have the same number of components.
Use Case
let brightness = 0.8;
let is_bright = brightness > 0.5; // is_bright is true
let black = vec3<f32>(0.0);
let white = vec3<f32>(1.0);
// Because is_bright is true, the function will select the `t` value (white).
let final_color = select(black, white, is_bright); // Result is vec3<f32>(1.0)
let a = vec3<f32>(1.0, 10.0, 100.0);
let b = vec3<f32>(5.0, 5.0, 5.0);
// The component-wise comparison results in a boolean vector.
// 1.0 > 5.0 -> false
// 10.0 > 5.0 -> true
// 100.0 > 5.0 -> true
let condition_vec = a > b; // Result is vec3<bool>(false, true, true)
// Build a new vector based on the condition vector:
// Result.x is from a.x because condition_vec.x is false.
// Result.y is from b.y because condition_vec.y is true.
// Result.z is from b.z because condition_vec.z is true.
let result = select(a, b, condition_vec); // Result is vec3<f32>(1.0, 5.0, 5.0)
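A practical use of the vector-condition overload is a piecewise function applied per channel. The sketch below uses the standard sRGB decoding formula as an example. The helper name srgb_to_linear is hypothetical, and the function assumes non-negative input, since pow is not defined for a negative base.

```wgsl
// Converts an sRGB-encoded color to linear light, channel by channel.
fn srgb_to_linear(srgb: vec3<f32>) -> vec3<f32> {
    // Both candidate results are computed for every channel...
    let low_part = srgb / 12.92;
    let high_part = pow((srgb + vec3<f32>(0.055)) / 1.055, vec3<f32>(2.4));
    // ...then `select` picks one per channel. Note the argument order:
    // the value for `false` comes first, the value for `true` second.
    return select(high_part, low_part, srgb <= vec3<f32>(0.04045));
}
```

The argument order is the reverse of a C-style ternary, which is an easy mistake to make when porting code.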
Integer & Bitwise Functions
This family of functions operates on the binary representation of integers. Instead of treating them as numerical values for arithmetic, these functions treat i32 and u32 types as a sequence of 32 individual bits (0s and 1s).
They provide the tools to perform low-level bit manipulation, such as:
Counting the number of set (1) or unset (0) bits.
Finding the position of the first or last set bit.
Reversing the order of all bits within an integer.
Directly reading (extracting) or writing (inserting) specific segments of bits.
While less common in high-level visual effects than mathematical or texture functions, these operations are indispensable for performance-critical algorithms, custom data packing and unpacking, creating bitmasks and flags, and implementing hash functions or pseudo-random number generators. They offer precise control over data at its most fundamental level.
countLeadingZeros
Signature
countLeadingZeros(e: T) -> T
e: The input integer or vector of integers.
T: Can be i32, u32, or a vector of these types (e.g., vec3<u32>).
Description
Counts the number of consecutive 0 bits, starting from the most significant bit (the "leftmost" bit in a standard binary representation) of a 32-bit integer. This function is often abbreviated as "clz".
countLeadingZeros(0u) returns 32.
countLeadingZeros(1u) returns 31.
countLeadingZeros(2147483648u) (which is 1 followed by 31 zeros in binary) returns 0.
The function is applied component-wise when the input is a vector.
W3C Specification: countLeadingZeros
Use Case
This is a low-level bit manipulation function often used for performance-critical tasks like calculating an integer logarithm base 2 or finding the position of the most significant bit (MSB).
// Find the integer base-2 logarithm of a number, which is equivalent
// to finding the position of its most significant bit.
let value = 1000u; // Binary: ...01111101000, MSB is at position 9 (0-indexed)
// The number of leading zeros tells us how far the MSB is from the "left".
let leading_zeros = countLeadingZeros(value); // For 1000u, this is 22.
// For a non-zero u32, the position of the MSB is 31 minus the number of leading zeros.
let msb_position = 31u - leading_zeros; // 31 - 22 = 9
// The result, 9, is the integer log2 of 1000 (since 2^9 = 512, which is the
// largest power of 2 less than or equal to 1000).
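The same calculation appears when counting mip levels: a full mip chain for a texture whose largest dimension is n has floor(log2(n)) + 1 levels, which is 32 minus the number of leading zeros. A minimal sketch with a hypothetical helper name, assuming max_dimension is at least 1:

```wgsl
// Number of levels in a full mip chain, assuming max_dimension >= 1.
fn mip_level_count(max_dimension: u32) -> u32 {
    return 32u - countLeadingZeros(max_dimension);
}

// The same expression evaluated for a few sizes:
const levels_for_1    = 32u - countLeadingZeros(1u);    // 1  (just the 1x1 level)
const levels_for_256  = 32u - countLeadingZeros(256u);  // 9  (256, 128, ..., 1)
const levels_for_1000 = 32u - countLeadingZeros(1000u); // 10 (1000, 500, ..., 1)
```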
countOneBits
Signature
countOneBits(e: T) -> T
e: The input integer or vector of integers.
T: Can be i32, u32, or a vector of these types (e.g., vec3<u32>).
Description
Counts the number of bits that are set to 1 in the binary representation of an integer. This operation is also known as "population count" or "popcount".
countOneBits(0u) returns 0.
countOneBits(7u) (binary ...0111) returns 3.
countOneBits(4294967295u) (all 1s for a u32) returns 32.
For negative numbers, the count is based on the two's complement representation. countOneBits(-1i) is 32.
The function is applied component-wise when the input is a vector.
W3C Specification: countOneBits
Use Case
This is a low-level bit manipulation function used in a variety of algorithms, including cryptography, data compression, and implementing certain types of hash functions or pseudo-random number generators.
// A simple example of using popcount to measure the "bit density" of a number.
let val_a = 7u; // ...00000111
let val_b = 8u; // ...00001000
let popcount_a = countOneBits(val_a); // Result is 3
let popcount_b = countOneBits(val_b); // Result is 1
// It can be a building block for a pseudo-random generator (hash function).
// This is a toy example, not a high-quality hash.
fn simple_hash(n: u32) -> u32 {
let a = n ^ (n << 13u);
let b = a ^ (a >> 7u);
return countOneBits(b);
}
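A more everyday use of the population count is testing whether a value is a power of two, since powers of two are exactly the numbers with a single bit set. A minimal sketch with a hypothetical helper name:

```wgsl
// True when exactly one bit is set, i.e. n is 1, 2, 4, 8, and so on.
// Zero has no bits set, so it is correctly rejected.
fn is_power_of_two(n: u32) -> bool {
    return countOneBits(n) == 1u;
}

const bits_in_64 = countOneBits(64u); // 1, so 64 is a power of two
const bits_in_44 = countOneBits(44u); // 3 (binary 101100), so 44 is not
```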
countTrailingZeros
Signature
countTrailingZeros(e: T) -> T
e: The input integer or vector of integers.
T: Can be i32, u32, or a vector of these types (e.g., vec3<u32>).
Description
Counts the number of consecutive 0 bits, starting from the least significant bit (the "rightmost" bit in a standard binary representation) of a 32-bit integer. This function is often abbreviated as "ctz".
countTrailingZeros(0u) returns 32.
countTrailingZeros(8u) (binary ...1000) returns 3.
countTrailingZeros(7u) (binary ...0111) returns 0.
The function is applied component-wise when the input is a vector.
W3C Specification: countTrailingZeros
Use Case
This is a low-level bit manipulation function. Its primary use is to find the position of the least significant set bit (LSB). It can also be used to quickly determine the largest power of 2 that a number is divisible by.
// A number whose binary representation is ...0101100.
let value = 44u;
// The number of trailing zeros is equal to the 0-indexed position
// of the least significant '1' bit.
// For ...0101100, there are two trailing zeros, so the LSB is at position 2.
let lsb_position = countTrailingZeros(value); // Result is 2.
// This is useful for algorithms that need to isolate and process the
// lowest set bit in a bitmask.
// 2 to the power of `lsb_position` gives us a mask for that bit.
let lsb_mask = 1u << lsb_position; // 1u << 2u is 4u (binary ...100)
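Repeating this step gives a loop that visits every set bit of a mask, from lowest to highest, without testing all 32 positions. The sketch below simply adds up the bit positions to keep the example small; the function name is hypothetical, and in practice the loop body would do whatever work each flag represents.

```wgsl
// Adds up the positions of all set bits in `mask`.
fn sum_of_set_bit_positions(mask: u32) -> u32 {
    var remaining = mask;
    var sum = 0u;
    while (remaining != 0u) {
        // Position of the lowest bit that is still set.
        sum += countTrailingZeros(remaining);
        // Clear that bit: subtracting 1 flips it and every bit below it,
        // so the AND keeps only the bits above it.
        remaining &= remaining - 1u;
    }
    return sum;
}
```

For the value 44u (binary 101100), bits 2, 3, and 5 are set, so the function returns 10.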
extractBits
Extracts a sequence of bits from a specific location within an integer. The behavior of the function differs significantly depending on whether the integer is signed or unsigned.
W3C Specification: extractBits
Overload: Signed Integers (i32)
This overload reads bits from a signed integer and performs sign extension on the result.
Signature
extractBits(e: T, offset: u32, count: u32) -> T
e: The source integer or vector of integers.
offset: The 0-indexed starting bit position to begin extracting from.
count: The number of bits to extract.
T: Can be i32 or vecN<i32>.
Description
Extracts a count-bit sequence from e starting at bit offset. The key feature is sign extension: the most significant bit of the extracted sequence is copied to all the higher bits of the 32-bit result. This preserves the numerical value of the extracted bits as if they were a smaller signed integer type.
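If offset or count is too large at runtime, the function clamps them instead of reading out of bounds: offset is limited to 32, and count is limited to the number of bits remaining above offset. If no bits remain, the result is 0. It is a compile-time error if offset + count is greater than 32 and both parameters are compile-time constants.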
Overload: Unsigned Integers (u32)
This overload reads bits from an unsigned integer and performs zero extension on the result.
Signature
extractBits(e: T, offset: u32, count: u32) -> T
e: The source integer or vector of integers.
offset: The 0-indexed starting bit position.
count: The number of bits to extract.
T: Can be u32 or vecN<u32>.
Description
Extracts a count-bit sequence from e starting at bit offset. The key difference from the signed version is zero extension: all higher bits of the 32-bit result that were not part of the extraction are filled with 0s.
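As with the signed overload, an offset or count that is too large at runtime is clamped: offset is limited to 32, and count is limited to the number of bits remaining above offset. It is a compile-time error if offset + count is greater than 32 and both parameters are compile-time constants.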
Use Case
Used for unpacking multiple smaller signed values that have been packed into a single i32. Sign extension is critical for correctly restoring negative values.
// Imagine two 4-bit signed numbers are packed into the first 8 bits of an i32.
// High bits: 1011 (which is -5 in a 4-bit signed integer)
// Low bits: 0101 (which is +5 in a 4-bit signed integer)
// The combined bit pattern is ...000010110101
let packed_data = 181i; // 10110101 in binary
// Extract the high 4 bits (the negative number)
// Offset is 4, count is 4.
// Extracted bits are '1011'. The most significant bit of this sequence is '1'.
// Sign extension will fill the upper 28 bits of the result with '1's.
// Result: ...11111011, which is the correct 32-bit representation of -5.
let high_value = extractBits(packed_data, 4u, 4u); // Result is -5
// Extract the low 4 bits (the positive number)
// Offset is 0, count is 4.
// Extracted bits are '0101'. The most significant bit of this sequence is '0'.
// The upper 28 bits of the result are filled with '0's.
// Result: ...00000101, which is the correct 32-bit representation of 5.
let low_value = extractBits(packed_data, 0u, 4u); // Result is 5
Used for unpacking multiple smaller unsigned values, such as color channels or other non-negative data packed into a single u32.
// Imagine an RGBA color where R and G are 4 bits each, packed into a u32.
// High bits (G): 1011 (which is the unsigned value 11)
// Low bits (R): 0101 (which is the unsigned value 5)
// The combined bit pattern is ...000010110101
let packed_color = 181u;
// Extract the high 4 bits (the Green channel)
// Offset is 4, count is 4.
// Extracted bits are '1011'.
// Zero extension fills the upper bits with '0's.
// Result: ...00001011, which is the correct 32-bit representation of 11.
let green_channel = extractBits(packed_color, 4u, 4u); // Result is 11
// Extract the low 4 bits (the Red channel)
// Offset is 0, count is 4.
// Extracted bits are '0101'.
// Zero extension fills the upper bits with '0's.
// Result: ...00000101, which is the correct 32-bit representation of 5.
let red_channel = extractBits(packed_color, 0u, 4u); // Result is 5
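The clamping and zero extension described above can be modeled with plain shifts. The following is a minimal sketch, not the built-in itself: `extract_bits_manual` is a hypothetical helper name, and it assumes the clamping rule is `min(offset, 32)` followed by `min(count, 32 - offset)`.

```wgsl
// Hypothetical reference helper (not a built-in): models the unsigned
// extractBits overload for a scalar u32 using only shifts.
fn extract_bits_manual(e: u32, offset: u32, count: u32) -> u32 {
    // Clamp so the selected range never leaves the 32-bit word.
    let o = min(offset, 32u);
    let c = min(count, 32u - o);
    if (c == 0u) {
        return 0u; // An empty range yields zero.
    }
    // Shift left to drop the bits above the field, then shift right so the
    // field lands at bit 0. A right shift on u32 fills the top with zeros.
    return (e << (32u - c - o)) >> (32u - c);
}
```

With the packed value from the example, `extract_bits_manual(181u, 4u, 4u)` yields 11 and `extract_bits_manual(181u, 0u, 4u)` yields 5.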
firstLeadingBit
Finds the position of the most significant 1 bit in an integer's binary representation. This is also known as "find first set" from the left.
W3C Specification: firstLeadingBit
Overload: Unsigned Integers (u32)
This is the most common and intuitive version of the function.
Signature
firstLeadingBit(e: T) -> T
e: The source unsigned integer or vector.T: Can beu32orvecN<u32>.
Description
Finds the index of the most significant (leftmost) 1 bit in the binary representation of e.
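* If e is not zero, the result is the 0-indexed position of the highest set bit.
* If e is zero, the result is T(-1). Because T is unsigned here, that is the all-ones value 0xFFFFFFFFu (4294967295), not a negative number.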
Use Case
This function is a highly efficient way to calculate the integer base-2 logarithm of a number, which is equivalent to finding the position of its most significant bit.
// A number whose binary representation is ...00101000.
// The most significant '1' bit is at position 5 (reading from the right, 0-indexed).
let value_a = 40u;
let msb_pos_a = firstLeadingBit(value_a); // Result is 5
// Another example. Binary is ...01000000
let value_b = 64u;
let msb_pos_b = firstLeadingBit(value_b); // Result is 6
// Case where the input is zero.
let value_c = 0u;
let msb_pos_c = firstLeadingBit(value_c); // Result is 0xFFFFFFFFu (all ones), the unsigned bit pattern of -1
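As one application of the integer base-2 logarithm, the sketch below counts the levels in a full mipmap chain. `mip_level_count` is a hypothetical helper, and the guard against zero is an assumption added so the zero case never reaches `firstLeadingBit`.

```wgsl
// Hypothetical helper: number of mip levels in a full chain for a 2D texture.
fn mip_level_count(size: vec2<u32>) -> u32 {
    // Guard against zero so firstLeadingBit never sees 0u.
    let largest = max(max(size.x, size.y), 1u);
    // floor(log2(largest)) + 1
    return firstLeadingBit(largest) + 1u;
}
```

For a 256x64 texture this gives 9 levels (256, 128, 64, 32, 16, 8, 4, 2, 1).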
Overload: Signed Integers (i32)
This overload operates on the two's complement representation of a signed integer, which can lead to different results than the unsigned version, especially for negative numbers.
Signature
firstLeadingBit(e: T) -> T
e: The source signed integer or vector.T: Can bei32orvecN<i32>.
Description
Finds the index of the most significant bit that differs from the sign bit in the two's complement binary representation of e.
* For positive numbers, this is the highest 1 bit, so the behavior is the same as the u32 version. firstLeadingBit(40i) is 5.
* For negative numbers, the result is the position of the leftmost 0 bit, because a negative value in two's complement begins with a run of leading 1s that only repeat the sign.
* If e is -1 (all bits are 1), the result is -1.
* If e is zero, the result is -1.
Use Case
This is a specialized function. It is rarely used directly for visual effects. Its primary use is in low-level numerical algorithms where operating on the two's complement representation of numbers is necessary. For most "find my highest bit" tasks, the u32 version is what you want.
// Positive number, same as u32 version.
let val_pos = 40i;
let msb_pos_pos = firstLeadingBit(val_pos); // Result is 5
// For negative numbers, the two's complement representation is used.
// -1 in i32 is 32 ones (...11111111). There are no '0' bits to be the "leading bit".
// The spec defines this as a special case.
let val_neg_one = -1i;
let msb_pos_neg_one = firstLeadingBit(val_neg_one); // Result is -1
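A negative input other than -1 makes the rule concrete. The sketch below wraps two calls in a hypothetical function, `leading_bit_examples`, purely so the snippet is a valid module-scope declaration.

```wgsl
// Hypothetical wrapper used only to hold the example values.
fn leading_bit_examples() -> vec2<i32> {
    // -8i is ...11111000 in two's complement. The sign bit is 1, so the
    // function looks for the most significant 0 bit, which is bit 2.
    let a = firstLeadingBit(-8i);
    // For negative inputs this matches searching the bitwise inverse:
    // ~(-8i) is 7i (...00000111), whose highest set bit is also bit 2.
    let b = firstLeadingBit(~(-8i));
    return vec2<i32>(a, b); // (2, 2)
}
```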
firstTrailingBit
Signature
firstTrailingBit(e: T) -> T
* e: The input integer or vector of integers.
* T: Can be i32, u32, or a vector of these types (e.g., vec3<u32>).
Description
Finds the position of the least significant 1 bit in an integer's binary representation. This is also known as "find first set" from the right.
* If e is not zero, the result is the 0-indexed position of the rightmost 1 bit.
* If e is zero, the result is T(-1): -1 for i32, and the all-ones value 0xFFFFFFFFu for u32.
Unlike firstLeadingBit, the behavior of this function is consistent for both positive and negative signed integers because the least significant bits are not affected by the two's complement sign representation in the same way.
W3C Specification: firstTrailingBit
Use Case
This function is an efficient way to find the position of the lowest set bit in a bitmask. For any non-zero input it returns the same value as countTrailingZeros; the two differ only for zero, where countTrailingZeros returns 32 and firstTrailingBit returns T(-1). Choosing the name that matches the algorithm's intent can improve code clarity.
// A number whose binary representation is ...0101100.
// The least significant '1' bit is at position 2 (reading from the right, 0-indexed).
let value = 44u;
let lsb_position = firstTrailingBit(value); // Result is 2.
// This result is identical to `countTrailingZeros(44u)`.
// Case where the input is zero.
let value_zero = 0u;
let lsb_pos_zero = firstTrailingBit(value_zero); // Result is 0xFFFFFFFFu (all ones), the unsigned bit pattern of -1
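A common pattern built on this function is visiting every set bit of a mask from lowest to highest. The sketch below is illustrative only; `sum_of_set_bit_indices` is a hypothetical helper that adds up the indices it visits.

```wgsl
// Hypothetical helper: visits each set bit of `mask`, lowest first.
fn sum_of_set_bit_indices(mask: u32) -> u32 {
    var remaining = mask;
    var total = 0u;
    // The loop condition keeps zero away from firstTrailingBit.
    while (remaining != 0u) {
        total += firstTrailingBit(remaining);
        // Clear the lowest set bit so the next iteration finds the next one.
        remaining &= remaining - 1u;
    }
    return total;
}
```

For `44u` (bits 2, 3, and 5 set) the function returns 10.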
insertBits
Signature
insertBits(e: T, newbits: T, offset: u32, count: u32) -> T
* e: The base integer or vector to insert bits into.
* newbits: The integer or vector containing the bits to insert.
* offset: The 0-indexed starting bit position in e where insertion begins.
* count: The number of bits to insert.
* T: Can be i32, u32, or a vector of these types (e.g., vec3<u32>).
Description
Constructs a new integer by replacing a sequence of bits in e with bits from newbits.
Specifically, it takes the count least significant bits from newbits and copies them into the bit range [offset, offset + count - 1] of the base value e. All other bits from e are preserved.
The offset and count parameters are clamped to safe values to prevent out-of-bounds writes. It is a compile-time error if offset + count is greater than 32 and both parameters are compile-time constants.
W3C Specification: insertBits
Use Case
This is a low-level bit manipulation function used for packing multiple smaller values into a single larger integer. It is the inverse operation of extractBits.
// Let's pack a 4-bit Red and a 4-bit Green channel into a single u32.
var packed_color = 0u; // Start with an empty integer
let red_channel = 5u; // Binary ...0101
let green_channel = 11u; // Binary ...1011
// Insert the 4 bits of the red channel at the beginning (offset 0).
// packed_color is now ...00000101
packed_color = insertBits(packed_color, red_channel, 0u, 4u);
// Insert the 4 bits of the green channel starting at bit 4.
// packed_color is now ...000010110101
packed_color = insertBits(packed_color, green_channel, 4u, 4u);
// The final packed_color is 181u, which contains both pieces of data.
// This can now be unpacked using `extractBits`.
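Because all bits outside the target range are preserved, `insertBits` can also overwrite one field of an already packed value. The sketch below assumes the same 4-bit layout as the example above; `set_green` is a hypothetical helper name.

```wgsl
// Hypothetical helper: replaces the 4-bit green field (bits 4-7) of a
// packed value and leaves every other bit untouched.
fn set_green(packed: u32, green: u32) -> u32 {
    // Only the 4 least significant bits of `green` are used.
    return insertBits(packed, green, 4u, 4u);
}
```

Calling `set_green(181u, 3u)` turns `...10110101` into `...00110101`, which is 53, and the red field in bits 0-3 still reads back as 5.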
reverseBits
Signature
reverseBits(e: T) -> T
* e: The input integer or vector of integers.
* T: Can be i32, u32, or a vector of these types (e.g., vec3<u32>).
Description
Reverses the order of the 32 bits in the binary representation of e. The most significant bit (bit 31) becomes the least significant bit (bit 0), and so on.
For example, for a u32, the bit at position k in the result is taken from the bit at position 31 - k in the input.
W3C Specification: reverseBits
Use Case
This is a low-level bit manipulation function used in specific algorithms that require reordering of data at the bit level, such as certain Fast Fourier Transform (FFT) implementations, Cyclic Redundancy Checks (CRCs), or as a component in some hash functions.
// Let's take the number 13u.
// Its 32-bit binary representation is:
// 00000000 00000000 00000000 00001101
let value = 13u;
// `reverseBits` will flip this entire sequence.
let reversed_value = reverseBits(value);
// The result's binary representation will be:
// 10110000 00000000 00000000 00000000
// which is the decimal value 2952790016u.
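For the FFT case, the index permutation usually needs only the lowest few bits reversed. A minimal sketch, assuming a transform size of `2^log2_n` with `log2_n` between 1 and 32; `bit_reversed_index` is a hypothetical helper name.

```wgsl
// Hypothetical helper: reverses only the lowest `log2_n` bits of `i`.
// Assumes 1 <= log2_n <= 32 so the shift amount stays below 32.
fn bit_reversed_index(i: u32, log2_n: u32) -> u32 {
    // Reverse all 32 bits, then shift the interesting bits back down.
    return reverseBits(i) >> (32u - log2_n);
}
```

For an 8-point transform (`log2_n` of 3), index 1 (`001`) maps to 4 (`100`) and index 3 (`011`) maps to 6 (`110`).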
Buffer and Array Functions
This category contains functions that query metadata about buffer resources, specifically those containing arrays whose size is not known at shader compilation time.
In WGSL, storage buffers can be declared with runtime-sized arrays, meaning their final length is determined by the buffer that is bound on the CPU side. This is a powerful feature for handling dynamic data, but it requires a special mechanism within the shader to determine the array's bounds. The functions in this group provide that mechanism, allowing you to safely query the size of a runtime array to prevent out-of-bounds memory access.
arrayLength
Signature
arrayLength(p: ptr<storage, T>) -> u32
* p: A pointer to a runtime-sized array.
* T: The type of the array itself (e.g., array<f32>).
Description
Returns the number of elements in a runtime-sized array that is stored in a storage buffer.
This function has a structural requirement: the runtime-sized array must live in a storage buffer, either as the entire store type of the storage buffer variable or as the last member of a struct that is the store type of that variable. The size is not known at shader compilation time; it is determined by the size of the actual buffer that is bound on the CPU side at runtime.
This function is the primary mechanism for safely working with dynamic-sized data in shaders, as it allows you to query the buffer's bounds before accessing it.
W3C Specification: arrayLength
Use Case
The most common use case is to get the size of an input buffer in a compute shader to use as a loop bound or for safety checks to prevent out-of-bounds memory access.
// This struct defines the layout of our storage buffer.
// The runtime-sized array MUST be the last member.
struct MyBuffer {
    some_config_value: f32,
    data: array<f32>,
}
@group(0) @binding(0) var<storage, read> my_buffer: MyBuffer;
@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    // Get the total number of elements in the 'data' array at runtime.
    // Note the '&' to pass a pointer to the array.
    let num_elements = arrayLength(&my_buffer.data);
    // Use the length to prevent reading past the end of the buffer.
    if (global_id.x >= num_elements) {
        return;
    }
    // Now it's safe to access the element.
    let value = my_buffer.data[global_id.x];
    // ... do work with the value ...
}
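The loop-bound use mentioned above can be sketched as a grid-stride loop, which lets a fixed number of invocations cover an array of any length. The struct, binding, and entry point names here are hypothetical, and the workgroup size of 64 is an arbitrary choice for the example.

```wgsl
// Hypothetical buffer layout for this sketch.
struct Samples {
    scale: f32,
    data: array<f32>,
}
@group(0) @binding(0) var<storage, read_write> samples: Samples;

@compute @workgroup_size(64)
fn scale_all(
    @builtin(global_invocation_id) id: vec3<u32>,
    @builtin(num_workgroups) groups: vec3<u32>,
) {
    let count = arrayLength(&samples.data);
    // Total number of invocations along x: workgroups times workgroup size.
    let stride = groups.x * 64u;
    // Each invocation handles elements id.x, id.x + stride, and so on.
    for (var i = id.x; i < count; i += stride) {
        samples.data[i] = samples.data[i] * samples.scale;
    }
}
```

Since the loop condition compares against `count`, no separate early-return bounds check is needed.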
Type Conversion & Reinterpretation
This category contains functions that perform low-level reinterpretation of a value's underlying binary representation.
Unlike standard type conversions (e.g., f32(my_int)), which change a value from one numerical representation to another while preserving its meaning, these functions change the type label of a block of memory without altering the raw bits.
This is an advanced operation used for specialized algorithms, data packing, or when you need to operate directly on the bit pattern of a number. The core principle for these operations is that the total bit-width of the source and destination types must be identical.
bitcast
A low-level operation that reinterprets the raw binary bits of an input value as if they were of a different type, without changing the bit pattern itself. This is not a traditional type conversion that preserves numerical value (e.g., bitcast<i32>(1.0) will not result in 1).
The fundamental rule of bitcast is that the total bit-width of the input and output types must be identical. The target type T is specified using a unique angle-bracket syntax: bitcast<T>(...).
W3C Specification: bitcast
Overloads: Same-Size Reinterpretation (f32, i32, u32)
This group of overloads reinterprets the bits between the fundamental 32-bit types.
Signature
bitcast<T>(e: S) -> T (Scalar)
bitcast<vecN<T>>(e: vecN<S>) -> vecN<T> (Vector)
* S: The source type, can be f32, i32, or u32.
* T: The target type, a different type from S, can be f32, i32, or u32.
Description
Reinterprets the 32 bits of a scalar or each 32-bit component of a vector. For example, bitcast<u32>(1.0) takes the 32 bits that represent the f32 value 1.0 (which is 0x3F800000 in hex) and interprets them as a u32, resulting in the integer 1065353216.
Use Case
Used for data packing or implementing highly specialized algorithms that rely on the binary representation of numbers.
// Use case: Packing a floating-point value into an integer for storage.
let my_float: f32 = -2.0;
// Reinterpret the 32 bits of the float as a 32-bit unsigned integer.
// The result is not 2, but the integer value of the float's bit pattern.
let my_packed_int: u32 = bitcast<u32>(my_float);
// You could now store this integer in a buffer that only accepts integers.
// To retrieve it, you would bitcast back:
let my_unpacked_float = bitcast<f32>(my_packed_int); // Restores -2.0
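Operating directly on the bit pattern can be illustrated by reading the exponent field of an f32. This is a sketch that assumes the standard IEEE-754 binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits) and a normal, finite input; `float_exponent` is a hypothetical helper name.

```wgsl
// Hypothetical helper: returns the unbiased binary exponent of a normal,
// finite f32 by reading its IEEE-754 exponent field.
fn float_exponent(x: f32) -> i32 {
    let bits = bitcast<u32>(x);
    // Bits 23-30 hold the exponent, stored with a bias of 127.
    let biased = (bits >> 23u) & 0xFFu;
    return i32(biased) - 127;
}
```

For `-2.0` the bit pattern is `0xC0000000`, the exponent field is 128, and the function returns 1.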
Overloads: Packing and Unpacking (f16)
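This group of overloads reinterprets bits between types whose components differ in size but whose total bit-width is the same. This is used for packing smaller f16 values into larger 32-bit types and vice versa. Because these overloads use the f16 type, the shader module must begin with the enable f16; directive.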
Signature
bitcast<T>(e: vec2<f16>) -> T (Pack two f16 into one 32-bit value)
bitcast<vec2<f16>>(e: T) -> vec2<f16> (Unpack one 32-bit value into two f16s)
bitcast<vec2<T>>(e: vec4<f16>) -> vec2<T> (Pack four f16 into two 32-bit values)
bitcast<vec4<f16>>(e: vec2<T>) -> vec4<f16> (Unpack two 32-bit values into four f16s)
* T: Can be f32, i32, or u32.
Description
These overloads allow you to treat a block of memory as different types. For example, since an f16 is 16 bits, two of them (vec2<f16>) occupy the same 32 bits of memory as a single f32. bitcast allows you to convert between these representations.
Use Case
Used for memory optimization, allowing you to store data with lower precision (f16) and unpack it to higher precision (f32) for calculations.
// Imagine you have two f16 values representing UV coordinates.
let uvs_f16 = vec2<f16>(0.5, 0.25);
// You can pack them into a single u32 to save space in a storage buffer.
// The 32 bits of the u32 will now contain the 16 bits of the first f16
// and the 16 bits of the second f16.
let packed_uvs = bitcast<u32>(uvs_f16);
// Later, in another shader, you can unpack them to do math.
let unpacked_uvs = bitcast<vec2<f16>>(packed_uvs);
// And convert to f32 for high-precision calculations.
let uvs_f32 = vec2<f32>(unpacked_uvs);
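Wrapped up as a complete module, the round trip might look like the sketch below. It assumes the device was created with the `shader-f16` feature, which the `enable f16;` directive depends on; `pack_uv` and `unpack_uv` are hypothetical helper names.

```wgsl
// Required before any use of the f16 type.
enable f16;

// Hypothetical helper: narrows two f32 values to f16 and reinterprets
// the resulting 32 bits as a u32.
fn pack_uv(uv: vec2<f32>) -> u32 {
    return bitcast<u32>(vec2<f16>(uv));
}

// Hypothetical helper: reverses pack_uv and widens back to f32.
fn unpack_uv(packed: u32) -> vec2<f32> {
    return vec2<f32>(bitcast<vec2<f16>>(packed));
}
```

On devices without f16 support, `pack2x16float` and `unpack2x16float` cover the same need without the extension.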
Overload: Identity Transform
This overload handles the case where the source and target types are the same.
Signature
bitcast<T>(e: T) -> T
* T: A concrete numeric scalar or vector type (e.g., f32, vec3<i32>).
Description
When the source type e and the target type T are identical, bitcast performs an identity transform. It simply returns the input value e without any modification.
While this has no effect on the value, it exists for language completeness and can be useful in generic, programmatic shader generation where the source and target types might sometimes be the same.
Use Case
let my_val: f32 = 123.45;
// This is a valid operation, but it does nothing.
let same_val = bitcast<f32>(my_val); // `same_val` is also 123.45
Packing & Unpacking Functions
This family of functions provides a low-level way to optimize memory usage by converting multiple floating-point values into a single 32-bit unsigned integer (u32), and vice versa. This is a common and critical technique for efficiently storing vertex attributes or data in storage buffers.
The core idea is to reduce the precision of the data to a more compact format before storing it:
* Packing functions take a vector of f32 values and convert each component to a smaller representation (like a 16-bit float or an 8/16-bit normalized integer). The bits of these smaller components are then packed together into a single u32. This is a lossy conversion, as precision is intentionally discarded.
* Unpacking functions perform the reverse operation. They take a u32, split it into its smaller bit-chunks, and convert each chunk back into an f32 value according to the specified format.
These functions support several standard formats, including half-precision floats (float), unsigned normalized integers (unorm for values in the [0.0, 1.0] range like colors and UVs), and signed normalized integers (snorm for values in the [-1.0, 1.0] range like normals and tangents).
pack2x16float
Signature
pack2x16float(e: vec2<f32>) -> u32
e: The input vector containing two 32-bit floats.
Description
Packs two 32-bit floating-point values (f32) into a single 32-bit unsigned integer (u32).
The function performs this by converting each component of the input vec2<f32> to its nearest 16-bit half-precision representation (an f16). This is a lossy conversion, as the f16 type has less precision and a smaller range than f32.
The 16 bits of the first component (e.x) are placed into the lower half (bits 0-15) of the resulting u32, and the 16 bits of the second component (e.y) are placed into the upper half (bits 16-31).
The input values must be within the finite range of an f16 (a magnitude no greater than 65504). If either component falls outside that range, or is NaN, the result is an error when the value is known at compile time, or an indeterminate u32 value at runtime.
W3C Specification: pack2x16float
Use Case
This function is essential for memory optimization. It allows you to store data that does not require full 32-bit precision, such as texture coordinates, more compactly.
// A pair of texture coordinates that don't need high precision.
let uvs_f32 = vec2<f32>(0.5, 0.25);
// Pack the two 32-bit floats into a single 32-bit integer.
let packed_uvs: u32 = pack2x16float(uvs_f32);
// The `packed_uvs` variable now holds both UV coordinates in just 32 bits,
// whereas the original `vec2<f32>` required 64 bits. This is highly useful
// for storing data compactly in storage buffers. The corresponding
// `unpack2x16float` function is used to reverse the operation.
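To judge whether half precision is acceptable for a given value, the loss can be measured by packing and immediately unpacking. This is a minimal sketch; `round_trip_error` is a hypothetical helper name.

```wgsl
// Hypothetical helper: absolute error introduced by storing `v` as two
// half-precision floats. Inputs are assumed to be within the f16 range.
fn round_trip_error(v: vec2<f32>) -> vec2<f32> {
    let restored = unpack2x16float(pack2x16float(v));
    return abs(restored - v);
}
```

Values that an f16 can represent exactly, such as 0.5 and 0.25, come back unchanged, so their error is zero; most other values come back slightly off.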
pack2x16snorm
Signature
pack2x16snorm(e: vec2<f32>) -> u32
* e: The input vector containing two f32 values, typically in the range [-1.0, 1.0].
Description
Packs two 32-bit floating-point values into a single 32-bit unsigned integer, treating the inputs as "signed normalized" (snorm) numbers. This is the inverse operation of unpack2x16snorm.
The packing process works as follows for each component of the input vec2<f32>:
1. The f32 value is first clamped to the [-1.0, 1.0] range.
2. It is then scaled by 32767.0.
3. The result is rounded to the nearest whole number.
4. This number is stored as a 16-bit two's-complement signed integer.
The 16 bits of the first component (e.x) are placed into the lower half (bits 0-15) of the resulting u32, and the 16 bits of the second component (e.y) are placed into the upper half (bits 16-31).
W3C Specification: pack2x16snorm
Use Case
This function is used to compactly store data that naturally fits the [-1.0, 1.0] range, such as normal vectors or other signed directional data, saving significant memory.
// A normal vector's x and y components. The z component is often omitted and
// reconstructed later to save even more space.
let normal_xy = vec2<f32>(0.707, -0.707);
// Pack the two 32-bit floats into a single 32-bit integer.
let packed_normal_xy: u32 = pack2x16snorm(normal_xy);
// The `packed_normal_xy` variable now holds both vector components in just 32 bits.
// This can be stored efficiently in a storage buffer or texture.
// The `unpack2x16snorm` function is used to reverse the operation.
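The packing steps described above can be written out by hand, which helps when the same layout has to be produced elsewhere. This is a sketch rather than the built-in's actual implementation: `pack2x16snorm_manual` is a hypothetical name, and the rounding step is modeled as `floor(0.5 + x)`.

```wgsl
// Hypothetical reference helper mirroring the documented steps.
fn pack2x16snorm_manual(e: vec2<f32>) -> u32 {
    // Step 1: clamp to [-1.0, 1.0].
    let clamped = clamp(e, vec2<f32>(-1.0), vec2<f32>(1.0));
    // Steps 2 and 3: scale by 32767 and round (modeled as floor(0.5 + x)).
    let rounded = floor(0.5 + 32767.0 * clamped);
    // Step 4: keep the low 16 bits of each two's-complement integer.
    let bits = bitcast<vec2<u32>>(vec2<i32>(rounded)) & vec2<u32>(0xFFFFu);
    // x goes into bits 0-15, y into bits 16-31.
    return bits.x | (bits.y << 16u);
}
```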
pack2x16unorm
Signature
pack2x16unorm(e: vec2<f32>) -> u32
* e: The input vector containing two f32 values, typically in the range [0.0, 1.0].
Description
Packs two 32-bit floating-point values into a single 32-bit unsigned integer, treating the inputs as "unsigned normalized" (unorm) numbers. This is the inverse operation of unpack2x16unorm.
The packing process works as follows for each component of the input vec2<f32>:
1. The f32 value is first clamped to the [0.0, 1.0] range.
2. It is then scaled by 65535.0.
3. The result is rounded to the nearest whole number.
4. This number is stored as a 16-bit unsigned integer.
The 16 bits of the first component (e.x) are placed into the lower half (bits 0-15) of the resulting u32, and the 16 bits of the second component (e.y) are placed into the upper half (bits 16-31).
W3C Specification: pack2x16unorm
Use Case
This is the most common packing function for data that is naturally in the [0.0, 1.0] range, such as texture coordinates or HDR color components that have been tone-mapped.
// A pair of texture coordinates.
let uvs = vec2<f32>(0.25, 0.75);
// Pack the two 32-bit floats into a single 32-bit integer.
let packed_uvs: u32 = pack2x16unorm(uvs);
// The `packed_uvs` variable now holds both UV coordinates in just 32 bits,
// saving half the memory compared to a vec2<f32>.
// The `unpack2x16unorm` function is used to reverse the operation.
pack4x8snorm
Signature
pack4x8snorm(e: vec4<f32>) -> u32
* e: The input vector containing four f32 values, typically in the range [-1.0, 1.0].
Description
Packs four 32-bit floating-point values into a single 32-bit unsigned integer, treating the inputs as "signed normalized" (snorm) numbers. This is the inverse operation of unpack4x8snorm.
This function offers greater compression than the pack2x16 variants, but with lower precision: 8 bits per component, which packing maps onto the 255 integer levels from -127 to 127. The packing process works as follows for each component of the input vec4<f32>:
1. The f32 value is first clamped to the [-1.0, 1.0] range.
2. It is then scaled by 127.0.
3. The result is rounded to the nearest whole number.
4. This number is stored as an 8-bit two's-complement signed integer.
The four resulting 8-bit integers are then packed into the u32 result:
* e.x is placed in bits 0-7.
* e.y is placed in bits 8-15.
* e.z is placed in bits 16-23.
* e.w is placed in bits 24-31.
W3C Specification: pack4x8snorm
Use Case
This function is ideal for storing four-component data where each component is in the [-1.0, 1.0] range and high precision is not required. A common example is packing a tangent vector, which has an xyz direction and a w component for handedness.
// A tangent vector for normal mapping.
// The .xyz is the direction, and .w is the handedness (-1.0 or 1.0).
let tangent = vec4<f32>(0.707, 0.0, 0.707, 1.0);
// Pack the four 32-bit floats into a single 32-bit integer.
let packed_tangent: u32 = pack4x8snorm(tangent);
// The `packed_tangent` variable now holds the entire tangent vector in
// just 32 bits, a 4x memory saving compared to the original vec4<f32>.
// This is very efficient for storing vertex attributes.
pack4x8unorm
Signature
pack4x8unorm(e: vec4<f32>) -> u32
* e: The input vector containing four f32 values, typically in the range [0.0, 1.0].
Description
Packs four 32-bit floating-point values into a single 32-bit unsigned integer, treating the inputs as "unsigned normalized" (unorm) numbers. This is the inverse operation of unpack4x8unorm.
This function is one of the most common packing operations, as its format directly corresponds to the standard 8-bit RGBA color representation. The packing process works as follows for each component of the input vec4<f32>:
1. The f32 value is first clamped to the [0.0, 1.0] range.
2. It is then scaled by 255.0.
3. The result is rounded to the nearest whole number.
4. This number is stored as an 8-bit unsigned integer.
The four resulting 8-bit integers are then packed into the u32 result:
* e.x (Red) is placed in bits 0-7.
* e.y (Green) is placed in bits 8-15.
* e.z (Blue) is placed in bits 16-23.
* e.w (Alpha) is placed in bits 24-31.
W3C Specification: pack4x8unorm
Use Case
This function is the standard method for packing a floating-point RGBA color into a single 32-bit integer, which is how colors are often stored in textures and buffers to save memory.
// A standard RGBA color with floating-point components.
let my_color = vec4<f32>(1.0, 0.5, 0.25, 1.0); // An orange color
// Pack the four f32 components into a single u32.
let packed_color: u32 = pack4x8unorm(my_color);
// The `packed_color` now holds the entire color in 32 bits, a 4x memory saving.
// This integer value could be written directly to a texture format like `Rgba8Unorm`.
// For example, the resulting integer would be 4282417407u, which is
// 0xFF4080FF in hexadecimal (byte order AABBGGRR, with red in the lowest byte).
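Written out by hand, the clamp, scale, round, and shift steps look like the sketch below. It is a reference model under the assumption that rounding behaves as `floor(0.5 + x)`; `pack4x8unorm_manual` is a hypothetical name, not a built-in.

```wgsl
// Hypothetical reference helper mirroring the documented steps.
fn pack4x8unorm_manual(e: vec4<f32>) -> u32 {
    // Clamp to [0, 1], scale by 255, and round (modeled as floor(0.5 + x)).
    let clamped = clamp(e, vec4<f32>(0.0), vec4<f32>(1.0));
    let q = vec4<u32>(floor(0.5 + 255.0 * clamped));
    // Red in bits 0-7, green in 8-15, blue in 16-23, alpha in 24-31.
    return q.x | (q.y << 8u) | (q.z << 16u) | (q.w << 24u);
}
```

For the orange color above, the four bytes are 255, 128, 64, and 255, which combine to `0xFF4080FF`.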
unpack2x16float
Signature
unpack2x16float(e: u32) -> vec2<f32>
e: The input 32-bit unsigned integer containing the packed data.
Description
Unpacks a single u32 value into two 32-bit floating-point values (f32). This is the inverse operation of pack2x16float.
The unpacking process works as follows:
1. The u32 input e is split into two 16-bit chunks.
2. Each 16-bit chunk is interpreted as an IEEE-754 binary16 (half-precision float) value.
3. This 16-bit float value is then converted (promoted) to a full 32-bit f32.
The two resulting f32 values are placed into a vec2<f32>:
* Bits 0-15 of e become the first component of the result (.x).
* Bits 16-31 of e become the second component (.y).
W3C Specification: unpack2x16float
Use Case
This function is used to read and decode data that was previously packed with pack2x16float to save memory. It allows you to restore lower-precision data into a high-precision format suitable for calculations.
// A u32 value received from a storage buffer, which contains packed UV coordinates.
let packed_uvs: u32 = ...;
// Unpack the u32 back into two f32s.
let unpacked_uvs: vec2<f32> = unpack2x16float(packed_uvs);
// Now `unpacked_uvs` is a standard vec2<f32> that can be used for
// texture sampling or other calculations. The precision of the unpacked
// values will be limited to that of the original f16 representation.
let color = textureSample(my_texture, my_sampler, unpacked_uvs);
unpack2x16snorm
Signature
unpack2x16snorm(e: u32) -> vec2<f32>
e: The input 32-bit unsigned integer containing the packed data.
Description
Unpacks a single u32 value into two 32-bit floating-point values (f32), interpreting the packed data as "signed normalized" (snorm) numbers. This is the inverse operation of pack2x16snorm.
The process works as follows:
1. The u32 input e is split into two 16-bit chunks.
2. Each 16-bit chunk is interpreted as a two's-complement signed integer, giving a value v in the range [-32768, 32767].
3. This integer v is converted to a floating-point number in the range [-1.0, 1.0] by dividing it by 32767.0 and clamping the result at -1.0.

* Bits 0-15 of e become the first component of the result (.x).
* Bits 16-31 of e become the second component (.y).
"Signed Normalized" (snorm) is a standard format for storing values that have a direction (like components of a normal vector) efficiently in an integer.
W3C Specification: unpack2x16snorm
Use Case
This function is used to read and decode data that was previously packed with pack2x16snorm to save memory. It's ideal for data that naturally fits the [-1.0, 1.0] range.
// A u32 value received from a storage buffer, which contains a packed normal vector's x and y components.
let packed_normal_xy: u32 = ...;
// Unpack the u32 back into two f32s in the [-1.0, 1.0] range.
let unpacked_normal_xy: vec2<f32> = unpack2x16snorm(packed_normal_xy);
// Now the unpacked_normal_xy can be used in lighting calculations.
// For example, reconstructing the full 3D normal:
let normal_xy = unpacked_normal_xy;
let normal_z = sqrt(1.0 - saturate(dot(normal_xy, normal_xy)));
let full_normal = vec3<f32>(normal_xy, normal_z);
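The decoding steps can also be expressed with the signed `extractBits` overload, whose sign extension does the two's-complement interpretation in one call. This is an illustrative sketch; `unpack2x16snorm_manual` is a hypothetical name, not a built-in.

```wgsl
// Hypothetical reference helper mirroring the documented steps.
fn unpack2x16snorm_manual(e: u32) -> vec2<f32> {
    // Reinterpret as i32 so extractBits sign-extends each 16-bit chunk.
    let word = bitcast<i32>(e);
    let ints = vec2<i32>(
        extractBits(word, 0u, 16u),  // bits 0-15  -> .x
        extractBits(word, 16u, 16u), // bits 16-31 -> .y
    );
    // Divide by 32767 and clamp at -1.0, since -32768 would otherwise
    // map slightly below -1.0.
    return max(vec2<f32>(ints) / 32767.0, vec2<f32>(-1.0));
}
```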
unpack2x16unorm
Signature
unpack2x16unorm(e: u32) -> vec2<f32>
e: The input 32-bit unsigned integer containing the packed data.
Description
Unpacks a single u32 value into two 32-bit floating-point values (f32), interpreting the packed data as "unsigned normalized" (unorm) numbers. This is the inverse operation of pack2x16unorm.
The process works as follows:
1. The u32 input e is split into two 16-bit chunks.
2. Each 16-bit chunk is interpreted as an unsigned integer, giving a value v in the range [0, 65535].
3. This integer v is converted to a floating-point number in the range [0.0, 1.0] by dividing it by 65535.0.

* Bits 0-15 of e become the first component of the result (.x).
* Bits 16-31 of e become the second component (.y).
"Unsigned Normalized" (unorm) is a standard format for storing values that have a natural [0.0, 1.0] range, such as texture coordinates or color channels.
W3C Specification: unpack2x16unorm
Use Case
This function is used to read and decode data that was previously packed with pack2x16unorm to save memory. It's the standard way to reconstruct 16-bit unorm data, like UV coordinates, into a format suitable for high-precision calculations.
// A u32 value received from a storage buffer, which contains packed UV coordinates.
let packed_uvs: u32 = ...;
// Unpack the u32 back into two f32s in the [0.0, 1.0] range.
let unpacked_uvs: vec2<f32> = unpack2x16unorm(packed_uvs);
// Now `unpacked_uvs` is a standard vec2<f32> that can be used for
// texture sampling or other calculations.
let color = textureSample(my_texture, my_sampler, unpacked_uvs);
### `unpack4x8snorm`
#### Signature
`unpack4x8snorm(e: u32) -> vec4<f32>`
* `e`: The input 32-bit unsigned integer containing the packed data.
#### Description
Unpacks a single `u32` value into four 32-bit floating-point values (`f32`), interpreting the packed data as "signed normalized" (`snorm`) numbers. This is the inverse operation of [`pack4x8snorm`](#heading-pack4x8snorm).
The process works as follows:
1. The `u32` input e is split into four 8-bit chunks.
2. Each 8-bit chunk is interpreted as a two's-complement signed integer, giving a value `v` in the range `[-128, 127]`.
3. This integer `v` is converted to a floating-point number in the range `[-1.0, 1.0]` by dividing it by `127.0` and clamping the result at `-1.0`.
The four resulting `f32` values are placed into a `vec4<f32>`:
* Bits 0-7 of `e` become the first component (`.x`).
* Bits 8-15 of `e` become the second component (`.y`).
* Bits 16-23 of `e` become the third component (`.z`).
* Bits 24-31 of `e` become the fourth component (`.w`).
*W3C Specification:* [`unpack4x8snorm`](https://www.w3.org/TR/WGSL/#unpack4x8snorm-builtin)
#### Use Case
This function is used to read and decode four-component data, like tangent vectors, that were previously packed with [`pack4x8snorm`](#heading-pack4x8snorm) to save memory.
```rust
// A u32 value received as a vertex attribute, containing a packed tangent vector.
let packed_tangent: u32 = ...;
// Unpack the u32 back into four f32s in the [-1.0, 1.0] range.
let unpacked_tangent: vec4<f32> = unpack4x8snorm(packed_tangent);
// Now `unpacked_tangent` is a standard vec4<f32> that can be used
// to build the TBN matrix for normal mapping.
let tangent_direction = unpacked_tangent.xyz;
let tangent_handedness = unpacked_tangent.w;
```
unpack4x8unorm
Signature
unpack4x8unorm(e: u32) -> vec4<f32>
e: The input 32-bit unsigned integer containing the packed data.
Description
Unpacks a single u32 value into four 32-bit floating-point values (f32), interpreting the packed data as "unsigned normalized" (unorm) numbers. This is the inverse operation of pack4x8unorm.
This is a very common operation, as it directly corresponds to decoding a standard 32-bit RGBA color. The process works as follows:
1. The u32 input e is split into four 8-bit chunks.
2. Each 8-bit chunk is interpreted as an unsigned integer, giving a value v in the range [0, 255].
3. This integer v is converted to a floating-point number in the range [0.0, 1.0] by dividing it by 255.0.
The four resulting f32 values are placed into a vec4<f32>:
* Bits 0-7 of e become the first component (.x, Red).
* Bits 8-15 of e become the second component (.y, Green).
* Bits 16-23 of e become the third component (.z, Blue).
* Bits 24-31 of e become the fourth component (.w, Alpha).
W3C Specification: unpack4x8unorm
Use Case
This is the standard function for converting a color from its compact 32-bit integer representation (as found in Rgba8Unorm textures or buffers) into the vec4<f32> format required for high-precision color calculations.
// A u32 value representing an orange color (R=255, G=128, B=64, A=255).
// In hex, this is 0xFF4080FF (AABBGGRR in little-endian).
// The corresponding u32 literal is 4282417407u.
let packed_color: u32 = 4282417407u;
// Unpack the u32 back into a vec4<f32> color in the [0.0, 1.0] range.
let unpacked_color: vec4<f32> = unpack4x8unorm(packed_color);
// The result is approximately vec4<f32>(1.0, 0.5020, 0.2510, 1.0).
// This color can now be used in lighting calculations, blended, etc.
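The split-and-divide steps above amount to a shift, a mask, and a division per channel. The sketch below does all four channels at once with vector operations; `unpack4x8unorm_manual` is a hypothetical name, not a built-in.

```wgsl
// Hypothetical reference helper mirroring the documented steps.
fn unpack4x8unorm_manual(e: u32) -> vec4<f32> {
    // Move each byte down to bits 0-7, then mask off everything above it.
    let shifts = vec4<u32>(0u, 8u, 16u, 24u);
    let bytes = (vec4<u32>(e) >> shifts) & vec4<u32>(0xFFu);
    // Map [0, 255] onto [0.0, 1.0].
    return vec4<f32>(bytes) / 255.0;
}
```

For `0xFF4080FFu` the bytes are 255, 128, 64, and 255, giving roughly (1.0, 0.502, 0.251, 1.0).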
Texture Functions
This family of functions is the primary interface for all interactions with texture objects in WGSL. These powerful functions can be broadly divided into three main purposes: reading texel data, writing to special storage textures, and querying a texture's properties.
A fundamental concept within this group is the distinction between sampling and loading.
Sampling (e.g., using textureSample and its variants) is a complex operation. It uses a sampler object to apply filtering (like linear interpolation), wrapping (like repeating), and automatic mipmap selection, all based on floating-point coordinates (typically UVs).
Loading (using textureLoad) is a direct, unfiltered read. It uses integer texel coordinates to fetch the exact data from a specific location in the texture's memory, bypassing the sampler entirely.
Additionally, this category includes functions to query a texture's metadata (like its dimensions and layer count), as well as highly specialized functions for advanced techniques like custom shadow filtering and writing output from compute shaders.
textureDimensions
Returns the dimensions (width, height, depth) of a texture in texels. This function has several overloads depending on the texture's type and whether you are querying the base size or the size of a specific mipmap level.
W3C Specification: textureDimensions
Overloads: 1D Textures
These overloads return a single u32 representing the width of a 1D texture.
Signature
textureDimensions(t: texture_1d<ST>) -> u32
textureDimensions(t: texture_storage_1d<F,A>) -> u32
textureDimensions(t: texture_1d<ST>, level: L) -> u32
t: The 1D texture to query, either a texture_1d<ST> or a texture_storage_1d<F,A>.
ST: The stored texel type for a sampled texture (f32, i32, or u32).
F: The texel format for a storage texture (e.g., rgba8unorm).
A: The access mode for a storage texture (read, write, or read_write).
level: An optional mipmap level to query. L can be i32 or u32.
Description
Returns the width, in texels, of the 1D texture. If the optional level parameter is provided, it returns the width of that specific mipmap level; otherwise, it returns the width of the base level (level 0).
Use Case
@group(0) @binding(0) var my_texture: texture_1d<f32>;
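let width_level_0 = textureDimensions(my_texture);
// The level can also be passed explicitly. WebGPU limits 1D textures to a
// single mip level, so 0 is the only valid value here.
let width_explicit = textureDimensions(my_texture, 0u);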
Overloads: 2D Textures
These overloads return a vec2<u32> representing the width and height of a 2D texture, texture array, or cube map face.
Signature
textureDimensions(t: T) -> vec2<u32>
textureDimensions(t: T, level: L) -> vec2<u32>
T: A 2D-based texture type (see list below).
level: An optional mipmap level. L can be i32 or u32. Only the mipmapped types accept it; multisampled, storage, and external textures have a single level.
T can be one of many 2D-based texture types:
texture_2d<ST>, texture_2d_array<ST>
texture_cube<ST>, texture_cube_array<ST>
texture_multisampled_2d<ST>
texture_depth_2d, texture_depth_2d_array
texture_depth_cube, texture_depth_cube_array
texture_depth_multisampled_2d
texture_storage_2d<F,A>, texture_storage_2d_array<F,A>
texture_external
Description
Returns the width and height, in texels, of the 2D texture. If the optional level parameter is provided, it returns the dimensions of that specific mipmap level; otherwise, it returns the dimensions of the base level (level 0). For cube maps, this returns the dimensions of a single face (which are always square).
Use Case
The most common use is to convert pixel coordinates into UV coordinates ([0.0, 1.0]), which is essential for screen-space effects.
@group(0) @binding(0) var screen_texture: texture_2d<f32>;
// `in.position` is the fragment's pixel coordinate (e.g., from 0 to 1920).
let screen_coords = vec2<f32>(in.position.xy);
// Get the full size of the texture.
let tex_dims = vec2<f32>(textureDimensions(screen_texture));
// Divide the pixel coordinate by the total size to get the UV coordinate.
let screen_uv = screen_coords / tex_dims;
Overloads: 3D Textures
These overloads return a vec3<u32> representing the width, height, and depth of a 3D texture.
Signature
textureDimensions(t: texture_3d<ST>) -> vec3<u32>
textureDimensions(t: texture_storage_3d<F,A>) -> vec3<u32>
textureDimensions(t: texture_3d<ST>, level: L) -> vec3<u32>
t: The 3D texture to query, either a texture_3d<ST> or a texture_storage_3d<F,A>.
ST: The stored texel type for a sampled texture (f32, i32, or u32).
F: The texel format for a storage texture (e.g., rgba8unorm).
A: The access mode for a storage texture (read, write, or read_write).
level: An optional mipmap level. L can be i32 or u32.
Description
Returns the width, height, and depth, in texels, of the 3D texture. If the optional level parameter is provided, it returns the dimensions of that specific mipmap level; otherwise, it returns the dimensions of the base level (level 0).
Use Case
@group(0) @binding(0) var my_3d_texture: texture_3d<f32>;
// Get the volume dimensions of the base mip level.
let volume_dims = textureDimensions(my_3d_texture);
Parameter Details & Behavior
level Parameter: When you provide the level parameter, you are querying the size of a specific mipmap. If the provided level is outside the valid range of mipmap levels for the texture, the function may return an indeterminate value.
What is Returned: The function returns the logical dimensions of a single image in the texture. It does not include the number of layers in a texture array or the number of samples in a multisampled texture. For those, you must use textureNumLayers and textureNumSamples respectively.
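To see how the size returned for a given level relates to the base size, it can help to compute it on the CPU. The sketch below assumes the usual mip chain rule, where each level halves the previous one (rounding down) and no dimension ever drops below one texel. `mip_dimensions` is a hypothetical helper name, not part of WGSL or any library.

```rust
/// Sketch: expected width/height of mip `level` for a 2D texture whose
/// base level (level 0) has size `base`. Hypothetical helper for illustration.
fn mip_dimensions(base: [u32; 2], level: u32) -> [u32; 2] {
    // Halving `level` times is a right shift by `level`.
    // `checked_shr` returns None for shifts of 32 or more, which we treat
    // as "shifted down to zero". The final `max(1)` keeps at least 1 texel.
    base.map(|d| d.checked_shr(level).unwrap_or(0).max(1))
}
```

For a 1920x1080 base level, this gives 960x540 at level 1 and 1x1 once the level is large enough.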
textureGather
Samples the four texels that would be used for bilinear filtering and returns their values directly in a vec4. This is a low-level operation primarily used for implementing custom texture filtering, most notably Percentage-Closer Filtering (PCF) for high-quality shadow mapping.
Instead of blending the four texels into a single color like textureSample, textureGather gives you the raw data from a specific channel of those four texels, allowing you to perform your own manual interpolation or comparison.
W3C Specification: textureGather
Overloads: Color/Data Textures (component required)
This family of overloads is used for standard color or data textures (texture_2d, texture_cube, etc.).
Signature
textureGather(component: C, t: TextureType, s: sampler, coords: VecType, ...)
component: A compile-time constant i32 or u32 (0, 1, 2, or 3) that specifies which channel to gather from the four source texels (0=R, 1=G, 2=B, 3=A).
t: A color/data texture type, such as texture_2d<f32>, texture_2d_array<u32>, or texture_cube<i32>.
s: A sampler.
coords: The coordinates for sampling (vec2<f32> for 2D, vec3<f32> for cubes).
...: Optional or required parameters depending on the texture type, such as array_index or offset.
Description
Finds the 2x2 block of texels that surround the given coords. It then extracts the value of the specified component from each of these four texels and returns them in a vec4.
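For example, if component is 1 (the Green channel), the result follows the ordering given in the WGSL specification. It starts at the bottom-left texel and proceeds counter-clockwise (v increases downward in texture space):
The result's .x component will be the Green value of the bottom-left texel (umin, vmax).
The result's .y component will be the Green value of the bottom-right texel (umax, vmax).
The result's .z component will be the Green value of the top-right texel (umax, vmin).
The result's .w component will be the Green value of the top-left texel (umin, vmin).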
Use Case
Used for custom filtering algorithms that need to operate on a single channel of neighboring texels.
@group(0) @binding(0) var my_texture: texture_2d<f32>;
@group(0) @binding(1) var my_sampler: sampler;
// Gather the values from the RED channel (component 0) of the four
// texels surrounding the given UV coordinate.
let red_values: vec4<f32> = textureGather(0, my_texture, my_sampler, in.uv);
// Now we have the raw red values and can perform custom logic,
// for example, finding the average red value in the 2x2 block.
let avg_red = (red_values.x + red_values.y + red_values.z + red_values.w) * 0.25;
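The gather footprint and component order can be modeled on the CPU. The Rust sketch below is an illustration under stated assumptions, not how any driver implements it: the texture is a single-channel, row-major slice with row 0 at the top, addressing is clamp-to-edge, and the result order follows the WGSL specification's table (bottom-left, bottom-right, top-right, top-left). `gather_r` is a hypothetical name.

```rust
/// Sketch of a single-channel 2D gather with clamp-to-edge addressing.
/// `texels` is row-major with row 0 at the top and must hold
/// `width * height` values (both non-zero). Hypothetical helper.
fn gather_r(texels: &[f32], width: usize, height: usize, uv: [f32; 2]) -> [f32; 4] {
    // Convert UVs to texel space; texel centers sit at half-integer positions.
    let x = uv[0] * width as f32 - 0.5;
    let y = uv[1] * height as f32 - 0.5;
    // Clamp an index to the valid range [0, size - 1].
    let clamp = |v: f32, size: usize| -> usize { (v.max(0.0) as usize).min(size - 1) };
    // (i0, j0) is the top-left texel of the 2x2 footprint, (i1, j1) the bottom-right.
    let (i0, i1) = (clamp(x.floor(), width), clamp(x.floor() + 1.0, width));
    let (j0, j1) = (clamp(y.floor(), height), clamp(y.floor() + 1.0, height));
    let at = |i: usize, j: usize| texels[j * width + i];
    // x = bottom-left, y = bottom-right, z = top-right, w = top-left.
    [at(i0, j1), at(i1, j1), at(i1, j0), at(i0, j0)]
}
```

With a 2x2 texture whose rows are `[1, 2]` and `[3, 4]`, gathering at the center UV `(0.5, 0.5)` returns `[3, 4, 2, 1]`.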
Overloads: Depth Textures
This family of overloads is used specifically for depth textures.
Signature
textureGather(t: DepthTextureType, s: sampler, coords: VecType, ...)
t: A depth texture type, such as texture_depth_2d or texture_depth_cube_array.
s: A sampler.
coords: The coordinates for sampling (vec2<f32> for 2D, vec3<f32> for cubes).
...: Optional or required parameters depending on the texture type, such as array_index or offset.
Description
Finds the 2x2 block of texels that surround the given coords in a depth texture. It returns a vec4<f32> where each component contains the raw depth value from one of the four texels.
Unlike the color version, there is no component parameter, as depth textures only have a single channel.
Use Case
This is the cornerstone of custom shadow mapping algorithms like PCF.
@group(0) @binding(0) var shadow_map: texture_depth_2d;
@group(0) @binding(1) var shadow_sampler: sampler;
let fragment_depth: f32 = ...; // The current fragment's depth from the light's perspective
// Gather the four closest depth values from the shadow map.
let shadow_depths: vec4<f32> = textureGather(shadow_map, shadow_sampler, in.shadow_coords);
// Manually compare the current fragment's depth to the four stored depths.
// Both operands of `>` must be vectors. The result is a vec4<bool>, which
// converts to 1.0 (occluded) or 0.0 (lit) per texel.
let comparison = vec4<f32>(vec4<f32>(fragment_depth) > shadow_depths);
// Average the results to get a soft occlusion value
// (0.0 = fully lit, 1.0 = fully shadowed).
let shadow_factor = dot(comparison, vec4<f32>(0.25));
textureGatherCompare
Performs a hardware-accelerated depth comparison against a 2x2 block of texels from a depth texture. This is a highly specialized and optimized function designed almost exclusively for implementing high-quality, soft shadows with Percentage-Closer Filtering (PCF).
It combines the "gather" step of textureGather and a subsequent comparison step into a single, efficient hardware operation.
Signatures
The function is overloaded for various depth texture types.
textureGatherCompare(t: texture_depth_2d, s: sampler_comparison, coords: vec2<f32>, depth_ref: f32, ...) -> vec4<f32>
textureGatherCompare(t: texture_depth_2d_array, s: sampler_comparison, coords: vec2<f32>, array_index: u32, depth_ref: f32, ...) -> vec4<f32>
textureGatherCompare(t: texture_depth_cube, s: sampler_comparison, coords: vec3<f32>, depth_ref: f32, ...) -> vec4<f32>
textureGatherCompare(t: texture_depth_cube_array, s: sampler_comparison, coords: vec3<f32>, array_index: u32, depth_ref: f32, ...) -> vec4<f32>
t: The depth texture to sample.
s: A sampler_comparison. This special sampler type is configured on the CPU side with a comparison function (e.g., LessEqual, Greater).
coords: The coordinates for sampling (vec2<f32> for 2D, vec3<f32> for cubes).
depth_ref: The depth value to compare against the values in the texture.
...: An optional offset can be provided for some texture types.
Description
This function executes a complete, optimized shadow-PCF step in hardware:
It identifies the 2x2 quad of texels in the depth texture that surround the given coords.
For each of those four texels, it performs a comparison between the provided depth_ref and the depth value stored in the texel.
The specific comparison operation (e.g., depth_ref <= texel_depth) is determined by the compare property of the sampler_comparison object, which is set in your Rust code.
It returns a vec4<f32> where each component is either 1.0 (if the comparison passed) or 0.0 (if it failed).
The result is a vector containing four binary pass/fail results, ready to be averaged for a soft shadow effect.
W3C Specification: textureGatherCompare
Use Case
The main purpose of this function is to implement PCF for soft shadows efficiently. It is typically more efficient than a manual textureGather followed by a comparison in shader code, because the comparison happens as part of the texture fetch.
@group(0) @binding(0) var shadow_map: texture_depth_2d;
@group(0) @binding(1) var shadow_sampler: sampler_comparison;
// The current fragment's depth from the light's perspective.
let fragment_depth_ref: f32 = ...;
// The hardware performs the gather and four comparisons in one go.
// The result is already a vector of 0.0s and 1.0s.
let comparison_results: vec4<f32> = textureGatherCompare(
shadow_map,
shadow_sampler,
in.shadow_coords,
fragment_depth_ref
);
// Average the four pass/fail results to get a smooth shadow value.
// If all four tests passed (fragment is not in shadow), the result is 1.0.
// If two passed, the result is 0.5 (a penumbra).
// If none passed, the result is 0.0 (fully shadowed).
let shadow_factor = dot(comparison_results, vec4<f32>(0.25));
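The compare-then-average logic is easy to verify on the CPU. The Rust sketch below assumes the sampler was configured with a LessEqual comparison (`depth_ref <= texel_depth` passes) and takes the four gathered depths as an input array. Both function names are hypothetical helpers for illustration.

```rust
/// Sketch: the per-texel pass/fail step of a gather-compare, assuming a
/// LessEqual comparison function. Hypothetical helper.
fn gather_compare_less_equal(texel_depths: [f32; 4], depth_ref: f32) -> [f32; 4] {
    // 1.0 where the comparison passes (fragment is lit), 0.0 where it fails.
    texel_depths.map(|d| if depth_ref <= d { 1.0 } else { 0.0 })
}

/// Average the four pass/fail results into a single shadow factor.
fn shadow_factor(results: [f32; 4]) -> f32 {
    results.iter().sum::<f32>() * 0.25
}
```

For stored depths `[0.5, 0.6, 0.3, 0.9]` and a reference depth of `0.55`, two of the four comparisons pass, so the shadow factor is `0.5`.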
textureLoad
Fetches the unfiltered data for a single texel from a texture using integer texel coordinates.
This function is fundamentally different from textureSample:
It uses integer coordinates (e.g., (10, 20)) to specify an exact texel, not floating-point UVs ([0.0, 1.0]).
It does not use a sampler. There is no filtering (bilinear, trilinear) or wrapping.
For mipmapped textures, you must explicitly provide the mipmap level to read from.
It is a direct, unfiltered read from a specific location in the texture's memory.
W3C Specification: textureLoad
Overloads: Standard (Mipmapped) Textures
This is the most common family of overloads for reading from standard 1D, 2D, 3D, and array textures.
Signatures
textureLoad(t: texture_1d<ST>, coords: u32, level: u32) -> vec4<ST>
textureLoad(t: texture_2d<ST>, coords: vec2<u32>, level: u32) -> vec4<ST>
textureLoad(t: texture_2d_array<ST>, coords: vec2<u32>, array_index: u32, level: u32) -> vec4<ST>
textureLoad(t: texture_3d<ST>, coords: vec3<u32>, level: u32) -> vec4<ST>
ST: The stored texel type (f32, i32, or u32).
coords/C: The integer texel coordinates. Can be i32 or u32 (or a vector of them).
level/L: The integer mipmap level. Can be i32 or u32.
array_index/A: The integer array layer. Can be i32 or u32.
Description
Reads the texel data from the specified integer coords within a specific array_index (for array textures) and mipmap level. The coords are 0-indexed from the top-left corner of the specified mip level.
Use Case
Ideal for situations where you need precise, unfiltered data, such as reading from a data texture or manually implementing a custom filtering algorithm.
@group(0) @binding(0) var data_texture: texture_2d<f32>;
// Read the exact texel value from coordinate (5, 10) on the second mipmap level.
// There is no blending or interpolation with neighboring texels.
let texel_data = textureLoad(data_texture, vec2<u32>(5u, 10u), 1u);
Overloads: Multisampled Textures
This family of overloads reads from multisampled textures, which store multiple data samples per texel.
Signatures
textureLoad(t: texture_multisampled_2d<ST>, coords: vec2<u32>, sample_index: u32) -> vec4<ST>
textureLoad(t: texture_depth_multisampled_2d, coords: vec2<u32>, sample_index: u32) -> f32
ST: The stored texel type (f32, i32, or u32).
coords/C: The integer texel coordinates. Can be i32 or u32 (or a vector of them).
sample_index/S: The integer sample index. Can be i32 or u32.
Description
Reads the data from a single sample_index within the texel at the specified integer coords. Multisampled textures do not have mipmaps, so there is no level parameter.
Use Case
Used to manually "resolve" a multisampled texture, for example, by averaging all the samples for a given pixel to produce a final anti-aliased color.
@group(0) @binding(0) var msaa_texture: texture_multisampled_2d<f32>;
let texel_coords = vec2<u32>(in.position.xy);
let num_samples = textureNumSamples(msaa_texture);
var final_color = vec4<f32>(0.0);
// Loop through all samples for the current pixel and average them.
for (var i: u32 = 0u; i < num_samples; i = i + 1u) {
final_color = final_color + textureLoad(msaa_texture, texel_coords, i);
}
final_color = final_color / f32(num_samples);
Overloads: Depth Textures
This family of overloads reads a single depth value from a depth texture.
Signatures
textureLoad(t: texture_depth_2d, coords: vec2<u32>, level: u32) -> f32
textureLoad(t: texture_depth_2d_array, coords: vec2<u32>, array_index: u32, level: u32) -> f32
coords/C: The integer texel coordinates. Can be i32 or u32 (or a vector of them).
level/L: The integer mipmap level. Can be i32 or u32.
array_index/A: The integer array layer. Can be i32 or u32.
Description
Reads the single depth value from the specified integer coords and mipmap level. The key difference is that the return type is a single f32, not a vec4.
Use Case
Used to fetch a precise, unfiltered depth value, which is useful in some custom shadow or post-processing techniques.
@group(0) @binding(0) var depth_texture: texture_depth_2d;
// Get the exact depth value stored at texel (100, 200) on mip level 0.
let precise_depth = textureLoad(depth_texture, vec2<u32>(100u, 200u), 0u);
Behavior with Invalid Coordinates
If the provided coords, array_index, level, or sample_index are outside the valid bounds of the texture, the address is considered invalid. In this case, the function's return value is defined by the GPU implementation and may be one of the following:
The data for a different texel that is within the texture's bounds.
A vector (0,0,0,0) or (0,0,0,1) for color/data textures.
0.0 for depth textures.
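A CPU model of a texel load makes the contrast with sampling concrete: there is no filtering, just an index computation and a bounds check. The Rust sketch below is an illustration only. It models a single mip level stored row-major, and for out-of-range coordinates it picks one of the permitted outcomes listed above (a zero vector). A real GPU may return a different permitted value. `texture_load_cpu` is a hypothetical name.

```rust
/// Sketch of an unfiltered texel fetch from one mip level of an RGBA texture.
/// `texels` is row-major and holds `dims[0] * dims[1]` entries.
/// Out-of-bounds reads return zeros here, which is only ONE of the
/// outcomes a GPU is allowed to produce. Hypothetical helper.
fn texture_load_cpu(texels: &[[f32; 4]], dims: [u32; 2], coords: [i32; 2]) -> [f32; 4] {
    let in_bounds = coords[0] >= 0
        && coords[1] >= 0
        && (coords[0] as u32) < dims[0]
        && (coords[1] as u32) < dims[1];
    if !in_bounds {
        return [0.0; 4];
    }
    // Row-major addressing: y selects the row, x the column within it.
    let index = coords[1] as usize * dims[0] as usize + coords[0] as usize;
    texels[index]
}
```

A shader that cannot guarantee in-range coordinates can avoid relying on this implementation-defined behavior by clamping the coordinates against textureDimensions before loading.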
textureNumLayers
Signature
textureNumLayers(t: T) -> u32
t: The input arrayed texture.
T must be an arrayed texture type:
texture_2d_array<ST>
texture_cube_array<ST>
texture_depth_2d_array
texture_depth_cube_array
texture_storage_2d_array<F,A>
Description
Returns the number of layers in an arrayed texture.
The meaning of a "layer" depends on the texture type:
For a texture_2d_array, it returns the number of 2D texture layers in the array.
For a texture_cube_array, it returns the number of cubes in the array (where each cube itself consists of 6 faces/layers).
This function is distinct from textureDimensions, which returns the width and height of a single layer.
W3C Specification: textureNumLayers
Use Case
Used to query the size of a texture array, which is often necessary for looping through all the layers or for clamping an array index to be within a valid range.
@group(0) @binding(0) var my_texture_array: texture_2d_array<f32>;
@group(0) @binding(1) var my_sampler: sampler;
// Get the total number of layers available in the texture array.
let total_layers = textureNumLayers(my_texture_array);
// This can be used as the upper bound for a loop.
for (var i: u32 = 0u; i < total_layers; i = i + 1u) {
// ... sample from layer `i` of the texture array ...
let color = textureSample(my_texture_array, my_sampler, in.uv, i);
}
// Or it can be used to safely clamp a runtime index.
let some_runtime_index: u32 = ...;
let safe_index = min(some_runtime_index, total_layers - 1u);
textureNumLevels
Signature
textureNumLevels(t: T) -> u32
t: The input mipmapped texture.
T must be a texture type that supports mipmaps:
texture_1d<ST>
texture_2d<ST>, texture_2d_array<ST>
texture_3d<ST>
texture_cube<ST>, texture_cube_array<ST>
texture_depth_2d, texture_depth_2d_array
texture_depth_cube, texture_depth_cube_array
Description
Returns the total number of mipmap levels available in the texture. Level 0 is the original, full-resolution texture, and subsequent levels are progressively smaller.
This function does not work with texture types that do not have mipmaps, such as texture_storage_* or texture_multisampled_*.
W3C Specification: textureNumLevels
Use Case
This function is useful for algorithms that need to manually iterate through mipmap levels, such as in some advanced post-processing or image analysis techniques.
@group(0) @binding(0) var my_texture: texture_2d<f32>;
// Get the total number of mip levels.
let mip_count = textureNumLevels(my_texture);
// This can be used as the upper bound for a loop that processes
// each mipmap level of a texture.
var total_red_value = 0.0;
for (var i: u32 = 0u; i < mip_count; i = i + 1u) {
// Load a texel from the center of the current mip level.
let mip_dims = textureDimensions(my_texture, i);
let center_coord = mip_dims / 2u;
total_red_value = total_red_value + textureLoad(my_texture, center_coord, i).r;
}
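On the CPU side, the number of levels in a complete mip chain follows directly from the base size. The Rust sketch below assumes a full chain that ends at 1x1. textureNumLevels itself reports however many levels the bound texture actually has, which can be fewer. `full_mip_chain_len` is a hypothetical helper name.

```rust
/// Sketch: number of levels in a complete mip chain (down to 1x1) for a
/// 2D texture of the given base size. Hypothetical helper.
fn full_mip_chain_len(width: u32, height: u32) -> u32 {
    // The chain length is driven by the largest dimension; treat 0 as 1.
    let largest = width.max(height).max(1);
    // floor(log2(largest)) + 1, computed as the bit length of `largest`.
    32 - largest.leading_zeros()
}
```

A 256x256 texture has 9 levels under this rule (256, 128, ..., 1), and a 1920x1080 texture has 11.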
textureNumSamples
Signature
textureNumSamples(t: T) -> u32
t: The input multisampled texture.
T must be a multisampled texture type:
texture_multisampled_2d<ST>
texture_depth_multisampled_2d
Description
Returns the number of samples stored for each texel in a multisampled texture. The result matches the MSAA (Multi-Sample Anti-Aliasing) sample count configured when the texture was created. WebGPU currently allows only a sample count of 4 for multisampled textures, so that is the value you will normally see.
This function only works with multisampled textures. Attempting to use it on a non-multisampled texture will result in a shader compilation error.
W3C Specification: textureNumSamples
Use Case
This function is essential for manually resolving a multisampled texture. "Resolving" is the process of combining the multiple samples within each texel into a single, final anti-aliased value.
@group(0) @binding(0) var msaa_texture: texture_multisampled_2d<f32>;
// The integer coordinates of the texel to resolve.
let texel_coords = vec2<u32>(in.position.xy);
// Get the number of samples per texel (e.g., 4).
let sample_count = textureNumSamples(msaa_texture);
// A variable to accumulate the color from all samples.
var resolved_color = vec4<f32>(0.0);
// Loop through each sample, load its color, and add it to the total.
for (var i: u32 = 0u; i < sample_count; i = i + 1u) {
resolved_color = resolved_color + textureLoad(msaa_texture, texel_coords, i);
}
// Divide by the number of samples to get the final averaged color.
resolved_color = resolved_color / f32(sample_count);
textureSample
The primary and most common function for reading from a texture. It uses a sampler to apply filtering, wrapping, and automatic mipmap selection based on floating-point coordinates.
This function is only available in fragment shaders and must be called in uniform control flow (i.e., not inside if statements or loops that depend on per-fragment data). By default, a call that the compiler cannot prove to be uniform is rejected with a derivative_uniformity error when the shader is created. If that diagnostic is relaxed, the call returns an indeterminate value.
W3C Specification: textureSample
Overloads: Color/Data Textures
Signature
textureSample(t: texture_1d<f32>, s: sampler, coords: f32) -> vec4<f32>
textureSample(t: texture_2d<f32>, s: sampler, coords: vec2<f32>) -> vec4<f32>
textureSample(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, offset: vec2<i32>) -> vec4<f32>
textureSample(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A) -> vec4<f32>
textureSample(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, offset: vec2<i32>) -> vec4<f32>
textureSample(t: T, s: sampler, coords: vec3<f32>) -> vec4<f32> where T is texture_3d<f32> or texture_cube<f32>
textureSample(t: texture_3d<f32>, s: sampler, coords: vec3<f32>, offset: vec3<i32>) -> vec4<f32>
textureSample(t: texture_cube_array<f32>, s: sampler, coords: vec3<f32>, array_index: A) -> vec4<f32>
A is i32 or u32.
t: The texture to be sampled.
s: A sampler object that defines filtering and wrapping rules.
coords: The floating-point coordinates for sampling (e.g., UVs for a texture_2d, or a direction vector for a texture_cube).
array_index: The 0-indexed layer of the texture array to sample from.
offset: An optional, compile-time constant integer vector that applies a texel offset before sampling.
Description
Samples the texture t using the filtering and wrapping rules defined in the sampler s. The GPU automatically calculates the appropriate mipmap level based on the screen-space derivatives of the texture coords. The return value is a filtered vec4<f32> representing the color or data at that location.
Use Case
This is the standard function for applying a texture to a 3D model.
@group(1) @binding(0) var my_texture: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;
// The `in.uv` coordinate is a floating-point value from 0.0 to 1.0.
// The sampler handles blending between texels if coords fall between them.
let albedo_color = textureSample(my_texture, my_sampler, in.uv);
Overloads: Depth Textures
These overloads sample from a depth texture and return a single f32 depth value.
Signature
textureSample(t: texture_depth_2d, s: sampler, coords: vec2<f32>) -> f32
textureSample(t: texture_depth_2d, s: sampler, coords: vec2<f32>, offset: vec2<i32>) -> f32
textureSample(t: texture_depth_2d_array, s: sampler, coords: vec2<f32>, array_index: A) -> f32
textureSample(t: texture_depth_2d_array, s: sampler, coords: vec2<f32>, array_index: A, offset: vec2<i32>) -> f32
textureSample(t: texture_depth_cube, s: sampler, coords: vec3<f32>) -> f32
textureSample(t: texture_depth_cube_array, s: sampler, coords: vec3<f32>, array_index: A) -> f32
A is i32 or u32.
t: The depth texture to be sampled.
s: A sampler object.
coords: The floating-point coordinates for sampling.
array_index: The 0-indexed layer of the texture array to sample from.
offset: An optional, compile-time constant integer vector that applies a texel offset before sampling.
Description
Samples the depth texture t using the sampler s. The return value is a single filtered f32 representing the depth at that location.
Use Case
Used to read a filtered depth value, for example, to visualize a shadow map or for post-processing effects. For actual shadow comparisons, textureSampleCompare is usually preferred.
@group(1) @binding(0) var shadow_map: texture_depth_2d;
@group(1) @binding(1) var my_sampler: sampler;
// Read the filtered depth value from the shadow map.
let sampled_depth = textureSample(shadow_map, my_sampler, in.shadow_coords);
// Visualize the depth map by converting the depth value to a grayscale color.
let shadow_map_color = vec4<f32>(vec3(sampled_depth), 1.0);
Parameters and Behavior
sampler: An object defined in Rust code that controls how the texture is read. It specifies the filtering mode (e.g., nearest, linear), the wrapping mode (e.g., repeat, clamp-to-edge), and mipmapping behavior.
coords: The floating-point coordinates used for sampling. For 2D textures, this is typically a UV coordinate in the [0.0, 1.0] range. For cube maps, this is a 3D direction vector.
offset: An optional vec2<i32> or vec3<i32> parameter that must be a compile-time constant. It applies an integer texel offset to the coordinates before sampling.
Uniform Control Flow: This function (and its variants like textureSampleBias) must be called in uniform control flow. Inside a loop with a variable number of iterations or an if statement with a per-fragment condition, the call is rejected by default with a derivative_uniformity error. If that diagnostic is relaxed, the result is an indeterminate value. For sampling in such situations, use textureSampleLevel or textureSampleGrad, which take an explicit mip level or explicit derivatives.
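To make the filtering step less abstract, here is a CPU sketch in Rust of what a linear-filtering sampler does for a single mip level. It assumes a single-channel, row-major texture, clamp-to-edge wrapping, and texel centers at half-integer positions. Mipmap selection is left out entirely. `sample_bilinear_clamped` is a hypothetical name, and real hardware may differ in precision and rounding.

```rust
/// Sketch of bilinear filtering with clamp-to-edge addressing on one mip
/// level of a single-channel texture. `texels` is row-major and holds
/// `width * height` values (both non-zero). Hypothetical helper.
fn sample_bilinear_clamped(texels: &[f32], width: usize, height: usize, uv: [f32; 2]) -> f32 {
    // Convert UVs to texel space; texel centers sit at half-integer positions.
    let x = uv[0] * width as f32 - 0.5;
    let y = uv[1] * height as f32 - 0.5;
    // Fractional position inside the 2x2 footprint: the blend weights.
    let (fx, fy) = (x - x.floor(), y - y.floor());
    // Clamp an index to the valid range [0, size - 1] (clamp-to-edge).
    let clamp = |v: f32, size: usize| -> usize { (v.max(0.0) as usize).min(size - 1) };
    let (i0, i1) = (clamp(x.floor(), width), clamp(x.floor() + 1.0, width));
    let (j0, j1) = (clamp(y.floor(), height), clamp(y.floor() + 1.0, height));
    let at = |i: usize, j: usize| texels[j * width + i];
    let lerp = |a: f32, b: f32, t: f32| a + (b - a) * t;
    // Blend horizontally on both rows, then vertically between the rows.
    let top = lerp(at(i0, j0), at(i1, j0), fx);
    let bottom = lerp(at(i0, j1), at(i1, j1), fx);
    lerp(top, bottom, fy)
}
```

For a 2x1 texture holding `[0.0, 1.0]`, sampling at `u = 0.25` or `u = 0.75` lands exactly on a texel center, while `u = 0.5` falls halfway between the two and returns `0.5`.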
textureSampleBaseClampToEdge
Signature
textureSampleBaseClampToEdge(t: T, s: sampler, coords: vec2<f32>) -> vec4<f32>
t: The texture to sample.
T: Must be texture_2d<f32> or texture_external.
s: A sampler.
coords: The vec2<f32> coordinates for sampling.
Description
Samples a 2D texture, but with two specific and non-overridable behaviors:
It always samples from the base mipmap level (level 0).
It always uses a clamp-to-edge wrapping mode for the coordinates, regardless of the wrapping mode configured in the sampler.
The primary reason this function exists is to be the only sampling function that can operate on the special texture_external type. An external texture is a handle to an image produced by an external system, such as a video decoder, and has special restrictions. It cannot be sampled with the standard textureSample function.
While it can also be used on a regular texture_2d<f32>, its behavior in that case is equivalent to using textureSampleLevel(..., 0.0) with a sampler configured for clamp-to-edge.
W3C Specification: textureSampleBaseClampToEdge
Use Case
The exclusive use case for this function is sampling from a texture_external, which is typically used for displaying video frames in a WebGPU context.
// This binding would be provided by the browser when working with a video element.
@group(0) @binding(0) var video_frame: texture_external;
@group(0) @binding(1) var video_sampler: sampler;
// Use this special function to sample the video frame texture.
// Using `textureSample` here would result in a shader compilation error.
let video_color = textureSampleBaseClampToEdge(video_frame, video_sampler, in.uv);
textureSampleBias
Signatures
textureSampleBias(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, bias: f32) -> vec4<f32>
textureSampleBias(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, bias: f32, offset: vec2<i32>) -> vec4<f32>
textureSampleBias(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, bias: f32) -> vec4<f32>
textureSampleBias(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, bias: f32, offset: vec2<i32>) -> vec4<f32>
textureSampleBias(t: T, s: sampler, coords: vec3<f32>, bias: f32) -> vec4<f32> where T is texture_3d<f32> or texture_cube<f32>
textureSampleBias(t: texture_3d<f32>, s: sampler, coords: vec3<f32>, bias: f32, offset: vec3<i32>) -> vec4<f32>
textureSampleBias(t: texture_cube_array<f32>, s: sampler, coords: vec3<f32>, array_index: A, bias: f32) -> vec4<f32>
A is i32 or u32.
t: The texture to be sampled.
s: A sampler object that defines filtering and wrapping rules.
coords: The floating-point coordinates for sampling (e.g., UVs for a texture_2d, or a direction vector for a texture_cube).
bias: A floating-point value that provides a manual adjustment to the calculated mipmap level.
array_index: The 0-indexed layer of the texture array to sample from.
offset: An optional, compile-time constant integer vector (vec2<i32> or vec3<i32>) that applies a texel offset before sampling.
Description
Samples a texture with a manual adjustment to the mipmap level selection. This function is identical to textureSample but adds a bias parameter that allows you to make the texture appear sharper or blurrier than it normally would. The function first calculates the ideal mipmap level based on the texture coordinate derivatives, then adds the floating-point bias value to it before sampling.
A positive bias (> 0.0) forces the GPU to sample from a smaller, more distant mipmap level, resulting in a blurrier appearance.
A negative bias (< 0.0) forces the GPU to sample from a larger, closer mipmap level, resulting in a sharper appearance (which can also lead to aliasing).
This function is only available in fragment shaders and must be called in uniform control flow (i.e., not inside if statements or loops that depend on per-fragment data). By default, a call that the compiler cannot prove to be uniform is rejected with a derivative_uniformity error when the shader is created. If that diagnostic is relaxed, the call returns an indeterminate value.
W3C Specification: textureSampleBias
Use Case
textureSampleBias is used for effects that require programmatic control over texture sharpness. A classic example is faking a depth-of-field effect or creating a "blur in" transition.
@group(1) @binding(0) var scene_texture: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;
let distance_from_camera: f32 = ...;
let focus_point = 10.0;
// Calculate a blur amount based on distance from the camera's focus point.
// Objects far from the focus point will get a larger positive bias.
let blur_bias = abs(distance_from_camera - focus_point) * 0.5;
// Sample the scene texture with the calculated bias.
// Distant or very close objects will appear blurry, while objects near
// the focus point will be sharp (bias ≈ 0.0).
let final_color = textureSampleBias(scene_texture, my_sampler, in.uv, blur_bias);
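The effect of the bias on level selection can be approximated with a small CPU model. The Rust sketch below is a deliberate simplification: it assumes the unbiased level is the base-2 logarithm of how many texels one screen pixel covers along an axis, adds the bias, and clamps to the available levels. Real GPUs derive the footprint from screen-space derivatives and may limit the bias range. `biased_mip_level` and its parameters are hypothetical names.

```rust
/// Sketch: simplified mip level selection with a bias.
/// `texels_per_pixel` is how many texels one pixel spans along an axis
/// (1.0 means a 1:1 mapping). Hypothetical helper, not a GPU-exact model.
fn biased_mip_level(texels_per_pixel: f32, bias: f32, num_levels: u32) -> f32 {
    let max_level = num_levels.saturating_sub(1) as f32;
    // Unbiased level: log2 of the footprint. Guard against log2(0).
    let base_level = texels_per_pixel.max(f32::MIN_POSITIVE).log2();
    // The bias shifts the selection; the result stays within [0, max_level].
    (base_level + bias).clamp(0.0, max_level)
}
```

With a footprint of 4 texels per pixel and 10 levels, a bias of `0.0` selects level 2, a bias of `1.0` selects the blurrier level 3, and a strongly negative bias bottoms out at level 0.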
textureSampleCompare
Signature
textureSampleCompare(t: texture_depth_2d, s: sampler_comparison, coords: vec2<f32>, depth_ref: f32) -> f32
textureSampleCompare(t: texture_depth_2d, s: sampler_comparison, coords: vec2<f32>, depth_ref: f32, offset: vec2<i32>) -> f32
textureSampleCompare(t: texture_depth_2d_array, s: sampler_comparison, coords: vec2<f32>, array_index: A, depth_ref: f32) -> f32
textureSampleCompare(t: texture_depth_2d_array, s: sampler_comparison, coords: vec2<f32>, array_index: A, depth_ref: f32, offset: vec2<i32>) -> f32
textureSampleCompare(t: texture_depth_cube, s: sampler_comparison, coords: vec3<f32>, depth_ref: f32) -> f32
textureSampleCompare(t: texture_depth_cube_array, s: sampler_comparison, coords: vec3<f32>, array_index: A, depth_ref: f32) -> f32
A is i32 or u32.
t: A depth texture to sample from.
s: A sampler_comparison object. This special sampler type is configured on the CPU side with a specific comparison function (e.g., LessEqual, Greater).
coords: The floating-point coordinates for sampling.
depth_ref: The depth value to compare against the values in the texture.
array_index: The 0-indexed layer of the texture array to sample from.
offset: An optional, compile-time constant integer vector (vec2<i32>) that applies a texel offset before sampling.
Description
Performs a hardware-accelerated depth comparison against a sampled value from a depth texture. This is a highly specialized function designed almost exclusively for efficient shadow mapping.
The function samples a depth texture and compares the sampled depth value(s) against the provided depth_ref. The comparison operation (e.g., depth_ref <= texel_depth) is determined by the compare property of the sampler_comparison object.
The return value is a float in the range [0.0, 1.0]. If the sampler's filter mode is linear, the hardware will sample the four nearest texels, perform the comparison for each, and return a bilinearly filtered average of the four 0.0 or 1.0 results. If the filter mode is nearest, it compares against a single texel and returns either 0.0 or 1.0.
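This function is only available in fragment shaders and must be called in uniform control flow, because it computes implicit derivatives to select a mip level. Calling it inside a loop with a per-fragment iteration count or an if statement with a per-fragment condition fails the compiler's uniformity analysis, which is a shader-creation error by default. If that diagnostic is suppressed, the result is an indeterminate value.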
W3C Specification: textureSampleCompare
Use Case
This is the standard, high-performance way to implement Percentage-Closer Filtering (PCF) for soft shadows, as the hardware handles the sampling, comparison, and filtering in a single, highly optimized step.
@group(1) @binding(0) var shadow_map: texture_depth_2d;
@group(1) @binding(1) var shadow_sampler: sampler_comparison;
// The current fragment's depth from the light's perspective.
let fragment_depth = ...;
// The hardware performs the sampling, comparison, and (if configured)
// the filtering of the comparison results all in one operation.
// The result is a shadow factor (0.0 = shadowed, 1.0 = lit).
let shadow_factor = textureSampleCompare(
shadow_map,
shadow_sampler,
in.shadow_coords,
fragment_depth
);
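To make the filtering behaviour described above concrete, here is a minimal CPU-side sketch in Python of what a linear-filtered comparison sampler conceptually does for one lookup. It assumes a LessEqual compare function; the helper names (compare_less_equal, sample_compare_bilinear) and the tiny 2x2 depth footprint are hypothetical, and real hardware may differ in precision and texel addressing.

```python
def compare_less_equal(depth_ref, texel_depth):
    """Return 1.0 if the comparison passes (fragment is lit), else 0.0."""
    return 1.0 if depth_ref <= texel_depth else 0.0


def sample_compare_bilinear(texels, fx, fy, depth_ref):
    """Compare first, then bilinearly blend the four pass/fail results.

    texels: ((top_left, top_right), (bottom_left, bottom_right)) depths.
    fx, fy: fractional position of the sample point inside the 2x2 footprint.
    """
    (tl, tr), (bl, br) = texels
    top = (1.0 - fx) * compare_less_equal(depth_ref, tl) + fx * compare_less_equal(depth_ref, tr)
    bottom = (1.0 - fx) * compare_less_equal(depth_ref, bl) + fx * compare_less_equal(depth_ref, br)
    return (1.0 - fy) * top + fy * bottom


# The left column holds an occluder (depth 0.3) that is closer to the light
# than the fragment (depth 0.5); the right column is unoccluded (depth 0.9).
shadow_map_quad = ((0.3, 0.9), (0.3, 0.9))
shadow_factor = sample_compare_bilinear(shadow_map_quad, 0.25, 0.5, depth_ref=0.5)
print(shadow_factor)  # 0.25: the sample point sits mostly over the occluded texels
```

The key point is the order of operations: the depths themselves are never blended, only the 0.0 or 1.0 comparison results are, which is what produces a soft shadow edge.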
textureSampleCompareLevel
Signature
textureSampleCompareLevel(t: texture_depth_2d, s: sampler_comparison, coords: vec2<f32>, depth_ref: f32) -> f32
textureSampleCompareLevel(t: texture_depth_2d, s: sampler_comparison, coords: vec2<f32>, depth_ref: f32, offset: vec2<i32>) -> f32
textureSampleCompareLevel(t: texture_depth_2d_array, s: sampler_comparison, coords: vec2<f32>, array_index: A, depth_ref: f32) -> f32
textureSampleCompareLevel(t: texture_depth_2d_array, s: sampler_comparison, coords: vec2<f32>, array_index: A, depth_ref: f32, offset: vec2<i32>) -> f32
textureSampleCompareLevel(t: texture_depth_cube, s: sampler_comparison, coords: vec3<f32>, depth_ref: f32) -> f32
textureSampleCompareLevel(t: texture_depth_cube_array, s: sampler_comparison, coords: vec3<f32>, array_index: A, depth_ref: f32) -> f32
A is i32 or u32.
t: A depth texture to sample from.
s: A sampler_comparison object.
coords: The floating-point coordinates for sampling.
depth_ref: The depth value to compare against the values in the texture.
array_index: The 0-indexed layer of the texture array to sample from.
offset: An optional, compile-time constant integer vector (vec2<i32>) that applies a texel offset before sampling.
Description
Performs a hardware-accelerated depth comparison against a depth texture, similar to textureSampleCompare, but with several key differences that make it safe to use in more contexts.
The main distinctions are:
It always samples texels from the base mipmap level (level 0).
It does not compute derivatives for mipmap selection.
Because of this, it is not required to be invoked in uniform control flow.
It can be used in any shader stage, including vertex and compute shaders.
The function samples the nearest texel(s) on mip level 0, compares them against depth_ref using the comparison function from the sampler_comparison, and returns a filtered average of the pass/fail results in the range [0.0, 1.0].
W3C Specification: textureSampleCompareLevel
Use Case
This function is required for any shadow mapping algorithm that performs depth comparisons inside loops with variable iteration counts or if statements with per-fragment conditions, as textureSampleCompare would be invalid in those scenarios.
@group(1) @binding(0) var shadow_map: texture_depth_2d;
@group(1) @binding(1) var shadow_sampler: sampler_comparison;
let fragment_depth = ...;
let num_samples = get_sample_count_for_this_pixel(); // A non-uniform value
var total_shadow = 0.0;
// This loop has a variable number of iterations, so it is non-uniform control flow.
// We MUST use textureSampleCompareLevel inside it.
for (var i = 0; i < num_samples; i = i + 1) {
let offset = get_sample_offset(i); // Get a pre-defined offset
total_shadow = total_shadow + textureSampleCompareLevel(
shadow_map,
shadow_sampler,
in.shadow_coords + offset,
fragment_depth
);
}
let shadow_factor = total_shadow / f32(num_samples);
textureSampleGrad
Signature
textureSampleGrad(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, ddx: vec2<f32>, ddy: vec2<f32>) -> vec4<f32>
textureSampleGrad(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, ddx: vec2<f32>, ddy: vec2<f32>, offset: vec2<i32>) -> vec4<f32>
textureSampleGrad(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, ddx: vec2<f32>, ddy: vec2<f32>) -> vec4<f32>
textureSampleGrad(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, ddx: vec2<f32>, ddy: vec2<f32>, offset: vec2<i32>) -> vec4<f32>
textureSampleGrad(t: T, s: sampler, coords: vec3<f32>, ddx: vec3<f32>, ddy: vec3<f32>) -> vec4<f32> where T is texture_3d<f32> or texture_cube<f32>
textureSampleGrad(t: texture_3d<f32>, s: sampler, coords: vec3<f32>, ddx: vec3<f32>, ddy: vec3<f32>, offset: vec3<i32>) -> vec4<f32>
textureSampleGrad(t: texture_cube_array<f32>, s: sampler, coords: vec3<f32>, array_index: A, ddx: vec3<f32>, ddy: vec3<f32>) -> vec4<f32>
A is i32 or u32.
t: The texture to be sampled.
s: A sampler object.
coords: The floating-point coordinates for sampling.
ddx, ddy: The explicit gradients (derivatives) of the texture coordinates with respect to the horizontal (x) and vertical (y) screen axes.
array_index: The 0-indexed layer of the texture array to sample from.
offset: An optional, compile-time constant integer vector that applies a texel offset before sampling.
Description
Samples a texture using explicitly provided gradients, giving you manual control over mipmap level selection.
Normally, textureSample derives the rate of change of the coords automatically by differencing them across neighboring fragments, and uses that to pick a mip level. These implicit derivatives are only available in uniform control flow, and they become unreliable when the coords are modified differently in each fragment (e.g., offset by a per-pixel amount inside a loop). textureSampleGrad solves this by letting you provide the gradients (ddx and ddy) yourself. The GPU then uses these explicit gradients to select the mip level, and because no implicit derivatives are involved, the call has no uniform control flow requirement.
This is an advanced function required for specific rendering techniques.
W3C Specification: textureSampleGrad
Use Case
The canonical use case is in advanced materials like Parallax Occlusion Mapping (POM), where the texture coordinates are modified inside a raymarching loop. To get correct mipmapping, you must calculate the derivatives before the loop and then pass them into textureSampleGrad inside the loop.
@group(1) @binding(0) var my_texture: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;
// 1. Calculate the gradients on the original, unmodified UVs.
let uv_ddx = dpdx(in.uv);
let uv_ddy = dpdy(in.uv);
var final_uv = in.uv;
// 2. Perform some procedural UV manipulation (e.g., a raymarching loop for POM).
// Each fragment can end up with a different offset, so implicit derivatives of final_uv would be unreliable.
for (var i = 0; i < 16; i = i + 1) {
// ... complex logic that modifies final_uv ...
}
// 3. Sample the texture using the modified UVs, but provide the original,
// correct gradients. This ensures the texture is sampled with the correct mip level.
let final_color = textureSampleGrad(my_texture, my_sampler, final_uv, uv_ddx, uv_ddy);
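How the explicit gradients turn into a mip level can be sketched on the CPU. The Python below uses a common textbook approximation (log2 of the longer pixel-footprint axis measured in texels); the function name mip_level_from_gradients and its parameters are hypothetical, and actual GPUs are allowed to use a different, implementation-defined formula.

```python
import math


def mip_level_from_gradients(ddx, ddy, texture_size, mip_count):
    """Estimate the mip level selected for the given UV gradients.

    ddx, ddy: per-pixel change of the UV coordinates, as (u, v) tuples.
    texture_size: (width, height) of mip level 0 in texels.
    mip_count: number of mip levels in the texture.
    """
    # Convert the UV gradients into texel units.
    dx = (ddx[0] * texture_size[0], ddx[1] * texture_size[1])
    dy = (ddy[0] * texture_size[0], ddy[1] * texture_size[1])
    # The longer footprint axis decides how many texels one pixel covers.
    rho = max(math.hypot(*dx), math.hypot(*dy))
    if rho <= 1.0:
        return 0.0  # Magnification: at most one texel per pixel, use the base level.
    return min(math.log2(rho), float(mip_count - 1))


# A 1024x1024 texture where one screen pixel spans 4 texels along each axis.
level = mip_level_from_gradients((4.0 / 1024.0, 0.0), (0.0, 4.0 / 1024.0), (1024, 1024), 11)
print(level)  # 2.0
```

This is why the gradients of the original UVs are the right thing to pass in: they describe how densely the surface is covered by pixels, which is unaffected by the per-fragment offset applied in the loop.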
textureSampleLevel
Samples a texture from a single, explicitly specified mipmap level, bypassing the GPU's automatic level-of-detail selection.
This function has one critical advantage over textureSample: because it does not implicitly calculate derivatives, it is not required to be invoked in uniform control flow. This makes it the correct and safe choice for sampling textures inside loops with variable iteration counts or if statements with per-fragment conditions.
W3C Specification: textureSampleLevel
Overloads: Color/Data Textures
These overloads sample from standard color or data textures.
Signature
textureSampleLevel(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, level: f32) -> vec4<f32>
textureSampleLevel(t: texture_2d<f32>, s: sampler, coords: vec2<f32>, level: f32, offset: vec2<i32>) -> vec4<f32>
textureSampleLevel(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, level: f32) -> vec4<f32>
textureSampleLevel(t: texture_2d_array<f32>, s: sampler, coords: vec2<f32>, array_index: A, level: f32, offset: vec2<i32>) -> vec4<f32>
textureSampleLevel(t: T, s: sampler, coords: vec3<f32>, level: f32) -> vec4<f32> where T is texture_3d<f32> or texture_cube<f32>
textureSampleLevel(t: texture_3d<f32>, s: sampler, coords: vec3<f32>, level: f32, offset: vec3<i32>) -> vec4<f32>
textureSampleLevel(t: texture_cube_array<f32>, s: sampler, coords: vec3<f32>, array_index: A, level: f32) -> vec4<f32>
A is i32 or u32.
t: The texture to be sampled.
s: A sampler object.
coords: The floating-point coordinates for sampling.
level: The f32 mipmap level to sample from.
array_index: The 0-indexed layer of the texture array.
offset: An optional, compile-time constant integer offset.
Description
Samples the texture using the specified sampler rules, but only from the mipmap level indicated by level. Because the level is a floating-point value, the GPU can perform trilinear filtering by blending the results from the two nearest integer mip levels if the sampler is configured to do so.
Use Case
Ideal for effects that need data from a specific level of detail or for sampling inside non-uniform control flow.
@group(1) @binding(0) var scene_texture: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;
// Use Case 1: Getting a blurred version of a texture for a bloom effect.
// Sample from a high mip level (e.g., 4.0) to get a small, blurry image.
let bloom_source = textureSampleLevel(scene_texture, my_sampler, in.uv, 4.0);
// Use Case 2: Sampling inside a non-uniform branch.
var final_color = vec4(0.0);
if (in.world_position.x > 0.0) { // Non-uniform condition
// We MUST use textureSampleLevel here.
final_color = textureSampleLevel(scene_texture, my_sampler, in.uv, 0.0);
}
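The blending between mip levels that a fractional level enables can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions: each mip level is reduced to a single already-filtered value, the sampler is assumed to use a linear mipmap filter, and the function name sample_level is hypothetical.

```python
import math


def sample_level(mip_chain, level):
    """Blend between the two nearest mip levels, as a linear mipmap filter would.

    mip_chain: one already-filtered value per mip level (index 0 = full resolution).
    level: the floating-point level argument; clamped to the valid range here.
    """
    level = min(max(level, 0.0), float(len(mip_chain) - 1))
    lower = int(math.floor(level))
    upper = min(lower + 1, len(mip_chain) - 1)
    t = level - lower  # Fractional part: weight of the next-smaller mip.
    return (1.0 - t) * mip_chain[lower] + t * mip_chain[upper]


mips = [1.0, 0.5, 0.25]
print(sample_level(mips, 1.0))  # 0.5: exactly mip 1
print(sample_level(mips, 1.5))  # 0.375: halfway between mip 1 and mip 2
```

With a nearest mipmap filter the hardware would instead round the level to a single mip and skip the blend.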
Overloads: Depth Textures
These overloads sample from a single mip level of a depth texture.
Signature
textureSampleLevel(t: texture_depth_2d, s: sampler, coords: vec2<f32>, level: L) -> f32
textureSampleLevel(t: texture_depth_2d, s: sampler, coords: vec2<f32>, level: L, offset: vec2<i32>) -> f32
textureSampleLevel(t: texture_depth_2d_array, s: sampler, coords: vec2<f32>, array_index: A, level: L) -> f32
textureSampleLevel(t: texture_depth_2d_array, s: sampler, coords: vec2<f32>, array_index: A, level: L, offset: vec2<i32>) -> f32
textureSampleLevel(t: texture_depth_cube, s: sampler, coords: vec3<f32>, level: L) -> f32
textureSampleLevel(t: texture_depth_cube_array, s: sampler, coords: vec3<f32>, array_index: A, level: L) -> f32
A and L are i32 or u32.
t: The depth texture to be sampled.
s: A sampler object.
coords: The floating-point coordinates for sampling.
level: The integer (i32 or u32) mipmap level to sample from.
array_index: The 0-indexed layer of the texture array.
offset: An optional, compile-time constant integer offset.
Description
Samples the depth texture, returning a single f32 depth value from the specified integer mip level. Unlike the color texture overloads, the level parameter is an integer, so no blending between mip levels is performed.
Use Case
Used to fetch a precise depth value from a specific mip level, often in custom post-processing effects.
@group(1) @binding(0) var depth_texture: texture_depth_2d;
@group(1) @binding(1) var my_sampler: sampler;
// Get the unfiltered depth value from the center of the highest-resolution mip level.
let center_depth = textureSampleLevel(depth_texture, my_sampler, vec2(0.5), 0);
textureStore
Signature
textureStore(t: texture_storage_1d<F,A>, coords: C, value: vec4<CF>)
textureStore(t: texture_storage_2d<F,A>, coords: vec2<C>, value: vec4<CF>)
textureStore(t: texture_storage_2d_array<F,A>, coords: vec2<C>, array_index: A_idx, value: vec4<CF>)
textureStore(t: texture_storage_3d<F,A>, coords: vec3<C>, value: vec4<CF>)
t: The storage texture to write to.
F: The texel format of the texture (e.g., rgba8unorm, r32uint). It is written in the WGSL type and must match the format declared for the binding in Rust/CPU code.
A: The access mode, which must be write or read_write.
coords / C: The integer texel coordinates (i32 or u32).
array_index / A_idx: The integer array layer (i32 or u32).
value: A vec4 containing the data to write.
CF: The required "channel format" for the value parameter. This type depends on the texture's texel format F. For example:
If F is rgba8unorm, CF must be f32.
If F is r32uint, CF must be u32.
If F is rg32sint, CF must be i32.
Description
Writes a vec4 value to a specific texel in a storage texture. This is an output-only function that does not return a value. It is fundamentally different from sampling functions: it only works with texture_storage_* types, uses integer coordinates instead of UVs, does not use a sampler, and requires that the texture's access mode be write or read_write.
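The value is written to the texel specified by the integer coords. If the coordinates are outside the bounds of the texture, the write never corrupts unrelated memory: it is typically discarded, although the specification also permits an implementation to redirect it to some in-bounds texel, so guard the call with a bounds check if stray writes matter. For single-channel formats (like r32float), only the first component of the value (.x) is used.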
W3C Specification: textureStore
Use Case
textureStore is the primary way to write image output from a compute shader. It is commonly used for image processing, simulations, or any general-purpose GPU computation where the result is an image or data grid.
// A simple compute shader that inverts the colors of an image.
// Input texture (read-only)
@group(0) @binding(0) var input_texture: texture_2d<f32>;
// Output texture (write-only)
@group(0) @binding(1) var output_texture: texture_storage_2d<rgba8unorm, write>;
@compute @workgroup_size(8, 8, 1)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
// `id.xy` is the integer coordinate of the pixel this shader instance is processing.
// 1. Read the original color using `textureLoad`.
let original_color = textureLoad(input_texture, id.xy, 0);
// 2. Perform the image processing (in this case, a simple inversion).
let inverted_color = vec4<f32>(1.0 - original_color.rgb, original_color.a);
// 3. Write the new color to the output texture at the same coordinate.
// The value must be a vec4<f32> because the output format is `rgba8unorm`.
textureStore(output_texture, id.xy, inverted_color);
}
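The relationship between the vec4<f32> value and what actually lands in an rgba8unorm texel can be illustrated with a small CPU-side model in Python. The names encode_unorm8 and texture_store are hypothetical stand-ins, the "texture" is a plain nested list, and this sketch simply drops out-of-bounds writes.

```python
def encode_unorm8(x):
    """Convert a float channel to the 8-bit value an rgba8unorm texel stores."""
    clamped = min(max(x, 0.0), 1.0)    # unorm formats clamp to [0.0, 1.0]
    return int(clamped * 255.0 + 0.5)  # scale and round to the nearest integer


def texture_store(texture, coords, value):
    """Write value to texture[y][x]; out-of-bounds coordinates are ignored here."""
    x, y = coords
    height, width = len(texture), len(texture[0])
    if not (0 <= x < width and 0 <= y < height):
        return
    texture[y][x] = tuple(encode_unorm8(c) for c in value)


# A 2x2 output image, initially cleared to zero.
output = [[(0, 0, 0, 0) for _ in range(2)] for _ in range(2)]
original = (0.2, 0.4, 1.0, 1.0)
inverted = (1.0 - original[0], 1.0 - original[1], 1.0 - original[2], original[3])

texture_store(output, (1, 0), inverted)
texture_store(output, (5, 5), inverted)  # outside the 2x2 image: no effect
print(output[0][1])  # (204, 153, 0, 255)
```

The shader always works with floats in the 0.0 to 1.0 range for this format; the quantization to 8 bits happens as part of the store.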
Derivative Functions
This family of functions calculates the rate of change of a value with respect to the screen axes. They are powerful tools available only in fragment shaders.
These functions operate by implicitly comparing the value in the current fragment with the value in an adjacent fragment. They provide derivatives for both the horizontal (X) and vertical (Y) screen axes, as well as a combined measure of the total change.
A critical requirement for these functions is that they must be called in uniform control flow. The GPU executes fragment shaders in small 2x2 blocks of pixels, and derivatives are calculated by differencing values within this block. If an if statement causes one fragment in the block to take a different code path than its neighbor, the values are no longer comparable, making the derivative result meaningless.
While their most common use is implicit (the GPU uses them behind the scenes for textureSample to select mipmap levels), they are powerful explicit tools for advanced effects like procedural anti-aliasing and generating normal maps from height maps.
dpdx
Signature
dpdx(e: T) -> T
e: The input value or vector to find the derivative of.
T: Can be f32 or a vector of f32 (e.g., vec3<f32>).
Description
Calculates the partial derivative of the input e with respect to the horizontal screen axis (X). This function is only available in fragment shaders.
In practice, this means it computes the rate of change of e by taking the difference between its value in the current fragment and its value in the horizontally adjacent fragment of the same 2x2 quad. The result can be positive or negative. dpdx is a general version of this function; the GPU is free to use either a high-precision (dpdxFine) or a faster, less-precise (dpdxCoarse) calculation. For explicit control, those functions can be called directly.
This function must be called in uniform control flow. This is because the GPU computes derivatives by running the same code on a 2x2 block of fragments and comparing their results. If fragments in the block take different code paths (due to an if statement), the comparison is meaningless, and the function will return an indeterminate value.
W3C Specification: dpdx
Use Case
Derivative functions are fundamental for many advanced effects. While they are used implicitly by textureSample to select mipmap levels, an explicit use is to procedurally generate normal maps from a height map.
// Assume we have a function that generates a procedural height value (e.g., noise).
let height = procedural_height(in.uv);
// Calculate how the height changes along the screen's X and Y axes.
let height_dx = dpdx(height);
let height_dy = dpdy(height); // (dpdy is the derivative on the Y axis)
// Create two vectors that lie on the surface of our procedural height map.
// These represent the tangents in the X and Y directions.
let tangent_x = vec3<f32>(1.0, 0.0, height_dx);
let tangent_y = vec3<f32>(0.0, 1.0, height_dy);
// The cross product of the two tangents gives us the surface normal.
let procedural_normal = normalize(cross(tangent_x, tangent_y));
// This normal can now be used for realistic lighting on a procedural surface.
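The cross product in this snippet has a simple closed form, which is easy to check on the CPU. The Python sketch below mirrors the same steps with plain tuples; the function name normal_from_height_derivatives is hypothetical, and the derivative values passed in are made-up inputs rather than anything a GPU produced.

```python
import math


def normal_from_height_derivatives(height_dx, height_dy):
    """Compute normalize(cross((1, 0, dx), (0, 1, dy))).

    The cross product works out to (-dx, -dy, 1) before normalization.
    """
    tx = (1.0, 0.0, height_dx)
    ty = (0.0, 1.0, height_dy)
    cross = (
        tx[1] * ty[2] - tx[2] * ty[1],
        tx[2] * ty[0] - tx[0] * ty[2],
        tx[0] * ty[1] - tx[1] * ty[0],
    )
    length = math.sqrt(sum(c * c for c in cross))
    return tuple(c / length for c in cross)


# A flat surface (no height change) points straight along +Z.
flat_normal = normal_from_height_derivatives(0.0, 0.0)
# A surface rising along +X tilts the normal back toward -X.
sloped_normal = normal_from_height_derivatives(1.0, 0.0)
print(flat_normal)
print(sloped_normal)
```

Because the result is proportional to (-dx, -dy, 1), the normal always leans away from the direction in which the height increases.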
dpdxCoarse
Signature
dpdxCoarse(e: T) -> T
e: The input value or vector to find the derivative of.
T: Can be f32 or a vector of f32 (e.g., vec3<f32>).
Description
Calculates a low-precision ("coarse") partial derivative of the input e with respect to the horizontal screen axis (X). This function is only available in fragment shaders.
Like dpdx, it computes the rate of change of e between adjacent fragments. However, dpdxCoarse allows the GPU to perform the calculation using "local differences," which may mean it computes a single difference and reuses it for every fragment in a 2x2 quad rather than computing a unique value for each one. This can be faster but may result in a less precise derivative, potentially leading to blocky artifacts in some algorithms.
This function must be called in uniform control flow.
W3C Specification: dpdxCoarse
Use Case
Used in situations where performance is more critical than perfect derivative accuracy. It can be a performance optimization over dpdx or dpdxFine if the visual impact of the lower precision is negligible.
// Calculating a procedural normal where maximum precision is not required.
let height = procedural_height(in.uv);
// Using the coarse derivatives may be slightly faster. For many procedural
// patterns, the visual difference will be unnoticeable.
let height_dx = dpdxCoarse(height);
let height_dy = dpdyCoarse(height);
let tangent_x = vec3<f32>(1.0, 0.0, height_dx);
let tangent_y = vec3<f32>(0.0, 1.0, height_dy);
let procedural_normal = normalize(cross(tangent_x, tangent_y));
dpdxFine
Signature
dpdxFine(e: T) -> T
e: The input value or vector to find the derivative of.
T: Can be f32 or a vector of f32 (e.g., vec3<f32>).
Description
Calculates a high-precision ("fine") partial derivative of the input e with respect to the horizontal screen axis (X). This function is only available in fragment shaders.
Like dpdx, it computes the rate of change of e by taking the difference between its value in the current fragment and its value in the horizontally adjacent fragment of the same 2x2 quad. The "fine" qualifier suggests that the GPU will use a more precise, per-fragment calculation, potentially providing a more accurate local derivative than dpdxCoarse. This can be important for avoiding artifacts when working with high-frequency data.
This function must be called in uniform control flow.
W3C Specification: dpdxFine
Use Case
Used in situations where derivative accuracy is critical, and you want to ensure the highest-quality result, even at a potential minor performance cost compared to dpdxCoarse. This is important for procedural normals based on detailed height maps or for texture sampling with explicit gradients where precision is key.
// A height map texture that contains fine, sharp details.
@group(1) @binding(0) var height_map: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;
let height = textureSample(height_map, my_sampler, in.uv).r;
// Using the fine derivatives ensures we capture the sharp changes
// in the height map as accurately as possible, preventing aliasing
// or blocky artifacts in the resulting normal.
let height_dx = dpdxFine(height);
let height_dy = dpdyFine(height);
let tangent_x = vec3<f32>(1.0, 0.0, height_dx);
let tangent_y = vec3<f32>(0.0, 1.0, height_dy);
let procedural_normal = normalize(cross(tangent_x, tangent_y));
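The difference between the coarse and fine variants can be sketched on the CPU. The Python model below works on one 2x2 quad of fragment values. The function name quad_derivatives and the choice of which fragment pairs are differenced are assumptions for illustration, since the exact method is left to the implementation.

```python
def quad_derivatives(quad):
    """Model the derivative built-ins on one 2x2 quad of fragment values.

    quad: ((top_left, top_right), (bottom_left, bottom_right)).
    Returns (dpdx_fine, dpdy_fine, dpdx_coarse, dpdy_coarse).
    """
    (tl, tr), (bl, br) = quad
    # Fine: each row gets its own horizontal difference,
    # and each column gets its own vertical difference.
    dpdx_fine = ((tr - tl, tr - tl), (br - bl, br - bl))
    dpdy_fine = ((bl - tl, br - tr), (bl - tl, br - tr))
    # Coarse: a single difference per axis may be shared by all four fragments.
    dpdx_coarse = tr - tl
    dpdy_coarse = bl - tl
    return dpdx_fine, dpdy_fine, dpdx_coarse, dpdy_coarse


# The value changes faster in the bottom row and in the right column.
fine_x, fine_y, coarse_x, coarse_y = quad_derivatives(((1.0, 2.0), (3.0, 6.0)))
print(fine_x)    # ((1.0, 1.0), (3.0, 3.0)): the two rows report different slopes
print(coarse_x)  # 1.0: every fragment in the quad would see this one value
```

In this model the bottom-row fragments lose information under the coarse variant, which is the kind of blockiness the fine variant avoids.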
dpdy
Signature
dpdy(e: T) -> T
e: The input value or vector to find the derivative of.
T: Can be f32 or a vector of f32 (e.g., vec3<f32>).
Description
Calculates the partial derivative of the input e with respect to the vertical screen axis (Y). This function is the vertical counterpart to dpdx and is only available in fragment shaders.
It computes the rate of change of e by taking the difference between its value in the current fragment and its value in the vertically adjacent fragment of the same 2x2 quad. The result can be positive or negative. dpdy is a general version of this function; the GPU is free to use either a high-precision (dpdyFine) or a faster, less-precise (dpdyCoarse) calculation. For explicit control, those functions can be called directly.
This function must be called in uniform control flow. If it is used inside a loop with a variable number of iterations or an if statement with a per-fragment condition, the result is an indeterminate value.
W3C Specification: dpdy
Use Case
Derivative functions are fundamental for many advanced effects. dpdy is almost always used in conjunction with dpdx to understand how a value is changing across a 2D surface. A primary use is generating procedural normal maps from a height map.
// Assume we have a function that generates a procedural height value (e.g., noise).
let height = procedural_height(in.uv);
// Calculate how the height changes along the screen's X and Y axes.
let height_dx = dpdx(height);
let height_dy = dpdy(height);
// Create two vectors that lie on the surface of our procedural height map.
// These represent the tangents in the X and Y directions.
let tangent_x = vec3<f32>(1.0, 0.0, height_dx);
let tangent_y = vec3<f32>(0.0, 1.0, height_dy);
// The cross product of the two tangents gives us the surface normal.
let procedural_normal = normalize(cross(tangent_x, tangent_y));
// This normal can now be used for realistic lighting on a procedural surface.
dpdyCoarse
Signature
dpdyCoarse(e: T) -> T
e: The input value or vector to find the derivative of.
T: Can be f32 or a vector of f32 (e.g., vec3<f32>).
Description
Calculates a low-precision ("coarse") partial derivative of the input e with respect to the vertical screen axis (Y). This function is the vertical counterpart to dpdxCoarse and is only available in fragment shaders.
Like dpdy, it computes the rate of change of e between adjacent vertical fragments. However, dpdyCoarse allows the GPU to perform the calculation using "local differences," which may mean it computes a single difference and reuses it for every fragment in a 2x2 quad rather than computing a unique value for each one. This can be faster but may result in a less precise derivative.
This function must be called in uniform control flow.
W3C Specification: dpdyCoarse
Use Case
Used in situations where performance is more critical than perfect derivative accuracy. It can be a performance optimization over dpdy or dpdyFine if the visual impact of the lower precision is negligible. It is typically used alongside dpdxCoarse.
// Calculating a procedural normal where maximum precision is not required.
let height = procedural_height(in.uv);
// Using the coarse derivatives may be slightly faster. For many procedural
// patterns, the visual difference will be unnoticeable.
let height_dx = dpdxCoarse(height);
let height_dy = dpdyCoarse(height);
let tangent_x = vec3<f32>(1.0, 0.0, height_dx);
let tangent_y = vec3<f32>(0.0, 1.0, height_dy);
let procedural_normal = normalize(cross(tangent_x, tangent_y));
dpdyFine
Signature
dpdyFine(e: T) -> T
e: The input value or vector to find the derivative of.
T: Can be f32 or a vector of f32 (e.g., vec3<f32>).
Description
Calculates a high-precision ("fine") partial derivative of the input e with respect to the vertical screen axis (Y). This function is the vertical counterpart to dpdxFine and is only available in fragment shaders.
Like dpdy, it computes the rate of change of e by finding the difference between its value in the current fragment and its value in an adjacent vertical fragment. The "fine" qualifier suggests that the GPU will use a more precise, per-fragment calculation, potentially providing a more accurate local derivative than dpdyCoarse.
This function must be called in uniform control flow.
W3C Specification: dpdyFine
Use Case
Used in situations where derivative accuracy is critical, particularly when creating procedural normals from high-frequency data. It is almost always used in conjunction with dpdxFine to ensure consistent precision.
// A height map texture that contains fine, sharp details.
@group(1) @binding(0) var height_map: texture_2d<f32>;
@group(1) @binding(1) var my_sampler: sampler;
let height = textureSample(height_map, my_sampler, in.uv).r;
// Using the fine derivatives ensures we capture the sharp changes
// in the height map as accurately as possible, preventing aliasing
// or blocky artifacts in the resulting normal.
let height_dx = dpdxFine(height);
let height_dy = dpdyFine(height);
let tangent_x = vec3<f32>(1.0, 0.0, height_dx);
let tangent_y = vec3<f32>(0.0, 1.0, height_dy);
let procedural_normal = normalize(cross(tangent_x, tangent_y));
fwidth
Signature
fwidth(e: T) -> T
e: The input value or vector to find the derivative of.
T: Can be f32 or a vector of f32 (e.g., vec3<f32>).
Description
Calculates the sum of the absolute values of the partial derivatives with respect to the screen axes. It is a convenient shorthand for abs(dpdx(e)) + abs(dpdy(e)). This function is only available in fragment shaders.
The result is a non-negative value (computed per component for vector inputs) that represents the total rate of change of the input e at the current fragment, combining both horizontal and vertical changes. It provides a measure of how "busy" or how steep the gradient of e is across the pixel.
This function must be called in uniform control flow.
W3C Specification: fwidth
Use Case
The primary use of fwidth is procedural anti-aliasing. It tells you how much a value changes across one pixel, so you can size a smooth transition to span only a pixel or two in screen space, eliminating aliasing (jagged edges).
// Use Case: Drawing an anti-aliased procedural line.
let line_pos = 0.5;
let uv = in.uv;
// An aliased (jagged) vertical line using `step`.
let aliased_line = step(line_pos, uv.x);
// To anti-alias it, we need to know how much `uv.x` changes from
// one pixel to the next. This is exactly what `fwidth` provides.
let pixel_width = fwidth(uv.x);
// Now, instead of a hard step, we create a smooth transition that extends
// one pixel to either side of our line position.
let anti_aliased_line = smoothstep(
line_pos - pixel_width,
line_pos + pixel_width,
uv.x
);
// The `anti_aliased_line` will be a soft, non-jagged line, while
// `aliased_line` will be pixelated.
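To see the effect numerically, the snippet above can be mimicked on the CPU for a single row of pixels. This Python sketch is an illustration only: the 10-pixel strip, the helper names, and the hard-coded pixel_width (standing in for what fwidth(uv.x) would report when the UVs span the strip evenly) are all assumptions.

```python
def smoothstep(low, high, x):
    """Same Hermite polynomial as WGSL's smoothstep, valid for low < high."""
    t = min(max((x - low) / (high - low), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def edge_coverage(u, edge, pixel_width):
    """Anti-aliased replacement for step(edge, u)."""
    return smoothstep(edge - pixel_width, edge + pixel_width, u)


# Pixel centers of a 10-pixel-wide strip whose UVs run from 0.0 to 1.0.
pixel_width = 1.0 / 10.0
us = [(i + 0.5) * pixel_width for i in range(10)]

hard = [1.0 if u >= 0.5 else 0.0 for u in us]            # step(0.5, u)
soft = [edge_coverage(u, 0.5, pixel_width) for u in us]  # smoothstep version

print(hard)
print([round(s, 5) for s in soft])
```

The hard version jumps from 0.0 to 1.0 between two neighboring pixels, while the smoothstep version gives the two pixels next to the edge intermediate values, which is what removes the jagged look.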
fwidthCoarse
Signature
fwidthCoarse(e: T) -> T
e: The input value or vector to find the derivative of.
T: Can be f32 or a vector of f32 (e.g., vec3<f32>).
Description
Calculates the sum of the absolute values of the low-precision ("coarse") partial derivatives. It is a convenient shorthand for abs(dpdxCoarse(e)) + abs(dpdyCoarse(e)). This function is only available in fragment shaders.
This is the less precise but potentially faster version of fwidth. It provides an estimate of the total rate of change of the input e at the current fragment.
This function must be called in uniform control flow.
W3C Specification: fwidthCoarse
Use Case
Used for procedural anti-aliasing in situations where maximum performance is desired and a slightly less accurate gradient calculation is acceptable. For many effects, the visual difference between fwidth and fwidthCoarse is negligible.
// Use Case: Procedural anti-aliasing where performance is a priority.
let circle_radius = 0.4;
let dist = length(in.uv - 0.5);
// Get a fast, approximate measure of how much `dist` changes across one pixel.
let pixel_width = fwidthCoarse(dist);
// Create a smooth transition (anti-aliasing) that extends roughly one pixel to either side of the circle's edge.
let anti_aliased_circle = 1.0 - smoothstep(
circle_radius - pixel_width,
circle_radius + pixel_width,
dist
);
### `fwidthFine`
#### Signature
`fwidthFine(e: T) -> T`
* `e`: The input value or vector to find the derivative of.
* `T`: Can be `f32` or a vector of `f32` (e.g., `vec3<f32>`).
#### Description
Calculates the sum of the absolute values of the high-precision ("fine") partial derivatives. It is a convenient shorthand for `abs(dpdxFine(e)) + abs(dpdyFine(e))`. This function is only available in fragment shaders.
This is the high-precision version of [`fwidth`](#heading-fwidth). It provides a more accurate measure of the total rate of change of the input `e` at the current fragment, which is important for effects that rely on precise gradient information.
This function must be called in **uniform control flow**.
*W3C Specification:* [`fwidthFine`](https://www.w3.org/TR/WGSL/#fwidthFine-builtin)
#### Use Case
Used for high-quality procedural anti-aliasing, especially for patterns with sharp, high-frequency details where a less precise gradient from [`fwidthCoarse`](#heading-fwidthcoarse) might lead to visual artifacts like moiré patterns or inconsistent line thickness.
```rust
// Use Case: Drawing a high-quality, anti-aliased procedural grid.
let uv = in.uv * 10.0; // Create a 10x10 grid.
// `fract()` gives a sawtooth wave per axis. `abs(v - 0.5)` folds it into a triangle wave.
// `dist_to_line` is 0.0 on the grid lines and 0.5 at the center of each cell.
let dist_to_line = 0.5 - abs(fract(uv) - 0.5);
// Get a precise, per-axis measure of how much `uv` changes across one pixel.
let pixel_width = fwidthFine(uv);
// Use this precise width to create a smooth transition around the grid lines:
// each axis fades from 1.0 on the line to 0.0 one pixel-width away from it.
let line_mask = 1.0 - smoothstep(vec2<f32>(0.0), pixel_width, dist_to_line);
// A pixel belongs to the grid if it is close to a line on either axis.
let grid_lines = max(line_mask.x, line_mask.y);
// The result is a crisp, anti-aliased grid.
```
Atomic Read-Modify-Write Functions
This family of functions performs atomic read-modify-write operations on a memory location shared between multiple shader invocations (threads). They are primarily used in compute shaders to safely modify data without causing race conditions.
An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs one of these functions, it executes a complete sequence:
It reads the original value from a memory location.
It performs a calculation using that original value and a new input value (e.g., addition, comparison, or a bitwise operation).
It writes the result of the calculation back to the memory location.
This entire sequence happens without any other thread being able to interfere. A key feature of all functions in this group is that they return the original value that was in memory before the modification took place.
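The read, modify, write sequence and the "returns the original value" rule can be modelled on the CPU. The Python sketch below uses a lock to stand in for the hardware guarantee of indivisibility; the AtomicU32 class and its method names are hypothetical and only imitate the behaviour of atomicAdd and atomicLoad.

```python
import threading


class AtomicU32:
    """Toy stand-in for a WGSL atomic<u32>; a lock makes each operation indivisible."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def add(self, v):
        """Model of atomicAdd: returns the value held before the addition."""
        with self._lock:
            original = self._value                     # 1. read
            self._value = (original + v) & 0xFFFFFFFF  # 2. modify, 3. write (u32 wraps)
            return original

    def load(self):
        """Model of atomicLoad."""
        with self._lock:
            return self._value


counter = AtomicU32()
returned = [None] * 64  # one result slot per simulated invocation


def worker(index):
    returned[index] = counter.add(1)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(64)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.load())                        # 64: no increment was lost
print(sorted(returned) == list(range(64)))   # True: every thread saw a distinct original value
```

Because each caller receives a distinct original value, the return value is often used as a unique index, for example to claim the next free slot in an output buffer.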
atomicAdd
Signature
atomicAdd(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T
atomic_ptr: A pointer to the atomic integer to be modified.
AS: The address space of the variable, typically workgroup or storage.
atomic<T>: The special type indicating this memory is for atomic access.
read_write: The access mode for the pointer.
v: The integer value to add.
T: The data type, which must be either i32 or u32.
Description
Atomically adds the value v to the integer stored at the memory location atomic_ptr. The function returns the original value that was stored in memory before the addition occurred.
An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicAdd, it reads the original value, calculates the sum, and writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for preventing race conditions in compute shaders where multiple threads might try to modify the same shared variable at the same time.
W3C Specification: atomicAdd
Use Case
The most common use case is to safely increment a shared counter from multiple threads in a compute shader.
```rust
// A shared counter in `workgroup` memory, accessible by all threads.
// Note the `atomic<u32>` type declaration.
var<workgroup> shared_counter: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the counter.
    if (local_id == 0u) {
        // `atomicStore` is used for initialization.
        atomicStore(&shared_counter, 0u);
    }
    // Ensure all threads wait until initialization is done.
    workgroupBarrier();
    // Imagine each thread does some work and needs to increment the counter.
    // Without `atomicAdd`, if two threads read the value '5' at the same time,
    // they would both write back '6', and one increment would be lost.
    // `atomicAdd` prevents this.
    atomicAdd(&shared_counter, 1u);
}
```
atomicSub
Signature
atomicSub(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T
atomic_ptr: A pointer to the atomic integer to be modified.
AS: The address space of the variable, typically workgroup or storage.
atomic<T>: The special type indicating this memory is for atomic access.
read_write: The access mode for the pointer.
v: The integer value to subtract.
T: The data type, which must be either i32 or u32.
Description
Atomically subtracts the value v from the integer stored at the memory location atomic_ptr. The function returns the original value that was stored in memory before the subtraction occurred.
An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicSub, it reads the original value, calculates the difference, and writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for preventing race conditions in compute shaders where multiple threads might try to modify the same shared variable simultaneously.
W3C Specification: atomicSub
Use Case
Used to safely decrement a shared counter or resource pool from multiple threads in a compute shader. For example, implementing a work queue where threads "claim" items to process.
```rust
// A shared counter for the number of items left in a work pool.
var<workgroup> items_remaining: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the counter.
    if (local_id == 0u) {
        atomicStore(&items_remaining, 100u); // Start with 100 items.
    }
    workgroupBarrier();
    // Each thread attempts to claim an item by decrementing the counter.
    // `atomicSub` returns the value *before* the subtraction.
    let previous_count = atomicSub(&items_remaining, 1u);
    // If the count before we subtracted was greater than 0, then our claim was successful.
    if (previous_count > 0u) {
        // ... process the item ...
    }
}
```
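One caveat with the claim pattern: a `u32` counter wraps around when it is decremented below zero, so once the pool is empty a later `atomicSub` can return a huge "previous" value that looks like a successful claim. That cannot happen with 100 items and 64 single-claim threads, but it can as soon as threads outnumber items. A minimal sketch of one possible workaround, assuming a signed counter is acceptable (the item count of 40 is an arbitrary illustration):

```wgsl
// Signed counter: over-claiming drives it negative instead of wrapping around.
var<workgroup> items_remaining: atomic<i32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    if (local_id == 0u) {
        // Deliberately fewer items (40) than threads (64).
        atomicStore(&items_remaining, 40);
    }
    workgroupBarrier();

    let previous_count = atomicSub(&items_remaining, 1);
    if (previous_count > 0) {
        // Exactly 40 threads get here, with previous_count values 40 down to 1.
        // ... process item number `previous_count - 1` ...
    }
    // The other 24 threads see 0 or a negative value and correctly skip the work.
}
```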
atomicMax
Signature
atomicMax(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T
atomic_ptr: A pointer to the atomic integer to be modified.
AS: The address space of the variable, typically workgroup or storage.
atomic<T>: The special type indicating this memory is for atomic access.
read_write: The access mode for the pointer.
v: The integer value to compare against.
T: The data type, which must be either i32 or u32.
Description
Atomically compares the value v with the integer stored at the memory location atomic_ptr. If v is larger, the value at atomic_ptr is replaced with v. The function returns the original value that was stored in memory before the comparison and potential replacement occurred.
An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicMax, it reads the original value, performs the comparison, and (if necessary) writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for preventing race conditions when finding a maximum value in parallel.
W3C Specification: atomicMax
Use Case
Used to safely find the maximum value in a dataset in parallel. Each thread in a compute shader can process a subset of the data and use atomicMax to update a single shared variable with its local maximum.
```rust
// A shared variable to hold the global maximum value found by the workgroup.
var<workgroup> global_maximum: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the shared maximum to zero.
    if (local_id == 0u) {
        atomicStore(&global_maximum, 0u);
    }
    workgroupBarrier();
    // Each thread calculates its own local maximum from some data.
    // (`calculate_local_maximum` is a placeholder for your own per-thread logic.)
    let local_maximum = calculate_local_maximum(local_id); // e.g., 10u, 50u, 20u...
    // Each thread attempts to update the shared global_maximum.
    // `atomicMax` ensures that even if two threads try to write a new
    // maximum at the same time, the correct final maximum is preserved.
    atomicMax(&global_maximum, local_maximum);
}
```
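`atomicMax` only accepts `i32` and `u32`, which raises the question of floating-point data. One commonly used trick, sketched here under the assumption that all inputs are non-negative and not NaN, relies on the fact that non-negative IEEE-754 floats sort in the same order as their raw bit patterns interpreted as unsigned integers. The binding names are hypothetical, and `max_bits` is assumed to be zeroed by the host:

```wgsl
// Hypothetical input; this sketch assumes it holds non-negative, non-NaN values.
@group(0) @binding(0) var<storage, read> samples: array<f32>;
// Holds the bit pattern of the largest f32 seen so far; assumed zeroed by the host.
@group(0) @binding(1) var<storage, read_write> max_bits: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let i = global_id.x;
    if (i >= arrayLength(&samples)) {
        return;
    }
    // Clamp negatives to zero so the ordering trick stays valid.
    let value = max(samples[i], 0.0);
    // Clear the sign bit as well, in case the clamp produced -0.0.
    let bits = bitcast<u32>(value) & 0x7FFFFFFFu;
    atomicMax(&max_bits, bits);
}
```

To read the result back as a float, reinterpret the stored integer with `bitcast<f32>(atomicLoad(&max_bits))` in a later pass, or reinterpret the bytes on the host.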
atomicMin
Signature
atomicMin(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T
atomic_ptr: A pointer to the atomic integer to be modified.
AS: The address space of the variable, typically workgroup or storage.
atomic<T>: The special type indicating this memory is for atomic access.
read_write: The access mode for the pointer.
v: The integer value to compare against.
T: The data type, which must be either i32 or u32.
Description
Atomically compares the value v with the integer stored at the memory location atomic_ptr. If v is smaller, the value at atomic_ptr is replaced with v. The function returns the original value that was stored in memory before the comparison and potential replacement occurred.
An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicMin, it reads the original value, performs the comparison, and (if necessary) writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for preventing race conditions when finding a minimum value in parallel.
W3C Specification: atomicMin
Use Case
Used to safely find the minimum value in a dataset in parallel. Each thread in a compute shader can process a subset of the data and use atomicMin to update a single shared variable with its local minimum.
```rust
// A shared variable to hold the global minimum value found by the workgroup.
var<workgroup> global_minimum: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the shared minimum to a very large value.
    if (local_id == 0u) {
        atomicStore(&global_minimum, 4294967295u); // Max u32 value
    }
    workgroupBarrier();
    // Each thread calculates its own local minimum from some data.
    // (`calculate_local_minimum` is a placeholder for your own per-thread logic.)
    let local_minimum = calculate_local_minimum(local_id); // e.g., 100u, 50u, 200u...
    // Each thread attempts to update the shared global_minimum.
    // `atomicMin` ensures that even if two threads try to write a new
    // minimum at the same time, the correct final minimum is preserved.
    atomicMin(&global_minimum, local_minimum);
}
```
atomicAnd
Signature
atomicAnd(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T
atomic_ptr: A pointer to the atomic integer to be modified.
AS: The address space of the variable, typically workgroup or storage.
atomic<T>: The special type indicating this memory is for atomic access.
read_write: The access mode for the pointer.
v: The integer value to use for the bitwise AND operation.
T: The data type, which must be either i32 or u32.
Description
Atomically performs a bitwise AND operation between the value v and the integer stored at the memory location atomic_ptr, storing the result back at atomic_ptr. The function returns the original value that was stored in memory before the AND operation occurred.
An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicAnd, it reads the original value, performs the bitwise AND, and writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for safely manipulating shared bitmasks from multiple threads without race conditions.
W3C Specification: atomicAnd
Use Case
Used to safely clear specific bits in a shared integer flag or bitmask. Each thread can use atomicAnd with a custom mask to clear its own bit without affecting the bits being modified by other threads.
```rust
// A shared bitmask representing the completion status of 32 tasks.
// A `u32` has 32 bits, so one mask can track at most 32 threads.
// A '1' means the task is NOT complete.
var<workgroup> completion_mask: atomic<u32>;

@compute @workgroup_size(32, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the mask to all '1's, indicating no tasks are complete.
    if (local_id == 0u) {
        atomicStore(&completion_mask, 0xFFFFFFFFu);
    }
    workgroupBarrier();
    // ... each thread performs its task ...
    // When a thread finishes its task, it clears its corresponding bit in the mask.
    // Create a mask to clear the bit at `local_id`.
    // `~` is the bitwise NOT operator. `1u << local_id` creates a '1' at the thread's bit.
    // `~ (1u << local_id)` creates a mask with a '0' at the thread's bit and '1's everywhere else.
    let clear_mask = ~(1u << local_id);
    // `atomicAnd` safely applies this mask. For example, if the shared mask was ...1111
    // and our clear_mask is ...1011, the result will be ...1011.
    atomicAnd(&completion_mask, clear_mask);
}
```
atomicOr
Signature
atomicOr(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T
atomic_ptr: A pointer to the atomic integer to be modified.
AS: The address space of the variable, typically workgroup or storage.
atomic<T>: The special type indicating this memory is for atomic access.
read_write: The access mode for the pointer.
v: The integer value to use for the bitwise OR operation.
T: The data type, which must be either i32 or u32.
Description
Atomically performs a bitwise OR operation between the value v and the integer stored at the memory location atomic_ptr, storing the result back at atomic_ptr. The function returns the original value that was stored in memory before the OR operation occurred.
An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicOr, it reads the original value, performs the bitwise OR, and writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for safely manipulating shared bitmasks from multiple threads without race conditions.
W3C Specification: atomicOr
Use Case
Used to safely set specific bits in a shared integer flag or bitmask. Each thread can use atomicOr with a custom mask to set its own bit without affecting the bits being modified by other threads.
```rust
// A shared bitmask representing which tasks have completed.
// A `u32` has 32 bits, so one mask can track at most 32 threads.
// A '1' means the task IS complete.
var<workgroup> completion_mask: atomic<u32>;

@compute @workgroup_size(32, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the mask to all '0's.
    if (local_id == 0u) {
        atomicStore(&completion_mask, 0u);
    }
    workgroupBarrier();
    // ... each thread performs its task ...
    let task_is_complete = true; // Based on some per-thread calculation.
    if (task_is_complete) {
        // Create a mask to set the bit corresponding to this thread's ID.
        let set_mask = 1u << local_id;
        // `atomicOr` safely applies this mask. If thread 2 (mask ...0100) and
        // thread 3 (mask ...1000) both finish at the same time, the `atomicOr`
        // guarantees the final result will correctly have both bits set (...1100).
        atomicOr(&completion_mask, set_mask);
    }
}
```
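Because `atomicOr` returns the value from before the operation, it can also act as a "test-and-set": a thread sets a bit and learns, in the same indivisible step, whether it was the first to do so. The following is a small sketch of that idea; the slot layout (two threads competing for each of 32 slots) is an assumption chosen only to create contention:

```wgsl
// One bit per slot: a set bit means the slot has already been claimed.
var<workgroup> claimed_mask: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    if (local_id == 0u) {
        atomicStore(&claimed_mask, 0u);
    }
    workgroupBarrier();

    // 64 threads share 32 slots, so exactly two threads compete for each one.
    let slot = local_id % 32u;
    let bit = 1u << slot;
    // Set the bit and capture what the mask looked like just before.
    let previous = atomicOr(&claimed_mask, bit);
    if ((previous & bit) == 0u) {
        // The bit was clear before our OR: this thread is the one and only
        // winner for `slot`. The competing thread will see the bit already set.
        // ... do the once-per-slot work here ...
    }
}
```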
atomicXor
Signature
atomicXor(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T
atomic_ptr: A pointer to the atomic integer to be modified.
AS: The address space of the variable, typically workgroup or storage.
atomic<T>: The special type indicating this memory is for atomic access.
read_write: The access mode for the pointer.
v: The integer value to use for the bitwise XOR operation.
T: The data type, which must be either i32 or u32.
Description
Atomically performs a bitwise XOR operation between the value v and the integer stored at the memory location atomic_ptr, storing the result back at atomic_ptr. The function returns the original value that was stored in memory before the XOR operation occurred.
An atomic operation is guaranteed by the hardware to be indivisible. When a thread performs atomicXor, it reads the original value, performs the bitwise XOR, and writes the new value back to memory without any other thread being able to interfere in the middle of that sequence. This is essential for safely manipulating shared bitmasks from multiple threads without race conditions.
W3C Specification: atomicXor
Use Case
Used to safely toggle specific bits in a shared integer flag or bitmask. The XOR operation flips a bit if the corresponding bit in the mask is 1, and leaves it unchanged if the mask bit is 0.
```rust
// A shared bitmask representing a set of toggleable states.
var<workgroup> toggle_states: atomic<u32>;

@compute @workgroup_size(32, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread initializes the states.
    if (local_id == 0u) {
        atomicStore(&toggle_states, 0u);
    }
    workgroupBarrier();
    // Imagine each thread decides it needs to flip the state of its corresponding bit.
    let should_toggle = some_per_thread_condition();
    if (should_toggle) {
        // Create a mask to toggle the bit for this thread's ID.
        let toggle_mask = 1u << local_id;
        // `atomicXor` safely applies this mask. If a thread applies the mask, its bit
        // will flip (0->1 or 1->0). If multiple threads do this at the same time,
        // each toggle is guaranteed to be applied correctly without being lost.
        atomicXor(&toggle_states, toggle_mask);
    }
}
```
Atomic Access and Exchange Functions
This family of functions provides the fundamental primitives for reading, writing, and swapping values in shared memory atomically.
Unlike the "Read-Modify-Write" group, these operations do not perform arithmetic calculations. Instead, their focus is on direct memory access and replacement. This category includes the most basic atomic operations: safely reading a complete value from memory and safely writing a complete value to memory. It also contains more powerful primitives for unconditionally swapping a value with a new one, and for performing a conditional swap that only succeeds if the memory currently holds an expected value.
These functions are the essential building blocks for creating any complex, lock-free algorithm or synchronization mechanism in a compute shader, enabling multiple threads to coordinate their work on shared data without corruption.
atomicLoad
Signature
atomicLoad(atomic_ptr: ptr<AS, atomic<T>, read_write>) -> T
atomic_ptr: A pointer to the atomic integer to be read.
AS: The address space of the variable, typically workgroup or storage.
atomic<T>: The special type indicating this memory is for atomic access.
read_write: The access mode for the pointer.
T: The data type, which must be either i32 or u32.
Description
Atomically loads (reads) the integer value from the memory location atomic_ptr and returns it.
An atomic load guarantees that the entire value is read without being partially modified ("torn") by another thread writing to the same location at the same time. It ensures you always get a coherent, complete value that was present at some point in time, even during concurrent write operations.
W3C Specification: atomicLoad
Use Case
Used in compute shaders to safely read a shared value that may have been modified by other threads. It is often used after a workgroupBarrier to ensure all previous modifications are visible before the read.
```rust
var<workgroup> shared_result: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread calculates a result and stores it atomically.
    if (local_id == 0u) {
        let result = do_some_complex_calculation();
        atomicStore(&shared_result, result);
    }
    // A workgroup barrier ensures that all threads wait until the `atomicStore`
    // from thread 0 is complete and its result is visible to all other threads.
    workgroupBarrier();
    // Now, all other threads can safely read the calculated result.
    // `atomicLoad` guarantees they get the complete, final value written by thread 0.
    let final_result = atomicLoad(&shared_result);
    // ... use final_result in subsequent calculations ...
}
```
atomicStore
Signature
atomicStore(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T)
atomic_ptr: A pointer to the atomic integer to be modified.
AS: The address space of the variable, typically workgroup or storage.
atomic<T>: The special type indicating this memory is for atomic access.
read_write: The access mode for the pointer.
v: The integer value to store.
T: The data type, which must be either i32 or u32.
Description
Atomically stores (writes) the integer value v to the memory location atomic_ptr. This function does not return a value.
An atomic store guarantees that the entire value is written without being partially overwritten ("torn") by another thread writing to the same location at the same time. It ensures that any other thread reading this location will see either the value from before the store or the complete value from after the store, but never a corrupt, half-written value.
W3C Specification: atomicStore
Use Case
Used in compute shaders to safely write to a shared memory location. It is the primary way to initialize an atomic variable or to publish a final result from one thread for other threads to read.
```rust
var<workgroup> shared_result: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // A single thread is designated to calculate and store an initial value.
    if (local_id == 0u) {
        let initial_value = 123u;
        // The atomic store writes the value 123 as one indivisible operation,
        // so no thread can ever observe a half-written value.
        atomicStore(&shared_result, initial_value);
    }
    // A barrier is crucial here. It makes all other threads wait until the
    // store operation from thread 0 is complete and visible.
    workgroupBarrier();
    // Now other threads can safely load the initialized value.
    let value = atomicLoad(&shared_result); // will be 123 for all threads
}
```
atomicExchange
Signature
atomicExchange(atomic_ptr: ptr<AS, atomic<T>, read_write>, v: T) -> T
atomic_ptr: A pointer to the atomic integer to be modified.
AS: The address space of the variable, typically workgroup or storage.
atomic<T>: The special type indicating this memory is for atomic access.
read_write: The access mode for the pointer.
v: The new integer value to store.
T: The data type, which must be either i32 or u32.
Description
Atomically replaces the integer value at the memory location atomic_ptr with v. The function returns the original value that was stored in memory before the replacement occurred.
An atomic exchange is an indivisible "read-and-then-write" operation. It guarantees that the old value is read and the new value is written without any other thread being able to interfere in the middle of the sequence.
W3C Specification: atomicExchange
Use Case
Used in compute shaders for algorithms that require a "claim and replace" pattern on a shared resource. A common example is implementing a simple spinlock or mutex.
```rust
// A shared variable representing a lock. 0 means unlocked, 1 means locked.
var<workgroup> lock: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // Initialize the lock to 0 (unlocked).
    if (local_id == 0u) {
        atomicStore(&lock, 0u);
    }
    workgroupBarrier();
    // Try to acquire the lock.
    // `atomicExchange` attempts to write '1' (locked) and returns the previous value.
    // This entire operation is atomic.
    if (atomicExchange(&lock, 1u) == 0u) {
        // If the previous value was '0' (unlocked), we have successfully
        // acquired the lock because we were the one to write the '1'.
        // ... perform critical section work here ...
        // Release the lock by storing '0' back.
        atomicStore(&lock, 0u);
    }
}
```
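Another use of the same primitive is "take and reset": grabbing everything that has accumulated in a shared variable while clearing it in the same step, so nothing can slip in between a separate read and a separate write. A minimal sketch, in which the event flags and the choice of odd-numbered threads as producers are purely illustrative assumptions:

```wgsl
// One bit per thread: a set bit means that thread has raised an event.
var<workgroup> pending_events: atomic<u32>;

@compute @workgroup_size(32, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    if (local_id == 0u) {
        atomicStore(&pending_events, 0u);
    }
    workgroupBarrier();

    // Producers: the odd-numbered threads each raise their own flag bit.
    if ((local_id & 1u) == 1u) {
        atomicOr(&pending_events, 1u << local_id);
    }
    workgroupBarrier();

    // Consumer: thread 0 takes the whole set and resets it to zero in one step.
    if (local_id == 0u) {
        let taken = atomicExchange(&pending_events, 0u);
        // Bits 1, 3, ..., 31 are set, so `event_count` is 16 here.
        let event_count = countOneBits(taken);
        // ... handle `event_count` events ...
    }
}
```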
atomicCompareExchangeWeak
Signature
atomicCompareExchangeWeak(atomic_ptr: ptr<AS, atomic<T>, read_write>, cmp: T, v: T) -> R
atomic_ptr: A pointer to the atomic integer to be modified.
AS: The address space of the variable, typically workgroup or storage.
atomic<T>: The special type indicating this memory is for atomic access.
read_write: The access mode for the pointer.
cmp: The value to compare against the value stored at atomic_ptr.
v: The new value to store if the comparison is successful.
T: The data type, which must be either i32 or u32.
R: A special, built-in struct, __atomic_compare_exchange_result<T>, that cannot be declared by name. It must be inferred with let. This struct contains two members:
old_value: T: The value that was in memory at atomic_ptr before this operation began.
exchanged: bool: true if the exchange was successful, false otherwise.
Description
Performs an atomic "compare-and-swap" (CAS) operation. This is a conditional write that is the most powerful of the atomic primitives, forming the basis for many complex, lock-free algorithms.
The operation follows these indivisible steps:
Atomically reads the value at atomic_ptr (let's call it original_value).
Compares original_value with the provided comparison value, cmp.
If original_value equals cmp, it atomically attempts to write the new value v to atomic_ptr.
If original_value does not equal cmp, it does nothing.
The "Weak" Guarantee and Spurious Failures:
The "weak" in the function's name is critical. It means the operation can spuriously fail. This is a rare event where the exchanged member of the result will be false even though original_value was equal to cmp. This is a performance trade-off allowed by some hardware.
Because of this possibility, the standard and correct way to use this function is to always place it inside a loop. The loop should repeat the attempt until exchanged is true.
W3C Specification: atomicCompareExchangeWeak
Use Case
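The canonical use case is implementing lock-free data structures or synchronization primitives such as a spinlock. Because of spurious failures, the retry loop is required for correctness. Be aware that WGSL does not guarantee that a thread spinning on a lock will ever see the lock holder make progress, particularly when both run in the same workgroup, so treat the spinlock below as an illustration of the retry loop rather than a pattern to ship as-is.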
```rust
// A shared variable representing a lock. 0 means unlocked, 1 means locked.
var<workgroup> lock: atomic<u32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) local_id: u32) {
    // Initialize the lock to 0 (unlocked).
    if (local_id == 0u) {
        atomicStore(&lock, 0u);
    }
    workgroupBarrier();
    // Loop until we successfully acquire the lock.
    loop {
        // We expect the lock to be 0 (unlocked). This is our comparison value.
        let expected_value = 0u;
        // Attempt to swap the 0 with a 1 (locked).
        let result = atomicCompareExchangeWeak(&lock, expected_value, 1u);
        // `exchanged` is true only if the old value was indeed 0 AND the
        // write went through (no spurious failure), so it is the only
        // field we need to check.
        if (result.exchanged) {
            // We have successfully acquired the lock.
            break; // Exit the loop.
        }
        // If we failed (either because another thread got the lock first, or
        // a spurious failure), the loop will repeat the attempt.
    }
    // ... perform critical section work here ...
    // Release the lock. A simple store is sufficient.
    atomicStore(&lock, 0u);
}
```
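A compare-and-swap loop does not have to guard a lock. It can also apply an arbitrary update directly to the shared value: read the current value, compute the new one, and publish it only if nobody changed the value in between. As a sketch of that style, here is one way to emulate a floating-point atomic add, which WGSL does not provide as a built-in. The binding names and the helper `atomic_add_f32` are hypothetical, and `sum_bits` is assumed to be zeroed by the host (the bit pattern of 0.0):

```wgsl
// Holds the bit pattern of an f32 running total; assumed zeroed by the host.
@group(0) @binding(0) var<storage, read_write> sum_bits: atomic<u32>;
// Hypothetical per-thread inputs to accumulate.
@group(0) @binding(1) var<storage, read> contributions: array<f32>;

// Adds `value` to the f32 stored in `sum_bits` using a CAS retry loop.
fn atomic_add_f32(value: f32) {
    var expected = atomicLoad(&sum_bits);
    loop {
        // Compute the new total from the value we believe is in memory.
        let desired = bitcast<u32>(bitcast<f32>(expected) + value);
        let result = atomicCompareExchangeWeak(&sum_bits, expected, desired);
        if (result.exchanged) {
            break;
        }
        // Another thread got there first, or the attempt failed spuriously.
        // Retry, starting from the value that was actually observed.
        expected = result.old_value;
    }
}

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let i = global_id.x;
    if (i < arrayLength(&contributions)) {
        atomic_add_f32(contributions[i]);
    }
}
```

Unlike the spinlock, no thread here waits for another thread to release anything: apart from spurious failures, a retry only happens because some other thread's update succeeded. Note that floating-point addition is not associative, so the final total can vary slightly with the order in which threads win.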
Synchronization Functions
This family of functions provides the essential tools for controlling the order of execution and memory visibility between different shader invocations (threads). They are the fundamental primitives for coordinating work in a parallel environment and are primarily used in compute shaders.
By default, shader threads execute independently, and the hardware makes no guarantees about their relative speed or the order in which their memory operations become visible to each other. This can lead to "race conditions," where one thread reads a value before another thread has had a chance to write its result, leading to incorrect or unpredictable behavior.
Synchronization functions solve this problem by creating a barrier, which is a point in the shader code that all threads in a given scope (like a compute shader workgroup) must reach before any of them are allowed to proceed further. This enforces a strict order of operations. Crucially, barriers also enforce memory visibility, guaranteeing that all memory writes made before the barrier are completed and visible to all threads after the barrier. These functions are the indispensable traffic control for any algorithm that requires threads to collaborate on shared data.
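One practical consequence of "all threads must reach the barrier" is that WGSL only accepts a barrier call in uniform control flow, meaning a place that every thread in the workgroup is guaranteed to reach together. The sketch below (buffer names and the even/odd split are assumptions for illustration) shows the usual way to structure this: keep the per-thread branching around the work, and place the barrier outside of it.

```wgsl
var<workgroup> scratch: array<f32, 64>;
// Hypothetical output buffer.
@group(0) @binding(0) var<storage, read_write> results: array<f32>;

@compute @workgroup_size(64, 1, 1)
fn main(
    @builtin(local_invocation_index) local_index: u32,
    @builtin(global_invocation_id) global_id: vec3<u32>,
) {
    // The work is conditional: threads take different branches here...
    if (local_index % 2u == 0u) {
        scratch[local_index] = f32(local_index);
    } else {
        scratch[local_index] = 0.0;
    }
    // ...but the barrier sits after the `if`, where every thread arrives.
    // Calling it inside either branch would be rejected at shader creation.
    workgroupBarrier();

    // Safe: this value was written by a different thread before the barrier.
    let neighbor = scratch[(local_index + 1u) % 64u];
    if (global_id.x < arrayLength(&results)) {
        results[global_id.x] = scratch[local_index] + neighbor;
    }
}
```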
storageBarrier
Signature
storageBarrier()
This function takes no parameters and does not return a value.
Description
Executes a control barrier and synchronizes memory access for variables in the storage address space. This function can only be used in compute shaders.
When storageBarrier is called, it has two effects:
Control Barrier: All shader invocations (threads) within the same workgroup must reach this point in the code before any of them are allowed to proceed further.
Memory Barrier: It guarantees that all writes to storage buffers, including atomic variables in the storage address space, performed by any thread in the workgroup before the barrier are completed and visible to all threads in the same workgroup after the barrier.
This ensures that threads reading from a storage buffer after the barrier will not see stale data written by other threads in the same workgroup before the barrier. It does not synchronize threads in different workgroups. Note that workgroupBarrier is not a stronger version of this function: it synchronizes only the workgroup address space, so the two barriers are complementary. When a stage writes to both workgroup and storage memory, call both.
W3C Specification: storageBarrier
Use Case
Used to coordinate multi-stage algorithms within a compute shader workgroup that use a storage buffer for communication between threads.
```rust
// A buffer in the storage address space, accessible for reading and writing.
// This example assumes a single workgroup is dispatched and the buffer
// holds at least 64 elements.
@group(0) @binding(0) var<storage, read_write> data_buffer: array<f32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_id) local_id: vec3<u32>) {
    let idx = local_id.x;
    // --- Stage 1: Initial Calculation ---
    // Each thread performs a calculation and writes its result to the buffer.
    let initial_value = f32(idx) * 2.0;
    data_buffer[idx] = initial_value;
    // --- Synchronization ---
    // This barrier ensures that all 64 writes from Stage 1 are complete
    // and visible to all threads in this workgroup before anyone proceeds.
    storageBarrier();
    // --- Stage 2: Neighbor-dependent Calculation ---
    // Each thread now reads the results written by its neighbors in Stage 1
    // to perform a second calculation (e.g., a simple blur/average).
    // The indices are clamped so threads 0 and 63 stay inside the written range.
    let left_neighbor_val = data_buffer[max(idx, 1u) - 1u];
    let right_neighbor_val = data_buffer[min(idx + 1u, 63u)];
    // Without the first barrier, a thread might read the old, incorrect value
    // from its neighbor's location before the neighbor had a chance to write its new value.
    // A second barrier makes sure every thread has finished reading its
    // neighbors before any Stage 1 value is overwritten.
    storageBarrier();
    data_buffer[idx] = (left_neighbor_val + initial_value + right_neighbor_val) / 3.0;
}
```
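In the WGSL specification, each barrier covers exactly one address space: `storageBarrier` orders storage accesses and `workgroupBarrier` orders workgroup accesses. A stage that communicates through both kinds of memory therefore needs both calls. The following sketch illustrates this; the `tile` and `history` names are hypothetical, and it assumes a single dispatched workgroup and a buffer of at least 64 elements:

```wgsl
var<workgroup> tile: array<f32, 64>;
// Hypothetical storage buffer, assumed to hold at least 64 elements.
@group(0) @binding(0) var<storage, read_write> history: array<f32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) idx: u32) {
    let value = f32(idx) * 2.0;
    tile[idx] = value;    // a write to the workgroup address space
    history[idx] = value; // a write to the storage address space

    // Each barrier covers one address space, so both are needed here.
    workgroupBarrier(); // makes the `tile` writes visible
    storageBarrier();   // makes the `history` writes visible

    // Read what the next thread wrote to both memories.
    let next = (idx + 1u) % 64u;
    let combined = tile[next] + history[next];

    // Wait until every thread has finished reading `history`
    // before any element of it is overwritten.
    storageBarrier();
    history[idx] = combined;
}
```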
workgroupBarrier
Signature
workgroupBarrier()
This function takes no parameters and does not return a value.
Description
Executes a control barrier and synchronizes memory access for variables in the workgroup address space. It can only be used in compute shaders.
When workgroupBarrier is called, it has two effects:
Control Barrier: All shader invocations (threads) within the same workgroup must reach this point in the code before any of them are allowed to proceed further. This synchronizes the execution flow of all threads in the workgroup.
Memory Barrier: It guarantees that all writes to workgroup variables, including atomic variables in the workgroup address space, performed by any thread in the workgroup before the barrier are completed and visible to all other threads in the workgroup after the barrier. It does not order accesses to storage buffers; use storageBarrier for those.
This is the most common barrier function, because most multi-stage parallel algorithms exchange their intermediate results through workgroup memory. It guarantees that one stage's writes to that memory are fully complete before the next stage begins.
W3C Specification: workgroupBarrier
Use Case
Used to coordinate multi-stage algorithms within a compute shader workgroup, especially when using the fast workgroup memory for communication between threads. A classic example is a parallel reduction (e.g., finding the sum of a large array).
```rust
// A shared array in fast `workgroup` memory to hold data for one workgroup.
var<workgroup> local_data: array<f32, 64>;
@group(0) @binding(0) var<storage, read> input_data: array<f32>;
@group(0) @binding(1) var<storage, read_write> output_data: array<f32>;

@compute @workgroup_size(64, 1, 1)
fn main(
    @builtin(global_invocation_id) global_id: vec3<u32>,
    @builtin(local_invocation_id) local_id: vec3<u32>,
) {
    let idx = local_id.x;
    // --- Stage 1: Load data from slow global storage to fast workgroup memory ---
    local_data[idx] = input_data[global_id.x];
    // --- Synchronization ---
    // This is CRITICAL. It ensures every thread has finished loading its data
    // into `local_data` before anyone starts reading from it.
    workgroupBarrier();
    // --- Stage 2: Parallel reduction in workgroup memory ---
    // (This is a simplified reduction loop)
    var stride = 32u;
    loop {
        if (stride == 0u) { break; }
        if (idx < stride) {
            local_data[idx] = local_data[idx] + local_data[idx + stride];
        }
        // Barrier inside the loop ensures each level of the reduction is complete
        // before the next level begins.
        workgroupBarrier();
        stride = stride / 2u;
    }
    // --- Stage 3: Write final result ---
    // After the reduction, thread 0 holds the sum for the entire workgroup.
    if (idx == 0u) {
        output_data[global_id.x / 64u] = local_data[0];
    }
}
```
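The reduction above assumes the input length is an exact multiple of 64. When it is not, the threads past the end of the array cannot simply `return` early, because that would leave the later barriers in non-uniform control flow. A common alternative, sketched here with the same hypothetical bindings, is to let those threads contribute the identity value (0.0 for a sum) and keep them in step with everyone else:

```wgsl
var<workgroup> partial_sums: array<f32, 64>;
@group(0) @binding(0) var<storage, read> input_data: array<f32>;
@group(0) @binding(1) var<storage, read_write> output_data: array<f32>;

@compute @workgroup_size(64, 1, 1)
fn main(
    @builtin(global_invocation_id) global_id: vec3<u32>,
    @builtin(local_invocation_index) idx: u32,
    @builtin(workgroup_id) group_id: vec3<u32>,
) {
    // Out-of-range threads contribute 0.0, the identity for addition.
    // They do NOT return early: every thread must reach the barriers below.
    var value = 0.0;
    if (global_id.x < arrayLength(&input_data)) {
        value = input_data[global_id.x];
    }
    partial_sums[idx] = value;
    workgroupBarrier();

    // Same tree reduction as before, written as a `for` loop.
    for (var stride = 32u; stride > 0u; stride = stride / 2u) {
        if (idx < stride) {
            partial_sums[idx] = partial_sums[idx] + partial_sums[idx + stride];
        }
        workgroupBarrier();
    }

    // One partial sum per workgroup.
    if (idx == 0u && group_id.x < arrayLength(&output_data)) {
        output_data[group_id.x] = partial_sums[0];
    }
}
```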
workgroupUniformLoad
Signature
workgroupUniformLoad(p : ptr<workgroup, T>) -> T
p: A pointer to a variable in the workgroup address space. The pointer p itself must be a uniform value (i.e., it must be the same for all invocations in the workgroup).
T: A concrete, non-atomic type with a fixed size, such as f32, vec4<i32>, or a user-defined struct that meets these criteria.
Description
Reads a value from workgroup memory and returns that same value to every invocation (thread) in the workgroup. This function can only be used in compute shaders.
workgroupUniformLoad has two critical, combined effects:
Uniform Return Value: It guarantees that every single thread in the workgroup receives the exact same return value, and the compiler's uniformity analysis treats the result as uniform. This is powerful because an if statement or loop condition that depends only on this result keeps the whole workgroup on the same path, so barriers remain legal inside it.
Implicit Barrier: It executes a control barrier and synchronizes memory for the workgroup address space. This means all threads must reach this function call before any can proceed, and all writes to workgroup memory before the call are guaranteed to be visible to the load operation. It effectively combines a workgroupBarrier with a subsequent load.
It provides a single, synchronized value that can be safely used to direct the flow of the entire workgroup.
W3C Specification: workgroupUniformLoad
Use Case
Used to efficiently distribute a single, workgroup-wide value that is calculated by one thread to all other threads in that workgroup, while also providing the necessary synchronization.
```rust
// A shared variable in workgroup memory.
// (Named `shared_group_id` to avoid confusion with the `workgroup_id` built-in.)
var<workgroup> shared_group_id: u32;

@compute @workgroup_size(64, 1, 1)
fn main(
    @builtin(local_invocation_id) local_id: vec3<u32>,
    @builtin(workgroup_id) group_id: vec3<u32>
) {
    // The two approaches are shown back to back for comparison;
    // real code would use one or the other.
    // ---- The "Manual" Way ----
    // A single thread calculates a group-wide value.
    if (local_id.x == 0u) {
        shared_group_id = group_id.x;
    }
    // A barrier is required to make the write visible to all other threads.
    workgroupBarrier();
    // Now every thread loads the value.
    let id_manual = shared_group_id;
    // ---- The `workgroupUniformLoad` Way ----
    // A single thread calculates a group-wide value.
    if (local_id.x == 0u) {
        shared_group_id = group_id.x;
    }
    // This single function both waits for all threads (acting as a barrier)
    // and loads the value written by thread 0 into a uniform variable for all threads.
    let id_uniform = workgroupUniformLoad(&shared_group_id);
    // Because `id_uniform` is guaranteed to be the same for all threads,
    // this 'if' statement will not cause thread divergence.
    if (id_uniform % 2u == 0u) {
        // ... all threads in this workgroup will either enter this block, or none will ...
    }
}
```
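The uniformity guarantee matters most when a value stored in workgroup memory controls a loop that itself contains a barrier. A plain read of a workgroup variable is not considered uniform by the compiler, so a barrier inside a loop controlled by it would be rejected. The sketch below shows the pattern with `workgroupUniformLoad` instead. The smoothing passes, the fixed pass count of 3, and all variable names are assumptions chosen for illustration:

```wgsl
// Loop counter shared by the whole workgroup; only thread 0 writes it.
var<workgroup> remaining_passes: u32;
var<workgroup> values: array<f32, 64>;
// Hypothetical output buffer.
@group(0) @binding(0) var<storage, read_write> out_values: array<f32>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(local_invocation_index) idx: u32) {
    values[idx] = f32(idx);
    if (idx == 0u) {
        remaining_passes = 3u; // stand-in for a data-dependent pass count
    }

    loop {
        // Acts as a barrier AND returns a value the compiler knows is uniform,
        // so the loop exit below is taken by all threads or by none.
        let passes_left = workgroupUniformLoad(&remaining_passes);
        if (passes_left == 0u) {
            break;
        }

        // One smoothing pass: average each value with its right-hand neighbor.
        let neighbor = values[(idx + 1u) % 64u];
        // Legal only because the loop's control flow is uniform.
        workgroupBarrier(); // all reads finish before anyone overwrites
        values[idx] = 0.5 * (values[idx] + neighbor);

        if (idx == 0u) {
            remaining_passes = passes_left - 1u;
        }
    }

    if (idx < arrayLength(&out_values)) {
        out_values[idx] = values[idx];
    }
}
```

The call at the top of each iteration also publishes the previous pass's writes to `values` and `remaining_passes`, so no separate barrier is needed at the end of the loop body.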






