In Metal 2.0, the number of threads in the grid does not have to be a multiple of the number of threads in a threadgroup. The actual size of a specific threadgroup may therefore be smaller than the threadgroup size specified in the dispatch. The [[threads_per_threadgroup]] attribute specifies the actual threadgroup size for a given threadgroup executing the kernel. The [[dispatch_threads_per_threadgroup]] attribute is the threadgroup size specified at dispatch.
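As an illustration of the difference between the two attributes, the following sketch declares both in one kernel. The kernel name mark_partial_groups, the is_partial buffer, and its binding index are hypothetical and not part of this specification. For example, a one-dimensional grid of 1000 threads dispatched with 256 threads per threadgroup leaves 1000 - 3 * 256 = 232 threads in the last threadgroup, so threads in that threadgroup would see 232 for the first attribute and 256 for the second.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel: each thread records whether its threadgroup was
// truncated at the edge of the grid.
kernel void mark_partial_groups(
    device uint *is_partial [[buffer(0)]],   // assumed output buffer
    uint gid           [[thread_position_in_grid]],
    uint actual_size   [[threads_per_threadgroup]],
    uint dispatch_size [[dispatch_threads_per_threadgroup]])
{
    // actual_size is smaller than dispatch_size only in a partial threadgroup.
    is_partial[gid] = (actual_size < dispatch_size) ? 1u : 0u;
}
```

All three built-in arguments are declared as uint here so that the type-matching rules in the notes that follow are satisfied.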
Notes on kernel function attributes:
• The type used to declare [[thread_position_in_grid]], [[threads_per_grid]], [[thread_position_in_threadgroup]], [[threads_per_threadgroup]], [[threadgroup_position_in_grid]], [[dispatch_threads_per_threadgroup]], and [[threadgroups_per_grid]] must be a scalar type or a vector type. If it is a vector type, the number of components for the vector types used to declare these arguments must match.
• The data types used to declare [[thread_position_in_grid]] and [[threads_per_grid]] must match.
• The data types used to declare [[thread_position_in_threadgroup]], [[threads_per_threadgroup]], and [[dispatch_threads_per_threadgroup]] must match.
• If [[thread_position_in_threadgroup]] is declared to be of type uint, uint2 or uint3, then [[thread_index_in_threadgroup]] must be declared to be of type uint.
• The types used to declare [[thread_index_in_simdgroup]], [[threads_per_simdgroup]], [[simdgroup_index_in_threadgroup]], [[simdgroups_per_threadgroup]], [[dispatch_simdgroups_per_threadgroup]], [[quadgroup_index_in_threadgroup]], [[quadgroups_per_threadgroup]], and [[dispatch_quadgroups_per_threadgroup]] must be ushort or uint. The types used to declare these built-in variables must match.
• [[thread_execution_width]] and [[threads_per_simdgroup]] are aliases of one another that reference the same concept.
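The matching rules above can be sketched with a single kernel signature. This is only an illustrative declaration: the kernel name matched_types, the out buffer, and the values written are hypothetical. The grid and threadgroup arguments all use two-component uint vectors, [[thread_index_in_threadgroup]] is uint, and the two SIMD-group arguments share the type ushort.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel whose built-in arguments follow the notes above.
kernel void matched_types(
    device uint4 *out [[buffer(0)]],                          // assumed output buffer
    uint2  pos_in_grid    [[thread_position_in_grid]],        // same type as threads_per_grid
    uint2  grid_size      [[threads_per_grid]],
    uint2  pos_in_group   [[thread_position_in_threadgroup]], // same type as threads_per_threadgroup
    uint2  group_size     [[threads_per_threadgroup]],
    uint   index_in_group [[thread_index_in_threadgroup]],    // uint, since the position is uint2
    ushort lane           [[thread_index_in_simdgroup]],      // SIMD-group types match: both ushort
    ushort simd_width     [[threads_per_simdgroup]])
{
    // Row-major flattening of the 2D positions.
    uint linear = pos_in_grid.y * grid_size.x + pos_in_grid.x;
    uint local  = pos_in_group.y * group_size.x + pos_in_group.x;
    out[linear] = uint4(local, index_in_group, uint(lane), uint(simd_width));
}
```

Changing, say, group_size to uint3 or simd_width to uint while leaving the other arguments as they are would violate the rules listed above.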
Attribute                                  Corresponding Data Types
[[quadgroup_index_in_threadgroup]]         ushort or uint
[[simdgroups_per_threadgroup]]             ushort or uint
[[quadgroups_per_threadgroup]]             ushort or uint
[[dispatch_simdgroups_per_threadgroup]]    ushort or uint
[[dispatch_quadgroups_per_threadgroup]]    ushort or uint
2017-9-12 | Copyright © 2017 Apple Inc. All Rights Reserved. Page 77 of 174
4.3.5 stage_in Attribute
The per-fragment inputs to a fragment function are generated using the output from a vertex function and the fragments generated by the rasterizer. The per-fragment inputs are identified using the [[stage_in]] attribute.
A vertex function can read per-vertex inputs by indexing into one or more buffers passed as arguments to the vertex function, using the vertex and instance IDs. In addition, per-vertex inputs can be passed as arguments to a vertex function by declaring them with the [[stage_in]] attribute.
A kernel function reads per-thread inputs by indexing into one or more buffers or textures passed as arguments to the kernel function, using the thread position in grid or thread position in threadgroup IDs. In addition, per-thread inputs can be passed as arguments to a kernel function by declaring them with the [[stage_in]] attribute.
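A minimal sketch of the kernel case follows. The struct PerThreadInput, the kernel scale_values, the result buffer, and the choice of buffer index 1 are hypothetical. The sketch assumes the host has described how attribute indices 0 and 1 map onto buffer data; that setup is outside this section and is not shown.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical per-thread input, assembled for each thread from the
// attribute layout that the host is assumed to have described.
struct PerThreadInput {
    float4 value  [[attribute(0)]];
    float  weight [[attribute(1)]];
};

kernel void scale_values(
    PerThreadInput in     [[stage_in]],     // the only [[stage_in]] argument
    device float4 *result [[buffer(1)]],    // assumed output buffer
    uint tid              [[thread_position_in_grid]])
{
    // The kernel body does no indexing of its own to fetch the input.
    result[tid] = in.value * in.weight;
}
```

Compared with indexing a buffer by thread position, this form keeps the input layout out of the kernel source.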
Only one argument of the vertex, fragment or kernel function can be declared with the [[stage_in]] attribute. For a user-defined struct declared with the [[stage_in]] attribute, the members of the struct can be:
• a scalar integer or floating-point value or
• a vector of integer or floating-point values.
NOTE: Packed vectors, matrices, structs, references or pointers to a type, and arrays of scalars, vectors, matrices and bitfields are not supported as members of the struct declared with the stage_in attribute.
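The member restrictions can be illustrated with a struct that uses only permitted member types. The names ValidStageIn, PassthroughOut, and passthrough are hypothetical, and the attribute indices are arbitrary choices made for this sketch.

```metal
#include <metal_stdlib>
using namespace metal;

// Every member is a scalar or a vector of integer or floating-point values.
struct ValidStageIn {
    float4 position [[attribute(0)]];   // vector of floating-point values
    float  scale    [[attribute(1)]];   // scalar floating-point value
    int    id       [[attribute(2)]];   // scalar integer value
    uint4  indices  [[attribute(3)]];   // vector of integer values
    // Not permitted here, per the NOTE above: packed vectors, matrices,
    // nested structs, pointers or references, and arrays.
};

struct PassthroughOut {
    float4 position [[position]];
};

vertex PassthroughOut passthrough(ValidStageIn v_in [[stage_in]])
{
    PassthroughOut v_out;
    // Scale only the xyz components and keep w unchanged.
    v_out.position = float4(v_in.position.xyz * v_in.scale, v_in.position.w);
    return v_out;
}
```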
4.3.5.1 Vertex Function Example that Uses the stage_in Attribute
The following example shows how to pass per-vertex inputs using the stage_in attribute.
struct VertexOutput {
    float4 position [[position]];
    float4 color;
    float2 texcoord[4];
};
struct VertexInput {
    float4 position [[attribute(0)]];
    float3 normal [[attribute(1)]];
    half4 color [[attribute(2)]];
    half2 texcoord [[attribute(3)]];
};
constexpr constant uint MAX_LIGHTS = 4;
struct LightDesc {
    uint num_lights;
    float4 light_position[MAX_LIGHTS];
    float4 light_color[MAX_LIGHTS];
    float4 light_attenuation_factors[MAX_LIGHTS];
};
constexpr sampler s = sampler(coord::normalized, address::clamp_to_zero,
                              filter::linear);
vertex VertexOutput
render_vertex(VertexInput v_in [[stage_in]],
              constant float4x4& mvp_matrix [[buffer(1)]],
              constant LightDesc& lights [[buffer(2)]],
              uint v_id [[vertex_id]])
{
    VertexOutput v_out;
    v_out.position = v_in.position * mvp_matrix;
    v_out.color = do_lighting(v_in.position, v_in.normal, lights);
    …
    return v_out;
}
4.3.5.2 Fragment Function Example that Uses the stage_in Attribute
An example in section 4.3.3 introduces the process_vertex vertex function, which returns a VertexOutput struct per vertex. In the following example, the output from process_vertex is pipelined to become input for a fragment function called render_pixel, so the first argument of the fragment function uses the [[stage_in]] attribute and the incoming VertexOutput type. (In render_pixel, the imgA and imgB 2D textures call the built-in function sample, which is introduced in section 5.10.3.)
struct VertexOutput {
    float4 position [[position]];
    float4 color;
    float2 texcoord;
};