If a fragment function writes a depth value, the depth_argument must be specified with one of the following values:
any
greater
less
The following example shows how color attachment indices can be specified. Color values written in clr_f are written to color attachment index 0, those in clr_i to color attachment index 1, and those in clr_ui to color attachment index 2.
#include <metal_stdlib>
using namespace metal;

struct MyFragmentOutput {
    // color attachment 0
    float4 clr_f [[color(0)]];
    // color attachment 1
    int4 clr_i [[color(1)]];
    // color attachment 2
    uint4 clr_ui [[color(2)]];
};

fragment MyFragmentOutput
my_fragment(…)
{
    MyFragmentOutput f;
    …
    f.clr_f = …;
    …
    return f;
}
NOTE: If a color attachment index is used both as an input to and output of a fragment
function, the data types associated with the input argument and output declared with
this color attachment index must match.
4.3.4.6 Kernel Function Input Attributes
2017-9-12 | Copyright © 2017 Apple Inc. All Rights Reserved.
When a kernel is submitted for execution, it executes over an N-dimensional grid of threads, where N is one, two, or three. A thread is an instance of the kernel that executes for each point in this grid, and thread_position_in_grid identifies its position in the grid.
Within a compute unit, a threadgroup is partitioned into multiple smaller groups for execution. The execution width of the compute unit, referred to as the thread_execution_width, determines the recommended size of this smaller group. For best performance, the total number of threads in the threadgroup should be a multiple of the thread_execution_width.
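As a rough illustration of the recommendation above, the following sketch rounds a requested threadgroup thread count up to the nearest multiple of the execution width. It is written in plain C++ (the Metal shading language is based on C++, and this arithmetic can be checked on the CPU). The function name round_up_to_execution_width is hypothetical and not part of Metal; on the host, the execution width would typically be read from the threadExecutionWidth property of an MTLComputePipelineState.

```cpp
#include <cassert>

// Hypothetical helper (not a Metal API): rounds a requested number of
// threads per threadgroup up to the nearest multiple of the execution width.
unsigned round_up_to_execution_width(unsigned requested, unsigned execution_width) {
    assert(execution_width > 0);  // a zero width would divide by zero
    return ((requested + execution_width - 1) / execution_width) * execution_width;
}
```

For example, with an execution width of 32, a request for 100 threads becomes 128. Rounding up can create threads that fall outside the data being processed, so a kernel dispatched this way would normally compare thread_position_in_grid against the data bounds before doing any work.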
Threadgroups are assigned a unique position within the grid (referred to as threadgroup_position_in_grid). Threads are assigned a unique position within a threadgroup (referred to as thread_position_in_threadgroup). The unique scalar index of a thread within a threadgroup is given by thread_index_in_threadgroup.
Each thread’s position in the grid and position in the threadgroup are N-dimensional tuples. Threadgroups are assigned a position using a similar approach to that used for threads. Threads are assigned to a threadgroup and given a position in the threadgroup with components in the range from zero to the threadgroup size in that dimension minus one.
When a kernel is submitted for execution, either the number of threadgroups and the threadgroup size are specified, or the number of threads in the grid and the threadgroup size are specified. For example, consider a kernel submitted for execution that uses a 2-dimensional grid where the number of threadgroups specified is (Wx, Wy) and the threadgroup size is (Sx, Sy). Let (wx, wy) be the position of each threadgroup in the grid (i.e., threadgroup_position_in_grid) and (lx, ly) be the position of each thread in the threadgroup (i.e., thread_position_in_threadgroup).
The thread position in the grid (i.e., thread_position_in_grid) is:
(gx, gy) = (wx * Sx + lx, wy * Sy + ly)
The grid size (i.e., threads_per_grid) is:
(Gx, Gy) = (Wx * Sx, Wy * Sy)
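The two formulas above can be checked with a small CPU-side C++ sketch. UInt2 is a hypothetical stand-in for the Metal uint2 type, and the function names only mirror the attribute names; they are not Metal APIs.

```cpp
#include <cassert>

// Hypothetical stand-in for Metal's uint2.
struct UInt2 { unsigned x, y; };

// (gx, gy) = (wx * Sx + lx, wy * Sy + ly)
// w: threadgroup position in the grid, S: threadgroup size,
// l: thread position in the threadgroup.
UInt2 thread_position_in_grid(UInt2 w, UInt2 S, UInt2 l) {
    // Each component of l must lie in [0, S - 1].
    assert(l.x < S.x && l.y < S.y);
    return {w.x * S.x + l.x, w.y * S.y + l.y};
}

// (Gx, Gy) = (Wx * Sx, Wy * Sy)
// W: number of threadgroups in each dimension, S: threadgroup size.
UInt2 threads_per_grid(UInt2 W, UInt2 S) {
    return {W.x * S.x, W.y * S.y};
}
```

With (Wx, Wy) = (4, 3) and (Sx, Sy) = (8, 2), the grid is 32 x 6 threads, and the thread at (lx, ly) = (3, 1) in threadgroup (wx, wy) = (1, 2) is at grid position (11, 5).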
In most cases (other than a tile function), the thread index in the threadgroup (i.e., thread_index_in_threadgroup) is determined by:
ly * Sx + lx
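For the non-tile case, the following C++ sketch computes ly * Sx + lx and checks, by brute force, that every thread position in an Sx by Sy threadgroup receives a distinct index in [0, Sx * Sy). Both helper names are hypothetical, chosen for illustration only.

```cpp
#include <cassert>
#include <vector>

// Linear thread index for non-tile functions: ly * Sx + lx.
unsigned thread_index_in_threadgroup(unsigned lx, unsigned ly, unsigned Sx, unsigned Sy) {
    assert(lx < Sx && ly < Sy);  // position must lie inside the threadgroup
    return ly * Sx + lx;
}

// Returns true if every (lx, ly) maps to a distinct index in [0, Sx * Sy).
bool indices_are_unique_and_dense(unsigned Sx, unsigned Sy) {
    std::vector<bool> seen(Sx * Sy, false);
    for (unsigned ly = 0; ly < Sy; ++ly) {
        for (unsigned lx = 0; lx < Sx; ++lx) {
            unsigned i = thread_index_in_threadgroup(lx, ly, Sx, Sy);
            if (i >= Sx * Sy || seen[i]) {
                return false;  // out of range or duplicate index
            }
            seen[i] = true;
        }
    }
    // Sx * Sy distinct in-range indices were produced, so all are covered.
    return true;
}
```

For an 8 x 2 threadgroup, the thread at (3, 1) gets index 1 * 8 + 3 = 11.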
For a tile function, the thread index is not a linear mapping from the lx and ly values. Each thread
in a tile function is guaranteed to get a unique index in the range [0, Sx * Sy).
Threadgroups may be multi-dimensional, but a SIMD-group is 1-dimensional. Any given thread in a SIMD-group can query its SIMD lane ID and which SIMD-group it is a member of. The number of SIMD-groups (given by [[simdgroups_per_threadgroup]]) is computed as the total number of threads in the threadgroup divided by the SIMD-group size (i.e., the thread execution width). [[dispatch_simdgroups_per_threadgroup]] is computed as the number of threads in the threadgroup size specified at dispatch divided by the SIMD-group size.
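The SIMD-group counts can be sketched the same way in C++. Size3 and simdgroups_for are hypothetical names. The description above states a plain division; the sketch rounds up so that a partially filled trailing SIMD-group is still counted. That rounding is an assumption for thread counts that are not a multiple of the SIMD-group size; for exact multiples, both give the same result.

```cpp
#include <cassert>

// Hypothetical stand-in for a 3D threadgroup size (x, y, z).
struct Size3 { unsigned x, y, z; };

// Total number of threads in a threadgroup of the given size.
unsigned total_threads(Size3 s) { return s.x * s.y * s.z; }

// Number of SIMD-groups for `threads` threads at the given SIMD-group size.
// ASSUMPTION: a partially filled trailing SIMD-group is counted (ceiling
// division). For an exact multiple this equals threads / simd_width.
unsigned simdgroups_for(unsigned threads, unsigned simd_width) {
    assert(simd_width > 0);
    return (threads + simd_width - 1) / simd_width;
}
```

For a threadgroup size of (16, 16, 1) and a SIMD-group size of 32, the sketch gives 256 / 32 = 8 SIMD-groups. Passing the actual threadgroup size models [[simdgroups_per_threadgroup]], while passing the size specified at dispatch models [[dispatch_simdgroups_per_threadgroup]]; the two can differ only when a threadgroup holds fewer threads than the size specified at dispatch. With simd_width set to 4, the same helper models the quad-group counts described below.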
SIMD-groups execute concurrently within a given threadgroup and make independent forward progress with respect to each other, even in the absence of threadgroup barrier operations. Threads within a SIMD-group do not need to perform any barrier operations for synchronization. The thread index in the SIMD-group (given by [[thread_index_in_simdgroup]]) is a value between 0 and (SIMD-group size - 1), inclusive.
Similarly, the number of quad-groups (given by [[quadgroups_per_threadgroup]]) is the total number of threads in the threadgroup divided by 4, which is the thread execution width of a quad-group. [[dispatch_quadgroups_per_threadgroup]] is computed as the number of threads in the threadgroup size specified at dispatch divided by 4. The thread index in a quad-group (given by [[thread_index_in_quadgroup]]) is a value between 0 and 3, inclusive.
Table 15 lists the built-in attributes that can be specified for arguments to a kernel function and
the corresponding data types with which they can be used.
Table 15 Attributes for Kernel Function Input Arguments

Attribute                               Corresponding Data Types
[[thread_position_in_grid]]             ushort, ushort2, ushort3, uint, uint2, or uint3
[[thread_position_in_threadgroup]]      ushort, ushort2, ushort3, uint, uint2, or uint3
[[thread_index_in_threadgroup]]         ushort or uint
[[threadgroup_position_in_grid]]        ushort, ushort2, ushort3, uint, uint2, or uint3
[[threads_per_grid]]                    ushort, ushort2, ushort3, uint, uint2, or uint3
[[threads_per_threadgroup]]             ushort, ushort2, ushort3, uint, uint2, or uint3
[[dispatch_threads_per_threadgroup]]    ushort, ushort2, ushort3, uint, uint2, or uint3
[[threadgroups_per_grid]]               ushort, ushort2, ushort3, uint, uint2, or uint3
[[thread_execution_width]]              ushort or uint
[[threads_per_simdgroup]]               ushort or uint
[[thread_index_in_simdgroup]]           ushort or uint
[[thread_index_in_quadgroup]]           ushort or uint
[[simdgroup_index_in_threadgroup]]      ushort or uint