T
is one of the scalar or vector integer or floating-point types.
Let's take a look at examples that start with the following threadgroup:
simd_shuffle_up()
shifts up each threadgroup by the
delta
number of threads. If
delta
is 2,
the resulting computed SIMD lane IDs are shifted down by 2, as seen below. Negative values for
computed SIMD lane IDs indicate invalid IDs. The computed SIMD lane IDs do not wrap around,
so the data for the lower invalid SIMD lane IDs remain unchanged.
Similarly,
simd_shuffle_down()
shifts down each threadgroup by the
delta
number of
threads. Starting from the original threadgroup, if
delta
is 2, the resulting computed SIMD lane
IDs are shifted up by 2, as seen below. Computed SIMD lane IDs greater than the SIMD- group
size indicate invalid IDs. The computed SIMD lane IDs do not wrap around, so the data for the
upper invalid SIMD lane IDs remain unchanged.
Below is an example of how these SIMD functions can be used to perform a reduction
operation.
kernel void
reduce(const device int *input [[buffer(0)]],
device int *output [[buffer(1)]],
threadgroup int *ldata [[threadgroup(0)]],
uint gid [[thread_position_in_grid]],
T simd_shuffle_xor(T value,
ushort mask)
Returns the value of
data
specified by thread
whose SIMD lane ID is computed by
performing a bitwise XOR of the caller’s SIMD
lane ID and
mask
. The value of
data
specified
by the resulting SIMD lane ID is returned. The
value of
mask
must be the same for all
threads in a SIMD- group; otherwise the
behavior is undefined.
SIMD Lane ID
0
1
2
3
4
5
data
a
b
c
d
e
f
Computed SIMD Lane ID
-2
-1
0
1
2
3
valid
0
0
1
1
1
1
data
a
b
a
b
c
d
Computed SIMD Lane ID
2
3
4
5
6
7
valid
1
1
1
1
0
0
data
c
d
e
f
e
f
2017-9-12 | Copyright © 2017 Apple Inc. All Rights Reserved.
Page
of
147
174