Metal Shading Language Specification


For all versions of Metal, the following atomic fetch/modify functions are supported. For all versions of Metal, memory_order_relaxed is supported for order. For ios-metal2.0, all memory_order values are available.
C atomic_fetch_key_explicit(threadgroup A* object,
                            M operand,
                            memory_order order)
C atomic_fetch_key_explicit(volatile threadgroup A* object,
                            M operand,
                            memory_order order)
C atomic_fetch_key_explicit(device A* object,
                            M operand,
                            memory_order order)
C atomic_fetch_key_explicit(volatile device A* object,
                            M operand,
                            memory_order order)
C atomic_fetch_key_explicit(device A* object,
                            M operand,
                            memory_order order,
                            memory_scope scope)
C atomic_fetch_key_explicit(volatile device A* object,
                            M operand,
                            memory_order order,
                            memory_scope scope)
For ios-metal2.0, the following atomic fetch/modify functions are also supported. memory_order_seq_cst is the implied memory order.
C atomic_fetch_key(threadgroup A* object, M operand)
C atomic_fetch_key(volatile threadgroup A* object, M operand)
C atomic_fetch_key(device A* object, M operand)
C atomic_fetch_key(volatile device A* object, M operand)
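Metal's atomic functions follow the C++14 atomics model, so the difference between the explicit and non-explicit forms listed above can be sketched on the host in standard C++. The block below is an illustration only, not Metal code. It assumes that key stands for an operation name such as add or or, and it uses std::atomic<int> in place of a Metal atomic type. The names FetchResult, fetch_add_relaxed, and fetch_or_seq_cst are hypothetical helpers introduced for this sketch. It shows that a fetch/modify call returns the value stored before the update.

```cpp
#include <atomic>
#include <cassert>

// Host-side C++ sketch (not Metal code). All names here are hypothetical.
// `before` is the value returned by the fetch/modify call,
// `after` is the value stored in the object once the call has completed.
struct FetchResult {
    int before;
    int after;
};

// Explicit form: the caller names the memory order (relaxed in this sketch).
inline FetchResult fetch_add_relaxed(std::atomic<int>& object, int operand) {
    int before = std::atomic_fetch_add_explicit(&object, operand,
                                                std::memory_order_relaxed);
    return {before, object.load(std::memory_order_relaxed)};
}

// Non-explicit form: no order argument, so memory_order_seq_cst is implied.
inline FetchResult fetch_or_seq_cst(std::atomic<int>& object, int operand) {
    int before = std::atomic_fetch_or(&object, operand);
    return {before, object.load()};
}
```

For example, starting from a stored value of 10, fetch_add_relaxed(a, 5) returns before == 10 and leaves 15 in the object.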
5.13  SIMD-group Functions
The SIMD-group functions in Table 34 are supported by kernel and fragment functions. These functions allow threads in a SIMD-group to share data without using threadgroup memory and without requiring synchronization operations such as a barrier. A thread may only read data from another thread in the SIMD-group that is actively participating. If the target thread is inactive, the retrieved value is undefined.

2017-9-12   |  Copyright © 2017 Apple Inc. All Rights Reserved.

SIMD-groups and SIMD-group functions are supported only for macos-metal2.0. SIMD-group functions are defined in the header .
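The lane-exchange semantics of the functions listed in Table 34 can be modelled on the host with a plain array, where element i holds the data argument supplied by the thread whose SIMD lane ID is i. The block below is a sketch in standard C++, not Metal code. It assumes every lane is active, and the names Lanes, model_shuffle, model_broadcast, model_shuffle_up, and model_shuffle_down are hypothetical helpers introduced for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model (not Metal code). lanes[i] is the `data` argument supplied
// by the thread whose SIMD lane ID is i; every lane is assumed to be active.
using Lanes = std::vector<char>;

// Models simd_shuffle: each lane names its own source lane.
// .at() throws on an invalid lane ID, where Metal leaves the behavior undefined.
inline Lanes model_shuffle(const Lanes& lanes,
                           const std::vector<unsigned>& source_lane) {
    Lanes out(lanes.size());
    for (std::size_t lane = 0; lane < lanes.size(); ++lane)
        out[lane] = lanes.at(source_lane.at(lane));
    return out;
}

// Models simd_broadcast: every lane reads the same source lane.
inline Lanes model_broadcast(const Lanes& lanes, unsigned broadcast_lane_id) {
    return Lanes(lanes.size(), lanes.at(broadcast_lane_id));
}

// Models simd_shuffle_up: lane i reads lane i - delta. There is no
// wrap-around, so the lower delta lanes keep their own value.
inline Lanes model_shuffle_up(const Lanes& lanes, unsigned delta) {
    Lanes out(lanes);
    for (std::size_t lane = delta; lane < lanes.size(); ++lane)
        out[lane] = lanes[lane - delta];
    return out;
}

// Models simd_shuffle_down: lane i reads lane i + delta. There is no
// wrap-around, so the upper delta lanes keep their own value.
inline Lanes model_shuffle_down(const Lanes& lanes, unsigned delta) {
    Lanes out(lanes);
    for (std::size_t lane = 0; lane + delta < lanes.size(); ++lane)
        out[lane] = lanes[lane + delta];
    return out;
}
```

With six lanes holding a through f and a delta of 2, model_shuffle_up and model_shuffle_down reproduce the data rows of the worked examples given for simd_shuffle_up() and simd_shuffle_down().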
Table 34 SIMD-group Functions in the Metal Standard Library

Built-in SIMD-group functions
    Description

T simd_shuffle(T data, ushort simd_lane_id)
    Returns the value of data specified by the thread whose SIMD lane ID is simd_lane_id. The value of simd_lane_id does not have to be the same for all threads in the SIMD-group. simd_lane_id must be a valid SIMD lane ID; otherwise the behavior is undefined.

T simd_broadcast(T data, ushort broadcast_lane_id)
    Broadcasts the value of data specified by the thread whose SIMD lane ID is broadcast_lane_id. broadcast_lane_id must be a valid SIMD lane ID and must be the same for all threads in a SIMD-group; otherwise the behavior is undefined.

T simd_shuffle_up(T data, ushort delta)
    Returns the value of data specified by the thread whose SIMD lane ID is computed by subtracting delta from the caller's SIMD lane ID. The computed SIMD lane ID does not wrap around the SIMD-group size, so the lower delta lanes remain unchanged. The value of delta must be the same for all threads in a SIMD-group; otherwise the behavior is undefined.

T simd_shuffle_down(T data, ushort delta)
    Returns the value of data specified by the thread whose SIMD lane ID is computed by adding delta to the caller's SIMD lane ID. The computed SIMD lane ID does not wrap around the SIMD-group size, so the upper delta lanes remain unchanged. The value of delta must be the same for all threads in a SIMD-group; otherwise the behavior is undefined.


T simd_shuffle_xor(T value, ushort mask)
    Returns the data passed as value by the thread whose SIMD lane ID is computed by performing a bitwise XOR of the caller's SIMD lane ID and mask. The value of mask must be the same for all threads in a SIMD-group; otherwise the behavior is undefined.

T is one of the scalar or vector integer or floating-point types.

Let's take a look at examples that start with the following threadgroup:

    SIMD Lane ID             0   1   2   3   4   5
    data                     a   b   c   d   e   f

simd_shuffle_up() shifts each thread's data up by delta lanes. If delta is 2, the resulting computed SIMD lane IDs are shifted down by 2, as seen below. Negative values for computed SIMD lane IDs indicate invalid IDs. The computed SIMD lane IDs do not wrap around, so the data for the lower invalid SIMD lane IDs remains unchanged.

    Computed SIMD Lane ID   -2  -1   0   1   2   3
    valid                    0   0   1   1   1   1
    data                     a   b   a   b   c   d

Similarly, simd_shuffle_down() shifts each thread's data down by delta lanes. Starting from the original threadgroup, if delta is 2, the resulting computed SIMD lane IDs are shifted up by 2, as seen below. Computed SIMD lane IDs equal to or greater than the SIMD-group size indicate invalid IDs. The computed SIMD lane IDs do not wrap around, so the data for the upper invalid SIMD lane IDs remains unchanged.

    Computed SIMD Lane ID    2   3   4   5   6   7
    valid                    1   1   1   1   0   0
    data                     c   d   e   f   e   f

Below is an example of how these SIMD functions can be used to perform a reduction operation.

kernel void
reduce(const device int *input [[buffer(0)]],
       device int *output [[buffer(1)]],
       threadgroup int *ldata [[threadgroup(0)]],
       uint gid [[thread_position_in_grid]],
