For single precision floating-point, Metal also supports a precise and fast variant of the following geometric functions: distance, length, and normalize. The -ffast-math compiler option (refer to section 6.2) is used to select the appropriate variant when compiling the Metal source. In addition, the metal::precise and metal::fast nested namespaces are also available and provide developers a way to explicitly select the fast or precise variant of these geometric functions.
5.8 Compute Functions
The functions in section 5.8 and its subsections can only be called from a kernel function and are defined in the header metal_compute.
5.8.1 Threadgroup and SIMD-group Synchronization Functions
Table 24 lists supported threadgroup and SIMD-group synchronization functions.
Table 24 Synchronization Compute Function in the Metal Standard Library

| Built-in threadgroup function | Description |
|---|---|
| void threadgroup_barrier(mem_flags flags) ; void threadgroup_barrier(mem_flags flags, memory_scope scope) | All threads in a threadgroup executing the kernel must execute this function before any thread is allowed to continue execution beyond the threadgroup_barrier. |
| void simdgroup_barrier(mem_flags flags) ; void simdgroup_barrier(mem_flags flags, memory_scope scope) | All threads in a SIMD-group executing the kernel must execute this function before any thread is allowed to continue execution beyond the simdgroup_barrier. |

Refer to section 5.12.2 for valid values of scope.

The threadgroup_barrier function acts as an execution and memory barrier. The threadgroup_barrier function must be encountered by all threads in a threadgroup executing the kernel. The threadgroup_barrier function also supports a variant that specifies the memory scope. For the threadgroup_barrier variant that does not take a memory scope, the default scope is memory_scope_threadgroup.

If threadgroup_barrier is inside a conditional statement and if any thread enters the conditional statement and executes the barrier, then all threads in the threadgroup must enter the conditional and execute the barrier.

If threadgroup_barrier is inside a loop, for each iteration of the loop, all threads in the threadgroup must execute the threadgroup_barrier before any threads are allowed to continue execution beyond the threadgroup_barrier.
2017-9-12 | Copyright © 2017 Apple Inc. All Rights Reserved.
The threadgroup_barrier function can also queue a memory fence (reads and writes) to ensure correct ordering of memory operations to threadgroup or device memory.
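The barrier rules for threadgroup_barrier can be illustrated with a small host-side C++ sketch. This is not Metal code and does not run on a GPU: it emulates a threadgroup sequentially by letting every "thread" finish one phase before any thread begins the next, which is the ordering the barrier is described as enforcing. The function name emulate_threadgroup_sum, the use of std::vector as a stand-in for threadgroup memory, and the power-of-two size requirement are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-side emulation only (hypothetical helper, not part of Metal).
// Each inner for-loop over tid represents the code that all threads in the
// threadgroup run between two barriers.
float emulate_threadgroup_sum(const std::vector<float>& input) {
    const std::size_t n = input.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // sketch assumes a power-of-two size

    std::vector<float> shared(n);  // stands in for threadgroup memory

    // Phase 0: each thread stores its own element.
    for (std::size_t tid = 0; tid < n; ++tid) {
        shared[tid] = input[tid];
    }
    // <-- barrier point: all stores above complete before any load below

    // The loop bound is the same for every thread, so every thread reaches
    // the barrier once per iteration.
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        for (std::size_t tid = 0; tid < n; ++tid) {
            // Only the work is conditional; the barrier stays outside the if,
            // so idle threads still reach it.
            if (tid < stride) {
                shared[tid] += shared[tid + stride];
            }
        }
        // <-- barrier point: executed by all threads, including idle ones
    }
    return shared[0];
}
```

In this sketch the conditional guards only the addition, never the barrier point itself, which mirrors the requirement that a barrier inside a conditional must be reached by all threads or by none.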
The simdgroup_barrier function acts as an execution and memory barrier. The simdgroup_barrier function must be encountered by all threads in a SIMD-group executing the kernel. The simdgroup_barrier function also supports a variant that specifies the memory scope. For the simdgroup_barrier variant that does not take a memory scope, the default scope is memory_scope_simdgroup.

If simdgroup_barrier is inside a conditional statement and if any thread enters the conditional statement and executes the barrier, then all threads in the SIMD-group must enter the conditional and execute the barrier.

If simdgroup_barrier is inside a loop, for each iteration of the loop, all threads in the SIMD-group must execute the simdgroup_barrier before any threads are allowed to continue execution beyond the simdgroup_barrier.
The simdgroup_barrier function can also queue a memory fence (reads and writes) to ensure correct ordering of memory operations to threadgroup or device memory.
The mem_flags argument to threadgroup_barrier and simdgroup_barrier is a bitfield and can be one or more of the following values, as described in Table 25.
Table 25 mem_flags Enum Values for Barrier Functions

| mem_flags | Description |
|---|---|
| mem_none | In this case, no memory fence is applied, and threadgroup_barrier acts only as an execution barrier. |
| mem_device | Ensure correct ordering of memory operations to device memory. |
| mem_threadgroup | Ensure correct ordering of memory operations to threadgroup memory for threads in a threadgroup. |
| mem_texture | Ensure correct ordering of memory operations to texture memory for threads in a threadgroup. |

The enumeration types used by mem_flags are specified as follows:
enum class mem_flags {mem_none, mem_device, mem_threadgroup, mem_texture};

The scope argument specifies whether the memory accesses of threads in the threadgroup to memory address space(s) identified by flags become visible to all threads in the threadgroup or the device.
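Because mem_flags is described as a bitfield that can hold one or more values, a host-side C++ sketch may help show what combining flags means. This is not the Metal header. The enum declaration in the text shows no numeric values, so the values below are assumptions chosen so that each flag occupies its own bit; the operator| overload and the helper name fences are likewise hypothetical.

```cpp
#include <cassert>

// Host-side stand-in for the Metal enum; the bit assignments are assumed.
enum class mem_flags : unsigned {
    mem_none        = 0,
    mem_device      = 1u << 0,
    mem_threadgroup = 1u << 1,
    mem_texture     = 1u << 2,
};

// Combine two flag values into one bitfield.
constexpr mem_flags operator|(mem_flags a, mem_flags b) {
    return static_cast<mem_flags>(static_cast<unsigned>(a) |
                                  static_cast<unsigned>(b));
}

// Hypothetical helper: true if `flags` requests a fence for `space`.
constexpr bool fences(mem_flags flags, mem_flags space) {
    return (static_cast<unsigned>(flags) & static_cast<unsigned>(space)) != 0;
}
```

With these assumed values, mem_flags::mem_device | mem_flags::mem_threadgroup requests ordering for both device and threadgroup memory, while mem_flags::mem_none requests no fence at all.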
5.9 Graphics Functions
This section and its subsections list the set of graphics functions that can be called by fragment and vertex functions. These are defined in the header metal_graphics.
5.9.1 Fragment Functions
The functions in this section (listed in Table 26, Table 27, and Table 28) can only be called inside a fragment function (a function declared with the fragment function specifier) or inside a function called from a fragment function. Otherwise the behavior is undefined and may result in a compile-time error.
Fragment function helper threads may be created to help evaluate derivatives (explicit or implicit) for use with one or more fragment threads. Fragment function helper threads execute the same code as the non-helper fragment threads, but do not have side effects that modify the render target(s) or any other memory that can be accessed by the fragment function. In particular:
• Fragments corresponding to helper threads are discarded when the fragment function
execution is complete without any updates to the render target(s).
• Stores and atomic operations to buffers and textures performed by helper threads have
no effect on the underlying memory associated with the buffer or texture.
5.9.1.1 Fragment Functions – Derivatives
Metal includes the functions in Table 26 to compute derivatives. T is one of float, float2, float3, float4, half, half2, half3 or half4.
NOTE: Derivatives are undefined within non-uniform control flow.
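Table 26 defines fwidth(p) as fabs(dfdx(p)) + fabs(dfdy(p)), computed by local differencing. The following host-side C++ sketch works that formula through on a 2x2 block of neighboring fragments. It is not Metal code, and how a GPU actually evaluates derivatives is not stated in this section: the Quad layout, the forward-difference scheme, and the names quad_dfdx, quad_dfdy, and quad_fwidth are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

// Values of p at four neighboring fragments (hypothetical layout):
//   p[0][0] = (x, y)       p[0][1] = (x + 1, y)
//   p[1][0] = (x, y + 1)   p[1][1] = (x + 1, y + 1)
struct Quad {
    float p[2][2];
};

// Forward differences across the quad, standing in for dfdx and dfdy.
float quad_dfdx(const Quad& q) { return q.p[0][1] - q.p[0][0]; }
float quad_dfdy(const Quad& q) { return q.p[1][0] - q.p[0][0]; }

// Sum of the absolute derivatives in x and y.
float quad_fwidth(const Quad& q) {
    return std::fabs(quad_dfdx(q)) + std::fabs(quad_dfdy(q));
}
```

For a quad holding 1.0 and 3.0 in the top row and 0.5 and 2.5 in the bottom row, the x difference is 2.0, the y difference is -0.5, and the sketch returns 2.5.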
Table 26 Derivatives Fragment Functions in the Metal Standard Library

| Built-in fragment functions | Description |
|---|---|
| T dfdx(T p) | Returns a high precision partial derivative of the specified value with respect to the screen space x coordinate. |
| T dfdy(T p) | Returns a high precision partial derivative of the specified value with respect to the screen space y coordinate. |
| T fwidth(T p) | Returns the sum of the absolute derivatives in x and y using local differencing for p; i.e., fabs(dfdx(p)) + fabs(dfdy(p)) |

5.9.1.2 Fragment Functions – Samples