Metal Shading Language Specification

The post-tessellation vertex function generates the final vertex data for the tessellated
triangles. For example, to add additional detail (such as displacement mapping values) to the
rendered geometry, the post-tessellation vertex function can sample a texture to modify the
vertex position by a displacement value.
After the post-tessellation vertex function has executed, the tessellated primitives are
rasterized.
The post-tessellation vertex function is a vertex function identified using the ordinary
vertex function specifier.
4.1.1.1 Patch Type and Number of Control Points Per-Patch
The [[patch]] specifier is required for the post-tessellation vertex function.
For macOS, the [[patch(patch-type, N)]] specifier must specify both the patch type
(patch-type is either quad or triangle) and the number of control points in the patch (N must
be a value from 0 to 32). For iOS, specifying the patch-type is required, but the number of
control points is optional.
If the number of control points is specified in the post-tessellation vertex function, this
number must match the number of control points provided to the drawPatches or
drawIndexedPatches API.
Example:
[[patch(quad)]]
vertex vertex_output
my_post_tessellation_vertex(…)
{…}
[[patch(quad, 16)]]
vertex vertex_output
my_bezier_vertex(…)
{…}
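The control-point rules above can be restated as a small host-side check. What follows is a minimal C++ sketch, not part of the Metal API: the names PatchDecl, is_valid_patch_decl, matches_draw_call, and kUnspecified are hypothetical, and the sketch assumes that the 0 to 32 range also applies on iOS whenever N is given.

```cpp
// Hypothetical host-side model of the [[patch(patch-type, N)]] rules.
// None of these names come from Metal; they only illustrate the text.
enum class PatchType { quad, triangle };
enum class Platform { macOS, iOS };

constexpr int kUnspecified = -1;  // assumption: marks an omitted N

struct PatchDecl {
    PatchType type;
    int control_points;  // N, or kUnspecified when omitted
};

// macOS requires N; iOS allows it to be omitted.
// When N is present it must lie in the range 0..32.
constexpr bool is_valid_patch_decl(PatchDecl d, Platform p) {
    return d.control_points == kUnspecified
               ? p == Platform::iOS
               : d.control_points >= 0 && d.control_points <= 32;
}

// If N is declared, it must equal the control-point count given to the
// draw call; an omitted N places no constraint on the draw call.
constexpr bool matches_draw_call(PatchDecl d, int draw_control_points) {
    return d.control_points == kUnspecified ||
           d.control_points == draw_control_points;
}

// Compile-time checks mirroring the two declarations in the example.
static_assert(is_valid_patch_decl({PatchType::quad, 16}, Platform::macOS),
              "[[patch(quad, 16)]] is accepted on macOS");
static_assert(!is_valid_patch_decl({PatchType::quad, kUnspecified}, Platform::macOS),
              "[[patch(quad)]] lacks the N that macOS requires");
```

Under these assumptions, a declaration such as [[patch(quad)]] is accepted only for iOS, while [[patch(quad, 16)]] is accepted on both platforms and constrains the draw call to 16 control points.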
4.1.2 Tile Functions
A tile shading function is a special type of compute kernel or fragment function that can
execute inline with graphics operations and take advantage of the tile-based deferred rendering
(TBDR) architecture. With TBDR, commands are buffered until a large list of commands is
accumulated. The hardware divides the framebuffer into tiles and then renders only the
primitives that are visible within each tile. Tile shading functions support performing compute
operations in the middle of rendering, which can access memory more efficiently by reducing
round trips to memory and utilizing high-bandwidth local memory.

2017-9-12   |  Copyright © 2017 Apple Inc. All Rights Reserved.

A tile function launches a set of threads called a dispatch, which is organized into threadgroups
and grids. Threads may be launched at any point in a render pass and as often as needed. Tile
functions barrier against previous and subsequent draws, so a tile function does not execute
until all earlier draws have completed. Likewise, later draws do not execute until the tile function
completes.
A tile function is only supported for iOS-metal2.0. iOS GPUs always process each tile and
each dispatch to completion. All draws and dispatches for a tile launch in submission order
before the next tile is processed.
Tile functions have access to 32 KB of threadgroup memory that may be divided between
imageblock storage and threadgroup storage. (For details on the threadgroup_imageblock
address space, see section 4.2.3.) The imageblock size depends on the tile width, the tile
height, and the bit depth of each sample. The bit depth is determined either by the render pass
attachments (see implicit imageblock layout in section 2.10.1.1) or by function-declared
structures (see explicit imageblock layout in section 2.10.1.2). For a detailed description of how
the threadgroup_imageblock address space is used in kernel functions, refer to section 4.2.3.
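To make the 32 KB budget concrete, here is a rough C++ sketch of the arithmetic. It assumes that the imageblock occupies tile width × tile height × bytes per sample, which the text implies but does not state as a formula; the function names and the tile sizes used in the checks are illustrative only.

```cpp
#include <cstdint>

// Threadgroup memory available to a tile function, per the text (32 KB).
constexpr std::uint32_t kTileMemoryBytes = 32 * 1024;

// Assumption: imageblock bytes = tile width * tile height * bytes per sample,
// with the bit depth rounded up to whole bytes. The text only says that the
// size depends on these three quantities.
constexpr std::uint32_t imageblock_bytes(std::uint32_t tile_width,
                                         std::uint32_t tile_height,
                                         std::uint32_t bits_per_sample) {
    return tile_width * tile_height * ((bits_per_sample + 7) / 8);
}

// Bytes left for threadgroup storage once the imageblock is placed,
// or 0 if the imageblock alone uses up (or exceeds) the budget.
constexpr std::uint32_t threadgroup_bytes_left(std::uint32_t imageblock) {
    return imageblock > kTileMemoryBytes ? 0 : kTileMemoryBytes - imageblock;
}

// A 32x32 tile at 128 bits per sample takes half of the budget.
static_assert(imageblock_bytes(32, 32, 128) == 16384, "16 KB imageblock");
static_assert(threadgroup_bytes_left(imageblock_bytes(32, 32, 128)) == 16384,
              "16 KB remain for threadgroup storage");
```

Under this assumed formula, halving the tile area or the per-sample bit depth halves the imageblock and returns the difference to threadgroup storage.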
4.1.3 Fragment Function Specifier
The [[early_fragment_tests]] function specifier can be used with a fragment function to
request that fragment tests be performed before fragment function execution.
Below is an example of a fragment function that uses this specifier:
[[early_fragment_tests]]
fragment float4
my_fragment( … )
{…}

NOTE:
• It is an error if the return type of the fragment function declared with the
[[early_fragment_tests]] specifier includes a depth value, i.e., the return type of this
fragment function includes an element declared with the [[depth(depth_attribute)]] attribute.
• It is an error to use the [[early_fragment_tests]] specifier with any function that is not a
fragment function, i.e., not declared with the fragment specifier.
4.2 Address Space Attributes for Variables and Arguments
The Metal shading language implements address space attributes to specify the region of
memory where a function variable or argument is allocated. These attributes describe disjoint
address spaces for variables:
• device (for more details, see section 4.2.1)
• threadgroup (see section 4.2.2)
• threadgroup_imageblock (see section 4.2.3)
• constant (see section 4.2.4)
• thread (see section 4.2.5)
All arguments to a graphics or kernel function that are a pointer or reference to a type must be
declared with an address space attribute. For graphics functions, an argument that is a pointer
or reference to a type must be declared in the device or constant address space. For kernel
functions, an argument that is a pointer or reference to a type must be declared in the device,
threadgroup, threadgroup_imageblock, or constant address space. The following example
introduces the use of several address space attributes. (The threadgroup attribute is
supported here for the pointer l_data only if foo is called by a kernel function, as detailed in
section 4.2.2.)
void foo(device int *g_data,
         threadgroup int *l_data,
         constant float *c_data)
{…}
The address space for a variable at program scope must be constant.
Any variable that is a pointer or reference must be declared with one of the address space
attributes discussed in this section. If an address space attribute is missing on a pointer or
reference type declaration, a compilation error occurs.
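The argument rules in this section amount to a small lookup table. The C++ sketch below encodes that table as a compile-time predicate. It is only a model of the text: FunctionKind, AddressSpace, and pointer_argument_allowed are hypothetical names, not Metal identifiers, and the sketch covers only pointer or reference arguments to graphics and kernel functions.

```cpp
// Hypothetical model of which address spaces a pointer or reference
// argument may use, per function kind. Names are illustrative only.
enum class FunctionKind { graphics, kernel };
enum class AddressSpace {
    device,
    threadgroup,
    threadgroup_imageblock,
    constant,
    thread
};

// Graphics functions: device or constant.
// Kernel functions: device, threadgroup, threadgroup_imageblock, or constant.
// The thread address space is not listed for either kind of argument.
constexpr bool pointer_argument_allowed(FunctionKind f, AddressSpace a) {
    return a == AddressSpace::device || a == AddressSpace::constant ||
           (f == FunctionKind::kernel &&
            (a == AddressSpace::threadgroup ||
             a == AddressSpace::threadgroup_imageblock));
}

// A threadgroup pointer argument is accepted only for kernel functions.
static_assert(pointer_argument_allowed(FunctionKind::kernel,
                                       AddressSpace::threadgroup),
              "kernel functions may take threadgroup pointers");
static_assert(!pointer_argument_allowed(FunctionKind::graphics,
                                        AddressSpace::threadgroup),
              "graphics functions may not take threadgroup pointers");
```

The predicate says nothing about ordinary helper functions such as foo in the example, whose threadgroup argument is acceptable only when the caller is a kernel function.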
4.2.1 device Address Space
The device address space name refers to buffer memory objects allocated from the device
memory pool that are both readable and writeable.
A buffer memory object can be declared as a pointer or reference to a scalar, vector, or
user-defined struct. The actual size of the buffer memory object is determined when the memory
object is allocated via appropriate Metal API calls in the host code.
Some examples are:
// an array of a float vector with 4 components
device float4 *color;
struct Foo {
    float a[3];
    int b[2];
};
