Metal 3

leman

Power User
Posts
76
Reaction score
176
What is a resource attachment point?

It's a logical "slot" to which you bind resources such as textures and data buffers to make them accessible in shaders. You declare these slots in the shader and then use some API to link them to actual resources. For example, if you want to use a texture in DirectX 12, you'd do something like this in your shader code:

Code:
Texture2D    my_texture : register(t0);

As you can see, DX uses a register model at the shader level, defining a set of abstract registers you can use to refer to attachment points. For this shader, you have one attachment point — the register t0. In the application code you can then create a resource descriptor (with the texture ID and some other information) and bind it to the register t0. When you run the shader, it can access the texture you have linked.

Metal uses a hybrid system. It also has a similar register model, but here I want to talk about its more flexible argument buffer model. With argument buffers you describe shader resources just like regular C structs, e.g. like this:


Code:
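#include <metal_stdlib>
using namespace metal;

struct Bindings {
    // a pointer to some GPU buffer of floats
    device float* data [[id(0)]];
    // a texture (texture members take no address-space qualifier)
    texture2d<float> my_texture [[id(1)]];
    // some sort of constant
    int data_len [[id(2)]];
};

// shader code gets passed a Bindings reference (argument buffer bound at buffer index 0)
kernel void shader(constant Bindings& bindings [[buffer(0)]],
                   uint tid [[thread_position_in_grid]]) {
    // ... use bindings.data, bindings.my_texture and bindings.data_len here, e.g.:
    if (tid < uint(bindings.data_len)) {
        bindings.data[tid] *= 2.0f;
    }
}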

On the API side, you create a data buffer that will hold a value of type Bindings and use a typed encoder API (MTLArgumentEncoder) to populate the struct members. In this case, the "attachment point" is any location within the struct where you can bind some GPU object. For the type Bindings there are two attachment points: the first member (a pointer to data, which must be bound to a Metal GPU data buffer) and the second member (which must be bound to a Metal GPU texture object). You can also use arrays, in which case every element of an array is an attachment point.
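To make the "attachment points are locations inside a struct" idea concrete, here is a toy C++ model of an argument buffer as an ordinary byte buffer. Everything in it is hypothetical: `BindingsLayout` and `ToyArgumentEncoder` are made-up names, and it assumes buffer pointers and texture handles are both 64-bit values. Real argument buffers written through MTLArgumentEncoder use a layout chosen by the driver, which is one reason the encoder API exists.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// CPU-side mirror of the shader's Bindings struct (toy layout, an assumption).
struct BindingsLayout {
    std::uint64_t data;        // attachment point: GPU buffer address
    std::uint64_t my_texture;  // attachment point: texture handle
    std::int32_t data_len;     // plain constant, not an attachment point
};

// Hypothetical minimal "encoder": writes individual members into an argument
// buffer (just a byte vector here) at each member's offset.
class ToyArgumentEncoder {
public:
    explicit ToyArgumentEncoder(std::vector<std::byte>& buffer) : buf_(buffer) {
        buf_.resize(sizeof(BindingsLayout));  // room for one Bindings value
    }

    void set_buffer(std::uint64_t gpu_address) {
        write(offsetof(BindingsLayout, data), gpu_address);
    }
    void set_texture(std::uint64_t handle) {
        write(offsetof(BindingsLayout, my_texture), handle);
    }
    void set_constant(std::int32_t len) {
        write(offsetof(BindingsLayout, data_len), len);
    }

private:
    // Copy the raw bytes of `value` into the buffer at `offset`.
    template <typename T>
    void write(std::size_t offset, T value) {
        std::memcpy(buf_.data() + offset, &value, sizeof(T));
    }

    std::vector<std::byte>& buf_;
};
```

Only `data` and `my_texture` are attachment points in this model; `data_len` is plain data that happens to live in the same buffer.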

There are a number of important differences between the DX12 and Metal binding models. The DX12 model is flat: you just get a bunch of numbered registers. The Metal binding model uses data buffers encoding typed structs, which in turn can contain pointers to other typed structs, and so on. In DX12 you bind resources through a special, restricted memory pool of resource descriptors that are linked to register ranges, while in Metal you simply use regular data buffers to lay out your structs.

Now to the core of the issue mentioned in @Colstan's post. Modern DX12 guarantees that at least one million descriptors/registers are available to any application. Furthermore, it allows you to bind resources sparsely: if your shader uses one thousand texture registers, you don't actually have to bind a valid resource to all of them, as long as you don't read the unset registers. This made one particular approach popular: developers define an unbounded array of textures, allocate the biggest array of resource descriptors they can (which means one million items), and then use a dynamic system of binding and rebinding the textures. Something like this:


Code:
// array with up to a million textures (textures[0] is in t0, textures[1] is in t1, textures[i] is in ti, etc.)
Texture2D    textures[] : register(t0);

...
// somewhere in the shader code
// my_texture_index is application-provided data telling the shader which of the potentially millions of textures to use
// (wrap it in NonUniformResourceIndex() if the index can differ between threads of a wave)
Texture2D my_texture = textures[my_texture_index];
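A toy C++ model of this bindless pattern may help: one large, sparsely populated descriptor table, where slot i plays the role of register ti and the shader picks a slot using an application-provided index. `BindlessTable` and its methods are hypothetical names. In DX12 the rule is simply that the shader must not read an unset descriptor; this model throws instead, so the rule is visible.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Sparse table of texture descriptors (here just texture IDs).
// DX12 lets the capacity be about one million entries.
class BindlessTable {
public:
    explicit BindlessTable(std::size_t capacity) : slots_(capacity) {}

    // Application side: (re)bind or unbind any slot, in any order.
    void bind(std::size_t index, std::uint32_t texture_id) { slots_.at(index) = texture_id; }
    void unbind(std::size_t index) { slots_.at(index).reset(); }

    // Shader side: the equivalent of `textures[my_texture_index]`.
    std::uint32_t lookup(std::size_t my_texture_index) const {
        const auto& slot = slots_.at(my_texture_index);
        if (!slot) throw std::runtime_error("read of an unbound slot");
        return *slot;
    }

    // Number of slots that currently hold a valid descriptor.
    std::size_t bound_count() const {
        std::size_t n = 0;
        for (const auto& s : slots_) n += s.has_value() ? 1 : 0;
        return n;
    }

private:
    std::vector<std::optional<std::uint32_t>> slots_;
};
```

Only two of the million slots need to be valid for the shader to work, as long as it never indexes the others.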

The current belief is that it is impossible to emulate this approach in Metal (e.g. when you want to port a shader or write a DX12 implementation on top of Metal), since Apple only gives you 500,000 resources where DX12 guarantees at least one million. There are also other problems, e.g. Metal uses typed encoders while DX12 uses type-erased descriptors. At any rate, my experiments suggest that the 500,000 limit in Metal might be misunderstood. This leaves hope that modern DX12 and Vulkan API patterns (which work very similarly to each other) can in fact be efficiently implemented on top of modern Metal.
 

leman

According to a game dev posting on Andrew Tsai's video about Metal 3:

Apparently, this is the biggest issue that CodeWeavers has to deal with in supporting DX12 games with CrossOver.

And now it's official: the 500,000 limit is not what folks thought it was. Metal supports multiple millions of bindings. And with Metal 3 and direct GPU addresses, the binding system is a strict functional superset of Vulkan and DirectX.

 

Andropov

Power User
Vaccinated
Posts
223
Reaction score
160
Location
Spain
I think I might be able to use one of the new Metal 3 features (Geometry Shading) to significantly speed up my app’s renderer :)

After a couple of weeks of optimizations (the worst case went from 14 to 25 fps), my renderer is now read-bandwidth limited when working with the biggest/most demanding mesh it can realistically load. It sustains reads of around 30 GiB/s, which I’ve been told is the maximum bandwidth of the device. I’m already using half-precision values wherever possible, so after thinking about it for a few days I thought maybe there’s not much else to do.

However, after rewatching Metal 3’s Geometry Shaders talk I decided to try that. My scene is entirely made of quads (which are used as billboards to display spheres), where each quad is expanded from a point source. Doing the point -> quad expansion on the Geometry Shader should save memory bandwidth, although it will also introduce additional overhead. It might actually be slower, but I have to try.
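The point -> quad expansion itself is simple. Below is a CPU-side C++ sketch of what the geometry stage would do per point, plus a rough count of the bytes fetched per sphere. Assumptions: camera-space input, float (not half) attributes, four vertices per quad with no index buffer, and made-up struct names (`SpherePoint`, `QuadVertex`). The byte counts ignore caches and index data, so they are only a rough intuition, not a measurement.

```cpp
#include <array>
#include <cstddef>

struct Float3 { float x, y, z; };
struct Float2 { float x, y; };

// One input point per sphere (what would be fetched from memory).
struct SpherePoint {
    Float3 center;  // camera-space center (assumption)
    float radius;
};

// One output vertex of the expanded quad.
struct QuadVertex {
    Float3 position;  // camera-space corner position
    Float2 xy;        // billboard mapping in [-1, 1], interpolated per fragment
};

// Expand a point into the 4 corners of a camera-facing quad, in triangle-strip
// order. In camera space the billboard plane is just x/y around the center.
std::array<QuadVertex, 4> expand_point(const SpherePoint& p) {
    constexpr std::array<Float2, 4> corners{{{-1.f, -1.f}, {1.f, -1.f}, {-1.f, 1.f}, {1.f, 1.f}}};
    std::array<QuadVertex, 4> out{};
    for (std::size_t i = 0; i < corners.size(); ++i) {
        const Float2 c = corners[i];
        out[i].position = {p.center.x + c.x * p.radius,
                           p.center.y + c.y * p.radius,
                           p.center.z};
        out[i].xy = c;
    }
    return out;
}

// Bytes read per sphere: one point vs. four pre-expanded vertices.
constexpr std::size_t bytes_per_sphere_points() { return sizeof(SpherePoint); }
constexpr std::size_t bytes_per_sphere_quads() { return 4 * sizeof(QuadVertex); }
```

With these layouts that is 16 versus 80 bytes per sphere on typical compilers, which is the saving the expansion is meant to buy, paid for with extra work on the GPU.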
 

leman

Why do you need quads to begin with? If you only need billboards, why not use points?
 

Andropov

I need to know where on the billboard each fragment is, so I pass an XY mapping down to the triangles. In the fragment shader I can then reconstruct the coordinates (in camera space) of the point on the billboard I'm at. I'm unsure if I can do that with points, since I'm currently getting that info from the fragment interpolation of the vertices' XY mapping attribute.

Other people who have done similar things used quads, so that's what I went with. Maybe it could be done with points, IDK. I'll investigate it further.
 

leman

I’m fairly confident you can do this with points and barycentric coordinates. But I’ve never done it myself, so I might be mistaken about how barycentrics work for points.

Actually scratch that. Just looked at the docs, there is a point_coord fragment shader input attribute that does exactly what you need.

point_coord (float2): Two-dimensional coordinates, which range from 0.0 to 1.0 across a point primitive, specifying the location of the current fragment within the point primitive.
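For the point-sprite route, the fragment shader only needs to remap point_coord from [0, 1] to the [-1, 1] mapping the quads used to interpolate, and then solve for the sphere surface. Below is a CPU-side C++ emulation of that math; the function names are hypothetical. It assumes point_coord.y grows downward and a camera looking down -z, and it treats the sprite as an orthographic view of the sphere, which is only an approximation under perspective. In MSL the vertex function would also have to output the sprite size via `[[point_size]]`.

```cpp
#include <cmath>
#include <optional>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Map point_coord in [0, 1]^2 to the [-1, 1] billboard mapping and return the
// camera-space unit normal of the sphere at that fragment, or nullopt if the
// fragment lies outside the sphere's silhouette (a shader would discard it).
std::optional<Vec3> sphere_normal_from_point_coord(Vec2 point_coord) {
    const float x = point_coord.x * 2.0f - 1.0f;
    const float y = 1.0f - point_coord.y * 2.0f;  // assumption: y grows downward
    const float r2 = x * x + y * y;
    if (r2 > 1.0f) return std::nullopt;
    return Vec3{x, y, std::sqrt(1.0f - r2)};
}

// Camera-space position of the visible surface point, given the sphere's
// camera-space center and radius (+z points toward the camera).
std::optional<Vec3> sphere_surface_point(Vec2 point_coord, Vec3 center, float radius) {
    const auto n = sphere_normal_from_point_coord(point_coord);
    if (!n) return std::nullopt;
    return Vec3{center.x + radius * n->x,
                center.y + radius * n->y,
                center.z + radius * n->z};
}
```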
 

Andropov

Power User
Vaccinated
Posts
223
Reaction score
160
Location
Spain
I’m fairly confident you can do this with points and barycentric coordinates. But I never did it myself, so I might be mistaken how barycentrics work for points.

Actually scratch that. Just looked at the docs, there is a point_coord fragment shader input attribute that does exactly what you need.
Whoa, thanks! I'll give point_coord a try; I don't know if there's going to be any other roadblock to using points instead of quads. It's even better than barycentric coordinates (I thought of that too), because those are A13-or-newer only, I think. So hopefully I can switch every supported device to points and see if it's faster.
 

leman

Yeah, it’s one of those features that has been around since the dawn of time but is so situational that one completely forgets about it. When I think about it, I believe I read billboard tutorials using gl_PointCoord like twenty years ago :)
 

leman

Does anyone know how similar MetalFX is to FSR 2?

MetalFX offers two upscalers, a "spatial" one and a "temporal" one. The first one only needs color input to do its work; the second one needs color, depth and motion data. From a cursory glance these seem to closely mirror FSR 1 and FSR 2, respectively. Wouldn't be surprised if Apple was secretly inspired by the code published by AMD :)
 