Vulkan undergoing major API shift

dada_dave

Elite Member
So it seems Vulkan is deciding that the low-level graphics API (at least as they implemented it using pipelines) did not work as intended and will be reintroducing a higher-level shader stage. They seem to imply that other low-level APIs likewise suffer. I’ll put in a couple of quotes, but here is the full link:


Thoughts? @leman @Andropov @Nycturne any others?

Enter the new low-level APIs like Mantle and ultimately Vulkan. These APIs set out to reduce driver overhead by exposing lower-level abstractions that would hopefully avoid the need for the draw time state validation and shader patching that was so problematic for IHVs, and so detrimental to performance for applications.

Many of these assumptions have since proven to be unrealistic.
On the application side, many developers considering or implementing Vulkan and similar APIs found them unable to efficiently support important use cases which were easily supportable in earlier APIs. This has not been simply a matter of developers being stuck in an old way of thinking or unwilling to "rein in" an unnecessarily large number of state combinations, but a reflection of the reality that the natural design patterns of the most demanding class of applications which use graphics APIs — video games — are inherently and deeply dependent on the very "dynamism" that pipelines set out to constrain.
As a result, renderers with a choice of API have largely chosen to avoid Vulkan and its "pipelined" contemporaries, while those without a choice have largely just devised workarounds to make these new APIs behave like the old ones
 

leman

Site Champ
So it seems Vulkan is deciding that the low-level graphics API (at least as they implemented it using pipelines) did not work as intended and will be reintroducing a higher-level shader stage. They seem to imply that other low-level APIs likewise suffer. I’ll put in a couple of quotes, but here is the full link:


Thoughts? @leman @Andropov @Nycturne any others?

The idea of pipelines was to abstract the state that can be switched quickly on any GPU. I agree that it's too crude a tool and makes combining shaders awkward. But I wonder how this proposal is going to address the needs of hardware which requires shader recompilation on certain state changes (like hardware capable of programmable blending, where blending is done by the pixel shader). The document bluntly states "IHVs whose implementations have such limitations today are encouraged to consider incorporating changes which could remove these limitations into their future hardware roadmaps". Which to me kind of sounds like "FY if you are not Nvidia or AMD". And what does any of this mean for subpasses (which are vital for bandwidth-optimised solutions like tiled renderers)? I am a bit worried that Vulkan might be evolving toward a hegemony of large desktop GPUs.

I mean, in principle, I think that abandoning pipelines in favour of a more dynamic approach is a terrific idea. But care needs to be taken that this does not penalise more flexible hardware solutions or restrict the hardware innovation space. I mean, why should mobile GPUs be penalised for having programmable blending while desktop IMRs are rewarded for being less flexible?
 

KingOfPain

Site Champ
Since graphics APIs aren't my forte, please excuse this possibly stupid question:
Does Metal also have these pipelines?
 

dada_dave

Elite Member
Yes, they are the foundation of Metal (as of pretty much every other modern graphics API).
Interesting! So would you agree with the statement that the promise of the pipeline idea isn’t being borne out in practice for Metal as well? The Vulkan problem statement makes it sound like all the modern APIs essentially took a wrong turn with the pipelines concept and all the low level APIs end up not being as efficient/performant in practice as people thought they’d be in theory.
 

leman

Site Champ
Interesting! So would you agree with the statement that the promise of the pipeline idea isn’t being borne out in practice for Metal as well? The Vulkan problem statement makes it sound like all the modern APIs essentially took a wrong turn with the pipelines concept and all the low level APIs end up not being as efficient/performant in practice as people thought they’d be in theory.

I don't have any industry experience with large projects, so I can't really provide an informed take on this. I can, however, try to give a little bit of historical and theoretical background on the issue. One big problem of legacy APIs (especially OpenGL) was the unpredictable performance of state changes. The OpenGL programming model is a state machine — you set up certain parameters (e.g. texturing mode, color blend mode, shading mode, active textures, or data buffers), and then execute drawing operations that make use of these parameters. Complex apps and games obviously need quite a few of those state changes to draw complex scenes. Now, the problem is that on some hardware some state changes are quite expensive. Maybe on hardware A changing the texture filtering mode is as simple as setting a bit in shader memory, while on hardware B it requires a complete reconfiguration of the vertex submission process with a new variant of the vertex and the pixel shader. What would happen in practice is that you'd do something seemingly trivial (like enabling alpha testing), and your app would suffer several milliseconds of delay because a whole new shader combination needed to be compiled and linked. And you had no way of knowing where to expect these slowdowns, because the API wasn't designed with this concern in mind. So what one ended up doing was testing the behaviour on various hardware and driver versions, identifying problems, and trying to shuffle the code and state changes around to minimise the slowdowns (which sometimes required device-specific code!).
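To make that concrete, here is a minimal sketch of the legacy state-machine model (classic desktop GL with the compatibility-profile alpha test; a current context and scene setup are assumed):

```cpp
#include <GL/gl.h>

// Sketch of a per-frame draw under the legacy model. Every call below
// mutates hidden global state; the API gives no hint about its cost.
void drawScene() {
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    glDrawArrays(GL_TRIANGLES, 0, 3072);   // cheap on most hardware

    // Looks just as trivial, but on some GPUs the driver now has to patch
    // or recompile the current shader pair behind your back, so the next
    // draw can stall for several milliseconds.
    glEnable(GL_ALPHA_TEST);               // compatibility-profile state
    glAlphaFunc(GL_GREATER, 0.5f);
    glDrawArrays(GL_TRIANGLES, 0, 3072);   // possibly slow, no way to tell
}
```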

Pipelines were introduced to solve all these issues. The idea was that the API makes guarantees about which state changes are fast and which are slow. You get pre-configured immutable state objects (pipelines), which might be slow to create but should be fast to switch between. Obviously, to accommodate as much hardware as possible under the same API, these state objects need to be fat enough to include everything that might be slow on different types of hardware. The render pipeline state object in the major APIs, for example, includes things like the vertex and fragment shaders, the blend state, the accepted types of render targets, and some others. This doesn't mean that changing these parameters separately is slow on all GPUs: some, for example, allow efficient blend state changes while others do not, which is why you put all these things together.
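A heavily trimmed Vulkan sketch of how much gets baked into one of these objects (several required sub-structures are elided for brevity; the device, render pass, layout, and shader modules are assumed to exist):

```cpp
#include <vulkan/vulkan.h>

// Bakes one immutable state object. Everything set here is fixed forever;
// change any of it (say, the blend factors) and you need a new pipeline.
VkPipeline bakePipeline(VkDevice device, VkRenderPass renderPass,
                        VkPipelineLayout layout,
                        VkShaderModule vs, VkShaderModule fs) {
    VkPipelineShaderStageCreateInfo stages[2]{};
    stages[0].sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    stages[0].stage  = VK_SHADER_STAGE_VERTEX_BIT;
    stages[0].module = vs;
    stages[0].pName  = "main";
    stages[1] = stages[0];
    stages[1].stage  = VK_SHADER_STAGE_FRAGMENT_BIT;
    stages[1].module = fs;

    // Blend state is baked in too: hardware with programmable blending may
    // need exactly this information to generate the final fragment program.
    VkPipelineColorBlendAttachmentState blend{};
    blend.blendEnable         = VK_TRUE;
    blend.srcColorBlendFactor = VK_BLEND_FACTOR_SRC_ALPHA;
    blend.dstColorBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA;
    blend.colorWriteMask      = 0xF;
    VkPipelineColorBlendStateCreateInfo blendInfo{};
    blendInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO;
    blendInfo.attachmentCount = 1;
    blendInfo.pAttachments    = &blend;

    VkGraphicsPipelineCreateInfo info{};
    info.sType      = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
    info.stageCount = 2;
    info.pStages    = stages;
    info.pColorBlendState = &blendInfo;
    // ...vertex input, input assembly, viewport, rasterization, multisample
    // and depth/stencil state are required here as well; elided for brevity.
    info.layout     = layout;
    info.renderPass = renderPass;

    VkPipeline pipeline = VK_NULL_HANDLE;
    // The potentially slow part: full shader compilation, done up front.
    vkCreateGraphicsPipelines(device, VK_NULL_HANDLE, 1, &info, nullptr,
                              &pipeline);
    return pipeline;
}
```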

So what you end up with is a neat API that gives you predictable performance by allowing you to bake the necessary state configurations beforehand and then just switch between them in your rendering loop with low overhead. Everybody's happy, right? Well, for a brief while anyway. The problem is that every combination of shader programs plus associated parameters is a new state object to be baked. And the number of shader stages keeps growing: mesh shading, tessellation, ray tracing... that's quite some combinatorial potential! Plus, recent APIs introduced a concept called shader specialisation, where you can optimise a shader using some application-supplied values. And since this requires compiling a new shader variant, you need a new pipeline state object.
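For illustration, this is roughly what specialisation looks like on the Vulkan side; `lightCount` is a made-up application-supplied value:

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Builds a fragment-stage description whose `lightCount` is baked in as a
// compile-time constant. lightCount = 4 and lightCount = 8 therefore need
// two distinct pipeline objects. The caller owns entry/spec lifetimes.
VkPipelineShaderStageCreateInfo specializedStage(
        VkShaderModule fs, const uint32_t& lightCount,
        VkSpecializationMapEntry& entry, VkSpecializationInfo& spec) {
    entry = {};
    entry.constantID = 0;                  // matches `constant_id = 0` in SPIR-V
    entry.offset     = 0;
    entry.size       = sizeof(uint32_t);

    spec = {};
    spec.mapEntryCount = 1;
    spec.pMapEntries   = &entry;
    spec.dataSize      = sizeof(uint32_t);
    spec.pData         = &lightCount;      // application-supplied runtime value

    VkPipelineShaderStageCreateInfo stage{};
    stage.sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    stage.stage  = VK_SHADER_STAGE_FRAGMENT_BIT;
    stage.module = fs;
    stage.pName  = "main";
    stage.pSpecializationInfo = &spec;     // triggers a fresh compile per value
    return stage;
}
```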

For modern games and applications this can very quickly become a burden. The usual idea of baking the pipelines at application startup stops being feasible as the number of combinations explodes (ever wondered why newer games take ages to start?), and if you add specialisation to the mix it gets even worse... and worst of all, many of these state objects are almost entirely redundant, as they contain copies of the same programs, just combined in slightly different ways, for many hardware platforms.

So the idea of EXT_shader_object is basically to break down the pipeline state into smaller pieces of immutable state which can be handled by the application more efficiently. And as I wrote, this is a good initiative. I am just worried that the way it's formulated might hurt certain platforms that simply do things differently from the big two. But then again, I see people from IMG, Qualcomm, and ARM as contributors to the extension, so it would seem that mobile vendors see no problem with this approach.
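Based on the extension as published, the model looks roughly like this (error handling omitted; on most loaders these entry points have to be fetched via vkGetDeviceProcAddr):

```cpp
#include <vulkan/vulkan.h>
#include <cstddef>

// One shader object per stage, created independently of any other stage.
VkShaderEXT makeShader(VkDevice device, VkShaderStageFlagBits stage,
                       const void* spirv, size_t size) {
    VkShaderCreateInfoEXT info{};
    info.sType    = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT;
    info.stage    = stage;
    info.codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT;
    info.pCode    = spirv;
    info.codeSize = size;
    info.pName    = "main";

    VkShaderEXT shader = VK_NULL_HANDLE;
    vkCreateShadersEXT(device, 1, &info, nullptr, &shader);
    return shader;
}

// At draw time, any vertex shader can be paired with any compatible
// fragment shader: N + M objects instead of N * M pipelines.
void bindStages(VkCommandBuffer cmd, VkShaderEXT vs, VkShaderEXT fs) {
    VkShaderStageFlagBits stages[] = { VK_SHADER_STAGE_VERTEX_BIT,
                                       VK_SHADER_STAGE_FRAGMENT_BIT };
    VkShaderEXT shaders[] = { vs, fs };
    vkCmdBindShadersEXT(cmd, 2, stages, shaders);
}
```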

I wonder whether we will see some changes in this area from Apple at WWDC this year. They did actually introduce some tools for managing these issues (at least to a certain degree). Recent Metal versions support function pointers as well as a fairly involved system for composing functions from basic building blocks (keyword: function stitching). But I've never used those and don't really know how they work. The basic idea is that you can have your shaders call some utility function to handle certain operations, and those functions can be bound at runtime (you are literally passing a function pointer to the shader, and that pointer can be set by the client code). You still use pre-baked pipeline state objects, but hopefully you need fewer of those since you can cover a lot of cases with these pluggable functions. Still, all that stuff seems awkward to use and has a large API surface, which is kind of uncommon for the usually elegant and straight-to-the-point Metal. It might be a good idea to radically overhaul the entire thing and use function pointers throughout the entire API. The big question is whether Apple hardware can do that efficiently or whether it relies on some state being pre-baked.
 

dada_dave

Elite Member
I don't have any industry experience with large projects, so I can't really provide an informed take on this. I can, however, try to give a little bit of historical and theoretical background on the issue. One big problem of legacy APIs (especially OpenGL) was the unpredictable performance of state changes. The OpenGL programming model is a state machine — you set up certain parameters (e.g. texturing mode, color blend mode, shading mode, active textures, or data buffers), and then execute drawing operations that make use of these parameters. Complex apps and games obviously need quite a few of those state changes to draw complex scenes. Now, the problem is that on some hardware some state changes are quite expensive. Maybe on hardware A changing the texture filtering mode is as simple as setting a bit in shader memory, while on hardware B it requires a complete reconfiguration of the vertex submission process with a new variant of the vertex and the pixel shader. What would happen in practice is that you'd do something seemingly trivial (like enabling alpha testing), and your app would suffer several milliseconds of delay because a whole new shader combination needed to be compiled and linked. And you had no way of knowing where to expect these slowdowns, because the API wasn't designed with this concern in mind. So what one ended up doing was testing the behaviour on various hardware and driver versions, identifying problems, and trying to shuffle the code and state changes around to minimise the slowdowns (which sometimes required device-specific code!).

Pipelines were introduced to solve all these issues. The idea was that the API makes guarantees about which state changes are fast and which are slow. You get pre-configured immutable state objects (pipelines), which might be slow to create but should be fast to switch between. Obviously, to accommodate as much hardware as possible under the same API, these state objects need to be fat enough to include everything that might be slow on different types of hardware. The render pipeline state object in the major APIs, for example, includes things like the vertex and fragment shaders, the blend state, the accepted types of render targets, and some others. This doesn't mean that changing these parameters separately is slow on all GPUs: some, for example, allow efficient blend state changes while others do not, which is why you put all these things together.

So what you end up with is a neat API that gives you predictable performance by allowing you to bake the necessary state configurations beforehand and then just switch between them in your rendering loop with low overhead. Everybody's happy, right? Well, for a brief while anyway. The problem is that every combination of shader programs plus associated parameters is a new state object to be baked. And the number of shader stages keeps growing: mesh shading, tessellation, ray tracing... that's quite some combinatorial potential! Plus, recent APIs introduced a concept called shader specialisation, where you can optimise a shader using some application-supplied values. And since this requires compiling a new shader variant, you need a new pipeline state object.

For modern games and applications this can very quickly become a burden. The usual idea of baking the pipelines at application startup stops being feasible as the number of combinations explodes (ever wondered why newer games take ages to start?), and if you add specialisation to the mix it gets even worse... and worst of all, many of these state objects are almost entirely redundant, as they contain copies of the same programs, just combined in slightly different ways, for many hardware platforms.

So the idea of EXT_shader_object is basically to break down the pipeline state into smaller pieces of immutable state which can be handled by the application more efficiently. And as I wrote, this is a good initiative. I am just worried that the way it's formulated might hurt certain platforms that simply do things differently from the big two. But then again, I see people from IMG, Qualcomm, and ARM as contributors to the extension, so it would seem that mobile vendors see no problem with this approach.

I wonder whether we will see some changes in this area from Apple at WWDC this year. They did actually introduce some tools for managing these issues (at least to a certain degree). Recent Metal versions support function pointers as well as a fairly involved system for composing functions from basic building blocks (keyword: function stitching). But I've never used those and don't really know how they work. The basic idea is that you can have your shaders call some utility function to handle certain operations, and those functions can be bound at runtime (you are literally passing a function pointer to the shader, and that pointer can be set by the client code). You still use pre-baked pipeline state objects, but hopefully you need fewer of those since you can cover a lot of cases with these pluggable functions. Still, all that stuff seems awkward to use and has a large API surface, which is kind of uncommon for the usually elegant and straight-to-the-point Metal. It might be a good idea to radically overhaul the entire thing and use function pointers throughout the entire API. The big question is whether Apple hardware can do that efficiently or whether it relies on some state being pre-baked.
Pretty damn informative for me! :)
 

Andropov

Site Champ
I have never written anything in any graphics API other than Metal, so I'm not really qualified to comment on Vulkan's issues, but... the wording of that doc makes it sound like a major failure of the API to meet its goals. Maybe it's not that bad, but the wording sure is harsh. Vulkan was supposed to be all new and shiny not long ago.

I glanced through a Vulkan tutorial and the API doesn't look radically different to Metal for pipeline management. So the obvious question arises: why hasn't Metal's MTLRenderPipelineState run into the same issues? One could argue that Apple will eventually need to do the same thing Vulkan is proposing; instead, in Metal 3, pipelines can be pre-compiled (using a MTLBinaryArchive) rather than compiled at runtime, which, if anything, removes flexibility instead of adding more.

The OpenGL programming model is a state machine — you set up certain parameters (e.g. texturing mode, color blend mode, shading mode, active textures, or data buffers), and then execute drawing operations that make use of these parameters. Complex apps and games obviously need quite a few of those state changes to draw complex scenes. Now, the problem is that on some hardware some state changes are quite expensive. Maybe on hardware A changing the texture filtering mode is as simple as setting a bit in shader memory, while on hardware B it requires a complete reconfiguration of the vertex submission process with a new variant of the vertex and the pixel shader. What would happen in practice is that you'd do something seemingly trivial (like enabling alpha testing), and your app would suffer several milliseconds of delay because a whole new shader combination needed to be compiled and linked. And you had no way of knowing where to expect these slowdowns, because the API wasn't designed with this concern in mind. So what one ended up doing was testing the behaviour on various hardware and driver versions, identifying problems, and trying to shuffle the code and state changes around to minimise the slowdowns (which sometimes required device-specific code!).
Another problem with the massive global state of OpenGL was thread-safety. Global, shared state imposes synchronization concerns for every thread that wants to access that state.

Plus, recent APIs introduced the concept called shader specialisation, where you can optimise the shader using some application-supplied values. And since this requires to compile a new shader variant, you need a new pipeline state object.
To build on what @leman said, the idea is that some shader constants (for example) might not be known when the application is compiled, but only at runtime. The additional optimizations you get by creating specialized shaders are essentially the optimizations the compiler can perform when a value is a constant known at shader compile time rather than a dynamic check.
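A CPU-side analogy of the same trade-off (illustrative names; a shader specialisation constant plays the role of the template parameter):

```cpp
// With the template, the count is a compile-time constant: the compiler can
// unroll the loop and drop the branch. The dynamic version must keep the
// runtime check, just like an unspecialized shader.
template <int LightCount>
float shadeSpecialized(const float* lights) {
    float sum = 0.0f;
    for (int i = 0; i < LightCount; ++i) sum += lights[i];  // unrollable
    return sum;
}

float shadeDynamic(const float* lights, int lightCount) {
    float sum = 0.0f;
    for (int i = 0; i < lightCount; ++i) sum += lights[i];  // checked each iteration
    return sum;
}
```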
 

leman

Site Champ
I glanced through a Vulkan tutorial and the API doesn't look radically different to Metal for pipeline management. So the obvious question arises: why hasn't Metal's MTLRenderPipelineState run into the same issues?

Who says they don't? It's the same design, so it will have the same problems. Last time I ran a Blender rendering benchmark on my Mac, it took multiple minutes just to build all the shaders and pipeline variants.

One could argue that Apple will eventually need to do the same thing Vulkan is proposing; instead, in Metal 3, pipelines can be pre-compiled (using a MTLBinaryArchive) rather than compiled at runtime, which, if anything, removes flexibility instead of adding more.

I suppose that is how Apple has been trying to solve the scalability problem: move the pipeline baking to the build step. Other tools are shader function pointers and dynamically linked functions. But this is a very different design from what Vulkan is pursuing. If the APIs diverge that much, one can entirely forget about efficient emulation of Vulkan on Apple platforms.
 

KingOfPain

Site Champ
If the APIs diverge that much, one can entirely forget about efficient emulation of Vulkan on Apple platforms.

I guess that is the main crux. Vulkan isn't directly supported by macOS and OpenGL is deprecated. If Vulkan and Metal diverge even more than they already have, MoltenVK most likely isn't a practical option anymore either.
 

dada_dave

Elite Member
I guess that is the main crux. Vulkan isn't directly supported by macOS and OpenGL is deprecated. If Vulkan and Metal diverge even more than they already have, MoltenVK most likely isn't a practical option anymore either.
Oh, I dunno: people figured out how to emulate all the old graphics APIs with each other, and they were pretty different. There might be a hit in performance, and definitely in code size, but I’d bet it’d still be practical to do. For one thing, even if Metal and Vulkan diverge, Apple would not get rid of hardware that accelerates things like pipelines - Alyssa and Lina have discovered plenty in the Apple GPU that is clearly still designed for OpenGL, even though it isn’t exposed, or at least isn’t normally used, by the Metal API.
 

Andropov

Site Champ
Who says they don't? It's the same design, so it will have the same problems. Last time I ran a Blender rendering benchmark on my Mac, it took multiple minutes just to build all the shaders and pipeline variants.
Well, that was kind of my point: since the design is so similar, devs using Metal should be running into the same issues. But I have only ever heard complaints about Vulkan, not Metal, so it's surprising to me that all that's said in that document is applicable to Metal too.

For modern games and applications this can very quickly become a burden. The usual idea of baking the pipelines at application startup stops being feasible as the number of combinations explodes (ever wondered why newer games take ages to start?), and if you add specialisation to the mix it gets even worse... and worst of all, many of these state objects are almost entirely redundant, as they contain copies of the same programs, just combined in slightly different ways, for many hardware platforms.
But, although code has to be maintained if different hardware platforms require different shaders, that shouldn't impact startup time. Only the shaders that could be used on a given platform need to be compiled at runtime. For example, if you're targeting a certain family of Metal devices, you may need to provide a fallback for older devices that don't support that family, in the shape of a different shader (and thus a different pipeline). But you'd only need to compile the pipelines that your system supports.

So the idea of EXT_shader_object is basically to break down the pipeline state into smaller pieces of immutable state which can be handled by the application more efficiently. And as I wrote, this is a good initiative. I am just worried that the way it's formulated might hurt certain platforms that simply do things differently from the big two. But then again, I see people from IMG, Qualcomm, and ARM as contributors to the extension, so it would seem that mobile vendors see no problem with this approach.
It's also leaving performance on the table by making unlinked shaders the default. Say you have a common vertex stage that could potentially be used with multiple (different) fragment stages. If you compile the vertex and fragment stages separately, you cut down a massive number of possible combinations, but at the same time you lose all the compile-time optimizations that can be done by knowing both the vertex and fragment stages' code. On the other hand, you'd need a lot of pipelines to match the same number of combinations, but each pipeline would be compiler-optimized for that particular vertex+fragment combination. There's also a mention of shader optimization hints under Further functionality, but the whole system would be opt-in rather than opt-out.
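For what it's worth, the extension does offer an opt-in way back to cross-stage optimization: stages created together with the link flag may be compiled as a unit. A sketch, with illustrative names:

```cpp
#include <vulkan/vulkan.h>
#include <cstddef>

// Creates a known-hot vertex+fragment pair in one call with the link flag,
// letting the implementation optimize across the stage boundary again (at
// the cost of that pair being its own compiled artifact).
void createLinkedPair(VkDevice device,
                      const void* vsSpirv, size_t vsSize,
                      const void* fsSpirv, size_t fsSize,
                      VkShaderEXT outShaders[2]) {
    VkShaderCreateInfoEXT infos[2]{};
    for (auto& info : infos) {
        info.sType    = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT;
        info.codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT;
        info.pName    = "main";
        info.flags    = VK_SHADER_CREATE_LINK_STAGE_BIT_EXT; // link the pair
    }
    infos[0].stage     = VK_SHADER_STAGE_VERTEX_BIT;
    infos[0].nextStage = VK_SHADER_STAGE_FRAGMENT_BIT;
    infos[0].pCode     = vsSpirv;
    infos[0].codeSize  = vsSize;
    infos[1].stage     = VK_SHADER_STAGE_FRAGMENT_BIT;
    infos[1].pCode     = fsSpirv;
    infos[1].codeSize  = fsSize;

    // One call creates both; the implementation may compile them together.
    vkCreateShadersEXT(device, 2, infos, nullptr, outShaders);
}
```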

You also lose compile-time validation that the combination of vertex+fragment you set the GPU to use can work together (and the document explicitly mentions that there would not be any draw-time validation either, it's the application's responsibility to provide compatible vertex and fragment stages). So breaking down pipelines into smaller objects makes them easier to handle individually, but now you'll have to take into account how they work together.

IDK. The people working on this proposal have thought about the implications of all these changes way longer than me, and with greater knowledge of the topic.
 

leman

Site Champ
Well, that was kind of my point: since the design is so similar, devs using Metal should be running into the same issues. But I have only ever heard complaints about Vulkan, not Metal, so it's surprising to me that all that's said in that document is applicable to Metal too.

I would guess that most folks who work with Metal only have modest requirements and are unlikely to run into combinatorial explosion.

But, although code has to be maintained if different hardware platforms require different shaders, that shouldn't impact startup time. Only the shaders that could be used on a given platform need to be compiled at runtime. For example, if you're targeting a certain family of Metal devices, you may need to provide a fallback for older devices that don't support that family, in the shape of a different shader (and thus a different pipeline). But you'd only need to compile the pipelines that your system supports.

Suppose you have a few variants of vertex shaders, mesh shaders, a bunch of compute shaders, and several dozen fragment shaders doing different types of materials. Then you want to specialize some of them. You can quickly end up with many thousands of pipeline combinations this way.
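A back-of-the-envelope illustration with made-up but plausible counts:

```cpp
#include <cstdio>

// Multiplying out a modest shader inventory shows how monolithic pipelines
// explode combinatorially while separate shader objects only add up.
int main() {
    const long vertexVariants   = 4;
    const long fragmentVariants = 50;   // "several dozen" material shaders
    const long blendStates      = 3;
    const long specializations  = 8;    // specialized variants per material

    long pipelines = vertexVariants * fragmentVariants
                   * blendStates * specializations;
    std::printf("monolithic pipelines:    %ld\n", pipelines);  // 4800
    std::printf("separate shader objects: %ld\n",
                vertexVariants + fragmentVariants);            // 54
}
```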


It's also leaving performance on the table by making unlinked shaders the default. Say you have a common vertex stage that could potentially be used with multiple (different) fragment stages. If you compile the vertex and fragment stages separately, you cut down a massive number of possible combinations, but at the same time you lose all the compile-time optimizations that can be done by knowing both the vertex and fragment stages' code. On the other hand, you'd need a lot of pipelines to match the same number of combinations, but each pipeline would be compiler-optimized for that particular vertex+fragment combination. There's also a mention of shader optimization hints under Further functionality, but the whole system would be opt-in rather than opt-out.

You also lose compile-time validation that the combination of vertex+fragment you set the GPU to use can work together (and the document explicitly mentions that there would not be any draw-time validation either, it's the application's responsibility to provide compatible vertex and fragment stages). So breaking down pipelines into smaller objects makes them easier to handle individually, but now you'll have to take into account how they work together.

IDK. The people working on this proposal have thought about the implications of all these changes way longer than me, and with greater knowledge of the topic.

This is all very much true and also something I’ve been wondering about. As I said, I don’t have any experience with industrial-scale game programming. There are some big names from the gaming industry on the author list; I suppose they have a much better understanding of the needs. Frankly, I don’t really understand why this can’t be handled at build time.
 

Andropov

Site Champ
Suppose you have a few variants of vertex shaders, mesh shaders, a bunch of compute shaders, and several dozen fragment shaders doing different types of materials. Then you want to specialize some of them. You can quickly end up with many thousands of pipeline combinations this way.
It's worth mentioning that you don't necessarily need to build all of those at startup, though; you can build them while the app/game is running. Let's say you're building a game and a specific boss battle has a screen-space bloom effect, which requires a fragment shader that isn't used anywhere else. Those kinds of things can be loaded once certain conditions are met (let's say, the player comes within X distance of the boss battle entrance) while the game is already running.
This approach has obvious drawbacks: the app (or game) has to keep track of when those Pipeline State Objects (PSOs) should be built. If you unexpectedly need one of those PSOs and the binary hasn't been built, it's going to be compiled just in time and will likely cause a frame drop. And I can only imagine the added development complexity of keeping track of where each combination of vertex/fragment/compute shaders is used, and maintaining the game logic for each combination to be built just before it's needed.
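A sketch of that pattern, using std::async as a stand-in for a real engine's job system (all names are illustrative):

```cpp
#include <future>

// Stand-ins for the real thing: a slow call that builds the boss fight's
// unique pipeline state objects (the bloom fragment shader etc.).
struct BossFightAssets { /* pipeline handles, textures, ... */ };
static BossFightAssets bakeBossPipelines() { return {}; }

static std::future<BossFightAssets> pendingBoss;

// Called when the player comes within X distance of the boss entrance:
// kick off compilation while they are still walking over.
void onPlayerNearBossDoor() {
    pendingBoss = std::async(std::launch::async, bakeBossPipelines);
}

// Called when the fight actually starts. If the player sprinted straight
// in and the future isn't ready yet, get() blocks, causing the just-in-time
// compile and frame drop described above.
void onBossFightStart() {
    BossFightAssets assets = pendingBoss.get();
    (void)assets; // bind the freshly built PSOs and begin the encounter
}
```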

But I don't like Vulkan's solution either: making different shaders independent of each other removes the combinatorial part of the problem. That's no small feat, but if the resulting number of shaders is still too high to manage, aren't you just delaying the problem? I mean, even with the added flexibility cutting down the number of combinations by two orders of magnitude, it's possible that a game would still need to keep track of which shaders are used where, and maintain logic to build those as required. And it seems to me that the new flexibility is far from free, as it comes at the cost of the compiler knowing less about the code, so some issues that could be caught by the compiler are now the application's responsibility and could be more easily missed (plus optimization opportunities are lost).

Ultimately, I think the approach Apple has taken is the best of all the proposed solutions, by far. Apple has built tools to allow PSOs to be generated en masse beforehand (like the new JSON-based pipeline state descriptors), which could even be generated by some sort of code-generation tool (or harvested during development, as Apple suggests). This essentially trades disk space (which is cheap!) to remove all of the complexity of managing when each PSO is created, without hurting either performance or compile-time validation of shader compatibility. Plus, the shader compilation is done once, instead of every time you open the app. I think it's neat.
I suppose the reason the Khronos Group hasn't proposed anything like this is that you'd need to pre-compile different binaries for different GPUs, even within the same hardware vendor, and there are thousands of GPUs with Vulkan support. Apple has a much more limited set of supported GPUs, so you'd need just a few different binaries for each shader instead of literally thousands.
 

leman

Site Champ
It's worth mentioning that you don't necessarily need to build all of those at startup, though; you can build them while the app/game is running. Let's say you're building a game and a specific boss battle has a screen-space bloom effect, which requires a fragment shader that isn't used anywhere else. Those kinds of things can be loaded once certain conditions are met (let's say, the player comes within X distance of the boss battle entrance) while the game is already running.
This approach has obvious drawbacks: the app (or game) has to keep track of when those Pipeline State Objects (PSOs) should be built. If you unexpectedly need one of those PSOs and the binary hasn't been built, it's going to be compiled just in time and will likely cause a frame drop. And I can only imagine the added development complexity of keeping track of where each combination of vertex/fragment/compute shaders is used, and maintaining the game logic for each combination to be built just before it's needed.

Yep, I think that's exactly the problem they are trying to address.

But I don't like Vulkan's solution either: making different shaders independent of each other removes the combinatorial part of the problem. That's no small feat, but if the resulting number of shaders is still too high to manage, aren't you just delaying the problem? I mean, even with the added flexibility cutting down the number of combinations by two orders of magnitude, it's possible that a game would still need to keep track of which shaders are used where, and maintain logic to build those as required.

I suppose tracking shaders is much simpler than tracking shader combinations. I mean, it's probably easier to locate and put on a different pair of pants and a shirt than to track unique outfits.

And it seems to me that the new flexibility is far from free, as it comes at the cost of the compiler knowing less about the code, so some issues that could be caught by the compiler are now the application's responsibility and could be more easily missed (plus optimization opportunities are lost).

Yep, absolutely. But hey, it's Vulkan, the API that eagerly embraces the proverbial footgun!

Ultimately, I think the approach Apple has taken is the best of all the proposed solutions, by far. Apple has built tools to allow PSOs to be generated en masse beforehand (like the new JSON-based pipeline state descriptors), which could even be generated by some sort of code-generation tool (or harvested during development, as Apple suggests). This essentially trades disk space (which is cheap!) to remove all of the complexity of managing when each PSO is created, without hurting either performance or compile-time validation of shader compatibility. Plus, the shader compilation is done once, instead of every time you open the app. I think it's neat.
I suppose the reason the Khronos Group hasn't proposed anything like this is that you'd need to pre-compile different binaries for different GPUs, even within the same hardware vendor, and there are thousands of GPUs with Vulkan support. Apple has a much more limited set of supported GPUs, so you'd need just a few different binaries for each shader instead of literally thousands.

I agree. You also have similar features for consoles from what I understand.
 

dada_dave

Elite Member
An interesting tangent on OpenGL and Vulkan:

An astronomy software vendor recently asked me to take a look at their OpenGL ES code. It’s mostly OpenGL ES 2.0 with some 3.0 mixed in. It could certainly stand a little modernization and updating, but it would be creating work for no reason for me to advise them to rewrite the whole engine in Vulkan. It’s already smooth and beautiful, and even with some new “gee whiz” things we are talking about adding, there’s absolutely no justification for moving to Vulkan right now. Should the time come that Apple removes OpenGL entirely, for example, there are plenty of solutions for layering OpenGL on top of Metal or Vulkan ourselves, and it will be 1/100th the work of moving entirely to Vulkan. Vulkan is the new king of Android too, you might point out, and yes, there are also ways to layer OpenGL ES 3 on Vulkan for Android (and Google is investing heavily in this). When it comes to product development, part of your job is to evaluate tradeoffs and get a product to market in a timely fashion. Yes, there will be minimum performance criteria, but don’t use a cannon to kill a mosquito.

“Oh, but I LOVE cannons… and they make such great noise, and… and… BOOM!”. This is why many engineering teams need some adult supervision in the form of a product manager. Engineers love their work, and given the chance will create more work for themselves, to the detriment of the product schedule.


Summary: OpenGL isn’t going anywhere for a long time, and even now it can be the better choice relative to more advanced APIs, as it’s simply easier to write the needed code, which outweighs raw performance for many applications.
 

Andropov

Site Champ
Summary: OpenGL isn’t going anywhere for a long time, and even now it can be the better choice relative to more advanced APIs, as it’s simply easier to write the needed code, which outweighs raw performance for many applications.
I wouldn't in a million years start a project using a framework that is obviously on its way out and has clear, mature alternatives. Like. The writing is on the wall. Apple stopped updating OpenGL after 2010. Metal was released in 2014. Vulkan was released in 2016. OpenGL's last version was 4.6, in 2017. And Apple officially deprecated OpenGL in 2018. How long until they remove it altogether? It would simplify both software and hardware development (I seem to recall people from the Asahi project saying that Apple Silicon GPUs still have hardware features to make OpenGL support easier/more performant, but don't quote me on this).

Obviously not everything revolves around Apple, and I'm less aware of what other hardware vendors are doing, but it's my understanding that, even while actively supported, OpenGL performance is less and less relevant, and most metrics use Vulkan benchmarks/games. In a world with high-core-count CPUs, OpenGL has poor multithreading support. And new features (e.g. NVIDIA's DLSS, ray tracing...) aren't ever going to come to OpenGL, so... kind of a dead end IMHO. I wouldn't start a project with it, even if it's higher level and easier to write for. Especially not if (as it seems to be the case) I want to be able to run it on Apple platforms.
 

leman

Site Champ
I can see WebGPU becoming a popular framework for applications that have moderate performance requirements and value developer time. A lot of Rust projects use a desktop implementation of WebGPU already.
 