Thread: iPhone 15 / Apple Watch 9 Event

dada_dave

Elite Member
Posts: 2,363 · Reaction score: 2,381
Using some doubly dubious methodology, I think the A17 would score around 17,000 in OpenCL if you could run it, and based on its roughly 10K Vulkan score, the Adreno 740 would be about 16K for a rational OpenCL score (not what is actually reported, which indicates a very poor OpenCL driver implementation). Here’s how I constructed this dubious monstrosity of a comparison: I took the ratio of Metal scores for the M2 10-core GPU and the A17 Pro, roughly 1.6, and applied that to the M2’s OpenCL score. Then I took the ratio of OpenCL scores to Vulkan scores for the same Nvidia GPUs, oddly also about 1.6 (really dubious here), and applied that to the Adreno GPU.

Unsure how safe any of that is, but taking it at face value: the A17 would have roughly the same or a bit better compute than the Adreno 740, with oddly fewer reported TFLOPs but the same or more power. Weird.
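For what it’s worth, the back-of-the-envelope math above can be sketched in a few lines. Only the ~1.6 ratios and the ~10K Adreno 740 Vulkan score come from the post; the M2 OpenCL baseline below is an assumed placeholder, not a measured figure.

```python
# Sketch of the admittedly dubious estimate above. The 1.6x ratios and the
# ~10K Adreno 740 Vulkan score are from the post; the M2 OpenCL baseline
# is an assumed placeholder chosen only for illustration.

M2_OPENCL = 27_000            # assumed GB6 OpenCL score for the M2 10-core GPU
M2_TO_A17_METAL_RATIO = 1.6   # M2 Metal score / A17 Pro Metal score (rough)
OPENCL_TO_VULKAN_RATIO = 1.6  # OpenCL / Vulkan ratio seen on Nvidia GPUs (rough)
ADRENO_740_VULKAN = 10_000    # reported Adreno 740 Vulkan score, roughly

# Scale the M2's OpenCL score down by the Metal ratio to estimate the A17 Pro.
a17_opencl_est = M2_OPENCL / M2_TO_A17_METAL_RATIO

# Scale the Adreno's Vulkan score up to a "rational" OpenCL equivalent.
adreno_opencl_est = ADRENO_740_VULKAN * OPENCL_TO_VULKAN_RATIO

print(f"A17 Pro estimated OpenCL:     ~{a17_opencl_est:,.0f}")
print(f"Adreno 740 'rational' OpenCL: ~{adreno_opencl_est:,.0f}")
```

With that assumed baseline, the first estimate lands near the ~17K figure quoted above and the second at 16K, but garbage in, garbage out: swap the baseline and the conclusion moves with it.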
 

Jimmyjames

Site Champ
Posts: 822 · Reaction score: 929
dada_dave said: “Doing some double dubious methodology I think the A17 would score around 17000 in OpenCL … Weird.”
Why would we take OpenCL scores? It’s basically abandoned on iOS/macOS. I’m sure J Poole said you can compare compute scores even with different APIs.
 

dada_dave

Elite Member
Jimmyjames said: “Why would we take OpenCL scores? It’s basically abandoned on iOS/macOS. …”
He may say that, but you can’t. And yes, I agree that OpenCL is not ideal, but it’s the best comparator I’ve got between Metal-focused Apple GPUs and Vulkan-focused Androids. You can see on devices that have both Vulkan and OpenCL that the Vulkan GB6 scores are very clearly lower - like 1.6-fold lower. Except for Adreno, where the OpenCL scores are trash tier and clearly poorly optimized. Basically I don’t believe any of the comparisons here, not even mine.
 

Jimmyjames

Site Champ
dada_dave said: “He may say that but you can’t. And yes I agree that OpenCL is not ideal but it’s the best comparator I’ve got …”
Hmmm to be honest, if J Poole says it, I tend to believe it without strong evidence to the contrary, given he knows more about it than any of us.
 

dada_dave

Elite Member
Jimmyjames said: “Hmmm to be honest, if J Poole says it, I tend to believe it without strong evidence to the contrary …”


So I should stress that for Nvidia the relationship goes one way, for AMD the other, and for Adreno, who knows?

But bottom line, you see different GPUs in different spots with different scores. Occasionally you’ll get something like the RTX 3080 scoring the same in both, but for most of them the same GPU scores differently. And if they score the same, a GPU they beat handily on one list can beat them handily on the other.
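As a hypothetical illustration of that rank flipping (every number below is made up, not a real GB6 result): two GPUs, two APIs, and the “winner” depends entirely on which column you sort by.

```python
# Hypothetical illustration of cross-API rank inversion: all scores are
# invented. GPU X wins under one API and loses under the other, so a
# single cross-API comparison can be misleading.

scores = {
    "GPU X": {"OpenCL": 120_000, "Vulkan": 70_000},
    "GPU Y": {"OpenCL": 90_000, "Vulkan": 110_000},
}

for api in ("OpenCL", "Vulkan"):
    ranking = sorted(scores, key=lambda gpu: scores[gpu][api], reverse=True)
    print(f"{api} ranking: {ranking}")
    # OpenCL ranking: ['GPU X', 'GPU Y']
    # Vulkan ranking: ['GPU Y', 'GPU X']
```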

Interesting he got rid of CUDA in the latest iteration.
 

Jimmyjames

Site Champ


dada_dave said: “So I should stress that for Nvidia the relationship goes one way, for Amd the other, for Adreno who knows? … Interesting he got rid of CUDA in the latest iteration.”
I’m not convinced it matters. There is no performance of a GPU without an API or software. If that software performs poorly, then that’s part of the equation. Apple’s “thing” is software and hardware together.
 

dada_dave

Elite Member
Jimmyjames said: “I’m not convinced why it matters. There is no performance of a gpu without an api or software. …”
It matters because we’re trying to compare the capabilities of two GPUs across multiple APIs. It’s hard to know what’s a function of the benchmark being poorly coded for that API, what’s a poor driver optimization for the API, and what’s the hardware. That’s the problem with GPU benchmarks to an even greater degree than CPU benchmarks which can already suffer from this.
 

dada_dave

Elite Member
dada_dave said: “It matters because we’re trying to compare the capabilities of two GPUs across multiple APIs. …”
Here’s what I wrote earlier:
That might reflect more weirdness in the OpenCL test than some ground truth between the two GPUs. As I mentioned above, some of the tests depend heavily on how well a particular API is implemented by the driver rather than on what the hardware is capable of. You could argue the user doesn’t care if they depend on that API, but one cannot draw generalized conclusions about the capabilities of the two GPUs, including for compute.

OpenCL is basically a dead API, including on the Mac, and it has a bunch of weird results that make Apple look really good or really bad depending on the test being run. I remember when the M1 first came out, some trolls tried to use OpenCL results to prove how bad the new GPU was relative to AMD, but those weren’t really representative of anything else.*

The Adreno 740 in the Snapdragon 8 gen 2 does boast some impressive stats, so maybe it is that good? I’m still suspicious of these power draw figures between devices but it’s not as out of bounds as when I first looked at them.

@Jimmyjames you said you’d seen other tests with the 740 being not as good as the Apple GPU?

*Edit: having looked again at the OpenCL GB results, they look a lot more rational than I remembered for most devices. However, it could still be a case of the OpenCL driver for the 740 being subpar rather than any hardware limitation with respect to compute.
 

Jimmyjames

Site Champ
dada_dave said: “It matters because we’re trying to compare the capabilities of two GPUs across multiple APIs. …”
I’m definitely not an expert, but I don’t see how the GPU can be said to have a capability without the software to back it up.
 

dada_dave

Elite Member
Jimmyjames said: “I’m definitely not an expert, but I don’t see how the gpu can be said to have a capability without the software to back it up.”
Same reason you objected to OpenCL being used to measure performance on the Mac.

Why would we take OpenCL scores? It’s basically abandoned on iOS/macOS.

Any individual piece of software may not be representative or useful as a benchmark for what users will be running on the platform they buy. It’s also why we objected to CB23 benchmarks for the Mac - it was known to have performance issues on AS and potentially ARM more broadly.

To essentially repeat what I wrote earlier: if you know that the benchmark is indicative of the workload you plan on doing then the fact that it suffers from poor optimization on a particular platform doesn’t matter to you. As you say the hardware needs software and the performance is what it is. Why is unimportant. You could bank on it getting better, but that’s generally not a safe thing to do when making a major purchase.

However, the “why” IS important when trying to get an overall sense of a particular piece of hardware’s performance relative to others. Taking a test known to not perform well due to driver bugs, running under Rosetta, etc. will give you a very skewed picture of performance in general. Personally I think GB6’s GPU results look unreliable when comparing across APIs. There are major discrepancies between them where it’s unclear what the cause is, and therefore what safe conclusions one can draw about the performance of the underlying hardware. Is it the drivers? Is it GB6? Something hardware related? Unclear.

The following example is extreme, I’ll admit, but illustrative of the point I’m trying to make. Take one of my favorite m7chy performances from the other place: he ran a benchmark through Wine through a VM and compared it to native Windows x86 performance. Now this was especially egregious because a native application did exist. BUT there are apps which don’t. Outside of the context of somebody needing that particular use case, does it make sense to actually use that as a benchmark to assess the overall performance of a device? Not really, no.

Basically it’s an extreme case of why many of us were leery of drawing firm, general conclusions about CB23 or OpenCL results for Apple Silicon. Those benchmarks are completely valid depending on your individual use case, but aren’t necessarily indicative of what a user will experience.
 

Jimmyjames

Site Champ
dada_dave said: “Same reason you objected to OpenCL being used to measure performance on the Mac.”
I object to OpenCL because it’s abandoned on Apple’s platforms. That isn’t true of Vulkan or OpenCL on Android, to my knowledge. I’m all in favour of using the best, highest-performance way of measuring a CPU or GPU. We’re in a situation where neither Vulkan nor OpenCL on Qualcomm performs anywhere close to Metal on Apple Silicon. I don’t believe the gap can be explained by your reasoning.
dada_dave said: “It’s also why we objected to CB23 benchmarks for the Mac - it was known to have performance issues on AS and potentially ARM more broadly.”
Ehhhh. CB23 was written using the library of one architecture (Intel) and was being used to compare performance on ARM as well. If CB23 had used Apple’s own ray tracing library and it performed poorly, I’d say fair game - it would be fine to compare them.
What is the evidence that Geekbench performs unfairly on Nvidia or AMD?
dada_dave said: “If you know that the benchmark is indicative of the workload you plan on doing … Taking a test known to not perform well due to driver bugs, running under Rosetta, etc. will give you a very skewed picture of performance in general.”
Performance in what software? Presumably that would be software using the same API as the benchmark, thus giving a reliable score.
dada_dave said: “Personally I think GB6’s GPU results look unreliable when comparing across APIs.”
I’m struggling to understand why you think this. What is the explanation or reason? I can point to specific things in CB23 or OpenCL.
dada_dave said: “There are major discrepancies between them … Basically it’s an extreme case of why many of us were leery of drawing firm, general conclusions about CB23 or OpenCL results for Apple Silicon.”
I don’t see those as comparable.
 
Last edited:

dada_dave

Elite Member
Jimmyjames said: “I object to OpenCL because it’s abandoned on Apple’s platforms. … I don’t see those as comparable.”
As far as I can tell, OpenCL is basically a legacy API everywhere. I don’t believe any further work is being done on it by the Khronos Group beyond maintenance: 3.0 was released three years ago, 3.0.14 five months ago, the .0.14 meaning minor versions/bug fixes only. It’s not clear to me what state the drivers are in for any of the device makers; some may be paying more attention to it than others.

The reason I’m suspicious of comparing GB6 results across APIs is that the results don’t make much sense when you do. I showed you those links: the GPU rankings and scores can be wildly different on Nvidia and AMD GPUs depending on which API you use, and if the goal is to use the best API for a particular platform, well … where’s CUDA for Nvidia?

So if you’re comparing Apple’s Metal numbers to Nvidia’s OpenCL numbers, they’ll likely look very different from Nvidia’s Vulkan numbers, and then there are Apple’s OpenCL numbers … Same with comparing to AMD. Now these differences could be driver differences, they could be differences in hardware, or they could be differences in how well individual GB6 tests are optimized for the different hardware, APIs, or drivers. The latter would be very analogous to CB23. For CB23, it was such an outlier that people dug into it and figured out the probable cause of its poor performance. GB6 is relatively new, and I certainly haven’t had the time or energy to look too deeply into it, but yes, I’m wary of drawing conclusions from a test which shows that level of variability across different APIs.

A new member said: “Remember that there are no native OpenCL drivers on Apple Silicon anymore. It’s translated/emulated on top of Metal, just like OpenGL is.”

Welcome to the site! Yes, both @Jimmyjames and I are aware of that, but we’re disagreeing over how broadly we should interpret cross-platform, cross-API results knowing that, as well as the other particularities of GB6 GPU results, which I overall find frustratingly unreliable. My concern is that we’re in a cryptic CB23 situation where at least some of these GB results aren’t very useful as a metric between platforms/APIs. Jimmy disagrees.
 

Jimmyjames

Site Champ
dada_dave said: “The reason I’m suspicious of comparing the GB6 results across APIs is the results don’t make much sense … where’s CUDA for Nvidia?”
Yes. CUDA’s absence from GB6 is strange. I haven’t seen any explanation for it. Doubly weird given its presence in GB5.
 

dada_dave

Elite Member
dada_dave said: “Yes. CUDA’s absence from GB6 is strange. I haven’t seen any explanation for it. Doubly weird given its presence in GB5.”
Aye, that’s just one of my concerns. I don’t want to rag on GB6: make no mistake, some of GB6 seems better, like comparing within platform, within API. A lot of the weird scaling issues that were way worse with GB than any other benchmark for Apple Silicon are gone. But knowing how hard GPU programming is, and knowing how much optimization you can squeeze into a program if you put your mind to it, I’m much more wary about GPU benchmarks, especially when one is an outlier or shows significant variability. Don’t get me wrong, you can do a lot of optimization for CPUs too, of course, but even for CB23 the suspected culprit was the parallel SIMD vector math being Intel/x86-optimized in the Embree library (whether it actually was is unknown, and I don’t know if CB24 changed it!). I like to think of the vector accelerators as little baby GPUs 🙃 - they aren’t, but some of the same principles and difficulties apply.
 
Last edited:

Jimmyjames

Site Champ
dada_dave said: “Aye, that’s just one of my concerns. I don’t want to rag on GB6 …”
Just out of interest, do you think the weirdness extends to benchmarks of raster performance? GFXBench, 3DMark, etc.
 