May 7 “Let Loose” Event - new iPads

dada_dave

Elite Member
Here is GB5. It seems mostly clock improvements and a bit of IPC.
Yeah, that tracks. Looking at all the different submitted M3 GB5 scores, it works out to roughly a 1-5% overall IPC increase depending on the comparison (more for some subtests, less for others), similar to the M3-over-M2 uplift.

============

I do feel the need to point out again, though, that there's a problem with the "IPC" argument at different clock speeds: the same processor clocked x% higher isn't guaranteed to get an x% increase in performance. IPC tends to drop at higher clocks as things like cache misses and RAM latency come into play, so sometimes all you do is increase the number of cycles the processor spends waiting.

For instance, take Horizon Detection. As @theorist9 noted, it looks like a slight IPC regression, and that may be the case. Looking at the L3 cache misses and the working data set on the Intel chip Geekbench uses as a reference, that test makes a lot of trips to main memory when the requested data isn't in L1-L3, more than most of the other tests (a few others match or exceed it). Since I can't measure the latency difference between the M4's and M3's RAM, I can't say that's the sole cause of an IPC decrease, but you can see how it could be for such a test (the test also has a really high branch miss rate on that Intel chip, though of course an Apple chip has a completely different branch predictor, and it has overall low IPC compared to the other tests, so it's possibly a combination of factors).
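To make the clocks-vs-IPC arithmetic concrete, here's a quick back-of-the-envelope sketch (the 4.05/4.4 GHz clocks are the reported figures; the subtest scores are invented purely for illustration):

```python
# Implied IPC change = (score ratio) / (clock ratio).
# Below 1.0 means the score didn't keep up with the clock bump, i.e. an
# effective IPC regression on that workload.
m3_clock_ghz, m4_clock_ghz = 4.05, 4.40      # reported P-core clocks
clock_ratio = m4_clock_ghz / m3_clock_ghz    # ~1.086, roughly +9%

# Hypothetical per-subtest single-core scores (not real Geekbench data).
subtests = {
    "Object Detection":  (320, 600),   # big jump, e.g. a new SME path
    "HTML5 Browser":     (290, 340),   # healthy gain beyond clocks
    "Horizon Detection": (310, 325),   # barely moves: implied IPC regression
}

for name, (m3_score, m4_score) in subtests.items():
    score_ratio = m4_score / m3_score
    implied_ipc = score_ratio / clock_ratio
    print(f"{name:18} score x{score_ratio:.2f} -> implied IPC x{implied_ipc:.2f}")
```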

Don't get me wrong, I'd love to have clock speed increases AND IPC increases at those higher clocks, but what matters more is performance per watt. Further, the same (or similar) IPC at higher clocks and the same or similar power doesn't mean the architecture is standing still. In fact, we know it isn't, so this isn't just TSMC's node or increased power. True, it does mean that outside of the introduction of specialty hardware like SME, SVE2, etc., and the optimizations that take advantage of it (which is still important and still counts, so yes, we got a nice IPC uplift for any application that can use SME ... like, say, Stockfish! 🤪), we aren't getting massive leaps in single core performance beyond clock speed. The architectural changes are likely what's letting Apple keep IPC up with the clocks, and that in and of itself is interesting. It suggests that massive IPC increases in "normal" code for "normal" floating point and integer workloads are getting harder to find for Apple's wide CPU design; at least, Apple hasn't managed it in four generations (or five, counting iPhones). So eventually others may catch up in IPC, but so far the only way people have found to do so is by making wider cores, so they'll likely start hitting the same limits unless someone figures out where the bottleneck to further gains is and solves it (if it can be solved).
 

Andropov

Site Champ
I'm finding the discussion over SME at the other place (and on Twitter in general) absurd. The chip has SME support. If it didn't, it'd be a different chip. Other things would have been prioritized, which may have resulted in higher scores in different sub-benchmarks instead.

In any case, impressive upgrade! 25% faster single core, 20% faster multi core...

Updated the graph I've been keeping on Geekbench scores for Apple products btw, I don't think it's possible to interpret it as a negative/underwhelming trend:

[Attached graph: AppleGB6.png, Geekbench scores across Apple products over time]


(Not unless being intentionally obtuse, I guess).
 

dada_dave

Elite Member
Andropov said:
I'm finding the discussion over SME at the other place (and on Twitter in general) absurd. The chip has SME support. If it didn't, it'd be a different chip. Other things would have been prioritized, which may have resulted in higher scores in different sub-benchmarks instead.

In any case, impressive upgrade! 25% faster single core, 20% faster multi core...

Updated the graph I've been keeping on Geekbench scores for Apple products btw, I don't think it's possible to interpret it as a negative/underwhelming trend:

(Not unless being intentionally obtuse, I guess).
Sure, though were I being unnecessarily combative and pedantic, I would point out that the hardware for accelerating matrix operations on Apple silicon Macs has been there since the M1 (and since the A13 for iPhones, I think); it's just that this is the first time cross-platform software like Geekbench, which won't use an Apple framework like Accelerate, could access it. So at the hardware level it doesn't necessarily represent as big a step up as the Geekbench 6 results might suggest.

Okay, devil's advocate over. I agree: the hardware is there, and it can now be taken advantage of by software targeting cross-platform development, which is very important (though, as @leman cautions, we need to see what's actually supported; these results are indicative, not conclusive). And if someone wants to make the above devil's advocate argument because they hate Apple, well, then that means the M1 was also that much better than it already was!

Edit: I gotta say though, this is one of the reasons I don't like average benchmark numbers taken from lots of sub-benchmarks. Regardless of whether it's a geometric or arithmetic or whatever kind of mean, the final number is a little meaningless. The sub-benchmark scores are very meaningful and super important, but the summary statistic meant to represent them all obfuscates too much no matter how you calculate it. That's as true for SPEC as it is for Geekbench. Unfortunately, it is also incredibly convenient as a simple metric, so I'll probably still use it too!
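As a rough illustration of how much a composite can hide (the per-subtest speedups below are made up, and Geekbench 6 weights its subsections, so this is a sketch rather than its exact formula):

```python
import math

# Hypothetical M4-over-M3 speedups for six subtests; one SME-style outlier.
speedups = [1.09, 1.10, 1.12, 1.08, 1.05, 1.90]

def geomean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

print(f"composite (geometric mean): {geomean(speedups):.2f}x")       # ~1.19x
print(f"composite without outlier:  {geomean(speedups[:-1]):.2f}x")  # ~1.09x
# The single number reads "+19%", but five of the six subtests moved ~5-12%.
```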
 

leman

Site Champ
On my MacBook Air M3:

CPU: 4197
GPU: 5444
NPU: 8079

On my iPhone 15 Pro (A17Pro):
CPU: 4071
GPU: 3650
NPU: 5996

I'm not sure what I can infer from this but it certainly looks like the NPU on the M3 is at least as good as the A17. Maybe the cross-platform isn't comparable yet? And for the CPU scores, is that the AMX that makes both the M3 and A17 so similar?

One should keep in mind that CoreML does not allow precise control over where the model is executed. The NPU result could still include the CPU.



That's a bit of a silly take since they are taking one of the early M4 iPad scores vs. a top 1% score for an M3 Max in a large laptop. If one compares against higher M3 MacBook Air (also passively cooled) scores, one sees 10-20% improvements in most of the subtests. That's significantly more than what a 7% increase in clock frequency can explain.
 

Jimmyjames

Site Champ
leman said:
One should keep in mind that CoreML does not allow precise control over where the model is executed. The NPU result could still include the CPU.

That's a bit of a silly take since they are taking one of the early M4 iPad scores vs. a top 1% score for an M3 Max in a large laptop. If one compares against higher M3 MacBook Air (also passively cooled) scores, one sees 10-20% improvements in most of the subtests. That's significantly more than what a 7% increase in clock frequency can explain.
Right? It is a silly take. If we look at the Air vs. the iPad, we can see things like HTML5 Browser and PDF Renderer getting 15-20%. I don't believe those can be explained by SME. Not to mention we don't know much about these devices yet, or how SME can be accessed.

I'd also be surprised if one score could skew the overall result as much as is being claimed. I'd love to know...

Lastly, with regard to GB5 showing the "true" increase without SME, I find it a little suspicious. These benchmarks are improved as time goes on and the version number is increased. I'd be surprised if the older version was a more accurate representation of the "truth" than the new one. It strikes me a little like people who use Cinebench R23 over 24 to prove Apple silicon is slower than x86 monsters.

Edit: also, from the link above, Object Detection (the subtest that supposedly uses SME) gains +117 in single core but only +52 in multi core. It seems strange that the benefits don't show up as much in multi core even though the iPad has two extra cores.
 

leman

Site Champ
Jimmyjames said:
Lastly, with regard to GB5 showing the "true" increase without SME, I find it a little suspicious. These benchmarks are improved as time goes on and the version number is increased. I'd be surprised if the older version was a more accurate representation of the "truth" than the new one. It strikes me a little like people who use Cinebench R23 over 24 to prove Apple silicon is slower than x86 monsters.

One change from GB5 to GB6 was that they increased the dataset sizes to better reflect modern workloads. So with GB5 it is likely that there are fewer cache misses. It is entirely possible that the M4 is only able to press an IPC advantage when things get more complicated. Or maybe they made the caches larger :)

Jimmyjames said:
Edit: also, from the link above, Object Detection (the subtest that supposedly uses SME) gains +117 in single core but only +52 in multi core. It seems strange that the benefits don't show up as much in multi core even though the iPad has two extra cores.

That’s less surprising since AMX units are shared resources for all CPUs in the cluster. I’d guess M4 has two of them - one for the four P-cores and one for the six E-cores.
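A toy model of why a shared unit caps the multi-core gain (the two-unit layout is the guess above, and the throughput numbers are invented):

```python
# If each cluster shares one matrix unit, matrix throughput scales with the
# number of units, not with the number of cores issuing SME instructions.
cores = 10                    # assumed 4 P-cores + 6 E-cores
shared_units = 2              # guessed: one unit per cluster
unit_throughput = 1.0         # arbitrary work/sec per matrix unit

single_core = 1 * unit_throughput             # one core gets a whole unit
multi_core = shared_units * unit_throughput   # ten cores still split two units

print(f"single-core matrix throughput: {single_core:.1f}")
print(f"multi-core matrix throughput:  {multi_core:.1f} (2x, not {cores}x)")
# So a matrix-heavy subtest can gain a lot in single core while the
# multi-core run bottlenecks on the shared units.
```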
 

dada_dave

Elite Member
leman said:
One should keep in mind that CoreML does not allow precise control over where the model is executed. The NPU result could still include the CPU.

That's a bit of a silly take since they are taking one of the early M4 iPad scores vs. a top 1% score for an M3 Max in a large laptop. If one compares against higher M3 MacBook Air (also passively cooled) scores, one sees 10-20% improvements in most of the subtests. That's significantly more than what a 7% increase in clock frequency can explain.
Isn’t it 4.4/4.05 ~= 1.086 or ~9% rounded?

Also, I've seen M3 Maxes with scores of 3048 on the latest Geekbench 6.3. I think it's less that the M3 Max has any extra cooling (especially on a single core test) and more that reported Geekbench results are simply that variable. We don't have a good sense yet of where the currently reported M4 results will settle. Another issue is that different minor Geekbench versions include tweaks to the subtests in the change log, which makes me worry they are not always directly comparable even when they are all normalized against the same processor.

Jimmyjames said:
Right? It is a silly take. If we look at the Air vs. the iPad, we can see things like HTML5 Browser and PDF Renderer getting 15-20%. I don't believe those can be explained by SME. Not to mention we don't know much about these devices yet, or how SME can be accessed.

I'd also be surprised if one score could skew the overall result as much as is being claimed. I'd love to know...

Lastly, with regard to GB5 showing the "true" increase without SME, I find it a little suspicious. These benchmarks are improved as time goes on and the version number is increased. I'd be surprised if the older version was a more accurate representation of the "truth" than the new one. It strikes me a little like people who use Cinebench R23 over 24 to prove Apple silicon is slower than x86 monsters.

With regard to GB6 vs. GB5, it's more that John Poole wanted to update Geekbench to test different features that he felt would be more useful as a metric for users going forward (more AI), and also to change how multicore is tested, presumably because he felt CPU makers were overselling how useful many-core systems would be for most users' needs. There's not really a problem with the GB5 CPU test, especially not single core, in the way that CB R23 had a problem. There is a bit of an update to the working set sizes, but not to the extent that, say, an old graphics benchmark simply doesn't stress a modern GPU. Even GB5 multicore is still a decent test (except on Windows, although someone mentioned GB6 might still have a problem there; I can't remember). It's just different, with different priorities; as long as one is cognizant of that and of how things have changed and why, it can still be a useful tool. In this instance, since we know it won't have SME, it's useful for that singular purpose, with the context that it naturally isn't comparable to or better than GB6.

And that's the thing: SME is no more a cheat than Nvidia adding tensor cores and AI benchmarks on the GPU speeding up as a result. That said, how important it is will depend on how often it gets used in real software, as opposed to, say, the program targeting the GPU or the NPU for that task, or using CoreML and letting the computer decide where to run it.

Personally, I like @leman's explanation from the other thread that SME can be thought of as a more thoughtful way to execute a lot of the same kinds of tasks that AVX-512 was meant for, and I think it'll see use.

As for their calculations, I think they did make a slight mistake; I saw someone post a follow-up from them saying it was about 3% even in their example. I had a little trouble recreating their exact numbers (I think I'm making a mistake somewhere), but it was close enough.

Beyond Object Detection, though, yeah: we definitely see some subtests with better IPC improvements, and at least one (maybe more?) with a possible IPC regression dragging the average down, and that's going to happen with clock speed increases. Apple has clearly changed the architecture; some tasks benefit, other tasks had more trouble keeping up with the clock speed increases, and the weighted arithmetic mean of the geometric means of the non-Object Detection IPC increases looks middling. But if SME is set to become super useful, well then ... I don't care? Also, if my tasks fall into the tests that showed the best increases, I don't care what the average is either. Of course the opposite is true as well, but that's why I'm not such a fan of the averages.
 

dada_dave

Elite Member
I'm really curious about the GPU and E-core changes. They increased the GPU compute score with no extra cores, and I'm interested in whether they raised clocks, did something microarchitectural, or both. And of course the E-cores have been receiving the biggest changes for a while. So for the new multicore score, given the changes to the P-cores plus the two extra E-cores, I'll be fascinated to see how much their performance has improved (or not) this generation.
 