Cinebench 2024

Jimmyjames

Site Champ
Posts
675
Reaction score
763
New release of everyone’s favourite benchmark! Seems like a good improvement. They now include a GPU benchmark. As you might imagine, Apple Silicon Macs get slaughtered by the 4090. IIRC, the 4090 gets around 30,000 while an M2 Ultra gets ~8,500 to 9,000. I would guess this is partly down to the lack of ray tracing cores?

Will the M3 close the gap?

 
Last edited:

dada_dave

Elite Member
Posts
2,164
Reaction score
2,149
New release of everyone’s favourite benchmark! Seems like a good improvement. They now include a GPU benchmark. As you might imagine, Apple Silicon Macs get slaughtered by the 4090. IIRC, the 4090 gets around 30,000 while an M2 Ultra gets ~8,500 to 9,000. I would guess this is partly down to the lack of ray tracing cores?
Almost entirely, one would think (well, beyond the obvious difference in the size of the GPUs, of course).

Will the M3 close the gap?
Hopefully! But if you want to read clickbait doomerism, I could link to Macworld articles that say (based on Gurman rumors that Apple hasn’t planned anything yet) that the M3 will be so unspectacular that Apple won’t even have an event to launch it.
 

Jimmyjames

Site Champ
Posts
675
Reaction score
763
Almost entirely, one would think (well, beyond the obvious difference in the size of the GPUs, of course).


Hopefully! But if you want to read clickbait doomerism, I could link to Macworld articles that say (based on Gurman rumors that Apple hasn’t planned anything yet) that the M3 will be so unspectacular that Apple won’t even have an event to launch it.
Lol, Macworld has been surprisingly negative lately. We’ll see; according to Cliff, the M3 should be a nice improvement.
 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,149
The AMD 7900 XTX gets around 15,000, significantly more than the M2 Ultra. In Blender they are close. I’m guessing Redshift is just much better optimized?

This does put into perspective just how far behind Apple’s GPUs are for this kind of work, especially given the cost. People are reporting around 37,000 for a 4090. Oof.
I’m having trouble finding the numbers. Is there a chart, or are you getting this from Reddit/YouTube?
 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,149
I believe AMD 7000 GPUs have hardware raytracing too so I’m a little surprised that they’re close to the M2 Ultra in Blender. Or is that what you meant by redshift being better optimized (for AMD)?
 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,149
CPU roundup from Ian Cutress:

[attached: two charts of Ian Cutress’s Cinebench 2024 CPU roundup]

 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,149
My sense, judging by these scores, is that whatever was dragging down Apple Silicon CPU scores has been largely eliminated. It remains a particular kind of multithreaded test (large numbers of lightweight threads are ideal, hence why it maps so well to GPU processing), but that’s okay.

Comparing the M1 Ultra and 5950X (the latter is overclocked in Ian’s numbers) on R23 vs. 2024, from CPU Monkey:

2024:
M1 Ultra: 1624
5950X: 1494

R23:
M1 Ultra: 24189
5950X: 28577

A massive shift. I would expect an M2 Ultra to score around 1,900, which would put it just below a 7950X3D and above a 13900K according to Ian’s numbers. Interestingly, CPU Monkey has higher scores for the 13900K and 7950X3D than Ian does. Based on the GHz, they look like overclocked results on CPU Monkey (edit: actually not overclocked as far as I can tell, so I’m unsure about the discrepancy).
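To make the shift concrete, here it is as a quick ratio calculation (a trivial sketch in Python, using only the CPU Monkey numbers above):

```python
# How the M1 Ultra / 5950X ratio flips between R23 and 2024,
# using the CPU Monkey figures quoted above.
r24 = {"M1 Ultra": 1624, "5950X": 1494}
r23 = {"M1 Ultra": 24189, "5950X": 28577}

print(f"2024: M1 Ultra / 5950X = {r24['M1 Ultra'] / r24['5950X']:.2f}")  # ~1.09
print(f"R23:  M1 Ultra / 5950X = {r23['M1 Ultra'] / r23['5950X']:.2f}")  # ~0.85
```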



Basically, entirely reasonable Cinebench 2024 CPU results for Apple Silicon now.

CC: @leman
 
Last edited:

Jimmyjames

Site Champ
Posts
675
Reaction score
763
I believe AMD 7000 GPUs have hardware raytracing too so I’m a little surprised that they’re close to the M2 Ultra in Blender. Or is that what you meant by redshift being better optimized (for AMD)?
I’m no expert in Redshift. I’m certain it’s much better optimized for Nvidia, and it definitely uses RT on that platform. It’s less clear on AMD; I’ve seen conflicting reports. Redshift uses HIP but not HIP-RT…I think!
 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,149
My sense, judging by these scores, is that whatever was dragging down Apple Silicon CPU scores has been largely eliminated. It remains a particular kind of multithreaded test (large numbers of lightweight threads are ideal, hence why it maps so well to GPU processing), but that’s okay.

Comparing the M1 Ultra and 5950X (the latter is overclocked in Ian’s numbers) on R23 vs. 2024, from CPU Monkey:

2024:
M1 Ultra: 1624
5950X: 1494

R23:
M1 Ultra: 24189
5950X: 28577

A massive shift. I would expect an M2 Ultra to score around 1,900, which would put it just below a 7950X3D and above a 13900K according to Ian’s numbers. Interestingly, CPU Monkey has higher scores for the 13900K and 7950X3D than Ian does. Based on the GHz, they look like overclocked results on CPU Monkey (edit: actually not overclocked as far as I can tell, so I’m unsure about the discrepancy).



Basically, entirely reasonable Cinebench 2024 CPU results for Apple Silicon now.

CC: @leman
New chart from Ian. The CPU Monkey results for the 7950X3D and 13900K may be overclocked after all?

[attached: updated Cinebench 2024 CPU chart from Ian]


Hard to keep track
 

leman

Site Champ
Posts
641
Reaction score
1,196
Just a few low-confidence observations, don’t take them too seriously:

- The new test likely has much larger data structures to traverse, which would explain why Apple, with its larger caches, does so much better on 2024 compared to R23. x86 CPUs still have a large advantage when it comes to SIMD and cache throughput, but you can’t use that advantage if you are experiencing cache misses (another piece of evidence for this is that Intel’s power consumption when running the new test is lower than on the old one); see the sketch after this list.

- AMD does exceptionally well in the GPU test, while Apple does worse than expected. I am inclined to speculate that the Metal backend is a bit suboptimal.
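A minimal sketch of the cache-miss point from the first observation above (illustrative only, not a claim about what Cinebench actually does; assumes NumPy is installed, and absolute timings depend on the machine):

```python
# Same reduction over the same data, but the random gather defeats the
# caches while the sequential pass streams through them.
import time
import numpy as np

n = 1 << 25                      # ~32M float64 (~256 MB), far larger than any L3
data = np.random.rand(n)
orders = {"sequential": np.arange(n), "random": np.random.permutation(n)}

for name, idx in orders.items():
    t0 = time.perf_counter()
    total = data[idx].sum()      # fancy-index gather, then a SIMD-friendly sum
    print(f"{name:10s} {time.perf_counter() - t0:.3f}s (sum={total:.1f})")
```

On typical hardware the random order comes out markedly slower, even though the arithmetic is identical.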
 

Jimmyjames

Site Champ
Posts
675
Reaction score
763
- AMD does exceptionally well in the GPU test, while Apple does worse than expected. I am inclined to speculate that the Metal backend is a bit suboptimal.
That’s my hope. My fear is that AMD is under-optimised in Blender etc.!
 

leman

Site Champ
Posts
641
Reaction score
1,196
That’s my hope. My fear is that AMD is under-optimised in Blender etc.!

That is of course also possible. But on the other hand, Blender scores seem to be predicted very well by the compute capability of the card. If we disregard Nvidia’s OptiX for the moment (since it catapults Nvidia into an entirely new category), we get the following scores:

CUDA 3080 (30 TFLOPS) ~ 3000
CUDA 3070 (20 TFLOPS) ~ 2111
CUDA 3060 (13 TFLOPS) ~ 1339

HIP RX 7900 XTX (61 TFLOPS*) ~ 3700
HIP RX 6800 XT (20 TFLOPS) ~ 2342
HIP RX 6700 XT (13 TFLOPS) ~ 1487
HIP RX 6600 XT (10 TFLOPS) ~ 1057

M2 Ultra (27 TFLOPS) ~ 3400
M2 Max (13 TFLOPS) ~ 1800

As we can see, the fit between FLOP throughput and score is fairly good across manufacturers. There is some variation, of course: AMD is a bit faster per FLOP than Nvidia/CUDA, probably thanks to their RT acceleration (which, primitive as it is, does help a little bit), and Apple is the fastest of the bunch per FLOP (probably because Apple has optimised the hell out of it by now), but there is no huge disparity. I would assume that the CUDA backend is well maintained and well optimised, which puts the results roughly where they are expected. Cinebench showing a different distribution of results could be due either to weaker optimisation on some platforms or to some difference in the scene data. No easy way to check this, unfortunately...

*The RX 7900 XTX is obviously an outlier, but that is because of how AMD advertises its compute capability. They claim that Navi3 has doubled the execution units compared to Navi2, but this is a bit misleading: Navi3 can indeed execute two FLOPs per unit per clock (in a kind of SIMD-within-SIMD fashion), but this is subject to a lot of restrictions and won’t work every time. My entirely uneducated guess is that only around 20% of real-world operations can be "fused" to benefit from this capability, which would put the corrected TFLOPS of the 7900 XTX somewhere around 34-36 TFLOPS, and then the data fits again.
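As a rough sanity check of both the fit and the footnote’s correction, here is the arithmetic over the scores above (a sketch only; the 20% fusion rate is the footnote’s guess, not a measured figure):

```python
# Points per TFLOP for the Blender scores quoted above.
scores = {
    "RTX 3080 (CUDA)":  (30, 3000),
    "RTX 3070 (CUDA)":  (20, 2111),
    "RTX 3060 (CUDA)":  (13, 1339),
    "RX 6800 XT (HIP)": (20, 2342),
    "RX 6700 XT (HIP)": (13, 1487),
    "RX 6600 XT (HIP)": (10, 1057),
    "M2 Ultra (Metal)": (27, 3400),
    "M2 Max (Metal)":   (13, 1800),
}
for name, (tflops, score) in scores.items():
    print(f"{name:18} {score / tflops:6.1f} points/TFLOP")  # ~100-140 across the board

# The footnote's correction for the 7900 XTX: halve the advertised dual-issue
# figure, then credit the ~20% of operations guessed to actually fuse.
advertised, fuse_rate = 61.0, 0.20
effective = advertised / 2 * (1 + fuse_rate)                # ~36.6 TFLOPS
print(f"7900 XTX: ~{effective:.1f} TFLOPS effective, {3700 / effective:.1f} points/TFLOP")
```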
 

Jimmyjames

Site Champ
Posts
675
Reaction score
763
Very interesting. Many thanks.

I’m hoping the upcoming iPhone event will give us some cause for optimism on the GPU front. All the leaks I’ve seen relate to the CPU, but for me the GPU is the most interesting, probably because they have the furthest to go there. I had hoped Apple might catch and surpass the 4090 with the M3 Ultra. That looks like a tall order now.
 

leman

Site Champ
Posts
641
Reaction score
1,196
I had hoped Apple might catch and surpass the 4090 with the M3 Ultra. That looks like a tall order now.

That's a tall order indeed, mostly for two reasons: area and frequency.

First is the die area. Nvidia can dedicate pretty much the entire die to GPU functionality, while Apple has to fit other processors and units as well. E.g. ~60% of a 4090 is occupied by GPU compute clusters (the rest being caches and memory support), while only 20% of the M2 Max is dedicated to the GPU cores. Also, Nvidia has heavily invested in optimising their compute density per mm2, requiring at least 25% less die area than Apple for the same number of compute units on the 5N node, and that also includes things like RT and matrix accelerators. Nvidia's optimisations may result in slightly lower compute cluster efficiency, but they can afford it. At any rate, the end result is that Nvidia can currently pack 16384 compute ALUs into a 600mm2 die where Apple can only pack 4864 ALUs into a 500mm2 die.

Secondly, Nvidia runs their GPUs at much higher clocks than Apple: 2.2GHz+ vs. 1.4GHz. Combined with the superior compute density, this puts Nvidia very far ahead in terms of absolute performance. They obviously pay a huge cost in power consumption, but that’s less relevant on the desktop. Besides, they could run the GPU at the same 1.4GHz and still be ~2.5x faster for the same die area.
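A back-of-envelope check of those two factors, using only the figures quoted in this post and treating peak throughput as proportional to ALU count × clock (a sketch, not a real performance model):

```python
# Back-of-envelope from the figures above; FLOPs per ALU cancel in the ratios.
nv_alus, nv_clock, nv_area = 16384, 2.2, 600  # RTX 4090: ALUs, GHz, die mm^2
ap_alus, ap_clock, ap_area = 4864, 1.4, 500   # M2 Max:   ALUs, GHz, die mm^2

absolute = (nv_alus * nv_clock) / (ap_alus * ap_clock)
iso_clock_per_area = (nv_alus / nv_area) / (ap_alus / ap_area)
print(f"absolute throughput gap:   ~{absolute:.1f}x")            # ~5.3x
print(f"same clock, per die area:  ~{iso_clock_per_area:.1f}x")  # ~2.8x
```

The ~2.8x per-area figure lands in the same ballpark as the ~2.5x iso-clock estimate above.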

So yeah, I kind of doubt that Apple can catch up any time soon. UltraFusion is a way to scale up by throwing more money at the problem, but it’s not ideal either. Maybe they will get a slight boost with 3N, allowing them to pack in more compute and clock it higher, but they still won’t be able to reach the compute capability of a 4090 with a single chip.
 

Jimmyjames

Site Champ
Posts
675
Reaction score
763
So yeah, I kind of doubt that Apple can catch up any time soon.
So to me, that begs the question: what’s the point of Apple Silicon on the desktop?

If you can build a PC with 2x 4090s for the price of an Ultra, what is the end game for Apple? Why would people buy it, and therefore what motivates developers to optimise for Apple Silicon’s unique architecture?

I recently saw a discussion on Mastodon where two ex-Apple devs briefly discussed the problems they have getting their current companies to invest in Metal and Apple Silicon’s way of doing things. There isn’t a great deal of optimism there. Then there are the recent games released for the Mac, with help from Apple. The performance seems OK at best, often struggling to match a PC.

I feel that to have a chance on the desktop, ASi has to have significantly better performance than the competition, given the massive entrenched market there. In terms of CPU, they are really close. In terms of GPU, it feels like they are miles away.
 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,149
So to me, that begs the question: what’s the point of Apple Silicon on the desktop?

If you can build a PC with 2x 4090s for the price of an Ultra, what is the end game for Apple? Why would people buy it, and therefore what motivates developers to optimise for Apple Silicon’s unique architecture?

I recently saw a discussion on Mastodon where two ex-Apple devs briefly discussed the problems they have getting their current companies to invest in Metal and Apple Silicon’s way of doing things. There isn’t a great deal of optimism there. Then there are the recent games released for the Mac, with help from Apple. The performance seems OK at best, often struggling to match a PC.

I feel that to have a chance on the desktop, ASi has to have significantly better performance than the competition, given the massive entrenched market there. In terms of CPU, they are really close. In terms of GPU, it feels like they are miles away.
How does Baldur’s Gate 3 run on Mac vs PC?
 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,149
That is of course also possible. But on the other hand, Blender scores seem to be predicted very well by the compute capability of the card. If we disregard Nvidia’s OptiX for the moment (since it catapults Nvidia into an entirely new category), we get the following scores:

CUDA 3080 (30 TFLOPS) ~ 3000
CUDA 3070 (20 TFLOPS) ~ 2111
CUDA 3060 (13 TFLOPS) ~ 1339

HIP RX 7900 XTX (61 TFLOPS*) ~ 3700
HIP RX 6800 XT (20 TFLOPS) ~ 2342
HIP RX 6700 XT (13 TFLOPS) ~ 1487
HIP RX 6600 XT (10 TFLOPS) ~ 1057

M2 Ultra (27 TFLOPS) ~ 3400
M2 Max (13 TFLOPS) ~ 1800

As we can see, the fit between FLOP throughput and score is fairly good across manufacturers. There is some variation, of course: AMD is a bit faster per FLOP than Nvidia/CUDA, probably thanks to their RT acceleration (which, primitive as it is, does help a little bit), and Apple is the fastest of the bunch per FLOP (probably because Apple has optimised the hell out of it by now), but there is no huge disparity. I would assume that the CUDA backend is well maintained and well optimised, which puts the results roughly where they are expected. Cinebench showing a different distribution of results could be due either to weaker optimisation on some platforms or to some difference in the scene data. No easy way to check this, unfortunately...

*The RX 7900 XTX is obviously an outlier, but that is because of how AMD advertises its compute capability. They claim that Navi3 has doubled the execution units compared to Navi2, but this is a bit misleading: Navi3 can indeed execute two FLOPs per unit per clock (in a kind of SIMD-within-SIMD fashion), but this is subject to a lot of restrictions and won’t work every time. My entirely uneducated guess is that only around 20% of real-world operations can be "fused" to benefit from this capability, which would put the corrected TFLOPS of the 7900 XTX somewhere around 34-36 TFLOPS, and then the data fits again.
How did you get non-ray-tracing results for Blender? When I was playing with the database, it was either ray-tracing results or older versions of Blender.
 

Jimmyjames

Site Champ
Posts
675
Reaction score
763
How does Baldur’s Gate 3 run on Mac vs PC?
Good question. I don’t know, unfortunately. I thought it was going to be released on the 6th, but I was listening to the Mac Game Cast a couple of days ago, and the host said it had been delayed due to “performance problems”. Interestingly, performance was OK in the early access beta, so who knows what’s happened!
 