Apple: M1 vs. M2

mr_roboto

Site Champ
Posts
287
Reaction score
464
If the benchmark turns out to be true, it'd mean +15.6% Single Core and +20.8% Multicore over the Mac Studio. The multicore score scaling better than 8x the P-core score should be due to either the 2 extra E cores or the improvements in the µarch of the E cores on the A15. Maybe both. I'm saying should because the M1 Pro/Max had the E cores running at 2GHz (vs 1GHz on the regular M1) when under high load [source], and now that the M2 Pro/Max apparently has 4 E cores, that design decision may have changed. Maybe the M2 Pro/Max E cores only go up to 1GHz, in which case the full difference in scores would be because of the µarch improvement in those cores.
That article is a bit confusing - it doesn't make it totally clear that it only covers a subset of the system's behavior.

The context is that macOS won't schedule low-QoS (background) threads on P cores under any circumstance, even when there's enough background work to use 100% of all E cores. However, the opposite is not true. Higher-prio threads are preferentially scheduled on P cores, but when all P cores are occupied, macOS is allowed to run them on E cores.
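
You can see this split from userspace with plain GCD; a minimal sketch (just the public QoS API, nothing exotic):

Code:
import Dispatch
import Foundation

// .background is the low-QoS band: macOS keeps these threads on E cores
// no matter how idle the P cluster is.
DispatchQueue.global(qos: .background).async {
    var x: UInt64 = 0
    while true { x = x &+ 1 }  // spins on an E core indefinitely
}

// .userInitiated prefers P cores, but is allowed to spill onto E cores
// once every P core is busy.
DispatchQueue.global(qos: .userInitiated).async {
    print("QoS class:", qos_class_self().rawValue)
}

dispatchMain()  // keep the process alive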

When the E cluster is under 100% load, and that load consists exclusively of background work, M1's 4-core E cluster is software-capped at 1 GHz, but M1 Pro/Max's 2-core E cluster is allowed to run at the full 2 GHz. Presumably, Apple did this so that Pro/Max wouldn't suffer any regression in background compute throughput compared to the base M1.

But as soon as any higher-prio thread runs on an E core, the E cluster's frequency is uncapped. I played around with this on an M1 Air quite a bit. It's easy to make its E cluster stay at 2 GHz indefinitely, even under sustained loads which heat the computer up and force its P cluster to throttle down.
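
Something along these lines is enough to reproduce it (a sketch, not my exact harness; watch the E-cluster frequency with sudo powermetrics --samplers cpu_power in another terminal):

Code:
import Dispatch

// One busy thread per M1 E core. With .background the E cluster should
// sit at ~1 GHz on base M1; switch to .utility (or anything above
// background) and the cap should come off.
let qos: DispatchQoS.QoSClass = .background
for _ in 0..<4 {
    DispatchQueue.global(qos: qos).async {
        var x: UInt64 = 0
        while true { x = x &+ 1 }  // pure ALU spin, no syscalls
    }
}
dispatchMain()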

Benchmarks like GB5 don't use low priority bands for their threads, as far as I know, so they don't measure the 1 GHz E-cluster behavior on base M1.
 

leman

Site Champ
Posts
638
Reaction score
1,186
Yeah, M1 is the first generation. These are core designs shared with iPhone, and the yearly phone release cycle is a big cash cow for Apple, so a conservative approach makes sense. They would not have wanted the Mac projects to add much risk before they were fully committed to the Mac transition, and at kickoff time for the A14/M1 generation of Apple Silicon, they probably did not know yet whether they were fully committed.

I don't think this is about commitment (they were 100% committed by the time that WWDC announcement came), but about risk management. Apple plays a long game here. A conservative approach makes a lot of business sense, especially if your tech is already this good. I'm sure there are more interesting things to come.

For example, Maynard Handley has found some newer Apple patents (https://patents.google.com/patent/US20220334997A1 https://patents.google.com/patent/US20220342588A1) that point to more aggressive use of multi-chip technology in the future. Some big things are likely coming.
 

dada_dave

Elite Member
Posts
2,162
Reaction score
2,145
When the E cluster is under 100% load, and that load consists exclusively of background work, M1's 4-core E cluster is software-capped at 1 GHz, but M1 Pro/Max's 2-core E cluster is allowed to run at the full 2 GHz. Presumably, Apple did this so that Pro/Max wouldn't suffer any regression in background compute throughput compared to the base M1.

But as soon as any higher-prio thread runs on an E core, the E cluster's frequency is uncapped. I played around with this on an M1 Air quite a bit. It's easy to make its E cluster stay at 2 GHz indefinitely, even under sustained loads which heat the computer up and force its P cluster to throttle down.

Benchmarks like GB5 don't use low priority bands for their threads, as far as I know, so they don't measure the 1 GHz E-cluster behavior on base M1.

Interesting! I was unaware of that latter behavior of the M1 E cores with priority threads; I assumed they were completely capped at 1GHz vs the M1 Pro/Max at 2GHz.

I don't think this is about commitment (they were 100% committed by the time that WWDC announcement came), but about risk management. Apple plays a long game here. A conservative approach makes a lot of business sense, especially if your tech is already this good. I'm sure there are more interesting things to come.

For example, Maynard Handley has found some newer Apple patents (https://patents.google.com/patent/US20220334997A1 https://patents.google.com/patent/US20220342588A1) that point to more aggressive use of multi-chip technology in the future. Some big things are likely coming.

I think he meant at the start of the design of the A14/M1 chip family, which would’ve been years before the WWDC announcement. But even so, I agree it’s not about commitment. Rather, being conservative in some aspects of their design for the first generation of larger SoCs probably eased some of the design issues, allowed for different development priorities, etc …
 
Last edited:

mr_roboto

Site Champ
Posts
287
Reaction score
464
I don't think this is about commitment (they were 100% committed by the time that WWDC announcement came), but about risk management. Apple plays a long game here. A conservative approach makes a lot of business sense, especially if your tech is already this good. I'm sure there are more interesting things to come.
I think he meant at the start of the design of the A14/M1 chip family, which would’ve been years before the WWDC announcement. But even so, I agree it’s not about commitment. Rather, being conservative in some aspects of their design for the first generation of larger SoCs probably eased some of the design issues, allowed for different development priorities, etc …
Yep, that's what I was going for, worded poorly. I do think Apple was fully committed to transitioning the Mac when they kicked off A14/M1 development, just not fully committed to doing it with A14-generation AS. The start dates for those projects had to be so long before fall 2020 that there's no way they could have had full confidence everything would be ready for Mac product launch on time. I would be astonished if they made no contingency plans for delaying the Mac AS launch to a later AS generation.

On the flip side, they would have planned the A14/M1 generation to de-risk both iOS devices and Mac. No severe rocking of the boat allowed. Never designed a P core targeted at an Fmax higher than what's appropriate for a phone or tablet before? Well, is that Fmax likely to be good enough for Mac? If so, kick that can down the road a little.
 

leman

Site Champ
Posts
638
Reaction score
1,186
Yep, that's what I was going for, worded poorly. I do think Apple was fully committed to transitioning the Mac when they kicked off A14/M1 development, just not fully committed to doing it with A14-generation AS. The start dates for those projects had to be so long before fall 2020 that there's no way they could have had full confidence everything would be ready for Mac product launch on time. I would be astonished if they made no contingency plans for delaying the Mac AS launch to a later AS generation.

On the flip side, they would have planned the A14/M1 generation to de-risk both iOS devices and Mac. No severe rocking of the boat allowed. Never designed a P core targeted at an Fmax higher than what's appropriate for a phone or tablet before? Well, is that Fmax likely to be good enough for Mac? If so, kick that can down the road a little.

Thanks for clarifying; I now better understand what you meant, and yes, I agree with you entirely.

This is also why I don't believe it makes much sense to draw far-reaching conclusions about Apple's strategy just from the M1 and M2 families.
 

dada_dave

Elite Member
Posts
2,162
Reaction score
2,145
Someone on ars observed that that reported model number seemed off

According to Macrumors, there are two new model numbers in the November Steam Survey - one of which is indeed 14,6 (and the other, interestingly, is 15,4). So that’s additional support for 14,6 being a real model number.
 

Andropov

Site Champ
Posts
617
Reaction score
776
Location
Spain
As for your second question, the variability comes from variability in each process step. Each mask layer has tolerances. For example, you need to align each mask. So in step 1, say you use a mask to determine where photoresist goes. Then you etch. Then you deposit metal. Then you mask again so you can etch away some of the metal. But the new mask may not be perfectly aligned with where the first mask was. The tolerances are incredibly tight.

You are also doping the semiconductor. It’s impossible to get it exactly the same twice. The wafer has curvature to it (imperceptible to a human eye). So chips at the edges are a little different than chips in the middle. Etc. etc.

The dimensions and number of atoms we are talking about are so small that it’s hard to keep everything identical at all times. Small changes in humidity, slight differences in the chemical composition of etchants or dopants, maybe somebody sneezed in the clean room. So many things can affect the end result. Vertical cross-sections of wires are never the same on two chips (if you look at them with a powerful-enough microscope). Etc. etc.
Oh I see. It's easy to forget how close to the size of atoms these things are. Thanks!
 

theorist9

Site Champ
Posts
613
Reaction score
563
And, FWIW, back in June a developer named Pierre Blazquez claimed he found the model numbers 14,5, 14,6, and 14,8 in Apple code: https://appleinsider.com/articles/22/07/05/apple-is-preparing-three-new-mac-studio-models
 
Last edited:

dada_dave

Elite Member
Posts
2,162
Reaction score
2,145
And, FWIW, back in June a developer named Pierre Blazquez claimed he found the model numbers 14,5, 14,6, and 14,7 in Apple code: https://appleinsider.com/articles/22/07/05/apple-is-preparing-three-new-mac-studio-models
Do you think the 15,4 is real or some weird mistake in the reporting of the hardware and meant to be 14,5? I mean if it really is meant to be 15,4 that could be interesting! That should be an M3 chip undergoing testing, yes? Or have I got that wrong?
 

theorist9

Site Champ
Posts
613
Reaction score
563
Do you think the 15,4 is real or some weird mistake in the reporting of the hardware and meant to be 14,5? I mean if it really is meant to be 15,4 that could be interesting! That should be an M3 chip undergoing testing, yes? Or have I got that wrong?
Sorry, no idea. I've never bothered to try to figure out their numbering system. :).
 

Andropov

Site Champ
Posts
617
Reaction score
776
Location
Spain
Do you think the 15,4 is real or some weird mistake in the reporting of the hardware and meant to be 14,5? I mean if it really is meant to be 15,4 that could be interesting! That should be an M3 chip undergoing testing, yes? Or have I got that wrong?
Not necessarily. MacBook Pro M1 13" is MacBookPro17,1, and MacBook Pro M1 Pro/Max are MacBookPro18,X. BTW, the ID Mac14,7 is already in use: the 13" M2 MacBook Pro. No idea why they dropped the "Book" from the model ID.
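
For reference, that model identifier is just the hw.model sysctl, which is presumably what the Steam survey is reading. A quick sketch:

Code:
import Foundation

// Read the Mac model identifier (e.g. "Mac14,7" or "MacBookPro18,3"),
// the same string `sysctl hw.model` prints in Terminal.
func modelIdentifier() -> String? {
    var size = 0
    guard sysctlbyname("hw.model", nil, &size, nil, 0) == 0 else { return nil }
    var buf = [CChar](repeating: 0, count: size)
    guard sysctlbyname("hw.model", &buf, &size, nil, 0) == 0 else { return nil }
    return String(cString: buf)
}

print(modelIdentifier() ?? "unknown")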
 

theorist9

Site Champ
Posts
613
Reaction score
563
Not necessarily. MacBook Pro M1 13" is MacBookPro17,1, and MacBook Pro M1 Pro/Max are MacBookPro18,X. BTW, the ID Mac14,7 is already in use: the 13" M2 MacBook Pro. No idea why they dropped the "Book" from the model ID.
Ah, sorry, I wrote "14,7" when I should have written "14,8". I just corrected that in my post.
 

theorist9

Site Champ
Posts
613
Reaction score
563
My theory was that image processing likely uses the math library, which is not optimized for M1.
I got my hands on an M1 Pro MacBook Pro, which allowed me to do more detailed testing, and was able to get a better idea of what was causing Mathematica's image processing to be so much slower on the M1 than on my 2019 iMac (i9-9900K). There were two specific commands that were responsible for the difference: Sharpen and Blur. Looking at just those by themselves, the iMac was 9x faster on Sharpen (3.5 s vs. 32.5 s), and 14x faster on Blur (2.3 s vs. 32.0 s).

Further, I was able to do a detailed comparison of GB 6.2.1 SC subscores on the two machines. All were higher on the M1 than on the iMac, except for the Background Blur task, which was 15% lower (see screenshot). Things are improved with the M2, but its GB score for this task is still 6% lower than my iMac's.

Thus these image processing tasks, when done by the CPU, seem to represent an inherent challenge for AS. Interestingly, other programs also have issues with blur tasks on AS, which are fixed by enabling GPU hardware acceleration (something I've read is not built into MMA functions; and, of course, it wouldn't operate in GB's CPU benchmark):
https://github.com/brave/brave-browser/issues/26186

So could it be Apple didn't bother optimizing CPU-based image processing because it assumes those doing image processing will be using GPU acceleration?
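
For comparison, the GPU path Apple would rather you take is cheap to adopt; a sketch with Core Image (the input path is a placeholder):

Code:
import CoreImage
import CoreImage.CIFilterBuiltins

// GPU-accelerated Gaussian blur via Core Image. CIContext renders with
// Metal on Apple Silicon, so none of this touches the CPU SIMD path
// that GB's CPU benchmark (or, apparently, Mathematica's Blur) uses.
let context = CIContext()
let blur = CIFilter.gaussianBlur()

if let input = CIImage(contentsOf: URL(fileURLWithPath: "/tmp/input.png")) {
    blur.inputImage = input.clampedToExtent()  // avoid dark fringes at edges
    blur.radius = 10
    if let output = blur.outputImage?.cropped(to: input.extent),
       let cgImage = context.createCGImage(output, from: output.extent) {
        print("Blurred \(cgImage.width)x\(cgImage.height) on the GPU")
    }
}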

[Screenshot: GB 6.2.1 single-core subscore comparison between the M1 and the i9-9900K iMac]
 
Last edited:

leman

Site Champ
Posts
638
Reaction score
1,186
Thus these image processing tasks, when done by the CPU, seem to represent an inherent challenge for AS. Interestingly, other programs also have issues with blur tasks on AS, which are fixed by enabling GPU hardware acceleration (something I've read is not built into MMA functions; and, of course, it wouldn't operate in GB's CPU benchmark):
https://github.com/brave/brave-browser/issues/26186

CPU code doing these things relies on high-throughput SIMD operations, and x86 CPUs have an advantage here simply because of their higher clocks. Both Apple Silicon and any modern x86 CPU are capable of roughly 512 bits' worth of SIMD operations per clock, but x86 is clocked higher.
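
To make that concrete, the hot loop in a filter like this is just vectorized pixel math; a toy sketch (treating one SIMD4<Float> as one RGBA pixel):

Code:
import simd

// Toy 3-tap horizontal box blur. One SIMD4<Float> is one RGBA pixel, so
// each vector add blends all four channels at once. Loops like this run
// at (SIMD ops per clock) x (clock), which is why a ~5 GHz x86 core
// outruns a ~3.2 GHz M1/M2 core of roughly equal SIMD width.
func blurRow(_ pixels: [SIMD4<Float>]) -> [SIMD4<Float>] {
    guard pixels.count > 2 else { return pixels }
    var out = pixels
    for i in 1..<(pixels.count - 1) {
        out[i] = (pixels[i - 1] + pixels[i] + pixels[i + 1]) / 3
    }
    return out
}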
So could it be Apple didn't bother optimizing CPU-based image processing because it assumes those doing image processing will be using GPU acceleration?

Achieving high SIMD throughput on the CPU is expensive, both in terms of power consumption and die area. You need wide SIMD units, high clocks, and fast caches to feed those units. If I remember correctly, Intel cores have roughly 3x higher cache bandwidth than Apple Silicon (which is not cheap!), but all this capability is unused unless running AVX-512 code, which is disabled on consumer chips anyway.

Apple focuses on power efficiency, so they chose a different implementation path. One consequence is that Apple Silicon has no chance of competing on pure throughput-oriented SIMD tasks (remember the chess engine controversy? exactly). To compensate for this, Apple has a wide vector/matrix engine (AMX) hooked directly to the L2 cache, which is a much more power-efficient way of doing throughput-oriented computing on the CPU. And yes, for image processing etc., they want you to use the GPU, which is much better suited for that task anyway.
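
There's no public instruction-level access to AMX, by the way; you reach it through Accelerate. A sketch of the matrix path (that these BLAS calls land on the AMX units is known from reverse engineering, not documented by Apple):

Code:
import Accelerate

// Single-precision matrix multiply through Accelerate's BLAS. On Apple
// Silicon, calls like this reportedly execute on the AMX units rather
// than the NEON pipes, trading flexibility for much better perf/W.
let n = 512
let a = [Float](repeating: 1.0, count: n * n)
let b = [Float](repeating: 2.0, count: n * n)
var c = [Float](repeating: 0.0, count: n * n)

// C = 1.0 * A * B + 0.0 * C
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
            Int32(n), Int32(n), Int32(n),
            1.0, a, Int32(n), b, Int32(n),
            0.0, &c, Int32(n))

print(c[0])  // 1024.0: each element is a 512-term dot product of 1s and 2s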
 

Citysnaps

Elite Member
Staff Member
Site Donor
Posts
3,693
Reaction score
8,993
Main Camera
iPhone
And I think Affinity Photo shows just how well that can play out when you build an image editor with that in mind these days.

With that in mind, it will be interesting to see how Matlab's image processing toolbox evolves, especially for people who like to home-grow their own tools.
 