Thread: iPhone 15 / Apple Watch 9 Event

Cmaier · Sep 23, 2023

theorist9 said:
IIUC, latency is difficult to estimate because it depends on a complex set of timings. But, FWIW, in 2021 Micron claimed that, under "heavy loading", its 7500 MHz LPDDR5x would offer a 20% latency reduction over 5500 MHz LPDDR5. I don't know what the corresponding reduction would be if Apple goes to 8533 MHz LPDDR5x from its current 6400 MHz LPDDR5, or how much that would impact any latency bottleneck present in the current design. [Though that change would give an 8533/6400 – 1 = 33% increase in bandwidth.]


Micron becomes the first to validate its LPDDR5X memory, promising performance and latency benefits - OC3D

Micron promises performance and latency benefits with its LPDDR5X memory Micron Technology has become the first memory provider to validate its LPDDR5X memory technology, using MediaTek’s new Dimensity 9000 5G flagship smartphone SOC to do so. Right now, Micron is set to be first to market...

overclock3d.net

What matters is the latency as seen by the CPU core. On average it will be nothing close to the RAM timing, because of caches. After all, that’s the point of caches. So if memory gets 20% faster but your cache hit rate goes down by 40%, you aren’t helping yourself. So I tend to look at the memory subsystem holistically, taking into account page faults, cache misses, the different levels and sizes of cache (each with their own latencies and bandwidth), etc.
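To make the point about caches concrete, here is a minimal sketch using the textbook average-memory-access-time formula (AMAT = hit time + miss rate × miss penalty) with a single cache level in front of DRAM. All latencies and hit rates are hypothetical round numbers, not measurements of any Apple chip. It plugs in the scenario from the post literally: DRAM 20% faster, hit rate down 40%.

```python
def amat_ns(hit_time_ns: float, hit_rate: float, dram_latency_ns: float) -> float:
    """Average memory access time for one cache level backed by DRAM."""
    miss_rate = 1.0 - hit_rate
    return hit_time_ns + miss_rate * dram_latency_ns

# Hypothetical baseline: 1 ns cache hit, 95% hit rate, 100 ns DRAM latency.
baseline = amat_ns(1.0, 0.95, 100.0)

# DRAM gets 20% faster, but the hit rate drops by 40% (0.95 -> 0.57).
faster_dram = amat_ns(1.0, 0.95 * 0.6, 100.0 * 0.8)

print(f"baseline:    {baseline:.1f} ns")
print(f"faster DRAM: {faster_dram:.1f} ns")
```

A much smaller slip in hit rate is already enough to cancel the DRAM gain. With these numbers, any miss rate of 6.25% or more (up from 5%) makes the average access slower despite the faster memory.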

theorist9 · Sep 23, 2023

leman said:
Power curve

The next graph shows the operating frequency in relation to the power usage. Since the phones gradually reduce their power usage and frequency with each run, this gives us a glimpse into their power curve.

There are a few interesting things going on here, IMO. The A17 power curve kind of looks like a continuation of the A14's, but offset by half a GHz. The A17 power curve is steeper. But what I find very interesting is the M1 data. Apparently getting those extra 300 MHz out of the Firestorm design is quite costly in terms of power. Running at full speed, the A17 Pro and M1 cores consume the same amount of power, but the A17 Pro runs at a 15% higher frequency. Of course, this is all assuming the power estimates returned by the APIs are correct.

What I found notable is that the power vs. frequency curves for both the A13 and A17 are roughly linear (rather than following a power law). Is this unusual, even for low-powered devices?

leman · Sep 23, 2023

theorist9 said:
What I found notable is that the power vs. frequency curves for both the A13 and A17 are roughly linear (rather than following a power law). Is this unusual, even for low-powered devices?

A sufficiently small interval of any curve will appear linear. The A14 segment on the graph is almost linear as well. I ran linear regression on both segments and the residuals were very small in both cases.
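Here is a quick way to see the "small interval looks linear" effect. The sketch fits an ordinary least-squares line to a deliberately non-linear power curve (P ∝ f³, a hypothetical stand-in, not the measured A14/A17 data). It reports the worst residual relative to mean power, once over a narrow frequency window and once over a wide one.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def max_relative_residual(xs, ys):
    """Largest |residual| of the linear fit, divided by the mean of ys."""
    slope, intercept = linear_fit(xs, ys)
    worst = max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))
    return worst / (sum(ys) / len(ys))

def power(f_ghz):
    # Hypothetical, strongly non-linear curve (P proportional to f^3), in watts.
    return 0.15 * f_ghz ** 3

narrow = [3.3 + 0.1 * i for i in range(6)]  # 3.3 .. 3.8 GHz
wide = [1.0 + 0.4 * i for i in range(8)]    # 1.0 .. 3.8 GHz

narrow_err = max_relative_residual(narrow, [power(f) for f in narrow])
wide_err = max_relative_residual(wide, [power(f) for f in wide])
print(f"narrow window: {narrow_err:.1%}, wide window: {wide_err:.1%}")
```

Over the narrow window, even a cubic curve has residuals around 1% of mean power, so small residuals alone can't rule out a power law.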

What might be a bit worrying for the scalability is that the slope of the A17 line is larger. Up to 4.2 GHz could be reachable within 10 watts I think, but going beyond that might be problematic…

Cmaier · Sep 23, 2023

theorist9 said:
What I found notable is that the power vs. frequency curves for both the A13 and A17 are roughly linear (rather than following a power law). Is this unusual, even for low-powered devices?

Dynamic power is linear in frequency but scales with the square of voltage. Double the frequency at the same voltage and you should double the power.
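A minimal sketch of the dynamic-power relation quoted later in the thread, P = ½·C·f·V². Here C is an effective switched capacitance with the activity factor folded in. Its value and the operating point are hypothetical. Units are chosen so that nF × GHz cancel and power comes out in watts.

```python
def dynamic_power_w(c_nf: float, f_ghz: float, v: float) -> float:
    """P = 1/2 * C * f * V^2; nF * GHz cancel, so the result is in watts."""
    return 0.5 * c_nf * f_ghz * v ** 2

# Hypothetical operating point: 1 nF effective capacitance, 2 GHz, 0.8 V.
base = dynamic_power_w(c_nf=1.0, f_ghz=2.0, v=0.8)
double_f = dynamic_power_w(c_nf=1.0, f_ghz=4.0, v=0.8)   # same voltage
plus10_v = dynamic_power_w(c_nf=1.0, f_ghz=2.0, v=0.88)  # +10% voltage

print(f"2x frequency: {double_f / base:.2f}x power")
print(f"+10% voltage: {plus10_v / base:.2f}x power")
```

Doubling frequency at fixed voltage doubles power. A mere 10% voltage increase costs 21% more power, which is why voltage, not frequency, dominates the power curve once it has to rise.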

Cmaier · Sep 23, 2023

leman said:
A sufficiently small interval of any curve will appear linear. The A14 segment on the graph is almost linear as well. I ran linear regression on both segments and the residuals were very small in both cases.

What might be a bit worrying for the scalability is that the slope of the A17 line is larger. Up to 4.2 GHz could be reachable within 10 watts I think, but going beyond that might be problematic…

The slope reflects the capacitance that charges and discharges each cycle. Despite the smaller process node, there are more transistors and apparently more of them are busy each cycle.
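The missing step between "slope" and "capacitance" can be made explicit. At a fixed voltage, P = ½·C·f·V² gives dP/df = ½·C·V², so C = 2·slope / V². The sketch below inverts that. The slopes and the 1.0 V operating point are hypothetical, not read off the graph.

```python
def effective_capacitance_nf(slope_w_per_ghz: float, v: float) -> float:
    """Switched capacitance implied by a power-vs-frequency slope at fixed V.

    From P = 1/2 * C * f * V^2, dP/df = 1/2 * C * V^2, so C = 2 * slope / V^2.
    With the slope in W/GHz, C comes out in nF.
    """
    return 2.0 * slope_w_per_ghz / v ** 2

# Hypothetical slopes for two cores, both assumed to run at 1.0 V.
older = effective_capacitance_nf(slope_w_per_ghz=2.0, v=1.0)
newer = effective_capacitance_nf(slope_w_per_ghz=2.6, v=1.0)
print(f"older core: {older:.1f} nF switched per cycle, newer core: {newer:.1f} nF")
```

At equal voltage, a steeper line means more capacitance switching per cycle. If the voltage also changes along the measured curve, this simple division no longer isolates C.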

theorist9 · Sep 23, 2023

leman said:
A sufficiently small interval of any curve will appear linear. The A14 segment on the graph is almost linear as well. I ran linear regression on both segments and the residuals were very small in both cases.

What might be a bit worrying for the scalability is that the slope of the A17 line is larger. Up to 4.2 GHz could be reachable within 10 watts I think, but going beyond that might be problematic…

Yes, that's where an understanding of the system comes in--judging how large the range needs to be to assess the scaling behavior.

4.2 GHz for the M3 would be pretty decent: 3000 * 4.2/3.78 ⇒ 3300 in GB6 SC (and more if the M3 has higher IPC). For comparison, leaked GB6 SC results for the next-gen (and probably quite power-hungry) Intel Raptor Lake i9-14900K and i9-14900KF are ≈ 3150 @ 6.0 GHz and 3350 @ 6.0 GHz, respectively. The only faster model (for SC) will probably be the specialty i9-14900KS.

ASRock reiterates rumors about 14th Gen Core "Raptor Lake Refresh", October launch & DDR5-6400 memory support - VideoCardz.com

ASRock ready for Raptor Lake Refresh ASRock recently shared an unusual blog post on their Chinese Weibo social media account. In anticipation of the upcoming launch of the 14th Gen Core series for desktops, ASRock has released an article introducing some fundamental details about this platform...

videocardz.com
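The M3 estimate above is a straight clock-ratio scaling. Here is a minimal sketch of that arithmetic. It assumes the score scales linearly with core clock at constant IPC, which is optimistic because memory latency does not shrink with core frequency. The optional `ipc_gain` parameter is a hypothetical knob for the "more if the M3 has higher IPC" case.

```python
def project_score(score: float, f_old_ghz: float, f_new_ghz: float,
                  ipc_gain: float = 1.0) -> float:
    """Scale a single-core score by the clock ratio, optionally times an IPC gain."""
    return score * (f_new_ghz / f_old_ghz) * ipc_gain

# A17 Pro at ~3000 GB6 SC and 3.78 GHz, hypothetically clocked to 4.2 GHz.
m3_guess = project_score(3000, 3.78, 4.2)
print(round(m3_guess))
```

This gives about 3333, which the post rounds to 3300.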

theorist9 · Sep 23, 2023

Cmaier said:
Dynamic power is linear in frequency but scales with the square of voltage. Double the frequency at the same voltage and you should double the power.

Yes, that's the P = 1/2 C * f * V^2 formula.

But I thought modern CPUs dynamically scaled voltage with frequency, meaning the formula is closer to P = 1/2 C * f * V(f)^2, which leads to non-linear power vs. frequency scaling behavior.

I suppose there could be regimes in which this dynamic scaling takes place and regimes in which it doesn't, leading to transitions between linear and non-linear scaling. How likely is it that Apple would need to increase the voltage (and thus move into a non-linear scaling regime) if they wanted to go above, say, 4 GHz?

Cmaier · Sep 23, 2023

theorist9 said:
Yes, that's the P = 1/2 C * f * V^2 formula.

But I thought modern CPUs dynamically scaled voltage with frequency, meaning the formula is closer to P = 1/2 C * f * V(f)^2, which leads to non-linear power vs. frequency scaling behavior.

Yep, it will usually be step-wise linear, roughly approximating a parabola, though (not exponential, which was suggested, I think, by the post you were responding to). The voltage levels are typically discrete, because you can scale frequency to some extent without changing voltage. I have no idea how many steps Apple uses.
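The step-wise behavior can be sketched with a small DVFS table. Within one voltage step, power is linear in frequency. Crossing into the next step raises the slope, because the slope is ½·C·V². The voltage steps, their frequency ceilings, and the capacitance below are all hypothetical; Apple's actual tables are not public.

```python
# Hypothetical DVFS table: (highest frequency in GHz allowed at this voltage, volts).
VOLTAGE_STEPS = [(2.0, 0.70), (3.0, 0.85), (3.8, 1.00)]
C_NF = 3.0  # hypothetical effective switched capacitance, in nF

def voltage_for(f_ghz: float) -> float:
    """Lowest voltage step whose frequency ceiling covers f_ghz."""
    for f_max, v in VOLTAGE_STEPS:
        if f_ghz <= f_max:
            return v
    raise ValueError(f"{f_ghz} GHz is above the highest step")

def power_w(f_ghz: float) -> float:
    """Dynamic power P = 1/2 * C * f * V^2 at the voltage the table assigns."""
    v = voltage_for(f_ghz)
    return 0.5 * C_NF * f_ghz * v ** 2

for f in [1.0, 1.5, 2.0, 2.1, 2.5, 3.0, 3.1, 3.8]:
    print(f"{f:.1f} GHz @ {voltage_for(f):.2f} V -> {power_w(f):.2f} W")
```

Each segment is a straight line. The jumps at 2.0 → 2.1 GHz and 3.0 → 3.1 GHz make the overall envelope curve upward.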

theorist9 said:
I suppose there could be regimes in which this dynamic scaling takes place and regimes in which it doesn't, leading to transitions between linear and non-linear scaling. How likely is it that Apple would need to increase the voltage (and thus move into a non-linear scaling regime) if they wanted to go above, say, 4 GHz?

Hard to tell. The reason you need to increase voltage is that, at a certain point, you need the transistors to switch faster in order to meet your cycle time. (This is not obvious.) Somewhere on the chip is a critical path at a given voltage. The maximum frequency you can obtain is 1/(the time it takes for logic to propagate through that path). So as long as the frequency is less than that critical frequency, you can scale all you want without modifying the voltage.
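The "f_max = 1 / critical-path delay" idea is easy to sketch. The path names and per-stage delays below are hypothetical, for one fixed voltage. The slowest path sets the clock ceiling, and any target below that ceiling needs no voltage change.

```python
# Hypothetical per-stage delays (ps) along a few timing paths at one fixed voltage.
PATHS_PS = {
    "alu_bypass":   [40, 55, 60, 45, 50],
    "load_agu":     [35, 30, 70, 60, 25],
    "branch_check": [50, 50, 40, 30],
}

def max_frequency_ghz(paths: dict) -> tuple:
    """Return (critical path name, f_max in GHz); f_max = 1000 / total delay in ps."""
    name, delays = max(paths.items(), key=lambda kv: sum(kv[1]))
    return name, 1000.0 / sum(delays)

critical, f_max = max_frequency_ghz(PATHS_PS)
print(f"critical path: {critical}, f_max = {f_max:.2f} GHz")

for target in (3.5, 4.0, 4.2):
    verdict = "fits at this voltage" if target <= f_max else "needs more voltage (or a shorter path)"
    print(f"{target} GHz: {verdict}")
```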

When you increase the voltage, you can switch transistors (and wires) more quickly, because V = Q/C, so higher V means more Q, and current equals ΔQ/Δt. Faster transistor switching does have some effect on the critical path, but it tends to reshuffle which paths are critical. Some paths are dominated by wire delay instead of transistor switching delay. (In fact, most are, though how much they are dominated varies a lot.) But speeding up the transistor switch also helps in other ways, like reducing the effect of coupling noise (because the ratio of switching times between adjacent wires is an important factor, so speeding them all up is usually good).
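To illustrate the reshuffling, here is a toy model with two paths. Path A is gate-dominated and path B is wire-dominated. Gate delay follows the alpha-power law (t ∝ V / (V − Vt)^α), a textbook model swapped in for illustration. The threshold, exponent, delay scale, gate counts and wire delays are all hypothetical. Wire delay is held constant with voltage, which is a simplification.

```python
VT = 0.35     # hypothetical threshold voltage (V)
ALPHA = 1.3   # hypothetical alpha-power-law exponent
K_PS = 10.0   # hypothetical delay scale (ps)

def gate_delay_ps(v: float) -> float:
    """Alpha-power-law gate delay: t ~ V / (V - Vt)^alpha."""
    return K_PS * v / (v - VT) ** ALPHA

# Hypothetical paths: (number of gates, wire delay in ps, assumed independent of V).
PATHS = {
    "A (gate-dominated)": (12, 20.0),
    "B (wire-dominated)": (4, 190.0),
}

def path_delay_ps(v: float, gates: int, wire_ps: float) -> float:
    return gates * gate_delay_ps(v) + wire_ps

def critical_path(v: float) -> tuple:
    """Return (name of the slowest path, f_max in GHz) at voltage v."""
    delays = {name: path_delay_ps(v, g, w) for name, (g, w) in PATHS.items()}
    name = max(delays, key=delays.get)
    return name, 1000.0 / delays[name]

for v in (0.7, 0.9, 1.1):
    name, f_max = critical_path(v)
    print(f"{v:.1f} V: critical = {name}, f_max = {f_max:.2f} GHz")
```

At low voltage the gate-heavy path limits the clock. Once voltage speeds up the gates, the wire-heavy path takes over, and further voltage increases help it far less.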

In any case, you very quickly hit diminishing returns in these sorts of hand-crafted chips, where increasing the voltage doesn’t buy you much speed. It’s the kind of thing you very well might support on the M-series (going out past the knee of that curve) but not on the A-series.
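The diminishing-returns point can be sketched with the same textbook alpha-power-law model (f_max ∝ (V − Vt)^α / V), again with hypothetical Vt and α. It compares each 0.1 V step's frequency gain with its power cost, taking power as ∝ f·V² at fixed capacitance.

```python
VT = 0.35    # hypothetical threshold voltage (V)
ALPHA = 1.3  # hypothetical alpha-power-law exponent

def relative_fmax(v: float) -> float:
    """Alpha-power law: f_max ~ (V - Vt)^alpha / V (arbitrary units)."""
    return (v - VT) ** ALPHA / v

def relative_power(v: float) -> float:
    """Dynamic power at f_max: P ~ f * V^2 (arbitrary units, fixed capacitance)."""
    return relative_fmax(v) * v ** 2

voltages = [0.7, 0.8, 0.9, 1.0, 1.1]
freq_gains = []
power_gains = []
for lo, hi in zip(voltages, voltages[1:]):
    df = relative_fmax(hi) / relative_fmax(lo) - 1
    dp = relative_power(hi) / relative_power(lo) - 1
    freq_gains.append(df)
    power_gains.append(dp)
    print(f"{lo:.1f} -> {hi:.1f} V: +{df:.1%} frequency, +{dp:.1%} power")
```

Each step buys less frequency than the one before, while the power cost of each step stays well above its frequency gain. That is the knee beyond which only a desktop-class part might reasonably go.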

exoticspice1 · Sep 23, 2023

Jimmyjames said:
Holy Moly! Impressive sign of what the A17 can do.

So an M3 with proper cooling, like in a Mac mini, will be great.

leman · Sep 23, 2023

Cmaier said:
The slope reflects the capacitance that charges and discharges each cycle. Despite the smaller process node, there are more transistors and apparently more of them are busy each cycle.

Thank you, very interesting!

What still puzzles me a bit is that the IPC improvements are so small, especially with all this machinery costing more to support. Either there is something here we are not seeing, or Apple has indeed hit some sort of practical ILP wall.

Andropov · Sep 24, 2023

leman said:
What still puzzles me a bit is that the IPC improvements are so small, especially with all this machinery costing more to support. Either there is something here we are not seeing, or Apple has indeed hit some sort of practical ILP wall.

Maybe it's just the bandwidth. We'll have to wait for M3.

theorist9 said:
Yes, that's where an understanding of the system comes in--judging how large the range needs to be to assess the scaling behavior.

4.2 GHz for the M3 would be pretty decent: 3000 * 4.2/3.78 ⇒ 3300 in GB6 SC (and more if the M3 has higher IPC). For comparison, leaked GB6 SC results for the next-gen (and probably quite power-hungry) Intel Raptor Lake i9-14900K and i9-14900KF are ≈ 3150 @ 6.0 GHz and 3350 @ 6.0 GHz, respectively. The only faster model (for SC) will probably be the specialty i9-14900KS.


There's also been a PassMark score leak. I know, not ideal. But anyway:

A 3.6% increase over last year's chip. If this leak is accurate, and if this relative increase in PassMark translates to other benchmarks, Apple would retake the single-core performance lead with the M3, despite Intel's efforts.

leman · Sep 24, 2023

Andropov said:
Maybe it's just the bandwidth. We'll have to wait for M3.

I doubt that RAM bandwidth has anything to do with it. More likely cache bandwidth (A17 still has the same load/store throughput as its predecessors) or the cost of cache misses.

Cmaier · Sep 24, 2023

Yoused · Sep 24, 2023

I wonder if Apple ever intends to implement SVE3 (probably not 2). Seems like all their other CP hardware covers most of what you would need that for.

dada_dave · Sep 24, 2023

Cmaier said:

I wonder what this is for? This is in addition to the 128-bit SIMD units, right? Is this SIMD unit in the P-core as well, i.e., so that a thread migrating between E and P cores doesn’t crash?

Jimmyjames · Sep 24, 2023

dada_dave said:
I wonder what this is for? This is in addition to the 128-bit SIMD units, right? Is this SIMD unit in the P-core as well, i.e., so that a thread migrating between E and P cores doesn’t crash?

Forgive the ignorance: does that mean the new E-core SIMD unit is 3 times better?

leman · Sep 24, 2023

dada_dave said:
I wonder what this is for? This is in addition to the 128-bit SIMD units, right? Is this SIMD unit in the P-core as well, i.e., so that a thread migrating between E and P cores doesn’t crash?

Jimmyjames said:
Forgive the ignorance: does that mean the new E-core SIMD unit is 3 times better?

It’s a third 128-bit unit, for a total of 3×128 = 384 bits, or 50% more FP compute.
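Here is the arithmetic behind "50% more FP compute", as a sketch. It assumes the previous design had two 128-bit units (implied by going from 2 to 3 units giving +50%). It also assumes each unit can retire one FP32 fused multiply-add per lane per cycle, counted as 2 FLOPs. Both are assumptions, not confirmed specs.

```python
def fp32_flops_per_cycle(units: int, width_bits: int = 128, fma: bool = True) -> int:
    """Peak FP32 FLOPs/cycle: units x lanes x (2 if each lane does a fused multiply-add)."""
    lanes = width_bits // 32  # a 128-bit register holds four FP32 values
    return units * lanes * (2 if fma else 1)

two_units = fp32_flops_per_cycle(2)
three_units = fp32_flops_per_cycle(3)
print(two_units, three_units, three_units / two_units)
```

The total width is 384 bits spread over three independent 128-bit pipes, not one 384-bit-wide unit, so the gain is 1.5×, not 3×.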

leman · Sep 24, 2023

Yoused said:
I wonder if Apple ever intends to implement SVE3 (probably not 2). Seems like all their other CP hardware covers most of what you would need that for.

Is there an SVE3?

Jimmyjames · Sep 24, 2023

leman said:
It’s a third 128-bit unit, for a total of 3×128 = 384 bits, or 50% more FP compute.

Ahh, I misunderstood. I thought it meant an additional 384-bit SIMD unit!

dada_dave · Sep 24, 2023

Jimmyjames said:
Ahh, I misunderstood. I thought it meant an additional 384-bit SIMD unit!

Yeah, that’s what I thought he meant too. I was so confused!
