Mark Gurman’s Q&A. (M4 and other interesting tidbits).

Jimmyjames

Mark Gurman held a Q&A today. The primary topic was the Apple Car.


It’s interesting overall; the items I found particularly noteworthy are these three nuggets.

1) The car had advanced silicon and it ran on the equivalent of 4 x M2 Ultras! So 2x M2 Extremes?

2) The OS was a new(?) microkernel-based OS called safetyOS. I wonder if this is something that will eventually come to their other products. I mentioned in another thread that there have been changes to XNU to allow more microkernel-related activities: Exclaves.

3) He mentions in passing that work has just started on the M4 MacBooks. Many have taken this to mean the M4 itself will not arrive for a long time, and that this proves the cadence for M chips is around 18 months. I think he meant work on the MacBooks with the M4 was just starting, not on the M4 itself. I still hold out hope that a yearly cadence is Apple’s aim.

Thoughts?
 

Cmaier

Mark Gurman held a Q&A today. The primary topic was the Apple Car.


It’s interesting overall; the items I found particularly noteworthy are these three nuggets.

1) The car had advanced silicon and it ran on the equivalent of 4 x M2 Ultras! So 2x M2 Extremes?

2) The OS was a new(?) microkernel-based OS called safetyOS. I wonder if this is something that will eventually come to their other products. I mentioned in another thread that there have been changes to XNU to allow more microkernel-related activities: Exclaves.

3) He mentions in passing that work has just started on the M4 MacBooks. Many have taken this to mean the M4 itself will not arrive for a long time, and that this proves the cadence for M chips is around 18 months. I think he meant work on the MacBooks with the M4 was just starting, not on the M4 itself. I still hold out hope that a yearly cadence is Apple’s aim.

Thoughts?

I assume “safety OS” is just a real-time OS. Probably not much reason to use it in other products.
 

Jimmyjames

I assume “safety OS” is just a real-time OS. Probably not much reason to use it in other products.
Yes, could be. I imagined the appeal of safetyOS could be not just the real-time aspect, but the security and stability aspects? Hard to say without more information, though.
 

leman

1) The car had advanced silicon and it ran on the equivalent of 4 x M2 Ultras! So 2x M2 Extremes?

I have some difficulty with this. The power consumption would be off the charts for one. And I don’t understand why a car would need a server-class CPU or a super-beefy GPU. Maybe he meant ML capabilities, which could be achieved using larger NPU-like devices?
 

mr_roboto

I assume “safety OS” is just a real-time OS. Probably not much reason to use it in other products.
Alternatively, it might just have been a scaled-up version of RTKit, which is the RTOS that runs on most Chinook cores in A-series and M-series chips. (Chinook is their tiny microcontroller core.)
 

exoticspice1

This is an interesting article.


M4 will be made using advanced packaging not from TSMC.

“Sources pointed out that the packaging process adopted by Apple's M4 processor will integrate CPU, GPU and DRAM in 3D advanced packaging mode”

What does this mean?
 

leman

This is an interesting article.


M4 will be made using advanced packaging not from TSMC.

“Sources pointed out that the packaging process adopted by Apple's M4 processor will integrate CPU, GPU and DRAM in 3D advanced packaging mode”

What does this mean?

Probably this: https://patentscope.wipo.int/search/en/detail.jsf?docId=US304657870&_fid=US398111768
 

theorist9

I have some difficulty with this. The power consumption would be off the charts for one. And I don’t understand why a car would need a server-class CPU or a super-beefy GPU. Maybe he meant ML capabilities, which could be achieved using larger NPU-like devices?
According to this (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9942310), full self-driving (level 4 or 5) will require ≈10 cameras making ≈60 DNN (deep neural network) inferences/second, and with the estimated TOPS/watt of the NVIDIA DRIVE Orin platform*, that would consume 820 W. [Unfortunately, I couldn't find the actual TOPS figure in their article.] Further, as systems advance, computing power needs to advance as well.

*I think they estimated the energy consumption of the Orin from its computing capability and the TOPS/watts of the 2080Ti:
"We measure Pmeas and Lmeas on an Nvidia RTX 2080 Ti and scale the hardware energy efficiency for the Nvidia DRIVE Orin system to be the target platform"

So if they did that, then using the 2080 Ti's 250 W rating, the Orin's computing power would be about 3.3x the 2080 Ti's (?). NVIDIA itself says the Orin is 254 TOPS, but they don't say what kind of TOPS, and NVIDIA's non-qualified TOPS figures are often wildly inflated.

I wish the article were more straightforwardly written, so I didn't have to tease out these figures....

The other way to assess needed computing power would be to look at NVIDIA's next-gen DRIVE Thor system, which they say will be 500 FP16 TFLOPS. That's more than the GPU sections of 4x M2 Ultras could deliver.
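(Not from the article, but here's that scaling arithmetic as a quick Python sketch. The only assumption, borrowed from the article's method, is that the Orin platform has the same TOPS/W as the 2080 Ti.)

```python
# Quick sanity check of the scaling estimate above. Assumption (from the
# article's method): the Orin platform has the same TOPS/W as an RTX
# 2080 Ti, so compute scales linearly with power draw.
fsd_power_w = 820.0       # estimated draw for ~10 cameras x ~60 inferences/s
rtx_2080ti_tdp_w = 250.0  # RTX 2080 Ti board power rating

orin_vs_2080ti = fsd_power_w / rtx_2080ti_tdp_w
print(f"Orin platform ~= {orin_vs_2080ti:.1f}x a 2080 Ti")  # -> ~3.3x
```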
 

leman

Only difference I see is the vertical interposer. The rest looks like the existing interconnect.

From what I understand, the idea of the patent is combining multiple dies in a 2.5D configuration in a way that’s compact and cost-efficient. They explicitly mention the possibility of manufacturing the two dies on different nodes, e.g. the logic die on a more advanced node and the cache/IO die on an older node (since cache/IO doesn’t scale very well with newer nodes). This would be a very good way, IMO, to solve their issues with compute density and die cost.
 

Cmaier

From what I understand, the idea of the patent is combining multiple dies in a 2.5D configuration in a way that’s compact and cost-efficient. They explicitly mention the possibility of manufacturing the two dies on different nodes, e.g. the logic die on a more advanced node and the cache/IO die on an older node (since cache/IO doesn’t scale very well with newer nodes). This would be a very good way, IMO, to solve their issues with compute density and die cost.
For what it’s worth, they can do that with their current tech. The difference here seems to be that they can add vertical interposers and then put even more stuff on top. So it’s like stacking packages on packages.
 

dada_dave

According to this (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9942310), full self-driving (level 4 or 5) will require ≈10 cameras making ≈60 DNN (deep neural network) inferences/second, and with the estimated TOPS/watt of the NVIDIA DRIVE Orin platform*, that would consume 820 W. [Unfortunately, I couldn't find the actual TOPS figure in their article.] Further, as systems advance, computing power needs to advance as well.

*I think they estimated the energy consumption of the Orin from its computing capability and the TOPS/watts of the 2080Ti:
"We measure Pmeas and Lmeas on an Nvidia RTX 2080 Ti and scale the hardware energy efficiency for the Nvidia DRIVE Orin system to be the target platform"

So if they did that, then using the 2080 Ti's 250 W rating, the Orin's computing power would be about 3.3x the 2080 Ti's (?). NVIDIA itself says the Orin is 254 TOPS, but they don't say what kind of TOPS, and NVIDIA's non-qualified TOPS figures are often wildly inflated.

I wish the article were more straightforwardly written, so I didn't have to tease out these figures....

The other way to assess needed computing power would be to look at NVIDIA's next-gen DRIVE Thor system, which they say will be 500 FP16 TFLOPS. That's more than the GPU sections of 4x M2 Ultras could deliver.
While I understand that inference of some neural networks can be more bandwidth-limited than compute-limited, your last statement here is why I still partially agree with @leman that 4x M2 Ultras don't make much sense, even if the power requirements of the self-driving system are greater than I thought. Yes, there would need to be a lot of compute power, but with a server's worth of P-cores, a substantial amount of that compute power would be focused in the wrong place. But it's not my field, so I'm probably wildly off base.
 

leman

For what it’s worth, they can do that with their current tech. The difference here seems to be that they can add vertical interposers and then put even more stuff on top. So it’s like stacking packages on packages.

Sure, that's what the patent is about (at least in my amateur-level understanding). They seem to be after three things here: power and latency reduction by wiring the chips directly without the need to use longer wires and/or serialization; area reduction by partially overlapping the dies; production cost reduction by avoiding full 3D stacking.

While I understand that inference of some neural networks can be more bandwidth-limited than compute-limited, your last statement here is why I still partially agree with @leman that 4x M2 Ultras don't make much sense, even if the power requirements of the self-driving system are greater than I thought. Yes, there would need to be a lot of compute power, but with a server's worth of P-cores, a substantial amount of that compute power would be focused in the wrong place. But it's not my field, so I'm probably wildly off base.

I agree. Most likely the plan was to use a large number of NPU cores that would exceed the capabilities of multiple M2 Ultra GPUs. An M2 Ultra can do what, 27 TFLOPS? So 4x M2 Ultras would only have ~110 TFLOPS of FP16 matmul, hardly an insane amount. Even if you add the P-core and AMX capabilities, you are still under 200 TFLOPS.
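(Quick Python sketch of that arithmetic. The 27 TFLOPS per M2 Ultra is the figure quoted above; the P-core/AMX allowance is just a rough margin, not a measured number.)

```python
# Totals behind the "~110 TFLOPS" estimate above.
m2_ultra_gpu_tflops = 27.0           # per-chip GPU figure quoted above
gpu_total = 4 * m2_ultra_gpu_tflops  # ~108 TFLOPS across 4x M2 Ultra

# Rough margin for P-cores + AMX on top (an assumption, not a measurement):
with_cpu_margin = gpu_total * 1.5    # still well under 200 TFLOPS

print(f"GPUs only: ~{gpu_total:.0f} TFLOPS; "
      f"with CPU/AMX margin: ~{with_cpu_margin:.0f} TFLOPS")
# Compare: 500 FP16 TFLOPS quoted upthread for NVIDIA DRIVE Thor.
```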
 

theorist9

While I understand that inference of some neural networks can be more bandwidth-limited than compute-limited, your last statement here is why I still partially agree with @leman that 4x M2 Ultras don't make much sense, even if the power requirements of the self-driving system are greater than I thought. Yes, there would need to be a lot of compute power, but with a server's worth of P-cores, a substantial amount of that compute power would be focused in the wrong place. But it's not my field, so I'm probably wildly off base.
Oh, I agree. I assumed that when Gurman's source said 4 x M2 Ultras, what the source actually meant was the GPU+Neural Engine power of 4 x M2 Ultras. Or maybe the source was more explicit, and Gurman was imprecise.
 

Yoused

According to this (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9942310), full self-driving (level 4 or 5) will require ≈10 cameras making ≈60 DNN (deep neural network) inferences/second, and with the estimated TOPS/watt of the NVIDIA DRIVE Orin platform*, that would consume 820 W. [Unfortunately, I couldn't find the actual TOPS figure in their article.] Further, as systems advance, computing power needs to advance as well.

Not to hijack, but it is kind of interesting. A non-trivial part of our brain is devoted to determining what is not interesting (what we should ignore). It seems to me that making these kinds of calculations might improve the overall efficiency of an FSD system. But merely relying on visual input (in the human range of sight) sounds like folly to me; other kinds of input would probably be much more efficient for FSD.

But even if you get down to 500 W, that is still power being used to run the control system, stolen from the power needed to move the vehicle. Depending on how efficient the BEV is, that can amount to more than a 10% cost in range.
 

theorist9

But even if you get down to 500 W, that is still power being used to run the control system, stolen from the power needed to move the vehicle. Depending on how efficient the BEV is, that can amount to more than a 10% cost in range.
I concur! According to https://insideevs.com/news/556299/2022-tesla-model3-epa-range/ , the EPA-calculated energy consumption of the 2022 Tesla Model 3 RWD 18" is 152 Wh/km and 166 Wh/km during their city and highway range tests, respectively. And according to the EPA, the average speed during those tests is 21.2 mph and 48.3 mph, respectively. A precise calculation would require integrating over the varying speeds and energy consumptions (or taking a different approach, using driving time and battery capacity), but we can coarse-grain it and get:

Avg. power consumption during city driving = 152 Wh/km × 1.62 km/mi × 21.2 mi/hr ≈ 5220 W
Avg. power consumption during highway driving = 166 Wh/km × 1.62 km/mi × 48.3 mi/hr ≈ 12989 W

And 500 W would be ~10% of the former and ~4% of the latter.
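(For anyone who wants to replay the arithmetic, a minimal Python version. The 1.62 km/mi factor is the rounded one used above, and the 500 W compute draw is the hypothetical figure from the previous post.)

```python
# Coarse-grained average traction power from the EPA figures cited above.
KM_PER_MILE = 1.62  # rounded conversion used above (1.609 is more precise)

def avg_power_w(wh_per_km: float, avg_mph: float) -> float:
    # Wh/km * km/mi * mi/hr = Wh/hr = W
    return wh_per_km * KM_PER_MILE * avg_mph

city_w = avg_power_w(152, 21.2)     # ~5220 W
highway_w = avg_power_w(166, 48.3)  # ~12989 W

fsd_w = 500.0  # hypothetical compute draw discussed above
print(f"city: {fsd_w / city_w:.0%}, highway: {fsd_w / highway_w:.0%}")
# -> city: 10%, highway: 4%
```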

An 18-wheeler, by comparison, would take much less of a range hit. Then again, the idea of autonomous 18-wheelers is a bit scary.

Conversely, no wonder my autonomous self-driving bicycle is so hard to pedal.
 

theorist9

Whatever compute they have would have to be doubled for redundancy.
Good point; I hadn't considered that. I checked, and that's what Waymo (formerly the Google Self-Driving Car Project) does, though I'm curious whether the backup has the full capabilities of the primary. It seems it would have to, since it needs to pull over safely, and the failure could happen during heavy traffic. But perhaps if all it needs to do while running in the background is monitor the primary, its background energy consumption could be kept low; it wouldn't need to access the cameras and control the car until it took over from the primary.
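(Waymo hasn't published its redundancy design, so this is purely a hypothetical Python sketch of the low-power-monitor idea: the backup watches only a heartbeat from the primary and stays out of the sensor/control path until the heartbeat stops. All names are made up.)

```python
# Toy sketch of a low-power backup controller: it only watches a heartbeat
# from the primary and attaches to cameras/controls when the heartbeat
# stops. Hypothetical names throughout; not Waymo's actual design.
import time

HEARTBEAT_TIMEOUT_S = 0.2  # e.g. two missed beats at 10 Hz

class BackupController:
    def __init__(self) -> None:
        self.last_beat = time.monotonic()
        self.active = False

    def on_heartbeat(self) -> None:
        # Called each time the primary reports healthy (e.g. at 10 Hz).
        self.last_beat = time.monotonic()

    def tick(self) -> None:
        # Cheap periodic check; the full perception stack stays idle,
        # so background power draw is minimal.
        if not self.active and time.monotonic() - self.last_beat > HEARTBEAT_TIMEOUT_S:
            self.active = True
            self.take_over()

    def take_over(self) -> None:
        # Only now power up sensors and the driving stack, with the sole
        # goal of pulling over safely.
        print("primary heartbeat lost: backup taking over, pulling over")

backup = BackupController()
backup.on_heartbeat()  # primary reporting in
backup.tick()          # no timeout yet, so the backup stays passive
```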


 

diamond.g

Good point; I hadn't considered that. I checked, and that's what Waymo (formerly the Google Self-Driving Car Project) does, though I'm curious whether the backup has the full capabilities of the primary. It seems it would have to, since it needs to pull over safely, and the failure could happen during heavy traffic. But perhaps if all it needs to do while running in the background is monitor the primary, its background energy consumption could be kept low; it wouldn't need to access the cameras and control the car until it took over from the primary.


I wonder if it has to run active-active to even determine that there was a failure, or that the primary was going to make the wrong decision.
 