M1 for Mac Pro/iMac Pro: Die sizes, Jade 2C, chiplets

Cmaier

Elite Member
Staff Member
Vaccinated
Site Donor
Posts
3,050
Reaction score
4,237
Since someone on the other forum is asking…

I’ve previously noted that the reticle size for the TSMC N5P node appears to be around 858 mm². That raises the question of whether Jade 2C (the 20-core version of M1) and Jade 4C (the 40-core version) could fit in the reticle (the reticle being the maximum size any single die could be).

Some assumptions:
  • Jade 2C is 20 CPU cores and 64 GPU cores
  • Jade 4C is 40 CPU cores and 128 GPU cores
  • Jade 2C will have double the SLC of M1 Max
  • Jade 4C will have double the SLC of Jade 2C
  • Jade 2C will allow up to 128GB RAM
  • Jade 4C will allow up to 256GB RAM
Some background:
  • M1 Max with 10 CPU cores and 32 GPU cores is 432 mm²
  • The dimensions are 19.96 mm x 21.66 mm
My first observation is that if you double 432 mm² you get 864 mm², which is 6 mm² more than the reticle limit. Not everything on the die has to be “doubled” to get to Jade 2C - you don’t need twice the I/Os, etc. It is thus at least theoretically possible that Jade 2C could fit in the reticle.
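
The area arithmetic above can be checked with a few lines of Python. This is only a sketch using the figures quoted in this post; the variable names are mine, and the result differs slightly from the round "6" because it uses the unrounded die dimensions.

```python
# Figures quoted in the post (not independently measured).
M1_MAX_WIDTH_MM = 19.96
M1_MAX_HEIGHT_MM = 21.66
RETICLE_LIMIT_MM2 = 858.0

# Area of one M1 Max die, and of a naive "double everything" Jade 2C.
m1_max_area = M1_MAX_WIDTH_MM * M1_MAX_HEIGHT_MM
doubled_area = 2 * m1_max_area

# Positive overshoot means the doubled die exceeds the reticle limit.
overshoot = doubled_area - RETICLE_LIMIT_MM2

print(f"M1 Max area:      {m1_max_area:.1f} mm^2")
print(f"Doubled area:     {doubled_area:.1f} mm^2")
print(f"Over the reticle: {overshoot:.1f} mm^2")
```

The overshoot is under 1% of the doubled area, which is why dropping the duplicated I/O could, on area alone, bring a single die under the limit.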

That said, Apple won’t try to do that. While the area more or less works out, they would have to redesign the entirety of each piece, because the aspect ratio is wrong. Apple would want to, essentially, mirror the M1 Max. Looking at the floor plan, they would probably want to mirror in such a way that the die would be around 42mm tall and 20mm wide. Their alternative would be around 21mm tall and 40mm wide.

The reticle, however, does not have that aspect ratio. I don’t know the actual reticle dimensions, but they are likely something close to 29.5mm x 29.5mm.

We also know there is already some yield fall-off for M1 Max (hence the 24-core GPU version), and this would just get worse if they attempted to fit the entire Jade 2C on one chip.
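
To put a rough shape on the yield argument, here is a sketch using the simple Poisson yield model, Y = exp(-A * D0). The model choice and the defect density are my assumptions for illustration, not anything Apple or TSMC has published, and the model ignores die harvesting (such as selling parts with 24 of 32 GPU cores enabled).

```python
import math

def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under the simple Poisson yield model."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# Hypothetical defect density, chosen only to illustrate the trend.
ASSUMED_D0 = 0.1  # defects per cm^2

for name, area in [("M1 Max (432 mm^2)", 432.0),
                   ("monolithic Jade 2C (864 mm^2)", 864.0)]:
    print(f"{name}: {poisson_yield(area, ASSUMED_D0):.1%} defect-free")
```

Under this model, doubling the area squares the yield fraction, so whatever the real defect density is, the monolithic die always fares worse than two separate dies.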

So, instead, they will use some sort of multi-chip module, and include multiple copies of what are, essentially, M1 Max dies (probably slightly modified around the outside edge where the I/Os are).

P.S.: I refuse to call them “chiplets.” That’s a dumb name for something that already had a name.
 

Entropy

Member
Posts
20
Reaction score
20
Current (193nm immersion and 0.33NA EUV) reticle dimensions are 26 mm by 33 mm, or 858 mm².
High (0.55) NA EUV looks set to cut this in half. It is targeted for introduction at 3nm, but may not show up until the 2nm node.
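
With the 26 mm by 33 mm field, the aspect-ratio point from the first post can be checked directly. This is a sketch only: the die dimensions are the M1 Max figures quoted earlier in the thread, the mirrored layouts are simple doublings with no allowance for removed I/O, and the helper name `fits` is mine. The half-field comparison at the end is by area alone.

```python
def fits(die_w, die_h, field_w, field_h):
    """True if the die fits inside the field, allowing a 90-degree rotation."""
    return ((die_w <= field_w and die_h <= field_h) or
            (die_h <= field_w and die_w <= field_h))

FIELD_W_MM, FIELD_H_MM = 26.0, 33.0   # reticle field quoted above
M1_MAX_W_MM, M1_MAX_H_MM = 19.96, 21.66  # M1 Max die, from the first post

candidates = {
    "M1 Max": (M1_MAX_W_MM, M1_MAX_H_MM),
    "two dies stacked vertically": (M1_MAX_W_MM, 2 * M1_MAX_H_MM),
    "two dies side by side": (2 * M1_MAX_W_MM, M1_MAX_H_MM),
}

for name, (w, h) in candidates.items():
    verdict = "fits" if fits(w, h, FIELD_W_MM, FIELD_H_MM) else "does not fit"
    print(f"{name} ({w:.2f} x {h:.2f} mm): {verdict}")

# Area-only check against a halved field (high-NA EUV, per the post above).
half_field_area = FIELD_W_MM * FIELD_H_MM / 2
m1_max_area = M1_MAX_W_MM * M1_MAX_H_MM
print(f"Half field: {half_field_area:.0f} mm^2, M1 Max: {m1_max_area:.1f} mm^2")
```

Neither straightforward mirrored layout fits, even though the total area is within 1% of the limit, and a halved field would be slightly smaller than a single M1 Max.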
 

Yoused

up
Vaccinated
Posts
3,461
Reaction score
5,408
Location
knee deep in the road apples of the 4 horsemen
Cmaier said:
P.S.: I refuse to call them “chiplets.” That’s a dumb name for something that already had a name.

In the ancient vernacular, a "chip" is always in a DIP or other package, so it is natural for people to consider the two things inseverable. For a package to contain more than one chip, new terminology seems appropriate to the unwashed masses. Hopefully that will change.
 

Cmaier

Yoused said:
In the ancient vernacular, a "chip" is always in a DIP or other package, so it is natural for people to consider the two things inseverable. For a package to contain more than one chip, new terminology seems appropriate to the unwashed masses. Hopefully that will change.

I’ve been writing about packages containing multiple die since at least 1992, when I designed one. They’re called MCMs, or sometimes WSIP. Hell, Sparc64 did that way back when. (Designed by Hal. I turned down a job there in 1997ish, because I thought the commute would suck. Now my house is walking distance to that building, which is no longer Hal, of course. But I digress.)

“Chip” is always a vague word, so we go by “die” and “package.” But “chiplet” sounds cute, so suddenly everyone thinks it’s a great new invention.
 

Cmaier

@Cmaier did you see the story on ars, borrowed from wired, where they talk to the designers? It is not very technical or in depth, but it reads through quickly.
Yeah, saw it. Can’t really gather much from that.

I miss the good old days, where I could read about all these processors in the Journal of Solid State Circuits. I wonder if it’s frustrating working at Apple and not being able to publish - I assume not :)

When the technical reports start coming, one also has to be a little cautious. I recall vividly reading about how a couple of my chips worked - seeing block diagrams and such - that were nowhere near right. Meanwhile, I actually have a book sitting on my shelf that was published celebrating AMD’s anniversary, and it has the actual RTL code for K6 in it (it was published when we were still doing spins on K6, if I recall correctly), so the world is just weird.
 

Nycturne

Site Champ
Vaccinated
Posts
424
Reaction score
460
Cmaier said:
My first observation is that if you double 432 mm² you get 864 mm², which is 6 mm² more than the reticle limit. Not everything on the die has to be “doubled” to get to Jade 2C - you don’t need twice the I/Os, etc. It is thus at least theoretically possible that Jade 2C could fit in the reticle.

...

So, instead, they will use some sort of multi-chip module, and include multiple copies of what are, essentially, M1 Max dies (probably slightly modified around the outside edge where the I/Os are).

I do think this makes the most sense. Even if Jade 2C fits, why not use the same die for 2C and 4C?

But I guess the bit I'm still trying to figure out is how to handle data movement. I'm hesitant to buy into the idea posted "elsewhere" of each die having its own memory and having to shuttle things around.
 

Cmaier

Nycturne said:
I do think this makes the most sense. Even if Jade 2C fits, why not use the same die for 2C and 4C?

But I guess the bit I'm still trying to figure out is how to handle data movement. I'm hesitant to buy into the idea posted "elsewhere" of each die having its own memory and having to shuttle things around.

If I had to bet, I would guess they will use a crossbar and let any die address any memory. You do have to add about 6 ps for every mm of wire, though. So does the MMU migrate stuff based on which die(s) access things most often? Or migrate threads to be closer to the memory? In actual Mac code, is Apple finding that most memory access is local to a given core cluster? Is there a useful heuristic (e.g., whichever die first loads or stores a memory address is favored by assigning that address to it)? Fascinating questions.
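
To make the "first die to touch it owns it" heuristic concrete, here is a toy sketch. Everything in it is hypothetical: the class name, the two-die trace, and the 16 KiB page size are assumptions for illustration, and nothing here describes how Apple actually places memory. The only figure taken from the post is the roughly 6 ps per mm of wire.

```python
WIRE_DELAY_PS_PER_MM = 6.0  # rough figure quoted in the post
PAGE_SIZE = 16 * 1024       # assumption: 16 KiB pages

def wire_delay_ps(distance_mm):
    """Extra one-way delay for a signal crossing the given wire length."""
    return WIRE_DELAY_PS_PER_MM * distance_mm

class FirstTouchPlacement:
    """Toy policy: a page is homed on whichever die accesses it first."""

    def __init__(self):
        self.home_die = {}  # page number -> die id

    def access(self, die, address):
        """Record an access; return True if the page is local to `die`."""
        page = address // PAGE_SIZE
        home = self.home_die.setdefault(page, die)
        return home == die

# Hypothetical access trace: (die id, byte address).
trace = [(0, 0x0000), (1, 0x0040), (1, 0x8000), (0, 0x8010), (1, 0x8020)]

placement = FirstTouchPlacement()
local = sum(placement.access(die, addr) for die, addr in trace)

print(f"{local} of {len(trace)} accesses were local")
print(f"Crossing 20 mm of wire adds about {wire_delay_ps(20):.0f} ps each way")
```

A real policy would also have to decide when to migrate a page or a thread once the access pattern shifts, which is exactly the open question raised above.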
 