Apple to use it’s own server chips

Cmaier · May 9, 2024

I bet a lot of people would love to buy xserves with these chips if apple would sell them to us.

Bloomberg: iOS 18 AI tech to rely on data centers with Apple silicon - 9to5Mac

Apple plans to use its own high-end Apple silicon chips in data centers designed to power its upcoming iOS 18 AI features.

9to5mac.com

Jimmyjames · May 9, 2024

i wonder if they are using macOS on these or Linux (Asahi)?

Yoused · May 9, 2024

I wonder if the servers are boxes of blades.

Cmaier · May 9, 2024

Yoused said:
I wonder if the servers are boxes of blades.

i would think so. I think these are M2 ultras, or something very close to those, and they should be able to achieve fairly high density in a rack, assuming custom cooling, given the thermals.

jbailey · May 9, 2024

Jimmyjames said:
i wonder if they are using macOS on these or Linux (Asahi)?

Apple is quite capable of writing a scalable and performant version of Darwin. I pretty sure they will not use Asahi.

MacPoulet · May 9, 2024

Cmaier said:
I bet a lot of people would love to buy xserves with these chips if apple would sell them to us.

Bloomberg: iOS 18 AI tech to rely on data centers with Apple silicon - 9to5Mac

Apple plans to use its own high-end Apple silicon chips in data centers designed to power its upcoming iOS 18 AI features.

9to5mac.com

It won’t be the same, their ~~jet turbines~~ fans won’t be as loud.

theorist9 · May 10, 2024

If this application could also benefit from Extreme-class chips, that would reduce or eliminate one of the main barriers to seeing these in the Mac Pro: Insufficient volume to justify the development costs.

B01L · May 10, 2024

Apple ASi AI Server

Rackmount Chassis
M4 Ultra SoC
ASi GPU/NPU Blades
???

While I would love to see a M4 Extreme SoC, I kinda feel like we will not see such a beast until the M5-series of SoCs on 2nm...?

Jimmyjames · Wednesday at 2:52 PM

I’m struggling to make sense of some of the rumours surrounding this initiative. There was a report today by The Information:
https://9to5mac.com/2024/05/29/apple-ai-confidential-computing-ios-18/ (The Information link unavailable at the time of writing)

It states that Apple has a way to perform more demanding AI tasks in the cloud while maintaining privacy. That’s all great but what I am perplexed by is

A). Can they really provide enough compute for these really demanding tasks by using M2 Ultras? These are nearly beaten by an M3 Max now and certainly big iron from Nvidia will destroy these. Can really support their user base of ~1 billion users?

B) If not just M2 Ultras, then what? The article states that later in the year Apple will move to M4 chips. I assume the Ultra. Gurman has repeatedly said no M4 Ultra until mid-2025. He could certainly be wrong, but he has doubled down on it recently. The M4 Ultra has to come sooner if they are to be even remotely competitive with their own chips surely?

C) Perhaps the rumours of their own servers for AI are just wrong. That is, they are making them, but they handle only a small component, and that detail is lost in the rumour.

Yoused · Wednesday at 3:14 PM

Jimmyjames said:
If not just M2 Ultras, then what? The article states that later in the year Apple will move to M4 chips. I assume the Ultra. Gurman has repeatedly said no M4 Ultra until mid-2025.

The obvious answer (well, obvious to me) is that each blade has a pair of M2 Ultras with a memory bridge between them (each can access the other's package RAM over the bridge), and the SoCs are clamped into sockets that will be pin-compatible with M4 Ultras (the clamp maintains solid pin contact better than the old style seating sockets and also affords a good waste heat path).

Jimmyjames · Wednesday at 3:18 PM

Yoused said:
The obvious answer (well, obvious to me) is that each blade has a pair of M2 Ultras with a memory bridge between them (each can access the other's package RAM over the bridge), and the SoCs are clamped into sockets that will be pin-compatible with M4 Ultras (the clamp maintains solid pin contact better than the old style seating sockets and also affords a good waste heat path).

Do you think this would provide enough compute for a user base the size of Apple’s?

Yoused · Wednesday at 6:03 PM

Jimmyjames said:
Do you think this would provide enough compute for a user base the size of Apple’s?

Depends on the size of the installation. The largest, PFlop/EFlop SCs have millions of cores (and also draw many megawatts). If one U4 can hold 16 blades comfortably and 1∞L has a room with 30 racks full of units, that would be a shitload of cores. They might have to contract with Cray for the high-performance interconnect, though.

(Remember when Steve told us the G4 was classified as a SC?)

Altaic · Thursday at 1:27 AM

Pretty sure Apple’s ML infrastructure will be devoted to R&D, not inference. They may collect non-opt-out-on-device-data with differential privacy, but that’s different. This guy gets it:

dada_dave · Thursday at 6:02 AM

Altaic said:
Pretty sure Apple’s ML infrastructure will be devoted to R&D, not inference. They may collect non-opt-out-on-device-data with differential privacy, but that’s different. This guy gets it:

View attachment 29678

I’m not sure I understand what API vs local is in this context. Or rather I get local, but am unsure what API is here. Is there another meaning beyond Application Programming Interface or is there some way that applies here that I’m missing?

casperes1996 · Thursday at 6:39 AM

dada_dave said:
I’m not sure I understand what API vs local is in this context. Or rather I get local, but am unsure what API is here. Is there another meaning beyond Application Programming Interface or is there some way that applies here that I’m missing?

Took me a while to get that too. They mean using cloud provider apis like ChatGPT. I think the ai crowd use it as a shorthand because the model running programs either let you run local or insert an api access token so they conclude api is online models.

throAU · Thursday at 7:07 AM

Jimmyjames said:
I’m struggling to make sense of some of the rumours surrounding this initiative. There was a report today by The Information:
https://9to5mac.com/2024/05/29/apple-ai-confidential-computing-ios-18/ (The Information link unavailable at the time of writing)

It states that Apple has a way to perform more demanding AI tasks in the cloud while maintaining privacy. That’s all great but what I am perplexed by is

A). Can they really provide enough compute for these really demanding tasks by using M2 Ultras? These are nearly beaten by an M3 Max now and certainly big iron from Nvidia will destroy these. Can really support their user base of ~1 billion users?

B) If not just M2 Ultras, then what? The article states that later in the year Apple will move to M4 chips. I assume the Ultra. Gurman has repeatedly said no M4 Ultra until mid-2025. He could certainly be wrong, but he has doubled down on it recently. The M4 Ultra has to come sooner if they are to be even remotely competitive with their own chips surely?

C) Perhaps the rumours of their own servers for AI are just wrong. That is, they are making them, but they handle only a small component, and that detail is lost in the rumour.

Server hardware is usually last generation or late in the generation because that’s when teh bugs have been sorted out.

Whilst the M2 ultra may not be the latest and greatest, it is tested and like anyone apple will be scaling out.

Its not like one server is going to use one of these chips - think probably 4 or 8 per blade, and say 40 blades per rack, and hundreds of racks in a datacentre….

whilst your modern iDevice might compete with a single m2 ultra, it certainly won’t be as fast as a cluster of even 300 of them, and apple will have hundreds of thousands or millions of them in their cloud no doubt.

B01L · Thursday at 10:33 AM

I want to see Johny Srouji walking thru a sub-basement under the Apple Mothership while telling us all about the new Mn Ultra & Mn Extreme SoCs powering the new iCloudAI service, and we see nothing but rack after rack after rack of shiny new Apple blade servers...! ;^p

Yoused · Thursday at 11:03 AM

How much of a GPU core's logic is graphic-specific? If you shave out shading and RT to get a PEU (Parallel Embarrassing Unit), is it notably smaller and more efficient at the kind of work a server would be doing?

dada_dave · Thursday at 12:16 PM

Yoused said:
How much of a GPU core's logic is graphic-specific? If you shave out shading and RT to get a PEU (Parallel Embarrassing Unit), is it notably smaller and more efficient at the kind of work a server would be doing?

GPU cores are really designed for FP heavy loads and have no branch prediction and threads in a warp have to all take the same branches to be fully parallel. So they’re not like little CPU cores that would be useful for server workloads.

If the primary purpose is for ML training, maybe large scale inference though, obviously the GPUs are good for that. Though it would be better if they had dedicated matmul accelerators for that purpose.

The Hardcard · 2024-06-01T14:38:59-0700

Jimmyjames said:
I’m struggling to make sense of some of the rumours surrounding this initiative. There was a report today by The Information:
https://9to5mac.com/2024/05/29/apple-ai-confidential-computing-ios-18/ (The Information link unavailable at the time of writing)

It states that Apple has a way to perform more demanding AI tasks in the cloud while maintaining privacy. That’s all great but what I am perplexed by is

A). Can they really provide enough compute for these really demanding tasks by using M2 Ultras? These are nearly beaten by an M3 Max now and certainly big iron from Nvidia will destroy these. Can really support their user base of ~1 billion users?

B) If not just M2 Ultras, then what? The article states that later in the year Apple will move to M4 chips. I assume the Ultra. Gurman has repeatedly said no M4 Ultra until mid-2025. He could certainly be wrong, but he has doubled down on it recently. The M4 Ultra has to come sooner if they are to be even remotely competitive with their own chips surely?

C) Perhaps the rumours of their own servers for AI are just wrong. That is, they are making them, but they handle only a small component, and that detail is lost in the rumour.

I’m just another tech nerd speculator here, no special or professional insight:

1 The bottleneck for large language model token generation Is memory bandwidth. for this part of inference both Ultras are about the same speed and nearly double the speed of each Max. The M1 generation at least, appears to not be using it theoretical maximum bandwidth, later generations are closer.

The other important factor for running the largest most useful models is memory capacity, where M2 Ultras also crush all Max variations.

I would hope the datacenter team would arrange to have the M2 Ultras fully populated with faster RAM than the consumer Ultras.

2. if these rumors are true, it would make sense for the M4 Max and Ultra to be available for consumers later. serving their user base with datacenter AI is going to require many hundreds of thousands of Ultras. They might need two to three quarters of M4 Ultra production before they can spare to sell them.

According to Semianalysis:

”The other indication that Cupertino is serious about their AI hardware and infrastructure strategy is they made a number of major hires a few months ago. This includes Sumit Gupta who joined to lead cloud infrastructure at Apple in March. He’s an impressive hire. He was at Nvidia from 2007 to 2015, and involved in the beginning of Nvidia's foray into accelerated computing. After working on AI at IBM, he then joined Google’s AI infrastructure team in 2021 and eventually was the product manager for all Google infrastructure including the Google TPU and Arm based datacenter CPUs.

He’s been heavily involved in AI hardware at Nvidia and Google who are both the best in the business and are the only companies that are deploying AI infrastructure at scale today. This is the perfect hire.”

Hopefully he is discussing with the chip team about optimizing Apple Silicon capabilities in this arena.

Apple to use it’s own server chips

Site Master

Site Champ

up

Site Master

Power User

Member

Site Champ

SlackMaster

Site Champ

up

Site Champ

up

Power User

Elite Member

Power User

Site Champ

SlackMaster

up

Elite Member

New member

Similar threads