M4 Rumors (requests for).

leman

Site Champ
Posts
641
Reaction score
1,196
May I ask which site you use to search for patents? Do you just search for ‘Apple’ or are there specific engineers you search for?


They are neat since they update every week. I would just search for "Apple" once per month while having my morning cup of tea. One doesn't need more than 5-10 minutes to browse through 1000 patents and determine the interesting ones just from the title.
 

casperes1996

Power User
Posts
185
Reaction score
171
Also a bunch of patents for hardware based merge sorts.
That’s interesting. Sorting is quite a common algorithm. If there’s something to be won from hardware specific to it not also lost in spinning up that hardware I’m curious why it hasn’t been done before.
 

leman

Site Champ
Posts
641
Reaction score
1,196
That’s interesting. Sorting is quite a common algorithm. If there’s something to be won from hardware specific to it not also lost in spinning up that hardware I’m curious why it hasn’t been done before.

Oh, there are plenty of hardware sorting solutions around, it's mostly specialized stuff though. I suppose there are quite a lot of details around sorting, and current processors can already pretty much saturate the memory bandwidth doing sorting via the general-purpose ISA. But I suppose if you need sorting in a neural processor, which is less programmable than a CPU, a dedicated sorting unit might be useful. If it allows you to forgo the cost of doing a roundtrip to the CPU, that already might be a win. Although I am curious why they see a need for sorting in a neural processor. Are there many ML algorithms and models that rely on sorting?
 

casperes1996

Power User
Posts
185
Reaction score
171
Oh, there are plenty of hardware sorting solutions around, it's mostly specialized stuff though. I suppose there are quite a lot of details around sorting, and current processors can already pretty much saturate the memory bandwidth doing sorting via the general-purpose ISA. But I suppose if you need sorting in a neural processor, which is less programmable than a CPU, a dedicated sorting unit might be useful. If it allows you to forgo the cost of doing a roundtrip to the CPU, that already might be a win. Although I am curious why they see a need for sorting in a neural processor. Are there many ML algorithms and models that rely on sorting?
Huh. Never seen dedicated sorting hardware that I know of. But cool. ML really isn’t my area so can’t speak to that.
 

Yoused

up
Posts
5,623
Reaction score
8,942
Location
knee deep in the road apples of the 4 horsemen
Here are the "infringed" patents in question (pdfs):

First one appears to be about bonding a chip to a sheet of metal, then etching it out to form leads

These ones (just replace the end number in the url to get the file) cover methods of pushing the chip package into warm soft plastic as it is starting to harden
7732909
7989944
8368201

Something about bonding the leads inside the adhesive that holds the chip on the board
8222723

Making the leads that pass through the board much skinnier than older methods.
8238113

Using multi-layer metal sandwich leads where the inner layer is pushed up into the package
9107324

A method of embedding the board in the device chassis
11071207

Embedding chip packages inside a board contacting exterior leads, I think
11716816
 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,150
Oh, there are plenty of hardware sorting solutions around, it's mostly specialized stuff though. I suppose there are quite a lot of details around sorting, and current processors can already pretty much saturate the memory bandwidth doing sorting via the general-purpose ISA. But I suppose if you need sorting in a neural processor, which is less programmable than a CPU, a dedicated sorting unit might be useful. If it allows you to forgo the cost of doing a roundtrip to the CPU, that already might be a win. Although I am curious why they see a need for sorting in a neural processor. Are there many ML algorithms and models that rely on sorting?
The last time I programmed a neural net was in high school and two layers was considered advanced. When I saw your post, I tried looking it up but only found a bunch of links where the goal was to train neural networks to write sorting algorithms. Which is not what we’re looking for. So not sure.
 

leman

Site Champ
Posts
641
Reaction score
1,196
The last time I programmed a neural net was in high school and two layers was considered advanced. When I saw your post, I tried looking it up but only found a bunch of links where the goal was to train neural networks to write sorting algorithms. Which is not what we’re looking for. So not sure.

Just had an idea. A token-predicting machine (like ChatGPT) generates a list of occurrence probabilities. You need to sort it to sample from it efficiently. If you can do this from within the neural processor you can save a bit of latency and a bunch of data transfers.
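
To make that concrete, here is a minimal sketch of one common sampling scheme that does order the probabilities first: nucleus (top-p) sampling. The function name, the default `p`, and the toy distribution are all made up for illustration, and nothing here is claimed to be what any particular neural processor actually does.

```python
import random


def top_p_sample(probs, p=0.9, rng=random):
    """Sample a token index from `probs` using nucleus (top-p) sampling.

    `probs` is assumed to be a list of non-negative numbers summing to ~1.
    """
    # Sort token indices by probability, highest first. This is the sort step.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Keep the smallest prefix whose cumulative probability reaches p.
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break

    # Draw from the kept tokens in proportion to their probability.
    r = rng.random() * total
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r < acc:
            return i
    return kept[-1]  # guard against floating-point round-off


if __name__ == "__main__":
    toy_probs = [0.05, 0.6, 0.3, 0.05]  # hypothetical 4-token vocabulary
    print(top_p_sample(toy_probs, p=0.8))
```

With a real vocabulary the list has tens of thousands of entries and this runs once per generated token, which is where doing the sort next to the rest of the model could plausibly save a transfer.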
 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,150
Just had an idea. A token-predicting machine (like ChatGPT) generates a list of occurrence probabilities. You need to sort it to sample from it efficiently. If you can do this from within the neural processor you can save a bit of latency and a bunch of data transfers.

Very plausible and fitting with the rumored goal of it focusing on generative AI. Although I wonder at that point why stop at the sort and just accelerate the entire Alias method in hardware 🙃. Maybe they will!
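
For anyone who hasn't run into it, a rough sketch of the alias method (Vose's variant) is below. Function names are my own and the input is assumed to be a normalized probability list. Worth noting that the table build is a single O(n) pass that doesn't itself need a sort, and each draw afterwards is O(1).

```python
import random


def build_alias_table(probs):
    """Build (prob, alias) tables for O(1) sampling from `probs`."""
    n = len(probs)
    scaled = [p * n for p in probs]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]

    prob = [1.0] * n        # chance of keeping column i itself
    alias = list(range(n))  # where to go if column i is rejected

    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # Column l donated (1 - scaled[s]) of its mass to fill column s.
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)

    # Anything left over keeps prob 1.0, which absorbs round-off error.
    return prob, alias


def alias_sample(prob, alias, rng=random):
    """Draw one index: pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


if __name__ == "__main__":
    prob, alias = build_alias_table([0.5, 0.25, 0.25])
    print([alias_sample(prob, alias) for _ in range(10)])
```

The catch for token generation is that the distribution changes every step, so the table would have to be rebuilt for each token.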
 

Cmaier

Site Master
Staff Member
Site Donor
Posts
5,330
Reaction score
8,523
Here are the "infringed" patents in question (pdfs):

First one appears to be about bonding a chip to a sheet of metal, then etching it out to form leads

These ones (just replace the end number in the url to get the file) cover methods of pushing the chip package into warm soft plastic as it is starting to harden
7732909
7989944
8368201

Something about bonding the leads inside the adhesive that holds the chip on the board
8222723

Making the leads that pass through the board much skinnier than older methods.
8238113

Using multi-layer metal sandwich leads where the inner layer is pushed up into the package
9107324

A method of embedding the board in the device chassis
11071207

Embedding chip packages inside a board contacting exterior leads, I think
11716816
No those are not? I mean, some of them are. Maybe that’s the list from the prior Samsung lawsuit. Also don’t think those descriptions are right. Also, “infringed” shouldn’t be in quotes. :)
 

Cmaier

Site Master
Staff Member
Site Donor
Posts
5,330
Reaction score
8,523
Huh. Never seen dedicated sorting hardware that I know of. But cool. ML really isn’t my area so can’t speak to that.
I just designed one in my head. It would have a ton of adders, a large dedicated set of registers, with pointers into a set of ordinals. Don’t know that it would actually speed anything up, though :)
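
One possible software reading of that idea, and only a guess at it, is an enumeration (rank) sort: compare every element against every other, add up the "comes before me" results to get each element's ordinal, then write it to that slot. In hardware the comparisons and sums could all run in parallel; the sketch below just does them in a loop. The function name is made up.

```python
def enumeration_sort(values):
    """Sort by computing each element's final position (its rank) directly."""
    n = len(values)
    out = [None] * n
    for i, v in enumerate(values):
        # Rank = how many elements must come before v.
        # Ties are broken by original index so every rank is unique.
        rank = sum(
            1
            for j, w in enumerate(values)
            if w < v or (w == v and j < i)
        )
        out[rank] = v
    return out


if __name__ == "__main__":
    print(enumeration_sort([4, 1, 3, 1, 2]))
```

That is n² comparisons, which is hopeless in software but is the kind of thing that only costs area, not time, when each comparison gets its own comparator.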
 

casperes1996

Power User
Posts
185
Reaction score
171
I just designed one in my head. It would have a ton of adders, a large dedicated set of registers, with pointers into a set of ordinals. Don’t know that it would actually speed anything up, though :)

But that's what I was thinking; If there's really a good reason to do so
 

Cmaier

Site Master
Staff Member
Site Donor
Posts
5,330
Reaction score
8,523
If it is likely to get significant use, it would definitely speed things up at least a bit merely by dint of other stuff being able to do other stuff at the same time.
I’m just wondering if it’s possible to make it big enough to pay for the overhead of shifting things into and out of the unit, in actual use. I have some experience designing big parallel comparators, and they aren’t all that small. You’d also have to reduce the sort function to, say, an unsigned long int comparison by some sort of hashing, which presumably you’d do one time prior to loading the unit, but that should be O(n).
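
As an illustration of that one-time O(n) reduction, here is a sketch for the simple case of 32-bit floats. Strictly speaking it is an order-preserving key mapping rather than a hash: each float is turned into an unsigned 32-bit integer whose plain unsigned ordering matches the float ordering. The function names are made up, NaNs are assumed to be absent, and `sorted`-style comparison on the integer keys stands in for whatever a comparator unit would do.

```python
import struct


def float_to_sortable_u32(x):
    """Map a float32 value to a uint32 whose unsigned order matches float order."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if bits & 0x80000000:
        # Negative: flip every bit so larger magnitudes sort lower.
        return bits ^ 0xFFFFFFFF
    # Non-negative: set the sign bit so these sort above all negatives.
    return bits | 0x80000000


def sort_by_integer_key(values):
    """Sort floats using only unsigned integer comparisons on precomputed keys."""
    keyed = [(float_to_sortable_u32(v), v) for v in values]  # one O(n) pass
    keyed.sort(key=lambda kv: kv[0])  # compares integers only
    return [v for _, v in keyed]


if __name__ == "__main__":
    print(sort_by_integer_key([3.5, -1.0, 0.0, -7.25, 2.0]))
```

Anything with a richer comparison (strings, multi-field records) needs a more elaborate key, and not every ordering can be squeezed into a fixed-width integer.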
 

casperes1996

Power User
Posts
185
Reaction score
171
I’m just wondering if it’s possible to make it big enough to pay for the overhead of shifting things into and out of the unit, in actual use. I have some experience designing big parallel comparators, and they aren’t all that small. You’d also have to reduce the sort function to, say, an unsigned long int comparison by some sort of hashing, which presumably you’d do one time prior to loading the unit, but that should be O(n).
I assume that at some array sizes it would be worth it. But yeah I imagine the arrays would have to be quite big and regular to warrant this sort of thing instead of another E core or something
 

Cmaier

Site Master
Staff Member
Site Donor
Posts
5,330
Reaction score
8,523
I assume that at some array sizes it would be worth it. But yeah I imagine the arrays would have to be quite big and regular to warrant this sort of thing instead of another E core or something

By the way, funny that I typed O ( n ) and it got converted to a thumbs down
 

casperes1996

Power User
Posts
185
Reaction score
171
By the way, funny that I typed O ( n ) and it got converted to a thumbs down
Hehe I did imagine that was supposed to be O(n). Whole sorting still bounded by O(n log(n)) at best though so the potentially additional O ( n ) isn’t too important for large enough arrays. But if n needs to be that large the point also goes away a bit for consumer use cases. And for large enough Ns we become IO bound as well. I’d love to see where any hardware accelerated sorting is currently being used
 

Cmaier

Site Master
Staff Member
Site Donor
Posts
5,330
Reaction score
8,523
Hehe I did imagine that was supposed to be O(n). Whole sorting still bounded by O(n log(n)) at best though so the potentially additional O ( n ) isn’t too important for large enough arrays. But if n needs to be that large the point also goes away a bit for consumer use cases. And for large enough Ns we become IO bound as well. I’d love to see where any hardware accelerated sorting is currently being used
Yeah, me too. Seems to me that it might make sense for certain special-purpose machines (or sub portions of special purpose algorithms - sort vertices by x-coordinate?). But so much sorting involves an array of pointers to objects where you are sorting using complex criteria on properties of properties stored in non-sequential memory addresses - there would often be a lot of work to set things up before you could even let the algorithm do its thing. And then you’d never be able to fit the entire data set into the thing at once (because if your data set is small you are probably ok using a general purpose processor), so now you are dividing things up and merging the results, which still ends up being a lot of swapping.
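
The divide-and-merge step can be sketched like this, with Python's built-in `sorted` standing in for a hypothetical fixed-capacity sorting unit. `unit_capacity` and the function name are assumptions for the example, not anything taken from a real design.

```python
import heapq


def chunked_sort(data, unit_capacity):
    """Sort `data` in fixed-size chunks, then merge the sorted runs."""
    if unit_capacity < 1:
        raise ValueError("unit_capacity must be at least 1")

    # Each slice is what would be loaded into the unit, sorted, and read back.
    runs = [
        sorted(data[i:i + unit_capacity])
        for i in range(0, len(data), unit_capacity)
    ]

    # The k-way merge of the runs still happens on the general-purpose side.
    return list(heapq.merge(*runs))


if __name__ == "__main__":
    print(chunked_sort([5, 3, 9, 1, 4, 8, 2, 7, 6], unit_capacity=4))
```

Every element goes into the unit once, comes back out once, and then gets touched again during the merge, which is the swapping overhead in question.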
 