Intel’s new APX/AVX10 extensions

dada_dave

Elite Member
Posts
2,164
Reaction score
2,150
This could also go under x86 vs ARM, but I thought it might deserve its own thread:


Intel® APX demonstrates the advantage of the variable-length instruction encodings of x86 – new features enhancing the entire instruction set can be defined with only incremental changes to the instruction-decode hardware. This flexibility has allowed Intel® architecture to adapt and flourish over four decades of rapid advances in computing – and it enables the innovations that will keep it thriving into the future.

The rest of the news bulletin is about the details of the new instructions, but I highlighted the conclusion above because certain posters in particular would “like” the quoted claim. 😉 Intel is doubling down. We’ll see how that works out for them. I’m not qualified to judge; maybe it will!

And we have AVX10 too, which seems very comparable in philosophy to ARM’s SVE2, though it is only meant to bridge 256-bit and 512-bit SIMD workloads.

 

leman

Site Champ
Posts
641
Reaction score
1,196
Interesting… Intel is reimplementing ARM in x86? 😅

One thing I find curious is that they talk about conditional execution for many operations, a feature ARM removed from their ISA with AArch64…
 

Cmaier

Site Master
Staff Member
Site Donor
Posts
5,330
Reaction score
8,523
Sort of ironic that this seems to be about adding an additional 16 general purpose registers and adding versions of a bunch of instructions that now take a destination register; it makes things more RISC-like, by making the instruction encodings more perverse.
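
To make the "destination register" point concrete, here is a rough Python model (not real x86 semantics, just dictionaries standing in for registers). It assumes the classic two-operand form overwrites one of its sources, while the new form names a separate destination. The helper names are made up for illustration.

```python
def sum_keep_inputs_two_operand(regs):
    """rcx = rax + rbx using a destructive two-operand ADD.

    The ADD overwrites its first operand, so rax has to be copied first.
    Returns the number of modelled instructions.
    """
    regs["rcx"] = regs["rax"]                  # mov rcx, rax
    regs["rcx"] = regs["rcx"] + regs["rbx"]    # add rcx, rbx
    return 2


def sum_keep_inputs_three_operand(regs):
    """rcx = rax + rbx with a separate destination (hypothetical 3-operand ADD).

    Returns the number of modelled instructions.
    """
    regs["rcx"] = regs["rax"] + regs["rbx"]    # add rcx, rax, rbx
    return 1


if __name__ == "__main__":
    old = {"rax": 5, "rbx": 7, "rcx": 0}
    new = dict(old)
    print(sum_keep_inputs_two_operand(old), old)
    print(sum_keep_inputs_three_operand(new), new)
```

Both versions leave the same values in the registers; the only difference in this model is the extra copy, which is the part that looks RISC-like once it goes away.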
 

Cmaier

Site Master
Staff Member
Site Donor
Posts
5,330
Reaction score
8,523
Interesting… Intel is reimplementing ARM in x86? 😅

One thing I find curious is that they talk about conditional execution for many operations, a feature ARM removed from their ISA with AArch64…

Yep. It’s like “let’s add a RISC-like instruction set,” which we will do by lengthening a bunch of instructions so the front end is less RISC-like.

Good luck with that.
 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,150
Interesting… Intel is reimplementing ARM in x86? 😅

One thing I find curious is that they talk about conditional execution for many operations, a feature ARM removed from their ISA with AArch64…

Sort of ironic that this seems to be about adding an additional 16 general purpose registers and adding versions of a bunch of instructions that now take a destination register; it makes things more RISC-like, by making the instruction encodings more perverse.
Okay, so I wasn’t completely off base. Reading this, I was thinking it sounds like … an Arm-like design, but in x86 with an x86 decoder, which they were propping up in their hype release. So it’s ’90s pipelines all over again, but without the manufacturing advantage? Shall we prepare for another set of think pieces from tech journalists about how it’s all RISC-like under the hood now anyway, so we should just stick with x86?
 

Yoused

up
Posts
5,624
Reaction score
8,943
Location
knee deep in the road apples of the 4 horsemen
One thing I find curious is that they talk about conditional execution for many operations, a feature ARM removed from their ISA with AArch64…
The original AArch32 included conditional execution for every instruction as a way to make the code slightly more compact and to simplify decoding. There were no conditional branches, calls or returns because they were already all conditional. The condition field was removed in order to implement a 32-register architecture.
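
The bit budget behind that trade can be sketched in a few lines of Python. This is a deliberately simplistic model: it assumes a fixed 32-bit instruction with three register operands, which is not true of every real encoding, and the function names are made up for illustration.

```python
INSTRUCTION_BITS = 32


def register_field_bits(num_registers):
    """Bits needed to name one of num_registers registers."""
    return (num_registers - 1).bit_length()


def bits_left_for_opcode(num_registers, operands=3, condition_bits=0):
    """Bits remaining after the register fields and any condition field."""
    used = operands * register_field_bits(num_registers) + condition_bits
    return INSTRUCTION_BITS - used


# 16 registers plus a 4-bit condition field: 3 * 4 + 4 = 16 bits used.
a32_style = bits_left_for_opcode(16, condition_bits=4)
# 32 register numbers AND a condition field: 3 * 5 + 4 = 19 bits used.
both = bits_left_for_opcode(32, condition_bits=4)
# 32 register numbers, no condition field: 3 * 5 = 15 bits used.
a64_style = bits_left_for_opcode(32)

if __name__ == "__main__":
    print(a32_style, both, a64_style)
```

In this model, dropping the condition field more than pays for the three extra register bits, whereas keeping both would squeeze everything else in the instruction.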

This idea reminds me of Coldfire, which stripped out a large fraction of 68K function in order to make it more "RISC-like" (yeah, that memory-indirect mode really did need to go away), but it does not look like they gained much. Intel might see gains from this, but a lot of people are likely to say ffs, why?
 

Cmaier

Site Master
Staff Member
Site Donor
Posts
5,330
Reaction score
8,523
So does AMD have a license to this new stuff? (Doubt it). If it comes to adding more registers and adding RISC-like instructions, we had some ideas 20 years ago. But at this point, if you want RISC, just do RISC.
 

KingOfPain

Site Champ
Posts
270
Reaction score
357
One thing I find curious is that they talk about conditional execution for many operations, a feature ARM removed from their ISA with AArch64…

As Yoused mentions, AArch32 did predication (i.e. the conditional execution of all instructions), although I believe that some later instruction additions had no predication, because they were running out of ways to encode the instructions, which is kinda the point of Intel here.
The idea of predication is that you can reduce the number of branches prior to branch prediction to reduce the occurrence of pipeline flushes.
Unfortunately, the compilers couldn't really work with that feature. I tried the standard GCD example from a lot of books in various compilers (including the official one from ARM), and none managed to reproduce that code. Basically, it was just of use for hand-coded assembly language.
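
For anyone who hasn't seen the GCD example: it is the subtraction-based algorithm, where the textbook ARM version does one compare and then two conditional subtracts (SUBGT/SUBLT) with a single loop branch. A rough Python model of the two shapes, assuming positive integers; the "predicated" variant only imitates the idea by using the comparison results as 0/1 multipliers instead of an if/else.

```python
def gcd_branchy(a, b):
    """Subtraction GCD with an if/else inside the loop (extra branches)."""
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a


def gcd_predicated(a, b):
    """Same algorithm, with both subtractions always 'issued'.

    gt and lt play the role of the flags set by one CMP; each subtraction
    only takes effect when its condition holds, like SUBGT / SUBLT.
    """
    while a != b:
        gt = a > b
        lt = a < b
        # Both right-hand sides use the old a and b; at most one of them changes.
        a, b = a - b * gt, b - a * lt
    return a


if __name__ == "__main__":
    print(gcd_branchy(12, 18), gcd_predicated(12, 18))
```

The only branch left in the second version is the loop itself, which is the code shape the compilers I tried would not produce.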

Of course Intel had to put (at least partial) predication (including a dedicated predication register) into IA-64 (Itanium) over a decade later, and it was a great success there (yes, that was sarcasm).

If you take a look at history, then the great extensibility of the IA-32 has been used for a lot of crap (e.g. MMX), which also means that a lot of the instructions should actually no longer be used (e.g. all the old FPU instructions; or INC/DEC, which were great back in the early 80s, but are now slower than ADD/SUB, because they set the condition flags differently).
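
The flag difference can be shown with a small Python model. It is a simplification that tracks only CF and ZF on 8-bit values (the real instructions touch more flags), and the function names are made up. The point it illustrates is that ADD writes the carry flag while INC leaves the old one in place, so the flags after INC still depend on whatever set CF earlier.

```python
def add8(value, operand, flags):
    """8-bit ADD: writes both CF and ZF."""
    total = value + operand
    result = total & 0xFF
    return result, dict(flags, CF=int(total > 0xFF), ZF=int(result == 0))


def inc8(value, flags):
    """8-bit INC: writes ZF but leaves CF as it was."""
    result = (value + 1) & 0xFF
    return result, dict(flags, ZF=int(result == 0))


if __name__ == "__main__":
    start = {"CF": 0, "ZF": 0}
    print(add8(0xFF, 1, start))  # carry out of bit 7 is recorded in CF
    print(inc8(0xFF, start))     # same result value, CF still the old 0
```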
 

KingOfPain

Site Champ
Posts
270
Reaction score
357
So does AMD have a license to this new stuff? (Doubt it). If it comes to adding more registers and adding RISC-like instructions, we had some ideas 20 years ago. But at this point, if you want RISC, just do RISC.

Maybe Intel tries to get a monopoly that way...
Adding additional registers always has to be supported by the operating system, though, which is why MMX was reusing the FPU registers (but that wasn't a great idea either).

The motto back in my Acorn RiscPC time: No RISC, no fun.

Since Cliff knows a little bit of German, I'm wondering if he gets this one (it unfortunately doesn't really translate to English):
"Besser ARM dran als ARM ab."
(It's a play on words, because "arm dran" means that someone doesn't do that well, but in this case the literal meaning would be "having an ARM". While "Arm ab" literally means "an arm off", and it now occurs to me that this is probably a poor choice, given Cliff's recent predicament. I hope your shoulder is doing better now...)
 

Cmaier

Site Master
Staff Member
Site Donor
Posts
5,330
Reaction score
8,523
Maybe Intel tries to get a monopoly that way...
Adding additional registers always has to be supported by the operating system, though, which is why MMX was reusing the FPU registers (but that wasn't a great idea either).

The motto back in my Acorn RiscPC time: No RISC, no fun.

Since Cliff knows a little bit of German, I'm wondering if he gets this one (it unfortunately doesn't really translate to English):
"Besser ARM dran als ARM ab."
(It's a play on words, because "arm dran" means that someone doesn't do that well, but in this case the literal meaning would be "having an ARM". While "Arm ab" literally means "an arm off", and it now occurs to me that this is probably a poor choice, given Cliff's recent predicament. I hope your shoulder is doing better now...)

Well, we got Microsoft to rewrite the whole operating system to support AMD64, so I imagine Intel can convince Microsoft (maybe). Or maybe Microsoft just doesn’t care, because it sees the future as Arm, too.

Und das war ein guter Witz.
 

leman

Site Champ
Posts
641
Reaction score
1,196
What’s also interesting is that they seem to be abandoning AVX512 in favor of a 256-bit SIMD ISA with similar features. Looks like a risky play with long-term consequences, but doesn’t make the x86 software ecosystem simpler in the short term. But this does mess things up for AMD, who literally just now introduced AVX512 support…
 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,150
Dougall’s thoughts:


Intel's new PUSH2/POP2 are similar to ARM's LDP/STP. I think these are extremely underrated instructions. Loads and stores are quite expensive, but these processors already support 128-bit loads and stores for vector instructions.
Zen 3 and the Apple M1 can both do 3 loads per cycle, but with LDP, the M1 can load 2x the scalar registers per cycle – kinda crazy. It's a shame compilers aren't better at using these instructions, and that the Intel paired load/store is restricted to stack push/pop.

The "Balanced PUSH/POP Hint" is a little odd, but mirrors an optimisation used by the M1 that detects matching pushes and pops with the register numbers typically used by compilers in function prologues and epilogues, and performs fast forwarding. There's a load-to-store-forwarding-like penalty in cases where the instructions are incorrectly paired, but don't actually alias, so I guess this hint could avoid that.

Conditional loads and stores are the biggest surprise to me so far. But they kind of make sense – you already have predicated loads and stores happening on the vector side, so it's nice to see that as an option in scalar code too.
Should also allow for a conditional trap by NULL-pointer-deref (or by writing to RIP+0 if you have W^X and want to save a byte?)
Adding that to my ARM wish-list.
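
A rough Python model of why conditional loads are interesting, and of the conditional-trap trick. It assumes the behaviour described above: a conditional load does not touch memory at all when the condition is false. This is contrasted with the existing CMOV with a memory source, which, as far as I know, always performs the load and can therefore fault either way. The dictionary memory, the Fault class and the function names are all made up for illustration.

```python
class Fault(Exception):
    """Stands in for a page fault or access violation."""


def load(memory, address):
    """Plain load; unmapped addresses fault."""
    if address not in memory:
        raise Fault(hex(address))
    return memory[address]


def cmov_load(cond, old_value, memory, address):
    """Model of CMOV with a memory source: the load always happens."""
    loaded = load(memory, address)
    return loaded if cond else old_value


def conditional_load(cond, old_value, memory, address):
    """Model of a conditional load: memory is not accessed if cond is false."""
    if not cond:
        return old_value
    return load(memory, address)


def trap_if(cond, memory):
    """Conditional trap: conditionally load from address 0 (unmapped here)."""
    conditional_load(cond, 0, memory, 0)


if __name__ == "__main__":
    memory = {0x1000: 42}
    print(conditional_load(True, 7, memory, 0x1000))
    print(conditional_load(False, 7, memory, 0))
```

In this model `trap_if(False, memory)` is a no-op and `trap_if(True, memory)` raises, which is the NULL-pointer-deref idea in miniature.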

There’s more info in his replies to others as well, so the Mastodon thread is well worth the read even with the main excerpts quoted above.
 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,150
What’s also interesting is that they seem to be abandoning AVX512 in favor of a 256-bit SIMD ISA with similar features. Looks like a risky play with long-term consequences, but doesn’t make the x86 software ecosystem simpler in the short term. But this does mess things up for AMD, who literally just now introduced AVX512 support…
Reading about the AVX10 instructions, it seems like it’ll be more similar to ARM’s SVE2 in that the instruction set can run on either 256 or 512-bit SIMD - e.g. E-cores with 256-bit SIMD can run the same code as a P-core with 512-bit SIMD or, if no consumer cores ship with such, at the very least a server processor with such a SIMD layout.
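
A toy Python model of that idea: the same loop body run at either width, with a per-lane mask handling the tail. This only illustrates width-independent code with masking. Whether real AVX10 binaries are portable across widths in this way is my reading of the announcement, not something the sketch establishes, and the function names are made up.

```python
def masked_add(a, b, vector_bits, element_bits=32):
    """Element-wise a + b, processed in chunks of one vector register."""
    lanes = vector_bits // element_bits
    out = [0] * len(a)
    for start in range(0, len(a), lanes):
        # One mask bit per lane; lanes past the end of the data stay inactive.
        mask = [start + lane < len(a) for lane in range(lanes)]
        for lane, active in enumerate(mask):
            if active:
                out[start + lane] = a[start + lane] + b[start + lane]
    return out


def vector_iterations(n, vector_bits, element_bits=32):
    """How many vector-wide steps the loop above takes for n elements."""
    lanes = vector_bits // element_bits
    return -(-n // lanes)  # ceiling division


if __name__ == "__main__":
    a = list(range(19))
    b = [10] * 19
    print(masked_add(a, b, 256) == masked_add(a, b, 512))
    print(vector_iterations(19, 256), vector_iterations(19, 512))
```

The results are identical at both widths; the wider vector just gets there in fewer iterations.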

 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,150
Do they believe that there is something superior about the x86 coding patterns that makes it a good idea to just tape stuff onto their crusty ISA?
Backwards compatibility with decades of Windows x86 software. That’s … not a small thing I’ll grant them.
 

Cmaier

Site Master
Staff Member
Site Donor
Posts
5,330
Reaction score
8,523
Backwards compatibility with decades of Windows x86 software. That’s … not a small thing I’ll grant them.
I bet the vast majority of people running windows are not running software that is more than a few years old.
 

Yoused

up
Posts
5,624
Reaction score
8,943
Location
knee deep in the road apples of the 4 horsemen
I bet the vast majority of people running windows are not running software that is more than a few years old.
My best friend swore by a Windows web search aggregator (cf. Sherlock) called "Copernic", about 2 decades ago. I tried it on VPC and struggled with Windows UI "standards": the damn thing was just garish as hell to look at. I suspect it never made it to 64-bit, though it looks like the company now makes a computer search tool. Most old stuff simply becomes non-useful (counter example: I thought Freehand was lovely and miss it often).

I see an application as two parts: the work engine and the UI. Mostly the work engine parts carry forward to the latest hardware while the UI parts (including the engine interties that create methodologies) often fall by the wayside. Really, building programs is a lot about putting existing work engine parts together to form methodologies. It should be blindingly easy to move to new architectures. There was an interesting system concept twenty years ago called "OpenDoc" that might have been ideal for end-user-level modularization of system/application tools, but it got killed, probably because of being a threat to profitability.
 

dada_dave

Elite Member
Posts
2,164
Reaction score
2,150
I bet the vast majority of people running windows are not running software that is more than a few years old.
Sure, but I can hear the collective howls of all the businesses running that one piece of software they refuse to upgrade from … plus the IT administrators where “no one ever got fired for buying IBM Intel” because that’s just “what works” with their archaic admin software. It ain’t called inertia for nothing … (or the more highfalutin terms: “market entrenchment”/“barrier to entry”).
 

Cmaier

Site Master
Staff Member
Site Donor
Posts
5,330
Reaction score
8,523
Sure, but I can hear the collective howls of all the businesses running that one piece of software they refuse to upgrade from … plus the IT administrators where “no one ever got fired for buying IBM Intel” because that’s just “what works” with their archaic admin software. It ain’t called inertia for nothing … (or the more highfalutin terms: “market entrenchment”/“barrier to entry”).
Yep, but at some point (already passed) it no longer matters for most customers. Piss off a few so that the rest can prosper with an ISA that isn’t held together with bubble gum and duct tape.
 

leman

Site Champ
Posts
641
Reaction score
1,196
Do they believe that there is something superior about the x86 coding patterns that makes it a good idea to just tape stuff onto their crusty ISA?

Backwards compatibility is certainly an important psychological factor. Intel marketing is certainly hard at work here given the rhetoric “look what we can do with our variable length instructions!”

Probably the more interesting part is that they can still use old instructions formats. The new instructions address some of x86 deficiencies, but the bulk of code won’t need 32 registers or the new features, so I can imagine that they can achieve slightly better instruction density. Of course, all this at the expense of complex and power-hungry decoding (but that’s what uop caches are for).

In general, this does look like a useful revitalization of the old x86, but it just might be too little too late. It will take many years for these instructions to become commonplace. Intel should have introduced them ten years ago… instead of forcing the customer to play hide and seek.
 