It’s capable of issuing many instructions at once, and issuing them out of order so that the ALUs always have something to do. It can do this because it looks deep into the instruction stream to find instructions that don’t depend on results from other instructions, and issues those while there are bubbles in the pipeline (caused by things like branch mispredictions, cache misses, etc.).
Some of those things can be done simultaneously – if one instruction takes a long time to complete, the reorder buffer lets later, independent instructions finish in the meantime, in an effort to get as much done in as short a time as possible. All of this is carefully tagged and tracked so that data dependencies are honored and results flow forward properly.
On an efficiency core, the reorder buffer can hold over 200 instructions; a performance core holds about three times that (and the buffers can retain looped code so that the core only has to fetch it once). There is a large variety of "barrier" instructions so that code can selectively enforce instruction and memory-access ordering to make sure things make sense, but those instructions are used sparingly, to let the core work at its best.