Instruction-level parallelism - page 1

When a frontend instruction tries to read a register whose backend device is not ready, that WIZ is forced to wait. The ready bits are key to the WIZ's operation. If a register's read-ready line is low when an instruction tries to read it, the entire frontend circuit simply stops. Because there is no clock, not a single transistor anywhere in the frontend is switching, and no power is being used. That WIZ is simply "frozen" until the ready line goes high again and the action resumes.

If the source register's read-ready line and the destination register's write-ready line are both already high when an instruction starts, no added delay occurs. The instruction is merely a register-to-register copy across the bus, and it executes at the maximum speed of the frontend hardware, possibly about 10 picoseconds per instruction, which corresponds to 100 GHz (in a 10 nm process). Whatever that time turns out to be, let's call it 1 "time unit" or 1 "tu" (pronounced "tee-you"). It is our baseline instruction delay.

Suppose we have a multiplier that takes 400 tu to compute, and three registers, "A", "B", and "C", which are all currently read-ready. Then we can do this:

    A => multiplier.in1 ; B => multiplier.in2 ; multiplier.out => C

The first two instructions take just 1 tu each. They are merely "dropping off" data into the multiplier's frontend registers. The multiplier's backend logic then kicks in, clearing multiplier.out's read-ready bit until the product is done. Thus the third instruction sits in a zero-power wait state for about 400 tu.
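The timing of the three-instruction multiplier example can be sketched with a small Python model. This is only an illustration, not the WIZ hardware or its toolchain: the `Timeline` class, the `move` and `run` helpers, and the `PROGRAM` list are hypothetical names. The model assumes that every destination is always write-ready, that the multiplier backend starts the moment the second operand lands in `multiplier.in2`, and that a stalled instruction still pays its 1 tu copy once the read-ready line goes high.

```python
from dataclasses import dataclass, field

COPY_TU = 1            # baseline register-to-register copy (1 tu, from the text)
MULT_LATENCY_TU = 400  # multiplier backend latency (400 tu, from the text)


@dataclass
class Timeline:
    """Toy model of one frontend executing moves in order (hypothetical)."""
    now: int = 0
    # Time at which a register becomes read-ready; absent means already ready.
    read_ready_at: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def move(self, src: str, dst: str) -> int:
        """Execute 'src => dst' and return how long the frontend was frozen."""
        ready = self.read_ready_at.get(src, 0)
        waited = max(0, ready - self.now)   # frozen until read-ready goes high
        self.now += waited + COPY_TU        # then the copy itself takes 1 tu
        self.log.append((f"{src} => {dst}", waited, self.now))
        if dst == "multiplier.in2":
            # Assumption: the backend starts when the second operand arrives
            # and holds multiplier.out's read-ready low until it finishes.
            self.read_ready_at["multiplier.out"] = self.now + MULT_LATENCY_TU
        return waited


def run(program):
    """Run a list of (src, dst) moves on a fresh timeline."""
    timeline = Timeline()
    for src, dst in program:
        timeline.move(src, dst)
    return timeline


PROGRAM = [
    ("A", "multiplier.in1"),
    ("B", "multiplier.in2"),
    ("multiplier.out", "C"),
]

if __name__ == "__main__":
    for text, waited, done in run(PROGRAM).log:
        print(f"{text:24s} waited {waited:3d} tu, done at t={done}")
```

Under these assumptions the first two moves finish at t=1 and t=2 with no waiting, and the third waits 400 tu before its copy completes at t=403, which matches the "about 400 tu" stall described above.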