A two-phase recovery mechanism
Superscalar processors take advantage of speculative execution to improve performance. When the speculation turns out to be incorrect, a recovery procedure is initiated. The back-end of the processor cannot be flushed due to having a mixture of both valid and invalid instructions. A basic solution is to wait for all valid instructions to retire and then purge the invalid instructions. However, if a long latency operation, such as a Last-level Cache (LLC) miss appears before the misspeculation point, the back-end recovery time significantly increases.
Many proposed mechanisms selectively flush invalid instructions in order to speed up the back-end recovery. In general, these mechanisms rely on broadcasting some misprediction related tags to remove the instructions from any backend structures, such as ROB, LSQ, RS, etc. The hardware overhead in these mechanisms is nontrivial and can potentially affect the processor clock cycle time if they are on the critical path. Moreover, a checkpointing mechanism or a walker needs to be added to accelerate the recovery of the front-end register alias table (F-RAT).
We propose a two-phase recovery mechanism which does not need any walking or broadcasting process and can still match the performance of the state-of-the-art recovery approaches. The first phase works similar to a typical basic recovery mechanism and the second phase is not triggered until the backend is stalled by an LLC miss load. In that case, the second phase treats the load as a misspeculation and recovers from this load. Since the LLC miss response time is usually much longer than the time to fill the entire pipeline with new instructions, in most cases our mechanism can completely overlap the branch misprediction recovery penalty with the cache miss penalty.