Cache structures in a multicore system are highly vulnerable to soft errors. Enabling fault tolerance capabilities on all cache structures in a system is inefficient in terms of performance and power consumption. In this study, we propose an enhanced protection mechanism for code segments, which are critical in terms of reliability, by utilizing asymmetrically reliable cores under performance and power constraints. Our proposed system contains at least one high-reliability core, which has an ECC-protected L1 cache, and several low-reliability cores, which have no protection mechanisms. Reliability-based critical code regions are assumed to be high-priority functions, which are extracted by examining the execution time percentages and the program's call graph in our framework, statically. Software threads that invoke one of the high-priority functions are bound to the high-reliability cores dynamically during the execution, while the threads that execute the remaining functions are bound to the low-reliability cores. As part of the experimental analysis, our proposed framework is compared with traditional fully protected and unprotected configurations with respect to performance, power and reliability metrics for various applications of the benchmarks. Our framework exploits the benefits of providing the reliability-based critical regions of the applications exclusively by offering notable power and cost savings with close performance and reliability values for the set of functions reported in the experimental results.
All Science Journal Classification (ASJC) codes
- Hardware and Architecture