NMR-MPar: A Fault-Tolerance Approach for Multi-Core and Many-Core Processors

Abstract: Multi-core and many-core processors are a promising solution for achieving high performance while maintaining lower power consumption. However, their degree of miniaturisation makes them more sensitive to soft errors. To improve system reliability, this work proposes a fault-tolerance approach based on redundancy and partitioning principles called NMR-MPar: N-Modular Redundancy and M-Partitions. By combining both principles, this approach allows multi/many-core processors to perform critical functions in mixed-criticality systems. Taking advantage of the capabilities of these devices, NMR-MPar creates different partitions that perform independent functions. For critical functions, it is proposed that N partitions with the same configuration participate in an N-Modular Redundancy system. To validate the approach, a case study is implemented on the KALRAY MPPA-256 many-core processor running two parallel benchmark applications. The Traveling Salesman Problem and Matrix Multiplication applications were selected to exercise different resources of the device. The effectiveness of NMR-MPar is assessed by Software-Implemented Fault Injection. For evaluation purposes, the system is assumed to be intended for use in avionics. Results show that implementing NMR-MPar on the system improves application reliability by two orders of magnitude. Finally, this work opens the possibility of using massive parallelism for dependable applications in embedded systems.
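
As an illustration of the N-Modular Redundancy principle named above, the following minimal Python sketch shows a generic strict-majority voter over the outputs of N redundant partitions. It is not taken from this work: the function name `nmr_vote` and the choice of a strict-majority rule are assumptions made for illustration, and the voter actually used on the MPPA-256 may differ.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def nmr_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value agreed on by a strict majority of replicas.

    `outputs` holds one result per redundant partition (hypothetical
    interface). None is returned when no value reaches a strict
    majority, which signals an uncorrectable disagreement.
    """
    if not outputs:
        return None
    # most_common(1) yields the single most frequent value and its count.
    value, count = Counter(outputs).most_common(1)[0]
    # Strict majority: more than half of the replicas must agree.
    return value if count > len(outputs) // 2 else None
```

With N = 3, a single faulty partition is masked: `nmr_vote([42, 42, 17])` yields `42`, whereas three mutually different outputs yield `None` so that the caller can trigger a recovery action.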
