Coordinated versus Uncoordinated Checkpoint Recovery for Network-on-Chip based Systems
Abstract
This paper presents and compares two failure recovery schemes developed for multi-core systems-on-chip that use network-on-chip communication infrastructures. Both methods aim at fast recovery from system or application failures when a global reset is the last resort for recovering a failed system. The first method uses coordinated checkpointing, while the second is based on uncoordinated checkpointing and message logging. Their effectiveness and overhead are evaluated and compared under different application traffic loads and failure rates.