Distributed online software monitoring of manycore architectures

Abstract: This paper describes the design principles of a software-based online testing application for monitoring manycore architectures that run multithreaded functional applications. The key idea is to run a non-intrusive monitoring application in parallel with the functional one. The monitoring application aims to detect and react to software or hardware malfunctions, and can be seen as a service provided by the operating system. The method relies on embedded sensors that capture either physical values from the chip (temperature, ...) or software-related indicators such as CPU load. A case study implementing this methodology has been carried out, and results on memory usage and performance overhead are reported.
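
As an illustration only, the monitoring principle summarized above can be sketched in a few lines of Python: a monitor polls a set of sensors on its own thread, alongside the functional code, and records any reading that exceeds its limit. The `Monitor` class, the sensor names, the thresholds, and the polling period are hypothetical and not taken from the paper. The "reaction" is reduced here to appending an alert, whereas a real system would trigger a corrective action.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Monitor:
    """Polls named sensors on a separate thread and records limit violations.

    All names and values are illustrative assumptions, not the paper's API.
    """
    sensors: Dict[str, Callable[[], float]]  # hypothetical sensor read functions
    limits: Dict[str, float]                 # hypothetical upper bound per sensor
    period_s: float = 0.01                   # assumed polling period, in seconds
    alerts: List[Tuple[str, float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def check_once(self) -> None:
        """Read every sensor once and record readings above their limit."""
        for name, read in self.sensors.items():
            value = read()
            if value > self.limits[name]:
                self.alerts.append((name, value))

    def _run(self) -> None:
        # Check first, then sleep; wait() returns True once stop() is called.
        while True:
            self.check_once()
            if self._stop.wait(self.period_s):
                break

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


# Stand-in sensors returning fixed values (a physical one and a software one).
monitor = Monitor(
    sensors={"temperature_c": lambda: 95.0, "cpu_load": lambda: 0.30},
    limits={"temperature_c": 85.0, "cpu_load": 0.90},
)
monitor.start()                                 # monitor runs in parallel...
workload_result = sum(i * i for i in range(1000))  # ...with the functional code
monitor.stop()
```

In this sketch the functional workload never calls into the monitor, which mirrors the non-intrusive intent; the cost of the polling thread is what a memory and performance overhead measurement would quantify.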
