An algorithm for automatically obtaining distributed and fault-tolerant static schedules

Abstract: Our goal is to automatically obtain a distributed and fault-tolerant embedded system: distributed because the system must run on a distributed architecture; fault-tolerant because the system is critical. Our starting point is a source algorithm, a target distributed architecture, some distribution constraints, some indications on the execution times of the algorithm operations on the processors of the target architecture, some indications on the communication times of the data-dependencies on the communication links of the target architecture, a number Npf of fail-silent processor failures that the obtained system must tolerate, and finally some real-time constraints that the obtained system must satisfy. In this article, we present a scheduling heuristic which, given all these inputs, produces a fault-tolerant, distributed, and static schedule of the algorithm on the architecture, with an indication of whether or not the real-time constraints are satisfied. The algorithm we propose consists of a list scheduling heuristic based on an active replication strategy, which allows at least Npf + 1 replicas of each operation to be scheduled on different processors and run in parallel, so as to tolerate at most Npf failures. Simulation results show that, thanks to the strategy used to schedule operations, the proposed heuristic improves the performance of our method, both in the absence and in the presence of failures.

Embedded systems account for a major part of critical applications (space, aeronautics, nuclear...) as well as public domain applications (automotive, consumer electronics...). Their main features are:
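As a rough illustration of the active replication strategy summarized in the abstract, the following Python sketch places Npf + 1 replicas of every operation on distinct processors. It is not the heuristic of this article: the selection rule (earliest finish time), the fixed topological order, the single uniform communication cost, and all names (`schedule_with_replication`, `exec_time`, `comm_time`, `makespan`) are assumptions made only for this example. Distribution constraints are modeled, again as an assumption, by omitting an (operation, processor) pair from `exec_time`.

```python
def schedule_with_replication(ops, deps, exec_time, comm_time, processors, npf):
    """Simplified static list scheduling with active replication.

    ops        -- operations, already in topological order (assumption)
    deps       -- dict: operation -> list of predecessor operations
    exec_time  -- dict: (operation, processor) -> duration; a missing pair
                  means the operation may not run on that processor
    comm_time  -- uniform cost of sending data between two processors (assumption)
    processors -- list of processor names
    npf        -- number of fail-silent processor failures to tolerate

    Returns a dict: operation -> list of (processor, start, finish) replicas.
    Timing is computed for the failure-free case only.
    """
    proc_free = {p: 0 for p in processors}  # time at which each processor is idle
    placed = {}

    for op in ops:
        candidates = []
        for p in processors:
            if (op, p) not in exec_time:
                continue  # distribution constraint: op not allowed on p
            ready = 0
            for pred in deps.get(op, []):
                # Every replica of pred sends its result; the consumer can
                # proceed with the first copy that arrives.
                arrival = min(
                    finish if q == p else finish + comm_time
                    for q, _start, finish in placed[pred]
                )
                ready = max(ready, arrival)
            start = max(ready, proc_free[p])
            candidates.append((start + exec_time[(op, p)], start, p))

        if len(candidates) < npf + 1:
            raise ValueError(f"cannot place {npf + 1} replicas of {op!r}")

        # Keep the npf + 1 candidates that finish earliest; each one is on a
        # different processor because there is one candidate per processor.
        candidates.sort()
        placed[op] = []
        for finish, start, p in candidates[: npf + 1]:
            placed[op].append((p, start, finish))
            proc_free[p] = finish

    return placed


def makespan(placed):
    """Completion time of the last replica in the failure-free schedule."""
    return max(finish for replicas in placed.values() for _p, _s, finish in replicas)


# Hypothetical two-operation example: B depends on A, and B may not run on P3.
processors = ["P1", "P2", "P3"]
ops = ["A", "B"]
deps = {"B": ["A"]}
exec_time = {
    ("A", "P1"): 2, ("A", "P2"): 3, ("A", "P3"): 2,
    ("B", "P1"): 3, ("B", "P2"): 2,
}
placed = schedule_with_replication(ops, deps, exec_time, 1, processors, npf=1)
deadline = 6  # hypothetical real-time constraint
print(placed, "deadline met:", makespan(placed) <= deadline)
```

With Npf = 1, each operation ends up with two replicas on two different processors, so the loss of any single processor still leaves one copy of every result. Comparing `makespan(placed)` with a deadline mirrors, in a very reduced form, the indication of whether the real-time constraints are satisfied.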
