Reports

Scheduling computational workflows on failure-prone platforms

Abstract : We study the scheduling of computational workflows on compute resources that experience exponentially distributed failures. When a failure occurs, rollback and recovery is used to resume the execution from the last checkpointed state. The scheduling problem is to minimize the expected execution time by deciding in which order to execute the tasks in the workflow and whether or not to checkpoint a task after it completes. We give a polynomial-time algorithm for fork graphs and show that the problem is NP-complete with join graphs. Our main result is a polynomial-time algorithm to compute the execution time of a workflow with specified to-be-checkpointed tasks. Using this algorithm as a basis, we propose efficient heuristics for solving the scheduling problem. We evaluate these heuristics for representative workflow configurations.
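To illustrate the trade-off the abstract describes, the sketch below estimates the expected execution time of a simple linear chain of tasks under exponentially distributed failures. It is not the report's algorithm for general workflows. It relies on the standard closed form for the expected time to complete a block of work followed by a checkpoint, e^(λR) (1/λ + D) (e^(λ(W+C)) − 1), and assumes uniform checkpoint and recovery costs and a free restart from the beginning of the chain. The function and parameter names are hypothetical.

```python
import math


def expected_time(work, checkpoint, recovery, rate, downtime=0.0):
    """Expected time to run `work` then `checkpoint` with failure rate `rate`.

    Assumes exponentially distributed failures; each failure costs
    `downtime` plus `recovery` before the segment is retried.
    """
    return (math.exp(rate * recovery)
            * (1.0 / rate + downtime)
            * math.expm1(rate * (work + checkpoint)))


def chain_expected_time(weights, checkpointed, ckpt_cost, rec_cost, rate,
                        downtime=0.0):
    """Expected time of a linear chain (assumption: tasks run in list order).

    `checkpointed[i]` says whether task i is checkpointed after completion.
    Segments between checkpoints are independent because failures are
    memoryless, so their expected times add up.
    """
    total = 0.0
    segment = 0.0
    have_ckpt = False  # before the first checkpoint, restart costs no recovery
    for weight, is_ckpt in zip(weights, checkpointed):
        segment += weight
        if is_ckpt:
            recovery = rec_cost if have_ckpt else 0.0
            total += expected_time(segment, ckpt_cost, recovery, rate, downtime)
            segment = 0.0
            have_ckpt = True
    if segment > 0.0:  # trailing tasks that are not followed by a checkpoint
        recovery = rec_cost if have_ckpt else 0.0
        total += expected_time(segment, 0.0, recovery, rate, downtime)
    return total


if __name__ == "__main__":
    weights = [100.0, 100.0]
    no_ckpt = chain_expected_time(weights, [False, False], 1.0, 1.0, 0.01)
    all_ckpt = chain_expected_time(weights, [True, True], 1.0, 1.0, 0.01)
    print(f"no checkpoints:  {no_ckpt:.1f}")
    print(f"all checkpoints: {all_ckpt:.1f}")
```

With these illustrative values, checkpointing every task gives a lower expected time than checkpointing none. When failures are rare, the checkpoint overhead can outweigh the benefit, which is why the choice of tasks to checkpoint is part of the scheduling problem.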

Cited literature [24 references]

https://hal.inria.fr/hal-01075100
Contributor : Equipe Roma
Submitted on : Thursday, October 16, 2014 - 3:55:35 PM
Last modification on : Friday, June 25, 2021 - 3:40:06 PM
Long-term archiving on : Saturday, January 17, 2015 - 10:46:37 AM

File

RR-8609.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01075100, version 1

Citation

Guillaume Aupy, Anne Benoit, Henri Casanova, Yves Robert. Scheduling computational workflows on failure-prone platforms. [Research Report] RR-8609, ENS Lyon; LIP; INRIA; CNRS; Université Lyon 1. 2014. ⟨hal-01075100⟩
