Ambuj Goyal, S.S. Lavenberg, et al.
Annals of Operations Research
Checkpointing is a technique for reducing the completion (execution) time of long-running batch programs in the presence of failures. It consists of intermittently saving the current status of the program under execution so that if a failure occurs, the program needs to be restarted from the most recent checkpoint rather than from the beginning. Because of occasional long reprocessing times and the possibility of failures during recovery, checkpoints during reprocessing is guaranteed to increase the effectiveness of any checkpointing technique. In all models considered thus far, checkpointing is allowed only during useful processing of the program. In this paper, we carry out the analysis of a model with checkpointing during reprocessing following failures. Closed-form expressions for the Laplace transform of the program completion time, its mean and variance are derived. It is shown that, asymptotically, the expected completion time increases linearly (exponentially) with the service requirement in the presence (absence) of checkpointing. The question of whether checkpointing is beneficial is addressed and the optimal checkpointing rate, α*, which minimizes the expected completion time, is computed. The completion time analysis is used as a building block in a queueing model with a Poisson input stream of jobs. This system can be viewed as an M/G/1 queue, in which the first customer starting a busy period is having a different distribution for its service requirement. We derive this distribution and then determine several measures of interest. The checkpointing rate, α**, which minimizes the mean response time is computed. It is numerically shown that α** is very close to α* for a wide range of system parameter values. Finally, sensitivity analysis is performed to study the effect of using non-optimal checkpointing rates on the mean completion time and the mean response time. © 1990, Taylor & Francis Group, LLC. All rights reserved.
Ambuj Goyal, S.S. Lavenberg, et al.
Annals of Operations Research
Victor Nicola, Marvin K. Nakayama, et al.
FTCS 1990
Alexander Thomasian, Victor Nicola
SIGMETRICS 1989
N. Sato, K.S. Trivedi
SCC 2007