Feasibility of stepwise design of multitolerant programs

Document Type

Conference Proceeding

Publication Date

12-1-2011

Abstract

The complexity of designing programs that simultaneously tolerate multiple classes of faults, called multitolerant programs, is in part due to the conflicting nature of the fault tolerance requirements that must be met by a multitolerant program when different types of faults occur. To facilitate the design of multitolerant programs, we present sound and (deterministically) complete algorithms for stepwise design of two families of multitolerant programs in a high atomicity program model, where a process can read and write all program variables in an atomic step. We illustrate that if one needs to design failsafe (respectively, nonmasking) fault tolerance for one class of faults and masking fault tolerance for another class of faults, then a multitolerant program can be designed in separate polynomial-time (in the state space of the faultintolerant program) steps regardless of the order of addition. This result has a significant methodological implication in that designers need not be concerned about unknown fault tolerance requirements that may arise due to unanticipated types of faults. Further, we illustrate that if one needs to design failsafe fault tolerance for one class of faults and nonmasking fault tolerance for a different class of faults, then the resulting problem is NP-complete in program state space. This is a counterintuitive result in that designing failsafe and nonmasking fault tolerance for the same class of faults can be done in polynomial time. We also present sufficient conditions for polynomial-time design of failsafe-nonmasking multitolerance. Finally, we demonstrate the stepwise design of multitolerance for a stable disk storage system, a token ring network protocol and a repetitive agreement protocol that tolerates Byzantine and transient faults. Our automatic approach decreases the design time from days to a few hours for the token ring program that is our largest example with 200 million reachable states and 8 processes. © 2011 ACM.

Publication Title

ACM Transactions on Software Engineering and Methodology

Share

COinS