Maintenance Cost Models

Preventive maintenance allows us to ensure an acceptable level of reliability during the structural lifecycle. A conditional maintenance policy is based on periodic inspections of degradation, which eventually trigger alarms related to repairs and replacements. In practice, however, quantitative knowledge of the state of degradation and operation conditions presents many uncertainties, which leads to a difficult decision-making process. Therefore, reliability-based maintenance becomes mandatory for decision-making.


Preventive maintenance
The total maintenance cost can be written in the form:

$$C_M = C_F + C_{PM} + C_{INS} + C_{REP}$$

where $C_M$ is the expected total maintenance cost, $C_F$ is the expected failure cost (including operation losses, production losses, and the direct and indirect damages due to failure), $C_{PM}$ is the expected preventive maintenance cost, $C_{INS}$ is the expected inspection cost, and $C_{REP}$ is the expected repair cost. These costs are affected by uncertainties related to the state of degradation of the structure, the results of inspections and the repair/replacement methods. Moreover, these costs may vary with the socio-economic environment, through the discount rate, inflation and the fluctuations of market prices.
The failure cost is related to direct damage (human lives, economic losses, loss of benefits, environmental degradation, etc.) and to indirect damage (procedure fees, commercial impact, market losses, expert work, long-term effects, etc.). Depending on the industry concerned, some costs may increase exponentially with the failure rate. For example, the public relations and marketing impact, and therefore market losses, can jump considerably when the number of failed products becomes significant (as in mass production, the automotive industry, aeronautics, etc.) or because the perception of risks makes them unacceptable (as for nuclear power plants, dams, railways, etc.). In energy production industries (e.g. power plants, petrochemical plants, etc.), the main losses are due to lost benefits when production is stopped. Moreover, during the last decade, the public has become more sensitive to environmental aspects; failures inducing pollution are severely sanctioned by the courts, by politicians and by public opinion.

Figure 14.1 illustrates the costs of preventive and corrective maintenance in terms of the level of planning. A low level of planning leads to a larger number of emergency repairs and consequently to a larger total cost. Conversely, a high level of planned maintenance costs more and leads to losses due to over-maintenance. Optimizing the total maintenance cost allows us to find the best equilibrium in terms of the risks to be considered.

Under the assumption of an infinite horizon (footnote 1), the expectation of the maintenance cost per unit time takes the form:

$$E[C(\tau)] = \frac{C_c P_f(\tau) + C_p \left(1 - P_f(\tau)\right)}{\int_0^\tau \left(1 - P_f(t)\right) dt}$$

where $\tau$ is the maintenance interval, $C_c$ is the corrective maintenance cost, $C_p$ is the preventive maintenance cost (footnote 2) and $P_f(\tau)$ is the cumulative failure probability at time $\tau$; reliability is given by the survival probability $R(\tau) = 1 - P_f(\tau)$.
The choice of maintenance strategy depends on the ratio between preventive and corrective costs; when $C_p < C_c$, preventive maintenance becomes useful; otherwise, maintenance should only be performed when the component fails.
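The trade-off between preventive and corrective costs can be illustrated numerically. The sketch below evaluates one common renewal-type expression for the expected cost per unit time, (C_c P_f(τ) + C_p R(τ)) / ∫ R(t) dt, and searches a grid of intervals for the cheapest one. The Weibull lifetime, its parameters, the cost values and all function names are illustrative assumptions, not data from this chapter.

```python
import math

def weibull_pf(t, scale, shape):
    """Cumulative failure probability P_f(t) for an assumed Weibull lifetime."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def cost_rate(tau, c_c, c_p, scale, shape, steps=2000):
    """Expected cost per unit time when replacing at failure or at age tau."""
    pf = weibull_pf(tau, scale, shape)
    # Trapezoidal integration of the reliability R(t) = 1 - P_f(t) over [0, tau]
    h = tau / steps
    area = 0.0
    for i in range(steps):
        r0 = 1.0 - weibull_pf(i * h, scale, shape)
        r1 = 1.0 - weibull_pf((i + 1) * h, scale, shape)
        area += 0.5 * (r0 + r1) * h
    return (c_c * pf + c_p * (1.0 - pf)) / area

def best_interval(c_c, c_p, scale, shape, taus):
    """Grid search for the interval with the lowest cost rate."""
    return min(taus, key=lambda tau: cost_rate(tau, c_c, c_p, scale, shape))

# Hypothetical values: candidate intervals from 0.5 to 40 years
taus = [0.5 * k for k in range(1, 81)]
tau_cheap_pm = best_interval(c_c=100.0, c_p=5.0, scale=30.0, shape=3.0, taus=taus)
tau_costly_pm = best_interval(c_c=100.0, c_p=100.0, scale=30.0, shape=3.0, taus=taus)
print(tau_cheap_pm, tau_costly_pm)
```

With a preventive cost well below the corrective cost, the search returns an interior optimum. When both costs are equal, the cost rate keeps decreasing with the interval and the search returns the largest interval in the grid, which amounts to waiting for failure.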

Maintenance based on time
Under this policy, preventive maintenance is performed periodically at predefined times $k\tau$. When a component fails during an interval between two planned maintenance times, corrective maintenance is performed at the failure time. The advantage of this policy lies in its simplicity of application for the management of industrial systems, as preventive maintenance is planned in advance and there is no need to monitor the ageing of components. Three versions of the model (I, II, III) can be considered:
- Model I: the failed component is instantaneously replaced when failure occurs;
- Model II: the failed component remains unrepaired until the next preventive maintenance;
- Model III: the failed component is subject to minimal repair until the next preventive maintenance.

1. The infinite horizon hypothesis assumes that the maintenance cycles are repetitive and identical at each renewal of the system. By contrast, the finite horizon hypothesis assumes that each maintenance cycle is different from a stochastic or economic point of view.
2. The notations $C_{PM}$ and $C_F$ indicate the mathematical expectations of the preventive maintenance and failure costs, respectively. By contrast, the notations $C_p$ and $C_c$ indicate the deterministic costs of preventive and corrective maintenance (the latter including the cost of failure), respectively. In other words, $C_p$ and $C_c$ are paid only when an event occurs.

Model I
In this model, a failed component is replaced by a new one as soon as it fails during the maintenance interval, and all the components are systematically replaced at time intervals $\tau$. According to renewal theory, the cost per unit time is obtained by:

$$E[C(\tau)] = \frac{C_c\, E[N(\tau)] + C_p}{\tau}$$

where $E[N(\tau)]$ is the expected number of failures during the interval. Under the assumption of at most one failure of the same component during the interval, the above equation takes the form:

$$E[C(\tau)] = \frac{C_c P_f(\tau) + C_p}{\tau}$$

with $C_c$ being the corrective maintenance cost and $C_p$ the preventive maintenance cost.

Model II
In Model I, the component failure is immediately detected when it occurs. In the absence of monitoring devices, it can be assumed that failure is detected only at the planned maintenance times $k\tau$. In Model II, the failed component therefore remains out of service from its failure until its detection. The expectation of the duration between failure and detection times is given by:

$$\int_0^\tau P_f(t)\, dt$$

Therefore, the maintenance cost per unit time is written as:

$$E[C(\tau)] = \frac{C_{ct} \int_0^\tau P_f(t)\, dt + C_p}{\tau}$$

with $C_{ct}$ being the corrective cost per unit time.

Model III
In Model III, it is assumed that the component undergoes minimal repair when failure occurs. The process of the number of failures $N(t)$ is not perturbed by the failure-repair pair of events. A Non-Homogeneous Poisson Process (NHPP) represents this behavior, where the expected number of failures is given by:

$$E[N(t)] = \Lambda(t) = \int_0^t \lambda(u)\, du$$

with $\lambda(t)$ called "the hazard function" and $\Lambda(t)$ "the cumulated hazard function". The total cost per unit time is therefore:

$$E[C(\tau)] = \frac{C_{cm}\, \Lambda(\tau) + C_p}{\tau}$$

where $C_{cm}$ indicates the corrective cost of minimal repair.
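To compare the three time-based models, the following sketch evaluates a cost per unit time for each of them, using the textbook renewal-theory forms: (C_c P_f(τ) + C_p)/τ for Model I with at most one failure per interval, (C_ct ∫ P_f(t) dt + C_p)/τ for Model II, and (C_cm Λ(τ) + C_p)/τ for Model III. These forms, the Weibull lifetime with hypothetical parameters, and all names and cost values are assumptions made for illustration.

```python
import math

SCALE, SHAPE = 30.0, 3.0  # hypothetical Weibull parameters (years, dimensionless)

def pf(t):
    """Cumulative failure probability P_f(t) of the assumed Weibull lifetime."""
    return 1.0 - math.exp(-((t / SCALE) ** SHAPE))

def cum_hazard(t):
    """Cumulated hazard function of a Weibull lifetime: (t / scale) ** shape."""
    return (t / SCALE) ** SHAPE

def expected_downtime(tau, steps=1000):
    """Expected time between failure and detection: integral of P_f over [0, tau]."""
    h = tau / steps
    return sum(0.5 * (pf(i * h) + pf((i + 1) * h)) * h for i in range(steps))

def model_1(tau, c_c, c_p):
    """Immediate replacement at failure, at most one failure per interval."""
    return (c_c * pf(tau) + c_p) / tau

def model_2(tau, c_ct, c_p):
    """Failure detected only at the next planned maintenance."""
    return (c_ct * expected_downtime(tau) + c_p) / tau

def model_3(tau, c_cm, c_p):
    """Minimal repair at each failure (NHPP)."""
    return (c_cm * cum_hazard(tau) + c_p) / tau

for tau in (10.0, 15.0, 20.0):
    print(tau, model_1(tau, 20.0, 5.0), model_2(tau, 1.0, 5.0), model_3(tau, 20.0, 5.0))
```

For a Weibull lifetime, Model III has a closed-form optimum, scale × (C_p / (C_cm (shape − 1)))^(1/shape), which equals 15 years for the values used here and can serve as a check on the numerical evaluation.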

Maintenance based on age
In this model, only the components that have survived until the planned time of preventive maintenance are replaced by new components; otherwise, replacement is performed at failure. This model is particularly advantageous when the only action considered is replacement by a new component (i.e. repair is not considered as an alternative). The total cost per unit time is therefore:

$$E[C(\tau)] = \frac{C_c P_f(\tau) + C_p R(\tau)}{\int_0^\tau R(t)\, dt}$$

with $R(t) = 1 - P_f(t)$ the reliability. When the investment is amortized over a long duration, it is necessary to update the costs using a rate $r$, called the "discount rate", which takes into account the interest rate, inflation and other economic parameters. The present value of a cost incurred at time $t$ is obtained by multiplying it by the discount function $1/(1+r)^t$, which can be replaced by the continuous function $e^{-rt}$. Under the assumption of an infinite horizon, the updated expected cost is given by:
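As a sketch of how discounting enters the age-based model, one standard derivation is given here. It assumes continuous discounting with $e^{-rt}$, identical renewal cycles that end at failure or at age $\tau$ (whichever comes first), and a failure density $f(t) = dP_f/dt$. The symbol $V(\tau)$ for the total discounted cost is introduced for this illustration only and may differ from the notation of the source.

```latex
% Total expected discounted cost V over an infinite horizon.
% Each cycle ends at T = min(failure time, tau); the cost of the cycle is paid at T,
% and the same problem restarts at T, discounted by exp(-rT):
V(\tau) = \mathrm{E}\!\left[\text{cost}\; e^{-rT}\right] + \mathrm{E}\!\left[e^{-rT}\right] V(\tau)

% with
\mathrm{E}\!\left[\text{cost}\; e^{-rT}\right]
  = C_c \int_0^{\tau} e^{-rt} f(t)\,dt + C_p\, e^{-r\tau} R(\tau),
\qquad
\mathrm{E}\!\left[e^{-rT}\right]
  = \int_0^{\tau} e^{-rt} f(t)\,dt + e^{-r\tau} R(\tau)

% so that
V(\tau) = \frac{C_c \int_0^{\tau} e^{-rt} f(t)\,dt + C_p\, e^{-r\tau} R(\tau)}
               {1 - \int_0^{\tau} e^{-rt} f(t)\,dt - e^{-r\tau} R(\tau)}
```

As $r$ tends to zero, $r\,V(\tau)$ tends to the undiscounted age-based cost per unit time, $(C_c P_f(\tau) + C_p R(\tau)) / \int_0^\tau R(t)\,dt$, which gives a consistency check on the expression.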

Impact of inspection on costs
An optimal maintenance policy minimizes the total cost, including inspection and failure costs. A small time span between inspections leads to high costs of inspection operations, while a large time interval does not allow for timely detection of failures, therefore increasing the possibility of failure.
The first model consists of performing inspections at specific times, where each inspection is considered instantaneous and perfect. The policy considers the inspection cost $c_i$ and the failure cost. Minimizing the expected total cost leads to a recursive equation for the time intervals between inspections, initialized by the first interval. To simplify the solution method, some approximate formulations of the total cost have been proposed in the literature.
If we let $n(t)$ be the approximate number of inspections per unit time, the expected cost until the detection of the failure can be approximated by an expression that is minimized with respect to $n(t)$. The inspection times $t_k$ thus satisfy the equation:

$$\int_0^{t_k} n(t)\, dt = k$$

where $k$ is an integer.
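Once an inspection density n(t) is available, the inspection times follow from the condition that the cumulated number of inspections reaches successive integers. The sketch below accumulates a midpoint-rule integral of n(t) and records the times at which it crosses 1, 2, 3, and so on. The function name and the linear density used in the example are hypothetical.

```python
def inspection_times(n, horizon, count, steps=100000):
    """Return up to `count` times t_k such that the integral of n over [0, t_k] equals k.

    n       -- callable giving the number of inspections per unit time at time t
    horizon -- upper bound of the search interval
    """
    h = horizon / steps
    times = []
    cumulative = 0.0
    for i in range(steps):
        # Midpoint rule on the sub-interval [i*h, (i+1)*h]
        cumulative += n((i + 0.5) * h) * h
        while len(times) < count and cumulative >= len(times) + 1:
            times.append((i + 1) * h)
        if len(times) == count:
            break
    return times

# Hypothetical density that grows with age: inspections become more frequent
times = inspection_times(lambda t: 0.1 * t, horizon=10.0, count=3)
print(times)
```

For this linear density the integral is 0.05 t², so the exact inspection times are the square roots of 20, 40 and 60, and the intervals between inspections shrink as the structure ages.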

The case of imperfect inspections
In situ inspection of structures is performed in conditions that are far from the ideal conditions found in a laboratory. When the operator has an important influence on the inspection result (precision and disposition of the material, visual reading, etc.), the working conditions directly affect the measurements. External factors such as fog, extreme temperatures, difficult working positions or internal factors such as fatigue and concentration level can be mentioned here as examples. In such cases, we talk about imperfect inspections.
We can use a probabilistic format to define the corresponding quantities of Probability of Detection (PoD) of a defect, and Probability of False Alarm (PFA). The calibration of these probabilities can be performed, either on the basis of statistical analysis, or by signal analysis. It should be noted that, in the case of PFA = 0, a Bayesian updating can be performed on the inspection results in order to modify the distribution of defects after inspection.
The technical performance of Non-Destructive Testing (NDT) devices, and the chain of decision processes needed to obtain the required information, are generally assessed against two objectives: (i) the presence of defects (detection capacity) and (ii) the measurement of the defect (capacity to measure physical or geometrical properties, such as the length and the depth of a crack). It is easy to understand that a measurement carried out at a depth of several meters (e.g. in the case of a lock or the immersed piles of a wharf) is subject to large uncertainties, related to the following events:
- the diver gives a signal to the ground operator (beginning time of measurement $t_0$);
- the diver can handle the NDT device in operation more or less easily (depending on the complexity of the inspected joints and on the agitation due to waves and marine currents);
- the diver's vision is strongly reduced;
- the quality of the decision depends on the quality of cleaning of the surface to be inspected, especially the removal of biofouling;
- diver fatigue and breathing difficulties add to the above difficulties.
Details on the available techniques, and their respective advantages and disadvantages, for the example of offshore platforms can be found in the literature. Let us assume that a crack has been detected and consider the measurement uncertainties. Let $a_d$ be the detection threshold, that is, the size under which no crack can be detected. The unknown actual defect size $d$ is called the signal, and the measured defect size $\hat{d}$ ("d hat") is the signal plus noise. The noise is the mathematical notion that allows us to model the errors of measurement and interpretation (see section 14.4.2). The probability of detection of a measured random defect $\hat{d}$ is therefore defined by:

$$PoD = P(\hat{d} \geq a_d)$$

This definition is practical as long as the defect size can be described by a random variable. However, in the operational framework of inspection of real structures, defects are generally classified by groups and we prefer a Bayesian definition [ROU 03]:

$$PoD(X) = P(d(X) = 1 \mid X = 1)$$

where $X$ is the event "defect existence" and $d(\cdot)$ the event "decision". The realization "X = 1" indicates the existence of a defect and "X = 0" the absence of a defect. The interest of this formulation lies in the fact that it offers a clear definition of the Probability of False Alarm (PFA):

$$PFA(X) = P(d(X) = 1 \mid X = 0)$$

If the defect takes non-discrete values, and the distributions of the signal and of the noise are known, detection theory leads to the following definitions of the two probabilities, PoD and PFA:

$$PoD = \int_{a_d}^{+\infty} f_{SN}(x)\, dx, \qquad PFA = \int_{a_d}^{+\infty} f_N(x)\, dx$$

where $f_{SN}$ and $f_N$ indicate, respectively, the probability densities of the variables "signal + noise" and "noise". We can note that the probability density of the noise can be conditioned by the measured value. We shall not detail these considerations because they are extremely difficult to prove and even to quantify. We can simply note that, physically, for many measurement devices, the operator can tune the signal gain more and more finely if defects are not detected with the original settings. In this case, the noise evolves with the adjustment and consequently with the defect being measured. The formulas for PoD and PFA can be modified to include this information in the conditional probabilities.
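The threshold-based view of detection can be sketched numerically: PoD is the probability that "signal + noise" exceeds the threshold, and PFA is the probability that noise alone exceeds it. The sketch below assumes, purely for illustration, that both variables are normally distributed; the means, standard deviations and function names are hypothetical. Sweeping the threshold produces a set of (PFA, PoD) pairs.

```python
import math

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def pod(threshold, mean_sn, std_sn):
    """Probability that 'signal + noise' exceeds the detection threshold."""
    return 1.0 - normal_cdf(threshold, mean_sn, std_sn)

def pfa(threshold, mean_n, std_n):
    """Probability that noise alone exceeds the detection threshold."""
    return 1.0 - normal_cdf(threshold, mean_n, std_n)

def threshold_sweep(thresholds, mean_sn, std_sn, mean_n, std_n):
    """List of (PFA, PoD) pairs, one per threshold value."""
    return [(pfa(a, mean_n, std_n), pod(a, mean_sn, std_sn)) for a in thresholds]

# Hypothetical settings: noise centred on 0, 'signal + noise' centred on 3
thresholds = [0.25 * k for k in range(-8, 25)]  # from -2.0 to 6.0
curve = threshold_sweep(thresholds, mean_sn=3.0, std_sn=1.0, mean_n=0.0, std_n=1.0)
for p_fa, p_d in curve[::8]:
    print(round(p_fa, 4), round(p_d, 4))
```

A low threshold detects almost everything but raises many false alarms; a high threshold suppresses false alarms at the price of missed defects. The separation between the two distributions controls how favourable this compromise can be.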
For a given size or class of measured defect, we can plot the curve through the points with coordinates (PFA; PoD), obtained by modifying the parameters affecting the measurements (depending on the case, these parameters can be the device adjustment, visibility, operator experience, etc.). This curve (Figure 14.2) is obtained in a continuous form by varying the threshold $a_d$; it is called the Receiver Operating Characteristic curve, or simply the "ROC curve". Note that, for inspections under severe in situ conditions, such as in high mountains, on offshore platforms and on marine structures, the performance of measurement devices is strongly affected (by the agitation of waves and storms, visibility, temperature, the experience and state of fatigue of divers, the quality of the link with the platform supervisor, etc.). Inter-calibration campaigns, such as the InterCalibration of NDT for Offshore Structures (ICON) project, then become necessary. In these campaigns we measure, for each class of defect $c$ (size and typology), the numbers of good and bad detections, and calculate the observed probabilities corresponding to the two cases:

$$p_b(c) = \frac{n_1(c)}{n_1(c) + n_3(c)}, \qquad p_F(c) = \frac{n_2(c)}{n_2(c) + n_4(c)}$$

where $n_1(c)$, $n_2(c)$, $n_3(c)$ and $n_4(c)$ are, respectively, the number of existing and detected defects, the number of non-existing and detected defects, the number of existing and undetected defects, and finally the number of non-existing and undetected defects. According to these definitions, $p_F(c)$ is the PFA and $p_b(c)$ is the PoD. Depending on the considered class of defects, we can then build discrete ROC curves.
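The observed probabilities for one class of defects follow directly from the four counts of an inter-calibration campaign. A minimal sketch, in which the function name, argument names and the counts in the example are hypothetical:

```python
def observed_rates(n_detected_existing, n_detected_absent,
                   n_missed_existing, n_undetected_absent):
    """Return (PoD, PFA) observed for one class of defects.

    PoD = detected existing defects / all existing defects
    PFA = detections where no defect exists / all defect-free locations
    """
    n_existing = n_detected_existing + n_missed_existing
    n_absent = n_detected_absent + n_undetected_absent
    if n_existing == 0 or n_absent == 0:
        raise ValueError("each case needs at least one observation")
    return n_detected_existing / n_existing, n_detected_absent / n_absent

# Hypothetical campaign: 60 locations with a defect, 40 without
pod_c, pfa_c = observed_rates(45, 5, 15, 35)
print(pod_c, pfa_c)  # 0.75 0.125
```

Repeating the calculation for each class of defects, or for each set of inspection conditions, gives the discrete points of a ROC curve.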

New concepts for decision-making
For a structure manager, the questions are often different and new probabilities have to be introduced [ROU 03]. In fact, when using Bayesian modeling, we have to define the conditional probabilities $P_i$ of the existence or absence of a defect, given the decision resulting from the inspection. Writing these probabilities in terms of PoD and PFA introduces a new measure of probability: the Probability of Crack Existence (PCE), so named because these definitions were initially developed for the detection of cracks in structures of the oil industry. Using only the probabilities PoD and PFA in the same decision scheme is therefore not satisfactory. Moreover, considering only the PoD is equivalent to considering that PoD = P(d(X) = 1). This implies that the two conditions {PCE = 1; PFA = 0} are satisfied, which are strong assumptions. Parametric studies can thus be performed in order to identify, for example, the importance of the PFA. Hence, the information transfer during inspection can be drawn as indicated in Figure 14.2, where $F_1$ is an unknown function and $F_2$ is described by the nonlinear equations relating the probabilities $P_i$ to PoD, PFA and PCE.
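The link between the inspection performance (PoD, PFA), the prior probability of crack existence (PCE) and the probabilities of interest to the structure manager follows from Bayes' theorem. The sketch below is one way to write it; the function name and dictionary keys are hypothetical, and no particular numbering of the probabilities $P_i$ is implied.

```python
def decision_probabilities(pod, pfa, pce):
    """Posterior probabilities of defect existence given the inspection decision.

    pod -- P(detection | defect exists)
    pfa -- P(detection | no defect)
    pce -- prior probability that a defect exists
    """
    p_detection = pod * pce + pfa * (1.0 - pce)  # total probability of a detection
    p_no_detection = 1.0 - p_detection
    if p_detection <= 0.0 or p_no_detection <= 0.0:
        raise ValueError("degenerate case: the decision carries no information")
    return {
        "p_detection": p_detection,
        "defect_given_detection": pod * pce / p_detection,
        "no_defect_given_detection": pfa * (1.0 - pce) / p_detection,
        "defect_given_no_detection": (1.0 - pod) * pce / p_no_detection,
        "no_defect_given_no_detection": (1.0 - pfa) * (1.0 - pce) / p_no_detection,
    }

# Hypothetical inspection performance and prior
result = decision_probabilities(pod=0.9, pfa=0.1, pce=0.5)
print(result)

# With PCE = 1, the probability of a detection reduces to the PoD
certain = decision_probabilities(pod=0.9, pfa=0.1, pce=1.0)
print(certain["p_detection"])
```

The second call illustrates the remark made in the text: the probability of a positive decision coincides with the PoD only when the existence of the defect is taken as certain.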
We note that the laboratory generated Probability of Detection (PoD) is discontinuous by class of defects, while the decision-maker needs continuous information scales, integrable and differentiable for a numerical analysis.


Figure 14.5. Evolution of the Probability of Detection (PoD) as a function of the probability of false alarm (PFA): Receiver Operating Characteristic (ROC) curves for the Probabilities of Crack Existence (PCE) under various conditional probabilities P i
From the curves in Figure 14.5, it can be observed that ROC curves are highly sensitive to the variations of PCE and to the studied conditional probabilities P i .

Structures with large lifetimes
For structures with long lifetimes, such as civil engineering structures and infrastructures, it is necessary to take into account the evolution of monetary values, which is done by means of discount functions, including interest and inflation rates. Moreover, the assumption of an infinite horizon cannot usually be accepted, as the number of actions is often limited during the lifetime of the structure. In this case, the total cost over the whole lifetime of the structure should be considered, and should include discount effects.

When failure occurs between two inspections at times $t_{i-1}$ and $t_i$, the expected failure cost is written in terms of the cost of failure consequences. Moreover, for a number of inspections $N_{INS}$, the total inspection cost $C_{INS}$ is written in terms of the cost of the $i$-th inspection (which depends on its type and quality $q$), the cumulative failure probability at the $i$-th inspection, and the discount rate $r$ (often considered between 0.01 and 0.05, and up to 0.09 in the nuclear industry).

At the end of each inspection, we can associate the Probability of Detection PoD($t_i$) and the Probability of False Alarm PFA($t_i$); a decision should then be taken regarding the system repair, taking PoD($t_i$) and PFA($t_i$) into account. This decision is generally based on admissible reliability levels. The repair and replacement cost $C_{REP}$ depends on the nature and number $N_{INS}$ of actions to be performed; it is written in terms of the replacement cost at the $i$-th inspection and the corresponding repair probability $P_{REP}(t_i)$.
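A finite-horizon cost calculation can be sketched as a discounted sum over the inspection times. The forms used below are assumptions chosen for illustration, not the expressions of the source: the failure cost weights the increment of failure probability over each interval, the inspection cost is paid only if the structure has survived, and both are discounted with 1/(1 + r)^t. All names and numerical values are hypothetical.

```python
def discount(t, r):
    """Present-value factor for a cost paid at time t with discount rate r."""
    return 1.0 / (1.0 + r) ** t

def expected_failure_cost(times, pf_values, c_f, r):
    """Discounted expected failure cost.

    times     -- inspection times t_1 .. t_N
    pf_values -- cumulative failure probabilities P_f(t_0) .. P_f(t_N), with t_0 = 0
    """
    total = 0.0
    for i, t in enumerate(times, start=1):
        # Probability of failing between t_(i-1) and t_i, cost discounted at t_i
        total += c_f * (pf_values[i] - pf_values[i - 1]) * discount(t, r)
    return total

def expected_inspection_cost(times, pf_values, c_ins, r):
    """Discounted expected inspection cost, paid only if no failure has occurred."""
    return sum(c_ins * (1.0 - pf_values[i]) * discount(t, r)
               for i, t in enumerate(times, start=1))

# Hypothetical schedule: inspections at 10, 20 and 30 years
times = [10.0, 20.0, 30.0]
pf_values = [0.0, 0.01, 0.03, 0.06]
for r in (0.0, 0.03, 0.05):
    print(r,
          expected_failure_cost(times, pf_values, c_f=1000.0, r=r),
          expected_inspection_cost(times, pf_values, c_ins=2.0, r=r))
```

Raising the discount rate reduces the weight of late failures and late inspections, which is why the choice of $r$ can shift the optimal schedule for long-lived structures.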

Criteria for choosing a maintenance policy
Maintenance policies can be based on various criteria to define the optimal strategy. Garbatov & Guedes Soares [GAR 01] compared strategies based on the following criteria:
- pure economic criterion: the time intervals between inspections and replacements are defined by the optimal cost of maintenance, without constraints on the required reliability level;
- economic criterion with minimal interval: in order to avoid closely scheduled operations, a constraint on the minimum time interval between successive operations is introduced in the cost optimization problem;
- pure operational criterion: for better management of the system and its availability, a constant time interval is often adopted for maintenance operations; the choice of this interval is based on a minimization of the total maintenance cost;
- pure reliability criterion: the time interval is determined by the time at which the system reliability reaches the minimum acceptable level; due to system degradation, the time intervals vary along the lifetime of the structure;
- reliability criterion based on inspection quality: in this case, the inspection/replacement intervals are regular, but the quality of the operation is adjusted such that the minimum reliability is ensured over the whole lifetime.
In general, the purely economic criterion leads to a large reduction of costs, but implies frequent maintenance actions. The choice of a specific policy strongly depends on the nature of the system and the failure consequences. A reliability criterion with consideration of maintenance quality seems to be a reasonable compromise to reduce costs, while ensuring appropriate reliability levels.

Example of a corroded steel pipeline
To illustrate some of the above concepts, consider a simple example of a steel pipeline subject to corrosion. The system variables and their distribution parameters are given in Table 14.1 (for simplicity, all the probability distributions are considered as normal). In this example, the loss of tube wall thickness is given by a corrosion law of type $k t^n$ for $t > 1$, where $t$ is the time in years, and $k$ and $n$ are the parameters of the corrosion model.
By considering the safety margin corresponding to the material strength with respect to hoop stress, the reliability index is found to be 3.904 (the mean of the margin is 4.5 and its standard deviation is 1.153). In the corroded state, the safety margin, the reliability index and the failure probability become functions of the age $t$ of the structure; Figure 14.6 depicts the evolution of the reliability index as a function of $t$.

In this example, perfect maintenance corresponds to replacement of the pipe by a new one, and imperfect maintenance consists of applying a coating with a cost equal to 10 k€ per mm of additional thickness. Without maintenance, the total cost of the pipe is composed of the initial and failure costs. The expected total cost is plotted in Figure 14.7 as a function of the pipe age, where a minimum is observed at 41 years, corresponding to its economic lifetime without maintenance.

Figures 14.8 and 14.9 depict the costs per unit time in the case of perfect and imperfect maintenance, respectively. In the case of perfect maintenance, the optimum is located at 23 years with a total cost of 1.34 k€/yr. In the case of imperfect maintenance, we have chosen to add 0.3 mm of coating, representing a repair cost of 3 k€. The optimum is located at 16 years with a total cost of 0.49 k€/yr. It is important to note that these values are based on the assumption of an infinite horizon. It can easily be demonstrated that this assumption does not apply in the case of imperfect maintenance, as the maintenance cycles are not identical. In other words, the result obtained for imperfect maintenance is only valid for the first cycle.
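The initial reliability figures of this example can be checked with a few lines. Since all variables are taken as normal, the sketch assumes a normally distributed safety margin, so that the reliability index is the ratio of its mean to its standard deviation and the failure probability is Φ(−β). The function names are hypothetical; the numerical inputs are those quoted in the text.

```python
import math

def reliability_index(mean_margin, std_margin):
    """Reliability index of a normally distributed safety margin."""
    return mean_margin / std_margin

def failure_probability(beta):
    """Failure probability Phi(-beta), computed with the complementary error function."""
    return 0.5 * math.erfc(beta / math.sqrt(2.0))

beta = reliability_index(4.5, 1.153)   # mean and standard deviation of the margin
p_f = failure_probability(beta)

# Imperfect maintenance: coating at 10 k€ per mm, 0.3 mm applied
coating_cost = 10.0 * 0.3

print(round(beta, 3), p_f, coating_cost)
```

The ratio 4.5/1.153 gives about 3.903, in line with the quoted index of 3.904; the small difference presumably comes from rounding of the quoted mean and standard deviation. The coating cost reproduces the 3 k€ repair cost used for imperfect maintenance.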

Figure 14.8. Perfect maintenance under infinite horizon
By considering the case of a finite horizon, the curves in Figures 14.10 and 14.11 show the evolution of the failure probabilities associated with different perfect and imperfect maintenance policies, respectively. In this case, we have to calculate the total cost over the service life, which is taken here as 50 years. In the case of perfect maintenance,

Figure 14.9. Imperfect maintenance under infinite horizon
As the two cycles are supposed to start with a new structure, the failure costs are minimized when the two cycles are identical (i.e. a maintenance operation at 25 years), which explains why the consideration of a finite horizon allows us to reduce the total cost. The application of two maintenance actions leads to a significant increase in the maintenance costs, which is not recovered by the benefits of reducing the failure costs. In the case of imperfect maintenance, the assumption of an infinite horizon allows us to optimize the first cycle, but the increase of the failure probability at the end of the lifetime (i.e. at 50 years) leads to very large failure costs. When the maintenance intervals are chosen to balance the failure probabilities by using 0.5 mm of coating, at 15 and 35 years, we obtain a total cost of 18.9 k€ instead of 44 k€. The same strategy applied with four operations leads to a higher cost of 21.2 k€.

Figure 14.10. Evolution of the failure probability (perfect maintenance)
Curves: one action (finite horizon), one action (infinite horizon), two actions
Figure 14.11. Evolution of the failure probability (imperfect maintenance)

Conclusion
This chapter has introduced several types of reliability-based maintenance models. The main difficulty lies in the estimation of the direct and indirect costs of failure, especially when intangible losses are involved (e.g. human lives, public relations effects, etc.). The formulation of the maintenance cost becomes more difficult when multi-component systems are considered, as economic and stochastic interactions make the analysis very complex. Interested readers can consult the specialized literature, such as [CRE 03], dedicated to the management of infrastructures by considering inspection tool performance, determination of degradation laws, reliability assessment and the choice of maintenance actions.