Reliability assessment of complex mechatronic systems using a modified nonparametric belief propagation algorithm

Various parametric skewed distributions are widely used to model the time-to-failure (TTF) in the reliability analysis of mechatronic systems, where many items are unobservable due to the high cost of testing. Estimating the parameters of those distributions becomes a challenge. Previous research has failed to consider this problem due to the difﬁculty of dependency modeling. Recently the methodology of Bayesian networks (BNs) has greatly contributed to the reliability analysis of complex systems. In this paper, the problem of system reliability assessment (SRA) is formulated as a BN considering the parameter uncertainty. As the quantitative speciﬁcation of BN, a normal distribution representing the stochastic nature of TTF distribution is learned to capture the interactions between the basic items and their output items. The approximation inference of our continuous BN model is performed by a modiﬁed version of nonparametric belief propagation (NBP) which can avoid using a junction tree that is inefﬁcient for the mechatronic case because of the large treewidth. After reasoning, we obtain the marginal posterior density of each TTF model parameter. Other information from diverse sources and expert priors can be easily incorporated in this SRA model to achieve more accurate results. Simulation in simple and complex cases of mechatronic systems demonstrates that the posterior of the parameter network ﬁts the data well and the uncertainty passes effectively through our BN based SRA model by using the modiﬁed NBP.


Introduction
Reliability analysis is an essential element during the development of mechatronic systems.The attempt to improve system reliability makes the system reliability assessment (SRA) an ongoing research topic.A number of methods for SRA have been developed, most of which estimate the system reliability using only the data of components since it is much easier and less expensive to test the components/subsystems than the entire system.In general, the authors believe that SRA remains a difficult problem for the following reasons.
Complexity: Products are built increasingly complex due to added functionalities.It becomes more difficult to propagate component information to the system level in those systems with high complexity.
Expensiveness: of testing.Testing a whole complex system tends to be impossible.Reliability assessment at the system level requires a good tool to efficiently combine the information collected from various levels and diverse sources.
Functional and temporal dependency: A failure might be caused by more than one mutually-dependent event, such as shared causes, exclusive events and standby redundancies.The exploitation of explicit dependence is itself a challenge.
Uncertainty due to variability of model, also known as epistemic uncertainty.This arises from a lack of knowledge about the probability distribution of time-to-failure data.
SRA is clearly a problem of system decision with inherent ambiguities.The statistical inference provides a promising methodology for such decision problems by finding the ''best guess'' for the interesting items given the existing observation.As one of the most popular approaches over the last decade, Bayesian networks (BNs) have many advantages over the classical reliability formalisms when applied to SRA [1,2].BNs can deal with the problem of dependency under not only temporal but also functional relations.The BN formalism can accept and integrate the diverse data from all kind of sources, such as the prior specifications given by manufacturers, the historical TTF data and the current testing data.In a BN, we can easily propagate the uncertainties, which would be difficult and even impossible with conventional logic models, e.g.fault trees and event trees.Moreover, the methodology of BNs has been so wellstudied that we have many options for modeling and reasoning in various situations.
However, the previous research on BNs has not paid enough attention to the parameter uncertainty of failure models that are usually parametric skewed distributions.There are obviously some unobservable items (e.g.no test data) in a complex mechatronic system, thus it is a challenge to estimate the model parameters of those items.The main contribution of this research is that we propose a continuous BN model to estimate the optimal configuration of time-to-failure (TTF) model parameters for a complex mechatronic system using data from diverse sources, and we develop a modified nonparametric belief propagation algorithm to obtain the marginal density in non-singly-connected BNs without the use of a junction tree.
The remainder of this paper is organized as follows.After reviewing the related works in Section 2, we formulate in Section 3 our basic model of system reliability assessment.In Section 4, the nonparametric belief propagation is modified and applied to the inference of the basic model.The simulation results and discussion shown in Section 5 demonstrate that our model is effective and efficient.Finally, some conclusions and perspectives are offered in Section 6.

Related works
In this section, the related methods of system reliability assessment (SRA) are reviewed and classified, especially those using the BN based approaches.Since there exist various methodologies for modeling and analyzing system reliability, the authors also refer the readers to a more detailed classification by Mihalache [3] and an alternative review by Zio [4].
In the past decades, many methods have been applied to SRA with the help of specific mathematical tools, such as fault trees (FTs) [5], reliability block diagrams [6], petri nets [7], method of moments [8] and so on.Among these approaches, the FT formalism is undoubtedly the most widely used one due to its efficiency and the capability of system abstraction.However, it cannot cope with the temporal or functional dependency among components.To deal with the temporal interrelation, people proposed Markov chain (MC) method [9,10] and dynamic fault trees (DFTs) [11,12].The problem of combinatorial explosion is nevertheless intractable when applying those two well-known methods to highly-complex mechatronic systems.Moreover, they work only under the assumption that the TTF of components obey the exponential distribution law.In recent years, Bayesian reliability analysis has proved to be a powerful tool providing important methodological advantages over the conventional techniques [13].The BN methodology is strongly recommended [1] as an effective and versatile SRA tool that is capable of modeling both temporal and functional dependency.Another important advantage over traditional approaches is the ability to combine the information from diverse sources [14].BNs become attractive due to the ease of building a BN from FT [15] and directly from existing data [16].
In the authors' view, the published approaches based on BNs can be classified into three categories according to the variable type in the network.(1) Working state based BNs, also known as event based BNs.The working state usually indicates fail/work which is inherited from binary decision diagrams (BDD) and FTs.The node variables are thus defined in discrete space.Most published research work [17,18,2] falls into this category.For more flexibility in modeling the working condition, Weber [19] introduced the multi-state variable to describe the degradative states of items in dynamic Bayesian networks (DBNs).These methods calculate the distribution of system state deterministically by using conditional probability tables (CPTs) established by experts in advance.However, the temporal dependency might be time-varying due to the existence of component deterioration and environment change.An unchangeable CPT may mismatch the time-varying temporal dependency.Furthermore, the working state based BNs are not flexible enough to analyze the risk in continuous time space because the estimation must be reexecuted at each timestamp concerned.(2) TTF model based BNs.These BNs assign a probabilistic description to the TTF data of an item, see the simple example of binary distribution discussed by Langseth [1].Boudali [20] presented a non-parametric discretetime TTF model, and Boudali [21] succeeded in modeling a continuous-time TTF in close-form without considering the model uncertainty.However it is still non-trivial to directly model the TTF because of the complexity of modeling an arbitrary probability density in continuous space.Johnson [22] and Anderson-Cook [23] modeled the distribution parameters of TTF as a continuous unknown variable, such as the scale and the shape of a 2D Weibull density.This facilitates passing information through the network and the reliability analysis at system level based on the characteristics of the skewed distributions, e.g.Weibull analysis.Therefore, the uncertainty of the model parameters must unavoidably be considered.The problem of SRA is hence regarded as an inference of the continuous model parameters in a BN.We notice three advantages compared with the first category: any parametric distribution can be used to model the TTF model parameters uniformly or differently; no CPT needs to be completed, the dependency is specified according to the problems or learned from examples; after statistical inference, the posterior finds the ''best guess'' solution and fully describes the uncertainty relative to the TTF model parameters.(3) Hybrid state based BNs.N e i l [24] and Marquez [25] have successfully modeled TTF distribution by continuous random variables as well as by discrete random variables (e.g.working states of items).In fact, the discrete variables can be directly appended into the BNs of the second category when the TTF models of items are ready.In all the research works above, little study was made of parameter uncertainty in the TTF model for system reliability, though it has long been investigated in relation to component reliability [26,27].
On the other hand, a number of statistical inference algorithms have been proposed to perform reasoning in a graph.Exact inference in discrete state BNs is possible with some exact marginalization algorithms for trees, such as variable elimination, belief propagation [28] and junction trees [29].However, the integral in continuous state space still raises some difficulties.Until recently a number of discretization methods have been suggested to perform the inference in the continuous graph [30], such as mixture of truncated exponentials [31] and dynamic discretization [24,32].Most of these methods rely on the junction tree algorithm in the case of non-singly-connected BN.However, performing an inference in the junction tree with treewidth k tr will have at least one computation which requires time exponential in k tr due to the marginalization process over supernodes.Therefore the junction tree based methods become less attractive for mechatronic systems because the subsystems in a mechatronic system, especially the electronic devices, commonly have a large number of components (input items), which leads to a large treewidth in the generated junction tree.We found that the sampling based nonparametric belief propagation (NBP) [33] is efficient and applicable to our BN model because of the flexible uncertainty representation by means of kernel density estimation (KDE, an open-source toolbox of KDE has been provided by Ihler [34]).Unlike the other inference methods for loopy graphs, the NBP does not propagate information in a junction tree, but instead it iterates the message updates in a loopy graph until convergence.
From the review above, we see that the previous research has not paid enough attention to the stochastic nature of parametric TTF models in system reliability.In this paper, we formulate a new BN model to cope with the parameter uncertainty of the SRA problem.Our proposed BN model falling into the second category is an extension of the formalism of Bayesian component reliability [13], see Section 3 for the basic model.In contrast with conventional methods, we do not predefine the quantitative part of BNs, but learn the dependency, which is derived from probabilistic noisy gates, from some sample testing data.This model is efficiently solved by a modified version of the NBP algorithm.

A new model for SRA
Unlike the continuous time-to-failure (TTF) models in [21,35], our method for SRA uses the general BN framework to model the failure parameters of items in the complex system and their dependency.This model allows us to estimate the parametric TTF distributions of all items given the observed failure data as evidence.
As the qualitative part of a BN model, the directed acyclic graph (DAG) can be constructed from the fault tree model by Bobbio's method [15].However in our model the network variables denote neither the working state of items nor the items' TTF, but the items' failure parameters which are function outputs of the noisy gates rather than the deterministic gates.By ''noisy'' we do not mean the uncertainty of the output event (success/ failure) as do the methods in other published work [15,35], rather we mean the uncertainty of the model parameters of the output failure distribution.For example, a simple OR-gate system C is shown in Fig. 1a with two basic independent components A and B. A, B and C have exponential failure distributions with the failure rates l a , l b and l c respectively.Their failure times are denoted by t a , t b and t c respectively.If the OR-gate is deterministic, C fails when either A or B fails.Then C fails in the time interval (0,t] with the probability Prðt c rtÞ¼Prðt a rt [ t b r tÞ¼Prðt a rtÞþPrðt b r tÞÀ Prðt a r tÞPrðt b r tÞ.Notice that for an exponential failure distribution PrðtÞ¼1Àe Àlt .Therefore the failure rate of C can be First, not all failure modes can be found because of the large number of components in the electronic and mechanical subsystems.The modes not found can have more or less influence on the gates.Second, the multi-parameter skewed distributions are widely used for a mechatronic system due to its various failure modes.For some multi-parameter distributions, the deterministic output parameters may not have any solution because we have only one equation (the output failure distribution).Third, the effect of environment is usually unmeasurable, especially for the electronic and mechanical subsystems.
None of the previous research has addressed this noise which is difficult to measure.Fortunately, this uncertainty can generally be described by a statistical model on the test data acquired from the basic components as well as from the system.The SRA problem can then be formulated as a probabilistic reasoning process for the failure parameter of each item given the test data.
As analyzed above, it is difficult to model the parameter uncertainty directly.For this reason we turn to the TTF density that is a function of the parameters.We denote the density by f i ðx i ,tÞ, i A I n, where I n is the label set of the nodes in the BN and X i ¼x i is a unified notation for all types of multidimensional parameters.For brevity the time is hereafter omitted in density expressions.For instance, the exponential TTF density of X c in Fig. 1ci , t Z0.Fig. 2 shows the stochastic nature of failure density in reliability analysis.Thus a probability distribution is expected to represent the stochastic nature of f i ðX i Þ, iA I n.Notice that a higher probability should be assigned to an uncertain density that is closer to the deterministic density.Then some symmetrical distributions, such as the normal and the triangular distributions, can be selected when no other prior information is known [26].In this paper, the normal distribution is used because of its flexibility.Further the ''closer'' is measured based on a distance  function dðÁ,ÁÞ between probability densities.Thus the parameter uncertainty can be reduced to the distance uncertainty, where f i d denotes the deterministic output density function of the gate going to node i. N ð0,s i Þ is a normal distribution with zero mean and standard deviation s i which will be discussed at the end of this section.Note that a distance cannot be negative.A one-sided normal distribution is then considered.Then given s i , we have ) In fact this ''one-sided'' transformation does not affect the simulation results of those parameters because of the normalization stage in a probabilistic inference algorithm.The formulation above plays the role of output parameter prediction based on the parameters of the basic items.
For the gates with more than two basic items, the deterministic output density functions f d i can still be derived without difficulty [13].For a series structure, the system functions only if all components function, i.e. for iA I n, where BðiÞ denotes the label set of node i's basic component nodes and F k (X k ) is the cumulative probability function of f k (X k ).BðiÞ\j means the set BðiÞ except j.For a parallel structure, the system functions if any component functions, i.e. for i A I n, Series and parallel structures are special cases of k-of-n systems, which can be re-structured as C n k parallel subsystems of k series components.Hence any k-of-n system can also be analyzed by a repeated calculation of Eqs. ( 3) and ( 4).So far we can obtain the dependency between the output parameter and the parameters of basic items in Eqs. ( 2)-( 4).Finally the quantitative part of our BN model, i.e. the conditional probability distribution (CPD), is defined by There are many choices for the distance function between probability densities [36], such as Kullback-Leibler (KL) divergence and Euclidean distance.The following example demonstrates the effect of the dependency described above.
Example.A simple exponential-distributed series system consists of two components, whose FT model has the same structure as that of Fig. 1a.The TTF of node A is modeled as exponential density with failure rate l a ¼ 0:68 and that of node B is modeled as Weibull distribution with shape k b ¼ 1.5 and scale l b ¼ 1.We discretize the densities (time interval ¼ 10 À 3 s) and take the KL divergence as distance metric.The distribution of the system parameter (X c ) given the parameters of the basic items, i.e.
PrðX c ¼ l c jX a ¼fl a g,X b ¼fk b ,l b gÞ, is then calculated from Eq. ( 5) and shown in Fig. 3 for different deviation value s c .From Fig. 3 we observe that the smaller s c is, the sharper the distribution is.
In other words, the gate tends to be deterministic with the decrease of s c .While s c increases, the information from the basic items to the system tends to be cut off by the gate because PrðX c jX a ¼fl a g,X b ¼fk b ,l b gÞ tends to be uniform and no uncertainty can pass through a uniform CPD.
The remainder of this dependency model is the standard deviation s i in Eq. ( 2) that encodes the dependency between the parameters of the basic items and those of the higher level item.In order to achieve the best distinguishing capacity of Eq. ( 2), this standard deviation can be estimated by a parameter estimation technique.The maximum likelihood estimation (MLE) approach is selected in our implementation because of its ease of use.If there exists adequate sample data, MLE yields where y j ¼d(f i (j) ,f i d ) is called pseudo-observation that represents the distance between a sample candidate density f i (j) and the deterministic density f i d . The above formulation can be converted to logarithmic interpretation to avoid the trouble of exponential overflow.In practice, we reuse this parameter s for the same type of components or subsystems.The readers are referred to Section 5 for a simulation example of this standard deviation estimation.

Approximate inference
After modeling, the Bayesian network needs a process of probabilistic reasoning, i.e. an approximate inference in the continuous networks.The inference task of a BN is to calculate the posterior marginal probability of each hidden variable given the observed variables.We have various methods to perform the approximate inference in a continuous network [30].As described in Section 2 the conventional inference methods for non-singlyconnected continuous graphs depends on a junction tree, such as dynamic discretization in [35,32].However, the computation of exponential in treewidth makes the junction tree based methods less attractive for mechatronic applications.Recently a nonparametric belief propagation (NBP) in Markov random fields (MRFs) has attracted our attention because of its reasoning capacity and efficiency in the general loopy graphs.In particular, it is a linear time algorithm proportional to the number of hidden variables.To facilitate the inference process of NBP, we need to convert the BN to a corresponding MRF.Then following a brief introduction of NBP, a modification on the NBP is made to adapt the NBP to the newly-generated MRF.

NBP algorithm in pairwise MRF
According to Weiss' method [37], arbitrary BN can be converted to an undirected graphical model, i.e. pairwise MRF.An example of this conversion is demonstrated in Fig. 4. The method is to add a cluster node for the parent nodes that share a common child in the BN, add the connection between the cluster node and the parents/child, then replace the directed arcs with undirected ones.In the example, the variable for the newly-created node is denoted by X 5 which is a compound variable of the parents, i.e.X 5 ¼fX u 5,2 ,X u 5,3 ,X u 5,4 g.WesayX 5 and X 4 are consistent with each other if x u 5,4 ¼ x 4 and inconsistent if x u 5,4 ax 4 .For convenience in the pairwise MRF, we still call node 1 the child of node 5 and node 2 (or 3, 4) a parent of node 5.The child set and the parent set of a cluster node, say i,aredenotedbyChðiÞ and PaðiÞ respectively.Furthermore I c refers to the label set of the cluster nodes and d i denotes the corresponding test data (e.g.TTF) of node i, iA I n.
Besides the qualitative specification, the corresponding MRF and the original BN should have the same joint probability distribution, which is a quantitative part of the graphical model.Suppose that all probabilities are positive, the Hammersley-Clifford theorem [28] guarantees that the joint probability distribution can be factorized into a product of the compatibility functions of the cliques over the graph.Thus the joint density of a pairwise MRF is factorized as follows where I ¼fI n,I cg.X and D denote the joint variable and joint observation respectively.GðiÞ denotes the label set of neighbors of node i.The quantitative specification of a pairwise MRF is then composed of f i ðÁÞ and c ji ðÁ,ÁÞ, named as first-order potential and second-order clique potential respectively.They encode the dependency between nodes and are set in the following way so that the joint probability distribution of the MRF is identical to that of the BN: f i ðX i Þ¼Prðd i jX i Þ represents the likelihood of observed data d i with respect to variable X i , implying the local evidence; c ji ðX j ,X i Þ is the child's conditional probability given the corresponding cluster variable if it is a potential between a cluster node and the corresponding child (e.g.node 5 and node 1 in Fig. 4b); otherwise, c ji ðX j ,X i Þ is set to one if the compound node is consistent with the parent node and zero otherwise.For the example in Fig. 4b, and X 4 are consistent with each other) and 0 when X u 5,2 a X 2 (inconsistent).Finally PrðX,DÞ¼ In the belief propagation (BP) algorithm, we use a so-called ''message'' from a hidden node (say j) to another one (say i)t o describe how likely node i will be in one state given the knowledge of node j, i.e. m ji (X i ), see Fig. 4c for message passing paths.The belief at node i denotes how likely node i should be in one state given the information from neighbors and from its local evidence.It is then proportional to the product of local evidence and all the messages coming to node i, i.e.
Accordingly, the estimates of minimum mean square error (MMSE) and maximum a posterior (MAP) are obtained based on this belief.Furthermore, the messages are updated iteratively by the rule where t indicates the iteration index and a is a normalization constant.The summation is replaced by an integral with respect to X j for a continuous network.The BP is also known as a sumproduct algorithm.It has been proven [38] that in the case of a singly-connected graph, the BP in fact converges and gives the exact marginal probabilities for all the hidden nodes.Although the BP cannot guarantee the convergence in the loopy case, the BP has been applied with experimental success [39].
Until recently, the techniques of Monte Carlo sampling and kernel density estimation has provided an efficient way to perform BP inference in loopy continuous graphs, named nonparametric belief propagation (NBP, [33]).It is called nonparametric because the message is represented by a mixture of Gaussian which is a sample-based or kernel density estimate where w ji (n) is a weight associated with nth kernel mean m ðnÞ ji , and P n w ðnÞ ji ¼ 1. N is the number of kernel means used for density estimation.L ji is the common bandwidth.The belief (Eq.( 8)) and message updating (Eq.( 9)) are implemented by using samplingbased approximations in two stages: message product and message propagation.
In the first stage, the product of mN-component messages yields an e wm i x t u r eo fG a u s s i a nw i t hN m components.Each component and its weight can be obtained by the following equations To avoid the exponentially-large number of mixture components, a Gibbs sampler [40] is employed to draw N independent samples as kernel means of the message product.In our experience, about one hundred kernels make it possible to achieve satisfying approximations of the 2D and 3D densities even for a rare event.It is not surprising that this approximation is much faster than the other stochastic simulation methods, such as Markov chain Monte Carlo (MCMC) which requires many more samples to stochastically update the density representation.
In the second stage, the NBP propagates each component of a message by approximating the integral in Eq. ( 9).Before the propagation, the product of R c ji ðX j ,X i Þ dX i and f j ðX j Þ are regarded as a marginal influence and calculated as xðX j Þ, For message m ji (X i ), each sample x j (n) drawn from the product by sampling X i from c ji ðx ðnÞ j ,X i Þ using importance sampling or MCMC techniques.After obtaining those output samples as kernel means of the updated message, we can choose a kernel bandwidth L i by leave-one-out likelihood cross-validation [41].We refer the readers to [33] for the detailed generic NBP algorithm.

Modified NBP
We notice that the generic NBP may encounter two problems in the newly-generated MRF when propagating the message between a cluster node and its parents, because of the specification of second-order clique potentials generated by the conversion from BN to MRF.Without loss of generality, we suppose node s A I c is a cluster node and node p A PaðsÞ is one of its parents.Like the example in Fig. 4b X 5 ¼fX u 5,2 ,X u 5,3 ,X u 5,4 g, we denote by X u s,p the corresponding component of node p in compound variable X s .Then we discuss the message propagation stage for the constructions of m sp (X p ) and m ps (X s ) as follows respectively.
First, since the second-order clique potential between X s and X p is a delta function, any common sampling technique for X s $ c ps ðX p ,X s Þ or X p $ c sp ðX s ,X p Þ is unfeasible.We suggest making them consistent with each other instead of sampling.For example, given a message product sample x s (n) of X s we set x ðnÞ p ¼ x uðnÞ s,p as a new sample of X p .Vice versa, when given a sample x p (n) of X p we set x uðnÞ sp ¼ x ðnÞ p .For the other components in X s , sampling techniques can still be used.However, c ps ðX p ,X s Þ does not provide any information about the other components except that of node p.Any sampling method will sample from a uniform distribution for the other components.
The second concern is that sampling the other components from uniform distributions will degenerate the message product Q p A PaðsÞ m ps ðX s Þ.We use a two-component example for further explication.When a system, say X 3 , consists of two basic components, say X 1 and X 2 , we can refer X 4 ¼fX u 4,1 ,X u 4,2 g to the compound node.Then the kernel means of message m 14 (X 4 ) are sampled as described in the first issue, i.e. x uðnÞ 4,1 ¼ x ðnÞ sampled from uniform distribution.The kernels of message m 24 (X 4 ) are obtained in the same manner.Before propagating those messages to node 3, a message product is required.Fig. 5 shows a possible kernel of m 14 (X 4 ) and two possible kernels of m 24 (X 4 ).According to the rule of message product in Eqs.(11a)-(11c), their product is also a normal function whose mean is a linear interpolation of two kernels.Obviously, the important region for the following message propagation is a small area where the samples get high probability in both components, such as the ellipse region in Fig. 5.This is because concentrating the samples in this area can reduce the estimation variance and failing to do so leads to a biased estimation of the density [42].
However in this two-component example, most of these product kernel means (N m in total) are spread out instead of being concentrated into the important region.Therefore there will be a degeneration of the sample set for the message product.It occurs in the multi-component case in the same way.
Our strategy to overcome this degeneration is to obtain the samples of the other components from their corresponding beliefs instead of from uniform distributions.For the above twocomponent example, given all the messages of previous time step i.e. m tÀ1 ji ðX i Þ, i A I , j A GðiÞ, and the kernel mean set of current message product for node 1 i.e. {x 1 (n) } n ¼ 1

N
, the kernel means of current message m t 14 (X 4 ), denoted by x ðnÞ 4 ¼fx uðnÞ 4,1 ,x uðnÞ 42 g where n¼ 1,y,N, are sampled as follows: In fact Gð2Þ\1 ¼ | for this twocomponent example.The multi-component case can be deduced by analogy.Now the samples of message product are certainly concentrated into the important region by this means.
In summary, a modified version of NBP is then proposed for our pairwise MRF and Table 1 shows the steps of the algorithm.In

Table 1
Algorithm of modified NBP for pairwise MRFs converted from BN by Weiss' method [37].

Initialization :
(1) sample randomly N kernel means for all messages, (2) initialize the messages m 0 ji , i A I ,jA GðiÞ as uniform distributions, i.e.
(3) initialize the iteration step t¼ 0 and the maximum iteration number T.
2. Updating the messages and the beliefs for i A I , j A GðiÞ : given input messages m t kj ðX j Þ¼fx ðnÞ kj ,L ðnÞ kj ,w ðnÞ ji (X i )a s follows (1) determine the marginal influence of Eq. ( 12) to get xðX j Þ, 11c) and the Gibbs sampler depicted in [33], (3) for n¼ 1,y,N, update kernel means : if j A I c (or i A I c) and i A ChðjÞ (or j A ChðiÞ), draw a sample from x ðnÞ ji $ c ji ðX j ¼ x ðnÞ j ,X i Þ by importance sampling or MCMC technique, -if j A I c and i A PaðjÞ, x ðnÞ ji ¼ x uðnÞ j,i , -if i A I c and j A PaðiÞ, x uðnÞ ji,j ¼ x ðnÞ j , the remaining components are sampled from previous beliefs by Gibbs sampler, i.e.
x uðnÞ ji,r $ f r ðXrÞ set w ji (n) as the importance weights (if they exit) generated by sampling techniques, select a bandwidth L ji by using the method in [41].
3. If t þ 1 o T, set t¼ t+ 1 and go to step 2. 4. Compute the MMSE and MAP estimations, for i A I n: (1) compute the beliefs and normalize them, practice, our simulation demonstrates that one hundred kernels is enough to achieve good convergence results.

Simulation analysis: application to mechatronic systems
In this section, we demonstrate simulations on a hypothetical mechatronic system: an active vehicle suspension (AVS).We start by giving a brief description of the complex system.Our model is then applied to its passive subsystem with/without observation and the entire system itself.These simulations are all implemented based on the Kernel Density Estimation (KDE) Matlab toolbox [34].
The AVS system is intended to support the vehicle body and reduce body vibration from the road surface.Fig. 6 shows the fault tree of some important components/subsystems.Apart from the linkage which is considered to be unfailing, the parallel system is composed of two subsystems: a passive subsystem and an actuator subsystem.When the two subsystems both function, the suspension works in semi-active mode.It fails if none of the subsystem functions.But then it works in passive mode if only the actuator subsystem fails and in active mode if only the actuator subsystem functions.Moreover, the passive subsystem works in a series structure with the spring and damper components.So does the actuator subsystem whose failure can be provoked by a series mechanical part or by a series electronic part.
This FT description (Fig. 6) is then converted to BN representation (Fig. 7a), by the transition algorithm of Bobbio [15].For simplicity, only constant failure rates of the root components are taken from the reliability data base [43] and given in Table 2, where the unit of failure rate has been converted to failure-perkilometer by assuming a constant velocity.Moreover, their observations are sampled from the corresponding exponential distribution of km-to-failure (KTF).

Passive subsystem 5.1.1. Learning the dependency
The passive subsystem is a simple series case with only three nodes.The failure rate (FR) of the subsystem is thus 0.785 Â 10 À 7 F/km when decided deterministically.Given its true FR we can obtain the optimal deviation parameter s Ã pairwise MRF converted from BN model Monte Carlo simulation of Eq. ( 6).Thanks to the easy manipulation of discretized probability density function (PDF), the density distance measure in Eq. ( 2) is defined as a city block difference between two densities, dðr,qÞ¼s t Á X where r i and q i are discretized values of the two probability densities r and q. s t is the constant interval of the discretization.A curve of s Ã 2 with respect to actual FR l true 2 is illustrated in Fig. 8 after maximum likelihood estimation (MLE).For a series structure, the FRs greater than the deterministic one (sum of the components' FR) are not feasible, thus plotted by a dashed line.It is also found that the optimal s is monotonically decreasing with respect to the distance from observed FR to the deterministic one that is indicated in the figure by dash-dot straight line.This is sound because the increasing FR implies dependency (see in Eq. ( 2)) when it is less than the deterministic one.

SRA of unobserved subsystem
Without loss of generality, the KTF data is modeled by a oneparameter exponential distribution.Using the i.i.d.KTF data observed in Table 2, the likelihoods of the observation with respect to FR are calculated as follows: where d i (n) denotes the nth sample KTF in the data set of d i .W e show the normalized likelihoods with solid line in Fig. 9a and b respectively.When the passive subsystem is unobserved and has no priors, the likelihood curve with respect to FR is assumed to be uniform in the range of [0.485 Â 10 À 7 F/Km, 1.085 Â 10 À 7 F/Km], shown as a solid straight line in Fig. 9c.After 10 iterations of inference by our modified NBP, the maximum marginal criteria yields the marginal posterior (MP) distributions.They are plotted in Fig. 9a, b and c with dash-dot lines for the spring (node 4), the damper (node 5) and the passive subsystem (node 2) respectively.Notice that the posteriors are all sharper than the normalized likelihoods even when the priors do not exist.This implies that the network has been correctly driven by the dependency defined in Eq. ( 2).To compare more easily, the actual FRs of the components and the deterministic FR of the subsystem are all indicated in those figures by dashed straight lines.The comparison of those estimates is also reported in Table 3.The results are identical to the densities in Fig. 9.In this table MLE comes from normalized likelihood density, while the minimum mean square estimation (MMSE), maximum a posterior (MAP) and the quantiles are evaluated by using the posterior distribution.Notice that the MMSE results are much closer to the actual values than the MLE results.We also obtain the MMSE and the MAP of subsystem, which is not feasible for MLE.These confirm the effectiveness of our model.Furthermore, Fig. 9d presents the reliability of the subsystem given by R 2 (t) ¼exp[ À x 2 t] with the mean of the marginal posterior (solid line) and 90% confidence intervals (dashed lines) respectively.

SRA of observed subsystem
In the following example, we kept the configuration of Section 5.1.2unchanged except that the subsystem acquired 30 KTF data given as Passive device in Table 2.We use a Weibull distribution with two parameters for the subsystem node.The likelihood of the observed data with respect to FR can be estimated in the same way as done in Section 5.1.2.It is plotted in Fig. 10c by a surface in the 2D Weibull parameter space: shape and 1/scale (for comparing the FR because scale ¼FR À 1 when the shape is set at one).After 10 iterations of our modified NBP, we likewise obtain the FR's marginal posteriors (MP) of the components and the Weibull's marginal posterior surface of the subsystem in Fig. 10.Notice that the marginal posteriors in Fig. 10a and b are sharper than those of Section 5.1.2because we have received more evidence for the network.In the same way, the posterior in Fig. 10d has less covariance than the normalized likelihood does in Fig. 10c.Moreover, MAP of the posterior is much closer to the actual value than the MLE, see Fig. 10d and c where the straight solid line indicates the actual FR (0.785 Â 10 À 7 F/Km).These results demonstrate that our model is validated by the inference.F/km, X 5 ¼0.68 Â 10 À 7 F/km.
The goodness-of-fit is discussed based on the statistic of Bayesian w 2 presented in [13] where K ¼ 4 % 30 0:4 Þ equal probability bins are suggested for our case.We draw repeatedly from the marginal posterior of Weibull parameter space, say (a,b), and compute the corresponding cumulative probability of observations' likelihood, F Lik ðd i jða,bÞÞ.We decide which bin the draw should be assigned to, and calculate the Bayesian w 2 statistic, then compare it against the 0.95 quantile of w 2 3 ðw 2 3,0:95 % 7:82Þ.W e observed that 13% of these values exceed the quantile, which indicates that the marginal posterior fits the subsystem's data observed reasonably well.Notice that in this example the newly-observed data is successfully combined into the network and this can yield an up-to-date posterior estimation.In the same way, other types of new data, such as expert prior and historical data, can also be integrated to the network without difficulty.More information is fused, more-accurate estimation can be achieved.This can further help to identify the system deterioration by sequential updating after replacing the past observation with the new one.

The entire mechatronic system
In this example, the new SRA model is tested on the entire system by the NBP and the modified NBP respectively.In this system only the root nodes are observed and given in Table 2 (all data except the passive subsystem).As a deterministic function, a series structure of exponential family still has an output of exponential distribution, but a parallel structure does not.Therefore use a Weibull distribution for the system node and the exponential family for the subsystems.The actual FR of the nodes are given in Table 4, where the optimal deviation is leaned by Eq. ( 6) with some simulated observation by the density distance measure in Eq. (13).Note that a loop (formed by node 3, 6, 7, and 10) exists in the actuator subsystem.Hence the inference algorithm needs more iterations to converge.After 60 iterations in the modified NBP, we obtain the MMSE and MAP results of the system as well as those of the subsystems, shown in Table 4. Notice that the errors of MMSE results are all less than 10% even without any observation of those nodes.The further reliability assessment can be accomplished by using the corresponding TTF distribution model, as done in Fig. 9d.
We also show every-five-iterations the convergence process of the system's Weibull parameters in Fig. 11, where the points indicate the MMSE values by NBP and modified NBP respectively.It is clear that our modification makes NBP converge much faster.All these results demonstrate again that our dependency model is effective by using the normal distribution in Eq. (2).

Conclusion
In this research, the system reliability assessment (SRA) has been formulated as a statistical inference problem of Bayesian networks (BNs).In contrast to conventional methods, we model the time-to-failure of the system/components by the parametric distributions whose parameters are considered as random variables in the BN, such as failure rate of exponential family, shape and scale of 2D Weibull family and so on.Further, the parameter uncertainty of output function is modeled by a normal distribution whose standard deviation can be learned from samples.All of these uncertainties constitute the quantitative specification of the BN.To avoid the use of junction trees which are inefficient for the large-treewidth mechatronic systems, a modified nonparametric belief propagation (NBP) is developed to perform the inference in the BNs.For reasoning in a continuous BN, this provides an alternative solution to the other methods, such as mixture of truncated exponentials (MTE), dynamic discretization (DD) and Markov chain Monte Carlo (MCMC).12.9 Â 10 À 7 X 6 .Mech. dev.
2.8 Â 10 À 7 0.0628 3.0 Â 10 À 7 3.1 Â 10 À 7 Experimental results demonstrate the efficiency of our NBP modification and the effectiveness of our SRA method: the posterior fits the observed data well, and the uncertainty is effectively propagated through our BN based SRA model.Based on the parametric time-to-failure (TTF) representation, our BN model can perform further analysis, such as sensitivity analysis, diagnosis analysis.One future work is to consider model selection for unobservable items.It is another future work to apply our SRA method to the reliable active control issue in complex structural problems [44][45][46].

Acknowledgment
The authors would like to thank China Scholarship Council (CSC) 2007153069 for the financial support of this research and gratefully acknowledge the anonymous referees for all their insightful comments and very helpful suggestions.

FTFig. 1 .
Fig. 1.The simplest OR-gate and AND-gate models described by Fault tree (FT) and Bayesian network (BN).The BN model is converted from the FTs by the method in [15].(a) FT OR-gate model, (b) FT AND-gate model and (c) BN model.

Fig. 2 .
Fig. 2. The stochastic nature of failure density in reliability analysis.

Fig. 3 .
Fig. 3.The distribution of output parameter given the parameters of the basic items.

Fig. 4 .
Fig. 4.An example of the conversion from (a) a BN to (b) the corresponding pairwise MRF and (c) the local message passing in the MRF.The shaded circles represent the hidden nodes and the white circles denote the observed ones.(a) BN model, (b) pairwise MRF model, and (c) local message passing in MRF.

Fig. 5 .
Fig. 5.The degeneration problem of message product in a two-component case.

Fig. 6 .
Fig. 6.A fault tree example of an active vehicle suspension.

Fig. 7 .
Fig. 7.The corresponding BN model and pairwise MRF model of an active vehicle suspension.The gray circles represent hidden nodes and the white ones denote cluster nodes.(a) BN model and (b) pairwise MRF converted from BN model.

Fig. 9 .
Fig. 9. Normalized likelihood and marginal posterior (MP) of FR in the passive subsystem.In (a), (b) and (c) solid lines stand for the likelihoods, dash-dot lines for the posteriors and dashed lines for the actual values.In (d) the solid line represents the reliability of the subsystem by FR's posterior mean and the dashed lines for those of 5% and 95% quantiles of the posterior, indicating the 90% confidence intervals.

Table 2
Failure rates and observed km-to-failure (KTF) of components.
ij ðX i ,X j Þ potential of pairwise clique in a MRF BðiÞ label set of node i's basic components PaðiÞ label set of parent nodes of cluster node i in MRF nth kernel mean in messagem ij (X j ) corresponding component of variable X r in the nth kernel mean of message m ij (X ij ) w ij (n) weight associated with nth kernel mean in message m ij (X j ) L ij common bandwith of message m ij (X j ) c the