Bearing Fault Event-Triggered Diagnosis using a Variational Mode Decomposition-based Machine Learning Approach

The monitoring of rolling element bearings is a critical task for condition-based maintenance in various industrial applications. It helps avoid unscheduled maintenance operations while decreasing their cost. For this purpose, various methodologies have been developed to ensure accurate and efficient monitoring. In this context, this paper proposes an approach for early bearing fault diagnosis based on the variational mode decomposition (VMD), used as a notch filter for dominant mode cancellation, and a machine learning approach, namely the one-dimensional convolutional neural network (1D-CNN), for detection and diagnosis purposes. Specifically, the proposed approach first performs feature extraction using VMD for fault detection, and then triggers multi-scale feature extraction using CNN convolution and pooling layers for classification and diagnosis. The proposed bearing fault detection and diagnosis approach is evaluated, in terms of robustness and performance, using the well-known Case Western Reserve University experimental dataset. In addition, its performance is compared with well-established demodulation techniques, in terms of fault detection, and with machine learning strategies, in terms of fault diagnosis. The achieved results show that the proposed VMD notch filter-based 1D-CNN approach is promising for bearing degradation monitoring.


H. Habbouche and T. Benkedjouh are with Ecole Militaire Polytechnique, Mechanical Structures Laboratory, Bordj Elbahri 16046 Algiers, Algeria (e-mail: habbouche.houssem@gmail.com; bktarek@gmail.com).
M. Benbouzid is with the University of Brest, UMR CNRS 6027 IRDL, 29238 Brest, France, and also with the Logistics Engineering College, Shanghai Maritime University, Shanghai 201306, China (e-mail: mohamed.benbouzid@univ-brest.fr).

I. INTRODUCTION
Gears and bearings are key and critical components of industrial rotating machines, as their failure modes can lead to prolonged downtime and substantial additional maintenance costs. Monitoring the health status of these components is therefore a high industrial priority [1], [2]. Monitoring and analyzing the vibrations sensed on these components is of great industrial importance, since these vibrations also carry information about the rotating machine dynamic state [3]. Indeed, industrial experience feedback has shown that 70% of rotating machine shutdowns are caused by induced vibrations and 30% of them are a consequence of bearing failures [4], [5]. Rotating machine state-of-health indicators have to be adequately chosen to support the monitoring process: failure detection [6]; failure diagnosis, which is in fact the classification of the failure type; and prognosis, which estimates the remaining useful lifetime in order to avoid unplanned shutdowns [7], [8]. For this purpose, many techniques and tools have been developed for failure detection and diagnosis, as well as for potentially prolonging the working life cycle. While the literature is rich in model-based failure detection and diagnosis approaches, data-driven or signal-based processing techniques have become more attractive, as they do not require any prior knowledge of rotating machine parameters and are able to extract useful features for fault recognition [9] without specific information about the rotating machine operating conditions [10]. Depending on the operating conditions, specific signal processing techniques have been developed: the most popular ones are time- and frequency-domain analysis techniques for steady-state conditions, while time-frequency and time-scale representations are adopted for non-stationary behaviors. As highlighted in the literature, these techniques have some limitations, mainly in terms of complexity, poor resolution, and cross-term occurrence. To address these issues, parametric methods, based on parameter estimation of a known model, were developed. Nevertheless, these methods have drawbacks as they are formulated through integral transforms and analytic signal representations [11]. Hence, they depend on data length and model accuracy. To overcome the above-mentioned drawbacks, data-driven approaches based on mode decomposition (i.e., empirical mode decomposition (EMD), ensemble
empirical mode decomposition (EEMD), wavelet packet decomposition (WPD), variational mode decomposition (VMD), etc.) were introduced, and their merits highlighted as among the most suitable bearing fault detection and diagnosis approaches given the non-stationary vibration signals generated [4], [12], [13]. These techniques are often coupled with an artificial intelligence-based algorithm for automatic fault classification [14]. Indeed, artificial intelligence, namely machine learning, is introduced to process the acquired signals and the extracted features with the objective of designing a fully automated diagnosis process [13]. In this context, artificial intelligence-based techniques provide relevant tools in terms of feature recognition, detection, diagnosis, and prediction accuracy (prognosis) [15]. Specifically, machine learning-based algorithms, such as the deep neural network (DNN), recurrent neural network (RNN), and convolutional neural network (CNN), are considered the best path for feature classification [16]. Indeed, these techniques efficiently extract features by compressing useful information [7], [8]. Hybridizing mode decomposition with machine learning, for fault detection and diagnosis respectively, has recently been addressed in the literature. Chen et al. [17] proposed a comparative study between VMD- and EMD-based support vector machine (SVM) approaches for classification. They specifically investigated the entropy of each intrinsic mode function and mode as an input vector for classification in wind turbine monitoring. For gearbox monitoring, Li et al. [18] used VMD for feature extraction, by power spectral entropy, and a DNN for classification. In this case, comparisons were carried out versus the back propagation neural network (BPNN) and SVM, although the modes after decomposition were stationary. Gai et al. 
[19] used hybrid grey wolf optimization to find the best combination of VMD parameters, improving mode decomposition for deep belief network (DBN)-based learning and bearing fault classification. In this study, where VMD was compared to EMD, it was shown that using convolution layers to extract more useful information can improve the diagnosis accuracy. Sharma et al. [20] compared VMD to the empirical wavelet transform and the flexible analytic wavelet transform for gearbox fault detection. This study highlighted the effectiveness of VMD in clearly exhibiting a fault by extracting its transients. In the same context, Gu et al. [4] proposed a framework that uses VMD to decompose signals and then calculates statistical indicators used as SVM inputs for learning and bearing state classification. In this case, VMD-based decomposition was compared to EMD. However, replacing signals with temporal indicators limits the robustness of the technique. For the monitoring of a refrigeration system, Wang et al. [21] proposed a combination of 1D-CNN and gated recurrent unit (GRU) to ensure feature extraction and feature learning for classification. In this case, the proposal was evaluated versus BPNN, CNN, and long short-term memory (LSTM). A multi-signal fault detection and diagnosis approach was proposed by Hao et al. [22], which used 1D-CNN for feature extraction and then LSTM for classification. The effectiveness of this approach was assessed versus SVM, k-nearest neighbors (KNN), BPNN, and CNN. The denoising issue was addressed by Liu et al. [23], who proposed a 1D convolutional autoencoder for denoising input signals, which are then learned by a 1D-CNN for fault classification. In this case, high diagnosis accuracy was achieved. Finally, the idea of removing the signal processing step was proposed by Jiang et al. 
[24]. It is then replaced by convolution and pooling layers for class discrimination. Despite the results obtained by machine learning for filtering and feature extraction, signal processing remains very useful, especially in highly noisy environments, to improve diagnosis accuracy [25].
According to the above-discussed context and state-of-the-art review, this paper proposes a new method for bearing fault diagnosis. It is based on the VMD, as a notch filter for fault detection, and the 1D-CNN for classification. In this context, the main contributions of the proposal are the following:
• Providing an intelligent decision support tool for real-time bearing diagnosis, ensuring detection first and fault classification later;
• High-level feature extraction by filtering the dominant mode using VMD to identify the fault even in the presence of high harmonic pollution;
• Multi-scale feature extraction using CNN convolution and pooling layers to extract the most discriminating features between different classes;
• Experimental evaluation and validation using the Case Western Reserve University (CWRU) database.
The paper is organized as follows: Section II is devoted to the theoretical background of the methodology. Section III deals with the experimental evaluation and validation. Section IV provides an analysis and discussion of the achieved results, while Section V concludes this paper.

II. PROPOSED FAULT EVENT-TRIGGERED DIAGNOSIS METHODOLOGY
The proposed bearing fault diagnosis methodology is illustrated by the flowchart of Fig. 1, where a VMD approach is adopted for fault detection, which then triggers diagnosis using a machine learning approach, namely 1D-CNN. The following subsections detail the operating flow of the proposed diagnosis methodology.
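The event-triggered flow can be sketched as a two-stage function in which the multi-class diagnosis only runs once the binary detector reports a fault. This is a minimal sketch, not the authors' implementation: the function name and the three callables (`dominant_mode_filter`, `detect_fault`, `classify_fault`) are hypothetical placeholders for the VMD notch filter and the two trained 1D-CNN heads, and feeding the same filtered signal to both stages is an assumption.

```python
from typing import Callable, Optional, Sequence

Signal = Sequence[float]


def event_triggered_diagnosis(
    signal: Signal,
    dominant_mode_filter: Callable[[Signal], Signal],
    detect_fault: Callable[[Signal], bool],
    classify_fault: Callable[[Signal], str],
) -> Optional[str]:
    """Return None for a healthy signal, otherwise the predicted fault class."""
    # Stage 0: remove the dominant mode (VMD used as a notch filter).
    filtered = dominant_mode_filter(signal)
    # Stage 1: binary decision, healthy versus faulty.
    if not detect_fault(filtered):
        return None  # no event, so the diagnosis stage is never run
    # Stage 2: multi-class diagnosis, triggered only by a detected fault.
    return classify_fault(filtered)
```

Keeping the classifier behind the detector means the more expensive multi-class model is evaluated only when an event occurs.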

A. Variational Mode Decomposition-based Fault Detection
As mentioned in the state-of-the-art review, signal processing is the step of choice to handle acquired signals corrupted by noise and harmonics. This is particularly the case for vibration signals related to mechanical components generating low-amplitude pulses [25]. Signal processing techniques are therefore used to isolate these components [2], [20]. Among the adaptive mode decomposition family, the VMD is adopted here. It was introduced to improve on the EMD and has become a technique of choice for the analysis of nonstationary and nonlinear data for detection in a wide range of applications [26], [27], [28], [29]. VMD has the advantageous ability to decompose complex signals into several stationary signals, regardless of their origin, using Wiener filtering [13].
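The acquired signal is written as the sum of its modes and a residual:

$$x_n(t) = \sum_{k=1}^{n} u_k(t) + res(t) \tag{1}$$

where $x_n$ is the acquired signal, $\{u_k\} = \{u_1, u_2, \ldots, u_n\}$ are the decomposition modes, and $res$ is the residual signal after optimization.
The decomposition process lies in solving a constrained optimization problem formulated as:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{subject to} \quad \sum_{k} u_k(t) = f(t) \tag{2}$$

where $f$ is the original signal, $\{\omega_k\}$ are the center frequencies of each $\{u_k\}$, $\delta(t)$ is an impulse function, and $k$ is the modal component number. The new formulation of the variational constrained problem is an augmented Lagrangian equation formulated as follows [18]:

$$\mathcal{L}(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k} u_k(t) \right\rangle \tag{3}$$

where $\lambda$ is the Lagrange multiplier and $\alpha$ is a quadratic penalty factor. Resolution is done by iterative techniques that estimate the modes $u_k$ and their central frequencies $\omega_k$, as well as the Lagrangian operator $\lambda(t)$, formulated iteratively in (4), (5), and (6), respectively [30]:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2} \tag{4}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega} \tag{5}$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega) \right) \tag{6}$$

where the hat denotes the Fourier transform and $\hat{u}_k^{n+1}$ are obtained by Wiener filtering. In (4) to (6), the superscript $n$ denotes the iteration index, not the number of modes.
The stopping criterion is formulated as follows:

$$\sum_{k} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon \tag{7}$$

where $\tau$ is the noise tolerance used in (6) and $\varepsilon$ is the convergence error.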
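To make the iteration more concrete, the sketch below shows three building blocks of the standard VMD scheme on discrete spectra: the Wiener-filter gain 1 / (1 + 2α(ω − ω_k)²) applied in each mode update, the center frequency taken as the power-weighted mean frequency of a mode, and the relative-change stopping test. It is an illustration under stated assumptions, not a full VMD: the function names are hypothetical, spectra are plain Python lists, and the small floor on the energy is added here only to avoid a division by zero.

```python
from typing import Sequence


def wiener_gain(omega: float, omega_k: float, alpha: float) -> float:
    """Gain of the mode update: 1 at omega_k, decaying away from it."""
    return 1.0 / (1.0 + 2.0 * alpha * (omega - omega_k) ** 2)


def center_frequency(omegas: Sequence[float], mode_spectrum: Sequence[complex]) -> float:
    """Power-weighted mean frequency of one mode (discrete center of gravity)."""
    power = [abs(u) ** 2 for u in mode_spectrum]
    return sum(w * p for w, p in zip(omegas, power)) / sum(power)


def has_converged(
    prev_modes: Sequence[Sequence[complex]],
    new_modes: Sequence[Sequence[complex]],
    eps: float,
) -> bool:
    """Sum over modes of the relative squared change, compared with eps."""
    total = 0.0
    for prev, new in zip(prev_modes, new_modes):
        change = sum(abs(a - b) ** 2 for a, b in zip(new, prev))
        energy = sum(abs(b) ** 2 for b in prev)
        total += change / max(energy, 1e-12)  # floor is an added safeguard
    return total < eps
```

A larger α narrows the gain around ω_k, which is why the penalty factor controls the bandwidth of each extracted mode.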

B. Pearson Correlation Coefficient
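The correlation between two signals A and B of size N measures their linear dependence. The dependence is positive or negative when the correlation coefficient ρ is close to 1 or -1, respectively [31]. The Pearson correlation coefficient is calculated as:

$$\rho(A,B) = \frac{1}{N-1} \sum_{i=1}^{N} \left( \frac{A_i - \mu_A}{\sigma_A} \right) \left( \frac{B_i - \mu_B}{\sigma_B} \right) \tag{8}$$

where $\mu_A$, $\sigma_A$, $\mu_B$, and $\sigma_B$ are the means and (sample) standard deviations of A and B, respectively.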
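As a small illustration, the Pearson coefficient can be computed directly from its definition as the covariance divided by the product of the standard deviations. The function name `pearson` is chosen here for illustration; the normalization constants cancel, so the result does not depend on whether population or sample statistics are used.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length signals."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("signals must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Un-normalized covariance and variances: the 1/(N-1) factors cancel.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_a * var_b)
```

Two proportional signals give a coefficient of 1, and a signal compared with its mirrored trend gives -1.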

C. Dominant Mode Filtering
This technique starts by decomposing the signal $x_n(t)$ into n modes, at least one of which is closest to the original signal and is called the dominant mode $Mode_d$. In this case, the signal $x_n(t)$ can be expressed as [31]:

$$x_n(t) = Mode_d(t) + \sum_{i=1,\, i \neq d}^{n} Mode_i(t) + res(t) \tag{9}$$

where d in $Mode_d$ refers to the dominant one among the n modes. Locating the dominant mode is of major interest, especially for early bearing fault detection, as the sensed signal is usually dominated by other faults or even shaft rotating frequencies, as previously shown in [32], [33]. Hence the need to eliminate this mode and keep a filtered signal $x_c(t)$ containing only information related to the bearing fault. Determining the dominant mode for elimination is therefore an important step to increase bearing fault detection accuracy. The filtered signal is the result of:

$$x_c(t) = x_n(t) - Mode_d(t) \tag{10}$$
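The dominant-mode removal can be sketched as follows, assuming the modes have already been produced by some VMD implementation and are passed in as lists. The helper names are hypothetical, and selecting the dominant mode as the one with the highest Pearson correlation with the original signal is this sketch's reading of the procedure.

```python
import math
from typing import List, Sequence, Tuple


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient (covariance over product of spreads)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def remove_dominant_mode(
    signal: Sequence[float], modes: Sequence[Sequence[float]]
) -> Tuple[int, List[float]]:
    """Return the dominant mode index and the signal with that mode subtracted."""
    correlations = [_pearson(signal, mode) for mode in modes]
    # Dominant mode: the one most correlated with the original signal.
    d = max(range(len(modes)), key=lambda i: correlations[i])
    filtered = [x - m for x, m in zip(signal, modes[d])]
    return d, filtered
```

For a signal made of a strong low-frequency component and a weak high-frequency one, the strong component is selected and subtracted, leaving the weak one for the detector.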

D. Convolution Neural Network
CNN design is mainly based on the convolution of inputs with filters of different sizes to generate more discriminating output features, which are the inputs of the next layer. Pooling layers (max, average, or L2-norm) allow information compression and complexity reduction [9], in addition to overfitting control, ensuring better learning [34]. The convolution between input features u and kernel filters k is given by:

$$f = \phi(u * k + b) \tag{11}$$

where f represents the new features obtained, * denotes the convolution operator, b is the bias, and ϕ is the activation function [35]. The feature extraction convolution and pooling layers are followed by feature learning layers (fully connected layers), which form a traditional neural network with input, hidden, and classification layers [34].
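The convolution-plus-activation step and the pooling step can be illustrated on a single channel with plain Python lists. This is a didactic sketch, assuming a ReLU activation, no padding, and unit stride; like most deep learning libraries, it slides the kernel without flipping it (cross-correlation). The function names are illustrative only.

```python
from typing import List, Sequence


def conv1d_relu(u: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """Slide the kernel over u (no padding, stride 1), add the bias, apply ReLU."""
    width = len(kernel)
    features = []
    for start in range(len(u) - width + 1):
        z = sum(u[start + j] * kernel[j] for j in range(width)) + bias
        features.append(max(0.0, z))  # ReLU activation
    return features


def max_pool1d(features: Sequence[float], pool_size: int = 2) -> List[float]:
    """Keep the maximum of each non-overlapping window; drop an incomplete tail."""
    last_start = len(features) - pool_size
    return [max(features[i:i + pool_size]) for i in range(0, last_start + 1, pool_size)]
```

With an input of length L and a kernel of length K, the convolution returns L − K + 1 features, and pooling with size 2 roughly halves that number.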

E. Evaluation and Classification
Learning reliability is ensured by a random sub-sampling technique [36]. The dataset is split randomly into training and testing subsets. The training data are then divided into training and validation sets for k-fold cross-validation, as shown in Fig. 2, to improve the reliability of the prediction results. Evaluation is made according to common metrics, such as accuracy [37] and the confusion matrix.
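The two-level splitting can be sketched on sample indices as below. The function names, the seed handling, and the interleaved fold assignment are assumptions made for illustration; the text does not specify how the folds are built.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(n_samples: int, test_ratio: float, seed: int) -> Tuple[List[int], List[int]]:
    """Random sub-sampling: shuffle the indices, then cut off the test share."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_ratio)
    return indices[n_test:], indices[:n_test]


def k_fold(train_indices: Sequence[int], k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (training, validation) index pairs; each fold validates once."""
    folds = [list(train_indices[i::k]) for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits
```

The test indices are set aside once and never enter the folds, so the cross-validation scores and the final test score come from disjoint data.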

III. EXPERIMENTAL EVALUATION AND VALIDATION

A. Experimental Dataset
The proposed VMD/1D-CNN fault detection and diagnosis methodology is evaluated using the well-known Case Western Reserve University (CWRU) database [39], which has been extensively exploited in the literature for validation purposes [4], [10], [22]. The CWRU experimental setup, shown in Fig. 3, consists of a 1.49 kW (2 HP) three-phase Reliance electric motor driving a shaft on which a torque transducer and an encoder are mounted. The setup includes an induction motor, a loading motor, and an axle attached to it. The investigated bearings (SKF deep-groove ball bearings: 6205-2RS JEM and 6203-2RS JEM) have real faults on the inner race, outer race, and rolling elements, with different severities and sizes (0.007, 0.014, 0.021, and 0.028 inch). The CWRU dataset covers 4 operating modes (0, 1, 2, and 3 HP) at a speed of 1730 rpm, with a sampling frequency of 12 kHz.
As illustrated in the methodology flowchart (Fig. 1), the above-presented data are first pre-processed. Pre-processing consists here of segmenting the data, with overlap, into sub-signals with a window length of 8192 samples. This length is sufficient to preserve bearing fault features while increasing the number of sequences, resulting in a total of 28 sequences per fault.
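The segmentation step can be sketched as a sliding window. The window length of 8192 samples comes from the text; the hop of 4096 samples (50% overlap) is an assumption, since the overlap is not stated, and `segment_signal` is a hypothetical helper name.

```python
from typing import List, Sequence


def segment_signal(signal: Sequence[float], window: int = 8192, hop: int = 4096) -> List[List[float]]:
    """Cut a record into overlapping windows; an incomplete tail is discarded."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    last_start = len(signal) - window
    return [list(signal[s:s + window]) for s in range(0, last_start + 1, hop)]
```

Under this assumed hop, a record of 118,784 samples would give 1 + (118784 − 8192) / 4096 = 28 windows, which is consistent with the count reported here, although the overlap actually used may differ.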

B. Signal Processing
In this step, VMD is used as a notch filter for dominant mode removal [25], [31]. In this study, the decomposition consists of a maximum of 8 modes (Fig. 4), with center frequencies and bandwidths illustrated in Fig. 5. Dominant mode identification is carried out using the Pearson correlation between each decomposed mode and the original signal $x_n(t)$. The dominant mode (the one with the highest correlation, i.e., ρ closest to 1) is then subtracted from the original signal, leading to the filtered signal $x_c(t)$ that is thereafter used for detection purposes.

C. Bearing Fault Detection and Diagnosis
Fault detection is in fact a binary classification between healthy and faulty states using 1D-CNN. In this context, the VMD-based filtered signals $x_c(t)$ are first split into training and testing data, with 70% and 30% for each class, respectively. Data augmentation is then carried out with additive white Gaussian noise (AWGN) [40] at different signal-to-noise ratio (SNR) levels (10, 15, 20, 25, 30, 35, and 40 dB), as shown in Fig. 6. This data preparation step is important as it improves learning quality by increasing the amount of data, while improving fault detection robustness in a noisy environment [41]. For diagnosis purposes, the same 1D-CNN network is used, with a difference in the last classification layer, as illustrated by Fig. 7. For fault detection, the last layer performs a binary classification (healthy or faulty state). When a fault is detected, the fault diagnosis process is switched on (event-triggered process). Figs. 8 and 9 illustrate the adopted CNN architecture for multi-scale feature extraction ensuring fault detection and classification. This architecture is manually designed and tuned and consists of: (1) two 1D-convolution layers with 64 filters and a kernel size of 10, (2) a dropout layer with a rate of 0.5 to control overfitting [42], (3) a max-pooling layer with a pool size of 2 for down-sampling and compressing useful information [22], and (4) a flattening layer to arrange the features into a vector. Feature learning for classification uses a fully connected network with 300, 200, and 100 nodes in its successive layers, with a ReLU activation function, and ends with a Softmax classification layer; these settings were selected by trial and error. The training, tuned by a grid search mechanism [40], is run on the CPU with an early stopping option and a batch size of 10 samples, using the Adam optimizer [43].
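The AWGN augmentation can be sketched as follows: the noise variance is derived from the measured signal power and the target SNR in dB, and one noisy copy is produced per SNR level. The function names and the seeding are assumptions for illustration; how many noisy copies the authors generate per level is not stated.

```python
import math
import random
from typing import Dict, List, Sequence


def add_awgn(signal: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise scaled to reach the requested SNR (in dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def augment(
    signal: Sequence[float],
    snr_levels_db: Sequence[float] = (10, 15, 20, 25, 30, 35, 40),
    seed: int = 0,
) -> Dict[float, List[float]]:
    """Return one noisy copy of the signal per SNR level."""
    rng = random.Random(seed)
    return {snr: add_awgn(signal, snr, rng) for snr in snr_levels_db}
```

Lower SNR values inject stronger noise, so the 10 dB copies are the hardest cases seen during training.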

IV. ACHIEVED RESULTS ANALYSIS AND DISCUSSION
Bearing fault detection was achieved with 100% accuracy for the four operating modes (0, 1, 2, and 3 HP). The proposed fault detection method was successful and efficient in discriminating healthy and faulty states. This is mainly due to the use of VMD as a notch filter removing the dominant mode that is common to healthy and faulty states, and to the use of the 1D-CNN, which allows the extraction of good discriminant features. In terms of diagnosis, 5-fold cross-validation is used. The relevance of the VMD choice, in addition to the above-mentioned validation, is evaluated versus the EMD, which is a similar adaptive time-frequency analysis technique but lacks a rigorous mathematical foundation [44]. Moreover, the impact of dominant mode filtering on detection accuracy is also evaluated (noted VMD(-)) versus a classical VMD (without dominant mode subtraction). The comparison study shows that the VMD(-) approach outperforms the two others in terms of accuracy, as illustrated by Table II, even under different working loads. Fault diagnosis performance is also highlighted by Figs. 10 to 13 in terms of confusion matrices.
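For readers less familiar with these metrics, the sketch below shows how a confusion matrix and the overall accuracy are obtained from true and predicted class indices. The function names are illustrative, classes are assumed to be encoded as integers starting at 0, and the labels used to try it are made up rather than taken from the reported results.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def accuracy(matrix: Sequence[Sequence[int]]) -> float:
    """Share of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total
```

Off-diagonal entries show which fault classes are confused with each other, which the single accuracy figure cannot reveal.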
The learning relevance of the convolution layers (1D-CNN) is also evaluated versus well-known machine learning approaches, namely the multi-layer perceptron (MLP) and RNN, which are simple networks without convolution layers, and LSTM, which has the ability to learn short- and long-term information and is dedicated to time-series signals. For comparison purposes, the same number of layers and nodes per layer are kept while using the same CWRU dataset (3 HP). The achieved results are given in Table III and show that the 1D-CNN (trained with VMD(-)) achieves the highest diagnosis accuracy.
The above results highlight the robustness of the proposed methodology in accurately diagnosing different types of bearing faults with different severities and locations (0.007, 0.014, 0.021, and 0.028 inch; centered, orthogonal, and opposite) and under different operating loads (0, 1, 2, and 3 HP). In addition, high diagnosis accuracy is achieved despite the noise levels (10, 15, 20, 25, 30, 35, and 40 dB). In this context, it can be concluded that the simulations carried out on the CWRU dataset validate the proposed VMD-based 1D-CNN bearing fault diagnosis approach in conditions that are close to operational environments (i.e., varying load, noise) [45].
V. CONCLUSION
This paper has proposed an approach for bearing fault detection and diagnosis. In this context, the variational mode decomposition (VMD) was used as a notch filter for dominant mode cancellation, which clearly enhances fault detection even in the presence of high harmonic pollution. For diagnosis purposes, a one-dimensional convolutional neural network (1D-CNN) was adopted.
The proposed bearing fault detection and diagnosis approach was validated using the well-known Case Western Reserve University experimental dataset. In particular, the ability of the proposed diagnosis methodology to discriminate a given fault with different severities has been highlighted. The detection and diagnosis performance has also been compared to relevant state-of-the-art techniques. In terms of fault detection, it has been shown that the VMD used as a notch filter is the solution of choice when compared to EMD and traditional VMD. In terms of feature extraction for classification, convolution layers (1D-CNN) have been found worthwhile when compared with MLP, RNN, and LSTM. Future investigations should consider exploring the proposed VMD-based 1D-CNN approach using LSTM for bearing failure prognosis. Indeed, LSTM is dedicated to time-series monitoring and exhibits a high ability to learn long- and short-term dependencies, which can be useful for prognosis.

Fig. 1. Flowchart of the proposed fault event-triggered detection and diagnosis methodology.
TABLE I. BEARING FAULT TYPES.

Fault type                                             Class label
Ball fault with a size of 0.007 inch                   C1
Ball fault with a size of 0.014 inch                   C2
Ball fault with a size of 0.021 inch                   C3
Ball fault with a size of 0.028 inch                   C4
Inner race fault with a size of 0.007 inch             C5
Inner race fault with a size of 0.014 inch             C6
Inner race fault with a size of 0.021 inch             C7
Inner race fault with a size of 0.028 inch             C8
Outer race fault centered with size of 0.007 inch      C9
Outer race fault orthogonal with size of 0.007 inch    C10
Outer race fault opposite with size of 0.007 inch      C11
Outer race fault centered with size of 0.014 inch      C12

TABLE II. PROCESSING TECHNIQUES ACCURACY COMPARISON.

TABLE III. COMPARISON OF MACHINE LEARNING NETWORKS.