Noise Reduction in Side Channel Attack Using Fourth-Order Cumulant

Side channel attacks exploit physical information leaked during the operation of a cryptographic device (e.g., a smart card). The side channels through which confidential data can leak include the timing of operations, power consumption, and electromagnetic emanation. In this paper, we propose a preprocessing method based on the fourth-order cumulant, which aims to improve the performance of side channel attacks. It takes advantage of the Gaussian and non-Gaussian properties that respectively characterize the noise and the signal to remove the effects of Gaussian noise coupled into side channel signals. The proposed method is then applied to analyze the electromagnetic signals of a synthesized application-specific integrated circuit during a data encryption standard operation. The theoretical and experimental results show that our method significantly reduces the number of side channel signals needed to detect the encryption key.


Thanh-Ha Le, Jessy Clédière, Christine Servière, and Jean-Louis Lacoume, Senior Member, IEEE

Index Terms: Correlation power analysis (CPA), data encryption standard (DES), differential power analysis (DPA), fourth-order cumulant, Gaussian noise, higher order statistics, side channel attack.

I. INTRODUCTION
Side channel analysis was first introduced in the form of timing attacks by Kocher in 1996 [1]. Some years later, Kocher et al. proposed another attack based on power consumption information, known as differential power analysis (DPA) [2]. Power consumption signals of complementary metal-oxide semiconductor (CMOS) chips were used to deduce the key of the DES algorithm [3] from the difference of mean curves selected according to defined criteria. Later, electromagnetic emanation signals obtained by different kinds of sensors were successfully used to replace power consumption signals [4]-[6]. This kind of attack is known as differential electromagnetic analysis (DEMA). The effectiveness of DPA and DEMA has been verified on different types of devices [application-specific integrated circuit (ASIC), field-programmable gate array (FPGA)] implementing different cryptographic algorithms (DES, AES, RC4, ECC, RSA). Several countermeasures have been proposed to secure them against first- and high-order differential attacks [7]-[10]. Numerous authors have extended the approach of Kocher et al. by introducing multibit DPA methods to improve the differential attack [11]-[14]. Recently, the new technique of correlation power analysis (CPA) was introduced.

T.-H. Le and J. Clédière are with the CEA Leti, Grenoble 38054, France (e-mail: thanhha.le@cea.fr; jessy.clediere@cea.fr).
Since the detection of an encryption key is mainly based on side channel signals, their signal-to-noise ratio (SNR) may significantly influence the accuracy of the key guess. If the undesirable noise level is extremely high, the secret key can be undetectable. Therefore, adding noise to side channel signals is one of the countermeasures against side channel analysis. The averaging operation can be used to reduce noise as in [2] and [15]. However, this method requires many power consumption signals.
Messerges et al. introduced another method which consists of filtering noise and using the multibit DPA attack to improve the SNR of DPA signals [12]. Contrary to many approaches which try to eliminate noise, the template attack technique [15] is based on a precise noise model to collect the maximum information from a single signal. Template attacks were then developed in [16]-[18].
Our work is directed toward filling the gap between signal processing and what has been previously proposed. The idea of improving the detection of transient signals embedded in additive Gaussian noise using higher order statistics was investigated in [19] and [20]. Transient and impulsive signals have super-Gaussian probability densities and, thus, high values of kurtosis. As a result, by using the fourth-order cumulant of the observations, the effects due to Gaussian noise can be removed and the dynamics of the signal can be enhanced.
Our contribution focuses on exploiting the fourth-order cumulant properties as a preprocessing phase before the standard DPA/CPA methods. We calculate the probability of detection and the SNR, which represent the capacity of the secret key detection. We show theoretically and experimentally that our method reduces the number of signals needed to detect the encryption key.
This paper is structured as follows. The background of side channel attacks and higher order statistics is presented in Section II and Section III. In Section IV, we provide a detailed explanation of the proposed method. Section V describes the theoretical analysis of our solution, which is then experimentally validated in Section VI.

A. Information Leaked From Side Channel Signals
Today, CMOS technology is the most widely used in digital design applications, such as smart cards. The two main side channels through which information can leak in CMOS circuits are power dissipation and electromagnetic emanation.

1) Power Dissipation: The amount of power dissipated in a CMOS circuit is the sum of static and dynamic dissipation [21]. The static dissipation, which is generally very small, is due to leakage current or other currents drawn continuously from the power supply. The dynamic dissipation is due to the switching transient current and to the charging and discharging of load capacitances. From a side channel attack point of view, the dynamic dissipation contains significant information which can be exploited by attackers.
2) Electromagnetic Emanations: A sudden current pulse in a CMOS circuit causes a sudden variation of the electromagnetic field surrounding the device, which can be captured by inductive sensors. The relation between the magnetic field and its source current is given by the Biot-Savart law

$$d\mathbf{B} = \frac{\mu I \, d\mathbf{l} \times \hat{\mathbf{r}}}{4 \pi |\mathbf{r}|^2}$$

where $d\mathbf{l}$ is an infinitesimal length of the conductor carrying the electric current $I$, $\mu$ is the magnetic permeability, and $\mathbf{r}$ is the directional vector representing the distance between the current and the field point (with $\hat{\mathbf{r}} = \mathbf{r}/|\mathbf{r}|$). According to Faraday's law, any change in the magnetic environment of a coil of wire will cause a voltage to be induced in the coil

$$V = -\frac{d\Phi}{dt}$$

where the magnetic flux is $\Phi = \iint \mathbf{B} \cdot d\mathbf{S}$. Hence, if useful information is contained in $I$, it can also be detected by measuring $V$. The advantage of electromagnetic signals compared to power consumption signals is the possibility of measuring without direct device access. Furthermore, for each message, several electromagnetic signals can be captured by placing sensors in different positions [5] to obtain more localized information.

B. Side Channel Noise
1) Gaussian Noise: When performing a power analysis of smart cards, the following kinds of noise should be taken into account.
• Intrinsic noise is due to physical fluctuations in circuits. At least four different types can be distinguished: thermal noise, shot noise, 1/f noise, and generation-recombination noise [22].
• Added noise is due to fluctuations deliberately introduced in circuits. It can be added by using a linear feedback shift register (LFSR) or random generators, which allow chip developers to partially block side channel attacks.
• Quantization noise is caused by analog-to-digital conversion and is assumed to be an uncorrelated stationary white noise source [23]. Numerical noise can also be generated during DPA/CPA computation.
• External noise is generated by external sources, such as measuring equipment or environmental conditions.
In practice, all fluctuating currents and voltages generated in electrical devices have a probability density function of Gaussian form [22], since the fluctuating quantity is the sum of a large number of independent random variables. In such a case, the central limit theorem holds and, thus, the intrinsic noise is Gaussian. In the same way, we can consider the quantization noise and the external noise as Gaussian noise.
2) Temporal Misalignment: The temporal misalignment of signals introduces a great amount of noise into the signals and destabilizes side channel attacks. The misalignment sources in power analysis can be divided into two groups. The first one consists of unintentional sources generated by the device or the measurements [24]. The second one includes intentional sources added by device developers, for example, random process interrupts (RPIs) [25]. Some solutions were proposed in [25]-[27] to correct the temporal misalignment.

C. Differential Power Analysis
DPA exploits the dependence between the handled data and the power consumption of the circuit. The original DPA attack proposed by Kocher et al. [2] is based on the fact that the power dissipated when a bit is set to 1 differs from the power dissipated when it is set to 0. To test different key hypotheses $K_s$, DPA uses $M$ ciphertexts (or plaintexts) $C_i$ and a selection function $D(C_i, b, K_s)$ which predicts the value of an examined bit $b$. DPA computes the differential trace $\Delta_{K_s}(t)$ as the difference between the average of the traces for which $D$ is 1 and the average of the traces for which $D$ is 0. If we denote by $s_i(t)$ the power consumption signal corresponding to the message $i$, the trace is computed as follows:

$$\Delta_{K_s}(t) = \frac{\sum_{i=1}^{M} D(C_i, b, K_s)\, s_i(t)}{\sum_{i=1}^{M} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{M} \bigl(1 - D(C_i, b, K_s)\bigr)\, s_i(t)}{\sum_{i=1}^{M} \bigl(1 - D(C_i, b, K_s)\bigr)} \tag{1}$$

In theory, if the bits inside the algorithm are uniformly distributed and if the choice of $D$ and of the text messages is suitable, then for the correct hypothesis $K_s$, $\Delta_{K_s}(\tau) \neq 0$ at the instant $\tau$ when the bit $b$ is handled. It is thus represented by a peak in the differential trace at the instant $\tau$, which is called the DPA peak. For incorrect keys, $\Delta_{K_s}(\tau)$ tends to 0 and no significant peak appears. However, in practice, the bit distribution conditions are never perfect, and some output correlations can occur with an incorrect key guess, so we observe other peaks which are not the DPA peak. We define a ghost peak as one which appears at the instant $\tau$ in a differential curve corresponding to an incorrect key hypothesis. The ghost peak problem was explained in [28] and [29]. We also define a secondary peak as one which appears at an instant other than $\tau$ in a differential curve corresponding to any key hypothesis (wrong or correct). In our experiment, we detect the subkey $K_s$ used in the first S-box of the first round of DES. The size of $K_s$ is 6 b, so we have 64 key assumptions. The bit $b$ is one bit of the S-box output.
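The difference-of-means computation described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function name `dpa_trace`, the array layout (one row per message), and the toy data are assumptions made for the example, and the selection bits are taken as already predicted for one key hypothesis.

```python
import numpy as np

def dpa_trace(traces, selection_bits):
    """Difference-of-means trace for one key hypothesis.

    traces: array of shape (M, T), one side channel signal per message.
    selection_bits: array of shape (M,), predicted bit (0 or 1) per message.
    """
    traces = np.asarray(traces, dtype=float)
    bits = np.asarray(selection_bits).astype(bool)
    if bits.all() or not bits.any():
        raise ValueError("both groups must contain at least one trace")
    # Mean of the traces predicted as 1 minus mean of the traces predicted as 0.
    return traces[bits].mean(axis=0) - traces[~bits].mean(axis=0)

# Toy data (hypothetical): the middle sample leaks the predicted bit.
traces = np.array([[0.0, 1.0, 0.0],
                   [0.0, 3.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 3.0, 0.0]])
bits = np.array([0, 1, 0, 1])
delta = dpa_trace(traces, bits)
```

In an attack, this trace would be computed once per key hypothesis, and the hypothesis whose trace shows the highest peak would be retained.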

D. Correlation Power Analysis
Correlation approaches are based on the relation between the actual power consumption of a circuit and a power consumption model (e.g., the Hamming weight model [30], [31]). In [28], the Hamming distance model was used. The relationship between the power consumption and the Hamming distance is linear, and the correct key is the one which maximizes their correlation factor. If we denote by $h_i$ the Hamming distance between the actual state of message $i$ and a reference state, the correlation factor is formulated as

$$\rho_{K_s}(t) = \frac{\operatorname{cov}\bigl(s(t), h\bigr)}{\sigma_{s(t)}\, \sigma_h} \tag{2}$$

where $\sigma_{s(t)}$ and $\sigma_h$ are the standard deviations of $s(t)$ and $h$. In our evaluation, we examine four bits of an S-box output, and $h$ is the Hamming distance between the S-box output and its reference state.
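As an illustration of the correlation factor, the following sketch computes the Pearson correlation between each time sample and a leakage model for a single key hypothesis. The function name `cpa_trace`, the array layout, and the toy data are assumptions for the example; reporting a correlation of 0 for constant samples is a design choice of this sketch, not something stated in the text.

```python
import numpy as np

def cpa_trace(traces, model):
    """Pearson correlation between each time sample and the leakage model.

    traces: array of shape (M, T), one side channel signal per message.
    model: array of shape (M,), predicted Hamming distance per message.
    """
    traces = np.asarray(traces, dtype=float)
    model = np.asarray(model, dtype=float)
    t_c = traces - traces.mean(axis=0)
    h_c = model - model.mean()
    cov = h_c @ t_c / len(model)
    denom = traces.std(axis=0) * model.std()
    # Constant samples carry no information; report a correlation of 0 there.
    return np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)

# Toy data (hypothetical): sample 1 follows the model, sample 2 opposes it.
model = np.arange(5.0)
traces = np.column_stack([np.ones(5), 2.0 * model + 1.0, -model])
rho = cpa_trace(traces, model)
```

The key hypothesis retained would be the one whose correlation trace reaches the largest absolute value.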

III. HIGHER ORDER STATISTICS
Moments and cumulants are statistical measures which characterize signal properties. The first-order moment (the mean) and the second-order cumulant (the variance) have been widely used to characterize the probability distribution of a signal. If a signal has a Gaussian probability density function, the first- and second-order measures are sufficient to characterize it. However, many real-life signals are non-Gaussian, and higher order statistics (HOS, i.e., moments and cumulants of orders higher than 2) are needed to fully describe them. As for applications, HOS first played an important role in blind array signal processing [32], [33]. The idea of Gaussian noise suppression using cumulants was investigated in [34]. Another application of cumulants is the retrieval of harmonics in noise [35], [36]. Blind source separation has also been very successful using HOS [37], [38].
Consider a 1-D real random variable $x$ which is associated with its first and second characteristic functions. The moments of $x$ can be obtained by differentiating the first characteristic function at point 0, whereas the cumulants can be obtained by differentiating the second characteristic function at point 0 [39]. The $r$th-order cumulant is a function of the moments of orders up to $r$. If the variable is centered (i.e., $E[x] = 0$), for the orders from 1 to 4, these relations are

$$\kappa_1 = \mu_1 = 0, \quad \kappa_2 = \mu_2, \quad \kappa_3 = \mu_3, \quad \kappa_4 = \mu_4 - 3\mu_2^2$$

where $\mu_r$ and $\kappa_r$ are the $r$th-order moment and cumulant, respectively.
Many interesting properties of cumulants can be found in [40]. In our work, we are mainly interested in the following characteristic: cumulants of order higher than 2 can remove the Gaussian noise present in the signal. This means that if $g$ is a Gaussian random variable independent of $x$, then $\kappa_r(x + g) = \kappa_r(x)$ for $r > 2$. In general, we do not know the probability density of the signal, so the moments and cumulants are calculated by estimators. Let $x$ be a centered scalar random variable and $x_1, \ldots, x_N$ be $N$ realizations of $x$. The unbiased estimator of the fourth-order cumulant is formulated as [39]

$$\hat{\kappa}_4 = \frac{N+2}{N(N-1)} \sum_{i=1}^{N} x_i^4 - \frac{3}{N(N-1)} \left( \sum_{i=1}^{N} x_i^2 \right)^2 \tag{3}$$
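A minimal sketch of an unbiased fourth-order cumulant estimator is given below. It assumes the realizations come from a variable whose mean is known to be zero, in which case the combination $\frac{N+2}{N(N-1)}\sum x_i^4 - \frac{3}{N(N-1)}\bigl(\sum x_i^2\bigr)^2$ has expectation $\mu_4 - 3\mu_2^2$. The function name `cumulant4_unbiased` is hypothetical, and this form should be checked against [39] before being taken as the exact estimator used in the paper.

```python
import numpy as np

def cumulant4_unbiased(x):
    """Unbiased fourth-order cumulant estimate of a zero-mean variable.

    x: 1-D array of N >= 2 realizations, assumed centered (known zero mean).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two realizations")
    sum2 = np.sum(x ** 2)
    sum4 = np.sum(x ** 4)
    # E[sum4] = n*mu4 and E[sum2^2] = n*mu4 + n*(n-1)*mu2^2, so this
    # combination has expectation mu4 - 3*mu2^2.
    return ((n + 2) * sum4 - 3.0 * sum2 ** 2) / (n * (n - 1))

# A symmetric +/-1 sequence has mu2 = mu4 = 1, hence a cumulant of 1 - 3 = -2.
k4 = cumulant4_unbiased([1, -1, 1, -1])
```

If the mean has to be estimated from the same samples, centering with the sample mean introduces a small bias that this sketch does not correct.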

A. Gaussian Noise Suppression Using the Fourth-Order Cumulant
Consider the side channel signal $s_i(t)$ corresponding to the message $i$. This signal can be considered as the sum of a useful signal $x_i(t)$ and Gaussian noise $n_i(t)$. As cumulants of order higher than two of a Gaussian random variable are equal to zero, the cumulants of the signal plus Gaussian noise are equal to the cumulants of the useful signal

$$\kappa_r(s_i) = \kappa_r(x_i + n_i) = \kappa_r(x_i), \quad r > 2 \tag{4}$$

The fourth-order cumulant is generally preferred to the third-order one since, for any signal with a symmetric probability density, the third-order cumulant is equal to zero. Therefore, we use the fourth-order cumulant in our case.
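The cancellation of the noise term can be checked directly from the moment relations for a centered variable. The short worked verification below assumes zero-mean Gaussian noise $n$ with variance $\sigma^2$ that is independent of the useful signal $x$; the symbols are illustrative.

```latex
% Cumulants of a sum of independent variables add:
\kappa_4(x + n) = \kappa_4(x) + \kappa_4(n)
% For zero-mean Gaussian noise, E[n^2] = \sigma^2 and E[n^4] = 3\sigma^4, so
\kappa_4(n) = E[n^4] - 3\,\bigl(E[n^2]\bigr)^2 = 3\sigma^4 - 3\sigma^4 = 0
% Hence the noise contributes nothing to the fourth-order cumulant:
\kappa_4(x + n) = \kappa_4(x)
```

In practice the cumulant is estimated from a finite number of samples, so the noise contribution is only approximately zero.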
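We perform the cumulant computation by sliding a window of $N$ samples with a step of $p$ samples, as illustrated in Fig. 1. The fourth-order cumulant of the signal in each window is computed using (3). As $p < N$, the influence of a sample $s_i(t)$ can be observed on about $N/p$ consecutive values of the corresponding cumulant signal $c_i$. The value of $c_i(k)$ is given by the following formula [39]:

$$c_i(k) = \frac{N+2}{N(N-1)} \sum_{j=1}^{N} \tilde{s}_i(kp + j)^4 - \frac{3}{N(N-1)} \left( \sum_{j=1}^{N} \tilde{s}_i(kp + j)^2 \right)^2 \tag{5}$$

where $\tilde{s}_i$ denotes the samples of the $k$th window after centering.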
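The sliding-window computation can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `cumulant_signal` and its parameters are hypothetical, each window is centered with its own sample mean (which makes the estimate slightly biased), and the per-window formula is the zero-mean estimator $\frac{N+2}{N(N-1)}\sum w_j^4 - \frac{3}{N(N-1)}\bigl(\sum w_j^2\bigr)^2$.

```python
import numpy as np

def cumulant_signal(signal, window, step=1):
    """Fourth-order cumulant of each sliding window of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    if window < 2 or window > signal.size:
        raise ValueError("window must satisfy 2 <= window <= len(signal)")
    if step < 1:
        raise ValueError("step must be a positive integer")
    n = window
    out = []
    for start in range(0, signal.size - window + 1, step):
        w = signal[start:start + window]
        w = w - w.mean()  # assumption: center each window with its own mean
        sum2 = np.sum(w ** 2)
        sum4 = np.sum(w ** 4)
        out.append(((n + 2) * sum4 - 3.0 * sum2 ** 2) / (n * (n - 1)))
    return np.array(out)

# An alternating +/-1 signal gives a cumulant of -2 in every 4-sample window.
c = cumulant_signal([1, -1] * 5, window=4, step=2)
```

With a step smaller than the window, consecutive windows overlap, so a single input sample influences several consecutive output values.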

B. Comparison With the Noise Variance Subtraction Method
A standard noise reduction technique in signal processing is to calculate the noise variance and then subtract it. As the noise $n$ is independent of the useful signal $x$, the power of $x$ is the power of the observed signal $s$ minus the power of $n$

$$E[x^2] = E[s^2] - E[n^2] \tag{6}$$

To illustrate this technique, we use the same sliding window and compute the power of the signal in each window. Then we estimate the noise variance and subtract it from the power of $s$. We obtain the power of the useful signal as presented in the third curve of Fig. 1. Comparing the second and the third curves of Fig. 1, we observe that the contrast between the signal and the noise is greater for the cumulant signal (the second curve) than for the power signal (the third curve). This can be explained by two points. First, the noise variance subtraction method requires an estimation of the noise variance, while the cumulant method suppresses the noise itself. If this noise estimation is not exact, the useful information for DPA can be modified and the efficiency of the key detection may be reduced. Second, the efficiency of our method using the fourth-order cumulant is based on the fact that the useful signal is impulsive (strongly super-Gaussian). This property of the signal is characterized by high values of its kurtosis (i.e., the normalized fourth-order cumulant)

$$\mathrm{kurt}(x) = \frac{\kappa_4(x)}{\kappa_2(x)^2} \tag{7}$$

From (7), we deduce that for impulsive signals with high kurtosis values, the fourth-order cumulant $\kappa_4(x)$ exceeds the power $\kappa_2(x)$. Consequently, the relative amplitude of the cumulant signal to noise is greater than that of the power signal to noise. The dynamics of cumulant signals allow us to detect the correct key more easily.
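For comparison, the noise variance subtraction method can be sketched with the same sliding window. The function name `power_signal` and its parameters are hypothetical, and the noise variance is simply passed in by the caller, since the text does not specify how it is estimated.

```python
import numpy as np

def power_signal(signal, window, noise_var, step=1):
    """Mean power of each sliding window minus an estimated noise variance."""
    signal = np.asarray(signal, dtype=float)
    if window < 1 or window > signal.size:
        raise ValueError("window must satisfy 1 <= window <= len(signal)")
    starts = range(0, signal.size - window + 1, step)
    power = np.array([np.mean(signal[s:s + window] ** 2) for s in starts])
    # An inexact noise_var shifts every value, including the useful peaks.
    return power - noise_var

# Toy data (hypothetical): one pulse of amplitude 2, assumed noise variance 0.5.
p = power_signal([0, 0, 2, 2, 0, 0], window=2, noise_var=0.5, step=2)
```

Unlike the cumulant computation, the result depends directly on the quality of the noise variance estimate supplied by the caller.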

C. Temporal Misalignment Correction
Like any attack based on a sliding window [25], the attack using cumulant signals makes it possible to minimize the effect of the lack of temporal synchronization. Note that the temporal misalignment in our paper does not refer to countermeasures, such as RPI or random order executions, but to the imprecision of measurements or to clock jitter. We consider two signals $s_1$ and $s_2$ which are not well aligned, as represented in the upper figure of Fig. 2. When their cumulant signals $c_1$ and $c_2$ are summed into $c = c_1 + c_2$ (lower figure of Fig. 3), the useful data of $s_1$ and $s_2$ converge in $c$, and the effect due to the temporal misalignment can be reduced.

V. THEORETICAL EVALUATION
A theoretical model was proposed in [41] to determine the effect of hardware countermeasures against DPA (noise adding and random disarrangement of the instant $\tau$). However, this evaluation is only dedicated to original signals (power consumption and electromagnetic signals). The goal of this section is to provide a theoretical study which makes it possible to evaluate the DPA methods using different signal types: the original DPA [2], the cumulant DPA, and two other methods based on the sliding window technique, namely the integration DPA [25] and the energy DPA [42].
In [25], the sliding window concept was used to collect peaks distributed over consecutive cycles. This technique can also be applied when the peak is distributed over consecutive samples. In regard to the energy DPA proposed in [42], we make two remarks. First, the differential signal of DPA is the difference between two mean signals. If we use the energy signals instead of the original signals, the noise variance of the two mean signals is removed by the subtraction. Therefore, the DPA method using energy signals implicitly suppresses the noise variance. Second, as the power of a signal is its energy divided by the signal length, the energy signals can be replaced by the power signals. In our case, the signal length is the sliding window length, which is fixed. Therefore, the energy DPA method is nothing other than the DPA using power signals presented in Section IV-B. Hereafter, we call this method the power DPA. We consider the monobit analysis where one bit $b$ is examined. Side channel signals are distributed in two groups corresponding to the value of $b$ ($b = 1$ or $b = 0$). If the bit is set to 1, the corresponding side channel signal is denoted by $s_1(t)$, and if it is set to 0, the corresponding side channel signal is denoted by $s_0(t)$. Fig. 4 shows an example of two electromagnetic signals $s_1(t)$ and $s_0(t)$, which are used in the theoretical evaluation.

A. First Index: Probability of Detection
In our context, the probability of detection $P_d$ represents the capacity of correctly detecting the secret key among the key hypotheses. This parameter is computed at the instant $\tau$ when the examined bit is handled. In order to simplify the problem without loss of generality, we consider two hypotheses: the correct hypothesis $K_c$ and the wrong hypothesis $K_w$. We choose $\theta$ as the detection threshold. The key hypothesis whose peak is higher than $\theta$ is considered to be the correct key. Let us denote:
• $h_c$ as the height of the detection peak of $K_c$;
• $h_w$ as the height of the detection peak of $K_w$;
• $E_c$ and $E_w$ as the expectations of $h_c$ and $h_w$;
• $\sigma_c$ and $\sigma_w$ as the standard deviations of $h_c$ and $h_w$.
As differential signals are computed from the same elementary signals, we can consider that the distributions of $h_c$ and $h_w$ are Gaussian with the same standard deviation, that is, $\sigma_c = \sigma_w = \sigma$ (Fig. 5).

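The probability of detection is $P_d = 1 - P_m$, where $P_m$ is the probability of a miss. The probability of detection can be written as

$$P_d = 1 - \frac{1}{2}\, \mathrm{erfc}\!\left( \frac{E_c - \theta}{\sqrt{2}\, \sigma} \right) \tag{8}$$

where $E_c$ and $\sigma$ are the expectation and the standard deviation of the correct-key peak height, $\theta$ is the detection threshold, and $\mathrm{erfc}$ is the complementary error function, $\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-u^2}\, du$.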
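As a numerical illustration, the probability that a Gaussian-distributed peak height exceeds a threshold can be evaluated with the complementary error function. The sketch assumes that the correct-key peak height follows a Gaussian law with expectation `e_c` and standard deviation `sigma`, and that detection means exceeding a threshold `theta`; these names and the example values are hypothetical.

```python
import math

def detection_probability(e_c, sigma, theta):
    """P(h_c > theta) for a Gaussian peak height h_c ~ N(e_c, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Probability of a miss: the correct-key peak falls below the threshold.
    p_miss = 0.5 * math.erfc((e_c - theta) / (sigma * math.sqrt(2.0)))
    return 1.0 - p_miss

# A threshold one standard deviation below the expected peak height.
p_d = detection_probability(e_c=1.0, sigma=0.5, theta=0.5)
```

Lowering the noise standard deviation or raising the expected peak height moves the result toward 1, which matches the qualitative behavior described in the text.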

B. Second Index: SNR
In many cases, we do not have any knowledge about the instant $\tau$ when the examined bit is handled. The detection peak of the correct key cannot be observed because it is covered with noise. We thus define the second parameter, which is the SNR of the differential curve corresponding to the correct key. The detection peak is considered to be the signal, and the other parts of the curve are defined as noise. The SNR of each method is

$$\mathrm{SNR} = \frac{\text{height of the detection peak}}{\text{standard deviation of noise}} \tag{9}$$

The theoretical values of the DPA peak height and the standard deviation of noise are given in Table I. The calculations of noise are given in the Appendix. Note that $\kappa_4(s) = \mathrm{kurt}(s)\, \kappa_2(s)^2$, where $\mathrm{kurt}(s)$ is the kurtosis of the signal $s$. We develop the SNR of the cumulant DPA method as follows:

Here $\sigma^2$ represents the noise variance of $s_1$ and $s_0$. As the signal $s_1$ (and $s_0$) is impulsive, its kurtosis is much greater than 1. We obtain

$$\mathrm{SNR}_{\text{cumulant DPA}} > \mathrm{SNR}_{\text{power DPA}} \tag{10}$$

The previous demonstration confirms the advantage of the cumulant method compared to the power one when the examined signal is impulsive (i.e., its kurtosis is much greater than 1).
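The SNR of a differential curve, defined as the peak height divided by the standard deviation of the rest of the curve, can be estimated as sketched below. The function name `differential_snr`, the `guard` parameter (samples around the peak excluded from the noise estimate), and the toy curve are assumptions made for illustration; the peak position is supplied by the caller.

```python
import numpy as np

def differential_snr(trace, peak_index, guard=0):
    """Peak height divided by the standard deviation of the rest of the curve."""
    trace = np.asarray(trace, dtype=float)
    mask = np.ones(trace.size, dtype=bool)
    lo = max(0, peak_index - guard)
    hi = min(trace.size, peak_index + guard + 1)
    mask[lo:hi] = False  # exclude the peak (and its guard band) from the noise
    noise_std = trace[mask].std()
    if noise_std == 0:
        raise ValueError("noise part of the curve has zero variance")
    return abs(trace[peak_index]) / noise_std

# Toy curve (hypothetical): unit-amplitude noise around a peak of height 6.
snr = differential_snr([1, -1, 1, -1, 6, 1, -1, 1, -1], peak_index=4)
```

When the peak instant is unknown, one option is to take the position of the largest absolute value as `peak_index`, at the cost of an optimistic estimate.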

C. About CPA Using Fourth-Order Cumulant Signals
If the Hamming weight or the Hamming distance model is adopted, the useful signal is $x(\tau) = \alpha h + \beta$, where $\tau$ is the instant when the data are handled, $h$ is the Hamming weight or Hamming distance of the data, and $\alpha$ and $\beta$ are constant values. The side channel signal at the instant $\tau$ becomes

$$s(\tau) = \alpha h + \beta + n(\tau) \tag{11}$$

As $n$ is Gaussian noise, it will disappear after the cumulant computation. We can write $c = \lambda_4 h^4 + \lambda_3 h^3 + \lambda_2 h^2 + \lambda_1 h + \lambda_0$, where $\lambda_0$, $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are constant values computed from $\alpha$ and $\beta$. As only the term $\lambda_1 h$ contributes to the correlation factor between $c$ and $h$, this correlation factor is exactly the one between $s$ and $h$ multiplied by a constant. The probability of detection of CPA with cumulant signals, calculated at the instant $\tau$, is equivalent to that of CPA with original signals. However, using cumulant signals, the SNR of CPA will clearly be enhanced because of the signal denoising. Consequently, the key detection is more efficient.

D. Discussion
The criteria defined previously allow us to evaluate the performance of an attack. A method is powerful if both the probability of detection $P_d$ and the SNR of the DPA signal are high. Clearly, $P_d$ and SNR depend on the number of side channel signals $M$ and on the standard deviation $\sigma$ of the noise in a side channel signal. However, these dependences are simple: when $M$ increases, $P_d$ and SNR increase, and when $\sigma$ increases, $P_d$ and SNR decrease. In this section, we present only the variation of $P_d$ given by (8) (Fig. 6) and the variation of SNR given in Table I (Fig. 7) according to the sliding window length $N$, with the number of side channel signals and the noise level of an elementary side channel signal held fixed. The $P_d$ and the SNR of the original method, which does not use the sliding window technique, are independent of $N$. They are thus represented by horizontal lines (Figs. 6 and 7). For the integration method, the longer the window, the more noise is added to the window. Consequently, its $P_d$ and SNR decrease rapidly when the window length increases. The decrease of the SNR of the integration DPA was explained in [25]. This method is even worse than the original one if $N$ is large. This means that the integration method is only applicable to weak misalignments [i.e., when the peak is distributed over a small number of consecutive samples (or cycles)]. Regarding the power DPA, the fact that the noise variance is removed by the subtraction of two mean signals makes it better than the original and the integration methods.
The variations of $P_d$ and SNR corresponding to the cumulant method present a fall for small values of $N$. This can be explained by two reasons. On the one hand, when the sliding window is too small, the signal cannot be considered as impulsive. Accordingly, its kurtosis is not high, and the values of $P_d$ and SNR are close to 0. On the other hand, if the window is not large enough, the assumption about the Gaussian noise may not hold and the cumulant of the noise can be different from 0. When the window is large, the conditions of impulsive signal and of Gaussian noise hold. Therefore, the cumulant DPA performs better than the power DPA, and it becomes the best method for both criteria: the probability of detection and the SNR.

VI. EXPERIMENTAL RESULTS
For a real experiment, the probability of detection is replaced by a first index, which is easier to compute. It is defined as the ratio between the DPA/CPA peak corresponding to the correct key (expected peak) and the highest DPA/CPA peak resulting from incorrect keys (ghost peaks). These peaks are observed at the same time location, when the data are handled. If this index is greater than 1, the expected peak is higher than any ghost peak and the key detection is reliable. In contrast, if this index is smaller than 1, a ghost peak exists which is higher than the expected peak and the method is not effective. The second index is the signal-to-noise ratio of the DPA/CPA signal corresponding to the correct key. This index corresponds to the SNR defined in the previous section.
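The first index can be computed from the peak heights observed at the same instant for all key hypotheses, as in the sketch below. The function name `peak_ratio_index` and the example values are hypothetical, and taking absolute values of the peaks is an assumption of this sketch.

```python
import numpy as np

def peak_ratio_index(peaks, correct_key):
    """Ratio of the correct-key peak to the highest incorrect-key peak.

    peaks: peak height of each key hypothesis at the instant of interest.
    correct_key: position of the correct key in `peaks`.
    """
    peaks = np.abs(np.asarray(peaks, dtype=float))
    ghost = np.delete(peaks, correct_key)  # peaks of the incorrect keys
    return peaks[correct_key] / ghost.max()

# Toy values (hypothetical): the correct key (position 1) dominates.
index_1 = peak_ratio_index([0.2, 0.9, 0.3, 0.45], correct_key=1)
```

A value above 1 means that the correct key gives the highest peak among all hypotheses.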

A. Experimental Validation of the Cumulant-Based Analysis
In our experiment, we measure the electromagnetic emanations of a synthesized application-specific integrated circuit (ASIC) during a DES operation. The sampling rate is 612.5 MHz and the clock rate is 2.1 MHz. We obtain an electromagnetic signal $s_i(t)$ from each random message used as input (upper curve of Fig. 1). Here, the notation $s_i(t)$ represents the voltage value at the output of our electromagnetic sensor corresponding to the message $i$.
In the first experiment, we used 3000 messages to test 64 key assumptions. The DPA and CPA signals were computed using (1) and (2), respectively.

B. Performance Evaluation
The variation of the first index, when the number of cipher messages varies from 100 to 10000 messages, is illustrated in Fig. 9. This figure shows that the cumulant-based DPA method performs much better than the original DPA. This improvement is explained by the fact that the cumulant operation removes the impact of the Gaussian noise, corrects the misalignment, and keeps the difference of power dissipation between setting one bit to 1 and setting it to 0. When comparing CPA and cumulant-based CPA, we see that the latter method still works but its improvement is not significant.
The evaluation of the second index is depicted in Fig. 10. It shows that the SNR of the DPA and cumulant-based DPA signals is always good. The second index of the CPA and cumulant-based CPA methods is low because of the normalization of CPA [13]. The key detection depends on both indexes. It is feasible and reliable if the two following conditions are satisfied: the first index is greater than 1, and the second index is greater than a chosen threshold. The first condition is trivial. The choice of the threshold depends on the probability of false alarm (see Fig. 11). For a centered normalized Gaussian noise and a signal of 3, the SNR is equal to 3, and the corresponding probability of false alarm (i.e., the noise amplitude exceeding the signal amplitude) is about 7%.
According to Figs. 9 and 10, the DPA method needs about 2500 messages and CPA needs about 400 messages to detect the correct key. By using the cumulant tool, our proposed methods require only 200 messages to retrieve the encryption key. Fig. 12 confirms our conclusion about the required number of messages. The left column signals correspond to the experiment with 400 cipher messages. We see the appearance of many unexpected peaks in DPA signals (i.e., the encryption key cannot be uncovered). Meanwhile, the CPA, cumulant-based DPA, and cumulant-based CPA methods are effective. If the number of messages is reduced to 200, only the cumulant-based DPA/CPA methods allow detection of the secret key.
The experimental results show that our cumulant-based methods are more powerful than the original ones. The cumulant application improves DPA significantly in terms of the number of messages. Instead of using 2500 messages, the cumulant-based DPA needs only 200 messages.
Note that in this experiment, the misalignment of signals is relatively weak (about three to four samples). If the misalignment becomes larger, the key detection of the original DPA and CPA methods will be degraded. The performance of cumulant methods, which use the sliding window technique, is not affected. In this case, the gain in attack efficiency is even more marked.

C. Choice of the Window Length and the Sliding Step
As we observed in the previous paragraph, the cumulant method gives good values of the second index; hence, the detection efficiency is related to the first index. It depends strongly on the choice of the window size $N$ and the sliding step. One should note that the relevant information from the side channel signals of a DES operation is located around 16 peaks corresponding to the 16 rounds of DES. In our case, the distance between two consecutive peaks is about 300 samples. If we choose a window longer than this distance, some positions of the sliding window exist that contain two consecutive peaks. After performing the cumulant calculation, the information included in two consecutive peaks will be merged into one large cumulant peak, as depicted in the upper curve of Fig. 13. The 16 original peaks in the side channel signal are completely deformed, and the effectiveness of the cumulant DPA will be degraded.

Fig. 1. Horizontal axes represent time in samples, proportional to clock cycles. The first vertical axis represents the voltage value on the output of an electromagnetic sensor (mV), the second one represents its fourth-order cumulant signal, and the third one represents the power signal with noise subtracted. The last two signals are obtained by sliding a window of 100 samples on the upper signal.

The summed signal $s = s_1 + s_2$ is shown by the lower curve of Fig. 2. We clearly observe that the information contained in $s_1$ and $s_2$ is dispersed in two distinct peaks of $s$. The temporal misalignment of side channel signals reduces the attack effectiveness. If we use the cumulant signals $c_1$ and $c_2$ (upper figure of Fig. 3), the information in both signals $s_1$ and $s_2$ is then accumulated into the signal $c = c_1 + c_2$ (lower figure of Fig. 3).

Fig. 2. Upper figure: misaligned signals $s_1$ and $s_2$. Lower figure: sum of the two signals, $s = s_1 + s_2$.

Fig. 3. Upper figure: misaligned cumulant signals $c_1$ and $c_2$. Lower figure: sum of the two cumulant signals, $c = c_1 + c_2$.

Fig. 5. Probability of detection $P_d$ and probability of a miss $P_m$.

Fig. 6. Variation of the detection probability as a function of $N$.

Fig. 8. DPA, cumulant DPA, CPA, and cumulant CPA signals. Left column: signals corresponding to the correct key. Right column: signals corresponding to a wrong key.

For DPA, computed using (1), a selection function based on the 1-b Hamming distance was used. For the CPA method, computed using (2), we examined 4 b. The cumulant signals were collected by sliding a window of $N$ samples over each signal with a fixed step. The choice of $N$ is supported by the theoretical evaluation in Section V: it corresponds to high values of $P_d$ and SNR (Figs. 6 and 7). Fig. 8 represents, from top to bottom, the DPA, cumulant-based DPA, CPA, and cumulant-based CPA signals corresponding to the correct key (left column) and to a wrong key resulting in the highest ghost peak (right column). First, the results show that all four methods allow the retrieval of the correct key. This means that the cumulant operation does not eliminate the information in the electromagnetic signals that is useful for DPA and CPA. Second, thanks to the high dynamics of cumulant signals, the peaks of DPA signals at instants other than $\tau$ (the secondary peaks), which appear frequently in monobit DPA, are clearly reduced using the cumulant method. Third, we observe a high level of noise in the CPA signal. The second index (the SNR) gives a good measure of this noise problem.