Efficient regular modular exponentiation using multiplicative half-size splitting

In this paper, we consider efficient RSA modular exponentiations $x^K \bmod N$ which are regular and constant time. We first review the multiplicative splitting of an integer $x$ modulo $N$ into two half-size integers. We then take advantage of this splitting to modify the square-and-multiply exponentiation into a regular sequence of squarings, each always followed by a multiplication by a half-size integer. The proposed method requires around 16% fewer word operations than the Montgomery-ladder, square-always and square-and-multiply-always exponentiations. These theoretical results are validated by our implementation results, which show an improvement of more than 12% compared to approaches which are both regular and constant time.

some information about the exponent: the computation time is correlated to the Hamming weight of the exponent, which is then leaked out.
A prerequisite for SPA resistance is then to be regular and constant time. A first method which satisfies both of these properties is the square-and-multiply-always exponentiation proposed by Coron [6]. Its principle is to always perform a multiplication after a squaring, i.e., if the bit $k_i = 0$ then a dummy multiplication is performed. Another popular strategy is the Montgomery-ladder [7], which also performs an exponentiation through a regular sequence of squarings always followed by a multiplication.

We present in this paper an alternative approach for regular and constant time exponentiation $x^K \bmod N$. Our method uses a multiplicative splitting of $x$ into two halves. We modify the square-and-multiply algorithm into a regular sequence of squarings, each always followed by a multiplication by a half-size integer. The half-size multiplications and the squarings modulo $N$ are computed with the method of Montgomery [8]; we therefore also provide a version of the proposed exponentiation with Montgomery modular multiplications adapted to the size of the operands. We analyze the complexity of this approach: Table I, below, contains the basic cost per loop iteration of an exponentiation. We notice that the proposed approach always reaches the best complexity while having the highest security level compared to the best known methods of the literature.

TABLE I. Cost per loop iteration of an exponentiation
Algorithm                           # word add.         # word mul.
Square-and-multiply                 5t^2 + O(t)         (5/2)t^2 + O(t)
Multiply-always [5]                 6t^2 + O(t)
Square-always [5]                   6t^2 + O(t)         3t^2 + O(t)
Square-and-multiply-always [6]      7t^2 + O(t)         (7/2)t^2 + O(t)
Montgomery-ladder [7]               7t^2 + O(t)         (7/2)t^2 + O(t)
Montgomery-ladder with CM [9]       6t^2 + O(t)         3t^2 + O(t)
Proposed approach                   5t^2 + O(t)

The remainder of the paper is organized as follows. Section II summarizes state-of-the-art methods for regular modular exponentiation. In Section II-C we review techniques to compute a multiplicative splitting of an integer modulo $N$. In Section III we then present a new modular exponentiation algorithm which uses this splitting to render the square-and-multiply exponentiation regular. In Section IV, we present a version of the proposed exponentiation which incorporates Montgomery modular multiplications. Finally, in Section V, we evaluate the complexity of the proposed algorithm, provide implementation results and discuss security issues related to side channel analysis.

II. REVIEW OF REGULAR MODULAR EXPONENTIATION
August 21, 2015 DRAFT

We review in this section several methods for performing an exponentiation $x^K \bmod N$. The simplest and most popular method is the square-and-multiply exponentiation [10]. The bits of the exponent $K$ are scanned from left to right; for each bit a squaring is performed, followed by a multiplication by $x$ if the bit is equal to 1. This method is detailed in Algorithm 1.

Algorithm 1 Square-and-multiply
Require: $x \in \{0, \ldots, N-1\}$ and $K = (k_{\ell-1}, \ldots, k_0)_2$

The sequence of squarings and multiplications in the square-and-multiply method is irregular, since it follows the irregular sequence of the bits $k_i$ equal to 1. This can be used to mount a side channel attack by monitoring the power consumption or the electromagnetic emanation of the circuit performing the computations. Indeed, if the monitored signals of a multiplication and of a squaring have different shapes, then we can directly read the sequence of squarings and multiplications on the power trace. If the trace of a multiplication appears between two consecutive squarings, then we deduce that the corresponding bit is 1; otherwise it is 0.
This means that a secure implementation of modular exponentiation must be computed through a regular sequence of squarings and multiplications uncorrelated to the key bits.
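To make the leak concrete, the following Python sketch runs a left-to-right square-and-multiply and records the sequence of operations. It is only an illustration: the `trace` string and the helper `bits_from_trace` are hypothetical names standing in for what an attacker would read on a power trace, and Python integers are of course not a side-channel model.

```python
def square_and_multiply(x, K, N):
    """Left-to-right square-and-multiply.

    Returns (x^K mod N, trace) where trace is a string of 'S' (squaring)
    and 'M' (multiplication), an idealized stand-in for a power trace.
    """
    r = 1
    trace = []
    for bit in bin(K)[2:]:          # scan the bits of K from left to right
        r = (r * r) % N             # a squaring for every bit
        trace.append("S")
        if bit == "1":
            r = (r * x) % N         # a multiplication only when the bit is 1
            trace.append("M")
    return r, "".join(trace)


def bits_from_trace(trace):
    """Recover the exponent bits from the S/M pattern (hypothetical attacker)."""
    bits = []
    for op in trace:
        if op == "S":
            bits.append("0")        # assume 0 until a multiplication follows
        else:
            bits[-1] = "1"          # 'M' right after 'S' reveals a bit equal to 1
    return "".join(bits)
```

The number of `M` symbols in the trace is the Hamming weight of `K`, which is also why the running time alone leaks information.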

A. Non-constant time regular exponentiation
We review in this subsection two methods which perform an exponentiation through a regular sequence of operations (squarings or multiplications). The first one is the multiply-always approach, which performs all the squarings in Algorithm 1 as if they were multiplications with distinct operands [5]. This approach is shown in Algorithm 2 and its cost is on average $\frac{3}{2}\ell$ multiplications. This multiply-always approach can be threatened by the attack of [4]: this attack distinguishes the power trace of a multiplication $r \times r$ (i.e., a hidden squaring) from that of a multiplication $r \times x$ with $x \neq r$, based on a difference in the Hamming weight of the output bits. To overcome this problem, the authors in [5] use the fact that a multiplication can be performed with two squarings. They could then re-express all the multiplications of the square-and-multiply exponentiation in order to get a square-always exponentiation. This leads to Algorithm 3, which has a complexity of $2\ell$ squarings on average.

Both the multiply-always and the square-always approaches suffer from a weakness: they do not perform the exponentiation in constant time. In terms of side channel analysis, this means that the computation time leaks some information about the key: its Hamming weight. In the next subsection we review two approaches which are regular and also constant time.

B. Constant time regular exponentiation
The first method which satisfies this property is the square-and-multiply-always exponentiation proposed by Coron in [6]. The idea of Coron is to perform a dummy multiplication when we read a bit which is equal to 0. This results in a power trace of a regular sequence of traces of squarings always followed by a trace of a multiplication. This method is given in Algorithm 4.
The square-and-multiply-always exponentiation is effective to counteract SPA and SEMA along with timing attacks. But it is still under the threat of another kind of side channel attack: the fault injection attack [11], [12].
The idea of this attack is to inject an error during the $i$-th loop of the square-and-multiply-always algorithm. If the error is injected during a dummy multiplication, it will not affect the final result, which reveals a bit $k_i$ equal to zero; otherwise the result will be erroneous, which reveals a bit $k_i$ equal to one.

Algorithm 4 Square-and-multiply-always [6]

This problem was fixed by the Montgomery-ladder approach (Algorithm 5) for modular exponentiation [7]. In this method there are two integers $r_0$ and $r_1$, where $r_0$ contains the same value as $r$ in the square-and-multiply algorithm and $r_1$ satisfies $r_1 = r_0 \times x \bmod N$ during the whole computation. At each loop iteration we always perform a multiplication $r_{1-k_i} \leftarrow r_1 \times r_0$ and a squaring $r_{k_i} \leftarrow r_{k_i}^2$, depending on the value of the currently scanned bit $k_i$. The algorithm is regular: for each bit we have a multiplication and a squaring. It also satisfies the important property that any error injected in any intermediate value affects the final result. This renders the error injection attack ineffective.
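The two constant time strategies described above can be sketched in Python as follows. This is a minimal illustration of the data flow only: the function names and the explicit `nbits` parameter are choices made here, and Python's list indexing and big-integer arithmetic give no actual constant time guarantee.

```python
def square_and_multiply_always(x, K, N, nbits):
    """Squaring always followed by a multiplication (dummy when the bit is 0)."""
    r = [1, 1]                        # r[0]: real accumulator, r[1]: dummy target
    for i in reversed(range(nbits)):
        k = (K >> i) & 1
        r[0] = (r[0] * r[0]) % N
        r[1 - k] = (r[0] * x) % N     # written to r[0] if k == 1, discarded if k == 0
    return r[0]


def montgomery_ladder(x, K, N, nbits):
    """Montgomery ladder keeping the invariant r[1] == r[0] * x mod N."""
    r = [1, x % N]
    for i in reversed(range(nbits)):
        k = (K >> i) & 1
        r[1 - k] = (r[0] * r[1]) % N  # multiplication
        r[k] = (r[k] * r[k]) % N      # squaring
    return r[0]
```

In the first function the product stored in `r[1]` is never used, which is exactly what a safe-error fault attack exploits; in the ladder both registers feed the final result.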

Remark 1.
There are some alternative methods in the literature which ensure the regularity of the operations while reducing the number of multiplications. This is for example the case of the methods reported in [13], which use a regular windowing recoding of the exponent $K$. The drawback of those methods is that they require additional resources to store some precomputed data. In this paper we focus on methods which require at most one or two intermediate variables, and are thus suitable for embedded devices with limited resources, which are also the most susceptible to being attacked by side channel analysis.

C. Multiplicative splitting of an integer x modulo N
We consider an RSA modulus $N$ and an integer $x \in [0, N]$ which corresponds to the message we want to decrypt or sign by computing $x^K \bmod N$. We will show in this section that $x$ can be split into two parts as follows:

$$x = x_0^{-1} \times x_1 \bmod N \quad \text{with } |x_0|, |x_1| \cong N^{1/2}. \qquad (2)$$

In order to get a multiplicative splitting of $x$ modulo $N$, we use the method presented in [14], which consists in a partial execution of the extended Euclidean algorithm. The Euclidean algorithm computes the greatest common divisor of $x$ and $N$ through a sequence of reductions: we start with $r_0 = N$, $r_1 = x$ and perform the following:

$$r_{i+1} = r_{i-1} \bmod r_i. \qquad (3)$$

The sequence $r_0, r_1, \ldots, r_i$ is a decreasing sequence of positive integers and the last nonzero $r_i$ satisfies $r_i = \gcd(x, N)$.

The extended Euclidean algorithm computes, in addition to $\gcd(x, N)$, two integers $a, b$ satisfying

$$a x + b N = \gcd(x, N), \qquad (4)$$

which is called a Bezout identity. In order to compute $a$ and $b$, the extended Euclidean algorithm maintains two sequences $a_i$ and $b_i$ satisfying

$$a_i x + b_i N = r_i, \qquad (5)$$

where the integers $r_i$, $i = 0, 1, \ldots$, are the consecutive remainders in (3) computed in the Euclidean algorithm. The sequences $a_i$, $b_i$ and $r_i$, $i = 0, 1, \ldots$, are computed as follows:

$$q_i = \lfloor r_{i-1} / r_i \rfloor, \quad r_{i+1} = r_{i-1} - q_i r_i, \quad a_{i+1} = a_{i-1} - q_i a_i, \quad b_{i+1} = b_{i-1} - q_i b_i,$$

starting from $r_0 = N$, $r_1 = x$, $a_0 = 0$, $a_1 = 1$ and $b_0 = 1$, $b_1 = 0$. Then, when $r_i$ is equal to $\gcd(x, N)$, the identity (5) is a valid Bezout relation (4). For a detailed presentation of this method the reader may refer to [15].
In order to obtain a multiplicative splitting of $x$, the authors in [14] stop the extended Euclidean algorithm when $r_i \cong N^{1/2}$ and $a_i \cong N^{1/2}$: indeed, due to (5), for any $i$ we have $x = a_i^{-1} r_i \bmod N$. This method to compute the splitting of an integer $x$ is reviewed in Algorithm 6. The following lemma asserts that the outputs $a_i$ and $r_i$ satisfy $|a_i|, |r_i| < c$.

Lemma 1. Let $c \in \mathbb{N}$ such that $c > N^{1/2}$ and let $a_0, a_1, \ldots, a_i, \ldots$ and $r_0, r_1, \ldots, r_i, \ldots$ be the sequences computed in Algorithm 6. Then Algorithm 6 correctly outputs a pair $a_{i_c}, r_{i_c}$ such that

$$x = a_{i_c}^{-1} r_{i_c} \bmod N \quad \text{with } |a_{i_c}| < c \text{ and } 0 \le r_{i_c} < c. \qquad (6)$$

Proof. The proof is a direct extension of [14]. A well-known property of the extended Euclidean algorithm (cf. Chapter 3 in [15]) provides that, for $i \ge 1$, we have $|a_i| < |a_{i+1}|$ and $r_i > 0$, and also that

$$|a_i| \, r_{i-1} \le N. \qquad (7)$$

So if $r_{i_c}$ is the first remainder such that $r_{i_c} < c$, we have $r_{i_c - 1} \ge c > \sqrt{N}$. Then taking $i = i_c$ in (7) we have

$$|a_{i_c}| \le N / r_{i_c - 1} < \sqrt{N} < c.$$

If Algorithm 6 is executed with $c = \lceil N^{1/2} \rceil$, then the multiplicative splitting $a_{i_c}, r_{i_c}$ output by the algorithm satisfies $|a_{i_c}| < N^{1/2}$ and $|r_{i_c}| < N^{1/2}$.
In other words, it is a half-size splitting.
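The truncated extended Euclidean algorithm reviewed above can be sketched in Python as follows. The function names are illustrative and the bound $c$ is taken as $\lfloor\sqrt{N}\rfloor + 1$, an assumption made here so that $c > N^{1/2}$ holds with integer arithmetic; the sequence $b_i$ is not needed and is omitted.

```python
from math import isqrt


def multiplicative_split(x, N, c):
    """Partial extended Euclid on (N, x), stopped at the first remainder < c.

    Returns (a, r) with a * x ≡ r (mod N); a may be negative.
    """
    r_prev, r_cur = N, x        # r_0 = N, r_1 = x
    a_prev, a_cur = 0, 1        # a_0 = 0, a_1 = 1
    while r_cur >= c:
        q = r_prev // r_cur
        r_prev, r_cur = r_cur, r_prev - q * r_cur
        a_prev, a_cur = a_cur, a_prev - q * a_cur
    return a_cur, r_cur


def half_size_split(x, N):
    """Half-size splitting: both |a| and r are below floor(sqrt(N)) + 1."""
    return multiplicative_split(x, N, isqrt(N) + 1)
```

Recovering $x$ as $a^{-1} r \bmod N$ additionally requires $a$ to be invertible modulo $N$, which the sketch does not check.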
Complexity. For the sake of simplicity we only give an upper bound on the cost of the multiplicative splitting. Specifically, since computing a multiplicative splitting consists in a partial execution of the extended Euclidean algorithm, we can bound its cost from above by an upper bound on the complexity of the extended Euclidean algorithm. We use the following lemma inspired from [15].

Lemma 2. The extended Euclidean algorithm (Algorithm 6 with $c = 1$), with two positive integers $a \le b$ of $w$-bit word length $t$ as input, requires at most $4wt^2$ word additions.

Proof. We consider a modified version of Algorithm 6: we assume that the quotients $q_i$ are of the form $q_i = 2^{\alpha_i}$. In other words, we expand the Euclidean division into several shift and subtraction operations. In this case, if we assume that the integers $a_i$ and $r_i$ in Algorithm 6 are stored on $t$ words, each loop iteration requires $2t$ word subtractions. Furthermore we have the following:

$$r_{i+2} < r_i / 2.$$

This implies that $r_i < r_0 / 2^{i/2} = b / 2^{i/2}$ and consequently the number of loop iterations before we get $r_i = 0$ is at most $2\log_2(b) \le 2tw$. The total number of operations is then at most $2tw \times 2t = 4t^2 w$ word subtractions.

III. REGULAR EXPONENTIATION WITH HALF-SIZE MULTIPLICATIVE SPLITTING
Given a multiplicative splitting (2) of $x$ into two half-size integers, we can modify the square-and-multiply method in order to replace a full multiplication by $x$ with one half-size multiplication by $x_0$ when $k_i = 0$ and one half-size multiplication by $x_1$ when $k_i = 1$. This approach is depicted in Algorithm 7. This algorithm reaches our goal since it is regular: each loop iteration is a squaring followed by a half-size multiplication. It is also robust against fault injection attacks: any error in one half-size multiplication affects the final result. The following lemma establishes the validity of Algorithm 7, i.e., that it correctly computes $r = x^K \bmod N$.

Lemma 3. Let $K = (k_{\ell-1}, \ldots, k_0)_2$ with $k_i \in \{0, 1\}$ be an $\ell$-bit integer and let $N$ and $x$ be two positive integers such that $x < N$. If we set $K_i = (k_{\ell-1}, \ldots, k_i)_2$, then the value of $r$ after loop $i$ satisfies:

$$r = x^{K_i} \times x_0^{-1} \bmod N.$$

Proof. We prove the assertion by a decreasing induction on $i$: we assume it is true for $i$ and we prove it for $i - 1$. By induction hypothesis, $r_i$, the value of $r$ after the execution of loop $i$ in Algorithm 7, satisfies $r_i = x^{K_i} \times x_0^{-1}$. Now if $k_{i-1} = 1$, the execution of loop $i - 1$ gives:

$$r_{i-1} = r_i^2 \times x_1 = x^{2K_i} x_0^{-2} \times x_0 x = x^{2K_i + 1} x_0^{-1} = x^{K_{i-1}} x_0^{-1} \bmod N,$$

since $x_1 = x_0 x \bmod N$ by (2) and $K_{i-1} = 2K_i + 1$. If $k_{i-1} = 0$, the execution of loop $i - 1$ gives $r_{i-1} = r_i^2 \times x_0 = x^{2K_i} x_0^{-1} = x^{K_{i-1}} x_0^{-1} \bmod N$, since $K_{i-1} = 2K_i$.
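The loop invariant of Lemma 3 suggests the following Python sketch of the exponentiation with half-size multipliers. It is a sketch under assumptions, not a transcription of Algorithm 7: the initialization $r = x_0^{-1} \bmod N$ and the final multiplication by $x_0$ are inferred here from the invariant $r = x^{K_i} x_0^{-1}$, the names `split` and `half_size_exp` are illustrative, and $x_0$ is assumed to be invertible modulo $N$ (otherwise `pow` raises `ValueError`). It requires Python 3.8 or later for the modular inverse.

```python
from math import isqrt


def split(x, N):
    """Half-size splitting: returns (x0, x1) with x0 * x ≡ x1 (mod N)."""
    c = isqrt(N) + 1
    r_prev, r_cur, a_prev, a_cur = N, x, 0, 1
    while r_cur >= c:
        q = r_prev // r_cur
        r_prev, r_cur = r_cur, r_prev - q * r_cur
        a_prev, a_cur = a_cur, a_prev - q * a_cur
    return a_cur, r_cur


def half_size_exp(x, K, N, nbits):
    """x^K mod N as squarings, each followed by a half-size multiplication."""
    x0, x1 = split(x, N)
    half = (x0, x1)                  # multiplier selected by the current bit
    r = pow(x0 % N, -1, N)           # assumed start: r = x^0 * x0^{-1}
    for i in reversed(range(nbits)):
        k = (K >> i) & 1
        r = (r * r) % N              # full-size squaring
        r = (r * half[k]) % N        # half-size multiplication by x0 or x1
    return (r * x0) % N              # assumed final step: remove the factor x0^{-1}
```

Every iteration performs one squaring and one multiplication by an operand of about half the size of $N$, whatever the value of the bit.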

IV. EXPONENTIATION WITH HALF-SIZE SPLITTING AND MONTGOMERY MULTIPLICATION
An RSA modulus $N$ looks like a random integer: it does not have a sparse binary representation and has no other underlying structure which can be used to speed up a reduction modulo $N$. The most widely used method to perform a multiplication modulo a random integer is the Montgomery method [8]. We modify Algorithm 7 in order to use this method. Given two integers $x$ and $y$ and an integer $M$ coprime with $N$, the Montgomery multiplication computes an integer $z$ which satisfies $z = (xyM^{-1}) \bmod N$ and $z < 2N$. In practice, taking $M = 2^{n+1}$ with $n = \lceil \log_2(N) \rceil$ simplifies the reduction and the division by $M$. This method also applies to a squaring, i.e., $x = y$, and in the sequel this will be referred to as FMS for Full Montgomery Squaring. The proposed regular exponentiation, which incorporates FMS and HMM, is depicted in Algorithm 8.
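As an illustration, a Montgomery multiplication at the integer level can be sketched in Python as below. The formulas $q = xyN' \bmod M$ with $N' = -N^{-1} \bmod M$ and $z = (xy + qN)/M$ are the textbook form of the method, and the function names and the `m_bits` parameter (so that $M = 2^{\text{m\_bits}}$) are assumptions of this sketch. $N$ must be odd, and Python 3.8 or later is needed for the modular inverse.

```python
def montgomery_setup(N, m_bits):
    """Precompute N' = -N^{-1} mod M for M = 2^m_bits (N must be odd)."""
    M = 1 << m_bits
    return (-pow(N, -1, M)) % M


def montgomery_multiply(x, y, N, m_bits, n_prime):
    """Return z with z ≡ x * y * M^{-1} (mod N), where M = 2^m_bits."""
    mask = (1 << m_bits) - 1
    t = x * y
    q = (t * n_prime) & mask         # q = t * N' mod M
    return (t + q * N) >> m_bits     # t + q*N is a multiple of M: exact division
```

With a power of two for $M$, the reduction modulo $M$ is a mask and the division by $M$ is a shift, which is the simplification mentioned above.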

V. COMPLEXITY COMPARISON AND SECURITY EVALUATION
In this section we first briefly review the word-level forms of Montgomery multiplication and squaring along with their complexities. We then deduce the complexity of the proposed exponentiation and compare it with the approaches reviewed in Section II.

A. Word level Montgomery multiplication and squaring
The proposed exponentiation in Algorithm 8 involves Montgomery modular squarings and multiplications with sizes adapted to the operands, i.e., of size either $\log_2(N)$ or $\log_2(N)/2$ bits. The following word-level form of Montgomery multiplication can take as input two integers of different sizes.

Word-level Montgomery multiplication. We consider two integers $x = (x_{t-1}, \ldots, x_0)_{2^w}$, where $t$ is the $w$-bit word length of $N$, and $y = (y_{s-1}, \ldots, y_0)_{2^w}$ with $s = t$ or $s = t/2$. The word-level form of the Montgomery multiplication interleaves multi-precision multiplication and small Montgomery reduction by sequentially performing, for $i = 0, 1, \ldots, s - 1$:

$$z \leftarrow z + x \cdot y_i, \qquad q \leftarrow |z|_{2^w} \cdot N' \bmod 2^w, \qquad z \leftarrow (z + qN) / 2^w,$$

where $N' = -N^{-1} \bmod 2^w$, $z$ is initially set to 0 and, at the end, it is equal to $x \times y \times 2^{-sw} \bmod N$. This method is detailed in Algorithm 9.
The complexity of Algorithm 9 is evaluated step by step in Table II. The cost of each step is expressed in terms of the complexity of a t-word addition or of a 1 × t multiplication which costs t word multiplications and t word additions with carry.
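The interleaved word-level loop can be sketched in Python as follows, with the same parameter `s` selecting a full-size ($s = t$) or a half-size ($s = t/2$) second operand. This is a sketch, not Algorithm 9 itself: the function name is illustrative, $N' = -N^{-1} \bmod 2^w$ is the usual Montgomery constant, and the final conditional subtraction assumes $x < N$.

```python
def word_montgomery_multiply(x, y, N, w, s):
    """Return x * y * 2^(-s*w) mod N, scanning y word by word (s words of w bits)."""
    base = 1 << w
    mask = base - 1
    n_prime = (-pow(N, -1, base)) % base   # N' = -N^{-1} mod 2^w, N odd
    z = 0
    for i in range(s):
        y_i = (y >> (i * w)) & mask        # i-th word of y
        z += x * y_i                       # 1 x t multiplication and addition
        q = ((z & mask) * n_prime) & mask  # q = |z|_{2^w} * N' mod 2^w
        z = (z + q * N) >> w               # exact division by 2^w
    if z >= N:                             # z < 2N here when x < N
        z -= N
    return z
```

Taking `s` equal to half the word length of $N$ halves the number of loop iterations, which is where the saving of a half-size multiplication comes from.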
Algorithm 9 Word-level Montgomery multiplication [16]
Require: $N < 2^{wt-2}$ the modulus, $w$ the word size, $x = (x_{t-1}, \ldots, x_0)_{2^w}$ and $y = (y_{s-1}, \ldots, y_0)_{2^w}$ integers

TABLE II. Complexity of Algorithm 9
Operations            # word add.         # word mul.
Step 3
Step 7: z − N         t                   0
Total                 s(4t + 2) + t       s(2t + 1)

Word-level Montgomery squaring. The Montgomery squaring of a $t$-word integer $x$ can be computed with the word-level Montgomery multiplication. However, a squaring can be optimized by saving the redundant word multiplications $x_i \cdot x_j$ and $x_j \cdot x_i$. We review here the formulation of the Montgomery squaring provided in [9], referred to as (8), which rewrites the squaring $x^2$ using the words of $\tilde{x} = 2x = (\tilde{x}_{t-1}, \ldots, \tilde{x}_0)_{2^w}$. With the formulation (8), the authors in [9] could derive a word-level Montgomery squaring as shown in Algorithm 10.

The complexity of Algorithm 10 is evaluated step by step in Table III. Only the complexity evaluation of Step 5 needs to be detailed. We add the contributions of all loop iterations and we get

$$\sum_{i=0}^{t-1} (2t - 2i + 1) = t(t + 1) + t = t^2 + 2t$$

word additions for Step 5, as stated in Table III.

Algorithm 10 Word-level Montgomery squaring [9]
Require: $N < 2^{wt-2}$ the modulus, $x = (x_{t-1}, \ldots, x_0)_{2^w}$ with $0 \le x_i < 2^w$, where $w$ is the word size
$q \leftarrow |z|_{2^w} \cdot N' \bmod 2^w$

TABLE III. Complexity of Algorithm 10
Operations            # word add.         # word mul.
Step 1: x + x         t                   0
Step 10: z − N        t                   0
Total

B. Complexity comparison
Now, we can deduce the cost of an FMM, an FMS and an HMM from the complexity of the word-level Montgomery multiplication and squaring. Specifically, the cost of an FMS with $M = 2^{tw}$ is the same as the one shown in Table III. To obtain the complexity of an FMM with $M = 2^{tw}$ we take $s = t$ in the formula of Table II, and to get the complexity of an HMM with $m = 2^{tw/2}$ we take $s = t/2$ in the formula of Table II. This leads to the complexities shown in the upper part of Table IV. Now, we deduce the cost of the following approaches for a modular exponentiation with an $\ell$-bit exponent:
• The square-and-multiply exponentiation requires $\ell$ FMS and $\ell/2$ FMM on average.
• The square-always exponentiation necessitates $2\ell$ FMS on average.
• The square-and-multiply-always and Montgomery-ladder exponentiations require $\ell$ FMS and $\ell$ FMM.
• The Montgomery-ladder exponentiation with common multiplicand [9] necessitates $\ell$ word-level combined Montgomery multiplications $AB$, $AC$, which have a reduced complexity by sharing some of the reduction computations.
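The substitution of $s = t$ and $s = t/2$ into the totals of Table II can be checked with a few lines of Python. The function names are illustrative; only the two total formulas $s(4t + 2) + t$ and $s(2t + 1)$ are taken from the text, and $t$ is assumed to be even for the half-size case.

```python
def montgomery_mul_cost(t, s):
    """Totals of Table II: (# word additions, # word multiplications)."""
    additions = s * (4 * t + 2) + t
    multiplications = s * (2 * t + 1)
    return additions, multiplications


def fmm_cost(t):
    """Full-size Montgomery multiplication: s = t."""
    return montgomery_mul_cost(t, t)


def hmm_cost(t):
    """Half-size Montgomery multiplication: s = t / 2 (t assumed even)."""
    return montgomery_mul_cost(t, t // 2)
```

For example, with $t = 32$ words the half-size multiplication needs exactly half the word multiplications of the full-size one, and about half the word additions.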

C. Implementation results
We have implemented the different approaches in C on an Intel Core i5 and compiled them with gcc-4.8.6. For modular multiplication and modular squaring we implemented Algorithm 10 and Algorithm 9 using low-level functions of the GMP library (cf. GMP 6.0.0, https://gmplib.org) for $1 \times t$ multiplications and $t$-word additions. We could then implement all the exponentiation algorithms considered in this paper. The multiplicative splitting of our approach is implemented using the low-level GMP function for Euclidean division. The timings obtained for a few different practical bit lengths of $N$ (i.e., 1020, 2040, 3050 and 4090) are reported in Table V. We notice that the reported timings relate closely to the complexity results shown in Table IV. Indeed, the fastest approach is the square-and-multiply exponentiation, which is not protected against simple side channel analysis. Our approach is less than 7% slower than square-and-multiply for any key size, and the gap becomes close to 3% for 4090 bits. But it is better than all other approaches: by 6% to 11% compared to the multiply-always approach, which is not entirely secure against SPA, and by more than 16% compared to all other approaches.

D. Security evaluation
The proposed exponentiation algorithm prevents simple power analysis (SPA) and simple electromagnetic analysis (SEMA). We discuss here some additional features needed to fully protect the secret exponent against differential power analysis (DPA) [2]. This attack exploits the lack of randomness in the exponentiation. In the exponentiation algorithms considered in this paper, the value taken by $r$ in the $i$-th loop depends on $x$ and on the key bits $k_{\ell-1}, \ldots, k_i$ of $K$. DPA uses the fact that if we can predict the next bit $k_{i-1}$, we can predict the next value $r_{i-1}$ in loop $i - 1$. With this prediction we also predict the power consumption of the next loop, since it is generally proportional to the Hamming weight of $r_{i-1}$. In a DPA analysis, averaging over many power traces reveals whether the guess is correct (a peak appears) or not, and thus reveals the value of $k_{i-1}$.
The main strategies for protecting an implementation against this DPA attack are as follows:
• Randomization of the exponent $K$ [6], [17], [18]. This leads to unpredictable values taken by $r$ during the exponentiation. Different strategies have been proposed: the first one adds to $K$ a random multiple of $\phi(N) = (p - 1)(q - 1)$, i.e., $K' = K + \beta\phi(N)$, with $\beta$ a random integer generally taken in $[0, 2^{20}]$. Another method consists in randomly choosing $\beta \in [0, 2^{20}]$ coprime with $\phi(N)$ and computing $\beta^{-1} \bmod \phi(N)$. The value of $K$ is then randomized as $K' = K\beta^{-1} \bmod \phi(N)$. The exponentiation is performed in two steps: we first compute $r' = x^{K'} \bmod N$ and then the final result $r = r'^{\beta} \bmod N = x^K \bmod N$.
• Blinding of the message [6]. The idea is to mask $x$ and thus make it impossible to predict anything regarding the power trace related to $x$. We choose a random value $\rho$ and we compute $x' = x \times \rho^{\tilde{K}} \bmod N$, where $\tilde{K}$ is the public exponent. The exponentiation $\rho^{\tilde{K}} \bmod N$ is efficient when $\tilde{K}$ is small, which is often the case in practice. The exponentiation algorithm then computes $r' = x'^K \bmod N$. We get the final result $x^K \bmod N$ by multiplying $r'$ by $\rho^{-1}$ modulo $N$.
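The two countermeasures listed above can be sketched in Python as follows. The function names, the parameter `E` for the public exponent and the use of Python's built-in `pow` in place of a protected exponentiation are all assumptions of this sketch; the sampling ranges follow the $[0, 2^{20}]$ interval mentioned in the text up to its upper end point.

```python
import secrets
from math import gcd


def randomize_exponent_additive(K, phi, beta_bits=20):
    """K' = K + beta * phi(N) for a random beta."""
    beta = secrets.randbelow(1 << beta_bits)
    return K + beta * phi


def randomize_exponent_multiplicative(K, phi, beta_bits=20):
    """K' = K * beta^{-1} mod phi(N); x^K is then recovered as (x^K')^beta."""
    while True:
        beta = secrets.randbelow(1 << beta_bits)
        if gcd(beta, phi) == 1:          # beta must be invertible modulo phi(N)
            break
    return (K * pow(beta, -1, phi)) % phi, beta


def blinded_exponentiation(x, K, E, N, exponentiate=pow):
    """Message blinding: exponentiate x * rho^E, then remove the mask rho."""
    while True:
        rho = secrets.randbelow(N)
        if rho > 1 and gcd(rho, N) == 1:
            break
    x_blind = (x * pow(rho, E, N)) % N   # cheap when E is small
    r_blind = exponentiate(x_blind, K, N)  # equals x^K * rho when E*K ≡ 1 mod phi(N)
    return (r_blind * pow(rho, -1, N)) % N
```

The `exponentiate` argument is where a regular exponentiation routine would be plugged in; `pow` is used here only to keep the example self-contained.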
The above strategies can be used in combination with the proposed regular exponentiation with half-size multiplications. This provides an RSA exponentiation protected against the following set of side channel attacks: SPA, SEMA, DPA, ZPA [19] and safe-error fault-injection attacks.

VI. CONCLUSION
We presented in this paper a new approach for regular modular exponentiation. We first introduced a multiplicative splitting of an integer $x$ modulo $N$. We showed that this splitting can be used to modify the square-and-multiply algorithm in order to have a regular sequence of squarings, each always followed by a multiplication by a half-size integer. We then modified this algorithm in order to perform the modular multiplications with Montgomery's method.
Compared to the usual regular and constant time modular exponentiations, the proposed method involves only multiplications by a half-size integer instead of full multiplications. This leads to a reduction of the complexity by 16%.