Trade-off Approaches for Leak Resistant Modular Arithmetic in RNS

On an embedded device, an implementation of a cryptographic operation, such as an RSA modular exponentiation [12], can be attacked by side channel analysis. In particular, recent improvements in horizontal power analysis [3,10] render ineffective the usual counter-measures, which randomize the data only at the very beginning of the computations [4,2]. To counteract horizontal analysis it is necessary to randomize the computations all along the exponentiation. The leak resistant arithmetic (LRA) proposed in [1] implements modular arithmetic in a residue number system (RNS) and randomizes the computations by randomly changing the RNS bases. In this paper we propose a variant of the LRA in RNS which changes only one or a few moduli of the RNS basis. This reduces the cost of the randomization and makes it possible to execute it at each loop iteration of a modular exponentiation.


Introduction
Nowadays, the RSA cryptosystem [12] is constantly used in e-commerce and credit card transactions. The main operation in RSA protocols is an exponentiation $x^K \bmod N$ where $N$ is a product of two primes $N = pq$. The secret data are the two prime factors of $N$ and the private exponent $K$ used to decrypt or sign a message. The currently recommended size for $N$ is around 2000-4000 bits to ensure the intractability of the factorization of $N$. The basic approach to perform the modular exponentiation efficiently is the square-and-multiply algorithm: it scans the bits $k_i$ of the exponent $K$ and performs a sequence of squarings, each followed by a multiplication only when $k_i$ is equal to one. Thus the cryptographic operations are quite costly since they involve a few thousand multiplications or squarings modulo a large integer $N$.
A cryptographic computation performed on an embedded device can be threatened by side channel analysis. These attacks monitor the power consumption or the electromagnetic emanation leaked by the device to extract the secret data. The simplest attack is the simple power analysis (SPA) [8], which applies when the power traces of a modular squaring and of a modular multiplication are different. This makes it possible to read the sequence of operations on the power trace of an exponentiation and then derive the key bits of the exponent. This attack is easily overcome by using an exponentiation algorithm like the Montgomery-ladder [6], which renders the sequence of operations uncorrelated to the key bits. A more powerful attack, the differential power analysis (DPA) [8], makes this counter-measure against SPA ineffective. Specifically, DPA uses a large number of traces and correlates the intermediate values with the power traces: it tracks the intermediate values all along the computation and then guesses the bits of the exponent. Coron in [4] has shown that the exponentiation can be protected from DPA by randomizing the exponent and by blinding the integer $x$. More recently, the horizontal attacks presented in [13,3] require only one power trace of an exponentiation, and threaten implementations which are protected against SPA and DPA with the method of Coron [4]. The authors in [3] explain that the best approach to counteract horizontal attacks is to randomize the computations all along the exponentiation.
One popular approach to randomize modular arithmetic is the leak-resistant approach presented in [1], based on the residue number system (RNS). Indeed, in [1], the authors noticed that the mask induced by Montgomery modular multiplication can be randomized in RNS by permuting the moduli of the RNS bases. In this paper we investigate an alternative method to perform this permutation of bases. Our method changes only one modulus at a time. We provide formulas for this kind of randomization along with the required updates of the constants involved in RNS computations. The complexity analysis shows that this approach can be advantageous for a lower level of randomization compared to [1]. In other words, it provides a trade-off between efficiency and randomization.
The remainder of the paper is organized as follows. In Section 2 we review modular exponentiation methods and modular arithmetic in RNS. We then recall in Section 3 the leak resistant arithmetic in RNS of [1]. In Section 4 and Appendix A we present our methods for randomizing the modular arithmetic in RNS. We then conclude the paper in Section 5 with a complexity comparison and some concluding remarks.

Modular exponentiation
The basic operation in RSA protocols is the modular exponentiation: given an RSA modulus $N$, an exponent $K$ and a message $x \in \{0, 1, \ldots, N-1\}$, a modular exponentiation consists in computing
$$r = x^K \bmod N.$$
This exponentiation can be performed efficiently with the square-and-multiply algorithm. This method scans the bits $k_i$ of the exponent $K = (k_{\ell-1}, \ldots, k_0)_2$ from left to right and performs a sequence of squarings, each followed by a multiplication by $x$ if the bit $k_i = 1$, as follows:
$$r \leftarrow 1; \quad \text{for } i = \ell-1 \text{ down to } 0: \quad r \leftarrow r^2 \bmod N, \quad \text{if } k_i = 1 \text{ then } r \leftarrow r \times x \bmod N.$$
The complexity of this approach is, on average, $\ell$ squarings and $\ell/2$ multiplications.
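The left-to-right scan described above can be sketched in a few lines of Python. This is only an illustration on plain integers: the function name `square_and_multiply` is ours, and the data-dependent branch is exactly what makes this variant readable on a power trace.

```python
def square_and_multiply(x, K, N):
    """Left-to-right binary exponentiation: returns x**K mod N."""
    r = 1
    for bit in bin(K)[2:]:      # bits of K, most significant first
        r = (r * r) % N         # one squaring per exponent bit
        if bit == "1":
            r = (r * x) % N     # multiplication only when the bit is 1
    return r
```

The number of multiplications performed depends on the Hamming weight of `K`, which is the source of the leakage discussed in this section.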
Kocher et al. in [8] showed that the square-and-multiply exponentiation is weak against simple power analysis. Indeed, if a squaring and a multiplication have different power traces, an eavesdropper can read on the trace of a modular exponentiation the exact sequence of squarings and multiplications, and then deduce the corresponding bits of $K$. It is thus recommended to perform an exponentiation using, for example, the Montgomery-ladder [6], which computes $x^K \bmod N$ through a regular sequence of squarings and multiplications. This method is detailed in Algorithm 1. The regularity of the exponentiation prevents an attacker from directly reading the key bits on a single trace.
Algorithm 1 Montgomery-ladder [6]
Require: $x \in \{0, \ldots, N-1\}$ and $K = (k_{\ell-1}, \ldots, k_0)_2$

More sophisticated attacks can threaten a naive implementation of the Montgomery-ladder exponentiation. For example, differential power analysis [8] makes it necessary to randomize the exponent and to blind the integer $x$ with a random mask, as explained in [4]. Horizontal approaches [13,3] are even more powerful since they require only a single trace to complete the attack and are effective even if the exponent $K$ is masked and the data $x$ is blinded. The authors in [3] propose to counteract horizontal power analysis by randomizing each multiplication and squaring using some temporary mask. In this paper we deal with the problem of randomizing modular multiplications and squarings: we use the residue number system (RNS) to represent integers and to perform modular operations efficiently.
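As a concrete illustration of the regular exponentiation discussed above, here is a minimal sketch of the textbook Montgomery ladder on plain integers. It is the standard formulation, not necessarily the exact listing of Algorithm 1; the names `r0` and `r1` follow the registers mentioned later in the paper, and the function name is ours.

```python
def montgomery_ladder(x, K, N):
    """Regular exponentiation: one multiplication and one squaring per bit."""
    r0, r1 = 1, x % N               # invariant: r1 == r0 * x (mod N)
    for bit in bin(K)[2:]:          # bits of K, most significant first
        if bit == "0":
            r1 = (r0 * r1) % N
            r0 = (r0 * r0) % N
        else:
            r0 = (r0 * r1) % N
            r1 = (r1 * r1) % N
    return r0
```

Both branches perform the same pair of operations, so the sequence of squarings and multiplications no longer depends on the key bits. The Python `if` itself is of course not a side-channel-safe construct; the sketch only shows the arithmetic.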

Montgomery multiplication in RNS
Let $N$ be a modulus and let $x, y$ be two integers such that $0 \le x, y < N$. One of the most widely used methods to perform the modular multiplication $x \times y \bmod N$ is the method of Montgomery [9]. This approach avoids Euclidean division: it uses an integer $A$ such that $A > N$ and $\gcd(A, N) = 1$ and computes $z = xyA^{-1} \bmod N$ as follows:
$$q \leftarrow -xyN^{-1} \bmod A, \qquad z \leftarrow (xy + qN)/A. \qquad (1)$$
To check the validity of the above method we notice that $(xy + qN) \bmod A = 0$; this means that the division by $A$ is exact in the computation of $z$, and then $z \equiv xyA^{-1} \bmod N$. The integer $z$ is almost reduced modulo $N$ since $z = (xy + qN)/A < (N^2 + AN)/A < 2N$: if $z \ge N$, a single subtraction of $N$ gives $z < N$. In practice the integer $A$ is often taken as a power of 2 in order to have an almost free reduction and division by $A$.
For a long sequence of multiplications, the so-called Montgomery representation is used:
$$\tilde{x} = xA \bmod N.$$
Indeed, the Montgomery multiplication applied to $\tilde{x}$ and $\tilde{y}$ outputs $\tilde{z} = xyA \bmod N$, i.e., the Montgomery representation of the product of $x$ and $y$.
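The two steps of the Montgomery multiplication and the stability of the Montgomery representation can be checked with a short integer sketch. The function name and the toy values ($N = 391$, $A = 512$) are ours, chosen only for illustration.

```python
def montgomery_multiply(x, y, N, A):
    """Return x*y*A^{-1} mod N for 0 <= x, y < N, A > N, gcd(A, N) = 1."""
    q = (-x * y * pow(N, -1, A)) % A    # q = -x*y*N^{-1} mod A
    z = (x * y + q * N) // A            # exact division: x*y + q*N = 0 mod A
    if z >= N:                          # z < 2N, one subtraction suffices
        z -= N
    return z
```

For example, with `xt = x * A % N` and `yt = y * A % N`, the call `montgomery_multiply(xt, yt, N, A)` returns `x * y * A % N`, so a whole exponentiation can stay in Montgomery form.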
Residue number system. In [11] the authors showed that the use of a residue number system (RNS) makes it possible to perform Montgomery multiplication efficiently with an alternative choice for $A$. Let $a_1, \ldots, a_t$ be $t$ pairwise coprime integers. In the residue number system an integer $x$ such that $0 \le x < A = \prod_{i=1}^{t} a_i$ is represented by the $t$ residues
$$x_i = x \bmod a_i, \quad i = 1, \ldots, t.$$
Moreover, $x$ can be recovered from its RNS expression using the Chinese remainder theorem (CRT) as follows
$$x = \Big[\sum_{i=1}^{t} \big[x_i A_i^{-1}\big]_{a_i} \times A_i\Big]_A \qquad (3)$$
where $A_i = \prod_{j=1, j \ne i}^{t} a_j$ and the brackets $[\,\cdot\,]_{a_i}$ denote a reduction modulo $a_i$. The set $\mathcal{A} = \{a_1, \ldots, a_t\}$ is generally called an RNS basis.
Let $x = (x_1, \ldots, x_t)_{\mathcal{A}}$ and $y = (y_1, \ldots, y_t)_{\mathcal{A}}$ be two integers given in an RNS basis $\mathcal{A}$. Then, the CRT provides that an integer addition $x + y$ or multiplication $x \times y$ in RNS consists in $t$ independent additions/multiplications modulo $a_i$:
$$[x + y]_{a_i} = [x_i + y_i]_{a_i}, \qquad [x \times y]_{a_i} = [x_i \times y_i]_{a_i}, \quad i = 1, \ldots, t.$$
The main advantage is that these operations can be implemented in parallel since each operation modulo $a_i$ is independent from the others. Only comparisons and Euclidean divisions are not easy to perform in RNS; they require a partial reconstruction of the integers $x$ and $y$.
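The RNS representation, the CRT reconstruction and the component-wise operations can be illustrated as follows. This is a minimal sketch on a toy basis; the helper names (`to_rns`, `from_rns`, `rns_add`, `rns_mul`) are ours and the results are only valid while the true sum or product stays below $A$.

```python
from math import prod

def to_rns(x, basis):
    """Residues of x modulo each a_i."""
    return [x % a for a in basis]

def from_rns(residues, basis):
    """CRT reconstruction: x = [ sum_i [x_i * A_i^{-1}]_{a_i} * A_i ]_A."""
    A = prod(basis)
    total = 0
    for x_i, a_i in zip(residues, basis):
        A_i = A // a_i
        total += ((x_i * pow(A_i, -1, a_i)) % a_i) * A_i
    return total % A

def rns_add(xs, ys, basis):
    """t independent additions modulo a_i."""
    return [(x_i + y_i) % a_i for x_i, y_i, a_i in zip(xs, ys, basis)]

def rns_mul(xs, ys, basis):
    """t independent multiplications modulo a_i."""
    return [(x_i * y_i) % a_i for x_i, y_i, a_i in zip(xs, ys, basis)]
```

Each list element could be handled by a separate processing unit, which is the parallelism mentioned above.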
Montgomery multiplication in RNS. In [11] Posch and Posch notice that the Montgomery multiplication can be efficiently implemented in RNS: they use the fact that the second step of the Montgomery multiplication (1) can be modified so that it is carried out modulo a second integer $B$, which leads to Algorithm 2.

Algorithm 2 Basic-MM-RNS($x, y, \mathcal{A}, \mathcal{B}$)

The second and fourth steps are necessary since, if we want to compute $z \leftarrow (xy + qN)A^{-1} \bmod B$ in $\mathcal{B}$, we need to convert the RNS representation of $q$ from the basis $\mathcal{A}$ to the basis $\mathcal{B}$: the base extension (BE) performs this conversion. The fourth step is also necessary to have $z$ represented in both bases $\mathcal{A}$ and $\mathcal{B}$.
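The four steps referred to above can be sketched as follows, under the assumption that Algorithm 2 has the usual structure: $q$ computed in $\mathcal{A}$, base extension to $\mathcal{B}$, $z$ computed in $\mathcal{B}$, base extension back to $\mathcal{A}$. The base extension is done here by a full CRT reconstruction for clarity, which is not how an efficient implementation proceeds; all function names are ours.

```python
from math import prod

def crt(residues, basis):
    """Integer in [0, prod(basis)) with the given residues."""
    M = prod(basis)
    return sum(((r * pow(M // m, -1, m)) % m) * (M // m)
               for r, m in zip(residues, basis)) % M

def extend(residues, src, dst):
    """Naive base extension: reconstruct in src, then reduce modulo dst."""
    v = crt(residues, src)
    return [v % m for m in dst]

def basic_mm_rns(xa, xb, ya, yb, N, basis_a, basis_b):
    """Return z = x*y*A^{-1} mod N (z < 2N) in both bases."""
    A = prod(basis_a)
    # Step 1: q = -x*y*N^{-1} in basis A
    q_a = [(-x * y * pow(N, -1, a)) % a for x, y, a in zip(xa, ya, basis_a)]
    # Step 2: base extension of q from A to B
    q_b = extend(q_a, basis_a, basis_b)
    # Step 3: z = (x*y + q*N) * A^{-1} in basis B
    z_b = [((x * y + q * N) * pow(A, -1, b)) % b
           for x, y, q, b in zip(xb, yb, q_b, basis_b)]
    # Step 4: base extension of z from B to A
    z_a = extend(z_b, basis_b, basis_a)
    return z_a, z_b
```

The sketch requires $A > N$, $B > 2N$ and $N$ coprime with every modulus, so that $z < 2N$ is represented without overflow in $\mathcal{B}$.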
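Base extension. This is the most costly step in the RNS version of the Montgomery multiplication (Algorithm 2). We review the best known method to perform such an RNS base extension. Let $x = (x_1, \ldots, x_t)_{\mathcal{A}}$ be the representation of an integer $x$ in the RNS basis $\mathcal{A}$; the CRT (3) reconstructs $x$ as follows:
$$x = \sum_{i=1}^{t} \big[x_i A_i^{-1}\big]_{a_i} \times A_i - \alpha A. \qquad (5)$$
The correcting term $-\alpha A$ corresponds to the reduction modulo $A$ in (3). We get the RNS representation $[x]_{b_j}$ for $j = 1, \ldots, t$ of $x$ in $\mathcal{B}$ by simply reducing modulo $b_j$ the expression in (5):
$$[x]_{b_j} = \big[x^*_{b_j} - \alpha A\big]_{b_j} \quad \text{with} \quad x^*_{b_j} = \Big[\sum_{i=1}^{t} \big[x_i A_i^{-1}\big]_{a_i} \times A_i\Big]_{b_j}. \qquad (6)$$
We give some details on how to perform the above computations.
• Computations of $x^*_{b_j}$. If the constants $[A_i]_{b_j}$ are precomputed then $x^*_{b_j}$ for $j = 1, \ldots, t$ can be computed as
$$x^*_{b_j} = \Big[\sum_{i=1}^{t} \big[x_i A_i^{-1}\big]_{a_i} \times [A_i]_{b_j}\Big]_{b_j}.$$
There is an alternative method proposed by Garner in [5] which computes $x^*_{b_j}$, but we will not use it in this paper, so we do not recall it here. The reader may refer to [5] for further details on this method.
• Computations of $\alpha$. The base extension in (6) also requires the computation of $\alpha$. We arrange (5) as follows
$$\alpha = \Big\lfloor \sum_{i=1}^{t} \frac{[x_i A_i^{-1}]_{a_i}}{a_i} - \frac{x}{A} \Big\rfloor = \Big\lfloor \sum_{i=1}^{t} \frac{[x_i A_i^{-1}]_{a_i}}{a_i} \Big\rfloor \qquad (7)$$
since when $0 < x < A$ we have $0 < x/A < 1$.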
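The base extension with the correcting term $\alpha$ can be sketched as below. Here $\alpha$ is computed exactly with big integers, whereas the paper (following [7]) approximates it with a few additions; the function name `base_extend` and the toy bases are ours.

```python
from math import prod

def base_extend(residues, basis_a, basis_b):
    """Residues of x in basis_b from its residues in basis_a (0 <= x < A)."""
    A = prod(basis_a)
    # xi_i = [x_i * A_i^{-1}]_{a_i}
    xi = [(x_i * pow(A // a_i, -1, a_i)) % a_i
          for x_i, a_i in zip(residues, basis_a)]
    # alpha = floor(sum_i xi_i / a_i), evaluated exactly as floor(sum_i xi_i*A_i / A)
    alpha = sum(v * (A // a_i) for v, a_i in zip(xi, basis_a)) // A
    out = []
    for b_j in basis_b:
        # x*_{b_j} uses the constants [A_i]_{b_j}, which would be precomputed
        x_star = sum(v * ((A // a_i) % b_j) for v, a_i in zip(xi, basis_a)) % b_j
        out.append((x_star - alpha * (A % b_j)) % b_j)
    return out
```

In a real implementation the constants `(A // a_i) % b_j` and `A % b_j` are stored, so the inner loop costs one multiplication per pair $(i, j)$.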
The MM-RNS algorithm. Following [7] we inject in Algorithm 2 the formulas (4), (5) and (7) corresponding to the computations of the base extensions. We obtain the Montgomery multiplication in RNS (MM-RNS) shown in Algorithm 3 after some modifications. Specifically, the base extension of $q$ and the computation of $z$ are merged, and in the second base extension $BE_{\mathcal{B} \to \mathcal{A}}$ we rewrite $[B_j]_{a_i} = [b_j^{-1} B]_{a_i}$.
The complexity of each step of the MM-RNS algorithm is given in terms of the number of additions and multiplications modulo $a_i$ or $b_i$. These complexities are detailed in Table 1. For the computation of $\alpha$ and $\beta$ we assume that each $a_i$ and $b_i$ can be approximated by $2^w$, which simplifies the computations in Step 6 and Step 10 into a sequence of additions (cf. [7] for further details).
Constants used in MM-RNS. In Algorithm 3, a large number of constants take part in the computations.

Algorithm 3 MM-RNS($x, y, \mathcal{A}, \mathcal{B}$)
Require: $x, y$ in $\mathcal{A} \cup \mathcal{B}$ for two RNS bases $\mathcal{A} = \{a_1, \ldots, a_t\}$ and $\mathcal{B} = \{b_1, \ldots, b_t\}$ s.t. [...]

Leak resistant arithmetic in RNS
The authors in [1] notice that the use of RNS facilitates the randomization of the representation of an integer and consequently the randomization of a modular multiplication. Indeed, if a modular exponentiation $x^K \bmod N$ is computed with MM-RNS, the element $x$ is set in Montgomery representation and in the RNS bases $\mathcal{A}$ and $\mathcal{B}$, i.e., $[\tilde{x}]_{\mathcal{A} \cup \mathcal{B}}$. The Montgomery representation induces a multiplicative masking of the data $x$ by the factor $A$. The authors in [1] propose to randomly construct the basis $\mathcal{A}$ to get a random multiplicative mask $A$ on the data. Specifically, they propose two levels of such randomization: a random initialization of the bases $\mathcal{A}$ and $\mathcal{B}$ at the very beginning of a modular exponentiation, and random permutations of the RNS bases $\mathcal{A}$ and $\mathcal{B}$ all along the modular exponentiation.
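Random initialization of the bases $\mathcal{A}$ and $\mathcal{B}$ and $x$. We assume that we have a set of $2t$ moduli $\mathcal{M} = \{m_1, \ldots, m_{2t}\}$. At the beginning of the computations we randomly set $\mathcal{A}$ to $t$ moduli of $\mathcal{M}$ and $\mathcal{B}$ to the $t$ remaining ones. Note that we always have $\mathcal{A} \cup \mathcal{B} = \mathcal{M}$. Then the input $x$ of the modular exponentiation algorithm is first set in the residue number system $\mathcal{M} = \mathcal{A} \cup \mathcal{B}$ by reducing $x$ modulo each $a_i$ and $b_i$. Then we need to compute the Montgomery representation $[\tilde{x}]_{\mathcal{A} \cup \mathcal{B}}$ from $[x]_{\mathcal{A} \cup \mathcal{B}}$. The authors in [1] give a method which simplifies this computation. They assume that the RNS representation $[M \bmod N]_{\mathcal{M}}$, where $M$ is the product of all the moduli, is precomputed, and they obtain $[\tilde{x}]_{\mathcal{A} \cup \mathcal{B}}$ by a single MM-RNS with bases $\mathcal{B}$ and $\mathcal{A}$ in reverse order:
$$[\tilde{x}]_{\mathcal{A} \cup \mathcal{B}} = \text{MM-RNS}([x]_{\mathcal{A} \cup \mathcal{B}}, [M \bmod N]_{\mathcal{A} \cup \mathcal{B}}, \mathcal{B}, \mathcal{A}).$$
The output of this multiplication is the expected value: $x \times (A \times B) \times B^{-1} \bmod N = xA \bmod N$.
Random change of the bases $\mathcal{A}$ and $\mathcal{B}$. The authors in [1] propose to change the bases $\mathcal{A}$ and $\mathcal{B}$ during the RSA exponentiation. Since the bases $\mathcal{A}$ and $\mathcal{B}$ change all along the exponentiation, the base extension (BE) in MM-RNS has to be performed using the approach of Garner [5].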
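The randomization of [1] recalled in this section can be followed on plain integers. The sketch below is an integer-level view only (the actual method works on RNS residues with MM-RNS): `mont` stands for a Montgomery product with reduction factor `R`, and all function names are ours. `change_bases` mirrors the two multiplications of Algorithm 4.

```python
import random
from math import prod

def mont(x, y, N, R):
    """Montgomery product: congruent to x*y*R^{-1} mod N (not fully reduced)."""
    q = (-x * y * pow(N, -1, R)) % R
    return (x * y + q * N) // R

def random_split(moduli, rng):
    """Randomly split the 2t moduli of M into two bases of t moduli each."""
    shuffled = list(moduli)
    rng.shuffle(shuffled)
    t = len(shuffled) // 2
    return shuffled[:t], shuffled[t:]

def to_montgomery(x, N, basis_a, basis_b):
    """One product by (M mod N) reduced by B gives x*A mod N."""
    M_mod_N = (prod(basis_a) * prod(basis_b)) % N
    return mont(x, M_mod_N, N, prod(basis_b))

def change_bases(x_old, N, old_a, new_a, new_b):
    """Two Montgomery products move the mask from A_old to A_new."""
    M_mod_N = (prod(new_a) * prod(new_b)) % N
    temp = mont(x_old, M_mod_N, N, prod(new_b))   # x * A_old * A_new mod N
    return mont(temp, 1, N, prod(old_a))          # x * A_new mod N
```

A deployed implementation would draw the permutation from a cryptographically secure source (for instance `secrets.SystemRandom()`), not from a seeded `random.Random` as one might use for testing.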

Random update of the RNS bases with a set of spare moduli
In this section, our goal is to provide a cheaper variant of the leak resistant arithmetic in RNS proposed in [1] and reviewed in Section 3.

Proposed update of the bases and Montgomery representation
We present a first strategy which modifies only one modulus in $\mathcal{A}$ while keeping $\mathcal{B}$ unchanged during each update of the RNS bases. We need an additional set $\mathcal{A}'$ of spare moduli from which we randomly pick the new modulus for $\mathcal{A}$. We have three sets of moduli:
• The first RNS basis $\mathcal{A} = \{a_1, \ldots, a_{t+1}\}$, which is modified after each loop iteration.
• The set $\mathcal{A}' = \{a'_1, \ldots, a'_{t+1}\}$ of spare moduli.
• The second RNS basis $\mathcal{B} = \{b_1, \ldots, b_{t+1}\}$, which is fixed at the beginning of the exponentiation.
The integers $a_i$, $b_i$ and $a'_i$ are all pairwise coprime and are all of the form $2^w - \mu_i$, where $w$ is the same for all moduli and $\mu_i < 2^{w/2}$. We will state later in Subsection 4.2 how large $A$ and $B$ have to be compared to $N$ to render the proposed approach effective. To give an insight, $A$ and $B$ are roughly $w$ bits larger than $N$, which means that the considered RNS bases contain $t + 1$ moduli.
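Update of the base $\mathcal{A}$. Updating the basis $\mathcal{A}$ is quite simple: we just swap one element of $\mathcal{A}$ with one element of $\mathcal{A}'$ as follows
$$a_{r,\mathrm{new}} = a'_{r',\mathrm{old}}, \qquad a'_{r',\mathrm{new}} = a_{r,\mathrm{old}}.$$
In the sequel we will denote by $\mathcal{A}_{\mathrm{old}}$ and $\mathcal{A}_{\mathrm{new}}$ the base $\mathcal{A}$ before and after the update, and we will use similar notation for the other updated data.
Lemma 1. We consider two RNS bases $\mathcal{A}$ and $\mathcal{B}$ and let $\mathcal{A}'$ be the set of spare moduli. We consider an integer $x$ modulo $N$ given by its RNS-Montgomery representation $\tilde{x}_{\mathrm{old}} = [(xA_{\mathrm{old}} \bmod N)]_{\mathcal{A}_{\mathrm{old}} \cup \mathcal{B}}$.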

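and satisfies $\tilde{x}_{\mathrm{new}} \equiv xA_{\mathrm{new}} \bmod N$.
Proof. We first notice that $s_1 = \tilde{x}_{\mathrm{old}} + \lambda N$ satisfies $s_1 \equiv \tilde{x}_{\mathrm{old}} \bmod N$ and that $s_1 \equiv 0 \bmod a_{r,\mathrm{old}}$. In other words $s_1$ can be divided by $a_{r,\mathrm{old}}$ and then multiplied by $a_{r,\mathrm{new}}$:
$$\tilde{x}_{\mathrm{new}} = \big((\tilde{x}_{\mathrm{old}} + \lambda N)/a_{r,\mathrm{old}}\big)\, a_{r,\mathrm{new}},$$
which satisfies
$$\tilde{x}_{\mathrm{new}} \equiv xA_{\mathrm{old}} \times a_{r,\mathrm{old}}^{-1} \times a_{r,\mathrm{new}} \equiv xA_{\mathrm{new}} \bmod N.$$
The value of $\tilde{x}_{\mathrm{new}}$ is computed in the RNS basis $\mathcal{A}_{\mathrm{new}} \cup \mathcal{B}$ by replacing the division by $a_{r,\mathrm{old}}$ by a multiplication by $a_{r,\mathrm{old}}^{-1}$ and by noticing that its value modulo $a_{r,\mathrm{new}}$ is equal to 0. This leads to (11).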
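The update used in the proof can be checked on plain integers. The sketch below assumes that $\lambda$ is the residue $[-\tilde{x}_{\mathrm{old}} N^{-1}]_{a_{r,\mathrm{old}}}$, which is what makes $s_1$ divisible by $a_{r,\mathrm{old}}$; this definition is inferred from the proof, and the function name is ours. The paper performs the same computation residue by residue in $\mathcal{A}_{\mathrm{new}} \cup \mathcal{B}$.

```python
def update_montgomery_repr(x_old, N, a_r_old, a_r_new):
    """Move the Montgomery mask from A_old to A_new = A_old / a_r_old * a_r_new."""
    # Assumed definition of lambda: it cancels x_old modulo a_r_old
    lam = (-x_old * pow(N, -1, a_r_old)) % a_r_old
    s1 = x_old + lam * N              # s1 = x_old mod N and s1 = 0 mod a_r_old
    return (s1 // a_r_old) * a_r_new  # exact division, then new modulus as factor
```

The result is a multiple of `a_r_new`, which is why its residue modulo the new modulus is simply 0 in the RNS formulation.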
Update of the constants. If we want to apply MM-RNS (Algorithm 3) after the update of the basis $\mathcal{A}$ and of the Montgomery-RNS representation of $x$, we also need to update the constants involved in Algorithm 3. The constants considered are the ones listed in (8) along with an additional set of constants associated with the set of moduli $\mathcal{A}'$, for $i = 1, \ldots, t+1$. These constants are updated as follows:
• Constant $N$. The constants $[N]_{b_i}$, $i = 1, \ldots, t + 1$, do not change when the base $\mathcal{A}$ is updated.
• [...] $_{r,\mathrm{new}} \times a_{r,\mathrm{old}}]_{b_i}$ for $i \ne r$, and the two remaining special cases are: [...] $\times\, a_{r,\mathrm{new}}^{-1}]_{a_{r,\mathrm{old}}}$ (note that $a_{r,\mathrm{old}} = a'_{r',\mathrm{new}}$).
• The constants which evolve are the ones corresponding to $a_r$ and $a'_{r'}$ and require only swaps: [...] for $j = 1, \ldots, t + 1$ and $j \ne r$, swap($[a_j^{-1}]_{a_r}$, $[a_j^{-1}]_{a'_{r'}}$) for $j = 1, \ldots, t + 1$ and $j \ne r'$.
Complexity of the updates. We evaluate the complexity of the above random change of the basis $\mathcal{A}$: the update of $\tilde{x}$ and the update of the constants. We do not consider swap operations since they do not require any computation. The update of the Montgomery representation $\tilde{x}$ of $x$ costs $6t + 4$ multiplications and $2t + 1$ additions, and the update of the constants costs $6t + 2$ multiplications.
Now, we establish that the above algorithm correctly outputs the expected result $x^K \bmod N$. Indeed, during the execution of the algorithm an overflow could occur: some data could become larger than $A$ or $B$. To show that no overflow occurs we first establish the growth factor produced by an update of the Montgomery representation.
Lemma 2. Let $\mathcal{A}_{\mathrm{old}}, \mathcal{A}'_{\mathrm{old}}, \mathcal{B}_{\mathrm{old}}, \mathcal{A}_{\mathrm{new}}, \mathcal{A}'_{\mathrm{new}}$ and $\mathcal{B}_{\mathrm{new}}$ be the old and the new RNS bases. Let $a_{i_{\max},\mathrm{old}}$ be the largest modulus in $\mathcal{A}_{\mathrm{old}}$ and $a_{i_{\max},\mathrm{new}}$ the largest modulus in $\mathcal{A}_{\mathrm{new}}$. Assume that $\tilde{x}_{\mathrm{old}} < N a_{i_{\max},\mathrm{old}}$ and let $a_r$ and $a'_{r'}$ be the two moduli swapped in $\mathcal{A}$ and $\mathcal{A}'$. Then we have
$$\tilde{x}_{\mathrm{new}} < 4N a_{i_{\max},\mathrm{new}}.$$
Proof. From Lemma 1 we have the following expression of $\tilde{x}_{\mathrm{new}}$:
$$\tilde{x}_{\mathrm{new}} = \big((\tilde{x}_{\mathrm{old}} + \lambda N)/a_{r,\mathrm{old}}\big)\, a_{r,\mathrm{new}}. \qquad (12)$$
We then notice that $\lambda < a_{i_{\max},\mathrm{old}}$. We use the fact that $\tilde{x}_{\mathrm{old}} < N a_{i_{\max},\mathrm{old}}$ and we expand the product in (12); this gives:
$$\tilde{x}_{\mathrm{new}} < \frac{N a_{i_{\max},\mathrm{old}} + a_{i_{\max},\mathrm{old}} N}{a_{r,\mathrm{old}}}\, a_{r,\mathrm{new}} = 2N\, \frac{a_{i_{\max},\mathrm{old}}}{a_{r,\mathrm{old}}}\, a_{r,\mathrm{new}}. \qquad (13)$$
We then use that $a_i = 2^w - \mu_i$ with $0 \le \mu_i < 2^{w/2}$, which implies that for any $i, j$ we have $0 < a_i/a_j < 2$. In particular for $i = i_{\max}$ and $j = r$ we have $0 < a_{i_{\max},\mathrm{old}}/a_r < 2$. We use this to arrange (13) as follows:
$$\tilde{x}_{\mathrm{new}} < 4N a_{r,\mathrm{new}} \le 4N a_{i_{\max},\mathrm{new}}.$$
Knowing the growth factor induced by the update of the Montgomery representation helps us to state a sufficient condition to prevent an overflow in Algorithm 5.
Lemma 3.
Let $A_{\min}$ be the product of the $t$ smallest moduli in $\mathcal{A} \cup \mathcal{A}'$. Let $a_{i_{\max}}$ be the largest modulus of $\mathcal{A}$. If $N$ satisfies [...] and if $B$ is larger than any $A$ then the following assertions hold: i) The data $r_0$ and $r_1$ in Algorithm 5 are $< N a_{i_{\max}}$ at the end of each loop. ii) Algorithm 5 correctly computes $r_0 = x^K \bmod N$.
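Proof. i) Let us prove that an update of the base $\mathcal{A}$ followed by a modular multiplication with MM-RNS keeps the data in the interval $[0, N a_{i_{\max}}]$. We consider $\tilde{x}_{\mathrm{old}} < N a_{i_{\max},\mathrm{old}}$ and $\tilde{y}_{\mathrm{old}} < N a_{i_{\max},\mathrm{old}}$. Then, from Lemma 2, we know that the updates on $\tilde{x}_{\mathrm{old}}$ and $\tilde{y}_{\mathrm{old}}$ provide $\tilde{x}_{\mathrm{new}} < 4N a_{i_{\max},\mathrm{new}}$ and $\tilde{y}_{\mathrm{new}} < 4N a_{i_{\max},\mathrm{new}}$. Bounding the output $z$ of MM-RNS under the hypothesis on $N$ then gives $z < a_{i_{\max},\mathrm{new}} N$, as required. ii) At the beginning of each loop $r_0$ and $r_1$ are in $[0, N a_{i_{\max}}]$; then, from i), they are in $[0, N a_{i_{\max}}]$ at the end of the loop. Consequently all the computations in the algorithm are done without overflow, and the algorithm correctly outputs $r_0 = x^K \bmod N$.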

Complexity comparison and conclusion
In Appendix A we present a variant of the proposed randomization. This variant avoids the use of the set of spare moduli $\mathcal{A}'$: the modified modulus in $\mathcal{A}$ is randomly picked in $\mathcal{B}$. The complexity of the update of the RNS bases $\mathcal{A}$, $\mathcal{B}$ and of the update of the Montgomery representation is slightly larger compared to the approach of Section 4, but the memory requirement is reduced and the number of moduli is also reduced.
In Table 3 we report the complexity of the randomization in the Montgomery-ladder exponentiation for the two following cases:
1. Only one modulus is modified in the basis $\mathcal{A}$. In this case, for each loop turn, the proposed approach in Section 4 and Appendix A requires an update of the constants and an update of $r_0$ and $r_1$ as shown in Algorithm 5.
2. $s$ moduli are modified in $\mathcal{A}$. At each loop turn, we perform $s$ consecutive updates of the RNS bases $\mathcal{A}$, $\mathcal{A}'$ and of the data following the strategy of Section 4: this requires $s$ updates of the constants and $s$ updates of $r_0$ and $r_1$. In this case, since each update multiplies the bound on $r_i$ by 4 (cf. Lemma 2), at the end of the $s$ updates the two data $r_0$ and $r_1$ are multiplied by $4^s$. This requires expanding the three bases $\mathcal{A}$, $\mathcal{A}'$ and $\mathcal{B}$ with an additional modulus, assuming that $2 \times 4^{2s} < 2^w$, in order to prevent an overflow in Algorithm 5. The resulting complexity of this randomization is given in Table 3.
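The condition $2 \times 4^{2s} < 2^w$ limits how many moduli can be changed per loop turn for a given word size $w$. A small helper (the name `max_moduli_changes` is ours) makes this limit explicit:

```python
def max_moduli_changes(w):
    """Largest s such that 2 * 4**(2*s) < 2**w."""
    s = 0
    while 2 * 4 ** (2 * (s + 1)) < 2 ** w:
        s += 1
    return s
```

Since $2 \times 4^{2s} = 2^{4s+1}$, the condition is equivalent to $4s + 1 < w$, so for 32-bit moduli at most 7 moduli can be swapped per loop turn under this assumption.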
For comparison purposes we provide in Table 3 the complexity when the randomization of [1] is performed at each loop turn in a Montgomery ladder. This complexity can be easily deduced from the complexity results of Section 3. The above complexities show that we get a cheaper randomization by changing only one modulus, at the cost of a lower level of randomization. We can increase this level by changing more than one modulus at each loop turn, resulting in a trade-off between randomization and complexity. For the average randomization of $s = t/2$ moduli changed per loop turn, our method requires $6t^2 + O(t)$ multiplications and $2t^2 + 3t$ additions: this is better than the complexity of [1]. Another advantage of our technique is that it works in the cox-rower architecture [7], which is the most popular architecture for RNS implementation.
Posch and Posch modify the second step of the Montgomery multiplication (1) as $z \leftarrow (xy + qN)A^{-1} \bmod B$, where $B$ is an integer coprime with $A$ and $N$ and greater than $2N$. Furthermore, they propose to perform this modified version of the Montgomery multiplication in RNS. Specifically, they choose two RNS bases $\mathcal{A} = (a_1, \ldots, a_t)$ and $\mathcal{B} = (b_1, \ldots, b_t)$ such that $\gcd(a_i, b_j) = 1$ for all $i, j$. They perform $z = xyA^{-1} \bmod N$ as shown in Algorithm 2: the multiplications modulo $A$ are done in the RNS basis $\mathcal{A}$ and the operations modulo $B$ are done in $\mathcal{B}$.
Only the constants $[B^{-1}]_{a_i}$, $[B_i^{-1}]_{b_i}$, $[A]_{b_i}$ and $[A_i^{-1}]_{a_i}$ are susceptible to change and to be updated during the run of a modular exponentiation if the bases $\mathcal{A}$ and $\mathcal{B}$ are modified.
Update of the Montgomery-RNS representation. The modification of the basis requires at the same time the corresponding update of the Montgomery representation of $x$. Indeed we need to compute $\tilde{x}_{\mathrm{new}} = [(xA_{\mathrm{new}} \bmod N)]_{\mathcal{A}_{\mathrm{new}} \cup \mathcal{B}}$ from its old Montgomery representation $\tilde{x}_{\mathrm{old}} = [(xA_{\mathrm{old}} \bmod N)]_{\mathcal{A}_{\mathrm{old}} \cup \mathcal{B}}$. The following lemma establishes how to perform this update.
[...] $_{a_i}$ and $[B]_{a'_i}$. The constants $[B_i]_{b_i}$, $[B]_{a_i}$ for $i \ne r$ and $[B]_{a'_i}$ for $i \ne r'$ are not affected by the modification on $\mathcal{A}$. The only required modification is the following swap: swap($[B]_{a_r}$, $[B]_{a'_{r'}}$).

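Table 1. Complexity of MM-RNS

The approach of Garner [5] is used for the base extension instead of the CRT formula: otherwise the constants $A_i$ and $B_i$ would have to be updated, which can be expensive. The update of the bases $\mathcal{A}$ and $\mathcal{B}$ also implies updating the Montgomery representation $\tilde{x} = x \times A_{\mathrm{old}} \bmod N$ of $x$ from the old bases $\mathcal{A}_{\mathrm{old}} \cup \mathcal{B}_{\mathrm{old}}$ to the new representation $\tilde{x} = x \times A_{\mathrm{new}} \bmod N$ in the new bases $\mathcal{A}_{\mathrm{new}} \cup \mathcal{B}_{\mathrm{new}}$. The approach proposed in [1] consists of two modular multiplications (cf. Algorithm 4).

Algorithm 4 Update of $\tilde{x}$
Require: $\tilde{x}_{\mathrm{old}}$ and $\mathcal{A}_{\mathrm{new}}, \mathcal{B}_{\mathrm{new}}, \mathcal{A}_{\mathrm{old}}, \mathcal{B}_{\mathrm{old}}$ and $[M \bmod N]_{\mathcal{M}}$
Ensure: $\tilde{x}_{\mathrm{new}}$
1: temp ← MM-RNS($\tilde{x}_{\mathrm{old}}$, $(M \bmod N)$, $\mathcal{B}_{\mathrm{new}}$, $\mathcal{A}_{\mathrm{new}}$)
2: $\tilde{x}_{\mathrm{new}}$ ← MM-RNS(temp, 1, $\mathcal{A}_{\mathrm{old}}$, $\mathcal{B}_{\mathrm{old}}$)

We can easily check the validity of Algorithm 4: Step 1 computes temp $= (xA_{\mathrm{old}}) \times (A_{\mathrm{new}} \times B_{\mathrm{new}}) \times B_{\mathrm{new}}^{-1} \bmod N$ and Step 2 correctly computes $\tilde{x}_{\mathrm{new}} = (xA_{\mathrm{old}}A_{\mathrm{new}}) \times A_{\mathrm{old}}^{-1} \bmod N = xA_{\mathrm{new}} \bmod N$ in the required RNS bases. The main drawback of this technique is that it is a bit costly: it requires two MM-RNS multiplications to perform the change of RNS representation. Consequently, using Table 1, we deduce that the amount of computation involved in this approach is as follows [...]

Let $r$ and $r'$ be two random integers in $\{1, \ldots, t, t + 1\}$ and $\mathcal{A}_{\mathrm{new}}$ and $\mathcal{A}'_{\mathrm{new}}$ be the two sets of moduli obtained after exchanging $a_{r,\mathrm{old}}$ and $a'_{r',\mathrm{old}}$. Then the new Montgomery-RNS representation of $x$ in $\mathcal{A}_{\mathrm{new}} \cup \mathcal{B}$ can be computed as follows: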

Table 2. Complexity of the updates when using a set of spare moduli

We present the modified version of the Montgomery-ladder: compared to the original Montgomery-ladder (Algorithm 1), this version inserts an update of the RNS bases and related constants along with an update of the data at the beginning of the loop iteration. This approach is shown in Algorithm 5. For the sake of simplicity, conversions between RNS and regular integer representation are skipped.
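[...] $\tilde{x}_{\mathrm{new}} < 4N a_{i_{\max},\mathrm{new}}$ and $\tilde{y}_{\mathrm{new}} < 4N a_{i_{\max},\mathrm{new}}$. If we execute an MM-RNS algorithm with inputs $\tilde{x}_{\mathrm{new}}$, $\tilde{y}_{\mathrm{new}}$ and bases $\mathcal{A}_{\mathrm{new}}$ and $\mathcal{B}$ we obtain a $z$ satisfying $z = (\tilde{x}_{\mathrm{new}} \times \tilde{y}_{\mathrm{new}} + qN)/A_{\mathrm{new}}$ [...] Since $A_{\mathrm{new}}/a_{i_{\max},\mathrm{new}}$ is the product of $t$ moduli of $\mathcal{A} \cup \mathcal{A}'$, it satisfies $A_{\min} \le A_{\mathrm{new}}/a_{i_{\max},\mathrm{new}}$.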

Table 3. Cost of the randomization in one loop iteration of the randomized Montgomery-ladder