# **RESEARCH ARTICLE**

**OPEN ACCESS** 

# **"Error Analysis and Detection Procedures for a Hardware Implementation of the Advanced Encryption Standard"**

# Rupashree Sahu, Subhajit Pasa

Gandhi Institute of Excellent Technocrats, Bhubaneswar, India Modern Institute of Technology and Management, Bhubaneswar, Odisha, India

# ABSTRACT-

In order to prevent the Advanced Encryption Standard (AES) from suffering from differential fault attacks, the technique of error detection can be adopted to detect the errors during encryption or decryption and then to provide the information for taking further action, such as interrupting the AES process or redoing the process. Because errors occur within a function, it is not easy to predict the output. Therefore, general error control codes are not suited for AES operations. In this work, several error-detection schemes have been proposed. These schemes are based on the ðn þ1;nÞ cyclic redundancy check (CRC) over GFð2<sup>8</sup>Þ, where n 2 f4;8;16g. Because of the good algebraic properties of AES, specifically the MixColumns operation, these error detection schemes are suitable for AES and efficient for the hardware implementation; they may be designed using round-level, operation-level, or algorithm-level detection. The proposed schemes have high fault coverage. In addition, the schemes proposed are scalable and symmetrical. The scalability makes these schemes suitable for an AES circuit implemented in 8-bit, 32-bit, or 128-bit architecture. Symmetry also benefits the implementation of the proposed schemes to achieve that the encryption process and the decryption process can share the same error detection hardware. These schemes are also suitable for encryption-only or decryption-only cases. Error detection for the key schedule in AES is also proposed and is based on the derived results in the data procedure of AES.

Index Terms—Advanced encryption standard, error control code, CRC, differential fault attacks.

## I. INTRODUCTION

т HE Advanced Encryption Standard (AES) [10], the successor to the Data Encryption Standard (DES), was finalized in October 2000 by the US National Institute of Standards and Technology (NIST), when the Rijndael algorithm [12] was adopted. The data block size of AES is 128-bit and the key size can be 128-bit, 192-bit, or 256-bit. In AES, although the data block is 128-bit, all operations are byte-oriented over GFð2Þ or GFð2<sup>8</sup>Þ. Therefore, several kinds of AES implementations have been discussed. In general, three main types of AES implementations have been discussed, 8-bit, 32-bit, or 128-bit architecture. Each architecture has its own applications. Feldhofer et al. [6] designed an 8-bit AES chip to provide security for radio frequency identification (RFID). Satoh et al. [13] introduced a 32-bit implementation of AES. Mangard et al. [9] proposed a scalable architecture for AES, which could process 128-bit data or 32-bit data, depending on the number of Sbox.

The hardware implementation of AES would be countered by some side-channel attacks, such as Differential Fault Attacks (DFA) or Differential Power Analysis (DPA). Differential fault attacks was originally proposed by Biham and Shamir [4]. Theses side-channel attacks actually threaten the security of several cryptosystems because they are practical for a

crypto module. The idea of DFA is to apply the differential attacks to a crypto module or a crypto chip. The cryptanalyst injects errors by using microwave or ionizing techniques during the encryption or decryption process. These errors cause the encryption results to differ from the correct results; hence, the cryptanalyst will receive the difference of outputs. Therefore, such differential attacks may be carried out in the real world. Dusart et al. [5] broke the 128-bit AES under the assumption that you can physically modify the hardware AES device. This attack required 34 pairs of differential inputs and outputs to obtain the final round key. Piret and Quisquater [11] broke AES with two erroneous ciphertext under the assumption that the errors occur between the antepenultimate and the penultimate MixColumns.

To avoid the possibility of suffering such attacks, error detection can be considered while implementing a cipher. In 2002, Karri et al. [7] proposed a general error detection method, called concurrent error detection (CED), for several symmetric block ciphers including RC6, MARS, Serpent, Twofish, and Rijndael. CED requires an inverse operation to check whether errors have occurred in calculations or not and has three levels: the operation level, the round level, and the algorithm level. Taking an operation-level CED in AES as an example, the InvSubBytes is required to detect the errors occurring in SubBytes and vice versa. This method has very high fault coverage, but it is timeconsuming and high hardware cost because inverse operations are required. In 2003, Karri et al. [8] proposed a paritybased detection technique for general substitution-permutation block ciphers. However, the size of the table, required by the substitution box, is enlarged. In addition, the paper did not address the error detection techniques for some specific functions, such as MixColumns in AES. In 2004, Wu et al. [14] applied the structure of [8] to AES and used

Published by the IEEE Computer Society

one-bit parity for a 128-bit data block. The method of Wu et al. [14] can let the parity pass through the MixColumns. Bertoni et al. [1] used an error detection code of 16bit parity for a 128-bit data block. To be precise, this approach uses one-bit parity for each byte and, thus, can detect all single errors and perhaps all odd errors. In [2], Bertoni et al. used the error detection scheme in [1] not only to detect errors but also to locate errors. In 2004, Bertoni et al. [3] implemented the model proposed in [2]. The introduction of the mode into AES brought the performance 18 percent overhead of area and 26 percent decreasing of throughput. According to the results given in [1], their approach was able to detect most cases of multiple faults. However, this approach asymmetrical, between MixColumns is and InvMixColumns, because the parity prediction of InvMixColumns is more complex than that of MixColumns. Therefore, two circuits are required to predict the parity while merging the encryption and the decryption. Besides, the detection technique for SubBytes doubled the table size of SubBytes in AES, from 256 to 512 bytes. In addition, it cannot be easily applied to an AES implementation of 8-bit architecture because the parity prediction of MixColumns (InvMixColumns) requires information from other bytes and other parities.

This work proposes several error-detection schemes for AES. They are based on the ðn þ 1;nÞcyclic redundancy check (CRC) over GFð2<sup>8</sup>Þ, where n 2 f4;8;16g is the number of bytes contained in the message. The proposed schemes easily predict the parity of an operation's output. Because AES is byteoriented and its constants are ingeniously designed, the parity of the output can be predicted from a linear combination of the parity of the input. In most cases,

where  $s_i$  denotes the ith byte of the data block. In this context, S denotes the input of an operation and T denotes the output. AES is operated in two fields,

the parity is the summation of the input data; also, the proposed schemes are highly scalable and are suitable for 8-bit, 32-bit, or 128-bit architecture. This is important because many AES designs are in an AES hardware designed as either 8-bit or 32-bit architecture. Another advantage of the proposed approaches is that the parity calculation between the encryption and the decryption is symmetric because the parity generation in encryption is quite similar to the one in decryption. This will bring some benefits while integrating encryption and decryption into one circuit.

This paper is organized as follows: In Section 2, the AES algorithm is briefly described and the notations used throughout are defined. In Section 3, our proposed error detection schemes for AES are described. Derivation of error detection for each operation, including SubBytes, ShiftRows, MixColumns, and AddRoundKey, is explained, as well as the design of the key schedule. The undetectable errors of each proposed method are theoretically analyzed in Section 4, while, in Section 5, the realization issues of three levels, operation level, round level, and algorithm level, are described. In Section 6, advantages and comparisons between this work and other research studies are discussed and, in Section 7, the detection capability of each scheme is simulated. Finally, our conclusions are offered in Section 8.

## AES ALGORITHM

The AES [10] consists of two parts, the data procedure and the key schedule. The data procedure is the main body of the encryption (decryption) and of consists four operations, (Inv)SubBytes, (Inv)ShiftRows, (Inv)MixColumns, and (Inv)AddRoundKey. During encryption, these four operations are executed in a specific order-AddRoundKey, a number of rounds, and then the final round. The number of rounds is 10, 12, or 14, respectively, for a key size of 128 bits, 192 bits, or 256 bits. Each round is comprised of the four operations and the final round has SubBytes, ShiftRows, and AddRoundKey. The decryption flow is simply the reverse of the encryption, and each operation is the inverse of the corresponding one in encryption. In the data procedure, the 16-byte (128bit) data block is rearranged as a 4 4 matrix, called state S.

ð1Þ

GF $\partial$ 2P and GF $\partial$ 2<sup>8</sup>P. In GF $\partial$ 2P, addition is denoted by , and multiplication is denoted by . Similarly, the two

symbols, b and , denote addition and multiplication in GFð2<sup>8</sup>Þ.

2.1 SubBytes

Two calculations, the GFð2<sup>8</sup>P inversion and the affine transformation, are involved in this operation. SubBytes substitutes each byte siof the data block by ð2Þ

 $t_{i}^{1}/_{4} As_{i}^{1} b 63$ :

wheres; <sup>1</sup> is the inverse of the input byte, s<sub>i</sub>2 GF $\partial$ <sup>8</sup>P, A is an 8 8 circulant matrix of a constant row vector

<sup>1</sup>/<sub>2</sub>1 0 0 0 1 1 1 1 1 1 1 1 over GFð2Þ, and 63 (the Courier font number representing a hexadecimal value in this paper) belongs to GFð2<sup>8</sup>Þ. As<sub>i</sub><sup>1</sup> is a matrix-vector multiplication over GFð2Þ.

2.2 ShiftRows

The ShiftRows operation only changes the byte position in the state. It rotates each row with different offsets to obtain a new state as follows:

2 s0 s4s8 s12 3 2 s0 s4 s8 s12 3 775ShiftRows!664ss105 664 ss12ss56ss109ss1314 ss149ss132ss16 775: ð3Þ

s3s7s11s15 s15

s3 s7 s11 The first row is unchanged, the second row is left circular shifted by one, the third row is by two, and the last row is by three.

2.3 MixColumns

The MixColumns operation mixes every consecutive four bytes of the state to obtain four new bytes as follows:



Fig. 1.The block diagram of key expansion in AES.

| 2 s0                  | 3 2                                      | 3                                         |
|-----------------------|------------------------------------------|-------------------------------------------|
| 6 <sup>6</sup> 4      | s12 t0                                   | $s^{s}_{s}^{13}14$ t12 t13                |
| s <sub>s</sub> 12     | s <sub>8</sub> s <sub>9</sub> 775MixColu | $mns^{1}664$ $t_{8}$ $t_{9}775_{1}$       |
| <b>S</b> <sub>3</sub> | $s_4 s_5 s_{10} t^{t 1} t^2$             | t <sub>4</sub> t <sub>5</sub> t10 t14 t15 |
|                       | s <sub>6</sub> s <sub>7</sub> s11 s15 t3 | t <sub>6</sub> t <sub>7</sub> t11 ð4Þ     |

Let s<sub>i</sub>, s<sub>ib1</sub>, s<sub>ib2</sub>, and s<sub>ib3</sub> represent every consecutive four bytes, where i 2 f0;4;8;12g. Then, the four bytes are transformed by

> 3 3  $t_i \quad 02 \quad 03 \quad 01 \quad 01 \quad s_i$ 664 ttiiþþ12 775 ¼ 664 01 02 03 0101 01 02 ð5b 03 757466 ssiiþþ12 775: tiþ3 03 01 01 02 siþ3

Each entry of the constant matrix in (5) belongs to GFð2<sup>8</sup>Þ, hence (5) is a matrix-vector multiplication over GFð2<sup>8</sup>Þ.

k<sub>i</sub>as (1); the AddRoundKey operation is simply an addition,

 $t_i^{1/4} s_i b k_i$ ; where 0 i 15:  $\delta b b$ 

2.4 AddRoundKeyand Key Expansion Each round has a 128-bit round key which is segmented into 16 bytes

The key expansion expands a unique private key as a key stream of ð4r þ 4Þ 32-bit words, where r is 10, 12,

www.ijera.com

--- some operations

or 14. The private key is segmented into Nk words according to the key length, where NK is 4, 6, or 8 for a 128-bit, 192-bit, or 256-bit cipherkey, respectively. AsFig. 1 shows, then, it generatest heith word (32 bits) by EXORing the ðiNkÞth word with either the ði1Þth word or the conditionally transformed ði1Þth word, where NK i ð4r þ 3Þ. The ði1Þth word is conditionally transformed by RotWord, SubBytes and EXORing with  $Rcon^{1}/2i=Nk^{1/4}$  $f02^{bi=Nkc}$ ;00;00;00g, where the polynomial presentation of  $02^{bi=Nkc}$  is  $x^{bi=Nkc}$  over  $GF\delta 2^{8}P$ . Finally, the key stream is segmented into several round keys which are involved in the AddRoundKey operation.



Fig. 2.The error model assumed in this work. The solid line part appears in every operation and the dotted line part appears in some operations.

## ERROR DETECTION TECHNIQUES

The parts in decryption can be yielded in a similar way; hence, the following context only addresses the error detection in encryption. The differential faults attacks need differential inputs and outputs to attack a cryptosystem; hence, it is assumed that the states and round keys are polluted by additive errors, as shown in Fig. 2. In this work, one operation is the smallest granule for designing error detection. In Fig. 2, the errors are assumed to be induced between the previous operation and the current operation. If the errors occur in the output of the previous operation, the erroneous input of the current operation will be treated as a different state. Actually, this situation only exists in the first round or in the first operation. The assumed error model is logical, even in the case where the errors occur during the operation. Because each operation of AES is invertible, one unique error block e would exist for an erroneous output T such that T 1/4 fðS þ eÞ, where f denotes any operation in AES.

This paper adopts a systematic ðn þ 1;nÞcyclic redundancy check (CRC) over GFð2<sup>8</sup>Þ to detect errors i<sup>1</sup>/<sub>4</sub>0

occurring duringencryption, wheren 2 f4;8;16gisthenumberofbytes contained in the message. The generator polynomial is

gðxÞ ¼ 1 þ x; ð7Þ

where the coefficients of (7) are over  $GF\delta 2^{8}P$ . Giving a message  $s\delta xPof$  degree n1, a systematic codeword, generated by  $g\delta xP$ , can be obtained from the following two steps:

1. Obtain the remainder  $p\delta x P from$  dividing  $xs\delta x P by$  the generator polynomial  $g\delta x P$ . The remainder  $p\delta x P is$  a scalar p here because the degree of  $g\delta x P is$  one.

2. Combine pðxÞand xsðxÞto obtain the codeword polynomial,

pðxÞ þ xsðxÞ ¼ p þ  $s_0x$  þ  $s_1x^2$  þ þ  $s_{n1}x^n$ ; ð8Þ  $_8$  wherep; $s_i2$  GFð2 Þ:

In Step 1, while gðxÞis 1 þ x, the remaining pðxÞis the summation of all coefficients of the message,

 $p\partial x P \frac{1}{4}$  s<sub>i</sub>:  $\partial 9 P$ 



Xn

Fig. 3. The block diagram of the error detection in this paper.

Therefore, the parity of a message may be obtained by calculating the summation of the input message over

GF $\delta 2^{8}$ P. Assume that the received polynomial t $\delta x$ P is t $\delta x P$  i $_{4}$  t<sub>0</sub> b t<sub>1</sub>x b t<sub>2</sub>x<sup>2</sup> b bt<sub>n</sub>x<sup>n</sup>;t<sub>1</sub>2 GF $\delta 2^{8}$ P:  $\delta 10$ P

The detection scheme checks whether the syndrome equals zero or not, where syndrome u is Xn

u ¼ t<sub>i</sub>: ð11Þ

i¼0

If the syndrome equals zero, then it is assumed that no errors have occurred; otherwise, errors did occur.

In the channel coding field, it is assumed that the message sðxÞis transmitted over a noisy channel. The channel does not modify the message if no errors occur. Therefore, it is easy to predict that  $t_0$  is identical to p, with  $t_0$  being used to detect the errors. However, as shown in Fig. 3, the message, S <sup>1</sup>/<sub>4</sub> fs<sub>0</sub>;s<sub>1</sub>;...;s<sub>n1</sub>g, is transformed into another message, ft<sub>1</sub>;t<sub>2</sub>;...;t<sub>n</sub>g, by an AES operation; hence,  $t_0$  cannot be obtained instinctively. Therefore, this paper investigates the function, predicting  $t_0$  from p as shown in Fig. 3, for each operation to make error detection possible in AES.

This work applies an ðn þ 1;nPCRC to AES, where n 2 f4;8;16g. In the case where, n ¼ 16, a 128bit AES state is treated as a message; hence, only one parity is generated for a 128-bit data block. When n <sup>1</sup>/<sub>4</sub> 4, the error detection is designed to check each column of the output state. In other words, four 4-byte column vectors in an AES state,

 $ft_{4jb1};t_{4jb2};t_{4jb3};t_{4jb4}g, 0 j 3$ , are checked separately. Therefore, four parities are required for a 128-bit data block when n <sup>1</sup>/<sub>4</sub> 4. For n <sup>1</sup>/<sub>4</sub> 8, two parities are required for a

128-bit data block. The following context addresses the two cases, n  $\frac{1}{4}$  16 and n  $\frac{1}{4}$  4, because the  $\frac{3}{8}$ PCRC for the AES algorithm can be constructed under similar conditions to the  $\frac{3}{17}$ ;16P or  $\frac{35}{4}$ PCRC for AES.

#### 3.1 In SubBytes

In this paper, two implementation types of SubBytes are considered. The first type uses one table instead of the  $GF\delta2^{8}P$  inversion and the affine transformation. The second type separately calculates the  $GF\delta2^{8}P$  inversion and the affine transformation and the implementation of the  $GF\delta2^{8}P$  inversion is not limited to the look-up-table method or the combinational logical circuit. In this paper, the first



Fig. 4.The error detection for united SubBytes.

type is named united SubBytesand the second type is separated SubBytes.

For united SubBytes, it is assumed that both the Sub Bytes circuit and the InvSubBytes circuit are implemented in a chip. Error detection is achieved by feeding the output of SubBytes into InvSubBytes, then comparing the input of SubBytes and the output of InvSubBytes, and vice versa, as Fig. 4 shows. If both are identical, then it is concluded that no errors have occurred. Otherwise, the errors did occur. This error detection method may be time-consuming, if only the SubBytes operation is considered. However, in practical terms, normal encryption could be further processed, without waiting for the error detection result, because SubBytes is either the first operation or the second operation in each round. In other words, the operation after SubBytes, such as ShiftRows, MixColumns, or AddRoundKey, may continue, when the output of the round would be intercepted if errors are detected in SubBytes.

If separated SubBytesis adopted, error detection must be applied separately to the GFð2<sup>8</sup>P inversion and the affine transformation. Considering the error detection for the GFð2<sup>8</sup>P inversion first, there are two schemes proposed herein. Similarly to Fig. 4, the first scheme detects errors by using the relationship of the mutual inverse. However, the computation of the GFð2<sup>8</sup>P inversion is identical for both SubBytes and InvSubBytes; hence, this scheme does not require the encryption and decryption circuits to simultaneously exist in one chip. It can be used with the encryption-only or decryption-only hardware.

The second scheme is the  $\delta n \not\models 1$ ;nPCRC and assumes that the GF $\delta 2^8 P$  inversion is implemented in look-up-table approach. Instead of the inverse value of a giving input, the exclusive value of the giving input and its inverse is stored in the table. Therefore, giving an input 2 GF $\delta 2^8 P$ , the value, <sup>1</sup>/<sub>4</sub>  $\not\models$ <sup>1</sup>, is obtained from the table and then the input is added to to yield <sup>1</sup>, as the marked block in Fig. 5. The error is detected by the syndrome obtained by the dashed line in Fig. 5. In this diagram, no errors are introduced, hence the syndrome is zero.

For one GF $\partial 2^{8}$ P inversion, according to Fig. 3 and the error model given in Fig. 2, the errors induce a fault at the input of the GF $\partial 2^{8}$ P inversion, as shown in Fig. 6. Suppose that the byte s<sub>i</sub> is changed into another byte s<sup>0</sup><sub>i</sub> by adding the error e<sub>0</sub>. Then, the syndrome used to detect errors is calculated as  $\partial$ sib e1P b tib1 b  $\partial$  tib1 b tib11P <sup>1</sup>/<sub>4</sub> e0 b e1:  $\partial$ 12P

The one-byte structure of Fig. 5 could be Taking the 16-byte extended to the 4-byte, 8-byte, or 16-byte structure.





Ρ

S  $\frac{1}{4}$  fs<sub>0</sub>;s<sub>1</sub>;...;s<sub>15</sub>g and then the parity p is  $\frac{15}{i\frac{1}{40}}$  s<sub>i</sub>from (9). According to (12) and Fig. 3, the parity of the output parity t<sub>0</sub> could be predicted by

 $\begin{array}{ccc} & X15 & X15 \\ sib & \delta tib1 b tib11b; & \delta 13b \\ i^{1}\!40 & i^{1}\!40 \\ and the syndrome is \\ X15 \\ t0 b & tib1; \end{array}$ 

 $i\frac{1}{40}$ X15 X15  $\delta 14P$ ) tip1 p p $\delta tip1$  p tip11P:  $i\frac{1}{40}$   $i\frac{1}{40}$ 

If no errors have occurred, the value  $t_{ib}^{-1}$  will equal  $s_i$ . Therefore, the syndrome (14) is zero.

In this paper, all ShiftRows, MixColumns, and AddRoundKey are protected by error detection code. However, the detection technique of SubBytes is varied with its implementation. According to the error detection scheme for SubBytes, three proposed

architectures for AES are denoted by united-SubBytes detection (USBD, hybridSubBytes detection (HSBD), and parity-based-SubBytesdetection(PbSBD), as shown in Fig. 7.



Fig. 6. An error is injected into the input state after entering the GFð2<sup>8</sup>P inversion.

3



Fig. 7. The three proposed architectures for AES.

For the affine transformation, error detection is achieved by the  $\partial n \not = 1; n P C R C$ , where n 2 f4;8;16g. Considering n <sup>1</sup>/<sub>4</sub> 16 first, and according to (9), the parity p of an input state, S <sup>1</sup>/<sub>4</sub> fs<sub>0</sub>;s<sub>1</sub>;...;s<sub>15</sub>g, where s<sub>i</sub><sup>2</sup> GF $\partial 2^8 P$ , is generated by

X15 p<sup>1</sup>/<sub>4</sub> s<sub>i</sub>: ð15Þ i<sup>1</sup>/<sub>4</sub>0

The output state is denoted as T <sup>1</sup>/<sub>4</sub> ft<sub>0</sub>;t<sub>1</sub>;...;t<sub>16</sub>g. From (2) and Fig. 3,  $t_{ib1}$  is As<sub>i</sub>b 63, where 0 i15. The hexadecimal constant 63 will be eliminated after taking summation of the output state Tnt<sub>0</sub>, i.e.,

Therefore,  $t_0$  can be predicted by (16) with input parity p. If no errors occur, the syndrome u must be zero, X16

 $u \frac{1}{4}$   $t_i \frac{1}{4} 0$ : ð17Þ

i¼0

In the case of  $\delta 5;4$  DCRC or  $\delta 9;8$  DCRC, (16) also holds.

3.2 In ShiftRows

From (3), the ShiftRows operation simply rotates the input state S, but does not alter the value of  $s_i$ . Therefore,  $t_0$  may be directly predicted by  $P^n_{i140} s_i$  in the case of n <sup>1</sup>/<sub>4</sub> 16. Similarly, the ShiftRows operation is error free if the syndrome is zero X16

t<sub>i</sub><sup>1</sup>/<sub>4</sub> 0: ð18Þ

When n  $\frac{1}{4}$  4, because each column of the output state would be detected, the four parities  $p_j$ , where 0 j3, are p0  $\frac{1}{4}$  s0  $\frac{1}{5}$  s5  $\frac{1}{5}$  s10  $\frac{1}{5}$  s15; p1  $\frac{1}{4}$  s4  $\frac{1}{5}$  s9  $\frac{1}{5}$  s14  $\frac{1}{5}$  s3; p2  $\frac{1}{4}$  s8  $\frac{1}{5}$  s13  $\frac{1}{5}$  s2  $\frac{1}{5}$  s7; p3  $\frac{1}{4}$  s12  $\frac{1}{5}$  s1  $\frac{1}{5}$  s6  $\frac{1}{5}$  s11;

hence,thet<sub>j;0</sub> for eachoutputmessageft<sub>4jp1</sub>;t<sub>4jp2</sub>;t<sub>4jp3</sub>;t<sub>4jp4</sub>g is p<sub>j</sub>. The case of n <sup>1</sup>/<sub>4</sub> 8 is analogous to the case of n <sup>1</sup>/<sub>4</sub> 4.

#### 3.3 In MixColumns

The behavior of the MixColumns operation is more complex because each byte in the input state S influences four bytes in the output state T. However, because of the ingenious design of the matrix coefficients, it is also possible to apply the ðn þ 1;nPCRC directly, where n 2 f4;8;16g. The MixColumns operation works as follows:

2 t4jþ1 3 2 02 03 01 01 3 2 s4j

664 tt44jjþþ23 775 ¼ 466 01 02 03 0101 01 02 03 577 664 ss44jjþþ12 577;where 0 j 3:

|fflfflfflfflfflfflfflfflfjb0 4 ffl} 03 01 01 02 |fflfflfflfflfflfflfflfflfflfflfflfflfs4jb0 3

Т S ð19Þ

From (19), it is yielded that the summation of vector  $T^0$  equals that of vector  $S^0$ .

X3

 $t_{4j} \flat_k \flat_1 \sqrt[1]{4} \ \eth 02 \ \flat \ 01 \ \flat \ 01 \ \flat \ 03 \ \flat s_{4j} \flat$ 

k1/40

ð03 þ 02 þ 01 þ 01Þs<sub>4jþ1</sub>þ ð01 þ 03 þ 02 þ 01Þs<sub>4jþ2</sub>þ ð20Þ

ð01 þ 01 þ 03 þ 02Þs<sub>4jþ3</sub>; ¼ s4j þ s4jþ1 þ s4jþ2 þ s4jþ3; X3

<sup>1</sup>/<sub>4</sub> s4jbk:

k¼0

Therefore, when the  $\delta$ 5;4PCRC is applied, the output parity  $t_{j;0}$  of the jth column vector may be directly predicted from the jth column vector of the input state by  $P^{3}_{k^{1}40}$  s<sub>4jbk</sub>.

Similarly, in the case n  $\frac{1}{4}$  16, t<sub>0</sub> is predicted by

i¼0

Because the summation of 02, 01, 01, and 03 is 01, (20) can be satisfied for the  $\partial 17$ ;16P,  $\partial 9$ ;8P, or  $\partial 5$ ;4P CRC. The coefficients of InvMixColumns display an identical phenomenon. The summation of the four coefficients used in decryption, 0B, 0D, 09, 0E, is also 01. Therefore, t<sub>0</sub> or t<sub>j;0</sub>can be predicted in the same way as that of MixColumns.

#### 3.4 In AddRoundKey

Discussing the case n <sup>1</sup>/<sub>4</sub> 16 first, it is assumed that each round key already has a parity; hence, the round

key is represented as  $fk_0;k_1;...;k_{16}g$ , where  $k_0 \frac{1}{4} P^{15}_{i\frac{1}{4}0}$  $k_{ip1}$  is the parity and  $fk_1;...;k_{16}g$  is the normal round key. The AddRoundKey operation only adds the input  $T \frac{1}{4} S b K$ :  $\delta 21 P$ 

state with a normal key K  $\frac{1}{4}$  fk<sub>1</sub>;k<sub>2</sub>;...;k<sub>16</sub>g to yield the output state as follows:



Fig. 8.The error detection scheme for key expansion.

We apply the summation operation to (21) to obtain X15 X15X15tiþ1 ¼ siþ kiþ1 ¼ p þ k0: ð22Þ i¼0 i¼0i¼0

Accordingly,  $t_0$  may be obtained from p  $\not b k_0$ . The parities for n  $\frac{1}{4}$  4 or n  $\frac{1}{4}$  8,  $p_j$ , are calculated in the same way; however, the round key must also have four or two parities.

## 3.5 In the Key Expansion

The dn b 1;nPCRC is also adopted in key expansion, where n 2 f4;8;16g. However, the d5;4PCRC is always used in the interior of the key expansion. The key expansion and the error detection scheme are jointly depicted in Fig. 8, where the decision blocks are removed from Fig. 1 for a simple description of error detection, as the conditions only determine where the error detection is applied, not how it is designed.

In this key expansion, with error detection, one word contains five bytes and the symbol of a word is denoted by  $W^{0}\frac{1}{2}i \frac{1}{4} \frac{1}{2}W\frac{1}{2}i$  k parity, where k is a catenation symbol. At first, the parities of the first Nk words, where Nk 2 f4;6;8g, are obtained by the generator 1 b x, i.e., the parity  $p_i$  of

W<sup>1</sup>/2i <sup>1</sup>/4 <sup>1</sup>/2wi;0wi;1 wi;2 wi;3is

pi ¼ wi;0 þ wi;1 þ wi;2 þ wi;3:

Then, the Nk-pair parities and messages form new Nk words,  $W^{0}_{1/2}0; W^{0}_{1/2}1;...$ , and  $W^{0}_{1/2}Nk$  1. The new words are successively put into the Nk shift blocks, from  $W^{0}_{1/2}i$  Nkto  $W^{0}_{1/2}i$  1, at the top of Fig. 8, after which, the key expansion starts. A 128-bit round key and its one-byte parities are collected after each period of four shifts. If  $\delta$ 17;16P CRC is chosen for AES, the one-byte parity of a round key is obtained by summing the four parities of output words. If  $\delta$ 5;4PCRC is chosen, then the four parities are kept. In the key expansion, the RotWord rotates the byte order of  $W^{1}/2i$  1; hence, the parity is the same as that of  $W^{0}/2i$  1. For the SubWord operation because it is a function which executes SubBytes on each byte of input, the error detection scheme is the same as that in SubBytes, described in Section 3.1. However, in the case of united SubBytesbeing used, the parity must be calculated separately.

For the EXOR operation with Rcon<sup>1</sup>/<sub>2</sub>i=Nk, the error detection is achieved by EXORing the parity of temp and that of Rcon<sup>1</sup>/<sub>2</sub>i=Nk, where Rcon<sup>1</sup>/<sub>2</sub>i=Nk <sup>1</sup>/<sub>4</sub>  $f02^{bi=Nkc}$ ;00;00;00g. The parity of Rcon<sup>1</sup>/<sub>2</sub>i=Nkequals  $02^{bi=Nkc}$ due to the three bytes of zero value in Rcon<sup>1</sup>/<sub>2</sub>i=Nk. At the end of the key expansion, the parity t<sub>0</sub> is the EXOR of the parity of current data and the parity of W<sup>0</sup>/<sub>2</sub>Nk 1.

3.6 More Details for ð5;4Þ CRC

Although the d5;4PCRC has four parities, it is possible for only one parity to be used in realization of this scheme. AES can be implemented in a 32-bit structure, i.e., one column of a state is processed once in every round. In this structure, the position of ShiftRows must be shifted above the SubBytes operation. After ShiftRows, each column passes ðBrðhagh the identical calculations, SubBytes, MixColumns. and AddRoundKey; the parity generation, or the syndrome calculation for each column, are also identical, so only one circuit is required.

## UNDETECTABLE ERRORS

Even though the AES algorithm propagates the errors during encryption, the error coverage can be also analyzed mathematically. Actually, only the MixColumns and SubBytes operations cause the

numerous erroneous bits when a single-bit error is injected, when ShiftRows or AddRoundKey do not change the bit number of the errors. Several assumptions are made, as follows:

The error model is considered as Fig. 2. 1.

All nonzero error block over GFð2<sup>8ðnþ1Þ</sup>Þ have 2. the same probability, where n 2 f4:8:16g.

Each operation 3. has

same error injection probability.

4.1 The Undetectable Errors in SubBytes

Because SubBytes is invertible, all errors injected into input can be detected by InvSubBytes and vice versa. Therefore, the united SubBytes, has 100 percent fault coverage. In separated SubBytes, both operations, the GFð2<sup>8</sup>Þ inversion and the affine transformation, have their own error detection. The GF<sub>02</sub><sup>8</sup>P inversion is also invertible, so it has 100 percent fault coverage in hybrid SubBytes.

In parity-based SubBytes, the error detection capability of the GFð2<sup>8</sup>Þ inversion is analyzed. According to (14), the scheme only uses XOR operations, so all the codewords are the undetectable errors in parity-based SubBytes. Therefore, while applying the ð17;16PCRC to a 128-bit data block, the number of undetectable nonzero errors is  $2^8$ ) -1 $\eth 2^8 {I\!\!P}^{16}$  1 and the percentage of

undetectable errors is  $\frac{\delta}{}$  $\frac{16}{17}$  ffi0:4%. When  $\delta 2 P$ theð5;4Þ CRC is applied to a 128-bit data block, the total number of undetectable nonzero errors is  $\partial \partial 2^8 P^4$  $1\dot{P}^4$ the  $2^{\circ}$  -1 percentage is  $\eth_{8}^{\circ 4}$  5  $\clubsuit^{4}$ and 100% ffi2:56 10<sup>8</sup>%. Simið2 Þ larly, the percentage of undetectable errors for the  $\delta 9$ ;8PCRC is 0:16 10<sup>2</sup>%.

The affine transformation is detected by dn b 1;nPCRC. Although five erroneous bits were caused, while injecting a single-bit error, the error coverage can still be analyzed. Theorem 1.Given an input state S ¼ fp;s<sub>0</sub>;s<sub>1</sub>;...;s<sub>n1</sub>g,

where parity p is  ${}^{n}_{i'40}{}^{1}$  s<sub>i</sub>, and n 2 f4;8;16g, the output state is T  $\frac{1}{4}$  ft<sub>0</sub>;t<sub>1</sub>;...;t<sub>n</sub>g, where t<sub>0</sub> is Ap from (16), and

02 03 01 01 t<sub>i</sub>b<sub>1</sub> s<sub>i</sub>b e<sub>i</sub> 664 ttiibb23 775 ¼ 646 01 02 03 0101 01 02 03 577664 ssiiþþ12 þþ eeiiþþ12 775:

tib4 03 01 01 02 siþ3 þ eiþ3 Then, the summation of the column vector t<sub>ib1</sub> is X3

t<sub>ibkb1</sub> ¼ ð02 þ 01 þ 01 þ 03Þðs<sub>i</sub> þ e<sub>i</sub>Þþ  $k^{1}40 \ \delta 03 \ b \ 02 \ b \ 01 \ b \ 01 \ b \ \delta_{ib1} \ b \ e_{ib1} \ b \ b$ 

ð01 þ 03 þ 02 þ 01Þðs<sub>ib2</sub> þ e<sub>ib2</sub>Þþ ð01 þ 01 þ 03 þ 02Þðs<sub>ib3</sub>ð25Þ þ e<sub>ib3</sub>Þ; X3

<sup>1</sup>/<sub>4</sub>  $\partial s_i b_k b e_i b_k b$ :

 $t_{ib1}$ , 0 i n 1, is obtained from (2). Introducing an error  $E^{1/4}$  fe<sub>0</sub>;e<sub>1</sub>;...;e<sub>n</sub>g into the state S  $^{1/4}$  fp;s<sub>0</sub>;s<sub>1</sub>;...;s<sub>n1</sub>g, the summation of the output  $T^0$  will equal to zero if and only if  $P^{n}i^{1}40 e_{i}^{1}4 0$ .

Proof. Because n is even, the value 63 will be cancelled. Therefore, the summation of the erroneous output T<sup>0</sup> is

Xnt0, ¼ Ap b e0 b A Xn1ðsi b eib1Þ:

i<sup>1</sup>/<sub>4</sub>0 i<sup>1</sup>/<sub>4</sub>0

ff1} 0

Xn

1⁄4 A e<sub>i</sub>:

i¼0

ATherefore,gular overPni<sup>1</sup>/<sub>4</sub>0 ei<sup>1</sup>/<sub>4</sub>GF0Pis held. Because the matrix  $\partial_{i}^{n} 2_{4} P_{0}$ ,  $tA_{i}^{0}$  Pequalsneiis zero if and only ifto ifandAis zero nonsin-onlyPin<sup>1</sup>/40 eifi i¼0

is zero.

tu

In the dn b 1;nPCRC, the nonzero errors are undetected, when the equation  $P^{n}_{i^{1}40} e_{i}^{1}4 0$  is held, i.e., errors are also the codewords. According to Theorem 1, all undetectable errors are also undetected after the affine transformation. Therefore, while applying the ðn b 1:nPCRC to a 128-bit data block, the percentages of the undetectable errors are 0.4 percent,  $0.16 \ 10^2$ %, and 2:56  $10^8$ %, respectively, for n <sup>1</sup>/<sub>4</sub> 16, n <sup>1</sup>/<sub>4</sub> 8, and n <sup>1</sup>⁄<sub>4</sub> 4.

4.2 The Undetectable Errors in MixColumns

MixColumns also has a diffusion property. It causes five or 11 erroneous bits while injecting a single-bit error in one column vector of the input state. However, the coefficients eliminate the diffusion of errors after summing the erroneous columnvectoroftheoutputstate.TheMixColumns is shown again below, and it is supposed that each byte of the input vector is polluted by an error.

3 32 3 2 2 ð24Þ k1/40

The equation also holds for two or four columns vectors.

Theorem 2.Giving an input state S <sup>1</sup>/<sub>4</sub> fp;s<sub>0</sub>;s<sub>1</sub>;...;s<sub>n1</sub>g, P where p <sup>1</sup>/<sub>4</sub>  $^{n}_{i_{1}/0}$  s<sub>i</sub>is the checksum of the input state and n 2 f4;8;16g. After MixColumnsand the parity prediction (20), the output state is T <sup>1</sup>/<sub>4</sub> ft<sub>0</sub>;t<sub>1</sub>;...;t<sub>n</sub>g, where t<sub>0</sub> <sup>1</sup>/<sub>4</sub> p, and the rest is the output of MixColumns. Introducing an error

E <sup>1</sup>/<sub>4</sub> fe<sub>0</sub>;e<sub>1</sub>;...;e<sub>n</sub>g into the state S <sup>1</sup>/<sub>4</sub> fp;s<sub>0</sub>;s<sub>1</sub>;...;s<sub>n1</sub>g, then the errors of the ðn þ 1;nP CRC in MixColumnsare P

undetectable if and only if the summation  $\prod_{i\neq 0}^{n} e_i$  is zero.

Proof. The syndrome  $P^n_{\,il\!40}\,t_iis$  used to check whether errors occurred or not. It is assumed that no errors occurred, if and only if the syndrome is zero. The summation of the erroneous output state is

Xnt0<sub>i</sub> ¼ ðt0 þ e0Þ þ Xnt0i:

i¼0 i¼1

From (25), because n is the multiple of four, the above equation is represented as

e<sub>i</sub>:

0 Xn

1/4

i¼0

From Theorem 2, there are  $\delta \delta 2^8 P^{16} 1P$  nonzero errors that are undetectable, when the  $\delta 17;16PCRC$  is

applied to a 128-bit data block. This result is the same as those in the affine transformation described above. Similarly, the total number of the undetectable errors for the  $\partial 9$ ;8Por  $\partial 5$ ;4P CRC is  $\partial \partial 2^8 P^4 1 P^4$  or  $\partial \partial 2^8 P^8 1 P^2$ , respectively.

4.3 The Undetectable Errors in ShiftRowsor AddRoundKey

ShiftRows does not change the value of the input state, and AddRoundKey only EXORs the input state with a round key. Therefore, the undetectable errors are the same as those analyzed in the affine transformation or MixColumns.

## DETECTION LEVELS

The proposed scheme may be used in operation-level, roundlevel, or algorithm-level error detection. In operation-level detection, the syndrome is checked at the end of each operation. Similarly, if the syndrome is obtained at the end of each round, it is round-level detection. The implementation of operation-level error detection is easy to figure out. The syndrome is calculated at the end of each operation according to the equations derived in Section 3. However, the implementation of a roundlevel detection needs more ingenuity, when the SubBytes is protected by united SubBytes. The parity is generated at the end of the SubBytes or the beginning of the ShiftRows. Then, the parity directly passes through ShiftRows, and MixColumns because its value will not be changed after the two operations. Finally,



Fig. 9.The proposed scheme under round-level error detection.

the parity is EXORed with the key parity. The total path is shown in Fig. 9. Obviously, the syndrome could then be checked at the end of the round. In hybrid SubBytes, the structure for round-level error detection is similar to Fig. 9, but the parity is generated after the  $GF\partial 2^{8}P$  inversion. Because the parity of the state, in the ith round, cannot pass through the inversion of  $GF\partial 2^{8}P$  in i p 1 round, the parity must be regenerated in each round. Therefore, unitedSubBytes detection or hybrid-SubBytes detection cannot be implemented as algorithm-level detection.

However, each operation of parity-based SubBytesis protected by ðn þ 1;nPCRC, hence the parity could pass through a round. Therefore, parity-based SubBytescould be applied as an operation-level, round-level, or algorithmlevel error detection.

## FEATURES AND COSTS

6.1 Scalability

In Section 3, it was found that the three error detections, ðn þ 1;nPCRC, where n 2 f4;8;16g, had similar structures. The calculations of parities or syndromes were all based on Byte-EXOR (B-EXOR) operation and the length of the message was a multiple

of four bytes. Therefore, the proposed approach is scalable with practical hardware design; in other words, the three CRCs can be applied to an AES implementation of an 8-bit, 32-bit, or 128-bit structure. In general, the portable devices are more probable to encounter DFA than a nonportable device. Therefore, the scalability of error scheme is good for practical purposes because 8-bit and 32-bit architectures are most commonly used in portable applications, such as cell phones, SmartCard, or RFID tag.

The approach proposed by Bertoni et al. [1] cannot be easily scaled down into the 8-bit architecture because the parity of  $s_i$  requires the information from  $s_{ib1}$  and  $s_{ib2}$ . However, this work can

easily be applied to an 8-bit, 32-bit, or 128-bit AES architecture. The syndrome generation is similar to parity generation. Fig. 10 shows a block diagram of (17) and (16) for 8-bit AES architecture. While 16 bytes t<sub>i</sub>are obtained, the syndrome u is obtained immediately, where the initial value of parity registers as a zero byte. The ShiftRows, MixColumns, or AddRoundKey have similar structures to Fig. 10, but the matrix transformation, A, is not required. The 32-bit or 128-bit AES can also be implemented, based on the concept in Fig. 10.

The 32-bit architecture is the most flexible structure from the point of error detection because it could use  $\delta 17;16P$ ,



Fig. 10.The block diagram of error detection for 8-bit AES architecture.

 $\delta 9$ ;8Þ, or  $\delta 5$ ;4Þ CRC to achieve the error detection objective. No matter which one is selected, it is possible that only a one-byte register is required to store the parities. However, the input must be a onecolumn vector, defined in AES; thus, (20) may be used to detect faults for a one-column calculation.

# 6.2 Symmetry

From Fig. 10, it can be seen that the proposed scheme is symmetric in both encryption and decryption. This has the advantage of the encryption and decryption being integrated into one chip. However, the scheme proposed by Bertoni et al. [1] is asymmetrical in MixColumns and InvMixColumns. As shown in Table 1, the output parity prediction of InvMixColumns is more complex than that of MixColumns.

## 6.3 Costs

While introducing proposed error detection schemes into AES, the hardware cost required by those schemes is evaluated through their computational complexity. Error detection consists of two parts—the parity and syndrome generation. Discussing the cost in parity generation first, in our proposed schemes, the parity requires only the EXOR operation. A total of  $\partial n1P_{n}^{16}$  Byte-XORs (B-EXOR) is required to calculate the parity of the input for the proposed approach. Taking the  $\delta5;4$ PCRC for a 128-

bit data block as an example, one checksum of an input message is generated by three B-EXORs and a total of 12 B-EXORs for four parities. However, united SubBytesuses InvSubBytes to check error, so no parity generation is required. In hybrid SubBytes, the dn b 1;nPCRC is applied to the affine transformation; 15, 14, or 12 B-XORs are required to produce the parities for n of 16, 8, or 4, respectively. In the method proposed by Bertoni et al. [1], 16 7 bit-EXORs (b-EXOR) were required to obtain 16 one-bit parities for an AES state. In [7], they used the inversion operation to detect the errors; hence, no parities were paid for. However, the hardware of parity generation is minor because the parity generation is required to perform at the beginning of the parity-based detection is applied. In PbSBD, because the parity can pass through each operation along with predicting the parity, the parity generation only performs once. In USBD and HSBD, the parity must be regenerated in SubBytes of each round; nevertheless, only one circuit of parity generation is required when one round is implemented to achieve AES computing. In the approach of Bertoni et al. [1], the parity can also pass through the round; hence, one circuit of parity generation is required.

As regards the cost of the syndrome generation and parity prediction, it varies from operation to operation. United SubBytesuses the InvSubBytesto detect errors. In hybrid SubBytes, the  $GF\delta2^{8}P$ inversion is used to self-check errors; the  $\delta n$   $\beta$ 1;nPCRC is used to detect errors of affine transformation. According to (17), 16 B-EXORs are required to obtain the syndrome for every  $\delta n$   $\beta$ 1;nPCRCs. However, the execution number of affine multiplication to predict parity, (16), depends on n; the number is one, two, or four when n is 16, 8, or 4, respectively. For parity-based SubBytes, the cost in affine transformation is the same as that in hybrid SubBytes. However, the  $GF\delta2^{8}P$  inversion also uses  $\delta n$   $\beta$  1;nPCRC; according to (14), 32 B-EXORs are required (note that the  $\delta t_{ib1} b t_{ib}^{-1} P in (14)$  is obtained from a table, not requiring EXOR calculation). In ShiftRows and MixColumns, no prediction functions are necessary and the syndrome is obtained by summing all output byte and the parity. Therefore, in the two operations, 16 B-EXORs are required. In AddRoundKey, the one, two, or four one-byte parities of a round key are involved in the parity prediction, requiring extra B-EXORs to be paid for. The results summarized in Table 1 are the cost of the operationlevel detection, i.e., the error detection is at the end of every operation. If round-level or algorithm-level are chose, only

| TABLE 2                                           |  |  |  |  |
|---------------------------------------------------|--|--|--|--|
| The Possible Combinations of Our Proposed Schemes |  |  |  |  |
| LIEBD   HEDD   BEDD                               |  |  |  |  |

| USBD    | 112015  | POSBD   |  |
|---------|---------|---------|--|
| (17,16) | (17,16) | (17,16) |  |
| (9,8)   | (9,8)   | (9,8)   |  |
| (5,4)   | (5,4)   | (5,4)   |  |

the cost of parity prediction is required in every operation and the cost of syndrome generation is only paid at the end of each round or of the AES algorithm, respectively. The costs of Bertoni et al.'s [1] approach are also varied in each operation. The SubBytes requires extra m 256-byte memory spaces to predict the parity, where m is dependent on the implementation of the AES. Taking an

TABLE 1 The Cost of Syndrome Generation and Parity Prediction in Each AES Operation in the Operation-Level Detection

| 11151 m. 7 m. 4      |            | Ours (n =16, 8 or 4)                                                | Bertoni[1], [3]                                                            | Kam[7]        |  |
|----------------------|------------|---------------------------------------------------------------------|----------------------------------------------------------------------------|---------------|--|
| Bit number of parity |            | 8/16/32 bits                                                        | 16 bits                                                                    | 0 bit         |  |
| SubBytes             | USB        | InvSubBytes                                                         |                                                                            |               |  |
|                      | HSB        | the GF(2 <sup>8</sup> ) inversion, 16 × 8<br>b-EXORs, and 1/2/4 AMs | -                                                                          | InvSubBytes   |  |
|                      | PbSB       | 32 × 8 b-EXORs, 16 × 8 b-<br>EXORs, and 1/2/4 AMs                   | $m \times 256$ bits memory, $m \times 9$ b-EXORs, and comparison circuits. |               |  |
| ShiftRows            |            | 16 × 8 b-EXORs                                                      | bit shift+16 × 8 b-EXORs                                                   | InvShiftRows  |  |
| Winds lunss          | Cost in EN | 16 × 8 b-EXORs                                                      | $16 \times 8 + 16 \times 4$ b-EXORs                                        | InvMixColumns |  |
| MixColumns           | Cost in DE | 16 × 8 b-EXORs                                                      | More complicated than in EN                                                | MixColumns    |  |
| AddRoundKey          |            | $(16 + 1/2/4) \times 8$ b-EXORs                                     | 16 × 8 + 16 b-EXORs                                                        | AddRoundKey   |  |

B-EXOR = 8 b-EXORs, b-EXOR = bit EXOR operation, EN = encryption, DE = decryption, and AM = affine multiplication.

AES implemented in a 32-bit structure as an example, four bytes are calculated in parallel, thus four tables are required. The size of a table with error detection, in [1], is a double of that in AES, so a total of 512 bytes is for one table, i.e., 256 extra bytes are caused for one table. The 256 extra bytes are constants with odd parity, e.g., 00000000 1; therefore, one comparisoncircuitorsyndromegenerationcircuitisrequir ed to detect the error. This detection method has been modified by Bertonietal. [3] andthe extra memory size reduced from m 256 bytes to m 256 bits. Additionally, m 9 b-EXORs are introduced. The error detection of one byte, appended with one-bit parity, requires eight b-EXORs (bit EXOR operation) or a total of 16 8 b-EXORs for a 128-bit data block. However, Bertoni et al.'s scheme must predict the output parity in MixColumns, therefore, the extra calculations of 16 4 b-EXORs are required in the encryption process. In decryption, the error-detection hardware for InvMixColumns is more complicated than in encryption. Because the prediction of InvMixColumn is not derived in [1], the cost is not specified in Table 1. The costs of Karri et al.'s scheme required the inversion of each operation and it was also time-consuming. The operations in the key expansion are similar to the four major operations of AES; thus, the detailed comparisons of the key expansion are not discussed. Although most operations require 16 B-EXORs to compute the syndrome, it is possible to achieve the computation with less B-EXORs.

## ERROR DETECTION CAPABILITY

In Karri et al. [7], because the four operations of AES are bijective, their error detection capability is very high. If it is assumed that only one 128-bit error occurs during encryption or decryption, then all nonzero error patterns can be detected in the operation-level, round-level, or algorithmlevel detection. In Bertoni et al. [1], they used the paritybased technique and the undetectable errors do exist. Bertoni et al. [1] did a lot of tests to obtain the results about error detection capability and the results will be compared to ours in Fig. 14.

All simulations and statements of our proposed schemes, addressed here, are also under the three assumptions given in Section 4. Three architectures, USBD, HSBD, and PbSBD, were proposed herein; each architecture has three types of CRC,  $\delta$ 17;16Þ,  $\delta$ 9;8Þ, and  $\delta$ 5;4Þ CRCs, as shown in Table 2.



Fig. 11.The simulation model. Each data block has 64 ones and the position of ones uniformly distributed in a data block. The error bits uniformly distribute in an error block. The assignment of error blocks uniform distributes in both rounds and operations.

Thus, nine methods were simulated. In PbBSD, the data procedure is thoroughly protected by the ðn þ 1;nPCRC; thus, each operation has undetectable errors. However, in USBD, the fault coverage in SubBytes is 100 percent, so the amount of overall undetectable errors is 80 percent of that in USBD. Similarly, in HSBD, the amount is reduced to 75 percent of that in USBD.

The simulation model is shown in Fig. 11. Each method is simulated by 26 tests distinguished by the bit number of the injected errors. The last test in Fig.

12, Fig. 13, and Fig. 14, labeled as random, used error patterns with random erroneous bit number. Each error pattern has  $10^7$  blocks and thebitlengthofeveryblockis136ð128 þ 8Þ, 144ð2 ð64þ 8ÞÞ, or 160ð4 ð32 þ 8ÞÞ, respectively, for the ð17;16Þ, ð9;8Þ, or ð5;4Þ CRC. The all-one error block was considered as a totally different state; hence, the maximum number of erroneous bits was 135, 143, or 159 in a random test. Each test used one data pattern of  $10^7$  data blocks, and every



Fig. 12. Percentage of undetectable errors of the ð17;16Þ CRC over GFð2<sup>8</sup>Þ.

Fig. 13. Percentage of undetectable errors of the ð9;8Þ CRC over GFð2<sup>8</sup>Þ. Their percentage is 4.14 percent for 2-bit errors and 0.067 percent for 4-bit errors.



Fig. 14. Percentage of undectable errors of the ð5;4Þ CRC over GFð2<sup>8</sup>Þ. The percentage is 1.8 percent for 2-bit errors and 0.13 percent for

4-bit errors.

block has 64-bit ones of normal distribution. The erroneous rounds and erroneous operation were also randomly chosen.

As seen in Fig. 12, all the simulated odd-bit errors were detected. The percentage of the undetectable errors dropped dramatically as the erroneous bit number increased. When the number of erroneous bits was greater than eight, the percentage was below 1 percent and stable. The test using random erroneous bits is about 0.3 percent and it was close to the theoretic value obtained in Section 4, 0.4 percent. Obviously, all the experimental results followed the curves of ideal values.

The same data patterns used in the above tests were also used for the ð9:8PCRC and the ð5:4P CRC: all test conditions, except for the error patterns, were identical to those used to test the ð17;16Þ CRC. The ð9;8ÞCRC generated two parities for a 128-bit data block. Because the values in the two tests, 2-bit and 4bit erroneous bits, are too large, they were dependently shown in Fig. 13. All odd-bit errors were also detected. The percentage also dropped dramatically when the erroneous bits increased, as shown in Fig. 13. For the random test, the percentage is about 0:14  $10^2$ %, very close to the theoretical value of  $0:16 \ 10^2$ %.

In Fig. 14, the results of the  $\delta 5;4PCRC$  and Bertoni et al. [1] are shown. Obviously, this percentage is very small in contrast to the  $\delta 17;16PCRC$  or the  $\delta 9;8P$  CRC. When the number of erroneous bits was larger than 16, the percentages of undetectable errors dropped to zero. The percentage in the random test was 0 percent, very close to the theoretic value of 2:56  $10^8$ %. Of course, all odd-bit errors could be detected.

Fig. 14 also shows the results in Bertoni et al. [1]. The test models of Bertoni et al. [1] are different from ours. They have injected multiple bit errors (between 2 to 16) at the beginning of the round. From Fig. 14, their scheme has better error detection than ours, when the errors are between 2 and 6, and the cases of 8-bit errors are close. When the number of erroneous bits is above 10, the performance of the proposed scheme is better than that of Bertoni et al. [1].

## **II. CONCLUSIONS**

This work has proposed a simple, symmetric, and highfault-coverage error detection scheme for AES. Although the erroneous bits are diffused in AES, this work used the linear behavior of each operation in AES to design a detection scheme. This scheme only uses an  $\delta n \not = 1$ ;nPCRC to detect the errors, where n 2 f4;8;16g, and the parity of the output of each operation is predicted in a simple fashion. Even though the number of parities is two or four, respectively, for n <sup>1</sup>/<sub>4</sub> 8 or n <sup>1</sup>/<sub>4</sub> 4, it is possible to use only one 8-bit register

for storing the parities during hardware implementation. This error detection may also be used in encryption-only or decryption-only designs. Because of the symmetry of the proposed detection scheme, the encryption and decryption circuit can share the same error detection hardware. The proposed schemes can be applied in the implementation of AES against differential fault attacks and can be easily implemented in a variety of structures, such as 8-bit, 32-bit, or 128-bit structures.

## REFERENCES

- G. Bertoni, L. Brevegelieri, I. Koren, P. Maistri, and V. Piuri, "Error Analysis and Detection Procedures for a Hardware Implementation of the Advanced Encryption Standard," IEEE Trans. Computers, vol. 52, no. 4, pp. 492-505, Apr. 2003.
- [2] G. Bertoni, L. Brevegelieri, I. Koren, P. Maistri, and V. Piuri, "Detecting and Locating Faults in VLSI Implementations of the Advanced Encryption Standards," Proc. 18th IEEE Int'l Symp. Defect and Fault Tolerance in VLSI Systems, pp. 105-113, Nov. 2003.
- [3] G. Bertoni, L. Brevegelieri, I. Koren, and P. Maistri, "An Efficient Hardware-based Fault Diagnosis Scheme for AES: Performances and Cost," Proc. 19th IEEE Int'l Symp. Defect and Fault Tolerance in VLSI Systems, pp. 130-138, Oct. 2004.
- [4] E. Biham and A. Shamir, "Differential Fault Analysis of Secret Key Cryptosystems," Advances in Cryptology—Proc. CRYPTO '97, pp. 513-525, 1997.
- [5] P. Dusart, G. Letourneux, and O. Vivolo, "Differential Fault Analysis on A.E.S," Applied Cryptography and Network Security, pp. 293-306, 2003.
- [6] M. Feldhofer, S. Dominikus, and J. Wolkerstorfer, "Strong Authentication for RFID Systems Using the AES Algorithm," Proc. Cryptographic Hardware and Embedded Systems (CHES '04), pp. 357-370, 2004.
- [7] R. Karri, K. Wu, P. Mishra, and Y. Kim, "Concurrent Error Detection Schemes for Fault-Based Side-Channel Cryptanalysis of Symmetric Block Ciphers," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 12, pp. 1509-1517, Dec. 2002.
- [8] R. Karri, G. Kuznetsov, and M. Goessel, "Parity-Based Concurrent Error Detection of Subsititution-Permutation Network Block Ciphers," Proc. Cryptographic Hardware and Embedded Systems (CHES '03), pp. 113-124. 2003.

- [9] S. Mangard, M. Aigner, and S. Dominikus, "A Highly Regular and Scalable AES Hardware Architecture," IEEE Trans. Computers, vol. 52, no. 4, pp. 483-491, Apr. 2003.
- [10] US Nat'l Inst. of Standards and Technology, "Federal Information Processing Standards Publication 197—Announcing the ADVANCED ENCRYPTION STANDARD (AES)," 2001, http:// csrc.nist.gov/publications/fips/fips197/fips-197.pdf.
- [11] G. Piret and J.J. Quisquater, "A Differential Fault Attack Technique against SPN Structures, with Application to the AES and KHAZAD," Proc. Cryptographic Hardware and Embedded Systems (CHES '03), pp. 77-88, 2003.
- [12] J. Daemen and V. Rijmen, "AES Proposal: Rijndael," AESAlgorithm Submission, Sept. 1999.
- [13] A. Satoh, S. Morioka, K. Takano, and S. Munetoh, "A Compact Rijndael Hardware Architecture with S-Box Optimization," Proc. Advances in Cryptology (ASIACRYPT '01), pp. 171-184, 2001.
- [14] K. Wu, R. Karri, G. Kuznetsov, and M. Goessel, "Low Cost Concurrent Error Detection for the Advanced Encryption Standard," Proc. Int'l Test Conf. (ITC '04), pp. 1242-1248, 2004.