Proposed CAESAR Hardware API

Ekawat Homsirikamol, William Diehl, Ahmed Ferozpuri, Farnoud Farahmand, Panasayya Yalla, Jens-Peter Kaps, and Kris Gaj

Cryptographic Engineering Research Group, George Mason University, Fairfax, Virginia 22030
email: {ehomsiri, wdiehl, aferozpu, ffarahma, pyalla, jkaps, kgaj}@gmu.edu

Abstract. In this paper, we propose a universal hardware Application Programming Interface (API) for authenticated ciphers. In particular, our API is intended to meet the requirements of all algorithms submitted to the CAESAR competition. The major parts of our proposal include: minimum compliance criteria, interface, communication protocol, and timing characteristics supported by the core. All of them have been defined with the goals of guaranteeing (a) compatibility among implementations of the same algorithm by different designers, and (b) fair benchmarking of authenticated ciphers in hardware.

1 Minimum Compliance Criteria

The proposed minimum compliance criteria are listed below:

1.1 Encryption/Decryption

Authenticated encryption and decryption should be implemented within one core, in such a way that only one of these two operations can be executed at a time (half-duplex). This feature demonstrates an algorithm’s ability to use shared resources for encryption and decryption.

Alternatives (not recommended):
a) separate cores for encryption and decryption (simplex)
b) authenticated encryption and decryption within one core, with both operations capable of running in parallel (full-duplex).

1.2 Variants

Only a variant indicated in the cipher specification as the primary recommendation has to be implemented. Other variants, if implemented, should be selectable by changing the default values of generics or constants before synthesis. Implementation of these variants should not affect any benchmarking results for the main variant.

1.3 Key scheduling

Key scheduling should be fully implemented within the hardware core. This approach takes into account very different contributions of the key scheduling unit to the entire cipher core area, which are specific for each algorithm. An alternative (not recommended): a) generation of round keys outside of the cipher core, e.g., in software.

1.4 Incomplete blocks

The core should properly handle incomplete blocks in associated data, message, and ciphertext. The resources and the numbers of clock cycles required to handle incomplete blocks are different among candidates. For multiple candidates handling of incomplete blocks can substantially affect the area and/or speed of the core. Of particular concern are variable shifts, which typically introduce a significant overhead in terms of area and/or timing. Often, a trade-off between the area and the number of clock cycles necessary to process an incomplete block can be made by choosing an appropriate detailed hardware architecture. An alternative (not recommended): a) handling only associated data, messages, and ciphertexts composed of full blocks.

1.5 Padding

Padding should be implemented in hardware, assuming that an unused portion of the last input data word is filled with zeros. Padding cost, in terms of area, is algorithm dependent and not negligible. In some algorithms, padding in software may need to be reversed in hardware because the tag calculation uses an unpadded last block.

Alternatives (not recommended):
a) padding in hardware, assuming that an unused portion of the last block is filled with zeros.
b) padding in software, followed, if needed, by modifications of the last blocks in hardware.

1.6 Unused portions of the last word

Any unused portions of the last word generated during encryption and decryption should be cleared (filled with zeros) before releasing this word outside of the cipher core. An alternative (not recommended): a) potentially leaking some key-related data using unused portions of the last word.

1.7 Decrypted message release

The decrypted message blocks should be released immediately, and buffered outside of the cipher core, before the result of authentication is known. We assume that the delayed release of decrypted messages, dependent on the result of authentication, will be handled by an external circuit, which is FIFO-based and similar for each candidate. Please note that we believe that such a unit MUST be implemented externally to the basic cipher core before deploying the core in the majority of real-world scenarios. We only omit this important feature from the basic core because it adds quite a substantial burden on the hardware designer, with little benefit in terms of better differentiation among the CAESAR candidates. Additionally, the resources used by this special external unit can be made identical (or almost identical) for all candidates, and are limited mostly to large blocks of memory, which are typically counted as separate resources, independent from logic gates in ASICs and reconfigurable logic units (such as LUTs, Slices, etc.) in FPGAs.

An alternative (not recommended):
a) storing a decrypted message internally, until the result of verification is known.
Pros: More complete functionality.
Cons: Complicates the design and benchmarking. Also, makes the calculation of the output latency and throughput dependent on the output buffer size and implementation details (e.g., support for simultaneous reading and writing).

1.8 Empty AD/message/ciphertext

The core should allow empty AD, empty message/ciphertext, and empty input (no AD, no message/ciphertext). Empty input could be used together with the Public Message Number, Npub, for user authentication.

Alternatives (not recommended):
a) not allowing empty AD
b) not allowing empty message/ciphertext
c) not allowing empty input.

1.9 Supported maximum size of AD/message/ciphertext

single-pass authenticated ciphers: 2^32 − 1 bytes
two-pass authenticated ciphers: 2^11 − 1 bytes

Maximum sizes defined in the CAESAR candidates’ specifications are unrealistic. Values that are too large may affect both area and maximum clock frequency of the hardware core (e.g., because of wide internal counters). The two-pass limit of 2^11 − 1 = 2047 bytes exceeds 1500 bytes, the maximum transmission unit (MTU) of popular communication protocols, such as Ethernet v2.
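The size limits above can be illustrated with a minimal Python sketch. The constant and function names are hypothetical and are not part of the proposed API; the sketch only restates the two limits and the width of a byte counter needed to reach them.

```python
# Maximum supported sizes quoted in Section 1.9, in bytes.
MAX_BYTES_SINGLE_PASS = 2**32 - 1
MAX_BYTES_TWO_PASS = 2**11 - 1
ETHERNET_V2_MTU = 1500  # bytes, as quoted in the text


def counter_bits(max_bytes: int) -> int:
    """Number of bits a byte counter needs to count up to max_bytes."""
    return max_bytes.bit_length()


def size_supported(num_bytes: int, two_pass: bool) -> bool:
    """Check an AD/message/ciphertext size against the applicable limit."""
    limit = MAX_BYTES_TWO_PASS if two_pass else MAX_BYTES_SINGLE_PASS
    return 0 <= num_bytes <= limit
```

Under these limits, a single-pass core needs at most a 32-bit byte counter and a two-pass core an 11-bit one, which is the kind of counter width the paragraph above refers to.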

1.10 Fractions of bytes

The size of all inputs is assumed to be expressed in bytes. As a result, the core should support only inputs composed of full bytes. No fractions of bytes should be allowed. An alternative (not recommended): a) the size of inputs expressed in bits. Allowing inputs of arbitrary size in bits would substantially increase the area required for handling of incomplete blocks and padding.

1.11 Maximum number of independent streams of data processed in parallel

The core should process only one stream of data at a time without an overlap. By a stream of data we understand here a single independent input composed of any subset of Npub, Nsec, AD, Message, Ciphertext, and Tag, supported by the encryption or decryption operation of a given authenticated cipher. We call a core with such features a single-stream implementation. Please note that a single-stream implementation may still take advantage of parallel processing for blocks belonging to the same input/stream. An alternative (not recommended): a) a multi-stream implementation that supports the processing of multiple independent inputs/streams in parallel.

In the multi-stream implementations:
– Throughput is limited only by the maximum circuit area.
– Multiple messages/ciphertexts processed in parallel would require multiple public data input (PDI) and data output (DO) ports. See Section 2 for the detailed descriptions of these ports.

1.12 External memory

Single-pass algorithms: No
Two-pass algorithms: Yes (but only for results of the first pass)

1.13 Number of clock domains

The entire authenticated cipher core should have only one clock input. For the maximum performance, this clock should be operating at the maximum clock frequency determined by the critical path located entirely inside of the hardware module.

An alternative (not recommended):
a) separate clocks for input module, output module, and the encryption/decryption unit.
Pros: Possible smaller values of data bus widths.
Cons: Difficulties with determining the maximum clock frequency of the cipher core.

1.14 Passing unchanged parts of the input to the output

Parts of the data inputs that are not changed by encryption or decryption operations, respectively, are not passed to the output. In particular, Npub and AD are not a part of the output from either encryption or decryption. See Fig. 5. This assumption removes the need for any bypass FIFO necessary to pass any unchanged data to the output. Any formatting of an output from decryption, for the purpose of transmission through the network or decryption, is assumed to be performed outside of the cipher core.

An alternative (not recommended):
a) passing unchanged parts of the input to the output.
Pros: More complete functionality.
Cons: The design time and area overhead for adding standard functionality that may be implemented in a coherent way outside of the authenticated cipher core.

1.15 Permitted widths of data ports (in bits)

Public Data Input (PDI) and Data Output (DO) ports:
Lightweight implementations: w = 8, 16, 32
High-speed implementations: 32 ≤ w ≤ 256

Secret Data Input (SDI) ports:
Lightweight implementations: sw = 8, 16, 32
High-speed implementations: 32 ≤ sw ≤ 64
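As an illustration, the permitted widths can be written as a small Python check. The function name and the profile strings are hypothetical. The sketch also assumes that the lightweight SDI width takes the values 8, 16, or 32, and that any integer width inside a high-speed range is acceptable.

```python
def widths_permitted(profile: str, w: int, sw: int) -> bool:
    """Check PDI/DO width w and SDI width sw against Section 1.15."""
    if profile == "lightweight":
        return w in (8, 16, 32) and sw in (8, 16, 32)
    if profile == "high-speed":
        # Assumption: every integer width inside the range is allowed.
        return 32 <= w <= 256 and 32 <= sw <= 64
    raise ValueError(f"unknown profile: {profile!r}")
```

For example, `widths_permitted("high-speed", 40, 32)` accepts the 40-bit bus that the paper mentions for PRIMATEs.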

See Section 2 and Figs. 1 and 2 for the exact meaning of PDI, SDI, DO, w and sw. Implementations of a particular authenticated cipher, with the same w and sw, following all other minimum compliance criteria, should be mutually compatible. Implementations with different values of w or sw should be compatible under the assumption that the decryption input is reformatted in software or hardware (from one input word width to another) using a universal function/circuit, common for all candidates.

1.16 Decrypted message authentication

The result of the decrypted message authentication (Success or Failure) is calculated within the core itself.

An alternative (not recommended):
a) Calculating only the full value of the Tag during authenticated decryption. No comparison with the expected value of the Tag.
Pros: Simpler input for decryption (no expected Tag). Lower resource requirements (no comparison between the actual and expected Tag).
Cons: Inconsistent behavior for algorithms that use the Tag and those that require only the Ciphertext for authentication. Potential need for the tag comparison outside of the cipher core.

1.17 Interface, Communication Protocol, and Timing Characteristics

For the purpose of full compatibility,
– the interface of the circuit should be consistent with the interface defined in Section 2,
– the communication protocol of the circuit should be consistent with the protocol defined in Section 3, and
– the timing dependencies between data and control signals of the respective ports of the cipher core should be consistent with the dependencies described in Section 4.

All the aforementioned elements of the hardware API are closely related to each other, and are selected in a coherent fashion. The advantages include the ease of communication with standard components of modern digital systems, such as FIFOs (First-In First-Out units), DMAs (Direct Memory Access units), AXI (Advanced eXtensible Interface) Master and Slave units, etc. No complex input or output modules are required. Standard buses, with the typical widths, such as 32, 64, 128, and 256 bits, can be used for communication with the core. Additionally, unusual bus widths, suitable for some algorithms, such as 40 in the case of PRIMATEs, can be utilized. On top of that, the core can be easily extended with an external circuit used to prevent the release of unauthenticated decrypted messages outside of a given hardware device, such as an FPGA, an ASIC, or a programmable system on chip.
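The external circuit that withholds unauthenticated decrypted messages can be modeled in software as follows. This is only a behavioral sketch in Python: the class and method names are hypothetical, and the two status codes are the Success and Failure values from the Instruction/Status format (Fig. 8).

```python
from collections import deque

STATUS_SUCCESS = 0b1110  # Status code for Success (Fig. 8)
STATUS_FAILURE = 0b1111  # Status code for Failure (Fig. 8)


class ReleaseBuffer:
    """Behavioral model of a FIFO-based release circuit (hypothetical)."""

    def __init__(self):
        self._pending = deque()

    def push(self, word: int) -> None:
        """Buffer one decrypted word as it leaves the cipher core."""
        self._pending.append(word)

    def finish(self, status: int) -> list:
        """Release the buffered words on Success, discard them otherwise."""
        words = list(self._pending)
        self._pending.clear()
        return words if status == STATUS_SUCCESS else []
```

The buffer is emptied in both cases, so a failed message can never leak into the output of the next one.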

2 Interface

The general idea of the proposed interface for an authenticated cipher core (denoted by AEAD) is shown in Fig. 1 for single-pass algorithms, and in Fig. 2 for two-pass algorithms. For both types of algorithms, the interface includes three major data buses for:
– Public Data Inputs (PDI),
– Secret Data Inputs (SDI), and
– Data Outputs (DO),
respectively, as well as the corresponding handshaking control signals, named valid and ready. The valid signal indicates that the data is ready at the source, and the ready signal indicates that the destination is ready to receive it.

Additionally, for two-pass algorithms, an external FIFO of the size of at least 2^11 bytes is assumed to be used for storing intermediate results. This FIFO is connected to AEAD using Memory Ports: fifo_di, fifo_do, fifo_rd, and fifo_wr. The width of the data ports, fifo_di and fifo_do, denoted as mw, should typically be set to the width of intermediate blocks in a given two-pass algorithm.

An optional output, status_ready, is assumed to be active whenever an authenticated encryption or decryption has completed, and the Status block, shown in Figs. 8, 10(b), 11(b), 12(b), 13(b), is present at the do_data output. The presence of this signal can substantially simplify the operation of an external circuit used to discard any decrypted messages that did not pass authentication.

The physical separation of Public Data Inputs (such as the message, associated data, public message number, etc.) from Secret Data Inputs (such as the key) is dictated by the resistance against any potential attacks aimed at accepting public data, manipulated by an adversary, as a new key.

The handshaking signals are a subset of major signals used in the AXI4-Stream interface [1]. As a result, AEAD can communicate directly with the AXI4-Stream Master through the Public Data Input, and with the AXI4-Stream Slave through the Data Output, as shown in Fig. 3. At the same time, AEAD is also capable of communicating with much simpler external circuits, such as FIFOs, as shown in Fig. 4. In both cases, the Secret Data Input is connected to a FIFO, as the amount of data loaded to the core using this input port does not justify the use of a separate AXI4-Stream Master, such as DMA.

Fig. 1: AEAD interface for single-pass authenticated ciphers (ports: clk, rst; PDI: pdi_data [w bits], pdi_valid, pdi_ready; SDI: sdi_data [sw bits], sdi_valid, sdi_ready; DO: do_data [w bits], do_valid, do_ready; status_ready)

Fig. 2: AEAD interface for two-pass authenticated ciphers (ports of Fig. 1 plus Memory Ports: fifo_di [mw bits], fifo_do [mw bits], fifo_rd, fifo_wr)

Fig. 3: Typical external circuits for single-pass algorithms: AXI4-Stream IPs (an AXI4-Stream Master drives the PDI ports, an AXI4-Stream Slave receives the DO ports, and an SDI FIFO drives the SDI ports)

Fig. 4: Typical external circuits for single-pass algorithms: FIFOs (a PDI FIFO and an SDI FIFO, each with separate wr_clk and rd_clk = clk, drive the PDI and SDI ports; a DO FIFO with wr_clk = clk and a separate rd_clk receives the DO ports)

An additional advantage of using FIFOs at all data ports is their potential role as suitable boundaries between the two clock domains, used for communication and computations, respectively. This role is facilitated by the use of separate read and write clocks, shown in Fig. 4 as rd_clk and wr_clk, respectively. For better compatibility with the AXI communication interface, all FIFOs mentioned in our description are assumed to operate in the First-Word Fall-Through mode (as opposed to the standard mode). The reset input can be either synchronous or asynchronous, and either active-high or active-low, depending on the conventions used in a given technology (e.g., FPGA vs. ASIC), as well as the personal preference of the designers.

3 Communication Protocol

All parts of a typical input and a typical output of an authenticated cipher are shown in Fig. 5, for encryption and decryption, respectively. Npub denotes Public Message Number, such as Nonce or Initialization Vector. Nsec denotes Secret Message Number, which was recently introduced in some authenticated ciphers and is a part of the CAESAR software API [2]. Both Npub and Nsec are typically assumed to be unique for each message encrypted using a given key. The difference is that Npub is sent to the other side in clear, while Nsec is sent in the encrypted form.

The proposed format of the Secret Data Input is shown in Fig. 6. The entire input starts with an instruction, which in the case of SDI is limited to Load Key (LDKEY). The instruction is followed by segments. Each segment starts with a separate header, describing its type and size. In the case of SDI, the only segment type necessary to meet the minimum compliance criteria is Key, denoting a string of bits carrying an authenticated cipher key.

The proposed format of the Public Data Input is shown in Fig. 7. The allowed instruction types are: Activate Key (ACTKEY), Authenticated Encryption (ENC), and Authenticated Decryption (DEC). The Activate Key instruction typically directly precedes the Authenticated Encryption or Authenticated Decryption instruction. PDI is divided into segments. Segment types allowed during authenticated encryption include: Public Message Number (Npub), Secret Message Number (Nsec), Associated Data (AD), and Message. Segment types allowed during authenticated decryption include: Public Message Number (Npub), Encrypted Secret Message Number (Enc Nsec), Associated Data (AD), Ciphertext, and Tag.

Encryption: inputs Npub, Nsec, AD, Message, Key; outputs Enc Nsec, Ciphertext, Tag, Status
Decryption: inputs Npub, Enc Nsec, AD, Ciphertext, Tag, Key; outputs Nsec, Message, Status

Fig. 5: Proposed input and output of an authenticated cipher core. Notation: Npub - Public Message Number, Nsec - Secret Message Number, Enc Nsec - Encrypted Secret Message Number, AD - Associated Data

instruction = LDKEY
seg_0_header
seg_0 = Key

Fig. 6: Format of Secret Data Input for loading the key

(a)
instruction = ACTKEY
instruction = ENC
seg_0_header
seg_0 = Npub
seg_1_header
seg_1 = AD
seg_2_header
seg_2 = Msg

(b)
instruction = ACTKEY
instruction = ENC
seg_0_header
seg_0 = Npub
seg_1_header
seg_1 = AD_0
seg_2_header
seg_2 = AD_1
seg_3_header
seg_3 = Msg_0
seg_4_header
seg_4 = Msg_1

Fig. 7: Format of Public Data Input in case of a) one segment for each data type, b) multiple segments for AD and Message

Instruction/Status word (MSB to LSB): Opcode or Status (4 bits) | Reserved (12 bits)

Opcode:
0010 − Authenticated Encryption (ENC)
0011 − Authenticated Decryption (DEC)
0100 − Load Key (LDKEY)
0111 − Activate Key (ACTKEY)
Others − Reserved

Status:
1110 − Success
1111 − Failure

Note: If w < 16, more than one word should be used.

Fig. 8: Instruction/Status format
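The instruction/status encoding of Fig. 8 can be sketched in Python as shown below. All function names are hypothetical. The opcode and status values are taken from Fig. 8. For w ≥ 16, the sketch assumes that the 16-bit value is left-aligned in the word with the remaining bits set to zero, a detail the figure does not specify.

```python
OPCODES = {"ENC": 0b0010, "DEC": 0b0011, "LDKEY": 0b0100, "ACTKEY": 0b0111}
STATUS = {0b1110: "Success", 0b1111: "Failure"}


def instruction_word16(name: str) -> int:
    """Opcode in the 4 most significant bits, 12 reserved bits set to 0."""
    return OPCODES[name] << 12


def decode_status(word16: int) -> str:
    """Read the Status field from the 4 most significant bits."""
    return STATUS[(word16 >> 12) & 0xF]


def to_port_words(value16: int, w: int) -> list:
    """Split a 16-bit instruction/status value into w-bit port words."""
    if w >= 16:
        # Assumption: left-aligned in a wider word, zero-filled below.
        return [value16 << (w - 16)]
    if 16 % w != 0:
        raise ValueError("w must divide 16 when w < 16")
    mask = (1 << w) - 1
    count = 16 // w
    # Most significant word first.
    return [(value16 >> (16 - w * (i + 1))) & mask for i in range(count)]
```

With w = 8, an ENC instruction becomes the two words 0x20 and 0x00, which is the "more than one word" case in the note of Fig. 8.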

Segment header (MSB to LSB): Info (8 bits) | Reserved (8 bits) | Segment Length (16 bits)
Info field: Segment Type (4 bits) and four single-bit fields, including the EOI and EOT flags and reserved bits
The header is divided into ceil(32/w) words, starting from the MSB.

EOT = 1 if the last segment of its type (e.g., AD, Message, Ciphertext), 0 otherwise
EOI = 1 if the last segment of input, 0 otherwise

Fig. 9: Segment Header format
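A segment header following Fig. 9 could be assembled as in the Python sketch below. The function names are hypothetical. The bit positions inside the Info field are an assumption made only for illustration (Segment Type in the upper four bits, EOI at bit 1, EOT at bit 0). Left-aligning the header when w does not divide 32 is likewise assumed.

```python
def segment_header(seg_type: int, length: int, eot: bool, eoi: bool) -> int:
    """Build a 32-bit header: Info (8) | Reserved (8) | Segment Length (16)."""
    if not 0 <= seg_type <= 0xF:
        raise ValueError("segment type must fit in 4 bits")
    if not 0 <= length <= 0xFFFF:
        raise ValueError("segment length must fit in 16 bits")
    # Assumed Info layout: type in bits 7..4, EOI in bit 1, EOT in bit 0.
    info = (seg_type << 4) | (int(eoi) << 1) | int(eot)
    return (info << 24) | length  # the Reserved byte stays zero


def header_words(header: int, w: int) -> list:
    """Divide the header into ceil(32/w) words, starting from the MSB."""
    count = -(-32 // w)  # ceiling division
    # Assumption: left-align and zero-fill if w does not divide 32.
    padded = header << (count * w - 32)
    mask = (1 << w) - 1
    return [(padded >> (w * (count - 1 - i))) & mask for i in range(count)]
```

For an 8-bit port, a header is sent as four words, and for a 16-bit port as two, matching the ceil(32/w) rule in the figure.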

Any segment type can be omitted, if it is not required by a given cipher. However, empty AD, empty message, and empty ciphertext must be provided using a separate segment, with the Segment Length field of the respective header set to 0. Public and Secret Message Numbers can only use one segment, as their sizes are typically quite small (in the range of 16 bytes). The Associated Data and Message can be (but do not have to be) divided into multiple segments (as shown in Fig. 7). The maximum size of each segment is assumed to be 2^16 − 1 bytes for single-pass authenticated ciphers, and 2^11 − 1 bytes for two-pass authenticated ciphers. The primary reasons for dividing AD and Message into multiple segments are that the full message size may be unknown when authenticated encryption starts, and/or the maximum single segment size (specified above) is smaller than the message size.

The instruction/status format is shown in Fig. 8. For instruction, the Opcode field determines which operation should be executed next. For status, the Opcode field is replaced by the Status field, which can be set to only two values: Success or Failure.

The segment header format is shown in Fig. 9. Segment Length is the size of a segment expressed in bytes. The field Info contains information about the Segment Type (as defined in Table 1), as well as single-bit flags denoting the last segment of a particular type (EOT), and the last segment of the entire input (EOI), respectively.

Table 1: Segment Type encoding

Encoding  Type
0000      Reserved
0001      AD
0010      Npub||AD
0011      AD||Npub
0100      Message
0101      Ciphertext
0110      Ciphertext||Tag
0111      Reserved
1000      Nsec
1001      Enc Nsec
1010      Reserved
1011      Reserved
1100      Npub
1101      Tag
1110      Length
1111      Key

For the majority of algorithms, each segment is associated with a single part of input (such as AD, Message, Npub, Nsec), or a single part of output (such as Ciphertext and Enc Nsec). A segment can also be of the type Tag, which is an output segment for authenticated encryption, and an input segment for authenticated decryption.

For some algorithms, the internal implementation can be significantly simplified (and the resource utilization considerably reduced) under the assumption that Npub and AD are provided at the pdi_data input as a part of the same segment, and thus, they are already pre-formatted, by concatenating them first, and only afterwards dividing the obtained string of bytes into words. This option is particularly important when the size of Npub is not a multiple of the word size, and/or AD is expected to be processed before Npub, and is not padded. In order to simplify the implementations of such algorithms, the segments carrying the concatenation of both input parts, namely Npub || AD and AD || Npub, are defined in Table 1. Similarly, for the algorithms that either
– do not define a clear separation between the Ciphertext and the Tag, or
– assume that these two parts of the output appear one after another, without filling any unused bits of the last word of the ciphertext with zeros,
the segment type Ciphertext || Tag is defined.

Hardware designers should be aware that a suboptimal choice of the segment types can substantially increase the implementation area, and on top of that may lead to the need for reformatting an output from authenticated encryption before providing it at the input of an alternative circuit for authenticated decryption, implemented by a different group.
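The encodings of Table 1 can be captured in a short Python sketch. The enumeration member names and both helper functions are hypothetical. The numeric values are those listed in Table 1, with reserved encodings left out.

```python
from enum import IntEnum


class SegmentType(IntEnum):
    """Segment Type encodings from Table 1 (reserved values omitted)."""
    AD = 0b0001
    NPUB_AD = 0b0010          # Npub||AD
    AD_NPUB = 0b0011          # AD||Npub
    MESSAGE = 0b0100
    CIPHERTEXT = 0b0101
    CIPHERTEXT_TAG = 0b0110   # Ciphertext||Tag
    NSEC = 0b1000
    ENC_NSEC = 0b1001
    NPUB = 0b1100
    TAG = 0b1101
    LENGTH = 0b1110
    KEY = 0b1111


def decode_segment_type(code: int):
    """Return the SegmentType for a 4-bit code, or None if reserved."""
    try:
        return SegmentType(code)
    except ValueError:
        return None


def combined_npub_ad(npub: bytes, ad: bytes, npub_first: bool = True):
    """Concatenate Npub and AD before they are divided into words."""
    if npub_first:
        return SegmentType.NPUB_AD, npub + ad
    return SegmentType.AD_NPUB, ad + npub
```

Concatenating first and dividing into words afterwards is what lets a 12-byte Npub, for instance, share its last word with the beginning of AD instead of being zero-filled.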
As a result, the choice of segment types should be clearly reported in the documentation of the cipher implementation, as is the case for the port width parameters w and sw.

(a) PDI: instruction = ACTKEY, instruction = ENC, seg_0_header, seg_0 = Npub, seg_1_header, seg_1 = AD, seg_2_header, seg_2 = Message
(b) DO: seg_0_header, seg_0 = Ciphertext, seg_1_header, seg_1 = Tag, Status

Fig. 10: Format of Public Data Input (PDI) and Data Output (DO) of authenticated encryption operation for ciphers that do not use Nsec: a) PDI, b) DO

(a) PDI: instruction = ACTKEY, instruction = DEC, seg_0_header, seg_0 = Npub, seg_1_header, seg_1 = AD, seg_2_header, seg_2 = Ciphertext, seg_3_header, seg_3 = Tag
(b) DO: seg_0_header, seg_0 = Message, Status

Fig. 11: Format of Public Data Input (PDI) and Data Output (DO) of authenticated decryption operation for ciphers that do not use Nsec: a) PDI, b) DO

(a) PDI: instruction = ACTKEY, instruction = ENC, seg_0_header, seg_0 = Npub, seg_1_header, seg_1 = Nsec, seg_2_header, seg_2 = AD, seg_3_header, seg_3 = Message
(b) DO: seg_0_header, seg_0 = Enc Nsec, seg_1_header, seg_1 = Ciphertext, seg_2_header, seg_2 = Tag, Status

Fig. 12: Format of Public Data Input (PDI) and Data Output (DO) of authenticated encryption operation for ciphers that use Nsec: a) PDI, b) DO

(a) PDI: instruction = ACTKEY, instruction = DEC, seg_0_header, seg_0 = Npub, seg_1_header, seg_1 = Enc Nsec, seg_2_header, seg_2 = AD, seg_3_header, seg_3 = Ciphertext, seg_4_header, seg_4 = Tag
(b) DO: seg_0_header, seg_0 = Nsec, seg_1_header, seg_1 = Message, Status

Fig. 13: Format of Public Data Input (PDI) and Data Output (DO) of authenticated decryption operation for ciphers that use Nsec: a) PDI, b) DO

Figures 10 and 11 present the typical format of input (PDI) and output (DO) of the authenticated encryption and decryption operations, respectively, for the ciphers that do not use Nsec. At the input (PDI ports), a message typically starts with the key activation instruction (ACTKEY), followed by an operational instruction (ENC or DEC). Header and data segments for different types of data subsequently follow. For the encryption and decryption operations, the order typically is Npub, AD, Data (Message or Ciphertext), and Tag (for decryption only). It must be noted that the order of these segments can be interchanged for maximum efficiency depending on the particular algorithm. For ciphers that do not use Nsec, at the output (DO ports), the cryptographic core needs to output only the ciphertext and the tag for encryption, and the message for decryption. In the case that Nsec is used, additional segments should be added as shown in Figures 12 and 13.

An output from encryption and decryption is always followed by the Status block, shown in Fig. 8. For decryption, Status=Failure means that the authentication failed. For both encryption and decryption, this value of the Status could also be used in the future to indicate the detection of any input formatting errors. Nevertheless, to simplify the implementations, no such input formatting check is required from the implementations compliant with the proposed hardware API at this point.
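The typical PDI sequence for encryption without Nsec can be sketched symbolically in Python, without committing to any bit-level layout. The `Segment` record and both functions are hypothetical. The sketch follows the order ACTKEY, ENC, Npub, AD, Message, splits AD and Message at the maximum segment size, and sets EOT and EOI as defined for Fig. 9.

```python
from dataclasses import dataclass, replace
from typing import List, Union

MAX_SEGMENT_BYTES = 2**16 - 1  # single-pass limit quoted in the text


@dataclass(frozen=True)
class Segment:
    seg_type: str
    data: bytes
    eot: bool = False  # last segment of its type
    eoi: bool = False  # last segment of the entire input


def split_segments(seg_type: str, data: bytes, max_len: int) -> List[Segment]:
    """Split data into segments; empty data still yields one empty segment."""
    chunks = [data[i:i + max_len] for i in range(0, len(data), max_len)] or [b""]
    last = len(chunks) - 1
    return [Segment(seg_type, c, eot=(i == last)) for i, c in enumerate(chunks)]


def build_pdi_encrypt(npub: bytes, ad: bytes, msg: bytes,
                      max_len: int = MAX_SEGMENT_BYTES) -> List[Union[str, Segment]]:
    """Symbolic PDI stream for authenticated encryption without Nsec."""
    if len(npub) > max_len:
        raise ValueError("Npub must fit in a single segment")
    segments = [Segment("Npub", npub, eot=True)]
    segments += split_segments("AD", ad, max_len)
    segments += split_segments("Message", msg, max_len)
    segments[-1] = replace(segments[-1], eoi=True)  # mark the end of input
    return ["ACTKEY", "ENC", *segments]
```

Note that an empty AD still produces one zero-length AD segment, in line with the rule that empty AD, message, and ciphertext are provided using a separate segment with Segment Length set to 0.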

seg_hdr = Length
32-bit Len_AD
32-bit Len_Data

Fig. 14: Format of the optional Length segment

(a) Memory

Addr   n     n+1   n+2   n+3   n+4   n+5   n+6   n+7
Value  D[0]  D[1]  D[2]  D[3]  D[4]  D[5]  D[6]  D[7]

(b) 32-bit word representation (bit 31 on the left, bit 0 on the right)

word 0  D[0]  D[1]  D[2]  D[3]
word 1  D[4]  D[5]  D[6]  D[7]

Fig. 15: Conversion of (a) a string of bytes in a memory of a computer system into (b) a string of 32-bit words at the inputs and outputs of AEAD

For some authenticated ciphers (e.g., AES-CCM), the entire lengths of associated data and message/ciphertext have to be known before the encryption/decryption starts. In order to make it possible, an optional Segment Type, called Length, is defined. This segment contains only the total length of associated data concatenated with the total length of message/ciphertext, expressed in bytes. In a typical usage, the Length segment is placed right after the instruction Authenticated Encryption (ENC) or Authenticated Decryption (DEC). The exact format of this segment is shown in Fig. 14.

For the purpose of full compatibility with software implementations, in Fig. 15, we define the dependence between data stored at consecutive locations in a memory of a computer system, and w-bit data words appearing at the ports of AEAD, such as pdi_data, sdi_data, and do_data. For simplicity, in this figure, w is set to 32. The assumed dependence corresponds to the big-endian convention for an order of bytes within a word.
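The byte-to-word mapping of Fig. 15 and the payload of the Length segment of Fig. 14 can be sketched in Python as follows. The function names are hypothetical. Zero-filling the unused part of the last word follows Section 1.5, and the big-endian byte order of the two 32-bit length fields is an assumption carried over from the word convention.

```python
from typing import List


def bytes_to_words(data: bytes, w: int = 32) -> List[int]:
    """Pack bytes into w-bit words, big-endian, zero-filling the last word."""
    if w % 8 != 0:
        raise ValueError("w must be a multiple of 8")
    step = w // 8
    words = []
    for i in range(0, len(data), step):
        chunk = data[i:i + step].ljust(step, b"\x00")  # zero-fill unused bytes
        words.append(int.from_bytes(chunk, "big"))
    return words


def length_segment_payload(len_ad: int, len_data: int) -> bytes:
    """32-bit Len_AD followed by 32-bit Len_Data, both in bytes."""
    # Assumption: each length field is stored big-endian.
    return len_ad.to_bytes(4, "big") + len_data.to_bytes(4, "big")
```

For the eight bytes D[0] to D[7] of Fig. 15, `bytes_to_words` returns two words with D[0] and D[4] in the most significant byte positions.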

4 Timing Characteristics

Figures 16, 17, and 18 specify the timing characteristics of the ports PDI, SDI, and DO, respectively. Input ports are shown in blue and the output ports in red. The contents of data buses are read and acknowledged when *_valid and its corresponding *_ready are both asserted. Data is assumed to be present at the output of the source module when *_valid is asserted.
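The valid/ready handshake described above can be illustrated with a small cycle-by-cycle model in Python. The function is hypothetical and models only one port. It assumes a source that keeps valid asserted and holds each word stable until it is accepted, which is the behavior AXI4-Stream requires of a source.

```python
def simulate_handshake(words, ready_pattern):
    """Transfer words over a valid/ready port, one entry of ready per cycle.

    Returns the words received by the destination and a per-cycle trace
    of (valid, ready, data) tuples.
    """
    pending = list(words)
    received = []
    trace = []
    for ready in ready_pattern:
        valid = bool(pending)          # source has data to offer
        data = pending[0] if valid else None
        trace.append((valid, ready, data))
        if valid and ready:            # both asserted: word is accepted
            received.append(pending.pop(0))
    return received, trace
```

In the trace, a word appears in consecutive cycles whenever ready is low, and it advances only in a cycle where both signals are asserted.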

Fig. 16: Example timing diagram for PDI

Fig. 17: Example timing diagram for SDI

Fig. 18: Example timing diagram for DO

5 Conclusions

We propose the full specification of the hardware API for authenticated ciphers, suitable for hardware benchmarking of candidates competing in the CAESAR contest [2] and their comparison with a previous generation of authenticated encryption with associated data algorithms, such as AES-GCM and AES-CCM. Our proposal meets one of the fundamental properties of every properly defined API: If a given algorithm is implemented independently by two different groups using the same API, one should be able to
– encrypt a message using the first implementation, and
– decrypt it using the second implementation.

To be exact, our assumption is that either
1. both implementations use the same values of the data port widths w and sw, and the same segment types, or
2. simple reformatting of the input to decryption is performed outside of the cipher core (in software or hardware).

Examples of such reformatting include: word width conversion and/or splitting/concatenating two neighboring segment types, such as Npub||AD, Npub, and AD (see Table 1).

A similar API, described in [3], has been successfully used to implement and benchmark over a dozen Round 1 CAESAR candidates, all qualified to Round 2 of the competition.
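The word width conversion mentioned above can be sketched in Python as follows. The function name is hypothetical. The sketch assumes big-endian words whose unused trailing bytes are zero, and it needs the segment length in bytes to know where the data ends.

```python
from typing import List


def reformat_words(words: List[int], w_in: int, w_out: int,
                   num_bytes: int) -> List[int]:
    """Convert a segment from w_in-bit words to w_out-bit words."""
    if w_in % 8 or w_out % 8:
        raise ValueError("word widths must be multiples of 8")
    # Unpack to bytes and drop the zero fill of the last input word.
    raw = b"".join(word.to_bytes(w_in // 8, "big") for word in words)[:num_bytes]
    step = w_out // 8
    # Repack, zero-filling the unused part of the last output word.
    return [int.from_bytes(raw[i:i + step].ljust(step, b"\x00"), "big")
            for i in range(0, len(raw), step)]
```

A 5-byte segment held in two 32-bit words converts to three 16-bit words and back again without loss, so two cores with different w can exchange data through such a step.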

References

1. ARM. AMBA Specifications. [Online]. Available: http://www.arm.com/products/system-ip/amba-specifications.php
2. CAESAR: Competition for Authenticated Encryption: Security, Applicability, and Robustness. (2016, February) Cryptographic competitions. [Online]. Available: http://competitions.cr.yp.to/index.html
3. E. Homsirikamol, W. Diehl, A. Ferozpuri, F. Farahmand, M. U. Sharif, and K. Gaj, “GMU Hardware API for Authenticated Ciphers,” Cryptology ePrint Archive, Report 2015/669, 2015.
