Proposed CAESAR Hardware API Ekawat Homsirikamol, William Diehl, Ahmed Ferozpuri, Farnoud Farahmand, Panasayya Yalla, Jens-Peter Kaps, and Kris Gaj Cryptographic Engineering Research Group George Mason University Fairfax, Virginia 22030 email: {ehomsiri, wdiehl, aferozpu, ffarahma, pyalla, jkaps, kgaj}@gmu.edu
Abstract. In this paper, we propose a universal hardware Application Programming Interface (API) for authenticated ciphers. In particular, our API is intended to meet the requirements of all algorithms submitted to the CAESAR competition. The major parts of our proposal include: minimum compliance criteria, interface, communication protocol, and timing characteristics supported by the core. All of them have been defined with the goals of guaranteeing (a) compatibility among implementations of the same algorithm by different designers, and (b) fair benchmarking of authenticated ciphers in hardware.
1
Minimum Compliance Criteria
The proposed minimum compliance criteria are listed below:
1.1
Encryption/Decryption
Authenticated encryption and decryption should be implemented within one core, in such a way that only one of these two operations can be executed at a time (half-duplex). This feature demonstrates an algorithm’s ability to use shared resources for encryption and decryption. Alternatives (not recommended):
a) separate cores for encryption and decryption (simplex),
b) authenticated encryption and decryption within one core, with both operations capable of running in parallel (full-duplex).
1.2
Variants
Only a variant indicated in the cipher specification as the primary recommendation has to be implemented. Other variants, if implemented, should be selectable by changing the default values of generics or constants before synthesis. Implementation of these variants should not affect any benchmarking results for the main variant.
1.3
Key scheduling
Key scheduling should be fully implemented within the hardware core. This approach takes into account very different contributions of the key scheduling unit to the entire cipher core area, which are specific for each algorithm. An alternative (not recommended): a) generation of round keys outside of the cipher core, e.g., in software.
1.4
Incomplete blocks
The core should properly handle incomplete blocks in associated data, message, and ciphertext. The resources and the numbers of clock cycles required to handle incomplete blocks differ among candidates. For multiple candidates, handling of incomplete blocks can substantially affect the area and/or speed of the core. Of particular concern are variable shifts, which typically introduce a significant overhead in terms of area and/or timing. Often, a trade-off between the area and the number of clock cycles necessary to process an incomplete block can be made by choosing an appropriate detailed hardware architecture. An alternative (not recommended):
a) handling only associated data, messages, and ciphertexts composed of full blocks.
1.5
Padding
Padding should be implemented in hardware, assuming that an unused portion of the last input data word is filled with zeros. Padding cost, in terms of area, is algorithm dependent and not negligible. In some algorithms, padding in software may need to be reversed in hardware because the tag calculation uses an unpadded last block. Alternatives (not recommended):
a) padding in hardware, assuming that an unused portion of the last block is filled with zeros,
b) padding in software, followed, if needed, by modifications of the last blocks in hardware.
1.6
Unused portions of the last word
Any unused portions of the last word generated during encryption and decryption should be cleared (filled with zeros) before releasing this word outside of the cipher core. An alternative (not recommended):
a) potentially leaking some key-related data using unused portions of the last word.
1.7
Decrypted message release
The decrypted message blocks should be released immediately, and buffered outside of the cipher core, before the result of authentication is known. We assume that the delayed release of decrypted messages, dependent on the result of authentication, will be handled by an external circuit, which is FIFO-based and similar for each candidate. Please note that we believe that such a unit MUST be implemented externally to the basic cipher core before deploying the core in the majority of real-world scenarios. We omit this important feature from the basic core only because it places a substantial burden on the hardware designer, with little benefit in terms of better differentiation among the CAESAR candidates. Additionally, the resources used by this special external unit can be made identical (or almost identical) for all candidates, and are limited mostly to large blocks of memory, which are typically counted as separate resources, independent from logic gates in ASICs and reconfigurable logic units (such as LUTs, Slices, etc.) in FPGAs. An alternative (not recommended):
a) storing a decrypted message internally, until the result of verification is known.
Pros: More complete functionality.
Cons: Complicates the design and benchmarking. Also, makes the calculation of the output latency and throughput dependent on the output buffer size and implementation details (e.g., support for simultaneous reading and writing).
1.8
Empty AD/message/ciphertext
Empty AD, empty message/ciphertext, and empty input (no AD, no message/ciphertext) should be allowed. Empty input could be used together with the Public Message Number, Npub, for user authentication. Alternatives (not recommended):
a) not allowing empty AD,
b) not allowing empty message/ciphertext,
c) not allowing empty input.
1.9
Supported maximum size of AD/message/ciphertext
single-pass authenticated ciphers: 2^32 − 1 bytes
two-pass authenticated ciphers: 2^11 − 1 bytes
Maximum sizes defined in the CAESAR candidates’ specifications are unrealistic. Values that are too large may affect both area and maximum clock frequency of the hardware core (e.g., because of wide internal counters). 2^11 − 1 bytes > 1500 bytes = maximum transmission unit (MTU) of popular communication protocols, such as Ethernet v2.
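The arithmetic behind these limits can be checked with a short sketch. The helper name counter_bits is hypothetical and is used here only to illustrate how the maximum input size translates into the width of an internal byte counter.

```python
# Size limits quoted in Section 1.9, in bytes.
MAX_SINGLE_PASS = 2**32 - 1
MAX_TWO_PASS = 2**11 - 1
ETHERNET_V2_MTU = 1500  # bytes, as quoted in the text


def counter_bits(max_bytes: int) -> int:
    """Width of a byte counter able to hold every value in 0..max_bytes."""
    return max_bytes.bit_length()


print(counter_bits(MAX_SINGLE_PASS))   # 32
print(counter_bits(MAX_TWO_PASS))      # 11
print(MAX_TWO_PASS > ETHERNET_V2_MTU)  # True, since 2047 > 1500
```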
1.10
Fractions of bytes
The size of all inputs is assumed to be expressed in bytes. As a result, the core should support only inputs composed of full bytes. No fractions of bytes should be allowed. An alternative (not recommended): a) the size of inputs expressed in bits. Allowing inputs of arbitrary size in bits would substantially increase the area required for handling of incomplete blocks and padding.
1.11
Maximum number of independent streams of data processed in parallel
The core should process only one stream of data at a time without an overlap. By a stream of data we understand here a single independent input composed of any subset of Npub, Nsec, AD, Message, Ciphertext, and Tag, supported by the encryption or decryption operation of a given authenticated cipher. We call a core with such features a single-stream implementation. Please note that a single-stream implementation may still take advantage of parallel processing for blocks belonging to the same input/stream. An alternative (not recommended):
a) a multi-stream implementation that supports the processing of multiple independent inputs/streams in parallel.
In the multi-stream implementations:
– Throughput is limited only by the maximum circuit area.
– Multiple messages/ciphertexts processed in parallel would require multiple Public Data Input (PDI) and Data Output (DO) ports. See Section 2 for the detailed descriptions of these ports.
1.12
External memory
Single-pass algorithms: No
Two-pass algorithms: Yes (but only for results of the first pass)
1.13
Number of clock domains
The entire authenticated cipher core should have only one clock input. For the maximum performance, this clock should operate at the maximum clock frequency determined by the critical path located entirely inside of the hardware module. An alternative (not recommended):
a) separate clocks for the input module, the output module, and the encryption/decryption unit.
Pros: Possibly smaller data bus widths.
Cons: Difficulties with determining the maximum clock frequency of the cipher core.
1.14
Passing unchanged parts of the input to the output
Parts of the data inputs that are not changed by encryption or decryption operations, respectively, are not passed to the output. In particular, Npub and AD are not a part of the output from either encryption or decryption. See Fig. 5. This assumption removes the need for any bypass FIFO necessary to pass any unchanged data to the output. Any formatting of an output from decryption, for the purpose of transmission through the network or decryption, is assumed to be performed outside of the cipher core. An alternative (not recommended):
a) passing unchanged parts of the input to the output.
Pros: More complete functionality.
Cons: The design time and area overhead for adding standard functionality that may be implemented in a coherent way outside of the authenticated cipher core.
1.15
Permitted widths of data ports (in bits)
Public Data Input (PDI) and Data Output (DO) ports:
Lightweight implementations: w = 8, 16, 32
High-speed implementations: 32 ≤ w ≤ 256
Secret Data Input (SDI) ports:
Lightweight implementations: w = 8, 16, 32
High-speed implementations: 32 ≤ sw ≤ 64
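The permitted ranges listed above can be expressed as a simple check. This is only a sketch: the function name check_widths is hypothetical, and it assumes that the lightweight limit listed for the SDI ports applies to sw, which the list above does not state explicitly.

```python
LIGHTWEIGHT_WIDTHS = (8, 16, 32)


def check_widths(w: int, sw: int, lightweight: bool) -> bool:
    """Return True if the PDI/DO width w and the SDI width sw are permitted.

    Assumption: for lightweight cores, sw is drawn from the same set as w.
    """
    if lightweight:
        return w in LIGHTWEIGHT_WIDTHS and sw in LIGHTWEIGHT_WIDTHS
    return 32 <= w <= 256 and 32 <= sw <= 64


print(check_widths(8, 8, lightweight=True))     # True
print(check_widths(40, 32, lightweight=False))  # True (an unusual width such as 40 is in range)
print(check_widths(512, 32, lightweight=False)) # False
```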
See Section 2 and Figs. 1 and 2 for the exact meaning of PDI, SDI, DO, w and sw. Implementations of a particular authenticated cipher, with the same w and sw, following all other minimum compliance criteria, should be mutually compatible. Implementations with different values of w or sw should be compatible under the assumption that the decryption input is reformatted in software or hardware (from one input word width to another) using a universal function/circuit, common for all candidates.
1.16
Decrypted message authentication
The result of the decrypted message authentication (Success or Failure) is calculated within the core itself. An alternative (not recommended):
a) Calculating only the full value of the Tag during authenticated decryption. No comparison with the expected value of the Tag.
Pros: Simpler input for decryption (no expected Tag). Lower resource requirements (no comparison between the actual and expected Tag).
Cons: Inconsistent behavior for algorithms that use the Tag and those that require only the Ciphertext for authentication. Potential need for the tag comparison outside of the cipher core.
1.17
Interface, Communication Protocol, and Timing Characteristics
For the purpose of full compatibility,
– the interface of the circuit should be consistent with the interface defined in Section 2,
– the communication protocol of the circuit should be consistent with the protocol defined in Section 3, and
– the timing dependencies between data and control signals of the respective ports of the cipher core should be consistent with the dependencies described in Section 4.
All the aforementioned elements of the hardware API are closely related to each other, and are selected in a coherent fashion. The advantages include the ease of communication with standard components of modern digital systems, such as FIFOs (First-In First-Out units), DMAs (Direct Memory Access units), AXI (Advanced eXtensible Interface) Master and Slave units, etc. No complex input or output modules are required. Standard buses, with the typical widths, such as 32, 64, 128, and 256 bits, can be used for communication with the core. Additionally, unusual bus widths, suitable for some algorithms, such as 40 bits in the case of PRIMATEs, can be utilized. On top of that, the core can be easily extended with an external circuit used to prevent the release of unauthenticated decrypted messages outside of a given hardware device, such as an FPGA, an ASIC, or a programmable system on chip.
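The behavior of the external circuit that withholds unauthenticated decrypted messages can be modeled in software as follows. This is a minimal sketch, not a description of any actual circuit: the class name ReleaseGate and its methods are hypothetical, and only the Status codes are taken from Fig. 8.

```python
from collections import deque

SUCCESS = 0b1110  # Status codes as listed in Fig. 8
FAILURE = 0b1111


class ReleaseGate:
    """Buffers decrypted words and releases them only after Status = Success."""

    def __init__(self):
        self._buffer = deque()

    def push_word(self, word: int) -> None:
        # Called for every decrypted word leaving the cipher core.
        self._buffer.append(word)

    def on_status(self, status: int) -> list:
        # Release the buffered words on Success, discard them otherwise.
        words = list(self._buffer) if status == SUCCESS else []
        self._buffer.clear()
        return words


gate = ReleaseGate()
gate.push_word(0x11223344)
gate.push_word(0x55667788)
print(gate.on_status(FAILURE))  # [] because authentication failed
```

In hardware, the buffer would correspond to a FIFO whose depth bounds the largest message that can be withheld.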
2
Interface
The general idea of the proposed interface for an authenticated cipher core (denoted by AEAD) is shown in Fig. 1 for single-pass algorithms, and in Fig. 2 for two-pass algorithms. For both types of algorithms, the interface includes three major data buses for:
– Public Data Inputs (PDI),
– Secret Data Inputs (SDI), and
– Data Outputs (DO),
respectively, as well as the corresponding handshaking control signals, named valid and ready. The valid signal indicates that the data is ready at the source, and the ready signal indicates that the destination is ready to receive it.
Additionally, for two-pass algorithms, an external FIFO of the size of at least 2^11 bytes is assumed to be used for storing intermediate results. This FIFO is connected to AEAD using Memory Ports: fifo_di, fifo_do, fifo_rd, and fifo_wr. The width of the data ports, fifo_di and fifo_do, denoted as mw, should typically be set to the width of intermediate blocks in a given two-pass algorithm.
An optional output, status_ready, is assumed to be active whenever an authenticated encryption or decryption has completed, and the Status block, shown in Figs. 8, 10(b), 11(b), 12(b), and 13(b), is present at the do_data output. The presence of this signal can substantially simplify the operation of an external circuit used to discard any decrypted messages that did not pass authentication.
The physical separation of Public Data Inputs (such as the message, associated data, public message number, etc.) from Secret Data Inputs (such as the key) is dictated by the resistance against any potential attacks aimed at accepting public data, manipulated by an adversary, as a new key.
The handshaking signals are a subset of major signals used in the AXI4-Stream interface [1]. As a result, AEAD can communicate directly with the AXI4-Stream Master through the Public Data Input, and with the AXI4-Stream Slave through the Data Output, as shown in Fig. 3. At the same time, AEAD is also capable of communicating with much simpler external circuits, such as FIFOs, as shown in Fig. 4. In both cases, the Secret Data Input is connected to a FIFO, as the amount of data loaded to the core using this input port does not justify the use of a separate AXI4-Stream Master, such as DMA.
Fig. 1: AEAD interface for single-pass authenticated ciphers (ports: clk, rst; PDI: pdi_data of width w, pdi_valid, pdi_ready; SDI: sdi_data of width sw, sdi_valid, sdi_ready; DO: do_data of width w, do_valid, do_ready, status_ready)
Fig. 2: AEAD interface for two-pass authenticated ciphers (ports of Fig. 1, plus Memory Ports: fifo_di and fifo_do of width mw, fifo_rd, fifo_wr)
Fig. 3: Typical external circuits for single-pass algorithms: AXI4-Stream IPs (AXI4-Stream Master with m_axis_tdata, m_axis_tvalid, m_axis_tready at PDI; AXI4-Stream Slave with s_axis_tdata, s_axis_tvalid, s_axis_tready at DO; SDI FIFO with dout, empty, read at SDI)
Fig. 4: Typical external circuits for single-pass algorithms: FIFOs (PDI FIFO and SDI FIFO with wr_clk, rd_clk = clk, dout, empty, read; DO FIFO with wr_clk = clk, rd_clk, din, write, full)
An additional advantage of using FIFOs at all data ports is their potential role as suitable boundaries between the two clock domains, used for communication and computations, respectively. This role is facilitated by the use of separate read and write clocks, shown in Fig. 4 as rd_clk and wr_clk, respectively. For better compatibility with the AXI communication interface, all FIFOs mentioned in our description are assumed to operate in the First-Word Fall-Through mode (as opposed to the standard mode).
The reset input can be either synchronous or asynchronous, and either active-high or active-low, depending on the conventions used in a given technology (e.g., FPGA vs. ASIC), as well as the personal preference of the designers.
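The reason the First-Word Fall-Through mode fits the valid/ready handshake can be illustrated with a small behavioral model. This is a sketch under stated assumptions: the class FwftFifo and the function clock_cycle are hypothetical, and the wiring pdi_valid = not empty, read = pdi_valid and pdi_ready is an assumed connection that the text above does not spell out.

```python
from collections import deque


class FwftFifo:
    """First-Word Fall-Through FIFO: the head word is visible on dout
    as soon as the FIFO is non-empty, before any read strobe."""

    def __init__(self):
        self._q = deque()

    def write(self, word: int) -> None:
        self._q.append(word)

    @property
    def empty(self) -> bool:
        return not self._q

    @property
    def dout(self):
        return self._q[0] if self._q else None

    def read(self) -> None:
        # Read strobe: drop the head word, exposing the next one on dout.
        self._q.popleft()


def clock_cycle(fifo: FwftFifo, pdi_ready: bool):
    """One clock cycle of a FIFO-to-core transfer (assumed wiring)."""
    pdi_valid = not fifo.empty       # assumption: valid is the inverted empty flag
    if pdi_valid and pdi_ready:      # transfer only when both are asserted
        word = fifo.dout
        fifo.read()                  # assumption: read = pdi_valid and pdi_ready
        return word
    return None


fifo = FwftFifo()
fifo.write(0xA)
fifo.write(0xB)
print([clock_cycle(fifo, r) for r in (False, True, True, True)])
# [None, 10, 11, None]
```

With a standard (non-FWFT) FIFO, the data would appear one cycle after the read strobe, so the valid signal could not be derived directly from the empty flag.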
3
Communication Protocol
All parts of a typical input and a typical output of an authenticated cipher are shown in Fig. 5, for encryption and decryption, respectively. Npub denotes Public Message Number, such as Nonce or Initialization Vector. Nsec denotes Secret Message Number, which was recently introduced in some authenticated ciphers and is a part of the CAESAR software API [2]. Both Npub and Nsec are typically assumed to be unique for each message encrypted using a given key. The difference is that Npub is sent to the other side in the clear, while Nsec is sent in encrypted form.
The proposed format of the Secret Data Input is shown in Fig. 6. The entire input starts with an instruction, which in the case of SDI is limited to Load Key (LDKEY). The instruction is followed by segments. Each segment starts with a separate header, describing its type and size. In the case of SDI, the only segment type necessary to meet the minimum compliance criteria is Key, denoting a string of bits carrying an authenticated cipher key.
The proposed format of the Public Data Input is shown in Fig. 7. The allowed instruction types are: Activate Key (ACTKEY), Authenticated Encryption (ENC), and Authenticated Decryption (DEC). The Activate Key instruction typically directly precedes the Authenticated Encryption or Authenticated Decryption instruction. PDI is divided into segments. Segment types allowed during authenticated encryption include: Public Message Number (Npub), Secret Message Number (Nsec), Associated Data (AD), and Message. Segment types allowed during authenticated decryption include: Public Message Number (Npub), Encrypted Secret Message Number (Enc Nsec), Associated Data (AD), Ciphertext, and Tag.
Encryption: inputs Npub, Nsec, AD, Message, Key; outputs Enc Nsec, Ciphertext, Tag, Status
Decryption: inputs Npub, Enc Nsec, AD, Ciphertext, Tag, Key; outputs Nsec, Message, Status
Fig. 5: Proposed input and output of an authenticated cipher core. Notation: Npub - Public Message Number, Nsec - Secret Message Number, Enc Nsec - Encrypted Secret Message Number, AD - Associated Data
instruction = LDKEY
seg_0_header
seg_0 = Key
Fig. 6: Format of Secret Data Input for loading the key
(a) instruction = ACTKEY, instruction = ENC, seg_0_header, seg_0 = Npub, seg_1_header, seg_1 = AD, seg_2_header, seg_2 = Msg
(b) instruction = ACTKEY, instruction = ENC, seg_0_header, seg_0 = Npub, seg_1_header, seg_1 = AD_0, seg_2_header, seg_2 = AD_1, seg_3_header, seg_3 = Msg_0, seg_4_header, seg_4 = Msg_1
Fig. 7: Format of Public Data Input in case of a) one segment for each data type, b) multiple segments for AD and Message
Instruction/Status word, from MSB to LSB: Opcode or Status (4 bits), Reserved (12 bits)
Opcode: 0010 − Authenticated Encryption (ENC), 0011 − Authenticated Decryption (DEC), 0100 − Load Key (LDKEY), 0111 − Activate Key (ACTKEY), Others − Reserved
Status: 1110 − Success, 1111 − Failure
Note: If w < 16, more than one word should be used
Fig. 8: Instruction/Status format
Segment header, from MSB to LSB: Info (8 bits), Reserved (8 bits), Segment Length (16 bits)
Info: Segment Type (4 bits) and four single-bit fields, including EOI, EOT, and Reserved
EOT = 1 if the last segment of its type (e.g., AD, Message, Ciphertext), 0 otherwise
EOI = 1 if the last segment of input, 0 otherwise
The header is divided into ceil(32/w) words, starting from MSB
Fig. 9: Segment Header format
Any segment type can be omitted if it is not required by a given cipher. However, empty AD, empty message, and empty ciphertext must be provided using a separate segment, with the Segment Length field of the respective header set to 0. Public and Secret Message Numbers can only use one segment, as their sizes are typically quite small (in the range of 16 bytes). The Associated Data and Message can be (but do not have to be) divided into multiple segments (as shown in Fig. 7). The maximum size of each segment is assumed to be 2^16 − 1 bytes for single-pass authenticated ciphers, and 2^11 − 1 bytes for two-pass authenticated ciphers. The primary reasons for dividing AD and Message into multiple segments are that the full message size may be unknown when authenticated encryption starts, and/or the maximum single segment size (specified above) is smaller than the message size.
The instruction/status format is shown in Fig. 8. For instruction, the Opcode field determines which operation should be executed next. For status, the Opcode field is replaced by the Status field, which can be set to only two values: Success or Failure.
The segment header format is shown in Fig. 9. Segment Length is the size of a segment expressed in bytes. The field Info contains information about the Segment Type (as defined in Table 1), as well as single-bit flags denoting the last segment of a particular type (EOT) and the last segment of the entire input (EOI), respectively.
Table 1: Segment Type encoding
Encoding  Type
0000      Reserved
0001      AD
0010      Npub||AD
0011      AD||Npub
0100      Message
0101      Ciphertext
0110      Ciphertext||Tag
0111      Reserved
1000      Nsec
1001      Enc Nsec
1010      Reserved
1011      Reserved
1100      Npub
1101      Tag
1110      Length
1111      Key
For the majority of algorithms, each segment is associated with a single part of input (such as AD, Message, Npub, Nsec), or a single part of output (such as Ciphertext and Enc Nsec). A segment can also be of the type Tag, which is an output segment for authenticated encryption, and an input segment for authenticated decryption.
For some algorithms, the internal implementation can be significantly simplified (and the resource utilization considerably reduced) under the assumption that Npub and AD are provided at the pdi_data input as a part of the same segment, and thus are already pre-formatted, by concatenating them first, and only afterwards dividing the obtained string of bytes into words. This option is particularly important when the size of Npub is not a multiple of the word size, and/or AD is expected to be processed before Npub, and is not padded. In order to simplify the implementations of such algorithms, the segments carrying the concatenation of both input parts, namely Npub || AD and AD || Npub, are defined in Table 1. Similarly, for the algorithms that either
– do not define a clear separation between the Ciphertext and the Tag, or
– assume that these two parts of the output appear one after another, without filling any unused bits of the last word of the ciphertext with zeros,
the segment type Ciphertext || Tag is defined.
Hardware designers should be aware that a suboptimal choice of the segment types can substantially increase the implementation area, and on top of that may lead to the need for reformatting an output from authenticated encryption before providing it at the input of an alternative circuit for authenticated decryption, implemented by a different group.
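The instruction and segment header formats of Figs. 8 and 9, together with the encodings of Table 1, can be sketched in software as follows. The field widths and encodings are taken from the figures and the table; the positions of the EOI and EOT flags inside the Info field, and the left-aligned zero padding used when a value is narrower than the word, are assumptions made only for this illustration. All function names are hypothetical.

```python
import math

OPCODES = {"ENC": 0b0010, "DEC": 0b0011, "LDKEY": 0b0100, "ACTKEY": 0b0111}
SEG_TYPES = {
    "AD": 0b0001, "Npub||AD": 0b0010, "AD||Npub": 0b0011, "Message": 0b0100,
    "Ciphertext": 0b0101, "Ciphertext||Tag": 0b0110, "Nsec": 0b1000,
    "Enc Nsec": 0b1001, "Npub": 0b1100, "Tag": 0b1101, "Length": 0b1110,
    "Key": 0b1111,
}
EOI_BIT = 2  # ASSUMED position of EOI within the low nibble of Info
EOT_BIT = 1  # ASSUMED position of EOT within the low nibble of Info


def instruction(name: str) -> int:
    """16-bit instruction: 4-bit Opcode at the MSB side, 12 reserved bits."""
    return OPCODES[name] << 12


def segment_header(seg_type: str, length: int, eot: bool, eoi: bool) -> int:
    """32-bit header: Info (8) | Reserved (8) | Segment Length (16)."""
    if not 0 <= length < 2**16:
        raise ValueError("Segment Length must fit in 16 bits")
    info = (SEG_TYPES[seg_type] << 4) | (int(eoi) << EOI_BIT) | (int(eot) << EOT_BIT)
    return (info << 24) | length


def to_words(value: int, nbits: int, w: int) -> list:
    """Split an nbits-wide value into ceil(nbits/w) words, starting from MSB.

    ASSUMPTION: if w does not divide nbits, the value is left-aligned and
    the remaining low-order bits of the last word are filled with zeros.
    """
    n = math.ceil(nbits / w)
    value <<= n * w - nbits
    mask = (1 << w) - 1
    return [(value >> (w * (n - 1 - i))) & mask for i in range(n)]


hdr = segment_header("Message", 5, eot=True, eoi=True)
print(hex(instruction("ENC")))                  # 0x2000
print([hex(x) for x in to_words(hdr, 32, 8)])   # ['0x46', '0x0', '0x0', '0x5']
```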
As a result, the choice of segment types should be clearly reported in the documentation of the cipher implementation, as is the case for the port width parameters w and sw.
Figures 10 and 11 present the typical format of the input (PDI) and output (DO) of the authenticated encryption and decryption operations, respectively, for the ciphers that do not use Nsec. At the input (PDI ports), a message typically starts with the key activation instruction (ACTKEY), followed by an operational instruction (ENC or DEC). Header and data segments for different types of data subsequently follow. For encryption and decryption operations, the order typically is Npub, AD, Data (Message or Ciphertext), and Tag (for decryption only). It must be noted that the order of these segments can be interchanged for maximum efficiency, depending on the particular algorithm. For ciphers that do not use Nsec, at the output (DO ports), the cryptographic core needs to output only the ciphertext and the tag for encryption, and the message for decryption. In the case that Nsec is used, additional segments should be added, as shown in Figures 12 and 13.
(a) instruction = ACTKEY, instruction = ENC, seg_0_header, seg_0 = Npub, seg_1_header, seg_1 = AD, seg_2_header, seg_2 = Message
(b) seg_0_header, seg_0 = Ciphertext, seg_1_header, seg_1 = Tag, Status
Fig. 10: Format of Public Data Input (PDI) and Data Output (DO) of authenticated encryption operation for ciphers that do not use Nsec: a) PDI, b) DO
(a) instruction = ACTKEY, instruction = DEC, seg_0_header, seg_0 = Npub, seg_1_header, seg_1 = AD, seg_2_header, seg_2 = Ciphertext, seg_3_header, seg_3 = Tag
(b) seg_0_header, seg_0 = Message, Status
Fig. 11: Format of Public Data Input (PDI) and Data Output (DO) of authenticated decryption operation for ciphers that do not use Nsec: a) PDI, b) DO
(a) instruction = ACTKEY, instruction = ENC, seg_0_header, seg_0 = Npub, seg_1_header, seg_1 = Nsec, seg_2_header, seg_2 = AD, seg_3_header, seg_3 = Message
(b) seg_0_header, seg_0 = Enc Nsec, seg_1_header, seg_1 = Ciphertext, seg_2_header, seg_2 = Tag, Status
Fig. 12: Format of Public Data Input (PDI) and Data Output (DO) of authenticated encryption operation for ciphers that use Nsec: a) PDI, b) DO
(a) instruction = ACTKEY, instruction = DEC, seg_0_header, seg_0 = Npub, seg_1_header, seg_1 = Enc Nsec, seg_2_header, seg_2 = AD, seg_3_header, seg_3 = Ciphertext, seg_4_header, seg_4 = Tag
(b) seg_0_header, seg_0 = Nsec, seg_1_header, seg_1 = Message, Status
Fig. 13: Format of Public Data Input (PDI) and Data Output (DO) of authenticated decryption operation for ciphers that use Nsec: a) PDI, b) DO
An output from encryption and decryption is always followed by the Status block, shown in Fig. 8. For decryption, Status=Failure means that the authentication failed. For both encryption and decryption, this value of the Status could also be used in the future to indicate the detection of any input formatting errors. Nevertheless, to simplify the implementations, no such input formatting check is required from the implementations compliant with the proposed hardware API at this point.
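The typical PDI sequence for encryption without Nsec, including the division of AD and Message into multiple segments and the zero-length segments required for empty inputs, can be sketched symbolically as follows. The function names and the tuple representation are hypothetical; the segment order follows the typical order described above, and the default segment limit is the value stated for single-pass ciphers.

```python
def split_segments(data: bytes, max_len: int = 2**16 - 1) -> list:
    """Return a list of (chunk, eot) pairs.

    Empty data yields a single zero-length segment, because empty AD and
    empty messages must still be announced by a header with length 0.
    """
    if not data:
        return [(b"", True)]
    chunks = [data[i:i + max_len] for i in range(0, len(data), max_len)]
    return [(c, i == len(chunks) - 1) for i, c in enumerate(chunks)]


def pdi_encrypt_layout(npub: bytes, ad: bytes, msg: bytes,
                       max_len: int = 2**16 - 1) -> list:
    """Symbolic PDI sequence: ACTKEY, ENC, then Npub, AD, Message segments."""
    items = [("instruction", "ACTKEY"), ("instruction", "ENC")]
    segments = [("Npub", npub, True)]  # Npub always uses exactly one segment
    segments += [("AD", c, eot) for c, eot in split_segments(ad, max_len)]
    segments += [("Message", c, eot) for c, eot in split_segments(msg, max_len)]
    for idx, (seg_type, chunk, eot) in enumerate(segments):
        eoi = idx == len(segments) - 1  # EOI marks the last segment of the input
        items.append(("header", seg_type, len(chunk), eot, eoi))
        if chunk:
            items.append(("data", chunk))
    return items


# A tiny segment limit is used here only to make the splitting visible.
for item in pdi_encrypt_layout(bytes(16), b"", b"abcde", max_len=2):
    print(item)
```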
seg_hdr = Length, followed by 32-bit Len_AD and 32-bit Len_Data
Fig. 14: Format of the optional Length segment
(a) Memory: addresses n, n+1, ..., n+7 hold the bytes D[0], D[1], ..., D[7]
(b) 32-bit word representation (bit 31 down to bit 0): word 0 = D[0] D[1] D[2] D[3], word 1 = D[4] D[5] D[6] D[7]
Fig. 15: Conversion of (a) a string of bytes in a memory of a computer system into (b) a string of 32-bit words at the inputs and outputs of AEAD
For some authenticated ciphers (e.g., AES-CCM), the entire lengths of associated data and message/ciphertext have to be known before the encryption/decryption starts. In order to make it possible, an optional Segment Type, called Length, is defined. This segment contains only the total length of associated data concatenated with the total length of message/ciphertext, expressed in bytes. In a typical usage, the Length segment is placed right after the instruction Authenticated Encryption (ENC) or Authenticated Decryption (DEC). The exact format of this segment is shown in Fig. 14.
For the purpose of full compatibility with software implementations, in Fig. 15, we define the dependence between data stored at consecutive locations in a memory of a computer system, and w-bit data words appearing at the ports of AEAD, such as pdi_data, sdi_data, and do_data. For simplicity, in this figure, w is set to 32. The assumed dependence corresponds to the big-endian convention for an order of bytes within a word.
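The byte-to-word conversion of Fig. 15 can be sketched as follows. The function names are hypothetical; the zero fill of the unused portion of the last word follows the convention stated in Sections 1.5 and 1.6, and w is assumed to be a multiple of 8.

```python
def bytes_to_words(data: bytes, w: int = 32) -> list:
    """Pack bytes into w-bit words, first byte in the most significant position."""
    if w % 8:
        raise ValueError("w is assumed to be a multiple of 8")
    step = w // 8
    words = []
    for i in range(0, len(data), step):
        # The unused portion of the last word is filled with zeros.
        chunk = data[i:i + step].ljust(step, b"\x00")
        words.append(int.from_bytes(chunk, "big"))
    return words


def words_to_bytes(words: list, length: int, w: int = 32) -> bytes:
    """Inverse conversion; length (in bytes) removes the zero fill."""
    raw = b"".join(word.to_bytes(w // 8, "big") for word in words)
    return raw[:length]


print([hex(x) for x in bytes_to_words(bytes(range(8)))])
# ['0x10203', '0x4050607']
print([hex(x) for x in bytes_to_words(b"\xaa\xbb\xcc\xdd\xee")])
# ['0xaabbccdd', '0xee000000']
```

Calling bytes_to_words with a different w on the bytes recovered by words_to_bytes gives one possible form of the word width conversion between implementations with different port widths.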
4
Timing Characteristics
Figures 16, 17, and 18 specify the timing characteristics of the ports PDI, SDI, and DO, respectively. Input ports are shown in blue and the output ports in red. The contents of data buses are read and acknowledged when *_valid and its corresponding *_ready are both asserted. Data is assumed to be present at the output of the source module when *_valid is asserted.
Fig. 16: Example timing diagram for PDI
Fig. 17: Example timing diagram for SDI
Fig. 18: Example timing diagram for DO
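The handshake rule stated above, that a word is read and acknowledged only in a cycle in which both *_valid and *_ready are asserted, can be traced cycle by cycle with a small model. The function name transfers and the sample waveforms are hypothetical and do not reproduce the timing diagrams of Figs. 16-18.

```python
def transfers(data: list, valid: list, ready: list) -> list:
    """Return (cycle, word) pairs for every accepted word.

    The source holds the current word on the data bus until it is accepted,
    i.e., until a cycle in which valid and ready are both 1.
    """
    pending = list(data)
    accepted = []
    for cycle, (v, r) in enumerate(zip(valid, ready)):
        if not pending:
            break
        if v and r:
            accepted.append((cycle, pending.pop(0)))
    return accepted


print(transfers([0x11, 0x22, 0x33],
                valid=[1, 1, 0, 1, 1],
                ready=[0, 1, 1, 1, 1]))
# [(1, 17), (3, 34), (4, 51)]
```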
5
Conclusions
We propose the full specification of the hardware API for authenticated ciphers, suitable for hardware benchmarking of candidates competing in the CAESAR contest [2] and their comparison with a previous generation of authenticated encryption with associated data algorithms, such as AES-GCM and AES-CCM. Our proposal meets one of the fundamental properties of every properly defined API: If a given algorithm is implemented independently by two different groups using the same API, one should be able to
– encrypt a message using the first implementation, and
– decrypt it using the second implementation.
To be exact, our assumption is that either
1. both implementations use the same values of the data port widths w and sw, and the same segment types, or
2. simple reformatting of the input to decryption is performed outside of the cipher core (in software or hardware). Examples of such reformatting include: word width conversion and/or splitting/concatenating two neighboring segment types, such as Npub||AD, Npub, and AD (see Table 1).
A similar API, described in [3], has been successfully used to implement and benchmark over a dozen Round 1 CAESAR candidates, all qualified to Round 2 of the competition.
References
1. ARM. AMBA Specifications. [Online]. Available: http://www.arm.com/products/system-ip/amba-specifications.php
2. CAESAR: Competition for Authenticated Encryption: Security, Applicability, and Robustness. (2016, February) Cryptographic competitions. [Online]. Available: http://competitions.cr.yp.to/index.html
3. E. Homsirikamol, W. Diehl, A. Ferozpuri, F. Farahmand, M. U. Sharif, and K. Gaj, “GMU Hardware API for Authenticated Ciphers,” Cryptology ePrint Archive, Report 2015/669, 2015.