Lightweight reconfiguration security services for AXI-based MPSoCs
Pascal Cotret, Guy Gogniat, Jean-Philippe Diguet, Jérémie Crenne

To cite this version:
Pascal Cotret, Guy Gogniat, Jean-Philippe Diguet, Jérémie Crenne. Lightweight reconfiguration security services for AXI-based MPSoCs. FPL 2012 (22nd International Conference on Field Programmable Logic and Applications), Aug 2012, Oslo, Norway. pp.655-658, 2012, <10.1109/FPL.2012.6339233>. <hal-00750332>

Pascal Cotret, Guy Gogniat, Jean-Philippe Diguet

Laboratoire Lab-STICC
Université de Bretagne-Sud
Lorient, France
name.surname@univ-ubs.fr

Jérémie Crenne

Laboratoire LIRMM
Université de Montpellier
Montpellier, France
jeremie.crenne@lirmm.fr

Abstract

Nowadays, security is a key constraint in MPSoC development, as much critical and secret information can be stored and manipulated within these systems. Addressing the protection issue efficiently is challenging because information can leak from many points. However, one strategic component of a bus-based MPSoC is the communication architecture, since all information that an attacker could try to extract or modify is visible on the bus. Monitoring and controlling communications therefore allows efficient protection of the whole system: attacks can be detected and discarded before the system is corrupted. In this work, we propose a lightweight solution to dynamically update hardware firewall enhancements that secure data exchanges in a bus-based MPSoC. It provides a standalone security solution for AXI-based embedded systems in which no user intervention is required to update the security mechanisms. An FPGA implementation shows an area overhead of around 11% for the adaptive version of the hardware firewall compared to the static one.

1. Introduction

Embedded systems are facing an increasing number of threats as attackers' motivation grows every day. Our devices contain much sensitive information (passwords, private data) that needs to be protected from software and hardware attacks. Reconfigurable technologies such as FPGAs are good candidates for building trusted devices, as they embed processors, memories and application-specific IPs in a single chip at moderate development cost. Their intrinsic performance and adaptability also make them attractive for building high-performance, high-security systems. When dealing with logical attacks (e.g., targeting the external memory through code/data corruption), most existing solutions rely on software countermeasures. However, relying on software-only solutions for system security may not be suitable for highly constrained embedded systems. We believe that adding some hardware mechanisms strongly increases system security at a very low cost. Monitoring and controlling the communication architecture of a bus-based MPSoC is particularly well suited to enhancing system security, as all data transit through it. If an attacker tries to extract, destroy or modify data through a logical attack, he has to use the communication architecture. Enhancing the communication architecture with protection mechanisms is therefore crucial. However, several key questions must be addressed when building such a solution: What happens when an attack is detected? How should the system behave? Should the protection level of the system be increased? In this paper, we address these questions and propose security update features that provide designers with dynamic security levels.

The paper is organized as follows. Section 2 presents related work. Section 3 summarizes the static solution defined in [1]. Section 4 describes our contribution, and Sections 5 and 6 present results and their analysis. Section 7 concludes and highlights the main perspectives.

2. Related Work

In the literature, several studies have addressed the security of embedded systems [2]. At the communication level, these systems can be protected by either software or hardware mechanisms. Software solutions generally require no additional hardware but suffer from high latency, which can be critical for applications where reactivity is essential to fend off attacks. From a hardware point of view, several solutions have been proposed depending on the communication architecture: network-on-chip (NoC) or bus. For NoC-based architectures, Evain et al. [3] propose a solution where security controls are performed in each network interface in a distributed manner. A management unit gathers information from all network interfaces according to a security policy. Fiorin et al. [4][5][6] propose an alternative to this work by adding probes within the interface structure to refine the protection mechanisms. These probes can block incoming traffic according to parameters stored in an embedded memory. A security manager collects information from the individual security-enhanced interfaces to detect any collision or error in the traffic. They also provide an adaptive implementation by adding a shadow memory (which behaves like a buffer) to avoid a transient state of the security enhancement during its reconfiguration. For bus-based communication architectures, the work of Coburn et al. [7], which is similar to Fiorin's work (without updates), is based on a SEI (Security Enforcement Interface) implemented in each interface between an IP and the bus. Each SEI computes information from the data handled by its IP and sends it to a global manager (SEM, Security Enforcement Module) that checks the SEI data.

This work focuses on an FPGA implementation of an adaptive version of the work presented in [1]. It is inspired by the approach of Fiorin et al. [6], with an implementation specific to the AXI bus standard that offers a low area/latency trade-off. Moreover, several scenarios and a case study are discussed to demonstrate all the features provided in this work. The key contributions of this work include:

- Demonstration of adaptive firewall enhancements.
- Design of a set of security policies.
- Analysis of attack scenarios and case study implementation.

Table 1. Comparison with existing works

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Adaptive?</td>
<td>No</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td>Threat model</td>
<td>Wide range of soft. attacks</td>
<td>Mainly buffer overflow</td>
<td>Wide range of soft. attacks</td>
</tr>
</tbody>
</table>

3. STATIC SECURITY ENHANCEMENTS WITHIN A BUS-BASED MPSoC SYSTEM

This work is based on static security enhancements proposed in [1] and [8]. It provides a low-latency solution based on hardware firewalls embedded in each IP bus interface providing protection against read/write access and format disruptions (Local Firewalls). The firewall connected to the external memory controller (Cryptographic Firewall) adds flexible cryptographic services to protect the memory with confidentiality and/or authentication (Figure 1).

When an attack event is detected, BRAM contents must be updated with new security policies in order to keep a safe execution environment for the target MPSoC. This work proposes an architecture for this purpose, detailed in Figure 2. All the components are connected through an AXI-Lite bus (also called the Security bus) and managed by a trustworthy processor (whose program is stored in a trusted ROM). This processor stores important events (timestamps, attack events, etc.) in a log file readable by the processor running the main application. Each firewall has a custom bus connection to the monitoring IP. While the log timer and the monitoring IP are used to detect and report an attack, the AXI-Lite BRAM controller connections are used to update security policies.

The first task performed in the architecture of Figure 2 is the monitoring of attack events by a dedicated IP, on the left-hand side of the figure. Once an attack is detected, register values are updated and an interrupt routine is launched on the trustworthy processor to update security, depending on the main register value (which represents all attack events). On an attack event, the system security must be updated in order to avoid malicious data leakage. A very important issue during a firewall update is data availability while switching between two security policies. For this purpose, a mechanism based on the handshake property of the AXI protocol is implemented in each Firewall Interface to block outgoing traffic while the security rules associated with that firewall are being updated.
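The blocking mechanism can be illustrated with a small behavioural model. The sketch below is written in Python rather than HDL and is not the authors' implementation; the class name `FirewallGate`, the `updating` flag and the choice of gating both VALID and READY are assumptions made for illustration. It only relies on the AXI rule that a transfer completes when VALID and READY are both high.

```python
from dataclasses import dataclass


@dataclass
class FirewallGate:
    """Cycle-level sketch of the blocking logic on one AXI channel.

    Hypothetical model: while `updating` is set, the firewall forces the
    READY seen by the master and the VALID seen by the slave low, so no
    VALID/READY handshake can complete in either direction.
    Assumption: `updating` only changes between transfers, because AXI
    forbids withdrawing a VALID before its handshake has completed.
    """
    updating: bool = False

    def step(self, master_valid: bool, slave_ready: bool) -> bool:
        """Return True if a transfer completes in this clock cycle."""
        valid_to_slave = master_valid and not self.updating
        ready_to_master = slave_ready and not self.updating
        # A transfer occurs only when VALID and READY are both high.
        return valid_to_slave and ready_to_master


gate = FirewallGate()
print(gate.step(master_valid=True, slave_ready=True))  # transfer passes
gate.updating = True                                   # SP update starts
print(gate.step(master_valid=True, slave_ready=True))  # master is stalled
gate.updating = False                                  # update finished
print(gate.step(master_valid=True, slave_ready=True))  # traffic resumes
```

Because the master simply sees READY held low, it waits rather than losing data, which is why blocking through the handshake keeps traffic intact during the switch between two security policies.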
4.1. Hierarchy and evolution of protection modes

Two classes of components are defined based on their ability to manipulate confidential information. Critical IPs (for instance, implementations of ciphering algorithms) must not reveal any information when an attack is detected, since extracting keys and/or signatures would be a major threat to the system. Critical IPs are therefore isolated from the system as soon as an attack is detected (known as “error mode” or quarantine). For non-critical IPs, an intermediate protection layer is allowed in which read accesses are still permitted but write accesses are not. This allows, for example, data to be backed up before the IP is isolated.

In case of an attack event (detected through interrupt routines launched by the monitoring IP), the trustworthy processor saves the current security policy of the attacked firewall in a dedicated on-chip memory and applies a higher security level according to the two schemes defined above. For instance, if the current protection level of an IP is “read-only mode”, the next one could be “error mode”, which is equivalent to quarantine.

In the most critical case (i.e., an attack event is detected while the monitored IP is already in “error mode”), a reboot of the complete system is assumed to be required. The initial bitstream and security policy configuration are then reloaded, and the system restarts from its initial state.
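The escalation rules of this subsection can be summarised as a small state machine. The Python sketch below is one possible encoding of them, not the authors' code: the name `NORMAL` for the initial policy, the `Mode` enum and the `next_mode` function are hypothetical, and the list `saved_policies` merely stands in for the dedicated on-chip memory where the trustworthy processor saves the previous policy.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"        # hypothetical name for the initial SP rights
    READ_ONLY = "read-only"  # intermediate layer, non-critical IPs only
    ERROR = "error"          # quarantine: neither reads nor writes
    REBOOT = "reboot"        # bitstream and initial SPs are reloaded


def next_mode(current: Mode, critical: bool) -> Mode:
    """Protection level applied after an attack on an IP's firewall."""
    if current is Mode.REBOOT:
        raise ValueError("system is already restarting")
    if current is Mode.ERROR:
        # Attack detected while the IP is already quarantined.
        return Mode.REBOOT
    if critical:
        # Critical IPs (e.g. ciphering cores) are isolated at once.
        return Mode.ERROR
    # Non-critical IPs: NORMAL -> READ_ONLY -> ERROR.
    return Mode.READ_ONLY if current is Mode.NORMAL else Mode.ERROR


saved_policies = []  # stands in for the dedicated on-chip memory
mode = Mode.NORMAL
for _ in range(3):  # three successive attacks on a non-critical IP
    saved_policies.append(mode)
    mode = next_mode(mode, critical=False)
    print(mode.value)
```

For a non-critical IP this prints `read-only`, `error` and `reboot` in turn, while a critical IP jumps directly from its initial policy to error mode.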

4.2. Switching between the different modes

When an IP firewall must be updated, the Security Policy (SP) parameters stored in Block RAMs have to be modified (assuming that the whole IP address space is covered by one or more SPs, the only component to be updated is the BRAM containing the SP values). This work only considers the update of read/write rights (field 1 in Figure 3). A Local Firewall SP is stored in a single 32-bit word, while Cryptographic Firewall read/write information is stored in the first 32-bit word of the Cryptographic SP. Writing a Security Policy (a 32-bit word) to a BRAM takes one clock cycle, so updating N Security Policies in one firewall takes N clock cycles. This delay has no real impact on firewall data analysis, because traffic is blocked as soon as an attack is detected (as described above).
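The update of read/write rights can be sketched as a bit-level rewrite of each SP word, one BRAM write per clock cycle. Since Figure 3 is not reproduced here, the bit positions `READ_BIT` and `WRITE_BIT` below are hypothetical placeholders, as are the helper names. The sketch models a Local Firewall, where each SP occupies one word; for a Cryptographic Firewall, only the first word of each SP would be rewritten.

```python
READ_BIT = 1 << 0       # hypothetical position of the read right
WRITE_BIT = 1 << 1      # hypothetical position of the write right
WORD_MASK = 0xFFFFFFFF  # SP words are 32 bits wide


def restrict(sp_word: int, allow_read: bool) -> int:
    """Revoke rights in one SP word; all other fields are left untouched."""
    word = sp_word & ~WRITE_BIT  # writes are revoked in both modes
    if not allow_read:
        word &= ~READ_BIT        # error mode also revokes reads
    return word & WORD_MASK


def update_firewall(bram: list, allow_read: bool) -> int:
    """Rewrite every SP word of one Local Firewall; return cycles used."""
    cycles = 0
    for addr, word in enumerate(bram):
        bram[addr] = restrict(word, allow_read)  # one BRAM write per cycle
        cycles += 1
    return cycles


bram = [0x00000003, 0x00000001, 0x80000003]  # three example SP words
print(update_firewall(bram, allow_read=True), [hex(w) for w in bram])
```

Here `allow_read=True` corresponds to the read-only mode and `allow_read=False` to the error mode; in both cases the returned cycle count equals the number N of SP words, matching the N-cycle figure above.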

5. IMPLEMENTATION RESULTS

Implementations target a Xilinx Virtex-6 FPGA (model xc6vlx240t1156-1). This device has around 240,000 logic cells and 15 Mb of Block RAM. First, different implementation options for the firewalls are studied. Then, this work focuses on generic scenarios that serve as a basis for the case study analysis.

5.1. Area

In Table 2, two implementation options are considered: the static Local Firewall (based on the results of [1]) and its adaptive version.

<table>
<thead>
<tr>
<th>Configuration</th>
<th>Slices</th>
<th>Regs</th>
<th>LUTs</th>
<th>BRAMs</th>
</tr>
</thead>
<tbody>
<tr>
<td>Static solution (Local Firewall)</td>
<td>138</td>
<td>123</td>
<td>293</td>
<td>1</td>
</tr>
<tr>
<td>Adapt. version additions: Run-time</td>
<td>6</td>
<td>0</td>
<td>5</td>
<td></td>
</tr>
<tr>
<td>Adapt. version additions: Error mode</td>
<td>17</td>
<td>0</td>
<td>18</td>
<td></td>
</tr>
<tr>
<td>Adapt. version additions: Misc.</td>
<td>5</td>
<td>13</td>
<td>15</td>
<td></td>
</tr>
<tr>
<td>Overhead (adaptive vs. static)</td>
<td>+20.29%</td>
<td>+10.57%</td>
<td>+12.97%</td>
<td></td>
</tr>
</tbody>
</table>

Table 2. Standalone firewall results
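The overhead row of Table 2 is the sum of the three addition rows divided by the static figure for each resource. The short Python check below reproduces it from the table's own numbers; it only restates the table and adds no new measurement.

```python
# Values transcribed from Table 2 (Slices, Regs, LUTs).
STATIC = {"Slices": 138, "Regs": 123, "LUTs": 293}
ADDITIONS = {
    "Run-time": {"Slices": 6, "Regs": 0, "LUTs": 5},
    "Error mode": {"Slices": 17, "Regs": 0, "LUTs": 18},
    "Misc.": {"Slices": 5, "Regs": 13, "LUTs": 15},
}


def overhead_percent(resource: str) -> float:
    """Relative cost of the adaptive additions over the static firewall."""
    added = sum(row[resource] for row in ADDITIONS.values())
    return round(100 * added / STATIC[resource], 2)


for resource in STATIC:
    print(f"{resource}: +{overhead_percent(resource)}%")
# Slices: +20.29%  (28 / 138)
# Regs: +10.57%    (13 / 123)
# LUTs: +12.97%    (38 / 293)
```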
5.2. Latency

Update latency of security policies depends on the number of policies to be updated. The most critical step in this process is the computation of the new security policy configuration done by the trustworthy processor. For a basic implementation of this process, the new configuration is computed in 148 clock cycles.

When more than one attack has to be reported (M firewalls detect an attack), the firewalls are first blocked according to their order in the main register of the Monitoring IP. The trustworthy processor then computes all the up-to-date security policies. Finally, each firewall is released as soon as its update is completed. Traffic is not affected by firewalls under update, because they cannot transmit any information (handshake signals are controlled by the update logic). This method allows each updated firewall to be used again by the main application of the MPSoC without waiting for the others.

The update latency depends on the firewall's position in the update queue. Assuming each firewall has N Security Policies to rewrite (N clock cycles, Section 4.2), the first firewall is updated in N cycles, while the firewall at position k in the queue is updated after k × N cycles, because it must wait for the first k − 1 firewalls to be updated.
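The queueing behaviour above can be sketched as a cumulative sum over the attacked firewalls. In this Python sketch, the mapping of bit i of the main register to firewall i, the lowest-bit-first service order and the function name `update_schedule` are assumptions for illustration; the processor's policy computation time is not counted, as in the k × N expression. The sketch also accepts different SP counts per firewall, which reduces to k × N when every firewall holds N policies.

```python
def update_schedule(main_register: int, sp_count: dict) -> dict:
    """Cycle at which each attacked firewall finishes its BRAM update.

    Hypothetical encoding: bit i of the Monitoring IP main register is set
    when firewall i reports an attack; firewalls are served in bit order.
    sp_count[i] is the number of 32-bit SPs firewall i must rewrite, each
    write taking one clock cycle. Cycles are counted from the first BRAM
    write; the trustworthy processor's computation time is not included.
    """
    released = {}
    elapsed = 0
    index = 0
    while main_register:
        if main_register & 1:
            elapsed += sp_count[index]  # earlier firewalls, then this one
            released[index] = elapsed
        main_register >>= 1
        index += 1
    return released


# Firewalls 0, 1 and 3 attacked, each holding N = 4 SPs: release at k * N.
print(update_schedule(0b1011, {0: 4, 1: 4, 3: 4}))
```

With three attacked firewalls holding four policies each, they are released after 4, 8 and 12 cycles, i.e. at positions k = 1, 2, 3 in the queue.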

6. CASE STUDY

The MPSoC case study considers two MicroBlaze processors, a 64 KB shared Block RAM, an image processing IP and an external memory. Each processor has its code and data sections in the external memory. The architecture is configured with mixed cryptographic options and access rights so as to cover all the options (integrity only, read/write, confidentiality and integrity, etc.). Four Local Firewalls and one Cryptographic Firewall are needed to protect this case study. Three options are considered: the MPSoC without firewalls, the static firewall-enhanced MPSoC (based on the results of [1]) and the adaptive firewall-enhanced version. Area results are summarized in Table 3.

<table>
<thead>
<tr>
<th>Solution</th>
<th>Slices</th>
<th>Regs</th>
<th>LUTs</th>
<th>BRAMs</th>
</tr>
</thead>
<tbody>
<tr>
<td>Unprotected</td>
<td>5,446</td>
<td>7,195</td>
<td>8,354</td>
<td>32</td>
</tr>
<tr>
<td>Firewalls w/o recfg</td>
<td>+34.08%</td>
<td>+36.87%</td>
<td>+46.22%</td>
<td>+51</td>
</tr>
<tr>
<td>Adaptive protection</td>
<td>+36.65%</td>
<td>+37.78%</td>
<td>+48.49%</td>
<td>+51</td>
</tr>
</tbody>
</table>

Table 3. Case study area results

The firewall-enhanced case study has a rather high overhead, mainly due to the cryptographic module embedded in the firewall attached to the external memory controller. The logic added for update purposes implies a rather high area overhead of around 40% compared to the static firewall implementation, mainly due to the trustworthy processor, the monitoring IP and the security AXI-Lite bus.

7. CONCLUSION AND PERSPECTIVES

In this work, a bus-based MPSoC with adaptive security enhancements (referred to as firewalls) is presented. These firewalls protect memories and memory-mapped IPs according to user-defined security policies. By exploiting properties of the communication bus (the AXI handshake), the solution allows run-time updates with a low area overhead compared to a static solution; furthermore, no invalid data leaks while security updates are applied. The mechanisms defined in this work require no user intervention, as all protection levels are defined in a dedicated processor.

This work represents a trade-off between the SECA [7] and DPU [5] solutions, with an implementation on the AXI bus standard supported by the Xilinx tool suite. It provides an update feature that is absent from the other bus-based solution. In terms of area overhead over a basic softcore processor, adaptive firewalls have a lower impact (around 11%) than the DPU solution (25%, as it relies on a shadow memory acting as a buffer); the SECA solution has the lowest area overhead (6.20%, mainly due to its centralized security manager) but is static.

Acknowledgment

The work presented in this paper was realized in the frame of the SecReSoC project number ANR-09-SEGI-013, supported by a grant of the French National Research Agency (ANR).

8. REFERENCES