Programmable Logic


The advantages of the 32-bit Cortex-M1 processor in Actel FPGAs

11 November 2009 Programmable Logic

The embedded market continues to move toward 32-bit processing. At the same time, the market has seen a significant increase in the use of FPGAs as flexible, cost-effective platforms for the rapid design of high-performance embedded systems.

In combination, these trends are driving demand for 32-bit processors in programmable logic. Certainly, there is no end to the number of new 32-bit processor architectures. However, most processor intellectual property (IP) is developed for ASIC implementation. As a result, when implemented in the coarse-grained architecture of FPGAs, the processors are often large and slow – a fate suffered by many widely used processors when ported to programmable logic.

Of course, a few soft proprietary IP processor solutions are available for FPGA implementation. However, only limited tools, support and designer experience exist for these proprietary solutions, making them harder and riskier to use. To address the need for an FPGA-optimised, 32-bit processor based on an industry-standard architecture, Actel and ARM developed the 32-bit Cortex-M1 processor (Figure 1), the first ARM core designed specifically for FPGA implementation.

Figure 1. Cortex-M1 block diagram
Figure 1. Cortex-M1 block diagram

When combined with Actel non-volatile M1-enabled ProASIC3 FPGAs and Fusion programmable system chips (PSCs), the small, fast and highly configurable Cortex-M1 processor offers a number of benefits. Because the Cortex-M1 processor is available from Actel free of licence fees and royalties, the combination of a low-cost FPGA and the Cortex-M1 processor extends the ARM architecture to lower-volume applications, provides a lower-cost entry into system-on-chip (SoC) design, and shortens time-to-market for ARM users. For designs that scale to ultra-high volumes, the 32-bit Cortex-M1 processor runs the Thumb-2 instruction set and is upward-compatible with the Cortex-M3 processor, providing an easy migration path to ASIC implementation. The Cortex-M1 processor also enables economical re-use of tools, code and knowledge, which reduces risk and gets products to market sooner.

An industry-standard solution designed for FPGA implementation

The challenge associated with proprietary architectures lies in making them efficient in targeted applications and putting tools in place to support them. Experienced engineers know that there is a learning curve when using anything new; it takes time to climb the learning curve and gain the experience to effectively deal with the product’s unique characteristics. This is in direct conflict with ever-shortening development schedules and increases design risk. For these reasons, designers tend to re-use what they are familiar with or what they have used before. Over time, this causes a few architectures to become widely used industry standards, while most are only used in narrow vertical niches.

When investigating which 32-bit processor would best suit customer needs in its flash-based FPGAs, Actel realised early on that an industry-standard architecture offers significant benefits over a proprietary architecture. Industry-standard processors have a broad selection of development tools, a significant volume of available program code, and a large following of design engineers who have knowledge and experience using them. These benefits enable users to get their designs developed faster and to market sooner while reducing risk, and as a result, offer customers a better solution and value.

A unique business model – FREE

When Actel and ARM developed an optimised version of the ARM7 processor for use in Actel FPGAs, it was offered with an innovative business model that dramatically increased industry access to the ARM architecture. By removing the licence, royalty fees and contracts typically associated with licensing models for industry-leading processor cores, Actel offers free access to advanced ARM processor technology for the broad marketplace. The same free delivery model is offered with Cortex-M1 for use in Actel Flash-based, M1-enabled Fusion and ProASIC3 FPGAs. This provides all embedded designers with access to programmable flexibility and system-level integration with the ARM architecture, enabling the development of low-cost, high-performance systems.

Cortex-M1 features

Derived from the ARM 3-stage Cortex-M3 processor pipeline, the highly configurable Cortex-M1 processor provides a good balance between size and speed for embedded applications. The core in its smallest configuration is less than 5% larger than the Actel Core8051, an industry-standard 8-bit controller. It runs at over 70 MHz in Actel M1-enabled ProASIC3 and Fusion devices. The processor runs a subset of the new Thumb-2 instruction set and features support for tightly coupled memory and a sophisticated low-latency interrupt controller to improve embedded performance and capabilities.

Improved code density with performance and power efficiency

The Thumb-2 new instruction set for the ARM architecture provides enhanced levels of performance, energy efficiency and code density for a wide range of embedded applications. Thumb-2 technology builds on the success of Thumb, the high-code-density instruction set for ARM microprocessors, to increase the power of the ARM microprocessor core available to developers of low-cost, high-performance systems. The technology is backward-compatible with existing ARM and Thumb solutions, while significantly extending the features available to the Thumb instruction set. This allows more applications to benefit from the code density of Thumb. For performance-optimised code, Thumb-2 technology uses 31% less memory to reduce system cost, while providing up to 38% higher performance than existing high-density code, which can be used to prolong battery life or to enrich the product feature set (Figure 2).

Figure 2. Thumb-2 performance and code density
Figure 2. Thumb-2 performance and code density

Cortex-M1 executes the ARMv6-M instruction set, which is a full subset of the Thumb-2 (ARMv7) instruction set that is used across the rest of the Cortex family. ARM recognises the benefit of having a large volume of legacy code available for customers to use, so they made the Cortex family upward compatible with Thumb code written for their legacy cores (ARM7, ARM9 and ARM11). Existing Thumb code can be run without change on Cortex-family processors, including Cortex-M1 devices. This is a big advantage for designers.

Most of the important processing on legacy ARM processors was done in subroutines written in Thumb code, so designers can take advantage of their existing code for ARM processors. One of the benefits of Thumb-2 over previous ARM instruction set architectures is that 16- and 32-bit instructions are executed in the same mode. In older ARM architectures, Thumb instructions were primarily used in subroutines with the 32-bit ARM instructions used to service interrupts. This often resulted in long latency between the time an interrupt was received and the time it was serviced.

In Thumb-2, ARM merged the 16- and 32-bit operating modes so that interrupts could be serviced without the need to switch from 16-bit mode. It is a big advantage to be able to freely mix 16- and 32-bit instructions. This greatly simplifies the programming task and eliminates the need to profile the code to minimise code size and maximise throughput. It also results in increased performance, because the instructions can be optimally mixed without having to cluster them in 16- and 32-bit groupings.

Fast memory access improves performance

Cortex-M1 has a separate memory interface from the external AMBA (advanced microcontroller bus architecture) peripheral bus interface. This is similar to the high-performance ARM9 architecture but different from ARM7, which has a combined memory and peripheral bus. The separate memory interface on Cortex-M1 is actually implemented as two interfaces, giving separate access to instruction and data tightly coupled memory spaces (ITCM and DTCM). This increases the performance of the processor, because it can fetch an instruction from the ITCM on every clock cycle (Figure 3), and it is never stalled due to data memory accesses, or reads and writes to the peripherals on the AMBA bus.

Figure 3. Tightly coupled memory signal timings
Figure 3. Tightly coupled memory signal timings

Efficient interrupts reduce latency

A configurable nested vectored interrupt controller (NVIC) is available on Cortex-M1 that facilitates low latency interrupt and exception handling and simplifies programming. The NVIC supports re-prioritisable interrupts and is closely coupled to the processor core to support low-latency interrupt processing and efficient processing of late-arriving interrupts. In many applications, especially those that operate in realtime, low interrupt latency is critical. The NVIC in Cortex-M1 also gives users the option of individually determining the priority of interrupts, supporting four interrupt levels, allowing designers to give a system event priority over other events. All of the capabilities built into the NVIC give users maximum control over how the processor responds to events as they occur in an application.

Simplified programmer’s model eases coding

Cortex-M1 implements a subset of the Thumb-2 architecture with two operating modes – thread mode and handler mode. For normal processing, thread mode is entered through reset or exception return. The handler mode is entered as the result of an exception. The processor is designed for embedded applications where additional operating modes are not necessary. By limiting the programmer’s model to a few operating modes, the size of the processor has been significantly reduced.

To ease programming and make transition between thread mode and handler mode as seamless as possible, the processor has two stacks (Figure 4). Out of reset, all code uses the main stack. An exception handler can change the stack used by thread mode from the main stack to the process stack by changing the value it uses on exit. The stack pointer, R13, is a banked register that switches between the main stack and the process stack. The processor has been architected to maximise the designer’s control over the flow of data and processing through the core and to simplify programming, especially when different engineers do the software and hardware development.

Figure 4. Cortex-M1 register file
Figure 4. Cortex-M1 register file

Implementation in Actel M1-enabled FPGAs

FPGA usage is growing in part because of the flexibility that these devices offer. Engineers can tailor the function of a device exactly to their application by adding or removing soft IP. This is similar to what can be done with an ASIC. With FPGAs, however, a design can be developed and running in the application within a few hours, whereas ASICs require many months and large up-front, non-recurring engineering (NRE) charges.

When developing Cortex-M1, ARM and Actel made the processor highly configurable. The tightly coupled memory size, the size and speed of the multiplier, the number of external interrupts, the endian-ness, and whether the debug circuitry and OS extensions are included are all selectable by the user. This gives designers the control to select the minimum processor configuration that best meets their application requirements.

Furthermore, because Cortex-M1 is being implemented in an FPGA, designers can quickly configure the core and program it into an M1-enabled, Flash-based device and test it in their end application. If a change is required, it can be modified and reprogrammed into the device within minutes. In this way, engineers can modify and test their design many times within a few hours to find the optimal implementation for their product.

The M1-enabled Actel FPGAs allow seamless use of the Cortex-M1 processor core. The M1-enabled ProASIC3 and Fusion devices offer all the benefits of Actel non-volatile, Flash-based families – single-chip, reprogrammable, live-at-power-up, secure, firm-error immune and low-power.



Credit(s)



Share this article:
Share via emailShare via LinkedInPrint this page

Further reading:

PolarFire SoC Discovery Kit
ASIC Design Services DSP, Micros & Memory
By offering a user-friendly, feature-rich development kit for embedded processing and compute acceleration, Microchip is making emerging technology more accessible to engineers at all levels.

Read more...
Power efficient mid-range FPGA
ASIC Design Services News
The new imperatives of the intelligent edge – power efficiency, security and reliability – are forcing system architects and design engineers to find new solutions. For the growing number of system designers ...

Read more...
Isolation transformers for high-speed SPE applications
ASIC Design Services Interconnection
Utilising the UWBX patent pending technology, HALO has been able to achieve the high-speed performance needed to meet insertion loss and return loss required for a fully-compliant 2.5GBASE-T1 Ethernet port over single-pair copper cables.

Read more...
MPLAB PICkit 5
ASIC Design Services DSP, Micros & Memory
Microchip Technology’s MPLAB PICkit 5 in-circuit debugger/programmer enables quick prototyping and portable, production-ready programming for all Microchip components, including PIC, dsPIC, AVR, and SAM devices.

Read more...
FPGAs speed up intelligent edge designs
ASIC Design Services Editor's Choice DSP, Micros & Memory
Microchip Technology has added nine new technology- and application-specific solution stacks to its growing collection of mid-range FPGA and SoC support.

Read more...
Updated portable library API
ASIC Design Services DSP, Micros & Memory
The Holt Portable Library API now supports all Holt’s MIL-STD-1553 terminal devices, providing the customer with a layer of abstraction using standardised functions.

Read more...
Successful review for FPGA’s crypto
ASIC Design Services DSP, Micros & Memory
The UK government’s National Cyber Security Centre has reviewed the PolarFire FPGAs, when used with the single-chip crypto design flow, against stringent device-level resiliency requirements.

Read more...
Successful review of PolarFire FPGAs crypto design
ASIC Design Services News
System architects and designers have received acknowledgement of the security of their designs that rely on Microchip Technology’s PolarFire FPGAs.

Read more...
Holt wins premier award
ASIC Design Services News
Holt Integrated Circuits has announced that Raytheon Technologies Corporation, one of the world’s largest defence manufacturers, has recognised Holt with a premier award for performance in 2022 for overall excellence in cost competitiveness.

Read more...
Industry’s most power-efficient mid-range FPGA
ASIC Design Services Edge Computing & IIoT
The additions expand Microchip FPGA’s comprehensive suite of tools and services supporting the PolarFire family of devices, and include the only RISC-V SoC FPGA shipping in volume production.

Read more...