Designing for low power - value-based source code specialisation for energy reduction

26 February 2003 DSP, Micros & Memory

Ref: z263146m

Around 25 years ago, when microprocessors were in their infancy, most designers had one overriding aim: to minimise the memory requirements. Memory was expensive and largely external, so that a small reduction in code size could often eliminate an external memory device and lead to a proportionately much greater reduction in component cost.

As high-level languages such as C replaced assembly-level coding, compiler developers shared the same goal, and one of the most important metrics of any new compiler was the size of the resulting code. As a result, the environment in which most software engineers learned their skills was one that favoured code compactness above all else. For example, if a program sometimes needed to calculate a sum of squares and at other times a sum of cubes, the standard approach was to write a general procedure that calculated the sum of nth powers, where n was supplied as a parameter. By consolidating as much common code as possible into a parameterised procedure, the amount of memory required for code storage was minimised.
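
In C, such a compact, fully parameterised routine might look like the following (a minimal sketch for illustration; the name sum and the integer types are simply those of the example used later in this article):

/* Illustrative sketch: one general routine covers squares (n = 2),
   cubes (n = 3) and every other power, at the cost of a nested loop. */
long sum(int n, int k)
{
    long total = 0;
    for (int i = 1; i <= k; i++) {
        long power = 1;
        for (int j = 0; j < n; j++)   /* inner loop raises i to the nth power */
            power *= i;
        total += power;
    }
    return total;
}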

However, just as there were excellent reasons why this programming approach evolved, there are also sound reasons why it does not always deliver optimum results today.

One of the factors that have changed is that processor-based systems no longer necessarily consist of a standalone microprocessor plus external memory plus external peripherals. Today, processor, memory and peripherals are all likely to be incorporated in an embedded system, which allows considerably more flexibility in the amount of embedded code, data and program memory required.

Designing for low power

Another of the key factors is the growing importance of low power consumption. Today, we are on the threshold of a new paradigm where mobile devices will become an increasing part of everyday life, and here minimising power consumption is often far more important than minimising code size. There are many approaches to reducing power consumption, perhaps the most obvious being a reduction in clock frequency, but to minimise power consumption without significant performance penalties, more sophisticated techniques are required.

ST is working with many prestigious universities around the world to develop tools and methodologies that allow software to be optimised for energy reduction. For example, one recent working paper by a team from ST, Stanford University (USA) and the University of Bologna (Italy) described the team's work in developing a tool that reduces the computational effort of programs by specialising them for highly expected situations.

Take, for example, the procedure sum(n,k) mentioned above that calculates the sum of the nth powers of the first k integers. Now suppose that, in practice, the value of n is 1 in 90% of the procedure calls. In such cases, n is called a constant-like argument (CLA) because its value is often - though not always - constant. If we write a simpler procedure, eg, sum1(k), to handle this special case and make sum(n,k) call sum1(k) whenever n = 1, the program will execute faster and consume less power. Of course, the code size will be slightly increased because we have added a specialised procedure to handle the case of n = 1, but this is often a very small price to pay for the lower power consumption and higher performance that result from calling a procedure that uses one loop rather than two in 90% of cases.
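
In code, this specialisation might look like the following (again a sketch, not the tool's actual output; the dispatch test and the 90% figure are taken from the example above):

/* Illustrative sketch: a specialised version for the common case n = 1,
   plus a dispatch in the general routine. */
long sum1(int k)
{
    long total = 0;
    for (int i = 1; i <= k; i++)      /* single loop: no inner power loop */
        total += i;
    return total;
}

long sum(int n, int k)
{
    if (n == 1)                       /* common case, ~90% of calls */
        return sum1(k);

    long total = 0;
    for (int i = 1; i <= k; i++) {
        long power = 1;
        for (int j = 0; j < n; j++)
            power *= i;
        total += power;
    }
    return total;
}

In the 90% of calls where n is 1, the work falls from a nested loop to a single loop; the cost is one comparison per call and a few extra lines of code.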

Automatic code transformation

In real applications, the problem is far more complex. What ST and its research partners are developing is a tool that will take the input source code (written in C), find the procedure calls which are frequently executed with the same parameter values (known as 'highly expected situations' or 'common cases') and then generate additional specialised versions of the procedure code to handle the common cases with less computational effort. As research shows that reducing the computational effort usually reduces both power consumption and execution time, a tool that could automatically transform source code in this way would bring tremendous benefits for the customer.

In practice, there are three major problems to be addressed.

* The first is that a procedure may have several possible common cases and it may not be clear which common case is the most effective candidate.

* The second problem is to determine, once a common case has been selected, how best to optimise the procedure for that case.

* Finally, after each procedure call has been specialised with the best combination of optimisations (such as loop unrolling), it is necessary to analyse the global effect. This must be done not only in terms of the resulting code size, but also because changes in the calling sequences may introduce cache conflicts that were not present in the original code.

Figure 1 shows the key steps in the source code transformation flow. The first three steps do not depend on the target architecture, while the two final steps use instruction level simulation to consider the underlying hardware architecture.

Figure 1. Source code transformation

* Step 1: The first step is to collect information for the three search problems and estimate the computational effort involved in the procedure calls. Two types of profiling are performed: execution frequency profiling is used to estimate the total computational effort associated with each procedure, while value profiling identifies CLAs and their common values by observing how the parameter values of procedure calls change (a minimal sketch of this book-keeping appears after this list).

* Step 2: Armed with this information, the next step is to calculate the normalised computational effort for every detected common case. A user-defined threshold allows trivial common cases to be pruned from the search. All the remaining common cases are then specialised: the original source code is transformed so that every procedure with an effective common case includes a conditional statement that checks for the occurrence of the common value and executes the appropriate specialised version if it is found.

* Step 3: Finally, the global interaction of the specialised calls is examined to determine which ones can be included in the final code.
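
To give a flavour of Steps 1 and 2, the sketch below shows the kind of book-keeping that value profiling implies: counting how often each argument value reaches a call site and flagging values whose share of the calls exceeds a user-defined threshold as common cases. It is purely illustrative - the function names, the fixed-size table and the 80% threshold are assumptions made for the sketch, not the internals of the actual tool.

/* Illustrative sketch of value profiling for one argument of one call site.
   A real tool inserts and later removes this instrumentation automatically. */
#include <stdio.h>

#define MAX_TRACKED 16

static int  tracked_value[MAX_TRACKED];   /* distinct values of n seen so far */
static long tracked_count[MAX_TRACKED];   /* how often each value occurred    */
static int  tracked_used;
static long total_calls;

static void profile_n(int n)              /* called on every call to sum(n,k) */
{
    total_calls++;
    for (int i = 0; i < tracked_used; i++) {
        if (tracked_value[i] == n) {
            tracked_count[i]++;
            return;
        }
    }
    if (tracked_used < MAX_TRACKED) {     /* track a new value if room remains */
        tracked_value[tracked_used] = n;
        tracked_count[tracked_used] = 1;
        tracked_used++;
    }
}

/* After the profiling run, report values whose share of the calls exceeds
   the user-defined threshold - these become candidate common cases. */
static void report_common_cases(double threshold)
{
    for (int i = 0; i < tracked_used; i++) {
        double share = (double)tracked_count[i] / (double)total_calls;
        if (share >= threshold)
            printf("n = %d is a common case (%.0f%% of calls)\n",
                   tracked_value[i], 100.0 * share);
    }
}

int main(void)
{
    /* Simulate a run in which n = 1 dominates the call profile. */
    for (int i = 0; i < 90; i++) profile_n(1);
    for (int i = 0; i < 10; i++) profile_n(3);
    report_common_cases(0.8);             /* prints: n = 1 is a common case (90% of calls) */
    return 0;
}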

Results to date are very promising. For example, using the ST210 VLIW as the target architecture and a variety of DSP programs based on a set of industrial benchmarks for multimedia applications (eg, G721 encoding, FFT, FIR, and edge detection and convolution of images), the average improvements in energy consumption and execution speed were both around 40%, for an average increase in code size of just 5%. In some cases, even more spectacular improvements were observed: in the FFT program, for example, over 80% improvement in both energy consumption and execution speed was achieved with a 14% increase in code size!

As the world becomes more and more mobile and connected, designing for low power is becoming increasingly critical. The beauty of this approach is that it allows one to optimise the trade-offs between price, performance and power consumption that can make all the difference to the customer's winning edge.

Avnet Kopp, 011 809 6100, [email protected]





