DSP, Micros & Memory


Designing for low power - value-based source code specialisation for energy reduction

26 February 2003

Ref: z263146m

Around 25 years ago, when microprocessors were in their infancy, most designers had one overriding aim: to minimise the memory requirements. Memory was expensive and largely external, so that a small reduction in code size could often eliminate an external memory device and lead to a proportionately much greater reduction in component cost.

As high-level languages such as C replaced assembly-level coding, compiler developers shared the same goal, and one of the most important metrics of any new compiler was the size of the resulting code. As a result, the environment in which most software engineers learned their skills was one that favoured code compactness above all else. For example, if a program sometimes needed to calculate a sum of squares and at other times a sum of cubes, the standard approach was to write a general procedure that calculated the sum of nth powers, where n was supplied as a parameter. By consolidating as much common code as possible into a parameterised procedure, the amount of memory required for code storage was minimised.

However, just as there were excellent reasons why this programming approach evolved, there are also sound reasons why it does not always deliver optimum results today.

One of the factors that has changed is that processor-based systems no longer necessarily consist of a standalone microprocessor plus external memory plus external peripherals. Today, all of these elements are likely to be incorporated in a single embedded device, which allows considerably more flexibility in terms of the amount of code, data and program memory required.

Designing for low power

Another of the key factors is the growing importance of low power consumption. Today, we are on the threshold of a new paradigm where mobile devices will become an increasing part of everyday life, and here minimising power consumption is often far more important than minimising code size. There are many approaches to reducing power consumption, perhaps the most obvious being a reduction in clock frequency, but to minimise power consumption without significant performance penalties, more sophisticated techniques are required.
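The limits of simple frequency scaling follow from the standard CMOS switching-power relation (a textbook formula, not taken from the article):

```latex
P_{dyn} = \alpha \, C_{load} \, V_{dd}^{2} \, f
```

Lowering the clock frequency f reduces power linearly, but it also stretches execution time proportionally, so the energy consumed per task barely changes unless the supply voltage can be lowered too. This is why techniques that reduce the total computational effort of a task, rather than merely slowing it down, are so attractive.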

ST is working with many prestigious universities around the world to develop tools and methodologies that allow software to be optimised for energy reduction. For example, one recent working paper by a team from ST, Stanford University (USA) and the University of Bologna (Italy) described the team's work in developing a tool that reduces the computational effort of programs by specialising them for highly expected situations.

Take, for example, the procedure sum(n,k) mentioned above that calculates the sum of the nth powers of the first k integers. Now suppose that, in practice, the value of n is 1 in 90% of the procedure calls. In such cases, n is called a constant-like argument (CLA) because its value is often - though not always - constant. If we write a simpler procedure, eg, sum1(k), to handle this special case and make sum(n,k) call sum1(k) whenever n = 1, the program will execute faster and consume less power. Of course, the code size will be slightly increased because we have added a specialised procedure to handle the case of n = 1, but this is often a very small price to pay for the lower power consumption and greater performance that results from calling, in 90% of cases, a procedure that only uses one loop rather than two.
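A minimal C sketch of this idea, following the article's sum(n,k)/sum1(k) example (the exact code the tool would generate may differ):

```c
/* Specialised version for the common case n = 1:
 * the sum of the first k integers needs only a single loop. */
static long sum1(int k)
{
    long total = 0;
    for (int i = 1; i <= k; i++)
        total += i;
    return total;
}

/* General version: sum of the nth powers of the first k integers.
 * The inner loop computes i^n, so the general case needs two loops. */
static long sum(int n, int k)
{
    if (n == 1)                  /* dispatch to the specialised common case */
        return sum1(k);

    long total = 0;
    for (int i = 1; i <= k; i++) {
        long p = 1;
        for (int j = 0; j < n; j++)
            p *= i;
        total += p;
    }
    return total;
}
```

In 90% of calls the conditional sends execution down the single-loop path, trading a few bytes of extra code for a large reduction in work done per call.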

Automatic code transformation

In real applications, the problem is far more complex. What ST and its research partners are developing is a tool that will take the input source code (written in C), find the procedure calls which are frequently executed with the same parameter values (known as 'highly expected situations' or 'common cases') and then generate additional specialised versions of the procedure code to handle the common cases with less computational effort. As research shows that reducing the computational effort usually reduces both power consumption and execution time, a tool that could automatically transform source code in this way would bring tremendous benefits for the customer.

In practice, there are three major problems to be addressed.

* The first is that a procedure may have several possible common cases and it may not be clear which common case is the most effective candidate.

* The second problem is to determine, once a common case has been selected, how best to optimise.

* Finally, after each procedure call has been specialised with the best combination of optimisations, such as loop unrolling, it is necessary to analyse the global effect. This must be done not only in terms of the resulting code size but also because changes in the calling sequences may introduce cache conflicts that were not present in the original code.

Figure 1 shows the key steps in the source code transformation flow. The first three steps do not depend on the target architecture, while the two final steps use instruction level simulation to consider the underlying hardware architecture.

Figure 1. Source code transformation

* Step 1: The first step is to collect information for the three search problems and estimate the computational efforts involved in the procedure calls. Two types of profiling are performed: execution frequency profiling is used to estimate the total computational effort associated with each procedure, while value profiling identifies CLAs and their common values by observing the parameter value changes of procedure calls.

* Step 2: Armed with this information, the next step is to calculate the normalised computational effort for every detected common case. A user-defined threshold allows trivial common cases to be pruned from the search. In the next stage, all the remaining common cases are specialised, with the result that the original source code is transformed in such a way that all procedures for which there are effective common cases now include a conditional statement that checks for the occurrence of a common value and executes the appropriate specialised version if it is found.

* Step 3: Finally, the global interaction of the specialised calls is examined to determine which ones can be included in the final code.
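As an illustration only (the actual tool, and the effort metric used in the paper, may differ), the value profiling of Step 1 and the threshold pruning of Step 2 might be sketched in C as follows; profile_param, find_common_value and worth_specialising are hypothetical names:

```c
#define MAX_VAL 16  /* track small parameter values; larger ones go to "other" */

static unsigned long call_count = 0;
static unsigned long value_hist[MAX_VAL + 1];

/* Step 1 (value profiling): record the value a candidate CLA takes on
 * each call, building a histogram over a profiling run. */
static void profile_param(int n)
{
    call_count++;
    value_hist[(n >= 0 && n < MAX_VAL) ? n : MAX_VAL]++;
}

/* Return the parameter value seen in at least 'threshold' of all calls,
 * or -1 if the argument is not constant-like. */
static int find_common_value(double threshold)
{
    for (int v = 0; v < MAX_VAL; v++)
        if (call_count > 0 &&
            (double)value_hist[v] / (double)call_count >= threshold)
            return v;
    return -1;
}

/* Step 2 (pruning): weight the common case by the procedure's share of
 * the program's total computational effort; trivial cases fall below a
 * user-defined threshold and are dropped from the search. */
static int worth_specialising(double proc_effort_fraction,
                              double common_value_fraction,
                              double threshold)
{
    return proc_effort_fraction * common_value_fraction >= threshold;
}
```

A procedure that accounts for 40% of the program's effort and hits its common value in 90% of calls would survive a 10% threshold, while one accounting for only 2% of the effort would be pruned however constant-like its argument is.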

To date, the results are very promising. For example, using the ST210 VLIW as the target architecture and a variety of DSP programs based on a set of industrial benchmarks for multimedia applications (eg, G721 encoding, FFT, FIR filtering, edge detection and convolution of images), the average improvements in energy consumption and execution speed were both around 40%, for an average increase in code size of just 5%. In some cases, even more spectacular improvements were observed: in the FFT program, for example, over 80% improvement in both energy consumption and execution speed was achieved with a 14% increase in code size!

As the world becomes more and more mobile and connected, designing for low power is becoming increasingly critical. The beauty of this approach is that it allows one to optimise the trade-offs between price, performance and power consumption that can make all the difference to the customer's winning edge.

Avnet Kopp, 011 809 6100, [email protected]


















