Designing for low power - value-based source code specialisation for energy reduction

26 February 2003 DSP, Micros & Memory

Ref: z263146m

Around 25 years ago, when microprocessors were in their infancy, most designers had one overriding aim: to minimise the memory requirements. Memory was expensive and largely external, so that a small reduction in code size could often eliminate an external memory device and lead to a proportionately much greater reduction in component cost.

As high-level languages such as C replaced assembly-level coding, compiler developers shared the same goal, and one of the most important metrics of any new compiler was the size of the code it produced. As a result, the environment in which most software engineers learned their skills was one that favoured code compactness above all else. For example, if a program sometimes needed to calculate a sum of squares and at other times a sum of cubes, the standard approach was to write a general procedure that calculated the sum of nth powers, with n supplied as a parameter. By consolidating as much common code as possible into a parameterised procedure, the amount of memory required for code storage was minimised.

However, just as there were excellent reasons why this programming approach evolved, there are also sound reasons why it does not always deliver optimum results today.

One of the factors that has changed is that processor-based systems no longer necessarily consist of a standalone microprocessor plus external memory and external peripherals. Today, all of these elements are likely to be incorporated in a single embedded system, which allows considerably more flexibility in the amount of embedded code, data and program memory required.

Designing for low power

Another of the key factors is the growing importance of low power consumption. Today, we are on the threshold of a new paradigm where mobile devices will become an increasing part of everyday life, and here minimising power consumption is often far more important than minimising code size. There are many approaches to reducing power consumption, perhaps the most obvious being a reduction in clock frequency, but to minimise power consumption without significant performance penalties, more sophisticated techniques are required.
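The reason is easy to see from the standard first-order model of dynamic power dissipation in CMOS logic: P_dyn ≈ α·C·V²·f, where α is the switching activity, C the switched capacitance, V the supply voltage and f the clock frequency. Halving the clock frequency halves dynamic power, but it also halves throughput, so the energy needed to complete a given task barely changes. Genuine energy savings require reducing the amount of work itself.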

ST is working with many prestigious universities around the world to develop tools and methodologies that allow software to be optimised for energy reduction. For example, one recent working paper by a team from ST, Stanford University (USA) and the University of Bologna (Italy) described the team's work in developing a tool that reduces the computational effort of programs by specialising them for highly expected situations.

Take, for example, the procedure sum(n,k) mentioned above that calculates the sum of the nth powers of the first k integers. Now suppose that, in practice, the value of n is 1 in 90% of the procedure calls. In such cases, n is called a constant-like argument (CLA) because its value is often - though not always - constant. If we write a simpler procedure, eg, sum1(k), to handle this special case and make sum(n,k) call sum1(k) whenever n = 1, the program will execute faster and consume less power. Of course, the code size will be slightly increased because we have added a specialised procedure to handle the case of n = 1, but this is often a very small price to pay for the lower power consumption and greater performance that result from calling a procedure that only uses one loop rather than two in 90% of cases.
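A minimal C sketch of this idea is shown below. The names sum and sum1 follow the article; the loop bodies are an illustrative assumption of how such procedures might look.

    /* Specialised version for the common case n = 1: only one loop. */
    long sum1(long k)
    {
        long total = 0;
        for (long i = 1; i <= k; i++)
            total += i;
        return total;
    }

    /* General version: sum of the nth powers of the first k integers.
       Needs two loops; the added guard dispatches the common case. */
    long sum(int n, long k)
    {
        if (n == 1)                      /* common case: ~90% of calls */
            return sum1(k);

        long total = 0;
        for (long i = 1; i <= k; i++) {
            long power = 1;
            for (int j = 0; j < n; j++)  /* raise i to the nth power */
                power *= i;
            total += power;
        }
        return total;
    }

In 90% of calls the inner loop is avoided entirely, which is where the energy and speed gains come from; the cost is the few extra instructions of the guard and the duplicated loop body.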

Automatic code transformation

In real applications, the problem is far more complex. What ST and its research partners are developing is a tool that will take the input source code (written in C), find the procedure calls which are frequently executed with the same parameter values (known as 'highly expected situations' or 'common cases') and then generate additional specialised versions of the procedure code to handle the common cases with less computational effort. As research shows that reducing the computational effort usually reduces both power consumption and execution time, a tool that could automatically transform source code in this way would bring tremendous benefits for the customer.

In practice, there are three major problems to be addressed.

* The first is that a procedure may have several possible common cases and it may not be clear which common case is the most effective candidate.

* The second problem is to determine, once a common case has been selected, how best to optimise.

* Finally, after each procedure call has been specialised with the best combination of optimisations (such as loop unrolling), it is necessary to analyse the global effect.

This must be done not only in terms of the resulting code size but also because changes in the calling sequences may introduce cache conflicts that were not present in the original code.

Figure 1 shows the key steps in the source code transformation flow. The first three steps do not depend on the target architecture, while the two final steps use instruction level simulation to consider the underlying hardware architecture.

Figure 1. Source code transformation

* Step 1: The first step is to collect information for the three search problems and estimate the computational effort involved in the procedure calls. Two types of profiling are performed: execution frequency profiling is used to estimate the total computational effort associated with each procedure, while value profiling identifies CLAs and their common values by observing how the parameter values of procedure calls vary (a sketch of such instrumentation appears after this list).

* Step 2: Armed with this information, the next step is to calculate the normalised computational effort for every detected common case. A user-defined threshold allows trivial common cases to be pruned from the search. In the next stage, all the remaining common cases are specialised, with the result that the original source code is transformed in such a way that all procedures for which there are effective common cases now include a conditional statement that checks for the occurrence of a common value and executes the appropriate specialised version if it is found.

* Step 3: Finally, the global interaction of the specialised calls is examined to determine which ones can be included in the final code.
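The research paper does not publish its instrumentation code, but a minimal sketch of how the value profiling in Step 1 might work is given below: every call records the value of a candidate parameter, and a parameter dominated by a single value is reported as a CLA. The histogram size and the 90% dominance threshold are illustrative assumptions, the latter borrowed from the sum(n,k) example above.

    #include <stdio.h>

    #define MAX_TRACKED 8        /* assumption: track up to 8 distinct values */

    /* A simple per-parameter value histogram, updated on every call. */
    typedef struct {
        int  value[MAX_TRACKED];
        long count[MAX_TRACKED];
        int  used;               /* number of distinct values seen so far */
        long total_calls;
    } value_profile;

    /* Called on procedure entry with the parameter's current value. */
    static void profile_record(value_profile *p, int v)
    {
        p->total_calls++;
        for (int i = 0; i < p->used; i++)
            if (p->value[i] == v) { p->count[i]++; return; }
        if (p->used < MAX_TRACKED) {
            p->value[p->used] = v;
            p->count[p->used] = 1;
            p->used++;
        }
    }

    /* Called once at program exit: a parameter is reported as a
       constant-like argument (CLA) if one value dominates its calls. */
    static void profile_report(const value_profile *p, const char *name)
    {
        for (int i = 0; i < p->used; i++) {
            double share = (double)p->count[i] / (double)p->total_calls;
            if (share >= 0.9)    /* illustrative dominance threshold */
                printf("%s is a CLA: value %d in %.0f%% of calls\n",
                       name, p->value[i], share * 100.0);
        }
    }

In a tool of this kind, such bookkeeping would be inserted and removed automatically; the programmer never sees it.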

Results to date are very promising. For example, using the ST210 VLIW as the target architecture and a variety of DSP programs based on a set of industrial benchmarks for multimedia applications (eg, G721 encoding, FFT, FIR, and edge detection and convolution of images), the average improvements in energy consumption and execution speed were both around 40%, for an average increase in code size of just 5%. In some cases, even more spectacular improvements were observed: in the FFT program, for example, over 80% improvement in both energy consumption and execution speed was achieved with a 14% increase in code size.

As the world becomes more and more mobile and connected, designing for low power is becoming increasingly critical. The beauty of this approach is that it allows one to optimise the trade-offs between price, performance and power consumption that can make all the difference to the customer's winning edge.

Avnet Kopp, 011 809 6100, [email protected]




