DSP applications benefit from AVX

3 August 2011 Computer/Embedded Technology

Intel has radically changed the relevance of its processors in the digital signal processing (DSP) arena with the addition of Advanced Vector Extensions (AVX) in its second generation Core i7 multicore platforms.

With this move, the company has doubled the vector engine capability of the processor from 128 to 256 bits.

Through this improvement in floating point processing capabilities, the processors formerly code-named ‘Sandy Bridge’ now become compelling choices in military and aerospace applications such as radar, sonar, signal processing, and intelligence, surveillance and reconnaissance (ISR), on a wide range of manned and unmanned platforms.

For many years, the heavy lifting in DSP has been done by DSP chips and FPGAs, but these solutions and others brought their own burdens in the form of power consumption and development effort. Placing the AVX technology in each core of the 32 nm second generation Core i7 processor addresses the size, weight and power (SWaP) issues of previous DSP solutions, and also allows developers to work within the familiar programming environment of the general purpose processor.

The resulting GFLOPS processing capability puts Intel processors into a DSP realm they have never before enjoyed. In addition, Intel Hyper-Threading technology has been reintroduced. This enables a single core to appear as two virtual cores, and can boost performance by 25-30% in some cases. Benchmarks using Synthetic Aperture Radar code have demonstrated over twice the performance on second generation Core i7 when compared with first generation processors (Sandy Bridge versus Arrandale) at the same clock frequency and with two threads of execution. When running with four threads (on four cores for second generation Core i7, or on two cores with Hyper-Threading for first generation Core i7), the increase exceeds 4x.

This article looks at the potential impact that the addition of AVX technology in the second generation Intel Core i7 processors may have on DSP applications in the military and aerospace arena.

Genesis of AVX technology

The second generation Core i7 processor breaks new ground by being the first processor offering from Intel to be effective at floating point (FP) processing. The earliest processors from the company, the 8086 and its offspring, did not have the ability to perform these calculations without help from software emulation or a dedicated 8087 FP chip.

MultiMedia eXtension (MMX) technology improved matters somewhat with its introduction in the Pentium processors. The MMX instructions and execution units were tasked with the encoding and decoding of audio and video feeds, but could be called upon for some types of military and aerospace DSP as long as floating point operations were kept to a minimum, since MMX itself operates only on integer data. As MMX gave way to the Streaming SIMD (single instruction, multiple data) Extensions (SSE), FP calculations and DSP became more achievable.

DSP developers had a brief fling with another Intel offering starting in the late 1980s in the form of the i860 RISC microprocessor. This CPU and graphics accelerator found itself inside of everything from microcomputers to supercomputers during its roughly six-year run, but military and aerospace contractors were particularly taken with the i860’s ability to handle DSP operations.

The i860 was quickly surpassed by other RISC processors, which in turn were displaced by the ARM-based XScale processors. This left the DSP world without a dramatic technology update beyond SSE until the January 2011 release of the second generation Core i7 with AVX. AMD is also set to introduce AVX-enabled processors during 2011.

Breaking through the vector engine ceiling

AVX extends the functionality of SSE by doubling the width of the SIMD registers from 128 bits to 256 bits and adding instructions that operate on this wider data. The new 256-bit vector engine allows eight single-precision (32-bit) FP operations or four double-precision (64-bit) FP operations to be performed at the same time, up from four single-precision or two double-precision operations with SSE. This is an important factor in DSP applications, where the same operation frequently must be performed many times across a large data set.
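As a minimal sketch of what eight-wide single-precision processing looks like from C, the routine below multiplies two arrays element by element. The function name `vec_mul_f32` and its signature are illustrative and do not come from any library mentioned in this article. The 256-bit path uses the compiler intrinsics from `immintrin.h` and is compiled only when AVX is enabled; otherwise the same source falls back to a scalar loop.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>   /* AVX intrinsics, available when AVX is enabled at build time */
#endif

/* Element-wise product out[i] = a[i] * b[i] for n single-precision values.
 * Illustrative helper: the name and signature are assumptions. */
void vec_mul_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#ifdef __AVX__
    /* Main loop: one multiply instruction processes eight floats. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* unaligned 256-bit load */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_mul_ps(va, vb));
    }
#endif
    /* Scalar tail, and the whole array when AVX is not enabled. */
    for (; i < n; ++i)
        out[i] = a[i] * b[i];
}
```

Built with AVX enabled (for example with `-mavx` on GCC or Clang), the main loop handles eight floats per iteration and the scalar loop only mops up any remainder.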

The vector engine doubling also has an almost direct correlation to the processor’s ability to perform FP vector calculations. Whereas previous processors relied on clock rates and die geometries to realise any gains in FP vector performance, AVX leverages the SIMD operation functionality to achieve greater gains than otherwise possible. This is significant because clock rates above 2 or even 3 GHz are reaching an efficiency ceiling due to their greater power consumption. Similarly, increased leakage has erased some of the potential gains of reduced die geometries.

The ability to perform one instruction on eight discrete sets of data, however, has no such technology ceiling. Each of the two or four cores of the second generation Core i7 processors has a dedicated AVX unit. This gives the new processors the ability to execute twice as many operations per clock cycle as their predecessors, with new quad-core platforms able to process up to 64 operations per clock cycle.
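The 64 operations per clock figure can be reproduced with simple arithmetic. The sketch below assumes that each core can start one eight-wide add and one eight-wide multiply per clock; that breakdown is an assumption used for illustration and is not spelled out in this article, and the function names are hypothetical.

```c
/* Peak single-precision operations per clock for the whole chip.
 * lanes:           SIMD lanes per instruction (8 for 256-bit AVX, 4 for 128-bit SSE).
 * issue_per_clock: vector FP instructions started per clock per core
 *                  (assumed to be 2 here: one add and one multiply). */
unsigned peak_ops_per_clock(unsigned cores, unsigned lanes, unsigned issue_per_clock)
{
    return cores * lanes * issue_per_clock;
}

/* Peak GFLOPS = operations per clock multiplied by the clock rate in GHz. */
double peak_gflops(unsigned cores, unsigned lanes, unsigned issue_per_clock,
                   double clock_ghz)
{
    return peak_ops_per_clock(cores, lanes, issue_per_clock) * clock_ghz;
}
```

Under these assumptions, `peak_ops_per_clock(4, 8, 2)` gives 64 for a quad-core AVX part, against 32 for the same core count with four-wide SSE. These are theoretical peaks; sustained throughput depends on keeping the vector units fed with data.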

The memory unit in a quad-core second generation Core i7 processor provides a 32 KB, four-way first-level instruction cache, a 32 KB, eight-way first-level data cache, and a 256 KB, eight-way second-level unified cache. Additionally, a 16-way third-level cache of up to 6 MB can be shared by all of the cores. The processors have two DDR3 memory controllers with up to 21,35 GBps of peak memory bandwidth. The memory unit is able to process two read requests of 16 bytes each and one write request of the same size per clock cycle to prevent pipeline stalls caused by inadequate data feeds.

Thus, the new second generation Core i7 processors have added strong FP performance to Intel’s already robust integer capabilities. Future micro-architectures promise to bring integer vector performance in line with FP performance, and also to introduce fused multiply-add (FMA), an operation that performs the multiply and the add with one rounding stage instead of two, improving both numerical accuracy and speed.
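The effect of removing the intermediate rounding stage can be seen in a small C sketch. It does not use FMA hardware. Instead it emulates the fused behaviour for single precision by holding the product in a double, where the product of two floats is exact. The final addition is still performed in double before conversion to float, so this closely approximates a true fused operation rather than reproducing it bit for bit in every case. Both function names are hypothetical.

```c
/* Two rounding stages: the product is rounded to float, then the sum is. */
float madd_two_roundings(float a, float b, float c)
{
    volatile float p = a * b;   /* volatile forces the product to be stored as a float */
    return p + c;
}

/* Emulated fused behaviour: a float product is exact in double
 * (24 + 24 significand bits fit in 53), so rounding to float
 * precision happens only once, on the final result. */
float madd_fused_emulated(float a, float b, float c)
{
    double p = (double)a * (double)b;   /* exact product */
    return (float)(p + (double)c);
}
```

With a = 1 + 2^-13, b = 1 - 2^-13 and c = -1, the exact answer is -2^-26. The two-stage version rounds the product to 1.0 first and returns 0, while the emulated fused version returns -2^-26.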

The military and aerospace DSP arena is now able to capitalise on this newfound melding of capabilities through ruggedised commercial off-the-shelf (COTS) single board computers (SBCs) based around the second generation Core i7 processor family. One of the first commercial offerings was the 6U VPXcel6 SBC624 from GE Intelligent Platforms. This successor to the company’s SBC620 and SBC622 products, based respectively on the Intel Core 2 Duo and Core i7 processors, is available in five levels of ruggedisation ranging from benign to fully rugged.

GE has subsequently announced three other SBC products based on the second generation Core i7: the 3U form factor VPXcel3 SBC324; the 6U XVR14 rugged VME SBC; and the 6U XCR14 rugged CompactPCI SBC. All employ serial switched fabrics to optimise board-to-board data transfers to enhance DSP capabilities. GE is also expected to soon announce a 6U VPX multiprocessor platform that promises to optimise high-performance density.

GE has announced its fifth second generation Core i7 platform, the DSP280 dual-node multiprocessor, specifically designed for defence and aerospace applications requiring the highest levels of DSP and multiprocessing capabilities. With its dual processor configuration, this fully rugged 6U OpenVPX multiprocessor platform will be capable of more than 260 GFLOPS peak performance per card slot.

The DSP280 also features up to 21 GBps main memory bandwidth with error checking and correction per CPU node. This high-performance embedded computing architecture can scale to TFLOPS performance levels within a single chassis via RDMA-enabled 10 Gigabit Ethernet and double data rate InfiniBand dual port network interface controllers, which deliver as much as 1,8 GBps per channel at memory-to-memory latencies of approximately 1 μs.

Intel has promised a seven-year parts lifecycle for the second generation Core i7 family. GE has chosen ball grid array devices from Intel’s Performance Mobile chipset family that can be soldered down for increased resistance to high shock and vibration environments, instead of the less secure land grid array socket emplacement.

Maximising AVX development

Taking full advantage of the added performance capability of the second generation Core i7 processor family presents some degree of challenge to the developer. Each AVX unit can be programmed using primitives that can be called from C or other high-level languages. While no more complex than assembly code programming, getting good performance at this level is not a trivial task.

Many factors must be understood and accounted for when coding to avoid pipeline stalls and resource contention. Compilers offer some help. Several already have AVX support, coupled with varying degrees of automatic vectorisation. Source code is analysed and, where possible, procedural loops are mapped to SIMD operations. This allows unmodified code to take advantage of AVX to some extent. If code modification is acceptable, or the code already uses library calls, math libraries offer a good alternative.
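A loop of the following shape is the kind that an auto-vectorising compiler can typically map to SIMD operations: a simple counted loop with no dependence between iterations. The function name is illustrative, and whether a given compiler actually vectorises it depends on the compiler, its version and the options used (with GCC, for instance, options such as `-O3 -mavx` are a common starting point).

```c
#include <stddef.h>

/* y[i] += alpha * x[i] for n single-precision values.
 * 'restrict' promises the compiler that x and y do not overlap, which
 * removes one common obstacle to automatic vectorisation.
 * Illustrative helper: the name and signature are assumptions. */
void scale_accumulate(float alpha, const float *restrict x,
                      float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

The source contains no AVX-specific code, so the same routine can be rebuilt for SSE, AVX or a future instruction set without modification.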

Intel produces the Integrated Performance Primitives (IPP) and the Math Kernel Library (MKL), both highly tuned for AVX by Intel’s own experts. Algorithm coverage is broad and performance is hard to beat. However, these libraries are proprietary to Intel (AMD has its own variation). Because of this, some programs turn to more open application programming interfaces (APIs) such as the Vector Signal and Image Processing Library (VSIPL) API, which was sponsored by DARPA as a cross-platform, cross-vendor standard, and its C++ sibling, VSIPL++. These libraries can help isolate applications from the intricacies of the underlying hardware architectures.

For many years, GE Intelligent Platforms has supported the VSIPL standard API across multiple architectures (PowerPC/AltiVec, GPGPU/CUDA and Intel/SSE) with its AXISLib product suite. The company has just announced the latest addition to this product family, AXISLib-AVX, which includes the full VSIPL Core 1.0+ profile. These libraries are hand-optimised for the second generation Core i7 platform, with support for AVX and multithreading, so that developers can extract the maximum performance out of the new Intel processors for SWaP-sensitive sensor processing applications.

The AXISLib-AVX library includes more than 600 high-performance DSP and vector mathematical functions for advanced real-time embedded signal processing applications, and can be used on its own or as an integral software module within the AXIS Advanced Multiprocessor Integrated Software environment.




