Programmable Logic


The internal performance bottleneck for FPGAs in high-speed communications systems

27 August 2003 Programmable Logic

Ref: z2615196m

Traditionally, discussions of field-programmable gate arrays (FPGAs) used in interfacing applications within communication systems have focused on raw I/O speed. Unfortunately, how an FPGA handles high-speed data once it is 'on chip' is commonly overlooked. Many FPGA architectures are ill-equipped to handle this flood of data, and the result is a serious internal bottleneck. Ultimately, this bottleneck forces designers who push traditional FPGAs to the edge of the performance envelope to resort to expensive work-arounds. For FPGAs to be considered viable solutions for next-generation, high-speed communications applications, a new FPGA architecture is required that is optimised to address and eliminate this bottleneck.

Figure 1. The impact of speed

The bottleneck

Increasing bandwidth in the network and the explosion of communications standards and interfaces have created a troubling problem for traditional FPGA architectures. Until recently, the speed of the internal logic core of most FPGAs kept up with system performance. However, with performance requirements for communications systems increasing dramatically, FPGAs are often unable to process data through the chip at wireline speeds.

Figure 2. Datastream bottleneck of traditional FPGAs

So, what are the options for a designer needing to handle a 10 Gbps Ethernet datastream? As an example, most ASSPs on the market today that address 10 Gbps Ethernet present their data using the XGMII interface standard (data 64 bits wide, operating at 156 MHz). A traditional FPGA core cannot sustain that clock rate across a full datapath, which forces the designer to use heavy pipelining and redundant logic to achieve the required system throughput. Once on chip, the data width must be doubled to 128 bits so that the operating frequency can be halved to 78 MHz. The result? A longer, more difficult design process, higher power consumption and higher costs.
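The width-for-frequency trade described above is simple arithmetic, and a short sketch makes it concrete. The function names below are illustrative only and do not come from any vendor tool. The figures are the article's rounded ones; 64 bits at 156 MHz comes to just under 10 Gbps, and the nominal XGMII clock is usually quoted as 156.25 MHz.

```python
def throughput_gbps(width_bits: int, clock_mhz: float) -> float:
    """Raw datapath throughput in Gbps for a given bus width and clock."""
    return width_bits * clock_mhz / 1000.0


def clock_for_throughput(width_bits: int, target_gbps: float) -> float:
    """Clock frequency in MHz needed to carry target_gbps on a bus of width_bits."""
    return target_gbps * 1000.0 / width_bits


# Figures quoted in the article (rounded).
xgmii = throughput_gbps(64, 156.0)     # 64-bit datapath at 156 MHz
widened = throughput_gbps(128, 78.0)   # 128-bit datapath at 78 MHz

print(f"64 bits  @ 156 MHz: {xgmii:.3f} Gbps")
print(f"128 bits @  78 MHz: {widened:.3f} Gbps")
print(f"Clock needed at 128 bits: {clock_for_throughput(128, xgmii):.1f} MHz")
```

Both configurations carry the same throughput. What the calculation does not show is the cost: twice as many datapath registers and routing tracks, plus the logic needed to widen and later narrow the stream.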

The requirements for a new FPGA internal core architecture fall into five areas: the logic module, the logic module cluster, embedded memory blocks, the clocking scheme and the routing structure.

Logic module

A high-performance core architecture needs a flexible logic module with minimal delay, but what about granularity? Fine granularity of two inputs allows minimal module delays and eases the burden on synthesis and mapping, but taxes the routing structure to the point of impracticality, creating excessive routing delays. Coarse granularity of seven or more inputs, while minimising routing delays, increases the burden on synthesis and mapping and creates greater module delays and inefficient logic utilisation for most datapaths. The optimal granularity appears to be in the range of 3-5 inputs, offering the best balance between flexibility and tool burden on one hand and logic efficiency and minimised delay on the other.
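A toy model helps illustrate the granularity trade-off. The sketch below is an assumption-laden simplification, not a description of any real mapper: it builds a balanced tree of k-input modules to reduce an n-input associative function (a wide AND, OR or XOR) to one output, then counts the modules, the logic levels and the fraction of module inputs actually used. It ignores routing delay and per-module delay entirely, and the name `tree_cost` is hypothetical.

```python
from typing import Tuple


def tree_cost(n_inputs: int, k: int) -> Tuple[int, int, float]:
    """Return (modules, levels, input_utilisation) for a tree of k-input
    modules that reduces n_inputs signals to a single output.

    Toy model: each stage groups the remaining signals k at a time.
    """
    if n_inputs < 2 or k < 2:
        raise ValueError("need at least 2 inputs and k >= 2")
    modules = levels = pins_used = 0
    signals = n_inputs
    while signals > 1:
        stage = -(-signals // k)   # ceiling division: modules in this stage
        modules += stage
        pins_used += signals       # every remaining signal drives one input
        signals = stage            # each module produces one output
        levels += 1
    return modules, levels, pins_used / (modules * k)


for k in (2, 4, 8):
    modules, levels, util = tree_cost(16, k)
    print(f"k={k}: {modules} modules, {levels} levels, "
          f"{util:.0%} of module inputs used")
```

For a 16-input function this model gives 15 modules over 4 levels with two-input modules, 5 modules over 2 levels with four-input modules, and 3 modules over 2 levels with eight-input modules, a quarter of whose inputs go unused. Fine granularity multiplies the module count and the connections between modules, while coarse granularity leaves inputs idle, which is the pattern the paragraph above describes.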

Logic module cluster

Nearly all FPGA architectures organise logic modules into clusters. A high-performance architecture requires a fully fracturable logic module cluster, meaning each logic module can be used independently of the other modules within the same cluster and can be combined with any other logic module in the core. This architectural improvement enables greater efficiency and higher logic module utilisation rates. A fully fracturable cluster also gives the placer greater flexibility, which decreases the distance between logic modules and results in better system performance.

At best, traditional FPGA architectures support only partial fracturability of the logic cluster, which often necessitates remapping before placement. This remapping can have an adverse impact on performance. In addition, the placer now has more restrictions regarding where certain functions of a given datapath can be placed, causing greater routing delays than expected.

Embedded memory blocks

Clearly, high-performance FPGA architectures require high-speed memory blocks, but attention needs to be paid to how memory is used in communication systems. When moving data from place to place, memory blocks often need to take the form of FIFOs. The best way to construct high-speed FIFOs is to build the control logic into the memory block. An embedded FIFO controller results in higher speeds and increased logic efficiency because, unlike in traditional FPGA architectures, no general-purpose logic is needed to build the controller. In addition, the embedded controller eliminates the concern that a FIFO controller built out of soft gates would compete with the designer's logic for device resources or degrade system performance.

Additionally, since designers of high-performance communication systems have to contend with multiple clock and phase domains, the embedded control logic should allow for asynchronous reads and writes as well as prevent metastability conditions, providing the designer the ability to easily cross clock and phase domains.
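To show what such a FIFO controller has to do, here is a minimal behavioural sketch in Python. It is an illustration under stated assumptions, not a description of any vendor's embedded controller: the class name `FifoModel` is hypothetical, the model is single-threaded and does not simulate metastability, and the use of Gray-coded pointers is a commonly used technique for passing pointers between clock domains that the article itself does not name.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary count to Gray code (one bit changes per increment)."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert a Gray-coded value back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


class FifoModel:
    """Behavioural FIFO with full/empty flags (hypothetical illustration)."""

    def __init__(self, depth: int) -> None:
        if depth < 2 or depth & (depth - 1):
            raise ValueError("depth must be a power of two >= 2")
        self.depth = depth
        self.mem = [None] * depth
        # Pointers carry one extra wrap bit, so they count 0 .. 2*depth - 1.
        self.wr = 0
        self.rd = 0

    @property
    def empty(self) -> bool:
        return self.wr == self.rd

    @property
    def full(self) -> bool:
        # Same memory index, opposite wrap bit.
        return (self.wr ^ self.rd) == self.depth

    @property
    def wr_gray(self) -> int:
        """Write pointer as it would be passed to the read clock domain."""
        return bin_to_gray(self.wr)

    def write(self, value) -> bool:
        if self.full:
            return False           # refuse the write rather than overwrite
        self.mem[self.wr % self.depth] = value
        self.wr = (self.wr + 1) % (2 * self.depth)
        return True

    def read(self):
        if self.empty:
            raise IndexError("read from empty FIFO")
        value = self.mem[self.rd % self.depth]
        self.rd = (self.rd + 1) % (2 * self.depth)
        return value
```

Because only one bit of a Gray-coded pointer changes per increment, a pointer sampled in the other clock domain is either the old value or the new one, never an unrelated count. In a soft implementation all of this pointer, flag and synchroniser logic is built from general-purpose logic modules; an embedded controller keeps it inside the memory block.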

Clocking scheme

Modern communications designs require a flexible clocking scheme that is fairly transparent to the user and does not add to the design task. All global clocks should be available equally to all sequential resources on the chip, thereby eliminating the need for clock floorplanning. Quadrant-clocking schemes found in traditional FPGA architectures place an additional burden on the placer and could cause areas of congestion on the chip, thereby degrading system performance.

Additional enhancements to the clocking structure can further reduce the burden on the placer and help eliminate bottlenecks. For example, the architecture should allow the global clocks to be sourced from different resources: package pins, internal PLLs or even internal logic. Another improvement would be a segmentable clock structure, allowing for a high number of local, higher frequency clocks. Finally, the architecture should allow for additional clock networks to be constructed out of general routing resources for maximum flexibility.

Routing structure

Often the routing scheme of an FPGA architecture is overlooked, but, in fact, it is the single most important aspect of any PLD architecture. Because the routing scheme works in conjunction with the other elements of the architecture (logic modules, logic module clusters, embedded memory and clocking scheme) to achieve the required system performance, it takes on an even more important role when constructing a high-performance FPGA internal core architecture. The routing structure determines how efficient the architecture is and ultimately determines the maximum performance possible.

Of course, the speed of the interconnect element is important, but size also plays a role. The smaller the better, as a smaller element allows for abundant interconnect. Abundant interconnect allows the router to routinely achieve close to the theoretical Manhattan Rule wire length between logic modules, and therefore minimal delays. Critical to a high-speed architecture is a programmable interconnect element that, unlike SRAM-based interconnect, does not take up valuable substrate real estate, allowing for shorter distances between logic modules.
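The Manhattan wire length between two logic modules is the shortest possible route on a grid of horizontal and vertical tracks: the horizontal offset plus the vertical offset. The sketch below, with hypothetical function names and an assumed integer grid of module coordinates, compares a routed path against that ideal. It is a way of visualising the metric, not a model of any real router.

```python
from typing import Sequence, Tuple

Point = Tuple[int, int]  # (column, row) of a logic module on the grid


def manhattan_length(a: Point, b: Point) -> int:
    """Shortest rectilinear wire length between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def routed_length(path: Sequence[Point]) -> int:
    """Total length of a route given as a sequence of corner points."""
    return sum(manhattan_length(p, q) for p, q in zip(path, path[1:]))


def detour_ratio(path: Sequence[Point]) -> float:
    """Routed length divided by the ideal Manhattan length (1.0 is ideal)."""
    ideal = manhattan_length(path[0], path[-1])
    if ideal == 0:
        raise ValueError("source and destination coincide")
    return routed_length(path) / ideal


# Direct route from (0, 0) to (3, 4): one horizontal run, one vertical run.
direct = [(0, 0), (3, 0), (3, 4)]
# Same endpoints, but the route must first detour around a congested area.
detour = [(0, 0), (0, -2), (3, -2), (3, 4)]

print(detour_ratio(direct))
print(detour_ratio(detour))
```

A ratio of 1.0 means the router found a track along the ideal path. When interconnect is scarce, nets are pushed onto detours, the ratio rises and delay grows with it.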

The routing scheme should be non-hierarchical - logic modules should be able to access high-speed routing tracks without having to go through short-distance resources first. This non-hierarchical routing scheme should provide the router with more than one type of routing resource that can be used to connect any two logic modules. Also, the routing structure should consist of high-speed local tracks, segmented local tracks of varying lengths, as well as across-chip routing tracks. And lastly, the routing scheme should also work in conjunction with dedicated routing buffers to speed up high fanout nets.

In summary, increasing bandwidth in the network and the explosion of communications standards and interfaces have created a troubling problem for traditional FPGA architectures. Until recently, the speed of the internal logic core of most FPGAs kept up with system performance. However, with performance requirements for communications systems increasing dramatically, FPGAs are often unable to process data through the chip at wireline speeds (ie, 64 bits at 156 MHz). Therefore, to eliminate this bottleneck, a new high-performance FPGA architecture is required to handle the high data rates. This new architecture should have the following features: a logic module with moderate granularity (3-5 inputs) to balance the load between synthesis, mapping and placement; a fully fracturable logic module cluster to allow the placer more options and enable high logic module utilisation; embedded memory blocks with built-in FIFO control to ease the construction of the high-performance FIFOs required for communication systems and minimise the use of vital internal resources; a flexible clocking scheme that obviates the need for clock floorplanning and allows for multiple clock sourcing schemes; and a high-performance routing scheme that allows the router to achieve close to theoretical Manhattan Rule wire lengths between logic modules.

Figure 3. New generation FPGAs eliminate the bottleneck

The AX architecture

Actel's recently announced AX architecture begins to address these issues. The AX architecture provides a high-speed, high-performance platform which, when used in communications applications, can save money, power and time. Axcelerator, the first FPGA family based on AX, relieves the communications bottleneck by providing 500+ MHz internal performance, high-speed embedded FIFOs and a high-speed clocking infrastructure. The future will see the introduction of BridgeFPGA products based on AX, which will not only eliminate the communications bottleneck but will further enhance communications systems with the inclusion of embedded protocol controllers.

For more information contact ASIC Design Services, Kobus van Rooyen, 011 315 8316, [email protected]


