Editor's Choice


What is ML? – Part 3: Hardware conversion of convolutional neural networks

31 May 2023 Editor's Choice AI & ML

AI applications require massive energy consumption, often in the form of server farms or expensive field programmable gate arrays (FPGAs). The challenge lies in increasing computational power, while keeping energy consumption and costs low. Now, AI applications are seeing a dramatic shift enabled by powerful intelligent edge computing.

Compared to traditional firmware-based computation, hardware-based convolutional neural network acceleration is now ushering in a new era of computational performance with its impressive speed and power. By enabling sensor nodes to make their own decisions, intelligent edge technology dramatically reduces data transmission rates over 5G and Wi-Fi networks. This is powering emerging technologies and unique applications that were not previously possible. To examine how these capabilities are made possible, this article explores the hardware conversion of a CNN with a dedicated AI microcontroller.

Artificial intelligence microcontroller with ultra-low power convolutional neural network accelerator

The MAX78000 is an AI microcontroller with an ultra-low power CNN accelerator, an advanced system-on-chip. It enables neural networks at ultra-low power for resource-constrained edge devices or IoT applications. Such applications include object detection and classification, audio processing, sound classification, noise cancellation, facial recognition, time-series data processing for heart rate/health signal analysis, multi-sensor analysis, and predictive maintenance.

Figure 1 shows a block diagram of the MAX78000, which is powered up to 100 MHz by an Arm Cortex-M4F core with a floating-point unit. To give applications sufficient memory resources, this version of the microcontroller comes with 512 kB of flash and 128 kB of SRAM. Multiple external interfaces are included such as I2Cs, SPIs, and UARTs, as well as the I2S which is important for audio applications. Additionally, there is an integrated 60 MHz RISC-V core. The RISC-V copies data from/to the individual peripheral blocks and the memory (flash and SRAM), making it a smart direct memory access (DMA) engine. The RISC-V core pre-processes the sensor data for the AI accelerator, so the Arm core can be in a deep sleep mode during this time. If necessary, the inference result can trigger the Arm core via an interrupt, and the Arm CPU then performs actions in the main application, passes on sensor data wirelessly, or informs the user.

A hardware accelerator unit for performing inference of convolutional neural networks is a distinct feature of the MAX7800x series of microcontrollers, which sets it apart from the standard microcontroller architecture and peripherals. This hardware accelerator can support complete CNN model architectures along with all the required parameters (weights and biases). The CNN accelerator is equipped with 64 parallel processors and an integrated memory, with 442 kB for storing the parameters and 896 kB for the input data. Because the model and parameters are stored in SRAM memory, they can be adjusted via firmware, and the network can be adapted in real time.

Depending on whether 1-, 2-, 4-, or 8-bit weights are used in the model, this memory can be sufficient for up to 3,5 million parameters. Because the memory capabilities are an integral part of the accelerator, the parameters do not have to be fetched via the microcontroller bus structure with each consecutive mathematical operation. This activity is costly due to high latencies and high power consumption. The neural network accelerator can support 32 or 64 layers, depending on the pooling function. The programmable image input/output size is up to 1024 x 1024 pixels for each layer.

CNN hardware conversion: energy consumption and inference speed comparison

CNN inference is a complex calculations task comprising large linear equations in matrix form. Using the power of Arm

Cortex-M4F microcontrollers, CNN inference on an embedded system’s firmware is possible; however, there are certain drawbacks to consider. With firmware-based inference running on microcontrollers, energy and time are heavily consumed because the commands needed for calculation, along with associated parameter data, need to be retrieved from memory before intermediate results can then be written back.

Table 1 presents a comparison of CNN inference speed and energy consumption utilising three different solutions. This example model was developed using MNIST, a handwritten digit recognition training set, which classifies digits and letters from visual input data to arrive at an accurate output result. The inference time required by each processor type was measured to determine differences between energy consumption and speed.

These data illustrate the power of hardware-accelerated computation.Hardware-accelerated computing is an invaluable tool for applications unable to utilise connectivity or a continuous power supply. The MAX78000 enables edge processing without the demand for large amounts of energy, broadband internet access, or prolonged inference times.

Example use case for the MAX78000 AI microcontroller

The MAX78000 enables a multitude of potential applications, but let’s examine the following use case as an example. The requirement is to design a battery-powered camera that detects when a cat is in the field of view of its image sensor, and consequently enables access to the house, via a digital output through the cat door.

Figure 2 depicts an example block diagram for such a design. In this case, the RISC-V core switches the image sensor on at regular intervals and the image data is loaded into the CNN powered by the MAX78000. If the probability of a cat recognition is above a previously defined threshold, the cat door is enabled. The system then returns to standby mode.

Development environments and evaluation kits

The process of developing an AI-on-the-edge application can be divided up into the following phases:

• Phase 1: AI – Definition, training, and quantisation of the network.

• Phase 2: Arm firmware – Inclusion of the networks and parameters generated in Phase 1 in the C/C++ application and creation and testing of the application firmware.

The first part of the development process involves modelling, training, and evaluating the AI models. For this stage, the developer can leverage open-source tools such as PyTorch and TensorFlow. The GitHub repository provides comprehensive resources to help users map out their journey in building and training AI networks using the PyTorch development environment while taking into consideration the hardware specifications of the MAX78000. Included in the repository are a few simple AI networks and applications like facial recognition.

Figure 3 shows the typical AI development process in PyTorch. First, the network is modelled. It must be noted that not all MAX7800x microcontrollers have hardware that supports all data manipulations available in the PyTorch environment. For this reason, the file ai8x.py supplied by Analog Devices must first be included in the project. This file contains the PyTorch modules and operators required for using the MAX78000. Based on this setup, the network can be built and then trained, evaluated, and quantised using the training data.

The result of this step is a checkpoint file that contains the input data for the final synthesis process. In this final process step, the network and its parameters are converted to a form that fits into the hardware CNN accelerator. Network training can be done with any PC. However, without CUDA graphics card support, this can take a lot of time – even for small networks, days or even weeks are completely realistic.


Figure 4. MAX78000 evaluation kit.

In Phase 2 of the development process, the application firmware is created with the mechanism of writing data to the CNN accelerator and reading the results. The files created in the first phase are integrated into the C/C++ project via #include directives. Open-source tools such as Eclipse IDE and the GNU Toolchain are also used for the development environment for the microcontroller. ADI provides a software development kit (Maxim Micros SDK (Windows)) as an installer that already contains all the necessary components and configurations. The software development kit also contains peripheral drivers, and examples and instructions to ease the process of developing applications.

Once the project has been compiled and linked without any errors, it can be evaluated on the target hardware. ADI has developed two different hardware platforms for this purpose. Figure 4 shows the MAX78000EVKIT, and Figure 5 shows the MAX78000FTHR, which is a somewhat smaller, feather form factor board. Each board comes with a VGA camera and a microphone.


Figure 5. MAX78000FTHR evaluation kit.

Conclusion

Previously, AI applications required massive energy consumption in the form of server farms or expensive FPGAs. Now, with the MAX78000 family of microcontrollers with a dedicated CNN accelerator, it’s possible to power AI applications from a single battery, for extended periods. This breakthrough in energy efficiency and power is making edge-AI more accessible than ever before and unlocking the potential for new and exciting edge-AI applications that were previously impossible.


Credit(s)



Share this article:
Share via emailShare via LinkedInPrint this page

Further reading:

Adaptive optics’ power solution
Altron Arrow Opto-Electronics
Vicor power-dense adaptive optical modules enable colossal telescopes to look into the past for deep space discoveries.

Read more...
Wide input voltage buck-boost converter
Altron Arrow Power Electronics / Power Management
The MAX77859 from Analog Devices is a high-efficiency, high-performance buck-boost converter targeted for systems requiring a wide input voltage range of between 2,5 and 22 V.

Read more...
From the editor's desk: Is the current AI really what we want?
Technews Publishing Editor's Choice
The companies that develop LLMs need to change direction and concentrate on freeing up our time, not so that we can have more time to do the tasks we don’t want to do in the first place, but rather to allow us more time to do what we love.

Read more...
High-density power module for AI at the edge applications
Altron Arrow Power Electronics / Power Management
The MCPF1412 power module from Microchip has integrated I2C and PMBus interfaces for flexible configuration and monitoring.

Read more...
When it comes to long-term reliability of RF amplifier ICs, focus first on die junction temperature
Altron Arrow Editor's Choice Telecoms, Datacoms, Wireless, IoT
When considering the long-term reliability of integrated circuits, a common misconception is that high package or die thermal resistance is problematic. However, high or low thermal resistance, by itself, tells an incomplete story.

Read more...
ICs vs modules: Understanding the technical trade-offs for IoT applications
NuVision Electronics Editor's Choice DSP, Micros & Memory
As the IoT continues to transform industries, design decisions around wireless connectivity components become increasingly complex with engineers often facing the dilemma of choosing between ICs and wireless modules for their IoT applications.

Read more...
High-performance processor for edge-AI
Altron Arrow DSP, Micros & Memory
The STM32MP23 microprocessor from STMicroelectronics is the latest addition to the STM32MP2 series, designed to meet the demands of industrial, IoT, and edge AI applications.

Read more...
Why bis means business for LTE Cat 1 IoT connections
NuVision Electronics Editor's Choice Telecoms, Datacoms, Wireless, IoT
Tomaž Petaros, product manager IoT EMEA at Quectel Wireless Solutions explains why the market for Cat 1bis IoT connections is getting busy.

Read more...
Interview with Brian Aziz, vice president of global sales, Iridium
Editor's Choice
ridium is the leading satellite IoT player. Their network consists of 66 active low Earth orbit satellites covering every inch of the globe and are used for IoT and emergency services worldwide.

Read more...
6 GHz power amplifier
Altron Arrow Telecoms, Datacoms, Wireless, IoT
Analog Devices’ ADPA1116 is a 0,3 to 6 GHz GaN-based power amplifier with a saturated output power (POUT) of 39,5 dBm.

Read more...