What is ML? – Part 3: Hardware conversion of convolutional neural networks

31 May 2023 Editor's Choice AI & ML

AI applications often consume massive amounts of energy, typically running on server farms or expensive field-programmable gate arrays (FPGAs). The challenge lies in increasing computational power while keeping energy consumption and costs low. Now, AI applications are seeing a dramatic shift enabled by powerful intelligent edge computing.

Compared to traditional firmware-based computation, hardware-based convolutional neural network (CNN) acceleration is ushering in a new era of computational performance, combining high inference speed with low power consumption. By enabling sensor nodes to make their own decisions, intelligent edge technology dramatically reduces the amount of data transmitted over 5G and Wi-Fi networks. This is powering emerging technologies and unique applications that were not previously possible. To examine how these capabilities are made possible, this article explores the hardware conversion of a CNN with a dedicated AI microcontroller.

Artificial intelligence microcontroller with ultra-low power convolutional neural network accelerator

The MAX78000 is an advanced system-on-chip: an AI microcontroller with an ultra-low power CNN accelerator. It runs neural networks at ultra-low power in resource-constrained edge devices and IoT applications. Such applications include object detection and classification, audio processing, sound classification, noise cancellation, facial recognition, time-series data processing for heart rate/health signal analysis, multi-sensor analysis, and predictive maintenance.

Figure 1 shows a block diagram of the MAX78000, which is driven by an Arm Cortex-M4F core with a floating-point unit, running at up to 100 MHz. To give applications sufficient memory resources, this version of the microcontroller comes with 512 kB of flash and 128 kB of SRAM. Multiple external interfaces are included, such as I2C, SPI, and UART, as well as I2S, which is important for audio applications. Additionally, there is an integrated 60 MHz RISC-V core. The RISC-V core copies data between the individual peripheral blocks and the memory (flash and SRAM), making it a smart direct memory access (DMA) engine. It also pre-processes the sensor data for the AI accelerator, so the Arm core can remain in deep sleep mode during this time. If necessary, the inference result can wake the Arm core via an interrupt; the Arm CPU then performs actions in the main application, passes on sensor data wirelessly, or informs the user.

A hardware accelerator unit for performing inference of convolutional neural networks is a distinct feature of the MAX7800x series of microcontrollers, which sets the series apart from standard microcontroller architectures and peripherals. This hardware accelerator can support complete CNN model architectures along with all the required parameters (weights and biases). The CNN accelerator is equipped with 64 parallel processors and an integrated memory, with 442 kB for storing the parameters and 896 kB for the input data. Because the model and parameters are stored in SRAM, they can be adjusted via firmware, and the network can be adapted in real time.

Depending on whether 1-, 2-, 4-, or 8-bit weights are used in the model, this memory can be sufficient for up to 3,5 million parameters. Because the memory is an integral part of the accelerator, the parameters do not have to be fetched via the microcontroller bus structure for each consecutive mathematical operation; such fetches are costly because of their high latency and power consumption. The neural network accelerator can support 32 or 64 layers, depending on the pooling function. The programmable image input/output size is up to 1024 x 1024 pixels for each layer.
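The relationship between weight width and parameter count can be checked with simple arithmetic. The sketch below is only an estimate: it assumes 1 kB = 1000 bytes, tightly packed weights, and no space reserved for biases or other overhead, none of which is confirmed by the article. The function name is hypothetical.

```python
# Rough capacity estimate for the accelerator's 442 kB parameter memory.
# Assumptions (not taken from the article): 1 kB = 1000 bytes, weights are
# packed with no padding, and biases or per-layer overhead are ignored.
WEIGHT_MEMORY_BYTES = 442 * 1000


def max_parameters(weight_bits, memory_bytes=WEIGHT_MEMORY_BYTES):
    """Return how many weights of the given bit width fit in the memory."""
    if weight_bits not in (1, 2, 4, 8):
        raise ValueError("weight width must be 1, 2, 4, or 8 bits")
    return (memory_bytes * 8) // weight_bits


if __name__ == "__main__":
    for bits in (1, 2, 4, 8):
        print(f"{bits}-bit weights: about {max_parameters(bits)} parameters")
```

Under these assumptions, 1-bit weights give roughly 3,5 million parameters, in line with the figure quoted above, while 8-bit weights give about one eighth of that.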

CNN hardware conversion: energy consumption and inference speed comparison

CNN inference is a complex computational task comprising large linear equations in matrix form. CNN inference in the firmware of an embedded system is possible on Arm Cortex-M4F microcontrollers; however, there are certain drawbacks to consider. Firmware-based inference on a microcontroller consumes considerable energy and time, because the instructions needed for each calculation, along with the associated parameter data, must be retrieved from memory before the intermediate results can be written back.
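To make the cost of firmware-based inference more concrete, the following sketch implements a single-channel convolution in plain Python and counts the multiply-accumulate (MAC) operations. It is an illustration only, not code from any MAX78000 tool; the function name is hypothetical, and it uses no padding and a stride of 1. Each MAC in the inner loop needs one input value and one weight from memory, which is the traffic a processor running the network in firmware has to handle.

```python
def conv2d_naive(image, kernel):
    """Valid (no padding), stride-1 2D convolution on nested lists.

    As is usual for CNNs, the kernel is not flipped (cross-correlation).
    Returns the output feature map and the number of MAC operations.
    """
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    output = [[0] * out_w for _ in range(out_h)]
    macs = 0
    for y in range(out_h):
        for x in range(out_w):
            acc = 0
            for ky in range(k_h):
                for kx in range(k_w):
                    # Each MAC reads one input value and one weight.
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
                    macs += 1
            output[y][x] = acc  # intermediate result is written back
    return output, macs


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    feature_map, mac_count = conv2d_naive(image, kernel)
    print(feature_map, mac_count)
```

For a 28 x 28 input and a 3 x 3 kernel, the same loop performs 26 x 26 x 9 = 6084 MACs for just one input/output channel pair, and a real network repeats this across many channels and layers.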

Table 1 presents a comparison of CNN inference speed and energy consumption for three different solutions. The example model was trained on MNIST, a handwritten digit recognition training set, and classifies digits from visual input data. The inference time required by each processor type was measured to determine the differences in energy consumption and speed.

These data illustrate the power of hardware-accelerated computation. Hardware-accelerated computing is an invaluable tool for applications that cannot rely on connectivity or a continuous power supply. The MAX78000 enables edge processing without the demand for large amounts of energy, broadband internet access, or prolonged inference times.

Example use case for the MAX78000 AI microcontroller

The MAX78000 enables a multitude of potential applications, but let’s examine the following use case as an example. The requirement is to design a battery-powered camera that detects when a cat is in the field of view of its image sensor and then, via a digital output, enables access to the house through the cat door.

Figure 2 depicts an example block diagram for such a design. In this case, the RISC-V core switches the image sensor on at regular intervals, and the image data is loaded into the CNN accelerator of the MAX78000. If the probability that a cat has been recognised is above a previously defined threshold, the cat door is enabled. The system then returns to standby mode.
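The threshold decision described above could look something like the following sketch. It assumes the accelerator returns one raw score per class and that the firmware converts these to probabilities with a softmax; the class index, the threshold value, and the function names are all hypothetical. On the real device this logic would live in the C firmware, so the Python here only illustrates the decision rule.

```python
import math

CAT_THRESHOLD = 0.9  # hypothetical value; would be tuned for the application


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def should_open_door(scores, cat_index=0, threshold=CAT_THRESHOLD):
    """Return True when the cat class probability exceeds the threshold."""
    return softmax(scores)[cat_index] > threshold


if __name__ == "__main__":
    # Two classes assumed for illustration: index 0 = cat, index 1 = no cat.
    print(should_open_door([5.0, 0.0]))  # confident cat detection
    print(should_open_door([0.0, 0.0]))  # undecided, door stays closed
```

A high threshold trades a few missed detections for fewer false openings, which is usually the safer choice for an access-control output.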

Development environments and evaluation kits

The process of developing an AI-on-the-edge application can be divided up into the following phases:

• Phase 1: AI – Definition, training, and quantisation of the network.

• Phase 2: Arm firmware – Inclusion of the networks and parameters generated in Phase 1 in the C/C++ application and creation and testing of the application firmware.

The first part of the development process involves modelling, training, and evaluating the AI models. For this stage, the developer can leverage open-source tools such as PyTorch and TensorFlow. The GitHub repository provides comprehensive resources to help users map out their journey in building and training AI networks using the PyTorch development environment while taking into consideration the hardware specifications of the MAX78000. Included in the repository are a few simple AI networks and applications like facial recognition.

Figure 3 shows the typical AI development process in PyTorch. First, the network is modelled. It must be noted that not all MAX7800x microcontrollers have hardware that supports all data manipulations available in the PyTorch environment. For this reason, the file ai8x.py supplied by Analog Devices must first be included in the project. This file contains the PyTorch modules and operators required for using the MAX78000. Based on this setup, the network can be built and then trained, evaluated, and quantised using the training data.
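As a rough illustration of what quantisation means in this workflow, the sketch below maps floating-point weights to signed integers of a chosen bit width using simple symmetric linear scaling. This is a generic textbook technique, not the procedure implemented in ai8x.py or the Analog Devices tools, and the function names are hypothetical.

```python
def quantise_weights(weights, bits=8):
    """Symmetric linear quantisation of floats to signed integers.

    Returns the integer weights and the scale needed to map them back.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    q_max = 2 ** (bits - 1) - 1   # e.g. 127 for 8 bits
    q_min = -(2 ** (bits - 1))    # e.g. -128 for 8 bits
    largest = max(abs(w) for w in weights)
    scale = largest / q_max if largest > 0 else 1.0
    # Scale, round to the nearest integer, and clamp to the valid range.
    quantised = [max(q_min, min(q_max, round(w / scale))) for w in weights]
    return quantised, scale


def dequantise(quantised, scale):
    """Map integer weights back to approximate float values."""
    return [q * scale for q in quantised]


if __name__ == "__main__":
    q, scale = quantise_weights([1.0, -1.0, 0.0], bits=8)
    print(q, dequantise(q, scale))
```

Fewer bits shrink the memory footprint of each weight but increase the rounding error, which is why the network is evaluated again after quantisation.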

The result of this step is a checkpoint file that contains the input data for the final synthesis process. In this final process step, the network and its parameters are converted to a form that fits into the hardware CNN accelerator. Network training can be done with any PC. However, without CUDA graphics card support, this can take a lot of time – even for small networks, days or even weeks are completely realistic.


Figure 4. MAX78000 evaluation kit.

In Phase 2 of the development process, the application firmware is created, including the mechanism for writing data to the CNN accelerator and reading back the results. The files created in the first phase are integrated into the C/C++ project via #include directives. The microcontroller development environment also relies on open-source tools such as the Eclipse IDE and the GNU Toolchain. ADI provides a software development kit (Maxim Micros SDK (Windows)) as an installer that already contains all the necessary components and configurations. The software development kit also contains peripheral drivers, as well as examples and instructions to ease the process of developing applications.

Once the project has been compiled and linked without any errors, it can be evaluated on the target hardware. ADI has developed two different hardware platforms for this purpose. Figure 4 shows the MAX78000EVKIT, and Figure 5 shows the MAX78000FTHR, which is a somewhat smaller, feather form factor board. Each board comes with a VGA camera and a microphone.


Figure 5. MAX78000FTHR evaluation kit.

Conclusion

Previously, AI applications required massive energy consumption in the form of server farms or expensive FPGAs. Now, with the MAX78000 family of microcontrollers with a dedicated CNN accelerator, it’s possible to power AI applications from a single battery, for extended periods. This breakthrough in energy efficiency and power is making edge-AI more accessible than ever before and unlocking the potential for new and exciting edge-AI applications that were previously impossible.


© Technews Publishing (Pty) Ltd | All Rights Reserved