Dataweek Electronics & Communications Technology Magazine







 

A fault tolerant, fully distributed RTOS
19 July 2017, Computer/Embedded Technology

As the world moves towards ‘smarter’ systems, often embedded, and our lives and society become dependent on their uninterrupted operation, fault tolerance is becoming a prime requirement. Just think about autonomous driving. Will it bring you home safely every time? When a system fault occurs, the system has less than 100 milliseconds to react. There is no time for a reboot.

A consequence of the fine-grain space partitioning support in the VirtuosoNext real-time operating system (RTOS) is the capability to recover from runtime faults within a few microseconds. The combination of fine-grain concurrency and this fast recovery effectively provides support for non-stop hard real-time processing even when faults occur, without a complex and costly system design. VirtuosoNext’s non-stop capability means that fault tolerance comes within reach in a cost-efficient manner, both in terms of development effort and in terms of compute resources.

Why do systems fail?

From time to time, the public is made aware of catastrophic failures induced by software errors. Such failures happen all the time but do not always make it to the news. A failing car is considered an annoyance; a failed mission to Mars or an airplane crash is a costly disaster.

It is no wonder that in safety critical sectors much attention is given to the engineering process and in particular to the correctness of the software. Formal methods, rigorous processes, and external reviews by experts all aim at bringing the probability of failure to the lowest acceptable level.

Still, systems can fail spectacularly. Think of several missions to Mars, the first Ariane 5 rocket launch, the failure of Patriot missiles during the first Gulf War, etc. Sometimes the failure is not even induced by the software or the processors but by a failing power supply, as recently happened at British Airways.

Why do such systems still fail? A first obvious reason is human error. Even when using formal techniques, it is very difficult to obtain a complete set of specifications that take into account all real-world circumstances. Formal models might be proven correct, but remain incomplete models of a much more complex world.

Or the software engineer made a subtle mistake that no toolchain can catch or, worse, one that will never be tested for because the mistake is unknown. Last but not least, software is an imperfect construct that executes on an imperfect piece of hardware. In a digital world, software data types have limited precision even though real-world variables are continuous and can take any value. In addition, the bits that represent that data in the processor can become corrupted, for example because a cosmic ray induced extra electric charge in the circuit, or because a power glitch caused a bit to change its value. As chip features continue to shrink and chips operate at lower voltages, such corruption becomes more probable.
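
The effect of a single corrupted bit can be illustrated with a small sketch. The Python below is illustrative only and is not part of VirtuosoNext; the helper name flip_bit is an assumption made for this example. It inverts one bit in the 64-bit IEEE 754 encoding of a value and shows that the damage depends heavily on which bit is hit.

```python
import math
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return value with one bit of its IEEE 754 double encoding inverted.

    Bit 0 is the least significant mantissa bit, bit 63 is the sign bit.
    flip_bit is a hypothetical helper for illustration only.
    """
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    return struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))[0]

low = flip_bit(1.0, 0)    # lowest mantissa bit: a tiny relative error
high = flip_bit(1.0, 62)  # highest exponent bit: the value is no longer finite
sign = flip_bit(1.0, 63)  # sign bit: the value changes sign

print(low - 1.0, math.isinf(high), sign)
```

A flip in the low mantissa bits may go unnoticed, whereas a flip in the exponent or sign can turn a plausible sensor value into one that destabilises a control loop, which is why the hardware assumed later in the article includes error-correcting logic.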

Moreover, embedded systems work with real-world sampled data. The limited precision and range of the sensor introduce numerical imprecision and noise that, when not properly taken into account, can result in large deviations when used in long calculations. Any of these examples, and there are many more, will most likely result in a failure of the software and ultimately in a failure of the system.
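
How small representation errors grow over a long calculation can be seen in a minimal Python sketch. It is illustrative only; the increment of 0.1 and the sample count are arbitrary assumptions, and naive_sum is a hypothetical helper. It accumulates the same increment once with a plain running total and once with the standard library's math.fsum, which compensates for the low-order bits lost at each addition.

```python
import math

def naive_sum(values):
    """Accumulate values with a plain running total, as a simple integrator would."""
    total = 0.0
    for v in values:
        total += v
    return total

samples = [0.1] * 1_000_000      # 0.1 has no exact binary representation
naive = naive_sum(samples)       # rounding error accumulates at every step
exact = math.fsum(samples)       # correctly rounded sum of the same samples
drift = abs(naive - exact)

print(naive, exact, drift)
```

The drift is small for one million samples, but a control loop running at a kilohertz rate performs that many additions in under twenty minutes, so an integrator that is never reset keeps accumulating the error.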

A smart world needs to be fault tolerant

Computers have taken over the world, but they went into hiding as well and we call them embedded. Many of our systems (cars, energy networks, telecommunication systems, etc.) would not even be able to offer the service they provide without their ‘embedded’ computers.

Today, these systems are becoming ‘smart’ and adaptive. They do this by constantly monitoring their environment and by interpreting what their sensors measure. One emerging market is autonomous systems, e.g. self-driving cars and drones. Such systems have control loops measured in milliseconds. If they fail, the impact can be catastrophic because of avalanche effects.

To reduce these risks to a minimum, such systems must be able to continue to operate even when faults occur. We call this an ARRL-3 system, or with extra hardware redundancy an ARRL-4 or -5 system (see www.altreonic.com/content/arrl-novel-criterion-trustworthy-safety-critical-systems). The novel non-stop capability of the VirtuosoNext RTOS provides higher safety at a much lower development and implementation cost. Essentially, the non-stop capability offers ARRL-4 in an ARRL-3 package (provided the right hardware is used, such as an MPU or MMU and error-correcting logic for e.g. program memory).

Current remedies

So, what do engineers do to deal with these unlikely but inevitable failures? The most common approach is known by all of us when our PC presents us with a frozen screen: switch off the power and reboot. As anyone knows, this is brutally simple, but it can take a serious amount of time, from tens of seconds to minutes. In complex systems like airplanes it can take a lot more time as the rebooting results in mandatory system checks. Therefore, the reboot option is hardly a valid one when dealing with safety-critical real-time systems.

Another option is to use partitioning. Derived from the server world, partitioning is based on very trustworthy software that runs directly on top of the hardware, typically called a hypervisor. The hypervisor assigns applications to specific memory regions and typically shares processor time between time partitions that each take tens of milliseconds. If a failure occurs, the whole partition is re-initialised and the application is rebooted, leaving the rest of the system intact. While this avoids a complete reboot of the whole system, the impact on the failing application can still be substantial, as outlined in the previous paragraph. This approach is not compatible with hard real-time requirements when safety and security risks are present.

VirtuosoNext changes this in a disruptive and very convenient way. It allows applications to recover in a few microseconds without disrupting the continued execution of the application. VirtuosoNext Designer uses a combination of code generators and a formally developed, small but distributed real-time kernel.

Partitioning starts when the application is written. An application is very naturally written as a number of tasks that synchronise and communicate by using so-called hub entities. In practice these hub entities will present themselves as traditional RTOS services like events, semaphores, resource locks, mailboxes, etc. From the top level application model description, code generators generate the data structures, the initialisation code and static program images that can’t be modified at runtime as they reside in read-only memory.

A key element is the semantics of these hub services. The services pass their data by copying and with strict handshaking. The data is passed on to the kernel task, which in turn delivers the data to the corresponding task. While it is possible to pass address pointers, this is only allowed for so-called trusted code, for example the kernel task and driver tasks.
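
The value of pass-by-copy semantics can be sketched in a few lines of Python. This is not the VirtuosoNext API: the Mailbox class, its put and get methods and the message layout are all assumptions made for illustration, and the kernel task and the handshaking described above are not modelled. The sketch only shows that once a message has been copied into the hub, later changes to the sender's buffer cannot reach the receiver.

```python
import copy
from collections import deque

class Mailbox:
    """Hypothetical hub: messages are copied in, so tasks never share memory."""

    def __init__(self):
        self._queue = deque()

    def put(self, message):
        # Deep copy stands in for the copy performed when data is handed over.
        self._queue.append(copy.deepcopy(message))

    def get(self):
        """Return the oldest message, or None if the mailbox is empty."""
        return self._queue.popleft() if self._queue else None

mailbox = Mailbox()
reading = {"sensor": "wheel_speed", "values": [12.1, 12.3]}
mailbox.put(reading)

# The sender overwrites its own buffer after sending, e.g. due to a bug.
reading["values"].append(999.0)

received = mailbox.get()
print(received["values"])
```

Had the mailbox stored a reference instead of a copy, the receiver would have seen the corrupted value. Passing pointers trades this isolation for speed, which is presumably why the article restricts it to trusted code.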

First line of defence: decoupling

The first line of defence is that the underlying implementation fully decouples and strictly separates the application tasks from each other. If the program is correctly developed (hence not using shared memory except under strictly controlled circumstances), no task will access the memory of another task. As the scheduling is not time based but event and priority based, small timing differences have no impact on the correct operation of the program. This is further reinforced by the use of an underlying packet-switching mechanism that also supports transparent programming across multiple processors, whether the CPUs reside on the same chip (many/multicore processors) or are connected across a network.

Second line of defence: fine grain partitioning at task level

The second line of defence is that VirtuosoNext Designer allocates each task its own private memory space, fully protected by the memory protection logic of the processor. The code itself is also protected and ‘read-only’, so that it cannot be unintentionally modified. No task can read or write in the memory of another task, even as a result of a programming error or a fault in the hardware.

If such a fault occurs, the task will fail, but the rest of the system will continue to execute, unless another task depends on the execution of the failing task. In that case, the system will not crash but the dependent task will stall, waiting for its data to arrive (over one of the hub entities). In many cases even this will not happen: when a hub service like a blackboard is used, the task can simply read the latest data from that hub and continue. Note also that this containment effectively limits security risks as well.
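
The blackboard behaviour can be sketched as follows. This Python is a simplified illustration under stated assumptions, not VirtuosoNext code: the Blackboard class, the two task functions and the simulated fault are hypothetical, and the try/except block merely stands in for the hardware memory protection trap that would stop a faulty task on a real target.

```python
class Blackboard:
    """Hypothetical hub that keeps only the most recently published value."""

    def __init__(self, initial):
        self._latest = initial

    def publish(self, value):
        self._latest = value

    def read_latest(self):
        # Never blocks: the reader gets the last value, even if it is stale.
        return self._latest

def sensor_task(board, cycle):
    if cycle == 2:
        raise RuntimeError("simulated fault in sensor task")
    board.publish(10.0 + cycle)

def control_task(board, log):
    log.append(board.read_latest())

board = Blackboard(initial=0.0)
log, faults = [], []
for cycle in range(4):
    try:
        sensor_task(board, cycle)
    except RuntimeError as exc:
        # The fault is contained to the failing task; other tasks keep running.
        faults.append((cycle, str(exc)))
    control_task(board, log)

print(log, faults)
```

In the cycle where the sensor task fails, the control task still runs and reuses the previous reading instead of stalling. Whether a stale value is acceptable, and for how long, is an application-level design decision.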

To learn about the third and fourth lines of defence offered by VirtuosoNext, and the benefits it provides for embedded security, read the full version of this article at www.dataweek.co.za, under the Computer/Embedded Technology category.

For more information contact Eric Verhulst, Altreonic, eric.verhulst@altreonic.com




 
 
         