Real-Time Linux: Towards multicore solutions and FPGA acceleration

Modern real-time systems combine a multitude of technologies such as connectivity, signal processing, machine vision and edge analytics, to name a few. As an operating system, embedded Linux fulfills many of the requirements that real-time systems face in the technology industry. However, controlling machinery and physical devices often still requires strict timing, which ordinary Linux is unable to provide. Fortunately, this issue is addressed by various software projects which aim to provide a real-time capable basis for software development by modifying parts of the Linux kernel. The resulting real-time kernel is a good solution for most tasks, but for an even more predictable real-time environment it can be combined with asymmetric multiprocessing. Using a hypervisor to isolate critical and non-critical tasks brings new possibilities for building so-called mixed-criticality systems. Additionally, advances in Field Programmable Gate Array (FPGA) technology open a whole new realm of possibilities for the most demanding real-time systems. By combining these tools and technologies it is possible to implement a modern and flexible mixed-criticality system without compromising on features, performance or cost.

Real-time Linux demonstration

We have implemented a real-time Linux demonstration which combines an embedded Linux operating system, asymmetric multiprocessing, FPGA-accelerated algorithms and a web interface on a single system on chip. Built with the latest real-time and web technologies such as Jailhouse, Xenomai, Docker, Node.js and AngularJS, the system demonstrates how software can be interconnected to form a seamless user experience. In addition, FPGA logic implemented with PYNQ and High-Level Synthesis demonstrates how computationally expensive algorithms can be accelerated in FPGA logic alongside the main processor.

The real-time demonstration features Jailhouse, a lightweight hypervisor which enables us to run different isolated tasks on different physical cores of a processor. Jailhouse is called a partitioning hypervisor because it explicitly allocates physical resources such as CPU cores and memory regions to execution units called cells. This contrasts with ordinary hypervisors, which usually do not bind a task to a specific physical resource but simply limit the amount of resources it can use. Jailhouse enables a flexible architecture where different tasks can be implemented as separate Jailhouse cells and run on demand, only when needed.

Our real-time Linux demonstration uses three Jailhouse cells: the host Linux system runs in the root cell, a bare-metal task runs in one guest cell and a Xenomai real-time kernel runs in a second guest cell. Each cell runs a control loop at a specific frequency which can be controlled from a web interface running in the host Linux system. The jitter of each of the three loops is measured, logged and displayed in the web interface along with other loop statistics. Some statistics algorithms are accelerated with the onboard FPGA using high-level synthesis and the PYNQ library. Together, PYNQ and high-level synthesis reduce the complexity involved in writing register transfer level code and logic-specific device drivers, which makes prototyping and development faster.
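As an illustration of the kind of loop statistics mentioned above, the following minimal Python sketch computes jitter from a list of loop wake-up timestamps. It assumes that jitter is defined as the deviation of each measured loop interval from the nominal period; the function name jitter_stats and the sample timestamps are hypothetical and are not taken from the demonstration itself.

```python
import statistics


def jitter_stats(timestamps_us, period_us):
    """Summarize loop jitter, assumed here to be (measured interval - nominal period)."""
    # Interval between each pair of consecutive loop iterations.
    intervals = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    jitter = [interval - period_us for interval in intervals]
    return {
        "min_us": min(jitter),
        "max_us": max(jitter),
        "mean_us": statistics.fmean(jitter),
        "peak_to_peak_us": max(jitter) - min(jitter),
    }


# Hypothetical wake-up times of a 1 kHz loop (nominal period 1000 us).
samples = [0, 1000, 2003, 2998, 4000]
print(jitter_stats(samples, period_us=1000))
```

In a real system the timestamps would come from a monotonic, high-resolution clock inside each cell, and the worst-case values are usually more interesting than the mean.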

The demonstration also uses Docker to run a modular system monitoring setup isolated from the host Linux system. Docker and other Linux container technologies offer many possibilities for deploying product-specific code on a hardware platform without interfering with the rest of the system. Containers can be an ideal solution for domains such as edge analytics and fleet monitoring.

Modern applications of real-time systems

Many systems developed today require guaranteed real-time performance. Some fields which use real-time systems extensively include:

Industrial
Robotics
Aerospace
Automotive
Medical
Telecommunication

Real-time systems are therefore an important topic for research and development. There are many reasons for requiring real-time performance. Some systems which interact with external machines do not work if they miss too many deadlines. A good example of such a system is the engine control unit of a car. The unit must precisely control aspects of an engine such as ignition, fuel injection and valve timing while the crankshaft is spinning at some 2500 rpm. The engine may even be damaged if the required real-time constraints are not met.
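To put the engine example into numbers, the short Python sketch below converts an engine speed into the time available per crankshaft revolution and per degree of rotation. The function name crank_timing is hypothetical; the arithmetic only assumes a constant speed of 2500 rpm.

```python
def crank_timing(rpm):
    """Return the revolution period and the time per crank degree at a constant speed."""
    revolutions_per_second = rpm / 60.0
    period_s = 1.0 / revolutions_per_second
    return {
        "revolution_ms": period_s * 1e3,
        "degree_us": period_s / 360.0 * 1e6,
    }


timing = crank_timing(2500)
print(f"{timing['revolution_ms']:.1f} ms per revolution")   # 24.0 ms per revolution
print(f"{timing['degree_us']:.1f} us per crank degree")     # 66.7 us per crank degree
```

At 2500 rpm one revolution takes 24 ms, so an event that must be placed within one degree of crank angle has a timing window of roughly 67 microseconds.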

Figure. Timing diagram of a basic real-time system.

The strictness of real-time requirements depends heavily on the planned application. Real-time systems can be roughly divided into three categories: hard, soft and firm real-time systems. Hard real-time means that the system must always meet every deadline; a single missed deadline counts as a total system failure. In contrast, soft real-time allows the system to miss some deadlines, but it must still meet most of them in order to work correctly. Firm real-time is quite similar to soft real-time: the system can occasionally miss a deadline, but when it does, the result of the computation is useless and is therefore discarded.
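The difference between the three categories can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the tolerated miss ratio of 20 % for soft and firm systems is an arbitrary illustrative value, late results in a soft system are assumed to remain usable, and the function name evaluate is hypothetical.

```python
def evaluate(policy, completion_times_ms, deadline_ms, max_miss_ratio=0.2):
    """Return (system_ok, usable_results) for a run under a given real-time policy."""
    total = len(completion_times_ms)
    misses = sum(1 for t in completion_times_ms if t > deadline_ms)
    on_time = total - misses

    if policy == "hard":
        # Any missed deadline is a total failure.
        ok = misses == 0
        return ok, (total if ok else 0)
    if policy == "soft":
        # Late results are still used (assumption), as long as misses stay rare.
        return misses / total <= max_miss_ratio, total
    if policy == "firm":
        # Late results are discarded; only on-time results count.
        return misses / total <= max_miss_ratio, on_time
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical completion times with a 10 ms deadline; one job out of ten is late.
times = [8, 9, 12, 7, 9, 8, 9, 8, 9, 8]
for policy in ("hard", "soft", "firm"):
    print(policy, evaluate(policy, times, deadline_ms=10))
```

With this data the hard policy reports a failure, while the soft and firm policies both keep running; they differ only in whether the late result is counted as usable.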

Linux based real-time technologies

The Linux kernel doesn't provide any guarantees of real-time operation which is a problem for real-time systems. The Xenomai project solves this issue: It provides the required modifications to the Linux kernel in order to make it real-time without compromising support for existing hardware and software. Xenomai implements multiple Application Programming Interfaces called skins. These skins emulate APIs of other real-time operating systems on top of the Xenomai core, making it easier to migrate legacy industrial systems to run on a more modern Linux platform. Some skins provided by Xenomai include Alchemy, POSIX, VxWorks, pSOS+ and VRTX.

Figure. How Xenomai fits into the Linux ecosystem.

Another option for making Linux real-time is the PREEMPT_RT project, which modifies the Linux kernel so that real-time constraints can be satisfied. In contrast to Xenomai, PREEMPT_RT doesn't introduce its own real-time architecture alongside the Linux kernel. It only modifies certain parts of the kernel to make it comply with real-time requirements.

It is also possible to implement software as standalone bare-metal code or to run it on top of a lightweight real-time kernel such as FreeRTOS. However, implementing an entire software project on bare metal may not be feasible if the application requires complex features, as it may take too much time and development effort. Technologies such as Jailhouse make it possible to implement only the real-time critical parts of a software project on bare metal. This gives developers and users the best of both worlds: a full-featured Linux system for complex but non-critical tasks and a guaranteed real-time environment for tasks which require accurate timing.

Asymmetric multiprocessing for mixed-criticality systems

In the real-time world an embedded system often needs to control an external device such as an electric motor or actuator in real time while also performing less critical tasks such as displaying a user interface. The problem with this setup is that if the user interface crashes or hangs, it can take down the whole system with it. This is an obvious issue if the system controls a critical piece of external equipment. In some cases it is also required, for security or safety reasons, that some parts of an application must under no circumstances interfere with the execution of other, critical code. Such a system is called a mixed-criticality system: it mixes both critical and non-critical components. Developing mixed-criticality systems involves a special set of architectural and development-related challenges that need to be taken into account in order to ensure that the final product works as expected.

As always, there are many ways to implement mixed-criticality systems. Most general-purpose systems today use symmetric multiprocessing, where all processors of a system function as a whole and have access to the same hardware peripherals. Symmetric multiprocessing works fine for non-real-time and non-critical systems, but it is not always appropriate for mixed-criticality systems because resources are shared between different criticality levels. This is where asymmetric multiprocessing shines: it enables critical code to run on a processor that is separate and isolated from the main processor. In asymmetric multiprocessing different processors are not treated equally with regard to memory or peripheral access. Each processor is isolated from the others and only has access to a predefined subset of the peripherals provided by the hardware. This guarantees that non-critical and critical tasks cannot interfere with each other.

Asymmetric multiprocessing with Jailhouse

Modern processors usually have more than one physical core. Common processors include 2, 4, 8 or even more cores in one package. In applications we usually think of these CPU cores as a single processor and do not make a clear distinction between different cores. However, wouldn't it be more resource-efficient and flexible to treat the physical cores of a processor as separate processors instead? It turns out there are no technical reasons why this could not be done. The cores are mostly isolated and self-contained anyway. Unfortunately, this kind of functionality is not natively supported in most operating systems, Linux included. Luckily, projects such as the Jailhouse hypervisor exist and make it possible to achieve exactly what we want.

Jailhouse is a lightweight and low-overhead partitioning hypervisor which makes it possible to treat CPU cores as separate physical processors. A hypervisor creates and manages virtual machines which allows a single computer to run multiple operating systems or even bare-metal code alongside an ordinary operating system. Unlike most hypervisors on the market today, Jailhouse is more concerned with isolation than virtualization. This means that Jailhouse does not provide resource over-commitment or emulation features. It only allocates existing physical resources to separate partitions called cells. Cells are independent and isolated execution units which cannot interfere with each other. Cells can also be started dynamically from a host Linux system, providing flexibility and adaptability. Jailhouse makes it straightforward to allocate only the required resources to cells and to start and stop cells on-demand. Even though cells are isolated by default, inter-cell communication can be enabled using methods such as shared memory and network interfaces. One interesting thing to note is that in addition to bare-metal code it's also fairly straightforward to run multiple Linux kernels on separate cores with Jailhouse. For example, an ordinary Linux kernel and a patched real-time Linux kernel can be made to run on a single processor simultaneously.
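The partitioning idea can be illustrated with a small conceptual model in Python. This is only a sketch of the "no over-commitment" rule described above and does not use Jailhouse's actual configuration format or tooling; the Cell class, the validate_partition function, the cell names and the memory addresses are all hypothetical, and the shared memory regions used for inter-cell communication are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    cpus: frozenset      # physical core indices owned by the cell
    mem_start: int       # start of the cell's private memory region
    mem_size: int        # size of that region in bytes


def regions_overlap(a, b):
    """True if the private memory regions of two cells intersect."""
    return (a.mem_start < b.mem_start + b.mem_size
            and b.mem_start < a.mem_start + a.mem_size)


def validate_partition(cells, total_cpus):
    """Return a list of violations; an empty list means a valid static partition."""
    errors = []
    for i, cell in enumerate(cells):
        missing = cell.cpus - frozenset(range(total_cpus))
        if missing:
            errors.append(f"{cell.name} uses non-existent CPUs {sorted(missing)}")
        for other in cells[i + 1:]:
            shared = cell.cpus & other.cpus
            if shared:
                errors.append(f"{cell.name} and {other.name} share CPUs {sorted(shared)}")
            if regions_overlap(cell, other):
                errors.append(f"{cell.name} and {other.name} overlap in memory")
    return errors


# Hypothetical layout on a quad-core processor, mirroring the three cells of the demo.
cells = [
    Cell("root-linux", frozenset({0, 1}), 0x0000_0000, 0x3000_0000),
    Cell("bare-metal", frozenset({2}), 0x3000_0000, 0x0100_0000),
    Cell("xenomai", frozenset({3}), 0x3100_0000, 0x0F00_0000),
]
print(validate_partition(cells, total_cpus=4))
```

Because every core and every memory region belongs to exactly one cell, the check returns an empty list; assigning core 2 to two cells at once would be reported as a violation.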

Accelerating algorithms with field programmable gate arrays

Real-time Linux is a great solution for tasks which require strict timing accuracy and low or medium computational power. If an algorithm requires very high computational power and high throughput, it is often not practical to run it on a general-purpose processor due to power consumption, heat dissipation, availability and cost issues. Instead, FPGAs can be used to implement the required logic in hardware. Compared with Application Specific Integrated Circuits (ASICs), FPGAs provide a relatively inexpensive way to accelerate algorithms and other logic in hardware.

FPGAs are manufactured by placing a predetermined number of configurable logic blocks (CLB) on the die of a silicon chip. FPGA programming is a synthesis process where the CLBs are interconnected and configured to perform various logic functions such as AND, OR and XOR. These are the same logic operations used by general purpose CPUs which means all logic that can be implemented in a CPU can also be implemented in an FPGA. Because the FPGA skips all general purpose circuitry used in a CPU, the resulting logic or algorithm is usually a lot more efficient than the CPU equivalent. Recent FPGAs also include more advanced logic units such as multipliers, DSP slices and block RAM among others. Instead of developing custom implementations, these building blocks can be used directly in order to further optimize an FPGA design.

Traditionally FPGA design requires low-level hardware design skills as FPGA design is effectively hardware design with a hardware description language such as Verilog or VHDL. With the introduction of high-level synthesis (HLS) the process of designing FPGA logic is greatly simplified. HLS is a set of tools which a developer can use to design logic as C or C++ code. Although the logic is described in a familiar programming language, behind the scenes the HLS tools synthesize the source code into register transfer level code and all the way into the final FPGA bitstream.

If communication between a processor and the FPGA logic is required, a developer still needs to write a Linux driver for the custom logic. Device drivers for FPGA logic can be developed using traditional methods, that is, as a kernel module written in C. However, this approach is not very flexible, especially during prototyping. PYNQ offers a more flexible Python interface to Xilinx FPGAs. PYNQ automates many of the things required from a device driver and offloads them onto a software stack called Xilinx Runtime. The embedded developer can instead focus on implementing only the logic-specific functionality as a Python driver module. Using PYNQ together with HLS enables rapid prototyping and faster software development without a steep learning curve.
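The following Python sketch illustrates what the logic-specific part of such a driver typically looks like: write the operands to registers, start the logic, poll for completion and read the result. To stay runnable without hardware it does not use the pynq package at all; FakeAdderIp is a software stand-in for memory-mapped FPGA logic, and the register offsets, bit masks and class names are hypothetical.

```python
class FakeAdderIp:
    """Software stand-in for a memory-mapped adder implemented in FPGA logic."""

    # Hypothetical register layout and control bits.
    CTRL, ARG_A, ARG_B, RESULT = 0x00, 0x10, 0x18, 0x20
    START, DONE = 0x1, 0x2

    def __init__(self):
        self.regs = {self.CTRL: 0, self.ARG_A: 0, self.ARG_B: 0, self.RESULT: 0}

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF
        if offset == self.CTRL and value & self.START:
            # Simulated hardware: compute at once and raise the done flag.
            total = self.regs[self.ARG_A] + self.regs[self.ARG_B]
            self.regs[self.RESULT] = total & 0xFFFFFFFF
            self.regs[self.CTRL] = self.DONE

    def read(self, offset):
        return self.regs[offset]


class AdderDriver:
    """Logic-specific driver: hides the register protocol behind a method call."""

    def __init__(self, mmio):
        self.mmio = mmio

    def add(self, a, b, max_polls=1000):
        self.mmio.write(FakeAdderIp.ARG_A, a)
        self.mmio.write(FakeAdderIp.ARG_B, b)
        self.mmio.write(FakeAdderIp.CTRL, FakeAdderIp.START)
        for _ in range(max_polls):
            if self.mmio.read(FakeAdderIp.CTRL) & FakeAdderIp.DONE:
                return self.mmio.read(FakeAdderIp.RESULT)
        raise TimeoutError("FPGA logic did not signal completion")


driver = AdderDriver(FakeAdderIp())
print(driver.add(2, 3))
```

On real hardware the stand-in object would be replaced by the register access that the framework provides for the loaded FPGA design, while the driver class itself could remain largely unchanged.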

Figure. FPGA design flow with HLS and PYNQ.

Although high-level synthesis offers many advantages in FPGA design, using it is still a tradeoff between ease of development and optimization. Implementing logic using HLS usually consumes more configurable logic blocks than implementing it using a hardware description language. In addition, it is often not possible to optimize the performance of HLS logic to the degree that is possible with Verilog or VHDL. These are issues which the FPGA industry is continuously fixing and as a result the synthesis toolchains are improving all the time. Using PYNQ can also become a bottleneck if very high driver performance is required. In this case device drivers written in C may be a more suitable option. Nevertheless, once drivers are first implemented with PYNQ, the transition to C based drivers is greatly simplified as the basic functionality is already sorted out. All things considered, high-level synthesis and PYNQ certainly have a lot to offer to the technology industry.

FPGA technology can also be combined with multi-core processing by using a system on a chip (SoC) which includes both a processor and an FPGA. Such a setup makes it possible to perform ordinary and real-time processing on a CPU while tailored FPGA logic handles more computationally expensive tasks. Such tasks include, for example, image and video processing or machine vision algorithms. In video processing especially, very high resolution, frame rate and throughput can be achieved with FPGAs. SoC-based hardware is widely used in products of all kinds, and consequently we at Wapice also have expertise in cutting-edge SoC research and development. Wapice takes part in the Finnish SoC HUB consortium together with Tampere University and other companies from the Finnish technology sector. The aim of SoC HUB is to advance national SoC development competence for applications such as 5G, AI, imaging and security.

Demanding applications require modern solutions

Our real-time Linux demonstration shows how modern mixed-criticality systems can be implemented using a single multi-core processor and how FPGA logic can be used to further accelerate critical logic and algorithms. Mixed-criticality systems for demanding industrial applications are already in widespread use today. Some potential applications for mixed-criticality systems include:

Real-time signal processing
Machine vision
Closed-loop control systems
Engine control
Robotics control
Industrial and manufacturing equipment
Traffic control
Vehicle control systems

Not all of these applications require every technology discussed so far, but being able to pick the right solution or technology for the right problem is one of the cornerstones of good engineering. Technologies such as real-time Linux, asymmetric multiprocessing and hardware acceleration will certainly be important subjects in embedded systems engineering for years to come. Although building real-time and mixed-criticality systems has its own set of challenges, by using these technologies it is possible to build robust, flexible and high-performance systems for the most demanding industrial applications.