Real-Time Systems - When Missing a Deadline Is a Bug
Helpful context:
- Kernel-Bypass Networking - Sending Packets Without Asking the OS
- CPU Affinity & NUMA - When Cores Care Where Memory Lives
A pacemaker fires 60 times per minute. Each pulse must arrive within a few milliseconds of the scheduled time. Not “usually within a few milliseconds.” Not “99th-percentile within a few milliseconds.” Every. Single. Time. If the software misses a deadline, the patient’s heart does not receive the stimulus. This is not a performance problem. It is a correctness problem. The system is wrong if it delivers the right output too late, regardless of how correct the output is.
This is what real-time systems engineering is about: not speed, but deadline guarantees. The goal is not to make the average case fast. The goal is to bound the worst case. Those are fundamentally different problems, and they require fundamentally different solutions.
“Real-Time” Does Not Mean Fast
This distinction cannot be overstated. A system can be extremely fast on average and completely unsuitable for real-time applications. A garbage-collected language runtime - Java, Go, Python - may typically complete operations in microseconds, but its collector can pause application threads at unpredictable intervals, in some collectors for tens or hundreds of milliseconds. The average is excellent; the worst case has no guaranteed bound. That is unacceptable for hard real-time.
Conversely, a system can meet hard real-time requirements while being slow by performance standards. An embedded controller that samples a sensor and adjusts a valve every 10 milliseconds, guaranteed, is a real-time system even if its operations complete in hundreds of microseconds with no particular optimization.
The engineering discipline is about bounded jitter - the maximum variation in response time - not about achieving low average latency.
Hard real-time: missing a deadline is a system failure. Failure may be catastrophic. Flight control systems (the control surfaces of an aircraft must respond within milliseconds of a pilot input), airbag controllers (the bag must deploy within 30ms of collision detection), industrial robot arms (a missed deadline in a welding robot means a misaligned weld or a worker injury), pacemakers. The system must guarantee a bound on worst-case response time, not just a good average case.
Soft real-time: deadlines are important and should be met, but occasional misses degrade quality rather than cause failure. A video player must process frames at 30fps - miss a few frames and the video stutters, but the application is not broken. Voice-over-IP must keep end-to-end delay below 150ms for intelligibility - exceed it occasionally and the call is awkward, not failed. Statistical guarantees (95th, 99th percentile latency) are the standard metric.
Firm real-time: missing a deadline makes the result useless but not dangerous; the late result is discarded rather than causing harm. An online auction bid received after the auction closes is a missed deadline - the bid is simply discarded; nothing explodes.
Most systems that engineers call “real-time” are actually soft real-time - low-latency web servers, trading systems with latency requirements, game engines targeting 60fps. Understanding the distinction matters because the engineering approaches are different.
The History: VxWorks to PREEMPT_RT
Real-time operating systems predate Linux. The needs arose first in aerospace and industrial control.
VxWorks was developed in the 1980s by Wind River Systems and became the standard RTOS for safety-critical applications. It powered the Mars Pathfinder lander (1997), the Mars Exploration Rovers Spirit and Opportunity, the Boeing 787 avionics system, and dozens of spacecraft and aircraft. VxWorks provided deterministic preemptive scheduling, priority inheritance, and certification paths for DO-178C (aviation software safety standard).
FreeRTOS emerged as the dominant open-source RTOS for microcontrollers in the 2000s. With a kernel image of roughly 6 - 12KB, it runs on everything from 8-bit AVR microcontrollers to the ARM Cortex-M series. Amazon acquired FreeRTOS in 2017 and released it under the MIT license, integrating it with AWS IoT Greengrass for edge compute.
Linux was never designed for real-time use. The kernel holds non-preemptible locks during various operations, disables preemption during critical sections, and takes hardware interrupts at unpredictable times. A regular Linux kernel can have scheduling latency spikes of 1 - 10 milliseconds - completely unacceptable for hard real-time, tolerable for soft real-time with careful tuning.
The PREEMPT_RT patchset (started in 2004 by Ingo Molnár and others) progressively converted the Linux kernel into a fully preemptible system. It turned most kernel spinlocks into preemptible, priority-inheriting sleeping locks, moved interrupt handlers into preemptible kernel threads, and eliminated the longest non-preemptible sections. With PREEMPT_RT, Linux achieves worst-case scheduling latencies of 50 - 100μs on commodity hardware. After years as an out-of-tree patchset, PREEMPT_RT was merged into the Linux mainline kernel in version 6.12 (November 2024). Enterprise Linux distributions (SUSE Linux Enterprise Real Time, Red Hat Enterprise MRG Realtime) have shipped PREEMPT_RT in production for over a decade.
Scheduling Theory: Rate-Monotonic and EDF
For hard real-time systems, schedulability analysis determines at design time whether a given task set can always meet its deadlines. This is a formal guarantee - not a performance measurement, but a mathematical proof.
Assume $n$ periodic tasks, each with worst-case execution time $C_i$ and period $T_i$ (the task must complete within $T_i$ of being released). Utilization $U_i = C_i / T_i$ is the fraction of CPU time the task requires.
Rate-Monotonic Scheduling (RM): assign static priorities based on period - shorter period = higher priority. RM is optimal among fixed-priority algorithms for periodic tasks with deadlines equal to their periods. The schedulability bound is:
$$\sum_{i=1}^{n} \frac{C_i}{T_i} \leq n(2^{1/n} - 1)$$
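As $n \to \infty$, this converges to $\ln 2 \approx 0.693$. For three tasks, the bound is $3(2^{1/3} - 1) \approx 0.780$: if total utilization is at most 78%, the system is provably schedulable under RM. The test is sufficient but not necessary; a task set above the bound may still be RM-schedulable, and only an exact analysis of each task's worst-case response time can decide.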
Example:
| Task | WCET ($C_i$) | Period ($T_i$) | Utilization |
|---|---|---|---|
| T1 | 2ms | 10ms | 0.20 |
| T2 | 3ms | 15ms | 0.20 |
| T3 | 2ms | 8ms | 0.25 |
| Total | | | 0.65 |
Total utilization 0.65 ≤ RM bound 0.780 → schedulable. Every deadline is guaranteed to be met.
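The arithmetic above can be checked mechanically. The C sketch below computes total utilization and the Liu & Layland bound for the example table; under RM, T3 (8ms period) gets the highest priority, then T1 (10ms), then T2 (15ms). The `rt_task` type and the function names are illustrative, not from any library, and $2^{1/n}$ is computed by bisection only so the sketch does not need to link libm.

```c
#include <stdbool.h>
#include <stddef.h>

// Illustrative task description: worst-case execution time C_i and period T_i.
typedef struct { double wcet_ms; double period_ms; } rt_task;

// The example table above: T1, T2, T3.
static const rt_task example_tasks[3] = { {2, 10}, {3, 15}, {2, 8} };

// Total utilization U = sum of C_i / T_i.
double total_utilization(const rt_task *tasks, size_t n) {
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += tasks[i].wcet_ms / tasks[i].period_ms;
    return u;
}

// 2^(1/n) by bisection on x^n = 2 (avoids linking libm in this sketch).
static double nth_root_of_two(size_t n) {
    double lo = 1.0, hi = 2.0;
    for (int iter = 0; iter < 100; iter++) {
        double mid = (lo + hi) / 2.0, p = 1.0;
        for (size_t k = 0; k < n; k++)
            p *= mid;
        if (p < 2.0) lo = mid; else hi = mid;
    }
    return (lo + hi) / 2.0;
}

// Liu & Layland bound for rate-monotonic scheduling: n(2^(1/n) - 1).
double rm_bound(size_t n) {
    return (double)n * (nth_root_of_two(n) - 1.0);
}

// Sufficient (not necessary) test: true means provably schedulable under RM;
// false means inconclusive, not unschedulable.
bool rm_utilization_test(const rt_task *tasks, size_t n) {
    return total_utilization(tasks, n) <= rm_bound(n);
}
```

For the table, `total_utilization` gives 0.65 and `rm_bound(3)` about 0.7798, so `rm_utilization_test(example_tasks, 3)` returns true.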
RM’s fixed priorities have an important practical advantage: behavior under overload is predictable. When the system is overloaded, lower-priority (longer-period) tasks miss deadlines first. The most critical tasks (shortest period, highest priority) are protected.
Earliest Deadline First (EDF): dynamic priority - whichever task has the earliest absolute deadline runs next. EDF is optimal for preemptive uniprocessor scheduling: if any feasible schedule exists, EDF will find one. For deadlines equal to periods, the schedulability condition is both necessary and sufficient:
$$\sum_{i=1}^{n} \frac{C_i}{T_i} \leq 1$$
EDF can utilize the CPU to 100% while guaranteeing all deadlines, whereas the RM utilization bound only guarantees schedulability up to about 69% in the worst case (many task sets, including those whose periods are harmonic multiples of one another, remain RM-schedulable well above it). The trade-off: EDF requires computing absolute deadlines at each release (cheap), but its behavior under overload is less predictable than RM - when the system is overloaded, EDF may miss deadlines in a seemingly unpredictable order. RM's graceful degradation under overload is why it is preferred in safety-critical systems even though it is theoretically less efficient.
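To see the gap between the two policies concretely, the following C sketch simulates one preemptive CPU in 1ms steps under either EDF or RM. It is a teaching simulator, not how a kernel schedules; `sim_task`, `simulate`, and `MAX_TASKS` are illustrative names, and it assumes integer WCETs and periods with deadlines equal to periods.

```c
#include <stdbool.h>

#define MAX_TASKS 8

// Illustrative task description: integer WCET and period in ms, with
// implicit deadlines (each job must finish before the task's next release).
typedef struct { int wcet; int period; } sim_task;

// Simulate one preemptive CPU in 1ms steps for `horizon` ms (use the
// hyperperiod, the LCM of the periods). Returns true if no job misses.
//   use_edf = true  -> EDF: run the ready job with the earliest absolute deadline
//   use_edf = false -> RM:  run the ready job whose task has the shortest period
bool simulate(const sim_task *tasks, int n, int horizon, bool use_edf) {
    int remaining[MAX_TASKS] = {0};  // work left in each task's current job
    int deadline[MAX_TASKS] = {0};   // absolute deadline of that job
    if (n > MAX_TASKS)
        return false;
    for (int t = 0; t < horizon; t++) {
        for (int i = 0; i < n; i++) {
            if (t % tasks[i].period == 0) {          // release a new job
                if (remaining[i] > 0)
                    return false;                    // previous job missed its deadline
                remaining[i] = tasks[i].wcet;
                deadline[i] = t + tasks[i].period;
            }
        }
        int pick = -1, best = 0;
        for (int i = 0; i < n; i++) {
            if (remaining[i] == 0)
                continue;
            // Smaller key = higher priority: absolute deadline (EDF) or period (RM)
            int key = use_edf ? deadline[i] : tasks[i].period;
            if (pick < 0 || key < best) {
                pick = i;
                best = key;
            }
        }
        if (pick >= 0)
            remaining[pick]--;                       // run the chosen job for 1ms
    }
    for (int i = 0; i < n; i++)                      // the last jobs are due at `horizon`
        if (remaining[i] > 0)
            return false;
    return true;
}
```

For example, with tasks (C, T) = (2, 5) and (4, 7), utilization is about 0.97. Over the 35ms hyperperiod, EDF meets every deadline, while RM misses the second task's first deadline at t = 7ms.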
Linux’s SCHED_DEADLINE policy implements EDF with explicit parameters:
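```c
#include <linux/sched.h>        // SCHED_DEADLINE
#include <linux/sched/types.h>  // struct sched_attr
#include <stdio.h>              // perror
#include <sys/syscall.h>        // SYS_sched_setattr
#include <unistd.h>             // syscall

struct sched_attr attr = {
    .size           = sizeof(attr),
    .sched_policy   = SCHED_DEADLINE,
    .sched_runtime  = 2000000,   // 2ms runtime budget (WCET), in nanoseconds
    .sched_deadline = 10000000,  // 10ms relative deadline
    .sched_period   = 10000000,  // 10ms period
};
// Kernel performs admission control: the call fails with EBUSY if not schedulable
if (syscall(SYS_sched_setattr, 0, &attr, 0) == -1)
    perror("sched_setattr");
```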
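The kernel performs admission control: it checks that the total bandwidth reserved by deadline tasks (the sum of sched_runtime / sched_period) stays within a configurable limit, by default 95% of the available CPU capacity, and rejects the sched_setattr call with EBUSY if it would not. At run time, a task that exhausts its runtime is throttled until its next period. On a single CPU this is the EDF utilization test above; on multiprocessor systems scheduled with global EDF, the same test guarantees bounded tardiness rather than every deadline. Either way it is an enforced reservation, not just a scheduling hint.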
Priority Inversion: The Mars Pathfinder Bug
Priority inversion is the classic failure mode that proved theoretical concerns are operational realities.
In 1997, the Mars Pathfinder lander began experiencing periodic system resets shortly after landing. The culprit was a priority inversion in the VxWorks RTOS:
- A low-priority task (meteorological data gathering) held a mutex protecting the shared information bus.
- A high-priority task (the bus scheduler) needed the same mutex - it blocked.
- A medium-priority task (communications) did not need the mutex - it preempted the low-priority task and ran.
- The high-priority bus scheduler was now effectively blocked by the medium-priority communications task, despite having higher priority.
- The watchdog timer, which expected the bus scheduler to run regularly, detected the starvation and triggered a reset.
Engineers at JPL diagnosed the problem from Earth, reproduced it in their test environment, and deployed a fix: enable priority inheritance on the affected mutex via a VxWorks configuration flag. The fix was uploaded to the spacecraft over a radio link, and the resets stopped.
Priority inversion can happen on any system with priorities and shared resources. The formal mitigations:
Priority inheritance protocol: when a low-priority task holds a mutex that a higher-priority task is waiting for, the low-priority task temporarily inherits the higher task’s priority. This prevents medium-priority tasks from preempting the low-priority task while it holds the blocking mutex.
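```c
#include <pthread.h>
pthread_mutex_t shared_mutex;  // shared between threads of different priorities
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
pthread_mutex_init(&shared_mutex, &attr);
pthread_mutexattr_destroy(&attr);  // safe once the mutex is initialized
// Now: if a high-priority thread blocks on this mutex,
// the holder temporarily runs at the high-priority thread's priority
```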
Priority ceiling protocol: each mutex is assigned a priority ceiling equal to the highest priority of any task that may acquire it. A task can lock a mutex only if its priority is higher than the ceilings of all mutexes currently locked by other tasks. This bounds a high-priority task's blocking to at most one lower-priority critical section, and it also prevents deadlock.
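Priority inheritance is available in POSIX pthreads (on Linux, PTHREAD_PRIO_INHERIT mutexes are backed by priority-inheriting futexes in the mainline kernel), in VxWorks, and, with PREEMPT_RT, for the Linux kernel's own internal locks. It should be enabled on any mutex that may be shared between tasks of different priorities in a real-time context.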
Eliminating Latency Sources
In practice, real-time systems engineering is the systematic elimination of non-determinism. Every source of variable latency must be accounted for and either eliminated or bounded.
Page faults: if memory pages are swapped to disk and a real-time task accesses them, the kernel must read them back - potentially taking milliseconds. Fix: lock all memory into RAM at startup.
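```c
#include <stdio.h>
#include <sys/mman.h>
if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
    perror("mlockall");  // typically needs CAP_IPC_LOCK or a large RLIMIT_MEMLOCK
// All current and future pages stay in RAM: no swap-in faults. Pages still fault
// in on first touch, so touch stacks and pools at startup, before the RT loop.
```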
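mlockall keeps pages from being swapped out, but a page is still faulted in the first time it is touched, and stack pages in particular are touched lazily. A common companion step is to touch the stack once at startup, from the real-time thread itself, after calling mlockall. A sketch under one stated assumption: the real-time thread never needs more than 256KB of stack (`prefault_stack` and `PREFAULT_STACK_BYTES` are illustrative names).

```c
#include <stddef.h>
#include <unistd.h>

// Assumption: the real-time thread never needs more than 256KB of stack.
#define PREFAULT_STACK_BYTES (256 * 1024)

// Call once from the real-time thread, after mlockall(MCL_CURRENT | MCL_FUTURE).
// Writing one byte per page forces the kernel to map (and, with MCL_FUTURE,
// lock) every page of this stack region; the pages stay mapped after return,
// so later deep call chains reuse them without first-touch faults.
size_t prefault_stack(void) {
    volatile unsigned char buf[PREFAULT_STACK_BYTES];
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096;  // fallback assumption if the page size is unavailable
    for (size_t i = 0; i < sizeof buf; i += (size_t)page)
        buf[i] = 0;
    return sizeof buf;  // bytes of stack prefaulted
}
```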
Dynamic memory allocation: malloc() and free() have unpredictable execution time. Under fragmentation, malloc() may take hundreds of microseconds or call brk() (a syscall). Under contention in a multi-threaded allocator, it may block. Fix: pre-allocate all memory at startup using a pool allocator. The hot path never calls malloc().
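```c
#include <stddef.h>
#include <stdint.h>
#define POOL_SIZE (1 << 20)  // sized for the worst case at design time

// Pre-allocated pool at startup (16-byte aligned, like every block handed out)
static _Alignas(16) uint8_t pool[POOL_SIZE];
static size_t pool_offset = 0;

void *rt_alloc(size_t size) {
    // O(1), deterministic, no syscalls, no locks (one pool per thread)
    if (size > POOL_SIZE - pool_offset)
        return NULL;  // exhausted: a sizing bug, never a fallback to malloc()
    void *p = pool + pool_offset;
    pool_offset += (size + 15) & ~(size_t)15;  // round up to 16; POOL_SIZE is a multiple of 16
    return p;
}

// No free() - reset the entire pool between periods
void rt_reset(void) { pool_offset = 0; }
```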
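The bump allocator above frees nothing until the whole pool is reset. When objects have independent lifetimes, a fixed-size block pool with a free list keeps both allocation and deallocation O(1). A sketch, assuming every hot-path object fits in 64 bytes and at most 1024 are live at once; `BLOCK_SIZE`, `BLOCK_COUNT`, and the `pool_*` names are illustrative.

```c
#include <stddef.h>

#define BLOCK_SIZE  64    // assumption: largest object the hot path allocates
#define BLOCK_COUNT 1024  // assumption: worst-case number of live objects

// A free block stores the link to the next free block in its own bytes.
typedef union block {
    union block *next;
    _Alignas(16) unsigned char data[BLOCK_SIZE];
} block_t;

static block_t blocks[BLOCK_COUNT];  // carved out statically, never from malloc()
static block_t *free_list = NULL;

// Call once at startup, before the real-time loop: thread every block
// onto the free list.
void pool_init(void) {
    free_list = NULL;
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        blocks[i].next = free_list;
        free_list = &blocks[i];
    }
}

// O(1): pop the head of the free list; NULL when the pool is exhausted.
void *pool_alloc(void) {
    block_t *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

// O(1): push the block back; p must have come from pool_alloc().
void pool_free(void *p) {
    block_t *b = p;
    b->next = free_list;
    free_list = b;
}
```

Like the allocator above, this sketch is not thread-safe; the usual arrangement is one pool per real-time thread.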
Garbage collection: GC languages (Java, Go, Python) pause application threads during at least part of each collection. GC pauses range from microseconds (Go's concurrent GC) to hundreds of milliseconds (stop-the-world phases in some Java collectors). This is fundamentally incompatible with hard real-time. Use C, C++, Rust, or Ada in the hard real-time path.
Scheduling jitter: the OS may preempt your thread to handle interrupts, run the scheduler tick, or process RCU callbacks. On a standard kernel with no isolation, jitter can reach 1ms or more. Fix: run the hot-path thread under a real-time scheduling policy such as SCHED_FIFO:
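```c
#include <sched.h>
// SCHED_FIFO: preempts all SCHED_OTHER threads (needs CAP_SYS_NICE or RLIMIT_RTPRIO)
struct sched_param param = { .sched_priority = 80 };
sched_setscheduler(0, SCHED_FIFO, &param);  // returns -1 (e.g. EPERM) on failure
```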
Combined with the isolcpus, nohz_full, and rcu_nocbs boot parameters (see CPU Affinity & NUMA - When Cores Care Where Memory Lives) and a PREEMPT_RT kernel, scheduling jitter on isolated cores can drop to single-digit or low double-digit microseconds.
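sched_setscheduler changes a thread that already exists. To start a dedicated real-time thread under SCHED_FIFO from its first instruction, the scheduling attributes can be set on a `pthread_attr_t` instead. A sketch; `create_fifo_thread` and `rt_loop` are illustrative names, and priority 80 simply mirrors the example above.

```c
#include <errno.h>  // EPERM
#include <pthread.h>
#include <sched.h>

// Illustrative real-time thread body: the periodic work would go here.
static void *rt_loop(void *arg) {
    return arg;
}

// Create a thread that runs under SCHED_FIFO at `priority` from the start.
// Without PTHREAD_EXPLICIT_SCHED, pthread_create ignores the policy and
// priority in attr and the new thread inherits the creator's scheduling.
// Returns 0 on success or an errno value, typically EPERM when the caller
// lacks CAP_SYS_NICE and RLIMIT_RTPRIO is below `priority`.
int create_fifo_thread(pthread_t *tid, int priority,
                       void *(*fn)(void *), void *arg) {
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = priority };
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    if (rc == 0)
        rc = pthread_attr_setschedparam(&attr, &sp);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Forgetting `PTHREAD_EXPLICIT_SCHED` is a common reason a "real-time" thread silently runs as SCHED_OTHER.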
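Interrupts: hardware interrupts preempt any thread, including SCHED_FIFO threads, to run the interrupt handler. Fix: redirect IRQs away from real-time cores using /proc/irq/N/smp_affinity, and stop irqbalance (or configure it to avoid those cores) so it does not move them back. Network and storage interrupts go to non-real-time cores, leaving the real-time cores free of device interrupts; per-CPU timer and inter-processor interrupts cannot be routed away this way.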
Linux Real-Time Configuration in Practice
A typical setup for a Linux soft-real-time system targeting sub-100μs worst-case latency:
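```sh
# Kernel boot parameters (in /etc/default/grub); regenerate the GRUB config and reboot
GRUB_CMDLINE_LINUX="isolcpus=2-5 nohz_full=2-5 rcu_nocbs=2-5 \
intel_pstate=disable processor.max_cstate=1 idle=poll"

# After boot, as root: route movable IRQs to cores 0-1 (mask 0x3), away from cores 2-5
systemctl stop irqbalance 2>/dev/null || true   # otherwise it may move IRQs back
echo 3 > /proc/irq/default_smp_affinity         # IRQs registered later
for irq in /proc/irq/[0-9]*; do
    echo 3 > "$irq/smp_affinity" 2>/dev/null || true   # some IRQs cannot be moved
done
```

```c
// At application startup (needs _GNU_SOURCE, <sys/mman.h>, <sched.h>, <pthread.h>)
mlockall(MCL_CURRENT | MCL_FUTURE);                  // no swap-in page faults
struct sched_param sp = { .sched_priority = 80 };
sched_setscheduler(0, SCHED_FIFO, &sp);              // RT scheduling
cpu_set_t cpuset;
CPU_ZERO(&cpuset);                                   // clear the set before adding CPUs
CPU_SET(3, &cpuset);
pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);  // pin to isolated core 3
```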
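Before measuring, it is worth confirming that the machine actually booted the intended configuration. The C sketch below checks whether the running kernel reports PREEMPT_RT in its uname version string, as RT kernels do, and reads the isolated-CPU list the kernel exposes in /sys/devices/system/cpu/isolated. The function names are illustrative.

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

// Returns 1 if the running kernel reports PREEMPT_RT in its version string
// (e.g. "#1 SMP PREEMPT_RT ..."), 0 if not, -1 if uname() fails.
int kernel_is_preempt_rt(void) {
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return strstr(u.version, "PREEMPT_RT") != NULL;
}

// Copies the kernel's list of isolated CPUs (e.g. "2-5", or "" if none) into
// buf, which must have len > 0. Returns 0 on success, -1 if the file is unreadable.
int read_isolated_cpus(char *buf, size_t len) {
    FILE *f = fopen("/sys/devices/system/cpu/isolated", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL)
        buf[0] = '\0';                      // empty file: no CPUs isolated
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';         // strip the trailing newline
    return 0;
}
```

On the configuration above, one would expect `kernel_is_preempt_rt()` to return 1 and the isolated list to read `2-5`.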
Measuring actual worst-case latency with cyclictest:
```sh
# cyclictest ships in the rt-tests package
cyclictest --mlockall --smp --priority=99 --interval=200 \
    --loops=1000000 --histogram=400
# Illustrative max latencies (hardware-dependent):
# Standard kernel:            500μs - 5ms
# PREEMPT_RT kernel:          50 - 200μs
#   + isolcpus + nohz_full:   10 - 50μs
#   + interrupt routing:      5 - 20μs
```
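cyclictest measures how accurately a thread can wake up at a specified interval. The maximum deviation observed over one million iterations (200 seconds at a 200μs interval) is an empirical worst case: a lower bound on the true worst case rather than a proof, so it should be measured under realistic system load.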
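The core of what cyclictest measures can be sketched in a few lines of C: sleep until an absolute wake-up time with `clock_nanosleep(TIMER_ABSTIME)`, then record how late the thread actually woke. This is a simplified illustration, not cyclictest's implementation; the real tool adds SCHED_FIFO priority, memory locking, per-CPU threads, and histograms. `measure_max_latency_ns` is an illustrative name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

// Convert a timespec to nanoseconds.
static int64_t ts_to_ns(const struct timespec *t) {
    return (int64_t)t->tv_sec * 1000000000LL + t->tv_nsec;
}

// Wake up every `interval_ns` nanoseconds for `loops` iterations and return the
// largest observed wake-up latency (actual wake time minus intended wake time).
int64_t measure_max_latency_ns(long interval_ns, long loops) {
    struct timespec next, now;
    int64_t max_latency = 0;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (long i = 0; i < loops; i++) {
        // Advance the absolute wake-up time by one period (no cumulative drift).
        next.tv_nsec += interval_ns;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        // Sleep until `next`; restart if interrupted by a signal.
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t latency = ts_to_ns(&now) - ts_to_ns(&next);
        if (latency > max_latency)
            max_latency = latency;
    }
    return max_latency;
}
```

Sleeping until absolute deadlines rather than for relative durations keeps per-iteration errors from accumulating into drift, which is also the standard shape of a periodic real-time loop.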
The Financial Trading Case: Microsecond Determinism
High-frequency trading (HFT) systems represent an interesting case: they are not hard real-time (a missed deadline does not crash anything), but they operate under soft real-time constraints so stringent that the engineering resembles hard real-time in practice.
A co-located order execution system must receive a market data tick, compute a trading signal, and submit an order, with the fastest firms targeting under 1 microsecond from tick to trade. At this timescale, the kernel is too slow, the NIC interrupt is too slow, and even cache misses are carefully budgeted.
The full stack: DPDK for kernel-bypass packet reception, a kernel-bypass network stack for order submission (exchange order-entry sessions typically run over TCP), isolated cores with all interrupts routed away, SCHED_FIFO at the highest priority, PREEMPT_RT kernel, mlockall, pool allocators, spin-polling the NIC. Some firms use FPGA-based NICs that process market data and generate orders entirely in hardware; the host CPU stays out of that hot path and only computes and updates the strategy's parameters.
This is not typical software engineering. The engineering discipline is closer to embedded real-time control than to application development. It is the extreme end of the soft-real-time spectrum, where the cost of a missed deadline is measured in dollars per microsecond of additional latency.
Real-Time and Cloud: An Uneasy Relationship
Cloud infrastructure and real-time guarantees are mostly incompatible at the hard real-time end of the spectrum.
Virtual machines introduce a layer of scheduling below the OS: the hypervisor schedules VMs just as the OS schedules threads. A vCPU can be preempted by the hypervisor at any time to service another VM or perform hypervisor maintenance. Even with SR-IOV and DPDK-enabled VMs, the hypervisor’s scheduling introduces jitter that cannot be eliminated from within the guest.
AWS’s Nitro hypervisor reduces this overhead substantially - Nitro VMs have near-bare-metal performance for I/O and much lower hypervisor jitter than traditional KVM or Xen. Dedicated bare-metal instances (i3.metal, c5.metal) eliminate the hypervisor entirely. HFT firms running on AWS typically use bare-metal instances with custom tuning rather than virtualized instances.
AWS IoT Greengrass with FreeRTOS demonstrates the cloud-embedded integration point: FreeRTOS runs on microcontrollers (hard real-time on the embedded side) and communicates with Greengrass Lambda functions (cloud-side processing, soft or non-real-time). The pattern is: hard real-time constraints are handled at the edge on dedicated hardware; non-deterministic processing happens in the cloud. The two meet at a message queue that absorbs the timing mismatch.
The Future: Rust in Safety-Critical Embedded
The embedded real-time space is conservatively C and C++ - certified toolchains, established MISRA guidelines, decades of deployed code. But Rust is making inroads, particularly for new projects where the safety guarantees matter.
Embassy is an async runtime for embedded Rust targeting bare-metal microcontrollers (ARM Cortex-M, RISC-V). Embassy's async/await compiles to state machines with no heap allocation - each task's future is a state machine whose size is known at compile time and which Embassy places in statically allocated memory, so it can be driven to completion without dynamic dispatch or allocator involvement. This is compatible with hard real-time: the memory layout is fully determined at compile time, there is no GC, and the executor can be integrated with hardware timer interrupts for precise scheduling. Tasks within one executor are cooperative, so time-critical work is typically given its own executor running at a higher interrupt priority.
Zephyr RTOS, hosted by the Linux Foundation, is an open-source alternative to FreeRTOS with a broader hardware support matrix and a more active community. Zephyr is gaining adoption in IoT, automotive, and industrial control as an alternative to proprietary RTOS vendors. It supports Rust bindings as of recent versions.
The long-term trajectory: Rust’s compile-time memory safety guarantees are particularly valuable in safety-critical embedded contexts where a memory bug can have physical consequences. As certified Rust toolchains (qualification for IEC 61508, ISO 26262, DO-178C) mature, expect to see Rust appear in new embedded real-time projects while C/C++ remains entrenched in deployed systems.
The Honest Assessment
RTOS complexity is often overkill. Most “low latency” requirements are soft real-time at best, and the engineering discipline they need is: io_uring instead of blocking sockets, SCHED_FIFO for the hot-path thread, a pool allocator, mlockall, and careful interrupt routing. That stack, on a PREEMPT_RT kernel with isolated cores, achieves sub-millisecond worst-case latency for most application requirements without full RTOS complexity.
True hard real-time - guaranteed worst-case with mathematical proof of schedulability - is appropriate for safety-critical embedded systems where a missed deadline has physical consequences. If your deadline is “we lose money if we’re 100μs late” rather than “someone dies if we’re 15ms late,” you are in the soft real-time space and can use Linux with careful tuning rather than a dedicated RTOS.
Know which problem you are actually solving.
Summary
| Concept | What It Is | Why It Matters |
|---|---|---|
| Hard real-time | Missing a deadline is a system failure | Pacemakers, flight control, industrial robots |
| Soft real-time | Missing a deadline degrades quality | Video streaming, trading systems, games |
| Rate-Monotonic Scheduling | Static priority by period; proven schedulability bound | Foundational algorithm; predicts degradation under overload |
| Earliest Deadline First | Dynamic priority; optimal uniprocessor algorithm | Linux SCHED_DEADLINE; 100% CPU utilization possible |
| Priority inversion | Low-priority task indirectly blocks high-priority task | Mars Pathfinder bug; fixed by priority inheritance |
| Priority inheritance | Holder inherits waiter’s priority temporarily | POSIX PTHREAD_PRIO_INHERIT; prevents inversion |
| mlockall | Locks all memory pages into RAM | Eliminates page fault latency spikes |
| Pool allocator | Pre-allocated memory; O(1) deterministic alloc | Eliminates malloc latency from the hot path |
| PREEMPT_RT | Fully preemptible Linux kernel | Brings Linux to 50 - 100μs worst-case; mainline since 6.12 |
| cyclictest | Measures actual scheduling latency over many iterations | Ground truth for RT tuning verification |
| FreeRTOS / Zephyr | Lightweight RTOS for microcontrollers | IoT, embedded, hard real-time at the edge |