Implementing Mutexes in the QNX Neutrino Realtime OS
By special contributor Elad Lahav on 2016-02-18 19:27:26
A mutex is a common type of lock used to serialize concurrent access by multiple threads to shared resources. While support for POSIX mutexes in the QNX Neutrino Realtime OS dates back to the early days of the system, this area of the code has seen considerable changes in the last couple of years.
This article is written by Elad Lahav, Software Developer, QNX Software Systems Limited.
Multi-threaded programming allows software developers to break up a process into multiple, concurrent streams of execution. There are several good (and various bad) reasons for using multiple threads within a program, such as making use of multiple processors or overlapping computation with blocking I/O.
Almost every multi-threaded program must deal with the problem of sharing data among the threads. Access to data by different threads needs to be serialized in order to preserve data integrity. Even the most trivial of data manipulations, such as incrementing a variable by one and reading its value, is prone to race conditions in the presence of multiple threads that can access that data. These race conditions can, and often will, lead to incorrect execution.
There are various techniques to ensure serialization, which is the act of governing access to data by multiple threads. Most modern processors provide atomic operations that allow for integrity-preserving manipulation of data, such as test-and-set and compare-and-exchange. These atomic operations, however, are limited to well-defined data types, typically those that can fit within a single register. Nevertheless, such operations are useful as building blocks for other serialization mechanisms, most notably for various lock implementations.
A spin-lock is the simplest lock, requiring only one shared bit. A thread that wishes to access a piece of shared data can check whether the bit is set, and, if not, acquire the lock by setting it. This is done in a loop that spins as long as the bit is set, i.e., as long as the lock is held. The operation of testing whether the bit is set and setting it if not itself needs to be atomic, in order to prevent race conditions among threads contending for the lock.
While spin-locks are indispensable in certain scenarios (mostly on SMP systems in contexts that cannot block, such as interrupt service routines), these locks are rarely used in (correct) multi-threaded programs. Since the OS scheduler is unaware that a spinning thread is waiting for a lock to be released, it may take time before that thread is pre-empted and the thread currently holding the lock is scheduled. On single-processor systems, if the spinning thread has a higher priority than the current lock owner, the former can end up spinning forever. Spinning is also bad for power consumption.
Most operating systems provide a locking primitive that allows a thread to block while waiting for the lock to become available. Such a lock is commonly referred to as a mutex. Blocking a thread requires that the OS scheduler be aware of the lock, avoid scheduling a thread while the lock is held by another thread, and re-schedule the thread once the lock can be acquired.

POSIX Mutexes in the QNX Neutrino OS
The QNX Realtime OS is a POSIX-certified operating system and, as such, provides a complete implementation of the POSIX Threads (pthread) API, including mutexes. The default mutex object in the QNX Neutrino OS (i.e., one that is statically initialized with PTHREAD_MUTEX_INITIALIZER) is non-recursive and uses the priority-inheritance protocol.
The implementation of mutexes is split between the C library and the QNX Neutrino micro-kernel.
C Library Implementation
The C library provides the POSIX-defined functions for handling mutexes (pthread_mutex_init(), pthread_mutex_lock(), etc.), as well as wrappers for the relevant kernel calls [1].
User code creates a mutex by declaring a variable of type pthread_mutex_t, and initializing it either statically, with PTHREAD_MUTEX_INITIALIZER, or dynamically, with a call to pthread_mutex_init().
A call to pthread_mutex_lock() will attempt to lock the mutex by performing an atomic compare-and-exchange operation on the variable's owner field, looking for a current value of 0 (no owner). If the operation succeeds, then the owner field is updated with the calling thread's system-unique ID, and the mutex is now considered locked. No kernel intervention is required in this case. The atomic operation will fail if the owner field has a value other than 0, which can happen in the following cases:
- the mutex is already owned by the calling thread;
- the mutex is owned by another thread;
- the mutex is unowned, but is marked as contested (a special bit in the owner field indicates that other threads are, or were, waiting on it).
The first case is handled by the C library code (recursive mutexes increment their lock count, non-recursive mutexes return an error). The other two are handled by invoking the SyncMutexLock() kernel call.
Unlocking the mutex with a call to pthread_mutex_unlock() is again handled by a compare-and-exchange operation, which attempts to replace the calling thread's ID with the value 0. The operation will fail if:
- the calling thread is not the current owner of the mutex;
- the mutex is marked as contested, i.e., other threads are waiting for it.
The first case returns an error, while the second invokes the SyncMutexUnlock() kernel call.
It can be seen from the description above that an uncontested mutex, i.e., a mutex that is locked and unlocked by the same thread without any other thread trying to acquire it at the same time, is handled completely in user mode, without any kernel intervention. The only overhead is that of a function call and an atomic operation. While not free (the atomic operation impacts the bus, and memory barriers are required after acquiring and before releasing the mutex), this operation is orders of magnitude cheaper than a call that requires a trap into the kernel. Nevertheless, the information stored in the user-mode object is not, on its own, sufficient to handle contention: whenever a thread needs to block on a mutex, or be woken up when it is released, the kernel must get involved.

Kernel Implementation
The two most important kernel calls dealing with mutexes are SyncMutexLock() and SyncMutexUnlock(). As described in the previous section, these are invoked when the C library is unable to deal with the mutex by itself. There are other calls to initialize a mutex to non-default attributes, assign a priority to a priority-ceiling mutex, associate an event with a locked mutex whose owner dies unexpectedly, and more.
Since an attempt to lock a mutex may lead to the calling thread blocking, the kernel needs to maintain a list of threads waiting on the mutex. The list is sorted first by priority and then on a first-come-first-served basis. When a mutex that is waited on by other threads is unlocked (as indicated by a special bit in the user-mode owner field), the kernel chooses the next thread to wake up and attempt to lock the mutex.
This design has two important implications:
- the kernel must maintain its own object for any mutex that is, or has been, contested, in order to keep track of the waiting threads;
- unlocking a contested mutex always involves a kernel call, so that the kernel can choose the next thread to wake up.
The kernel object is a structure that the kernel associates with the user-mode mutex object the first time kernel intervention becomes necessary. Other than holding the head of the waiters list, the kernel object records the information the kernel needs in order to manage the mutex, such as its protocol and associated priority.
The implementation of SyncMutexLock() goes through the following steps:
- locate, or create, the kernel object associated with the mutex;
- re-attempt the atomic lock operation, in case the mutex was released between the failed user-mode attempt and the kernel call;
- if the mutex is still owned, insert the calling thread into the waiters list in priority order;
- adjust the effective priority of the owner, if required by the mutex's protocol;
- block the calling thread until it is woken to attempt the lock again.
SyncMutexUnlock() looks up the kernel object associated with the mutex, picks the highest-priority waiter at the head of the list, and readies that thread so it can attempt to lock the mutex. It also re-evaluates the effective priority of the former owner, which may drop once the mutex is no longer held.

Priority Inversion
The correct assignment and handling of thread priorities is essential for achieving real-time behaviour in an operating system. Unfortunately, the use of mutexes in multi-threaded code can easily lead to the breakdown of an otherwise properly designed priority-based system: when a high-priority thread blocks on a mutex held by a lower-priority thread, it cannot make any forward progress until the low-priority activity reaches the point of releasing the mutex. Thus, the priority of the waiter is effectively lowered to that of the owner, a situation known as priority inversion.
POSIX mutexes provide three protocols for dealing with priority inversion:
- PTHREAD_PRIO_NONE: the mutex has no effect on the priority of its owner;
- PTHREAD_PRIO_INHERIT: the owner inherits the priority of the highest-priority waiter;
- PTHREAD_PRIO_PROTECT: the owner runs at the mutex's fixed ceiling priority while holding the mutex.
To facilitate both the design and the implementation of these protocols, it is convenient to associate a priority with each mutex: the priority of a priority-inheritance mutex is that of its highest-priority waiter, that of a priority-ceiling mutex is the fixed ceiling value, and that of a "none" mutex is 0. We can now define the mutex-induced priority of a thread as the maximum priority over all mutexes the thread currently holds. In the QNX Realtime OS, a thread's effective priority is the maximum of its base priority (the one set when the thread was created), its client-inherited priority (if it is receiving a message from another thread) and its mutex-induced priority.
Figure 1 depicts four mutexes of different protocols held by a single thread [2]. The mutexes are sorted by their priorities: M1 is a priority-inheritance mutex with a priority 20 waiter, M2 is a priority-ceiling mutex with a ceiling value of 11 (note that the priority 30 waiter has no impact on a priority-ceiling mutex), M3 is a priority-inheritance mutex with a priority 10 waiter, and M4 is a "none" mutex, and is thus associated with a priority of 0. Since the base priority of T1 is 10, and its mutex-induced priority is 20, the effective priority of this thread is 20.
The mutex-induced priority of T1 can change every time a thread blocks or unblocks on one of the mutexes it holds. If a priority 30 thread blocks on M3, the priority of that mutex becomes 30, requiring the effective priority of T1 to become 30. Conversely, if T2 stops waiting for M1 (e.g., because of a timeout on a pthread_mutex_timedlock() call, or because its process exited abnormally), then the highest-priority mutex held by T1 becomes M2, and its priority is lowered to 11. Furthermore, any change to the priority of T1 can have an impact on the priority of other threads: if T1 is currently waiting on another thread to reply to a message, that thread (the server) needs to inherit the newly adjusted priority of T1. If T1 is currently blocked on another priority-inheriting mutex, the priority of that mutex needs to be adjusted, with potential implications for the priority of its owner. The kernel is responsible for transitively adjusting all of these priorities.
While a basic mutex lock is fairly easy to implement in a kernel, performance considerations and support for a wide variety of features require a robust design and a careful implementation. In particular, since mutexes are the governors of concurrent access to shared resources, care must be taken to account for potential race conditions within the code.
For a real-time operating system, it is imperative that the kernel provide a solution for the priority inversion problem associated with mutexes. The solution needs to deal with threads holding multiple, potentially heterogeneous, mutexes, with threads of different priorities blocking on mutexes, and with threads leaving the wait queues unexpectedly. The concept of mutexes associated with priorities facilitates both the design and the implementation of the mutex code in the QNX Neutrino microkernel.
[1] The micro-kernel architecture distinguishes between system calls, many of which are implemented as messages passed to various services, and kernel calls, which are traps into the micro-kernel.