This note documents the design and reasoning behind a bounded work queue implemented using pthread_mutex_t and pthread_cond_t.
The goal was not to build the fastest queue possible, but to build one that is easy to reason about, correct under contention, and explicit about its guarantees and limitations.
Bounded queues are a solved problem in theory, but they remain a frequent source of subtle bugs in practice: missed wakeups, spurious wakeups, incorrect signaling, data races, and broken invariants under load.
This implementation was written as an exercise in making the invariants explicit and then verifying that the code enforces them under concurrency.
The implementation is structured around a small set of invariants that must hold at all times:
- 0 <= size <= capacity
- head and tail remain within bounds

All design decisions follow directly from preserving these invariants under concurrent access.
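These invariants can be captured directly in the queue's state declaration. A minimal sketch, where all field names and the bq_init helper are illustrative assumptions rather than the original code:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative sketch of the queue state; names are assumptions, not the
 * original implementation. The commented invariants must hold whenever
 * the mutex is not held by any thread. */
typedef struct {
    void          **items;    /* ring buffer with `capacity` slots            */
    size_t          capacity; /* fixed at creation, capacity > 0              */
    size_t          size;     /* invariant: 0 <= size <= capacity             */
    size_t          head;     /* next slot to read;  invariant: head < capacity */
    size_t          tail;     /* next slot to write; invariant: tail < capacity */
    pthread_mutex_t lock;     /* single mutex guarding every field above      */
    pthread_cond_t  not_full, not_empty;
} bounded_queue;

int bq_init(bounded_queue *q, size_t capacity) {
    if (capacity == 0)
        return -1;            /* a zero-capacity queue cannot satisfy the invariants */
    q->items = calloc(capacity, sizeof *q->items);
    if (!q->items)
        return -1;
    q->capacity = capacity;
    q->size = q->head = q->tail = 0;  /* an empty queue satisfies all invariants */
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    return 0;
}
```

Making the invariants part of the declaration keeps them visible at every call site that manipulates the state.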
A single mutex protects all shared queue state. This avoids the complexity of fine-grained locking and makes it possible to reason about state transitions atomically.
Two condition variables are used:
- not_full — producers wait while the queue is full
- not_empty — consumers wait while the queue is empty
All waits are performed in while loops rather than if statements. This handles spurious wakeups and ensures that the invariants are re-checked after every wakeup.
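The wait pattern described above can be sketched as follows; the type and function names (bq, bq_push, bq_pop) are assumptions for illustration, not the original API:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    void **items;                 /* ring buffer with `capacity` slots */
    size_t capacity, size, head, tail;
    pthread_mutex_t lock;         /* guards all fields above */
    pthread_cond_t not_full, not_empty;
} bq;

void bq_push(bq *q, void *item) {
    pthread_mutex_lock(&q->lock);
    /* while, not if: the predicate is re-checked after every wakeup,
     * so a spurious wakeup simply loops back into the wait. */
    while (q->size == q->capacity)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % q->capacity;
    q->size++;
    pthread_cond_signal(&q->not_empty);  /* wake one waiting consumer */
    pthread_mutex_unlock(&q->lock);
}

void *bq_pop(bq *q) {
    pthread_mutex_lock(&q->lock);
    while (q->size == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    void *item = q->items[q->head];
    q->head = (q->head + 1) % q->capacity;
    q->size--;
    pthread_cond_signal(&q->not_full);   /* wake one waiting producer */
    pthread_mutex_unlock(&q->lock);
    return item;
}
```

Note that pthread_cond_wait atomically releases the mutex while waiting and reacquires it before returning, so the predicate check and the state mutation always happen under the lock.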
Correctness is not just about mutual exclusion but also about visibility. The mutex and condition variable synchronization establish a happens-before relationship between producers and consumers.
Any memory written by a producer before enqueue is guaranteed to be visible to the consumer after dequeue.
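A minimal two-thread sketch of that guarantee, reduced to a one-slot queue so the hand-off is easy to see; all names here are illustrative assumptions:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct { int payload; } task;

static task *slot;        /* the one-slot "queue" */
static int slot_full;
static int observed;      /* what the consumer saw, checked after joining */
static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    task *t = malloc(sizeof *t);
    t->payload = 42;                  /* plain write, no atomics... */
    pthread_mutex_lock(&lock);
    while (slot_full)
        pthread_cond_wait(&not_full, &lock);
    slot = t;                         /* ...published under the mutex */
    slot_full = 1;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!slot_full)
        pthread_cond_wait(&not_empty, &lock);
    task *t = slot;
    slot_full = 0;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    /* The mutex hand-off orders this read after the producer's write,
     * so reading payload needs no additional synchronization. */
    observed = t->payload;
    free(t);
    return NULL;
}
```

The producer's write to payload happens before its unlock, and the consumer's lock happens before its read, which is exactly the happens-before chain the text describes.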
This design deliberately avoids several common pitfalls:

- if instead of while around waits

The result is not the lowest-latency queue, but one whose behavior remains predictable under load.
The implementation was validated using:
The goal of testing was not to prove correctness, but to increase confidence that the stated invariants hold in practice.
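As one illustration of this kind of validation (a sketch, not the original test suite), a small stress test can run several producers and consumers against a deliberately small queue and check that the total consumed matches the total produced; all counts and names below are assumptions:

```c
#include <assert.h>
#include <pthread.h>

#define CAPACITY  8       /* small on purpose, to force blocking */
#define PRODUCERS 4
#define CONSUMERS 2
#define ITEMS     1000    /* per producer */

static int buf[CAPACITY];
static int size, head, tail;
static long long total;   /* sum of everything consumed */
static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

static void push(int v) {
    pthread_mutex_lock(&lock);
    while (size == CAPACITY)
        pthread_cond_wait(&not_full, &lock);
    buf[tail] = v;
    tail = (tail + 1) % CAPACITY;
    size++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

static int pop(void) {
    pthread_mutex_lock(&lock);
    while (size == 0)
        pthread_cond_wait(&not_empty, &lock);
    int v = buf[head];
    head = (head + 1) % CAPACITY;
    size--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    return v;
}

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < ITEMS; i++)
        push(1);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    long long local = 0;
    /* Each consumer takes exactly its share, so pushes and pops balance
     * and every thread terminates. */
    for (int i = 0; i < PRODUCERS * ITEMS / CONSUMERS; i++)
        local += pop();
    pthread_mutex_lock(&lock);
    total += local;
    pthread_mutex_unlock(&lock);
    return NULL;
}
```

If any enqueue were lost or any wakeup missed, the run would either hang or finish with the wrong total.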
This queue is intended as a correct baseline and a reference point. It is not lock-free, not wait-free, and not optimized for extreme throughput.
More complex designs should be evaluated against a baseline like this one, with clearly stated tradeoffs.