Every systems programmer eventually writes a bounded queue. And almost every systems programmer eventually gets one wrong.
This page is about why I wrote yet another bounded work queue using
pthread_mutex_t and pthread_cond_t — even though
this problem has been “solved” for decades.
On paper, bounded queues are simple: a fixed-size buffer, producers, consumers, and some blocking when things fill up or run dry.
In practice, they are a graveyard of subtle bugs: missed wakeups, incorrect signaling, spurious wakeups, broken invariants under load, and code that works perfectly… until it doesn’t.
I’ve seen these bugs show up in thread pools, logging systems, job schedulers, and “temporary” utilities that somehow made it into production.
This implementation started as a personal reset: what does the simplest correct solution actually look like, if we care more about reasoning than raw speed?
The goal here was intentionally unambitious:
If this queue ever misbehaves, it should be easy to explain why. If it’s slow, at least it’s slow in a predictable way.
Rather than starting with code, I started with a short list of invariants. Everything else exists only to preserve these under concurrency:
- 0 <= size <= capacity
- head and tail stay within bounds

If any of these are violated, the queue is broken — no amount of performance can save it.
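As a minimal sketch of what those invariants look like in code (the struct and function names here are my own illustration, not the post's actual definitions), the queue state can be checked with a small predicate:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout for illustration; the post does not show its struct. */
typedef struct {
    size_t capacity;  /* fixed buffer size, > 0 */
    size_t size;      /* number of items currently stored */
    size_t head;      /* next slot to dequeue from */
    size_t tail;      /* next slot to enqueue into */
} queue_state;

/* Returns 1 if the core invariants hold, 0 otherwise.
   Every public operation should preserve this predicate. */
int queue_invariants_hold(const queue_state *q)
{
    return q->size <= q->capacity   /* 0 <= size is implied by size_t */
        && q->head < q->capacity    /* head stays within bounds */
        && q->tail < q->capacity;   /* tail stays within bounds */
}
```

A debug build can call this under assert() on entry and exit of every operation, which turns a corrupted queue into an immediate, explainable crash rather than a silent misbehavior.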
The entire queue state is protected by a single mutex. This is a deliberate choice.
Fine-grained locking can reduce contention, but it also multiplies the number of states your brain has to simulate. For this exercise, I wanted one lock and one place where the truth lives.
Two condition variables complete the picture:
- not_full — producers sleep when the buffer is full
- not_empty — consumers sleep when the buffer is empty
All waits use while, never if.
This isn’t defensive paranoia — it’s required to handle spurious wakeups
and racing producers or consumers.
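Putting the single mutex, the two condition variables, and the while-based waits together, the core operations look roughly like this (a self-contained sketch with my own names — bq_*, a fixed CAP — rather than the post's actual code):

```c
#include <pthread.h>
#include <stddef.h>

#define CAP 8  /* illustrative fixed capacity */

typedef struct {
    void *buf[CAP];
    size_t size, head, tail;       /* invariants: size <= CAP, head/tail < CAP */
    pthread_mutex_t lock;          /* the single mutex guarding all state */
    pthread_cond_t  not_full;      /* producers wait here when size == CAP */
    pthread_cond_t  not_empty;     /* consumers wait here when size == 0 */
} bqueue;

void bq_init(bqueue *q)
{
    q->size = q->head = q->tail = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

void bq_enqueue(bqueue *q, void *item)
{
    pthread_mutex_lock(&q->lock);
    /* while, not if: a wakeup only hints that the state *may* have changed;
       spurious wakeups and racing producers force a recheck. */
    while (q->size == CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->buf[q->tail] = item;
    q->tail = (q->tail + 1) % CAP;
    q->size++;
    pthread_cond_signal(&q->not_empty);  /* wake one sleeping consumer */
    pthread_mutex_unlock(&q->lock);
}

void *bq_dequeue(bqueue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->size == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    void *item = q->buf[q->head];
    q->head = (q->head + 1) % CAP;
    q->size--;
    pthread_cond_signal(&q->not_full);   /* wake one sleeping producer */
    pthread_mutex_unlock(&q->lock);
    return item;
}
```

Note that every condition check and every state mutation happens with the lock held; the condition variables never carry state themselves, they only say "go look again".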
Correctness isn’t just about mutual exclusion. It’s also about memory visibility.
The mutex and condition variable operations establish a happens-before relationship between producers and consumers. Anything a producer writes before enqueue is guaranteed to be visible to a consumer after dequeue.
This guarantee is easy to take for granted — and easy to break if synchronization is misused.
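To make the visibility point concrete, here is a stripped-down single-slot handoff (names and layout are my own, for illustration): the payload field is a plain, non-atomic variable, yet the consumer is guaranteed to see the producer's write, because the mutex release in the producer happens-before the mutex acquire in the consumer.

```c
#include <pthread.h>

/* Illustrative one-item handoff, not the post's queue. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int             has_item;
    long            payload;  /* plain field, no atomics needed */
} handoff;

void handoff_put(handoff *h, long value)
{
    pthread_mutex_lock(&h->lock);
    h->payload = value;              /* ordinary write... */
    h->has_item = 1;
    pthread_cond_signal(&h->ready);
    pthread_mutex_unlock(&h->lock);  /* ...published by the release */
}

long handoff_get(handoff *h)
{
    pthread_mutex_lock(&h->lock);    /* acquire pairs with producer's release */
    while (!h->has_item)
        pthread_cond_wait(&h->ready, &h->lock);
    long v = h->payload;             /* guaranteed to see the producer's write */
    h->has_item = 0;
    pthread_mutex_unlock(&h->lock);
    return v;
}
```

If, say, the flag were checked outside the lock as a "fast path", this guarantee would evaporate — which is exactly the kind of misuse the paragraph above warns about.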
This design deliberately avoids a few common “optimizations”:
- if-based waits

The result is not the fastest queue you can build. It is, however, a queue whose behavior you can predict when things get ugly.
I don’t believe in “it looks correct”. This implementation was exercised with:
None of this proves correctness. It simply increases confidence that the invariants survive contact with reality.
This queue is a baseline. Something to measure more complex designs against.
It is not lock-free. It is not wait-free. It is not meant to win benchmarks.
But if you’re building something faster, you should be able to explain exactly which guarantees you’re trading away — and why.