How Long Does pread() Really Take?

At some point, every performance investigation collapses into a deceptively simple question:
“How long does this actually take?”
This page documents one such question, stripped down as far as I know how to strip it:

How long does a single pread() system call take, and what role does the page cache play?
Not throughput. Not benchmarks. Not “real-world workloads”. Just one syscall, measured carefully.
The code for this experiment lives here:
https://github.com/Emmanuel326/syslat
Most performance discussions mix several effects into one number: syscall overhead, page faults, storage latency, filesystem behavior, CPU scheduling, and sometimes sheer luck.
Once mixed, cause and effect become hard to separate.
Rather than arguing about numbers, this experiment backs up and asks a deliberately narrow question under explicit conditions.
If we can’t reason about one syscall in isolation, we have no business reasoning about systems built on top of thousands of them.
Each data point is the elapsed time of one pread() call, measured from userspace entry to userspace return. Each iteration performs a single pread() call, nothing more.

There is no averaging, no warm-up phase, and no filtering. Every syscall stands on its own.
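The actual measurement program lives in the repository linked above. As an illustration of the structure only, here is a minimal Python stand-in: `os.pread` wraps the same syscall, but the interpreter adds its own overhead, so the absolute numbers are not comparable to the compiled tool's.

```python
# Sketch of the measurement loop: one elapsed time per pread(),
# no averaging, no warm-up, no filtering.  Python stand-in for the
# compiled tool; absolute numbers include interpreter overhead.
import os
import tempfile
import time

def time_one_pread(fd, length=4096, offset=0):
    """Return the elapsed nanoseconds of a single pread() call."""
    t0 = time.perf_counter_ns()      # userspace entry
    os.pread(fd, length, offset)     # the syscall under test
    t1 = time.perf_counter_ns()      # userspace return
    return t1 - t0

if __name__ == "__main__":
    # A scratch file stands in for "testfile" so the sketch is self-contained.
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\0" * 4096)
        f.flush()
        for _ in range(10):
            print(time_one_pread(f.fileno()))   # one raw number per line
```

Every iteration reads the same offset through the same descriptor; the only thing that varies between runs is the initial page cache state.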
The output is intentionally raw. Interpretation is a separate step.
If the experiment does not fit in one file, it is not yet simple enough to trust.
CPU pinning (via taskset) is treated as environmental control, not part of the program itself.
Page cache state is treated as an initial condition, not a runtime toggle.
Cold cache:

sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
This forces the page cache to start empty. Early iterations will include page faults and real I/O.
Warm cache:

cat testfile > /dev/null
This ensures the file is resident in the page cache before measurement.
The binary itself is unchanged between runs. Only the initial conditions differ.
Why pread()?
pread() avoids shared file offset state.
Each iteration reads the same offset and follows the same kernel path.
This removes lseek() noise and keeps the syscall behavior
as consistent as possible across iterations.
The goal is not realism, but repeatability.
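The offset-state point is easy to see directly. pread() takes an explicit offset and leaves the shared file offset untouched, so repeated iterations need no lseek() between them. A quick demonstration via Python's os wrappers (illustrative only, not part of the experiment):

```python
# pread() reads at an explicit offset and does not move the shared
# file offset, so repeated calls need no lseek() in between.
import os
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"abcdefgh")
    f.flush()
    fd = f.fileno()
    os.lseek(fd, 0, os.SEEK_SET)

    data = os.pread(fd, 4, 2)             # read 4 bytes at offset 2
    pos = os.lseek(fd, 0, os.SEEK_CUR)    # where is the file offset now?
    print(data, pos)                      # b'cdef' 0
```

A plain read() would have advanced the offset to 4; pread() leaves it at 0.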
Questions about throughput, other block sizes, offset patterns, or real-world workloads are all valid. They just require different experiments.
The program emits one number per line:
latency_ns
latency_ns
latency_ns
There are no summaries and no opinions in the output. The measurement path is kept as short and transparent as possible.
Any aggregation or visualization happens after the fact.
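For example, a hypothetical downstream script (the names here are illustrative, not part of the tool) can turn the raw one-number-per-line output into percentiles:

```python
# Hypothetical downstream step: percentiles from one-latency-per-line
# output.  The measurement program itself emits no summaries.
import statistics

def summarize(lines):
    """Return (p50, p99) in nanoseconds from one-latency-per-line text."""
    samples = sorted(int(line) for line in lines if line.strip())
    cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    return cuts[49], cuts[98]                    # 50th and 99th percentile

# Synthetic stand-in for real output:
p50, p99 = summarize(str(n) for n in range(1, 101))
print(f"p50={p50} p99={p99}")
```

Keeping this outside the measurement binary means the measurement path never grows to accommodate analysis.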
This page documents the baseline.
Future work will modify exactly one variable at a time — block size, offset patterns, storage media, CPU isolation — while preserving the existing structure.
As the experiment evolves, this page will grow with it.