Atomic Store Ordering: memory_order_release vs memory_order_seq_cst
Release Store

```cpp
static void publishIndices(std::atomic<std::uint32_t>& tail, int n) {
    for (int i = 0; i < n; ++i) {
        tail.store(static_cast<std::uint32_t>(i), std::memory_order_release);
        benchmark::DoNotOptimize(tail);
    }
}

publishIndices(tail_release, kIterations);
```

Seqcst Store
```cpp
static void publishIndices(std::atomic<std::uint32_t>& tail, int n) {
    for (int i = 0; i < n; ++i) {
        tail.store(static_cast<std::uint32_t>(i), std::memory_order_seq_cst);
        benchmark::DoNotOptimize(tail);
    }
}

publishIndices(tail_seqcst, kIterations);
```

Shared test data (shared-setup)

```cpp
constexpr int kIterations = 65536;
```

Which snippet is faster? (Assume x86-64.)
The Release Store snippet is faster. On x86-64 a release store compiles to a plain MOV, because the architecture's strong (TSO) memory model already forbids the reorderings that release semantics rule out. A seq_cst store additionally requires a full memory fence, typically an XCHG or a MOV followed by MFENCE, which drains the store buffer and stalls the pipeline. (On AArch64 both orderings compile to the same STLR instruction, so this particular gap is largely an x86 phenomenon; on 32-bit ARM a seq_cst store needs an extra DMB barrier.) This pattern appears in SPSC queues: the producer writes the payload, then publishes the index with a release store, not seq_cst.
Benchmark results
| Snippet | CPU time / iteration | Speedup |
|---|---|---|
| Release Store | 12.2 µs | 7.9× |
| Seqcst Store | 96.0 µs | 1.0× |
Explore the source

Open in Compiler Explorer