Cache Coherency: Packed vs Padded Thread Counters
Packed Counters

```cpp
struct Counters {
  std::atomic<long long> a{0};
  std::atomic<long long> b{0};
};

auto& counter = isThreadA ? g_counters.a : g_counters.b;
counter.fetch_add(1, std::memory_order_relaxed);
```

Padded Counters
```cpp
struct alignas(kCacheLineBytes) PaddedCounter {
  std::atomic<long long> value{0};
};

struct Counters {
  PaddedCounter a;
  PaddedCounter b;
};

auto& counter = isThreadA ? g_counters.a.value : g_counters.b.value;
counter.fetch_add(1, std::memory_order_relaxed);
```

Shared test data (shared-setup)
```cpp
static constexpr std::size_t kCacheLineBytes = 64;
```

Which snippet is faster?
The padded version is faster. When two counters sit in the same cache line, every write forces the coherency protocol to transfer exclusive ownership of that line between cores, even though the threads never read each other's counter — this is false sharing. Padding each counter to its own cache line with alignas(kCacheLineBytes) eliminates the ping-pong.
Benchmark results
| Snippet | CPU time / iteration | Speedup |
|---|---|---|
| Packed Counters | 7.7 ns | 1.0× |
| Padded Counters | 1.27 ns | 6.1× |