Cache Coherency: Packed vs Padded Thread Counters
Packed Counters

```cpp
struct Counters {
  std::atomic<long long> a{0};
  std::atomic<long long> b{0};
};

auto& counter = isThreadA ? g_counters.a : g_counters.b;
counter.fetch_add(1, std::memory_order_relaxed);
```

Padded Counters
```cpp
struct alignas(kCacheLineBytes) PaddedCounter {
  std::atomic<long long> value{0};
};

struct Counters {
  PaddedCounter a;
  PaddedCounter b;
};

auto& counter = isThreadA ? g_counters.a.value : g_counters.b.value;
counter.fetch_add(1, std::memory_order_relaxed);
```

Shared test data (shared-setup)
```cpp
static constexpr std::size_t kCacheLineBytes = 64;
```

Which snippet is faster?
The padded version is faster. When two counters sit in the same cache line, every write forces the coherency protocol to transfer exclusive ownership of that line between cores, even though the threads never read each other's counter — this is false sharing. Padding each counter to its own cache line with alignas(kCacheLineBytes) eliminates the ping-pong.
Benchmark results
| Snippet | CPU time / iteration | Speedup |
|---|---|---|
| Packed Counters | 7.7 ns | 1.0× |
| Padded Counters | 1.27 ns | 6.1× |