False sharing

Overview

False sharing is a performance-degrading usage pattern that can arise in systems with distributed, coherent caches, at the granularity of the smallest resource block managed by the caching mechanism (the cache line).

  • 두 ν”„λ‘œμ„Έμ„œλ“€μ΄ 각기 λ‹€λ₯Έ λ‹€λ₯Έ μ£Όμ†Œμ— writeλ₯Ό ν•˜λ €κ³  ν•˜λ‚˜, 이 μ£Όμ†Œλ“€μ΄ 같은 μΊμ‹œ 라인에 λ§€ν•‘λœ 쑰건을 λ§ν•œλ‹€.
  • ν”„λ‘œμ„Έμ„œλ“€μ˜ μΊμ‹œ μ‚¬μ΄μ—μ„œ μΊμ‹œ 라인을 μ„œλ‘œ μ“°λŠ” 상황이 λ°œμƒν•˜κ²Œ 되면, cache coherence protocol으둜 인해 μƒλ‹Ήν•œ μ–‘μ˜ 톡신을 λ°œμƒμ‹œν‚¨λ‹€.

Example

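#include <chrono>
#include <cstddef>
#include <cstdio>
#include <new>  // std::hardware_destructive_interference_size and its feature-test macro
#include <pthread.h>

// Use the standard library's cache line size hint when available; otherwise assume 64 bytes.
#if defined(__cpp_lib_hardware_interference_size)
constexpr std::size_t CACHE_LINE_SIZE = std::hardware_destructive_interference_size;
#else
constexpr std::size_t CACHE_LINE_SIZE = 64;
#endif
constexpr std::size_t MAX_THREADS = 8;
constexpr int MANY_ITERATIONS = 1000000000;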

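// Each thread increments its own counter; volatile forces a real load and store
// on every iteration so the loop is not optimized away.
void* worker(void* arg) {
  volatile int* counter = static_cast<int*>(arg);
  for (int i = 0; i < MANY_ITERATIONS; i++) *counter = *counter + 1;
  return NULL;
}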
void test1(int num_threads) {
  auto begin = std::chrono::high_resolution_clock::now();

  pthread_t threads[MAX_THREADS];
  int counter[MAX_THREADS] = {};  // adjacent ints: several counters share one cache line

  for (int i = 0; i < num_threads; i++)
    pthread_create(&threads[i], NULL, &worker, &counter[i]);
  for (int i = 0; i < num_threads; i++)
    pthread_join(threads[i], NULL);

  auto end = std::chrono::high_resolution_clock::now();
  auto elapsed =
      std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
  printf("Time measured: %.3f seconds.\n", elapsed.count() * 1e-9);
}

struct padded_t
{
  int counter;
  char padding[CACHE_LINE_SIZE - sizeof(int)];
};
void test2(int num_threads) {
  auto begin = std::chrono::high_resolution_clock::now();

  pthread_t threads[MAX_THREADS];
  padded_t counter[MAX_THREADS] = {};  // counters are CACHE_LINE_SIZE bytes apart

  for (int i = 0; i < num_threads; i++)
    pthread_create(&threads[i], NULL, &worker, &(counter[i].counter));
  for (int i = 0; i < num_threads; i++)
    pthread_join(threads[i], NULL);

  auto end = std::chrono::high_resolution_clock::now();
  auto elapsed =
      std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
  printf("Time measured: %.3f seconds.\n", elapsed.count() * 1e-9);
}

int main()
{
    test1(8);
    test2(8);
}

Running the code above produces results like the following. The first line is test1 (unpadded counters) and the second is test2 (padded counters).

Time measured: 2.946 seconds.
Time measured: 2.533 seconds.

참고자료

  • False sharing
  • Lecture 10: Cache Coherence