Meta's Hyperscale Infrastructure: Overview and Insights

An overview of Meta's planet-scale computing infrastructure, and the key lessons learned while pursuing the vision of operating "all global datacenters as a computer."

Hyperscalers such as Alibaba, Amazon, ByteDance, Google, Meta, Microsoft, and Tencent have built planet-scale infrastructure to deliver cloud, web, or mobile services to users around the world. Although most practitioners will never build hyperscale infrastructure themselves, we believe it is worthwhile to learn a little about it. Historically, many widely used technologies originated in high-end environments, such as the mainframes of the 1960s and the hyperscale infrastructure of the past two decades. Virtual memory, for example, first appeared on mainframes and is now common even in smartwatches. Similarly, Kubernetes originated at Google and PyTorch at Facebook, yet both are now adopted by organizations of every size. Beyond specific technologies, the principles and lessons of hyperscale infrastructure can help practitioners build better systems.

This article provides an overview of Meta's hyperscale infrastructure, focusing on key insights gained from developing its systems software. Where relevant, we highlight differences from public clouds and explain how different constraints led to different optimizations. Much of the knowledge presented here has already been shared and practiced in industry and the research community, and some of it has appeared in Meta's earlier publications. The main contribution of this article is a holistic view that helps readers understand hyperscale infrastructure as a whole.

Key insights

  • Meta's engineering culture values fast development, technology openness, research in production, and shared infrastructure.
  • To boost developer productivity, Meta has universally adopted continuous deployment and enables more of its engineers to write serverless functions than traditional service code.
  • To reduce hardware costs, Meta applies hardware-software co-design at datacenter scale and automatically optimizes resource allocation, including workload migration, across all of its global datacenters rather than within individual clusters.
  • Meta's AI strategy involves co-designing the full stack, from PyTorch to AI accelerators, networks, and ML models such as Llama.

1. Engineering Culture

Before diving into Meta's infrastructure, we first highlight several important aspects of the company's engineering culture, because an organization's culture strongly shapes its technology.

Move fast

Since its early days, Facebook has cultivated and maintained a deeply ingrained "move-fast" culture that emphasizes agility and rapid iteration. This philosophy is evident in its strong commitment to continuous software deployment, which pushes the latest code to production as quickly as possible. Product engineers also write most of their code as stateless serverless functions in PHP, Python, and Erlang, because this favors simplicity, productivity, and iteration speed. Teams can shift execution priorities quickly without lengthy replanning, and they resolve ambiguous issues iteratively as they execute. This approach lets teams adapt quickly to changing market conditions and launch new products rapidly.

Technology openness

Meta actively pursues technology openness, both internally and externally. Internally, Meta uses a monorepo that stores the code of all projects in a single repository, which makes code easy to search and reuse and encourages collaboration across teams. Other organizations also use monorepos, but the degree of openness varies. At some organizations, each project has designated owners, and only they can approve code changes; others may propose changes, but the owners have the final say. By contrast, most projects at Meta do not enforce such strict ownership rules, with a few exceptions. This openness encourages cross-team collaboration and code reuse, and it avoids duplicated development of similar technologies.

At Meta, engineers commit code changes directly to the monorepo's mainline, and software releases are compiled from the mainline, which contains the latest code, rather than from stable branches. For example, when a widely used library such as the RPC library is updated, the next release of every application that uses it is automatically compiled with the library's latest version.
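
Building every release from mainline means that an update to a widely used library must reach every application that depends on it, directly or transitively. A minimal sketch of computing that set, assuming a hypothetical in-memory map from each build target to the targets it depends on (the target names are illustrative, not Meta's build graph):

```python
from collections import defaultdict, deque


def reverse_deps(deps: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert a target -> dependencies map into a dependency -> dependents map."""
    rdeps: dict[str, set[str]] = defaultdict(set)
    for target, used in deps.items():
        for dep in used:
            rdeps[dep].add(target)
    return rdeps


def affected_targets(deps: dict[str, set[str]], changed: str) -> set[str]:
    """Every target that depends on `changed`, directly or transitively."""
    rdeps = reverse_deps(deps)
    affected: set[str] = set()
    queue = deque([changed])
    while queue:  # breadth-first walk over reverse dependencies
        node = queue.popleft()
        for dependent in rdeps.get(node, set()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical build graph: photo_app uses the RPC library only through storage_client.
build_graph = {
    "feed_app": {"rpc_lib", "json_lib"},
    "storage_client": {"rpc_lib"},
    "photo_app": {"storage_client"},
    "batch_tool": {"json_lib"},
}
print(sorted(affected_targets(build_graph, "rpc_lib")))
# ['feed_app', 'photo_app', 'storage_client']
```

The transitive walk matters: `photo_app` never names `rpc_lib`, yet its next release still picks up the new library version.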

Externally, Meta's commitment to technology openness shows in its open-source hardware designs, shared through the Open Compute Project, and in open-source software projects such as PyTorch, Llama, Presto, RocksDB, and Cassandra. Many of Meta's infrastructure technologies have also been shared through research papers, several of which are referenced in this article.

Research in production

Meta's hyperscale infrastructure demands continuous innovation, yet unlike most hyperscalers, Meta does not operate a dedicated systems research lab. Instead, all of Meta's systems research papers are written by the teams that build production systems. These teams advance the state of the art while tackling hard problems at production scale, and then distill the solutions that proved effective into research papers. This approach ensures that the problems studied are real and that the solutions work at scale, which aligns well with the key criteria for successful systems research.

Common infrastructure

Some organizations let individual teams choose their own technology stacks. Meta, however, prioritizes standardization and global optimization. On the hardware side, servers for all products are allocated from a shared server pool. Moreover, for general-purpose (non-AI) compute workloads, Meta offers only a single server type, with one CPU and a fixed amount of DRAM (previously 64GB, now 256GB). Unlike public clouds, which must offer many server types to support diverse customer applications, Meta can optimize its applications for its hardware and thereby avoid an unnecessary proliferation of server types.

Standardization applies to software as well. For example, Meta's products once used Cassandra, HBase, and ZippyDB as key-value stores, but all have now consolidated on ZippyDB. Likewise, common functions such as software deployment, configuration management, service mesh, pre-production performance testing, in-production performance monitoring, and in-production load testing are each supported by a universally used tool.

Beyond standardization, a key principle in building common infrastructure is to favor reusable components over monolithic solutions. A good example is the component-reuse hierarchy behind Tectonic, Meta's distributed file system. Tectonic improves its scalability by storing metadata in ZippyDB, a distributed key-value store. ZippyDB manages its data shards with Shard Manager, a common sharding framework, and Shard Manager in turn relies on ServiceRouter, Meta's service mesh, for shard discovery and request routing. Finally, ServiceRouter stores the service-discovery and configuration data that keep the site running in Delos, a highly reliable, zero-dependency data store. The reuse hierarchy is therefore Tectonic → ZippyDB → Shard Manager → ServiceRouter → Delos, and each of these reusable components also serves many other use cases. By contrast, HDFS, a widely used open-source distributed file system, is a monolithic system that implements all of these functions internally.

Culture case study: The Threads app

The development of the Threads app, often compared with Twitter/X, illustrates the Meta culture described above. True to the move-fast culture, a small team built Threads in a startup-like environment in just five months of technical development. Moreover, once development was complete, the infrastructure teams had only two days to prepare for the production launch. At most large companies, merely writing a project plan involving dozens of interdependent teams would take more than two days, and executing it far longer. At Meta, however, war rooms were set up immediately across multiple distributed sites, bringing infrastructure and product teams together to resolve issues in real time. Despite the tight timeline, Threads reached 100 million users within five days of launch, becoming the fastest-growing app in history.

Common infrastructure played a key role in letting the team build Threads quickly and scale it reliably. Threads reused Instagram's Python backend as is and relied on Meta's common infrastructure components, such as the social-graph database, the key-value store, the serverless platform, the ML training and inference platforms, and the configuration-management framework for mobile apps.

Thanks to Meta's internal technology openness, namely the shared monorepo, Threads accelerated its development by reusing some of Instagram's application code. On the external side, Threads aims to integrate with ActivityPub, an open social-networking protocol, to interoperate with other applications. We have also publicly shared our experience of building Threads quickly.

Insight 1: Despite many challenges, it has proven possible for a large organization to maintain a move-fast culture, build on common infrastructure, and share a monorepo without strict code-ownership rules.

End-to-End User Request Flow

We now turn to Meta's infrastructure technologies. Meta's products run on a common serving infrastructure. To give an overview of this infrastructure, we walk through how a user request is processed end to end and describe the components involved along the way.

Request Routing

When a user sends a request to facebook.com, Meta's DNS server dynamically returns an IP address that maps to a small edge datacenter operated by Meta, called a point of presence (PoP), as shown in Figure 1. This dynamic DNS mapping ensures that the chosen PoP is close to the user while also balancing load across PoPs. The user's TCP connection terminates at the PoP, which maintains separate, long-lived TCP connections to Meta's datacenters. This split-TCP setup offers several benefits; in particular, reusing pre-established connections between PoPs and datacenters reduces TCP connection-setup latency. A typical PoP has hundreds of servers, and some have thousands. Hundreds of PoPs are deployed worldwide, so most users can be served by a nearby PoP, which minimizes network latency.
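
A minimal sketch of the DNS-side decision described above (stay close to the user, but avoid overloaded PoPs), assuming the DNS server already has a latency estimate and a utilization figure for each PoP. The `PoP` fields, the 0.8 headroom threshold, and the fallback rule are hypothetical, not Meta's actual policy:

```python
from dataclasses import dataclass


@dataclass
class PoP:
    name: str
    rtt_ms: float  # estimated round-trip time from the user's DNS resolver (assumed input)
    load: float    # current utilization, 0.0 to 1.0 (assumed input)


def pick_pop(pops: list[PoP], max_load: float = 0.8) -> PoP:
    """Prefer the closest PoP that has headroom; if every PoP is busy, pick the least-loaded one."""
    with_headroom = [p for p in pops if p.load < max_load]
    if with_headroom:
        return min(with_headroom, key=lambda p: p.rtt_ms)
    return min(pops, key=lambda p: p.load)


pops = [
    PoP("pop-a", rtt_ms=12, load=0.95),  # closest, but nearly full
    PoP("pop-b", rtt_ms=25, load=0.40),
    PoP("pop-c", rtt_ms=60, load=0.10),
]
print(pick_pop(pops).name)  # pop-b
```

The resolver would then answer with an IP address belonging to the chosen PoP; the user's TCP connection ends there, and the PoP forwards the request over its own long-lived connections to the datacenter.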

Static-Content Caching

When a user requests static content such as an image or a video, the PoP can serve it directly if the content is already cached there. Static content can also be cached at content delivery network (CDN) sites, as shown in Figure 1. When an Internet service provider's (ISP's) network generates a large volume of traffic for Meta's products, Meta seeks a mutually beneficial partnership with that ISP: it places Meta equipment inside the ISP's network to cache static content, forming a CDN site. A typical CDN site has tens of servers, and some have more than 100. Thousands of CDN sites worldwide together make up Meta's CDN, which distributes static content.

Meta's products use URL rewriting to redirect user requests to nearby CDN sites. When a product gives a user a URL for static content, it rewrites the URL, for example changing facebook.com/image.jpg to CDN109.meta.com/image.jpg. If the image is not cached at CDN109 when the user requests it, CDN109 forwards the request to a nearby PoP. The PoP then forwards it to a load balancer in a datacenter region, which fetches the image from the storage system. On the return path, both the PoP and the CDN site cache the image for future reuse.
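
The URL rewrite and the fill-on-return-path behavior can be sketched as a chain of cache tiers. This assumes unbounded in-memory caches with no eviction, an `https` scheme, and hypothetical `Storage` and `CacheTier` classes; it illustrates only the lookup order, not Meta's CDN software:

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_static_url(url: str, cdn_host: str) -> str:
    """Point a static-content URL at a nearby CDN site by swapping only the host."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, cdn_host, parts.path, parts.query, parts.fragment))


class Storage:
    """Stands in for the storage system reached through the datacenter load balancer."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self.objects = objects
        self.reads = 0

    def get(self, path: str) -> bytes:
        self.reads += 1
        return self.objects[path]


class CacheTier:
    """A CDN site or PoP cache: serve hits locally, forward misses upstream, fill on return."""

    def __init__(self, name: str, upstream) -> None:
        self.name = name
        self.upstream = upstream  # next tier (a PoP or Storage); anything with .get(path)
        self.store: dict[str, bytes] = {}
        self.hits = 0

    def get(self, path: str) -> bytes:
        if path in self.store:
            self.hits += 1
            return self.store[path]
        data = self.upstream.get(path)
        self.store[path] = data  # cache on the return path for future requests
        return data


storage = Storage({"/image.jpg": b"<jpeg bytes>"})
pop = CacheTier("pop-1", storage)
cdn109 = CacheTier("CDN109", pop)
cdn110 = CacheTier("CDN110", pop)  # another CDN site behind the same PoP

url = rewrite_static_url("https://facebook.com/image.jpg", "CDN109.meta.com")
path = urlsplit(url).path
cdn109.get(path)  # miss everywhere: one storage read, then the PoP and CDN109 cache it
cdn109.get(path)  # hit at CDN109
cdn110.get(path)  # miss at CDN110, hit at the PoP
print(url, storage.reads, cdn109.hits, pop.hits)
# https://CDN109.meta.com/image.jpg 1 1 1
```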

Dynamic-Content Request Routing

When a user requests dynamic content such as News Feed, the PoP forwards the request to a datacenter region. The target region is chosen by a traffic-engineering tool that periodically computes an optimal global distribution of PoP-to-datacenter traffic, taking into account factors such as datacenter capacity and network latency.
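
As a rough illustration of what the traffic-engineering tool computes, the sketch below uses a simple greedy heuristic rather than a real optimizer: larger PoPs choose first, and each fills the lowest-latency regions that still have capacity. The heuristic, the units, and the example numbers are all assumptions:

```python
def assign_traffic(demand, capacity, latency):
    """Greedily map PoP demand (requests/s) onto datacenter regions.

    demand:   {pop: requests/s}
    capacity: {region: requests/s the region can still absorb}
    latency:  {(pop, region): network latency in ms}
    Returns {(pop, region): requests/s}; demand that fits nowhere stays unassigned.
    """
    remaining = dict(capacity)
    plan = {}
    for pop in sorted(demand, key=demand.get, reverse=True):  # biggest PoPs pick first
        need = demand[pop]
        for region in sorted(remaining, key=lambda r: latency[(pop, r)]):
            if need <= 0:
                break
            take = min(need, remaining[region])
            if take > 0:
                plan[(pop, region)] = take
                remaining[region] -= take
                need -= take
    return plan


demand = {"pop-eu": 70, "pop-us": 50}
capacity = {"region-eu": 60, "region-us": 80}
latency = {("pop-eu", "region-eu"): 10, ("pop-eu", "region-us"): 90,
           ("pop-us", "region-eu"): 85, ("pop-us", "region-us"): 15}
print(assign_traffic(demand, capacity, latency))
# {('pop-eu', 'region-eu'): 60, ('pop-eu', 'region-us'): 10, ('pop-us', 'region-us'): 50}
```

Here the European region cannot absorb all of `pop-eu`'s demand, so 10 units spill over to the farther region. A production tool would typically solve this as a global optimization problem rather than greedily.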

Traffic from PoPs to datacenters travels over Meta's private wide-area network (WAN), a fiber network tens of thousands of miles long that connects Meta's PoPs and datacenters around the world. Internal network traffic among Meta's datacenters and PoPs is several times larger than the external traffic between users and PoPs, mainly because of data replication across datacenters and interactions among microservices. The private WAN provides the high bandwidth needed to carry this internal traffic.

Insight 2: Meta's global infrastructure consists of CDN sites, edge datacenters, and main datacenters. Because the volume of internal traffic between datacenters is so large, Meta built a private WAN to connect its datacenters instead of relying on the public Internet.

Infrastructure topology

The table below summarizes the infrastructure components described above. Worldwide, there are tens of datacenter regions, hundreds of edge datacenters (PoPs), and thousands of CDN sites. Each datacenter region contains multiple datacenters within a radius of a few miles. Each datacenter uses up to 12 main switchboards (MSBs) for power distribution, and these also serve as the main sub-datacenter fault domains: the failure of a single MSB can make 10,000 to 20,000 servers unavailable.

Edge Network

A PoP connects to many autonomous systems (ASes) on the Internet and typically has multiple paths to reach a user's network. By default, BGP (Border Gateway Protocol) does not consider network capacity or performance when choosing a path between a PoP and its users. The PoP network, however, takes these factors into account and advertises the best path for each network prefix.
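
A greedy sketch of capacity- and performance-aware egress selection, assuming each destination prefix has a list of candidate links with measured RTTs, each link has a known capacity, and BGP's default choice is kept when nothing else fits. The link names and numbers are hypothetical, and the prefixes come from the IPv4 documentation ranges:

```python
def choose_egress(demand_gbps, candidates, link_capacity_gbps, bgp_default):
    """Pick an egress link per destination prefix using measured RTT and link headroom.

    demand_gbps:        {prefix: traffic toward that prefix}
    candidates:         {prefix: [(link, measured_rtt_ms), ...]}
    link_capacity_gbps: {link: capacity}
    bgp_default:        {prefix: the link plain BGP would pick}
    """
    load = {link: 0.0 for link in link_capacity_gbps}
    choice = {}
    for prefix in sorted(demand_gbps, key=demand_gbps.get, reverse=True):
        rate = demand_gbps[prefix]
        for link, _rtt in sorted(candidates[prefix], key=lambda c: c[1]):  # fastest first
            if load[link] + rate <= link_capacity_gbps[link]:
                break
        else:
            link = bgp_default[prefix]  # no candidate has headroom; keep BGP's choice
        choice[prefix] = link
        load[link] += rate
    return choice


capacity = {"transit-1": 10, "peer-2": 4, "peer-3": 6}
demand = {"203.0.113.0/24": 5, "198.51.100.0/24": 3, "192.0.2.0/24": 2}
candidates = {
    "203.0.113.0/24": [("peer-2", 20), ("transit-1", 35)],
    "198.51.100.0/24": [("peer-2", 18), ("peer-3", 22), ("transit-1", 40)],
    "192.0.2.0/24": [("peer-3", 15), ("transit-1", 30)],
}
default = {prefix: "transit-1" for prefix in demand}
choice = choose_egress(demand, candidates, capacity, default)
print(choice)
# {'203.0.113.0/24': 'transit-1', '198.51.100.0/24': 'peer-2', '192.0.2.0/24': 'peer-3'}
```

The largest prefix would overflow the fast `peer-2` link, so it goes to transit, which leaves room on `peer-2` for a smaller prefix. In a real deployment the chosen path would be pushed to the routers, for example as a BGP route override.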

Datacenter Network

Servers within a datacenter are interconnected by a datacenter fabric, whose network switches form a three-tier Clos topology. The fabric can be scaled out incrementally by adding more switches at the top tier. With enough top-tier switches, it provides a non-blocking network without oversubscription, allowing any pair of servers to communicate at full NIC bandwidth. We are moving toward eliminating network oversubscription within datacenters.
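
Oversubscription at a switch tier is the ratio of server-facing bandwidth to uplink bandwidth; a 1:1 ratio is what lets every server use its full NIC bandwidth at once. A small worked example with hypothetical rack numbers (not Meta's hardware):

```python
import math


def oversubscription(servers: int, nic_gbps: float, uplinks: int, uplink_gbps: float) -> float:
    """Ratio of server-facing bandwidth to fabric-facing (uplink) bandwidth at one tier."""
    return (servers * nic_gbps) / (uplinks * uplink_gbps)


def uplinks_for_non_blocking(servers: int, nic_gbps: float, uplink_gbps: float) -> int:
    """Smallest number of uplinks that brings the ratio down to 1:1."""
    return math.ceil(servers * nic_gbps / uplink_gbps)


# Hypothetical rack: 48 servers with 25 Gb/s NICs under a switch with 100 Gb/s uplinks.
print(oversubscription(48, 25, uplinks=4, uplink_gbps=100))   # 3.0  (3:1 oversubscribed)
print(uplinks_for_non_blocking(48, 25, uplink_gbps=100))      # 12
print(oversubscription(48, 25, uplinks=12, uplink_gbps=100))  # 1.0  (non-blocking)
```

Adding top-tier switches in a Clos fabric is what supplies those extra uplinks without redesigning the lower tiers.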

Regional Network

Fabric aggregators connect the datacenters within a region to one another and to Meta's private WAN. A fabric aggregator uses a fat-tree-like topology, so its bandwidth can be scaled incrementally by adding more switches. We aim to greatly reduce network oversubscription in the regional network so that communication among datacenters in the same region does not become a bottleneck. As a result, most services other than ML training can be spread across multiple datacenters in a region without concern for performance degradation.

Request Processing

When a user request arrives at a datacenter region, it is processed along the path shown in Figure 2. A load balancer distributes user requests across tens of thousands of servers that run frontend serverless functions. To handle a request, a frontend serverless function may call multiple backend services, some of which in turn run ML inference to fetch ads or News Feed content recommendations.

While running, a frontend serverless function may add events to an event queue; these events are processed asynchronously by event-driven serverless functions. For example, sending a confirmation email after a user performs an action on the site can be such an event. Because frontend serverless functions directly affect the user experience, they are subject to strict latency service-level objectives (SLOs). Event-driven serverless functions, by contrast, run asynchronously without affecting user response time and are optimized for throughput and hardware utilization rather than latency. The ratio of servers running frontend serverless functions to servers running event-driven serverless functions is roughly 5:1.
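
The split between the latency-critical frontend path and the asynchronous event path can be sketched with `asyncio`. Here an in-process `asyncio.Queue` stands in for the durable, distributed event queue, and the event type and handler names are hypothetical:

```python
import asyncio


async def frontend_handler(user_id: str, action: str, events: asyncio.Queue) -> str:
    """Latency-critical path: do the user-visible work, enqueue follow-up work, return."""
    await events.put({"type": "confirmation_email", "user": user_id, "action": action})
    return f"ok: {user_id}"


async def event_worker(events: asyncio.Queue, sent: list) -> None:
    """Throughput-oriented path: drain events in the background."""
    while True:
        event = await events.get()
        sent.append(f"email to {event['user']} about {event['action']}")
        events.task_done()


async def main() -> tuple[list, int, list]:
    events: asyncio.Queue = asyncio.Queue()
    sent: list = []
    worker = asyncio.create_task(event_worker(events, sent))
    responses = [await frontend_handler(u, "password_change", events) for u in ("alice", "bob")]
    emails_when_responded = len(sent)  # users already have their responses at this point
    await events.join()                # wait until the worker has processed every event
    worker.cancel()
    return responses, emails_when_responded, sent


responses, emails_when_responded, sent = asyncio.run(main())
print(responses, emails_when_responded)  # ['ok: alice', 'ok: bob'] 0
print(sent)  # ['email to alice about password_change', 'email to bob about password_change']
```

Both responses exist before any email is sent, which is why the event path adds nothing to user-facing latency and can be tuned for throughput instead.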

Offline Processing

The components on the right side of Figure 2 perform various kinds of offline processing that support the online processing on the left. Separating online from offline processing allows each to be optimized independently for its own workload characteristics. While serving user requests, frontend serverless functions and backend services log various data, such as ad click-through and video-watch metrics, into the data warehouse. This data feeds many offline processes. For example, ML training uses it to update ML models, and stream processing uses it to update the site's most-discussed topics; the results are stored in databases and caches and later used to serve online user requests. In addition, batch analytics built on Spark and Presto can periodically perform tasks such as updating friend recommendations based on new activity on the site. Finally, data updates in the data warehouse are a major source of the events that trigger event-driven serverless functions.

Insight 3: Using the data warehouse as an intermediate layer to decouple online and offline processing simplifies the architecture and allows each kind of processing to be optimized independently.

Boosting Developer Productivity

A major goal of shared infrastructure is to boost developer productivity. Continuous software deployment and serverless functions are widely known to improve developer productivity, but Meta has pushed both approaches to the extreme.

Continuous Deployment

In keeping with Meta's move-fast culture, we run continuous deployment of both code and configuration at extreme speed and scale. This lets developers ship new features and bug fixes quickly, get immediate feedback, and iterate rapidly. For configuration changes, Meta's configuration-management tool deploys more than 100,000 live changes to production every day, affecting about 10,000 services and millions of servers. These changes support tasks such as load balancing, feature rollouts, A/B testing, and overload protection, and nearly every engineer at Meta who writes code also makes live configuration changes in production. Meta follows a configuration-as-code paradigm: a manual configuration change goes through peer code review before it is committed to the code repository. Once committed, the change immediately enters the continuous deployment pipeline, and within seconds the updated configuration can be pushed to millions of subscribing Linux processes, triggering upcall notifications. These processes then adjust their runtime behavior immediately, without restarting. Besides manual changes, automation tools also make configuration changes for tasks such as load balancing.

For code changes, Meta's deployment tool manages more than 30,000 pipelines that roll out software upgrades. At Meta, 97% of services use fully automated software deployment with no manual intervention.
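
The configuration push-and-upcall flow can be illustrated in-process with a publish/subscribe pattern. This is a toy model under stated assumptions: a single `ConfigStore` object stands in for the distributed configuration service, `commit` stands in for the reviewed commit plus the push, and the key `ratelimit/max_qps` is hypothetical:

```python
from typing import Any, Callable


class ConfigStore:
    """Toy stand-in for a config service that pushes committed values to subscribers."""

    def __init__(self) -> None:
        self._values: dict[str, Any] = {}
        self._subscribers: dict[str, list[Callable[[Any], None]]] = {}

    def subscribe(self, key: str, on_change: Callable[[Any], None]) -> None:
        self._subscribers.setdefault(key, []).append(on_change)
        if key in self._values:
            on_change(self._values[key])  # a late subscriber gets the current value at once

    def commit(self, key: str, value: Any) -> None:
        """Store the new value and invoke every subscriber's callback (the 'upcall')."""
        self._values[key] = value
        for callback in self._subscribers.get(key, []):
            callback(value)


class RateLimiter:
    """A long-running component that changes behavior at runtime without restarting."""

    def __init__(self, store: ConfigStore) -> None:
        self.max_qps = 100  # built-in default until a config value arrives
        store.subscribe("ratelimit/max_qps", self._on_change)

    def _on_change(self, value: Any) -> None:
        self.max_qps = int(value)


store = ConfigStore()
limiter = RateLimiter(store)
store.commit("ratelimit/max_qps", 250)
print(limiter.max_qps)  # 250
late = RateLimiter(store)  # starts after the commit and still sees the latest value
print(late.max_qps)  # 250
```

In the real system, review, canary testing, and fan-out to millions of processes sit between the commit and the callback; the sketch keeps only the subscribe-and-notify shape.
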
Of all services, 55% use continuous deployment, releasing every code change that passes automated tests to production immediately, while the remaining 42% are deployed automatically on a fixed schedule, mostly daily or weekly. Consider the frontend serverless functions in Figure 2. They run on more than 500,000 servers, more than 10,000 product developers modify their code, and thousands of commits land every day. Even in this highly dynamic environment, a new version of all serverless functions is deployed to production every three hours.

Meta's network software is also designed like a regular service and optimized for frequent updates. For example, Meta's private WAN divides its network topology into multiple parallel planes, each carrying a share of the traffic and running its own controller. This makes it possible to update the controller software frequently: developers can drain the traffic from one plane, then deploy and experiment with a new control algorithm on that plane alone without affecting the others. Likewise, Meta's network-switch software is updated as often as a regular service. Thanks to the warm-boot feature of the switch ASIC, the data plane keeps forwarding traffic while the switch software is being updated.

Frequent code and configuration updates enable agile software development, but they also increase the risk of site outages.
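
Draining one plane before changing it is one way to contain that risk. A minimal sketch of the plane-by-plane controller upgrade described above, assuming traffic is split evenly across active planes and that `deploy` and `healthy` are caller-supplied hooks (both hypothetical):

```python
def split_traffic(total_gbps: float, planes: list[str], drained: set[str]) -> dict[str, float]:
    """Spread traffic evenly over the planes that are not drained."""
    active = [p for p in planes if p not in drained]
    if not active:
        raise ValueError("at least one plane must stay active")
    share = total_gbps / len(active)
    return {p: (share if p not in drained else 0.0) for p in planes}


def upgrade_plane(total_gbps, planes, target, deploy, healthy):
    """Drain `target`, upgrade its controller, and undrain it only if it passes a health check."""
    during = split_traffic(total_gbps, planes, drained={target})
    deploy(target)  # the other planes keep carrying all traffic meanwhile
    drained_after = set() if healthy(target) else {target}
    after = split_traffic(total_gbps, planes, drained=drained_after)
    return during, after


planes = ["plane-1", "plane-2", "plane-3", "plane-4"]
upgraded = []
during, after = upgrade_plane(300.0, planes, "plane-1", upgraded.append, lambda p: True)
print(during)  # {'plane-1': 0.0, 'plane-2': 100.0, 'plane-3': 100.0, 'plane-4': 100.0}
print(after)   # {'plane-1': 75.0, 'plane-2': 75.0, 'plane-3': 75.0, 'plane-4': 75.0}
```

If the health check fails, the plane simply stays drained, so a bad controller build never carries production traffic.
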
To address the outage risk posed by frequent updates, Meta invests heavily in testing, staged rollouts, and health checks during updates. Meta once ran a company-wide campaign to strengthen code-deployment automation, which raised the adoption of fully automated code deployment protected by health checks from 12% to 97%. Similarly, it introduced a policy requiring every configuration change to pass automated canary tests, to ensure configuration safety. Overall, we believe these investments in continuous deployment are well worth it because they substantially boost developer productivity.
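
A staged rollout guarded by health checks can be sketched as follows. The stage fractions, the check-every-updated-host rule, and the roll-back-everything policy are assumptions for illustration; the first, smallest stage plays the role of a canary:

```python
def staged_rollout(servers, stages, deploy, is_healthy, rollback):
    """Deploy to growing fractions of `servers`; on any failed health check, roll everything back.

    stages: increasing fractions of the fleet, e.g. (0.01, 0.1, 0.5, 1.0).
    Returns (succeeded, servers_now_running_the_new_version).
    """
    updated = []
    for fraction in stages:
        target = max(1, round(len(servers) * fraction))
        batch = servers[len(updated):target]
        for server in batch:
            deploy(server)
        updated.extend(batch)
        # Health-check every updated server before widening the rollout.
        if not all(is_healthy(s) for s in updated):
            for server in reversed(updated):
                rollback(server)
            return False, []
    return True, updated


servers = [f"host{i}" for i in range(100)]
live = set()
ok, updated = staged_rollout(servers, (0.01, 0.1, 0.5, 1.0), live.add,
                             lambda s: s != "host7", live.discard)
print(ok, len(live))  # False 0  (host7 fails in the 10% stage, so all 10 updated hosts roll back)
```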

Insight 4: Even a large organization operating more than 10,000 services can practice continuous deployment at extreme scale and speed.

Serverless Functions

The extensive use of serverless functions, also known as Function-as-a-Service (FaaS), is another key driver of developer productivity. Whereas a traditional backend service can be arbitrarily complex, a FaaS function is stateless and implements a simple function interface. Each FaaS invocation is managed independently and does not affect other concurrent invocations, except through state stored in external databases. Because FaaS functions are stateless, they rely heavily on external caching systems for good database-access performance. Developers write the FaaS code and leave everything else, such as code deployment and auto-scaling in response to load changes, to the automated infrastructure. This simplicity lets Meta's more than 10,000 product developers focus on product logic without worrying about infrastructure management, and it also avoids the hardware waste that arises when product developers over-provision resources. Meta has pushed FaaS to the extreme to maximize developer productivity: more than 10,000 Meta engineers write FaaS code, about 50% more than the number who write regular service code that they operate themselves. This success stems not only from freeing product engineers from infrastructure management but also from the high usability of a FaaS-optimized integrated development environment (IDE), which provides easy access to the social-graph database and various backend systems through high-level language constructs.
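
The stateless model above leans on external caches for performance. A minimal sketch, assuming a hypothetical `render_profile` function and a dictionary standing in for the database; the cache is shared across invocations, while the function itself keeps no state between calls:

```python
from typing import Callable


class ReadThroughCache:
    """External cache shared by all invocations; loads from the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._data: dict[str, str] = {}
        self._load = load_from_db
        self.db_reads = 0

    def get(self, key: str) -> str:
        if key not in self._data:
            self.db_reads += 1
            self._data[key] = self._load(key)
        return self._data[key]


def render_profile(request: dict, cache: ReadThroughCache) -> dict:
    """A stateless function: its result depends only on the request and external state."""
    name = cache.get(f"user:{request['user_id']}:name")
    return {"greeting": f"Hello, {name}!"}


database = {"user:42:name": "Ada"}
cache = ReadThroughCache(database.__getitem__)
print(render_profile({"user_id": 42}, cache))  # {'greeting': 'Hello, Ada!'}
render_profile({"user_id": 42}, cache)         # a second invocation is served from the cache
print(cache.db_reads)                          # 1
```

Because no invocation holds state, the platform is free to run any invocation on any server and to scale the fleet up or down with load.
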
The IDE also gives developers fast feedback through continuous integration tests.

As shown in Figure 2, Meta operates two FaaS platforms, one for frontend serverless functions and one for event-driven serverless functions, called FrontFaaS and XFaaS, respectively. FrontFaaS functions are written in PHP, and Meta also runs FaaS platforms that support Python, Erlang, and Haskell. To handle the heavy load generated by billions of users, Meta keeps the PHP runtime always running on more than 500,000 servers. When a user request arrives, it is routed to one of these servers and processed immediately, so there is no cold-start delay. When site load is low, auto-scaling releases some FrontFaaS servers so that other services can use them. XFaaS shares many similarities with FrontFaaS; the main difference is that it runs non-user-facing functions that do not need subsecond response times but whose load can change abruptly. To avoid overprovisioning for peak load, XFaaS applies several optimizations: deferring the execution of delay-tolerant functions to off-peak hours, spreading function-call load globally across regions, and enforcing quota-based throttling.
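
Two of the XFaaS optimizations listed above, deferring delay-tolerant calls during peaks and per-function quota throttling, can be sketched as a per-tick admission policy. The `Scheduler` and `Call` classes, the 0.8 busy threshold, and the per-tick quotas are hypothetical; cross-region load spreading is omitted:

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    function: str
    delay_tolerant: bool


@dataclass
class Scheduler:
    """Toy per-tick admission control for event-driven functions (hypothetical policy)."""
    quota_per_tick: dict[str, int]   # max calls each function may run per tick
    busy_threshold: float = 0.8      # fleet utilization at which delay-tolerant calls wait
    deferred: list[Call] = field(default_factory=list)

    def admit(self, calls: list[Call], utilization: float) -> tuple[list[Call], list[Call]]:
        """Return (run_now, throttled); delay-tolerant calls are parked while the fleet is busy."""
        busy = utilization >= self.busy_threshold
        if busy:
            pending = calls
        else:
            # Off-peak: previously deferred calls get their turn first.
            pending, self.deferred = self.deferred + calls, []
        used: dict[str, int] = {}
        run_now: list[Call] = []
        throttled: list[Call] = []
        for call in pending:
            if busy and call.delay_tolerant:
                self.deferred.append(call)  # postpone to an off-peak tick
            elif used.get(call.function, 0) >= self.quota_per_tick.get(call.function, 0):
                throttled.append(call)      # over this function's quota for the tick
            else:
                used[call.function] = used.get(call.function, 0) + 1
                run_now.append(call)
        return run_now, throttled


sched = Scheduler(quota_per_tick={"send_email": 2, "rebuild_index": 5})
peak = [Call("send_email", False)] * 3 + [Call("rebuild_index", True)] * 2
run, throttled = sched.admit(peak, utilization=0.9)
print(len(run), len(throttled), len(sched.deferred))  # 2 1 2
run, throttled = sched.admit([], utilization=0.3)      # off-peak tick
print([c.function for c in run])                       # ['rebuild_index', 'rebuild_index']
```

Throttled calls are returned to the caller here; a real platform would queue and retry them.
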
Meta's product developers have used FaaS as their primary coding paradigm since the late 2000s, long before the term "FaaS" became widely known. Meta's serverless platforms differ from other serverless platforms in the industry in one key respect: they run multiple functions concurrently in the same Linux process. Public clouds, by contrast, run only one function per virtual machine to strengthen isolation between customers. Running many functions per process helps maximize hardware efficiency.
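The FaaS contract can be made concrete with a minimal Python sketch. It assumes dictionary-backed stand-ins for the external database and cache. `ExternalStore`, `ExternalCache`, and the two handlers are hypothetical, not Meta APIs. Each handler keeps no state of its own and reads everything through the external cache (cache-aside). Several handlers are registered and invoked inside one process, loosely mirroring the many-functions-per-process model rather than one function per VM.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class ExternalStore:
    """Stand-in for an external database (hypothetical)."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = dict(rows)
        self.reads = 0  # counts database round trips

    def get(self, key: str) -> str:
        self.reads += 1
        return self.rows[key]


class ExternalCache:
    """Stand-in for an external caching tier (hypothetical)."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def get_or_load(self, key: str, loader: Callable[[str], str]) -> str:
        # Cache-aside: serve from the cache, fall back to the database on a miss.
        if key not in self.entries:
            self.entries[key] = loader(key)
        return self.entries[key]


# These objects model external systems, not per-function state.
DB = ExternalStore({"user:1": "Alice", "user:2": "Bob"})
CACHE = ExternalCache()


def greet(event: dict) -> dict:
    # Stateless: inputs come only from the event and from external systems.
    name = CACHE.get_or_load(f"user:{event['user_id']}", DB.get)
    return {"message": f"Hello, {name}!"}


def name_length(event: dict) -> dict:
    name = CACHE.get_or_load(f"user:{event['user_id']}", DB.get)
    return {"length": len(name)}


# Several functions share one process (and one cache client).
REGISTRY: Dict[str, Handler] = {"greet": greet, "name_length": name_length}


def invoke(function_name: str, event: dict) -> dict:
    return REGISTRY[function_name](event)
```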

Insight 5: Serverless functions have become the primary coding paradigm for product development at Meta. More than 10,000 Meta engineers write serverless functions, 50% more than the number of engineers who write regular service code.

Reducing Hardware Costs

Besides improving developer productivity, another major goal of the shared infrastructure is to reduce hardware costs. This section presents several examples of how software solutions help reduce hardware costs.

All Global Datacenters as a Computer

Most infrastructures leave the complex management of geographically distributed datacenters to their users. Users must manually decide how many replicas of a service to run and which regions to deploy them in, while still meeting their service-level objectives (SLOs). This complexity leads to hardware waste in three ways: over-provisioning, load imbalance across regions, and a lack of timely migration as workloads and datacenter resource supply change. Meta, by contrast, is moving beyond "The Datacenter as a Computer" (DaaC) toward the vision of "All Global Datacenters as a Computer" (Global-DaaC). With Global-DaaC, a user simply requests a global deployment of a service, and the infrastructure manages all the details automatically. It determines the optimal number of service replicas and places them in suitable datacenter regions given the SLOs and available hardware. It also selects the best hardware types, optimizes traffic routing, and continuously adjusts service placement as workloads change. Global-DaaC is easier for Meta to implement than for a public cloud, because Meta owns all of its applications and can move them across regions whenever needed. A public cloud lacks this flexibility because it cannot move customer applications on its own. To realize Global-DaaC, Meta's tools coordinate resource allocation seamlessly at the global, regional, and individual-server levels.
First, Meta's global capacity-management tool uses RPC tracing to identify dependencies between services and builds resource-consumption models. It then applies mixed-integer programming to split each service's global capacity requirement into per-region quotas. Next, the regional capacity-management tool allocates servers according to those regional quotas to form virtual clusters. Unlike a physical cluster, a virtual cluster can span servers in multiple datacenters within the same region, and it can grow or shrink dynamically. At runtime, Meta's container-management tool allocates containers within these virtual clusters, often spreading a single job across multiple datacenters to improve fault tolerance. Finally, at the server level, Meta's kernel mechanisms ensure appropriate sharing and isolation of the memory and I/O resources allocated to individual containers. Stateful services such as databases also benefit from Global-DaaC. These services are typically sharded, with each container hosting multiple data shards for efficiency. Meta's Global Service Placer (GSP) uses constrained optimization to determine the optimal number of replicas for each data shard and their placement across regions. Meta's sharding framework then assigns shard replicas to containers within the constraints set by GSP and migrates them dynamically as load changes.
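The quota-splitting step can be illustrated as a tiny integer program. This sketch is a deliberate simplification. It ignores the RPC-tracing dependency model and treats demand as a single server count. It also solves the program by exhaustive search instead of a real mixed-integer solver, so it only works for toy inputs. The objective and the name `split_global_demand` are assumptions for illustration; the objective minimizes the largest fraction of any region's free capacity that gets consumed.

```python
from itertools import product
from typing import Dict, Optional


def split_global_demand(demand: int, free: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Split a global server demand into integer per-region quotas.

    Integer program (hypothetical objective), solved by brute force:
        minimize   max_r quota[r] / free[r]
        subject to sum_r quota[r] == demand
                   0 <= quota[r] <= free[r], quota[r] integer
    Returns None when total free capacity cannot cover the demand.
    """
    regions = sorted(free)
    best: Optional[Dict[str, int]] = None
    best_cost = float("inf")
    for combo in product(*(range(free[r] + 1) for r in regions)):
        if sum(combo) != demand:
            continue
        # A region with no free capacity can only receive a zero quota.
        cost = max(((q / free[r]) if free[r] else 0.0 for q, r in zip(combo, regions)),
                   default=0.0)
        if cost < best_cost:
            best, best_cost = dict(zip(regions, combo)), cost
    return best
```

A small region is not filled to the brim when a larger region can absorb the demand with less relative utilization. That headroom is what lets later migrations happen without emergency buys.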
Similarly, machine learning (ML) workloads benefit from Global-DaaC. For ML inference, models are managed like data shards, with GSP deciding the number and placement of model replicas. For ML training, the training data and the GPUs must be co-located in the same datacenter region. Each team receives a global GPU capacity quota and submits training jobs to a global job queue. Meta's ML training scheduler automatically selects the regions in which to replicate data and allocate GPUs. It guarantees that data and GPUs are co-located while maximizing GPU utilization.
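The co-location rule can be sketched under stated assumptions. `Region`, `place_training_job`, and the tie-breaking policy are hypothetical and far simpler than Meta's actual scheduler. The policy prefers a region that already holds the data; otherwise it takes the region with the most free GPUs and reports that a data copy is needed. The sketch enforces two things: the job's GPUs and a replica of its data end up in the same region, and the team stays within its global GPU quota.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Region:
    name: str
    free_gpus: int
    datasets: Set[str]  # datasets already replicated in this region


def place_training_job(
    dataset: str,
    gpus: int,
    team: str,
    team_usage: Dict[str, int],
    team_quota: Dict[str, int],
    regions: List[Region],
) -> Optional[Tuple[str, bool]]:
    """Pick a region for a training job; return (region, needs_data_copy) or None."""
    # Global quota check: the team's total GPUs across all regions.
    if team_usage.get(team, 0) + gpus > team_quota.get(team, 0):
        return None
    candidates = [r for r in regions if r.free_gpus >= gpus]
    if not candidates:
        return None  # the job keeps waiting in the global queue
    # Prefer regions that already hold the data, then the most free GPUs.
    best = max(candidates, key=lambda r: (dataset in r.datasets, r.free_gpus))
    needs_copy = dataset not in best.datasets
    # Commit: co-locate data and GPUs in the chosen region.
    best.free_gpus -= gpus
    best.datasets.add(dataset)
    team_usage[team] = team_usage.get(team, 0) + gpus
    return best.name, needs_copy
```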

Insight 6: Meta is evolving from "The Datacenter as a Computer" (DaaC) to "All Global Datacenters as a Computer" (Global-DaaC). In this model, the infrastructure automatically decides how to deploy services across global datacenters and migrates them as workloads change, with no user intervention. Meta has successfully applied this approach to databases, ML systems, and a wide range of services operating at the scale of more than 100,000 servers and 100,000 GPUs.

Hardware and Software Co-Design

Hardware/software co-design within a single server is common, but Meta extends it to the global level, using software solutions to overcome the limitations of low-cost hardware.

Low-Cost Fault Tolerance

Public clouds generally use highly available hardware, because their customers' applications are often not sufficiently fault tolerant. Meta, in contrast, controls all of its applications. It can therefore make them fault tolerant enough to run on low-cost, lower-availability hardware. For example, a public-cloud server rack may use dual power supplies and dual top-of-rack (ToR) switches. These ensure high availability and allow maintenance without affecting running workloads. Meta's server racks use neither. Instead, hardware redundancy is provided at a much larger scope. Each power main switchboard (MSB) serves roughly 10,000 to 20,000 servers, and there is only one spare MSB as backup for every six MSBs. Similarly, public-cloud virtual machines (VMs) typically use network-attached block devices, which enables live VM migration. Meta's containers, by contrast, use low-cost, directly attached SSDs as their root disks, which makes live container migration during datacenter maintenance difficult. Meta uses software solutions to overcome these limitations of low-cost hardware. First, Meta's resource-allocation tools spread a service's containers and data shards across enough datacenter sub-fault domains (MSBs) to improve fault tolerance.
Second, a cooperative protocol lets services participate in managing the lifecycle of their containers. It ensures that maintenance operations respect application-level constraints, such as never taking down two replicas of the same data shard at the same time. Finally, Global-DaaC deploys services so that they keep running through several simultaneous losses: an entire datacenter region, one MSB in each region, and a random fraction of servers in each region. Meta regularly tests in production to verify that this level of fault tolerance is maintained.
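The first two mechanisms can be sketched together. This is a simplified model, not Meta's protocol. `ShardPlacement`, `spread_ok`, and `approve_maintenance` are hypothetical names. Each replica is identified by the server it runs on, and each server belongs to exactly one MSB. `spread_ok` checks that no MSB hosts more than one replica of a shard. `approve_maintenance` plays the service's side of the cooperative protocol: it refuses any batch of server shutdowns that would take down two replicas of the same shard at once.

```python
from typing import Dict, Iterable, List, Set

# shard id -> servers hosting its replicas (hypothetical placement data)
ShardPlacement = Dict[str, List[str]]


def spread_ok(placement: ShardPlacement, msb_of: Dict[str, str]) -> bool:
    """True if every shard's replicas sit in distinct MSB fault domains."""
    for servers in placement.values():
        msbs = [msb_of[s] for s in servers]
        if len(set(msbs)) != len(msbs):
            return False
    return True


def approve_maintenance(
    placement: ShardPlacement,
    already_down: Set[str],
    requested: Iterable[str],
) -> bool:
    """Service-side check: allow the batch only if no shard loses more than one replica."""
    down = set(already_down) | set(requested)
    for servers in placement.values():
        if sum(1 for s in servers if s in down) > 1:
            return False  # would take two replicas of this shard down at once
    return True
```

In this sketch, maintenance tooling would call `approve_maintenance` before draining hosts and retry rejected batches later.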

Meta's infrastructure is designed so that the failure of an entire datacenter region does not affect users. However, as the number of regions grows, so does the likelihood that a large natural disaster, such as a hurricane, will affect two neighboring regions at once. Rather than over-provisioning capacity for two simultaneous regional failures, Meta takes a software-based approach. When multiple regions fail at the same time, it gracefully degrades service quality to reduce system load, for example by disabling less critical features and lowering video quality.
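A back-of-the-envelope model shows why provisioning for two simultaneous regional failures is expensive, and how degradation can substitute for it. The model is an assumption for illustration, not Meta's capacity formula. It assumes global demand D is spread evenly over N regions, each region has M equal-sized MSB domains, and a fraction f of servers may fail at random. Suppose the system must survive losing k regions, plus one MSB and a fraction f of servers in each surviving region. Each region then needs capacity `C = D / (N - k) * M / (M - 1) / (1 - f)`. `shed_load` is a hypothetical greedy policy that degrades the least critical features first until load fits the remaining capacity.

```python
from typing import List, Set, Tuple


def per_region_capacity(demand: float, regions: int, lost_regions: int,
                        msbs_per_region: int, random_failure: float) -> float:
    """Capacity each region needs so the survivors still carry the full demand.

    C = D / (N - k) * M / (M - 1) / (1 - f)   (simplified model, see lead-in)
    """
    survivors = regions - lost_regions
    if survivors <= 0 or msbs_per_region < 2 or not 0 <= random_failure < 1:
        raise ValueError("model parameters out of range")
    return demand / survivors * msbs_per_region / (msbs_per_region - 1) / (1 - random_failure)


# (feature name, full load, degraded load, criticality: higher = more important)
Feature = Tuple[str, float, float, int]


def shed_load(features: List[Feature], capacity: float) -> Set[str]:
    """Degrade the least critical features first until total load fits capacity.

    If every feature is degraded and load still exceeds capacity, all names are returned.
    """
    load = sum(full for _, full, _, _ in features)
    degraded: Set[str] = set()
    for name, full, reduced, _ in sorted(features, key=lambda feat: feat[3]):
        if load <= capacity:
            break
        load -= full - reduced  # e.g. lower video bitrate, or 0 to disable entirely
        degraded.add(name)
    return degraded
```

With N = 10, M = 5, and f = 0, surviving two lost regions instead of one raises per-region capacity from 125 to 140.625 units per 900 units of demand. That is 12.5% more hardware held in reserve at all times, which shedding avoids.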

Eliminating the Costs of Routing Proxies

Typical service meshes route RPC requests mainly through sidecar proxies, but Meta's service mesh routes only 1% of all RPC requests through proxies. The remaining 99% go directly from client to server, bypassing intermediate proxies, through a routing library linked into the service executables. This unconventional approach saves Meta the cost of roughly 100,000 proxy servers. However, it complicates deployment, because the library is compiled into about 10,000 services, each with its own release schedule. Meta's software-deployment and configuration-management tools help manage this challenge.
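Routing through a linked-in library instead of a proxy means the client itself picks a backend. A common client-side policy is power-of-two-choices load balancing. This sketch uses it purely as an illustration, because the text does not say which policy Meta's library uses. `RoutingClient` is hypothetical and tracks outstanding requests per server.

```python
import random
from typing import Dict, List, Optional


class RoutingClient:
    """Hypothetical in-process routing library: picks a server with no proxy hop."""

    def __init__(self, servers: List[str], rng: Optional[random.Random] = None) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self.outstanding: Dict[str, int] = {s: 0 for s in servers}
        self.rng = rng or random.Random()

    def pick(self) -> str:
        """Power of two choices: sample two servers, take the less loaded one."""
        servers = list(self.outstanding)
        if len(servers) == 1:
            return servers[0]
        a, b = self.rng.sample(servers, 2)
        return a if self.outstanding[a] <= self.outstanding[b] else b

    def start(self) -> str:
        server = self.pick()
        self.outstanding[server] += 1  # request goes directly client -> server
        return server

    def finish(self, server: str) -> None:
        self.outstanding[server] -= 1
```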

Tiered Storage and Local SSDs

Meta classifies data as hot, warm, or cold according to access frequency and latency requirements, and it stores each type on a suitable storage system to optimize cost efficiency. Hot databases and caches, such as the social-graph database, keep data in memory and on solid-state drives (SSDs). Warm data, such as videos, images, and the data warehouse (for example, user-activity logs), is stored in distributed file systems that keep data mainly on hard disk drives (HDDs). Each of these storage servers has one CPU, 36 HDDs, and two SSDs for caching metadata. Rarely accessed cold data, such as high-resolution videos more than 10 years old, is archived on high-density HDD servers. Each has one CPU and 216 HDDs, balancing total cost of ownership (TCO) against data-recovery speed. These HDDs are powered off most of the time and are spun up only when needed. Some workloads that store data on SSDs can tolerate long tail latencies, and they opt for shared, SSD-based remote storage to maximize SSD utilization. Workloads with strict latency requirements, however, still use directly attached local SSDs. Compared with other hyperscale infrastructures, Meta uses local SSDs more often to save costs, but this can increase management complexity. For example, an uneven load distribution can leave some local SSDs underutilized or idle.
In addition, during failure recovery, data can become trapped on the SSD of a failed server, making recovery difficult. To address these problems, Meta implements local-SSD-based stateful services on a common sharding framework, building the solution once and reusing it across many services.
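A common sharding framework helps with data trapped on a failed server's local SSD. When a server dies, the framework re-creates each lost replica on another healthy server, copying from a surviving replica. The following is a minimal sketch of that step. The `rereplicate` function, the least-loaded target policy, and the in-memory shard map are assumptions for illustration, not Meta's sharding framework.

```python
from typing import Dict, List, Set, Tuple


def rereplicate(
    shard_map: Dict[str, Set[str]],
    healthy: Set[str],
    failed: str,
) -> List[Tuple[str, str, str]]:
    """Rebuild replicas lost with `failed`; return (shard, source, target) copy jobs.

    Mutates `shard_map` so it reflects the new replica locations.
    """
    alive = set(healthy) - {failed}
    # Current replica count per live server, used to pick lightly loaded targets.
    load = {s: 0 for s in alive}
    for servers in shard_map.values():
        for s in servers:
            if s in load:
                load[s] += 1
    jobs: List[Tuple[str, str, str]] = []
    for shard in sorted(shard_map):
        servers = shard_map[shard]
        if failed not in servers:
            continue
        servers.discard(failed)
        sources = sorted(servers & alive)
        targets = [s for s in alive if s not in servers]
        if not sources or not targets:
            continue  # no surviving copy or no spare server: escalate to operators
        target = min(targets, key=lambda s: (load[s], s))
        servers.add(target)
        load[target] += 1
        jobs.append((shard, sources[0], target))
    return jobs
```

Because every stateful service plugs into the same framework, this recovery logic is written once instead of once per service.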

Insight 7: Meta uses software solutions to overcome the limitations of low-cost hardware and thereby reduce hardware costs. This approach increases the complexity of the software stack, but the cost savings are large enough to make it well worth it.

In-House Hardware Design

Meta designs its own datacenters and hardware (servers, network switches, video accelerators, and AI chips) to reduce costs and maximize power efficiency. Power is the most constrained resource in a datacenter. A datacenter's power capacity is fixed at construction and is difficult to expand over its 20-to-30-year lifetime, whereas networks and servers can be upgraded as needed. Datacenter power is often oversubscribed. Automated tools coordinate power-capping actions across the entire power-delivery hierarchy so that power usage does not exceed its limits during workload surges. Meta's hardware designs reduce cost and power consumption through hardware/software co-design. One example is tailoring the use of SRAM in AI chips to the workload. Another is eliminating unnecessary components such as compressor-based air conditioning. Meta also develops its own network switches and switch software, which lets it treat switch software like any other service and update it frequently. Most of Meta's hardware designs are open-sourced through the Open Compute Project.
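Power capping under an oversubscribed budget can be sketched as follows. This is a single-level simplification of the coordination across the whole power-delivery hierarchy described above. The `cap_power` function, the priority levels, and the per-server floor are hypothetical. When the sum of server power demands exceeds the budget of one node in the hierarchy (for example, a row), lower-priority servers are throttled toward their floor first.

```python
from typing import Dict, List, Tuple

# (server, demanded watts, priority: higher = more important, floor watts)
Server = Tuple[str, float, int, float]


def cap_power(servers: List[Server], budget: float) -> Dict[str, float]:
    """Return a per-server power cap so the total stays within `budget`."""
    caps = {name: demand for name, demand, _, _ in servers}
    excess = sum(caps.values()) - budget
    # Throttle the least important servers first, never below their floor.
    for name, demand, _, floor in sorted(servers, key=lambda s: s[2]):
        if excess <= 0:
            break
        cut = min(excess, demand - floor)
        if cut > 0:
            caps[name] = demand - cut
            excess -= cut
    if excess > 0:
        raise RuntimeError("budget cannot be met even at floor power")
    return caps
```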

Insight 8: To reduce hardware costs and power consumption, Meta designs its own datacenters, servers, racks, and network switches, and it shares these designs as open source.

Designing Scalable Systems

A recurring theme in hyperscale infrastructure is the design of scalable systems. Distributed systems designed for the Internet, such as BGP, BitTorrent, and distributed hash tables (DHTs), are often praised for their scalability. In Meta's experience, however, datacenter environments are different: they are less resource-constrained and are controlled by a single organization. As a result, centralized controllers not only scale sufficiently but are also simpler and make higher-quality decisions.

Deprecating Decentralized Controllers

This section discusses several examples of the trade-offs between centralized and decentralized controllers. The network switches in Meta's datacenter fabric still run BGP for compatibility, but they are paired with a centralized controller that can reroute traffic in response to network congestion or link failures. Apart from BGP, Meta has replaced most of its decentralized controllers with centralized ones. For example, Meta's private WAN moved from decentralized RSVP-TE to a centralized controller. That controller computes optimal traffic paths and pre-establishes backup paths for common failure scenarios. This switch made network resource usage more efficient and sped up recovery from network failures. In key-value stores, DHTs use multi-hop routing to find the server responsible for a given key, and Cassandra uses consistent hashing. Both operate without a centralized controller. Meta, in contrast, uses a sharding framework in which a centralized controller dynamically reassigns the shards containing keys to servers for better load balancing. For bulk data distribution, Meta replaced BitTorrent with Owl. Owl centrally decides where each peer should fetch data from, which significantly speeds up downloads.
Owl and Meta's private WAN centralize their control planes for better decision making. Their data planes, which perform the actual data transfers and downloads, remain decentralized. For small-scale metadata distribution, Meta initially used a three-level distribution tree implemented in Java. The tree's intermediate nodes were dedicated proxy servers, and its leaf nodes were application subscribers that could join and leave dynamically. When this implementation hit its scalability limits, Meta switched to a peer-to-peer (P2P) distribution tree. In the new tree, the intermediate nodes were also application subscribers that forwarded data to other subscribers. However, some of the millions of application subscribers did not run on dedicated servers, so their performance varied widely. As a result, using them as intermediate nodes to forward traffic proved unreliable and required frequent, time-consuming debugging. After several years of operating it, Meta abandoned the P2P distribution tree and returned to the original architecture with dedicated proxy servers. It also replaced the Java implementation with a higher-performance C++ implementation that scales smoothly to tens of millions of subscribers.
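Consistent hashing fixes a key's owner as a function of the key and the ring. A centralized controller, by contrast, sees every server's load and can move shards explicitly. The following is a minimal sketch of such a controller. The greedy policy and the name `rebalance` are assumptions for illustration; Meta's sharding framework and Owl are considerably more sophisticated. The policy repeatedly moves a shard from the most-loaded to the least-loaded server while that strictly narrows the gap between them.

```python
from typing import Dict, List, Tuple


def rebalance(
    assignment: Dict[str, str],
    shard_load: Dict[str, float],
    servers: List[str],
    max_moves: int = 100,
) -> List[Tuple[str, str, str]]:
    """Greedy central rebalancer over a shard -> server map.

    Mutates `assignment` in place and returns the (shard, src, dst) moves made.
    """
    moves: List[Tuple[str, str, str]] = []
    for _ in range(max_moves):
        load = {s: 0.0 for s in servers}
        for shard, server in assignment.items():
            load[server] += shard_load[shard]
        hot = max(servers, key=lambda s: (load[s], s))
        cold = min(servers, key=lambda s: (load[s], s))
        gap = load[hot] - load[cold]
        # Moving a shard of load x turns the pair's gap into |gap - 2x|,
        # so only shards with 0 < x < gap strictly improve the balance.
        movable = [sh for sh, sv in assignment.items()
                   if sv == hot and 0 < shard_load[sh] < gap]
        if not movable:
            break
        # Pick the shard whose move leaves the smallest residual gap.
        shard = min(movable, key=lambda sh: (abs(gap - 2 * shard_load[sh]), sh))
        assignment[shard] = cold
        moves.append((shard, hot, cold))
    return moves
```

Each accepted move strictly lowers the sum of squared server loads, so the loop cannot cycle. `max_moves` only bounds the work per invocation.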

Insight 9: In datacenter environments, Meta prefers centralized controllers over decentralized ones because they are simpler and make higher-quality decisions. In many cases, a hybrid approach that combines a centralized control plane with a decentralized data plane provides the best solution.

Case Study: Scalable Service Mesh

This section uses ServiceRouter, Meta's service mesh, as a case study of scalable system design. It shows that an architecture combining centralized controllers with a decentralized data plane can scale effectively in a datacenter environment. ServiceRouter routes billions of RPCs per second through millions of L7 (application-layer) routers. Figure 3 shows the service-mesh structure commonly used in industry, in which an L7 sidecar proxy runs alongside each service process and routes its RPC requests. For example, when service A on server 1 sends requests to service B, the proxy on server 1 spreads them across servers 2, 3, and 4 to balance the load. This design is widely used, but it does not scale to hyperscale infrastructure. The reason is that its central controller performs two functions at once: generating global routing metadata and managing every L7 router. Directly managing the routing tables of millions of sidecar proxies is too heavy a burden for a central controller. To achieve scalability, Meta keeps the generation of global routing metadata in the central controllers. It moves the management of individual L7 routers into the routers themselves, so that each router configures and manages itself. Figure 4 shows ServiceRouter's scalable architecture. In the top layer, different controllers independently perform separate functions.
These functions include registering services, updating network-latency measurements, and computing cross-region routing tables for each service. Each controller independently updates a central Routing Information Base (RIB) and does not configure or manage individual L7 routers. The RIB is a Paxos-based database that scales through sharding. Because state lives in the RIB, the controllers can be stateless and scale easily through sharding. For example, multiple controller instances can compute cross-region routing tables for different services concurrently. In the middle layer of Figure 4, the distribution layer uses thousands of RIB replicas to serve read traffic from millions of L7 routers. In the bottom layer, each L7 router configures itself based on data from the RIB, without direct involvement from the control plane. The system supports several types of L7 routers, including load balancers, services with an embedded routing library, and sidecar proxies. As the ServiceRouter case shows, Meta achieves scalability while still using centralized controllers. It does so through techniques such as stateless controllers, controller sharding, and removing the management of individual L7 routers from the central controllers.
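The split between controllers, the RIB, and self-configuring routers can be sketched in a few lines. Everything here is a hypothetical stand-in. A plain dictionary plays the role of the Paxos-based, sharded RIB, and no replication or consensus is modeled. The key layout and the `L7Router` class are invented for illustration. The router's policy is also an assumption: it stays in its local region unless the cross-region table redirects it.

```python
from typing import Any, Dict, List

# Stand-in for the RIB: key -> value (hypothetical key layout).
RIB: Dict[str, Any] = {}


def registration_controller(service: str, region: str, hosts: List[str]) -> None:
    """Controller 1: records where a service's instances run."""
    RIB[f"hosts/{service}/{region}"] = list(hosts)


def cross_region_controller(service: str, table: Dict[str, str]) -> None:
    """Controller 2: publishes a per-service table client_region -> target_region."""
    RIB[f"xregion/{service}"] = dict(table)


class L7Router:
    """A router that configures itself by reading the RIB; no controller pushes to it."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.table: Dict[str, List[str]] = {}

    def refresh(self, service: str) -> None:
        # Read routing data (from a RIB replica in a real deployment) and build local state.
        xregion = RIB.get(f"xregion/{service}", {})
        target = xregion.get(self.region, self.region)
        self.table[service] = list(RIB.get(f"hosts/{service}/{target}", []))

    def route(self, service: str, request_id: int) -> str:
        hosts = self.table[service]
        return hosts[request_id % len(hosts)]  # simple spreading across hosts
```

Neither controller knows the routers exist. Adding routers adds only read load, which the real system absorbs with RIB replicas in the distribution layer.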

Future Directions

Meta's hyperscale infrastructure is highly complex, and this article has offered only a brief, high-level overview centered on key insights from its development. To conclude, this section shares Meta's view on potential future trends in hyperscale infrastructure.

AI

AI workloads have become the largest category of workloads in datacenters. Meta expects that, before the end of this decade, more than half of datacenter power will be consumed by AI workloads. AI has distinctive characteristics: it is highly resource-intensive and requires high-bandwidth networks. It is therefore expected to fundamentally transform every aspect of the infrastructure. Over the past two decades, hyperscale infrastructure succeeded by scaling out across large numbers of low-cost, commodity servers. Future AI clusters, however, are likely to evolve toward the scale-up approach once used in supercomputers. One example is RDMA (Remote Direct Memory Access) over Ethernet, which provides high bandwidth and low latency for large-scale ML training. Meta is co-designing the entire AI stack, including PyTorch, ML models, AI chips, networks, datacenters, servers, storage, and power and cooling systems.

Domain-Specific Hardware

Hardware diversity declined after the 2000s but is expected to increase again. Custom and specialized hardware will proliferate for purposes such as:

- AI training and inference;
- virtualization;
- video encoding;
- encryption;
- compression;
- tiered memory;
- in-network and in-storage processing.

This shift is driven by economies of scale, because hyperscalers can design and deploy specialized hardware in large volumes to reduce costs. However, it will pose new challenges for the software stack, which must manage and exploit highly heterogeneous hardware configurations.

Edge Datacenters

Meta expects substantial growth in metaverse and Internet of Things (IoT) applications. Cloud gaming, for example, moves graphics rendering from user devices to GPU servers in edge datacenters and requires network latency below 25 ms. Growing demand for real-time responsiveness is likely to increase both the number and the size of edge datacenters significantly. The infrastructure control plane must therefore adapt to manage increasingly distributed datacenters effectively. Ideally, Global-DaaC should be extended so that application developers are shielded from the complexity of this distributed infrastructure.

Developer Productivity

Over the past two decades, automation tools have greatly improved the productivity of system administrators, significantly increasing the number of servers each administrator can manage. General software development, by contrast, remains labor-intensive, and its productivity has improved relatively slowly. This trend is expected to change during this decade, with developer productivity rising sharply for two reasons:

- AI-powered code generation and debugging;
- fully integrated serverless programming paradigms for specific vertical domains.

Meta's FrontFaaS is an example of the latter, and more such productivity-maximizing paradigms are expected to emerge across industries. The rapid innovation in hyperscale infrastructure over the past 20 years is expected to continue, driven especially by advances in AI. Meta hopes that, by sharing insights across hyperscalers, the community as a whole can further accelerate its pace of progress.

Acknowledgments

This article summarizes work done by thousands of Meta infrastructure engineers over more than a decade. The author contributed directly to some of the systems described here but was not involved in many others.

References

  1. Abhashkumar, A. et al. Running BGP in data centers at scale. In Proceedings of the 18th USENIX Symp. on Networked Systems Design and Implementation. USENIX, (2021), 65โ€“81.
  2. Andreyev, A. Introducing data center fabric, the next-generation Facebook data center network. Engineering at Meta, (2014); https://engineering. fb.com/production-engineering/introducing-data- center-fabric-the-next-generation-facebook-data- center-network/
  3. Balakrishnan, M. et al. Virtual consensus in Delos. In Proceedings of the 14th USENIX Symp. on Operating Systems Design and Implementation. USENIX, (2020), 617โ€“632.
  4. Barroso, L.A., Hรถlzle, U., and Ranganathan, P. The Datacenter as a Computer: Designing Warehouse- Scale Machines. Springer Nature, (2019).
  5. Bronson, N. et al. TAO: Facebookโ€™s distributed data store for the social graph. In Proceedings of the 2013 USENIX Annual Technical Conf. USENIX, (2013), 49โ€“60.
  6. Campbell, L. and Tang, C. How Meta built the infrastructure for Threads. Engineering at Meta, (2023); https://engineering.fb.com/2023/12/19/core- infra/how-meta-built-the-infrastructure-for-threads/
  7. Chen, G.J. et al. Realtime data processing at Facebook. In Proceedings of the 2016 Intern. Conf. on Management of Data. ACM, (6), 1087โ€“1098.
  8. Choi, S. et al. FBOSS: Building switch software at scale. In Proceedings of the 2018 Conf. of the ACM Special Interest Group on Data Communication. ACM, (2018), 342โ€“356.
  9. Chou, D. Tajji: Managing global user traffic for large- scale Internet services at the edge. In Proceedings of the 27th Symp. on Operating Systems Principles. ACM, (2019), 430โ€“446.
  10. Choudhury, A. et al. MAST: Global Scheduling of ML Training across Geo-Distributed Datacenters at Hyper- scale. In Proceedings of the 18th USENIX Symp. on Operating Systems Design and Implementation. USENIX, (2024).
  11. Chow, M. et al. ServiceLab: Preventing tiny performance regressions at hyperscale through pre- production testing. In Proceedings of the 28th Symp. on Operating Systems Principles. ACM, (2024).
  12. Denis, M. et al. EBB: Reliable and evolvable express backbone network in Meta. In Proceedings of the ACM SIGCOMM 2023 Conf. ACM, (2023), 346โ€“359.
  13. Eriksen, M. et al. Global capacity management with Flux. In Proceedings of the 17th USENIX Symp. on Operating Systems Design and Implementation. USENIX, (2023).
  14. Ferreira, J. et al. Fabric Aggregator: A flexible solution to our traffic demand. Engineering at Meta, (2014); https://engineering.fb.com/data-center-engineering/fabric-aggregator-a-flexible-solution-to-our-traffic-demand/
  15. Firoozshahian, A. et al. MTIA: First generation silicon targeting Meta's recommendation systems. In Proceedings of the 50th Annual Intern. Symp. on Computer Architecture. ACM, (2023), 1–13.
  16. Flinn, J. et al. Owl: Scale and flexibility in distribution of hot content. In Proceedings of the 16th USENIX Symp. on Operating Systems Design and Implementation. USENIX, (2022), 1–15; https://www.usenix.org/conference/osdi22/presentation/flinn
  17. Frachtenberg, E. et al. Thermal design in the open compute datacenter. In Proceedings of the 13th InterSociety Conf. on Thermal and Thermomechanical Phenomena in Electronic Systems. IEEE, (2012), 530–538.
  18. Gangidi, A. et al. RDMA over Ethernet for distributed training at Meta scale. In Proceedings of the ACM SIGCOMM 2024 Conf. ACM, (2024), 57–70.
  19. Grubic, B. et al. Conveyor: One-tool-fits-all continuous software deployment at Meta. In Proceedings of the 17th USENIX Symp. on Operating Systems Design and Implementation. USENIX, (2023).
  20. Guo, M. et al. MobileConfig: Holistic configuration management for mobile apps. In Proceedings of the 21st USENIX Symp. on Networked Systems Design and Implementation. USENIX, (2024).
  21. Heo, T. et al. IOCost: Block IO control for containers in datacenters. In Proceedings of the 27th ACM Intern. Conf. on Architectural Support for Programming Languages and Operating Systems. ACM, (2022), 595–608.
  22. Kumar, N. et al. Optimizing resource allocation in hyperscale datacenters: Scalability, usability, and experiences. In Proceedings of the 18th USENIX Symp. on Operating Systems Design and Implementation. USENIX, (2024).
  23. Lee, S. et al. Shard Manager: A generic shard management framework for geodistributed applications. In Proceedings of the 28th Symp. on Operating Systems Principles. ACM, (2021).
  24. Masti, S. How we built a general purpose key value store for Facebook with ZippyDB. (2021); https://engineering.fb.com/2021/08/06/core-data/zippydb/
  25. Meza, J.J. et al. Defcon: Preventing overload with graceful feature degradation. In Proceedings of the 17th USENIX Symp. on Operating Systems Design and Implementation. USENIX, (2023), 607–622.
  26. Newell, A. et al. RAS: Continuously optimized region-wide datacenter resource allocation. In Proceedings of the 28th Symp. on Operating Systems Principles. ACM, (2021).
  27. Nishtala, R. et al. Scaling Memcache at Facebook. In Proceedings of the 10th USENIX Symp. on Networked Systems Design and Implementation. USENIX, (2013), 385–398.
  28. Open Compute Project. (2024); https://www.opencompute.org/
  29. Pan, S. Facebook's Tectonic filesystem: Efficiency from exascale. In Proceedings of the 19th USENIX Conf. on File and Storage Technologies. USENIX, (2021), 217–231.
  30. Sahraei, A. et al. XFaaS: Hyperscale and low cost serverless functions at Meta. In Proceedings of the 29th Symp. on Operating Systems Principles. ACM, (2023), 231–246.
  31. Saokar, H. et al. ServiceRouter: A scalable and minimal cost service mesh. In Proceedings of the 17th USENIX Symp. on Operating Systems Design and Implementation. USENIX, (2023).
  32. Schlinker, B. et al. Engineering egress with Edge Fabric: Steering oceans of content to the world. In Proceedings of the Conf. of the ACM Special Interest Group on Data Communication. ACM, (2017), 418–431.
  33. Tang, C. et al. Holistic configuration management at Facebook. In Proceedings of the 25th Symp. on Operating Systems Principles. ACM, (2015), 328–343.
  34. Tang, C. et al. Twine: A unified cluster management system for shared infrastructure. In Proceedings of the 14th USENIX Symp. on Operating Systems Design and Implementation. USENIX, (2020), 787–803.
  35. Veeraraghavan, K. et al. Kraken: Leveraging live traffic tests to identify and resolve resource utilization bottlenecks in large scale web services. In Proceedings of the 12th USENIX Symp. on Operating Systems Design and Implementation. USENIX, (2016), 635–651.
  36. Veeraraghavan, K. et al. Maelstrom: Mitigating datacenter-level disasters by draining interdependent traffic safely and efficiently. In Proceedings of the 13th USENIX Symp. on Operating Systems Design and Implementation. USENIX, (2018), 373–389.
  37. Weiner, J. et al. TMO: Transparent memory offloading in datacenters. In Proceedings of the 27th ACM Intern. Conf. on Architectural Support for Programming Languages and Operating Systems. ACM, (2022), 609–621.
  38. Wu, Q. et al. Dynamo: Facebook's data center-wide power management system. ACM SIGARCH Computer Architecture News 44, 3 (2016), 469–480.
  39. Yoon, D.Y. et al. FBDetect: Catching tiny performance regressions at hyperscale through in-production monitoring. In Proceedings of the 30th Symp. on Operating Systems Principles. ACM, (2024).
  40. Yu, K. and Kumar, R. Viewing the world as a computer: Global capacity management. Engineering at Meta, (2022); https://engineering.fb.com/2022/09/06/data-center-engineering/viewing-the-world-as-a-computer-global-capacity-management/