Presentation: Scaling Large Language Model Serving Infrastructure at Meta

Ye Qi — Thu, 29 May 2025 13:11:00 GMT

Ye (Charlotte) Qi gives an overview of the challenges of LLM serving infrastructure: fitting and speed (Model Runners, KV cache, and distributed inference), production complexities (latency optimization and continuous evaluation), and effective scaling strategies (heterogeneous deployment and autoscaling). The talk covers key concepts for robust LLM deployment.

By Ye Qi

InfoQ - Facebook - Presentations
