<?xml version="1.0" encoding="UTF-8"?>
<!-- RSS 2.0 feed: InfoQ presentations tagged "Retrieval-Augmented Generation".
     Uses the Dublin Core namespace (dc:) for creator/date/identifier metadata
     alongside the standard RSS item elements. -->
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>InfoQ - Retrieval-Augmented Generation - Presentations</title>
    <link>https://www.infoq.com</link>
    <description>InfoQ Retrieval-Augmented Generation Presentations feed</description>
    <!-- One <item> per presentation. Item links/guids carry UTM tracking
         query parameters (&amp;-escaped as required in XML text content). -->
    <item>
      <title>Presentation: Deploy MultiModal RAG Systems with vLLM</title>
      <link>https://www.infoq.com/presentations/rag-vllm/?utm_campaign=infoq_content&amp;utm_source=infoq&amp;utm_medium=feed&amp;utm_term=Retrieval-Augmented+Generation-presentations</link>
      <!-- Description holds escaped HTML (thumbnail <img> plus summary <p>),
           a common RSS convention for rich item bodies. -->
      <description>&lt;img src="https://res.infoq.com/presentations/rag-vllm/en/mediumimage/stephen-batifol-medium-1758016495689.jpg"/&gt;&lt;p&gt;Stephen Batifol discusses building and optimizing self-hosted, multimodal RAG systems. He breaks down vector search, nearest neighbor indexes (FLAT, IVF, HNSW), and the critical role of choosing the right embedding model. He then explains vLLM inference optimization (paged attention, quantization) and uses Mistral's Pixtral to detail multimodal large language model architecture.&lt;/p&gt; &lt;i&gt;By Stephen Batifol&lt;/i&gt;</description>
      <!-- Multiple <category> elements: event, format, and topic tags. -->
      <category>QCon London 2025</category>
      <category>Transcripts</category>
      <category>Large language models</category>
      <category>Retrieval-Augmented Generation</category>
      <category>AI, ML &amp; Data Engineering</category>
      <category>presentation</category>
      <!-- pubDate uses RFC 822 format (RSS requirement); dc:date below is the
           same instant in ISO 8601 / W3CDTF form. -->
      <pubDate>Fri, 10 Oct 2025 14:12:00 GMT</pubDate>
      <!-- guid has no isPermaLink attribute, so it defaults to a permalink URL. -->
      <guid>https://www.infoq.com/presentations/rag-vllm/?utm_campaign=infoq_content&amp;utm_source=infoq&amp;utm_medium=feed&amp;utm_term=Retrieval-Augmented+Generation-presentations</guid>
      <dc:creator>Stephen Batifol</dc:creator>
      <dc:date>2025-10-10T14:12:00Z</dc:date>
      <!-- Site-relative content identifier (language-suffixed path). -->
      <dc:identifier>/presentations/rag-vllm/en</dc:identifier>
    </item>
  </channel>
</rss>
