InfoQ - Scalability - Presentations

Presentation: Realtime and Batch Processing of GPU Workloads

Joseph Stein — Tue, 26 May 2026 09:08:00 GMT

Joseph Stein discusses engineering an enterprise AI-as-a-Service platform within a private cloud data center. He explains how to make fuller use of underutilized GPU pools via multi-namespace scheduling, use Valkey and Lua for atomic priority queuing and backpressure management, mitigate OWASP Top 10 LLM risks via central proxy gateways, and scale batch pipelines using a custom S3-to-Kafka proxy.

Presentation: The AI Gateway: Scaling Centralized Inference across Decentralized Teams

Meryem Arik — Wed, 20 May 2026 12:40:00 GMT

Meryem Arik discusses why modern engineering teams face "inference chaos" and how AI model gateways provide a critical control layer. She explains the balance between empowering decentralized teams to choose the best models and maintaining centralized oversight for security, RBAC, and cost control. The talk also explores open-source solutions such as LiteLLM and Doubleword for streamlining AI infrastructure.