<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>InfoQ - Optimization</title>
    <link>https://www.infoq.com</link>
    <description>InfoQ Optimization feed</description>
    <item>
      <title>Netflix Serves 84% of Query Results from Cache with Interval-Aware Caching in Apache Druid</title>
      <link>https://www.infoq.com/news/2026/05/netflix-druid-interval-cache/?utm_campaign=infoq_content&amp;utm_source=infoq&amp;utm_medium=feed&amp;utm_term=Optimization</link>
      <description>&lt;img src="https://res.infoq.com/news/2026/05/netflix-druid-interval-cache/en/headerimage/generatedHeaderImage-1777092326529.jpg"/&gt;&lt;p&gt;Netflix improves Apache Druid performance with interval-aware caching, serving 84% of analytics results from cache and reducing query load by 33%. The system decomposes rolling window queries into reusable time segments, enabling partial cache reuse and recomputation only for recent data. At scale, it reduces scan volume, improves P90 latency, and optimizes real-time analytics workloads.&lt;/p&gt; &lt;i&gt;By Leela Kumili&lt;/i&gt;</description>
      <category>Data Analytics</category>
      <category>Observability</category>
      <category>Caching</category>
      <category>Distributed Systems</category>
      <category>Optimization</category>
      <category>Apache</category>
      <category>Time Series Data</category>
      <category>Development</category>
      <category>Architecture &amp; Design</category>
      <category>news</category>
      <pubDate>Mon, 11 May 2026 14:36:00 GMT</pubDate>
      <guid>https://www.infoq.com/news/2026/05/netflix-druid-interval-cache/?utm_campaign=infoq_content&amp;utm_source=infoq&amp;utm_medium=feed&amp;utm_term=Optimization</guid>
      <dc:creator>Leela Kumili</dc:creator>
      <dc:date>2026-05-11T14:36:00Z</dc:date>
      <dc:identifier>/news/2026/05/netflix-druid-interval-cache/en</dc:identifier>
    </item>
    <item>
      <title>OpenAI Introduces Websocket-Based Execution Mode to Reduce Latency in Agentic Workflows</title>
      <link>https://www.infoq.com/news/2026/05/openai-websocket-responses-api/?utm_campaign=infoq_content&amp;utm_source=infoq&amp;utm_medium=feed&amp;utm_term=Optimization</link>
      <description>&lt;img src="https://res.infoq.com/news/2026/05/openai-websocket-responses-api/en/headerimage/generatedHeaderImage-1777845282531.jpg"/&gt;&lt;p&gt;OpenAI introduces a WebSocket-based execution mode for its Responses API to improve agentic workflow performance in coding agents and real-time AI systems. The update reduces latency by up to 40 percent by replacing HTTP request-response cycles with persistent connections, improving streaming, tool execution, and multi-step orchestration in production-scale AI systems.&lt;/p&gt; &lt;i&gt;By Leela Kumili&lt;/i&gt;</description>
      <category>OpenAI</category>
      <category>Realtime API</category>
      <category>API</category>
      <category>Workflow Foundation</category>
      <category>AI Architecture</category>
      <category>Optimization</category>
      <category>Orchestration</category>
      <category>WebSocket</category>
      <category>Large language models</category>
      <category>Low Latency</category>
      <category>Distributed Systems</category>
      <category>Agents</category>
      <category>Artificial Intelligence</category>
      <category>AI Assisted Coding</category>
      <category>SDK</category>
      <category>AI, ML &amp; Data Engineering</category>
      <category>Development</category>
      <category>Architecture &amp; Design</category>
      <category>news</category>
      <pubDate>Thu, 07 May 2026 14:48:00 GMT</pubDate>
      <guid>https://www.infoq.com/news/2026/05/openai-websocket-responses-api/?utm_campaign=infoq_content&amp;utm_source=infoq&amp;utm_medium=feed&amp;utm_term=Optimization</guid>
      <dc:creator>Leela Kumili</dc:creator>
      <dc:date>2026-05-07T14:48:00Z</dc:date>
      <dc:identifier>/news/2026/05/openai-websocket-responses-api/en</dc:identifier>
    </item>
    <item>
      <title>Cloudflare Builds High-Performance Infrastructure for Running LLMs</title>
      <link>https://www.infoq.com/news/2026/05/cloudflare-llm-infrastructure/?utm_campaign=infoq_content&amp;utm_source=infoq&amp;utm_medium=feed&amp;utm_term=Optimization</link>
      <description>&lt;img src="https://res.infoq.com/news/2026/05/cloudflare-llm-infrastructure/en/headerimage/generatedHeaderImage-1776661318905.jpg"/&gt;&lt;p&gt;Cloudflare has recently announced new infrastructure designed to run large AI language models across its global network. As these models rely on costly hardware and must handle large volumes of incoming and outgoing text, Cloudflare separates the model's input processing and output generation onto different optimized systems.&lt;/p&gt; &lt;i&gt;By Renato Losio&lt;/i&gt;</description>
      <category>Large language models</category>
      <category>GPU</category>
      <category>Big Data Infrastructure</category>
      <category>AI Architecture</category>
      <category>Optimization</category>
      <category>Cloudflare</category>
      <category>AI, ML &amp; Data Engineering</category>
      <category>Development</category>
      <category>news</category>
      <pubDate>Sun, 03 May 2026 10:58:00 GMT</pubDate>
      <guid>https://www.infoq.com/news/2026/05/cloudflare-llm-infrastructure/?utm_campaign=infoq_content&amp;utm_source=infoq&amp;utm_medium=feed&amp;utm_term=Optimization</guid>
      <dc:creator>Renato Losio</dc:creator>
      <dc:date>2026-05-03T10:58:00Z</dc:date>
      <dc:identifier>/news/2026/05/cloudflare-llm-infrastructure/en</dc:identifier>
    </item>
  </channel>
</rss>
