This August 26, 2025 paper, a collaboration between Stanford, NVIDIA, Shanghai Jiao Tong University, the University of Michigan, the University of Colorado Boulder, and Carnegie Mellon University, introduces **Strata**, a hierarchical context caching framework designed to improve the performance of serving Large Language Models (LLMs) with long context windows. The core problem Strata addresses is that, while caching key-value (KV) states is essential for efficiency, transferring large, fragmented cached contexts from slower memory tiers (such as CPU memory) back to the GPU creates **severe I/O bottlenecks and performance stalls**. The paper also explains why paged attention, although designed to reduce memory fragmentation, creates data fragmentation once large contexts force the KV cache to be offloaded. Strata overcomes these issues through two main innovations: **GPU-assisted I/O**, which mitigates data fragmentation and achieves high bandwidth utilization, and **cache-aware request scheduling**, which forms balanced batches and overlaps unavoidable I/O stalls with complementary tasks. The evaluation shows that Strata significantly reduces **Time-To-First-Token (TTFT)** and increases throughput compared to state-of-the-art serving systems such as vLLM + LMCache and TensorRT-LLM on long-context benchmarks.
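To make the fragmentation point concrete, here is a minimal back-of-the-envelope sketch in Python. It is not Strata's method or any figure from the paper: it assumes a simple cost model in which every copy pays a fixed overhead plus a bandwidth-limited transfer time, and the function name `effective_bandwidth`, the link speed, the per-copy overhead, and the chunk sizes are all illustrative assumptions.

```python
def effective_bandwidth(total_bytes, chunk_bytes, peak_bw, per_copy_overhead_s):
    """Effective bytes/s when total_bytes is moved in chunk_bytes-sized copies.

    Assumed cost model (hypothetical, for illustration only):
        time = number_of_copies * fixed_overhead + total_bytes / peak_bandwidth
    """
    n_copies = -(-total_bytes // chunk_bytes)  # ceiling division
    transfer_time = n_copies * per_copy_overhead_s + total_bytes / peak_bw
    return total_bytes / transfer_time


# Illustrative, assumed numbers (not taken from the paper).
TOTAL = 1 << 30      # 1 GiB of cached KV state to reload
PEAK_BW = 25e9       # assumed 25 GB/s host-to-device link
OVERHEAD = 10e-6     # assumed 10 microseconds of fixed cost per copy

# Many small page-sized copies versus a few large contiguous ones.
fragmented = effective_bandwidth(TOTAL, 64 * 1024, PEAK_BW, OVERHEAD)
contiguous = effective_bandwidth(TOTAL, 64 * 1024 * 1024, PEAK_BW, OVERHEAD)

print(f"fragmented: {fragmented / 1e9:.1f} GB/s")
print(f"contiguous: {contiguous / 1e9:.1f} GB/s")
```

Under these assumptions the fixed per-copy cost dominates when the cache is moved page by page, which is the kind of bandwidth loss the summary attributes to fragmented transfers.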
Source:
https://arxiv.org/html/2508.18572v1