alphaXiv
alphaxiv.org
did:plc:xqp2wfy2sz5m7n2mu322izo2
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
LServe uses hybrid sparse attention to reduce computation and memory. It unifies static and dynamic sparsity patterns and employs a hierarchical KV page selection policy to optimize both prefilling and decoding stages.
2025-02-22T21:08:18.640Z
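
The post names a hierarchical KV page selection policy without spelling it out. The sketch below is one plausible reading, not LServe's actual implementation: keys are grouped into small logical pages, each summarized by per-dimension min/max values; a query gets an upper-bound attention score per logical page; each larger physical page inherits the best score among its logical pages; and only the top-scoring physical pages are kept for decoding. All names here (`page_bounds`, `upper_bound_score`, `select_pages`) and the parameters `logical_size`, `logical_per_physical`, and `top_k` are hypothetical, chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def page_bounds(keys: Sequence[Vector], page_size: int) -> List[Tuple[List[float], List[float]]]:
    """Summarize each page of keys by its per-dimension (min, max) vectors."""
    bounds = []
    for start in range(0, len(keys), page_size):
        page = keys[start:start + page_size]
        mins = [min(col) for col in zip(*page)]
        maxs = [max(col) for col in zip(*page)]
        bounds.append((mins, maxs))
    return bounds


def upper_bound_score(query: Vector, mins: Vector, maxs: Vector) -> float:
    """Upper bound on the dot product of `query` with any key inside the page."""
    return sum(max(q * lo, q * hi) for q, lo, hi in zip(query, mins, maxs))


def select_pages(
    query: Vector,
    keys: Sequence[Vector],
    logical_size: int,          # assumed: tokens per logical page
    logical_per_physical: int,  # assumed: logical pages per physical page
    top_k: int,                 # assumed: physical pages kept per query
) -> List[int]:
    """Return indices of the top_k physical pages, in ascending order."""
    logical = page_bounds(keys, logical_size)
    scores = [upper_bound_score(query, lo, hi) for lo, hi in logical]
    # A physical page is as important as its most important logical page.
    physical_scores = [
        max(scores[i:i + logical_per_physical])
        for i in range(0, len(scores), logical_per_physical)
    ]
    ranked = sorted(range(len(physical_scores)),
                    key=lambda i: physical_scores[i], reverse=True)
    return sorted(ranked[:top_k])


if __name__ == "__main__":
    keys = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0],
            [5.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.1, 0.0]]
    print(select_pages([1.0, 0.0], keys, logical_size=2,
                       logical_per_physical=2, top_k=1))
```

Scoring at the finer logical granularity while fetching at the coarser physical granularity is the design point this sketch tries to capture: small pages give tighter min/max bounds, while large pages keep memory access efficient.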