vLLM

Inference engine • Open Source

PRIMARY LANGUAGE: Python • LICENSE: Apache License 2.0

GitHub Stars: 85,344 (+320 stars last 7d)

Community rating: ★ 4.8 (average based on 512 reviews)

Hugging Face Synced: Yes (native Spaces support)

A high-throughput, memory-efficient inference and serving engine for LLMs. It features PagedAttention, which manages the KV cache in fixed-size blocks to cut the memory wasted by fragmentation and over-reservation.
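
As a rough illustration of where KV cache waste comes from, the toy sketch below compares two allocation strategies: reserving a full maximum-length region per sequence versus handing out fixed-size blocks on demand. It is not vLLM's implementation. The block size, the maximum length, the sequence lengths, and all function names are assumptions chosen for illustration.

```python
import math

BLOCK_SIZE = 16   # tokens per KV block (hypothetical value)
MAX_LEN = 2048    # per-sequence reservation in the contiguous scheme (hypothetical)


def contiguous_slots(seq_lens, max_len=MAX_LEN):
    """Token slots reserved when every sequence preallocates max_len slots."""
    return len(seq_lens) * max_len


def paged_slots(seq_lens, block_size=BLOCK_SIZE):
    """Token slots reserved when each sequence takes only the blocks it needs."""
    return sum(math.ceil(n / block_size) * block_size for n in seq_lens)


def waste_fraction(reserved, used):
    """Share of reserved slots that hold no tokens."""
    return (reserved - used) / reserved


# Hypothetical batch: current token count of each running sequence.
seq_lens = [37, 120, 512, 9]
used = sum(seq_lens)

contig = contiguous_slots(seq_lens)
paged = paged_slots(seq_lens)

print(f"contiguous: {contig} slots reserved, {waste_fraction(contig, used):.1%} unused")
print(f"paged:      {paged} slots reserved, {waste_fraction(paged, used):.1%} unused")
```

In the block-based scheme, unused slots are limited to the partially filled last block of each sequence, so the waste per sequence is bounded by the block size rather than by the maximum sequence length.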

SGLang • TGI • TensorRT-LLM
