Rank #133 · on radar since 2026-07-03
Efficient-LLM-Inference-Serving-Systems
Why is LLM inference slow — and how do you make it fast? A hands-on, first-principles course: roofline → KV cache → quantization → parallelism → vLLM/SGLang, with GPU labs on open models.
Tags: cuda, inference-optimization, sglang, flash-attention (+16 more)
Momentum: 87.5 (24h change: –, 7d change: –)
Why it's ranked
Every score decomposes into published factors: the same math for every tool, paid or not.
| Factor | Value |
|---|---|
| Velocity (weighted, cohort-normalized) | 0.908 |
| Signal decay | 0.995 |
| Corroboration | 1.000 |
| Quality gate | 1.000 |
Raw signals (30 days)
GitHub · forks: 1 (latest) · 2 snapshots
GitHub · stars: 13 (latest) · 2 snapshots