huawei-csl/KVarN — GitHub trending stats & insights | Trendshift
Topic: AI infrastructure
KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.
Python
141 stars
4 forks
2 contributors
Apache License 2.0
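To put the "3-5x more context" claim in the description into concrete terms, here is a minimal back-of-the-envelope sketch of how KV-cache size scales with the number of bits stored per value. The model shape, the 8 GiB budget, and the 4-bit width are all hypothetical values chosen for illustration; they are not KVarN's actual format, and the sketch ignores any per-group scale or metadata overhead that a real quantized cache would carry.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bits_per_value):
    """Bytes of KV cache that a single token occupies across all layers."""
    # The factor 2 accounts for storing both a key and a value vector.
    values_per_token = 2 * num_layers * num_kv_heads * head_dim
    return values_per_token * bits_per_value / 8


def max_tokens(budget_bytes, **model):
    """How many tokens fit in a fixed KV-cache memory budget."""
    return int(budget_bytes // kv_cache_bytes_per_token(**model))


# Hypothetical model shape and memory budget (assumptions, not from the source).
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128)
budget = 8 * 1024**3  # 8 GiB reserved for the KV cache

fp16_tokens = max_tokens(budget, bits_per_value=16, **shape)
q4_tokens = max_tokens(budget, bits_per_value=4, **shape)  # assumed 4-bit cache

print(fp16_tokens, q4_tokens, q4_tokens / fp16_tokens)  # 65536 262144 4.0
```

Under these assumptions the same memory holds four times as many tokens, which sits inside the quoted 3-5x range; real gains depend on the actual bit-width and metadata overhead.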
Social mentions
Recent discussions about this repository across the web
I think this might be the first KV quantisation that will work well...
@snapolino · x.com
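Huawei has open-sourced KVarN on GitHub, directly compressing the largest memory overhead in large-model inference. Same model, half the memory, half the bill, and half the juice to share with you.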
@longlongsongs · x.com
The weird part about KV cache quantization failing on reasoning isn't angle distortion, it's just magnitude error. Huawei's KVarN fixes this with a Hadamard rotation and dual-axis variance…
@sakurayukiai · x.com
🤖 KVarN ⭐ 115 stars Get 3-5x more context for your LLM agents with throughput that beats FP16. No calibration, lossless accuracy, and enabled with a single flag. 🔗 🌐 #AI #MachineLearning
@Marco_Ramilli · x.com
KVarN ⚡️ Built for agentic and long-context workloads. 💡 KVarN delivers 3-5x more KV-cache capacity and up to ~1.3x the throughput of FP16, so you fit far longer contexts and serve more concurrent…
@LeopolisDream · x.com
this actually has the potential to be quite impactful for local models (though currently only available in vLLM). KV growth can be a major limit factor for long ctx use (e.g., ~8GB of memory at 8/8…
@JakeKAllDay · x.com
Huawei released KVarN, a KV-cache compression method for LLMs It delivers 3-5x more context length, beats FP16 throughput, and keeps FP16 accuracy. One flag in vLLM. Zero calibration.
@HuggingPapers · x.com
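One of the mentions above attributes part of KVarN's approach to a Hadamard rotation. The sketch below is not KVarN's implementation; it is a generic, pure-Python orthonormal Walsh-Hadamard transform, with a made-up input vector, that illustrates two standard properties of such a rotation: it preserves the vector's norm and is exactly invertible, and it spreads a single outlier across all coordinates so the peak magnitude drops.

```python
import math


def hadamard_rotate(x):
    """Orthonormal fast Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    while h < n:
        # Butterfly step: combine pairs of entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1 / math.sqrt(n)  # normalization makes the transform orthonormal
    return [v * scale for v in out]


def norm(x):
    """Euclidean norm of a list of numbers."""
    return math.sqrt(sum(v * v for v in x))


# Hypothetical vector with one large outlier (illustrative data only).
x = [0.1] * 7 + [8.0]
rotated = hadamard_rotate(x)
restored = hadamard_rotate(rotated)  # the normalized transform is its own inverse

print("norm before/after:", norm(x), norm(rotated))
print("peak before/after:", max(abs(v) for v in x), max(abs(v) for v in rotated))
```

A lower peak magnitude means a fixed number of quantization levels covers a narrower range. Whether that reduces end-to-end error depends on the rest of the scheme, which this sketch does not model.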
No trending activity
This repository has not yet been featured on GitHub Trending
Repository activities
Daily and monthly repository activity across stars, forks, merged PRs, issues, and closed issues