TheTom/llama-cpp-turboquant — GitHub trending stats & insights | Trendshift
TheTom/llama-cpp-turboquant
Topic: Local LLM
LLM inference in C/C++
Language: C++
Stars: 1.8k
Forks: 337
License: MIT License
Social mentions
Recent discussions about this repository across the web
A 5090 can now run Qwen 3.6 with 450K context. A llama.cpp fork adds Google's TurboQuant for KV cache and weights: - 4.6x KV compression at roughly 1% PPL loss - Multimodal support - Cross-backend…
@so_sthbryan · x.com
Looks like the TurboQuant fork of llama-cpp has merged the MTP PRs. Not sure turbo3 has done anything for memory usage tho. But has slowed it down a couple of %. Also needs TURBO_AUTO_ASYMMETRIC=0 to…
@conoro · x.com
Looking to sync llamacpp TheTom fork with upstream (especially with the MTP work), regression check welcome for non-metal targets:
@no_stp_on_snek · x.com
No trending activity
This repository has not yet been featured on GitHub Trending
Repository activities
Daily and monthly activity for this repository across stars, forks, merged PRs, issues, and closed issues