TheTom/llama-cpp-turboquant — GitHub trending stats & insights | Trendshift

© 2026 Trendshift, created by Julian Li with ❤️

TheTom/llama-cpp-turboquant

LLM inference in C/C++

Language: C++
Stars: 1.8k
Forks: 337
License: MIT License

Social mentions

Recent discussions about this repository across the web

A 5090 can now run Qwen 3.6 with 450K context. A llama.cpp fork adds Google's TurboQuant for KV cache and weights:
- 4.6x KV compression at roughly 1% PPL loss
- Multimodal support
- Cross-backend…

@so_sthbryan · x.com

Looks like the TurboQuant fork of llama-cpp has merged the MTP PRs. Not sure turbo3 has done anything for memory usage tho. But has slowed it down a couple of %. Also needs TURBO_AUTO_ASYMMETRIC=0 to…

@conoro · x.com

Looking to sync llamacpp TheTom fork with upstream (especially with the MTP work), regression check welcome for non-metal targets:

@no_stp_on_snek · x.com

No trending activity

This repository has not yet been featured on GitHub Trending

Repository activities

The repository's daily and monthly activity across stars, forks, merged PRs, issues, and closed issues
