ggml-org/llama.cpp — GitHub trending stats & insights | Trendshift
ggml-org/llama.cpp
Tags: #Local LLM, #Self-hosted
LLM inference in C/C++
Data last synced with GitHub 2 days ago
Language: C++
Stars: 109.3k
Forks: 18k
1,661 contributors
last commit 2 days ago
last user commit 2 days ago
MIT License
created about 3 years ago
Social mentions
Recent discussions about this repository across the web
More Qwen3.6-27B MTP success but on dual Mi50s
r/LocalLLaMA · 2 days ago
Running Qwen3.5 / Qwen3.6 with NextN MTP (Multi-Token Prediction) speculative decode in llama.cpp — single RTX 3090 Ti GPU guide
r/LocalLLaMA · 5 days ago
why llama.cpp can’t combine speculative decode methods?
r/LocalLLaMA · 5 days ago
Get faster qwen 3.6 27b
r/LocalLLaMA · 5 days ago
Qwen3.6-27B with MTP grafted on Unsloth UD XL: 2.5x throughput via unmerged llama.cpp PR
r/LocalLLaMA · 6 days ago
2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints
r/LocalLLaMA · 6 days ago
MTP on strix halo with llama.cpp (PR #22673)
r/LocalLLaMA · 6 days ago
My setup for running Qwen3.6-35B-A3B-UD-Q4_K_M on single RX7900XT (20GB VRAM)
r/unsloth · 6 days ago
I Ralph-looped Opus overnight. It reduced my local model switching with cold backfilling context of 135k+ on llama.cpp from ~165s -> 5s! TL;DR - USE SLOTS!
r/LocalLLM · 7 days ago
Qwen3.6-27B DFlash on a 24GB RTX 5090 Laptop (sm_120) — 80 t/s avg via spiritbuun's buun-llama-cpp + Q8_0 GGUF drafter
r/Qwen_AI · 7 days ago
Repository activities
The repository's daily and monthly activity across stars, forks, merged PRs, issues, and closed issues
GitHub trending history
Shows when the repository has appeared on GitHub Trending across any language
Rankings shown: all languages, C++, C