noonghunna/club-3090 — GitHub trending stats & insights


noonghunna/club-3090

Community recipes for serving LLMs on RTX 3090/CUDA GPUs. Multi-engine (vLLM, llama.cpp, SGLang) and model-agnostic. Currently shipping Qwen3.6-27B, Qwen3.6 35B, Gemma 4 26B, and Gemma 4 31B configs for 1× and 2× cards.


Python

1.5k

14 contributors

Apache License 2.0

Social mentions

Recent discussions about this repository across the web

Ran Qwen3.6-27B with vision on a single RTX 3090 24GB using ik_llama.cpp — here's the full breakdown.

@nawariokra · x.com

I was prepared for Fable fumble, and you?

@malikwas1f · x.com

Up to 1100 tps on RTX 3090x2 for Diffusion Gemma 4 26B. Unleash this mini monster on your GPUs now! If you are running NVIDIA GPUs locally, come grab the recipe at club-3090. P.S. a ⭐️ on GitHub is…

@malikwas1f · x.com

Well, it is basically the config I tried. I was on it, but I didn't get the speed I expected.

@DragonGroky · x.com

Gemma4 12b is now available to everyone on club-3090. Go ahead and give it a shot if you have an RTX 3090/4090/5090. P.S. MTP is currently blocked on llama/beellama.

@malikwas1f · x.com

Gemma 4 12b, latest situation: 121+ tps with MTP on vLLM + 2 x 3090s, 2.9 * concurrency. @googlegemma @vllm_project Can you fix #39914, please? Clubbers! Grab the recipes from

@malikwas1f · x.com

Gemma 4 12b, latest situation on 2x3090s. @googlegemma @vllm_project Can you fix #39914, please? Clubbers! Grab the recipes from

@malikwas1f · x.com

5/5 My current takeaway: for fast single-card Gemma-4 on 3090/4090-class hardware, the real path forward seems to be engine-specific optimizations, smarter KV handling, MTP/spec decode, Gemma-tuned…

@malikwas1f · x.com

Dual RTX 3090s took me from 40-50 tok/s to 70 tok/s. Switched from Windows to Ubuntu and hit 120 tok/s. Windows had CPU at 90C idle. Ubuntu runs 38C idle, 85C full load. Linux beats Windows for AI…

@Tech2Wild · x.com

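When it comes to running large models on an RTX 3090, club-3090 hands you ready-made deployment setups. It is essentially a "recipe library": it configures multiple engines such as vLLM and llama.cpp for you, and even models like Qwen3.6-27B can run on one or two cards. It saves you the time of looking up parameters and patching things yourself; you just follow the scripts step by step. Put simply, it exists so that running LLMs locally doesn't get stuck on configuration.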

@vintcessun · x.com

