AI infrastructure
Serverless-GPU LLM serving: scale-to-zero with fast GPU snapshot/restore (cuda-checkpoint), multi-tenant packing, and an OpenAI-compatible API — built on vLLM.
A from-scratch LLM inference engine in Rust with a real tensor-graph compiler. Loads GGUF models and runs Qwen2 end to end with hand-written quantized NEON kernels, operator fusion, and liveness-based memory planning. No ML frameworks.
Step-by-step guide to set up Strix Halo NPU (Ryzen AI Max+ 395) with IRON, Peano, Chess, FastFlowLM — 31 TFLOPS
An open-source API Gateway & background daemon designed to queue inference surges and scale cloud GPUs down to zero when idle.
opencode plugin that silently routes NeuralWatt traffic to flex models and reports per-session usage, cost, energy and carbon telemetry.
An Open-Source, Decoupled, Agentic AI Framework for Privacy-First Edge Deployment
GLM-5.2 (744B/40B MoE) on a 4× DGX Spark / GB10 (sm_121) cluster: portable Triton sparse-MLA kernels, a data-free expert prune, MTP draft, and a one-script bootstrap.
Tensor-native semantic LLM cache and distributed data plane
A native C++ PyTorch compiler and execution engine that transparently expands GPU VRAM using NVMe and system RAM. Train and run large models on consumer hardware with compiler-guaranteed async I/O latency hiding.
Self-host the modern LLM stack.
Open-source sandboxes for AI agents — run untrusted, AI-generated code safely on your own machine.
AlphaFast: ultra-high-throughput AlphaFold3 inference with MMSeqs2-GPU
A universal infrastructure layer for generative biology
High-performance Knowledge Graph engine for AI, LLMs, and GraphRAG — built for the next generation of intelligent applications.
Local-first distributed inference for Apple Silicon fleets
A protocol for hosting sharded, frontier-grade open-source models on a network of untrusted (non-TEE) GPUs, with computational privacy guarantees and mathematical verifiability
Mixed NVFP4 serving of DeepSeek V4 Flash on DGX Spark (GB10): a fork of antirez/ds4 with REAP expert pruning, NVFP4 quantization, FP8-packed KV cache, and managed-memory serving
A default-deny capability floor the model can't talk past, plus an addressable KV cache — in one Go binary.