AI infrastructure

New 2026

Serverless-GPU LLM serving: scale-to-zero with fast GPU snapshot/restore (cuda-checkpoint), multi-tenant packing, and an OpenAI-compatible API — built on vLLM.

New 2026

A from-scratch LLM inference engine in Rust with a real tensor-graph compiler. Loads GGUF models and runs Qwen2 end to end with hand-written quantized NEON kernels, operator fusion, and liveness-based memory planning. No ML frameworks.

New 2026

Step-by-step guide to setting up the Strix Halo NPU (Ryzen AI Max+ 395) with IRON, Peano, Chess, and FastFlowLM (31 TFLOPS)

New 2026

An open-source API Gateway & background daemon designed to queue inference surges and scale cloud GPUs down to zero when idle.

New 2026

opencode plugin that silently routes NeuralWatt traffic to flex models and reports per-session usage, cost, energy and carbon telemetry.

New 2026

An Open-Source, Decoupled, Agentic AI Framework for Privacy-First Edge Deployment

New 2026

GLM-5.2 (744B/40B MoE) on a 4× DGX Spark / GB10 (sm_121) cluster: portable Triton sparse-MLA kernels, a data-free expert prune, MTP draft, and a one-script bootstrap.

New 2026

Tensor-native semantic LLM cache and distributed data plane

New 2026

A native C++ PyTorch compiler and execution engine that transparently expands GPU VRAM using NVMe and system RAM. Achieve massive model training and inference on consumer hardware with compiler-guaranteed async I/O latency hiding.

New 2026

Fine, we'll do some Python

New 2026

Open-source sandboxes for AI agents — run untrusted, AI-generated code safely on your own machine.

New 2026

AlphaFast: ultra-high-throughput AlphaFold3 inference with MMSeqs2-GPU

New 2026

A universal infrastructure layer for generative biology

New 2026

High-performance Knowledge Graph engine for AI, LLMs, and GraphRAG — built for the next generation of intelligent applications.

New 2026

Local-first distributed inference for Apple Silicon fleets

New 2026

A protocol for hosting sharded, frontier-grade open-source models on a network of untrusted GPUs (non-TEE), with computational privacy guarantees and mathematical verifiability

New 2026

Mixed NVFP4 serving of DeepSeek V4 Flash on DGX Spark (GB10): a fork of antirez/ds4 with REAP expert pruning, NVFP4 quantization, FP8-packed KV cache, and managed-memory serving

New 2026

A default-deny capability floor the model can't talk past, plus an addressable KV cache — in one Go binary.