Computer vision
A feed-forward 3D foundation model for reconstructing scenes from streaming data
Fine-tune Gemma 4 and 3n with audio, images and text on Apple Silicon, using PyTorch and Metal Performance Shaders.
SteerViT is a framework that equips any ViT with the ability to steer both its global and local visual representations with natural language.
A simple video streaming baseline that outperforms state-of-the-art methods.
"Single-image Layer Decomposition for Anime Characters" (SIGGRAPH 2026, Conditionally Accepted)
[CVPR 2026 Highlight] A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens
[CVPR 2026] LoST: Level of Semantics Tokenization for 3D Shapes
Official implementation of "Repurposing Geometric Foundation Models for Multi-view Diffusion"
Our method reconstructs 3D worlds from video diffusion models using non-rigid alignment to resolve inherent 3D inconsistencies in the generated sequences.
The official implementation of “MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction”
SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation
Netryx is a powerful, locally-hosted geolocation tool that uses state-of-the-art computer vision to identify the exact coordinates of a street-level image. It replicates the core pipeline of high-end geolocation SaaS platforms but runs entirely on your local hardware.
A fully on-device, Android-native aim assistant that helps visually impaired players detect and track opponents in real time.
Open source robot vision framework for edge devices
A diffusion-based framework for document OCR that replaces autoregressive decoding with block-level parallel diffusion decoding.
A visual node-graph editor for training computer vision models.
[CVPR 2026] 3D-Fixer: Coarse-to-Fine In-place Completion for 3D Scenes from a Single Image
Fast SAM 3D Body: Accelerating SAM 3D Body for Real-Time Full-Body Human Mesh Recovery