Computer vision
A tiny, fully-reproducible JEPA world model that learns the physics of a noisy bouncing DVD logo in representation space, dreams its future
Gesture-controlled computer vision system featuring selective invisibility, privacy-focused focus regions, and AR effects using OpenCV and MediaPipe.
AgentVision: Visual Perception System for AI Agents
Local SAM3 desktop app for text-prompt image segmentation.
Code repository for "LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation"
Draw math in the air with your finger and let AI solve it. MediaPipe hand tracking + Claude vision
A deep learning framework for 4-stage Alzheimer's Disease classification using T1 MRI scans. It features a ResNet-18 architecture with Squeeze-and-Excitation (SE) blocks. To handle severe class imbalance, Focal Loss and Weighted Sampling are used. Achieves 78.89% accuracy and 100% recall for Moderate Demented cases.
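The description above names Focal Loss as the remedy for class imbalance. A minimal sketch of the general formula, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), may help show why it down-weights easy examples. The function names, the default `gamma=2.0`, and the optional `alpha` weights are illustrative assumptions, not values taken from the repository.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample.

    gamma=2.0 and the per-class alpha weights are assumed defaults
    for illustration; the repository's settings are not stated.
    """
    p_t = softmax(logits)[target]            # probability of the true class
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and no `alpha`, this reduces to ordinary cross-entropy; larger `gamma` values shift the training signal toward hard, misclassified samples, which are often the minority classes.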
[ECCV 2026] Official PyTorch implementation for See & Sniff: Learning Visuo-Olfactory Representations
A Python script that automatically detects faces using MediaPipe Tasks API and applies a smart padded Gaussian Blur with OpenCV to protect privacy.
A FiftyOne Plugin that flags dark, elongated, smooth blobs as raw carbon-fiber aircraft candidates in aerial survey imagery
Summary: (1) a local OCR workbench built on PP-OCRv6 (Tiny/Small/Medium variants); (2) CoreML hardware acceleration on Apple Silicon; (3) one-click model switching; (4) bundled OmniDocBench evaluation support.
Optical neural computation on a commodity smartphone: OLED screen + mirror + front camera as an analog matrix engine. 101 experiments, bilingual docs, and 6 Zenodo papers.
Give an AI eyes for video — turn a clip into a numbered frame grid + transcript a vision LLM can read.
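One way a "numbered frame grid" could be planned is sketched below: pick evenly spaced frame indices from the clip, then assign each a 1-based label and a grid cell. Both helper names and the midpoint sampling strategy are assumptions for illustration; the tool's actual approach is not described in the listing.

```python
import math

def sample_frame_indices(total_frames, count):
    """Pick `count` evenly spaced frame indices (segment midpoints).

    Hypothetical helper: midpoint sampling is an assumed strategy.
    """
    if total_frames <= 0 or count <= 0:
        return []
    count = min(count, total_frames)  # never ask for more frames than exist
    return [int((i + 0.5) * total_frames / count) for i in range(count)]

def grid_layout(count, columns):
    """Return (rows, cells), where each cell is (label, row, col).

    Labels start at 1 so they match numbers drawn on the grid image.
    """
    rows = math.ceil(count / columns)
    cells = [(i + 1, i // columns, i % columns) for i in range(count)]
    return rows, cells
```

The labels give a vision model a stable way to refer to a moment in the clip ("frame 4") that can be mapped back to a frame index and, from there, to a timestamp in the transcript.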
Experiment: upload an image and generate a 3D point cloud.
Model-agnostic state layer for world models — turn any detector (YOLO · VLM · DINO) into one standard, queryable stream of events + latent state. numpy-only, runs on CPU at the edge.
✨️ A Convolutional Neural Network implemented completely from scratch in C++ featuring custom tensors, Conv2D and Dense layers, forward and backward propagation, max pooling, dropout, cross-entropy loss, Adam optimization, and a custom JPEG decoder. No TensorFlow, PyTorch, OpenCV, or neural network libraries.
OpenCode plugin that lets text-only models work with user-provided and tool-returned images by sending each image to a vision-capable model first, then replacing the image with a text description.
Open research demo for dental CBCT anatomy segmentation using a fine-tuned NVIDIA NV-Segment-CT workflow, with 2D slice review, 3D mesh visualization