Featured

Audio processing

New 2026

MOSS-TTS-Nano is an open-source multilingual tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for realtime speech generation, can run directly on CPU without a GPU, and keeps the deployment stack simple enough for local demos, web serving, and lightweight product integration.

New 2026

Fine-tune Gemma 4 and 3n with audio, images and text on Apple Silicon, using PyTorch and Metal Performance Shaders.

New 2026

Turn any content into a personalized AI podcast. NotebookLM-style, except you control the script, voices, and hosts. Listen in Apple Podcasts, Spotify, or any podcast app.

New 2026

Give Claude the ability to watch and understand videos — Claude Code plugin with frame extraction and multimodal audio analysis

New 2026

Local meeting transcription → Obsidian vault. No cloud, no API keys.

New 2026

100% private on-device voice models for speech-to-text and meeting transcription on macOS

New 2026

Transcribe microphone and computer audio to markdown.

New 2026

Agentic Hours-Long Video Editing via Music Synchronization

New 2026

Every meeting, every idea, every voice note — searchable by your AI. Open-source, privacy-first conversation memory layer.

New 2026

Personal project in Rust that aims to help you understand foreign languages better. Uses VAD + Whisper to transcribe, then translates according to a custom dictionary.

New 2026

Machine learning powered Karaoke app (with scores!)

New 2026

Fully local meeting transcription with speaker diarization, AI summaries, and PDF output

New 2026

The Infinite Crate is a DAW plugin built on JUCE, React, and the Lyria RealTime live music model

New 2026

npm library to transcribe audio and video entirely in the browser with WebGPU and WebCodecs. 100% private and offline, with WASM fallbacks.

New 2026

Real-time transcription and AI assistant for Meta Ray-Ban smart glasses. Live speech-to-text, speaker diarization, Gemini Live vision+voice, and WebRTC streaming.

New 2026

[ICML 2026] Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

New 2026

Muesli - local meeting transcription + dictation for macOS (Granola + WisprFlow alternative)

New 2026

YouTube Music client for Android

New 2026

SOTA industrial-grade voice activity detection and audio event detection, supporting 100+ languages and outperforming Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD.