Audio processing
MOSS-TTS-Nano is an open-source, multilingual, tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for real-time speech generation, can run directly on a CPU without a GPU, and keeps the deployment stack simple enough for local demos, web serving, and lightweight product integration.
Fine-tune Gemma 4 and 3n with audio, images and text on Apple Silicon, using PyTorch and Metal Performance Shaders.
Turn any content into a personalized AI podcast. NotebookLM-style, except you control the script, voices, and hosts. Listen in Apple Podcasts, Spotify, or any podcast app.
Local meeting transcription → Obsidian vault. No cloud, no API keys.
100% private on-device voice models for speech-to-text and meeting transcription on macOS
Transcribe microphone and computer audio to markdown.
Agentic Hours-Long Video Editing via Music Synchronization
Every meeting, every idea, every voice note — searchable by your AI. Open-source, privacy-first conversation memory layer.
Personal project in Rust that aims to help you understand foreign languages better. Uses VAD + Whisper to transcribe, then translates according to a custom dictionary.
Fully local voice AI for iOS
Fully local meeting transcription with speaker diarization, AI summaries, and PDF output
The Infinite Crate is a DAW plugin built on JUCE, React, and the Lyria RealTime live music model
npm library to transcribe audio and video entirely in the browser with WebGPU and WebCodecs. 100% private and offline, with WASM fallbacks.
Real-time transcription and AI assistant for Meta Ray-Ban smart glasses. Live speech-to-text, speaker diarization, Gemini Live vision+voice, and WebRTC streaming.
Muesli - local meeting transcription + dictation for macOS (Granola + WisprFlow alternative)
SOTA industrial-grade voice activity detection and audio event detection, supporting 100+ languages and outperforming Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD.
A beautiful Windows program for analyzing audio quality — detects fake lossless, clipping, MQA, and AI-generated audio; includes a spectrogram viewer and more. Built-in player with EQ and spatial audio.
macOS desktop speech-to-text. Private, offline, open-source. Choose your own STT (speech-to-text) providers and the LLM providers that polish your mumbles. Bind any prompt template to any shortcut.