AI voice

Jupyter Notebook
851
83
Open Source Speech Language Model
Real-time transcription and AI assistant for Meta Ray-Ban smart glasses. Live speech-to-text, speaker diarization, Gemini Live vision+voice, and WebRTC streaming.
Talk to your Mac, query your docs, no cloud required. On-device voice AI + RAG
SOTA, industrial-grade voice activity detection and audio event detection, supporting 100+ languages and outperforming Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD
Let your 🦞 bot shout and speak with a "human" vibe
🎙️Voice input and translation app for macOS. Press to talk, release to paste.
WhisperCrabs is a simple terminal-based floating recording button: click to record, then transcribe to input
🌋 LavaSR: Fast speech restoration and enhancement
Warcraft III Peon voice notifications (+ more!) for Claude Code, Codex, IDEs, and any AI agent. Stop babysitting your terminal. Employ a Peon today.
A real-time and multilingual speech translation model
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenarios, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS.
Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
Real-time AI assistant for Meta Ray-Ban smart glasses -- voice + vision + agentic actions via Gemini Live and OpenClaw
Your voice is the fastest interface to AI
🎵 The Ultimate Open Source Suno Alternative - Professional UI for ACE-Step 1.5 AI Music Generation. Free, local, unlimited. Stop paying for Suno!
Offline voice control for Linux window managers built with Rust and Vosk
ComfyUI custom nodes for Qwen3-ASR (Automatic Speech Recognition) - audio-to-text transcription supporting 52 languages and dialects.
Run Qwen3-TTS text-to-speech locally on Mac (M1/M2/M3/M4). Voice cloning, voice design, custom voices. 100% offline using MLX.
MOVA: Towards Scalable and Synchronized Video–Audio Generation
TypeScript
13.8k
1.6k
The open-source voice synthesis studio
A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.