AI voice
MOSS-TTS-Nano is an open-source multilingual tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for realtime speech generation, can run directly on CPU without a GPU, and keeps the deployment stack simple enough for local demos, web serving, and lightweight product integration.
On-device, real-time multimodal AI. Have natural voice and vision conversations with an AI that runs entirely on your machine. Powered by Gemma 4 E2B and Kokoro.
List of all local & free open-source voice-clone TTS models and music generation models.
Turn any content into a personalized AI podcast. NotebookLM-style, except you control the script, voices, and hosts. Listen in Apple Podcasts, Spotify, or any podcast app.
High-Quality Voice Cloning TTS for 600+ Languages
Open-source AI interview platform for voice, chat & video
A self-improving loop for voice AI agents. Uses karpathy's autoresearch as its foundation.
Fully local voice AI for iOS
Native iOS app for talking to your OpenClaw agents by voice or text. On-device speech recognition, streaming responses, multi-agent channels.
Curated list of open-source speech-to-text and voice typing tools for Linux, macOS, Windows, Android, and iOS. Offline, local, and cloud.
Real-time transcription and AI assistant for Meta Ray-Ban smart glasses. Live speech-to-text, speaker diarization, Gemini Live vision+voice, and WebRTC streaming.
Thoth - Personal AI Sovereignty. A local-first AI assistant with integrated tools, a personal knowledge graph, voice, vision, shell, browser automation, scheduled tasks, health tracking, and messaging channels. Run locally via Ollama or add opt-in cloud models. Your data stays on your machine.
Open-source Indian language text-to-speech server: 22 languages, 44 speakers, WebSocket + REST API. Wraps ai4bharat/indic-parler-tts.
Muesli - local meeting transcription + dictation for macOS (Granola + WisprFlow alternative)
Talk to your Mac, query your docs, no cloud required. On-device voice AI + RAG
SOTA industrial-grade voice activity detection and audio event detection, supporting 100+ languages and outperforming Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD