Text to speech
MOSS-TTS-Nano is an open-source, multilingual, tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for real-time speech generation, can run directly on CPU without a GPU, and keeps the deployment stack simple enough for local demos, web serving, and lightweight product integration.
List of all local & free open-source voice-clone TTS models and music generation models.
Turn any content into a personalized AI podcast. NotebookLM-style, except you control the script, voices, and hosts. Listen in Apple Podcasts, Spotify, or any podcast app.
High-Quality Voice Cloning TTS for 600+ Languages
100% private on-device voice models for speech-to-text and meeting transcription on macOS
Fully local voice AI for iOS
Native iOS app for talking to your OpenClaw agents by voice or text. On-device speech recognition, streaming responses, multi-agent channels.
Open-source Indian language text-to-speech server — 22 languages, 44 speakers, WebSocket + REST API. Wraps ai4bharat/indic-parler-tts.
Free, fully local macOS menu bar app for speech-to-text with LLM post-processing. Open-source SuperWhisper alternative for Apple Silicon.
Talk to your Mac, query your docs, no cloud required. On-device voice AI + RAG
macOS desktop speech-to-text. Private, offline, open-source. Choose your own STT (speech-to-text) providers and the LLM providers that polish your mumbles. Bind any prompt template to any shortcut.
Let your 🦞 bot shout and speak with a "human" vibe
Desktop app built with Compose Multiplatform that provides a UI for Qwen3-TTS.
Automated YouTube Shorts pipeline: news → script → AI visuals → voiceover → captions → upload
Give OpenClaw a voice — Let your agent speak from any Mac on your network
YumCut - free AI video generator that turns a prompt into ready-to-post vertical videos for TikTok, Reels, and YouTube Shorts. Automatic script, scenes, voiceover, subtitles, and watermark. Built with Next.js. Local-first pipeline + templates, batch rendering, and API hooks for creators and indie makers. Self-hosted, FFmpeg-ready, multi-language output. Low cost and fast.
Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenarios, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS.