AI prompts
Based on the following README text:
"# 🗣️ Open TTS Tracker
A one-stop shop to track all open-access/open-source TTS models as they come out. Feel free to open a PR for any that aren't listed here.

This is intended as a resource to increase awareness of these models and to make it easier for researchers, developers, and enthusiasts to stay informed about the latest advances in the field.

> [!NOTE]
> This repo only tracks TTS models with an open-source/open-access codebase. More motivation for everyone to open-source! 🤗

| Name | GitHub | Weights | License | Fine-tune | Languages | Paper | Demo | Issues |
|---|---|---|---|---|---|---|---|---|
| Amphion | [Repo](https://github.com/open-mmlab/Amphion) | [🤗 Hub](https://huggingface.co/amphion) | [MIT](https://github.com/open-mmlab/Amphion/blob/main/LICENSE) | No | Multilingual | [Paper](https://arxiv.org/abs/2312.09911) | [🤗 Space](https://huggingface.co/amphion) | |
| AI4Bharat | [Repo](https://github.com/AI4Bharat/Indic-TTS) | [🤗 Hub](https://huggingface.co/ai4bharat) | [MIT](https://github.com/AI4Bharat/Indic-TTS/blob/master/LICENSE.txt) | [Yes](https://github.com/AI4Bharat/Indic-TTS?tab=readme-ov-file#training-steps) | Indic | [Paper](https://arxiv.org/abs/2211.09536) | [Demo](https://models.ai4bharat.org/#/tts) | |
| Bark | [Repo](https://github.com/huggingface/transformers/tree/main/src/transformers/models/bark) | [🤗 Hub](https://huggingface.co/suno/bark) | [MIT](https://github.com/suno-ai/bark/blob/main/LICENSE) | No | Multilingual | [Paper](https://arxiv.org/abs/2209.03143) | [🤗 Space](https://huggingface.co/spaces/suno/bark) | |
| EmotiVoice | [Repo](https://github.com/netease-youdao/EmotiVoice) | [GDrive](https://drive.google.com/drive/folders/1y6Xwj_GG9ulsAonca_unSGbJ4lxbNymM) | [Apache 2.0](https://github.com/netease-youdao/EmotiVoice/blob/main/LICENSE) | [Yes](https://github.com/netease-youdao/EmotiVoice/wiki/Voice-Cloning-with-your-personal-data) | ZH + EN | Not Available | Not Available | Separate [GUI agreement](https://github.com/netease-youdao/EmotiVoice/blob/main/EmotiVoice_UserAgreement_%E6%98%93%E9%AD%94%E5%A3%B0%E7%94%A8%E6%88%B7%E5%8D%8F%E8%AE%AE.pdf) |
| Glow-TTS | [Repo](https://github.com/jaywalnut310/glow-tts) | [GDrive](https://drive.google.com/file/d/1JiCMBVTG4BMREK8cT3MYck1MgYvwASL0/view) | [MIT](https://github.com/jaywalnut310/glow-tts/blob/master/LICENSE) | [Yes](https://github.com/jaywalnut310/glow-tts?tab=readme-ov-file#2-pre-requisites) | English | [Paper](https://arxiv.org/abs/2005.11129) | [GH Pages](https://jaywalnut310.github.io/glow-tts-demo/index.html) | |
| GPT-SoVITS | [Repo](https://github.com/RVC-Boss/GPT-SoVITS) | [🤗 Hub](https://huggingface.co/lj1995/GPT-SoVITS) | [MIT](https://github.com/RVC-Boss/GPT-SoVITS/blob/main/LICENSE) | [Yes](https://github.com/RVC-Boss/GPT-SoVITS?tab=readme-ov-file#pretrained-models) | Multilingual | Not Available | Not Available | |
| HierSpeech++ | [Repo](https://github.com/sh-lee-prml/HierSpeechpp) | [GDrive](https://drive.google.com/drive/folders/1-L_90BlCkbPyKWWHTUjt5Fsu3kz0du0w) | [MIT](https://github.com/sh-lee-prml/HierSpeechpp/blob/main/LICENSE) | No | KR + EN | [Paper](https://arxiv.org/abs/2311.12454) | [🤗 Space](https://huggingface.co/spaces/LeeSangHoon/HierSpeech_TTS) | |
| IMS-Toucan | [Repo](https://github.com/DigitalPhonetics/IMS-Toucan) | [GH release](https://github.com/DigitalPhonetics/IMS-Toucan/tags) | [Apache 2.0](https://github.com/DigitalPhonetics/IMS-Toucan/blob/ToucanTTS/LICENSE) | [Yes](https://github.com/DigitalPhonetics/IMS-Toucan#build-a-toucantts-pipeline) | Multilingual | [Paper](https://arxiv.org/abs/2206.12229) | [🤗 Space](https://huggingface.co/spaces/Flux9665/IMS-Toucan) | |
| MahaTTS | [Repo](https://github.com/dubverse-ai/MahaTTS) | [🤗 Hub](https://huggingface.co/Dubverse/MahaTTS) | [Apache 2.0](https://github.com/dubverse-ai/MahaTTS/blob/main/LICENSE) | No | English + Indic | Not Available | [Recordings](https://github.com/dubverse-ai/MahaTTS/blob/main/README.md#sample-outputs), [Colab](https://colab.research.google.com/drive/1qkZz2km-PX75P0f6mUb2y5e-uzub27NW?usp=sharing) | |
| Matcha-TTS | [Repo](https://github.com/shivammehta25/Matcha-TTS) | [GDrive](https://drive.google.com/drive/folders/17C_gYgEHOxI5ZypcfE_k1piKCtyR0isJ) | [MIT](https://github.com/shivammehta25/Matcha-TTS/blob/main/LICENSE) | [Yes](https://github.com/shivammehta25/Matcha-TTS/tree/main#train-with-your-own-dataset) | English | [Paper](https://arxiv.org/abs/2309.03199) | [🤗 Space](https://huggingface.co/spaces/shivammehta25/Matcha-TTS) | GPL-licensed phonemizer |
| MetaVoice-1B | [Repo](https://github.com/metavoiceio/metavoice-src) | [🤗 Hub](https://huggingface.co/metavoiceio/metavoice-1B-v0.1/tree/main) | [Apache 2.0](https://github.com/metavoiceio/metavoice-src/blob/main/LICENSE) | [Yes](https://github.com/metavoiceio/metavoice-src?tab=readme-ov-file) | Multilingual | Not Available | [Demo](https://ttsdemo.themetavoice.xyz/) | |
| Neural-HMM TTS | [Repo](https://github.com/shivammehta25/Neural-HMM) | [GitHub](https://github.com/shivammehta25/Neural-HMM/releases) | [MIT](https://github.com/shivammehta25/Neural-HMM/blob/main/LICENSE) | [Yes](https://github.com/shivammehta25/Neural-HMM?tab=readme-ov-file#setup-and-training-using-lj-speech) | English | [Paper](https://arxiv.org/abs/2108.13320) | [GH Pages](https://shivammehta25.github.io/Neural-HMM/) | |
| OpenVoice | [Repo](https://github.com/myshell-ai/OpenVoice) | [🤗 Hub](https://huggingface.co/myshell-ai/OpenVoice) | [CC-BY-NC 4.0](https://github.com/myshell-ai/OpenVoice/blob/main/LICENSE) | No | ZH + EN | [Paper](https://arxiv.org/abs/2312.01479) | [🤗 Space](https://huggingface.co/spaces/myshell-ai/OpenVoice) | Non-commercial |
| OverFlow TTS | [Repo](https://github.com/shivammehta25/OverFlow) | [GitHub](https://github.com/shivammehta25/OverFlow/releases) | [MIT](https://github.com/shivammehta25/OverFlow/blob/main/LICENSE) | [Yes](https://github.com/shivammehta25/OverFlow/tree/main?tab=readme-ov-file#setup-and-training-using-lj-speech) | English | [Paper](https://arxiv.org/abs/2211.06892) | [GH Pages](https://shivammehta25.github.io/OverFlow/) | |
| Parler TTS | [Repo](https://github.com/huggingface/parler-tts) | [🤗 Hub](https://huggingface.co/parler-tts/parler_tts_mini_v0.1) | [Apache 2.0](https://github.com/huggingface/parler-tts/blob/main/LICENSE) | [Yes](https://github.com/huggingface/parler-tts/tree/main/training) | English | Not Available | Not Available | |
| pflowTTS | [Unofficial Repo](https://github.com/p0p4k/pflowtts_pytorch) | [GDrive](https://drive.google.com/drive/folders/1x-A2Ezmmiz01YqittO_GLYhngJXazaF0) | [MIT](https://github.com/p0p4k/pflowtts_pytorch/blob/master/LICENSE) | [Yes](https://github.com/p0p4k/pflowtts_pytorch#instructions-to-run) | English | [Paper](https://openreview.net/pdf?id=zNA7u7wtIN) | Not Available | GPL-licensed phonemizer |
| Piper | [Repo](https://github.com/rhasspy/piper) | [🤗 Hub](https://huggingface.co/datasets/rhasspy/piper-checkpoints/) | [MIT](https://github.com/rhasspy/piper/blob/master/LICENSE.md) | [Yes](https://github.com/rhasspy/piper/blob/master/TRAINING.md) | Multilingual | Not Available | Not Available | [GPL-licensed phonemizer](https://github.com/rhasspy/piper/issues/93) |
| Pheme | [Repo](https://github.com/PolyAI-LDN/pheme) | [🤗 Hub](https://huggingface.co/PolyAI/pheme) | [CC-BY](https://github.com/PolyAI-LDN/pheme/blob/main/LICENSE) | [Yes](https://github.com/PolyAI-LDN/pheme#training) | English | [Paper](https://arxiv.org/abs/2401.02839) | [🤗 Space](https://huggingface.co/spaces/PolyAI/pheme) | |
| RAD-MMM | [Repo](https://github.com/NVIDIA/RAD-MMM) | [GDrive](https://drive.google.com/file/d/1p8SEVHRlyLQpQnVP2Dc66RlqJVVRDCsJ/view) | [MIT](https://github.com/NVIDIA/RAD-MMM/blob/main/LICENSE) | [Yes](https://github.com/NVIDIA/RAD-MMM?tab=readme-ov-file#training) | Multilingual | [Paper](https://arxiv.org/pdf/2301.10335.pdf) | [Jupyter Notebook](https://github.com/NVIDIA/RAD-MMM/blob/main/inference.ipynb), [Webpage](https://research.nvidia.com/labs/adlr/projects/radmmm/) | |
| RAD-TTS | [Repo](https://github.com/NVIDIA/radtts) | [GDrive](https://drive.google.com/file/d/1Rb2VMUwQahGrnpFSlAhCPh7OpDN3xgOr/view?usp=sharing) | [MIT](https://github.com/NVIDIA/radtts/blob/main/LICENSE) | [Yes](https://github.com/NVIDIA/radtts#training-radtts-without-pitch-and-energy-conditioning) | English | [Paper](https://openreview.net/pdf?id=0NQwnnwAORi) | [GH Pages](https://nv-adlr.github.io/RADTTS) | |
| Silero | [Repo](https://github.com/snakers4/silero-models) | [GH links](https://github.com/snakers4/silero-models/blob/master/models.yml) | [CC BY-NC-SA](https://github.com/snakers4/silero-models/blob/master/LICENSE) | [No](https://github.com/snakers4/silero-models/discussions/78) | EN + DE + ES + EA | Not Available | Not Available | [Non-commercial](https://github.com/snakers4/silero-models/wiki/Licensing-and-Tiers) |
| StyleTTS 2 | [Repo](https://github.com/yl4579/StyleTTS2) | [🤗 Hub](https://huggingface.co/yl4579/StyleTTS2-LibriTTS/tree/main) | [MIT](https://github.com/yl4579/StyleTTS2/blob/main/LICENSE) | [Yes](https://github.com/yl4579/StyleTTS2#finetuning) | English | [Paper](https://arxiv.org/abs/2306.07691) | [🤗 Space](https://huggingface.co/spaces/styletts2/styletts2) | GPL-licensed phonemizer |
| Tacotron 2 | [Unofficial Repo](https://github.com/NVIDIA/tacotron2) | [GDrive](https://drive.google.com/file/d/1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA/view) | [BSD-3](https://github.com/NVIDIA/tacotron2/blob/master/LICENSE) | [Yes](https://github.com/NVIDIA/tacotron2/tree/master?tab=readme-ov-file#training) | English | [Paper](https://arxiv.org/abs/1712.05884) | [Webpage](https://google.github.io/tacotron/publications/tacotron2/) | |
| TorToiSe TTS | [Repo](https://github.com/neonbjb/tortoise-tts) | [🤗 Hub](https://huggingface.co/jbetker/tortoise-tts-v2) | [Apache 2.0](https://github.com/neonbjb/tortoise-tts/blob/main/LICENSE) | [Yes](https://git.ecker.tech/mrq/tortoise-tts) | English | [Technical report](https://arxiv.org/abs/2305.07243) | [🤗 Space](https://huggingface.co/spaces/Manmay/tortoise-tts) | |
| TTTS | [Repo](https://github.com/adelacvg/ttts) | [🤗 Hub](https://huggingface.co/adelacvg/TTTS) | [MPL 2.0](https://github.com/adelacvg/ttts/blob/master/LICENSE) | No | ZH | Not Available | [Colab](https://colab.research.google.com/github/adelacvg/ttts/blob/master/demo.ipynb), [🤗 Space](https://huggingface.co/spaces/mrfakename/TTTS) | |
| VALL-E | [Unofficial Repo](https://github.com/enhuiz/vall-e) | Not Available | [MIT](https://github.com/enhuiz/vall-e/blob/main/LICENSE) | [Yes](https://github.com/enhuiz/vall-e#get-started) | NA | [Paper](https://arxiv.org/abs/2301.02111) | Not Available | |
| VITS/ MMS-TTS | [Repo](https://github.com/huggingface/transformers/tree/7142bdfa90a3526cfbed7483ede3afbef7b63939/src/transformers/models/vits) | [🤗 Hub](https://huggingface.co/kakao-enterprise) / [MMS](https://huggingface.co/models?search=mms-tts) | [Apache 2.0](https://github.com/huggingface/transformers/blob/main/LICENSE) | [Yes](https://github.com/ylacombe/finetune-hf-vits) | English | [Paper](https://arxiv.org/abs/2106.06103) | [🤗 Space](https://huggingface.co/spaces/kakao-enterprise/vits) | GPL-licensed phonemizer |
| WhisperSpeech | [Repo](https://github.com/collabora/WhisperSpeech) | [🤗 Hub](https://huggingface.co/collabora/whisperspeech) | [MIT](https://github.com/collabora/WhisperSpeech/blob/main/LICENSE) | No | English + Polish | Not Available | [🤗 Space](https://huggingface.co/spaces/collabora/WhisperSpeech), [Recordings](https://github.com/collabora/WhisperSpeech/blob/main/README.md), [Colab](https://colab.research.google.com/github/collabora/WhisperSpeech/blob/8168a30f26627fcd15076d10c85d9e33c52204cf/Inference%20example.ipynb) | |
| XTTS | [Repo](https://github.com/coqui-ai/TTS) | [🤗 Hub](https://huggingface.co/coqui/XTTS-v2) | [CPML](https://coqui.ai/cpml) | [Yes](https://docs.coqui.ai/en/latest/models/xtts.html#training) | Multilingual | [Paper](https://arxiv.org/abs/2406.04904) | [🤗 Space](https://huggingface.co/spaces/coqui/xtts) | Non-commercial |
| xVASynth | [Repo](https://github.com/DanRuta/xVA-Synth) | [🤗 Hub](https://huggingface.co/Pendrokar/xvapitch_nvidia) | [GPL-3.0](https://github.com/DanRuta/xVA-Synth/blob/master/LICENSE.md) | [Yes](https://github.com/DanRuta/xva-trainer) | Multilingual | [Paper](https://arxiv.org/abs/2009.14153) | [🤗 Space](https://huggingface.co/spaces/Pendrokar/xVASynth) | Copyrighted materials used for training |

### Capability specifics
<details>
<summary><b><i>Click on this to toggle table visibility</i></b></summary>

| Name | Processor<br>⚡ | Phonetic alphabet<br>🔤 | Insta-clone<br>👥 | Emotional control<br>🎭 | Prompting<br>📖 | Speech control<br>🎚 | Streaming support<br>🌊 | S2S support<br>🦜 | Longform synthesis |
|---|---|---|---|---|---|---|---|---|---|
| Amphion | CUDA | | 👥 | 🎭👥 | ❌ | | | | |
| Bark | CUDA | | ❌ | 🎭 tags | ❌ | | | | |
| EmotiVoice | | | | | | | | | |
| Glow-TTS | | | | | | | | | |
| GPT-SoVITS | | | | | | | | | |
| HierSpeech++ | | ❌ | 👥 | 🎭👥 | ❌ | speed / stability<br>🎚 | | 🦜 | |
| IMS-Toucan | CUDA | ❌ | ❌ | ❌ | ❌ | | | | |
| MahaTTS | | | | | | | | | |
| Matcha-TTS | | IPA | ❌ | ❌ | ❌ | speed / stability<br>🎚 | | | |
| MetaVoice-1B | CUDA | | 👥 | 🎭👥 | ❌ | stability / similarity<br>🎚 | | | Yes |
| Neural-HMM TTS | | | | | | | | | |
| OpenVoice | CUDA | ❌ | 👥 | 6-type 🎭<br>😡😃😭😯🤫😊 | ❌ | | | | |
| OverFlow TTS | | | | | | | | | |
| pflowTTS | | | | | | | | | |
| Piper | | | | | | | | | |
| Pheme | CUDA | ❌ | 👥 | 🎭👥 | ❌ | stability<br>🎚 | | | |
| RAD-TTS | | | | | | | | | |
| Silero | | | | | | | | | |
| StyleTTS 2 | CPU / CUDA | IPA | 👥 | 🎭👥 | ❌ | | 🌊 | | Yes |
| Tacotron 2 | | | | | | | | | |
| TorToiSe TTS | | ❌ | ❌ | ❌ | 📖 | | 🌊 | | |
| TTTS | CPU / CUDA | ❌ | 👥 | | | | | | |
| VALL-E | | | | | | | | | |
| VITS/ MMS-TTS | CUDA | ❌ | ❌ | ❌ | ❌ | speed<br>🎚 | | | |
| WhisperSpeech | CUDA | ❌ | 👥 | 🎭👥 | ❌ | speed<br>🎚 | | | |
| XTTS | CUDA | ❌ | 👥 | 🎭👥 | ❌ | speed / stability<br>🎚 | 🌊 | ❌ | |
| xVASynth | CPU / CUDA | ARPAbet+ | ❌ | 4-type 🎭<br>😡😃😭😯<br>per-phoneme | ❌ | speed / pitch / energy / 🎭<br>🎚<br>per-phoneme | ❌ | 🦜 | |

* Processor - CPU/CUDA/ROCm (the hardware, single or multi-device, used for inference; the real-time factor should be below 2.0 to qualify for CPU, though some leeway can be given if the model supports audio streaming)
* Phonetic alphabet - None/[IPA](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet)/[ARPAbet](https://en.wikipedia.org/wiki/ARPABET)/other (phonetic transcription that lets you control the pronunciation of specific words at inference time)
* Insta-clone - Yes/No (zero-shot model for quick voice cloning)
* Emotional control - Yes 🎭/Strict (Strict: no ability to move to in-between states; 🎭👥: emotion is switched via the insta-clone reference)
* Prompting - Yes/No (a side effect of narrator-based datasets and a way to influence the emotional state; see the [ElevenLabs docs](https://elevenlabs.io/docs/speech-synthesis/prompting#emotion))
* Speech control - speed/pitch/other (ability to change the pitch, duration, energy and/or emotion of the generated speech)
* Streaming support - Yes/No (whether audio can be played back while it is still being generated)
* Speech-to-Speech (S2S) support - Yes/No (streaming support implies real-time S2S; an S2T=>T2S pipeline does not count)
</details>
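The Processor criterion relies on the real-time factor (RTF), commonly defined as wall-clock synthesis time divided by the duration of the audio produced, so lower is better and values below 1.0 are faster than real time. Below is a minimal Python sketch of that check. It assumes you time a model through a hypothetical `synthesize(text)` callable that returns raw samples and a sample rate; the `leeway` parameter is also hypothetical, since the tracker does not quantify the allowance for streaming models.

```python
import time
from typing import Callable, Sequence, Tuple

# Hypothetical interface: any TTS wrapper returning (samples, sample_rate).
Synthesizer = Callable[[str], Tuple[Sequence[float], int]]

CPU_RTF_THRESHOLD = 2.0  # from the tracker's Processor criterion


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize: Synthesizer, text: str) -> float:
    """Time a single synthesis call and return its RTF."""
    start = time.perf_counter()
    samples, sample_rate = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


def qualifies_for_cpu(rtf: float, supports_streaming: bool = False,
                      leeway: float = 0.0) -> bool:
    """Apply the 'below 2.0' rule; the hypothetical `leeway` only
    applies to models that support audio streaming."""
    threshold = CPU_RTF_THRESHOLD + (leeway if supports_streaming else 0.0)
    return rtf < threshold


if __name__ == "__main__":
    # Worked example: 9 s of CPU time for 6 s of audio gives RTF 1.5.
    rtf = real_time_factor(9.0, 6.0)
    print(rtf, qualifies_for_cpu(rtf))  # 1.5 True
```

Averaging `measure_rtf` over several sentences gives a steadier estimate than a single call, since the first call often includes one-off warm-up costs.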
## How can you help?
Help make this list more complete. Create demos on the Hugging Face Hub and link them here :)
Got any questions? Drop me a DM on Twitter [@reach_vb](https://twitter.com/reach_vb).
", assign at most 3 tags to the expected JSON {"id":"7067","tags":[]}, using only tags from the list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]. Return the completed expected JSON.
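The instruction above constrains the reply to a JSON object that keeps the same `id`, has a `tags` array of at most 3 entries, and uses only tags from the supplied list. Below is a minimal Python sketch for checking a reply against those constraints. It assumes the reply lists numeric tag ids, since the prompt does not say whether ids, names, or whole tag objects are expected. `validate_expected_json` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Iterable

MAX_TAGS = 3


def validate_expected_json(reply: str, expected_id: str,
                           allowed_ids: Iterable[int]) -> Dict[str, Any]:
    """Parse a model reply and enforce the prompt's constraints.

    Assumes tags are given as integer ids taken from the provided list.
    """
    data = json.loads(reply)
    if not isinstance(data, dict) or set(data) != {"id", "tags"}:
        raise ValueError("reply must be an object with exactly 'id' and 'tags'")
    if data["id"] != expected_id:
        raise ValueError(f"id must stay {expected_id!r}, got {data['id']!r}")

    tags = data["tags"]
    if not isinstance(tags, list) or len(tags) > MAX_TAGS:
        raise ValueError(f"'tags' must be a list of at most {MAX_TAGS} items")

    allowed = set(allowed_ids)
    for tag in tags:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(tag, bool) or not isinstance(tag, int) or tag not in allowed:
            raise ValueError(f"tag {tag!r} is not in the provided list")
    if len(set(tags)) != len(tags):
        raise ValueError("duplicate tags")
    return data
```

To build `allowed_ids` from the prompt, parse the tag list with `json.loads` and collect each entry's `"id"` field.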