# Real-Time AI Voice Chat 🎤💬🧠🔊
**Have a natural, spoken conversation with an AI!**
This project lets you chat with a Large Language Model (LLM) using just your voice, receiving spoken responses in near real-time. Think of it as your own digital conversation partner.
https://github.com/user-attachments/assets/16cc29a7-bec2-4dd0-a056-d213db798d8f
*(early preview - first reasonably stable version)*
> ❗ **Project Status: Community-Driven**
>
> This project is no longer being actively maintained by me due to time constraints. I've taken on too many projects and I have to step back. I will no longer be implementing new features or providing user support.
>
> I will continue to review and merge high-quality, well-written Pull Requests from the community from time to time. Your contributions are welcome and appreciated!
## What's Under the Hood?
A sophisticated client-server system built for low-latency interaction:
1. 🎙️ **Capture:** Your voice is captured by your browser.
2. ➡️ **Stream:** Audio chunks are whisked away via WebSockets to a Python backend.
3. ✍️ **Transcribe:** `RealtimeSTT` rapidly converts your speech to text.
4. 🤔 **Think:** The text is sent to an LLM (like Ollama or OpenAI) for processing.
5. 🗣️ **Synthesize:** The AI's text response is turned back into speech using `RealtimeTTS`.
6. ⬅️ **Return:** The generated audio is streamed back to your browser for playback.
7. 🔄 **Interrupt:** Jump in anytime! The system handles interruptions gracefully.
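To make the loop concrete, here is a minimal, hypothetical sketch of the server side using FastAPI WebSockets. The actual pipeline in `server.py` is more involved; `transcribe_chunk` and `synthesize_reply` below are invented stand-ins for the `RealtimeSTT`, LLM, and `RealtimeTTS` stages.
```python
# Hypothetical sketch of the WebSocket audio loop - not the actual server.py.
from typing import Iterator, Optional
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

def transcribe_chunk(chunk: bytes) -> Optional[str]:
    # Stand-in for RealtimeSTT: returns text once a full utterance is detected.
    return None

def synthesize_reply(text: str) -> Iterator[bytes]:
    # Stand-in for the LLM + RealtimeTTS stages: yields synthesized audio chunks.
    yield b""

@app.websocket("/ws")
async def voice_chat(websocket: WebSocket) -> None:
    await websocket.accept()
    try:
        while True:
            chunk = await websocket.receive_bytes()   # steps 1-2: capture & stream
            text = transcribe_chunk(chunk)            # step 3: transcribe
            if text is None:
                continue
            for audio in synthesize_reply(text):      # steps 4-5: think & synthesize
                await websocket.send_bytes(audio)     # step 6: return audio
    except WebSocketDisconnect:
        pass  # client closed the connection
```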
## Key Features ✨
* **Fluid Conversation:** Speak and listen, just like a real chat.
* **Real-Time Feedback:** See partial transcriptions and AI responses as they happen.
* **Low Latency Focus:** Optimized architecture using audio chunk streaming.
* **Smart Turn-Taking:** Dynamic silence detection (`turndetect.py`) adapts to the conversation pace; see the sketch after this list.
* **Flexible AI Brains:** Pluggable LLM backends (Ollama default, OpenAI support via `llm_module.py`).
* **Customizable Voices:** Choose from different Text-to-Speech engines (Kokoro, Coqui, Orpheus via `audio_module.py`).
* **Web Interface:** Clean and simple UI using Vanilla JS and the Web Audio API.
* **Dockerized Deployment:** Recommended setup using Docker Compose for easier dependency management.
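To illustrate the turn-taking idea, here is a small, self-contained sketch of an adaptive pause threshold. The real logic lives in `turndetect.py`; the class name and all numbers below are invented for the example.
```python
# Illustrative sketch of adaptive silence detection - not the turndetect.py code.
class PauseEstimator:
    def __init__(self, base_pause: float = 0.8, alpha: float = 0.3) -> None:
        self.current_pause = base_pause  # seconds of silence that end a turn
        self.alpha = alpha               # smoothing factor for adaptation

    def observe_utterance(self, duration_s: float) -> None:
        # Longer utterances suggest a slower pace, so tolerate longer pauses;
        # the target is clamped to a sensible range.
        target = min(max(0.2 * duration_s, 0.4), 2.0)
        self.current_pause += self.alpha * (target - self.current_pause)

    def turn_finished(self, silence_s: float) -> bool:
        return silence_s >= self.current_pause

est = PauseEstimator()
est.observe_utterance(6.0)  # a long, unhurried utterance raises the threshold
print(f"pause threshold is now {est.current_pause:.2f}s")
```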
## Technology Stack 🛠️
* **Backend:** Python < 3.13, FastAPI
* **Frontend:** HTML, CSS, JavaScript (Vanilla JS, Web Audio API, AudioWorklets)
* **Communication:** WebSockets
* **Containerization:** Docker, Docker Compose
* **Core AI/ML Libraries:**
* `RealtimeSTT` (Speech-to-Text)
* `RealtimeTTS` (Text-to-Speech)
* `transformers` (Turn detection, Tokenization)
* `torch` / `torchaudio` (ML Framework)
* `ollama` / `openai` (LLM Clients)
* **Audio Processing:** `numpy`, `scipy`
## Before You Dive In: Prerequisites 🏊‍♀️
This project leverages powerful AI models, which have some requirements:
* **Operating System:**
* **Docker:** Linux is recommended for the best GPU integration with Docker.
* **Manual:** The provided script (`install.bat`) is for Windows. Manual steps are possible on Linux/macOS but may require more troubleshooting (especially for DeepSpeed).
* **🐍 Python:** 3.9 or higher, but below 3.13 (if setting up manually).
* **🚀 GPU:** **A powerful CUDA-enabled NVIDIA GPU is *highly recommended***, especially for faster STT (Whisper) and TTS (Coqui). Performance on CPU-only or weaker GPUs will be significantly slower.
* The setup assumes **CUDA 12.1**. Adjust PyTorch installation if you have a different CUDA version.
* **Docker (Linux):** Requires [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
* **🐳 Docker (Optional but Recommended):** Docker Engine and Docker Compose v2+ for the containerized setup.
* **🧠 Ollama (Optional):** If using the Ollama backend *without* Docker, install it separately and pull your desired models. The Docker setup includes an Ollama service.
* **🔑 OpenAI API Key (Optional):** If using the OpenAI backend, set the `OPENAI_API_KEY` environment variable (e.g., in a `.env` file or passed to Docker).
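Before installing anything heavy, a quick sanity check of the GPU and API-key prerequisites can save time. This is a small standalone script (it assumes `torch` is already installed):
```python
# Quick prerequisite check: CUDA availability and the optional OpenAI key.
import os
import torch

print(f"PyTorch {torch.__version__}, built against CUDA {torch.version.cuda}")
if torch.cuda.is_available():
    print(f"GPU detected: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA GPU detected - expect significantly slower STT/TTS.")

if os.getenv("OPENAI_API_KEY"):
    print("OPENAI_API_KEY is set (only needed for the OpenAI backend).")
else:
    print("OPENAI_API_KEY not set - fine if you use the Ollama backend.")
```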
---
## Getting Started: Installation & Setup ⚙️
**Clone the repository first:**
```bash
git clone https://github.com/KoljaB/RealtimeVoiceChat.git
cd RealtimeVoiceChat
```
Now, choose your adventure:
<details>
<summary><strong>🚀 Option A: Docker Installation (Recommended for Linux/GPU)</strong></summary>
This is the most straightforward method, bundling the application, dependencies, and even Ollama into manageable containers.
1. **Build the Docker images:**
*(This takes time! It downloads base images, installs Python/ML dependencies, and pre-downloads the default STT model.)*
```bash
docker compose build
```
*(If you want to customize models/settings in `code/*.py`, do it **before** this step!)*
2. **Start the services (App & Ollama):**
*(Runs containers in the background. GPU access is configured in `docker-compose.yml`.)*
```bash
docker compose up -d
```
Give them a minute to initialize.
3. **(Crucial!) Pull your desired Ollama Model:**
*(This is done *after* startup to keep the main app image smaller and allow model changes without rebuilding. Execute this command to pull the default model into the running Ollama container.)*
```bash
# Pull the default model (adjust if you configured a different one in server.py)
docker compose exec ollama ollama pull hf.co/bartowski/huihui-ai_Mistral-Small-24B-Instruct-2501-abliterated-GGUF:Q4_K_M
# (Optional) Verify the model is available
docker compose exec ollama ollama list
```
4. **Stopping the Services:**
```bash
docker compose down
```
5. **Restarting:**
```bash
docker compose up -d
```
6. **Viewing Logs / Debugging:**
* Follow app logs: `docker compose logs -f app`
* Follow Ollama logs: `docker compose logs -f ollama`
* Save logs to file: `docker compose logs app > app_logs.txt`
</details>
<details>
<summary><strong>🛠️ Option B: Manual Installation (Windows Script / venv)</strong></summary>
This method requires managing the Python environment yourself. It offers more direct control but can be trickier, especially regarding ML dependencies.
**B1) Using the Windows Install Script:**
1. Ensure you meet the prerequisites (Python, potentially CUDA drivers).
2. Run the script. It attempts to create a venv, install PyTorch for CUDA 12.1, a compatible DeepSpeed wheel, and other requirements.
```batch
install.bat
```
*(This opens a new command prompt within the activated virtual environment.)*
Proceed to the **"Running the Application"** section.
**B2) Manual Steps (Linux/macOS/Windows):**
1. **Create & Activate Virtual Environment:**
```bash
python -m venv venv
# Linux/macOS:
source venv/bin/activate
# Windows:
.\venv\Scripts\activate
```
2. **Upgrade Pip:**
```bash
python -m pip install --upgrade pip
```
3. **Navigate to Code Directory:**
```bash
cd code
```
4. **Install PyTorch (Crucial Step - Match Your Hardware!):**
* **With NVIDIA GPU (CUDA 12.1 Example):**
```bash
# Verify your CUDA version! Adjust 'cu121' and the URL if needed.
pip install torch==2.5.1+cu121 torchaudio==2.5.1+cu121 torchvision --index-url https://download.pytorch.org/whl/cu121
```
* **CPU Only (Expect Slow Performance):**
```bash
pip install torch torchaudio torchvision
```
* *Find other PyTorch versions:* [https://pytorch.org/get-started/previous-versions/](https://pytorch.org/get-started/previous-versions/)
5. **Install Other Requirements:**
```bash
pip install -r requirements.txt
```
* **Note on DeepSpeed:** The `requirements.txt` may include DeepSpeed. Installation can be complex, especially on Windows. The `install.bat` tries a precompiled wheel. If manual installation fails, you might need to build it from source or consult resources like [deepspeedpatcher](https://github.com/erew123/deepspeedpatcher) (use at your own risk). Coqui TTS performance benefits most from DeepSpeed.
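A quick way to check whether DeepSpeed actually imports in your environment (optional; per the note above, only Coqui TTS really benefits from it):
```python
# Optional check: does DeepSpeed import in this environment?
try:
    import deepspeed
    print(f"DeepSpeed {deepspeed.__version__} is available.")
except ImportError as exc:
    print(f"DeepSpeed is not available ({exc}); Coqui TTS should still run, just slower.")
```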
</details>
---
## Running the Application ▶️
**If using Docker:**
Your application is already running via `docker compose up -d`! Check logs using `docker compose logs -f app`.
**If using Manual/Script Installation:**
1. **Activate your virtual environment** (if not already active):
```bash
# Linux/macOS:
source ../venv/bin/activate
# Windows:
..\venv\Scripts\activate
```
2. **Navigate to the `code` directory** (if not already there):
```bash
cd code
```
3. **Start the FastAPI server:**
```bash
python server.py
```
**Accessing the Client (Both Methods):**
1. Open your web browser to `http://localhost:8000` (or your server's IP if running remotely/in Docker on another machine).
2. **Grant microphone permissions** when prompted.
3. Click **"Start"** to begin chatting! Use "Stop" to end and "Reset" to clear the conversation.
---
## Configuration Deep Dive 🔧
Want to tweak the AI's voice, brain, or how it listens? Modify the Python files in the `code/` directory (a combined example appears at the end of this section).
**⚠️ Important Docker Note:** If using Docker, make any configuration changes *before* running `docker compose build` to ensure they are included in the image.
* **TTS Engine & Voice (`server.py`, `audio_module.py`):**
* Change `START_ENGINE` in `server.py` to `"coqui"`, `"kokoro"`, or `"orpheus"`.
* Adjust engine-specific settings (e.g., voice model path for Coqui, speaker ID for Orpheus, speed) within `AudioProcessor.__init__` in `audio_module.py`.
* **LLM Backend & Model (`server.py`, `llm_module.py`):**
* Set `LLM_START_PROVIDER` (`"ollama"` or `"openai"`) and `LLM_START_MODEL` (e.g., `"hf.co/..."` for Ollama, model name for OpenAI) in `server.py`. Remember to pull the Ollama model if using Docker (see Installation Step A3).
* Customize the AI's personality by editing `system_prompt.txt`.
* **STT Settings (`transcribe.py`):**
* Modify `DEFAULT_RECORDER_CONFIG` to change the Whisper model (`model`), language (`language`), silence thresholds (`silence_limit_seconds`), etc. The default `base.en` model is pre-downloaded during the Docker build.
* **Turn Detection Sensitivity (`turndetect.py`):**
* Adjust pause duration constants within the `TurnDetector.update_settings` method.
* **SSL/HTTPS (`server.py`):**
* Set `USE_SSL = True` and provide paths to your certificate (`SSL_CERT_PATH`) and key (`SSL_KEY_PATH`) files.
* **Docker Users:** You'll need to adjust `docker-compose.yml` to map the SSL port (e.g., 443) and potentially mount your certificate files as volumes.
<details>
<summary><strong>Generating Local SSL Certificates (Windows Example w/ mkcert)</strong></summary>
1. Install Chocolatey package manager if you haven't already.
2. Install mkcert: `choco install mkcert`
3. Run Command Prompt *as Administrator*.
4. Install a local Certificate Authority: `mkcert -install`
5. Generate certs (replace `your.local.ip`): `mkcert localhost 127.0.0.1 ::1 your.local.ip`
* This creates `.pem` files (e.g., `localhost+3.pem` and `localhost+3-key.pem`) in the current directory. Update `SSL_CERT_PATH` and `SSL_KEY_PATH` in `server.py` accordingly. If you use Docker, mount these files into the container as well.
</details>
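Putting several of these settings together, here is a hypothetical excerpt showing what the tweaks above might look like. The constant names follow the descriptions in this section, but the values are examples only; verify both names and defaults against the actual files before editing.
```python
# Hypothetical excerpt - verify against the actual files before editing.

# In code/server.py:
START_ENGINE = "kokoro"        # TTS engine: "coqui", "kokoro", or "orpheus"
LLM_START_PROVIDER = "ollama"  # or "openai"
LLM_START_MODEL = "hf.co/bartowski/huihui-ai_Mistral-Small-24B-Instruct-2501-abliterated-GGUF:Q4_K_M"
USE_SSL = True                 # serve over HTTPS
SSL_CERT_PATH = "localhost+3.pem"
SSL_KEY_PATH = "localhost+3-key.pem"

# In code/transcribe.py (only the keys mentioned above are shown):
DEFAULT_RECORDER_CONFIG = {
    "model": "base.en",            # Whisper model
    "language": "en",
    "silence_limit_seconds": 0.7,  # example value
}
```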
---
## Contributing 🤝
Got ideas or found a bug? Contributions are welcome! Feel free to open issues or submit pull requests.
## License 📜
The core codebase of this project is released under the **MIT License** (see the [LICENSE](./LICENSE) file for details).
This project relies on specific external TTS engines (such as `Coqui XTTSv2`) and LLM providers, which have their **own licensing terms**. Please ensure you comply with the licenses of all components you use.
", Assign "at most 3 tags" to the expected json: {"id":"14648","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"