# Transcription Stream Community Edition
Created by [https://transcription.stream](https://transcription.stream) with special thanks to [MahmoudAshraf97](https://github.com/MahmoudAshraf97) and his work on [whisper-diarization](https://github.com/MahmoudAshraf97/whisper-diarization/), and to [jmorganca](https://github.com/jmorganca/ollama) for [Ollama](https://ollama.com/) and its amazing simplicity in use.
## Overview
Transcription Stream is a turnkey self-hosted diarization service that works completely offline. Out of the box it includes:
- drag and drop diarization and transcription via SSH
- a web interface for upload, review, and download of files
- summarization with Ollama and Mistral
- Meilisearch for full text search
A web interface and SSH drop zones make this simple to use and to integrate into your workflows. Ollama provides a powerful toolset, limited only by your prompt skills, for performing complex operations on your transcriptions. Meilisearch adds ridiculously fast full-text search.
Use the web interface to upload, listen to, review, and download output files, or drop files via SSH into `transcribe` or `diarize`. Files are processed with output placed into a named and dated folder. Have a quick look at the <a href="https://www.youtube.com/watch?v=3RufeOjnlcE">install</a> and <a href="https://www.youtube.com/watch?v=pbZ8o7_MjG4">ts-web walkthrough</a> videos for a better idea.
<div align="center">
<h3>ssh upload and transcribed</h3>
<img src="https://transcription.stream/ts-sshupload.png" width="33%" style="vertical-align: top;" alt="upload file to be diarized to the diarize folder"> <img src="https://transcription.stream/ts-sshtranscribed.png" width="33%" style="vertical-align: top;" alt="transcribed files in their folders">
<h3>ts-web interface</h3>
<a href="https://www.youtube.com/watch?v=pbZ8o7_MjG4">
<img src="https://transcription.stream/ts-web.png" width="66%" alt="Example Image">
</a>
<h3>ts-gpu diarization example </h3>
<a href="https://www.youtube.com/watch?v=UAgbcZjR4mM">
<img src="https://transcription.stream/videothumb.png" alt="watch video on youtube" style="width: 66%;">
</a>
</div>
<div align="center">
<h3>mistral summary</h3>
<img src="https://transcription.stream/summary.png" alt="local ollama mistral summary" style="width: 50%;">
</div>
```python
prompt_text = f"""
Summarize the transcription below. Be sure to include pertinent information about the speakers, including name and anything else shared.
Provide the summary output in the following style
Speakers: names or identifiers of speaking parties
Topics: topics included in the transcription
Ideas: any ideas that may have been mentioned
Dates: dates mentioned and what they correspond to
Locations: any locations mentioned
Action Items: any action items
Summary: overall summary of the transcription
The transcription is as follows
{transcription_text}
"""
```
**Prerequisite: NVIDIA GPU**
> **Warning:** The resulting ts-gpu image is ~26GB and might take a hot second to create
## Quickstart (no build)
### Pulls all docker images and starts services
```bash
./start-nobuild.sh
```
## Build and Run Instructions
> If you'd like to build the images locally
### Automated Install and Run
```bash
chmod +x install.sh;
./install.sh;
```
### Run
```bash
chmod +x run.sh;
./run.sh
```
## Additional Information
### Ports
- **SSH:** 22222
- **HTTP:** 5006
- **Ollama:** 11434
- **Meilisearch:** 7700
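
After starting the containers, a quick way to confirm the services are listening is to try a TCP connection to each port listed above. The following is a minimal sketch using only the Python standard library; the helper names `port_is_open` and `check_services` are illustrative and not part of the project, and the host you pass in is whatever address your Docker host uses.

```python
import socket

# Ports copied from the list above.
SERVICE_PORTS = {"SSH": 22222, "HTTP": 5006, "Ollama": 11434, "Meilisearch": 7700}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_services(host: str) -> dict:
    """Map each service name to whether its port accepted a connection."""
    return {name: port_is_open(host, port) for name, port in SERVICE_PORTS.items()}
```

Calling `check_services("dockerip")` (with your own host address in place of `dockerip`) returns a dictionary such as `{"SSH": True, ...}`. An open port only shows that something is listening, not that the service behind it is healthy.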
### SSH Server Access
- **Port:** 22222
- **User:** `transcriptionstream`
- **Password:** `nomoresaastax`
- **Usage:** Place audio files in `transcribe` or `diarize`. Completed files are stored in `transcribed`.
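
Dropping a file over SSH can also be scripted. The sketch below wraps the standard `scp` client from Python and uses the port and user listed above. It assumes the `transcribe` and `diarize` folders sit in the SSH user's home directory, so that a relative remote path reaches them; the function names are illustrative, and `scp` will still prompt for the password unless you set up key-based login.

```python
import subprocess

SSH_PORT = 22222
SSH_USER = "transcriptionstream"


def build_scp_command(audio_file: str, host: str, folder: str = "diarize") -> list:
    """Build the scp argument list for uploading audio_file into a drop folder."""
    if folder not in ("transcribe", "diarize"):
        raise ValueError("folder must be 'transcribe' or 'diarize'")
    # Assumption: the drop folders are relative to the SSH user's home directory.
    destination = f"{SSH_USER}@{host}:{folder}/"
    # scp takes the port with an uppercase -P (unlike ssh, which uses -p).
    return ["scp", "-P", str(SSH_PORT), audio_file, destination]


def upload(audio_file: str, host: str, folder: str = "diarize") -> None:
    """Run scp and raise CalledProcessError if the transfer fails."""
    subprocess.run(build_scp_command(audio_file, host, folder), check=True)
```

For example, `upload("meeting.wav", "dockerip", folder="transcribe")` copies the file into the `transcribe` drop zone, with `dockerip` replaced by your host address.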
### Web Interface
- **URL:** [http://dockerip:5006](http://dockerip:5006)
- **Features:**
  - Audio file upload/download
  - Task completion alerts with interactive links
  - HTML5 web player with speed control and transcription highlighting
  - Time-synced transcription scrubbing/highlighting/scrolling
### Ollama API
- **URL:** [http://dockerip:11434](http://dockerip:11434)
- Change the prompt used for summaries in `/ts-gpu/ts-summarize.py`
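
The Ollama endpoint can also be called directly to experiment with prompts before editing `ts-summarize.py`. The following is a rough sketch against Ollama's `/api/generate` route using only the standard library. It is not necessarily how `ts-summarize.py` makes the call. The shortened prompt, the function names, and the `mistral` model tag are assumptions for illustration.

```python
import json
import urllib.request


def build_generate_request(transcription_text: str, host: str = "dockerip",
                           model: str = "mistral") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate route."""
    # Shortened stand-in for the full summary prompt shown earlier.
    prompt = (
        "Summarize the transcription below.\n"
        "The transcription is as follows\n"
        f"{transcription_text}"
    )
    # stream=False asks Ollama for one JSON object instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"http://{host}:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(transcription_text: str, host: str = "dockerip",
              model: str = "mistral", timeout: float = 300.0) -> str:
    """Send the request and return the generated text from the 'response' field."""
    request = build_generate_request(transcription_text, host, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `summarize(open("transcript.txt").read(), host="dockerip")` returns the model's summary as a string, where `transcript.txt` is a placeholder for one of your own output files. The generous timeout is there because long transcriptions can take a while to summarize.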
### Meilisearch API
- **URL:** [http://dockerip:7700](http://dockerip:7700)
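
Until ts-web has a search interface, the Meilisearch API can be queried directly. The sketch below posts a query to Meilisearch's `/indexes/{index_uid}/search` route. The index name is not documented here, so `index_uid` is a required argument that you must look up in your own instance. The function names are illustrative, and `api_key` is only needed if your instance runs with a master key.

```python
import json
import urllib.request


def build_search_request(query: str, index_uid: str, host: str = "dockerip",
                         api_key: str = None, limit: int = 5) -> urllib.request.Request:
    """Build a POST request for a Meilisearch full-text search."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Meilisearch expects the key as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"http://{host}:7700/indexes/{index_uid}/search",
        data=json.dumps({"q": query, "limit": limit}).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def search(query: str, index_uid: str, host: str = "dockerip",
           api_key: str = None, limit: int = 5) -> list:
    """Run the search and return the list of matching documents ('hits')."""
    request = build_search_request(query, index_uid, host, api_key, limit)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())["hits"]
```

Each hit is a dictionary containing whatever fields were indexed, so inspect one result to see which fields your instance exposes.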
> **Warning:** This is example code for example purposes and should not be used in production environments without additional security measures.
### Customization and Troubleshooting
- Update variables in the .env file
- Change the password for `transcriptionstream` in the `ts-gpu` Dockerfile.
- Update the Ollama API endpoint IP in .env if you want to use a different endpoint
- Update the secret in .env for ts-web
- Use .env to choose which models are included in the initial build.
- Change the prompt text in ts-gpu/ts-summarize.py to fit your needs. Update ts-web/templates/transcription.html if you want to call it something other than summary.
- 12GB of VRAM may not be enough to run both whisper-diarization and Ollama Mistral. Whisper-diarization is fairly light on GPU memory out of the box, but Ollama's runner holds enough GPU memory open that diarization/transcription occasionally runs out of CUDA memory. Since I can't run both on the same host reliably, I've set the batch size for both whisper-diarization and whisperx to 16, from their default of 8, and let an M-series Mac run the Ollama endpoint.
### To-do
- Fix an issue in ts-web that throws an error to the console when loading a transcription that has no accompanying summary.txt file. There are lots of other annoyances with ts-web, but it's functional.
- Add a search/control interface to ts-web for Meilisearch