# Open Interface
<picture>
<img src="assets/icon.png" align="right" alt="Open Interface Logo" width="120" height="120">
</picture>
### Control Your Computer Using LLMs
Open Interface
- Self-drives your computer by sending your requests to an LLM backend (GPT-4o, Gemini, etc.) to figure out the required steps.
- Automatically executes these steps by simulating keyboard and mouse input.
- Course-corrects by sending the LLM backend updated screenshots of the progress as needed.
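Under the hood this is a simple perceive-plan-act loop. A minimal Python sketch (the helper names `take_screenshot`, `ask_llm`, and `execute` are hypothetical stand-ins for the app's internals, not its real API):

```python
from dataclasses import dataclass, field

# Hypothetical shapes and helpers -- illustrative only, not Open Interface's real API.
@dataclass
class Plan:
    done: bool = False                      # LLM's judgment that the goal is complete
    steps: list = field(default_factory=list)

def take_screenshot() -> bytes: ...         # grab the current screen as image bytes
def ask_llm(goal: str, screenshot: bytes) -> Plan: ...  # send goal + screenshot to the backend
def execute(step) -> None: ...              # simulate one keyboard/mouse action

def self_drive(goal: str) -> None:
    while True:
        screenshot = take_screenshot()      # a fresh screenshot lets the LLM course-correct
        plan = ask_llm(goal, screenshot)
        if plan.done:
            break
        for step in plan.steps:
            execute(step)
```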
<div align="center">
<h4>Full Autopilot for All Computers Using LLMs</h4>
<a href="https://github.com/AmberSahdev/Open-Interface?tab=readme-ov-file#install">Install</a> · <a href="https://github.com/AmberSahdev/Open-Interface/releases/latest">Download the latest release</a>
</div>
### <ins>Demo</ins> 💻
"Solve Today's Wordle"<br>
<br>
*clipped, 2x*
<details>
<summary><a href="https://github.com/AmberSahdev/Open-Interface/blob/main/MEDIA.md#demos">More Demos</a></summary>
<ul>
<li>
"Make me a meal plan in Google Docs"
<img src="assets/meal_plan_demo_2x.gif" style="margin: 5px; border-radius: 10px;">
</li>
<li>
"Write a Web App"
<img src="assets/code_web_app_demo_2x.gif" style="margin: 5px; border-radius: 10px;">
</li>
</ul>
</details>
<hr>
### <ins>Install</ins> 💽
<details>
<summary><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Apple_Computer_Logo_rainbow.svg/640px-Apple_Computer_Logo_rainbow.svg.png" alt="MacOS Logo" width="13" height="15"> <b>MacOS</b></summary>
<ul>
<li>Download the MacOS binary from the latest <a href="https://github.com/AmberSahdev/Open-Interface/releases/latest">release</a>.</li>
<li>Unzip the file and move Open Interface to the Applications folder.<br><br>
<img src="assets/macos_unzip_move_to_applications.png" width="350" style="border-radius: 10px;
border: 3px solid black;">
</li>
</ul>
<details>
<summary><b>Apple Silicon M-Series Macs</b></summary>
<ul>
<li>
Open Interface will ask you for Accessibility access to operate your keyboard and mouse for you, and Screen Recording access to take screenshots to assess its progress.<br>
</li>
<li>
In case it doesn't, manually add these permissions via <b>System Settings</b> -> <b>Privacy and Security</b>.
<br>
<img src="assets/mac_m3_accessibility.png" width="400" style="margin: 5px; border-radius: 10px;
border: 3px solid black;"><br>
<img src="assets/mac_m3_screenrecording.png" width="400" style="margin: 5px; border-radius: 10px;
border: 3px solid black;">
</li>
</ul>
</details>
<details>
<summary><b>Intel Macs</b></summary>
<ul>
<li>
Launch the app from the Applications folder.<br>
You might face the standard Mac <i>"Open Interface cannot be opened" error</i>.<br><br>
<img src="assets/macos_unverified_developer.png" width="200" style="border-radius: 10px;
border: 3px solid black;"><br>
In that case, press <b><i><ins>"Cancel"</ins></i></b>.<br>
Then go to <b>System Preferences -> Security and Privacy -> Open Anyway.</b><br><br>
<img src="assets/macos_system_preferences.png" width="100" style="border-radius: 10px;
border: 3px solid black;">
<img src="assets/macos_security.png" width="100" style="border-radius: 10px;
border: 3px solid black;">
<img src="assets/macos_open_anyway.png" width="400" style="border-radius: 10px;
border: 3px solid black;">
</li>
<br>
<li>
Open Interface will also need Accessibility access to operate your keyboard and mouse for you, and Screen Recording access to take screenshots to assess its progress.<br><br>
<img src="assets/macos_accessibility.png" width="400" style="margin: 5px; border-radius: 10px;
border: 3px solid black;"><br>
<img src="assets/macos_screen_recording.png" width="400" style="margin: 5px; border-radius: 10px;
border: 3px solid black;">
</li>
</ul>
</details>
<ul>
<li>Lastly, check out the <a href="#setup">Setup</a> section to connect Open Interface to LLMs, such as OpenAI GPT-4o.</li>
</ul>
</details>
<details>
<summary><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/TuxFlat.svg/640px-TuxFlat.svg.png" alt="Linux Logo" width="15" height="15"> <b>Linux</b></summary>
<ul>
<li>The Linux binary has been tested on Ubuntu 20.04 so far.</li>
<li>Download the Linux zip file from the latest <a href="https://github.com/AmberSahdev/Open-Interface/releases/latest">release</a>.</li>
<li>
Extract the executable and check out the <a href="https://github.com/AmberSahdev/Open-Interface?tab=readme-ov-file#setup">Setup</a> section to connect Open Interface to LLMs, such as OpenAI GPT-4o.</li>
</ul>
</details>
<details>
<summary><img src="https://upload.wikimedia.org/wikipedia/commons/5/5f/Windows_logo_-_2012.svg" alt="Windows Logo" width="15" height="15"> <b>Windows</b></summary>
<ul>
<li>The Windows binary has been tested on Windows 10.</li>
<li>Download the Windows zip file from the latest <a href="https://github.com/AmberSahdev/Open-Interface/releases/latest">release</a>.</li>
<li>Unzip the folder, move the exe to your desired location, and double-click it to open. Voila.</li>
<li>Check out the <a href="https://github.com/AmberSahdev/Open-Interface?tab=readme-ov-file#setup">Setup</a> section to connect Open Interface to LLMs, such as OpenAI GPT-4o.</li>
</ul>
</details>
<details>
<summary><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/1869px-Python-logo-notext.svg.png" alt="Python Logo" width="15" height="15"> <b>Run as a Script</b></summary>
<ul>
<li>Clone the repo <code>git clone https://github.com/AmberSahdev/Open-Interface.git</code></li>
<li>Enter the directory <code>cd Open-Interface</code></li>
<li><b>Optionally</b> use a Python virtual environment
<ul>
<li>Note: pyenv can handle the tkinter installation inconsistently, so you may need to debug it for your own system.</li>
<li><code>pyenv local 3.12.2</code></li>
<li><code>python -m venv .venv</code></li>
<li><code>source .venv/bin/activate</code></li>
</ul>
</li>
<li>Install dependencies <code>pip install -r requirements.txt</code></li>
<li>Run the app using <code>python app/app.py</code></li>
</ul>
</details>
### <ins id="setup">Setup</ins> 🛠️
<details>
<summary><b>Set up the OpenAI API key</b></summary>
- Get your OpenAI API key
    - Open Interface needs access to GPT-4o to perform user requests. API keys can be generated from your OpenAI account at [platform.openai.com/settings/organization/api-keys](https://platform.openai.com/settings/organization/api-keys).
    - [Follow the steps here](https://help.openai.com/en/articles/8264644-what-is-prepaid-billing) to add balance to your OpenAI account; a minimum payment of $5 is needed to unlock GPT-4o.
    - [More info](https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4)
- Save the API key in Open Interface settings
    - In Open Interface, go to the Settings menu on the top right and enter the key you received from OpenAI into the text field like so: <br>
      <br>
      <picture>
      <img src="assets/set_openai_api_key.png" align="middle" alt="Set API key in settings" width="400">
      </picture><br>
      <br>
    - After setting the API key for the first time, you'll need to <b>restart the app</b>.
</details>
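For context, the key unlocks vision-enabled chat-completion calls like the one below (a minimal sketch using the official `openai` Python SDK, not the app's internal code; the screenshot filename and prompt are placeholders):

```python
import base64
from openai import OpenAI  # pip install openai

client = OpenAI(api_key="sk-...")  # the same key you saved in Settings

# Encode a screenshot so it can ride along with the text goal.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What should I click next to finish this task?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```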
<details>
<summary><b>Set up the Google Gemini API key</b></summary>
- Go to Settings -> Advanced Settings and select the Gemini model you wish to use.
- Get your Google Gemini API key from https://aistudio.google.com/app/apikey.
- Save the API key in Open Interface settings.
- Save the settings and <b>restart the app</b>.
</details>
<details>
<summary><b>Optional: Setup a Custom LLM</b></summary>
- Open Interface supports other OpenAI-API-style LLM backends (such as LLaVA) and can be configured easily in the Advanced Settings window.
- Enter the custom base URL and model name in the Advanced Settings window, and the API key in the Settings window as needed (see the sketch after this section).
- NOTE - If you're using Llama:
    - You may need to enter a random string like "xxx" in the API key input box.
    - You may need to append /v1/ to the base URL.
<br>
<picture>
<img src="assets/advanced_settings.png" align="middle" alt="Advanced Settings window" width="400">
</picture><br>
<br>
- If your LLM does not support an OpenAI-style API, you can use a library like [LiteLLM](https://github.com/BerriAI/litellm) to expose one in front of it.
- You will need to restart the app after these changes.
</details>
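Since the backend speaks the OpenAI API style, pointing at a custom server amounts to overriding the client's base URL. A sketch of what the Advanced Settings values map to (the localhost URL, model name, and "xxx" key below are placeholders for whatever your server exposes):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # note the trailing /v1, as mentioned above for Llama
    api_key="xxx",                         # some local servers ignore the key but still require one
)
response = client.chat.completions.create(
    model="llava",                         # whatever model name your server serves
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```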
<hr>
### <ins>Stuff It’s Error-Prone At, For Now</ins> 😬
- Accurate spatial reasoning, and hence clicking buttons precisely.
- Keeping track of itself in tabular contexts like Excel and Google Sheets, for the same spatial-reasoning reasons.
- Navigating complex GUI-rich applications like Counter-Strike, Spotify, and GarageBand, due to its heavy reliance on cursor actions.
### <ins>The Future</ins> 🔮
(*with better models trained on video walkthroughs like Youtube tutorials*)
- "Create a couple of bass samples for me in Garage Band for my latest project."
- "Read this design document for a new feature, edit the code on Github, and submit it for review."
- "Find my friends' music taste from Spotify and create a party playlist for tonight's event."
- "Take the pictures from my Tahoe trip and make a White Lotus type montage in iMovie."
### <ins>Notes</ins> 📝
- Cost Estimation: $0.0005 - $0.002 per LLM request depending on the model used.<br>
(A user request can require anywhere between two and a few dozen LLM backend calls, depending on its complexity.)
- You can interrupt the app anytime by pressing the Stop button, or by dragging your cursor to any of the screen corners.
- Open Interface can only see your primary display when using multiple monitors. Therefore, if the cursor/focus is on a secondary screen, it might keep retrying the same actions as it is unable to see its progress.
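The corner abort works like the fail-safe in `pyautogui` (a sketch of that mechanism, assuming the simulated input goes through `pyautogui`; this README doesn't pin the exact library):

```python
import pyautogui

pyautogui.FAILSAFE = True  # the default: slamming the cursor into a screen corner aborts
try:
    pyautogui.moveTo(500, 500, duration=0.5)  # simulated mouse movement
    pyautogui.click()
except pyautogui.FailSafeException:
    print("Cursor hit a screen corner -- stopping all automation.")
```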
<hr>
### <ins>System Diagram</ins> 🖼️
```
+----------------------------------------------------+
|                        App                         |
|                                                    |
|  +-------+                                         |
|  |  GUI  |                                         |
|  +-------+                                         |
|      ^                                             |
|      |                                             |
|      v                                             |
|  +-----------+  (Screenshot + Goal)  +-----------+ |
|  |           | --------------------> |           | |
|  |   Core    |                       |    LLM    | |
|  |           | <-------------------- |  (GPT-4o) | |
|  +-----------+     (Instructions)    +-----------+ |
|        |                                           |
|        v                                           |
|  +---------------+                                 |
|  |  Interpreter  |                                 |
|  +---------------+                                 |
|        |                                           |
|        v                                           |
|  +---------------+                                 |
|  |   Executor    |                                 |
|  +---------------+                                 |
+----------------------------------------------------+
```
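The Interpreter/Executor split suggests the LLM's instructions arrive as structured data and get replayed as real input events. A hypothetical sketch of that hand-off (the JSON shape and the direct `pyautogui` mapping are illustrative, not the app's actual schema):

```python
import json
import pyautogui

# Hypothetical instruction payload from the LLM -- not Open Interface's real schema.
instructions = json.loads("""
[
  {"function": "moveTo", "parameters": {"x": 640, "y": 360}},
  {"function": "click",  "parameters": {}},
  {"function": "write",  "parameters": {"message": "hello", "interval": 0.05}}
]
""")

for step in instructions:
    func = getattr(pyautogui, step["function"])  # map instruction name to a pyautogui call
    func(**step["parameters"])                   # replay as a real keyboard/mouse event
```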
---
### <ins>Star History</ins> ⭐️
<picture>
<img src="https://api.star-history.com/svg?repos=AmberSahdev/Open-Interface&type=Date" alt="Star History" width="720">
</picture>
### <ins>Links</ins> 🔗
- Check out more of my projects at [AmberSah.dev](https://AmberSah.dev).
- Other demos and press kit can be found at [MEDIA.md](MEDIA.md).
<div align="center">
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/AmberSahdev/Open-Interface">
<a href="https://github.com/AmberSahdev"> <img alt="GitHub followers" src="https://img.shields.io/github/followers/AmberSahdev"> </a>
</div>
", Assign "at most 3 tags" to the expected json: {"id":"12729","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"