Edge full-stack LLM platform, written in Rust.

⚠️ README outdated ([undergoing large refactor](https://github.com/llm-edge/hal-9100/tree/0.1)) ⚠️
<p align="center">
<img width="600" alt="hal-9100" src="https://github.com/llm-edge/hal-9100/assets/25003283/17c3792e-f191-48d7-9c77-7f39d8f94912">
<h1 align="center">🤖 HAL-9100</h1>
<h2 align="center">Build AI Assistants that don't need internet. Using OpenAI SDK. For production.</h2>
<h4 align="center">100% Private, 75% Cheaper & 23x Faster Assistants.</h4>
<p align="center">
<a href='https://codespaces.new/llm-edge/hal-9100?quickstart=1'><img src='https://github.com/codespaces/badge.svg' alt='Open in GitHub Codespaces' style='max-width: 100%;'></a>
<br />
<a href="https://discord.gg/pj5VRqqs84"><img alt="Join Discord" src="https://img.shields.io/discord/1066022656845025310?color=blue&style=for-the-badge"></a>
</p>
</p>
-----
<p align="center">
<a href="https://link.excalidraw.com/readonly/YSE7DNzB2LmEPfVdCqq3">🖼️ Infra</a>
<a href="https://github.com/llm-edge/hal-9100/issues/new?assignees=&labels=enhancement">✨ Feature?</a>
<a href="https://github.com/llm-edge/hal-9100/issues/new?assignees=&labels=bug">❤️🩹 Bug?</a>
<a href="https://cal.com/louis030195/applied-ai">📞 Help?</a>
</p>
-----
<!--
# ⭐️ Latest News
- [2024/01/19] 🔥 Added usage w ollama. Keep reading 👇.
- [2024/01/19] 🔥 Action tool. [Let your Assistant make requests to APIs](https://github.com/llm-edge/hal-9100/tree/main/examples/hello-world-mlc-llm-mistral-nodejs-action).
- [2023/12/19] 🔥 New example: Open source LLM with code interpreter. [Learn more](./examples/hello-world-code-interpreter-mixtral-nodejs/README.md).
- [2023/12/08] 🔥 New example: Open source LLM with function calling. [Learn more](./examples/hello-world-intel-neural-chat-nodejs-function-calling/README.md).
- [2023/11/29] 🔥 New example: Using mistral-7b, an open source LLM. [Check it out](./examples/hello-world-mistral-curl/README.md).
-->
# ✨ Key Features
- [x] **Code Interpreter**: Generates and runs Python code in a sandboxed environment autonomously. (beta)
- [x] **Knowledge Retrieval**: Retrieves external knowledge or documents autonomously.
- [x] **Function Calling**: Defines and executes custom functions autonomously.
- [x] **Actions**: Execute requests to external APIs autonomously.
- [x] **Files**: Supports a range of file formats.
- [x] **OpenAI compatible**: Works with the OpenAI (Assistants) SDK; see the sketch after this list.
<!--
- [x] **Enterprise production-ready**:
- [x] observability (metrics, errors, traces, logs, etc.)
- [x] scalability (serverless, caching, autoscaling, etc.)
- [x] security (encryption, authentication, authorization, etc.)
-->
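As a taste of how these features surface through the OpenAI-compatible API, here is a minimal sketch of declaring a function tool with the OpenAI Node SDK. The base URL, port, dummy API key, and the `getWeather` function are illustrative assumptions, not documented HAL-9100 behavior; check [docker/docker-compose.yml](./docker/docker-compose.yml) for the real port.

```javascript
import OpenAI from "openai";

// Assumptions: the base URL/port and dummy key are placeholders for a
// local HAL-9100 deployment, not documented defaults.
// (Run as an ES module, e.g. `node assistant.mjs`.)
const openai = new OpenAI({ baseURL: "http://localhost:3000", apiKey: "EMPTY" });

// Declare a hypothetical function the assistant may call autonomously.
const assistant = await openai.beta.assistants.create({
  model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
  instructions: "Use getWeather when the user asks about the weather.",
  tools: [
    {
      type: "function",
      function: {
        name: "getWeather",
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
});

console.log(`Created assistant ${assistant.id} with a function tool.`);
```

When a run later pauses with status `requires_action`, you execute the function yourself and hand the result back via `submitToolOutputs`, exactly as with OpenAI's hosted Assistants API.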
# 😃 Who is it for?
<img width="800" alt="hal-9100-2" src="https://github.com/llm-edge/hal-9100/assets/25003283/5a393d61-7a1d-4e06-8932-f822b18015ba">
- You want to increase customization (e.g. use your own models, extend the API, etc.)
- You work in a data-sensitive environment (healthcare, IoT, military, law, etc.)
- Your product has poor or no internet access (military, IoT, edge, extreme environments, etc.)
- (not our main focus) You operate on a large scale and want to reduce your costs
- (not our main focus) You operate on a large scale and want to increase your speed
# 🤖 Our definition of Software 3.0
First, our definition of **Software 3.0**, as it is a loaded term:
Software 3.0 is the bridge connecting the cognitive capabilities of Large Language Models with the practical needs of human digital activity. It is a comprehensive approach that allows LLMs to:
1. perform the same activities in the digital world as humans, or better
2. generally, allow the user to [perform more operations without conscious effort](https://third.software/)
# 📏 Principles
HAL-9100 is in continuous development, with the aim of always offering better infrastructure for **Edge Software 3.0**. To achieve this, it is based on several principles that define its functionality and scope.
<details>
<summary><strong>Less prompt is more</strong></summary>
<p>
As few prompts as possible should be hard-coded into the infrastructure: just enough to bridge the gap between **Software 1.0** and **Software 3.0**, while giving the client as much control as possible over the prompts.
</p>
</details>
<details>
<summary><strong>Edge-first</strong></summary>
<p>
By focusing on **open source LLMs**, HAL-9100 does not require internet access, which means you own your data and your models. It even runs on a Raspberry Pi (LLM included).
</p>
</details>
<details>
<summary><strong>OpenAI-compatible</strong></summary>
<p>
OpenAI put a great deal of top talent into designing this API, which makes for an incredible developer experience. Support for OpenAI's own LLMs is not a priority at all, though.
</p>
</details>
<details>
<summary><strong>Reliable and deterministic</strong></summary>
<p>
HAL-9100 focuses on reliability and on being as deterministic as possible by default. That's why everything has to be tested and benchmarked.
</p>
</details>
<details>
<summary><strong>Flexible</strong></summary>
<p>
A minimal number of hard-coded prompts and behaviors; a wide range of models, infrastructure components, and deployment options; and good interplay with the open-source ecosystem, while only integrating projects that have stood the test of time.
</p>
</details>
# Quickstart
Get started in less than a minute through GitHub Codespaces:
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/llm-edge/hal-9100?quickstart=1)
Or:
```bash
git clone https://github.com/llm-edge/hal-9100
cd hal-9100
```
To get started quickly, let's use the Anyscale API.
Get an API key from Anyscale [here](https://app.endpoints.anyscale.com/credentials), then replace the `model_api_key` in [hal-9100.toml](./hal-9100.toml) with it.
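The relevant lines should look roughly like this; this is a hypothetical sketch, and both the exact layout of `hal-9100.toml` and the Anyscale endpoint URL shown are assumptions:

```toml
# Sketch only -- check hal-9100.toml for the actual layout.
model_url = "https://api.endpoints.anyscale.com/v1/chat/completions"  # assumed Anyscale endpoint
model_api_key = "<your-anyscale-api-key>"
```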
<details>
<summary>Usage w/ ollama</summary>
<p>
1. set `model_url = "http://localhost:11434/v1/chat/completions"` in [hal-9100.toml](./hal-9100.toml)
2. set the model to `gemma:2b` in [examples/quickstart.js](./examples/quickstart.js)
3. run `ollama run gemma:2b & docker compose --profile api -f docker/docker-compose.yml up`
</p>
</details>
Install the OpenAI SDK: `npm i openai`
Start the infra:
```bash
docker compose --profile api -f docker/docker-compose.yml up
```
Run the [quickstart](./examples/quickstart.js):
```bash
node examples/quickstart.js
```
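If you want to see the shape of the code before opening the file, here is a minimal sketch of the Assistants flow, assuming HAL-9100 listens on `http://localhost:3000` and accepts any non-empty API key locally; both are assumptions, so check [docker/docker-compose.yml](./docker/docker-compose.yml) and [examples/quickstart.js](./examples/quickstart.js) for the real values.

```javascript
import OpenAI from "openai";

// Assumptions: base URL/port and dummy key are placeholders, not documented defaults.
const openai = new OpenAI({ baseURL: "http://localhost:3000", apiKey: "EMPTY" });

async function main() {
  // 1. Create an assistant backed by your open-source model.
  const assistant = await openai.beta.assistants.create({
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
    instructions: "You are a helpful assistant.",
  });

  // 2. Create a thread and add a user message.
  const thread = await openai.beta.threads.create();
  await openai.beta.threads.messages.create(thread.id, {
    role: "user",
    content: "Hello, HAL. Do you read me?",
  });

  // 3. Start a run and poll until it finishes.
  let run = await openai.beta.threads.runs.create(thread.id, {
    assistant_id: assistant.id,
  });
  while (run.status === "queued" || run.status === "in_progress") {
    await new Promise((r) => setTimeout(r, 500));
    run = await openai.beta.threads.runs.retrieve(thread.id, run.id);
  }

  // 4. Read the assistant's reply (first text part of the latest message).
  const messages = await openai.beta.threads.messages.list(thread.id);
  console.log(messages.data[0].content[0].text.value);
}

main();
```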
# 🤔 FAQ
<details>
<summary>Is there a hosted version?</summary>
No. HAL-9100 is not a hosted service; it's software that you deploy on your own infrastructure. We can help you deploy it. [Contact us](https://cal.com/louis030195/applied-ai).
</details>
<details>
<summary>Which LLM API can I use?</summary>
Examples of OpenAI-compatible LLM APIs that you can use:
- ollama
- [MLC-LLM](https://github.com/mlc-ai/mlc-llm)
- [FastChat (good if you have a mac)](https://github.com/llm-edge/hal-9100/tree/main/examples/hello-world-mistral-curl)
- [vLLM (good if you have a modern gpu)](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server)
- [Perplexity API](https://github.com/llm-edge/hal-9100/tree/main/examples/hello-world-code-interpreter-mixtral-nodejs)
- Mistral API
- Anyscale
- Together AI
We recommend these models:
- mistralai/Mixtral-8x7B-Instruct-v0.1
- mistralai/mistral-7b
Other models have not been extensively tested and may not work as expected, but you can try them; the sketch below shows a quick way to smoke-test a backend.
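A quick way to smoke-test a backend before wiring it into `hal-9100.toml` is to hit its OpenAI-compatible endpoint directly with the OpenAI SDK. For example, against a local ollama (which serves an OpenAI-compatible API under `/v1`):

```javascript
import OpenAI from "openai";

// ollama ignores the API key, but the SDK requires a non-empty one.
const openai = new OpenAI({ baseURL: "http://localhost:11434/v1", apiKey: "ollama" });

const res = await openai.chat.completions.create({
  model: "gemma:2b",
  messages: [{ role: "user", content: "Say hello in one word." }],
});
console.log(res.choices[0].message.content);
```

If this round-trips, the same `model_url` should work in HAL-9100's configuration.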
</details>
<details>
<summary>What's the difference with LangChain?</summary>
1. LangChain spans proprietary and open-source LLMs, among the thousands of other things it covers. HAL-9100 is laser-focused on Software 3.0 for the edge.
2. You can write AI products in 50 lines of code instead of 5,000, without having to learn a whole new abstraction.
</details>
<details>
<summary>Are you related to OpenAI?</summary>
No.
</details>
<details>
<summary>I don't use Assistants API. Can I use this?</summary>
We recommend switching to the Assistants API for a more streamlined experience, allowing you to focus more on your product than on infrastructure.
</details>
<details>
<summary>Does the Assistants API support audio and images?</summary>
Not yet. Image support is coming soon; we're working on it.
</details>
", Assign "at most 3 tags" to the expected json: {"id":"6221","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"