AI prompts
base on # local-llm
Run LLMs locally on Cloud Workstations. Uses:
- Quantized models from 🤗
- [llama-cpp-python's webserver][web-server]
In this guide:
* [Running as a Cloud Workstation](#running-as-a-cloud-workstation)
* [local-llm commands](#local-llm-commands)
* [Running locally](#running-locally)
* [Debugging](#debugging)
* [LLM disclaimer](#llm-disclaimer)
## Running as a [Cloud Workstation][cw]
This repository includes a [Dockerfile](./Dockerfile) that can be used to create a [custom base image][cw-custom]
for a Cloud Workstation environment that includes the `llm` tool.
To get started, you'll need to have a [GCP Project][gcp] and have the `gcloud` CLI [installed][gcloud].
1. Set environment variables
1. Set the `PROJECT_ID` and `PROJECT_NUM` environment variables from your GCP project. You must modify the values.
```shell
export PROJECT_ID=<project-id>
export PROJECT_NUM=<project-num>
```
1. Set other needed environment variables. You can modify the values.
```shell
export REGION=us-central1
export LOCALLLM_REGISTRY=localllm-registry
export LOCALLLM_IMAGE_NAME=localllm
export LOCALLLM_CLUSTER=localllm-cluster
export LOCALLLM_WORKSTATION=localllm-workstation
export LOCALLLM_PORT=8000
```
1. Set the default project.
```shell
gcloud config set project $PROJECT_ID
```
1. Enable needed services.
```shell
gcloud services enable \
cloudbuild.googleapis.com \
workstations.googleapis.com \
container.googleapis.com \
containeranalysis.googleapis.com \
containerscanning.googleapis.com \
artifactregistry.googleapis.com
```
1. Create an Artifact Registry repository for docker images.
```shell
gcloud artifacts repositories create $LOCALLLM_REGISTRY \
--location=$REGION \
--repository-format=docker
```
1. Build and push the image to Artifact Registry using Cloud Build. Details are in [cloudbuild.yaml](cloudbuild.yaml).
```shell
gcloud builds submit . \
--substitutions=_IMAGE_REGISTRY=$LOCALLLM_REGISTRY,_IMAGE_NAME=$LOCALLLM_IMAGE_NAME
```
1. Configure a Cloud Workstation cluster.
**Wait for this to complete before moving forward** which can take
[up to 20 minutes](https://cloud.google.com/workstations/docs/create-cluster#workstation-cluster).
```shell
gcloud workstations clusters create $LOCALLLM_CLUSTER \
--region=$REGION
```
1. Create a Cloud Workstation configuration. We suggest using a machine type of e2-standard-32 which has 32 vCPU, 16
core and 128 GB memory.
```shell
gcloud beta workstations configs create $LOCALLLM_WORKSTATION \
--region=$REGION \
--cluster=$LOCALLLM_CLUSTER \
--machine-type=e2-standard-32 \
--container-custom-image=us-central1-docker.pkg.dev/${PROJECT_ID}/${LOCALLLM_REGISTRY}/${LOCALLLM_IMAGE_NAME}:latest
```
1. Create a Cloud Workstation.
```shell
gcloud workstations create $LOCALLLM_WORKSTATION \
--cluster=$LOCALLLM_CLUSTER \
--config=$LOCALLLM_WORKSTATION \
--region=$REGION
```
1. Grant access to the default Cloud Workstation service account.
```shell
gcloud artifacts repositories add-iam-policy-binding $LOCALLLM_REGISTRY \
--location=$REGION \
--member=serviceAccount:service-$PROJECT_NUM@gcp-sa-workstationsvm.iam.gserviceaccount.com \
--role=roles/artifactregistry.reader
```
1. Start the workstation.
```shell
gcloud workstations start $LOCALLLM_WORKSTATION \
--cluster=$LOCALLLM_CLUSTER \
--config=$LOCALLLM_WORKSTATION \
--region=$REGION
```
1. Connect to the workstation using ssh. Alternatively, you can connect to the workstation
[interactively][launch-workstation] in the browser.
```bash
gcloud workstations ssh $LOCALLLM_WORKSTATION \
--cluster=$LOCALLLM_CLUSTER \
--config=$LOCALLLM_WORKSTATION \
--region=$REGION
```
1. Start serving the default model from the repo.
```shell
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF $LOCALLLM_PORT
```
1. Get the hostname of the workstation using:
```bash
gcloud workstations describe $LOCALLLM_WORKSTATION \
--cluster=$LOCALLLM_CLUSTER \
--config=$LOCALLLM_WORKSTATION \
--region=$REGION
```
1. Interact with the model by visiting the live OpenAPI documentation page: `https://$LOCALLLM_PORT-$LLM_HOSTNAME/docs`.
## local-llm commands
[!NOTE] The command is now `local-llm`, however the original command (`llm`) is supported
inside of the [cloud workstations image](#running-as-a-cloud-workstation).
Assumes that models are downloaded to `~/.cache/huggingface/hub/`. This is the default cache
path used by Hugging Face Hub [library][hf-hub] and only supports `.gguf` files.
If you're using models from TheBloke and you don't specify a filename, we'll attempt to use
the model with 4 bit medium quantization, or you can specify a filename explicitly.
1. List downloaded models.
```shell
local-llm list
```
1. List running models.
```shell
local-llm ps
```
1. Start serving models.
1. Start serving the default model from the repo. Download if not present.
```shell
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000
```
1. Start serving a specific model. Download if not present.
```shell
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF --filename llama-2-13b-ensemble-v5.Q4_K_S.gguf 8000
```
1. Stop serving models.
1. Stop serving all models from the repo.
```shell
local-llm kill TheBloke/Llama-2-13B-Ensemble-v5-GGUF
```
1. Stop serving a specific model.
```shell
local-llm kill TheBloke/Llama-2-13B-Ensemble-v5-GGUF --filename llama-2-13b-ensemble-v5.Q4_K_S.gguf
```
1. Download models.
1. Download the default model from the repo.
```shell
local-llm pull TheBloke/Llama-2-13B-Ensemble-v5-GGUF
```
1. Download a specific model from the repo.
```shell
local-llm pull TheBloke/Llama-2-13B-Ensemble-v5-GGUF --filename llama-2-13b-ensemble-v5.Q4_K_S.gguf
```
1. Remove models.
1. Remove all models downloaded from the repo.
```shell
local-llm rm TheBloke/Llama-2-13B-Ensemble-v5-GGUF
```
1. Remove a specific model from the repo.
```shell
local-llm rm TheBloke/Llama-2-13B-Ensemble-v5-GGUF --filename llama-2-13b-ensemble-v5.Q4_K_S.gguf
```
## Running locally
1. Install the tools.
```shell
# Install the tools
pip3 install openai
pip3 install ./local-llm/.
```
1. Download and run a model.
```shell
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000
```
1. Try out a query. The default query is for a haiku about cats.
```shell
python3 querylocal.py
```
1. Interact with the Open API interface via the `/docs` extension. For the above, visit http://localhost:8000/docs.
## Debugging
To assist with debugging, you can configure model startup to write logs to a log file by providing a yaml
python logging configuration file:
```
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000 --log-config <some config file>
```
To run locally using the bundled log config
([log_config.yaml](../local-llm/log_config.yaml)):
```
sudo touch /var/log/local-llm.log
sudo chown user:user /var/log/local-llm.log # use your user and group
# provide the log config manually
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000 --log-config local-llm/log_config.yaml
# or use an environment variable so you don't have to pass the argument
export LOG_CONFIG=$(pip show local-llm | grep Location | awk '{print $2}')/log_config.yaml
local-llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000
```
You can follow the logs with:
```
tail -f /var/log/local-llm.log
```
If you are running multiple models, the logs from each will be written to the same file and interleaved.
If running from Cloud Workstations, logs from running models will be written to `/var/log/local-llm.log`
([log_config.yaml](../local-llm/log_config.yaml) is provided by default via the environment variable
`LOG_CONFIG` within the image).
## LLM Disclaimer
This project imports freely available LLMs and makes them available from [Cloud Workstations][cw]. We recommend
independently verifying any content generated by the models. We do not assume any responsibility or liability for
the use or interpretation of generated content.
[gcp]: https://cloud.google.com/docs/get-started
[gcloud]: https://cloud.google.com/sdk/docs/install
[cw]: https://cloud.google.com/workstations
[cw-custom]: https://cloud.google.com/workstations/docs/customize-container-images
[launch-workstation]: https://cloud.google.com/workstations/docs/create-workstation#launch_a_workstation
[hf-hub]: https://github.com/huggingface/huggingface_hub
[web-server]: https://github.com/abetlen/llama-cpp-python#web-server
", Assign "at most 3 tags" to the expected json: {"id":"7653","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"