# llama4micro 🦙🔬
A "large" language model running on a microcontroller.

## Background
I was wondering if it's possible to fit a non-trivial language model on a microcontroller. Turns out the answer is some version of yes! (Later, things got a bit out of hand and now the prompt is based on objects detected by the camera.)
This project uses the [Coral Dev Board Micro](https://coral.ai/products/dev-board-micro) with its [FreeRTOS toolchain](https://coral.ai/docs/dev-board-micro/freertos/). The board has a number of neat [hardware features](https://coral.ai/docs/dev-board-micro/get-started/#the-hardware), but – most importantly for our purposes – it has 64 MB of RAM. That's tiny for LLMs, which are typically measured in gigabytes, but comparatively huge for a microcontroller.
The LLM implementation itself is an adaptation of [llama2.c](https://github.com/karpathy/llama2.c) and uses the [tinyllamas](https://huggingface.co/karpathy/tinyllamas/tree/main) checkpoints trained on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset. The quality of the smaller model versions isn't ideal, but it's good enough to generate somewhat coherent (and occasionally weird) stories.
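
To get a feel for the memory budget, here is a rough, weight-only estimate. It is a sketch under stated assumptions, not a measurement from this project: the parameter counts are the nominal figures in the tinyllamas checkpoint names, "64 MB" is taken as 64 MiB, and the KV cache, activations, and firmware are ignored. The names `RAM_BYTES`, `CHECKPOINTS`, `weight_bytes`, and `fits` are made up for illustration.

```python
# Rough weight-only memory estimate. All figures are nominal assumptions.
RAM_BYTES = 64 * 1024 * 1024  # board RAM, taking "64 MB" as 64 MiB

# Nominal parameter counts, read off the tinyllamas checkpoint names.
CHECKPOINTS = {
    "stories15M": 15_000_000,
    "stories42M": 42_000_000,
    "stories110M": 110_000_000,
}

# Bytes per weight for two storage formats (quantization scales ignored).
BYTES_PER_WEIGHT = {"float32": 4, "int8": 1}


def weight_bytes(params, dtype):
    """Return the bytes needed to store `params` weights in `dtype`."""
    return params * BYTES_PER_WEIGHT[dtype]


def fits(params, dtype, ram=RAM_BYTES):
    """Return True if the weights alone are smaller than the RAM size."""
    return weight_bytes(params, dtype) < ram


for name, params in CHECKPOINTS.items():
    for dtype in BYTES_PER_WEIGHT:
        megabytes = weight_bytes(params, dtype) / 1e6
        print(f"{name:12s} {dtype:8s} {megabytes:7.1f} MB  fits={fits(params, dtype)}")
```

Even where the weights nominally fit, the remaining headroom has to cover everything else running on the board, so treat a `True` here as necessary rather than sufficient.
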
> [!NOTE]
> Language model inference runs on the 800 MHz [Arm Cortex-M7](https://developer.arm.com/Processors/Cortex-M7) CPU core. Object detection on camera images uses the [Edge TPU](https://coral.ai/technology/) and a [compiled](https://coral.ai/docs/edgetpu/compiler/) [YOLOv5 model](https://github.com/ultralytics/yolov5). The board also has a second 400 MHz [Arm Cortex-M4](https://developer.arm.com/Processors/Cortex-M4) CPU core, which is currently unused.
## Setup
Clone this repo with its submodules [`karpathy/llama2.c`](https://github.com/karpathy/llama2.c), [`google-coral/coralmicro`](https://github.com/google-coral/coralmicro), and [`ultralytics/yolov5`](https://github.com/ultralytics/yolov5).
```bash
git clone --recurse-submodules https://github.com/maxbbraun/llama4micro.git
cd llama4micro
```
The pre-trained models are in the [`models/`](models/) directory. Refer to the [instructions](models/README.md) on how to download and convert them.
Build the image:
```bash
mkdir build
cd build
cmake ..
make -j
```
Flash the image:
```bash
python3 -m venv venv
. venv/bin/activate
pip install -r ../coralmicro/scripts/requirements.txt
python ../coralmicro/scripts/flashtool.py \
--build_dir . \
--elf_path llama4micro
```
## Usage
1. The models load automatically when the board powers up.
   - This takes ~7 seconds.
   - The green light will turn on when ready.
2. Point the camera at an object and press the button.
   - The green light will turn off.
   - The camera will take a picture and detect an object.
3. The model now generates tokens starting with a prompt based on the object.
   - The results are streamed to the serial port.
   - This happens at a rate of ~2.5 tokens per second.
4. Generation stops after the end token or maximum steps.
   - The green light will turn on again.
   - Goto 2.
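
The stopping rule in step 4 and the rate from step 3 can be sketched as follows. This is a host-side illustration, not the firmware code: `END_TOKEN`, `MAX_STEPS = 256`, `generate`, and `estimated_seconds` are hypothetical names and values chosen for the example, and the token source is a canned list rather than a real model.

```python
TOKENS_PER_SECOND = 2.5  # approximate rate quoted above
END_TOKEN = "<end>"      # hypothetical end-of-sequence marker
MAX_STEPS = 256          # hypothetical step limit, for illustration only


def generate(next_token, max_steps):
    """Yield tokens until the end token appears or max_steps is reached."""
    for _ in range(max_steps):
        token = next_token()
        if token == END_TOKEN:
            return
        yield token


def estimated_seconds(num_tokens, rate=TOKENS_PER_SECOND):
    """Return the approximate wall-clock time to emit num_tokens."""
    return num_tokens / rate


# Stand-in for the model: a canned token stream that ends early.
canned = iter(["Once", "upon", "a", "time", END_TOKEN, "ignored"])
story = list(generate(lambda: next(canned), MAX_STEPS))
print(" ".join(story))
print(f"worst case: {estimated_seconds(MAX_STEPS):.1f} s for {MAX_STEPS} tokens")
```

At ~2.5 tokens per second, a run that hits the step limit takes on the order of minutes rather than seconds, which is why the output is streamed as it is produced.
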
", Assign "at most 3 tags" to the expected json: {"id":"5053","tags":[]} "only from the tags list I provide: []" returns me the "expected json"