Trendshift - Ask AI

base on Raspberry Pi Voice Assistant # Pi-C.A.R.D <img src="assets/assistant.png" height="200"> <img src="assets/assistant-gpio.png" height="200"> ## Table of Contents - [Introduction](#introduction) - [Usage](#usage) - [Hardware](#hardware) - [Setup](#setup) - [Roadmap](#roadmap) **Note**: This project is constantly under development. If there are any issues, feel free to submit an issue or pull request and I can try to help! Otherwise, feel free to fork/clone/copy the repo and make your own changes! I hope with the newly added docker support it will be easier to set this up, and figure out ways to modify from there. I promise a video introduction is coming soon. One other thing to note is that due to llama cpp no longer actively supporting vision models, the camera functionality has been temporarily removed. ## Introduction Pi-Card is an AI powered assistant running entirely on a Raspberry Pi. It is capable of doing what a standard LLM (like ChatGPT) can do in a conversational setting. In addition, if there is a camera equipped, you can also ask Pi-card to take a photo, describe what it sees, and then ask questions about that image. ### Why Pi-card? Raspberry **Pi** - **C**amera **A**udio **R**ecognition **D**evice. <img src="assets/picard-facepalm.jpg" height="300"> Please submit an issue or pull request if you can think of a better way to force this acronym. ### How does it work? Pi-Card runs on your Raspberry Pi. **With a wake word**. Once the `main.py`, the system will listen for your wake word. Once your wake word has been said, you are officially in a conversation. Within this conversation you do not need to constantly repeat the wake word. The system will continue to listen for your commands until you say something like "stop", "exit", or "goodbye". More information on this / customization can be found in the `config.py` file. **With a button**. If you can get your hands on a breadboard, some wires, and a button, using a button to to handle the conversation is a much smoother (in my opinion) way to interact. This is done by pressing the button, and then speaking your command. The button is a simple GPIO button, and can be set up by following the instructions in the `main_button.py` file. The chatbot has a configurable memory of the conversation, meaning if you want the assistant to repeat something it said, or elaborate on a previous topic, you can do so. For quicker responses, you can set the memory to a smaller number in the `config.py` file. ### How useful is it? The system is designed to be a fun project that can be a _somewhat_ helpful AI assistant. Since everything is done locally, it will not be as capable, or as fast, as cloud based systems. However, in the past year significant strides have already been made in small LLM models, and it's likely that this will only continue to improve, meaning so too will this project! ### Why isn't this an app? The main reason for this is that I wanted to create a voice assistant that is completely offline and how efficient it can be on the relatively inexpensive hardware of a Raspberry Pi. I figure the hardest part of this project was to make it run fast, so if I can get it working like this, something similar could be done on increasingly more powerful hardware while only being faster. ## Usage After downloading the repository, installing the requirements, and following the other setup instructions, you can run the main program by running the following command: ```bash python main.py ``` or ```bash python main_button.py ``` Once the program is running, you can start a conversation with the assistant by saying the wake word. The default wake words are "raspberry", "barry", "razbear" (aka things that the transcription might have accidentally picked up), but you can change this in the `config.py` file to anything you want.If the button version is in place, you can press the button to start a conversation, or interrupt the assistant at any time. Please note that these options, like what the wake word is, what the GPIO button is, are all things that I set up for my own use. Feel free to change them! ## Setup ### Docker (recommended) To run the project in a docker container, you can use the following command: ```bash sudo docker-compose build sudo docker-compose up ``` This is a recent addition, so may not work perfectly. It also only works for the wake-word version, not sure how to get GPIO access passed to the container. ### Software To keep this system as fast and lean as possible, we use cpp implementations where possible. Some examples are [whisper.cpp](https://github.com/ggerganov/whisper.cpp) for audio transcription and [llama.cpp](https://github.com/ggerganov/llama.cpp) for the vision capabilities. Please clone these repositories wherever you like, and add its path to the `config.py` file. Once cloned, please follow the setup instructions to get the models running. Some pointers are given below: #### Tools To make pi-card a bit more like a real assistant, there are a couple tools it has access to. These are done through [tool-bert](https://huggingface.co/nkasmanoff/tool-bert2), a fine-tuned version of BERT deciding when to access external info. More info on how to make a version of this can be found [here](https://github.com/nkasmanoff/tool-bert) The model is easy to install, but to enable tool access, take a look at .env.example file for context on what keys and secrets are necessary. For whisper.cpp, you will need to follow the quick-start guide in the [README](https://github.com/ggerganov/whisper.cpp?tab=readme-ov-file#quick-start). ##### Vision Model If you can hook a camera up to your Raspberry Pi, you can enable the vision model. If you have it, you can ask pi-card to snap a photo, and describe what it sees. This is done by setting the `vision_model` to `vlm` in the `config.py` file. At the same time, you'll also need to download the appropriate models for this, which is Qwen2-VL-2B-Instruct. Since this model has a dynamic input image token size, by making the snapped photo smaller, we can speed up the inference time. There's a lot of potential for this and using VLMs on the Raspberry Pi. For info on how to download llama.cpp, the quants, and execute the code, please see [here](https://colab.research.google.com/drive/1RBb8Iw3GNWx2jhb3n7hKyHJHfHVICivz?usp=sharing). ### Hardware The hardware setup is quite simple. You will need a Raspberry Pi 5 Model B, a USB microphone, and a speaker. The USB microphone and speaker can be plugged into the Raspberry Pi's USB ports. The camera can be connected to the camera port on the Raspberry Pi. I used the following hardware for my setup: - [Raspberry Pi 5 Kit](https://www.amazon.com/dp/B0CRSNCJ6Y?psc=1&ref=ppx_yo2ov_dt_b_product_details) - [USB Microphone](https://www.amazon.com/dp/B087PTH787?psc=1&ref=ppx_yo2ov_dt_b_product_details) - [Speaker](https://www.amazon.com/dp/B075M7FHM1?ref=ppx_yo2ov_dt_b_product_details&th=1) - [Camera](https://www.amazon.com/dp/B012V1HEP4?ref=ppx_yo2ov_dt_b_product_details&th=1) - [Camera Connector](https://www.amazon.com/dp/B0716TB6X3?psc=1&ref=ppx_yo2ov_dt_b_product_details) - [Button](https://www.amazon.com/DIYables-Button-Arduino-ESP8266-Raspberry/dp/B0BXKN4TY6) - [Breadboard](https://www.amazon.com/dp/B09VKYLYN7?psc=1&ref=ppx_yo2ov_dt_b_product_details) Please note Pi 5's have a new camera port, hence the new camera connector. At the same time, while this project is focused on making this work on a Raspberry Pi 5, it should work on other devices as well. The camera connector is optional, but if you want to use the camera functionality, you will need to purchase one. For setting up the GPIO button, I found the first couple minutes of [this tutorial](https://youtu.be/IHvtJvgM_eQ?si=VZzhElu5yYTt7zcV) great. Feel free to use your own, this is what worked for me! ## Roadmap Coming soon, but I plan to add notes here on things currently implemented, and what can be done in the future. Some quick notes on it are below. I've started to make a notion board to keep track of things, but it's not complete yet. Check it out [here](https://marble-laugh-dd5.notion.site/14195743cced80229c3cddfd0cd5a750?v=a673ae0424b445d9983b71774a943b0f). - [x] Basic conversation capabilities - [x] Camera capabilities - [x] Benchmark response times - [x] Test overclocking - [x] Figure out how to speed up whisper times - [x] Add ability to interrupt assistant, and ask new question - [x] Use a custom tuned model for the assistant - [ ] Improved tutorials & videos - [x] Improve external service function model (tool-bert) - [x] Test when connected to a portable power source - [ ] Create optional model generation using entropix - [x] Dockerize repo for testing on more devices - [ ] Test in other languages - [ ] Add more external services ", Assign "at most 3 tags" to the expected json: {"id":"10159","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"

AI prompts

AI prompts