Trendshift - Ask AI

Based on A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM

# 🚀 RAG on Windows using TensorRT-LLM, NVIDIA NIM and LlamaIndex 🦙

ChatRTX is a demo app that lets you personalize a GPT large language model (LLM) connected to your own content: docs, notes, photos. Leveraging retrieval-augmented generation (RAG), [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/), [NVIDIA NIM microservices](https://docs.nvidia.com/nim/index.html) and RTX acceleration, you can query a custom chatbot to quickly get contextually relevant answers. The app also accepts queries by voice. As it all runs locally on your Windows RTX PC, you’ll get fast and secure results.

ChatRTX supports various file formats, including text, pdf, doc/docx, xml, png, jpg, bmp. Simply point the application at the folder containing your files and it'll load them into the library in a matter of seconds.

ChatRTX supports the following AI models:

| Model | Supported GPUs |
|-------|----------------|
| Llama 3.1 8B NIM | RTX 6000 Ada, RTX GPUs 4080, 4090, 5080, 5090 |
| RIVA Parakeet 0.6B NIM (for supporting voice input) | RTX 6000 Ada, RTX GPUs 4080, 4090, 5080, 5090 |
| CLIP (for images) | RTX 6000 Ada, RTX 3xxx, RTX 4xxx, RTX 5080, RTX 5090 |
| Whisper Medium (for supporting voice input) | RTX 6000 Ada, RTX 3xxx and RTX 4xxx series GPUs that have at least 8GB of GPU memory |
| Mistral 7B | RTX 6000 Ada, RTX 3xxx and RTX 4xxx series GPUs that have at least 8GB of GPU memory |
| ChatGLM3 6B | RTX 6000 Ada, RTX 3xxx and RTX 4xxx series GPUs that have at least 8GB of GPU memory |
| Llama 2 13B | RTX 6000 Ada, RTX 3xxx and RTX 4xxx series GPUs that have at least 16GB of GPU memory |
| Gemma 7B | RTX 6000 Ada, RTX 3xxx and RTX 4xxx series GPUs that have at least 16GB of GPU memory |

The pipeline incorporates the above AI models, [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/), [LlamaIndex](https://www.llamaindex.ai/) and the [FAISS](https://github.com/facebookresearch/faiss) vector search library. In the sample application here, we have a dataset consisting of recent articles sourced from [NVIDIA GeForce News](https://www.nvidia.com/en-us/geforce/news/).

### What is RAG? 🔍

Retrieval-augmented generation (RAG) is a technique for large language models (LLMs) that seeks to enhance prediction accuracy by connecting the LLM to your data during inference. This approach constructs a comprehensive prompt enriched with context, historical data, and recent or relevant knowledge.

### Repository details

- ChatRTX_APIs: ChatRTX APIs allow developers to seamlessly integrate their applications with the TensorRT-LLM powered inference engine and utilize the various AI models supported by ChatRTX. This integration enables developers to incorporate advanced AI inference and RAG features into their applications. These APIs serve as the foundation for the ChatRTX application. More details in ChatRTX_APIs [directory](./ChatRTX_APIs/README.md).
- ChatRTX_App: ChatRTX_App is a demo application that is built on top of ChatRTX APIs using an Electron container. The UI is built in React with Material UI libraries. More details about how to build the UI are in ChatRTX_App [directory](./ChatRTX_App/README.md).

## Getting Started

### Hardware requirement

- NVIDIA GeForce RTX 5090 or 5080 GPU or NVIDIA RTX 6000 Ada or NVIDIA GeForce RTX 30 or 40 Series GPU with at least 8GB of VRAM
- Windows 11 23H2 or 24H2
- Driver 572.16 or later

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
", Assign "at most 3 tags" to the expected json: {"id":"6864","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui"},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine"},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"
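
The constraints in the prompt above (an `id`, at most 3 tags, tags drawn only from the provided list) can be checked mechanically on whatever reply comes back. The following is a minimal sketch, not part of the original prompt: the function name `validate_tag_response` and the `ALLOWED_TAGS` sample are hypothetical, and it assumes each returned tag is an object copied from the list (for example `{"id": 17, "name": "ai"}`), which the prompt does not state explicitly.

```python
import json


def validate_tag_response(raw, allowed_tags, expected_id, max_tags=3):
    """Parse a model reply and check it against the prompt's constraints.

    Assumption: each returned tag is an object copied from the allowed list,
    e.g. {"id": 17, "name": "ai"}; the prompt itself does not pin this down.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    if str(data.get("id")) != str(expected_id):
        raise ValueError("reply id does not match the requested id")
    tags = data.get("tags")
    if not isinstance(tags, list):
        raise ValueError('"tags" must be a list')
    if len(tags) > max_tags:
        raise ValueError(f"at most {max_tags} tags are allowed")
    # Strip names so stray whitespace in the list does not cause false rejections.
    allowed = {(t["id"], t["name"].strip()) for t in allowed_tags}
    for tag in tags:
        if not isinstance(tag, dict):
            raise ValueError("each tag must be an object")
        key = (tag.get("id"), str(tag.get("name", "")).strip())
        if key not in allowed:
            raise ValueError(f"tag not in the provided list: {tag!r}")
    return data


# A few entries from the tag list in the prompt, enough for a demonstration.
ALLOWED_TAGS = [
    {"id": 17, "name": "ai"},
    {"id": 21, "name": "chat"},
    {"id": 53, "name": "desktop-app"},
    {"id": 24, "name": "api"},
]

reply = '{"id":"6864","tags":[{"id":17,"name":"ai"},{"id":21,"name":"chat"}]}'
result = validate_tag_response(reply, ALLOWED_TAGS, "6864")
print(result["tags"])
```

Rejecting a reply outright, instead of silently dropping unknown tags, keeps the check strict; a caller that prefers leniency could filter the list instead.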
