Trendshift - Ask AI

base on Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop. <div align="center"> <img src="./assets/xorbits-logo.png" width="180px" alt="xorbits" /> # Xorbits Inference: Model Serving Made Easy 🤖 <p align="center"> <a href="https://xinference.io/en">Xinference Enterprise</a> · <a href="https://inference.readthedocs.io/en/latest/getting_started/installation.html#installation">Self-hosting</a> · <a href="https://inference.readthedocs.io/">Documentation</a> </p> [![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/) [![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE) [![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main) [![Discord](https://img.shields.io/badge/join_Discord-5462eb.svg?logo=discord&style=for-the-badge&logoColor=%23f5f5f5)](https://discord.gg/Xw9tszSkr5) [![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=x&style=for-the-badge)](https://twitter.com/xorbitsio) <p align="center"> <a href="./README.md"><img alt="README in English" src="https://img.shields.io/badge/English-454545?style=for-the-badge"></a> <a href="./README_zh_CN.md"><img alt="简体中文版自述文件" src="https://img.shields.io/badge/中文介绍-d9d9d9?style=for-the-badge"></a> <a href="./README_ja_JP.md"><img alt="日本語のREADME" src="https://img.shields.io/badge/日本語-d9d9d9?style=for-the-badge"></a> </p> </div> <br /> Xorbits Inference(Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy and serve your or state-of-the-art built-in models using just a single command. Whether you are a researcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full potential of cutting-edge AI models. <div align="center"> <i><a href="https://discord.gg/Xw9tszSkr5">👉 Join our Discord community!</a></i> </div> ## 🔥 Hot Topics ### Framework Enhancements - [Xllamacpp](https://github.com/xorbitsai/xllamacpp): New llama.cpp Python binding, maintained by Xinference team, supports continuous batching and is more production-ready.: [#2997](https://github.com/xorbitsai/inference/pull/2997) - Distributed inference: running models across workers: [#2877](https://github.com/xorbitsai/inference/pull/2877) - VLLM enhancement: Shared KV cache across multiple replicas: [#2732](https://github.com/xorbitsai/inference/pull/2732) - Support Continuous batching for Transformers engine: [#1724](https://github.com/xorbitsai/inference/pull/1724) - Support MLX backend for Apple Silicon chips: [#1765](https://github.com/xorbitsai/inference/pull/1765) - Support specifying worker and GPU indexes for launching models: [#1195](https://github.com/xorbitsai/inference/pull/1195) - Support SGLang backend: [#1161](https://github.com/xorbitsai/inference/pull/1161) - Support LoRA for LLM and image models: [#1080](https://github.com/xorbitsai/inference/pull/1080) ### New Models - Built-in support for [Qwen3-Embedding](https://github.com/QwenLM/Qwen3-Embedding): [#3627](https://github.com/xorbitsai/inference/pull/3627) - Built-in support for [Minicpm4](https://github.com/OpenBMB/MiniCPM): [#3609](https://github.com/xorbitsai/inference/pull/3609) - Built-in support for [CogView4](https://github.com/THUDM/CogView4): [#3557](https://github.com/xorbitsai/inference/pull/3557) - Built-in support for [Deepseek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528): [#3539](https://github.com/xorbitsai/inference/pull/3539) - Built-in support for [Qwen3](https://qwenlm.github.io/blog/qwen3/): [#3347](https://github.com/xorbitsai/inference/pull/3347) - Built-in support for [Qwen2.5-Omni](https://github.com/QwenLM/Qwen2.5-Omni): [#3279](https://github.com/xorbitsai/inference/pull/3279) - Built-in support for [Skywork-OR1](https://github.com/SkyworkAI/Skywork-OR1): [#3274](https://github.com/xorbitsai/inference/pull/3274) - Built-in support for [GLM-4-0414](https://github.com/THUDM/GLM-4): [#3251](https://github.com/xorbitsai/inference/pull/3251) ### Integrations - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable. - [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on the LLM, offers out-of-the-box data processing and model invocation capabilities, allows for workflow orchestration through Flow visualization. - [RAGFlow](https://github.com/infiniflow/ragflow): is an open-source RAG engine based on deep document understanding. - [MaxKB](https://github.com/1Panel-dev/MaxKB): MaxKB = Max Knowledge Base, it is a chatbot based on Large Language Models (LLM) and Retrieval-Augmented Generation (RAG). - [Chatbox](https://chatboxai.app/): a desktop client for multiple cutting-edge LLM models, available on Windows, Mac and Linux. ## Key Features 🌟 **Model Serving Made Easy**: Simplify the process of serving large language, speech recognition, and multimodal models. You can set up and deploy your models for experimentation and production with a single command. ⚡️ **State-of-the-Art Models**: Experiment with cutting-edge built-in models using a single command. Inference provides access to state-of-the-art open-source models! 🖥 **Heterogeneous Hardware Utilization**: Make the most of your hardware resources with [ggml](https://github.com/ggerganov/ggml). Xorbits Inference intelligently utilizes heterogeneous hardware, including GPUs and CPUs, to accelerate your model inference tasks. ⚙️ **Flexible API and Interfaces**: Offer multiple interfaces for interacting with your models, supporting OpenAI compatible RESTful API (including Function Calling API), RPC, CLI and WebUI for seamless model management and interaction. 🌐 **Distributed Deployment**: Excel in distributed deployment scenarios, allowing the seamless distribution of model inference across multiple devices or machines. 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates with popular third-party libraries including [LangChain](https://python.langchain.com/docs/integrations/providers/xinference), [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window), [Dify](https://docs.dify.ai/advanced/model-configuration/xinference), and [Chatbox](https://chatboxai.app/). ## Why Xinference | Feature | Xinference | FastChat | OpenLLM | RayLLM | |------------------------------------------------|------------|----------|---------|--------| | OpenAI-Compatible RESTful API | ✅ | ✅ | ✅ | ✅ | | vLLM Integrations | ✅ | ✅ | ✅ | ✅ | | More Inference Engines (GGML, TensorRT) | ✅ | ❌ | ✅ | ✅ | | More Platforms (CPU, Metal) | ✅ | ✅ | ❌ | ❌ | | Multi-node Cluster Deployment | ✅ | ❌ | ❌ | ✅ | | Image Models (Text-to-Image) | ✅ | ✅ | ❌ | ❌ | | Text Embedding Models | ✅ | ❌ | ❌ | ❌ | | Multimodal Models | ✅ | ❌ | ❌ | ❌ | | Audio Models | ✅ | ❌ | ❌ | ❌ | | More OpenAI Functionalities (Function Calling) | ✅ | ❌ | ❌ | ❌ | ## Using Xinference - **Cloud </br>** We host a [Xinference Cloud](https://inference.top) service for anyone to try with zero setup. - **Self-hosting Xinference Community Edition</br>** Quickly get Xinference running in your environment with this [starter guide](#getting-started). Use our [documentation](https://inference.readthedocs.io/) for further references and more in-depth instructions. - **Xinference for enterprise / organizations</br>** We provide additional enterprise-centric features. [send us an email](mailto:[email protected]?subject=[GitHub]Business%20License%20Inquiry) to discuss enterprise needs. </br> ## Staying Ahead Star Xinference on GitHub and be instantly notified of new releases. ![star-us](assets/stay_ahead.gif) ## Getting Started * [Docs](https://inference.readthedocs.io/en/latest/index.html) * [Built-in Models](https://inference.readthedocs.io/en/latest/models/builtin/index.html) * [Custom Models](https://inference.readthedocs.io/en/latest/models/custom.html) * [Deployment Docs](https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html) * [Examples and Tutorials](https://inference.readthedocs.io/en/latest/examples/index.html) ### Jupyter Notebook The lightest way to experience Xinference is to try our [Jupyter Notebook on Google Colab](https://colab.research.google.com/github/xorbitsai/inference/blob/main/examples/Xinference_Quick_Start.ipynb). ### Docker Nvidia GPU users can start Xinference server using [Xinference Docker Image](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html). Prior to executing the installation command, ensure that both [Docker](https://docs.docker.com/get-docker/) and [CUDA](https://developer.nvidia.com/cuda-downloads) are set up on your system. ```bash docker run --name xinference -d -p 9997:9997 -e XINFERENCE_HOME=/data -v </on/your/host>:/data --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0 ``` ### K8s via helm Ensure that you have GPU support in your Kubernetes cluster, then install as follows. ``` # add repo helm repo add xinference https://xorbitsai.github.io/xinference-helm-charts # update indexes and query xinference versions helm repo update xinference helm search repo xinference/xinference --devel --versions # install xinference helm install xinference xinference/xinference -n xinference --version 0.0.1-v<xinference_release_version> ``` For more customized installation methods on K8s, please refer to the [documentation](https://inference.readthedocs.io/en/latest/getting_started/using_kubernetes.html). ### Quick Start Install Xinference by using pip as follows. (For more options, see [Installation page](https://inference.readthedocs.io/en/latest/getting_started/installation.html).) ```bash pip install "xinference[all]" ``` To start a local instance of Xinference, run the following command: ```bash $ xinference-local ``` Once Xinference is running, there are multiple ways you can try it: via the web UI, via cURL, via the command line, or via the Xinference’s python client. Check out our [docs]( https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html#run-xinference-locally) for the guide. ![web UI](assets/screenshot.png) ## Getting involved | Platform | Purpose | |-------------------------------------------------------------------------------------------------|---------------------------------------------| | [Github Issues](https://github.com/xorbitsai/inference/issues) | Reporting bugs and filing feature requests. | | [Discord](https://discord.gg/Xw9tszSkr5) | Collaborating with other Xinference users. | | [Twitter](https://twitter.com/xorbitsio) | Staying up-to-date on new features. | ## Citation If this work is helpful, please kindly cite as: ```bibtex @inproceedings{lu2024xinference, title = "Xinference: Making Large Model Serving Easy", author = "Lu, Weizheng and Xiong, Lingfeng and Zhang, Feng and Qin, Xuye and Chen, Yueguo", booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations", month = nov, year = "2024", address = "Miami, Florida, USA", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.emnlp-demo.30", pages = "291--300", } ``` ## Contributors <a href="https://github.com/xorbitsai/inference/graphs/contributors"> <img src="https://contrib.rocks/image?repo=xorbitsai/inference" /> </a> ## Star History [![Star History Chart](https://api.star-history.com/svg?repos=xorbitsai/inference&type=Date)](https://star-history.com/#xorbitsai/inference&Date)", Assign "at most 3 tags" to the expected json: {"id":"9054","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"

AI prompts

AI prompts