# Mistral Inference

<a target="_blank" href="https://colab.research.google.com/github/mistralai/mistral-inference/blob/main/tutorials/getting_started.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Official inference library for Mistral models. This repository contains minimal code to run Mistral models.

Blog 7B: [https://mistral.ai/news/announcing-mistral-7b/](https://mistral.ai/news/announcing-mistral-7b/)\
Blog 8x7B: [https://mistral.ai/news/mixtral-of-experts/](https://mistral.ai/news/mixtral-of-experts/)\
Blog 8x22B: [https://mistral.ai/news/mixtral-8x22b/](https://mistral.ai/news/mixtral-8x22b/)\
Blog Codestral 22B: [https://mistral.ai/news/codestral/](https://mistral.ai/news/codestral/)\
Blog Codestral Mamba 7B: [https://mistral.ai/news/codestral-mamba/](https://mistral.ai/news/codestral-mamba/)\
Blog Mathstral 7B: [https://mistral.ai/news/mathstral/](https://mistral.ai/news/mathstral/)\
Blog Nemo: [https://mistral.ai/news/mistral-nemo/](https://mistral.ai/news/mistral-nemo/)\
Blog Mistral Large 2: [https://mistral.ai/news/mistral-large-2407/](https://mistral.ai/news/mistral-large-2407/)\
Blog Pixtral 12B: [https://mistral.ai/news/pixtral-12b/](https://mistral.ai/news/pixtral-12b/)\
Blog Mistral Small 3.1: [https://mistral.ai/news/mistral-small-3-1/](https://mistral.ai/news/mistral-small-3-1/)

Discord: [https://discord.com/invite/mistralai](https://discord.com/invite/mistralai)\
Documentation: [https://docs.mistral.ai/](https://docs.mistral.ai/)\
Guardrailing: [https://docs.mistral.ai/usage/guardrailing](https://docs.mistral.ai/usage/guardrailing)

## Installation

Note: You will need a GPU to install `mistral-inference`, as it currently requires `xformers`, and `xformers` itself needs a GPU for installation.

### PyPI

```
pip install mistral-inference
```

### Local

```
cd $HOME && git clone https://github.com/mistralai/mistral-inference
cd $HOME/mistral-inference && poetry install
```
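Since `xformers` needs a GPU at install time, you may want to confirm that a CUDA-capable device is visible before installing. A minimal sketch, assuming PyTorch is already present in your environment:

```python
import torch

# xformers requires a GPU to install, so this should print True before you run `pip install mistral-inference`.
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "no GPU found")
```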
## Model download

### Direct links

| Name | Download | md5sum |
|-------------|-------|-------|
| 7B Instruct | https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar | `80b71fcb6416085bcb4efad86dfb4d52` |
| 8x7B Instruct | https://models.mistralcdn.com/mixtral-8x7b-v0-1/Mixtral-8x7B-v0.1-Instruct.tar (**Updated model coming soon!**) | `8e2d3930145dc43d3084396f49d38a3f` |
| 8x22B Instruct | https://models.mistralcdn.com/mixtral-8x22b-v0-3/mixtral-8x22B-Instruct-v0.3.tar | `471a02a6902706a2f1e44a693813855b` |
| 7B Base | https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-v0.3.tar | `0663b293810d7571dad25dae2f2a5806` |
| 8x7B | **Updated model coming soon!** | - |
| 8x22B | https://models.mistralcdn.com/mixtral-8x22b-v0-3/mixtral-8x22B-v0.3.tar | `a2fa75117174f87d1197e3a4eb50371a` |
| Codestral 22B | https://models.mistralcdn.com/codestral-22b-v0-1/codestral-22B-v0.1.tar | `1ea95d474a1d374b1d1b20a8e0159de3` |
| Mathstral 7B | https://models.mistralcdn.com/mathstral-7b-v0-1/mathstral-7B-v0.1.tar | `5f05443e94489c261462794b1016f10b` |
| Codestral-Mamba 7B | https://models.mistralcdn.com/codestral-mamba-7b-v0-1/codestral-mamba-7B-v0.1.tar | `d3993e4024d1395910c55db0d11db163` |
| Nemo Base | https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-base-2407.tar | `c5d079ac4b55fc1ae35f51f0a3c0eb83` |
| Nemo Instruct | https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar | `296fbdf911cb88e6f0be74cd04827fe7` |
| Mistral Large 2 | https://models.mistralcdn.com/mistral-large-2407/mistral-large-instruct-2407.tar | `fc602155f9e39151fba81fcaab2fa7c4` |

Note:
- **Important**:
  - `mixtral-8x22B-Instruct-v0.3.tar` is exactly the same as [Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1), only stored in `.safetensors` format
  - `mixtral-8x22B-v0.3.tar` is the same as [Mixtral-8x22B-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1), but has an extended vocabulary of 32768 tokens.
  - `codestral-22B-v0.1.tar` has a custom non-commercial license, called [Mistral AI Non-Production (MNPL) License](https://mistral.ai/licenses/MNPL-0.1.md)
  - `mistral-large-instruct-2407.tar` has a custom non-commercial license, called [Mistral AI Research (MRL) License](https://mistral.ai/licenses/MRL-0.1.md)
- All of the models listed above support function calling. For example, Mistral 7B Base/Instruct v3 is a minor update to Mistral 7B Base/Instruct v2, with the addition of function calling capabilities.
- The "coming soon" models will include function calling as well.
- You can download the previous versions of our models from our [docs](https://docs.mistral.ai/getting-started/open_weight_models/#downloading).

### From Hugging Face Hub

| Name | ID | URL |
|-------------|-------|-------|
| Pixtral Large Instruct | mistralai/Pixtral-Large-Instruct-2411 | https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411 |
| Pixtral 12B Base | mistralai/Pixtral-12B-Base-2409 | https://huggingface.co/mistralai/Pixtral-12B-Base-2409 |
| Pixtral 12B | mistralai/Pixtral-12B-2409 | https://huggingface.co/mistralai/Pixtral-12B-2409 |
| Mistral Small 3.1 24B Base | mistralai/Mistral-Small-3.1-24B-Base-2503 | https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503 |
| Mistral Small 3.1 24B Instruct | mistralai/Mistral-Small-3.1-24B-Instruct-2503 | https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503 |
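For the archives listed under "Direct links" above, you can optionally check file integrity against the md5sum column before extracting. A minimal sketch (the file name is just an example):

```python
import hashlib
from pathlib import Path


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 checksum of a file without loading it fully into memory."""
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Compare against the md5sum column of the "Direct links" table above.
print(file_md5("mistral-7B-Instruct-v0.3.tar"))
```

On most Linux systems, `md5sum <file>.tar` gives the same result from the command line.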
### Usage

**News!!!**: Mistral Large 2 is out. Read more about its capabilities [here](https://mistral.ai/news/mistral-large-2407/).

Create a local folder to store models:

```sh
export MISTRAL_MODEL=$HOME/mistral_models
mkdir -p $MISTRAL_MODEL
```

Download any of the above links and extract the content, *e.g.*:

```sh
export M12B_DIR=$MISTRAL_MODEL/12B_Nemo
wget https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar
mkdir -p $M12B_DIR
tar -xf mistral-nemo-instruct-2407.tar -C $M12B_DIR
```

or

```sh
export M8x7B_DIR=$MISTRAL_MODEL/8x7b_instruct
wget https://models.mistralcdn.com/mixtral-8x7b-v0-1/Mixtral-8x7B-v0.1-Instruct.tar
mkdir -p $M8x7B_DIR
tar -xf Mixtral-8x7B-v0.1-Instruct.tar -C $M8x7B_DIR
```

For Hugging Face models' weights, here is an example of how to download [Mistral Small 3.1 24B Instruct](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503):

```python
from pathlib import Path

from huggingface_hub import snapshot_download

mistral_models_path = Path.home().joinpath("mistral_models")
model_path = mistral_models_path / "mistral-small-3.1-instruct"
model_path.mkdir(parents=True, exist_ok=True)

repo_id = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
snapshot_download(
    repo_id=repo_id,
    allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
    local_dir=model_path,
)
```

## Usage

The following sections give an overview of how to run the models from the command-line interface (CLI) or directly within Python.

### CLI

- **Demo**

To test that a model works in your setup, you can run the `mistral-demo` command. *E.g.* the 12B Mistral-Nemo model can be tested on a single GPU as follows:

```sh
mistral-demo $M12B_DIR
```

Large models, such as **8x7B** and **8x22B**, have to be run in a multi-GPU setup. For these models, you can use the following command:

```sh
torchrun --nproc-per-node 2 --no-python mistral-demo $M8x7B_DIR
```

*Note*: Change `--nproc-per-node` to more GPUs if available.

- **Chat**

To interactively chat with the models, you can use the `mistral-chat` command.

```sh
mistral-chat $M12B_DIR --instruct --max_tokens 1024 --temperature 0.35
```

For large models, you can use `torchrun`.

```sh
torchrun --nproc-per-node 2 --no-python mistral-chat $M8x7B_DIR --instruct
```

*Note*: Change `--nproc-per-node` to more GPUs if necessary (*e.g.* for 8x22B).

- **Chat with Codestral**

To use [Codestral](https://mistral.ai/news/codestral/) as a coding assistant, you can run the following command using `mistral-chat`. Make sure `$M22B_CODESTRAL` is set to a valid path to the downloaded Codestral folder, e.g. `$HOME/mistral_models/Codestral-22B-v0.1`.

```sh
mistral-chat $M22B_CODESTRAL --instruct --max_tokens 256
```

If you prompt it with *"Write me a function that computes fibonacci in Rust"*, the model should generate something along the following lines:

```sh
Sure, here's a simple implementation of a function that computes the Fibonacci sequence in Rust. This function takes an integer `n` as an argument and returns the `n`th Fibonacci number.

fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn main() {
    let n = 10;
    println!("The {}th Fibonacci number is: {}", n, fibonacci(n));
}

This function uses recursion to calculate the Fibonacci number. However, it's not the most efficient solution because it performs a lot of redundant calculations. A more efficient solution would use a loop to iteratively calculate the Fibonacci numbers.
```

You can continue chatting afterwards, *e.g.* with *"Translate it to Python"*.
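For illustration only (a sketch, not verbatim model output), such a Python translation might look like the iterative version below, which avoids the redundant recursive calls the model points out:

```python
def fibonacci(n: int) -> int:
    """Iteratively compute the n-th Fibonacci number."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    n = 10
    print(f"The {n}th Fibonacci number is: {fibonacci(n)}")
```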
- **Chat with Codestral-Mamba**

To use [Codestral-Mamba](https://mistral.ai/news/codestral-mamba/) as a coding assistant, you can run the following command using `mistral-chat`. Make sure `$M7B_CODESTRAL_MAMBA` is set to a valid path to the downloaded Codestral-Mamba folder, e.g. `$HOME/mistral_models/mamba-codestral-7B-v0.1`.

You then need to install the following additional packages:

```
pip install packaging mamba-ssm causal-conv1d transformers
```

before you can start chatting:

```sh
mistral-chat $M7B_CODESTRAL_MAMBA --instruct --max_tokens 256
```

- **Chat with Mathstral**

To use [Mathstral](https://mistral.ai/news/mathstral/) as an assistant, you can run the following command using `mistral-chat`. Make sure `$M7B_MATHSTRAL` is set to a valid path to the downloaded Mathstral folder, e.g. `$HOME/mistral_models/mathstral-7B-v0.1`.

```sh
mistral-chat $M7B_MATHSTRAL --instruct --max_tokens 256
```

If you prompt it with *"Albert likes to surf every week. Each surfing session lasts for 4 hours and costs $20 per hour. How much would Albert spend in 5 weeks?"*, the model should answer with the correct calculation (4 hours × $20/hour × 5 weeks = $400).

You can then continue chatting afterwards, *e.g.* with *"How much would he spend in a year?"*.

- **Chat with Mistral Small 3.1 24B Instruct**

To use [Mistral Small 3.1 24B Instruct](https://mistral.ai/news/mistral-small-3-1/) as an assistant, you can run the following command using `mistral-chat`. Make sure `$MISTRAL_SMALL_3_1_INSTRUCT` is set to a valid path to the downloaded Mistral Small folder, e.g. `$HOME/mistral_models/mistral-small-3.1-instruct`.

```sh
mistral-chat $MISTRAL_SMALL_3_1_INSTRUCT --instruct --max_tokens 256
```

If you prompt it with *"The above image presents an image of which park ? Please give the hints to identify the park."* together with the image URL *https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png*, the model should answer that it is Yosemite park and give hints to identify it.

You can then continue chatting afterwards, *e.g.* with *"What is the name of the lake in the image?"*. The model should respond that it is not a lake but a river.

### Python

- *Instruction Following*:

```py
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.from_file("./mistral-nemo-instruct-v0.1/tekken.json")  # change to extracted tokenizer file
model = Transformer.from_folder("./mistral-nemo-instruct-v0.1")  # change to extracted model dir

prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."

completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=1024, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
```

- *Multimodal Instruction Following*:

```python
from pathlib import Path

from huggingface_hub import snapshot_download
from mistral_common.protocol.instruct.messages import ImageURLChunk, TextChunk
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

from mistral_inference.generate import generate
from mistral_inference.transformer import Transformer

model_path = Path.home().joinpath("mistral_models") / "mistral-small-3.1-instruct"  # change to extracted model dir
tokenizer = MistralTokenizer.from_file(model_path / "tekken.json")
model = Transformer.from_folder(model_path)

url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
prompt = "The above image presents an image of which park ? Please give the hints to identify the park."

user_content = [ImageURLChunk(image_url=url), TextChunk(text=prompt)]

tokens, images = tokenizer.instruct_tokenizer.encode_user_content(user_content, False)

out_tokens, _ = generate(
    [tokens],
    model,
    images=[images],
    max_tokens=256,
    temperature=0.15,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
result = tokenizer.decode(out_tokens[0])

print("Prompt:", prompt)
print("Completion:", result)
```

- *Function Calling*:

```py
from mistral_common.protocol.instruct.tool_calls import Function, Tool

# `tokenizer`, `model`, `generate` and `UserMessage` are reused from the instruction-following example above.
completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the user's location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
```
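For such a request, the decoded `result` is typically a JSON list of tool calls rather than free-form text. As a rough, hypothetical sketch (the exact output format depends on the model and tokenizer version), you could parse it and dispatch to a local implementation of the declared tool:

```python
import json


def get_current_weather(location: str, format: str) -> str:
    # Hypothetical local implementation of the tool declared in the request above.
    return f"The weather in {location} is mild today (unit: {format})."


# Assumes `result` (from the snippet above) contains a JSON list such as:
# [{"name": "get_current_weather", "arguments": {"location": "Paris, France", "format": "celsius"}}]
tool_calls = json.loads(result[result.index("["):])

for call in tool_calls:
    if call["name"] == "get_current_weather":
        args = call["arguments"]
        if isinstance(args, str):  # some versions serialize arguments as a JSON string
            args = json.loads(args)
        print(get_current_weather(**args))
```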
- *Fill-in-the-middle (FIM)*:

Make sure to have `mistral-common >= 1.2.0` installed:

```
pip install --upgrade mistral-common
```

You can simulate a code completion in-filling as follows.

```py
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.tokens.instruct.request import FIMRequest

tokenizer = MistralTokenizer.from_model("codestral-22b")
model = Transformer.from_folder("./mistral_22b_codestral")  # change to extracted model dir

prefix = """def add("""
suffix = """    return sum"""

request = FIMRequest(prompt=prefix, suffix=suffix)

tokens = tokenizer.encode_fim(request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

middle = result.split(suffix)[0].strip()
print(middle)
```

### Test

To run logits equivalence:

```
python -m pytest tests
```

## Deployment

The `deploy` folder contains code to build a [vLLM](https://github.com/vllm-project/vllm) image with the required dependencies to serve Mistral AI models. In the image, the [transformers](https://github.com/huggingface/transformers/) library is used instead of the reference implementation. To build it:

```bash
docker build deploy --build-arg MAX_JOBS=8
```

Instructions to run the image can be found in the [official documentation](https://docs.mistral.ai/quickstart).

## Model platforms

- Use Mistral models on [Mistral AI official API](https://console.mistral.ai/) (La Plateforme)
- Use Mistral models via [cloud providers](https://docs.mistral.ai/deployment/cloud/overview/)

## License

This library is licensed under the Apache 2.0 License. See the [LICENCE](./LICENCE) file for more information.

*You must not use this library or our models in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*