# Ollama Python Library

The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with [Ollama](https://github.com/ollama/ollama).

## Prerequisites

- [Ollama](https://ollama.com/download) should be installed and running
- Pull a model to use with the library: `ollama pull <model>`, e.g. `ollama pull gemma3`
  - See [Ollama.com](https://ollama.com/search) for more information on the models available.

## Install

```sh
pip install ollama
```

## Usage

```python
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])
# or access fields directly from the response object
print(response.message.content)
```

See [_types.py](ollama/_types.py) for more information on the response types.

## Streaming responses

Response streaming can be enabled by setting `stream=True`.

```python
from ollama import chat

stream = chat(
  model='gemma3',
  messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
  stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)
```

## Cloud Models

Run larger models by offloading to Ollama's cloud while keeping your local workflow.

- Supported models: `deepseek-v3.1:671b-cloud`, `gpt-oss:20b-cloud`, `gpt-oss:120b-cloud`, `kimi-k2:1t-cloud`, `qwen3-coder:480b-cloud`, `kimi-k2-thinking`

See [Ollama Models - Cloud](https://ollama.com/search?c=cloud) for more information.

### Run via local Ollama

1) Sign in (one-time):

```
ollama signin
```

2) Pull a cloud model:

```
ollama pull gpt-oss:120b-cloud
```

3) Make a request:

```python
from ollama import Client

client = Client()

messages = [
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
]

for part in client.chat('gpt-oss:120b-cloud', messages=messages, stream=True):
  print(part.message.content, end='', flush=True)
```

### Cloud API (ollama.com)

Access cloud models directly by pointing the client at `https://ollama.com`.

1) Create an API key from [ollama.com](https://ollama.com/settings/keys), then set:

```
export OLLAMA_API_KEY=your_api_key
```

2) (Optional) List the models available via the API:

```
curl https://ollama.com/api/tags
```

3) Generate a response via the cloud API:

```python
import os

from ollama import Client

client = Client(
  host='https://ollama.com',
  # Indexing (rather than .get) fails fast with a clear error if the key is unset
  headers={'Authorization': 'Bearer ' + os.environ['OLLAMA_API_KEY']},
)

messages = [
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
]

for part in client.chat('gpt-oss:120b', messages=messages, stream=True):
  print(part.message.content, end='', flush=True)
```

## Custom client

A custom client can be created by instantiating `Client` or `AsyncClient` from `ollama`. All extra keyword arguments are passed into the [`httpx.Client`](https://www.python-httpx.org/api/#client).

```python
from ollama import Client

client = Client(
  host='http://localhost:11434',
  headers={'x-some-header': 'some-value'},
)
response = client.chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
```
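Because extra keyword arguments are forwarded to `httpx.Client`, standard httpx options can be set the same way. A minimal sketch using `timeout` (the 10-second value here is an arbitrary illustrative choice, not a library default):

```python
from ollama import Client

# `timeout` is an httpx.Client option, forwarded as-is by the ollama Client;
# 10.0 seconds is an arbitrary illustrative value.
client = Client(
  host='http://localhost:11434',
  timeout=10.0,
)
```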
## Async client

The `AsyncClient` class is used to make asynchronous requests. It can be configured with the same fields as the `Client` class.

```python
import asyncio

from ollama import AsyncClient


async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  response = await AsyncClient().chat(model='gemma3', messages=[message])
  print(response.message.content)


asyncio.run(chat())
```

Setting `stream=True` modifies functions to return a Python asynchronous generator:

```python
import asyncio

from ollama import AsyncClient


async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  async for part in await AsyncClient().chat(model='gemma3', messages=[message], stream=True):
    print(part['message']['content'], end='', flush=True)


asyncio.run(chat())
```

## API

The Ollama Python library's API is designed around the [Ollama REST API](https://github.com/ollama/ollama/blob/main/docs/api.md).

### Chat

```python
ollama.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
```

### Generate

```python
ollama.generate(model='gemma3', prompt='Why is the sky blue?')
```

### List

```python
ollama.list()
```

### Show

```python
ollama.show('gemma3')
```

### Create

```python
ollama.create(model='example', from_='gemma3', system='You are Mario from Super Mario Bros.')
```

### Copy

```python
ollama.copy('gemma3', 'user/gemma3')
```

### Delete

```python
ollama.delete('gemma3')
```

### Pull

```python
ollama.pull('gemma3')
```

### Push

```python
ollama.push('user/gemma3')
```

### Embed

```python
ollama.embed(model='gemma3', input='The sky is blue because of Rayleigh scattering')
```

### Embed (batch)

```python
ollama.embed(model='gemma3', input=['The sky is blue because of Rayleigh scattering', 'Grass is green because of chlorophyll'])
```

### Ps

```python
ollama.ps()
```

## Errors

Errors are raised if requests return an error status or if an error is detected while streaming.

```python
import ollama

model = 'does-not-yet-exist'

try:
  ollama.chat(model)
except ollama.ResponseError as e:
  print('Error:', e.error)
  if e.status_code == 404:
    ollama.pull(model)
```
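Because streamed responses are consumed lazily, an error can also surface mid-iteration, so when streaming, the loop itself should sit inside the `try` block. A minimal sketch of the same pattern applied to a stream:

```python
from ollama import chat, ResponseError

try:
  stream = chat(
    model='gemma3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
  )
  for chunk in stream:
    print(chunk.message.content, end='', flush=True)
except ResponseError as e:
  # Raised for an error status on the request or an error detected mid-stream
  print('Error:', e.error)
```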
"},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"