<div align='center'>
# Deploy AI models and inference pipelines - ⚡ fast
<img alt="Lightning" src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/ls_banner2.png" width="800px" style="max-width: 100%;">
</div>
**LitServe** lets you build high-performance AI inference pipelines on top of FastAPI - no boilerplate. Define one or more models, connect vector DBs, stream responses, batch requests, and autoscale on GPUs out of the box.
LitServe is at least [2x faster](#performance) than plain FastAPI due to AI-specific multi-worker handling.
<div align='center'>
<pre>
✅ (2x)+ faster serving  ✅ Easy to use           ✅ LLMs, non-LLMs and more
✅ Bring your own model ✅ PyTorch/JAX/TF/... ✅ Built on FastAPI
✅ GPU autoscaling ✅ Batching, Streaming ✅ Self-host or ⚡️ managed
✅ Inference pipeline ✅ Integrate with vLLM, etc ✅ Serverless
</pre>
</div>
<div align="center">
<div style="text-align: center;">
<a target="_blank" href="#quick-start" style="margin: 0 10px;">Quick start</a> •
<a target="_blank" href="#featured-examples" style="margin: 0 10px;">Examples</a> •
<a target="_blank" href="#features" style="margin: 0 10px;">Features</a> •
<a target="_blank" href="#performance" style="margin: 0 10px;">Performance</a> •
<a target="_blank" href="#hosting-options" style="margin: 0 10px;">Hosting</a> •
<a target="_blank" href="https://lightning.ai/docs/litserve" style="margin: 0 10px;">Docs</a>
</div>
</div>
<div align="center">
<a target="_blank" href="https://lightning.ai/docs/litserve/home/get-started">
<img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/get-started-badge.svg" height="36px" alt="Get started"/>
</a>
</div>
# Quick start
Install LitServe via pip ([more options](https://lightning.ai/docs/litserve/home/install)):
```bash
pip install litserve
```
### Define a server
This toy example with two models (an inference pipeline) shows LitServe's flexibility ([see real examples](#featured-examples)):
```python
# server.py
import litserve as ls


# (STEP 1) - DEFINE THE API ("inference" pipeline)
class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # Called once at startup. Define the elements of the pipeline: models, DB connections, data, etc.
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3

    def decode_request(self, request):
        # Convert the request payload to model input.
        return request["input"]

    def predict(self, x):
        # Run the inference pipeline and return the output.
        a = self.model1(x)
        b = self.model2(x)
        c = a + b
        return {"output": c}

    def encode_response(self, output):
        # Convert the model output to a response payload.
        return {"output": output["output"]}


# (STEP 2) - START THE SERVER
if __name__ == "__main__":
    # scale with advanced features (batching, GPUs, etc.)
    server = ls.LitServer(SimpleLitAPI(), accelerator="auto", max_batch_size=1)
    server.run(port=8000)
```
Now run the server anywhere (local or cloud) from the command line.
```bash
# Deploy to the cloud of your choice via Lightning AI (serverless, autoscaling, etc.)
lightning serve server.py
# Or run locally (self host anywhere)
lightning serve server.py --local
```
Learn more about managed hosting on [Lightning AI](#hosting-options).
You can also run the server manually:
```bash
python server.py
```
### Test the server
Test it with an HTTP request (run this in any terminal):
```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
```
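The same request from Python, for reference (a minimal sketch using the third-party `requests` library):
```python
# client.py - equivalent of the curl call above
import requests

response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"input": 4.0},
)
print(response.json())  # {'output': 80.0}, since 4^2 + 4^3 = 80
```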
### LLM serving
LitServe isn’t *just* an LLM server like vLLM or Ollama; it serves any AI model with full control over the internals ([learn more](https://lightning.ai/docs/litserve/features/serve-llms)).
For easy LLM serving, integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), or use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm) (built on LitServe).
```bash
litgpt serve microsoft/phi-2
```
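Then query the LitGPT server over HTTP; this call assumes LitGPT's default `/predict` route and a prompt-style payload (check the LitGPT docs for the exact schema):
```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"prompt": "What do llamas eat?"}'
```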
### Summary
- LitAPI lets you easily build complex AI systems with one or more models ([docs](https://lightning.ai/docs/litserve/api-reference/litapi)).
- Use the setup method for one-time tasks like connecting models, DBs, and loading data ([docs](https://lightning.ai/docs/litserve/api-reference/litapi#setup)).
- LitServer handles optimizations like batching, GPU autoscaling, streaming, etc. (see the sketch after this list; [docs](https://lightning.ai/docs/litserve/api-reference/litserver)).
- Self host on your machines or create a fully managed deployment with Lightning ([learn more](https://lightning.ai/docs/litserve/features/deploy-on-cloud)).
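As a concrete example of those LitServer knobs, here is a sketch that scales the quick-start server with batching and multiple workers. The parameter values are illustrative; note that with `max_batch_size > 1`, LitServe hands `predict` a batch (a list of decoded inputs by default), so the pipeline must be written to handle lists:
```python
import litserve as ls

from server import SimpleLitAPI  # the quick-start API above, adapted to
                                 # handle batched (list) inputs in predict()

if __name__ == "__main__":
    server = ls.LitServer(
        SimpleLitAPI(),
        accelerator="auto",     # pick CPU/GPU automatically
        devices="auto",         # use all available devices
        workers_per_device=2,   # run 2 copies of the pipeline per device
        max_batch_size=16,      # group up to 16 requests per predict() call
        batch_timeout=0.05,     # wait at most 50 ms to fill a batch (illustrative)
    )
    server.run(port=8000)
```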
[Learn how to make this server 200x faster](https://lightning.ai/docs/litserve/home/speed-up-serving-by-200x).
# Featured examples
Here are examples of inference pipelines for common model types and use cases.
<pre>
<strong>Toy model:</strong> <a target="_blank" href="#define-a-server">Hello world</a>
<strong>LLMs:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-llama-3-2-vision-with-litserve">Llama 3.2</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/openai-fault-tolerant-proxy-server">LLM Proxy server</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-ai-agent-with-tool-use">Agent with tool use</a>
<strong>RAG:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api">vLLM RAG (Llama 3.2)</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-1-rag-api">RAG API (LlamaIndex)</a>
<strong>NLP:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-any-hugging-face-model-instantly">Hugging Face</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-hugging-face-bert-model">BERT</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-text-embedding-api-with-litserve">Text embedding API</a>
<strong>Multimodal:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-open-ai-clip-with-litserve">OpenAI CLIP</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-multi-modal-llm-with-minicpm">MiniCPM</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-phi3-5-vision-api-with-litserve">Phi-3.5 Vision Instruct</a>, <a target="_blank" href="https://lightning.ai/bhimrajyadav/studios/deploy-and-chat-with-qwen2-vl-using-litserve">Qwen2-VL</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-multi-modal-llm-with-pixtral">Pixtral</a>
<strong>Audio:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-open-ai-s-whisper-model">Whisper</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-an-music-generation-api-with-meta-s-audio-craft">AudioCraft</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-an-audio-generation-api">StableAudio</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-noise-cancellation-api-with-deepfilternet">Noise cancellation (DeepFilterNet)</a>
<strong>Vision:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-private-api-for-stable-diffusion-2">Stable Diffusion 2</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-an-image-generation-api-with-auraflow">AuraFlow</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-an-image-generation-api-with-flux">Flux</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-super-resolution-image-api-with-aura-sr">Image Super Resolution (Aura SR)</a>,
<a target="_blank" href="https://lightning.ai/bhimrajyadav/studios/deploy-background-removal-api-with-litserve">Background Removal</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-controlled-image-generation-api-controlnet">Control Stable Diffusion (ControlNet)</a>
<strong>Speech:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-a-voice-clone-api-coqui-xtts-v2-model">Text-to-speech (XTTS V2)</a>, <a target="_blank" href="https://lightning.ai/bhimrajyadav/studios/deploy-a-speech-generation-api-using-parler-tts-powered-by-litserve">Parler-TTS</a>
<strong>Classical ML:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-random-forest-with-litserve">Random forest</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-xgboost-with-litserve">XGBoost</a>
<strong>Miscellaneous:</strong> <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-an-media-conversion-api-with-ffmpeg">Media conversion API (ffmpeg)</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/deploy-both-pytorch-and-tensorflow-in-a-single-api">PyTorch + TensorFlow in one API</a>, <a target="_blank" href="https://lightning.ai/lightning-ai/studios/openai-fault-tolerant-proxy-server">LLM proxy server</a>
</pre>
[Browse 100+ community-built templates](https://lightning.ai/studios?section=serving)
# Hosting options
Self-host LitServe anywhere, or deploy to your favorite cloud via [Lightning AI](http://lightning.ai/deploy).
https://github.com/user-attachments/assets/ff83dab9-0c9f-4453-8dcb-fb9526726344
Self-hosting is ideal for hackers, students, and DIY developers, while fully managed hosting suits enterprise teams that need easy autoscaling, security, release management, observability, and 99.95% uptime.
*Note:* Lightning offers a generous free tier for developers.
To host on [Lightning AI](https://lightning.ai/deploy), run the command below, log in, and choose a cloud:
```bash
lightning serve server.py
```
## Features
<div align='center'>
| [Feature](https://lightning.ai/docs/litserve/features) | Self Managed | [Fully Managed on Lightning](https://lightning.ai/deploy) |
|----------------------------------------------------------------------|-----------------------------------|------------------------------------|
| Docker-first deployment | ✅ DIY | ✅ One-click deploy |
| Cost | ✅ Free (DIY) | ✅ Generous [free tier](https://lightning.ai/pricing) with pay as you go |
| Full control | ✅ | ✅ |
| Use any engine (vLLM, etc.) | ✅ | ✅ vLLM, Ollama, LitServe, etc. |
| Own VPC | ✅ (manual setup) | ✅ Connect your own VPC |
| [(2x)+ faster than plain FastAPI](#performance) | ✅ | ✅ |
| [Bring your own model](https://lightning.ai/docs/litserve/features/full-control) | ✅ | ✅ |
| [Build compound systems (1+ models)](https://lightning.ai/docs/litserve/home) | ✅ | ✅ |
| [GPU autoscaling](https://lightning.ai/docs/litserve/features/gpu-inference) | ✅ | ✅ |
| [Batching](https://lightning.ai/docs/litserve/features/batching) | ✅ | ✅ |
| [Streaming](https://lightning.ai/docs/litserve/features/streaming) (sketch below)  | ✅                                 | ✅                                  |
| [Worker autoscaling](https://lightning.ai/docs/litserve/features/autoscaling) | ✅ | ✅ |
| [Serve all models: (LLMs, vision, etc.)](https://lightning.ai/docs/litserve/examples) | ✅ | ✅ |
| [Supports PyTorch, JAX, TF, etc...](https://lightning.ai/docs/litserve/features/full-control) | ✅ | ✅ |
| [OpenAPI compliant](https://www.openapis.org/) | ✅ | ✅ |
| [OpenAI compatibility](https://lightning.ai/docs/litserve/features/open-ai-spec)      | ✅                                 | ✅                                  |
| [Authentication](https://lightning.ai/docs/litserve/features/authentication) | ❌ DIY | ✅ Token, password, custom |
| GPUs | ❌ DIY | ✅ 8+ GPU types, H100s from $1.75 |
| Load balancing | ❌ | ✅ Built-in |
| Scale to zero (serverless) | ❌ | ✅ No machine runs when idle |
| Autoscale up on demand | ❌ | ✅ Auto scale up/down |
| Multi-node inference | ❌ | ✅ Distribute across nodes |
| Use AWS/GCP credits | ❌ | ✅ Use existing cloud commits |
| Versioning | ❌ | ✅ Make and roll back releases |
| Enterprise-grade uptime (99.95%) | ❌ | ✅ SLA-backed |
| SOC2 / HIPAA compliance | ❌ | ✅ Certified & secure |
| Observability | ❌ | ✅ Built-in, connect 3rd party tools|
| CI/CD ready | ❌ | ✅ Lightning SDK |
| 24/7 enterprise support | ❌ | ✅ Dedicated support |
| Cost controls & audit logs | ❌ | ✅ Budgets, breakdowns, logs |
| Debug on GPUs | ❌ | ✅ Studio integration |
| [20+ features](https://lightning.ai/docs/litserve/features) | - | - |
</div>
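To make the streaming row above concrete: in LitServe, `predict` and `encode_response` become generators and `stream=True` is passed to `LitServer`. A minimal sketch, where the token loop stands in for a real model:
```python
import litserve as ls


class StreamingAPI(ls.LitAPI):
    def setup(self, device):
        # stand-in "model" that emits tokens one at a time
        self.model = lambda prompt: (f"token-{i} " for i in range(10))

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        # yield partial outputs instead of returning once
        yield from self.model(prompt)

    def encode_response(self, outputs):
        # each yielded chunk becomes one piece of the streamed response
        for token in outputs:
            yield {"output": token}


if __name__ == "__main__":
    server = ls.LitServer(StreamingAPI(), stream=True)
    server.run(port=8000)
```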
# Performance
LitServe is designed for AI workloads. Specialized multi-worker handling delivers a minimum **2x speedup over FastAPI**.
Additional features like batching and GPU autoscaling can drive performance well beyond 2x, scaling efficiently to handle more simultaneous requests than FastAPI and TorchServe.
Reproduce the full benchmarks [here](https://lightning.ai/docs/litserve/home/benchmarks) (higher is better).
<div align="center">
<img alt="LitServe" src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/ls_charts_v6.png" width="1000px" style="max-width: 100%;">
</div>
These results are for image and text classification tasks. The performance relationships hold for other ML tasks (embedding, LLM serving, audio, segmentation, object detection, summarization, etc.).
***💡 Note on LLM serving:*** For high-performance LLM serving (like Ollama/vLLM), integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm), or build your own vLLM-like server with LitServe. Optimizations like KV caching, which you can implement with LitServe, are needed to maximize LLM performance.
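A sketch of what "build your own vLLM-like server" can look like: wrapping a vLLM engine inside a LitAPI. This assumes the `vllm` package is installed and you have access to the checkpoint; the model name is illustrative:
```python
import litserve as ls
from vllm import LLM, SamplingParams  # assumes `pip install vllm`


class VLLMAPI(ls.LitAPI):
    def setup(self, device):
        # load the engine once per worker; model name is illustrative
        self.llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")
        self.params = SamplingParams(max_tokens=256, temperature=0.7)

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        # vLLM handles continuous batching and KV caching internally
        outputs = self.llm.generate([prompt], self.params)
        return {"output": outputs[0].outputs[0].text}


if __name__ == "__main__":
    ls.LitServer(VLLMAPI(), accelerator="auto").run(port=8000)
```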
# Community
LitServe is a [community project accepting contributions](https://lightning.ai/docs/litserve/community) - let's build the world's most advanced AI inference engine together.
💬 [Get help on Discord](https://discord.com/invite/XncpTy7DSt)
📋 [License: Apache 2.0](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)
", Assign "at most 3 tags" to the expected json: {"id":"11569","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"