base on TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation. <p><picture><img src="https://github.com/user-attachments/assets/9d0a93c6-7685-4e57-9737-7cbeb338a218" alt="TensorZero Logo" width="128" height="128"></picture></p>
# TensorZero
<p><picture><img src="https://www.tensorzero.com/github-trending-badge.svg" alt="#1 Repository Of The Day"></picture></p>
**TensorZero is an open-source stack for _industrial-grade LLM applications_:**
- **Gateway:** access every LLM provider through a unified API, built for performance (<1ms p99 latency)
- **Observability:** store inferences and feedback in your database, available programmatically or in the UI
- **Optimization:** collect metrics and human feedback to optimize prompts, models, and inference strategies
- **Evaluation:** benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, etc.
- **Experimentation:** ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.
Take what you need, adopt incrementally, and complement with other tools.
<video src="https://github.com/user-attachments/assets/04a8466e-27d8-4189-b305-e7cecb6881ee"></video>
---
<p align="center">
<b><a href="https://www.tensorzero.com/" target="_blank">Website</a></b>
·
<b><a href="https://www.tensorzero.com/docs" target="_blank">Docs</a></b>
·
<b><a href="https://www.x.com/tensorzero" target="_blank">Twitter</a></b>
·
<b><a href="https://www.tensorzero.com/slack" target="_blank">Slack</a></b>
·
<b><a href="https://www.tensorzero.com/discord" target="_blank">Discord</a></b>
<br>
<br>
<b><a href="https://www.tensorzero.com/docs/quickstart" target="_blank">Quick Start (5min)</a></b>
·
<b><a href="https://www.tensorzero.com/docs/gateway/deployment" target="_blank">Deployment Guide</a></b>
·
<b><a href="https://www.tensorzero.com/docs/gateway/api-reference" target="_blank">API Reference</a></b>
·
<b><a href="https://www.tensorzero.com/docs/gateway/deployment" target="_blank">Configuration Reference</a></b>
</p>
---
> [!NOTE]
>
> ### **Coming Soon: TensorZero Autopilot**
>
> TensorZero Autopilot is an **automated AI engineer** (powered by the TensorZero Stack) that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests.
> **[Learn more](https://www.tensorzero.com/)** **[Join the waitlist](https://tensorzero.com/autopilot-waitlist)**
## Features
### 🌐 LLM Gateway
> **Integrate with TensorZero once and access every major LLM provider.**
- [x] **[Call any LLM](https://www.tensorzero.com/docs/gateway/call-any-llm)** (API or self-hosted) through a single unified API
- [x] Infer with **[tool use](https://www.tensorzero.com/docs/gateway/guides/tool-use)**, **[structured outputs (JSON)](https://www.tensorzero.com/docs/gateway/generate-structured-outputs)**, **[batch](https://www.tensorzero.com/docs/gateway/guides/batch-inference)**, **[embeddings](https://www.tensorzero.com/docs/gateway/generate-embeddings)**, **[multimodal (images, files)](https://www.tensorzero.com/docs/gateway/call-llms-with-image-and-file-inputs)**, **[caching](https://www.tensorzero.com/docs/gateway/guides/inference-caching)**, etc.
- [x] **[Create prompt templates and schemas](https://www.tensorzero.com/docs/gateway/create-a-prompt-template)** to enforce a structured interface between your application and the LLMs
- [x] Satisfy extreme throughput and latency needs, thanks to 🦀 Rust: **[<1ms p99 latency overhead at 10k+ QPS](https://www.tensorzero.com/docs/gateway/benchmarks)**
- [x] Use any programming language: **[integrate via our Python SDK, any OpenAI SDK, or our HTTP API](https://www.tensorzero.com/docs/gateway/clients)**
- [x] **[Ensure high availability](https://www.tensorzero.com/docs/gateway/guides/retries-fallbacks)** with routing, retries, fallbacks, load balancing, granular timeouts, etc.
- [x] **[Track usage and cost](https://www.tensorzero.com/docs/operations/track-usage-and-cost)** and **[enforce custom rate limits](https://www.tensorzero.com/docs/operations/enforce-custom-rate-limits)** with granular scopes (e.g. tags)
- [x] **[Set up auth for TensorZero](https://www.tensorzero.com/docs/operations/set-up-auth-for-tensorzero)** to allow clients to access models without sharing provider API keys
#### Supported Model Providers
**[Anthropic](https://www.tensorzero.com/docs/gateway/guides/providers/anthropic)**,
**[AWS Bedrock](https://www.tensorzero.com/docs/gateway/guides/providers/aws-bedrock)**,
**[AWS SageMaker](https://www.tensorzero.com/docs/gateway/guides/providers/aws-sagemaker)**,
**[Azure](https://www.tensorzero.com/docs/gateway/guides/providers/azure)**,
**[DeepSeek](https://www.tensorzero.com/docs/gateway/guides/providers/deepseek)**,
**[Fireworks](https://www.tensorzero.com/docs/gateway/guides/providers/fireworks)**,
**[GCP Vertex AI Anthropic](https://www.tensorzero.com/docs/gateway/guides/providers/gcp-vertex-ai-anthropic)**,
**[GCP Vertex AI Gemini](https://www.tensorzero.com/docs/gateway/guides/providers/gcp-vertex-ai-gemini)**,
**[Google AI Studio (Gemini API)](https://www.tensorzero.com/docs/gateway/guides/providers/google-ai-studio-gemini)**,
**[Groq](https://www.tensorzero.com/docs/gateway/guides/providers/groq)**,
**[Hyperbolic](https://www.tensorzero.com/docs/gateway/guides/providers/hyperbolic)**,
**[Mistral](https://www.tensorzero.com/docs/gateway/guides/providers/mistral)**,
**[OpenAI](https://www.tensorzero.com/docs/gateway/guides/providers/openai)**,
**[OpenRouter](https://www.tensorzero.com/docs/gateway/guides/providers/openrouter)**,
**[SGLang](https://www.tensorzero.com/docs/gateway/guides/providers/sglang)**,
**[TGI](https://www.tensorzero.com/docs/gateway/guides/providers/tgi)**,
**[Together AI](https://www.tensorzero.com/docs/gateway/guides/providers/together)**,
**[vLLM](https://www.tensorzero.com/docs/gateway/guides/providers/vllm)**, and
**[xAI (Grok)](https://www.tensorzero.com/docs/gateway/guides/providers/xai)**.
Need something else? TensorZero also supports **[any OpenAI-compatible API (e.g. Ollama)](https://www.tensorzero.com/docs/gateway/guides/providers/openai-compatible)**.
#### Usage Example
You can use TensorZero with any OpenAI SDK (Python, Node, Go, etc.) or OpenAI-compatible client.
1. **[Deploy the TensorZero Gateway](https://www.tensorzero.com/docs/deployment/tensorzero-gateway)** (one Docker container).
2. Update the `base_url` and `model` in your OpenAI-compatible client.
3. Run inference:
```python
from openai import OpenAI
# Point the client to the TensorZero Gateway
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")
response = client.chat.completions.create(
# Call any model provider (or TensorZero function)
model="tensorzero::model_name::anthropic::claude-sonnet-4-6",
messages=[
{
"role": "user",
"content": "Write a haiku about TensorZero.",
}
],
)
```
See **[Quick Start](https://www.tensorzero.com/docs/quickstart)** for more information.
### 🔍 LLM Observability
> **Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time — all using the open-source TensorZero UI.**
- [x] Store inferences and **[feedback (metrics, human edits, etc.)](https://www.tensorzero.com/docs/gateway/guides/metrics-feedback)** in your own database
- [x] Dive into individual inferences or high-level aggregate patterns using the TensorZero UI or programmatically
- [x] **[Build datasets](https://www.tensorzero.com/docs/gateway/api-reference/datasets-datapoints)** for optimization, evaluation, and other workflows
- [x] Replay historical inferences with new prompts, models, inference strategies, etc.
- [x] **[Export OpenTelemetry traces (OTLP)](https://www.tensorzero.com/docs/operations/export-opentelemetry-traces)** and **[export Prometheus metrics](https://www.tensorzero.com/docs/observability/export-prometheus-metrics)** to your favorite application observability tools
- [ ] Soon: AI-assisted debugging and root cause analysis; AI-assisted data labeling
### 📈 LLM Optimization
> **Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies — using the UI or programmatically.**
- [x] Optimize your models with **[supervised fine-tuning](https://www.tensorzero.com/docs/optimization/supervised-fine-tuning-sft)**, RLHF, and other techniques
- [x] Optimize your prompts with automated prompt engineering algorithms like **[GEPA](https://www.tensorzero.com/docs/optimization/gepa)**
- [x] Optimize your **[inference strategy](https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations)** with **[dynamic in-context learning](https://www.tensorzero.com/docs/optimization/dynamic-in-context-learning-dicl)**, best/mixture-of-N sampling, etc.
- [x] Enable a feedback loop for your LLMs: a data & learning flywheel turning production data into smarter, faster, and cheaper models
- [ ] Soon: synthetic data generation
### 📊 LLM Evaluation
> **Compare prompts, models, and inference strategies using evaluations powered by heuristics and LLM judges.**
- [x] **[Evaluate individual inferences](https://www.tensorzero.com/docs/evaluations/inference-evaluations/tutorial)** with _inference evaluations_ powered by heuristics or LLM judges (≈ unit tests for LLMs)
- [x] **[Evaluate end-to-end workflows](https://www.tensorzero.com/docs/evaluations/workflow-evaluations/tutorial)** with _workflow evaluations_ with complete flexibility (≈ integration tests for LLMs)
- [x] Optimize LLM judges just like any other TensorZero function to align them to human preferences
- [ ] Soon: more built-in evaluators; headless evaluations
<table>
<tr></tr> <!-- flip highlight order -->
<tr>
<td width="50%" align="center" valign="middle"><b>Evaluation » UI</b></td>
<td width="50%" align="center" valign="middle"><b>Evaluation » CLI</b></td>
</tr>
<tr>
<td width="50%" align="center" valign="middle"><img src="https://github.com/user-attachments/assets/f4bf54e3-1b63-46c8-be12-2eaabf615699"></td>
<td width="50%" align="left" valign="middle">
<pre><code class="language-bash">docker compose run --rm evaluations \
--evaluation-name extract_data \
--dataset-name hard_test_cases \
--variant-name gpt_4o \
--concurrency 5</code></pre>
<pre><code class="language-bash">Run ID: 01961de9-c8a4-7c60-ab8d-15491a9708e4
Number of datapoints: 100
██████████████████████████████████████ 100/100
exact_match: 0.83 ± 0.03 (n=100)
semantic_match: 0.98 ± 0.01 (n=100)
item_count: 7.15 ± 0.39 (n=100)</code></pre>
</td>
</tr>
</table>
### 🧪 LLM Experimentation
> **Ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.**
- [x] **[Run adaptive A/B tests](https://www.tensorzero.com/docs/experimentation/run-adaptive-ab-tests)** to ship with confidence and identify the best prompts and models for your use cases.
- [x] Enforce principled experiments in complex workflows, including support for multi-turn LLM systems, sequential testing, and more.
### & more!
> **Build with an open-source stack well-suited for prototypes but designed from the ground up to support the most complex LLM applications and deployments.**
- [x] Build simple applications or massive deployments with GitOps-friendly orchestration
- [x] **[Extend TensorZero](https://www.tensorzero.com/docs/operations/extend-tensorzero)** with built-in escape hatches, programmatic-first usage, direct database access, and more
- [x] Integrate with third-party tools: specialized observability and evaluations, model providers, agent orchestration frameworks, etc.
- [x] Iterate quickly by experimenting with prompts interactively using the Playground UI
## Frequently Asked Questions
**How is TensorZero different from other LLM frameworks?**
1. TensorZero enables you to optimize complex LLM applications based on production metrics and human feedback.
2. TensorZero supports the needs of industrial-grade LLM applications: low latency, high throughput, type safety, self-hosted, GitOps, customizability, etc.
3. TensorZero unifies the entire LLMOps stack, creating compounding benefits. For example, LLM evaluations can be used for fine-tuning models alongside AI judges.
**Can I use TensorZero with \_\_\_?**
Yes.
Every major programming language is supported.
It plays nicely with the **[OpenAI SDK](https://www.tensorzero.com/docs/gateway/clients/)**, **[OpenTelemetry](https://www.tensorzero.com/docs/operations/export-opentelemetry-traces/)**, and **[every major LLM](https://www.tensorzero.com/docs/integrations/model-providers/)**.
**Is TensorZero production-ready?**
Yes.
TensorZero is used by companies ranging from frontier AI startups to the Fortune 10 and powers ~1% of the global LLM API spend today.
Here's a case study: **[Automating Code Changelogs at a Large Bank with LLMs](https://www.tensorzero.com/blog/case-study-automating-code-changelogs-at-a-large-bank-with-llms)**
**How much does TensorZero cost?**
TensorZero Stack (LLMOps platform) is 100% self-hosted and open-source.
TensorZero Autopilot (automated AI engineer) is a complementary paid product powered by the TensorZero Stack.
**Who is building TensorZero?**
Our technical team includes a former Rust compiler maintainer, machine learning researchers (Stanford, CMU, Oxford, Columbia) with thousands of citations, and the chief product officer of a decacorn startup. We're backed by the same investors as leading open-source projects (e.g. ClickHouse, CockroachDB) and AI labs (e.g. OpenAI, Anthropic). See our **[$7.3M seed round announcement](https://www.tensorzero.com/blog/tensorzero-raises-7-3m-seed-round-to-build-an-open-source-stack-for-industrial-grade-llm-applications/)** and **[coverage from VentureBeat](https://venturebeat.com/ai/tensorzero-nabs-7-3m-seed-to-solve-the-messy-world-of-enterprise-llm-development/)**. We're **[hiring in NYC](https://www.tensorzero.com/jobs)**.
**How do I get started?**
You can adopt TensorZero incrementally. Our **[Quick Start](https://www.tensorzero.com/docs/quickstart)** goes from a vanilla OpenAI wrapper to a production-ready LLM application with observability and fine-tuning in just 5 minutes.
## Demo
> **Watch LLMs get better at data extraction in real-time with TensorZero!**
>
> **[Dynamic in-context learning (DICL)](https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations#dynamic-in-context-learning-dicl)** is a powerful inference-time optimization available out of the box with TensorZero.
> It enhances LLM performance by automatically incorporating relevant historical examples into the prompt, without the need for model fine-tuning.
https://github.com/user-attachments/assets/4df1022e-886e-48c2-8f79-6af3cdad79cb
## Get Started
**Start building today.**
The **[Quick Start](https://www.tensorzero.com/docs/quickstart)** shows it's easy to set up an LLM application with TensorZero.
**Questions?**
Ask us on **[Slack](https://www.tensorzero.com/slack)** or **[Discord](https://www.tensorzero.com/discord)**.
**Using TensorZero at work?**
Email us at **[
[email protected]](mailto:
[email protected])** to set up a Slack or Teams channel with your team (free).
## Examples
We are working on a series of **complete runnable examples** illustrating TensorZero's data & learning flywheel.
> **[Optimizing Data Extraction (NER) with TensorZero](https://github.com/tensorzero/tensorzero/tree/main/examples/data-extraction-ner)**
>
> This example shows how to use TensorZero to optimize a data extraction pipeline.
> We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL).
> In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task — at a fraction of the cost and latency — using a small amount of training data.
> **[Agentic RAG — Multi-Hop Question Answering with LLMs](https://github.com/tensorzero/tensorzero/tree/main/examples/rag-retrieval-augmented-generation/simple-agentic-rag/)**
>
> This example shows how to build a multi-hop retrieval agent using TensorZero.
> The agent iteratively searches Wikipedia to gather information, and decides when it has enough context to answer a complex question.
> **[Writing Haikus to Satisfy a Judge with Hidden Preferences](https://github.com/tensorzero/tensorzero/tree/main/examples/haiku-hidden-preferences)**
>
> This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste.
> You'll see TensorZero's "data flywheel in a box" in action: better variants leads to better data, and better data leads to better variants.
> You'll see progress by fine-tuning the LLM multiple times.
> **[Image Data Extraction — Multimodal (Vision) Fine-tuning](https://github.com/tensorzero/tensorzero/tree/main/examples/multimodal-vision-finetuning)**
>
> This example shows how to fine-tune multimodal models (VLMs) like GPT-4o to improve their performance on vision-language tasks.
> Specifically, we'll build a system that categorizes document images (screenshots of computer science research papers).
> **[Improving LLM Chess Ability with Best-of-N Sampling](https://github.com/tensorzero/tensorzero/tree/main/examples/chess-puzzles/)**
>
> This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.
## Blog Posts
We write about LLM engineering on the **[TensorZero Blog](https://www.tensorzero.com/blog)**.
Here are some of our favorite posts:
- **[Bandits in your LLM Gateway: Improve LLM Applications Faster with Adaptive Experimentation (A/B Testing)](https://www.tensorzero.com/blog/bandits-in-your-llm-gateway/)**
- **[Is OpenAI's Reinforcement Fine-Tuning (RFT) Worth It?](https://www.tensorzero.com/blog/is-openai-reinforcement-fine-tuning-rft-worth-it/)**
- **[Distillation with Programmatic Data Curation: Smarter LLMs, 5-30x Cheaper Inference](https://www.tensorzero.com/blog/distillation-programmatic-data-curation-smarter-llms-5-30x-cheaper-inference/)**
- **[From NER to Agents: Does Automated Prompt Engineering Scale to Complex Tasks?](https://www.tensorzero.com/blog/from-ner-to-agents-does-automated-prompt-engineering-scale-to-complex-tasks/)**
", Assign "at most 3 tags" to the expected json: {"id":"12213","tags":[]} "only from the tags list I provide: []" returns me the "expected json"