<p><picture><img src="https://github.com/user-attachments/assets/9d0a93c6-7685-4e57-9737-7cbeb338a218" alt="TensorZero Logo" width="128" height="128"></picture></p>

# TensorZero

<p><picture><img src="https://www.tensorzero.com/github-trending-badge.svg" alt="#1 Repository Of The Day"></picture></p>

**TensorZero is an open-source stack for _industrial-grade LLM applications_:**

- **Gateway:** access every LLM provider through a unified API, built for performance (<1ms p99 latency)
- **Observability:** store inferences and feedback in your database, available programmatically or in the UI
- **Optimization:** collect metrics and human feedback to optimize prompts, models, and inference strategies
- **Evaluation:** benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, etc.
- **Experimentation:** ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

Take what you need, adopt incrementally, and complement with other tools.

<video src="https://github.com/user-attachments/assets/04a8466e-27d8-4189-b305-e7cecb6881ee"></video>

---

<p align="center">
  <b><a href="https://www.tensorzero.com/" target="_blank">Website</a></b>
  ·
  <b><a href="https://www.tensorzero.com/docs" target="_blank">Docs</a></b>
  ·
  <b><a href="https://www.x.com/tensorzero" target="_blank">Twitter</a></b>
  ·
  <b><a href="https://www.tensorzero.com/slack" target="_blank">Slack</a></b>
  ·
  <b><a href="https://www.tensorzero.com/discord" target="_blank">Discord</a></b>
  <br>
  <br>
  <b><a href="https://www.tensorzero.com/docs/quickstart" target="_blank">Quick Start (5min)</a></b>
  ·
  <b><a href="https://www.tensorzero.com/docs/gateway/deployment" target="_blank">Deployment Guide</a></b>
  ·
  <b><a href="https://www.tensorzero.com/docs/gateway/api-reference" target="_blank">API Reference</a></b>
  ·
  <b><a href="https://www.tensorzero.com/docs/gateway/deployment" target="_blank">Configuration Reference</a></b>
</p>

---

> [!NOTE]
>
> ### **Coming Soon: TensorZero Autopilot**
>
> TensorZero Autopilot is an **automated AI engineer** (powered by the TensorZero Stack) that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests.
> **[Learn more](https://www.tensorzero.com/)** **[Join the waitlist](https://tensorzero.com/autopilot-waitlist)**

## Features

### 🌐 LLM Gateway

> **Integrate with TensorZero once and access every major LLM provider.**

- [x] **[Call any LLM](https://www.tensorzero.com/docs/gateway/call-any-llm)** (API or self-hosted) through a single unified API
- [x] Infer with **[tool use](https://www.tensorzero.com/docs/gateway/guides/tool-use)**, **[structured outputs (JSON)](https://www.tensorzero.com/docs/gateway/generate-structured-outputs)**, **[batch](https://www.tensorzero.com/docs/gateway/guides/batch-inference)**, **[embeddings](https://www.tensorzero.com/docs/gateway/generate-embeddings)**, **[multimodal (images, files)](https://www.tensorzero.com/docs/gateway/call-llms-with-image-and-file-inputs)**, **[caching](https://www.tensorzero.com/docs/gateway/guides/inference-caching)**, etc.
- [x] **[Create prompt templates and schemas](https://www.tensorzero.com/docs/gateway/create-a-prompt-template)** to enforce a structured interface between your application and the LLMs
- [x] Satisfy extreme throughput and latency needs, thanks to 🦀 Rust: **[<1ms p99 latency overhead at 10k+ QPS](https://www.tensorzero.com/docs/gateway/benchmarks)**
- [x] Use any programming language: **[integrate via our Python SDK, any OpenAI SDK, or our HTTP API](https://www.tensorzero.com/docs/gateway/clients)**
- [x] **[Ensure high availability](https://www.tensorzero.com/docs/gateway/guides/retries-fallbacks)** with routing, retries, fallbacks, load balancing, granular timeouts, etc.
- [x] **[Track usage and cost](https://www.tensorzero.com/docs/operations/track-usage-and-cost)** and **[enforce custom rate limits](https://www.tensorzero.com/docs/operations/enforce-custom-rate-limits)** with granular scopes (e.g.
tags)
- [x] **[Set up auth for TensorZero](https://www.tensorzero.com/docs/operations/set-up-auth-for-tensorzero)** to allow clients to access models without sharing provider API keys

#### Supported Model Providers

**[Anthropic](https://www.tensorzero.com/docs/gateway/guides/providers/anthropic)**,
**[AWS Bedrock](https://www.tensorzero.com/docs/gateway/guides/providers/aws-bedrock)**,
**[AWS SageMaker](https://www.tensorzero.com/docs/gateway/guides/providers/aws-sagemaker)**,
**[Azure](https://www.tensorzero.com/docs/gateway/guides/providers/azure)**,
**[DeepSeek](https://www.tensorzero.com/docs/gateway/guides/providers/deepseek)**,
**[Fireworks](https://www.tensorzero.com/docs/gateway/guides/providers/fireworks)**,
**[GCP Vertex AI Anthropic](https://www.tensorzero.com/docs/gateway/guides/providers/gcp-vertex-ai-anthropic)**,
**[GCP Vertex AI Gemini](https://www.tensorzero.com/docs/gateway/guides/providers/gcp-vertex-ai-gemini)**,
**[Google AI Studio (Gemini API)](https://www.tensorzero.com/docs/gateway/guides/providers/google-ai-studio-gemini)**,
**[Groq](https://www.tensorzero.com/docs/gateway/guides/providers/groq)**,
**[Hyperbolic](https://www.tensorzero.com/docs/gateway/guides/providers/hyperbolic)**,
**[Mistral](https://www.tensorzero.com/docs/gateway/guides/providers/mistral)**,
**[OpenAI](https://www.tensorzero.com/docs/gateway/guides/providers/openai)**,
**[OpenRouter](https://www.tensorzero.com/docs/gateway/guides/providers/openrouter)**,
**[SGLang](https://www.tensorzero.com/docs/gateway/guides/providers/sglang)**,
**[TGI](https://www.tensorzero.com/docs/gateway/guides/providers/tgi)**,
**[Together AI](https://www.tensorzero.com/docs/gateway/guides/providers/together)**,
**[vLLM](https://www.tensorzero.com/docs/gateway/guides/providers/vllm)**, and
**[xAI (Grok)](https://www.tensorzero.com/docs/gateway/guides/providers/xai)**.

Need something else?
TensorZero also supports **[any OpenAI-compatible API (e.g.
Ollama)](https://www.tensorzero.com/docs/gateway/guides/providers/openai-compatible)**.

#### Usage Example

You can use TensorZero with any OpenAI SDK (Python, Node, Go, etc.) or OpenAI-compatible client.

1. **[Deploy the TensorZero Gateway](https://www.tensorzero.com/docs/deployment/tensorzero-gateway)** (one Docker container).
2. Update the `base_url` and `model` in your OpenAI-compatible client.
3. Run inference:

```python
from openai import OpenAI

# Point the client to the TensorZero Gateway
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    # Call any model provider (or TensorZero function)
    model="tensorzero::model_name::anthropic::claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": "Write a haiku about TensorZero.",
        }
    ],
)
```

See **[Quick Start](https://www.tensorzero.com/docs/quickstart)** for more information.

### 🔍 LLM Observability

> **Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time &mdash; all using the open-source TensorZero UI.**

- [x] Store inferences and **[feedback (metrics, human edits, etc.)](https://www.tensorzero.com/docs/gateway/guides/metrics-feedback)** in your own database
- [x] Dive into individual inferences or high-level aggregate patterns using the TensorZero UI or programmatically
- [x] **[Build datasets](https://www.tensorzero.com/docs/gateway/api-reference/datasets-datapoints)** for optimization, evaluation, and other workflows
- [x] Replay historical inferences with new prompts, models, inference strategies, etc.
- [x] **[Export OpenTelemetry traces (OTLP)](https://www.tensorzero.com/docs/operations/export-opentelemetry-traces)** and **[export Prometheus metrics](https://www.tensorzero.com/docs/observability/export-prometheus-metrics)** to your favorite application observability tools
- [ ] Soon: AI-assisted debugging and root cause analysis; AI-assisted data labeling

### 📈 LLM Optimization

> **Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies &mdash; using the UI or programmatically.**

- [x] Optimize your models with **[supervised fine-tuning](https://www.tensorzero.com/docs/optimization/supervised-fine-tuning-sft)**, RLHF, and other techniques
- [x] Optimize your prompts with automated prompt engineering algorithms like **[GEPA](https://www.tensorzero.com/docs/optimization/gepa)**
- [x] Optimize your **[inference strategy](https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations)** with **[dynamic in-context learning](https://www.tensorzero.com/docs/optimization/dynamic-in-context-learning-dicl)**, best/mixture-of-N sampling, etc.
- [x] Enable a feedback loop for your LLMs: a data & learning flywheel turning production data into smarter, faster, and cheaper models
- [ ] Soon: synthetic data generation

### 📊 LLM Evaluation

> **Compare prompts, models, and inference strategies using evaluations powered by heuristics and LLM judges.**

- [x] **[Evaluate individual inferences](https://www.tensorzero.com/docs/evaluations/inference-evaluations/tutorial)** with _inference evaluations_ powered by heuristics or LLM judges (&approx; unit tests for LLMs)
- [x] **[Evaluate end-to-end workflows](https://www.tensorzero.com/docs/evaluations/workflow-evaluations/tutorial)** with _workflow evaluations_ with complete flexibility (&approx; integration tests for LLMs)
- [x] Optimize LLM judges just like any other TensorZero function to align them to human preferences
- [ ] Soon: more built-in evaluators; headless evaluations

<table>
  <tr></tr> <!-- flip highlight order -->
  <tr>
    <td width="50%" align="center" valign="middle"><b>Evaluation » UI</b></td>
    <td width="50%" align="center" valign="middle"><b>Evaluation » CLI</b></td>
  </tr>
  <tr>
    <td width="50%" align="center" valign="middle"><img src="https://github.com/user-attachments/assets/f4bf54e3-1b63-46c8-be12-2eaabf615699"></td>
    <td width="50%" align="left" valign="middle">
<pre><code class="language-bash">docker compose run --rm evaluations \
  --evaluation-name extract_data \
  --dataset-name hard_test_cases \
  --variant-name gpt_4o \
  --concurrency 5</code></pre>
<pre><code class="language-bash">Run ID: 01961de9-c8a4-7c60-ab8d-15491a9708e4
Number of datapoints: 100
██████████████████████████████████████ 100/100
exact_match: 0.83 ± 0.03 (n=100)
semantic_match: 0.98 ± 0.01 (n=100)
item_count: 7.15 ± 0.39 (n=100)</code></pre>
    </td>
  </tr>
</table>

### 🧪 LLM Experimentation

> **Ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.**

- [x] **[Run adaptive A/B tests](https://www.tensorzero.com/docs/experimentation/run-adaptive-ab-tests)** to ship
with confidence and identify the best prompts and models for your use cases.
- [x] Enforce principled experiments in complex workflows, including support for multi-turn LLM systems, sequential testing, and more.

### & more!

> **Build with an open-source stack well-suited for prototypes but designed from the ground up to support the most complex LLM applications and deployments.**

- [x] Build simple applications or massive deployments with GitOps-friendly orchestration
- [x] **[Extend TensorZero](https://www.tensorzero.com/docs/operations/extend-tensorzero)** with built-in escape hatches, programmatic-first usage, direct database access, and more
- [x] Integrate with third-party tools: specialized observability and evaluations, model providers, agent orchestration frameworks, etc.
- [x] Iterate quickly by experimenting with prompts interactively using the Playground UI

## Frequently Asked Questions

**How is TensorZero different from other LLM frameworks?**

1. TensorZero enables you to optimize complex LLM applications based on production metrics and human feedback.
2. TensorZero supports the needs of industrial-grade LLM applications: low latency, high throughput, type safety, self-hosted, GitOps, customizability, etc.
3. TensorZero unifies the entire LLMOps stack, creating compounding benefits. For example, LLM evaluations can be used for fine-tuning models alongside AI judges.

**Can I use TensorZero with \_\_\_?**

Yes. Every major programming language is supported. It plays nicely with the **[OpenAI SDK](https://www.tensorzero.com/docs/gateway/clients/)**, **[OpenTelemetry](https://www.tensorzero.com/docs/operations/export-opentelemetry-traces/)**, and **[every major LLM](https://www.tensorzero.com/docs/integrations/model-providers/)**.

**Is TensorZero production-ready?**

Yes. TensorZero is used by companies ranging from frontier AI startups to the Fortune 10 and powers ~1% of the global LLM API spend today.
Here's a case study: **[Automating Code Changelogs at a Large Bank with LLMs](https://www.tensorzero.com/blog/case-study-automating-code-changelogs-at-a-large-bank-with-llms)**

**How much does TensorZero cost?**

TensorZero Stack (LLMOps platform) is 100% self-hosted and open-source. TensorZero Autopilot (automated AI engineer) is a complementary paid product powered by the TensorZero Stack.

**Who is building TensorZero?**

Our technical team includes a former Rust compiler maintainer, machine learning researchers (Stanford, CMU, Oxford, Columbia) with thousands of citations, and the chief product officer of a decacorn startup. We're backed by the same investors as leading open-source projects (e.g. ClickHouse, CockroachDB) and AI labs (e.g. OpenAI, Anthropic). See our **[$7.3M seed round announcement](https://www.tensorzero.com/blog/tensorzero-raises-7-3m-seed-round-to-build-an-open-source-stack-for-industrial-grade-llm-applications/)** and **[coverage from VentureBeat](https://venturebeat.com/ai/tensorzero-nabs-7-3m-seed-to-solve-the-messy-world-of-enterprise-llm-development/)**. We're **[hiring in NYC](https://www.tensorzero.com/jobs)**.

**How do I get started?**

You can adopt TensorZero incrementally. Our **[Quick Start](https://www.tensorzero.com/docs/quickstart)** goes from a vanilla OpenAI wrapper to a production-ready LLM application with observability and fine-tuning in just 5 minutes.

## Demo

> **Watch LLMs get better at data extraction in real-time with TensorZero!**
>
> **[Dynamic in-context learning (DICL)](https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations#dynamic-in-context-learning-dicl)** is a powerful inference-time optimization available out of the box with TensorZero.
> It enhances LLM performance by automatically incorporating relevant historical examples into the prompt, without the need for model fine-tuning.

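
To make the idea behind DICL concrete, here is a minimal, self-contained sketch of the general technique: retrieve the historical examples most similar to a new input and prepend them to the prompt. It is an illustration only and not TensorZero's implementation. The function names (`embed`, `select_examples`, `build_prompt`), the `HISTORY` data, and the prompt layout are all hypothetical, and the bag-of-words similarity is a toy stand-in for a real embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: a real system would use an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query, history, k=2):
    """Return the k historical (input, output) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(history, key=lambda pair: cosine(q, embed(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, history, k=2):
    """Prepend the selected examples to the new input as in-context demonstrations."""
    lines = []
    for example_input, example_output in select_examples(query, history, k):
        lines.append(f"Input: {example_input}\nOutput: {example_output}\n")
    lines.append(f"Input: {query}\nOutput:")
    return "\n".join(lines)


# Hypothetical historical inferences (e.g. ones that received good feedback)
HISTORY = [
    ("Marie Curie was born in Warsaw", '{"person": ["Marie Curie"], "location": ["Warsaw"]}'),
    ("The stock market fell sharply today", '{"person": [], "location": []}'),
    ("Alan Turing was born in London", '{"person": ["Alan Turing"], "location": ["London"]}'),
]

prompt = build_prompt("Ada Lovelace was born in London", HISTORY, k=2)
print(prompt)
```

Because the examples are chosen per request, the prompt adapts as new historical data accumulates, with no change to the model weights.
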
https://github.com/user-attachments/assets/4df1022e-886e-48c2-8f79-6af3cdad79cb

## Get Started

**Start building today.** The **[Quick Start](https://www.tensorzero.com/docs/quickstart)** shows it's easy to set up an LLM application with TensorZero.

**Questions?** Ask us on **[Slack](https://www.tensorzero.com/slack)** or **[Discord](https://www.tensorzero.com/discord)**.

**Using TensorZero at work?** Email us at **[[email protected]](mailto:[email protected])** to set up a Slack or Teams channel with your team (free).

## Examples

We are working on a series of **complete runnable examples** illustrating TensorZero's data & learning flywheel.

> **[Optimizing Data Extraction (NER) with TensorZero](https://github.com/tensorzero/tensorzero/tree/main/examples/data-extraction-ner)**
>
> This example shows how to use TensorZero to optimize a data extraction pipeline.
> We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL).
> In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task &mdash; at a fraction of the cost and latency &mdash; using a small amount of training data.

> **[Agentic RAG — Multi-Hop Question Answering with LLMs](https://github.com/tensorzero/tensorzero/tree/main/examples/rag-retrieval-augmented-generation/simple-agentic-rag/)**
>
> This example shows how to build a multi-hop retrieval agent using TensorZero.
> The agent iteratively searches Wikipedia to gather information, and decides when it has enough context to answer a complex question.

> **[Writing Haikus to Satisfy a Judge with Hidden Preferences](https://github.com/tensorzero/tensorzero/tree/main/examples/haiku-hidden-preferences)**
>
> This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste.
> You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants.
> You'll see progress by fine-tuning the LLM multiple times.

> **[Image Data Extraction — Multimodal (Vision) Fine-tuning](https://github.com/tensorzero/tensorzero/tree/main/examples/multimodal-vision-finetuning)**
>
> This example shows how to fine-tune multimodal models (VLMs) like GPT-4o to improve their performance on vision-language tasks.
> Specifically, we'll build a system that categorizes document images (screenshots of computer science research papers).

> **[Improving LLM Chess Ability with Best-of-N Sampling](https://github.com/tensorzero/tensorzero/tree/main/examples/chess-puzzles/)**
>
> This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.

## Blog Posts

We write about LLM engineering on the **[TensorZero Blog](https://www.tensorzero.com/blog)**. Here are some of our favorite posts:

- **[Bandits in your LLM Gateway: Improve LLM Applications Faster with Adaptive Experimentation (A/B Testing)](https://www.tensorzero.com/blog/bandits-in-your-llm-gateway/)**
- **[Is OpenAI's Reinforcement Fine-Tuning (RFT) Worth It?](https://www.tensorzero.com/blog/is-openai-reinforcement-fine-tuning-rft-worth-it/)**
- **[Distillation with Programmatic Data Curation: Smarter LLMs, 5-30x Cheaper Inference](https://www.tensorzero.com/blog/distillation-programmatic-data-curation-smarter-llms-5-30x-cheaper-inference/)**
- **[From NER to Agents: Does Automated Prompt Engineering Scale to Complex Tasks?](https://www.tensorzero.com/blog/from-ner-to-agents-does-automated-prompt-engineering-scale-to-complex-tasks/)**