<img width="1212" height="395" alt="012d1688-24ae-4759-ae70-5f8f81a13c0e" src="https://github.com/user-attachments/assets/27b6e50e-efde-41cf-9f7c-94b829b25a8c" />

<h3 align="center">
  <a href="https://langwatch.ai">Website</a> · <a href="https://docs.langwatch.ai">Docs</a> · <a href="https://discord.gg/kT4PhDS2gH">Discord</a> · <a href="https://docs.langwatch.ai/self-hosting/overview">Self-hosting</a>
</h3>

<p align="center">
<a href="https://discord.gg/kT4PhDS2gH" target="_blank"><img src="https://img.shields.io/discord/1227886780536324106?logo=discord&labelColor=%20%235462eb&logoColor=%20%23f5f5f5&color=%20%235462eb" alt="chat on Discord"></a>
<a href="https://pypi.python.org/pypi/langwatch" target="_blank"><img src="https://img.shields.io/pypi/dm/langwatch?logo=python&logoColor=white&label=pypi%20langwatch&color=blue" alt="langwatch Python package on PyPi"></a>
<a href="https://www.npmjs.com/package/langwatch" target="_blank"><img src="https://img.shields.io/npm/dm/langwatch?logo=npm&logoColor=white&label=npm%20langwatch&color=blue" alt="langwatch npm package"></a>
<a href="https://twitter.com/intent/follow?screen_name=langwatchai" target="_blank"><img src="https://img.shields.io/twitter/follow/langwatchai?logo=X&color=%20%23f5f5f5" alt="follow on X"></a>
</p>

<video src="https://github.com/user-attachments/assets/ff49882d-4e9d-4b7c-819b-be690fba9387" autoplay loop muted playsinline width="100%" style="display: block; aspect-ratio: 16 / 9;"></video>

## Why LangWatch?

The platform for LLM evaluations and AI agent testing. We help teams test, simulate, evaluate, and monitor LLM-powered agents end-to-end, both before release and in production. Built for teams that need regression testing, simulations, and production observability without building custom tooling.

- [**End-to-end agent simulations**](https://langwatch.ai/scenario/) Run realistic scenarios against your **full stack** (tools, state, user simulator, judge) and pinpoint where your agents break, and why, down to each decision.
- **Eval + observability + prompts in one loop** [Trace](https://docs.langwatch.ai/integration/overview) → [dataset](https://docs.langwatch.ai/datasets/overview) → [evaluate](https://docs.langwatch.ai/llm-evaluation/offline-evaluation) → [optimize prompts/models](https://docs.langwatch.ai/optimization-studio/overview) → re-test. No glue code, no tool sprawl.
- [**Open standards, no lock-in**](https://docs.langwatch.ai/integration/opentelemetry/guide) OpenTelemetry/OTLP-native. Framework- and LLM-provider agnostic by design.
- [**Collaboration that doesn't slow shipping**](https://docs.langwatch.ai/features/annotations) Review runs, annotate failures, and ship fixes faster. Let domain experts label edge cases with [annotations & queues](https://docs.langwatch.ai/features/annotations), keep prompts in Git with the [GitHub integration](https://docs.langwatch.ai/prompt-management/features/essential/github-integration), and [link prompt versions to traces](https://docs.langwatch.ai/prompt-management/features/advanced/link-to-traces).

LangWatch gives you full visibility into agent behavior and the tools to systematically improve reliability, performance, and cost, while keeping you in control of your AI system.

## Getting Started

### Cloud ☁️

The easiest way to get started with LangWatch.

[Create a free account](https://app.langwatch.ai) → create a project → copy your API key.

### Local setup 💻

Get up and running on your own machine using Docker Compose:

```bash
git clone https://github.com/langwatch/langwatch.git
cd langwatch
cp langwatch/.env.example langwatch/.env
docker compose up -d --wait --build
```

Once running, LangWatch will be available at `http://localhost:5560`, where you can create your first project and API key.
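
If you start LangWatch from a script or CI job, it can help to confirm that the web endpoint actually answers before moving on. `docker compose up --wait` already blocks until the containers report running or healthy, so the following is only an optional extra check. It is a minimal sketch: the helper name `wait_for_url` and its retry defaults are illustrative and not part of LangWatch, and it assumes `curl` is installed.

```bash
# Hypothetical helper (not part of LangWatch): poll a URL until it
# responds with a success status, or give up after a number of attempts.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  local url attempts delay i
  url="$1"
  attempts="${2:-30}"   # illustrative default: 30 tries
  delay="${3:-2}"       # illustrative default: 2 seconds between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors (status >= 400)
    if curl --silent --fail --output /dev/null "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

After `docker compose up`, you could call it as `wait_for_url "http://localhost:5560"` and only continue when it returns successfully.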

### Deployment options ⚓️

Run LangWatch on your own infrastructure:

- [Docker Compose](https://docs.langwatch.ai/self-hosting/open-source#docker-compose) - Run LangWatch on your own machine.
- [Kubernetes (Helm)](https://docs.langwatch.ai/self-hosting/open-source#helm-chart-for-langwatch) - Run LangWatch on a Kubernetes cluster using Helm.
- [OnPrem](https://docs.langwatch.ai/self-hosting/onprem) - Cloud-specific setups for AWS, Google Cloud, and Azure.

<details>
<summary>Hybrid (OnPrem data) 🔀</summary>

For companies that have strict data residency and control requirements, without needing to go fully on-prem.

Read more about it on our [docs](https://docs.langwatch.ai/self-hosting/hybrid).

</details>

<details>
<summary>Local Development 👩‍💻</summary>

You can also run LangWatch locally without Docker to develop and help contribute to the project.

Start just the databases using Docker and leave them running:

```bash
docker compose up redis postgres opensearch
```

Then, in another terminal, install the dependencies and start LangWatch:

```bash
make install
make start
```

</details>

## 🚀 Quick Start

Ship safer agents in minutes. [Create a free account](https://app.langwatch.ai), then dive into these guides:

- **[Run your first agent simulation](https://langwatch.ai/scenario/introduction/getting-started)** - Test agents against realistic scenarios before production
- **[Set up evaluations](https://docs.langwatch.ai/llm-evaluation/offline-evaluation)** - Measure quality, performance, and reliability
- **[Send your first traces](https://docs.langwatch.ai/integration/overview)** - Integrate LangWatch with your stack
- **[Get started with LangWatch MCP](https://langwatch.ai/docs/integration/mcp)** - Use LangWatch in Claude Desktop and other MCP clients

## 🗺️ Integrations

LangWatch builds and maintains several integrations listed below.
Our tracing platform is built on top of [OpenTelemetry](https://opentelemetry.io/), so we support any OpenTelemetry-compatible library out of the box.

**Frameworks:**
[LangChain](https://langwatch.ai/docs/integration/python/integrations/langchain) · [LangGraph](https://langwatch.ai/docs/integration/python/integrations/langgraph) · [Vercel AI SDK](https://langwatch.ai/docs/integration/typescript/integrations/vercel-ai) · [Mastra](https://langwatch.ai/docs/integration/typescript/integrations/mastra) · [CrewAI](https://langwatch.ai/docs/integration/python/integrations/crewai) · [Google ADK](https://langwatch.ai/docs/integration/python/integrations/google-ai)

**Model Providers:**
[OpenAI](https://langwatch.ai/docs/integration/python/integrations/openai) · [Anthropic](https://langwatch.ai/docs/integration/python/integrations/anthropic) · [Azure](https://langwatch.ai/docs/integration/python/integrations/azure) · [Google Cloud](https://langwatch.ai/docs/integration/python/integrations/google-cloud) · [AWS](https://langwatch.ai/docs/integration/python/integrations/aws) · [Groq](https://langwatch.ai/docs/integration/python/integrations/groq) · [Ollama](https://langwatch.ai/docs/integration/python/integrations/ollama)

### Platforms

[LangFlow](https://docs.langwatch.ai/integration/langflow) · [Flowise](https://docs.langwatch.ai/integration/flowise) · [n8n](https://docs.langwatch.ai/integration/n8n)

*and many more…*

Are you using a platform that could benefit from a direct LangWatch integration? We'd love to hear from you. Please [**fill out this very quick form**](https://www.notion.so/1e35e165d48280468247fcbdc3349077?pvs=21).

## 💬 Support

Have questions or need help? We're here to support you in multiple ways:

- **Documentation:** Our comprehensive [documentation](https://docs.langwatch.ai) covers everything from getting started to advanced features.
- **Discord Community:** Join our [Discord server](https://discord.gg/kT4PhDS2gH) for real-time help from our team and community.
- **X (Twitter):** Follow us on [X](https://x.com/LangWatchAI) for updates and announcements.
- **GitHub Issues:** Report bugs or request features through our [GitHub repository](https://github.com/langwatch/langwatch).
- **Enterprise Support:** Enterprise customers receive priority support with dedicated response times. Our [pricing page](https://langwatch.ai/pricing) contains more information.

## 🤝 Collaborating

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

Please read our [Contribution Guidelines](https://github.com/langwatch/langwatch/blob/main/CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.

## ✍️ License

Please read our [LICENSE.md](/LICENSE.md) file.

## 👮‍♀️ Security + Compliance

As a platform that has access to data that is highly likely to be sensitive, we take security incredibly seriously and treat it as a core part of our culture.

| Legal Framework | Current Status                                                                 |
| --------------- | ------------------------------------------------------------------------------ |
| GDPR            | Compliant. DPA available upon request.                                         |
| ISO 27001       | Certified. Certification report available upon request on our Enterprise plan. |

Please refer to our Security page for more information. Contact us at [[email protected]](mailto:[email protected]) if you have any further questions.

### Vulnerability Disclosure

To responsibly disclose a security vulnerability, email [[email protected]](mailto:[email protected]) or, if you prefer, reach out to one of our team privately on [Discord](https://discord.com/invite/kT4PhDS2gH).