Enterprise-grade, API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, a prompt playground, and more.
# OpenContracts ([Demo](https://contracts.opensource.legal))
Open source document intelligence. Self-hosted, AI-powered, and built for teams who need to own their data.
[Sponsor](https://github.com/sponsors/JSv4)
---
| | |
|---|---|
| Backend CI/CD | [Codecov](https://codecov.io/gh/JSv4/OpenContracts) |
| Meta | [Black](https://github.com/psf/black) [mypy](https://github.com/python/mypy) [isort](https://github.com/pycqa/isort) [AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0) |
## What is OpenContracts?
OpenContracts is an AGPL-3.0 licensed platform for document analysis, annotation, and collaboration. It combines document management with AI-powered analysis tools, discussion threads, and structured data extraction.
### Core Capabilities
- **Document Processing** — Upload PDFs and text files, automatically extract structure with ML-based parsers
- **Annotation & Analysis** — Highlight, label, and analyze documents with custom annotation schemas
- **AI Agents** — Chat with documents using configurable AI assistants that can search and analyze content
- **Collaboration** — Threaded discussions with @mentions, voting, and moderation at corpus and document levels
- **Data Extraction** — Extract structured data from hundreds of documents using agent-powered queries
- **Version Control** — Track document changes, restore previous versions, soft delete with recovery
---
## Quick Look
### Document Annotation

### Text Format Support

### Structured Data Extraction

### Custom Analytics

---
## Features
### Document Management
- Organize documents into collections (Corpuses) with folder hierarchies
- Fine-grained permissions with public/private visibility controls
- Document versioning with full history and restore capability
- Bulk upload and batch operations
### Parsing & Processing
- Pluggable parser architecture supporting multiple backends:
  - [Docling](docs/pipelines/docling_parser.md) — ML-based structure extraction
  - [NLM-Ingest](docs/pipelines/nlm_ingest_parser.md) — Layout-aware parsing
  - Text/Markdown — Simple text extraction
- Automatic vector embeddings for semantic search (powered by pgvector)
- Structural annotation extraction (headers, paragraphs, tables)
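
As a rough illustration of what semantic search over embeddings involves, the sketch below ranks stored vectors by cosine similarity to a query vector. It is an in-memory toy, not OpenContracts code: in the real system pgvector performs this kind of ranking inside PostgreSQL, and the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    corpus: dict[str, Sequence[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Rank items in `corpus` by similarity to `query`, best first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Tiny 2-dimensional "embeddings" purely for demonstration.
    embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for item_id, score in top_k([1.0, 0.0], embeddings, k=2):
        print(item_id, round(score, 3))
```

A database-backed index avoids scanning every vector the way this loop does, which is the main reason to delegate the ranking to pgvector.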
### Annotation Tools
- Multi-page annotation support
- Custom label schemas with validation
- Relationship mapping between annotations
- Import/export in standard formats
### AI & LLM Integration
- Built on [PydanticAI](docs/architecture/llms/README.md) for structured LLM interactions
- Configurable AI agents with tool access (search, document loading, annotation queries)
- Real-time streaming responses via WebSocket
- Conversation history with context management
### Collaboration (New in v3.0.0.b3)
- Threaded discussions at global, corpus, and document levels
- @mentions for documents, corpuses, and AI agents
- Upvoting/downvoting with reputation tracking
- Thread pinning, locking, and moderation controls
- User profiles with activity feeds and statistics
- Badges and achievements for community engagement
- Leaderboards showing top contributors
### Data Extraction
- Define extraction schemas with multiple question types
- Run extractions across document collections
- Review and validate extracted data in grid view
- Export results in structured formats
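
To make the idea of an extraction schema more concrete, here is a minimal sketch of typed columns whose raw answers are coerced into one grid row. Everything in it is hypothetical (`Column`, `build_row`, the type names); it illustrates the concept and is not the OpenContracts data model.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical mapping from a declared output type to a coercion function.
_COERCERS: dict[str, Callable[[str], Any]] = {
    "str": str,
    "int": int,
    "float": float,
    "bool": lambda raw: raw.strip().lower() in {"true", "yes", "1"},
}


@dataclass(frozen=True)
class Column:
    """One question in the schema, with the type its answer should have."""

    name: str
    query: str
    output_type: str = "str"

    def __post_init__(self) -> None:
        if self.output_type not in _COERCERS:
            raise ValueError(f"unsupported output type: {self.output_type!r}")


def build_row(columns: list[Column], raw_answers: dict[str, str]) -> dict[str, Any]:
    """Coerce raw string answers into typed cells; unusable cells become None."""
    row: dict[str, Any] = {}
    for column in columns:
        raw = raw_answers.get(column.name)
        if raw is None:
            row[column.name] = None  # no answer was produced for this column
            continue
        try:
            row[column.name] = _COERCERS[column.output_type](raw)
        except ValueError:
            row[column.name] = None  # answer did not match the declared type
    return row


if __name__ == "__main__":
    schema = [
        Column("term_years", "How many years is the initial term?", "int"),
        Column("auto_renews", "Does the agreement renew automatically?", "bool"),
        Column("governing_law", "Which jurisdiction's law governs?"),
    ]
    print(build_row(schema, {"term_years": "5", "auto_renews": "Yes"}))
```

Leaving failed cells as `None` rather than raising keeps a single bad answer from discarding the whole row, which suits a review-and-validate workflow.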
---
## Documentation
Browse the full documentation at [jsv4.github.io/OpenContracts](https://jsv4.github.io/OpenContracts/) or in the repo:
| Guide | Description |
|-------|-------------|
| [Quick Start](docs/quick_start.md) | Get running with Docker in minutes |
| [Key Concepts](docs/walkthrough/key-concepts.md) | Core workflows and terminology |
| [PDF Data Format](docs/architecture/PDF-data-layer.md) | How text maps to PDF coordinates |
| [LLM Framework](docs/architecture/llms/README.md) | PydanticAI integration and agents |
| [Vector Stores](docs/extract_and_retrieval/vector_stores.md) | Semantic search architecture |
| [Pipeline Overview](docs/pipelines/pipeline_overview.md) | Parser and embedder system |
| [Custom Extractors](docs/walkthrough/advanced/write-your-own-extractors.md) | Build your own data extraction tasks |
| [v3.0.0.b3 Release Notes](docs/releases/v3.0.0.b3.md) | Latest features and migration guide |
---
## Architecture
### Data Format
OpenContracts uses a standardized format for representing text and layout on PDF pages, enabling portable annotations across tools.
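
The following sketch shows one way a token-plus-coordinates layer can work: each token carries its text and a bounding box on the page, and an annotation is a set of tokens whose boxes are merged. The field names are assumptions loosely modeled on the PAWLS token layout credited in the Acknowledgements; the actual format is described in the PDF Data Format guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A word on a page with its position (hypothetical field names)."""

    text: str
    x: float       # left edge, in page units
    y: float       # top edge, in page units
    width: float
    height: float


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float


def bounding_box(tokens: list[Token]) -> Box:
    """Smallest rectangle that encloses every token in the annotation."""
    if not tokens:
        raise ValueError("an annotation needs at least one token")
    return Box(
        left=min(t.x for t in tokens),
        top=min(t.y for t in tokens),
        right=max(t.x + t.width for t in tokens),
        bottom=max(t.y + t.height for t in tokens),
    )


def annotation_text(tokens: list[Token]) -> str:
    """Plain text of the annotation, tokens joined in reading order."""
    return " ".join(t.text for t in tokens)


if __name__ == "__main__":
    span = [Token("Governing", 72, 100, 50, 10), Token("Law", 126, 100, 20, 10)]
    print(annotation_text(span), bounding_box(span))
```

Because an annotation refers to tokens rather than raw pixel regions, the same annotation can be redrawn by any tool that understands the token layer.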

### Processing Pipeline
The modular pipeline supports custom parsers, embedders, and thumbnail generators.

Each component inherits from a base class with a defined interface:
- **Parsers** — Extract text and structure from documents
- **Embedders** — Generate vector embeddings for search
- **Thumbnailers** — Create document previews
See the [pipeline documentation](docs/pipelines/pipeline_overview.md) for details on creating custom components.
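
The base-class pattern described above can be sketched as follows. This is a simplified illustration under stated assumptions: `BaseParser`, `ParsedDocument`, `PlainTextParser`, and `pick_parser` are hypothetical names, and the real interfaces are defined in the pipeline documentation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ParsedDocument:
    """Minimal parser output: full text plus detected paragraphs."""

    text: str
    paragraphs: list[str] = field(default_factory=list)


class BaseParser(ABC):
    """Hypothetical interface that every parser component implements."""

    # File extensions this parser claims to handle.
    supported_extensions: tuple[str, ...] = ()

    @abstractmethod
    def parse(self, raw: bytes) -> ParsedDocument:
        """Turn raw file bytes into text and structure."""


class PlainTextParser(BaseParser):
    """Simple text extraction: paragraphs are separated by blank lines."""

    supported_extensions = (".txt", ".md")

    def parse(self, raw: bytes) -> ParsedDocument:
        text = raw.decode("utf-8")
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        return ParsedDocument(text=text, paragraphs=paragraphs)


def pick_parser(filename: str, parsers: list[BaseParser]) -> BaseParser:
    """Return the first registered parser that supports the file extension."""
    for parser in parsers:
        if filename.lower().endswith(parser.supported_extensions):
            return parser
    raise LookupError(f"no parser registered for {filename!r}")


if __name__ == "__main__":
    parser = pick_parser("notes.md", [PlainTextParser()])
    print(parser.parse(b"First paragraph.\n\nSecond paragraph.\n").paragraphs)
```

Keeping the interface to a single abstract method means a new backend only has to subclass and implement `parse`; embedders and thumbnailers could follow the same shape with their own base classes.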
---
## Deployment
### Quick Start (Development)
```bash
git clone https://github.com/JSv4/OpenContracts.git
cd OpenContracts
docker compose -f local.yml up
```
### Production
Run migrations before starting services:
```bash
# Apply database migrations
docker compose -f production.yml --profile migrate up migrate
# Start services
docker compose -f production.yml up -d
```
The migration service runs once, which avoids race conditions between containers and ensures all tables exist before dependent services start.
---
## Telemetry
OpenContracts collects anonymous usage data to guide development priorities. We collect:
- Installation events (unique installation ID)
- Feature usage statistics (analyzer runs, extracts created)
- Aggregate counts (documents, users, queries)
We do not collect document contents, extracted data, user identities, or query contents.
Disable with `TELEMETRY_ENABLED=False` in your settings.
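
If you supply the flag through an environment variable, a settings module needs to turn the string into a boolean. The helper below is a minimal sketch of that step; `env_bool` is a hypothetical name, and the default of `True` is an assumption based on telemetry being opt-out as described above.

```python
import os
from typing import Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name: str, default: bool, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean flag from the environment, case-insensitively."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean-like value, got {raw!r}")


# Assumed default: telemetry stays on unless explicitly disabled.
TELEMETRY_ENABLED = env_bool("TELEMETRY_ENABLED", default=True)
```

Rejecting unrecognized values, instead of silently treating them as `True`, helps ensure a typo such as `Flase` does not leave telemetry enabled by accident.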
---
## Supported Formats
Currently supported:
- PDF (full layout and annotation support)
- Text-based formats (plaintext, Markdown)
**Coming soon:** DOCX viewing and annotation powered by [Docxodus](https://github.com/JSv4/Docxodus), an open source in-browser Word document viewer. This will enable the same annotation and analysis workflows for Word documents that currently exist for PDFs.
---
## Acknowledgements
This project builds on work from:
- [AllenAI PAWLS](https://github.com/allenai/pawls) — PDF annotation data format and concepts
- [NLMatics nlm-ingestor](https://github.com/nlmatics/nlm-ingestor) — Document parsing pipeline
The data extraction grid UI draws inspiration from NLMatics' innovative approach to document querying.

---
## License
AGPL-3.0 — See [LICENSE](LICENSE) for details.
", Assign "at most 3 tags" to the expected json: {"id":"11092","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine 
"},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"