Mass document analytics platform based on LlamaIndex, Pgvector, React and Django.

![OpenContracts](docs/assets/images/logos/OS_Legal_Logo.png)
# Open Contracts ([Demo](https://opencontracts.opensource.legal))
## The Free and Open Source Document Analytics Platform [![](https://img.shields.io/static/v1?label=Sponsor&message=%E2%9D%A4&logo=GitHub&color=%23fe8e86)](https://github.com/sponsors/JSv4)
---
| | |
| --- | --- |
| CI/CD | [![codecov](https://codecov.io/gh/JSv4/OpenContracts/branch/main/graph/badge.svg?token=RdVsiuaTVz)](https://codecov.io/gh/JSv4/OpenContracts) |
| Meta | [![code style - black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![types - Mypy](https://img.shields.io/badge/types-Mypy-blue.svg)](https://github.com/python/mypy) [![imports - isort](https://img.shields.io/badge/imports-isort-ef8336.svg)](https://github.com/pycqa/isort) [![License - Apache2](https://img.shields.io/badge/license-Apache%202-blue.svg)](https://spdx.org/licenses/) |
## What Does it Do?
OpenContracts is an **Apache-2 Licensed** enterprise document analytics tool. It provides several key features:
1. **Manage Documents** - Manage document collections (`Corpuses`)
2. **Layout Parser** - Automatically extracts layout features from PDFs
3. **Automatic Vector Embeddings** - generated for uploaded PDFs and extracted layout blocks
4. **Pluggable microservice analyzer architecture** - to let you analyze documents and automatically annotate them
5. **Human Annotation Interface** - to manually annotate documents, including multi-page annotations.
6. **LlamaIndex Integration** - Use our vector stores (powered by pgvector) and any manual or automatically annotated features
to let an LLM intelligently answer questions.
7. **Data Extract** - ask multiple questions across hundreds of documents using complex LLM-powered querying behavior.
Our sample implementation uses LlamaIndex + Marvin.
8. **Custom Data Extract** - Custom data extract pipelines can be used on the frontend to query documents in bulk.
![Grid Review And Sources.gif](docs/assets/images/gifs/Grid_Review_And_Sources.gif)
![Manual Annotations](docs/assets/images/screenshots/Jumped_To_Annotation.png)
## Key Docs
We recommend you [browse our docs](https://jsv4.github.io/OpenContracts/) via our MkDocs site. You can also view the
docs in the repo:
1. [Quickstart Guide](docs/quick-start.md) - You'll probably want to get started quickly. Setting up locally should be
pretty painless if you're already running Docker.
2. [Basic Walkthrough](docs/walkthrough/key-concepts.md) - Check out the walkthrough to step through basic usage of the
application for document and annotation management.
3. [PDF Annotation Data Format Overview](docs/architecture/PDF-data-layer.md) - You may be interested in how we map text to
PDFs visually and the underlying data format we're using.
4. [Django + Pgvector Powered Hybrid Vector Database](docs/extract_and_retrieval/intro_to_django_annotation_vector_store.md) -
We've used the latest open source tooling for vector storage in Postgres to make it almost trivially easy to
combine structured metadata and vector embeddings with an API-powered application.
5. [LlamaIndex Integration Walkthrough](docs/extract_and_retrieval/intro_to_django_annotation_vector_store.md) - We wrote a
wrapper for our backend database and vector store to make it simple to load our parsed annotations, embeddings and
text into LlamaIndex. Even better, if you have additional annotations in the document, the LLM can access those too.
6. [Write Custom Data Extractors](docs/walkthrough/advanced/write-your-own-extractors.md) - Custom data extract tasks (which
can use LlamaIndex or can be totally bespoke) are automatically loaded and displayed on the frontend to let users
select how to ask questions and extract data from documents.
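
To give a feel for what "combining structured metadata and vector embeddings" means in practice, here is a minimal, dependency-free sketch of a hybrid query. It is not OpenContracts code: `AnnotationRow`, `hybrid_search` and all field names are hypothetical stand-ins, and the in-memory list stands in for a Postgres table. The idea is simply to filter on ordinary columns first, then rank the survivors by cosine distance (the same measure pgvector's `<=>` operator computes).

```python
import math
from dataclasses import dataclass


@dataclass
class AnnotationRow:
    """Hypothetical stand-in for a database row: metadata columns plus an embedding."""
    text: str
    corpus_id: int
    label: str
    embedding: list[float]


def cosine_distance(a, b):
    """Cosine distance, i.e. 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def hybrid_search(rows, query_embedding, *, corpus_id, label=None, top_k=2):
    """Filter on structured metadata, then rank the remaining rows by vector distance."""
    # Step 1: structured filter (what a WHERE clause would do in SQL).
    candidates = [
        r for r in rows
        if r.corpus_id == corpus_id and (label is None or r.label == label)
    ]
    # Step 2: nearest-neighbour ranking on the embedding column.
    candidates.sort(key=lambda r: cosine_distance(r.embedding, query_embedding))
    return candidates[:top_k]


# Toy data with made-up 3-dimensional embeddings.
rows = [
    AnnotationRow("Either party may terminate on 30 days notice.", 1, "Termination", [0.9, 0.1, 0.0]),
    AnnotationRow("Payment is due within 45 days.", 1, "Payment", [0.1, 0.9, 0.0]),
    AnnotationRow("This agreement ends upon breach.", 1, "Termination", [0.8, 0.2, 0.1]),
    AnnotationRow("Termination requires written notice.", 2, "Termination", [0.95, 0.05, 0.0]),
]

results = hybrid_search(rows, [1.0, 0.0, 0.0], corpus_id=1, top_k=2)
for row in results:
    print(row.label, "-", row.text)
```

The row from corpus 2 is excluded by the metadata filter even though its embedding is the closest to the query, which is the point of doing both kinds of filtering in one place.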
## Architecture and Data Flows at a Glance
### Core Data Standard
The core idea here - besides providing a platform to analyze contracts - is an open and standardized architecture that
makes data extremely portable. Powering this is a set of data standards to describe the text and layout blocks on a PDF
page:
![Data Format](docs/assets/images/diagrams/pawls-annotation-mapping.svg)
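
As a rough illustration of the kind of structure the diagram describes, the sketch below models a page as a list of positioned tokens and an annotation as a set of token indices. The shape is loosely modeled on the PAWLS token format; the class and field names (`Token`, `Page`, `annotation_bounds`) are assumptions for this example and may not match the actual OpenContracts schema, which is described in the data format docs linked above.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One word on the page with its position (assumed: top-left origin, page units)."""
    x: float
    y: float
    width: float
    height: float
    text: str


@dataclass
class Page:
    """A single PDF page: its dimensions plus the tokens found on it."""
    index: int
    width: float
    height: float
    tokens: list[Token]


def annotation_bounds(page, token_indices):
    """Return (left, top, right, bottom) of the box enclosing the selected tokens."""
    selected = [page.tokens[i] for i in token_indices]
    left = min(t.x for t in selected)
    top = min(t.y for t in selected)
    right = max(t.x + t.width for t in selected)
    bottom = max(t.y + t.height for t in selected)
    return (left, top, right, bottom)


def annotation_text(page, token_indices):
    """Rebuild the annotated text from the token layer."""
    return " ".join(page.tokens[i].text for i in token_indices)


# Toy page with three tokens; coordinates are invented for the example.
page = Page(index=0, width=612.0, height=792.0, tokens=[
    Token(72.0, 100.0, 40.0, 12.0, "Governing"),
    Token(116.0, 100.0, 20.0, 12.0, "Law"),
    Token(72.0, 116.0, 30.0, 12.0, "applies"),
])

bounds = annotation_bounds(page, [0, 1])
text = annotation_text(page, [0, 1])
print(text, bounds)
```

Because an annotation only references token indices, both its text and its on-page bounding box can be derived from the same token layer, which is what keeps the data portable between tools.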
### Robust PDF Processing Pipeline
We have a robust PDF processing pipeline that is horizontally scalable and generates our standardized data
consistently for PDF inputs (we're working on adding additional formats):
![PDF Processor](docs/assets/images/diagrams/PDF-processor-sequence-diagram.png)
Special thanks to Nlmatics and [nlm-ingestor](https://github.com/nlmatics/nlm-ingestor) for powering the layout parsing
and extraction.
## Limitations
At the moment, it only works with PDFs. In the future, it will be able to convert other document types to PDF for
storage and labeling. PDF is an excellent format for this as it introduces a consistent, repeatable format which we can
use to generate a text and x-y coordinate layer from scratch.
**Adding OCR and ingestion for other enterprise documents is a priority**.
## Acknowledgements
Special thanks to AllenAI's [PAWLS project](https://github.com/allenai/pawls) and Nlmatics
[nlm-ingestor](https://github.com/nlmatics/nlm-ingestor). They've pioneered a number of features and flows, and we are
using their code in some parts of the application.