🦜⛏️ Did you say you like data? 🚧 Under Active Development 🚧
This repo is under active development. Do not use code from `main`; instead, please check out code from [releases](https://github.com/langchain-ai/langchain-extract/releases).
This repository is not a library but a jumping-off point for your own application, so do not be surprised to find breaking changes between releases!
Check out the demo service deployed at [extract.langchain.com](https://extract.langchain.com/).
# 🦜⛏️ LangChain Extract
https://github.com/langchain-ai/langchain-extract/assets/26529506/6657280e-d05f-4c0f-9c47-07a0ef7c559d
[![CI](https://github.com/langchain-ai/langchain-extract/actions/workflows/ci.yml/badge.svg)](https://github.com/langchain-ai/langchain-extract/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai)
[![](https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat)](https://discord.gg/6adMQxSpJS)
[![Open Issues](https://img.shields.io/github/issues-raw/langchain-ai/langchain-extract)](https://github.com/langchain-ai/langchain-extract/issues)
`langchain-extract` is a simple web server that allows you to extract information from text and files using LLMs. It is built using [FastAPI](https://fastapi.tiangolo.com/), [LangChain](https://python.langchain.com/) and [PostgreSQL](https://www.postgresql.org/).
The backend closely follows the [extraction use-case documentation](https://python.langchain.com/docs/use_cases/extraction) and provides
a reference implementation of an app that extracts structured information from data using LLMs.
This repository is meant to be a starting point for building your own extraction application which
may have slightly different requirements or use cases.
## Functionality
- 🚀 FastAPI webserver with a REST API
- 📚 OpenAPI Documentation
- 📝 Use [JSON Schema](https://json-schema.org/) to define what to extract
- 📊 Use examples to improve the quality of extracted results
- 📦 Create and save extractors and examples in a database
- 📂 Extract information from text and/or binary files
- 🦜️🏓 [LangServe](https://github.com/langchain-ai/langserve) endpoint to integrate with LangChain `RemoteRunnable`
## Releases
- 0.0.1: https://github.com/langchain-ai/langchain-extract/releases/tag/0.0.1
- 0.0.2: https://github.com/langchain-ai/langchain-extract/releases/tag/0.0.2
## 📚 Documentation
See the example notebooks in the [documentation](https://github.com/langchain-ai/langchain-extract/tree/main/docs/source/notebooks)
for how to create examples that improve extraction results, upload files (e.g., HTML, PDF), and more.
Documentation and server code are both under development!
## 🍯 Example API
Below are sample `curl` requests that demonstrate how to use the API.
These are only minimal examples; see the [documentation](https://github.com/langchain-ai/langchain-extract/tree/main/docs/source/notebooks) for more information
about the API, and the [extraction use-case documentation](https://python.langchain.com/docs/use_cases/extraction) for more information about how to extract
information using LangChain.
First we generate a user ID for ourselves. **The application does not properly manage users or include legitimate authentication**. Access to extractors, few-shot examples, and other artifacts is controlled via this ID. Consider it secret.
```sh
USER_ID=$(uuidgen)
export USER_ID
```
### Create an extractor
```sh
curl -X 'POST' \
  'http://localhost:8000/extractors' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "x-key: ${USER_ID}" \
  -d '{
  "name": "Personal Information",
  "description": "Use to extract personal information",
  "schema": {
    "type": "object",
    "title": "Person",
    "required": [
      "name",
      "age"
    ],
    "properties": {
      "age": {
        "type": "integer",
        "title": "Age"
      },
      "name": {
        "type": "string",
        "title": "Name"
      }
    }
  },
  "instruction": "Use information about the person from the given user input."
}'
```
Response:
```json
{
  "uuid": "e07f389f-3577-4e94-bd88-6b201d1b10b9"
}
```
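Copying the UUID by hand gets tedious once you chain several requests. A minimal sketch, assuming `jq` is installed (the README already pipes into it), `USER_ID` is exported as above, and the server listens on `localhost:8000`: the hypothetical `create_extractor` helper (not part of langchain-extract) reads an extractor definition from stdin, POSTs it to `/extractors`, and prints only the `uuid` field of the response.

```sh
# Hypothetical helper (not part of langchain-extract): create an extractor
# and print its UUID so it can be captured in a shell variable.
# Assumes jq is installed and USER_ID is exported as shown above.
BASE_URL="${BASE_URL:-http://localhost:8000}"

create_extractor() {
  local response
  # Send the JSON extractor definition read from stdin;
  # -f makes curl exit non-zero on HTTP error statuses.
  response=$(curl -sSf -X POST "${BASE_URL}/extractors" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -H "x-key: ${USER_ID}" \
    --data-binary @-) || return 1
  # -e makes jq exit non-zero if .uuid is missing or null; -r strips the quotes.
  printf '%s\n' "$response" | jq -er '.uuid'
}
```

Usage: save the JSON body from the request above to a file (the name `person_extractor.json` is only an example), then run `EXTRACTOR_ID=$(create_extractor < person_extractor.json)` and pass `"${EXTRACTOR_ID}"` wherever the requests below use the literal UUID.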
Use the `extract` endpoint to extract information from text (or a file)
using an existing extractor.
```sh
curl -s -X 'POST' \
  'http://localhost:8000/extract' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "x-key: ${USER_ID}" \
  -F 'extractor_id=e07f389f-3577-4e94-bd88-6b201d1b10b9' \
  -F 'text=my name is chester and i am 20 years old. My name is eugene and I am 1 year older than chester.' \
  -F 'mode=entire_document' \
  -F 'file=' | jq .
```
Response:
```json
{
  "data": [
    {
      "name": "chester",
      "age": 20
    },
    {
      "name": "eugene",
      "age": 21
    }
  ]
}
```
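The request above sends inline text. To run the same extractor over a local file instead, a sketch like the following should work, assuming the `/extract` endpoint accepts a multipart `file` field in place of `text` (as the sentence introducing the endpoint suggests). The `extract_file` helper is hypothetical, and `USER_ID` must be exported as above.

```sh
# Hypothetical helper: run an existing extractor over a local file.
# Assumes USER_ID is exported and that /extract accepts a multipart
# "file" field instead of "text".
BASE_URL="${BASE_URL:-http://localhost:8000}"

extract_file() {
  local extractor_id="$1" path="$2"
  # Fail early, before any network call, if the file does not exist.
  if [ ! -f "$path" ]; then
    echo "extract_file: no such file: $path" >&2
    return 1
  fi
  # With -F, curl builds the multipart/form-data body and its
  # Content-Type header (including the boundary) by itself.
  curl -sSf -X POST "${BASE_URL}/extract" \
    -H 'accept: application/json' \
    -H "x-key: ${USER_ID}" \
    -F "extractor_id=${extractor_id}" \
    -F 'mode=entire_document' \
    -F "file=@${path}"
}
```

Usage: `extract_file "e07f389f-3577-4e94-bd88-6b201d1b10b9" report.pdf | jq .`, where `report.pdf` stands in for any document you want to process.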
Add a few-shot example:
```sh
curl -X POST "http://localhost:8000/examples" \
  -H "Content-Type: application/json" \
  -H "x-key: ${USER_ID}" \
  -d '{
    "extractor_id": "e07f389f-3577-4e94-bd88-6b201d1b10b9",
    "content": "marcos is 10.",
    "output": [
      {
        "name": "MARCOS",
        "age": 10
      }
    ]
  }' | jq .
```
The response contains a UUID for the example. The example is now persisted and associated with the extractor, so subsequent extraction runs will incorporate it. Examples can be deleted with a `DELETE` request.
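A minimal sketch of such a `DELETE` request. The route `/examples/<example-uuid>` is an assumption (verify it in the server's OpenAPI documentation), `delete_example` is a hypothetical helper, and the example's UUID is the one returned by the request above.

```sh
# Hypothetical helper: delete a few-shot example by its UUID.
# The /examples/<uuid> route is an assumption; check the OpenAPI docs.
BASE_URL="${BASE_URL:-http://localhost:8000}"

delete_example() {
  local example_id="$1"
  # Refuse to send a request without an ID (it would target /examples/ instead).
  if [ -z "$example_id" ]; then
    echo "delete_example: missing example UUID" >&2
    return 1
  fi
  # The x-key header scopes the request to the same user that created the example.
  curl -sSf -X DELETE "${BASE_URL}/examples/${example_id}" \
    -H 'accept: application/json' \
    -H "x-key: ${USER_ID}"
}
```

Usage: `delete_example "$EXAMPLE_ID"`, with `EXAMPLE_ID` holding the UUID from the response.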
## ✅ Running locally
The easiest way to get started is to use Docker Compose (`docker compose`) to run the server.
**Configure the environment**
Add a `.local.env` file to the root directory with the following content:
```sh
OPENAI_API_KEY=... # Your OpenAI API key
```
Adding `FIREWORKS_API_KEY` or `TOGETHER_API_KEY` to this file enables additional models. To see which models the server makes available, along with other configuration information, send a `GET` request to the `configuration` endpoint.
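For example, the following sketch prints that configuration as formatted JSON once the services are running. It assumes the endpoint is served at `/configuration` on the default port and that `jq` is installed; `show_configuration` is a hypothetical helper.

```sh
# Hypothetical helper: fetch and pretty-print the server configuration.
# Assumes the route is GET /configuration and that jq is installed.
BASE_URL="${BASE_URL:-http://localhost:8000}"

show_configuration() {
  local body
  # Capture the body first so a connection or HTTP error is reported
  # through the exit status instead of being masked by the jq pipeline.
  body=$(curl -sSf "${BASE_URL}/configuration") || return 1
  printf '%s\n' "$body" | jq .
}
```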
Build the images:
```sh
docker compose build
```
Run the services:
```sh
docker compose up
```
This launches both the extraction server and the Postgres instance.
Verify that the server is running:
```sh
curl -X 'GET' 'http://localhost:8000/ready'
```
This should return `ok`.
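The services can take a while to come up, so in scripts it can help to poll the ready check until it succeeds. A minimal sketch; the `wait_for_ready` helper and its defaults (30 attempts, 2 seconds apart) are assumptions, not part of the project. It only checks for a successful HTTP status, not the exact body.

```sh
# Hypothetical helper: poll the /ready endpoint until the server responds
# with a success status, or give up after a fixed number of attempts.
BASE_URL="${BASE_URL:-http://localhost:8000}"

# Usage: wait_for_ready [attempts] [delay_seconds]
wait_for_ready() {
  local attempts="${1:-30}" delay="${2:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP error statuses, not just connection errors.
    if curl -sf "${BASE_URL}/ready" >/dev/null; then
      echo "server is ready"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "server not ready after ${attempts} attempts" >&2
  return 1
}
```

Usage: `docker compose up -d && wait_for_ready` starts the services in the background and blocks until the server answers.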
The UI will be available at [http://localhost:3000](http://localhost:3000).
## Contributions
Feel free to develop in this project for your own needs!
For now, we are not accepting pull requests, but would love to hear [questions, ideas or issues](https://github.com/langchain-ai/langchain-extract/discussions).
## Development
To set up for development, you will need to install [Poetry](https://python-poetry.org/).
The backend code is located in the `backend` directory.
```sh
cd backend
```
Set up the environment using Poetry:
```sh
poetry install --with lint,dev,test
```
Run the following script to create a database and schema:
```sh
python -m scripts.run_migrations create
```
From the `backend` directory, start the server:
```sh
OPENAI_API_KEY=[YOUR API KEY] python -m server.main
```
### Testing
Create a test database. The test database is used for running tests and is
separate from the main database. It will have the same schema as the main
database.
```sh
python -m scripts.run_migrations create-test-db
```
Run the tests:
```sh
make test
```
### Linting and formatting
Testing and formatting are handled by the Makefile in the `backend` directory. To format the code:
```sh
make format
```