AI prompts
base on Turn any website into clean data pipelines & structured APIs in minutes! <h2 align="center">
<div>
<a href="https://www.maxun.dev/?ref=ghread">
<img src="/src/assets/maxunlogo.png" width="70" />
<br>
Maxun
</a>
</div>
Transform the Web into Structured Intelligence<br>
</h2>
<p align="center">
✨ Turn any website into clean, contextualized data pipelines for your AI applications ✨
<br />
Maxun is the easiest way to extract web data with no code. The <b>modern</b> open-source alternative to BrowseAI, Octoparse and similar tools.
</p>
<p align="center">
<a href="https://app.maxun.dev/?ref=ghread"><b>Go To App</b></a> •
<a href="https://docs.maxun.dev/?ref=ghread"><b>Documentation</b></a> •
<a href="https://www.maxun.dev/?ref=ghread"><b>Website</b></a> •
<a href="https://discord.gg/5GbPjBUkws"><b>Discord</b></a> •
<a href="https://www.youtube.com/@MaxunOSS?ref=ghread"><b>Watch Tutorials</b></a>
<br />
<br />
<a href="https://trendshift.io/repositories/12113" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12113" alt="getmaxun%2Fmaxun | Trendshift" style="width: 250px; height: 55px; margin-top: 10px;" width="250" height="55"/></a>
</p>
## What is Maxun?
Maxun helps you transform websites into structured APIs, clean markdown for AI workflows, and production-ready data pipelines — all in minutes.
### Ecosystem
1. **[Extract](https://docs.maxun.dev/category/extract)** – Emulate real user behavior and collect structured data from any website. No code required.
   * **[Recorder Mode](https://docs.maxun.dev/robot/extract/robot-actions)** - Record your actions as you browse; Maxun turns them into a reusable extraction robot.
   * **[AI Mode](https://docs.maxun.dev/robot/extract/llm-extraction)** - Describe what you want in natural language and let LLM-powered extraction do the rest.
2. **[Scrape](https://docs.maxun.dev/robot/scrape/scrape-robots)** – Convert full webpages into clean Markdown or HTML and capture screenshots. Ideal for AI workflows, agents, and document processing. No code required.
3. **[SDK](https://docs.maxun.dev/sdk/sdk-overview)** – A complete developer toolkit for scraping, extraction, scheduling, and end-to-end data automation.
Whether you prefer browsing through a website or integrating automation into your codebase, Maxun adapts to your workflow.
## How Does It Work?
Maxun uses web robots to power everything you can do on the platform. There are two types of robots, each designed for a different job.
### 1. Extract Robots
**Extract robots emulate real user behavior and capture structured data.**
Choose how to build them:
### a. Recorder Mode: Record your actions as you browse
- Build robots visually by browsing like a human.
- Perfect for structured, deterministic data extraction.
### Example: Extract 10 Property Listings from Airbnb
[https://github.com/user-attachments/assets/recorder-mode-demo-video](https://github.com/user-attachments/assets/c6baa75f-b950-482c-8d26-8a8b6c5382c3)
### b. LLM Extraction (Beta): Describe what you want in plain language
- Use natural language to define extraction patterns.
- Works with closed-source and open-source LLMs.
Get Started with LLM Extraction: https://docs.maxun.dev/robot/extract/llm-extraction
### Example: Extract Names, Rating & Duration of Top 50 Movies from IMDb
https://github.com/user-attachments/assets/f714e860-58d6-44ed-bbcd-c9374b629384
### Core capabilities
- Extract from any website, including behind logins
- Convert sites into APIs, spreadsheets, and workflows
- Scale extractions and run on schedules or via API
- Handle infinite scrolling and pagination
- Auto-adapt to website layout & structural changes
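
To make the pagination bullet concrete, here is a minimal Python sketch of the general pattern: load a page, collect its rows, then follow the next-page pointer until no pages remain or a row limit is reached. This is not Maxun's implementation. `FAKE_SITE`, `fetch_page`, and the `rows`/`next` fields are hypothetical stand-ins for a real website.

```python
from typing import Callable, Optional

# Hypothetical in-memory "site": each page holds rows and a pointer to the next page.
FAKE_SITE = {
    "/listings?page=1": {"rows": ["A", "B"], "next": "/listings?page=2"},
    "/listings?page=2": {"rows": ["C", "D"], "next": "/listings?page=3"},
    "/listings?page=3": {"rows": ["E"], "next": None},
}


def fetch_page(url: str) -> dict:
    """Stand-in for a real page load; returns the rows and the next-page URL."""
    return FAKE_SITE[url]


def collect_rows(
    start_url: str,
    limit: int,
    fetch: Callable[[str], dict] = fetch_page,
) -> list:
    """Follow next-page links until `limit` rows are collected or pages run out."""
    rows: list = []
    url: Optional[str] = start_url
    seen = set()  # guards against pagination loops
    while url is not None and url not in seen and len(rows) < limit:
        seen.add(url)
        page = fetch(url)
        rows.extend(page["rows"])
        url = page["next"]
    return rows[:limit]


print(collect_rows("/listings?page=1", limit=3))
```

The `limit` argument mirrors requests such as "extract 10 listings": the loop stops as soon as enough rows are in hand instead of walking every page.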
### 2. Scrape Robots
**Built for clean content and AI workflows.**
- Get clean HTML and LLM-ready Markdown from any website
- Remove scripts, styling, ads, and clutter automatically
- Perfect for RAG systems, AI summarization, embeddings, and content pipelines
- Ideal for feeding clean data to LLMs
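
As a rough illustration of what "removing scripts and styling" means in practice, the sketch below uses only Python's standard-library `html.parser` to keep visible text and drop the contents of `<script>`, `<style>`, and `<noscript>`. It is a simplified assumption of the idea, not how Maxun produces its output; `TextExtractor` and `clean_text` are hypothetical names, and real cleanup (ads, navigation, Markdown conversion) needs considerably more logic.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text while skipping script/style/noscript contents."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def clean_text(html: str) -> str:
    """Return the visible text of `html`, one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><script>track()</script>"
    "<p>Hello <b>world</b></p></body></html>"
)
print(clean_text(sample))
```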
### Example: Scrape GitHub Trending Repositories in clean Markdown format
https://github.com/user-attachments/assets/c774cbd4-5a85-45b7-b41f-128ee570eae6
## Quick Start
### Getting Started
The simplest & fastest way to get started is to use the hosted version: https://app.maxun.dev. You can self-host if you prefer!
### Installation
Maxun can run locally with or without Docker:
1. [Setup with Docker Compose](https://docs.maxun.dev/installation/docker)
2. [Setup without Docker](https://docs.maxun.dev/installation/local)
3. [Environment Variables](https://docs.maxun.dev/installation/environment_variables)
4. [SDK](https://github.com/getmaxun/node-sdk)
### Upgrading & Self Hosting
1. [Self Host Maxun With Docker & Portainer](https://docs.maxun.dev/self-host)
2. [Upgrade Maxun With Docker Compose Setup](https://docs.maxun.dev/installation/upgrade#upgrading-with-docker-compose)
3. [Upgrade Maxun Without Docker Compose Setup](https://docs.maxun.dev/installation/upgrade#upgrading-with-local-setup)
## Sponsors
<table>
<tr>
<td width="229">
<br/>
<a href="https://www.lambdatest.com/?utm_source=maxun&utm_medium=sponsor" target="_blank">
<img src="https://github.com/user-attachments/assets/904dd40e-0498-47dd-98f1-7fa6d318adb9" /><br/><br/>
<b>LambdaTest</b>
</a>
<br/>
<sub>GenAI-powered Quality Engineering Platform that empowers teams to test intelligently, smarter, and ship faster.</sub>
</td>
</tr>
</table>
## Features
- ✨ **Extract Data With No-Code** – Point and click interface
- ✨ **LLM-Powered Extraction** – Describe what you want; use LLMs to scrape structured data
- ✨ **Developer SDK** – Programmatic extraction, scheduling, and robot management
- ✨ **Handle Pagination & Scrolling** – Automatic navigation
- ✨ **Run Robots On Schedules** – Set it and forget it
- ✨ **Turn Websites to APIs** – RESTful endpoints from any site
- ✨ **Turn Websites to Spreadsheets** – Direct data export to Google Sheets & Airtable
- ✨ **Adapt To Website Layout Changes** – Auto-recovery from site updates
- ✨ **Extract Behind Login** – Handle authentication seamlessly
- ✨ **Integrations** – Connect with your favorite tools
- ✨ **MCP Support** – Model Context Protocol integration
- ✨ **LLM-Ready Data** – Clean Markdown for AI applications
- ✨ **Self-Hostable** – Full control over your infrastructure
- ✨ **Open Source** – Transparent and community-driven
## Use Cases
Maxun supports many use cases, including lead generation, market research, and content aggregation.
View use-cases in detail here: https://www.maxun.dev/#usecases
## Note
This project is in the early stages of development. Your feedback is very important to us, and we're actively working on improvements.
## License
<p>
This project is licensed under <a href="./LICENSE">AGPLv3</a>.
</p>
## Support Us
Star the repository, contribute if you love what we’re building, or [sponsor us](https://github.com/sponsors/amhsirak).
## Contributors
Thank you to everyone who contributes!
<a href="https://github.com/getmaxun/maxun/graphs/contributors">
<img src="https://contrib.rocks/image?repo=getmaxun/maxun" />
</a>
", Assign "at most 3 tags" to the expected json: {"id":"12113","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine 
"},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"