# AutoCodeRover: Autonomous Program Improvement
A project-structure-aware autonomous software engineer aiming for autonomous program improvement. AutoCodeRover resolves 37.3% of tasks (pass@1) on SWE-bench lite and 46.2% of tasks (pass@1) on SWE-bench verified, with each task costing less than $0.7.
<br>
<p align="center">
<img src="https://github.com/nus-apr/auto-code-rover/assets/16000056/8d249b02-1db4-4f58-a5a4-bdb694d65ab1" alt="autocoderover_logo" width="200px" height="200px">
</p>
<p align="center">
<a href="https://arxiv.org/abs/2404.05427"><strong>ArXiv Paper</strong></a>
<a href="https://autocoderover.dev/"><strong>Website</strong></a>
<a href="https://discord.gg/ScXsdE49JY"><strong>Discord server</strong></a>
</p>
<br>
![overall-workflow](https://github.com/nus-apr/auto-code-rover/assets/48704330/0b8da9ad-588c-4f7d-9c99-53f33d723d35)
<br>
> [!NOTE]
> This is a public version of the AutoCodeRover project. Check the latest results on our [website](https://autocoderover.dev/).
## 📣 Updates
- [November 21, 2024] AutoCodeRover(v20240620) achieves **46.20%** efficacy on SWE-bench Verified and **24.89%** on full SWE-bench.
- [August 14, 2024] On the SWE-bench Verified dataset released by OpenAI, AutoCodeRover(v20240620) achieves **38.40%** efficacy, and AutoCodeRover(v20240408) achieves 28.8% efficacy. More details in the [blog post](https://openai.com/index/introducing-swe-bench-verified/) from OpenAI and [SWE-bench leaderboard](https://www.swebench.com/).
- [July 18, 2024] AutoCodeRover now supports a new mode that outputs the list of potential fix locations.
- [June 20, 2024] AutoCodeRover(v20240620) now achieves **30.67%** efficacy (pass@1) on SWE-bench-lite!
- [June 08, 2024] Added support for Gemini, Groq (thank you [KasaiHarcore](https://github.com/KasaiHarcore) for the contribution!) and Anthropic models through AWS Bedrock (thank you [JGalego](https://github.com/JGalego) for the contribution!).
- [April 29, 2024] Added support for Claude and Llama models. Find the list of supported models [here](#using-a-different-model)! Support for more models coming soon.
- [April 19, 2024] AutoCodeRover now supports running on [GitHub issues](#github-issue-mode-set-up-and-run-on-new-github-issues) and [local issues](#local-issue-mode-set-up-and-run-on-local-repositories-and-local-issues)! Feel free to try it out and we welcome your feedback!
## [Discord](https://discord.gg/ScXsdE49JY) server for general discussion, questions, and feedback
## 👋 Overview
AutoCodeRover is a fully automated approach for resolving GitHub issues (bug fixing and feature addition). It combines LLMs with analysis and debugging capabilities to prioritize patch locations, ultimately leading to a patch.
[Update on June 20, 2024] AutoCodeRover(v20240620) now resolves **30.67%** of issues (pass@1) in SWE-bench lite! AutoCodeRover achieved this efficacy while being economical - each task costs **less than $0.7** and is completed within **7 mins**!
<p align="center">
<img src=https://github.com/nus-apr/auto-code-rover/assets/16000056/78d184b2-f15c-4408-9eac-cfd3a11a503a width=500/>
<img src=https://github.com/nus-apr/auto-code-rover/assets/16000056/83253ae9-8789-474e-942d-708495b5b310 width=500/>
</p>
[April 08, 2024] First release of AutoCodeRover(v20240408) resolves **19%** of issues in [SWE-bench lite](https://www.swebench.com/lite.html) (pass@1), improving over the current state-of-the-art efficacy of AI software engineers.
AutoCodeRover works in two stages:
- 🔎 Context retrieval: The LLM is provided with code search APIs to navigate the codebase and collect relevant context.
- 💊 Patch generation: The LLM tries to write a patch, based on retrieved context.
### ✨ Highlights
AutoCodeRover has two unique features:
- Code search APIs are *Program Structure Aware*. Instead of searching over files by plain string matching, AutoCodeRover searches for relevant code context (methods/classes) in the abstract syntax tree (see the first sketch below).
- When a test suite is available, AutoCodeRover can take advantage of test cases to achieve an even higher repair rate by performing *statistical fault localization* (see the second sketch below).
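For illustration, here is a minimal, self-contained sketch of what structure-aware method search can look like, using Python's built-in `ast` module. This is a toy under simplifying assumptions, not AutoCodeRover's actual search implementation:
```
# Toy sketch of AST-based method search (not AutoCodeRover's actual code).
# Matching on AST nodes means strings and comments that merely mention the
# name never produce false positives, unlike plain string matching.
import ast
import os

def search_method(root: str, method_name: str):
    """Yield (file, line, signature) for each function/method named method_name."""
    for dirpath, _, files in os.walk(root):
        for fname in files:
            if not fname.endswith(".py"):
                continue
            path = os.path.join(dirpath, fname)
            try:
                with open(path, encoding="utf-8") as f:
                    tree = ast.parse(f.read())
            except (SyntaxError, UnicodeDecodeError):
                continue
            for node in ast.walk(tree):
                if (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                        and node.name == method_name):
                    # ast.unparse requires Python 3.9+
                    yield path, node.lineno, f"def {node.name}({ast.unparse(node.args)})"

for hit in search_method(".", "parse"):
    print(hit)
```
And a minimal sketch of statistical fault localization using the standard Ochiai formula (again illustrative only; ACR's actual SBFL integration differs):
```
# Toy Ochiai-based fault localization: lines covered mostly by failing
# tests get higher suspiciousness scores.
from math import sqrt

def ochiai(coverage: dict[str, set[int]], failing: set[str]) -> dict[int, float]:
    """Score each covered line given per-test line coverage and failing test names."""
    scores = {}
    for line in set().union(*coverage.values()):
        failed = sum(1 for t, cov in coverage.items() if t in failing and line in cov)
        passed = sum(1 for t, cov in coverage.items() if t not in failing and line in cov)
        denom = sqrt(len(failing) * (failed + passed))
        scores[line] = failed / denom if denom else 0.0
    return scores

# Line 2 is executed only by the failing test, so it ranks highest (1.0).
print(ochiai({"test_a": {1, 2}, "test_b": {1, 3}}, failing={"test_a"}))
```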
## 🗎 arXiv Paper
### AutoCodeRover: Autonomous Program Improvement [[arXiv 2404.05427]](https://arxiv.org/abs/2404.05427)
<p align="center">
<a href="https://arxiv.org/abs/2404.05427">
<img src="https://github.com/nus-apr/auto-code-rover/assets/48704330/c6422951-a6e8-4494-9403-b5ada3d9ee7d" alt="First page of arXiv paper" width="570">
</a>
</p>
To refer to our work, please cite:
```
@inproceedings{zhang2024autocoderover,
author = {Zhang, Yuntong and Ruan, Haifeng and Fan, Zhiyu and Roychoudhury, Abhik},
title = {AutoCodeRover: Autonomous Program Improvement},
year = {2024},
isbn = {9798400706127},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3650212.3680384},
doi = {10.1145/3650212.3680384},
booktitle = {Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis},
pages = {1592–1604},
numpages = {13},
keywords = {automatic program repair, autonomous software engineering, autonomous software improvement, large language model},
location = {Vienna, Austria},
series = {ISSTA 2024}
}
```
## ✔️ Example: Django Issue #32347
As an example, AutoCodeRover successfully fixed issue [#32347](https://code.djangoproject.com/ticket/32347) of Django. See the demo video for the full process:
https://github.com/nus-apr/auto-code-rover/assets/48704330/719c7a56-40b8-4f3d-a90e-0069e37baad3
### Enhancement: leveraging test cases
AutoCodeRover can resolve even more issues, if test cases are available. See an example in the video:
https://github.com/nus-apr/auto-code-rover/assets/48704330/26c9d5d4-04e0-4b98-be55-61c1d10a36e5
## 🚀 Setup & Running
### Setup API key and environment
We recommend running AutoCodeRover in a Docker container.
Set the `OPENAI_KEY` env var to your [OpenAI key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key):
```
export OPENAI_KEY=sk-YOUR-OPENAI-API-KEY-HERE
```
For Anthropic models, set the `ANTHROPIC_API_KEY` env var (see [here](https://docs.anthropic.com/claude/reference/getting-started-with-the-api) for how to obtain a key):
```
export ANTHROPIC_API_KEY=sk-ant-api...
```
Similarly, set `GROQ_API_KEY` when using Groq models:
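```
export GROQ_API_KEY=gsk_...
```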
Build and start the docker image for the AutoCodeRover tool:
```
docker build -f Dockerfile.minimal -t acr .
docker run -it -e OPENAI_KEY="${OPENAI_KEY:-${OPENAI_API_KEY}}" acr
```
### Setup: local mode
Alternatively, you can keep a local copy of AutoCodeRover and manage its Python dependencies with `environment.yml`.
This is the recommended setup for running SWE-bench experiments with AutoCodeRover.
With a working conda installation, do `conda env create -f environment.yml`.
Similarly, set `OPENAI_KEY` or `ANTHROPIC_API_KEY` in your shell before running AutoCodeRover.
## Running AutoCodeRover
You can run AutoCodeRover in three modes:
1. GitHub issue mode: Run ACR on a live GitHub issue by providing a link to the issue page.
2. Local issue mode: Run ACR on a local repository and a file containing the issue description.
3. SWE-bench mode: Run ACR on SWE-bench task instances (local setup of ACR recommended).
### [GitHub issue mode] Set up and run on new GitHub issues
If you want to use AutoCodeRover for new GitHub issues in a project, prepare the following:
- Link to clone the project (used for `git clone ...`).
- Commit hash of the project version for AutoCodeRover to work on (used for `git checkout ...`).
- Link to the GitHub issue page.
Then, in the docker container (or your local copy of AutoCodeRover), run the following commands to set up the target project and generate a patch:
```
cd /opt/auto-code-rover
conda activate auto-code-rover
PYTHONPATH=. python app/main.py github-issue --output-dir output --setup-dir setup --model gpt-4o-2024-05-13 --model-temperature 0.2 --task-id <task id> --clone-link <link for cloning the project> --commit-hash <any version that has the issue> --issue-link <link to issue page>
```
Here is an example command for running ACR on an issue from the langchain GitHub issue tracker:
```
PYTHONPATH=. python app/main.py github-issue --output-dir output --setup-dir setup --model gpt-4o-2024-05-13 --model-temperature 0.2 --task-id langchain-20453 --clone-link https://github.com/langchain-ai/langchain.git --commit-hash cb6e5e5 --issue-link https://github.com/langchain-ai/langchain/issues/20453
```
The `<task id>` can be any string used to identify this issue.
If patch generation is successful, the path to the generated patch will be written to a file named `selected_patch.json` in the output directory.
### [Local issue mode] Set up and run on local repositories and local issues
Instead of cloning a remote project and running ACR on an online issue, you can also prepare the local repository and issue beforehand, if that suits your use case.
To run ACR on a local issue and codebase, prepare the local codebase, write the issue description into a file, and run the following commands:
```
cd /opt/auto-code-rover
conda activate auto-code-rover
PYTHONPATH=. python app/main.py local-issue --output-dir output --model gpt-4o-2024-05-13 --model-temperature 0.2 --task-id <task id> --local-repo <path to the local project repository> --issue-file <path to the file containing issue description>
```
If patch generation is successful, the path to the generated patch will be written to a file named `selected_patch.json` in the output directory.
### [SWE-bench mode] Set up and run on SWE-bench tasks
This mode is for running ACR on existing issue tasks contained in SWE-bench.
#### Set up
##### Install SWE-bench Docker
We use a [fork](https://github.com/nus-apr/SWE-bench-docker) of [SWE-bench docker](https://github.com/aorwall/SWE-bench-docker) to run regression tests (not `FAIL_TO_PASS` tests, but all the tests in the buggy programs). To install this, run
```
conda activate auto-code-rover
git submodule update --init --recursive
cd SWE-bench-docker
pip install .
```
##### Setting up Testbed
For SWE-bench mode, we recommend setting up ACR on a host machine instead of running it in Docker.
First, set up the SWE-bench task instances locally.
1. Clone [this SWE-bench fork](https://github.com/yuntongzhang/SWE-bench) and follow the [installation instruction](https://github.com/yuntongzhang/SWE-bench?tab=readme-ov-file#to-install) to install dependencies.
2. Put the tasks to be run into a file, one per line:
```
cd <SWE-bench-path>
echo django__django-11133 > tasks.txt
```
Or, if running on arm64 (e.g. Apple silicon), try this task instead, which does not depend on Python 3.6 (unsupported on arm64):
```
echo django__django-16041 > tasks.txt
```
3. Set up the tasks in the file by running:
```
cd <SWE-bench-path>
conda activate swe-bench
python harness/run_setup.py --log_dir logs --testbed testbed --result_dir setup_result --subset_file tasks.txt
```
Once the setup for this task is completed, the following two lines will be printed:
```
setup_map is saved to setup_result/setup_map.json
tasks_map is saved to setup_result/tasks_map.json
```
The `testbed` directory will now contain the cloned source code of the target project.
A conda environment will also be created for this task instance.
_If you want to set up multiple tasks together, put multiple ids in `tasks.txt` and follow the same steps._
#### Run a single task in SWE-bench
Before running the task (`django__django-11133` here), make sure it has been set up as described [above](#set-up).
```
cd <AutoCodeRover-path>
conda activate auto-code-rover
PYTHONPATH=. python app/main.py swe-bench --model gpt-4o-2024-05-13 --setup-map <SWE-bench-path>/setup_result/setup_map.json --tasks-map <SWE-bench-path>/setup_result/tasks_map.json --output-dir output --task django__django-11133
```
The output for a run (e.g. for `django__django-11133`) can be found at a location like this: `output/applicable_patch/django__django-11133_yyyy-MM-dd_HH-mm-ss/` (the date-time field in the directory name will be different depending on when the experiment was run).
The path to the final generated patch is written to a file named `selected_patch.json` in the output directory.
#### Run multiple tasks in SWE-bench
First, put the IDs of all tasks to run in a file, one per line. Supposing this file is `tasks.txt`, the tasks can be run with:
```
cd <AutoCodeRover-path>
conda activate auto-code-rover
PYTHONPATH=. python app/main.py swe-bench --model gpt-4o-2024-05-13 --setup-map <SWE-bench-path>/setup_result/setup_map.json --tasks-map <SWE-bench-path>/setup_result/tasks_map.json --output-dir output --task-list-file <SWE-bench-path>/tasks.txt
```
**NOTE**: make sure that the tasks in `tasks.txt` have all been set up in SWE-bench. See the steps [above](#set-up).
#### Using a config file
Alternatively, a config file can be used to specify all parameters and tasks to run. See `conf/example.conf` for an example.
Also see [EXPERIMENT.md](EXPERIMENT.md) for the details of the items in a conf file.
To run with a config file:
```
python scripts/run.py conf/example.conf
```
### Using a different model
AutoCodeRover works with different foundation models. You can set the foundation model to be used with the `--model` command line argument.
The current list of supported models:
| Provider | Model | AutoCodeRover command-line argument |
|:--------------:|---------------|--------------|
| OpenAI | gpt-4o-2024-08-06 | --model gpt-4o-2024-08-06 |
| | gpt-4o-2024-05-13 | --model gpt-4o-2024-05-13 |
| | gpt-4-turbo-2024-04-09 | --model gpt-4-turbo-2024-04-09 |
| | gpt-4-0125-preview | --model gpt-4-0125-preview |
| | gpt-4-1106-preview | --model gpt-4-1106-preview |
| | gpt-3.5-turbo-0125 | --model gpt-3.5-turbo-0125 |
| | gpt-3.5-turbo-1106 | --model gpt-3.5-turbo-1106 |
| | gpt-3.5-turbo-16k-0613 | --model gpt-3.5-turbo-16k-0613 |
| | gpt-3.5-turbo-0613 | --model gpt-3.5-turbo-0613 |
| | gpt-4-0613 | --model gpt-4-0613 |
| Anthropic | Claude 3.5 Sonnet | --model claude-3-5-sonnet-20240620 |
| | Claude 3 Opus | --model claude-3-opus-20240229 |
| | Claude 3 Sonnet | --model claude-3-sonnet-20240229 |
| | Claude 3 Haiku | --model claude-3-haiku-20240307 |
| Meta | Llama 3 70B | --model llama3:70b |
| | Llama 3 8B | --model llama3 |
| AWS Bedrock | Claude 3 Opus | --model bedrock/anthropic.claude-3-opus-20240229-v1:0 |
| | Claude 3 Sonnet | --model bedrock/anthropic.claude-3-sonnet-20240229-v1:0 |
| | Claude 3 Haiku | --model bedrock/anthropic.claude-3-haiku-20240307-v1:0 |
| | Claude 3.5 Sonnet | --model bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0 |
| | Nova Pro | --model bedrock/us.amazon.nova-pro-v1:0 |
| | Nova Lite | --model bedrock/us.amazon.nova-lite-v1:0 |
| | Nova Micro | --model bedrock/us.amazon.nova-micro-v1:0 |
| LiteLLM | Any LiteLLM model | --model litellm-generic-<MODEL_NAME_HERE> |
| Groq | Llama 3 8B | --model groq/llama3-8b-8192 |
| | Llama 3 70B | --model groq/llama3-70b-8192 |
| | Llama 2 70B | --model groq/llama2-70b-4096 |
| | Mixtral 8x7B | --model groq/mixtral-8x7b-32768 |
| | Gemma 7B | --model groq/gemma-7b-it |
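Any of the commands shown earlier accept these values through `--model`. For example, a hypothetical re-run of the earlier langchain example on Claude 3 Opus through AWS Bedrock (assuming your AWS credentials are already configured in the environment) would be:
```
PYTHONPATH=. python app/main.py github-issue --output-dir output --setup-dir setup --model bedrock/anthropic.claude-3-opus-20240229-v1:0 --model-temperature 0.2 --task-id langchain-20453 --clone-link https://github.com/langchain-ai/langchain.git --commit-hash cb6e5e5 --issue-link https://github.com/langchain-ai/langchain/issues/20453
```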
> [!NOTE]
> Using the Groq models on a free plan can cause the context limit to be exceeded, even on simple issues.
> [!NOTE]
> Some notes on running ACR with local models such as llama3:
> 1. Before using the llama3 models, please [install ollama](https://ollama.com/download/linux) and download the corresponding models with ollama (e.g. `ollama pull llama3`).
> 2. You can run the ollama server on the host machine and ACR in its container. ACR will attempt to communicate with the ollama server on the host.
> 3. If your setup is ollama in host + ACR in its container, we recommend installing [Docker Desktop](https://docs.docker.com/desktop/) on the host, in addition to the [Docker Engine](https://docs.docker.com/engine/).
> - Docker Desktop contains Docker Engine, and also has a virtual machine which makes it easier to access the host ports from within a container. With Docker Desktop, this setup will work without additional effort.
> - When the docker installation is only Docker Engine, you may need to add either `--net=host` or `--add-host host.docker.internal=host-gateway` to the `docker run` command when starting the ACR container, so that ACR can communicate with the ollama server on the host machine.
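For example, with only Docker Engine installed, the `docker run` command from the setup section might become the following (a sketch based on the notes above):
```
docker run -it --add-host host.docker.internal=host-gateway -e OPENAI_KEY="${OPENAI_KEY}" acr
```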
If you encounter any issue with the tool or the experiments, you can contact us via email at [email protected], or through our [Discord server](https://discord.com/invite/ScXsdE49JY).
## Experiment Replication
Please refer to [EXPERIMENT.md](EXPERIMENT.md) for information on experiment replication.
## ✉️ Contacts
For any queries, you are welcome to open an issue.
Alternatively, contact us at: {[yuntong](https://yuntongzhang.github.io/),[hruan](https://www.linkedin.com/in/haifeng-ruan-701a731a4/),[zhiyufan](https://zhiyufan.github.io/)}@comp.nus.edu.sg.
## Acknowledgements
This work was partially supported by a Singapore Ministry of Education (MoE) Tier 3 grant "Automated Program Repair", MOE-MOET32021-0001.
", Assign "at most 3 tags" to the expected json: {"id":"9227","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"