# Train RL agents to play Pokemon Red

### New 10-19-24! Updated & Simplified V2 Training Script - See V2 below

### New 1-29-24! - [Multiplayer Live Training Broadcast](https://github.com/pwhiddy/pokerl-map-viz/) 🎦 🔴 [View Here](https://pwhiddy.github.io/pokerl-map-viz/)

Stream your training session to a shared global game map using the [Broadcast Wrapper](/baselines/stream_agent_wrapper.py). See how in the [Training Broadcast](#training-broadcast) section.

## Watch the Video on Youtube!

<p float="left">
  <a href="https://youtu.be/DcYLT37ImBY">
    <img src="/assets/youtube.jpg?raw=true" height="192">
  </a>
  <a href="https://youtu.be/DcYLT37ImBY">
    <img src="/assets/poke_map.gif?raw=true" height="192">
  </a>
</p>

## Join the discord server

[![Join the Discord server!](https://invidget.switchblade.xyz/RvadteZk4G)](http://discord.gg/RvadteZk4G)

## Running the Pretrained Model Interactively 🎮

🐍 Python 3.10+ is recommended. Other versions may work but have not been tested.
You also need to install ffmpeg and have it available on the command line.

### Windows Setup

Refer to this [Windows Setup Guide](windows-setup-guide.md).

### For AMD GPUs

Follow this [guide to install pytorch with ROCm support](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/wsl/howto_wsl.html).

### Linux / MacOS

V2 is now recommended over the original version. You may follow all the steps below, but replace `baselines` with `v2`.

1. Copy your legally obtained Pokemon Red ROM into the base directory. You can find this using Google; it should be 1MB. Rename it to `PokemonRed.gb` if it is not already. The sha1 sum should be `ea9bcae617fdf159b045185467ae58b2e4a48b9a`, which you can verify by running `shasum PokemonRed.gb`.
2. Move into the `baselines/` directory:
```cd baselines```
3. Install dependencies:
```pip install -r requirements.txt```
It may be necessary in some cases to separately install the SDL libraries.
For V2, MacOS users should use ```macos_requirements.txt``` instead of ```requirements.txt```.
4. Run:
```python run_pretrained_interactive.py```
Interact with the emulator using the arrow keys and the `a` and `s` keys (A and B buttons).
You can pause the AI's input during the game by editing `agent_enabled.txt`.

Note: the `PokemonRed.gb` file MUST be in the main directory and your current directory MUST be the `baselines/` directory in order for this to work.

## Training the Model 🏋️

<img src="/assets/grid.png?raw=true" height="156">

### V2

- Trains faster and with less memory
- Reaches Cerulean
- Streams to map by default
- Other improvements

Replaces the frame KNN with a coordinate-based exploration reward, as well as some other tweaks.

1. Follow the previous steps, but in the `v2` directory instead of `baselines`.
2. Run:
```python baseline_fast_v2.py```

## Tracking Training Progress 📈

### Training Broadcast

Stream your training session to a shared global game map using the [Broadcast Wrapper](/baselines/stream_agent_wrapper.py) on your environment like this:

```python
env = StreamWrapper(
    env,
    stream_metadata = { # All of this is optional
        "user": "super-cool-user", # choose your own username
        "env_id": id, # environment identifier
        "color": "#0033ff", # choose your color :)
        "extra": "", # any extra text you put here will be displayed
    }
)
```

Hack on the broadcast viewing client, or set up your own local stream with this repo:
https://github.com/pwhiddy/pokerl-map-viz/

### Local Metrics

The current state of each game is rendered to images in the session directory.
You can track the progress in tensorboard by moving into the session directory and running:
```tensorboard --logdir .```
You can then navigate to `localhost:6006` in your browser to view metrics.
To enable wandb integration, change `use_wandb_logging` in the training script to `True`.

## Static Visualization 🐜

Map visualization code can be found in the `visualization/` directory.
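The coordinate-based exploration reward used by V2 can be sketched in a few lines: grant a small bonus the first time the agent stands on each map tile, and nothing on revisits. This is an illustrative sketch only; the class name and bonus value here are made up and do not reflect the project's actual implementation.

```python
class CoordExplorationReward:
    """Illustrative coordinate-based exploration reward:
    pay a bonus the first time each (map_id, x, y) tile is visited."""

    def __init__(self, bonus=0.04):  # bonus magnitude is a hypothetical choice
        self.bonus = bonus
        self.seen = set()  # set of visited (map_id, x, y) tuples

    def step(self, map_id, x, y):
        coord = (map_id, x, y)
        if coord not in self.seen:
            self.seen.add(coord)  # novel tile: record it and pay the bonus
            return self.bonus
        return 0.0  # already visited: no reward


reward_fn = CoordExplorationReward()
r1 = reward_fn.step(1, 5, 5)  # novel tile, pays the bonus
r2 = reward_fn.step(1, 5, 5)  # revisit, pays nothing
r3 = reward_fn.step(1, 6, 5)  # new tile, pays the bonus again
```

Compared with the frame-KNN novelty measure it replaces, a coordinate set is cheap to store and query, which is consistent with V2 training faster and using less memory.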
## Follow up work

Check out our follow up projects & papers!

### [Pokemon Red via Reinforcement Learning 🔗](https://arxiv.org/abs/2502.19920)

```
@misc{pleines2025pokemon,
      title={Pokemon Red via Reinforcement Learning},
      author={Marco Pleines and Daniel Addis and David Rubinstein and Frank Zimmer and Mike Preuss and Peter Whidden},
      year={2025},
      eprint={2502.19920},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```

### [Pokemon RL Edition 🔗](https://drubinstein.github.io/pokerl/)

### [PokeGym 🔗](https://github.com/PufferAI/pokegym)

## Supporting Libraries

Check out these awesome projects!

### [PyBoy](https://github.com/Baekalfen/PyBoy)

<a href="https://github.com/Baekalfen/PyBoy">
  <img src="/assets/pyboy.svg" height="64">
</a>

### [Stable Baselines 3](https://github.com/DLR-RM/stable-baselines3)

<a href="https://github.com/DLR-RM/stable-baselines3">
  <img src="/assets/sblogo.png" height="64">
</a>