# RLMRec: Representation Learning with Large Language Models for Recommendation
<img src='RLMRec_cover.png' />
This is the PyTorch implementation by <a href='https://github.com/Re-bin'>@Re-bin</a> for RLMRec model proposed in this [paper](https://arxiv.org/abs/2310.15950):
>**Representation Learning with Large Language Models for Recommendation**
>Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, Chao Huang*\
>*WWW2024*
\* denotes corresponding author
<p align="center">
<img src="RLMRec.png" alt="RLMRec" />
</p>
In this paper, we propose a model-agnostic framework **RLMRec** that enhances existing recommenders with LLM-empowered representation learning. It proposes a paradigm that integrates representation learning with LLMs to capture intricate semantic aspects of user behaviors and preferences. RLMRec incorporates auxiliary textual signals, develops a user/item profiling paradigm empowered by LLMs, and aligns the semantic space of LLMs with the representation space of collaborative relational signals through a cross-view alignment framework.
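The cross-view alignment idea can be illustrated with a small InfoNCE-style sketch (this is an illustrative approximation in NumPy, not the actual implementation in this repo): each user's collaborative embedding is pulled toward its own semantic embedding while other in-batch pairs serve as negatives.

```python
import numpy as np

def info_nce_alignment(collab_emb, semantic_emb, temperature=0.1):
    """InfoNCE-style loss aligning two embedding views.

    Row i of `collab_emb` (collaborative view) is treated as the positive
    pair of row i of `semantic_emb` (LLM semantic view); all other rows in
    the batch act as negatives. A hedged sketch, not the repo's exact loss.
    """
    # L2-normalize both views so dot products become cosine similarities
    c = collab_emb / np.linalg.norm(collab_emb, axis=1, keepdims=True)
    s = semantic_emb / np.linalg.norm(semantic_emb, axis=1, keepdims=True)
    logits = c @ s.T / temperature  # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal (matching pairs) as the target class
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
collab = rng.normal(size=(8, 16))                      # toy collaborative embeddings
semantic = collab + 0.01 * rng.normal(size=(8, 16))    # nearly aligned semantic view
loss = info_nce_alignment(collab, semantic)
print(float(loss))  # small, since the two views are almost identical
```

When the two views agree, the diagonal dominates each row of the similarity matrix and the loss approaches zero; misaligned views push it toward log(batch_size).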
## Environment
You can run the following command to clone the repository faster:
```bash
git clone --depth 1 https://github.com/HKUDS/RLMRec.git
```
Then run the following commands to create a conda environment:
```bash
conda create -y -n rlmrec python=3.9
conda activate rlmrec
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.13.1+cu116.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.13.1+cu116.html
pip install pyyaml tqdm
```
The code is developed based on the [SSLRec](https://github.com/HKUDS/SSLRec) framework.
## Text-attributed Recommendation Dataset
We utilized three public datasets to evaluate RLMRec: *Amazon-book, Yelp,* and *Steam*.
Each user and item has a generated text description.
First, please **download the data** by running the following commands.
```bash
cd data/
wget https://archive.org/download/rlmrec_data/data.zip
unzip data.zip
```
You can also download our data from the [[Google Drive](https://drive.google.com/file/d/1PzePFsBcYofG1MV2FisFLBM2lMytbMdW/view?usp=sharing)].
Each dataset consists of a training set, a validation set, and a test set. During the training process, we utilize the validation set to determine when to stop the training in order to prevent overfitting.
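The validation-based stopping described above can be sketched as a simple patience rule (the thresholds and function below are hypothetical; the actual criterion lives in the SSLRec training loop):

```python
def should_stop(val_scores, patience=3):
    """Return True once the validation metric stops improving.

    `val_scores` is the per-epoch history of a higher-is-better metric
    (e.g. Recall@20). Stops when the best score of the last `patience`
    epochs does not exceed the best score seen before them.
    A hypothetical sketch of validation-based early stopping.
    """
    if len(val_scores) <= patience:
        return False
    best_before = max(val_scores[:-patience])
    recent_best = max(val_scores[-patience:])
    return recent_best <= best_before

history = [0.05, 0.08, 0.10, 0.10, 0.09, 0.09]
print(should_stop(history))  # validation Recall has plateaued
```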
```
- amazon(yelp/steam)
|--- trn_mat.pkl # training set (sparse matrix)
|--- val_mat.pkl # validation set (sparse matrix)
|--- tst_mat.pkl # test set (sparse matrix)
|--- usr_prf.pkl # text description of users
|--- itm_prf.pkl # text description of items
|--- usr_emb_np.pkl # user text embeddings
|--- itm_emb_np.pkl # item text embeddings
```
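A minimal sketch of how these pickle files can be loaded. To stay self-contained it writes toy stand-ins to a temporary directory; the embedding file holds a `(num_users, dim)` array as described above, while the exact schema of the profile file is an assumption here (inspect the real files, or use `data/read_profile.py`).

```python
import os
import pickle
import tempfile
import numpy as np

# Toy stand-ins for two of the dataset files (the real ones ship with the data):
# usr_emb_np.pkl -> (num_users, dim) array of user text embeddings
# usr_prf.pkl    -> text profiles keyed by user ID (schema assumed here)
tmp = tempfile.mkdtemp()
toy_emb = np.random.default_rng(0).random((4, 8), dtype=np.float64)
toy_prf = {0: {"profile": "enjoys historical fiction"}}

with open(os.path.join(tmp, "usr_emb_np.pkl"), "wb") as f:
    pickle.dump(toy_emb, f)
with open(os.path.join(tmp, "usr_prf.pkl"), "wb") as f:
    pickle.dump(toy_prf, f)

# Loading mirrors what a training script would do with the real files
with open(os.path.join(tmp, "usr_emb_np.pkl"), "rb") as f:
    usr_emb = pickle.load(f)
with open(os.path.join(tmp, "usr_prf.pkl"), "rb") as f:
    usr_prf = pickle.load(f)

print(usr_emb.shape)              # one row of text embeddings per user
print(usr_prf[0]["profile"])
```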
### User/Item Profile
- Each profile is a **high-quality text description** of a user/item.
- Both user and item profiles are generated by **Large Language Models** from raw text data.
- The `user profile` (in `usr_prf.pkl`) shows the particular types of items that the user tends to prefer.
- The `item profile` (in `itm_prf.pkl`) articulates the specific types of users that the item is apt to attract.
You can run `python data/read_profile.py` as an example of reading the profiles:
```
$ python data/read_profile.py
User 123's Profile:
PROFILE: Based on the kinds of books the user has purchased and reviewed, they are likely to enjoy historical
fiction with strong character development, exploration of family dynamics, and thought-provoking themes. The user
also seems to enjoy slower-paced plots that delve deep into various perspectives. Books with unexpected twists,
connections between unrelated characters, and beautifully descriptive language could also be a good fit for
this reader.
REASONING: The user has purchased several historical fiction novels such as 'Prayers for Sale' and 'Fall of
Giants' which indicate an interest in exploring the past. Furthermore, the books they have reviewed, like 'Help
for the Haunted' and 'The Leftovers,' involve complex family relationships. Additionally, the user appreciates
thought-provoking themes and character-driven narratives as shown in their review of 'The Signature of All
Things' and 'The Leftovers.' The user also enjoys descriptive language, as demonstrated in their review of
'Prayers for Sale.'
```
### Semantic Representation
- Each user and item has a semantic embedding encoded from its own profile using **Text Embedding Models**.
- The encoded semantic embeddings are stored in `usr_emb_np.pkl` and `itm_emb_np.pkl`.
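Because each user and item lives in the same semantic space, one simple sanity check is ranking items by cosine similarity to a user's embedding. The sketch below uses random toy vectors in place of the real `usr_emb_np.pkl` / `itm_emb_np.pkl` contents:

```python
import numpy as np

def cosine_sim(vec, mat):
    """Cosine similarity between vector `vec` and each row of matrix `mat`."""
    vec = vec / np.linalg.norm(vec)
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    return mat @ vec

rng = np.random.default_rng(42)
user_vec = rng.normal(size=32)        # stand-in for one row of usr_emb_np.pkl
item_mat = rng.normal(size=(5, 32))   # stand-in for itm_emb_np.pkl
item_mat[2] = user_vec                # plant a perfectly aligned item

scores = cosine_sim(user_vec, item_mat)
print(int(np.argmax(scores)))         # the planted item ranks first
```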
### Mapping to Original Data
The original data of our dataset can be found at the following links (thanks to their work):
- Yelp: https://www.yelp.com/dataset
- Amazon-book: https://cseweb.ucsd.edu/~jmcauley/datasets/amazon/links.html
- Steam: https://github.com/kang205/SASRec
We provide the **mapping dictionary** in JSON format in the `data/mapper` folder to map the `user/item ID` in our processed data to the `original identification` in original data (e.g., asin for items in Amazon-book).
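Using a mapper file amounts to a plain JSON lookup. The sketch below writes a toy mapper inline; the key names and the example asin values are assumptions, so check the real files in `data/mapper` for the actual schema.

```python
import json
import os
import tempfile

# Toy stand-in for a mapper file: processed item ID -> original identifier
# (e.g. an Amazon-book asin). Keys and values here are made-up examples.
mapper = {"0": "ASIN_EXAMPLE_A", "1": "ASIN_EXAMPLE_B"}

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "item_mapper.json")
with open(path, "w") as f:
    json.dump(mapper, f)

# Loading mirrors how one would read a real file from data/mapper
with open(path) as f:
    id_to_orig = json.load(f)

print(id_to_orig["0"])  # original identifier for processed item 0
```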
You are welcome to use our processed data to support your research!
## Examples to Run the Code
The commands to train and evaluate the backbone models and RLMRec are as follows.
- Backbone
```python encoder/train_encoder.py --model {model_name} --dataset {dataset} --cuda 0```
- RLMRec-Con **(Contrastive Alignment)**:
```python encoder/train_encoder.py --model {model_name}_plus --dataset {dataset} --cuda 0```
- RLMRec-Gen **(Generative Alignment)**:
```python encoder/train_encoder.py --model {model_name}_gene --dataset {dataset} --cuda 0```
Supported models/datasets:
* model_name: `gccf`, `lightgcn`, `sgl`, `simgcl`, `dccf`, `autocf`
* dataset: `amazon`, `yelp`, `steam`
Hyperparameters:
* The hyperparameters of each model are stored in `encoder/config/modelconf` (obtained by grid-search).
**For advanced usage of arguments, run the code with --help argument.**
## Profile Generation and Semantic Representation Encoding
Here we provide some examples with the *Yelp* data to generate user/item profiles and semantic representations.
First, complete the following three steps.
- Install the openai library `pip install openai`
- Prepare your **OpenAI API Key**
- Enter your key on `Line 5` of these files: `generation/{item/user/emb}/generate_{profile/emb}.py`.
Then, run the following commands to generate each output:
- **Item Profile Generation**:
```python generation/item/generate_profile.py```
- **User Profile Generation**:
```python generation/user/generate_profile.py```
- **Semantic Representation**:
```python generation/emb/generate_emb.py```
For semantic representation encoding, you can also try other text embedding models like [Instructor](https://github.com/xlang-ai/instructor-embedding) or [Contriever](https://github.com/facebookresearch/contriever).
The **instructions** we designed are saved in the `{user/item}_system_prompt.txt` files and in the `generation/instruction` folder. You can modify them according to your requirements to generate the desired output!
## Citation
If you find this work helpful to your research, please consider citing our paper:
```bibtex
@inproceedings{ren2024representation,
title={Representation learning with large language models for recommendation},
author={Ren, Xubin and Wei, Wei and Xia, Lianghao and Su, Lixin and Cheng, Suqi and Wang, Junfeng and Yin, Dawei and Huang, Chao},
booktitle={Proceedings of the ACM on Web Conference 2024},
pages={3464--3475},
year={2024}
}
```
**Thanks for your interest in our work!**