
hkuds / llmrec


[WSDM'2024 Oral] "LLMRec: Large Language Models with Graph Augmentation for Recommendation"

Home Page: https://arxiv.org/abs/2311.00423

Python 100.00%
content-based-recommendation data-augmentation-strategies graph-augmentation recommendation-system recommendation-with-side-information multi-modal-recommendation colloborative-filtering graph-learning


LLMRec: Large Language Models with Graph Augmentation for Recommendation

PyTorch implementation for the WSDM 2024 paper LLMRec: Large Language Models with Graph Augmentation for Recommendation.

Wei Wei, Xubin Ren, Jiabin Tang, Qinyong Wang, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin and Chao Huang*. (*Correspondence)

Data Intelligence Lab@University of Hong Kong, Baidu Inc.


This repository hosts the code, original data and augmented data of LLMRec.


LLMRec

LLMRec is a novel framework that enhances recommender systems by applying three simple yet effective LLM-based graph augmentation strategies. It makes the most of the content available on online platforms (e.g., Netflix, MovieLens) to augment the interaction graph by i) reinforcing user-item (u-i) interaction edges, ii) enhancing item node attributes, and iii) conducting user node profiling, all from a natural language perspective.


🎉 News 📢📢

  • [2024.3.20] 🚀🚀 📢📢📢📢🌹🔥🔥🚀🚀 Because baselines LATTICE and MMSSL require some minor modifications, we provide code that can be easily run by simply modifying the dataset path.

  • [2023.11.3] 🚀🚀 Release the script for constructing the prompt.

  • [2023.11.1] 🔥🔥 Release the multi-modal datasets (Netflix, MovieLens), including textual data and visual data.

  • [2023.11.1] 🚀🚀 Release LLM-augmented textual data (by gpt-3.5-turbo-0613), and LLM-augmented embedding (by text-embedding-ada-002).

  • [2023.10.28] 🔥🔥 The full paper of our LLMRec is available at LLMRec: Large Language Models with Graph Augmentation for Recommendation.

  • [2023.10.28] 🚀🚀 Release the code of LLMRec.

👉 TODO

  • Provide larger versions of the datasets.
  • ...

Dependencies

pip install -r requirements.txt

Usage

Stage 1: LLM-based Data Augmentation

cd LLMRec/LLM_augmentation/
python ./gpt_ui_aug.py
python ./gpt_user_profiling.py
python ./gpt_i_attribute_generate_aug.py

Stage 2: Recommender training with LLM-augmented Data

cd LLMRec/
python ./main.py --dataset {DATASET}

Supported datasets: netflix, movielens

Specific code execution example on 'netflix':

# LLMRec
python ./main.py

# w/o-u-i
python ./main.py --aug_sample_rate=0.0

# w/o-u
python ./main.py --user_cat_rate=0

# w/o-u&i
python ./main.py --user_cat_rate=0  --item_cat_rate=0

# w/o-prune
python ./main.py --prune_loss_drop_rate=0
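
The variants above differ only in the flags passed to main.py. The following is a minimal sketch, assuming the flag names shown above, of how the command lines could be assembled from one script. The helper build_command and the VARIANTS table are hypothetical and are not part of the repository.

```python
import shlex

# Flag overrides for each variant listed above; an empty dict is the full model.
VARIANTS = {
    "LLMRec": {},
    "w/o-u-i": {"aug_sample_rate": 0.0},
    "w/o-u": {"user_cat_rate": 0},
    "w/o-u&i": {"user_cat_rate": 0, "item_cat_rate": 0},
    "w/o-prune": {"prune_loss_drop_rate": 0},
}


def build_command(variant, dataset="netflix"):
    """Return the argv list for one variant (hypothetical helper)."""
    cmd = ["python", "./main.py", "--dataset", dataset]
    for flag, value in VARIANTS[variant].items():
        cmd.append(f"--{flag}={value}")
    return cmd


if __name__ == "__main__":
    for name in VARIANTS:
        # Only print the commands; to launch one, pass the list to
        # subprocess.run(cmd, cwd="LLMRec", check=True).
        print(f"{name:10s}", shlex.join(build_command(name)))
```

Printing rather than executing keeps the sketch safe to run anywhere; each list can be handed to subprocess.run unchanged once the data is in place.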

Datasets

├─ LLMRec/
    ├── data/
      ├── netflix/
      ...

Multi-modal Datasets

🌹🌹 Please cite our paper if you use the 'netflix' dataset~ ❀️

We collected a multi-modal dataset using the original Netflix Prize Data released on the Kaggle website. The data format is directly compatible with state-of-the-art multi-modal recommendation models like LLMRec, MMSSL, LATTICE, MICRO, and others, without requiring any additional data preprocessing.

Textual Modality: We have released the item information curated from the original dataset in the "item_attribute.csv" file. Additionally, we have incorporated textual information enhanced by LLM into the "augmented_item_attribute_agg.csv" file. (The following three images represent (1) information about Netflix as described on the Kaggle website, (2) textual information from the original Netflix Prize Data, and (3) textual information augmented by LLMs.)

[Images: Kaggle description of the Netflix data; original textual information; LLM-augmented textual information]

Visual Modality: We have released the visual information obtained from web crawling in the "Netflix_Posters" folder. (The following image displays the poster acquired by web crawling using item information from the Netflix Prize Data.)

[Image: poster acquired by web crawling]

Original Multi-modal Datasets & Augmented Datasets

[Image: original multi-modal datasets and augmented datasets]

Download the Netflix dataset.

🚀🚀 We provide the processed data (i.e., CF training data & basic user-item interactions, original multi-modal data including images and text of items, encoded visual/textual features and LLM-augmented text/embeddings). 🌹 We hope to contribute to our community and facilitate your research 🚀🚀 ~

Encoding the Multi-modal Content.

We use CLIP-ViT and Sentence-BERT as encoders for the visual and textual side information, respectively.


Prompt & Completion Example

LLM-based Implicit Feedback Augmentation

Prompt

Recommend user with movies based on user history that each movie with title, year, genre. History: [332] Heart and Souls (1993), Comedy|Fantasy [364] Men with Brooms(2002), Comedy|Drama|Romance Candidate: [121]The Vampire Lovers (1970), Horror [155] Billabong Odyssey (2003),Documentary [248]The Invisible Guest 2016, Crime, Drama, Mystery Output index of user's favorite and dislike movie from candidate.Please just give the index in [].

Completion

248 121
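
The prompt and completion above suggest a simple build-and-parse loop. The following is a minimal sketch, not the repository's actual code: the function names are hypothetical, and it assumes the first returned index is the liked movie and the second is the disliked one, following the order requested in the prompt.

```python
import re


def format_movie(index, title, year, genres):
    """Render one movie as '[index] title (year), genre|genre'."""
    return f"[{index}] {title} ({year}), {'|'.join(genres)}"


def build_ui_prompt(history, candidates):
    """Assemble the implicit-feedback prompt from (index, title, year, genres) tuples."""
    history_text = " ".join(format_movie(*m) for m in history)
    candidate_text = " ".join(format_movie(*m) for m in candidates)
    return (
        "Recommend user with movies based on user history that each movie "
        "with title, year, genre. "
        f"History: {history_text} "
        f"Candidate: {candidate_text} "
        "Output index of user's favorite and dislike movie from candidate."
        "Please just give the index in []."
    )


def parse_ui_completion(completion, candidate_ids):
    """Return (liked, disliked) or None when the reply is not usable."""
    ids = [int(tok) for tok in re.findall(r"\d+", completion)]
    # Reject replies that do not contain exactly two distinct candidate indices.
    if len(ids) != 2 or ids[0] == ids[1]:
        return None
    if any(i not in candidate_ids for i in ids):
        return None
    return ids[0], ids[1]


if __name__ == "__main__":
    history = [(332, "Heart and Souls", 1993, ["Comedy", "Fantasy"])]
    candidates = [(121, "The Vampire Lovers", 1970, ["Horror"]),
                  (248, "The Invisible Guest", 2016, ["Crime", "Drama", "Mystery"])]
    print(build_ui_prompt(history, candidates))
    print(parse_ui_completion("248 121", {121, 248}))
```

Returning None for malformed replies lets the caller retry or skip the user instead of adding a noisy edge to the graph.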

LLM-based User Profile Augmentation

Prompt

Generate user profile based on the history of user, that each movie with title, year, genre. History: [332] Heart and Souls (1993), Comedy|Fantasy [364] Men with Brooms (2002), Comedy|Drama|Romance Please output the following infomation of user, output format: {age: , gender: , liked genre: , disliked genre: , liked directors: , country: , language: }

Completion

{age: 50, gender: female, liked genre: Comedy|Fantasy, Comedy|Drama|Romance, disliked genre: Thriller, Horror, liked directors: Ron Underwood, country: Canada, United States, language: English}
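
The completion above is not valid JSON: keys are unquoted and values contain commas. The following is a minimal sketch of one way to turn it into a dictionary, assuming the seven keys requested in the prompt always appear by name. The function name parse_user_profile is hypothetical and not taken from the repository.

```python
import re

# Keys requested in the prompt's output format.
PROFILE_KEYS = ["age", "gender", "liked genre", "disliked genre",
                "liked directors", "country", "language"]

# The lookbehind stops 'liked genre' from matching inside 'disliked genre'
# and 'age' from matching inside 'language'.
_KEY_PATTERN = re.compile(
    r"(?<![A-Za-z])(" + "|".join(re.escape(k) for k in PROFILE_KEYS) + r")\s*:"
)


def parse_user_profile(completion):
    """Split the completion at each known key and keep the text in between."""
    text = completion.strip().strip("{}")
    matches = list(_KEY_PATTERN.finditer(text))
    profile = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(text)
        value = text[current.end():end].strip().strip(",").strip()
        profile[current.group(1)] = value
    return profile


if __name__ == "__main__":
    reply = ("{age: 50, gender: female, liked genre: Comedy|Fantasy, "
             "Comedy|Drama|Romance, disliked genre: Thriller, Horror, "
             "liked directors: Ron Underwood, country: Canada, United States, "
             "language: English}")
    for key, value in parse_user_profile(reply).items():
        print(f"{key}: {value}")
```

Splitting on key names rather than on commas keeps multi-valued fields such as country intact.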

LLM-based Item Attributes Augmentation

Prompt

Provide the inquired information of the given movie. [332] Heart and Souls (1993), Comedy|Fantasy The inquired information is: director, country, language. And please output them in form of: director, country, language

Completion

Ron Underwood, USA, English
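
A reply in this comma-separated form can be mapped back onto the three requested fields. The following is a minimal sketch with a hypothetical helper name. It assumes each field holds a single value, so a reply that lists several countries separated by commas is rejected rather than guessed at.

```python
# Field order requested in the prompt.
ITEM_FIELDS = ("director", "country", "language")


def parse_item_attributes(completion):
    """Return a field dict, or None if the reply does not have exactly three parts."""
    parts = [p.strip() for p in completion.split(",")]
    if len(parts) != len(ITEM_FIELDS) or not all(parts):
        return None
    return dict(zip(ITEM_FIELDS, parts))


if __name__ == "__main__":
    print(parse_item_attributes("Ron Underwood, USA, English"))
```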

Augmented Data

Augmented Implicit Feedback (Edge)

For each user, the entry at 0 is the positive sample and the entry at 1 is the negative sample.

[Image: example of augmented implicit feedback]
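
The following is a minimal sketch of how such augmented feedback could be mixed into training. It assumes the data is a dictionary mapping each user to {0: positive item, 1: negative item}, and that aug_sample_rate (the flag name from the Usage section) is the fraction of the batch size drawn from the augmented edges. Both are assumptions; the stored layout and the sampling logic in the repository may differ.

```python
import random

# Toy stand-in for the augmented feedback: user -> {0: positive, 1: negative}.
augmented_sample_dict = {
    0: {0: 248, 1: 121},
    1: {0: 155, 1: 248},
    2: {0: 121, 1: 155},
}


def sample_augmented_triples(aug_dict, batch_size, aug_sample_rate, seed=0):
    """Draw (user, positive item, negative item) triples from the augmented edges."""
    rng = random.Random(seed)
    n = min(int(batch_size * aug_sample_rate), len(aug_dict))
    users = rng.sample(sorted(aug_dict), n)
    return [(u, aug_dict[u][0], aug_dict[u][1]) for u in users]


if __name__ == "__main__":
    # With aug_sample_rate=0.0 no augmented triples are added (the w/o-u-i variant).
    print(sample_augmented_triples(augmented_sample_dict, batch_size=4, aug_sample_rate=0.5))
    print(sample_augmented_triples(augmented_sample_dict, batch_size=4, aug_sample_rate=0.0))
```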

Augmented User Profile (User Node)

For each user, the dictionary stores augmented information such as 'age', 'gender', 'liked genre', 'disliked genre', 'liked directors', 'country', and 'language'.

[Image: example of an augmented user profile]

Augmented Item Attribute (Item Node)

For each item, the dictionary stores augmented information such as 'director', 'country', and 'language'.

[Image: example of augmented item attributes]

Candidate Preparation for LLM-based Implicit Feedback Augmentation

Step 1: Select a base model such as MMSSL or LATTICE.

Step 2: Obtain the user embeddings and item embeddings.

Step 3: Generate the candidates.

      # Keep the 10 highest-scoring items per user, then save the indices on the CPU.
      _, candidate_indices = torch.topk(torch.mm(G_ua_embeddings, G_ia_embeddings.T), k=10)
      with open('./data/' + args.datasets + '/candidate_indices', 'wb') as f:
          pickle.dump(candidate_indices.cpu(), f)

Example of specific candidate data.

In [3]: candidate_indices
Out[3]: 
tensor([[ 9765,  2930,  6646,  ..., 11513, 12747, 13503],
        [ 3665,  8999,  2587,  ...,  1559,  2975,  3759],
        [ 2266,  8999,  1559,  ...,  8639,   465,  8287],
        ...,
        [11905, 10195,  8063,  ..., 12945, 12568, 10428],
        [ 9063,  6736,  6938,  ...,  5526, 12747, 11110],
        [ 9584,  4163,  4154,  ...,  2266,   543,  7610]])

In [4]: candidate_indices.shape
Out[4]: torch.Size([13187, 10])
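
To make the candidate step concrete without a trained base model, the following is a dependency-free sketch of what the torch.topk call computes: the k items with the highest dot-product score for each user. The toy embeddings and the helper name topk_candidates are hypothetical. The real pipeline uses the base model's embeddings and stores a tensor rather than a list.

```python
import heapq
import os
import pickle
import tempfile


def topk_candidates(user_embeddings, item_embeddings, k):
    """For each user, return the indices of the k items with the highest dot product."""
    candidates = []
    for user in user_embeddings:
        scores = [sum(a * b for a, b in zip(user, item)) for item in item_embeddings]
        top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
        candidates.append(top)
    return candidates


if __name__ == "__main__":
    users = [[1.0, 0.0], [0.0, 1.0]]
    items = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    candidate_indices = topk_candidates(users, items, k=2)
    print(candidate_indices)  # [[0, 2], [1, 2]]

    # Round-trip through pickle, mirroring how the file is written and later read.
    with tempfile.TemporaryDirectory() as data_dir:
        path = os.path.join(data_dir, "candidate_indices")
        with open(path, "wb") as f:
            pickle.dump(candidate_indices, f)
        with open(path, "rb") as f:
            assert pickle.load(f) == candidate_indices
```

The result has one row per user and k columns, which matches the [13187, 10] shape shown above for k=10.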

Citing

If you find this work helpful to your research, please kindly consider citing our paper.

@article{wei2023llmrec,
  title={LLMRec: Large Language Models with Graph Augmentation for Recommendation},
  author={Wei, Wei and Ren, Xubin and Tang, Jiabin and Wang, Qinyong and Su, Lixin and Cheng, Suqi and Wang, Junfeng and Yin, Dawei and Huang, Chao},
  journal={arXiv preprint arXiv:2311.00423},
  year={2023}
}

Acknowledgement

The structure of this code is largely based on MMSSL, LATTICE, and MICRO. We thank the authors for their work.


llmrec's Issues

No such file or directory: 'candidate_indices'

Hi there,

when I run python gpt_ui_aug.py, the error is:

Traceback (most recent call last):
File "/cling/xcosdaem495/LLMRec/LLM_augmentation_construct_prompt/gpt_ui_aug.py", line 86, in
candidate_indices = pickle.load(open(file_path + 'candidate_indices','rb'))
FileNotFoundError: [Errno 2] No such file or directory: 'candidate_indices'

Can you please help?

how to install packages in requirements.txt?

hi!
when I perform:
pip install -r requirements.txt
it always causes errors like:
ERROR: Could not find a version that satisfies the requirement anaconda-client==1.11.0 (from versions: 1.1.1, 1.2.2)
ERROR: No matching distribution found for anaconda-client==1.11.0
and many other packages are not the correct version as well.
How can I solve it?

Which python version

Hi, thanks for this great work.

I was trying to set up the virtual env, and when I run "pip install -r requirements.txt", I get version-related errors.
For example -
ERROR: Could not find a version that satisfies the requirement anaconda-client==1.11.0 (from versions: 1.1.1, 1.2.2)
ERROR: No matching distribution found for anaconda-client==1.11.0

I'm using Python 3.10.13 with Ubuntu 22.04.3 LTS.

I was wondering if there is any specific Python version requirement. I would appreciate it if anyone could suggest how to resolve these errors.

Best Regards
Raj

Inquiry about gpt_i_attribute_generate_aug.py Script Usage in LLMRec

I hope this message finds you well. I am reaching out regarding the gpt_i_attribute_generate_aug.py script included in the LLMRec that I found on GitHub. According to the README file, this script is intended to be executed as part of the project's workflow. However, upon inspecting the script, it appears to mainly consist of function definitions without a clear entry point or executable code block.

Could you kindly provide additional guidance or an updated version of the script that illustrates how to properly execute it or utilize these functions within the broader context of the project?

Your assistance in this matter would be greatly appreciated, as it would significantly enhance my understanding and usage of your valuable work.

Thank you for your time and contributions to this project. I look forward to your response.

Movielens dataset

Hi! Your code was using the movielens dataset. And it failed when using the netflix dataset😭. Could you please share the processed movielens data? Thanks in advance! 🙏🙏

requirements.txt contains '@file' directive.

Hi,
Thanks for sharing your wonderful work.
I'm interested in your work, and want to reproduce it. But there are some '@file' directives in the requirements.txt file. Could you please fix it?

Thank you.

Could not install packages

Hi, when I perform:

pip install -r requirements.txt

Error comes out as:
Processing /home/ktietz/src/ci/alabaster_1611921544520/work (from -r requirements.txt (line 6))
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/home/ktietz/src/ci/alabaster_1611921544520/work'

requirements.txt refers to some package paths that don't exist on my local PC.

Error in the first step when running "python3 gpt_ui_aug.py"

Step1:
run
python3 gpt_ui_aug.py

error message:

Traceback (most recent call last):
File "/Users/~/Documents/dev/ai/LLMRec-main/LLM_augmentation_construct_prompt/gpt_ui_aug.py", line 85, in
candidate_indices = pickle.load(open(file_path + 'candidate_indices','rb'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'candidate_indices'

About the number of interactions in the data set

Great job! Starred!

However, the number of interactions in the Netflix and Movielens-10m datasets is larger than in the datasets you use. How did you filter the data?

Looking forward to your answer! Thanks!

Failed to establish a new connection

There are two errors when running the file gpt_user_profiling.py:

  1. urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x145b449d0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known

  2. urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='llms-se.baidu-int.com', port=8200): Max retries exceeded with url: /chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x145b449d0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))

Do you know how to solve these problems? 🙏🙏 Thank you!
