
Comments (4)

ShanleiMu commented on August 15, 2024

Please refer to #614.

from recbole.

chenyushuo commented on August 15, 2024

We always do data augmentation in the sequential model. You can see the following code for details:

def prepare_data_augmentation(self):
    """Augmentation processing for sequential dataset.

    E.g., ``u1`` has purchase sequence ``<i1, i2, i3, i4>``,
    then after augmentation, we will generate three cases.

    ``u1, <i1> | i2``

    (Which means given user_id ``u1`` and item_seq ``<i1>``,
    we need to predict the next item ``i2``.)

    The other cases are below:

    ``u1, <i1, i2> | i3``

    ``u1, <i1, i2, i3> | i4``

    Returns:
        Tuple of ``self.uid_list``, ``self.item_list_index``,
        ``self.target_index``, ``self.item_list_length``.
        See :class:`SequentialDataset`'s attributes for details.

    Note:
        Actually, we do not really generate these new item sequences.
        One user's item sequence is stored only once in memory.
        We store the index (slice) of each item sequence after augmentation,
        which saves memory and accelerates a lot.
    """
    self.logger.debug('prepare_data_augmentation')
    if hasattr(self, 'uid_list'):
        return self.uid_list, self.item_list_index, self.target_index, self.item_list_length

    self._check_field('uid_field', 'time_field')
    max_item_list_len = self.config['MAX_ITEM_LIST_LENGTH']
    self.sort(by=[self.uid_field, self.time_field], ascending=True)
    last_uid = None
    uid_list, item_list_index, target_index, item_list_length = [], [], [], []
    seq_start = 0
    for i, uid in enumerate(self.inter_feat[self.uid_field].values):
        if last_uid != uid:
            last_uid = uid
            seq_start = i
        else:
            if i - seq_start > max_item_list_len:
                seq_start += 1
            uid_list.append(uid)
            item_list_index.append(slice(seq_start, i))
            target_index.append(i)
            item_list_length.append(i - seq_start)

    self.uid_list = np.array(uid_list)
    self.item_list_index = np.array(item_list_index)
    self.target_index = np.array(target_index)
    self.item_list_length = np.array(item_list_length)
    return self.uid_list, self.item_list_index, self.target_index, self.item_list_length
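
To make the slicing logic above easier to follow, here is a minimal standalone sketch of the same loop on toy data. It is not RecBole code: the function name augment_sequences and the sample users and items are made up for illustration, and it uses plain Python lists instead of the dataset's inter_feat and NumPy arrays. It assumes the interactions are already sorted by user and then by time.

```python
def augment_sequences(user_ids, max_item_list_len):
    """Return (uid_list, item_list_index, target_index, item_list_length).

    `user_ids` must already be sorted by user, then by time.
    Hypothetical helper that mirrors the loop quoted above.
    """
    uid_list, item_list_index, target_index, item_list_length = [], [], [], []
    last_uid = None
    seq_start = 0
    for i, uid in enumerate(user_ids):
        if last_uid != uid:
            # First interaction of a new user: no history yet, so no sample.
            last_uid = uid
            seq_start = i
        else:
            # Slide the window forward once the history exceeds the cap.
            if i - seq_start > max_item_list_len:
                seq_start += 1
            uid_list.append(uid)
            item_list_index.append(slice(seq_start, i))  # history rows
            target_index.append(i)                       # row to predict
            item_list_length.append(i - seq_start)
    return uid_list, item_list_index, target_index, item_list_length


# Toy interaction log (made-up data), one row per interaction.
users = ["u1", "u1", "u1", "u1", "u2", "u2"]
items = ["i1", "i2", "i3", "i4", "i5", "i6"]

uids, seq_slices, targets, lengths = augment_sequences(users, max_item_list_len=50)
samples = [(u, items[s], items[t]) for u, s, t in zip(uids, seq_slices, targets)]
for uid, history, target in samples:
    print(uid, history, "|", target)
```

Each printed row is one training case: u1 yields three cases and u2 yields one, and only slices into the single stored sequence are kept, as the docstring's note describes.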


mayaKaplansky commented on August 15, 2024

That's perfect, thank you so much!!


cramraj8 commented on August 15, 2024

@chenyushuo

When I run run_recbole for GRU4Rec in the sequential setting, data augmentation never runs. I added print statements to the scripts to check whether the data_augmentation function is called (self.data_augmentation()), but it never is.

As a result, I am getting the error below:

  File "/home/xxx/xxx/RecBole/RecBole/recbole/model/sequential_recommender/gru4rec.py", line 84, in forward
    seq_output = self.gather_indexes(gru_output, item_seq_len - 1)
  File "/home/xxx/xxx/RecBole/RecBole/recbole/model/abstract_recommender.py", line 174, in gather_indexes
    output_tensor = output.gather(dim=1, index=gather_index)
RuntimeError: index 122111 is out of bounds for dimension 1 with size 53

Any ideas how I can solve this, or am I missing a config setting?
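
Reading the traceback: gather_indexes receives item_seq_len - 1 and fails on dimension 1 of size 53, so at least one value arriving as a sequence length is 122112, far larger than the padded sequence width. That would be consistent with the length field holding something other than augmented sequence lengths, though nothing in this thread confirms the cause. A minimal, framework-free sketch of a sanity check follows; find_bad_lengths and the sample batch are hypothetical and not part of RecBole.

```python
def find_bad_lengths(item_seq_len, max_seq_len):
    """Return (row, length) pairs whose length falls outside 1..max_seq_len."""
    return [
        (row, n)
        for row, n in enumerate(item_seq_len)
        if not 1 <= n <= max_seq_len
    ]


# Hypothetical batch: the second value mimics the one implied by the traceback
# (index 122111 = length 122112 - 1, against a padded width of 53).
batch_lengths = [3, 122112, 53, 0]
bad = find_bad_lengths(batch_lengths, max_seq_len=53)
print(bad)
```

Running a check like this on the lengths just before the model's forward pass would show whether the out-of-range value comes from the data pipeline rather than from the model itself.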

