I may point out that the current bottleneck is we cannot easily incorporate th


Not easy to incorporate the customized dataloader and sampler

rucaibox commented on July 17, 2024


from recbole.

Comments (4)

rowedenny commented on July 17, 2024

Hi @hyp1231, I read through the 0.1.x version again and noticed that padding is currently done in the dataset rather than in the dataloader. That could become a problem when the batch data has more than two dimensions, for example a batch of comments (a 3D tensor) or a batch of images (a 4D tensor).

Would it therefore be more flexible to implement the padding in the dataloader, so that the function dict_to_interaction only converts the NumPy arrays to torch tensors?
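To make the suggestion concrete, here is a minimal sketch of batch-level padding for samples of arbitrary dimension. It is not RecBole code: `pad_collate` and the sample data are hypothetical names, and it only assumes that every sample in a batch has the same number of dimensions. A function of this shape could be passed as `collate_fn` to a `torch.utils.data.DataLoader`, with the conversion to a torch tensor happening afterwards.

```python
import numpy as np

def pad_collate(samples, pad_value=0):
    """Pad arrays that share ndim but differ in shape into one batch array.

    Hypothetical helper for illustration; not part of RecBole.
    """
    ndim = samples[0].ndim
    if any(s.ndim != ndim for s in samples):
        raise ValueError("all samples must have the same number of dimensions")
    # Target shape: the largest extent seen along every axis.
    max_shape = tuple(max(s.shape[axis] for s in samples) for axis in range(ndim))
    batch = np.full((len(samples),) + max_shape, pad_value, dtype=samples[0].dtype)
    for i, s in enumerate(samples):
        # Copy each sample into the leading corner of its slot; the rest stays padded.
        region = (i,) + tuple(slice(0, n) for n in s.shape)
        batch[region] = s
    return batch

# Two samples, each a set of comments with different counts and token lengths.
comments = [
    np.array([[1, 2, 3], [4, 5, 0]]),      # 2 comments, 3 tokens each
    np.array([[6, 7], [8, 9], [10, 11]]),  # 3 comments, 2 tokens each
]
batch = pad_collate(comments)
print(batch.shape)  # (2, 3, 3)
```

Because the padding is computed per batch, each batch is only as large as its longest sample, and the same function works for 3D or 4D batches without changes.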


hyp1231 commented on July 17, 2024

Hi @rowedenny, thanks for the kind suggestions, and sorry for the delayed reply. It's an interesting question, and our development team has discussed it thoroughly.

First, regarding higher-dimensional tensors: as a trade-off between uniformity and flexibility, it may be a good idea to store them as SEQ-like features (e.g. token_seq or float_seq) in the atomic files (they are actually flattened at present). After being fed into the model, these tensors can be reshaped manually into 3D or 4D according to hyperparameters. This way we still have only four feature types, yet we can support higher-dimensional tensor inputs.
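As a rough illustration of that idea, the sketch below restores a flattened sequence feature to 3D inside the model. The names `MAX_COMMENTS`, `MAX_TOKENS` and `unflatten` are assumptions made for this example, not RecBole identifiers, and NumPy is used only to keep it self-contained; the same `reshape` call applies to a torch tensor.

```python
import numpy as np

# Hypothetical hyperparameters; in practice they would come from the model config.
MAX_COMMENTS = 2
MAX_TOKENS = 3

def unflatten(flat_batch, max_comments=MAX_COMMENTS, max_tokens=MAX_TOKENS):
    """Reshape (batch, max_comments * max_tokens) into (batch, max_comments, max_tokens)."""
    batch_size, flat_len = flat_batch.shape
    if flat_len != max_comments * max_tokens:
        raise ValueError(
            f"expected flattened length {max_comments * max_tokens}, got {flat_len}"
        )
    return flat_batch.reshape(batch_size, max_comments, max_tokens)

# A batch of 2 samples, each stored as a flat sequence of 6 values.
flat = np.arange(12).reshape(2, 6)
restored = unflatten(flat)
print(restored.shape)  # (2, 2, 3)
```

The cost of this approach is that every sample must already be padded to the same flattened length, so the padding sizes become fixed hyperparameters rather than per-batch values.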

In addition, we found that the time bottleneck in the 0.1.x branch lies in the conversion from pandas.DataFrame to Interaction in the DataLoader. @chenyushuo has therefore refactored the code, and in the 0.2.x branch these conversions are done in the Dataset, which speeds things up considerably. However, this makes it much more difficult to implement the padding within the DataLoader.

By the way, we have just opened Discussions, and you are welcome to try it out! :D


rowedenny commented on July 17, 2024

I recently read through the implementation of version 0.2 and found that model.type is a key factor in how the pipeline decides which dataset, dataloader and sampler to use. So I am wondering whether we could register an enum value named customized, so that users can freely implement their own dataset, dataloader and sampler?
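A minimal sketch of what such a registration could look like is given below. Everything in it is hypothetical: the `ModelType` enum here is a stand-in defined for the example, not the library's own, and `register_components` / `resolve_components` are invented names that only illustrate the dispatch idea.

```python
from enum import Enum

class ModelType(Enum):
    """Stand-in enum for illustration; not the library's own definition."""
    GENERAL = 1
    SEQUENTIAL = 2
    CUSTOMIZED = 3  # the proposed extra value

# Maps a model type to its (dataset, dataloader, sampler) classes.
_COMPONENTS = {}

def register_components(model_type, dataset_cls, dataloader_cls, sampler_cls):
    """Associate user-defined classes with a model type."""
    _COMPONENTS[model_type] = (dataset_cls, dataloader_cls, sampler_cls)

def resolve_components(model_type):
    """Look up the classes the pipeline should instantiate for a model type."""
    try:
        return _COMPONENTS[model_type]
    except KeyError:
        raise ValueError(f"no components registered for {model_type}") from None

# Placeholder user classes.
class MyDataset: ...
class MyDataLoader: ...
class MySampler: ...

register_components(ModelType.CUSTOMIZED, MyDataset, MyDataLoader, MySampler)
dataset_cls, dataloader_cls, sampler_cls = resolve_components(ModelType.CUSTOMIZED)
print(dataset_cls.__name__)  # MyDataset
```

With a lookup like this, the pipeline would not need a hard-coded branch per model type; it would only ask the registry which classes to build.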


2017pxy commented on July 17, 2024

@rowedenny Hi, thanks for your advice; we will consider it carefully. By the way, we have released a new version (v1.0.0) in which we rebuilt the dataloader. You can read our latest code for more details.

