
Comments (4)

rowedenny commented on July 17, 2024

Hi @hyp1231, I read through the 0.1.X version again. I noticed that the padding is currently done within the Dataset instead of the DataLoader. This could become a problem when the batch data has more than two dimensions, say a batch of comments (a 3D tensor) or a batch of images (a 4D tensor).

Would it therefore be more flexible to implement the padding within the DataLoader, so that the function dict_to_interaction only converts the NumPy arrays to torch tensors?
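To illustrate the suggestion, here is a minimal sketch, assuming NumPy, of a rank-agnostic padding function that could run at batch time. The name pad_collate is hypothetical and not part of RecBole; it pads every axis to the largest size found in the batch, so a batch of 2D comment matrices yields a 3D array and a batch of 3D images yields a 4D array.

```python
import numpy as np


def pad_collate(samples, pad_value=0):
    """Hypothetical collate function: pad same-rank arrays to a common shape and stack them.

    Works for any rank, so padding logic does not depend on whether samples
    are sequences, matrices (e.g. comments) or image tensors.
    """
    arrays = [np.asarray(s) for s in samples]
    ndim = arrays[0].ndim
    if any(a.ndim != ndim for a in arrays):
        raise ValueError("all samples must have the same number of dimensions")
    # Target shape: the maximum size along every axis across the batch.
    target = tuple(max(a.shape[d] for a in arrays) for d in range(ndim))
    batch = np.full((len(arrays),) + target, pad_value, dtype=np.result_type(*arrays))
    for i, a in enumerate(arrays):
        # Copy each sample into the leading corner of its padded slot.
        batch[(i,) + tuple(slice(0, n) for n in a.shape)] = a
    return batch
```

In PyTorch, a function of this shape could be passed as the collate_fn argument of torch.utils.data.DataLoader, with the result wrapped by torch.from_numpy; the sketch stays in NumPy to remain self-contained.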


hyp1231 commented on July 17, 2024

Hi @rowedenny, thanks for the kind suggestions, and sorry for the delayed reply. It's an interesting question, and our development team has discussed this issue thoroughly.

Firstly, regarding higher-dimensional tensors: to strike a balance between uniformity and flexibility, it may be a good idea to store them as SEQ-like features (e.g. token_seq or float_seq) in the atomic files (in fact, they are currently flattened). After being fed into the model, these tensors can be reshaped into 3D or 4D manually according to the hyperparameters. In this way, we still have only four feature types, yet we can support higher-dimensional tensor inputs.
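A minimal sketch of the flatten-then-reshape idea, assuming every sample shares one fixed layout known from hyperparameters. The names flatten_feature, restore_feature, num_comments and comment_len are hypothetical illustrations, not RecBole configuration keys or functions.

```python
import numpy as np

# Hypothetical hyperparameters describing the original (unflattened) layout.
config = {"num_comments": 2, "comment_len": 3}


def flatten_feature(tensor):
    """Store a higher-dimensional feature as a flat, float_seq-like vector."""
    return np.asarray(tensor, dtype=np.float32).reshape(-1)


def restore_feature(batch_seq, config):
    """Reshape a batch of flat sequences to (batch, num_comments, comment_len)."""
    batch_seq = np.asarray(batch_seq)
    return batch_seq.reshape(
        batch_seq.shape[0], config["num_comments"], config["comment_len"]
    )
```

Because the reshape relies only on sizes supplied as hyperparameters, the feature-type system does not need a dedicated 3D or 4D type; the cost is that samples must already share the same fixed size.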

What's more, we found that the main time bottleneck lies in the conversion from pandas.DataFrame to Interaction in the DataLoader of the 0.1.x branch. Thus, @chenyushuo has refactored this part, and in branch 0.2.x these conversions are now done in the Dataset, which speeds things up considerably. However, this makes it much more difficult to implement the padding within the DataLoader.
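The following is a simplified, hypothetical illustration (not RecBole's actual Dataset code) of converting columns once up front and then serving batches as cheap slices. PreconvertedDataset is an invented name; frame may be a pandas.DataFrame or any mapping from column name to a 1-D sequence of equal length.

```python
import numpy as np


class PreconvertedDataset:
    """Hypothetical sketch: convert columnar data to NumPy arrays once, then slice per batch."""

    def __init__(self, frame):
        # One up-front conversion per column, instead of one conversion per batch.
        self.columns = {name: np.asarray(frame[name]) for name in frame}
        lengths = {len(col) for col in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.length = lengths.pop() if lengths else 0

    def __len__(self):
        return self.length

    def batches(self, batch_size):
        # Each batch is a slice (a view) of the pre-converted arrays.
        for start in range(0, self.length, batch_size):
            stop = start + batch_size
            yield {name: col[start:stop] for name, col in self.columns.items()}
```

The sketch also hints at why batch-time padding becomes awkward: since the arrays are built once, variable-length sequence columns have to be padded to a common length before np.asarray can produce a regular 2-D array.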

By the way, we have just opened Discussions for the repository; feel free to try it out! :D


rowedenny commented on July 17, 2024

I recently read through the implementation in version 0.2 and found that model.type is a key factor in how the pipeline decides which dataset, dataloader and sampler to use. So I am wondering: could we register an enum member named customized, so that users can freely implement their own dataset, dataloader and sampler?
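A minimal sketch of the proposed extension point, assuming a simple registry keyed by model type. ModelType, register, build_components and the My* classes are hypothetical stand-ins; RecBole's real enum members and dispatch logic are not reproduced here.

```python
from enum import Enum


class ModelType(Enum):
    """Hypothetical stand-in for a model-type enum with a CUSTOMIZED member."""
    GENERAL = 1
    SEQUENTIAL = 2
    CUSTOMIZED = 3


# Maps each model type to the (dataset, dataloader, sampler) classes to use.
_REGISTRY = {}


def register(model_type, dataset_cls, dataloader_cls, sampler_cls):
    """Associate a model type with the component classes the pipeline should build."""
    _REGISTRY[model_type] = (dataset_cls, dataloader_cls, sampler_cls)


def build_components(model_type):
    """Look up the classes registered for a model type."""
    try:
        return _REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"no components registered for {model_type}") from None


# A user plugs in their own components for CUSTOMIZED.
class MyDataset: ...
class MyDataLoader: ...
class MySampler: ...


register(ModelType.CUSTOMIZED, MyDataset, MyDataLoader, MySampler)
```

With a registry like this, built-in types and the customized type would go through the same lookup, so the pipeline would not need special cases for user-defined components.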


2017pxy commented on July 17, 2024

@rowedenny Hi, thanks for your advice; we will consider it carefully. BTW, we have released a new version (v1.0.0) in which we rebuilt the DataLoader; you can read our latest code for more details.

