Comments (13)

kaixuanliu commented on June 29, 2024
image

from graphlearn-for-pytorch.

kaixuanliu commented on June 29, 2024

This may be because the igbh-large dataset has two additional node types ('conference' and 'journal') that do not exist in igbh-tiny/small/medium, and we do not process them in dataset.py. I will try to fix it.

yao-matrix commented on June 29, 2024

@LiSu

LiSu commented on June 29, 2024

@kaixuanliu How was the data partitioned? Partitioning the dataset separately on each of the four nodes can cause this problem, because the partitioning process involves randomness. If the dataset was partitioned independently on each node in your experiment, try partitioning it on one node and copying the partitioned data to the rest.
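
To illustrate why independently partitioned copies can disagree, here is a minimal sketch. It assumes a purely random node-to-partition assignment; `random_partition` is a hypothetical helper written for this illustration and is not the partitioner used by graphlearn-for-pytorch.

```python
import random

def random_partition(num_nodes, num_parts, seed=None):
    """Hypothetical helper: assign each node id to a random partition."""
    rng = random.Random(seed)
    return [rng.randrange(num_parts) for _ in range(num_nodes)]

# Two runs that share a seed produce the same assignment.
run_a = random_partition(1000, 4, seed=0)
run_b = random_partition(1000, 4, seed=0)

# Two unseeded runs, as if each machine partitioned on its own,
# almost surely disagree about which partition owns which node.
run_c = random_partition(1000, 4)
run_d = random_partition(1000, 4)

print("seeded runs agree:", run_a == run_b)
print("independent runs agree:", run_c == run_d)
```

If the machines disagree on node ownership, partition book lookups on one machine no longer match the data held by the others, which is why partitioning once and sharing the result is the safer setup.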

kaixuanliu commented on June 29, 2024

I use NFS and just partition the dataset once.

LiSu commented on June 29, 2024

> I use NFS and just partition the dataset once.

I see, using NFS should be fine.

But journal and conference nodes and relevant edges are covered for the large and full datasets in dataset.py.

I will try to reproduce this problem.

kaixuanliu commented on June 29, 2024

> But journal and conference nodes and relevant edges are covered for the large and full datasets in dataset.py.

Yes, I checked this; that part is OK. I have also root-caused the bug. Here is the problem: when a partition has no neighbors for the input seeds, the sampled neighbor output falls back to the input seeds themselves. In distributed training we then need the partition book of the sampled output, so here we look up the dst node partition book using src node global IDs, which causes an index-out-of-bounds error.
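
The failure described above can be reproduced with a toy model. This is a sketch under stated assumptions: plain lists stand in for tensors, and the partition books, sizes, and the `lookup_partitions` helper are hypothetical, not taken from graphlearn-for-pytorch.

```python
# Toy model of the failure; all names and sizes here are hypothetical.
# A partition book maps a node's global id to the partition that owns it.
paper_partition_book = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # 10 'paper' nodes (src type)
journal_partition_book = [0, 1, 1]                      # 3 'journal' nodes (dst type)

def lookup_partitions(partition_book, node_ids):
    """Return the owning partition for each global node id."""
    return [partition_book[i] for i in node_ids]

input_seeds = [7, 9]  # global ids of src-type ('paper') seeds

# Old fallback: no neighbors were found locally, so the seeds are
# returned as if they were the sampled (dst-type) neighbors.
sampled_nbrs = input_seeds

try:
    # The dst-type book is indexed with src-type ids.
    lookup_partitions(journal_partition_book, sampled_nbrs)
    failed = False
except IndexError:
    failed = True  # id 7 does not exist among the 3 journal nodes

print("index out of bounds:", failed)
```

The same ids are valid in the src-type book, which is why the problem only surfaces for edge types whose dst node type has fewer nodes than the seed ids reach.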

husimplicity commented on June 29, 2024

Thanks for your feedback. I agree this is the problem. Will seek a solution.

kaixuanliu commented on June 29, 2024

It seems DGL handles this kind of situation with a different approach: dgl reference

husimplicity commented on June 29, 2024

Yes, we are considering using an empty tensor when sampling nothing.

husimplicity commented on June 29, 2024

It seems that just using empty tensors fixes this, and no other modification is necessary in my environment. Would you like to try it first? I will push it after the holiday if there are no further problems.
Here

    if nbrs.numel() == 0:
      # Previous behavior: fall back to the input seeds as their own neighbors.
      # nbrs, nbrs_num = input_seeds, torch.ones_like(input_seeds)
      # if self.with_edge:
      #   edge_ids = -1 * nbrs_num
      # New behavior: return empty tensors when nothing was sampled.
      nbrs = torch.tensor([], dtype=torch.int64, device=self.device)
      nbrs_num = torch.zeros_like(input_seeds, dtype=torch.int64, device=self.device)
      edge_ids = torch.tensor([], dtype=torch.int64, device=self.device) if self.with_edge else None

And before Here, add:

  if output.nbr.size(0) > 0:
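
The two changes above can be sketched together in a toy model. This is only an illustration under assumptions: plain lists stand in for tensors, and `sample_one_hop`, `partitions_of`, and the partition book are hypothetical names, not the library's code.

```python
# Toy model of the fix using plain lists in place of tensors.
# All names and sizes here are hypothetical.
journal_partition_book = [0, 1, 1]  # 3 dst-type nodes

def sample_one_hop(adjacency, input_seeds):
    """Return (nbrs, nbrs_num): flattened neighbors and per-seed counts."""
    nbrs, nbrs_num = [], []
    for seed in input_seeds:
        found = adjacency.get(seed, [])
        nbrs.extend(found)
        nbrs_num.append(len(found))
    # With the fix, "no neighbors" yields an empty result and all-zero
    # counts instead of echoing input_seeds back as neighbors.
    return nbrs, nbrs_num

def partitions_of(nbrs):
    """Look up owning partitions, skipping the lookup for empty output."""
    # Mirrors the guard `if output.nbr.size(0) > 0` from the comment above.
    if len(nbrs) > 0:
        return [journal_partition_book[i] for i in nbrs]
    return []

local_adjacency = {}  # this partition holds no edges for the seeds
nbrs, nbrs_num = sample_one_hop(local_adjacency, [7, 9])
print(nbrs, nbrs_num, partitions_of(nbrs))
```

Because the empty result carries no src-type ids, the dst-type partition book is never indexed with out-of-range values.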

kaixuanliu commented on June 29, 2024

One more minor change is needed. I have verified it for 2 epochs on the igbh-large dataset and submitted a PR. FYI.

husimplicity commented on June 29, 2024

Closed by #49
