

How to understand “dropping all structural information” in GAT (CLOSED)

petarv- commented on September 18, 2024

how to understand “dropping all structural information”


Comments (3)

PetarV- commented on September 18, 2024

Hi Yejun,

Thank you for the issue, and your kind interest in GAT!

The phrase "without depending on the graph structure upfront" refers to the training/testing routine. Namely, GAT is an inductive method: the mechanism it learns is, in principle, not conditioned on the graph it was trained on. This means that, at test time, you can apply GAT to any structure you'd like (including ones unseen at training time). This is in stark contrast to many earlier methods, which were transductive and would not, in theory, work outside of the graph they were trained on.
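A minimal sketch of what "inductive" means in practice, assuming a single attention head written in plain Python. The function `gat_layer`, the toy parameters `W` and `a`, and both example graphs are hypothetical illustrations and are not taken from the GAT repository; the output nonlinearity is omitted for brevity. The same parameters are applied, unchanged, to two graphs with different sizes and structure:

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(features, neighbours, W, a):
    """Single-head GAT-style layer (illustrative sketch).

    features:   list of per-node feature vectors, each of length F
    neighbours: neighbours[i] lists the nodes that node i attends over
                (assumed to include i itself, so it is never empty)
    W:          F' x F weight matrix shared by all nodes
    a:          attention vector of length 2 * F'
    """
    # Shared linear transform: z_i = W h_i.
    z = [[sum(w * x for w, x in zip(row, h)) for row in W] for h in features]
    out = []
    for i, nbrs in enumerate(neighbours):
        # e_ij = LeakyReLU(a^T [z_i || z_j]), only for j in the neighbourhood of i.
        e = [leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j]))) for j in nbrs]
        # Softmax over the neighbourhood (shifted by the max for stability).
        m = max(e)
        weights = [math.exp(v - m) for v in e]
        total = sum(weights)
        alpha = [v / total for v in weights]
        # Output for node i: attention-weighted sum of transformed neighbours.
        out.append([sum(al * z[j][k] for al, j in zip(alpha, nbrs))
                    for k in range(len(W))])
    return out

# Toy parameters standing in for "learned" weights (F = 2, F' = 2).
W = [[0.5, -0.2], [0.1, 0.3]]
a = [0.4, -0.1, 0.2, 0.3]

# "Training" graph: a triangle on 3 nodes (with self-loops).
train_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_nbrs = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]

# "Test" graph: an unseen path on 4 nodes (with self-loops).
test_x = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0], [2.0, 0.0]]
test_nbrs = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]

out_train = gat_layer(train_x, train_nbrs, W, a)
out_test = gat_layer(test_x, test_nbrs, W, a)  # same W and a, different graph
```

Nothing in `W` or `a` depends on the number of nodes or on a particular adjacency matrix, which is why the second call works without retraining.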

Regarding the second phrase, I believe you've misread the paper a little bit. From what I recall, the phrase appears here:

"In its most general formulation, the model allows every node to attend on every other node, dropping all structural information"

This is the formulation before masked attention is introduced (i.e. we just do all-pairs self-attention as in the Transformer paper). Indeed, in this version the graph is not used at all. Afterwards we introduce the neighbourhoods, and the graph structure is injected.

So, to confirm, the GAT model does not drop all structural information. It uses the local adjacency information of every node to determine which other nodes to attend over. That being said, it only needs the local information (i.e. a node does not need to know anything about a node that is outside of its neighbourhood).
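The difference between the two formulations can be sketched as follows. This is an illustrative example under stated assumptions, not code from the repository: `attention_coefficients` is a hypothetical helper, the raw scores are made-up numbers standing in for the unnormalised attention scores, and every node is assumed to have a self-loop so that its neighbourhood is never empty.

```python
import math

def attention_coefficients(scores, adjacency=None):
    """Row-wise softmax over raw scores scores[i][j].

    adjacency=None -> all-pairs attention: every node attends to every node,
                      so the graph is not used at all.
    adjacency given -> masked attention: node i attends only to nodes j
                      with adjacency[i][j] == 1.
    """
    n = len(scores)
    alpha = []
    for i in range(n):
        allowed = [j for j in range(n) if adjacency is None or adjacency[i][j]]
        m = max(scores[i][j] for j in allowed)
        exps = {j: math.exp(scores[i][j] - m) for j in allowed}
        total = sum(exps.values())
        alpha.append([exps[j] / total if j in exps else 0.0 for j in range(n)])
    return alpha

# Made-up raw scores for a 3-node graph.
scores = [[0.1, 0.5, -0.3],
          [0.2, 0.0, 0.4],
          [1.0, -1.0, 0.3]]

# Path graph 0 - 1 - 2 with self-loops.
adjacency = [[1, 1, 0],
             [1, 1, 1],
             [0, 1, 1]]

dense = attention_coefficients(scores)              # structure dropped
masked = attention_coefficients(scores, adjacency)  # structure injected
```

In `dense`, nodes 0 and 2 attend to each other even though they share no edge; in `masked`, those coefficients are exactly zero and each row is renormalised over the node's own neighbourhood.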

Hope that helps! Let me know if more clarification is needed.

Thanks,
Petar


guoyejun commented on September 18, 2024

thanks Petar!

btw, how does GAT handle directed edges (edges with an arrow)? My understanding is that it depends on the definition of 'neighborhood': the alpha for one direction is learned/calculated, while the alpha for the other direction is just zero.


PetarV- commented on September 18, 2024

Hi Yejun,

The general answer is "it's up to you". :)

In the simplest case, as you suggested, the attention is simply not computed over one direction. Other authors like to include a notion of two "edge types" (inbound/outbound) and learn a separate set of attention heads for each edge type. I'd say -- it really depends on the problem you're trying to solve (and how expressive the edges actually are semantically), but ultimately the framework is quite flexible with respect to how you choose to approach this.
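Both options can be sketched in a few lines. This is a hedged illustration only: the helpers `neighbourhoods` and `masked_softmax`, the `(source, target)` edge convention, and the score values are all assumptions made for the example, and which direction counts as "the neighbourhood" is exactly the modelling choice described above.

```python
import math

def masked_softmax(scores_row, allowed):
    """Softmax over the allowed indices only; every other entry is 0."""
    m = max(scores_row[j] for j in allowed)
    exps = {j: math.exp(scores_row[j] - m) for j in allowed}
    total = sum(exps.values())
    return [exps[j] / total if j in exps else 0.0
            for j in range(len(scores_row))]

def neighbourhoods(num_nodes, directed_edges, mode="inbound"):
    """Build attention neighbourhoods from directed (source, target) edges.

    mode="inbound":  node i attends over the sources of edges pointing at i.
    mode="outbound": node i attends over the targets of edges leaving i.
    A self-loop is always added so no neighbourhood is empty.
    """
    nbrs = [{i} for i in range(num_nodes)]
    for src, dst in directed_edges:
        if mode == "inbound":
            nbrs[dst].add(src)
        else:
            nbrs[src].add(dst)
    return [sorted(s) for s in nbrs]

edges = [(0, 1), (1, 2)]  # 0 -> 1 -> 2

inbound = neighbourhoods(3, edges, "inbound")
outbound = neighbourhoods(3, edges, "outbound")

# Made-up raw attention scores.
scores = [[0.0, 1.0, 2.0],
          [0.5, 0.0, -0.5],
          [1.0, 1.0, 1.0]]

# Simplest case: attend along one direction only.
alpha_in = [masked_softmax(scores[i], inbound[i]) for i in range(3)]
```

Here node 1 attends to node 0 (the edge points from 0 to 1), while node 0's coefficient for node 1 is zero. For the two-edge-type variant, `inbound` and `outbound` would each be fed to their own set of attention heads with separate parameters.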

Thanks,
Petar

