ethanfetaya / nri
Neural relational inference for interacting systems - pytorch
License: MIT License
Hi,
Great paper. Do you plan to make your code compatible with PyTorch 1.x?
Thanks.
Following Section 5.1 of the original paper, I use the code by Laszuk (https://github.com/laszukdawid/Dynamical-systems/blob/master/kuramoto.py) to simulate the Kuramoto model. The settings are as follows:
N = 5 # number of particles
intrinsic frequencies \omega uniformly sampled from [1, 10)
initial phases \phi uniformly sampled from [0, 2\pi)
coupling constants k_{ij} = 1 with probability 0.5
subsample factor = 10
length of trajectories T = 50
particle states x = (d\phi / dt, sin \phi, \omega)
For normalization, I use the function load_kuramoto_data from utils.py.
Some important settings of NRI are listed as follows.
encoder: CNN
decoder: MLP
skip_first = True
lr = 5e-4
prediction_step = 10 # teacher forcing at every 10th time step
It seems I've strictly followed the settings of the original paper, but the accuracy gets stuck at around 54%, and the MSE gets stuck at the level of 1e-1. There must be some mistake in my simulation or training. Do you have any advice? Would you mind providing a copy of the Kuramoto dataset to help me out?
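For reference, this is how I sample the settings above (a minimal numpy sketch; the integration itself is done by the linked kuramoto.py, and the symmetric coupling is my assumption):

import numpy as np

N = 5                                             # number of particles
omega = np.random.uniform(1.0, 10.0, size=N)      # intrinsic frequencies in [1, 10)
phi0 = np.random.uniform(0.0, 2 * np.pi, size=N)  # initial phases in [0, 2*pi)
K = (np.random.rand(N, N) < 0.5).astype(float)    # coupling constants k_ij = 1 w.p. 0.5
K = np.tril(K, -1) + np.tril(K, -1).T             # symmetric, zero diagonal (assumed undirected)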
Thanks for your amazing code. How can I plot beautiful trajectories like your Figure 1?
First, thanks a lot for sharing this great repo.
I have two questions about the computation of relation prediction accuracy:
1. The reported accuracy changes with the batch-size parameter (however, it should not be influenced by batch-size because the model does not change), especially when the number of test examples is not very large. The reason could be that not all batches have batch-size examples (if num_test_example % batch-size != 0). I feel it would be better for edge_accuracy() in utils.py to return the average accuracy together with the number of examples in the batch, and then compute the (weighted) average in the main script by taking the division.
2. Should the accuracy be computed as max(acc, 1.0-acc)? Besides, I wonder whether you have some ideas on computing the accuracy in the multiple (>2) relation case? (The current edge_accuracy() function seems only suitable for the two-relation case.)
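On the first point, a sketch of the suggested change (the name edge_accuracy_counts is mine, not the repo's): return raw counts per batch and take the ratio once over the whole test set. It also works unchanged for more than two edge types.

def edge_accuracy_counts(preds, target):
    # preds: [batch, num_edges, num_edge_types] logits; target: [batch, num_edges]
    _, hard = preds.max(-1)                           # argmax over edge types
    correct = hard.eq(target.view_as(hard).long()).sum().item()
    return correct, target.size(0) * target.size(1)

# In the main script, accumulate and divide once:
#   total_correct += correct; total_edges += count
#   acc = total_correct / total_edges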
flake8 testing of https://github.com/ethanfetaya/NRI on Python 3.6.3
$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
./utils.py:459:24: F821 undefined name 'args'
const = np.log(args.edge_types)
^
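A likely minimal fix, assuming the constant is meant to be the log of the number of edge types: pass that number in as a parameter instead of reading the global args, which is undefined inside utils.py (the helper name here is illustrative, not the repo's):

import numpy as np

def uniform_prior_const(num_edge_types):
    # log(K) term for a uniform prior over K edge types; previously this
    # was computed as np.log(args.edge_types) with no `args` in scope.
    return np.log(num_edge_types)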
In Appendix A.2, unsupervised learning was done:
To test whether our model can infer an empty graph, we create a test set of 1000 simulations with 5 non-interacting particles and test an unsupervised NRI model which was trained on the spring simulation dataset with 5 particles as before. We find that it achieves an accuracy of 98.4% in identifying "no interaction" edges (i.e. the empty graph).
Can someone point out how to do unsupervised learning with the code in this repo?
What does edge_type mean?
What does the argument num_atoms mean in the code? The term atom does not appear in the paper.
Many thanks for the interesting work.
Indeed, I am trying to use your model on large biological graphs (more than 10K nodes) but I am facing memory limits.
Basically, you are using the one-hot encoding for all the edges in a fully connected graph to exchange the messages and to facilitate the optimization of the ELBO. For very large graphs such encoding is not an option.
I tried using sparse tensors but the missing strides for torch.matmul (requires contiguous representation for the data) and the unsupported broadcasting for matrix multiplication with torch.mm limited my efforts to patch your implementation.
Do you have please an idea on how we could extend the application of your model on large graphs?
Thank you very much in advance.
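For a sense of scale (a back-of-the-envelope sketch, assuming float32 and the repo's dense one-hot edge encoding):

num_nodes = 10_000
num_edges = num_nodes * (num_nodes - 1)   # ~1e8 directed edges in a fully connected graph
rel_rec_entries = num_edges * num_nodes   # one-hot matrix of shape [num_edges, num_nodes]
print(rel_rec_entries * 4 / 1e12)         # ~4.0 TB for rel_rec alone at 4 bytes/entry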
For a system in which two particles either interact or not, such as the spring experiments, if we use z_{ij} = [0,1] to denote interaction and z_{ij} = [1,0] to denote non-interaction (no message between nodes i and j), should the decoder only consider the interaction edge type, i.e., h^t_{(i,j)} = z_{ij,1} f_e([x^t_i, x^t_j]), since no message passes along a non-interaction edge?
It doesn't look like the temperature is annealed in your Gumbel softmax. Is there a reason for this, as annealing is the standard practice? @tkipf
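For reference, the schedule from the Gumbel-softmax paper (Jang et al., 2017) is a simple exponential decay clamped from below; a sketch, with all hyperparameter values illustrative rather than from this repo:

import math

def anneal_temperature(step, tau0=2.0, rate=1e-4, tau_min=0.5):
    # Exponentially decay the Gumbel-softmax temperature, clamped below.
    return max(tau_min, tau0 * math.exp(-rate * step))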
Dear ethanfetaya:
I studied the code of RNNDecoder and found some differences from Eq. (14)-(16) in your paper. In your code, you do not concatenate MSG and x as the input of the GRU, and there is no additional hidden state. Why? Which is right?
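For concreteness, here is my reading of Eq. (14)-(16) as a sketch (illustrative, not the repo's code; gru is an nn.GRUCell whose input size matches the concatenation, and f_out maps the hidden state back to the state dimension):

import torch

def rnn_decoder_step(x, msgs, hidden, gru, f_out):
    # One RNN-decoder step as the paper writes it:
    gru_input = torch.cat([msgs, x], dim=-1)  # GRU input is [MSG_j^t, x_j^t]
    hidden = gru(gru_input, hidden)           # Eq. (15): recurrent hidden-state update
    pred = x + f_out(hidden)                  # Eq. (16): predict the state delta
    return pred, hidden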
Hi, thanks for your really great code!
It seems you only implement the physics simulation datasets in your code. I want to apply it to reasoning in video/images, and I don't know the meaning of the npy files:
'edges_valid_springs5.npy' is (10000, 5, 5); what does the last (5, 5) mean for video?
'loc_valid_springs5.npy' is (10000, 49, 2, 5); what does the last (2, 5) mean for video?
'vel_valid_springs5.npy' is (10000, 49, 2, 5); what does the last (2, 5) mean for video?
Also, can those nodes be the output of region proposals like ROIAlign?
Looking forward to your reply.
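For what it's worth, my reading of those shapes, going by the generator's conventions (10000 simulations, 5 particles, 49 subsampled timesteps, 2 spatial dimensions):

import numpy as np

# edges: [num_sims, num_atoms, num_atoms]        -> 5x5 interaction matrix per simulation
# loc:   [num_sims, num_timesteps, 2, num_atoms] -> (x, y) position per particle
# vel:   [num_sims, num_timesteps, 2, num_atoms] -> (vx, vy) velocity per particle
edges = np.load('edges_valid_springs5.npy')  # (10000, 5, 5)
loc = np.load('loc_valid_springs5.npy')      # (10000, 49, 2, 5)
print(loc[0, :, :, 3])                       # full trajectory of particle 3 in simulation 0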
There is no supervision of edges during training. How do we know that the first type is the "existing edge" type and the second type is the "non-existing edge" type?
def edge_accuracy(preds, target):
    _, preds = preds.max(-1)  # preds: torch.Size([32, 20, 2]) -> after argmax: torch.Size([32, 20])
    correct = preds.float().data.eq(
        target.float().data.view_as(preds)).cpu().sum()
    return np.float(correct) / (target.size(0) * target.size(1))
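Since training is unsupervised, nothing forces a particular ordering of the latent edge types; one common remedy (a sketch, not from this repo) is to report accuracy maximized over relabelings of the types, which for two types reduces to max(acc, 1 - acc):

from itertools import permutations
import torch

def edge_accuracy_permuted(preds, target, num_types=2):
    # Accuracy maximized over relabelings of the latent edge types.
    _, hard = preds.max(-1)                 # [batch, num_edges] argmax labels
    target = target.view_as(hard).long()
    best = 0.0
    for perm in permutations(range(num_types)):
        relabeled = hard.clone()
        for old, new in enumerate(perm):
            relabeled[hard == old] = new
        best = max(best, relabeled.eq(target).float().mean().item())
    return best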
What does the logits shape mean?
logits = encoder(pts, rel_rec, rel_send)
logits - torch.Size([32, 182, 3])
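A hedged reading of that shape (14 atoms is my inference from the 182, not stated in the question):

num_atoms = 14
num_edges = num_atoms * (num_atoms - 1)  # all directed pairs, no self-edges
assert num_edges == 182                  # so logits is [batch=32, num_edges=182, num_edge_types=3]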
Why does the my_softmax function seem to normalize along the batch dimension instead of the class dimension?
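If the goal is simply to normalize over the edge-type dimension, recent PyTorch makes this explicit with the dim argument; a sketch of the intended behavior, without the axis transposing my_softmax does:

import torch
import torch.nn.functional as F

logits = torch.randn(32, 182, 3)   # [batch, num_edges, num_edge_types]
probs = F.softmax(logits, dim=-1)  # normalize over edge types, not over the batch
assert torch.allclose(probs.sum(-1), torch.ones(32, 182))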
The step x = F.elu(self.fc1(inputs)) raises an error. When using forward in the MLP class, the error says "mat1 and mat2 shapes cannot be multiplied (640x16 and 196x512)".
Hi,
I could generate the data using this command:
python generate_dataset.py
But when I want to run this command:
--simulation charged
It gives me this error:
error: '--simulation' is not recognized as an internal or external command, operable program or batch file.
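For what it's worth, that Windows error means --simulation was executed as its own command; the flag has to go on the same line as the script (assuming the flag name matches generate_dataset.py):

python generate_dataset.py --simulation charged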
In the test phase, the encoder sees ground-truth data that it should not see, resulting in higher accuracy. May I ask for some explanation?
Below is the code snippet of MLPDecoder.
I think prediction is ended with Eq. 11 in the paper.
I can't find the code of Eq. 12.
Am I missing something in this code??
Thanks in advance.
def single_step_forward(self, single_timestep_inputs, rel_rec, rel_send,
single_timestep_rel_type):
# single_timestep_inputs has shape
# [batch_size, num_timesteps, num_atoms, num_dims]
# single_timestep_rel_type has shape:
# [batch_size, num_timesteps, num_atoms*(num_atoms-1), num_edge_types]
# Node2edge
receivers = torch.matmul(rel_rec, single_timestep_inputs)
senders = torch.matmul(rel_send, single_timestep_inputs)
# Eq 10 [x_i^t, x_j^t] [#sims(batch_size), #tsteps_indexed, #edges, #dims*2]
pre_msg = torch.cat([senders, receivers], dim=-1)
# self.msg_out_shape = #node_features
all_msgs = Variable(torch.zeros(pre_msg.size(0), pre_msg.size(1),
pre_msg.size(2), self.msg_out_shape))
if single_timestep_inputs.is_cuda:
all_msgs = all_msgs.cuda()
if self.skip_first_edge_type:
start_idx = 1
else:
start_idx = 0
# Run separate MLP for every edge type
        # NOTE: To exclude one edge type, simply offset range by 1
# Eq 10 MLP
for i in range(start_idx, len(self.msg_fc2)):
msg = F.relu(self.msg_fc1[i](pre_msg))
msg = F.dropout(msg, p=self.dropout_prob)
msg = F.relu(self.msg_fc2[i](msg))
msg = msg * single_timestep_rel_type[:, :, :, i:i + 1] #element-wise product with broadcast
all_msgs += msg
# Aggregate all msgs to receiver
# Eq 11 / rel_rec [#edges, #nodes]
agg_msgs = all_msgs.transpose(-2, -1).matmul(rel_rec).transpose(-2, -1)
agg_msgs = agg_msgs.contiguous()
# Skip connection
aug_inputs = torch.cat([single_timestep_inputs, agg_msgs], dim=-1)
# Output MLP
pred = F.dropout(F.relu(self.out_fc1(aug_inputs)), p=self.dropout_prob)
pred = F.dropout(F.relu(self.out_fc2(pred)), p=self.dropout_prob)
pred = self.out_fc3(pred)
# Predict position/velocity difference / Eq 11 >> Where is Eq 12??
return single_timestep_inputs + pred
def forward(self, inputs, rel_type, rel_rec, rel_send, pred_steps=1):
# NOTE: Assumes that we have the same graph across all samples.
# Input shape: [num_sims, num_atoms, num_timesteps, num_dims] > [#sims, #tsteps, #nodes, #dims]
inputs = inputs.transpose(1, 2).contiguous()
sizes = [rel_type.size(0), inputs.size(1), rel_type.size(1),
rel_type.size(2)]
rel_type = rel_type.unsqueeze(1).expand(sizes)
time_steps = inputs.size(1)
assert (pred_steps <= time_steps)
preds = []
# Only take n-th timesteps as starting points (n: pred_steps)
last_pred = inputs[:, 0::pred_steps, :, :]
curr_rel_type = rel_type[:, 0::pred_steps, :, :]
# NOTE: Assumes rel_type is constant (i.e. same across all time steps).
# Run n prediction steps / Eq 10~11
for step in range(0, pred_steps):
last_pred = self.single_step_forward(last_pred, rel_rec, rel_send,
curr_rel_type)
preds.append(last_pred)
sizes = [preds[0].size(0), preds[0].size(1) * pred_steps,
preds[0].size(2), preds[0].size(3)]
output = Variable(torch.zeros(sizes))
if inputs.is_cuda:
output = output.cuda()
# Re-assemble correct timeline
for i in range(len(preds)):
output[:, i::pred_steps, :, :] = preds[i]
# last prediction is one step beyond input
pred_all = output[:, :(inputs.size(1) - 1), :, :]
return pred_all.transpose(1, 2).contiguous()
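On the Eq. 12 question above: as far as I can tell, the Gaussian output distribution never appears in the decoder; the decoder returns only the mean, and Eq. 12 is realized as the training loss. A sketch of such a fixed-variance Gaussian negative log-likelihood (this should match nll_gaussian in utils.py up to additive constants):

def nll_gaussian(preds, target, variance):
    # Eq. 12: N(mu, sigma^2 I) with fixed variance reduces, up to a
    # constant, to a scaled squared error between mean and target.
    neg_log_p = (preds - target) ** 2 / (2 * variance)
    return neg_log_p.sum() / (target.size(0) * target.size(1))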
Hello, thank you for your great work and nice code.
I saw the supplementary material, and it said that NRI can learn "known" 3 edge types (no-interaction, weak spring, strong spring).
In this sentence, does "known" mean that NRI can learn the relations only in a supervised manner, not in an unsupervised manner?
In the source code, is it right that relation-supervised training is not implemented?
Again, thank you for your great work!
Hello,
I have read the paper and the code and I'm fascinated about this tool and their possible applications.
In my biological set-up I have different objects from which I want to create an interaction graph. Unfortunately, not all biological objects have the same number of attributes: e.g., fibrines have their morphometry defined but not their phenotype, and cells have their phenotype defined but not their morphology. I would like to know whether there is any relation between them.
I have thought about creating an attribute vector containing all the available features. Following the example: fibrines would have a vector of two attributes with their morphometry, leaving their phenotype undefined (using zeros or random numbers), and cells would have their phenotype defined, leaving the morphometry undefined.
Can you give me any suggestions about this approach based on your experience?
Thank you,
Daniel Jiménez.
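One common way to realize the zero-padding idea, as a sketch (entirely illustrative, including the attribute sizes): give every object every attribute slot plus a binary presence mask, so the model can distinguish "missing" from "measured as zero".

import numpy as np

def encode(morphometry=None, phenotype=None):
    # Fixed layout: 2 morphometry slots, 1 phenotype slot, 2 mask bits.
    morph = np.zeros(2) if morphometry is None else np.asarray(morphometry, float)
    pheno = np.zeros(1) if phenotype is None else np.asarray(phenotype, float)
    mask = np.array([morphometry is not None, phenotype is not None], float)
    return np.concatenate([morph, pheno, mask])

fibrine = encode(morphometry=[0.3, 1.2])  # phenotype slot masked out
cell = encode(phenotype=[2.0])            # morphometry slots masked out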
Hi, I cannot reproduce the experimental results on the charged simulation dataset. The accuracy is only 50+% and I didn't modify the code (I only replaced Variable() to fit newer PyTorch versions). Also, when I try to reproduce the results on the spring simulation dataset, the accuracy is not good when I do not apply --skip_first (only about 70%). Can you help me out? Thank you very much!
Hi, thanks for the code release.
To make sure that I am running the code properly, I am trying to reproduce some of the paper results. What's the correspondence between the results returned by the code and those reported in the paper? My understanding is as follows: np.mean(acc_test) should correspond to the reported edge accuracy, while mse_test and mean_mse correspond to the reported MSE. Specifically, np.mean(mse_test) should be similar to the first column of Table 2 (because a prediction step of 1 is being used, see line 323 of train.py), and np.mean(mean_mse) should be similar to the third column of Table 2 (because a prediction step of 20 is being used, see line 351 of train.py). Is this correct? Thank you!
Hi,
Thanks for your great work,
Can you provide a link to, or a copy of, the basketball dataset you used in your paper?
You also mentioned that you focused on the PnR instances of the game. How did you find these instances?
Best,
Line 93: os.mkdir ----> os.makedirs
Line 46: default='logs' ----> default='./logs'
Not a big problem, just mentioning it here for others' convenience.
Hi, thanks for your outstanding work and contribution. I have a question: can we use dynamic_graph in the training step? If yes, can you give me some implementation guidance? Thank you very much!
It took me nearly 8 hours to generate the data. Is this normal?
The CPU utilization is very low during generation, so I suppose the program could be further optimized.
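Generation is indeed a sequential Python loop over simulations, so one option is to parallelize it per simulation; a sketch with multiprocessing (sample_trajectory is a stand-in for whatever per-simulation sampler the real script calls, e.g. the SpringSim class):

from multiprocessing import Pool
import numpy as np

def sample_trajectory(seed):
    # Stand-in for one simulation run; the real script would call the
    # SpringSim/ChargedParticlesSim sampler with its own RNG seed.
    rng = np.random.RandomState(seed)
    return rng.randn(49, 2, 5)  # placeholder trajectory

if __name__ == '__main__':
    with Pool() as pool:
        trajectories = pool.map(sample_trajectory, range(10000))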
I was wondering if we can fix the latent graph to be an undirected graph. The schematic in Figure 1 suggests that this would be possible, but I can't see an option for that in the code. Thanks!
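One way to impose this without touching the architecture, as a sketch: average the encoder logits for each directed pair (i, j) with those of (j, i) before sampling, assuming edges are enumerated row-major over all ordered pairs with i != j, as the repo's rel_rec/rel_send construction does.

import torch

def reverse_edge_index(n):
    # Permutation mapping each directed edge (i, j) to its reverse (j, i).
    idx, k = {}, 0
    for i in range(n):
        for j in range(n):
            if i != j:
                idx[(i, j)] = k
                k += 1
    return torch.tensor([idx[(j, i)] for (i, j) in sorted(idx, key=idx.get)])

def symmetrize(logits, n):
    # logits: [batch, num_edges, num_edge_types]; ties z_ij and z_ji.
    perm = reverse_edge_index(n)
    return 0.5 * (logits + logits[:, perm, :])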
Hi, why is the prior uniformly distributed?