xilorole / raptgen
License: MIT License
Hi authors,
Thank you for your interesting work. I am currently trying your model, and I wonder what the sample data is. Is this data sampled from real data, or does it come from another source?
Thank you!
I could run scripts/real.py on my data and train a model successfully. I could then also run scripts/encode.py to encode some sequences into their vector representations (exactly as expected), and I get an embed_seq.csv in the output with the right format. But now that I am trying to decode a file using scripts/decode.py with the exact command from the README, I get the following error:
saving to /content/drive/MyDrive/colab/RaptGen/out/decode
Traceback (most recent call last):
File "scripts/decode.py", line 111, in <module>
main()
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "scripts/decode.py", line 53, in main
arr = df.values[:,1:].astype(float)
ValueError: could not convert string to float: 'UGUAUAUGA'
I can see why this happens: df.values[:,1:] tries to convert the actual sequence string to float as well. I changed df.values[:,1:] to df.values[:,2:] so that only the two-column vector representation of the sequences is converted. I no longer get that ValueError, but now I get this new error instead:
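For context, the failure and the slicing change can be reproduced with a minimal sketch. The column layout below (an id column, the sequence string, then two latent coordinates) is an assumption about embed_seq.csv, not taken from the RaptGen code:

```python
import numpy as np
import pandas as pd

# Hypothetical embed_seq.csv layout: id, sequence, then two latent coordinates.
df = pd.DataFrame({
    "id": [0, 1],
    "sequence": ["UGUAUAUGA", "AUGCAUGCA"],
    "z1": [0.12, -0.44],
    "z2": [1.03, 0.27],
})

# df.values[:, 1:] still contains the sequence strings, so the cast fails.
try:
    df.values[:, 1:].astype(float)
except ValueError:
    pass  # "could not convert string to float: 'UGUAUAUGA'"

# Skipping the sequence column converts cleanly to a (n, 2) float array.
arr = df.values[:, 2:].astype(float)
```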
saving to /content/drive/MyDrive/colab/RaptGen/out/decode
loading model parameters
Traceback (most recent call last):
File "scripts/decode.py", line 111, in <module>
main()
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "scripts/decode.py", line 58, in main
model.load_state_dict(torch.load(model_path))
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CNN_PHMM_VAE:
size mismatch for decoder.tr_from_M.2.weight: copying a param with shape torch.Size([30, 32]) from checkpoint, the shape in current model is torch.Size([63, 32]).
size mismatch for decoder.tr_from_M.2.bias: copying a param with shape torch.Size([30]) from checkpoint, the shape in current model is torch.Size([63]).
size mismatch for decoder.tr_from_I.2.weight: copying a param with shape torch.Size([20, 32]) from checkpoint, the shape in current model is torch.Size([42, 32]).
size mismatch for decoder.tr_from_I.2.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([42]).
size mismatch for decoder.tr_from_D.2.weight: copying a param with shape torch.Size([20, 32]) from checkpoint, the shape in current model is torch.Size([42, 32]).
size mismatch for decoder.tr_from_D.2.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([42]).
size mismatch for decoder.emission.2.weight: copying a param with shape torch.Size([36, 32]) from checkpoint, the shape in current model is torch.Size([80, 32]).
size mismatch for decoder.emission.2.bias: copying a param with shape torch.Size([36]) from checkpoint, the shape in current model is torch.Size([80]).
It looks like it can't load the trained model. Do you have any idea why this happens? (I'm running this on Google Colab.) PS: I could run this code before on the very same Colab notebook, but now I can't reproduce it!
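As a rough consistency check on the reported shapes: all of the mismatched tensors scale with the profile HMM model length L. The formulas below are an assumption about the parameterization (3 transitions out of each of the L+1 match states, 2 out of each insert/delete state, 4 emission logits per match state), not taken from the RaptGen source, but they reproduce both sets of numbers, which would suggest the checkpoint was trained with a model length of 9 while the current model is built with the default length of 20:

```python
def phmm_shapes(L):
    # Assumed parameterization of a profile HMM decoder of model length L:
    # 3 outgoing transitions per match state (L+1 states incl. start),
    # 2 per insert/delete state, 4 emission logits (A/C/G/U) per match state.
    return {"tr_from_M": 3 * (L + 1), "tr_from_I": 2 * (L + 1), "emission": 4 * L}

print(phmm_shapes(9))   # matches the checkpoint shapes: 30, 20, 36
print(phmm_shapes(20))  # matches the current model shapes: 63, 42, 80
```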
Here are the steps I took to reproduce the results described in the section "Motif dependent embeddings using simulation data":
1. Run the scripts/multiple.py script with default parameters.
The following test loss values were observed:
To reproduce the plot, I took the following steps (as there is no plotting script available in the repository):
1. Train a model with scripts/multiple.py.
2. Run the scripts/encode.py script to create latent embeddings with the model created by scripts/multiple.py.
3. Using the out/embed.seq and out/sequences.txt files created by scripts/encode.py, plot the latent embeddings coloured by their corresponding motif.
This results in the following plot:
However, if I switch off the force_matching option (see Line 84 in c4986ca), the resulting values are quite close to those reported in the paper (20.60, 16.02 and 4.59, respectively).
The resulting plot also looks very similar to Fig.2b (HMM profile).
I repeated both described experiments (force_matching enabled and force_matching disabled) with different seeds.
Could you clarify which parameters you were using during training of the CNN_PHMM_VAE model?
Hello, may I ask which module is used to generate candidate sequences in RaptGen? I see two possible modules, decode and GMM; could you explain the specific use of these two modules? Also, target_len in the decode module is 20, so why is the length of the sequence in the example 15?