xilorole / raptgen
License: MIT License
Hi authors,
Thank you for your interesting work. I am currently trying your model, and I wonder what the sample data is. Is this data sampled from real data, or does it come from another source?
Thank you!
I could run scripts/real.py on my data and train a model successfully. I could then also run scripts/encode.py to encode some sequences into their vector representations (exactly as expected), and I get an embed_seq.csv in the output with the right format. But now that I am trying to decode a file using scripts/decode.py with the exact command from the README, I get the following error:
saving to /content/drive/MyDrive/colab/RaptGen/out/decode
Traceback (most recent call last):
File "scripts/decode.py", line 111, in <module>
main()
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "scripts/decode.py", line 53, in main
arr = df.values[:,1:].astype(float)
ValueError: could not convert string to float: 'UGUAUAUGA'
I can see why this happens: df.values[:,1:] tries to convert the actual sequence string to float as well. I changed df.values[:,1:] to df.values[:,2:] so that only the two-column vector representation of the sequences is converted. I no longer get that ValueError, but now I get this new error instead:
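For context, the failure and the slicing change can be reproduced with a minimal sketch. The column layout below (an id column, the sequence string, then two latent coordinates) is an assumption about embed_seq.csv, not taken from the RaptGen code:

```python
import numpy as np
import pandas as pd

# Hypothetical embed_seq.csv layout: id, sequence, then two latent coordinates.
df = pd.DataFrame({
    "id": [0, 1],
    "sequence": ["UGUAUAUGA", "AUGCAUGCA"],
    "z1": [0.12, -0.44],
    "z2": [1.03, 0.27],
})

# df.values[:, 1:] still contains the sequence strings, so the cast fails.
try:
    df.values[:, 1:].astype(float)
except ValueError:
    pass  # "could not convert string to float: 'UGUAUAUGA'"

# Skipping the sequence column converts cleanly to a (n, 2) float array.
arr = df.values[:, 2:].astype(float)
```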
saving to /content/drive/MyDrive/colab/RaptGen/out/decode
loading model parameters
Traceback (most recent call last):
File "scripts/decode.py", line 111, in <module>
main()
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "scripts/decode.py", line 58, in main
model.load_state_dict(torch.load(model_path))
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CNN_PHMM_VAE:
size mismatch for decoder.tr_from_M.2.weight: copying a param with shape torch.Size([30, 32]) from checkpoint, the shape in current model is torch.Size([63, 32]).
size mismatch for decoder.tr_from_M.2.bias: copying a param with shape torch.Size([30]) from checkpoint, the shape in current model is torch.Size([63]).
size mismatch for decoder.tr_from_I.2.weight: copying a param with shape torch.Size([20, 32]) from checkpoint, the shape in current model is torch.Size([42, 32]).
size mismatch for decoder.tr_from_I.2.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([42]).
size mismatch for decoder.tr_from_D.2.weight: copying a param with shape torch.Size([20, 32]) from checkpoint, the shape in current model is torch.Size([42, 32]).
size mismatch for decoder.tr_from_D.2.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([42]).
size mismatch for decoder.emission.2.weight: copying a param with shape torch.Size([36, 32]) from checkpoint, the shape in current model is torch.Size([80, 32]).
size mismatch for decoder.emission.2.bias: copying a param with shape torch.Size([36]) from checkpoint, the shape in current model is torch.Size([80]).
It looks like it can't load the trained model. Do you have any idea why this happens? (I'm running this on Google Colab.) PS: I could run this code before on the very same Colab notebook, but now I can't reproduce it!
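As a rough consistency check on the reported shapes: all of the mismatched tensors scale with the profile HMM model length L. The formulas below are an assumption about the parameterization (3 transitions out of each of the L+1 match states, 2 out of each insert/delete state, 4 emission logits per match state), not taken from the RaptGen source, but they reproduce both sets of numbers, which would suggest the checkpoint was trained with a model length of 9 while the current model is built with the default length of 20:

```python
def phmm_shapes(L):
    # Assumed parameterization of a profile HMM decoder of model length L:
    # 3 outgoing transitions per match state (L+1 states incl. start),
    # 2 per insert/delete state, 4 emission logits (A/C/G/U) per match state.
    return {"tr_from_M": 3 * (L + 1), "tr_from_I": 2 * (L + 1), "emission": 4 * L}

print(phmm_shapes(9))   # matches the checkpoint shapes: 30, 20, 36
print(phmm_shapes(20))  # matches the current model shapes: 63, 42, 80
```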
Here are the steps I took to reproduce the results described in the section "Motif dependent embeddings using simulation data":
1. Run the scripts/multiple.py script with default parameters.
The following test loss values were observed:
To reproduce the plot, I took the following steps (as there is no plotting script available in the repository):
1. Train a model with scripts/multiple.py.
2. Run the scripts/encode.py script to create latent embeddings with the model created by scripts/multiple.py.
3. Using the out/embed.seq and out/sequences.txt files created by scripts/encode.py, plot the latent embeddings coloured by their corresponding motif.
This results in the following plot:
However, if I switch off the force_matching option (see Line 84 in c4986ca), the resulting values are quite close to those reported in the paper (20.60, 16.02 and 4.59, respectively).
The resulting plot also looks very similar to Fig.2b (HMM profile).
I repeated both described experiments (force_matching enabled and force_matching disabled) with different seeds.
Could you clarify which parameters you were using during training of the CNN_PHMM_VAE model?
Hello, may I ask which module is used to generate candidate sequences in RaptGen? I see two possible modules, decode and GMM; could you explain the specific use of these two modules? Also, target_len in the decode module is 20, so why is the length of the sequence in the example 15?