
raptgen's People

Contributors

unkosan, xilorole


raptgen's Issues

ValueError: could not convert string to float

I was able to run scripts/real.py on my data and train a model successfully. I could also run scripts/encode.py to encode sequences into their vector representations, and the resulting embed_seq.csv in the output has the expected format. However, when I try to decode a file with scripts/decode.py using the exact command from the README, I get the following error:

saving to /content/drive/MyDrive/colab/RaptGen/out/decode
Traceback (most recent call last):
  File "scripts/decode.py", line 111, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "scripts/decode.py", line 53, in main
    arr = df.values[:,1:].astype(float)
ValueError: could not convert string to float: 'UGUAUAUGA'

I can see why this happens: df.values[:,1:] also tries to convert the actual sequence strings to float. If I change df.values[:,1:] to df.values[:,2:] so that only the two-column vector representation is converted, that error goes away, but then I get a new one:
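For reference, a minimal self-contained sketch of that slicing fix; the column layout of embed_seq.csv (an index, the sequence string, then two latent coordinates) is an assumption based on the error message:

```python
import io
import pandas as pd

# Assumed embed_seq.csv layout: index, sequence, then two latent coordinates.
csv = io.StringIO(
    "0,UGUAUAUGA,0.12,-0.34\n"
    "1,AUGCAUGCA,1.05,0.77\n"
)
df = pd.read_csv(csv, header=None)

# df.values[:, 1:] would include the sequence column and make
# astype(float) fail; skipping one more column keeps only the numbers.
arr = df.values[:, 2:].astype(float)
print(arr.shape)  # (2, 2)
```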

saving to /content/drive/MyDrive/colab/RaptGen/out/decode
loading model parameters
Traceback (most recent call last):
  File "scripts/decode.py", line 111, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "scripts/decode.py", line 58, in main
    model.load_state_dict(torch.load(model_path))
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CNN_PHMM_VAE:
	size mismatch for decoder.tr_from_M.2.weight: copying a param with shape torch.Size([30, 32]) from checkpoint, the shape in current model is torch.Size([63, 32]).
	size mismatch for decoder.tr_from_M.2.bias: copying a param with shape torch.Size([30]) from checkpoint, the shape in current model is torch.Size([63]).
	size mismatch for decoder.tr_from_I.2.weight: copying a param with shape torch.Size([20, 32]) from checkpoint, the shape in current model is torch.Size([42, 32]).
	size mismatch for decoder.tr_from_I.2.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([42]).
	size mismatch for decoder.tr_from_D.2.weight: copying a param with shape torch.Size([20, 32]) from checkpoint, the shape in current model is torch.Size([42, 32]).
	size mismatch for decoder.tr_from_D.2.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([42]).
	size mismatch for decoder.emission.2.weight: copying a param with shape torch.Size([36, 32]) from checkpoint, the shape in current model is torch.Size([80, 32]).
	size mismatch for decoder.emission.2.bias: copying a param with shape torch.Size([36]) from checkpoint, the shape in current model is torch.Size([80]).

It looks like the trained model can't be loaded. Do you have any idea why this happens? (I'm running this on Google Colab.) PS: I was able to run this code before on the same Colab notebook, but now I can't reproduce it.
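For what it's worth, the error itself is easy to reproduce in isolation: a state_dict saved from a layer of one width cannot be loaded into a layer of another width. If the RaptGen decoder widths scale with the motif/model length chosen at training time (an assumption on my part), a configuration mismatch between training and decoding would produce exactly this error:

```python
import torch.nn as nn

# A layer as it was (hypothetically) saved at training time, and the
# same layer as reconstructed at decode time with a different size.
saved = nn.Linear(32, 30)
current = nn.Linear(32, 63)

msg = ""
try:
    current.load_state_dict(saved.state_dict())
except RuntimeError as e:
    msg = str(e)
print("size mismatch" in msg)  # True
```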

Not able to reproduce results for CNN_PHMM_VAE with simulation data

Here are the steps I took to reproduce the results described in the section "Motif dependent embeddings using simulation data": I ran scripts/multiple.py with default parameters and observed the following test loss values:

  • ELBO: 23.95
  • Reconstruction error: 19.22
  • KL Divergence: 4.95

To reproduce the plot, I took the following steps (as there is no script for this in the repository):

  • took the out/sequences.txt file produced by running scripts/multiple.py,
  • sampled a FASTA file from it,
  • used the sampled FASTA file as input to scripts/encode.py to create latent embeddings with the model trained by scripts/multiple.py, and
  • plotted the latent embeddings from the out/embed.seq and out/sequences.txt files created by scripts/encode.py, coloured by their corresponding motif.
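The last (plotting) step can be sketched roughly as follows; the coordinates and labels here are synthetic stand-ins for the real embeddings and motifs, since the exact file layout isn't specified:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Synthetic stand-ins: 2-D latent coordinates and one motif label per point
# (the real values would come from out/embed.seq and out/sequences.txt).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 100)
coords = rng.normal(size=(300, 2)) + 3.0 * labels[:, None]

fig, ax = plt.subplots()
for motif in np.unique(labels):
    pts = coords[labels == motif]
    ax.scatter(pts[:, 0], pts[:, 1], s=8, label=f"motif {motif}")
ax.legend()
fig.savefig("latent_embeddings.png")
```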

This results in the following plot:
[image: cnn_phmm_vae]

However, if I switch off the force_matching option (the "force_matching": True default), I observe the following test loss values after running scripts/multiple.py with otherwise default parameters:

  • ELBO: 21.21
  • Reconstruction error: 17.05
  • KL Divergence: 4.16

These values are quite close to those reported in the paper (20.60, 16.02, 4.59 respectively).
The resulting plot also looks very similar to Fig. 2b (HMM profile).
[image: cnn_phmm_vae_multiple]

I repeated both experiments (force_matching enabled and force_matching disabled) with different seeds.

Could you clarify which parameters you were using during training of the CNN_PHMM_VAE model?

Q

Hello. Which module is used to generate candidate sequences in RaptGen? I see two possible modules, decode and GMM; could you explain the specific use of each? Also, in the decode module target_len is 20, so why is the length of the sequence in the example 15?
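A generic sketch of how a GMM step could propose latent points, which a decoder would then turn into candidate sequences; this is not necessarily the repo's exact procedure, and the embeddings here are synthetic stand-ins:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the latent embeddings produced by encoding.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 2))

# Fit a mixture to the latent space and sample new latent points;
# these would then be fed to the decoder to obtain candidate sequences.
gmm = GaussianMixture(n_components=3, random_state=0).fit(embeddings)
samples, _ = gmm.sample(10)
print(samples.shape)  # (10, 2)
```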
