zheminzhou / etoki

all methods related to Enterobase

Home Page: https://enterobase.warwick.ac.uk

License: GNU General Public License v3.0

Python 99.53% Shell 0.47%
phylo assembly phylogeny spades mlst genotype

etoki's Issues

--install --path

Hi, could you clarify in the documentation which things I can put into --path? Like if I have the entire spades package downloaded, do I just run it with --path spades.py=$(which spades.py)? For blast, do I need to consider multiple executables like makeblastdb and blastn?

Thank you!

Error when running example script EToKi.py MLSTdb

Hello,
I installed EToKi and tried the example as shown in the README file, but I got the following error.

./EToKi.py MLSTdb -i examples/Escherichia.Achtman.alleles.fasta -r examples/Escherichia.Achtman.references.fasta -d examples/Escherichia.Achtman.convert.tab
2020-07-29 20:45:04.997470 Exemplar sequences in ./NS_2085r4ap/clsFna.clust.exemplar
2020-07-29 20:45:04.997569 Clusters in ./NS_2085r4ap/clsFna.clust.tab
2020-07-29 20:45:05.030599 Run BLASTn starts
2020-07-29 20:45:05.384751 Run BLASTn finishes. Got 971 alignments
2020-07-29 20:45:05.384888 Run diamond starts
2020-07-29 20:45:06.047658 Run diamond finishes. Got 966 alignments
2020-07-29 20:45:06.375628 removed 0 paralogous sites.
2020-07-29 20:45:06.375682 obtained 5530 alleles and 39 references alleles
2020-07-29 20:45:06.378881 A file of reference alleles has been generated: examples/Escherichia.Achtman.references.fasta
Traceback (most recent call last):
  File "EToKi.py", line 47, in <module>
    etoki()
  File "EToKi.py", line 41, in etoki
    eval(arg.cmd)(sys.argv[2:])
  File "/mnt/data/disk1/biotools/EToKi/modules/MLSTdb.py", line 164, in MLSTdb
    conversion[0].append(get_md5(allele['value']))
TypeError: string indices must be integers

I installed it in a conda environment with Python 3.6.
Any ideas on how to fix it?
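
For anyone hitting the same message: this TypeError is what Python raises when a plain string is indexed with a key such as 'value', so at line 164 `allele` is presumably a str rather than a dict-like record. The sketch below only reproduces that failure mode and shows one defensive way to handle both shapes. It is not the actual MLSTdb.py code: `get_md5` here is a stand-in for the helper named in the traceback, and `digest` is a hypothetical name.

```python
import hashlib


def get_md5(value):
    # Stand-in for the helper named in the traceback; the real one may differ.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def digest(allele):
    # Accept either a record like {'value': 'ACGT'} or a bare sequence string.
    sequence = allele["value"] if isinstance(allele, dict) else allele
    return get_md5(sequence)


# Reproduce the reported error: indexing a str with a string key.
allele = "ACGT"
try:
    allele["value"]
except TypeError as err:
    message = str(err)
```

Whether the right fix is to accept both shapes, or to make sure the alleles are always parsed into records upstream, depends on code not shown in this report.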

Use a single tool with subcommands

It would be much nicer if all the components were "sub commands" of a single tool.

Having understandable names would be even better.

Thanks 👍

Phylo.py core genome output

With phylo.py, the core genome output doesn't match expectations, assuming I'm correct in thinking that the .matrix file is the right output.

With my alignment and a threshold of 0.95, 4,800,000/5,000,000 sites should be retained. Instead, 4,781,411 are output.
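
One way to cross-check a count like this independently of phylo.py is sketched below. It assumes that "core" means a site is called (not a gap or ambiguous base) in at least the threshold fraction of genomes. That definition is an assumption and may not be the rule phylo.py applies. `count_core_sites` and `MISSING` are hypothetical names.

```python
# Characters treated as "not called" (an assumption; adjust to the alignment).
MISSING = set("-Nn?")


def count_core_sites(sequences, threshold=0.95):
    """Count alignment columns called in at least `threshold` of the sequences."""
    if not sequences:
        return 0
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    needed = threshold * len(sequences)
    core = 0
    # zip(*sequences) walks the alignment column by column.
    for column in zip(*sequences):
        present = sum(1 for base in column if base not in MISSING)
        if present >= needed:
            core += 1
    return core


example = ["ACGT", "AC-T", "A--T", "ANGT"]
core_at_75 = count_core_sites(example, threshold=0.75)
```

If the number from a check like this differs from the .matrix output, the difference may come from how the threshold is rounded or from which characters count as missing.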

Doesn't handle Help when no module is given

python EToKi/EToKi.py -h

Traceback (most recent call last):
  File "EToKi/EToKi.py", line 55, in <module>
    etoki()
  File "EToKi/EToKi.py", line 31, in etoki
    exec('from modules.{0} import {0}'.format(sys.argv[1]))
  File "<string>", line 1
    from modules.-h import -h

requests

The requests package is required by EToKi, but it is not mentioned in the README.

Need documentation on cgMLST

I am taking some notes on how I ran cgMLST, and I hope you can add documentation for it.

Create database: this took a very long time

# Downloaded the cgMLST scheme from the EnteroBase FTP into Salmonella.cgMLSTv2.enterobase (undocumented)
\ls -f1 Salmonella.cgMLSTv2.enterobase/*.fasta | \
  grep -v cgMLST_v2_ref.fasta `# ignore already-established reference file` | \
  xargs -n 1 seqtk seq -l 0 `# one file per call, since seqtk seq reads a single input; output is two-line fasta` | \
  perl -lane '
    # get the id (with its leading ">") and the seq on the next line, since this is two-line fasta
    # (no apostrophes in these comments: they would end the shell single-quoted string)
    $id=$F[0];
    $seq=<>;
    chomp($seq);
    # I do not think this will matter, but avoid any infinite loops by quitting if we see the same id twice
    # (%seen is deliberately not declared with "my" so that it persists across input lines)
    if($seen{$id}++){print STDERR "Already seen $id. Done."; last;}

    # Avoid deflines that might be problematic
    if($id =~ /[^_>0-9a-zA-Z]/){
      print STDERR "Skipping ".$id;
      next;
    }
    print "$id\n$seq";
  ' > enterobase.filtered.fasta

Comparing the names of programs to those described in Enterobase docs

Hi @zheminzhou ,

I am trying to recreate the SNP analysis that would be performed on EnteroBase, given a directory of assemblies.

Would it be possible to add a note to the README about what programs correspond to the steps listed in the EnteroBase docs?

From what I can tell, it seems like:

  • refMasker ~ RecHMM
  • refMapper ~ align
  • refMapper_matrix ~ RecFilter
  • matrix_phylogeny ~ phylo

... but if that's the case, I'm not sure where the SNP matrix required by RecHMM would come from; the docs describe refMasker as identifying recombination regions from a reference genome, while RecHMM identifies regions from a SNP matrix.

Any help would be appreciated!

Thanks,

~Nick

Usearch is proprietary

Hi, I am wondering if it will work if I download vsearch and rename it to usearch in my PATH. I want to package EToKi into a container but the usearch license is prohibitive.
