d-j-e / snppar

Parallel/Homoplasic SNP Finder

License: GNU General Public License v3.0

Python 100.00%
homoplasic-snps phylogenetics parallel-snps convergent-snps revertant-snps treetime asr ancestral-state-reconstruction

snppar's People

Contributors

aggreen, d-j-e, katholt, scwatts

snppar's Issues

Error and fix

I am doing an analysis from individual genes. My inputs are an "mFASTA" nucleotide alignment file, and a file with SNP locations. There is a bug at line 256:

File "snppar.py", line 256, in readMFASTA
all_calls[i]+=line[i].upper()
IndexError: list index out of range

The original line of code is:
for i in range(len(line.rstrip())):

This should actually read:
for i in range(len(all_calls)):

Explanation: The script is incorrectly iterating over all nucleotides in the sequence instead of all indices in the SNP location file that goes along with the mFASTA file.
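The fix described above can be sketched in isolation. This is a minimal illustration, not the actual readMFASTA code: the function name `read_snp_calls`, the parameter `n_snps`, and the assumption that each sequence sits on a single line are all hypothetical. The point is only that the loop index runs over the number of SNP positions, so it can never exceed the length of `all_calls`.

```python
def read_snp_calls(fasta_lines, n_snps):
    """Collect one string of base calls per SNP position.

    Hypothetical sketch: assumes one sequence per line and that the first
    n_snps columns of each sequence correspond to the SNP position file.
    """
    all_calls = ["" for _ in range(n_snps)]
    names = []
    for line in fasta_lines:
        line = line.rstrip()
        if not line:
            continue
        if line.startswith(">"):
            names.append(line[1:])
            continue
        if len(line) < n_snps:
            raise ValueError("sequence shorter than the number of SNP positions")
        # Iterate over SNP positions, not over every character in the line.
        for i in range(len(all_calls)):
            all_calls[i] += line[i].upper()
    return names, all_calls


names, calls = read_snp_calls([">a", "acgtt", ">b", "AGGTA"], 3)
print(names, calls)
```

With the original loop bound, the two extra columns in each sequence would index past the end of `all_calls`, which matches the IndexError in the report.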

SNPPar install issue

David,
First, thank you for developing this software. I have run into an error after installing: snppar.py imports Bio.Alphabet, and Biopython 1.78 reports that Bio.Alphabet is no longer part of Biopython. I do, however, find that "/Users/sbberes/opt/anaconda3/lib/python3.8/site-packages/Bio/Alphabet/__init__.py" is present. Below is the output that I get when trying to install and run snppar. Thanks for any assistance (p.s. a conda install would be nice):

(base) ➜ Desktop pip install git+https://github.com/d-j-e/SNPPar
Collecting git+https://github.com/d-j-e/SNPPar
Cloning https://github.com/d-j-e/SNPPar to /private/var/folders/4j/l7l40pq12tz8drwfw52y59n40000gn/T/pip-req-build-zix51bmy
Collecting biopython>=1.66
Downloading biopython-1.78-cp38-cp38-macosx_10_9_x86_64.whl (2.2 MB)
|████████████████████████████████| 2.2 MB 3.8 MB/s
Collecting ete3
Downloading ete3-3.1.2.tar.gz (4.7 MB)
|████████████████████████████████| 4.7 MB 3.9 MB/s
Collecting phylo-treetime
Downloading phylo_treetime-0.8.0-py3-none-any.whl (125 kB)
|████████████████████████████████| 125 kB 10.2 MB/s
Requirement already satisfied: numpy in /Users/sbberes/opt/anaconda3/lib/python3.8/site-packages (from biopython>=1.66->snppar==1.0) (1.19.2)
Requirement already satisfied: pandas>=0.17.1 in /Users/sbberes/opt/anaconda3/lib/python3.8/site-packages (from phylo-treetime->snppar==1.0) (1.1.3)
Requirement already satisfied: scipy>=0.16.1 in /Users/sbberes/opt/anaconda3/lib/python3.8/site-packages (from phylo-treetime->snppar==1.0) (1.5.2)
Requirement already satisfied: matplotlib>=2.0; python_version >= "3.6" in /Users/sbberes/opt/anaconda3/lib/python3.8/site-packages (from phylo-treetime->snppar==1.0) (3.3.2)
Requirement already satisfied: python-dateutil>=2.7.3 in /Users/sbberes/opt/anaconda3/lib/python3.8/site-packages (from pandas>=0.17.1->phylo-treetime->snppar==1.0) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /Users/sbberes/opt/anaconda3/lib/python3.8/site-packages (from pandas>=0.17.1->phylo-treetime->snppar==1.0) (2020.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /Users/sbberes/opt/anaconda3/lib/python3.8/site-packages (from matplotlib>=2.0; python_version >= "3.6"->phylo-treetime->snppar==1.0) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/sbberes/opt/anaconda3/lib/python3.8/site-packages (from matplotlib>=2.0; python_version >= "3.6"->phylo-treetime->snppar==1.0) (1.3.0)
Requirement already satisfied: certifi>=2020.06.20 in /Users/sbberes/opt/anaconda3/lib/python3.8/site-packages (from matplotlib>=2.0; python_version >= "3.6"->phylo-treetime->snppar==1.0) (2020.6.20)
Requirement already satisfied: pillow>=6.2.0 in /Users/sbberes/opt/anaconda3/lib/python3.8/site-packages (from matplotlib>=2.0; python_version >= "3.6"->phylo-treetime->snppar==1.0) (8.0.1)
Requirement already satisfied: cycler>=0.10 in /Users/sbberes/opt/anaconda3/lib/python3.8/site-packages (from matplotlib>=2.0; python_version >= "3.6"->phylo-treetime->snppar==1.0) (0.10.0)
Requirement already satisfied: six>=1.5 in /Users/sbberes/opt/anaconda3/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas>=0.17.1->phylo-treetime->snppar==1.0) (1.15.0)
Building wheels for collected packages: snppar, ete3
Building wheel for snppar (setup.py) ... done
Created wheel for snppar: filename=snppar-1.0-py3-none-any.whl size=47747 sha256=f8685d951ab017365c66d310892b71de7d163658fc0f6f89e6ca86a81270e973
Stored in directory: /private/var/folders/4j/l7l40pq12tz8drwfw52y59n40000gn/T/pip-ephem-wheel-cache-f3zpirmt/wheels/ca/85/92/7592af7eefda06c8a1d37239b3ad1304b4391810c7979e00ba
Building wheel for ete3 (setup.py) ... done
Created wheel for ete3: filename=ete3-3.1.2-py3-none-any.whl size=2272998 sha256=544565f8bb5e9b74fb80bce2cff88843d48622bfbd2bf115ea9737d0ea391a9e
Stored in directory: /Users/sbberes/Library/Caches/pip/wheels/78/96/a0/973292c4813e6b39b611bec535521655088425516959768f46
Successfully built snppar ete3
Installing collected packages: biopython, ete3, phylo-treetime, snppar
ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

phylo-treetime 0.8.0 requires biopython<=1.76,>=1.66, but you'll have biopython 1.78 which is incompatible.
Successfully installed biopython-1.78 ete3-3.1.2 phylo-treetime-0.8.0 snppar-1.0
(base) ➜ Desktop which snppar
/Users/sbberes/opt/anaconda3/bin/snppar
(base) ➜ Desktop snppar
Traceback (most recent call last):
  File "/Users/sbberes/opt/anaconda3/bin/snppar", line 5, in <module>
    from snppar import main
  File "/Users/sbberes/opt/anaconda3/bin/snppar.py", line 38, in <module>
    from Bio.Alphabet import IUPAC, generic_dna
  File "/Users/sbberes/opt/anaconda3/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
    raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
(base) ➜ Desktop snppar -h
Traceback (most recent call last):
  File "/Users/sbberes/opt/anaconda3/bin/snppar", line 5, in <module>
    from snppar import main
  File "/Users/sbberes/opt/anaconda3/bin/snppar.py", line 38, in <module>
    from Bio.Alphabet import IUPAC, generic_dna
  File "/Users/sbberes/opt/anaconda3/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
    raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
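The pip output above already names the conflict: phylo-treetime 0.8.0 asks for biopython between 1.66 and 1.76, while 1.78 was installed. A small check like the following could confirm whether an environment falls inside that range before running snppar. It is only a sketch: the helper names are hypothetical, the range is copied from the pip message rather than from SNPPar's own documentation, and the version parsing is deliberately naive.

```python
from importlib import metadata  # available from Python 3.8

# Range copied from the pip message: biopython<=1.76,>=1.66
MIN_OK = (1, 66)
MAX_OK = (1, 76)


def parse_version(text):
    """Turn '1.78' into (1, 78). Naive: pre-release suffixes are not handled."""
    return tuple(int(part) for part in text.split(".")[:2])


def in_supported_range(version_text):
    """True if the version lies inside the range quoted by pip."""
    return MIN_OK <= parse_version(version_text) <= MAX_OK


def installed_biopython():
    """Return the installed Biopython version string, or None if absent."""
    try:
        return metadata.version("biopython")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_biopython()
    if found is None:
        print("Biopython is not installed")
    else:
        print(found, "in range" if in_supported_range(found) else "outside range")
```

If the version is outside the range, pinning Biopython to a release inside it is one possible workaround; whether that fully resolves the import error in this report is not confirmed in the thread.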

Recursion error with very large tree

I encountered an error with a large tree (~7800 isolates) when making a copy of the tree with 'pickle' (the default for ete3 'copy'). This is changed to the 'newick' version of 'copy' in v0.4.2dev; the change does not alter the outputs.
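The general failure mode can be illustrated without ete3. In the sketch below, `Node`, `build_chain`, and `iterative_copy` are hypothetical stand-ins, and `copy.deepcopy` stands in for any copy method that recurses once per tree level. A very unbalanced tree exceeds Python's recursion limit under such a method, while a copy that walks the tree with an explicit stack does not. In ete3 itself the switch described above corresponds, as far as I know, to passing `method="newick"` to `copy`.

```python
import copy


class Node:
    """Hypothetical minimal tree node; not the ete3 class."""

    def __init__(self, name):
        self.name = name
        self.children = []


def build_chain(depth):
    """Build a maximally unbalanced tree: every node has exactly one child."""
    root = Node("n0")
    node = root
    for i in range(1, depth):
        child = Node(f"n{i}")
        node.children.append(child)
        node = child
    return root


def recursive_copy_fails(root):
    """Return True if a recursive copy (copy.deepcopy) raises RecursionError."""
    try:
        copy.deepcopy(root)
    except RecursionError:
        return True
    return False


def iterative_copy(root):
    """Copy the tree with an explicit stack instead of recursion."""
    new_root = Node(root.name)
    stack = [(root, new_root)]
    while stack:
        old, new = stack.pop()
        for child in old.children:
            new_child = Node(child.name)
            new.children.append(new_child)
            stack.append((child, new_child))
    return new_root


def chain_length(root):
    """Count nodes along the first-child path from the root."""
    length = 1
    node = root
    while node.children:
        node = node.children[0]
        length += 1
    return length
```

Raising the recursion limit with `sys.setrecursionlimit` is another common workaround, but it only moves the threshold and can crash the interpreter if set too high.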

please tag a release

Hello

I was asked to install SNPPar on our cluster.
Our policy is to install tagged release versions.
Could you provide one?

Regards

Eric

Fail during calculation of mutation events

Same error as in #10:

# These both fail, version is v1.0:
snppar -t timetree.nwk -m core.aln -l 4snppar.positions.txt -g target.gbk
snppar -s 4snppar.csv -t timetree.nwk -g target.gbk

Symmetrized rates from j->i (W_ij):
	A	C	G	T	-
  A	0	0.9229	0.7454	0.418	62.7575
  C	0.9229	0	0.5651	0.5147	46.3287
  G	0.7454	0.5651	0	0.932	48.9164
  T	0.418	0.5147	0.932	0	54.4489
  -	62.7575	46.3287	48.9164	54.4489	0

Actual rates from j->i (Q_ij):
	A	C	G	T	-
  A	0	0.1267	0.1024	0.0574	8.6174
  C	0.2632	0	0.1612	0.1468	13.2122
  G	0.2639	0.2	0	0.3299	17.317
  T	0.0893	0.1099	0.1991	0	11.6294
  -	0.6218	0.459	0.4847	0.5395	0


--- alignment including ancestral nodes saved as
	 treetime_out/ancestral_sequences.fasta

--- tree saved in nexus format as
	 treetime_out/annotated_tree.nexus



Extracting mutation events from ASR results...
Traceback (most recent call last):
  File ".../bin/snppar", line 8, in <module>
    sys.exit(main())
  File ".../bin/snppar.py", line 2648, in main
    snps_mapped, mapped_node_sequences, node_names_mapped = mapSNPsTT(snps_to_map,snptable,strains,arguments.tree,directory,tree,prefix,log)
  File ".../bin/snppar.py", line 1326, in mapSNPsTT
    snps_mapped = readMappedSNPs(output_dir+'annotated_tree.nexus',tree,snps_to_map,snptable)
  File ".../bin/snppar.py", line 2524, in readMappedSNPs
    derived_node = the_tree.search_nodes(name=parts[0])[0]
IndexError: list index out of range

As input I use a tree and SNP alignment where the reference sequence, to which target.gbk corresponds, has been removed from the alignment and tree. Is that a problem?
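Whether removing the reference causes this error is not established in this thread. The traceback does show that `search_nodes` returned an empty list, meaning a node name from the annotated tree was not found in the tree SNPPar holds. One way to rule out a simple naming mismatch is to compare the sequence names in the alignment with the tip names of the tree before running. The sketch below is a hypothetical helper, not part of SNPPar; it assumes the tip names have already been extracted into a list by whatever tree library is at hand.

```python
def fasta_names(lines):
    """Return the first word of every FASTA header line."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def compare_names(alignment_names, tree_tip_names):
    """Return (names only in the alignment, names only in the tree)."""
    in_alignment = set(alignment_names)
    in_tree = set(tree_tip_names)
    return sorted(in_alignment - in_tree), sorted(in_tree - in_alignment)


alignment = [">iso1 sample A", "ACGT", ">iso2", "ACGA", ">ref", "ACGT"]
tips = ["iso1", "iso2", "iso3"]
only_alignment, only_tree = compare_names(fasta_names(alignment), tips)
print("only in alignment:", only_alignment)
print("only in tree:", only_tree)
```

Two empty lists mean the names agree, in which case the cause lies elsewhere.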

log string message vs log function :: name conflict

Hello

After installing SNPPar-0.4dev
and running a functional test, e.g.

snppar.py -s ../../datas/SNPPar/MTB_Global_L2_alleles.csv -t ../../datas/SNPPar/MTB_Global_L2.tre -g ../../datas/SNPPar/NC_00962_3_1.gbk -d /local/gensoft2/tests/SNPPar/0.4dev -f

I got this error:

Traceback (most recent call last):
  File "/local/gensoft2/exe/SNPPar/0.4dev/bin/snppar.py", line 2663, in <module>
    main()
  File "/local/gensoft2/exe/SNPPar/0.4dev/bin/snppar.py", line 2644, in main
    snps_mapped, tree_with_nodes, mapped_node_sequences, node_names_mapped = mapSNPs(arguments.fastml_execute, snps_to_map, snptable, strains, arguments.tree, directory,log)
  File "/local/gensoft2/exe/SNPPar/0.4dev/bin/snppar.py", line 1296, in mapSNPs
    executeCommand(fastml_command, log)
  File "/local/gensoft2/exe/SNPPar/0.4dev/bin/snppar.py", line 89, in executeCommand
    log(log, message, "CRITICAL")
TypeError: 'str' object is not callable

It seems this is due to a name conflict between log (the function) and log (the str variable passed to executeCommand).

The fix is quite easy:

--- snppar.py.ori	2020-04-12 18:00:41.860354451 +0000
+++ snppar.py	2020-04-12 17:52:09.550799346 +0000
@@ -86,9 +86,9 @@
 		message = 'Failed to run command: ' + command
 		logPrint(log, message, "CRITICAL")
 		message = 'stdout: ' + result.stdout
-		log(log, message, "CRITICAL")
+		logPrint(log, message, "CRITICAL")
 		message = 'stderr: ' + result.stderr
-		log(log, message, "CRITICAL")
+		logPrint(log, message, "CRITICAL")
 		sys.exit(1)
 	else:
 		message = 'stdout: ' + result.stdout

The log function may then be removed.
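The shadowing behind this TypeError can be reproduced in a few lines. The functions below are hypothetical stand-ins, not the SNPPar code: inside a function whose parameter is named `log`, that name refers to the string argument rather than to the module-level function `log`, so calling it fails.

```python
def log_print(log_file, message, level):
    """Stand-in for logPrint: format a message instead of writing to a file."""
    return f"[{level}] {log_file}: {message}"


def log(log_file, message, level):
    """A module-level function that happens to share a name with a parameter."""
    return log_print(log_file, message, level)


def execute_buggy(log, message):
    # Here 'log' is the string parameter, which hides the function above,
    # so this call raises TypeError: 'str' object is not callable.
    return log(log, message, "CRITICAL")


def execute_fixed(log, message):
    # Calling a function whose name is not shadowed avoids the conflict.
    return log_print(log, message, "CRITICAL")


try:
    execute_buggy("run.log", "stdout: ...")
except TypeError as error:
    print("buggy version failed:", error)

print(execute_fixed("run.log", "stdout: ..."))
```

Renaming either the parameter or the function would also work; the patch above takes the second route by calling logPrint.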

the standard output gives an error

Hello,
When I run the program, the standard output gives an error:

--- alignment including ancestral nodes saved as
treetime_out/ancestral_sequences.fasta

--- tree saved in nexus format as
treetime_out/annotated_tree.nexus

Extracting mutation events from ASR results...
Traceback (most recent call last):
  File "/mnt/data/home/lxd/miniconda3/bin/snppar", line 8, in <module>
    sys.exit(main())
  File "/mnt/data/home/lxd/miniconda3/bin/snppar.py", line 2646, in main
    snps_mapped, mapped_node_sequences, node_names_mapped = mapSNPsTT(snps_to_map,snptable,strains,arguments.tree,directory,tree,prefix,log)
  File "/mnt/data/home/lxd/miniconda3/bin/snppar.py", line 1324, in mapSNPsTT
    snps_mapped = readMappedSNPs(output_dir+'annotated_tree.nexus',tree,snps_to_map,snptable)
  File "/mnt/data/home/lxd/miniconda3/bin/snppar.py", line 2522, in readMappedSNPs
    derived_node = the_tree.search_nodes(name=parts[0])[0]
IndexError: list index out of range

Command Input:
snppar -m snps.fa -l ex_snp_pos.txt -t snps_filter_midpoint.nwk -g genome.gbk

Thanks a lot, hope to get your suggestions!

Reporting mutation events in the wrong nodes

I noticed an issue with reporting of mutation events after running snppar on my dataset, and was able to replicate it with a toy dataset of 5 SNPs. It seems like the correct number of mutation events are detected, but they are frequently not reported in the correct nodes. Strangely, the mutation event is usually incorrectly called on a node 2-3 nodes away from the correct node, but in one whose children do not have the mutation. This happens with about 1/3 of the mutation events in my toy dataset, so it's not a consistent indexing error. When I check the ancestral sequence reconstructions, it looks like the problem is there and not with calling of the mutation events (i.e. the reconstructed sequence for a node will contain a mutation, but none of the descendants of that node have the mutation, while 2-3 nodes away a group of isolates will have a mutation but it's not present in the sequence reconstruction of that ancestral node).
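The check described above can be written down for a toy tree. This is a sketch under stated assumptions, not SNPPar's method: the tree is a plain child-to-parent dictionary, each node carries a single base for one SNP, and all names are hypothetical. It lists mutation events as parent/child state changes and flags internal nodes whose reconstructed state is not seen in any descendant tip, which is the symptom reported here.

```python
def mutation_events(parent_of, state_of):
    """List (parent, child, parent_state, child_state) where the state changes."""
    events = []
    for child, parent in parent_of.items():
        if state_of[child] != state_of[parent]:
            events.append((parent, child, state_of[parent], state_of[child]))
    return sorted(events)


def tips_below(node, parent_of):
    """Return the tips descending from a node, using an explicit stack."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    stack, tips = [node], []
    while stack:
        current = stack.pop()
        kids = children.get(current, [])
        if kids:
            stack.extend(kids)
        else:
            tips.append(current)
    return sorted(tips)


def unsupported_internal_states(parent_of, state_of):
    """Flag internal nodes whose state appears in none of their tips."""
    flagged = []
    for node in sorted(set(parent_of.values())):
        tip_states = {state_of[tip] for tip in tips_below(node, parent_of)}
        if state_of[node] not in tip_states:
            flagged.append(node)
    return flagged


# Toy example: N2 is reconstructed as G although both of its tips carry A.
parent_of = {"N1": "root", "N2": "root", "A": "N1", "B": "N1", "C": "N2", "D": "N2"}
state_of = {"root": "A", "N1": "A", "N2": "G", "A": "A", "B": "A", "C": "A", "D": "A"}
print(mutation_events(parent_of, state_of))
print(unsupported_internal_states(parent_of, state_of))
```

A flagged node is not proof of an error, since a reconstruction can legitimately infer a state that was later lost in every descendant, but many such nodes on a small dataset would support the suspicion that the reconstructed sequences are attached to the wrong nodes.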

Genbank file not being recognised?

Hi, apologies for this issue as it is likely it is something I am doing wrong! I am having a problem with the genbank file and the software not liking it! Here is my command:
snppar -s core_snps_for_SNPpar.csv -t FINAL_snippy_consensus_mapped_aln.final_tree.tre -g test_Se4047_genbank_format.gbk

The error I get is as follows:
Reading SNP table from core_snps_for_SNPpar.csv

Finished reading 1315 SNPs in total
...keeping 1315 variable SNPs and ignoring 0 SNPs
that are non-variable among the 597 isolates

Reading Genbank file from test_Se4047_genbank_format.gbk
Traceback (most recent call last):
  File "/home/hjw58/.local/bin/snppar", line 8, in <module>
    sys.exit(main())
  File "/home/hjw58/.local/bin/snppar.py", line 2625, in main
    record, sequence, geneannot = readGenbank(arguments.genbank, log)
  File "/home/hjw58/.local/bin/snppar.py", line 1340, in readGenbank
    record = SeqIO.read(handle, "genbank")
  File "/home/hjw58/.local/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 748, in read
    raise ValueError("No records found in handle")
ValueError: No records found in handle
Changed directory to /rds/user/hjw58/hpc-work/treemer.

Here is a sample of the Genbank file (I thought it looked ok?)
CDS 1..1353
/ID=Streptococcus_equi_subsp_equi_4047_v1_00001
/locus_tag="Streptococcus_equi_subsp_equi_4047_v1_00001"
/inference="ab initio prediction:Prodigal:2.60,similar to
AA sequence:RefSeq:YP_008628649.1,similar to AA
sequence:UniProtKB:P05648,protein
motif:CLUSTERS:PRK00149,protein motif:Cdd:COG1484,protein
motif:TIGRFAMs:TIGR00362,protein motif:Pfam:PF00308.12"
/product="chromosomal replication initiation
protein,Chromosomal replication initiator protein
DnaA,chromosomal replication initiation protein,DNA
replication protein,chromosomal replication initiator
protein DnaA,Bacterial dnaA protein"
/protein_id="gnl|SC|Streptococcus_equi_subsp_equi_4047_v1_00001"
/gene="dnaA"
/codon_start=1

I am wondering if it is not quite in the right format? Maybe I have set it up wrong. Any help would be much appreciated!
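A GenBank flat-file record begins with a LOCUS line, and a parser that never finds one reports no records. The sample above shows only a feature entry, so it is not possible to tell from this thread whether the file has that header. A quick check like the following could help; it is a sketch with hypothetical helper names, and the EMBL guess (a first line starting with "ID") is only a hint, not a full format test.

```python
def first_content_line(text):
    """Return the first non-blank line of a file's contents, or ''."""
    for line in text.splitlines():
        if line.strip():
            return line
    return ""


def guess_flatfile_format(text):
    """Guess the flat-file format from the first non-blank line."""
    first = first_content_line(text)
    if first.startswith("LOCUS"):
        return "genbank"
    if first.startswith("ID "):
        return "embl"
    return "unknown"


print(guess_flatfile_format("LOCUS       example 1353 bp DNA\nFEATURES\n"))
print(guess_flatfile_format("CDS 1..1353\n/gene=\"dnaA\"\n"))
```

If the result is not "genbank", converting or re-exporting the annotation as a complete GenBank record (header, features, and sequence) would be the first thing to try.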

SNPPar Installation

I used the command "pip install git+https://github.com/d-j-e/SNPPar" to install SNPPar, but it stopped because Biopython requires Python 3.6 or later and Python 2.7 was detected. When I used the command "pip3 install git+https://github.com/d-j-e/SNPPa" instead, the installation completed successfully, but the snppar software did not work. My system is Ubuntu 18.04 LTS, with Python 3.6.9 and Biopython 1.78. How can I make it work?

Second recursion error

There is a potential recursion issue with large trees (>7500), independent of the issue mentioned in #13.

Unfortunately this recursion error occurred during the execution of TreeTime, not SNPPar itself, so the issue is beyond my immediate control. I will, however, report the issue as the author is aware of the problem.
