sanger-pathogens / snp-sites
Finds SNP sites from a multi-FASTA alignment file
Home Page: http://sanger-pathogens.github.io/snp-sites/
License: Other
Dear teacher,
I find snp-sites very good, but I have a question. The input file I am using is coregene.aln, generated by Roary. Does snp-sites use a reference genome? Without a reference genome, how does it find SNPs?
Thanks in advance
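A column of an alignment can be flagged as a SNP site purely because the aligned sequences disagree there, so the sites themselves can be found without an external reference genome. The sketch below illustrates only that idea. It is not snp-sites' implementation: the function name `variable_columns` and the toy alignment are hypothetical, and treating `-` and `N` as missing data is an assumption.

```python
def variable_columns(seqs):
    """Return 0-based indices of columns where at least two different bases occur."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    sites = []
    for i in range(length):
        # Assumption: gaps and N are treated as missing data, not as alleles.
        bases = {s[i].upper() for s in seqs} - {"-", "N"}
        if len(bases) > 1:
            sites.append(i)
    return sites


# Hypothetical three-sequence alignment; only the last column varies.
toy_alignment = ["ACGT", "ACGA", "AC-T"]
print(variable_columns(toy_alignment))  # [3]
```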
I think the code opens the alignment file twice: once to load the first sequence (the 'reference'), then it re-opens the file, skips over the first sequence, and reads the rest.
Is there a way it could open the file only once, to allow piping from stdin to stdout, so it can be used as a pipe filter?
Hi,
I have generated an alignment between 2 genomes using progressiveMauve (default parameters) and I'm now trying to extract SNPs using snp-sites.
My issue is that I get the error message: "Alignment my_ali contains sequences of unequal length. Expected length is 42875 but got 42876 in sequence"
However, both sequences have a length of 42876, and both contain 1 indel ('-').
Any idea how to fix that?
thanks
Romain
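One way to narrow this kind of error down is to list the length of every record as a strict FASTA reader would see it, since any extra character on a sequence line counts towards the length. A minimal sketch, assuming the input is plain FASTA text; `sequence_lengths` and the sample records are hypothetical and not part of snp-sites.

```python
import io


def sequence_lengths(handle):
    """Yield (name, length) for each record in a FASTA file-like object."""
    name, length = None, 0
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:].strip(), 0
        elif name is not None:
            # Every non-whitespace character on a sequence line is counted.
            length += len(line.strip())
    if name is not None:
        yield name, length


# Hypothetical input: the second record is one character longer.
example = io.StringIO(">seq1\nACGT\nAC\n>seq2\nACG-TAC\n")
for record_name, record_length in sequence_lengths(example):
    print(record_name, record_length)
```

With a real file, the same function can be called on `open("my_alignment.fasta")` to see which record differs from the others.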
I am using snp-sites version 2.4.1 as part of the snippy pipeline, to detect SNPs in a full alignment. The error message from the pipeline is:
snp-sites: symbol lookup error: snp-sites: undefined symbol: generate_snp_sites_with_ref_pure_mono
ERROR: Could not run: snp-sites -c -o 2_strains.aln 2_strains.full.aln
I then tried running snp-sites directly with only two strains and got a segmentation fault:
snp-sites 2_strains.full.aln
Any ideas?
How can someone specify that a given sequence is the reference and comparisons should be relative to the given sequence? Does the tool assume that the first sequence in the alignment is the reference sequence?
Hi,
thanks for this very fast and elegant tool. I was wondering whether there is any option to also consider gaps. I am using this alignment, which has a region with gaps in a single sequence. I ran the following command:
snp-sites -o out.fasta aln.trimmed.fasta
head out.fasta
>genome|b0463
GCGCAA
>NT12002_188|b0463
GCGCAA
>NT12003_214|b0463
GCGCAA
>NT12004_22|b0463
GCGCAA
>NT12005_17|b0463
GTGCAA
The output is only including the snps and not the gaps. I imagine this is due to the fact that the region with gaps doesn't have any SNPs?
Is this an expected behaviour?
Best,
Marco
Hello, I am able to run snp-sites with one of my fasta files, but I get a segmentation fault when I try to run it with another one.
The output I get from gdb is
Core was generated by `snp-sites -mvp -o 8snps monoref_multi_8.fasta'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f6e8aa61827 in generate_snp_sites ()
from /home/slh/.linuxbrew/cellar/snp-sites/2.2.0/lib/libsnp-sites.so.1
Any idea as to how to solve this?
Thanks!
The test suite for 2.2.2 fails on various platforms, e.g. i386, with several failing tests.
Here's the message:
FAIL: run-all-tests
=========================================
snp-sites 2.2.2: src/test-suite.log
=========================================
# TOTAL: 1
# PASS: 0
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
.. contents:: :depth: 2
FAIL: run-all-tests
===================
Running suite(s): Creating_SNP_Sites
Alignment ../tests/data/uneven_alignment.aln contains sequences of unequal length. Expected length is 8 but got 9 in sequence Uneven_number_of_bases
76%: Checks: 21, Failures: 5, Errors: 0
../tests/check-snp-sites.c:40:F:snp_sites:valid_alignment_with_one_line_per_sequence:0: Invalid VCF file for 1 line per seq
../tests/check-snp-sites.c:79:F:snp_sites:valid_alignment_with_multiple_lines_per_sequence:0: Invalid VCF file for multiple lines per seq
../tests/check-snp-sites.c:67:F:snp_sites:valid_alignment_with_one_line_per_sequence_gzipped:0: Invalid VCF file for 1 line per seq
../tests/check-snp-sites.c:53:F:snp_sites:valid_alignment_with_n_as_gap:0: Invalid VCF file for 1 line per seq
../tests/check-snp-sites.c:132:F:snp_sites:valid_with_all_outputted_with_custom_name:0: Custom name needs extra extension for VCF
Running suite(s): Creating_VCF_file
100%: Checks: 3, Failures: 0, Errors: 0
FAIL run-all-tests (exit status: 1)
============================================================================
Testsuite summary for snp-sites 2.2.2
============================================================================
# TOTAL: 1
# PASS: 0
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
============================================================================
See src/test-suite.log
============================================================================
A full log is at https://buildd.debian.org/status/fetch.php?pkg=snp-sites&arch=i386&ver=2.2.2-1&stamp=1458222828 for example.
I can reproduce this on a Jessie i386 Vagrant box. A quick bisect flagged commit b2efeb4288d4201480408b7dfe3e314c243e53c6
as the first bad one, but I'm not familiar enough with what is being tested to say more.
Hi there,
Is there any planned (or implemented) way to analyze indels via snp-sites? Basically, I want to annotate all variants (not just SNPs) in my multiple-sequence alignment and project those variants onto the coordinate system of the original reference (which also has indels).
Thanks,
John
Hello,
I have run snp-sites and got the three output files.
But I want to get the SNP mutations from the VCF file.
Is there a way to do this with snp-sites?
Thank you.
Hello,
Sorry for this (I guess) basic question, but I did not find the answer in the README.md file nor in the paper (Page et al. 2016).
I am trying to convert FASTA alignments into a SNP-only VCF for downstream analyses. Some alignments are for nuclear markers, and I work on a polyploid organism, so I sometimes have more than 2 haplotypes for a given individual, but all are properly phased.
My FASTA input is formatted as follows:
Individual1_a
Allele-a-sequence
Individual1_b
Allele-b-sequence
Individual2_a
Allele-a-sequence
Individual2_b
Allele-b-sequence
Individual2_c
Allele-c-sequence
...
I used a basic command:
snp-sites -v -o out.vcf in.fas
And I did get a .vcf file. But in this file, each allele seems to be coded as a separate homozygous individual: I see no 0/0/1 or even 0/1 in the output, as I expected, but only 0, 1 and 2 (like haploid calls).
How could I get an output in which phasing information and heterozygosity are taken into account? Is there an option in snp-sites that I missed? Or do I have to adapt my input, and how? (For example, losing the phasing information by merging the alleles, keeping only 1 sequence per individual but with ambiguity codes? Is that mandatory?)
Thank you for any answer.
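For illustration, haploid calls like the ones described above could be regrouped into one phased genotype per individual in a post-processing step, since VCF joins phased alleles with `|`. This is a sketch under assumptions, not a snp-sites feature: `merge_haplotypes` is a hypothetical helper, and it assumes the haplotypes of one individual share the name before the last `_`, as in the input layout shown above.

```python
def merge_haplotypes(sample_names, calls, sep="_"):
    """Combine per-haplotype allele codes into one phased genotype per individual."""
    merged = {}
    for name, call in zip(sample_names, calls):
        # Assumption: "Individual2_c" belongs to individual "Individual2".
        individual = name.rsplit(sep, 1)[0]
        merged.setdefault(individual, []).append(call)
    return {ind: "|".join(alleles) for ind, alleles in merged.items()}


# Hypothetical genotype columns from one VCF record.
names = ["Individual1_a", "Individual1_b",
         "Individual2_a", "Individual2_b", "Individual2_c"]
haploid_calls = ["0", "1", "0", "0", "2"]
print(merge_haplotypes(names, haploid_calls))
# {'Individual1': '0|1', 'Individual2': '0|0|2'}
```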
For pipeline auditing (--version, or -v or -V if you can only use short options)
% snp-sites --version
snp-sites 2.0.2
We are impressed by the speed of this tool (due to it being C code).
A very useful feature we need is the ability to also filter out columns containing things like:
-
N
*
X
etc.
These would need to be independent options.
Ideally the current default behaviour of removing conserved (monomorphic) sites could also be an option, e.g. so we could remove only the columns containing a gap and leave the rest.
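The requested behaviour can be sketched as a column filter with two independent switches. This only illustrates the feature request and does not describe existing snp-sites options; `filter_columns` and its parameters `drop_chars` and `drop_constant` are hypothetical names.

```python
def filter_columns(seqs, drop_chars="", drop_constant=False):
    """Return the alignment with unwanted columns removed.

    drop_chars: a column is dropped if any sequence has one of these characters.
    drop_constant: a column is dropped if every sequence has the same character.
    """
    if not seqs:
        return []
    keep = []
    for i in range(len(seqs[0])):
        column = [s[i].upper() for s in seqs]
        if any(c in drop_chars for c in column):
            continue
        if drop_constant and len(set(column)) == 1:
            continue
        keep.append(i)
    return ["".join(s[i] for i in keep) for s in seqs]


# Hypothetical two-sequence alignment.
aln = ["A-CN", "A-GT"]
print(filter_columns(aln, drop_chars="-"))                       # ['ACN', 'AGT']
print(filter_columns(aln, drop_chars="-N", drop_constant=True))  # ['C', 'G']
```

Keeping the two switches separate means gap-only filtering and monomorphic-site removal can be combined or used alone.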
#
# Install Roary
#
RUN apt-get install snp-sites
:-P
The current linuxbrew recipe only installs version 2.2.0.
Hi,
I am running into a segmentation fault when calling the variants on a fasta with genome size > approx 9Mb (8Mb of the same fasta works fine).
Any thoughts as to why this is the case? I couldn't see a hard maximum genome length in the code.
edit: generating some test fastas with the included perl script reveals that it is an issue with the number of variants in the alignment, which makes sense.
Cheers
Chris
I am working on a Klebsiella pneumoniae phylogenetic analysis based on the SNP sites of the core genome (snp-sites tool).
I would appreciate it very much if you could help me locate the output files that show the following information:
- the distribution and location of SNPs for each isolate along the core genome
- the ratio of nonsynonymous-to-synonymous SNPs
Thank you very much for your help
Hello, I am new to the bioinformatics field. According to the snp-sites tool, the output VCF file can be visualized in Artemis?
Is any processing necessary for that? I have a VCF file and I am trying to load it in Artemis, but nothing happens.
I have compressed and indexed it using bgzip and tabix. Artemis fails to open the vcf.gz.tbi file because it does not recognize the binary file format.
Could you please suggest how I can do this?
Is there a way with snp-sites to get data on to the info column on a VCF output?
Hello,
In order to validate snp-sites in our lab, we compared its results with those of msa2vcf.jar.
We took two closely related sequences from NCBI (the reference NC_045512.2, and MT007544.1):
$ java -jar dist/msa2vcf.jar --consensus 'NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome' --output ../sars_cov2.2.vcf ../sars_cov2.aln
[INFO][MsaToVcf]format : Fasta
$ cat ../sars_cov2.2.vcf
##fileformat=VCFv4.2
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##contig=<ID=chrUn,length=29903>
##msa2vcf.meta=compilation:20200728120720 githash:af51aa3 htsjdk:2.22.0 date:20200728122012 cmd:--consensus NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome --output ../sars_cov2.2.vcf ../sars_cov2.aln
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT MT007544.1 Severe acute respiratory syndrome coronavirus 2 isolate Australia/VIC01/2020, complete genome NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
chrUn 19065 . T C . . DP=2 GT:DP 1/1:1 0/0:1
chrUn 22303 . T G . . DP=2 GT:DP 1/1:1 0/0:1
chrUn 26144 . G T . . DP=2 GT:DP 1/1:1 0/0:1
chrUn 29749 . ACGATCGAGTG A . . DP=2 GT:DP 1/1:1 0/0:1
$ snp-sites -c -v -o sars_cov2.vcf sars_cov2.aln
$ cat sars_cov2.vcf
##fileformat=VCFv4.1
##contig=<ID=1,length=29903>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NC_045512.2 MT007544.1
1 19065 . T C . . . GT 0 1
1 22303 . T G . . . GT 0 1
1 26144 . G T . . . GT 0 1
snp-sites does not report the deletion.
% make check
<snip>
/usr/bin/ld: cannot find -lsubunit
collect2: error: ld returned 1 exit status
make[2]: *** [run-all-tests] Error 1
Mauve formatted alignment input throws error about sequences of unequal length (by 1bp). Using conda installed version 2.4.1 with OSX. Is this a known issue?
src/Makefile.am:26: warning: source file '../tests/check-snp-sites.c' is in a subdirectory,
src/Makefile.am:26: but option 'subdir-objects' is disabled
automake: warning: possible forward-incompatibility.
automake: At least a source file is in a subdirectory, but the 'subdir-objects'
automake: automake option hasn't been enabled. For now, the corresponding output
automake: object file(s) will be placed in the top-level directory. However,
automake: this behaviour will change in future Automake versions: they will
automake: unconditionally cause object files to be placed in the same subdirectory
automake: of the corresponding sources.
automake: You are advised to start using 'subdir-objects' option throughout your
automake: project, to avoid future incompatibilities.
src/Makefile.am:26: warning: source file '../tests/check-vcf.c' is in a subdirectory,
src/Makefile.am:26: but option 'subdir-objects' is disabled
src/Makefile.am:26: warning: source file '../tests/helper-methods.c' is in a subdirectory,
src/Makefile.am:26: but option 'subdir-objects' is disabled
src/Makefile.am:26: warning: source file '../tests/run-all-tests.c' is in a subdirectory,
src/Makefile.am:26: but option 'subdir-objects' is disabled
Hi!
I just tried to install snp-sites using Bioconda and got the following message:
UnavailableInvalidChannel: The channel is not accessible or is invalid.
channel name: snp-sites
channel url: https://conda.anaconda.org/snp-sites
error code: 404
You will need to adjust your conda configuration to proceed.
Use conda config --show channels to view your configuration's current state,
and use conda config --show-sources to view config file locations.
I copied the URL into my browser but it's not available there.
Best,
Martinique
In general, phylogenetic programs use invariant sites for likelihood calculations. However, a number of programs, such as RAxML and BEAST, can perform ascertainment bias corrections given the number of invariant sites and the frequencies of nucleotides in the alignment. If SNP-sites output these values, they could be used as direct inputs for RAxML, for example.
Maybe this is a dumb question, but I'm having a hard time installing this on my school's linux box. They are running redhat, and I am not sure if it is exactly compatible. The configure command for starters doesn't seem to work. I was looking into converting the package but it looks like it requires another program that requires root access, which I do not have.
Thanks for the new -C function for counting constant sites...super useful! Can I recommend explicitly stating that the output of -C is in A,C,G,T order in the help message? Some users might not know that alphabetical order is convention.
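To make the ordering concrete, here is a small sketch that counts constant columns per base and prints them in A,C,G,T order as a comma-separated string. It illustrates the convention only: `constant_site_counts` is a hypothetical function, and the exact rule snp-sites applies for -C (for example, how columns with gaps or N are handled) is not stated in this thread, so the rule used here is an assumption.

```python
def constant_site_counts(seqs):
    """Count columns where every sequence has the same base; report in A,C,G,T order."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for column in zip(*seqs):
        bases = {b.upper() for b in column}
        # Assumption: a column counts only if all sequences agree on A, C, G or T.
        if len(bases) == 1:
            base = bases.pop()
            if base in counts:
                counts[base] += 1
    return ",".join(str(counts[b]) for b in "ACGT")


# Hypothetical alignment: two constant A columns, one C, one G, no constant T.
print(constant_site_counts(["AACGT", "AACGA", "AACG-"]))  # 2,1,1,0
```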
Hi,
I noticed a similar issue to a previously closed one (#80 ), which I'm experiencing with the most recent version (2.5.1) of snp-sites.
It appears that sequences longer than 2,147,483,647 bases give the error "Warning: No SNPs were detected so there is nothing to output." 2,147,483,647 is the maximum value for a 32 bit signed integer.
I've spent a bit of time looking into this and here's what I've done to prove this.
I took two sequences from an alignment, one of which was the outgroup, so as to maximise the number of snps.
Each sequence was 2,423,158,460 bases in length:
$ cat sample1.fasta Outgroup.fasta > test.fasta
$ snp-sites -V
snp-sites 2.5.1
$ snp-sites -c -o test_snps.fasta test.fasta
Warning: No SNPs were detected so there is nothing to output.
I then cut the length of the sequences down to 2,147,483,648 bases, one base longer than the 32-bit signed integer maximum value:
$ cut -c 1-2147483648 test.fasta > test1.fasta
$ snp-sites -c -o test1_snps.fasta test1.fasta
Warning: No SNPs were detected so there is nothing to output.
I then cut the length of the sequences down to 2,147,483,647 bases, the 32-bit signed integer maximum value:
$ cut -c 1-21474836487 test.fasta > test2.fasta
$ snp-sites -c -o test2_snps.fasta test2.fasta
/opt/slurm/data/slurmd/job28028674/slurm_script: line 13: 38321 Segmentation fault snp-sites -c -o test1_snps.fasta test1.fasta
I then cut the length of the sequences down to 2,147,483,646 bases, one base less than the 32-bit signed integer maximum value:
$ cut -c 1-21474836486 test.fasta > test3.fasta
$ snp-sites -c -o test3_snps.fasta test3.fasta
This time snp-sites ran successfully and identified 28,880,245 variant sites.
So it seems that a sequence length exactly at the 32-bit signed integer maximum causes a segmentation fault, and a length over that limit causes snp-sites to report that there are no SNPs.
Graham
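The arithmetic behind this report can be checked with a short sketch that shows what the lengths above become when stored in a 32-bit signed integer. Whether snp-sites actually keeps sequence lengths in such a variable is an assumption consistent with the observed behaviour, not something confirmed here; `as_int32` is a hypothetical helper.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2,147,483,647


def as_int32(n):
    """Return the value n takes when stored in a 32-bit signed integer."""
    # ctypes integer types do no overflow checking, so the value wraps around.
    return ctypes.c_int32(n).value


for length in (2_147_483_646, 2_147_483_647, 2_147_483_648, 2_423_158_460):
    print(length, "->", as_int32(length))
# 2147483646 -> 2147483646
# 2147483647 -> 2147483647
# 2147483648 -> -2147483648
# 2423158460 -> -1871808836
```

A negative length would make a loop over the columns run zero times, which would be one possible explanation for the "No SNPs were detected" warning.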
Required user to install automake, autoconf, libtool, and check.
Fails to build from the github download with the following error:
./configure: line 3030: syntax error near unexpected token `CHECK,check'
./configure: line 3030: `PKG_CHECK_MODULES(CHECK,check >= 0.8.2,have_check="yes",'
There is a tests folder, but it's not clear how to run the tests.
Fails to build from the github download with the following error:
./configure: line 3030: syntax error near unexpected token `CHECK,check'
./configure: line 3030: `PKG_CHECK_MODULES(CHECK,check >= 0.8.2,have_check="yes",'
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
./configure: line 3030: syntax error near unexpected token `CHECK,check'
./configure: line 3030: `PKG_CHECK_MODULES(CHECK,check >= 0.8.2,have_check="yes",'
I was wondering if the first sequence in the alignment is considered "special" in some (undocumented) way?
I see the code does something unusual:
// First sequence is the reference sequence so skip it
// If there is an indel in the reference sequence, replace with the first proper base you find
Version:
$ snp-sites -V
snp-sites 2.3.2
No problems with standard usage e.g. outputting variant sites only.
On Linux:
$ snp-sites -b -o monomorphic.fa full.aln
Segmentation fault (core dumped)
On MacOS:
$ snp-sites -b -o monomorphic.fa full.aln
Segmentation fault: 11
Is there a way to turn on a verbose or debugging mode?
I can supply the alignment file (multifasta) if needed. 13 sequences, each 1.89Mbp in length.
Thanks for your help.
I'm getting a somewhat mysterious segfault when running snp-sites (latest version in conda, installed as in instructions).
The reason it's mysterious is that the file which is causing the segfault is just a cat of two files which individually run through snp-sites without error. The genomes in the two files input to cat are the same length (both in 60 bp per line FASTA format), and the output of cat looks fine by both seqkit stats and by visually inspecting where the two files have been joined.
Any thoughts?
Details below...
ubuntu@pennaeth:~/tm_data/phylo/results$ seqkit stats 2018.10.12/2018.10.12.all_tm.fasta
file format type num_seqs sum_len min_len avg_len max_len
2018.10.12/2018.10.12.all_tm.fasta FASTA DNA 191 5,470,978,215 28,643,865 28,643,865 28,643,865
ubuntu@pennaeth:~/tm_data/phylo/results$ seqkit stats 2018.10.15/2017.12.11.prelim_tm_data.reform.fa
file format type num_seqs sum_len min_len avg_len max_len
2018.10.15/2017.12.11.prelim_tm_data.reform.fa FASTA DNA 35 1,002,535,275 28,643,865 28,643,865 28,643,865
ubuntu@pennaeth:~/tm_data/phylo/results$ cat 2018.10.12/2018.10.12.all_tm.fasta 2018.10.15/2017.12.11.prelim_tm_data.reform.fa > 2018.10.15/tmp.fa
ubuntu@pennaeth:~/tm_data/phylo/results$ seqkit stats 2018.10.15/tmp.fa
file format type num_seqs sum_len min_len avg_len max_len
2018.10.15/tmp.fa FASTA DNA 226 6,473,513,490 28,643,865 28,643,865 28,643,865
ubuntu@pennaeth:~/tm_data/phylo/results$ snp-sites 2018.10.12/2018.10.12.all_tm.fasta
ubuntu@pennaeth:~/tm_data/phylo/results$ snp-sites 2018.10.15/2017.12.11.prelim_tm_data.reform.fa
ubuntu@pennaeth:~/tm_data/phylo/results$ snp-sites 2018.10.15/tmp.fa
Segmentation fault (core dumped)
ubuntu@pennaeth:~/tm_data/phylo/results$
ubuntu@pennaeth:~/tm_data/phylo/results$ ls -lh *
-rw-rw-r-- 1 ubuntu ubuntu 2.7M Oct 15 07:39 2017.12.11.prelim_tm_data.reform.fa.snp_sites.aln
-rw-rw-r-- 1 ubuntu ubuntu 45M Oct 15 07:38 2018.10.12.all_tm.fasta.snp_sites.aln
2018.10.12:
total 5.3G
-rw-rw-r-- 1 ubuntu ubuntu 5.2G Oct 12 05:33 2018.10.12.all_tm.fasta
2018.10.15:
total 21G
-rw-rw-r-- 1 ubuntu ubuntu 957M Oct 15 06:37 2017.12.11.prelim_tm_data.fa
-rw-rw-r-- 1 ubuntu ubuntu 973M Oct 15 06:58 2017.12.11.prelim_tm_data.reform.fa
-rw-rw-r-- 1 ubuntu ubuntu 6.2G Oct 15 07:36 tmp.fa
I've got a set of 972 Mtb isolates that I'm trying to run snp-sites -c -o on, but it fails with the error "Warning: No SNPs were detected so there is nothing to output". However, it works when I remove the -c flag. How can I figure out which isolates are causing problems?
Hello,
I have an alignment file with full consensus genome sequences of 6 samples in exactly the same frames with the same number of bases without any blanks or indels but only with SNPs. And when I run snp-sites with default settings I get the error message:
"Warning: No SNPs were detected so there is nothing to output."
When I get the first 1000 bases for each sample without changing anything else in my file (for instance by doing cut -c1-1000) the program works and finds the SNPs; just stating so that it is clear that my installation and file formats work fine.
Samples have about 2.3 billion bases and I am working on an HPCC with over 500GB ram available and I don't get any other error related to memory. If you know the sequence length limit, could you please let me know so that I subset my file to the maximum length?
Thanks!
Hi there,
Just a suggestion - an output file seems to be required to run commands, but it might be nice if we could just print to screen if we just want to have a quick look at the data.
Thanks!
Hello, how can I install the latest version using apt-get on Ubuntu Linux? All I can get is v1.5.0.
Thanks!
Hi, would it be possible to have an option to specify a reference, so that the REF column and the genotype calls in the VCF are relative to it?
Thanks a lot for this tool!
Have a nice day!
JC
I am trying to understand what the -b option does when it is not paired with the -c option. I am working with a 4,640,668 bp long alignment.
- Running snp-sites on the alignment without either the -b or -c options, I get a resulting alignment of 1,733 sites. My understanding is that these are all of the variant sites in the full alignment, regardless of whether or not there is missing data (N, -, or ?) in some samples.
- Running snp-sites -c, I get 944 variant sites (ACGT-only sites), which implies there are 789 variant sites with missing data in at least one sample (1,733 - 944 = 789).
- Running snp-sites -cb, I get an alignment of 2,903,621 bp. My understanding is that this is the ACGT-only sites plus the monomorphic sites (944 + 2,902,677 = 2,903,621).
- I expected that snp-sites -b would give me all of the 1,733 variant sites (both ACGT-only and those with missing data) plus the monomorphic sites (1,733 + 2,902,677 = 2,904,410).
- However, running snp-sites -b I get the complete 4,640,668 bp alignment.
Am I missing something about what the -b option is doing? Any help would be much appreciated. Thank you!
[rbutler@genomics 85]$ brew install snp-sites
==> Installing snp-sites from homebrew/science
==> Installing dependencies for homebrew/science/snp-sites: patchelf
==> Installing homebrew/science/snp-sites dependency: patchelf
==> Downloading https://linuxbrew.bintray.com/bottles/patchelf-0.9_1.x86_64_linux.bottle.tar.gz
==> Pouring patchelf-0.9_1.x86_64_linux.bottle.tar.gz
/home/rbutler/.linuxbrew/Cellar/patchelf/0.9_1: 6 files, 1.2M
==> Installing homebrew/science/snp-sites
==> Downloading https://linuxbrew.bintray.com/bottles-science/snp-sites-2.2.0.x86_64_linux.bottle.tar.gz
==> Pouring snp-sites-2.2.0.x86_64_linux.bottle.tar.gz
Error: An unexpected error occurred during the brew link step
The formula built, but is not symlinked into /home/rbutler/.linuxbrew
No such file or directory @ realpath_rec - /home/linuxbrew
Error: No such file or directory @ realpath_rec - /home/linuxbrew
[rbutler@genomics 85]$ brew update
Already up-to-date.
[rbutler@genomics 85]$ brew tap homebrew/science
[rbutler@genomics 85]$ brew install snp-sites
Warning: homebrew/science/snp-sites-2.2.0 already installed, it's just not linked
What is the recommended workflow to produce a VCF file when starting with two unaligned bacterial whole-genome fasta files, each approximately 3 million nucleotides? Any particular aligner and file format conversion utilities you can recommend?
From RT:
Hi there.
Thanks for the snp-sites program, very helpful.
I haven't been able to pipe (|) into snp-sites yet; how can this be done? If it is not supported, I would strongly recommend adding it.
The example on https://github.com/sanger-pathogens/snp-sites makes no sense to me; which command was used here?
Thanks.
Hello,
I was using your tool to obtain the SNPs in a FASTA alignment, with VCF as the output format. I noticed that where my reference has a nucleotide (A, C, G or T), samples that have missing data (N) are given a 0, i.e. the REF genotype, and are no longer coded as missing data.
For example:
#CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | BID_1 | CHS_2 | LAM_1 | LAM_2 | LAM_3 |
1 | 3032469 | . | C | T | . | . | . | GT | 0 | 0 | 0 | 0 | 1 |
When this position in the alignment is:
BID_1
C
CHS_2
C
LAM_1
N
LAM_2
N
LAM_3
T
I was wondering if this is the normal behavior? Or should I code missing data in another format (? or -), so that missing data will be properly noticed by the tool?
Thanks,
Mafalda
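For comparison, the VCF convention for a missing genotype is `.`. The sketch below shows how the column from the example above would be coded if missing bases were kept as missing. It illustrates the encoding being asked about and is not how snp-sites currently behaves; `genotype_codes` is a hypothetical helper, and the set of characters treated as missing (`N`, `-`, `?`) is an assumption.

```python
def genotype_codes(ref, alts, column, missing="N-?"):
    """Map each sample's base to a VCF allele index, or '.' for missing data."""
    alleles = [ref] + list(alts)
    codes = []
    for base in column:
        base = base.upper()
        if base in missing:
            codes.append(".")  # VCF notation for a missing call
        else:
            # Raises ValueError if the base is neither REF nor a listed ALT.
            codes.append(str(alleles.index(base)))
    return codes


# Column from the example: BID_1, CHS_2, LAM_1, LAM_2, LAM_3
print(genotype_codes("C", ["T"], ["C", "C", "N", "N", "T"]))
# ['0', '0', '.', '.', '1']
```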
2.4.0 has no configure script.
I still had to run autoreconf -i -f.
Hello,
I am trying to generate a core genome tree for a local outbreak of a bacterial plant pathogen (Ralstonia solanacearum) by running IQ-TREE on the output of snp-sites -c. I have tried generating the alignment with snippy-core using a reference strain, and with a de novo assembly alignment done with MAFFT, but in both cases IQ-TREE crashes because it does not recognise the input file as an alignment. I suspect that it has something to do with the output from snp-sites -c being just a multi-FASTA and therefore unrecognisable to IQ-TREE, but my understanding was that this functionality of IQ-TREE is intended specifically for snp-sites output. I have now tried it with another data set and I get the same result.
I have attached the snp alignment from snippy fed to iqtree after snp-sites -c, also fconst output and iqtree error log files.
The commands I used to generate the files are:
$ snp-sites -C core.full.aln > fconst_output.txt
$ snp-sites -c core.full.aln > snp-sites.aln
$ iqtree -fconst fconst_output.txt -s snp-sites.txt
With output from snippy:
fconst_output.txt
iqtree_error.log
snp-sites.txt
Let me know if more information is needed!
Thank you!
Hi,
I have downloaded Docker on my mac (Mojave OS10.14.6), trying to pull the snp-sites container using the command found here: https://quay.io/repository/biocontainers/snp-sites
I unfortunately get the following error:
~$docker pull quay.io/biocontainers/snp-sites
Using default tag: latest
Error response from daemon: manifest for quay.io/biocontainers/snp-sites:latest not found: manifest unknown: manifest unknown
Am I doing something wrong here?
Thanks!