sanger-pathogens / snp-sites

Finds SNP sites from a multi-FASTA alignment file

Home Page: http://sanger-pathogens.github.io/snp-sites/

License: Other

C 87.36% Perl 2.99% Makefile 0.96% M4 6.74% Dockerfile 0.27% C++ 1.68%

genomics sequencing next-generation-sequencing research bioinformatics bioinformatics-pipeline global-health infectious-diseases pathogen

snp-sites's People

andrewjpage carlacummins aslett1 satta fw1121 aidandelaney yangming pauruihu sdwfrost lindechun matamoros blablabla76984534 kdm9 tseemann yazbraimah ariadnesabbag trstickland bioinfoacademy vaofford schonfju olumide-afolabi wangdi2014 pathogen-informatics vikash84 pvanheus alecbrown24 jribado-idm pxhhappy hj1994412 ldenti wanliu2019 shicheng-guo icompbioutc monomeric wook2014 sanjaymsh sdy2813 emmadebayos oronda zm-git-dev nahid18 dlp-informatics solaymane 12317d stanikae ybdong919 xavier-j shengxinzhuan

snp-sites's Issues

Is the reference genome used when using the snp-sites software?

Dear teacher:
I find snp-sites very useful, but I have a question. The input file I am using is coregene.aln, generated by Roary. Does snp-sites use a reference genome? Without a reference genome, how does it find SNPs?
Thanks in advance
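For illustration only: finding SNP sites does not require an external reference genome, because a site can be called variable whenever the aligned sequences disagree in that column. The Python sketch below shows that column-wise idea. It is not the snp-sites implementation, variable_columns is a hypothetical name, and ignoring '-', 'N' and '?' when comparing bases is our assumption.

```python
# Sketch (not snp-sites code): find variable columns in an aligned
# multi-FASTA without an external reference genome.
# Assumption: '-', 'N' and '?' count as missing data, not as alleles.
MISSING = set("-N?")


def variable_columns(seqs):
    """Return 0-based indices of columns holding at least two distinct bases."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for col in range(length):
        # Collect the called bases in this column, ignoring missing data.
        bases = {s[col].upper() for s in seqs} - MISSING
        if len(bases) > 1:
            sites.append(col)
    return sites


alignment = ["GCGCAA", "GCGCAA", "GTGCAA", "GC-CAN"]
print(variable_columns(alignment))  # [1]
```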

Opens <file> twice?

I think the code opens the alignment file twice: once to load the first sequence (the 'reference'), then again to skip over the first sequence and read the rest.

Is there a way to open it only once, so that snp-sites can read from stdin and write to stdout and be used as a filter in a pipe?
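As an illustration of the single-pass approach, a reader can keep the first record as the reference and stream the rest, so the input may come from a pipe. This Python sketch shows the idea only; it is not the snp-sites C reader, and read_fasta and split_reference are hypothetical helper names.

```python
# Sketch: read a FASTA stream once, treating the first record as the
# reference. Helper names are hypothetical, not part of snp-sites.
import io


def read_fasta(stream):
    """Yield (name, sequence) pairs from a text stream in a single pass."""
    name, chunks = None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)  # multi-line sequences are concatenated
    if name is not None:
        yield name, "".join(chunks)


def split_reference(stream):
    """Return (first record, remaining records) without re-reading the input."""
    records = read_fasta(stream)
    reference = next(records, None)
    if reference is None:
        raise ValueError("input contains no FASTA records")
    return reference, list(records)


demo = io.StringIO(">ref\nACGT\n>s1\nACGA\n>s2\nAC\nGT\n")
reference, others = split_reference(demo)
print(reference, others)
```

With a real pipe, sys.stdin can be passed instead of the StringIO object, because the function iterates over the lines only once.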

error: alignment contains sequences of unequal length

Hi,

I have generated an alignment between 2 genomes using progressiveMauve (default parameters) and I'm now trying to extract SNPs using snp-sites.

My issue is that I get the error message: "Alignment my_ali contains sequences of unequal length. Expected length is 42875 but got 42876 in sequence"

However, both sequences have a length of 42876, and both contain one indel ('-').

Any idea how to fix that?

thanks
Romain
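A quick way to see which record has the unexpected length, and whether a stray character that is not a base is inflating it, is to check the lengths directly. A minimal Python sketch: the helper names are hypothetical, records are (name, sequence) pairs, and taking the first record's length as the expected one is an assumption based on the wording of the error message.

```python
# Sketch: find which aligned sequences break the equal-length rule, and
# which characters look unusual. check_lengths/unusual_characters are
# hypothetical helpers; using the first record's length as "expected"
# is an assumption.
EXPECTED_CHARS = set("ACGTN-?")


def check_lengths(records):
    """Return (name, length, expected) for records whose length differs from the first."""
    expected = len(records[0][1])
    return [(name, len(seq), expected)
            for name, seq in records[1:] if len(seq) != expected]


def unusual_characters(seq):
    """Characters outside A, C, G, T, N, '-', '?' (IUPAC codes will show up here too)."""
    return {c.upper() for c in seq} - EXPECTED_CHARS


records = [("seq1", "ACG-TA"), ("seq2", "ACGT-A*"), ("seq3", "ACGTTA")]
print(check_lengths(records))
for name, seq in records:
    print(name, sorted(unusual_characters(seq)))
```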

Segmentation error

I am using snp-sites version 2.4.1, which the Snippy pipeline uses to detect SNPs in a full alignment. The error message from the pipeline is:

snp-sites: symbol lookup error: snp-sites: undefined symbol: generate_snp_sites_with_ref_pure_mono
ERROR: Could not run: snp-sites -c -o 2_strains.aln 2_strains.full.aln

I tried running it again with only two strains and got a segmentation fault:

snp-sites 2_strains.full.aln

Any ideas?

2_strains.full.aln.zip

specify reference sequence

How can someone specify that a given sequence is the reference and comparisons should be relative to the given sequence? Does the tool assume that the first sequence in the alignment is the reference sequence?
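The source comment quoted in the 'First sequence in alignment is "special"?' issue on this page says the first sequence is treated as the reference. Assuming that holds for the version in use, one workaround is to move the desired reference to the top of the alignment before running snp-sites. This is a hedged sketch; put_reference_first is a hypothetical helper, not a snp-sites option, and we have not verified its effect on every output format.

```python
# Sketch: reorder (name, sequence) records so a chosen sequence comes
# first. put_reference_first is a hypothetical helper, not a snp-sites option.
def put_reference_first(records, reference_name):
    """Return a new list with the record named reference_name moved to the front."""
    for i, (name, _) in enumerate(records):
        if name == reference_name:
            return [records[i]] + records[:i] + records[i + 1:]
    raise KeyError(f"no record named {reference_name!r}")


records = [("strainA", "ACGT"), ("ref", "ACGA"), ("strainB", "TCGA")]
print([name for name, _ in put_reference_first(records, "ref")])
```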

Gaps not reported in output?

Hi,

thanks for this very fast and elegant tool. I was wondering whether there is any option to also consider gaps. I am using this alignment, which has a region with gaps in a single sequence. I ran the following command:

snp-sites -o out.fasta aln.trimmed.fasta
head out.fasta

>genome|b0463
GCGCAA
>NT12002_188|b0463
GCGCAA
>NT12003_214|b0463
GCGCAA
>NT12004_22|b0463
GCGCAA
>NT12005_17|b0463
GTGCAA

The output includes only the SNPs, not the gaps. I imagine this is because the region with gaps doesn't contain any SNPs?

Is this expected behaviour?
Best,
Marco
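Marco's guess matches how column-based calling would behave: a column where one sequence has a gap but all the called bases agree carries no substitution. The sketch below labels each column to make that distinction visible. Treating '-', 'N' and '?' as missing is an assumption, and classify_columns is a hypothetical helper.

```python
# Sketch: label each alignment column, to show why a gap-only region
# contributes nothing to a SNP alignment. Treating '-', 'N', '?' as missing
# is an assumption; classify_columns is a hypothetical helper.
MISSING = set("-N?")


def classify_columns(seqs):
    labels = []
    for column in zip(*seqs):
        called = {b.upper() for b in column} - MISSING
        if len(called) > 1:
            labels.append("snp")        # at least two different bases
        elif "-" in column:
            labels.append("gap-only")   # bases agree; a gap is the only difference
        else:
            labels.append("constant")
    return labels


print(classify_columns(["ACGT", "AC-T", "ATGT"]))
```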

segmentation fault in one fasta but not on the other

Hello, I am able to run snp-sites with one of my fasta files, but I get a segmentation fault when I try to run it with another one.

The output I get from gdb is

Core was generated by `snp-sites -mvp -o 8snps monoref_multi_8.fasta'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f6e8aa61827 in generate_snp_sites ()
    from /home/slh/.linuxbrew/cellar/snp-sites/2.2.0/lib/libsnp-sites.so.1

Any idea as to how to solve this?

Thanks!

Failing tests on various platforms

The test suite for 2.2.2 fails with a bunch of tests on various platforms, e.g. i386.
Here's the message:

FAIL: run-all-tests
=========================================
   snp-sites 2.2.2: src/test-suite.log
=========================================

# TOTAL: 1
# PASS:  0
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: run-all-tests
===================

Running suite(s): Creating_SNP_Sites
Alignment ../tests/data/uneven_alignment.aln contains sequences of unequal length. Expected length is 8 but got 9 in sequence Uneven_number_of_bases

76%: Checks: 21, Failures: 5, Errors: 0
../tests/check-snp-sites.c:40:F:snp_sites:valid_alignment_with_one_line_per_sequence:0: Invalid VCF file for 1 line per seq
../tests/check-snp-sites.c:79:F:snp_sites:valid_alignment_with_multiple_lines_per_sequence:0: Invalid VCF file for multiple lines per seq
../tests/check-snp-sites.c:67:F:snp_sites:valid_alignment_with_one_line_per_sequence_gzipped:0: Invalid VCF file for 1 line per seq
../tests/check-snp-sites.c:53:F:snp_sites:valid_alignment_with_n_as_gap:0: Invalid VCF file for 1 line per seq
../tests/check-snp-sites.c:132:F:snp_sites:valid_with_all_outputted_with_custom_name:0: Custom name needs extra extension for VCF
Running suite(s): Creating_VCF_file
100%: Checks: 3, Failures: 0, Errors: 0
FAIL run-all-tests (exit status: 1)

============================================================================
Testsuite summary for snp-sites 2.2.2
============================================================================
# TOTAL: 1
# PASS:  0
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See src/test-suite.log
============================================================================

A full log is at https://buildd.debian.org/status/fetch.php?pkg=snp-sites&arch=i386&ver=2.2.2-1&stamp=1458222828 for example.

I can reproduce this on a Jessie i386 Vagrant box. A quick bisect flagged commit b2efeb4288d4201480408b7dfe3e314c243e53c6 as the first bad one, but I'm not familiar enough with what is being tested to say more.

Keeping indels, projecting reference coordinates?

Hi there,
Is there any planned (or implemented) way to analyze indels with snp-sites? Basically, I want to annotate all variants (not just SNPs) in my multiple sequence alignment and project them onto the coordinate system of the original reference (which also has indels).

Thanks,
John
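Projecting an alignment column onto reference coordinates amounts to counting the non-gap characters of the reference row up to that column. A minimal sketch, assuming the reference is one of the aligned rows and '-' is the only gap character; columns where the reference has a gap (insertions relative to the reference) have no reference coordinate and map to None.

```python
# Sketch: map alignment columns to 1-based positions on the ungapped
# reference. Columns where the reference row has '-' (insertions relative
# to the reference) get None. column_to_ref_pos is a hypothetical helper.
def column_to_ref_pos(ref_aligned, gap="-"):
    mapping, pos = [], 0
    for base in ref_aligned:
        if base == gap:
            mapping.append(None)
        else:
            pos += 1
            mapping.append(pos)
    return mapping


print(column_to_ref_pos("AC--GT"))  # [1, 2, None, None, 3, 4]
```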

Is there a way to extract SNP mutations from the VCF file?

Hello,

I have run snp-sites and got the three output files.
But I want to get the SNP mutations from the VCF file.
Is there a way to do this with snp-sites?

Thank you.
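One option is to post-process the VCF. The sketch below lists each sample's mutations in REF-POS-ALT notation, assuming tab-separated columns and haploid GT values (0, 1, 2, ...) like those in the snp-sites VCF excerpt in the deletion issue further down this page; mutations_by_sample is a hypothetical helper.

```python
# Sketch: list each sample's mutations (e.g. "T19065C") from a VCF with
# haploid GT values such as 0, 1, 2. mutations_by_sample is a hypothetical helper.
def mutations_by_sample(vcf_text):
    samples, result = [], {}
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue                      # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]          # sample names follow FORMAT
            result = {s: [] for s in samples}
            continue
        pos, ref, alts = fields[1], fields[3], fields[4].split(",")
        for sample, value in zip(samples, fields[9:]):
            gt = value.split(":")[0]      # GT is the first FORMAT key
            if gt in ("0", "."):
                continue                  # reference allele or missing call
            result[sample].append(f"{ref}{pos}{alts[int(gt) - 1]}")
    return result


example = "\n".join([
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNC_045512.2\tMT007544.1",
    "1\t19065\t.\tT\tC\t.\t.\t.\tGT\t0\t1",
    "1\t22303\t.\tT\tG\t.\t.\t.\tGT\t0\t1",
])
print(mutations_by_sample(example))
```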

Add Arxiv paper link to README.md

How to manage heterozygosity in SNP conversion?

Hello,

Sorry for this (I guess) basic question, but I did not find the answer in the README.md file nor in the paper (Page et al. 2016).

I try to convert FASTA alignments into a SNP-extracted VCF format for downstream analyses. Some alignments are for nuclear markers, and I work on a polyploid organism, so I sometimes have more than 2 haplotypes for a given individual, but all are properly phased.

My FASTA input is formatted as follows:

>Individual1_a
Allele-a-sequence
>Individual1_b
Allele-b-sequence
>Individual2_a
Allele-a-sequence
>Individual2_b
Allele-b-sequence
>Individual2_c
Allele-c-sequence
...

I used a basic command:

snp-sites -v -o out.vcf in.fas

And I did get a .vcf file. But in this file, each allele seems to be coded as a separate homozygous individual: I see no 0/0/1 or even 0/1 in the output as expected, only 0, 1 and 2 (like haploid calls).

How could I get an output in which phasing information and heterozygosity are taken into account? Is there an option in snp-sites that I missed? Or do I have to adapt my input, and if so, how? (For example, by losing the phasing information and merging the alleles into a single sequence per individual with ambiguity codes? Is that mandatory?)

Thank you for any answer.
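snp-sites appears to write one haploid column per FASTA record, as described above. Assuming the records are named <individual>_<allele> as in the input shown, the haploid calls can be regrouped afterwards; VCF uses '|' to separate phased alleles and allows different ploidy per sample. This post-processing sketch is ours, not a snp-sites option, and the helper names are hypothetical.

```python
# Sketch: collapse phased haplotype columns of a haploid-coded VCF into one
# polyploid, phased genotype per individual. Assumes sample names follow
# the <individual>_<allele> pattern shown above; helper names are hypothetical.
def group_haplotypes(sample_names, sep="_"):
    """Map each individual to the column indices of its haplotypes, in order."""
    groups = {}
    for idx, name in enumerate(sample_names):
        individual = name.rsplit(sep, 1)[0]
        groups.setdefault(individual, []).append(idx)
    return groups


def merge_genotypes(haploid_gts, groups):
    """Join haploid GT values with '|' (VCF's phased separator) per individual."""
    return {ind: "|".join(haploid_gts[i] for i in idxs)
            for ind, idxs in groups.items()}


names = ["Individual1_a", "Individual1_b",
         "Individual2_a", "Individual2_b", "Individual2_c"]
groups = group_haplotypes(names)
print(merge_genotypes(["0", "1", "0", "0", "1"], groups))
```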

Add a --version switch?

For pipeline auditing (--version, or -v or -V if you can only use short options)

% snp-sites --version
snp-sites 2.0.2

Option to disallow gap and/or N and/or non-AGTC?

We are impressed by the speed of this tool (due to being C code).

A very useful feature we need is the ability to also filter out things like:

gap -
N
non-AGTC eg * and X etc

These would need to be independent options.

Ideally, the current default behaviour of removing conserved (monomorphic) sites could also be an option, e.g. so that we could remove only the columns containing a gap and leave the rest.
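The requested behaviour amounts to a set of independent per-column tests. A minimal sketch of how such filters could combine; the option names are hypothetical and do not correspond to existing snp-sites flags.

```python
# Sketch of independent column filters; option names are hypothetical and
# do not correspond to existing snp-sites flags.
def keep_column(column, drop_gap=False, drop_n=False, drop_other=False,
                drop_monomorphic=True):
    bases = [b.upper() for b in column]
    if drop_gap and "-" in bases:
        return False
    if drop_n and "N" in bases:
        return False
    if drop_other and any(b not in "ACGTN-" for b in bases):
        return False  # e.g. '*', 'X', '?'
    if drop_monomorphic and len(set(bases) & set("ACGT")) < 2:
        return False  # fewer than two distinct A/C/G/T bases
    return True


def filter_alignment(seqs, **options):
    keep = [i for i, col in enumerate(zip(*seqs)) if keep_column(col, **options)]
    return ["".join(s[i] for i in keep) for s in seqs]


aln = ["ACGTN", "AC-TA", "ATGT*"]
print(filter_alignment(aln, drop_gap=True, drop_monomorphic=False))
print(filter_alignment(aln))
```

The first call corresponds to "remove only columns with a gap and leave the rest"; the second keeps only columns with at least two distinct A/C/G/T bases.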

Docker file typo

#
# Install Roary
#
RUN apt-get install snp-sites

:-P

brew recipe needs to be updated to the latest release.

Current linuxbrew recipe only installs version 2.2.0.

Segmentation fault with larger sequences (on Desktop and HPC)

Hi,

I am running into a segmentation fault when calling variants on a FASTA file with a genome size above approximately 9 Mb (8 Mb of the same FASTA works fine).
Any thoughts as to why this is the case? I couldn't see a hard maximum genome length in the code.

Edit: generating some test FASTA files with the included Perl script reveals that the problem is the number of variants in the alignment, which makes sense.

Cheers

Chris

snp-sites tool help

I am working on a phylogenetic analysis of Klebsiella pneumoniae based on the SNP sites of the core genome (snp-sites tool).
I would very much appreciate help locating the output files that show the following information:

- the distribution and location of SNPs for each isolate along the core genome 
- the ratio of nonsynonymous-to-synonymous SNPs

Thank you very much for your help

how to visualize VCF file in artemis?

Hello, I am new to the bioinformatics field. I understand from snp-sites that the output VCF file can be visualized in Artemis. Is that right?

Is any processing necessary for that? I have a VCF file and I am trying to load it in Artemis, but nothing happens.

I have compressed it with bgzip and indexed it with tabix, but Artemis fails to open the vcf.gz.tbi file because it does not recognize the binary file format.

Could you please suggest how I can do this?

Info in VCF output format

Is there a way with snp-sites to get data into the INFO column of the VCF output?
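If post-processing is acceptable, standard INFO keys can be computed from the genotype columns afterwards. The sketch below appends AC (allele count per ALT allele) and AN (number of called alleles) to a haploid record like the snp-sites VCF lines shown later on this page. AC and AN are defined by the VCF specification; computing them this way is our illustration, not a snp-sites feature, and the matching header lines must be placed before the #CHROM line.

```python
# Sketch: append AC/AN INFO fields to a haploid snp-sites-style VCF record.
# AC/AN are standard VCF INFO keys; computing them here is our post-processing,
# not a snp-sites feature.
INFO_HEADERS = [
    '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes">',
    '##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">',
]


def add_ac_an(record_line):
    fields = record_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    gts = [f.split(":")[0] for f in fields[9:]]   # GT is the first FORMAT key
    called = [g for g in gts if g != "."]
    ac = [sum(1 for g in called if g == str(i)) for i in range(1, len(alts) + 1)]
    info = f"AC={','.join(map(str, ac))};AN={len(called)}"
    fields[7] = info if fields[7] == "." else fields[7] + ";" + info
    return "\t".join(fields)


print(add_ac_an("1\t19065\t.\tT\tC\t.\t.\t.\tGT\t0\t1"))
```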

snp-sites does not detect deletion events

Dear,

To validate snp-sites in our lab, we compared its results with those of msa2vcf.jar.

We took two closely related sequences from NCBI (ref NC_045512.2 and MT007544.1).

msa2vcf

$ java -jar dist/msa2vcf.jar --consensus 'NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome' --output ../sars_cov2.2.vcf ../sars_cov2.aln
[INFO][MsaToVcf]format : Fasta
$ cat ../sars_cov2.2.vcf
##fileformat=VCFv4.2
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##contig=<ID=chrUn,length=29903>
##msa2vcf.meta=compilation:20200728120720 githash:af51aa3 htsjdk:2.22.0 date:20200728122012 cmd:--consensus NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome --output ../sars_cov2.2.vcf ../sars_cov2.aln
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  MT007544.1 Severe acute respiratory syndrome coronavirus 2 isolate Australia/VIC01/2020, complete genome        NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
chrUn   19065   .       T       C       .       .       DP=2    GT:DP   1/1:1   0/0:1
chrUn   22303   .       T       G       .       .       DP=2    GT:DP   1/1:1   0/0:1
chrUn   26144   .       G       T       .       .       DP=2    GT:DP   1/1:1   0/0:1
chrUn   29749   .       ACGATCGAGTG     A       .       .       DP=2    GT:DP   1/1:1   0/0:1

snp-sites

$ snp-sites -c -v -o sars_cov2.vcf  sars_cov2.aln
$ cat sars_cov2.vcf
##fileformat=VCFv4.1
##contig=<ID=1,length=29903>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NC_045512.2     MT007544.1
1       19065   .       T       C       .       .       .       GT      0       1
1       22303   .       T       G       .       .       .       GT      0       1
1       26144   .       G       T       .       .       .       GT      0       1

Problem

snp-sites does not report the deletion (REF ACGATCGAGTG, ALT A) that msa2vcf reports at position 29749.
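msa2vcf reports the deletion as REF ACGATCGAGTG, ALT A at 29749, i.e. anchored on the base before the deleted run, which is the usual VCF convention for deletions. The Python sketch below shows one way to derive such records from a pairwise alignment. It illustrates the convention only; it is not how either tool is implemented, and it ignores insertions and deletions at the very start of the alignment.

```python
# Sketch: compare one aligned sequence against an aligned reference and
# report substitutions and deletions in VCF style (1-based POS on the
# ungapped reference; deletions anchored on the preceding base).
# Insertions and leading deletions are not handled.
def pairwise_variants(ref_aln, alt_aln):
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    ref_pos = 0       # reference coordinate of the last reference base seen
    anchor = None     # (pos, ref_base, alt_base) of the last column with two bases
    i, n = 0, len(ref_aln)
    while i < n:
        r, a = ref_aln[i], alt_aln[i]
        if r != "-" and a != "-":
            ref_pos += 1
            if r != a:
                variants.append((ref_pos, r, a))
            anchor = (ref_pos, r, a)
            i += 1
        elif r != "-":
            # Deletion in alt: consume the whole run of gapped alt columns.
            deleted = []
            while i < n and alt_aln[i] == "-":
                if ref_aln[i] != "-":
                    deleted.append(ref_aln[i])
                    ref_pos += 1
                i += 1
            if anchor is not None:
                pos, ref_base, alt_base = anchor
                variants.append((pos, ref_base + "".join(deleted), alt_base))
        else:
            i += 1    # gap in the reference (insertion or shared gap): skipped
    return variants


print(pairwise_variants("GACGTT", "GA--TC"))  # [(2, 'ACG', 'A'), (6, 'T', 'C')]
```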

make check => cannot find -lsubunit

% make check

<snip>
/usr/bin/ld: cannot find -lsubunit
collect2: error: ld returned 1 exit status
make[2]: *** [run-all-tests] Error 1

Error with Mauve alignment input

A Mauve-formatted alignment as input throws an error about sequences of unequal length (differing by 1 bp). I am using version 2.4.1, installed via conda, on OS X. Is this a known issue?

autoreconf -i warnings => option 'subdir-objects' is disabled

src/Makefile.am:26: warning: source file '../tests/check-snp-sites.c' is in a subdirectory,
src/Makefile.am:26: but option 'subdir-objects' is disabled
automake: warning: possible forward-incompatibility.
automake: At least a source file is in a subdirectory, but the 'subdir-objects'
automake: automake option hasn't been enabled.  For now, the corresponding output
automake: object file(s) will be placed in the top-level directory.  However,
automake: this behaviour will change in future Automake versions: they will
automake: unconditionally cause object files to be placed in the same subdirectory
automake: of the corresponding sources.
automake: You are advised to start using 'subdir-objects' option throughout your
automake: project, to avoid future incompatibilities.
src/Makefile.am:26: warning: source file '../tests/check-vcf.c' is in a subdirectory,
src/Makefile.am:26: but option 'subdir-objects' is disabled
src/Makefile.am:26: warning: source file '../tests/helper-methods.c' is in a subdirectory,
src/Makefile.am:26: but option 'subdir-objects' is disabled
src/Makefile.am:26: warning: source file '../tests/run-all-tests.c' is in a subdirectory,
src/Makefile.am:26: but option 'subdir-objects' is disabled

Problem with installation via Bioconda

Hi!
I just tried to install snp-sites using Bioconda and get the following message:

UnavailableInvalidChannel: The channel is not accessible or is invalid.
channel name: snp-sites
channel url: https://conda.anaconda.org/snp-sites
error code: 404

You will need to adjust your conda configuration to proceed.
Use conda config --show channels to view your configuration's current state,
and use conda config --show-sources to view config file locations.

I copied the URL into my browser, but it is not available there either.

Best,
Martinique

Output invariant sites and nucleotide frequencies

In general, phylogenetic programs use invariant sites for likelihood calculations. However, a number of programs, such as RAxML and BEAST, can perform ascertainment bias corrections given the number of invariant sites and the frequencies of nucleotides in the alignment. If SNP-sites output these values, they could be used as direct inputs for RAxML, for example.
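The counts requested here are straightforward to derive from an alignment. This sketch reports constant-site counts in A,C,G,T order, the order mentioned for the -C option elsewhere on this page. Excluding columns that contain gaps, N or other characters is our assumption and may differ from what snp-sites does; constant_site_counts is a hypothetical helper.

```python
# Sketch: count constant columns per base and print them in A,C,G,T order.
# Columns containing gaps, N or any other character are skipped (assumption).
def constant_site_counts(seqs):
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for column in zip(*seqs):
        bases = {b.upper() for b in column}
        if len(bases) == 1:
            (base,) = bases
            if base in counts:
                counts[base] += 1
    return ",".join(str(counts[b]) for b in "ACGT")


print(constant_site_counts(["ACGTA", "ACGTT", "ACCTA"]))  # 1,1,0,1
```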

possible to convert to redhat?

Maybe this is a dumb question, but I'm having a hard time installing this on my school's linux box. They are running redhat, and I am not sure if it is exactly compatible. The configure command for starters doesn't seem to work. I was looking into converting the package but it looks like it requires another program that requires root access, which I do not have.

Didn't mean to open this

specify order of output of -C option in help message

Thanks for the new -C function for counting constant sites...super useful! Can I recommend explicitly stating that the output of -C is in A,C,G,T order in the help message? Some users might not know that alphabetical order is convention.

32 bit signed integer error

Hi,
I noticed an issue similar to a previously closed one (#80), which I'm experiencing with the most recent version (2.5.1) of snp-sites.
It appears that sequences longer than 2,147,483,647 bases give the error "Warning: No SNPs were detected so there is nothing to output." 2,147,483,647 is the maximum value for a 32-bit signed integer.
I've spent a bit of time looking into this, and here's what I did to demonstrate it.

I took two sequences from an alignment, one of which was the outgroup, so as to maximise the number of snps.
Each sequence was 2,423,158,460 bases in length:

$ cat sample1.fasta Outgroup.fasta > test.fasta

$ snp-sites -V
snp-sites 2.5.1

$ snp-sites -c -o test_snps.fasta test.fasta
Warning: No SNPs were detected so there is nothing to output.

I then cut the sequences down to 2,147,483,648 bases, one base longer than the 32-bit signed integer maximum:

$ cut -c 1-2147483648 test.fasta > test1.fasta
$ snp-sites -c -o test1_snps.fasta test1.fasta
Warning: No SNPs were detected so there is nothing to output.

I then cut the sequences down to 2,147,483,647 bases, the 32-bit signed integer maximum:

$ cut -c 1-2147483647 test.fasta > test2.fasta
$ snp-sites -c -o test2_snps.fasta test2.fasta
/opt/slurm/data/slurmd/job28028674/slurm_script: line 13: 38321 Segmentation fault      snp-sites -c -o test1_snps.fasta test1.fasta

I then cut the sequences down to 2,147,483,646 bases, one base less than the 32-bit signed integer maximum:

$ cut -c 1-2147483646 test.fasta > test3.fasta
$ snp-sites -c -o test3_snps.fasta test3.fasta

This time snp-sites ran successfully and identified 28,880,245 variant sites.

So it seems that a sequence length exactly at the 32-bit signed integer maximum causes a segmentation fault, and a length above that limit causes snp-sites to report that there are no SNPs.

Graham
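The pattern above is consistent with a sequence length being held in a 32-bit signed integer: a value one past 2,147,483,647 wraps around to a large negative number, and a loop bounded by a negative length does no work, which could surface as "no SNPs". We have not confirmed which variable in snp-sites is involved; the sketch only reproduces the arithmetic of the wraparound, and to_int32 is a hypothetical helper.

```python
# Sketch: what a length looks like after being stored in a 32-bit signed
# integer (two's complement, as C `int` is on common platforms).
def to_int32(value):
    value &= 0xFFFFFFFF                  # keep the low 32 bits
    return value - (1 << 32) if value >= (1 << 31) else value


for length in (2_147_483_646, 2_147_483_647, 2_147_483_648, 2_423_158_460):
    print(f"{length:>13,} -> {to_int32(length):>14,}")
```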

Build is failing on Mac OS

This required the user to install automake, autoconf, libtool, and check.

Fails to build from the github download with the following error:

./configure: line 3030: syntax error near unexpected token `CHECK,check'
./configure: line 3030: `PKG_CHECK_MODULES(CHECK,check >= 0.8.2,have_check="yes",'

How do we run the tests?

There is a tests folder, but it's not clear how to run the tests.

line 3030: syntax error near unexpected token `CHECK,check' on RHEL 7.5

Fails to build from the github download with the following error:

./configure: line 3030: syntax error near unexpected token `CHECK,check'
./configure: line 3030: `PKG_CHECK_MODULES(CHECK,check >= 0.8.2,have_check="yes",'

checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
./configure: line 3030: syntax error near unexpected token `CHECK,check'
./configure: line 3030: `PKG_CHECK_MODULES(CHECK,check >= 0.8.2,have_check="yes",'

First sequence in alignment is "special" ?

I was wondering if the first sequence in the alignment is considered "special" in some (undocumented) way?

I see the code does something unusual:

       // First sequence is the reference sequence so skip it
       // If there is an indel in the reference sequence, replace with the first proper base you find
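A Python sketch of what that comment describes, for illustration only: the first row is copied and each '-' is replaced by the first non-gap base found in the same column of a later sequence. It mirrors the comment, not the actual C code; build_reference is a hypothetical name, and treating only '-' as an indel character is an assumption.

```python
# Sketch of the behaviour described in the comment above: copy the first
# sequence and replace each '-' with the first non-gap base found in the
# same column of a later sequence. build_reference is a hypothetical name.
def build_reference(seqs, gap="-"):
    reference = list(seqs[0])
    for col, base in enumerate(reference):
        if base != gap:
            continue
        for other in seqs[1:]:
            if other[col] != gap:
                reference[col] = other[col]
                break                      # first proper base wins
    return "".join(reference)


print(build_reference(["A-G-", "ATGC", "ACG-"]))  # ATGC
```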

segmentation fault with `-b` option

Version:

$ snp-sites -V
snp-sites 2.3.2

No problems with standard usage e.g. outputting variant sites only.

On Linux:

$ snp-sites -b -o monomorphic.fa full.aln 
Segmentation fault (core dumped)

On MacOS:

$ snp-sites -b -o monomorphic.fa full.aln
Segmentation fault: 11

Is there a way to turn on a verbose or debugging mode?
I can supply the alignment file (multifasta) if needed. 13 sequences, each 1.89Mbp in length.

Thanks for your help.

SegFault for unknown reason

I'm getting a somewhat mysterious segfault when running snp-sites (latest version in conda, installed as in instructions).

The reason it's mysterious is that the file which is causing the segfault is just a cat of two files which individually run through snp-sites without error. The genomes in the two files input to cat are the same length (both in 60 bp per line fasta format), and the output of cat looks fine by both seqkit stats and by visually inspecting where the two files have been joined.

Any thoughts?

Details below...

ubuntu@pennaeth:~/tm_data/phylo/results$ seqkit stats 2018.10.12/2018.10.12.all_tm.fasta
file                                format  type  num_seqs        sum_len     min_len     avg_len     max_len
2018.10.12/2018.10.12.all_tm.fasta  FASTA   DNA        191  5,470,978,215  28,643,865  28,643,865  28,643,865

ubuntu@pennaeth:~/tm_data/phylo/results$ seqkit stats 2018.10.15/2017.12.11.prelim_tm_data.reform.fa
file                                          format  type  num_seqs        sum_len     min_len     avg_len     max_len
2018.10.15/2017.12.11.prelim_tm_data.reform.fa  FASTA   DNA         35  1,002,535,275  28,643,865  28,643,865  28,643,865

ubuntu@pennaeth:~/tm_data/phylo/results$ cat 2018.10.12/2018.10.12.all_tm.fasta 2018.10.15/2017.12.11.prelim_tm_data.reform.fa > 2018.10.15/tmp.fa

ubuntu@pennaeth:~/tm_data/phylo/results$ seqkit stats 2018.10.15/tmp.fa
file               format  type  num_seqs        sum_len     min_len     avg_len     max_len
2018.10.15/tmp.fa  FASTA   DNA        226  6,473,513,490  28,643,865  28,643,865  28,643,865

ubuntu@pennaeth:~/tm_data/phylo/results$ snp-sites 2018.10.12/2018.10.12.all_tm.fasta
ubuntu@pennaeth:~/tm_data/phylo/results$ snp-sites 2018.10.15/2017.12.11.prelim_tm_data.reform.fa
ubuntu@pennaeth:~/tm_data/phylo/results$ snp-sites 2018.10.15/tmp.fa 
Segmentation fault (core dumped)
ubuntu@pennaeth:~/tm_data/phylo/results$ 

ubuntu@pennaeth:~/tm_data/phylo/results$ ls -lh *
-rw-rw-r-- 1 ubuntu ubuntu 2.7M Oct 15 07:39 2017.12.11.prelim_tm_data.reform.fa.snp_sites.aln
-rw-rw-r-- 1 ubuntu ubuntu  45M Oct 15 07:38 2018.10.12.all_tm.fasta.snp_sites.aln

2018.10.12:
total 5.3G
-rw-rw-r-- 1 ubuntu ubuntu 5.2G Oct 12 05:33 2018.10.12.all_tm.fasta

2018.10.15:
total 21G
-rw-rw-r-- 1 ubuntu ubuntu 957M Oct 15 06:37 2017.12.11.prelim_tm_data.fa
-rw-rw-r-- 1 ubuntu ubuntu 973M Oct 15 06:58 2017.12.11.prelim_tm_data.reform.fa
-rw-rw-r-- 1 ubuntu ubuntu 6.2G Oct 15 07:36 tmp.fa

Warning: No SNPs were detected so there is nothing to output

I've got a set of 972 Mtb isolates that I'm trying to run snp-sites -c -o on, but it fails with the message "Warning: No SNPs were detected so there is nothing to output." However, it works when I remove the -c flag. How can I figure out which isolates are causing the problem?
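Assuming -c keeps only variable columns composed entirely of A, C, G and T (as described in the -b question further down this page), the warning would mean that every variable column contains a gap, N or other character in at least one isolate. The sketch below tallies, for each isolate, how many otherwise-variable columns it would cause -c to discard; blame_for_dropped_sites is a hypothetical helper.

```python
# Sketch: for variable columns that contain a non-ACGT character, count
# which isolates contribute that character. Assumes -c drops exactly those
# columns; blame_for_dropped_sites is a hypothetical helper.
ACGT = set("ACGT")


def blame_for_dropped_sites(names, seqs):
    counts = {name: 0 for name in names}
    for column in zip(*seqs):
        upper = [b.upper() for b in column]
        if len(set(upper) & ACGT) < 2:
            continue                      # not a variable site
        if all(b in ACGT for b in upper):
            continue                      # ACGT-only site, kept by -c
        for name, b in zip(names, upper):
            if b not in ACGT:
                counts[name] += 1         # this isolate has a gap/N/other here
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


print(blame_for_dropped_sites(["s1", "s2", "s3"], ["ACT", "ATN", "GCA"]))
```

Isolates at the top of the list are the first candidates to inspect or remove.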

Is there a sequence length limit? "Warning: No SNPs were detected so there is nothing to output."

Hello,
I have an alignment file with full consensus genome sequences of 6 samples, all in the same frame with the same number of bases, with no gaps or indels, only SNPs. When I run snp-sites with default settings, I get the message:
"Warning: No SNPs were detected so there is nothing to output."

When I take the first 1000 bases of each sample without changing anything else in the file (for instance with cut -c1-1000), the program works and finds the SNPs. I mention this to make clear that my installation and file format are fine.

The samples have about 2.3 billion bases each, and I am working on an HPCC with over 500 GB of RAM available; I don't get any other error related to memory. If you know the sequence length limit, could you please let me know so that I can subset my file to the maximum length?

Thanks!
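Until the limit is resolved, one workaround is to split the alignment column-wise and run snp-sites on each piece. Going by the 32-bit observations in Graham's report above (which we have not independently checked), each piece would need to stay below 2,147,483,647 columns. Positions reported for a piece are relative to that piece, so for a 1-based VCF POS the genome position is offset + POS. The helper names below are hypothetical.

```python
# Sketch: split an alignment into column windows so each piece stays below a
# chosen length. max_len is ours to pick; no documented snp-sites limit is assumed.
def column_chunks(length, max_len):
    """Yield half-open (start, end) column ranges of at most max_len columns."""
    for start in range(0, length, max_len):
        yield start, min(start + max_len, length)


def split_alignment(records, max_len):
    """Return (offset, records_slice) pairs; add offset to positions in each piece."""
    length = len(records[0][1])
    return [(start, [(name, seq[start:end]) for name, seq in records])
            for start, end in column_chunks(length, max_len)]


pieces = split_alignment([("s1", "ACGTACGTAC"), ("s2", "ACGAACGTTC")], 4)
for offset, recs in pieces:
    print(offset, recs)
```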

Print to screen

Hi there,
Just a suggestion - an output file seems to be required to run commands, but it might be nice if we could just print to screen if we just want to have a quick look at the data.
Thanks!

apt-get installation of snp-sites

Hello, how can I install the latest version using apt-get on Ubuntu Linux? All I can get is v1.5.0.

Thanks!

Possibility to specify known reference

Hi, would it be possible to have an option to specify a reference sequence, so that the REF column and the genotype calls in the VCF are relative to it?

Thanks a lot for this tool!

Have a nice day!
JC

functionality of -b option alone?

I am trying to understand what the -b option does when it is not paired with the -c option. I am working with a 4,640,668 bp long alignment.

When I run snp-sites on the alignment without either the -b or -c options, I get a resulting alignment of 1,733 sites. My understanding is that these are all of the variant sites in the full alignment, regardless of whether or not there is missing data (N, -, or ?) in some samples.
When I run snp-sites -c, I get 944 variant sites (ACGT-only sites), which implies there are 789 variant sites with missing data in at least one sample (1,733 - 944 = 789).
When I run snp-sites -cb, I get an alignment of 2,903,621 bp. My understanding is that this is the ACGT-only sites plus the monomorphic sites (944 + 2,902,677 = 2,903,621).
Based on the above logic, I assumed that running snp-sites -b would give me all of the 1,733 variant sites (both ACGT-only and those with missing data) plus the monomorphic sites (1,733 + 2,902,677 = 2,904,410).
However, when I do run snp-sites -b I get the complete 4,640,668 bp alignment.

Am I missing something about what the -b option is doing? Any help would be much appreciated. Thank you!

Linuxbrew installation issue (links issue)

[rbutler@genomics 85]$ brew install snp-sites
==> Installing snp-sites from homebrew/science
==> Installing dependencies for homebrew/science/snp-sites: patchelf
==> Installing homebrew/science/snp-sites dependency: patchelf
==> Downloading https://linuxbrew.bintray.com/bottles/patchelf-0.9_1.x86_64_linux.bottle.tar.gz

################################################################## 100.0%

==> Pouring patchelf-0.9_1.x86_64_linux.bottle.tar.gz
/home/rbutler/.linuxbrew/Cellar/patchelf/0.9_1: 6 files, 1.2M
==> Installing homebrew/science/snp-sites
==> Downloading https://linuxbrew.bintray.com/bottles-science/snp-sites-2.2.0.x86_64_linux.bottle.tar.gz

################################################################## 100.0%

==> Pouring snp-sites-2.2.0.x86_64_linux.bottle.tar.gz
Error: An unexpected error occurred during the brew link step
The formula built, but is not symlinked into /home/rbutler/.linuxbrew
No such file or directory @ realpath_rec - /home/linuxbrew
Error: No such file or directory @ realpath_rec - /home/linuxbrew
[rbutler@genomics 85]$ brew update
Already up-to-date.
[rbutler@genomics 85]$ brew tap homebrew/science
[rbutler@genomics 85]$ brew install snp-sites
Warning: homebrew/science/snp-sites-2.2.0 already installed, it's just not linked

Recommended workflow?

What is the recommended workflow to produce a VCF file when starting with two unaligned bacterial whole-genome fasta files, each approximately 3 million nucleotides? Any particular aligner and file format conversion utilities you can recommend?

Pipe into snp-sites

From RT:
Hi there.

Thanks for the snp-sites program, it is very helpful.
I haven't been able to pipe (|) into snp-sites yet. How can this be done? If it isn't possible, I would strongly recommend adding it.

The example on https://github.com/sanger-pathogens/snp-sites makes no sense to me, which command was used here?

Thanks.

Missing data in vcf output format

Hello,

I was using your tool to obtain the SNPs in a FASTA alignment, with VCF as the output format. I noticed that when my reference has a nucleotide (A, C, G or T), samples with missing data (N) get a 0, i.e. the REF genotype, and are no longer coded as missing data.

For example:

#CHROM  POS      ID  REF  ALT  QUAL  FILTER  INFO  FORMAT  BID_1  CHS_2  LAM_1  LAM_2  LAM_3
1       3032469  .   C    T    .     .       .     GT      0      0      0      0      1

When this position in the alignment is:

>BID_1
C
>CHS_2
C
>LAM_1
N
>LAM_2
N
>LAM_3
T

I was wondering whether this is the intended behavior, or whether I should code missing data differently (? or -) so that the tool recognises it as missing.

Thanks,
Mafalda
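In VCF, a missing haploid call is written as '.', so one possible encoding would map N, '-' and '?' to '.' instead of 0. The sketch below applies that rule to the column shown above; it illustrates the VCF convention and is not how snp-sites currently encodes genotypes. haploid_gt is a hypothetical helper.

```python
# Sketch: haploid GT encoding that keeps missing data missing. N, '-' and
# '?' become '.', VCF's missing-value marker. haploid_gt is hypothetical.
MISSING = set("N-?")


def haploid_gt(base, ref, alts):
    b = base.upper()
    if b in MISSING:
        return "."
    if b == ref:
        return "0"
    return str(alts.index(b) + 1)       # 1-based index into ALT; ValueError if absent


column = {"BID_1": "C", "CHS_2": "C", "LAM_1": "N", "LAM_2": "N", "LAM_3": "T"}
print({name: haploid_gt(b, "C", ["T"]) for name, b in column.items()})
```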

No ./configure in release tarball despite docs

2.4.0 has no configure script.

I still had to run autoreconf -i -f.

FYI - updated brew package to 2.x series

FYI - https://github.com/Homebrew/homebrew-science/pull/2945

snp-sites -c doesn't produce recognisable alignment for IQtree

Hello,

I am trying to generate a core-genome tree for a local outbreak of a bacterial plant pathogen (Ralstonia solanacearum) by running IQ-TREE on the output of snp-sites -c. I have tried generating the alignment both with snippy-core against a reference strain and from de novo assemblies aligned with MAFFT, but in both cases IQ-TREE crashes because it does not recognise the input file as an alignment. I suspect this is because the output of snp-sites -c is just a multi-FASTA file, but my understanding was that this IQ-TREE functionality is intended specifically for snp-sites output. I have now tried it with another data set and get the same result.

I have attached the SNP alignment from snippy that was fed to iqtree after snp-sites -c, as well as the fconst output and the iqtree error log.
The commands I used to generate the files are:
$ snp-sites -C core.full.aln > fconst_output.txt
$ snp-sites -c core.full.aln > snp-sites.aln
$ iqtree -fconst fconst_output.txt -s snp-sites.txt

With output from snippy:
fconst_output.txt
iqtree_error.log
snp-sites.txt

Let me know if more information is needed!
Thank you!
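In the commands above, fconst_output.txt is passed to -fconst as a file name. As far as we understand IQ-TREE's documentation, -fconst expects the counts themselves as a comma-separated list. Also, snp-sites writes snp-sites.aln while -s is given snp-sites.txt, which may simply reflect renaming for the attachment. The sketch below reads the -C output (assumed to be a single line of four comma-separated integers) and builds the command with the counts inline; iqtree_command is a hypothetical helper.

```python
# Sketch: build an IQ-TREE command with the -C counts passed inline.
# Assumes the -C output is one line "A,C,G,T" of integers; iqtree_command
# is a hypothetical helper.
import shlex


def iqtree_command(alignment, fconst_text):
    counts = fconst_text.strip()
    parts = counts.split(",")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four comma-separated counts, got {counts!r}")
    return ["iqtree", "-s", alignment, "-fconst", counts]


# Placeholder counts for illustration, not real data.
cmd = iqtree_command("snp-sites.aln", "10,20,30,40\n")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True).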

snp-sites docker install issue

Hi,

I have downloaded Docker on my mac (Mojave OS10.14.6), trying to pull the snp-sites container using the command found here: https://quay.io/repository/biocontainers/snp-sites

I unfortunately get the following error:

~$docker pull quay.io/biocontainers/snp-sites
Using default tag: latest
Error response from daemon: manifest for quay.io/biocontainers/snp-sites:latest not found: manifest unknown: manifest unknown

Am I doing something wrong here?

Thanks!
