gatb / discosnp

DiscoSnp is designed for discovering all kinds of SNPs (not only isolated ones), as well as insertions and deletions, from raw set(s) of reads.

Home Page: https://gatb.inria.fr/software/discosnp/

License: GNU Affero General Public License v3.0

CMake 1.08% Shell 17.99% Python 54.22% C++ 25.46% C 1.14% Dockerfile 0.10%

discosnp's People

Contributors

cdeltel, chriou, clemaitre, genscale-admin, natir, pgtb33, pierrepeterlongo, rchikhi, rizkg

discosnp's Issues

error: there was a problem with readFileName Dumping

Hi,

I am trying to call SNPs from raw fasta files in my directory using the following command:

/my/discoSnp/path/discoSnpRAD/run_discoSnpRad.sh -r my_fof.txt

where my_fof.txt is a list of files that I created using the following command:

ls -d $PWD/*.fasta > my_fof.txt

and contains all the fasta files in my current working directory with their full path.

I keep getting the error "there was a problem with readFileName Dumping"

Do you know why this is?

Thanks!

Using Disco SNP to find sex markers; calling on a cohort with males and females

Hi,
I was wondering if anyone has used this tool to genotype multiple individuals. I'm trying to use this tool to find a sex marker in a species with no reference genome. It runs fine on a single sample, but how do you do joint genotyping? Do you merge input fasta files together? Do you have to cluster unitigs together after genotyping each sample individually? Thanks - Robert

update:
I just realized that the VCF file contains two samples (G1 and G2) because I put the R1 and R2 reads for a single sample on separate lines of the input read.txt file. I think this answers my own question. There should be a way of writing the read.txt input file to get R1 and R2 for multiple samples in the same VCF, with genotypes called per sample.
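
For readers with the same question, here is a minimal sketch of one possible input layout. It assumes, as the "fof of fofs" strategy mentioned in another issue on this page suggests, that each line of the top-level file is treated as one sample and that a line may itself point to a file listing that sample's read files. The helper `write_fof_of_fofs`, the sample names, and the file names are all hypothetical.

```python
import tempfile
from pathlib import Path


def write_fof_of_fofs(samples, out_dir, top_name="read.txt"):
    """Write one file-of-files per sample plus a top-level list of them.

    `samples` maps a sample name to the list of its read files (e.g. R1, R2).
    Assumption: the tool treats each line of the top-level file as one sample.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    per_sample = []
    for name, read_files in samples.items():
        fof = out_dir / f"fof_{name}.txt"
        fof.write_text("".join(f"{path}\n" for path in read_files))
        per_sample.append(fof)
    top = out_dir / top_name
    top.write_text("".join(f"{fof}\n" for fof in per_sample))
    return top


if __name__ == "__main__":
    # Hypothetical sample names and read files, written to a scratch directory.
    with tempfile.TemporaryDirectory() as scratch:
        top = write_fof_of_fofs(
            {
                "male_01": ["male_01_R1.fastq.gz", "male_01_R2.fastq.gz"],
                "female_01": ["female_01_R1.fastq.gz", "female_01_R2.fastq.gz"],
            },
            out_dir=scratch,
        )
        print(top.read_text())
```

With such a layout the VCF would be expected to have one genotype column per top-level line, in the same order, but that should be checked against the tool's own documentation.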

Distinguishing inherited variants from de novo variants

Greetings!

I had no issues installing and running discoSnp++ on my data set. This was a pleasant surprise since bioinformatics software can often be problematic. Thanks!

I have a question about interpreting and parsing the results. I analyzed a trio (mother, father, and child) and I'm interested in finding the de novo variants in the child. I called variants on all 3 individuals simultaneously using the "fof of fofs" configuration strategy. The samples are labeled G1, G2, and G3 in the *coherent.vcf file. I'm pretty confident I've figured out which labels apply to which individuals, and now I'm looking for the de novo variants. My plan is to pull out records where filter=PASS and GT=0/0,0/1,0/0 for dad, kid, and mom. It looks like the variants are already sorted by rank.

Is this the correct strategy? Anything I should keep in mind?

Thanks!
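
The plan described above can be sketched as a small filter over the VCF text. This is only an illustration, not a validated de novo caller: the function name `candidate_de_novo` is hypothetical, the sample-to-column mapping is something the user has to supply, and phased (`|`) and unphased (`/`) genotypes are treated alike.

```python
def _genotype(fields, column, gt_index):
    """Return the GT value of one sample column, with '|' normalised to '/'."""
    return fields[column].split(":")[gt_index].replace("|", "/")


def candidate_de_novo(vcf_lines, child, parents, require_filter=None):
    """Yield data lines where the child is heterozygous and both parents are 0/0.

    `child` and `parents` are sample column names from the #CHROM header line.
    If `require_filter` is given (e.g. "PASS"), other FILTER values are skipped.
    """
    columns = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            columns = {name: i for i, name in enumerate(fields)}
            continue
        if columns is None:
            raise ValueError("no #CHROM header line before the first record")
        if require_filter is not None and fields[6] != require_filter:
            continue
        gt_index = fields[8].split(":").index("GT")
        child_gt = _genotype(fields, columns[child], gt_index)
        parent_gts = [_genotype(fields, columns[p], gt_index) for p in parents]
        if child_gt in ("0/1", "1/0") and all(gt == "0/0" for gt in parent_gts):
            yield line


if __name__ == "__main__":
    # Toy records; which of G1, G2, G3 is the child is an assumption here.
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tG1\tG2\tG3"
    record = "c1\t10\t1\tA\tT\t.\tPASS\tTy=SNP\tGT:DP\t0/0:9\t0/1:8\t0/0:7"
    for hit in candidate_de_novo([header, record], "G2", ["G1", "G3"], "PASS"):
        print(hit)
```

A parent called 0/0 at low depth may simply have an unobserved allele, so looking at the DP values of the parents alongside the genotypes is probably worthwhile.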

Unable to open VCF file in IGV - SAMPLE metainformation line

Hello,

I was unable to open the VCF file (with the filename suffix coherent_for_IGV.vcf) produced by discosnp++ in IGV.

The following error message was reported by IGV:

Error loading /mnt/data/uebergabe/gras-di-ref/discosnp_ref/grasdi/bpasc/discoRes_k_31_c_3_D_100_P_3_b_0_coherent_for_IGV.vcf: Unable to parse header with error: Invalid VCFSimpleHeaderLine: key=SAMPLE name=null, for input source: /mnt/data/uebergabe/gras-di-ref/discosnp_ref/grasdi/bpasc/discoRes_k_31_c_3_D_100_P_3_b_0_coherent_for_IGV.vcf

When I manually removed the meta-information line ##SAMPLE=file://discoRes_k_31_c_3_D_100_P_3_b_0_coherentBWA_MEM.sam , the VCF file could be successfully imported by IGV.

I'm new to Discosnp so I don't know if the behavior of IGV may have changed. Has anyone else encountered this? As far as I can tell the header is unchanged between the original VCF and the VCF formatted for IGV. Thanks for your help.

Software versions:

  • IGV v2.16.1 for Linux with bundled Java
  • discosnp v2.6.2 from Bioconda
  • OS: Ubuntu 20.04.5 LTS
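
The manual workaround described above (removing the `##SAMPLE=` meta-information line) can be scripted. A minimal sketch, assuming the line always starts with `##SAMPLE=`; the function names are hypothetical and this is a workaround, not a fix in DiscoSnp itself.

```python
def drop_sample_header(lines):
    """Return the VCF lines without any '##SAMPLE=' meta-information line."""
    return [line for line in lines if not line.startswith("##SAMPLE=")]


def rewrite_vcf(src_path, dst_path):
    """Copy a VCF file to `dst_path`, leaving out '##SAMPLE=' lines."""
    with open(src_path) as src:
        kept = drop_sample_header(src)
    with open(dst_path, "w") as dst:
        dst.writelines(kept)
    return len(kept)


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.1\n",
        "##SAMPLE=file://discoRes_k_31_c_3_D_100_P_3_b_0_coherentBWA_MEM.sam\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    ]
    print("".join(drop_sample_header(demo)), end="")
```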

Number=0/1 invalid vcf?

The VCF v4.1 standard does not seem to support setting Number=0/1 as is done here:

VCF.write('##INFO=<ID=XA,Number=0/1,Type=String,Description="Other mapping positions (chromosome_position). Position is negative in case of Reverse alignment. The position designs the starting position of the alignment, not the position of the variant itself.">\n')

This line breaks attempts at loading the vcf using e.g. PyVCF. Setting Number=. seems to be a supported way of solving this issue.
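
Until the header is fixed at the source, an existing file can be patched before it is handed to a parser. A minimal sketch of the `Number=.` substitution suggested above; the helper name `fix_number_field` is hypothetical, and only `##INFO` and `##FORMAT` lines are touched.

```python
import re

# Matches 'Number=0/1' inside an ##INFO or ##FORMAT meta-information line.
_BAD_NUMBER = re.compile(r"^(##(?:INFO|FORMAT)=<.*?\bNumber=)0/1(?=[,>])")


def fix_number_field(line):
    """Replace the unsupported 'Number=0/1' with 'Number=.' in a header line."""
    return _BAD_NUMBER.sub(r"\g<1>.", line)


if __name__ == "__main__":
    bad = '##INFO=<ID=XA,Number=0/1,Type=String,Description="Other mapping positions">'
    print(fix_number_field(bad))
```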

EXCEPTION: cannot open ./trashme_3684_dsk_partitions_gatb/parts.360

Hello,

I previously ran DiscoSnp (reference-free) successfully with c 3, k 31, b 0 and no low complexity, but the genotyping rate is very low (0.036) after filtering. I tried changing to k 51 and b 1 to improve that, but I'm getting the error below. Why is the genotyping rate so low in the first run, and what can I do to improve it? How can I try different parameters without getting this error? I can see that the folder trashme_3684_dsk_partitions_gatb was created, but the run has crashed.

[DSK: Pass 1/2, Step 1: partitioning ] 25 % elapsed: 207 min 7 sec remaining: 621 min 22 sec cpu: 686.9 % mem: [2744, 2744, 2744] MB Error: can't create output directory (./trashme_3684_dsk_partitions_gatb/)
debug, doesexist:0created directory ./trashme_3684_dsk_partitions_gatb/

EXCEPTION: cannot open ./trashme_3684_dsk_partitions_gatb/parts.360 Too many open files
there was a problem with graph construction$ reset

Thanks.
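
"Too many open files" is the standard message for a process hitting its limit on open file descriptors, and the log shows partition files numbered up to at least `parts.360`. A minimal sketch for inspecting that limit from Python on a Unix system (the `resource` module is not available on Windows). The helper names and the `headroom` value are assumptions for illustration; in a shell the same limit is shown by `ulimit -n`.

```python
import resource  # Unix only


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def enough_descriptors(needed, soft_limit, headroom=64):
    """Rough check that `needed` files fit under the soft limit.

    `headroom` is an arbitrary allowance for descriptors that are already
    in use (stdin, stdout, libraries, ...); it is an assumption, not a rule.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= needed + headroom


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    # 361 is a guess based on 'parts.360' in the log above.
    print("likely sufficient:", enough_descriptors(361, soft))
```

This only addresses the crash, not the question about the low genotyping rate.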

Call to python 2 in run_discoSNP++.sh

Would it be possible to replace python with python3 at line 69 of run_discoSNP++.sh, so that the whole project depends on python3?
The change doesn't seem to have any impact.

https://github.com/GATB/DiscoSnp/blob/master/run_discoSnp%2B%2B.sh#L69

-EDIR=$( python -c "import os.path; print(os.path.dirname(os.path.realpath(\"${BASH_SOURCE[0]}\")))" ) # as suggested by Philippe Bordron 
+ EDIR=$( python3 -c "import os.path; print(os.path.dirname(os.path.realpath(\"${BASH_SOURCE[0]}\")))" ) # as suggested by Philippe Bordron 

Another solution would be to not use Python at all here:

-EDIR=$( python -c "import os.path; print(os.path.dirname(os.path.realpath(\"${BASH_SOURCE[0]}\")))" ) # as suggested by Philippe Bordron 
+EDIR=$( dirname $( realpath ${BASH_SOURCE[0]} ) )

I find this better, as the script becomes bash-only again (except for the calls to the project's own tools, of course).

For information: we run on minimal environments and always omit python2, which is deprecated.

SNP ID in VCF file with reference genome

Hello,
I ran the same data set against different reference genomes. The fasta file with the de novo SNPs is created before mapping to the reference genome. Can I compare the SNP IDs from two VCF files produced with different reference genomes but the same data set?

In other words, if I have the SNP SNP_lower_path_178064, does the ID 178064 refer to the same SNP in both VCF files?

best regards
Stephan

Mixing of 0-based and 1-based positions in the output vcf file of DiscoSNP++

Hello Pierre,

I use the bioconda discoSNP++ version (2.5.4).

I found a bug in the positions of the SNPs given in the vcf output of discoSNP++ (unless I misunderstood something).

When there are several SNPs on the same sequence, their positions are 1-based, and when there is only one SNP, its position is 0-based.

Here is an extract of the discoRes_k_31_c_3_D_100_P_3_b_0_coherent.vcf:

SNP_higher_path_99998   55      99998_1 A       G       .       .       Ty=SNP;Rk=1;UL=24;UR=11;CL=24;CR=11;Genome=.;Sd=.       GT:DP:PL:AD:HQ  1|1:18:364,58,5:0,18:0,67       0|0:37:6,116,744:37,0:72,0
SNP_higher_path_99998   75      99998_2 G       A       .       .       Ty=SNP;Rk=1;UL=24;UR=11;CL=24;CR=11;Genome=.;Sd=.       GT:DP:PL:AD:HQ  1|1:18:364,58,5:0,18:0,67       0|0:37:6,116,744:37,0:72,0
SNP_higher_path_99998   101     99998_3 A       C       .       .       Ty=SNP;Rk=1;UL=24;UR=11;CL=24;CR=11;Genome=.;Sd=.       GT:DP:PL:AD:HQ  1|1:18:364,58,5:0,18:0,67       0|0:37:6,116,744:37,0:72,0
SNP_higher_path_99996   684     99996   A       T       .       .       Ty=SNP;Rk=1;UL=192;UR=277;CL=654;CR=778;Genome=.;Sd=.   GT:DP:PL:AD:HQ  1/1:19:384,61,5:0,19:0,70       0/0:13:5,43,264:13,0:72,0
SNP_higher_path_99994   209     99994   A       T       .       .       Ty=SNP;Rk=1;UL=57;UR=10;CL=179;CR=252;Genome=.;Sd=.     GT:DP:PL:AD:HQ  0/0:7:5,25,144:7,0:70,0 1/1:12:244,40,5:0,12:0,72
SNP_higher_path_99991   205     99991   C       G       .       .       Ty=SNP;Rk=1;UL=115;UR=173;CL=175;CR=205;Genome=.;Sd=.   GT:DP:PL:AD:HQ  0/0:18:5,58,364:18,0:70,0       1/1:22:444,70,5:0,22:0,73
SNP_higher_path_99990   94      99990_1 A       G       .       .       Ty=SNP;Rk=1;UL=63;UR=109;CL=63;CR=287;Genome=.;Sd=.     GT:DP:PL:AD:HQ  1|1:17:344,55,5:0,17:0,69       0|0:14:5,46,284:14,0:71,0
SNP_higher_path_99990   95      99990_2 G       C       .       .       Ty=SNP;Rk=1;UL=63;UR=109;CL=63;CR=287;Genome=.;Sd=.     GT:DP:PL:AD:HQ  1|1:17:344,55,5:0,17:0,69       0|0:14:5,46,284:14,0:71,0

And the corresponding discoRes_k_31_c_3_D_100_P_3_b_0_coherent.fa:

>SNP_higher_path_99998
ataactgcaccattttcaacccaaACACTACTCCACAAGTAAAAAGTCACCCAAAACAAATCAAGTTCAAACACGAACAAGCGACGGCAACCTCAACAGTAATTATGGCTCTCATTAGCAACTAACGTTCGcacaaggttca
>SNP_lower_path_99998
ataactgcaccattttcaacccaaACACTACTCCACAAGTAAAAAGTCACCCAAGACAAATCAAGTTCAAACACAAACAAGCGACGGCAACCTCAACAGTCATTATGGCTCTCATTAGCAACTAACGTTCGcacaaggttca

>SNP_higher_path_99996
agcatggatggaagtgccttgaatagctgtacttggtggtgcctcctagtgtatatgtcgtctttgattatccctgagaacagaaaaagaggaacaatttatggggagtggtgttgttatttcagatgatgctgaggaagccttttctcgtgctctcaagcttttgttggtttgtgattcttctccttctcaatcttgtgtctgatgtttgtgacttgagctattatttatacgcttcaacgcacatggttaatctttttgtttcctgcatcttgtagagtaaaagcttggtgaatgatggccagcatgtcactcttgtccacagtggagcacgaccaatctggcagaaatcatgttgagtagttggcatctaggttttattttgtaatttattcttacatgcagttgttttttttagctcaatttattctttctgcgatgatgagatatagaagttccttgagcgtggtttattatctttcaaagttatctgtggatgaaagattgcagttagatatcttttttgttgcattatagtttggatttcaccgtgtctagtcatcatcatctcagttctttctggattttctagttttactagacctctgaaacaaatacatcaacattgcggataaaacaaaacaatagaaaggtATTCTCTTTTGTAGTCTCTAGTGTAGTGTTACTTTTCTCAGAATTTGCATTTGATGACCTGaatccttttcggttagtcatattttactagtaaatgataaacttatgttctttgttgaactaattttgtgaggaaaatgcttcgagctctgaattgcatatgaaattaatgagatcttttaatacatggaactttttgggtacagtagtacttttcaacagtttcctggatacaggataaaccaatccaatttacatatcaatccaaaatgcttgtcaatttagaactaatatgatggaacaaacttatgcagcaacaacgagaaacgatgaacctagaaagtaagagttgtccagttccctggagacttctgcagagctgggcattttgaattgattgagttggagcattgccgagggaggaaatccaaataccaatcataaactctactcagaatgttgttggggtggctgcacatcttcctcttattcatatggttttgcaaaacatgcctaaggtatgatccatgtcaggtacgttagacttcacacttgtcctctgcctcatgttcctgatacttctcgctatgcttaatattcttgtaccacttagcacatgctatgtatgcagccaaatctattaaggtcaaaccagctaatagaaaatagaatctgtctaaatgacctttgttaaggtttcctggtatccatccaggcatgttgtcagttgctgatattttcatcactatgctcacgagaaagctactcacgtagttccccaatgatatggatgtcatacatagtgcacttccaaagcttttaagtccatcaggtgcttg
>SNP_lower_path_99996
agcatggatggaagtgccttgaatagctgtacttggtggtgcctcctagtgtatatgtcgtctttgattatccctgagaacagaaaaagaggaacaatttatggggagtggtgttgttatttcagatgatgctgaggaagccttttctcgtgctctcaagcttttgttggtttgtgattcttctccttctcaatcttgtgtctgatgtttgtgacttgagctattatttatacgcttcaacgcacatggttaatctttttgtttcctgcatcttgtagagtaaaagcttggtgaatgatggccagcatgtcactcttgtccacagtggagcacgaccaatctggcagaaatcatgttgagtagttggcatctaggttttattttgtaatttattcttacatgcagttgttttttttagctcaatttattctttctgcgatgatgagatatagaagttccttgagcgtggtttattatctttcaaagttatctgtggatgaaagattgcagttagatatcttttttgttgcattatagtttggatttcaccgtgtctagtcatcatcatctcagttctttctggattttctagttttactagacctctgaaacaaatacatcaacattgcggataaaacaaaacaatagaaaggtATTCTCTTTTGTAGTCTCTAGTGTAGTGTTTCTTTTCTCAGAATTTGCATTTGATGACCTGaatccttttcggttagtcatattttactagtaaatgataaacttatgttctttgttgaactaattttgtgaggaaaatgcttcgagctctgaattgcatatgaaattaatgagatcttttaatacatggaactttttgggtacagtagtacttttcaacagtttcctggatacaggataaaccaatccaatttacatatcaatccaaaatgcttgtcaatttagaactaatatgatggaacaaacttatgcagcaacaacgagaaacgatgaacctagaaagtaagagttgtccagttccctggagacttctgcagagctgggcattttgaattgattgagttggagcattgccgagggaggaaatccaaataccaatcataaactctactcagaatgttgttggggtggctgcacatcttcctcttattcatatggttttgcaaaacatgcctaaggtatgatccatgtcaggtacgttagacttcacacttgtcctctgcctcatgttcctgatacttctcgctatgcttaatattcttgtaccacttagcacatgctatgtatgcagccaaatctattaaggtcaaaccagctaatagaaaatagaatctgtctaaatgacctttgttaaggtttcctggtatccatccaggcatgttgtcagttgctgatattttcatcactatgctcacgagaaagctactcacgtagttccccaatgatatggatgtcatacatagtgcacttccaaagcttttaagtccatcaggtgcttg

>SNP_higher_path_99994
ttttttcaattttggttgtgttgttttttttgagtgtagcattgcaggcaaaaacaatatacagttgggaatacatgtcaccagacggactggctttaaactccaagtggaatgaagctgagaaatatatctgcaatcctttatcaggggaagtcccattagaatgtttatctgcaaaaACACTAAGTGGAAGATCATTTCGACAATTAACCAATAAAATCACCATGTCTGCACCTTTGAtttatccttcacaatatcagtgtgctcgacgattcaatccaaaacctcttacaaaagtagtacctcatgtgcctccacatcaactacaaatacaaattccaagcaaaggtatatatacatatatacatttcctctctttccatttcctcattactacatatatatatacatactccgatttctaacttgtacgtttttattattcaatttagagagaattggtagcattacaagagatgtgggaacacagag
>SNP_lower_path_99994
ttttttcaattttggttgtgttgttttttttgagtgtagcattgcaggcaaaaacaatatacagttgggaatacatgtcaccagacggactggctttaaactccaagtggaatgaagctgagaaatatatctgcaatcctttatcaggggaagtcccattagaatgtttatctgcaaaaACACTAAGTGGAAGATCATTTCGACAATTATCCAATAAAATCACCATGTCTGCACCTTTGAtttatccttcacaatatcagtgtgctcgacgattcaatccaaaacctcttacaaaagtagtacctcatgtgcctccacatcaactacaaatacaaattccaagcaaaggtatatatacatatatacatttcctctctttccatttcctcattactacatatatatatacatactccgatttctaacttgtacgtttttattattcaatttagagagaattggtagcattacaagagatgtgggaacacagag

>SNP_higher_path_99991
cattctccctaagcgcctctcttggcccagtcaccatcggttcattaccagcaagcatatttgttatcaattgcttgatttcatccatggcaccaacaactttcgaatccacacttctgctcagttccgagatggatccaacaacaccttccatggatcctttcaaattgtcaatCTCCCTGCGCATTTCCTGCAAGCTAGTGTTCTGATCACCCACACTCACAGGTCGAAAGAGCtctgaagctctgataccaacattacaactagcttcagagaacttctaattgattaagacttgctaattctttacttttctcaacaacgaacttcaactcattccttccaatctttttatagagtattacaagtaataggaaatttggaggaaaaaacagtaacaactaccttccgaaatgaacagtaaacttctaacagctttcc
>SNP_lower_path_99991
cattctccctaagcgcctctcttggcccagtcaccatcggttcattaccagcaagcatatttgttatcaattgcttgatttcatccatggcaccaacaactttcgaatccacacttctgctcagttccgagatggatccaacaacaccttccatggatcctttcaaattgtcaatCTCCCTGCGCATTTCCTGCAAGCTAGTGTTGTGATCACCCACACTCACAGGTCGAAAGAGCtctgaagctctgataccaacattacaactagcttcagagaacttctaattgattaagacttgctaattctttacttttctcaacaacgaacttcaactcattccttccaatctttttatagagtattacaagtaataggaaatttggaggaaaaaacagtaacaactaccttccgaaatgaacagtaaacttctaacagctttcc

>SNP_higher_path_99990
ccttacttttaaacttttgacaattcatactcgtgatacgttttaattccatatggctatcaaATCCATCAATAGTATATGGACTCTAGTGTTAGCTTTTATCTCACCTTTTGAATTTCTCAACATcttctacagctactgaagatggagtatagatattgctttcatctcaatatctataaagtaaatattgaagtattgagtatcgtatcgaaataatttcttatatatacaactaatacattatacacctgcaaacgcttaagctcatatagtcttcacatataataattagctatatactatttgtcaataaccaaaatattctgaaactcacattttcaattatttcatattacttgcgtttgcctttccaattatgttcatgattccaacactttaaaaaaaaaaaa
>SNP_lower_path_99990
ccttacttttaaacttttgacaattcatactcgtgatacgttttaattccatatggctatcaaATCCATCAATAGTATATGGACTCTAGTGTTGCTTTTTATCTCACCTTTTGAATTTCTCAACATcttctacagctactgaagatggagtatagatattgctttcatctcaatatctataaagtaaatattgaagtattgagtatcgtatcgaaataatttcttatatatacaactaatacattatacacctgcaaacgcttaagctcatatagtcttcacatataataattagctatatactatttgtcaataaccaaaatattctgaaactcacattttcaattatttcatattacttgcgtttgcctttccaattatgttcatgattccaacactttaaaaaaaaaaaa

The real positions are:

SNP_higher_path_99998   55      99998_1 A       G       .       .       Ty=SNP;Rk=1;UL=24;UR=11;CL=24;CR=11;Genome=.;Sd=.       GT:DP:PL:AD:HQ  1|1:18:364,58,5:0,18:0,67       0|0:37:6,116,744:37,0:72,0
SNP_higher_path_99998   75      99998_2 G       A       .       .       Ty=SNP;Rk=1;UL=24;UR=11;CL=24;CR=11;Genome=.;Sd=.       GT:DP:PL:AD:HQ  1|1:18:364,58,5:0,18:0,67       0|0:37:6,116,744:37,0:72,0
SNP_higher_path_99998   101     99998_3 A       C       .       .       Ty=SNP;Rk=1;UL=24;UR=11;CL=24;CR=11;Genome=.;Sd=.       GT:DP:PL:AD:HQ  1|1:18:364,58,5:0,18:0,67       0|0:37:6,116,744:37,0:72,0
SNP_higher_path_99996   685     99996   A       T       .       .       Ty=SNP;Rk=1;UL=192;UR=277;CL=654;CR=778;Genome=.;Sd=.   GT:DP:PL:AD:HQ  1/1:19:384,61,5:0,19:0,70       0/0:13:5,43,264:13,0:72,0
SNP_higher_path_99994   210     99994   A       T       .       .       Ty=SNP;Rk=1;UL=57;UR=10;CL=179;CR=252;Genome=.;Sd=.     GT:DP:PL:AD:HQ  0/0:7:5,25,144:7,0:70,0 1/1:12:244,40,5:0,12:0,72
SNP_higher_path_99991   206     99991   C       G       .       .       Ty=SNP;Rk=1;UL=115;UR=173;CL=175;CR=205;Genome=.;Sd=.   GT:DP:PL:AD:HQ  0/0:18:5,58,364:18,0:70,0       1/1:22:444,70,5:0,22:0,73
SNP_higher_path_99990   94      99990_1 A       G       .       .       Ty=SNP;Rk=1;UL=63;UR=109;CL=63;CR=287;Genome=.;Sd=.     GT:DP:PL:AD:HQ  1|1:17:344,55,5:0,17:0,69       0|0:14:5,46,284:14,0:71,0
SNP_higher_path_99990   95      99990_2 G       C       .       .       Ty=SNP;Rk=1;UL=63;UR=109;CL=63;CR=287;Genome=.;Sd=.     GT:DP:PL:AD:HQ  1|1:17:344,55,5:0,17:0,69       0|0:14:5,46,284:14,0:71,0

Each SNP where the ID is XXXXX is 0-based, and when it's XXXXX_X it's 1-based.

I confirmed this on a 1 million SNPs vcf file.
My command line was
run_discoSnp++.sh --max_threads 16 -T -r ../file.fof

Do you have any idea where this comes from?

Best regards,
Jordi
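
The comparison made by hand above (VCF position against the first differing base of the two paths) can be automated. A minimal sketch that lists the positions where a higher and a lower path differ, in either convention; `mismatch_positions` is a hypothetical helper and the sequences in the demo are toy data, not taken from the files above.

```python
def mismatch_positions(higher, lower, one_based=True):
    """Positions where two equal-length paths differ (case-insensitive)."""
    if len(higher) != len(lower):
        raise ValueError("paths differ in length; this sketch handles SNPs only")
    offset = 1 if one_based else 0
    pairs = zip(higher.upper(), lower.upper())
    return [i + offset for i, (a, b) in enumerate(pairs) if a != b]


if __name__ == "__main__":
    higher = "acgTACGTac"
    lower = "acgTAGGTac"
    print("1-based:", mismatch_positions(higher, lower))
    print("0-based:", mismatch_positions(higher, lower, one_based=False))
```

Comparing the 1-based list with the POS column of each record shows directly which records are off by one.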

Multiple reference alleles output

Hello,

I'm running DiscoSNP with -c 3 --branching 0 --no_low_complexity with a reference genome. Many SNPs that map to genome are showing multiple reference alleles (see below example). I've used DiscoSNP before both with and without genome, but never faced this issue.

Scaffold_1 ACGGCA
Scaffold_1 AG
Scaffold_1 AGA
Scaffold_1 AGAAAGGCGGAAAAACATTCGAAAGATAGTG

QUESTION- Is it possible to re-start a run?

Hi,

This may seem like a silly question, but I was running discosnp on a computer cluster that killed the job in the KISSREADS module for exceeding the requested memory. Is it possible to just start again from the command for that module?

Thanks.
Craig

vcf for IGV won't load in IGV

I'm using a freshly (yesterday) compiled version of DiscoSNP from the master branch.
IGV is v2.3 (but I also tested with beta 3.0).
The error I'm getting is:
The provided VCF is malformed at approximately line number 405: Duplicate allele added to VariantContext: A, for input source: /mydata/discoRes_k_109_c_auto_D_100_P_1_b_0_coherent_for_IGV.vcf

discoRes_k_109_c_auto_D_100_P_1_b_0_coherent_for_IGV.vcf.zip

DiscoSnp VCF record; Ref allele does not match Assembly Used

Hello,
Thanks for this great tool. I just have one concern.

I ran DiscoSNP using the following command:

Running discoSnp++ 2.3.X, in directory /home/robert/tools/DiscoSnp with following parameters:
read_sets=mahi_file.list
prefix=discoRes_k_31_c_3
c=3
C=2147483647
k=31
b=0
d=1
D=40
s=
P=3
p=discoRes
G=/home/robert/assembly/MAHI_genome.fasta
e=
starting date=Thu 28 Oct 2021 09:15:24 AM ADT

I also generated a second VCF by mapping the fasta to the same reference as follows:

ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -M -t 10 -v 3 -R @RG\tID:F_003_\tLB:L001\tSM:F_003_\tPL:ILLUMINA /home/robert/assembly/MAHI_genome.fasta.gz /data/MAHI/F_003_i/F-003-i_S82_L002_R1_001.fastq.gz /data/MAHI/F_003/F-003-_S82_L002_R2_001.fastq.gz

The only difference was that for the VCF generated by read mapping the assembly was bgzipped.

I then wanted to do some concordance between the VCFs using the VCF generated by read mapping as TRUTH and the DiscoSnp VCF as the CALL set.

At this point I noticed that for some records the ref alleles do not match. For example, here's a TRUTH VCF record:

ptg000002l 799357 . C A 4761.33 PASS AB=0.548571;ABP=6.59633;AC=28;AF=0.33;AN=86;AO=176;CIGAR=1X;DP=511;DPB=511;DPRA=0.994422;EPP=5.42853;EPPR=44.2023;GTI=2;LEN=1;MEANALT=1.08;MQM=60;MQMR=60;NS=49;NUMALT=1;ODDS=0.545245;PAIRED=0.988636;PAIREDR=0.99696;PAO=0;PQA=0;PQR=0;PRO=0;QA=6338;QR=11961;RO=329;RPL=81;RPP=5.42853;RPPR=3.54492;RPR=95;RUN=1;SAF=106;SAP=19.0002;SAR=70;SRF=192;SRP=22.976;SRR=137;TYPE=snp;technology.ILLUMINA=1 GT:AD:AO:DP:FT:GQ:PL:QA:QR:RO ./.:1,0:0:1:DP4:11:0,3,37:0:37:1 1/1:0,6:6:6:PASS:19:203,18,0:222:0:0 0/1:7,6:6:13:PASS:99:164,0,198:222:259:7 0/1:2,7:7:9:PASS:54:210,0,43:259:74:2 0/0:15,0:0:15:PASS:53:0,45,481:0:531:15 0/0:12,0:0:12:PASS:44:0,36,403:0:444:12 0/1:3,4:4:7:PASS:90:116,0,82:148:111:3 0/0:8,0:0:8:PASS:32:0,24,270:0:296:8 0/0:10,0:0:10:PASS:38:0,30,313:0:344:10 1/1:0,9:9:9:PASS:28:292,27,0:321:0:0 0/1:4,8:8:13:PASS:99:232,0,99:296:148:4 0/0:11,0:0:11:PASS:41:0,33,359:0:395:11 1/1:0,11:11:11:PASS:34:370,33,0:407:0:0 0/0:7,0:0:7:PASS:29:0,21,226:0:247:7 0/1:6,9:9:15:PASS:99:258,0,147:333:210:6 0/0:9,0:0:9:PASS:35:0,27,303:0:333:9 0/1:4,3:3:7:PASS:82:82,0,116:111:148:4 0/0:18,0:0:18:PASS:62:0,54,603:0:666:18 0/1:5,12:12:17:PASS:99:341,0,119:432:185:5 1/1:0,21:21:21:PASS:64:657,63,0:727:0:0 0/0:9,0:0:9:PASS:35:0,27,292:0:321:9 0/0:4,0:0:4:PASS:20:0,12,137:0:148:4 0/0:11,0:0:11:PASS:41:0,33,370:0:407:11 0/0:11,0:0:11:PASS:41:0,33,370:0:407:11 0/0:15,0:0:15:PASS:53:0,45,492:0:543:15 0/0:20,0:0:20:PASS:68:0,60,669:0:740:20 0/1:5,7:7:12:PASS:99:190,0,110:247:159:5 ./.:0,3:3:3:DP4:10:104,9,0:111:0:0 0/1:10,5:5:15:PASS:99:114,0,291:173:370:10 0/1:6,5:5:11:PASS:99:137,0,170:185:222:6 0/0:16,0:0:16:PASS:56:0,48,536:0:592:16 1/1:0,7:7:7:PASS:22:213,21,0:233:0:0 0/1:10,5:5:15:PASS:99:125,0,268:185:344:10 1/1:0,11:11:11:PASS:34:346,33,0:381:0:0 0/1:8,6:6:14:PASS:99:161,0,204:222:270:8 0/0:14,0:0:14:PASS:50:0,42,470:0:518:140/1:4,9:9:13:PASS:99:264,0,98:333:148:4 1/1:0,11:11:16:PASS:0:366,33,0:407:0:0 0/0:11,0:0:11:PASS:41:0,33,370:0:407:11 
0/1:2,5:5:7:PASS:57:149,0,49:185:74:2 0/0:9,0:0:9:PASS:35:0,27,292:0:321:9 0/0:10,0:0:10:PASS:38:0,30,337:0:370:10 0/0:9,0:0:9:PASS:35:0,27,303:0:333:9 0/0:15,0:0:15:PASS:53:0,45,492:0:543:15 0/0:5,0:0:5:PASS:23:0,15,170:0:185:5

And here's the DiscoSnp record:

ptg000002l 799357 1589465 A G . PASS Ty=SNP;Rk=0.44587;UL=58;UR=50;CL=396;CR=395;Genome=C;Sd=-1 GT:DP:PL:AD:HQ 0/0:7:5,25,144:7,0:68,0 0/0:6:5,22,124:6,0:70,0 0/0:9:5,31,184:9,0:70,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 0/0:5:4,19,104:5,0:70,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 0/0:9:5,31,184:9,0:68,00/0:10:15,24,174:9,1:70,44 ./.:0:.,.,.:0,0:0,0 0/0:15:5,49,304:15,0:70,0 ./.:0:.,.,.:0,0:0,0 0/0:10:5,34,204:10,0:70,0 ./.:0:.,.,.:0,0:0,0 ./.:3:.,.,.:3,0:70,0 ./.:0:.,.,.:0,0:0,0 0/0:15:5,49,304:15,0:67,0 0/0:22:5,70,444:22,0:68,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 0/0:7:5,25,144:7,0:68,0 ./.:5:.,.,.:5,0:67,0 0/0:5:4,19,104:5,0:70,0 ./.:0:.,.,.:0,0:0,0 0/0:6:5,22,124:6,0:70,0 ./.:5:.,.,.:5,0:70,0 0/0:12:5,40,244:12,0:67,0 0/0:7:5,25,144:7,0:70,0 ./.:0:.,.,.:0,0:0,0 0/0:10:5,34,204:10,0:70,0 0/1:16:52,20,212:12,4:70,70 ./.:0:.,.,.:0,0:0,0 0/0:5:4,19,104:5,0:70,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0

And when I look at the assembly in this region I get this:

bedtools getfasta -fi MAHI_genome.fasta -bed var.bed

ptg000002l:799355-799359
CCTC

There's no A here. The start in the bed is 0-based so the second C is the REF being reported. I can't figure out what the problem is or why DiscoSnp is reporting a REF allele that does not appear to exist.

Any ideas? Thank you. - Robert
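
Records like this can be flagged automatically by comparing REF and ALT with the `Genome` value in the INFO column. The sketch below is a diagnostic only: it reports which allele, or which reverse complement, equals the `Genome` value and makes no claim about why a mismatch occurs. The function names and the returned labels are hypothetical.

```python
# Translation table used to complement a DNA allele.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_info(info):
    """Turn 'Ty=SNP;Rk=1;Genome=C;Sd=-1' into a dict of strings."""
    return dict(item.split("=", 1) for item in info.split(";") if "=" in item)


def compare_with_genome(ref, alt, info):
    """Describe how REF and ALT relate to the 'Genome' INFO value."""
    genome = parse_info(info).get("Genome", ".")
    if genome == ".":
        return "no genome allele"
    for label, allele in (("REF", ref), ("ALT", alt)):
        if allele == genome:
            return f"{label} matches genome"
    for label, allele in (("REF", ref), ("ALT", alt)):
        if allele.translate(_COMPLEMENT)[::-1] == genome:
            return f"reverse complement of {label} matches genome"
    return "neither allele matches genome"


if __name__ == "__main__":
    # REF, ALT and INFO copied from the DiscoSnp record quoted above.
    info = "Ty=SNP;Rk=0.44587;UL=58;UR=50;CL=396;CR=395;Genome=C;Sd=-1"
    print(compare_with_genome("A", "G", info))
```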

comparison between pointer and zero

DiscoSnp can't be compiled with Clang 4.0, which raised the severity of such comparisons to an error.
tools/kissreads2/src/interface_libchash.cpp contains a comparison between a (long*) iterator and 0 in two places:

/DiscoSnp/tools/kissreads2/src/interface_libchash.cpp:99:16: error: 
      ordered comparison between pointer and zero ('hash_iter' (aka 'long *')
      and 'int')
         if (iterator < 0)
             ~~~~~~~~ ^ ~
/DiscoSnp/tools/kissreads2/src/interface_libchash.cpp:133:16: error: 
      ordered comparison between pointer and zero ('hash_iter' (aka 'long *')
      and 'int')
         if (iterator < 0)
             ~~~~~~~~ ^ ~

Type of Variant (Ty) format error in .vcf

Hello,

I'm doing de novo snp calling and the .vcf file has formatting errors for Type of Variant (Ty). Some lines show Ty=SNP but some show TySNP. Please see below.

Thank you.

Command: ./run_discoSnp++.sh -r Test1/fof.txt -T -c 3 -u 24 &>> DiscoSNP_log.txt

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT G1 G2
SNP_higher_path_99972 483 99972 A T . . Ty=SNP;Rk=1;UL=31;UR=26;CL=452;CR=245;Genome=.;Sd=. GT:DP:PL:AD:HQ 0/0:7:5,25,144:7,0:70,0 1/1:18:364,58,5:0,18:0,68
SNP_higher_path_99962 35 99962 A T . . Ty=SNP;Rk=1;UL=4;UR=53;CL=4;CR=194;Genome=.;Sd=. GT:DP:PL:AD:HQ 0/0:5:4,19,104:5,0:67,0 1/1:12:244,40,5:0,12:0,70
SNP_higher_path_99933 591 99933_1 C T . . TySNP;Rk=1;UL=275;UR=22;CL=560;CR=22;Genome=.;Sd=. GT:DP:PL:AD:HQ 0|0:7:5,25,144:7,0:65,0 1|1:7:144,25,5:0,7:0,68
SNP_higher_path_99933 594 99933_2 G A . . TySNP;Rk=1;UL=275;UR=22;CL=560;CR=22;Genome=.;Sd=. GT:DP:PL:AD:HQ 0|0:7:5,25,144:7,0:65,0 1|1:7:144,25,5:0,7:0,68
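
As a stopgap for files that already contain the malformed key, the INFO column can be patched after the fact. A minimal sketch, assuming tab-separated records and that the only damage is a missing `=` after `Ty` for the values `SNP`, `INS` or `DEL`; the helper names are hypothetical and this does not address the cause in the VCF writer.

```python
import re

# 'TySNP', 'TyINS' or 'TyDEL' as a whole INFO entry (missing '=').
_MISSING_EQUALS = re.compile(r"(^|;)Ty(SNP|INS|DEL)(?=;|$)")


def repair_info(info):
    """Insert the missing '=' in a 'TySNP'-style INFO entry."""
    return _MISSING_EQUALS.sub(r"\1Ty=\2", info)


def repair_line(line):
    """Repair the INFO column (8th field) of one tab-separated VCF data line."""
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 7:
        fields[7] = repair_info(fields[7])
    return "\t".join(fields) + newline


if __name__ == "__main__":
    print(repair_info("TySNP;Rk=1;UL=275;UR=22;CL=560;CR=22;Genome=.;Sd=."))
```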

discoSnpRAD and short_read_connector

Hi there,

Apparently to run DiscoSnpRAD I need to have short_read_connector installed:

"src_path: absolute path to short_read_connector directory, containing the "short_read_connector.sh" file."

I installed it from https://github.com/GATB/short_read_connector, but there doesn't seem to be a short_read_connector.sh - instead, there are two .sh files, short_read_connector_counter.sh and short_read_connector_linker.sh.

I assume I'm missing something - what should I do?

Thanks and all the best,

Joanna

New release needed

Dear Pierre,

I've noticed you made some updates in the GitHub version (especially to DiscoSnpRad), but no new release has been made (to use with conda, for example, @lecorguille). Would it be possible to make one, please?
Thanks!

Best,
Komlan.

Options for .vcf filtering

Hello,

Firstly thank you for fixing issue #27. I tested it and the .vcf looks fine now.

I wanted some clarifications when using discosnp++ de novo (no reference genome), regarding the parameters output in the .vcf file:

  1. If I want to filter data based on Number of Samples with Data, SNP Quality, Allele Frequencies etc., how do I go about outputting this information to the .vcf? I had a look at the vcf_creator_user_guide.pdf, but these parameters are not mentioned and the .vcf file doesn't contain them either.

  2. It seems that filter Pass / Multiple (variant mapping at unique position or on multiple positions) is only output when a reference genome is used. When there is no reference genome, isn't it still possible to classify unique and multiple mapping positions since reads are assigned to de novo contigs?

Thank you.
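
Regarding point 1, two of those quantities can be derived from the genotype columns that are already in the file. A minimal sketch that counts the samples with a called genotype and computes the ALT allele frequency for one record; `genotype_stats` is a hypothetical helper, and these values are computed from GT, not read from any field DiscoSnp writes.

```python
def genotype_stats(sample_fields, gt_index=0):
    """Return (samples with data, ALT allele frequency) for one VCF record.

    `sample_fields` are the per-sample columns, e.g. '0/0:7:5,25,144:7,0:70,0'.
    Samples with a missing genotype ('./.') are ignored; the frequency is
    None when no sample is genotyped.
    """
    called = 0
    alt_alleles = 0
    total_alleles = 0
    for field in sample_fields:
        gt = field.split(":")[gt_index].replace("|", "/")
        alleles = gt.split("/")
        if "." in alleles:
            continue
        called += 1
        total_alleles += len(alleles)
        alt_alleles += sum(1 for allele in alleles if allele != "0")
    frequency = alt_alleles / total_alleles if total_alleles else None
    return called, frequency


if __name__ == "__main__":
    samples = ["0/0:7:5,25,144:7,0:70,0", "1/1:18:364,58,5:0,18:0,68", "./.:0:.,.,.:0,0:0,0"]
    print(genotype_stats(samples))
```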

error: there was a problem with readFileName Dumping

Hi Pierre,

I am reopening this issue as it was not resolved.

Thank you for the suggestion, although both methods (what I initially ran and what you suggested) for me produce the same file where each line has the full path and the file name as such:

/my/current/path/file1.fasta
/my/current/path/file2.fasta

And I am still getting this error:

path/to/run_discoSnpRad.sh: line 553: /my/current/path../bin/read_file_names: No such file or directory
"there was a problem with readFileName Dumping"

I also tried putting the script in the directory with the files and just having a list of files without the full paths. This produced the same error.

I also tried running the simple_test.sh script and got the following error:

"there was a problem with graph construction$ reset
diff: discoRes_k_31_c_3_D_100_P_3_b_0_coherent.fa: No such file or directory
*** Test: FAILURE on diff .fa"

Best!
Coral

Error parsing sam input in VCF_creator.py

A small fraction of reads with split alignments fail when calling VCF, e.g., this file.

Traceback (most recent call last): File "/apps/unit/MikheyevU/DiscoSnp/2.3.0/scripts/VCF_creator.py", line 120, in <module> if int(listline1[1]) & 2048 :#checks if it's not a secondary alignment => means splitted aligned sequence IndexError: list index out of range

The input files are created using bwa mem with no fancy options.
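
The traceback shows `listline1[1]` being read from a line that has fewer fields than expected. Without the input file the actual cause cannot be confirmed here, but a defensive parser is easy to sketch. Note that in the SAM specification the bit 2048 (0x800) marks a supplementary alignment, while 256 (0x100) marks a secondary one. The helper names are hypothetical and this is not the project's fix.

```python
SUPPLEMENTARY = 0x800  # 2048 in the SAM FLAG field


def parse_sam_line(line):
    """Return (qname, flag, fields), or None for headers and malformed lines."""
    if not line.strip() or line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:  # SAM defines 11 mandatory columns
        return None
    try:
        flag = int(fields[1])
    except ValueError:
        return None
    return fields[0], flag, fields


def is_supplementary(flag):
    """True when the FLAG marks a supplementary (split) alignment."""
    return bool(flag & SUPPLEMENTARY)


if __name__ == "__main__":
    ok = "\t".join(["q1", "2064", "chr1", "10", "60", "5M", "*", "0", "0", "ACGTA", "IIIII"])
    for raw in (ok, "truncated\t16", "@HD\tVN:1.6"):
        parsed = parse_sam_line(raw)
        print(None if parsed is None else (parsed[0], is_supplementary(parsed[1])))
```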

Requires python3 but calls python2

When python3 is required, the called executable should be python3 instead of python.

I found this in "run_VCF_creator.sh", but as I saw calls to python in other files too, I guess there may be other unsafe behaviors (I didn't really check).

Note: this causes the following build test error:

/home/averdier/work/jbaison_discosnp_ExomeCaptureData_20211118/tools/DiscoSnp/scripts/run_VCF_creator.sh -p discoRes_k_31_c_3_D_100_P_3_b_0_coherent.fa -o discoRes_k_31_c_3_D_100_P_3_b_0_coherent.vcf
This script requires python 3.0 or greater
 there was a problem with VCF creation. See how to use the "run_VCF_creator.sh" alone.
awk: cannot open discoRes_k_31_c_3_D_100_P_3_b_0_coherent.vcf (No such file or directory)
0a1,23
> ##fileformat=VCFv4.1
> ##source=VCF_creator
> ##SAMPLE=file://discoRes_k_31_c_3_D_100_P_3_b_0_coherent.fa
> ##REF=<ID=REF,Number=1,Type=String,Description="Allele of the path Disco aligned with the least mismatches">
> ##FILTER=<ID=MULTIPLE,Description="Mapping type : PASS or MULTIPLE or .">
> ##INFO=<ID=Ty,Number=1,Type=String,Description="SNP, INS, DEL or .">
> ##INFO=<ID=Rk,Number=1,Type=Float,Description="SNP rank">
> ##INFO=<ID=UL,Number=1,Type=Integer,Description="length of the unitig left">
> ##INFO=<ID=UR,Number=1,Type=Integer,Description="length of the unitig right">
> ##INFO=<ID=CL,Number=1,Type=Integer,Description="length of the contig left">
> ##INFO=<ID=CR,Number=1,Type=Integer,Description="length of the contig right">
> ##INFO=<ID=Genome,Number=1,Type=String,Description="Allele of the reference;for indel reference is . ">
> ##INFO=<ID=Sd,Number=1,Type=Integer,Description="Reverse (-1) or Forward (1) Alignement">
> ##INFO=<ID=XA,Number=.,Type=String,Description="Other mapping positions (chromosome_position). Position is negative in case of Reverse alignment. The position designs the starting position of the alignment, not the position of the variant itself.">
> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Cumulated depth accross samples (sum)">
> ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled Genotype Likelihoods">
> ##FORMAT=<ID=AD,Number=2,Type=Integer,Description="Depth of each allele by sample">
> ##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
> #CHROM        POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  G1      G2
> SNP_higher_path_3     200      3      C       G       .       .       Ty=SNP;Rk=1;UL=86;UR=261;CL=169;CR=764;Genome=.;Sd=.    GT:DP:PL:AD:HQ  0/0:124:10,378,2484:124,0:0,0   1/1:134:2684,408,10:0,134:0,0
> SNP_higher_path_2     912      2      A       T       .       .       Ty=SNP;Rk=1;UL=86;UR=52;CL=881;CR=52;Genome=.;Sd=.      GT:DP:PL:AD:HQ  0/0:74:8,227,1484:74,0:0,0      1/1:86:1724,263,8:0,86:0,0
> SNP_higher_path_1     503      1      A       T       .       .       Ty=SNP;Rk=1;UL=472;UR=261;CL=472;CR=461;Genome=.;Sd=.   GT:DP:PL:AD:HQ  1/1:110:2204,335,9:0,110:0,0    0/0:114:9,347,2284:114,0:0,0
*** Test: FAILURE on diff .vcf

Problem with VCF creation: an error occurred in determining the filter of close snps (an unmapped SNP is "PASS")

Hi there,

I have been attempting to create a VCF file mapped to a reference database (formatted as a single .fasta file) and keep getting a KeyError: '3' message, terminating the run. I have attempted this using ./run_discoSnp++.sh and ./run_VCF_creator.sh, and both approaches yield the same error message. However, when I run ./run_discoSnp++.sh without attempting to map against the reference database, the VCF creator works fine. Could this be an issue with the formatting of the database.fasta file that I am using? I am not quite sure what the specific issue is. Below are the commands that I used (note that bwa is in my $PATH) and the output I received. Any help in resolving this issue would be greatly appreciated.

for run_discoSNP++: /home/noyes046/shared/tools/DiscoSnp/run_discoSnp++.sh -r test_reads.txt -G /home/noyes046/shared/databases/megares_v2.0/megares_full_database_v2.00.fasta

for VCF_creator: /home/noyes046/shared/tools/DiscoSnp/scripts/run_VCF_creator.sh -G /home/noyes046/shared/databases/megares_v2.0/megares_full_database_v2.00.fasta -p discoRes_k_31_c_3_D_100_P_3_b_0_coherent.fa -o test_output.vcf -w

Both produce this error message:

##use genome : /home/noyes046/shared/databases/megares_v2.0/megares_full_database_v2.00.fasta
##use disco SNPS : discoRes_k_31_c_3_D_100_P_3_b_0_coherent.fa
##output : test_output.vcf
...Indexation : Using the existing index...
ALIGNMENT: /panfs/roc/msisoft/bwa/0.7.17.CentOS7/bwa mem -h 80 -k 19 /home/noyes046/shared/databases/megares_v2.0/megares_full_database_v2.00.fasta discoRes_k_31_c_3_D_100_P_3_b_0_coherentbis.fasta > discoRes_k_31_c_3_D_100_P_3_b_0_coherentBWA_MEM.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 60518 sequences (3806986 bp)...
[M::mem_process_seqs] Processed 60518 reads in 1.691 CPU sec, 1.691 real sec
[main] Version: 0.7.17-r1188
[main] CMD: /panfs/roc/msisoft/bwa/0.7.17.CentOS7/bwa mem -h 80 -k 19 /home/noyes046/shared/databases/megares_v2.0/megares_full_database_v2.00.fasta discoRes_k_31_c_3_D_100_P_3_b_0_coherentbis.fasta
[main] Real time: 1.882 sec; CPU: 1.782 sec
!!! an error occurred in determining the filter of close snps (an unmapped SNP is "PASS")!!!
!!! Line where the error occurred !!!
['SNP_higher_path_19609|P_1:30_A/G|high|nb_pol_1|left_unitig_length_25|right_unitig_length_9|C1_7|C2_5|Q1_70|Q2_70|G1_0/1:109,11,109|G2_0/0:4,19,104|rank_0.45644', '16', 'MEG_6142|Drugs|Mycobacterium_tuberculosis-specific_Drug|Pyrazinamide-resistant_mutant|RPSA|RequiresSNPConfirmation', '324', '57', '55M6S', '', '0', '0', 'GTACGAGCGCGCCTGGGGCACCATCGAGGATCTCAAGGAGAAGGACGAGGCCGTCGCCGGC', '', 'NM:i:3', 'MD:Z:9T19C0G24', 'AS:i:40', 'XS:i:0']
['SNP_lower_path_19609|P_1:30_A/G|high|nb_pol_1|left_unitig_length_25|right_unitig_length_9|C1_7|C2_0|Q1_70|Q2_0|G1_0/1:109,11,109|G2_0/0:4,19,104|rank_0.45644', '16', 'MEG_6142|Drugs|Mycobacterium_tuberculosis-specific_Drug|Pyrazinamide-resistant_mutant|RPSA|RequiresSNPConfirmation', '324', '57', '55M6S', '', '0', '0', 'GTACGAGCGCGCCTGGGGCACCATCGAGGACCTCAAGGAGAAGGACGAGGCCGTCGCCGGC', '', 'NM:i:3', 'MD:Z:9T19C0G24', 'AS:i:40', 'XS:i:0']
!!! an error occurred in determining the filter of close snps (an unmapped SNP is "PASS")!!!
!!! Line where the error occurred !!!
['SNP_higher_path_8023|P_1:30_A/T|high|nb_pol_1|left_unitig_length_79|right_unitig_length_30|C1_7|C2_4|Q1_70|Q2_69|G1_0/0:16,19,135|G2_0/1:66,10,66|rank_0.40452', '0', 'MEG_6106|Drugs|Rifampin|Rifampin-resistant_beta-subunit_of_RNA_polymerase_RpoB|RPOB|RequiresSNPConfirmation', '1227', '47', '61M', '', '0', '0', 'CCACTTAGGTAACCGTCGTATTCGTTCAGTAGGGGAATTATTACAAAACCAATTCCGTATC', '', 'NM:i:4', 'MD:Z:12T8C11T26T0', 'AS:i:45', 'XS:i:27']
['SNP_lower_path_8023|P_1:30_A/T|high|nb_pol_1|left_unitig_length_79|right_unitig_length_30|C1_1|C2_4|Q1_71|Q2_71|G1_0/0:16,19,135|G2_0/1:66,10,66|rank_0.40452', '0', 'MEG_6106|Drugs|Rifampin|Rifampin-resistant_beta-subunit_of_RNA_polymerase_RpoB|RPOB|RequiresSNPConfirmation', '1227', '29', '61M', '', '0', '0', 'CCACTTAGGTAACCGTCGTATTCGTTCAGTTGGGGAATTATTACAAAACCAATTCCGTATC', '', 'NM:i:5', 'MD:Z:12T8C8A2T26T0', 'AS:i:40', 'XS:i:28']
!!! an error occurred in determining the filter of close snps (an unmapped SNP is "PASS")!!!
['SNP_higher_path_15008|P_1:30_A/G|high|nb_pol_1|left_unitig_length_21|right_unitig_length_4|C1_10|C2_9|Q1_70|Q2_68|G1_0/1:648,49,109|G2_0/1:884,79,86|rank_0.074389', '4', '', '0', '0', '', '', '0', '0', 'AGTTTGTTGGTGAGGTAATGGCTCACCAAGACGATGACGGGTAGCCGGCCTGAGAGGGCGA', '', 'AS:i:0', 'XS:i:0']
['SNP_lower_path_15008|P_1:30_A/G|high|nb_pol_1|left_unitig_length_21|right_unitig_length_4|C1_37|C2_49|Q1_68|Q2_68|G1_0/1:648,49,109|G2_0/1:884,79,86|rank_0.074389', '0', 'MEG_6964|Drugs|Tetracyclines|Tetracycline-resistant_16S_ribosomal_subunit_protein|TET16S|RequiresSNPConfirmation', '237', '23', '61M', '', '0', '0', 'AGTTTGTTGGTGAGGTAATGGCTCACCAAGGCGATGACGGGTAGCCGGCCTGAGAGGGCGA', '', 'NM:i:7', 'MD:Z:2C9G4G5T8T0T24T2', 'AS:i:30', 'XS:i:0']
Traceback (most recent call last):
  File "/home/noyes046/shared/tools/DiscoSnp/scripts/VCF_creator.py", line 174, in <module>
    main()
  File "/home/noyes046/shared/tools/DiscoSnp/scripts/VCF_creator.py", line 139, in main
    table=MappingTreatement(variant_object,vcf_field_object,nbGeno)
  File "/panfs/roc/groups/11/noyes046/shared/tools/DiscoSnp/scripts/functionObjectVCF_creator.py", line 47, in MappingTreatement
    variant_object.RetrievePolymorphismFromHeader()
  File "/panfs/roc/groups/11/noyes046/shared/tools/DiscoSnp/scripts/ClassVCF_creator.py", line 225, in RetrievePolymorphismFromHeader
    self.upper_path.listNucleotideReverse.append(self.ReverseComplement(ntUp))
  File "/panfs/roc/groups/11/noyes046/shared/tools/DiscoSnp/scripts/ClassVCF_creator.py", line 186, in ReverseComplement
    return ''.join(self.char2char[c] for c in nucleotide)[::-1]
  File "/panfs/roc/groups/11/noyes046/shared/tools/DiscoSnp/scripts/ClassVCF_creator.py", line 186, in <genexpr>
    return ''.join(self.char2char[c] for c in nucleotide)[::-1]
KeyError: '3'
there was a problem with the VCF creation
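
The traceback above ends in a dictionary lookup over the characters of an allele string, so any character missing from the lookup table (here '3') raises a bare KeyError. The following is a minimal sketch of that mechanism, not the DiscoSnp code itself: the table COMPLEMENT and both function names are hypothetical stand-ins, and only the join expression is taken from the traceback. It does not explain why a '3' reaches this step in the reported run.

```python
# Hypothetical stand-in for the char2char table referenced in the traceback.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N",
              "a": "t", "c": "g", "g": "c", "t": "a", "n": "n"}


def reverse_complement(nucleotide):
    """Same expression as in the traceback: raises KeyError on unknown characters."""
    return "".join(COMPLEMENT[c] for c in nucleotide)[::-1]


def checked_reverse_complement(nucleotide):
    """Variant that names the offending characters before attempting the lookup."""
    unexpected = sorted(set(nucleotide) - set(COMPLEMENT))
    if unexpected:
        raise ValueError(
            f"non-nucleotide characters {unexpected} in {nucleotide!r}"
        )
    return reverse_complement(nucleotide)
```

Calling the checked variant on the string parsed from each variant header would at least show which header yields a non-nucleotide allele, which may help narrow down the record that triggers the failure.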

problem running disco snp

Hi

I am working on developing SNPs for a species without a reference genome and am trying to use DiscoSnp++ to find SNPs in raw reads. I used the following command:

run_discoSnp++.sh -r alba.fof -T

alba.fof is a file of files located in the working directory. Each of its lines names another file of files, which in turn lists the read files for one sample. It is organized as follows:

alba.fof:
9_fof.txt
12_fof.txt
13_fof.txt
15_fof.txt

9_fof.txt:
9_S6_L004_R1_001_clean.fastq.gz
9_S6_L004_R2_001_clean.fastq.gz

12_fof.txt:
12_S7_L004_R1_001_clean.fastq.gz
12_S7_L004_R2_001_clean.fastq.gz

13_fof.txt:
13_S8_L004_R1_001_clean.fastq.gz
13_S8_L004_R2_001_clean.fastq.gz

15_fof.txt:
15_S9_L004_R1_001_clean.fastq.gz
15_S9_L004_R2_001_clean.fastq.gz

I keep getting one of two error messages, depending on whether I use zcat or not (see below).

Error 1. (with zcat)

tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5F.c line 509 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
#1: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Fint.c line 1652 in H5F_open(): unable to read superblock
major: File accessibilty
minor: Read failed
#2: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Fsuper.c line 411 in H5F__super_read(): file signature not found
major: File accessibilty
minor: Not an HDF5 file
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5G.c line 460 in H5Gopen2(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5L.c line 790 in H5Lexists(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5G.c line 299 in H5Gcreate2(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5L.c line 790 in H5Lexists(): not a location
major: Invalid arguments to routine
minor: Inappropriate type

Error 2. (without zcat)

tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 425 in H5Aopen(): unable to load attribute info from object header for attribute: 'version'
major: Attribute
minor: Can't open object
#1: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Aint.c line 433 in H5A__open(): unable to load attribute info from object header for attribute: 'version'
major: Attribute
minor: Can't open object
#2: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Oattribute.c line 515 in H5O__attr_open_by_name(): can't locate attribute: 'version'
major: Attribute
minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 704 in H5Aget_space(): not an attribute
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5S.c line 1013 in H5Sget_simple_extent_dims(): not a dataspace
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 662 in H5Aread(): not an attribute
major: Invalid arguments to routine
minor: Inappropriate type
EXCEPTION: Unable to open bank './9_S6_L004_R1_001_clean2.fastq.gz ' (if it is a list of files, perhaps some of the files inside don't exist)
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 425 in H5Aopen(): unable to load attribute info from object header for attribute: 'version'
major: Attribute
minor: Can't open object
#1: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Aint.c line 433 in H5A__open(): unable to load attribute info from object header for attribute: 'version'
major: Attribute
minor: Can't open object
#2: /scratchdir/builds/workspace/gatb-discosnp/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Oattribute.c line 515 in H5O__attr_open_by_name(): ca:

I am trying to run the job on a virtual cluster. I don’t have experience with this, so I’m not sure whether the problem comes from the module configuration or from my usage.

Kindly help in solving this problem.

Best regards,
Kedra
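
The bank name quoted in the EXCEPTION line of the second log appears to end with a space before the closing quote, so one thing worth ruling out is stray whitespace or a missing file in one of the files of files. The sketch below walks a two-level file of files like the one described in this report and lists such entries. It is an illustration only: the function names are hypothetical, and it assumes relative entries are resolved against the directory of the file that lists them, which may differ from how DiscoSnp++ resolves them.

```python
import os


def read_entries(path):
    """Return the non-blank lines of a text file, keeping trailing spaces visible."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def check_fof(top_fof):
    """Walk a two-level file of files and report entries that look unusable."""
    problems = []
    # Assumption: relative names are resolved against the listing file's directory.
    base = os.path.dirname(os.path.abspath(top_fof))
    for sub_fof in read_entries(top_fof):
        if sub_fof != sub_fof.strip():
            problems.append(f"{top_fof}: whitespace around {sub_fof!r}")
        sub_path = os.path.join(base, sub_fof.strip())
        if not os.path.isfile(sub_path):
            problems.append(f"{top_fof}: missing file of files {sub_fof!r}")
            continue
        sub_base = os.path.dirname(sub_path)
        for read_file in read_entries(sub_path):
            if read_file != read_file.strip():
                problems.append(
                    f"{sub_fof.strip()}: whitespace around {read_file!r}"
                )
            if not os.path.isfile(os.path.join(sub_base, read_file.strip())):
                problems.append(
                    f"{sub_fof.strip()}: missing read file {read_file!r}"
                )
    return problems
```

Running check_fof("alba.fof") from the working directory and printing the returned list would show any entry with surrounding whitespace (including a carriage return left by a Windows editor) or any listed file that does not exist.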

Problem running discoSnpRad

Hi
I am working on developing SNPs for RADseq data and am trying to use DiscoSnpRad to find SNPs. I used the following command:
run_discoSnpRad.sh -r a_fof.txt -S rconnector-v1.2.0-Source
Absolute paths are used in a_fof.txt.
I get an error, whereas run_discoSnpRad.sh -r test/fof.txt -S /root/rconnector-v1.2.0-Source/ works successfully. The error message is below.

HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5F.c line 509 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Fint.c line 1652 in H5F_open(): unable to read superblock
major: File accessibilty
minor: Read failed
#2: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Fsuper.c line 411 in H5F__super_read(): file signature not found
major: File accessibilty
minor: Not an HDF5 file
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5G.c line 460 in H5Gopen2(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5L.c line 790 in H5Lexists(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5G.c line 299 in H5Gcreate2(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5L.c line 790 in H5Lexists(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5D.c line 119 in H5Dcreate2(): not a location ID
major: Invalid arguments to routine
minor: Inappropriate type
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5D.c line 372 in H5Dget_space(): not a dataset
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5S.c line 1013 in H5Sget_simple_extent_dims(): not a dataspace
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5S.c line 489 in H5Sclose(): not a dataspace
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 415 in H5Aopen(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 704 in H5Aget_space(): not an attribute
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5S.c line 1013 in H5Sget_simple_extent_dims(): not a dataspace
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 662 in H5Aread(): not an attribute
major: Invalid arguments to routine
minor: Inappropriate type
EXCEPTION: Unable to open bank 'a_fof.txt' (if it is a list of files, perhaps some of the files inside don't exist)
############################################################
#################### GRAPH CREATION #######################
############################################################
/root/DiscoSNP++-v2.6.2-Source/discoSnpRAD/../build/ext/gatb-core/bin/dbgh5 -in a_fof.txt_discoRad_k_31_c_3_D_0_P_5_m_5_removemeplease -out discoRad_k_31_c_3 -kmer-size 31 -abundance-min 3 -abundance-max 2147483647 -solidity-kind one -verbose 1 -skip-bcalm -skip-bglue -no-mphf
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5F.c line 509 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Fint.c line 1652 in H5F_open(): unable to read superblock
major: File accessibilty
minor: Read failed
#2: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Fsuper.c line 411 in H5F__super_read(): file signature not found
major: File accessibilty
minor: Not an HDF5 file
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5G.c line 460 in H5Gopen2(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5L.c line 790 in H5Lexists(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5G.c line 299 in H5Gcreate2(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5L.c line 790 in H5Lexists(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5D.c line 119 in H5Dcreate2(): not a location ID
major: Invalid arguments to routine
minor: Inappropriate type
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5D.c line 372 in H5Dget_space(): not a dataset
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5S.c line 1013 in H5Sget_simple_extent_dims(): not a dataspace
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5S.c line 489 in H5Sclose(): not a dataspace
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 415 in H5Aopen(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Gloc.c line 246 in H5G_loc(): invalid object ID
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 704 in H5Aget_space(): not an attribute
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5S.c line 1013 in H5Sget_simple_extent_dims(): not a dataspace
major: Invalid arguments to routine
minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: /root/DiscoSNP++-v2.6.2-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 662 in H5Aread(): not an attribute
major: Invalid arguments to routine
minor: Inappropriate type

EXCEPTION: Unable to open bank 'a_fof.txt_discoRad_k_31_c_3_D_0_P_5_m_5_removemeplease' (if it is a list of files, perhaps some of the files inside don't exist)
there was a problem with graph construction

Kindly help in solving this problem.

Best regards,
bifenggang

Equivalent information between discosnp and STACKS2

Hi there,

I am using DiscoSnpRAD to explore the structure and demography of my target organism. For the demography, I have used the dadi software on an SFS and would like to convert the results of my runs into biologically meaningful values.

The output of STACKS2 gives you the total number of sites, the total number of variant sites, and which of those are polymorphic. Does the total number of variant sites from STACKS2 correspond to the total number of variant bubbles from DiscoSnpRAD? If not, where can I find the equivalent information?

Cheers!
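
As a starting point for comparing the two outputs, the number of variant records in a DiscoSnpRAD VCF can be tallied by the Ty key of the INFO column (SNP, INS, DEL or ., as described in the VCF header). The sketch below is only a counting helper with a hypothetical function name. It assumes tab-separated records with INFO in the eighth column, and it does not settle whether these counts are comparable to the STACKS2 figures.

```python
from collections import Counter


def count_variant_types(vcf_lines):
    """Count VCF records per value of the Ty INFO key."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        info = fields[7]  # INFO is the eighth VCF column
        tags = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        counts[tags.get("Ty", ".")] += 1
    return counts
```

For a file on disk, passing the open file handle works directly, for example count_variant_types(open("my.vcf")), where my.vcf is a placeholder name.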

Trouble filtering with vcftools

Hi Pierre,

First of all, thanks for writing such a great program; we are choosing it over STACKS for a lot of our work at the moment. I am trying to use vcftools to remove individuals with too much missing data from my VCF, but I am facing the following error:

Parameters as interpreted:
--vcf discoRad_k_15_c_auto_D_0_P_1_m_10_clustered.vcf
--out discoRad_k_15_c_auto_D_0_P_1_m_10_clustered
--recode
--remove-indv INDS_REMOVE.txt

Warning: Expected at least 2 parts in INFO entry: ID=Ty,Number=1,Type=String,Description="SNP, INS, DEL or .">
Warning: Expected at least 2 parts in INFO entry: ID=Ty,Number=1,Type=String,Description="SNP, INS, DEL or .">
Warning: Expected at least 2 parts in INFO entry: ID=XA,Number=.,Type=String,Description="Other mapping positions (chromosome_position). Position is negative in case of Reverse alignment. The position designs the starting position of the alignment, not the position of the variant itself.">
Excluding individuals in 'exclude' list
After filtering, kept 169 out of 169 Individuals
Outputting VCF file...
After filtering, kept 51347 out of a possible 51347 Sites
Run Time = 30.00 seconds

vcftools has an issue reading the header of the VCF files produced with DiscoSnp. Do you have a workaround for this?

Thanks in advance!

Coral
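
Besides the header warnings, the log above reports "kept 169 out of 169 Individuals", so no individual was actually removed. One possible cause, stated here as an assumption rather than a diagnosis, is that the names in INDS_REMOVE.txt do not match the sample columns of the VCF exactly. The sketch below, with hypothetical function names, lists the names from a removal list that match no sample column.

```python
def vcf_sample_names(vcf_lines):
    """Return the sample columns from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed VCF fields; samples follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def unmatched_names(remove_list, samples):
    """Names from the removal list that match no sample column exactly."""
    known = set(samples)
    cleaned = (name.strip() for name in remove_list)
    return [name for name in cleaned if name and name not in known]
```

If unmatched_names returns every name in the removal list, the list and the VCF header are probably using different sample labels.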
