zhixingfeng / igda

Detect and phase minor SNVs from long-read sequencing data

License: GNU General Public License v2.0

Languages: C++ 72.41%, HTML 24.35%, C 1.26%, JavaScript 1.10%, CSS 0.52%, Less 0.15%, CMake 0.12%, TeX 0.03%, Perl 0.03%, Makefile 0.02%, Shell 0.01%, PHP 0.01%

snvs oxford-nanopore pacbio-sequencing variants strains metagenome-deconvolution

igda's Issues

Detecting Minor SNV - Missing $autopar variable

Hi Dr. Feng, when detecting minor SNVs, the program runs but shows the message "igda_pipe_detect_ont: line 114: [: missing `]'". I looked at line 114 of igda_pipe_detect_ont, and it uses an $auto_par variable that is set to $OPTARG. Do you know what this error message means? I believe I supplied all required arguments, and the program seems to run smoothly, but I wanted to make sure. Thank you so much!

haplotype frequency

Hello,
Is there an option to store the frequency, average coverage, and number of reads for each haplotype? It would be nice to save these in the FASTA header of each haplotype. Thank you, and congrats on all the good work!

Preprocessing not clear enough

Hello,

First, we need to "Convert lower case letters to upper case in reference fasta" with fasta2upper infasta outfasta
Second, we need to "Convert wildcard letters to N in reference fasta" with fastaclean fastafile outfafile
Third, we need to "Realign reads aligned to the negative strand" with igda_align_ont infile(bam or sam file) reffile outfile nthread

There is something here I don't understand.
The output of the previous step is a cleaned fastq file, but you ask for a bam or sam file as the input file for the mapping step.
The infile is fastq, not bam or sam, right?
The output of the mapping is a sam file, right?

Best
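
For readers following the first two preprocessing steps quoted above, here is a minimal Python sketch of what "convert to upper case" and "convert wildcard letters to N" could look like on a reference FASTA. It is an illustration only, not the implementation of fasta2upper or fastaclean. The function names are hypothetical, and treating every character other than A, C, G, T, and N as a wildcard is an assumption.

```python
def clean_reference_sequence(seq):
    """Uppercase a sequence and replace anything other than A, C, G, T, N with N.

    Assumption: "wildcard letters" means any non-ACGT symbol (for example the
    IUPAC ambiguity codes R, Y, K, M). The real fastaclean may differ.
    """
    return "".join(c if c in "ACGTN" else "N" for c in seq.upper())


def clean_fasta_lines(lines):
    """Yield cleaned FASTA lines; header lines (starting with '>') pass through unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield clean_reference_sequence(line)


if __name__ == "__main__":
    example = [">ref1 example\n", "acgtRYn\n"]
    for out in clean_fasta_lines(example):
        print(out)
```

Both steps operate on the reference FASTA, not on the reads, so their output is a cleaned reference rather than the input to the alignment step.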

ERROR: Unexpected character 'M' found

Hello! Thanks for developing such an amazing tool. However, when I apply iGDA to my mitochondrial bam file, it throws an error.
Here are the details:
I am running igda_pipe_detect with all default parameters.
After it starts the "getbamchrrange" step, there is an error: "ERROR: Unexpected character 'M' found."
Is there a way to fix it, such as slightly changing the chromosome name 'chrM' in my alignment, or adding some code so that iGDA can recognize chrM?
Thank you very much!

Context Models

For detecting minor SNVs, what are the differences between the different context models? There seem to be several ONT models, so I wasn't sure which one to select.

Final Output - IGV and Reconstructing Final Contigs

Hi Zhixing, I'm sorry for the additional questions, but I have two final ones regarding the output:

How did you group together the outputted contigs to the reference strain contigs in IGV like this:

In your iGDA output for the metagenome set of Borrelia, you said that you outputted 753 final contigs. If you didn't know what the metagenome set contained, how would you decipher which strains you have based on the 753 final contigs? Did you perhaps run an assembly on the contigs to try to determine the contained strains?

Detect minor SNVs not clear enough

Hello,

Can you please provide more detail on how to choose a contextmodel?

Also, what type of file should we use as the contextmodel?

boosting.conf?
boosting.model?
A folder?
Which folder:
train_A
train_C
train_G
train_T
or just ont or pacbio?

Thanks

minimap2 not found

Hello. I'm very excited to try iGDA, but am not able to run the 'igda_align_ont' step. I installed iGDA using a fresh conda environment as described in the GitHub page:
conda install -c bioconda -c conda-forge -c zhixingfeng igda

When I run 'igda_align_ont' I receive the error message "/scicomp/home-pure/ymw8/Software/miniconda3/envs/igda/bin/igda_align_ont: line 24: minimap2: command not found".

I do have several 'minimap2...' executables in the conda environment (such as "minimap2_nanopore"), but none are simply "minimap2".

Please advise. The full error message is below. Thanks -- Adam.

(igda) me> igda_align_ont barcode45.trim.primerclipped.bam iGDA/WuCoV_MN908947_clean.fasta iGDA/barcode45.igda 4
infile=barcode45.trim.primerclipped.bam
reffile=iGDA/WuCoV_MN908947_clean.fasta
outfile=iGDA/barcode45.igda
nthread=4
get forward sequences from sam/bam
run minimap2
~/Software/miniconda3/envs/igda/bin/igda_align_ont: line 24: minimap2: command not found
filter realigned sam
[main_samview] fail to read the header from "iGDA/barcode45.igda.realign.sam".
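
As a diagnostic for the situation described above, the following sketch lists what is actually visible on PATH inside the active environment. It uses only the Python standard library and does not change anything. The helper name find_executables is hypothetical, and this only reports the state of PATH; it is not a fix.

```python
import os
import shutil


def find_executables(prefix, path=None):
    """Return sorted names of executable files on PATH whose name starts with prefix."""
    path = os.environ.get("PATH", "") if path is None else path
    found = set()
    for directory in path.split(os.pathsep):
        if not os.path.isdir(directory):
            continue
        try:
            names = os.listdir(directory)
        except OSError:
            continue  # unreadable PATH entry; skip it
        for name in names:
            full = os.path.join(directory, name)
            if name.startswith(prefix) and os.path.isfile(full) and os.access(full, os.X_OK):
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    # None here corresponds to the shell's "minimap2: command not found".
    print("exact match:", shutil.which("minimap2"))
    print("prefix matches:", find_executables("minimap2"))
```

If the exact match is None while the prefix matches list other names, the script's call to plain `minimap2` cannot succeed in that environment as it stands.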

Regarding the maximum length of contigs

Hi @zhixingfeng ,

I am using igda to detect sublines of one bacterium in pooled PacBio Sequel II genomic data. The average read length is 8 kb. I find that igda gives me very few contigs (3-6) for most datasets. I do not expect hundreds of sublines, but I do expect at least 2-3. In my igda results, I get 6 contigs placed very far apart from each other on a 5 Mb genome. From variant analysis, I know that the loci covered by these contigs are either deleted or have a high mutation frequency (~80%). Do you think these results could be due to the short read length limiting the maximum contig length that igda can achieve? Or am I doing something wrong (I followed exactly the commands suggested on the usage page for Sequel II reads)? Any insights would be very useful.

usability tweaks

Hello. Thank you for developing and sharing iGDA. While testing it out, I noticed a few features that could make it more useful to end-users, in case you decide to do additional development on iGDA.

Report SNVs in a standard human-readable format (e.g. 484K). Right now, they are encoded "for each integer x, floor(x/4) = 0-based locus, and x modulo 4 = base (0=A, 1=C, 2=G, 3=T)"
Report proportions, both for the individual SNV (detected_snv.vcf) and for the contigs if possible.
As I understand the current report format, there is no indication whether or not the 'reference genotype' is present in the sample; genotypes are only reported if they have SNVs.

Thanks
Adam
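
The integer encoding quoted in the first suggestion above can be decoded with a few lines of Python. This is a minimal sketch based only on the rule as stated in this issue (floor(x/4) is the 0-based locus, x modulo 4 is the base). The function names and the "1-based position plus base" output format are illustrative assumptions, not an iGDA output format.

```python
BASES = "ACGT"  # 0=A, 1=C, 2=G, 3=T, as described in the issue


def decode_snv(x):
    """Decode an integer SNV code into (0-based locus, base)."""
    if x < 0:
        raise ValueError("SNV code must be non-negative")
    locus, base_index = divmod(x, 4)
    return locus, BASES[base_index]


def format_snv(x):
    """Render an SNV code as '<1-based position><base>' (hypothetical format)."""
    locus, base = decode_snv(x)
    return f"{locus + 1}{base}"


if __name__ == "__main__":
    print(decode_snv(1937))  # (484, 'C')
    print(format_snv(1937))  # 485C
```

Because the code stores only the locus and the variant base, reporting the reference base or an amino-acid change would require the reference sequence as an additional input.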
