zhixingfeng / igda

Detect and phase minor SNVs from long-read sequencing data

License: GNU General Public License v2.0

C++ 72.41% C 1.26% Makefile 0.02% Shell 0.01% CMake 0.12% CSS 0.52% TeX 0.03% Perl 0.03% HTML 24.35% JavaScript 1.10% PHP 0.01% Less 0.15%
snvs oxford-nanopore pacbio-sequencing variants strains metagenome-deconvolution

igda's People

Contributors

zhixingfeng

igda's Issues

Detecting Minor SNV - Missing $autopar variable

Hi Dr. Feng, when detecting minor SNVs, the program runs but prints the message "igda_pipe_detect_ont: line 114: [: missing `]'". I looked at line 114 of igda_pipe_detect_ont, and it uses a $auto_par variable that is set to $OPTARG. Do you know what this error message means? I believe I supplied all the required arguments, and the program seems to run smoothly, but I wanted to make sure. Thank you so much!

haplotype frequency

Hello,
Is there an option to store the frequency, average coverage, and number of reads for each haplotype? It would be nice to save these in the FASTA header of each haplotype. Thank you, and congrats on all the good work!

Preprocessing not clear enough

Hello,

First we need to "Convert lower case letters to upper case in reference fasta" with fasta2upper infasta outfasta
Second we need to "Convert wildcard letters to N in reference fasta" with fastaclean fastafile outfafile
Third we need to "Realign reads aligned to the negative strand" with igda_align_ont infile(bam or sam file) reffile outfile nthread

Here there is something I don't understand.
The output of the previous step is a cleaned fastq file, but you ask for a bam or sam file as the input file for the mapping step.
The infile is fastq, not bam or sam, right?
The output of the mapping is a sam file, right?

Best
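
For readers unsure what the first two preprocessing steps amount to, here is a minimal Python sketch of the transformation their descriptions imply: upper-casing the reference sequence and replacing wildcard letters with N. It is an illustration only, not the implementation of fasta2upper or fastaclean; the function name `clean_fasta` is hypothetical, and treating every letter other than A, C, G, T as a wildcard is an assumption.

```python
import io

def clean_fasta(in_handle, out_handle):
    """Upper-case sequence lines and replace any non-ACGT letter with N.

    Assumption: anything other than A, C, G, T counts as a wildcard.
    """
    for line in in_handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header lines pass through unchanged.
            out_handle.write(line + "\n")
        else:
            seq = line.upper()
            seq = "".join(base if base in "ACGT" else "N" for base in seq)
            out_handle.write(seq + "\n")

# Small in-memory example instead of real files.
example = io.StringIO(">ref1\nacgtRYn\nACGT\n")
result = io.StringIO()
clean_fasta(example, result)
print(result.getvalue(), end="")
```

Both steps operate on the reference FASTA, not on the reads, so their output is a cleaned reference rather than a read file.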

ERROR: Unexpected character 'M' found

Hello! Thanks for developing such an amazing tool. However, when I apply iGDA to my mitochondrial bam file, it throws an error.
Here are the details:
I am running igda_pipe_detect with all default parameters.
After it starts the "getbamchrrange" step, there is an error: "ERROR: Unexpected character 'M' found."
Is there a way to fix it, such as slightly changing the chromosome name 'chrM' in my alignment, or adding some code so that it can recognize chrM?
Thank you very much!
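
The renaming idea raised in this question could be tried along the following lines. This is a sketch under assumptions: the thread does not confirm that the contig name causes the error, the helper `rename_contig` and the replacement name `MT` are hypothetical, and the code works on SAM text lines (for example, a BAM converted to SAM first). It only rewrites the `@SQ` header name and the RNAME and RNEXT columns; optional tags that mention contig names are left alone.

```python
def rename_contig(sam_line, old, new):
    """Rename a reference sequence in one SAM line (header or alignment)."""
    fields = sam_line.rstrip("\n").split("\t")
    if fields[0] == "@SQ":
        # Header line describing a reference sequence: rewrite its SN: tag.
        fields = ["SN:" + new if f == "SN:" + old else f for f in fields]
    elif not fields[0].startswith("@") and len(fields) >= 11:
        # Alignment line: column 3 is RNAME, column 7 is RNEXT.
        if fields[2] == old:
            fields[2] = new
        if fields[6] == old:
            fields[6] = new
    return "\t".join(fields)

lines = [
    "@SQ\tSN:chrM\tLN:16569",
    "read1\t0\tchrM\t1\t60\t4M\t=\t0\t0\tACGT\tIIII",
]
for line in lines:
    print(rename_contig(line, "chrM", "MT"))
```

The reference FASTA header would need the same new name so that the alignment and the reference stay consistent.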

Context Models

For detecting minor SNVs, what are the differences between the different context models? There seem to be several ONT models, so I wasn't sure which one to select.

Final Output - IGV and Reconstructing Final Contigs

Hi Zhixing, I'm sorry for the additional questions, but I have two final ones regarding the output:

  1. How did you group together the outputted contigs to the reference strain contigs in IGV like this:

[Image: Screen Shot 2021-12-16 at 1.00.12 PM]

  2. In your iGDA output for the metagenome set of Borrelia, you said that you outputted 753 final contigs. If you didn't know what the metagenome set contained, how would you decipher which strains you have based on the 753 final contigs? Did you perhaps run an assembly on the contigs to try to determine the contained strains?

Detect minor SNVs not clear enough

Hello,

Can you please provide more detail on how to choose a contextmodel?

Also, what type of file should we use as the contextmodel?

  • boosting.conf?
  • boosting.model?
  • a folder? If so, which folder:
      • train_A
      • train_C
      • train_G
      • train_T
      • or just ont or pacbio?

Thanks

minimap2 not found

Hello. I'm very excited to try iGDA, but I am not able to run the 'igda_align_ont' step. I installed iGDA in a fresh conda environment as described on the GitHub page:
conda install -c bioconda -c conda-forge -c zhixingfeng igda

When I run 'igda_align_ont' I receive the error message "/scicomp/home-pure/ymw8/Software/miniconda3/envs/igda/bin/igda_align_ont: line 24: minimap2: command not found".

I do have several 'minimap2...' executables in the conda environment (such as "minimap2_nanopore"), but none are simply "minimap2".

Please advise. The full error message is below. Thanks -- Adam.

(igda) me> igda_align_ont barcode45.trim.primerclipped.bam iGDA/WuCoV_MN908947_clean.fasta iGDA/barcode45.igda 4
infile=barcode45.trim.primerclipped.bam
reffile=iGDA/WuCoV_MN908947_clean.fasta
outfile=iGDA/barcode45.igda
nthread=4
get forward sequences from sam/bam
run minimap2
~/Software/miniconda3/envs/igda/bin/igda_align_ont: line 24: minimap2: command not found
filter realigned sam
[main_samview] fail to read the header from "iGDA/barcode45.igda.realign.sam".
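
A quick way to confirm which external programs are visible inside the active environment is to look them up on PATH. The sketch below uses Python's standard `shutil.which`; the helper name `find_tools` is hypothetical, and the tool list is an assumption based on the log above (minimap2 is named explicitly, and the `[main_samview]` message suggests samtools).

```python
import shutil

def find_tools(names):
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}

# Run this with the igda conda environment activated.
for tool, path in find_tools(["minimap2", "samtools"]).items():
    print(f"{tool}: {path if path else 'NOT FOUND on PATH'}")
```

If minimap2 is reported as missing, installing the `minimap2` package from bioconda into the same environment is one thing to try; the thread does not record the maintainer's recommended fix.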

Regarding the maximum length of contigs

Hi @zhixingfeng ,

I am using igda to detect sublines of one bacterium in pooled PacBio Sequel II genomic data. The average read length is 8 kb. I find that igda gives me very few contigs (3-6) for most datasets. I do not expect hundreds of sublines, but I do expect at least 2-3. In my igda results, I get 6 contigs placed very far apart from each other on a 5 Mb genome. From variant analysis, I know that the loci covered by these contigs are either deleted or have a high mutation frequency (~80%). Do you think these results could be due to the short read length limiting the maximum contig length that igda can achieve? Or am I doing something wrong (I followed exactly the commands suggested on the usage page for Sequel II reads)? Any insights would be super useful.

usability tweaks

Hello. Thank you for developing and sharing iGDA. While testing it out, I noticed a few features that could make it more useful to end-users, in case you decide to do additional development on iGDA.

  1. Report SNVs in a standard human-readable format (e.g. 484K). Right now, they are encoded "for each integer x, floor(x/4) = 0-based locus, and x modulo 4 = base (0=A, 1=C, 2=G, 3=T)"
  2. Report proportions, both for the individual SNV (detected_snv.vcf) and for the contigs if possible.
  3. As I understand the current report format, there is no indication whether or not the 'reference genotype' is present in the sample; genotypes are only reported if they have SNVs.

Thanks
Adam
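
Until such a report format exists, the integer encoding quoted in item 1 can be decoded by hand. The sketch below follows that description directly (floor(x/4) is the 0-based locus, x modulo 4 is the base); the function names `decode_snv` and `encode_snv` are hypothetical and not part of iGDA, and the example integers are made up for illustration.

```python
BASES = "ACGT"  # 0=A, 1=C, 2=G, 3=T, as described in item 1

def decode_snv(x):
    """Return (0-based locus, base) for an encoded SNV integer."""
    locus, base_index = divmod(x, 4)
    return locus, BASES[base_index]

def encode_snv(locus, base):
    """Inverse of decode_snv: 0-based locus and base back to the integer code."""
    return 4 * locus + BASES.index(base)

for code in (0, 7, 1934):
    locus, base = decode_snv(code)
    print(f"{code} -> locus {locus} (1-based {locus + 1}), base {base}")
```

This gives nucleotide positions only; turning them into amino-acid style labels would additionally require the reference sequence and a gene annotation.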
